2015, Vol. 30, Issue (2): 340-352. doi: 10.1007/s11390-015-1527-0

Topics: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Section on Selected Paper from NPC 2011 •

RGB-D Hand-Held Object Recognition Based on Heterogeneous Feature Fusion

Xiong Lv (吕雄), Shu-Qiang Jiang (蒋树强), Senior Member, IEEE, Member, CCF, ACM, Luis Herranz, Shuang Wang (王双)

  1. Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2014-12-29  Revised: 2015-02-22  Online: 2015-03-05  Published: 2015-03-05
  • About author: Xiong Lv received his B.S. degree in computer science and engineering from Beihang University, Beijing, in 2013. He is currently a graduate student at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include image understanding, image-based human-system interaction, and 2D and 3D object recognition.
  • Supported by:

    This work was supported in part by the National Basic Research 973 Program of China under Grant No. 2012CB316400, the National Natural Science Foundation of China under Grant Nos. 61322212 and 61450110446, the National High Technology Research and Development 863 Program of China under Grant No. 2014AA015202, and the Chinese Academy of Sciences Fellowships for Young International Scientists under Grant No. 2011Y1GB05. This work was also funded by the Lenovo Outstanding Young Scientists Program (LOYS).

Abstract: Object recognition has many applications in human-machine interaction and multimedia retrieval. However, due to large intra-class variability and occasional inter-class similarity, accurate recognition relying only on RGB data remains a major challenge. With the recent emergence of inexpensive RGB-D devices, this challenge can be better addressed by leveraging the additional depth information. A special yet important case of object recognition is hand-held object recognition, since manipulating objects with the hands is common and intuitive in both human-human and human-machine interaction. In this paper, we study this problem and introduce an effective framework to address it. The framework first detects and segments the hand-held object by combining skeleton information with depth information. In the recognition stage, it extracts heterogeneous features from the different modalities and fuses them to improve recognition accuracy. In particular, we combine handcrafted and deep learned features and study several multi-step fusion variants. We also introduce the Hand-held Object Dataset and use it to evaluate hand-held object recognition methods. Experimental evaluations validate the effectiveness of the proposed method.
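The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the segmentation rule or the fusion variants, so the code assumes a simple depth window around a skeleton hand joint and shows generic early (feature-level) and late (score-level) fusion. All function names, the pixel radius, and the depth tolerance are hypothetical choices made for illustration only.

```python
import numpy as np


def segment_near_hand(depth_mm, hand_rc, hand_depth_mm,
                      radius_px=40, depth_tol_mm=150):
    """Return a boolean mask of pixels near the hand joint, in image and depth.

    Assumption: the hand-held object lies inside a square window around the
    hand joint and at roughly the same depth as the hand. Zero depth values
    are treated as missing sensor readings.
    """
    h, w = depth_mm.shape
    r, c = hand_rc
    r0, r1 = max(r - radius_px, 0), min(r + radius_px + 1, h)
    c0, c1 = max(c - radius_px, 0), min(c + radius_px + 1, w)
    # Cast to float so unsigned depth maps do not wrap around on subtraction.
    window = depth_mm[r0:r1, c0:c1].astype(float)
    mask = np.zeros((h, w), dtype=bool)
    mask[r0:r1, c0:c1] = (window > 0) & (
        np.abs(window - hand_depth_mm) <= depth_tol_mm)
    return mask


def l2_normalize(x, eps=1e-12):
    """Scale a vector to unit L2 norm so that no modality dominates by scale."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x) + eps)


def early_fusion(features):
    """Feature-level fusion: concatenate the L2-normalized feature vectors."""
    return np.concatenate([l2_normalize(f) for f in features])


def late_fusion(score_list, weights=None):
    """Score-level fusion: weighted average of per-modality class scores."""
    scores = np.stack([np.asarray(s, dtype=float) for s in score_list])
    if weights is None:
        weights = np.full(len(score_list), 1.0 / len(score_list))
    return np.asarray(weights, dtype=float) @ scores
```

In such a setup, a handcrafted descriptor and a deep learned descriptor computed on the masked region could be passed to `early_fusion` before training a single classifier, or classified separately and combined with `late_fusion`. A multi-step scheme would chain these operations, but the specific variants studied in the paper are not described in the abstract.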

Copyright © Editorial Office of Journal of Computer Science and Technology