Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (3): 555-571. DOI: 10.1007/s11390-021-1272-5

Special Section: Computer Graphics and Multimedia



3D Object Tracking with Adaptively Weighted Local Bundles

Jiachen Li1, Fan Zhong2,*, Member, CCF, Songhua Xu3, and Xueying Qin1,*, Senior Member, CCF, Member, IEEE        

  1. 1 School of Software, Shandong University, Jinan 250101, China;
    2 School of Computer Science and Technology, Shandong University, Qingdao 266237, China;
    3 College of Engineering and Computing, University of South Carolina, Columbia 29208, U.S.A.
  • Received:2021-01-06 Revised:2021-04-23 Online:2021-05-05 Published:2021-05-31
  • Contact: Fan Zhong, Xueying Qin E-mail:zhongfan@sdu.edu.cn;qxy@sdu.edu.cn
  • About author:Jiachen Li is a Ph.D. candidate of School of Software, Shandong University, Jinan. He received his B.E. degree in exploration technology and engineering from Ocean University of China, Qingdao, in 2016. His research interests include augmented reality, 3D object tracking, pose estimation, head posture analysis, etc.
  • Supported by:
    This work was partially supported by Zhejiang Lab under Grant No. 2020NB0AB02, and the Industrial Internet Innovation and Development Project in 2019 of China.

1. Background (Context):
3D object tracking continuously recovers the spatial relationship between a 3D object and the camera. It is an important task in computer vision and has been widely applied in industrial manufacturing, medical diagnosis, entertainment and games, robotics, and other fields. Tracking textureless or weakly textured objects, however, still faces many challenges, and methods based on a single type of feature suffer from inherent drawbacks. For example, edge-based methods depend on the quality of extracted image edges: when the background is cluttered or the motion is blurred, edges are hard to extract and tracking easily fails. Color-based methods fail in scenes where the foreground and background colors are similar or the illumination changes drastically, because the resulting rapid color changes prevent the color model from being updated in time. In addition, some simple fusion schemes directly add up the energy terms of multiple features, and therefore cannot fully exploit the strengths of each feature type and remain sensitive to the tracking environment.
2. Objective: To address the shortcomings of single-feature tracking methods and of simple fusion schemes, this paper proposes a 3D object tracking method based on adaptively weighted local bundles. The method regroups contour points and region points into local bundles and unifies them in a single energy function; it considers the spatial relationship between the two kinds of sample points and adapts their weights, so that the strengths of each feature type are fully exploited and tracking performance is improved.
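As a rough illustration of the unified energy described above, the sketch below models one local bundle as a contour sample plus its surrounding region samples contributing a single fused energy term. The function names, the quadratic energy forms, and the fixed per-bundle weights are all illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def bundle_energy(edge_residual, region_residuals, w_edge, w_region):
    """Fuse one contour point's edge error with the color errors of its
    surrounding region points into a single bundle energy.
    (Illustrative quadratic energies; the paper's exact terms differ.)"""
    e_edge = edge_residual ** 2
    e_region = float(np.mean(np.asarray(region_residuals, dtype=float) ** 2))
    return w_edge * e_edge + w_region * e_region

def total_energy(bundles):
    """Sum the fused energies of all local bundles along the object contour.

    Each bundle is a tuple (edge_residual, region_residuals, w_edge, w_region).
    """
    return sum(bundle_energy(*b) for b in bundles)
```

Minimizing such a sum over the pose parameters would then drive both feature types toward agreement, while the per-bundle weights control which feature dominates in each sub-region.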
3. Method: We propose an adaptively weighted local bundle structure and, based on it, define a multi-feature fusion energy function to handle complex scenes. Each bundle represents a local sub-region containing a set of local features, i.e., a contour point and its surrounding region points. Combining the two feature types within a bundle handles their spatial inconsistency, and partitioning the optimization region into sub-regions lets different features play their respective roles in each sub-region. To weight the features within each local bundle, a confidence value is computed for each feature type and automatically normalized, so as to adapt to different tracking scenes. Based on these confidences, the negative influence of features in low-confidence regions is further reduced, and the bundles are weighted adaptively. Consequently, in every frame, each region adapts to its own situation according to the weight of each bundle energy, so that different features can play different roles in different scenes and even in different regions of the same frame.
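The confidence-based weighting step above can be sketched as follows. This is a minimal sketch under assumed conventions: two feature types per bundle (edge and region), confidences normalized to sum to one within each bundle, and a small epsilon to keep the normalization defined when both confidences vanish; none of these names or constants come from the paper:

```python
import numpy as np

def adaptive_weights(conf_edge, conf_region, eps=1e-6):
    """Normalize the per-feature confidences of one bundle so the weights
    sum to 1; a low-confidence feature is automatically down-weighted."""
    c = np.array([conf_edge, conf_region], dtype=float) + eps
    return c / c.sum()

def weighted_total(bundle_energies, confidences):
    """Weight each bundle's (edge, region) energy pair by its
    confidence-derived weights and sum over all bundles."""
    total = 0.0
    for (e_edge, e_region), (c_edge, c_region) in zip(bundle_energies, confidences):
        w = adaptive_weights(c_edge, c_region)
        total += w[0] * e_edge + w[1] * e_region
    return total
```

Because the weights are recomputed per bundle and per frame, a region where, say, the edge confidence collapses (motion blur, clutter) automatically leans on its region term, which is the behavior the method description attributes to the adaptive weighting.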
4. Result &amp; Findings: Quantitative results show that the proposed method outperforms existing 3D object tracking methods on the RBOT and OPT datasets. Ablation studies demonstrate the respective contributions of the local bundles, the confidences, and the feature fusion. Extensive analyses present intermediate results of the algorithm, including visual comparisons, probability maps, confidence maps, feature weights, and outlier analyses, which aid understanding of the algorithm.
5. Conclusions: This paper proposes an adaptively weighted local bundle structure, an optimized fusion scheme that avoids the drawbacks of naive feature fusion with uniform weights; its tracking performance surpasses that of existing 3D object tracking methods. We group region and edge features into a set of local bundles and then weight them adaptively according to the confidence values of the involved features. The proposed method balances the overall effect of each feature under different conditions and avoids manually setting inter-feature balancing parameters. In future work, we will consider fusing other features, e.g., unifying textured and textureless objects in one framework.

Keywords: 3D object tracking, local bundle, feature fusion, confidence

Abstract: 3D object tracking from a monocular RGB image is a challenging task. Although popular color-based and edge-based methods have been well studied, they are only applicable to certain cases, and new solutions to the challenges of real environments must be developed. In this paper, we propose a robust 3D object tracking method with adaptively weighted local bundles, called the AWLB tracker, to handle more complicated cases. Each bundle represents a local region containing a set of local features. To alleviate the negative effect of features in low-confidence regions, the bundles are adaptively weighted using a spatially-variant weighting function based on the confidence values of the involved energy terms. Therefore, in each frame, the weights of the energy terms in each bundle are adapted to different situations and to different regions of the same frame. Experiments show that the proposed method improves the overall accuracy in challenging cases. We then verify the effectiveness of the proposed confidence-based adaptive weighting method through ablation studies and show that it outperforms existing single-feature methods as well as multi-feature methods without adaptive weighting.

Key words: 3D tracking, local bundle, feature fusion, confidence map

[1] Lepetit V, Fua P. Monocular model-based 3D tracking of rigid objects:A survey. Found. Trends Comput. Graph. Vis., 2005, 1(1):1-89. DOI:10.1561/0600000001.
[2] Vacchetti L, Lepetit V, Fua P. Stable real-time 3D tracking using online and offline information. IEEE Trans. Pattern Anal. Mach. Intell., 2004, 26(10):1385-1391. DOI:10.1109/TPAMI.2004.92.
[3] Lourakis M I A, Zabulis X. Model-based pose estimation for rigid objects. In Proc. the 9th International Conference on Computer Vision Systems, July 2013, pp.83-92. DOI:10.1007/978-3-642-39402-7_9.
[4] Tan D J, Tombari F, Ilic S, Navab N. A versatile learning-based 3D temporal tracker:Scalable, robust, online. In Proc. the 2015 IEEE International Conference on Computer Vision, December 2015, pp.693-701. DOI:10.1109/ICCV.2015.86.
[5] Besl P J, McKay N D. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell., 1992, 14(2):239-256. DOI:10.1109/34.121791.
[6] Peng S, Liu Y, Huang Q, Zhou X, Bao H. PVNet:Pixel-wise voting network for 6DoF pose estimation. In Proc. the 2019 IEEE Conference on Computer Vision and Pattern Recognition, June 2019, pp.4561-4570. DOI:10.1109/CVPR.2019.00469.
[7] Ye Y, Zhang C, Hao X. ARPNET:Attention region proposal network for 3D object detection. Sci. China Inf. Sci., 2019, 62(12):Article No. 220104. DOI:10.1007/s11432-019-2636-x.
[8] Garon M, Lalonde J. Deep 6-DOF tracking. IEEE Trans. Vis. Comput. Graph., 2017, 23(11):2410-2418. DOI:10.1109/TVCG.2017.2734599.
[9] Li Y, Wang G, Ji X, Xiang Y, Fox D. DeepIM:Deep iterative matching for 6D pose estimation. Int. J. Comput. Vis., 2020, 128(3):657-678. DOI:10.1007/s11263-019-01250-9.
[10] Harris C, Stennett C. RAPID-A video rate object tracker. In Proc. the 1990 British Machine Vision Conference, September 1990, pp.73-77. DOI:10.5244/C.4.15.
[11] Seo B, Park H, Park J, Hinterstoisser S, Ilic S. Optimal local searching for fast and robust textureless 3D object tracking in highly cluttered backgrounds. IEEE Trans. Vis. Comput. Graph., 2014, 20(1):99-110. DOI:10.1109/TVCG.2013.94.
[12] Wang G, Wang B, Zhong F, Qin X, Chen B. Global optimal searching for textureless 3D object tracking. The Visual Computer, 2015, 31(6/7/8):979-988. DOI:10.1007/s00371-015-1098-7.
[13] Wang B, Zhong F, Qin X. Robust edge-based 3D object tracking with direction-based pose validation. Multimedia Tools Appl., 2019, 78(9):12307-12331. DOI:10.1007/s11042-018-6727-5.
[14] Zhang Y, Li X, Liu H, Shang Y. Comparative study of visual tracking method:A probabilistic approach for pose estimation using lines. IEEE Trans. Circuits Syst. Video Technol., 2017, 27(6):1222-1234. DOI:10.1109/TCSVT.2016.2527219.
[15] Prisacariu V A, Reid I D. PWP3D:Real-time segmentation and tracking of 3D objects. Int. J. Comput. Vis., 2012, 98(3):335-354. DOI:10.1007/s11263-011-0514-3.
[16] Tjaden H, Schwanecke U, Schömer E. Real-time monocular segmentation and pose tracking of multiple objects. In Proc. the 14th European Conference on Computer Vision, October 2016, pp.423-438. DOI:10.1007/978-3-319-46493-0_26.
[17] Hexner J, Hagege R R. 2D-3D pose estimation of heterogeneous objects using a region based approach. Int. J. Comput. Vis., 2016, 118(1):95-112. DOI:10.1007/s11263-015-0873-2.
[18] Tjaden H, Schwanecke U, Schömer E. Real-time monocular pose estimation of 3D objects using temporally consistent local color histograms. In Proc. the 2017 IEEE International Conference on Computer Vision, October 2017, pp.124-132. DOI:10.1109/ICCV.2017.23.
[19] Tjaden H, Schwanecke U, Schömer E, Cremers D. A region-based Gauss-Newton approach to real-time monocular multiple object tracking. IEEE Trans. Pattern Anal. Mach. Intell., 2019, 41(8):1797-1812. DOI:10.1109/TPAMI.2018.2884990.
[20] Marchand É, Bouthemy P, Chaumette F. A 2D-3D model-based approach to real-time visual tracking. Image Vis. Comput., 2001, 19(13):941-955. DOI:10.1016/S0262-8856(01)00054-3.
[21] Drummond T, Cipolla R. Real-time visual tracking of complex structures. IEEE Trans. Pattern Anal. Mach. Intell., 2002, 24(7):932-946. DOI:10.1109/TPAMI.2002.1017620.
[22] Wuest H, Vial F, Stricker D. Adaptive line tracking with multiple hypotheses for augmented reality. In Proc. the 4th IEEE/ACM International Symposium on Mixed and Augmented Reality, October 2005, pp.62-69. DOI:10.1109/ISMAR.2005.8.
[23] Choi C, Christensen H I. Robust 3D visual tracking using particle filtering on the special Euclidean group:A combined approach of keypoint and edge features. The International Journal of Robotics Research, 2012, 31(4):498-519. DOI:10.1177/0278364912437213.
[24] Wang B, Zhong F, Qin X. Pose optimization in edge distance field for textureless 3D object tracking. In Proc. the 2017 Computer Graphics International Conference, June 2017, Article No. 32. DOI:10.1145/3095140.3095172.
[25] Osher S, Sethian J A. Fronts propagating with curvature-dependent speed:Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics, 1988, 79(1):12-49. DOI:10.1016/0021-9991(88)90002-2.
[26] Zhong L, Zhao X, Zhang Y, Zhang S, Zhang L. Occlusion-aware region-based 3D pose tracking of objects with temporally consistent polar-based local partitioning. IEEE Trans. Image Process., 2020, 29:5065-5078. DOI:10.1109/TIP.2020.2973512.
[27] Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11):2278-2324. DOI:10.1109/5.726791.
[28] Crivellaro A, Rad M, Verdie Y, Yi K M, Fua P, Lepetit V. Robust 3D object tracking from monocular images using stable parts. IEEE Trans. Pattern Anal. Mach. Intell., 2018, 40(6):1465-1479. DOI:10.1109/TPAMI.2017.2708711.
[29] Zhong L, Zhang L. A robust monocular 3D object tracking method combining statistical and photometric constraints. Int. J. Comput. Vis., 2019, 127(8):973-992. DOI:10.1007/s11263-018-1119-x.
[30] Ma Y, Soatto S, Košecká J, Sastry S S. An Invitation to 3-D Vision:From Images to Geometric Models (1st edition). Springer-Verlag New York Publishers, 2004.
[31] Zhong F, Qin X, Chen J, Hua W, Peng Q. Confidence-based color modeling for online video segmentation. In Proc. the 9th Asian Conference on Computer Vision, September 2009, pp.697-706. DOI:10.1007/978-3-642-12304-7_66.
[32] Wu P, Lee Y, Tseng H, Ho H, Yang M, Chien S. A benchmark dataset for 6DoF object pose tracking. In Proc. the 2017 IEEE International Symposium on Mixed and Augmented Reality Adjunct, October 2017, pp.186-191. DOI:10.1109/ISMAR-Adjunct.2017.62.
[33] Brachmann E, Michel F, Krull A, Yang M Y, Gumhold S, Rother C. Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.3364-3372. DOI:10.1109/CVPR.2016.366.
[34] Whelan T, Leutenegger S, Salas-Moreno R F, Glocker B, Davison A J. ElasticFusion:Dense SLAM without a pose graph. In Proc. the 2015 Robotics:Science and Systems, July 2015. DOI:10.15607/RSS.2015.XI.001.
[35] Mur-Artal R, Tardós J D. ORB-SLAM2:An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robotics, 2017, 33(5):1255-1262. DOI:10.1109/TRO.2017.2705103.
[36] Marchand É, Uchiyama H, Spindler F. Pose estimation for augmented reality:A hands-on survey. IEEE Trans. Vis. Comput. Graph., 2016, 22(12):2633-2651. DOI:10.1109/TVCG.2015.2513408.
[37] Cheng M, Liu Y, Lin W, Zhang Z, Rosin P L, Torr P H S. BING:Binarized normed gradients for objectness estimation at 300fps. Comput. Vis. Media, 2019, 5(1):3-20. DOI:10.1007/s41095-018-0120-1.