Journal of Computer Science and Technology ›› 2022, Vol. 37 ›› Issue (3): 615-625. DOI: 10.1007/s11390-022-2185-7

Special Section: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

Local Homography Estimation on User-Specified Textureless Regions

Zheng Chen (陈铮), Xiao-Nan Fang (方晓楠), and Song-Hai Zhang* (张松海), Member, IEEE        

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Online: 2022-05-30 Published: 2022-05-30
  • Contact: Song-Hai Zhang E-mail: shz@tsinghua.edu.cn
  • About author: Song-Hai Zhang received his Ph.D. degree in computer science and technology from Tsinghua University, Beijing, in 2007. He is currently an associate professor in the Department of Computer Science and Technology at Tsinghua University, Beijing. His research interests include computer graphics, virtual reality, and image/video processing.
  • Supported by:
    This work was supported by the Key Research Projects of the Foundation Strengthening Program under Grant No. 2020JCJQZD01412 and the National Natural Science Foundation of China under Grant No. 61832016.

1. Context: Predicting the motion of four designated points that lie on a textureless background region and are correlated by a local homography is very difficult for existing methods. Optical flow methods predict the motion of each of the four designated points independently, without considering their correlation. Existing object tracking methods are ill-suited to tracking points on the background, because they are typically designed to track foreground objects such as pedestrians, animals, bicycles, and vehicles; such objects have far more discriminative features than the background region defined by the designated points. Homography estimation methods do exploit the fact that the four designated points lie on the same plane. However, almost no features can be detected inside the region bounded by the four points, so template-based homography estimation cannot obtain a reasonable local homography in this setting. If instead the full images are taken as input, features are extracted and matched across the two frames and outliers are removed, so nearly all of the matched features are used to estimate a single global homography (see the first sketch after this list). When a local homography is required for a designated plane, selecting the features relevant to that plane is very difficult for existing methods. Directly applying an existing plane detection method and filtering the feature points with a plane mask may discard important feature points on the plane boundary; moreover, such plane detection methods are not robust on small planar regions. Without enough correspondences related to the designated plane, the estimated homography deviates from the true one.
2. Objective: This paper aims to remedy the current difficulty of estimating local homographies on textureless regions by proposing a dataset for textureless local homography estimation and a context-aware two-stage deep neural network.
3. Method: We first use the state-of-the-art optical flow method RAFT to compute a rough location in the next frame for each point separately. We then represent the location of each designated point as a heatmap and refine it with a two-stage network. The first stage consists of two encoder branches, a location encoder and a context encoder, and one decoder; the second stage contains an iterative heatmap refinement module (IHRM). We refine the RAFT results under a local homography constraint in a supervised manner, optimizing the location of each point by aggregating the spatial information of the other three points. Following the four-point parameterization (see the second sketch after this list), our network learns the local homography of the designated plane by jointly supervising the locations of the four designated points. Moreover, beyond the region inside the four designated points, the context encoder supplements features from the surrounding boundary, which is especially beneficial when few features exist inside the designated plane. Owing to the dual-encoder structure and the local homography supervision, our network produces fine-grained predictions. In the second stage, inspired by the success of iterative methods in other fields, we use the iterative refinement module to repeatedly refine the locations of the four designated points; in each iteration, the prediction is constrained by the local homography of the designated points, as in the first stage. After multiple iterations with intermediate supervision, we obtain more accurate fine-grained predictions. We construct ScanDPT, a dataset for textureless local homography estimation built on ScanNet, and verify on this dataset that our method significantly improves over existing methods.
4. Results & Findings: On ScanDPT, measured by the average tracking error of the four designated points over the first 120 frames, the MSE of our method is 29% lower than that of the state-of-the-art method RAFT (see the third sketch after this list for one plausible reading of this metric).
5. Conclusions: In this work, we propose a novel network for a practical problem, designated point tracking (DPT) on textureless planar regions. Existing template-based methods can only track textured objects, because features are rare inside a template of a textureless planar region. On the other hand, with full images as input, existing methods cannot properly exploit the prior on the four designated points, namely the local homography. Our network learns this prior correlation of the four designated points in a supervised way, even when the full images are taken as input. We use the optical flow method RAFT to predict the initial locations of the four points; our model then predicts intermediate locations of the four points with a dual-encoder structure and further refines them with a recurrent module. To train and evaluate our network, we present ScanDPT, the first dataset for textureless local homography estimation. Finally, comparison experiments and ablation studies demonstrate the effectiveness of our network design and the superiority of our method over alternatives, showing that extracting boundary information outside the textureless region with the context-aware module helps track textureless planar regions, and that the subsequent iterative refinement also benefits this task.
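
To make the failure mode described in the context section concrete, here is a minimal sketch of the classical full-image pipeline: detect and match features across two frames, then fit one global homography with RANSAC. This is an illustrative baseline, not the paper's method; the function name, the feature choice (ORB), and the thresholds are our own assumptions. On a small textureless designated region, almost none of the surviving matches lie on that plane, so the fit reflects the dominant global motion rather than the local one.

```python
# Sketch of the classical feature-based pipeline (global homography).
# Assumed helper, not from the paper: global_homography().
import cv2
import numpy as np

def global_homography(frame_a, frame_b, min_matches=8):
    """Fit one global homography between two grayscale frames via ORB + RANSAC."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    if des_a is None or des_b is None:
        return None  # textureless input: no features detected at all
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_a, des_b)
    if len(matches) < min_matches:
        return None  # too few correspondences for a stable fit
    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC keeps the dominant motion model; features on a small textureless
    # plane are too sparse to steer the fit toward the *local* homography.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```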
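The method description relies on two standard building blocks that are easy to state in code: decoding a point location from a heatmap with a soft-argmax (in the spirit of integral pose regression [49]), and recovering the local homography exactly from the four designated point correspondences (the four-point parameterization). The sketch below is our own NumPy/OpenCV illustration of these blocks, not the paper's network code.

```python
# Sketch of heatmap decoding and the four-point homography parameterization.
import cv2
import numpy as np

def soft_argmax(heatmap):
    """Expected (x, y) location under a softmax-normalized heatmap (differentiable)."""
    h, w = heatmap.shape
    probs = np.exp(heatmap - heatmap.max())
    probs /= probs.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((xs * probs).sum()), float((ys * probs).sum())

def local_homography(pts_prev, pts_next):
    """Exact 3x3 homography mapping the four designated points between frames."""
    return cv2.getPerspectiveTransform(np.float32(pts_prev), np.float32(pts_next))
```

Because four point correspondences determine a homography exactly, jointly supervising the four point locations is equivalent to supervising the local homography of the designated plane.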
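For reference, here is one plausible reading of the reported metric, under the assumption that the error is the squared distance per point, averaged over the four points and then over the first 120 frames; the authoritative protocol is the one defined in the paper.

```python
# Assumed evaluation sketch, not the paper's released code.
import numpy as np

def mean_tracking_mse(pred, gt, num_frames=120):
    """pred, gt: arrays of shape (T, 4, 2) holding (x, y) positions per frame."""
    per_frame = ((pred - gt) ** 2).sum(axis=-1).mean(axis=-1)  # MSE per frame
    return float(per_frame[:num_frames].mean())  # average over the first 120 frames
```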

Key words: homography estimation, neural network, tracking

Abstract: This paper presents a novel deep neural network for designated point tracking (DPT) in a monocular RGB video, VideoInNet. More concretely, the aim is to track four designated points correlated by a local homography on a textureless planar region in the scene. DPT can be applied to augmented reality and video editing, especially in the field of video advertising. Existing methods predict the locations of the four designated points without appropriately considering their correlation. To solve this problem, VideoInNet predicts the motion of the four designated points correlated by a local homography within the heatmap prediction framework. Our network refines the heatmaps of the designated points through two stages. In the first stage, we introduce a context-aware and location-aware structure to learn a local homography for the designated plane in a supervised way. In the second stage, we introduce an iterative heatmap refinement module to improve the tracking accuracy. We propose a dataset focusing on textureless planar regions, named ScanDPT, for training and evaluation. We show that the error rate of VideoInNet is about 29% lower than that of the state-of-the-art approach when tested on the first 120 frames of the testing videos in ScanDPT.
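
The abstract frames DPT as heatmap prediction. A common way to encode a ground-truth point as a training target is an unnormalized Gaussian bump centered at the point; the sketch below shows only this generic encoding, which may differ from the paper's exact target construction.

```python
# Generic Gaussian heatmap target (assumed encoding, for illustration only).
import numpy as np

def gaussian_heatmap(center, shape, sigma=2.0):
    """Heatmap of size `shape` = (h, w) peaked at `center` = (x, y)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
```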

Key words: homography estimation, neural network, designated point tracking (DPT)

[1] Mémin É, Pérez P. Dense estimation and object-based segmentation of the optical flow with robust techniques. IEEE Trans. Image Process., 1998, 7(5): 703-719. DOI: 10.1109/83.668027.

[2] Dosovitskiy A, Fischer P, Ilg E et al. FlowNet: Learning optical flow with convolutional networks. In Proc. the 2015 IEEE International Conference on Computer Vision, December 2015, pp.2758-2766. DOI: 10.1109/ICCV.2015.316.

[3] Ilg E, Mayer N, Saikia T et al. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, July 2017, pp.1647-1655. DOI: 10.1109/CVPR.2017.179.

[4] Ranjan A, Black M J. Optical flow estimation using a spatial pyramid network. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, July 2017, pp.2720-2729. DOI: 10.1109/CVPR.2017.291.

[5] Sun D Q, Yang X D, Liu M Y, Kautz J. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In Proc. the 2018 IEEE Conference on Computer Vision and Pattern Recognition, June 2018, pp.8934-8943. DOI: 10.1109/CVPR.2018.00931.

[6] Zhao S Y, Sheng Y L, Dong Y et al. MaskFlownet: Asymmetric feature matching with learnable occlusion mask. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2020, pp.6277-6286. DOI: 10.1109/CVPR42600.2020.00631.

[7] Teed Z, Deng J. RAFT: Recurrent all-pairs field transforms for optical flow. In Proc. the 16th European Conference on Computer Vision, August 2020, pp.402-419. DOI: 10.1007/978-3-030-58536-5.

[8] Lowe D G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis., 2004, 60(2): 91-110. DOI: 10.1023/B:VISI.0000029664.99615.94.

[9] DeTone D, Malisiewicz T, Rabinovich A. SuperPoint: Self-supervised interest point detection and description. In Proc. the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, June 2018, pp.224-236. DOI: 10.1109/CVPRW.2018.00060.

[10] Luo Z X, Zhou L, Bai X Y et al. ASLFeat: Learning local features of accurate shape and localization. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2020, pp.6588-6597. DOI: 10.1109/CVPR42600.2020.00662.

[11] Sarlin P E, DeTone D, Malisiewicz T, Rabinovich A. SuperGlue: Learning feature matching with graph neural networks. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2020, pp.4937-4946. DOI: 10.1109/CVPR42600.2020.00499.

[12] Jiang W, Trulls E, Hosang J et al. COTR: Correspondence transformer for matching across images. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, October 2021, pp.6187-6197. DOI: 10.1109/ICCV48922.2021.00615.

[13] Efe U, Ince K G, Alatan A A. DFM: A performance baseline for deep feature matching. In Proc. the 2021 IEEE Conference on Computer Vision and Pattern Recognition Workshops, June 2021, pp.4284-4293. DOI: 10.1109/CVPRW53098.2021.00484.

[14] Evangelidis G D, Psarakis E Z. Parametric image alignment using enhanced correlation coefficient maximization. IEEE Trans. Pattern Anal. Mach. Intell., 2008, 30(10): 1858-1865. DOI: 10.1109/TPAMI.2008.113.

[15] Benhimane S, Malis E. Real-time image-based tracking of planes using efficient second-order minimization. In Proc. the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, September 28-October 2, 2004, pp.943-948. DOI: 10.1109/IROS.2004.1389474.

[16] Chen L, Zhou F, Shen Y et al. Illumination insensitive efficient second-order minimization for planar object tracking. In Proc. the 2017 IEEE International Conference on Robotics and Automation, May 29-June 3, 2017, pp.4429-4436. DOI: 10.1109/ICRA.2017.7989512.

[17] DeTone D, Malisiewicz T, Rabinovich A. Deep image homography estimation. arXiv:1606.03798, 2016. https://arxiv.org/pdf/1606.03798.pdf, Jan. 2022.

[18] Dai A, Chang A X, Savva M et al. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, July 2017, pp.2432-2443. DOI: 10.1109/CVPR.2017.261.

[19] Dai A, Nießner M, Zollhöfer M et al. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface re-integration. arXiv:1604.01093, 2016. https://arxiv.org/pdf/1604.01093.pdf, Jan. 2022.

[20] Li J W, Gao W, Wu Y H et al. High-quality indoor scene 3D reconstruction with RGB-D cameras: A brief review. Computational Visual Media, 2022, 8(3): 369-393. DOI: 10.1007/s41095-021-0250-8.

[21] Muratov O, Slynko Y, Chernov V et al. 3DCapture: 3D reconstruction for a smartphone. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, June 26-July 1, 2016, pp.893-900. DOI: 10.1109/CVPRW.2016.116.

[22] Yang X B, Zhou L Y, Jiang H Q et al. Mobile3DRecon: Real-time monocular 3D reconstruction on a mobile phone. IEEE Trans. Vis. Comput. Graph., 2020, 26(12): 3446-3456. DOI: 10.1109/TVCG.2020.3023634.

[23] Zhang S H, Li X L, Liu Y T. Scale-aware insertion of virtual objects in monocular videos. In Proc. the 2020 IEEE International Symposium on Mixed and Augmented Reality, November 2020, pp.36-44. DOI: 10.1109/ISMAR50242.2020.00022.

[24] Chen D, Tang F, Dong W M et al. SiamCPN: Visual tracking with the Siamese center-prediction network. Comput. Vis. Media, 2021, 7(2): 253-265. DOI: 10.1007/s41095-021-0212-1.

[25] Xue Z X, Wu W. Anomaly detection by exploiting the tracking trajectory in surveillance videos. Sci. China: Inf. Sci., 2020, 63(5): Article No. 154101. DOI: 10.1007/s11432-018-9792-8.

[26] Zhang D, Li T S, Chen C L. Target tracking algorithm based on a broad learning system. Sci. China: Inf. Sci., 2022, 65(5): Article No. 154201. DOI: 10.1007/s11432-020-3272-y.

[27] Li K, He F, Yu H. Robust visual tracking based on convolutional features with illumination and occlusion handing. J. Comput. Sci. Technol., 2018, 33(1): 223-236. DOI: 10.1007/s11390-017-1764-5.

[28] Li J C, Zhong F, Xu S H, Qin X Y. 3D object tracking with adaptively weighted local bundles. J. Comput. Sci. Technol., 2021, 36(3): 555-571. DOI: 10.1007/s11390-021-1272-5.

[29] Avidan S. Support vector tracking. IEEE Trans. Pattern Anal. Mach. Intell., 2004, 26(8): 1064-1072. DOI: 10.1109/TPAMI.2004.53.

[30] Ross D A, Lim J, Lin R S, Yang M H. Incremental learning for robust visual tracking. Int. J. Comput. Vis., 2008, 77(1/2/3): 125-141. DOI: 10.1007/s11263-007-0075-7.

[31] Lucas B D, Kanade T. An iterative image registration technique with an application to stereo vision. In Proc. the 7th International Joint Conference on Artificial Intelligence, August 1981, pp.674-679.

[32] Henriques J F, Caseiro R, Martins P, Batista J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell., 2014, 37(3): 583-596. DOI: 10.1109/TPAMI.2014.2345390.

[33] Arulampalam M S, Maskell S, Gordon N J, Clapp T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process., 2002, 50(2): 174-188. DOI: 10.1109/78.978374.

[34] Li B, Wu W, Wang Q et al. SiamRPN++: Evolution of Siamese visual tracking with very deep networks. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2019, pp.4282-4291. DOI: 10.1109/CVPR.2019.00441.

[35] Guo D Y, Wang J, Cui Y et al. SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2020, pp.6268-6276. DOI: 10.1109/CVPR42600.2020.00630.

[36] Guo D Y, Shao Y Y, Cui Y et al. Graph attention tracking. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2021, pp.9543-9552. DOI: 10.1109/CVPR46437.2021.00942.

[37] Chen X, Yan B, Zhu J W et al. Transformer tracking. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2021, pp.8126-8135. DOI: 10.1109/CVPR46437.2021.00803.

[38] Wang N, Zhou W G, Wang J et al. Transformer meets tracker: Exploiting temporal context for robust visual tracking. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2021, pp.1571-1580. DOI: 10.1109/CVPR46437.2021.00162.

[39] Horn B K P, Schunck B G. Determining optical flow. Artif. Intell., 1981, 17(1/2/3): 185-203. DOI: 10.1016/0004-3702(81)90024-2.

[40] Hartley R, Zisserman A. Multiple view geometry in computer vision. Robotica, 2001, 19(2): 233-236. DOI: 10.1017/S0263574700223217.

[41] Muja M, Lowe D G. Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell., 2014, 36(11): 2227-2240. DOI: 10.1109/TPAMI.2014.2321376.

[42] Fischler M A, Bolles R C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 1981, 24(6): 381-395. DOI: 10.1145/358669.358692.

[43] Barath D, Matas J, Noskova J. MAGSAC: Marginalizing sample consensus. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2019, pp.10197-10205. DOI: 10.1109/CVPR.2019.01044.

[44] Nguyen T, Chen S W, Shivakumar S S et al. Unsupervised deep homography: A fast and robust homography estimation model. IEEE Robotics Autom. Lett., 2018, 3(3): 2346-2353. DOI: 10.1109/LRA.2018.2809549.

[45] Zhang J R, Wang C, Liu S C et al. Content-aware unsupervised deep homography estimation. In Proc. the 16th European Conference on Computer Vision, August 2020, pp.653-669. DOI: 10.1007/978-3-030-58452-8.

[46] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.770-778. DOI: 10.1109/CVPR.2016.90.

[47] Chen L C, Papandreou G, Kokkinos I et al. DeepLab: Semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell., 2018, 40(4): 834-848. DOI: 10.1109/TPAMI.2017.2699184.

[48] Chung J Y, Gülçehre Ç, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555, 2014. https://arxiv.org/pdf/1412.3555.pdf, Jan. 2022.

[49] Sun X, Xiao B, Wei F Y et al. Integral human pose regression. In Proc. the 15th European Conference on Computer Vision, September 2018, pp.536-553. DOI: 10.1007/978-3-030-01231-1.

[50] Girshick R B. Fast R-CNN. In Proc. the 2015 IEEE International Conference on Computer Vision, December 2015, pp.1440-1448. DOI: 10.1109/ICCV.2015.169.

[51] Deng J, Dong W, Socher R et al. ImageNet: A large-scale hierarchical image database. In Proc. the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2009, pp.248-255. DOI: 10.1109/CVPR.2009.5206848.

[52] Kingma D P, Ba J. Adam: A method for stochastic optimization. In Proc. the 3rd International Conference on Learning Representations, May 2015.
