Journal of Computer Science and Technology ›› 2022, Vol. 37 ›› Issue (3): 615-625. doi: 10.1007/s11390-022-2185-7

Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Section of CVM 2022 •

Local Homography Estimation on User-Specified Textureless Regions

Zheng Chen (陈铮), Xiao-Nan Fang (方晓楠), and Song-Hai Zhang* (张松海), Member, IEEE        

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Online: 2022-05-30 Published: 2022-05-30
  • Contact: Song-Hai Zhang, E-mail: shz@tsinghua.edu.cn
  • About author: Song-Hai Zhang received his Ph.D. degree in computer science and technology from Tsinghua University, Beijing, in 2007. He is currently an associate professor in the Department of Computer Science and Technology at Tsinghua University, Beijing. His research interests include computer graphics, virtual reality, and image/video processing.
  • Supported by:
    This work was supported by the Key Research Projects of the Foundation Strengthening Program under Grant No. 2020JCJQZD01412 and the National Natural Science Foundation of China under Grant No. 61832016.

This paper presents VideoInNet, a novel deep neural network for designated point tracking (DPT) in monocular RGB video. More concretely, the aim is to track four designated points, correlated by a local homography, on a textureless planar region in the scene. DPT can be applied to augmented reality and video editing, especially in video advertising. Existing methods predict the locations of the four designated points without appropriately considering the correlation among them. To solve this problem, VideoInNet predicts the motion of the four designated points, correlated by a local homography, within a heatmap prediction framework. Our network refines the heatmaps of the designated points in two stages. In the first stage, we introduce a context-aware and location-aware structure to learn a local homography for the designated plane in a supervised way. In the second stage, we introduce an iterative heatmap refinement module to improve tracking accuracy. We also propose a dataset focusing on textureless planar regions, named ScanDPT, for training and evaluation. On ScanDPT, the error rate of VideoInNet is about 29% lower than that of the state-of-the-art approach when evaluated on the first 120 frames of the test videos.

Key words: homography estimation; neural network; designated point tracking (DPT)
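
The abstract relies on the fact that four designated points on a plane are tied together by a single homography. The sketch below illustrates only that classical geometric relation and is not the VideoInNet network: it recovers the 3x3 homography from four point correspondences by solving the standard 8x8 linear system with h33 fixed to 1. The function names (`solve_linear`, `homography_from_4_points`, `apply_homography`) are hypothetical and chosen for illustration; in practice a library routine such as OpenCV's `cv2.getPerspectiveTransform` would normally be used for the same computation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so that row operations update both.
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def homography_from_4_points(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Return H (with h33 = 1) such that H maps each src point to its dst point."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)
```

Given the four tracked points in two frames, `homography_from_4_points(prev_pts, cur_pts)` yields the local homography of the designated plane, and `apply_homography` can then warp any other point on that plane, for example the corners of an inserted advertisement.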

