Journal of Computer Science and Technology, 2021, Vol. 36, Issue (3): 494-507. DOI: 10.1007/s11390-021-1373-1

Special Issue: Computer Graphics and Multimedia

• Special Section of CVM 2021 •

ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis

Hui-Xuan Wang1, Jing-Liang Peng2, Senior Member, CCF, Member, IEEE, Shi-Yi Lu3, Xin Cao3, Xue-Ying Qin3, Senior Member, CCF, Member, IEEE, and Chang-He Tu1,*, Senior Member, CCF        

  1 School of Computer Science and Technology, Shandong University, Qingdao 266237, China;
  2 School of Information Science and Engineering, University of Jinan, Jinan 250022, China;
  3 School of Software, Shandong University, Jinan 250101, China
  • Received: 2021-02-15 Revised: 2021-04-26 Online: 2021-05-05 Published: 2021-05-31
  • Contact: Chang-He Tu, E-mail: chtu@sdu.edu.cn
  • About author: Hui-Xuan Wang is a Ph.D. candidate at the School of Computer Science and Technology, Shandong University, Qingdao. She was also a visiting Ph.D. student at the University of Illinois at Urbana-Champaign and the University of Pennsylvania. She earned her M.S. degree in power electronics and power transmission from Shandong University, Jinan, in 2012, and her B.S. degree in automation from Shandong University, Weihai, in 2009. Her current research interests include 3D object tracking, simultaneous localization and mapping, and autonomous driving.
  • Supported by:
    This work was (partially) supported by the National Natural Science Foundation of China under Grant Nos. 62072284 and 61772318, and the Special Project of Science and Technology Innovation Base of Key Laboratory of Shandong Province for Software Engineering under Grant No. 11480004042015.

Indoor visual localization, i.e., 6-degree-of-freedom (6-DoF) camera pose estimation for a query image with respect to a known scene, is gaining increased attention, driven by rapid progress in applications such as robotics and augmented reality. However, drastic visual discrepancies between an onsite query image and prerecorded indoor images pose a significant challenge for visual localization. In this paper, based on the key observation that planar surfaces such as floors and walls are consistently present in indoor scenes, we propose a novel system that incorporates geometric information to address the issues that arise when using only pixelated images. Through the system implementation, we contribute a hierarchical structure consisting of pre-scanned images and a point cloud, as well as a distilled representation of the planar-element layout extracted from the original dataset. A view synthesis procedure is designed to generate synthetic images that complement a sparsely sampled dataset. Moreover, a global image descriptor based on image statistics, called block mean, variance, and color (BMVC), is employed together with a traditional convolutional neural network (CNN) descriptor to speed up candidate pose identification. Experimental results on a popular benchmark demonstrate that the proposed method outperforms the state-of-the-art approaches in terms of visual localization validity and accuracy.
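To make the idea of a block-statistics global descriptor more concrete, the following minimal Python sketch computes a per-block, per-channel mean and variance for an image and uses the resulting vector to shortlist database images. It is an illustrative stand-in only: the abstract does not define BMVC, so the function names (`block_stats_descriptor`, `shortlist`), the `grid` parameter, the Euclidean distance, and the nested-list image layout are all assumptions, not the authors' formulation.

```python
from math import dist
from statistics import fmean, pvariance


def block_stats_descriptor(image, grid=2):
    """Per-block, per-channel mean and variance, concatenated into one list.

    Hypothetical illustration, not the paper's BMVC definition.
    `image` is a list of rows, each row a list of (r, g, b) tuples, with at
    least `grid` rows and `grid` columns so that no block is empty.
    """
    h, w = len(image), len(image[0])
    channels = len(image[0][0])
    # Integer block boundaries; blocks stay near-equal when sizes don't divide.
    rows = [i * h // grid for i in range(grid + 1)]
    cols = [j * w // grid for j in range(grid + 1)]
    desc = []
    for i in range(grid):
        for j in range(grid):
            pixels = [image[y][x]
                      for y in range(rows[i], rows[i + 1])
                      for x in range(cols[j], cols[j + 1])]
            for c in range(channels):
                values = [p[c] for p in pixels]
                desc.append(fmean(values))      # block mean of channel c
                desc.append(pvariance(values))  # block variance of channel c
    return desc


def shortlist(query_desc, db_descs, k=3):
    """Indices of the k database descriptors nearest to the query (Euclidean)."""
    order = sorted(range(len(db_descs)),
                   key=lambda i: dist(query_desc, db_descs[i]))
    return order[:k]
```

In a retrieval pipeline of the kind the abstract describes, a cheap descriptor like this could prune the database before a more expensive CNN descriptor ranks the remaining candidates; how the two descriptors are actually combined in ReLoc is not specified here.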

Key words: visual localization; planar surface; statistic information; view synthesis
