Journal of Computer Science and Technology, 2019, Vol. 34, Issue (3): 594-608. doi: 10.1007/s11390-019-1929-5

Topics: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Section of CVM 2019 •

A Survey of 3D Indoor Scene Synthesis

Song-Hai Zhang1,2, Member, CCF, ACM, IEEE, Shao-Kui Zhang1, Yuan Liang1, Member, CCF, ACM, Peter Hall3   

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
  2. Beijing National Research Center for Information Science and Technology (BNRist), Beijing 100084, China;
  3. Department of Computer Science, University of Bath, Claverton Down, Bath, BA2 7AY, U.K.
  • Received: 2019-03-15; Revised: 2019-04-17; Online: 2019-05-05; Published: 2019-05-06
  • About the author: Song-Hai Zhang received his Ph.D. degree in computer science and technology from Tsinghua University, Beijing, in 2007. He is currently an associate professor in the Department of Computer Science and Technology at Tsinghua University, Beijing. His research interests include image/video analysis and processing as well as geometric computing.
  • Supported by:
    This work was supported by the National Key Technology Research and Development Program under Grant No. 2017YFB1002604, the National Natural Science Foundation of China under Grant Nos. 61772298 and 61832016, the Research Grant of Beijing Higher Institution Engineering Research Center and Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.

Abstract: Indoor scene synthesis has become a popular topic in recent years. Synthesizing functional and plausible indoor scenes is inherently difficult, since it requires considerable prior knowledge both to choose reasonable object categories and to arrange objects appropriately. In this survey, we propose four criteria that classify a wide range of 3D (three-dimensional) indoor scene synthesis techniques from different aspects, yielding four groups of categories. By comprehensively comparing these techniques, the survey also shows the strengths and drawbacks of each and discusses potential remaining problems.

Key words: content generation, indoor scene synthesis, layout arrangement, probabilistic model
