Journal of Computer Science and Technology ›› 2019, Vol. 34 ›› Issue (3): 594-608. doi: 10.1007/s11390-019-1929-5

Special Issue: Surveys; Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

A Survey of 3D Indoor Scene Synthesis

Song-Hai Zhang1,2, Member, CCF, ACM, IEEE, Shao-Kui Zhang1, Yuan Liang1, Member, CCF, ACM, Peter Hall3   

  1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
  2 Beijing National Research Center for Information Science and Technology (BNRist), Beijing 100084, China;
  3 Department of Computer Science, University of Bath, Claverton Down, Bath, BA2 7AY, U.K.
  • Received: 2019-03-15 Revised: 2019-04-17 Online: 2019-05-05 Published: 2019-05-06
  • About author: Song-Hai Zhang received his Ph.D. degree in computer science and technology from Tsinghua University, Beijing, in 2007. He is currently an associate professor in the Department of Computer Science and Technology at Tsinghua University, Beijing. His research interests include image/video analysis and processing as well as geometric computing.
  • Supported by: This work was supported by the National Key Technology Research and Development Program under Grant No. 2017YFB1002604, the National Natural Science Foundation of China under Grant Nos. 61772298 and 61832016, the Research Grant of Beijing Higher Institution Engineering Research Center and Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.

Indoor scene synthesis has become a popular topic in recent years. Synthesizing functional and plausible indoor scenes is inherently difficult, since it requires considerable knowledge both to choose reasonable object categories and to arrange objects appropriately. In this survey, we propose four criteria that group a wide range of 3D (three-dimensional) indoor scene synthesis techniques according to various aspects (specifically, four groups of categories). We also offer guidance by comprehensively comparing all the techniques to demonstrate their effectiveness and drawbacks, and we discuss potential remaining problems.

Key words: content generation; indoor scene synthesis; layout arrangement; probabilistic model

