Abstract We present a system that automatically recovers scene geometry and illumination from a video, providing a basis for various applications. Previous image-based illumination estimation methods require either user interaction or external information in the form of a database. We adopt structure-from-motion and multi-view stereo for initial scene reconstruction, and then estimate an environment map represented by spherical harmonics (as these perform better than other bases). We also demonstrate several video editing applications that exploit the recovered geometry and illumination, including object insertion (e.g. for augmented reality), shadow detection, and video relighting.
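As an illustration of what representing illumination with spherical harmonics can look like, the following is a minimal sketch and not the system described in the abstract. It assumes a purely Lambertian surface with unit albedo, ignores shadows and visibility, and fits nine low-order coefficients to observed shading by linear least squares. The function names (`sh_basis`, `shade`, `fit_sh`, `solve`, `random_unit_vector`) and the synthetic data are hypothetical and introduced only for this example.

```python
import math
import random


def sh_basis(n):
    """First nine real spherical harmonics (bands 0 to 2) at unit direction n."""
    x, y, z = n
    return [
        0.282095,
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]


def shade(coeffs, n):
    """Shading predicted at surface normal n (assumption: no shadowing, unit albedo)."""
    return sum(c * b for c, b in zip(coeffs, sh_basis(n)))


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


def fit_sh(normals, intensities):
    """Least-squares fit of nine SH coefficients via the normal equations."""
    rows = [sh_basis(n) for n in normals]
    k = 9
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * b for r, b in zip(rows, intensities)) for i in range(k)]
    return solve(ata, atb)


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return [c / norm for c in v]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "ground truth" lighting, chosen arbitrarily for the demo.
    true_coeffs = [1.0, 0.2, 0.5, -0.3, 0.1, 0.0, 0.25, -0.1, 0.05]
    normals = [random_unit_vector(rng) for _ in range(200)]
    observed = [shade(true_coeffs, n) for n in normals]
    estimated = fit_sh(normals, observed)
    print("max abs error:", max(abs(a - b) for a, b in zip(estimated, true_coeffs)))
```

In a real pipeline the normals would come from the reconstructed geometry and the intensities from video frames, and visibility, albedo, and noise would all have to be handled; none of that is modelled here.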
This work was supported by the National Natural Science Foundation of China (NSFC) and the Israel Science Foundation (ISF), Joint NSFC-ISF Research Program under Grant No. 61561146393, the National Natural Science Foundation of China under Grant No. 61521002, a research grant from the Beijing Higher Institution Engineering Research Center, and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.
Corresponding Authors: Kun Xu
About author: Bin Liu is a Ph.D. student in the Department of Computer Science and Technology, Tsinghua University, Beijing. He received his Bachelor's degree in computer science from the same university in 2013. His research interests include image and video editing.
Cite this article:
Bin Liu, Kun Xu, Ralph R. Martin. Static Scene Illumination Estimation from Video with Applications[J]. Journal of Computer Science and Technology, 2017, 32(3): 430-442
Copyright 2010 by Journal of Computer Science and Technology