Citation: Xu K, Li J, Li ZQ et al. SG-NeRF: Sparse-input generalized neural radiance fields for novel view synthesis. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 39(4): 785−797, July 2024. DOI: 10.1007/s11390-024-4157-6.

SG-NeRF: Sparse-Input Generalized Neural Radiance Fields for Novel View Synthesis

Funds: This work is supported by the Zhengzhou Collaborative Innovation Major Project under Grant No. 20XTZX06013 and the Henan Provincial Key Scientific Research Project of China under Grant No. 22A520042.
More Information
  • Author Bio:

    Kuo Xu is pursuing his M.S. degree in computer science and technology at the Hanwei Internet of Things Research Institute, School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou. He received his Bachelor's degree in software engineering from Henan Polytechnic University, Jiaozuo, in 2021. His main research interests include 3D reconstruction and 3D generation.

    Jie Li is an Endowed Chair Professor of computer science and engineering at Shanghai Jiao Tong University (SJTU), Shanghai, and an IEEE Fellow. His current research interests include big data, AI, blockchain, network systems and security, and smart city.

    Zhen-Qiang Li is currently a Ph.D. candidate in software engineering at the School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou. He received his M.Eng. degree in modern equipment engineering from Huazhong Agricultural University, Wuhan, in 2020. His current research interests include 3D generation and 3D scene understanding.

    Yang-Jie Cao is a professor, doctoral supervisor, and director of the Institute of Internet of Things Engineering, School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou. He received his Ph.D. degree from Xi'an Jiaotong University, Xi'an, in 2012. His research interests include machine intelligence and human-computer interaction, intelligent processing of big data, cloud computing, and high-performance computing.

  • Corresponding author:

    Yang-Jie Cao: caoyj@zzu.edu.cn

  • Received Date: January 28, 2024
  • Accepted Date: March 28, 2024
  • Abstract: Traditional neural radiance fields for novel view rendering require dense input images and per-scene optimization, which limits their practical applications. We propose SG-NeRF (Sparse-Input Generalized Neural Radiance Fields), a generalizable method that infers a scene from its input images and renders high-quality novel views without per-scene optimization. First, we construct an improved multi-view stereo structure based on convolutional attention and a multi-level fusion mechanism to extract the geometric and appearance features of the scene from sparse input images; these features are then aggregated by multi-head attention as the input of the neural radiance field. This strategy of using the neural radiance field to decode scene features, rather than to map positions and orientations, enables our method to train and infer across scenes, so that the radiance field generalizes to novel view synthesis on unseen scenes. We tested the generalization ability on the DTU dataset: our PSNR (peak signal-to-noise ratio) improved by 3.14 dB over the baseline method under the same input conditions. In addition, when dense input views are available for a scene, a short period of further refinement training improves the average PSNR by 1.04 dB and yields higher-quality renderings.
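
    A minimal PyTorch sketch may help illustrate the decoding strategy the abstract describes: per-view scene features sampled at a 3D point are fused by multi-head attention and then decoded to density and color, instead of mapping raw positions and view directions as in vanilla NeRF. This is our own illustration, not the authors' released code; the layer sizes, the learned query token, and the helper names (FeatureAggregatingNeRF, volume_render, psnr) are assumptions.

        import torch
        import torch.nn as nn

        class FeatureAggregatingNeRF(nn.Module):
            # Fuse per-view features with multi-head attention, then decode
            # the fused feature to density and color with small MLP heads.
            def __init__(self, feat_dim=32, num_heads=4, hidden=64):
                super().__init__()
                self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
                self.query = nn.Parameter(torch.randn(1, 1, feat_dim))  # learned query (assumption)
                self.sigma_head = nn.Sequential(
                    nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
                self.rgb_head = nn.Sequential(
                    nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 3), nn.Sigmoid())

            def forward(self, view_feats):
                # view_feats: (num_points, num_views, feat_dim), sampled from
                # per-view multi-view-stereo feature volumes at each 3D point.
                q = self.query.expand(view_feats.shape[0], -1, -1)
                fused, _ = self.attn(q, view_feats, view_feats)  # cross-view aggregation
                fused = fused.squeeze(1)
                sigma = torch.relu(self.sigma_head(fused)).squeeze(-1)  # non-negative density
                rgb = self.rgb_head(fused)                              # color in [0, 1]
                return sigma, rgb

        def volume_render(sigma, rgb, deltas):
            # Standard NeRF alpha compositing along one ray.
            # sigma, deltas: (num_samples,); rgb: (num_samples, 3).
            alpha = 1.0 - torch.exp(-sigma * deltas)
            trans = torch.cumprod(
                torch.cat([torch.ones(1), 1.0 - alpha + 1e-10], dim=0), dim=0)[:-1]
            weights = alpha * trans  # per-sample contribution to the pixel
            return (weights[:, None] * rgb).sum(dim=0)

        def psnr(pred, gt):
            # Peak signal-to-noise ratio in dB for images in [0, 1]; this is
            # the metric whose gains the abstract reports.
            return -10.0 * torch.log10(torch.mean((pred - gt) ** 2))

    For example, with 3 input views and 64 samples along a ray, view_feats has shape (64, 3, 32); the decoded densities and colors are composited by volume_render into one pixel color, and psnr compares the rendered image against the ground truth.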

  • [1]
    Mildenhall B, Srinivasan P P, Tancik M, Barron J T, Ramamoorthi R, Ng R. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 2022, 65(1): 99–106. DOI: 10.1145/3503250.
    [2]
    Yu A, Ye V, Tancik M, Kanazawa A. pixelNeRF: Neural radiance fields from one or few images. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.4576–4585. DOI: 10.1109/CVPR46437.2021.00455.
    [3]
    Trevithick A, Yang B. GRF: Learning a general radiance field for 3D representation and rendering. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.15162–15172. DOI: 10.1109/ICCV48922.2021.01490.
    [4]
    Li J X, Feng Z J, She Q, Ding H H, Wang C H, Lee G H. MINE: Towards continuous depth MPI with NeRF for novel view synthesis. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.12558–12568. DOI: 10.1109/ICCV48922.2021.01235.
    [5]
    Deng K D, Liu A, Zhu J Y, Ramanan D. Depth-supervised NeRF: Fewer views and faster training for free. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.12872–12881. DOI: 10.1109/CVPR52688.2022.01254.
    [6]
    Jain A, Tancik M, Abbeel P. Putting NeRF on a diet: Semantically consistent few-shot view synthesis. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.5865–5874. DOI: 10.1109/ICCV48922.2021.00583.
    [7]
    Chen A P, Xu Z X, Zhao F Q, Zhang X S, Xiang F B, Yu J Y, Su H. MVSNeRF: Fast generalizable radiance field reconstruction from multi-view stereo. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.14104–14113. DOI: 10.1109/ICCV48922.2021.01386.
    [8]
    Johari M M, Lepoittevin Y, Fleuret F. GeoNeRF: Generalizing NeRF with geometry priors. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.18344–18347. DOI: 10.1109/CVPR52688.2022.01782.
    [9]
    Jensen R, Dahl A, Vogiatzis G, Tola E, Aanæs H. Large scale multi-view stereopsis evaluation. In Proc. the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2014, pp.406–413. DOI: 10.1109/CVPR.2014.59.
    [10]
    De Bonet J S, Viola P. Poxels: Probabilistic voxelized volume reconstruction. In Proc. International Conference on Computer Vision, Sept. 1999, p.2.
    [11]
    Furukawa Y, Ponce J. Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Analysis and Machine Intelligence, 2010, 32(8): 1362–1376. DOI: 10.1109/TPAMI.2009.161.
    [12]
    Kolmogorov V, Zabih R. Multi-camera scene reconstruction via graph cuts. In Proc. the 7th European Conference on Computer Vision, Copenhagen, May 2002, pp.82–96. DOI: 10.1007/3-540-47977-5_6.
    [13]
    Schönberger J L, Zheng E L, Frahm J M, Pollefeys M. Pixelwise view selection for unstructured multi-view stereo. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.501–518. DOI: 10.1007/978-3-319-46487-9_31.
    [14]
    Yao Y, Luo Z X, Li S W, Fang T, Quan L. MVSNet: Depth inference for unstructured multi-view stereo. In Proc. the 15th European Conference on Computer Vision, Sept. 2018, pp.785–801. DOI: 10.1007/978-3-030-01237-3_47.
    [15]
    Yao Y, Luo Z X, Li S W, Shen T W, Fang T, Quan L. Recurrent MVSNet for high-resolution multi-view stereo depth inference. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.5520–5529. DOI: 10.1109/CVPR.2019.00567.
    [16]
    Cheng S, Xu Z X, Zhu S L, Li Z W, Li L E, Ramamoorthi R, Su H. Deep stereo using adaptive thin volume representation with uncertainty awareness. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2020, pp.2521–2531. DOI: 10.1109/CVPR42600.2020.00260.
    [17]
    Gu X D, Fan Z W, Zhu S Y, Dai Z Z, Tan F T, Tan P. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2020, pp.2492–2501. DOI: 10.1109/CVPR42600.2020.00257.
    [18]
    Yang J Y, Mao W, Alvarez J M, Liu M M. Cost volume pyramid based depth inference for multi-view stereo. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, May 2020, pp.4876–4885. DOI: 10.1109/CVPR42600.2020.00493.
    [19]
    Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.6000–6010. DOI: 10.5555/3295222.3295349.
    [20]
    Guo M H, Xu T X, Liu J J, Liu Z N, Jiang P T, Mu T J, Zhang S H, Martin R R, Cheng M M, Hu S M. Attention mechanisms in computer vision: A survey. Computational Visual Media, 2022, 8(3): 331–368. DOI: 10.1007/s41095- 022-0271-y.
    [21]
    Shmatko A, Ghaffari Laleh N, Gerstung M, Kather J N. Artificial intelligence in histopathology: Enhancing cancer research and clinical oncology. Nature Cancer, 2022, 3(9): 1026–1038. DOI: 10.1038/s43018-022-00436-4.
    [22]
    Li Y H, Mao H Z, Girshick R, He K M. Exploring plain vision transformer backbones for object detection. In Proc. the 17th European Conference on Computer Vision, Oct. 2022, pp.280–296. DOI: 10.1007/978-3-031-20077-9_17.
    [23]
    Kalantari N K, Wang T C, Ramamoorthi R. Learning-based view synthesis for light field cameras. ACM Trans. Graphics, 2016, 35(6): Article No. 193. DOI: 10.1145/2980179.2980251.
    [24]
    Srinivasan P P, Wang T Z, Sreelal A, Ramamoorthi R, Ng R. Learning to synthesize a 4D RGBD light field from a single image. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.2262–2270. DOI: 10.1109/ICCV.2017.246.
    [25]
    Chen A P, Wu M Y, Zhang Y L, Li N Y, Lu J, Gao S H, Yu J Y. Deep surface light fields. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 2018, 1(1): 14. DOI: 10.1145/3203192.
    [26]
    Chaurasia G, Duchene S, Sorkine-Hornung O, Drettakis G. Depth synthesis and local warps for plausible image-based navigation. ACM Trans. Graphics, 2013, 32(3): 30. DOI: 10.1145/2487228.2487238.
    [27]
    Chaurasia G, Sorkine O, Drettakis G. Silhouette-aware warping for image-based rendering. In Proc. of the 22nd Eurographics conference on Rendering, Jun. 2011, pp.1223–1232. DOI: 10.1111/j.1467-8659.2011.01981.x.
    [28]
    Sinha S N, Steedly D, Szeliski R. Piecewise planar stereo for image-based rendering. In Proc. the 12th IEEE International Conference on Computer Vision, Sept. 29–Oct. 2, 2009, pp.1881–1888. DOI: 10.1109/ICCV.2009.5459417.
    [29]
    Zhou T H, Tucker R, Flynn J, Fyffe G, Snavely N. Stereo magnification: Learning view synthesis using multiplane images. ACM Trans. Graphics, 2018, 37(4): 65. DOI: 10.1145/3197517.3201323.
    [30]
    Choi I, Gallo O, Troccoli A, Kim M H, Kautz J. Extreme view synthesis. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27–Nov. 2, 2019, pp.7780–7789. DOI: 10.1109/ICCV.2019.00787.
    [31]
    Mildenhall B, Srinivasan P P, Ortiz-Cayon R, Kalantari N K, Ramamoorthi R, Ng R, Kar A. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Trans. Graphics, 2019, 38(4): 29. DOI: 10.1145/3306346.3322980.
    [32]
    Srinivasan P P, Tucker R, Barron J T, Ramamoorthi R, Ng R, Snavely N. Pushing the boundaries of view extrapolation with multiplane images. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.175–184. DOI: 10.1109/CVPR.2019.00026.
    [33]
    Huang J W, Thies J, Dai A G L, Kundu A, Jiang C Y, Guibas L J, Nießner M, Funkhouser T. Adversarial texture optimization from RGB-D scans. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2020, pp.1556–1565. DOI: 1 0.1109/CVPR42600.2020.00163.
    [34]
    Aliev K A, Sevastopolsky A, Kolos M, Ulyanov D, Lempitsky V. Neural point-based graphics. In Proc. the 16th European Conference on Computer Vision, Aug. 2020, pp.696–712. DOI: 10.1007/978-3-030-58542-6_42.
    [35]
    Meshry M, Goldman D B, Khamis S, Hoppe H, Pandey R, Snavely N, Martin-Brualla R. Neural rerendering in the wild. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.6871–6880. DOI: 10.1109/CVPR.2019.00704.
    [36]
    Barron J T, Mildenhall B, Tancik M, Hedman P, Martin-Brualla R, Srinivasan P P. MIP-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.5835–5844. DOI: 10.1109/ICCV48922.2021.00580.
    [37]
    DeVries T, Bautista M A, Srivastava N, Taylor G W, Susskind J M. Unconstrained scene generation with locally conditioned radiance fields. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.14284–14293. DOI: 10.1109/ICCV48922.2021.01404.
    [38]
    Martin-Brualla R, Radwan N, Sajjadi M S M, Barron J T, Dosovitskiy A, Duckworth D. NeRF in the wild: Neural radiance fields for unconstrained photo collections. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.7206–7215. DOI: 10.1109/CVPR46437.2021.00713.
    [39]
    Wang Q Q, Wang Z C, Genova K, Srinivasan P, Zhou H, Barron J T, Martin-Brualla R, Snavely N, Funkhouser T. IBRNet: Learning multi-view image-based rendering. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Jun. 2021, pp.4688–4697. DOI: 10.1109/CVPR46437.2021.00466.
    [40]
    Varma T M, Wang P H, Chen X X, Chen T L, Venugopalan S, Wang Z Y. Is attention all that NeRF needs? In Proc. the 11th International Conference on Learning Representations, May 2023.
    [41]
    Max N. Optical models for direct volume rendering. IEEE Trans. Visualization and Computer Graphics, 1995, 1(2): 99–108. DOI: 10.1109/2945.468400.
    [42]
    Schönberger J L, Frahm J M. Structure-from-motion revisited. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp.4104–4113. DOI: 10.1109/CVPR.2016.445.
    [43]
    Wang Z, Bovik A C, Sheikh H R, Simoncelli E P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Processing, 2004, 13(4): 600–612. DOI: 10.1109/TIP.2003.819861.
    [44]
    Zhang R, Isola P, Efros A A, Shechtman E, Wang O. The unreasonable effectiveness of deep features as a perceptual metric. In Proc. the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp.586–595. DOI: 10.1109/CVPR.2018.00068.