Journal of Computer Science and Technology, 2017, Vol. 32, Issue 3: 443-456. DOI: 10.1007/s11390-017-1735-x
Special Section of CVM 2017
Temporally Consistent Depth Map Prediction Using Deep CNN and Spatial-temporal Conditional Random Field
Xu-Ran Zhao, Xun Wang*, Senior Member, CCF, Member, ACM, IEEE, Qi-Chao Chen
School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China

Abstract: Methods based on deep convolutional neural networks (DCNNs) have recently kept setting new records on the task of predicting depth maps from monocular images. When dealing with video-based applications such as 2D-to-3D video conversion, however, these approaches tend to produce temporally inconsistent depth maps, since their CNN models are optimized over single frames. In this paper, we address this problem by introducing a novel spatial-temporal Conditional Random Field (CRF) model into the DCNN architecture, which enforces temporal consistency between depth map estimations over consecutive video frames. In our approach, temporally consistent superpixel (TSP) segmentation is first applied to an image sequence to establish correspondences between targets in consecutive frames. A DCNN is then used to regress the depth value of each temporal superpixel, followed by a spatial-temporal CRF layer that models the relationships among the estimated depths in both the spatial and the temporal domain. The parameters of both the DCNN and the CRF model are jointly optimized with back-propagation. Experimental results show that our approach not only significantly enhances the temporal consistency of the estimated depth maps over existing single-frame-based approaches, but also improves depth estimation accuracy in terms of various evaluation metrics.
Keywords: depth estimation; temporal consistency; convolutional neural networks; conditional random fields
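For readers unfamiliar with spatial-temporal CRFs, the following is a minimal illustrative sketch of the kind of energy such a model minimizes over superpixel depths; the notation (d_p, z_p, N_s, N_t, w_pq, lambda_s, lambda_t) and the quadratic form are assumptions made here for exposition and may differ from the exact formulation in the paper:

    E(d) = \sum_{p} (d_p - z_p)^2
         + \lambda_s \sum_{(p,q) \in \mathcal{N}_s} w_{pq} (d_p - d_q)^2
         + \lambda_t \sum_{(p,q) \in \mathcal{N}_t} w_{pq} (d_p - d_q)^2

Here d_p is the depth assigned to superpixel p, z_p is the depth regressed by the DCNN for that superpixel, N_s and N_t are sets of spatially and temporally neighboring superpixel pairs (the latter obtained from the TSP correspondences), w_pq are pairwise affinities, and lambda_s, lambda_t weight spatial and temporal smoothness. Minimizing an energy of this kind encourages neighboring superpixels within a frame, and corresponding superpixels across frames, to take similar depths, which is what yields the temporal consistency described in the abstract.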
Received: 2016-12-23
Funding: This work is supported in part by the Natural Science Foundation of Zhejiang Province of China under Grant No. LQ17F030001, the National Natural Science Foundation of China under Grant No. U1609215, the Qianjiang Talent Program of Zhejiang Province of China under Grant No. QJD1602021, the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2014BAK14B01, and the Beihang University Virtual Reality Technology and System National Key Laboratory Open Project under Grant No. BUAA-VR-16KF-17.

Corresponding author: Xun Wang     Email: wx@zjgsu.edu.cn
About author: Xu-Ran Zhao is currently an assistant professor at the School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou. He received his B.S. degree in electronic and information technologies from Shanghai University, Shanghai, in 2006 and his M.S. degree in electrical and computer engineering from Georgia Institute of Technology, Atlanta, in 2010. He received his Ph.D. degree from Telecom ParisTech, Paris, in 2013. From 2014 to 2016, he worked as a postdoctoral researcher on machine learning in the School of Computer Science at Aalto University, Helsinki. His current research interests include pattern recognition, computer vision, and biometric recognition.
Cite this article:
Xu-Ran Zhao, Xun Wang, Qi-Chao Chen. Temporally Consistent Depth Map Prediction Using Deep CNN and Spatial-temporal Conditional Random Field[J]. Journal of Computer Science and Technology, 2017, 32(3): 443-456.
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1735-x