Natural Image Matting with Attended Global Context
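Abstract:
Background: Image matting is formally the task of estimating the opacity of foreground objects in an image. It is widely used in film production for background replacement, layer separation, color correction, and similar tasks. Existing deep-learning-based matting methods perform well at capturing spatially close information but fail to capture global context, which has been shown to play an important role in improving matting performance. The difficulty is that an image to be matted can contain several megapixels, and the limited receptive field of a neural network makes global information hard to capture. Uniformly downsampling the image alleviates this problem, but at the cost of matting performance.
Objective: We aim to condense and store the global foreground and background during matting, use an attention mechanism to extract the foreground and background information that each prediction needs, and obtain a higher-quality foreground through a new matting framework.
Method: We propose an image matting technique based on global context. It extracts global information from the whole image and condenses it to a size suitable for a neural network to learn from. Specifically, we first use a deformable sampling layer to obtain condensed foreground and background images, and then use a contextual attention layer to extract from them the information related to the region to be predicted. In addition, our network predicts the background and the alpha matte simultaneously; sharing information between the two tasks helps produce a higher-quality foreground.
Results: We evaluated our method on the Composition-1k and alphamatting.com datasets. On Composition-1k it achieves the best results among existing methods, and on alphamatting.com it achieves competitive results under the sum of absolute differences (SAD) and mean squared error (MSE) metrics. In addition, when the predicted alpha matte is used to extract the foreground, our results are visually better than those of existing methods.
Conclusion: We present an end-to-end matting network that gathers global information related to the unknown region from the whole image and extracts from it the information relevant to the region being predicted. A deformable sampling layer produces condensed foreground and background images, and a contextual attention layer locates the information in them that is related to the unknown region. Because the network predicts the background together with the alpha matte, it extracts a purer foreground. The method achieves good results on both Composition-1k and alphamatting.com, and the foregrounds it extracts with the predicted alpha matte are visually better than those of existing methods. Comprehensive experiments show that the proposed framework effectively improves matting performance. The current method cannot handle cases in which the region to be predicted covers almost the entire image; we will address this in future work.
Abstract: Image matting estimates the opacity of foreground objects in an image. Several deep-learning-based methods have been proposed for image matting, and they perform well at capturing spatially close information. However, these methods fail to capture global contextual information, which has been shown to be essential for improving matting performance. This is because an image to be matted may contain several megapixels, too many for a learning-based network with a limited receptive field to capture global context. Uniformly downsampling the image alleviates this problem but can degrade matting performance. To solve this problem, we introduce natural image matting with attended global context, a method that extracts global contextual information from the whole image and condenses it to a size suitable for a learning-based network.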
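The results reported in this abstract compare methods by SAD and MSE on the predicted alpha matte. The sketch below shows what those two numbers measure, assuming flattened alpha values in [0, 1] and that both metrics are evaluated only over the trimap's unknown region, marked here by the hypothetical value `UNKNOWN = 128`. The function names are illustrative, not taken from any benchmark's code, and benchmarks typically apply their own scaling when reporting, which is left out here.

```python
from typing import List, Sequence, Tuple

UNKNOWN = 128  # hypothetical trimap value marking the unknown region


def unknown_pairs(pred: Sequence[float], gt: Sequence[float],
                  trimap: Sequence[int]) -> List[Tuple[float, float]]:
    """Pair predicted and ground-truth alpha values at unknown-region pixels."""
    return [(p, g) for p, g, t in zip(pred, gt, trimap) if t == UNKNOWN]


def sad(pred: Sequence[float], gt: Sequence[float], trimap: Sequence[int]) -> float:
    """Sum of absolute differences over the unknown region."""
    return sum((abs(p - g) for p, g in unknown_pairs(pred, gt, trimap)), 0.0)


def mse(pred: Sequence[float], gt: Sequence[float], trimap: Sequence[int]) -> float:
    """Mean squared error over the unknown region (0.0 if the region is empty)."""
    pairs = unknown_pairs(pred, gt, trimap)
    if not pairs:
        return 0.0
    return sum((p - g) ** 2 for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    pred = [0.0, 0.5, 0.75, 1.0]
    gt = [0.0, 0.25, 1.0, 1.0]
    trimap = [0, 128, 128, 255]   # only the two middle pixels are unknown
    print(sad(pred, gt, trimap))  # → 0.5
    print(mse(pred, gt, trimap))  # → 0.0625
```

Restricting both metrics to the unknown region means pixels the trimap already labels as definite foreground or background do not affect the score.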
Specifically, we first use a deformable sampling layer to obtain condensed foreground and background attended images. Then, a contextual attention layer extracts information related to the unknown regions from these condensed foreground and background images. In addition, our network predicts the background as well as the alpha matte to obtain a purer foreground, which improves qualitative performance in compositing. Comprehensive experiments show that our method achieves competitive quantitative and qualitative performance on both the Composition-1k dataset and the alphamatting.com benchmark.
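The two mechanisms described above, attending from unknown-region features to condensed foreground and background samples and using a predicted background to obtain a purer foreground, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on stated assumptions: (i) attention works on per-pixel feature vectors with a scaled cosine similarity followed by a softmax, in the style of contextual attention, over condensed samples that a deformable sampling layer would supply (that layer is not implemented here); (ii) the foreground is solved per pixel from the standard compositing equation I = alpha * F + (1 - alpha) * B. The function names and the `scale` and `eps` parameters are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score before exponentiating)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_attention(queries: Sequence[Vector], context: Sequence[Vector],
                         scale: float = 10.0) -> List[Vector]:
    """For each unknown-region feature (query), return a softmax-weighted sum of the
    condensed context features; weights come from scaled cosine similarity.
    `context` must be non-empty (hypothetical condensed fg or bg samples)."""
    dim = len(context[0])
    attended = []
    for q in queries:
        weights = softmax([scale * cosine_similarity(q, c) for c in context])
        attended.append([sum(w * c[d] for w, c in zip(weights, context))
                         for d in range(dim)])
    return attended


def recover_foreground(image: Sequence[Vector], alpha: Sequence[float],
                       background: Sequence[Vector], eps: float = 1e-3) -> List[Vector]:
    """Solve I = alpha * F + (1 - alpha) * B for F per pixel, clipped to [0, 1].
    Pixels with alpha < eps get F = 0, since F is unconstrained there."""
    foreground = []
    for pixel, a, bg in zip(image, alpha, background):
        if a < eps:
            foreground.append([0.0] * len(pixel))
            continue
        foreground.append([min(1.0, max(0.0, (i - (1.0 - a) * b) / a))
                           for i, b in zip(pixel, bg)])
    return foreground


if __name__ == "__main__":
    # Hypothetical 2-D features: two condensed samples, one unknown-region query.
    print(contextual_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
    # One pixel: red foreground over a blue background at alpha = 0.5.
    print(recover_foreground([[0.5, 0.0, 0.5]], [0.5], [[0.0, 0.0, 1.0]]))  # → [[1.0, 0.0, 0.0]]
```

In this sketch the attention cost grows with the number of condensed samples rather than with the full image resolution, which matches the abstract's motivation for condensing global context before attending to it. Calling `contextual_attention` once with foreground samples and once with background samples mirrors the separate condensed foreground and background images.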