Objectness Region Enhancement Networks for Scene Parsing

Abstract: Semantic segmentation has recently witnessed rapid progress, but existing methods mainly focus on better identifying and parsing objects or instances. In this work, we address the semantic understanding of scenes with deep learning. Unlike many existing methods, our goal is not to build an entirely new parsing network, but to advance existing scene parsing algorithms through several effective techniques. Objectness region enhancement is the first technique: it exploits a detection module to produce object regions with category probabilities, and these regions are used to weight the parsing feature map of the baseline model directly. The "extra background" category is a special category attached to the label space to collect hard pixels and objects in detection and parsing tasks, improving overall system performance. In scene parsing, the extra background category still benefits the model during training; at inference, however, some pixels may be assigned to this nonexistent category. The black-hole filling technique is proposed to avoid the misclassification caused by this flaw. We integrate these two techniques into a single parsing framework to generate the final parsing result, and call the unified framework the Objectness Enhancement Network (OENet). Compared with previous work, OENet effectively improves performance over the baseline model, reaching 38.4 mIoU (mean intersection-over-union) and 77.9% pixel accuracy on the validation set of the SceneParse150 scene parsing dataset without assembling multiple models. Its effectiveness is also verified on the Cityscapes dataset.
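As a rough illustration of the two techniques summarized above, the sketch below shows (a) weighting a per-class parsing feature map by detected object regions with their category probabilities, and (b) black-hole filling, which reassigns pixels that fall into the extra background class at inference. This is a minimal NumPy sketch: the function names, the box format, and the additive weighting rule are illustrative assumptions, not the paper's actual OENet implementation.

```python
import numpy as np

def objectness_region_weighting(features, boxes):
    """Weight a per-class parsing feature map with detected object regions.

    features: (C, H, W) float array, one channel per class.
    boxes: iterable of (x1, y1, x2, y2, class_id, prob) detections.
    The additive boost below is one plausible weighting rule;
    the paper's exact scheme may differ.
    """
    weights = np.ones_like(features)
    for x1, y1, x2, y2, cls, prob in boxes:
        # Boost the detected class inside its region by the
        # detector's category probability.
        weights[cls, y1:y2, x1:x2] += prob
    return features * weights

def black_hole_filling(scores, extra_bg_id):
    """Reassign pixels predicted as the (nonexistent) extra background class.

    scores: (C, H, W) per-pixel class scores; channel `extra_bg_id`
    is the extra background class used only during training.
    Returns an (H, W) label map containing no extra-background pixels.
    """
    labels = scores.argmax(axis=0)
    # Suppress the extra background channel and re-take the argmax
    # only where it originally won (the "black holes").
    masked = scores.astype(float)  # astype returns a copy
    masked[extra_bg_id] = -np.inf
    fallback = masked.argmax(axis=0)
    holes = labels == extra_bg_id
    labels[holes] = fallback[holes]
    return labels

# Example: 4 real classes plus one extra background channel (id 4).
rng = np.random.default_rng(0)
scores = rng.random((5, 8, 8))
boxes = [(1, 1, 5, 5, 2, 0.9)]   # one hypothetical detection of class 2
weighted = objectness_region_weighting(scores, boxes)
labels = black_hole_filling(weighted, extra_bg_id=4)
assert not (labels == 4).any()   # no pixel keeps the extra class
```

The point of this sketch is that both steps are post-hoc with respect to the baseline parser: the weighting only rescales features the parser already produced, and the filling only touches pixels that would otherwise be misassigned, which is why such techniques can be dropped into an existing parsing pipeline rather than requiring a new network.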
