RC-Net: A Row-Column Constrained Floor Plan Segmentation Network Based on Text Features
RC-Net: Row and Column Network with Text Feature for Parsing Floor Plan Images
-
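Summary:
Background: With advances in technology, online interior design and floor plan pre-design have become increasingly popular, narrowing the distance between designers and users. Continued progress in deep learning has improved performance on tasks such as floor plan vectorization, floor plan generation, and 3D model reconstruction. Moreover, a vast number of historical floor plans are stored on paper; reconstructing them with deep learning converts them into electronic design drawings and thereby reduces the workload of floor plan design engineers.
Objective: In a floor plan, a room's type corresponds one-to-one with its text label, so extracting text features with a dedicated text branch can improve the accuracy of end-to-end room type segmentation. In addition, the vast majority of rooms in floor plans are rectangular and walls are horizontal or vertical, so a row-column constraint that shares features along rows and columns can remove a large number of noise points (that is, incorrect class predictions that appear inside a room). The proposed row-column constraint module not only constrains features along rows and columns but also maps the text branch features into the room prediction, thereby strengthening room type prediction.
Method: First, a VGG-16 network extracts features from the input floor plan. The extracted features are then split into two branches: 1) a room prediction branch, which mainly extracts features of characteristic objects in a room (such as bathtubs, beds, and sofas) and also extracts features of windows, doors, and wall boundary lines; 2) a text extraction branch, which mainly extracts the text information in a room, which corresponds one-to-one with the room type. The two branches are then merged, with the row-column constraint module sharing features so that the room features from the text branch are extended along rows and columns. The merged branch is split again into a room type prediction branch and a room boundary prediction branch. The room type branch focuses only on predicting room types, while the boundary branch predicts windows, doors, and walls. The features of the two branches are repeatedly fused and constrain each other, which greatly reduces prediction noise.
Results: In comparisons with the Raster-to-Vector and DeepFloorPlan algorithms on the R2V, R3D, and CubiCase datasets, our framework achieves the best results. In terms of segmentation quality, the row-column constraint module greatly reduces prediction noise inside rooms and yields smoother boundaries. In rooms that contain text, our algorithm attains higher prediction accuracy. In addition, ablation experiments show that the row-column constraint imposes a stronger constraint on room boundaries, making segmentation boundaries more distinct and closer to straight lines, which improves segmentation accuracy.
Conclusion: Most room types in floor plans are accompanied by text, and using features from the text branch greatly improves room prediction accuracy. Furthermore, walls in floor plans are mostly horizontal or vertical straight lines and rooms are mostly regular rectangles, so applying a row-column constraint in deep learning greatly reduces noise in room segmentation and enables accurate end-to-end segmentation. The experiments also show that the row-column constraint network segments floor plans more accurately.
Abstract: The popularity of online home design and floor plan customization has been steadily increasing. However, manually converting floor plan images from books or paper materials into electronic resources is challenging because of the vast amount of historical data. Using neural networks to identify and parse floor plans can significantly streamline this conversion. In this paper, we present a novel learning framework for automatically parsing floor plan images. Our key insight is that room type text is very common in floor plan images and crucial, because it carries the important semantic information of the corresponding room. However, this clue is rarely considered in previous learning-based methods. In contrast, we propose the Row and Column network (RC-Net), which recognizes floor plan elements by integrating the text feature.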
Specifically, we add a text feature branch to the network that extracts text features corresponding to the room type and uses them to guide room type prediction. More importantly, we formulate the Row and Column constraint module (RC constraint module), which shares and constrains features across entire rows and columns of the feature maps so that, as far as possible, only one type is predicted in each room, making the segmentation boundaries between different rooms more regular and cleaner. Extensive experiments on three benchmark datasets validate that our framework substantially outperforms other state-of-the-art approaches in terms of FWIoU, mACC, and mIoU.
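For readers unfamiliar with the three reported metrics, the following minimal sketch shows how they are commonly computed from a pixel-level confusion matrix. It assumes the standard semantic segmentation definitions (mACC as mean per-class pixel accuracy, mIoU as mean per-class intersection over union, FWIoU as IoU weighted by class frequency); the paper's exact definitions may differ, and the function name `segmentation_metrics` is hypothetical.

```python
def segmentation_metrics(confusion):
    """Compute mACC, mIoU and FWIoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j. Standard definitions are assumed here;
    they are not taken from the paper itself.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class_acc = []
    per_class_iou = []
    fw_iou = 0.0
    for c in range(num_classes):
        true_c = sum(confusion[c])                 # pixels labeled c
        pred_c = sum(row[c] for row in confusion)  # pixels predicted as c
        hit = confusion[c][c]                      # correctly predicted pixels
        if true_c == 0:
            continue  # class absent from the ground truth: skip it
        iou = hit / (true_c + pred_c - hit)
        per_class_acc.append(hit / true_c)
        per_class_iou.append(iou)
        fw_iou += (true_c / total) * iou           # weight IoU by class frequency
    return {
        "mACC": sum(per_class_acc) / len(per_class_acc),
        "mIoU": sum(per_class_iou) / len(per_class_iou),
        "FWIoU": fw_iou,
    }


# Toy two-class example (made-up counts, not results from the paper).
metrics = segmentation_metrics([[8, 2], [1, 9]])
print(metrics)
```

Because FWIoU weights each class by its pixel frequency, large classes such as background or living rooms dominate it, whereas mIoU treats small classes such as doors and windows equally.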