
6D Object Pose Estimation in Cluttered Scenes from RGB Images

  • Abstract: 6D object pose estimation is critical to many real-world vision and graphics applications, such as robotic grasping and manipulation, autonomous navigation, and augmented/mixed reality. Ideally, a solution should handle deformable objects and objects with varied textures, be robust to severe occlusion, sensor noise, and changing illumination, and run in real time. Many RGB-D-based algorithms can accurately infer the pose of texture-less objects, but they constrain the application, for example by requiring an RGB-D sensor and extra hardware, and they add computational burden, which limits their wide use in everyday scenarios. Conventional methods that rely on RGB data alone cope poorly with severe occlusion and drastic illumination changes and struggle to deliver accurate pose estimates. To address this, we propose a fusion network that combines geometric and texture features, minimizing the impact of heavy occlusion on feature extraction. We embed this fusion network in a two-stream network consisting of a segmentation stream and a regression stream, achieving high-accuracy semantic segmentation for object detection and efficient PnP-based regression over 3D-2D coordinate pairs. Finally, we attach an iterative refinement module after the main network, which further improves pose accuracy through self-correction. We conduct comparative experiments on two public, widely used, and challenging datasets, YCB-Video and Occluded-LineMOD; the leading accuracy and speed demonstrate the effectiveness of the proposed method. We also discuss potential improvements, pointing to a research direction for the still unsolved case of heavy occlusion combined with texture-less objects, and we extend the method to broader applications such as advertisement replacement and wall-decoration recommendation.
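To make the fusion step concrete, the following is a minimal PyTorch sketch of one plausible fusion block: per-pixel texture features from the image crop are concatenated with the geometric embedding features from the segmentation stream and mixed with 1x1 convolutions. This is an illustration under assumed channel sizes, not the paper's released implementation; FusionBlock and all dimensions are hypothetical.

    import torch
    import torch.nn as nn

    class FusionBlock(nn.Module):
        # Hypothetical sketch of the fusion idea: texture features from the
        # image crop and geometric embedding features from the segmentation
        # stream are concatenated per pixel and mixed with 1x1 convolutions.
        # Channel sizes are illustrative, not the paper's configuration.
        def __init__(self, tex_ch=64, geo_ch=64, out_ch=128):
            super().__init__()
            self.mix = nn.Sequential(
                nn.Conv2d(tex_ch + geo_ch, out_ch, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, kernel_size=1),
            )

        def forward(self, tex_feat, geo_feat):
            # Both feature maps are (B, C, H, W) at the same resolution.
            return self.mix(torch.cat([tex_feat, geo_feat], dim=1))

    fuse = FusionBlock()
    fused = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
    print(fused.shape)  # torch.Size([1, 128, 32, 32])

Concatenation followed by 1x1 convolutions is the simplest way to let the network weigh geometric cues against texture cues at each pixel, which is the property the abstract attributes to the fusion network under heavy occlusion.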


    Abstract: We propose a feature-fusion network for pose estimation directly from RGB images, without any depth information. First, we introduce a two-stream architecture consisting of segmentation and regression streams. The segmentation stream processes the spatial embedding features and obtains the corresponding image crop; these features are then coupled with the image crop in the fusion network. Second, we use an efficient perspective-n-point (E-PnP) algorithm in the regression stream to extract robust spatial features between 3D and 2D keypoints. Finally, we perform iterative refinement with an end-to-end mechanism to improve the estimation performance. We conduct experiments on two public datasets, YCB-Video and the challenging Occluded-LineMOD. The results show that our method outperforms state-of-the-art approaches in both speed and accuracy.
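As an illustration of the E-PnP stage, the sketch below recovers a 6D pose from 3D-2D keypoint correspondences with OpenCV's EPnP solver. The keypoints and camera intrinsics here are made-up values; in the proposed pipeline the correspondences would come from the regression stream.

    import numpy as np
    import cv2

    # Made-up 3D model keypoints (object frame) and their predicted 2D
    # projections; EPnP needs at least 4 non-degenerate correspondences.
    object_points = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                              [0.0, 0.1, 0.0], [0.0, 0.0, 0.1],
                              [0.1, 0.1, 0.0], [0.1, 0.0, 0.1]])
    image_points = np.array([[320.0, 240.0], [400.0, 238.0],
                             [322.0, 160.0], [318.0, 305.0],
                             [402.0, 158.0], [398.0, 303.0]])

    # Assumed pinhole intrinsics; lens distortion is ignored (None).
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])

    # cv2.SOLVEPNP_EPNP selects the O(n) EPnP solver.
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None,
                                  flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)  # axis-angle -> 3x3 rotation matrix
    print(ok, R, tvec.ravel())

EPnP scales linearly with the number of correspondences, which makes it a natural fit for the real-time budget the abstract targets.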

