Hardware Acceleration for SLAM in Mobile Systems

Zhe Fan; Yi-Fan Hao; Tian Zhi; Qi Guo; Zi-Dong Du

doi:10.1007/s11390-021-1523-5

Abstract
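Simultaneous localization and mapping (SLAM) is an important problem on the road to machine intelligence, and it plays a particularly important role in mobile intelligent systems. Many SLAM algorithms have been proposed, but existing hardware platforms struggle to meet the real-time and low-power requirements of mobile systems when running them. Designing specialized hardware to solve the performance and power problems is very promising, but the diversity of SLAM algorithms makes such a design highly challenging. Existing hardware work accelerates only a specific SLAM algorithm or a specific stage of SLAM, and cannot cover multiple classes of SLAM algorithms. This paper aims to design a general-purpose SLAM hardware accelerator that meets real-time and low-power requirements while supporting a wide range of SLAM algorithms. To this end, we analyzed several classes of representative SLAM algorithms and found that they contain many different computation patterns and irregular control flow. We therefore designed an accelerator with computation units at three granularities (matrix, vector, and scalar) to cover the diverse computation patterns of SLAM algorithms, together with a hierarchical instruction set that simplifies SLAM control flow and lets the accelerator support more SLAM algorithms. Using several representative SLAM algorithms as benchmarks, we compared the accelerator against an Intel i7-3770 and an ARM Cortex-A57. Compared with the Intel processor, the accelerator achieves a 10.52x speedup and 112.62x lower energy consumption; compared with the ARM processor, it achieves a 33.03x speedup and 62.64x lower energy consumption.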
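The speedup and energy figures together imply a rough average power ratio. If energy is power times execution time for the same workload, then E_base / E_acc = (P_base / P_acc) × (T_base / T_acc), so the power ratio is the energy-reduction factor divided by the speedup. The sketch below applies that identity to the reported averages; it is a back-of-envelope estimate under that assumption, not a figure reported by the paper, and because the reported numbers are averages across datasets the ratio of averages is only approximate.

```python
# Reported average results for the accelerator versus each baseline.
RESULTS = {
    "Intel i7-3770": {"speedup": 10.52, "energy_reduction": 112.62},
    "ARM Cortex-A57": {"speedup": 33.03, "energy_reduction": 62.64},
}


def implied_power_ratio(speedup: float, energy_reduction: float) -> float:
    """Estimate baseline power divided by accelerator power.

    Assumes E = P * T for the same workload, so
        E_base / E_acc = (P_base / P_acc) * (T_base / T_acc)
    and therefore P_base / P_acc = energy_reduction / speedup.
    """
    return energy_reduction / speedup


if __name__ == "__main__":
    for name, r in RESULTS.items():
        ratio = implied_power_ratio(r["speedup"], r["energy_reduction"])
        print(f"{name}: ~{ratio:.2f}x lower average power")
```

Under this assumption the accelerator draws roughly 10.71x less power than the Intel baseline but only about 1.90x less than the ARM core, which is consistent with the ARM processor being the more energy-efficient baseline: most of the energy advantage over ARM comes from the speedup rather than from lower power.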

Abstract: The emerging mobile robot industry has spurred a flurry of interest in solving the simultaneous localization and mapping (SLAM) problem. However, existing SLAM platforms have difficulty meeting the real-time and low-power requirements imposed by mobile systems. Although specialized hardware is promising for achieving high performance at low power, designing an efficient SLAM accelerator is severely hindered by the wide variety of SLAM algorithms. Based on a detailed analysis of representative SLAM algorithms, we observe that they pose two challenges for efficient hardware acceleration: a large number of computational primitives and irregular control flow. To address these challenges, we propose a hardware accelerator built from composable computation units: matrix, vector, scalar, and control units. In addition, we design a hierarchical instruction set that handles a broad range of SLAM algorithms with irregular control flow. Experimental results show that, compared with an Intel x86 processor, our accelerator, with an area of 7.41 mm², achieves on average 10.52x better performance and 112.62x energy savings across different datasets. Compared with a more energy-efficient ARM Cortex processor, it still achieves 33.03x better performance and 62.64x energy savings.
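To make the idea of composable units driven by a hierarchical instruction set more concrete, here is a toy Python model. Everything in it is hypothetical: the register file, the micro-ops `mat_vec`, `vec_add`, `vec_sq_norm` and `scalar_lt`, the single macro instruction `step_macro`, and the loop in `control_unit` are illustrative assumptions, not the paper's actual ISA or microarchitecture. It only shows how one coarse instruction can expand into micro-ops dispatched to matrix, vector and scalar units, while a control unit handles a data-dependent loop exit, the kind of irregular control flow the abstract describes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Regs = Dict[str, Any]  # hypothetical register file: name -> value


@dataclass
class MicroOp:
    unit: str                    # executing unit: "matrix", "vector" or "scalar"
    run: Callable[[Regs], None]  # effect of the micro-op on the register file


# Level 2 (hypothetical): fine-grained micro-ops for each unit type.
def mat_vec(dst: str, m: str, v: str) -> MicroOp:
    def run(r: Regs) -> None:
        r[dst] = [sum(a * b for a, b in zip(row, r[v])) for row in r[m]]
    return MicroOp("matrix", run)


def vec_add(dst: str, a: str, b: str) -> MicroOp:
    def run(r: Regs) -> None:
        r[dst] = [x + y for x, y in zip(r[a], r[b])]
    return MicroOp("vector", run)


def vec_sq_norm(dst: str, v: str) -> MicroOp:
    def run(r: Regs) -> None:
        r[dst] = sum(x * x for x in r[v])
    return MicroOp("vector", run)


def scalar_lt(dst: str, a: str, b: str) -> MicroOp:
    def run(r: Regs) -> None:
        r[dst] = r[a] < r[b]
    return MicroOp("scalar", run)


# Level 1 (hypothetical): one macro instruction expands into several micro-ops.
def step_macro() -> List[MicroOp]:
    """x <- F @ x + u, then set flag 'done' if ||x||^2 < tol."""
    return [
        mat_vec("t", "F", "x"),
        vec_add("x", "t", "u"),
        vec_sq_norm("n", "x"),
        scalar_lt("done", "n", "tol"),
    ]


def control_unit(regs: Regs, max_iters: int) -> Dict[str, int]:
    """Repeat the macro until the data-dependent 'done' flag is set.

    Returns how many micro-ops each unit executed.
    """
    counts = {"matrix": 0, "vector": 0, "scalar": 0}
    for _ in range(max_iters):
        for op in step_macro():
            op.run(regs)
            counts[op.unit] += 1
        if regs["done"]:
            break
    return counts


# Usage: F = 0.5 * I halves x on every step until ||x||^2 < tol.
regs: Regs = {"F": [[0.5, 0.0], [0.0, 0.5]], "x": [4.0, 4.0],
              "u": [0.0, 0.0], "tol": 0.01}
counts = control_unit(regs, max_iters=100)
print(counts, regs["x"])
```

Here the loop exits after 6 iterations (||x||² drops from 32 to 0.0078125), so the matrix unit runs 6 micro-ops, the vector unit 12 and the scalar unit 6. The number of iterations is known only at run time, which is why such a design would keep the loop in a control unit rather than unrolling it in the instruction stream.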
