State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Funds: This work was supported in part by NKRDP (2021YFB0300202), the National Key R&D Program of China under Grant No. 2022YFB4500403, and the National Natural Science Foundation of China under Grant Nos. 62202454, 62032023, and T2125013.
Base-calling is an essential step in the analysis of third-generation genome sequencing data. Many previous hardware efforts have aimed to accelerate parts of the base-calling workflow, yet an order-of-magnitude throughput gap still exists. In this paper, we propose FuHsi to improve the end-to-end throughput of the base-calling process. FuHsi is an in-cache accelerator that adds only three components to the conventional CPU in the sequencer. The first is FuHsi Cache, which offloads the bottleneck operations to in-cache arithmetic. Specifically, we accelerate beam search, string conversion, and multiply-accumulate (MAC) operations using algorithm/hardware co-design. The other two are the FuHsi APIs and the FuHsi Controller, which provide coarse-grained control of FuHsi Cache. Experimental results show that FuHsi improves throughput per watt by 45.7×, 113.1×, and 100× compared with an NVIDIA Jetson baseline, an NVIDIA A100 GPU baseline, and the Helix accelerator, respectively. FuHsi can serve base-calling requests from up to 15 Oxford Nanopore Technologies (ONT) sequencers simultaneously.
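To make two of the bottleneck operations named above concrete, the following is a minimal software sketch of beam search followed by string conversion. It is not FuHsi's in-cache implementation. It assumes a CTC-style network output in which each time step gives probabilities over a blank symbol and the four bases. It uses a simplified path-level beam search rather than the prefix-merging variant that production base-callers typically use. The names `ALPHABET`, `beam_search`, and `to_bases`, and the probability values, are hypothetical and chosen only for illustration.

```python
import math

# Assumption: index 0 is the blank symbol of a CTC-style output layer.
ALPHABET = "-ACGT"


def beam_search(probs, beam_width=3):
    """Keep the beam_width most likely label paths after each time step."""
    beams = [((), 0.0)]  # each entry: (label path, log probability)
    for step in probs:
        candidates = []
        for path, score in beams:
            for label, p in enumerate(step):
                if p > 0.0:
                    candidates.append((path + (label,), score + math.log(p)))
        # Sort by log probability, highest first, and prune to the beam width.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def to_bases(path):
    """String conversion: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != 0:
            out.append(ALPHABET[label])
        prev = label
    return "".join(out)


# Illustrative per-step probabilities over (blank, A, C, G, T).
probs = [
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.60, 0.10, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.05, 0.05],
]

best_path, best_score = beam_search(probs)[0]
print(to_bases(best_path))
```

In this sketch the inner loop is dominated by score accumulation, sorting, and pruning, and the conversion step is a sequential scan over labels, which gives a sense of why these operations can limit throughput on a general-purpose CPU.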