
I/O Acceleration via Multi-Tiered Data Buffering and Prefetching

    Abstract: Modern High-Performance Computing (HPC) systems are adding extra layers to the memory and storage hierarchy, named the deep memory and storage hierarchy (DMSH), to increase I/O performance. New hardware technologies, such as NVMe and SSD, have been introduced in burst buffer installations to reduce the pressure on external storage and boost the burstiness of modern I/O systems. The DMSH has demonstrated its strength and potential in practice. However, each layer of the DMSH is an independent heterogeneous system, and data movement among more layers is significantly more complex even without considering heterogeneity. How to efficiently utilize the DMSH is an open research question facing the HPC community. Further, accessing data with high throughput and low latency is more imperative than ever. Data prefetching is a well-known technique for hiding read latency: data is requested before it is needed so that it can be moved from a high-latency medium (e.g., disk) to a low-latency one (e.g., main memory). However, existing solutions do not consider the new deep memory and storage hierarchy, and they suffer from under-utilization of prefetching resources and unnecessary evictions. Additionally, existing approaches implement a client-pull model, in which an understanding of the application's I/O behavior drives prefetching decisions. Moving towards exascale, where machines run multiple applications concurrently by accessing files in a workflow, a more data-centric approach can resolve challenges such as cache pollution and redundancy. In this paper, we present the design and implementation of Hermes, a new, heterogeneous-aware, multi-tiered, dynamic, and distributed I/O buffering system. Hermes enables, manages, supervises, and, in some sense, extends I/O buffering to fully integrate into the DMSH. We introduce three novel data placement policies to efficiently utilize all layers, and we present three novel techniques to perform memory, metadata, and communication management in hierarchical buffering systems. Additionally, we demonstrate the benefits of a truly hierarchical data prefetcher that adopts a server-push approach to data prefetching. Our evaluation shows that, in addition to providing automatic data movement through the hierarchy, Hermes can significantly accelerate I/O and outperforms state-of-the-art buffering platforms by more than 2x. Lastly, results show performance gains of 10%-35% over existing prefetchers and of over 50% compared to systems with no prefetching.
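    To make the idea of buffering across several tiers concrete, the following is a minimal sketch of a simple top-down placement rule: each block goes to the fastest tier that still has room and otherwise spills to the next tier. This is an illustration only and not one of the three placement policies the abstract mentions; the tier names, capacities, and the `Tier`, `place`, and `get` helpers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One layer of a hypothetical buffering hierarchy (fastest tiers come first)."""
    name: str
    capacity: int                               # total bytes this tier can hold
    used: int = 0                               # bytes currently occupied
    blobs: dict = field(default_factory=dict)   # key -> buffered bytes

    def free(self) -> int:
        return self.capacity - self.used


def place(tiers, key, data):
    """Store data in the first tier with enough free space; return that tier's name.

    Assumes keys are unique; overwriting an existing key is not handled.
    """
    for tier in tiers:
        if tier.free() >= len(data):
            tier.blobs[key] = data
            tier.used += len(data)
            return tier.name
    raise MemoryError(f"no tier has room for {key}")


def get(tiers, key):
    """Look the key up tier by tier, starting from the fastest one."""
    for tier in tiers:
        if key in tier.blobs:
            return tier.blobs[key]
    raise KeyError(key)


# Illustrative capacities in bytes, chosen only to force spilling.
tiers = [Tier("ram", 8), Tier("nvme", 16), Tier("burst_buffer", 64)]
placements = [place(tiers, f"block{i}", b"x" * 6) for i in range(4)]
print(placements)
```

    With these made-up capacities, the first block fits in RAM, the next two spill to NVMe, and the fourth lands in the burst buffer. A real system would also need eviction, metadata tracking, and awareness of each tier's bandwidth and latency, which this sketch leaves out.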

     
