Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (1): 145-160. doi: 10.1007/s11390-020-9822-9
Special topic: Computer Architecture and Systems
Suren Byna1,*, M. Scot Breitenfeld2, Bin Dong1, Quincey Koziol1, Elena Pourmal2, Dana Robinson2, Jerome Soumagne2, Houjun Tang1, Venkatram Vishwanath3, Richard Warren2
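Scientific applications at exascale generate and analyze massive amounts of data. These applications urgently need to access and manage that data efficiently on exascale systems. Parallel I/O, the key technology for moving data between compute nodes and storage, faces major challenges from the new applications, memory, and storage architectures considered in exascale system designs. As the storage hierarchy expands to include node-local persistent memory and burst buffers alongside disk-based storage, data movement among these layers must be efficient. Future parallel I/O libraries should be able to handle files of terabytes and beyond. This paper describes new capabilities developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications and one of the libraries most used at leadership computing facilities for parallel I/O on existing HPC systems. The representative features we describe are the Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer multiple-reader (Full SWMR), and parallel querying. We introduce these features and their implementations, along with their performance and the benefits they bring to applications and other libraries.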