Journal of Computer Science and Technology, 2020, Vol. 35, Issue (1): 61-71. doi: 10.1007/s11390-020-9803-z
Special topic: Computer Architecture and Systems
Osamu Tatebe1,*, Member, ACM, Shukuko Moriwake2, Yoshihiro Oyama3
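Burst buffers have become one of the main ways to meet the I/O performance requirements of bursty traffic in high-performance computing (HPC). This paper proposes Gfarm/BB, a file system for burst buffers that makes efficient use of node-local storage systems. Although node-local storage improves storage performance, it is available only for the duration of a job allocation. Gfarm/BB therefore needs better access and metadata performance, and it must be constructed on demand before the job runs. It uses file descriptor passing and remote direct memory access (RDMA) to improve read and write performance. Because it is a temporary file system, it improves metadata performance by omitting persistence and redundancy. With RDMA, write and read bandwidth improve by 1.7x and 2.2x, respectively, compared with IP over InfiniBand (IPoIB). Directory creation reaches 14,700 operations per second, which is 14.4x faster than the fully persistent and redundant configuration. Constructing Gfarm/BB on 2 nodes took 0.31 seconds. By using node-local storage, the IOR benchmark and the ARGOT-IO application I/O benchmark show scalable performance improvements. Compared with BeeOND, Gfarm/BB performs 2.6x and 2.4x better in the IOR write and read benchmarks, respectively, and 2.5x better in the ARGOT-IO benchmark.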
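File descriptor passing, one of the two mechanisms the abstract names, can be illustrated in general terms. The sketch below is not Gfarm/BB code: the function name `pass_fd_demo` and the "server"/"client" roles are hypothetical, and it assumes a Unix-like system with Python 3.9 or later (for `socket.send_fds` and `socket.recv_fds`). It shows one endpoint opening a file and handing the open descriptor to another endpoint over a Unix domain socket, so the receiver can read the file directly instead of having the data copied through the socket.

```python
import os
import socket
import tempfile


def pass_fd_demo(payload: bytes) -> bytes:
    """Open a file on one endpoint, pass its descriptor, read it on the other.

    Hypothetical illustration of the general mechanism only.
    """
    # A connected pair of Unix domain sockets stands in for a local
    # storage server and a client running on the same node.
    server_sock, client_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # Create a temporary file that plays the role of node-local storage.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name

    try:
        # "Server" side: open the file and send the descriptor itself
        # (SCM_RIGHTS ancillary data), not the file contents.
        fd = os.open(path, os.O_RDONLY)
        socket.send_fds(server_sock, [b"fd"], [fd])
        os.close(fd)  # the receiver gets its own duplicate descriptor

        # "Client" side: receive the descriptor and read the file directly.
        _msg, fds, _flags, _addr = socket.recv_fds(client_sock, 1024, 1)
        try:
            return os.read(fds[0], len(payload))
        finally:
            os.close(fds[0])
    finally:
        server_sock.close()
        client_sock.close()
        os.unlink(path)


if __name__ == "__main__":
    print(pass_fd_demo(b"hello burst buffer"))
```

Only a few bytes of control message cross the socket here; the file data is read by the receiving side through its own descriptor, which is why this technique can avoid an extra copy for accesses on the same node.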
[1] André Brinkmann, Kathryn Mohror, Weikuan Yu, Philip Carns, Toni Cortes, Scott A. Klasky, Alberto Miranda, Franz-Josef Pfreundt, Robert B. Ross, Marc-André Vef. Ad hoc file systems for high-performance computing [J]. Journal of Computer Science and Technology, 2020, 35(1): 4-26.
[2] Marc-André Vef, Nafiseh Moti, Tim Süß, Markus Tacke, Tommaso Tocci, Ramon Nou, Alberto Miranda, Toni Cortes, André Brinkmann. GekkoFS: a temporary burst buffer file system for high-performance computing applications [J]. Journal of Computer Science and Technology, 2020, 35(1): 72-91.