Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (1): 72-91. doi: 10.1007/s11390-020-9797-6
Special Topic: Computer Architecture and Systems
Marc-André Vef1, Nafiseh Moti1, Tim Süß1, Markus Tacke1, Tommaso Tocci2, Ramon Nou2, Alberto Miranda2, Toni Cortes2,3, André Brinkmann1, Member, ACM
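A growing number of scientific fields use high-performance computing (HPC) to process and analyze large volumes of experimental data, and storage systems in today's HPC environments must cope with new access patterns as a result. These patterns include large numbers of metadata operations, small I/O requests, and random file I/O. General-purpose parallel file systems, by contrast, are optimized for sequential, shared access to large files. Burst buffer file systems create a separate file system in which an application can store temporary data. They aggregate the node-local storage available within the compute nodes, or use dedicated SSD clusters, and offer a peak bandwidth higher than that of the backend parallel file system without interfering with it. However, burst buffer file systems provide many features that a scientific application running in isolation for a limited time does not need. We present GekkoFS, a temporary, highly scalable file system that has been optimized specifically for the use cases described above. GekkoFS provides relaxed POSIX semantics that offer only the features actually required by most (not all) applications. GekkoFS is therefore able to deliver scalable I/O performance and to reach millions of metadata operations on a small number of nodes, significantly outperforming general-purpose parallel file systems.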