Journal of Computer Science and Technology, 2020, Vol. 35, Issue (1): 121-144. doi: 10.1007/s11390-020-9802-0
Special topic: Computer Architecture and Systems
Robert B. Ross1, George Amvrosiadis2, Philip Carns1, Charles D. Cranor2, Matthieu Dorier1, Kevin Harms1, Greg Ganger2, Garth Gibson3, Samuel K. Gutierrez4, Robert Latham1, Bob Robey4, Dana Robinson5, Bradley Settlemyer4, Galen Shipman4, Shane Snyder1, Jerome Soumagne5, Qing Zheng2
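Technology enhancements and the growing breadth of application workflows running on high-performance computing (HPC) platforms are driving the development of new data services. These services must deliver high performance on the new platforms, offer effective interfaces and data abstractions for a wide variety of applications, and adapt readily as new technologies are deployed. The Mochi framework presented in this paper enables the composition of specialized distributed data services from a collection of connectable modules and subservices. Rather than forcing all applications to use a one-size-fits-all data staging and I/O software configuration, Mochi allows each application to use a data service specialized to its needs and access patterns. This paper introduces the Mochi framework and methodology, describes the Mochi core components and microservices, and details four examples in which the Mochi methodology is applied to the development of specialized services. It then presents a performance evaluation of a Mochi core component, a Mochi microservice, and a composed service that provides an object model. The paper concludes by positioning Mochi relative to related work in the HPC space and by indicating directions for future work.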