Journal of Computer Science and Technology, 2020, Vol. 35, Issue 1: 121-144. doi: 10.1007/s11390-020-9802-0
Special Issue: Computer Architecture and Systems
Special Section on Selected I/O Technologies for High-Performance Computing and Data Analytics
Robert B. Ross1, George Amvrosiadis2, Philip Carns1, Charles D. Cranor2, Matthieu Dorier1, Kevin Harms1, Greg Ganger2, Garth Gibson3, Samuel K. Gutierrez4, Robert Latham1, Bob Robey4, Dana Robinson5, Bradley Settlemyer4, Galen Shipman4, Shane Snyder1, Jerome Soumagne5, Qing Zheng2