Journal of Computer Science and Technology, 2020, Vol. 35, Issue (1): 27-46. doi: 10.1007/s11390-020-9799-4
Special Issue: Surveys; Computer Architecture and Systems
Special Section on Selected I/O Technologies for High-Performance Computing and Data Analytics
Yu-Tong Lu, Distinguished Member, CCF; Peng Cheng; Zhi-Guang Chen, Member, CCF