Citation: Yu-Tong Lu, Peng Cheng, Zhi-Guang Chen. Design and Implementation of the Tianhe-2 Data Storage and Management System[J]. Journal of Computer Science and Technology, 2020, 35(1): 27-46. DOI: 10.1007/s11390-020-9799-4