Journal of Computer Science and Technology, 2021, Vol. 36, Issue (1): 44-55. doi: 10.1007/s11390-020-0783-9
Special Topic: Computer Architecture and Systems
Zhi-Guang Chen, Member, CCF, Yu-Bo Liu, Yong-Feng Wang, and Yu-Tong Lu*, Distinguished Member, CCF
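Metadata has long been a major bottleneck in file systems. To improve metadata performance, parallel file systems have gradually moved toward distributed metadata management. We argue that distributed metadata management still has shortcomings in consistency and reliability, whereas the metadata performance of a single node still has considerable room for improvement. As the I/O performance of storage devices keeps rising, the main cause of the metadata bottleneck is shifting from I/O to computation. Against this background, we propose a GPU-accelerated metadata scheme. Specifically, we design a new metadata server architecture comprising three parts: CPU, GPU, and SSD. The CPU is mainly responsible for interacting with clients: it receives metadata requests from clients, packs them, and passes them to the GPU. The GPU holds all metadata; on receiving a batch of metadata requests from the CPU, it launches a large number of concurrent threads to perform the metadata computation, then returns the results to the CPU, which forwards them to the clients. To guarantee metadata persistence, we store the metadata on the SSD using a combination of logging and checkpointing. To improve the computational efficiency of the concurrent GPU threads, we redesign the in-memory metadata data structures so that they efficiently support the GPU's SIMT execution. We implement the GPU-based metadata acceleration system using BeeGFS as the prototype. Experiments show that the GPU-based scheme significantly outperforms CPU-based metadata management, with the advantage especially pronounced when a large number of clients access the system concurrently. In summary, targeting high-performance computing scenarios, this paper proposes a new metadata management scheme that uses the high concurrency of the GPU to relieve the bottleneck imposed by compute components in metadata management, and thereby significantly improves single-node metadata performance. Notably, this work does not conflict with distributed metadata management; the developed system can be integrated directly into a metadata cluster.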
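The request path described in the abstract (batch on the CPU, process the batch in bulk, persist through a log plus checkpoints) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `MetadataServer`, the file names `journal.log` and `checkpoint.json`, the operation names, and the JSON encoding are all hypothetical, and the loop in `_apply` merely stands in for the GPU kernel that the paper runs with many concurrent threads.

```python
import json
import os
import tempfile


class MetadataServer:
    """Toy single-node metadata server: batch -> log -> apply -> checkpoint.

    All names here are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, directory, checkpoint_every=2):
        self.log_path = os.path.join(directory, "journal.log")
        self.ckpt_path = os.path.join(directory, "checkpoint.json")
        self.checkpoint_every = checkpoint_every  # batches between checkpoints
        self.batches_since_ckpt = 0
        self.table = {}  # path -> attributes; stands in for GPU-resident metadata
        self.recover()

    def recover(self):
        # Load the last checkpoint, then replay batches logged after it.
        if os.path.exists(self.ckpt_path):
            with open(self.ckpt_path) as f:
                self.table = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, batch):
        # Stand-in for the GPU kernel: a real system would process the
        # whole batch in parallel; here it is a sequential loop.
        results = []
        for op, path, attrs in batch:
            if op == "create":
                self.table[path] = attrs
                results.append(True)
            elif op == "stat":
                results.append(self.table.get(path))
            elif op == "unlink":
                results.append(self.table.pop(path, None) is not None)
        return results

    def submit(self, batch):
        # Append the batch to the log and sync it before applying it.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(batch) + "\n")
            f.flush()
            os.fsync(f.fileno())
        results = self._apply(batch)
        self.batches_since_ckpt += 1
        if self.batches_since_ckpt >= self.checkpoint_every:
            self.checkpoint()
        return results

    def checkpoint(self):
        # Write the full table to a temp file, swap it in atomically,
        # then truncate the log that the checkpoint now covers.
        tmp = self.ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.table, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.ckpt_path)
        open(self.log_path, "w").close()
        self.batches_since_ckpt = 0


def demo():
    # Two batches trigger a checkpoint; a third stays only in the log.
    directory = tempfile.mkdtemp()
    server = MetadataServer(directory, checkpoint_every=2)
    server.submit([("create", "/a", {"size": 1}), ("stat", "/a", None)])
    server.submit([("create", "/b", {"size": 2})])
    server.submit([("unlink", "/a", None)])
    # A fresh instance rebuilds its state from checkpoint plus log replay.
    return MetadataServer(directory).table
```

Logging before applying means a crash after the sync loses no acknowledged batch, and the checkpoint bounds how much log must be replayed on restart. The operations in this sketch are idempotent with respect to the table, so replaying a log that overlaps a freshly written checkpoint leaves the state unchanged.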