Journal of Computer Science and Technology, 2015, Vol. 30, Issue 1: 30-41. doi: 10.1007/s11390-015-1502-9
Special Topic: Computer Architecture and Systems
• Special Section on Selected Paper from NPC 2011 •
Dong Yan1(阎栋), Xu-Sen Yin2(尹绪森), Cheng Lian2(连城), Xiang Zhong2(钟翔), Xin Zhou2(周鑫), Member, CCF, ACM, Gan-Sha Wu2(吴甘沙), Senior Member, CCF, Member, ACM
Big data processing is taking up an increasingly large share of datacenter workloads. However, recent studies have pointed out that big data processing does not make effective use of the memory system. We find that high cache miss rates and memory access dependencies are the two major causes of this inefficiency. To address them, this paper introduces two targeted optimization techniques: a slice-and-merge strategy and direct memory access. The slice-and-merge strategy greatly reduces last-level cache misses during sorting, while direct memory access redesigns the storage format of key/value pairs to eliminate memory access dependencies. In the evaluation, we examine the effect of the two optimizations with both micro-benchmarks and real-world applications. In the micro-benchmark experiments, CPU hardware events clearly show the performance improvement brought by the optimizations. For real-world applications, we select eight representative big data applications from HiBench. The results show that HiBench performance improves by 1.21x on average. This demonstrates that taking hardware characteristics into account in software architecture design can greatly improve memory access efficiency in big data processing. Our work has been integrated into Intel's distribution of Hadoop.
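To make the two ideas in the abstract more concrete, below is a minimal C++ sketch of the slice-and-merge approach: keys and values live contiguously in a flat byte buffer (so comparing records involves no pointer chasing), records are sorted in slices sized to fit in the last-level cache, and the sorted slices are then k-way merged. This is only an illustrative sketch under assumed names and layouts (Record, SliceAndMergeSort, the slice size), not the paper's actual implementation.

```cpp
// Sketch of a cache-conscious "slice and merge" sort for key/value records.
// Assumptions (not from the paper): records are offsets into a flat byte
// buffer, and slice_records is chosen so a slice fits in the last-level cache.
#include <algorithm>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

struct Record {
    uint32_t key_offset;   // offset of the key inside a flat key/value buffer
    uint32_t key_length;
    uint32_t value_offset; // value stored next to the key, no pointer chasing
};

// Compare two records by the key bytes stored in the flat buffer.
static bool KeyLess(const std::vector<uint8_t>& buf,
                    const Record& a, const Record& b) {
    return std::lexicographical_compare(
        buf.begin() + a.key_offset, buf.begin() + a.key_offset + a.key_length,
        buf.begin() + b.key_offset, buf.begin() + b.key_offset + b.key_length);
}

// Sort `records` in slices small enough to stay resident in the LLC,
// then k-way merge the sorted slices into the output vector.
std::vector<Record> SliceAndMergeSort(const std::vector<uint8_t>& buf,
                                      std::vector<Record> records,
                                      size_t slice_records) {
    // 1. Sort each slice independently; each pass touches only ~LLC bytes.
    std::vector<std::pair<size_t, size_t>> slices; // [begin, end) indices
    for (size_t begin = 0; begin < records.size(); begin += slice_records) {
        size_t end = std::min(begin + slice_records, records.size());
        std::sort(records.begin() + begin, records.begin() + end,
                  [&](const Record& a, const Record& b) {
                      return KeyLess(buf, a, b);
                  });
        slices.emplace_back(begin, end);
    }

    // 2. K-way merge of the sorted slices using a min-heap of cursors.
    using Cursor = std::pair<size_t, size_t>; // (slice index, record position)
    auto greater = [&](const Cursor& x, const Cursor& y) {
        return KeyLess(buf, records[y.second], records[x.second]);
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(greater)>
        heap(greater);
    for (size_t s = 0; s < slices.size(); ++s)
        heap.push({s, slices[s].first});

    std::vector<Record> sorted;
    sorted.reserve(records.size());
    while (!heap.empty()) {
        auto [s, pos] = heap.top();
        heap.pop();
        sorted.push_back(records[pos]);
        if (pos + 1 < slices[s].second) heap.push({s, pos + 1});
    }
    return sorted;
}
```

In such a scheme, slice_records would be picked so that a slice's record footprint stays within the last-level cache, which is what keeps the sort phase's miss rate low; the flat buffer layout in Record is a stand-in for the kind of key/value storage redesign the direct-memory-access technique targets.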