2013, Vol. 28, Issue (1): 3-13. doi: 10.1007/s11390-013-1308-6

Topic: Computer Architecture and Systems

• Special Section on Selected Paper from NPC 2011 •


Pinned OS/Services: A Case Study of XML Parsing on Intel SCC

Jie Tang1 (唐洁), Student Member, IEEE, Pollawat Thanarungroj2, Chen Liu2 (刘晨), Shao-Shan Liu3 (刘少山), Zhi-Min Gu1 (古志民), and Jean-Luc Gaudiot4, Fellow, IEEE, Member, ACM   

  1. School of Computer, Beijing Institute of Technology, Beijing 100081, China;
  2. Department of Electrical and Computer Engineering, Florida International University, Miami, Florida 33199, U.S.A.;
  3. Microsoft, Redmond, Washington 98223, U.S.A.;
  4. Department of Electrical Engineering and Computer Science, University of California, Irvine, California 92617, U.S.A.
  • Received: 2011-12-31; Revised: 2012-05-10; Online: 2013-01-05; Published: 2013-01-05
  • Supported by:

    This work is supported by the National Science Foundation of USA under Grant Nos. CCF-1065147 and ECCS-1125762, the Scholarship Council of China, and the Beijing Institute of Technology Yu-Miao Ph.D. Scholarship of China. Any opinions, findings, conclusions, and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of either the National Science Foundation of USA or the Scholarship Council of China.

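Abstract (translated from the Chinese version): With the development of many-core technology, hundreds or even thousands of cores may be integrated on a single chip in the near future. However, traditional system software and middleware are not suited to managing and providing services in systems of such scale. To improve the scalability and adaptability of operating system and middleware services on future many-core systems, this chapter proposes a division of labor among OS/middleware-layer services on many-core architectures. By porting a specific OS/middleware service onto a dedicated core, the cores on the chip take on different roles, which removes the system's performance bottleneck and targets both higher performance and lower energy consumption. This chapter uses the XML parsing service on Intel's 48-core homogeneous platform as a case study to explore the feasibility of service division on homogeneous many-core processors. The experimental results show that the ported dedicated XML parsing core is viable in terms of energy consumption. During parsing, however, the memory subsystem also has a large impact on system performance, which limits how much the dedicated core can improve performance. As an extension, a memory-side acceleration method combined with data prefetching is proposed to improve XML parsing performance; by tailoring the dedicated core, a 20% performance improvement and a 12.27% energy saving can be achieved.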

Abstract: We are heading towards integrating hundreds to thousands of cores on a single chip. However, traditional system software and middleware are not well suited to managing and providing services at such a large scale. To improve the scalability and adaptability of operating system and middleware services on future many-core platforms, we propose pinned OS/services. By porting each OS and runtime system (middleware) service to a separate core (special hardware acceleration), we expect to achieve maximal performance gain and energy efficiency in many-core environments. As a case study, we target XML (Extensible Markup Language), a widely used standard for data transfer and storage. We have successfully implemented and evaluated a design that ports the XML parsing service onto the Intel 48-core Single-Chip Cloud Computer (SCC) platform. The results show that it can provide considerable energy savings. However, we also identified heavy performance penalties introduced by the memory side, which bloat the parsing service. Hence, as a further step, we propose a memory-side hardware accelerator for XML parsing. With a specialized hardware design, we can further enhance the performance gain and energy efficiency: performance improves by 20% with a 12.27% energy reduction.
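The idea of pinning a service to its own core can be illustrated, very loosely, in software on an ordinary Linux host. The sketch below is not the authors' SCC implementation and does not model their hardware accelerator; it only shows a single XML parsing worker restricted to one core and reached through request and reply queues. All names (`pick_service_core`, `xml_service`, `parse_on_pinned_core`) are hypothetical, and the use of `os.sched_setaffinity` is an assumption that holds only on Linux.

```python
import os
import queue
import threading
import xml.etree.ElementTree as ET


def pick_service_core():
    """Return one core id this process may run on, or None if the
    platform has no affinity API (the os.sched_* calls are Linux-only)."""
    if hasattr(os, "sched_getaffinity"):
        return max(os.sched_getaffinity(0))
    return None


def xml_service(core, requests, replies):
    """Pin the caller to `core` if possible, then serve parse requests."""
    if core is not None:
        try:
            # 0 means "the caller"; on Linux this affects the calling thread.
            os.sched_setaffinity(0, {core})
        except OSError:
            pass  # pinning refused; keep serving unpinned
    while True:
        doc = requests.get()
        if doc is None:  # sentinel: shut the service down
            break
        root = ET.fromstring(doc)
        # Reply with the root tag and the total number of elements.
        replies.put((root.tag, sum(1 for _ in root.iter())))


def parse_on_pinned_core(docs):
    """Send XML strings to the pinned service and collect replies in order."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=xml_service, args=(pick_service_core(), requests, replies)
    )
    worker.start()
    for doc in docs:
        requests.put(doc)
    requests.put(None)
    results = [replies.get() for _ in docs]
    worker.join()
    return results
```

Calling `parse_on_pinned_core` with a list of well-formed XML strings returns one `(root_tag, element_count)` pair per document. Malformed input is not handled in this sketch, and on the SCC itself each core runs its own OS instance and communicates by message passing, so the queues here are only a stand-in for that channel.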
