2013, Vol. 28, Issue (1): 3-13. doi: 10.1007/s11390-013-1308-6

Special Issue: Computer Architecture and Systems

• Special Section on Selected Paper from NPC 2011 •

Pinned OS/Services: A Case Study of XML Parsing on Intel SCC

Jie Tang1 (唐洁), Student Member, IEEE, Pollawat Thanarungroj2, Chen Liu2 (刘晨), Shao-Shan Liu3 (刘少山), Zhi-Min Gu1 (古志民), and Jean-Luc Gaudiot4, Fellow, IEEE, Member, ACM   

    1. School of Computer, Beijing Institute of Technology, Beijing 100081, China;
    2. Department of Electrical and Computer Engineering, Florida International University, Miami, Florida 33199, U.S.A.;
    3. Microsoft, Redmond, Washington 98223, U.S.A.;
    4. Department of Electrical Engineering and Computer Science, University of California, Irvine, California 92617, U.S.A.
  • Received: 2011-12-31  Revised: 2012-05-10  Online: 2013-01-05  Published: 2013-01-05
  • Supported by:

    This work is supported by the National Science Foundation of USA under Grant Nos. CCF-1065147 and ECCS-1125762, the Scholarship Council of China, and the Beijing Institute of Technology Yu-Miao Ph.D. Scholarship of China. Any opinions, findings, conclusions, and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of either the National Science Foundation of USA or the Scholarship Council of China.

We are heading towards integrating hundreds to thousands of cores on a single chip. However, traditional system software and middleware are not well suited to managing and providing services at such a large scale. To improve the scalability and adaptability of operating system and middleware services on future many-core platforms, we propose pinned OS/services. By porting each OS and runtime system (middleware) service to a separate core (special hardware acceleration), we expect to achieve maximal performance gain and energy efficiency in many-core environments. As a case study, we target XML (Extensible Markup Language), a commonly used standard for data transfer and storage. We have implemented and evaluated a design that ports the XML parsing service onto the Intel 48-core Single-Chip Cloud Computer (SCC) platform. The results show that it can provide considerable energy savings. However, we also identified heavy performance penalties introduced on the memory side, which make the parsing service bloated. Hence, as a further step, we propose a memory-side hardware accelerator for XML parsing. With this specialized hardware design, we can further enhance the performance gain and energy efficiency: performance improves by 20% with a 12.27% energy reduction.
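The idea of dedicating one core to one service can be illustrated on a commodity Linux machine. The sketch below is only an analogy under stated assumptions, not the authors' SCC implementation: it uses Python's `os.sched_setaffinity` to restrict the calling thread to a single core while it parses a document with the standard `xml.etree.ElementTree` module. The helper names (`pinned_to_core`, `parse_on_core`, `pick_service_core`), the sample document, and the "highest-numbered core" policy are all hypothetical choices made for illustration.

```python
import os
import xml.etree.ElementTree as ET
from contextlib import contextmanager


@contextmanager
def pinned_to_core(core_id):
    """Temporarily restrict the calling thread to one core (Linux only)."""
    pinned, previous = False, None
    try:
        previous = os.sched_getaffinity(0)
        os.sched_setaffinity(0, {core_id})
        pinned = True
    except (AttributeError, OSError):
        pass  # no affinity control here: fall back to running unpinned
    try:
        yield pinned
    finally:
        if pinned:
            os.sched_setaffinity(0, previous)  # restore the original core set


def pick_service_core():
    """Hypothetical policy: reserve the highest-numbered allowed core."""
    if hasattr(os, "sched_getaffinity"):
        return max(os.sched_getaffinity(0))
    return 0


def parse_on_core(xml_text, core_id):
    """Parse a document on the chosen core; return (root tag, element count)."""
    with pinned_to_core(core_id):
        root = ET.fromstring(xml_text)
        # root.iter() walks the root itself plus every descendant element.
        return root.tag, sum(1 for _ in root.iter())


# Assumed sample input: one root element with two children.
SAMPLE = "<order><item id='1'/><item id='2'/></order>"
RESULT = parse_on_core(SAMPLE, pick_service_core())
```

On platforms without affinity control the context manager yields `False` and the parse simply runs unpinned, so the sketch degrades gracefully rather than failing.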


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved