1. School of Computer, Beijing Institute of Technology, Beijing 100081, China;
2. Department of Electrical and Computer Engineering, Florida International University, Miami, Florida 33199, U.S.A.;
3. Microsoft, Redmond, Washington 98223, U.S.A.;
4. Department of Electrical Engineering and Computer Science, University of California, Irvine, California 92617, U.S.A.
We are heading towards integrating hundreds to thousands of cores on a single chip. However, traditional system software and middleware are not well suited to managing and providing services at such a large scale. To improve the scalability and adaptability of operating system and middleware services on future many-core platforms, we propose pinned OS/services. By porting each OS and runtime system (middleware) service to a separate core (or to special hardware acceleration), we expect to achieve maximal performance gain and energy efficiency in many-core environments. As a case study, we target XML (Extensible Markup Language), a widely used standard for data transfer and storage. We have implemented and evaluated a design that ports the XML parsing service onto the Intel 48-core Single-Chip Cloud Computer (SCC) platform. The results show that it can provide considerable energy savings. However, we also identified heavy performance penalties introduced on the memory side, which make the parsing service bloated. Hence, as a further step, we propose a memory-side hardware accelerator for XML parsing. With this specialized hardware design, we can further enhance the performance gain and energy efficiency: performance can be improved by 20% with a 12.27% energy reduction.
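The idea of pinning a service to its own core can be illustrated on a commodity Linux machine. The following is only a minimal sketch under stated assumptions, not the SCC implementation described in this paper: it uses the Linux-only `os.sched_setaffinity` call to restrict the calling process to one core while a toy XML parsing routine runs. The names `pinned_to_core` and `parse_service` are hypothetical and introduced purely for illustration.

```python
import os
import xml.etree.ElementTree as ET
from contextlib import contextmanager


@contextmanager
def pinned_to_core(core_id):
    """Temporarily restrict the calling process to one core (Linux only).

    Yields True if pinning was applied, False if the platform lacks
    sched_setaffinity or the requested core is not available to us.
    """
    can_pin = hasattr(os, "sched_setaffinity")
    previous = os.sched_getaffinity(0) if can_pin else None
    pinned = can_pin and core_id in previous
    if pinned:
        os.sched_setaffinity(0, {core_id})
    try:
        yield pinned
    finally:
        if pinned:
            # Restore the original set of allowed cores.
            os.sched_setaffinity(0, previous)


def parse_service(document):
    """Hypothetical service: parse an XML string and return
    (root tag, number of elements including the root)."""
    root = ET.fromstring(document)
    return root.tag, sum(1 for _ in root.iter())


if __name__ == "__main__":
    with pinned_to_core(0) as pinned:
        tag, count = parse_service("<order><item/><item/></order>")
    print(pinned, tag, count)
```

On a real many-core platform, the service would run as a long-lived task on its dedicated core and receive requests over the chip's communication mechanism. This sketch only shows the affinity step.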
This work is supported by the National Science Foundation of USA under Grant Nos. CCF-1065147 and ECCS-1125762, the Scholarship Council of China, and the Beijing Institute of Technology Yu-Miao Ph.D. Scholarship of China. Any opinions, findings, conclusions, and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation of USA or of the Scholarship Council of China.
Jie Tang, Chen Liu, Shao-Shan Liu, Zhi-Min Gu, Jean-Luc Gaudiot. Many-Core Service Partitioning: A Study of XML Data Parsing on Intel SCC [J]. Journal of Computer Science and Technology, 2013, 28(1): 3-13
Jie Tang, Chen Liu, Shao-Shan Liu, Zhi-Min Gu, and Jean-Luc Gaudiot. Pinned OS/Services: A Case Study of XML Parsing on Intel SCC [J]. Journal of Computer Science and Technology, 2013, 28(1): 3-13
Copyright 2010 by Journal of Computer Science and Technology