›› 2009, Vol. 24 ›› Issue (6): 1048-1060.

• Special Section on International Partnership Programs Supported by CAS • Previous Articles     Next Articles

SimK: A Large-Scale Parallel Simulation Engine

Jian-Wei Xu1,2 (许建卫), Student Member, CCF, Ming-Yu Chen1,* (陈明宇), Member, CCF, ACM, IEEE, Gui Zheng1,2 (郑规), Zheng Cao1,2 (曹政), Hui-Wei Lv1,2 (吕慧伟), and Ning-Hui Sun1 (孙凝晖), Senior Member, CCF, Member, IEEE   

  1. 1Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190| China
    2Graduate University of Chinese Academy of Sciences, Beijing 100039, China
  • Received:2009-03-13 Revised:2009-09-30 Online:2009-11-05 Published:2009-11-05
  • About author:
    Jian-Wei Xu was born in 1978. He received the Ph.D. degree from Institute of Computing Technology (ICT) in 2009. He is an assistant professor, and CCF student member. His main research interests include computer simulation, distributed computing, high performance computer architecture.
    Ming-Yu Chen was born in 1972. He received the Ph.D. degree from Institute of Computing Technology in 2000. He is a professor of Institute of Computing Technology, a member of CCF, IEEE, ACM. His main research interests include high performance computer architecure and operating system.
    Gui Zheng received his M.S. degree from the Institute of Computing Technology in 2009. His research interests focus on computer simulation, parallel and distributed computing.
    Zheng Cao was born in 1982. He received the Ph.D. degree from Institute of Computing Technology in 2009, and is an assistant professor. His main research interests include distributed computing, high performance computer architecture, and high performance interconnection networks.
    Hui-Wei Lv is a Ph.D. candidate in computer architecture at Key Laboratory of Computer System and Architecture, Institute of Computing Technology. He received his B.S. degree in computer science from Beijing University of Chemical Technology, China. His research interests include computer architecture, high performance computing and parallel computing.
    Ning-Hui Sun received the Bachelor's degree from Peking University in 1989 and the Master's and Ph.D. degrees from the Chinese Academy of Science (CAS) in 1992 and 1999, respectively. He is a professor in the Institute of Computing Technology (ICT), CAS. He is the architect and main designer of Dawning2000, Dawning3000, Dawning4000, and Dawning5000 high-performance computers. His major research interests include computer architecture, operating system, and parallel algorithm. He is a member of IEEE and senior member of CCF.
  • Supported by:

    Supported by the National Natural Science Foundation of China under Grant No. 60633040, the National High Technology Research and Development 863 Program of China under Grant Nos. 2006AA01A102 and 2007AA01Z115.

Simulation is an important method to evaluate future computer systems. Currently microprocessor architecture has switched to parallel, but almost all simulators remained at sequential stage, and the advantages brought by multi-core or many-core processors cannot be utilized. This paper presents a parallel simulator engine (SimK) towards the prevalent SMP/CMP platform, aiming at large-scale fine-grained computer system simulation. In this paper, highly efficient synchronization, communication and buffer management policies used in SimK are introduced, and a novel lock-free scheduling mechanism that avoids using any atomic instructions is presented. To deal with the load fluctuation at light load case, a cooperated dynamic task migration scheme is proposed. Based on SimK, we have developed large-scale parallel simulators HppSim and HppNetSim, which simulate a full supercomputer system and its interconnection network respectively. Results show that HppSim and HppNetSim both gain sound speedup with multiple processors, and the best normalized speedup reaches 14.95X on a two-way quad-core server.

[1] Chidester M, George A. Parallel simulation of chipmultiprocessor architectures. ACM Trans. Model. Comput. Simul., 2002, 12(3): 176–200.
[2] Penry D, Fay D, Hodgdon D, Wells R, Schelle G, August D, Connors D. Exploiting parallelism and structure to accelerate the simulation of chip multi-processors. In Proc. The Twelfth International Symposium on High-Performance Computer Architecture, Austin, USA, Feb. 11–15, 2006, pp.29–40.
[3] Zheng G, Kakulapati G, Kale L. Bigsim: A parallel simulator for performance prediction of extremely large parallel machines. In Proc. 18th International Parallel and Distributed Processing Symposium, Santa Fe, USA, April 26–30, 2004, p.786.
[4] Prakash S, Bagrodia R L. Mpi-sim: Using parallel simulation to evaluate mpi programs. In Proc. the 30th Conference on Winter Simulation (WSC 1998), Los Alamitos, USA, Dec. 13– 16, 1998, pp.467–474.
[5] Sharma A, Nguyen A T, Torrellas J. Augmint: A multiprocessor simulation environment for Intel x86 architectures. Technical report, University of Illinois at Urbana-Champaign, 1996.
[6] Austin T, Larson E, Ernst D. Simplescalar: An infrastructure for computer system modeling. Computer, 2002, 35(2): 59–67.
[7] Vachharajani M, Vachharajani N, Penry D, Blome J, August D. Microarchitectural exploration with liberty. In Proc. 35th Annual International Symposium on Microarchitecture, (MICRO-35), Istanbul, Turkey, Nov. 18–22, 2002, pp.271– 282.
[8] Garey M R, Johnson D S, Stockmeyer L. Some simplified npcomplete problems. In Proc. the Sixth Annual ACM Symposium on Theory of Computing (STOC 1974), New York, USA, April 30–May 2, 1974, pp.47–63.
[9] Hendrickson B, Leland R. The chaco user’s guide: Version 2.0. Technical Report, Sandia National Laboratories, 1994.
[10] Fujimoto R M. Parallel discrete event simulation. Commun. ACM, 1990, 33(10): 30–53.
[11] Francis J A, Berkbigler K, Booker G, Bush B, Davis K, Hoisie A. An approach to extreme-scale simulation of novel architectures. In Proc. the Conference on Systemics, Cybernetics, and Informatics (SCI 2002), Orlando, Florida, July 14–18, 2002, Poster.
[12] Rao D, Wilsey P. An ultra-large scale simulation framework. Journal of Parallel and Distributed Computing, 2002, 62(11): 1670–1693.
[13] Sharma G D, Radhakrishnan R, Rajasekaran U K V, Abu- Ghazaleh N, Wilsey P A. Time warp simulation on clumps. In Proc. the Thirteenth Workshop on Parallel and Distributed Simulation (PADS 1999), Atlanta, USA, May 1–4, 1999, pp.174–181.
[14] Perumalla K. μsik -a micro-kernel for parallel/distributed simulation systems. In Proc. Workshop on Principles of Advanced and Distributed Simulation (PADS), Monterey, USA, June 1–3, 2005, pp.59–68.
[15] Wu M, Li X F. Task-pushing: A scalable parallel gc marking algorithm without synchronization operations. In Proc. IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, USA, March 26–30, 2007, pp.1–10.
[16] Fujimoto R M, Panesar K S. Buffer management in sharedmemory time warp systems. SIGSIM Simul. Dig., 1995, 25(1): 149–156.
[17] Boukerche A, Das S K. Reducing null messages overhead through load balancing in conservative distributed simulation systems. J. Parallel Distrib. Comput., 2004, 64(3): 330–344.
[18] Glazer D, Tropper C. On process migration and load balancing in time warp. IEEE Transactions on Parallel and Distributed Systems, Mar. 1993, 4(3): 318–327.
[19] Blumofe R D, Leiserson C E. Scheduling multithreaded computations by work stealing. J. ACM, 1999, 46(5): 720–748.
[20] Boukerche A, Das S K. Dynamic load balancing strategies for conservative parallel simulations. SIGSIM Simul. Dig., 1997, 27(1): 20–28.
[21] Boukerche A, Das S K. Null messages cancellation through load balancing in distributed simulations. In Proc. the 5th International Euro-Par Conference on Parallel Processing (Euro-Par 1999), London, UK, Aug. 31–Sept. 3, 1999, pp.562–569.
[22] Jiang M R, Shieh S P, Liu C L. Dynamic load balancing in parallel simulation using time warp mechanism. In Proc. International Conference on Parallel and Distributed Systems, Hsinchu, China, Dec. 19–21, 1994, pp.222–227.
[23] Arora N S, Blumofe R D, Plaxton C G. Thread scheduling for multiprogrammed multiprocessors. In Proc. the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA1998), Puerto Vallarta, Mexico, June 28–July 2, 1998, pp.119–129.
[24] Calder B, Krintz C, John S, Austin T. Cache-conscious data placement. SIGPLAN Not., 1998, 33(11): 139–149.
[25] Mellor-Crummey J M, Scott M L. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst., 1991, 9(1): 21–65.
[26] Sun N, Li K, Chen M, Hpp: An architecture for high performance and utility computing. Chinese Journal of Computer, 2008, 31(9): 1503–1508.
[27] Ceze L, Strauss K, Almasi G, Bohrer P, Brunheroto J, Cascaval C, Castanos J, Lieber D, Martorell X, Moreira J, Sanomiya A, Schenfeld E. Full circle: Simulating Linux clusters on Linux clusters. In Proc. the Fourth LCI International Conference on Linux Clusters: The HPC Revolution 2003, 2003.
[28] Mukherjee S. Wisconsin Wind Tunnel II: A fast and portable parallel architecture simulator. Workshop on Performance Analysis and Its Impact on Design, 1997.
[29] Presley M, Reiher P, Bellenot S. A time warp implementation of sharks world. In Proc. Winter Simulation Conference, New Orleans, USA, Dec. 9–12, 1990, pp.199–203.
[30] Jefferson D R. Virtual time. ACM Trans. Program. Lang. Syst., 1985, 7(3): 404–425.
[31] High Level Architecture Interface Specification Version 1.3 2, U. D. Defense, 1998.
[32] ssfnet. March 1, 2009, http://www.ssfnet.org/.

No related articles found!
Full text



[1] Lian Lin; Zhang Yili; Tang Changjie;. A Non-Recursive Algorithm Computing Set Expressions[J]. , 1988, 3(4): 310 -316 .
[2] Wang Hui; Liu Dayou; Wang Yafei;. Sequential Back-Propagation[J]. , 1994, 9(3): 252 -260 .
[3] Xiong Zhiguo; Xu Xi; Dong Shihai;. CX11: A Chinese Language Supporting Interface for X Window Environment[J]. , 1995, 10(1): 15 -22 .
[4] Zeng Jianchao; Hidehiko Sanada; Yoshikazu; Tezuka Xu Guangyou;. A Form-Correcting System of Chinese Characters Using a Model of Correcting Procedures of Calligraphists[J]. , 1995, 10(1): 23 -34 .
[5] Cao Cungen;. Expansion Nets and Expansion Processes of Elementary Net Systems[J]. , 1995, 10(4): 325 -333 .
[6] WANG Xiaodong; XU Ming; ZHOU Xingming;. Fast Multicast on Multistage Interconnection Networks Using Multi-Head Worms[J]. , 1999, 14(3): 250 -258 .
[7] FAN Xiaocong; XU Dianxiang; HOU Jianmin; ZHENG Guoliang;. Reasoning about Concurrent Actionsin Multi-Agent Systems[J]. , 1999, 14(4): 422 -428 .
[8] LIU Bin; LU Zengxiang; GAN Quan; FENG Ao; WANG Pu;. Infomarker—A New Internet Information Service System[J]. , 2000, 15(3): 300 -304 .
[9] XU Xiaofei; YE Dan; LI Quanlong; ZHAN Dechen;. Dynamic Organization and Methodology for Agile Virtual Enterprises[J]. , 2000, 15(4): 368 -375 .
[10] Dun-Ren Che. Accomplishing Deterministic XML Query Optimization[J]. , 2005, 20(3): 357 -366 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved