Xue-Jun Yang, Yong Dou, Qing-Feng Hu. Progress and Challenges in High Performance Computer Technology[J]. Journal of Computer Science and Technology, 2006, 21(5): 674-681.

Progress and Challenges in High Performance Computer Technology

  • Received Date: May 18, 2006
  • Revised Date: August 22, 2006
  • Published Date: September 14, 2006
  • High performance computers provide strategic computing power for building the national economy and defense, and have become one of the symbols of a country's overall strength. Over the past 30 years, with government support, high performance computer technology has developed rapidly: computing performance has increased nearly 3 million times and the number of processors has grown more than one million times. To solve the critical issues of parallel efficiency and scalability, researchers have pursued extensive theoretical studies and technical innovations. This paper briefly reviews the course of building high performance computer systems both at home and abroad, and summarizes the significant breakthroughs in international high performance computer technology. We also survey China's technical progress in parallel computer architecture, parallel operating systems and resource management, parallel compilers and performance optimization, and environments for parallel programming and network computing. Finally, we examine the challenging issues of the "memory wall", system scalability, and the "power wall", and discuss high productivity computers, which are the trend in building next-generation high performance computers.