\REF{[1]} Susan L Graham, Marc Snir, Cynthia A Patterson. Getting up to speed: The future of supercomputing. Committee on the Future of Supercomputing, National Research Council. \REF{[2]}PITAC report. Computational science: Ensuring America's co\-mpetitiveness. http://www.nitrd.gov/pitac/reports/20050609\_compu\-tational/compu\-tational.pdf \REF{[3]} Cray History. http://www.cray.com/about\_cray/history.html. \REF{[4]} CM-5 at UC Berkeley. http://www.eecs.berkeley.edu/Resea\-rch/Projects/CS/parallel/cm5.html. \REF{[5]} The development road of Chinese supercomputer. http://www.dawning.com.cn/4000A/test\_gx\_1.htm. (in Chinese) \REF{[6]} http://www.nti.org/e\_research/profiles/China/Chemical/in\-dex.html. \REF{[7]} ASCI Red SiteMap. http://www.sandia.gov/ASCI/Red/Site\-Map.htm. \REF{[8]} http://www.top500.org/. \REF{[9]} The earth simulator center. http://www.es.jamstec.go.jp/. \REF{[10]} BlueGene. http://www.research.ibm.com/bluegene/. \REF{[11]} John L. Gustafson. Reevaluating Amdahl's law. {\it Communications of the ACM}, May 1988, 31(5): 532--533. \REF{[12]}David Culler, Richard Karp, David Patterson \it et al. \rm LogP: Towards a realistic model of parallel computation. In \it Proc. the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, \rm New York, ACM Press, 1993, pp.1--12. \REF{[13]}Michael J Quinn. Parallel programming in C with MPI and OpenMP. USA: McGraw-Hill, May 2003. \REF{[14]}Neil H E Weste, David Harris. CMOS VLSI design: A Circuits and Systems Perspective. 3rd Edition, USA: Addison-Wesley, May 2004. \REF{[15]}Jose Duato, Sudhakar Yalamanchili, Lionel Ni. Interconnection Networks: An Engineering Approach. 2nd Edition, Morgan Kaufmann Publishers, 2002. \REF{[16]} Rajkumar Buyya. High Performance Cluster Computing Architectures and Systems, Volume 1. Prentice Hall, May 1999. \REF{[17]} Scientific computing and visualization. http://scv.bu.edu/. \REF{[18]}Francine Berman, Geoffrey C Fox, Tony Hey.
Grid Computing: Making the Global Infrastructure a Reality. John Wiley and Sons, May 2003. \REF{[19]} Xubang Shen, Xixin Cao. The selection of models for LS MPP. \it Chinese Journal of Computers, \rm 1997, 20(5): 385--390. (in Chinese) \REF{[20]} Li Li, Xubang Shen. The design of LS SIMD array microprocessor control logic. \it Chinese Journal of Computers, \rm 2000, 23(5): 557--560. (in Chinese) \REF{[21]} Caoyang Chen, Zhong Wang, Xubang Shen \it et al. \rm The LS MPP parallel image processor. \it Chinese Journal of Computers, \rm 2002, 25(3): 292--296. (in Chinese) \REF{[22]} Ying Zhang, Wei Huang, Qunsheng Ma, Sanli Li. The design and implementation of hierarchical parallel system: MP860 supercomputer. \it Chinese Journal of Computers, \rm 1998, 21(z1): 230--236. (in Chinese) \REF{[23]} Ling Qiao, Zhizhong Tang, Hongbo Rong, Chihong Zhang. The model of instruction level parallel program execution. \it Chinese Journal of Computers, \rm 1999, 22(5): 476--480. (in Chinese) \REF{[24]} Gang Xiao, Xingming Zhou, Ming Xu, Kun Deng. SMA: A speculative multithreaded architecture. \it Chinese Journal of Computers, \rm 1999, 22(6): 582--590. (in Chinese) \REF{[25]} Yunquan Zhang. DRAM(h): A parallel computation model for high performance numerical computing. \it Chinese Journal of Computers, \rm 2003, 26(12): 1660--1670. (in Chinese) \REF{[26]} Weiwu Hu, Peisu Xia. Out-of-order execution in sequentially consistent shared memory systems: Principles. \it Chinese Journal of Computers, \rm 1997, 20(6): 481--490. (in Chinese) \REF{[27]} Weiwu Hu, Peisu Xia. Out-of-order execution in sequentially consistent shared memory systems: Simulation results. \it Chinese Journal of Computers, \rm 1997, 20(6): 491--500. (in Chinese) \REF{[28]} Xianghui Xie, Chengde Han, Zhimin Tang. Data pre sending technique in distributed shared memory systems. \it Chinese Journal of Computers, \rm 1999, 22(3): 241--248. (in Chinese) \REF{[29]} Weiwu Hu, Weisong Shi, Zhimin Tang.
A software DSM system based on a new cache coherence protocol. \it Chinese Journal of Computers, \rm 1999, 22(5): 467--475. (in Chinese) \REF{[30]} Huadong Dai, Xuejun Yang. An operating system-centric memory consistency model --- Thread consistency model. \it Journal of Computer Research and Development, \rm 2003, 40(2): 351--359. \REF{[31]} Yong Dou, Xingming Zhou. A software controlled data prefetching scheme based on weak order consistency model. \it Journal of Software, \rm 1997, 8(2): 81--86. \REF{[32]} Rong Zeng, Xiangjun Dong, Mingfa Zhu. Wormhole routing and its chip design. \it Chinese Journal of Computers, \rm 1997, 20(5): 404--411. (in Chinese) \REF{[33]} Feng Gao, Zhongcheng Li, Yinghua Min, Jie Wu. A fault-tolerant routing strategy based on extended safety vectors in hypercube multicomputers. \it Chinese Journal of Computers, \rm 2000, 23(3): 248--254. (in Chinese) \REF{[34]} Jianfeng Wu, Shanli Li, Yi Ge. Message memory network interface design in network parallel computing. \it Chinese Journal of Computers, \rm 2000, 23(2): 195--201. (in Chinese) \REF{[35]} Jun Shen, Weimin Zheng, Dapeng Ju. FMP: A fast message passing for workstation clusters. \it Chinese Journal of Computers, \rm 1998, 21(7): 595--602. (in Chinese) \REF{[36]} Zuo-ning Chen, Yi-lian Jin. A parallel operating system based on multi-virtual-space and multi-mapping technology. \it Journal of Software, \rm 2001, 12(10): 1562--1568. \REF{[37]} Ning-Hui Sun, Zhi-wei Xu. Design of system software of Dawning/2000 supercomputer. \it Chinese Journal of Computers, \rm 2000, 23(1): 9--20. (in Chinese) \REF{[38]} Dan Meng, Jian-feng Zhan, Lei Wang \it et al. \rm Fully integrated cluster operating system: Phoenix. \it Journal of Computer Research and Development, \rm 2005, 42(6): 979--986. (in Chinese) \REF{[39]} Hua-ping Chen, Liu-sheng Huang. Processor selection policy in heuristic task scheduling. \it Journal of Software, \rm 1999, 10(11): 1194--1198. 
(in Chinese) \REF{[40]} Jin-gui Huang, Jian-er Chen, Song-qiao Chen. Parallel-job scheduling on cluster computing systems. \it Chinese Journal of Computers, \rm 2004, 27(6): 765--771. (in Chinese) \REF{[41]} Qing-hua Li, Jian-jun Han, Abbas A Essa. A fast and effective static task scheduling algorithm in homogeneous computing environments. \it Journal of Computer Research and Development, \rm 2005, 42(1): 118--125. (in Chinese) \REF{[42]} Qiang Fu, Wei-min Zheng. A dynamic task scheduling method in cluster of workstations. \it Journal of Software, \rm 1999, 10(1): 19--23. (in Chinese) \REF{[43]} Hao Huang, Jian-cheng Du, Dao-xu Chen, Li Xie. Optimum degree of parallelism-based task dependence graph scheduling scheme. \it Journal of Software, \rm 1999, 10(10): 1038--1046. (in Chinese) \REF{[44]} Zhou Lei, Zhi-wei Xu, Ming-fa Zhu. A new adaptive processor allocation algorithm for cluster: Limited load balancing allocation (LLBA). \it Chinese Journal of Computers, \rm 1999, 22(8): 877--881. (in Chinese) \REF{[45]} Nong Xiao, Yu-tong Lu, Xi-cheng Lu. A dynamic load distributing algorithm based on a parallel computing network environment. \it Journal of Computer Research and Development, \rm 1999, 36(2): 238--241. (in Chinese) \REF{[46]} Zhi-yan Jin, Ding-xing Wang. Diffusion algorithm of dynamic load balancing for heterogeneous system. \it Chinese Journal of Computers, \rm 2003, 26(11): 1487--1493. (in Chinese) \REF{[47]} Yan-zhi Wen, Rui-qi Lian, Cheng-yong Wu \it et al. \rm A micro-scheduling method on directed cyclic graph. \it Journal of Computer Research and Development, \rm 2005, 42(3): 387--393. (in Chinese) \REF{[48]} Jin-Wei Hong, Guo-liang Chen, Zhao-qing Zhang, Feng Zhang. Compiling-support communication optimizations for SVMs. \it Chinese Journal of Computers, \rm 2000, 23(7): 738--743. (in Chinese) \REF{[49]} Rui-qi Lian, Zhao-qing Zhang, Ru-liang Qiao. A data prefetching method used in ILP compilers and its optimization. 
\it Chinese Journal of Computers, \rm 2000, 23(6): 576--584. (in Chinese) \REF{[50]} Rui-qi Lian, Cheng-yong Wu, Zhao-qing Zhang. Integrating code optimization and instruction scheduling. \it Chinese Journal of Computers, \rm 2001, 24(7): 694--701. (in Chinese) \REF{[51]} Yun-zhao Lu, Zhao-qing Zhang, Rui-qi Lian. Predicate analysis techniques in ILP. \it Chinese Journal of Computers, \rm 2003, 26(10): 1337--1342. (in Chinese) \REF{[52]} Wenlong Li, Haibo Lin, Zhizhong Tang. Cost model and decision framework for software pipelining. \it Journal of Software, \rm 2004, 15(7): 1005--1011. (in Chinese) \REF{[53]} Haibo Lin, Wenlong Li, Zhizhong Tang. Research on register requirements of software pipelined loops in the IA-64 architecture. \it Journal of Computer Research and Development, \rm 2004, 41(1): 22--27. (in Chinese) \REF{[54]} Li Liu, Wenlong Li, Zhenyu Gu, Shengmei Li, Zhizhong Tang. Optimization to prevent cache penalty in modulo scheduling. \it Journal of Software, \rm 2005, 16(10): 1842--1852. (in Chinese) \REF{[55]} Jun Xia, Xuejun Yang, Lifang Zeng, Haifang Zhou. A projection-delamination based approach to optimizing spatial locality in loop nests. \it Chinese Journal of Computers, \rm 2003, 26(5): 539--551. (in Chinese) \REF{[56]} Jun Xia, Huadong Dai, Xuejun Yang. A linear expressing based approach for optimizing locality using non-singular loop transformations. \it Chinese Journal of Computers, \rm 2003, 26(12): 1609--1620. (in Chinese) \REF{[57]} Jun Xia, Xuejun Yang. A data space fusion based approach for global computation and data decompositions. \it Journal of Software, \rm 2004, 15(9): 1311--1327. (in Chinese) \REF{[58]} Guokai Ma, Xin Wang, Peng Wang \it et al. \rm Increase parallel granularity and data locality by unimodular metrics. \it Chinese Journal of Computers, \rm 2004, 27(4): 516--523. (in Chinese) \REF{[59]}Lifang Zeng, Xuejun Yang, Jun Xia, Juan Chen. Improving data locality and reducing false-sharing based on data fusion.
\it Chinese Journal of Computers, \rm 2005, 27(1): 32--41. (in Chinese) \REF{[60]}Yijun Yu, Binyu Zang, Wu Shi, Chuanqi Zhu. Automatically computing unimodular transforming matrix to parallelize nested sequential loops. \it Journal of Software, \rm 1999, 10(4): 366--371. (in Chinese) \REF{[61]}Jianping Wang, Xu Cheng, Wenkui Ding \it et al. \rm The implementation strategy of communication in HPF compiler and related algorithms. \it Chinese Journal of Computers, \rm 1999, 22(5): 486--496. (in Chinese) \REF{[62]}Li Chen, Zhaoqing Zhang, Xiaobing Feng. Redundant computation partitioning in distributed-memory systems. \it Chinese Journal of Computers, \rm 2003, 26(2): 180--187. (in Chinese) \REF{[63]}Bo Yang, Dingxing Wang, Weimin Zheng. An algorithm on task scheduling in structural parallel control mechanism. \it Journal of Software, \rm 2001, 12(5): 698--705. (in Chinese) \REF{[64]}Qiang Liu, Zhaoqing Zhang, Ruliang Qiao. An integrated tool for debugging, monitoring and performance analysis. \it Journal of Software, \rm 1999, 10(2): 220--224. (in Chinese) \REF{[65]}Jian Liu, Hao Wang, Meiming Sheng, Weimin Zheng. A parallel debugger with fast conditional breakpoint. \it Journal of Software, \rm 2003, 14(11): 1827--1833. (in Chinese) \REF{[66]} Chao Yan, Taoying Liu, Guoliang Chen. A parallel debugger based on cluster operating system. \it Journal of Computer Research and Development, \rm 2004, 41(4): 630--636. (in Chinese) \REF{[67]} Zhiwei Xu, Wei Li. Research on Vega grid architecture. \it Journal of Computer Research and Development, \rm 2002, 39(8): 923--929. (in Chinese) \REF{[68]} Xicheng Lu, Huaimin Wang, Ji Wang. Internet-based virtual computing environment (iVCE): Concepts and architecture. \it Science in China, Series E, \rm 2006, 36(10). (To appear) \REF{[69]}Dongsheng Li, Xicheng Lu. A novel constant degree and constant congestion DHT scheme for peer-to-peer networks. \it Science in China, \rm 2005, 48(4): 421--436. \REF{[70]} Hai Zhuge. 
Semantic grid: Scientific issues, infrastructure, and methodology. \it Communications of the ACM, \rm 2005, 48(4): 117--119. \REF{[71]} HPCS program. http://www.highproductivity.org/. \REF{[72]} National energy research scientific computing center 2004 annual report. National Energy Research Scientific Computing Center, 2005, http://www.nersc.gov/news/annual\_reports/an\-nre\-p04/annrep04.pdf. \REF{[73]} Wulf W, McKee S. Hitting the memory wall: Implications of the obvious. \it Computer Architecture News, \rm 1995, 23(1): 20--24.