Special Issue: Computer Architecture and Systems

• Architecture and High Performance Computer Systems

Making Effective Decisions in Computer Architects' Real-World: Lessons and Experiences with Godson-2 Processor Designs

Wei-Wu Hu1 and Jian Wang1,2   

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2008-04-21 Revised: 2008-05-19 Online: 2008-07-10 Published: 2008-07-10

Although many kinds of microprocessors have been under development for several decades, the computer architecture R&D community lacks well-documented lessons and experiences about design decisions in the research literature. In this paper, we systematically present the design decisions we made while designing and prototyping the Godson-2 series processors. The 250MHz Godson-2B, 450MHz Godson-2C, and 1GHz Godson-2E processors, which implement a 64-bit, four-issue, out-of-order architecture, were taped out in 2003, 2004, and 2005, respectively. Each processor triples its predecessor's SPEC CPU2000 rates. The first-hand experiences and lessons gained from these designs provide unique perspectives and insights that are not available in existing textbooks or published papers. We summarize 10 critical lessons and experiences based on hundreds of our attempts at architectural and design optimizations to improve the performance of the Godson-2 series processors. The issues include silicon-simulation correlation, design balancing, performance optimization, and pico-architecture tuning. We conclude that persistent improvement, an attitude towards work-on-silicon design, and insightful understanding of software and the fabrication process are the three most important factors for designing a high-performance processor with low energy consumption.

Key words: multidimensional transaction database; data mining; N-dimensional inter-transaction association rules (NDITAR); generalized multidimensional association rules (GMDAR)



ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved