[1] Hu W, Wang J, Gao X, Chen Y, Liu Q, Li G.Godson-3: A scalable multi-core RISC processor with x86 emulationsupport. IEEE Micro, 2009, 29(2): 17-29.[2] Fan D R, Yuan N, Zhang J C et al. Godson-T: An efficientmany-core architecture for parallel program executions. Journal ofComputer Science and Technology, 2009, 24(6): 1061-1073.[3] Lv H, Cheng Y, Bai L, Chen M, Fan D, Sun N. P-GAS: Parallelizinga cycle-accurate event-driven many-core processor simulator using paralleldiscrete event simulation. In Proc. Workshop on Principle of Advancedand Distributed Simulation, Atlanta, USA, May 17-19, 2010, pp.1-8.[4] Tang D, Bao Y, Hu W, Chen M. DMA cache: Using on-chip storageto architecturally separate I/O data from CPU data for improving I/Operformance. In Proc. Int. Conf. High-Performance Computer Architecture,Bangalore, India, Jan.9-14, 2010, pp.1-12.[5] Long G, Franklin D, Biswas S, Ortiz P, Oberg J, Fan D, Chong F T.Minimal multi-threading: Finding and remo-ving redundant instructions inmulti-threaded processors. In Proc. IEEE/ACM Int. Symp. Microarchitecture,Atlanta, USA, Dec.4-8, 2010, pp.337-348.[6] Chen Y, Hu W, Chen T, Wu R. LReplay: A pending period baseddeterministic replay scheme. In Proc. Int. Symp. Computer Architecture,Saint-Malo, France, Jun.19-23, 2010, pp.187-197.end{multicolsbegin{multicols{2footnotesize[7] Su M, Chen Y, Gao X. A general method to make multi-clock systemdeterministic. In Proc. Conf. Design, Automation and Test in Europe,Dresden, Germany, Mar.8-12, 2010, pp.1480-1485.[8] Guo Q, Chen T, Chen Y, Zhou Z H, Hu W, Xu Z. Effective andefficient microprocessor design space exploration using unlabeled designconfigurations. In Proc. Int. Joint Conf. Artificial Intelligence,Spain, 2011. (To appear)[9] Xu D, Wu C, Yew P C. On mitigating memory bandwidth contentionthrough bandwidth-aware scheduling. In Proc. Int. Conf. ParallelArchitectures and Compilation Techniques, Vienna, Austria, Sept.11-15,2010, pp.237-247.[10] Chen L, Liu L, Tang S, Huang L, Jing Z, Xu S, Zhang D, Shou B.Unified parallel C for GPU clusters: Language extensions and compilerimplementation. In Proc. the 23rd International Workshop on Languagesand Compilers for Parallel Computing, Huston, USA, Oct.7-9, 2010, pp.151-165.[11] Wang L, Cui H, Duan Y, Lu F, Feng X, Yew P C. An adaptive taskcreation strategy for work-stealing scheduling. In Proc. Int. Conf.Code Generation and Optimization, Toronto, Canada, Apr.24-28, 2010, pp.266-277.[12] Liu L, Chen L, Wu C Y, Feng X B. Global tiling for communicationminimal parallelization on distributed memory systems. In Proc. Int.Euro-Par Conf. Parallel Processing, Klagenfurt, Austria, Aug.26-29, 2008,pp.382-391.[13] Chen Y, Huang Y, Eeckhout L, Fursin G, Peng L, Temam O, Wu C.Evaluating iterative optimization across 1000 data sets. In Proc. Conf.Programming Language Design and Implementation, Toronto, Canada, Jun.5-10,2010, pp.448-459.[14] Yu T, Xue J, Huo W, Feng X, Zhang Z. Level by level: Makingflow- and context-sensitive pointer analysis scalable for millions of linesof code. In Proc. Int. Conf. Code Generation and Optimization,Toronto, Canada, Apr.24-28, 2010, pp.218-229.[15] Wang Z, Wu C. Yew P C. On improving heap memory layout bydynamic pool allocation. In Proc. Int. Conf. Code Generation andOptimization, Toronto, Canada, Apr.24-28, 2010, pp.92-100.[16] Li J, Wu C, Hsu W C. An evaluation of misaligned data accesshandling mechanisms in dynamic binary translation systems. In Proc.Int. Conf. Code Generation and Optimization, Seattle, USA, Mar.22-25, 2009, pp.180-189.[17] Lv F, Wang L, Feng X, Li Z, Zhang Z. Exploiting idle registerclasses for fast spill destination. In Proc. Int. Conf. Supercomputing,Island of Kos, Greece, Jun.7-12, 2008, pp.319-326.[18] Zhang L, Han Y, Xu Q, Li X, Li H. On topology reconfigurationfor defect-tolerant NoC-based homogeneous manycore systems. IEEETrans. VLSI Systems, 2009, 17(9): 1173-1186.[19] Yan G, Liang X, Han Y, Li X. Leveraging the core-levelcomplementary effects of PVT variations to reduce timing emergencies inmulti-core processors. In Proc. Int. Symp. Computer Architecture,Saint-Malo, France, Jun.19-23, 2010, pp.485-496.[20] Pan S, Hu Y, Li X. IVF: Characterizing the vulnerability ofmicroprocessor structures to intermittent faults. In Proc. Conf.Design, Automation and Test in Europe, Dresden, Germany, Mar.8-12, 2010, pp.238-243.[21] Hu W, Wang R, Chen Y, Fan B, Zhong S, Gao X, Qi Z, Yang X.Godson-3B: A 1,GHz 40,W 8-Core 128,GFlops processor in 65,nm CMOS. In Proc. Int. Solid-State Circuits Conference, 2011. (To appear)[22] Zhang M, Li H, Li X. Path delay test generation toward activationof worst case coupling effects. IEEE Transactions on Very Large ScaleIntegration Systems, 2010, 18(12): 1-14.[23] Han Y, Hu Y, Li X, Li H, Chandra A. Embedded test decompressorto reduce the required channels and vector memory of tester for complexprocessor circuit. IEEE Transactions on Very Large Scale IntegrationSystems, 2007, 5(15): 531-540.[24] Wang D, Hu Y, Li H, Li X. The design-for-testability featuresand test implementation of a giga hertz general purpose microprocessor. Journal of Computer Science and Technology, 2008, 23(6): 1037-1046.[25] Chen Y, Lv Y, Hu W, Chen T, Shen H, Wang P, Pan H. Fast completememory consistency verification. In Proc. Int. Symp. High-PerformanceComputer Architecture, Raleigh, USA, Feb.14-18, 2009, pp.381-392.[26] Hu W, Chen Y, Chen T, Qian C, Li L. Linear time memoryconsistency verification. IEEE Transactions on Computers, 2011. (Accepted)[27] Li L, Chen T, Chen Y, Li L, Qian C, Hu W. Brief announcement:Program regularization in verifying memory consistency. In Proc. Symp.Parallelism in Algorithms and Architectures, San Jose, USA, Jun.4-6,2011. (To appear)[28] Guo Q, Chen T, Shen H, Chen Y, Wu Y, Hu W. Empirical designbugs prediction for verification. In Proc. Conf. Design, Automationand Test in Europe, Grenoble, France, Mar.14-18, 2011, pp.1-6.[29] Zhang T, Lv T, Li X. An abstraction-guided simulation approachusing Markov models for microprocessor verification. In Proc. Conf.Design, Automation and Test in Europe, Dresden, Germany, Mar.8-12, 2010,pp.484-489.[30] Hu W, Wang J, Gao X, Chen Y. Micro-architecture of Godson-3multi-core processor. In Proc. Symp. High Performance Chips, StanfordUniversity, USA, Aug.24-26, 2008.[31] Gao X, Chen Y J, Wang H D et al. System architecture ofGodson-3 multi-core processors. Journal of Computer Science andTechnology, 2010, 25(2): 181-191.[32] Hu W, Chen Y. GS464V: A high-performance low-power XPU with512-bit vector extension. In Proc. Symp. High Performance Chips,Aug.22-24, Stanford University, USA, 2010. |