[1] Ma K, Li X, Chen W et al. Green GPU: A holistic approach to energy efficiency in GPU-CPU heterogeneous architectures. In Proc. the 41st Int. Conf. Parallel Processing, September 2012, pp.48-57.[2] Lee J, Samadi M, Park Y et al. Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems. In Proc. the 22nd Int. Conf. Parallel Architectures and Compilation Techniques, Sept. 2013, pp.245-255.[3] Lee J, Kim H. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture. In Proc. the 18th Int. Symp. High Performance Computer Architecture, February 2012, pp.91-102.[4] Borkar S. Thousand core chips: A technology perspective. In Proc. the 44th Conf. Design Automation, June 2007, pp.746-749.[5] Hoskote Y, Vangal S, Singh A et al. A 5-GHz mesh interconnect for a teraflops processor. IEEE Micro, 2007, 27(5): 51-61.[6] Owens J D, Dally W J, Ho R et al. Research challenges for on-chip interconnection networks. IEEE Micro, 2007, 27(5): 96-108.[7] Wentzlaff D, Griffin P, Hoffmann H et al. On-chip interconnection architecture of the tile processor. IEEE Micro, 2007, 27(5): 15-31.[8] Taylor M B, Lee W, Miller J et al. Evaluation of the raw microprocessor: An exposed-wire-delay architecture for ILP and streams. ACM SIGARCH Computer Architecture News, 2004, 32(2): 2-13.[9] Moscibroda T, Mutlu O. A case for bufferless routing in on-chip networks. ACM SIGARCH Computer Architecture News, 2009, 37(3): 196-207.[10] Michelogiannakis G, Sanchez D, Dallv W J et al. Evaluating bufferless flow control for on-chip networks. In Proc. the 4th Int. Symp. Networks-on-Chip, May 2010, pp.9-16.[11] Jafri S A R, Hong Y J, Thottethodi M et al. Adaptive flow control for robust performance and energy. In Proc. the 43rd Int. Symp. Microarchitecture, December 2010, pp.433-444.[12] Nychis G P, Fallin C, Moscibroda T et al. On-chip networks from a networking perspective: Congestion and scalability in many-core interconnects. ACM SIGCOMM Computer Communication Review, 2012, 42(4): 407-418.[13] Fallin C, Craik C, Mutlu O. CHIPPER: A low-complexity bufferless deflection router. In Proc. the 17th Int. Symp. High Performance Computer Architecture, February 2011, pp.144-155.[14] Zhao H, Kandemir M, Ding W et al. Exploring heterogeneous NoC design space. In Proc. Int. Conf. ComputerAided Design, November 2011, pp.787-793.[15] Nilsson E. Design and implementation of a hot-potato switch in a network on chip [Master Thesis]. Royal Institute of Technology, Sweden, 2002.[16] Lee J, Li S, Kim H et al. Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures. ACM Trans. Design Automation of Electronic Systems, 2013, 18(4): 48:1-48:28.[17] Kim H, Kim Y, Kim J. Clumsy flow control for highthroughput bufferless on-chip networks. IEEE Computer Architecture Letters, 2013, 12(2): 47-50.[18] Kahng A B, Li B, Peh L S et al. ORION 2.0: A power-area simulator for interconnection networks. IEEE Trans. Very Large Scale Integration Systems, 2012, 20(1): 191-196.[19] Henning J L. SPEC CPU2006 benchmark descriptions. ACM SIGARCH Computer Architecture News, 2006, 34(4): 1-17.[20] Che S, Boyer M, Meng J et al. Rodinia: A benchmark suite for heterogeneous computing. In Proc. Int. Symp. Workload Characterization, October 2009, pp.44-54.[21] Patil H, Cohn R, Charnev M et al. Pinpointing representative portions of large Intel® Itanium® programs with dynamic instrumentation. In Proc. the 37th Int. Symp. Microarchitecture, December 2004, pp.81-92.[22] Grot B, Hestness J, Keckler S W, Multu O. Express cube topologies for on-chip interconnects. In Proc. the 15th Int. Symp. High Performance Computer Architecture, February 2009, pp.163-174.[23] Balfour J, Dally W J, Black-Schaffer D et al. An energyefficient processor architecture for embedded systems. IEEE Computer Architecture Letters, 2008, 7(1):29-32. |