|  Yan G, Li Y, Han Y, Li X, Guo M, Liang X. AgileRegulator: A hybird voltage regulator scheme redeeming dark silicon for power effciency in a multicore architecture. In Proc. the 18th International Symposium on High Performance Com-puter Architecture, Feb. 2012, pp.287-298. Fu B, Han Y, Ma J, Li H, Li X. An abacus turn model for time/space-effcient reconfigurable routing. In Proc. the 38th International Symposium on Computer Architecture, June 2011, pp.259-270. Hameed R, Qadeer W, Wachs M, Azizi O, Solomatnikov A, Lee B C, Richardson S, Kozyrakis C, Horowitz M. Under-standing sources of ineffciency in general-purpose chips. In Proc. the 37th Annual International Symposium on Com-puter Architecture, June 2010, pp.37-47. Cong J, Grigorian B, Reinman G, Vitanza M. Accelerating vision and navigation applications on a customizable plat-form. In Proc. the 22nd IEEE International Conference on Application-Specific Systems, Architectures and Processors, Sept. 2011, pp.25-32. Auras D, Girbal S, Berry H et al. CMA: Chip multi-accelerator. In Proc. the 8th IEEE Symposium on Appli-cation Specific Processors, June 2010, pp.8-15. Girbal S, Temam O, Yehia S, Berry H, Li Z. A memory inter-face for multi-purpose multi-stream accelerators. In Proc. the 13rd International Conference on Compilers, Architectures and Synthesis for Embedded Systems, October 2010, pp.107-116. Chien A A, Snavely A, Gahagan M. 10×10: A general-purpose architectural approach to heterogeneity and energy effciency. In Proc. the 11th International Conference on Computational Science, June 2011, pp.1987-1996. Yoon D H, Jeong M K, Erez M. Adaptive granularity memory systems: A tradeoff between storage effciency and through-put. In Proc. the 38th Annual International Symposium on Computer Architecture, June 2011, pp.295-306. Rosenfeld P, Cooper-Balis E, Jacob B. DRAMSim2: A cycle accurate memory system simulator. Computer Architecture Letters, 2011, 10(1): 16-19. Seznec A. Decoupled sectored caches: Conciliating low tag implementation cost. In Proc. the 21st Annual International Symposium on Computer Architecture, Apr. 1994, pp.384-393. Kumar S, Zhao H, Shriraman A, Matthews E, Dwarkadas S, Shannon L. Amoeba-cache: Adaptive blocks for eliminating waste in the memory hierarchy. In Proc. the 45th Annual In-ternational Symposium on Microarchitecture, December 2012, pp.376-388. Ahn J H, Leverich J, Schreiber R, Jouppi N P. Multicore DIMM: An energy effcient memory module with indepen-dently controlled DRAMs. IEEE Computer Architecture Let-ters, 2009, 8(1): 5-8. Udipi A N, Muralimanohar N, Chatterjee N, Balasubramo-nian R, Davis A, Jouppi N P. Rethinking DRAM design and organization for energy-constrained multi-cores. In Proc. the 37th Annual International Symposium on Computer Archi-tecture, June 2010, pp.175-186. Kim J S, Oh C S, Lee H et al. A 1.2V 12.8 GB/s 2Gb mobile Wide-I/O DRAM with 4×128 I/Os using TSV-based stack-ing. In Proc. the International Solid-State Circuits Confer-ence, February 2011, pp.496-498. Liu C, Zhang L, Han Y, Li X. Vertical interconnects squeezing in symmetric 3D mesh network-on-Chip. In Proc. the 16th Asia and South Pacific Design Automation Conference, Jan. 2011, pp.357-362 Wang Y, Zhang L, Han Y, Li H, Li X. FlexMemory: Exploit-ing and managing abundant off-chip optical bandwidth. In Proc. Design, Automation and Test in Europe, March 2011, pp.968-973 Rafique N, Lim W, Thottethodi M. Effective management of DRAM bandwidth in multicore processors. In Proc. the 16th International Conference on Parallel Architectures and Compilation Techniques, Sept. 2007, pp.245-258. Bitirgen R, Ipek E, Martinez J. Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach. In Proc. the 41st IEEE/ACM International Symposium on Microarchitecture, Nov. 2008, pp.318-329. Liu F, Jiang X, Solihin Y. Understanding how off-chip mem-ory bandwidth partitioning in chip multiprocessors affects sys-tem performance. In Proc. the 16th IEEE International Sym-posium on High Performance Computer Architecture, Jan-uary 2010. Muralidhara S P, Subramanian L, Mutlu O et al. Reduc-ing memory interference in multicore systems via application-aware memory channel partitioning. In Proc. the 44th Inter-national Symposium on Microarchitecture, December 2011, pp.374-385. Liu L, Cui Z, Xing M, Bao Y, Chen M, Wu C. A software memory partition approach for eliminating bank-level interfe-rence in multicore systems. In Proc. the 21st International Conference on Parallel Architectures and Compilation Tech-niques, August 2012, pp.367-376. Thiebaut D, Stone H S. Footprints in the cache. ACM Trans. Computer Systems, 1987, 5(4): 305-329. Sudan K, Chatterjee N, Nellans D, Awasthi M, Balasubramo-nian R, Davis A. Micro-pages: Increasing DRAM effciency with locality-aware data placement. In Proc. the 15th Edi-tion of ASPLOS on Architectural Support for Programming Languages and Operating systems, March 2010, pp.219-230. Luk C K, Cohn R, Muth R et al. Pin: Building customized program analysis tools with dynamic instrumentation. In Proc. the 10th International Conference on Programming Language Design and Implementation, June 2005, pp.190-200.