[1] Lai A C, Fide C, Falsafi B. Dead-block prediction & deadblock correlating prefetchers. In Proc. the 28th ISCA, Jun. 2001, pp.144-154.[2] Qureshi M K, Patt Y N. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proc. the 39th MICRO, Dec. 2006, pp.423-432.[3] Qureshi M K, Jaleel A, Patt Y N, Steely Jr S C, Emer J. Adaptive insertion policies for high performance caching. In Proc. the 34th ISCA, Jun. 2007, pp.381-391.[4] Kharbutli M, Solihin D. Counter-based cache replacement and bypassing algorithms. IEEE Transactions on Computers, 2008, 57(4): 433-447.[5] Xie Y, Loh G H. PIPP: Promotion/insertion pseudopartitioning of multi-core shared caches. In Proc. the 36th ISCA, Jun. 2009, pp.174-183.[6] Jaleel A, Theobald K B, Steely Jr S C, Emer J. High performance cache replacement using re-reference interval prediction (RRIP). In Proc. the 37th ISCA, Jun. 2010, pp.60-71.[7] Khan S M, Tian Y, Jimenez D A. Sampling dead block prediction for last-level caches. In Proc. the 43rd MICRO, Dec. 2010, pp.175-186.[8] Wu C J, Jaleel A, Hasenplaugh W, Martonosi M, Steely Jr S C, Emer J. SHiP: Signature-based hit predictor for high performance caching. In Proc. the 44th MICRO, Dec. 2011, pp.430-441.[9] Xie Y. Modeling, architecture, and applications for emerging memory technologies. IEEE Design & Test of Computers, 2011, 28(1): 44-51.[10] Loh G H, Hill M D. Effciently enabling conventional block sizes for very large die-stacked DRAM caches. In Proc. the 44th MICRO, Dec. 2011, pp.454-464.[11] Kroft D. Lockup-free instruction fetch/prefetch cache organization. In Proc. the 8th ISCA, May 1981, pp.81-87.[12] Vanderwiel S P, Lilja D J. Data prefetch mechanisms. ACM Comput. Surv., 2000, 32(2): 174-199.[13] Jeong J, Dubois M. Optimal replacements in caches with two miss costs. In Proc. the 11th SPAA, Jun. 1999, pp.155-164.[14] Jeong J, Stenström P, Dubois M. Simple penalty-sensitive replacement policies for caches. In Proc. the 3rd CF, May 2006, pp.341-352.[15] Ju R D C, Lebeck A R, Wilkerson C. Locality vs. criticality. In Proc. the 28th ISCA, Jun. 2001, pp.132-143.[16] Qureshi M K, Lynch D N, Mutlu O, Patt Y N. A case for MLP-aware cache replacement. In Proc. the 33rd ISCA, Jun. 2006, pp.167-178.[17] Sheikh R, Kharbutli M. Improving cache performance by combining cost-sensitivity and locality principles in cache replacement algorithms. In Proc. the 28th ICCD, Oct. 2010, pp.7683.[18] Kharbutli M, Sheikh R. LACS: A locality-aware cost-sensitive cache replacement algorithm. IEEE Transactions on Computers, 2013, 63(8): 1975-1987.[19] Chaudhuri M. Pseudo-LIFO: The foundation of a new family of replacement policies for last-level caches. In Proc. the 42nd MICRO, Dec. 2009, pp.401-412.[20] Keramidas G, Petoumenos P, Kaxiras S. Cache replacement based on reuse-distance prediction. In Proc. the 25th ICCD, Oct. 2007, pp.245-250.[21] Wu C J, Jaleel A, Martonosi M, Steely Jr S C, Emer J. PACMan: Prefetch-aware cache management for high performance caching. In Proc. the 44th MICRO, Dec. 2011, pp.442-453.[22] Duong N, Zhao D, Kim T, Cammarota R, Valero M, Veidenbaum A V. Improving cache management policies using dynamic reuse distances. In Proc. the 45th MICRO, Dec. 2012, pp.389-400.[23] Hu Z, Kaxiras S, Martonosi M. Timekeeping in the memory system: Predicting and optimizing memory behavior. In Proc. the 29th ISCA, May 2002, pp.209-220.[24] Liu H, Ferdman M, Huh J, Burger D. Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency. In Proc. the 41st MICRO, Nov. 2008, pp.222-233.[25] Jalminger J, Stenstrom P. A novel approach to cache block reuse predictions. In Proc. the 2003 ICPP, Oct. 2003, pp.294-302.[26] Johnson T L, Connors D A, Merten M C, Hwu W M W. Run-time cache bypassing. IEEE Transactions on Computers, 1999, 48(12): 1338-1354.[27] Rivers J A, Davidson E S. Reducing conflicts in directmapped caches with a temporality-based design. In Proc. the 1996 ICPP, Aug. 1996, Vol. 1, pp.154-163.[28] Rivers J A, Tam E S, Tyson G S, Davidson E S, Farrens M. Utilizing reuse information in data cache management. In Proc. the 12th ICS, Jul. 1998, pp.449-456.[29] John L K, Subramanian A. Design and performance evaluation of a cache assist to implement selective caching. In Proc. the 1997 ICCD, Oct. 1997, pp.510-518.[30] Walsh S J, Board J A. Pollution control caching. In Proc. the 1995 ICCD, Oct. 1995, pp.300-306.[31] Chi C H, Dietz H. Improving cache performance by selective cache bypass. In Proc. the 22nd HICSS, Jan. 1989, Vol. 1, pp.277-285.[32] González, A, Aliagas C, Valero M. A data cache with multiple caching strategies tuned to different types of locality. In Proc. the 9th ICS, Jul. 1995, pp.338-347.[33] Tyson G, Farrens M, Matthews J, Pleszkun A R. A modified approach to data cache management. In Proc. the 28th MICRO, Dec. 1995, pp.93-103.[34] Xiang L, Chen T, Shi Q, Hu W. Less reused filter: Improving L2 cache performance via filtering less reused lines. In Proc. the 23rd ICS, Jun. 2009, pp.68-79.[35] Gao H, Wilkerson C. A dueling segmented LRU replacement algorithm with adaptive bypassing. In Proc. the 1st JWAC, Jun. 2010.[36] Gaur J, Chaudhuri M, Subramoney S. Bypass and insertion algorithms for exclusive last-level caches. In Proc. the 38th ISCA, Jun. 2011, pp.81-92.[37] Li L, Tong D, Xie Z, Lu J, Cheng X. Optimal bypass monitor for high performance last-level caches. In Proc. the 21st PACT, Sept. 2012, pp.315-324.[38] Jaleel A, Hasenplaugh W, Qureshi M, Sebot J, Steely Jr S, Emer J. Adaptive insertion policies for managing shared caches. In Proc. the 17th PACT, Oct. 2008, pp.208-219.[39] Manikantan R, Rajan K, Govindarajan R. NUcache: An effcient multicore cache organization based on next-use distance. In Proc. the 17th HPCA, Feb. 2011, pp.243-253.[40] Sanchez D, Kozyrakis C. Vantage: Scalable and effcient finegrain cache partitioning. In Proc. the 38th ISCA, Jun. 2011, pp.57-68.[41] Manikantan R, Rajan K, Govindarajan R. Probabilistic shared cache management (PriSM). In Proc. the 39th ISCA, Jun. 2012, pp.428-439.[42] Hsu L R, Reinhardt S K, Iyer R, Makineni S. Communist, utilitarian, and capitalist cache policies on CMPs: Caches as a shared resource. In Proc. the 15th PACT, Sept. 2006, pp.13-22.[43] Iyer R. CQoS: A framework for enabling QoS in shared caches of CMP platforms. In Proc. the 18th ICS, Jun. 2004, pp.257266.[44] Kim S, Chandra D, Solihin Y. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proc. the 13th PACT, Sept. 2004, pp.111-122.[45] Jeong J, Dubois M. Cache replacement algorithms with nonuniform miss costs. IEEE Transactions on Computers, 2006, 55(4): 353-365.[46] Jeong J, Dubois M. Cost-sensitive cache replacement algorithms. In Proc. the 9th HPCA, Feb. 2003, pp.327-337.[47] Moreto M, Cazorla F, Ramirez A, Valero M. MLP-aware dynamic cache partitioning. In Proc. the 3rd HiPEAC, Jan. 2008, pp.337-352.[48] Kaseridis D, Iqbal M, John L. Cache friendliness-aware management of shared last-level caches for high performance multicore systems. IEEE Transactions on Computers, 2014, 63(4): 874-887.[49] Lee H H S, Tyson G S, Farrens M K. Eager writeback——A technique for improving bandwidth utilization. In Proc. the 33rd MICRO, Dec. 2000, pp.11-21.[50] Lee C J, Narasiman V, Ebrahimi E, Mutlu O, Patt Y N. DRAM-aware last-level cache writeback: Reducing writecaused interference in memory systems. Technical Report, TR-HPS-2010-002, High Performance Systems Group, Department of Electrical and Computer Engineering, The University of Texas at Austin & Department of Electrical and Computer Engineering, Carnegie Mellon University, April 2010.[51] Stuecheli J, Kaseridis D, Daly D, Hunter H C, John L K. The virtual write queue: Coordinating DRAM and last-level cache policies. In Proc. the 37th ISCA, Jun. 2010, pp.72-82.[52] Wang Z, Khan S M, Jiménez D A. Improving writeback efficiency with decoupled last-write prediction. In Proc. the 39th ISCA, Jun. 2012, pp.309-320.[53] Lee C J, Ebrahimi E, Narasiman V, Mutlu O , Patt Y N. DRAM-aware last-level cache replacement. Technical Report, TR-HPS-2010-007, High Performance Systems Group, Department of Electrical and Computer Engineering, The University of Texas at Austin & Department of Electrical and Computer Engineering, Carnegie Mellon University, Dec. 2010.[54] HP. Inside the Intelr® Itaniumr® 2 processor. HP Technical White Paper, July 2002. http://www.dig64.org/about/Itanium2 white paper public.pdf, Oct. 2014.[55] Oracle. UltraSPARC T2 supplement to the UltraSPARC architecture 2007. Draft D1.4.3, Sept. 2007. http://www.oracle.com/technetwork/systems/opensparc/t2-14-ust2-uasuppl -draft-hp-ext-1537761.html, Oct. 2014.[56] Jaleel A, Borch E, Bhandaru M, Steely Jr S C, Emer J. Achieving non-inclusive cache performance with inclusive caches: Temporal locality aware (TLA) cache management policies. In Proc. the 43rd MICRO, Dec. 2010, pp.151-162.[57] Martin M M K, Hill M D, Sorin D J. Why on-chip cache coherence is here to stay. Commun. ACM, 2012, 55(7): 78-89.[58] Albericio J, Ibáñez P, Viñals V, Llabería J M. Exploiting reuse locality on inclusive shared last-level caches. ACM Trans. Archit. Code Optim., 2013, 9(4): Article No. 38.[59] Binkert N, Beckmann B, Black G, Reinhardt S K, Saidi A, Basu A, Hestness J, Hower D R, Krishna T, Sardashti S, Sen Ling-Da Li et al.: Retention Benefit Based Intelligent Cache Replacement R, Sewell K, Shoaib M, Vaish N, Hill M D, Wood D A. The gem5 simulator. SIGARCH Comput. Archit. News, 2011, 39(2): 1-7.[60] Henning J L. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 2006, 34(4): 1-17.[61] Perelman E, Hamerly G, Biesbrouck M V, Sherwood T, Calder B. Using SimPoint for accurate and effcient simu- lation. SIGMETRICS Perform. Eval. Rev., 2003, 31(1): 318-319. |