Zhen-Hao Zhang, Xiao-Yin Wang, Dong Tong, Jiang-Fang Yi, Jun-Lin Lu, Ke-Yi Wang. Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency[J]. Journal of Computer Science and Technology, 2012, 27(4): 769-780. DOI: 10.1007/s11390-012-1263-7

Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency

  • Conventional dynamically scheduled processors often use a fully associative structure, the load/store queue (LSQ), to communicate values between loads and older in-flight stores and to detect store-load order violations. However, this in-flight forwarding accounts for only about 15% of all store-load communication, which makes the CAM-based micro-architecture the major bottleneck to scaling store-load communication further. This paper presents a new micro-architecture named ASW (short for active store window). It introduces a structure named the speculative active store window, which performs store-load forwarding more aggressively and speculatively than a conventional LSQ. This structure can forward the data of committed stores to executing loads without accessing the L1 data cache, which this paper refers to as far forwarding. At the back end of the pipeline, ASW uses in-order load re-execution, filtered by a tagged SSBF (short for store sequence Bloom filter), to verify the correctness of store-load forwarding. The speculative active store window and the tagged store sequence Bloom filter are both set-associative structures, which are more efficient and scalable than fully associative structures. Experiments show that this simpler and faster design outperforms a conventional LSQ-based design and the NoSQ design on most benchmarks, by 10.22% and 8.71% respectively.
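To make the idea of far forwarding through a set-associative structure more concrete, the following is a minimal Python sketch, not the paper's design. The abstract does not specify indexing, replacement, or entry layout, so everything here is an assumption for illustration: the names `StoreEntry`, `StoreWindow`, `record_store`, `forward`, and `load`, the modulo indexing, the FIFO eviction within a set, and the dictionary standing in for the L1 data cache. The back-end verification step (load re-execution filtered by the tagged SSBF) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    tag: int   # upper address bits that identify the address within a set
    ssn: int   # store sequence number (assumed to increase in program order)
    data: int


class StoreWindow:
    """Toy set-associative window holding recent stores (hypothetical model)."""

    def __init__(self, num_sets: int = 4, ways: int = 2) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]

    def _locate(self, addr: int):
        # Assumed mapping: low bits select the set, the rest form the tag.
        return self.sets[addr % self.num_sets], addr // self.num_sets

    def record_store(self, addr: int, ssn: int, data: int) -> None:
        entries, tag = self._locate(addr)
        # Keep one entry per address: a newer store replaces the older one.
        entries[:] = [e for e in entries if e.tag != tag]
        if len(entries) == self.ways:
            entries.pop(0)  # evict the oldest entry in the set (FIFO, an assumption)
        entries.append(StoreEntry(tag, ssn, data))

    def forward(self, addr: int) -> Optional[StoreEntry]:
        entries, tag = self._locate(addr)
        # Only the ways of one set are compared, not every tracked store.
        return next((e for e in entries if e.tag == tag), None)


def load(window: StoreWindow, cache: dict, addr: int):
    """Return (value, forwarded); fall back to the cache model on a window miss."""
    hit = window.forward(addr)
    if hit is not None:
        return hit.data, True
    return cache.get(addr, 0), False


if __name__ == "__main__":
    window, cache = StoreWindow(), {}
    # A committed store updates the cache model and stays visible in the window.
    cache[0x10] = 7
    window.record_store(0x10, ssn=1, data=7)
    print(load(window, cache, 0x10))  # served from the window
    print(load(window, cache, 0x20))  # window miss, served from the cache model
```

The point of the sketch is the lookup cost: a load compares against at most `ways` entries in one set, whereas a fully associative queue compares against every entry. Because entries can be evicted or stale in a real speculative design, a separate verification step such as the re-execution described in the abstract would still be needed.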
