Zhen-Hao Zhang, Xiao-Yin Wang, Dong Tong, Jiang-Fang Yi, Jun-Lin Lu, Ke-Yi Wang. Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency[J]. Journal of Computer Science and Technology, 2012, 27(4): 769-780. DOI: 10.1007/s11390-012-1263-7

A Scalable Far Store-Load Forwarding Mechanism Based on the Active Store Window

Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency

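  • Summary: Conventional superscalar processors typically use a fully associative load/store queue (LSQ) to implement store-load forwarding between memory instructions and to detect store-load violations. However, because CAM (content-addressable memory) structures do not scale, a micro-architecture built on fully associative search struggles to support long-range store-load forwarding, and it has gradually become the main bottleneck to enlarging the instruction window further.
    This paper proposes a speculative store-load forwarding technique based on an active store window. Through the active store window, the data of already committed stores can be forwarded to executing loads, so that most loads obtain the correct data within one clock cycle. The paper calls this case, which differs from conventional store-load forwarding, far store-load forwarding. The LSQ of a conventional superscalar processor cannot perform far store-load forwarding, so the data of a committed store can only be obtained by accessing the L1 data cache or an even lower level of the cache hierarchy, and in current superscalar designs each of these accesses takes more than one clock cycle. The proposed speculative forwarding technique therefore effectively improves the execution efficiency of store instructions and, with it, the memory-access performance of the processor.
    To detect store-load violations effectively, the paper also adopts in-order, value-based load re-execution and uses an SSBF (store sequence Bloom filter) to filter out unnecessary load re-executions. Both the active store window and the SSBF use set-associative structures, which scale better. Experiments show that this simpler and more scalable design outperforms the conventional LSQ structure by 10.22% on average.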
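
The filtering step can be illustrated with a small model. The following is a minimal sketch, not the paper's design: it assumes each committed store carries a monotonically increasing sequence number (`ssn`), that each load knows the youngest store its value is already known to reflect (`safe_ssn`), and that a load is re-executed only when the filter has seen a younger store to the same address. The class name `TaggedSSBF`, the 8-byte word granularity, the set count, and the replacement policy are all hypothetical.

```python
class TaggedSSBF:
    """Toy set-associative, tagged store sequence Bloom filter (hypothetical layout)."""

    def __init__(self, num_sets=64, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # Each set holds up to `ways` (tag, ssn) pairs, least recently written first.
        self.sets = [[] for _ in range(num_sets)]
        # Youngest SSN ever evicted from each set: a conservative bound on a tag miss.
        self.evicted_ssn = [0] * num_sets

    def _index_tag(self, addr):
        word = addr >> 3  # assumption: 8-byte words
        return word % self.num_sets, word // self.num_sets

    def record_store(self, addr, ssn):
        """Called when a store commits: remember the SSN of the last writer."""
        idx, tag = self._index_tag(addr)
        entries = self.sets[idx]
        entries[:] = [e for e in entries if e[0] != tag]  # drop the stale entry
        if len(entries) == self.ways:
            _, old_ssn = entries.pop(0)  # evict the least recently written entry
            self.evicted_ssn[idx] = max(self.evicted_ssn[idx], old_ssn)
        entries.append((tag, ssn))

    def must_reexecute(self, addr, safe_ssn):
        """True if a store younger than `safe_ssn` may have written `addr`."""
        idx, tag = self._index_tag(addr)
        for entry_tag, ssn in self.sets[idx]:
            if entry_tag == tag:
                return ssn > safe_ssn
        # Tag miss: fall back to the conservative per-set bound.
        return self.evicted_ssn[idx] > safe_ssn
```

In this model a tag hit gives an exact answer for that address, while a tag miss can only err toward an unnecessary re-execution, never toward skipping a required one.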

     

    Abstract: Conventional dynamically scheduled processors often use a fully associative structure, the load/store queue (LSQ), to communicate values from older in-flight stores to loads and to detect store-load order violations. But this in-flight forwarding accounts for only about 15% of all store-load communication, which makes the CAM-based micro-architecture the major bottleneck to scaling store-load communication further. This paper presents a new micro-architecture named ASW (short for active store window). It provides a new structure, the speculative active store window, to implement more aggressive speculative store-load forwarding than a conventional LSQ. This structure can forward the data of committed stores to executing loads without accessing the L1 data cache, which is referred to as far forwarding in this paper. At the back end of the pipeline, it uses in-order load re-execution filtered by the tagged SSBF (short for store sequence Bloom filter) to verify the correctness of the store-load forwarding. The speculative active store window and the tagged store sequence Bloom filter are both set-associative structures that are more efficient and scalable than fully associative structures. Experiments show that this simpler and faster design outperforms a conventional load/store queue based design and the NoSQ design on most benchmarks by 10.22% and 8.71% respectively.
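
To make far forwarding concrete, here is a minimal software sketch of a set-associative store window that keeps store data after commit and serves loads from it before falling back to memory. It is an illustration under stated assumptions, not the ASW hardware: the names `ActiveStoreWindow` and `execute_load`, the 8-byte word granularity, the set and way counts, the replacement policy, and the dictionary standing in for the data cache are all hypothetical.

```python
class ActiveStoreWindow:
    """Toy set-associative window of recent stores (hypothetical layout)."""

    def __init__(self, num_sets=64, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        # Each set holds up to `ways` (tag, value, ssn) entries, oldest write first.
        self.sets = [[] for _ in range(num_sets)]

    def _index_tag(self, addr):
        word = addr >> 3  # assumption: 8-byte aligned, word-sized accesses
        return word % self.num_sets, word // self.num_sets

    def write(self, addr, value, ssn):
        """Record a store; the entry stays in the window after the store commits."""
        idx, tag = self._index_tag(addr)
        entries = self.sets[idx]
        entries[:] = [e for e in entries if e[0] != tag]  # keep only the newest writer
        if len(entries) == self.ways:
            entries.pop(0)  # evict the least recently written entry
        entries.append((tag, value, ssn))

    def lookup(self, addr):
        """Return (value, ssn) of the latest recorded store to `addr`, or None."""
        idx, tag = self._index_tag(addr)
        for entry_tag, value, ssn in self.sets[idx]:
            if entry_tag == tag:
                return value, ssn
        return None


def execute_load(addr, window, memory):
    """Return (value, forwarding_ssn, forwarded) for a load.

    A window hit is speculative: a real pipeline would still verify it later,
    for example by filtered re-execution at commit.
    """
    hit = window.lookup(addr)
    if hit is not None:
        value, ssn = hit
        return value, ssn, True
    # Window miss: fall back to the (slower) data cache, modelled as a dict.
    return memory.get(addr, 0), None, False
```

An indexed lookup of this kind touches only one set per load, which is the property that lets the structure grow without the all-entries comparison a CAM requires.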

     
