Qiong Zou, Xiao-Feng Li, Long-Bing Zhang. Runtime Engine for Dynamic Profile Guided Stride Prefetching[J]. Journal of Computer Science and Technology, 2008, 23(4): 633-643.

Runtime Engine for Dynamic Profile Guided Stride Prefetching

  • Stride prefetching is recognized as an important technique for improving memory access performance. Prior work usually profiles and/or analyzes program behavior offline, and uses the identified stride patterns to guide compilation by injecting prefetch instructions at appropriate places. Some studies have tried to enable stride prefetching in runtime systems with online profiling, but they either cannot discover cross-procedural prefetch opportunities or require special support in hardware or garbage collection. In this paper, we present a prefetch engine for the JVM (Java Virtual Machine). It first identifies candidate load operations during just-in-time (JIT) compilation, and then instruments the compiled code to profile the addresses of those loads. The runtime profile is collected in a trace buffer, which triggers a prefetch controller upon a protection fault. The prefetch controller analyzes the trace to discover any stride patterns, then modifies the compiled code to inject prefetch instructions in place of the instrumentation. One major advantage of this engine is that it can detect striding loads at arbitrary code locations, in both regular and irregular code, without being limited to plain loop or procedure scopes. In fact, we found that cross-procedural patterns account for about 30% of all prefetches in representative Java benchmarks. Another major advantage is that the engine's runtime overhead (less than 4.0% at most) is much smaller than the benefit it brings. Our evaluation with the Apache Harmony JVM shows that the engine achieves an average 6.2% speed-up on SPECJVM98 and DaCapo on an Intel Pentium 4 platform, in spite of the runtime overhead.
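
The trace-analysis step described in the abstract can be illustrated with a minimal sketch. It is not the paper's controller: the class name `StrideDetector`, the method names, the dominance threshold, and the prefetch distance are all hypothetical, and the sketch assumes the trace for one load site is already available as an array of addresses. It looks for a single delta between consecutive addresses that dominates the trace and, if one exists, computes the address a prefetch for that load might target.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of stride discovery over a per-load address trace.
final class StrideDetector {

    // Returns the dominant non-zero stride in the trace, or 0 if no single
    // delta accounts for at least `threshold` (a fraction in (0.5, 1.0]) of
    // all consecutive-address deltas.
    static long dominantStride(long[] addresses, double threshold) {
        if (addresses.length < 2) {
            return 0; // too few samples to form a delta
        }
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 1; i < addresses.length; i++) {
            counts.merge(addresses[i] - addresses[i - 1], 1, Integer::sum);
        }
        long best = 0;
        int bestCount = 0;
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        int deltas = addresses.length - 1;
        // A zero stride means the same address repeats; nothing to prefetch.
        if (best == 0 || bestCount < threshold * deltas) {
            return 0;
        }
        return best;
    }

    // Address a prefetch could target: `distance` strides ahead of the
    // address currently being loaded (the distance is an assumed tuning knob).
    static long prefetchAddress(long current, long stride, int distance) {
        return current + stride * distance;
    }
}
```

For example, a trace of `0, 64, 128, 192, 1000, 1064` has four of its five deltas equal to 64, so with a threshold of 0.75 the sketch reports a stride of 64 even though one access breaks the pattern. Requiring a threshold above 0.5 guarantees that at most one delta can qualify, so the result does not depend on map iteration order.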
