A Practical Stride Prefetching Implementation in Global Optimizer

1 A Practical Stride Prefetching Implementation in Global Optimizer
Hucheng Zhou, Xing Zhou Tsinghua University 11/28/2018

2 Outline
Introduction
Motivation
Algorithm
Phase Ordering
Prefetching Scheduling
Experiments
Future Work

3 Introduction
What is data prefetching?
  Brings data into the cache ahead of its use
Compiler-controlled prefetching
  Prefetching candidate identification
  Prefetching timing determination
  Unnecessary prefetching elimination
  Other prefetching tuning

4 Introduction
Stride data prefetching
  Massive consecutive memory references
  These lead to many cache misses, and thus poor performance
Our focus
  Compiler-based stride data prefetching

5 Motivation
Dominant stride prefetching algorithm
  Loop Nest Optimizer (LNO) based
LNO-based algorithm
  Locality analysis (reuse analysis, localized iteration space, prefetching predicates)
  Loop splitting (loop peeling and unrolling)
  Scheduling prefetches (iterations ahead of use)
Limitations of the LNO-based approach
Observations

6 LNO based algorithm Example:

7 Limitations
Only effective for affine array references
  Only handles DO loop nests
  Due to the vector space model
  Focuses only on numerical applications that operate on dense matrices
However, not all strided references are affine array references; examples include C++ STL vector traversal and other wrap-around data structures

8 Necessity
Four common ways of traversing an STL vector

9 The Component flow of Open64

10 IR after PRE-OPT
For ACCESS1 and ACCESS2

11 Comparison with array references

12 Comparison
The LNO-based approach exploits its tight affinity with locality analysis and the vector space model to identify the prefetching candidates that suffer from cache misses.
However, this affinity restricts it to affine array references; it cannot handle STL-style stride references.
From another angle, we treat the identification of stride prefetching candidates as induction variable recognition, then exploit phase ordering to avoid unnecessary prefetches.

13 Definitions and Observations
A linear inductive variable (expression) is an expression whose value is incremented by a nonzero integer loop invariant on every iteration.
Lemma 1: a linear inductive expression can be defined recursively:
  If v is a linear induction variable with stride s, then v is a linear inductive expression with the same stride s;
  If expr is a linear inductive expression with stride s, then -expr is a linear inductive expression with stride -s;
  If expr is a linear inductive expression with stride s and invar is a loop invariant, then expr + invar and invar + expr are both linear inductive expressions with stride s;
  If expr1 and expr2 are linear inductive expressions with strides s1 and s2 respectively, then expr1 + expr2 is a linear inductive expression with stride s1 + s2;
  If expr is a linear inductive expression with stride s and invar is a loop invariant, then expr * invar and invar * expr are both linear inductive expressions with stride invar * s;
  If expr is a linear inductive expression with stride s and invar is a loop invariant, then expr / invar is a linear inductive expression with stride s / invar.
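The recursive definition in Lemma 1 maps naturally onto a recursive walk over an expression tree. The following is a minimal sketch of that idea, not the implementation from the talk: the `IndVar`, `Invar`, `Neg` and `BinOp` classes and the `stride_of` function are hypothetical names introduced for illustration, only leaf `Invar` nodes are treated as loop invariants, and a result of zero is rejected because the definition requires a nonzero stride.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class IndVar:
    """A linear induction variable with a known integer stride (hypothetical IR node)."""
    name: str
    stride: int


@dataclass(frozen=True)
class Invar:
    """A loop-invariant integer value (hypothetical IR node)."""
    value: int


@dataclass(frozen=True)
class Neg:
    operand: object


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "*", "/"
    left: object
    right: object


def _nonzero(stride):
    # The definition requires a nonzero stride; zero means "not inductive".
    return stride if stride != 0 else None


def stride_of(expr):
    """Return the stride of expr if Lemma 1 classifies it as linear inductive, else None."""
    if isinstance(expr, IndVar):
        return _nonzero(Fraction(expr.stride))
    if isinstance(expr, Invar):
        return None  # a loop invariant alone is not inductive
    if isinstance(expr, Neg):
        s = stride_of(expr.operand)
        return None if s is None else -s
    if isinstance(expr, BinOp):
        ls, rs = stride_of(expr.left), stride_of(expr.right)
        left_inv = isinstance(expr.left, Invar)
        right_inv = isinstance(expr.right, Invar)
        if expr.op == "+":
            if ls is not None and rs is not None:
                return _nonzero(ls + rs)      # expr1 + expr2: s1 + s2
            if ls is not None and right_inv:
                return ls                     # expr + invar: s
            if rs is not None and left_inv:
                return rs                     # invar + expr: s
        elif expr.op == "*":
            if ls is not None and right_inv:
                return _nonzero(ls * expr.right.value)  # expr * invar
            if rs is not None and left_inv:
                return _nonzero(rs * expr.left.value)   # invar * expr
        elif expr.op == "/":
            if ls is not None and right_inv and expr.right.value != 0:
                return ls / expr.right.value  # expr / invar: s / invar
    return None  # anything else (e.g. expr * expr) is not covered by Lemma 1
```

For example, an address of the form `base + i * 8` with `i` stepping by 1 would be written as `BinOp("+", Invar(1000), BinOp("*", IndVar("i", 1), Invar(8)))`, where 1000 and 8 are made-up values. `Fraction` is used so that the division rule can be expressed exactly as stated in the lemma.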

14 Definitions and Observations
Mathematically, a linear inductive expression is a linear combination of linear induction variables and loop invariants, of the form E = c1*i1 + c2*i2 + … + cn*in + invar, where the stride value follows from applying Lemma 1 to each term.
A stride reference is a reference in a loop whose accessed memory address is incremented by an integer loop invariant on every iteration.
Lemma 2: If the memory address accessed by a reference in a loop can be represented as an inductive expression, then the reference is a stride reference.
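As a small worked check of the linear-combination form, the sketch below simulates a few loop iterations and compares the observed per-iteration increment of E against the value obtained by applying Lemma 1 term by term, namely the sum of each coefficient times the stride of its induction variable. The function names and all numbers are illustrative assumptions, not taken from the slides.

```python
def combined_stride(coeffs, strides):
    """Stride of E = c1*i1 + ... + cn*in + invar, derived term by term from Lemma 1."""
    return sum(c * s for c, s in zip(coeffs, strides))


def simulate(coeffs, inits, strides, invar, iterations):
    """Evaluate E on successive iterations, stepping each induction variable by its stride."""
    current = list(inits)
    values = []
    for _ in range(iterations):
        values.append(sum(c * v for c, v in zip(coeffs, current)) + invar)
        current = [v + s for v, s in zip(current, strides)]
    return values


# Illustrative values only: two induction variables feeding one address expression.
coeffs, inits, strides, invar = [8, 4], [0, 10], [1, 2], 1000
values = simulate(coeffs, inits, strides, invar, iterations=5)
deltas = [b - a for a, b in zip(values, values[1:])]
```

Every entry of `deltas` equals `combined_stride(coeffs, strides)`, which is what makes a reference whose address has this form a stride reference in the sense of Lemma 2.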

15 Speculative Induction Variable Recognition for Stride Prefetching
Thus stride reference identification is equivalent to inductive expression recognition.
We present an algorithm for demand-driven speculative recognition of inductive expressions.

16 Speculative Induction Variable Recognition for Stride Prefetching
An induction variable in SSA form must satisfy the following conditions:
  There must be a live phi in the corresponding loop header BB.
  Of the two operands of the phi, the loop-invariant operand must point to the initialization of the induction variable outside the loop, while the other operand must be defined within the loop body. We call them init and increment respectively.
  After the increment operand of the phi is expanded by copy propagation, the expanded result must contain the result of that phi, with a loop-invariant expression as the stride of the induction variable.
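The three conditions above can be sketched on a toy SSA representation. This is only an illustration of the conditions, not the speculative, demand-driven algorithm from the talk and not Open64 code: the tuple-based IR, the `in_loop` set, and the `recognize` function are all hypothetical, liveness of the phi is assumed rather than checked, and strides are restricted to integer constants.

```python
def recognize(phi_name, defs, in_loop):
    """Return (init, stride) if phi_name is a basic induction variable, else None.

    defs maps an SSA name to ("phi", a, b), ("add", x, y) or ("copy", x);
    operands are SSA names or integer constants. in_loop is the set of SSA
    names defined inside the loop. All of this is a made-up toy IR.
    """
    definition = defs.get(phi_name)
    if definition is None or definition[0] != "phi":
        return None  # condition 1: a phi in the loop header (liveness assumed)

    def invariant(x):
        return isinstance(x, int) or x not in in_loop

    _, a, b = definition
    # Condition 2: exactly one operand comes from outside the loop (init),
    # the other is defined in the loop body (increment).
    if invariant(a) and not invariant(b):
        init, increment = a, b
    elif invariant(b) and not invariant(a):
        init, increment = b, a
    else:
        return None

    def expand(x):
        """Expand x through copies and adds into (coefficient of phi, constant)."""
        if isinstance(x, int):
            return (0, x)
        if x == phi_name:
            return (1, 0)
        if x not in in_loop:
            return None  # symbolic invariants are not handled in this sketch
        d = defs.get(x)
        if d is None:
            return None
        if d[0] == "copy":
            return expand(d[1])
        if d[0] == "add":
            left, right = expand(d[1]), expand(d[2])
            if left is None or right is None:
                return None
            return (left[0] + right[0], left[1] + right[1])
        return None  # other phis or operators: give up

    # Condition 3: the expansion must be exactly phi + nonzero invariant stride.
    expanded = expand(increment)
    if expanded is None or expanded[0] != 1 or expanded[1] == 0:
        return None
    return (init, expanded[1])


# i1 = phi(i0, i3); t1 = i1 + 4; i3 = t1   (i0 is defined before the loop)
defs = {"i1": ("phi", "i0", "i3"), "t1": ("add", "i1", 4), "i3": ("copy", "t1")}
in_loop = {"i1", "t1", "i3"}
result = recognize("i1", defs, in_loop)
```

In this example `result` is `("i0", 4)`: `i0` is the init operand and 4 is the stride recovered by expanding the increment operand back to the phi.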

17 Our algorithm

19 Comparison
Traditional induction variable recognition
  Equivalent to finding strongly connected components
  Only for variables
  Conservative due to aliasing
  Limited by copy propagation
Our algorithm
  Demand driven
  Symbolic interpretation
  Speculative determination
  Requires only a few modifications to the expansion process of the current implementation

20 Phase Ordering
Implementing our algorithm after SSAPRE lets it benefit from the strength reduction and PRE optimizations

21 Prefetching Scheduling
Leading reference determination
Prefetching information collection
  Stride value, data/loop shape, target cache model
Prefetching determination for the candidates
  Based on heuristics such as data and loop size, as well as the number of prefetches in the loop
Computation of prefetching distance
  Memory latency divided by the estimated time per iteration
Loop transformations based on locality information to further reduce the number of prefetches
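The prefetching distance computation can be illustrated with a short sketch. The slide only states a division of memory latency by the estimated time per iteration; rounding the quotient up to a whole number of iterations is an assumption made here, and the function names and cycle counts are hypothetical rather than taken from the implementation or the experiments.

```python
def prefetch_distance(memory_latency_cycles, cycles_per_iteration):
    """Number of iterations ahead to prefetch: latency / time per iteration.

    Rounding up is an assumption of this sketch, so that the prefetch is
    issued at least one full memory latency before the data is used.
    """
    if memory_latency_cycles <= 0 or cycles_per_iteration <= 0:
        raise ValueError("latency and iteration time must be positive")
    # Integer ceiling division.
    return (memory_latency_cycles + cycles_per_iteration - 1) // cycles_per_iteration


def prefetch_offset_bytes(stride_bytes, distance):
    """Byte offset from the current address to the address to prefetch."""
    return stride_bytes * distance


# Illustrative numbers only.
distance = prefetch_distance(memory_latency_cycles=200, cycles_per_iteration=30)
offset = prefetch_offset_bytes(stride_bytes=8, distance=distance)
```

With these made-up numbers the loop would prefetch 7 iterations ahead, that is, 56 bytes past the address of the current reference for an 8-byte stride.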

22 Experiments
We conducted experiments on the SPEC2006 benchmarks.
Platform: quad-processor IA64 server with Itanium 2 Madison 1.6 GHz processors, 6 MB L3 cache, and 8 GB of memory
Operating system: Red Hat Linux Advanced Server 4.0
Compiler: Open64 4.1

23 Normalized results of SPEC2006 FP

24 Normalized results of SPEC2006 INT

25 Conclusion and Future Work
We propose an alternative inductive data prefetching algorithm, implemented in the global optimizer at the O2 level, which can in theory prefetch almost all stride references that can be determined statically at compile time.
Future work:
  Extend the algorithm to prefetch periodic, polynomial, geometric, monotonic and wrap-around variables
  Fully integrate the stride prefetching algorithm with the strength reduction optimization in SSAPRE
  Coordinate data prefetching with data layout optimization
  Further investigate the interaction between software and hardware prefetching, using static compiler analysis and feedback information, on the X86 platform

26 Thanks
Thank you very much. Any questions?

