RDVIS: A Tool That Visualizes the Causes of Low Locality and Hints Program Optimizations
Kristof Beyls, Erik D’Hollander, Frederik Vandeputte
ICCS 2005 – May 23
1. Motivation
– Many programs suffer from large cache bottlenecks, mainly caused by poor temporal or spatial locality.
– Temporal locality is hard for a compiler to optimize automatically.
– Therefore, the programmer needs help pinpointing the sources of low temporal locality.
3. RDVIS by example: matrix multiplication
– Reuses of a[i*N+k] occur at distance 2^9.
– Reuses of b[k*N+j] occur at distance 2^17.
– How can the reuses of b[k*N+j] be brought closer together? What separates the reuses? What code is executed between them?
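The distances on this slide can be illustrated with a small trace simulation. The sketch below assumes the usual i, j, k loop nest computing c[i*N+j] += a[i*N+k] * b[k*N+j], counts distance as the number of distinct array elements touched between use and reuse, and uses a tiny N. The helper names (reuse_distances, matmul_trace, distances_by_array) are hypothetical and not part of RDVIS, and the numbers are not meant to reproduce the 2^9 and 2^17 measured in the slides.

```python
from collections import defaultdict
from itertools import product

def reuse_distances(trace):
    """Return (address, distance) per access; distance is None on first use.

    Distance = number of distinct addresses touched between use and reuse.
    """
    stack = []  # LRU stack: the most recently used address is last
    result = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            result.append((addr, len(stack) - 1 - pos))
            stack.pop(pos)
        else:
            result.append((addr, None))
        stack.append(addr)
    return result

def matmul_trace(n, order="ijk"):
    """Address trace of c[i*n+j] += a[i*n+k] * b[k*n+j] (assumed loop body)."""
    trace = []
    for outer, middle, inner in product(range(n), repeat=3):
        v = dict(zip(order, (outer, middle, inner)))
        i, j, k = v["i"], v["j"], v["k"]
        trace += [("a", i * n + k), ("b", k * n + j), ("c", i * n + j)]
    return trace

def distances_by_array(n, order="ijk"):
    """Set of observed reuse distances for each array name."""
    per_array = defaultdict(set)
    for (name, _), dist in reuse_distances(matmul_trace(n, order)):
        if dist is not None:
            per_array[name].add(dist)
    return per_array

N = 8
d = distances_by_array(N)
print("a:", sorted(d["a"]))                   # short reuses, about 2*N
print("b:", min(d["b"]), "to", max(d["b"]))   # long reuses, at least N*N - 1
```

In this model the reuses of a are separated by one pass of the k-loop, while the reuses of b are separated by a whole iteration of the i-loop, which is why b shows the much longer distance.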
3. RDVIS by example: matrix multiplication
– Reuses of b[k*N+j] occur between iterations of the i-loop.
– Solution: bring the iterations of the i-loop inwards.
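The effect of moving the i-loop inwards can be checked on a simulated trace. This is a sketch under assumptions: the original nest is taken to be i, j, k, the interchanged nest is taken to be j, k, i (the slides do not state the exact order), and distance is counted in distinct array elements. The function names are hypothetical.

```python
from itertools import product

def max_reuse_distance(trace, array):
    """Largest number of distinct addresses between use and reuse of `array`."""
    stack, worst = [], 0  # LRU stack: the most recently used address is last
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            if addr[0] == array:
                worst = max(worst, len(stack) - 1 - pos)
            stack.pop(pos)
        stack.append(addr)
    return worst

def matmul_trace(n, order):
    """Address trace of c[i*n+j] += a[i*n+k] * b[k*n+j] for a loop order."""
    trace = []
    for outer, middle, inner in product(range(n), repeat=3):
        v = dict(zip(order, (outer, middle, inner)))
        i, j, k = v["i"], v["j"], v["k"]
        trace += [("a", i * n + k), ("b", k * n + j), ("c", i * n + j)]
    return trace

N = 8
for order in ("ijk", "jki"):
    t = matmul_trace(N, order)
    print(order, {name: max_reuse_distance(t, name) for name in "abc"})
```

With the i-loop innermost, each b[k*N+j] is reused in consecutive iterations, so its distance collapses. In this model the long distances move to a[i*N+k] instead.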
3. RDVIS by example: matrix multiplication
– Next to optimize: reuses of a[i*N+k].
3. Matrix multiplication: final result
[Figure legend: L1 cache, L2 cache, main memory]
– Execution time on P4: original 0.740 s, optimized 0.223 s, speedup 3.3.
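The link between reuse distance and the cache levels in the legend can be sketched as follows. In a fully associative LRU cache that holds C lines, a reuse hits exactly when fewer than C distinct lines were touched since the previous use. The capacities below are hypothetical placeholders, not the P4's real parameters, and the function name cache_level is an assumption for illustration. Real caches are set-associative, so this is only an approximation.

```python
def cache_level(distance, capacities):
    """Return the first level whose capacity exceeds the reuse distance."""
    if distance is None:
        return "cold miss"  # first use: no earlier access to reuse
    for name, lines in capacities:
        if distance < lines:
            return name
    return "main memory"

# Hypothetical capacities in cache lines, smallest level first.
CAPACITIES = [("L1 cache", 2**8), ("L2 cache", 2**13)]

for d in (4, 2**9, 2**17):
    print(d, "->", cache_level(d, CAPACITIES))

# Speedup reported on the slide: original time divided by optimized time.
speedup = 0.740 / 0.223
print(round(speedup, 1))
```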
4. Cluster Analysis
– In more complex programs, there can be many arrows.
– Many arrows can often be optimized by the same program transformation.
– Key idea: “When the same code is executed between use and reuse, probably the same program transformation is needed.”
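The key idea can be sketched as grouping arrows by the code executed between use and reuse. The example assumes each arrow carries the set of basic blocks that ran in between, and it groups only arrows whose sets match exactly. That is a simplification: the slides do not describe the clustering criterion RDVIS actually uses. The arrow records and block names are made up for illustration.

```python
from collections import defaultdict

def cluster_by_intermediate_code(arrows):
    """Group reuse arrows whose use and reuse are separated by the same code."""
    clusters = defaultdict(list)
    for arrow in arrows:
        key = frozenset(arrow["between"])  # order of blocks does not matter
        clusters[key].append((arrow["use"], arrow["reuse"]))
    return dict(clusters)

# Hypothetical arrows: source references plus intermediately executed blocks.
arrows = [
    {"use": "x[i] @ line 10", "reuse": "x[i] @ line 20", "between": {"bb3", "bb4"}},
    {"use": "y[i] @ line 11", "reuse": "y[i] @ line 21", "between": {"bb4", "bb3"}},
    {"use": "z[j] @ line 30", "reuse": "z[j] @ line 30", "between": {"bb7"}},
]

clusters = cluster_by_intermediate_code(arrows)
for code, pairs in clusters.items():
    print(sorted(code), pairs)
```

Here the x and y arrows land in one cluster because the same two blocks separate their reuses, so a single transformation of that code is a candidate fix for both.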
4. Cluster Analysis by example: equake
– Many different arrows contribute to long-distance reuse.
4. Cluster Analysis by example: equake
LOOP FUSION!
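Why loop fusion shortens reuse distances can be shown on a generic pair of loops. This is not equake's code: the arrays x and y and both functions are hypothetical, and distance is again counted in distinct array elements.

```python
def max_reuse_distance(trace):
    """Largest number of distinct addresses between any use and its reuse."""
    stack, worst = [], 0  # LRU stack: the most recently used address is last
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            worst = max(worst, len(stack) - 1 - pos)
            stack.pop(pos)
        stack.append(addr)
    return worst

def separate_loops(n):
    """Trace of two loops: the first touches x[i], the second x[i] and y[i]."""
    trace = []
    for i in range(n):
        trace.append(("x", i))
    for i in range(n):
        trace += [("x", i), ("y", i)]
    return trace

def fused_loop(n):
    """Trace of the fused loop: both uses of x[i] sit in the same iteration."""
    trace = []
    for i in range(n):
        trace += [("x", i), ("x", i), ("y", i)]
    return trace

n = 100
print("separate:", max_reuse_distance(separate_loops(n)))
print("fused:   ", max_reuse_distance(fused_loop(n)))
```

In the separate version the rest of the first loop and part of the second run between the two uses of x[i]. After fusion nothing runs between them.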
5. Experimental Results
6. Some Implementation Details
Instrumentation added to GCC 4:
– Exact source location info is added to all abstract syntax tree nodes.
– Source location info is added in the language-specific front-end (currently only C; Fortran is being added).
– Instrumentation occurs in the language-independent middle-end. It:
  – inserts a function call for each memory reference;
  – inserts a function call at the beginning of each basic block;
  – writes out source location info for memory references and basic blocks.
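What the two inserted function calls could feed into can be sketched with a small profiler. Everything here is an assumption for illustration: the class ReuseProfiler and the callbacks on_basic_block and on_memory_reference are hypothetical names, not the RDVIS runtime interface, and the instrumented program is simulated by hand.

```python
class ReuseProfiler:
    """Records, per reuse, its distance and the basic blocks run in between."""

    def __init__(self):
        self.clock = 0       # number of memory references seen so far
        self.last_use = {}   # address -> (time, ref_id) of its latest access
        self.bb_log = []     # (time, bb_id) for every basic block entry
        self.arrows = []     # (use_ref, reuse_ref, distance, blocks between)

    def on_basic_block(self, bb_id):
        # Hypothetical callback at the beginning of each basic block.
        self.bb_log.append((self.clock, bb_id))

    def on_memory_reference(self, ref_id, address):
        # Hypothetical callback for each memory reference.
        if address in self.last_use:
            t_use, use_ref = self.last_use[address]
            # Addresses whose latest access is newer than the previous use.
            between = [a for a, (t, _) in self.last_use.items() if t > t_use]
            blocks = {bb for t, bb in self.bb_log if t > t_use}
            self.arrows.append((use_ref, ref_id, len(between), blocks))
        self.last_use[address] = (self.clock, ref_id)
        self.clock += 1

# Simulated instrumented program: two loops over the same hypothetical array.
p = ReuseProfiler()
for i in range(4):
    p.on_basic_block("loop1_body")
    p.on_memory_reference("x[i] write", ("x", i))
for i in range(4):
    p.on_basic_block("loop2_body")
    p.on_memory_reference("x[i] read", ("x", i))

for arrow in p.arrows:
    print(arrow)
```

The linear scans keep the sketch short. A real tool processing long traces would need a faster data structure for the distance computation.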
7. Conclusion
– The visualization shows reuses at a long distance and the code that is executed between those reuses.
– Clustering by intermediately executed code groups reference pairs that can be optimized with the same program transformation.
Give RDVIS a try: