
1 SCIENCES USC INFORMATION INSTITUTE
An Open64-based Compiler Approach to Performance Prediction and Performance Sensitivity Analysis for Scientific Codes
Jeremy Abramson and Pedro C. Diniz
University of Southern California / Information Sciences Institute
4676 Admiralty Way, Suite 1001, Marina del Rey, California 90292

2 Motivation
Performance analysis is conceptually easy: just run the program!
- This gives the "what" of performance. Is that interesting? Is it even realistic?
  - Huge programs with large data sets
  - The "uncertainty principle" and the intractability of profiling/instrumenting
Performance prediction and analysis is in practice very hard
- We are not just interested in wall-clock time; the "why" of performance is a big concern
- How do we accurately characterize program behavior? What about architecture effects?
- Wall-clock time cannot be reused across architectures, but program characteristics can

3 Motivation (2)
What about the future?
- Would a different architecture give better results?
- What about compiler transformations (e.g., loop unrolling)?
We need a fast, scalable, automated way of determining program characteristics
- Determine what causes poor performance
- What does profiling tell us? How can the programmer use low-level profiling information?

4 Overview
- Approach
  - High-level / low-level synergy
  - Not architecture-bound
- Experimental results
  - CG core
- Caveats and future work
- Conclusion

5 Low versus High level information

    la   $r0, a
    lw   $r1, i
    mult $offset, $r1, 4
    add  $offset, $offset, $r0
    lw   $r2, $offset
    add  $r3, $r2, 1
    la   $r4, b
    sw   $r4, $r3

or

    B = A[i] + 1

Which can provide meaningful performance information to a programmer? How do we capture information at a low level while maintaining the structure of the high-level source?

6 Low versus High level information (2)
Drawbacks of looking at the low level
- Too much data!
- You found a "problem" spot. What now? How do programmers relate the information back to the source level?
Drawbacks of looking at the source level
- What about the compiler? The generated code may look very different
- What about architecture impacts?
Solution: look at the high-level structure and try to anticipate the compiler

7 Experimental Approach
Goal: derive performance expectations from source code for different architectures
- What should the performance be, and why?
- What is limiting the performance? Data dependencies? Architecture limitations?
Use high-level information
- The WHIRL intermediate representation in Open64, where arrays are not yet lowered
Construct a data-flow graph (DFG)
- Decorate the graph with latency information
Schedule the DFG
- Compute an as-soon-as-possible (ASAP) schedule
- Variable number of functional units (ALUs, load/store units, registers)
- Pipelining of operations
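The scheduling step above can be sketched in a few lines. The node names and latencies below are illustrative assumptions, and this sketch computes only an unconstrained ASAP schedule; it omits the functional-unit limits and pipelining the tool also models:

```python
# Sketch of an as-soon-as-possible (ASAP) schedule over a latency-annotated
# DFG. Each node starts as soon as all of its predecessors have finished.

def asap_schedule(deps, latency):
    """deps maps node -> list of predecessor nodes; latency maps node -> cycles.
    Returns the earliest start cycle for each node."""
    start = {}
    def earliest(n):
        if n not in start:
            start[n] = max((earliest(p) + latency[p] for p in deps[n]), default=0)
        return start[n]
    for n in deps:
        earliest(n)
    return start

# Toy DFG for B = A[i] + 1: address calculation -> load -> add -> store.
# Latency values are made up for illustration.
deps = {"addr": [], "load": ["addr"], "add": ["load"], "store": ["add"]}
latency = {"addr": 1, "load": 2, "add": 1, "store": 1}
sched = asap_schedule(deps, latency)
total = max(sched[n] + latency[n] for n in deps)  # cycles for one iteration
```

With these toy latencies the critical path is addr (cycle 0), load (1), add (3), store (4), for a 5-cycle iteration; varying the latency table is how different architectural configurations would be explored.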

8 Compilation process
1. Source (C/Fortran):

    for (i; i < 0; ...
      ...
      B = A[i] + 1
      ...

2. Open64 WHIRL (high-level):

    OPR_STID: B
      OPR_ADD
        OPR_ARRAY
          OPR_LDA: A
          OPR_LDID: i
        OPR_CONST: 1

3. Annotated DFG

9 Memory modeling approach
- An array node represents the address calculation at a high level
- If i is a loop induction variable and the array expression is affine in it, assume a cache hit and assign the latency accordingly
- Register hit? Assign a latency of 0
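The affine-hit rule above can be sketched as follows. The tuple encoding of expressions and the latency numbers are illustrative assumptions, not Open64's actual representation:

```python
# If an array subscript is affine in the loop induction variable (a*i + b),
# assume a cache hit and charge the hit latency; otherwise charge a miss.

HIT_LATENCY, MISS_LATENCY = 2, 40  # hypothetical cycle counts

def mentions(expr, ivar):
    """Does the expression tree reference the induction variable?"""
    if expr[0] == 'var':
        return expr[1] == ivar
    return any(mentions(e, ivar) for e in expr[1:] if isinstance(e, tuple))

def is_affine(expr, ivar):
    """expr is a nested tuple tree: ('const', c), ('var', name),
    ('add', l, r), or ('mul', l, r). Affine means a*ivar + b."""
    op = expr[0]
    if op in ('const', 'var'):
        return True
    if op == 'add':
        return is_affine(expr[1], ivar) and is_affine(expr[2], ivar)
    if op == 'mul':
        # Affine only if at least one factor does not mention ivar.
        return not (mentions(expr[1], ivar) and mentions(expr[2], ivar))
    return False

def load_latency(subscript, ivar):
    return HIT_LATENCY if is_affine(subscript, ivar) else MISS_LATENCY

affine = ('add', ('mul', ('const', 4), ('var', 'i')), ('const', 1))  # a[4*i + 1]
quadratic = ('mul', ('var', 'i'), ('var', 'i'))                      # a[i*i]
```

Here `a[4*i + 1]` is charged the hit latency, while `a[i*i]` is not; indirect references such as `y(rowidx(k))` in the CG kernel are exactly the case this simple rule cannot classify as affine.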

10 Example: CG

      do 200 j = 1, n
         xj = x(j)
         do 100 k = colstr(j), colstr(j+1)-1
            y(rowidx(k)) = y(rowidx(k)) + a(k)*xj
  100    continue
  200 continue

11 CG Analysis Results
Figure 4. Validation results of CG on a MIPS R10000 machine
- Prediction results are consistent with the un-optimized version of the code

12 CG Analysis Results (2)
What is the best way to use processor space?
- Pipelined ALUs?
- Replicated standard ALUs?
Figure 5. Cycle time for an iteration of CG with varying architectural configurations

13 Caveats, Future Work
More compiler-like features are needed to improve accuracy
- Control flow
  - Implement trace scheduling
  - Multiple paths can give upper/lower performance bounds
- Simple compiler transformations
  - Common sub-expression elimination
  - Strength reduction
  - Constant folding
- Register allocation
  - "Distance"-based methods?
  - Anticipate cache behavior for spill code
- Software pipelining? Unrolling exploits ILP
Run-time data?
- Array references, loop trip counts, and access patterns from performance skeletons
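Two of the transformations listed above, constant folding and a simple strength reduction, can be sketched on toy expression trees. The tuple encoding is an illustrative assumption, not Open64's IR:

```python
# Constant folding: collapse constant subexpressions.
# Strength reduction: rewrite x * 2^k as x << k.

def fold(expr):
    """Recursively fold constant add/mul subexpressions."""
    if expr[0] in ('const', 'var'):
        return expr
    op, l, r = expr[0], fold(expr[1]), fold(expr[2])
    if l[0] == 'const' and r[0] == 'const':
        return ('const', l[1] + r[1] if op == 'add' else l[1] * r[1])
    return (op, l, r)

def strength_reduce(expr):
    """Replace multiplication by a power of two with a shift."""
    if expr[0] in ('const', 'var'):
        return expr
    op, l, r = expr[0], strength_reduce(expr[1]), strength_reduce(expr[2])
    if op == 'mul' and r[0] == 'const' and r[1] > 0 and r[1] & (r[1] - 1) == 0:
        return ('shl', l, ('const', r[1].bit_length() - 1))
    return (op, l, r)

e = ('mul', ('var', 'i'), ('add', ('const', 3), ('const', 1)))  # i * (3 + 1)
e = fold(e)             # i * 4
e = strength_reduce(e)  # i << 2
```

Applying such rewrites before scheduling matters because a shift and a multiply would be decorated with different latencies in the annotated DFG, changing the predicted cycle count.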

14 Conclusions
- SLOPE provides very fast performance prediction and analysis results
- The high-level approach gives more meaningful information
  - It still tries to anticipate the compiler and the memory hierarchy
- More compiler transformations are to be added
  - Maintain the high-level approach while refining low-level accuracy


