PMaC Performance Modeling and Characterization: Performance Modeling and Analysis with PEBIL. Michael Laurenzano, Ananta Tiwari, Laura Carrington

1 Performance Modeling and Analysis with PEBIL
Michael Laurenzano, Ananta Tiwari, Laura Carrington
Performance Modeling and Characterization (PMaC) Laboratory, San Diego Supercomputer Center

2 Outline
• Motivation
  – Performance modeling in High Performance Computing (HPC)
  – How does binary instrumentation fit in?
• PEBIL = PMaC’s Efficient Binary Instrumentation for Linux/x86
  – Binary instrumentation overview
  – Use case: memory tracing
  – Use case: function profiling

3 Convolution methods map application signatures to machine profiles to produce a performance prediction
• Machine profile (characteristics of the HPC target system): characterizations of the rates at which a machine can carry out fundamental operations. Measured or projected via simple benchmarks on 1-2 nodes of the system.
• Application signature (requirements of the HPC application): detailed summaries of the fundamental operations to be carried out by the application. Collected via trace tools.
• Performance model: a calculable expression of the runtime, efficiency, memory use, etc. of an HPC program on some machine. PMaC HPC performance models combine the two inputs above to predict the performance of the application on the target system.
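One simple way to read the convolution idea is as a sum, over kinds of fundamental operation, of operation count divided by the machine's rate for that operation. The sketch below is illustrative only: the type names, the function name, and the assumption that operations never overlap in time are not from the slides, and PMaC's actual models are more detailed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not part of PMaC's tools.
struct OperationCount { double count; };       // signature entry: operations of one kind
struct OperationRate  { double per_second; };  // profile entry: machine rate for that kind

// Predict runtime as the sum over operation kinds of count / rate.
// Assumes entry i of both vectors describes the same kind of operation
// and that operations do not overlap in time (a deliberate simplification).
double predict_runtime_seconds(const std::vector<OperationCount>& signature,
                               const std::vector<OperationRate>& profile) {
    assert(signature.size() == profile.size());
    double seconds = 0.0;
    for (std::size_t i = 0; i < signature.size(); ++i) {
        assert(profile[i].per_second > 0.0);
        seconds += signature[i].count / profile[i].per_second;
    }
    return seconds;
}
```

For example, 1e9 operations at 2e9 per second plus 2e8 operations at 1e8 per second gives 0.5 s + 2.0 s under this model.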

4 Application Signature
• Application signature: fundamental operations used by the application
  – Requires low-level details of application
  – Details attached to specific structures within the application
  – Measurement? (e.g. timers or hardware counters)
• Measuring at fine grain with reasonable overheads & transparently is HARD
Use binary instrumentation

5 Binary Instrumentation
• Instrumentation: inserting extra code into a program, usually to inspect some aspect of behavior
• Binary instrumentation: instrumentation of the compiled object/executable
    void incrementby(int& n, int c){
        counter++; // instrumentation code
        n += c;
    }

6 The Case for Binary Instrumentation
• Low-level details of application
  – Program is in its binary form
    • Compilers transform and optimize
    • Basic program structure
    • Memory access
    • Vectorization
    • Data dependencies
• The executable might be all we have
• Easy to tie details to application structures
    int identity(int n){
        int c = 0;
        while (c < n) c++;
        return c;
    }
    int identity(int n){
        return n;
    }

7 Runtime Overhead is a Big Deal
• PEBIL… the E stands for Efficient
• We want to model real HPC applications
  – Relatively long runtimes: minutes, hours, days?
  – Lots of CPUs: O(10^5) in largest supercomputers
  – High slowdowns create problems
    • Too long for queue
    • Unsympathetic administrators/managers
    • Inconvenience
    • Unnecessary use of resources
Mitigate problems by minimizing runtime overhead

8 Example Use Cases
• Memory address trace collection
  – Capture all application loads/stores
    • Use a buffer, batch process them
  – Very widely used
    • Performance/energy models (e.g., PMaC)
    • Cache design
    • Memory bug detection
  – For efficiency, this is often used with sampling
• Function/loop measurement
  – Insert calls to measurement routines around functions/loops
  – TAU uses this feature

9 PEBIL Design
• Efficiency is priority #1
• Designed around a few use cases
  – Execution counting
  – Memory tracing
• Static binary rewriter
  – Write instrumented + runnable executable to disk
    • Keep original behavior intact
    • Gather information as a side-effect
  – Instrument once, run many times
  – No instrumentation cost at runtime
  – Code patching (not just-in-time compiled!)

10 How Binary Instrumentation Works (Basic block counting)
Original:
    0000c000 :
    c000: 48 89 7d f8     mov %rdi,-0x8(%rbp)
    c004: 5e              pop %rsi
    c005: 75 f8           jne 0xc004
    c007: c9              leaveq
    c008: c3              retq
Instrumented:
    0000d000 :
    # Basic Block 1
    d000: e9 de ad be ef  jmp 0x1000   # to instrumentation
    d005: 48 89 7d f8     mov %rdi,-0x8(%rbp)
    # Basic Block 2
    d000: e9 de ad be ef  jmp 0x1010   # to instrumentation
    d00a: 5e              pop %rsi
    d00b: 75 00 00 00 f8  jne 0xd009
    # Basic Block 3
    d000: e9 de ad be ef  jmp 0x1020   # to instrumentation
    d00a: c9              leaveq
    d00b: c3              retq
Instrumentation code (at each jump target):
    // do stuff
    // jump back
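At source level, the effect of basic block counting can be pictured as one counter increment at the head of every basic block. The sketch below is a source-level analogy only, assuming a hypothetical three-block function; PEBIL itself patches machine code with jumps, as in the listing above, and does not require source changes. The names `block_counts`, `count_block`, and `run_loop` are invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical per-block execution counters, one slot per basic block.
constexpr int kNumBlocks = 3;
std::array<std::uint64_t, kNumBlocks> block_counts{};

// Stands in for the "do stuff, jump back" instrumentation snippet.
inline void count_block(int id) {
    assert(id >= 0 && id < kNumBlocks);
    ++block_counts[id];
}

// A function with three basic blocks: entry, loop body, exit.
int run_loop(int n) {
    count_block(0);            // basic block 1: function entry
    int iterations = 0;
    do {
        count_block(1);        // basic block 2: loop body
        ++iterations;
    } while (iterations < n);
    count_block(2);            // basic block 3: function exit
    return iterations;
}
```

After `run_loop(5)`, the entry and exit counters hold 1 and the loop-body counter holds 5, which is the per-block execution profile that counting instrumentation is meant to produce.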

11 Use case: Memory Address Collection
• Collect the address of every load/store issued by the application
  – Put addresses in a buffer, process addresses in batch
    • Fewer function calls
    • Less cache pollution
    for (i = 0; i < n; i++){
        // instrumentation: record the store and load addresses
        if (cur + 2 > BUF_SIZE) clear_buf();
        buffer[cur + 0] = &(A[i]);
        buffer[cur + 1] = &(B[i]);
        cur += 2;
        // original loop body
        A[i] = B[i];
    }
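The buffering scheme on this slide can be sketched as a small self-contained program. This is a minimal illustration, not PEBIL's implementation: the `AddressBuffer` struct, the `record_pair` and `copy_traced` helpers, and the tiny `BUF_SIZE` are assumptions made for the example, and "processing" a batch is reduced to counting addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t BUF_SIZE = 8;  // deliberately tiny so flushes are easy to see

// Hypothetical trace buffer; field names are illustrative.
struct AddressBuffer {
    std::uintptr_t slots[BUF_SIZE] = {};
    std::size_t cur = 0;        // next free slot
    std::size_t processed = 0;  // addresses handed to batch processing so far
    std::size_t flushes = 0;    // number of batch-processing calls
};

// Batch-process the buffered addresses (here: just count them) and reset.
void clear_buf(AddressBuffer& b) {
    b.processed += b.cur;
    ++b.flushes;
    b.cur = 0;
}

// Record one store address and one load address, flushing first if full.
void record_pair(AddressBuffer& b, const void* store_addr, const void* load_addr) {
    if (b.cur + 2 > BUF_SIZE) clear_buf(b);
    b.slots[b.cur + 0] = reinterpret_cast<std::uintptr_t>(store_addr);
    b.slots[b.cur + 1] = reinterpret_cast<std::uintptr_t>(load_addr);
    b.cur += 2;
}

// The slide's loop, with address collection inserted before each iteration's body.
void copy_traced(AddressBuffer& b, std::vector<int>& A, const std::vector<int>& B) {
    assert(A.size() == B.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        record_pair(b, &A[i], &B[i]);
        A[i] = B[i];
    }
}
```

With 10 iterations and an 8-slot buffer, the processing routine is called twice rather than once per address, which is where the "fewer function calls" benefit comes from.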

12 Optimization: Sampling w/ Instrumentation Point Disabling
• Processing addresses is usually expensive
  – Cache simulation (multiple caches), locality analysis, address stream compression
• Use interval-based sampling
  – Process the first X of every Y addresses (Y >= X)
  – Obvious result: reduced processing overhead
  – Not so obvious: reduced collection overhead, by skipping address collection outside the sampled window
• Different approaches
  – PEBIL: swap instrumentation with nops
    • Very lightweight, limited functionality
  – PIN / Dyninst: arbitrarily remove, re-instrument
    • Heavyweight, rich functionality
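The rule "process the first X of every Y addresses" can be sketched as a small counter. The `IntervalSampler` class below is hypothetical and only models the decision logic with a branch; PEBIL's approach, per the slide, is to overwrite the instrumentation with nops so that not even this check is executed while sampling is off.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical interval sampler: keeps the first x of every y addresses.
class IntervalSampler {
public:
    IntervalSampler(std::uint64_t x, std::uint64_t y) : x_(x), y_(y) {
        assert(y_ > 0 && y_ >= x_);
    }

    // Returns true if the next address falls inside the sampled window.
    bool offer() {
        const bool in_window = position_ < x_;
        if (in_window) ++kept_;
        position_ = (position_ + 1) % y_;  // wrap at the end of each interval
        return in_window;
    }

    std::uint64_t kept() const { return kept_; }

private:
    std::uint64_t x_;
    std::uint64_t y_;
    std::uint64_t position_ = 0;  // index of the next address within the interval
    std::uint64_t kept_ = 0;      // total addresses accepted so far
};
```

With X = 2 and Y = 5, offering 12 addresses keeps those at positions 0, 1, 5, 6, 10, and 11, so 6 of the 12 are processed.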

13 Memory Trace Overhead w/ Sampling
[Figure: OpenMP NAS Parallel Benchmarks (8 threads)]

14 Use case: Inserting Profiling Routines
• Insert calls to timers/tracking code around functions and loops
  – Want low overhead, especially where no instrumentation is introduced
    • Small overhead = accurate profile
• “throttle” instrumentation points that are called too frequently
  – Don’t just ignore them, disable them!
  – Collaboration w/ Tuning and Analysis Utilities (TAU) project
    void compute(){                   // function id 0
        profile_begin(0);
        profile_begin(1);
        for (i = 0; i < n; i++){      // loop id 1
            A[i] = B[i];
        }
        profile_end(1);
        profile_begin(2);
        for (i = 0; i < n; i++){      // loop id 2
            A[i] += C[i];
        }
        profile_end(2);
        profile_end(0);
    }
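Throttling can be sketched as a per-point call counter that switches the point off once it has fired too often. Everything below is an assumption made for illustration: the `Profiler` class, the call-count threshold, and the method names are not TAU's or PEBIL's API. The sketch also still pays for a branch on every call, whereas disabling the instrumentation point in the binary, as the slide recommends, removes the call entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical throttling profiler; names and threshold policy are illustrative.
class Profiler {
public:
    Profiler(std::size_t num_points, std::uint64_t throttle_after)
        : calls_(num_points, 0), enabled_(num_points, true),
          throttle_after_(throttle_after) {}

    // Called at the start of an instrumented function or loop.
    void profile_begin(std::size_t id) {
        assert(id < calls_.size());
        if (!enabled_[id]) return;          // throttled: do no further work
        if (++calls_[id] >= throttle_after_) {
            enabled_[id] = false;           // called too often: stop measuring
        }
        // A real profiler would start a timer here.
    }

    // Called at the end of an instrumented function or loop.
    void profile_end(std::size_t id) {
        assert(id < calls_.size());
        if (!enabled_[id]) return;
        // A real profiler would stop the timer and accumulate elapsed time here.
    }

    bool enabled(std::size_t id) const { return enabled_[id]; }
    std::uint64_t calls(std::size_t id) const { return calls_[id]; }

private:
    std::vector<std::uint64_t> calls_;   // begin-calls recorded per point
    std::vector<bool> enabled_;          // false once a point is throttled
    std::uint64_t throttle_after_;       // call count at which a point is disabled
};
```

With a threshold of 100, a loop point entered 1000 times is measured for its first 100 entries and then disabled, while a function point entered once stays enabled.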

15 Contact Info
download: https://github.com/mlaurenzano/PEBIL
email: michaell@sdsc.edu, lcarring@sdsc.edu

