A Dynamic Tracing Mechanism For Performance Analysis of OpenMP Applications - Caubet, Gimenez, Labarta, DeRose, Vetter (WOMPAT 2001) - Presented by Anita.


1 A Dynamic Tracing Mechanism For Performance Analysis of OpenMP Applications - Caubet, Gimenez, Labarta, DeRose, Vetter (WOMPAT 2001) - Presented by Anita Nagarajan

2 Introduction
OpenMP
– Standard for shared memory parallel programming
– Set of directives and library routines for Fortran and C/C++
Performance Tools
– Need: Analyse parallel behaviour. Determine causes for OpenMP application performance problems.
– Properties: Minimize intrusion cost, maximize performance data captured

3 Introduction (Contd.)
Dynamic Instrumentation
– Instrument the application while it is executing; recompilation is not required.
Dynamic Probe Class Library (DPCL)
– Developed at IBM, built on top of the Dyninst API.
– Using DPCL, a performance tool “attaches” to the application, “inserts code patches” into the binary, and “starts/continues” its execution.
– Program instrumentation can be done at “function entry points”, “exit points” and “call sites”.
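The attach / insert / continue idea can be illustrated with a small analogy. The sketch below does not use DPCL or Dyninst and does not patch a binary; it only rebinds a Python function at run time so that entry and exit probes fire around it, without editing the function's source. The names `attach_probes`, `detach_probes`, `app` and `work` are hypothetical and exist only for this illustration.

```python
import time
from types import SimpleNamespace


def attach_probes(target, func_name, trace):
    """Wrap target.func_name with entry/exit probes at run time."""
    original = getattr(target, func_name)

    def instrumented(*args, **kwargs):
        # Entry probe: record an event before the original code runs.
        trace.append(("entry", func_name, time.perf_counter()))
        try:
            return original(*args, **kwargs)
        finally:
            # Exit probe: record an event even if the call raises.
            trace.append(("exit", func_name, time.perf_counter()))

    setattr(target, func_name, instrumented)
    return original  # handle needed to remove the probes later


def detach_probes(target, func_name, original):
    """Restore the uninstrumented function."""
    setattr(target, func_name, original)


# Hypothetical stand-in for an already running application.
app = SimpleNamespace(work=lambda n: sum(i * i for i in range(n)))

trace = []
original = attach_probes(app, "work", trace)  # "attach" and "insert"
result = app.work(1000)                        # "continue" execution
detach_probes(app, "work", original)
app.work(10)                                   # no longer traced
```

After the run, `trace` holds one entry event and one exit event for `work`, each with a timestamp; the second call leaves no record because the probes were removed.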

4 DPCL
DPCL consists of
– Client library
– Runtime library
– Daemon
– Super-daemon

5 OMPtrace
– Built on top of DPCL.
– The IBM compiler translates OpenMP directives into function calls.

6 Translation of OpenMP Directives

7 OMPtrace

8 OMPtrace(Contd.)…

9 Hardware Counters
– OMPtrace can access hardware counters and provide statistics on hardware events, e.g. L1/L2 hits, L1/L2 misses, number of instructions.
Paraver
– Computes “Derived Metrics” from hardware events, e.g. L1 misses per second.
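A derived metric such as "L1 misses per second" is a counter difference divided by a time difference. The sketch below shows that arithmetic only; it is not Paraver's implementation, and the sample format (timestamp in seconds, cumulative count) and the sample values are made up for illustration.

```python
def rate_per_second(samples):
    """Turn cumulative counter samples into per-interval rates.

    samples: list of (timestamp_seconds, cumulative_count), sorted by time.
    Returns one rate (events per second) per consecutive pair of samples.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((c1 - c0) / dt)
    return rates


# Hypothetical L1-miss counter readings, not data from the paper.
l1_miss_samples = [(0.0, 0), (0.5, 1000), (1.0, 4000), (2.0, 5000)]
rates = rate_per_second(l1_miss_samples)
```

Any other counter pair can be combined the same way, for example misses divided by instructions instead of by elapsed time.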

10 Case Study: Sweep3D
– Multidimensional wavefront algorithm for “discrete ordinates” deterministic particle transport simulation.
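The wavefront idea can be sketched on a generic 2D grid. This is an assumption-laden illustration, not Sweep3D's actual loop structure: it assumes cell (j, k) depends only on (j-1, k) and (j, k-1), so all cells on the same anti-diagonal j + k = d are independent and could be computed in parallel once diagonal d-1 is done. The function name `wavefronts` is hypothetical.

```python
def wavefronts(nj, nk):
    """Yield the cells of an nj-by-nk grid grouped by anti-diagonal.

    Under the assumed dependency pattern, every cell in one yielded list
    depends only on cells from the previously yielded list.
    """
    for d in range(nj + nk - 1):
        j_lo = max(0, d - nk + 1)
        j_hi = min(nj, d + 1)
        yield [(j, d - j) for j in range(j_lo, j_hi)]


fronts = list(wavefronts(3, 2))
for step, cells in enumerate(fronts):
    print(step, cells)
```

The first and last fronts contain a single cell each, which is why a wavefront sweep has limited parallelism while the pipeline fills and drains.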

11 Sweep3D (Contd.)
– diag: original version of Sweep3D.
– mkj: “do idiag” and “do jkm” loops replaced by a triple nested loop (“do m”, “do k”, “do j”).
– ccrit: based on “mkj”, outer loop parallelized, synchronization implemented using the “CRITICAL” directive.
– cpipe: based on “mkj”, outer loop parallelized, synchronization implemented using shared arrays and busy waiting.

12 Results from Experiments
Elapsed time in seconds for the different OpenMP versions

version      1      2      3      4      5      6     12
Ccrit    28.26  24.41  26.84  26.47  29.28  30.34  30.43
Cpipe    25.63  18.45  13.01  12.53  10.06   7.67   7.76
Diag     17.28  13.09  11.40   9.64   8.50   7.78   6.55
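The scaling behaviour in these timings is easier to read as speedup and efficiency. The sketch below assumes the column headings 1 to 12 are processor (or thread) counts, which the slide does not state explicitly; speedup is taken relative to each version's own time in the first column.

```python
# Assumption: column headings are processor/thread counts.
counts = [1, 2, 3, 4, 5, 6, 12]

# Elapsed times in seconds, copied from the table on this slide.
elapsed = {
    "ccrit": [28.26, 24.41, 26.84, 26.47, 29.28, 30.34, 30.43],
    "cpipe": [25.63, 18.45, 13.01, 12.53, 10.06, 7.67, 7.76],
    "diag":  [17.28, 13.09, 11.40, 9.64, 8.50, 7.78, 6.55],
}


def speedup(times):
    """Speedup relative to the first column: T(1) / T(p)."""
    return [times[0] / t for t in times]


def efficiency(times, procs):
    """Parallel efficiency: speedup divided by the processor count."""
    return [s / p for s, p in zip(speedup(times), procs)]


for name, times in elapsed.items():
    s = speedup(times)
    e = efficiency(times, counts)
    print(f"{name}: speedup at {counts[-1]} = {s[-1]:.2f}, "
          f"efficiency = {e[-1]:.2f}")
```

Under that assumption, the last column gives a speedup of about 0.93 for ccrit (slower than its own first-column run), about 3.3 for cpipe, and about 2.6 for diag.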

13 Analysis of Results using Paraver
Ccrit
– Not scalable
– Overhead of mutex lock and unlock, contention
– Red: Trying to obtain lock
– Blue: Using lock
– Green: Release lock
– Light Blue: Execution outside critical section

14 Cpipe
– Better performance than ccrit.
– Poor locality because the “m” loop has an iteration count of 6.

15 Diag
– Limited scalability due to high number of L2 misses
– Blue: Large values
– Green: Low values

16 Optimization
kjmi
– Interchange loops
– Good performance, better scalability

version      1      2      3      4      5      6     12
kjmi     14.86  10.01   7.35   5.82   4.89   3.62   2.88
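To put "better scalability" in numbers, the kjmi timings can be compared column by column with the diag timings from slide 12. As before, this sketch assumes the column headings are processor (or thread) counts; the variable names are illustrative only.

```python
# Assumption: column headings are processor/thread counts.
counts = [1, 2, 3, 4, 5, 6, 12]

# Elapsed times in seconds, copied from slides 12 (diag) and 16 (kjmi).
diag = [17.28, 13.09, 11.40, 9.64, 8.50, 7.78, 6.55]
kjmi = [14.86, 10.01, 7.35, 5.82, 4.89, 3.62, 2.88]

# How many times faster kjmi is than diag in each column.
gain_over_diag = [d / k for d, k in zip(diag, kjmi)]

# Speedup of kjmi relative to its own first-column time.
kjmi_speedup = [kjmi[0] / t for t in kjmi]

for p, g, s in zip(counts, gain_over_diag, kjmi_speedup):
    print(f"{p:>2}: kjmi is {g:.2f}x faster than diag, "
          f"kjmi self-speedup {s:.2f}")
```

The advantage of kjmi over diag grows from roughly 1.2x in the first column to roughly 2.3x in the last, and kjmi's own speedup in the last column is about 5.2.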

17 Conclusions
– OMPtrace and Paraver form a useful tool for performance analysis and optimization of OpenMP applications.

