Lecture 8. Profiling - for Performance Analysis - Prof. Taeweon Suh Computer Science Education Korea University COM503 Parallel Computer Architecture &


1 Lecture 8. Profiling - for Performance Analysis -
Prof. Taeweon Suh
Computer Science Education, Korea University
COM503 Parallel Computer Architecture & Programming

2 Korea Univ Performance Analysis
Assuming that the performance of an application is satisfactory in single-threaded mode, the most likely performance question is “Why does my application not get the expected speed-up when running on multiple threads?”
The performance of large-scale parallel applications depends on many factors
  - Load imbalance
  - Parallelization overheads
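To make the load-imbalance factor concrete, here is a minimal sketch in C. It assumes a single parallel region, ignores parallelization overheads, and uses hypothetical per-thread work times; the function name `speedup_with_imbalance` is illustrative, not part of the lecture.

```c
#include <stddef.h>

/* Speed-up of one parallel region whose threads finish at different times.
 * busy[i] is the time thread i spends on useful work (hypothetical
 * measurements). The region ends only when the slowest thread finishes,
 * so the parallel time is max(busy), while the single-threaded time is
 * the sum of all the work. Parallelization overheads are ignored. */
double speedup_with_imbalance(const double *busy, size_t nthreads)
{
    double total = 0.0;
    double slowest = 0.0;

    for (size_t i = 0; i < nthreads; i++) {
        total += busy[i];
        if (busy[i] > slowest)
            slowest = busy[i];
    }
    /* No work at all: report 0 rather than dividing by zero. */
    return slowest > 0.0 ? total / slowest : 0.0;
}
```

With four threads that each work for 1 time unit, the sketch reports a speed-up of 4. If one of the four works for 5 units instead, the other three sit idle while it finishes, and the speed-up drops to 8 / 5 = 1.6.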

3 Korea Univ Profiling
Several approaches can be used to obtain performance data
Sampling
  - Based on periodic OS interrupts (timer interrupts)
  - At each sampling point, performance data such as the program counter, call stacks, and hardware counter data are collected and recorded
  - Less numerically accurate, but allows the target program to run at near full speed
  - Examples: Unix gprof, Sun Performance Analyzer, OProfile
Code instrumentation
  - Calls to a tracing library are inserted in the code by the programmer, the compiler, or a tool
  - These library calls write performance data into a file during program execution
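The sampling idea can be sketched in a few lines of C on a POSIX system. This is an illustration under stated assumptions, not how any particular profiler is implemented: a real tool records the program counter and call stack at each sample, whereas this sketch only records a hand-maintained region label. The names `current_region`, `on_sample`, `spin`, `run_sampled_workload`, and `samples_in` are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction() and setitimer() */
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

enum { REGION_A, REGION_B, NUM_REGIONS };

/* Label of the code region currently executing (set by the workload). */
static volatile sig_atomic_t current_region = REGION_A;
/* Number of sampling points that landed in each region. */
static volatile sig_atomic_t samples[NUM_REGIONS];

/* Runs at every sampling point: record where the program currently is. */
static void on_sample(int signo)
{
    (void)signo;
    samples[current_region]++;
}

/* Burn roughly `seconds` of CPU time; stands in for real work. */
static void spin(double seconds)
{
    clock_t end = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);
    while (clock() < end)
        ;
}

/* Sample a two-phase workload every 10 ms of CPU time.
 * Returns 0 on success, -1 if the handler or timer could not be set up. */
int run_sampled_workload(void)
{
    struct sigaction sa;
    struct itimerval every_10ms = { {0, 10000}, {0, 10000} };
    struct itimerval off = { {0, 0}, {0, 0} };

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;
    if (setitimer(ITIMER_PROF, &every_10ms, NULL) != 0)
        return -1;

    current_region = REGION_A;
    spin(0.1);                  /* short phase */
    current_region = REGION_B;
    spin(0.3);                  /* long phase  */

    setitimer(ITIMER_PROF, &off, NULL);   /* stop sampling */
    return 0;
}

/* Number of samples attributed to a region. */
long samples_in(int region)
{
    return (long)samples[region];
}
```

Because the counts come from periodic samples rather than exact measurements, the ratio between the two regions is only approximate, which is the accuracy trade-off noted above. The workload itself runs at close to full speed between samples.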

4 Korea Univ Pertinent Performance Data
Time spent in user and system level routines
Time spent in serial parts and parallel regions
Time spent in communications
  - Number of invalidations, number of cache-to-cache transfers
Hardware performance counter information such as CPU cycles, instruction-cache (I$) and data-cache (D$) misses
The state of a thread at given times such as waiting for work, synchronizing, forking, and joining
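As one small illustration of the first item, the split between user-level and system-level CPU time can be read on a POSIX system with `getrusage`. This is a minimal sketch; the `cpu_times` struct and the `read_cpu_times` helper are hypothetical names introduced here, and the other items on the list (hardware counters, thread states) need dedicated tools that this sketch does not cover.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* CPU time consumed so far by the calling process, in seconds. */
typedef struct {
    double user_s;    /* time spent in user-level code     */
    double system_s;  /* time spent in the kernel (system) */
} cpu_times;

static double tv_seconds(struct timeval t)
{
    return (double)t.tv_sec + (double)t.tv_usec / 1e6;
}

/* Fill *out with the process's user and system CPU time.
 * Returns 0 on success, -1 if getrusage() fails. */
int read_cpu_times(cpu_times *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->user_s = tv_seconds(ru.ru_utime);
    out->system_s = tv_seconds(ru.ru_stime);
    return 0;
}
```

Calling `read_cpu_times` before and after a region and subtracting the two readings gives the user and system time attributable to that region. For a multi-threaded process, `RUSAGE_SELF` reports the sum over all threads.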

5 Korea Univ gprof
Use GNU gprof to get the profile information
  - Compile and link your code with the -pg option
  - Run your code; gmon.out is generated
  - Run gprof to interpret the information
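The three steps above can be tried on a toy program before moving to a real benchmark. The following is a minimal sketch assuming GCC; the file name `demo.c`, the function names, and the `DEMO_MAIN` macro are placeholders invented for this example. Optimization is turned off so that the two functions are not inlined and show up separately in the profile.

```c
/* Build, run, and profile (GCC assumed; demo.c is a placeholder name):
 *   gcc -pg -O0 -DDEMO_MAIN demo.c -o demo
 *   ./demo                        (writes gmon.out in the current directory)
 *   gprof ./demo gmon.out > demo.txt
 */
#include <stdio.h>

/* Deliberately expensive when n is large: partial harmonic sum. */
double hot_path(long n)
{
    double sum = 0.0;
    for (long i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Deliberately cheap in the demo: called with a much smaller n. */
double cold_path(long n)
{
    double sum = 0.0;
    for (long i = 1; i <= n; i++)
        sum += 0.5 * (double)i;
    return sum;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* hot_path does 1000x more iterations than cold_path. */
    double result = hot_path(200000000L) + cold_path(200000L);
    printf("%f\n", result);
    return 0;
}
#endif
```

In the resulting flat profile, `hot_path` would be expected to account for nearly all of the run time, since it executes a thousand times more loop iterations than `cold_path`.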

6 Korea Univ Testrun Benchmarks
Download a parallel benchmark from
  - http://www.nas.nasa.gov/Resources/Software/npb.html
Download the OpenMP version of NPB (NPB 3)
Compile the BT benchmark
  - Read README.install for information on how to compile the code
  - Edit ‘make.def’ under /config/
      - Change ‘f77’ to ‘gfortran’
      - Add the ‘-pg’ option to FFLAGS and FLINKFLAGS
          FFLAGS = -O -fopenmp -pg
          FLINKFLAGS = -O -fopenmp -pg
  - Compile BT with ‘make BT CLASS=A’
Run simulation with ./bin/BT.A
  - It will generate gmon.out by default in the directory where you run the program
Use gprof to extract the profile information
  - gprof ./bin/BT.A > bt.txt
  - Open bt.txt with any text editor

7 Korea Univ Testrun Benchmarks
Compile the DC benchmark
  - Read README.install for information on how to compile the code
  - Edit ‘make.def’ under /config/
      - Change ‘cc’ to ‘gcc’
      - Add the ‘-pg’ option to CFLAGS and CLINK
          CFLAGS = -O -fopenmp -pg
          CLINK = $(CC) -fopenmp -pg
  - Compile DC with ‘make DC CLASS=A’
Run simulation with ./bin/dc.A.x
  - It will generate gmon.out by default in the directory where you run the program
Use gprof to extract the profile information
  - gprof ./bin/dc.A.x > dc.txt
  - Open dc.txt with any text editor

