1 Supplementary Slides S.1 Empirical Study of Parallel Programs
– Measuring execution time
– Visualizing execution trace
– Debugging
– Optimization strategies

2 Supplementary Slides S.2 Empirical Study of Parallel Programs (cont'd)
Objective
– An initiation into empirical analysis of parallel programs
– By example: number summation
– Basis for coursework
Outcome: ability to
– Follow the same steps to measure simple parallel programs
– Explore the detailed functionality of the tools
– Gain better insight into, and explain, the behavior of parallel programs
– Optimize parallel programs
– Use similar tools for program measurements

3 Supplementary Slides S.3 Homework Contract
Requirements
– A number generator program
– Assemble and compile the homework program
– Instrument the homework program with MPI timing functions
– A file management script
Deliverables
– Speedup (and linear speedup) graph plots (on the same page) showing # processors against problem size
– A file of raw execution times of the form: data size, # processors, execution time
– Jumpshot visualization graphs
– A report explaining your work, especially the instrumentation, the speedup graphs and the Jumpshot graphs
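The raw-times file and the speedup computation behind the graph plots can be sketched in shell, matching the "data size, # processors, execution time" format above. The file contents and timings here are illustrative values, not measured data:

```shell
# Hypothetical raw execution times: data size, # processors, time in seconds
cat > times.txt <<'EOF'
1000 1 0.80
1000 2 0.45
1000 4 0.26
EOF

# Speedup for p processors is T(1) / T(p); the first line supplies T(1)
awk 'NR==1 {t1=$3} {printf "%d procs: speedup %.2f\n", $2, t1/$3}' times.txt
```

Plotting these against the linear-speedup line (speedup = p) on the same page then satisfies the first deliverable.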

4 Supplementary Slides S.4 Execution Time: Number Generator Program

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(int argc, char **argv)
{
    int i;
    FILE *fp;
    if (argc != 4) {
        printf("randFile filename #ofValues powerOfTwo\n");
        return -1;
    }
    srand(clock());
    fp = fopen(argv[1], "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%d\n", atoi(argv[2]));
    for (i = 0; i < atoi(argv[2]); i++)
        fprintf(fp, "%d\n", rand() % (int)pow(2, atoi(argv[3])));
    fclose(fp);
    return 0;
}

5 Supplementary Slides S.5 Number Generator: Compiling & Running
– Compiling
– Running
– Should generate more than 4 groups of numbers of different sizes: 1000, 5000, 10000, 15000, 20000, etc.

6 Supplementary Slides S.6 Number Generator: A Helper Script

for var in 1000 5000 10000 15000 20000
do
    ./genRandom data$var.txt $var 16
done

7 Supplementary Slides S.7 Sample MPI Program: Summation Program

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (myid == 0) {
        /* Open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++)
            fscanf(fp, "%d", &data[i]);
    }
    ...

8 Supplementary Slides S.8 Sample MPI Program: Summation Program (cont'd)

    ...
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);  /* broadcast data */
    x = MAXSIZE / numprocs;                /* add my portion of the data */
    low = myid * x;
    high = low + x;
    myresult = 0;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);
    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0)
        printf("The sum is %d.\n", result);
    MPI_Finalize();
    return 0;
}

9 Supplementary Slides S.9 Summation Program: Instrumentation
– Place your instrumentation code carefully; you need to justify the placement of such code
– MPI_Wtime(): returns an elapsed (wall-clock) time on the calling processor
– MPI_Wtick(): returns, as a double-precision value, the number of seconds between successive clock ticks
  – For example, if the clock is implemented by the hardware as a counter that is incremented every millisecond, the value returned by MPI_Wtick should be 10^-3

10 Supplementary Slides S.10 Summation Program: Compiling & Running
– Recompile for different data sizes, or take the data size & input file dynamically
– Sample script:

for var1 in forData1000 forData5000 forData10000 forData15000 forData20000
do
    for var2 in 1 2 4 5 8 10 12
    do
        mpirun -np $var2 $var1
    done
done

11 Supplementary Slides S.11 Jumpshot: Visualizing Execution Trace
– Jumpshot is a graphical tool for investigating the behavior of parallel programs
  – Implemented in Java (Jumpshot can run as an applet)
– It is a "post-mortem" analyzer
  – Inputs a logfile of time-stamped events, written by the companion package CLOG
– Jumpshot can present multiple views of logfile data:
  – Per-process timelines: the primary view, showing with colored bars the state of each process at each time
  – State duration histograms view
  – "Mountain range" view, showing the aggregate number of processes in each state at each time

12 Supplementary Slides S.12 Visualizing Program Execution
Other logfile-based tools with similar features:
– Commercial tools include TimeScan and Vampir
– Academic tools include ParaGraph, TraceView, XPVM, XMPI and Pablo

13 Supplementary Slides S.13 Linking with Logging Libraries
Generating log files:
– Compile your MPI code and link using the -mpilog flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation numbersSummation.o -mpilog

– Check the file names associated with the compiled program:

bash-2.04$ ls numbersSummation*
numbersSummation    numbersSummation.o    numbersSummation.xls
numbersSummation.c  numbersSummation.txt

14 Supplementary Slides S.14 Linking with Logging Libraries (cont'd)
Generating log files:
– Run the MPI program:

bash-2.04$ mpirun -np 8 numbersSummation
I got 82638836 from 0
The sum is 657273685.
Writing logfile....
Finished writing logfile.
I got 81256047 from 3
I got 80498627 from 6
I got 82306891 from 2
I got 83437153 from 7
I got 82228251 from 4
I got 82302109 from 1
I got 82605771 from 5

– Check to verify that the .clog file is created:

bash-2.04$ !l
ls numbersSummation*
numbersSummation    numbersSummation.clog  numbersSummation.txt
numbersSummation.c  numbersSummation.o     numbersSummation.xls

15 Supplementary Slides S.15 Linking with Logging Libraries (cont'd)
Use Jumpshot to visualize the .clog file:
– Run vncserver to get a Linux remote desktop
– Launch Jumpshot on the .clog file
  – May require conversion to .slog-2

16 Supplementary Slides S.16 Jumpshot: Sample Display

17 Supplementary Slides S.17 Linking with Tracing Libraries
– Compile your MPI code and link using the -mpitrace flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation numbersSummation.o -mpitrace

– Running:

bash-2.04$ mpirun -np 4 numbersSummation
Starting MPI_Init...
[1] Ending MPI_Init
[1] Starting MPI_Comm_size...
[1] Ending MPI_Comm_size
[1] Starting MPI_Comm_rank...
[1] Ending MPI_Comm_rank
[1] Starting MPI_Bcast...
[2] Ending MPI_Init
[3] Ending MPI_Init
...

18 Supplementary Slides S.18 Linking with Animation Libraries
– Compile your MPI code and link using the -mpianim flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation -mpianim numbersSummation.o -L/export/tools/mpich/lib -lmpe -L/usr/X11R6/lib -lX11 -lm

– Running:

bash-2.04$ mpirun -np 4 numbersSummation

19 Supplementary Slides S.19 Starting mpirun with a Debugger

bash-2.04$ mpirun -dbg=gdb -np 4 summation
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Breakpoint 1 at 0x804cbee
Breakpoint 1, 0x0804cbee in MPI_Init ()
(gdb)

20 Supplementary Slides S.20 Optimization Strategies
Structural changes may need to be made to a parallel program after measuring its performance (hot spots exposed, etc.). A number of measures can be taken to optimize a parallel program:
1. Change the number of processes to alter process granularity
2. Increase message sizes to lessen the effect of startup times
3. Recompute values locally rather than send them in additional messages
4. Latency hiding: overlap communication with computation
5. Perform critical-path analysis: determine the longest path that dominates overall execution time
6. Address the effect of the memory hierarchy: reduce cache misses by, for example, reordering the memory requests in the program

21 Supplementary Slides S.21 References
Check the documentation for mpich, jumpshot and mpe in:
– /tools/mpich/doc
– http://www-unix.mcs.anl.gov/perfvis/

