
Supplementary Slides S.1 Empirical Study of Parallel Programs
–Measuring execution time
–Visualizing execution trace
–Debugging
–Optimization strategies

Supplementary Slides S.2 Empirical Study of Parallel Programs (cont'd)
Objective
–An initiation into empirical analysis of parallel programs
–By example: number summation
–Basis for coursework
Outcome: ability to
–Follow the same steps to measure simple parallel programs
–Explore the detailed functionality of the tools
–Gain better insight into, and explain, the behavior of parallel programs
–Optimize parallel programs
–Use similar tools for program measurements

Supplementary Slides S.3 Homework Contract
Requirements
–A number generator program
–Assemble and compile the homework program
–Instrument the homework program with MPI timing functions
–A file management script
Deliverables
–Speedup (and linear speedup) graph plots (on the same page) showing # processors against problem size
–A file of raw execution times of the form: data size, # processors, execution time
–Jumpshot visualization graphs
–A report explaining your work, especially the instrumentation, the speedup graphs, and the Jumpshot graphs

Supplementary Slides S.4 Execution Time: Number Generator Program

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(int argc, char **argv)
{
    int i;
    FILE *fp;

    if (argc != 4) {
        printf("randFile filename #ofValues powerOfTwo\n");
        return -1;
    }
    srand(clock());                       /* seed the random number generator */
    fp = fopen(argv[1], "w");
    if (fp == NULL) return -1;
    fprintf(fp, "%d\n", atoi(argv[2]));   /* first line: count of values */
    for (i = 0; i < atoi(argv[2]); i++)   /* values in [0, 2^powerOfTwo) */
        fprintf(fp, "%d\n", rand() % (int)pow(2, atoi(argv[3])));
    fclose(fp);
    return 0;
}

Supplementary Slides S.5 Number Generator: Compiling & Running
–Compiling and running (a command-line sketch follows below)
–Should generate more than 4 groups of numbers of different sizes: 1000, 5000, 10000, 15000, 20000, etc.
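The compile and run commands themselves are not in the transcript. A minimal sketch, assuming the generator source is saved as genRandom.c (the program name used by the helper script on slide S.6) and compiled with gcc; -lm links the math library needed by pow():

bash-2.04$ gcc -o genRandom genRandom.c -lm
bash-2.04$ ./genRandom data1000.txt 1000 16     # 1000 values, each in [0, 2^16)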

Supplementary Slides S.6 Number Generator: A Helper Script

for var in 1000 5000 10000 15000 20000    # data sizes from slide S.5
do
    ./genRandom data$var.txt $var 16
done

Supplementary Slides S.7 Sample MPI Program: Summation Program

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSIZE 1000

int main(int argc, char *argv[]) {
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {   /* Open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++)
            fscanf(fp, "%d", &data[i]);
    }
    ...

Supplementary Slides S.8 Sample MPI Program: Summation Program (cont'd)

    ...
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);   /* broadcast data */

    x = MAXSIZE / numprocs;             /* add my portion of the data */
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);

    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);

    MPI_Finalize();
    return 0;
}

Supplementary Slides S.9 Summation Program: Instrumentation
–Place your instrumentation code carefully; you need to justify the placement of such code (a placement sketch follows below).
MPI_Wtime()
–Returns an elapsed (wall clock) time on the calling processor
MPI_Wtick()
–Returns, as a double precision value, the number of seconds between successive clock ticks
–For example, if the clock is implemented by the hardware as a counter that is incremented every millisecond, the value returned by MPI_Wtick is 10^-3
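The slides do not show where the timing calls go; the fragment below is one reasonable placement sketch for the summation program of slides S.7 and S.8 (myid comes from that program; t_start and t_end are introduced here). It times the broadcast, local summation, and reduction on every rank and prints the elapsed time from rank 0:

double t_start, t_end;                  /* per-process wall-clock timestamps */

MPI_Barrier(MPI_COMM_WORLD);            /* optional: line processes up before timing */
t_start = MPI_Wtime();

/* ... MPI_Bcast, local summation loop, and MPI_Reduce from slide S.8 ... */

t_end = MPI_Wtime();
if (myid == 0)
    printf("Elapsed time: %f s (clock resolution %g s)\n",
           t_end - t_start, MPI_Wtick());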

Supplementary Slides S.10 Summation Program: Compiling & Running
–Recompile for each different data size, or take the data size and input file dynamically
–Sample script:

for var1 in forData1000 forData5000 forData10000 forData15000 forData20000
do
    for var2 in 1 2 4 8       # example process counts
    do
        mpirun -np $var2 $var1
    done
done

Supplementary Slides S.11 Jumpshot: Visualizing Execution Trace
Jumpshot is a graphical tool for investigating the behavior of parallel programs.
–Implemented in Java (Jumpshot can run as an applet)
It is a "post-mortem" analyzer
–Reads a logfile of time-stamped events, written by the companion package CLOG
Jumpshot can present multiple views of logfile data:
–Per-process timelines: the primary view, showing with colored bars the state of each process at each time
–State duration histograms view
–"Mountain range" view, showing the aggregate number of processes in each state at each time

Supplementary Slides S.12 Visualizing Program Execution
Other logfile-based tools with similar features:
–Commercial tools include TimeScan and Vampir
–Academic tools include ParaGraph, TraceView, XPVM, XMPI, and Pablo

Supplementary Slides S.13 Linking with Logging Libraries
Generating log files:
–Compile your MPI code and link using the -mpilog flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation numbersSummation.o -mpilog

–Check the file names associated with the compiled program:

bash-2.04$ ls numbersSummation*
numbersSummation      numbersSummation.o    numbersSummation.xls
numbersSummation.c    numbersSummation.txt

Supplementary Slides S.14 Linking with Logging Libraries (cont'd)
Generating log files:
–Run the MPI program:

bash-2.04$ mpirun -np 8 numbersSummation
I got ... from 0
The sum is ...
Writing logfile....
Finished writing logfile.
I got ... from 3
I got ... from 6
I got ... from 2
I got ... from 7
I got ... from 4
I got ... from 1
I got ... from 5

–Check that a .clog file has been created:

bash-2.04$ !l
ls numbersSummation*
numbersSummation      numbersSummation.clog  numbersSummation.txt
numbersSummation.c    numbersSummation.o     numbersSummation.xls

Supplementary Slides S.15 Linking with Logging Libraries (cont'd)
Use Jumpshot to visualize the .clog file
–Run vncserver to get a Linux remote desktop
–Launch Jumpshot on the .clog file (may require conversion to the .slog-2 format)

Supplementary Slides S.16 Jumpshot: Sample Display

Supplementary Slides S.17 Linking with Tracing Libraries
–Compile your MPI code and link using the -mpitrace flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation numbersSummation.o -mpitrace

–Running:

bash-2.04$ mpirun -np 4 numbersSummation
Starting MPI_Init...
[1] Ending MPI_Init
[1] Starting MPI_Comm_size...
[1] Ending MPI_Comm_size
[1] Starting MPI_Comm_rank...
[1] Ending MPI_Comm_rank
[1] Starting MPI_Bcast...
[2] Ending MPI_Init
[3] Ending MPI_Init
...

Supplementary Slides S.18 Linking with Animation Libraries
–Compile your MPI code and link using the -mpianim flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation -mpianim numbersSummation.o -L/export/tools/mpich/lib -lmpe -L/usr/X11R6/lib -lX11 -lm

–Running:

bash-2.04$ mpirun -np 4 numbersSummation

Supplementary Slides S.19 Starting mpirun with a Debugger

bash-2.04$ mpirun -dbg=gdb -np 4 summation
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Breakpoint 1 at 0x804cbee
Breakpoint 1, 0x0804cbee in MPI_Init ()
(gdb)

Supplementary Slides S.20 Optimization Strategies
Structural changes may need to be made to a parallel program after measuring its performance
–Hot spots exposed, etc.
A number of measures can be taken to optimize a parallel program:
1. Change the number of processes to alter process granularity
2. Increase message sizes to lessen the effect of startup times
3. Recompute values locally rather than sending them in additional messages
4. Latency hiding: overlapping communication with computation (a sketch follows below)
5. Perform critical path analysis: determine the longest path that dominates overall execution time
6. Address the effect of the memory hierarchy, reducing cache misses by, for example, reordering the memory requests in the program
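As an illustration of strategy 4 only (latency hiding is not shown on the original slides), a minimal sketch using MPI's nonblocking receive: post the receive for the next block of data, do useful work on the current block, then wait for the transfer to finish. Here block, next_block, BLOCKSIZE, partner and compute() are hypothetical names, not part of the summation program.

MPI_Request req;
MPI_Status  status;

MPI_Irecv(next_block, BLOCKSIZE, MPI_INT, partner, 0,
          MPI_COMM_WORLD, &req);        /* start the receive early */
compute(block, BLOCKSIZE);              /* overlap: compute while data is in flight */
MPI_Wait(&req, &status);                /* block only when next_block is really needed */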

Supplementary Slides S.21 References
Check the documentation for mpich, jumpshot, and mpe in:
–/tools/mpich/doc