Lecture 8 Objectives Material from Chapter 9 More complete introduction of MPI functions Show how to implement manager-worker programs Parallel Algorithms.


Lecture 8 Objectives Material from Chapter 9 More complete introduction of MPI functions Show how to implement manager-worker programs Parallel Algorithms for Document Classification Parallel Algorithms for Clustering

Outline Introduce MPI functionality Introduce problem Parallel algorithm design Creating communicators Non-blocking communications Implementation Pipelining Clustering

Implementation of a Very Simple Document Classifier. Manager/Worker Design Strategy. Manager description (create initial tasks and communicate to/from workers). Worker description (receive tasks, then alternate between communicating with the manager and computing).

Structure of Main program: Manager/Worker Paradigm
    MPI_Init(&argc, &argv);
    // what is my rank?
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    // how many processors are there?
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    if (myid == 0)
        Manager(p);
    else
        Worker(myid, p);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return(0);

More MPI functions MPI_Abort MPI_Comm_split MPI_Isend, MPI_Irecv, MPI_Wait MPI_Probe MPI_Get_count MPI_Testsome

MPI_Abort A "quick and dirty" way for one process to terminate all processes in a specified communicator. Example use: the manager cannot allocate the memory needed to store document profile vectors.
    int MPI_Abort (
        MPI_Comm comm,     /* Communicator */
        int error_code)    /* Value returned to calling environment */

Creating a Workers-only Communicator. To support a workers-only broadcast, we need a workers-only communicator. Can use MPI_Comm_split. Excluded processes (e.g., the manager) pass MPI_UNDEFINED as the value of split_key (the color argument in the MPI standard), so they are not part of any new communicator and receive MPI_COMM_NULL.

Workers-only Communicator
    int id;
    MPI_Comm worker_comm;
    ...
    if (!id) /* Manager */
        MPI_Comm_split (MPI_COMM_WORLD, MPI_UNDEFINED, id, &worker_comm);
    else /* Worker */
        MPI_Comm_split (MPI_COMM_WORLD, 0, id, &worker_comm);

Nonblocking Send / Receive MPI_Isend, MPI_Irecv initiate operation MPI_Wait blocks until operation complete Calls can be made early –MPI_Isend as soon as value(s) assigned –MPI_Irecv as soon as buffer available Can eliminate a message copying step Allows communication / computation overlap

Function MPI_Isend
    int MPI_Isend (
        void *buffer,
        int cnt,
        MPI_Datatype dtype,
        int dest,
        int tag,
        MPI_Comm comm,
        MPI_Request *handle )    /* Pointer to object that identifies the communication operation */

Function MPI_Irecv
    int MPI_Irecv (
        void *buffer,
        int cnt,
        MPI_Datatype dtype,
        int src,
        int tag,
        MPI_Comm comm,
        MPI_Request *handle )    /* Pointer to object that identifies the communication operation */

Function MPI_Wait int MPI_Wait ( MPI_Request *handle, MPI_Status *status ) Blocks until operation associated with pointer handle completes. status points to object containing info on received message

Receiving Problem. The worker does not know the length of the message it will receive (for example, the length of a file path name). Alternatives: (1) allocate a huge buffer; (2) check the length of the incoming message, then allocate the buffer. We'll take the second alternative.

Function MPI_Probe int MPI_Probe ( int src, int tag, MPI_Comm comm, MPI_Status *status ) Blocks until message is available to be received from process with rank src with message tag tag; status pointer gives info on message size.

Function MPI_Get_count int MPI_Get_count ( MPI_Status *status, MPI_Datatype dtype, int *cnt ) cnt returns the number of elements in message

MPI_Testsome Often need to check whether one or more messages have arrived Manager posts a nonblocking receive to each worker process Builds an array of handles or request objects Testsome allows manager to determine how many messages have arrived

Function MPI_Testsome
    int MPI_Testsome (
        int in_cnt,                 /* IN - Number of nonblocking receives to check */
        MPI_Request *handlearray,   /* IN - Handles of pending receives */
        int *out_cnt,               /* OUT - Number of completed communications */
        int *index_array,           /* OUT - Indices of completed communications */
        MPI_Status *status_array)   /* OUT - Status records for completed comms */

Document Classification Problem. Search directories and subdirectories for documents (look for .html, .txt, .tex, etc.). Using a dictionary of key words, create a profile vector for each document. Store profile vectors.
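To make "profile vector" concrete, the sketch below counts how often each dictionary word occurs in a document's text. It is an illustration under stated assumptions: the function name build_profile is hypothetical, dictionary entries are assumed to be lowercase, words are runs of letters and digits, and a linear dictionary scan is used for brevity where a hash table would be the natural choice for a large dictionary.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_WORD 64   /* assumption: longer words are truncated */

/* profile[i] = number of times dict[i] occurs in text (case-insensitive). */
void build_profile(const char *text, const char *dict[], int dict_size, int profile[])
{
    char word[MAX_WORD];
    int len = 0;

    assert(text != NULL);
    memset(profile, 0, (size_t)dict_size * sizeof profile[0]);

    for (const char *p = text; ; p++) {
        if (isalnum((unsigned char)*p)) {
            /* Still inside a word: append the lowercased character. */
            if (len < MAX_WORD - 1)
                word[len++] = (char)tolower((unsigned char)*p);
        } else {
            /* A separator (or the end of the text) closes the current word. */
            if (len > 0) {
                word[len] = '\0';
                for (int i = 0; i < dict_size; i++) {
                    if (strcmp(word, dict[i]) == 0) {
                        profile[i]++;
                        break;
                    }
                }
                len = 0;
            }
            if (*p == '\0')
                break;
        }
    }
}
```

With the dictionary {"mpi", "worker", "manager"}, the text "The manager sends work; each Worker uses MPI. MPI!" yields the profile vector (2, 1, 1).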

Data Dependence Graph (1)

Partitioning and Communication Most time spent reading documents and generating profile vectors Create two primitive tasks for each document

Data Dependence Graph (2)

Agglomeration and Mapping Number of tasks not known at compile time Tasks do not communicate with each other Time needed to perform tasks varies widely Strategy: map tasks to processes at run time

Manager/worker-style Algorithm 1.Task/Functional Partitioning 2.Domain/Data Partitioning

Roles of Manager and Workers

Manager Pseudocode
    Identify documents
    Receive dictionary size from worker 0
    Allocate matrix to store document vectors
    repeat
        Receive message from worker
        if message contains document vector then
            Store document vector
        endif
        if documents remain then
            Send worker file name
        else
            Send worker termination message
        endif
    until all workers terminated
    Write document vectors to file

Worker Pseudocode
    Send first request for work to manager
    if worker 0 then
        Read dictionary from file
    endif
    Broadcast dictionary among workers
    Build hash table from dictionary
    if worker 0 then
        Send dictionary size to manager
    endif
    repeat
        Receive file name from manager
        if file name is NULL then
            terminate
        endif
        Read document, generate document vector
        Send document vector to manager
    forever

Task/Channel Graph

Enhancements. Finding middle ground between pre-allocation and one-at-a-time allocation of file paths. Pipelining of document processing.

Allocation Alternatives [Figure: time versus documents allocated per request. At 1 document per request, communication overhead is excessive; at n/p documents per request, load imbalance dominates.]
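The communication side of this trade-off is simple arithmetic: handing out n documents k at a time takes about n/k assignment messages. The sketch below computes that count for the two extremes and anything in between; both function names are hypothetical, and the model deliberately ignores message size and load imbalance.

```c
#include <assert.h>

/* Number of manager-to-worker assignment messages when n documents
 * are handed out k at a time (the last batch may be smaller). */
long assignment_messages(long n, long k)
{
    assert(n >= 0 && k >= 1);
    return (n + k - 1) / k;   /* ceiling of n / k */
}

/* Batch size for full pre-allocation: each of p workers gets one
 * batch of about n/p documents. */
long preallocation_chunk(long n, long p)
{
    assert(n >= 0 && p >= 1);
    return (n + p - 1) / p;   /* ceiling of n / p */
}
```

For n = 1000 documents and p = 8 workers, one-at-a-time allocation needs 1000 assignment messages, full pre-allocation (batches of 125) needs 8, and a middle-ground batch size of 10 needs 100.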

Pipelining

Time Savings through Pipelining

Pipelined Manager Pseudocode
    a ← 0 {assigned jobs}
    j ← 0 {available jobs}
    w ← 0 {workers waiting for assignment}
    repeat
        if (j > 0) and (w > 0) then
            assign job to worker
            j ← j - 1; w ← w - 1; a ← a + 1
        elseif (j > 0) then
            handle an incoming message from workers
            increment w
        else
            get another job
            increment j
        endif
    until (a = n) and (w = p)

Summary Manager/worker paradigm –Dynamic number of tasks –Variable task lengths –No communications between tasks New tools for “kit” –Create manager/worker program –Create workers-only communicator –Non-blocking send/receive –Testing for completed communications Next Step: Cluster Profile Vectors

K-Means Clustering. Assumes documents are real-valued vectors. Assumes a distance function on vector pairs. Clusters are based on centroids (aka the center of gravity or mean) of the points in a cluster c: $\mu(c) = \frac{1}{|c|} \sum_{x \in c} x$. Reassignment of instances to clusters is based on the distance of each vector to the current cluster centroids. (Or one can equivalently phrase it in terms of similarities.)

K-Means Algorithm
    Let d be the distance measure between instances.
    Select k random instances $\{s_1, s_2, \ldots, s_k\}$ as seeds.
    Until clustering converges or other stopping criterion:
        For each instance $x_i$:
            Assign $x_i$ to the cluster $c_j$ such that $d(x_i, s_j)$ is minimal.
        (Now update the seeds to the centroid of each cluster.)
        For each cluster $c_j$:
            $s_j = \mu(c_j)$
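A sequential version of this loop is sketched below as a starting point. Several choices are assumptions made to keep it short and deterministic: points are 2-D, the distance measure is squared Euclidean distance, the caller supplies the seeds instead of picking them at random, an empty cluster keeps its previous seed, and the run stops when no point changes cluster or after max_iter iterations.

```c
#include <assert.h>
#include <float.h>

#define DIM 2      /* assumption: 2-D points keep the sketch short */
#define MAX_K 16   /* assumption: small fixed upper bound on k */

/* Squared Euclidean distance, standing in for the distance measure d. */
static double dist2(const double a[DIM], const double b[DIM])
{
    double sum = 0.0;
    for (int t = 0; t < DIM; t++)
        sum += (a[t] - b[t]) * (a[t] - b[t]);
    return sum;
}

/* Runs k-means starting from the seeds stored in s[0..k-1].
 * On return, label[i] is the cluster of x[i] and s holds the centroids.
 * Returns the number of iterations performed. */
int kmeans(double x[][DIM], int n, double s[][DIM], int k, int label[], int max_iter)
{
    assert(n > 0 && k > 0 && k <= MAX_K);
    for (int i = 0; i < n; i++)
        label[i] = -1;

    int iter = 0;
    while (iter < max_iter) {
        double sum[MAX_K][DIM] = {{0.0}};
        int count[MAX_K] = {0};
        int changed = 0;
        iter++;

        /* Assignment step: each instance joins the cluster of its nearest seed. */
        for (int i = 0; i < n; i++) {
            int best = 0;
            double best_d = DBL_MAX;
            for (int j = 0; j < k; j++) {
                double d = dist2(x[i], s[j]);
                if (d < best_d) { best_d = d; best = j; }
            }
            if (label[i] != best) { label[i] = best; changed++; }
            count[best]++;
            for (int t = 0; t < DIM; t++)
                sum[best][t] += x[i][t];
        }

        if (changed == 0)
            break;   /* partition unchanged: converged */

        /* Update step: each seed moves to the centroid of its cluster. */
        for (int j = 0; j < k; j++)
            if (count[j] > 0)
                for (int t = 0; t < DIM; t++)
                    s[j][t] = sum[j][t] / count[j];
    }
    return iter;
}
```

The per-cluster sums and counts accumulated in the assignment step are exactly the quantities a parallel version would need to combine across processes.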

K Means Example (K=2) [Figure: pick seeds; reassign clusters; compute centroids; reassign clusters; compute centroids; reassign clusters; converged.]

Termination conditions. We would like the documents in each cluster to stop changing. Several possibilities, e.g.: a fixed number of iterations; document partition unchanged; centroid positions don't change. We'll choose termination when only a small fraction of documents change cluster (threshold value).
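The chosen rule can be written as a one-line test. In this sketch the function name should_stop, the use of "at most" (rather than "strictly less than") the threshold, and the extra iteration cap are all assumptions; the slide only says that a small fraction of changes ends the run.

```c
#include <assert.h>

/* Returns 1 when clustering should stop: either the iteration cap has been
 * reached, or at most `threshold` (a fraction in [0, 1]) of the n documents
 * changed cluster in the last iteration. */
int should_stop(int changed, int n, double threshold, int iter, int max_iter)
{
    assert(n > 0 && changed >= 0 && changed <= n);
    if (iter >= max_iter)
        return 1;
    return (double)changed / n <= threshold;
}
```

With a threshold of 0.01 and 1000 documents, 5 reassignments end the run while 50 do not.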

Quiz: –Describe Manager/Worker pseudo-code that implements the K-means algorithm in parallel –What data partitioning for parallelism? –How are cluster centers updated and distributed?

Hints –objects to be clustered are evenly partitioned among all processes –cluster centers are replicated –Global-sum reduction on cluster centers is performed at the end of each iteration to generate the new cluster centers. –Use MPI_Bcast and MPI_Allreduce