1 May 19 Lecture Outline
Introduce MPI functionality
Introduce the document classification problem
Parallel algorithm design
Creating communicators
Non-blocking communications
Implementation
Pipelining
Clustering

2 Implementation of a Very Simple Document Classifier
Manager/Worker Design Strategy
Manager description: create initial tasks and communicate to/from workers
Worker description: receive tasks, then alternate between communication from/to the manager and computation

3 Structure of Main program: Manager/Worker Paradigm
MPI_Init(&argc, &argv);
// what is my rank?
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
// how many processes are there?
MPI_Comm_size(MPI_COMM_WORLD, &p);
if (myrank == 0)
    Manager(p);
else
    Worker(myrank, p);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
return 0;

4 Creating a Workers-only Communicator
To support a workers-only broadcast, we need a workers-only communicator
Can use MPI_Comm_split
Excluded processes (e.g., the manager) pass MPI_UNDEFINED as the value of split_key, meaning they will not be part of any new communicator

5 Workers-only Communicator
int id;
MPI_Comm worker_comm;
...
if (!id) /* Manager */
    MPI_Comm_split(MPI_COMM_WORLD, MPI_UNDEFINED, id, &worker_comm);
else /* Worker */
    MPI_Comm_split(MPI_COMM_WORLD, 0, id, &worker_comm);
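
Once worker_comm exists, the dictionary broadcast in the worker pseudocode (slide 19) can stay among the workers. A hedged one-liner, assuming the dictionary size is held in an int dict_size and that worker 0 (world rank 1, hence rank 0 in worker_comm) is the root:

    MPI_Bcast(&dict_size, 1, MPI_INT, 0, worker_comm);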

6 Nonblocking Send / Receive
MPI_Isend and MPI_Irecv initiate the operation
MPI_Wait blocks until the operation is complete
Calls can be made early:
    MPI_Isend as soon as the value(s) are assigned
    MPI_Irecv as soon as the buffer is available
Can eliminate a message-copying step
Allows communication/computation overlap, as in the sketch below
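
A minimal sketch of the overlap pattern, assuming two processes; the buffer size, tag, and do_local_work() are illustrative, not from the slides:

#include <mpi.h>

static void do_local_work(void) { /* stand-in for real computation */ }

int main(int argc, char *argv[]) {
    int rank;
    double buf[100];
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        for (int i = 0; i < 100; i++)   /* values assigned... */
            buf[i] = (double) i;
        MPI_Isend(buf, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);  /* ...so the send can start */
    } else if (rank == 1) {
        MPI_Irecv(buf, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);  /* buffer available, post early */
    }
    do_local_work();                    /* computation overlaps the transfer */
    if (rank <= 1)
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* block only when the data is needed */
    MPI_Finalize();
    return 0;
}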

7 Receiving Problem
The worker does not know the length of the message it will receive (for example, the length of a file path name)
Alternatives:
    Allocate a huge buffer
    Check the length of the incoming message, then allocate a buffer
We'll take the second alternative

8 Function MPI_Probe
int MPI_Probe(int src, int tag, MPI_Comm comm, MPI_Status *status)
Blocks until a message is available to be received from the process with rank src carrying message tag tag; the status pointer gives information about the message's size.

9 Function MPI_Get_count
int MPI_Get_count(MPI_Status *status, MPI_Datatype dtype, int *cnt)
cnt returns the number of elements in the message (see the combined sketch below)
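
A hedged sketch combining MPI_Probe and MPI_Get_count to solve the receiving problem from slide 7; MANAGER_RANK, PATH_TAG, and receive_path() are illustrative names:

#include <mpi.h>
#include <stdlib.h>

#define MANAGER_RANK 0
#define PATH_TAG     1

char *receive_path(void) {
    MPI_Status status;
    int len;
    char *path;

    /* Block until a message is pending, without actually receiving it */
    MPI_Probe(MANAGER_RANK, PATH_TAG, MPI_COMM_WORLD, &status);

    /* Ask how many chars the pending message holds, then allocate exactly that */
    MPI_Get_count(&status, MPI_CHAR, &len);
    path = malloc(len);

    MPI_Recv(path, len, MPI_CHAR, MANAGER_RANK, PATH_TAG,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    return path;  /* caller frees; assumes the sender included the '\0' in len */
}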

10 MPI_Testsome
Often need to check whether one or more messages have arrived
The manager posts a nonblocking receive to each worker process and builds an array of request handles
MPI_Testsome lets the manager determine how many of those messages have arrived, as sketched below
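
A hedged sketch of how the manager might poll with MPI_Testsome; num_workers, reqs, and the commented-out handle_result() are illustrative assumptions:

#include <mpi.h>

/* reqs[] must already hold one MPI_Irecv request per worker */
void poll_workers(int num_workers, MPI_Request reqs[]) {
    int outcount;
    int indices[num_workers];           /* one slot per posted request */
    MPI_Status statuses[num_workers];

    /* Returns immediately; outcount reports how many requests completed */
    MPI_Testsome(num_workers, reqs, &outcount, indices, statuses);
    for (int i = 0; i < outcount; i++) {
        int w = indices[i];             /* position of a completed request */
        /* handle_result(w, &statuses[i]);  e.g., store worker w's vector */
    }
}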

11 Document Classification Problem
Search directories and subdirectories for documents (look for .html, .txt, .tex, etc.)
Using a dictionary of key words, create a profile vector for each document
Store the profile vectors

12 Data Dependence Graph (1)

13 Partitioning and Communication
Most time is spent reading documents and generating profile vectors
Create two primitive tasks for each document

14 Data Dependence Graph (2)

15 Agglomeration and Mapping
Number of tasks not known at compile time
Tasks do not communicate with each other
Time needed to perform tasks varies widely
Strategy: map tasks to processes at run time

16 Manager/worker-style Algorithm
Task/functional partitioning
Domain/data partitioning

17 Roles of Manager and Workers

18 Manager Pseudocode
Identify documents
Receive dictionary size from worker 0
Allocate matrix to store document vectors
repeat
    Receive message from a worker
    if message contains a document vector then
        Store document vector
    endif
    if documents remain then
        Send worker a file name
    else
        Send worker a termination message
    endif
until all workers terminated
Write document vectors to file
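A possible C rendering of the manager's inner loop; the tag values and the store_vector() helper are assumptions for illustration, not from the slides:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define REQUEST_TAG   0   /* worker's initial, empty request for work */
#define VECTOR_TAG    1   /* worker message carrying a document vector */
#define NAME_TAG      2   /* manager message carrying a file path */
#define TERMINATE_TAG 3   /* manager message telling a worker to stop */

extern void store_vector(int worker, const double *vec);  /* assumed helper */

void manager_loop(int p, int num_docs, char *doc_name[], int vec_len) {
    int next = 0, active = p - 1;
    double *vec = malloc(vec_len * sizeof(double));
    MPI_Status status;

    while (active > 0) {
        /* Accept a message from any worker; the tag says what it carries
           (an initial REQUEST_TAG message is empty, so nothing is stored) */
        MPI_Recv(vec, vec_len, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == VECTOR_TAG)
            store_vector(status.MPI_SOURCE, vec);
        if (next < num_docs) {                      /* documents remain */
            MPI_Send(doc_name[next], strlen(doc_name[next]) + 1, MPI_CHAR,
                     status.MPI_SOURCE, NAME_TAG, MPI_COMM_WORLD);
            next++;
        } else {                                    /* tell this worker to stop */
            MPI_Send(NULL, 0, MPI_CHAR, status.MPI_SOURCE,
                     TERMINATE_TAG, MPI_COMM_WORLD);
            active--;
        }
    }
    free(vec);
}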

19 Worker Pseudocode
Send first request for work to manager
if worker 0 then
    Read dictionary from file
endif
Broadcast dictionary among workers
Build hash table from dictionary
Send dictionary size to manager
repeat
    Receive file name from manager
    if file name is NULL then terminate endif
    Read document, generate document vector
    Send document vector to manager
forever
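A hedged worker-side counterpart to the manager sketch above, reusing the same illustrative tag values and the probe pattern from slides 8 and 9; process_document() is an assumed helper:

#include <mpi.h>
#include <stdlib.h>

#define REQUEST_TAG   0
#define NAME_TAG      2
#define VECTOR_TAG    1
#define TERMINATE_TAG 3

extern void process_document(const char *name, double *vec);  /* assumed helper */

void worker_loop(int vec_len) {
    double *vec = malloc(vec_len * sizeof(double));
    MPI_Status status;
    int len;

    /* first request for work: an empty message */
    MPI_Send(NULL, 0, MPI_CHAR, 0, REQUEST_TAG, MPI_COMM_WORLD);
    for (;;) {
        MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == TERMINATE_TAG) {       /* drain the empty message */
            MPI_Recv(NULL, 0, MPI_CHAR, 0, TERMINATE_TAG, MPI_COMM_WORLD, &status);
            break;
        }
        MPI_Get_count(&status, MPI_CHAR, &len);      /* size the path buffer */
        char *name = malloc(len);
        MPI_Recv(name, len, MPI_CHAR, 0, NAME_TAG, MPI_COMM_WORLD, &status);
        process_document(name, vec);                 /* read doc, build vector */
        MPI_Send(vec, vec_len, MPI_DOUBLE, 0, VECTOR_TAG, MPI_COMM_WORLD);
        free(name);
    }
    free(vec);
}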

20 Task/Channel Graph

21 Enhancements
Finding a middle ground between pre-allocation and one-at-a-time allocation of file paths
Pipelining of document processing

22 Allocation Alternatives
[Chart: execution time vs. documents allocated per request — allocating n/p per request risks load imbalance; allocating 1 per request incurs excessive communication overhead]

23 Pipelining

24 Time Savings through Pipelining

25 Summary
Manager/worker paradigm:
    Dynamic number of tasks
    Variable task lengths
    No communication between tasks
New tools for the "kit":
    Creating a manager/worker program
    Creating a workers-only communicator
    Non-blocking send/receive
    Testing for completed communications
Next step: cluster the profile vectors

26 K-Means Clustering
Assumes documents are real-valued vectors
Assumes a distance function on pairs of vectors
Clusters are based on the centroid (a.k.a. the center of gravity or mean) of the points in a cluster c:
    mu(c) = (1/|c|) * sum of x over all points x in c
Reassignment of instances to clusters is based on the distance of a vector to the current cluster centroids (or, equivalently, it can be phrased in terms of similarities)

27 K-Means Algorithm
Let d be the distance measure between instances.
Select k random instances {s1, s2, ..., sk} as seeds.
Until the clustering converges or another stopping criterion holds:
    For each instance xi:
        Assign xi to the cluster cj such that d(xi, sj) is minimal.
    // Update the seeds to the centroid of each cluster
    For each cluster cj:
        sj = mu(cj)
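
A hedged sequential sketch of one pass of the algorithm above, for n points of dimension dim stored row-major; all names are illustrative and the distance is squared Euclidean:

#include <stdlib.h>
#include <float.h>

void kmeans_step(int n, int k, int dim,
                 const double x[],  /* n*dim document vectors, row-major */
                 double seed[],     /* k*dim seeds, updated in place */
                 int assign[])      /* out: cluster index per point */
{
    double *sum = calloc((size_t) k * dim, sizeof(double));
    int *count = calloc(k, sizeof(int));

    for (int i = 0; i < n; i++) {
        int best = 0;
        double bestd = DBL_MAX;
        for (int j = 0; j < k; j++) {           /* find the nearest seed */
            double d = 0.0;
            for (int t = 0; t < dim; t++) {
                double diff = x[i*dim + t] - seed[j*dim + t];
                d += diff * diff;
            }
            if (d < bestd) { bestd = d; best = j; }
        }
        assign[i] = best;                       /* assign xi to cluster cj */
        count[best]++;
        for (int t = 0; t < dim; t++)
            sum[best*dim + t] += x[i*dim + t];
    }
    /* move each seed to the centroid mu(cj) of its cluster */
    for (int j = 0; j < k; j++)
        if (count[j] > 0)
            for (int t = 0; t < dim; t++)
                seed[j*dim + t] = sum[j*dim + t] / count[j];
    free(sum);
    free(count);
}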

28 K-Means Example (k = 2)
[Figure sequence: pick seeds → reassign clusters → compute centroids → reassign clusters → ... → converged!]

29 Termination conditions
We want the documents in each cluster to stop changing
Several possibilities, e.g.:
    A fixed number of iterations
    The document partition is unchanged
    Centroid positions don't change
We'll terminate when only a small fraction of documents change clusters (a threshold value), as sketched below
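
A minimal sketch of the chosen test, assuming `changed` counts the documents reassigned in the current iteration and THRESHOLD is an illustrative cutoff:

#define THRESHOLD 0.01  /* illustrative: stop when under 1% of docs move */

int converged(int changed, int num_docs) {
    return (double) changed / (double) num_docs < THRESHOLD;
}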

31 Quiz
Describe manager/worker pseudocode that implements the K-means algorithm in parallel
What data partitioning is used for parallelism?
How are cluster centers updated and distributed?

32 Hints
Objects to be clustered are evenly partitioned among all processes
Cluster centers are replicated
A global-sum reduction on the cluster centers is performed at the end of each iteration to generate the new cluster centers
Use MPI_Bcast and MPI_Allreduce (see the sketch below)
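
The hints above translate into a short pattern. A hedged sketch of the per-iteration center update, assuming each process holds a local slice of the documents, centers are replicated as a row-major k*dim array, and assignment was computed elsewhere; all names are illustrative:

#include <mpi.h>
#include <stdlib.h>

void parallel_center_update(int local_n, int k, int dim,
                            const double x[],    /* local_n*dim local vectors */
                            const int assign[],  /* local cluster assignments */
                            double center[])     /* k*dim, replicated on all ranks */
{
    double *sum = calloc((size_t) k * dim, sizeof(double));
    int *cnt = calloc(k, sizeof(int));

    /* partial sums over this process's slice of the documents */
    for (int i = 0; i < local_n; i++) {
        int c = assign[i];
        cnt[c]++;
        for (int t = 0; t < dim; t++)
            sum[c*dim + t] += x[i*dim + t];
    }
    /* global-sum reduction; every rank receives the same totals */
    MPI_Allreduce(MPI_IN_PLACE, sum, k * dim, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(MPI_IN_PLACE, cnt, k, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    for (int c = 0; c < k; c++)
        if (cnt[c] > 0)
            for (int t = 0; t < dim; t++)
                center[c*dim + t] = sum[c*dim + t] / cnt[c];
    free(sum);
    free(cnt);
}

/* The initial centers would be chosen on one rank and shared with
   MPI_Bcast(center, k*dim, MPI_DOUBLE, 0, MPI_COMM_WORLD). */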

