ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 8 October 23, 2002 Nayda G. Santiago

Overview
Message Passing and Shared Memory.
References: Designing and Building Parallel Programs by Ian Foster (textbook), Chapters 1, 2, and 8; the Maui HPC Center site.

Class Example
Take a piece of paper.
Algorithm:
1. Get your initial value.
2. Initialize with the number of neighbors you have.
3. Compute the average of your neighbors' values and subtract it from your value. Make this your new value.
4. Repeat until done.

Class Example (cont.)
Questions (think message passing or shared memory):
How do you get the values from your neighbors?
Which step or iteration do they correspond to? Do you know? Do you care?
How do you decide when you are done?
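In a message-passing setting, "getting the values from your neighbors" becomes explicit sends and receives. As an illustration only (the slide gives no code), here is a minimal sketch in C with MPI that arranges the processes on a ring, exchanges values with the two neighbors each iteration, and uses a fixed iteration count in place of the "until done" test; the ring layout, iteration count, and variable names are assumptions for the example:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double value = (double)rank;            /* each process starts with its own value */
    int left  = (rank - 1 + size) % size;   /* neighbors on a ring (assumption) */
    int right = (rank + 1) % size;

    for (int iter = 0; iter < 100; iter++) {
        double from_left, from_right;
        /* Exchange values with both neighbors; MPI_Sendrecv pairs the send and
           receive so the ranks cannot deadlock waiting on each other. */
        MPI_Sendrecv(&value, 1, MPI_DOUBLE, right, 0,
                     &from_left, 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&value, 1, MPI_DOUBLE, left, 1,
                     &from_right, 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Average of the neighbors' values, subtracted from the current value. */
        value = value - (from_left + from_right) / 2.0;
    }

    printf("rank %d final value %f\n", rank, value);
    MPI_Finalize();
    return 0;
}

Deciding "when you are done" is exactly the last question above: with message passing it requires an explicit global test (for example a reduction over local convergence flags), while with shared memory the processes can simply inspect a shared flag.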

Communication for Message Passing
Point to point
Collective, with the following patterns: Broadcast, Scatter, Gather, All Gather, All to All, and Reduction operations.

Point to Point Communication
The most basic type of communication: a send is accompanied by a matching receive.
Types:
Blocking: no further processing until the message is transmitted.
Nonblocking: processing continues even if the message has not yet been transmitted.
[Figure: Processor A sends data from its memory across the network into Processor B's memory.]
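For concreteness, a minimal blocking send/receive pair in C with MPI (the slide itself names no library; the ranks, tag, and payload are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, data = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        data = 42;
        /* Blocking send to rank 1 with tag 0. */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Matching blocking receive from rank 0 with the same tag. */
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}

The nonblocking counterparts in MPI are MPI_Isend and MPI_Irecv, which return immediately and are completed later with MPI_Wait or MPI_Test.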

Broadcast
One node has information needed by all processors; that processor sends the information to all the other nodes.
[Figure: before the broadcast only P0 holds item A; afterwards P0 through P3 each hold A.]
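In MPI this pattern is provided by MPI_Bcast; a minimal sketch (root rank and payload are illustrative choices):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 123;                  /* only the root holds the data initially */

    /* After the call every rank in the communicator holds 123. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d has value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}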

Scatter
All of the data are initially collected on a single processor. After the scatter operation, pieces of the data are distributed across different processors.
[Figure: before the scatter P0 holds A, B, C, D; afterwards P0 holds A, P1 holds B, P2 holds C, and P3 holds D.]
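The corresponding MPI routine is MPI_Scatter; a minimal sketch (the array contents and the 16-rank buffer size are assumptions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, piece;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int all[16] = {0};                /* room for up to 16 ranks (assumption) */
    if (rank == 0)
        for (int i = 0; i < size; i++)
            all[i] = 10 * i;          /* the root owns the whole array */

    /* Each rank receives one element of the root's array, in rank order. */
    MPI_Scatter(all, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d got %d\n", rank, piece);
    MPI_Finalize();
    return 0;
}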

Gather
The gather operation is the inverse of scatter: it collects pieces of the data that are distributed across a group of processors and reassembles them in the proper order on a single processor.
[Figure: before the gather P0 holds A, P1 holds B, P2 holds C, and P3 holds D; afterwards P0 holds A, B, C, D.]
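In MPI this is MPI_Gather; a minimal sketch mirroring the scatter example above:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int piece = 10 * rank;            /* each rank contributes one value */
    int all[16] = {0};                /* receive buffer, meaningful only at the root */

    /* The root (rank 0) receives the pieces reassembled in rank order. */
    MPI_Gather(&piece, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        for (int i = 0; i < size; i++)
            printf("element %d = %d\n", i, all[i]);

    MPI_Finalize();
    return 0;
}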

All Gather
Think of it as a gather, but every processor receives the assembled information, not only the root.
[Figure: each of P0 through P3 starts with one item (A, B, C, D); after the all gather every processor holds the full sequence A, B, C, D.]
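In MPI this is MPI_Allgather; the call looks like MPI_Gather but has no root argument, since every rank receives the assembled array:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int piece = 10 * rank;            /* each rank contributes one value */
    int all[16] = {0};                /* after the call, the full array is on every rank */

    MPI_Allgather(&piece, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

    printf("rank %d sees elements %d ... %d\n", rank, all[0], all[size - 1]);
    MPI_Finalize();
    return 0;
}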

All to All Broadcast Each processor sends its unique information to all the other processors.
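How this maps to MPI depends on what "its unique information" means: if every processor sends the same item to everyone, MPI_Allgather (above) already covers it; if each processor sends a distinct item to each destination, the routine is MPI_Alltoall. A minimal MPI_Alltoall sketch, with illustrative values:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int sendbuf[16], recvbuf[16];     /* one slot per destination/source (assumes <= 16 ranks) */
    for (int i = 0; i < size; i++)
        sendbuf[i] = 100 * rank + i;  /* element i is destined for rank i */

    /* After the call, element j of recvbuf came from rank j. */
    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    printf("rank %d received %d from rank 0\n", rank, recvbuf[0]);
    MPI_Finalize();
    return 0;
}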

Reduction Operations
A collective operation in which a single process (the root process) collects data from the other processes in a group and combines them into a single data item.
Example: compute the sum of the elements of an array that is distributed over several processors.
Operations other than arithmetic ones are also possible, for example maximum and minimum, as well as various logical and bitwise operations.
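The slide's example, the sum of a distributed array, maps to MPI_Reduce with MPI_SUM; a minimal sketch (chunk size and contents are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank holds one chunk of the distributed array. */
    double chunk[4] = {rank, rank + 1.0, rank + 2.0, rank + 3.0};
    double local = 0.0, total = 0.0;
    for (int i = 0; i < 4; i++)
        local += chunk[i];            /* local partial sum */

    /* The root (rank 0) receives the sum of all the partial sums. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", total);
    MPI_Finalize();
    return 0;
}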

Reduction Operations Examples
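The slide's table of examples is not reproduced in this transcript. For reference, MPI predefines reduction operators including MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN, MPI_LAND, MPI_LOR, MPI_BAND, and MPI_BOR. A short sketch of a non-sum reduction, using MPI_Allreduce with MPI_MAX so that every rank receives the result:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, local, global;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = rank * rank;              /* some per-rank quantity (illustrative) */

    /* Like MPI_Reduce, but the combined result is returned on every rank. */
    MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

    printf("rank %d: global maximum is %d\n", rank, global);
    MPI_Finalize();
    return 0;
}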