CS4402 – Parallel Computing


CS4402 – Parallel Computing
Lecture 9 – Sorting Algorithms (2)
- Compare and Exchange Operation
- Compare and Exchange Sorting

Compare and Exchange Operation
Takes place between processors rank1 and rank2. Each processor keeps a sorted sub-array a = (a[i], i = 0, 1, ..., n-1). The two processors swap sub-arrays, both merge them, and rank1 keeps the lower half while rank2 keeps the upper half.

if (rank == rank1) {
    MPI_Send(a, n, MPI_INT, rank2, tag1, MPI_COMM_WORLD);
    MPI_Recv(b, n, MPI_INT, rank2, tag2, MPI_COMM_WORLD, &status);
    c = merge(n, a, n, b);
    for (i = 0; i < n; i++) a[i] = c[i];      /* keep the smaller n elements */
}
if (rank == rank2) {
    MPI_Send(a, n, MPI_INT, rank1, tag2, MPI_COMM_WORLD);
    MPI_Recv(b, n, MPI_INT, rank1, tag1, MPI_COMM_WORLD, &status);
    c = merge(n, a, n, b);
    for (i = 0; i < n; i++) a[i] = c[i + n];  /* keep the larger n elements */
}

Compare and Exchange Operation – Complexity?
How much computation is used?
How much communication takes place?
Can you find arguments to prove that this is optimal or efficient?
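Later slides call this operation through an exchange(n, a, rank1, rank2, comm) helper. Below is a minimal sketch of how it might be packaged, using MPI_DOUBLE to match the odd-even program later in the lecture; merge(n, a, n, b) is assumed to return a freshly allocated sorted array of 2n elements, and the wrapper itself is an assumption rather than the lecture's own code.

void exchange(int n, double *a, int rank1, int rank2, MPI_Comm comm) {
    int rank, i;
    double *b = (double *) malloc(n * sizeof(double)), *c;
    MPI_Status status;
    MPI_Comm_rank(comm, &rank);
    if (rank == rank1) {
        /* rank1 sends first, then receives, and keeps the lower half */
        MPI_Send(a, n, MPI_DOUBLE, rank2, 0, comm);
        MPI_Recv(b, n, MPI_DOUBLE, rank2, 0, comm, &status);
        c = merge(n, a, n, b);
        for (i = 0; i < n; i++) a[i] = c[i];
        free(c);
    } else if (rank == rank2) {
        /* rank2 receives first, then sends, and keeps the upper half */
        MPI_Recv(b, n, MPI_DOUBLE, rank1, 0, comm, &status);
        MPI_Send(a, n, MPI_DOUBLE, rank1, 0, comm);
        c = merge(n, a, n, b);
        for (i = 0; i < n; i++) a[i] = c[i + n];
        free(c);
    }
    free(b);
}

Making rank2 receive before it sends avoids the deadlock that two blocking MPI_Send calls could otherwise cause for large n.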

Compare and Exchange Algorithms
Step 1. The array is scattered onto p sub-arrays.
Step 2. Processor rank sorts its sub-array; at all times the processors keep their sub-arrays sorted.
Step 3. While the array is not sorted / while exchanges are still needed: compare and exchange between pairs of processors.
Step 4. Gather the sub-arrays to restore a sorted array.

Bubble Sort

(Three slides of figures illustrating sequential bubble sort as a sequence of compare-and-exchange steps; the images are not reproduced in the transcript.)

Odd-Even Sort
1. Scatter the array onto processors.
2. Sort each sub-array aa.
3. Repeat for step = 0, 1, 2, ..., p-1:
   if (step is odd) {
       if (rank is odd)  exchange(aa, n/size, rank, rank+1);
       if (rank is even) exchange(aa, n/size, rank-1, rank);
   }
   if (step is even) {
       if (rank is even) exchange(aa, n/size, rank, rank+1);
       if (rank is odd)  exchange(aa, n/size, rank-1, rank);
   }
   (the first and last processors skip an exchange when the partner rank-1 or rank+1 falls outside 0..size-1)
4. Gather the sub-arrays back to root.

Odd-Even Sort – Simple Remarks
- Odd-Even Sort uses size rounds of exchange.
- Odd-Even Sort keeps all processors busy ... or almost all.
- The complexity is given by:
  - Scatter and Gather of the array → n/size elements
  - Sorting a sub-array → n/size elements
  - Compare and exchange → size rounds, each involving n/size elements

/* Odd-even sort: body of main(). Declarations, MPI_Init/Finalize, and the
   allocation of the local buffer a (n/size doubles) are omitted. */
if (rank == 0) {
    array = (double *) calloc(n, sizeof(double));
    srand((unsigned) time(NULL) + rank);
    for (x = 0; x < n; x++)
        array[x] = ((double) rand() / RAND_MAX) * m;   /* random doubles in [0, m] */
}
MPI_Scatter(array, n/size, MPI_DOUBLE, a, n/size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
merge_sort(n/size, a);                                 /* local sort */
for (i = 0; i < size; i++) {                           /* size rounds of exchange */
    if ((i + rank) % 2 == 0) {
        if (rank < size-1)
            exchange(n/size, a, rank, rank+1, MPI_COMM_WORLD);
    }
    else if (rank > 0)
        exchange(n/size, a, rank-1, rank, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);
}
MPI_Gather(a, n/size, MPI_DOUBLE, array, n/size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
if (rank == 0)                                         /* only root holds the full array */
    for (x = 0; x < n; x++)
        printf("Output : %f\n", array[x]);
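One possible way to build and run the program above (file and executable names are illustrative): compile with mpicc, e.g. mpicc -O2 -o oddeven oddeven.c, then run with mpirun -np 4 ./oddeven, choosing n as a multiple of the number of processes so that MPI_Scatter distributes equal blocks.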

Comments on Odd-Even
Features of the algorithm:
- Simple and quite efficient.
- In p steps of compare and exchange the array is sorted. Why???
- The number of steps can be reduced by testing "array sorted", but it remains O(p) in the worst case.
- C&E operations take place only between neighbours. Can we do C&E operations between other processors?

Odd-Even Sort Complexity
Stage 0. Sort the scattered sub-array → O((n/p) · log(n/p))
Stage 1. Odd-Even for p rounds of merging n/p elements → O(p · n/p) = O(n)
Scatter and Gather → O(n)
Total computation complexity → O((n/p) · log(n/p) + n)

isSorted(n, a, comm)
The parallel routine int isSorted(int n, double *a, MPI_Comm comm) tests whether the processors hold all the local arrays in order:
rank1 < rank2 → the elements of rank1 precede the elements of rank2.
If the answer is yes then no exchange is needed. How to do it?
- The test is done at the root.
- The test is done collectively by all processors.

isSorted(n,a,comm) – Strategy 1
The test is done collectively by all processors:
- Send the last element to the right processor.
- Receive the last element from the left processor.
- If the received last > a[0] then answer = 0.
- All_Reduce the answer by using MIN.
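A minimal sketch of Strategy 1, assuming a is locally sorted on every processor; the routine name and signature come from the slide above, everything else is one possible implementation:

int isSorted(int n, double *a, MPI_Comm comm) {
    int rank, size, answer = 1, global;
    double left_last;
    MPI_Status status;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    /* send my last element right, receive my left neighbour's last element */
    MPI_Sendrecv(&a[n-1], 1, MPI_DOUBLE, (rank < size-1) ? rank+1 : MPI_PROC_NULL, 0,
                 &left_last, 1, MPI_DOUBLE, (rank > 0) ? rank-1 : MPI_PROC_NULL, 0,
                 comm, &status);
    /* the boundary is out of order if the neighbour's last exceeds my first */
    if (rank > 0 && left_last > a[0]) answer = 0;
    /* answer is 1 everywhere iff every boundary is in order */
    MPI_Allreduce(&answer, &global, 1, MPI_INT, MPI_MIN, comm);
    return global;
}

Using MPI_PROC_NULL for the missing neighbours keeps the boundary ranks out of special cases.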

isSorted(n,a,comm) – Strategy 2
The test is done at the root:
- Gather the first elements to the root.
- Gather the last elements to the root.
- If rank == 0 then, for size-1 times, test if last[i] > first[i+1].
- Broadcast the answer.
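A corresponding sketch of Strategy 2; again only the name and signature are from the slides, and the buffers first and last are illustrative:

int isSorted(int n, double *a, MPI_Comm comm) {
    int rank, size, i, answer = 1;
    double *first = NULL, *last = NULL;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    if (rank == 0) {
        first = (double *) malloc(size * sizeof(double));
        last  = (double *) malloc(size * sizeof(double));
    }
    /* collect each processor's first and last element at the root */
    MPI_Gather(&a[0],   1, MPI_DOUBLE, first, 1, MPI_DOUBLE, 0, comm);
    MPI_Gather(&a[n-1], 1, MPI_DOUBLE, last,  1, MPI_DOUBLE, 0, comm);
    if (rank == 0) {
        for (i = 0; i < size-1; i++)
            if (last[i] > first[i+1]) answer = 0;   /* boundary out of order */
        free(first); free(last);
    }
    /* everyone learns the root's verdict */
    MPI_Bcast(&answer, 1, MPI_INT, 0, comm);
    return answer;
}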

Shell Sort
It is based on the notion of a "shell"/group of consecutive processors:
- C&E takes place between equally extreme processors of a shell (first with last, second with second-last, ...).
- The shell is then divided into 2.
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) → #(shell) = p
(0 1 2 3 4 5 6 7) (8 9 10 11 12 13 14 15) → #(shell) = p/2
(0 1 2 3) (4 5 6 7) (8 9 10 11) (12 13 14 15) → #(shell) = p/4
(0 1) (2 3) (4 5) (6 7) (8 9) (10 11) (12 13) (14 15) → #(shell) = p/8
- There are log(p) levels of division. At level l there are pow(2, l) shells, each of size p/pow(2, l); shell k contains the processors k·p/pow(2,l), ..., (k+1)·p/pow(2,l) - 1.

Shell Sort
Shell Sort is based on two stages (a sketch of the Stage 1 pairing follows below):
Stage 1. Divide the shells: for l = 0, 1, 2, ..., log(p)
- exchange in parallel between equally extreme processors in each shell.
Stage 2. Odd-Even: for l = 0, 1, 2, ..., p
- if rank and l are both even then exchange in parallel between rank and rank+1
- if rank and l are both odd then exchange in parallel between rank and rank+1
- test "array sorted" and stop early if it holds.
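A minimal sketch of the Stage 1 pairing, assuming size is a power of two and reusing the exchange() helper from earlier; the loop structure is an illustration, not the lecture's reference code:

/* Stage 1 of shell sort: pair equally extreme processors of each shell,
   then halve the shells. log2(size) iterations in total. */
int shell = size;                       /* current shell size, starts at p */
while (shell > 1) {
    int lo = (rank / shell) * shell;    /* first rank of my shell */
    int hi = lo + shell - 1;            /* last rank of my shell  */
    int partner = lo + hi - rank;       /* equally extreme partner */
    if (rank < partner)
        exchange(n/size, a, rank, partner, MPI_COMM_WORLD);
    else
        exchange(n/size, a, partner, rank, MPI_COMM_WORLD);
    shell /= 2;                         /* divide each shell into two */
}

Every processor in a shell has a distinct partner, so all exchanges within a level proceed in parallel.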

Shell Sort Complexity
Stage 0. Sort the scattered sub-array → O((n/p) · log(n/p))
Stage 1+2. Divide the shells, then Odd-Even for l rounds → O(l · n/p)
Catch → the average number of rounds l is O(log²(p)), so that on average the exchange phase costs O(log²(p) · n/p) instead of O(n).
Scatter and Gather → O(n)
Total computation complexity → O((n/p) · log(n/p) + n) in the worst case; lower on average.

Complexity Comparison for Parallel Sorting
Odd-Even Sort → O((n/p) · log(n/p) + n)
Shell Sort → O((n/p) · log(n/p) + log²(p) · n/p) on average
Merge Sort → O((n/p) · log(n/p) + n)

Assignment
Description: Write an MPI program to sort an array:
- Use an MPI method to compare and exchange.
- Use an MPI method to test isSorted().
- Use the odd-even sort.
- Evaluate the performance of the program in a readme.doc.
General Points:
- It is worth 10% of the marks.
- Deadline: Monday 2/12/2013 at 5 pm.
- The following elements must be submitted by email to j.horan@4c.ucc.ie:
  - The C program, named with your name and student number, e.g. SabinTabirca_111111111.c.
  - The Makefile.
  - The readme.doc, in which you 1) give your student details and 2) state the performance results.