Partitioning and Divide-and-Conquer Strategies

Parallel Strategies

Partitioning consists of the following steps:
- Divide the problem into parts
- Compute each part separately
- Merge the results

Divide and Conquer
- Divide the problem recursively into sub-problems of the same type
- Assign sub-problems to individual processors (e.g. divide and hold half)

Domain (Data) Decomposition
- Assign parts of the data to separate processors

Functional Decomposition
- Assign application functions to separate processors

Partitioning and Divide and Conquer: Example Applications

Applications:
- Summing numbers
- Sorting algorithms (bucket sort)
- Numerical integration (adaptive quadrature)
- The N-body problem

Strategies:
- Partitioning
- Divide and conquer (divide and hold half; divide, compute, and merge)

Bucket Sort Partitioning
- Bucket sort works well if the numbers are uniformly distributed across a known interval (e.g. 0 to 1)
- Communication reduction is possible if each processor sends a small bucket to each other processor

[Figures: unsorted numbers passing through the sequential bucket sort, and through the parallel bucket sort starting at P0, to produce the sorted output]
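The sequential half of this scheme can be sketched in Python. This is a minimal illustration, not course code; the bucket count and interval bounds are assumptions:

```python
def bucket_sort(values, n_buckets=4, lo=0.0, hi=1.0):
    """Sequential bucket sort for values uniformly distributed in [lo, hi)."""
    width = (hi - lo) / n_buckets
    buckets = [[] for _ in range(n_buckets)]
    for v in values:
        # Map each value to a bucket; clamp the top endpoint into the last bucket.
        i = min(int((v - lo) / width), n_buckets - 1)
        buckets[i].append(v)
    result = []
    for b in buckets:
        b.sort()          # each bucket is sorted independently (the parallel step)
        result.extend(b)  # buckets are already ordered, so concatenation finishes the sort
    return result
```

In the parallel version each processor would own one bucket, partition its share of the input into small per-destination buckets, and exchange them all-to-all before the local sorts.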

All-To-All Broadcast
- Bucket sort is a possible application of the all-to-all broadcast
- All-to-all is also useful for transposing matrices

[Figure: each processor Pi exchanges a buffer with every other processor P0 ... P(p-1)]

Divide and Conquer: Sum of N Numbers

Two conditions required for recursive solutions:
- How does the recursion terminate?
- How does a problem of size n relate to a problem of size < n?

Pseudo code:
  If fewer than two numbers, return the sum
  Divide the problem into two parts
  Recursive call to sum the first part
  Recursive call to sum the second part
  Merge the two partial sums and return the total

Parallel implementation with eight processors:
- P0 keeps half and sends half to P4
- P0 and P4 keep half and send half to P2 and P6 respectively
- P0, P2, P4, P6 keep half and send half to P1, P3, P5, P7 respectively
- Perform the computation in parallel
- Merge phase: non-leaf processors receive and reduce results; non-root processors send their result to the parent processor
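The pseudo code above can be written directly in Python; here recursion stands in for the send/receive steps, so this is a sequential sketch of the eight-processor scheme, not MPI code:

```python
def dc_sum(nums):
    """Divide-and-conquer sum: each call keeps half and 'sends' the other half on.

    In the slide's scheme each recursive call would run on a separate
    processor (P0 sends to P4, then P0/P4 to P2/P6, and so on)."""
    if len(nums) < 2:                # termination: fewer than two numbers
        return sum(nums)
    mid = len(nums) // 2
    left = dc_sum(nums[:mid])        # this half is kept
    right = dc_sum(nums[mid:])       # this half is handed to another processor
    return left + right              # merge phase: reduce the partial sums
```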

Numerical Integration

Approximate the area under the curve between a and b using strips of width δ, computed either as rectangles or as trapezoids.

Difficulties:
- How do we choose the value for δ?
- Some parts of the integral require a smaller δ than others

[Figures: the interval [a, b] divided into strips of width δ from p to q, approximated with rectangles and with trapezoids]
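The trapezoid variant can be sketched as follows; the fixed δ is exactly the difficulty the slide raises, since one value must serve the whole interval:

```python
def integrate(f, a, b, delta):
    """Approximate the integral of f over [a, b] with trapezoids of width delta."""
    total, p = 0.0, a
    while p < b:
        q = min(p + delta, b)                  # clip the last strip at b
        total += (f(p) + f(q)) / 2 * (q - p)   # trapezoid area over [p, q]
        p = q
    return total
```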

Adaptive Quadrature

Pseudo code:
  p = a; δ = b - a
  WHILE p < b
      q = (p + δ > b) ? b : p + δ
      x = (p + q) / 2
      Compute areas A, B, and C
      WHILE C > tolerance
          δ /= 2 (and recompute q, x, A, B, C)
      p += δ; δ *= 2

Notes and questions:
- When do we terminate?
- Termination rates differ across the interval
- Can we balance processor load?

[Figures: the interval [p, q] with midpoint x and areas A, B, C; subdivision of a region stops once C ≈ 0]
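The slide adapts δ in a loop; an equivalent and more common recursive formulation is sketched below. The areas follow the figure, with A the single trapezoid over the whole region and B, C the two half-width trapezoids; the tolerance value and the exact stopping test are assumptions:

```python
def adaptive_quad(f, a, b, tol=1e-6):
    """Recursive adaptive quadrature with trapezoids.

    Compare one trapezoid over [a, b] (A) with two half-width trapezoids
    (B over [a, x], C over [x, b]); if they still disagree, the region has
    not converged, so halve the step by recursing on each half."""
    x = (a + b) / 2
    A = (f(a) + f(b)) / 2 * (b - a)
    B = (f(a) + f(x)) / 2 * (x - a)
    C = (f(x) + f(b)) / 2 * (b - x)
    if abs(A - (B + C)) < tol:      # the refinement changed little: accept
        return B + C
    # Split the tolerance so the total error stays bounded.
    return adaptive_quad(f, a, x, tol / 2) + adaptive_quad(f, x, b, tol / 2)
```

Note how the recursion depth, and hence the work, differs from region to region, which is why the slide asks about balancing processor load.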

Parallel Numerical Integration

Sequential algorithm:
  Choose a δ
  For each region x_i in the integral
      sum += f(x_i) * δ

Parallel algorithm:
- Static assignment (question: how to choose δ?)
  - Send a region to each processor
  - Processors perform the computation in parallel
  - A reduce-add operation computes the final result
- Dynamic assignment
  - Adaptive quadrature varies the convergence rates
  - Use a work-pool approach for assigning regions
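The work-pool idea can be illustrated with a sequential simulation; the task count, worker count, and round-robin stand-in for "next idle worker" are all illustrative assumptions, not part of the course material:

```python
from queue import Queue

def work_pool_integrate(f, a, b, n_tasks, n_workers, delta):
    """Work-pool (dynamic assignment) sketch, simulated sequentially.

    The interval is split into more tasks than workers; an idle worker
    repeatedly takes the next region from the pool, so slow-converging
    regions do not stall the fast ones."""
    pool = Queue()
    width = (b - a) / n_tasks
    for i in range(n_tasks):
        pool.put((a + i * width, a + (i + 1) * width))
    partial = [0.0] * n_workers
    w = 0
    while not pool.empty():
        lo, hi = pool.get()           # a worker grabs the next region
        p, s = lo, 0.0
        while p < hi:                 # trapezoid rule over this region
            q = min(p + delta, hi)
            s += (f(p) + f(q)) / 2 * (q - p)
            p = q
        partial[w] += s
        w = (w + 1) % n_workers       # round-robin stands in for 'next idle worker'
    return sum(partial)               # the reduce-add of partial results
```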

Gravitational N-Body Problem
- Predict the positions and movements of bodies in space
- Used in astrophysics and molecular dynamics
- Based on the Newtonian laws of physics

Formulae:
  F = G m_x m_y / r_xy^2
  F = m a

Notation:
  G = gravitational constant
  m_x, m_y = masses of bodies x and y
  r_xy = distance between x and y
  a = acceleration
  F = force between bodies

Force in three dimensions (x component):
  F_x = G m_x m_y r_x / r_xy^3
  where r_x = distance in the x direction

The N-Body Problem
Arises in astronomical systems, electrical charges, etc.

Sequential solution pseudo code:
  For each time step t
      Compute pair-wise forces: F_x = G m_a m_b (x_a - x_b) / r^3
      Compute the acceleration on each body: F = m a
      Compute the velocity of each body: v_{t+1} = v_t + a Δt
      Compute the new position of each body: x_{t+1} = x_t + v_{t+1} Δt

Parallel solution notes:
- Partition the bodies among the processors
- Communication costs are relatively high
- This O(n^2) algorithm doesn't scale well to large numbers of bodies
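One step of the pseudo code above, reduced to one dimension for brevity, might look like this. It is a sketch of the O(n^2) all-pairs update only, with no partitioning or communication:

```python
G = 6.674e-11  # gravitational constant

def nbody_step(pos, vel, mass, dt):
    """One time step of the all-pairs n-body update, in 1-D for brevity."""
    n = len(pos)
    acc = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pos[j] - pos[i]
                # F_x = G m_i m_j (x_j - x_i) / r^3, and a = F / m_i
                acc[i] += G * mass[j] * r / abs(r) ** 3
    vel = [v + a * dt for v, a in zip(vel, acc)]       # v_{t+1} = v_t + a Δt
    pos = [x + v * dt for x, v in zip(pos, vel)]       # x_{t+1} = x_t + v_{t+1} Δt
    return pos, vel
```

The doubly nested loop is the n^2 cost the slide warns about: every body interacts with every other body at every step.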

Barnes and Hut Solution
- Two-dimensional recursive division of space: treat a distant cluster as a single body at its center of mass
- O(N lg N) complexity instead of O(N^2)

Pseudo code:
  FOR each time step t
      Perform the recursive division
      All-to-all broadcast the essential tree
      Perform the force calculations in parallel
      Output visualization data

Questions:
- What is the best way to partition the n bodies?
- Should the partitioning be dynamic or static?

[Figure: recursive two-dimensional division of space; a cluster at distance r is replaced by its center of mass]
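The recursive division step can be sketched as a quadtree build; the dictionary representation and field names are illustrative choices, and the force-calculation and tree-exchange phases are omitted:

```python
def summarize(bodies):
    """Total mass and center of mass of a group of bodies."""
    m = sum(b["m"] for b in bodies)
    if m == 0:
        return {"m": 0.0, "cx": 0.0, "cy": 0.0}
    return {"m": m,
            "cx": sum(b["m"] * b["x"] for b in bodies) / m,
            "cy": sum(b["m"] * b["y"] for b in bodies) / m}

def build_tree(bodies, x0, y0, size):
    """Recursive 2-D division: split the square at (x0, y0) into four
    quadrants until each leaf holds at most one body.  Every node stores
    the total mass and center of mass of its subtree, so a distant
    cluster can later be treated as a single body.

    Assumes distinct positions; coincident bodies would recurse forever."""
    if len(bodies) <= 1:
        return {"children": None, **summarize(bodies)}
    half = size / 2
    quads = [[], [], [], []]
    for b in bodies:
        i = (1 if b["x"] >= x0 + half else 0) + (2 if b["y"] >= y0 + half else 0)
        quads[i].append(b)
    children = [
        build_tree(quads[0], x0, y0, half),
        build_tree(quads[1], x0 + half, y0, half),
        build_tree(quads[2], x0, y0 + half, half),
        build_tree(quads[3], x0 + half, y0 + half, half),
    ]
    return {"children": children, **summarize(bodies)}
```

A force calculation would then walk this tree, descending into a node's children only when the node is too close for its center of mass to be a good approximation, which is where the O(N lg N) behavior comes from.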