1 Friday, September 29, 2006 If all you have is a hammer, then everything looks like a nail. -Anonymous

2 Domain Decomposition
  1. Divide data into approximately equal parts
  2. Partition the computation
  Functional Decomposition
  1. Divide the computation into disjoint tasks
  2. Determine data requirements of these tasks

3 Domain Decomposition
  1. Divide data into approximately equal parts
  2. Partition the computation
  Functional Decomposition
  1. Divide the computation into disjoint tasks
  2. Determine data requirements of these tasks
     - If the data is disjoint, the partition is complete
     - If there is significant overlap, consider domain decomposition instead
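
As a concrete illustration of the data-division step of domain decomposition, here is a minimal C sketch (the function name block_range and the numbers are illustrative, not from the slides) that assigns each of p tasks an approximately equal contiguous block of n data items; the first n % p tasks get one extra item, so the imbalance is at most one element.

    #include <stdio.h>

    /* Domain decomposition sketch: give task `rank` (0..p-1) an approximately
     * equal contiguous block of n data items, as a half-open range [lo, hi). */
    void block_range(int n, int p, int rank, int *lo, int *hi)
    {
        int base = n / p, extra = n % p;
        *lo = rank * base + (rank < extra ? rank : extra);
        *hi = *lo + base + (rank < extra ? 1 : 0);
    }

    int main(void)
    {
        int n = 10, p = 4;
        for (int r = 0; r < p; r++) {
            int lo, hi;
            block_range(n, p, r, &lo, &hi);
            printf("task %d owns [%d, %d)\n", r, lo, hi);
        }
        return 0;
    }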

4 Task dependency graph?

5 Typically the maximum degree of concurrency is less than the total number of tasks.

6 The degree of concurrency depends on the shape of the task dependency graph.
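
One way to make the link between graph shape and concurrency concrete: for a small dependency graph, compute the length of the longest dependency chain (the critical path) and the size of the widest "level". Tasks on the same level have no path between them, so the widest level is a simple lower bound on the maximum degree of concurrency. The C sketch below uses a made-up fork-join graph; it is an illustration, not material from the slides.

    #include <stdio.h>

    #define T 6   /* tasks, assumed to be numbered in topological order */
    #define E 7   /* dependency edges */

    int main(void)
    {
        /* Edge (i, j) means task j cannot start until task i has finished.
         * The example graph is a small fork-join: 0 -> {1,2,3} -> 4 -> 5. */
        int edge[E][2] = {{0,1},{0,2},{0,3},{1,4},{2,4},{3,4},{4,5}};
        int level[T] = {0}, width[T] = {0}, depth = 0, max_width = 0;

        /* level[j] = length of the longest dependency chain ending at task j */
        for (int j = 0; j < T; j++)
            for (int e = 0; e < E; e++)
                if (edge[e][1] == j && level[edge[e][0]] + 1 > level[j])
                    level[j] = level[edge[e][0]] + 1;

        for (int j = 0; j < T; j++) {
            width[level[j]]++;                /* tasks on one level are mutually independent */
            if (level[j] + 1 > depth) depth = level[j] + 1;
        }
        for (int l = 0; l < depth; l++)
            if (width[l] > max_width) max_width = width[l];

        printf("critical path: %d levels, widest level: %d tasks\n", depth, max_width);
        return 0;
    }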

7

8 Mapping
  - Maximize the use of concurrency.
  - Task dependencies and interactions are important in the selection of a good mapping.
  - Minimize completion time by making sure that the processes on the critical path execute as soon as they are ready.

9 Mapping
  - Maximize concurrency and minimize interaction among processors:
    - Place tasks that can execute independently on different processors to increase concurrency.
    - Place tasks that communicate frequently on the same processor to increase locality.
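
Purely to illustrate how a mapping can trade locality against balance, the sketch below contrasts two common static task-to-process mappings; the function names and the numbers are invented for this example and do not come from the slides.

    #include <stdio.h>

    /* Block mapping keeps neighbouring tasks (which usually interact the most)
     * on the same process, helping locality; cyclic mapping spreads tasks whose
     * cost varies with their index, helping balance. */
    static int block_map(int task, int t, int p)  { return task / ((t + p - 1) / p); }
    static int cyclic_map(int task, int t, int p) { (void)t; return task % p; }

    int main(void)
    {
        int t = 12, p = 4;
        for (int task = 0; task < t; task++)
            printf("task %2d -> block: P%d  cyclic: P%d\n",
                   task, block_map(task, t, p), cyclic_map(task, t, p));
        return 0;
    }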

10 Mapping: Cannot use more than 4 processors. Why?

11 Mapping: Prevent inter-task interaction from becoming inter-process interaction.

12 Agglomeration

13 Data partitioning: Block distribution
   Higher-dimensional distributions may help reduce the amount of shared data that needs to be accessed.

14 Data partitioning: Block distribution
   Higher-dimensional distributions may help reduce the amount of shared data that needs to be accessed.
   For an n x n computation over p processes, a 1-D block distribution accesses roughly n^2/p + n^2 shared elements per process, versus 2n^2/√p for a 2-D block distribution.
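
The short program below simply evaluates the two expressions for example values of n and p (picked arbitrarily here) to show the size of the difference; link with -lm for sqrt.

    #include <stdio.h>
    #include <math.h>

    /* Shared data accessed per process for an n x n block-distributed computation:
     * 1-D (row) blocks:  n*n/p + n*n     elements,
     * 2-D blocks:        2*n*n/sqrt(p)   elements. */
    int main(void)
    {
        double n = 4096, p = 64;
        printf("1-D block: %.3g elements accessed per process\n", n * n / p + n * n);
        printf("2-D block: %.3g elements accessed per process\n", 2.0 * n * n / sqrt(p));
        return 0;
    }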

15 Sum N numbers
   - N numbers are distributed among N tasks
   - Centralized algorithm

16 Sum N numbers
   - N numbers are distributed among N tasks
   - Distributing the computation

17 Recursive Decomposition
   - Divide and conquer
   - Set of independent sub-problems

18 Recursive Decomposition
   - Sum N numbers. How many steps are required?
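
With one task per element and pairwise combination, the answer is about ceil(log2(N)) combining steps. The sequential C sketch below only shows the recursive decomposition structure (the two halves of each call are independent sub-problems); how they would actually be run in parallel is left open, and the array contents are arbitrary.

    #include <stdio.h>

    /* Divide-and-conquer sum over the half-open range [lo, hi). */
    static long rsum(const int *a, int lo, int hi)
    {
        if (hi - lo == 1) return a[lo];
        int mid = lo + (hi - lo) / 2;
        return rsum(a, lo, mid) + rsum(a, mid, hi);   /* independent sub-problems */
    }

    int main(void)
    {
        int a[8] = {3, 1, 4, 1, 5, 9, 2, 6};
        printf("sum = %ld\n", rsum(a, 0, 8));         /* 8 numbers -> 3 combining levels */
        return 0;
    }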

19

20

21 Hybrid decomposition
   - It is possible to combine different techniques.
   - Finding the minimum of a large set of numbers by purely recursive decomposition is not efficient; a hybrid scheme is sketched below.
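
A hedged sketch of such a hybrid scheme: phase 1 is a data decomposition (each of P tasks scans its own block for a local minimum), phase 2 combines the P partial results pairwise, tree-style, in about log2(P) steps instead of recursing all the way down to single elements. The sequential C code only mimics the structure; P and the data are made up.

    #include <stdio.h>
    #include <limits.h>

    #define P 4

    int main(void)
    {
        int a[] = {17, 4, 42, 9, 3, 25, 8, 11, 30, 6, 19, 2};
        int n = sizeof a / sizeof a[0];
        int partial[P];

        /* Phase 1: block decomposition, one local minimum per task. */
        for (int t = 0; t < P; t++) {
            int lo = t * n / P, hi = (t + 1) * n / P;
            partial[t] = INT_MAX;
            for (int i = lo; i < hi; i++)
                if (a[i] < partial[t]) partial[t] = a[i];
        }

        /* Phase 2: pairwise (tree) combination of the partial results. */
        for (int stride = 1; stride < P; stride *= 2)
            for (int t = 0; t + stride < P; t += 2 * stride)
                if (partial[t + stride] < partial[t]) partial[t] = partial[t + stride];

        printf("minimum = %d\n", partial[0]);
        return 0;
    }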

22 Hybrid decomposition

23 Exploratory Decomposition

24 Unfinished tasks can be terminated once a solution is found.
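
One possible way to realise this, sketched in C with OpenMP (the slides do not prescribe a programming model): searching tasks poll a shared flag and skip their remaining work once another task has set it. The search space and target value are invented for the example; compile with -fopenmp.

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        const long space  = 50000000L;     /* pretend search space */
        const long target = 36500021L;     /* hypothetical unique solution */
        int  found  = 0;
        long winner = -1;

        #pragma omp parallel for schedule(static)
        for (long i = 0; i < space; i++) {
            int done;
            #pragma omp atomic read
            done = found;
            if (done) continue;            /* another task already succeeded */
            if (i == target) {             /* only one iteration matches */
                winner = i;
                #pragma omp atomic write
                found = 1;
            }
        }
        printf("solution found at %ld\n", winner);
        return 0;
    }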

25

26 Read: Speculative Decomposition

27 Communications
   - Most parallel problems need to communicate data between different tasks.

28 Embarrassingly Parallel Applications
   - No communication between tasks.
   - One end of the spectrum of parallelization.
   - Examples?

29 Factors involving communication
   - Machine cycles and resources that could be used for computation are instead used to package and transmit data.
   - Sending many small messages can cause latency to dominate communication overheads.
   - Packaging small messages into a larger message results in increased effective bandwidth (see the sketch below).
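
As one concrete realisation of the aggregation point (assuming MPI, which these slides do not name), the sketch below sends 1000 doubles in a single message rather than 1000 one-element messages, so the per-message latency is paid once; run with at least two processes, e.g. mpirun -np 2.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        enum { M = 1000 };
        double buf[M];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            for (int i = 0; i < M; i++) buf[i] = (double)i;
            /* Aggregated: one send, one latency cost.  The naive alternative
             * would call MPI_Send once per element, paying the latency M times. */
            MPI_Send(buf, M, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, M, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d doubles in one message\n", M);
        }
        MPI_Finalize();
        return 0;
    }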

30 Factors involving communication
   - Synchronous communications.
   - Asynchronous communications.
   - Interleaving computation with communication (see the sketch below).
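
A hedged MPI sketch of the third point (MPI is again an assumption, not something the slide names): nonblocking sends and receives are started, independent local work proceeds while the transfer is in flight, and only then does the process wait. Intended to be run with exactly two processes.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, other;
        double out[1000], in[1000], local = 0.0;
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        other = 1 - rank;                        /* partner process (0 <-> 1) */

        for (int i = 0; i < 1000; i++) out[i] = rank + i;

        /* Start the exchange without blocking. */
        MPI_Irecv(in,  1000, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(out, 1000, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Work that needs only local data overlaps the communication. */
        for (int i = 0; i < 1000; i++) local += out[i] * out[i];

        /* Wait before touching the received buffer. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d: local = %g, first received value = %g\n", rank, local, in[0]);

        MPI_Finalize();
        return 0;
    }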

31 Serial pseudocode for the Monte Carlo estimate of pi:

    npoints = ...             (total number of random points)
    circle_count = 0
    do j = 1, npoints
        generate 2 random numbers between 0 and 1
        xcoordinate = random1
        ycoordinate = random2
        if (xcoordinate, ycoordinate) inside circle then
            circle_count = circle_count + 1
    end do
    PI = 4.0 * circle_count / npoints
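
The loop above is embarrassingly parallel: every iteration is independent. One possible MPI parallelisation (a sketch under that assumption, not code from the slides) gives each process an equal share of the points and combines the counts with a single reduction.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        long npoints = 10000000L, local_count = 0, circle_count = 0;
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        srand(1234 + rank);                      /* a different stream per process */
        for (long j = 0; j < npoints / size; j++) {
            double x = (double)rand() / RAND_MAX;
            double y = (double)rand() / RAND_MAX;
            if (x * x + y * y <= 1.0) local_count++;
        }

        /* The only communication: sum the per-process counts on rank 0. */
        MPI_Reduce(&local_count, &circle_count, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi ~ %f\n", 4.0 * circle_count / ((npoints / size) * (double)size));

        MPI_Finalize();
        return 0;
    }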

32