
INTRODUCTION TO PARALLEL ALGORITHMS

Objective
 Introduction to Parallel Algorithms: Tasks and Decomposition; Processes and Mapping; Processes versus Processors
 Characteristics of Tasks and Interactions
 Parallel Algorithm Design Models

What is a Parallel Algorithm?
 Imagine you needed to find a lost child in the woods.
 Even in a small area, searching by yourself would be very time-consuming.
 Now, if you gathered some friends and family to help, you could cover the woods much faster…

Sherwood Forest

Definition
 In computer science, a parallel algorithm (or concurrent algorithm), as opposed to a traditional sequential (or serial) algorithm, is an algorithm that can be executed a piece at a time on many different processing devices, with the pieces combined at the end to produce the correct result.

Elements of a Parallel Algorithm
 Pieces of work that can be done concurrently: tasks
 Mapping of the tasks onto multiple processors: processes vs. processors
 Distribution of input, output, and intermediate data across the different processors
 Management of access to shared data, whether input or intermediate
 Synchronization of the processors at various points of the parallel execution
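These elements show up even in a tiny example. The following is a minimal sketch in C with OpenMP (illustrative, not code from the slides): the loop iterations are the concurrent tasks, the runtime maps them to threads, the array x is shared input data, and the reduction together with the loop's implicit barrier handles shared-data access and synchronization.

/* Minimal sketch: parallel array sum in C with OpenMP.
 * - tasks: each chunk of loop iterations is a piece of work done concurrently
 * - mapping: the OpenMP runtime assigns iteration chunks to threads
 * - shared data: the input array x is shared; the accumulator sum uses a
 *   reduction so concurrent updates stay correct
 * - synchronization: the implicit barrier at the end of the parallel loop */
#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1000000 };
    static double x[N];
    for (int i = 0; i < N; i++) x[i] = 1.0;    /* sample input */

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)   /* decompose, map, synchronize */
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}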

Decomposition, Tasks, and Dependency Graphs
 The first step in developing a parallel algorithm is to decompose the problem into tasks that can be executed concurrently.
 A given problem may be decomposed into tasks in many different ways.
 Tasks may be of the same, different, or even indeterminate sizes.
 A decomposition can be illustrated as a directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next. Such a graph is called a task-dependency graph.

Granularity of Task Decompositions
 The number of tasks into which a problem is decomposed determines its granularity.
 Decomposition into a large number of small tasks gives a fine-grained decomposition; decomposition into a small number of large tasks gives a coarse-grained decomposition.
A coarse-grained counterpart to the dense matrix-vector product example: each task in this example corresponds to the computation of three elements of the result vector.

Example: Multiplying a Dense Matrix with a Vector
 The computation of each element of the output vector y is independent of the other elements. Based on this, a dense matrix-vector product can be decomposed into n tasks, one per output element. The figure highlights the portion of the matrix and vector accessed by Task 1.
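A minimal sketch of this decomposition in C with OpenMP (illustrative, not the textbook's code; the names A, b, y, n and the row-major layout are assumptions): each loop iteration computes one element of y and is therefore one task, and the chunk size of the schedule controls the granularity discussed on the previous slide.

#include <omp.h>

/* y = A*b for an n-by-n row-major matrix A. Each loop iteration is one task
 * (it computes one element of y). schedule(static, chunk) sets the granularity:
 * chunk = 1 gives the fine-grained decomposition into n tasks; chunk = 3 gives
 * the coarse-grained version where each task computes three result elements. */
void matvec(int n, const double *A, const double *b, double *y, int chunk) {
    #pragma omp parallel for schedule(static, chunk)
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j < n; j++)
            s += A[(long)i * n + j] * b[j];
        y[i] = s;
    }
}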

Example: Database Query Processing
Consider the execution of the query:
MODEL = "CIVIC" AND YEAR = 2001 AND (COLOR = "GREEN" OR COLOR = "WHITE")
on the following database:

ID#    Model    Year  Color  Dealer  Price
4523   Civic    2002  Blue   MN      $18,000
…      Corolla  1999  White  IL      $15,…
…      Camry    2001  Green  NY      $21,…
…      Prius    2001  Green  CA      $18,…
…      Civic    2001  White  OR      $17,…
…      Altima   2001  Green  FL      $19,…
…      Maxima   2001  Blue   NY      $22,…
…      Accord   2000  Green  VT      $18,…
…      Civic    2001  Red    CA      $17,…
…      Civic    2002  Red    WA      $18,000

Example: Database Query Processing
 The execution of the query can be divided into subtasks in various ways. Each task can be thought of as generating an intermediate table of entries that satisfy a particular clause.
 The figure shows one decomposition of the query into tasks; edges in this graph denote that the output of one task is needed to accomplish the next.
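As a rough illustration (a hypothetical sketch in C, not the textbook's code), the three clause filters below are independent tasks that each produce an intermediate result, and only the final combination step depends on all of them. The record type and sample data are made up for the example.

#include <stdio.h>
#include <string.h>

typedef struct { const char *model; int year; const char *color; } Record;

static int is_civic(const Record *r)  { return strcmp(r->model, "Civic") == 0; }
static int is_2001(const Record *r)   { return r->year == 2001; }
static int is_green_or_white(const Record *r) {
    return strcmp(r->color, "Green") == 0 || strcmp(r->color, "White") == 0;
}

int main(void) {
    Record db[] = { {"Civic", 2001, "White"}, {"Corolla", 1999, "White"},
                    {"Civic", 2002, "Blue"},  {"Civic", 2001, "Red"} };
    int n = sizeof db / sizeof db[0];
    int civic[4], y2001[4], color[4];          /* intermediate "tables" */

    /* The three filters have no dependencies on one another, so they can run
     * as concurrent tasks; only the final AND depends on all three. */
    #pragma omp parallel sections
    {
        #pragma omp section
        for (int i = 0; i < n; i++) civic[i] = is_civic(&db[i]);
        #pragma omp section
        for (int i = 0; i < n; i++) y2001[i] = is_2001(&db[i]);
        #pragma omp section
        for (int i = 0; i < n; i++) color[i] = is_green_or_white(&db[i]);
    }

    for (int i = 0; i < n; i++)                /* combining step: the last task */
        if (civic[i] && y2001[i] && color[i])
            printf("match: %s %d %s\n", db[i].model, db[i].year, db[i].color);
    return 0;
}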

Task-Dependency Graph
Key Concepts Derived from the Task-Dependency Graph
 Degree of concurrency: the number of tasks that can be executed concurrently
 Critical path: the longest vertex-weighted path in the graph, where the weights represent task sizes
 Task granularity affects both of the above characteristics

Critical Path Length  A directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other.  The longest such path determines the shortest time in which the program can be executed in parallel.  The length of the longest path in a task dependency graph is called the critical path length.
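A small sketch of how the critical path length could be computed for a vertex-weighted task-dependency graph, in C. The graph, the weights, and the assumption that nodes are numbered in topological order are all illustrative, not taken from the figure.

#include <stdio.h>

#define N 4
int weight[N]  = {10, 6, 6, 5};               /* task sizes */
int edge[N][N] = { {0,1,1,0},                 /* edge[u][v] = 1 if task v needs task u */
                   {0,0,0,1},
                   {0,0,0,1},
                   {0,0,0,0} };

int main(void) {
    int finish[N];                             /* longest weighted path ending at v */
    int critical = 0;
    for (int v = 0; v < N; v++) {              /* nodes 0..N-1 are in topological order */
        int longest_pred = 0;
        for (int u = 0; u < v; u++)
            if (edge[u][v] && finish[u] > longest_pred)
                longest_pred = finish[u];
        finish[v] = longest_pred + weight[v];
        if (finish[v] > critical)
            critical = finish[v];
    }
    printf("critical path length = %d\n", critical);   /* 10 + 6 + 5 = 21 here */
    return 0;
}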

Critical Path Length Consider the task dependency graphs of the two database query decompositions:

Task-Interaction Graph
 Captures the pattern of interaction between tasks.
 This graph usually contains the task-dependency graph as a subgraph; i.e., there may be interactions between tasks even if there are no dependencies between them.
 These interactions usually occur because the tasks access shared data.

Attributes of parallel algorithms
 concurrency
 scalability
 locality
 modularity

Contd…
 Concurrency refers to the ability to perform many actions simultaneously; this is essential if a program is to execute on many processors.
 Scalability indicates resilience to increasing processor counts; it is equally important, as processor counts appear likely to grow in most environments.
 Locality means a high ratio of local memory accesses to remote memory accesses (communication); this is the key to high performance on multicomputer architectures.
 Modularity, the decomposition of complex entities into simpler components, is an essential aspect of software engineering, in parallel computing as well as sequential computing.

Design Process of Parallel Algorithms
 Partitioning. The computation that is to be performed and the data operated on by this computation are decomposed into small tasks. Practical issues such as the number of processors in the target computer are ignored, and attention is focused on recognizing opportunities for parallel execution.
 Communication. The communication required to coordinate task execution is determined, and appropriate communication structures and algorithms are defined.
 Agglomeration. The task and communication structures defined in the first two stages of a design are evaluated with respect to performance requirements and implementation costs. If necessary, tasks are combined into larger tasks to improve performance or to reduce development costs.
 Mapping. Each task is assigned to a processor in a manner that attempts to satisfy the competing goals of maximizing processor utilization and minimizing communication costs. Mapping can be specified statically or determined at runtime by load-balancing algorithms.

Contd…

Communication
 Static networks: processors are connected to one another by fixed, hard-wired links. Examples: completely connected, star connected, bounded degree.
 Dynamic networks: processors are connected through a series of switches.

Agglomeration

Processes and Mapping
 In general, the number of tasks in a decomposition exceeds the number of processing elements available.
 For this reason, a parallel algorithm must also provide a mapping of tasks to processes.
Note: We refer to the mapping as being from tasks to processes, as opposed to processors. This is because typical programming APIs do not allow easy binding of tasks to physical processors. Rather, we aggregate tasks into processes and rely on the system to map these processes to physical processors. We use the term process not in the UNIX sense, but simply to mean a collection of tasks and associated data.

Processes and Mapping (Cont..)  Appropriate mapping of tasks to processes is critical to the parallel performance of an algorithm.  Mappings are determined by both the task dependency and task interaction graphs.  Task dependency graphs can be used to ensure that work is equally spread across all processes at any point (minimum idling and optimal load balance).  Task interaction graphs can be used to make sure that processes need minimum interaction with other processes (minimum communication).

Processes and Mapping (Cont..)
An appropriate mapping must minimize parallel execution time by:
 Mapping independent tasks to different processes.
 Assigning tasks on the critical path to processes as soon as they become available.
 Minimizing interaction between processes by mapping tasks with dense interactions to the same process.

Processes and Mapping: Example Mapping tasks in the database query decomposition to processes. These mappings were arrived at by viewing the dependency graph in terms of levels (no two nodes in a level have dependencies). Tasks within a single level are then assigned to different processes.
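A sketch of the level-based idea in C (the dependency graph and the process count P are illustrative): compute each task's level from the dependency graph, then assign the tasks within a level round-robin to processes, so that tasks assigned to the same level are independent of one another. Nodes are assumed to be numbered in topological order.

#include <stdio.h>

#define N 7
#define P 3
int edge[N][N] = {0};   /* edge[u][v] = 1 if task v needs the result of task u */

int main(void) {
    /* a small example dependency graph */
    edge[0][4] = edge[1][4] = edge[2][5] = edge[3][5] = 1;
    edge[4][6] = edge[5][6] = 1;

    int level[N], proc[N], next_in_level[N] = {0};
    for (int v = 0; v < N; v++) {
        level[v] = 0;
        for (int u = 0; u < v; u++)                /* topological order = 0..N-1 */
            if (edge[u][v] && level[u] + 1 > level[v])
                level[v] = level[u] + 1;
        proc[v] = next_in_level[level[v]]++ % P;    /* round-robin within the level */
        printf("task %d: level %d -> process %d\n", v, level[v], proc[v]);
    }
    return 0;
}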

Parallel Algorithm Models
 Master-Slave Model: one or more master processes generate work and allocate it to worker processes. This allocation may be static or dynamic.
 Pipeline / Producer-Consumer Model: a stream of data is passed through a succession of processes, each of which performs some task on it.
 Hybrid Models: a hybrid model may be composed either of multiple models applied hierarchically, or of multiple models applied sequentially to different phases of a parallel algorithm.
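As an illustration of the master-slave model, here is a minimal sketch in C with MPI (not code from the slides): rank 0 allocates work items to workers dynamically, and each worker returns a result before being given more work. The work itself (squaring an integer) is a placeholder.

#include <stdio.h>
#include <mpi.h>

#define NTASKS 20
#define TAG_WORK 1
#define TAG_STOP 2

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                                  /* master */
        int next = 0, result, active = 0;
        MPI_Status st;
        for (int w = 1; w < size; w++) {              /* seed every worker once */
            if (next < NTASKS) {
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                next++; active++;
            } else {
                MPI_Send(&next, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
            }
        }
        while (active > 0) {                          /* collect results, reassign work */
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            printf("result from worker %d: %d\n", st.MPI_SOURCE, result);
            if (next < NTASKS) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                next++;
            } else {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
                active--;
            }
        }
    } else {                                          /* worker */
        int task, result;
        MPI_Status st;
        while (1) {
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            result = task * task;                     /* placeholder "work" */
            MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}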

References
 Principles of Parallel Algorithm Design, by Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar (Introduction to Parallel Computing)
 l/algorithms.html
 users.cs.umn.edu/~karypis/parbook/

Summary
 Parallel Algorithm: an algorithm that can be executed a piece at a time on many different processing devices, with the pieces combined at the end to give the correct result.
 Decompose the problem into tasks that can be executed concurrently.
 The number of tasks into which a problem is decomposed determines its granularity.
 Task-Dependency Graph: based on this graph, tasks are mapped onto processes.
 Task-Interaction Graph: captures the pattern of interaction between tasks.
 Critical Path Length: a directed path in the task-dependency graph represents a sequence of tasks that must be processed one after the other; the length of the longest such path is called the critical path length.

Summary (Cont….)
 Attributes of parallel algorithms: concurrency, scalability, locality, and modularity
 Design process of parallel algorithms: partitioning, communication, agglomeration, mapping

THANK YOU