Uzi Vishkin.  Introduction  Objective  Model of Parallel Computation ▪ Work Depth Model ( ~ PRAM) ▪ Informal Work Depth Model  PRAM Model  Technique:

Slides:



Advertisements
Similar presentations
Parallel List Ranking Advanced Algorithms & Data Structures Lecture Theme 17 Prof. Dr. Th. Ottmann Summer Semester 2006.
Advertisements

1 Parallel Algorithms (chap. 30, 1 st edition) Parallel: perform more than one operation at a time. PRAM model: Parallel Random Access Model. p0p0 p1p1.
Parallel Algorithms.
PRAM Algorithms Sathish Vadhiyar. PRAM Model - Introduction Parallel Random Access Machine Allows parallel-algorithm designers to treat processing power.
PERMUTATION CIRCUITS Presented by Wooyoung Kim, 1/28/2009 CSc 8530 Parallel Algorithms, Spring 2009 Dr. Sushil K. Prasad.
VLIW Very Large Instruction Word. Introduction Very Long Instruction Word is a concept for processing technology that dates back to the early 1980s. The.
Optimal PRAM algorithms: Efficiency of concurrent writing “Computer science is no more about computers than astronomy is about telescopes.” Edsger Dijkstra.
Lecture 3: Parallel Algorithm Design
Super computers Parallel Processing By: Lecturer \ Aisha Dawood.
Theory of Computing Lecture 3 MAS 714 Hartmut Klauck.
Programming Languages Marjan Sirjani 2 2. Language Design Issues Design to Run efficiently : early languages Easy to write correctly : new languages.
Advanced Algorithms Piyush Kumar (Lecture 12: Parallel Algorithms) Welcome to COT5405 Courtesy Baker 05.
PRAM (Parallel Random Access Machine)
Efficient Parallel Algorithms COMP308
Advanced Topics in Algorithms and Data Structures Lecture 7.1, page 1 An overview of lecture 7 An optimal parallel algorithm for the 2D convex hull problem,
TECH Computer Science Parallel Algorithms  several operations can be executed at the same time  many problems are most naturally modeled with parallelism.
Advanced Topics in Algorithms and Data Structures Lecture pg 1 Recursion.
Advanced Topics in Algorithms and Data Structures Classification of the PRAM model In the PRAM model, processors communicate by reading from and writing.
PRAM Models Advanced Algorithms & Data Structures Lecture Theme 13 Prof. Dr. Th. Ottmann Summer Semester 2006.
Advanced Topics in Algorithms and Data Structures Lecture 6.1 – pg 1 An overview of lecture 6 A parallel search algorithm A parallel merging algorithm.
Advanced Topics in Algorithms and Data Structures Page 1 Parallel merging through partitioning The partitioning strategy consists of: Breaking up the given.
Simulating a CRCW algorithm with an EREW algorithm Efficient Parallel Algorithms COMP308.
Slide 1 Parallel Computation Models Lecture 3 Lecture 4.
Advanced Topics in Algorithms and Data Structures 1 Lecture 4 : Accelerated Cascading and Parallel List Ranking We will first discuss a technique called.
1 Lecture 11 Sorting Parallel Computing Fall 2008.
Accelerated Cascading Advanced Algorithms & Data Structures Lecture Theme 16 Prof. Dr. Th. Ottmann Summer Semester 2006.
Parallel Merging Advanced Algorithms & Data Structures Lecture Theme 15 Prof. Dr. Th. Ottmann Summer Semester 2006.
1 Lecture 6 More PRAM Algorithm Parallel Computing Fall 2008.
Parallel Computers 1 The PRAM Model for Parallel Computation (Chapter 2) References:[2, Akl, Ch 2], [3, Quinn, Ch 2], from references listed for Chapter.
1 Lecture 3 PRAM Algorithms Parallel Computing Fall 2008.
Fall 2008Paradigms for Parallel Algorithms1 Paradigms for Parallel Algorithms.
Advanced Topics in Algorithms and Data Structures 1 Two parallel list ranking algorithms An O (log n ) time and O ( n log n ) work list ranking algorithm.
Basic PRAM algorithms Problem 1. Min of n numbers Problem 2. Computing a position of the first one in the sequence of 0’s and 1’s.
Simulating a CRCW algorithm with an EREW algorithm Lecture 4 Efficient Parallel Algorithms COMP308.
1 Design and Analysis of Algorithms تصميم وتحليل الخوارزميات (311 عال) Chapter 1 Introduction to Algorithms.
RAM and Parallel RAM (PRAM). Why models? What is a machine model? – A abstraction describes the operation of a machine. – Allowing to associate a value.
Introduction to Parallel Programming MapReduce Except where otherwise noted all portions of this work are Copyright (c) 2007 Google and are licensed under.
1 Lecture 2: Parallel computational models. 2  Turing machine  RAM (Figure )  Logic circuit model RAM (Random Access Machine) Operations supposed to.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 February 8, 2005 Session 8.
Parallel Algorithms Sorting and more. Keep hardware in mind When considering ‘parallel’ algorithms, – We have to have an understanding of the hardware.
A.Broumandnia, 1 5 PRAM and Basic Algorithms Topics in This Chapter 5.1 PRAM Submodels and Assumptions 5.2 Data Broadcasting 5.3.
Chapter 11 Broadcasting with Selective Reduction -BSR- Serpil Tokdemir GSU, Department of Computer Science.
Analysis of algorithms Analysis of algorithms is the branch of computer science that studies the performance of algorithms, especially their run time.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Games Development 2 Concurrent Programming CO3301 Week 9.
RAM, PRAM, and LogP models
1 PRAM Algorithms Sums Prefix Sums by Doubling List Ranking.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 February 3, 2005 Session 7.
Chapter 10 Algorithm Analysis.  Introduction  Generalizing Running Time  Doing a Timing Analysis  Big-Oh Notation  Analyzing Some Simple Programs.
Basic Linear Algebra Subroutines (BLAS) – 3 levels of operations Memory hierarchy efficiently exploited by higher level BLAS BLASMemor y Refs. FlopsFlops/
Data Structures and Algorithms in Parallel Computing Lecture 1.
Liang, Introduction to Java Programming, Sixth Edition, (c) 2007 Pearson Education, Inc. All rights reserved Chapter 23 Algorithm Efficiency.
Introduction to Algorithms (2 nd edition) by Cormen, Leiserson, Rivest & Stein Chapter 2: Getting Started.
3/12/2013Computer Engg, IIT(BHU)1 CONCEPTS-1. Pipelining Pipelining is used to increase the speed of processing It uses temporal parallelism In pipelining,
Chapter 9: Sorting1 Sorting & Searching Ch. # 9. Chapter 9: Sorting2 Chapter Outline  What is sorting and complexity of sorting  Different types of.
3/12/2013Computer Engg, IIT(BHU)1 PRAM ALGORITHMS-3.
3/12/2013Computer Engg, IIT(BHU)1 PRAM ALGORITHMS-1.
1 The Role of Algorithms in Computing. 2 Computational problems A computational problem specifies an input-output relationship  What does the.
1 A simple parallel algorithm Adding n numbers in parallel.
PRAM and Parallel Computing
Introduction to Algorithms
Lecture 3: Parallel Algorithm Design
Parallel Algorithms (chap. 30, 1st edition)
Lecture 2: Parallel computational models
PRAM Algorithms.
Objective of This Course
Algorithm Discovery and Design
Unit –VIII PRAM Algorithms.
Introduction to Algorithms
Divide and Conquer Merge sort and quick sort Binary search
Presentation transcript:

Uzi Vishkin

 Introduction  Objective  Model of Parallel Computation ▪ Work Depth Model ( ~ PRAM) ▪ Informal Work Depth Model  PRAM Model  Technique: Balanced Binary Trees; Problem: Prefix-Sums  Application – The compaction problem  Putting Things Together  Technique: Informal Work-Depth (IWD)  Technique: Accelerating Cascades  Problem: Selection

 Single-task completion time & task throughput
 Using compiler methods alone to achieve parallelism is insufficient
 A solution: conceptualize the parallelism in the algorithm first
 Goal: seek & develop proper levels of abstraction for designing parallel algorithms

 Work-Depth (WD): presents algorithmic methods and paradigms, including their complexity analysis, in a rigorous manner
 PRAM ~ WD
 Informal Work-Depth (IWD): outlines ideas and high-level descriptions of algorithms

 Unique knowledge-base
 Simplicity
 Reusability
 To get started
 Performance prediction
 Formal emulation

 IWD is better than WD at training the mind to “think in parallel”.
 Direct hardware implementation of some routines.
 The ongoing evolution of programming languages.

 n synchronous processors, all having unit-time access to a shared memory.
 Each processor also has a local memory.
 At each time unit, a processor can:
▪ write into the shared memory (i.e., copy one of its local memory registers into a shared memory cell),
▪ read from the shared memory (i.e., copy a shared memory cell into one of its local memory registers), or
▪ do some computation with respect to its local memory.

for Pi, 1 ≤ i ≤ n pardo
    A(i) := B(i)
 This means that the following n operations are performed concurrently: processor P1 assigns B(1) into A(1), processor P2 assigns B(2) into A(2), and so on.
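A minimal way to see these semantics is to simulate one synchronous PRAM step sequentially. The Python sketch below is illustrative only (not from the slides); it performs all reads before any write, mimicking the synchronous read/compute/write phases of a time unit:

# Simulating the step "for Pi, 1 <= i <= n pardo A(i) := B(i)".
# Every processor reads B(i) in the same time unit; collecting the
# reads before any write happens mimics that synchrony.
B = [10, 20, 30, 40]
reads = [B[i] for i in range(len(B))]   # all processors Pi read B(i)
A = [None] * len(B)
for i, v in enumerate(reads):           # all processors Pi write A(i)
    A[i] = v
print(A)  # [10, 20, 30, 40]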

 Exclusive-read exclusive-write (EREW) PRAM: no simultaneous access by more than one processor to the same memory location, for read or write purposes.
 Concurrent-read exclusive-write (CREW) PRAM: concurrent access for reads but not for writes.
 Concurrent-read concurrent-write (CRCW) PRAM: allows concurrent access for both reads and writes. We shall assume that in a concurrent-write model, an arbitrary processor among those attempting to write into a common memory location succeeds. This is called the Arbitrary CRCW rule. There are two alternative CRCW rules:
▪ Priority CRCW: the smallest-numbered processor among those attempting to write into a common memory location actually succeeds.
▪ Common CRCW: allows concurrent writes only when all the processors attempting to write into a common memory location are trying to write the same value.
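The three write rules are easy to pin down in code. The sketch below is our illustration (the function name is a placeholder, not from the slides); it resolves one concurrent-write step under each rule:

# Resolving a concurrent write by several processors to a single cell.
import random

def crcw_write(requests, rule):
    """requests: list of (processor_id, value), all targeting the same cell."""
    if rule == "arbitrary":
        return random.choice(requests)[1]   # some single writer succeeds
    if rule == "priority":
        return min(requests)[1]             # smallest-numbered processor wins
    if rule == "common":
        values = {v for _, v in requests}
        assert len(values) == 1, "Common CRCW requires identical values"
        return values.pop()
    raise ValueError(rule)

print(crcw_write([(3, 7), (1, 5), (2, 9)], "priority"))  # 5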

 Input: An array A = A(1), ..., A(n) of n numbers.
 The problem is to compute A(1) + A(2) + ... + A(n).
 Algorithm:
▪ works in rounds;
▪ in each round, add, in parallel, pairs of elements as follows: add each odd-numbered element and its successive even-numbered element.
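Sequentially simulated, the rounds look as follows; this sketch is our illustration of the slide's algorithm, padding with 0 when a round has an odd number of elements. With one virtual processor per pair, it takes O(log n) time and O(n) work:

# Summation on a balanced binary tree, one list comprehension per round.
def pram_sum(A):
    vals = list(A)
    while len(vals) > 1:
        if len(vals) % 2:             # pad so every element has a partner
            vals.append(0)
        # one synchronous round: all pairwise additions happen "in parallel"
        vals = [vals[2*i] + vals[2*i + 1] for i in range(len(vals) // 2)]
    return vals[0]

print(pram_sum([3, 1, 4, 1, 5, 9, 2, 6]))  # 31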

 # of processors: p = n.
 A time unit consists of a sequence of exactly p instructions to be performed concurrently.

 # of processors: p = n.
 Each time unit consists of a sequence of instructions to be performed concurrently.
 The sequence of instructions may include any number of instructions.

 # of processors: p ≠ n (the number of processors need not match the input size).
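What ties these presentation modes together (implied by "PRAM ~ WD" above, though not spelled out on these slides) is the standard scheduling argument: an algorithm with W(n) total operations (work) and T(n) time (depth) in the WD mode can be emulated on a p-processor PRAM in O(W(n)/p + T(n)) time, by having each processor execute its share of the operations of each time unit in turn.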

 Input: An array A of n elements drawn from some domain.
 Assume that a binary operation, denoted ∗, is defined on the set.
 ∗ is associative; namely, for any elements a, b, c in the set, a ∗ (b ∗ c) = (a ∗ b) ∗ c.
 The n prefix-sums of A are A(1), A(1) ∗ A(2), ..., A(1) ∗ A(2) ∗ ... ∗ A(i), ..., A(1) ∗ A(2) ∗ ... ∗ A(n).
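The balanced-binary-tree technique from the outline solves this in O(log n) time and O(n) work: an up-sweep computes subtree sums, and a down-sweep distributes prefixes. The sketch below is our sequential simulation of that scheme (assuming, for simplicity, that n is a power of two and that ∗ is addition):

# Prefix-sums on a balanced binary tree: up-sweep then down-sweep.
def prefix_sums(A):
    # Up-sweep: levels[h][i] holds the sum of the 2^h leaves below node (h, i).
    levels = [list(A)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2*i] + prev[2*i + 1] for i in range(len(prev) // 2)])
    # Down-sweep: C[i] becomes the prefix sum ending at the rightmost
    # leaf below node (h, i); each level is one synchronous round.
    C = [levels[-1][0]]
    for h in range(len(levels) - 2, -1, -1):
        B, newC = levels[h], []
        for i in range(len(B)):
            if i == 0:
                newC.append(B[0])                  # leftmost node: its own sum
            elif i % 2 == 1:
                newC.append(C[i // 2])             # right child: parent's prefix
            else:
                newC.append(C[i // 2 - 1] + B[i])  # left child: left uncle's prefix + own sum
        C = newC
    return C

print(prefix_sums([1, 2, 3, 4]))  # [1, 3, 6, 10]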

 Situation:
▪ Algorithm A: W1(n) work and T1(n) time.
▪ Algorithm B: W2(n) work and T2(n) time.
 Suppose Algorithm A is more efficient (W1(n) < W2(n)), while Algorithm B is faster (T2(n) < T1(n)).
 Algorithm A is a “reducing algorithm”: given a problem of size n, it operates in phases, and the output of each successive phase is a smaller instance of the problem.
 The accelerating cascades technique composes a new algorithm as follows: start by applying Algorithm A; once the output size of a phase drops below some threshold, finish by switching to Algorithm B (see the sketch below).
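As a skeleton, the composition looks like this; reduce_phase and fast_finish are placeholder names standing in for one phase of Algorithm A and for Algorithm B:

# The accelerating-cascades skeleton.
def accelerating_cascade(instance, threshold, reduce_phase, fast_finish):
    # Run the work-efficient reducing algorithm while the instance is large.
    while len(instance) > threshold:
        instance = reduce_phase(instance)   # each phase shrinks the instance
    # Below the threshold, the faster (less efficient) algorithm is cheap.
    return fast_finish(instance)

# Toy usage: minimum via pairwise reduction, finishing directly.
print(accelerating_cascade(
    [7, 3, 9, 1, 4, 6], threshold=2,
    reduce_phase=lambda xs: [min(xs[i:i + 2]) for i in range(0, len(xs), 2)],
    fast_finish=min))  # 1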

 Algorithm A (the reducing algorithm): runs in O(log² n) time; total work O(n).
 Algorithm B (a sorting algorithm): runs in O(log n) time; total work O(n log n).
 Derived algorithm: runs in O(log n log log n) time and O(n) work.

 After each iteration of A, the size of the problem is reduced to ≤ 3/4 of its size at the start of that iteration.
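This geometric shrinkage is what makes the cascade pay off. As a reconstruction of the standard analysis (our wording, not the slides'): running Algorithm A until the instance size drops below n/log n takes k iterations with (3/4)^k · n ≤ n/log n, i.e., k = O(log log n), at O(log n) time per iteration and O(n) work in total, hence O(log n log log n) time; sorting the remaining n/log n elements with Algorithm B then takes O(log n) time and O((n/log n) · log n) = O(n) work, giving the derived bounds above.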