Lecture 6: More PRAM Algorithms. Parallel Computing, Fall 2008.


2 Application of Parallel Prefix: Binary Number Addition

Add two n-bit binary numbers in 2 lg n + 1 steps using an n-leaf c.b.t. (complete binary tree); the sequential algorithm requires n steps.

The i-th sum bit is (a + b)_i = a_i ⊕ b_i ⊕ c_{i−1}, where ⊕ is XOR and c_{i−1} is the carry out of position i − 1.

Signal for each bit position:
s: stops a carry bit (0 + 0)
g: generates a carry bit (1 + 1)
p: propagates a carry bit (0 + 1 or 1 + 0)

Example (bits written least-significant first): for a = 1 1 0 1 (= 11) and b = 1 1 1 0 (= 7), the signals are g g p p, the carries are 1 1 1 1, and the sum a + b = 0 1 0 0 1 (= 18).
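The per-bit signal computation of Step 1 can be sketched in a few lines of Python (a sequential simulation; on the PRAM each bit position is handled by its own processor in one step):

```python
# Sketch (not from the slides): compute the s/g/p signal for each bit
# position of two binary numbers, written least-significant bit first.

def signals(a_bits, b_bits):
    """Return 's' (stop), 'g' (generate), or 'p' (propagate) per bit."""
    out = []
    for a, b in zip(a_bits, b_bits):
        if a == 0 and b == 0:
            out.append('s')   # 0 + 0 stops any incoming carry
        elif a == 1 and b == 1:
            out.append('g')   # 1 + 1 generates a carry
        else:
            out.append('p')   # 0 + 1 or 1 + 0 propagates the carry
    return out

# a = 1011 (11) and b = 0111 (7), written LSB-first:
print(signals([1, 1, 0, 1], [1, 1, 1, 0]))  # ['g', 'g', 'p', 'p']
```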

3 Application of Parallel Prefix: Binary Number Addition

Problem: in order to compute the k-th carry bit, the (k − 1)-st carry needs to be computed as well. There exists a non-trivial, non-obvious parallel solution: for each bit position we find the carry bit required for addition, so that all bit positions can be added in parallel. We shall show that carry computation takes Θ(lg n) time on a binary tree, using a computation already known to us: parallel prefix.

Observation. The i-th carry bit is 1 if the leftmost non-p symbol to the right of the i-th bit is a g.

Question. How can we find the i-th carry bit?

4 Application of Parallel Prefix: Binary Number Addition

The previous observation takes the following algorithmic form.

Scan for j = i, ..., 0:
    if x_j = p then continue;                 // ignore p's
    else if x_j = g then carry = 1; exit;
    else carry = 0; exit;                     // x_j = s

Such a computation requires O(n) time for j = n (the n-th bit).

Let the symbol (p, s, or g) at bit position i be denoted x_i. Then

c_0 = x_0 = s
c_1 = x_0 ⊗ x_1
c_2 = x_0 ⊗ x_1 ⊗ x_2
...
c_16 = x_0 ⊗ ... ⊗ x_16,

where ⊗ is given by the table below (row = left operand x, column = right operand y; equivalently, x ⊗ y = y unless y = p, in which case x ⊗ p = x):

⊗ | s  p  g
--+--------
s | s  s  g
p | s  p  g
g | s  g  g
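The operator table and the carry prefix c_0, c_1, ... can be checked with a small Python sketch (a sequential fold standing in for the parallel prefix; names are mine, not from the slides):

```python
# Sketch: the slide's operator, with x the less significant (earlier)
# prefix and y the next symbol. x ⊗ y = y, unless y = 'p', in which
# case the carry behaviour of x passes through unchanged.

def otimes(x, y):
    return x if y == 'p' else y

def carries(symbols):
    """c_i = x_0 ⊗ x_1 ⊗ ... ⊗ x_i, with the extra 0-th symbol 's'."""
    cs, acc = [], 's'
    for sym in symbols:
        acc = otimes(acc, sym)
        cs.append(1 if acc == 'g' else 0)   # 'g' means carry = 1
    return cs

print(carries(['g', 'g', 'p', 'p']))  # [1, 1, 1, 1]
print(carries(['p', 's', 'g']))       # [0, 0, 1]
```

A sequential fold suffices here because ⊗ is associative; the PRAM version runs the same operator up and down a complete binary tree in 2 lg n steps.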

5 Application of Parallel Prefix: Binary Number Addition

Algorithm for parallel addition:
Step 1. Compute the symbol ({s, p, g}) for bit i, in parallel for all i.
Step 2. Perform a parallel prefix computation on the n symbols, plus the 0-th symbol s, where the operator ⊗ is defined as in the previous table.
Step 3. Combine (exclusive-OR) the carry bit from bit position i − 1 (interpreting g as a 1 and s as a 0) with the exclusive-OR of the bits in position i, to obtain the i-th bit of the sum.

Steps 1 and 3 require constant time. Step 2, on a complete binary tree with n leaves, requires 2 lg n steps. In total, T = O(lg n), P = 2n − 1 = O(n).
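The three steps can be put together in a self-contained Python sketch (a sequential simulation of the parallel machine; the inner scan stands in for the tree-based parallel prefix):

```python
# Sketch of the three-step parallel addition, simulated sequentially.

def add_binary(a_bits, b_bits):
    """Add two equal-length LSB-first bit lists; return LSB-first sum."""
    # Step 1: one signal per bit position (in parallel on the PRAM).
    sig = ['s' if a + b == 0 else 'g' if a + b == 2 else 'p'
           for a, b in zip(a_bits, b_bits)]
    # Step 2: prefix with x ⊗ y = (x if y == 'p' else y), starting
    # from the extra 0-th symbol 's'. 'g' means carry = 1.
    carry, acc = [], 's'
    for s in sig:
        acc = acc if s == 'p' else s
        carry.append(1 if acc == 'g' else 0)
    # Step 3: sum bit i = a_i XOR b_i XOR c_{i-1}, with c_{-1} = 0.
    out = [a ^ b ^ (carry[i - 1] if i else 0)
           for i, (a, b) in enumerate(zip(a_bits, b_bits))]
    out.append(carry[-1])          # final carry-out
    return out

# 11 + 7 = 18: 1011 + 0111 = 10010 (shown LSB-first).
print(add_binary([1, 1, 0, 1], [1, 1, 1, 0]))  # [0, 1, 0, 0, 1]
```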

6 Parallel Prefix: Segmented Parallel Prefix (will not be covered)

A segmented prefix (scan) computation consists of a sequence of disjoint prefix computations. Let the x_{ij} below take values from a set X and let ⊕ be an associative operator defined on the elements of X. Then the segmented prefix computation for

x_{11} x_{12} ... x_{1,k_1} | x_{21} x_{22} ... x_{2,k_2} | ... | x_{m1} x_{m2} ... x_{m,k_m} |

requires the computation of all

p_{ij} = x_{i1} ⊕ x_{i2} ⊕ ... ⊕ x_{ij},  for all 1 ≤ i ≤ m, 1 ≤ j ≤ k_i.

7 Parallel Prefix: Segmented Parallel Prefix (will not be covered)

In brief, the segment separator | terminates one prefix operation and starts another. One way to handle a segmented prefix computation in parallel is to extend (X, ⊕) into (X', ⊗), where

X' = X ∪ {|} ∪ {|x : x ∈ X},

i.e., X' has more than twice the elements of X: all the elements of X, the segment separator |, and for each x a new element |x consisting of the separator followed by x. The new operator ⊗ is associative if we define it as follows:

| ⊗ |  = |     | ⊗ x  = |x           | ⊗ |x  = |x
x ⊗ |  = |     x ⊗ y  = x ⊕ y        x ⊗ |y  = |y
|x ⊗ | = |     |x ⊗ y = |(x ⊕ y)     |x ⊗ |y = |y

Now, if the length of the segmented prefix formula is n, we can assign n processors and solve the problem with parallel prefix in asymptotically the same time. Note that an element of X' requires no more than 2 extra bits beyond the storage size of an element of X, and if an ⊕ computation takes O(1) time, so does an ⊗ computation.
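These nine rules can be exercised directly in Python. The encoding below is an assumption of mine (tuples ('x', v), ('|',), and ('|x', v) for the three kinds of elements), and the fold stands in for the parallel prefix:

```python
import operator

# Sketch (assumed encoding, not from the slides verbatim): elements of
# the extended set X' are ('x', v) for a plain value, ('|',) for the
# bare separator, and ('|x', v) for "separator followed by v".

def ext_op(plus, x, y):
    """The ⊗ rules from the slide, for an associative operator `plus`."""
    if y == ('|',):                      # ... ⊗ |  = |
        return ('|',)
    if y[0] == '|x':                     # ... ⊗ |y = |y
        return y
    # y is a plain element ('x', v):
    if x == ('|',):
        return ('|x', y[1])              # |  ⊗ y = |y
    if x[0] == '|x':
        return ('|x', plus(x[1], y[1]))  # |x ⊗ y = |(x ⊕ y)
    return ('x', plus(x[1], y[1]))       # x  ⊗ y = x ⊕ y

# Segmented sum of  2 3 | 1 7 2  via a running fold:
seq = [('x', 2), ('x', 3), ('|',), ('x', 1), ('x', 7), ('x', 2)]
acc, prefix = seq[0], [seq[0]]
for e in seq[1:]:
    acc = ext_op(operator.add, acc, e)
    prefix.append(acc)
print([p[-1] if p != ('|',) else '|' for p in prefix])
# [2, 5, '|', 1, 8, 10]
```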

8 Parallel Prefix: Segmented Parallel Prefix, Example and Refinements (will not be covered)

Example: {2, 3} {1, 7, 2} {1, 3, 6}. Encode the input as

2 3 | 1 7 2 | 1 3 6 |

and the result (the segmented prefix sums) is

2 5 | 1 8 10 | 1 4 10 |

We can refine the previous algorithm as follows: the bare separator | is dropped, so every element is either x or |x and only two cases remain for the right operand. Compare the ordinary operator

⊕ | b
a | a ⊕ b

with the refined ⊗:

⊗  | b          |b
a  | a ⊕ b      |b
|a | |(a ⊕ b)   |b
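The refined operator amounts to tagging each value with a flag marking whether it starts a new segment. A small Python sketch of this reading (the pair encoding and names are assumptions of mine):

```python
# Sketch of the refinement: each element is a pair (starts_segment, value);
# a flagged right operand restarts the running sum.

def seg_op(plus, x, y):
    """(flag, value) pairs: a ⊗ |b = |b, a ⊗ b = a ⊕ b (keeping a's flag)."""
    xf, xv = x
    yf, yv = y
    if yf:                      # a  ⊗ |b = |b   and   |a ⊗ |b = |b
        return (True, yv)
    return (xf, plus(xv, yv))   # a ⊗ b = a ⊕ b,  |a ⊗ b = |(a ⊕ b)

def segmented_prefix(plus, pairs):
    out, acc = [pairs[0]], pairs[0]
    for p in pairs[1:]:
        acc = seg_op(plus, acc, p)
        out.append(acc)
    return [v for _, v in out]

# {2, 3} {1, 7, 2} {1, 3, 6}  ->  2 5 | 1 8 10 | 1 4 10
data = [(True, 2), (False, 3), (True, 1), (False, 7), (False, 2),
        (True, 1), (False, 3), (False, 6)]
print(segmented_prefix(lambda a, b: a + b, data))
# [2, 5, 1, 8, 10, 1, 4, 10]
```

Since seg_op is associative, it can be fed to any off-the-shelf parallel scan unchanged.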

9 PRAM Algorithms: Maximum Finding

Problem. Let X_1, ..., X_N be N keys. Find X = max{X_1, X_2, ..., X_N}.

The sequential problem admits a direct solution with P = 1, T = O(N), W = O(N). An EREW PRAM algorithm for this problem works the same way as the PARALLEL SUM algorithm, and its performance is P = O(N), T = O(lg N), W = O(N lg N), W_2 = O(N), along with the improvements in P and W mentioned for the PARALLEL SUM algorithm.

In the remainder we investigate CRCW PRAM algorithms. Let value X_i reside in the local memory of processor i. The CRCW PRAM algorithm MAX1, presented next, has performance T = O(1), P = O(N^2), and work W_2 = W = O(N^2). The second algorithm, presented in the following pages, utilizes what is called a doubly-logarithmic-depth tree and achieves T = O(lg lg N), P = O(N), and W = W_2 = O(N lg lg N). The third algorithm is a combination of the EREW PRAM algorithm and the CRCW doubly-logarithmic-depth-tree algorithm, and requires T = O(lg lg N), P = O(N), and W_2 = O(N).

10 Maximum Finding: Algorithm MAX1

begin Max1 (X_1 ... X_N)
1.  in proc (i, j): if X_i ≥ X_j then x_{ij} = 1;
2.                  else x_{ij} = 0;
3.  Y_i = x_{i1} ∧ ... ∧ x_{iN};
4.  Processor i reads Y_i;
5.  if Y_i = 1, processor i writes i into cell 0.
end Max1

In the algorithm, we rename processors so that pair (i, j) refers to processor j × N + i. Variable Y_i equals 1 if and only if X_i is the maximum. The CRCW PRAM algorithm MAX1 has performance T = O(1), P = O(N^2), and work W_2 = W = O(N^2).
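MAX1 can be simulated sequentially in Python (a sketch, not PRAM code: the two loops below play the N^2 virtual processors that run in O(1) parallel time):

```python
# Sequential simulation of the CRCW algorithm MAX1.

def max1(X):
    N = len(X)
    # Steps 1-2: processor (i, j) writes x[i][j] = 1 iff X_i >= X_j.
    x = [[1 if X[i] >= X[j] else 0 for j in range(N)] for i in range(N)]
    # Step 3: Y_i is the AND of row i; Y_i = 1 iff X_i is a maximum.
    Y = [all(row) for row in x]
    # Steps 4-5: every processor with Y_i = 1 writes i into cell 0.
    # On a CRCW PRAM one concurrent write wins; here the last one does.
    cell0 = None
    for i in range(N):
        if Y[i]:
            cell0 = i
    return cell0

print(max1([3, 1, 4, 1, 5, 9, 2, 6]))  # 5  (X at index 5 is 9, the maximum)
```

Note the concurrent write in step 5 is only safe on a common/arbitrary CRCW model: all writers agree that a maximum position goes into cell 0, and any one of them may win.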

11 End. Thank you!