Vector Machines Model for Parallel Computation Bryan Petzinger Adv. Theory of Computing.


Vector Machines Model for Parallel Computation Bryan Petzinger Adv. Theory of Computing

What is Parallel Computation? Utilizing multiple processors simultaneously over the course of a computation.

Example: Sequential vs Parallel Mergesort

Sequential Mergesort Given a sequence of eight numbers: combine them into sorted pairs (four comparisons), combine the pairs into sorted quadruples (five comparisons in the example), then combine the quadruples into a single sorted eight-tuple.

Sequential Mergesort (cont) (diagram: the merge tree for the eight-element example)

Sequential Mergesort (cont) For a sequence of size n: each level of merging involves fewer than n comparisons, i.e. O(n), and log2 n levels of merging are required, giving O(n log2 n) overall.
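
For reference, here is a minimal sequential mergesort sketch in Python (not part of the original slides); the function names and the comparison counter are choices made here purely to illustrate the O(n log2 n) bound.

    def merge(left, right, counter):
        # Merge two sorted lists, counting comparisons.
        result = []
        i = j = 0
        while i < len(left) and j < len(right):
            counter[0] += 1
            if left[i] <= right[j]:
                result.append(left[i]); i += 1
            else:
                result.append(right[j]); j += 1
        result.extend(left[i:])
        result.extend(right[j:])
        return result

    def merge_sort(xs, counter):
        # Split, sort each half recursively, merge: log2 n levels of O(n) work.
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        return merge(merge_sort(xs[:mid], counter), merge_sort(xs[mid:], counter), counter)

    count = [0]
    print(merge_sort([5, 2, 7, 1, 8, 4, 6, 3], count), count[0], "comparisons")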

Parallel Mergesort For the same sequence (size 8), parallel mergesort requires 15 processors: essentially one processor for each node of the tree in the previous diagram.

Parallel Mergesort (cont) Each number is transmitted to one of the eight processors at the top of the tree. Each of these 8 processors passes its number to the processor below it. Each of the remaining seven processors receives one number from each of the two processors above it; the smaller number is passed down and another is requested from its parents. Ultimately the processor at the root communicates the sorted sequence as output.

Parallel Mergesort (cont) In the diagram, the circle representing each processor contains two numbers (1/0, 2/1, 4/3, 8/7 from the leaves down to the root, then output): the first is the number of numbers processed, the second is the number of comparisons (worst case). It takes 2 comparisons before the root node receives any numbers; afterwards the total is dominated by the root's comparisons, so the number of comparisons overall is O(n).
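
To make the parallel speed-up concrete, here is a rough Python sketch (hypothetical, not from the slides) that merges level by level and counts each tree level as one parallel round; note this is a level-synchronous simplification rather than the pipelined scheme described above.

    def parallel_rounds_mergesort(xs):
        # Treat every merge on one level of the tree as happening in the same parallel round.
        runs = [[x] for x in xs]              # the leaf processors each hold one number
        rounds = 0
        while len(runs) > 1:
            rounds += 1
            merged = [sorted(a + b) for a, b in zip(runs[0::2], runs[1::2])]
            if len(runs) % 2 == 1:            # odd run left over, carried to the next level
                merged.append(runs[-1])
            runs = merged
        return runs[0], rounds

    print(parallel_rounds_mergesort([5, 2, 7, 1, 8, 4, 6, 3]))   # ([1, 2, 3, 4, 5, 6, 7, 8], 3)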

Vectors and Vector Operations

Vectors Binary digit strings with two properties: they extend infinitely to the left, and they are eventually constant to the left (an infinitely repeating 0 or 1).

Binary Integers Vectors have two parts: constant and non-constant. The constant part gives the sign: repeating 0s → positive, repeating 1s → negative.

Binary Integers (cont.) The non-constant part gives the magnitude, e.g. a negative vector (constant part of repeating 1s) representing -6.

Unary Integers Non-negative integers are represented in unary: U(n) = +1^n. Examples: U(4) = +1^4 = +1111, U(0) = +.

Boolean Operations Similar to the usual definitions of propositional logic: and, or, not, implication, etc. For example, v1 → v2 = !v1 || v2.

Boolean Operations (cont.) Operations are applied bitwise: for v1 & v2, each bit of the result is b1 & b2, where b1 and b2 are the corresponding bits of v1 and v2 (the constant parts are combined the same way).
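
Python integers behave much like these vectors: they are conceptually infinite to the left, with non-negative values having a constant part of 0s and negative values a constant part of 1s. The sketch below uses made-up example vectors, not the ones from the slide.

    v1 = 0b1100        # the vector +1100
    v2 = 0b1010        # the vector +1010

    print(bin(v1 & v2))       # bitwise AND  -> 0b1000
    print(bin(v1 | v2))       # bitwise OR   -> 0b1110
    print(bin(v1 ^ v2))       # bitwise XOR  -> 0b110
    print(~v1)                # NOT flips every bit, yielding a negative (constant-1s) vector
    print(bin(~v1 | v2))      # implication v1 -> v2, i.e. !v1 || v2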

Shift Operations Left shift: digits are shifted left, and 0s or 1s fill the newly vacant positions; "v << 0 1" shifts left one position, filling with 0. Right shift: digits are shifted right, and the rightmost digits disappear; "v >> 3" shifts right three positions.

Shift Operations (cont) Example: for a vector v, "v << 0 1" appends a 0 on the right, "v << 1 1" appends a 1, "v << 1 3" appends three 1s, and "v >> 4" discards the four rightmost digits.
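
In the same integer encoding, the three kinds of shifts can be sketched as below; the vector v used here is a hypothetical example, since the slide's worked values were lost.

    def shl0(v, k):
        # left shift by k positions, filling the vacated positions with 0s
        return v << k

    def shl1(v, k):
        # left shift by k positions, filling the vacated positions with 1s
        return (v << k) | ((1 << k) - 1)

    def shr(v, k):
        # right shift by k positions; the k rightmost digits disappear
        return v >> k

    v = 0b110101                  # hypothetical vector +110101
    print(bin(shl0(v, 1)))        # +1101010
    print(bin(shl1(v, 1)))        # +1101011
    print(bin(shl1(v, 3)))        # +110101111
    print(bin(shr(v, 4)))         # +11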

Vector Machines A set of registers, each containing a vector, together with a program of instructions that operate on those registers. A one-instruction program, as a flowchart: Start → v3 = v1 ⊕ v2 → Halt.

Example if (v1 != +0) { v2 = v1 << 1 4 } As a flowchart: Start → test "v1 = +?" → if No, perform v2 = v1 << 1 4 and Halt; if Yes, Halt immediately.

Example
Input:  v1 and v2              // v1 = +110 = 6, v2 = +010 = 2
Output: v3 = v1 << 1 v2        // v1 shifted left by v2 positions, filling with 1s
begin
    x = -                      // x = -
    x = x << 0 v2              // x = -00
    x = !x                     // x = +11
    v3 = v1 << 0 v2            // v3 = +11000
    v3 = v3 || x               // v3 = +11011 = 27
end
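
A possible Python rendering of this program under the integer encoding sketched above (the helper name shl0 is a choice of this sketch, and the shift amount is taken as the plain integer 2 rather than the vector +010):

    def shl0(v, k): return v << k      # left-shift-0

    v1, v2 = 0b110, 2          # v1 = +110 = 6, shift amount denoted by v2

    x = -1                     # x = -   (all 1s)
    x = shl0(x, v2)            # x = -00
    x = ~x                     # x = +11
    v3 = shl0(v1, v2)          # v3 = +11000
    v3 = v3 | x                # v3 = +11011 = 27
    print(bin(v3), v3)         # 0b11011 27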

Vector Machines and Function Computation

Remarks On binary vectors, left-shift-0 corresponds to multiplication by a power of 2; on unary vectors, left-shift-1 corresponds to addition.

Example
Input:  v1 = +1^n = U(n)
Output: v2 = +1^⌈log2(n+1)⌉ = U(⌈log2(n+1)⌉)
begin
    v3 = v1 << 1 1             // left-shift-1: v3 = +1^(n+1)
    v2 = +
    v4 = +1
    while (v3 >> v4 != +)
        v2 = v2 << 1 1         // left-shift-1: increment unary value; v2 counts the iterations
        v4 = v4 << 0 1         // left-shift-0: double binary value
    end
end
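
A possible Python rendering, encoding the unary vector +1^n as the integer with n low-order 1 bits (the names U, shl0, shl1 and unary_log are choices of this sketch, not from the slides):

    def shl0(v, k): return v << k                      # left-shift-0
    def shl1(v, k): return (v << k) | ((1 << k) - 1)   # left-shift-1
    def U(n): return (1 << n) - 1                      # unary vector +1^n

    def unary_log(v1):
        # Input v1 = U(n); output U(ceil(log2(n+1))), mirroring the program above.
        v3 = shl1(v1, 1)       # v3 = +1^(n+1)
        v2 = 0                 # v2 = +
        v4 = 1                 # v4 = +1
        while (v3 >> v4) != 0:
            v2 = shl1(v2, 1)   # increment the unary count
            v4 = shl0(v4, 1)   # double the binary value
        return v2

    print(bin(unary_log(U(8))))    # 0b1111 = U(4), matching |B(8)| = 4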

Time Complexity The program above computes in O(1) initialization (3 parallel steps) plus ⌈log2(n+1)⌉ iterations of the while loop, each taking 2 parallel steps, so time(n) is O(log2 n).

Formal Definition Let M be a vector machine, and suppose that f is a k-ary partial number-theoretic function, for some fixed k >= 0. Then we shall say that M computes f provided that: (1) If M's registers v1, v2, ..., vk initially contain vectors U(n1) = +1^n1, U(n2) = +1^n2, ..., U(nk) = +1^nk, all other registers contain vector +, and f(n1, n2, ..., nk) is defined, then, upon termination, registers v1, v2, ..., vk contain U(n1), U(n2), ..., U(nk) as before and register v(k+1) contains U(f(n1, n2, ..., nk)) = +1^f(n1,n2,...,nk). (2) If M's registers initially contain the same vectors but f(n1, n2, ..., nk) is undefined, then either M never terminates or, if M does terminate, the final contents of v(k+1) are not of the form +1^m.

Formal Definition (cont) First case: f is defined. M terminates with the original input registers unchanged and the result of f stored in the additional register v(k+1). Second case: f is undefined. M never terminates, or M terminates but does not store a vector of the form +1^m in v(k+1).

Unary to Binary The formal definition is stated in terms of unary vectors, but we want to include operations on binary vectors as well. We therefore need a method to convert unary to binary, and vice versa.

Unary to Binary converter Function B(n): B(8) = 1000, B(27) = 11011. |B(8)| = 4 = ⌈log2(8+1)⌉ and |B(27)| = 5 = ⌈log2(27+1)⌉; in general, |B(n)| = ⌈log2(n+1)⌉.

Input:  v1 = +1^n = U(n), for n >= 0
Output: v2 = +a_m a_(m-1) ... a_2 a_1 a_0 = B(n), where m = |B(n)| - 1
begin
    // initialization
    v3 = +1^⌈log2(n+1)⌉ = +1^|B(n)|   // computed by the vector machine of the previous example
    v4 = +1
    while (v3 != +)
        v4 = v4 << 0 1
        v3 = v3 >> 1
    end
    v4 = v4 >> 1                      // v4 = B(2^m)
    v5 = v1 << 1 1                    // v5 = +1^(n+1)
    // main while loop
    while (v4 != +)
        if (v5 >> v4 != +)            // is the remaining count at least the power of 2 under consideration?
            v2 = v2 << 1 1            // left-shift-1
            v5 = v5 >> v4
        else
            v2 = v2 << 0 1            // left-shift-0
        end
        v4 = v4 >> 1                  // v4 becomes the next smaller power of 2
    end
end
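
A Python sketch of the converter under the same integer encoding (all helper names are choices of this sketch; unary_log plays the role of the previous example's machine):

    def shl0(v, k): return v << k                      # left-shift-0
    def shl1(v, k): return (v << k) | ((1 << k) - 1)   # left-shift-1
    def U(n): return (1 << n) - 1                      # unary vector +1^n

    def unary_log(v1):
        # U(n) -> U(|B(n)|), as in the previous example.
        v3, v2, v4 = shl1(v1, 1), 0, 1
        while (v3 >> v4) != 0:
            v2, v4 = shl1(v2, 1), shl0(v4, 1)
        return v2

    def unary_to_binary(v1):
        # Input v1 = U(n); returns the integer whose binary digits are B(n).
        v3 = unary_log(v1)            # +1^|B(n)|
        v4 = 1
        while v3 != 0:                # build +1 followed by |B(n)| zeros
            v4 = shl0(v4, 1)
            v3 = v3 >> 1
        v4 = v4 >> 1                  # v4 = B(2^m), the leading power of 2
        v5 = shl1(v1, 1)              # v5 = +1^(n+1)
        v2 = 0
        while v4 != 0:                # peel off powers of 2, largest first
            if (v5 >> v4) != 0:       # remaining count reaches this power of 2
                v2 = shl1(v2, 1)      # emit a 1
                v5 = v5 >> v4
            else:
                v2 = shl0(v2, 1)      # emit a 0
            v4 = v4 >> 1              # next smaller power of 2
        return v2

    print(bin(unary_to_binary(U(27))))    # 0b11011 = B(27)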

Trace of the main while loop on input vector U(27) = +1^27:

                 v2        v4        v5
initialization   +         +10000    +1^28
Iteration 1      +1        +1000     +1^12
Iteration 2      +11       +100      +1^4
Iteration 3      +110      +10       +1^4
Iteration 4      +1101     +1        +1^2
Iteration 5      +11011    +         +1

Final contents of v2: +11011 = B(27).

Masks A mask of index i is a positive vector of length 2^i. For each i >= 0, there are i distinct masks of index i. Example: i = 4 gives four masks of index 4, each of length 16:
M_4,0 = +1010101010101010
M_4,1 = +1100110011001100
M_4,2 = +1111000011110000
M_4,3 = +1111111100000000
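
A small Python sketch for generating the masks of a given index as bit strings (the function name and the string representation are choices of this sketch; the ones-before-zeros block orientation matches how the masks are used in the word-reversal algorithm below):

    def mask(i, j):
        # M_{i,j}: a positive vector of length 2**i consisting of alternating
        # blocks of 2**j ones followed by 2**j zeros.
        block = "1" * (2 ** j) + "0" * (2 ** j)
        return block * (2 ** (i - j - 1))

    for j in range(4):
        print(f"M_4,{j} = +{mask(4, j)}")
    # M_4,0 = +1010101010101010
    # M_4,1 = +1100110011001100
    # M_4,2 = +1111000011110000
    # M_4,3 = +1111111100000000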

Example: Reverse Word

Considerations We use an ASCII-style encoding to represent the alphabet. Our alphabet is {a, b}; with cardinality 2, only 1 bit per character is required: a = 0, b = 1.

Considerations (cont) Problematic scenario: input word = abb, input vector = +011 = +11 = bb (the leading 0 is absorbed into the constant part). We therefore require 2 input vectors to avoid this issue: the 1st vector is the encoded word, the 2nd vector is the word's length in unary.

Using Masks Suppose an 8-bit word: 2^k = 8, so k = 3. Generate k = 3 eight-bit masks:
m_3,0 = +10101010
m_3,1 = +11001100
m_3,2 = +11110000

Algorithm w = a1 a2 a3 a4 a5 a6 a7 a8. Apply the masks to the word, coarsest mask first, using w = (w & m_3,j) >> 2^j || (w & !m_3,j) << 0 2^j for j = k-1 down to 0:
m_3,2 → a5 a6 a7 a8 a1 a2 a3 a4
m_3,1 → a7 a8 a5 a6 a3 a4 a1 a2
m_3,0 → a8 a7 a6 a5 a4 a3 a2 a1

Pseudocode
Input:  v1 = +Trans(w) = +a1 a2 ... an, where n = 2^k
        v2 = U(|Trans(w)|) = +1^n
Output: v3 = +Trans(w^R) = +an a(n-1) ... a1
begin
    y = convert(v2) >> 1       // y = +1 0^(k-1), i.e. 2^(k-1)
    x = m_k,k-1                // coarsest mask of index k
    v3 = v1
    while (y != +)
        v3 = (v3 & x) >> y || (v3 & !x) << 0 y
        y = y >> 1
        x = x ⊕ (x >> y)       // refine to the next finer mask
    end
end
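
A Python sketch of the whole reversal on integers (mask construction is inlined so the snippet stands alone; helper names are choices of this sketch, and the word length is assumed to be a power of two, as on the slide):

    def shl0(v, k): return v << k            # left-shift-0

    def mask(i, j):
        # M_{i,j} as an integer: alternating blocks of 2**j ones and 2**j zeros, length 2**i.
        block = shl0((1 << (2 ** j)) - 1, 2 ** j)
        m = 0
        for _ in range(2 ** (i - j - 1)):
            m = (m << (2 ** (j + 1))) | block
        return m

    def reverse_word(w, k):
        # Reverse a word of n = 2**k bits, applying the coarsest mask first.
        y = 2 ** (k - 1)
        x = mask(k, k - 1)
        v3 = w
        while y != 0:
            v3 = ((v3 & x) >> y) | shl0(v3 & ~x, y)   # swap the two halves of every block
            y = y >> 1
            x = x ^ (x >> y)                          # refine to the next finer mask
        return v3

    w = 0b10110100                                    # Trans(babbabaa), with a = 0, b = 1
    print(format(reverse_word(w, 3), '08b'))          # 00101101 = Trans(aababbab)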

Time & Space Linear space: O(n). Logarithmic time: O(log2 n) parallel steps. A single-tape Turing machine needs O(n^2) time for this problem; a two-tape Turing machine needs O(n). The difference is parallelism: the vector machine performs a similar amount of total work, but in far fewer (parallel) steps.

Vector Machines and Parallel Computation

Parallelism A single processor is responsible for each bit in the non-constant portion of a vector: e.g. processor p1 is responsible for the rightmost bit, p2 for the next bit, and so on.

Diagram: processors p1, p2, ..., each wired to one bit position across the registers v1, v2, v3, ..., vk.

Example vk = vi ⊕ vj, where vi and vj have four non-constant bits, involves processors p1, p2, p3, p4. Each processor has its own local memory and dedicated channels for interacting with that memory.

Remarks The model requires an infinite supply of processors to be available, but only finitely many are active at any given time.

Vector Machines and Formal Languages

Language Acceptance Example: palindromes.
Input:  v1 = +Trans(w), v2 = U(|Trans(w)|)
Output: v3 = +1 if w = w^R, +0 otherwise
begin
    v4 = reverse_word(v1, v2)
    if (v1 ⊕ v4 == +)
        v3 = +1
    else
        v3 = +0
end
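
A hedged Python sketch of the acceptor: the word (as bits) is XORed with its reversal and tested against the all-zero vector; reverse_bits below is a simple sequential stand-in for the logarithmic-time reverse_word routine above.

    def reverse_bits(w, n):
        # Naive bit-by-bit reversal of an n-bit word (stand-in for reverse_word).
        r = 0
        for _ in range(n):
            r = (r << 1) | (w & 1)
            w >>= 1
        return r

    def is_palindrome(w, n):
        # w XOR reverse(w) is the zero vector exactly when w reads the same backwards.
        return (w ^ reverse_bits(w, n)) == 0

    print(is_palindrome(0b0110, 4))   # abba -> True
    print(is_palindrome(0b0111, 4))   # abbb -> False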

Language Recognition Example: L = {a^(2^i) | i >= 0}.
Input:  v1 = +Trans(w) = +1^n
        v2 = U(|w|) = +1^n
Output: v3 = +1 if w = a^(2^i) for some i >= 0, +0 otherwise

begin
    v4 = convert(v1)              // unary-to-binary converter
    v5 = +1
    if (v4 == +)                  // empty word is not in L
        v3 = +0
    else if (v4 == v5)            // the word a is in L
        v3 = +1
    else
        v6 = v4
        while (v6 != + && v5 != v4)
            v5 = v5 << 0 1        // double v5
            v6 = v6 >> 1
        end
        if (v5 == v4)
            v3 = +1
        else
            v3 = +0
    end
end
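
A Python sketch of the recognizer, taking the word by its length n and mirroring the doubling loop above (treating the registers as plain integers is an assumption of this sketch):

    def in_L(n):
        # Accept a^n exactly when n = 2**i for some i >= 0.
        if n == 0:                    # empty word is not in L
            return False
        if n == 1:                    # the word a is in L (i = 0)
            return True
        v4 = n                        # binary value of the word length
        v5 = 1
        v6 = v4
        while v6 != 0 and v5 != v4:
            v5 = v5 << 1              # double v5
            v6 = v6 >> 1
        return v5 == v4

    for n in range(9):
        print(n, in_L(n))             # True exactly for n = 1, 2, 4, 8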