
Parallel and Distributed Algorithms

Overview
– Parallel Algorithm vs Distributed Algorithm
– PRAM
– Maximal Independent Set
– Sorting using PRAM
– Choice coordination problem
– Real world applications

INTRODUCTION

Need for distributed processing: massively parallel processing machines, CPUs with thousands of processors, and Moore's law coming to an end.

Parallel Algorithm: a parallel algorithm is an algorithm which can be executed a piece at a time on many different processing devices, with the pieces combined at the end to get the correct result.* * Blelloch, Guy E.; Maggs, Bruce M. Parallel Algorithms. USA: School of Computer Science, Carnegie Mellon University.

Distributed Algorithm: a distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors.* * Lynch, Nancy (1996). Distributed Algorithms. San Francisco, CA: Morgan Kaufmann Publishers.

PRAM

Random Access Machine: an abstract machine with an unbounded number of local memory cells and a simple instruction set. Time complexity: the number of instructions executed. Space complexity: the number of memory cells used. All operations take unit time.

PRAM (Parallel Random Access Machine): PRAM is a parallel version of the RAM for designing algorithms applicable to parallel computers. Why PRAM?
– With P processors, at most P instructions execute in one cycle
– Any processor can read/write any shared memory cell in unit time
– It abstracts away communication overhead, which makes analyzing the complexity of PRAM algorithms easier
– It serves as a benchmark

Example: each processor P_i reads A[i-1] from the shared memory A, computes A[i] = A[i-1] + 1, and writes A[i] back. [Figure: processors P1…Pn reading and writing the shared array A[0…n]]
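To make the lock-step read/compute/write cycle concrete, here is a minimal Python simulation sketch (our illustration, not part of the slides): all processors read a snapshot of shared memory, then all writes are applied, which models one synchronous PRAM step.

# Minimal sketch (ours): one synchronous PRAM step. All processors read a
# snapshot of shared memory, compute, and only then write back.
def pram_step(shared, programs):
    snapshot = list(shared)                          # simultaneous read phase
    writes = [prog(snapshot) for prog in programs]   # compute phase
    for addr, value in writes:                       # simultaneous write phase
        shared[addr] = value
    return shared

# The slide's example: processor i computes A[i] = A[i-1] + 1.
n = 5
A = [0] * (n + 1)
programs = [lambda snap, i=i: (i, snap[i - 1] + 1) for i in range(1, n + 1)]
for _ in range(n):       # after n steps the chain A[1..n] = 1..n is filled
    pram_step(A, programs)
print(A)                 # [0, 1, 2, 3, 4, 5]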

Shared Memory Access Conflicts:
Exclusive Read (ER): all processors can simultaneously read from distinct memory locations.
Exclusive Write (EW): all processors can simultaneously write to distinct memory locations.
Concurrent Read (CR): all processors can simultaneously read from any memory location.
Concurrent Write (CW): all processors can write to any memory location.
The combinations: EREW, CREW, CRCW.

Complexity. Parallel time complexity: the number of synchronous steps in the algorithm. Space complexity: the number of shared memory cells. Parallelism: the number of processors used.

MAXIMAL INDEPENDENT SET (presented by Lahiru Samarakoon and Sumanaruban Rajadurai)

Independent Set (IS): any set of nodes that are not adjacent.

Maximal Independent Set (MIS): an independent set that is not a subset of any other independent set.

Maximal vs. Maximum IS. [Figure: a maximum independent set vs. a maximal independent set]

A Sequential Greedy Algorithm: suppose that the set S will hold the final MIS; initially S = ∅.

Phase 1: pick a node and add it to S.

Remove the node and its neighbors.

Phase 2: pick another node and add it to S.

Remove the node and its neighbors.

Phases 3, 4, 5, …, x: repeat until all nodes are removed (no remaining nodes).

At the end, the set S will be an MIS of the graph.
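The phases above correspond directly to the loop in this small Python sketch (ours; the graph is a hypothetical adjacency-set dictionary):

# Minimal sketch (ours) of the sequential greedy algorithm: repeatedly pick
# a node, add it to S, and remove it together with its neighbors.
def greedy_mis(adj):
    """adj: dict mapping node -> set of neighbors. Returns a maximal IS."""
    remaining = set(adj)
    S = set()
    while remaining:
        v = next(iter(remaining))        # pick any remaining node
        S.add(v)
        remaining -= adj[v] | {v}        # remove v and its neighbors
    return S

# Example: on the path 1-2-3-4, one possible MIS is {1, 3}.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(greedy_mis(adj))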

Running time of the algorithm: up to n phases in the worst case, since a phase may remove only a constant number of nodes. [Figure: worst-case graph with n nodes]

Intuition for parallelization: at each phase we may select any independent set S (instead of a single node), and remove S and the neighbors of S from the graph.

Example: suppose that a set will hold the final MIS; initially it is empty.

Phase 1: find any independent set S and insert it into the MIS.

Remove S and its neighbors.

Phase 2: on the new graph, find any independent set S and insert it into the MIS.

Remove S and its neighbors.

Phase 3: on the new graph, find any independent set S and insert it into the MIS.

Remove S and its neighbors; no nodes are left.

Final MIS.

Observation: the number of phases depends on the choice of independent set in each phase; the larger the independent set at each phase, the faster the algorithm.

Randomized Maximal Independent Set (MIS). Let d(v) be the degree of node v.

At each phase, each node v elects itself with probability 1/(2·d(v)), where d(v) is the degree of v in the current graph. Elected nodes are candidates for the independent set.

If two neighbors are elected simultaneously, then the higher-degree node wins.

If both have the same degree, ties are broken arbitrarily.

Problematic nodes: using the previous rules, the conflicting nodes are removed.

The remaining elected nodes form an independent set.

Luby's algorithm (one phase):
1. Mark lower-degree vertices with higher probability: mark v with probability 1/(2·d(v)).
2. If both endpoints of an edge are marked, unmark the one with the lower degree; this removes the problematic nodes.
3. The remaining marked vertices form an independent set; add all of them to the MIS.
4. Remove the marked vertices together with their neighbors and the corresponding edges.
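A sequential Python sketch of one way to implement these phases (ours; the parallel marking is simulated node by node, and degree ties are broken by node id):

import random

# Sketch (ours) of Luby's algorithm: each iteration of the while-loop
# simulates one parallel phase over the current residual graph.
def luby_mis(adj):
    """adj: dict node -> set(neighbors). Returns an MIS."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # local mutable copy
    mis = set()
    while adj:
        # Mark each node with probability 1/(2 d(v)); isolated nodes join directly.
        marked = {v for v, nbrs in adj.items()
                  if not nbrs or random.random() < 1.0 / (2 * len(nbrs))}
        # Conflict rule: on a marked edge, unmark the lower-degree endpoint.
        for v in list(marked):
            for u in adj[v]:
                if u in marked and (len(adj[v]), v) < (len(adj[u]), u):
                    marked.discard(v)
                    break
        mis |= marked
        # Remove marked vertices and their neighbors from the graph.
        removed = set(marked)
        for v in marked:
            removed |= adj[v]
        adj = {v: nbrs - removed for v, nbrs in adj.items() if v not in removed}
    return mis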

ANALYSIS

Goodness property: a vertex v is good if at least ⅓ of its neighbors have lower degree than it, and bad otherwise. An edge is bad if both its endpoints are bad, and good otherwise.

Lemma 1: let v ∈ V be a good vertex with degree d(v) > 0. Then the probability that some vertex w ∈ N(v) gets marked is at least 1 − exp(−1/6). Define L(v) as the set of neighbors of v whose degree is lower than v's degree. By definition, |L(v)| ≥ d(v)/3 if v is a good vertex.
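A sketch of the calculation (ours), using the marking probability 1/(2d(w)), the fact that d(w) ≤ d(v) for every w ∈ L(v), and 1 − x ≤ e^{−x}:

\[
\Pr[\text{no } w \in N(v) \text{ is marked}]
\;\le\; \prod_{w \in L(v)} \Bigl(1 - \frac{1}{2d(w)}\Bigr)
\;\le\; \Bigl(1 - \frac{1}{2d(v)}\Bigr)^{d(v)/3}
\;\le\; e^{-1/6}.
\]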

Lemma 2 During any iteration, if a vertex w is marked then it is selected to be in S with probability at least 1/2.
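Why (a sketch, ours): a marked vertex w is dropped only if some neighbor of degree at least d(w) is also marked, and there are at most d(w) such neighbors, so by a union bound

\[
\Pr[w \text{ dropped} \mid w \text{ marked}]
\;\le \sum_{\substack{u \in N(w)\\ d(u) \ge d(w)}} \frac{1}{2d(u)}
\;\le\; d(w)\cdot\frac{1}{2d(w)} \;=\; \frac{1}{2}.
\]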

From Lemmas 1 and 2, the probability that a good vertex belongs to S ∪ N(S) is at least (1 − exp(−1/6))/2, so good vertices get eliminated with a constant probability. Together with Lemma 3 below, it follows that the expected number of edges eliminated during an iteration is a constant fraction of the current set of edges. This implies that the expected number of iterations of the parallel MIS algorithm is O(log n).

Lemma 3: in a graph G(V, E), the number of good edges is at least |E|/2. Proof: direct the edges in E from the lower-degree endpoint to the higher-degree endpoint, breaking ties arbitrarily. For all S, T ⊆ V, define E(S, T) as the subset of the (oriented) edges directed from vertices in S to vertices in T. For each bad vertex v, the number of outgoing edges is at least twice the number of incoming edges, since fewer than d(v)/3 of its neighbors have lower degree.

Let V_G and V_B be the sets of good and bad vertices, respectively.
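Summing the per-vertex inequality over V_B gives (a sketch, ours, writing e(S, T) = |E(S, T)|):

\[
e(V_B, V_G) + e(V_B, V_B) \;\ge\; 2\bigl(e(V_G, V_B) + e(V_B, V_B)\bigr),
\]

so e(V_B, V_B) ≤ e(V_B, V_G) − 2·e(V_G, V_B) ≤ e(V_B, V_G) + e(V_G, V_B). Every bad edge lies in E(V_B, V_B), while every edge counted on the right has a good endpoint; hence the bad edges are at most the good edges, and the good edges number at least |E|/2.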

SORTING ON PRAM (presented by Jessica Makucka and Puneet Dewan)

Sorting. The problem: sort n numbers. The best sequential comparison sort takes O(n log n) time. Can we do better with more processors? Yes!

Notes about Quicksort: we sort n numbers on a PRAM with n processors. Assume all numbers are distinct; a CREW PRAM suffices for this case. Each of the n processors contains one input element. Notation: let P_i denote the i-th processor.

Quicksort Algorithm:
0. If n = 1, stop.
1. Pick a splitter at random from the n elements.
2. Each processor determines whether its element is bigger or smaller than the splitter.
3. Let j denote the splitter's rank. If j ∉ [n/4, 3n/4], this is a failure: go back to (1). If j ∈ [n/4, 3n/4], this is a success: move the splitter to P_j, and move every element smaller than the splitter to a distinct processor P_i with i < j (and every larger element to a distinct P_i with i > j).
4. Recursively sort the elements in processors P_1 through P_{j−1} and the elements in processors P_{j+1} through P_n.
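A compact sequential simulation of this algorithm in Python (ours; the per-processor comparisons of steps 2 and 3 become list comprehensions):

import random

# Sketch (ours): randomized quicksort with splitter retries, simulating
# the n-processor PRAM algorithm sequentially.
def pram_quicksort(elems):
    n = len(elems)
    if n <= 1:
        return elems
    while True:
        splitter = random.choice(elems)                 # step 1
        smaller = [x for x in elems if x < splitter]    # step 2 (parallel compare)
        larger = [x for x in elems if x > splitter]
        j = len(smaller) + 1                            # splitter's rank
        if n // 4 <= j <= 3 * n // 4 or n < 4:          # step 3: success test
            break                                       # (failure: retry)
    # Step 4: recurse on both sides (in parallel on the PRAM).
    return pram_quicksort(smaller) + [splitter] + pram_quicksort(larger)

print(pram_quicksort([5, 3, 8, 1, 9, 2, 7]))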

Quicksort time analysis, stage by stage: 1. Picking a successful splitter at random (assumed for now): with high probability there are O(log n) stages along every sequence of recursive splits. 2. Comparing each element against the splitter: trivial; it can be implemented in a single CREW PRAM step.

3. Splitting (let j denote the splitter's rank): if j ∉ [n/4, 3n/4], go back to (1); if j ∈ [n/4, 3n/4], move the splitter to P_j and move every smaller element to a distinct processor P_i with i < j. O(log n) PRAM steps are needed for a single splitting stage.

Comparison in the splitting stage (3): each processor P_i is assigned a bit recording whether its element is smaller or bigger than the splitter: 0 if the element is bigger, 1 otherwise. [Figure: processors P_1…P_8 with their comparison bits]

Comparison Splitting Stage (3) P1P1 P2P2 P3P3 P4P4 P5P splitter Step 1: Step 2: ++ +

Overall time analysis: the algorithm terminates in O(log² n) steps, since there are O(log n) levels of recursion and each level costs O(log n) for its splitting stage. This follows from solving the recurrence below.
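As a sketch of the elided equation (ours), using the fact that a successful splitter leaves subproblems of size at most 3n/4:

\[
T(n) \;\le\; T(3n/4) + O(\log n)
\;\Longrightarrow\;
T(n) \;=\; O\!\Bigl(\sum_{k \ge 0} \log\bigl((3/4)^k n\bigr)\Bigr) \;=\; O(\log^2 n).
\]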

Cons: the analysis assumes that a split is always successful, breaking a problem of size n into pieces that are a constant fraction of n; but there is no method that guarantees a successful split.

Improvement idea: reduce the problem to subproblems of size n^(1−ε), where ε < 1, while keeping the time to split the same.

Benefits if ε = 1/2: the total time across levels is log n + log n^(1/2) + log n^(1/4) + … = log n · (1 + 1/2 + 1/4 + …) ≤ 2 log n, so we could hope for an overall running time of O(log n).

The long story: suppose we have n processors and n elements, and that processors P_1 through P_r contain r of the elements in sorted order while processors P_{r+1} through P_n contain the remaining n − r elements.
1. Choose random splitters and sort them: call the sorted elements in the first r processors the splitters; for 1 ≤ j ≤ r, let s_j denote the j-th splitter in sorted order.
2. Insert: insert the n − r unsorted elements among the splitters.
3. Sort the remaining elements among the splitters: (a) each processor should end up with a distinct input element; (b) let i(s_j) denote the index of the processor containing s_j after the insertion; then for all k < i(s_j), processor P_k contains an element smaller than s_j, and similarly for all k > i(s_j), processor P_k contains an element larger than s_j.

Example: choose random splitters.

Example (contd.): sort the random splitters. [Figure: the sorted splitter list and the remaining unsorted list]

Example (contd.): insert the unsorted elements among the splitters.

Example (contd.): check whether the number of elements between adjacent splitters is at most log n. With S denoting a gap's size, a gap of S = 4 exceeds log n (= 3 here), while smaller gaps (S = 1, …) do not. [Figure: gap sizes between the splitters]

Example (contd.): recur on each subproblem whose size exceeds log n; again choose random splitters and follow the same process.

Partitioning as a tree: a tree is formed from the first partition. If a subproblem's size exceeds log n (here, on the right), we split it again by choosing random splitters. [Figure: the partition tree; the right subtree exceeds log n]

(Contd.) The subproblems are already in sorted order relative to one another because of the partitioning. [Figure: the completed partition tree]

Lemmas to be used: 1. On a CREW PRAM with n² processors, where each of the processors P_1 through P_n has an input element to be sorted, the PRAM can sort the n elements in O(log n) steps. 2. With n processors and n elements, of which n^(1/2) are splitters, the insertion process can be completed in O(log n) steps.

BoxSort algorithm. Input: a set of numbers S. Output: the elements of S sorted in increasing order.
1. Select n^(1/2) elements (i.e., ε = 1/2) at random from the n input elements. Using all n processors, sort them in O(log n) steps (Lemma 1).
2. Using the sorted elements from stage 1 as splitters, insert the remaining elements among them in O(log n) steps (Lemma 2).
3. Treating the elements that fall between adjacent splitters as subproblems, recur on each subproblem whose size exceeds log n. For subproblems of size log n or less, invoke LogSort.

Sort Fact A CREW PRAM with m processors can sort m elements in O(m) steps.

Example: each processor is assigned an element and compares it with the remaining elements simultaneously, in O(m) steps; the rank assigned to each element gives its sorted position. [Figure: processors P1–P8 with their assigned ranks]
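A Python sketch of this rank-based sort (ours; the loop body over i is what each processor does, and distinct elements guarantee exclusive writes):

# Sketch (ours): rank sort. On a CREW PRAM all loop bodies run concurrently;
# each processor counts the elements smaller than its own in O(m) steps, and
# distinct elements give distinct ranks, so the writes are exclusive.
def rank_sort(elems):
    out = [None] * len(elems)
    for i, x in enumerate(elems):                 # each i = one processor
        rank = sum(1 for y in elems if y < x)
        out[rank] = x                             # exclusive write
    return out

print(rank_sort([7, 2, 9, 4]))  # [2, 4, 7, 9]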

Things to remember: the last statement of the BoxSort algorithm (invoking LogSort on small subproblems) relies on the idea on the previous slide.

LogSort: with log n processors and log n elements, we can sort in O(log n) steps (the sort fact above with m = log n).

Analysis: consider each node of the tree as a box. Choosing random splitters and sorting them takes O(log n) time; inserting the unsorted elements among the splitters takes O(log n). With high probability (an assumption here) the subproblems resulting from a splitting operation are very small, so each leaf is a box of size at most log n. The leaves are sorted with LogSort in O(log n) steps, so the total time is O(log n).

DISTRIBUTED RANDOMIZED ALGORITHM (presented by Yogesh S Rawat and R. Ramanathan)

CHOICE COORDINATION PROBLEM (CCP)

Biological inspiration: mites (genus Myrmoyssus) reside as parasites on the ear membrane of moths of the family Phaenidae. Moths are prey to bats, and the only defense they have is that they can hear the sonar used by an approaching bat.

If both ears of the moth are infected by the mites, then its ability to detect the sonar is considerably diminished, severely decreasing the survival chances of both the moth and its colony of mites.

The mites are therefore faced with a "choice coordination problem": how does any collection of mites infecting a particular ear ensure that every other mite chooses the same ear?

Problem specification: a set of n processors, with m options to choose from; the processors have to reach a consensus on a unique choice.

Model for communication: a collection of m read-write registers accessible to all the processors (with a locking mechanism for conflicts). Each processor follows a protocol for making a choice; a special symbol (√) is used to mark the choice. At the end, exactly one register contains the special symbol.

Deterministic solution: complexity is measured in terms of the number of read and write operations. Any deterministic solution requires Ω(n^(1/3)) operations, where n is the number of processors. For more details, see M. O. Rabin, "The choice coordination problem," Acta Informatica, vol. 17, no. 2, pp. 121–134, Jun. 1982.

Randomized solution: for any c > 0, it will solve the problem using c operations with a probability of success at least 1 − 2^(−Ω(c)). For simplicity we consider only the case n = m = 2, although the protocol can be easily generalized.

Analogy from real life: two people approaching each other each take a random action, give way or move ahead. They get past each other exactly when one gives way while the other moves ahead, that is, when the symmetry between them is broken.

Synchronous CCP: the two processors are synchronous, operating in lock-step according to some global clock. Terminology: P_i is processor i; C_i is a shared register for choices; B_i is a local variable of processor i; in each case i ∈ {0,1}.

Processor P_i initially scans the register C_i. Thereafter, the processors exchange registers after every iteration, so at no time will the two processors scan the same register.

Algorithm. Input: registers C_0 and C_1, initialized to 0. Output: exactly one of the two registers has the value √.
Step 0 - P_i initially scans the register C_i.
Step 1 - Read the current register, obtaining a bit R_i. (read operation)
Step 2 - Select one of three cases:
 case 2.1 [R_i = √]: halt. (the choice has already been made by the other processor)
 case 2.2 [R_i = 0, B_i = 1]: write √ into the current register and halt. (the only condition for making a choice)
 case 2.3 [otherwise]: assign an unbiased random bit to B_i and write B_i into the current register. (generate a random value; write operation)
Step 3 - P_i exchanges its current register with P_{1−i} and returns to Step 1. (exchange registers)
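A runnable Python simulation of the synchronous protocol (ours; CHECK stands for √, and each lock-step iteration runs both processors before the registers are exchanged):

import random

CHECK = "√"   # the special choice symbol

# Sketch (ours): both processors act on distinct registers each iteration,
# then exchange, exactly as in the lock-step protocol above.
def sync_ccp():
    C = [0, 0]                 # shared registers C0, C1
    B = [0, 0]                 # local bits B0, B1
    cur = [0, 1]               # P_i initially scans C_i
    halted = [False, False]
    while not all(halted):
        for i in (0, 1):
            if halted[i]:
                continue
            R = C[cur[i]]                    # step 1: read
            if R == CHECK:                   # case 2.1: other side chose
                halted[i] = True
            elif R == 0 and B[i] == 1:       # case 2.2: make the choice
                C[cur[i]] = CHECK
                halted[i] = True
            else:                            # case 2.3: fresh random bit
                B[i] = random.randint(0, 1)
                C[cur[i]] = B[i]
        cur = [1 - cur[0], 1 - cur[1]]       # step 3: exchange registers
    return C

print(sync_ccp())   # exactly one register holds "√"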

Correctness of the algorithm: we need to prove that only one of the shared registers gets √ marked in it. Suppose both are marked with √. This must have happened in the same iteration; otherwise case 2.1 would have halted the later processor before it could write √.

Correctness of Algorithm Let us assume that the error takes place during the t th iteration After step 1 values for processor P i B i (t) and R i (t) By case 2.3 R 0 (t) = B 1 (t) R 1 (t) = B 0 (t) Suppose P i writes √ in the t th iteration, then R i = 0 and B i = 1 and R 1-i = 1 and B 1-i = 0 P 1-i cannot write √ in ith iteration Breaking Symmetry

[Worked trace: starting from C_0 = C_1 = 0, the processors alternate read and write operations, each writing a fresh random bit when case 2.3 applies and exchanging registers every iteration. The run continues until one processor reads R_i = 0 while holding B_i = 1, writes √, and halts; in its next read the other processor sees √ and halts as well.]

Complexity: the probability that the random bits B_0 and B_1 are the same is 1/2, so the probability that the number of iterations exceeds t is 1/2^t. The algorithm terminates within the next two steps as soon as B_0 and B_1 differ. Since the computation cost of each iteration is bounded, the protocol does O(t) work with probability 1 − 1/2^t.
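As a quick check of the constant expected cost (ours): with Pr[T > t] = 2^(−t) for the number T of iterations,

\[
\mathbf{E}[T] \;=\; \sum_{t \ge 0} \Pr[T > t] \;=\; \sum_{t \ge 0} 2^{-t} \;=\; 2,
\]

so the expected total work is O(1).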

The problem in the asynchronous case: the processors are not synchronized, so the lock-step argument no longer applies. [Figure: P_1 and P_2 accessing C_1 and C_2 at arbitrary relative speeds] What can we do? Idea: timestamps.

Notation: T_i is the timestamp of processor P_i, and t_i is the timestamp of register C_i. [Figure: processors with local bits B_i and timestamps T_i; registers C_i with timestamps t_i]

Algorithm input: registers C_1 and C_2, each initialized to timestamp 0 and value 0. Output: exactly one of the two registers has √.

Algorithm for a process P_i:
0) P_i initially scans a randomly chosen register; T_i and B_i are initialized to 0.
1) P_i gets a lock on its current register and reads ⟨t_i, R_i⟩.
2) P_i executes one of these cases:
 2.1) If R_i = √: HALT.
 2.2) If T_i < t_i: T_i ← t_i and B_i ← R_i.
 2.3) If T_i > t_i: write √ into the current register and HALT.
 2.4) If T_i = t_i, R_i = 0, B_i = 1: write √ into the current register and HALT.
 2.5) Otherwise: T_i ← T_i + 1 and t_i ← t_i + 1; B_i ← a random (unbiased) bit; write ⟨t_i, B_i⟩ into the current register.
3) P_i releases the lock on its current register, moves to the other register, and returns to step 1.
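A runnable Python simulation of the timestamped protocol (ours; asynchrony is modeled by letting a random scheduler pick which processor performs its next locked iteration):

import random

CHECK = "√"

# Sketch (ours): each register holds a [timestamp, value] pair, both (0, 0)
# initially. A random scheduler interleaves whole locked iterations.
def async_ccp():
    C = [[0, 0], [0, 0]]
    T = [0, 0]                                 # processor timestamps
    B = [0, 0]                                 # processor bits
    cur = [random.randint(0, 1), random.randint(0, 1)]   # step 0
    halted = [False, False]
    while not all(halted):
        i = random.randint(0, 1)               # arbitrary scheduling
        if halted[i]:
            continue
        t, R = C[cur[i]]                       # step 1: lock and read
        if R == CHECK:                         # case 2.1
            halted[i] = True
        elif T[i] < t:                         # case 2.2: catch up
            T[i], B[i] = t, R
        elif T[i] > t:                         # case 2.3: safe to choose
            C[cur[i]] = [t, CHECK]
            halted[i] = True
        elif R == 0 and B[i] == 1:             # case 2.4: symmetry broken
            C[cur[i]] = [t, CHECK]
            halted[i] = True
        else:                                  # case 2.5: bump, rerandomize
            T[i] += 1
            B[i] = random.randint(0, 1)
            C[cur[i]] = [T[i], B[i]]
        cur[i] = 1 - cur[i]                    # step 3: unlock, move over
    return C

print(async_ccp())   # exactly one register's value is "√"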

Worked trace (registers shown as ⟨timestamp, value⟩, both initially ⟨0, 0⟩):
1) P_1 chooses C_1 and reads ⟨0, 0⟩. None of cases 2.1–2.4 are met, so case 2.5: T_1 and t_1 become 1, P_1 writes its random bit into C_1, releases the lock, and moves to C_2.
2) P_2 chooses C_2 and reads ⟨0, 0⟩. Case 2.5 again: T_2 and t_2 become 1, P_2 writes its random bit into C_2, releases the lock, and moves to C_1.
3) P_2 locks C_1 and reads. With none of cases 2.1–2.4 met, case 2.5 raises T_2 and t_1 to 2; P_2 writes, releases the lock, and moves to C_2.
4) P_2 locks C_2 and reads. Now case 2.3 is satisfied (T_2 = 2 > t_2 = 1): P_2 writes √ into C_2 and HALTS.
We'll show another case of the algorithm; let's go back one iteration, to just after step 3 (and suppose the bit P_2 left in C_1 was 0):
4′) P_1 locks C_2 and reads. None of cases 2.1–2.4 are met, so case 2.5: T_1 and t_2 become 2, and B_1 gets a fresh random bit (here 1); P_1 releases the lock and moves to C_1.
5′) P_2 locks C_2 and reads. Case 2.5: T_2 and t_2 become 3; P_2 writes, releases the lock, and moves to C_1.
6′) P_1 locks C_1 and reads ⟨2, 0⟩. Case 2.4 is satisfied (T_1 = t_1 = 2, R_1 = 0, B_1 = 1): P_1 writes √ into C_1 and HALTS.
7′) P_2 locks C_1 and reads √. Case 2.1 is satisfied: P_2 HALTS.

Correctness: when a processor writes √ into a register, the other processor must NOT write √ into the other register.

A √ is written only in case 2.3 (T_i > t_i) or case 2.4 (T_i = t_i, R_i = 0, B_i = 1), so we check both cases.

Notation: T_i* is the current timestamp of processor P_i, and t_i* is the current timestamp of register C_i. Two facts: whenever P_i finishes an iteration at a register, its timestamp equals that register's timestamp; and when a processor enters a register, it has just left the other register.

Case 2.3 (T_i > t_i: write √ and HALT). Consider P_1 having just entered C_1 with t_1* < T_1*. In the previous iteration P_1 must have left C_2 with this same timestamp T_1*, so T_1* ≤ t_2*. Meanwhile, P_2 can reach C_2 only after leaving C_1, so T_2* ≤ t_1*. Summing up: T_2* ≤ t_1* < T_1* ≤ t_2*. Since T_2* < t_2*, P_2 cannot write √ into C_2 (with its timestamp behind the register's, it would execute case 2.2 instead).

Case 2.4 (T_i = t_i, R_i = 0, B_i = 1: write √ and HALT). Similarly, consider P_1 having just entered C_1 with t_1* = T_1* and about to write √ (so R_1 = 0, B_1 = 1). As before, T_1* ≤ t_2* and T_2* ≤ t_1*, so summing up: T_2* ≤ t_1* = T_1* ≤ t_2*. If T_2* < t_2*, then P_2 cannot write √, as above. If T_2* = t_2*, then P_2 reads R_2 = 1 (the bit B_1 = 1 that P_1 left in C_2) while holding B_2 = 0 (the bit P_1 just read as R_1 = 0), so case 2.4 does not apply and P_2 cannot write √.

Complexity: the cost is proportional to the largest timestamp, and a timestamp can go up only in case 2.5. A processor's current B_i value is set during a visit to the other register, so the randomized argument from the synchronous case applies: the protocol does O(t) work with probability 1 − 2^(−Ω(t)).

REAL WORLD APPLICATIONS (presented by Pham Nam Khanh)

Applications of parallel sorting: sorting is a fundamental algorithm in data processing: » parallel database operations (rank, join, etc.); » search (rapid index/lookup after sorting). Best record in sorting: TB-scale data in 4,328 seconds using 2100 nodes from Yahoo.

Applications of MIS: wireless communication, scheduling problems, perfect matching (and thus assignment problems), finance.

Applications of MIS in finance, the market graph: vertices are financial instruments (stocks, commodities, bonds, e.g., in the EAFE and EM universes), with edges joining instruments that are not negatively correlated. An MIS then forms a completely diversified portfolio in which all instruments are negatively correlated with each other, lowering the risk. Low-latency requirements motivate parallel MIS. [Figure: market graphs]

Applications of the choice coordination algorithm: given n processes, each of which can choose between m options, they need to agree on a unique choice; this belongs to the class of distributed consensus algorithms. Examples: hardware and software tasks involving concurrency, clock synchronization in wireless sensor networks, and multi-vehicle cooperative control.

Multi-vehicle cooperative control coordinates the movement of multiple vehicles to accomplish an objective: task assignment, cooperative transport, cooperative role assignment, air traffic control, and cooperative timing.

CONCLUSION

Conclusion: the CREW PRAM model; a parallel Maximal Independent Set algorithm with O(log n) expected iterations, and its applications; parallel sorting algorithms: QuickSort in O(log² n) and BoxSort in O(log n); the Choice Coordination Problem: distributed algorithms for synchronous and asynchronous systems, plus applications.