
1 Parallel and Distributed Algorithms

2 Overview: parallel algorithms vs. distributed algorithms; PRAM; maximal independent set; sorting using PRAM; the choice coordination problem; real-world applications.

3 INTRODUCTION

4 The need for distributed processing: massively parallel processing machines with 1000s of processors are becoming the norm as Moore's law comes to an end.

5 Parallel Algorithm: a parallel algorithm is an algorithm which can be executed a piece at a time on many different processing devices, and then recombined at the end to get the correct result.* (* Blelloch, Guy E.; Maggs, Bruce M. Parallel Algorithms. School of Computer Science, Carnegie Mellon University.)

6 Distributed Algorithm: a distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors.* (* Lynch, Nancy (1996). Distributed Algorithms. San Francisco, CA: Morgan Kaufmann. ISBN 978-1-55860-348-6.)

7 PRAM

8 Random Access Machine (RAM): an abstract machine with an unbounded number of local memory cells and a simple instruction set. Time complexity: the number of instructions executed. Space complexity: the number of memory cells used. All operations take unit time.

9 PRAM (Parallel Random Access Machine): a parallel version of the RAM for designing algorithms applicable to parallel computers. Why PRAM? On P processors, at most P instructions execute per cycle; any processor can read or write any shared memory cell in unit time; the model abstracts away communication overhead, which makes analyzing the complexity of PRAM algorithms easier; and it serves as a benchmark.

10 Example: processor P_i reads A[i-1] from the shared array A, computes A[i] = A[i-1] + 1, and writes A[i] back, so processors P_1, ..., P_n evaluate the chain A[1] = A[0] + 1, A[2] = A[1] + 1, ..., A[n] = A[n-1] + 1.
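
To make the lock-step read/compute/write cycle concrete, here is a minimal, hypothetical Python simulation of one synchronous PRAM step; all processors read first and only then write, so within a step each P_i sees only the old values (the names pram_step and A are illustrative, not from the slides).

```python
def pram_step(A):
    """One synchronous step: every processor P_i (i >= 1) computes A[i-1] + 1."""
    reads = [A[i - 1] for i in range(1, len(A))]  # read phase: all P_i read A[i-1]
    for i in range(1, len(A)):                    # write phase: all P_i write A[i]
        A[i] = reads[i - 1] + 1
    return A

A = [0, 0, 0, 0, 0]
print(pram_step(A))  # [0, 1, 1, 1, 1]: only old values are visible within a step
```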

11 Shared memory access conflicts. Exclusive Read (ER): all processors can simultaneously read from distinct memory locations. Exclusive Write (EW): all processors can simultaneously write to distinct memory locations. Concurrent Read (CR): all processors can simultaneously read from any memory location. Concurrent Write (CW): all processors can simultaneously write to any memory location. Combining these gives the EREW, CREW, and CRCW PRAM models.

12 Complexity. Parallel time complexity: the number of synchronous steps in the algorithm. Space complexity: the number of shared memory cells. Parallelism: the number of processors used.

13 MAXIMAL INDEPENDENT SET Lahiru Samarakoon Sumanaruban Rajadurai

14 Independent Set (IS): any set of nodes, no two of which are adjacent.

15 Maximal Independent Set (MIS): an independent set that is not a proper subset of any other independent set.

16 Maximal vs. maximum IS: a maximum independent set is an independent set of largest possible size, whereas a maximal independent set merely cannot be extended by adding another node.

17 A sequential greedy algorithm: suppose the set S will hold the final MIS; initially S = ∅.

18 Phase 1: pick a node v and add it to S.

19 Remove v and its neighbors N(v) from the graph.

21 Phase 2: pick another node and add it to S.

22 Remove the picked node and its neighbors from the graph.

24 Phases 3, 4, 5, ..., x: repeat until all nodes are removed and none remain.

26 At the end, the set S will be an MIS of the graph G.

27 Running time: one node is added to S per phase, so in the worst case (e.g., a graph with no edges, where every node must be picked individually) the algorithm needs Θ(n) phases.
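
A short sequential sketch of this greedy procedure (illustrative only; the dict-of-sets adjacency representation is an assumption):

```python
def greedy_mis(adj):
    """adj: {node: set of neighbors}. Returns a maximal independent set."""
    remaining = set(adj)
    mis = set()
    while remaining:
        v = remaining.pop()   # pick any remaining node and add it to the MIS
        mis.add(v)
        remaining -= adj[v]   # remove its neighbors from the graph
    return mis

# Example: the path 1-2-3-4
print(greedy_mis({1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}))
```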

28 Intuition for parallelization: at each phase we may select any independent set I (instead of a single node), then remove I and the neighbors of I from the graph.

29 Example: suppose the set S will hold the final MIS; initially S = ∅.

30 Phase 1: find any independent set I and insert it into S: S ← S ∪ I.

31 Remove I and its neighbors N(I) from the graph.

32 Phase 2: on the new graph, find any independent set I and insert it into S.

33 Remove I and its neighbors N(I) from the graph.

35 Phase 3: on the new graph, find any independent set I and insert it into S.

36 Remove I and its neighbors N(I); no nodes are left.

38 The final MIS is S.

39 Observation: the number of phases depends on the choice of the independent set in each phase; the larger the independent set at each phase, the faster the algorithm.

40 Randomized Maximal Independent Set (MIS): let d(v) denote the degree of node v.

41 At each phase, each node v elects itself with probability 1/(2 d(v)), where d(v) is the degree of v in the current graph; the elected nodes are candidates for the independent set.

42 If two neighbors are elected simultaneously, then the higher-degree node wins: e.g., if d(u) > d(v) and both are elected, u remains a candidate and v is dropped.

43 If both have the same degree, ties are broken arbitrarily (e.g., by node identifier).

44 Problematic nodes (adjacent elected candidates) are removed using the previous rules.

45 The remaining elected nodes form an independent set.

46 Luby's algorithm: mark lower-degree vertices with higher probability, namely 1/(2 d(v)).

47 Problematic nodes: using the previous rules, problematic nodes are removed.

48 Luby's algorithm: if both endpoints of an edge are marked, unmark the one with the lower degree.

49 The remaining marked nodes form an independent set.

50 Luby's algorithm: add all marked vertices to the MIS, then remove the marked vertices together with their neighbors and the corresponding edges.
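
A sketch of one phase of Luby's algorithm, simulated sequentially in Python (illustrative; the dict-of-sets graph representation and tie-breaking by vertex identifier are assumptions):

```python
import random

def luby_phase(adj, mis):
    """One phase; adj maps each surviving vertex to its set of neighbors."""
    deg = {v: len(adj[v]) for v in adj}
    # mark each vertex with probability 1/(2 d(v)); isolated vertices always mark
    marked = {v for v in adj
              if deg[v] == 0 or random.random() < 1.0 / (2 * deg[v])}
    for v in sorted(marked):                      # resolve conflicts on marked edges
        for w in adj[v]:
            if w in marked and (deg[v], v) < (deg[w], w):
                marked.discard(v)                 # unmark the lower-degree endpoint
                break
    mis |= marked                                 # marked vertices join the MIS
    removed = marked | {w for v in marked for w in adj[v]}
    return {v: adj[v] - removed for v in adj if v not in removed}

adj, mis = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}, set()
while adj:                                        # repeat phases until no nodes remain
    adj = luby_phase(adj, mis)
print(mis)                                        # a maximal independent set of the path
```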

51 ANALYSIS

52 Goodness property: a vertex v is good if at least ⅓ of its neighbors have lower degree than it, and bad otherwise. An edge is bad if both of its endpoints are bad, and good otherwise.

53 Lemma 1: let v ∈ V be a good vertex with degree d(v) > 0. Then the probability that some vertex w ∈ N(v) gets marked is at least 1 − exp(−1/6). Define L(v) to be the set of neighbors of v whose degree is less than v's degree; by definition, |L(v)| ≥ d(v)/3 if v is a good vertex.

54 Proof sketch: each w ∈ L(v) is marked independently with probability 1/(2 d(w)) ≥ 1/(2 d(v)), so the probability that no vertex of L(v) is marked is at most (1 − 1/(2 d(v)))^(d(v)/3) ≤ exp(−1/6).

55 Lemma 2: during any iteration, if a vertex w is marked, then it is selected to be in S with probability at least 1/2.

56 Proof sketch: a marked vertex w fails to be selected only if some neighbor of degree at least d(w) is also marked.

57 Hence Pr[w not selected | w marked] ≤ Σ over u ∈ N(w) with d(u) ≥ d(w) of 1/(2 d(u)) ≤ d(w) · 1/(2 d(w)) = 1/2.

58 From Lemmas 1 and 2, the probability that a good vertex belongs to S ∪ N(S) is at least (1 − exp(−1/6))/2, so good vertices get eliminated with constant probability. Combined with Lemma 3 below (at least half the edges are good), it follows that the expected number of edges eliminated during an iteration is a constant fraction of the current set of edges. This implies that the expected number of iterations of the parallel MIS algorithm is O(log n).

59 Lemma 3: in a graph G(V, E), the number of good edges is at least |E|/2. Proof: direct the edges in E from the lower-degree endpoint to the higher-degree endpoint, breaking ties arbitrarily; then each bad vertex v has in-degree less than d(v)/3 and hence out-degree at least twice its in-degree. For all S, T ⊆ V, define E(S, T) as the subset of the (oriented) edges directed from vertices in S to vertices in T, and let e(S, T) = |E(S, T)|.

60 Let V_G and V_B be the sets of good and bad vertices. Summing the degree bound over the bad vertices gives e(V_B, V) ≥ 2 e(V, V_B); expanding both sides yields e(V_B, V_G) ≥ 2 e(V_G, V_B) + e(V_B, V_B) ≥ e(V_B, V_B). So every bad edge can be charged to a distinct good edge, and at least half the edges are good.

61 SORTING ON PRAM Jessica Makucka Puneet Dewan

62 Sorting. Current problem: sort n numbers. The best sequential comparison-based sorting time is O(n log n). Can we do better with more processors? Yes!

63 Notes about Quicksort: we sort n distinct numbers on a CREW PRAM with n processors; each of the n processors holds one input element. Notation: let P_i denote the i-th processor.

64 Quicksort Algorithm
0. If n = 1, stop.
1. Pick a splitter at random from the n elements.
2. Each processor determines whether its element is bigger or smaller than the splitter.
3. Let j denote the splitter's rank. If j ∉ [n/4, 3n/4], the split failed: go back to (1). If j ∈ [n/4, 3n/4], the split succeeded: move the splitter to P_j, move every element smaller than the splitter to a distinct processor P_i with i < j, and every larger element to a distinct P_i with i > j.
4. Sort the elements in processors P_1 through P_(j-1) and the elements in processors P_(j+1) through P_n recursively.
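
A sequential Python sketch of this logic (the PRAM broadcast, comparison, and compaction steps are simulated with ordinary list operations; distinct elements are assumed, as stated on slide 63):

```python
import random

def pram_quicksort(elems):
    n = len(elems)
    if n <= 1:
        return elems
    while True:                                       # retry until the split succeeds
        splitter = random.choice(elems)               # step 1: random splitter
        smaller = [x for x in elems if x < splitter]  # step 2: parallel comparison
        j = len(smaller)                              # the splitter's (0-based) rank
        if n // 4 <= j <= 3 * n // 4:                 # step 3: success test
            break
    larger = [x for x in elems if x > splitter]
    return pram_quicksort(smaller) + [splitter] + pram_quicksort(larger)  # step 4

print(pram_quicksort([12, 3, 7, 5, 11, 2, 1, 4]))
```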

65 Quicksort time analysis. Stage 1: along any sequence of recursive splits there are O(log n) stages, since each successful split leaves at most 3/4 of the elements on either side. Stage 2: trivial; it can be implemented in a single CREW PRAM step.

66 Stage 3: let j denote the splitter's rank; if j ∉ [n/4, 3n/4], go back to (1); if j ∈ [n/4, 3n/4], move the splitter to P_j and every element smaller than the splitter to a distinct processor P_i with i < j. O(log n) PRAM steps are needed for a single splitting stage.

67 Comparison in the splitting stage (3): each processor P_i assigns a bit according to whether its element is smaller or bigger than the splitter: 0 if the element is bigger, 1 otherwise.

68 The destination processor of each element is then obtained by summing these bits with a parallel prefix sum, which takes O(log n) steps.
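
A sketch of that O(log n) prefix-sum computation (a Hillis-Steele style scan, simulated sequentially; each iteration of the while loop corresponds to one synchronous PRAM round):

```python
def parallel_prefix_sum(bits):
    s = list(bits)
    k = 1
    while k < len(s):          # ceil(log2 n) rounds in total
        old = list(s)          # one synchronous round: read old values, then write
        for i in range(k, len(s)):
            s[i] = old[i] + old[i - k]
        k *= 2
    return s                   # inclusive prefix sums

print(parallel_prefix_sum([0, 1, 1, 0, 1, 1, 1, 0]))  # [0, 1, 2, 2, 3, 4, 5, 5]
```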

69 Overall time analysis: the algorithm terminates in O(log² n) steps, since there are O(log n) levels of recursion and each level costs O(log n) steps for its splitting stage.
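
The bound can be read off the recurrence for the parallel time (a sketch, with constants suppressed; the 3/4 factor comes from the success test in stage 3):

```latex
T(n) \le T(3n/4) + O(\log n)
\quad\Rightarrow\quad
T(n) = O\Big(\sum_{k \ge 0,\; (3/4)^k n \ge 1} \log\big((3/4)^k n\big)\Big) = O(\log^2 n),
```

since the sum has O(log n) terms, each at most log n.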

70 Cons: the algorithm assumes that the split is always successful, breaking a problem of size n into pieces that are a constant fraction of n, and there is no method guaranteeing a successful split.

71 Improvement idea: reduce the problem to subproblems of size n^(1-ε), where 0 < ε < 1, while keeping the time to split the same.

72 Benefits if ε = 1/2: the total time over all levels of the recursion is log n + log n^(1/2) + log n^(1/4) + … = (1 + 1/2 + 1/4 + …) log n, so we can hope for an overall running time of O(log n).

73 Long story: suppose we have n processors and n elements, that processors P_1 through P_r contain r of the elements in sorted order, and that processors P_(r+1) through P_n contain the remaining n − r elements.
1. Choose random splitters and sort them: call the sorted elements in the first r processors the splitters, and for 1 ≤ j ≤ r let s_j denote the j-th splitter in sorted order.
2. Insert: insert the n − r unsorted elements among the splitters.
3. Sort the remaining elements among the splitters: (a) each processor should end up with a distinct input element; (b) let i(s_j) denote the index of the processor containing s_j after the insertion; then for all k < i(s_j), processor P_k contains an element smaller than s_j, and for all k > i(s_j), processor P_k contains an element larger than s_j.

74 Example: choose random splitters from the input 5 9 8 10 7 6 12 11.

75 Example (contd.): sort the random splitters. Sorted list: 6 11. Unsorted list: 5 9 8 7 10 12.

76 Example (contd.): insert the unsorted elements among the splitters: 5 6 7 9 8 10 11 12.

77 Example (contd.): check whether the number of elements between consecutive splitters is at most log n. Here the middle group 7 9 8 10 has size S = 4, which exceeds log n (i.e., 3), while the outer groups have S = 1.

78 Example (contd.): recur on the subproblem whose size exceeds log n; again choose random splitters within 7 9 8 10 and follow the same process.

79 Partitioning as a tree: the first partition forms a tree whose children are the groups 5, then 7 9 8 10, then 12, around the splitters 6 and 11. The size on the right exceeds log n, so we split again by choosing random splitters, e.g. 9 and 8.

80 (Contd.) The subproblem is now sorted because of the partition: with splitters 8 and 9, the remaining groups 7 and 10 are singletons, giving 7 8 9 10 in order.

81 Lemma’s to be Used 1.A CREW PRAM having (n 2 ) processors. Suppose that each of the processors P 1 through P n has an input element to be sorted. Then the PRAM can sort these n elements in O(log n). 2. For n processors, and n elements of which n 1/2 are splitters, then the insertion process can be completed in O(log n) steps.

82 BoxSort algorithm. Input: a set of numbers S. Output: the elements of S sorted in increasing order.
1. Select n^(1/2) elements (ε is 1/2) at random from the n input elements; using all n processors, sort them in O(log n) steps (Lemma 1).
2. Using the sorted elements from stage 1 as splitters, insert the remaining elements among them in O(log n) steps (Lemma 2).
3. Treating the elements inserted between adjacent splitters as subproblems, recur on each subproblem whose size exceeds log n; for subproblems of size log n or less, invoke LogSort.
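
A sequential Python sketch of BoxSort's recursion structure (the O(log n) PRAM subroutines for sorting splitters and inserting are simulated here with built-in sorting and binary search; distinct elements are assumed):

```python
import bisect, math, random

def boxsort(elems, threshold=None):
    """Sort a list of distinct numbers; threshold plays the role of log n."""
    if threshold is None:
        threshold = max(1, int(math.log2(max(2, len(elems)))))
    if len(elems) <= threshold:
        return sorted(elems)                    # stage 3 base case: LogSort
    r = max(1, math.isqrt(len(elems)))          # stage 1: about n^(1/2) splitters
    splitters = sorted(random.sample(elems, r))
    boxes = [[] for _ in range(r + 1)]
    for x in elems:                             # stage 2: insert among the splitters
        if x not in splitters:
            boxes[bisect.bisect_left(splitters, x)].append(x)
    out = []
    for i, box in enumerate(boxes):             # stage 3: recur on each box
        out += boxsort(box, threshold)
        if i < r:
            out.append(splitters[i])
    return out

print(boxsort([5, 9, 8, 10, 7, 6, 12, 11]))     # the example from slide 74
```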

83 Sort Fact A CREW PRAM with m processors can sort m elements in O(m) steps.

84 Example: each processor is assigned an element and compares it with the remaining elements simultaneously, in O(m) steps; the rank assigned to each element is its position in sorted order. For the elements 5 9 8 7 10 3 4 2 held by P_1 through P_8, the ranks assigned are 4 7 6 5 8 2 3 1.
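
A sketch of this rank-based sorting (each iteration of the outer loop stands for one processor P_i running in parallel; the ranks here are 0-based, whereas the slide's are 1-based):

```python
def rank_sort(elems):
    m = len(elems)
    out = [None] * m
    for i, x in enumerate(elems):               # each loop body is one processor P_i
        rank = sum(1 for y in elems if y < x)   # m comparisons: O(m) parallel steps
        out[rank] = x                           # rank assigned => sorted position
    return out

print(rank_sort([5, 9, 8, 7, 10, 3, 4, 2]))     # [2, 3, 4, 5, 7, 8, 9, 10]
```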

85 Things to remember: the last statement of the BoxSort algorithm (invoke LogSort on subproblems of size at most log n) and the ranking idea on the previous slide.

86 LogSort: given log n processors and log n elements, we can sort in O(log n) steps (the sort fact with m = log n).

87 Analysis: consider each node of the recursion tree as a box. Choosing random splitters and sorting them takes O(log n) time, and inserting the unsorted elements among the splitters takes O(log n). With high probability the subproblems resulting from a splitting operation are very small, so each leaf is a box of size at most log n, and the time spent at the leaves is O(log n) using LogSort. The total time is O(log n).

88 DISTRIBUTED RANDOMIZED ALGORITHM Yogesh S Rawat R. Ramanathan

89 CHOICE COORDINATION PROBLEM (CCP)

90 Biological inspiration: the mite (genus Myrmoyssus).

91 These mites reside as parasites on the ear membrane of moths of the family Phaenidae.

92 Moths are prey to bats, and the only defense they have is that they can hear the sonar used by an approaching bat.

93 If both ears of the moth are infected by the mites, its ability to detect the sonar is considerably diminished, severely decreasing the survival chances of both the moth and its colony of mites.

94 The mites are therefore faced with a "choice coordination problem": how does any collection of mites infecting a particular ear ensure that every other mite chooses the same ear?

95 Problem specification: a set of N processors.

96 There are M options to choose from.

97 The processors have to reach a consensus on a unique choice.

98 Model for communication: a collection of M read-write registers accessible to all the processors, with a locking mechanism to handle conflicts. Each processor follows a protocol for making a choice, and a special symbol (√) is used to mark the choice. At the end, exactly one register contains the special symbol.

99 Deterministic solutions: complexity is measured in terms of the number of read and write operations. Any deterministic solution has complexity Ω(n^(1/3)) operations, where n is the number of processors. For more details see M. O. Rabin, "The choice coordination problem," Acta Informatica, vol. 17, no. 2, pp. 121-134, Jun. 1982.

100 Randomized solution: for any c > 0, it solves the problem using c operations with probability of success at least 1 − 2^(−Ω(c)). For simplicity we consider only the case n = m = 2, although the protocol is easily generalized.

101 An analogy from real life: two people walking toward each other must each take a random action, give way or move ahead.

102 If both give way, or both move ahead, the conflict remains and they try again.

103 As soon as one gives way while the other moves ahead, the symmetry is broken and the conflict is resolved.

104 Synchronous CCP: the two processors are synchronous and operate in lock-step according to some global clock. Terminology: P_i is processor i, C_i is the shared register for choices, and B_i is the local variable of processor P_i, where i ∈ {0, 1}.

105 The processor P_i initially scans the register C_i; thereafter, the processors exchange registers after every iteration, so at no time will the two processors scan the same register.

108 Algorithm. Input: registers C_0 and C_1, initialized to 0. Output: exactly one of the two registers has the value √.
Step 0: P_i initially scans the register C_i.
Step 1: read the current register, obtaining a bit R_i.
Step 2: select one of three cases.
Case 2.1 [R_i = √]: halt.
Case 2.2 [R_i = 0, B_i = 1]: write √ into the current register and halt.
Case 2.3 [otherwise]: assign an unbiased random bit to B_i and write B_i into the current register.
Step 3: P_i exchanges its current register with P_(1-i) and returns to step 1.

109 Step 1 is the read operation.

110 Case 2.1: the choice has already been made by the other processor.

111 Case 2.2 is the only condition under which a choice is made.

112 Case 2.3 generates a random value; writing it back is the write operation.

113 Step 3 exchanges the registers.
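
A hypothetical Python simulation of the synchronous protocol (the string "Y" stands in for √; both processors execute each round in lock-step, which is modeled by reading both registers before either write):

```python
import random

def sync_ccp():
    C = [0, 0]                        # shared registers C_0, C_1
    B = [0, 0]                        # local bits B_0, B_1
    cur = [0, 1]                      # step 0: P_i starts scanning C_i
    halted = [False, False]
    while not all(halted):
        R = [C[cur[0]], C[cur[1]]]    # step 1: both processors read simultaneously
        for i in (0, 1):
            if halted[i]:
                continue
            if R[i] == "Y":                    # case 2.1: choice already made
                halted[i] = True
            elif R[i] == 0 and B[i] == 1:      # case 2.2: make the choice
                C[cur[i]] = "Y"
                halted[i] = True
            else:                              # case 2.3: fresh random bit
                B[i] = random.randint(0, 1)
                C[cur[i]] = B[i]
        cur = [cur[0] ^ 1, cur[1] ^ 1]         # step 3: exchange registers
    return C

print(sync_ccp())   # exactly one register holds "Y"
```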

114 Correctness of the algorithm: we need to prove that only one of the shared registers gets √ marked in it. Suppose both are marked with √: this must have happened in the same iteration, since otherwise case 2.1 would have halted the second processor before it wrote.

115 Assume the error takes place during the t-th iteration, and let B_i(t) and R_i(t) be processor P_i's values after step 1. By case 2.3, R_0(t) = B_1(t) and R_1(t) = B_0(t). Suppose P_i writes √ in the t-th iteration; then R_i = 0 and B_i = 1, hence R_(1-i) = 1 and B_(1-i) = 0, so P_(1-i) cannot write √ in the same iteration. The random bits break the symmetry.

116 A sample run. Read: both processors read 0 (B_0 = B_1 = 0).

117 Write: both draw the random bit 0 and write it; C_0 = C_1 = 0.

118 Read: both read 0 again, with B_0 = B_1 = 0.

119 Write: both draw the random bit 1 and write it; C_0 = C_1 = 1.

120 Read: both read 1, with B_0 = B_1 = 1.

121 Write: P_0 draws 0 and P_1 draws 1; the register contents now differ, and the symmetry is broken.

122 Read: P_0 reads 1 with B_0 = 0 (case 2.3); P_1 reads 0 with B_1 = 1 (case 2.2), so P_1 will write √ and HALT.

123 Write: P_1 writes √ into its current register; P_0 writes a fresh random bit (0/1) into the other.

124 Read: P_0 reads √ (case 2.1) and HALTS; exactly one register holds √.

125 Complexity: the probability that the two random bits B_0 and B_1 are the same is 1/2, so the probability that the number of iterations exceeds t is 1/2^t; the algorithm terminates within the next two steps as soon as B_0 and B_1 differ. Since the computation cost of each iteration is bounded, the protocol does O(t) work with probability 1 − 1/2^t.

126 The problem in the asynchronous case: processors P_1 and P_2 again share the registers C_1 and C_2.

131 The processors are not synchronized, so the lock-step argument no longer applies: one processor may take many steps while the other takes none.

132 What can we do?

133 Idea: timestamps.

134 Terminology: T_i is the timestamp of processor P_i, and t_i is the timestamp of register C_i; each processor also keeps a local bit B_i.

135 Algorithm. Input: registers C_1 and C_2, each initialized to the pair ⟨0, 0⟩ (timestamp, value). Output: exactly one of the two registers has √.

136 Algorithm for a processor P_i:
0) P_i initially scans a randomly chosen register; T_i and B_i are initialized to 0.
1) P_i gets a lock on its current register and reads ⟨t_i, R_i⟩.
2) P_i executes one of these cases:
2.1) If R_i = √: HALT.
2.2) If T_i < t_i: T_i ← t_i and B_i ← R_i.
2.3) If T_i > t_i: write √ into the current register and HALT.
2.4) If T_i = t_i, R_i = 0, B_i = 1: write √ into the current register and HALT.
2.5) Otherwise: T_i ← T_i + 1 and t_i ← t_i + 1; B_i ← a random (unbiased) bit; write ⟨t_i, B_i⟩ into the current register.
3) P_i releases the lock on its current register, moves to the other register, and returns to step 1.
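
A hypothetical Python simulation of this protocol (a random scheduler models the asynchrony, the string "Y" stands in for √, and each register holds a [timestamp, value] pair; locking is implicit because the scheduler runs one step at a time):

```python
import random

def async_ccp():
    reg = [[0, 0], [0, 0]]            # C_i holds the pair <t_i, value>
    T = [0, 0]                        # processor timestamps
    B = [0, 0]                        # processor bits
    cur = [random.randint(0, 1), random.randint(0, 1)]  # step 0: random register
    halted = [False, False]
    while not all(halted):
        i = random.randint(0, 1)      # asynchronous scheduler picks a processor
        if halted[i]:
            continue
        t, R = reg[cur[i]]            # step 1: lock and read <t, R>
        if R == "Y":                                # case 2.1
            halted[i] = True
        elif T[i] < t:                              # case 2.2: catch up
            T[i], B[i] = t, R
        elif T[i] > t or (R == 0 and B[i] == 1):    # cases 2.3 / 2.4: win
            reg[cur[i]][1] = "Y"
            halted[i] = True
        else:                                       # case 2.5
            T[i] += 1                               # T_i <- T_i + 1 (= t_i + 1)
            B[i] = random.randint(0, 1)
            reg[cur[i]] = [T[i], B[i]]              # write <t_i, B_i>
        cur[i] ^= 1                   # step 3: move to the other register
    return [r[1] for r in reg]

print(async_ccp())  # exactly one register ends up holding "Y"
```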

137 Initial state: all timestamps and bits are 0.

138 1) P_1 chooses C_1 and reads.

139 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

140 2.5) T_1 ← T_1 + 1 and t_1 ← t_1 + 1.

141 2.5) P_1 writes ⟨t_1, B_1⟩ into C_1.

142 3) P_1 releases the lock on C_1, moves to C_2, and returns to step 1.

143 1) P_2 chooses C_2 and reads.

144 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

145 2.5) T_2 ← T_2 + 1 and t_2 ← t_2 + 1.

146 2.5) P_2 writes ⟨t_2, B_2⟩ into C_2.

147 3) P_2 releases the lock on C_2, moves to C_1, and returns to step 1.

148 1) P_2 locks C_1 and reads (the schedule is asynchronous, so P_2 takes several steps while P_1 waits).

149 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

150 2.5) T_2 ← T_2 + 1 and t_1 ← t_1 + 1.

151 2.5) P_2 writes ⟨t_1, B_2⟩ into C_1.

152 3) P_2 releases the lock on C_1, moves to C_2, and returns to step 1.

153 1) P_2 locks C_2 and reads.

154 Case 2.3 (T_2 > t_2) is satisfied.

155 2.3) P_2 writes √ into C_2 and HALTS.

156 We'll show another case of the algorithm.

157 Let's go back one iteration, to the state just after P_2 released the lock on C_1.

159 1) P_1 locks C_2 and reads.

160 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

161 2.5) T_1 ← T_1 + 1 and t_2 ← t_2 + 1.

162 2.5) B_1 ← a random (unbiased) bit; P_1 writes ⟨t_2, B_1⟩ into C_2.

163 3) P_1 releases the lock on C_2, moves to C_1, and returns to step 1.

164 1) P_2 locks C_2 and reads.

165 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

166 2.5) T_2 ← T_2 + 1 and t_2 ← t_2 + 1.

167 2.5) P_2 writes ⟨t_2, B_2⟩ into C_2.

168 3) P_2 releases the lock on C_2, moves to C_1, and returns to step 1.

169 1) P_1 locks C_1 and reads.

170 Case 2.4 (T_1 = t_1, R_1 = 0, B_1 = 1) is satisfied.

171 2.4) P_1 writes √ into C_1.

172 2.4) P_1 HALTS.

173 1) P_2 locks C_1 and reads √.

174 Case 2.1 (R_1 = √) is satisfied.

175 2.1) P_2 HALTS.

176 Correctness.

177 When a processor writes √ into a register, the other processor must NOT write √ into the other register.

178 Only two cases write √. Case 2.3) T_i > t_i: write √ into the current register and halt. Case 2.4) T_i = t_i, R_i = 0, B_i = 1: write √ into the current register and halt.

179 Notation: T_i* is the current timestamp of processor P_i, and t_i* is the current timestamp of register C_i. Whenever P_i finishes an iteration in C_i, T_i = t_i.

180 When a processor enters a register, it has just left the other register.

181 Case 2.3 (T_i > t_i: write √ into the current register and HALT): consider P_1, which has just entered C_1 with t_1* < T_1*.

182 In the previous iteration, P_1 must have left C_2 with the same T_1*.

184 When it left, P_1 had set t_2 = T_1 there, so T_1* ≤ t_2*.

186 For P_2 to write into C_2 afterwards, it must go to C_2 only after passing through C_1.

187 By the same reasoning, T_2* ≤ t_1*.

190 Summing up: T_2* ≤ t_1* < T_1* ≤ t_2*.

191 So T_2* < t_2*, and P_2 cannot write √ into C_2.

192 Case 2.4 (T_i = t_i, R_i = 0, B_i = 1: write √ into the register and HALT): similarly, consider P_1, which has entered C_1 with t_1* = T_1*.

193 As before, T_1* ≤ t_2* and T_2* ≤ t_1*; summing up, T_2* ≤ t_1* = T_1* ≤ t_2*.

194 Hence T_2* ≤ t_2* with R_2 = 1 and B_2 = 0, so P_2 cannot write √.

195 Complexity: the cost is proportional to the largest timestamp, and a timestamp can go up only in case 2.5. A processor's current B_i value is set during a visit to the other register, so the complexity analysis of the synchronous case applies.

196 REAL WORLD APPLICATIONS Pham Nam Khanh

197 Applications of parallel sorting. Sorting is a fundamental algorithm in data processing: parallel database operations (rank, join, etc.) and search (rapid index/lookup after sorting). Best record in sorting: 102.5 TB in 4,328 seconds using 2100 nodes, by Yahoo.

198 Applications of MIS: wireless communication, scheduling problems, perfect matching (and hence assignment problems), and finance.

199 Applications of maximal independent sets in finance: the market graph (e.g., over the EAFE and EM universes); the low-latency requirement calls for the parallel MIS algorithm.

200 In the market graph, the vertices are financial instruments: stocks, commodities, and bonds.

202 An MIS of the market graph forms a completely diversified portfolio, in which all instruments are negatively correlated with each other, lowering the risk.

203 Applications of the choice coordination algorithm: given n processes, each of which can choose between m options, the processes need to agree on a unique choice; the problem belongs to the class of distributed consensus algorithms. Applications include hardware and software tasks involving concurrency, clock synchronization in wireless sensor networks, and multivehicle cooperative control.

204 Multivehicle cooperative control coordinates the movement of multiple vehicles to accomplish an objective: task assignment, cooperative transport, cooperative role assignment, air traffic control, and cooperative timing.

205 CONCLUSION

206 Conclusion. PRAM model: CREW parallel algorithms. Maximal independent set in O(log n) expected phases, with applications. Parallel sorting algorithms: QuickSort in O(log² n) and BoxSort in O(log n). Choice coordination problem: distributed algorithms for synchronous and asynchronous systems, plus applications.

