Parallel and Distributed Algorithms
Overview: parallel algorithm vs. distributed algorithm; PRAM; maximal independent set; sorting using PRAM; the choice coordination problem; real-world applications
INTRODUCTION
The need for distributed processing: massively parallel processing machines, CPUs with 1000s of processors, and Moore's law coming to an end
Parallel algorithm: a parallel algorithm is an algorithm that can be executed a piece at a time on many different processing devices, with the pieces combined again at the end to get the correct result.* (*Blelloch, Guy E.; Maggs, Bruce M. Parallel Algorithms. USA: School of Computer Science, Carnegie Mellon University.)
Distributed algorithm: a distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors.* (*Lynch, Nancy (1996). Distributed Algorithms. San Francisco, CA: Morgan Kaufmann Publishers.)
PRAM
Random access machine (RAM): an abstract machine with an unbounded number of local memory cells and a simple instruction set. Time complexity: the number of instructions executed. Space complexity: the number of memory cells used. All operations take unit time.
PRAM (parallel random access machine): a parallel version of the RAM for designing algorithms for parallel computers. Why PRAM? The number of operations executed per cycle on p processors is at most p; any processor can read/write any shared memory cell in unit time; it abstracts away overhead, which makes analyzing the complexity of PRAM algorithms easier; and it serves as a benchmark.
Example: a shared array A. In each step, processor P_i reads A[i−1], computes A[i] = A[i−1] + 1, and writes A[i]: P_1 computes A[1] = A[0] + 1, P_2 computes A[2] = A[1] + 1, …, P_n computes A[n] = A[n−1] + 1.
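The shared-array example above can be sketched as a round-based simulation, with all reads happening before all writes in each synchronous PRAM step (an illustrative sketch; the function name and list representation are ours, not from the slides):

```python
# Round-based simulation of the PRAM example: processor P_i repeatedly reads
# A[i-1], computes A[i-1] + 1, and writes A[i]. All reads are collected before
# any write, mimicking one synchronous PRAM step per round.

def pram_increment_chain(n, a0=0):
    A = [a0] + [0] * n                                # shared memory A[0..n]
    for _ in range(n):                                # n synchronous rounds
        reads = [A[i - 1] for i in range(1, n + 1)]   # all P_i read A[i-1]
        for i in range(1, n + 1):                     # all P_i write A[i]
            A[i] = reads[i - 1] + 1
    return A

# The value propagates one cell per round, so after n rounds A[i] = A[0] + i:
print(pram_increment_chain(4))   # [0, 1, 2, 3, 4]
```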
Shared memory access conflicts — Exclusive read (ER): all processors can simultaneously read from distinct memory locations. Exclusive write (EW): all processors can simultaneously write to distinct memory locations. Concurrent read (CR): all processors can simultaneously read from any memory location. Concurrent write (CW): all processors can write to any memory location. Combinations: EREW, CREW, CRCW.
Complexity — parallel time complexity: the number of synchronous steps in the algorithm. Space complexity: the number of shared memory cells. Parallelism: the number of processors used.
MAXIMAL INDEPENDENT SET Lahiru Samarakoon Sumanaruban Rajadurai
Independent set (IS): any set of nodes no two of which are adjacent.
Maximal independent set (MIS): an independent set that is not a subset of any other independent set.
Maximal vs. maximum IS: a maximal independent set cannot be extended by any node, while a maximum independent set is one of the largest possible size.
A sequential greedy algorithm: suppose that S will hold the final MIS; initially S = ∅.
Phase 1: pick a node and add it to S.
Remove the node and its neighbors.
Remove the node and its neighbors.
Phase 2: pick another node and add it to S.
Remove the node and its neighbors.
Remove the node and its neighbors.
Phases 3, 4, 5, …: repeat until all nodes are removed.
Phases 3, 4, 5, …, x: repeat until all nodes are removed — no remaining nodes.
At the end, the set S will be an MIS of G.
Running time of the algorithm: up to Θ(n) phases in the worst case, on a graph where each phase removes only a constant number of nodes.
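The sequential greedy procedure above can be sketched as follows (a sketch assuming the graph is given as a dictionary mapping each node to its neighbor set; names are ours):

```python
# Sequential greedy MIS: repeatedly pick any remaining node, add it to S,
# and remove it together with its neighbors from the graph.

def greedy_mis(adj):
    remaining = set(adj)
    S = set()
    while remaining:
        v = next(iter(remaining))     # pick any remaining node
        S.add(v)
        remaining -= {v} | adj[v]     # remove v and its neighbors
    return S

# A 4-cycle 0-1-2-3-0: any MIS consists of two opposite nodes.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
S = greedy_mis(adj)
assert all(u not in adj[v] for u in S for v in S)    # independent
assert all(v in S or adj[v] & S for v in adj)        # maximal
```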
Intuition for parallelization 28 At each phase we may select any independent set (instead of a single node), remove S and neighbors of S from the graph.
Suppose that S will hold the final MIS; initially S = ∅. Example:
Phase 1: find any independent set and insert it into S.
Remove the set and its neighbors.
Phase 2: on the new graph, find any independent set and insert it into S.
Remove the set and its neighbors.
Remove the set and its neighbors.
Phase 3: on the new graph, find any independent set and insert it into S.
Remove the set and its neighbors.
Remove the set and its neighbors — no nodes are left.
Final MIS.
Observation: the number of phases depends on the choice of independent set in each phase — the larger the independent set at each phase, the faster the algorithm.
Randomized maximal independent set (MIS): let d(v) denote the degree of node v.
At each phase: each remaining node v elects itself with probability 1/(2 d(v)), where d(v) is v's degree in the current graph. Elected nodes are candidates for the independent set.
If two neighbors are elected simultaneously, then the higher-degree node wins.
If both have the same degree, ties are broken arbitrarily.
Using the previous rules, the problematic (conflicting) nodes are removed.
The remaining elected nodes form an independent set.
Luby's algorithm: mark lower-degree vertices with higher probability.
Using the previous rules, the problematic nodes are removed.
Luby's algorithm: if both endpoints of an edge are marked, unmark the one with the lower degree.
The remaining marked nodes form an independent set.
Luby's algorithm: add all marked vertices to the MIS, then remove the marked vertices together with their neighbors and the corresponding edges.
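The rounds described above can be simulated sequentially (a sketch of one interpretation of the slides' protocol: marking probability 1/(2 d(v)), conflicts resolved in favor of the higher-degree endpoint, ties broken by node id; representation and names are ours):

```python
import random

# One interpretation of Luby's algorithm: each round, every remaining vertex
# marks itself with probability 1/(2*d(v)) (isolated vertices always mark);
# for every edge with both endpoints marked, the lower-degree endpoint is
# unmarked; marked vertices join the MIS and are removed with their neighbors.

def luby_mis(adj, rng=random.Random(0)):
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    mis = set()
    while adj:
        deg = {v: len(ns) for v, ns in adj.items()}
        marked = {v for v in adj
                  if rng.random() < (1.0 if deg[v] == 0 else 0.5 / deg[v])}
        for v in list(marked):
            if v not in marked:
                continue
            for w in adj[v]:
                if w in marked and v in marked:
                    # unmark the lower-degree endpoint (ties broken by id)
                    marked.discard(v if (deg[v], v) < (deg[w], w) else w)
        mis |= marked
        dead = marked | {w for v in marked for w in adj[v]}
        adj = {v: ns - dead for v, ns in adj.items() if v not in dead}
    return mis
```

Each while-loop iteration is one phase; all vertices inside a phase act "in parallel" on the same snapshot of the graph, matching the PRAM intuition.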
ANALYSIS
Goodness property: a vertex v is good if at least ⅓ of its neighbors have lesser degree than it, and bad otherwise. An edge is bad if both its endpoints are bad, and good otherwise.
Lemma 1: let v ∈ V be a good vertex with degree d(v) > 0. Then the probability that some vertex w ∈ N(v) gets marked is at least 1 − exp(−1/6). Define L(v) as the set of neighbors of v whose degree is lesser than v's degree; by definition, |L(v)| ≥ d(v)/3 if v is a good vertex.
Lemma 2 During any iteration, if a vertex w is marked then it is selected to be in S with probability at least 1/2.
From Lemmas 1 and 2, the probability that a good vertex belongs to S ∪ N(S) is at least (1 − exp(−1/6))/2, so good vertices are eliminated with a constant probability. It follows that the expected number of edges eliminated during an iteration is a constant fraction of the current set of edges, which implies that the expected number of iterations of the parallel MIS algorithm is O(log n).
Lemma 3: in a graph G(V, E), the number of good edges is at least |E|/2. Proof sketch: direct each edge in E from its lower-degree endpoint to its higher-degree endpoint, breaking ties arbitrarily; then each bad vertex v has out-degree at least twice its in-degree. For all S, T ⊆ V, define the subset E(S, T) of the (oriented) edges as those directed from vertices in S to vertices in T.
Let V_G and V_B be the sets of good and bad vertices, respectively.
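The counting behind Lemma 3 can be reconstructed as follows (a hedged reconstruction of the standard argument, using the notation above):

```latex
% A bad vertex v has fewer than d(v)/3 lower-degree neighbors, so
d_{\text{in}}(v) \le \tfrac{1}{3}\,d(v), \qquad
d_{\text{out}}(v) = d(v) - d_{\text{in}}(v) \ge 2\,d_{\text{in}}(v).
% Summing over the bad vertices V_B:
|E(V_B, V_G)| + |E(V_B, V_B)| = \sum_{v \in V_B} d_{\text{out}}(v)
  \ge 2 \sum_{v \in V_B} d_{\text{in}}(v)
  = 2\,|E(V_G, V_B)| + 2\,|E(V_B, V_B)|,
% hence
|E(V_B, V_B)| \le |E(V_B, V_G)| - 2\,|E(V_G, V_B)|
            \le |E(V_B, V_G)| + |E(V_G, V_B)|.
```

Every bad edge lies in E(V_B, V_B), while every edge with at least one good endpoint is good; so the bad edges number at most the good ones, and the good edges are at least |E|/2.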
SORTING ON PRAM Jessica Makucka Puneet Dewan
Sorting — current problem: sort n numbers. The best sequential comparison-based sorting time is O(n log n). Can we do better with more processors? YES!
Notes about quicksort: we sort n numbers on a PRAM with n processors, assuming all numbers are distinct, using a CREW PRAM. Each of the n processors holds one input element. Notation: P_i denotes the i-th processor.
Quicksort algorithm: 0. If n = 1, stop. 1. Pick a splitter at random from the n elements. 2. Each processor determines whether its element is bigger or smaller than the splitter. 3. Let j denote the splitter's rank: if j ∉ [n/4, 3n/4], the split fails — go back to (1); if j ∈ [n/4, 3n/4], the split succeeds — move the splitter to P_j, and move every element smaller than the splitter to a distinct processor P_i with i < j. 4. Recursively sort the elements in processors P_1 through P_(j−1), and the elements in processors P_(j+1) through P_n.
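The splitter-based recursion can be simulated sequentially; the sketch below also tracks recursion depth, which corresponds to the number of splitting stages (processor bookkeeping is omitted, the function name is ours, and elements are assumed distinct as in the slides):

```python
import random

# Randomized quicksort with splitter retry: keep drawing a random splitter
# until its rank lies in [n/4, 3n/4], then recurse on the two sides.

def pquicksort(xs, rng=random.Random(0)):
    if len(xs) <= 1:
        return list(xs), 0
    n = len(xs)
    while True:
        s = rng.choice(xs)
        j = sum(x < s for x in xs)           # the splitter's rank
        if n // 4 <= j <= 3 * n // 4:        # successful split
            break
    left, dl = pquicksort([x for x in xs if x < s], rng)
    right, dr = pquicksort([x for x in xs if x > s], rng)
    return left + [s] + right, 1 + max(dl, dr)

out, depth = pquicksort(list(range(50, 0, -1)))
assert out == list(range(1, 51))
assert depth <= 20     # each success leaves sides of size at most 3n/4
```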
Quicksort time analysis — step vs. cost: 1. Pick a successful splitter at random from the n elements (assumed to succeed): O(log n) stages suffice for every sequence of recursive splits. 2. Each processor determines whether its element is bigger or smaller than the splitter: trivial — a single CREW PRAM step.
3. Let j denote the splitter's rank: if j ∉ [n/4, 3n/4], go back to (1); if j ∈ [n/4, 3n/4], move the splitter to P_j, and move every element smaller than the splitter to a distinct processor P_i with i < j. O(log n) PRAM steps are needed for a single splitting stage.
Comparison/splitting stage (3): each P_i is assigned a bit depending on whether its element is smaller or bigger than the splitter — 0 if the element is bigger, 1 otherwise.
Comparison/splitting stage (3): prefix sums over these bits, computed in O(log n) doubling steps, give each small element the index of its destination processor.
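The bit-and-prefix-sum splitting can be sketched as follows, with the prefix sum written as the O(log n)-round doubling scan a PRAM would run (a sketch; names and the concrete numbers are ours):

```python
# Splitting stage: each processor sets a bit (1 if its element is smaller
# than the splitter, 0 otherwise); an inclusive prefix sum over the bits
# then gives each "small" element its destination index.

def prefix_sums(bits):
    n = len(bits)
    sums = bits[:]
    step = 1
    while step < n:                       # O(log n) synchronous rounds
        new = sums[:]
        for i in range(step, n):          # all processors act in parallel
            new[i] = sums[i] + sums[i - step]
        sums, step = new, step * 2
    return sums                           # inclusive prefix sums

elements = [7, 2, 9, 4, 1, 8]
splitter = 5
bits = [1 if x < splitter else 0 for x in elements]
dest = prefix_sums(bits)                  # rank among the small elements
small = [None] * dest[-1]
for x, b, d in zip(elements, bits, dest):
    if b:
        small[d - 1] = x                  # element moves to processor P_d
print(small)    # [2, 4, 1] — the elements below the splitter, in input order
```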
Overall time analysis: the algorithm terminates in O(log n) levels of recursion, each costing O(log n) for its splitting stage, for O(log² n) steps overall — i.e., the recurrence T(n) = T(3n/4) + O(log n) solves to O(log² n).
Cons: the algorithm assumes that the split is always successful, breaking the problem from size n to a constant fraction of n, and there is no suitable method to guarantee a successful split.
Improvement idea: reduce the problem to subproblems of size n^(1−e), where e < 1, while keeping the time to split the same.
Benefits if e = 1/2: the total time over all levels is log n + log n^(1/2) + log n^(1/4) + … = log n · (1 + 1/2 + 1/4 + …) ≤ 2 log n, so we could hope for an overall running time of O(log n).
The long story: suppose we have n processors and n elements, with processors P_1 through P_r holding r of the elements in sorted order and processors P_(r+1) through P_n holding the remaining n − r elements. 1. Choose random splitters and sort them: call the sorted elements in the first r processors the splitters; for 1 ≤ j ≤ r, let s_j denote the j-th splitter in sorted order. 2. Insert: insert the n − r unsorted elements among the splitters. 3. Sort the remaining elements among the splitters: (a) each processor should end up with a distinct input element; (b) let i(s_j) denote the index of the processor containing s_j following the insertion operation — then for all k < i(s_j), processor P_k contains an element smaller than s_j, and for all k > i(s_j), processor P_k contains an element larger than s_j.
Example: choose random splitters.
Sort the random splitters (sorted list | unsorted list).
Insert the unsorted elements among the splitters.
Check whether the number of elements between adjacent splitters is at most log n. Suppose S represents a bucket's size: here one bucket has S = 4, which exceeds log n ≈ 3, while others have S = 1.
Recur on the subproblems whose size exceeds log n: again choose random splitters and follow the same process.
Partitioning as a tree: the tree is formed from the first partition. The size on the right exceeds log n, so we again split by choosing random splitters.
Each part is sorted by virtue of the partitioning.
Facts to be used: 1. A CREW PRAM having n² processors, where each of the processors P_1 through P_n holds an input element, can sort the n elements in O(log n) steps. 2. With n processors and n elements, of which n^(1/2) are splitters, the insertion process can be completed in O(log n) steps.
BoxSort algorithm — Input: a set of n numbers S. Output: the elements of S sorted in increasing order. 1. Select n^(1/2) elements (e = 1/2) at random from the n input elements; using all n processors, sort them in O(log n) steps (Fact 1). 2. Using the sorted elements from stage 1 as splitters, insert the remaining elements among them in O(log n) steps (Fact 2). 3. Treating the elements inserted between adjacent splitters as subproblems, recur on each subproblem whose size exceeds log n; for subproblems of size log n or less, invoke LogSort.
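BoxSort's recursive structure can be sketched sequentially (a sketch only: parallel-step accounting is omitted, `sorted` stands in for LogSort on small boxes, and the names and thresholds are ours):

```python
import math
import random

# BoxSort sketch: pick ~sqrt(n) random splitters, sort them, bucket the
# remaining elements between adjacent splitters, recurse on buckets larger
# than log n, and sort small buckets directly (the LogSort stand-in).

def box_sort(xs, rng=random.Random(0)):
    n = len(xs)
    if n <= 2:
        return sorted(xs)
    k = max(1, math.isqrt(n))
    splitters = sorted(rng.sample(xs, k))
    rest = list(xs)
    for s in splitters:
        rest.remove(s)
    buckets = [[] for _ in range(k + 1)]
    for x in rest:
        buckets[sum(s < x for s in splitters)].append(x)
    out = []
    for i, b in enumerate(buckets):
        out.extend(box_sort(b, rng) if len(b) > math.log2(n) else sorted(b))
        if i < k:
            out.append(splitters[i])
    return out

data = random.Random(1).sample(range(1000), 200)
assert box_sort(data) == sorted(data)
```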
Sort Fact A CREW PRAM with m processors can sort m elements in O(m) steps.
Example: each processor is assigned an element and compares it against the remaining elements simultaneously, computing its element's rank in O(m) steps; placing the elements by their assigned ranks leaves them sorted.
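The sort fact as code (a sequential sketch: each loop body stands for one processor's work; on a CREW PRAM the m comparisons per processor take O(m) steps with all processors working simultaneously):

```python
# Rank-based sorting: each (simulated) processor computes its element's rank
# by comparing against all m elements, then the element is placed at its rank.
# Distinct elements are assumed, as in the slides.

def rank_sort(xs):
    m = len(xs)
    out = [None] * m
    for i, x in enumerate(xs):            # one processor per element
        rank = sum(y < x for y in xs)     # m comparisons -> O(m) steps
        out[rank] = x
    return out

print(rank_sort([5, 1, 4, 2, 3]))   # [1, 2, 3, 4, 5]
```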
Things to remember: the last statement of the BoxSort algorithm, and the idea on the previous slide.
LogSort: with log n processors and log n elements, we can sort in O(log n) steps (the sort fact with m = log n).
Analysis: consider each node of the recursion tree as a box. Choosing random splitters and sorting them takes O(log n) time; inserting the unsorted elements among the splitters takes O(log n). With high probability (assumed here), the subproblems resulting from a splitting operation are very small, so each leaf is a box of size at most log n. For the leaves we use LogSort, which sorts in O(log n). Total time: O(log n).
DISTRIBUTED RANDOMIZED ALGORITHM Yogesh S Rawat R. Ramanathan
CHOICE COORDINATION PROBLEM (CCP)
Biological Inspiration mite (genus Myrmoyssus)
Biological Inspiration mite (genus Myrmoyssus) reside as parasites on the ear membrane of the moths of family Phaenidae
Biological Inspiration mite (genus Myrmoyssus) reside as parasites on the ear membrane of the moths of family Phaenidae Moths are prey to bats and the only defense they have is that they can hear the sonar used by an approaching bat
Biological Inspiration if both ears of the moth are infected by the mites, then their ability to detect the sonar is considerably diminished, thereby severely decreasing the survival chances of both the moth and its colony of mites.
Biological Inspiration The mites are therefore faced with a "choice coordination problem" How does any collection of mites infecting a particular ear ensure that every other mite chooses the same ear?
Problem Specification Set of N processors
Problem Specification Set of N processors M options to choose from
Problem Specification Set of N processors processors have to reach a consensus on unique choice M options to choose from
Model for communication: a collection of M read–write registers accessible to all the processors, with a locking mechanism to resolve conflicts. Each processor follows a protocol for making a choice; a special symbol (√) is used to mark the choice. At the end, exactly one register contains the special symbol.
Deterministic solution: complexity is measured in terms of the number of read and write operations. Any deterministic solution requires Ω(n^(1/3)) operations, where n is the number of processors. For more details: M. O. Rabin, "The choice coordination problem," Acta Informatica, vol. 17, no. 2, pp. 121–134, Jun.
Randomized solution: for any c > 0, the problem can be solved using c operations with success probability at least 1 − 2^(−Ω(c)). For simplicity we consider only the case n = m = 2, although the protocol can be easily generalized.
Analogy from real life: two people walking toward each other in a corridor each take a random action — give way or move ahead.
Person 1 and person 2 each independently choose: give way or move ahead.
They pass each other once the random choices differ — breaking the symmetry.
Synchronous CCP: the two processors are synchronous, operating in lock-step according to some global clock. Terminology: P_i — processor i, where i ∈ {0,1}; C_i — shared register for the choices; B_i — a local variable of processor P_i.
The processor P_i initially scans the register C_i. Thereafter, the processors exchange registers after every iteration, so at no time will the two processors scan the same register.
Algorithm — Input: registers C_0 and C_1, initialized to 0. Output: exactly one of the two registers has the value √. Step 0: P_i initially scans the register C_i. Step 1: read the current register, obtaining a bit R_i. Step 2: select one of three cases — 2.1 [R_i = √]: halt; 2.2 [R_i = 0, B_i = 1]: write √ into the current register and halt; 2.3 [otherwise]: assign an unbiased random bit to B_i and write B_i into the current register. Step 3: P_i exchanges its current register with P_(1−i) and returns to step 1.
The same algorithm, annotated: step 1 is the read operation; case 2.1 means the choice has already been made by the other processor; case 2.2 is the only condition for making a choice; case 2.3 generates a random value and performs the write operation; step 3 exchanges the registers.
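The synchronous protocol above can be simulated in a few lines (a sketch; the string `"sqrt"` stands in for the √ symbol, and names are ours):

```python
import random

# Lock-step simulation of the synchronous CCP protocol: in each iteration
# both processors read, apply cases 2.1-2.3, then exchange registers.

CHECK = "sqrt"                      # stands for the special symbol (sqrt sign)

def sync_ccp(rng=random.Random(0)):
    C = [0, 0]                      # shared registers C0, C1
    B = [0, 0]                      # local bits
    cur = [0, 1]                    # step 0: P_i starts at C_i
    halted = [False, False]
    while not all(halted):
        R = [C[cur[0]], C[cur[1]]]              # synchronous reads
        for i in (0, 1):
            if halted[i]:
                continue
            if R[i] == CHECK:                   # case 2.1
                halted[i] = True
            elif R[i] == 0 and B[i] == 1:       # case 2.2
                C[cur[i]] = CHECK
                halted[i] = True
            else:                               # case 2.3
                B[i] = rng.randint(0, 1)
                C[cur[i]] = B[i]
        cur = [cur[1], cur[0]]                  # step 3: exchange registers
    return C

assert sync_ccp().count(CHECK) == 1   # exactly one register gets marked
```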
Correctness of the algorithm: we need to prove that only one of the shared registers has √ marked in it. Suppose both are marked with √: this must have happened in the same iteration, since otherwise case 2.1 would have halted the algorithm first.
Assume the error takes place during the t-th iteration, and let B_i(t) and R_i(t) be processor P_i's values after step 1. By case 2.3, R_0(t) = B_1(t) and R_1(t) = B_0(t). Suppose P_i writes √ in the t-th iteration; then R_i = 0 and B_i = 1, hence R_(1−i) = 1 and B_(1−i) = 0, so P_(1−i) cannot write √ in the same iteration — the symmetry is broken.
Example trace (processor 0 | shared registers | processor 1): both processors start with B_0 = B_1 = 0 and C_0 = C_1 = 0. In each iteration both fall into case 2.3: each writes a fresh random bit into its current register, and the registers are exchanged. As soon as the bits differ in the right way — some P_i reads R_i = 0 while holding B_i = 1 — that P_i writes √ and halts (case 2.2); in the following iteration the other processor reads √ and halts (case 2.1).
Complexity: the probability that the random bits B_0 and B_1 are the same is 1/2, so the probability that the number of iterations exceeds t is 1/2^t — the algorithm terminates within the next two steps as soon as B_0 and B_1 differ. Since the computation cost of each iteration is bounded, the protocol does O(t) work with probability 1 − 1/2^t.
The problem: so far the two processors ran in lock-step. In an asynchronous system the processors are not synchronized — one may take many steps while the other takes none, so reads and writes can interleave arbitrarily. What can we do? Idea: timestamps.
Each processor P_i keeps a timestamp T_i, and each register C_i carries a timestamp t_i alongside its contents.
Algorithm — Input: registers C_1 and C_2, each initialized to ⟨0, 0⟩ (timestamp, value). Output: exactly one of the two registers has the value √.
Algorithm for a processor P_i:
0) P_i initially scans a randomly chosen register; T_i and B_i are initialized to 0.
1) P_i gets a lock on its current register and reads ⟨t_i, R_i⟩.
2) P_i executes one of these cases:
2.1) If R_i = √: HALT.
2.2) If T_i < t_i: T_i ← t_i and B_i ← R_i.
2.3) If T_i > t_i: write √ into the current register and HALT.
2.4) If T_i = t_i, R_i = 0, B_i = 1: write √ into the current register and HALT.
2.5) Otherwise: T_i ← T_i + 1 and t_i ← t_i + 1; B_i ← a random (unbiased) bit; write ⟨t_i, B_i⟩ into the current register.
3) P_i releases the lock on its current register, moves to the other register, and returns to step 1.
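The timestamped protocol above can be simulated under a random schedule, with each locked iteration executed atomically (a sketch; `"sqrt"` stands in for √, and the scheduling model and names are ours):

```python
import random

# Asynchronous CCP simulation: at each turn a random non-halted processor
# performs one full locked iteration (read, one of cases 2.1-2.5, move over).

CHECK = "sqrt"

def async_ccp(rng=random.Random(0)):
    t = [0, 0]; val = [0, 0]        # register timestamps and contents
    T = [0, 0]; B = [0, 0]          # processor timestamps and local bits
    cur = [rng.randint(0, 1), rng.randint(0, 1)]   # step 0: random start
    halted = [False, False]
    while not all(halted):
        i = rng.randint(0, 1)       # adversarial-ish random schedule
        if halted[i]:
            continue
        c = cur[i]
        R = val[c]
        if R == CHECK:                              # case 2.1
            halted[i] = True
        elif T[i] < t[c]:                           # case 2.2
            T[i], B[i] = t[c], R
        elif T[i] > t[c]:                           # case 2.3
            val[c] = CHECK; halted[i] = True
        elif R == 0 and B[i] == 1:                  # case 2.4 (T_i = t_c)
            val[c] = CHECK; halted[i] = True
        else:                                       # case 2.5
            T[i] += 1; t[c] = T[i]
            B[i] = rng.randint(0, 1)
            val[c] = B[i]
        cur[i] = 1 - c                              # step 3: move over
    return val

assert async_ccp().count(CHECK) == 1   # exactly one register gets marked
```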
Example trace (all timestamps and bits initially 0):
1) P_1 chooses C_1 and reads ⟨0, 0⟩. [None of cases 2.1–2.4 are met; case 2.5 is satisfied.]
2.5) T_1 ← 1 and t_1 ← 1; P_1 writes ⟨1, B_1⟩ into C_1 (B_1 random).
3) P_1 releases the lock on C_1, moves to C_2, and returns to step 1.
1) P_2 chooses C_2 and reads ⟨0, 0⟩. [Case 2.5 again.]
2.5) T_2 ← 1 and t_2 ← 1; P_2 writes ⟨1, B_2⟩ into C_2.
3) P_2 releases the lock on C_2 and moves to C_1.
1) P_2 locks C_1 and reads ⟨1, B_1⟩. [None of cases 2.1–2.4 are met; case 2.5 is satisfied.]
2.5) T_2 ← 2 and t_1 ← 2; P_2 writes ⟨2, B_2⟩ into C_1.
3) P_2 releases the lock on C_1 and moves to C_2.
1) P_2 locks C_2 and reads ⟨1, B_2⟩. [Case 2.3: T_2 = 2 > t_2 = 1 is satisfied.]
2.3) P_2 writes √ into C_2 and HALTS.
We'll show another case of the algorithm: let's go back one iteration.
Rewinding to the state with t_1 = 2, t_2 = 1, T_1 = 1, T_2 = 2:
1) P_1 locks C_2 and reads ⟨1, B_2⟩. [None of cases 2.1–2.4 are met; case 2.5 is satisfied.]
2.5) T_1 ← 2 and t_2 ← 2; B_1 ← a random (unbiased) bit — say 1; P_1 writes ⟨2, 1⟩ into C_2.
3) P_1 releases the lock on C_2 and moves to C_1.
1) P_2 locks C_2 and reads ⟨2, 1⟩. [None of cases 2.1–2.4 are met; case 2.5 is satisfied.]
2.5) T_2 ← 3 and t_2 ← 3; P_2 writes ⟨3, B_2⟩ into C_2.
3) P_2 releases the lock on C_2 and moves to C_1.
1) P_1 locks C_1 and reads ⟨2, 0⟩. [Case 2.4: T_1 = t_1 = 2, R_1 = 0, B_1 = 1 is satisfied.]
2.4) P_1 writes √ into C_1 and HALTS.
1) P_2 locks C_1 and reads √. [Case 2.1: R_1 = √ is satisfied.]
2.1) P_2 HALTS.
Correctness
Claim: when a processor writes √ into one register, the other processor must NOT write √ into the other register.
The only cases that write √ are: case 2.3) T_i > t_i — write √ into the current register and halt; case 2.4) T_i = t_i, R_i = 0, B_i = 1 — write √ into the current register and halt.
Notation: T_i* is the current timestamp of processor P_i, and t_i* is the current timestamp of register C_i. Two observations: whenever P_i finishes an iteration in C_i, T_i = t_i; and when a processor enters a register, it has just left the other register.
Case 2.3) T_i > t_i: write √ into the current register and HALT. Consider P_1, which has just entered C_1 with t_1* < T_1*. In the previous iteration, P_1 must have left C_2 with the same T_1*, so T_1* ≤ t_2*. Moreover, P_2 can go to C_2 only after visiting C_1, so T_2* ≤ t_1*. Summing up: T_2* ≤ t_1* < T_1* ≤ t_2*. Hence T_2* < t_2*, so P_2 cannot write √ into C_2.
Case 2.4) T_i = t_i, R_i = 0, B_i = 1: write √ into the register and HALT. Similarly, consider P_1, which has just entered C_1 with t_1* = T_1*, R_1 = 0, B_1 = 1. As before, T_1* ≤ t_2* and T_2* ≤ t_1*; summing up, T_2* ≤ t_1* = T_1* ≤ t_2*. If T_2* < t_2*, P_2 cannot write √; and if T_2* = t_2*, then R_2 = 1 and B_2 = 0, so case 2.4 does not apply — P_2 cannot write √.
Complexity: the cost is proportional to the largest timestamp reached. A timestamp can go up only in case 2.5, and a processor's current B_i value is set during a visit to the other register — so the complexity analysis of the synchronous case applies.
REAL WORLD APPLICATIONS Pham Nam Khanh
Applications of parallel sorting: sorting is a fundamental algorithm in data processing — parallel database operations (rank, join, etc.) and search (rapid index/lookup after sorting). A notable sorting record: a multi-terabyte benchmark sorted in 4,328 seconds using 2,100 nodes at Yahoo.
Applications of MIS: wireless and communication networks, scheduling problems, perfect matching (and hence the assignment problem), and finance.
Applications of maximal independent sets: the market graph (e.g., over EAFE and emerging-markets instruments); the low-latency requirement motivates the parallel MIS algorithm.
Applications of maximal independent sets: the market graph spans stocks, commodities, and bonds.
An MIS of the market graph forms a completely diversified portfolio in which all instruments are negatively correlated with each other, lowering the risk.
Applications of the choice coordination algorithm: given n processes, each able to choose among m options, the processes need to agree on a unique choice — the problem belongs to the class of distributed consensus algorithms. Applications: hardware and software tasks involving concurrency, clock synchronization in wireless sensor networks, and multi-vehicle cooperative control.
Multi-vehicle cooperative control coordinates the movement of multiple vehicles to accomplish an objective: task assignment, cooperative transport, cooperative role assignment, air traffic control, and cooperative timing.
CONCLUSION
Conclusion — PRAM model: CREW. Parallel algorithm: maximal independent set in O(log n), with applications. Parallel sorting algorithms: quicksort in O(log² n) and BoxSort in O(log n). Choice coordination problem: distributed algorithms for synchronous and asynchronous systems, plus applications.