1
Parallel and Distributed Algorithms
2
Overview: Parallel vs. Distributed Algorithms; PRAM; Maximal Independent Set; Sorting on a PRAM; the Choice Coordination Problem; Real-World Applications
3
INTRODUCTION
4
The need for distributed processing: a single massively parallel processing machine does not scale indefinitely, CPUs with 1000s of processors are hard to build, and Moore's law is coming to an end.
5
Parallel Algorithm A parallel algorithm is an algorithm which can be executed a piece at a time on many different processing devices, and then combined together again at the end to get the correct result.* * Blelloch, Guy E.; Maggs, Bruce M. Parallel Algorithms. USA: School of Computer Science, Carnegie Mellon University.
6
Distributed Algorithm A distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors.* *Lynch, Nancy (1996). Distributed Algorithms. San Francisco, CA: Morgan Kaufmann Publishers. ISBN 978-1-55860-348-6.
7
PRAM
8
Random Access Machine An abstract machine with an unbounded number of local memory cells and a simple instruction set. Time complexity: the number of instructions executed. Space complexity: the number of memory cells used. All operations take unit time.
9
PRAM (Parallel Random Access Machine) The PRAM is a parallel version of the RAM for designing algorithms for parallel computers. Why PRAM? The number of operations executed per cycle on P processors is at most P. Any processor can read/write any shared memory cell in unit time. It abstracts away communication overhead, which makes analyzing the complexity of PRAM algorithms easier. It serves as a benchmark.
10
Example: each processor P_i reads A[i-1], computes A[i] = A[i-1] + 1, and writes A[i] back to the shared array A, so A[1] = A[0]+1, A[2] = A[1]+1, ..., A[n] = A[n-1]+1.
11
Shared Memory Access Conflicts Exclusive Read (ER): all processors can simultaneously read from distinct memory locations. Exclusive Write (EW): all processors can simultaneously write to distinct memory locations. Concurrent Read (CR): all processors can simultaneously read from any memory location. Concurrent Write (CW): all processors can write to any memory location. Combinations: EREW, CREW, CRCW.
12
Complexity Parallel time complexity: the number of synchronous steps in the algorithm. Space complexity: the number of shared memory cells. Parallelism: the number of processors used.
13
MAXIMAL INDEPENDENT SET Lahiru Samarakoon Sumanaruban Rajadurai
14
Independent Set (IS): a set of nodes no two of which are adjacent
15
Maximal Independent Set (MIS): an independent set that is not a subset of any other independent set
16
Maximal vs. Maximum IS: a maximum independent set is one of largest possible size; a maximal independent set merely cannot be extended by any further node.
17
A Sequential Greedy Algorithm Suppose that the set S will hold the final MIS. Initially S is empty.
18
Phase 1: Pick a node and add it to S.
19
Remove the picked node and its neighbors.
21
Phase 2: Pick another node and add it to S.
22
Remove the picked node and its neighbors.
24
Phases 3, 4, 5, …, x: Repeat until no nodes remain.
26
At the end, the set S will be an MIS of the graph.
27
Running time of the algorithm: in the worst-case graph, the number of phases is linear in the number of nodes.
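The sequential greedy procedure above can be sketched in Python (the function name and adjacency representation are ours):

```python
def greedy_mis(adj):
    """Sequential greedy MIS: repeatedly pick a remaining node,
    add it to the independent set, and delete it together with
    its neighbors. `adj` maps each node to its neighbor set."""
    remaining = set(adj)
    independent = set()
    while remaining:
        v = next(iter(remaining))      # pick any surviving node
        independent.add(v)
        remaining -= {v} | adj[v]      # remove v and its neighbors
    return independent

# Path graph 0-1-2-3-4
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
mis = greedy_mis(adj)
# Independence: no two chosen nodes are adjacent.
assert all(u not in adj[v] for u in mis for v in mis)
# Maximality: every node is chosen or has a chosen neighbor.
assert all(v in mis or adj[v] & mis for v in adj)
```

On an edgeless graph every phase removes a single node, which is the linear worst case mentioned above.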
28
Intuition for parallelization: at each phase we may select any independent set S (instead of a single node), then remove S and the neighbors of S from the graph.
29
Example: Suppose that the set S will hold the final MIS. Initially S is empty.
30
Phase 1: Find any independent set and insert it into S.
31
Remove the set and its neighbors.
32
Phase 2: On the new graph, find any independent set and insert it into S.
33
Remove the set and its neighbors.
35
Phase 3: On the new graph, find any independent set and insert it into S.
36
Remove the set and its neighbors. No nodes are left.
38
Final MIS
39
Observation: the number of phases depends on the choice of independent set in each phase; the larger the independent set at each phase, the faster the algorithm.
40
Randomized Maximal Independent Set (MIS): let d(v) be the degree of node v.
41
At each phase: each node v elects itself with probability 1/(2 d(v)), where d(v) is the degree of v in the current graph. Elected nodes are candidates for the independent set.
42
If two neighbors are elected simultaneously, the higher-degree node wins.
43
If both have the same degree, ties are broken arbitrarily.
44
Problematic nodes: using the previous rules, conflicting elected nodes are removed.
45
The remaining elected nodes form an independent set.
46
Luby's algorithm: mark lower-degree vertices with higher probability.
47
Problematic nodes: using the previous rules, conflicting marked nodes are removed.
48
Luby's algorithm: if both endpoints of an edge are marked, unmark the one with the lower degree.
49
The remaining marked nodes form an independent set.
50
Luby's algorithm: add all marked vertices to the MIS, then remove the marked vertices, their neighbors, and the corresponding edges.
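The rounds of Luby's algorithm can be simulated sequentially; the sketch below (function names ours, ties broken by vertex id) marks each surviving vertex with probability 1/(2 d(v)) and applies the unmark rule above:

```python
import random

def luby_mis(adj, rng=random.Random(0)):
    """Sequential simulation of one possible variant of Luby's
    parallel MIS. Each round: mark v with probability 1/(2 d(v))
    (isolated vertices always mark); unmark the lower-degree
    endpoint of any marked edge (ties by id); add the surviving
    marked vertices to the MIS and delete them with neighbors."""
    live = {v: set(adj[v]) for v in adj}
    mis = set()
    while live:
        deg = {v: len(nb) for v, nb in live.items()}
        marked = {v for v in live
                  if rng.random() < (1.0 if deg[v] == 0
                                     else 1 / (2 * deg[v]))}
        for v in sorted(marked):           # resolve marked edges
            for u in live[v]:
                if u in marked and (deg[v], v) < (deg[u], u):
                    marked.discard(v)      # v is the lower-degree end
                    break
        mis |= marked
        dead = marked | {u for v in marked for u in live[v]}
        live = {v: nb - dead for v, nb in live.items() if v not in dead}
    return mis
```

On an 8-node path, for instance, the result is a valid MIS regardless of the random choices, since each round's surviving marked set is independent by construction.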
51
ANALYSIS
52
Goodness property: a vertex v is good if at least ⅓ of its neighbors have lower degree than v, and bad otherwise. An edge is bad if both its endpoints are bad, and good otherwise.
53
Lemma 1: Let v ∈ V be a good vertex with degree d(v) > 0. Then the probability that some vertex w ∈ N(v) gets marked is at least 1 − exp(−1/6). Define L(v) as the set of neighbors of v whose degree is lower than v's. By definition, |L(v)| ≥ d(v)/3 if v is a good vertex.
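The bound follows from the marking probability 1/(2 d(w)) and the two facts just stated (each w ∈ L(v) has d(w) ≤ d(v), and |L(v)| ≥ d(v)/3); a sketch of the missing step:

```latex
\Pr[\text{no } w \in L(v) \text{ gets marked}]
  = \prod_{w \in L(v)} \Bigl(1 - \frac{1}{2d(w)}\Bigr)
  \le \Bigl(1 - \frac{1}{2d(v)}\Bigr)^{d(v)/3}
  \le e^{-1/6}.
```

The complement event, that some neighbor in L(v) ⊆ N(v) is marked, therefore has probability at least 1 − exp(−1/6).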
55
Lemma 2 During any iteration, if a vertex w is marked then it is selected to be in S with probability at least 1/2.
58
From Lemmas 1 and 2, the probability that a good vertex belongs to S ∪ N(S) is at least (1 − exp(−1/6))/2, so good vertices are eliminated with constant probability. It follows that the expected number of edges eliminated during an iteration is a constant fraction of the current set of edges, which implies that the expected number of iterations of the parallel MIS algorithm is O(log n).
59
Lemma 3: In a graph G(V, E), the number of good edges is at least |E|/2. Proof sketch: direct each edge in E from its lower-degree endpoint to its higher-degree endpoint, breaking ties arbitrarily; then each bad vertex v has fewer than d(v)/3 incoming edges, hence at least twice as many outgoing edges as incoming ones. For all S, T ⊆ V, define E(S, T) as the subset of (oriented) edges directed from vertices in S to vertices in T.
60
Let V_G and V_B be the sets of good and bad vertices.
61
SORTING ON PRAM Jessica Makucka Puneet Dewan
62
Sorting Current problem: sort n numbers. The best sequential comparison sort takes O(n log n) time. Can we do better with more processors? YES!
63
Notes about Quicksort Sort n numbers on a PRAM with n processors. Assume all numbers are distinct. Use a CREW PRAM for this case. Each of the n processors holds one input element. Notation: let P_i denote the i-th processor.
64
Quicksort Algorithm 0. If n = 1, stop. 1. Pick a splitter at random from the n elements. 2. Each processor determines whether its element is bigger or smaller than the splitter. 3. Let j denote the splitter's rank: if j ∉ [n/4, 3n/4], the split fails; go back to (1). If j ∈ [n/4, 3n/4], the split succeeds: move the splitter to P_j and move every element smaller than the splitter to a distinct processor P_i with i < j. 4. Recursively sort the elements in processors P_1 through P_{j-1} and the elements in processors P_{j+1} through P_n.
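A sequential sketch of this splitter rule (names ours; the PRAM's data movement is elided, only the accept/reject logic and recursion are modelled):

```python
import random

def pram_quicksort(a, rng=random.Random(1)):
    """Sketch of the PRAM quicksort: a random splitter is accepted
    only if its rank lies in [n/4, 3n/4], so each accepted split
    leaves subproblems of at most 3n/4 elements. Assumes distinct
    elements."""
    if len(a) <= 1:
        return list(a)
    while True:
        s = rng.choice(a)
        rank = sum(x < s for x in a)   # each processor compares once
        if len(a) // 4 <= rank <= 3 * len(a) // 4:
            break                      # successful splitter
    smaller = [x for x in a if x < s]
    larger = [x for x in a if x > s]
    return pram_quicksort(smaller, rng) + [s] + pram_quicksort(larger, rng)
```

Because every accepted split is balanced to within a constant factor, the recursion depth is O(log n), matching the stage count below.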
65
Quicksort Time Analysis Algorithm: 1. Pick a successful splitter at random from the n elements (assumed here). 2. Each processor determines whether its element is bigger or smaller than the splitter. Time analysis of each stage: 1. O(log n) stages for every sequence of recursive splits. 2. Trivial: can be implemented in a single CREW PRAM step.
66
Quicksort Time Analysis 3. Let j denote the splitter's rank: if j ∉ [n/4, 3n/4], go back to (1); if j ∈ [n/4, 3n/4], move the splitter to P_j and move every element smaller than the splitter to a distinct processor P_i with i < j. O(log n) PRAM steps are needed for a single splitting stage.
67
Comparison Splitting Stage (3): each processor P_i is assigned a bit depending on whether its element is smaller or bigger than the splitter: 0 if the element is bigger, 1 otherwise.
68
Comparison Splitting Stage (3): the 1-bits are then summed in O(log n) steps (Step 1, Step 2, …) by a parallel prefix computation, giving each smaller element its destination processor index.
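The bit-summing step is a standard parallel prefix (scan); a sketch in which each while-iteration corresponds to one synchronous PRAM step (function name ours):

```python
def prefix_sums(bits):
    """Logarithmic-depth parallel prefix: at step k, every position i
    adds the value held 2**k positions to its left; after
    ceil(log2 n) steps, s[i] holds bits[0] + ... + bits[i]."""
    s = list(bits)
    k = 1
    while k < len(s):
        # one synchronous step: all positions update "in parallel"
        s = [s[i] + (s[i - k] if i >= k else 0) for i in range(len(s))]
        k *= 2
    return s

bits = [0, 1, 1, 0, 1, 1, 1]   # 1 = element smaller than the splitter
dest = prefix_sums(bits)       # dest[i] = rank of P_i's element so far
```

Here `dest` comes out as [0, 1, 2, 2, 3, 4, 5], so the processors holding 1-bits learn distinct target positions.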
69
Overall Time Analysis: the algorithm terminates after O(log n) stages, each costing O(log n) steps for its splitting stage, for O(log² n) steps overall; this follows from solving the recurrence for the recursion depth.
70
Cons: the algorithm assumes every split succeeds, breaking a problem of size n into constant fractions of n; there is no guaranteed method for a successful split.
71
Improvement Idea: reduce the problem to subproblems of size n^(1−ε), for some 0 < ε < 1, while keeping the time to split the same.
72
Benefits if ε = 1/2: the total time over the whole recursion is log n + log n^(1/2) + log n^(1/4) + …, which sums to O(log n). We could then hope for an overall running time of O(log n).
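The sum above is a geometric series in the exponent, which is why the total stays O(log n):

```latex
T(n) = T\bigl(n^{1/2}\bigr) + O(\log n)
\;\Rightarrow\;
\sum_{i \ge 0} \log\bigl(n^{(1/2)^i}\bigr)
  = \log n \sum_{i \ge 0} 2^{-i}
  = 2\log n = O(\log n).
```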
73
The Long Story Suppose we have n processors and n elements, processors P_1 through P_r contain r of the elements in sorted order, and processors P_{r+1} through P_n contain the remaining n − r elements. 1. Choose random splitters and sort them: call the sorted elements in the first r processors the splitters, and for 1 ≤ j ≤ r let s_j denote the j-th smallest splitter. 2. Insert: insert the n − r unsorted elements among the splitters. 3. Sort the remaining elements among the splitters: a. each processor should end up with a distinct input element; b. let i(s_j) denote the index of the processor containing s_j after the insertion. Then for all k < i(s_j), processor P_k contains an element smaller than s_j, and for all k > i(s_j), processor P_k contains an element larger than s_j.
74
Example: choose random splitters from the elements 5 9 8 10 7 6 12 11.
75
Example (contd.): sort the random splitters. Sorted list (splitters): 6 11. Unsorted list: 5 9 8 7 10 12.
76
Example (contd.): insert the unsorted elements among the splitters: 5 6 7 9 8 10 11 12.
77
Example (contd.): check whether the number of elements between consecutive splitters is at most log n (here log n ≈ 3). With S denoting a subproblem's size: S = 1 for {5}, S = 4 for {7, 9, 8, 10} (exceeds log n), and S = 1 for {12}.
78
Example (contd.): recur on the subproblem whose size exceeds log n; again choose random splitters and follow the same process.
79
Partitioning as a tree: the first partition forms a tree. The subproblem on the right (7 9 8 10) exceeds log n, so we split again by choosing random splitters, e.g., 9 and 8.
80
Contd.: after this partition the elements are in sorted order.
81
Lemma’s to be Used 1.A CREW PRAM having (n 2 ) processors. Suppose that each of the processors P 1 through P n has an input element to be sorted. Then the PRAM can sort these n elements in O(log n). 2. For n processors, and n elements of which n 1/2 are splitters, then the insertion process can be completed in O(log n) steps.
82
BoxSort Algorithm. Input: a set of numbers S. Output: the elements of S sorted in increasing order. 1. Select n^(1/2) elements (ε = 1/2) at random from the n input elements; using all n processors, sort them in O(log n) steps (Lemma 1). 2. Using the sorted elements from stage 1 as splitters, insert the remaining elements among them in O(log n) steps (Lemma 2). 3. Treating the elements inserted between adjacent splitters as subproblems, recur on each subproblem whose size exceeds log n; for subproblems of size log n or less, invoke LogSort.
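A sequential sketch of BoxSort (names and base-case cutoffs are ours; the PRAM sorts of Lemmas 1 and 2 are stood in for by ordinary Python sorting, and distinct elements are assumed):

```python
import bisect
import math
import random

def box_sort(a, rng=random.Random(2)):
    """Sketch of BoxSort: pick about sqrt(n) random splitters, sort
    them, drop the remaining elements into the boxes between adjacent
    splitters, recur on boxes larger than log n, and hand small boxes
    to a LogSort stand-in (Python's sorted)."""
    n = len(a)
    if n <= 4:
        return sorted(a)
    k = max(2, math.isqrt(n))
    splitters = sorted(rng.sample(a, k))
    chosen = set(splitters)
    boxes = [[] for _ in range(k + 1)]
    for x in a:
        if x not in chosen:
            boxes[bisect.bisect_left(splitters, x)].append(x)
    out = []
    for i in range(k + 1):
        box = boxes[i]
        # small box -> LogSort stand-in; large box -> recur
        out.extend(sorted(box) if len(box) <= math.log2(n)
                   else box_sort(box, rng))
        if i < k:
            out.append(splitters[i])
    return out
```

The recursion mirrors stage 3: with high probability every box shrinks fast, which is where the O(log n) total depth in the analysis comes from.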
83
Sort Fact A CREW PRAM with m processors can sort m elements in O(m) steps.
84
Example: each processor is assigned an element and compares it with the remaining elements simultaneously, in O(m) steps. The rank so computed gives each element its sorted position.
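This is enumeration (rank) sort; a sequential sketch in which each loop iteration plays the role of one processor (function name ours, distinct elements assumed):

```python
def rank_sort(a):
    """Rank sort behind the Sort Fact: processor i compares its
    element against all others, one comparison per step (O(m) steps),
    and the count of smaller elements is its final position."""
    out = [None] * len(a)
    for i, x in enumerate(a):          # each i simulates one processor
        rank = sum(y < x for y in a)   # m comparison steps
        out[rank] = x
    return out
```

With m = log n processors and elements this is exactly the LogSort used by BoxSort's base case: O(log n) steps.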
85
Things to remember Last statement of Box Sort algorithm. Idea on the previous slide.
86
LogSort: with log n processors and log n elements, we can sort in O(log n) steps (the Sort Fact with m = log n).
87
Analysis: consider each node of the recursion tree as a box. Choosing random splitters and sorting them takes O(log n) time, and inserting the unsorted elements among the splitters takes O(log n). With high probability, the subproblems resulting from a splitting operation (the elements between adjacent splitters) are very small, so each leaf is a box of size at most log n, and the time spent at a leaf is O(log n) using LogSort. The total time is O(log n).
88
DISTRIBUTED RANDOMIZED ALGORITHM Yogesh S Rawat R. Ramanathan
89
CHOICE COORDINATION PROBLEM (CCP)
90
Biological Inspiration Mites of the genus Myrmoyssus reside as parasites on the ear membrane of moths of the family Phaenidae. Moths are prey to bats, and the moths' only defense is that they can hear the sonar of an approaching bat.
93
Biological Inspiration If both ears of the moth are infected by mites, its ability to detect the sonar is considerably diminished, severely decreasing the survival chances of both the moth and its colony of mites.
94
Biological Inspiration The mites are therefore faced with a "choice coordination problem": how does any collection of mites ensure that every mite chooses the same ear?
95
Problem Specification A set of n processors and m options to choose from; the processors have to reach a consensus on a unique choice.
98
Model for Communication A collection of m read-write registers accessible to all processors, with a locking mechanism for resolving conflicts. Each processor follows a protocol for making a choice; a special symbol (√) is used to mark the choice. At the end, exactly one register contains the special symbol.
99
Deterministic Solution Complexity is measured by the number of read and write operations. Any deterministic solution requires Ω(n^(1/3)) operations, where n is the number of processors. For details, see M. O. Rabin, "The choice coordination problem," Acta Informatica, vol. 17, no. 2, pp. 121-134, Jun. 1982.
100
Randomized Solution For any c > 0, it solves the problem using c operations with probability of success at least 1 − 2^(−Ω(c)). For simplicity we consider only the case n = m = 2, although the protocol generalizes easily.
101
Analogy from real life: two people meet head-on in a corridor and each takes a random action, give way or move ahead. They succeed as soon as their choices differ: randomness breaks the symmetry.
104
Synchronous CCP The two processors are synchronous and operate in lock-step according to a global clock. Terminology: P_i is processor i, C_i is the shared register for choice i, and B_i is the local variable of processor i, for i ∈ {0, 1}.
105
Synchronous CCP Processor P_i initially scans register C_i; thereafter the processors exchange registers after every iteration. At no time do the two processors scan the same register.
108
Algorithm. Input: registers C_0 and C_1 initialized to 0. Output: exactly one of the two registers has the value √. Step 0: P_i initially scans register C_i. Step 1: read the current register, obtaining a bit R_i. Step 2: select one of three cases. Case 2.1 [R_i = √]: halt. Case 2.2 [R_i = 0, B_i = 1]: write √ into the current register and halt. Case 2.3 [otherwise]: assign an unbiased random bit to B_i and write B_i into the current register. Step 3: P_i exchanges its current register with P_{1−i} and returns to step 1.
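A lock-step simulation of this synchronous protocol (names ours; the string "check" stands in for the √ symbol):

```python
import random

def sync_ccp(rng=random.Random(3)):
    """Lock-step simulation of the synchronous protocol for n = m = 2.
    Each round, every non-halted processor reads its current register,
    applies case 2.1, 2.2, or 2.3, and then the processors exchange
    registers."""
    CHECK = "check"                # stands for the special symbol
    C = [0, 0]                     # shared registers C_0, C_1
    B = [0, 0]                     # local bits B_0, B_1
    cur = [0, 1]                   # register scanned by each processor
    halted = [False, False]
    while not all(halted):
        R = [C[cur[0]], C[cur[1]]]          # read phase (simultaneous)
        for i in (0, 1):                    # write phase
            if halted[i]:
                continue
            if R[i] == CHECK:               # case 2.1
                halted[i] = True
            elif R[i] == 0 and B[i] == 1:   # case 2.2
                C[cur[i]] = CHECK
                halted[i] = True
            else:                           # case 2.3
                B[i] = rng.randrange(2)
                C[cur[i]] = B[i]
        cur = [cur[1], cur[0]]              # step 3: exchange registers
    return C
```

Because the processors never scan the same register in the same round, at most one of them can write √ per round, and the correctness argument below rules out both writing it in the same round.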
109
The same algorithm, annotated: step 1 is the read operation; case 2.1 means the choice has already been made by the other processor; case 2.2 is the only condition under which a choice is made; case 2.3 generates a random value and performs the write operation; step 3 exchanges the registers.
114
Correctness of the Algorithm We need to prove that only one of the shared registers has √ marked in it. Suppose both were marked with √: this must have happened in the same iteration, since otherwise case 2.1 would have halted the later writer first.
115
Correctness of the Algorithm Assume the error takes place during the t-th iteration, and let B_i(t) and R_i(t) be processor P_i's values after step 1. By case 2.3 of the previous iteration, R_0(t) = B_1(t) and R_1(t) = B_0(t). Suppose P_i writes √ in the t-th iteration; then R_i = 0 and B_i = 1, hence R_{1−i} = 1 and B_{1−i} = 0, so P_{1−i} cannot write √ in the t-th iteration. This is how the symmetry is broken.
116
Example execution (trace): the processors alternate read and write operations, generating fresh random bits each round, until one processor reads 0 while its local bit B_i = 1; that processor writes √ and halts (case 2.2), and the other processor halts upon reading √ (case 2.1).
125
Complexity The probability that the random bits B_0 and B_1 coincide in a round is 1/2, so the probability that the number of iterations exceeds t is (1/2)^t; the algorithm terminates within two further steps as soon as B_0 and B_1 differ. Since the computational cost of each iteration is bounded, the protocol does O(t) work with probability 1 − 2^(−t).
126
The Problem: the processors are not synchronized.
132
What can we do?
133
Idea: timestamps.
134
Notation: T_i is the timestamp of processor P_i, and t_i is the timestamp of register C_i.
135
Algorithm. Input: registers C_1 and C_2, each initialized to the pair ⟨0, 0⟩ (timestamp, value). Output: exactly one of the two registers has the value √.
136
Algorithm for a processor P_i: 0) P_i initially scans a randomly chosen register; T_i and B_i are initialized to 0. 1) P_i gets a lock on its current register and reads ⟨t_i, R_i⟩. 2) P_i executes one of these cases: 2.1) if R_i = √: HALT. 2.2) if T_i < t_i: T_i ← t_i and B_i ← R_i. 2.3) if T_i > t_i: write ⟨t_i, √⟩ into the current register and HALT. 2.4) if T_i = t_i, R_i = 0, B_i = 1: write ⟨t_i, √⟩ into the current register and HALT. 2.5) otherwise: T_i ← T_i + 1 and t_i ← t_i + 1, B_i ← a random (unbiased) bit, and write ⟨t_i, B_i⟩ into the current register. 3) P_i releases the lock on its current register, moves to the other register, and returns to step 1.
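A simulation of this timestamped protocol (names ours; the asynchronous adversary is modelled as a random interleaving of whole lock-read-write iterations, which the locks make atomic):

```python
import random

def async_ccp(rng=random.Random(4)):
    """Simulation of the asynchronous protocol for n = m = 2.
    Each register holds a pair [t_i, value]; a random scheduler
    repeatedly picks a processor to perform one full iteration."""
    CHECK = "check"                    # stands for the special symbol
    reg = [[0, 0], [0, 0]]             # C_i = [timestamp t_i, value]
    T = [0, 0]                         # processor timestamps T_i
    B = [0, 0]                         # local bits B_i
    cur = [rng.randrange(2), rng.randrange(2)]  # step 0: random register
    halted = [False, False]
    while not all(halted):
        i = rng.randrange(2)           # scheduler picks a processor
        if halted[i]:
            continue
        t, R = reg[cur[i]]             # step 1: lock and read
        if R == CHECK:                              # case 2.1
            halted[i] = True
        elif T[i] < t:                              # case 2.2
            T[i], B[i] = t, R
            cur[i] ^= 1                             # step 3
        elif T[i] > t:                              # case 2.3
            reg[cur[i]][1] = CHECK
            halted[i] = True
        elif R == 0 and B[i] == 1:                  # case 2.4
            reg[cur[i]][1] = CHECK
            halted[i] = True
        else:                                       # case 2.5
            T[i] += 1
            reg[cur[i]] = [T[i], rng.randrange(2)]
            B[i] = reg[cur[i]][1]
            cur[i] ^= 1                             # step 3
        # (releasing the lock is implicit: iterations are atomic)
    return reg
```

Whatever interleaving the scheduler produces, the correctness argument below guarantees that exactly one register ends up holding √.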
137
Example execution (condensed trace). Initial state: both registers hold ⟨0, 0⟩ and all timestamps are 0. 1) P_1 chooses C_1 and reads; none of cases 2.1-2.4 applies, so by case 2.5 it sets T_1 ← 1 and t_1 ← 1, writes a random bit into C_1, releases the lock, and moves to C_2. 2) P_2 chooses C_2; case 2.5 likewise applies (T_2 ← 1, t_2 ← 1), and P_2 moves to C_1. 3) P_2 locks C_1; case 2.5 applies again (T_2 ← 2, t_1 ← 2), and P_2 moves to C_2. 4) P_2 locks C_2; now T_2 = 2 > t_2 = 1, so by case 2.3 P_2 writes √ into C_2 and HALTS.
We now show another case of the algorithm; roll back one iteration, to just after step 3 above. 4') P_1 locks C_2; case 2.5 applies (T_1 ← 2, t_2 ← 2, B_1 ← a fresh random bit), and P_1 moves to C_1. 5') P_2 locks C_2; case 2.5 applies again (T_2 ← 3, t_2 ← 3), and P_2 moves to C_1. 6') P_1 locks C_1; case 2.4 (T_1 = t_1, R_1 = 0, B_1 = 1) is satisfied, so P_1 writes √ into C_1 and HALTS. 7') P_2 locks C_1, reads √ (case 2.1), and HALTS.
176
Correctness
177
When a processor writes √ into a register, the other processor must NOT write √ into the other register.
178
Correctness The two halting-with-write cases are: case 2.3) T_i > t_i: write √ into the current register and halt; case 2.4) T_i = t_i, R_i = 0, B_i = 1: write √ into the current register and halt.
179
Let T_i* be the current timestamp of processor P_i and t_i* the current timestamp of register C_i. Whenever P_i finishes an iteration at C_i, T_i = t_i.
180
Moreover, when a processor enters a register, it has just left the other register.
181
Case 2.3) T_i > t_i: write √ into the current register and HALT. Consider P_1 just entering C_1 with t_1* < T_1*.
182
In the previous iteration, P_1 must have left C_2 with the same timestamp T_1*, so T_1* ≤ t_2*. Likewise, P_2 can reach C_2 only after finishing an iteration at C_1, so T_2* ≤ t_1*. Summing up: T_2* ≤ t_1* < T_1* ≤ t_2*. Hence T_2* < t_2*, so P_2 falls into case 2.2 at C_2 and cannot write √.
192
Case 2.4) T_i = t_i, R_i = 0, B_i = 1: write √ into the register and HALT. Similarly, consider P_1 entering C_1 with t_1* = T_1*.
193
As before, T_1* ≤ t_2* and T_2* ≤ t_1*, so summing up: T_2* ≤ t_1* = T_1* ≤ t_2*. With T_2* ≤ t_2* and, at C_2, R_2 = 1 and B_2 = 0, neither case 2.3 nor case 2.4 applies, so P_2 cannot write √.
195
Complexity The cost is proportional to the largest timestamp, and a timestamp can go up only in case 2.5. A processor's current B_i value is set during a visit to the other register, so the synchronous-case analysis applies.
196
REAL WORLD APPLICATIONS Pham Nam Khanh
197
Applications of parallel sorting Sorting is a fundamental algorithm in data processing: parallel database operations (rank, join, etc.) and search (rapid index/lookup after sorting). Best record in sorting: 102.5 TB in 4,328 seconds using 2100 nodes, set by Yahoo.
198
Applications of MIS: wireless and communication networks, scheduling problems, perfect matching (and hence assignment problems), and finance.
199
Applications of maximal independent sets: the market graph (e.g., over the EAFE and EM universes); low-latency requirements call for parallel MIS.
200
Applications of maximal independent sets: the market graph is built over financial instruments such as stocks, commodities, and bonds.
202
An MIS forms a completely diversified portfolio, in which all instruments are negatively correlated with each other, lowering the risk.
203
Applications of the choice coordination algorithm Given n processes, each able to choose among m options, all must agree on a unique choice, so CCP belongs to the class of distributed consensus algorithms. Applications: hardware and software tasks involving concurrency, clock synchronization in wireless sensor networks, and multivehicle cooperative control.
204
Multivehicle cooperative control coordinates the movement of multiple vehicles to accomplish an objective: task assignment, cooperative transport, cooperative role assignment, air traffic control, and cooperative timing.
205
CONCLUSION
206
Conclusion The PRAM model (CREW) for parallel algorithms; the Maximal Independent Set in O(log n) expected iterations, with applications; parallel sorting: QuickSort in O(log² n) and BoxSort in O(log n); the Choice Coordination Problem: distributed algorithms for synchronous and asynchronous systems, with applications.