
1 Parallel and Distributed Algorithms

2 Overview: parallel algorithms vs. distributed algorithms; PRAM; maximal independent set; sorting using PRAM; the choice coordination problem; real-world applications.

3 INTRODUCTION

4 The need for distributed processing: massively parallel processing machines with 1000s of processors are becoming the norm as Moore's law comes to an end.

5 Parallel Algorithm: a parallel algorithm is an algorithm which can be executed a piece at a time on many different processing devices, and then recombined at the end to get the correct result.* (* Blelloch, Guy E.; Maggs, Bruce M. Parallel Algorithms. School of Computer Science, Carnegie Mellon University.)

6 Distributed Algorithm: a distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors.* (* Lynch, Nancy (1996). Distributed Algorithms. San Francisco, CA: Morgan Kaufmann. ISBN 978-1-55860-348-6.)

7 PRAM

8 Random Access Machine (RAM): an abstract machine with an unbounded number of local memory cells and a simple instruction set. Time complexity: the number of instructions executed. Space complexity: the number of memory cells used. All operations take unit time.

9 PRAM (Parallel Random Access Machine): a parallel version of the RAM for designing algorithms applicable to parallel computers. Why PRAM? On P processors, at most P instructions execute per cycle; any processor can read or write any shared memory cell in unit time; the model abstracts away communication overhead, which makes analyzing the complexity of PRAM algorithms easier; and it serves as a benchmark.

10 Example: processor P_i reads A[i-1] from the shared array A, computes A[i] = A[i-1] + 1, and writes A[i] back, so processors P_1, ..., P_n evaluate the chain A[1] = A[0] + 1, A[2] = A[1] + 1, ..., A[n] = A[n-1] + 1.
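
To make the lock-step read/compute/write cycle concrete, here is a minimal, hypothetical Python simulation of one synchronous PRAM step; all processors read first and only then write, so within a step each P_i sees only the old values (the names pram_step and A are illustrative, not from the slides).

```python
def pram_step(A):
    """One synchronous step: every processor P_i (i >= 1) computes A[i-1] + 1."""
    reads = [A[i - 1] for i in range(1, len(A))]  # read phase: all P_i read A[i-1]
    for i in range(1, len(A)):                    # write phase: all P_i write A[i]
        A[i] = reads[i - 1] + 1
    return A

A = [0, 0, 0, 0, 0]
print(pram_step(A))  # [0, 1, 1, 1, 1]: only old values are visible within a step
```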

11 Shared memory access conflicts. Exclusive Read (ER): all processors can simultaneously read from distinct memory locations. Exclusive Write (EW): all processors can simultaneously write to distinct memory locations. Concurrent Read (CR): all processors can simultaneously read from any memory location. Concurrent Write (CW): all processors can simultaneously write to any memory location. Combining these gives the EREW, CREW, and CRCW PRAM models.

12 Complexity. Parallel time complexity: the number of synchronous steps in the algorithm. Space complexity: the number of shared memory cells. Parallelism: the number of processors used.

13 MAXIMAL INDEPENDENT SET Lahiru Samarakoon Sumanaruban Rajadurai

14 Independent Set (IS): any set of nodes, no two of which are adjacent.

15 Maximal Independent Set (MIS): an independent set that is not a proper subset of any other independent set.

16 Maximal vs. maximum IS: a maximum independent set is an independent set of largest possible size, whereas a maximal independent set merely cannot be extended by adding another node.

17 A sequential greedy algorithm: suppose the set S will hold the final MIS; initially S = ∅.

18 Phase 1: pick a node v and add it to S.

19 Remove v and its neighbors N(v) from the graph.

21 Phase 2: pick another node and add it to S.

22 Remove the picked node and its neighbors from the graph.

24 Phases 3, 4, 5, ..., x: repeat until all nodes are removed and none remain.

26 At the end, the set S will be an MIS of the graph G.

27 Running time: one node is added to S per phase, so in the worst case (e.g., a graph with no edges, where every node must be picked individually) the algorithm needs Θ(n) phases.
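
A short sequential sketch of this greedy procedure (illustrative only; the dict-of-sets adjacency representation is an assumption):

```python
def greedy_mis(adj):
    """adj: {node: set of neighbors}. Returns a maximal independent set."""
    remaining = set(adj)
    mis = set()
    while remaining:
        v = remaining.pop()   # pick any remaining node and add it to the MIS
        mis.add(v)
        remaining -= adj[v]   # remove its neighbors from the graph
    return mis

# Example: the path 1-2-3-4
print(greedy_mis({1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}))
```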

28 Intuition for parallelization: at each phase we may select any independent set I (instead of a single node), then remove I and the neighbors of I from the graph.

29 Example: suppose the set S will hold the final MIS; initially S = ∅.

30 Phase 1: find any independent set I and insert it into S: S ← S ∪ I.

31 Remove I and its neighbors N(I) from the graph.

32 Phase 2: on the new graph, find any independent set I and insert it into S.

33 Remove I and its neighbors N(I) from the graph.

35 Phase 3: on the new graph, find any independent set I and insert it into S.

36 Remove I and its neighbors N(I); no nodes are left.

38 The final MIS is S.

39 Observation: the number of phases depends on the choice of the independent set in each phase; the larger the independent set at each phase, the faster the algorithm.

40 Randomized Maximal Independent Set (MIS): let d(v) denote the degree of node v.

41 At each phase, each node v elects itself with probability 1/(2 d(v)), where d(v) is the degree of v in the current graph; the elected nodes are candidates for the independent set.

42 If two neighbors are elected simultaneously, then the higher-degree node wins: e.g., if d(u) > d(v) and both are elected, u remains a candidate and v is dropped.

43 If both have the same degree, ties are broken arbitrarily (e.g., by node identifier).

44 Problematic nodes (adjacent elected candidates) are removed using the previous rules.

45 The remaining elected nodes form an independent set.

46 Luby's algorithm: mark lower-degree vertices with higher probability, namely 1/(2 d(v)).

47 Problematic nodes: using the previous rules, problematic nodes are removed.

48 Luby's algorithm: if both endpoints of an edge are marked, unmark the one with the lower degree.

49 The remaining marked nodes form an independent set.

50 Luby's algorithm: add all marked vertices to the MIS, then remove the marked vertices together with their neighbors and the corresponding edges.
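
A sketch of one phase of Luby's algorithm, simulated sequentially in Python (illustrative; the dict-of-sets graph representation and tie-breaking by vertex identifier are assumptions):

```python
import random

def luby_phase(adj, mis):
    """One phase; adj maps each surviving vertex to its set of neighbors."""
    deg = {v: len(adj[v]) for v in adj}
    # mark each vertex with probability 1/(2 d(v)); isolated vertices always mark
    marked = {v for v in adj
              if deg[v] == 0 or random.random() < 1.0 / (2 * deg[v])}
    for v in sorted(marked):                      # resolve conflicts on marked edges
        for w in adj[v]:
            if w in marked and (deg[v], v) < (deg[w], w):
                marked.discard(v)                 # unmark the lower-degree endpoint
                break
    mis |= marked                                 # marked vertices join the MIS
    removed = marked | {w for v in marked for w in adj[v]}
    return {v: adj[v] - removed for v in adj if v not in removed}

adj, mis = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}, set()
while adj:                                        # repeat phases until no nodes remain
    adj = luby_phase(adj, mis)
print(mis)                                        # a maximal independent set of the path
```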

51 ANALYSIS

52 Goodness property: a vertex v is good if at least ⅓ of its neighbors have lower degree than it, and bad otherwise. An edge is bad if both of its endpoints are bad, and good otherwise.

53 Lemma 1: let v ∈ V be a good vertex with degree d(v) > 0. Then the probability that some vertex w ∈ N(v) gets marked is at least 1 − exp(−1/6). Define L(v) to be the set of neighbors of v whose degree is less than v's degree; by definition, |L(v)| ≥ d(v)/3 if v is a good vertex.

54 Proof sketch: each w ∈ L(v) is marked independently with probability 1/(2 d(w)) ≥ 1/(2 d(v)), so the probability that no vertex of L(v) is marked is at most (1 − 1/(2 d(v)))^(d(v)/3) ≤ exp(−1/6).

55 Lemma 2: during any iteration, if a vertex w is marked, then it is selected to be in S with probability at least 1/2.

56 Proof sketch: a marked vertex w fails to be selected only if some neighbor of degree at least d(w) is also marked.

57 Hence Pr[w not selected | w marked] ≤ Σ over u ∈ N(w) with d(u) ≥ d(w) of 1/(2 d(u)) ≤ d(w) · 1/(2 d(w)) = 1/2.

58 From Lemmas 1 and 2, the probability that a good vertex belongs to S ∪ N(S) is at least (1 − exp(−1/6))/2, so good vertices get eliminated with constant probability. Combined with Lemma 3 below (at least half the edges are good), it follows that the expected number of edges eliminated during an iteration is a constant fraction of the current set of edges. This implies that the expected number of iterations of the parallel MIS algorithm is O(log n).

59 Lemma 3: in a graph G(V, E), the number of good edges is at least |E|/2. Proof: direct the edges in E from the lower-degree endpoint to the higher-degree endpoint, breaking ties arbitrarily; then each bad vertex v has in-degree less than d(v)/3 and hence out-degree at least twice its in-degree. For all S, T ⊆ V, define E(S, T) as the subset of the (oriented) edges directed from vertices in S to vertices in T, and let e(S, T) = |E(S, T)|.

60 Let V_G and V_B be the sets of good and bad vertices. Summing the degree bound over the bad vertices gives e(V_B, V) ≥ 2 e(V, V_B); expanding both sides yields e(V_B, V_G) ≥ 2 e(V_G, V_B) + e(V_B, V_B) ≥ e(V_B, V_B). So every bad edge can be charged to a distinct good edge, and at least half the edges are good.

61 SORTING ON PRAM Jessica Makucka Puneet Dewan

62 Sorting. Current problem: sort n numbers. The best sequential comparison-based sorting time is O(n log n). Can we do better with more processors? Yes!

63 Notes about Quicksort: we sort n distinct numbers on a CREW PRAM with n processors; each of the n processors holds one input element. Notation: let P_i denote the i-th processor.

64 Quicksort Algorithm
0. If n = 1, stop.
1. Pick a splitter at random from the n elements.
2. Each processor determines whether its element is bigger or smaller than the splitter.
3. Let j denote the splitter's rank. If j ∉ [n/4, 3n/4], the split failed: go back to (1). If j ∈ [n/4, 3n/4], the split succeeded: move the splitter to P_j, move every element smaller than the splitter to a distinct processor P_i with i < j, and every larger element to a distinct P_i with i > j.
4. Sort the elements in processors P_1 through P_(j-1) and the elements in processors P_(j+1) through P_n recursively.
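
A sequential Python sketch of this logic (the PRAM broadcast, comparison, and compaction steps are simulated with ordinary list operations; distinct elements are assumed, as stated on slide 63):

```python
import random

def pram_quicksort(elems):
    n = len(elems)
    if n <= 1:
        return elems
    while True:                                       # retry until the split succeeds
        splitter = random.choice(elems)               # step 1: random splitter
        smaller = [x for x in elems if x < splitter]  # step 2: parallel comparison
        j = len(smaller)                              # the splitter's (0-based) rank
        if n // 4 <= j <= 3 * n // 4:                 # step 3: success test
            break
    larger = [x for x in elems if x > splitter]
    return pram_quicksort(smaller) + [splitter] + pram_quicksort(larger)  # step 4

print(pram_quicksort([12, 3, 7, 5, 11, 2, 1, 4]))
```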

65 Quicksort time analysis. Stage 1: along any sequence of recursive splits there are O(log n) stages, since each successful split leaves at most 3/4 of the elements on either side. Stage 2: trivial; it can be implemented in a single CREW PRAM step.

66 Stage 3: let j denote the splitter's rank; if j ∉ [n/4, 3n/4], go back to (1); if j ∈ [n/4, 3n/4], move the splitter to P_j and every element smaller than the splitter to a distinct processor P_i with i < j. O(log n) PRAM steps are needed for a single splitting stage.

67 Comparison in the splitting stage (3): each processor P_i assigns a bit according to whether its element is smaller or bigger than the splitter: 0 if the element is bigger, 1 otherwise.

68 The destination processor of each element is then obtained by summing these bits with a parallel prefix sum, which takes O(log n) steps.
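
A sketch of that O(log n) prefix-sum computation (a Hillis-Steele style scan, simulated sequentially; each iteration of the while loop corresponds to one synchronous PRAM round):

```python
def parallel_prefix_sum(bits):
    s = list(bits)
    k = 1
    while k < len(s):          # ceil(log2 n) rounds in total
        old = list(s)          # one synchronous round: read old values, then write
        for i in range(k, len(s)):
            s[i] = old[i] + old[i - k]
        k *= 2
    return s                   # inclusive prefix sums

print(parallel_prefix_sum([0, 1, 1, 0, 1, 1, 1, 0]))  # [0, 1, 2, 2, 3, 4, 5, 5]
```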

69 Overall time analysis: the algorithm terminates in O(log² n) steps, since there are O(log n) levels of recursion and each level costs O(log n) steps for its splitting stage.
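
The bound can be read off the recurrence for the parallel time (a sketch, with constants suppressed; the 3/4 factor comes from the success test in stage 3):

```latex
T(n) \le T(3n/4) + O(\log n)
\quad\Rightarrow\quad
T(n) = O\Big(\sum_{k \ge 0,\; (3/4)^k n \ge 1} \log\big((3/4)^k n\big)\Big) = O(\log^2 n),
```

since the sum has O(log n) terms, each at most log n.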

70 Cons: the algorithm assumes that the split is always successful, breaking a problem of size n into pieces that are a constant fraction of n, and there is no method guaranteeing a successful split.

71 Improvement idea: reduce the problem to subproblems of size n^(1-ε), where 0 < ε < 1, while keeping the time to split the same.

72 Benefits if ε = 1/2: the total time over all levels of the recursion is log n + log n^(1/2) + log n^(1/4) + … = (1 + 1/2 + 1/4 + …) log n, so we can hope for an overall running time of O(log n).

73 Long story: suppose we have n processors and n elements, that processors P_1 through P_r contain r of the elements in sorted order, and that processors P_(r+1) through P_n contain the remaining n − r elements.
1. Choose random splitters and sort them: call the sorted elements in the first r processors the splitters, and for 1 ≤ j ≤ r let s_j denote the j-th splitter in sorted order.
2. Insert: insert the n − r unsorted elements among the splitters.
3. Sort the remaining elements among the splitters: (a) each processor should end up with a distinct input element; (b) let i(s_j) denote the index of the processor containing s_j after the insertion; then for all k < i(s_j), processor P_k contains an element smaller than s_j, and for all k > i(s_j), processor P_k contains an element larger than s_j.

74 Example: choose random splitters from the input 5 9 8 10 7 6 12 11.

75 Example (contd.): sort the random splitters. Sorted list: 6 11. Unsorted list: 5 9 8 7 10 12.

76 Example (contd.): insert the unsorted elements among the splitters: 5 6 7 9 8 10 11 12.

77 Example (contd.): check whether the number of elements between consecutive splitters is at most log n. Here the middle group 7 9 8 10 has size S = 4, which exceeds log n (i.e., 3), while the outer groups have S = 1.

78 Example (contd.): recur on the subproblem whose size exceeds log n; again choose random splitters within 7 9 8 10 and follow the same process.

79 Partitioning as a tree: the first partition forms a tree whose children are the groups 5, then 7 9 8 10, then 12, around the splitters 6 and 11. The size on the right exceeds log n, so we split again by choosing random splitters, e.g. 9 and 8.

80 (Contd.) The subproblem is now sorted because of the partition: with splitters 8 and 9, the remaining groups 7 and 10 are singletons, giving 7 8 9 10 in order.

81 Lemma’s to be Used 1.A CREW PRAM having (n 2 ) processors. Suppose that each of the processors P 1 through P n has an input element to be sorted. Then the PRAM can sort these n elements in O(log n). 2. For n processors, and n elements of which n 1/2 are splitters, then the insertion process can be completed in O(log n) steps.

82 BoxSort algorithm. Input: a set of numbers S. Output: the elements of S sorted in increasing order.
1. Select n^(1/2) elements (ε is 1/2) at random from the n input elements; using all n processors, sort them in O(log n) steps (Lemma 1).
2. Using the sorted elements from stage 1 as splitters, insert the remaining elements among them in O(log n) steps (Lemma 2).
3. Treating the elements inserted between adjacent splitters as subproblems, recur on each subproblem whose size exceeds log n; for subproblems of size log n or less, invoke LogSort.
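
A sequential Python sketch of BoxSort's recursion structure (the O(log n) PRAM subroutines for sorting splitters and inserting are simulated here with built-in sorting and binary search; distinct elements are assumed):

```python
import bisect, math, random

def boxsort(elems, threshold=None):
    """Sort a list of distinct numbers; threshold plays the role of log n."""
    if threshold is None:
        threshold = max(1, int(math.log2(max(2, len(elems)))))
    if len(elems) <= threshold:
        return sorted(elems)                    # stage 3 base case: LogSort
    r = max(1, math.isqrt(len(elems)))          # stage 1: about n^(1/2) splitters
    splitters = sorted(random.sample(elems, r))
    boxes = [[] for _ in range(r + 1)]
    for x in elems:                             # stage 2: insert among the splitters
        if x not in splitters:
            boxes[bisect.bisect_left(splitters, x)].append(x)
    out = []
    for i, box in enumerate(boxes):             # stage 3: recur on each box
        out += boxsort(box, threshold)
        if i < r:
            out.append(splitters[i])
    return out

print(boxsort([5, 9, 8, 10, 7, 6, 12, 11]))     # the example from slide 74
```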

83 Sort Fact A CREW PRAM with m processors can sort m elements in O(m) steps.

84 Example: each processor is assigned an element and compares it with the remaining elements simultaneously, in O(m) steps; the rank assigned to each element is its position in sorted order. For the elements 5 9 8 7 10 3 4 2 held by P_1 through P_8, the ranks assigned are 4 7 6 5 8 2 3 1.
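
A sketch of this rank-based sorting (each iteration of the outer loop stands for one processor P_i running in parallel; the ranks here are 0-based, whereas the slide's are 1-based):

```python
def rank_sort(elems):
    m = len(elems)
    out = [None] * m
    for i, x in enumerate(elems):               # each loop body is one processor P_i
        rank = sum(1 for y in elems if y < x)   # m comparisons: O(m) parallel steps
        out[rank] = x                           # rank assigned => sorted position
    return out

print(rank_sort([5, 9, 8, 7, 10, 3, 4, 2]))     # [2, 3, 4, 5, 7, 8, 9, 10]
```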

85 Things to remember: the last statement of the BoxSort algorithm (invoke LogSort on subproblems of size at most log n) and the ranking idea on the previous slide.

86 LogSort: given log n processors and log n elements, we can sort in O(log n) steps (the sort fact with m = log n).

87 Analysis: consider each node of the recursion tree as a box. Choosing random splitters and sorting them takes O(log n) time, and inserting the unsorted elements among the splitters takes O(log n). With high probability the subproblems resulting from a splitting operation are very small, so each leaf is a box of size at most log n, and the time spent at the leaves is O(log n) using LogSort. The total time is O(log n).

88 DISTRIBUTED RANDOMIZED ALGORITHM Yogesh S Rawat R. Ramanathan

89 CHOICE COORDINATION PROBLEM (CCP)

90 Biological inspiration: the mite (genus Myrmoyssus).

91 These mites reside as parasites on the ear membrane of moths of the family Phaenidae.

92 Moths are prey to bats, and the only defense they have is that they can hear the sonar used by an approaching bat.

93 If both ears of the moth are infected by the mites, its ability to detect the sonar is considerably diminished, severely decreasing the survival chances of both the moth and its colony of mites.

94 The mites are therefore faced with a "choice coordination problem": how does any collection of mites infecting a particular ear ensure that every other mite chooses the same ear?

95 Problem specification: a set of N processors.

96 There are M options to choose from.

97 The processors have to reach a consensus on a unique choice.

98 Model for communication: a collection of M read-write registers accessible to all the processors, with a locking mechanism to handle conflicts. Each processor follows a protocol for making a choice, and a special symbol (√) is used to mark the choice. At the end, exactly one register contains the special symbol.

99 Deterministic solutions: complexity is measured in terms of the number of read and write operations. Any deterministic solution has complexity Ω(n^(1/3)) operations, where n is the number of processors. For more details see M. O. Rabin, "The choice coordination problem," Acta Informatica, vol. 17, no. 2, pp. 121-134, Jun. 1982.

100 Randomized solution: for any c > 0, it solves the problem using c operations with probability of success at least 1 − 2^(−Ω(c)). For simplicity we consider only the case n = m = 2, although the protocol is easily generalized.

101 An analogy from real life: two people walking toward each other must each take a random action, give way or move ahead.

102 If both give way, or both move ahead, the conflict remains and they try again.

103 As soon as one gives way while the other moves ahead, the symmetry is broken and the conflict is resolved.

104 Synchronous CCP: the two processors are synchronous and operate in lock-step according to some global clock. Terminology: P_i is processor i, C_i is the shared register for choices, and B_i is the local variable of processor P_i, where i ∈ {0, 1}.

105 The processor P_i initially scans the register C_i; thereafter, the processors exchange registers after every iteration, so at no time will the two processors scan the same register.

108 Algorithm. Input: registers C_0 and C_1, initialized to 0. Output: exactly one of the two registers has the value √.
Step 0: P_i initially scans the register C_i.
Step 1: read the current register, obtaining a bit R_i.
Step 2: select one of three cases.
Case 2.1 [R_i = √]: halt.
Case 2.2 [R_i = 0, B_i = 1]: write √ into the current register and halt.
Case 2.3 [otherwise]: assign an unbiased random bit to B_i and write B_i into the current register.
Step 3: P_i exchanges its current register with P_(1-i) and returns to step 1.

109 Step 1 is the read operation.

110 Case 2.1: the choice has already been made by the other processor.

111 Case 2.2 is the only condition under which a choice is made.

112 Case 2.3 generates a random value; writing it back is the write operation.

113 Step 3 exchanges the registers.
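
A hypothetical Python simulation of the synchronous protocol (the string "Y" stands in for √; both processors execute each round in lock-step, which is modeled by reading both registers before either write):

```python
import random

def sync_ccp():
    C = [0, 0]                        # shared registers C_0, C_1
    B = [0, 0]                        # local bits B_0, B_1
    cur = [0, 1]                      # step 0: P_i starts scanning C_i
    halted = [False, False]
    while not all(halted):
        R = [C[cur[0]], C[cur[1]]]    # step 1: both processors read simultaneously
        for i in (0, 1):
            if halted[i]:
                continue
            if R[i] == "Y":                    # case 2.1: choice already made
                halted[i] = True
            elif R[i] == 0 and B[i] == 1:      # case 2.2: make the choice
                C[cur[i]] = "Y"
                halted[i] = True
            else:                              # case 2.3: fresh random bit
                B[i] = random.randint(0, 1)
                C[cur[i]] = B[i]
        cur = [cur[0] ^ 1, cur[1] ^ 1]         # step 3: exchange registers
    return C

print(sync_ccp())   # exactly one register holds "Y"
```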

114 Correctness of the algorithm: we need to prove that only one of the shared registers gets √ marked in it. Suppose both are marked with √: this must have happened in the same iteration, since otherwise case 2.1 would have halted the second processor before it wrote.

115 Assume the error takes place during the t-th iteration, and let B_i(t) and R_i(t) be processor P_i's values after step 1. By case 2.3, R_0(t) = B_1(t) and R_1(t) = B_0(t). Suppose P_i writes √ in the t-th iteration; then R_i = 0 and B_i = 1, hence R_(1-i) = 1 and B_(1-i) = 0, so P_(1-i) cannot write √ in the same iteration. The random bits break the symmetry.

116 A sample run. Read: both processors read 0 (B_0 = B_1 = 0).

117 Write: both draw the random bit 0 and write it; C_0 = C_1 = 0.

118 Read: both read 0 again, with B_0 = B_1 = 0.

119 Write: both draw the random bit 1 and write it; C_0 = C_1 = 1.

120 Read: both read 1, with B_0 = B_1 = 1.

121 Write: P_0 draws 0 and P_1 draws 1; the register contents now differ, and the symmetry is broken.

122 Read: P_0 reads 1 with B_0 = 0 (case 2.3); P_1 reads 0 with B_1 = 1 (case 2.2), so P_1 will write √ and HALT.

123 Write: P_1 writes √ into its current register; P_0 writes a fresh random bit (0/1) into the other.

124 Read: P_0 reads √ (case 2.1) and HALTS; exactly one register holds √.

125 Complexity: the probability that the two random bits B_0 and B_1 are the same is 1/2, so the probability that the number of iterations exceeds t is 1/2^t; the algorithm terminates within the next two steps as soon as B_0 and B_1 differ. Since the computation cost of each iteration is bounded, the protocol does O(t) work with probability 1 − 1/2^t.

126 The problem in the asynchronous case: processors P_1 and P_2 again share the registers C_1 and C_2.

131 The processors are not synchronized, so the lock-step argument no longer applies: one processor may take many steps while the other takes none.

132 What can we do?

133 Idea: timestamps.

134 Terminology: T_i is the timestamp of processor P_i, and t_i is the timestamp of register C_i; each processor also keeps a local bit B_i.

135 Algorithm. Input: registers C_1 and C_2, each initialized to the pair ⟨0, 0⟩ (timestamp, value). Output: exactly one of the two registers has √.

136 Algorithm for a processor P_i:
0) P_i initially scans a randomly chosen register; T_i and B_i are initialized to 0.
1) P_i gets a lock on its current register and reads ⟨t_i, R_i⟩.
2) P_i executes one of these cases:
2.1) If R_i = √: HALT.
2.2) If T_i < t_i: T_i ← t_i and B_i ← R_i.
2.3) If T_i > t_i: write √ into the current register and HALT.
2.4) If T_i = t_i, R_i = 0, B_i = 1: write √ into the current register and HALT.
2.5) Otherwise: T_i ← T_i + 1 and t_i ← t_i + 1; B_i ← a random (unbiased) bit; write ⟨t_i, B_i⟩ into the current register.
3) P_i releases the lock on its current register, moves to the other register, and returns to step 1.
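
A hypothetical Python simulation of this protocol (a random scheduler models the asynchrony, the string "Y" stands in for √, and each register holds a [timestamp, value] pair; locking is implicit because the scheduler runs one step at a time):

```python
import random

def async_ccp():
    reg = [[0, 0], [0, 0]]            # C_i holds the pair <t_i, value>
    T = [0, 0]                        # processor timestamps
    B = [0, 0]                        # processor bits
    cur = [random.randint(0, 1), random.randint(0, 1)]  # step 0: random register
    halted = [False, False]
    while not all(halted):
        i = random.randint(0, 1)      # asynchronous scheduler picks a processor
        if halted[i]:
            continue
        t, R = reg[cur[i]]            # step 1: lock and read <t, R>
        if R == "Y":                                # case 2.1
            halted[i] = True
        elif T[i] < t:                              # case 2.2: catch up
            T[i], B[i] = t, R
        elif T[i] > t or (R == 0 and B[i] == 1):    # cases 2.3 / 2.4: win
            reg[cur[i]][1] = "Y"
            halted[i] = True
        else:                                       # case 2.5
            T[i] += 1                               # T_i <- T_i + 1 (= t_i + 1)
            B[i] = random.randint(0, 1)
            reg[cur[i]] = [T[i], B[i]]              # write <t_i, B_i>
        cur[i] ^= 1                   # step 3: move to the other register
    return [r[1] for r in reg]

print(async_ccp())  # exactly one register ends up holding "Y"
```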

137 Initial state: all timestamps and bits are 0.

138 1) P_1 chooses C_1 and reads.

139 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

140 2.5) T_1 ← T_1 + 1 and t_1 ← t_1 + 1.

141 2.5) P_1 writes ⟨t_1, B_1⟩ into C_1.

142 3) P_1 releases the lock on C_1, moves to C_2, and returns to step 1.

143 1) P_2 chooses C_2 and reads.

144 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

145 2.5) T_2 ← T_2 + 1 and t_2 ← t_2 + 1.

146 2.5) P_2 writes ⟨t_2, B_2⟩ into C_2.

147 3) P_2 releases the lock on C_2, moves to C_1, and returns to step 1.

148 1) P_2 locks C_1 and reads (the schedule is asynchronous, so P_2 takes several steps while P_1 waits).

149 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

150 2.5) T_2 ← T_2 + 1 and t_1 ← t_1 + 1.

151 2.5) P_2 writes ⟨t_1, B_2⟩ into C_1.

152 3) P_2 releases the lock on C_1, moves to C_2, and returns to step 1.

153 1) P_2 locks C_2 and reads.

154 Case 2.3 (T_2 > t_2) is satisfied.

155 2.3) P_2 writes √ into C_2 and HALTS.

156 We'll show another case of the algorithm.

157 Let's go back one iteration, to the state just after P_2 released the lock on C_1.

159 1) P_1 locks C_2 and reads.

160 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

161 2.5) T_1 ← T_1 + 1 and t_2 ← t_2 + 1.

162 2.5) B_1 ← a random (unbiased) bit; P_1 writes ⟨t_2, B_1⟩ into C_2.

163 3) P_1 releases the lock on C_2, moves to C_1, and returns to step 1.

164 1) P_2 locks C_2 and reads.

165 None of the cases 2.1 to 2.4 are met; case 2.5 is satisfied.

166 2.5) T_2 ← T_2 + 1 and t_2 ← t_2 + 1.

167 2.5) P_2 writes ⟨t_2, B_2⟩ into C_2.

168 3) P_2 releases the lock on C_2, moves to C_1, and returns to step 1.

169 1) P_1 locks C_1 and reads.

170 Case 2.4 (T_1 = t_1, R_1 = 0, B_1 = 1) is satisfied.

171 2.4) P_1 writes √ into C_1.

172 2.4) P_1 HALTS.

173 1) P_2 locks C_1 and reads √.

174 Case 2.1 (R_1 = √) is satisfied.

175 2.1) P_2 HALTS.

176 Correctness.

177 When a processor writes √ into a register, the other processor must NOT write √ into the other register.

178 Only two cases write √. Case 2.3) T_i > t_i: write √ into the current register and halt. Case 2.4) T_i = t_i, R_i = 0, B_i = 1: write √ into the current register and halt.

179 Notation: T_i* is the current timestamp of processor P_i, and t_i* is the current timestamp of register C_i. Whenever P_i finishes an iteration in C_i, T_i = t_i.

180 When a processor enters a register, it has just left the other register.

181 Case 2.3 (T_i > t_i: write √ into the current register and HALT): consider P_1, which has just entered C_1 with t_1* < T_1*.

182 In the previous iteration, P_1 must have left C_2 with the same T_1*.

184 When it left, P_1 had set t_2 = T_1 there, so T_1* ≤ t_2*.

186 For P_2 to write into C_2 afterwards, it must go to C_2 only after passing through C_1.

187 By the same reasoning, T_2* ≤ t_1*.

190 Summing up: T_2* ≤ t_1* < T_1* ≤ t_2*.

191 So T_2* < t_2*, and P_2 cannot write √ into C_2.

192 Case 2.4 (T_i = t_i, R_i = 0, B_i = 1: write √ into the register and HALT): similarly, consider P_1, which has entered C_1 with t_1* = T_1*.

193 As before, T_1* ≤ t_2* and T_2* ≤ t_1*; summing up, T_2* ≤ t_1* = T_1* ≤ t_2*.

194 Hence T_2* ≤ t_2* with R_2 = 1 and B_2 = 0, so P_2 cannot write √.

195 Complexity: the cost is proportional to the largest timestamp, and a timestamp can go up only in case 2.5. A processor's current B_i value is set during a visit to the other register, so the complexity analysis of the synchronous case applies.

196 REAL WORLD APPLICATIONS Pham Nam Khanh

197 Applications of parallel sorting. Sorting is a fundamental algorithm in data processing: parallel database operations (rank, join, etc.) and search (rapid index/lookup after sorting). Best record in sorting: 102.5 TB in 4,328 seconds using 2100 nodes, by Yahoo.

198 Applications of MIS: wireless communication, scheduling problems, perfect matching (and hence assignment problems), and finance.

199 Applications of maximal independent sets in finance: the market graph (e.g., over the EAFE and EM universes); the low-latency requirement calls for the parallel MIS algorithm.

200 In the market graph, the vertices are financial instruments: stocks, commodities, and bonds.

202 An MIS of the market graph forms a completely diversified portfolio, in which all instruments are negatively correlated with each other, lowering the risk.

203 Applications of the choice coordination algorithm: given n processes, each of which can choose between m options, the processes need to agree on a unique choice; the problem belongs to the class of distributed consensus algorithms. Applications include hardware and software tasks involving concurrency, clock synchronization in wireless sensor networks, and multivehicle cooperative control.

204 Multivehicle cooperative control coordinates the movement of multiple vehicles to accomplish an objective: task assignment, cooperative transport, cooperative role assignment, air traffic control, and cooperative timing.

205 CONCLUSION

206 Conclusion. PRAM model: CREW parallel algorithms. Maximal independent set in O(log n) expected phases, with applications. Parallel sorting algorithms: QuickSort in O(log² n) and BoxSort in O(log n). Choice coordination problem: distributed algorithms for synchronous and asynchronous systems, plus applications.

