Scalable Algorithms for Global Snapshots in Distributed Systems. Rahul Garg, IBM India Research Lab; Vijay K. Garg, Univ. of Texas at Austin; Yogish Sabharwal, IBM India Research Lab.


1 Scalable Algorithms for Global Snapshots in Distributed Systems
  Rahul Garg, IBM India Research Lab
  Vijay K. Garg, Univ. of Texas at Austin
  Yogish Sabharwal, IBM India Research Lab

2 Motivation for Global Snapshot
  - Checkpoint to tolerate faults
      - Take a global snapshot periodically
      - On failure, restart from the last checkpoint
  - Global property detection
      - Detecting deadlock, loss of a token, etc.
  - Distributed debugging
      - Inspecting the global state

3 Global Snapshot
  - Global state
      - A set of local states
      - States of channels between processes: the messages in transit in the global snapshot
  - Key requirement: consistency

4 Consistent and inconsistent cuts
  - G1 is not consistent
  - G2 is consistent, but m3 must be recorded
  [Figure: processes P1, P2, P3 exchanging messages m1, m2, m3, with cuts G1 and G2]
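
The consistency requirement can be checked mechanically. The sketch below is an illustration only: the event indices, the `Message` tuple, and the cuts `G1` and `G2` are hypothetical values chosen to mirror the slide's labels, since the original figure is not recoverable. A cut is consistent if every message received inside the cut was also sent inside it; a message sent inside the cut but received outside it is in transit and must be recorded.

```python
from typing import NamedTuple


class Message(NamedTuple):
    name: str
    sender: str
    send_event: int   # index of the send event on the sender's timeline
    receiver: str
    recv_event: int   # index of the receive event on the receiver's timeline


def in_cut(cut, process, event):
    # cut[process] is the number of that process's events included in the cut
    return event < cut[process]


def is_consistent(cut, messages):
    # No message may be received inside the cut unless it was sent inside it
    return all(
        in_cut(cut, m.sender, m.send_event)
        for m in messages
        if in_cut(cut, m.receiver, m.recv_event)
    )


def in_transit(cut, messages):
    # Sent inside the cut, received outside it: part of the channel state
    return [
        m.name
        for m in messages
        if in_cut(cut, m.sender, m.send_event)
        and not in_cut(cut, m.receiver, m.recv_event)
    ]


# Hypothetical timeline (assumed for illustration, not taken from the figure)
MESSAGES = [
    Message("m1", "P1", 0, "P2", 0),
    Message("m2", "P2", 1, "P3", 0),
    Message("m3", "P3", 1, "P1", 1),
]
G1 = {"P1": 1, "P2": 1, "P3": 1}  # includes the receipt of m2 but not its send
G2 = {"P1": 1, "P2": 2, "P3": 2}  # m3 is sent inside the cut, received outside
```

With these assumed values, `is_consistent(G1, MESSAGES)` is false, `is_consistent(G2, MESSAGES)` is true, and `in_transit(G2, MESSAGES)` lists only `m3`.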

5 Model of the System
  - No shared clock
  - No shared memory
  - Processes communicate using messages
  - Messages are reliable
  - No upper bound on message delivery time

6 Checkpoint
  - A process must be red to receive a red message
      - A white process turns red on receiving a red message
  - Any white message received by a red process must be recorded as an in-transit message
  - Classification of messages
      - w: white process (pre-recording local state)
      - r: red process (post-recording)
      - e.g. rw: sent by a red process, received by a white process
  [Figure: processes P and Q exchanging messages of types wr, rr, rw, ww]
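
A minimal sketch of the two coloring rules, under stated assumptions: the `Process` class, its integer `state`, and the tuple message format are hypothetical and exist only to make the rules executable. Each message carries the sender's color; a white receiver records its state and turns red before handling a red message, and a red receiver logs any white message as in transit.

```python
class Process:
    def __init__(self, name, state=0):
        self.name = name
        self.state = state            # assumed application state: a running sum
        self.color = "white"
        self.recorded_state = None
        self.in_transit = []          # white messages received after recording

    def record(self):
        # Take the local checkpoint once; afterwards the process is red
        if self.color == "white":
            self.recorded_state = self.state
            self.color = "red"

    def send(self, payload):
        # Every message is tagged with the sender's current color
        return (self.color, payload)

    def receive(self, message):
        color, payload = message
        if color == "red":
            self.record()             # must be red before receiving a red message
        elif self.color == "red":
            self.in_transit.append(payload)  # white message at a red process
        self.state += payload
```

For example, if a white message is overtaken by a red one on a non-FIFO channel, the receiver checkpoints on the red message and later logs the white one as in transit.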

7 Previous Work
  - Chandy and Lamport's algorithm
      - Assumes FIFO channels
      - Requires one message (marker) per channel; the marker indicates the end of white messages
  - Mattern's algorithm; Schulz, Bronevetsky et al.
      - Work for non-FIFO channels
      - Require a message that indicates the total number of white messages sent on the channel

8 Results
  Algorithm     Message Complexity     Message Size   Space
  CLM           O(N^2)                 O(1)           O(N)
  Grid-based    O(N^(3/2))             O(√N)          O(N)
  Tree-based    O(N log N log(W/n))    O(1)
  Centralized   O(N log(W/n))          O(1)

9 Grid-based Algorithm
  - Idea 1
      - Previously: send the number of white messages per channel
      - This algorithm: send the total number of white messages destined to a process
  - Idea 2
      - Previously: send N messages of size O(1)
      - Now: send √N messages of size √N

10 Grid-based Algorithm
  - Algorithm for P(r,c)
      - Step 1: send row i of the matrix to P(r,i)
      - Step 2: compute the cumulative count for row c; send this count to P(c,c)
      - Step 3: if (r = c) // diagonal entry
          - Receive the count from all processes in the column
          - Send the jth entry to P(c,j)
  [Figure: the whiteSent matrix]
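
The three steps can be simulated sequentially to see why each process ends up with the total number of white messages destined to it. This is a sketch under assumptions that are not in the slides: processes are numbered `r * k + c` on a `k × k` grid with `k = √N`, `white_sent[p][q]` is the number of white messages process `p` sent to process `q`, and a message a process sends to itself is still counted as a message.

```python
import math


def grid_aggregate(white_sent):
    """Simulate Steps 1-3; return (expected counts per process, messages used)."""
    n = len(white_sent)
    k = math.isqrt(n)
    assert k * k == n, "this sketch assumes N is a perfect square"

    def pid(r, c):
        return r * k + c

    messages = 0

    # Step 1: P(r,c) sends row i of its k-by-k view of whiteSent to P(r,i)
    inbox1 = {(r, i): [] for r in range(k) for i in range(k)}
    for r in range(k):
        for c in range(k):
            counts = white_sent[pid(r, c)]
            for i in range(k):
                inbox1[(r, i)].append(counts[i * k:(i + 1) * k])
                messages += 1

    # Step 2: P(r,c) sums what it received (senders in row r, destinations
    # in row c) and sends the cumulative vector to the diagonal process P(c,c)
    inbox2 = {c: [] for c in range(k)}
    for r in range(k):
        for c in range(k):
            cumulative = [sum(col) for col in zip(*inbox1[(r, c)])]
            inbox2[c].append(cumulative)
            messages += 1

    # Step 3: diagonal P(c,c) sums over its column and sends entry j to P(c,j)
    expected = [0] * n
    for c in range(k):
        totals = [sum(col) for col in zip(*inbox2[c])]
        for j in range(k):
            expected[pid(c, j)] = totals[j]
            messages += 1

    return expected, messages
```

Step 1 dominates the cost: N processes each send √N vectors of length √N, which gives the O(N^(3/2)) message count, while Steps 2 and 3 add only O(N) messages each.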

11 Grid-based Algorithm: Steps 1-3 for P(r,c) as on slide 10. [Figure: animation frame]

12 Grid-based Algorithm: Steps 1-3 for P(r,c) as on slide 10. For each processor of the second row: count of messages sent to it from processors in the third row. [Figure: animation frame]

13 Grid-based Algorithm: Steps 1-3 for P(r,c) as on slide 10. [Figure: animation frame]

14 Grid-based Algorithm: Steps 1-3 for P(r,c) as on slide 10. [Figure: animation frame]

15 Grid-based Algorithm: Steps 1-3 for P(r,c) as on slide 10. For each processor of the second row: count of messages sent to it from all processors. [Figure: animation frame]

16 Tree/Centralized Algorithms
  - Idea
      - Previously: maintain white messages sent for every destination
      - These algorithms: nodes maintain local deficits
          - Local deficit = white messages sent - white messages received
      - Total deficit = sum of all local deficits
  - Distributed Message Counting Problem
      - W in-transit messages destined for N processors
      - Detect when all messages have been received
      - W tokens: a token is consumed when a message is received
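
A small worked example of the deficit bookkeeping, with hypothetical counts. Because every white message that is received was also sent, the sum of the local deficits equals the number of white messages still in transit, which is the W of the message counting problem.

```python
def local_deficit(white_sent, white_received):
    # Local deficit = white messages sent - white messages received
    return white_sent - white_received


def total_deficit(counters):
    # counters: one (white_sent, white_received) pair per process
    return sum(local_deficit(sent, received) for sent, received in counters)


# Hypothetical counters for three processes at the time of the snapshot
COUNTERS = [(4, 1), (2, 3), (0, 1)]
W = total_deficit(COUNTERS)  # white messages still in transit
```

A local deficit can be negative (a process may have received more white messages than it sent); only the total is guaranteed to be non-negative.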

17 Tree/Centralized Algorithms
  - Distributed Message Counting Algorithm
      - Arrange nodes in a suitable data structure
      - Distribute tokens equally to all processors at start: w = W/n
      - Each node has a color:
          - Green (Rich): has more than w/2 tokens
          - Yellow (Debt-free): has <= w/2 tokens
          - Orange (Poor): has no tokens and has received a white message
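
The color rule can be written as a small function. This is a sketch: the parameter `unpaid_messages` is a hypothetical name for the number of white messages a node has received without having a token to consume, which is how "has no tokens and has received a white message" is interpreted here.

```python
def node_color(tokens, unpaid_messages, w):
    """Classify a node given its token count and the per-node share w = W/n."""
    if tokens == 0 and unpaid_messages > 0:
        return "orange"   # poor: received a white message with no token left
    if tokens > w / 2:
        return "green"    # rich: more than half of the initial share
    return "yellow"       # debt-free: at most w/2 tokens, nothing unpaid


# Example with W = 80 tokens spread over n = 8 nodes, so w = 10
w = 80 / 8
```

With `w = 10`, a node holding 6 tokens is green, a node holding 5 or fewer is yellow, and a node with 0 tokens and an unpaid message is orange.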

18 Tree-based Algorithm: High level idea
  - Arrange nodes as a binary tree
  - Progresses in rounds
      - In each round all the nodes start off rich
      - A token is consumed on receiving a message
      - A debt-free node cannot have a rich child
          - Ensured by transfer of tokens
  - Starting a new round
      - When the root is no longer rich
      - By then at least half of the tokens have been consumed

19 Tree-based Algorithm
  - Invariants
      - I1: Yellow process cannot have green child
      - I2: Root is always green
      - I3: Any orange node eventually becomes yellow
  [Figure: tree annotated with I1 and I2]

20 Tree-based Algorithm - Example: invariants I1-I3 as on slide 19. [Figure: animation frame]

21 Tree-based Algorithm - Example: invariants I1-I3 as on slide 19. [Figure labels: Violates I1, Swap Request, Swap Accept]

22 Tree-based Algorithm - Example: invariants I1-I3 as on slide 19. [Figure: animation frame]

23 Tree-based Algorithm - Example: invariants I1-I3 as on slide 19. [Figure labels: Split Request, Split Accept, Violates I3]

24 Tree-based Algorithm - Example: invariants I1-I3 as on slide 19. [Figure label: Violates I2]

25 Tree-based Algorithm - Example
  - Reset Round
      - Recalculate remaining tokens W' (<= nw/2 = W/2)
      - Start new round with W'
      - Redistribute tokens equally
      - All nodes turn Green
  [Figure label: Violates I2]

26 Tree-based Algorithm - Analysis
  - Number of rounds
      - If W < 2n, only O(n) messages are required
      - Tokens reduce by half in every round
      - # of rounds = O(log(W/n))
  - Number of control messages per round
      - O(log n) control messages per color change
      - Whenever color changes, some green node turns yellow
      - O(n) color changes per round
      - # of control messages per round = O(n log n)
  - Total control messages = O(n log n log(W/n))
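
The round count can be checked numerically. The sketch below assumes the slowest progress the analysis allows, where exactly half of the tokens survive each round, and stops once W < 2n, the case the slide handles directly with O(n) messages. The function names are illustrative, and the estimate ignores constant factors, so it shows only how the bound scales.

```python
import math


def rounds_until_small(W, n):
    """Rounds needed until fewer than 2n tokens remain, halving each round."""
    rounds = 0
    while W >= 2 * n:
        W //= 2          # at most half of the tokens survive a round
        rounds += 1
    return rounds


def control_message_estimate(W, n, per_color_change):
    # O(n) color changes per round, each costing per_color_change messages
    return rounds_until_small(W, n) * n * per_color_change


# Example: n = 8 nodes, W = 8192 tokens, so log2(W/n) = 10
tree_estimate = control_message_estimate(8192, 8, math.log2(8))
```

Passing `per_color_change=math.log2(n)` models the tree-based search for a green node; passing `1` models a scheme that finds a green node with a constant number of control messages.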

27 Centralized Algorithm
  - Idea
      - In the tree-based algorithm, every color change requires a search for a green node to split/swap tokens with
          - Requires O(log n) control messages
      - Can we find a green node with O(1) control messages?
      - Master node (tail) maintains a list of all green nodes
  [Figure: nodes and the master]
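
One way the master's bookkeeping could look is sketched below. This is an assumption-laden illustration, not the authors' implementation: the `Master` class and its method names are hypothetical, and the token transfer itself (swap or split) is left out. It shows only that a list plus an index map lets the master hand out a green node, and drop a node that stops being green, in constant time.

```python
class Master:
    def __init__(self, node_ids):
        # Every node starts a round green
        self.green = list(node_ids)
        self.position = {node: i for i, node in enumerate(self.green)}

    def no_longer_green(self, node):
        # Constant-time removal: move the last green node into the freed slot
        i = self.position.pop(node, None)
        if i is None:
            return                      # node was already removed
        last = self.green.pop()
        if last != node:
            self.green[i] = last
            self.position[last] = i

    def find_green(self):
        # Constant-time lookup; None means no green node is left
        return self.green[-1] if self.green else None
```

In this sketch a `None` result is where a round reset would be triggered; that link to the reset step is an interpretation, not something the slide states.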

28 Centralized Algorithm - Example [Figure labels: Master, Swap Request, Swap Accept]

29 Centralized Algorithm - Example [Figure labels: Master, Split Request, Split Accept]

30 Centralized Algorithm - Analysis
  - Number of rounds
      - If W < 2n, only O(n) messages are required
      - Tokens reduce by half in every round
      - # of rounds = O(log(W/n))
  - Number of control messages per round
      - O(1) control messages per color change
      - Whenever color changes, some green node turns yellow
      - O(n) color changes per round
      - # of control messages per round = O(n)
  - Total control messages = O(n log(W/n))

31 Lower Bound
  - Observation
      - Suppose there are W outstanding tokens
      - Some process must generate a control message on receiving W/n white messages
      - Send W/n white messages to that processor
      - Remaining tokens = (n-1)W/n
      - Repeat the argument recursively
  - Tokens remaining after i control messages >= ((n-1)/n)^i · W
  - # of control messages = Ω(n log(W/n))
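
The recursion in the lower-bound argument can be replayed numerically. The sketch assumes the adversary keeps going until fewer than n tokens remain; that stopping threshold is an assumption made for illustration. Each forced control message removes only a 1/n fraction of the outstanding tokens, so the count grows like n ln(W/n).

```python
import math


def adversary_control_messages(W, n):
    """Count control messages forced while at least n tokens remain."""
    remaining = float(W)
    control_messages = 0
    while remaining >= n:
        remaining *= (n - 1) / n   # W/n messages go to the process that must report
        control_messages += 1
    return control_messages


# Example values (hypothetical): n = 10 processes, W = 10000 tokens
forced = adversary_control_messages(10_000, 10)
lower = (10 - 1) * math.log(10_000 / 10)
upper = 10 * math.log(10_000 / 10) + 1
```

Since 1/n <= -ln(1 - 1/n) <= 1/(n-1), the forced count always lies between (n-1) ln(W/n) and n ln(W/n) + 1, which matches the Ω(n log(W/n)) bound.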

32 Experimental Results

34 Conclusions
  - Global snapshots in distributed systems
      - Distributed Message Counting problem
      - Optimal algorithm
          - Message complexity O(n log(W/n))
          - Matching lower bound
          - Centralized algorithm
  - Open problem
      - Decentralized algorithm?

35 Thank You Questions?

