Published by Jalynn Sturman. Modified over 2 years ago.

1
**Scalable Algorithms for Global Snapshots in Distributed Systems**

- Rahul Garg, IBM India Research Lab
- Vijay K. Garg, Univ. of Texas at Austin
- Yogish Sabharwal, IBM India Research Lab

2
**Motivation for Global Snapshot**

- Checkpointing to tolerate faults
  - Take a global snapshot periodically
  - On failure, restart from the last checkpoint
- Global property detection
  - Detecting deadlock, loss of a token, etc.
- Distributed debugging
  - Inspecting the global state

3
**Global Snapshot**

- Global state
  - A set of local states
  - States of the channels between processes (messages in transit in the global snapshot)
- Key requirement: consistency

4
**Consistent and inconsistent cuts**

[Figure: space-time diagram of processes P1, P2, P3 exchanging messages m1, m2, m3, with two cuts G1 and G2]

- G1 is not consistent
- G2 is consistent, but m3 must be recorded
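The consistency requirement can be checked mechanically. The sketch below is a minimal illustration, not the slide's figure: the event indices, the `Message` type, and the two cuts `g1` and `g2` are hypothetical values chosen so that one cut is inconsistent and the other is consistent with one message in transit.

```python
from typing import NamedTuple


class Message(NamedTuple):
    name: str
    sender: str
    send_event: int   # 1-based index of the send event on the sender
    receiver: str
    recv_event: int   # 1-based index of the receive event on the receiver


def is_consistent(cut, messages):
    """A cut (process -> number of events included) is consistent if no
    message is received inside the cut but sent outside of it."""
    return not any(
        m.recv_event <= cut[m.receiver] and m.send_event > cut[m.sender]
        for m in messages
    )


def in_transit(cut, messages):
    """Messages sent inside the cut whose receipt lies outside the cut."""
    return [
        m.name
        for m in messages
        if m.send_event <= cut[m.sender] and m.recv_event > cut[m.receiver]
    ]


# Hypothetical execution (assumed values, not taken from the slide).
messages = [
    Message("m1", "P1", 1, "P2", 1),
    Message("m2", "P2", 2, "P3", 1),
    Message("m3", "P1", 2, "P2", 3),
]
g1 = {"P1": 0, "P2": 1, "P3": 0}   # includes the receipt of m1 but not its send
g2 = {"P1": 2, "P2": 2, "P3": 1}   # m3 is sent inside the cut, received outside
```

With these assumed values, `g1` fails the check because it contains a receive without the matching send, while `g2` passes and reports `m3` as the channel state to record.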

5
**Model of the System**

- No shared clock
- No shared memory
- Processes communicate using messages
- Messages are reliable
- No upper bound on the delivery time of messages

6
**Checkpoint Classification of Messages**

- w: white process (before recording its local state)
- r: red process (after recording its local state)
- Messages are labeled by sender and receiver color, e.g. rw: sent by a red process, received by a white process

[Figure: processes P and Q exchanging messages of types rw, rr, ww, wr]

- A process must be red to receive a red message
  - A white process turns red on receiving a red message
- Any white message received by a red process must be recorded as an in-transit message
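The two receive rules above can be sketched as a small state machine. This is an illustrative model only: the `Process` class, its method names, and the tuple-based message format are assumptions made for the example, not part of the presentation.

```python
class Process:
    """Toy process that tags outgoing messages with its current color."""

    def __init__(self, name):
        self.name = name
        self.red = False        # white until the local state is recorded
        self.in_transit = []    # white messages received after turning red

    def record_local_state(self):
        # Recording the local state is what turns a white process red.
        self.red = True

    def send(self, payload):
        # The message carries the sender's color at send time.
        return (self.red, payload)

    def receive(self, message):
        sent_red, payload = message
        if sent_red and not self.red:
            # A process must be red to receive a red message,
            # so record the local state before delivering it.
            self.record_local_state()
        elif not sent_red and self.red:
            # White message arriving at a red process: channel state.
            self.in_transit.append(payload)


p, q = Process("P"), Process("Q")
white_msg = p.send("a")      # sent while P is white
q.record_local_state()       # Q takes its checkpoint
q.receive(white_msg)         # recorded as in transit
red_msg = q.send("b")        # sent while Q is red
p.receive(red_msg)           # forces P to turn red first
```

After this run, `q.in_transit` holds the white message and `p` has turned red, so no rw message is ever delivered.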

7
**Previous Work**

- Chandy and Lamport's algorithm
  - Assumes FIFO channels
  - Requires one message (marker) per channel
  - Marker indicates the end of white messages
- Mattern's algorithm; Schulz, Bronevetsky et al.
  - Work for non-FIFO channels
  - Require a message that indicates the total number of white messages sent on the channel

8
**Results**

Algorithm: Message Complexity, Message Size, Space

- CLM: O(N^2), O(1)
- Grid-based: O(N^(3/2)), O(N)
- Tree-based: O(N log N log(W/n))
- Centralized: O(N log(W/n))

9
**Grid-based Algorithm**

- Idea 1
  - Previously: send the number of white messages per channel
  - This algorithm: send the total number of white messages destined to a process
- Idea 2
  - Previously: send N messages of size O(1)
  - Now: send √N messages of size √N

10
**Grid-based Algorithm**

[Figure: the whiteSent matrix]

Algorithm for P(r,c):

- Step 1: send row i of the matrix to P(r,i)
- Step 2: compute the cumulative count for row c; send this count to P(c,c)
- Step 3: if (r = c) // diagonal entry
  - Receive the counts from all processes in the column
  - Send the jth entry to P(c,j)
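The three steps can be simulated sequentially to see why each process ends up with the total number of white messages destined to it. This is a sketch under stated assumptions: processes sit on a k x k grid, `white_sent[(r, c)][i][j]` is assumed to hold the number of white messages P(r,c) sent to P(i,j), and message passing is replaced by in-memory inboxes. The function name is hypothetical.

```python
def grid_expected_counts(white_sent, k):
    """Return expected[(i, j)] = total white messages destined to P(i,j)."""
    # Step 1: P(r,c) sends row i of its matrix to P(r,i).
    row_inbox = {(r, c): [] for r in range(k) for c in range(k)}
    for (r, c), matrix in white_sent.items():
        for i in range(k):
            row_inbox[(r, i)].append(matrix[i])

    # Step 2: P(r,c) adds up the rows it received (counts destined to
    # row c from the processes of row r) and sends the sum to P(c,c).
    diag_inbox = {c: [] for c in range(k)}
    for (r, c), rows in row_inbox.items():
        partial = [sum(column) for column in zip(*rows)]
        diag_inbox[c].append(partial)

    # Step 3: the diagonal process P(c,c) adds up the partial counts
    # from its column and sends the jth entry to P(c,j).
    expected = {}
    for c in range(k):
        totals = [sum(column) for column in zip(*diag_inbox[c])]
        for j in range(k):
            expected[(c, j)] = totals[j]
    return expected


# Deterministic made-up counts on a 3 x 3 grid.
k = 3
procs = [(r, c) for r in range(k) for c in range(k)]
white_sent = {
    (r, c): [[(7 * r + 3 * c + 5 * i + j) % 4 for j in range(k)] for i in range(k)]
    for (r, c) in procs
}
expected = grid_expected_counts(white_sent, k)
```

In this simulation each process sends k vectors of length k in Step 1 and one vector in Step 2, and only diagonal processes send in Step 3, which is where the per-process message count of order √N comes from when k = √N.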

12
**Grid-based Algorithm**

For each processor of the second row: count of messages sent to it from processors in the third row.

15
**Grid-based Algorithm**

For each processor of the second row: count of messages sent to it from all processors.

16
**Tree/Centralized Algorithms**

- Idea
  - Previously: maintain the number of white messages sent for every destination
  - These algorithms: nodes maintain local deficits
    - Local deficit = white messages sent − white messages received
    - Total deficit = sum of all local deficits
- Distributed Message Counting Problem
  - W in-transit messages destined for N processors
  - Detect when all messages have been received
  - W tokens: a token is consumed when a message is received
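As a small worked example of the deficit bookkeeping, the sketch below computes local and total deficits from per-process counters. The counter values and function names are made up for illustration; the point is that the total deficit equals the number of white messages still in transit, which is the W of the counting problem.

```python
def local_deficit(white_sent, white_received):
    # Local deficit = white messages sent - white messages received.
    return white_sent - white_received


def total_deficit(counters):
    # Total deficit = sum of all local deficits.
    return sum(local_deficit(sent, received) for sent, received in counters.values())


# Hypothetical (sent, received) counters at the time of the snapshot.
counters = {"P1": (5, 2), "P2": (3, 4), "P3": (1, 0)}
W = total_deficit(counters)   # 9 sent in total, 6 received, so 3 in transit
```

A single process can have a negative local deficit (P2 above), but the sum over all processes cannot be negative because every received white message was sent by some process.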

17
**Tree/Centralized Algorithms**

- Distributed Message Counting Algorithm
  - Arrange nodes in a suitable data structure
  - Distribute tokens equally to all processors at the start: w = W/n
  - Each node has a color:
    - Green (Rich): has more than w/2 tokens
    - Yellow (Debt-free): has <= w/2 tokens
    - Orange (Poor): has no tokens and has received a white message
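The color rule can be written as a small function. This is a sketch under one modeling assumption that the slide does not spell out: a node that receives a white message while holding no tokens is tracked with a separate `unpaid_messages` counter (a hypothetical name), and that is what makes it orange.

```python
def node_color(tokens, w, unpaid_messages=0):
    """Classify a node given its token count and the per-node share w = W/n."""
    if tokens == 0 and unpaid_messages > 0:
        return "orange"   # poor: no tokens, yet a white message arrived
    if tokens > w / 2:
        return "green"    # rich: more than w/2 tokens
    return "yellow"       # debt-free: at most w/2 tokens


W, n = 64, 8
w = W // n   # equal initial share, assuming n divides W
colors = [node_color(t, w) for t in (8, 5, 4, 0)]
```

With w = 8, nodes holding 8 or 5 tokens are green, and nodes holding 4 or 0 tokens are yellow as long as they owe nothing.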

18
**Tree-based Algorithm: High level idea**

- Arrange nodes as a binary tree
- Progresses in rounds
  - In each round, all the nodes start off rich
  - A token is consumed on receiving a message
  - A debt-free node cannot have a rich child (ensured by transfer of tokens)
- Starting a new round
  - When the root is no longer rich
  - ½ of the tokens consumed

19
**Tree-based Algorithm: Invariants**

- I1: A yellow process cannot have a green child
- I2: The root is always green
- I3: Any orange node eventually becomes yellow

21
**Tree-based Algorithm - Example**

- Violates I1
- Swap Request
- Swap Accept

23
**Tree-based Algorithm - Example**

- Split Request
- Split Accept
- Violates I3

24
**Tree-based Algorithm - Example**

- Violates I2

25
**Tree-based Algorithm - Example**

- Violates I2
- Reset Round
  - Recalculate the remaining tokens W' (<= nw/2 = W/2)
  - Start a new round with W'
  - Redistribute tokens equally
  - All nodes turn green

26
**Tree-based Algorithm – Analysis**

- Number of rounds
  - If W < 2n, only O(n) messages are required
  - Tokens reduce by half in every round
  - # of rounds = O(log(W/n))
- Number of control messages per round
  - O(log n) control messages per color change
  - Whenever a color changes, some green node turns yellow
  - O(n) color changes per round
  - # of control messages per round = O(n log n)
- Total control messages = O(n log n log(W/n))
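The round count can be checked with a little arithmetic. The sketch below assumes the worst case in which each round consumes exactly half of the tokens, and it drops the constants hidden by the O-notation, so the second function is only an order-of-magnitude estimate. Both function names are hypothetical.

```python
import math


def worst_case_rounds(W, n):
    """Rounds until fewer than 2n tokens remain, halving W each round."""
    rounds = 0
    while W >= 2 * n:
        W //= 2          # a new round starts with at most half the tokens
        rounds += 1
    return rounds


def tree_control_message_estimate(W, n):
    """rounds * (n color changes) * (log2 n messages per change)."""
    return worst_case_rounds(W, n) * n * math.ceil(math.log2(n))


rounds = worst_case_rounds(1024, 8)                  # 7, which is log2(1024 / 8)
estimate = tree_control_message_estimate(1024, 8)    # 7 * 8 * 3 = 168
```

For W = 1024 and n = 8 the loop runs 7 times, matching log2(W/n), and the estimate multiplies that by n log2 n as in the analysis above.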

27
**Centralized Algorithm**

- Idea
  - In the tree-based algorithm, every color change requires a search for a green node to split/swap tokens with
    - Requires O(log n) control messages
  - Can we find a green node with O(1) control messages?
    - A master node (tail) maintains a list of all green nodes
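The master's bookkeeping can be sketched as follows. The slides do not give the protocol details, so everything here is an assumption for illustration: the `Master` class, its method names, and the rule that a node notifies the master when it stops being green.

```python
class Master:
    """Toy registry of green nodes; each call stands for one control message."""

    def __init__(self, nodes):
        # Every node starts a round green. A dict keeps insertion order
        # and gives O(1) insertion, removal, and lookup.
        self.green = dict.fromkeys(nodes)

    def mark_not_green(self, node):
        # Assumed notification sent by a node that has turned yellow.
        self.green.pop(node, None)

    def find_green(self, requester):
        # Answer a swap/split request with some green node other than
        # the requester, or None if no such node is left.
        for node in self.green:
            if node != requester:
                return node
        return None


master = Master(["A", "B", "C"])
master.mark_not_green("A")            # A has turned yellow
partner = master.find_green("A")      # "B"
master.mark_not_green("B")
master.mark_not_green("C")
nobody = master.find_green("A")       # None: no green node remains
```

In this sketch a request costs a constant number of messages (request to the master, reply naming a partner), in contrast to the O(log n) search along the tree.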

28
**Centralized Algorithm - Example**

[Figure labels: Swap Request, Master, Swap Accept]

29
**Centralized Algorithm - Example**

[Figure labels: Split Request, Master, Split Accept]

30
**Centralized Algorithm – Analysis**

- Number of rounds
  - If W < 2n, only O(n) messages are required
  - Tokens reduce by half in every round
  - # of rounds = O(log(W/n))
- Number of control messages per round
  - O(1) control messages per color change
  - Whenever a color changes, some green node turns yellow
  - O(n) color changes per round
  - # of control messages per round = O(n)
- Total control messages = O(n log(W/n))

31
**Lower Bound**

- Observation
  - Suppose there are W outstanding tokens
  - Some process must generate a control message on receiving W/n white messages
- Send W/n white messages to that processor
  - Remaining tokens = (n-1)W/n
- Repeat the argument recursively
  - Tokens remaining after i control messages >= ((n-1)/n)^i · W
- # of control messages = Ω(n log(W/n))
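A numeric check makes the recursion concrete. The sketch below assumes the adversary keeps going until at most n tokens remain (this stopping point is an assumption of the example, not stated on the slide) and counts how many control messages the bound ((n-1)/n)^i · W forces before that happens. Since ln(n/(n-1)) <= 1/(n-1), the count is at least (n-1) · ln(W/n).

```python
import math


def forced_control_messages(W, n):
    """Smallest i with ((n-1)/n)^i * W <= n, counted by direct iteration."""
    remaining = float(W)
    steps = 0
    while remaining > n:
        # One control message is forced while only a 1/n fraction
        # of the remaining tokens is consumed.
        remaining *= (n - 1) / n
        steps += 1
    return steps


W, n = 10_000, 10
steps = forced_control_messages(W, n)
analytic_floor = (n - 1) * math.log(W / n)   # about 62.2 for these values
```

The iterated count stays above the analytic floor, which grows as n log(W/n), in line with the bound on the slide.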

32
Experimental Results

33
Experimental Results

34
**Conclusions**

- Global snapshots in distributed systems
  - Distributed Message Counting problem
- Optimal algorithm
  - Message complexity O(n log(W/n))
  - Matching lower bound
  - Centralized algorithm
- Open problem
  - Decentralized algorithm?

35
Thank You Questions?
