Scalable Algorithms for Global Snapshots in Distributed Systems

Name: Scalable Algorithms for Global Snapshots in Distributed Systems
Uploaded: 2017-12-01T13:29:05+00:00
Duration: PT15M39S
Channel: Jalynn Sturman
Description: Scalable Algorithms for Global Snapshots in Distributed Systems

Rahul Garg (IBM India Research Lab), Vijay K. Garg (Univ. of Texas at Austin), Yogish Sabharwal (IBM India Research Lab)

Motivation for Global Snapshot
Checkpointing to tolerate faults: take a global snapshot periodically and, on failure, restart from the last checkpoint.
Global property detection: detecting deadlock, loss of a token, etc.
Distributed debugging: inspecting the global state.

Global Snapshot
A global state consists of a set of local states and the states of the channels between processes, i.e. the messages in transit in the global snapshot. Key requirement: consistency.

Consistent and inconsistent cuts
[Figure: processes P1, P2 and P3 exchanging messages m1, m2 and m3, with two cuts G1 and G2.]
G1 is not consistent. G2 is consistent, but m3 must be recorded as a message in transit.
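A cut is consistent when no message is received inside the cut but sent outside it; a message sent inside and received outside crosses the cut and must be recorded as channel state. The sketch below checks both conditions. It assumes each message is described by the 1-based position of its send and receive events in the local histories, and a cut by how many events of each process it includes; `Message` and `analyze_cut` are hypothetical names, and the histories are made up to mimic the G1/G2 situation rather than copied from the slide's figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str
    sender: str
    send_event: int    # 1-based position of the send in the sender's local history
    receiver: str
    recv_event: int    # 1-based position of the receive in the receiver's local history


def analyze_cut(cut, messages):
    """cut maps each process to how many of its events lie inside the cut.

    Returns (is_consistent, names of messages that must be recorded as in transit).
    """
    consistent = True
    in_transit = []
    for m in messages:
        sent = m.send_event <= cut[m.sender]
        received = m.recv_event <= cut[m.receiver]
        if received and not sent:
            consistent = False            # received inside the cut, sent outside it
        elif sent and not received:
            in_transit.append(m.name)     # crosses the cut: part of the channel state
    return consistent, in_transit


# Hypothetical histories: P1 sends m1 then m3; P2 receives m1 then sends m2;
# P3 receives m2 then m3.
msgs = [
    Message("m1", "P1", 1, "P2", 1),
    Message("m2", "P2", 2, "P3", 1),
    Message("m3", "P1", 2, "P3", 2),
]
g1 = {"P1": 0, "P2": 1, "P3": 0}   # includes the receipt of m1 but not its send
g2 = {"P1": 2, "P2": 2, "P3": 1}   # includes P1's send of m3 but not its receipt
print(analyze_cut(g1, msgs))       # (False, [])
print(analyze_cut(g2, msgs))       # (True, ['m3'])
```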

Model of the System
No shared clock and no shared memory. Processes communicate using messages. Messages are reliable, but there is no upper bound on message delivery time.

Classification of Messages
w: white process (before recording its local state); r: red process (after recording its local state).
A message is named by the colors of its sender and receiver; e.g. rw is sent by a red process and received by a white process.
[Figure: processes P and Q exchanging rw, rr, ww and wr messages.]
A process must be red to receive a red message: a white process turns red on receiving a red message. Any white message received by a red process must be recorded as an in-transit message.
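The receive rule can be sketched as a small state machine. Assumptions for this illustration: every message carries its sender's color at send time, the application state is a plain counter, and "recording the local state" just copies that counter; `SnapshotProcess` and its methods are hypothetical names, not the authors' code.

```python
WHITE, RED = "white", "red"


class SnapshotProcess:
    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.local_state = 0          # stand-in for the application state
        self.recorded_state = None    # local state captured when turning red
        self.in_transit = []          # white messages received after turning red

    def turn_red(self):
        """Record the local state (hypothetical stand-in) and switch to red."""
        if self.color == WHITE:
            self.recorded_state = self.local_state
            self.color = RED

    def send(self, payload):
        # Every message is tagged with the sender's current color.
        return (self.color, payload)

    def receive(self, message):
        msg_color, payload = message
        if msg_color == RED and self.color == WHITE:
            # A process must be red to receive a red message, so it turns red
            # before delivery; an rw message is therefore never delivered.
            self.turn_red()
        if msg_color == WHITE and self.color == RED:
            # wr message: sent before the sender's snapshot, received after ours.
            self.in_transit.append(payload)
        self.local_state += 1         # apply the message to the application state


p, q = SnapshotProcess("P"), SnapshotProcess("Q")
m_white = p.send("a")   # sent while P is white
p.turn_red()
m_red = p.send("b")     # sent after P recorded its state
q.receive(m_red)        # Q turns red first, recording local_state = 0
q.receive(m_white)      # delayed white message: recorded as in transit
```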

Previous Work
Chandy and Lamport's algorithm: assumes FIFO channels and requires one message (a marker) per channel; the marker indicates the end of the white messages.
Mattern's algorithm; Schulz, Bronevetsky et al.: work for non-FIFO channels, but require a message that indicates the total number of white messages sent on the channel.
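For non-FIFO channels, the count-based idea can be sketched for a single channel: after turning red, the receiver records every white message it still gets, and the channel state is final once the number of white messages received equals the count reported by the sender. This is an illustrative sketch under those assumptions, not Mattern's or Schulz et al.'s actual protocol; `ChannelRecorder` is a hypothetical name. Because one such count is needed per channel, a complete topology with N processes needs N(N-1) count messages, which is the cost the algorithms below reduce.

```python
class ChannelRecorder:
    """One incoming channel at a receiver that has just turned red (non-FIFO)."""

    def __init__(self, white_received_so_far):
        self.white_received = white_received_so_far   # white messages delivered before turning red
        self.expected_white = None                    # unknown until the sender's count arrives
        self.in_transit = []                          # white messages delivered after turning red

    def on_white_message(self, payload):
        # The receiver is red, so every white message from now on was in transit.
        self.white_received += 1
        self.in_transit.append(payload)

    def on_count(self, white_sent_on_channel):
        # Control message from the sender: total white messages it sent on this channel.
        self.expected_white = white_sent_on_channel

    def complete(self):
        """The channel state is final once all white messages sent on it have arrived."""
        return self.expected_white is not None and self.white_received == self.expected_white


# Hypothetical run: 3 white messages arrived before the snapshot, the sender reports 5.
ch = ChannelRecorder(white_received_so_far=3)
ch.on_count(5)
ch.on_white_message("x")
ch.on_white_message("y")
```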

Results
Algorithm    | Message complexity   | Message size | Space
CLM          | O(N^2)               | O(1)         |
Grid-based   | O(N^(3/2))           |              | O(N)
Tree-based   | O(N log N log(W/N))  |              |
Centralized  | O(N log(W/N))        |              |
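To get a feel for the message-complexity column, the sketch below plugs sample values into the asymptotic expressions with every constant factor set to 1, so the numbers are only indicative, not measured message counts. N = 1024 and W = 2^20 are arbitrary illustrative choices.

```python
import math


def message_bounds(n, w):
    """Evaluate the message-complexity expressions with all constants set to 1."""
    log_n = math.log2(n)
    log_w_per_n = math.log2(w / n)
    return {
        "CLM": n ** 2,
        "Grid-based": n ** 1.5,
        "Tree-based": n * log_n * log_w_per_n,
        "Centralized": n * log_w_per_n,
    }


for name, value in message_bounds(1024, 2 ** 20).items():
    print(f"{name:12s} {value:12,.0f}")
```

With these inputs the expressions come to roughly 1.05 million, 32.8 thousand, 102 thousand and 10.2 thousand. The grid-based count is lower than the tree-based one here, but each grid message carries a vector of counts rather than a constant-size value, and the tree-based and centralized counts depend on W.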

Grid-based Algorithm
Idea 1. Previously: send the number of white messages per channel. This algorithm: send the total number of white messages destined to a process.
Idea 2. Previously: send N messages of size O(1). Now: send √N messages of size √N.

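Grid-based Algorithm
The processes form a √N × √N grid. Each process P(r,c) arranges its whiteSent counts (one per destination) as a √N × √N matrix whose row i holds the counts for the processes of grid row i.
Algorithm for P(r,c):
Step 1: for each i, send row i of the matrix to P(r,i).
Step 2: compute the cumulative count for row c (the sum of the rows received in Step 1) and send this count to P(c,c).
Step 3: if (r = c) // diagonal entry
    receive the counts from all processes in column c, add them, and send the jth entry to P(c,j).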


Example after Step 2: for each processor of the second row, the count of messages sent to it from the processors in the third row.



Example after Step 3: for each processor of the second row, the count of messages sent to it from all processors.
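The three steps of the grid-based algorithm can be simulated in one place to check that every process ends up with the total number of white messages destined to it. Assumptions for this sketch: N = k·k processes with 0-based indices, `white_sent[src][dst]` is a hypothetical N × N matrix of counts where destination d = i·k + j is P(i,j), and message passing is modeled by appending to in-memory inboxes while a counter tallies how many messages would be sent.

```python
import random


def grid_snapshot_counts(white_sent, k):
    """Simulate the three steps; return (white messages destined to each process, messages sent)."""
    n = k * k

    def pid(r, c):
        return r * k + c               # P(r,c) -> flat index; grid row i holds P(i,0..k-1)

    messages = 0

    # Step 1: P(r,c) sends row i of its k x k whiteSent matrix to P(r,i).
    inbox1 = {(r, c): [] for r in range(k) for c in range(k)}
    for r in range(k):
        for c in range(k):
            counts = white_sent[pid(r, c)]
            for i in range(k):
                inbox1[(r, i)].append(counts[i * k:(i + 1) * k])
                messages += 1          # counted even when P(r,i) is P(r,c) itself

    # Step 2: P(r,c) adds the rows it received: white messages sent from grid row r
    # to each P(c,j). It sends this cumulative count to the diagonal process P(c,c).
    inbox2 = {c: [] for c in range(k)}
    for r in range(k):
        for c in range(k):
            cumulative = [sum(row[j] for row in inbox1[(r, c)]) for j in range(k)]
            inbox2[c].append(cumulative)
            messages += 1

    # Step 3: diagonal P(c,c) adds the counts from all of column c and sends
    # the jth entry to P(c,j), which now knows its total incoming white messages.
    totals = [0] * n
    for c in range(k):
        column_total = [sum(vec[j] for vec in inbox2[c]) for j in range(k)]
        for j in range(k):
            totals[pid(c, j)] = column_total[j]
            messages += 1
    return totals, messages


# Usage with hypothetical random counts for N = 16 processes (a 4 x 4 grid).
k = 4
random.seed(1)
white_sent = [[random.randint(0, 3) for _ in range(k * k)] for _ in range(k * k)]
totals, messages = grid_snapshot_counts(white_sent, k)
```

The message tally is k^3 + 2k^2 = N^(3/2) + 2N, and each message carries at most √N counts, which is where the grid-based bounds come from.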

Tree/Centralized Algorithms
Idea. Previously: maintain the number of white messages sent for every destination. These algorithms: nodes maintain local deficits.
Local deficit = white messages sent - white messages received
Total deficit = sum of all local deficits
Distributed Message Counting Problem: W in-transit messages are destined for N processors; detect when all messages have been received. Model this with W tokens: a token is consumed when a message is received.
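The local-deficit bookkeeping needs only one counter per process, and the sum of all local deficits equals the number of white messages still in transit, i.e. W. A minimal sketch with made-up (white sent, white received) pairs for three processes:

```python
def local_deficit(white_sent, white_received):
    """Local deficit = white messages sent - white messages received (may be negative)."""
    return white_sent - white_received


def total_deficit(counters):
    """Sum of all local deficits = white messages still in transit."""
    return sum(local_deficit(sent, received) for sent, received in counters)


# Hypothetical (white_sent, white_received) per process.
counters = [(5, 2), (1, 4), (3, 1)]
W = total_deficit(counters)   # 9 white messages sent in total, 7 received
```

Note that an individual deficit can be negative (the second process has received more white messages than it sent); only the total is meaningful.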

Tree/Centralized Algorithms
Distributed Message Counting Algorithm: arrange the nodes in a suitable data structure and distribute the tokens equally to all processors at the start, w = W/n tokens each. Each node has a color:
Green (rich): has more than w/2 tokens
Yellow (debt-free): has at most w/2 tokens
Orange (poor): has no tokens and has received a white message
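The color rules can be written as one function. Assumption for this sketch: a node that receives a white message while holding no tokens goes into debt, modeled here as a negative token count; the paper's actual debt bookkeeping may differ.

```python
from enum import Enum


class Color(Enum):
    GREEN = "rich"        # more than w/2 tokens
    YELLOW = "debt-free"  # at most w/2 tokens and not in debt
    ORANGE = "poor"       # no tokens and received a white message (in debt)


def color(tokens, w):
    """Color of a node holding `tokens` when each node started the round with w tokens."""
    if tokens < 0:
        return Color.ORANGE
    if tokens > w / 2:
        return Color.GREEN
    return Color.YELLOW


def on_white_message(tokens):
    """Receiving a white message consumes one token, or deepens the debt if there is none."""
    return tokens - 1


# Hypothetical round with W = 64 tokens over n = 8 nodes, so w = 8.
w = 64 / 8
```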

Tree-based Algorithm: High-level Idea
Arrange the nodes as a binary tree. The algorithm progresses in rounds, and in each round all the nodes start off rich. A token is consumed on receiving a message. A debt-free node cannot have a rich child; this is ensured by transferring tokens. A new round starts when the root is no longer rich, which means at least half of the tokens have been consumed.

Tree-based Algorithm Invariants
I1: A yellow process cannot have a green child.
I2: The root is always green.
I3: Any orange node eventually becomes yellow.
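I1 and I2 can be checked mechanically on a snapshot of the tree; I3 is a liveness property ("eventually"), so it cannot be checked on a single snapshot. Assumptions for this sketch: the binary tree is stored as an array in heap order (the children of node i are 2i+1 and 2i+2), and colors are plain strings.

```python
def check_safety_invariants(colors):
    """Return the violated invariants (I1, I2) for a heap-ordered binary tree of node colors."""
    violations = []
    if colors and colors[0] != "green":
        violations.append("I2: root is not green")
    for parent, c in enumerate(colors):
        if c != "yellow":
            continue
        for child in (2 * parent + 1, 2 * parent + 2):
            if child < len(colors) and colors[child] == "green":
                violations.append(f"I1: yellow node {parent} has green child {child}")
    return violations


# Node 1 is yellow and its child 3 is green, so I1 is violated (a swap is needed).
print(check_safety_invariants(["green", "yellow", "green", "green"]))
```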

Tree-based Algorithm - Example

A yellow node with a green child violates I1; this is repaired by a Swap Request followed by a Swap Accept.


A node turns orange and violates I3 until it is resolved; it sends a Split Request and obtains tokens through a Split Accept.

The root is no longer green: this violates I2.

A violation of I2 triggers a round reset: recalculate the remaining tokens W' (≤ nw/2 = W/2), start a new round with W', and redistribute the tokens equally, so all nodes turn green.
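The reset step can be sketched as follows. Assumptions: the remaining tokens W' are obtained by summing the nodes' token counts (how that sum is gathered over the tree is not modeled), no node is in debt at reset time, and an integer split hands the remainder to the first nodes.

```python
def reset_round(tokens):
    """Start a new round: recompute W' and redistribute it equally among the n nodes."""
    n = len(tokens)
    assert all(t >= 0 for t in tokens), "sketch assumes no node is in debt at reset"
    w_prime = sum(tokens)
    base, extra = divmod(w_prime, n)
    return w_prime, [base + (1 if i < extra else 0) for i in range(n)]


# Hypothetical end of a round: W = 64, n = 8, so w = 8 and every node holds at most w/2 = 4.
old_tokens = [4, 3, 4, 2, 4, 1, 3, 4]
w_prime, new_tokens = reset_round(old_tokens)
new_w = w_prime / len(new_tokens)      # per-node share for the new round
all_green = all(t > new_w / 2 for t in new_tokens)
```

Here W' = 25 ≤ W/2 = 32. With the floor-based split, each node receives at least floor(W'/n) tokens, which exceeds half the new share W'/n whenever W' ≥ n, so every node starts the new round green in that case.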

Tree-based Algorithm – Analysis
Number of rounds: if W < 2n, only O(n) messages are required. Tokens reduce by half in every round, so the number of rounds is O(log(W/n)).
Number of control messages per round: O(log n) control messages per color change. Whenever a color changes, some green node turns yellow, so there are O(n) color changes per round and O(n log n) control messages per round.
Total control messages = O(n log n log(W/n)).
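The round bound comes from repeated halving. The sketch below counts rounds in the worst case it assumes, where every round consumes exactly the half that the reset condition guarantees, stopping once fewer than 2n tokens remain (the O(n) end game). It then multiplies by n log n per round to mirror the tree-based estimate, with constants set to 1.

```python
import math


def worst_case_rounds(W, n):
    """Rounds until fewer than 2n tokens remain, if every round consumes only half."""
    rounds, remaining = 0, W
    while remaining >= 2 * n:
        remaining //= 2          # a reset leaves W' <= W/2 tokens
        rounds += 1
    return rounds


n, W = 1024, 2 ** 20
rounds = worst_case_rounds(W, n)
# Tree-based: O(n) color changes per round, each costing O(log n) control messages.
estimate = rounds * n * math.log2(n)
```

For these values the loop runs log2(W/n) = 10 times, matching the O(log(W/n)) round count.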

Centralized Algorithm
Idea: in the tree-based algorithm, every color change requires a search for a green node to split/swap tokens with, which takes O(log n) control messages. Can we find a green node with O(1) control messages? A master node (tail) maintains a list of all green nodes.
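The master's bookkeeping can be sketched as a set of green node ids that supports O(1) add, remove and pick, which is what lets a swap or split partner be found with O(1) control messages. Assumptions: node ids are hashable values such as integers, and the request/accept messaging is reduced to method calls; `GreenRegistry` is a hypothetical name.

```python
class GreenRegistry:
    """Master-side list of green nodes with O(1) add, remove and pick."""

    def __init__(self):
        self._nodes = []         # green node ids
        self._pos = {}           # node id -> index in self._nodes

    def add(self, node):
        if node not in self._pos:
            self._pos[node] = len(self._nodes)
            self._nodes.append(node)

    def remove(self, node):
        # Move the last entry into the removed slot, then pop: O(1).
        i = self._pos.pop(node)
        last = self._nodes.pop()
        if last != node:
            self._nodes[i] = last
            self._pos[last] = i

    def pick(self):
        """Return some green node to swap/split with, or None if there is none."""
        return self._nodes[-1] if self._nodes else None
```

A list plus an index map is used instead of a plain set so that `pick` is deterministic and every operation stays constant time.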

Centralized Algorithm - Example
Swap: the Swap Request goes through the Master, which knows the green nodes, and the swap completes with a Swap Accept.

Centralized Algorithm - Example
Split: the Split Request goes through the Master, and the split completes with a Split Accept.

Centralized Algorithm – Analysis
Number of rounds: if W < 2n, only O(n) messages are required. Tokens reduce by half in every round, so the number of rounds is O(log(W/n)).
Number of control messages per round: O(1) control messages per color change. Whenever a color changes, some green node turns yellow, so there are O(n) color changes per round and O(n) control messages per round.
Total control messages = O(n log(W/n)).

Lower Bound
Observation: suppose there are W outstanding tokens. Some process must generate a control message on receiving W/n white messages. Send W/n white messages to that processor; the remaining tokens are then (n-1)W/n. Repeat the argument recursively: the tokens remaining after i control messages are ≥ ((n-1)/n)^i · W, so the number of control messages is Ω(n log(W/n)).
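The last step of the argument can be made explicit. Assuming W > n and that the adversary can keep going until only about n tokens remain (below that, the O(n) end game applies), the number i of control messages must satisfy the following, using ln(n/(n-1)) = -ln(1 - 1/n) ≤ (1/n)/(1 - 1/n) = 1/(n-1):

```latex
\left(\frac{n-1}{n}\right)^{i} W \le n
\;\Longrightarrow\;
i \;\ge\; \frac{\ln(W/n)}{\ln\!\left(\frac{n}{n-1}\right)}
\;\ge\; (n-1)\,\ln\frac{W}{n}
\;=\; \Omega\!\left(n \log \frac{W}{n}\right)
```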

Experimental Results

Conclusions
Global snapshots in distributed systems and the distributed message counting problem.
Optimal algorithm: the centralized algorithm, with message complexity O(n log(W/n)) and a matching lower bound.
Open problem: a decentralized algorithm with the same complexity?

Thank You Questions?
