CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 6 Instructor: Haifeng YU.

Presentation transcript:

CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 6 Instructor: Haifeng YU

Slide 2: Review of Last Lecture
- Chapter 9 “Global Snapshot”
  - What is our goal?
  - Formalizing consistent global snapshot
  - Protocol for capturing a consistent global snapshot

Slide 3: Today’s Roadmap
- Chapter 12 “Message Ordering”
  - FIFO ordering (already discussed last lecture)
  - Causal ordering and its application
  - Total ordering and its application

Slide 4: An Internet Chat Room Example
[Figure: message diagram for Alice (process 1), Bob (process 2), and Cary (process 3).]
- Alice to Cary: “Introducing Bob to Cary”
- Alice to Bob: “Hi Bob, I just introduced you to Cary”
- Bob to Cary: “Hi Cary, this is Bob”
- Cary, on receiving Bob’s message before Alice’s introduction: “??? Sorry, I don’t know any Bob.”
- This motivates the notion of “causal order”.

Slide 5: Formalizing Causal Order
- If a “send” event s1 caused a “send” event s2:
  - Causal order requires the corresponding receive event r1 to be before r2 (r1 and r2 on the same process)
- But how do we know whether s1 caused s2?
  - Based on the “Happened-Before” relation
  - This is the only ordering observable to external users
- If a “send” event s1 happens before a “send” event s2:
  - Then s1 may have caused s2
  - Let’s be pessimistic and be safe: assume s1 indeed caused s2
- Causal order: If s1 happens before s2, and r1 and r2 are on the same process, then r1 must be before r2
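The causal-order condition can be checked mechanically once each send event has a timestamp that captures happened-before. The sketch below is only an illustration: it assumes vector timestamps (not introduced on this slide), and the names `happened_before`, `violates_causal_order`, and the message labels are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = Tuple[int, ...]


def happened_before(a: Vector, b: Vector) -> bool:
    """True if vector timestamp a is componentwise <= b and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def violates_causal_order(send_ts: Dict[str, Vector],
                          delivery_order: List[str]) -> bool:
    """Check the delivery order seen at ONE process.

    send_ts maps each message to the (assumed) vector timestamp of its
    send event. A violation occurs when a message is delivered after
    another message whose send it happened before.
    """
    for pos, later in enumerate(delivery_order):
        for earlier in delivery_order[:pos]:
            if happened_before(send_ts[later], send_ts[earlier]):
                return True
    return False


# Chat room example, messages addressed to Cary (timestamps are assumed):
# Alice's introduction is sent at (1,0,0); Bob's greeting at (2,2,0).
ts = {"intro": (1, 0, 0), "hello": (2, 2, 0)}
print(violates_causal_order(ts, ["hello", "intro"]))  # True
print(violates_causal_order(ts, ["intro", "hello"]))  # False
```

The check is pessimistic in the same sense as the slide: it treats every happened-before pair as a possible cause-and-effect pair.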

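Slide 6: How to Ensure Causal Ordering – Intuition
[Figure: message diagram for Alice (process 1), Bob (process 2), and Cary (process 3). Each message carries a postscript (PS) listing the earlier messages its sender knows about.]
- Alice to Cary: Message1 A to C
- Alice to Bob: Message1 A to B, with PS: Message1 A to C
- Bob to Cary: Message1 B to C, with PS: Message1 A to C, Message1 A to B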

Slide 7: How to Ensure Causal Ordering – Protocol
- Each process maintains an n*n matrix M
  - M[i,j] denotes the number of messages sent from i to j, as known by the local process
- If process i sends a message to process j:
  - On process i: M[i,j]++
  - Piggyback M on the message
- Upon process j receiving a message from process i with matrix M’ piggybacked:
  - If M[k,j] ≥ M’[k,j] for all k except k = i, AND M[i,j] == M’[i,j] - 1, then deliver the message and set M = pairwise-max(M, M’)
  - Else delay the message
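The protocol on this slide can be sketched in a few lines. This is a minimal single-threaded illustration, not the textbook's code: the class name `CausalProcess`, the 0-based process ids, and the `pending` list used to hold delayed messages are assumptions made for the example.

```python
from typing import Any, List, Tuple

Matrix = List[List[int]]
Message = Tuple[int, Matrix, Any]  # (sender id, piggybacked matrix, payload)


class CausalProcess:
    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.n = n
        # M[i][j]: number of messages sent from i to j, as known locally.
        self.M: Matrix = [[0] * n for _ in range(n)]
        self.pending: List[Message] = []   # delayed messages (assumed structure)
        self.delivered: List[Any] = []

    def send(self, dest: int, payload: Any) -> Message:
        self.M[self.pid][dest] += 1
        # Piggyback a copy of M so later local updates do not leak into it.
        return (self.pid, [row[:] for row in self.M], payload)

    def _deliverable(self, sender: int, W: Matrix) -> bool:
        j = self.pid
        if self.M[sender][j] != W[sender][j] - 1:
            return False
        return all(self.M[k][j] >= W[k][j]
                   for k in range(self.n) if k != sender)

    def receive(self, msg: Message) -> None:
        self.pending.append(msg)
        progress = True
        while progress:                    # re-scan after every delivery
            progress = False
            for m in list(self.pending):
                sender, W, payload = m
                if self._deliverable(sender, W):
                    self.pending.remove(m)
                    self.M = [[max(a, b) for a, b in zip(r1, r2)]
                              for r1, r2 in zip(self.M, W)]
                    self.delivered.append(payload)
                    progress = True


# Chat room scenario: Bob's greeting reaches Cary before Alice's introduction.
alice, bob, cary = (CausalProcess(i, 3) for i in range(3))
intro = alice.send(2, "intro")
note = alice.send(1, "told Cary")
bob.receive(note)
greeting = bob.send(2, "hi")
cary.receive(greeting)     # delayed: Cary has not yet seen Alice's message
cary.receive(intro)        # now both can be delivered, in causal order
print(cary.delivered)      # ['intro', 'hi']
```

Re-scanning the pending list after each delivery is one simple way to release delayed messages; a real implementation would likely index them to avoid the repeated scan.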

Slide 8: Example Run for the Protocol
[Figure: message diagram for Alice (process 1), Bob (process 2), and Cary (process 3), showing the matrix piggybacked on each message and the matrix held at the receiver. Labels recoverable from the figure: “Message1 B to C”; matrix rows (0,0,1), (0,0,0), (0,1,1); outcomes “can deliver” and “can’t deliver”.]

Slide 9: Correctness Proof of the Protocol
- (The textbook does not have an explicit correctness proof.)
- Consider s1, s2, r1, r2, where s1 happened before s2
- We want to prove that r1 is never delivered after r2
- Consider the case where s1 and s2 are on different processes
  - W.l.o.g., assume s1 on process 1, s2 on process 2, r1 and r2 on process 3
  - We focus on the first 3 elements of the 3rd column of the matrices

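Slide 10: Correctness Proof of the Protocol
[Figure: message diagram for Alice (process 1), Bob (process 2), and Cary (process 3), annotated with the first 3 elements of the 3rd column of the matrices along the causal chain from s1 to s2.]
- Columns along the chain: (x1, y1, 0), (x2, y2, 0), (x3, y3, 0), (x4, y4, 0)
- x2 ≥ x1, y2 ≥ y1
- x4 ≥ x3 ≥ x2, y4 > y3 ≥ y2
- The two messages arriving at Cary carry the columns (x1, y1, 0) and (x4, y4, 0)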

Slide 11: Correctness Proof of the Protocol (continued)
[Figure: Cary (process 3) receives the Blue message with column (x1, y1, 0) and the Red message with column (x4, y4, 0), where x4 ≥ x1 and y4 > y1.]
- Prove by contradiction: assume Red is delivered before Blue
- After delivering Red, the 3rd column of the matrix on process 3 will be elementwise ≥ (x4, y4, 0)
- The values in the matrix never decrease, so Blue can never be delivered: delivering Blue requires the first element of that column to equal x1 - 1, but it is now at least x4 ≥ x1

Slide 12: Need More Correctness Proof
- What we proved so far:
  - If s1 happened before s2, then r1 is never delivered after r2 is delivered
  - We don’t know if r1 and r2 will be delivered at all!
- We will now prove that:
  - At any given time, there must be one message that can be delivered to the process; induction will take care of the other messages

Slide 13: More Correctness Proof
- Consider a process and its corresponding column in the matrix: (x_1, x_2, …, x_n)
- If there is any message (from process i) undelivered to the process, then there must be messages whose matrix has the column (?, …, x_i + 1, …, ?)
- Call such messages successor messages

Slide 14: More Correctness Proof
- Consider the non-empty set of all successor messages and their corresponding “send” events
  - There must be a send event s that does not have any other send events that happened before s
  - Call the corresponding message a first successor message
- Claim: Any first successor message can be delivered
  - W.l.o.g., assume the message is from process 1
  - The matrix column on the receiver: (x_1, x_2, …, x_n)
  - The matrix column in the message: (x_1 + 1, y_2, …, y_n)
  - Need to show x_i ≥ y_i for i between 2 and n
  - W.l.o.g., assume i = 2

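Slide 15: More Correctness Proof
- Prove by contradiction: suppose y_2 > x_2
[Figure: message diagram for process 1, process 2, and the receiver process. The message from process 1 carries the column (x_1 + 1, y_2, …); the receiver holds the column (x_1, x_2, …).]
- After the last delivered message from process 2, there must be another undelivered message from process 2 to the receiver, since the sender already knew of y_2 > x_2 such messages
- The send of that undelivered message happened before the send of the message from process 1, so the message from process 1 is not a first successor message: contradiction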

Slide 16: Summary of Correctness Proof
- All messages will eventually be delivered
- The delivery order satisfies causal order

Slide 17: Causal Ordering of Broadcast Messages
- Broadcast: Every message is sent to all people (including the sender itself)
  - Modeled as n point-to-point messages
- Application: Internet chat room
- For the same reason as before, we may need to ensure causal order among messages
  - Each process uses the previous protocol

Slide 18: A Simpler Protocol for Causal Ordering Broadcast
- Each process maintains a message log for all messages delivered
- When sending a message, append the message to the log and send the whole log
- Upon receiving a log, scan the log sequentially; for any message not seen before, append the message to the local log and deliver it
- Can be used for point-to-point as well
- But inefficient
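A minimal sketch of the log-based protocol follows. The class name `LogProcess` and the `(sender name, counter)` message ids used to recognize messages already seen are assumptions; the slide does not say how duplicates are detected.

```python
from typing import List, Set, Tuple

MsgId = Tuple[str, int]            # (sender name, per-sender counter), assumed
Entry = Tuple[MsgId, str]          # (message id, text)


class LogProcess:
    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[Entry] = []     # messages in local delivery order
        self.seen: Set[MsgId] = set()
        self.counter = 0

    def broadcast(self, text: str) -> List[Entry]:
        """Append the new message to the log and return the whole log to send."""
        self.counter += 1
        msg_id = (self.name, self.counter)
        self.seen.add(msg_id)
        self.log.append((msg_id, text))
        return list(self.log)

    def receive(self, log: List[Entry]) -> None:
        """Scan the received log in order; deliver anything not seen before."""
        for msg_id, text in log:
            if msg_id not in self.seen:
                self.seen.add(msg_id)
                self.log.append((msg_id, text))


alice, bob, cary = LogProcess("alice"), LogProcess("bob"), LogProcess("cary")
log1 = alice.broadcast("intro")
bob.receive(log1)
log2 = bob.broadcast("hi")
cary.receive(log2)                 # Bob's log arrives first and already contains "intro"
cary.receive(log1)                 # nothing new
print([text for _, text in cary.log])   # ['intro', 'hi']
```

The inefficiency noted on the slide is visible here: every broadcast carries the sender's entire history.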

Slide 19: Total Ordering of Broadcast Messages
- All messages are delivered to all processes in exactly the same order
- Also called atomic broadcast
- Total ordering does not imply causal ordering
- Causal ordering does not imply total ordering
- Thus there is something called “causal total order”

Slide 20: Application: Internet Chat Room
- We want to assign numbers to what people have said
- The numbers have to be consistent across all users
  [0] Alice: Welcome!
  [1] Bob: Hello, Alice
  [2] Alice: Let’s try to prove P=NP
  [3] Bob: OK, how do we start?
  …
  [1712] Bob: I am confused, what were we trying to prove?
  [1713] Alice: Please refer to message [2]

Slide 21: Application: Replicated State Machine
- A state machine can be an abstraction of many things
  - E.g., a database
  - Typically deterministic
- Replicated state machine
  - Multiple copies of the same state machine for performance / availability
  - Each “instruction” is broadcast to all state machines
  - The state machines must execute the “instructions” in exactly the same order!
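A tiny example of why the order matters: two replicas that apply the same deterministic instructions in different orders can end in different states. The integer state and the `add` / `mul` instructions below are hypothetical, chosen only because they do not commute.

```python
from typing import List, Tuple

Instruction = Tuple[str, int]


def apply_all(initial: int, instructions: List[Instruction]) -> int:
    """Run a deterministic state machine whose state is a single integer."""
    state = initial
    for op, arg in instructions:
        if op == "add":
            state += arg
        elif op == "mul":
            state *= arg
        else:
            raise ValueError(f"unknown instruction: {op}")
    return state


instructions = [("add", 5), ("mul", 2)]
replica_a = apply_all(10, instructions)                   # (10 + 5) * 2
replica_b = apply_all(10, list(reversed(instructions)))   # 10 * 2 + 5
print(replica_a, replica_b)                               # 30 25
```

With total order broadcast, both replicas would see the same sequence and therefore reach the same state.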

Slide 22: Using a Coordinator for Total Order Broadcast
- A special process is assigned as the coordinator
- To broadcast a message:
  - Send the message to the coordinator
  - The coordinator assigns a sequence number to the message
  - The coordinator forwards the message to all processes with the sequence number
  - Messages are delivered according to sequence number order
- Problems:
  - The coordinator is a performance bottleneck
  - What if the coordinator fails?
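The coordinator scheme can be sketched as follows. The class names `Coordinator` and `Member`, and the hold-back buffer that waits for missing sequence numbers, are assumptions for the example; the slide only says that messages are delivered in sequence number order.

```python
from typing import Any, Dict, List, Tuple


class Coordinator:
    def __init__(self) -> None:
        self.next_seq = 0

    def order(self, payload: Any) -> Tuple[int, Any]:
        """Assign the next sequence number to a message."""
        seq = self.next_seq
        self.next_seq += 1
        return (seq, payload)


class Member:
    def __init__(self) -> None:
        self.next_expected = 0
        self.buffer: Dict[int, Any] = {}   # hold-back buffer (assumed)
        self.delivered: List[Any] = []

    def receive(self, seq: int, payload: Any) -> None:
        self.buffer[seq] = payload
        # Deliver every message whose predecessors have all been delivered.
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1


coordinator = Coordinator()
first = coordinator.order("a")
second = coordinator.order("b")

p, q = Member(), Member()
p.receive(*second)     # arrives out of order, held back
p.receive(*first)
q.receive(*first)
q.receive(*second)
print(p.delivered, q.delivered)   # ['a', 'b'] ['a', 'b']
```

Every broadcast passes through the single `Coordinator` object, which is exactly the bottleneck and single point of failure listed on the slide.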

Slide 23: Skeen’s Algorithm for Total Order Broadcast
- Each process maintains:
  - A logical clock and a message buffer for undelivered messages
- A message in the buffer is delivered / removed if:
  - All messages in the buffer have been assigned numbers
  - This message has the smallest number
[Figure: message diagram for process 1, process 2, and process 3, with three rounds: broadcast message, acknowledge, notify message number.]
- On receiving a broadcast message: put the message in the buffer and reply with the current logical clock value
- The sender picks the max clock value received as the message number
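The three rounds can be sketched as below. This is a simplified single-threaded illustration, not the textbook's code: the class name `SkeenProcess`, the rule that the clock is incremented before each reply, and the tie-break on message id when two messages receive the same number are all assumptions.

```python
from typing import Dict, List, Optional


class SkeenProcess:
    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.clock = 0
        # msg id -> assigned number, or None while the number is unknown.
        self.buffer: Dict[str, Optional[int]] = {}
        self.delivered: List[str] = []

    def on_broadcast(self, msg_id: str) -> int:
        """Round 1/2: buffer the message, reply with the logical clock value."""
        self.clock += 1                     # assumed: tick before replying
        self.buffer[msg_id] = None
        return self.clock

    def on_number(self, msg_id: str, number: int) -> None:
        """Round 3: record the message number chosen by the sender."""
        self.clock = max(self.clock, number)
        self.buffer[msg_id] = number
        self._try_deliver()

    def _try_deliver(self) -> None:
        # Deliver only while every buffered message has a number.
        while self.buffer and all(n is not None for n in self.buffer.values()):
            # Smallest number first; ties broken by message id (assumed).
            msg_id = min(self.buffer, key=lambda m: (self.buffer[m], m))
            del self.buffer[msg_id]
            self.delivered.append(msg_id)


p1, p2, p3 = SkeenProcess(1), SkeenProcess(2), SkeenProcess(3)

# Two concurrent broadcasts whose first round reaches processes in different orders.
replies_a = [p1.on_broadcast("A")]
replies_b = [p3.on_broadcast("B"), p2.on_broadcast("B")]
replies_a += [p2.on_broadcast("A"), p3.on_broadcast("A")]
replies_b += [p1.on_broadcast("B")]

number_a, number_b = max(replies_a), max(replies_b)   # sender picks the max

# Notifications also arrive in different orders at different processes.
p1.on_number("A", number_a); p1.on_number("B", number_b)
p2.on_number("B", number_b); p2.on_number("A", number_a)
p3.on_number("B", number_b); p3.on_number("A", number_a)

print(p1.delivered, p2.delivered, p3.delivered)   # ['A', 'B'] on all three
```

In this run both messages end up with the same number, so the delivery order is decided by the assumed tie-break; without some deterministic tie-break, equal numbers would leave the order ambiguous.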

Slide 24: Correctness Proof for Skeen’s Algorithm
- Claim: All messages will be assigned message numbers
- Claim: All messages will be delivered
- Claim: If message A has a number smaller than B, then B is delivered after A (proof by contradiction, continued on the next slide)
[Figure: same message diagram as on slide 23.]

Slide 25: Correctness Proof for Skeen’s Algorithm (continued)
- Suppose A is delivered on process 3 after B
- A must have been placed in the buffer after B was delivered (otherwise A, with its smaller number, would have been in the buffer when B was delivered)
- A must then have a number larger than B’s: contradiction
[Figure: timeline on process 3: B placed in buffer; B’s number received and B delivered; A placed in buffer after B is delivered; reply for A.]
- Key: when process 3 replies for A, its logical clock must be larger than B’s number

Slide 26: Summary
- Chapter 12 “Message Ordering”
  - FIFO ordering for point-to-point messages
    - Already discussed last lecture
  - Causal ordering for point-to-point messages
    - Applications
    - Protocol to ensure causal ordering
  - Causal ordering for broadcast messages
    - Protocol
  - Total ordering for broadcast messages
    - Application
    - Skeen’s algorithm

Slide 27: Homework Assignment
- Page 206, Problem
  - For each pair (x, y) of the three pairs (i.e., C1 & C2, C1 & C3, C2 & C3):
    - Prove x implies y and y implies x, OR
    - Prove x implies y, and give a counterexample that satisfies y but not x, OR
    - Prove y implies x, and give a counterexample that satisfies x but not y, OR
    - Give a counterexample that satisfies y but not x, and a second counterexample that satisfies x but not y
- Think about why the protocol on slide 18 will deliver all messages in causal order
- For the protocol on slide 22, think about whether the resulting total order satisfies causal order. Why?
- Homework due a week from today
- Read Chapter 13