Presentation is loading. Please wait.

Presentation is loading. Please wait.

Computer Science Lecture 10, page 1 CS677: Distributed OS Canonical Problems in Distributed Systems Time ordering and clock synchronization Leader election.

Similar presentations


Presentation on theme: "Computer Science Lecture 10, page 1 CS677: Distributed OS Canonical Problems in Distributed Systems Time ordering and clock synchronization Leader election."— Presentation transcript:

1 Computer Science Lecture 10, page 1 CS677: Distributed OS Canonical Problems in Distributed Systems Time ordering and clock synchronization Leader election Mutual exclusion Replication and consistency Fault Tolerance

2 Computer Science Lecture 10, page 2 CS677: Distributed OS Clock Synchronization Time in unambiguous in centralized systems –System clock keeps time, all entities use this for time Distributed systems: each node has own system clock –Crystal-based clocks are less accurate (1 part in million) –Problem: An event that occurred after another may be assigned an earlier time

3 Computer Science Lecture 10, page 3 CS677: Distributed OS Physical Clocks: A Primer Accurate clocks are atomic oscillators (one part in 10 13 ) Most clocks are less accurate (e.g., mechanical watches) –Computers use crystal-based blocks (one part in million) –Results in clock drift How do you tell time? –Use astronomical metrics (solar day) Coordinated universal time (UTC) – international standard based on atomic time –Add leap seconds to be consistent with astronomical time –UTC broadcast on radio (satellite and earth) –Receivers accurate to 0.1 – 10 ms Need to synchronize machines with a master or with one another

4 Computer Science Lecture 10, page 4 CS677: Distributed OS Cristian’s Algorithm Synchronize machines to a time server with a UTC receiver Machine P requests time from server every  seconds –Receives time t from server, P sets clock to t+t reply where t reply is the time to send reply to P –Use (t req +t reply )/2 as an estimate of t reply –Improve accuracy by making a series of measurements

5 Computer Science Lecture 10, page 5 CS677: Distributed OS Berkeley Algorithm Used in systems without UTC receiver –Keep clocks synchronized with one another –One computer is master, other are slaves –Master periodically polls slaves for their times Average times and return differences to slaves Communication delays compensated as in Cristian’s algo –Failure of master => election of a new master

6 Computer Science Lecture 10, page 6 CS677: Distributed OS Berkeley Algorithm a)The time daemon asks all the other machines for their clock values b)The machines answer c)The time daemon tells everyone how to adjust their clock

7 Computer Science Lecture 10, page 7 CS677: Distributed OS Distributed Approaches Both approaches studied thus far are centralized Decentralized algorithms: use resync intervals –Broadcast time at the start of the interval –Collect all other broadcast that arrive in a period S –Use average value of all reported times –Can throw away few highest and lowest values Approaches in use today –rdate: synchronizes a machine with a specified machine –Network Time Protocol (NTP) Uses advanced techniques for accuracies of 1-50 ms

8 Computer Science Lecture 10, page 8 CS677: Distributed OS Logical Clocks For many problems, internal consistency of clocks is important –Absolute time is less important –Use logical clocks Key idea: –Clock synchronization need not be absolute –If two machines do not interact, no need to synchronize them –More importantly, processes need to agree on the order in which events occur rather than the time at which they occurred

9 Computer Science Lecture 10, page 9 CS677: Distributed OS Event Ordering Problem: define a total ordering of all events that occur in a system Events in a single processor machine are totally ordered In a distributed system: –No global clock, local clocks may be unsynchronized –Can not order events on different machines using local times Key idea [Lamport ] –Processes exchange messages –Message must be sent before received –Send/receive used to order events (and synchronize clocks)

10 Computer Science Lecture 10, page 10 CS677: Distributed OS Happened Before Relation If A and B are events in the same process and A executed before B, then A -> B If A represents sending of a message and B is the receipt of this message, then A -> B Relation is transitive: –A -> B and B -> C => A -> C Relation is undefined across processes that do not exhange messages –Partial ordering on events

11 Computer Science Lecture 10, page 11 CS677: Distributed OS Event Ordering Using HB Goal: define the notion of time of an event such that –If A-> B then C(A) < C(B) –If A and B are concurrent, then C(A) C(B) Solution: –Each processor maintains a logical clock LC i –Whenever an event occurs locally at I, LC i = LC i +1 –When i sends message to j, piggyback Lc i –When j receives message from i If LC j < LC i then LC j = LC i +1 else do nothing –Claim: this algorithm meets the above goals

12 Computer Science Lecture 10, page 12 CS677: Distributed OS Lamport’s Logical Clocks

13 Computer Science Lecture 10, page 13 CS677: Distributed OS Causality Lamport’s logical clocks –If A -> B then C(A) < C(B) –Reverse is not true!! Nothing can be said about events by comparing time-stamps! If C(A) < C(B), then ?? Need to maintain causality –Causal delivery:If send(m) -> send(n) => deliver(m) -> deliver(n) –Capture causal relationships between groups of processes –Need a time-stamping mechanism such that: If T(A) < T(B) then A should have causally preceded B

14 Computer Science Lecture 10, page 14 CS677: Distributed OS Vector Clocks Each process i maintains a vector V i –V i [i] : number of events that have occurred at i –V i [j] : number of events I knows have occurred at process j Update vector clocks as follows –Local event: increment V i [I] –Send a message :piggyback entire vector V –Receipt of a message: V j [k] = max( V j [k],V i [k] ) Receiver is told about how many events the sender knows occurred at another process k Also V j [i] = V j [i]+1

15 Computer Science Lecture 10, page 15 CS677: Distributed OS Election Algorithms Many distributed algorithms need one process to act as coordinator –Doesn’t matter which process does the job, just need to pick one Election algorithms: technique to pick a unique coordinator (aka leader election) Examples: take over the role of a failed process, pick a master in Berkeley clock synchronization algorithm Types of election algorithms: Bully and Ring algorithms

16 Computer Science Lecture 10, page 16 CS677: Distributed OS Bully Algorithm Each process has a unique numerical ID Processes know the Ids and address of every other process Communication is assumed reliable Key Idea: select process with highest ID Process initiates election if it just recovered from failure or if coordinator failed 3 message types: election, OK, I won Several processes can initiate an election simultaneously –Need consistent result O(n 2 ) messages required with n processes

17 Computer Science Lecture 10, page 17 CS677: Distributed OS Bully Algorithm Details Any process P can initiate an election P sends Election messages to all process with higher Ids and awaits OK messages If no OK messages, P becomes coordinator and sends I won messages to all process with lower Ids If it receives an OK, it drops out and waits for an I won If a process receives an Election msg, it returns an OK and starts an election If a process receives a I won, it treats sender an coordinator

18 Computer Science Lecture 10, page 18 CS677: Distributed OS Bully Algorithm Example The bully election algorithm Process 4 holds an election Process 5 and 6 respond, telling 4 to stop Now 5 and 6 each hold an election

19 Computer Science Lecture 10, page 19 CS677: Distributed OS Bully Algorithm Example d)Process 6 tells 5 to stop e)Process 6 wins and tells everyone

20 Computer Science Lecture 10, page 20 CS677: Distributed OS Ring-based Election Processes have unique Ids and arranged in a logical ring Each process knows its neighbors –Select process with highest ID Begin election if just recovered or coordinator has failed Send Election to closest downstream node that is alive –Sequentially poll each successor until a live node is found Each process tags its ID on the message Initiator picks node with highest ID and sends a coordinator message Multiple elections can be in progress –Wastes network bandwidth but does no harm

21 Computer Science Lecture 10, page 21 CS677: Distributed OS A Ring Algorithm Election algorithm using a ring.

22 Computer Science Lecture 10, page 22 CS677: Distributed OS Comparison Assume n processes and one election in progress Bully algorithm –Worst case: initiator is node with lowest ID Triggers n-2 elections at higher ranked nodes: O(n 2 ) msgs –Best case: immediate election: n-2 messages Ring –2 (n-1) messages always

23 Computer Science Lecture 10, page 23 CS677: Distributed OS Distributed Synchronization Distributed system with multiple processes may need to share data or access shared data structures –Use critical sections with mutual exclusion Single process with multiple threads –Semaphores, locks, monitors How do you do this for multiple processes in a distributed system? –Processes may be running on different machines Solution: lock mechanism for a distributed environment –Can be centralized or distributed

24 Computer Science Lecture 10, page 24 CS677: Distributed OS Centralized Mutual Exclusion Assume processes are numbered One process is elected coordinator (highest ID process) Every process needs to check with coordinator before entering the critical section To obtain exclusive access: send request, await reply To release: send release message Coordinator: –Receive request: if available and queue empty, send grant; if not, queue request –Receive release: remove next request from queue and send grant

25 Computer Science Lecture 10, page 25 CS677: Distributed OS Mutual Exclusion: A Centralized Algorithm a)Process 1 asks the coordinator for permission to enter a critical region. Permission is granted b)Process 2 then asks permission to enter the same critical region. The coordinator does not reply. c)When process 1 exits the critical region, it tells the coordinator, when then replies to 2

26 Computer Science Lecture 10, page 26 CS677: Distributed OS Properties Simulates centralized lock using blocking calls Fair: requests are granted the lock in the order they were received Simple: three messages per use of a critical section (request, grant, release) Shortcomings: –Single point of failure –How do you detect a dead coordinator? A process can not distinguish between “lock in use” from a dead coordinator –No response from coordinator in either case –Performance bottleneck in large distributed systems

27 Computer Science Lecture 10, page 27 CS677: Distributed OS Distributed Algorithm [Ricart and Agrawala]: needs 2(n-1) messages Based on event ordering and time stamps Process k enters critical section as follows – Generate new time stamp TS k = TS k +1 –Send request(k,TS k ) all other n-1 processes –Wait until reply(j) received from all other processes –Enter critical section Upon receiving a request message, process j –Sends reply if no contention –If already in critical section, does not reply, queue request –If wants to enter, compare TS j with TS k and send reply if TS k <TS j, else queue

28 Computer Science Lecture 10, page 28 CS677: Distributed OS A Distributed Algorithm a)Two processes want to enter the same critical region at the same moment. b)Process 0 has the lowest timestamp, so it wins. c)When process 0 is done, it sends an OK also, so 2 can now enter the critical region.

29 Computer Science Lecture 10, page 29 CS677: Distributed OS Properties Fully decentralized N points of failure! All processes are involved in all decisions –Any overloaded process can become a bottleneck

30 Computer Science Lecture 10, page 30 CS677: Distributed OS A Token Ring Algorithm a)An unordered group of processes on a network. b)A logical ring constructed in software. Use a token to arbitrate access to critical section Must wait for token before entering CS Pass the token to neighbor once done or if not interested Detecting token loss in not-trivial

31 Computer Science Lecture 10, page 31 CS677: Distributed OS Comparison A comparison of three mutual exclusion algorithms. Algorithm Messages per entry/exit Delay before entry (in message times) Problems Centralized32Coordinator crash Distributed2 ( n – 1 ) Crash of any process Token ring 1 to  0 to n – 1 Lost token, process crash

32 Computer Science Lecture 10, page 32 CS677: Distributed OS Consistency and Replication Data replication: common technique in distributed systems Reasons for data replication: Reliability –If one replica is unavailable or crashes, use another –Protect against corrupted data Performance –Scale with size of the distributed system (replicated web servers) –Scale in geographically distributed systems (web proxies) Key issue: need to maintain consistency of replicated data –If one copy is modified, others become inconsistent

33 Computer Science Lecture 10, page 33 CS677: Distributed OS Replication and Scaling Replication and caching used for system scalability Multiple copies: –Improves performance by reducing access latency –But higher network overheads of maintaining consistency –Example: object is replicated N times Read frequency R, write frequency W If R<<W, high consistency overhead and wasted messages Consistency maintenance is itself an issue –What semantics to provide? –Tight consistency requires globally synchronized clocks! Solution: loosen consistency requirements –Variety of consistency semantics possible

34 Computer Science Lecture 10, page 34 CS677: Distributed OS Consistency Models Consistency model (aka consistency semantics) –Contract between processes and the data store If processes obey certain rules, data store will work correctly –All models attempt to return the results of the last write for a read operation Differ in how “last” write is determined/defined

35 Computer Science Lecture 10, page 35 CS677: Distributed OS Strict Consistency Any read always returns the result of the most recent write –Implicitly assumes the presence of a global clock –A write is immediately visible to all processes Difficult to achieve in real systems (network delays can be variable)

36 Computer Science Lecture 10, page 36 CS677: Distributed OS Sequential Consistency Sequential consistency: weaker than strict consistency –Assumes all operations are executed in some sequential order and each process issues operations in program order Any valid interleaving is allowed All agree on the same interleaving Each process preserves its program order Nothing is said about “most recent write”

37 Computer Science Lecture 10, page 37 CS677: Distributed OS Sequential Consistency Example P1: x = 1;print ( y, z); P2: y = 1; print (x, z); P3: z = 1;print (x, y); Four valid execution sequences for the processes of the previous slide. The vertical axis is time. x = 1; print ((y, z); y = 1; print (x, z); z = 1; print (x, y); Prints: 001011 Signature: 001011 (a) x = 1; y = 1; print (x,z); print(y, z); z = 1; print (x, y); Prints: 101011 Signature: 101011 (b) y = 1; z = 1; print (x, y); print (x, z); x = 1; print (y, z); Prints: 010111 Signature: 110101 (c) y = 1; x = 1; z = 1; print (x, z); print (y, z); print (x, y); Prints: 111111 Signature: 111111 (d)

38 Computer Science Lecture 10, page 38 CS677: Distributed OS Causal consistency Causally related writes must be seen by all processes in the same order. –Concurrent writes may be seen in different orders on different machines Not permittedPermitted

39 Computer Science Lecture 10, page 39 CS677: Distributed OS FIFO Consistency FIFO consistency: writes from a process are seen by others in the same order. Writes from different processes may be seen in different order (even if causally related) Even FIFO consistency may be too strong! –Requires all writes from a process be seen in order –There are other less restrictive models

40 Computer Science Lecture 10, page 40 CS677: Distributed OS Replica Placement Permanent replicas (mirroring) Server-initiated replicas (push caching) Client-initiated replicas (pull/client caching)

41 Computer Science Lecture 10, page 41 CS677: Distributed OS Web Caching Example of the web to illustrate caching and replication issues –Simpler model: clients are read-only, only server updates data browser Web Proxy cache request response request response Web server browser Web server request response

42 Computer Science Lecture 10, page 42 CS677: Distributed OS Consistency Issues Web pages tend to be updated over time –Some objects are static, others are dynamic –Different update frequencies (few minutes to few weeks) How can a proxy cache maintain consistency of cached data? –Send invalidate or update –Push versus pull

43 Computer Science Lecture 10, page 43 CS677: Distributed OS Push-based Approach Server tracks all proxies that have requested objects If a web page is modified, notify each proxy Notification types –Indicate object has changed [invalidate] –Send new version of object [update] How to decide between invalidate and updates? –Pros and cons? –One approach: send updates for more frequent objects, invalidate for rest proxy Web server push

44 Computer Science Lecture 10, page 44 CS677: Distributed OS Push-based Approaches Advantages –Provide tight consistency [minimal stale data] –Proxies can be passive Disadvantages –Need to maintain state at the server Recall that HTTP is stateless Need mechanisms beyond HTTP –State may need to be maintained indefinitely Not resilient to server crashes

45 Computer Science Lecture 10, page 45 CS677: Distributed OS Pull-based Approaches Proxy is entirely responsible for maintaining consistency Proxy periodically polls the server to see if object has changed –Use if-modified-since HTTP messages Key question: when should a proxy poll? –Server-assigned Time-to-Live (TTL) values No guarantee if the object will change in the interim proxy Web server poll response

46 Computer Science Lecture 10, page 46 CS677: Distributed OS Pull-based Approach: Intelligent Polling Proxy can dynamically determine the refresh interval –Compute based on past observations Start with a conservative refresh interval Increase interval if object has not changed between two successive polls Decrease interval if object is updated between two polls Adaptive: No prior knowledge of object characteristics needed

47 Computer Science Lecture 10, page 47 CS677: Distributed OS Pull-based Approach Advantages –Implementation using HTTP (If-modified-Since) –Server remains stateless –Resilient to both server and proxy failures Disadvantages –Weaker consistency guarantees (objects can change between two polls and proxy will contain stale data until next poll) Strong consistency only if poll before every HTTP response –More sophisticated proxies required –High message overhead

48 Computer Science Lecture 10, page 48 CS677: Distributed OS A Hybrid Approach: Leases Lease: duration of time for which server agrees to notify proxy of modification Issue lease on first request, send notification until expiry –Need to renew lease upon expiry Smooth tradeoff between state and messages exchanged –Zero duration => polling, Infinite leases => server-push Efficiency depends on the lease duration ClientProxy Server Get + lease req Reply + lease read Invalidate/update

49 Computer Science Lecture 10, page 49 CS677: Distributed OS Policies for Leases Duration Age-based lease –Based on bi-modal nature of object lifetimes –Larger the expected lifetime longer the lease Renewal-frequency based –Based on skewed popularity –Proxy at which objects is popular gets longer lease Server load based –Based on adaptively controlling the state space –Shorter leases during heavy load

50 Computer Science Lecture 10, page 50 CS677: Distributed OS Fault Tolerance Basic concepts in fault tolerance Masking failure by redundancy Process resilience

51 Computer Science Lecture 10, page 51 CS677: Distributed OS Motivation Single machine systems –Failures are all or nothing OS crash, disk failures Distributed systems: multiple independent nodes –Partial failures are also possible (some nodes fail) Question: Can we automatically recover from partial failures? –Important issue since probability of failure grows with number of independent components (nodes) in the systems –Prob(failure) = Prob(Any one component fails)=1-P(no failure)

52 Computer Science Lecture 10, page 52 CS677: Distributed OS A Perspective Computing systems are not very reliable –OS crashes frequently (Windows), buggy software, unreliable hardware, software/hardware incompatibilities –Until recently: computer users were “tech savvy” Could depend on users to reboot, troubleshoot problems –Growing popularity of Internet/World Wide Web “Novice” users Need to build more reliable/dependable systems –Example: what is your TV (or car) broke down every day? Users don’t want to “restart” TV or fix it (by opening it up) Need to make computing systems more reliable

53 Computer Science Lecture 10, page 53 CS677: Distributed OS Basic Concepts Need to build dependable systems Requirements for dependable systems –Availability: system should be available for use at any given time 99.999 % availability (five 9s) => very small down times –Reliability: system should run continuously without failure –Safety: temporary failures should not result in a catastrophic Example: computing systems controlling an airplane, nuclear reactor –Maintainability: a failed system should be easy to repair

54 Computer Science Lecture 10, page 54 CS677: Distributed OS Basic Concepts (contd) Fault tolerance: system should provide services despite faults –Transient faults –Intermittent faults –Permanent faults

55 Computer Science Lecture 10, page 55 CS677: Distributed OS Failure Models Different types of failures. Type of failureDescription Crash failureA server halts, but is working correctly until it halts Omission failure Receive omission Send omission A server fails to respond to incoming requests A server fails to receive incoming messages A server fails to send messages Timing failureA server's response lies outside the specified time interval Response failure Value failure State transition failure The server's response is incorrect The value of the response is wrong The server deviates from the correct flow of control Arbitrary failureA server may produce arbitrary responses at arbitrary times

56 Computer Science Lecture 10, page 56 CS677: Distributed OS Failure Masking by Redundancy Triple modular redundancy.

57 Computer Science Lecture 10, page 57 CS677: Distributed OS Process Resilience Handling faulty processes: organize several processes into a group –All processes perform same computation –All messages are sent to all members of the group –Majority need to agree on results of a computation –Ideally want multiple, independent implementations of the application (to prevent identical bugs) Use process groups to organize such processes

58 Computer Science Lecture 10, page 58 CS677: Distributed OS Flat Groups versus Hierarchical Groups Advantages and disadvantages?

59 Computer Science Lecture 10, page 59 CS677: Distributed OS Agreement in Faulty Systems How should processes agree on results of a computation? K-fault tolerant: system can survive k faults and yet function Assume processes fail silently –Need (k+1) redundancy to tolerant k faults Byzantine failures: processes run even if sick –Produce erroneous, random or malicious replies Byzantine failures are most difficult to deal with –Need ? Redundancy to handle Byzantine faults

60 Computer Science Lecture 10, page 60 CS677: Distributed OS Byzantine Faults Simplified scenario: two perfect processes with unreliable channel –Need to reach agreement on a 1 bit message Two army problem: Two armies waiting to attack –Each army coordinates with a messenger –Messenger can be captured by the hostile army –Can generals reach agreement? –Property: Two perfect process can never reach agreement in presence of unreliable channel Byzantine generals problem: Can N generals reach agreement with a perfect channel? –M generals out of N may be traitors

61 Computer Science Lecture 10, page 61 CS677: Distributed OS Byzantine Generals Problem Recursive algorithm by Lamport The Byzantine generals problem for 3 loyal generals and 1 traitor. a)The generals announce their troop strengths (in units of 1 kilosoldiers). b)The vectors that each general assembles based on (a) c)The vectors that each general receives in step 3.

62 Computer Science Lecture 10, page 62 CS677: Distributed OS Byzantine Generals Problem Example The same as in previous slide, except now with 2 loyal generals and one traitor. Property: With m faulty processes, agreement is possible only if 2m+1 processes function correctly [Lamport 82] –Need more than two-thirds processes to function correctly


Download ppt "Computer Science Lecture 10, page 1 CS677: Distributed OS Canonical Problems in Distributed Systems Time ordering and clock synchronization Leader election."

Similar presentations


Ads by Google