1 Fault Tolerance CSCI 4780/6780

2 RPC Semantics in the Presence of Failures
Five types of exceptions:
  – The client cannot locate the server
  – The request to the server is lost
  – The server crashes after receiving the request
  – The reply message from the server is lost
  – The client crashes after sending the request

3 Not Locating the Server
Causes:
  – The server might be down
  – Version mismatch between the client and server stubs
Possible solution:
  – Raise an exception. Drawbacks:
      – It relies on the programming language to solve a systems problem
      – Not all languages have exceptions
      – Transparency is compromised
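
The exception-based approach can be sketched in Python as below. This is a minimal sketch under stated assumptions. It uses a hypothetical in-memory binder table (`BINDER`) that maps a service name to a server address and stub version. A missing entry stands in for a server that is down. `ServerNotFound`, `bind`, and the version numbers are illustrative names, not part of the slides.

```python
class ServerNotFound(Exception):
    """Raised by the client stub when no compatible server can be bound."""


# Hypothetical binder table: service name -> (server address, stub version).
BINDER = {
    "print": ("10.0.0.5:7000", 2),
}


def bind(service, client_stub_version):
    """Return the server address, or raise if the server is down
    (modeled as unregistered) or its stub version differs from the client's."""
    entry = BINDER.get(service)
    if entry is None:
        raise ServerNotFound(f"no server registered for {service!r}")
    address, server_version = entry
    if server_version != client_stub_version:
        raise ServerNotFound(
            f"stub version mismatch for {service!r}: "
            f"client {client_stub_version}, server {server_version}"
        )
    return address


if __name__ == "__main__":
    print(bind("print", 2))
    try:
        bind("print", 1)
    except ServerNotFound as exc:
        # The caller must handle this explicitly, unlike a local call.
        print("RPC failed:", exc)
```

Because `bind` raises instead of returning, every caller needs exception handling that a local call would never require. This is the loss of transparency the slide mentions. A language without exceptions would have to fall back on error codes.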

4 Lost Request Messages
The easiest case to handle:
  – Use timers
  – Retransmit the request on timeout
  – Detect duplicates at the server end
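
The timer-and-retransmission scheme can be sketched in Python as below. It is a simulation under stated assumptions, with no real sockets. A `None` returned by the hypothetical `LossyNetwork.send` stands in for the client's timer expiring. The request ID reused on every retransmission is what the server uses for duplicate detection.

```python
import itertools


class Server:
    """Executes each request ID once; retransmissions get the cached reply."""

    def __init__(self):
        self.replies = {}  # request_id -> reply already computed
        self.executions = 0

    def handle(self, request_id, payload):
        if request_id in self.replies:  # duplicate detected at the server end
            return self.replies[request_id]
        self.executions += 1
        reply = payload.upper()  # stand-in for the real remote operation
        self.replies[request_id] = reply
        return reply


class LossyNetwork:
    """Hypothetical network that drops the first few requests and/or replies."""

    def __init__(self, server, drop_requests=0, drop_replies=0):
        self.server = server
        self.drop_requests = drop_requests
        self.drop_replies = drop_replies

    def send(self, request_id, payload):
        if self.drop_requests > 0:
            self.drop_requests -= 1
            return None  # request lost: the client's timer will expire
        reply = self.server.handle(request_id, payload)
        if self.drop_replies > 0:
            self.drop_replies -= 1
            return None  # executed, but the reply was lost
        return reply


_request_ids = itertools.count(1)


def call(network, payload, max_attempts=5):
    """Retransmit on timeout, reusing the same request ID each time."""
    request_id = next(_request_ids)
    for attempt in range(1, max_attempts + 1):
        reply = network.send(request_id, payload)
        if reply is not None:
            return reply, attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts")


if __name__ == "__main__":
    server = Server()
    net = LossyNetwork(server, drop_requests=2, drop_replies=1)
    print(call(net, "hello"), server.executions)  # ('HELLO', 4) 1
```

Duplicate detection matters because a client whose timer fires cannot tell whether the request or the reply was lost. In the example, the third attempt reaches the server but its reply is dropped. The fourth attempt is therefore answered from the cache instead of being executed again.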

5 Server Crashes
The server can crash either before executing the request or after executing it (but before sending the reply).
  – A crash after execution needs to be reported to the client
  – A crash before execution can be handled by retransmission
  – The client’s OS cannot distinguish between the two cases

6 Server Crashes
[Figure: A server in client-server communication. (a) Normal case. (b) Crash after execution. (c) Crash before execution.]

7 Handling Server Crashes
Wait until the server reboots and try again
  – At-least-once semantics
Give up immediately and report the failure
  – At-most-once semantics
Guarantee nothing
What is actually needed is exactly-once semantics.
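
The two retry policies can be contrasted with a small Python simulation. The `CrashingServer` class is hypothetical. It crashes exactly once, either before or after doing the work, and is assumed to be back up for the next attempt. A `ConnectionError` stands in for the client noticing the crash.

```python
class CrashingServer:
    """Hypothetical server that crashes once, before or after executing."""

    def __init__(self, crash_after_execution):
        self.crash_after_execution = crash_after_execution
        self.crashed_once = False
        self.executions = 0

    def invoke(self):
        if not self.crashed_once:
            self.crashed_once = True  # the server reboots after this crash
            if self.crash_after_execution:
                self.executions += 1  # work done, but no reply is sent
            raise ConnectionError("server crashed")
        self.executions += 1
        return "done"


def at_least_once(server, max_attempts=10):
    """Wait for the server to reboot and keep retrying until a reply arrives."""
    for _ in range(max_attempts):
        try:
            return server.invoke()
        except ConnectionError:
            continue  # assume the server has rebooted; try again
    raise RuntimeError(f"gave up after {max_attempts} attempts")


def at_most_once(server):
    """Give up immediately and report the failure to the caller."""
    try:
        return server.invoke()
    except ConnectionError:
        return "failed"


if __name__ == "__main__":
    for after in (True, False):
        s1, s2 = CrashingServer(after), CrashingServer(after)
        at_least_once(s1)
        at_most_once(s2)
        print(f"crash after execution={after}: "
              f"at-least-once ran {s1.executions}x, at-most-once ran {s2.executions}x")
```

Running both policies against both crash timings shows why neither is enough. At-least-once performs the operation twice when the crash came after execution. At-most-once performs it zero times when the crash came before execution.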

8 Server and Client Strategies
Server strategies
  – Send the completion message before performing the operation
  – Send the completion message after performing the operation
Client strategies
  – Never reissue a request
  – Always reissue a request
  – Reissue a request only if no acknowledgement was received
  – Reissue a request only if an acknowledgement was received
The client never knows the exact sequence of events around the crash.
Server failures change RPC fundamentally.

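9 Server Crash Scenarios
M: the server sends the completion message; P: the server performs the operation; C: the server crashes. Events in parentheses never happen because the server crashed first.
  – M -> P -> C
  – M -> C -> (P)
  – C -> (M -> P)
  – P -> M -> C
  – P -> C -> (M)
  – C -> (P -> M)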

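10 Server Crashes
Different combinations of client and server strategies in the presence of server crashes.

                         Server strategy M -> P  Server strategy P -> M
Client reissue strategy  MPC     MC(P)   C(MP)   PMC     PC(M)   C(PM)
Always                   DUP     OK      OK      DUP     DUP     OK
Never                    OK      ZERO    ZERO    OK      OK      ZERO
Only when ACKed          DUP     OK      ZERO    DUP     OK      ZERO
Only when not ACKed      OK      ZERO    OK      OK      DUP     OK

OK: the operation is performed exactly once; DUP: it is performed twice; ZERO: it is not performed at all.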
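
The table follows mechanically from the strategies, so it can be regenerated with a short Python enumeration. The sketch makes the same two assumptions the table does. First, the client learns only whether the completion message (M) arrived. Second, a reissued request runs to completion once on the rebooted server. The names `SERVER_STRATEGIES`, `CLIENT_STRATEGIES`, `outcome`, and `build_table` are illustrative.

```python
# M = server sends the completion message, P = server performs the operation,
# C = server crashes; events in parentheses never happen.
SERVER_STRATEGIES = {"M -> P": ("M", "P"), "P -> M": ("P", "M")}

# Client reissue strategies, as a function of "did the ACK (M) arrive?"
CLIENT_STRATEGIES = {
    "Always": lambda acked: True,
    "Never": lambda acked: False,
    "Only when ACKed": lambda acked: acked,
    "Only when not ACKed": lambda acked: not acked,
}

RESULT = {0: "ZERO", 1: "OK", 2: "DUP"}  # times the operation was performed


def scenario_label(order, done_before_crash):
    """E.g. order ('M', 'P') with 1 event done before the crash -> 'MC(P)'."""
    done = "".join(order[:done_before_crash])
    lost = "".join(order[done_before_crash:])
    return done + "C" + (f"({lost})" if lost else "")


def outcome(order, done_before_crash, reissue):
    done = order[:done_before_crash]
    acked = "M" in done  # client received the completion message
    performed = int("P" in done)  # operation ran before the crash
    # A reissued request is assumed to run once on the rebooted server.
    return RESULT[performed + int(reissue(acked))]


def build_table():
    columns = [(order, k) for order in SERVER_STRATEGIES.values() for k in (2, 1, 0)]
    header = [scenario_label(order, k) for order, k in columns]
    rows = {
        name: [outcome(order, k, rule) for order, k in columns]
        for name, rule in CLIENT_STRATEGIES.items()
    }
    return header, rows


if __name__ == "__main__":
    header, rows = build_table()
    print(" " * 20 + "".join(f"{h:7}" for h in header))
    for name, cells in rows.items():
        print(f"{name:20}" + "".join(f"{c:7}" for c in cells))
```

The output matches the slide's table, and no row is OK in every column. Every client strategy yields a duplicated or lost operation for some crash timing, so a retransmission policy alone cannot give exactly-once semantics.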

11 Lost Reply Messages
Timer at the client
  – The client cannot tell whether the reply was lost or the server is just slow
Idempotent operations (repeating them has the same effect as executing them once)
  – Can all operations be made idempotent?
Sequence numbers in requests
  – The server refuses to perform a duplicate request
  – The server must maintain state for each client
A bit in the request to distinguish retransmissions from original requests
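
Sequence-number-based duplicate suppression can be sketched in Python as follows. The sketch assumes each client has at most one outstanding request and numbers its requests 1, 2, 3, and so on. The `Account` class is hypothetical. Its `deposit` stands in for a non-idempotent operation, while `set_balance` is idempotent and needs no bookkeeping.

```python
class Account:
    """Hypothetical server object with a non-idempotent operation (deposit)."""

    def __init__(self):
        self.balance = 0
        # Per-client state: client_id -> (last sequence number, its reply).
        self.last_seen = {}

    def deposit(self, client_id, seq, amount):
        last = self.last_seen.get(client_id)
        # With one outstanding request per client, seq <= last is a duplicate.
        if last is not None and seq <= last[0]:
            return last[1]  # refuse to execute again; resend the old reply
        self.balance += amount  # executing this twice would double the deposit
        self.last_seen[client_id] = (seq, self.balance)
        return self.balance

    def set_balance(self, value):
        # Idempotent: repeating it leaves the same state, so retries are safe.
        self.balance = value
        return self.balance


if __name__ == "__main__":
    account = Account()
    account.deposit("c1", 1, 100)
    account.deposit("c1", 1, 100)  # retransmission after a lost reply
    print(account.balance)  # 100, not 200
```

The per-client `last_seen` table is the server state the slide refers to. It grows with the number of clients, and in this in-memory sketch it would be lost if the server itself crashed.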

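12 Client Crashes
Can lead to orphans (computations still running on behalf of a client that has crashed)
  – Wasted resources
  – Confusion after the client reboots
Extermination with logging
  – Maintain logs of RPC calls on stable storage
  – Explicitly terminate orphans after the client reboots
  – Logging is expensive
  – Grand-orphans (orphans created by orphans) are hard to find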
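
Extermination with logging can be sketched in Python as below. The classes are hypothetical. `OrphanServer` merely records which computations are running. `LoggingClient` appends a JSON line to a log file, standing in for stable storage, before each call and after each reply. The `os.fsync` on every entry is where the expense of logging shows up.

```python
import json
import os
import tempfile


class OrphanServer:
    """Hypothetical server that tracks computations run on behalf of clients."""

    def __init__(self):
        self.running = {}  # rpc_id -> description of the computation

    def start(self, rpc_id, description):
        self.running[rpc_id] = description

    def finish(self, rpc_id):
        self.running.pop(rpc_id, None)

    def kill(self, rpc_id):
        """Explicitly terminate an orphan; True if one was running."""
        return self.running.pop(rpc_id, None) is not None


class LoggingClient:
    """Client stub that logs every RPC to stable storage before sending it."""

    def __init__(self, log_path, server):
        self.log_path = log_path
        self.server = server

    def _append(self, rpc_id, event):
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"rpc": rpc_id, "event": event}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # one forced disk write per logged event

    def call(self, rpc_id, description):
        self._append(rpc_id, "sent")  # log first, then send
        self.server.start(rpc_id, description)

    def on_reply(self, rpc_id):
        self._append(rpc_id, "done")

    def recover(self):
        """After a reboot, kill every RPC that was sent but never answered."""
        pending = set()
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    entry = json.loads(line)
                    if entry["event"] == "sent":
                        pending.add(entry["rpc"])
                    else:
                        pending.discard(entry["rpc"])
        killed = []
        for rpc in sorted(pending):
            if self.server.kill(rpc):
                killed.append(rpc)
        return killed


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "rpc.log")
    server = OrphanServer()
    client = LoggingClient(path, server)
    client.call(1, "print report")
    client.call(2, "print invoice")
    server.finish(1)
    client.on_reply(1)
    # The client crashes here; after reboot a new stub reads the same log.
    print(LoggingClient(path, server).recover())  # [2]
```

The sketch only kills computations the server was asked to run directly. Any further RPCs an orphan issued to other machines (grand-orphans) never appear in this log.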

13 Client Crashes
Reincarnation with epochs
  – Time is divided into epochs
  – On reboot, the client broadcasts the start of a new epoch
  – Orphans are killed when a server receives the new epoch announcement
Gentle reincarnation
  – Kill only the computations whose owners cannot be located
Expiration
  – Each RPC gets a time window for completion and must explicitly request an extension
  – After a crash, the client waits for the window to pass before rebooting
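
Epoch-based reincarnation can be sketched in Python as follows. The `EpochServer` class is hypothetical. It records the epoch each computation was started in. On a new-epoch announcement from a client, it kills that client's computations from older epochs. Passing the optional `owner_reachable` callback, also hypothetical, switches to the gentle variant. That variant spares any computation whose owner can still be located.

```python
class EpochServer:
    """Hypothetical server that tags each computation with its client's epoch."""

    def __init__(self):
        self.computations = {}  # computation id -> (client_id, epoch)

    def start(self, comp_id, client_id, epoch):
        self.computations[comp_id] = (client_id, epoch)

    def on_new_epoch(self, client_id, new_epoch, owner_reachable=None):
        """Handle a client's reboot broadcast and return the killed ids.

        Plain reincarnation kills every computation from an older epoch of
        that client. With owner_reachable given (gentle reincarnation), a
        computation is killed only if its owner cannot be located.
        """
        killed = []
        for comp_id, (owner, epoch) in list(self.computations.items()):
            if owner != client_id or epoch >= new_epoch:
                continue  # not an orphan of this client's earlier epoch
            if owner_reachable is not None and owner_reachable(comp_id):
                continue  # gentle: the owner was found, keep the computation
            del self.computations[comp_id]
            killed.append(comp_id)
        return killed


if __name__ == "__main__":
    server = EpochServer()
    server.start("a", "client-1", epoch=0)
    server.start("b", "client-1", epoch=0)
    server.start("c", "client-2", epoch=0)
    print(server.on_new_epoch("client-1", 1))  # ['a', 'b']
```

The broadcast itself is left out. In this sketch the client is simply assumed to keep its epoch counter on stable storage and to call `on_new_epoch` on every server after rebooting.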

