1 © DEEDS – OS Distributed Operating Systems

2 © DEEDS – OS Coverage Distributed Systems (DS) Paradigms –DS … NOS, DOS’s –DS Services: communication, synchronization, coordination, replication…

3 © DEEDS – OS What is a Distributed System? “A distributed system is one that prevents you from working because of the failure of a machine you had never heard of” – Leslie Lamport  Multiple computers sharing (the same) state and interconnected by a network … a collection of autonomous entities appearing to users as a single OS. [Figure: shared-memory multiprocessor vs. message-passing multicomputer vs. distributed system]

4 © DEEDS – OS Distribution: Example Pros/Cons The Good Stuff: Resource Sharing (concurrency  performance), Distributed Access (matching the spatial distribution of applications), Scalability, Load Balancing (Migration, Relocation), Fault Tolerance.  Bank account database (DB) example –Naturally centralized: easy consistency and performance –Fragment the DB among regions: exploit locality of reference, improve security & reduce reliance on the network for remote access –Replicate each fragment for fault tolerance  But we now need (additional) DS techniques to –Route each request to the right fragment –Maintain access to/consistency of the fragments as a whole database –Maintain access to/consistency of each fragment’s replicas –…
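The first of these techniques, routing a request to the right fragment, can be sketched in a few lines. This is a hypothetical illustration, not from the slides: the `Fragment` class, the country-prefix account ids, and the `REGION_OF` map are all invented for the example.

```python
# Sketch: route a bank-account request to the regional DB fragment that owns
# the account. Account ids are assumed to start with a country code.

class Fragment:
    """One regional fragment of the account database (illustrative)."""
    def __init__(self, region):
        self.region = region
        self.accounts = {}

    def deposit(self, account_id, amount):
        self.accounts[account_id] = self.accounts.get(account_id, 0) + amount
        return self.accounts[account_id]

REGION_OF = {"DE": "europe", "FR": "europe", "US": "america"}
FRAGMENTS = {r: Fragment(r) for r in ("europe", "america")}

def route(account_id):
    """Pick the fragment owning the account (prefix = country code)."""
    return FRAGMENTS[REGION_OF[account_id[:2]]]

balance = route("DE-1234").deposit("DE-1234", 100)
```

Consistency of each fragment's replicas, the harder problem, is what the coordination and replication protocols later in the deck address.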

5 © DEEDS – OS OS’s for DS’s  Loosely-coupled OS –A collection of computers, each running its own OS, with the OS’s allowing sharing of resources across machines –A.K.A. Network Operating System (NOS)  Provides local services to remote clients via remote login  Data transfer from the remote OS to the local OS via FTP (File Transfer Protocol)  Tightly-coupled OS –The OS tries to maintain a single global view of the resources it manages –A.K.A. Distributed Operating System (DOS)  “Local access feel,” as in a non-distributed, standalone OS  Data migration or computation migration modes (entire processes or threads)

6 © DEEDS – OS Network Operating Systems (NOS)  Provide an environment where users are (explicitly) aware of the multiplicity of machines.  Users can access remote resources by  logging into the remote machine, OR  transferring data from the remote machine to their own machine  Users must know where the required files and directories are and mount them.  Each machine can act as both a server and a client at the same time.  E.g., NFS from Sun Microsystems, CMU’s AFS, etc.

7 © DEEDS – OS Distributed Operating Systems (DOS)  Runs on a cluster of machines with no shared memory  Users get the feel of a single processor - virtual uni-processor  Transparency is the driving force  Requires  A single global IPC mechanism  Identical process management and system calls at all nodes  Common file system at all nodes  State, services and data consistency

8 © DEEDS – OS Basic Client-Server Model for DOS & NOS [Figure: client-server interaction; non-blocking communication; file-based vs. object-based communication]

9 © DEEDS – OS Middleware Can we have the best of both worlds? –Scalability and openness of a NOS –Transparency and common state of a DOS Solution  an additional layer of software above the OS (middleware) –Mask heterogeneity –Improve distribution transparency (and more)

10 © DEEDS – OS Middleware  Openness Basis Document-based middleware (e.g. WWW) Coordination-based MW (e.g., Linda, publish subscribe, Jini etc.) File system based MW (upload/download, remote access) Shared object based MW

11 © DEEDS – OS Global Access  Transparency  Illusion of a single computer across a DS Distribution transparency: all of the above + performance + flexibility (modification, enhancements for kernel/devices) + balancing/scheduling + scaling (allowing systems to expand without disrupting users) + … Fragmentation transparency: hide whether a resource is fragmented or not

12 © DEEDS – OS Reliability, Performance, Scalability  Faults (fail-stop, transient, Byzantine)  Fault avoidance (de-cluster, rejuvenate)  Fault tolerance  Redundancy techniques (tolerate k failures?)  Distributed control  Fault detection & recovery  Atomic transactions  Stateless servers  Acknowledgements and timeout-based retransmission of messages  Batch if possible  Cache whenever possible  Minimize copying of data  Minimize network traffic  Take advantage of fine-grain parallelism for multiprocessing  Avoid centralized entities (they provide no/limited fault tolerance, become system bottlenecks, and concentrate network traffic)  Avoid centralized algorithms  Perform most operations on client workstations

13 © DEEDS – OS Design Issues  Resource management  Hard to obtain consistent information about the utilization or availability of resources.  Has to be estimated (costly!!) approximately using heuristic methods.  Processor allocation  Load balancing  Hierarchical organization of processors.  If a processor cannot handle a request, it asks its parent for help.  …BUT the crash of a higher-level processor isolates all processors attached to it.  Process scheduling  Communication dependency, causality, linearizability… to consider  Fault tolerance  Consider distribution of control and data.  Services provided  Typical services include name, directory, file, time, etc.

14 © DEEDS – OS Process Addressing ~ NOS Flavor  Explicit addressing  Send(process_id, message)  Implicit (functional) addressing  Send_any(service_id, message) Ex: Berkeley UNIX -Limited support for process migration Link-based process addressing Ex: -Overhead of locating a process -Intermediate node failure  System-wide unique identifier (location transparency)  High-level machine-independent part and low-level machine-dependent part -Centralized naming server for the high-level (functional) id
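The last bullet, a system-wide unique id resolved by a centralized naming server, can be sketched as follows. Everything here (the `NamingServer` class and its methods) is a hypothetical illustration of the idea, not an API from the slides.

```python
# Sketch: location-transparent process addressing. A high-level,
# machine-independent id is mapped by a (hypothetical) naming server
# to a low-level, machine-dependent (machine, local pid) pair.
import itertools

class NamingServer:
    def __init__(self):
        self._table = {}            # system-wide id -> (machine, local_pid)
        self._ids = itertools.count(1)

    def register(self, machine, local_pid):
        sid = next(self._ids)       # system-wide unique, machine-independent
        self._table[sid] = (machine, local_pid)
        return sid

    def update(self, sid, machine, local_pid):
        self._table[sid] = (machine, local_pid)   # e.g., after migration

    def resolve(self, sid):
        return self._table[sid]

ns = NamingServer()
sid = ns.register("node-A", 4711)
ns.update(sid, "node-B", 99)        # process migrated; high-level id unchanged
```

The point of the split: senders keep using `sid` across migrations, which plain `(machine, pid)` addressing cannot offer.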

15 © DEEDS – OS So what services do we need to realize DS? Communication Coordination (Stateful? Stateless?) & Synchronization Replication Failure Handling Consistency Liveness Storage

16 © DEEDS – OS Communication (Group Comm) One-to-many communication (blocking or non-blocking?)  Multicast/broadcast  Open group/closed group  Flexible reliability  0-reliable  1-reliable  m-out-of-n reliable  All-reliable  Atomic multicast Many-to-one communication Many-to-many communication  Absolute ordering (global clock)  Consistent ordering (sequencer/ABCAST protocol)  Causal ordering
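The "m-out-of-n reliable" level above can be sketched directly: the sender treats a multicast as successful once m of the n group members have acknowledged. Receivers are simulated as plain functions here; all names are illustrative.

```python
# Sketch: m-out-of-n reliable multicast. Each receiver returns True (ACK)
# or False (no ACK, e.g., lossy link); success = at least m ACKs.
def multicast(message, receivers, m):
    acks = 0
    for deliver in receivers:       # in a real system these are remote sends
        if deliver(message):        # True models an ACK from that member
            acks += 1
        if acks >= m:
            return True             # m-reliable: enough members got it
    return False

# Three-member group with one lossy link: 2-out-of-3 still succeeds.
group = [lambda msg: True, lambda msg: False, lambda msg: True]
ok = multicast("update", group, m=2)
```

Setting m = 1 gives the 1-reliable level and m = n the all-reliable level, so the same loop covers the whole spectrum on the slide (atomic multicast needs more: all-or-none plus ordering).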

17 © DEEDS – OS Communication Failure Handling  Delivers messages despite –communication link failures –process failures  Main kinds of failures to tolerate –timing (link and process) –omission (link and process) –value  Loss of the request message  Loss of the response message  Unsuccessful execution of the request (system crash) Inter-Process Communication (IPC)  Two-message IPC (request, reply)  Three-message reliable IPC (request, reply, ack)  Four-message reliable IPC (request, ack, reply, ack) Failure handling  At-least-once (timeout and retransmit)  Idempotency (no extra side effects no matter how many times performed)  Non-idempotent operations need exactly-once semantics: reply from a cache keyed by a unique request id
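The last bullet, exactly-once semantics for non-idempotent requests via a reply cache, can be sketched as below. The `Server`/`deposit` names are invented for the example; the mechanism (dedup by unique request id) is the slide's.

```python
# Sketch: exactly-once semantics on top of at-least-once retries. The server
# caches replies by unique request id, so a retransmitted request (client
# timed out) is answered from the cache instead of re-executing the
# non-idempotent operation.
class Server:
    def __init__(self):
        self.balance = 0
        self.reply_cache = {}        # request_id -> cached reply

    def deposit(self, request_id, amount):
        if request_id in self.reply_cache:   # duplicate: don't re-execute
            return self.reply_cache[request_id]
        self.balance += amount               # non-idempotent side effect
        self.reply_cache[request_id] = self.balance
        return self.balance

s = Server()
first = s.deposit("req-1", 50)
retry = s.deposit("req-1", 50)   # client timed out and retransmitted
```

Without the cache, the retry would deposit twice; with it, the duplicate is absorbed and the client sees a consistent reply.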

18 © DEEDS – OS Communication: Reliable Delivery Omission-failure tolerance (degree k). Design choices: a) Error masking (spatial): several (> k) links b) Error masking (temporal): repeat k+1 times c) Error recovery: detect the error and recover

19 © DEEDS – OS Reliable Delivery (cont.) Error detection and recovery: ACKs and timeouts Positive ACK: sent when a message is received –Timeout on the sender without an ACK: the sender retransmits Negative ACK (NACK): sent when a message loss is detected –Needs sequence #s or time-based reception semantics Tradeoffs –Positive ACKs: usually faster failure detection –NACKs: fewer messages… Q: what kinds of situations are good for –Spatial error masking? –Temporal error masking? –Error detection and recovery with positive ACKs? –Error detection and recovery with NACKs?
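The positive-ACK scheme above can be sketched with a simulated lossy channel: the sender retransmits on timeout until the ACK arrives, and the sequence number lets the receiver absorb duplicates. All names (`send_reliable`, `lossy_channel`) are illustrative.

```python
# Sketch: error recovery with positive ACKs + timeouts. channel() returning
# None models a timeout (lost message or lost ACK); an ACK echoes the
# sequence number so duplicates can be recognized.
def send_reliable(payload, seq, channel, max_retries=5):
    for attempt in range(max_retries + 1):
        ack = channel(seq, payload)     # None = timed out waiting for ACK
        if ack == seq:
            return attempt + 1          # number of transmissions used
    raise TimeoutError("receiver unreachable")

delivered = set()
losses = iter([True, True, False])      # drop the first two transmissions

def lossy_channel(seq, payload):
    if next(losses, False):
        return None                     # message lost: sender will retry
    delivered.add(seq)                  # set() absorbs duplicate seq numbers
    return seq                          # positive ACK carries the sequence #

tries = send_reliable("hello", seq=1, channel=lossy_channel)
```

Note the at-least-once flavor: if only the ACK is lost, the receiver gets the message twice, which is exactly why the seq-number dedup (and the reply cache on the previous slide) is needed.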

20 © DEEDS – OS Resilience to Sender Failure Multicast FT communication is harder than point-to-point –The basic problem is failure detection –A subset of the receivers may get the msg, then the sender fails Solutions depend on the flavor of multicast reliability a) Unreliable: no effort to overcome link failures b) Best-effort: some steps taken to overcome link failures c) Reliable: participants coordinate to ensure that all or none of the correct recipients get it

21 © DEEDS – OS Coverage DS Paradigms –DS & OS’s –Services and models –Communication Coordination –Distributed ME –Distributed Coordination

22 © DEEDS – OS Co-ordination Protocols in DOS/DS Distributed mutual exclusion (ME) Distributed atomicity Distributed synchronization & ordering  How do we co-ordinate the distributed resources for ME, critical-section (CS) access, consistency, etc.?

23 © DEEDS – OS Co-ordination in Distributed Systems Event ordering –centralized system: ~easy (common clock & memory) –distributed system: hard (needs convergent/consistent distributed time) Example: the Unix “make” program –source files/object files  make compiles & links based on the last version  re-compiles a.c [assuming a common time base] –“make” in a DS? [Figure: machines A and B, with B’s slow clock stamping events]
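One classic fix for this (not named on the slide, but standard) is a Lamport logical clock: compare logical timestamps that respect message causality instead of divergent physical clocks, so an edit that causally follows a compile always gets a larger stamp.

```python
# A minimal Lamport logical clock: local events tick the counter; receiving
# a message merges with max(local, received) + 1, preserving causal order.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                  # local event (e.g., writing a file)
        self.time += 1
        return self.time

    def recv(self, msg_time):        # merge rule on message receipt
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_compile = a.tick()                 # A compiles: a.o stamped at time 1
b.recv(a.time)                       # B learns of A's event (a message)
t_edit = b.tick()                    # B then edits a.c: stamped later
```

Since `t_edit > t_compile` regardless of how slow B's physical clock runs, a make-like tool comparing these stamps would correctly recompile a.c.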

24 © DEEDS – OS Synchronization  Blocking (the send primitive blocks until an acknowledgment is received)  Timeout  Non-blocking (send = copy to a buffer and continue)  Polling at the receive primitive  Interrupt  Synchronous (send and receive primitives are both blocked)  Asynchronous  Distributed synchronization with failures? DB/control apps where “order” is essential for consistency

25 © DEEDS – OS ME TSL, Semaphores, monitors… (Single OS) Do they work in DS given timing delays, ordering issues +++?

26 © DEEDS – OS Let’s Start with TSL for Multiprocessors TSL is no longer atomic: it goes over the bus! [Figure: CPUs #1 and #2 issue TSL; the read and write cycles interleave on the bus] Both CPU #1 & #2 think they have CS access – no ME! –Single CPU: disable interrupts; multiprocessor? –Is TSL atomic at the distributed/networked level? –  The TSL instruction can fail unless bus locking is made part of the TSL op.
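A TSL-based spinlock can be sketched as below. In Python the atomicity of `test_and_set` has to be faked with a real lock, standing in for the bus-locked read-modify-write cycle; without that atomicity (the slide's point), two CPUs can both read 0 and both enter the critical section. Names are illustrative.

```python
# Sketch: spinlock built on test-and-set-lock (TSL). The inner _bus lock
# models the locked bus cycle that makes read+write one atomic step.
import threading

class SpinLock:
    def __init__(self):
        self._word = 0
        self._bus = threading.Lock()      # stands in for bus locking

    def test_and_set(self):
        with self._bus:                   # atomic read-modify-write
            old, self._word = self._word, 1
            return old

    def acquire(self):
        while self.test_and_set() == 1:   # spin until old value was 0
            pass

    def release(self):
        self._word = 0

counter = 0
lock = SpinLock()

def worker():
    global counter
    for _ in range(1000):
        lock.acquire()
        counter += 1                      # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```

With the bus lock in place, all 4000 increments survive; remove `self._bus` and the lost-update interleaving the slide describes becomes possible.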

27 © DEEDS – OS Progressive TSL – Private Locks per CPU * possible to make a separate decision each time a locked mutex is encountered * multiple locks needed to avoid cache thrashing A CPU needing a locked mutex just waits for it, either by - polling continuously, polling intermittently, - or attaching itself to a list of waiting CPUs

28 © DEEDS – OS Distributed Lock Problems [Figure: processes p1–p4 exchanging LOCK and LOCK GRANTED messages with a lock server] What happens? Solutions?

29 © DEEDS – OS Distributed Mutual Exclusion Solution #1: build a lock server in a centralized manner (generally simple) Lock-server problems –The server is a single point of failure –The server is a performance bottleneck –Failure of the client holding the lock also causes problems: no UNLOCK is sent Similar to garbage-collection problems in DSs … validity conditions, etc. What is the state of the lock server? For stateless servers? Does it work? Under what assumptions?

30 © DEEDS – OS Distributed Mutual Exclusion (cont.) Solution #2: decentralized algorithm –Replicate the central server’s state on all processes –The requesting process sends a LOCK message to all others –Then waits for LOCK_GRANTED from all –To release the critical region, send UNLOCK to all Does it work? Under what assumptions?

31 © DEEDS – OS Co-ordination in Distributed Systems 1.Given distributed entities, how does one obtain resource “coordination” to result in an agreed action (such as CS access/ME, shared memory “writes”, producer/consumer modes or decisions)? 2.How are distributed tasks/requests “ordered”? 3.Given distributed resources, how do they “all” agree on a “single” course of action? Asynchronous Co-ordination -2PC, leader elections, etc… Synchronous Co-ordination - clocks, ordering, serialization, etc …

32 © DEEDS – OS Consistency & Distributed State Machines

33 © DEEDS – OS Distributed State Machine Consensus

34 © DEEDS – OS Asynch: “Single” Decision - Commit 2PC: Two Phase Commit Protocols –coordinator (pre-specified or selected dynamically) –multiple secondary sites (“cohorts”) Objective: All nodes agree and execute a single decision [all agree or no action taken...banking transactions]

35 © DEEDS – OS Two-Phase Commit (2PC) Protocol Coordinator actions: 1. send PREPARE to all (“bounded waiting”) 2. receive OK from all  put COMMIT in the log & send COMMIT to all 4’. receive NOT-OK  send ABORT to all 5. ACK from all? DONE. Each client’s (cohort’s) actions: 2. get the PREPARE msg 3. if ready, send OK (write undo/redo logs); else send NOT-OK 4. receive COMMIT  release resources, send ACK 4’. receive ABORT  undo actions
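The 2PC message flow above can be simulated in a few lines: the coordinator commits only if every cohort votes OK, otherwise it aborts everywhere. Class and method names are invented for the sketch; logging, timeouts, and failure handling are omitted.

```python
# Sketch: 2PC decision logic. Phase 1 collects PREPARE votes; phase 2
# broadcasts a single COMMIT/ABORT decision to every cohort.
def two_phase_commit(cohorts):
    votes = [c.prepare() for c in cohorts]        # phase 1: PREPARE / vote
    decision = "COMMIT" if all(votes) else "ABORT"
    for c in cohorts:                             # phase 2: COMMIT or ABORT
        c.finish(decision)
    return decision

class Cohort:
    def __init__(self, ready):
        self.ready = ready            # can this site carry out the action?
        self.state = "INIT"

    def prepare(self):                # vote OK / NOT-OK
        self.state = "READY" if self.ready else "ABORTED"
        return self.ready

    def finish(self, decision):
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"

all_ok = two_phase_commit([Cohort(True), Cohort(True)])
one_bad = two_phase_commit([Cohort(True), Cohort(False)])
```

A single NOT-OK vote aborts the whole transaction: the all-or-nothing ("banking transaction") property from the previous slide.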

36 © DEEDS – OS Two-Phase Commit (cont.) Problem: coordinator failure after PREPARE & before COMMIT blocks participants waiting for the decision (a) Three-phase commit overcomes this (b) …but it is slow –delay the final decision until enough processes “know” which decision will be taken Q: can this somehow still block?

37 © DEEDS – OS Comments... Time lag in making decisions – RT applications?? Resources stay locked until voting/the decision is completed Message overhead Relies on common reliability assumptions Possibilities of deadlock/livelock Limited fault tolerance (coordinator dependency) A new coordinator initiation per new request

38 © DEEDS – OS Distributed ME (all ACK) To request the CS: send a REQ msg M to ALL; enQ M in the local Q Upon receiving M from P i –if it does not want (or have) the CS, send ACK –if it has the CS, enQ P i ’s request M –if it wants the CS, enQ/ACK based on the lowest ID (a timestamp would be so much nicer, but lack of time  no time basis for timestamps) To release the CS: send ACK to all in Q, deQ [diff. from 2PC] To enter the CS: enter when ACKs are received from all [Figure: A, B, C with concurrent requests {8, 12}; the lower id (8) collects all ACKs and enters the CS first; its release ACK then lets 12 enter]
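The all-ACK scheme above (Ricart-Agrawala flavor, with the slide's lowest-id tie-break instead of timestamps) can be sketched as a compact single-round simulation. The `Proc` class and its methods are invented for the example; the deferred-ACK queue is the slide's "enQ, ACK on release".

```python
# Sketch: distributed ME via all-ACK. A requester counts ACKs from all
# peers; a peer that also wants the CS and has the lower id defers its
# ACK until it releases, which serializes CS entry.
class Proc:
    def __init__(self, pid):
        self.pid = pid
        self.peers = []
        self.wants = False
        self.deferred = []      # the local queue of deferred requesters
        self.acks = 0

    def request_cs(self):
        self.wants = True
        self.acks = sum(p.on_request(self) for p in self.peers)

    def on_request(self, other):
        # ACK immediately unless we also want the CS and win the tie-break
        if self.wants and self.pid < other.pid:
            self.deferred.append(other)
            return 0
        return 1

    def release_cs(self):
        self.wants = False
        for p in self.deferred:     # deferred ACKs admit the next requester
            p.acks += 1
        self.deferred = []

a, b, c = Proc(8), Proc(12), Proc(3)
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]
a.request_cs()                       # id 8 requests
b.request_cs()                       # id 12 requests concurrently
low_id_first = (a.acks == 2 and b.acks == 1)   # only 8 has all ACKs
a.release_cs()                       # 8 leaves the CS; its ACK admits 12
```

Real Ricart-Agrawala uses (logical timestamp, id) pairs for the tie-break and handles messages asynchronously; this sketch only shows why the all-ACK rule yields mutual exclusion.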

39 © DEEDS – OS DS Solutions: State Machine Replication State machine replication Implements the abstraction of a single reliable server

40 © DEEDS – OS Efficient State Machines (with Crashes) Observation: worst-case failures are the exception –Replicas often have an a-priori consistent view (w/o coordination) –There is a correct replica (called the leader) known to every replica  “Fast” consensus = short latency Observation: latency optimizations have overhead –Significant latency overhead if the assumptions are not met Observation: messages & crypto are expensive  Minimal latency + no crypto + trade latency for message complexity

41 © DEEDS – OS Example: Web Applications An ideal setting for applying replication –Exposed to the Internet, strong reliability requirements [Figure: client  web server (front-end server + application code)  database]

42 © DEEDS – OS Practical Replication [Figure: client, primary, backups; REQUEST  ORDER  AGREE  COMMIT phases; f+1 matching REPLIES; deliver] Seminal work on efficient replication –Optimal resilience –Three phases: non-optimal –O(n²) message complexity: non-optimal

43 © DEEDS – OS Motivation Large-scale services such as PNUTS, ZooKeeper, Cassandra, GFS, Bigtable, and Dynamo run on 15K+ commodity servers

44 © DEEDS – OS Internet Datacenters Large scale  crashes are the common case to handle Need high performance for ALMOST ALL requests –Example: Dynamo’s SLA specifies a worst-case latency for 99.9% of the requests under high load –ALL = also in the presence of crashes Need low replication costs –100s to 1,000s of replicated services –Additional replication costs are multiplied over the number of services –Diagnosis, repair & re-configurations –Speed with unresponsive replicas, e.g., WAN replication Replicas can be located at geographically remote sites Some sites can become temporarily unreachable

45 © DEEDS – OS Large Scale DS Goals Consistency (Safety) –Linearizability Availability (Liveness) –Wait-freedom Performance –Latency, throughput,... Despite –Failures –Concurrency –Asynchrony

46 © DEEDS – OS Challenges Crash failures –Detectable –Very popular Byzantine failures –Nodes under adversarial control –Worst-case needs...but costly (# replicas, latency, complexity) Asynchronous communication –Reflects real networks (e.g. WAN)

47 © DEEDS – OS Efficient Distributed Solutions Main efficiency metrics –Resilience: # of replicas (e.g., 3t+1 replicas to tolerate t Byzantine faults) –Latency: # of communication steps –Crypto: use of signatures –Message complexity: # of messages [Figure: client  leader replica  replicas]
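The 3t+1 resilience figure can be checked with a line of quorum arithmetic: with quorums of 2t+1 out of n = 3t+1 replicas, any two quorums intersect in at least t+1 replicas, so every intersection contains at least one correct replica even if t are Byzantine. The helper name is illustrative.

```python
# Quick check of the n = 3t+1 bound: two quorums of size 2t+1 overlap in
# 2*(2t+1) - n = t+1 replicas, i.e., more than the t possibly-faulty ones.
def quorum_intersection(t):
    n = 3 * t + 1
    quorum = 2 * t + 1
    overlap = 2 * quorum - n     # minimum intersection of any two quorums
    return n, quorum, overlap

for t in range(1, 6):
    n, q, overlap = quorum_intersection(t)
    assert overlap == t + 1      # always >= one correct replica in common
```

For crash-only faults the same reasoning with majority quorums needs only n = 2t+1, which is why the Byzantine worst case on the previous slide is called costly in replica count.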

48 © DEEDS – OS Advocated Solutions Key advocated abstractions: –Consensus –Distributed storage State Machine Replication (SMR)  Consensus Reliable shared memory  Distributed storage [Figure: clients send requests to a replicated service; without replication a reply may be missing or “bad”] Practical implementations of these abstractions?

49 © DEEDS – OS Distributed Storage Storage is a state machine with operations read and write (SWMR/MWMR…) [Figure: previously, clients talk to a single storage server (“bad” reply or no reply possible); nowadays, clients receive certified replies from replicated storage]
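A single-writer read/write register over replicated storage can be sketched with majority quorums (in the style of ABD-like protocols, not named on the slide): writes stamp the value with a timestamp and reach a majority; reads take the highest-stamped value seen in any majority, and any two majorities intersect. All names are illustrative.

```python
# Sketch: SWMR register over 3 storage servers with majority quorums.
# 'reachable' models which servers respond (a majority suffices).
class StorageServer:
    def __init__(self):
        self.ts, self.value = 0, None

def write(servers, ts, value, reachable):
    for s in reachable(servers):        # only a majority must respond
        if ts > s.ts:                   # keep the freshest (ts, value)
            s.ts, s.value = ts, value

def read(servers, reachable):
    replies = [(s.ts, s.value) for s in reachable(servers)]
    return max(replies, key=lambda r: r[0])[1]   # highest timestamp wins

servers = [StorageServer() for _ in range(3)]
write(servers, ts=1, value="x", reachable=lambda ss: ss[:2])  # quorum {0,1}
got = read(servers, reachable=lambda ss: ss[1:])              # quorum {1,2}
```

The read quorum {1,2} overlaps the write quorum {0,1} in server 1, so the read returns "x" even though server 2 never saw the write; that intersection argument is what makes the register tolerate unresponsive replicas.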
