Distributed Operating Systems Neeraj Suri www.deeds.informatik.tu-darmstadt.de
First: Evaluation Forms! Please fill in the provided forms for the evaluation of our lecture. Your answers provide us with feedback for improving the lecture/exercises/labs. Online: www.fachschaft.informatik.tu-darmstadt.de/feedback Two volunteers should bring the filled forms to Fachschaft Informatik (Raum D120, S2|02). Thanks!
Coverage
DS Paradigms
- DS & OS's
- Services and models
- Communication
- File systems
Coordination
- Dist. ME (mutual exclusion)
- Dist. co-ordination
- Synchronization
DS Scheduling & Misc. Issues
What is a Distributed System
"A distributed system is one that prevents you from working because of the failure of a machine you had never heard of." — Leslie Lamport
Multiple computers sharing (the same) state, interconnected by a network
Distribution: Example Pros/Cons
All the good stuff: high performance, distributed access, scalability, heterogeneity, sharing (concurrency), load balancing (migration, relocation), FT, ...
Bank account database (DB) example
- Naturally centralized: easy consistency and performance
- Fragment the DB among regions: exploit locality of reference, improve security & reduce reliance on the network for remote access
- Replicate each fragment for fault tolerance
But we now need (additional) DS techniques:
- Route each request to the right fragment
- Maintain access/consistency of the fragments as a whole database
- Maintain access/consistency of each fragment's replicas
- ...
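The "route request to the right fragment" step can be sketched as below; the region names, account-ID ranges, and replica server names are invented for illustration and are not part of the lecture:

```python
# Hypothetical sketch: routing bank-account requests to regional DB fragments.
# Regions, account ranges, and server names are illustrative assumptions.

FRAGMENTS = {
    "europe":  range(0, 5000),       # accounts 0..4999 live on the Europe fragment
    "america": range(5000, 10000),   # accounts 5000..9999 on the America fragment
}

REPLICAS = {
    "europe":  ["eu-1", "eu-2"],     # each fragment kept on two servers for FT
    "america": ["us-1", "us-2"],
}

def route(account_id):
    """Return the region and replica servers holding this account's fragment."""
    for region, ids in FRAGMENTS.items():
        if account_id in ids:
            return region, REPLICAS[region]
    raise KeyError("no fragment owns account %d" % account_id)

print(route(7200))   # ('america', ['us-1', 'us-2'])
```

Maintaining consistency of the replicas returned by `route` is exactly the additional DS machinery the slide alludes to.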
Transparency: Global Access Illusion of a single computer across a DS
Multiprocessor OS Types (1)
- Each CPU has its own operating system
- Shared-bus communication causes blocking & CPU idling!
[Figure: CPUs, each with a private OS, connected by a shared bus]
Multiprocessor OS Types (2)
- Master-slave multiprocessors
- The master is a bottleneck!
[Figure: master CPU and slave CPUs on a shared bus]
Multiprocessor OS Types (3)
- Symmetric multiprocessors (SMP multiprocessor model)
- Eliminates the CPU bottleneck, but raises issues of ME and synchronization: a single mutex on the whole OS?
[Figure: CPUs sharing one OS image over a bus]
OS's for DS's
Loosely-coupled OS
- A collection of computers, each running its own OS; the OS's allow sharing of resources across machines
- AKA Network Operating System (NOS)
- Manages a heterogeneous multicomputer DS
- Difference: provides local services to remote clients, e.g., via remote login
- Data transfer from remote OS to local OS via FTP (File Transfer Protocol)
Tightly-coupled OS
- The OS tries to maintain a single global view of the resources it manages
- AKA Distributed Operating System (DOS)
- Manages multiprocessors & homogeneous multicomputers
- Similar "local access feel" as a non-distributed, standalone OS
- Data-migration or computation-migration modes (entire processes or threads)
Middleware
Can we have the best of both worlds?
- Scalability and openness of a NOS
- Transparency and relative ease of use of a DOS
Solution: an additional layer of SW above the NOS
- Masks heterogeneity
- Improves distribution transparency (and more)
"Middleware"
File System-Based Middleware
Approach: make the DS look like one big file system
Transfer models:
(a) upload/download model (work done locally)
(b) remote access model (work done remotely)
File System-Based Middleware
(a) Two separate file systems
(b) Naming transparency: all clients have the same view of the FS
(c) Some clients with a different FS view
Semantics of File Sharing (ordering and session semantics)
- (a) A single processor gives sequential consistency
- (b) A distributed system may return an obsolete value
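Session semantics in case (b) can be sketched as follows; the `Server`/`Session` classes are invented for illustration. A write by one client becomes visible to others only when the writer closes its session:

```python
# Sketch of session semantics: clients work on private session copies,
# flushed to the server only on close. Class names are illustrative.

class Server:
    def __init__(self):
        self.data = "old"
    def open(self):                   # client gets a private session copy
        return Session(self)

class Session:
    def __init__(self, server):
        self.server = server
        self.copy = server.data       # work on a cached copy
    def write(self, value):
        self.copy = value             # not yet visible to other clients
    def close(self):
        self.server.data = self.copy  # flush on close

srv = Server()
a, b = srv.open(), srv.open()
a.write("new")
print(b.copy)            # 'old'  -- b still sees the obsolete value
a.close()
print(srv.open().copy)   # 'new'  -- visible only after a's close
```

This is the sense in which a DS "may return an obsolete value": client b reads stale data until a's session ends.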
Shared Object-Based Middleware
Approach: make the DS look like objects (variables + methods)
Main elements of a CORBA-based system (Common Object Request Broker Architecture):
- easy scaling to large systems
- replicated objects (C++, Java)
- flexibility
- inter-ORB protocol
Shared Object-Based Middleware Internet structured object (Globe)
Shared Object-Based Middleware
A distributed shared object in the Internet
- can have its state copied on multiple computers at once
- how to maintain sequential consistency of write operations?
Network Services and Protocols
Network services may be blocking or non-blocking
Client-Server Communications
Unbuffered message passing
- send(addr, msg), recv(addr, msg)
- all requests and replies at the C/S level
- all msg. acks between kernels only
- msg. directed at a process
Buffered message passing
- msg. sent to a kernel mailbox or a kernel/user-interface socket
Blocking or non-blocking primitives?
[Figure: client and server processes communicating via their kernels]
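The blocking vs. non-blocking distinction with buffered message passing can be sketched as below, using a queue to stand in for the kernel mailbox; all names are illustrative, not from any real kernel API:

```python
# Minimal sketch of buffered message passing with blocking vs. non-blocking
# receive; a queue plays the role of the per-process kernel mailbox.
import queue

mailbox = queue.Queue()              # per-process kernel mailbox

def send(msg):
    mailbox.put(msg)                 # buffered: the sender does not block

def recv_blocking():
    return mailbox.get()             # blocks the caller until a msg arrives

def recv_nonblocking():
    try:
        return mailbox.get_nowait()  # returns immediately...
    except queue.Empty:
        return None                  # ...with None if nothing is buffered

print(recv_nonblocking())   # None: mailbox empty, caller not blocked
send("request")
print(recv_blocking())      # 'request'
```

With unbuffered passing there would be no mailbox: a send with no receiver waiting is lost or blocks, which is exactly what the kernel mailbox avoids.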
Remote Procedure Calls
Synchronous/asynchronous (blocking/non-blocking) communication
- [Sync] client generates a request through its STUB, which traps to the kernel
- [Sync] kernel blocks the process until the reply is received from the server
- [Async] kernel buffers the msg. and returns control immediately
RPC & Stubs (Dummy Procedure in place of the RPC)
C: Client; CS: Client Stub; S: Server; SS: Server Stub; K: Kernel; NS: Name Server
Send path:
[C] call "client stub" procedure
[CS] prepare msg. buffer
[CS] load parameters into buffer
[CS] prepare msg. header
[CS] send trap to kernel
[K] context switch to kernel
[K] copy msg. to kernel
[K] determine server address (NS)
[K] put address in header
[K] set up network interface
[K] start timer for msg.
Receive path:
[K] process interrupt (save PC, kernel state)
[K] check packet for validity
[K] decide which stub to assign
[K] see if stub is waiting
[K] copy msg. to stub
[K] context switch to server stub
[SS] set up parameter stack / unbundle parameters
[SS] call server
[S] process req.; initiate "server stub" for the reply
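The [CS] marshalling steps above can be sketched as follows; the JSON wire format is an invented stand-in, not a real RPC encoding:

```python
# Sketch of what a client stub does: marshal the procedure name and
# parameters into a message buffer with a header, ready to trap to the
# kernel. The wire format here is illustrative only.
import json

def marshal(proc_name, *args):
    header = {"proc": proc_name, "nargs": len(args)}
    body = {"args": list(args)}
    return json.dumps({"hdr": header, "body": body}).encode()

def unmarshal(buf):                  # server-stub side: unbundle parameters
    msg = json.loads(buf.decode())
    return msg["hdr"]["proc"], msg["body"]["args"]

buf = marshal("add", 2, 3)           # client stub builds the request msg.
proc, args = unmarshal(buf)          # server stub unpacks it
print(proc, sum(args))               # add 5
```

Everything between `marshal` and `unmarshal` (kernel copies, addressing, timers) is what the step list above spells out.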
Remote Procedure Call Implementation Issues
Can we pass pointers? (local context...)
- call by reference becomes call by copy/restore (but might fail)
Weakly typed languages (C) allow computations (say, a product of arrays sans array-size specs)
- can the client stub determine the unspecified size to pass on?
Not always possible to determine parameter types
Cannot use global variables
- client/server may get moved to a remote machine
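Call by copy/restore can be illustrated with a small sketch; the function names are hypothetical, and the "server" runs locally only to show the semantics:

```python
# Sketch of call by copy/restore: the stub copies the referenced data to the
# server, and copies the (possibly modified) data back afterwards.
# Names are illustrative, not from any real RPC package.

def server_side_increment(arr):      # runs remotely, on its own copy
    for i in range(len(arr)):
        arr[i] += 1
    return arr

def rpc_copy_restore(local_arr):
    sent = list(local_arr)           # copy in: marshal the pointed-to data
    result = server_side_increment(sent)
    local_arr[:] = result            # restore: overwrite the caller's data

data = [1, 2, 3]
rpc_copy_restore(data)
print(data)   # [2, 3, 4] -- looks like call by reference to the caller
```

The "might fail" caveat: if the caller passed two aliased pointers to the same array, the two restores can clobber each other, which true call by reference would not do.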
RPC Failures?
Client/server failure vs. communication failure? Who detects it? Timeouts?
Does it matter if a node (C/S) failed BEFORE or AFTER a request arrived? BEFORE or AFTER a request was processed?
Client failure: orphan requests? add expiration counters
Server crash?
Communication Delivers messages despite –communication link(s) failure –process failures Main kinds of failures to tolerate –Timing (link and process) –Omission (link and process) –Value
Reliable Delivery (cont.)
Error detection and recovery: ACKs and timeouts
Positive ACK: sent when a message is received
- Timeout on the sender without an ACK: sender retransmits
Negative ACK (NACK): sent when a message loss is detected
- Needs sequence #s or time-based reception semantics
Tradeoffs
- Positive ACKs: usually faster failure detection
- NACKs: fewer msgs...
Q: what kinds of situations are good for
- spatial error masking?
- temporal error masking?
- error detection and recovery with positive ACKs?
- error detection and recovery with NACKs?
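Positive-ACK recovery can be sketched as below, with a deterministically lossy channel standing in for the network and timer; all names are illustrative:

```python
# Sketch of positive-ACK error recovery: the sender retransmits until an
# ACK arrives. The lossy channel is simulated deterministically.

def make_lossy_channel(drop_first_n):
    state = {"sent": 0}
    def deliver(msg):
        state["sent"] += 1
        if state["sent"] <= drop_first_n:
            return None              # message lost: no ACK comes back
        return "ACK"                 # receiver got it and acknowledged
    return deliver

def reliable_send(msg, channel, max_retries=5):
    for attempt in range(1, max_retries + 1):
        if channel(msg) == "ACK":    # timeout without ACK -> retransmit
            return attempt
    raise TimeoutError("no ACK after %d tries" % max_retries)

channel = make_lossy_channel(drop_first_n=2)
print(reliable_send("data", channel))   # 3: delivered on the third attempt
```

A NACK scheme would invert this: the receiver watches sequence numbers and asks for retransmission only when it notices a gap, saving ACK traffic.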
Resilience to Sender Failure
Multicast FT-communication is harder than point-to-point
- The basic problem is failure detection
- A subset of the receivers may get the msg., and then the sender fails
Solutions depend on the flavor of multicast reliability:
a) Unreliable: no effort to overcome link failures
b) Best-effort: some steps taken to overcome link failures
c) Reliable: participants coordinate to ensure that all or none of the correct recipients get the msg. (covering the case in (b) where the sender fails mid-multicast)
Services of Distributed OS (NOS)
DOS's (actually NOS) basic services (often augmented with other specific services for local applications):
- Name service
- Registration, authentication, and authorization services
- File service
- Load service / process migration
- Consistency service
- Networking service
- Remote invocation service
- Time service & co-ordination mechanisms
- Administration services: management tasks
- ...
Distributed File Systems
Multiple users, multiple sites, multiple files & storage
Transparency of services (local = remote)
- network transparency
- location transparency/independence (migration transparency)
- name transparency (symbolic pathnames)
- universal access from all sites
- concurrency (use of resources) transparency
Availability/Performance
- fault tolerance
- security
- scalability
Sanity of ops
- sequencing of actions
- cache/consistency
Typical Model: Client-Server (NFS)
(a) Upload/Download Model: the file resides on the file server (FS) and is moved to the client on request (1); file ops are done on the client's local working copy (2); the updated file is returned to the server (3)
- storage needs on client/FS? a 2nd client trying to access the file? consistency of data? ME?
(b) Remote Access Model: the file resides on the FS and all file ops are done on the FS; C/FS communicate via RPC for all file ops
- data consistency? performance? network dependence?
Transparency
Location transparency (the file name does not reveal the file's physical storage location)
- /serverX/dirY/f.n: server X is identified by name, not by physical ID or location
- if the file moves to server Z, is access still transparent?
Location independence (the file name does not need to change when the file's physical storage location changes)
- file accessible by name, not by path
- (a) /server-group-name/
- (b) file/dir mounting (exporting) via symbolic linkage: if the tree rooted at /home/X is mounted at the point /rule/home/X, the users of "rule" see the X file system as if it were a directory under /rule/home/X (the mount point); access ~/home/X [~ location transparency]
- MOUNT table at the client? client access flexibility
- MOUNT table at the server? server transparency for updates and consistency
Beyond location transparency/independence, files are still governed by "access control lists" [capability lists & protection matrix] for RWX access across different sites.
Transparency is a nice attribute; realistically, the client makes NFS requests to the server, and the server translates pathnames (transparent, but slow).
Speedup: file handles
- First request: the client sends the pathname; the server does the path-name translation, creates a file handle (local file-system ID; inode #, generation # of the inode), and sends the handle to the client
- Subsequent requests: the client sends the file handle only
- If the server deletes the file while the client still holds a handle, a stale file handle results: the inode is freed, the generation # is void, and the 32-bit object of the file handle mismatches
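The file-handle and generation-number mechanism can be sketched as follows; the field layout is illustrative, not the real NFS handle encoding:

```python
# Sketch of an NFS-style file handle (fsid, inode #, generation #); a
# generation mismatch after the server reuses the inode yields a stale
# handle. Layout and names are illustrative.

class FileServer:
    def __init__(self):
        self.inodes = {}                  # inode# -> (generation, data)
        self.next_gen = 0
    def create(self, data):
        self.next_gen += 1
        self.inodes[1] = (self.next_gen, data)   # reuse inode #1
        return ("fsid0", 1, self.next_gen)       # the file handle
    def read(self, handle):
        fsid, ino, gen = handle
        cur_gen, data = self.inodes[ino]
        if gen != cur_gen:
            raise ValueError("stale file handle")  # inode was reused
        return data

srv = FileServer()
h = srv.create("v1")
print(srv.read(h))    # v1
srv.create("v2")      # server deletes & reuses the inode (new generation)
try:
    srv.read(h)
except ValueError as e:
    print(e)          # stale file handle
```

The generation number is what lets the server reject an old handle even though the inode number itself has been recycled.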
Performance aspects: should the NS (name server) be different from the FS?
- if a user requests to read a file, why first find the object and then send a request/reply? combine NS/FS as a single entity!
Use of caches?
- in the server? better response time
- in the client, in memory? good for diskless devices, but conflicts with the VM system over which pages to replace; who manages memory consistency, etc.?
- in the client, on local disk? less expensive
- write policy? sequences across clients?
- update policy?
- consistency policy?
- FT/security?
Consistency Is locally cached copy of the data consistent with the master copy? Client-initiated approach –Client initiates a validity check –Server checks whether the local data are consistent with the master copy Server-initiated approach –Server records, for each client, the (parts of) files it caches –When server detects a potential inconsistency, it must react
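The client-initiated approach can be sketched as below; the version-number validity check is an assumption for illustration (real systems may compare timestamps instead):

```python
# Sketch of client-initiated cache validation: the client asks the server
# whether its cached copy (identified here by a version number) is still
# consistent with the master copy. All names are illustrative.

class MasterServer:
    def __init__(self, data):
        self.data, self.version = data, 1
    def fetch(self):
        return self.data, self.version
    def is_current(self, version):       # the validity check
        return version == self.version
    def update(self, data):
        self.data, self.version = data, self.version + 1

class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cached, self.version = server.fetch()
    def read(self):
        if not self.server.is_current(self.version):   # client initiates check
            self.cached, self.version = self.server.fetch()  # refresh cache
        return self.cached

srv = MasterServer("master v1")
cli = CachingClient(srv)
print(cli.read())         # master v1 (cache valid, no refetch)
srv.update("master v2")
print(cli.read())         # master v2 (check failed, cache refreshed)
```

In the server-initiated variant, `MasterServer.update` would instead push an invalidation to every client it has recorded as caching the file.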
Caching and Remote Service
Servers are contacted only occasionally in caching (not for each access)
- Reduces server load and network traffic
- Enhances potential for scalability
The total network overhead of transmitting big chunks of data (caching) is lower than a series of responses to specific requests (remote service)
Caching is best with infrequent writes
- With frequent writes, substantial overhead is incurred to overcome the cache-consistency problem
Caching and Remote Service (Cont.)
Who (co-ordinates) caching in a DFS? Is it robust? Access control handled by ...?
What write policy?
- Write-through
- Delayed write (NFS v2: 3 sec for a data block, 30 sec for a dir. block, forced write after that)
- Write-on-close (NFS v3+: write the file block back to the server on close)
- Server-initiated (RFS: server maintains a global client view ... not scalable)
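Delayed write can be sketched as follows, loosely modeled on the NFS v2 3-second delay for data blocks; the cache class and the simulated clock are invented for illustration:

```python
# Sketch of a delayed-write (write-behind) cache: writes are held locally
# and flushed once they are older than a threshold. Clock is simulated.

FLUSH_AFTER = 3.0                    # seconds a dirty block may stay cached

class DelayedWriteCache:
    def __init__(self, server):
        self.server = server         # server modeled as a dict of blocks
        self.dirty = {}              # block -> (data, time of write)
    def write(self, block, data, now):
        self.dirty[block] = (data, now)      # buffered, not yet on server
    def tick(self, now):
        for block, (data, t) in list(self.dirty.items()):
            if now - t >= FLUSH_AFTER:       # old enough: force the write
                self.server[block] = data
                del self.dirty[block]

server = {}
cache = DelayedWriteCache(server)
cache.write("b0", "hello", now=0.0)
cache.tick(now=1.0)
print(server)    # {}: still only in the cache
cache.tick(now=3.5)
print(server)    # {'b0': 'hello'}: flushed after the delay
```

A crash between `write` and the flushing `tick` loses the buffered data, which is exactly the consistency/FT cost of delayed write compared to write-through.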
Caching & File Replication
File replicas reside on failure-independent machines
- Improves availability and can shorten service time
- The naming scheme maps a replicated file name to a particular replica
- The existence of replicas should be invisible to higher levels
- Replicas must be distinguished from one another by different lower-level names
BUT: updates – the replicas of a file denote the same logical entity, so an update to any replica must be reflected on all other replicas
Demand replication – reading a nonlocal replica causes it to be cached locally, thereby generating a new nonprimary replica
Stateful File Service (RFS, AFS)
The server remembers the last request!
- Client opens a file
- Server fetches information about the file from its disk, stores it in its memory, and gives the client a connection identifier unique to the client and the open file
- Server maintains request/file status in centralized tables: open/close file, data consistency, file-op. ordering, etc.
- The identifier is used for subsequent accesses until the session ends
- The server must reclaim the main-memory space used by clients who are no longer active
+ Performance! fewer disk accesses, shorter msgs
+ A stateful server knows if a file was opened for sequential access and can thus read ahead the next blocks; file locking; good cache predictions
+ caching (client and server) with concurrent-write invalidation (state info)
- Poor FT: server crashes require complex state recovery! Restore state by a recovery protocol based on a dialog with clients, or abort operations that were underway when the crash occurred
- The server needs to be aware of client failures in order to reclaim space allocated to record the state of crashed client processes (orphan detection and elimination)
Stateless File Server (NFS)
Acts on a per-request basis (client sends req. to server; server executes req. and replies; server "deletes" all info about the client post-request): no state information stored!
- Each request identifies the file and the position in the file
- No need to establish and terminate a connection via open and close operations
+ no server space used for file tables, etc.
+ FT: server crashes need no state recovery; client crashes have no effect on server consistency
- file sharing/locking
- long msgs. and network dependency
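The per-request model can be sketched as below; the request tuple format is invented for illustration:

```python
# Sketch of a stateless file server: every request carries the file name
# and the position in the file, so the server keeps no per-client tables
# between requests. The request format is illustrative.

FILES = {"notes.txt": b"hello stateless world"}

def handle(request):
    """Each request is self-describing: (op, file, offset, count)."""
    op, name, offset, count = request
    if op == "read":
        return FILES[name][offset:offset + count]
    raise ValueError("unknown op")

# The client, not the server, tracks its current position in the file:
print(handle(("read", "notes.txt", 0, 5)))   # b'hello'
print(handle(("read", "notes.txt", 6, 9)))   # b'stateless'
```

Because `handle` keeps nothing between calls, a server crash and restart is invisible to clients: they simply retry the same self-describing request.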
Distinctions Some environments require stateful service –A server employing server-initiated cache validation cannot provide stateless service, since it maintains a record of which files are cached by which clients –UNIX use of file descriptors and implicit offsets is inherently stateful; servers must maintain tables to map the file descriptors to inodes, and store the current offset within a file
Example: NFS: Stateless Server
No file open/close semantics visible to clients (unlike UNIX); file access is done using file handles + a locking protocol on the file for generation-# consistency
Session semantics: file attributes become known to other clients only at discrete client/server instances (at the time of a discrete READ/CLOSE)
Consistency processes: 4.2 BSD/NFS delayed write (x secs after write); NFS v3+ write-on-close
Statelessness gives nice FT for clients
- server crash: clients hang ("NFS server not responding" on mount or RPC service)
- ideally: no dedicated server
- reality: dedicated, customized fall-back server lists
ANDREW FS: Stateful Server
Very large systems (5000+ nodes)
Clients and servers are structured in clusters interconnected by a backbone LAN
- A cluster consists of a collection of workstations and a cluster server, and is connected to the backbone by a router
The key mechanism for remote file operations is whole-file caching from servers
- Opening a file causes it to be cached, in its entirety, on the local disk
- A client workstation interacts with Vice servers only during the opening and closing of files
- Files are modified locally and updated only on CLOSE
- Reading and writing bytes of a file are done by the kernel on the cached copy, without server intervention
- Contents of directories and symbolic links are cached too, for path-name translation
- Exceptions to the caching policy are modifications to directories, which are made directly on the server responsible for that directory
ANDREW (Cont.)
- Clients are presented with a partitioned space of file names: a local name space and a shared name space
- Dedicated servers present the shared name space to the clients as a homogeneous, identical, and location-transparent file hierarchy
- The local name space is the root file system of a workstation, from which the shared name space descends
- Fids are location transparent; therefore, file movements from server to server do not invalidate cached directory contents
- Location information is kept on a volume basis, and this information is replicated on each server
ANDREW Implementation
- Client processes are interfaced to a UNIX kernel with the usual set of system calls
- Venus carries out path-name translation component by component
- The UNIX file system is used as a low-level storage system for both servers and clients
- The client cache is a local directory on the workstation's disk
- Server processes access UNIX files directly by their inodes, to avoid the expensive path-name-to-inode translation routine