CS-550: Distributed File Systems [SiS]
Resource Management in Distributed Systems: Distributed File Systems


Distributed File Systems
Definition: Implement a common file system that can be shared by all autonomous computers in a distributed system
Goals:
    Network transparency
    High availability
Architectural options:
    Fully distributed: files distributed to all sites
        – Issues: performance, implementation complexity
    Client-server model:
        – File server: dedicated sites that store files and perform storage and retrieval operations
        – Client: the rest of the sites use servers to access files

[Figure: Distributed File Systems: Client-Server Architecture]

Distributed File Systems Services
Services provided by the distributed file system:
(1) Name server: Maps (resolves) the names supplied by clients to objects (files and directories)
    Name resolution takes place when a process attempts to access a file or directory for the first time.
(2) Cache manager: Improves performance through file caching
    Caching at the client: when a client references a file at the server:
        – A copy of the data is brought from the server to the client machine
        – Subsequent accesses are done locally at the client
    Caching at the server:
        – The file is kept in memory to reduce subsequent access time
    * Issue: different cached copies can become inconsistent. Cache managers (at server and clients) have to provide coordination.

[Figure: Typical Data Access in a Client/File Server Architecture]

Mechanisms used in distributed file systems
(1) Mounting
    The mount mechanism binds together several filename spaces (collections of files and directories) into a single hierarchically structured name space (example: UNIX and its derivatives)
    A name space ‘A’ can be mounted (bound) at an internal node (mount point) of a name space ‘B’
    Implementation: the kernel maintains the mount table, which maps mount points to storage devices
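The mount-table lookup described above can be sketched as a longest-prefix match. This is only an illustration, not how any particular kernel implements it: the table contents, the server names (`serverX`, `serverY`) and the `resolve` helper are all hypothetical, and paths are assumed to be absolute.

```python
# Hypothetical mount table: mount point -> (server, exported root path).
MOUNT_TABLE = {
    "/": ("local", "/"),
    "/usr/shared": ("serverX", "/export/shared"),
    "/usr/shared/docs": ("serverY", "/export/docs"),
}


def resolve(path, table=MOUNT_TABLE):
    """Map an absolute path to (server, path on that server).

    The deepest (longest) mount point that is a prefix of the path wins,
    which is what lets one name space be mounted inside another.
    """
    best = None
    for mount_point in table:
        prefix = mount_point.rstrip("/")
        if path == mount_point or path.startswith(prefix + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    server, remote_root = table[best]
    rest = path[len(best.rstrip("/")):]  # part of the path below the mount point
    return server, (remote_root.rstrip("/") + rest) or "/"
```

For example, `resolve("/usr/shared/docs/a.txt")` selects the `/usr/shared/docs` entry rather than `/usr/shared`, because the deeper mount point hides the directory underneath it.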

Mechanisms used in distributed file systems (cont.)
(1) Mounting (cont.)
    Location of mount information
    a. Mount information maintained at clients
        – Each client mounts every file system
        – Different clients may not see the same filename space
        – If files move to another server, every client needs to update its mount table
        – Example: Sun NFS
    b. Mount information maintained at servers
        – Every client sees the same filename space
        – If files move to another server, only the mount information at the server needs to change
        – Example: Sprite File System

Mechanisms used in distributed file systems (cont.)
(2) Caching
    – Improves file system performance by exploiting locality of reference
    – When a client references a remote file, the file is cached in the main memory of the server (server cache) and at the client (client cache)
    – When multiple clients modify shared (cached) data, cache consistency becomes a problem
    – It is very difficult to implement a solution that guarantees consistency
(3) Hints
    – Treat the cached data as hints, i.e. cached data may not be completely accurate
    – Can be used by applications that can discover that the cached data is invalid and can recover
    Example:
        – After the name of a file is mapped to an address, that address is stored as a hint in the cache
        – If the address later fails, it is purged from the cache
        – The name server is consulted to provide the actual location of the file and the cache is updated
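The hint example above can be sketched as follows. The `NameServer` and `HintCache` classes and the `fetch` callback are hypothetical names introduced for illustration. The sketch assumes that using a wrong address fails detectably (here, by raising `LookupError`), which is the property that makes a hint safe to use.

```python
class NameServer:
    """Stand-in for the authoritative name server (hypothetical)."""

    def __init__(self, locations):
        self.locations = dict(locations)  # file name -> server address
        self.lookups = 0                  # counts how often we are consulted

    def lookup(self, name):
        self.lookups += 1
        return self.locations[name]


class HintCache:
    """Client-side cache of name -> address mappings, treated as hints."""

    def __init__(self, name_server, fetch):
        self.name_server = name_server
        self.fetch = fetch  # fetch(address, name) -> data; LookupError if wrong
        self.hints = {}

    def read(self, name):
        address = self.hints.get(name)
        if address is not None:
            try:
                return self.fetch(address, name)  # hint was still valid
            except LookupError:
                del self.hints[name]              # hint failed: purge it
        address = self.name_server.lookup(name)   # ask for the actual location
        self.hints[name] = address                # update the cache
        return self.fetch(address, name)
```

The name server is contacted only on the first access and after a hint turns out to be wrong, so a stale hint costs one failed request rather than an incorrect result.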

Mechanisms used in distributed file systems (cont.)
(4) Bulk data transfer
    – Observations:
        Overhead introduced by protocols does not depend on the amount of data transferred in one transaction
        Most files are accessed in their entirety
    – Common practice: when a client requests one block of data, multiple consecutive blocks are transferred
(5) Encryption
    – Encryption is needed to provide security in distributed systems
    – Entities that need to communicate send a request to an authentication server
    – The authentication server provides a key for the conversation
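A minimal sketch of bulk data transfer as read-ahead: when the client misses on one block, it asks for that block plus the next few in a single request. The `Server` and `Client` classes and the constants `BLOCK_SIZE` and `READ_AHEAD` are assumptions made for the example, with deliberately tiny values.

```python
BLOCK_SIZE = 4   # bytes per block (tiny, for illustration)
READ_AHEAD = 3   # extra consecutive blocks sent with each request


class Server:
    def __init__(self, data):
        self.data = data
        self.requests = 0  # each request pays the fixed protocol overhead once

    def read_blocks(self, first, count):
        """Return the existing blocks among first .. first+count-1."""
        self.requests += 1
        blocks = {}
        for i in range(first, first + count):
            chunk = self.data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            if chunk:
                blocks[i] = chunk
        return blocks


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # block number -> bytes

    def read_block(self, i):
        if i not in self.cache:
            # One request fetches the wanted block and its successors.
            self.cache.update(self.server.read_blocks(i, 1 + READ_AHEAD))
        return self.cache.get(i, b"")
```

Reading a 6-block file sequentially with these settings takes two requests instead of six, which is the saving the observations above point to.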

Design Issues
1. Naming and name resolution
    – Terminology
        Name: each object in a file system (file, directory) has a unique name
        Name resolution: mapping a name to an object or to multiple objects (replication)
        Name space: collection of names with or without the same resolution mechanism
    – Approaches to naming files in a distributed system
        (a) Concatenate the name of the host to the names of files on that host
            – Advantage: unique filenames, simple resolution
            – Disadvantages:
                » Conflicts with network transparency
                » Moving a file to another host requires changing its name and the applications using it
        (b) Mount remote directories onto local directories
            – Requires that the host of the remote directory is known
            – After mounting, files are referenced in a location-transparent way (i.e., the file name does not reveal its location)
        (c) Have a single global directory
            – All files belong to a single name space
            – Limitation: having unique system-wide filenames requires a single computing facility or cooperating facilities

Design Issues (cont.)
1. Naming and name resolution (cont.)
    – Contexts
        Solve the problem of system-wide unique names by partitioning a name space into contexts (geographical, organizational, etc.)
        Name resolution is done within that context
        Interpretation may lead to another context
        File name = context + name local to context
    – Name server
        Process that maps file names to objects (files, directories)
        Implementation options
        – Single name server
            » Simple implementation, reliability and performance issues
        – Several name servers (on different hosts)
            » Each server is responsible for a domain
            » Example: client requests access to file ‘A/B/C’
                Local name server looks up a table (in kernel)
                Local name server points to a remote server for the ‘/B/C’ mapping
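The ‘A/B/C’ example can be sketched as a chain of contexts, where each context resolves only the first component of the name it is given and hands the remainder to the next context. The `Context` class and the labels such as `A@serverX` are hypothetical, and the sketch glosses over the fact that contexts on different hosts would be reached by a network request.

```python
class Context:
    """A name-resolution context; in practice it may live on another host."""

    def __init__(self, label):
        self.label = label
        self.table = {}  # component -> ("object", obj) or ("context", Context)

    def bind(self, component, obj):
        self.table[component] = ("object", obj)

    def delegate(self, component, context):
        self.table[component] = ("context", context)

    def resolve(self, path, visited=None):
        """Return (object, labels of the contexts consulted, in order)."""
        visited = (visited or []) + [self.label]
        head, _, rest = path.strip("/").partition("/")
        kind, target = self.table[head]
        if kind == "context":
            # Interpretation leads to another context for the remainder.
            return target.resolve(rest, visited)
        if rest:
            raise KeyError(f"{head!r} is not a context, cannot resolve {rest!r}")
        return target, visited


local = Context("local")
ctx_a = Context("A@serverX")
ctx_b = Context("B@serverX")
local.delegate("A", ctx_a)   # local table only knows who handles 'A'
ctx_a.delegate("B", ctx_b)
ctx_b.bind("C", "file-C")
```

Here `local.resolve("A/B/C")` consults the local table for `A`, then lets the remote contexts map the remaining `B/C`.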

Design Issues (cont.)
2. Caching
    – Caching at the client: main memory vs. disk
        Main memory: (+) fast, (+) works for diskless clients, (-) expensive memory, (-) complex virtual memory management
        Disk: (+) large files, (+) simpler virtual memory management, (-) requires local disk
    – Cache consistency
        Server initiated
            – Server informs cache managers when data in client caches is stale
            – Client cache managers invalidate stale data or retrieve new data
            – Disadvantage: extensive communication
        Client initiated
            – Cache managers at the clients validate data with the server before returning it to clients
            – Disadvantage: extensive communication
        Prohibit file caching under concurrent-write sharing
            – Several clients open a file, at least one of them for writing
            – Server informs all clients to purge that cached file
        Lock files under concurrent-write sharing (at least one client opens the file for writing)
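A minimal sketch of the server-initiated approach, assuming whole-file caching: the server remembers which clients hold a copy and tells them to invalidate it when another client writes. The `Server` and `Client` classes are hypothetical and the "messages" are plain method calls.

```python
class Server:
    def __init__(self):
        self.files = {}    # name -> data
        self.cachers = {}  # name -> set of clients holding a cached copy

    def read(self, name, client):
        self.cachers.setdefault(name, set()).add(client)
        return self.files[name]

    def write(self, name, data, writer):
        self.files[name] = data
        # Inform every other cache manager that its copy is now stale.
        for client in self.cachers.get(name, set()) - {writer}:
            client.invalidate(name)
        self.cachers[name] = {writer}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.read(name, self)
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data
        self.server.write(name, data, self)

    def invalidate(self, name):
        self.cache.pop(name, None)  # drop the stale copy; refetch on next read
```

The per-file set of caching clients is server state that grows with the number of clients, which is one way to see the communication cost listed above.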

Design Issues (cont.)
3. Writing policy
    – Question: once a client writes into a file (and the local cache), when should the modified cache be sent to the server?
    – Options:
        Write-through: all writes at the clients are immediately transferred to the servers
            – Advantage: reliability
            – Disadvantage: performance; it does not take advantage of the cache
        Delayed writing: delay transfer to servers
            – Advantages:
                » Many writes take place (including intermediate results) before a transfer
                » Some data may be deleted
            – Disadvantage: reliability
        Delayed writing until file is closed at client
            – For short open intervals, same as delayed writing
            – For long intervals, reliability problems
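The three options can be compared by counting transfers to the server. This is a sketch under simplifying assumptions: the whole file is one cached value, `tick()` stands in for a hypothetical periodic flush timer, and all class and function names are made up for the example.

```python
class Server:
    def __init__(self):
        self.files = {}
        self.transfers = 0  # number of client-to-server transfers

    def store(self, name, data):
        self.files[name] = data
        self.transfers += 1


class CachedFile:
    POLICIES = ("write-through", "delayed", "on-close")

    def __init__(self, server, name, policy):
        if policy not in self.POLICIES:
            raise ValueError(policy)
        self.server, self.name, self.policy = server, name, policy
        self.data = b""
        self.dirty = False

    def write(self, data):
        self.data = data      # always update the local cache
        self.dirty = True
        if self.policy == "write-through":
            self._flush()     # every write goes to the server immediately

    def tick(self):
        if self.policy == "delayed":
            self._flush()     # periodic flush of whatever is dirty

    def close(self):
        self._flush()         # nothing stays dirty after close

    def _flush(self):
        if self.dirty:
            self.server.store(self.name, self.data)
            self.dirty = False


def run(policy):
    """Three writes, one timer tick, then close.

    Returns (transfers before close, transfers after close).
    """
    server = Server()
    f = CachedFile(server, "f", policy)
    for version in (b"v1", b"v2", b"v3"):
        f.write(version)
    f.tick()
    before_close = server.transfers
    f.close()
    return before_close, server.transfers
```

With write-through every write is a transfer. With the other two policies the intermediate versions never reach the server, but a client crash before the flush loses them.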

Design Issues (cont.)
4. Availability
    – Issue: what is the level of availability of files in a distributed file system?
    – Resolution: use replication to increase availability, i.e. many copies (replicas) of files are maintained at different sites/servers
    – Replication issues:
        How to keep replicas consistent
        How to detect inconsistency among replicas
    – Unit of replication
        File
        Group of files
            a) Volume: group of all files of a user or group, or all files in a server
                » Advantage: ease of implementation
                » Disadvantage: wasteful, a user may need only a subset replicated
            b) Primary pack vs. pack
                » Primary pack: all files of a user
                » Pack: subset of the primary pack. Each pack can receive a different degree of replication

Design Issues (cont.)
5. Scalability
    – Issue: can the design support a growing system?
    – Example: with server-initiated cache invalidation, complexity and load grow with the size of the system. Possible solutions:
        Do not provide cache invalidation service for read-only files
        Provide a design that allows users to share cached data
    – Design file servers for scalability: threads, SMPs, clusters
6. Semantics
    – Expected semantics: a read will return the data stored by the latest write
    – Possible options:
        All reads and writes go through the server
            – Disadvantage: communication overhead
        Use of a lock mechanism
            – Disadvantage: file not always available

Case Studies: The Sun Network File System (NFS)
Developed by Sun Microsystems to provide a distributed file system independent of the hardware and operating system
Architecture
    – Virtual File System (VFS): file system interface that allows NFS to support different file systems
    – Requests for operations on remote files are routed by VFS to NFS
    – Requests are sent to the VFS on the remote server using
        The remote procedure call (RPC), and
        The external data representation (XDR)
    – VFS on the remote server initiates the file system operation locally
    – Vnode (virtual node):
        There is a network-wide vnode for every object in the file system (file or directory), the equivalent of the UNIX inode
        A vnode has a mount table, allowing any node to be a mount node

[Figure: Case Studies: NFS Architecture]

NFS (cont.)
Naming and location:
    – Workstations are designated as clients or file servers
    – A client defines its own private file system by mounting a subdirectory of a remote file system on its local file system
    – Each client maintains a table which maps the remote file directories to servers
    – Mapping a filename to an object is done the first time a client references the file. Example:
        Filename: /A/B/C
        Assume ‘A’ corresponds to ‘vnode1’
        Lookup on ‘vnode1/B’ returns ‘vnode2’ for ‘B’, where ‘vnode2’ indicates that the object is on server ‘X’
        Client asks server ‘X’ to look up ‘vnode2/C’
        A ‘file handle’ is returned to the client by the server storing that file
        Client uses the ‘file handle’ for all subsequent operations on that file
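The /A/B/C example can be sketched as a component-by-component lookup. The tables below are hypothetical and mirror only the slide's example. As a simplification, the final `Vnode` stands in for the file handle, and the per-server dictionaries stand in for what would be separate lookup requests to each server.

```python
from typing import NamedTuple


class Vnode(NamedTuple):
    server: str  # which machine holds the object
    ident: int   # identifier of the object on that machine


# Hypothetical directory contents, per server:
# (directory vnode id, component name) -> Vnode of the entry.
DIRECTORIES = {
    "local": {(1, "B"): Vnode("X", 2)},  # vnode1/B -> vnode2, which is on server X
    "X": {(2, "C"): Vnode("X", 3)},      # vnode2/C -> the file itself
}

# 'A' corresponds to vnode1 on the client.
ROOTS = {"A": Vnode("local", 1)}


def lookup(vnode, name):
    """One lookup step, carried out by the machine that holds the directory."""
    return DIRECTORIES[vnode.server][(vnode.ident, name)]


def resolve(path):
    """Resolve a path one component at a time; the result plays the file handle."""
    first, *rest = [c for c in path.split("/") if c]
    vnode = ROOTS[first]
    for name in rest:
        vnode = lookup(vnode, name)
    return vnode
```

After `handle = resolve("/A/B/C")`, the client would pass `handle` in every later request instead of resolving the name again.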

NFS (cont.)
Caching:
    – Caching is done in the main memory of clients
    – Caching is done for: file blocks, translation of filenames to vnodes, and attributes of files and directories
    (1) Caching of file blocks
        Cached on demand with the time stamp of the file (when last modified on the server)
        Entire file cached, if under a certain size, with the time stamp of when it was last modified
        After a certain age, blocks have to be validated with the server
        Delayed writing policy: modified blocks are flushed to the server after a certain delay
    (2) Caching of filenames to vnodes for remote directory names
        Speeds up the lookup procedure
    (3) Caching of file and directory attributes
        Updated when new attributes are received from the server, discarded after a certain time
Stateless server
    – Servers are stateless
        File access requests from clients contain all needed information (pointer position, etc.)
        Servers have no record of past requests
    – Simple recovery from crashes
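A sketch combining two ideas from this slide: a stateless read, where each request carries the file handle, offset and count, and age-based validation of cached blocks against the server's last-modified time stamp. It is not the NFS protocol itself. The class names, `get_mtime`, the logical clock on the server and the injectable `now` function are all assumptions made to keep the example self-contained.

```python
class Server:
    """Stateless: a request names the file and position; nothing is remembered."""

    def __init__(self):
        self.files = {}  # file handle -> (data, last-modified stamp)
        self.clock = 0   # logical modification counter

    def write(self, handle, data):
        self.clock += 1
        self.files[handle] = (data, self.clock)

    def get_mtime(self, handle):
        return self.files[handle][1]

    def read(self, handle, offset, count):
        data, mtime = self.files[handle]
        return data[offset:offset + count], mtime


class Client:
    def __init__(self, server, max_age, now):
        self.server = server
        self.max_age = max_age  # age after which a cached block must be validated
        self.now = now          # function returning the current time
        self.cache = {}         # (handle, offset, count) -> (data, mtime, checked_at)

    def read(self, handle, offset, count):
        key = (handle, offset, count)
        entry = self.cache.get(key)
        if entry is not None:
            data, mtime, checked_at = entry
            if self.now() - checked_at < self.max_age:
                return data  # young enough: served without contacting the server
            if self.server.get_mtime(handle) == mtime:
                self.cache[key] = (data, mtime, self.now())  # validated
                return data
        data, mtime = self.server.read(handle, offset, count)
        self.cache[key] = (data, mtime, self.now())
        return data
```

Because the server keeps no per-client state, restarting it loses nothing that a client depends on. The price is that a client can return stale data until its cached block reaches the validation age.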