Presentation is loading. Please wait.

Presentation is loading. Please wait.

University of Pennsylvania 11/21/00CSE 3801 Distributed File Systems CSE 380 Lecture Note 14 Insup Lee.

Similar presentations


Presentation on theme: "University of Pennsylvania 11/21/00CSE 3801 Distributed File Systems CSE 380 Lecture Note 14 Insup Lee."— Presentation transcript:

1 University of Pennsylvania 11/21/00CSE 3801 Distributed File Systems CSE 380 Lecture Note 14 Insup Lee

2 University of Pennsylvania 11/21/00CSE 3802 Remote Files File service vs. file server –File service interface: the specification of what the file system offers to its clients. –File server: a process that runs on some machine and helps implement the file service. File Service Model (Fig 13-1) –upload/download model –remote access model –Comparison between two model The directory service –creating and deleting directories –naming and renaming files –moving files

3 University of Pennsylvania 11/21/00CSE 3803 Goals 1Network transparency: uses do not have to aware the location of files to access them location transparency: the name of a file does not reveal any kind of the file's physical storage location. –/server1/dir1/dir2/X –server1 can be moved anywhere (e.g., from CIS to SEAS). location independence: the name of a file does not need to be changed when the file's physical storage location changes. –The above file X cannot moved to server2 if server1 is full and server2 is no so full. 2High availability: system failures or scheduled activities such as backups, addition of nodes

4 University of Pennsylvania 11/21/00CSE 3804 Architecture Computation model –file severs -- machines dedicated to storing files and performing storage and retrieval operations (for high performance) –clients -- machines used for computational activities may have a local disk for caching remote files Two most important services –name server -- maps user specified names to stored objects, files and directories –cache manager -- to reduce network delay, disk delay problem: inconsistency Typical data access actions –open, close, read, write, etc.

5 University of Pennsylvania 11/21/00CSE 3805 Design Issues Naming and name resolution Semantics of file sharing (Fig 13-4, Fig 13-5) Stateless versus stateful servers (Fig 13-8) Caching -- where to store files (Fig 13-9) Cache consistency (Fig 13-11) Replication (Fig 13-12)

6 University of Pennsylvania 11/21/00CSE 3806 Naming and Name Resolution a name space -- collection of names name resolution -- mapping a name to an object –same or different view of a directory hierarchy (Fig. 13-3) 3 traditional ways to name files in a distributed environment –concatenate the host name to the names of files stored on that host: system-wide uniqueness guaranteed, simple to located a file; however, not network transparent, not location independent, e.g., /machine/usr/foo –mount remote directories onto local directories: once mounted, files can be referenced in a location-transparent manner –provide a single global directory: requires a unique file name for every file, location independent, cannot encompass heterogeneous environments and wide geographical areas

7 University of Pennsylvania 11/21/00CSE 3807 Semantics of File Sharing Consistency Semantics Problem (Fig 13-4): read after write Assume open; reads/writes; close 1UNIX semantics: value read is the value stored by last write Writes to an open file are visible immediately to others that have this file opened at the same time. Easy to implement if one server and no cache. 2Session semantics: Writes to an open file by a user is not visible immediately by other users that have files opened already. Once a file is closed, the changes made by it are visible by sessions started later. 3Immutable-Shared-Files semantics: A sharable file cannot be modified. File names cannot be reused and its contents may not be altered. Simple to implement. 4Transactions: All changes have all-or-nothing property. W1,R1,R2,W2 not allowed where P1 = W1;W2 and P2 = R1;R2

8 University of Pennsylvania 11/21/00CSE 3808 Stateful versus Stateless Service Two approaches to server-side information 1stateful file server –a client performs open on a file –the server keeps file information (e.g., file descriptor entry, offset) –Adv: increased performance –On server crash, it looses all its volatile state information –On client crash, the server needs to know to claim state space 2stateless file server -- each request is self-contained –each request identifies the file, the position, read/write. –server failure is identical to slow server (client retries...) –each request must be idempotent. –NFS employs this.

9 University of Pennsylvania 11/21/00CSE 3809 Caching Four places to store files (Fig. 13-9) server’s disk: slow performance server caching: in main memory –cache management issue, how much to cache, replacement strategy –still slow due to network delay –Used in high-performance web-search engine servers client caching in main memory –can be used by diskless workstation –faster to access from main memory than disk –compete with the virtual memory system for physical memory space –Three options (Fig. 13-10) client-cache on a local disk –large files can be cached –the virtual memory management is simpler –a workstation can function even when it is disconnected from the network

10 University of Pennsylvania 11/21/00CSE 38010 A Comparison of Caching and Remote Service 1reduces remote accesses (esp, when locality is capitalized)  reduces network traffic and server load 2total network overhead is lower for big chunks of data (caching) than a series of responses to specific requests. 3disk access can be optimized better for large requests than random disk blocks 4cache-consistency problem is the major drawback. If there are frequent writes, overhead due to the consistency problem is significant. 5OS is simpler for remote service.

11 University of Pennsylvania 11/21/00CSE 38011 Cache Consistency Reflecting changes to local cache to master copy Reflecting changes to master copy to local caches update Copy 1 Copy 2 Master copy write

12 University of Pennsylvania 11/21/00CSE 38012 Update algorithms for client caching write-through: all writes are carried out immediately –Reliable: little information is lost in the event of a client crash –Slow: cache not that useful delayed-write: delays writing at the server –possible to perform many writes to a block in the cache before it is written –if data is written and then deleted immediately, data need not be written at all (20-30 % of new data is deleted with 30 secs) write-on-close: delay writing until the file is closed at the client –if file is open for short duration, works fine –if file is open for long, susceptible to losing data in the event of client crash

13 University of Pennsylvania 11/21/00CSE 38013 Cache Coherence How to maintain consistency between locally cached data with the master data when the data has been modified by another client? 1Client-initiated approach -- check validity on every access: too much overhead first access to a file (e.g., file open) every fixed time interval 2Server-initiated approach -- server records, for each client, the (parts of) files it caches. After the server detects a potential inconsistency, it reacts. 3Not allow caching when concurrent-write sharing occurs. Allow many readers. If a client opens for writing, inform all the clients to purge their cached data.

14 University of Pennsylvania 11/21/00CSE 38014 Cache consistency, cont. Potential inconsistency: –In session semantics, a client closes a modified file. –In UNIX semantics, the server must be notified whenever a file is opened and the intended mode (read or write mode) must be indicated for every open. –Disable cache when a file is opened in conflicting modes.

15 University of Pennsylvania 11/21/00CSE 38015 Replication Reasons: –Increase reliability –improve availability –balance the servers’ workload how to make replication transparent (Fig. 13-12) how to keep the replicas consistent –Problems -- mainly with updates 1a replica is not updated due to its server failure 2network partitioned Replication Management: 1weighted vote for read and write 2current synchronization site for each file group to control access

16 University of Pennsylvania 11/21/00CSE 38016 Current research issues Scalability Mobile Users disconnected operation low bandwidth communication Security


Download ppt "University of Pennsylvania 11/21/00CSE 3801 Distributed File Systems CSE 380 Lecture Note 14 Insup Lee."

Similar presentations


Ads by Google