Published by Gwen Gordon. Modified over 9 years ago.
1
Distributed File Systems Overview
A file system is an abstract data type: an abstraction of a storage device. A distributed file system is available to processes throughout a distributed system and is often based on a central server model (e.g., the Network File System, NFS).
2
Desirable features
1. Transparency: structure, access, naming, and replication transparency
2. User mobility
3. Performance
4. Simplicity and ease of use
3
Desirable features, cont.
5. Scalability
6. High availability
7. High reliability
8. Data integrity
9. Security
10. Heterogeneity
4
File characteristics
Files can be unstructured or structured.
Files can be mutable (changeable) or immutable. An immutable file cannot be changed; saving it creates a new version instead.
When a file is accessed, every access can go back to the file server, or the data can be cached (at the client, at the server, or at another client in the system).
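A minimal Python sketch of the immutable case, assuming a hypothetical in-memory `ImmutableFileStore` (not part of NFS or any real file system): saving never overwrites a stored version, it appends a new one, and readers can ask for the latest or any earlier version.

```python
from __future__ import annotations


class ImmutableFileStore:
    """Hypothetical store: a saved file version is never modified in place."""

    def __init__(self) -> None:
        self._versions: dict[str, list[bytes]] = {}

    def save(self, name: str, data: bytes) -> int:
        """Store data as a new version of `name` and return its 1-based number."""
        history = self._versions.setdefault(name, [])
        history.append(bytes(data))  # bytes() snapshots mutable input such as bytearray
        return len(history)

    def read(self, name: str, version: int | None = None) -> bytes:
        """Return the requested version, or the latest one if version is None."""
        history = self._versions[name]
        return history[-1] if version is None else history[version - 1]


store = ImmutableFileStore()
store.save("report.txt", b"draft")
store.save("report.txt", b"final")
print(store.read("report.txt"))     # b'final'
print(store.read("report.txt", 1))  # b'draft'
```

Because no version ever changes after it is saved, any copy of a version can be cached anywhere without a consistency problem.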
5
Unit of transfer
The unit of data transfer when a server satisfies a request can be:
- a whole file
- a number of blocks of a file
- a specific number of bytes
- a number of records
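With block-level transfer, a byte-range request is rounded out to whole blocks. The sketch below computes which blocks a request touches; the function name `blocks_for_range` and the 4096-byte block size are illustrative assumptions, not taken from any particular file system.

```python
def blocks_for_range(offset: int, length: int, block_size: int = 4096) -> range:
    """Indices of the fixed-size blocks that cover bytes [offset, offset + length)."""
    if offset < 0 or length < 0 or block_size <= 0:
        raise ValueError("need offset >= 0, length >= 0, block_size > 0")
    if length == 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size  # block holding the final byte
    return range(first, last + 1)


# Reading 200 bytes starting at offset 4000 straddles a block boundary,
# so a block-level server must ship two whole blocks (8192 bytes).
blocks = blocks_for_range(4000, 200)
print(list(blocks))        # [0, 1]
print(len(blocks) * 4096)  # 8192
```

A byte-level transfer would move only the 200 bytes requested, while a whole-file transfer would move the entire file regardless of the range.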
6
File sharing semantics
1. Unix semantics: Enforce an absolute time ordering on all operations and ensure that every read operation on a file immediately sees the effects of all previous write operations. This is the norm on single-processor systems, but difficult to achieve in a distributed file system.
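With a single authoritative copy, Unix semantics is straightforward: a lock serializes operations, so each read sees every earlier write. The hypothetical `UnixSemanticsFile` below sketches only that single-copy case; in a distributed file system the difficulty is that client caches create extra copies the lock knows nothing about.

```python
import threading


class UnixSemanticsFile:
    """Hypothetical single authoritative copy; a lock totally orders operations."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()

    def write(self, offset: int, data: bytes) -> None:
        with self._lock:
            end = offset + len(data)
            if end > len(self._data):  # grow the file, zero-filling any gap
                self._data.extend(bytes(end - len(self._data)))
            self._data[offset:end] = data

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


f = UnixSemanticsFile()
f.write(0, b"hello")
f.write(0, b"J")
print(f.read(0, 5))  # b'Jello': the read sees both earlier writes
```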
7
File sharing semantics, cont.
2. Session semantics: A session is the series of file accesses between an open and the matching close. A process opens a file, performs read/write operations on it, and closes it; its changes become visible to other processes only after the close. This raises questions about how to handle concurrent reads and writes by multiple processes. The file-level (whole-file) transfer model should be used.
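A minimal sketch of session semantics, assuming hypothetical `SessionServer` and `Session` classes and whole-file transfer: each open gets a private copy, writes stay inside the session, and close ships the whole file back. How to resolve several writers is left open by the slide; this sketch assumes "last close wins", which is only one possible policy.

```python
from __future__ import annotations


class SessionServer:
    """Hypothetical server that holds the master copy of each file."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def open(self, name: str) -> Session:
        # Whole-file transfer: the session starts from a private copy.
        return Session(self, name, self.files.get(name, b""))


class Session:
    def __init__(self, server: SessionServer, name: str, data: bytes) -> None:
        self._server = server
        self._name = name
        self._data = data

    def read(self) -> bytes:
        return self._data

    def write(self, data: bytes) -> None:
        self._data = data  # visible only within this session until close()

    def close(self) -> None:
        # Ship the whole file back. Sessions opened later see the new contents;
        # sessions already open keep their own copy. If several sessions wrote
        # the file, the last one to close wins (an assumed policy).
        self._server.files[self._name] = self._data


server = SessionServer()
server.files["notes"] = b"v0"
a, b = server.open("notes"), server.open("notes")
a.write(b"from A")
print(b.read())                     # b'v0': A's write is not yet visible
a.close()
print(server.open("notes").read())  # b'from A'
print(b.read())                     # b'v0': B still sees its own session copy
```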
8
File sharing semantics, cont.
3. Immutable shared-file semantics: A file cannot be modified, so the problem of when to make writes visible does not exist.
9
File sharing semantics, cont.
4. Transaction-like semantics: Begin a transaction, perform operations, end the transaction. The final file content is the same as if all transactions had run in some sequential order.
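One way to get a "some sequential order" outcome for a single file is optimistic concurrency control: each transaction works on a private copy and commits only if nobody else committed since it began; otherwise it retries. The slide does not prescribe this technique; the classes `TransactionalFile`, `Transaction`, and the helper `append_line` below are hypothetical.

```python
from __future__ import annotations

import threading


class VersionConflict(Exception):
    """Raised when another transaction committed after this one began."""


class TransactionalFile:
    """Hypothetical file whose updates happen only through transactions."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data
        self.version = 0
        self._lock = threading.Lock()

    def begin(self) -> Transaction:
        with self._lock:  # read data and version together
            return Transaction(self, self.data, self.version)

    def _commit(self, data: bytes, base_version: int) -> None:
        with self._lock:  # check-and-install must be atomic
            if self.version != base_version:
                raise VersionConflict("file changed since begin(); retry")
            self.data = data
            self.version += 1


class Transaction:
    def __init__(self, f: TransactionalFile, data: bytes, version: int) -> None:
        self._file = f
        self.data = data  # private working copy
        self._base_version = version

    def commit(self) -> None:
        self._file._commit(self.data, self._base_version)


def append_line(f: TransactionalFile, line: bytes, max_retries: int = 3) -> None:
    """Run one transaction that appends a line, retrying on conflict."""
    for _ in range(max_retries):
        txn = f.begin()
        txn.data += line + b"\n"
        try:
            txn.commit()
            return
        except VersionConflict:
            continue  # someone else committed first; redo against the new state
    raise VersionConflict("gave up after retries")


f = TransactionalFile()
t1, t2 = f.begin(), f.begin()
t1.data += b"a\n"
t2.data += b"b\n"
t1.commit()
try:
    t2.commit()
except VersionConflict:
    print("t2 aborted")
append_line(f, b"b")  # rerun t2's work against the committed state
print(f.data)         # b'a\nb\n': same result as running t1, then t2
```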
10
File caching schemes
Decisions must be made about:
- cache location
- modification propagation
- cache validation
11
Cache location choices
1. Server's main memory
Cost of a cache miss = disk access + network transfer.
Cost of a cache hit = network transfer only.
Multiple accesses can be easily synchronized to support Unix-like file sharing semantics.
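The two costs above combine into an expected cost per access. With hit ratio h, network transfer time t_net, and disk access time t_disk (symbols introduced here for illustration), the expectation is h·t_net + (1 − h)·(t_disk + t_net), which simplifies to t_net + (1 − h)·t_disk: every access pays for the network, and only misses pay for the disk. A small Python sketch of that arithmetic:

```python
def expected_server_cache_cost(hit_ratio: float, t_net: float, t_disk: float) -> float:
    """Expected cost per access when the cache is in the server's main memory."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hit_cost = t_net            # hit: network transfer only
    miss_cost = t_disk + t_net  # miss: disk access plus network transfer
    return hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost


# Illustrative, unmeasured numbers: 1 time unit per network transfer,
# 10 per disk access, 75% hit ratio.
print(expected_server_cache_cost(0.75, 1.0, 10.0))  # 3.5
```

Because the network term never disappears, a server-memory cache cannot beat the cost of a single network transfer; that floor is what client-side caches try to remove.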
12
Cache location choices, cont.
2. Client's disk
Cost of a cache hit = time to access the local disk. With today's fast networks, this may not be faster than getting the data from the server's memory! (Or it could be, with a very fast disk subsystem.)
Advantages: lots of storage.
Disadvantages: does not work on diskless workstations, and cache consistency must be managed.
13
Cache location choices, cont.
3. Client's main memory
Very fast, but not very reliable. If the client crashes, partial updates are lost (not what users usually expect of secondary storage). Also, memory may not be large enough to hold the cached data.
14
Modification propagation
When should modifications to cached data be written back to the server? How should the validity of cached data be determined?
1. Write through
2. Delayed write, in one of these variants:
   - write on ejection from the cache
   - periodic write
   - write on close
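The difference between write through and delayed write can be sketched by counting messages to the server. The `CachedFile` class and `Policy` enum below are hypothetical; only write on close is wired up, but an eviction handler (write on ejection) or a timer (periodic write) could call the same `flush()` method.

```python
from __future__ import annotations

from enum import Enum


class Policy(Enum):
    WRITE_THROUGH = "write-through"
    WRITE_ON_CLOSE = "write-on-close"  # one of the delayed-write variants


class CachedFile:
    """Hypothetical client-side cached copy of one file."""

    def __init__(self, server: dict[str, bytes], name: str, policy: Policy) -> None:
        self.server = server  # stands in for the remote file server
        self.name = name
        self.policy = policy
        self.data = server.get(name, b"")
        self.dirty = False
        self.server_writes = 0  # messages sent to the server

    def write(self, data: bytes) -> None:
        self.data = data
        self.dirty = True
        if self.policy is Policy.WRITE_THROUGH:
            self.flush()  # propagate every write immediately

    def flush(self) -> None:
        """Send the cached contents to the server if they have changed."""
        if self.dirty:
            self.server[self.name] = self.data
            self.server_writes += 1
            self.dirty = False

    def close(self) -> None:
        self.flush()  # write on close


server: dict[str, bytes] = {}
for policy in Policy:
    f = CachedFile(server, policy.value, policy)
    for i in range(3):
        f.write(bytes([i]))
    f.close()
    print(policy.value, f.server_writes)
# write-through 3
# write-on-close 1
```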
15
Delayed write
Delayed write can help performance:
- Write accesses by the client complete more quickly, because a write only updates the client's cache; the client does not wait for the server or contend with other clients.
- Modified data may be deleted before it actually has to be written to the server, so it is never sent at all.
- Gathering all modified data and sending it to the server in a single operation is more efficient for both the disk and the network.
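The last two benefits can be seen in a hypothetical client-side `DelayedWriteBuffer` that keeps dirty blocks until a flush: rewrites of the same block coalesce, blocks of a deleted file are simply dropped, and everything left goes out in one batched message. The block-keyed layout is an assumption made for illustration.

```python
from __future__ import annotations


class DelayedWriteBuffer:
    """Hypothetical client-side buffer of dirty blocks awaiting write-back."""

    def __init__(self) -> None:
        self.dirty: dict[tuple[str, int], bytes] = {}  # (file, block index) -> data
        self.messages_sent = 0

    def write_block(self, name: str, index: int, data: bytes) -> None:
        # Rewriting a dirty block replaces it; only the final contents are sent.
        self.dirty[(name, index)] = data

    def delete_file(self, name: str) -> None:
        # Dirty blocks of a deleted file never have to reach the server.
        for key in [k for k in self.dirty if k[0] == name]:
            del self.dirty[key]

    def flush(self, server: dict[tuple[str, int], bytes]) -> None:
        """Send all pending blocks to the server in one batched operation."""
        if self.dirty:
            server.update(self.dirty)
            self.messages_sent += 1
            self.dirty.clear()


buf = DelayedWriteBuffer()
server: dict[tuple[str, int], bytes] = {}
for i in range(5):
    buf.write_block("log", 0, bytes([i]))  # five rewrites of the same block
buf.write_block("log", 1, b"x")
buf.write_block("tmp", 0, b"scratch")      # temporary file...
buf.delete_file("tmp")                     # ...deleted before write-back
buf.flush(server)
print(len(server), buf.messages_sent)      # 2 1
```

The application issued seven block writes; write through would have sent seven messages, while this sketch sends one.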
16
Cache validation schemes
How can a client tell whether a cache entry is still valid?
1. Client-initiated approach
   - Check before every access (this defeats the purpose of caching!).
   - Check periodically.
   - Check on file open. This is good for session semantics; for example, write-on-close and check-on-open can be used together.
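A minimal check-on-open sketch, assuming a hypothetical `FileServer` that keeps a per-file version number (standing in for a modification timestamp) and a hypothetical `CachingClient`: on each open the client asks only for the version and refetches the file only if its cached version is stale.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FileCopy:
    data: bytes
    version: int  # bumped on every update; stands in for a modification time


class FileServer:
    """Hypothetical server; counts how often clients ask for validation."""

    def __init__(self) -> None:
        self.files: dict[str, FileCopy] = {}
        self.validation_requests = 0

    def update(self, name: str, data: bytes) -> None:
        old = self.files.get(name)
        self.files[name] = FileCopy(data, 0 if old is None else old.version + 1)

    def get_version(self, name: str) -> int:
        self.validation_requests += 1
        return self.files[name].version

    def fetch(self, name: str) -> FileCopy:
        current = self.files[name]
        return FileCopy(current.data, current.version)


class CachingClient:
    def __init__(self, server: FileServer) -> None:
        self.server = server
        self.cache: dict[str, FileCopy] = {}
        self.fetches = 0

    def open(self, name: str) -> bytes:
        cached = self.cache.get(name)
        # Check on open: a cheap version query instead of re-reading the file.
        if cached is None or cached.version != self.server.get_version(name):
            cached = self.server.fetch(name)
            self.cache[name] = cached
            self.fetches += 1
        return cached.data


server = FileServer()
server.update("f", b"v1")
client = CachingClient(server)
client.open("f")          # miss: fetch
client.open("f")          # cache still valid: no fetch
server.update("f", b"v2")
print(client.open("f"))   # b'v2': stale entry detected and refetched
print(client.fetches, server.validation_requests)  # 2 2
```

Between opens the client trusts its cache, which is exactly the visibility that session semantics allows.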
17
Cache validation schemes, cont.
2. Server-initiated approach
Idea: the server keeps a record of which clients are caching each file and reacts when it detects a potential for inconsistency, for example, when two clients try to open the same file for writing.
18
Server-initiated approach, cont.
The server could disable caching when this happens.
Problems with the server-initiated approach:
- It violates the traditional client-server model, in which servers only respond to requests.
- It requires stateful servers, which are less reliable.
- Clients must still check on open.
The server can also use a callback approach, in which it promises to notify all clients caching a file before allowing any client to modify that file.
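The callback approach can be sketched with a hypothetical `CallbackServer` that records which `CallbackClient` objects hold each file and invalidates their copies before accepting a write. Note that the server must keep this per-client state, which is the statefulness problem listed above.

```python
from __future__ import annotations


class CallbackServer:
    """Hypothetical server that promises callbacks to every caching client."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self.holders: dict[str, set[CallbackClient]] = {}  # file -> caching clients

    def fetch(self, client: CallbackClient, name: str) -> bytes:
        self.holders.setdefault(name, set()).add(client)  # record the promise
        return self.files.get(name, b"")

    def store(self, writer: CallbackClient, name: str, data: bytes) -> None:
        # Break every other client's callback before the modification is allowed.
        for client in self.holders.get(name, set()) - {writer}:
            client.invalidate(name)
        self.holders[name] = {writer}
        self.files[name] = data


class CallbackClient:
    def __init__(self, server: CallbackServer) -> None:
        self.server = server
        self.cache: dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        if name not in self.cache:  # a cached copy is valid until called back
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def write(self, name: str, data: bytes) -> None:
        self.cache[name] = data
        self.server.store(self, name, data)

    def invalidate(self, name: str) -> None:
        self.cache.pop(name, None)


server = CallbackServer()
server.files["f"] = b"old"
a, b = CallbackClient(server), CallbackClient(server)
a.read("f")
b.read("f")
a.write("f", b"new")
print("f" in b.cache)  # False: B's callback was broken
print(b.read("f"))     # b'new'
```

Unlike check on open, a client here never contacts the server to validate a cached file; it relies on the server's promise to call back.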
19
File replication
Replication is not the same as caching:
1. A replica is associated with a server, whereas a cached copy is associated with a client.
2. The existence of a cached copy depends on access patterns, whereas the existence of a replica normally depends on availability and performance requirements.
20
File replication, cont.
3. A replica is more persistent, widely known, secure, available, complete, and accurate than a cached copy.
4. A cached copy is contingent upon a replica: a cached copy is useful only if it is periodically revalidated against a replica.