Download presentation
Presentation is loading. Please wait.
Published bySamantha Lynch Modified over 7 years ago
1
Accessing Remote Files CS 188 Distributed Systems January 20, 2015
2
Introduction Not all the interesting data is on our machine
Other machines have files we’d like to use How can we make effective use of those files?
3
Basic Approaches Explicit copy of the files
A file server on the remote machine (e.g., FTP) A more general server that also allows file downloads (e.g., a web server) An integrated file system approach
4
Ideal Goals Transparency Performance Cost
Indistinguishable from local files for all uses All clients see all files from anywhere Performance Per-client: at least as fast as local disk Scalability: unaffected by the number of clients Cost Capital: less than local (per client) disk storage Operational: zero, it requires no administration Capacity: unlimited, it should handle any amount of files Availability: 100%, no failures or service down-time Applicability: Should work well on any network (LAN, wireless, Internet, whatever)
5
Functionality Challenges
Transparency Making remote files look just like local files On a network of heterogenous clients and servers In the face of Deutch’s warnings Creating global file name-spaces Security WAN scale authentication and authorization Providing ACID properties Atomicity, Consistency, Isolation, Durability
6
Performance Challenges
Single client response-time Remote requests involve messages and delays Aggregate bandwidth Each client puts message processing load on server Each client puts disk throughput load on server Each message loads server’s NIC and network WAN scale operation Where bandwidth is limited and latency is high Aggregate capacity How to transparently grow existing file systems
7
Robustness Challenges
All files should always be available, despite … Failures of the disk on which they are stored Failures of the remote file access server Regional catastrophes (flood, earthquake, etc.) Fail-over should be prompt and seamless A delay of a few seconds might be acceptable Recovery must be entirely automated For time, cost, and correctness reasons
8
Manageability Challenges
Storage management Integrating new storage into the system Diagnosing and replacing failed components Load and capacity balancing Spreading files among volumes and servers Spreading clients among servers Information life cycle management Moving unused files to less expensive storage Archival “compliance”, finding archived data Client configuration Domain services, file servers, name-spaces, authentication
9
Security Challenges What meaningful security can we provide for networked file systems? Can we guarantee reasonable access control? How about secrecy of data crossing the network? How can we provide integrity guarantees to remote users? What if we can’t trust all of the systems requesting files? What if we can’t trust all of the systems storing files?
10
Evolution of Remote File Systems
Explicit file copying (one time transfers) Commands like ftp, secure ftp, rcp, rsh, rsync Explicit remote access (special case) Remote data access methods (special code) Remote data access tools (special programs) Implicit remote access (all files appear local) Remote disk access Remote file access Distributed file systems vs. remote file access
11
Key Characteristics of Network File System Solutions
APIs and transparency How do users and processes access remote files? How closely do remote files mimic local files? Performance and robustness Are remote files as fast and reliable as local ones? Architecture How is solution integrated into clients and servers? Protocol and work partitioning How do client and server cooperate?
12
A Brief Digression Into Local File Systems
Ideally, the remote file access should look like local access How we might do that is thus related to how we handle local file access We’ve done some review of that, but let’s do a little more
13
How Network File Systems Fit In
client server remote FS server system calls file operations directory operations file I/O socket I/O virtual file system integration layer socket I/O UDP TCP EXT3 FS UDP TCP IP CD FS DOS FS UNIX FS remote FS IP MAC driver block I/O block I/O MAC driver NIC driver disk driver NIC driver CD drivers disk drivers flash drivers
14
The Name Space Issue How do you name files on a remote machine?
Even if things like open() and read() are the same Two popular choices: Each file system (including remote ones) has a different root for its name space The Windows solution Knit together the file systems into one overall name space The Unix solution
15
Windows File System Naming
Each file system is treated as a different “drive” The A: drive, the D: drive, etc. Each remote file system is one such drive Advantages: Easy to build Simple model Makes remote access fairly explicit Disadvantages: Poor transparency Harder for users to work with
16
Unix File Naming Each machine has one global namespace
Individual file systems grafted into that namespace At particular places Whether on local disks or remote machines Advantages: Easy to work with Highly transparent Disadvantages: More complex May hide too much
17
Building The Unix Global Name Space
Goal: Make many file systems appear to be one giant one Users need not be aware of file system boundaries Mechanism: Mount device on directory Creates a warp from the named directory to the top of the file system on the specified device Any file name beneath that directory is interpreted relative to the root of the mounted file system
18
Unix Mounted File System Example
root file system mount filesystem2 on /export/user1 mount filesystem3 on /export/user2 mount filesystem4 on /opt /export /opt /bin user1 user2 file system 2 file system 3 file system 4
19
How Does This Actually Work?
Mark the directory that was mounted on When file system opens that directory, don’t treat it as an ordinary directory Instead, consult a table of mounts to figure out where the root of the new file system is Go to that device and open its root directory And proceed from there
20
Mounting and Remote File Systems
In principle, we don’t care where the file system is stored From the name perspective From the mechanical perspective
21
Remote File System Mounts
Machine A root file system /export /opt /bin user1 user2 Machine D Machine B Machine C file system 2 file system 3 file system 4
22
So What’s Going On During File Access?
remote FS server system calls file operations directory operations file I/O socket I/O virtual file system integration layer socket I/O UDP TCP EXT3 FS UDP TCP IP UNIX FS remote FS IP MAC driver block I/O block I/O MAC driver NIC driver disk driver disk driver NIC driver open(/bin); open(/opt);
23
The Client Side On Unix/Linux, makes use of VFS interface
Allows plug-in of file system implementations Each implements a set of basic methods create, delete, open, close, link, unlink, etc. Translates logical operations into actual access operations Remote file systems are one possible choice Translate each standard method into messages Forward those requests to a remote file server RFS client only knows the RFS protocol Need not know the underlying on-disk implementation
24
Server Side Implementation
RFS Server Daemon Receives and decodes messages Does requested operations on local file system Can be implemented in user- or kernel-mode Kernel daemon may offer better performance User-mode is much easier to implement One daemon may serve all incoming requests Higher performance, fewer context switches Or could be many per-user-session daemons Simpler, and probably more secure
25
Remote File Systems: Problems and Solutions
Authentication and authorization Performance Synchronization Robustness Network security
26
Authorization and Authentication
Authorization is determined if someone is allowed to do something Authentication is determining who someone is Both are required for good file system security Be sure who someone is first Then see if that entity can do what he asked for Both are more challenging when file system spans multiple machines
27
Problems in Authentication/Authorization
How does remote server know requestor identity? User isn’t logged into his machine Where should we enforce access control rules? On the requesting client side? That’s who really knows who the client is On the responding server side? That’s who has responsibility to protect the data On both? Name space issues Do the client and server agree on who’s who?
28
Approaches to These Security Issues
User-session protocols (e.g., CIFS) Remote session establishment includes authentication So server authenticates requesting client Server performs all authorization checks Peer-to-peer protocols (e.g., NFS) Server trusts client to enforce authorization control And to authenticate the user Third party authentication (e.g., Kerberos) Server checks authorization based on credentials
29
Performance Issues Performance of the remote file system now dependent on many more factors Not just the local CPU, bus, memory, and disk Also on the same hardware on the server that stores the files Which often is servicing many clients And on the network in between Which can have wide or narrow bandwidth
30
Some Performance Solutions
Appropriate transport and session protocols Minimize messages, maximize throughput Partition the work Minimize number of remote requests Spread load over more processors and disks Client-side pre-fetching and caching Fetching whole file at a once is more efficient Block caching for read-ahead and deferred writes Reduces disk I/O and network I/O (vs. server cache)
31
Protocol-Related Solutions
Minimize messages Allow any key operation to be performed with a single request and a single response Combine short messages and responses into a single packet Maximize throughput Design for large data transfers per message Use minimal flow control between client and server
32
On-disk data representation
Partitioning the Work Clearly on client side Open file instances, offsets Data packing and unpacking Authentication/authorization Either side (or both) Directory searching Block caching Specialized caching (directories, file descriptors) Logical to physical block mapping On-disk data representation Clearly on server side Device driver integration layer Device driver
33
Server Load Balancing If multiple servers can handle the same file requests, we can load balance Improving performance for multiple clients Provide a pool of servers All with access to the same data E.g., they all have copies of all the same files Spread client traffic across all of the servers E.g., using a load-balancing front-end router Increase capacity by adding servers to pool With potentially linear scalability Works best if requests are idempotent
34
Client-Side Caching Benefits Avoids network latencies
Clients can cache name-to-handle bindings Eliminating the repeat the same search Clients can cache blocks of file data Eliminating the need to re-fetch them from the server Dangers Multiple clients, each with his own cache Cache consistency issues Challenges Serializing concurrent writes from multiple clients Keeping client side caches up-to date Without sending N messages per update
35
Client Caching Issues Is the cache integrated with the local block cache or is it a special cache? How long do you cache remote blocks? Based on standard caching algorithm? Based on activities with remote server? Is the local cached copy still valid? Cache invalidation
36
Cache Invalidation Two (or more) clients cache the same block
One of them updates it What about the other one? Server could notify every client of every write Very inefficient Server could track which clients to notify Higher server overhead Clients could obtain lock on files before update Clients could verify cache validity before use
37
Synchronization Issues
Distributed synchronization is slow and difficult Provide a centralized synchronization server All locks are granted by a single server Changes are not official until he acknowledges them He notifies other nodes of “interesting” changes Distributed systems have complex failure modes Locks are granted as revocable leases Update transaction must be accompanied by valid lease Versioned files can detect stale information All cached information should have a “time to live” A tradeoff between performance and consistency
38
Robustness Issues Three major components in remote file system operations The client machine The server machine The network in between All can fail Leading to potential problems for the remote file system’s data and users
39
Robustness Solution Approaches
Network errors – support client retries Have file system protocol use idempotent requests Have protocol support all-or-none transactions Client failures – support server-side recovery Automatic back-out of uncommitted transactions Automatic expiration of timed out lock leases Server failures – support server fail-over Replicated (parallel or back-up) servers Stateless remote file system protocols Automatic client-server rebinding
40
Idempotent Operations
Operations that can be repeated many times with same effect as if done once If server does not respond, client repeats request If server gets request multiple times, no harm done Examples: Read block 100 of file X Write block 100 of file X with contents Y Delete file X, version v Examples of non-idempotent operations: Read next block of current file Append contents Y to end of file X
41
State-full and Stateless Protocols
A state-full protocol has a notion of a “session” Context for a sequence of operations Each operation depends on previous operations Server is expected to remember session state Examples: TCP (message sequence numbers) A stateless protocol does not assume server retains “session state” Client supplies necessary context on each request Each operation is complete and unambiguous Example: HTTP
42
Statefulness and Layering
The statefulness of a particular protocol refers to its own level HTTP saves no information between HTTP requests No memory of last request But it runs over TCP, which is stateful To ensure reliable, ordered packet delivery HTTP can run each request as a separate TCP session, though Even if it doesn’t, HTTP saves no state
43
Server Fail-Over When can we handle server failure by switching to another server? If the other server can access the required data Because files are replicated to multiple servers Because new server can access old server’s disks If the protocol allows stateless servers Client will not expect server to remember anything If clients can be re-bound to a new server IP address fail-over may make this automatic RFS client layer might rebind w/o telling application Idempotent requests can be re-sent with no danger All three are necessary
44
Replication Keep more than one copy of data Spread over multiple sites
Clients should be able to access any replica Implying the contents of the replicas must be identical If updates occur, all replicas must be updated
45
Benefits of Replication
Improved resiliency Losing one machine doesn’t necessarily lose data Potential performance increases More than one machine can provide access to data May also gain by smart replica placement
46
Dangers of Replication
Vital to keep all replicas up to date Easy if it’s purely read-only data Harder if it’s writable data Particularly when considering all possible failure/recovery scenarios
47
Network Security Best general protection is to ensure the data is encrypted With proper encryption, eavesdropper can’t read it Requires key distribution and other issues of implementing the crypto
48
What Do We Encrypt? Do we just encrypt file contents?
The bits in the files Or also file metadata? File names, access times, etc. Or also operations being performed? “This is an open,” “this is a read,” etc.
49
How Do We Encrypt? Rely on full disk encryption?
But usually that’s undone when data is pulled off the disk Encrypt payloads in file system? Use something like IPSec in RFS client/server? Run over a VPN or other tunnel encryption?
50
A Brief Digression on Crypto Alternatives
What is end-to-end encryption? What do we mean by a tunnel in cryptography? What is IPSec? What is a VPN?
51
End-to-End Encryption
Source Destination plaintext ciphertext ciphertext ciphertext ciphertext plaintext ciphertext Cryptography only at the end points Only the end points see the plaintext Normal way network cryptography done
52
Cryptographic Tunnels
Set up cryptographic keys and other arrangements between two machines Encrypt all messages going between those machines using that arrangement Make multiple connections/sessions look like one opaque tunnel Goal is to hide details of what’s going on
53
An Example Cryptographic Tunnel
Start remote file access Perform HTTP request Source machine Destination machine Eavesdroppers can’t tell nature of packets in the tunnel “End points” can be on these machines or some other machines in the path
54
IPSec Standard for applying cryptography at the network layer of IP stack Actually, kind of between networking and transport layers Provides various options for encrypting and authenticating packets On end-to-end basis Without concern for transport layer (or higher)
55
What IPsec Covers Message integrity Message authentication
Message confidentiality
56
General Structure of IPsec
Really designed for end-to-end encryption Though could do link level Designed to operate with either IPv4 or IPv6 Meant to operate with a variety of different ciphers And to be neutral to key distribution methods Has sub-protocols E.g., Encapsulating Security Payload (ESP)
57
ESP Modes Transport mode
Encrypt just the transport-level data in the original packet No IP headers encrypted Tunnel mode Original IP datagram is encrypted and placed in ESP Unencrypted headers wrapped around ESP
58
Virtual Private Networks (VPNs)
Set up an encrypted tunnel between two gateway machines Each has a LAN behind it All messages sent between the LANs are encrypted And sent in the same tunnel Often using IPSec Conceals details of the LANs And which of their machines are talking And what kinds of things they’re doing VPNs can also be set up with individual machines as endpoints Or with more than two endpoints
59
Key Distribution Crypto only effective if proper parties know the keys
And other parties don’t Two vital elements Key secrecy Distribution of the right keys to the right parties Neither is as simple as it sounds
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.