FARSITE: Federated, Available and Reliable Storage for an Incompletely Trusted Environment A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J.


1 FARSITE: Federated, Available and Reliable Storage for an Incompletely Trusted Environment A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, R. P. Wattenhofer Microsoft Research

2 Paper highlights Paper discusses a distributed file system lacking a central server –Files and directories reside on client machines –Files are encrypted and replicated –Directory metadata are maintained by Byzantine-replicated finite state machines

3 Serverless file systems Idea is not new –xFS (Anderson et al. SOSP 1995) Objective is to utilize free disk space and processing power of client machines Two major issues are –Availability of files –Security

4 Design assumptions (I) 1. Farsite is intended to run on the desktops of a large corporation or a university: –Maximum scale of ~$10^5$ machines –Interconnected by a high-bandwidth, low-latency network –Most machines up most of the time –Uncorrelated machine failures

5 Design assumptions (II) 2. No files are both –Read by many users and –Frequently updated by at least one user (such files are very infrequent in the Windows NT file system) 3. Small but significant fraction of users will maliciously attempt to destroy or corrupt file data and metadata

6 Design assumptions (III) 4. Large fraction of users may independently attempt unauthorized accesses 5. Each machine is under the control of its immediate user –Cannot be subverted by other people 6. No user-sensitive data persist after logout or system reboot –Not true for any commodity OS

7 Enabling technology trends (I) 1. General increase in unused disk capacity, measured on 4800 desktops at Microsoft Research (year: unused disk space) –1998: 49% –1999: 50% –2000: 58%

8 Enabling technology trends (II) 2. Lowered cost of cryptographic operations: –Can now encrypt data at 72 MB/s –Faster than disk sequential I/O bandwidth (32 MB/s)

9 Namespace roots Farsite provides hierarchical directory namespaces –Each namespace has its own root –Each root has a unique root name –Each root is managed by a designated set of machines forming a Byzantine-fault-tolerant group No need for a protected set of machines

10 Trust and certification (I) Basic requirements –Users must trust the machines that offer to present data or metadata –Machines must trust the validity of requests from remote users –The system must be able to trust that machines that claim to be distinct are truly distinct, to prevent Sybil attacks

11 Sybil attacks (Douceur 2002) Possible whenever redundancy is used to increase security Single rogue entity can –Pretend to be many and –End up controlling a large part of the system Cannot prevent them without a logically centralized authority certifying identities

12 Trust and certification (II) Farsite manages trust through public-key cryptographic certificates – Namespace certificates – User certificates – Machine certificates

13 Trust and certification (III) Bootstrapped by fiat: –Machines told to accept certificates that can be authenticated with some public keys –Associated private keys are called Certification Authorities (CAs) Certificates created either by CAs themselves or by users authorized to create certificates

14 Trust and certification (IV) User private keys are –Encrypted with a symmetric key derived from user password –Stored in a globally-readable directory in Farsite Does not require users to modify their behavior User or machine keys can be revoked
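
The slide does not say how the symmetric key is derived from the user password. A minimal sketch, assuming a standard password-based key derivation function (PBKDF2 from Python's `hashlib`); the function name `derive_key`, the salt, and the iteration count are illustrative choices, not Farsite's:

```python
import hashlib

def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte symmetric key from a password (illustrative parameters)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

# Hypothetical salt; it would be stored next to the encrypted private key.
salt = b"per-user-salt"
key = derive_key("correct horse", salt)
```

The same password and salt always give the same key, so the user only has to remember the password; the encrypted private key itself can sit in a globally readable directory.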

15 Handling malicious behaviors Most fault-tolerant file systems do not protect users’ files against malicious behaviors of hosts They assume that a host will either behave correctly or crash Malicious behaviors are often called Byzantine failures –One or more hosts act as if they were controlled by very clever traitors

16 System architecture (I) Each Farsite client will deal with two different sets of hosts –A set of machines constituting a directory group –A set of machines acting as file hosts In practice, all three roles (client, directory group member, file host) are shared by all machines

17 System architecture (II) [Figure: a client, file hosts, and the member machines of a directory group; the client sees one directory group]

18 The directory group (I) Replicates directories on directory members Directory integrity enforced through a Byzantine-fault-tolerant protocol –Works as long as fewer than one-third of the hosts misbehave in any manner (“traitors”) –Requires a minimum of four hosts to tolerate one misbehaving host
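
The "fewer than one-third" bound can be written as simple arithmetic: tolerating f misbehaving hosts needs at least 3f + 1 hosts. A small sketch (the function names are illustrative):

```python
def min_group_size(faulty: int) -> int:
    """Smallest group that tolerates `faulty` Byzantine members: 3f + 1."""
    return 3 * faulty + 1

def max_faulty(group_size: int) -> int:
    """Largest number of Byzantine members a group of this size tolerates."""
    return (group_size - 1) // 3
```

For example, `min_group_size(1)` is 4, matching the slide, and going from 4 to 6 hosts does not help: a group needs 7 hosts before it can tolerate two traitors.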

19 The directory group (II) Decisions for all operations that are not determined by the client request are made through a cryptographically secure distributed random number generator Issues leases on files to clients –Promise not to allow any incompatible access to the file during the duration of the lease without notifying the client

20 The directory group (III) Directory groups can split: –Randomly select a group of machines they know –Tell them to form a new directory group –Delegate a portion of their namespace to the new group User and directory group authenticate each other

21 The file hosts (I) Farsite stores encrypted replicas of each file to ensure file integrity and file availability Continuously monitors host availability and relocates replicas whenever necessary Does not allow all replicas of a given file to reside on hosts owned by the same user Files that were recently accessed by a client are cached locally (for “roughly one week”)

22 The file hosts (II) Farsite does not use voting: –Correct replicas are identified by the directory host Farsite does not update at once all replicas of a file: –Would be too slow –Uses instead a background update mechanism

23 Semantic differences Unlike NTFS, Farsite –Puts a limit on the number of clients that can have a file open for write –Allows a directory to be renamed even if there is an open handle on a file in the directory or any of its descendants –Uses background (“lazy”) propagation of directory updates

24 Reliability and availability (I) Through redundancy –Metadata stored in a directory group of $R_D$ members remain accessible if no more than $\lfloor (R_D - 1)/3 \rfloor$ members fail –Data replicated on $R_F$ file hosts remain accessible as long as one of these hosts remains alive
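
To get a feel for these two conditions, the sketch below turns them into probabilities. It assumes each host is down independently with the same probability `p_down` (in line with the "uncorrelated machine failures" design assumption) and treats a down host as a failed one; the function names and the numbers used with them are illustrative, not measurements from the paper.

```python
from math import comb

def file_availability(r_f: int, p_down: float) -> float:
    """Probability that at least one of r_f file hosts is up."""
    return 1.0 - p_down ** r_f

def directory_availability(r_d: int, p_down: float) -> float:
    """Probability that at most floor((r_d - 1) / 3) of r_d members are down."""
    tolerated = (r_d - 1) // 3
    return sum(
        comb(r_d, k) * p_down ** k * (1.0 - p_down) ** (r_d - k)
        for k in range(tolerated + 1)
    )
```

With a hypothetical `p_down` of 0.1, three file replicas give `file_availability(3, 0.1)` of about 0.999, while a four-member directory group gives `directory_availability(4, 0.1)` of about 0.948. The directory condition is the stricter one.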

25 Reliability and availability (II) Farsite migrates duties of machines that have been unavailable for a long period of time to new machines (regeneration) –More aggressive approach to directory migration than to file-host migration Farsite continuously monitors host availability and relocates replicas whenever necessary Clients cache files for a week after last access

26 Security (I) Write access control enforced through Access Control Lists managed by directory group –Requires Byzantine agreement Read access control achieved through strong cryptography –File is encrypted with symmetric file key –File key is encrypted with public keys of all authorized users
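
The read-access scheme (one symmetric file key, wrapped once per authorized user) can be sketched with the standard library alone. Everything cryptographic here is a toy chosen so the example is self-contained: the keystream cipher, the unpadded textbook RSA, and the fixed Mersenne-prime key pairs are assumptions for illustration and are not secure or taken from Farsite. The user names are hypothetical.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 counter keystream. NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

E = 65537  # public exponent

def make_keypair(p: int, q: int):
    """Textbook RSA key pair from two primes (no padding; illustration only)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, E), (n, d)

def wrap(file_key: bytes, public) -> int:
    """Encrypt the file key under one user's public key."""
    n, e = public
    return pow(int.from_bytes(file_key, "big"), e, n)

def unwrap(wrapped: int, private, key_len: int = 16) -> bytes:
    """Recover the file key with the matching private key."""
    n, d = private
    return pow(wrapped, d, n).to_bytes(key_len, "big")

# Hypothetical users with fixed (insecure) keys built from Mersenne primes.
alice_pub, alice_priv = make_keypair(2**61 - 1, 2**89 - 1)
bob_pub, bob_priv = make_keypair(2**107 - 1, 2**127 - 1)

# The file is encrypted once; the file key is wrapped once per authorized user.
file_key = secrets.token_bytes(16)
ciphertext = keystream_xor(file_key, b"quarterly report")
wrapped_keys = {"alice": wrap(file_key, alice_pub), "bob": wrap(file_key, bob_pub)}

# Any authorized reader unwraps the file key, then decrypts the file.
recovered = unwrap(wrapped_keys["bob"], bob_priv)
plaintext = keystream_xor(recovered, ciphertext)
```

The point of the design is that file hosts only ever store `ciphertext` and `wrapped_keys`; granting a new reader means adding one more wrapped key, not re-encrypting the file.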

27 Security (II) Same technique is applied to directory names –Members of directory group cannot read them To ensure file integrity, Farsite stores a copy of a Merkle hash tree over the file data blocks in the directory group that manages the file’s metadata

28 What is a Merkle hash tree? (I) Consider a file made up of four blocks: A, B, C and D We successively compute: –a = leaf_hash(A), …, d = leaf_hash(D) –p = inner_hash(a, b), q = inner_hash(c, d) –r = inner_hash(p, q) Recomputing r (the root hash) and comparing it with its supposed value will detect any tampering

29 What is a Merkle hash tree? (II) [Figure: blocks A, B, C, D at the bottom; leaves a=leaf_hash(A), b=leaf_hash(B), c=leaf_hash(C), d=leaf_hash(D); inner nodes p=inner_hash(a, b) and q=inner_hash(c, d); root r=inner_hash(p, q)]
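
The four-block example can be run directly. This sketch assumes SHA-256 for both hash functions and a one-byte prefix to keep leaf and inner hashes distinct; the slides name `leaf_hash` and `inner_hash` but do not say how they are defined, so those choices are illustrative.

```python
import hashlib

def leaf_hash(block: bytes) -> bytes:
    """Hash of a data block (assumed: SHA-256 with a 0x00 prefix)."""
    return hashlib.sha256(b"\x00" + block).digest()

def inner_hash(left: bytes, right: bytes) -> bytes:
    """Hash of two child hashes (assumed: SHA-256 with a 0x01 prefix)."""
    return hashlib.sha256(b"\x01" + left + right).digest()

def root_hash(blocks) -> bytes:
    """Root of the Merkle tree over exactly four blocks, as on the slide."""
    a, b, c, d = (leaf_hash(block) for block in blocks)
    p = inner_hash(a, b)
    q = inner_hash(c, d)
    return inner_hash(p, q)

blocks = [b"A", b"B", b"C", b"D"]
r = root_hash(blocks)

# Changing any single block changes the recomputed root.
tampered = [b"A", b"B", b"X", b"D"]
```

Here `root_hash(tampered)` differs from `r`, which is how a reader holding the trusted root detects a modified block.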

30 Durability (I) File creations, deletions and renames are not immediately forwarded to directory group –High cost of Byzantine protocol First stored in a log on client –Much as in Coda disconnected mode Log is pushed back to directory group –At fixed intervals –Whenever a lease is recalled

31 Durability (II) When a client reboots, it needs to send its committed updates to the directory group and have them accepted as authentic –Client will generate an authenticator key which it will distribute among members of the directory group –Can use this key to sign each committed update
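
A minimal sketch of signing committed updates with an authenticator key, assuming the key is used as an HMAC-SHA256 key. How the key is actually split up and distributed among the directory group members is not described on the slide; here, as a simplification, a verifier simply holds the whole key. The function names and the update string are hypothetical.

```python
import hashlib
import hmac
import secrets

# Generated by the client and (in this simplification) shared with the group.
authenticator_key = secrets.token_bytes(32)

def sign_update(key: bytes, update: bytes) -> bytes:
    """Client side: compute an authentication tag for one committed update."""
    return hmac.new(key, update, hashlib.sha256).digest()

def accept_update(key: bytes, update: bytes, tag: bytes) -> bool:
    """Directory side: accept the update only if its tag verifies."""
    return hmac.compare_digest(sign_update(key, update), tag)

update = b"create /docs/report.txt"
tag = sign_update(authenticator_key, update)
```

After a reboot the client resends `update` together with `tag`; an update whose contents were altered no longer matches its tag and is rejected.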

32 Consistency (I) Directory group uses a lease mechanism: –Data read/write leases –Data read-only leases Concurrent write accesses are handled by redirecting them to a single client machine –Guarantees correctness –Not scalable
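
The compatibility rule behind the two lease types can be sketched as a small table kept by the directory group: many read-only leases may coexist, while a read/write lease excludes every other holder. The class and method names are hypothetical, and a refusal here stands in for the lease recall or redirection that the real system would perform.

```python
class LeaseTable:
    """Tracks which clients hold read-only or read/write leases per file."""

    def __init__(self):
        self.readers = {}  # file -> set of client ids with read-only leases
        self.writer = {}   # file -> client id holding the read/write lease

    def request(self, file: str, client: str, write: bool) -> bool:
        """Grant the lease if it is compatible with those already issued."""
        readers = self.readers.setdefault(file, set())
        writer = self.writer.get(file)
        if write:
            # A read/write lease conflicts with any other reader or writer.
            if writer not in (None, client) or readers - {client}:
                return False
            self.writer[file] = client
            readers.discard(client)
            return True
        # A read-only lease conflicts only with another client's write lease.
        if writer not in (None, client):
            return False
        if writer is None:
            readers.add(client)
        return True

table = LeaseTable()
```

For instance, two clients can both obtain read-only leases on the same file, but neither can then upgrade to a read/write lease until the other's lease is gone.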

33 Consistency (II) Leases have variable granularity –Single file –Entire subtree No good way to handle read/write lease expiration on a disconnected client The fundamental paper on leases is C. G. Gray, D. R. Cheriton: Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency. SOSP 1989: pp. 202-210

34 Consistency (III) Special name leases for files and directories –A name lease on a directory allows holder to create files and subdirectories under that directory with any non-extant name More special-purpose leases were introduced to implement Windows file sharing semantics

35 Scalability Ensured through – Hint-based pathname translation: Hints are data items that are useful when they are correct and cause no harm when they are incorrect Think of a phone number – Delayed-directory change notification
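
Hint-based pathname translation can be illustrated with a client-side cache that maps path prefixes to directory groups. In this sketch the dictionary `delegations` plays the role of the ground truth held by the directory groups, and comparing a hint against it stands in for contacting the hinted group; all names, paths, and group labels are hypothetical.

```python
def owning_prefix(path: str, table: dict) -> str:
    """Longest prefix of `path` that appears in `table`, or "/" if none does."""
    parts = path.strip("/").split("/")
    for end in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:end])
        if prefix in table:
            return prefix
    return "/"

def resolve(path: str, hints: dict, delegations: dict):
    """Return (group, how): the group managing `path` and whether a hint worked."""
    true_prefix = owning_prefix(path, delegations)
    truth = delegations[true_prefix]
    hinted_prefix = owning_prefix(path, hints)
    if hints.get(hinted_prefix) == truth:
        return truth, "hint"           # correct hint: no extra lookups needed
    hints.pop(hinted_prefix, None)     # wrong hint: harmless, just discarded
    hints[true_prefix] = truth         # remember the right answer for next time
    return truth, "fallback"

delegations = {"/": "G0", "/docs": "G2"}   # who really manages what
hints = {"/docs": "G1"}                    # stale client-side hint
first = resolve("/docs/a.txt", hints, delegations)
second = resolve("/docs/a.txt", hints, delegations)
```

The stale hint costs one fallback lookup (`first` is `("G2", "fallback")`) but never a wrong answer; the repaired hint then serves the second request (`second` is `("G2", "hint")`).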

36 Efficiency Space efficiency: –Almost 50% of disk space could be reclaimed by eliminating duplicate files –Farsite detects files with duplicate contents and co-locates them in same set of file hosts Performance: –Achieved through caching and delaying updates
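
The slide does not say how duplicate contents are detected, and in Farsite the stored files are encrypted. As a simplified illustration only, the sketch below groups plaintext files by a SHA-256 digest of their contents; the function name and the sample files are hypothetical.

```python
import hashlib
from collections import defaultdict

def group_duplicates(files: dict) -> list:
    """Group file names whose contents have the same SHA-256 digest."""
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

files = {"a.doc": b"same bytes", "b.doc": b"same bytes", "c.doc": b"other"}
duplicates = group_duplicates(files)
```

Each group in `duplicates` (here `[["a.doc", "b.doc"]]`) is a set of files that could be co-located on the same file hosts and stored once.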

37 Evaluation Designed to scale up to $10^5$ machines –Roughly 300 new machines per day Andrew benchmark two times slower than NTFS Still to do –Implement disk quotas –Have mechanism to measure machine availability

