FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, R. P. Wattenhofer Microsoft Research

Paper highlights The paper presents a distributed file system with no central server –Files and directories reside on client machines –Files are encrypted and replicated –Directory metadata are maintained by Byzantine-replicated finite state machines

Serverless file systems Idea is not new –xFS (Anderson et al. SOSP 1995) Objective is to utilize free disk space and processing power of client machines Two major issues are –Availability of files –Security

Design assumptions (I) 1.Farsite is intended to run on the desktops of a large corporation or a university: –Maximum scale of ~10^5 machines –Interconnected by a high-bandwidth, low-latency network –Most machines up most of the time –Uncorrelated machine failures

Design assumptions (II) 2.No files are both –Read by many users and –Frequently updated by at least one user (very infrequent in Windows NT file system) 3.Small but significant fraction of users will maliciously attempt to destroy or corrupt file data and metadata

Design assumptions (III) 4.Large fraction of users may independently attempt unauthorized accesses 5.Each machine is under the control of its immediate user –Cannot be subverted by other people 6.No user-sensitive data persist after logout or system reboot –Not true for any commodity OS

Enabling technology trends (I) 1.General increase in unused disk capacity: measurements for 4,800 desktops at Microsoft Research [Table: percentage of unused disk space by year]

Enabling technology trends (II) 2.Lowered cost of cryptographic operations: –Can now encrypt data at 72 MB/s –Faster than sequential disk I/O bandwidth (32 MB/s)

Namespace roots Farsite provides hierarchical directory namespaces –Each namespace has its own root –Each root has a unique root name –Each root is managed by a designated set of machines forming a Byzantine-fault-tolerant group No need for a protected set of machines

Trust and certification (I) Basic requirements –Users must trust the machines that offer to present data or metadata –Machines must trust the validity of requests from remote users –The system must be able to trust that machines claiming to be distinct are truly distinct To prevent Sybil attacks

Sybil attacks (Douceur 2002) Possible whenever redundancy is used to increase security Single rogue entity can –Pretend to be many and –End up controlling a large part of the system Cannot prevent them without a logically centralized authority certifying identities

Trust and certification (II) Farsite manages trust through public-key cryptographic certificates – Namespace certificates – User certificates – Machine certificates

Trust and certification (III) Bootstrapped by fiat: –Machines are told to accept certificates that can be authenticated with some public keys –The owners of the associated private keys are called certification authorities (CAs) Certificates are created either by CAs themselves or by users authorized to create certificates

Trust and certification (IV) User private keys are –Encrypted with a symmetric key derived from the user's password –Stored in a globally readable directory in Farsite Does not require users to modify their behavior User or machine keys can be revoked
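
A minimal sketch of this password-protected key storage, assuming Python and the third-party cryptography package; the function name and KDF parameters are illustrative, not Farsite's actual format:

import base64, os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.fernet import Fernet

def encrypt_private_key(private_key_bytes: bytes, password: str) -> bytes:
    # Derive a symmetric key from the user's password (parameters illustrative).
    salt = os.urandom(16)
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                     salt=salt, iterations=600_000)
    key = base64.urlsafe_b64encode(kdf.derive(password.encode()))
    # The resulting blob can sit in a globally readable Farsite directory.
    return salt + Fernet(key).encrypt(private_key_bytes)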

Handling malicious behaviors Most fault-tolerant file systems do not protect users’ files against malicious behaviors of hosts They assume that a host will either behave correctly or crash Malicious behaviors are often called Byzantine failures –One or more hosts act as if they were controlled by very clever traitors

System architecture (I) Each Farsite client will deal with two different sets of hosts –A set of machines constituting a directory group –A set of machines acting as file hosts In practice these three roles (client, directory group member, file host) are shared by all machines

System architecture (II) [Diagram: a client, a set of file hosts, and the members of a directory group; the client sees a single directory group]

The directory group (I) Replicates directories on directory members Directory integrity enforced through a Byzantine-fault-tolerant protocol –Works as long as fewer than one-third of the hosts misbehave in any manner (act as “traitors”) –Requires a minimum of four hosts to tolerate one misbehaving host

The directory group (II) Decisions for all operations that are not determined by the client request are made through a cryptographically secure distributed random number generator Issues leases on files to clients –A promise not to allow any incompatible access to the file for the duration of the lease without notifying the client

The directory group (III) Directory groups can split: –Randomly select a group of machines they know –Tell them to form a new directory group –Delegate a portion of their namespace to the new group Users and directory groups mutually authenticate each other

The file hosts (I) Farsite stores encrypted replicas of each file to ensure file integrity and file availability Continuously monitors host availability and relocates replicas whenever necessary Does not allow all replicas of a given file to reside on hosts owned by the same user Files that were recently accessed by a client are cached locally (for roughly one week)
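
The placement rule can be sketched in a few lines of Python; this fragment enforces a stricter variant (no two replicas share an owner, which in particular keeps all replicas off one user's hosts), and owner_of is a hypothetical helper:

import random

def place_replicas(hosts, owner_of, num_replicas):
    # Visit hosts in random order, skipping any whose owner is already used.
    chosen, owners = [], set()
    for host in random.sample(hosts, len(hosts)):
        if owner_of(host) not in owners:
            chosen.append(host)
            owners.add(owner_of(host))
            if len(chosen) == num_replicas:
                return chosen
    raise ValueError("not enough distinctly owned hosts")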

The file hosts (II) Farsite does not use voting: –Correct replicas are identified by the directory group Farsite does not update all replicas of a file at once: –Would be too slow –Instead uses a background update mechanism

Semantic differences Unlike NTFS, Farsite –Puts a limit on the number of clients that can have a file open for write –Allows a directory to be renamed even if there is an open handle on a file in the directory or any of its descendants –Uses background (“lazy”) propagation of directory updates

Reliability and availability (I) Through redundancy –Metadata stored in a directory group of R_D members remain accessible if no more than ⌊(R_D - 1) / 3⌋ members fail –Data replicated on R_F file hosts remain accessible as long as one of these hosts remains alive
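
The bound is easy to restate in code; this small Python helper just paraphrases the formula above:

def max_tolerated_traitors(group_size: int) -> int:
    # floor((R_D - 1) / 3): members that may misbehave arbitrarily while the
    # Byzantine-fault-tolerant directory protocol still behaves correctly.
    return (group_size - 1) // 3

assert max_tolerated_traitors(4) == 1    # minimum group for one traitor
assert max_tolerated_traitors(7) == 2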

Reliability and availability (II) Farsite migrates duties of machines that have been unavailable for a long period of time to new machines (regeneration) –More aggressive approach to directory migration than to file-host migration Farsite continuously monitors host availability and relocates replicas whenever necessary Clients cache files for a week after last access

Security (I) Write access control enforced through Access Control Lists managed by directory group –Requires Byzantine agreement Read access control achieved through strong cryptography –File is encrypted with symmetric file key –File key is encrypted with public keys of all authorized users
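
A hedged sketch of this hybrid read-access scheme in Python (cryptography package); the choice of RSA-OAEP for key wrapping and Fernet for the file key is an illustrative assumption, not the paper's exact construction:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.fernet import Fernet

def encrypt_for_users(file_data: bytes, user_public_keys):
    file_key = Fernet.generate_key()                 # symmetric file key
    ciphertext = Fernet(file_key).encrypt(file_data)
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    # One wrapped copy of the file key per authorized reader.
    wrapped_keys = [pk.encrypt(file_key, oaep) for pk in user_public_keys]
    return ciphertext, wrapped_keys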

Security (II) Same technique is applied to directory names –Members of directory group cannot read them To ensure file integrity, Farsite stores a copy of a Merkle hash tree over the file data blocks in the directory group that manages the file’s metadata

What is a Merkle hash tree? (I) Consider a file made up of four blocks: A, B, C and D We successively compute: –a = leaf_hash(A), …, d = leaf_hash(D) –p = inner_hash(a, b), q = inner_hash(c, d) –r = inner_hash(p, q) Recomputing r (the root hash) and comparing it with its supposed value will detect any tampering

What is a Merkle hash tree? (II) ABCD a=leaf_hash(A)b=leaf_hash(B)d =leaf_hash(D)c=leaf_hash(C) q=inner_hash(c, d)p=inner_hash(a, b) r=inner_hash(p,q)

Durability (I) File creations, deletions and renames are not immediately forwarded to directory group –High cost of Byzantine protocol First stored in a log on client –Much as in Coda disconnected mode Log is pushed back to directory group –At fixed intervals –Whenever a lease is recalled

Durability (II) When a client reboots, it needs to send its committed updates to the directory group and have them accepted as authentic –Client will generate an authenticator key which it will distribute among members of the directory group –Can use this key to sign each committed update
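
One plausible realization of such an authenticator is a shared-secret MAC; this Python fragment shows the shape of the idea, not the paper's actual wire format:

import hashlib, hmac, os

authenticator_key = os.urandom(32)   # generated by the client and distributed
                                     # to directory group members beforehand

def sign_update(update: bytes) -> bytes:
    return hmac.new(authenticator_key, update, hashlib.sha256).digest()

def verify_update(update: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(sign_update(update), tag)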

Consistency (I) Directory group uses a lease mechanism: –Data read/write leases –Data read-only leases Concurrent write accesses are handled by redirecting them to a single client machine –Guarantees correctness –Not scalable
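
To make the lease promise concrete, here is a toy lease record in Python; the fields and expiry check are illustrative assumptions:

import time
from dataclasses import dataclass

@dataclass
class Lease:
    path: str          # file or subtree the lease covers
    mode: str          # "read/write" or "read-only"
    expires_at: float  # until then, the directory group promises not to grant
                       # an incompatible lease without recalling this one

    def valid(self) -> bool:
        return time.time() < self.expires_at

lease = Lease(path="/docs/report.txt", mode="read/write",
              expires_at=time.time() + 3600)   # hypothetical one-hour lease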

Consistency (II) Leases have variable granularity –Single file –Entire subtree No good way to handle read/write lease expiration on a disconnected client The fundamental paper on leases is C. G. Gray and D. R. Cheriton, “Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency,” SOSP 1989

Consistency (III) Special name leases for files and directories –A name lease on a directory allows holder to create files and subdirectories under that directory with any non-extant name More special-purpose leases were introduced to implement Windows file sharing semantics

Scalability Ensured through –Hint-based pathname translation: hints are data items that are useful when they are correct and cause no harm when they are incorrect (think of a phone number someone gives you) –Delayed directory-change notification
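
Hints fit a simple try-then-validate pattern; in this Python sketch, lookup_authoritative and probe are hypothetical stand-ins for the expensive Byzantine path resolution and a cheap validity check:

# Try the cached hint first; fall back to the authoritative lookup only when
# the hint turns out to be wrong. A bad hint costs a failed probe, nothing more.
hint_cache = {}   # pathname -> directory group believed to manage it

def resolve(pathname, lookup_authoritative, probe):
    group = hint_cache.get(pathname)
    if group is not None and probe(group, pathname):   # hint was correct
        return group
    hint_cache[pathname] = group = lookup_authoritative(pathname)
    return group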

Efficiency Space efficiency: –Almost 50% of disk space could be reclaimed by eliminating duplicate files –Farsite detects files with duplicate contents and co-locates them in same set of file hosts Performance: –Achieved through caching and delaying updates
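
Duplicate detection reduces to grouping files by a content hash; a minimal Python sketch (the in-memory dict and SHA-256 are illustrative simplifications):

import hashlib
from collections import defaultdict

def find_duplicates(files):
    """files: dict of name -> bytes; returns groups of names whose contents
    are identical and could therefore be co-located and stored once."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [names for names in by_digest.values() if len(names) > 1]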

Evaluation Designed to scale up to 10^5 machines –Roughly 300 new machines per day Runs the Andrew benchmark two times slower than NTFS Still to do –Implement disk quotas –Have a mechanism to measure machine availability