G22.3250-001 Robert Grimm New York University Farsite: A Serverless File System.

Altogether Now: The Three Questions
- What is the problem?
- What is new or different?
- What are the contributions and limitations?

Distributed File Systems Take Two (late 90s and 00s)
- Goal: provide uniform file system access across several workstations
  - Single, possibly large organization
  - Computers connected by a local network (not the Internet)
- General strategy: serverless file systems
  - Server-based file systems are well administered and offer higher quality, more redundancy, and greater physical security
  - But they are also expensive: equipment, physical plant, personnel
  - Alternative: use already-existing workstations
  - Need to reconcile security, reliability, and performance

Farsite Design Assumptions
- High-bandwidth, low-latency network
  - Topology can be ignored
- Majority of machines up and running a majority of the time
  - Generally uncorrelated downtimes
- Small fraction of users malicious
  - Try to destroy or corrupt data or metadata
- Large fraction of users opportunistic
  - Try to read unauthorized data or metadata

Enabling Technology Trends
- Remember: reconcile security and performance
- Increase in unused disk capacity -> use replication
  - Microsoft studies: disks grow faster than Windows/Office
  - 1998: 49% of disk space unused
  - 1999: 50% of disk space unused
  - 2000: 58% of disk space unused
- Decrease in cost of cryptographic operations -> provide privacy and security
  - 72 MB/s symmetric encryption bandwidth
  - 53 MB/s one-way hashing bandwidth
  - 32 MB/s sequential I/O bandwidth

Trust and Certification
- Based on public-key certificates
  - Issued by a certification authority or its delegates
  - Important for manageability (e.g., HR and IT departments)
- Structured into three types of certificates
  - Namespace certificates: map the file-system namespace to the set of machines managing metadata
  - User certificates: map a user to a public key for access control
  - Machine certificates: map a machine to a public key for establishing physical uniqueness

System Architecture
- Basic system
  - Clients interact with the user
  - Directory groups manage file information
  - Use Byzantine fault tolerance [Castro & Liskov '99]: 3f + 1 replicated state machines tolerate up to f malicious nodes
- Problems with the basic system
  - Performance of the Byzantine fault-tolerance protocol
  - No privacy of data
  - No access control
  - Large storage requirements (replicated state machines!)
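The 3f + 1 bound above can be sketched with two trivial helpers (names and usage are mine, not the paper's):

```python
def min_replicas(f: int) -> int:
    """Minimum group size needed to tolerate f Byzantine (malicious) members."""
    return 3 * f + 1

def max_faults(n: int) -> int:
    """Largest f such that a group of n replicated state machines stays correct."""
    return (n - 1) // 3

# For example, a directory group of 7 machines tolerates 2 malicious members,
# while a group of 4 tolerates only 1.
assert min_replicas(2) == 7
assert max_faults(7) == 2
assert max_faults(4) == 1
```

This also makes the storage-cost problem noted above concrete: tolerating even one more fault costs three more full replicas of the state machine.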

System Architecture (cont.)
- Improvements for the actual system
  - Cache (and lease) files on clients and delay updates
  - Encrypt data with the public keys of all readers: provides privacy and read-access control
  - Check write-access permissions in the directory group
  - Store opaque file data (but not metadata) on file hosts; a cryptographic hash in the metadata ensures validity
  - Delegate portions of the namespace to different directory groups
- Semantic differences from NTFS
  - Limits on concurrent writers and readers
  - No locking for rename operations

Diggin' Deeper
- Reliability and availability
- Security
- Durability
- Consistency
- Scalability
- Efficiency
- Manageability

Reliability and Availability
- Main mechanism: replication
  - Byzantine for metadata, regular for file data
- Challenge: migrate replicas in the face of failures
- Aggressive directory migration
  - After all, metadata must be accessible to even get to the file data
  - Random selection of new group members after a short downtime
  - Bias towards high-availability machines
- Balanced file-host migration
  - Swap locations of high-availability files with low-availability files
  - Simulations show 1% of replicas swapped per day

Security
- Access control
  - Based on the public keys of all authorized file writers
  - Performed by the directory group
- Privacy
  - Based on block-level, convergent encryption (why?)
  - A one-way hash of each block serves as the key for encrypting that block
  - The one-way hash is encrypted with a symmetric file key
  - The file key is encrypted with the public keys of all file readers
- Integrity
  - Ensured by a hash tree over the file's data blocks
  - Tree stored with the file, root hash stored with the metadata
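Convergent encryption, as described above, can be sketched in a few lines. The keystream cipher below is a toy stand-in (a real system would use a block cipher such as AES), and all function names are illustrative, not Farsite's:

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    """Toy SHA-256-based keystream; stands in for a real cipher like AES."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def convergent_encrypt(block: bytes) -> tuple[bytes, bytes]:
    """Encrypt a file block under its own hash (convergent encryption).

    Returns (block_key, ciphertext). In the scheme above, the block key would
    then be encrypted with the symmetric file key, which is itself encrypted
    with each authorized reader's public key.
    """
    block_key = hashlib.sha256(block).digest()  # one-way hash serves as the key
    stream = _keystream(block_key, len(block))
    ciphertext = bytes(a ^ b for a, b in zip(block, stream))
    return block_key, ciphertext

def convergent_decrypt(block_key: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(block_key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))

# The "why?": identical plaintext blocks yield identical ciphertext, so
# duplicates can be detected and coalesced without reading the plaintext.
k1, c1 = convergent_encrypt(b"same contents")
k2, c2 = convergent_encrypt(b"same contents")
assert c1 == c2 and convergent_decrypt(k1, c1) == b"same contents"
```

The deterministic ciphertext is exactly what the Efficiency slide later relies on for duplicate detection; a conventional randomized encryption scheme would hide duplicates from the storage layer.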

Durability
- Updates to metadata collected in a client-side log
  - Batched to the directory group
- What to do after a crash?
- Obvious solutions are problematic
  - Private-key signing of all log entries is too expensive
  - Holding the private key on the machine opens a security hole
- Better solution: symmetric authenticator key
  - Split into shares and distributed amongst the directory group
  - Log entries signed using a message authentication code
  - On crash recovery, the log is pushed to the directory group
  - Members reconstruct the authenticator key and validate the entries
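A minimal sketch of this scheme, with a simple n-of-n XOR split standing in for whatever secret-sharing construction Farsite actually uses (all names here are illustrative):

```python
import hashlib
import hmac
import secrets

def split_key(key: bytes, n: int) -> list[bytes]:
    """n-of-n XOR secret sharing: all n shares are needed to rebuild the key.
    (A stand-in for the paper's real sharing scheme.)"""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = key
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    return shares + [last]

def combine_shares(shares: list[bytes]) -> bytes:
    key = bytes(len(shares[0]))
    for s in shares:
        key = bytes(a ^ b for a, b in zip(key, s))
    return key

def mac_log_entry(auth_key: bytes, entry: bytes) -> bytes:
    """Cheap per-entry MAC instead of an expensive private-key signature."""
    return hmac.new(auth_key, entry, hashlib.sha256).digest()

# Client signs log entries with the authenticator key; one share lives on
# each directory-group member, so no single machine holds the whole key.
auth_key = secrets.token_bytes(32)
shares = split_key(auth_key, 7)
tag = mac_log_entry(auth_key, b"rename /docs/a.txt /docs/b.txt")

# On crash recovery, the group reconstructs the key and validates the log:
recovered = combine_shares(shares)
assert hmac.compare_digest(mac_log_entry(recovered, b"rename /docs/a.txt /docs/b.txt"), tag)
```

The trade-off this captures: MACs are fast but only verifiable by key holders, which is acceptable here because the directory group is exactly the party that must validate the log.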

Consistency
- Ensured by leases
- Content leases control file content
  - Either read/write or read-only
  - Leases, together with batched updates, recalled on demand
  - Optimization allows for serialization through a single client
  - Leases also include expiration times, addressing failures and disconnections
- Name leases control directory access
  - Allow for creation of a file/directory
  - Allow for creation of children
  - Can be recalled on demand
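An expiring content lease of the kind described above might look like this (a sketch with invented names; the real lease protocol also covers recall and batched-update flushing):

```python
import time

class ContentLease:
    """Content lease: read/write or read-only, with an expiration time so the
    directory group can make progress if the holder crashes or disconnects."""

    def __init__(self, holder: str, read_write: bool, duration_s: float):
        self.holder = holder
        self.read_write = read_write
        self.expires_at = time.monotonic() + duration_s

    def valid(self) -> bool:
        return time.monotonic() < self.expires_at

    def permits_write(self) -> bool:
        return self.valid() and self.read_write

# A client holds a short read/write lease; once it expires, the directory
# group may grant the file to someone else without contacting the holder.
lease = ContentLease("client-17", read_write=True, duration_s=0.05)
assert lease.permits_write()
time.sleep(0.06)
assert not lease.permits_write()  # expired: failure/disconnection handled
```

The expiration time is what makes the scheme fault-tolerant: without it, a crashed lease holder would block all other writers forever.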

Consistency (cont.)
- Ensured by leases (cont.)
- Mode leases control Windows file-sharing semantics
  - Implemented by six types: read, write, delete, exclude-read, exclude-write, exclude-delete
  - Can be downgraded on demand
- Access leases control Windows deletion semantics
  - Implemented by three types: public, protected, private
  - Public: lease holder has the file open
  - Protected: no other client gains access without permission from the holder
  - Private: no other client gains access
  - Delete operation requires a protected/private lease; forces metadata pushback and informs the directory group of the pending deletion
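One plausible reading of mode-lease compatibility is that each access mode conflicts only with its matching exclude mode. That interpretation, sketched as a predicate (the actual Windows sharing rules are more involved, and these names are mine):

```python
ACCESS = {"read", "write", "delete"}
EXCLUDE = {"exclude-read", "exclude-write", "exclude-delete"}

def compatible(existing: set[str], requested: str) -> bool:
    """Can the `requested` mode lease be granted alongside `existing` leases?

    An access mode conflicts with the matching exclude mode, and an exclude
    mode conflicts with the matching access mode. Illustrative only.
    """
    if requested in ACCESS:
        return f"exclude-{requested}" not in existing
    if requested in EXCLUDE:
        return requested.removeprefix("exclude-") not in existing
    raise ValueError(f"unknown mode lease: {requested}")

# One client opens a file for writing and denies other writers:
held = {"write", "exclude-write"}
assert compatible(held, "read")       # readers are still allowed
assert not compatible(held, "write")  # a second writer is blocked
```

This mirrors how Win32 share modes work: an open names both the access it wants and the access it forbids others, and a new open succeeds only if both directions are compatible.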

Scalability
- Hint-based pathname translation
  - Cache pathnames and their mappings to directory groups
  - Use the longest-matching prefix to start the search
  - On a mismatch, the directory group either provides all delegation certificates (the client moves on) or reports the mismatch (the client checks a shorter prefix)
- Delayed directory-change notification
  - Windows already allows best-effort notification
  - Farsite directory groups send delayed directory listings through application-level multicast
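The longest-matching-prefix lookup at the heart of hint-based translation can be sketched as follows (names and structure are illustrative; stale-hint retry against the directory group is elided):

```python
def translate(path: str, hints: dict[str, str]) -> str:
    """Resolve `path` to a directory group using cached prefix hints.

    `hints` maps namespace prefixes to directory-group identifiers. We start
    at the longest cached prefix that matches; in the real protocol, a stale
    hint makes the client fall back to the next shorter prefix.
    """
    parts = path.strip("/").split("/")
    for i in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:i])
        if prefix in hints:
            return hints[prefix]
    return hints["/"]  # the root directory group is always known

# Delegation has handed /users to group-A and /users/alice on to group-B:
hints = {"/": "group-root", "/users": "group-A", "/users/alice": "group-B"}
assert translate("/users/alice/notes.txt", hints) == "group-B"
assert translate("/users/bob/todo.txt", hints) == "group-A"
assert translate("/tmp/x", hints) == "group-root"
```

Starting from the longest prefix is what keeps load off the root group: most translations land directly on the delegated group without any round trip up the namespace.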

Efficiency
- Space
  - Replication increases storage requirements
  - But measurements also show that a lot of data is duplicated
  - Convergent encryption makes it possible to detect duplicate data
  - Windows' Single Instance Store coalesces duplicate data
- Time: reflected throughout the system
  - Caching to improve availability and read performance
  - Leasing and hint-based translation reduce load and network round trips
  - Limited leases, hash trees, and MAC-logging reduce overheads
  - Update delays reduce traffic

Manageability
- Local machine administration
  - Preemptive migration to improve reliability
  - Backwards-compatible versions to simplify upgrades
  - Potential backup utilities to improve performance and avoid loss of privacy
- Autonomic system operation
  - Timed Byzantine operations: a group member initiates a time update, which causes timed operations to be performed
  - Byzantine outcalls: invert the standard model of Byzantine fault tolerance; members prod clients into performing a Byzantine operation

Phhh, That's a Lot of Details
- But how well does it work?
- Windows-based implementation
  - Kernel-level driver & user-level daemon
  - NTFS for local storage
  - CIFS for remote data transfer
  - Encrypted TCP for the control channel
- Analysis for 10^5 machines
  - Certification: 2×10^5 certificates per month, less than one CPU hour
  - Direct load on directory groups: limited by leases per file and directory-change notifications
  - Indirect load on directory groups: 1-year machine lifetime -> 300 new machines/day -> trivial load on the root group

Measured Performance Based on Trace Replay, One User
- What do we conclude from this graph?

What Do You Think?