1 CS 194: Distributed Systems Other Distributed File Systems Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering.

Slides:



Advertisements
Similar presentations
Fred P. Baker CCIE, CCIP(security), CCSA, MCSE+I, MCSE(2000)
Advertisements

SUNDR: Secure Untrusted Data Repository
The Zebra Striped Network Filesystem. Approach Increase throughput, reliability by striping file data across multiple servers Data from each client is.
The Zebra Striped Network File System Presentation by Joseph Thompson.
Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.
Computer Science Lecture 20, page 1 CS677: Distributed OS Today: Coda, xFS Case Study: Coda File System Brief overview of other recent file systems –xFS.
Distributed Storage March 12, Distributed Storage What is Distributed Storage?  Simple answer: Storage that can be shared throughout a network.
CS-550: Distributed File Systems [SiS]1 Resource Management in Distributed Systems: Distributed File Systems.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts Amherst Operating Systems CMPSCI 377 Lecture.
1 Networking through Linux Partha Sarathi Dasgupta MIS Group Indian Institute of Management Calcutta.
Other File Systems: LFS and NFS. 2 Log-Structured File Systems The trend: CPUs are faster, RAM & caches are bigger –So, a lot of reads do not require.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Web Servers How do our requests for resources on the Internet get handled? Can they be located anywhere? Global?
Computer Science Lecture 21, page 1 CS677: Distributed OS Today: Coda, xFS Case Study: Coda File System Brief overview of other recent file systems –xFS.
EEC-681/781 Distributed Computing Systems Lecture 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
NFS. The Sun Network File System (NFS) An implementation and a specification of a software system for accessing remote files across LANs. The implementation.
Computer Networks IGCSE ICT Section 4.
Microsoft Load Balancing and Clustering. Outline Introduction Load balancing Clustering.
Case Study - GFS.
File Systems (2). Readings r Silbershatz et al: 11.8.
Distributed File Systems Sarah Diesburg Operating Systems CS 3430.
Network File Systems Victoria Krafft CS /4/05.
Sun NFS Distributed File System Presentation by Jeff Graham and David Larsen.
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google∗
Presented by: Alvaro Llanos E.  Motivation and Overview  Frangipani Architecture overview  Similar DFS  PETAL: Distributed virtual disks ◦ Overview.
Distributed File Systems Concepts & Overview. Goals and Criteria Goal: present to a user a coherent, efficient, and manageable system for long-term data.
1 CS 268: Lecture 19 A Quick Survey of Distributed Systems 80 slides in 80 minutes!
CSC 456 Operating Systems Seminar Presentation (11/13/2012) Leon Weingard, Liang Xin The Google File System.
Networked File System CS Introduction to Operating Systems.
1 Lecture 20: Parallel and Distributed Systems n Classification of parallel/distributed architectures n SMPs n Distributed systems n Clusters.
INSTALLING MICROSOFT EXCHANGE SERVER 2003 CLUSTERS AND FRONT-END AND BACK ‑ END SERVERS Chapter 4.
Distributed Systems. Interprocess Communication (IPC) Processes are either independent or cooperating – Threads provide a gray area – Cooperating processes.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED.
Advanced Computer Networks Topic 2: Characterization of Distributed Systems.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
Computer Networking From LANs to WANs: Hardware, Software, and Security Chapter 13 FTP and Telnet.
Serverless Network File Systems Overview by Joseph Thompson.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
CS 153 Design of Operating Systems Spring 2015 Lecture 22: File system optimizations.
Presented By: Samreen Tahir Coda is a network file system and a descendent of the Andrew File System 2. It was designed to be: Highly Highly secure Available.
Computer Science Lecture 19, page 1 CS677: Distributed OS Last Class: Fault tolerance Reliable communication –One-one communication –One-many communication.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
GLOBAL EDGE SOFTWERE LTD1 R EMOTE F ILE S HARING - Ardhanareesh Aradhyamath.
Security fundamentals Topic 5 Using a Public Key Infrastructure.
ITGS Network Architecture. ITGS Network architecture –The way computers are logically organized on a network, and the role each takes. Client/server network.
Distributed File Systems Questions answered in this lecture: Why are distributed file systems useful? What is difficult about distributed file systems?
1 CEG 2400 Fall 2012 Network Servers. 2 Network Servers Critical Network servers – Contain redundant components Power supplies Fans Memory CPU Hard Drives.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Chapter Five Distributed file systems. 2 Contents Distributed file system design Distributed file system implementation Trends in distributed file systems.
SSH. 2 SSH – Secure Shell SSH is a cryptographic protocol – Implemented in software originally for remote login applications – One most popular software.
Distributed File Systems Sun Network File Systems Andrew Fıle System CODA File System Plan 9 xFS SFS Hadoop.
IP Security (IPSec) Matt Hermanson. What is IPSec? It is an extension to the Internet Protocol (IP) suite that creates an encrypted and secure conversation.
COMP1321 Digital Infrastructure Richard Henson March 2016.
BIG DATA/ Hadoop Interview Questions.
Truly Distributed File Systems Paul Timmins CS 535.
Chapter 7: Using Network Clients The Complete Guide To Linux System Administration.
Skype.
DISTRIBUTED FILE SYSTEM- ENHANCEMENT AND FURTHER DEVELOPMENT BY:- PALLAWI(10BIT0033)
Distributed File Systems
11.3 Distributed File Systems Communication
Today: Coda, xFS Case Study: Coda File System
Distributed File Systems
Distributed File Systems
Outline Announcements Lab2 Distributed File Systems 1/17/2019 COP5611.
Distributed File Systems
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S
Distributed File Systems
Distributed File Systems
Distributed File Systems
Presentation transcript:

1 CS 194: Distributed Systems Other Distributed File Systems Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA

2 Four Other Distributed File Systems Focus on goals, not details  Plan 9: turn everything into a file  xFS: turn every machine into a file server  SFS: don’t trust anything but the file server  SUNDR: don’t even trust the file server

3 Plan 9  Developed at Bell Labs by the UNIX group  Not a new distributed file system…  …but a new file-based distributed system  Every resource looks like a file -Like UNIX, but more consistently  Clients can locally mount a name space offered by a server -NFS style, not AFS style

4 Plan 9 Organization

5 Plan 9 Communications  Uses custom protocol 9P  Network interfaces represented by collection of special files  TCP connection represented by subdirectory with: -Ctl: write protocol-specific control commands -Data: read and write data -Listen: accept incoming connection setup requests -Remote: information about other side of connection -Status: diagnostic information on current status

6 Plan 9 Example  Open a telnet connection to using port 23 -Write “connect !23” to file ctl  To accept incoming telnet connections -Write “announce 23” to ctl  Window system offers files: -/dev/mouse: mouse position -/dev/cons: keyboard input

7 Plan 9 Naming  Each process has own private namespace constructed by mounting remote name spaces  File identified system-wide by four-tuple: -path: unique file number (relative to server) -version -device number (identifies its server) -type

8 Plan 9 Synchronization  UNIX file sharing semantics: -All changes sent to server  Caching is write-through

9 Plan 9 Rationale  Distributed systems are hard  Distributed file systems are a solved problem  Build a distributed system that looks like a file system  Impact: not clear....

10 xFS  Developed at Berkeley for the NOW project ~1995 -NOW = Network of workstations  All machines can be a server, no server is “special”  Like modern P2P systems in their symmetric and scalable design  Unlike modern P2P systems in their assumptions about trust, churn, and bandwidth -Trusted, stable machines connected by high-bandwidth LAN

11 xFS Processes  Storage server: stores parts of files on local node  Metadata manager: keeping track of where blocks of files are stored  Client: accepts user commands

12 Overall xFS Architecture

13 xFS Built on Three Ideas  RAID  Log-based File System  Cooperative caching

14 RAID  Partitions data into k blocks and a parity block  Stores blocks on different disks  High bandwidth: simultaneous reading/writing  Fault tolerance: can withstand failures

15 RAID on xFS

16 Log-Structured File System (LFS)  Developed at Berkeley (~1992)  Updates written to a log  Updates in logs asynchronously written to file system  Once written to file system, updates removed from log  Advantages: -better write performance (sequential) -failure recovery

17 RAID + LFS = Zebra  Large writes in LFS makes writes in the RAID efficient  Implements RAID in software  Log-based striping

18 Locating Files  Key challenge: how do you locate a file in this completely distributed system?  Manager map allows clients to determine which manager to contact  Manager keeps track of where file is

19 xFS Data Structures Data structureDescription Manager mapMaps file ID to manager ImapMaps file ID to log address of file's inode InodeMaps block number (i.e., offset) to log address of block File identifierReference used to index into manager map File directoryMaps a file name to a file identifier Log addressesTriplet of stripe group, ID, segment ID, and segment offset Stripe group mapMaps stripe group ID to list of storage servers

20 Reading a File in xFS

21 Caching in xFS  Managers keep track of who has cached copy of file  Manager can direct request to peer cache  To modify block, client must get ownership from manager -Manager invalidates all cached copies -Gives write permission (ownership) to client

22 xFS: Performance

23 xFS: Performance

24 xFS Rationale  Technology trends: -Fast LANs -Cheap PCs -Faster hardware expensive  To get higher performance, don’t build new hardware  Just build networks of workstations, and tie them together with a file system  RAID rationale very similar

25 Secure File System (SFS)  Developed by David Mazieres while at MIT (now NYU)  Key question: how do I know I’m accessing the server I think I’m accessing?  All the fancy distributed systems performance work is irrelevant if I’m not getting the data I wanted -Getting the wrong data faster is not an improvement  Several current stories about why I believe I’m accessing the server I want to access

26 Trust DNS and Network  Someone I trust hands me server name:  Verisign runs root servers for.com, directs me to DNS server for foo.com  I trust that packets sent to/from DNS and to/from server are indeed going to the intended destinations

27 Trust Certificate Authority  Server produces certificate (from, for example, Verisign) that attests that the server is who it says it is.  Disadvantages: -Verisign can screw up (which it has) -Hard for some sites to get meaningful Verisign certificate

28 Use Public Keys  Can demand proof that server has private key associated with public key  But how can I know that public key is associated with the server I want?

29 Secure File System (SFS)  Basic problem in normal operation is that the pathname (given to me by someone I trust) is disconnected from the public key (which will prove that I’m talking to the owner of the key).  In SFS, tie the two together. The pathname given to me automatically certifies the public key!

30 Self-Certifying Path Name  LOC: DNS or IP address of server, which has public key K  HID: Hash(LOC,K)  Pathname: local pathname on server /sfsLOCHIDPathname /sfs/sfs.vu.sc.nl:ag62hty4wior450hdh63u623i4f0kqere/home/steen/mbox

31 SFS Key Point  Whatever directed me to the server initially also provided me with enough information to verify their key  This design separates the issue of who I trust (my decision) from how I act on that trust (the SFS design)  Can still use Verisign or other trusted parties to hand out pathnames, or could get them from any other source

32 SUNDR  Developed by David Mazieres  SFS allows you to trust nothing but your server -But what happens if you don’t even trust that? -Why is this a problem?  P2P designs: my files on someone else’s machine  Corrupted servers: sourceforge hacked -Apache, Debian,Gnome, etc.

33 Traditional File System Model  Client send read and write requests to server  Server responds to those requests  Client/Server channel is secure, so attackers can’t modify requests/responses  But no way for clients to know if server is returning correct data  What if server isn’t trustworthy?

34 Byzantine Fault Tolerance  Replicate server  Check for consistency among responses  Can only protect against a limited number of corrupt servers

35 SUNDR Model V1  Clients send digitally signed requests to server  Server returns log of these requests -Server doesn’t compute anything -Server doesn’t know any keys  Problem: server can drop some updates from log, or reorder them

36 SUNDR Model V2  Have clients sign log, not just their own request  Only bad thing a server can do is a fork attack: -Keep two separate copies, and only show one to client 1 and the other to client 2  This is hopelessly inefficient, but various tricks can solve the efficiency problem

37 Summary  Plan 9: turn everything into a file  xFS: turn every machine into a file server  SFS: don’t trust anything but the file server  SUNDR: don’t even trust the file server