Distributed File System: Data Storage for Networks Large and Small Pei Cao Cisco Systems, Inc.

1 Distributed File System: Data Storage for Networks Large and Small Pei Cao Cisco Systems, Inc.

2 Review: DFS Design Considerations 1.Name space construction 2.AAA 3.Operator batching 4.Client caching 5.Data consistency 6.Locking

3 Summing it Up: CIFS as an Example Network transport in CIFS –Use SMB (Server Message Block) messages over a reliable connection-oriented transport: TCP or NetBIOS over TCP –Use persistent connections called “sessions” If a session is broken, the client does the recovery

4 Design Choices in CIFS Name space construction: –per-client linkage, multiple methods for server resolution file://fs.xyz.com/users/alice/stuff.doc \\cifsserver\users\alice\stuff.doc E:\stuff.doc –CIFS also offers a “redirection” method A share can be replicated on multiple servers or moved Client open -> server replies “STATUS_DFS_PATH_NOT_COVERED” -> client issues “TRANS2_DFS_GET_REFERRAL” -> server replies with new server
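The redirection exchange on this slide can be sketched as a small retry loop. This is only an illustration under stated assumptions: `ToyServer`, `open_with_referral`, the share paths, and the server names are hypothetical, and the in-memory method calls stand in for the real SMB messages.

```python
STATUS_OK = "STATUS_OK"
STATUS_DFS_PATH_NOT_COVERED = "STATUS_DFS_PATH_NOT_COVERED"


class ToyServer:
    """Hypothetical in-memory stand-in for a CIFS server."""

    def __init__(self, name, shares, referrals=None):
        self.name = name
        self.shares = set(shares)          # share paths hosted here
        self.referrals = referrals or {}   # share path -> server that hosts it

    def open(self, path):
        if path in self.shares:
            return STATUS_OK
        return STATUS_DFS_PATH_NOT_COVERED

    def get_referral(self, path):
        # Stands in for the TRANS2_DFS_GET_REFERRAL request/response.
        return self.referrals[path]


def open_with_referral(servers, first, path, max_hops=4):
    """Follow referrals until a server accepts the open; return its name."""
    current = first
    for _ in range(max_hops):
        server = servers[current]
        if server.open(path) == STATUS_OK:
            return current
        current = server.get_referral(path)
    raise RuntimeError("too many referrals")


servers = {
    "fs1": ToyServer("fs1", [], {"/users/alice": "fs2"}),   # share was moved
    "fs2": ToyServer("fs2", ["/users/alice"]),
}
print(open_with_referral(servers, "fs1", "/users/alice"))
```

The hop limit is a design choice of the sketch, added so that a misconfigured referral cycle cannot loop forever.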

5 Design Choices in CIFS AAA: Kerberos –Older systems use NTLM Operator batching: supported –These methods have “AndX” variations: TREE_CONNECT, OPEN, CREATE, READ, WRITE, LOCK –Server implicitly takes results of preceding operations as input for subsequent operations –First command that encounters an error stops all subsequent processing in the batch
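The two batching rules above (each command implicitly consumes the result of the one before it, and the first error stops the batch) can be sketched as follows. The function `run_batch`, the toy handlers, and the `files` dictionary are assumptions made for illustration; they do not reproduce the SMB wire format.

```python
def run_batch(commands, handlers):
    """Run commands in order, feeding each result to the next command.

    Returns (results, error). Processing stops at the first failure, so
    commands after the failing one are never executed.
    """
    results = []
    previous = None
    for name, args in commands:
        try:
            previous = handlers[name](previous, **args)
        except Exception as exc:
            return results, (name, str(exc))
        results.append((name, previous))
    return results, None


files = {"stuff.doc": b"hello world"}   # toy server-side file store


def do_open(_previous, path):
    if path not in files:
        raise FileNotFoundError(path)
    return path                          # toy "file handle"


def do_read(handle, offset, length):
    # The handle comes implicitly from the preceding OPEN in the batch.
    return files[handle][offset:offset + length]


handlers = {"OPEN": do_open, "READ": do_read}

batch = [("OPEN", {"path": "stuff.doc"}), ("READ", {"offset": 0, "length": 5})]
print(run_batch(batch, handlers))
```

Batching saves one network round trip per chained command, which is why the client never has to name the file handle in the READ.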

6 Design Choices in CIFS Client caching –Cache both file data and file metadata, write-back cache, can read-ahead –Offers strong cache consistency using an invalidation-based approach Data access consistency –Oplocks: similar to “tokens” in AFS v3 “level II oplock”: read-only data locks “exclusive oplock”: exclusive read/write data lock “batch oplock”: exclusive read/write “open” lock and data lock and metadata lock –Transition among the oplocks –Observation: can have a hierarchy of lock managers
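A minimal sketch of the transitions among oplocks, assuming a simplified policy: the first opener gets an exclusive oplock, a second opener causes it to be broken down to level II, and a write by a client without an exclusive oplock breaks every level II oplock to none. The class `OplockManager` and its methods are hypothetical, and batch oplocks are left out.

```python
EXCLUSIVE, LEVEL_II, NONE = "exclusive", "level II", "none"


class OplockManager:
    """Toy server-side oplock table (not the real CIFS state machine)."""

    def __init__(self):
        self.held = {}     # path -> {client: oplock level}
        self.breaks = []   # (client, path, new level) break notices sent

    def open(self, client, path):
        holders = self.held.setdefault(path, {})
        others = [c for c in holders if c != client]
        if not others:
            holders[client] = EXCLUSIVE       # sole opener may cache freely
            return EXCLUSIVE
        for other in others:
            if holders[other] == EXCLUSIVE:   # invalidation-style break
                holders[other] = LEVEL_II
                self.breaks.append((other, path, LEVEL_II))
        shared = all(holders[c] == LEVEL_II for c in others)
        holders[client] = LEVEL_II if shared else NONE
        return holders[client]

    def write(self, client, path):
        holders = self.held.get(path, {})
        if holders.get(client) == EXCLUSIVE:
            return                            # write stays in the client cache
        for c, level in holders.items():
            if level == LEVEL_II:             # read caches are now stale
                holders[c] = NONE
                self.breaks.append((c, path, NONE))


manager = OplockManager()
print(manager.open("A", "stuff.doc"), manager.open("B", "stuff.doc"))
```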

7 Design Choices in CIFS File and data record locking –Offer “shared” (read-only) and “exclusive” (read/write) locks –Part of the file system; Mandatory –Can lock either a whole file or byte-range in the file –Lock request can specify a timeout for waiting –Enables atomic writes with the “ANDX” batching with Writes “Lock/write/unlock” as a batched command sequence Additional capability: “directory change notification”
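The locking rules on this slide can be sketched as a lock table with an overlap check. The names `ByteRangeLocks` and `lock_write_unlock` are hypothetical; lock timeouts are omitted, and a whole-file lock is modelled simply as a range that covers the entire file.

```python
class ByteRangeLocks:
    """Toy mandatory byte-range lock table for a single file."""

    def __init__(self):
        self.locks = []   # (owner, start, end_exclusive, is_exclusive)

    def _conflicts(self, owner, start, end, exclusive):
        for o, s, e, x in self.locks:
            overlaps = s < end and start < e
            # Shared locks coexist; any exclusive side makes a conflict.
            if o != owner and overlaps and (x or exclusive):
                return True
        return False

    def try_lock(self, owner, start, length, exclusive):
        end = start + length
        if self._conflicts(owner, start, end, exclusive):
            return False
        self.locks.append((owner, start, end, exclusive))
        return True

    def unlock(self, owner, start, length):
        key = (owner, start, start + length)
        self.locks = [l for l in self.locks if l[:3] != key]

    def may_write(self, owner, start, length):
        # Mandatory locking: a write is checked like an exclusive request.
        return not self._conflicts(owner, start, start + length, True)


def lock_write_unlock(table, owner, file_bytes, start, payload):
    """Mimic the batched lock/write/unlock sequence as one atomic step."""
    if not table.try_lock(owner, start, len(payload), exclusive=True):
        return False
    file_bytes[start:start + len(payload)] = payload
    table.unlock(owner, start, len(payload))
    return True
```

Because the three steps travel in one batch and the batch stops at the first error, a failed lock means the write never runs.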

8 DFS for Mobile Networks What properties of DFS are desirable: –Handle frequent connection and disconnection –Enable clients to operate in disconnected state for an extended period of time –Ways to resolve/merge conflicts

9 Design Issues for DFS in Mobile Networks What should be kept in client cache? How to update the client cache copies with changes made on the server? How to upload changes made by the client to the server? How to resolve conflicts when more than one client changes a file during disconnected state?

10 Example System: Coda Client cache content: –User can specify which directories should always be cached on the client –Also cache recently used files –Cache replacement: walk over the cached items every 10 min to reevaluate their priorities Updates from server to client: –The server keeps a log of callbacks that couldn’t be delivered and delivers them upon client connection
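One way to picture the periodic reevaluation is a function that scores every cached item from a user-assigned priority plus recency, then keeps the best ones. The scoring formula, the function name `reevaluate`, and the sample paths are assumptions made for illustration; the slide does not give Coda's actual priority function.

```python
def reevaluate(cache, pinned, now, capacity, recency_weight=1.0):
    """Return the set of cached paths to keep after one walk.

    cache:  {path: last access time}
    pinned: {directory prefix: user-assigned priority}
    """
    def priority(path):
        age = now - cache[path]
        user = max((p for prefix, p in pinned.items()
                    if path.startswith(prefix)), default=0)
        # Hypothetical formula: pinned directories win, then recent files.
        return user - recency_weight * age

    ranked = sorted(cache, key=priority, reverse=True)
    return set(ranked[:capacity])


cache = {"/coda/alice/thesis.tex": 0, "/tmp/a": 90, "/tmp/b": 50}
pinned = {"/coda/alice/": 1000}
print(reevaluate(cache, pinned, now=100, capacity=2))
```

The pinned file survives even though it is the least recently used, which is the behaviour a disconnected client needs.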

11 Coda File System Upload the changes from client to server –The client has to keep a “replay log” Contents of the “replay log” –Ways to reduce the “replay log” size Handling conflicts –Detecting conflicts –Resolving conflicts
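The slide lists topics only, so the following is a hedged sketch of the general idea rather than Coda's record format. It assumes each log record remembers the server version the client last saw, that a later store to the same file replaces the earlier record (one way to shrink the log), and that a version mismatch at replay time signals a conflict. `log_store` and `replay` are hypothetical names.

```python
def log_store(log, path, data, base_version):
    """Append a store record, collapsing repeated stores to one record."""
    for i, (op, p, _, base) in enumerate(log):
        if op == "store" and p == path:
            # Keep the original base version: it is what the client saw
            # before it disconnected.
            log[i] = ("store", path, data, base)
            return
    log.append(("store", path, data, base_version))


def replay(log, server):
    """Apply the log to server = {path: (version, data)}.

    Returns the paths whose server copy changed in the meantime.
    """
    conflicts = []
    for _op, path, data, base in log:
        version, _ = server.get(path, (0, b""))
        if version != base:
            conflicts.append(path)           # needs resolution
        else:
            server[path] = (version + 1, data)
    return conflicts


log = []
log_store(log, "a", b"1", 3)
log_store(log, "a", b"2", 3)                 # supersedes the first store
log_store(log, "b", b"x", 1)
server = {"a": (3, b"0"), "b": (2, b"other")}
print(replay(log, server))
```

Resolving the reported conflicts is left out because it depends on the file type and on policy.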

12 Performance Issues in File Servers Components of server load –Network protocol handling –File system implementation –Disk accesses Read operations –Metadata –Data Write operations –Metadata –Data Workload characterization

13 DFS for High-Speed Networks: DAFS Proposal from Network Appliance and other companies Goal: eliminate memory copies and protocol processing –Standard implementation: network buffers -> file system buffer cache -> user-level application buffers Designed to take advantage of RDMA (“Remote DMA”) network protocols –Network transport provides direct memory-to-memory transfer –Protocol processing is provided in hardware Suitable for high-bandwidth, low-error-rate, low-latency network

14 DAFS Protocol Data read from the client: –RDMA request from the server to copy file data directly into application buffer Data write from the client –RDMA request from the server to copy application buffer into server memory Implementation: –as a library linked to user application interface with RDMA network library directly Eliminate two data copies –as a new file system implementation in the kernel Eliminate one data copy Performance advantage: –Example: 90 usec/op in NFS vs. 25 usec/op in DAFS
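The per-operation figures quoted above translate into a speedup and an operation rate with simple arithmetic. The rates assume a single client issuing one operation at a time, which is an assumption of this sketch and not something the slide states.

```python
NFS_USEC_PER_OP = 90     # figure quoted on the slide
DAFS_USEC_PER_OP = 25    # figure quoted on the slide

speedup = NFS_USEC_PER_OP / DAFS_USEC_PER_OP
nfs_ops_per_sec = 1_000_000 / NFS_USEC_PER_OP
dafs_ops_per_sec = 1_000_000 / DAFS_USEC_PER_OP

print(f"speedup: {speedup:.1f}x")
print(f"NFS:  {nfs_ops_per_sec:,.0f} ops/s")
print(f"DAFS: {dafs_ops_per_sec:,.0f} ops/s")
```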

15 DAFS Features Session-based Offer authentication of client machines Flow control by server Stateful lock implementation with leases Offers atomic writes Offers operator batching

16 Clustered File Servers Goal: scalability in file service –Build a high-performance file service using a collection of cheap file servers Methods for Partitioning the Workload –Each server can support one “subtree” Advantages Disadvantages –Each server can support a group of clients Advantages Disadvantages –Client requests are sent to server in round-robin or load-balanced fashion Advantages Disadvantages
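The three partitioning methods can be compared as three small routing functions. The server names, the subtree map, and the use of a CRC to assign clients to servers are assumptions made for illustration.

```python
import itertools
import zlib

SERVERS = ["fs0", "fs1", "fs2"]


def route_by_subtree(path, subtree_map):
    """Each server owns a subtree; pick the longest matching prefix.

    Toy prefix matching: it does not check path-component boundaries.
    """
    best = max((p for p in subtree_map if path.startswith(p)), key=len)
    return subtree_map[best]


def route_by_client(client_id, servers):
    """Each server supports a fixed group of clients (stable hash)."""
    return servers[zlib.crc32(client_id.encode()) % len(servers)]


def make_round_robin(servers):
    """Any server may handle any request; rotate through them."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)


subtree_map = {"/": "fs0", "/users": "fs1", "/users/alice": "fs2"}
next_server = make_round_robin(SERVERS)
print(route_by_subtree("/users/alice/stuff.doc", subtree_map))
print(route_by_client("client-17", SERVERS))
print([next_server() for _ in range(4)])
```

Only the third scheme lets two servers touch the same file, which is why it raises the consistency questions covered on the next slide.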

17 Non-Subtree-Partition Clustered File Servers Design issues –On which disks should the data be stored? –Management of memory cache in file servers –Data consistency management Metadata operation consistency Data operation consistency –Server failure management Single server failure fault tolerance Disk failure fault tolerance

18 Mapping Between Disks and Servers Direct-attached disks Network-attached disks –Fibre Channel attached disks –iSCSI attached disks Managing the network-attached disks: “volume manager”

19 Functionalities of a Volume Manager Group multiple disk partitions into a “logical” disk volume Volume can expand or shrink in size without affecting existing data Volume can be RAID-0/1/5, tolerating disk failures Volume can offer “snapshot” functionalities for easy backup Volumes are “self-evident”
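Two of the mechanisms behind these features can be sketched briefly: the address mapping of a striped (RAID-0) logical volume, and the XOR parity that lets a RAID-5 volume rebuild one lost block. The function names and parameters are hypothetical, and real volume managers add metadata, parity rotation, and caching that are left out here.

```python
def raid0_locate(logical_block, num_disks, stripe_blocks):
    """Map a logical block to (disk index, block number on that disk)."""
    stripe = logical_block // stripe_blocks
    disk = stripe % num_disks
    physical = (stripe // num_disks) * stripe_blocks + logical_block % stripe_blocks
    return disk, physical


def xor_blocks(blocks):
    """XOR equal-sized blocks; used both to build and to recover parity."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Striping across 3 disks with 4-block stripes.
print(raid0_locate(13, num_disks=3, stripe_blocks=4))

# Parity protects against the loss of any one block in a stripe.
data = [b"ab", b"cd", b"ef"]
parity = xor_blocks(data)
recovered = xor_blocks([data[0], data[2], parity])   # data[1] was lost
print(recovered)
```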

20 Implementations of Volume Manager In-kernel implementation –Example: Linux volume manager, Veritas volume manager, etc. Disk server implementation –Example: EMC storage systems

21 Serverless File Systems Serverless file systems in WAN –Motivation: peer-to-peer storage; never lose the file Serverless file system in LAN –Motivation: clients powerful enough to act as servers; use all clients’ memory to cache file data

