Distributed File Systems: Sun Network File Systems, Andrew File System, CODA File System, Plan 9, xFS, SFS, Hadoop


1 Distributed File Systems: Sun Network File Systems, Andrew File System, CODA File System, Plan 9, xFS, SFS, Hadoop

2 Plan 9: Resources Unified to Files General organization of Plan 9: few servers, many clients. Client mounts server name space into its own.

3 Communication Files associated with a single TCP connection in Plan 9.

| File   | Description                                                        |
|--------|--------------------------------------------------------------------|
| ctl    | Used to write protocol-specific control commands                   |
| data   | Used to read and write data                                        |
| listen | Used to accept incoming connection setup requests                  |
| local  | Provides information on the caller's side of the connection        |
| remote | Provides information on the other side of the connection           |
| status | Provides diagnostic information on the current status of the connection |

4 Processes The Plan 9 file server: data is stored in write-once, read-many (WORM) storage, but cached on smaller magnetic disks.

5 Naming A union directory in Plan 9: Different file systems can be mounted on the same mount point
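A union directory can be pictured as an ordered list of file systems sharing one mount point. The following is a minimal sketch, assuming that a lookup searches the mounted file systems in order and returns the first match; the class name, the `before` flag, and the sample entries are hypothetical and are not Plan 9 APIs.

```python
class UnionDirectory:
    """Toy model of a union directory: several file systems on one mount point."""

    def __init__(self):
        self.layers = []  # mounted file systems, searched in order

    def mount(self, filesystem, before=False):
        # 'before' is a hypothetical flag: put the new file system in front
        # of the existing ones instead of after them.
        if before:
            self.layers.insert(0, filesystem)
        else:
            self.layers.append(filesystem)

    def lookup(self, name):
        # Assumption: the first file system that contains the name wins.
        for filesystem in self.layers:
            if name in filesystem:
                return filesystem[name]
        raise FileNotFoundError(name)

    def listing(self):
        # The union shows every name once, in search order.
        seen = []
        for filesystem in self.layers:
            for name in filesystem:
                if name not in seen:
                    seen.append(name)
        return seen


# Two toy file systems (plain dicts) mounted on the same mount point.
union = UnionDirectory()
union.mount({"ls": "local ls", "cat": "local cat"})
union.mount({"cat": "remote cat", "grep": "remote grep"})
```

Here `union.lookup("cat")` resolves to the entry of the file system mounted first, while `union.lookup("grep")` falls through to the second one.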

6 Overview of the Serverless File System: xFS A typical distribution of xFS processes across multiple machines. The current design is not based on RPC; it is based on so-called active messages, implemented by message handlers. Each file has an associated manager, which is found by using the file identifier.

7 Principle of log based striping The principle of log-based striping in xFS.
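The slide only names the technique, so the following is a rough sketch under stated assumptions: a client appends its writes to a log segment, the segment is cut into equal fragments, and one XOR parity fragment is added so that a single lost fragment can be rebuilt. The function names, fragment count, and padding rule are illustrative choices, not details taken from the slides.

```python
from functools import reduce


def xor_fragments(fragments):
    """XOR equally sized byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*fragments))


def stripe_segment(segment, num_data_servers):
    """Split a log segment into equal fragments plus one parity fragment."""
    frag_len = -(-len(segment) // num_data_servers)  # ceiling division
    padded = segment.ljust(frag_len * num_data_servers, b"\0")  # assumed zero padding
    fragments = [
        padded[i * frag_len:(i + 1) * frag_len] for i in range(num_data_servers)
    ]
    return fragments, xor_fragments(fragments)


def recover(fragments, parity, lost_index):
    """Rebuild one lost fragment from the surviving fragments and the parity."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return xor_fragments(survivors + [parity])


# A client batches several writes into one log segment before striping it.
log_segment = b"".join([b"write-A;", b"write-B;", b"write-C;"])
fragments, parity = stripe_segment(log_segment, 3)
```

Each fragment and the parity fragment would go to a different storage server of the stripe group, so losing any one server leaves enough information to reconstruct its fragment with `recover`.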

8 Reading a block of data in xFS Look up file f in the directory, pass its identifier to the metadata manager, look up the location of the inode, look up the stripe group to which the file is written, find the server storing the inode, then read the data.

9 Naming Main data structures used in xFS.

| Data structure   | Description                                                  |
|------------------|--------------------------------------------------------------|
| Manager map      | Maps file ID to manager                                      |
| Imap             | Maps file ID to log address of file's inode                  |
| Inode            | Maps block number (i.e., offset) to log address of block     |
| File identifier  | Reference used to index into manager map                     |
| File directory   | Maps a file name to a file identifier                        |
| Log addresses    | Triplet of stripe group ID, segment ID, and segment offset   |
| Stripe group map | Maps stripe group ID to list of storage servers              |
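The read path can be traced through these data structures with plain dictionaries. This is a minimal sketch: every name, ID, and address is a made-up illustration value, and the rule for picking a server inside a stripe group is a hypothetical placeholder.

```python
# All names, IDs, and addresses below are made-up illustration values.
file_directory = {"report.txt": 17}   # file name -> file identifier
manager_map = {17: "manager-2"}       # file identifier -> manager
imap = {17: (0, 5, 0)}                # file identifier -> log address of the inode
stripe_group_map = {                  # stripe group ID -> storage servers
    0: ["s1", "s2"],
    1: ["s3", "s4"],
}

# Contents of each storage server, keyed by (segment ID, segment offset).
storage = {
    "s2": {(5, 0): {0: (1, 9, 128)}},  # inode: block number -> log address of block
    "s4": {(9, 128): b"hello xFS"},    # data block
}


def fetch(log_address):
    """Resolve a log address (stripe group ID, segment ID, segment offset)."""
    group_id, segment_id, offset = log_address
    servers = stripe_group_map[group_id]
    # Hypothetical rule for choosing the server within the stripe group.
    server = servers[segment_id % len(servers)]
    return storage[server][(segment_id, offset)]


def read_block(name, block_number):
    """Follow the lookup chain from file name to data block."""
    file_id = file_directory[name]     # directory lookup
    manager = manager_map[file_id]     # find the file's manager
    inode = fetch(imap[file_id])       # imap gives the inode's log address
    data = fetch(inode[block_number])  # inode gives the block's log address
    return manager, data
```

Calling `read_block("report.txt", 0)` walks the directory, the manager map, the imap, the inode, and the stripe group map in turn before returning the block.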

10 Overview of the Secure File System: SFS The organization of SFS: the client and server sides have three components each. Authentication is separated from the data.

11 Naming A self-certifying pathname in SFS includes all the information needed to authenticate the server. $HID = H(LOC, K_S^+)$, where $K_S^+$ is the server's public key.
Format: /sfs/LOC:HID/Pathname
Example: /sfs/sfs.vu.sc.nl:ag62hty4wior450hdh63u623i4f0kqere/home/steen/mbox
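The idea behind a self-certifying pathname can be sketched as follows. The choice of SHA-1, the way LOC and the key are concatenated, and the base-32 encoding are assumptions made for illustration and may differ from what SFS actually does; the key bytes are placeholders, not real keys.

```python
import base64
import hashlib


def host_id(location, public_key):
    """HID = H(LOC, K): hash the server location together with its public key."""
    digest = hashlib.sha1(location.encode() + public_key).digest()
    return base64.b32encode(digest).decode().lower()


def self_certifying_path(location, public_key, pathname):
    """Build /sfs/LOC:HID/Pathname."""
    return f"/sfs/{location}:{host_id(location, public_key)}{pathname}"


def server_is_authentic(path, presented_key):
    """Recompute HID from the key the server presents and compare with the path."""
    location, hid = path.split("/")[2].split(":")
    return host_id(location, presented_key) == hid


# Placeholder key material, for illustration only.
server_key = b"example-public-key"
path = self_certifying_path("sfs.vu.sc.nl", server_key, "/home/steen/mbox")
```

Because the HID is embedded in the name, a client that contacts the server named by LOC can check the presented public key with `server_is_authentic(path, key)` without consulting any external authority.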

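12 Summary A comparison between NFS, Coda, Plan 9, xFS, and SFS. N/S indicates that nothing has been specified.

| Issue             | NFS                 | Coda                    | Plan 9            | xFS                     | SFS               |
|-------------------|---------------------|-------------------------|-------------------|-------------------------|-------------------|
| Design goals      | Access transparency | High availability       | Uniformity        | Serverless system       | Scalable security |
| Access model      | Remote              | Up/Download             | Remote            | Log-based               | Remote            |
| Communication     | RPC                 | RPC                     | Special           | Active msgs             | RPC               |
| Client process    | Thin/Fat            | Fat                     | Thin              | Fat                     | Medium            |
| Server groups     | No                  | Yes                     | No                | Yes                     | No                |
| Mount granularity | Directory           | File system             | File system       | File system             | Directory         |
| Name space        | Per client          | Global                  | Per process       | Global                  | Global            |
| File ID scope     | File server         | Global                  | Server            | Global                  | File system       |
| Sharing sem.      | Session             | Transactional           | UNIX              | UNIX                    | N/S               |
| Cache consist.    | Write-back          | Write-back              | Write-through     | Write-back              | Write-back        |
| Replication       | Minimal             | ROWA                    | None              | Striping                | None              |
| Fault tolerance   | Reliable comm.      | Replication and caching | Reliable comm.    | Striping                | Reliable comm.    |
| Recovery          | Client-based        | Reintegration           | N/S               | Checkpoint & write logs | N/S               |
| Secure channels   | Existing mechanisms | Needham-Schroeder       | Needham-Schroeder | No pathnames            | Self-cert.        |
| Access control    | Many operations     | Directory operations    | UNIX based        | UNIX based              | NFS based         |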

13 The Hadoop Distributed File System (HDFS) It has many similarities with existing distributed file systems. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project. The project URL is http://hadoop.apache.org/core/

14 HDFS: Moving Computation is Cheaper than Moving Data A computation requested by an application is much more efficient if it is executed near the data it operates on. This minimizes network congestion and increases the overall throughput of the system. HDFS provides interfaces for applications to move themselves closer to where the data is located.
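As a rough illustration of this principle, a scheduler can prefer a worker that already stores the block it needs. The function and node names below are hypothetical and are not part of any HDFS interface.

```python
def pick_worker(block_replicas, idle_workers):
    """Prefer an idle worker that already stores the block (hypothetical helper)."""
    for worker in idle_workers:
        if worker in block_replicas:
            return worker, "local read"   # computation moves to the data
    # No idle worker holds the block, so the data must cross the network.
    return idle_workers[0], "remote read"


# The block lives on dn2 and dn3; dn1 and dn3 are currently idle.
choice = pick_worker(block_replicas=["dn2", "dn3"], idle_workers=["dn1", "dn3"])
```

In this example the task is assigned to `dn3`, which avoids shipping the block across the network.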

15 HDFS: NameNode and DataNodes-1 HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes.

16 HDFS: NameNode and DataNodes-2 Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. They also perform block creation, deletion, and replication upon instruction from the NameNode.
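The division of labor between the NameNode and the DataNodes can be sketched with a toy class that keeps only metadata. The class, its method names, the block ID format, and the round-robin placement are assumptions made for illustration; this is not the HDFS API.

```python
class NameNode:
    """Toy NameNode: keeps the namespace and the block-to-DataNode mapping."""

    def __init__(self, datanodes, block_size, replication):
        self.datanodes = list(datanodes)
        self.block_size = block_size
        self.replication = replication
        self.namespace = {}   # file name -> list of block IDs
        self.block_map = {}   # block ID -> DataNodes holding a replica
        self._next_block = 0

    def create_file(self, name, size):
        # A file is split into one or more blocks.
        num_blocks = max(1, -(-size // self.block_size))  # ceiling division
        block_ids = []
        for _ in range(num_blocks):
            block_id = f"blk_{self._next_block}"
            # Assumed round-robin placement; real placement is rack-aware.
            targets = [
                self.datanodes[(self._next_block + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            self.block_map[block_id] = targets
            block_ids.append(block_id)
            self._next_block += 1
        self.namespace[name] = block_ids
        return block_ids

    def rename(self, old_name, new_name):
        # A namespace operation: only metadata changes, no block moves.
        self.namespace[new_name] = self.namespace.pop(old_name)

    def locate(self, name):
        # Clients ask where the blocks are, then read from the DataNodes.
        return [(b, self.block_map[b]) for b in self.namespace[name]]


namenode = NameNode(["dn1", "dn2", "dn3"], block_size=64, replication=2)
namenode.create_file("/logs/a", size=150)
```

A client would call `namenode.locate("/logs/a")` to learn which DataNodes hold each block and then transfer the data directly to or from those DataNodes.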

17 HDFS: Replica Placement The placement of replicas is critical to HDFS reliability and performance. Optimizing replica placement distinguishes HDFS from most other distributed file systems. A rack-aware replica placement policy is a first step toward improving data reliability, availability, and network bandwidth utilization. A block is considered safely replicated when the minimum number of replicas of that data block has checked in with the NameNode.
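The slide does not spell out the placement policy, so the sketch below assumes one possible rack-aware rule for three replicas: the first on the writer's node, the other two on different nodes of another rack. It also shows the safe-replication check as a simple count of DataNode reports. All function names and the topology are hypothetical.

```python
def place_replicas(writer, topology):
    """Pick three nodes spanning two racks (assumed policy, for illustration)."""
    local_rack = next(rack for rack, nodes in topology.items() if writer in nodes)
    # Choose another rack that has at least two nodes for replicas two and three.
    remote_rack = next(
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    )
    second, third = topology[remote_rack][:2]
    return [writer, second, third]


def is_safely_replicated(block_id, reports, min_replicas):
    """A block is safe once enough DataNodes have reported holding it."""
    count = sum(1 for blocks in reports.values() if block_id in blocks)
    return count >= min_replicas


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = place_replicas("n1", topology)

# Block reports received by the NameNode so far: DataNode -> block IDs.
reports = {"n1": {"blk_0"}, "n3": {"blk_0"}, "n4": set()}
```

Spreading replicas over two racks means the block survives the loss of a whole rack, while keeping two replicas in one rack limits the traffic that has to cross rack boundaries.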

