The Zebra Striped Network Filesystem
Approach
Increase throughput and reliability by striping file data across multiple servers
Data from each client is formed into a single stream
–Data is striped in a manner similar to a log-structured file system
–Parity for each stripe is written in the style of RAID disk arrays
–Parity allows the system to continue functioning despite server failures
–Layered on top of Sprite
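The RAID-style parity mentioned above can be sketched with a simple XOR over the fragments of a stripe; if one server fails, its fragment is rebuilt from the survivors plus parity. This is a minimal illustration with hypothetical helper names, not Zebra's actual code.

```python
def xor_parity(fragments):
    """XOR equal-sized fragments together to produce the parity fragment."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, b in enumerate(frag):
            parity[i] ^= b
    return bytes(parity)

def reconstruct(surviving_fragments, parity):
    """Rebuild the single missing fragment from the survivors plus parity."""
    return xor_parity(surviving_fragments + [parity])

fragments = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_parity(fragments)
# Suppose the server holding fragment 1 fails; rebuild its fragment.
rebuilt = reconstruct([fragments[0], fragments[2]], parity)
assert rebuilt == fragments[1]
```

XOR parity tolerates exactly one lost fragment per stripe, which is why Zebra, like RAID level 5, can survive a single server failure.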
Per-File Striping
A collection of file data that spans the servers is called a stripe; the portion of a stripe stored on a single server is a stripe fragment
Per-file striping
–Each file is stored in its own set of stripes
–Parity is computed on a per-file basis
–Bad for small files
If a small file is striped, each disk has to access a little piece of the file
Small writes require 4 disk accesses, as in RAIDs
See Figure 3
–Consistency management must be dealt with
We are dealing with separate file servers, so some stripe fragments may be written successfully while others are not
In that case the parity will be inconsistent with the data
Appropriate protocols to protect against partial writes and incorrect parity would be needed (of course, analogous protocols are needed anyway in the log-structured approach)
Log-Structured Network Filesystem
LFS uses the logging approach at the interface between file server and disks
Zebra uses the logging approach at the interface between client and servers
Each Zebra client organizes new file data into an append-only log, which it stripes across the servers (Figure 4)
The client computes parity for the log, not for individual files
Each client creates its own log, so a single stripe contains data written by a single client
Issues:
–How to share files between client workstations?
–How is free space reclaimed?
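The log-based striping above can be sketched as follows: the client's append-only log is cut into fixed-size fragments, consecutive fragments form a stripe, and each stripe gets an XOR parity fragment. Names and sizes here are illustrative toy values (real Zebra fragments are 512 KB).

```python
FRAG_SIZE = 4              # toy fragment size; Zebra uses 512 KB fragments
DATA_FRAGS_PER_STRIPE = 3  # data fragments per stripe (one per storage server)

def make_stripes(log: bytes):
    """Cut a client's append-only log into stripes of data fragments + parity."""
    # Split the log into fixed-size fragments, zero-padding the last one.
    frags = [log[i:i + FRAG_SIZE].ljust(FRAG_SIZE, b"\0")
             for i in range(0, len(log), FRAG_SIZE)]
    stripes = []
    for i in range(0, len(frags), DATA_FRAGS_PER_STRIPE):
        data = frags[i:i + DATA_FRAGS_PER_STRIPE]
        # Parity fragment is the XOR of the stripe's data fragments.
        parity = bytearray(FRAG_SIZE)
        for frag in data:
            for j, b in enumerate(frag):
                parity[j] ^= b
        stripes.append((data, bytes(parity)))
    return stripes

stripes = make_stripes(b"hello world!")   # 12 bytes -> 3 fragments -> 1 stripe
assert len(stripes) == 1 and len(stripes[0][0]) == 3
```

Because parity is computed over the log rather than per file, small files are simply batched together into one stripe, avoiding the small-write problem of per-file striping.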
Architecture
Clients (many)
–Machines that run application programs
–Each client produces a separate log with parity
Storage servers (many)
–Store file data
File manager (one)
–Manages metadata -- the file and directory structure of the file system
–Metadata can also be stored in a logged manner to improve performance and eliminate a major potential point of failure
Stripe cleaner (one)
–Reclaims unused space on the storage servers
See Figure 5
Storage Server
Stores stripe fragments (512KB)
–Log-structured – fragments must not already exist, except for parity fragments, in which case the new copy replaces the old
Appends to an existing fragment
Retrieves all or part of a fragment
Deletes a fragment (invoked by the stripe cleaner)
Identifies fragments
–Most recent fragment written by a client (used for recovery)
File Manager
Stores all information about a file except its data (metadata)
–Protection information, block pointers that say where the data is stored, directories, symbolic links
–Carries out name lookup, maintaining consistency of client file caches
–Client requests block pointers from the file manager, then reads the data from the storage servers
File manager is implemented using a Sprite file server with a log-structured file system
Zebra file – one file in the file manager's file system; the data in that file is an array of block pointers that say where the actual data are stored
Sprite network file protocols are used with little modification – clients open, read, and cache Zebra metadata in the same way as regular Sprite files
Stripe Cleaner
New stripes are initially full of live data
Over time, blocks in a stripe become free – overwritten or deleted
Zebra doesn't modify a stripe in place; instead it writes a new copy of the block to a new stripe
The Zebra stripe cleaner runs as a user-level process and identifies stripes with large amounts of free space
–Reads the remaining live blocks
–Writes the live blocks to a new stripe by appending them to its client's log
System Operation
Contents of the log
–Disk blocks – raw data from files
–Deltas
Changes to blocks in a file
Used to communicate changes between clients, the file manager, and the stripe cleaner
E.g. a client puts a delta into its log when it writes a file block; the file manager reads the delta to update the metadata for that block
Deltas are stored in client logs
–Deltas created
When blocks are added to a file, deleted from a file, or overwritten (update deltas)
By the stripe cleaner when it copies live blocks out of stripes (cleaner deltas)
By the file manager to resolve races between stripe cleaning and file updates (reject deltas)
Contents of a Delta
File identifier – unique identification for the file
File version – incremented whenever a block in the file is written or deleted
Block number – identifies a particular block by its position in the file
Old block pointer – fragment identifier and offset of the block's old storage location. If the delta is for a new block, the old block pointer has a special null value
New block pointer – fragment identifier and offset of the block's new storage location. If the delta is for a block deletion, the new block pointer has a special null value
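The delta fields above can be sketched as a record type. Field names and the `(fragment id, offset)` pointer representation are illustrative choices, not Zebra's on-disk layout; `None` plays the role of the special null pointer value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A block pointer: (fragment identifier, byte offset within the fragment).
BlockPtr = Tuple[int, int]

@dataclass
class Delta:
    file_id: int                   # unique file identifier
    file_version: int              # incremented on every write/delete in the file
    block_number: int              # position of the block within the file
    old_ptr: Optional[BlockPtr]    # old location; None for a newly created block
    new_ptr: Optional[BlockPtr]    # new location; None for a block deletion

# A write of a brand-new block: no old location.
write_delta = Delta(file_id=7, file_version=1, block_number=0,
                    old_ptr=None, new_ptr=(42, 0))
# A later deletion of that block: no new location.
delete_delta = Delta(file_id=7, file_version=2, block_number=0,
                     old_ptr=(42, 0), new_ptr=None)
assert write_delta.old_ptr is None and delete_delta.new_ptr is None
```

Carrying both the old and the new pointer in every delta is what later lets the file manager detect races between cleaning and updates.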
Writing and Reading Files
New data is placed in the client's file cache
Data is written to the servers when:
–It reaches a threshold age
–The cache fills with dirty data
–An application issues an fsync system call to force data to disk
–The file manager requests that the data be written, in order to maintain consistency between file caches
When writing to disk
–Data is put into the log, formed into stripe fragments, and written to the storage servers
–For each file block written, the client puts a delta into its log
–The file manager harvests the deltas
Reading files – almost the same as in a non-striped filesystem
–Open and close via RPC to the file manager
–Reading – obtain block pointers from the file manager, then obtain file data from the storage servers
Stripe Cleaning
Compute how much live data is in each stripe
–Deltas are used for this – the stripe cleaner processes deltas from the client logs and keeps a running count of the utilization of each stripe
–The stripe cleaner appends all deltas pertaining to a stripe to a stripe status file
The stripe to be cleaned is chosen using a cost-benefit analysis (Rosenblum91)
To clean a stripe
–Identify the live blocks – use the stripe status file
–Copy them to a new stripe
The stripe cleaner copies live blocks to a new stripe using a kernel call
–Read one or more blocks from a storage server, append them to its client log, and write the new log contents to the storage servers
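The utilization bookkeeping above can be sketched as follows: each delta retires the block's old copy (its old stripe loses live bytes) and accounts for the new copy. For brevity this sketch picks the least-utilized stripe as the victim; the paper instead uses the LFS-style cost-benefit policy. Structures and names are hypothetical.

```python
def apply_deltas(utilization, deltas):
    """utilization: stripe id -> live bytes.
    Each delta is (old_stripe, new_stripe, block_size); None means no location
    (a new block has no old stripe, a deletion has no new stripe)."""
    for old_stripe, new_stripe, size in deltas:
        if old_stripe is not None:
            utilization[old_stripe] -= size   # the old copy is now dead
        if new_stripe is not None:
            utilization[new_stripe] = utilization.get(new_stripe, 0) + size

def pick_victim(utilization):
    """Toy policy: clean the stripe with the least live data.
    (Zebra uses LFS-style cost-benefit selection instead.)"""
    return min(utilization, key=utilization.get)

util = {}
apply_deltas(util, [(None, "s1", 4096),   # new block written into stripe s1
                    (None, "s2", 4096),   # new block written into stripe s2
                    ("s1", "s2", 4096)])  # s1's block overwritten into s2
assert pick_victim(util) == "s1"          # s1 now holds no live data
```

A stripe whose utilization reaches zero can simply be deleted without copying anything, which is the cheapest cleaning case.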
Conflicts between cleaning and file access
An application can modify or delete a file while the stripe cleaner is cleaning it
–A client could modify a block after the cleaner reads the old copy but before the cleaner rewrites the block
–The new data could be lost in favor of the rewritten copy of the old data
–In LFS, the cleaner locked files, but this produced "lock convoys" that adversely impacted performance
Optimistic approach
–No locking; the stripe cleaner copies the block and issues a cleaner delta
–If the block was updated during cleaning, an update delta will have been issued by the client that made the change
–The file manager makes sure that the final pointer for the block reflects the update delta, not the cleaner delta
–The file manager detects conflicts by comparing the old block pointer in each incoming delta with the block pointer stored in its metadata; if they differ, the block was simultaneously cleaned and updated
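The optimistic conflict check above can be sketched as follows: the file manager applies a cleaner delta only if the block's current pointer still matches the delta's old pointer; otherwise the block was updated mid-clean and the cleaner's copy must lose. Function and field names are illustrative.

```python
def process_cleaner_delta(metadata, key, old_ptr, new_ptr):
    """metadata: (file id, block number) -> current block pointer.
    Returns True if the cleaner delta was applied, False if it was rejected
    (in Zebra a rejection also produces a reject delta for the cleaner)."""
    if metadata.get(key) != old_ptr:
        return False          # block changed during cleaning: reject the delta
    metadata[key] = new_ptr   # no conflict: block now lives in the new stripe
    return True

meta = {(7, 0): ("frag42", 0)}
# A client overwrites the block while the cleaner is copying the old version.
meta[(7, 0)] = ("frag50", 8)
# The cleaner delta still names the old location, so it is rejected.
assert process_cleaner_delta(meta, (7, 0), ("frag42", 0), ("frag60", 0)) is False
assert meta[(7, 0)] == ("frag50", 8)   # the client's update wins
```

The comparison works precisely because every delta carries the block's old pointer: a stale old pointer is unambiguous evidence that someone else moved the block first.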