1 Properties of Layouts
Single failure correcting: no two units of the same stripe are mapped to the same disk
–Enables recovery from a single disk crash
Distributed parity: all disks have the same number of check units mapped to them
–Balances the accesses to check units during writes or when a failure has occurred
Distributed reconstruction: there is a constant K such that, for every pair of disks, K stripes have units mapped to both disks
–Ensures that accesses to surviving disks during on-line reconstruction are spread evenly
Large write optimization: each stripe contains a contiguous interval of the client's data, so a write of all k-1 data units can be processed without pre-reading the prior contents of any disk
Maximal parallelism: whenever a client requests a read of n contiguous data units, all n disks are accessed in parallel
Efficient mapping: the functions that map client addresses to array locations are efficiently computable, with low time and space requirements
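
To make the first three properties concrete, here is a minimal Python sketch, not from the original slides, that checks them for a candidate layout given as a mapping from (stripe, unit) pairs to disk numbers; the function name and the convention that the last unit of each stripe is the check unit are my own assumptions.

    from collections import Counter
    from itertools import combinations

    def check_layout(layout, num_disks, check_units_per_stripe=1):
        """layout: dict mapping (stripe, unit) -> disk number.
        The last `check_units_per_stripe` units of each stripe are treated as check units."""
        stripes = {}
        for (stripe, unit), disk in layout.items():
            stripes.setdefault(stripe, {})[unit] = disk

        # Single failure correcting: no two units of the same stripe share a disk.
        single_fc = all(len(set(units.values())) == len(units)
                        for units in stripes.values())

        # Distributed parity: every disk holds the same number of check units.
        check_counts = Counter()
        for units in stripes.values():
            k = len(units)
            for unit, disk in units.items():
                if unit >= k - check_units_per_stripe:
                    check_counts[disk] += 1
        distributed_parity = len({check_counts[d] for d in range(num_disks)}) == 1

        # Distributed reconstruction: every pair of disks shares units of the
        # same number K of stripes.
        pair_counts = Counter()
        for units in stripes.values():
            for pair in combinations(sorted(set(units.values())), 2):
                pair_counts[pair] += 1
        distributed_recon = len({pair_counts[p]
                                 for p in combinations(range(num_disks), 2)}) == 1

        return single_fc, distributed_parity, distributed_recon

    # Example: a 4-disk layout with rotated check units passes all three checks.
    layout = {(s, u): (s + u) % 4 for s in range(4) for u in range(4)}
    print(check_layout(layout, num_disks=4))   # (True, True, True)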

2 Disk Caching Disk
Disk caching disk (DCD)
–Uses inexpensive disk space as a cache
RAID provides high throughput through parallelism, but it doesn't reduce access latency
Caching is used to reduce latency
–DRAM is used for reads
–Non-volatile RAM can be used for writes, but buffer sizes are usually small because NVRAM is expensive
LFS can be used to optimize write performance

3 Drawbacks of LFS
Limited implementation
–Sprite LFS, BSD-LFS, RAID systems (e.g., HP)
–Extensive file system rewrite required
–Sensitivity to disk capacity utilization
–Read performance suffers when read and write access patterns differ significantly, due to dispersal of logically nearby blocks

4 Bursty I/O
Large difference between average and maximum I/O rates
Large I/O peaks can overwhelm DRAM caches
A large DRAM cache or a large RAID is needed to rapidly process the large number of I/O requests in a bursty workload

5 DCD
A disk is used to cache logged write data
A small DRAM buffer captures background write requests
When a large write burst occurs, the data blocks in the DRAM buffer are written to the cache disk in a single large transfer
The cache disk appears as an extension of the DRAM buffer
RAID systems are analogous to interleaved memory; a DCD system is analogous to a multilevel CPU cache
Exploits the temporal locality of I/O transfers and the LFS idea of large log writes to minimize seek time and rotational latency
In effect, this provides a fast "cache" disk

6 DCD Architecture
DCD can be implemented using a separate cache disk, or the cache disk can be a logical partition in the data disk (Figure 1)
Data organization on the data disk is a conventional file system
DCD's RAM buffer (Figure 2) contains
–Two data caches
–A physical block address (PBA) to logical block address (LBA) mapping table
–A destaging buffer
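
A minimal structural sketch of that RAM buffer in Python; the field names are illustrative, not taken from the DCD paper.

    from dataclasses import dataclass, field

    @dataclass
    class DCDRamBuffer:
        """Illustrative layout of the DCD RAM buffer described above."""
        data_caches: list = field(default_factory=lambda: [{}, {}])   # two data caches: LBA -> block data
        active: int = 0                                               # index of the currently active data cache
        pba_to_lba: dict = field(default_factory=dict)                # PBA-to-LBA mapping table
        destaging_buffer: list = field(default_factory=list)          # blocks on their way back to the data disk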

7 Writing
Small writes go directly to a data cache in the RAM buffer
–Only one data cache is active at a given time
Large requests are sent directly to the data disk
If the active data cache is full:
–The controller makes the other data cache active
–The controller writes the entire contents of the previously active data cache to the cache disk in one large log-format write
–When the log write finishes, that data cache is available again
Data is written to the cache disk whenever the cache disk is available (data does not wait until a data cache is full)
Data reaches the cache disk within tens to hundreds of milliseconds
–The longest time data has to stay in RAM is the time required to write a full log
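
A sketch of this write path in Python; the size threshold, cache capacity, and the data_disk/cache_disk interfaces are illustrative assumptions, not values from the paper.

    LARGE_WRITE_THRESHOLD = 64 * 1024    # assumed cutoff for "large" requests, in bytes
    DATA_CACHE_CAPACITY = 128            # assumed capacity of each data cache, in blocks

    data_caches = [{}, {}]               # two data caches: LBA -> block data
    active = 0                           # only one data cache is active at a time

    def handle_write(lba, data, data_disk, cache_disk):
        global active
        if len(data) >= LARGE_WRITE_THRESHOLD:
            data_disk.write(lba, data)                     # large requests bypass the cache disk
            return
        if len(data_caches[active]) >= DATA_CACHE_CAPACITY:
            previous, active = active, 1 - active          # make the other data cache active
            cache_disk.append_log(data_caches[previous])   # one large log-format write
            data_caches[previous] = {}                     # cache is available again once the log write finishes
        data_caches[active][lba] = data                    # small writes go to the active data cache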

8 Reading
A read first searches the RAM buffer and the cache disk
Simulation results: 99% of read requests go to the data disk
The DCD is downstream from a large read cache, which captures most requests for newly written data

9 Cache Data Organization
The host sends a request to the disk system to access a disk block
–It provides an LBA to indicate the block's position on the data disk
–The DCD may cache the data on the cache disk at a different address, the PBA
A hash table indexed by LBA maps cached blocks to their PBAs
–Entries are also kept in a linked list, ordered by their PBA values
–A free-entry list is also maintained to keep track of empty blocks that can be used
The CLP (current log position) indicates the end of the last log
–The CLP is like a stack pointer: a new log is pushed onto the cache disk by writing the log to the disk starting from the CLP and appending the corresponding PBA-LBA mapping entries for the blocks in the log to the end of the mapping list
–When the log write finishes, the cache-disk head is positioned near the track indicated by the CLP
(Figure 3)
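
A sketch of this bookkeeping in Python; using a dict for the hash table and a plain list for the PBA-ordered entries is a simplification of mine, not the paper's data layout.

    class CacheDiskMap:
        """Tracks which data-disk blocks (LBAs) currently live on the cache disk (PBAs)."""

        def __init__(self):
            self.lba_index = {}     # hash table indexed by LBA -> PBA
            self.pba_list = []      # (PBA, LBA) mapping entries, kept in PBA order
            self.free_entries = []  # entries freed by destaging, available for reuse
            self.clp = 0            # current log position: end of the last log on the cache disk

        def push_log(self, lbas):
            """Push a new log onto the cache disk starting at the CLP."""
            for lba in lbas:
                pba = self.clp
                self.clp += 1                       # the log grows sequentially from the CLP
                self.lba_index[lba] = pba
                self.pba_list.append((pba, lba))    # appended to the end of the mapping list

        def lookup(self, lba):
            return self.lba_index.get(lba)          # None means the block is not on the cache disk

        def invalidate(self, lba):
            """Called by destaging once the block is back on the data disk."""
            pba = self.lba_index.pop(lba, None)
            if pba is not None:
                self.pba_list.remove((pba, lba))
                self.free_entries.append(pba)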

10 Destaging
Reads a large block of data from the cache disk into the destaging buffer
Reorders the blocks according to their LBA numbers to reduce seek overhead
Writes the blocks to their original locations on the data disk
Once a data block is written, the destaging process invalidates its entry in the PBA-LBA mapping table to indicate that the data is no longer on the cache disk
This occurs during idle time while the cache disk is relatively empty; destaging priority increases as the cache disk fills
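
A sketch of the destaging pass in Python; the read/write/invalidate callables stand in for the cache disk, the data disk, and the PBA-LBA table, and are assumptions for illustration.

    def destage(entries, read_from_cache_disk, write_to_data_disk, invalidate):
        """entries: (PBA, LBA) pairs for logged blocks currently on the cache disk."""
        # Read a large chunk of logged data from the cache disk into the destaging buffer.
        destaging_buffer = [(lba, read_from_cache_disk(pba)) for pba, lba in entries]

        # Reorder the blocks by LBA so the writes to the data disk are mostly
        # sequential, reducing seek overhead.
        destaging_buffer.sort(key=lambda item: item[0])

        for lba, data in destaging_buffer:
            write_to_data_disk(lba, data)   # write the block back to its original location
            invalidate(lba)                 # drop its PBA-LBA entry: the block is no longer cached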

11 Enhancing DCD with an NVRAM Cache
The NVRAM buffer uses an LRU cache and a staging buffer instead of the double data caches
–The staging buffer holds 64 KB to 256 KB
–The PBA-LBA mapping table is also kept in NVRAM
–The destaging buffer can be ordinary RAM, since non-volatile copies of the data are in NVRAM
The controller does not flush the LRU cache's contents to the cache disk
–Modified data is kept in the LRU cache to capture write-request locality
When a write request arrives and the LRU cache is full, the DCD controller copies LRU blocks into the staging buffer until the buffer is full or the LRU cache is empty
–Space in the LRU cache can be released immediately because the staging buffer is NVRAM
–The write request can then be satisfied by allocating a free block in the cache and copying the data into it
Data is written from the staging buffer to the cache disk in one large log-format write
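
A sketch of the NVRAM write path in Python, with an OrderedDict standing in for the LRU cache; the capacities and the cache_disk.append_log interface are illustrative assumptions.

    from collections import OrderedDict

    LRU_CAPACITY = 1024       # assumed size of the NVRAM LRU cache, in blocks
    STAGING_CAPACITY = 32     # assumed size of the staging buffer, in blocks (64 KB-256 KB)

    lru_cache = OrderedDict() # LBA -> data, least recently used first
    staging_buffer = []       # (LBA, data) pairs waiting to be written as one log

    def nvram_write(lba, data, cache_disk):
        if len(lru_cache) >= LRU_CAPACITY and lba not in lru_cache:
            # Copy LRU blocks into the staging buffer until it is full or the
            # LRU cache is empty; their space can be released immediately
            # because the staging buffer is also NVRAM.
            while lru_cache and len(staging_buffer) < STAGING_CAPACITY:
                old_lba, old_data = lru_cache.popitem(last=False)
                staging_buffer.append((old_lba, old_data))
            # The staged blocks go to the cache disk as one large log write.
            cache_disk.append_log(list(staging_buffer))
            staging_buffer.clear()
        # Satisfy the write by allocating a free block in the cache and copying
        # the data; modified data stays in the LRU cache to capture locality.
        lru_cache[lba] = data
        lru_cache.move_to_end(lba)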

12 Normal DCD vs. Traditional Disk
(Figure 5) Immediate-report mode
Physical DCD
–Simulates two physical drives at the same time
Logical DCD
–Two disk partitions on a single drive simulate two logical drives
DCD's RAM buffer: 512 KB
Baseline system
–RAM cache: 4 MB
Workload: HP traces containing I/O requests made by different HP-UX systems
–Workload traces are overlaid

13 Enhanced DCD with NVRAM
Compared the enhanced DCD with an LRU-cached baseline system
RAM buffer varied from 256 KB to 4 MB
(Figure 7)

14 Serverless Network File System
Workstations cooperate as peers to provide all file system services
Any machine can assume the responsibilities of a failed component
A fast LAN can be used as an I/O backplane
Prototype file system: xFS
Uses cooperative caching
–Harvests portions of client memory as a large global file cache
–Uses ideas from non-uniform memory access (NUMA) shared-memory architectures (e.g., DASH) to provide scalable cache consistency
Dynamically distributes data storage across server disks via software RAID, using log-based striping
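
A minimal sketch of log-based striping in Python: a log segment is split into per-disk fragments plus an XOR parity fragment (the 3+1 stripe-group size and the function name are illustrative, not xFS's real parameters).

    def stripe_log_segment(segment: bytes, num_data_disks: int):
        """Split a log segment into fragments and compute one parity fragment."""
        frag_len = -(-len(segment) // num_data_disks)    # ceiling division
        fragments = [segment[i * frag_len:(i + 1) * frag_len].ljust(frag_len, b"\0")
                     for i in range(num_data_disks)]
        parity = bytearray(frag_len)
        for frag in fragments:
            for i, byte in enumerate(frag):
                parity[i] ^= byte                        # any one lost fragment can be rebuilt by XOR
        return fragments, bytes(parity)

    # Example: a 3+1 stripe group over a small log segment.
    frags, parity = stripe_log_segment(b"client data appended to the log" * 4, 3)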

15 Multiprocessor / File System Cache Consistency
Provide a uniform view of the system
–Track where blocks are cached
–Invalidate stale cached copies
–A common multiprocessor strategy is to divide control of the system's memory between processors
–Each processor manages the cache consistency state for its own physical memory locations
In xFS, control can be located anywhere in the system and can be dynamically migrated during execution
xFS uses a token-based cache consistency scheme that manages consistency on a per-block basis (rather than per-file, as in Sprite or AFS)

16 xFS’s Consistency Scheme Before client modifies block, must acquire write ownership of the block –Client sends message to block’s manager –Manager invalidates other cached copies of block, updates cache consistency information to indicate new owner, replies to client, giving permission to write –Once client owns block, it may write repeatedly without having to ask manager for ownership each time –When another client reads or writes block, manager revokes ownership, forces client to stop writing to block, flushes any changes to stable storage and forwards data to new client Scheme supports cooperative caching –Read requests can be forwarded to clients with valid cached copies of data

17 How xFS Distributes Key Data Structures
Manager map
–A globally replicated manager map is used to locate a file's manager from the file's index number
–Bits of the file's index number are extracted and used to index into the manager map
–The map is a table that indicates which physical machines manage which groups of index numbers
–A file's manager tracks its disk-location metadata and cache consistency state
–The cache consistency state lists the clients caching each block, or the client that has write ownership of it
Imap
–Each file's index number has an entry in the imap that points to that file's disk metadata in the log
–The imap is distributed among the managers according to the manager map, so each manager handles the imap entries and cache consistency state of the same files
–A file's imap entry contains the log address of the file's index node
–The index node contains the disk addresses of the file's data blocks
–For large files, the index node can contain log addresses of indirect and double-indirect blocks
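
A sketch of the resulting lookup chain in Python; the bit width, table sizes, and dict-based stand-ins for the imap and index nodes are illustrative assumptions, not xFS's real on-disk formats.

    MANAGER_MAP_BITS = 4            # assumed: low-order index-number bits select a map entry
    manager_map = [n % 8 for n in range(2 ** MANAGER_MAP_BITS)]   # globally replicated: entry -> machine id

    def manager_of(index_number):
        """Locate a file's manager from its index number."""
        entry = index_number & ((1 << MANAGER_MAP_BITS) - 1)      # extract the index-number bits
        return manager_map[entry]

    # Per-manager state: each manager holds the imap entries (and the cache
    # consistency state) for the same group of index numbers.
    imap = {}           # index number -> log address of the file's index node
    index_nodes = {}    # log address -> {"direct": [block addresses], "indirect": log address}

    def data_block_address(index_number, block_number):
        """Follow imap -> index node -> data block address in the log."""
        inode_addr = imap[index_number]           # the file's imap entry
        inode = index_nodes[inode_addr]           # the file's index node, read from the log
        if block_number < len(inode["direct"]):
            return inode["direct"][block_number]  # disk address of a direct data block
        return inode["indirect"]                  # large files: log address of an indirect block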

