Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS 443 Advanced OS Fabián E. Bustamante, Spring 2005 Venti : A new approach to archival storage Sean Quinlan & Sean Dorward Bell Labs, Lucent Technology.

Similar presentations

Presentation on theme: "CS 443 Advanced OS Fabián E. Bustamante, Spring 2005 Venti : A new approach to archival storage Sean Quinlan & Sean Dorward Bell Labs, Lucent Technology."— Presentation transcript:

1 CS 443 Advanced OS Fabián E. Bustamante, Spring 2005 Venti : A new approach to archival storage Sean Quinlan & Sean Dorward Bell Labs, Lucent Technology FAST 2002 Presented by: Fabián E. Bustamante

2 2 Archival Storage Abundant storage - storing data for long time, even forever Archival system – write-once policy – changes how we see storage Prevalent form of archival storage - tape backup –Central service for a number of clients –Restoring data tedious and error prone (requires sys admin or privileged software) –Infrequent, so problems may go undetected for long –Tradeoff bet/ performance of backup & restore (incremental backups are more efficient to generate, but not self-contained) Snapshots (Plan 8, Andrew FS, …) –A consistent read-only view of FS at some point in past –Maintains file system permissions –Can be accessed with standard tools – ls, cat, cp, grep, diff –Avoids tradeoff between full vs. incremental backup Looks like full backup Implementation resembles incremental backup – share blocks Only a small number of versions are kept

3 3 Venti GOAL: “To provide a write-once archival repository that can be shared by multiple client machines and applications” Block level network storage system –Actually a backend storage for client apps –Block level i/f places few restrictions on data structures/format Blocks addressed by hash of their contents –Uses SHA-1 algorithm –SHA-1 output is 160 bit (20 byte) fingerprint of data block Write once policy –Once written cannot be deleted Multiple writes of same data coalesced –Data sharing – increases effective storage capacity –Makes write operation idempotent

4 4 Venti Multiple clients can share a Venti server –Hash fn gives an universal namespace Inherent integrity checking of data –Fingerprint computation on retrieval Caching is simplified Uses magnetic disks as storage technology –Access time comparable to non-archival data Hashing function – sha1 is more than enough for now Exabyte of storage (10 18 bytes) in 8 Kbytes blocks (~10 14 blocks - n) with sha1 (b = 160 bits) – p < 10 -20 Number of pairs of blocks Probability that a given pair will collide

5 5 Applications - general Venti as a building block to construct storage apps Data divided into blocks App needs fingerprint for retrieval One approach - fingerprints packed into pointer blocks, also written to server Repeated recursively to get single fingerprint - root of tree

6 6 Applications - general Tree cannot be modified, but Unchanged sections of tree reused More complex data structures –Mixing data & fingerprints in a block e.g. structure for storing file system –3 types of blocks Directory – file metadata + root fingerprint Pointer Data

7 7 Example app: Vac Similar to zip & tar – store collection of files & directories as single object Tree of blocks formed for selected files vac archive file - 45 bytes long –20 bytes for root fingerprint –25 bytes fixed header string –any amount of data compressed down to 45 bytes Unvac – to restore file from archive Duplicate copies of file coalesced on server –Multiple users vac same data – only 1 copy stored –vac on changed contents

8 8 Example app: Physical level backup Copy the raw disk blocks to Venti –No need to walk file hierarchy –Higher throughput Duplicate blocks are coalesced Free block – null in pointer tree User sees full backup of device Storage space advs of incremental backup retained Random access possible –Directly mounting a backup file system image (w/ OS support) –Enables lazy (on demand) restore

9 9 Example app - Plan 9 FS Previously –Magnetic disk + write-once optical jukebox –Cache for faster access & accumulated changes bet/ snapshots –Cache smaller than active file system –Cache misses are painful Plan 9 FS on top of Venti –Venti as storage device (instead of jukebox) –Equalizes access to both active & archival storage –Cache can also be smaller

10 10 Implementation Append-only log of data blocks –RAID array Separate index maps fingerprints to log location –Can be regenerated from data log Index Cache FS Block Cache Data Index Venti Server Network Client

11 11 Venti’s data log Log divided into arenas for maintenance Append only section of data blocks (variable size) For integrity checking & data recovery Data compressed (LZ77) – encoding – if, algo. esize size after compression For fast index rebuild & error recovery Data log and directory grow in opposite directions When arena is filled, mark as sealed, fingerprint compute

12 12 Venti’s index No. of fingerprints >> no. of blocks on a server Index as disk-resident hash table Hashing function maps fingerprints to index buckets Going through index is main performance penalty –Caching (block and index) –Index striped across multiple disks – fingerprint location in index is random & index is too big to fit in memory –Write buffering (to increase number of concurrent accesses) Block cache - hit - index lookup & data log bypassed Index cache - hit - index lookup bypassed

13 13 Performance: computing environments Testbed –Dedicated dual 550Mhz Pentium 3, 2GB mem. –Access through a 100Mbps Ethernet –Data log stored on a 500GB MaxTronic IDE Raid 5 –Index stored on 8 Seagate Cheetah 18XL 9GB SCSI Micro-benchmark results Sequential reads Random reads Virgin writesDuplicates writes Uncached0. Index cache4.20.7-6.2 Block cache6.8--6.5 Raw raid14.81.012.4

14 14 Storage usage – file servers used 2 plan 9 file servers, bootes and emelie –Spanning 1990 to 2001 522 user accounts, 50-100 active at a time Numerous development projects hosted Several large data sets –Astronomical data, satellite imagery, multimedia files Next figures –Size of active file system –Space consumed on jukebox –Space when using Venti –Ratio – using Venti reduces the cost of retaining snapshots (so ratio is smaller)

15 15 Storage usage

16 16 Storage saving When stored on Venti, size of jukebox data reduced by 3 factors –Elimination of duplicate blocks –Elimination of block fragmentation –Compression of block contents % reduction of storageBootesEmelie Elimination of duplicates27.8%31.3% Elimination of fragments10.2%25.4% Data compression33.8%54.1% Total reduction59.7%76.5%

17 17 Reliability & recovery Tools for integrity checking & error recovery –Verifying structure of arena –Checking there is an index entry for every block in data log, vice versa –Rebuilding index from data log –Copying arena to removable media Data log on RAID 5 disk array –Protection against single drive failures Off-site mirrors for extra protection, if that’s not enough … Stored sealed arenas into removable media

18 18 Future Work Load balancing –Distribute Venti across multiple machines –Replicate server –Use of proxy server to hide it from client Better access control –Currently just authentication to server –Single root fingerprint gives access to entire file tree Use of variable sized blocks as in LBFS

19 19 Conclusion Addressing block by SHA-1 hash of contents – good fit for archival storage Magnetic disks as storage technology –Large capacity at low price – online storage –Random access –Performance comparable to non-archival data Write once model –Reduces accidental or malicious data loss –Simplifies administration –Simplifies caching –Allows sharing of data

Download ppt "CS 443 Advanced OS Fabián E. Bustamante, Spring 2005 Venti : A new approach to archival storage Sean Quinlan & Sean Dorward Bell Labs, Lucent Technology."

Similar presentations

Ads by Google