
1 Using Deduplicating Storage for Efficient Disk Image Deployment
Xing Lin, Mike Hibler, Eric Eide, Robert Ricci (University of Utah)

2 this talk
VF: utilizing a deduplicating storage system within a fast disk-imaging system
results:
– 3× decrease in storage
– negligible run-time overhead
techniques:
– "don't be the bottleneck"
– Aligned Fixed-size Chunking

3 disk image server
images are loaded on demand
be fast! deliver data as fast as clients can receive it

4 disk image server

5 Utah Emulab: 1,000+ disk images, 21 TB total
Amazon EC2: 37,000+ public AMIs
images must be both fast to deploy and compact to store

6 deduplication
[diagram: image data stored in a dedup. storage system]

7 deduplication
each image is stored as a small "recipe" of block fingerprints:
– Image 1: fingerprint 1; fingerprint 2; fingerprint 3; …
– Image 2: fingerprint 1; fingerprint 2; fingerprint 19; …
blocks shared between images (here, fingerprints 1 and 2) are stored only once
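To make the "recipe" idea concrete, here is a minimal sketch of a fingerprint-indexed block store (a toy stand-in for Venti; all names here are hypothetical):

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: blocks are indexed by their SHA-1
    fingerprint, so a block shared by many images is stored only once."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block data

    def put(self, block: bytes) -> str:
        fp = hashlib.sha1(block).hexdigest()
        self.blocks.setdefault(fp, block)  # no-op if already stored
        return fp

    def get(self, fp: str) -> bytes:
        return self.blocks[fp]

def store_image(store, blocks):
    """An image's "recipe" is just the ordered list of its fingerprints."""
    return [store.put(b) for b in blocks]

def reconstruct_image(store, recipe):
    return b"".join(store.get(fp) for fp in recipe)
```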

8 dedup. for disk images
images are often derived from other images:
– users add packages to testbed "base" images
– users' work-in-progress snapshots
– …
a lot of duplicated data across images!

9 disk image server

10 disk image server + dedup. disk image storage
problem: dedup. storage can be slow
our contribution: add dedup. without slowing the system down

11 why is frisbee fast?
– compression: smaller files, lower network bandwidth
– use filesystem info: fewer disk writes, sequential disk writes
– pipeline (disk read → net xfer → decompress → disk write): keep the receiving disk busy, keep the pipeline filled
– independent "chunks": new clients can join at any time
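The pipelining idea can be sketched as one thread per stage connected by bounded queues; this is only an illustration of the structure, not Frisbee's actual implementation:

```python
import queue, threading, zlib

def deploy_pipeline(compressed_chunks, write_block):
    """Toy 3-stage pipeline: receive -> decompress -> disk write.
    Bounded queues let every stage stay busy without unbounded buffering."""
    q_rx = queue.Queue(maxsize=8)    # received, still-compressed chunks
    q_dec = queue.Queue(maxsize=8)   # decompressed data ready to write

    def receive():                   # stand-in for the network receiver
        for c in compressed_chunks:
            q_rx.put(c)
        q_rx.put(None)               # end-of-stream marker

    def decompress():
        while (c := q_rx.get()) is not None:
            q_dec.put(zlib.decompress(c))
        q_dec.put(None)

    def write():                     # keep the receiving disk busy
        while (b := q_dec.get()) is not None:
            write_block(b)

    threads = [threading.Thread(target=f) for f in (receive, decompress, write)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```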

12 from frisbee to VF
– Frisbee: disk images stored as files
– VF: disk image data stored in Venti [Quinlan & Dorward, FAST '02; Rhea et al., ATC '08], reformed into chunks by Chunkmaker

13 image corpus
430 Linux images from Utah Emulab:
– 76 "standard" images
– 354 user-created images
based on RedHat, Fedora, CentOS, & Ubuntu

14 addressing the challenges
– compression (next)
– use filesystem info
– pipeline
– independent "chunks"

15 compression
[diagram: capture partition → image server → store/retrieve ↔ Venti]

16 compression
option 1: store compressed disk images in Venti
poor deduplication (1.11×): compressing the whole image lets small changes alter the entire compressed stream that follows them, so blocks rarely match across images

17 compression
[diagram repeated: capture partition → image server → store/retrieve ↔ Venti]

18 compression
option 2: store uncompressed disk data in Venti, compress at retrieval time
too slow to keep the pipeline fed:
– compress: 30.29 MB/s
– disk write: 71.07 MB/s

19 compression
[diagram repeated: capture partition → image server → store/retrieve ↔ Venti]

20 compression
option 3 (ours): store compressed dedup blocks in Venti
– compressing each block individually preserves opportunities for dedup
– server retrieves & concatenates compressed blocks to form chunks
– cost: 6% more chunks vs. original Frisbee
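A sketch of this chosen design, per-block compression with server-side concatenation. Two assumed details (the paper is authoritative): each block is its own zlib stream, and blocks are fingerprinted after compression, which still deduplicates because identical blocks compress to identical bytes:

```python
import hashlib, zlib

def store_block(store, block):
    """Compress each dedup block individually, so identical blocks stay
    identical and deduplication opportunities are preserved."""
    comp = zlib.compress(block)
    fp = hashlib.sha1(comp).hexdigest()
    store.setdefault(fp, comp)
    return fp

def build_chunk(store, header, fingerprints):
    """Server side: no recompression; just concatenate the already-
    compressed blocks behind the chunk header."""
    return header + b"".join(store[fp] for fp in fingerprints)

def decompress_blocks(data):
    """Client side (after stripping the chunk header): decompress the
    back-to-back zlib streams in order."""
    out = []
    while data:
        d = zlib.decompressobj()
        out.append(d.decompress(data))
        data = d.unused_data     # whatever follows this stream
    return b"".join(out)
```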

21 addressing the challenges
– compression
– use filesystem info (next)
– pipeline
– independent "chunks"

22 use filesystem info
– exclude unallocated sectors from the image
– promote sequential disk writes
– process the "stream" of allocated sectors

23 make dedup blocks via "fixed-size chunking"
[diagram: the stream of allocated sectors 1 2 3 4 5 6 7 8 is cut into fixed-size dedup blocks [1 2 3 4] [5 6 7 8] and stored in Venti]

24 sector allocations & frees move the dedup block boundaries
fixed-size chunking over the sector stream leads to poor deduplication across disk images
[diagram: inserting sectors a b c shifts the stream, so the resulting blocks [a b c 3] [4 5 6 7] … no longer match [1 2 3 4] [5 6 7 8]]

25 aligned fixed-size chunking
– block boundaries based on absolute sector offsets
– "pad" partially filled blocks with zero sectors
[diagram: with aligned boundaries and zero padding (blocks like [1 2 z z] [z 3 4 5] [6 7 8 9] [z z a b] [c z z z]), the unchanged regions of the two streams fall into identical blocks and deduplicate]
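A minimal sketch of aligned fixed-size chunking; it assumes allocated sectors arrive as (sector number, 512-byte data) pairs and uses a 32K dedup block, both illustrative choices rather than VF's exact interface:

```python
SECTOR = 512         # bytes per sector (assumed)
BLOCK_SECTORS = 64   # 64 x 512 B = 32K dedup blocks

def aligned_chunks(allocated_sectors):
    """Block boundaries are fixed multiples of BLOCK_SECTORS in *absolute*
    sector offsets, and missing sectors are padded with zeros. Because the
    boundaries never depend on which sectors happen to be allocated, an
    unchanged region always lands in the same block and deduplicates
    across images. Yields (block_number, block_bytes)."""
    blocks = {}
    for sector, data in allocated_sectors:
        blk = sector // BLOCK_SECTORS
        buf = blocks.setdefault(blk, bytearray(BLOCK_SECTORS * SECTOR))
        off = (sector % BLOCK_SECTORS) * SECTOR
        buf[off:off + SECTOR] = data
    for blk in sorted(blocks):   # sketch only: buffers all blocks in memory
        yield blk, bytes(blocks[blk])
```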

26 how big should dedup blocks be?
small (say, 4K):
– better dedup: more likely to match
– slower: more accesses to Venti
– lower compression ratio: less data per block
– more metadata per image
big (say, 48K):
– lower dedup: less likely to match
– faster: fewer accesses to Venti
– higher compression ratio: more data per block
– less metadata per image

27 addressing the challenges
– compression
– use filesystem info
– pipeline (next)
– independent "chunks"

28 pipeline
speed through parallelism: Venti read → net xfer → decompress → disk write
choose the maximum storage benefit that doesn't slow down the pipeline, i.e., the smallest dedup block size

29 [chart: pipeline throughput at candidate dedup block sizes; the three smaller sizes slow the pipeline (✖ ✖ ✖), while the size chosen below, 32K, keeps up (✔)]

30 image corpus @ 32K
(compressed) image data: 239.89 GB
(compressed) data in Venti: 73.62 GB
deduplication ratio: 3.26
image metadata: 1.49 GB
total space savings versus Frisbee: 67.8%
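As a quick consistency check on these figures (using the 233 GB baseline from the storage-savings graph later in the talk):

\[
\frac{239.89\,\mathrm{GB}}{73.62\,\mathrm{GB}} \approx 3.26,
\qquad
1 - \frac{73.62 + 1.49}{233} \approx 0.678 .
\]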

31 addressing the challenges
– compression
– use filesystem info
– pipeline
– independent "chunks" (next)

32 independent chunks
chunk: Frisbee's network protocol unit
– contains multiple groups of sectors
– client requests chunks until it has them all
[diagram: disk image server with Venti, Chunkmaker, and precomputed metadata (chunk headers; fingerprints)]

33 independent chunks
to serve a client's chunk request, the server:
– finds the precomputed chunk metadata: the chunk header and the dedup block fingerprints
– retrieves the dedup blocks from Venti
– concatenates the blocks with the header and transmits the chunk to the client
– caches the constructed chunk
(a sketch of this request path follows)
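A sketch of that request path (hypothetical interfaces, not VF's actual API):

```python
def serve_chunk(chunk_id, metadata, venti_get, cache):
    """metadata:  chunk_id -> (header_bytes, [fingerprint, ...]),
                  precomputed by the Chunkmaker
       venti_get: fingerprint -> compressed dedup block
       cache:     dict shared across requests, since many clients
                  ask for the same chunks during a deployment"""
    if chunk_id in cache:
        return cache[chunk_id]
    header, fingerprints = metadata[chunk_id]
    chunk = header + b"".join(venti_get(fp) for fp in fingerprints)
    cache[chunk_id] = chunk
    return chunk
```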

34 evaluation
– storage savings
– synchronized deployment
– staggered deployment

35 storage savings
load our image corpus into Venti:
– 430 Linux images
– load from oldest to newest
track storage as images are added:
– compressed, dedup'ed data in Venti
– storage required by "baseline Frisbee"

36 [graph: cumulative storage as images are added, oldest to newest; baseline Frisbee grows to 233 GB while dedup'ed data in Venti grows to 75 GB, a 3× reduction]

37 disk image deployment setup
– 1 Gbps switched LAN
– single server, running "baseline Frisbee" or VF, configured to distribute data at 500 Mbps
– up to 20 client machines: Dell PowerEdge R710s (see paper for specs)

38 synchronized deployment
deploy a single disk image to:
– 1 client
– 8 clients that start at the same time
– 16 clients that start at the same time
measure time to deploy over 10 trials (image: 1.4 GB uncompressed data)

39 [graph: synchronized deployment times, baseline Frisbee vs. VF; 2% increase in run time]

40 staggered deployment
deploy a single disk image to:
– 20 clients
– organized into 5 groups
– groups start at 5-second intervals
measure time to deploy over 10 trials

41 [graph: staggered deployment times, baseline Frisbee vs. VF; 3% increase in run time]

42 conclusions
VF combines deduplicating storage with a high-performance disk distribution system
– 3× reduction in required storage
– 2–3% run-time overhead
"don't be the bottleneck": careful design
– obtain dedup benefit: AFC (aligned fixed-size chunking)
– preserve existing optimizations

