Understanding the Benefits and Costs of Deduplication. Mahmoud Abaza and Joel Gibson, School of Computing and Information Systems, Athabasca University.


1 Understanding the Benefits and Costs of Deduplication. Mahmoud Abaza and Joel Gibson, School of Computing and Information Systems, Athabasca University, mahmouda@athabascau.ca

2 Questions to ask… What is deduplication? Why is it important to understand? Do all vendors implement deduplication the same way? How much reduction in physical disk storage can be expected, if any?

3 ...questions to ask (continued): What are the advantages and disadvantages, including risks? Is it worth it for my IT budget? Is deduplication strictly a business tool, or could it also benefit home users?

4 Types of deduplication: File-based (example: Microsoft’s Single Instance Storage, SIS), Block-based (a digital signature, or hash, is computed for each block), Delta encoding (storing one complete file plus only the differences between it and a second file)
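
The difference between file-based and block-based deduplication can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how SIS or any vendor product works: SHA-256 stands in for the per-file or per-block signature, blocks are a fixed 4 KiB (many products use variable-size, content-defined chunks instead), and only the pool of unique content is modelled (a real system also keeps per-file references so files can be rebuilt). Delta encoding is not shown.

```python
import hashlib


def file_level_unique(files: dict[str, bytes]) -> dict[str, bytes]:
    """File-level (single-instance) dedup: keep one copy of each distinct file content."""
    store: dict[str, bytes] = {}
    for data in files.values():
        store.setdefault(hashlib.sha256(data).hexdigest(), data)
    return store


def block_level_unique(files: dict[str, bytes], block_size: int = 4096) -> dict[str, bytes]:
    """Block-level dedup: split files into fixed-size blocks, keep one copy per block hash."""
    store: dict[str, bytes] = {}
    for data in files.values():
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            store.setdefault(hashlib.sha256(block).hexdigest(), block)
    return store


def stored_bytes(store: dict[str, bytes]) -> int:
    """Physical bytes needed to hold the unique contents of a store."""
    return sum(len(content) for content in store.values())


# Example: an exact duplicate of one file, plus a third file sharing one block with it.
BLOCK = 4096
file_a = b"A" * BLOCK + b"B" * BLOCK
file_b = b"A" * BLOCK + b"C" * BLOCK
files = {"a.bin": file_a, "a_copy.bin": file_a, "b.bin": file_b}

print("logical bytes:     ", sum(len(d) for d in files.values()))        # 24576
print("file-level stored: ", stored_bytes(file_level_unique(files)))      # 16384
print("block-level stored:", stored_bytes(block_level_unique(files, BLOCK)))  # 12288
```

File-level deduplication only catches the exact copy, while block-level deduplication also shares the common first block between the two different files, at the cost of hashing and indexing many more items.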

5 Where deduplication happens: Client-side deduplication (data is deduplicated on the client before it is copied to the storage array) Target-side deduplication (deduplication occurs on a backup set after it has been copied to the storage array)
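
Client-side deduplication also saves network traffic, because only chunks the target does not already hold have to be sent. The Python sketch below illustrates this with assumed names and parameters: a hypothetical `TargetStore` class stands in for the storage array, chunks are a fixed 4 KiB with SHA-256 fingerprints, and the byte count ignores the small cost of sending the fingerprints themselves.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for this sketch


def chunk_with_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks and pair each with its SHA-256 fingerprint."""
    pieces = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        pieces.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return pieces


class TargetStore:
    """Hypothetical storage target that keeps one copy of each chunk, keyed by hash."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        return {d for d in digests if d not in self.chunks}

    def put(self, digest: str, chunk: bytes) -> None:
        self.chunks.setdefault(digest, chunk)


def client_side_backup(data: bytes, target: TargetStore) -> int:
    """Dedupe on the client: ask which chunks are missing, send only those. Returns bytes sent."""
    pieces = chunk_with_hashes(data)
    needed = target.missing([digest for digest, _ in pieces])
    sent = 0
    for digest, chunk in pieces:
        if digest in needed:
            target.put(digest, chunk)
            needed.discard(digest)  # a chunk repeated within this backup is sent only once
            sent += len(chunk)
    return sent


def target_side_backup(data: bytes, target: TargetStore) -> int:
    """Send everything; the target dedupes after receipt. Returns bytes sent."""
    for digest, chunk in chunk_with_hashes(data):
        target.put(digest, chunk)
    return len(data)


# Second backup = first backup plus one new chunk.
first_backup = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
second_backup = first_backup + bytes([9]) * CHUNK_SIZE

cs_store = TargetStore()
client_sent = [client_side_backup(b, cs_store) for b in (first_backup, second_backup)]
ts_store = TargetStore()
target_sent = [target_side_backup(b, ts_store) for b in (first_backup, second_backup)]

print("client-side bytes sent:", client_sent)  # [16384, 4096]
print("target-side bytes sent:", target_sent)  # [16384, 20480]
```

Both approaches end up storing the same five unique chunks; the difference is how much data has to travel to the array before it is deduplicated.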

6 Target-side Deduplication Process: In-line processing (deduplication happens while data is being ingested into the storage system) Post-processing (the data is first written to disk and then checked for duplicate copies)

7 Inline Processing. Advantage: reduces the overall amount of disk I/O. Disadvantage: slower ingestion time.

8 Post-Processing. Advantage: multiple hosts and CPUs can be involved to make the process fast. Disadvantage: requires a large pool of storage to hold the data before it is deduplicated, plus heavy disk I/O.
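
The inline versus post-processing trade-off can be modelled by counting bytes rather than timing anything. The sketch below is hypothetical: a list of equal-sized chunks stands in for the incoming backup stream and SHA-256 for the duplicate check. It shows only the disk-I/O and peak-storage side; the slower ingest of inline processing (hashing on the write path) and the parallelism of post-processing are noted in comments, not measured.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class IoStats:
    bytes_written: int = 0
    bytes_read: int = 0
    peak_stored: int = 0


def inline_ingest(chunks: list[bytes]) -> tuple[dict[str, bytes], IoStats]:
    """Inline: hash each chunk as it arrives and write only chunks not seen before."""
    store: dict[str, bytes] = {}
    stats = IoStats()
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()  # extra CPU work on the ingest path
        if digest not in store:
            store[digest] = chunk
            stats.bytes_written += len(chunk)
    stats.peak_stored = stats.bytes_written  # never holds more than the unique data
    return store, stats


def post_process_ingest(chunks: list[bytes]) -> tuple[dict[str, bytes], IoStats]:
    """Post-processing: land everything on disk first, then read it back and drop duplicates."""
    stats = IoStats()
    staging = list(chunks)                         # full, un-deduplicated copy on disk
    stats.bytes_written = sum(len(c) for c in staging)
    stats.peak_stored = stats.bytes_written        # staging pool must hold all of it
    store: dict[str, bytes] = {}
    for chunk in staging:                          # this pass could be split across hosts/CPUs
        stats.bytes_read += len(chunk)
        store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return store, stats


incoming = [b"x" * 4096, b"y" * 4096, b"x" * 4096, b"x" * 4096]
inline_store, inline_stats = inline_ingest(incoming)
post_store, post_stats = post_process_ingest(incoming)
print("inline:      ", inline_stats)  # IoStats(bytes_written=8192, bytes_read=0, peak_stored=8192)
print("post-process:", post_stats)    # IoStats(bytes_written=16384, bytes_read=16384, peak_stored=16384)
```

Both paths end with the same two unique chunks; post-processing simply pays for it with a larger staging pool and an extra read pass.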

9 How much reduction in physical disk storage can be expected, if any? It depends on the type of data. Case studies: Data Domain LLC case study: TiVo was able to achieve “data compression rates of 30 to 1 consistently.” A study of SIS found that SIS, “for 4 weeks of full backups, achieves 87% of the savings of block-based” deduplication.
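
As a worked example of what a ratio like TiVo's means: a reduction ratio of R:1 leaves 1/R of the data on disk, so the fraction saved is 1 - 1/R. At 30:1 that is about 96.7%, and returns diminish quickly (10:1 already saves 90%). The SIS figure is relative: 87% of whatever block-based deduplication saved, not an 87% reduction in storage. The helper names below are illustrative only.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of physical storage saved at a deduplication ratio of `ratio`:1."""
    if ratio < 1:
        raise ValueError("a deduplication ratio cannot be below 1:1")
    return 1 - 1 / ratio


def ratio_from_sizes(logical: float, physical: float) -> float:
    """Ratio = data as seen by the backup (logical) / space actually stored (physical)."""
    return logical / physical


print(f"30:1 -> {savings_from_ratio(30):.1%} saved")  # 30:1 -> 96.7% saved
print(f"2:1  -> {savings_from_ratio(2):.1%} saved")   # 2:1  -> 50.0% saved
print(ratio_from_sizes(logical=30.0, physical=1.0))   # 30.0
```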

10 Experimental Results: A deduplication algorithm was run against real-world data on a personal workstation. We chose to back up a set of folders that contained mostly software downloads, music, photos, and videos, a real challenge considering these are typically already-compressed files.

11 Home-Based Deduplication Results
Run #1 (initial backup): new files added to backup: 15,935; total size of files: 98.8 GB; physical disk space used for backup: 85.5 GB; time to process: 03:13:47 (hh:mm:ss)
Run #2 (second backup): new files added to backup: 57; size of files: 105 MB; physical disk space used for backup: 83.7 MB; time to process: 00:01:49 (hh:mm:ss)
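
The two runs can be turned into savings percentages and deduplication ratios with simple arithmetic. This sketch only divides the reported numbers: it assumes "physical disk space used" is the space consumed by that run's files, and the slides do not say whether the tool also compressed data. Each run keeps its own unit, so no GB/MB conversion is involved.

```python
from dataclasses import dataclass


@dataclass
class BackupRun:
    name: str
    logical: float   # "Total size of files" for the run
    physical: float  # "Physical disk space used for backup"
    unit: str

    @property
    def savings(self) -> float:
        """Fraction of the run's data that did not need new physical space."""
        return 1 - self.physical / self.logical

    @property
    def ratio(self) -> float:
        """Deduplication ratio, logical:physical."""
        return self.logical / self.physical


runs = [
    BackupRun("Run #1 (initial backup)", 98.8, 85.5, "GB"),
    BackupRun("Run #2 (second backup)", 105.0, 83.7, "MB"),
]

for run in runs:
    saved = run.logical - run.physical
    print(f"{run.name}: saved {saved:.1f} {run.unit} ({run.savings:.1%}), ratio {run.ratio:.2f}:1")
# Run #1 (initial backup): saved 13.3 GB (13.5%), ratio 1.16:1
# Run #2 (second backup): saved 21.3 MB (20.3%), ratio 1.25:1
```

Read this way, the mostly pre-compressed data set yields modest ratios, far below the 30:1 quoted for repeated full backups.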

12 Conclusion: Deduplication can mean different things to different vendors, but the basic premise is the same: eliminate duplicate data.

