Hosted by The Pros & Cons of Content Addressed Storage Arun Taneja Founder & Consulting Analyst.


Presentation transcript:

1 The Pros & Cons of Content Addressed Storage. Arun Taneja, Founder & Consulting Analyst

2 Current Data Protection Environment: Data Tsunami; No Backup Windows; Cost of Downtime; Increasing Regulations and Compliance Requirements; Data Protection Technology at Break Point. Something Must Be Done!

3 Many New Technologies to The Rescue: iSCSI, CAS, RAIN, GRID, NDMP, iFCP, FCIP, DAFS, RDMA, TOE, SMI-S, SATA, SAS

4 What is CAS? Definition: Concept whereby the address of an object is computed from the content of that object. Advantages: Location Independence; Authenticity; Simplified Indexing; Scalability to Exabytes; Load Balancing; Elimination of Duplication. Disadvantages: New and Unfamiliar; May Require Changes to Applications; May Require Procedural Changes; May Require Abandoning Existing Applications.

5 CAS vs Networked Storage. SAN & NAS: Use File Systems to Place and Locate Data (/abc/xyz/acme.doc); Hierarchical; Difficult to Scale Beyond TBs; Application Determines if Duplication of Object Exists; Indexing can Become Complicated.

6 How is CAS Done? Algorithm Applied to the Object’s Content (File, Portion of a File, Directory or File System). Unique 128-bit Coding Results (160 bits for Avamar): a 128-bit hash unique to that object (e.g. MD5). Diagram label: Object (File, FS, Dir).
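
The idea on this slide, computing an object's address from its content, can be sketched in a few lines. This is an illustrative sketch only: `content_address` and `ContentStore` are hypothetical names, not part of any vendor product, and the in-memory dictionary stands in for real storage. MD5 gives the 128-bit address mentioned on the slide, and SHA-1 is used as one example of a 160-bit hash (the slide does not say which algorithm Avamar uses).

```python
import hashlib


def content_address(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of the content, used here as its address."""
    return hashlib.new(algorithm, data).hexdigest()


class ContentStore:
    """Toy in-memory store keyed by content address (hypothetical, for illustration)."""

    def __init__(self, algorithm: str = "md5"):
        self.algorithm = algorithm
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Identical content always maps to the same address,
        # so a second put of the same bytes stores nothing new.
        address = content_address(data, self.algorithm)
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Authenticity check: recompute the address from what was read back.
        if content_address(data, self.algorithm) != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")  # duplicate content, same address
```

Because the caller holds only the address, the store is free to place the bytes anywhere, which is the location independence listed among the advantages on slide 4.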

7 What Can CAS Be Used For? Archival Storage; Backup and Restore; Disaster Recovery; Content Management.

8 Issues with Existing Architectures. Archive/Content Mgmt: Lack of Authenticity; Media/Technology Changes; Tape Environmental Issues; Poor Access Times; TCO Expensive; Slow Queries from Large Reps; Centralized Indexing. Backup and Restore/DR: Application Performance; Generates Tons of Data (10:1); Backup Windows; No Guarantee if Data is Recoverable; DR Expensive; DR: Potential Consistency Issues.

9 Methods for Keeping More Data Online: Bigger Primary Storage; Compression of Data; Hierarchical Storage Architectures; Data Normalization: Finding Subsets of Data That are Common and Storing Them Only Once (No Limit on the Effective Compression Ratio; Indexing Systems Super Critical).

10 Commonality Factoring Using CAS: Fixed Size Atomics for Database; Variable Size Atomics for File Systems; CAS Algorithms Used to Calculate CA for Each Subset; Data Structures Needed to Reconstruct from Atomics; Above Data Kept with Atomics Data.
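
A minimal sketch of commonality factoring with fixed-size atomics follows. The names `store_object` and `load_object`, the 4096-byte atomic size, and the use of SHA-1 are all assumptions made for illustration; the slide does not specify them. The list of chunk addresses (the data structure needed to reconstruct the object) is itself stored in the same repository under its own content address, mirroring the slide's point that this data is kept with the atomics. Variable-size atomics, which the slide recommends for file systems, need a content-defined chunker and are not shown.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed atomic size; not taken from the slides


def address_of(data: bytes) -> str:
    """Content address (CA) of a piece of data; SHA-1 is an assumed choice."""
    return hashlib.sha1(data).hexdigest()


def store_object(store: dict, data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Split data into fixed-size atomics, store each once, return the root CA."""
    addresses = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        addr = address_of(chunk)
        store.setdefault(addr, chunk)  # an already-known atomic is not stored again
        addresses.append(addr)
    # The recipe lists the atomics in order and is stored alongside them.
    recipe = "\n".join(addresses).encode("ascii")
    root = address_of(recipe)
    store.setdefault(root, recipe)
    return root


def load_object(store: dict, root: str) -> bytes:
    """Rebuild the original bytes from the recipe stored at the root CA."""
    recipe = store[root].decode("ascii")
    if not recipe:
        return b""
    return b"".join(store[addr] for addr in recipe.split("\n"))


repo = {}
version_1 = bytes(CHUNK_SIZE) * 3 + b"tail"
version_2 = bytes(CHUNK_SIZE) * 3 + b"other"
root_1 = store_object(repo, version_1)
root_2 = store_object(repo, version_2)
```

In this example the two versions share their three zero-filled atomics, so the second store adds only one new atomic and one new recipe to the repository.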

11 CAS Example: Avamar. CAS Applied to BU/Restore, Archive and DR (initial focus BU/R); Focus on Data Reduction; Typical Secondary to Primary Ratio is 10:1; Avamar Claims 1.2 to 1; Never Do Full + Incremental Backups, Only SnapUps.

12 CAS Example: Avamar Systems Architecture. Distributed Backup Repository; Peer-to-Peer RAIN Architecture; Each Node has Uniform and Consistent View of Repository; Clients can Request Services from any Node; Data Striped Across Nodes (similar to RAID); No Single Point of Failure; Requires Agent on Each Client System.
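
One way a content address can drive placement across nodes is sketched below. This is an assumption for illustration, not a description of Avamar's actual striping scheme, which the slides do not detail; `node_for` is a hypothetical helper. Because every node can compute the same mapping from an address alone, any node can work out where an object lives, which fits the "uniform and consistent view" on the slide.

```python
import hashlib
from collections import Counter


def node_for(address_hex: str, node_count: int) -> int:
    """Map a hex content address to a node index (assumed modulo placement)."""
    return int(address_hex, 16) % node_count


NODE_COUNT = 4  # assumed cluster size
placement = Counter(
    node_for(hashlib.sha1(f"object-{i}".encode()).hexdigest(), NODE_COUNT)
    for i in range(1000)
)
```

Hash output is close to uniformly distributed, so objects tend to spread evenly over the nodes. Plain modulo placement remaps most objects when the node count changes, so a production system would need a scheme that handles growth more gracefully.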

13 CAS Archival Example: EMC Centera. [Diagram, source: EMC. Labels: Application; API; file; Calculate CA and extract metadata; Blob; metadata; CDF (XML); CA of CDF; C-clip store; Centera; CA of CDF returned.]
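
The flow suggested by the diagram labels can be sketched as follows: the blob gets a content address, a small XML descriptor (the CDF) records that address plus the metadata, and the application receives the content address of the descriptor. This is not the Centera API. The function names, the XML element names, and the choice of MD5 are all hypothetical and exist only to illustrate the two-level addressing.

```python
import hashlib
import xml.etree.ElementTree as ET


def ca(data: bytes) -> str:
    """Content address of some bytes (MD5 assumed for illustration)."""
    return hashlib.md5(data).hexdigest()


def write_clip(store: dict, blob: bytes, metadata: dict) -> str:
    """Store a blob and its descriptor; return the CA of the descriptor."""
    blob_ca = ca(blob)
    store.setdefault(blob_ca, blob)  # one copy of the blob, however often written
    cdf = ET.Element("cdf")  # hypothetical descriptor layout
    ET.SubElement(cdf, "blob", {"ca": blob_ca})
    for name, value in sorted(metadata.items()):
        ET.SubElement(cdf, "meta", {"name": name, "value": value})
    cdf_bytes = ET.tostring(cdf, encoding="utf-8")
    cdf_ca = ca(cdf_bytes)
    store[cdf_ca] = cdf_bytes
    return cdf_ca


def read_clip(store: dict, cdf_ca: str):
    """Resolve a descriptor CA back to (blob, metadata), verifying the blob."""
    cdf = ET.fromstring(store[cdf_ca])
    blob_ca = cdf.find("blob").get("ca")
    blob = store[blob_ca]
    if ca(blob) != blob_ca:
        raise ValueError("blob failed content authentication")
    metadata = {m.get("name"): m.get("value") for m in cdf.findall("meta")}
    return blob, metadata


archive = {}
scan = b"scanned invoice bytes"
clip_a = write_clip(archive, scan, {"owner": "finance", "retention": "7y"})
clip_b = write_clip(archive, scan, {"owner": "audit", "retention": "10y"})
```

Here the same blob is archived twice with different metadata, so the archive holds one blob and two descriptors, each descriptor with its own address.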

14 CAS Advantages: EMC Centera. Due to CAS: No LUNs to Create or Manage; No Volumes to Create or Manage; Flat Addressing, Simple Indexing; Content Authentication; One Copy of Blob Stored. Due to Architecture: RAIN=Non-disruptive Scalability; No Reconfigs Required; No Technology Obsolescence; Policy-based Storage of Blobs. Application Modification.

15 CAS Players: Data Center Technologies; Persist Technologies.

16 CAS Futures: What's Needed? Flexible Scaling Capabilities; Integration with File Interfaces; Easy API-free Application Integration; Integrated Indexing.

17 Summary. CAS +’s: Location Independence; Authenticity; Eliminate Redundancy; Simplify Indexing; Simplify Management; Improve Scalability; Single System Image of Repository. CAS -’s: Many Aspects are Untested; May Require New Procedures/Tools; Disruptive Technology; Not Good Enough for High Performance Primary Needs.

18 Taneja Group Recommendations: Absolutely Test Out CAS Systems but… Apply to a Project at a Time (consider the disruptive factor); Keep a Fallback Position (run systems in parallel); Test Out Recoverability Regularly; Keep in Mind…More Solutions Coming; No Wholesale Changes!

