1© Copyright 2013 EMC Corporation. All rights reserved. Characterization of Incremental Data Changes for Efficient Data Protection Hyong Shim, Philip Shilane,

Slides:



Advertisements
Similar presentations
1/16/20141 Introduction to InMage Solutions Eric Burgener Senior Vice President, Product Management December 2009.
Advertisements

- Dr. Kalpakis CMSC Dr. Kalpakis 1 Outline In implementing DBMS we need to answer How should the system store and manage very large amounts of data?
Tag line, tag line Protection Manager 4.0 Customer Strategic Presentation March 2010.
CA 714CA Midterm Review. C5 Cache Optimization Reduce miss penalty –Hardware and software Reduce miss rate –Hardware and software Reduce hit time –Hardware.
Snapshots in a Flash with ioSnap TM Sriram Subramanian, Swami Sundararaman, Nisha Talagala, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau Copyright © 2014.
1© Copyright 2013 EMC Corporation. All rights reserved. EMC RECOVERPOINT FAMILY Protecting Your Data.
Chapter 3 Presented by: Anupam Mittal.  Data protection: Concept of RAID and its Components Data Protection: RAID - 2.
1© Copyright 2012 EMC Corporation. All rights reserved. Delta Compressed and Deduplicated Storage Using Stream-Informed Locality Philip Shilane, Grant.
1 Disk Based Disaster Recovery & Data Replication Solutions Gavin Cole Storage Consultant SEE.
Information Means The World.. Enhanced Data Recovery Agenda EDR defined Backup to Disk (DDT) Tape Emulation (Tape Virtualization) Point-in-time Copy Replication.
Computer ArchitectureFall 2007 © November 28, 2007 Karem A. Sakallah Lecture 24 Disk IO and RAID CS : Computer Architecture.
File System Implementation
1 Storage Hierarchy Cache Main Memory Virtual Memory File System Tertiary Storage Programs DBMS Capacity & Cost Secondary Storage.
Module – 11 Local Replication
Module – 12 Remote Replication
1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01.
1© Copyright 2011 EMC Corporation. All rights reserved. EMC RECOVERPOINT/ CLUSTER ENABLER FOR MICROSOFT FAILOVER CLUSTER.
1© Copyright 2012 EMC Corporation. All rights reserved. WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression Philip Shilane,
1© Copyright 2013 EMC Corporation. All rights reserved. EMC AVAMAR FOR NAS ENVIRONMENTS Backup, recovery, and disaster recovery for network-attached storage.
1© Copyright 2012 EMC Corporation. All rights reserved. EMC Solutions are Powered by Intel ® Xeon ® Processor Technology Run Microsoft Better with EMC.
1© Copyright 2012 EMC Corporation. All rights reserved. November 2013 Oracle Continuous Availability – Technical Overview.
1© Copyright 2013 EMC Corporation. All rights reserved. Fully Automated Storage Tiering for Virtual Pools (FAST VP) 30 PhDs 100s of customers 4,000 customer.
© 2010 IBM Corporation Kelly Beavers Director, IBM Storage Software Changing the Economics of Storage.
1 © Copyright 2010 EMC Corporation. All rights reserved. EMC RecoverPoint One way to protect everything you have…better.
IBM TotalStorage ® IBM logo must not be moved, added to, or altered in any way. © 2007 IBM Corporation Break through with IBM TotalStorage Business Continuity.
Usage Centric Green Metrics for Storage Doron Chen, Ealan Henis, Ronen Kat and Dmitry Sotnikov IBM Haifa Research Lab Most of the metrics defined today.
1 © Copyright 2009 EMC Corporation. All rights reserved. Agenda Storing More Efficiently  Storage Consolidation  Tiered Storage  Storing More Intelligently.
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google∗
PETAL: DISTRIBUTED VIRTUAL DISKS E. K. Lee C. A. Thekkath DEC SRC.
Copyright © 2009 EMC Corporation. Do not Copy - All Rights Reserved.
DISKS IS421. DISK  A disk consists of Read/write head, and arm  A platter is divided into Tracks and sector  The R/W heads can R/W at the same time.
CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Storage Systems.
Two or more disks Capacity is the same as the total capacity of the drives in the array No fault tolerance-risk of data loss is proportional to the number.
© 2006 EMC Corporation. All rights reserved. Business Continuity: Remote Replication Module 4.4.
A Case Study The Next Step Business Continuity Workshop For major Medical Center.
DotHill Systems Data Management Services. Page 2 Agenda Why protect your data?  Causes of data loss  Hardware data protection  DMS data protection.
Module – 4 Intelligent storage system
1 © Copyright 2010 EMC Corporation. All rights reserved.  Consolidation  Create economies of scale through standardization  Reduce IT costs  Deliver.
Copyright © 2014 EMC Corporation. All Rights Reserved. VNX Block Local Replication Principles Upon completion of this module, you should be able to: Explain.
Copyright © 2001 VERITAS Software Corporation. All Rights Reserved. VERITAS, VERITAS SOFTWARE, the VERITAS logo and VERITAS The Intelligent Storage Software.
© 2009 EMC Corporation. All rights reserved. Intelligent Storage Systems Module 1.4.
1© Copyright 2012 EMC Corporation. All rights reserved. EMC Mission Critical Infrastructure For Microsoft SQL Server 2012 Accelerated With VFCache EMC.
1© Copyright 2012 EMC Corporation. All rights reserved. EMC PERFORMANCE OPTIMIZATION FOR MICROSOFT FAST SEARCH SERVER 2010 FOR SHAREPOINT EMC Symmetrix.
Eric Burgener VP, Product Management A New Approach to Storage in Virtual Environments March 2012.
Chapter 8 External Storage. Primary vs. Secondary Storage Primary storage: Main memory (RAM) Secondary Storage: Peripheral devices  Disk drives  Tape.
Lecture 40: Review Session #2 Reminders –Final exam, Thursday 3:10pm Sloan 150 –Course evaluation (Blue Course Evaluation) Access through.
1 © Copyright 2010 EMC Corporation. All rights reserved. The Virtualization BenefitThe Physical Challenge Virtualizing Microsoft Applications Aging, Inefficient.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition File System Implementation.
1© Copyright 2012 EMC Corporation. All rights reserved. EMC VNX5700, EMC FAST Cache, SQL Server AlwaysOn Availability Groups Strategic Solutions Engineering.
11.1 Silberschatz, Galvin and Gagne ©2005 Operating System Principles 11.5 Free-Space Management Bit vector (n blocks) … 012n-1 bit[i] =  1  block[i]
Dot Hill NDA Material Snap Pool Size Planning Factors to consider  Number of snapshots planned for the master volume.  Size of the master volume  Amount.
Disk Average Seek Time. Multi-platter Disk platter Disk read/write arm read/write head.
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
Programmer’s View of Files Logical view of files: –An a array of bytes. –A file pointer marks the current position. Three fundamental operations: –Read.
Maximizing Performance – Why is the disk subsystem crucial to console performance and what’s the best disk configuration. Extending Performance – How.
© 2009 IBM Corporation Statements of IBM future plans and directions are provided for information purposes only. Plans and direction are subject to change.
CPS216: Advanced Database Systems Notes 03: Data Access from Disks Shivnath Babu.
VVols with Adaptive Flash and InfoSight Analytics 1 Manchester Virtualisation User Group Rich Fenton (Nimble North Senior Systems Engineer)
CS422 Principles of Database Systems Disk Access Chengyu Sun California State University, Los Angeles.
Computer Architecture Principles Dr. Mike Frank
iSCSI Storage Area Network
Lecture 45 Syed Mansoor Sarwar
Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin
Overview: File system implementation (cont)
Lecture 15 Reading: Bacon 7.6, 7.7
Persistence: hard disk drive
Mass-Storage Systems.
CPS216: Advanced Database Systems Notes 04: Data Access from Disks
IBM Tivoli Storage Manager
The Design and Implementation of a Log-Structured File System
Presentation transcript:

1© Copyright 2013 EMC Corporation. All rights reserved. Characterization of Incremental Data Changes for Efficient Data Protection Hyong Shim, Philip Shilane, & Windsor Hsu Backup Recovery Systems Division EMC Corporation

2© Copyright 2013 EMC Corporation. All rights reserved. Data Protection Environment SAN or LAN WAN Application Servers Primary Storage Data Protection Storage High I/O per sec. Medium Capacity Large Capacity Medium I/O per sec. Virtual Machines

3© Copyright 2013 EMC Corporation. All rights reserved. Contributions  Detailed analysis of data change characteristics from enterprise customers  Design for replication snapshots to lower overheads on primary storage.  Evaluation of overheads on data protection storage  Rules-of-thumb for storage engineers and administrators

4© Copyright 2013 EMC Corporation. All rights reserved. EMC Symmetrix VMAX Traces Trace Set#Volume# Storage Systems Duration hrs Estimated Capacity (GB) 1hr_1Wrt109, [78.3]71 [203] 1hr_1GBWrt16, [6.7]132 [262] 24hr_1GBWrt [1.2]318 [439]  Collected from enterprise customer sites

5© Copyright 2013 EMC Corporation. All rights reserved. Capacity and Write Footprint  Analysis for 1hr_1GBWrit  Not collected: applications using each volume

6© Copyright 2013 EMC Corporation. All rights reserved. I/O Properties Trace Set#Write reqs (1000s) Write size (GB) #Read reqs (1000s) Read size (GB) 1hr_1Wrt72 [510] 2 [31] 167 [1963] 5 [66] 1hr_1GBWrt429 [1270] 11 [80] 796 [4987] 25 [166] 24hr_1GBWrt1803 [4839] 51 [338] 7824 [23875] 242 [763]  X more read I/Os than write I/Os  X more GB read than written  High variability  More analysis in the paper

7© Copyright 2013 EMC Corporation. All rights reserved. Sequential vs. Random Write I/O  We measure how much data are written, on average, after seeking to a non-consecutive sector.  Selected most sequential and most random for analysis Storage Volume wwww Trace Timeline (w = Write I/O, r = Read I/O) rw Sequential Write I/O ( )/ 3 = 3

8© Copyright 2013 EMC Corporation. All rights reserved. rwwwrwrwwwwwwrw … Replication Interval 1 Transfer Period may require snapshot storage and I/O Trace Timeline (w = Write I/O, r = Read I/O) Storage Volume Sectors Replication Interval 2 Block Trace Analysis Methodology Create a snapshot to protect block data

9© Copyright 2013 EMC Corporation. All rights reserved. Replication Snapshot 0 Storage Volume state before transfer takes place 1234 Block: = Modified block to be transferred Trace Timeline (w = Write I/O)  Goal: Create a snapshot technique that is integrated with replication that decreases overheads on primary storage  Change block tracking records modified blocks for next replication interval, possibly with a bit vector.  A snapshot has to maintain block values against overwrites.

10© Copyright 2013 EMC Corporation. All rights reserved. Replication Snapshot  Baseline Snapshot: All writes cause copy-on-write 0 Storage Volume state before transfer takes place 1234 Block: = Modified block to be transferred Snapshot Area Trace Timeline (w = Write I/O) www Baseline Transfer in progress

11© Copyright 2013 EMC Corporation. All rights reserved. Replication Snapshot  Changed Block Replication Snapshot (CB): Only writes to tracked blocks cause copy-on-write Block: Snapshot Area www Baseline Transfer in progress CB

12© Copyright 2013 EMC Corporation. All rights reserved. Replication Snapshot  Changed Block with Early Release Replication Snapshot (CBER): Only writes to tracked blocks cause copy-on-write, and blocks are released once transferred Block: Snapshot Area www Baseline Transfer in progress CB CBER

13© Copyright 2013 EMC Corporation. All rights reserved. Replication Snapshot Block: Snapshot Area www Baseline CB CBER  Baseline Snapshot: All writes cause copy-on-write  Changed Block Replication Snapshot (CB): Only writes to tracked blocks cause copy-on-write  Changed Block with Early Release Replication Snapshot (CBER): Only writes to tracked blocks cause copy-on-write, and blocks are released once transferred = Modified block to be transferred

14© Copyright 2013 EMC Corporation. All rights reserved. Snapshot Storage Overheads Rule-of-thumb: Over-provision primary capacity by 8% for snapshots

15© Copyright 2013 EMC Corporation. All rights reserved. Snapshot I/O Overheads Rule-of-thumb: Over-provision primary I/O by 100% to support copy-on-write related write-amplification

16© Copyright 2013 EMC Corporation. All rights reserved. Snapshot I/O Overheads Rule-of-thumb: Over-provision primary I/O by 100% to support copy-on-write related write-amplification

17© Copyright 2013 EMC Corporation. All rights reserved. Transfer Size to Protection Storage Rule-of-thumb: 40% of written bytes are transferred to protection storage

18© Copyright 2013 EMC Corporation. All rights reserved. IOPS Requirements for Protection Storage Rule-of-thumb: Protection storage must support 20% of the I/O per second capabilities of primary storage

19© Copyright 2013 EMC Corporation. All rights reserved. Related Work  Trace analysis –Numerous publications Most closely related is Patterson [2002]  Snapshots –Common paradigm for storage but rarely integrated with incremental transfer techniques –Storage overheads Azagury [2002] and Shah [2006]  Synchronous Mirroring –Effective when change rates are low and geographic distance is small –We are focused on periodic, asynchronous replication

20© Copyright 2013 EMC Corporation. All rights reserved. Conclusion SAN or LAN WAN Application Servers Primary Storage Data Protection Storage High I/O per sec. Medium Capacity Large Capacity Medium I/O per sec.

21© Copyright 2013 EMC Corporation. All rights reserved. Conclusion  Trace analysis shows diversity of storage characteristics  Snapshot overheads on primary storage can be decreased by improved integration with network transfer  Sequential versus random access patterns affect incremental change patterns on both primary and protection storage

22© Copyright 2013 EMC Corporation. All rights reserved. Rules-of-Thumb  Over-provision primary capacity by 8% for snapshots  Over-provision primary I/O by 100% to support copy-on-write related write-amplification  A write buffer decreases snapshot I/O overheads but has little impact on storage overheads  40% of written bytes are transferred to protection storage  Schedule at least 6 hours between transfers to minimize clean data in transferred blocks  Schedule at least 12 hours between transfers to minimize peak network bandwidth requirements  Protection storage must support 20% of the I/O per second capabilities of primary storage

23© Copyright 2013 EMC Corporation. All rights reserved. Questions?

25© Copyright 2013 EMC Corporation. All rights reserved. Trace Analysis: Replication of Snapshots The amount of data to replicate drops in half with 12 hours between snapshots. 4KB results are compared to Patterson 2002.

26© Copyright 2013 EMC Corporation. All rights reserved. I/O Per Second (IOPS) Request Rate Trace SetAverage Write Rate Peak Write Rate 10ms Average Read Rate Peak Read Rate 10 ms 1hr_1Wrt0.7 [8] 1762 [2602] 2 [25] 1693 [2457] 1hr_1GBWrt15 [37] 4360 [4379] 29 [118] 3603 [4135] 24hr_1GBWrt20 [55] 9004 [8165] 89 [269] 5647 [7012] Peak Values: IOPS are calculated every 10ms period, and the peaks for each volume are averaged. More analysis in the paper

27© Copyright 2013 EMC Corporation. All rights reserved. Snapshot I/O Overheads Rule-of-thumb: Over-provision primary I/O by 100% to support copy-on-write related write-amplification