© 2010 VMware Inc. All rights reserved The Design and Evolution of Live Storage Migration in VMware ESX Ali Mashtizadeh, VMware, Inc. Emré Celebi, VMware, Inc. Tal Garfinkel, VMware, Inc. Min Cai, VMware, Inc.

2 Agenda
• What is live migration?
• Migration architectures
• Lessons

3 What is live migration (vMotion)?
• Moves a VM between two physical hosts
• No noticeable interruption to the VM (ideally)
• Use cases:
  - Hardware/software upgrades
  - Distributed resource management
  - Distributed power management

4 Virtual Machine Live Migration
• Disk is placed on a shared volume (100s of GB to TBs)
• CPU and device state is copied (MBs)
• Memory is copied (GBs)
  - Large and it changes often → iteratively copy
(Diagram: a virtual machine moving from the source host to the destination host.)
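The "iteratively copy" step above is the heart of live migration, so a minimal, self-contained sketch of the loop may help. All numbers, the simple dirty-rate model, and the function name `precopy` are illustrative assumptions, not ESX internals.

```python
# Toy model of iterative pre-copy: each pass copies only what the guest dirtied
# during the previous pass; the VM is paused only for the final small remainder.
def precopy(total_pages, dirty_pages_per_sec, copy_pages_per_sec,
            downtime_budget_pages, max_passes=30):
    remaining = total_pages                                  # pass 1 copies everything
    for passes in range(1, max_passes + 1):
        pass_seconds = remaining / copy_pages_per_sec        # time to copy this set
        remaining = int(dirty_pages_per_sec * pass_seconds)  # pages dirtied meanwhile
        if remaining <= downtime_budget_pages:
            return passes, remaining                         # copy the rest while paused
    return None, remaining                                   # would not converge

print(precopy(total_pages=2_000_000, dirty_pages_per_sec=20_000,
              copy_pages_per_sec=200_000, downtime_budget_pages=2_000))
```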

5 Live Storage Migration
• What is live storage migration?
  - Migration of a VM’s virtual disks
• Why does this matter?
  - VMs can be very large
  - Array maintenance means you may migrate all VMs in an array
  - Migration time in hours

6 Live Migration and Storage
Live Migration – a short history
• ESX 2.0 (2003) – Live migration (vMotion)
  - Virtual disks must live on shared volumes
• ESX 3.0 (2006) – Live storage migration lite (Upgrade vMotion)
  - Enabled upgrade of VMFS by migrating the disks
• ESX 3.5 (2007) – Live storage migration (Storage vMotion)
  - Storage array upgrade and repair
  - Manual storage load balancing
  - Snapshot based
• ESX 4.0 (2009) – Dirty block tracking (DBT)
• ESX 5.0 (2011) – IO Mirroring

7 Agenda
• What is live migration?
• Migration architectures
• Lessons

8 Goals
• Migration Time
  - Minimize total end-to-end migration time
  - Predictability of migration time
• Guest Penalty
  - Minimize performance loss
  - Minimize downtime
• Atomicity
  - Avoid dependence on multiple volumes (for replication fault domains)
• Guarantee Convergence
  - Ideally we want migrations to always complete successfully

9 Convergence
• Migration time vs. downtime
  - More migration time → more guest performance impact
  - More downtime → more service unavailability
• Factors that affect convergence:
  - Block dirty rate
  - Available storage network bandwidth
  - Workload interactions
• Challenges:
  - Many workloads interacting on the storage array
  - Unpredictable behavior
(Chart: data remaining vs. time, marking the migration-time and downtime phases.)
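A back-of-the-envelope convergence check makes the trade-off concrete: with dirty rate R and copy bandwidth B, each pass leaves roughly a fraction R/B of the data to re-copy, so the total copied is about D/(1 − R/B) and the copy cannot converge when R ≥ B. The numbers below are assumptions for illustration, not measurements from the talk.

```python
# Rough convergence estimate for an iterative (DBT-style) copy.
def estimate(disk_gb, dirty_mb_per_s, copy_mb_per_s):
    r = dirty_mb_per_s / copy_mb_per_s        # per-pass shrink factor
    if r >= 1.0:
        return "will not converge: dirty rate >= copy bandwidth"
    total_mb = disk_gb * 1024 / (1 - r)       # geometric series D * (1 + r + r^2 + ...)
    return f"~{total_mb / copy_mb_per_s / 60:.1f} min of copying"

print(estimate(disk_gb=32, dirty_mb_per_s=40, copy_mb_per_s=200))
print(estimate(disk_gb=32, dirty_mb_per_s=250, copy_mb_per_s=200))
```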

10 Migration Architectures
• Snapshotting
• Dirty Block Tracking (DBT)
  - Heat Optimization
• IO Mirroring

11 Snapshot Architecture – ESX 3.5 U1
(Diagram: an ESX host migrating a VMDK from the source volume to the destination volume.)

12 Synthetic Workload
• Synthetic Iometer workload (OLTP):
  - 30% write / 70% read
  - 100% random 8 KB IOs
  - Outstanding IOs (OIOs) from 2 to 32
• Migration setup:
  - Migrated both the 6 GB system disk and the 32 GB data disk
• Hardware:
  - Dell PowerEdge R710: Dual Xeon X GHz
  - Two EMC CX4-120 arrays connected via 8 Gb Fibre Channel

13 Migration Time vs. Varying OIO

14 Downtime vs. Varying OIO

15 Total Penalty vs. Varying OIO

16 Snapshot Architecture
• Benefits
  - Simple implementation
  - Built on existing and well-tested infrastructure
• Challenges
  - Suffers from snapshot performance issues
  - Disk space: up to 3x the VM size
  - Not an atomic switch from source to destination
    A problem when spanning replication fault domains
  - Downtime
  - Long migration times

17 Snapshot versus Dirty Block Tracking Intuition
• Virtual disk level snapshots have overhead to maintain metadata
• Requires lots of disk space
• We want to operate more like live migration
  - Iterative copy
  - Block level copy rather than disk level – enables optimizations
• We need a mechanism to track writes

18 Dirty Block Tracking (DBT) Architecture – ESX 4.0/4.1
(Diagram: an ESX host migrating a VMDK from the source volume to the destination volume.)
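A self-contained toy of the DBT idea may help here; it is my sketch of the approach these slides describe, not VMware's kernel code, and every name in it (TrackedDisk, dbt_migrate, the workload callback) is made up for illustration.

```python
# A write-tracking hook records which blocks the guest dirties; the migration keeps
# re-copying just those blocks until the set is small enough to finish while paused.
class TrackedDisk:
    def __init__(self, nblocks):
        self.blocks = [b"\0"] * nblocks
        self.dirty = set()
    def write(self, lba, data):                # write path records the dirtied block
        self.blocks[lba] = data
        self.dirty.add(lba)
    def drain_dirty(self):
        d, self.dirty = self.dirty, set()
        return d

def dbt_migrate(src, dst_blocks, workload, downtime_blocks=1):
    to_copy = set(range(len(src.blocks)))      # first pass: the whole disk
    while True:
        for lba in to_copy:
            dst_blocks[lba] = src.blocks[lba]  # copy of the current block set
        workload(src)                          # guest writes during the pass (modeled as a batch)
        to_copy = src.drain_dirty()
        if len(to_copy) <= downtime_blocks:    # small enough: pause VM and finish
            for lba in to_copy:
                dst_blocks[lba] = src.blocks[lba]
            return dst_blocks

src = TrackedDisk(8)
writes_per_pass = [[(3, b"x"), (5, b"y")], [(5, b"z")]]
print(dbt_migrate(src, [None] * 8,
                  lambda d: [d.write(*w) for w in (writes_per_pass.pop(0) if writes_per_pass else [])]))
```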

19 Data Mover (DM)
• Kernel service
  - Provides disk copy operations
  - Avoids memory copies (DMAs only)
• Operation (default configuration)
  - 16 outstanding IOs
  - 256 KB IOs
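The defaults on this slide (16 outstanding IOs of 256 KB) suggest a simple way to picture the copy: split the extent into 256 KB chunks and keep at most 16 in flight. The sketch below is illustrative only; the asyncio sleeps stand in for real disk reads and writes, and the real service issues DMA rather than buffering data in memory.

```python
import asyncio

CHUNK = 256 * 1024          # default copy IO size (from this slide)
OUTSTANDING = 16            # default number of IOs kept in flight (from this slide)

async def copy_extent(total_bytes):
    sem = asyncio.Semaphore(OUTSTANDING)      # never more than 16 IOs outstanding

    async def move(offset):
        async with sem:
            await asyncio.sleep(0)            # stand-in for the read at this offset
            await asyncio.sleep(0)            # stand-in for the write at this offset
            return min(CHUNK, total_bytes - offset)

    chunks = [move(off) for off in range(0, total_bytes, CHUNK)]
    return sum(await asyncio.gather(*chunks))

print(asyncio.run(copy_extent(8 * 2**20)))    # bytes "copied" for an 8 MB extent
```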

20 Migration Time vs. Varying OIO

21 Downtime vs. Varying OIO

22 Total Penalty vs. Varying OIO

23 Dirty Block Tracking Architecture
• Benefits
  - Well-understood architecture, similar to live VM migration
  - Eliminates the performance issues associated with snapshots
  - Enables block level optimizations
  - Atomicity
• Challenges
  - Migrations may not converge (and will not succeed with reasonable downtime)
    - Destination slower than source
    - Insufficient copy bandwidth
  - Convergence logic difficult to tune
  - Downtime

24 Block Write Frequency – Exchange Workload

25 Heat Optimization – Introduction
• Defer copying data that is frequently written
• Detects frequently written blocks
  - File system metadata
  - Circular logs
  - Application-specific data
• No significant benefit for:
  - Copy-on-write file systems (e.g. ZFS, HAMMER, WAFL)
  - Workloads with limited locality (e.g. OLTP)
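A toy version of the deferral idea follows; it is my sketch only, since the transcript does not preserve the real heuristic or its thresholds, and the names and numbers are made up.

```python
# Count writes per block during the migration and leave the hottest blocks for a
# single copy at the end, instead of re-copying them on every pass.
from collections import Counter

def plan_passes(write_trace, all_blocks, hot_threshold=3):
    heat = Counter(write_trace)                              # writes observed per block
    hot = {b for b, n in heat.items() if n >= hot_threshold}
    first_pass = [b for b in all_blocks if b not in hot]     # copy cold blocks now
    final_pass = sorted(hot)                                 # copy hot blocks once, at the end
    return first_pass, final_pass

trace = [7, 7, 7, 7, 2, 9, 9, 9]          # e.g. file system metadata or circular-log blocks
print(plan_passes(trace, all_blocks=range(10)))
```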

26 Heat Optimization – Design
(Diagram: write heat tracked over the disk's LBAs.)

27 DBT versus IO Mirroring Intuition
• Live migration intuition – intercepting all memory writes is expensive
  - Trapping interferes with the data fast path
  - DBT traps only the first write to a page
  - Writes are batched to aggregate subsequent writes without a trap
• Intercepting all storage writes is cheap
  - The IO stack processes all IOs already
• IO Mirroring
  - Synchronously mirror all writes
  - Single-pass copy of the bulk of the disk
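A self-contained toy of the mirroring idea in the last bullet group: writes are applied synchronously to both disks, so a single bulk pass suffices and no dirty tracking or convergence logic is needed. This is a sketch of the concept, not the ESX 5.0 code (which also handles the lock regions shown on a later slide); all names here are made up.

```python
class MirroredDisk:
    """Toy synchronous mirror: a guest write "completes" only after it has been
    applied to both the source and the destination copy."""
    def __init__(self, src, dst):
        self.src, self.dst = src, dst
    def write(self, lba, data):
        self.src[lba] = data
        self.dst[lba] = data

def mirror_migrate(src, workload):
    dst = [None] * len(src)
    disk = MirroredDisk(src, dst)
    for lba in range(len(src)):        # single bulk pass over the disk
        workload(disk, lba)            # guest writes keep arriving during the pass
        dst[lba] = src[lba]            # copy this block
    return dst

src = list(b"abcdefgh")
# At step 4 the guest overwrites block 0, which was already copied; the mirror
# keeps the destination in sync, so no extra pass is needed.
print(mirror_migrate(src, lambda d, i: d.write(0, ord("z")) if i == 4 else None))
```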

28 IO Mirroring – ESX 5.0
(Diagram: an ESX host with VMDKs on both the source and destination volumes, writes mirrored to both.)

29 Migration Time vs. Varying OIO

30 Downtime vs. Varying OIO

31 Total Penalty vs. Varying OIO

32 IO Mirroring
• Benefits
  - Minimal migration time
  - Near-zero downtime
  - Atomicity
• Challenges
  - Complex code to guarantee atomicity of the migration
  - Odd guest interactions require code for verification and debugging

33 Throttling the source

34 IO Mirroring to Slow Destination
(Chart: behavior between the start and end of the migration.)

35 Agenda
• What is live migration?
• Migration architectures
• Lessons

36 Recap
• In the beginning: live migration
• Snapshot:
  - Usually has the worst downtime/penalty
  - Whole-disk-level abstraction
  - Snapshot overheads due to metadata
  - No atomicity
• DBT:
  - Manageable downtime (except when OIO > 16)
  - Enabled block level optimizations
  - Difficult to make convergence decisions
  - No natural throttling

37 Recap – Cont.
• Insight: storage is not memory
  - Interposing on all writes is practical and performant
• IO Mirroring:
  - Near-zero downtime
  - Best migration time consistency
  - Minimal performance penalty
  - No convergence logic necessary
  - Natural throttling

38 Future Work
• Leverage workload analysis to reduce mirroring overhead
  - Defer mirroring regions with potential sequential write IO patterns
  - Defer hot blocks
  - Read ahead for lazy mirroring
• Apply mirroring to WAN migrations
  - New optimizations and hybrid architectures

39 Thank You!

40 Backup Slides

41 Exchange Migration with Heat

42 Exchange Workload
• Exchange 2010:
  - Workload generated by Exchange Load Generator
  - 2000 user mailboxes
  - Migrated only the 350 GB mailbox disk
• Hardware:
  - Dell PowerEdge R910: 8-core Nehalem-EX
  - EMC CX3-40
  - Migrated between two 6-disk RAID-0 volumes

43 Exchange Results

Type                  | Migration Time | Downtime
----------------------|----------------|---------
DBT                   |                |
Incremental DBT       |                |
IO Mirroring          |                |
DBT (2x)              | Failed         | -
Incremental DBT (2x)  | Failed         | -
IO Mirroring (2x)     |                |

44 IO Mirroring Lock Behavior
• Moving the lock region (see the sketch below):
  1. Wait for non-mirrored inflight read IOs to complete (queue all IOs)
  2. Move the lock range
  3. Release queued IOs
• Disk regions:
  - I. Mirrored region: write IOs mirrored
  - II. Locked region: IOs deferred until lock release
  - III. Unlocked region: write IOs to source only
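A rough sketch of that region/lock bookkeeping, under the assumption that the mirrored region grows from low to high offsets; the class and method names are hypothetical, not VMware code.

```python
class MirrorWindow:
    def __init__(self, lock_start, lock_end):
        self.lock = (lock_start, lock_end)
        self.queued = []
    def classify(self, lba):
        start, end = self.lock
        if lba < start:
            return "mirror"                  # region I: write goes to both disks
        if lba < end:
            self.queued.append(lba)          # region II: defer until lock release
            return "queued"
        return "source-only"                 # region III: write goes to the source only
    def advance(self, new_start, new_end, inflight_reads_done):
        assert inflight_reads_done           # step 1: wait for non-mirrored inflight IOs
        self.lock = (new_start, new_end)     # step 2: move the lock range
        released, self.queued = self.queued, []
        return released                      # step 3: release (reissue) the queued IOs

w = MirrorWindow(100, 200)
print([w.classify(lba) for lba in (10, 150, 500)])
print(w.advance(200, 300, inflight_reads_done=True))
```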

45 Non-trivial Guest Interactions
• Guest IO crossing disk locked regions
• Guest buffer cache changing
• Overlapped IOs

46 Lock Latency and Data Mover Time
(Chart: time in seconds, comparing no contention, lock contention, the IO mirroring lock, and a 2nd disk.)

47 Source/Destination Valid Inconsistencies
• Normal guest buffer cache behavior
• This inconsistency is okay!
  - Source and destination are both valid crash-consistent views of the disk
(Timeline with events: guest OS issues IO, source IO issued, destination IO issued, guest OS modifies buffer cache.)

48 Source/Destination Valid Inconsistencies
• Overlapping IOs (synthetic workloads only)
• Seen in Iometer and other synthetic benchmarks
• File systems do not generate this
(Diagram: overlapping IO 1 and IO 2 issued to the virtual disk may be reordered differently on the source disk and the destination disk.)

49 Incremental DBT Optimization – ESX 4.1
(Diagram: writes to disk blocks; each dirtied block is marked either Copy or Ignore.)
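One plausible reading of this slide (hedged: the transcript only preserves the Copy/Ignore labels, so the rule below is my inference, not taken from the paper's code): a write only needs to be remembered as dirty if it lands on a block the copy pass has already moved; writes ahead of the copy cursor can be ignored, because the in-progress bulk copy will pick up their latest contents anyway.

```python
def handle_write(lba, copy_cursor, dirty):
    if lba < copy_cursor:    # already copied: must be re-copied later -> "Copy"
        dirty.add(lba)
    # else: not yet copied -> "Ignore"; the bulk pass will read the new data
    return dirty

dirty = set()
for lba in (5, 42, 7, 90):                   # guest writes while the cursor is at block 50
    handle_write(lba, copy_cursor=50, dirty=dirty)
print(sorted(dirty))                         # only the writes behind the cursor: [5, 7]
```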