
EECS 262a Advanced Topics in Computer Systems
Lecture 20: VM Migration / VM Cloning
November 13th, 2013
John Kubiatowicz and Anthony D. Joseph
Electrical Engineering and Computer Sciences, University of California, Berkeley

Today's Papers
Live Migration of Virtual Machines. C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, A. Warfield. In Proceedings of the 2nd Symposium on Networked Systems Design and Implementation (NSDI), 2005.
SnowFlock: Rapid Virtual Machine Cloning for Cloud Computing. H. Andrés Lagar-Cavilla, Joseph A. Whitney, Adin Scannell, Philip Patchin, Stephen M. Rumble, Eyal de Lara, Michael Brudno, and M. Satyanarayanan. In Proceedings of the European Conference on Computer Systems (EuroSys), 2009.
Today: explore the value of leveraging the VMM interface for new properties (migration and cloning), and many others as well, including debugging and reliability
Thoughts?

Why Migration is Useful
Load balancing for long-lived jobs (why not short-lived?)
Ease of management: controlled maintenance windows
Fault tolerance: move jobs away from flaky (but not yet broken) hardware
Energy efficiency: rearrange loads to reduce A/C needs
The data center is the right target

Benefits of Migrating Virtual Machines Instead of Processes
Avoids "residual dependencies"
Can transfer in-memory state information
Allows separation of concerns between the users and the operator of a datacenter or cluster

Two Basic Strategies
Local names: move the state physically to the new machine
–Local memory, CPU registers, local disk (if used; typically not in data centers)
–Not really possible for some physical devices, e.g. a tape drive
Global names: can just use the same name in the new location
–Network-attached storage provides global names for persistent state
–Network address translation or layer-2 names allow movement of IP addresses

Background: Process-based Migration
Typically move the process and leave some support for it back on the original machine
–E.g., the old host handles local disk access and forwards network traffic
–These are "residual dependencies": the old host must remain up and in use
Hard to move exactly the right data for a process: which bits of the OS must move?
–E.g., hard to move the TCP state of an active connection for a process

VMM Migration
Move the whole OS as a unit; no need to understand the OS or its state
Can move apps for which you have no source code (and that are not trusted by the owner)
Can avoid residual dependencies in the data center thanks to global names
Non-live VMM migration is also useful:
–Migrate your work environment home and back: put the suspended VMM on a USB key or send it over the network
–The Collective project; "Internet Suspend and Resume"

Goals / Challenges
Minimize downtime (maximize availability)
Keep the total migration time manageable
Avoid disrupting active services by limiting the impact of migration on both the migratee and the local network

VM Memory Migration Options
Push phase
Stop-and-copy phase
Pull phase
–Not in the Xen VM migration paper, but used in SnowFlock

Implementation
Pre-copy migration
–Bounded iterative push phase
»Rounds
»Writable working set
–Short stop-and-copy phase
Be careful to avoid service degradation

Live Migration Approach (I)
Allocate resources at the destination (to ensure it can receive the domain)
Iteratively copy memory pages to the destination host
–The service continues to run on the source host during this time
–Any page that gets written will have to be moved again
–Iterate until (a) only a small amount remains, or (b) not making much forward progress
–Can increase the bandwidth used for later iterations to reduce the time during which pages are dirtied
Stop and copy the remaining (dirty) state
–The service is down during this interval
–At the end of the copy, the source and destination domains are identical, and either one could be restarted
–Once the copy is acknowledged, the migration is committed, in the transactional style
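The iterative pre-copy loop above can be sketched as a small simulation. This is not Xen code; `dirtied_per_round` is a hypothetical input standing in for the pages the guest writes while each round's copy is in flight:

```python
def precopy_migrate(memory, dirtied_per_round, max_rounds=30, threshold=16):
    """Sketch of pre-copy migration: push every page, then repeatedly
    re-push the pages dirtied during the previous round; switch to the
    stop-and-copy phase once the dirty set is small or rounds run out."""
    sent = 0
    to_send = set(memory)                    # round 0: the whole memory image
    for round_no, dirtied in enumerate(dirtied_per_round):
        sent += len(to_send)                 # push this round's pages
        to_send = set(dirtied)               # pages written while we copied
        if len(to_send) <= threshold or round_no + 1 >= max_rounds:
            break                            # small enough, or no progress
    # stop-and-copy: VM paused, remaining dirty pages transferred
    downtime_pages = len(to_send)
    sent += downtime_pages
    return sent, downtime_pages
```

With a shrinking dirty set (e.g. 40, then 20, then 8 pages for a 100-page VM), only the final 8 pages are copied while the VM is paused, which is exactly why downtime stays short even though total pages sent exceeds the memory size.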

Live Migration Approach (II)
Update the IP-address-to-MAC-address translation using a "gratuitous ARP" packet
–Service packets start coming to the new host
–May lose some packets, but this could have happened anyway, and TCP will recover
Restart the service on the new host
Delete the domain from the source host (no residual dependencies)
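To make the redirection step concrete, a gratuitous ARP reply can be assembled by hand; the byte layout below follows standard Ethernet/ARP framing (the MAC and IP values are purely illustrative, and real deployments would send this via a raw socket on the destination host):

```python
import struct

def gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Build a broadcast gratuitous ARP reply announcing that `ip`
    (4 bytes) is now reachable at `mac` (6 bytes), so switches and
    neighbours refresh their ARP/forwarding tables."""
    bcast = b"\xff" * 6
    ether = bcast + mac + struct.pack("!H", 0x0806)   # EtherType = ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)   # Ethernet/IPv4, op=2 (reply)
    arp += mac + ip                                   # sender hw + proto address
    arp += bcast + ip                                 # target: the same IP (gratuitous)
    return ether + arp

frame = gratuitous_arp(b"\x02\x00\x00\x00\x00\x01", bytes([10, 0, 0, 5]))
```

Because the sender and target IP are the same, every listener that caches that IP updates its mapping to the new MAC, which is how traffic starts flowing to the migrated VM's new host.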

Tracking the Writable Working Set
Xen inserts shadow page tables under the guest OS, populated using the guest OS's page tables
The shadow pages are marked read-only
If the OS tries to write to a page, the resulting page fault is trapped by Xen
Xen checks the OS's original page table and forwards the appropriate write permission
If the page is not read-only in the OS's PTE, Xen marks the page as dirty
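A toy model of this dirty-page tracking (not Xen code, just the trap / mark-dirty / unprotect cycle, with the protection reset at the start of each pre-copy round):

```python
class ShadowPageTable:
    """Toy model of Xen's writable-working-set tracking: every page
    starts write-protected in the shadow table; the first write to a
    page traps, marks it dirty, and re-enables write access so later
    writes to the same page cost nothing."""
    def __init__(self, n_pages):
        self.writable = [False] * n_pages    # shadow PTE write bits
        self.dirty = set()                   # pages to re-send this round

    def on_write(self, page_no):
        if not self.writable[page_no]:       # write fault, trapped by the VMM
            self.dirty.add(page_no)
            self.writable[page_no] = True    # grant write, stop trapping

    def next_round(self):
        """Begin a new pre-copy round: hand back the dirty set and
        re-protect every page so fresh writes trap again."""
        d, self.dirty = self.dirty, set()
        self.writable = [False] * len(self.writable)
        return d
```

Only the first write per page per round pays a trap, which keeps the tracking overhead proportional to the writable working set rather than to the total write rate.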

Writable Working Set

OLTP Database
Compare with stop-and-copy:
–32 seconds (128 Mbit/sec) or 16 seconds (256 Mbit/sec)

SPECweb
Compare with stop-and-copy:
–32 seconds (128 Mbit/sec) or 16 seconds (256 Mbit/sec)

Design Overview

Handling Local Resources
Open network connections
–The migrating VM can keep its IP and MAC address
–Broadcasts new routing information via ARP
»Some routers might ignore this to prevent spoofing
»A guest OS aware of the migration can avoid this problem
Local storage
–Network-attached storage

Types of Live Migration
Managed migration: move the OS without its participation
Managed migration with some paravirtualization
–Stun rogue processes that dirty memory too quickly
–Move unused pages out of the domain so they don't need to be copied
Self migration: the OS participates in the migration (paravirtualization)
–Harder to get a consistent OS snapshot since the OS is running!

Complex Web Workload: SPECweb99

Low-Latency Server: Quake 3

Summary
Excellent results on all three goals:
–Minimized downtime / maximized availability, manageable total migration time, no active-service disruption
Downtimes are very short (60 ms for Quake 3!)
Impact on the service and network is limited and reasonable
Total migration time is minutes
Once migration is complete, the source domain is completely free

Is this a good paper?
What were the authors' goals?
What about the evaluation/metrics?
Did they convince you that this was a good system/approach?
Were there any red flags?
What mistakes did they make?
Does the system/approach meet the "Test of Time" challenge?
How would you review this paper today?

BREAK

Virtualization in the Cloud
True "utility computing"
–Illusion of infinite machines
–Many, many users
–Many, many applications
–Virtualization is key
Need to scale bursty, dynamic applications
–Graphics rendering
–DNA search
–Quantitative finance
–…

Application Scaling Challenges
Awkward programming model: "boot and push"
–Not stateful: application state transmitted explicitly
Slow response times due to big VM swap-in
–Not swift: predict load, pre-allocate, keep idle, consolidate, migrate
–Choices for full VM swap-in: boot from scratch, live migrate, suspend/resume
Stateful and swift equivalent for a process?
–Fork!

SnowFlock: VM Fork
Stateful, swift cloning of VMs
–State inherited up to the point of cloning
–Local modifications are not shared
–Clones make up an impromptu cluster
[Figure: parent VM 0 on Host 0 and clones VM 1–4 on Hosts 1–4, joined by a virtual network]

Fork has Well-Understood Semantics
Load-balancing server:
  if more load: fork extra workers
  if load is low: dealloc excess workers
Sandboxing:
  trusted code
  fork
  if child: untrusted code
Parallel computation:
  partition data
  fork N workers
  if child: work on i-th slice of data
Opportunistic computation:
  if cycles available: fork worker
  if child: do fraction of long computation
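The "partition data, fork N workers" pattern above maps directly onto OS processes; a minimal sketch in Python, assuming a Unix-like system with `os.fork` (names like `parallel_sum` are illustrative):

```python
import os

def parallel_sum(data, n_workers=4):
    """Fork n_workers children; child i sums the i-th slice of `data`
    and reports its partial sum back to the parent through a pipe."""
    chunk = (len(data) + n_workers - 1) // n_workers
    read_ends = []
    for i in range(n_workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                          # child: work on i-th slice
            os.close(r)
            part = sum(data[i * chunk:(i + 1) * chunk])
            os.write(w, part.to_bytes(8, "big"))
            os._exit(0)                       # child disappears after its slice
        os.close(w)                           # parent keeps only the read end
        read_ends.append(r)
    total = 0
    for r in read_ends:                       # join: collect partial sums
        total += int.from_bytes(os.read(r, 8), "big")
        os.close(r)
        os.wait()
    return total
```

SnowFlock's pitch is that VM fork should feel exactly like this, except each "child" is a full VM cloned onto another physical host.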

VM Fork Challenge: Same as Migration!
Transmitting big VM state
–VMs are big: OS, disk, processes, …
–Big means slow
–Big means not scalable
Same fundamental bottleneck as VM migration: shared I/O resources (host and network)
[Figure: suspend/resume latency (seconds) vs. number of VMs]

SnowFlock Insights
VMs are BIG: don't send all the state!
–Clones need little state from the parent
–Clones exhibit common locality patterns
–Clones generate lots of private state

SnowFlock Secret Sauce
1. Start only with the basics: a VM descriptor (metadata, "special" pages, page tables, GDT, vcpu state; ~1 MB for a 1 GB VM)
2. Fetch the remaining state (disk, OS, processes) on demand
3. Multicast: exploit network hardware parallelism
4. Multicast: exploit locality to prefetch
5. Heuristics: don't fetch a page if I'll overwrite it; each clone keeps its own private state

Why SnowFlock is Fast
Start only with the basics
Send only what you really need
Leverage IP multicast
–Network hardware parallelism
–Shared prefetching: exploit locality patterns
Heuristics
–Don't send a page if it will be overwritten
–Malloc: exploit clones generating new state
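The fetch-on-demand plus avoid-fetch-on-overwrite idea can be modelled in a few lines (a toy model, not SnowFlock's actual implementation; the class and method names are invented for illustration):

```python
class CloneMemory:
    """Toy model of SnowFlock's on-demand state fetch: a clone starts
    with no pages, pulls each page from the parent on first read, and
    skips the fetch entirely when the page is about to be overwritten
    (the "don't fetch if I'll overwrite" heuristic)."""
    def __init__(self, parent_pages):
        self.parent = parent_pages     # parent VM's memory image
        self.local = {}                # clone-private page copies
        self.fetched = 0               # pages actually transferred

    def read(self, page_no):
        if page_no not in self.local:  # fault: fetch from the parent
            self.local[page_no] = self.parent[page_no]
            self.fetched += 1
        return self.local[page_no]

    def write(self, page_no, value):
        # heuristic: allocate the page locally, no network transfer
        self.local[page_no] = value
```

Because clones mostly write fresh state (malloc'd buffers, scratch data) and read only a small shared working set, `fetched` stays tiny relative to the VM's full memory, which is the mechanism behind the "40 MB instead of 32 GB" result.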

Clone Time
[Figure: clone time (milliseconds) vs. number of clones]
Scalable cloning: roughly constant time; 32 VMs cloned in 800 ms

Page Fetching, SHRiMP
32 clones, 1 GB each
[Figure: pages fetched with heuristics off vs. heuristics on]
~40 MB sent instead of 32 GB

Application Evaluation
Embarrassingly parallel
–32 hosts x 4 processors
CPU-intensive
Internet server
–Respond in seconds
Workloads: bioinformatics, quantitative finance, rendering

Application Run Times
[Figure: run times across applications]
≤ 7% runtime overhead (~5 seconds)

Throwing Everything At It
Four concurrent sets of VMs
–BLAST, SHRiMP, QuantLib, Aqsis
Cycling five times
–Clone, do task, join
Shorter tasks
–Range of seconds: interactive service
Evil allocation

Throwing Everything At It
Fork, process (128 x 100% CPU), disappear: 30 seconds

Summary: SnowFlock in One Slide
VM fork: natural, intuitive semantics
The cloud bottleneck is the I/O
–Clones need little parent state
–Clones generate their own state
–Clones exhibit common locality patterns
No more over-provisioning (pre-alloc, idle VMs, migration, …)
–Sub-second cloning time
–Negligible runtime overhead
Scalable: experiments with 128 processors

Is this a good paper?
What were the authors' goals?
What about the evaluation/metrics?
Did they convince you that this was a good system/approach?
Were there any red flags?
What mistakes did they make?
Does the system/approach meet the "Test of Time" challenge?
How would you review this paper today?