1 Virtual Machine Mobility with Self-Migration Jacob Gorm Hansen Department of Computer Science, University of Copenhagen (now at VMware)
2 Short Bio Studied CS at DIKU in Copenhagen Worked for Io Interactive on the first two Hitman games Master’s thesis 2002 on “Nomadic Operating Systems” Ph.D. thesis 2007 “Virtual Machine Mobility with Self-Migration” –Early involvement in the Xen VMM project in Cambridge –Worked on “Tahoma” secure browser at the University of Washington –Interned at Microsoft Research Cambridge (2004) and Silicon Valley (2006) (security-related projects) Presently working at VMware on top-secret cool stuff
3 Virtual Machine Mobility with Self-Migration Jacob Gorm Hansen Department of Computer Science, University of Copenhagen (now at VMware)
4 Talk Overview Motivation & Background Virtual Machine Migration –Live Migration in NomadBIOS –Self-Migration in Xen Laundromat Computing Virtual Machines on the desktop Related & future work + conclusion
5 Motivation & Background
6 Motivation Researchers and businesses need computing power on-demand –Science increasingly relies on simulation –Web2.0 startups grow quickly (and die just as fast) Hardware is cheap, manpower and electricity are not –Idle machines are expensive –Immobile jobs reduce utilization –Fear of untrusted users stealing secrets or access We need a dedicated Grid/Utility computing platform: –Simple configuration & instant provisioning –Strong isolation of untrusted users –Backwards compatible with legacy apps (C, Fortran, …) –Location independence & Automated load-balancing –Pay-for-access without the lawyer part
7 Our Proposal Use virtual machines as containers for untrusted code Use live VM migration to make execution transient and location-independent Use micro-payments for pay-as-you-go computing
8 Grid & Utility Computing Vision Jill creates a web site for sending greeting cards She finds a Utility to host her application She pays for access Her application roams freely, looking for the cheapest and fastest resources
9 Can’t We Do This With UNIX? Configuration complexity & lack of isolation –Hard to agree on a common software install (BSD, Redhat, Ubuntu?) –Name-space conflicts, e.g., files in /tmp –UNIX is designed for sharing, not security Mismatch of abstractions –Process ≠ Application –User login ≠ Customer –Quota ≠ Payment Location-dependence –No bullet-proof way of moving running application to a new host –Process migration in UNIX just doesn’t work
10 Use Virtual Machines Instead of Processes “A virtual machine is […] an efficient, isolated duplicate of the real machine” [Popek & Goldberg, 1974] ”A virtual machine cannot be compromised by the operation of any other VM. It provides a private, secure, and reliable computing environment for its users, …” [Creasy, 1981] VM VMM Hardware
11 Pros and Cons of VMs Pros: –Strongly isolated –Name-space is not shared –More configuration freedom –Simple interface to hardware –VMs can migrate between hosts Cons: –Memory and disk footprint of Guest OS –Less sharing potential –Extra layer adds I/O overhead –Not processor-independent VM VMM Hardware
12 Virtual Machine Migration
13 Why Process Migration Doesn’t Work Because of residual dependencies Interface between app and OS not clearly defined Part of application state resides in OS kernel process file process
14 Virtual Machine Migration is Simpler A VM is self-contained Interface to virtual hardware is clearly defined All dependencies abstracted via fault-resilient network protocols process file process VMM
15 VMs, VM Migration & Utility Computing Utility Computing on Commodity hardware Let customers submit their application as VMs Minimum-complexity base install –Stateless nodes are disposable –Small footprint, no bugs or patches Can only provide the basic mechanisms –Job submission –Scheduling and preemption (migration) –Pay-as-you-go accounting Essentially, a BIOS for Grid and Utility Computing
16 Live Migration in NomadBIOS Joint work with Asger Jensen, 2002
17 NomadBIOS: Hypervisor on L4 [Figure: untrusted L4Linux VMs running applications (make, bash, emacs, vi, gcc, xeyes) on top of the trusted NomadBIOS and L4 micro-kernel, atop the physical hardware]
18 NomadBIOS Live Migration VM NomadBIOS Hardware L4 VM NomadBIOS Hardware L4 Pre-copy migration + gratuitous ARP = sub-second downtime
19 Why Migration Downtime Matters Upsets users of interactive applications such as games May trigger failure detectors in a distributed system
20 Live Migration Reduces Downtime The VM can still be used while it is migrating Data is transferred in the background, changes sent later
21 Multi-Iteration Pre-copy Technique
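The multi-iteration pre-copy loop can be modeled in a few lines of Python. This is an illustrative sketch, not the NomadBIOS implementation; the function name, page counts, and stop threshold are made-up parameters:

```python
def precopy_migrate(total_pages, dirty_per_round, stop_threshold, max_rounds=30):
    """Model of multi-iteration pre-copy migration.

    Round 1 sends every page while the VM keeps running; each later
    round re-sends only the pages dirtied during the previous round.
    When the dirty set is small enough (or we give up), the VM is
    paused for a brief stop-and-copy of the remainder -- that pause
    is the observed downtime."""
    sent = 0
    to_send = total_pages          # round 1: every page is "dirty"
    rounds = 0
    while to_send > stop_threshold and rounds < max_rounds:
        sent += to_send            # copied in the background, VM still runs
        to_send = min(dirty_per_round, total_pages)  # pages written meanwhile
        rounds += 1
    sent += to_send                # final stop-and-copy with the VM paused
    return sent, to_send           # total traffic, pages sent during downtime
```

With 1,000 pages, 50 pages dirtied per round, and a threshold of 100 pages, the loop converges after one background round, so only 50 pages are copied while the VM is paused; a write-heavy workload that dirties pages faster than the threshold instead hits the round limit and pays a larger downtime.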
22 Migration Downtime Two clients connected to a Quake2 server VM, 100Mbit network Response time increases by ~50ms when server migrates
23 Lessons Learned from NomadBIOS Migration & TCP/IP resulted in 10-fold code size increase –Simplicity/functionality tradeoff A lot of stuff was still missing: –Threading –Encryption & access control –Disk access VM VMM Hardware L4 VM VMM Hardware Migration + TCP/IP L4
24 Self-Migration in Xen Joint work with Cambridge University, 2004-2005
25 The Promise of Xen “Xen” open source VMM announced in late 2003 Xen 1.0 was –A lean system with many of the same goals as NomadBIOS –Optimized for para-virtualized VM hosting –Very low overhead (~5%) Our goal was to port Live Migration from NomadBIOS to Xen –Xen lacked layers of indirection that L4 had –Worse: They were removed for a reason –Nasty control plane “Dom0” VM
26 Xen Control Plane (Dom0) VM VMM Control Plane VM Xen uses a “side-car” model, with a trusted control VM –Has absolute powers –Adds millions of lines of code to the TCB Security-wise, the control VM is the Achilles' Heel
27 Reduce Complexity with Self-Migration VM migration needs: –TCP/IP for transferring system state –Page-table access for checkpointing A VM is self-paging & has its own TCP/IP stack Reduce VMM complexity by performing migration from within the VM No need for networking, threading or crypto in the TCB VM VMM Migration Paging TCP/IP Hardware Paging TCP/IP Paging TCP/IP
28 An Inspiring Example of Self-Migration von Münchhausen in the swamp
29 Simple Brute-Force Solution Reserve half of memory for a snapshot buffer Checkpoint by copying state into snapshot buffer Migrate by copying snapshot to destination host Source Destination
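The brute-force scheme can be sketched as follows (an illustrative model under assumed names, not the thesis code: `checkpoint_and_send` and `send` are hypothetical). The point is that the snapshot is an atomic copy, so the VM may keep dirtying its live pages while the frozen copy is transmitted:

```python
def checkpoint_and_send(live_memory, send):
    """Brute-force self-checkpoint: copy every live page into a
    snapshot buffer (this is why half of memory must be reserved),
    then transmit the frozen snapshot in the background while the
    VM keeps running and freely writing to its live pages."""
    snapshot = [bytes(page) for page in live_memory]  # atomic copy
    for page in snapshot:
        send(page)               # background transfer to destination host
    return snapshot
```

A quick check of the isolation property: writes to live memory after the checkpoint do not leak into the pages already handed to `send`.

```python
live = [bytearray(b"aa"), bytearray(b"bb")]
sent = []
snap = checkpoint_and_send(live, sent.append)
live[0][0] = ord("z")            # VM dirties a page after the snapshot
# sent still holds [b"aa", b"bb"]
```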
30 Combination with Pre-copy Combine Pre-copy with Snapshot Buffer
31 First Iteration
32 Delta Iteration
33 Snapshot/Copy-on-Write Phase
34 Impact of Migration on Foreground Load httperf
35 Self-Migration Summary Pros: –Self-Migration is more flexible, under application control –Self-Migration removes hardcoded and complex features from the trusted install –Self-Migration can work with direct-IO hardware Cons: –Self-Migration is not transparent, has to be implemented by each OS –Self-Migration cannot be forced from the outside
36 Laundromat Computing
37 Pay-as-you-go Processing Laundromats do this already –Accessible to anyone –Pre-paid & pay-as-you-go –Small initial investment We propose to manage clusters the same way –Micro-payment currency –Pay from first packet –Automatic garbage collection when payments run out
38 Token Payments Initial payment is enclosed in Boot Token Use a simple hash-chain for subsequent payments –H^n(s), H^(n-1)(s), …, H(s), s Boot Token signed by trusted broker service Broker handles authentication
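The hash-chain payment scheme can be sketched in Python. The slide only specifies a hash chain; the choice of SHA-256 and the function names here are our own illustrative assumptions:

```python
import hashlib

def h(x: bytes) -> bytes:
    """One application of the chain's hash function (SHA-256 assumed)."""
    return hashlib.sha256(x).digest()

def make_chain(seed: bytes, n: int) -> list:
    """Build the payment chain H^n(s), H^(n-1)(s), ..., H(s), s.
    The anchor H^n(s) is what the broker signs into the Boot Token;
    the customer then spends the later values one by one."""
    chain = [seed]
    for _ in range(n):
        chain.append(h(chain[-1]))
    chain.reverse()              # anchor first, raw seed last
    return chain

def accept_payment(last_seen: bytes, token: bytes) -> bool:
    """The host accepts a token iff hashing it once yields the most
    recently seen value -- cheap to verify, infeasible to forge,
    and no per-payment contact with the broker is needed."""
    return h(token) == last_seen
```

The host only ever stores the latest accepted value, so each new token both pays and proves continuity with the signed anchor; when the customer runs out of preimages, payments stop and the VM is garbage-collected.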
39 Injecting a New VM Two-stage boot loader handles different incoming formats –ELF loader for injecting a Linux kernel image –Checkpoint loader for injecting a migrating VM “Evil Man” service decodes the Boot Token from a “magic ping” Evil Man is 500 lines of code + network driver
40 Laundromat Summary Pros: –Simple and flexible model –Hundreds instead of millions LOC –Built-in payment system –Supports self-scaling applications Cons: –Needs direct network access –Magic ping does not always get through firewalls etc.
41 Service-Oriented Model
42 Pull Instead of Push In real life, most Grid clusters are hidden behind NATs –No global IP address for nodes –No way to connect from the outside –Usually allowed to initiate a connection from within Possible workarounds: –Run a local broker at each site –Port-forwarding in the NAT –Switch to a pull-based model Pull model –Boot VMs over HTTP –Add HTTP client to trusted software for fetching a work description –VMs run a web service for cloning and migration
43 Pull Model
44 Workload Description
45 Pulse Notifications Periodic polling works, but introduces latency What we have essentially is a cache invalidation problem Pulse is a simple and secure wide-area cache invalidation protocol Clients listen on H(s), publishers release s to invalidate We can preserve the pull model, without adding latency
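The Pulse trick reuses the hash-chain idea for authentication: a sketch of the subscriber side, with SHA-256 and all names assumed for illustration (the protocol itself only requires that clients know H(s) and publishers know s):

```python
import hashlib

def h(x: bytes) -> bytes:
    """Hash function for channel ids (SHA-256 assumed)."""
    return hashlib.sha256(x).digest()

class PulseSubscriber:
    """Listens on channel id H(s); only whoever knows the secret s can
    publish a valid invalidation, so the notification authenticates
    itself and can be flooded over the wide area without signatures."""

    def __init__(self, channel: bytes):
        self.channel = channel   # = H(s), learned out of band
        self.fresh = True        # cached workload description still valid

    def on_pulse(self, preimage: bytes) -> bool:
        if h(preimage) == self.channel:
            self.fresh = False   # invalidated; re-pull over HTTP now
            return True
        return False             # forged or unrelated pulse: ignore
```

Because the pulse carries no payload, a bogus or replayed message can at worst trigger one redundant pull; the data itself still travels over the ordinary pull channel.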
46 Virtual Machines on the Desktop
47 Security Problems on the Desktop Web browsers handle sensitive data, such as e-banking logins Risk of worms or spy-ware creeping from one site to another VMs could provide strong isolation features
48 The Blink Display System VMs have traditionally had only simple 2D graphics Modern applications need 3D acceleration Cannot sacrifice safety for performance here Blink: –JIT-compiled OpenGL stored procedures –Flexible, efficient and safe control of the screen –Blink VMs can be checkpointed and migrate to different graphics hardware
49 VMs on Desktop Summary VMs can have native-performance graphics, without sacrificing safety Stored procedures more flexible than, e.g., shared-memory off-screen buffers Introduces a new display model, but still backwards compatible
50 Concluding Remarks
51 Related Work All commercial VMMs have or will have live migration: –VMware VMotion –Citrix/XenSource XenMotion (derived from our work), Sun, Oracle –Microsoft Hyper-V (planned) Huge body of previous process migration work –The V distributed system, Emerald cross-platform object mobility –MOSIX –Zap process group migration Grid/utility computing projects –BOINC (SETI@Home) from Berkeley –PlanetLab –Shirako from Duke, Amazon EC2, Minimum Intrusion Grid, … Security –L4 and EROS secure display systems –L4 Nizza architecture
52 Future Work A stateless VMM –All per-VM state stored sealed in the VM –Seamless checkpointing and migration –Cannot DoS the VMM or cause starvation of other VMs Migration-aware storage –Failure-resilient network file system for virtual disks –Peer-to-peer caching of common contents Self-Migration of a native OS, directly on the raw hardware –Also useful for software-suspend / hibernation
53 Conclusion & Contributions Compared to processes, VMs offer superior functionality –Control own paging and scheduling –Provide file systems and virtual memory –Backwards compatible –Safe containers for untrusted code We have shown: –How VMs can live-migrate across a network, with sub-second downtimes –How VMs can self-migrate, without help from the VMM Furthermore: –We have designed and implemented a “Laundromat Computing” system –Reduced the network control plane from millions to hundreds of lines of code –Pulse and Blink supporting systems
54 Questions
55 VMware is hiring in Aarhus Thank You http://www.diku.dk/~jacobg
56 Dealing with Network Side-effects The copy-on-write phase results in a network fork “Parent” and “child” overlap and diverge Firewall network traffic during final copy phase All except migration-traffic is silently dropped in last phase
57 Re-routing Network Traffic Simple techniques –IP redirection with gratuitous ARP –MAC address spoofing Wide-area: –IP-in-IP tunnelling
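The gratuitous ARP mentioned on this slide has a simple wire format; a sketch of building such a frame in Python (per RFC 826 field layout; the function name and addresses are illustrative, and actually emitting it would need a raw socket with root privileges):

```python
import struct

def gratuitous_arp_reply(mac: bytes, ip: bytes) -> bytes:
    """Build a gratuitous ARP reply: a broadcast frame in which sender
    and target IP are both the migrated VM's address, so switches and
    peers on the destination LAN update their tables immediately."""
    assert len(mac) == 6 and len(ip) == 4
    eth_hdr = b"\xff" * 6 + mac + b"\x08\x06"            # dst, src, ARP type
    arp_hdr = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)  # Ethernet/IPv4, reply
    return eth_hdr + arp_hdr + mac + ip + mac + ip       # sender == target
```

The resulting 42-byte frame is what lets the sub-second downtime numbers hold in practice: without it, peers keep sending to the stale MAC until their ARP caches time out.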
58 Overhead Added by Continuous Migration
59 Control Models Compared
60 User-Space Migration Driver