1 Virtual Machine Mobility with Self-Migration Jacob Gorm Hansen Department of Computer Science, University of Copenhagen (now at VMware)

2 2 Short Bio
Studied CS at DIKU in Copenhagen
Worked for Io Interactive on the first two Hitman games
Master’s thesis 2002 on “Nomadic Operating Systems”
Ph.D. thesis 2007 “Virtual Machine Mobility with Self-Migration”
–Early involvement in the Xen VMM project in Cambridge
–Worked on “Tahoma” secure browser at the University of Washington
–Interned at Microsoft Research Cambridge (2004) and Silicon Valley (2006) (security related projects)
Presently working at VMware on top-secret cool stuff

3 3 Virtual Machine Mobility with Self-Migration Jacob Gorm Hansen Department of Computer Science, University of Copenhagen (now at VMware)

4 4 Talk Overview
Motivation & Background
Virtual Machine Migration
–Live Migration in NomadBIOS
–Self-Migration in Xen
Laundromat Computing
Virtual Machines on the desktop
Related & future work + conclusion

5 5 Motivation & Background

6 6 Motivation
Researchers and businesses need computing power on demand
–Science increasingly relies on simulation
–Web2.0 startups grow quickly (and die just as fast)
Hardware is cheap, manpower and electricity are not
–Idle machines are expensive
–Immobile jobs reduce utilization
–Fear of untrusted users stealing secrets or access
We need a dedicated Grid/Utility computing platform:
–Simple configuration & instant provisioning
–Strong isolation of untrusted users
–Backwards compatible with legacy apps (C, Fortran, …)
–Location independence & automated load-balancing
–Pay-for-access without the lawyer part

7 7 Our Proposal
Use virtual machines as containers for untrusted code
Use live VM migration to make execution transient and location-independent
Use micro-payments for pay-as-you-go computing

8 8 Grid & Utility Computing Vision
Jill creates a web site for sending greeting cards
She finds a Utility to host her application
She pays for access
Her application roams freely, looking for the cheapest and fastest resources

9 9 Can’t We Do This With UNIX?
Configuration complexity & lack of isolation
–Hard to agree on a common software install (BSD, Redhat, Ubuntu?)
–Name-space conflicts, e.g., files in /tmp
–UNIX is designed for sharing, not security
Mismatch of abstractions
–Process ≠ Application
–User login ≠ Customer
–Quota ≠ Payment
Location-dependence
–No bullet-proof way of moving a running application to a new host
–Process migration in UNIX just doesn’t work

10 10 Use Virtual Machines Instead of Processes
“A virtual machine is […] an efficient, isolated duplicate of the real machine” [Popek & Goldberg, 1974]
“A virtual machine cannot be compromised by the operation of any other VM. It provides a private, secure, and reliable computing environment for its users, …” [Creasy, 1981]
[Figure: layer diagram with labels VM, VMM, Hardware]

11 11 Pros and Cons of VMs
Pros:
–Strongly isolated
–Name-space is not shared
–More configuration freedom
–Simple interface to hardware
–VMs can migrate between hosts
Cons:
–Memory and disk footprint of Guest OS
–Less sharing potential
–Extra layer adds I/O overhead
–Not processor-independent
[Figure: layer diagram with labels VM, VMM, Hardware]

12 12 Virtual Machine Migration

13 13 Why Process Migration Doesn’t Work
Because of residual dependencies
Interface between app and OS not clearly defined
Part of application state resides in OS kernel
[Figure: labels process, file, process]

14 14 Virtual Machine Migration is Simpler
A VM is self-contained
Interface to virtual hardware is clearly defined
All dependencies abstracted via fault-resilient network protocols
[Figure: labels process, file, process, VMM]

15 15 VMs, VM Migration & Utility Computing
Utility Computing on commodity hardware
Let customers submit their application as VMs
Minimum-complexity base install
–Stateless nodes are disposable
–Small footprint, no bugs or patches
Can only provide the basic mechanisms
–Job submission
–Scheduling and preemption (migration)
–Pay-as-you-go accounting
Essentially, a BIOS for Grid and Utility Computing

16 16 Live Migration in NomadBIOS Joint work with Asger Jensen, 2002

17 17 NomadBIOS: Hypervisor on L4
[Figure: software stack. Labels: applications (make, xeyes, bash, emacs; make, vi, bash, gcc), L4Linux (two instances), NomadBIOS, L4 micro-kernel, Physical Hardware; regions marked untrusted and trusted]

18 18 NomadBIOS Live Migration
[Figure: two hosts, each with labels VM, NomadBIOS, L4, Hardware]
Pre-copy migration + gratuitous ARP = sub-second downtime

19 19 Why Migration Downtime Matters
Upsets users of interactive applications such as games
May trigger failure detectors in a distributed system

20 20 Live Migration Reduces Downtime
The VM can still be used while it is migrating
Data is transferred in the background, changes sent later

21 21 Multi-Iteration Pre-copy Technique
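A minimal sketch of the multi-iteration pre-copy loop, under a toy model in which the running VM dirties one page for every `sent_per_dirtied` pages transferred. The function name, its parameters and the stopping rule are illustrative assumptions, not NomadBIOS code.

```python
import random


def precopy_migrate(num_pages, sent_per_dirtied=10, max_rounds=10,
                    stop_threshold=8, seed=0):
    """Simulate pre-copy; return (rounds, pages_sent_live, pages_sent_paused)."""
    rng = random.Random(seed)
    to_send = set(range(num_pages))  # first iteration: every page
    sent_live = 0
    rounds = 0
    while rounds < max_rounds and len(to_send) > stop_threshold:
        rounds += 1
        sent_live += len(to_send)
        # The VM kept running during this round, so some pages are dirty
        # again. Assumed model: the dirty count scales with transfer time.
        dirtied = len(to_send) // sent_per_dirtied
        to_send = set(rng.sample(range(num_pages), dirtied))
    # Final stop-and-copy: the VM is paused while the remainder is sent,
    # so only this last set contributes to downtime.
    return rounds, sent_live, len(to_send)


rounds, live, paused = precopy_migrate(1000)
print(f"{rounds} live rounds, {live} pages sent live, {paused} sent while paused")
```

Each delta round is shorter than the previous one as long as the VM dirties pages more slowly than the network can send them, which is why the paused phase ends up small.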

22 22 Migration Downtime
Two clients connected to a Quake2 server VM, 100Mbit network
Response time increases by ~50ms when server migrates

23 23 Lessons Learned from NomadBIOS
Migration & TCP/IP resulted in 10-fold code size increase
–Simplicity/functionality tradeoff
A lot of stuff was still missing:
–Threading
–Encryption & access control
–Disk access
[Figure: two stacks with labels VM, VMM, L4, Hardware; the second adds Migration + TCP/IP]

24 24 Self-Migration in Xen Joint work with Cambridge University,

25 25 The Promise of Xen
“Xen” open source VMM announced in late 2003
Xen 1.0 was
–A lean system with many of the same goals as NomadBIOS
–Optimized for para-virtualized VM hosting
–Very low overhead (~5%)
Our goal was to port Live Migration from NomadBIOS to Xen
–Xen lacked layers of indirection that L4 had
–Worse: They were removed for a reason
–Nasty control plane “Dom0” VM

26 26 Xen Control Plane (Dom0)
[Figure: labels VM, VMM, Control Plane VM]
Xen uses a “side-car” model, with a trusted control VM
–Has absolute powers
–Adds millions of lines of code to the TCB
Security-wise, the control VM is the Achilles' Heel

27 27 Reduce Complexity with Self-Migration
VM migration needs:
–TCP/IP for transferring system state
–Page-table access for checkpointing
A VM is self-paging & has its own TCP/IP stack
Reduce VMM complexity by performing migration from within the VM
No need for networking, threading or crypto in the TCB
[Figure: labels VM, VMM, Hardware, Migration, Paging, TCP/IP]

28 28 An Inspiring Example of Self-Migration von Münchhausen in the swamp

29 29 Simple Brute-Force Solution
Reserve half of memory for a snapshot buffer
Checkpoint by copying state into snapshot buffer
Migrate by copying snapshot to destination host
[Figure: labels Source, Destination]

30 30 Combination with Pre-copy Combine Pre-copy with Snapshot Buffer

31 31 First Iteration

32 32 Delta Iteration

33 33 Snapshot/Copy-on-Write Phase
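A minimal sketch of how a snapshot buffer and copy-on-write can give the destination a consistent image while the source keeps running. The class and method names are hypothetical; a real implementation would hook page faults rather than an explicit `write` call.

```python
class SnapshotCopyOnWrite:
    """Toy model of the final snapshot/copy-on-write phase."""

    def __init__(self, memory, pending):
        self.memory = memory         # live pages, still modified by the VM
        self.pending = set(pending)  # pages not yet sent in their final state
        self.snapshot = {}           # page number -> contents at snapshot time

    def write(self, page, value):
        # Preserve the snapshot-time contents before the first overwrite
        # of a page that still has to be transferred.
        if page in self.pending and page not in self.snapshot:
            self.snapshot[page] = self.memory[page]
        self.memory[page] = value

    def send_next(self):
        # Send the preserved copy if the page was overwritten, else the
        # live page; either way the snapshot-time contents go out.
        page = min(self.pending)
        self.pending.remove(page)
        data = self.snapshot.pop(page, self.memory[page])
        return page, data


memory = ["a", "b", "c"]
cow = SnapshotCopyOnWrite(memory, pending=[0, 2])
cow.write(2, "C")        # page 2 is pending, so "c" is saved first
print(cow.send_next())   # (0, 'a')
print(cow.send_next())   # (2, 'c')
```

Only pages that are both pending and overwritten consume snapshot buffer space, which is the reason for combining pre-copy with the buffer instead of reserving half of memory up front.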

34 34 Impact of Migration on Foreground Load
[Figure: httperf measurement]

35 35 Self-Migration Summary
Pros:
–Self-Migration is more flexible, under application control
–Self-Migration removes hardcoded and complex features from the trusted install
–Self-Migration can work with direct-IO hardware
Cons:
–Self-Migration is not transparent, has to be implemented by each OS
–Self-Migration cannot be forced from the outside

36 36 Laundromat Computing

37 37 Pay-as-you-go Processing
Laundromats do this already
–Accessible to anyone
–Pre-paid & pay-as-you-go
–Small initial investment
We propose to manage clusters the same way
–Micro-payment currency
–Pay from first packet
–Automatic garbage collection when payments run out

38 38 Token Payments
Initial payment is enclosed in Boot Token
Use a simple hash-chain for subsequent payments
–$H^{n}(s), H^{n-1}(s), \ldots, H(s), s$
Boot Token signed by trusted broker service
Broker handles authentication
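A minimal sketch of hash-chain payments as listed on the slide: the payer releases the chain values in order, and the host verifies each one against the previously accepted value. SHA-256 and all names here are assumptions; the slides do not specify the hash function or the token format.

```python
import hashlib


def H(data: bytes) -> bytes:
    # Assumed hash function; the slides only write H.
    return hashlib.sha256(data).digest()


def make_chain(seed: bytes, n: int) -> list[bytes]:
    """Return [H^n(s), H^(n-1)(s), ..., H(s), s]."""
    chain = [seed]
    for _ in range(n):
        chain.append(H(chain[-1]))
    chain.reverse()
    return chain


class PaymentVerifier:
    """Host-side check: each token must hash to the last accepted one."""

    def __init__(self, anchor: bytes):
        self.last = anchor   # H^n(s), assumed to arrive in the Boot Token
        self.received = 0

    def accept(self, token: bytes) -> bool:
        if H(token) != self.last:
            return False     # replayed, skipped or forged token
        self.last = token
        self.received += 1
        return True


chain = make_chain(b"payer secret", 3)
verifier = PaymentVerifier(chain[0])
print([verifier.accept(token) for token in chain[1:]])  # [True, True, True]
```

The host needs to store only the most recent value, and a token cannot be produced by anyone who does not know the next preimage in the chain.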

39 39 Injecting a New VM
Two-stage boot loader handles different incoming formats
–ELF loader for injecting a Linux kernel image
–Checkpoint loader for injecting a migrating VM
“Evil Man” service decodes Boot Token “magic ping”
Evil Man is 500 lines of code + network driver
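A minimal sketch of how a second-stage loader could be chosen from the incoming image. The ELF magic bytes are standard; the checkpoint magic `CKPT` and the function name are hypothetical, since the slides do not describe the checkpoint format.

```python
ELF_MAGIC = b"\x7fELF"        # standard ELF identification bytes
CHECKPOINT_MAGIC = b"CKPT"    # hypothetical marker for a migrating VM


def select_loader(image: bytes) -> str:
    """Pick the second-stage loader from the first bytes of the image."""
    if image.startswith(ELF_MAGIC):
        return "elf"          # fresh Linux kernel image
    if image.startswith(CHECKPOINT_MAGIC):
        return "checkpoint"   # state of a VM that is migrating in
    raise ValueError("unrecognized image format")


print(select_loader(b"\x7fELF\x02\x01\x01"))  # elf
```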

40 40 Laundromat Summary
Pros:
–Simple and flexible model
–Hundreds instead of millions LOC
–Built-in payment system
–Supports self-scaling applications
Cons:
–Needs direct network access
–Magic ping does not always get through firewalls etc.

41 41 Service-Oriented Model

42 42 Pull Instead of Push
In real life, most Grid clusters are hidden behind NATs
–No global IP address for nodes
–No way to connect from the outside
–Usually allowed to initiate a connection from within
Possible workarounds:
–Run a local broker at each site
–Port-forwarding in the NAT
–Switch to a pull-based model
Pull model
–Boot VMs over HTTP
–Add HTTP client to trusted software for fetching a work description
–VMs run a web service for cloning and migration

43 43 Pull Model

44 44 Workload Description

45 45 Pulse Notifications
Periodic polling works, but introduces latency
What we have essentially is a cache invalidation problem
Pulse is a simple and secure wide-area cache invalidation protocol
Clients listen on H(s), publishers release s to invalidate
We can preserve the pull model, without adding latency
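A minimal sketch of the listen-on-H(s), release-s idea, with the network left out. SHA-256, the class name and the callback interface are assumptions made for illustration; this is not the Pulse wire protocol.

```python
import hashlib


class PulseListener:
    """Client side: wait for the preimage of a digest we subscribed to."""

    def __init__(self):
        self.watched = {}  # H(s) -> callback to run on invalidation

    def listen(self, digest: bytes, on_invalidate) -> None:
        self.watched[digest] = on_invalidate

    def receive(self, secret: bytes) -> bool:
        # Anyone may send bytes, but only the publisher knows an s whose
        # hash matches a watched digest, so forged pulses are ignored.
        digest = hashlib.sha256(secret).digest()
        callback = self.watched.pop(digest, None)
        if callback is None:
            return False
        callback()
        return True


secret = b"publisher secret"
events = []
listener = PulseListener()
listener.listen(hashlib.sha256(secret).digest(), lambda: events.append("refetch"))
print(listener.receive(b"forged"), listener.receive(secret), events)
```

After a valid pulse the client would fetch the new work description over the existing pull channel, so no inbound connection to the node is required for the data itself.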

46 46 Virtual Machines on the Desktop

47 47 Security Problems on the Desktop
Web browsers handle sensitive data, such as e-banking logins
Risk of worms or spy-ware creeping from one site to another
VMs could provide strong isolation features

48 48 The Blink Display System
VMs have traditionally had only simple 2D graphics
Modern applications need 3D acceleration
Cannot sacrifice safety for performance here
Blink:
–JIT-compiled OpenGL stored procedures
–Flexible, efficient and safe control of the screen
–Blink VMs can be checkpointed and migrate to different graphics hardware

49 49 VMs on Desktop Summary
VMs can have native performance graphics, without sacrificing safety
Stored procedures more flexible than, e.g., shared memory off-screen buffers
Introduces a new display model, but still backwards compatible

50 50 Concluding Remarks

51 51 Related Work
All commercial VMMs have or will have live migration:
–VMware VMotion
–Citrix/XenSource XenMotion (derived from our work), Sun, Oracle
–Microsoft Hyper-V (planned)
Huge body of previous process migration work
–Distributed V, Emerald cross-platform object mobility
–MOSIX
–Zap process group migration
Grid/utility computing projects
–BOINC from Berkeley
–PlanetLab
–Shirako from Duke, Amazon EC2, Minimum Intrusion Grid, …
Security
–L4 and EROS secure display systems
–L4 Nizza architecture

52 52 Future Work
A stateless VMM
–All per-VM state stored sealed in the VM
–Seamless checkpointing and migration
–Cannot DoS the VMM or cause starvation of other VMs
Migration-aware storage
–Failure-resilient network file system for virtual disks
–Peer-to-peer caching of common contents
Self-Migration of a native OS, directly on the raw hardware
–Also useful for software-suspend / hibernation

53 53 Conclusion & Contributions
Compared to processes, VMs offer superior functionality
–Control own paging and scheduling
–Provide file systems and virtual memory
–Backwards compatible
–Safe containers for untrusted code
We have shown:
–How VMs can live-migrate across a network, with sub-second downtimes
–How VMs can self-migrate, without help from the VMM
Furthermore:
–We have designed and implemented a “Laundromat Computing” system
–Reduced the network control plane from millions to hundreds of lines of code
–Pulse and Blink supporting systems

54 54 Questions

55 55 Thank You
VMware is hiring in Aarhus

56 56 Dealing with Network Side-effects
The copy-on-write phase results in a network fork
“Parent” and “child” overlap and diverge
Firewall network traffic during final copy phase
All traffic except migration traffic is silently dropped in the last phase

57 57 Re-routing Network Traffic
Simple techniques
–IP redirection with gratuitous ARP
–MAC address spoofing
Wide-area:
–IP-in-IP tunnelling
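A minimal sketch of the gratuitous ARP frame a migrated VM could broadcast so that switches and peers on the local network learn its new location. It only builds the bytes; sending them needs a raw socket and is omitted. The function name and the example addresses are illustrative assumptions.

```python
import socket
import struct


def gratuitous_arp(mac: bytes, ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP announcement for (mac, ip)."""
    broadcast = b"\xff" * 6
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = broadcast + mac + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), address lengths 6 and 4,
    # operation 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + ip_bytes           # sender hardware and protocol address
    arp += b"\x00" * 6 + ip_bytes   # target IP equals sender IP
    return ethernet + arp


frame = gratuitous_arp(bytes.fromhex("020000000001"), "192.0.2.10")
print(len(frame), frame.hex())
```

What makes the frame gratuitous is that the sender and target protocol addresses are the same, so nobody is expected to reply; receivers simply update their ARP caches.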

58 58 Overhead Added by Continuous Migration

59 59 Control Models Compared

60 60 User-Space Migration Driver
