SnowFlock: Virtual Machine Cloning as a First-Class Cloud Primitive Lagar-Cavilla, et al. Reading Group Presentation 15 December 2011 Adriaan Middelkoop.

SnowFlock: Virtual Machine Cloning as a First-Class Cloud Primitive Lagar-Cavilla, et al. Reading Group Presentation 15 December 2011 Adriaan Middelkoop adriaan.middelkoop@lip6.fr

2 Article ● Published: EuroSys '09, ACM Transactions on Computer Systems ● Authors: ● H. Andrés Lagar-Cavilla – AT&T ● Joseph A. Whitney, Roy Bryant, Philip Patchin, Michael Brudno, Eyal de Lara – University of Toronto ● Stephen M. Rumble – Stanford University ● M. Satyanarayanan – Carnegie Mellon University ● Adin Scannell – GridCentric Inc.

3 Motivation ● Starting up a virtual machine is expensive ● initialize virtual hardware ● start up the kernel / create kernel tables ● start up and initialize applications ● clone a VM instead? ● No clear API for creating VM instances ● some semi-automatic configuration ● startup scripts ● create VMs programmatically, like processes?

4 What is SnowFlock

5 In a Nutshell ● Spawning VM instances ● Traditional: boot guest, run startup script ● Article: uses fork() on VMs in the application ● Contributions of SnowFlock ● Fast spawning of instances ● Only the active working set is transferred to the spawned instance, plus more bandwidth-saving tricks ● Works with conventional hardware but requires changes to the VM and the guest/host OS: file system driver, network driver, memory manager :(){ :|:&};:

6 General Idea ● Parent-child parallelism using process cloning ● Child starts from a clone of the state of the parent process ● fork() returns a different value in the child than in the parent, so the two can execute different code: pid_t pid = fork(); if (pid == 0) { /* child code */ } else { /* parent code */ } ● Efficient with copy-on-write and page sharing ● Simple API, but: ● too little isolation: only isolation of the process' memory ● too much isolation: no direct support for exchange of results
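The fork() pattern on this slide can be made concrete with a minimal POSIX sketch. The helper name run_in_child is hypothetical and not from the paper. It also illustrates the "no direct support for exchange of results" point: the only thing the child hands back here is its 8-bit exit status.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that reports `value` through its exit
 * status; the parent waits and returns that status, or -1 on error.
 * Exit statuses carry only 8 bits, so this is a toy result channel. */
int run_in_child(int value)
{
    assert(value >= 0 && value <= 255);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;              /* fork failed */
    }
    if (pid == 0) {
        /* Child: runs on a copy of the parent's state, so it sees `value`. */
        _exit(value);
    }

    /* Parent: fork() returned the child's pid; wait for it to finish. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_in_child(42) from a parent process returns 42 once the child has exited. Anything larger than an exit status needs a pipe, a socket, or a file.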

7 General Idea ● Article presents: parallelism through VM cloning ● Child starts in a clone of the parent VM ● The fork call returns a different value in the child than in the parent, so the two can execute different code: long vid = vm_fork(); if (vid == 0) { /* child code */ } else { /* parent code */ } ● Similar optimizations as with fork to clone the VM efficiently ● Child gets an independent copy of memory, operating system, and disk

8 API (Table 1) ● Ticket sf_request(int n, bool same_node) ● Id sf_clone(Ticket t) ● void sf_exit() ● void sf_join(Ticket t) ● void sf_kill(Ticket t) ● CheckPoint sf_checkpoint_parent() ● Id sf_create_clones(CheckPoint c, Ticket t)

9 Caveats ● A VM may run multiple processes, which could all run a VM fork ● Guideline: one main process per VM ● Use separate VMs for different kinds of processes ● Parent and child cannot communicate directly ● Child receives a copy of the memory and disk of the parent ● Use sockets or files on a network disk ● Why not use shared pipes?

10 Typical Fork Exploitation ● Sandboxing (Figure 1a) ● Load handling (Figure 1c) ● Parallel task pool (Figure 1b, Figure 1d) ● Requires small overhead ● Requires low latency

11 Achievements ● SnowFlock replication: 0.8 s wall-clock time ● Independent of the number of clones ● if each clone gets its own physical node ● if multicast is used ● Conventional replication (Figure 2) ● 90 s with multicast ● 40 + 10 * n s without multicast ● not only 100x slower, but also: ● latency too high
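The numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes that n in "40 + 10 * n" is the number of clones and that all figures are in seconds. The function and macro names are made up for illustration.

```c
#include <assert.h>

/* Figure quoted on the slide, in seconds. */
#define SNOWFLOCK_CLONE_S 0.8   /* stated to be independent of clone count */

/* Conventional replication without multicast: 40 + 10 * n seconds,
 * assuming n is the number of clones. */
double conventional_unicast_s(int n_clones)
{
    assert(n_clones >= 0);
    return 40.0 + 10.0 * n_clones;
}

/* How many times longer than a SnowFlock clone a given duration is. */
double slowdown_vs_snowflock(double seconds)
{
    return seconds / SNOWFLOCK_CLONE_S;
}
```

Under these assumptions, the 90 s multicast case is roughly 112 times slower than 0.8 s, which matches the "100x slower" remark. For 32 clones without multicast, the model gives 360 s, about 450 times slower.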

12 Implementation

13 Four Insights ● Children can already resume execution with initially a small replicated state => only replicate 0.1% of the state: low latency ● Children access only a small part of the parent's memory => memory on demand: don't replicate all state (swap files are based on a similar insight) ● Children allocate memory after forking, which is overwritten without accessing the original contents => don't fetch pages that are allocated by the child ● Children execute similar code and use common data structures => use multicast (a double-edged sword: latency vs caching/prefetching)

14 VM Descriptors ● Minimal description of the VM in order to recreate it “on-demand” (approx 1 MB) ● Not the full state of the VM, instead: ● virtual CPU registers ● page tables (= biggest part) ● segmentation tables ● device specs ● some special memory pages

15 Clone creation ● Parent: SnowFlock save (100 ms) ● Parent: Xen save (100 ms) ● Start clones (10 ms) ● Multicast descriptors (100 ms) ● Child: SnowFlock restore (200 ms) ● Child: Xen restore (200 ms) ● SnowFlock: http://sysweb.cs.toronto.edu/snowflock ● Xen 3.4.0, Linux guest 2.6.18, Linux host 2.6.27.45 ● Benchmarks on a 32-node cluster of 4-core 3.2 GHz Xeons with 4 GB RAM per node
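As a sanity check, the phase timings listed on this slide can be summed. The sketch assumes the phases run strictly one after another, which the slide does not state. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

struct phase {
    const char *name;
    int millis;     /* duration quoted on the slide */
};

static const struct phase clone_phases[] = {
    { "parent: SnowFlock save",   100 },
    { "parent: Xen save",         100 },
    { "start clones",              10 },
    { "multicast descriptors",    100 },
    { "child: SnowFlock restore", 200 },
    { "child: Xen restore",       200 },
};

/* Sum of all phases, assuming no overlap between them. */
int total_clone_millis(void)
{
    int total = 0;
    size_t n = sizeof clone_phases / sizeof clone_phases[0];
    for (size_t i = 0; i < n; i++) {
        assert(clone_phases[i].millis >= 0);
        total += clone_phases[i].millis;
    }
    return total;
}
```

With these figures the total is 710 ms, the same order as the 0.8 s clone time quoted on the Achievements slide.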

16 Memory on Demand ● Parent: copy on write ● Child: on demand with avoidance heuristics ● don't request pages that are allocated by the child ● don't request pages that are written to by an I/O device of the child ● Benchmarks of on-demand memory (Figure 4a): ● page fetch 275 microseconds ● 85% of the time in the network ● Benchmarks of heuristics (Figure 4b): ● 40x reduction in page requests ● unicast: 4x faster with heuristics ● multicast: 2x faster with heuristics (and slightly faster than unicast) ● Benchmarks of multicast (Figure 4c): ● scales when a significant portion of the parent's state is needed
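The child-side policy on this slide can be sketched as a small user-space simulation. This is not SnowFlock's implementation: the struct, the function names, and the in-memory "parent image" standing in for the network are all assumptions for illustration. It shows one fetch per page on first touch, and no fetch at all when a whole page is overwritten, as with a fresh allocation or a device write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NUM_PAGES 8

/* Hypothetical clone-side state. */
struct clone_mem {
    unsigned char pages[NUM_PAGES * PAGE_SIZE]; /* clone's local memory */
    bool present[NUM_PAGES];                    /* page valid locally? */
    const unsigned char *parent_image;          /* stand-in for the network */
    size_t fetches;                             /* pages requested so far */
};

void clone_mem_init(struct clone_mem *m, const unsigned char *parent_image)
{
    memset(m, 0, sizeof *m);
    m->parent_image = parent_image;
}

/* Simulated page fault: copy the page from the parent on first touch. */
static void fetch_if_missing(struct clone_mem *m, size_t page)
{
    assert(page < NUM_PAGES);
    if (!m->present[page]) {
        memcpy(m->pages + page * PAGE_SIZE,
               m->parent_image + page * PAGE_SIZE, PAGE_SIZE);
        m->present[page] = true;
        m->fetches++;
    }
}

unsigned char clone_read(struct clone_mem *m, size_t page, size_t offset)
{
    assert(offset < PAGE_SIZE);
    fetch_if_missing(m, page);
    return m->pages[page * PAGE_SIZE + offset];
}

/* A partial write still needs the rest of the page from the parent. */
void clone_write_byte(struct clone_mem *m, size_t page, size_t offset,
                      unsigned char value)
{
    assert(offset < PAGE_SIZE);
    fetch_if_missing(m, page);
    m->pages[page * PAGE_SIZE + offset] = value;
}

/* Avoidance heuristic: a full-page overwrite never consults the parent. */
void clone_overwrite_page(struct clone_mem *m, size_t page,
                          const unsigned char *src)
{
    assert(page < NUM_PAGES);
    memcpy(m->pages + page * PAGE_SIZE, src, PAGE_SIZE);
    m->present[page] = true;
}
```

In this model, reading page 0 twice costs one fetch, overwriting page 1 in full costs none, and a one-byte write to page 2 costs one. The 40x reduction in page requests on the slide comes from avoiding fetches of the second kind.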

17 Virtual Disk ● Copy-on-write implementation ● Lazy fetching of blocks ● Similar heuristics as with memory: don't fetch blocks that are overwritten by a child ● A disk is less volatile than main memory, so it may be worthwhile to cache fetched blocks ● Spawned processes usually perform only a little I/O

18 Conclusions

19 Benchmarks ● See Section 5 ● Comparison against a zero-cost fork, i.e. pre-cloned VMs waiting only for the job data. The differences in speedup and total time are within 5% ● NCBI Blast – DNA queries ● SHRiMP – DNA queries (more memory-intensive) ● ClustalW – More DNA queries (more CPU-intensive, highly parallel) ● QuantLib – Quantitative finance program ● Aqsis – Renderer of animation movies ● Distcc – Distributed make for C programs

20 Discussion Items ● For what applications would 800 ms be too high a latency? ● Should operating systems not offer to run all processes in their own virtual machine? Thus, is using a fully fledged guest OS a hack to overcome a deficiency in current OS implementations? ● Processes need to communicate: they cannot be fully isolated from each other. => What are proper synchronization primitives? ● What to do with shared memory, shared files? ● Transactions? ● Paper claims seamless integration with MPI => Why not a task pool? ● What about applications that use garbage collection?
