Efficient Live Checkpointing Mechanisms for computation and memory-intensive VMs in a data center Kasidit Chanchio Vasabilab Dept of Computer Science,

Slides:



Advertisements
Similar presentations
Remus: High Availability via Asynchronous Virtual Machine Replication
Advertisements

Live migration of Virtual Machines Nour Stefan, SCPD.
Live Migration of Virtual Machines Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, Andrew Warfield.
Chapter 4 Infrastructure as a Service (IaaS)
Parallelizing Live Migration of Virtual Machines
Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.
Memory-efficient Virtual Machine High Availability Karen Kai-Yuan Hou Prof. Kang G. Shin University of Michigan Mustafa Uysal (VMware) Arif Merchant (HP.
1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,
Virtualization and Cloud Computing Virtualization David Bednárek, Jakub Yaghob, Filip Zavoral.
Adam Duffy Edina Public Schools.  The heart of virtualization is the “virtual machine” (VM), a tightly isolated software container with an operating.
Towards High-Availability for IP Telephony using Virtual Machines Devdutt Patnaik, Ashish Bijlani and Vishal K Singh.
1 Principles of Reliable Distributed Systems Tutorial 12: Frangipani Spring 2009 Alex Shraer.
Remus: High Availability via Asynchronous Virtual Machine Replication.
DatacenterMicrosoft Azure Consistency Connectivity Code.
16: Distributed Systems1 DISTRIBUTED SYSTEM STRUCTURES NETWORK OPERATING SYSTEMS The users are aware of the physical structure of the network. Each site.
1 Distributed Systems: Distributed Process Management – Process Migration.
Chapter 21: Mobile Virtualization Infrastracture and Related Security Issues Guide to Computer Network Security.
Virtualization for Cloud Computing
Virtual Network Servers. What is a Server? 1. A software application that provides a specific one or more services to other computers  Example: Apache.
Frangipani: A Scalable Distributed File System C. A. Thekkath, T. Mann, and E. K. Lee Systems Research Center Digital Equipment Corporation.
VMware vSphere 4 Introduction. Agenda VMware vSphere Virtualization Technology vMotion Storage vMotion Snapshot High Availability DRS Resource Pools Monitoring.
Presented by : Ran Koretzki. Basic Introduction What are VM’s ? What is migration ? What is Live migration ?
Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.
Real Security for Server Virtualization Rajiv Motwani 2 nd October 2010.
Virtualization and Cloud Computing Research at Vasabilab Kasidit Chanchio Vasabilab Dept of Computer Science, Faculty of Science and Technology, Thammasat.
RAMCloud Design Review Recovery Ryan Stutsman April 1,
Distributed Systems Early Examples. Projects NOW – a Network Of Workstations University of California, Berkely Terminated about 1997 after demonstrating.
Evaluation of Delta Compression Techniques for Efficient Live Migration of Large Virtual Machines Petter Svärd, Benoit Hudzia, Johan Tordsson and Erik.
1 The Google File System Reporter: You-Wei Zhang.
Yavor Todorov. Introduction How it works OS level checkpointing Application level checkpointing CPR for parallel programing CPR functionality References.
Microkernels, virtualization, exokernels Tutorial 1 – CSC469.
Networked File System CS Introduction to Operating Systems.
1 COMPSCI 110 Operating Systems Who - Introductions How - Policies and Administrative Details Why - Objectives and Expectations What - Our Topic: Operating.
Virtual Machine Course Rofideh Hadighi University of Science and Technology of Mazandaran, 31 Dec 2009.
Remus: VM Replication Jeff Chase Duke University.
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 2.
Improving Network I/O Virtualization for Cloud Computing.
1 Moshe Shadmon ScaleDB Scaling MySQL in the Cloud.
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Tekin Bicer Gagan Agrawal 1.
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Graduate Student Department Of CSE 1.
High Performance Computing on Virtualized Environments Ganesh Thiagarajan Fall 2014 Instructor: Yuzhe(Richard) Tang Syracuse University.
Server Virtualization
1 Process migration n why migrate processes n main concepts n PM design objectives n design issues n freezing and restarting a process n address space.
Virtualization 3 Subtitle: “What can we do to a VM?” Learning Objectives: – To understand the VM-handling mechanisms of a hypervisor – To understand how.
Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.
VMware vSphere Configuration and Management v6
HADOOP DISTRIBUTED FILE SYSTEM HDFS Reliability Based on “The Hadoop Distributed File System” K. Shvachko et al., MSST 2010 Michael Tsitrin 26/05/13.
Design Issues of Prefetching Strategies for Heterogeneous Software DSM Author :Ssu-Hsuan Lu, Chien-Lung Chou, Kuang-Jui Wang, Hsiao-Hsi Wang, and Kuan-Ching.
COMP25212: Virtualization 3 Subtitle: “What can we do to a VM?” Learning Objectives: –To understand the VM-handling mechanisms of a hypervisor –To understand.
Virtualization Supplemental Material beyond the textbook.
Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing,
Ian Gable HEPiX Spring 2009, Umeå 1 VM CPU Benchmarking the HEPiX Way Manfred Alef, Ian Gable FZK Karlsruhe University of Victoria May 28, 2009.
Virtual cloud R 陳昌毅 R 顏昭恩 R 黃伯淳 2010/06/03.
Docker and Container Technology
Operating-System Structures
Cloud Computing Lecture 5-6 Muhammad Ahmad Jan.
Cloud Computing – UNIT - II. VIRTUALIZATION Virtualization Hiding the reality The mantra of smart computing is to intelligently hide the reality Binary->
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
Virtualization for Cloud Computing
Chapter 6: Securing the Cloud
Time-Bounded, Thread-Based Live Migration of Virtual Machines
Kenichi Kourai Kouta Sannomiya Kyushu Institute of Technology, Japan
Chapter 21: Virtualization Technology and Security
Chapter 22: Virtualization Security
Introduction to Operating Systems
Outline Virtualization Cloud Computing Microsoft Azure Platform
Ch 4. The Evolution of Analytic Scalability
Process Migration Troy Cogburn and Gilbert Podell-Blume
Specialized Cloud Architectures
Efficient Migration of Large-memory VMs Using Private Virtual Memory
Presentation transcript:

Efficient Live Checkpointing Mechanisms for computation and memory-intensive VMs in a data center Kasidit Chanchio Vasabilab Dept of Computer Science, Faculty of Science and Technology, Thammasat University ALICE O2 Presentation

Outline Introduction and problems Checkpointing mechanisms Our Proposal – Time-bound Live Checkpointing (TLC) – A Scalable Checkpointing Technique Conclusion and Future Works

Introduction Today, applications require more CPUs and RAM – Big Data Analysis – Large Scale simulation – Scientific Computation – Legacy Applications, etc. Cloud computing has become a common platform for large-scale computations – Amazon offers VM with 8 vcpus and 68.4GiB Ram – Google offers VM with 8 vcpus and 52GB Ram Large-scale applications can have long exe time – In case of failures, users must restart apps from beginning

How do we handle server crashes? Checkpointing: The state of long running apps should be saved regularly so that the computation can be recovered from the last saved state if failures occur It usually take a long time to save state of CPU and memory-intensive apps – Downtime could also be high Parallel File System (PFS) can be a bottleneck and slowdown the entire system when saving state of multiple nodes simultaneously From

What is Checkpointing? Periodically Save Computation State to Persistent Storage for recovery if failures occur Linux/Hardware VM-Level OS-Level User Level Application-LevelModify App Link with Chkpt library Modify Kernel Modify Hypervisor More works on development Know exactly what to save Depend on exe environments Don’t have to recompile app Depend on Kernel version Can reuse executable Must handle all VM state Transparent to Guest OS/App

VM Checkpointing Highly Transparent to Guest OS & Applications Save all apps and execution environments Techniques: – Stop & Save [kvm] – Copy on Write & Chkpt Thread [vmware ESXi] – Copy to Memory Buffer [TLC 2009] – Live replication to a backup host [Remus] – Time-bound Live Checkpointing [TLC]

1. Stop and Save VM Hypervisor Local or Shared Storage Stop the VM to save state to disk Long Downtime and Checkpoint time Saving to shared storage is necessary if want to restore on a new host Saving to shared storage cause higher checkpoint time

2. Copy on Write Hypervisor VM Local or Shared Storage Hypervisor create a thread to scan memory and save unmodified pages If VM modifies a page, hypervisor copy the original contents of that page to directly to disk Can cause high downtime if large number of pages are modified in a short period of time One memory scan

3. Memory Buffer Hypervisor VM Memory Local/ Shared Storage Hypervisor create a thread to scan memory and save unmodified pages Hypervisor stop VM to copy dirty pages to a memory buffer and write the buffer to disk later when checkpointing done Need large amount of memory One memory scan

4. Replication Hypervisor VM Source Host Backup Host Memory Local/ Shared Storage Hypervisor stop VM periodically to copy and sync state information with a backup host Great for High Availability Need to reserve resource on a backup host for the VM throughout its lifetime

Time-bound Live Migration TLC is based on the Time-bound, Thread-based Live Migration (TLM) [CCgrid 2014] Basic Principles of TLM: – TLM finishes within a bounded period of time,i.e., one round of memory scan – Performs with best efforts to minimize downtime – Dynamically adjust VM computation speed to reduce downtime by balancing dirty page generation rate and available data transfer bandwidth

TLM Design VM State Transfer Add two threads to source hypervisor – Mtx: scan entire ram – Dtx: new dirty pages Use two receiver threads to dest Optimization Manage Resource Allocation and handle downtime minimization VM State Transfer Downtime reduction

Kvm Migration and Downtime (over a 10 Gbps network) kvm-1.x- 1. Hard to find right tolerable downtime 2. Same param may cause very different migration behaviors TLM

TLM:Kernel MG Class D 36GB VM Ram, 27.3GB WSS Low locality, 600,000 pages can be updated in one second but pages are transfer no more than 100,000 page/sec Reasonable Bandwidth (1) (2) (3) 1 Gbps network

Time-bound Live Checkpointing (TLC) Based on TLM Send state evenly to set of Distributed Memory Servers Let each DMS saves the state to local disk when finish Stage 3 Each DMS can write state to PFS later Perf: migtime + 1/3 of saving the entire VM state to local disk

Time-bound Live Checkpointing (TLC) Based on TLM Each DMS load state info from local disk When the loading is done, send data simultaneously to the restored VM The restored VM put the transmitted state info at the right place and resume computation Perf: 1/3 of traditional VM restoration time

How do we make TLC checkpointing scale? Define a set of host, namely a circle Let each host in the same circle takes turn to checkpoint while the rests help saving its state

Scalable Checkpointing Put each host in a circle into a separate group

Scalable Checkpointing VM on host in the same group chkpt at the same time VMs in the same group could be communicating with one another

Scalable Checkpointing VM on host in the same group chkpt at the same time

Scalable Checkpointing Every DMS on a helping host save state to local disk

Scalable Checkpointing DMS can later saves state to PFS

Scalable Checkpointing Or, DMS can collaborate to replicate state information

Conclusion and Future Works We propose a Time-bound Live Checkpointing (TLC) mechanism – Finish in a bound time period (proportional to Ram size) – Provide best effort downtime minimization – Reduce dirty page generation rate to minimize downtime We propose using a set of the Distributed Memory Server to speed up checkpointing time We propose a method to perform checkpointing at a large scale We have implemented TLC and DMS and conducted preliminary experiments Next, we will evaluate the scalable checkpointing ideas Thank you. Questions?

BACKUP

Experimental Setup Each VM uses 8 vcpu NAS Parallel Benchmark v3.3 – OpenMP Class D (and MPI Class D in paper) VM migrate from source to dest computer Two separate networks: – 10 Gbps for migration – 1 Gbps for iperf Iperf fires from supporting computer VM disk image of migrating VM is on NFS

TLM Performance: Kernel IS Class D 36GB VM Ram, 34.1GB WSS Update large amount of pages continuously VM page transfer rate is about half of dirty page generation The migration tome of TLM and TLM.1S are close TLM downtime is about 0.68 of that of TLM.1S