Remus: High Availability via Asynchronous Virtual Machine Replication

Slides:



Advertisements
Similar presentations
Live migration of Virtual Machines Nour Stefan, SCPD.
Advertisements

Chapter 16: Recovery System
Silberschatz and Galvin  Operating System Concepts Module 16: Distributed-System Structures Network-Operating Systems Distributed-Operating.
XEN AND THE ART OF VIRTUALIZATION Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, lan Pratt, Andrew Warfield.
Live Migration of Virtual Machines Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, Andrew Warfield.
NWCLUG 01/05/2010 Jared Moore Xen Open Source Virtualization.
1 CSIS 7102 Spring 2004 Lecture 8: Recovery (overview) Dr. King-Ip Lin.
1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,
Transparent Checkpoint of Closed Distributed Systems in Emulab Anton Burtsev, Prashanth Radhakrishnan, Mike Hibler, and Jay Lepreau University of Utah,
Network Operating Systems Users are aware of multiplicity of machines. Access to resources of various machines is done explicitly by: –Logging into the.
Towards High-Availability for IP Telephony using Virtual Machines Devdutt Patnaik, Ashish Bijlani and Vishal K Singh.
CS 582 / CMPE 481 Distributed Systems Fault Tolerance.
CS-550 (M.Soneru): Recovery [SaS] 1 Recovery. CS-550 (M.Soneru): Recovery [SaS] 2 Recovery Computer system recovery: –Restore the system to a normal operational.
Remus: High Availability via Asynchronous Virtual Machine Replication.
Module – 12 Remote Replication
16: Distributed Systems1 DISTRIBUTED SYSTEM STRUCTURES NETWORK OPERATING SYSTEMS The users are aware of the physical structure of the network. Each site.
PRASHANTHI NARAYAN NETTEM.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Transaction Management and Concurrency Control.
VIRTUALISATION OF HADOOP CLUSTERS Dr G Sudha Sadasivam Assistant Professor Department of CSE PSGCT.
Case Study - GFS.
Jaeyoung Yoon Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
Presented by : Ran Koretzki. Basic Introduction What are VM’s ? What is migration ? What is Live migration ?
Highly Available ACID Memory Vijayshankar Raman. Introduction §Why ACID memory? l non-database apps: want updates to critical data to be atomic and persistent.
High-Availability Methods Lesson 25. Skills Matrix.
Business Continuity and Disaster Recovery Chapter 8 Part 2 Pages 914 to 945.
1 Fault Tolerance in the Nonstop Cyclone System By Scott Chan Robert Jardine Presented by Phuc Nguyen.
Disco : Running commodity operating system on scalable multiprocessor Edouard et al. Presented by Jonathan Walpole (based on a slide set from Vidhya Sivasankaran)
CS533 Concepts of Operating Systems Jonathan Walpole.
Remus: VM Replication Jeff Chase Duke University.
CH2 System models.
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 2.
Xen I/O Overview. Xen is a popular open-source x86 virtual machine monitor – full-virtualization – para-virtualization para-virtualization as a more efficient.
Improving Network I/O Virtualization for Cloud Computing.
Lecture 3 Process Concepts. What is a Process? A process is the dynamic execution context of an executing program. Several processes may run concurrently,
Practical Byzantine Fault Tolerance
Virtualization 3 Subtitle: “What can we do to a VM?” Learning Objectives: – To understand the VM-handling mechanisms of a hypervisor – To understand how.
Copyright © cs-tutorial.com. Overview Introduction Architecture Implementation Evaluation.
Introduction to Database Systems1. 2 Basic Definitions Mini-world Some part of the real world about which data is stored in a database. Data Known facts.
Presented by: Reem Alshahrani. Outlines What is Virtualization Virtual environment components Advantages Security Challenges in virtualized environments.
Disco : Running commodity operating system on scalable multiprocessor Edouard et al. Presented by Vidhya Sivasankaran.
Eduardo Gutarra Velez. Outline Distributed Filesystems Motivation Google Filesystem Architecture The Metadata Consistency Model File Mutation.
Chapter 10 Recovery System. ACID Properties  Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
GLOBAL EDGE SOFTWERE LTD1 R EMOTE F ILE S HARING - Ardhanareesh Aradhyamath.
COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Li Zhijian Fujitsu Limited.
Efficient Live Checkpointing Mechanisms for computation and memory-intensive VMs in a data center Kasidit Chanchio Vasabilab Dept of Computer Science,
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Recovery.
Section 06 (a)RDBMS (a) Supplement RDBMS Issues 2 HSQ - DATABASES & SQL And Franchise Colleges By MANSHA NAWAZ.
COMP25212: Virtualization 3 Subtitle: “What can we do to a VM?” Learning Objectives: –To understand the VM-handling mechanisms of a hypervisor –To understand.
Cloud Computing Lecture 5-6 Muhammad Ahmad Jan.
Cloud Computing – UNIT - II. VIRTUALIZATION Virtualization Hiding the reality The mantra of smart computing is to intelligently hide the reality Binary->
Virtual Machines Mr. Monil Adhikari. Agenda Introduction Classes of Virtual Machines System Virtual Machines Process Virtual Machines.
FTOP: A library for fault tolerance in a cluster R. Badrinath Rakesh Gupta Nisheeth Shrivastava.
Virtual Machine Movement and Hyper-V Replica
Operating Systems Distributed-System Structures. Topics –Network-Operating Systems –Distributed-Operating Systems –Remote Services –Robustness –Design.
Chapter 2: System Structures
Operating System Reliability
Operating System Reliability
Group 8 Virtualization of the Cloud
Introduction to Operating Systems
EECS 498 Introduction to Distributed Systems Fall 2017
Virtualization Layer Virtual Hardware Virtual Networking
Operating System Reliability
Operating System Reliability
Operating Systems Chapter 5: Input/Output Management
Module 17: Recovery System
Operating System Reliability
Transactions in Distributed Systems
Operating System Reliability
Operating System Reliability
Presentation transcript:

Remus: High Availability via Asynchronous Virtual Machine Replication Nour Stefan, SCPD

Introduction Related work Design Implementation Evaluation Future work Conclusions

Introduction Highly available systems it requires that systems be constructed with redundant components that are capable of maintaining and switching to backups in case of failure Commercial HA systems are expensive use specialized hardware customized software

Introduction Remus Virtualization High availability on commodity hardware migrate running VMs between physical hosts replicate snapshots of an entire running OS instance at very high frequencies (25ms) External output is not released until the system state that produced it has been replicated Virtualization create a copy of a running machine, but it does not guarantee that the process will be efficient

Introduction Running two hosts in lock-step (syncronously) Solution : reduces the throughput of memory to that of the network device Solution : 1 host executes speculatively and then checkpoint and replicate its state asynchronously system state is not made externally visible until the checkpoint is committed running the system tens of milliseconds in the past

Introduction Goals: Generality Transparency Seamless failure recovery The reality in many environments is that OS and application source may not even be available to modify Seamless failure recovery No externally visible state should ever be lost in the case of single-host failure

Approach

Approach Speculative execution Asynchronous replication buffer output1 until a more convenient time, performing computation speculatively ahead of synchronization points Asynchronous replication Buffering output at the primary server allows replication to be performed asynchronously The primary host can resume execution at the moment its machine state has been captured, without waiting for acknowledgment from the remote end

Related work XEN support for live migration Extended significantly to support frequent, remote checkpointing Virtual machine logging and replay ZAP Virtualization layer within the linux kernel SpecNFS Using speculative execution in order to isolate I/O processing from computation

Design Remus achieves high availability by propagating frequent checkpoints of an active VM to a backup physical host on the backup, the VM image is resident in memory and may begin execution immediately if failure of the active system is detected backup only periodically consistent with the primary => all network output must be buffered until state is synchronized on the backup Virtual machine does not actually execute on the backup until a failure occurs => allowing it to concurrently protect VMs running on multiple physical hosts in an N – to – 1 style configuration

Design

Design – Failure model Remus provides the following properties: The fail-stop failure of any single host is tolerable Both the primary and backup hosts fail concurrently => the protected system’s data will be left in a crash-consistent state No output will be made externally visible until the associated system state has been committed to the replica Remus does not aim to recover from software errors or nonfail-stop conditions

Design –Pipelined checkpoints Checkpointing a running virtual machine many times per second places extreme demands on the host system. Remus addresses this by aggressively pipelining the checkpoint operation epoch-based system in which execution of the active VM is bounded by brief pauses in execution in which changed state is atomically captured, and external output is released when that state has been propagated to the backup

Design –Pipelined checkpoints (1) Once per epoch, pause the running VM and copy any changed state into a buffer (2) Buffered state is transmitted and stored in memory on the backup host (3) Once the complete set of state has been received, the checkpoint is acknowledged to the primary (4) buffered network output is released

Design - Memory and CPU Writable working sets Shadow pages Remus implements checkpointing as repeated executions of the final stage of live migration: each epoch, the guest is paused while changed memory and CPU state is copied to a buffer The guest then resumes execution on the current host, rather than on the destination

Design - Memory and CPU Migration enhancements every checkpoint is just the final stop-and-copy phase of migration examination of Xen’s checkpoint code => the majority of the time spent while the guest is in the suspended state is lost to scheduling (xenstore daemon) it reduces the number of inter-process requests required to suspend and resume the guest domain it entirely removes xenstore from the suspend/resume process

Design - Memory and CPU Checkpoint support Asynchronous transmission resuming execution of a domain after it had been suspended Xen previously did not allow “live checkpoints” and instead destroyed the VM after writing its state out Asynchronous transmission the migration process was modified to copy touched pages to a staging buffer rather than delivering them directly to the network while the domain is paused

Design – Network buffering TCP Ccrucial that packets queued for transmission be held until the checkpointed state of the epoch in which they were generated is committed to the backup

Design – Disk buffering Remus must preserve crash consistency even if both hosts fail

Design – Detecting failure Simple failure detector: a timeout of the backup responding to commit requests will result in the primary assuming that the backup has crashed and disabling protection a timeout of new checkpoints being transmitted from the primary will result in the backup assuming that the primary has crashed and resuming execution from the most recent checkpoint

Evaluation

Evaluation

Future work Deadline scheduling Page compression Copy-on-write checkpoints Hardware virtualization support Cluster replication Disaster recovery Log-structured datacenters

Conclusions Remus novel system running on commodity hardware uses virtualization to encapsulate a protected VM performs frequent whole-system checkpoints to asynchronously replicate the state of a single speculatively executing virtual machine

?