Remus: High Availability via Asynchronous Virtual Machine Replication.


Introduction
Provides OS- and application-agnostic high availability on commodity hardware
Based on the ability of virtualization to migrate running VMs between physical hosts
Checkpoints the running VM at very high frequency (up to 40 times per second)
Replicates whole-system state: CPU state, memory, hard disks

Goals
Generality
  – Low-level service
  – Regardless of the application or the hardware
Transparency
  – No modifications made to OS or application code
Seamless failure recovery
  – No externally visible state lost
  – Failure recovery should be very rapid

Approach
Virtualized infrastructure allows whole-system replication
Speculative execution increases system performance
Buffering allows synchronization with the replicated server to be asynchronous

Design and implementation
Pipelined checkpoints
Each epoch is divided into four stages:
1. Stop execution and copy any changed state (CPU, memory, disks) to a buffer.
2. Transmit the buffered state to the backup while the guest resumes execution. All network output is buffered in the meantime.
3. The backup acknowledges the checkpoint.
4. The buffered network output is released.
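
The four stages above can be sketched as a small simulation. This is a minimal illustration, not the Remus implementation: the `Primary` and `Backup` classes, their fields, and the dictionary standing in for VM state are all hypothetical, and the transfer is shown synchronously for brevity even though the real system overlaps it with guest execution.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup host: holds the last checkpoint it received."""
    state: dict = field(default_factory=dict)

    def receive(self, checkpoint: dict) -> bool:
        self.state.update(checkpoint)
        return True  # stage 3: acknowledge the checkpoint


@dataclass
class Primary:
    """Hypothetical primary host running the protected VM."""
    state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    pending_output: list = field(default_factory=list)
    released_output: list = field(default_factory=list)

    def run(self, writes: dict, packets: list) -> None:
        # Speculative execution: state changes take effect immediately,
        # but network output is held back until the next checkpoint is acked.
        self.state.update(writes)
        self.dirty.update(writes)
        self.pending_output.extend(packets)

    def checkpoint(self, backup: Backup) -> None:
        # Stage 1: pause and copy only the changed state to a local buffer.
        buffered_state = {key: self.state[key] for key in self.dirty}
        buffered_output = self.pending_output
        self.dirty = set()
        self.pending_output = []
        # Stage 2: transmit the buffer (the guest would be running again here).
        acked = backup.receive(buffered_state)
        # Stage 3: the backup must acknowledge before anything is released.
        if not acked:
            raise RuntimeError("backup did not acknowledge the checkpoint")
        # Stage 4: release the output covered by the acknowledged checkpoint.
        self.released_output.extend(buffered_output)


primary, backup = Primary(), Backup()
primary.run({"page0": "A"}, ["reply-1"])
primary.checkpoint(backup)
```

The point of the ordering is that a client never sees `reply-1` unless the backup already holds the state that produced it.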

Memory and CPU
The guest OS is suspended and dirtied pages are copied to a buffer
  – Because checkpoints are so frequent, most memory is unchanged between them
  – Guest’s entire physical memory is mapped once at the beginning instead of being mapped and unmapped at every checkpoint
The guest then resumes execution on the current host

Network buffering
All outbound traffic goes to a buffer implemented as a queue
  – Before the primary resumes execution, a barrier is inserted into the outbound queue
  – No packet after the barrier is released
  – When the checkpoint is acknowledged, all packets up to the barrier are released
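
The barrier mechanism can be illustrated with an ordinary queue. The sketch below is an assumption-laden toy: `OutputBuffer`, `BARRIER`, and the method names are hypothetical and do not correspond to the actual Remus or Xen code.

```python
from collections import deque

BARRIER = object()  # hypothetical sentinel marking a checkpoint boundary


class OutputBuffer:
    """Toy outbound queue: packets wait here until their checkpoint is acked."""

    def __init__(self):
        self._queue = deque()

    def send(self, packet):
        # Guest output is queued instead of going out on the wire.
        self._queue.append(packet)

    def insert_barrier(self):
        # Called just before the primary resumes after a checkpoint.
        self._queue.append(BARRIER)

    def release_to_barrier(self):
        # Called when the checkpoint is acknowledged: release every packet
        # queued before the oldest barrier, and nothing after it.
        if not any(item is BARRIER for item in self._queue):
            return []
        released = []
        while True:
            item = self._queue.popleft()
            if item is BARRIER:
                return released
            released.append(item)


buf = OutputBuffer()
buf.send("pkt-a")
buf.send("pkt-b")
buf.insert_barrier()   # checkpoint taken, guest resumes
buf.send("pkt-c")      # belongs to the next epoch
first_release = buf.release_to_barrier()
```

Packets from the next epoch stay queued behind the barrier until a later checkpoint is acknowledged.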

Disk buffering
Before protection starts, the current state of the disk on the primary is copied to the backup host.
Writes are committed immediately on the primary and buffered on the backup host.
Once the backup has received the full checkpoint and acknowledged it, it commits the buffered writes to its hard disk.
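
A minimal sketch of the backup-side behaviour described above, assuming a disk modelled as a dictionary of block numbers. The `BackupDisk` class and its method names are hypothetical. Discarding uncommitted writes on failover is an assumption that follows from the backup resuming from the most recent complete checkpoint.

```python
class BackupDisk:
    """Toy model of the backup's disk plus its buffer of uncommitted writes."""

    def __init__(self, initial_blocks):
        # Initial full copy of the primary's disk, made before protection starts.
        self.blocks = dict(initial_blocks)
        # Writes received for the checkpoint that is still in flight.
        self._pending = []

    def buffer_write(self, block_no, data):
        # The primary has already committed this write locally;
        # the backup only holds it in memory for now.
        self._pending.append((block_no, data))

    def commit_checkpoint(self):
        # Full checkpoint received and acknowledged: apply writes in order.
        for block_no, data in self._pending:
            self.blocks[block_no] = data
        self._pending.clear()

    def discard_pending(self):
        # Assumed failover path: drop writes from an incomplete checkpoint.
        self._pending.clear()


disk = BackupDisk({0: b"boot", 1: b"old"})
disk.buffer_write(1, b"new")
disk.commit_checkpoint()
```

Keeping the writes in a buffer means the backup's disk always matches some complete checkpoint, never a partial one.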

Detecting failure
If checkpoint acknowledgement times out
  – Primary assumes backup has crashed and disables protection
If checkpoint transmission times out
  – Backup assumes primary has crashed and resumes execution from the most recent checkpoint
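
The two timeout rules can be written as a pair of small decision functions. This is only a sketch: the `Action` enum, the function names, and the single `timeout` parameter are hypothetical, and the slides do not state what timeout values are used.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    DISABLE_PROTECTION = "disable protection"
    FAIL_OVER = "resume from most recent checkpoint"


def primary_action(seconds_waiting_for_ack: float, timeout: float) -> Action:
    """Primary side: a missing acknowledgement means the backup is presumed dead."""
    if seconds_waiting_for_ack > timeout:
        return Action.DISABLE_PROTECTION
    return Action.CONTINUE


def backup_action(seconds_since_last_checkpoint: float, timeout: float) -> Action:
    """Backup side: a missing checkpoint means the primary is presumed dead."""
    if seconds_since_last_checkpoint > timeout:
        return Action.FAIL_OVER
    return Action.CONTINUE
```

Each host decides using only what it can observe locally, namely how long it has been waiting for the other side.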

Evaluation
Correctness verification
  – Kernel compilation (CPU, memory, disks)
  – X11 client (network)
  – Network failures introduced at every stage
  – Backup took over successfully
  – Forced file system check reported no inconsistencies

Optimizations
Deadline scheduling
  – The checkpoint rate could be changed between checkpoints, depending on the number of dirtied pages
Page compression
  – Check each page against a cache of previously transmitted pages, and transmit only its delta
Copy-on-write checkpoints
  – Mark pages as copy-on-write and copy them from the COW buffer
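
The page compression idea can be sketched as follows. The choice of XOR as the delta and `zlib` as the compressor is an assumption made for illustration, since the slides only say that a delta against a cached copy is sent. The function names and message format are hypothetical as well.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def encode_page(page_no, page, cache):
    """Sender side: send the full page the first time, a compressed delta afterwards."""
    previous = cache.get(page_no)
    cache[page_no] = page
    if previous is None:
        return ("full", page)
    # XOR against the cached copy: unchanged bytes become zero and compress well.
    delta = bytes(a ^ b for a, b in zip(page, previous))
    return ("delta", zlib.compress(delta))


def decode_page(page_no, message, cache):
    """Receiver side: rebuild the page from a full copy or from a delta."""
    kind, payload = message
    if kind == "full":
        page = payload
    else:
        delta = zlib.decompress(payload)
        page = bytes(a ^ b for a, b in zip(delta, cache[page_no]))
    cache[page_no] = page
    return page


sender_cache, receiver_cache = {}, {}
original = bytes(PAGE_SIZE)
modified = bytearray(original)
modified[10] = 0xFF          # the guest dirties one byte of the page
modified = bytes(modified)

decode_page(0, encode_page(0, original, sender_cache), receiver_cache)
message = encode_page(0, modified, sender_cache)
rebuilt = decode_page(0, message, receiver_cache)
```

Both sides must keep identical caches, so the receiver updates its copy each time it rebuilds a page.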

Thank you