Improving the Reliability of Commodity Operating Systems. Michael M. Swift, Brian N. Bershad, Henry M. Levy. Presented by Ya-Yun Lo, EECS 582 – W16.



Outline: Introduction, Nooks, Implementation, Evaluating Reliability, Performance.

Device Driver: A module that translates high-level OS requests into device-specific requests. Programmers writing device drivers are often less experienced. [Figure: applications above a kernel containing virtual memory, file systems, networking, scheduling, and device drivers. Device drivers make up 70% of Linux kernel code.]

Motivation: Kernel extensions are a major source of system failures. Device drivers alone make up 70% of Linux kernel code.


Goal: Eliminate downtime caused by drivers. Isolation: prevent system crashes. Recovery: keep applications running. [Figure: kernel, driver, application.]


Nooks: A reliability subsystem that isolates extensions from the kernel. Design for fault resistance, not fault tolerance: the system must prevent and recover from most extension mistakes. Design for mistakes, not abuse: malicious behavior is excluded.

Nooks Isolation: Isolate the kernel from extension failures. Detect extension failures before they corrupt the kernel. Stay backward-compatible with existing systems and extensions. Be practical and efficient.

Nooks Isolation Manager (NIM): A transparent OS layer inserted between the kernel and kernel extensions.

Nooks Isolation Manager (NIM): Isolation: lightweight kernel protection domains, plus the Extension Procedure Call (XPC), a new kernel service through which all communication between the kernel and extensions must go. Interposition: control flow goes through XPC, and data transfer goes through object tracking. All interfaces go through wrappers (similar to stubs in RPC).

Nooks Isolation Manager (NIM): Object tracking: controls all modifications of data structures by each extension, so extensions cannot directly modify kernel data structures. Recovery: detects and recovers from various extension faults, helped by the Nooks isolation mechanisms.
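The object-tracking idea can be sketched as a toy model in Python. This is not the Nooks implementation: the `ObjectTracker` class, its method names, and the dictionary standing in for a kernel data structure are all hypothetical. The extension only ever touches a private copy, and the tracker decides which changes reach the kernel's object.

```python
import copy


class ObjectTracker:
    """Toy tracker (hypothetical): maps kernel objects to shadow copies."""

    def __init__(self):
        self._shadows = {}  # id(kernel_obj) -> (kernel_obj, shadow copy)

    def to_extension(self, kernel_obj):
        # Hand the extension a private copy instead of the kernel's object.
        key = id(kernel_obj)
        if key not in self._shadows:
            self._shadows[key] = (kernel_obj, copy.deepcopy(kernel_obj))
        return self._shadows[key][1]

    def sync_to_kernel(self, kernel_obj, allowed_fields):
        # Copy back only the fields the extension may legitimately change.
        _, shadow = self._shadows[id(kernel_obj)]
        for field in allowed_fields:
            kernel_obj[field] = shadow[field]

    def release_all(self):
        # After an extension failure, drop the shadows; kernel objects are untouched.
        self._shadows.clear()


tracker = ObjectTracker()
packet = {"length": 64, "owner": "kernel"}   # stands in for a kernel structure
shadow = tracker.to_extension(packet)
shadow["length"] = 128          # a legitimate update
shadow["owner"] = "corrupted"   # a stray write the kernel never sees
tracker.sync_to_kernel(packet, allowed_fields=["length"])
```

After the sync, `packet` carries the new length but keeps its original owner, because only the whitelisted field is copied back.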

Implementation of Nooks: Implemented inside the Linux kernel on the Intel x86 architecture. The Linux kernel has over 700 functions callable by extensions and over 650 extension-entry functions callable by the kernel. Most interactions between the kernel and extensions go through function calls.

Isolation: Memory management: lightweight protection domains enforced with virtual memory protection, giving an extension read-only access to the kernel and read-write access to its own domain. Extension Procedure Call (XPC): transfers control safely between extensions and the kernel, similar to a Remote Procedure Call (RPC).
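The access rule for a lightweight protection domain can be illustrated with a small Python simulation. It is only a sketch: real enforcement happens in page tables, and the `Memory` class, the domain names, and the addresses are hypothetical. It also assumes the kernel itself may write anywhere, which the slide does not state.

```python
class ProtectionFault(Exception):
    """Raised when a domain writes to memory it may only read."""


class Memory:
    """Toy memory with a per-address owner (hypothetical model)."""

    def __init__(self):
        self._cells = {}  # address -> value
        self._owner = {}  # address -> owning domain name

    def allocate(self, address, owner, value=0):
        self._cells[address] = value
        self._owner[address] = owner

    def read(self, domain, address):
        # Every domain may read every address, kernel memory included.
        return self._cells[address]

    def write(self, domain, address, value):
        # Assumption: the kernel may write anywhere; an extension may write
        # only to memory owned by its own domain.
        owner = self._owner[address]
        if domain != "kernel" and owner != domain:
            raise ProtectionFault(f"{domain} wrote to {owner} memory at {address:#x}")
        self._cells[address] = value


mem = Memory()
mem.allocate(0x1000, "kernel", value=7)
mem.allocate(0x2000, "sound_driver")
mem.write("sound_driver", 0x2000, 42)      # own domain: allowed
try:
    mem.write("sound_driver", 0x1000, 0)   # kernel memory: rejected
    fault_caught = False
except ProtectionFault:
    fault_caught = True
```

The rejected write leaves the kernel's value intact, which is the property that lets a failure be detected before it corrupts the kernel.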

Interposition: Binds extensions to wrappers when the extensions are loaded and enables each extension to execute within its lightweight protection domain. A wrapper checks parameters for validity, implements call by value and result, and performs an XPC to execute the desired function.
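The three wrapper steps can be sketched in Python as follows. The names `make_wrapper`, `xpc`, `driver_fill`, and the 4096-byte limit are hypothetical, and `xpc` is a plain function call standing in for the real domain switch.

```python
class InvalidParameter(Exception):
    """Raised when a wrapper rejects an argument."""


def xpc(target, *args):
    # Stand-in for an Extension Procedure Call: here just a direct call.
    return target(*args)


def make_wrapper(target, validate):
    def wrapper(request):
        # 1. Check parameters for validity before crossing domains.
        if not validate(request):
            raise InvalidParameter(repr(request))
        # 2. Call by value and result: copy in, call, copy results back out.
        private = dict(request)
        result = xpc(target, private)   # 3. Perform the XPC.
        request.update(private)
        return result
    return wrapper


def driver_fill(request):
    # Hypothetical extension entry point: fills a buffer of the given size.
    request["data"] = b"\x00" * request["size"]
    return request["size"]


def valid_size(request):
    size = request.get("size")
    return isinstance(size, int) and 0 <= size <= 4096


safe_fill = make_wrapper(driver_fill, valid_size)
req = {"size": 4}
filled = safe_fill(req)
```

Because the callee only sees `private`, a failure partway through the call leaves the caller's `request` unchanged. The copy-out happens only after the call returns.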

Implementation Limitations: Nooks does not provide complete isolation or fault tolerance for all possible extension errors. The current implementation of recovery assumes that extensions can be killed and restarted safely.
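The kill-and-restart assumption can be illustrated with a short Python sketch. The `RecoveryManager` class and the toy driver are hypothetical, and reporting the failed call back to the caller (rather than retrying it) is a design choice of this sketch, not something the slides specify.

```python
class RecoveryManager:
    """Toy recovery (hypothetical): reload the extension after a failure."""

    def __init__(self, load_extension):
        self._load = load_extension
        self._ext = load_extension()
        self.restarts = 0

    def call(self, *args):
        try:
            return ("ok", self._ext(*args))
        except Exception as exc:
            # Assumes the extension can be killed and restarted safely:
            # discard the failed instance and load a fresh one.
            self._ext = self._load()
            self.restarts += 1
            return ("failed", str(exc))


def load_driver():
    # Hypothetical driver whose internal state is wedged by a bad request.
    state = {"wedged": False}

    def driver(x):
        if x < 0:
            state["wedged"] = True
        if state["wedged"]:
            raise RuntimeError("driver wedged")
        return x * 2

    return driver


manager = RecoveryManager(load_driver)
first = manager.call(21)    # works
second = manager.call(-1)   # wedges the driver, triggers a restart
third = manager.call(5)     # works again only because of the restart
```

The third call succeeds because the fresh instance starts with clean state. An extension that keeps state outside itself, for example in a device, would break the assumption this sketch relies on.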

Evaluating Reliability: Tested eight extensions: two sound card drivers, four Ethernet drivers, a Win95-compatible file system (VFAT), and an in-kernel Web server. Injected 400 faults, of which 317 resulted in extension failures.

Reliability Results: Nooks eliminated 99% of the crashes observed with native Linux.

Reliability Results: Overall, Nooks eliminated 55% of the non-fatal extension failures caused by the fault-injection trials.

Performance: Test machine: Dell 1.7 GHz Pentium 4 PC running Linux; MB RAM; SoundBlaster 16 sound card; Intel Pro/1000 Gigabit Ethernet adapter; single 7200 RPM, 41 GB IDE hard disk drive.

Performance: Relative performance is determined by comparing latency (Play-mp3, Compile-local) and throughput (Send-stream, Receive-stream, Serve-simple-web-page, Serve-complex-web-page).
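One plausible way to put both kinds of benchmark on a single scale is sketched below in Python. The normalization is an assumption, not taken from the slides: 1.0 means parity with native Linux and lower values mean Nooks is slower. The measurements in the usage lines are made up for illustration.

```python
def relative_performance(native, nooks, metric):
    """Normalize a Nooks measurement against native Linux (1.0 = parity)."""
    if metric == "latency":
        # Lower latency is better, so a slower Nooks run scores below 1.0.
        return native / nooks
    if metric == "throughput":
        # Higher throughput is better.
        return nooks / native
    raise ValueError(f"unknown metric: {metric}")


# Hypothetical numbers, not results from the paper.
compile_score = relative_performance(native=10.0, nooks=12.5, metric="latency")
stream_score = relative_performance(native=100.0, nooks=90.0, metric="throughput")
```

Inverting the ratio for latency keeps the direction consistent, so every benchmark reads the same way on one chart.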

Conclusion: Nooks focuses on achieving backward compatibility, so it cannot provide complete isolation and fault tolerance. With modest engineering effort, isolation and recovery can dramatically improve the system's reliability. Performance loss ranges from 0 to 60%.