Debugging operating systems with time-traveling virtual machines Sam King, George Dunlap & Peter Chen University of Michigan Min Xie Slides modifed using.

Slides:



Advertisements
Similar presentations
Debugging operating systems with time-traveling virtual machines Sam King George Dunlap Peter Chen CoVirt Project, University of Michigan.
Advertisements

IGOR: A System for Program Debugging via Reversible Execution Stuart I. Feldman Channing B. Brown slides made by Qing Zhang.
Virtual Machines What Why How Powerpoint?. What is a Virtual Machine? A Piece of software that emulates hardware.  Might emulate the I/O devices  Might.
Computer Organization and Architecture
Threads CS 416: Operating Systems Design, Spring 2001 Department of Computer Science Rutgers University
Operating System Support for Virtual Machines Sam King George Dunlap Peter Chen CoVirt Project, University of Michigan.
Copyright Arshi Khan1 System Programming Instructor Arshi Khan.
Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access memory.
Virtual Machine Monitors CSE451 Andrew Whitaker. Hardware Virtualization Running multiple operating systems on a single physical machine Examples:  VMWare,
Xen and the Art of Virtualization Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield.
CSE598C Virtual Machines and Their Applications Operating System Support for Virtual Machines Coauthored by Samuel T. King, George W. Dunlap and Peter.
Presented by: Zhiyong (Ricky) Cheng Samuel T. King, George W. Dunlap, and Peter M. Chen University of Michigan.
Chapter 3.1:Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access.
A Portable Virtual Machine for Program Debugging and Directing Camil Demetrescu University of Rome “La Sapienza” Irene Finocchi University of Rome “Tor.
Microkernels, virtualization, exokernels Tutorial 1 – CSC469.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
Chapter 6 Operating System Support. This chapter describes how middleware is supported by the operating system facilities at the nodes of a distributed.
Operating System Support for Virtual Machines Sam King George Dunlap Peter Chen CoVirt Project, University of Michigan.
CHAPTER 2: COMPUTER-SYSTEM STRUCTURES Computer system operation Computer system operation I/O structure I/O structure Storage structure Storage structure.
Section 3.1: Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random.
OPERATING SYSTEMS Goals of the course Definitions of operating systems Operating system goals What is not an operating system Computer architecture O/S.
The Functions of Operating Systems Interrupts. Learning Objectives Explain how interrupts are used to obtain processor time. Explain how processing of.
CS533 Concepts of Operating Systems Jonathan Walpole.
CE Operating Systems Lecture 3 Overview of OS functions and structure.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Memory: Relocation.
Debugging and Profiling With some help from Software Carpentry resources.
 Virtual machine systems: simulators for multiple copies of a machine on itself.  Virtual machine (VM): the simulated machine.  Virtual machine monitor.
By Teacher Asma Aleisa Year 1433 H.   Goals of memory management  To provide a convenient abstraction for programming.  To allocate scarce memory.
Seminar of “Virtual Machines” Course Mohammad Mahdizadeh SM. University of Science and Technology Mazandaran-Babol January 2010.
Operating Systems 1 K. Salah Module 1.2: Fundamental Concepts Interrupts System Calls.
A. Frank - P. Weisberg Operating Systems Structure of Operating Systems.
M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young MACH: A New Kernel Foundation for UNIX Development Presenter: Wei-Lwun.
Full and Para Virtualization
Lecture 26 Virtual Machine Monitors. Virtual Machines Goal: run an guest OS over an host OS Who has done this? Why might it be useful? Examples: Vmware,
1 Lecture 1: Computer System Structures We go over the aspects of computer architecture relevant to OS design  overview  input and output (I/O) organization.
1 Process Description and Control Chapter 3. 2 Process A program in execution An instance of a program running on a computer The entity that can be assigned.
CSE 451: Operating Systems Winter 2015 Module 25 Virtual Machine Monitors Mark Zbikowski Allen Center 476 © 2013 Gribble, Lazowska,
I/O Software CS 537 – Introduction to Operating Systems.
COMP091 – Operating Systems 1 Memory Management. Memory Management Terms Physical address –Actual address as seen by memory unit Logical address –Address.
Lecture 4 Page 1 CS 111 Online Modularity and Memory Clearly, programs must have access to memory We need abstractions that give them the required access.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Operating Systems Overview: Using Hardware.
Embedded Real-Time Systems Processing interrupts Lecturer Department University.
CSCI/CMPE 4334 Operating Systems Review: Exam 1 1.
1 Chapter 2: Operating-System Structures Services Interface provided to users & programmers –System calls (programmer access) –User level access to system.
Introduction to Operating Systems Concepts
Virtualization.
Virtual Machine Monitors
Processes and threads.
CS 6560: Operating Systems Design
Chapter 2: System Structures
Operating System Structure
Introduction to Operating Systems
OS Virtualization.
Operating System Support for Virtual Machines
Computer-System Architecture
Processor Fundamentals
Chapter 8: Memory management
Outline Module 1 and 2 dealt with processes, scheduling and synchronization Next two modules will deal with memory and storage Processes require data to.
Lecture Topics: 11/1 General Operating System Concepts Processes
CSE 451: Operating Systems Autumn 2005 Memory Management
Outline Chapter 2 (cont) OS Design OS structure
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
System calls….. C-program->POSIX call
CSE 153 Design of Operating Systems Winter 2019
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
Chapter 13: I/O Systems.
In Today’s Class.. General Kernel Responsibilities Kernel Organization
Chapter 13: I/O Systems “The two main jobs of a computer are I/O and [CPU] processing. In many cases, the main job is I/O, and the [CPU] processing is.
Presentation transcript:

Debugging operating systems with time-traveling virtual machines Sam King, George Dunlap & Peter Chen University of Michigan Min Xie Slides modifed using Sam King's original version

Cyclic debugging  Iterate, revisit previous states Inspect state of the system at each point

Problems with cyclic debugging  Long runs Each iteration costly  Non-determinism Code might take different path each time bug executed Bug may not be triggered at all  Especially relevant for multithreaded apps, OS

Example: NULL pointer  Walk call stack Variable not modified ptr == NULL?

Example: NULL pointer  Set a conditional watchpoint ptr might change often ptr == NULL?

Example: NULL pointer  Conditional watchpoint Different code path, variable never set to NULL  All these are trying to find the LAST modification

Debugging with time traveling virtual machines  Provide what cyclic debugging trying to approx. ptr = NULL!

Debugging with time traveling virtual machines (TTVM)  Reverse equivalent to any debugger motion function Reverse watchpoint, breakpoint, step  Implement using time travel to previous states Must be identical to “buggy” run Instruction level granularity

Overview  Virtual machine platform  ReVirt: virtual machine replay system  Efficient checkpoints and time travel  Using time travel for debugging  Conclude

Typical OS level debugging  Requires two computers  OS state and debugger state are in same protection domain crashing OS may hang the debugger host machine application operating system debugging stub host machine kernel debugger operating system

Using virtual-machines for debugging host machine application operating system application kernel debugger virtual machine monitor [UML: includes host operating system] debugging stub  Guest OS, operating system running inside virtual machine  Debugger functions without any help from target OS Works even when guest kernel corrupted  Leverage convenient abstractions provided by VM  How similar is the guest OS?

Similarity of guest OS  Want guest OS to be similar to host OS so bugs are portable  Differences not fundamental, result of VM platform we use  Architecture dependent code different between guest OS Low-level trap handling MMU functionality Device drivers  Use the same host driver in guest  Trap and forward privileged instructions from guest IN/OUT instructions Memory mapped I/O Interrupts DMA  98% of Linux code runs unmodified in User-Mode Linux

ReVirt: fine grained time travel  Based on previous work (Dunlap02)  Re-executes any part of the prior run, instruction by instruction  Re-creates all state at any prior point in the run  Logs all sources of non-determinism external input (keyboard, mouse, network card, clock) interrupt point  Low space and time overhead SPECweb, PostMark, kernel compilation logging adds 3-12% time overhead logging adds 2-85 KB/sec

Checkpoints: coarse grained time travel  Periodic checkpoints for coarse grained time travel  Save complete copy of virtual-machine state: simple but inefficient CPU registers virtual machine’s physical memory image virtual machine’s disk image  Instead, use copy-on-write and undo/redo logging

Checkpointing for faster time travel  Restore back to a prior checkpoint Undo-log associated with this checkpoint n – Memory pages modified between checkpoint n and n+1  Move forward to a furture checkpoint Redo-log associated with next checkpoint n+1 – Memory pages modified between checkpoint n and n+1

How to time travel backward/forward checkpoint 2 redo log undo log checkpoint 1

Sharing Log Page checkpoint 2 redo log undo log checkpoint 1checkpoint 3

Logging for Disk  Avoid copying disk blocks into undo/redo logs Maintaining in memory maps to new/old pages

How to time travel backward checkpoint 1 redo log undo log

Using time travel to implement reverse watchpoints  Example: reverse watchpoint  First pass: count watchpoints  Second pass: wait for the last watchpoint before current time checkpoint

Runtime Adding & Deleting Checkpoints  Delete checkpoints to free up space Assume 3 checkpoints (c 1, c 2, c 3 ) Merge c 2 's undo log with c 1 's undo log Merge c 2 's redo log with c 3 's redo log  Optionally add checkpoints during replay to speed up time travel operation Monitor pages changed after last checkpoint -> redo COPY all pages in last checkpoint's undo log -> undo

Using TTVM  Checkpoint at moderate intervals (e.g., 25 seconds) < 4% time overhead < 6 MB/s space overhead  Exponentially thin out prior checkpoints (Boothe 00)  Take checkpoints at short intervals (e.g., 10 seconds) < 27% time overhead < 7 MB/s space overhead

Experiences with TTVM  Corrupted debugging information TTVM still effective when stack was corrupted  Device driver bugs Handles long runs Non-determinism Device timing issues  Race condition bugs Live demo

Host device drivers in the guest OS  Modified UML to run real device drivers in the guest OS Traps & Forwards the privileged I/O instructions, DMA requests issued by guest OS device driver to actual hardware Programmers specifies which devices UML can access and VMM Enforce I/O port space & memory access for the devices  Handle undeterministic of device Log any info. sent from the device to the driver When replay, output to the device is surpressed

Experiments  Setup Host OS: Linux with skas extensions for UML and TTVM modifications Guest OS: UML port of Linux with host drivers for USB and soundcard devices

Time & Space Overhead

Conclusions  Programmers want to debug in reverse  Current debugging techniques are poor substitutes for reverse debugging  Time traveling virtual machines efficient and effective mechanism for implementing reverse debugging

Questions  Is it possible to debug device drivers without the device being present? Is it possible to replay all the interaction (both requests and responses) in such a way that the debugger can later supply the values as if the device is? ReVirt only logged one side of the communication on the assumption that the identical output could be obtained by providing identical input. However, it could potentially be useful to log runs at several locations and then debug in a lab where the device is not available.

Questions  In this paper, they mention that a performance counter on the Intel P4 was used to count the number of branches during logging. In ReVirt they talked about it being the branch_retired counter of the Athlon. Which was it actually? Or did they change hardware between the experiments?

Questions  "Replay occurs at approximately the same speed as the logged run." Some bugs only show up after a long runtime of an application under heavy load (for example, a difficult-to-find bug in a Web server) While checkpoints can be used to skip forward in time quickly, they do not necessarily catch all accesses to a particular variable that is corrupted. Is it possible to do this faster?

Questions  The first example (the USB driver) doesn't sound like it should need time-travelling debugging. The stack trace is intact, and variables' values can be seen easily. The debugger in the kernel was working fine (the failure didn't break the kernel debugger itself, or any of its dependencies). In my experience, it's usually very easy to figure out the logic that leads to such things; the difficulty is usually what policy should be used to *FIX* the problem, not to find out how the problem occurs in the first place. Why is this a compelling example in favour of time-travelling debugging?

Questions  If give the symbol table of the OS source, can we debug the source code and let it run step by step just like most IDEs do? To this question, I have used a windows kernel debugger called windbg, but it is really horrible.

Questions  The VMM must be modified to support running real device drivers in the guest OS. Can a VMM run multiple different guest operating systems in this way? And the device drivers in guest OS will be physical device-specific or not?  In the system structure for this paper, how would guest-user host process and guest-kernel host process interact with each other? Why not make the guest-user host process above guest-kernel host process?

Questions  How do they make the guest OS look like a single process to the host OS?  They run the same guest and host OS's. Was the reason their port required few changes because both OS's ran on the same machine? So if you ran Windows as the guest and Linux as the host, as long as you used versions that were designed for the same machine (i.e. x86), then the port would require few changes? I guess I don't understand why they say "UML's VMM is similar enough to the hardware interface that most code is identical between a host OS and a guest OS". What is the significance of this, and what happens if they are different OS's and this isn't the case?  TTVM needs to track all modifications that gdb makes to the virtual state to make debugging state persistent across checkpoint restores. How do they do this?

Questions  I’m just curious about the statement ‘With our extension to UML, 98% of the host OS code base (including device drivers) can be debugged in the guest OS”. How did they get to know this exact number?

Questions  Is it efficient to implement the reverse continue by two-passes replaying? And if reverse step is needed for every reversed instruction, every reverse step needs a traveling from the nearest checkpoint, which seems too inefficient.  If we have the stack dump of the crash point, why do we really need the reverse-traveling to debug? So I think, cases we really need record-and-replay is debugging parallel-concurrent programs or losing corrupted stack dump.

Questions  Using emulators/VMMs for OS debugging is an established technique. However, the authors approach makes a qualitative difference by very clever use of visualization. The paper does not clearly indicate the debugging process of OS in a multi processor system. Is it possible to use this approach in a multi processor system? If so how check pointing can be done in this case?

Multi-processor support  Checkpointing does not change  Must be able to support replay Topic of ongoing research Support at hardware level, flight data recorder () – Fast, limited amount of time recorded Software level, page protections to track sharing – Might be slow?  Might not allocate all processors to one OS

Questions  The authors point out that a para-virtualized OS has the potential to diverge in behavior when compared to a non-paravirtualized version of the same OS. Further the authors claim that bugs can be 'lost' as the two versions (PV and vanilla) of the OS diverge. The authors claim that in this case 92% of the non- driver code is identical between vanilla linux and UML. Is this LoC metric an accurate way to measure the degree of divergence between the two kernels? It is probable that the 8% of code changed reflects modifications to 'core' parts of the OS code that could drastically change the behavior of the OS. Is there perhaps a better way to measure this divergence?

Questions  TTVM uses a two pass algorithm to locate previous breakpoints, watchpoints, &c.In a way, what they present here is a framework that compresses an execution history and then interprets the resulting uncompressed execution stream to do useful debugging things (reverse breakpoints in this case). what else could be done with this framework to help debug OSes?

Questions

Thank You ^_^