© Microsoft Corporation 20041 Windows Kernel Internals II Virtual Machine Architecture University of Tokyo – July 2004 Dave Probert, Ph.D. Advanced Operating.

Slides:



Advertisements
Similar presentations
More on Processes Chapter 3. Process image _the physical representation of a process in the OS _an address space consisting of code, data and stack segments.
Advertisements

Microprocessors system architectures – IA32 real and virtual-8086 mode Jakub Yaghob.
Architectural Support for OS March 29, 2000 Instructor: Gary Kimura Slides courtesy of Hank Levy.
Operating System Support Focus on Architecture
G Robert Grimm New York University Disco.
Operating System Structure. Announcements Make sure you are registered for CS 415 First CS 415 project is up –Initial design documents due next Friday,
1 Last Class: Introduction Operating system = interface between user & architecture Importance of OS OS history: Change is only constant User-level Applications.
1 OS & Computer Architecture Modern OS Functionality (brief review) Architecture Basics Hardware Support for OS Features.
Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access memory.
Virtualization for Cloud Computing
Virtual Machine Monitors CSE451 Andrew Whitaker. Hardware Virtualization Running multiple operating systems on a single physical machine Examples:  VMWare,
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
Xen and the Art of Virtualization. Introduction  Challenges to build virtual machines Performance isolation  Scheduling priority  Memory demand  Network.
虛擬化技術 Virtualization and Virtual Machines
Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
Tanenbaum 8.3 See references
Computer Organization
COP 4600 Operating Systems Spring 2011 Dan C. Marinescu Office: HEC 304 Office hours: Tu-Th 5:00-6:00 PM.
Microkernels, virtualization, exokernels Tutorial 1 – CSC469.
Protection and the Kernel: Mode, Space, and Context.
Operating System Support for Virtual Machines Samuel T. King, George W. Dunlap,Peter M.Chen Presented By, Rajesh 1 References [1] Virtual Machines: Supporting.
Virtualization Concepts Presented by: Mariano Diaz.
Benefits: Increased server utilization Reduced IT TCO Improved IT agility.
Xen I/O Overview. Xen is a popular open-source x86 virtual machine monitor – full-virtualization – para-virtualization para-virtualization as a more efficient.
Composition and Evolution of Operating Systems Introduction to Operating Systems: Module 2.
Virtualization Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation is licensed.
Architecture Support for OS CSCI 444/544 Operating Systems Fall 2008.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto OS-Related Hardware.
Lecture 3 Process Concepts. What is a Process? A process is the dynamic execution context of an executing program. Several processes may run concurrently,
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 2: Operating-System Structures.
CE Operating Systems Lecture 3 Overview of OS functions and structure.
 Virtual machine systems: simulators for multiple copies of a machine on itself.  Virtual machine (VM): the simulated machine.  Virtual machine monitor.
1 CSE451 Architectural Supports for Operating Systems Autumn 2002 Gary Kimura Lecture #2 October 2, 2002.
1.4 Hardware Review. CPU  Fetch-decode-execute cycle 1. Fetch 2. Bump PC 3. Decode 4. Determine operand addr (if necessary) 5. Fetch operand from memory.
Full and Para Virtualization
Lecture 26 Virtual Machine Monitors. Virtual Machines Goal: run an guest OS over an host OS Who has done this? Why might it be useful? Examples: Vmware,
Protection of Processes Security and privacy of data is challenging currently. Protecting information – Not limited to hardware. – Depends on innovation.
CSE 451: Operating Systems Winter 2015 Module 25 Virtual Machine Monitors Mark Zbikowski Allen Center 476 © 2013 Gribble, Lazowska,
Lecture 5 Rootkits Hoglund/Butler (Chapters 1-3).
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
CSCI/CMPE 4334 Operating Systems Review: Exam 1 1.
Chapter 6 Limited Direct Execution Chien-Chung Shen CIS/UD
Introduction to Operating Systems Concepts
Virtualization.
Virtual Machine Monitors
Introduction to Operating Systems
Homework Reading Machine Projects Labs
Kernel Design & Implementation
Xen and the Art of Virtualization
Hardware and OS Design and Layout.
Module 12: I/O Systems I/O hardware Application I/O Interface
Operating System Structure
CS 286 Computer Organization and Architecture
OS Virtualization.
Virtualization Techniques
A Survey on Virtualization Technologies
Architectural Support for OS
CSE 451: Operating Systems Autumn 2003 Lecture 2 Architectural Support for Operating Systems Hank Levy 596 Allen Center 1.
CSE 451: Operating Systems Autumn 2001 Lecture 2 Architectural Support for Operating Systems Brian Bershad 310 Sieg Hall 1.
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
OS Components and Structure
Introduction to Virtual Machines
CSE 451: Operating Systems Winter 2003 Lecture 2 Architectural Support for Operating Systems Hank Levy 412 Sieg Hall 1.
Outline Operating System Organization Operating System Examples
Architectural Support for OS
Introduction to Virtual Machines
Xen and the Art of Virtualization
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
Chapter 13: I/O Systems.
Presentation transcript:

© Microsoft Corporation Windows Kernel Internals II Virtual Machine Architecture University of Tokyo – July 2004 Dave Probert, Ph.D. Advanced Operating Systems Group Windows Core Operating Systems Division Microsoft Corporation

© Microsoft Corporation Hosted VM Model Windows acts as a “host” –Resources for each VM are allocated from the host –All I/O with external devices is performed through the host “Guest” code runs within a separate context –Independent address space –Specialized “VMM” kernel Process Host Kernel Host Physical Machine Virtual Machine Process VMM Kernel Host contextGuest context Guest Code

© Microsoft Corporation VM Components VMM Kernel –Thin layer, all in assembly –Code executed at ring-0 –Exception handling –External Interrupt pass- through –Page table maintenance –Located within a 32MB area of address space known as the “VMM work area” –Work area is relocatable –One VMM instance per virtual processor Host Physical Machine VMM Kernel Host contextGuest context Guest Code VMM Driver NDIS Driver Host Kernel Virtual PC Virtual Server Virtual Machine “Additions”

© Microsoft Corporation VM Components VMM Driver -Provides kernel-level VM-related services -CreateVirtualMachine -CreateVirtualProcessor -ExecuteVirtualProcessor -Implements context switching mechanism between the host and guest contexts -Loads and bootstraps the VMM kernel -Much of the security work we’ve done recently involved repackaging the VMM kernel code into the VMM driver Host Kernel Host Physical Machine VMM Kernel Host contextGuest context Guest Code VMM Driver NDIS Driver Virtual PC Virtual Server Virtual Machine “Additions”

© Microsoft Corporation VM Components NDIS Filter Driver -Allows VM to send and receive Ethernet packets via physical Ethernet hardware -Spoofs unique MAC addresses for virtual NICs -Injects packets into host Ethernet stack for guest- to-host networking Host Kernel Host Physical Machine VMM Kernel Host contextGuest context Guest Code VMM Driver NDIS Driver Virtual PC Virtual Server Virtual Machine “Additions”

© Microsoft Corporation VM Components Virtual PC / Virtual Server executables -Device emulation modules -Resource allocation -VM configuration creation & editing -VM control (start, stop, pause, save) -Scripting APIs -User interaction -Host side of guest/host integration features Virtual PC Host Kernel Host Physical Machine Virtual Server VMM Kernel Host contextGuest context Guest Code VMM Driver NDIS Driver Virtual Machine “Additions”

© Microsoft Corporation VM Components Virtual Machine “Additions” –Collection of components installed within the guest environment by the user –Implement optimizations Video SCSI Networking (in the future) Guest kernel patches –Implement guest half of guest/host integration features Clipboard sharing File drag and drop Arbitrary video resizing Virtual PC Host Kernel Host Physical Machine Virtual Server VMM Kernel Host contextGuest context Guest Code VMM Driver NDIS Driver Virtual Machine “Additions”

© Microsoft Corporation VM Execution Loop Host code repeatedly calls ExecuteVirtualProcessor VMM acts as “co-routine” (i.e. VMM state is saved and restored each time ExecuteVirtualProcessor is called) Cycles spent inside guest context are counted against the calling thread –Host code can control how much time is spent in guest Return code indicates why ExecuteVirtualProcessor returned –Time slice complete –IN or OUT instruction encountered –HLT instruction encountered

© Microsoft Corporation Processor Virtualization x86 Virtualization –Processor is non-virtualizable Poor privileged and user state separation –For example, EFLAGS register contains condition codes (user state) and interrupt mask (privileged state) Some instructions that access privileged state are non-trapping –Overly complex and messy architecture Many modes, legacy protection mechanisms and general “warts”

© Microsoft Corporation Processor Emulation In general, emulation is necessary –VM uses a binary translation mechanism Most instructions are copied directly Non-virtualizable (“dangerous”) instructions are modified –Binary translation execution imposes ~50% performance overhead

© Microsoft Corporation Direct Execution In some processor modes, it’s safe to use direct execution, others require emulation Real ModeEmulation Virtual 8086 (v86) modeDirect Execution Protected Mode Ring 3Direct Execution (with a few exceptions) Protected Mode Ring 0Emulation, unless known to be safe

© Microsoft Corporation Direct Execution “Ring Compression” –Guest ring-0, 1, 2 code is executed at ring 1 –Guest ring-3 code is executed at ring 3 –Provides correct MMU protection semantics (since ring 0-2 can access privileged pages) Direct execution of ring-0 code is only allowed if the VMM is notified that it’s “safe” –This requires patching certain “dangerous” instruction sequences in the Windows kernel and HAL –Patching is performed at runtime in memory only –Patches are different for each version of Windows kernel & HAL

© Microsoft Corporation Guest OS Patching Examples: –PUSHFD / POPFD –CLI / STI –Spin lock acquisition failure (in the future) pushfd cli moveax,[ebp+8] call[eax] popfd ret Original Code pushfd never traps (breaks IF virtualization) popfd never traps (breaks IF virtualization) cli traps, but cannot be easily patched with a jmp because it only takes up one byte This sequence prevents correct behavior in direct execution

© Microsoft Corporation Guest OS Patching Synthetic instructions –Use an illegal instruction form (reserved for us by Intel) –Five bytes in length (for ease in patching) –Exhibit same side effects of real instruction pushfd cli moveax,[ebp+8] call[eax] popfd ret vmpushfd vmcli moveax,[ebp+8] call[eax] vmpopf ret Original CodeWith Synthetic Instructions All synthetic instructions trap and are five bytes long so they can be replaced with jmp or call instructions at runtime This sequence allows correct behavior in direct execution, but generates three traps

© Microsoft Corporation Guest OS Patching Runtime Guest OS Patching –Replace synthetic instructions with subroutine calls –This technique prevents us from exposing internal VMM implementation details to OS vendors. We can change the subroutine implementations in the future. pushfd cli moveax,[ebp+8] call[eax] popfd ret vmpushfd vmcli moveax,[ebp+8] call[eax] vmpopf ret Original CodeWith Synthetic Instructions call_vmpushfd call_vmcli moveax,[ebp+8] call[eax] call_vmpopfd ret With Runtime Patches This patched sequence is correct and fast

© Microsoft Corporation Direct Execution Overhead Necessary to trap into the VMM kernel on some instructions –IN & OUT for I/O device emulation –STI & CLI for interrupt mask virtualization –INT & IRET to catch ring transitions –INVLPG and MOV to CR3 for page table virtualization Traps are expensive – and getting worse –~500 cycles on Pentium III or AMD processors; ~2000 cycles on Pentium 4 –Runtime patching of some trapping instructions is possible

© Microsoft Corporation Physical Memory & RAM Virtualized RAM –User decides how much RAM is associated with each virtual machine Physical pages –Allocated by VMM from host OS –Currently allocated at the time the VM starts, but could be allocated on demand –Host physical addresses don’t match guest physical addresses

© Microsoft Corporation Logical Page Mappings Logical Memory –Logical mappings defined by guest page tables (mostly) –VMM finds 32MB unused area for the VMM code and data (the “VMM work area”). –VMM monitors guest OS address space usage and relocates itself if necessary

© Microsoft Corporation VMM Page Tables VMM maintains its own private page table –Initially, only the VMM work area is mapped Physical CR3 PD Table VMM work area mapped here VMM Page Tables Virtual CR3 PD Table Guest Page Tables Unused area

© Microsoft Corporation VMM Page Tables VMM maintains its own private page table –Initially, only the VMM work area is mapped –Guest pages are mapped on demand as they are accessed Physical CR3 PD Table VMM work area mapped here VMM Page Tables Virtual CR3 PD Table Guest Page Tables Unused area

© Microsoft Corporation VMM Page Tables VMM maintains its own private page table –Initially, only the VMM work area is mapped –Guest pages are mapped on demand as they are accessed –Guest pages are unmapped when guest flushes its TLB –VMM work area is relocated as necessary Physical CR3 PD Table VMM work area mapped here VMM Page Tables Virtual CR3 PD Table Guest Page Tables Previous VMM location now in use by the guest

© Microsoft Corporation Memory Sharing Memory allocated with VMM APIs can be used in three ways –Mapped within the VMM work area –As guest virtual RAM (mapped into the guest address space according to the guest page tables) –Mapped within the host context (for emulated DMA operations)

© Microsoft Corporation Device Emulation Device emulation modules -Emulate behaviors of a real hardware device -Register “callbacks” for I/O port accesses -Can access virtualized “RAM” for emulated DMA operations -Communicate among themselves (e.g. Ethernet module “plugs into” the PCI bus module and communicates with the PIC module to assert interrupts) -May call host services to perform emulation -Can be suspended, saved and restored Device Emulation Models 440BX chipset with PIIX4 System BIOS (AMI) PCI Bus ISA Bus Power Management SM Bus 8259 PIC PIT DMA Controller CMOS RTC Memory Controller RAM & VRAM COM (Serial) Ports LPT (Parallel) Ports IDE/ATAPI Controllers SCSI Adapters (Adaptec 2940) SVGA Video Adapter (S3 Trio64) VESA BIOS 2D Graphics Accelerator Hardware Cursor Ethernet Adapters (DEC 21140) SoundBlaster Sound Card Keyboard Mouse

© Microsoft Corporation Device I/O Accesses I/O accesses (IN & OUT instructions) -Trap into VMM kernel -Force a context switch back to the host context where device emulation module is invoked -“Fast I/O handlers” can be called from within the VMM context -Some OUTs can be batched MMIO accesses -Caught in VMM’s page fault handler -Very expensive Host Kernel Host Physical Machine Virtual PC VMM Kernel Host contextGuest context Guest User Code Guest Kernel Guest HAL Host HAL VMM Driver Device Emulation Module OUT instr. GPF trap Context Switch

© Microsoft Corporation Discussion