
System VMs This material is based on the book, Virtual Machines: Versatile Platforms for Systems and Processes, Copyright 2005 by Elsevier Inc. All rights.


1 System VMs This material is based on the book, Virtual Machines: Versatile Platforms for Systems and Processes, Copyright 2005 by Elsevier Inc. All rights reserved. It has been used and/or modified with permission from Elsevier Inc.

2 System VMs 2 Support multiple guest OSes on a single hardware platform, all running the same ISA. [Figure: Linux, Windows, and OS/2 applications each run on their own guest OS; each guest OS runs on a virtual Intel x86 machine, and all share one Intel x86 hardware platform.]

3 System VMs 3 System VM Outline  Applications  Virtualizing Processors  Virtualizing Memory  Virtualizing I/O  Formal Virtualizability – ISA features  Case Studies – IBM VM, x86/VMware, Intel VT-x

4 System VMs 4 Applications  Simultaneous support for multiple OSes/Apps Easy way to implement timesharing  Simultaneous support for different OSes/Apps E.g. Windows and Unix  Error containment If a VM crashes, the other VMs can continue to work Assumes VMM is correct (smaller/simpler)  Operating System debugging Can proceed while system is being used for normal work

5 System VMs 5 Applications, contd.: Operating system migration can proceed while the “old” OS continues to be used. [Figure: migration timeline. At first, production users run the old release while system programmers test the new release. Once the new release is installed, converted production users run it alongside unconverted production users on the old release. Later, system programmers test a newer release while some production users remain permanently unconverted.]

6 System VMs 6 Applications, contd.  Retrofitting new features Have VMM transform new device into a virtual device  Support for multiple networked machines on one physical machine Allows debug of network software  Enables complex debugging and performance monitoring tools By putting them in the VMM (not the guest OS)  Education

7 System VMs 7 History  Early-60s IBM M44/44X VM for modified IBM 7044 “close enough to a virtual machine to show that ‘close enough’ did not count”  Mid-60s IBM CP-40 Time-sharing system that protects users via virtual machines aka “pseudo machines” Used modified IBM 360/40 Implemented via assoc. memory and microcode VMs used real memory; VMM managed virtual memory  CMS Cambridge (conversational) monitor system – Single user OS developed for VMs (like DOS)

8 System VMs 8 History  Mid/late-60s IBM 360/67 -- CP-67 First 360 with VM. CMS an essential part  Late 60s/early 70s VMs blossomed as a research topic  Early 70s several VM implementations Honeywell DEC RCA Several university projects

9 System VMs 9 The Virtual Machine Monitor (VMM) manages the real hardware resources. All guest systems must be given logical hardware resources. All resources are virtualized, either by partitioning real resources or by sharing real resources. Guest state must be managed, either by using indirection or by copying. [Figure: Linux, Windows, and OS/2 applications run above the Virtual Machine Monitor (VMM), which runs on an x86 PC.]

10 System VMs 10 System VMs: Processor Mgmt/Protection  VMM runs in system mode VMM manages/protects processor through conventional mechanisms  Guest OSes run in user mode  Guest OSes do not have direct control over hardware resources All attempts to interact w/ hardware resources are intercepted by VMM  VMM manages shadow copies of Guest System state (incl. control registers)  VMM schedules and runs Guest Systems

11 System VMs 11 VM Timesharing: The VMM timeshares resources among guests, much as an OS timeshares applications. [Figure: timeline of first VM active, VMM active, next VM active. While the VMM is active: the timer interrupt occurs; the VMM saves the architected state of the running VM; the VMM determines the next VM to be activated; the VMM restores the architected state for the next VM; the VMM sets the PC to the timer interrupt handler of the OS in the next VM; the VMM sets the timer interval and enables interrupts.]
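The timer-driven switch on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: `GuestVM`, `TimesharingVMM`, `timer_handler_pc`, and the `interrupted_pc` field are hypothetical names, the architected state is reduced to a dict, and round-robin is an assumed scheduling policy that the slides do not specify.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class GuestVM:
    name: str
    # Architected state kept by the VMM while this guest is not running.
    saved_state: dict = field(default_factory=lambda: {"pc": 0, "regs": [0, 0, 0, 0]})
    timer_handler_pc: int = 0x100  # hypothetical address of the guest OS timer handler


class TimesharingVMM:
    def __init__(self, guests, quantum=10):
        self.guests = guests
        self.quantum = quantum
        self.current = 0
        self.cpu = copy.deepcopy(guests[0].saved_state)  # stand-in for the real CPU
        self.timer = quantum

    def on_timer_interrupt(self):
        # Save the architected state of the VM that was running.
        self.guests[self.current].saved_state = copy.deepcopy(self.cpu)
        # Determine the next VM to activate (round-robin is an assumption).
        self.current = (self.current + 1) % len(self.guests)
        nxt = self.guests[self.current]
        # Restore the next VM's architected state.
        self.cpu = copy.deepcopy(nxt.saved_state)
        # Enter the next guest OS at its timer interrupt handler, remembering
        # where the guest was interrupted (simplified to one extra field).
        self.cpu["interrupted_pc"] = self.cpu["pc"]
        self.cpu["pc"] = nxt.timer_handler_pc
        # Re-arm the interval timer before handing control to the guest.
        self.timer = self.quantum
        return nxt.name
```

A real VMM would also track each guest's virtual timer and interrupt-enable state; the sketch keeps only the save, select, restore, and re-arm steps.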

12 System VMs 12 Native and Hosted VMs Non-privileged modes Privileged Mode Applications OS Traditional uniprocessor system Hardware Virtual Machine VMM Hardware Virtual Machine Host OS Hardware VMM Virtual Machine Host OS Hardware VMM Native VM system User-mode Hosted VM system Dual-mode Hosted VM system

13 System VMs 13 Virtualizing State: Indirection. Hold guest state in VMM memory and change a pointer on guest switch. Example: registers. [Figure: the processor's register block pointer selects one of the blocks of register values for VM 1, VM 2, and VM 3 held in VMM memory.]

14 System VMs 14 Virtualizing State: Copying. Hold guest state in VMM memory and copy the state on guest switch. [Figure: register values for VM 1, VM 2, and VM 3 are held in VMM memory; one set at a time is copied into the processor registers.]
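The two ways of managing guest register state, indirection and copying, can be contrasted with a small sketch. The class names and the list-of-lists model of VMM memory are assumptions made for illustration; they are not taken from the slides.

```python
class IndirectionCPU:
    """Registers live in VMM memory; the processor reaches them via a pointer."""

    def __init__(self, blocks):
        self.blocks = blocks      # VMM memory: one register block per VM
        self.block_ptr = 0        # register block pointer

    def switch_to(self, vm):
        self.block_ptr = vm       # guest switch only changes the pointer

    def read(self, reg):
        return self.blocks[self.block_ptr][reg]

    def write(self, reg, value):
        self.blocks[self.block_ptr][reg] = value


class CopyingCPU:
    """Registers live in the processor; VMM memory holds the inactive copies."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.current = 0
        self.regs = list(blocks[0])   # processor registers

    def switch_to(self, vm):
        self.blocks[self.current] = list(self.regs)  # copy out the running VM
        self.regs = list(self.blocks[vm])            # copy in the next VM
        self.current = vm

    def read(self, reg):
        return self.regs[reg]

    def write(self, reg, value):
        self.regs[reg] = value
```

Both classes give each guest the same view of its registers. The difference is the cost profile: indirection makes the switch cheap but adds a level of lookup to every access, while copying pays on each switch and keeps accesses direct.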

15 System VMs 15 Processor Management/Protection: Traps and interrupts (and system calls) transfer to the VMM; the VMM determines the appropriate guest OS and transfers to it. A guest OS “return” to a user application transfers to the VMM, which bounces the return back to the guest application. A read or write of a protected control register traps to the VMM; the VMM reads or modifies the guest copy, may modify the shadow copy, and returns to the guest. [Figure: a privileged operation in the guest OS traps to the VMM, which checks privileges, performs the operation, and returns to the guest's next instruction; a system call or trap from the application enters the VMM at the real vector location and is passed on to the guest OS at its virtual vector location.]

16 System VMs 16 OS VMs: Key Issue – ISA Virtualizability  What if privileged instruction no-ops in user mode? (rather than trapping) Then… VMM can’t intercept when Guest OS attempts the privileged instruction  What if user can access memory with real address? Then… a guest OS may see that the real memory it really has is different from the memory it thinks it has  What if user can read system control registers? Then… guest OS may not read the same state value that it thinks it wrote

17 System VMs 17 Virtualizability (Popek, Goldberg, 74)  Classic work in formalizing OS VM concepts  Defines basic VM properties  Defines properties of instruction sets  Proves that VMM can be constructed if instruction set properties hold  Extends to recursive VMs  Reduces to hybrid VMs

18 System VMs 18 VM Properties  Virtual Machine: efficient, isolated duplicate of the real machine  Virtual Machine Monitor: software that implements VMs  Essential VMM characteristics 1) Provides an environment essentially identical to the real machine Except timing and availability of resources 2) Programs show only minor decreases in speed  Mostly native instruction execution 3) Has complete control of system resources

19 System VMs 19 Privileged Instructions, Definition: Instructions that trap if executed in user mode and do not trap in supervisor mode. Privileged instructions are required to trap; a no-op in user mode is not enough.

20 System VMs 20 Control Sensitive instructions: 1.All instructions that change the amount of (memory) resources (or the mapping) base/limit register in simplified paper version page table in general 2.All instructions that change the processor mode  Instructions that provide control of resources  Examples: Load TLB (if TLB is architected) Load control register Return to user mode

21 System VMs 21 Behavior Sensitive instructions: 1. All instructions whose results depend on the mapping of physical memory 2. All instructions whose behavior depends on the mode  Instructions whose behavior depends on configuration of specific resources (and who owns them)  Examples: Load physical address POPF (Intel x86): Interrupt-enable flag remains unaffected in user mode

22 System VMs 22 Instruction Types -- Summary: [Figure: instructions divide into privileged and non-privileged; sensitive instructions are control sensitive, behavior sensitive, or both; the rest are innocuous.] Innocuous instructions are those that are neither control sensitive nor behavior sensitive.

23 System VMs 23 VMM components: [Figure: when an instruction trap occurs, the dispatcher hands the privileged instruction either to the allocator or to one of interpreter routines 1 through n. The allocator handles instructions that desire to change machine resources, e.g. Load Relocation Bounds Register. The interpreter routines handle instructions that do not change machine resources but access privileged resources, e.g. IN, OUT, Write TLB.]

24 System VMs 24 VMM components  Dispatcher Target of vectored traps – entry point for VMM Decides which of other components to call  Allocator Decides which system resources should be provided and to manage shared resources among VMs  Interpreters Emulate the effects of privileged instructions  VMM runs in supervisor mode; all other software in user mode

25 System VMs 25 Privileged Instruction Handling: LPSW (Load Program Status Word) loads a word that includes the mode bit and PC (among other things). [Figure: guest OS code in the VM (user mode) executes the privileged instruction LPSW, which traps to the dispatcher in the VMM code (privileged mode). The LPSW routine changes mode to privileged, checks the privilege level in the VM, emulates the instruction, computes the target, restores mode to user, and jumps to the target, which is the next guest instruction to run.]
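The LPSW routine on this slide can be sketched as follows. This is a minimal sketch, not the book's code: `GuestCPU`, `emulate_lpsw`, the dict form of the PSW, and the returned strings are all hypothetical, and a real PSW carries far more than a mode bit and a PC.

```python
from dataclasses import dataclass


@dataclass
class GuestCPU:
    virtual_mode: str = "supervisor"  # mode the guest believes it is in
    pc: int = 0


def emulate_lpsw(guest, new_psw):
    """VMM routine for a trapped LPSW; new_psw is a dict with 'mode' and 'pc'."""
    # The real machine was in user mode (that is why LPSW trapped). What
    # matters is the virtual privilege level of the code that issued it.
    if guest.virtual_mode != "supervisor":
        # A guest application tried a privileged instruction: the guest OS,
        # not the VMM, should see this as a privilege violation.
        return "reflect_trap_to_guest_os"
    guest.virtual_mode = new_psw["mode"]  # emulate: load the virtual mode bit
    guest.pc = new_psw["pc"]              # compute the target of the LPSW
    return "resume_guest_at_target"       # real mode stays user throughout
```

The check against the virtual mode is what lets one real trap serve two cases: emulation for the guest OS and reflection for a misbehaving guest application.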

26 System VMs 26 Virtual Machine “requirements” 1. All innocuous instructions are executed by the hardware directly 2. The allocator must be invoked when any program attempts to affect system resources 3. Any program executes exactly as on real hardware except For timing Availability of system resources  A VMM satisfies all three requirements  Precise versions of informal definitions given earlier

27 System VMs 27 Virtual Machines: Main Theorem: A virtual machine monitor can be constructed if the set of sensitive instructions is a subset of the set of privileged instructions. The proof shows: equivalence, by interpreting privileged instructions and executing the remaining instructions natively; resource control, by having all instructions that change resources trap to the VMM; efficiency, by executing all non-privileged instructions directly on hardware. A key aspect of the theorem is that it is easy to check.
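Because the theorem's condition is a plain subset test, it can be checked mechanically once an ISA's instructions are classified. The sketch below assumes a toy ISA; the function names and instruction names are hypothetical (the names loosely follow the control-sensitive examples given earlier in this presentation).

```python
def is_virtualizable(sensitive, privileged):
    """Popek-Goldberg condition: every sensitive instruction must trap."""
    return set(sensitive) <= set(privileged)


def critical_instructions(sensitive, privileged):
    """Sensitive instructions that do not trap; these defeat trap-and-emulate."""
    return set(sensitive) - set(privileged)


# Toy ISA A: all sensitive instructions are privileged.
isa_a_sensitive = {"LOAD_TLB", "LOAD_CR", "RETURN_TO_USER"}
isa_a_privileged = {"LOAD_TLB", "LOAD_CR", "RETURN_TO_USER", "HALT"}

# Toy ISA B: READ_MODE reveals the processor mode without trapping.
isa_b_sensitive = isa_a_sensitive | {"READ_MODE"}
isa_b_privileged = isa_a_privileged

print(is_virtualizable(isa_a_sensitive, isa_a_privileged))
print(sorted(critical_instructions(isa_b_sensitive, isa_b_privileged)))
```

The hard part in practice is the classification itself, which means reading the ISA manual instruction by instruction; the subset test is the easy step.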

28 System VMs 28 Recursive Virtualization Non-privileged modes Privileged Mode Virtual Machine VMM Hardware Virtual Machine Virtual Machine 2 nd level VMM Virtual Machine

29 System VMs 29 Recursive Virtualization Running a VMM as a VM on a VM on a VM…. Theorem: A conventional third generation computer is recursively virtualizable if it is (a) virtualizable, and (b) a VMM without any timing dependences can be constructed for it Proof – A VMM is a program and from the VM theorem will be “identically performing” except for timing dependences and resource constraints. Timing is excluded in the theorem; Resource constraints only limit the depth of recursion.

30 System VMs 30 Hybrid Virtualization  Some ISAs are more virtualizable than others User sensitive instructions Executed in user mode and can change memory resources or processor mode, or whose behavior depends on real memory locations Supervisor sensitive instructions Executed in supervisor mode and can change memory resources or processor mode, or whose behavior depends on real memory locations

31 System VMs 31 Hybrid Virtualization: Theorem: A hybrid virtual machine monitor can be constructed if the set of user sensitive instructions is a subset of the set of privileged instructions. Nonprivileged supervisor sensitive instructions are OK; example: PDP-10 JRST 1, return to user mode (does not trap if already in user mode). When the VMM executes the VM supervisor, it must use some form of emulation to locate supervisor sensitive instructions; efficiency is low, but only in the VM supervisor, not in user code. If a user sensitive instruction is not privileged, then the VMM must emulate all the user code, which fails the efficiency condition, but “binary translation” can be done more efficiently than interpreting.

32 System VMs 32 Case Study: Virtualizing the x86 ISA  x86 Evolved through many extensions  Instruction set is not (strictly) virtualizable Nor is it hybrid virtualizable

33 System VMs 33 X86 Processor Control: Uses “baroque” late-1970s-style protection rings; Unix was a reaction to this style. Four rings, 0-3: ring 0, OS kernel; ring 1, high priority drivers and OS services; ring 2, low priority device drivers; ring 3, user. Transfer to a lower ring (higher privilege) must go through a “gate”. [Figure: the four rings, labeled 3, 2, 1, 0.]

34 System VMs 34 Memory Mapping: Segments map to a 4 GB linear memory space; the 4 GB space maps to fixed-size pages. Segment descriptor info: Valid, Base, Limit, Type (code or data), R/W rights, Descriptor Privilege Level (DPL), etc.

35 System VMs 35 Memory Mapping: [Figure: the segment registers (one code, one stack, and several data segment registers) each hold base, limit, rights (R/W), and Descriptor Privilege Level (DPL), plus a Requestor Privilege Level (RPL). Segment descriptors are loaded into the segment registers from the segment descriptor tables, which are located through the descriptor table registers. The code and data segments lie in the 4 GB memory space, which a two-level page table maps to real pages.]

36 System VMs 36 Addressing  Addressing is via Segment Registers  Segment Registers CS code segment SS stack segment DS, ES, FS, GS data segments All memory accesses are via a segment register  Segment descriptors are entered into segment registers And given an RPL, Requestor Privilege Level In some cases privilege is lowest of RPL,DPL e.g. when pointers are passed

37 System VMs 37 X86 Processor/Memory Protection: CPL (current privilege level) is normally determined by the DPL of the current code segment; CPL == processor mode. To access data, CPL ≤ DPL. To call a procedure, must enter through a gate if CPL(callee) < CPL(caller).
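The two protection checks on this slide are numeric comparisons on ring numbers, with ring 0 the most privileged. The sketch below uses hypothetical function names; the optional `rpl` argument reflects the x86 rule that the weaker (numerically larger) of CPL and RPL is used when a selector carries a requestor privilege level.

```python
def can_access_data(cpl, dpl, rpl=None):
    """Data access is allowed when the effective privilege level <= DPL."""
    # When an RPL accompanies the selector, the weaker of CPL and RPL counts.
    effective = cpl if rpl is None else max(cpl, rpl)
    return effective <= dpl


def needs_gate(caller_cpl, callee_cpl):
    """A call into a more privileged (numerically lower) ring must use a gate."""
    return callee_cpl < caller_cpl


print(can_access_data(cpl=3, dpl=0))         # user code touching kernel data
print(can_access_data(cpl=0, dpl=3))         # kernel code touching user data
print(needs_gate(caller_cpl=3, callee_cpl=0))
```

The RPL case matters when a kernel routine dereferences a pointer passed in by user code: with the caller's ring recorded in RPL, the kernel cannot be tricked into touching data the caller could not reach itself.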

38 System VMs 38 X86 Instruction Set Virtualizability: Ordinarily, levels 0, 1, 2 == supervisor and level 3 == user. To virtualize, everything runs at level 3. IN, INS, OUT, OUTS (I/O instructions): perform the check CPL ≤ IOPL (I/O Privilege Level); not privileged (by Goldberg’s definition); control sensitive (I/O is a resource), with action sensitive to CPL; could be user sensitive. PUSHF, POPF: push the EFLAGS register onto the stack and pop it from the stack; EFLAGS contains IOPL (among other things), and this field indicates the I/O privilege level of the current task.

39 System VMs 39 X86 ISA Virtualizability, contd.  SGDT, SIDT, SLDT, SMSW, STR Copy descriptor pointer register, or system state information Typical manual entry: “The SGDT and SIDT instructions are only useful in operating-system software; however, they can be used in application programs without causing an exception to be generated” E.g. behavior sensitive, non privileged  VERR/VERW Verify if addressed segment is readable or writeable by CPL – Seem like perfectly reasonable instructions, BUT behavior sensitive and not privileged

40 System VMs 40 X86 ISA Virtualizability, contd.  LAR/LSL LAR -- load access rights and DPL LSL – load segment limit May no-op, in effect, if CPL isn't good enough. I.e. performs CPL/RPL check before it does inst. Behavior sensitive and not privileged  MOVs, PUSH/POP to/from segment registers Copy RPL from segment register Behavior sensitive and not privileged  Pre-Scanning is probably a necessity

41 System VMs 41 Hybrid Virtualization: Patching: Scan the guest OS, find problem instructions, and replace each with a jump to the VMM. [Figure: the scanner and patcher turns the original program into the patched program; each discovered critical instruction is replaced by a control transfer (e.g. a trap) to its patch in the VMM code.]
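The scan-and-patch idea can be sketched on a toy instruction list. Everything here is an assumption made for illustration: the `CRITICAL` set is only a small subset of the x86 instructions this presentation flags, `TRAP_TO_VMM` is a made-up pseudo-instruction, and instructions are modeled as strings with one per address.

```python
CRITICAL = {"POPF", "SGDT", "SIDT"}  # illustrative subset, not a complete list


def patch(program):
    """Replace critical instructions with a transfer to the VMM.

    Returns the patched program and a table that lets the VMM recover the
    original instruction for each patched address.
    """
    patched, patch_table = [], {}
    for addr, instr in enumerate(program):
        opcode = instr.split()[0]
        if opcode in CRITICAL:
            patch_table[addr] = instr                 # remember what was here
            patched.append(f"TRAP_TO_VMM {addr}")     # VMM emulates it later
        else:
            patched.append(instr)                     # innocuous: runs natively
    return patched, patch_table


program = ["MOV a, b", "POPF", "ADD a, 1", "SGDT m"]
patched, table = patch(program)
print(patched)
print(table)
```

On real x86 code the scan is much harder than this: instructions have variable length, and code and data can be interleaved, so finding instruction boundaries is itself a problem the sketch ignores.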

42 System VMs 42 Hybrid Virtualization: Code Caching: Scan the guest OS, “translate” it into a code cache, find problem instructions, and replace them with a jump to the VMM. [Figure: the patched program transfers control (e.g. via a trap) to the VMM, which uses a translation table to map blocks 1, 2, and 3 of the program to blocks 1, 2, and 3 in the code cache. A code section is emulated in the code cache using specialized emulation routines, and two critical instructions can be combined into a single block.]

43 System VMs 43 Virtualizing Memory: Review PT Pointer OS managed Real Pages user super user super process 1 PT process n PT OS memory region Context switch

44 System VMs 44 Virtualizing Memory  Real memory partitioning? Could be fixed partition per guest => inefficient Typically flexible partitioning via VMM management  Guest manages its virtual page tables  Guest page table addresses are write protected  VMM manages shadow page tables that reflect actual mapping to physical pages Note Real / Physical page distinction  VMM can change shadow page table by writing page table pointer i.e. virtual machine state change via indirection

45 System VMs 45 Virtualizing Memory – Example guest OS switch context switch context switch system call

46 System VMs 46 Virtualizing Memory – Operations: When a guest application performs a system call, it traps to the VMM, and the VMM changes the shadow mapping to reflect the guest privilege change. When a guest OS performs a context switch, it writes the PT pointer, which traps to the VMM; the VMM writes the guest PT pointer and modifies the shadow PT pointer. [Figure: the guest OS's write of the PT pointer traps to the VMM, which checks privileges, writes the guest PT pointer, writes the shadow PT pointer, and returns to the guest's next instruction.]
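The context-switch case can be sketched with dictionaries standing in for page tables. This is a minimal sketch under stated assumptions: `ShadowMMU`, `build_shadow`, and the dict representation are hypothetical, and the shadow table is rebuilt eagerly on each switch, whereas a real VMM would typically fill it lazily on faults.

```python
def build_shadow(guest_pt, real_to_phys):
    """Compose the guest's virtual->real map with the VMM's real->physical map."""
    return {v: real_to_phys[r] for v, r in guest_pt.items() if r in real_to_phys}


class ShadowMMU:
    def __init__(self, real_to_phys):
        self.real_to_phys = real_to_phys  # VMM's own mapping for this guest
        self.guest_pt_ptr = None          # value the guest believes it wrote
        self.shadow_pt = {}               # table the hardware actually walks

    def on_guest_write_pt_pointer(self, guest_pt):
        # The write trapped to the VMM: record the guest's pointer so later
        # reads by the guest see what it wrote, then install the shadow table.
        self.guest_pt_ptr = guest_pt
        self.shadow_pt = build_shadow(guest_pt, self.real_to_phys)

    def translate(self, vpage):
        return self.shadow_pt[vpage]      # a KeyError stands in for a page fault


mmu = ShadowMMU({10: 700, 11: 701, 12: 702})
process_a = {0: 10, 1: 11}   # guest page table: virtual page -> real page
process_b = {0: 12}
mmu.on_guest_write_pt_pointer(process_a)
print(mmu.translate(1))
mmu.on_guest_write_pt_pointer(process_b)
print(mmu.translate(0))
```

Keeping `guest_pt_ptr` separate from `shadow_pt` is the point of the exercise: the guest reads back its own pointer, while the hardware only ever sees the VMM's table.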

47 System VMs 47 Virtualizing Memory -- TLBs  TLB plays role of page table  Page table is just a software structure of which the VMM has no special knowledge  Assume TLB entry: PId, Protection bits, usage bits, real page frame  Virtualize TLBs VMM keeps track of Guest’s copies VMM manages real copy Real TLB holds subset of pages mapped in Guest copy  Virtualize PIds VMM manages real PIds Keep track of mapping from guest PIds to real PIds At any given time all TLB entries with same PId are associated with same guest

48 System VMs 48 Virtualizing TLBs  TLB Read/Write are privileged instructions Behavior and control sensitive  Guest OS write TLB Intercepted by VMM VMM updates guest’s virtual copy VMM may modify real TLB  Guest OS read TLB Intercepted by VMM VMM reads guest’s virtual copy and returns contents to guest May have to merge in usage data from real version in TLB

49 System VMs 49 Virtualizing TLBs  TLB miss Traps to VMM VMM check to see if virtual TLB maps page If so, VMM handles it and silently returns Else, VMM reflects fault to guest OS  TLB management Can’t switch TLBs via indirection as with page tables PId management can give similar control, however  Guest system call/returns (privilege changes) Flush old mode PId entries New mode TLB entries re-loaded on demand Or write new TLB entries with privileges Use two real PIds per virtual PId One with virtual system mode privileges Other with virtual user mode privileges
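The TLB-miss handling on this slide reduces to one lookup in the guest's virtual TLB. The sketch below is an illustration only: `VirtualTLB`, its method names, the returned strings, and the oldest-first eviction policy are assumptions, and process IDs and protection bits are left out.

```python
class VirtualTLB:
    def __init__(self, real_to_phys, capacity=2):
        self.real_to_phys = real_to_phys  # VMM mapping: real page -> physical page
        self.capacity = capacity          # size of the real (hardware) TLB
        self.guest_tlb = {}               # guest's virtual copy: vpage -> real page
        self.real_tlb = {}                # hardware TLB: vpage -> physical page

    def guest_write(self, vpage, rpage):
        # The guest's TLB write was intercepted: update the virtual copy and
        # drop any stale entry for that page from the real TLB.
        self.guest_tlb[vpage] = rpage
        self.real_tlb.pop(vpage, None)

    def on_miss(self, vpage):
        if vpage in self.guest_tlb:
            # The guest already maps this page, so it must never see the miss.
            if len(self.real_tlb) >= self.capacity:
                oldest = next(iter(self.real_tlb))   # evict oldest entry
                del self.real_tlb[oldest]
            self.real_tlb[vpage] = self.real_to_phys[self.guest_tlb[vpage]]
            return "handled_by_vmm"
        return "reflected_to_guest"
```

The real TLB here holds only a subset of what the guest's copy maps, which is why a miss in hardware does not always mean a miss from the guest's point of view.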

50 System VMs 50 Virtualizing I/O  Hardest part of virtualization Many device types Many devices of each type Each with its own driver New devices may be added during lifetime of system  In older, “classic” systems, less of a problem Entire system developed by one company Far fewer devices to worry about Channels (IO Processors) isolated key IO software

51 System VMs 51 I/O Architecture  I/O instructions Special privileged opcodes Similar to loads/stores Address and data read/written on I/O bus  Memory mapped I/O Load/stores to special (protected) memory addresses Addresses/data decoded by hardware and translated to I/O addresses/data  Addresses indicate I/O devices/registers  Data can be status, commands, or real data

52 System VMs 52 I/O Architecture (contd): DMA (block) transfers may require several I/O operations: starting address(es), block length, and command (read, write, interrupt on completion); this requires exclusive device access. Interrupts from I/O devices force processor transfers to I/O software routines. [Figure: a device interface with status, decode, start address, block size, and data buffer elements, connected to the data bus and address bus.]

53 System VMs 53 I/O Management: review: The OS manages I/O resources: it allocates space on storage devices, etc., and serializes requests for shared devices. User software performs system calls with general I/O requests. The OS converts I/O calls to driver calls; the driver contains device-specific software (exact commands, controller registers, etc.) and generates device- (and bus-) specific I/O operations. [Figure: the application makes system calls to the operating system (which includes the VM mgr); the operating system makes driver calls to the I/O drivers; physical memory and I/O operations go to the hardware.]

54 System VMs 54 Device Types  Dedicated Monitor, mouse, keyboard Device can’t be virtualized; must be shared (under user control) VMM still controls due to privileged mode  Partitioned Disk Make multiple, smaller virtualized versions  Shared Network adapter VMM manages virtual state information Translate virtual requests to physical requests  Spooled Printer Shared but at coarse granularity

55 System VMs 55 Spooled Devices: Two-level spool table. A VM first writes to its VM spool area. When ready, the VMM copies to the VMM spool area and then invokes the device. When the device is finished, both the VM and VMM spool tables receive “complete”. Optimizations are possible, e.g. the VMM uses the VM spool buffer.

Virtual Machine 1 Spool Table (area at 10000)
Program  Status     Location  Size  Real loc.
A        Printed    1000      400   11000
B        Completed  2000      200   12000
C        Running    3000      200   13000
D        Completed  4000      500   14000

Virtual Machine 2 Spool Table (area at 20000)
Program  Status     Location  Size  Real loc.
P        Running    1000      400   21000
Q        Completed  2000      800   22000

VMM Spool Table (area at 30000)
Program  Status    Size  VM  Real loc.
A        Printed   400   1   30000
Q        Printing  800   2   31000
B        Waiting   200   1   31800
D        Waiting   500   1   30400

56 System VMs 56 Spooled Devices: [Figure: the same Virtual Machine 1, Virtual Machine 2, and VMM spool tables as on slide 55.]

57 System VMs 57 Non-existent Devices  Implement virtual version only  Example: network adapter Allows VMs on same platform to communicate

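58 System VMs 58 I/O Interception Points: At the system call interface, at the driver call interface, or at the I/O device interface. Attempts to interact with virtual devices are intercepted by the VMM, which translates them to real devices. [Figure: the three interception points marked on the I/O stack: system calls between the application and the operating system (with its VM mgr), driver calls between the operating system and the I/O drivers, and physical memory and I/O operations between the drivers and the hardware.]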

59 System VMs 59 At system call interface  System call traps to VMM  VMM interprets system call to produce driver calls  VMM contains shadow drivers (Implement VMM with driver interface compatible with some existing OS?)  Guest OS contains virtual I/O code and drivers Must still be executed, for correct guest state updates  Problems VMM must interpret all I/O system calls for all guest OSes VMM must have access to drivers for all real devices I/O initiated by guest OS may not always pass through call interface

60 System VMs 60 At driver call interface: The guest OS contains driver stubs. Guest OS driver calls can operate on generic virtual devices, to simplify conversion. The VMM contains shadow drivers, which correspond to real devices. Generic I/O operations are passed to the VMM and converted to shadow driver calls. Problems: the VMM must have access to real drivers; generic drivers are needed for each guest OS; guest OSes must have a well defined, modular driver call interface. [Figure: the guest application makes system calls to the guest OS; the guest OS makes driver calls to generic I/O drivers; their generic I/O operations are interpreted by the VMM, whose I/O drivers issue I/O operations to the hardware.]

61 System VMs 61 At I/O device interface: Guest OSes contain real drivers. Low level I/O operations trap to the VMM. The VMM must check/translate the I/O operation; if it is legal, the VMM performs the I/O operation on behalf of the guest and then passes control back to the guest. Problems: the VMM must know some device specifics (even if it doesn’t contain full drivers); the VMM must manage serialization for shared devices; the VMM must check correctness of I/O operations. [Figure: the guest application makes system calls to the guest OS; the guest OS makes driver calls to its I/O drivers; the drivers' I/O operations are checked and translated by the VMM, which issues I/O operations to the hardware.]

62 System VMs 62 Virtualization with IOPs (IBM Style)  IO instruction points to Channel program Similar to driver Micro-code like Very simple control flow “Packages” sequences of related operations  VMM can translate channel program as a whole Mostly consists of address re-mapping And dealing with non-contiguous pages Reduces/eliminates problems with I/O sequences that require exclusive access to a device

63 System VMs 63 Case Study: IBM 360/370/390  CP-67 on 360/67 in 1960s First production VM implementation Provided means for supporting timesharing via Multiple guest versions of CMS – single user OS Used basic virtualization concepts described by Goldberg  VM/370 (1972) led to widespread use of VMs  Virtual Machine Assist (1974) Enhancements to support VMs  Extended Control Program Support (1978) Further enhances VM support  Handshaking Lets Guest OS in on the secret  Interpretive Execution Facility (IEF)

64 System VMs 64 Reasons for VM Slowdown  VM initialization Setting up virtual state  Privileged Instruction overhead Trap to VMM Interpretation by VMM Return from VMM to guest  System Calls (SVC) by guest in user mode Requires trap/reflection back to Guest OS  Interrupts Reflect through VMM before getting to Guest OS  Virtual Memory Management Shadow page faults when page is already mapped  Duplicated effort between VMM and Guest OS Memory management done by both

65 System VMs 65 Virtual Machine Assists  Ways of making application on VM run faster Have no performance effect if run in native mode  Instruction Emulation  Shadow Table Management  Virtual interval timer

66 System VMs 66 IBM 370 Virtual Machine Assist  Add Control Register 6 (CR6) Bit 0 VM Assist On/Off Bit 1 Virtual user/supervisor state Bit 4 SVC handling On/Off Bit 5 Shadow table fixup On/Off Bit 7 Virtual interval timer assist Bits 8-28 address of VM pointer list  CR6 Set by VMM when Guest is dispatched

67 System VMs 67 Instruction Emulation  Certain privileged instructions emulated directly in microcode Avoids trap/interpretation by VMM Guest must be in Virtual Supervisor mode (held in CR6) Examples: Load PSW Load Real Address Reset Reference Bit Store Control  Supervisor Calls also emulated If SVC handling is enabled via CR6 Avoids trap/reflection through VMM

68 System VMs 68 Shadow Table Management  When page fault occurs: If Guest OS has page mapped and page is already present in real memory but not mapped by guest’s shadow table then VM assist updates shadow table automatically Else, reflects fault to VMM  Uses VM pointer list to find guest tables

69 System VMs 69 Performance Improvement  Reduction in Supervisor State Time 70-90%  Reduction in Elapsed Time 40-65%  Reduction in Priv. Insts. Interpreted by VMM 75-95%

70 System VMs 70 Extended Control Program Support  Emulates additional Privileged Instructions e.g. Purge TLB, Test Channel  Partially handles other Privileged Instructions (with help from VMM)  Non-architected instructions for use by VMM Examples Decode channel words Dispatch a virtual machine Locate virtual I/O control blocks (many others)  Virtual Timer Assist Maintains a virtual interval timer for guest VM Real interval timer is a hardware resource

71 System VMs 71 Interpretive Execution Facility  Provides a way to execute most of the VMM functions in hardware  Function of VMM separated between hardware and software Cleaner separation compared to earlier VM assists  Advantages of interpretive execution Better performance Better predictability of performance Applicable for all types of guest operating systems  Key instruction: SIE (Start Interpretive Execution) Used by VMM to give control to hardware Architectural state of VM in table accessible to hardware Privileged instructions interpreted in hardware Occasionally need to get back to the software part of the VMM

72 System VMs 72 Entry and Exit from IE mode Entry into Interpretive- Execution mode VMM Software... SIE........ Emulation Interpretive Execution Mode Host Interrupt Handler Exit for interception Exit for host interrupt

73 System VMs 73 Inter VM Communication  Other VMM extensions focus on inter-machine communication by emulating many distributed system features e.g. virtual LANs VMs by their nature are isolated – but inter-user communication is also desirable

74 System VMs 74 IBM Handshaking (Para Virtualization)  Allow Guest OS to discover that it is running on VMM Guest “probes” for VMM when it is booted Then informs VMM that it expects VMM support  Reduces duplicated effort OS can mark all page frames fixed, disable demand paging, bypass channel address translation  Pseudo page fault handling Under operator control, VMM notifies Guest OS when VMM is handling a page fault by the Guest VM Guest OS marks faulting task as “page wait” Guest OS Dispatches another task (I.e. whole Guest VM does not have to wait)

75 System VMs 75 VMware: an x86 System Virtual Machine  Applying Conventional VMs to PCs – Problems: Installing the VMM on bare hardware, then booting Guests onto VMM. Need to support many device types, many more drivers  VMware solves both problems  Uses Host OS/Guest OS model “Hosted VM” Uses Host OS for some VMM functions Including I/O

76 System VMs 76 VMware: Three Main components: Begin with an already-loaded host OS. VMDriver (pseudo-driver): host OS-specific; installed as a driver, but can take over the machine; acts as a conduit between the system and user VMMs. VMMonitor (system-level VMM): slipped under the installed OS via the pseudo-driver. VMApp (user-level VMM): appears as an ordinary application to the installed OS; can make normal I/O calls (and use installed drivers). [Figure: user mode holds the host applications, VMApp, and the virtual machine's applications; privileged mode holds the host OS (e.g. Linux, Windows) with VMDriver, and VMMonitor; all run on the hardware (x86 motherboard, display, adapters, etc.).]

77 System VMs 77 VMM Communication  VMM control passes back and forth between user and system-level VMM portions  User VMM performs system call to pseudo- driver; then waits for response  System VMM maintains control, then sends response message back to User VMM

78 System VMs 78 Resource Management  Host OS schedules processor resource User-level VMM is just another application  Host OS manages memory VM memory is allocated as address space of User- level VMM User level VMM “mallocs”; whole VM uses it

79 System VMs 79 VMware I/O: The guest OS contains generic drivers. Generic drivers operate on virtual devices managed by the user mode portion of the VMM. The user mode portion of the VMM makes normal system calls. System calls cause the host OS to use real drivers and devices. [Figure: the guest application (user mode) makes system calls to the guest OS, whose VM mgr and generic I/O drivers issue physical memory and I/O operations to the VM's virtual devices; the VMM software turns these into system calls to the host OS, whose VM mgr and I/O drivers issue physical memory and I/O operations to the hardware.]

80 System VMs 80 I/O Sequence  Guest application makes system call  Intercepted by System-level VMM, reflected to Guest OS  Guest OS performs I/O operations specified in generic drivers  System-level VMM captures I/O operations, and interprets them  Passes operation back up to User-level VMM  User-level VMM performs I/O call to Host OS

81 System VMs 81 Example: Network Virtualization  Virtual and Physical Network Interface Card (NIC) the same  Message Send X86 OUT or OUTS plus port# (in range of IDs for NIC) Each port has state bit  trap on I/O request  VMM saves permission “map” for all ports per guest VM

82 System VMs 82 Example: Network Virtualization: Sequence below. The guest OUT traps to the VMM, and the VMM checks guest permissions before making the request to the physical NIC. [Figure: (1) the user on VM 1 (user mode) sends a message to an external machine, e.g. using send(); (2) the OS on VM 1 converts it into I/O instructions for the virtual NIC, e.g. OUTS 0xf0,...; (3) the VMM (privileged mode) sends the packet on the virtual bridge to the device driver of the physical NIC, e.g. OUTS 0x280,…; (4) the NIC device driver launches the packet on the network using wire signals.]
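The permission check and port translation in this sequence can be sketched as a single table lookup. The class and method names below are hypothetical, and the port numbers 0xF0 and 0x280 are borrowed from the slide's figure purely as sample values; the slides do not say how the permission map is actually stored.

```python
class PortVMM:
    def __init__(self):
        # (guest, virtual port) -> physical port. Presence in this table
        # doubles as the per-guest permission map for that port.
        self.port_map = {}
        self.sent = []  # (physical port, data) pairs issued to the real NIC

    def grant(self, guest, virtual_port, physical_port):
        self.port_map[(guest, virtual_port)] = physical_port

    def on_guest_out(self, guest, virtual_port, data):
        """Called when a guest OUT/OUTS traps to the VMM."""
        physical_port = self.port_map.get((guest, virtual_port))
        if physical_port is None:
            return False                       # guest lacks permission: refuse
        self.sent.append((physical_port, data))  # request to the physical NIC
        return True


vmm = PortVMM()
vmm.grant("vm1", 0xF0, 0x280)
print(vmm.on_guest_out("vm1", 0xF0, b"packet"))
print(vmm.on_guest_out("vm2", 0xF0, b"packet"))
```

A refused request would in practice be turned into a fault reflected to the guest rather than a silent `False`; the return value just keeps the sketch short.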

83 System VMs 83 Virtual Network: Virtual and physical NIC different. Special case: virtual network. [Figure: (1) the user (user mode) sends a message to a local virtual machine, e.g. using send(); (2) the OS converts it into I/O instructions, e.g. OUTS 0xf0,...; (3) the VMM (privileged mode) sends the packet on the virtual bridge to the device driver of the physical NIC, e.g. OUTS 0x280,…; (4) the NIC device driver converts the send message to a receive message for the receiving VM, and no wire signals are generated; (5) the VMM raises an interrupt in the receiver’s OS; (6) the interrupt handler in the OS on VM 2 generates I/O instructions to receive the packet; (7) the receiver, the user on VM 2, gets the packet.]

84 System VMs 84 Case Study: Intel VT-x (Vanderpool)  x86 Virtualization Extensions recently announced by Intel  New VMX mode Two privilege levels: root and non-root  Root level Similar to conventional x86 Plus new VMX instructions VMM runs in root level  Non-root level Limited control of resources Including when in ring 0 Guest OS plus apps runs in non-root level

85 System VMs 85 VT-x Operation: Transition from normal mode to VMX root mode is via the vmxon instruction. The VMM, in root level, sets up the environment for each VM and initiates the virtual machine via the vmlaunch instruction. Attempts to modify resources cause a return to root level. An explicit vmcall causes a return to root mode. The vmresume instruction causes a return to the guest in non-root mode. The vmxoff instruction causes an exit from VMX mode.
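The mode transitions listed on this slide form a small state machine. The sketch below is a simplification under stated assumptions: the mode names, the `vm_exit` event (standing for a guest attempt to modify a controlled resource), and the functions `step` and `run` are hypothetical, and details such as VMCS setup are omitted.

```python
TRANSITIONS = {
    ("normal", "vmxon"): "root",         # enter VMX operation
    ("root", "vmlaunch"): "non-root",    # start a guest for the first time
    ("root", "vmresume"): "non-root",    # return to a guest after an exit
    ("non-root", "vm_exit"): "root",     # guest touched a controlled resource
    ("non-root", "vmcall"): "root",      # guest explicitly calls the VMM
    ("root", "vmxoff"): "normal",        # leave VMX operation
}


def step(mode, event):
    """Return the mode after `event`, or raise if the slide lists no such move."""
    try:
        return TRANSITIONS[(mode, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not a listed transition from {mode!r}")


def run(events, mode="normal"):
    for event in events:
        mode = step(mode, event)
    return mode


print(run(["vmxon", "vmlaunch", "vm_exit", "vmresume", "vmcall", "vmxoff"]))
```

Writing the transitions as a table makes the asymmetry visible: the VMM chooses when to enter the guest, but the guest's exits are driven by events the VMM configured in advance.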

86 System VMs 86 VT-x Capabilities: Root mode eliminates the need to run all guest code in user mode; the VMM runs in root mode. For code regions with no critical instructions, the HW is as efficient as a normal machine. VT-x HW maps state-holding data elements directly to native structures during VM execution: the VMCS (virtual machine control structure) encapsulates VM state; the HW implementation can take over loading and unloading state; there is no need for the VMM to perform loads/stores of state info. Eliminates the need for paravirtualization and allows standard versions of OSes to be used as guests; the vmcall instruction can be used to pass hints and data to the VMM if desired.

87 System VMs 87 VMCS: Can be implemented by HW or SW in root mode; VMM is implementation-dependent. Aligned on a 4KB boundary. Pointed to by VMPTR: load VMPTR with the vmptrld instruction; read the VMCS with vmread; write the VMCS with vmwrite. [Figure: VMCS layout. State area: guest state (register state, interruptibility state) and host state (register state). Control area: VM execution controls (pin-based execution controls, processor-based execution controls, bitmap fields, etc.), VM exit controls (control bitmap, MSR controls), and VM entry controls (control bitmap, MSR controls, controls for event injection). VM exit information: basic information (VM-exit information, vectoring event information) and other exit information (due to event delivery, due to instruction execution).]

88 System VMs 88 Critical Instructions: Programmable VM exit conditions are given in the VMCS, e.g. which instructions should cause an exit to the VMM. Example: Read Time Stamp Counter (RDTSC). The counter is held in a 64-bit MSR, IA32_TIME_STAMP_COUNTER. RDTSC works in any mode if the TSD bit in control register 4 is off; otherwise it works only in ring 0 and traps at other privilege levels (general-protection exception).

89 System VMs 89 RDTSC: [Flowchart: an rdtsc instruction is encountered. Machine in VMX mode? If no, perform normal operation. If yes: is the RDTSC exiting bit set in the VMCS? If yes, save exit information, exit the VM, and return control to the VMM. If no: is the TSD bit of CR4 set in the VM? If yes and this is not a ring 0 operation, raise a protection exception, save exit information, exit the VM, and return control to the VMM. Otherwise: is the Use TSC Offsetting bit set in the VMCS? If yes, add the TSC offset to the time-stamp counter value and return the sum. If no, return the time-stamp counter value.]
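The decision flow on this slide can be written out as a function. This is one reading of the flattened flowchart rather than a statement of exact hardware behavior: the function name, argument names, and outcome strings are hypothetical, and the order of the checks follows the slide, which also shows the protection exception inside a VM leading to a VM exit.

```python
MASK64 = (1 << 64) - 1  # the time-stamp counter is a 64-bit value


def rdtsc(tsc, *, in_vmx_guest, rdtsc_exiting=False, cr4_tsd=False,
          ring=0, use_tsc_offsetting=False, tsc_offset=0):
    """Return (outcome, value) for one RDTSC, following the slide's flowchart."""
    # With CR4.TSD set, only ring 0 may read the counter.
    restricted = cr4_tsd and ring != 0
    if not in_vmx_guest:
        # Normal operation outside a VM.
        return ("protection_exception", None) if restricted else ("value", tsc)
    if rdtsc_exiting:
        # The VMM asked to intercept every RDTSC.
        return ("vm_exit", None)
    if restricted:
        # The slide shows this exception also returning control to the VMM.
        return ("protection_exception_vm_exit", None)
    if use_tsc_offsetting:
        # The guest sees the counter shifted by a VMM-chosen offset.
        return ("value", (tsc + tsc_offset) & MASK64)
    return ("value", tsc)


print(rdtsc(1000, in_vmx_guest=True, use_tsc_offsetting=True, tsc_offset=-400))
```

TSC offsetting is what lets a VMM hide time spent outside the guest without trapping every read, which is why the flowchart treats it separately from RDTSC exiting.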

