
OS Virtualization.


1 OS Virtualization

2 Outline
Background
What is virtualization? Why would we want it? Why is it hard?
How do we do it?
Choices

3 What is Virtualization?
OS virtualization: create a platform that emulates a hardware platform and allows multiple instances of an OS to use it, each as though it had full and exclusive access to the underlying hardware. (11/23/2018, cs431-cotter)

4 What is Virtualization?
[Figure: layered stack. Each of four guest OSes (OS 1 to OS 4) runs its own applications; all four guest OSes run on a single virtualization platform, which runs on the hardware.]

5 The Problem
The OS uses kernel mode / user mode to protect itself.
System calls generate a trap (software interrupt) that forces a switch to kernel mode.
Sensitive assembly instructions (I/O, MMU control, etc.) must only be executed by the kernel.

6 The Problem
If our VM now runs in user space, the guest kernel cannot execute sensitive instructions directly, since those are allowed only in kernel mode. We would like such instructions to force a trap into the hypervisor, which is then responsible for handling them on the guest's behalf.
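The requirement above is the classic Popek and Goldberg condition: trap-and-emulate works only if every sensitive instruction is also privileged, i.e. traps when executed outside ring 0. The sketch below checks that condition over a small, hand-picked subset of x86 instructions; the instruction names and both sets are illustrative assumptions, not a complete classification of x86.

```python
# Illustrative subset of x86 instructions (NOT a complete classification).
# "Sensitive": reads or changes privileged machine state.
SENSITIVE = {"cli", "hlt", "lgdt", "mov_cr3", "popf", "pushf", "sgdt", "smsw"}

# "Privileged": traps when executed outside ring 0
# (cli traps only when CPL > IOPL; we assume the usual IOPL of 0).
PRIVILEGED = {"cli", "hlt", "lgdt", "mov_cr3"}


def trap_and_emulate_possible(sensitive: set, privileged: set) -> bool:
    """Popek-Goldberg condition: every sensitive instruction must trap."""
    return sensitive <= privileged


def escaping_instructions(sensitive: set, privileged: set) -> list:
    """Sensitive instructions that run silently instead of trapping."""
    return sorted(sensitive - privileged)


print(trap_and_emulate_possible(SENSITIVE, PRIVILEGED))  # False
print(escaping_instructions(SENSITIVE, PRIVILEGED))      # ['popf', 'pushf', 'sgdt', 'smsw']
```

The instructions in the second list are exactly the ones a hypervisor cannot catch by trapping alone, which is what the following slides address.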

7 The Problem: hardware protection rings
Supervisor mode (ring 0) can run any instruction. Code there is trusted not to fail; if it does fail, the system crashes.
User programs run in ring 3.
The hypervisor runs in ring 0; the guest OS does not.

8 The Problem
On x86, some instructions are sensitive but not privileged.
Example: POPF pops data from the stack into the EFLAGS register. It can be called from all protection rings, but behaves differently when not in ring 0.
The interrupt flag (IF) is part of EFLAGS and changes only in ring 0 (strictly, only when CPL ≤ IOPL; IOPL is normally 0).
POPF is not privileged, so it does not trap: in rings 1-3 the IF flag simply does not change.
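The POPF behaviour described above can be modelled in a few lines. This is a simplified sketch that tracks only the IF bit (real POPF also restricts changes to IOPL and other flags); the function name and parameters are illustrative.

```python
IF_MASK = 1 << 9  # EFLAGS.IF (interrupt enable flag) is bit 9


def popf(eflags: int, popped: int, cpl: int, iopl: int = 0) -> int:
    """Simplified protected-mode POPF, modelling only the IF bit.

    Returns the new EFLAGS value. Neither branch raises an exception:
    POPF never traps because of IF.
    """
    if cpl <= iopl:
        # Sufficient privilege: IF is loaded from the popped value.
        return popped
    # Insufficient privilege: IF silently keeps its old value.
    return (popped & ~IF_MASK) | (eflags & IF_MASK)


# A guest kernel (deprivileged to ring 1) tries to disable interrupts.
current = 0x202   # IF set (bit 1 of EFLAGS is always 1)
wanted = 0x002    # the same value with IF cleared
print(hex(popf(current, wanted, cpl=0)))  # 0x2   -> interrupts disabled
print(hex(popf(current, wanted, cpl=1)))  # 0x202 -> IF unchanged, no trap
```

In the ring-1 case the guest believes it has disabled interrupts, yet nothing traps, so the hypervisor never learns about the attempt.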

9 Solution – binary translation
Replace problematic instructions dynamically:
Read in the guest code, looking for basic blocks.
Inspect each basic block for problematic instructions. If any are found, replace them with calls into the hypervisor (this process is called binary translation).
Then cache the translated block and execute it.
Eventually most basic blocks have been translated and cached, and they run at near-native speed.
This can force traps on sensitive non-privileged instructions.
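The translate-and-cache loop above can be sketched as follows. Guest code is modelled as a list of instruction strings, a basic block ends at a control-transfer instruction, and the `hypercall` prefix stands in for "call into the hypervisor"; the class name, instruction subset, and `hypercall` marker are all hypothetical simplifications.

```python
from typing import Dict, List, Tuple

# Illustrative subset of sensitive-but-unprivileged instructions.
SENSITIVE_UNPRIVILEGED = {"popf", "pushf", "sgdt", "smsw"}
# Control-transfer instructions end a basic block.
BLOCK_ENDERS = {"jmp", "jz", "call", "ret"}


class BinaryTranslator:
    def __init__(self, code: List[str]):
        self.code = code
        # Translation cache: block start address -> (translated block, next pc)
        self.cache: Dict[int, Tuple[List[str], int]] = {}
        self.blocks_translated = 0

    def get_block(self, pc: int) -> Tuple[List[str], int]:
        if pc in self.cache:          # already translated: reuse, no rescanning
            return self.cache[pc]
        translated = []
        i = pc
        while i < len(self.code):
            instr = self.code[i]
            opcode = instr.split()[0]
            if opcode in SENSITIVE_UNPRIVILEGED:
                # Replace with an explicit call into the hypervisor.
                translated.append(f"hypercall {instr}")
            else:
                translated.append(instr)  # safe instruction: copied unchanged
            i += 1
            if opcode in BLOCK_ENDERS:
                break
        self.cache[pc] = (translated, i)
        self.blocks_translated += 1
        return translated, i


guest = ["mov eax, 1", "popf", "add eax, 2", "jmp 0", "pushf", "ret"]
bt = BinaryTranslator(guest)
print(bt.get_block(0))  # (['mov eax, 1', 'hypercall popf', 'add eax, 2', 'jmp 0'], 4)
bt.get_block(0)         # second lookup is served from the cache
print(bt.blocks_translated)  # 1
```

Because a loop re-enters the same block many times, the one-time scanning cost is amortized, which is why cached translations approach native speed.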

10 Solution – VM hardware
Systems with Intel VT-x or AMD SVM (since 2005):
New assembly instructions to enter VM mode.
The hypervisor runs in ring 0 in root mode; the guest OS runs in ring 0 in non-root mode.
Guest state is kept in a VM-specific structure called the VMCS (Virtual Machine Control Structure); AMD's equivalent is the VMCB.
Even with VM hardware support, binary translation can still be used to improve performance.
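With hardware support, the hypervisor's job becomes a loop: enter the guest, wait for a VM exit, emulate the exiting instruction, and re-enter. The toy model below sketches that loop. `ToyVMCS`, `vm_enter`, `hypervisor`, and the `(opcode, operand)` program format are hypothetical simplifications; real VM entry uses instructions such as VMLAUNCH/VMRESUME, and the real VMCS has far more fields.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyVMCS:
    """Hypothetical, heavily simplified stand-in for the real VMCS."""
    guest_rip: int = 0
    guest_cr3: int = 0
    exit_reason: str = ""
    exit_operand: int = 0
    # Which guest instructions cause a VM exit (the real VMCS configures
    # this through its VM-execution control fields).
    exit_on: set = field(default_factory=lambda: {"cpuid", "mov_cr3", "hlt"})


def vm_enter(vmcs: ToyVMCS, program: List[Tuple[str, int]]) -> None:
    """Run the guest (ring 0, non-root mode) until something causes a VM exit."""
    while vmcs.guest_rip < len(program):
        op, arg = program[vmcs.guest_rip]
        if op in vmcs.exit_on:
            vmcs.exit_reason, vmcs.exit_operand = op, arg
            return                      # VM exit: back to the hypervisor (root mode)
        vmcs.guest_rip += 1             # ordinary instruction runs natively
    vmcs.exit_reason = "end"


def hypervisor(program: List[Tuple[str, int]]) -> Tuple[ToyVMCS, List[str]]:
    vmcs = ToyVMCS()
    exits = []
    while True:
        vm_enter(vmcs, program)
        exits.append(vmcs.exit_reason)
        if vmcs.exit_reason in ("hlt", "end"):
            return vmcs, exits
        if vmcs.exit_reason == "mov_cr3":
            vmcs.guest_cr3 = vmcs.exit_operand   # emulate: update the guest's CR3
        # (a real hypervisor would emulate cpuid by filling in guest registers)
        vmcs.guest_rip += 1                      # skip the emulated instruction


prog = [("add", 0), ("mov_cr3", 0x5000), ("cpuid", 0), ("add", 0), ("hlt", 0)]
state, exits = hypervisor(prog)
print(exits, hex(state.guest_cr3))  # ['mov_cr3', 'cpuid', 'hlt'] 0x5000
```

Ordinary instructions never leave the guest; only the configured ones cost a round trip through the hypervisor.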

11 Implementation
Type 1 hypervisor
Type 2 hypervisor
Paravirtualization

12 Type 1 Hypervisor
Runs on “bare metal”: the hypervisor is the machine’s kernel.
Made for servers; includes an interface for remote/admin access.
Examples: Xen, VMware vSphere, etc.

13 Type 2 Hypervisor
Runs within a host OS and supports guest OSes above it.
The VM software must include a kernel module.
Examples: Oracle VirtualBox, VMware Player, etc.

14 Paravirtualization
Modify the guest OS so that every use of a sensitive non-privileged instruction is replaced with a hypervisor call (hypercall).
Modifying the source code is much easier (and more efficient) than emulating hardware instructions, as binary translation does.

15 Problems with Paravirtualization
Paravirtualized systems won’t run on native hardware.
There are many different paravirtualization systems (VMware, Xen, etc.) that use different commands.
Proposed solution: modify the OS kernel so that it calls a special set of procedures to execute sensitive instructions (Virtual Machine Interface, VMI):
On bare metal, link to a library that implements them with the native instructions.
On a VM, link to a VM-specific library.
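The VMI idea above amounts to programming the kernel against an interface and choosing the implementation at link time. The sketch below models that with an abstract class and two backends that only record what they would do; the class names, method names, and hypercall strings are hypothetical and are not the actual VMI API.

```python
from abc import ABC, abstractmethod
from typing import List


class SensitiveOps(ABC):
    """Hypothetical VMI-style interface: the kernel calls these procedures
    instead of issuing sensitive instructions directly."""

    def __init__(self) -> None:
        self.trace: List[str] = []   # records what each backend would do

    @abstractmethod
    def disable_interrupts(self) -> None: ...

    @abstractmethod
    def load_page_table(self, root: int) -> None: ...


class NativeOps(SensitiveOps):
    """Bare-metal library: would execute the real instructions."""

    def disable_interrupts(self) -> None:
        self.trace.append("cli")

    def load_page_table(self, root: int) -> None:
        self.trace.append(f"mov cr3, {root:#x}")


class HypervisorOps(SensitiveOps):
    """VM-specific library: would issue hypercalls instead."""

    def disable_interrupts(self) -> None:
        self.trace.append("hypercall(disable_irq)")

    def load_page_table(self, root: int) -> None:
        self.trace.append(f"hypercall(set_cr3, {root:#x})")


def switch_address_space(ops: SensitiveOps, root: int) -> None:
    """Kernel code: identical no matter which library it is linked with."""
    ops.disable_interrupts()
    ops.load_page_table(root)


native, virt = NativeOps(), HypervisorOps()
switch_address_space(native, 0x1000)
switch_address_space(virt, 0x1000)
print(native.trace)  # ['cli', 'mov cr3, 0x1000']
print(virt.trace)    # ['hypercall(disable_irq)', 'hypercall(set_cr3, 0x1000)']
```

The same kernel binary logic runs in both settings; only the backend differs, which addresses both problems listed on the slide.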

16 Memory Virtualization
The OS tracks the mapping of virtual memory pages to physical page frames: it builds page tables, then updates the paging register (CR3), which traps when a guest does it.
Let the hypervisor manage the page mapping, and use shadow page tables for the VMs.

17 Shadow Page Table
Guest page tables map: Guest VA → Guest PA
Shadow tables map: Guest VA → Host PA
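The shadow table is the composition of the two mappings above, so the hardware can go from guest VA to host PA in one lookup. A minimal sketch, assuming single-level tables represented as dicts at 4 KiB page granularity (real page tables are multi-level); the function names are illustrative.

```python
PAGE_SIZE = 4096


def build_shadow(guest_pt: dict, host_map: dict) -> dict:
    """Compose the two mappings: shadow[vpn] = host_map[guest_pt[vpn]].

    guest_pt: guest virtual page -> guest physical frame (owned by the guest OS)
    host_map: guest physical frame -> host physical frame (owned by the hypervisor)
    """
    return {vpn: host_map[gpfn] for vpn, gpfn in guest_pt.items() if gpfn in host_map}


def translate(shadow: dict, gva: int) -> int:
    """Guest VA -> host PA in a single lookup, as the MMU would do."""
    vpn, offset = divmod(gva, PAGE_SIZE)
    return shadow[vpn] * PAGE_SIZE + offset


guest_pt = {0: 5, 1: 7}     # guest VA pages 0, 1 -> guest PA frames 5, 7
host_map = {5: 100, 7: 42}  # guest PA frames 5, 7 -> host PA frames 100, 42
shadow = build_shadow(guest_pt, host_map)
print(shadow)                             # {0: 100, 1: 42}
print(translate(shadow, PAGE_SIZE + 12))  # 172044  (= 42 * 4096 + 12)
```

The price is that the hypervisor must keep the shadow table in sync whenever the guest edits its own page tables.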

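18 Nested/extended page tables
Requires hardware support.
Two “CR3”s (CR3 and EPTP).
The MMU translates each guest mapping level.
[Figure: the guest OS's page table (software, reached through CR3) and the hypervisor's host page table (reached through EPTP) are both walked by the hardware MMU, with results cached in the TLB.]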

19 Nested page tables
Guest page table maps: Guest VA → Guest PA
Nested page table maps: Guest PA → Host PA
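Because "the MMU translates each guest mapping level", every guest page-table pointer (a guest PA) must itself go through the nested table. The sketch below counts memory references for one translation on a TLB miss, assuming no paging-structure caches (the worst case); the function name is illustrative.

```python
def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Memory references for one guest VA -> host PA translation on a TLB miss,
    with nested page tables and no paging-structure caches (worst case)."""
    refs = 0
    for _ in range(guest_levels):
        # The pointer to this guest table (CR3 or a guest PTE) is a guest PA,
        # so it is first translated through every level of the nested table ...
        refs += host_levels
        # ... and then the guest page-table entry itself is read.
        refs += 1
    # Finally, the guest PA of the data is translated through the nested table.
    refs += host_levels
    return refs


print(nested_walk_refs(4, 4))  # 24, i.e. (4 + 1) * (4 + 1) - 1
print(nested_walk_refs(4, 0))  # 4, an ordinary non-nested 4-level walk
```

A shadow-table walk costs only the ordinary 4 references, so nested paging trades a longer TLB-miss walk for never trapping on guest page-table updates.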

20 I/O Virtualization
Each guest OS holds its own “partition”, typically implemented as a file or a region on disk.
The hypervisor must convert the guest OS address (block #) into the physical address within that region.
It may convert between storage types.
It must handle DMA (direct memory access) requests.
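The block-number conversion above is, at its simplest, a bounds check plus an offset. A minimal sketch, assuming a contiguous region and a 512-byte block size; `GuestPartition` and its fields are hypothetical names.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumed block (sector) size in bytes


@dataclass(frozen=True)
class GuestPartition:
    region_start: int  # byte offset of this guest's region in the disk or image file
    num_blocks: int    # size of the region, in blocks


def guest_block_to_host_offset(part: GuestPartition, block: int) -> int:
    """Translate a guest block number into a byte offset within the region."""
    if not 0 <= block < part.num_blocks:
        # A guest must never reach outside its own partition.
        raise ValueError(f"block {block} is outside the guest's partition")
    return part.region_start + block * BLOCK_SIZE


vm1 = GuestPartition(region_start=1 << 20, num_blocks=2048)  # 1 MiB region at 1 MiB
print(guest_block_to_host_offset(vm1, 10))  # 1053696
```

The bounds check is what keeps one guest from reading or overwriting another guest's partition.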

21 Question (Moed B 2017) A virtualized system has a hypervisor that supports shadow page tables. Briefly describe the process of looking up a virtual address in this system. What is the advantage of such a system over one that uses brute force?

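22 Question (Moed B 2017)
[Figure: shadow page tables. The guest OS keeps its own page table (pointed to by G-CR3); the hypervisor (VMM) maintains a shadow page table pointed to by the real CR3 and used by the CPU and TLB. On a miss, an interrupt transfers control to the VMM, which corrects the shadow page table.]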

23 Question (Moed B 2017) In such a system, a CPU register points to the shadow page table. Given a guest virtual address, there are two possibilities: (1) The shadow tables map the address, and the process is identical to a regular system. (2) The shadow tables do not map the address. A page-fault interrupt is raised, which transfers control to code in the hypervisor. That code checks whether the address is mapped in the guest's tables (using a variable that stores the address of the guest's top-level table). If the address is not mapped, we deliver a page fault to the guest; otherwise we map the page in the shadow tables so that it maps directly to system memory, and return to the guest system (we are now back in case (1)).
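The two cases in the answer above can be sketched as a tiny fault handler. Assumptions: single-level tables as dicts, one access per call, and a Python exception standing in for injecting a page fault into the guest; `ShadowMMU` and `GuestPageFault` are hypothetical names.

```python
class GuestPageFault(Exception):
    """Raised when the fault must be reflected into the guest OS."""


class ShadowMMU:
    def __init__(self, guest_pt: dict, host_map: dict) -> None:
        self.guest_pt = guest_pt   # guest VPN -> guest PFN (the table G-CR3 points to)
        self.host_map = host_map   # guest PFN -> host PFN (kept by the hypervisor)
        self.shadow = {}           # guest VPN -> host PFN (the table the real CR3 points to)
        self.hypervisor_faults = 0

    def access(self, vpn: int) -> int:
        if vpn in self.shadow:
            # Case (1): the shadow table maps the page; plain hardware translation.
            return self.shadow[vpn]
        # Case (2): page fault, handled by the hypervisor.
        self.hypervisor_faults += 1
        if vpn not in self.guest_pt:
            # The guest itself has no mapping: inject a page fault into the guest.
            raise GuestPageFault(vpn)
        # Fill in the shadow entry so it maps straight to host memory,
        # then resume the guest, which is now back in case (1).
        self.shadow[vpn] = self.host_map[self.guest_pt[vpn]]
        return self.shadow[vpn]


mmu = ShadowMMU(guest_pt={0: 5}, host_map={5: 100})
print(mmu.access(0), mmu.access(0), mmu.hypervisor_faults)  # 100 100 1
```

The second access to page 0 is served without any hypervisor involvement, which is the source of the advantage asked about in the question.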

24 Question (Moed B 2017)
[Figure: brute-force approach. The guest OS's page-table pages are defined as not R/W in the VM memory layout, so guest accesses to them trap to the hypervisor (VMM); the CPU uses CR3 and the TLB.]

25 Question (Moed B 2017) In a system that uses brute-force accesses, all the mapping pages are marked as non-read, non-write, so every access raises an interrupt. In a shadow page table system, we get an interrupt only on the first access to a given page; once it has been mapped, the guest system can continue working normally.
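The difference in trap counts described above can be made concrete with a short simulation over a sequence of page accesses. This is a simplification: it ignores shadow-entry invalidation when the guest edits its page tables or switches address spaces, which in practice causes additional traps.

```python
from typing import Iterable


def brute_force_traps(accesses: Iterable[int]) -> int:
    """Page-table pages are marked non-read/non-write: every access traps."""
    return sum(1 for _ in accesses)


def shadow_traps(accesses: Iterable[int]) -> int:
    """Only the first access to each page traps; after that it is mapped."""
    mapped = set()
    traps = 0
    for page in accesses:
        if page not in mapped:
            traps += 1
            mapped.add(page)
    return traps


pattern = [1, 1, 2, 1, 2, 3, 1, 1]
print(brute_force_traps(pattern), shadow_traps(pattern))  # 8 3
```

The more often a page is reused, the larger the gap, since brute force pays per access while shadow tables pay per distinct page.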

