Published by Florence Underwood. Modified over 9 years ago.
1
Introduction to Xen - A Hypervisor (on x86)
  Advisor: Chih-Wen Hsueh
  Student: Tang-Hsun Tu
  National Taiwan University, Graduate Institute of Networking and Multimedia
  Wireless Networking and Embedded Systems Laboratory, Real-Time System Software Group
  October 20, 2015
2
National Taiwan University, Graduate Institute of Networking and Multimedia | Tang-Hsun Tu | 2/48
Outline
  Introduction
    What is Virtualization?
    Why is Virtualization Difficult?
    How to Virtualize?
  Xen Architecture
    Hypervisor
    CPU Virtualization
    Memory Virtualization
    I/O Device Virtualization
  Hardware-Assisted Virtualization
  Conclusion
4
What is Virtualization?
  Virtualization, by means of virtual machines, is used for:
    Running applications (cross-platform)
    Security
    Sharing hardware resources
    Fully utilizing hardware
    etc.
5
Why is Virtualization Difficult? (1/2)
  The guest OS is moved out of ring 0 into ring 1 or ring 3
    0/1/3 ring layout, e.g. x86_32
    0/3/3 ring layout, e.g. x86_64, ARM
  On x86, some sensitive instructions cannot be trapped
  OS critical instructions (sensitive instructions that are not privileged):
    Sensitive register instructions
      SGDT, SIDT, SLDT
      SMSW
      PUSHF(D), POPF(D)
    Protection system instructions
      LAR, LSL, VERR, VERW
      PUSH, POP
      CALL, JMP, INT, RET
      STR
      MOV
  Privileged instructions, by contrast, trap when executed outside ring 0
6
Why is Virtualization Difficult? (2/2) - Examples
  SGDT, SIDT and SLDT
    SGDT m      // save gdtr to memory
    SIDT m      // save idtr to memory
    SLDT r/m16  // save ldtr to a register or memory
    There is only one gdtr, idtr and ldtr on a CPU!
  POP
    POP ss      // needs to satisfy RPL=CPL=DPL
    The guest's CPL changes from 0 to 1 or 3!
7
How to Virtualize? (1/2)
  Full virtualization: binary translation
  Paravirtualization: hypercall
  Hardware-assisted virtualization: Intel VT-x & AMD SVM
8
How to Virtualize? (2/2)
  Hypervisor (VMM) types
    Type I + microkernel
      Xen (open source, Citrix), Microsoft Hyper-V
    Type I + integrated kernel
      VMware ESX, KVM (Kernel-based VM)
    Type II (host OS + guest OS)
      VMware GSX, VMware Workstation, Microsoft Virtual PC, Microsoft Virtual Server, Sun VirtualBox
10
Xen Architecture (1/2)
  (Figure: Domain 0 and Domain U running on top of the hypervisor)
11
Xen Architecture (2/2)
  Compared to common Linux:
    Linux                  Xen
    System calls           Hypercalls
    Signals                Events
    Interrupts             Physical + virtual interrupts
    CPU                    Physical + virtual CPU
    Filesystem             XenStore
    Virtual memory         3-level memory
    POSIX shared memory    Grant tables / shared pages
12
Xen Architecture
  Boot
  Hypervisor
    Hypercall & System Call
    Event Channel
    Grant Table
  CPU Virtualization
    Virtual CPU Architecture
    Scheduling
    Interrupt
  Memory Virtualization
    Shared Info Page
    Memory Architecture
    Translation
  I/O Device Virtualization
    Split Device Driver
    Device I/O Ring
  Build System
    Build Xen
    Build XCI
13
Boot
  For paravirtualized guest OSes
    Start in "protected mode"
    Use the start info page
      Its address is passed in the "esi" register
  For HVM guest OSes
    Start in "real mode" (emulated BIOS)
    With QEMU
14
Hypervisor - Hypercall & System Call (1/2)
  System call: int 0x80
    // linux/include/asm/unistd.h
    #define __NR_restart_syscall 0
    #define __NR_exit            1
    #define __NR_fork            2
    #define __NR_read            3
    …
  Hypercall: int 0x82
    // xen/include/public/xen.h
    #define __HYPERVISOR_set_trap_table 0
    #define __HYPERVISOR_mmu_update     1
    #define __HYPERVISOR_set_gdt        2
    #define __HYPERVISOR_stack_switch   3
    …
  Hypercall flow, e.g. HYPERVISOR_sched_op:
    The guest OS executes int 82h with the hypercall number in eax
    The hypervisor's hypercall entry looks the number up in hypercall_table and calls do_sched_op
    The hypervisor resumes the guest OS with iret
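The table lookup in the hypercall flow can be sketched in plain C. This is only an illustrative model: the hypercall numbers are the ones listed on the slide, while demo_dispatch, demo_hypercall_table and the stand-in handlers are hypothetical names, and the real hypervisor performs this dispatch in its trap entry code rather than in an ordinary function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypercall numbers as listed on the slide (xen/include/public/xen.h). */
#define __HYPERVISOR_set_trap_table 0
#define __HYPERVISOR_mmu_update     1
#define __HYPERVISOR_set_gdt        2
#define __HYPERVISOR_stack_switch   3
#define DEMO_NR_HYPERCALLS          4   /* hypothetical: size of this toy table */

typedef long (*demo_hypercall_fn)(long arg);

/* Hypothetical stand-in handlers; each returns a recognizable value. */
static long demo_set_trap_table(long arg) { return 0   + arg; }
static long demo_mmu_update(long arg)     { return 100 + arg; }
static long demo_set_gdt(long arg)        { return 200 + arg; }
static long demo_stack_switch(long arg)   { return 300 + arg; }

/* The table is indexed by hypercall number, like a system call table. */
static const demo_hypercall_fn demo_hypercall_table[DEMO_NR_HYPERCALLS] = {
    [__HYPERVISOR_set_trap_table] = demo_set_trap_table,
    [__HYPERVISOR_mmu_update]     = demo_mmu_update,
    [__HYPERVISOR_set_gdt]        = demo_set_gdt,
    [__HYPERVISOR_stack_switch]   = demo_stack_switch,
};

/* Models the hypervisor side of "int 0x82": eax selects the handler. */
long demo_dispatch(unsigned long eax, long arg)
{
    if (eax >= DEMO_NR_HYPERCALLS)
        return -ENOSYS;                       /* unknown hypercall number */
    assert(demo_hypercall_table[eax] != NULL); /* every slot is populated here */
    return demo_hypercall_table[eax](arg);
}
```

For example, demo_dispatch(__HYPERVISOR_set_gdt, 5) runs the third table entry, and an out-of-range number is rejected instead of indexing past the table.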
15
Hypervisor - Hypercall & System Call (2/2)
  How do system calls work with hypercalls?
    Native: the user-space application (ring 3) makes system calls to OS services (ring 0)
    Xen: the user-space application (ring 3) makes system calls to guest OS services (ring 1); the guest OS makes hypercalls to hypervisor services (ring 0)
    (The figure also labels an "exception" path between the rings)
    HVM can use SYSENTER/SYSCALL
  How to do hypercalls in applications?
    User-space tools (xm, xend) call ioctl() on privcmd (exposed through procfs); the guest OS then issues the hypercall to the hypervisor
16
Hypervisor - Grant Table
  Grant reference (GR)
    Grant entry
    A request with an index
  Used in communication
  Page mapping & page transferring
  Page mapping (steps shown in the figure for Domain A and Domain B):
    Domain A: create GR, send GR
    Domain B: map page, access page, unmap page, inform
    Domain A: release GR
  Page transferring (steps shown in the figure):
    create GR, send GR, transfer page, receive page, inform, release GR
17
Hypervisor - Event Channel
  A lightweight signal mechanism
  Uses "ports" as identifiers (each port has a pending bit and a mask bit)
  Four major purposes
    IPI
    IDC (inter-domain communication)
    vIRQ
    pIRQ
  (Figure: guest OS event channel ports, port 0, port 1, …, connected through the hypervisor's virtual CPU, virtual memory and scheduling to the physical CPU, physical memory, Eth0, Eth1, …)
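The pending and mask bits can be illustrated with two bitmaps in C. This is a minimal sketch, assuming 32 ports and a simple upcall counter; the demo_* names and the struct are hypothetical and do not reproduce Xen's shared_info layout.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_PORTS 32   /* hypothetical number of ports */

typedef struct {
    uint32_t pending;  /* bit n set: an event is waiting on port n */
    uint32_t mask;     /* bit n set: the guest does not want upcalls for port n */
    int      upcalls;  /* how many times the guest would have been notified */
} demo_evtchn;

/* Sender side: mark the port pending, notify only if it is not masked. */
void demo_evtchn_send(demo_evtchn *ec, unsigned port)
{
    assert(port < DEMO_NR_PORTS);
    uint32_t bit = UINT32_C(1) << port;
    int was_pending = (ec->pending & bit) != 0;
    ec->pending |= bit;
    if (!was_pending && !(ec->mask & bit))
        ec->upcalls++;
}

/* Guest side: unmask a port; an event that arrived while masked is delivered now. */
void demo_evtchn_unmask(demo_evtchn *ec, unsigned port)
{
    assert(port < DEMO_NR_PORTS);
    uint32_t bit = UINT32_C(1) << port;
    ec->mask &= ~bit;
    if (ec->pending & bit)
        ec->upcalls++;
}

/* Guest side: take the lowest pending, unmasked port and clear it; -1 if none. */
int demo_evtchn_next(demo_evtchn *ec)
{
    uint32_t ready = ec->pending & ~ec->mask;
    for (unsigned port = 0; port < DEMO_NR_PORTS; port++) {
        uint32_t bit = UINT32_C(1) << port;
        if (ready & bit) {
            ec->pending &= ~bit;
            return (int)port;
        }
    }
    return -1;
}
```

An event sent to a masked port stays pending without an upcall, which is why the guest rechecks the pending bit when it unmasks.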
18
CPU Virtualization
  Architecture
    (Figure: applications run in guest OSes; each guest OS has VCPUs, which the hypervisor maps onto PCPUs)
  Scheduling
    2 scheduling algorithms (non-work-conserving / work-conserving)
      Simple Earliest Deadline First (SEDF)
      Credit
19
CPU Virtualization - Earliest Deadline First
  Assign process priorities according to the deadlines of their current requests
  An example with two processes
    T1 = (slice, deadline) = (1, 2)
    T2 = (2, 8)
  Timeline (d1, d2: current deadlines of T1 and T2; X: no outstanding request):
    t      0    1    2    3    4    6    8    9
    d1     2    X    4    X    6    8    10   X
    d2     8    8    8    8    X    X    16   16
    runs   T1   T2   T1   T2   T1   T1   T1   T2
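The example timeline can be reproduced with a short simulation in C. This is a sketch under stated assumptions: each task releases a new request at every multiple of its deadline value (so the deadline doubles as the period), time advances in unit steps, and ties go to the lower task index. The demo_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_MAX_TASKS 8   /* hypothetical upper bound for this sketch */

typedef struct {
    int slice;   /* time units needed per request */
    int period;  /* a new request arrives every `period` units; its deadline is the end of that period */
} demo_task;

/* Fills out[t] with the index of the task that runs in [t, t+1), or -1 if idle. */
void demo_edf(const demo_task *tasks, int n, int horizon, int *out)
{
    int remaining[DEMO_MAX_TASKS];
    int deadline[DEMO_MAX_TASKS];

    assert(n > 0 && n <= DEMO_MAX_TASKS);
    for (int i = 0; i < n; i++) {
        remaining[i] = 0;
        deadline[i] = 0;
    }

    for (int t = 0; t < horizon; t++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (t % tasks[i].period == 0) {        /* new request released */
                remaining[i] = tasks[i].slice;
                deadline[i] = t + tasks[i].period;
            }
            /* Among tasks with work left, pick the earliest deadline. */
            if (remaining[i] > 0 && (best < 0 || deadline[i] < deadline[best]))
                best = i;
        }
        if (best >= 0)
            remaining[best]--;
        out[t] = best;
    }
}
```

With tasks {1, 2} and {2, 8} and a horizon of 10, the result is T1, T2, T1, T2, T1, idle, T1, idle, T1, T2, which matches the slide's timeline once the idle slots at t = 5 and t = 7 are included.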
20
CPU Virtualization - SEDF
  Each VCPU is described by (slice, period, deadline)
  Two queues
    Run queue: VCPU 1, VCPU 2, VCPU 3, VCPU 4, ordered d1 < d2 < d3 < d4 …
    Wait queue: VCPU 1, VCPU 2, VCPU 3, ordered s1 < s2 < s3 …
  Cannot do load balancing on SMP
    e.g. 3 domains (A: 80%, B: 80%, C: 30%) on 2 PCPUs
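One way to read the 80% / 80% / 30% example: the total demand is 190%, which is less than the 200% that two PCPUs offer, yet no fixed assignment of the three domains to the two PCPUs keeps both at or below 100%. The C sketch below checks this by brute force. It is an interpretation of the slide's example, not part of SEDF itself, and the function names are hypothetical.

```c
#include <assert.h>

/* Sum of all loads, in percent of one PCPU. */
int demo_total_load(const int *load, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += load[i];
    return sum;
}

/*
 * Returns 1 if the loads can be split across two PCPUs, with every domain
 * pinned to exactly one PCPU, so that neither PCPU exceeds 100%.
 * Tries all 2^n assignments; bit i of `assign` is the PCPU of domain i.
 */
int demo_fits_two_pcpus(const int *load, int n)
{
    assert(n >= 0 && n < 20);
    for (unsigned assign = 0; assign < (1u << n); assign++) {
        int sum[2] = { 0, 0 };
        for (int i = 0; i < n; i++)
            sum[(assign >> i) & 1u] += load[i];
        if (sum[0] <= 100 && sum[1] <= 100)
            return 1;
    }
    return 0;
}
```

For {80, 80, 30} the total is 190 but demo_fits_two_pcpus returns 0: wherever C goes, its PCPU ends up at 110%. Meeting all three demands would require moving a VCPU between PCPUs over time.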
21
CPU Virtualization - Credit
  Each VCPU is configured with (weight, cap); its credit decides whether it is "under" or "over"
  Each PCPU has a VCPU list
    Priority queue: "under" VCPUs come before "over" VCPUs
  Two priority states, over and under
    Over: consumed > allocated
    Under: consumed < allocated
  If a PCPU has no "under" VCPU, the hypervisor selects an "under" VCPU from another PCPU
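The selection rule can be sketched in C as follows. This is a simplified model, not the Xen credit scheduler's code: the demo_* names are hypothetical, credit is taken to be allocated minus consumed, and a VCPU with exactly zero credit is treated as "over", a case the slide does not specify.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int id;
    int credit;   /* assumed: allocated minus consumed */
} demo_vcpu;

typedef struct {
    demo_vcpu *vcpus;  /* run queue of one PCPU, in queue order */
    int        n;
} demo_runq;

/* Under: consumed < allocated, i.e. positive remaining credit. */
static int demo_is_under(const demo_vcpu *v)
{
    return v->credit > 0;
}

/* First "under" VCPU in a run queue, or NULL. */
static demo_vcpu *demo_first_under(const demo_runq *q)
{
    for (int i = 0; i < q->n; i++)
        if (demo_is_under(&q->vcpus[i]))
            return &q->vcpus[i];
    return NULL;
}

/* Pick the next VCPU for PCPU `self`. */
demo_vcpu *demo_credit_pick(const demo_runq *pcpus, int npcpu, int self)
{
    assert(self >= 0 && self < npcpu);

    /* 1. Prefer a local "under" VCPU. */
    demo_vcpu *v = demo_first_under(&pcpus[self]);
    if (v != NULL)
        return v;

    /* 2. Otherwise look for an "under" VCPU on another PCPU. */
    for (int p = 0; p < npcpu; p++) {
        if (p == self)
            continue;
        v = demo_first_under(&pcpus[p]);
        if (v != NULL)
            return v;
    }

    /* 3. Nothing is "under" anywhere: run a local "over" VCPU, if any. */
    return pcpus[self].n > 0 ? &pcpus[self].vcpus[0] : NULL;
}
```

Step 2 is what lets this scheduler balance load across PCPUs: an otherwise idle or all-"over" PCPU pulls work from its peers.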
22
CPU Virtualization - Interrupt (1/2)
  (Figure: interrupt controllers 8259A and IOAPIC + LAPIC; interrupt sources PIT, keyboard, RTC)
23
CPU Virtualization - Interrupt (2/2)
  Physical interrupt
    For the hypervisor or for guest OSes
  Virtual interrupt
    Ask guest OSes to do
    8 for now (max is 24)
  (Figure: natively, Device -> IRQn -> PIC -> OS ISR; under Xen, Device -> IRQn -> PIC -> Hypervisor -> event -> guest OS ISR)
24
Memory Virtualization - Memory Architecture (1/2)
  Two-level memory (without a hypervisor)
    Application: virtual memory
    OS: physical memory
  Three-level memory (with a hypervisor): virtual, pseudo-physical, machine
    Application: virtual memory
    Guest OS: pseudo-physical memory
    Hypervisor: machine memory
  P2M and M2P tables translate between pseudo-physical and machine memory
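The P2M and M2P translations amount to two lookup tables indexed by frame number. The C sketch below assumes 4 KiB pages and a tiny guest with four pages scattered over sixteen machine frames; the table contents and all demo_* names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT    12          /* 4 KiB pages */
#define DEMO_GUEST_PAGES   4           /* hypothetical guest size */
#define DEMO_MACHINE_PAGES 16          /* hypothetical machine size */
#define DEMO_INVALID       UINT32_MAX  /* machine frame not owned by this guest */

/* P2M: pseudo-physical frame number (PFN) -> machine frame number (MFN). */
static const uint32_t demo_p2m[DEMO_GUEST_PAGES] = { 9, 3, 12, 5 };

/* M2P: the reverse table, MFN -> PFN. */
static uint32_t demo_m2p[DEMO_MACHINE_PAGES];

/* Derive M2P from P2M. */
void demo_build_m2p(void)
{
    for (uint32_t mfn = 0; mfn < DEMO_MACHINE_PAGES; mfn++)
        demo_m2p[mfn] = DEMO_INVALID;
    for (uint32_t pfn = 0; pfn < DEMO_GUEST_PAGES; pfn++)
        demo_m2p[demo_p2m[pfn]] = pfn;
}

/* Translate a pseudo-physical address to a machine address. */
uint64_t demo_pseudo_to_machine(uint64_t paddr)
{
    uint64_t pfn = paddr >> DEMO_PAGE_SHIFT;
    uint64_t offset = paddr & ((UINT64_C(1) << DEMO_PAGE_SHIFT) - 1);
    assert(pfn < DEMO_GUEST_PAGES);
    return ((uint64_t)demo_p2m[pfn] << DEMO_PAGE_SHIFT) | offset;
}

/* Look up which guest frame a machine frame backs. */
uint32_t demo_machine_to_pfn(uint32_t mfn)
{
    assert(mfn < DEMO_MACHINE_PAGES);
    return demo_m2p[mfn];
}
```

The guest sees frames 0..3 as contiguous, while the machine frames behind them (9, 3, 12, 5) are not; only the page offset is preserved by the translation.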
25
Memory Virtualization - Memory Architecture (2/2)
  168M of memory for the hypervisor
    Area                                                   Size
    MPT, Machine-to-Physical Translation Table (RO)        16M
    Page-Frame Information                                 96M
    MPT, Machine-to-Physical Translation Table (R/W)       16M
    Linear Page Table                                      8M
    Shadow Linear Page Table                               8M
    Per Domain Mappings                                    8M
    Direct Map                                             12M
    I/O Remap                                              4M
  (Figure labels: addresses 0xFC000000, 0xFC400000, 0xFFFFFFFF; Heap)
26
Memory Virtualization - Translation (1/2)
  4 mechanisms to manipulate page tables
    Paravirtualized page tables
    Writable page tables (only level 1 is writable)
    Shadow page tables
    Hardware-assisted paging (Intel: Extended Page Tables, AMD: Nested Page Tables)
  (Figure: the guest page table maps virtual memory to pseudo-physical memory (VM->PFN); the shadow page table, filled in on page faults, maps VM->MFN or VM->P2M; with HAP, the MMU performs second-level paging through P2M to reach machine memory)
27
Memory Virtualization - Translation (2/2)
  Comparison
    Type                         Space overhead   Computation overhead   Guest OS modification   Requires HW support
    Paravirtualized page table   Low (N)          Low                    A lot                   No
    Writable page table          Low (N)          High                   Some                    No
    Shadow page table            High (2N)        High                   None                    No
    Hardware-assisted paging     Medium (N+M)     Medium                 None                    Yes
  N is the number of page tables in all guests. M is the number of all guests.
28
Memory Virtualization - Shared Info Page
  Structure
    VCPU information (max is 32 VCPUs), event channel, wall clock, memory, TSC
  Compared with start_info_page
                   Start Info Page    Shared Info Page
    Mapped by      Domain builder     Guest OS
    Information    Static             Dynamically updated
29
I/O Device Virtualization - Device Model
  The hypervisor also provides three mechanisms to use devices:
    Emulated devices
    Paravirtualized driver
    Pass-through
30
I/O Device Virtualization - Emulated Devices
  Implemented by QEMU (QEMU-DM)
  e.g. sound cards (ac97, sb16), etc.
31
I/O Device Virtualization - Paravirtualized Driver
  Split device driver model
  An example of sending packets
    Front-end driver -> back-end driver -> native driver
32
I/O Device Virtualization - I/O Ring
  The ring carries no data; it only transfers requests/replies
  An example with GR
    (Figure: Dom U and Dom 0 exchange a GR over the device I/O channel; the hypervisor holds the grant table and the active grant table)
33
I/O Device Virtualization - Pass-Through
  Pass the device to the guest, which uses it directly
  (Figure: Dom U reaches Eth1 through its own native driver, alongside Dom 0, on top of the hypervisor's virtual CPU, virtual memory and scheduling and the physical CPU, physical memory, Eth0, Eth1, …)
34
Hardware Virtual Machine (1/3)
  Intel Virtualization Technology
    Technology   Description                           Virtualization     Implementation
    VT-x         Root/non-root, Extended Page Tables   CPU, memory        Instruction set
    VT-i         As VT-x, for Itanium
    VT-d         DMA, interrupt                        Devices            IOMMU (chipset)
    VT-c         Classify packets                      Network devices    VMDq, VMDc
35
Hardware Virtual Machine (2/3)
  Architecture
    Without VT-x: guest app in ring 3, guest OS in ring 1, hypervisor in ring 0
    With VT-x: guest app and guest OS run in non-root mode, the hypervisor in root mode; VMLAUNCH/VMRESUME enter the guest
  Intel VT-x
    Supported if CPUID.1:ECX.VMX[bit 5] = 1
    Description                                          Instructions
    Enabling/disabling VMX                               VMXON, VMXOFF
    Launching/resuming a VM                              VMLAUNCH, VMRESUME
    Calling to the VMM                                   VMCALL
    Controlling the Virtual Machine Control Structure    VMPTRLD, VMPTRST, VMREAD, VMWRITE, VMCLEAR
      (VMCS)
    Invalidating translations                            INVEPT, INVVPID
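The CPUID test can be written as a one-line bit check in C. The sketch below keeps that check separate from the actual CPUID query so it stays portable; the query uses the `__get_cpuid` helper from the `<cpuid.h>` header shipped with GCC and Clang and is compiled only on x86 with those compilers. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define DEMO_HAVE_CPUID 1
#else
#define DEMO_HAVE_CPUID 0
#endif

#define DEMO_CPUID1_ECX_VMX_BIT 5   /* CPUID.1:ECX.VMX[bit 5] */

/* Returns 1 if the given CPUID leaf 1 ECX value advertises VMX, else 0. */
int demo_has_vmx(uint32_t cpuid1_ecx)
{
    return (int)((cpuid1_ecx >> DEMO_CPUID1_ECX_VMX_BIT) & 1u);
}

/* Queries the running CPU. Returns 1 or 0, or -1 if CPUID cannot be queried here. */
int demo_host_has_vmx(void)
{
#if DEMO_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;                   /* leaf 1 not supported */
    return demo_has_vmx(ecx);
#else
    return -1;                       /* not an x86 GCC/Clang build */
#endif
}

/* Small sanity check of the bit position. */
void demo_vmx_selfcheck(void)
{
    assert(demo_has_vmx(UINT32_C(1) << 5) == 1);
    assert(demo_has_vmx(0) == 0);
}
```

Note that the bit only reports that the CPU implements VMX; firmware can still leave the feature disabled, which this check does not detect.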
36
Hardware Virtual Machine (3/3)
  Use BIOS code from Bochs
  Replace several functions, e.g. SYSENTER
  HVM devices are provided by QEMU-DM
37
Build Xen - Xen Source Tree
  http://rswiki.csie.org/lxr/http/source/?v=xen-3.4.1
  Parts of the source tree, as annotated on the slide:
    hypervisor
    QEMU-DM, Bootloader, xm, xend, …
    A mini paravirtualized OS
38
Build Xen - Screenshot
39
Build Xen - A Simplest Xen Kernel
  Headers that tell the Xen loader about the OS:
    #include          /* header name lost in the source */
    .section __xen_guest
        .ascii "GUEST_OS=Hacking_Xen_Example"
        .ascii ",XEN_VER=xen-3.0"
        .ascii ",VIRT_BASE=0x0"
        .ascii ",ELF_PADDR_OFFSET=0x0"
        .ascii ",HYPERCALL_PAGE=0x2"
        .ascii ",PAE=yes"
        .ascii ",LOADER=generic"
        .byte 0
  Entry point:
    _start:
        cld
        lss stack_start, %esp
        push %esi
        call start_kernel
  Kernel, which prints through a hypercall:
    void start_kernel(start_info_t *start_info)
    {
        HYPERVISOR_console_io(CONSOLEIO_write, 12, "Hello World\n");
        while (1);
    }
  (Figure: memory layout at 0x0, 0x1000, 0x2000, 0x3000, … with labels _start, shared_info, hypercall_page and stack_start; HYPERCALL_PAGE=0x2 is a page number)
40
Build XCI - Xen Client Initiative (1/2)
  Goals
    Creating a minimal Xen environment, i.e. Xen hypervisor + Linux domain 0, suitable for clients
    Supporting more devices through ioemu
  XCI consists of three subprojects
    Hypervisor (original code + patches + new management tools)
    ioemu (separated from the original Xen source tree)
    Domain-0 Linux
41
Build XCI - Xen Client Initiative (2/2)
  Only x86, ia64 and arm in the "arch" directory
                             Xen                               XCI
    Hypervisor               482 KB                            533 KB
    Kernel version           2.6.18.8                          2.6.27.23
    Kernel source diff       692,054 lines                     5,790,133 lines
    Kernel size              2.22 MB (Dom0), 1.24 MB (DomU)    4.32 MB (Dom0)
    Filesystem and library   Up to you                         uClibc + Busybox, total: 100M/33.9M
42
Experimental Environment
  CPU: Intel Core2 U9400 1.4GHz (using one core)
  Memory: 512MB
  Network interface card: Atheros AR8131 (at 100MBps)
  Hypervisor: Xen 3.4.2
  Dom-0: Linux 2.6.18.8
  Guest OS: Windows XP
  CPU benchmark tools
    Chrome V8 Benchmark Suite
    SuperPI 1.1e
  Hard disk drive benchmark tools
    HD Tune Pro v3.50
  Network benchmark tools
    Iperf (server: 2.0.4, client: 1.7.0)
43
CPU Benchmark (1/2)
  (Chart; value highlighted: 8.3%)
44
CPU Benchmark (2/2)
  (Chart; value highlighted: 5%)
45
Network Benchmark (1/2)
  Testing time: 180 seconds
  Benchmark deviation: 0.12%~0.26%
  (Chart; value highlighted: 59%)
46
Network Benchmark (2/2)
  Sample period: 2 seconds
  Average: 9.82%
47
Conclusion
  We introduced the techniques for virtualization, i.e. full, para- and hardware-assisted virtualization
  We presented the architecture of Xen and introduced several of its parts:
    Part                         Introductions
    Hypervisor                   Boot, Hypercall, Grant Table, Event Channel
    CPU Virtualization           VMLAUNCH, VMRESUME
    Memory Virtualization        Architecture, Translation, Shared Info Page
    I/O Device Virtualization    Device Model (Emulated, PV and Pass-Through), I/O Ring
    Hardware Virtual Machine     Virtualization Technology
48
Q & A