1 Introduction to Xen -A Hypervisor (on x86) Advisor: Chih-Wen Hsueh Student: Tang-Hsun Tu National Taiwan University Graduate Institute of Networking and Multimedia Wireless Networking and Embedded Systems Laboratory Real-Time System Software Group October 20, 2015

2 Outline
- Introduction
  - What Is Virtualization?
  - Why Is Virtualization Difficult?
  - How to Virtualize?
- Xen Architecture
  - Hypervisor
  - CPU Virtualization
  - Memory Virtualization
  - I/O Device Virtualization
  - Hardware-Assisted Virtualization
- Conclusion


4 What Is Virtualization?
Diagram: virtualization (the virtual machine) is used for
- running applications across platforms
- security
- sharing hardware resources
- fully utilizing hardware
- etc.

5 Why Is Virtualization Difficult? (1/2)
- The OS is moved to ring 1 or ring 3
  - 0/1/3 ring layout, e.g. x86_32
  - 0/3/3 ring layout, e.g. x86_64, ARM
- On x86, some instructions are sensitive but cannot be trapped

Diagram: within the OS instructions, privileged instructions and sensitive instructions overlap; the critical instructions are the sensitive ones that are not privileged.

Critical instructions:
Sensitive register instructions  | SGDT, SIDT, SLDT; SMSW; PUSHF(D), POPF(D)
Protection system instructions   | LAR, LSL, VERR, VERW; PUSH, POP; CALL, JMP, INT, RET; STR; MOV

6 Why Is Virtualization Difficult? (2/2) - Examples
- SGDT, SIDT and SLDT
    SGDT m      // save gdtr to memory
    SIDT m      // save idtr to memory
    SLDT r/m16  // save ldtr to a register or memory
  There is only one gdtr, idtr and ldtr on a CPU!
- POP
    POP ss      // needs to satisfy RPL = CPL = DPL
  The CPL of the guest OS changes from 0 to 1 or 3!

7 How to Virtualize? (1/2)
- Full Virtualization: binary translation
- Para Virtualization: hypercall
- Hardware-Assisted Virtualization: Intel VT-x & AMD SVM

8 How to Virtualize? (2/2)
- Hypervisor (VMM) types
  - Type I + microkernel
    - Xen (open source, Citrix)
    - Microsoft Hyper-V
  - Type I + integrated kernel
    - VMware ESX
    - KVM (Kernel-based VM)
  - Type II (host OS + guest OS)
    - VMware GSX, Workstation
    - Microsoft Virtual PC
    - Microsoft Virtual Server
    - Sun VirtualBox
Diagrams: Type I, Type II


10 Xen Architecture (1/2)
Diagram: Domain 0, Domain U, Hypervisor

11 Xen Architecture (2/2)
- Compared to common Linux

Linux                | Xen
System Calls         | Hyper Calls
Signals              | Events
Interrupts           | Physical + Virtual Interrupts
CPU                  | Physical + Virtual CPU
Filesystem           | XenStore
Virtual Memory       | 3-level memory
POSIX Shared Memory  | Grant Tables/Shared Pages

12 Xen Architecture
- Boot
- Hypervisor
  - Hyper Call & System Call
  - Event Channel
  - Grant Table
- CPU Virtualization
  - Virtual CPU Architecture
  - Scheduling
  - Interrupt
- Memory Virtualization
  - Shared Info Page
  - Memory Architecture
  - Translation
- I/O Device Virtualization
  - Split Device Driver
  - Device I/O Ring
- Build System
  - Build Xen
  - Build XCI

13 Boot
- For paravirtualized guest OSes
  - Start in "protected mode"
  - Use the start info page
- Start info page
  - Its address is put in the "esi" register
- For HVM guest OSes
  - Start in "real mode" (emulated BIOS)
  - With QEMU

14 Hypervisor - Hyper Call & System Call (1/2)
- System Call: int 0x80
    // linux/include/asm/unistd.h
    #define __NR_restart_syscall 0
    #define __NR_exit 1
    #define __NR_fork 2
    #define __NR_read 3
    …
- Hyper Call: int 0x82
    // xen/include/public/xen.h
    #define __HYPERVISOR_set_trap_table 0
    #define __HYPERVISOR_mmu_update 1
    #define __HYPERVISOR_set_gdt 2
    #define __HYPERVISOR_stack_switch 3
    …
- The call number is passed in eax
- Diagram (guest OS and hypervisor): the guest OS calls HYPERVISOR_sched_op, which issues int 82h; the hypervisor's hypercall entry indexes hypercall_table, runs do_sched_op, and returns with iret to resume the guest OS
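The table lookup on this slide can be sketched in plain C. This is a toy user-space model, not Xen code: the `toy_` names, the handler bodies, and the `TOY_ENOSYS` value are assumptions made for illustration. Only the call numbers 0 to 3 come from the slide.

```c
#include <stddef.h>

/* Call numbers as listed on the slide; the TOY_ names are hypothetical. */
enum {
    TOY_HC_SET_TRAP_TABLE = 0,
    TOY_HC_MMU_UPDATE     = 1,
    TOY_HC_SET_GDT        = 2,
    TOY_HC_STACK_SWITCH   = 3,
    TOY_NR_HYPERCALLS
};

#define TOY_ENOSYS (-38)   /* assumed error code for an unknown call number */

typedef long (*toy_hypercall_fn)(long arg);

/* Placeholder handlers: each returns 1000 * number + arg so that the
 * dispatch path is easy to observe. Real handlers do actual work. */
long toy_do_set_trap_table(long arg) { return 0 * 1000 + arg; }
long toy_do_mmu_update(long arg)     { return 1 * 1000 + arg; }
long toy_do_set_gdt(long arg)        { return 2 * 1000 + arg; }
long toy_do_stack_switch(long arg)   { return 3 * 1000 + arg; }

/* The table is indexed by the call number, which the guest passes in eax. */
const toy_hypercall_fn toy_hypercall_table[TOY_NR_HYPERCALLS] = {
    toy_do_set_trap_table,
    toy_do_mmu_update,
    toy_do_set_gdt,
    toy_do_stack_switch,
};

/* Models what the int 0x82 entry point does before returning with iret. */
long toy_hypercall(long eax, long arg)
{
    if (eax < 0 || eax >= TOY_NR_HYPERCALLS)
        return TOY_ENOSYS;
    return toy_hypercall_table[eax](arg);
}
```

For example, `toy_hypercall(TOY_HC_MMU_UPDATE, 5)` goes through entry 1 of the table, while an out-of-range number is rejected before any handler runs.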

15 Hypervisor - Hyper Call & System Call (2/2)
- How do system calls work with hyper calls?
  - HVM guests can use SYSENTER/SYSCALL
  - Diagram (rings 3, 1, 0): natively, a user-space application makes a system call to an OS service; under Xen, a user-space application, the guest OS service and the hypervisor are connected by arrows labeled system call, services, hyper call and exception
- How can applications make hyper calls?
  - Diagram: user-space tools (xm, xend) call ioctl() on the privcmd interface (procfs) of the guest OS, which issues the hyper call to the hypervisor

16 Hypervisor - Grant Table
- Grant reference (GR)
  - Grant entry
  - A request with an index
- Used in communication
- Page mapping & page transferring
Diagram (page mapping between Domain A and Domain B): create GR, send GR, map page, access page, unmap page, inform, release GR
Diagram (page transferring between Domain A and Domain B): create GR, send GR, transfer page, receive page, inform, release GR

17 Hypervisor - Event Channel
- A lightweight signal mechanism
  - Uses "ports" as identifiers (pending + mask)
- Four major purposes
  - IPI (inter-processor interrupt)
  - IDC (inter-domain communication)
  - vIRQ (virtual IRQ)
  - pIRQ (physical IRQ)
Diagram: Guest OS (VCPU, event channel ports 0, 1, …); Hypervisor (Virtual CPU, Virtual Memory, Scheduling); Hardware (Physical CPU, Physical Memory, Eth0, Eth1, …)
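The "pending + mask" pair can be pictured as two bitmaps indexed by port. The sketch below is a simplified single-word model with hypothetical `toy_` names; it assumes 32 ports and ignores the per-VCPU selector and atomic operations that a real implementation needs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PORTS 32

struct toy_evtchn {
    uint32_t pending;  /* bit n set: an event is waiting on port n */
    uint32_t mask;     /* bit n set: delivery on port n is suppressed */
};

/* Signal an event on a port. */
void toy_evtchn_send(struct toy_evtchn *e, unsigned port)
{
    assert(port < NR_PORTS);
    e->pending |= UINT32_C(1) << port;
}

void toy_evtchn_mask(struct toy_evtchn *e, unsigned port)
{
    assert(port < NR_PORTS);
    e->mask |= UINT32_C(1) << port;
}

void toy_evtchn_unmask(struct toy_evtchn *e, unsigned port)
{
    assert(port < NR_PORTS);
    e->mask &= ~(UINT32_C(1) << port);
}

/* Return the lowest pending, unmasked port and clear its pending bit,
 * or -1 if nothing is deliverable. */
int toy_evtchn_next(struct toy_evtchn *e)
{
    uint32_t ready = e->pending & ~e->mask;
    for (unsigned port = 0; port < NR_PORTS; port++) {
        if (ready & (UINT32_C(1) << port)) {
            e->pending &= ~(UINT32_C(1) << port);
            return (int)port;
        }
    }
    return -1;
}
```

A masked port keeps its pending bit, so the event is delivered once the port is unmasked rather than being lost.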

18 CPU Virtualization
- Architecture
  - Diagram: applications run on guest OSes; each guest OS has VCPUs; the hypervisor schedules the VCPUs onto PCPUs
- 2 scheduling algorithms (non-work-conserving / work-conserving)
  - Simple Earliest Deadline First (SEDF)
  - Credit

19 CPU Virtualization - Earliest Deadline First
- Assign process priorities according to the deadlines of their current requests
- An example with two processes
  - T1 = (slice, deadline) = (1, 2)
  - T2 = (2, 8)

Timeline (d1, d2: current deadlines of T1 and T2; X: current request already served):
t  | d1 | d2 | runs
0  | 2  | 8  | T1
1  | X  | 8  | T2
2  | 4  | 8  | T1
3  | X  | 8  | T2
4  | 6  | X  | T1
6  | 8  | X  | T1
8  | 10 | 16 | T1
9  | X  | 16 | T2
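The two-process example can be replayed with a short simulation. This is a minimal sketch that assumes each process releases a new request at the start of every period and that the period equals the relative deadline; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task descriptor for the (slice, deadline) pairs on the slide. */
struct edf_task {
    int slice;      /* time units needed per request */
    int period;     /* request interval; also the relative deadline (assumed) */
    int remaining;  /* units still owed for the current request */
    int deadline;   /* absolute deadline of the current request */
};

/* Index of the task with pending work and the earliest deadline,
 * or -1 if every request is already served (the CPU idles). */
int edf_pick(const struct edf_task *tasks, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].remaining <= 0)
            continue;
        if (best < 0 || tasks[i].deadline < tasks[best].deadline)
            best = (int)i;
    }
    return best;
}

/* Run for `ticks` time units and record in ran[t] which task ran at time t. */
void edf_simulate(struct edf_task *tasks, size_t n, int ticks, int *ran)
{
    assert(tasks != NULL && ran != NULL);
    for (size_t i = 0; i < n; i++) {
        tasks[i].remaining = tasks[i].slice;
        tasks[i].deadline = tasks[i].period;
    }
    for (int t = 0; t < ticks; t++) {
        /* Release a new request whenever a task's period starts. */
        for (size_t i = 0; i < n; i++) {
            if (t > 0 && t % tasks[i].period == 0) {
                tasks[i].remaining = tasks[i].slice;
                tasks[i].deadline = t + tasks[i].period;
            }
        }
        ran[t] = edf_pick(tasks, n);
        if (ran[t] >= 0)
            tasks[ran[t]].remaining--;
    }
}
```

With T1 = (1, 2) at index 0 and T2 = (2, 8) at index 1, ten ticks give the run order T1, T2, T1, T2, T1, idle, T1, idle, T1, T2, which agrees with the deadlines shown in the slide's timeline.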

20 CPU Virtualization - SEDF
- Each VCPU is described by (slice, period, deadline)
- Two queues
  - Run queue: VCPU 1, VCPU 2, VCPU 3, VCPU 4, … ordered so that d1 < d2 < d3 < d4 …
  - Wait queue: VCPU 1, VCPU 2, VCPU 3, … ordered so that s1 < s2 < s3 …
- Cannot do load balancing on SMP
  - e.g. 3 domains (A: 80%, B: 80%, C: 30%) on 2 PCPUs
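The 80% / 80% / 30% example can be checked mechanically. The sketch below assumes that "cannot do load balancing" means each domain stays on one PCPU; under that assumption it compares the total demand with an exhaustive search over fixed placements. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PCPUS 8

/* Total demand fits if the sum of loads is at most 100% per PCPU. */
int fits_total(const int *load, size_t n, size_t pcpus)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += load[i];
    return sum <= (int)(100 * pcpus);
}

/* Try to place domain i and all later ones without exceeding 100% anywhere. */
int place_from(const int *load, size_t n, size_t i, int *used, size_t pcpus)
{
    if (i == n)
        return 1;
    for (size_t p = 0; p < pcpus; p++) {
        if (used[p] + load[i] <= 100) {
            used[p] += load[i];
            if (place_from(load, n, i + 1, used, pcpus))
                return 1;
            used[p] -= load[i];   /* backtrack and try the next PCPU */
        }
    }
    return 0;
}

/* 1 if every domain can be pinned to a single PCPU, 0 otherwise. */
int fits_pinned(const int *load, size_t n, size_t pcpus)
{
    int used[MAX_PCPUS] = { 0 };
    assert(pcpus <= MAX_PCPUS);
    return place_from(load, n, 0, used, pcpus);
}
```

For the slide's three domains on 2 PCPUs, `fits_total` succeeds (190% of 200%) but `fits_pinned` fails, because any two of the domains together exceed 100% of one PCPU.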

21 CPU Virtualization - Credit
- Each PCPU has a VCPU list
- Priority queue
  - Two priority states: over and under
    - Over: consumed > allocated
    - Under: consumed < allocated
  - If a PCPU has no "under" VCPU of its own, the hypervisor selects an "under" VCPU from another PCPU
- (weight, cap) -> credit -> under or over
Diagram: priority queue with VCPU 1 to VCPU 4, "under" VCPUs ahead of "over" VCPUs
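The selection rule on this slide (prefer a local "under" VCPU, otherwise take one from another PCPU) can be sketched as follows. The bookkeeping fields, the `toy_` names, the fallback to a local "over" VCPU, and the treatment of consumed == allocated as "over" are assumptions of this sketch, not details given on the slide.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS 4

struct toy_vcpu {
    int id;
    int allocated;  /* credit derived from (weight, cap) */
    int consumed;   /* credit used so far */
};

struct toy_pcpu {
    struct toy_vcpu queue[MAX_VCPUS];
    size_t count;
};

/* "Under": the VCPU has consumed less than it was allocated. */
int toy_is_under(const struct toy_vcpu *v)
{
    return v->consumed < v->allocated;
}

/* First "under" VCPU queued on one PCPU, or NULL if there is none. */
const struct toy_vcpu *toy_first_under(const struct toy_pcpu *p)
{
    for (size_t i = 0; i < p->count; i++)
        if (toy_is_under(&p->queue[i]))
            return &p->queue[i];
    return NULL;
}

/* Choose what PCPU `self` runs next. */
const struct toy_vcpu *toy_credit_pick(const struct toy_pcpu *pcpus,
                                       size_t npcpus, size_t self)
{
    assert(self < npcpus);

    /* 1. A local "under" VCPU has the highest priority. */
    const struct toy_vcpu *v = toy_first_under(&pcpus[self]);
    if (v != NULL)
        return v;

    /* 2. Otherwise look for an "under" VCPU on another PCPU. */
    for (size_t p = 0; p < npcpus; p++) {
        if (p == self)
            continue;
        v = toy_first_under(&pcpus[p]);
        if (v != NULL)
            return v;
    }

    /* 3. Assumed fallback: run a local "over" VCPU, or idle (NULL). */
    return pcpus[self].count > 0 ? &pcpus[self].queue[0] : NULL;
}
```

Step 2 is what gives this scheduler load balancing across PCPUs, in contrast to the SEDF slide.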

22 CPU Virtualization - Interrupt (1/2)
- 8259A
- IOAPIC + LAPIC
Diagram devices: PIT, Keyboard, RTC

23 CPU Virtualization - Interrupt (2/2)
- Physical interrupt
  - For the hypervisor or for guest OSes
- Virtual interrupt
  - Asks guest OSes to handle it
  - 8 for now (max is 24)
Diagram (native): Device -> PIC (IRQn) -> OS ISR
Diagram (Xen): Device -> PIC (IRQn) -> Hypervisor -> event -> Guest OS ISR

24 Memory Virtualization - Memory Architecture (1/2)
- Two-level memory (without a hypervisor)
  - Application: virtual memory
  - OS: physical memory
- Three-level memory: virtual, pseudo-physical, machine
  - Application: virtual memory
  - Guest OS: pseudo-physical memory
  - Hypervisor: machine memory
  - P2M and M2P translate between pseudo-physical and machine memory
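The P2M and M2P tables can be pictured as two arrays that invert each other. The following is a toy model with made-up sizes and `toy_` names, assuming 4 KiB pages and a single guest; it only illustrates the pseudo-physical to machine step of the three-level scheme.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12          /* 4 KiB pages (assumed) */
#define NR_PFNS       4           /* pseudo-physical frames of the toy guest */
#define NR_MFNS       16          /* machine frames of the toy host */
#define INVALID_FRAME UINT32_MAX

struct toy_domain {
    uint32_t p2m[NR_PFNS];        /* pseudo-physical frame -> machine frame */
};

uint32_t toy_m2p[NR_MFNS];        /* machine frame -> pseudo-physical frame */

void toy_init(struct toy_domain *d)
{
    for (uint32_t i = 0; i < NR_PFNS; i++)
        d->p2m[i] = INVALID_FRAME;
    for (uint32_t i = 0; i < NR_MFNS; i++)
        toy_m2p[i] = INVALID_FRAME;
}

/* Record a mapping in both directions so the tables stay consistent. */
void toy_map(struct toy_domain *d, uint32_t pfn, uint32_t mfn)
{
    assert(pfn < NR_PFNS && mfn < NR_MFNS);
    d->p2m[pfn] = mfn;
    toy_m2p[mfn] = pfn;
}

/* Translate a pseudo-physical address to a machine address.
 * Returns UINT64_MAX if the frame is not mapped. */
uint64_t toy_pseudo_to_machine(const struct toy_domain *d, uint64_t paddr)
{
    uint64_t pfn = paddr >> PAGE_SHIFT;
    uint64_t offset = paddr & ((UINT64_C(1) << PAGE_SHIFT) - 1);
    if (pfn >= NR_PFNS || d->p2m[pfn] == INVALID_FRAME)
        return UINT64_MAX;
    return ((uint64_t)d->p2m[pfn] << PAGE_SHIFT) | offset;
}
```

The guest sees frames 0 to 3 as contiguous, while the machine frames behind them can be scattered anywhere in host memory.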

25 Memory Virtualization - Memory Architecture (2/2)
- 168M of memory is reserved for the hypervisor
- Diagram address labels: 0xFFFFFFFF, 0xFC400000, 0xFC000000, Heap

Area                                                   | Size
MPT, Machine-to-Physical Translation Table (RO)        | 16M
Page-Frame Information                                 | 96M
MPT, Machine-to-Physical Translation Table (R/W)       | 16M
Linear Page Table                                      | 8M
Shadow Linear Page Table                               | 8M
Per Domain Mappings                                    | 8M
Direct Map                                             | 12M
I/O Remap                                              | 4M

26 Memory Virtualization - Translation (1/2)
- 4 mechanisms to manipulate page tables
  - Paravirtualized page tables
  - Writable page tables (only level 1 is writable)
  - Shadow page tables
  - Hardware-assisted paging (Intel: Extended Page Tables, AMD: Nested Page Tables)
Diagram: virtual memory, pseudo-physical memory and machine memory, connected by the guest page table (VM->PFN), P2M, and the shadow page table (VM->MFN or VM->P2M), which is filled on page faults; with HAP, the MMU performs second-level paging

27 Memory Virtualization - Translation (2/2)
- Comparison

Type                        | Space Overhead | Computation Overhead | Guest OS Modification | Requires HW Support
Paravirtualized page table  | Low (N)        | Low                  | A lot                 | No
Writable page table         | Low (N)        | High                 | Some                  | No
Shadow page table           | High (2N)      | High                 | None                  | No
Hardware-assisted paging    | Medium (N+M)   | Medium               | None                  | Yes

N is the number of page tables in all guests. M is the number of all guests.

28 Memory Virtualization - Shared Info Page
- Structure
  - Diagram fields: VCPU information (max is 32 VCPUs), event channel, wall clock, memory, TSC
- Compared with start_info_page

             | Start Info Page | Shared Info Page
Mapped by    | Domain Builder  | Guest OS
Information  | Static          | Dynamically Updated

29 I/O Device Virtualization - Device Model
- The hypervisor also provides three mechanisms to use devices
  - Emulated Devices
  - Paravirtualized Driver
  - Pass-through

30 I/O Device Virtualization - Emulated Devices
- Implemented by QEMU (QEMU-DM)
  - e.g. sound cards: ac97, sb16, etc.

31 I/O Device Virtualization - Paravirtualized Driver
- Split Device Driver Model
- An example of sending packets
Diagram: Front-End Driver, Back-End Driver, Native Driver

32 I/O Device Virtualization - I/O Ring
- The ring carries only requests and replies, not the data itself
- An example with GR
Diagram: Dom U and Dom 0 exchange a GR over the device I/O channel; the hypervisor holds the grant table and the active grant table

33 I/O Device Virtualization - Pass-Through
- Pass the device to the guest, which uses it directly
Diagram: Dom 0 and Dom U (with a native driver) above the Hypervisor (Virtual CPU, Virtual Memory, Scheduling) and the Hardware (Physical CPU, Physical Memory, Eth0, Eth1, …)

34 Hardware Virtual Machine (1/3)
- Intel Virtualization Technology

Technology | Description                          | Virtualization   | Implementation
VT-x       | Root/Non-Root, Extended Page Tables  | CPU, Memory      | Instruction Set
VT-i       | As VT-x, for Itanium                 |                  |
VT-d       | DMA, Interrupt                       | Devices          | IOMMU (Chipset)
VT-c       | Classify Packets                     | Network Devices  | VMDq, VMDc

35 Hardware Virtual Machine (2/3)
- Architecture
  - Diagram: rings 0/1/3 (hypervisor, guest OS, guest application) compared with root and non-root modes; the hypervisor runs in root mode and enters the guest OS and guest applications in non-root mode with VMLAUNCH/VMRESUME
- Intel VT-x
  - Supported if CPUID.1:ECX.VMX[bit 5] = 1

Description                                                | Instructions
Enabling/Disabling VMX                                     | VMXON, VMXOFF
Launching/Resuming a VM                                    | VMLAUNCH, VMRESUME
Calling the VMM                                            | VMCALL
Controlling the Virtual Machine Control Structure (VMCS)   | VMPTRLD, VMPTRST, VMREAD, VMWRITE, VMCLEAR
Invalidating Translations                                  | INVEPT, INVVPID
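The support check above is a single bit test. The helper below is a small sketch that takes the ECX value of CPUID leaf 1 as a parameter, so it stays portable; the function and macro names are hypothetical.

```c
#include <stdint.h>

/* CPUID.1:ECX.VMX is bit 5, as stated on the slide. */
#define TOY_CPUID_1_ECX_VMX (UINT32_C(1) << 5)

/* Return 1 if the given CPUID leaf 1 ECX value reports VMX support. */
int toy_vmx_supported(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & TOY_CPUID_1_ECX_VMX) != 0;
}
```

On x86 builds with GCC or Clang, the ECX value can be read with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`; that step is kept out of the block because it is compiler and architecture specific.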

36 Hardware Virtual Machine (3/3)
- Uses BIOS code from Bochs
- Replaces several functions, e.g. SYSENTER
- HVM devices: QEMU-DM

37 Build Xen - Xen Source Tree
- http://rswiki.csie.org/lxr/http/source/?v=xen-3.4.1
Diagram labels on the source tree:
- hypervisor
- QEMU-DM, Bootloader, xm, xend, …
- A mini paravirtualized OS

38 Build Xen - Screenshot

39 Build Xen - A Simplest Xen Kernel
- Headers to tell the Xen loader
    #include
    .section __xen_guest
        .ascii "GUEST_OS=Hacking_Xen_Example"
        .ascii ",XEN_VER=xen-3.0"
        .ascii ",VIRT_BASE=0x0"
        .ascii ",ELF_PADDR_OFFSET=0x0"
        .ascii ",HYPERCALL_PAGE=0x2"    (page number)
        .ascii ",PAE=yes"
        .ascii ",LOADER=generic"
        .byte 0
- Entry point
    _start:
        cld
        lss stack_start, %esp
        push %esi
        call start_kernel
- OS
    void start_kernel(start_info_t *start_info)
    {
        HYPERVISOR_console_io(CONSOLEIO_write, 12, "Hello World\n");   /* hypercall */
        while (1);
    }
Diagram (memory layout, addresses 0x0, 0x1000, 0x2000, 0x3000, …): hypercall_page, shared_info, _start, stack_start

40 Build XCI - Xen Client Initiative (1/2)
- Goals
  - Create a minimal Xen environment, i.e. Xen hypervisor + Linux domain 0, suitable for clients
  - Support more devices through ioemu
- XCI consists of three subprojects
  - Hypervisor (original code + patches + new management tools)
  - ioemu (separated from the original Xen source tree)
  - Domain-0 Linux

41 Build XCI - Xen Client Initiative (2/2)
- Only x86, ia64 and arm are in the "arch" directory

                        | Xen                             | XCI
Hypervisor              | 482 KB                          | 533 KB
Kernel Version          | 2.6.18.8                        | 2.6.27.23
Kernel Source Diff      | 692,054 lines                   | 5,790,133 lines
Kernel Size             | 2.22 MB (Dom0), 1.24 MB (DomU)  | 4.32 MB (Dom0)
Filesystem and Library  | Up to you                       | uClibc + Busybox, Total: 100M/33.9M

42 Experimental Environment
- CPU: Intel Core2 U9400 1.4GHz (use one core)
- Memory: 512MB
- Network Interface Card: Atheros AR8131 (at 100 Mbps)
- Hypervisor: Xen 3.4.2
- Dom-0: Linux 2.6.18.8
- Guest OS: Windows XP
- CPU Benchmark Tools
  - Chrome V8 Benchmark Suite
  - SuperPI 1.1e
- Hard Disk Drive Benchmark Tools
  - HD Tune Pro v3.50
- Network Benchmark Tools
  - Iperf (Server: 2.0.4, Client: 1.7.0)

43 CPU Benchmark (1/2)
Chart label: 8.3%

44 CPU Benchmark (2/2)
Chart label: 5%

45 Network Benchmark (1/2)
- Testing Time: 180 seconds
- Benchmark Deviation: 0.12%~0.26%
Chart label: 59%

46 Network Benchmark (2/2)
- Sample Period: 2 seconds
- Average: 9.82%

47 Conclusion
- We introduced the techniques for virtualization, i.e. full, para and hardware-assisted virtualization
- We presented the architecture of Xen
- Several parts of Xen were also introduced

Part                      | Introductions
Hypervisor                | Boot, Hyper Call, Grant Table, Event Channel
CPU Virtualization        | VMLAUNCH, VMRESUME
Memory Virtualization     | Architecture, Translation, Shared Info Page
I/O Device Virtualization | Device Model (Emulated, PV and Pass-Through), I/O Ring
Hardware Virtual Machine  | Virtualization Technology

48 Q & A

