Virtualisation, Front Side Buses, SMP Systems. COMP311 2007, Jamie Curtis

Virtualisation
- Allows a "host" OS to execute a "guest" OS as a task
- No need for the guest OS's cooperation
- The host is often called a hypervisor or VMM (Virtual Machine Monitor)
- The VMM controls devices, and often presents virtual devices to the guest OSes

Virtualisation cont.
- The PC architecture makes this tricky
- Protected mode introduced privilege rings: 4 levels (0, 1, 2, 3) with decreasing privilege
- Operating systems assume full control (ring 0), with userspace at ring 3
- There is no control on reading privileged state
- Some instructions silently fail when run at the wrong privilege level instead of causing a fault, so the VMM cannot catch them by trapping alone (see the sketch below)
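For example, x86's SMSW instruction exposes privileged state (the low bits of CR0) from ring 3 without faulting; it is one of the classic "sensitive but unprivileged" instructions that make pure trap-and-emulate virtualisation of the PC architecture impossible. A minimal demonstration in C with GCC inline assembly (x86 only):

```c
#include <stdio.h>

int main(void)
{
    unsigned long msw = 0;
    /* SMSW stores the machine status word (the low bits of CR0).
     * It executes in ring 3 without faulting, so a guest can observe
     * privileged state and the VMM never gets a chance to intercept it. */
    __asm__ volatile ("smsw %0" : "=r" (msw));
    printf("CR0 low bits visible from user mode: 0x%lx\n", msw);
    return 0;
}
```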

Virtualisation cont.
- Four approaches to fix this:
  - Emulation
  - Paravirtualisation
  - Binary translation
  - Hardware assisted

Emulation
- Fake the entire machine
- Very slow, but doesn't require the host to have the same architecture as the guest

Paravirtualisation
- Alter the guest so that it doesn't use the "bad" instructions; the guest instead calls out to the VMM (a hypercall, sketched below)
- Very fast, with minimal overhead
- Requires support from every guest OS, so it is effectively limited to open-source OSes
- Championed by Xen
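Conceptually, a paravirtualised kernel replaces each privileged operation with an explicit trap into the VMM. A minimal sketch, assuming a hypothetical hypercall ABI (the operation number and trap vector here are illustrative, not the real Xen interface):

```c
/* Hypothetical paravirtualised guest code: instead of loading CR3
 * directly (a privileged instruction the guest may no longer execute),
 * the guest asks the VMM to do it via a software-interrupt hypercall. */
static inline long hypercall_set_page_table(unsigned long new_cr3)
{
    long ret;
    __asm__ volatile ("int $0x82"                 /* trap into the VMM */
                      : "=a" (ret)
                      : "0" (1L),                 /* hypothetical SET_CR3 op */
                        "b" (new_cr3)
                      : "memory");
    return ret;
}
```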

Binary Translation
- The VMM dynamically translates the instruction stream before it is executed, replacing "bad" instructions as it goes (a toy version is sketched below)
- Many optimisations make this far less costly than it sounds
- Allows running unmodified OSes
- The performance hit can be anywhere from 5% to 60%, depending on workload
- Championed by VMware
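The control flow is easiest to show on an invented fixed-width instruction set; real translators such as VMware's decode variable-length x86, cache translated basic blocks, and chain them so hot code is translated only once. This toy sketch just illustrates the copy-through-or-rewrite decision:

```c
/* Toy dynamic binary translator over an invented fixed-width ISA. */
enum { OP_SAFE, OP_PRIVILEGED, OP_CALL_VMM };

typedef struct { int op; int arg; } insn_t;

/* Translate one basic block: safe instructions are copied through
 * unchanged; privileged ones are rewritten into explicit call-outs
 * so the VMM can emulate their effect. Returns the emitted length. */
int translate_block(const insn_t *in, int n, insn_t *out)
{
    int emitted = 0;
    for (int i = 0; i < n; i++) {
        if (in[i].op == OP_PRIVILEGED)
            out[emitted++] = (insn_t){ OP_CALL_VMM, in[i].arg };
        else
            out[emitted++] = in[i];
    }
    return emitted;
}
```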

Hardware Virtualisation
- Adds another privilege level for the VMM
- Hardware maintains state for each guest
- The VMM gets very flexible control over what causes faults
- Intel and AMD again have similar (but incompatible!) specifications: AMD "Pacifica", Intel "Vanderpool"

Hardware Virtualisation cont.
- Initial solutions don't handle enough in hardware, so they are often slower than binary translation
- Transitions into and out of guests are very slow
- VMware and Xen have both added support
- Hardware support is becoming more complete in newer versions

Virtual I/O Devices
- Typically VMMs provide virtual hardware devices to the guests
- These may then map onto real hardware inside the VMM
- Allowing secure, isolated direct access to real hardware is difficult

AMD 10h Family
- AMD's latest architecture adds a number of important virtualisation enhancements, primarily focused on reducing the number of calls into the VMM
- Three main enhancements:
  - Nested Page Tables
  - Tagged TLB
  - Device Exclusion Vector (DEV)

Nested Page Tables
- Current designs use shadow page tables: the MMU is set up for the current guest/process, and changes to the MMU are trapped by the VMM
- Nested page tables instead replicate the page table per VM
- Looks similar to the virtual address space the OS provides to processes, just one more level up
- Makes lookups through the page table slower (a worked cost estimate follows)
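How much slower: with nested paging, every guest-physical address produced during the guest's page walk, including the final one, must itself be translated through the host page table. A back-of-the-envelope count, assuming 4-level tables on both sides (this matches the commonly cited worst case of 24 memory references, versus 4 for a native walk):

```c
#include <stdio.h>

/* Worst-case memory references for a two-dimensional (nested) walk:
 * each of the g guest levels costs one host walk (h refs) plus the
 * guest-entry read itself, and the final guest-physical address needs
 * one more host walk: g*(h + 1) + h = g*h + g + h. */
static int nested_walk_refs(int guest_levels, int host_levels)
{
    return guest_levels * host_levels + guest_levels + host_levels;
}

int main(void)
{
    printf("native 4-level walk:  4 refs\n");
    printf("nested 4-over-4 walk: %d refs\n", nested_walk_refs(4, 4));
    return 0;
}
```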

Tagged TLB
- TLBs cache lookups through the page tables
- Typically they are flushed on every process or VM switch
- A tagged TLB adds a tag to each entry recording which VM it belongs to
- The TLB no longer has to be flushed when switching VMs
- Makes VM-to-VMM-to-VM transitions cheaper
- Reduces the performance hit of nested page tables (a lookup sketch follows)
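A sketch of the idea (an illustrative data structure, not AMD's actual implementation): a lookup hits only when both the virtual page number and the current VM's tag match, so entries belonging to other VMs can simply stay resident across a world switch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t vpn;   /* virtual page number */
    uint64_t pfn;   /* physical frame number it maps to */
    uint16_t tag;   /* which VM / address space installed this entry */
    bool     valid;
} tlb_entry_t;

/* Hit only if the VPN matches AND the entry belongs to the currently
 * running VM; stale entries from other VMs are ignored, not flushed. */
static bool tlb_lookup(const tlb_entry_t *tlb, int n,
                       uint64_t vpn, uint16_t cur_tag, uint64_t *pfn)
{
    for (int i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].tag == cur_tag) {
            *pfn = tlb[i].pfn;
            return true;
        }
    }
    return false;   /* miss: fall back to the (nested) page-table walk */
}
```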

Device Exclusion Vector
- Contains a mapping of which pages a device may access
- Allows the VMM to grant DMA access to specific pages only
- Allows a device to DMA directly into a specific VM's memory space (see the bitmap sketch below)
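The vector is essentially one bit per physical page, checked on every DMA access. A sketch under the assumption that a set bit excludes (blocks) the device, as the name suggests; treat the bit polarity as an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KB pages */

/* One bit per physical page; assumption: bit set = device excluded. */
static bool dev_dma_allowed(const uint8_t *dev_bitmap, uint64_t phys_addr)
{
    uint64_t pfn = phys_addr >> PAGE_SHIFT;
    bool excluded = (dev_bitmap[pfn / 8] >> (pfn % 8)) & 1;
    return !excluded;   /* VMM clears bits for pages granted to the device */
}
```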

Virtualisation Future
- Linux now has four full virtualisation packages: VMware, Xen, KVM, Lguest
- Intel and AMD are working on next-generation hardware support
- Dense, multi-core systems make virtualisation very attractive: power- and space-efficient
- Keeps getting more and more important

Traditional Intel Architecture
[Diagram: CPU connected over the Front Side Bus to the Northbridge, with the Southbridge behind it]

Clock vs. Data Rates
- Clock rates no longer equal data rates; watch specifications, as the two can look identical
- e.g. Intel's FSB is normally referred to as a 1333MHz bus, but it is actually a 333MHz clock with a quad-pumped data rate, so it should really be quoted as 1333MT/s

Frontside Bus & Dual Core
- 64-bit, quad-pumped at 333MHz: 8 * 4 * 333 = 10.6GB/s
- Half duplex
- The traditional P4 FSB has been point-to-point
- Xeon SMP requires chipsets that support a proper multi-point bus for the FSB
- How did a dual-core Pentium D work? (the bandwidth arithmetic is sketched below)
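The peak-bandwidth figures used throughout these slides all come from the same formula: width in bytes, times transfers per clock, times clock rate. A quick checker (the DDR and HyperTransport lines anticipate numbers from later slides):

```c
#include <stdio.h>

/* Peak bandwidth in MB/s = width (bytes) * transfers/clock * clock (MHz).
 * Quad-pumped 64-bit FSB at 333 MHz: 8 * 4 * 333, i.e. ~10.6 GB/s. */
static double peak_mbs(int width_bytes, int transfers_per_clock,
                       double clock_mhz)
{
    return width_bytes * transfers_per_clock * clock_mhz;
}

int main(void)
{
    printf("P4 FSB (64-bit, x4, 333 MHz):  %.0f MB/s\n", peak_mbs(8, 4, 333));
    printf("DDR-400 channel (64-bit, x2):  %.0f MB/s\n", peak_mbs(8, 2, 200));
    printf("HT link (32-bit, x2, 1.4 GHz): %.0f MB/s per direction\n",
           peak_mbs(4, 2, 1400));
    return 0;
}
```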

Intel Dual Core
- Slap two cores on the same die!
- How do they communicate? Over the FSB, like a Xeon!
- Requires a new chipset to support it

Caches
- There are two sets of L1 and L2 cache, one for each core
- What happens if both cores cache the same memory location?
- Need a protocol to make sure this doesn't cause problems: a cache-coherency protocol

Cache Coherency
- Intel uses the MESI protocol:
  - Modified: this cache holds the only copy of the location, and it is modified
  - Exclusive: this cache holds the only copy of the location, unmodified
  - Shared: two or more caches hold unmodified copies of the location
  - Invalid: this cache's copy is not valid (e.g. another cache holds a modified copy)

MESI
- What happens if a core tries to access a location for which it holds an Invalid entry?
- The cache with the Modified copy must write it back to memory, and both copies become Shared
- This means an Intel Pentium D has to involve the FSB in all of its cache-coherency updates, even though both cores sit on the same die (see the state sketch below)
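A compact way to see the protocol is as per-line state transitions driven by local accesses and by requests snooped off the bus. A simplified sketch (real implementations also order write-backs, handle cache-to-cache intervention, and so on):

```c
/* Minimal MESI state machine for one cache line, from the point of
 * view of a single cache. Simplified: data movement is only noted in
 * comments. */
typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_t;

mesi_t on_local_read(mesi_t s, int other_caches_have_copy)
{
    if (s == INVALID)   /* read miss: fetch over the bus */
        return other_caches_have_copy ? SHARED : EXCLUSIVE;
    return s;           /* M, E and S all satisfy reads locally */
}

mesi_t on_local_write(mesi_t s)
{
    (void)s;            /* from any state: take ownership; a bus message
                           invalidates all other copies */
    return MODIFIED;
}

mesi_t on_snooped_read(mesi_t s)
{
    if (s == MODIFIED)  /* must write the dirty data back first */
        return SHARED;
    if (s == EXCLUSIVE)
        return SHARED;
    return s;
}

mesi_t on_snooped_write(mesi_t s)
{
    (void)s;            /* another cache took ownership of the line */
    return INVALID;
}
```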

Intel Core
- The Core architecture is designed for dual-core
- The L2 cache is now shared between the cores
- Cache coherency now runs between each core's L1 and the shared L2
[Diagram: Core 1 and Core 2 above shared L2 control, L2 cache and bus control, connected to the FSB]

Intel Core 2 Quad
- The current quad-core design packages two dual-core dies together
- Cache coherency between the dies again happens over the FSB

AMD K8 Architecture
- Integrated memory controller: very low latency
- The CPU determines the memory technology, so moving to a new type of memory requires changing both CPU and motherboard
- Athlon 64, Socket 754: single-channel 64-bit DDR at 200MHz; 8 * 2 * 200 = 3.2GB/s

AMD K8 Memory Interfaces
- Athlon 64, Socket 939: dual-channel 64-bit DDR at 200MHz; 2 * 8 * 2 * 200 = 6.4GB/s
- Athlon 64, Socket AM2: dual-channel 64-bit DDR2 at 400MHz; 2 * 8 * 2 * 400 = 12.8GB/s
- Opteron, Socket 940: dual-channel 64-bit DDR at 200MHz; 2 * 8 * 2 * 200 = 6.4GB/s

AMD K8 FSB
- AMD uses HyperTransport (not to be confused with the very different Intel Hyper-Threading Technology)
- A packet-based, point-to-point link designed for the lowest possible latency
- Full-duplex, bidirectional link
- Available in 2-, 4-, 8-, 16- or 32-bit widths
- Clock rates from 50MHz to 1.4GHz
- Clock rates and bit widths can be asymmetric between the two directions
- Double-pumped data rate: 1.4GHz * 2 * 4 bytes = 11.2GB/s per direction on a 32-bit link

Athlon 64 FSB
- The Athlon 64 has a single HyperTransport link connecting it to the I/O subsystem
- 16-bit, bidirectional, DDR at 800MHz (Socket 754) or 1GHz (Socket 939)
- 2 * 2 * 2 * 1000 = 8GB/s (both directions combined)
- Can use either a single-chip solution or stay with the two-chip northbridge/southbridge combination

Opteron
- Uses HT for both the I/O interconnect and the CPU interconnect
- The CPU interconnect requires an additional cache-coherency extension to HT
- Comes in three variants:
  - 1xx: memory controller and 3 HT links
  - 2xx: memory controller and 3 HT links
  - 8xx: memory controller and 3 HT links

Opteron cont.
- What makes them different? The number of HT buses that support the CC protocol:
  - 1xx: zero
  - 2xx: one
  - 8xx: three
- This is what lets them scale to different processor counts: 1xx is 1-way, 2xx is 2-way, 8xx is 4- or 8-way

AMD MOESI
- AMD's cache coherency is slightly different from Intel's, adding an Owned state
- Owned: this cache owns the memory location, and it (not memory) services all requests for it from other caches
- The request travels across the high-speed, dedicated HT bus (see the sketch below)
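Extending the earlier MESI sketch with the Owned state shows the difference on a snooped read: a Modified line can hand its data directly to the requester and stay dirty, instead of writing back to memory first. A simplified sketch:

```c
/* MOESI snooped-read handling for one cache line. The Owned state lets
 * the owning cache service reads cache-to-cache (over HT) while the
 * data remains dirty with respect to memory. Simplified sketch. */
typedef enum { M_MOD, M_OWNED, M_EXCL, M_SHARED, M_INVALID } moesi_t;

moesi_t moesi_on_snooped_read(moesi_t s, int *supplies_data)
{
    *supplies_data = 0;
    switch (s) {
    case M_MOD:             /* become owner; supply data, skip write-back */
        *supplies_data = 1;
        return M_OWNED;
    case M_OWNED:           /* already owner: keep servicing reads */
        *supplies_data = 1;
        return M_OWNED;
    case M_EXCL:            /* clean copy now shared with the requester */
        return M_SHARED;
    default:                /* SHARED and INVALID are unaffected */
        return s;
    }
}
```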

AMD Dual Core
- In the dual-core situation, it becomes even better:

AMD Dual Core cont.
- Requests now go across the System Request Interface, which runs at the CPU core frequency
- The System Request Interface also controls HT-to-memory access, as well as HT-to-HT communication in 2xx and 8xx Opterons

AMD Quad Core
- AMD again targeted a native quad-core design rather than the dual-die, dual-core approach
- Introduced a shared L3 cache