Agile Paging: Exceeding the Best of Nested and Shadow Paging


Agile Paging: Exceeding the Best of Nested and Shadow Paging Jayneel Gandhi, Mark D. Hill, Michael M. Swift

Executive Summary
Problem: Virtualization is valuable but has high overheads with larger workloads (up to 70% slower than native).
Existing choices:
  Nested Paging: slow page walk but fast page table updates
  Shadow Paging: fast page walk but slow page table updates
Can we get the best of both for the same address space (or the same page walk)?
Yes, Agile Paging: use shadow paging and sometimes switch to nested paging within the same page walk (up to 4% slower than native).

Outline: Motivation, Agile Paging, Results, Summary

Virtualization Overview
[Diagram: applications (APP) run on a guest OS, which runs on the VMM, which runs on the hardware.]
Benefits:
  Foundation of our cloud infrastructure
  Provides on-demand virtual instances
  Helps server consolidation
Problem: Overheads of virtualizing memory are high (up to 70% slower than unvirtualized).

Virtualizing Memory
[Diagram: an application issues a guest virtual address (gVA). The guest OS's guest page table translates it to a guest physical address (gPA). The VMM's nested page table translates the gPA to a host physical address (hPA) used by the hardware.]

Virtualizing Memory
[Diagram: gVA -> guest page table -> gPA -> nested page table]
Two techniques to manage both page tables:
  1. Nested Paging (hardware)
  2. Shadow Paging (software)
Evaluated on two axes: page walk latency and page table updates.

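Unvirtualized x86-64 Translation
[Diagram: applications run on the OS; the hardware walks the page table rooted at CR3 to translate a virtual address (VA) to a physical address (PA).]
At most 4 memory accesses per page walk.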

1. Nested Paging (Hardware)
[Diagram: two-dimensional walk from gVA through the guest page table (rooted at gCR3) and the nested page table to hPA.]
Longer page walk: at most 5 + 5 + 5 + 5 + 4 = 24 memory accesses.
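
The 24-access bound can be reproduced by counting. Each of the four guest levels costs one nested walk (4 accesses) to locate the guest table plus one access to read the guest entry, and the final guest physical address needs one more nested walk. The sketch below is only an illustration of that count; the function names and parameters are assumptions, not part of the slides.

```python
def native_walk_accesses(levels: int = 4) -> int:
    """Worst-case memory accesses for an unvirtualized walk: one per level."""
    return levels


def nested_walk_accesses(guest_levels: int = 4, nested_levels: int = 4) -> int:
    """Worst-case memory accesses for a two-dimensional (nested) page walk."""
    # Every guest-level pointer (starting with gCR3) is a guest physical
    # address, so reaching each guest table costs a full nested walk, plus
    # one access to read the guest entry itself.
    per_guest_level = nested_levels + 1
    # The guest leaf entry yields a gPA that needs one last nested walk.
    final_translation = nested_levels
    return guest_levels * per_guest_level + final_translation


print(native_walk_accesses())   # 4
print(nested_walk_accesses())   # 5 + 5 + 5 + 5 + 4 = 24
```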

2. Shadow Paging (Software)
[Diagram: applications issue gVAs. The guest OS's guest page table is marked read-only (RO). The VMM maintains a shadow page table next to the nested page table, and the hardware uses the shadow page table to reach hPA.]

2. Shadow Paging (Software)
[Diagram: the shadow page table, rooted at sCR3, maps gVA directly to hPA. It is built from the read-only guest page table and the nested page table.]
Shorter page walk: at most 4 memory accesses.

Page Table Updates
1. Nested Paging: the guest writes its guest page table directly (gVA -> gPA), and the nested page table (gPA -> hPA) is unaffected. In-place fast update.
2. Shadow Paging: the guest page table is read-only, so every write traps to the VMM, which must also update the shadow page table. Slow mediated update.
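
The difference in update cost can be made concrete with a toy model. This is a sketch under simplifying assumptions: page tables are flat dictionaries keyed by page number, and the class and method names are hypothetical. It only counts VMM traps and does not model real hardware.

```python
class NestedPagingVM:
    """Toy model: hardware walks guest and nested tables on every miss."""

    def __init__(self, nested_table):
        self.guest_table = {}             # gVA page -> gPA page
        self.nested_table = nested_table  # gPA page -> hPA page
        self.vmm_traps = 0

    def guest_write(self, gva, gpa):
        # The guest page table is writable: in-place update, no VMM trap.
        self.guest_table[gva] = gpa

    def translate(self, gva):
        return self.nested_table[self.guest_table[gva]]


class ShadowPagingVM(NestedPagingVM):
    """Toy model: the VMM keeps a merged gVA -> hPA shadow table."""

    def __init__(self, nested_table):
        super().__init__(nested_table)
        self.shadow_table = {}            # gVA page -> hPA page

    def guest_write(self, gva, gpa):
        # The guest page table is read-only, so the write traps to the VMM,
        # which applies it and brings the shadow table back in sync.
        self.vmm_traps += 1
        self.guest_table[gva] = gpa
        self.shadow_table[gva] = self.nested_table[gpa]

    def translate(self, gva):
        # One-dimensional walk over the shadow table only.
        return self.shadow_table[gva]


nested = {0x10: 0xA0, 0x11: 0xA1}
for vm in (NestedPagingVM(nested), ShadowPagingVM(nested)):
    vm.guest_write(0x1, 0x10)
    vm.guest_write(0x1, 0x11)   # remap the same guest page
    print(type(vm).__name__, hex(vm.translate(0x1)), "traps:", vm.vmm_traps)
```

Both models end with the same translation; only the shadow model pays a trap per guest page table write.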

Key Observation
Fully static guest virtual address space: shadow paging preferred.
Fully dynamic guest virtual address space: nested paging preferred.
Reality: only a small fraction of the address space is dynamic.

Key Observation
[Diagram: the guest page table rooted at gCR3, with some parts marked Nested and others marked Shadow.]

Outline: Motivation, Agile Paging, Results, Summary

Agile Paging
Start the page walk in shadow mode: TLB misses are handled fast.
Optionally switch to nested mode: page table updates are fast and in place.
Two parts of the design:
  1. Mechanism
  2. Policy

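1. Mechanism
[Diagram: the walk from gVA starts at sCR3 in the shadow page table. A shadow entry marked with a 1 hands the walk over to the guest page table (rooted at gCR3) and the nested page table, which complete the translation through gPA to hPA. The part of the guest page table covered by the shadow page table is read-only.]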

1. Mechanism: Example Page Walk
[Diagram: the walk from gVA starts at sCR3 in shadow mode and switches modes at level 4 of the guest page table, ending at hPA.]
At most 1 + 1 + 1 + 5 = 8 memory accesses.
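
The example count generalizes to any switch point. The sketch below assumes that each level walked in shadow mode costs one access and that each guest level walked in nested mode costs five (one read of the guest entry plus a 4-access nested walk of the guest physical address it holds). That reading of the "5" in the sum is an interpretation, and the function name is hypothetical.

```python
GUEST_LEVELS = 4    # x86-64 four-level page table
NESTED_LEVELS = 4


def agile_walk_accesses(shadow_levels: int) -> int:
    """Worst-case accesses when the first `shadow_levels` levels are walked
    in shadow mode and the remaining guest levels in nested mode."""
    if not 0 <= shadow_levels <= GUEST_LEVELS:
        raise ValueError("shadow_levels must be between 0 and 4")
    nested_mode_levels = GUEST_LEVELS - shadow_levels
    # Shadow entries hold host physical addresses: one access per level.
    # Each level handled in nested mode costs one guest-entry read plus a
    # nested walk of the guest physical address stored in that entry.
    return shadow_levels + nested_mode_levels * (1 + NESTED_LEVELS)


for shadow_levels in range(GUEST_LEVELS, -1, -1):
    print(shadow_levels, "shadow levels ->", agile_walk_accesses(shadow_levels))
```

With three shadow levels this gives 1 + 1 + 1 + 5 = 8, matching the example; a walk that stays in shadow mode costs 4.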

2. Policy: Shadow -> Nested
Start in shadow mode.
First write to the page table (VMM trap): stay in shadow mode, with 1 write recorded.
Second write to the page table (VMM trap): switch to nested mode.
Subsequent writes in nested mode: no VMM traps.

2. Policy: Nested -> Shadow
Start in shadow mode. Two writes to the page table (each a VMM trap) switch it to nested mode, where subsequent writes cause no VMM traps.
Use dirty bits to track writes to the guest page table.
On timeout, move non-dirty parts back to shadow mode.
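
The two policy slides together describe a small state machine. The sketch below is one possible encoding, not the authors' implementation: the class names, the per-page granularity, and clearing the dirty flag at each timeout are all assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    SHADOW = "shadow"
    SHADOW_ONE_WRITE = "shadow (1 write)"
    NESTED = "nested"


class GuestPageTablePage:
    """Hypothetical per-page policy state kept by the VMM."""

    def __init__(self):
        self.mode = Mode.SHADOW
        self.dirty = False      # stands in for the hardware dirty bit
        self.vmm_traps = 0

    def guest_write(self):
        if self.mode is Mode.NESTED:
            # In-place update: no trap, the page is only marked dirty.
            self.dirty = True
            return
        # Shadow mode: the page is read-only, so the write traps.
        self.vmm_traps += 1
        if self.mode is Mode.SHADOW:
            self.mode = Mode.SHADOW_ONE_WRITE
        else:
            self.mode = Mode.NESTED   # second write: switch to nested mode

    def on_timeout(self):
        # Periodic check: pages not written since the last check go back
        # to shadow mode; dirty pages stay nested with the flag cleared.
        if self.mode is Mode.NESTED:
            if not self.dirty:
                self.mode = Mode.SHADOW
            self.dirty = False


page = GuestPageTablePage()
for _ in range(5):
    page.guest_write()
print(page.mode, page.vmm_traps)   # nested after two trapping writes
page.on_timeout()                  # written recently: stays nested
page.on_timeout()                  # no writes since last check: back to shadow
print(page.mode)
```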

Outline: Motivation, Agile Paging, Results, Summary

Methodology
Measure the cost of page walks on real hardware:
  Intel 12-core Sandy Bridge with 96 GB memory
  64-entry L1 TLB + 512-entry L2 TLB, 4-way associative, for 4 KB pages
  32-entry L1 TLB, 4-way associative, for 2 MB pages
Prototype VMM and emulate hardware in Linux v3.12.13:
  BadgerTrap for online analysis of TLB misses and to emulate agile paging
  Linear model to predict performance
Workloads: big-memory workloads, SPEC 2006, BioBench, PARSEC

Performance Results
[Figure: bar chart. B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging. Solid bottom bar: page walk overhead. Hashed top bar: VMM overheads.]
Modeled based on emulator: BadgerTrap.
Measured using performance counters.

Performance Results
Nested paging has high overheads from TLB misses: the effect of the longer page walk.
[Figure: bar chart. B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging. Solid bottom bar: page walk overhead. Hashed top bar: VMM overheads. Bar labels shown: 28%, 19%, 18%, 6%.]

Performance Results
Shadow paging has high overheads from VMM interventions.
[Figure: bar chart. B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging. Solid bottom bar: page walk overhead. Hashed top bar: VMM overheads. Bar labels shown: 28%, 70%, 11%, 19%, 30%, 18%, 6%, 6%.]

Performance Results
Agile paging consistently performs better than both techniques.
[Figure: bar chart. B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging. Solid bottom bar: page walk overhead. Hashed top bar: VMM overheads. Bar labels shown: 28%, 70%, 11%, 2%, 19%, 30%, 18%, 6%, 2%, 4%, 6%, 3%.]

Summary
Problem: Virtualization is valuable but has high overheads with larger workloads (up to 70% slower than native).
Existing choices:
  Nested Paging: slow page walk but fast page table updates
  Shadow Paging: fast page walk but slow page table updates
Can we get the best of both for the same address space (or the same page walk)?
Yes, Agile Paging: use shadow paging and sometimes switch to nested paging within the same page walk (up to 4% slower than native).

Questions?

Can we get the best of both worlds?

                        Nested Paging    Shadow Paging        Agile Paging
Dimensions              2D               1D
# of memory accesses    24               4                    ~4-5
Page table updates      Fast in-place    Slow out-of-place

Short-Lived Processes
Issue: The cost of creating a shadow page table is high.
Solution:
  Start shadow mode after 1 sec for agile paging
  Give user mode access to run only in nested mode

Accessed/Dirty Bits
Issue: Shadow mode is slow for setting A/D bits. Keeping the shadow and guest page tables coherent causes VMM traps.
Solution (hardware optimization):
  Intel sets accessed/dirty bits on both guest and nested page tables
  Broadwell supports multiple page table walkers per core
  We propose that hardware write A/D bits on all three page tables

Context Switches
Issue: Intra-guest context switches with shadow mode are slower. The guest OS does not know that the shadow page table exists, so each switch causes a VMM trap.
Solution (hardware optimization):
  Add a small VMM-managed cache of guest CR3 -> shadow CR3 mappings
  Hardware looks up the cache for a matching entry on a context switch
  On a hit, no VMM trap is required
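
The proposed cache can be sketched in software to show the intended behavior. The slides do not give its size or replacement policy, so the capacity, the LRU eviction, and all names below are assumptions. The `vmm_lookup` callback stands in for the trap to the VMM.

```python
from collections import OrderedDict


class Cr3Cache:
    """Small gCR3 -> sCR3 cache; LRU replacement is an assumption."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.vmm_traps = 0

    def context_switch(self, gcr3: int, vmm_lookup) -> int:
        if gcr3 in self.entries:
            self.entries.move_to_end(gcr3)    # hit: no VMM trap needed
            return self.entries[gcr3]
        # Miss: trap to the VMM, which finds the shadow root and fills the cache.
        self.vmm_traps += 1
        scr3 = vmm_lookup(gcr3)
        self.entries[gcr3] = scr3
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return scr3


# Hypothetical shadow roots maintained by the VMM for two guest processes.
shadow_roots = {0x1000: 0x9000, 0x2000: 0xA000}
cache = Cr3Cache(capacity=2)
for gcr3 in (0x1000, 0x2000, 0x1000, 0x2000):
    cache.context_switch(gcr3, shadow_roots.__getitem__)
print(cache.vmm_traps)   # only the first switch to each process traps
```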

Why does agile paging work?

Switch level    Shadow    L4      L3      L2    L1    Nested    Avg.
Mem. acc.       4         8       12      16    20    24
graph500        99.8%     0.2%    -                             4.01
memcached       88.2%     4.5%    7.3%                          4.76
canneal         94.7%     4.6%    0.7%                          4.24
dedup           91.4%     2.2%    6.4%                          4.60

Brings the average number of memory accesses down to ~4-5 from 24.
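
The "Avg." column can be checked as a weighted average of the per-level access counts. The sketch below assumes that each percentage is the share of page walks that switch at that level and that the three percentages per workload belong to the Shadow, L4, and L3 columns, which is the reading under which the averages work out.

```python
# Worst-case memory accesses per switch level, from the slide.
ACCESSES = {"Shadow": 4, "L4": 8, "L3": 12, "L2": 16, "L1": 20, "Nested": 24}

# Share of page walks (in percent) per switch level, from the slide.
WORKLOADS = {
    "graph500":  {"Shadow": 99.8, "L4": 0.2},
    "memcached": {"Shadow": 88.2, "L4": 4.5, "L3": 7.3},
    "canneal":   {"Shadow": 94.7, "L4": 4.6, "L3": 0.7},
    "dedup":     {"Shadow": 91.4, "L4": 2.2, "L3": 6.4},
}


def average_accesses(mix):
    """Weighted average of accesses per walk for one workload."""
    return sum(pct * ACCESSES[level] for level, pct in mix.items()) / 100


for name, mix in WORKLOADS.items():
    print(f"{name}: {average_accesses(mix):.2f}")
```

For memcached, for example, 0.882 * 4 + 0.045 * 8 + 0.073 * 12 = 4.764, which rounds to the 4.76 in the table.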

Transparent Huge Page (2MB)
[Figure: bar chart. B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging. Solid bottom bar: page walk overhead. Hashed top bar: VMM overheads. Bar labels shown: 68%, 13%, 14%, 4%, 2%, 14%, 5%, 2%, 10%, 6%, 3%, 2%.]

Design Components
Hardware:
  Three page table pointers: point to each of the page tables
  Enhanced page table walker: interprets the switching bit and bridges the two state machines
VMM:
  Manage three page tables: incremental from shadow paging
  Policies for changing modes: encapsulate policies in the VMM