Address Translation with Paging Case studies for X86, SPARC, and PowerPC.

Address Translation with Paging Case studies for X86, SPARC, and PowerPC

Overview
- Page tables
  – What are they? (review)
  – What does a page table entry (PTE) contain?
  – How are page tables organized?
- Making page table access fast
  – Caching entries
  – Translation lookaside buffer (TLB)
  – TLB management

Generic Page Table
- Memory is divided into pages
- A page table is a collection of PTEs that maps a virtual page number to a PTE
- Organization and content vary with the architecture
- If no virtual-to-physical mapping exists => page fault
[Diagram: virtual page # -> page table -> PTE (or page fault)]
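As a concrete sketch of this generic lookup, here is a minimal linear page table in C. The PTE format, the `PTE_VALID` bit, and the `pt_lookup` helper are all illustrative assumptions, not any real architecture's layout:

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_VALID 0x1u           /* assumed: bit 0 marks a valid mapping */

typedef uint32_t pte_t;          /* illustrative 32-bit PTE */

/* Hypothetical linear page table, indexed directly by VPN.
 * Returns the PTE; on a missing mapping, sets *fault and returns 0. */
pte_t pt_lookup(const pte_t *table, size_t nentries, uint32_t vpn, int *fault)
{
    if (vpn >= nentries || !(table[vpn] & PTE_VALID)) {
        *fault = 1;              /* page fault: no virtual-to-physical mapping */
        return 0;
    }
    *fault = 0;
    return table[vpn];
}
```

A real table would not be a flat array for a sparse 32- or 64-bit address space; the x86 and PowerPC slides below show two ways to make it compact.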

Generic PTE
- A PTE maps a virtual page to a physical page
- Includes some page properties: valid?, writable?, dirty?, cacheable?
[Diagram: virtual page # -> physical page # + property bits]
Acronyms used in this lecture:
- PTE = page table entry
- PDE = page directory entry
- VA = virtual address
- PA = physical address
- VPN = virtual page number
- {R,P}PN = {real, physical} page number
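The property bits above can be pictured as a C bitfield struct. This layout is purely illustrative — field names, widths, and ordering are assumptions; real PTE layouts are fixed by each architecture:

```c
#include <stdint.h>

/* Hypothetical PTE mirroring the bullets above: a physical page number
 * plus a few property bits. Widths and ordering are illustrative only. */
struct pte {
    uint32_t present   : 1;   /* valid mapping? */
    uint32_t writable  : 1;   /* may the page be written? */
    uint32_t dirty     : 1;   /* modified since it was loaded? */
    uint32_t cacheable : 1;   /* may accesses be cached? */
    uint32_t ppn       : 20;  /* physical page number */
};
```

Note the virtual page number is not stored in the entry itself; as the x86 slides below show, it is usually implied by the entry's position in the table.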

Real Page Tables
- Design requirements
  – Minimize memory use (page tables are pure overhead)
  – Fast (logically accessed on every memory reference)
- These requirements lead to
  – Compact data structures
  – O(1) access (e.g. indexed lookup, hash table)
- Examples: x86 and PowerPC

X86-32 Address Translation
- Page tables are organized as a two-level tree
- Efficient because the address space is sparse
- Each level of the tree is indexed using a piece of the virtual page number, for fast lookups
- One set of page tables per process
- The current set of page tables is pointed to by CR3
- The CPU walks the page tables to find translations
- Accessed and dirty bits are updated by the CPU
- 4K or 4M (sometimes 2M) pages

X86-32 PDE and PTE Details
- Page Directory Entry (PDE): 20-bit page number of a page table, plus 12 property bits
  – Present, Read/Write, User/Supervisor, Write-through, Cache Disabled, Accessed, Reserved, 4K or 4M Page, Global, Available (for OS use)
- Page Table Entry (PTE): 20-bit page number of a physical memory page, plus 12 property bits
  – Present, Read/Write, User/Supervisor, Write-through, Cache Disabled, Accessed, Dirty, PAT, Global, Available (for OS use)
- Where is the virtual page number? (It is not stored; it is implied by the entry's position in the tables.)
- If a page is not present, all bits but bit 0 are available for the OS
Reference: IA-32 Intel Architecture Software Developer's Manual, Volume 3

X86-32 Page Table Lookup
- The top 10 address bits index the page directory and return a page directory entry that points to a page table
- The middle 10 bits index that page table, which points to a physical memory page
- The bottom 12 bits are the offset of a single byte within the physical page
- Checks are made at each step to ensure the desired page is present in memory and that the requesting process has sufficient rights to access the page
[Diagram: 32-bit virtual address = 10-bit page dir index | 10-bit page tbl index | 12-bit offset; page directory -> page tables -> physical memory]
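The 10/10/12 split described above can be written directly in C; the helper names here are ours, not Intel's:

```c
#include <stdint.h>

/* Split a 32-bit virtual address as on the slide: top 10 bits index the
 * page directory, middle 10 bits the page table, bottom 12 bits are the
 * byte offset within the 4K page. */
static inline uint32_t pde_index(uint32_t va) { return (va >> 22) & 0x3FFu; }
static inline uint32_t pte_index(uint32_t va) { return (va >> 12) & 0x3FFu; }
static inline uint32_t pg_offset(uint32_t va) { return va & 0xFFFu; }
```

For example, the (arbitrary) address 0xC0403ABC splits into directory index 0x301, table index 0x3, and offset 0xABC.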

X86-32 and PAE
- Intel added support for up to 64GB of physical memory in the Pentium Pro, called Physical Address Extensions (PAE)
- Introduced a new CPU mode and another layer in the page tables
- In PAE mode, 32-bit VAs map to 36-bit PAs
- A single process address space is still 32 bits
- A 4-entry page-directory-pointer table (PDPT) points to a page directory; translation then proceeds as normal
- Page directory and page table entries are expanded to 64 bits to hold 36-bit physical addresses
  – Only 512 entries per 4K page
- 4K or 2M page sizes
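Under PAE the same 32-bit address is carved up 2/9/9/12 instead: 2 bits select one of the four PDPT entries, then two 9-bit indices (512 eight-byte entries fit in a 4K page), then the 12-bit offset. A sketch, with our own helper names:

```c
#include <stdint.h>

/* PAE split of a 32-bit virtual address: 2-bit PDPT index, 9-bit page
 * directory index, 9-bit page table index, 12-bit byte offset. */
static inline uint32_t pdpt_index(uint32_t va)    { return (va >> 30) & 0x3u;   }
static inline uint32_t pae_pde_index(uint32_t va) { return (va >> 21) & 0x1FFu; }
static inline uint32_t pae_pte_index(uint32_t va) { return (va >> 12) & 0x1FFu; }
static inline uint32_t pae_offset(uint32_t va)    { return va & 0xFFFu;         }
```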

What about 64-bit X86?
- X86-64 (AMD64 or EM64T) supports a 64-bit virtual address (only 48 bits effective)
- Three modes
  – Legacy 32-bit (32-bit VA, 32-bit PA)
  – Legacy PAE (32-bit VA, up to 52-bit PA)
  – Long PAE mode (64-bit VA, 52-bit PA)
- Long mode requires four levels of page tables to map a 48-bit VA to a 52-bit PA
Reference: AMD64 Architecture Programmer's Manual, Volume 2: System Programming, Ch. 5

PowerPC Address Translation
- An 80-bit virtual address is obtained via the PowerPC segmentation mechanism
- 62-bit physical ("real") address
- PTEs are organized in a hash table (HTAB)
- Each HTAB entry is a page table entry group (PTEG)
- Each PTEG holds eight 16-byte PTEs
- A hash function on the VPN gives the index of two PTEGs (the primary and secondary PTEGs)
- The resulting 16 PTEs are searched for a VPN match
- No match => page fault

PowerPC Segmentation
- The Segment Lookaside Buffer (SLB) is an "associative memory"
- The top 36 bits of a program-generated 64-bit "effective" address are used as a tag called the effective segment id (ESID)
- The SLB is searched for a matching ESID tag
- If a match exists, property bits (U/S, X, V) are validated for access
- A failed match causes a segment fault
- On a match, the associated 52-bit virtual segment id (VSID) is concatenated with the remaining 28 address bits to form the 80-bit "virtual" address used for page table lookup
- Segmentation is used to separate processes within the large virtual address space

PowerPC Page Table Lookup
- Variable-size hash table (HTAB)
- A processor register points to the hash table base and gives the table's size
- An architecture-defined hash function on the virtual address returns two possible hash table entries (the primary and secondary PTEGs)
- Each of the 16 possible PTEs is checked for a VA match
- If there is no match, page fault
- A translation may exist but not fit in the hash table; the OS must handle this case
[Diagram: 80-bit virtual address -> hash function -> primary/secondary hash index -> primary/secondary PTEG in HTAB; match -> 16-byte PTE, no match -> page fault]
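A simplified sketch of the two-probe scheme follows. The architected PowerPC hash XORs specific VPN fields, so the fold used here is a stand-in; what it preserves is the structure: a primary hash, a secondary hash that is its one's complement, and reduction modulo a power-of-two table size:

```c
#include <stdint.h>

/* Stand-in for the architecture-defined hash on the VPN. */
static uint64_t primary_hash(uint64_t vpn)
{
    return vpn ^ (vpn >> 23);            /* illustrative XOR-fold only */
}

/* The secondary hash is the one's complement of the primary. */
static uint64_t secondary_hash(uint64_t vpn)
{
    return ~primary_hash(vpn);
}

/* Reduce a hash to a PTEG index; npteg must be a power of two. */
static uint64_t pteg_index(uint64_t hash, uint64_t npteg)
{
    return hash & (npteg - 1);
}
```

The lookup then searches the eight PTEs in the primary PTEG, and on failure the eight in the secondary PTEG, before declaring a page fault.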

PowerPC PTE Details
- A 16-byte PTE holds both the VPN and the RPN
  – First doubleword: abbreviated virtual page number, SW, H, V
  – Second doubleword: real page number, AC, R, C, WIMG, N, PP
- Why only a 57-bit VPN? (The PTE stores an abbreviated VPN; the omitted low-order bits are recoverable from the PTE's position in the hash table.)
Key:
- SW = available for OS use
- H = hash function ID
- V = valid bit
- AC = address compare bit
- R = referenced bit
- C = changed bit
- WIMG = storage control bits
- N = no-execute bit
- PP = page protection bits
Reference: PowerPC Operating Environment Architecture, Book III, Version 2.01, Sections

Making Translation Fast
- The page table is logically accessed on every instruction
- Paging has turned each memory reference into at least three memory references
- Page table accesses have temporal locality
- So use a cache to speed up access: the Translation Lookaside Buffer (TLB)

Generic TLB
- A cache of recently used PTEs
- Small, usually about 64 entries
- Huge impact on performance
- Various organizations, search strategies, and levels of OS involvement are possible
- We will consider the x86 and SPARC TLBs
[Diagram: virtual address -> TLB -> physical address, TLB miss, or access fault]

TLB Organization
There are various ways to organize a 16-entry TLB, where each entry is a tag (the virtual page number) and a value (the page table entry):
- Direct mapped (16 sets of 1)
- Two-way set associative (8 sets of 2)
- Four-way set associative (4 sets of 4)
- Fully associative (1 set of 16)
Lookup:
- Calculate the set index (index = tag % num_sets)
- Search for the tag within the resulting set
Why not use the upper bits of the tag value for the index?
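The lookup steps above can be sketched in C; with 16 entries, four-way associativity gives four sets, and direct mapping gives sixteen:

```c
#include <stdint.h>

/* Set selection as on the slide: index = tag % num_sets.
 * For a 16-entry TLB, num_sets = 16 / associativity. */
static unsigned tlb_set_index(uint32_t vpn_tag, unsigned num_sets)
{
    return vpn_tag % num_sets;
}
```

The slide's closing question is worth dwelling on: the low bits of the VPN vary fastest across nearby pages, so indexing by `tag % num_sets` spreads adjacent pages across sets, while the upper bits would map whole regions of the address space into the same set.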

Associativity Trade-offs
- Higher associativity
  – Better utilization, fewer collisions
  – Slower
  – More hardware
- Lower associativity
  – Fast
  – Simple, less hardware
  – Greater chance of collisions
- How does page size affect TLB performance?

X86 TLB
- TLB management is shared by the processor and the OS
- The CPU fills the TLB on demand from the page table (the OS is unaware of TLB misses)
- The CPU evicts entries when a new entry must be added and no free slots exist
- The OS ensures TLB/page table consistency by flushing entries as needed when the page tables are updated or switched (e.g. during a context switch)
- TLB entries can be removed by the OS one at a time using the INVLPG instruction, or the TLB can be flushed wholesale by reloading CR3

Example: Pentium-M TLBs
- Four different TLBs
  – Instruction TLB for 4K pages: 128 entries, 4-way set associative
  – Instruction TLB for large pages: 2 entries, fully associative
  – Data TLB for 4K pages: 128 entries, 4-way set associative
  – Data TLB for large pages: 8 entries, 4-way set associative
- All TLBs use an LRU replacement policy
- Why different TLBs for instructions, data, and page sizes?

SPARC TLB
- SPARC is a RISC ("simpler is better") CPU
- An example of a "software-managed" TLB
- A TLB miss causes a fault, handled by the OS
- The OS explicitly adds entries to the TLB
- The OS is free to organize its page tables any way it wants because the CPU does not use them
  – E.g. Linux uses a tree like x86; Solaris uses a hash table

Minimizing Flushes
- On SPARC, TLB misses trap to the OS (slow), so we want to avoid TLB misses
- Retain TLB contents across context switches
  – SPARC TLB entries are enhanced with a context id
  – The context id allows entries with the same VPN to coexist in the TLB (e.g. entries from different process address spaces)
  – When a process is switched back onto a processor, chances are that some of its TLB state has been retained from the last time it ran
- Some TLB entries are shared (e.g. OS kernel memory)
  – Mark them as global
  – The context id is ignored during matching
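The matching rule above — same VPN, same context id, unless the entry is global — can be sketched as follows. The entry layout and field widths are hypothetical; real SPARC TLB entry formats differ:

```c
#include <stdint.h>

/* Hypothetical software-managed TLB entry with a context id.
 * A global entry (shared kernel mapping) matches in any context. */
struct tlb_entry {
    uint32_t vpn;       /* virtual page number (tag) */
    uint32_t ppn;       /* physical page number */
    uint16_t ctx;       /* address-space (context) id */
    uint8_t  global;    /* if set, ctx is ignored during matching */
    uint8_t  valid;
};

static int tlb_match(const struct tlb_entry *e, uint32_t vpn, uint16_t ctx)
{
    return e->valid && e->vpn == vpn && (e->global || e->ctx == ctx);
}
```

Because the context id participates in matching, a context switch only needs to change the current context id register rather than flush the whole TLB.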

Example: UltraSPARC III TLBs
- Five different TLBs
- Instruction TLBs
  – 16 entries, fully associative (supports all page sizes)
  – 128 entries, 2-way set associative (8K pages only)
- Data TLBs
  – 16 entries, fully associative (supports all page sizes)
  – 2 x 512 entries, 2-way set associative (each supports one page size per process)
- Valid page sizes: 8K (default), 64K, 512K, and 4M
- 13-bit context id => 8192 different concurrent address spaces
- What happens if you have > 8192 processes?

Speeding Up TLB Miss Handling
- In some cases a huge amount of time can be spent handling TLB misses (2-50% in one study of SuperSPARC and SunOS)
- Many architectures that use software-managed TLBs have hardware-assisted TLB miss handling
- SPARC uses a large, virtually indexed, direct-mapped, physically contiguous table of recently used TLB entries called the Translation Storage Buffer (TSB)
- The location of the TSB is loaded into the processor on a context switch (implying one TSB per process)
- On a TLB miss, hardware calculates the offset of the matching entry in the TSB and supplies it to the software TLB miss handler
- In most cases, the software TLB miss handler only needs to compare the tag of the TSB entry, load the entry into the TLB, and return
- If the access misses in the TSB, a slow software search of the page tables is required
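The hardware-computed TSB offset can be sketched like this: since the table is direct mapped and virtually indexed, the low VPN bits select the slot. Entry size and table length here are illustrative, not the architected values:

```c
#include <stdint.h>

#define TSB_ENTRY_SIZE 16u       /* assumed bytes per cached translation */

/* Byte offset of the candidate entry in a direct-mapped, virtually
 * indexed TSB; nentries must be a power of two. */
static uint64_t tsb_offset(uint64_t vpn, uint64_t nentries)
{
    return (vpn & (nentries - 1)) * TSB_ENTRY_SIZE;
}
```

The software miss handler adds this offset to the TSB base, compares the entry's tag against the faulting VPN, and either loads the entry into the TLB or falls back to a full page table walk.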