Virtual Memory
CS740, October 13, 1998
Topics: page tables, TLBs, Alpha 21X64 memory system


CS 740 F'98 – 2 – Levels in a Typical Memory Hierarchy

              registers    cache          memory     disk
  size:       200 B        32 KB / 4 MB   128 MB     20 GB
  speed:      3 ns         6 ns           100 ns     10 ms
  $/Mbyte:                 $256/MB        $2/MB      $0.10/MB
  block size: 4 B          8 B            4 KB

Each level is larger, slower, and cheaper than the one above it. Transfers move 4 B blocks between registers and cache, 8 B blocks between cache and memory (cache), and 4 KB pages between memory and disk (virtual memory).

Virtual Memory
- Main memory acts as a cache for the secondary storage (disk)
- Increases program-accessible memory:
  - the address space of each job can be larger than physical memory
  - the sum of the memory of many jobs can be greater than physical memory

Address Spaces
[Figure: processes 1 and 2 each map virtual pages VP 1 and VP 2 through address translation onto physical pages such as PP 2, PP 7, and PP 10; a read-only library code page is shared by both processes.]
- Virtual and physical address spaces are divided into equal-sized blocks: "pages" (both virtual and physical)
- The virtual address space is typically larger than the physical one
- Each process has a separate virtual address space

Other Motivations
- Simplifies memory management (the main reason today)
  - Multiple processes can be resident in physical memory
  - Their program addresses are mapped dynamically: address 0x100 for process P1 doesn't collide with address 0x100 for process P2
  - More memory can be allocated to a process as its needs grow
- Provides protection
  - One process can't interfere with another, since they operate in different address spaces
  - A process cannot access privileged information: different sections of the address space have different access permissions

Contrast: Macintosh Memory Model
- Does not use traditional virtual memory
- All objects are accessed through "Handles": an indirect reference through a table
- Objects can be relocated by updating the pointer in the table
[Figure: P1 handles and P2 handles point into a single shared address space holding the objects of both processes.]

VM as part of the memory hierarchy
[Figure: accessing word w in virtual page p is a hit, since p occupies a page frame in memory; accessing word v in virtual page q is a miss, or "page fault", since q resides only on disk. Cache blocks sit above the memory's page frames, disk pages below.]

VM address translation
- V = {0, 1, ..., n − 1}: virtual address space
- M = {0, 1, ..., m − 1}: physical address space, with n > m
- MAP: V → M ∪ {∅}: address mapping function
  - MAP(a) = a' if the data at virtual address a is present at physical address a' in M
  - MAP(a) = ∅ if the data at virtual address a is not present in M
[Figure: the processor issues address a in name space V; the address translation mechanism either yields physical address a' in main memory or raises a missing-item fault, after which the OS's fault handler transfers the page from secondary memory.]

VM address translation
[Figure: a virtual address, made up of a virtual page number and a page offset, is translated into a physical address, made up of a physical page number and the same page offset.]
Notice that the page offset bits don't change as a result of translation.
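The split above can be sketched in a few lines. This is an illustrative model only; the 4 KB page size and the address are made-up values, not the Alpha parameters used later:

```python
PAGE_SIZE = 4096                      # assume 4 KB pages -> 12 offset bits
OFFSET_BITS = 12

def split(va):
    """Split a virtual address into (virtual page number, page offset)."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)

vpn, offset = split(0x12345)
print(hex(vpn), hex(offset))          # the offset survives translation as-is
```

Translation replaces only the VPN with a PPN; re-attaching the unchanged offset yields the physical address.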

Address translation with a page table
[Figure: the virtual page number (VPN) acts as an index into the page table, located via the page table base register; each entry holds a valid bit, access bits, and a physical page number, which is concatenated with the unchanged page offset to form the physical address.]
If valid = 0, then the page is not in memory and a page fault exception is raised.

Page Tables

Page Table Operation
- Translation
  - A separate (set of) page table(s) per process
  - The VPN forms an index into the page table
- Computing the physical address
  - The page table entry (PTE) provides information about the page
  - Valid bit = 1: the page is in memory; use the physical page number (PPN) to construct the address
  - Valid bit = 0: the page is in secondary memory; a page fault occurs, and the page must be loaded into main memory before continuing
- Checking protection
  - The access rights field indicates the allowable accesses, e.g., read-only, read-write, execute-only
  - Typically multiple protection modes are supported (e.g., kernel vs. user)
  - A protection violation fault occurs if the access lacks the necessary permission
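The lookup steps above can be sketched as follows. The table contents, the 4 KB page size, and the rights encoding are hypothetical, chosen only to illustrate the valid-bit and protection checks:

```python
OFFSET_BITS = 12                       # assume 4 KB pages for illustration

# Hypothetical per-process page table: VPN -> (valid, PPN, rights).
page_table = {
    0x12: (True,  0x7,  "rw"),
    0x13: (False, None, "rw"),         # on disk: valid bit = 0
}

def translate(va, access):
    vpn, offset = va >> OFFSET_BITS, va & ((1 << OFFSET_BITS) - 1)
    valid, ppn, rights = page_table[vpn]
    if not valid:                      # valid = 0 -> page fault
        raise LookupError(f"page fault on VPN {vpn:#x}")
    if access not in rights:           # protection check from the PTE
        raise PermissionError(f"protection fault on VPN {vpn:#x}")
    return (ppn << OFFSET_BITS) | offset

print(hex(translate(0x12345, "r")))   # 0x7345
```

In hardware the fault paths trap to the OS instead of raising exceptions, but the decision order is the same: index by VPN, test the valid bit, test the access rights, then form the physical address.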

VM design issues
- Everything is driven by the enormous cost of misses: hundreds of thousands to millions of clocks, vs. units or tens of clocks for cache misses
  - Disks have high latency: typically 10 ms access time
  - Moderate disk-to-memory bandwidth: 10 MBytes/sec transfer rate
- Large block sizes: typically 4 KB–16 KB
  - Amortize the high access time
  - Reduce the miss rate by exploiting spatial locality
- Perform a context switch while waiting
  - Memory is filled from disk by direct memory access
  - Meanwhile, the processor can execute other processes

VM design issues (cont)
- Fully associative page placement
  - Eliminates conflict misses; every miss is a killer, so the slower hit time is worth it
- Smart replacement algorithms
  - Handle misses in software: the miss penalty is so high anyway, there is no reason to handle them in hardware
  - Small improvements pay big dividends
- Write back only
  - Disk access is too slow to afford write through + a write buffer

Integrating VM and cache
[Figure: CPU → translation → cache → main memory; the virtual address (VA) is translated to a physical address (PA) before the cache lookup, with misses filled from main memory.]
- Most caches are "physically addressed"
  - Accessed by physical addresses
  - Allows multiple processes to have blocks in the cache at the same time, and to share pages
  - The cache doesn't need to be concerned with protection issues: access rights are checked as part of address translation
- Perform address translation before the cache lookup
  - But this could involve a memory access itself
  - Of course, page table entries can also become cached

Speeding up Translation with a TLB
[Figure: CPU → TLB lookup → cache → main memory; on a TLB miss, the full translation is consulted before the cache access proceeds.]
- Translation lookaside buffer (TLB)
  - A small, usually fully associative cache that maps virtual page numbers to physical page numbers
  - Contains complete page table entries for a small number of pages

Address translation with a TLB
[Figure: the virtual page number (plus a process ID) is compared against the TLB's tags; on a TLB hit, the physical page number joins the page offset to form the physical address, whose tag, index, and byte offset then drive the cache lookup; valid and dirty bits qualify each TLB entry.]
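A TLB in front of the page-table walk can be modeled as below. The dict-based TLB and the VPN → PPN mappings are illustrative stand-ins, not the Alpha's actual hardware structures:

```python
OFFSET_BITS = 13                       # 8 KB pages, as in the Alpha example
page_table = {0x12: 0x7, 0x13: 0x9}    # hypothetical VPN -> PPN map
tlb = {}                               # tiny fully associative TLB

def translate(va):
    vpn, offset = va >> OFFSET_BITS, va & ((1 << OFFSET_BITS) - 1)
    if vpn not in tlb:                 # TLB miss: walk the table, refill
        tlb[vpn] = page_table[vpn]
    ppn = tlb[vpn]                     # TLB hit path: no table access
    return (ppn << OFFSET_BITS) | offset

print(hex(translate(0x24345)))        # 0xe345 (VPN 0x12 -> PPN 0x7)
```

After the first access to a page, later accesses to the same page hit in the TLB and skip the page-table walk entirely, which is the whole point of the structure.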

Alpha AXP TLB
- page size: 8 KB
- hit time: 1 clock
- miss penalty: 20 clocks
- TLB size: ITLB 8 PTEs, DTLB 32 PTEs
- replacement: random (but not last used)
- placement: fully associative

TLB-Process Interactions
- The TLB translates virtual addresses, but the virtual address space changes on every context switch
- Could flush the TLB
  - Every time a context switch is performed
  - Refill for the new process by a series of TLB misses, ~100 clock cycles each
- Could include a process ID tag with each TLB entry
  - Identifies which address space is being accessed
  - OK even when sharing physical pages

Virtually-Indexed Cache
[Figure: the cache index is taken from the virtual address, so the TLB lookup and the cache index proceed in parallel; the cache's physical tag is then compared with the TLB's output.]
- The cache index is determined from the virtual address
  - Can begin the cache and TLB lookups at the same time
- The cache is physically addressed
  - The cache tag indicates the physical address
  - Compare it with the TLB result to see if they match; only then is it considered a hit

Generating Index from Virtual Address
- Size the cache so that the index is determined by the page offset
  - Can increase associativity to allow a larger cache
  - E.g., early PowerPCs had a 32 KB cache: 8-way associative, with a 4 KB page size
- Page coloring
  - Make sure the lower k bits of the VPN match those of the PPN
  - Page replacement becomes set associative, with 2^k sets
[Figure: the cache index straddles the page offset and the low bits of the page number, which page coloring keeps identical in the virtual and physical addresses.]
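The sizing rule above amounts to a simple check: the index plus block-offset bits fit inside the page offset exactly when one cache way is no larger than a page. A small sketch using the slide's PowerPC figures:

```python
def index_fits_in_page_offset(cache_bytes, assoc, page_bytes):
    """True if cache indexing can proceed in parallel with the TLB lookup,
    i.e. the index + block-offset bits all lie within the page offset."""
    return cache_bytes // assoc <= page_bytes   # one way fits in a page

# 32 KB, 8-way associative cache with 4 KB pages (early PowerPC example):
print(index_fits_in_page_offset(32 * 1024, 8, 4 * 1024))   # True
# The same 32 KB cache, direct mapped, would need page coloring instead:
print(index_fits_in_page_offset(32 * 1024, 1, 4 * 1024))   # False
```

This is why raising associativity lets the cache grow without giving up parallel TLB and cache access.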

Example: Alpha Addressing
- Page size: currently 8 KB
- Page tables
  - Each table fits in a single page
  - Page table entry: 8 bytes
    - 32-bit physical page number
    - other bits for the valid bit, access information, etc.
  - An 8 KB page can hold 1024 PTEs
- Alpha virtual address
  - Based on a 3-level paging structure: level 1, level 2, and level 3 indexes, followed by the page offset
  - Each level indexes into a page table
  - Allows a 43-bit virtual address with an 8 KB page size

Alpha Page Table Structure
- Tree structure
  - Node degree ≤ 1024
  - Depth 3
- Nice features
  - No need to enforce a contiguous page layout
  - The tree grows dynamically as memory needs increase
[Figure: a level 1 page table points to level 2 page tables, which point to level 3 page tables, which point to physical pages.]

Mapping an Alpha virtual address
[Figure: the virtual address carries three 10-bit level indexes followed by a 13-bit page offset; each index selects one of the 1K PTEs (8 bytes each) in an 8 KB page table, and the remaining 21 high bits are unused.]
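A sketch of the field extraction the figure describes, using the stated widths (13-bit offset, three 10-bit indexes); the example address is made up:

```python
OFFSET_BITS, INDEX_BITS = 13, 10       # 8 KB pages, 1 K PTEs per table

def alpha_indices(va):
    """Split a 43-bit seg0 virtual address into (level 1, level 2,
    level 3) page-table indexes and the page offset."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    vpn = va >> OFFSET_BITS
    l3 = vpn & ((1 << INDEX_BITS) - 1)
    l2 = (vpn >> INDEX_BITS) & ((1 << INDEX_BITS) - 1)
    l1 = (vpn >> 2 * INDEX_BITS) & ((1 << INDEX_BITS) - 1)
    return l1, l2, l3, offset

va = (5 << 33) | (7 << 23) | (9 << 13) | 0x123
print(alpha_indices(va))               # (5, 7, 9, 291)
```

The three indexes walk the three levels of the tree; the offset is appended unchanged to the PPN found in the level 3 PTE.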

Alpha Virtual Addresses

  Binary address       Segment  Purpose
  1...1 11 xxxx...xxx  seg1     Kernel-accessible virtual addresses, e.g., the page tables for this process
  1...1 10 xxxx...xxx  kseg     Kernel-accessible physical addresses; no address translation performed; used by the OS to indicate physical addresses
  0...0 0x xxxx...xxx  seg0     User-accessible virtual addresses; the only part accessible by a user program

- Address patterns
  - Must have the high-order bits all 0's or all 1's: currently 64 − 43 = 21 wasted bits in each virtual address
  - Prevents programmers from sticking extra information into addresses, which could lead to problems when the virtual address space is expanded in the future

Alpha Seg0 Memory Layout
- Data
  - Static space for global variables: allocation determined at compile time, access via $gp
  - Dynamic space for runtime allocation, e.g., using malloc
- Text: stores the machine code for the program
- Stack: implements the runtime stack, access via $sp
- Reserved: used by the operating system for shared libraries, process info, etc.
[Figure: between reserved regions at the bottom and top of seg0 (the upper one for shared libraries) sit the stack (top of stack in $sp), the text (code) segment, static data (reached via $gp), and dynamic data, with not-yet-allocated space in between.]

Alpha Seg0 Memory Allocation
- Address range
  - User code can access memory locations in the range 0x0000000000000000 to 0x000003FFFFFFFFFF
  - Nearly a 2^42 ≈ 4.4 × 10^12 byte range; in practice, programs access far fewer locations
- Dynamic memory allocation
  - The virtual memory system only allocates blocks of memory ("pages") as needed
  - As the stack reaches lower addresses (tracked by the minimum $sp), pages are added to the lower allocation
  - As the break moves toward higher addresses, due to calls to malloc, calloc, etc., pages are added to the upper allocation

Page Table Configurations
- Minimal: one table at each level; the single level 3 table has entries for 1024 pages, mapping 8 MB
- Maximal: 512 level 1 entries point to 512 level 2 tables, which point to 524,288 level 3 tables, with entries for 2^29 pages, mapping 4 TB (all of seg0)
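Both configuration sizes follow directly from the slide's parameters (8 KB pages, 1024 PTEs per table); checking the arithmetic:

```python
PAGE = 8 * 1024            # 8 KB pages
PTES = 1024                # PTEs per table (8 KB / 8-byte PTE)

# Minimal: one table per level; a single level 3 table maps 1024 pages.
minimal = PTES * PAGE
# Maximal (all of seg0): 512 level 1 entries -> 512 level 2 tables ->
# 512 * 1024 = 524,288 level 3 tables -> 2**29 pages.
maximal = 512 * PTES * PTES * PAGE

print(minimal // 2**20, "MB,", maximal // 2**40, "TB")   # 8 MB, 4 TB
```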

Where Are the Page Tables?
- All in physical memory?
  - Uses up a large fraction of the physical address space: ~8 GB for the maximal configuration
  - Hard to move around, e.g., on every context switch
- Some in virtual memory?
  - E.g., put the level 3 page tables in seg1; a level 2 PTE then gives the VPN of a level 3 page
  - Must make sure seg1's page tables are in physical memory
    - The full configuration would require 4 GB of page tables
    - 1026 must be in physical memory: 1 level 1 table, 512 (mapping seg0) + 1 (mapping seg1) level 2 tables, and 512 (mapping seg1) level 3 tables
  - May take two page faults to get a single word into memory

Expanding Alpha Address Space
- Increase the page size
  - Increasing the page size 2× increases the virtual address space 16×: 1 extra bit in the page offset, and 1 extra bit for each level index
- Physical memory limits
  - Cannot be larger than kseg: VA bits − 2 ≥ PA bits
  - Cannot be larger than 32 + page offset bits, since the PTE only has 32 bits for the PPN
- Configurations (page offset = 13 + k bits, each level index = 10 + k bits):

  Page size   8K   16K   32K   64K
  VA size     43   47    51    55
  PA size     41   45    47    48
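The configuration table can be regenerated from the two limits stated above; a quick check with page size 8K × 2^k, so that the offset has 13 + k bits and each level index 10 + k bits:

```python
for k, size in enumerate(["8K", "16K", "32K", "64K"]):
    offset = 13 + k
    va = offset + 3 * (10 + k)         # page offset + three level indexes
    pa = min(va - 2,                   # kseg limit: VA bits - 2 >= PA bits
             32 + offset)              # PTE limit: 32-bit PPN + offset
    print(size, "VA:", va, "PA:", pa)
```

For 8K and 16K pages the kseg limit binds; for 32K and 64K pages the 32-bit PPN field becomes the binding constraint.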

Alpha AXP memory hierarchy
- Cache block size: 32 bytes; page size: 8 KB
- Virtual address: 43 bits; physical address: 34 bits
- ITLB: 8 entries; DTLB: 32 entries
- Instruction cache: 8 KB, direct mapped, 32-byte blocks
- Data cache: 8 KB, direct mapped, 32-byte blocks, write-through, no write-allocate
- Write buffer: 4 entries
- L2 cache: 2 MB (64K 32-byte blocks), direct mapped, write-back, write-allocate

Block Diagram
(Microprocessor Report, Sept. '94)
- The L1 caches are small enough to allow virtual indexing
- L2 cache access is not required until after the TLB completes

Alpha Hierarchy
- Improving memory performance was the main design goal; earlier Alpha CPUs were starved for data
- On the processor chip: registers plus the L1 caches
  - L1 data: 1-cycle latency, 8 KB, direct mapped, write-through, dual-ported, 32 B lines
  - L1 instruction: 8 KB, direct mapped, 32 B lines
- L2 unified: 8-cycle latency, 96 KB, 3-way associative, write-back, write-allocate, 32 B/64 B lines
- L3 unified: 1 MB–64 MB, direct mapped, write-back, write-allocate, 32 B or 64 B lines
- Main memory: up to 1 TB

Other System Examples