Dr. George Michelogiannakis EECS, University of California at Berkeley

Slides:



Advertisements
Similar presentations
Virtual Memory Basics.
Advertisements

CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Cache III Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Address Translation and Protection Steve Ko Computer Sciences and Engineering University at.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Virtual Memory I Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590 Computer Architecture Virtual Memory II
February 25, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 11 - Virtual Memory and Caches Krste Asanovic Electrical Engineering.
Virtual Memory Adapted from lecture notes of Dr. Patterson and Dr. Kubiatowicz of UC Berkeley.
CS 152 Computer Architecture and Engineering Lecture 11 - Virtual Memory and Caches Krste Asanovic Electrical Engineering and Computer Sciences University.
CS 152 Computer Architecture and Engineering Lecture 10 - Virtual Memory Krste Asanovic Electrical Engineering and Computer Sciences University of California.
February 16, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 8 - Address Translation Krste Asanovic Electrical Engineering.
CS 152 Computer Architecture and Engineering Lecture 9 - Address Translation Krste Asanovic Electrical Engineering and Computer Sciences University of.
February 23, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 10 - Virtual Memory Krste Asanovic Electrical Engineering and.
CS 152 Computer Architecture and Engineering Lecture 9 - Address Translation Krste Asanovic Electrical Engineering and Computer Sciences University of.
February 18, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 9 - Address Translation Krste Asanovic Electrical Engineering.
© Krste Asanovic, 2014CS252, Spring 2014, Lecture 15 CS252 Graduate Computer Architecture Spring 2014 Lecture 15: Virtual Memory and Caches Krste Asanovic.
Operating Systems ECE344 Ding Yuan Paging Lecture 8: Paging.
Virtual Memory Expanding Memory Multiple Concurrent Processes.
CS 61C: Great Ideas in Computer Architecture Virtual Memory Instructors: Krste Asanovic, Randy H. Katz 1Fall.
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 14 Virtual Memory Benjamin Lee Electrical and Computer Engineering Duke University
Constructive Computer Architecture Virtual Memory: From Address Translation to Demand Paging Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
Virtual Memory Part 1 Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology May 2, 2012L22-1
2/14/2013 CS152, Spring 2013 CS 152 Computer Architecture and Engineering Lecture 8 - Address Translation Krste Asanovic Electrical Engineering and Computer.
Virtual Memory. Virtual Memory: Topics Why virtual memory? Virtual to physical address translation Page Table Translation Lookaside Buffer (TLB)
ECE 252 / CPS 220 Advanced Computer Architecture I Lecture 14 Virtual Memory Benjamin Lee Electrical and Computer Engineering Duke University
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
Virtual Memory.  Next in memory hierarchy  Motivations:  to remove programming burdens of a small, limited amount of main memory  to allow efficient.
Constructive Computer Architecture Virtual Memory: From Address Translation to Demand Paging Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
© Krste Asanovic, 2014CS252, Spring 2014, Lecture 14 CS252 Graduate Computer Architecture Spring 2014 Lecture 14: Memory Protection and Address Translation.
LECTURE 12 Virtual Memory. VIRTUAL MEMORY Just as a cache can provide fast, easy access to recently-used code and data, main memory acts as a “cache”
CENG709 Computer Architecture and Operating Systems Lecture 8 - Address Translation Murat Manguoglu Department of Computer Engineering Middle East Technical.
Computer Architecture Lecture 12: Virtual Memory I
CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont.
Bernhard Boser & Randy Katz
Non Contiguous Memory Allocation
Memory COMPUTER ARCHITECTURE
COSC3330 Computer Architecture Lecture 20. Virtual Memory
A Real Problem What if you wanted to run a program that needs more memory than you have? September 11, 2018.
From Address Translation to Demand Paging
CS703 - Advanced Operating Systems
From Address Translation to Demand Paging
CS 704 Advanced Computer Architecture
Some Real Problem What if a program needs more memory than the machine has? even if individual programs fit in memory, how can we run multiple programs?
CSE 153 Design of Operating Systems Winter 2018
Dr. George Michelogiannakis EECS, University of California at Berkeley
CS 61C: Great Ideas in Computer Architecture Lecture 23: Virtual Memory Krste Asanović & Randy H. Katz 11/12/2018.
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Peng Liu Lecture 13 Virtual Memory and Architectural Support for Operating System Peng Liu
From Address Translation to Demand Paging
Memory Management 11/17/2018 A. Berrached:CS4315:UHD.
CS 105 “Tour of the Black Holes of Computing!”
Memory Management Lectures notes from the text supplement by Siberschatz and Galvin Modified by B.Ramamurthy Chapter 8 11/24/2018.
Lecture 13 Virtual Memory
Lecture 12 Virtual Memory
Memory Management Lectures notes from the text supplement by Siberschatz and Galvin Modified by B.Ramamurthy Chapter 9 12/1/2018.
Morgan Kaufmann Publishers Memory Hierarchy: Virtual Memory
Virtual Memory Nov 27, 2007 Slide Source:
CSE 451: Operating Systems Autumn 2005 Memory Management
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
Contents Memory types & memory hierarchy Virtual memory (VM)
CSE451 Virtual Memory Paging Autumn 2002
CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 8 – Address Translation Krste Asanovic Electrical Engineering.
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CS 105 “Tour of the Black Holes of Computing!”
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
Address Translation and Virtual Memory CENG331 - Computer Organization
CS 105 “Tour of the Black Holes of Computing!”
CSE 153 Design of Operating Systems Winter 2019
Review What are the advantages/disadvantages of pages versus segments?
Virtual Memory 1 1.
Presentation transcript:

CS 152 Computer Architecture and Engineering Lecture 8 - Address Translation Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory http://inst.eecs.berkeley.edu/~cs152 CS252 S05

Last time in Lecture 7 3 C’s of cache misses Write policies Compulsory, Capacity, Conflict Write policies Write back, write-through, write-allocate, no write allocate Multi-level cache hierarchies reduce miss penalty 3 levels common in modern systems (some have 4!) Can change design tradeoffs of L1 cache if known to have L2 Prefetching: retrieve memory data before CPU request Prefetching can waste bandwidth and cause cache pollution Software vs hardware prefetching Software memory hierarchy optimizations Loop interchange, loop fusion, cache tiling CS252 S05

Question of the Day After talking about virtual addresses, what challenge does this mean for caches (especially L1)? And what to do about it? CS252 S05

Bare Machine PC Physical Address D E M Physical Address W Inst. Cache Decode Data Cache + Memory Controller Physical Address Physical Address Physical Address Main Memory (DRAM) In a bare machine, the only kind of address is a physical address CS252 S05

How could location independence be achieved? Absolute Addresses EDSAC, early 50’s Only one program ran at a time, with unrestricted access to entire machine (RAM + I/O devices) Addresses in a program depended upon where the program was to be loaded in memory But it was more convenient for programmers to write location-independent subroutines How could location independence be achieved? Location independence: Load time rewriting PC relative branches PC relative data access or base pointer on entry to routine. Linker and/or loader modify addresses of subroutines and callers when building a program memory image CS252 S05

Dynamic Address Translation Motivation In early machines, I/O was slow and each I/O transfer involved the CPU (programmed I/O) Higher throughput possible if CPU and I/O of 2 or more programs were overlapped, how? =>multiprogramming with DMA I/O devices, interrupts Location-independent programs Programming and storage management ease => need for a base register Protection Independent programs should not affect each other inadvertently => need for a bound register Multiprogramming drives requirement for resident supervisor software to manage context switches between multiple programs Program 1 Physical Memory Program 2 OS CS252 S05

Simple Base and Bound Translation Segment Length Bound Register Bounds Violation? ≥ Physical Address Current Segment Physical Memory Logical Address Load X + Base Register Base Physical Address Program Address Space Base and bounds registers are visible/accessible only when processor is running in the supervisor mode CS252 S05

Separate Areas for Program and Data (Scheme used on all Cray vector supercomputers prior to X1, 2002) Load X Program Address Space Bounds Violation? Data Bound Register ≥ Data Segment Logical Address Mem. Address Register Physical Address Data Base Register + Main Memory Bounds Violation? Program Bound Register ≥ Logical Address Program Segment Program Counter Permits sharing of program segments. Program Base Register + Physical Address What is an advantage of this separation? CS252 S05

Program Bound Register Base and Bound Machine Program Bound Register Data Bound Register Bounds Violation? Bounds Violation? ≥ ≥ Logical Address Logical Address PC D E M W Decode Data Cache Inst. Cache + + + Physical Address Physical Address Program Base Register Data Base Register Physical Address Physical Address Memory Controller Physical Address Main Memory (DRAM) Can fold addition of base register into (register+immediate) address calculation using a carry-save adder (sums three numbers with only a few gate delays more than adding two numbers) CS252 S05

Question How much memory to allocate to each program? I.e., How to set the upper bound? What if we allocate not enough? What if we allocate too much?

Memory Fragmentation What if the next request is for 32K? free Users 4 & 5 arrive Users 2 & 5 leave OS Space 16K 24K 32K user 1 user 2 user 3 OS Space 16K 24K 32K user 1 user 2 user 3 user 5 user 4 8K OS Space 16K 24K 32K user 1 user 4 8K user 3 Called Burping the memory. What if the next request is for 32K? At some stage programs have to be moved around to compact the storage. CS252 S05

Paged Memory Systems Processor-generated address can be split into: Page Number Offset A Page Table contains the physical address at the start of each page 1 2 3 1 1 Physical Memory 2 2 3 3 Address Space of User-1 Page Table of User-1 Relaxes the contiguous allocation requirement. Page tables make it possible to store the pages of a program non-contiguously. CS252 S05

Private Address Space per Process (User) Page Table User 2 User 3 Physical Memory free OS pages OS ensures that the page tables are disjoint. Each user (process) has a page table Page table contains an entry for each user page CS252 S05

Question (2) How large do we make pages? Why make them large? Why make them small?

Where Should Page Tables Reside? Space required by the page tables (PT) is proportional to the address space, number of users, ... Too large to keep in registers Idea: Keep PTs in the main memory needs one reference to retrieve the page base address and another to access the data word  doubles the number of memory references! CS252 S05

Page Tables in Physical Memory VA1 User 1 Virtual Address Space User 2 Virtual Address Space PT User 1 PT User 2 Physical Memory Did we just increase memory access latency? CS252 S05

A Problem in the Early Sixties There were many applications whose data could not fit in the main memory, e.g., payroll Paged memory system reduced fragmentation but still required the whole program to be resident in the main memory CS252 S05

Manual Overlays Assume an instruction can address all the storage on the drum Method 1: programmer keeps track of addresses in the main memory and initiates an I/O transfer when required Difficult, error-prone! Method 2: automatic initiation of I/O transfers by software address translation Brooker’s interpretive coding, 1960 Inefficient! 40k bits main 640k bits drum Central Store Ferranti Mercury 1956 British Firm Ferranti, did Mercury and then Atlas Method 1 too difficult for users Method 2 too slow. Not just an ancient black art, e.g., IBM Cell microprocessor using in Playstation-3 has explicitly managed local store! CS252 S05

Demand Paging in Atlas (1962) “A page from secondary storage is brought into the primary storage whenever it is (implicitly) demanded by the processor.” Tom Kilburn Secondary (Drum) 32x6 pages Primary 32 Pages 512 words/page Central Memory Primary memory as a cache for secondary memory Single-level Store User sees 32 x 6 x 512 words of storage CS252 S05

Hardware Organization of Atlas 16 ROM pages 0.4-1 sec system code (not swapped) system data Effective Address Initial Address Decode 2 subsidiary pages 1.4 sec PARs 48-bit words 512-word pages 1 Page Address Register (PAR) per page frame Main 32 pages 1.4 sec Drum (4) 192 pages 8 Tape decks 88 sec/word 31 <effective PN , status> Compare the effective page address against all 32 PARs match  normal access no match  page fault save the state of the partially executed instruction CS252 S05

Atlas Demand Paging Scheme On a page fault: Input transfer into a free page is initiated The Page Address Register (PAR) is updated If no free page is left, a page is selected to be replaced (based on usage) The replaced page is written on the drum to minimize drum latency effect, the first empty page on the drum was selected The page table is updated to point to the new location of the page on the drum CS252 S05

Linear Page Table Page Table Entry (PTE) contains: Data word Data Pages Page Table Entry (PTE) contains: A bit to indicate if a page exists PPN (physical page number) for a memory-resident page DPN (disk page number) for a page on the disk Status bits for protection and usage OS sets the Page Table Base Register whenever active user process changes Page Table PPN PPN DPN PPN Offset DPN PPN PPN DPN DPN VPN DPN PPN PT Base Register PPN Supervisor Accessible Control Register inside CPU VPN Offset Virtual address from CPU Execute Stage CS252 S05

Size of Linear Page Table With 32-bit addresses, 4-KB pages & 4-byte PTEs: 220 PTEs, i.e, 4 MB page table per user 4 GB of swap needed to back up full virtual address space Larger pages? Internal fragmentation (Not all memory in page is used) Larger page fault penalty (more time to read from disk) What about 64-bit virtual address space??? How many page table entries (PTEs)? What is the “saving grace” ? Virtual address space is large but only a small fraction of the pages are populated. So we can use a sparse representation of the table. CS252 S05

Hierarchical Page Table Virtual Address from CPU 31 22 21 12 11 p1 p2 offset 10-bit L1 index 10-bit L2 index offset Root of the Current Page Table p2 p1 Physical Memory (Processor Register) Level 1 Page Table Level 2 Page Tables page in primary memory page in secondary memory What is the downside? PTE of a nonexistent page Data Pages CS252 S05

Two-Level Page Tables in Physical Memory Virtual Address Spaces Level 1 PT User 1 VA1 User 1 Level 1 PT User 2 Level 2 PT User 1 User2/VA1 VA1 User1/VA1 User 2 Level 2 PT User 2 CS252 S05

Address Translation & Protection Virtual Address Virtual Page No. (VPN) offset Kernel/User Mode Read/Write Protection Check Address Translation Exception? Physical Page No. (PPN) offset Physical Address Every instruction and data access needs address translation and protection checks A good VM design needs to be fast (~ one cycle) and space efficient CS252 S05

Translation Lookaside Buffers (TLB) Address translation is very expensive! In a two-level page table, each reference becomes several memory accesses Solution: Cache translations in TLB TLB hit  Single-Cycle Translation TLB miss  Page-Table Walk to refill W, R: Read and write permission bits D: Dirty V: Valid virtual address VPN offset V R W D tag PPN 3 memory references 2 page faults (disk accesses) + .. Actually used in IBM before paged memory. (PPN = physical page number) hit? physical address PPN offset CS252 S05

How Would You Design the TLB? Associativity? Replacement policy?

64 entries * 4 KB = 256 KB (if contiguous) TLB Designs Typically 32-128 entries, usually fully associative Each entry maps a large page, hence less spatial locality across pages  more likely that two entries conflict Sometimes larger TLBs (256-512 entries) are 4-8 way set-associative Larger systems sometimes have multi-level (L1 and L2) TLBs Random or FIFO replacement policy No process information in TLB? TLB Reach: Size of largest virtual address space that can be simultaneously mapped by TLB How much of the physical address space is the TLB mapping to? Example: 64 TLB entries, 4KB pages, one page per entry TLB Reach = _____________________________________________? 64 entries * 4 KB = 256 KB (if contiguous) CS252 S05

Hardware (SPARC v8, x86, PowerPC, RISC-V) Handling a TLB Miss Software (MIPS, Alpha) TLB miss causes an exception and the operating system walks the page tables and reloads TLB. A privileged “untranslated” addressing mode used for walk. Hardware (SPARC v8, x86, PowerPC, RISC-V) A memory management unit (MMU) walks the page tables and reloads the TLB. If a missing (data or PT) page is encountered during the TLB reloading, MMU gives up and signals a Page Fault exception for the original instruction. Tradeoffs? The hardware needs to know the structure of the page table for hardware handling. CS252 S05

Hierarchical H/W Page Table Walk: SPARC v8 31 11 0 Virtual Address Index 1 Index 2 Index 3 Offset 31 23 17 11 0 Context Table Register root ptr PTP PTE Context Table L1 Table L2 Table L3 Table Physical Address PPN Offset MMU does this table walk in hardware on a TLB miss CS252 S05

Page-Based Virtual-Memory Machine (Hardware Page-Table Walk) Page Fault? Protection violation? Page Fault? Protection violation? Virtual Address Virtual Address Physical Address Physical Address PC D E M W Inst. TLB Decode Data TLB Inst. Cache Data Cache + Miss? Miss? Page-Table Base Register Hardware Page Table Walker Physical Address Physical Address Memory Controller Physical Address Main Memory (DRAM) Assumes page tables held in untranslated physical memory CS252 S05

Address Translation: putting it all together Virtual Address hardware hardware or software software TLB Lookup miss hit Page Table Walk Protection Check the page is Ï memory Î memory denied permitted Need to restart instruction. Soft and hard page faults. Page Fault (OS loads page) Protection Fault Update TLB Physical Address (to cache) Where? SEGFAULT CS252 S05

Question of the Day After talking about virtual addresses, what challenge does this mean for caches (especially L1)? And what to do about it? CS252 S05

Acknowledgements These slides contain material developed and copyright by: Arvind (MIT) Krste Asanovic (MIT/UCB) Joel Emer (Intel/MIT) James Hoe (CMU) John Kubiatowicz (UCB) David Patterson (UCB) MIT material derived from course 6.823 UCB material derived from course CS252 CS252 S05