Address Translation.

Slides:

Advertisements

Similar presentations

Virtual Memory Basics.

Advertisements

Tore Larsen Material developed by: Kai Li, Princeton University

Part IV: Memory Management

Operating Systems Lecture 10 Issues in Paging and Virtual Memory Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. Zhiqing.

MODERN OPERATING SYSTEMS Third Edition ANDREW S. TANENBAUM Chapter 3 Memory Management Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall,

OS Fall’02 Virtual Memory Operating Systems Fall 2002.

4/14/2017 Discussed Earlier segmentation - the process address space is divided into logical pieces called segments. The following are the example of types.

Virtual Memory Chapter 18 S. Dandamudi To be used with S. Dandamudi, “Fundamentals of Computer Organization and Design,” Springer,  S. Dandamudi.

Caching and Virtual Memory. Main Points Cache concept – Hardware vs. software caches When caches work and when they don’t – Spatial/temporal locality.

CS 153 Design of Operating Systems Spring 2015

Misc Exercise 2 updated with Part III.  Due on next Tuesday 12:30pm. Project 2 (Suggestion)  Write a small test for each call.  Start from file system.

CS 333 Introduction to Operating Systems Class 12 - Virtual Memory (2) Jonathan Walpole Computer Science Portland State University.

Virtual Memory & Address Translation Vivek Pai Princeton University.

Memory Management (II)

Computer ArchitectureFall 2008 © November 10, 2007 Nael Abu-Ghazaleh Lecture 23 Virtual.

Memory Management and Paging CSCI 3753 Operating Systems Spring 2005 Prof. Rick Han.

Paging and Virtual Memory. Memory management: Review  Fixed partitioning, dynamic partitioning  Problems Internal/external fragmentation A process can.

Memory Management. 2 How to create a process? On Unix systems, executable read by loader Compiler: generates one object file per source file Linker: combines.

Memory Management 2010.

Chapter 3.2 : Virtual Memory

Address Translation.

Caching and Virtual Memory. Main Points Cache concept – Hardware vs. software caches When caches work and when they don’t – Spatial/temporal locality.

Review of Memory Management, Virtual Memory CS448.

1 Chapter 3.2 : Virtual Memory What is virtual memory? What is virtual memory? Virtual memory management schemes Virtual memory management schemes Paging.

Operating Systems COMP 4850/CISG 5550 Page Tables TLBs Inverted Page Tables Dr. James Money.

COSC 3407: Operating Systems Lecture 13: Address Translation.

Chapter 8 – Main Memory (Pgs ). Overview  Everything to do with memory is complicated by the fact that more than 1 program can be in memory.

8.1 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Paging Physical address space of a process can be noncontiguous Avoids.

1 Address Translation Memory Allocation –Linked lists –Bit maps Options for managing memory –Base and Bound –Segmentation –Paging Paged page tables Inverted.

COS 318: Operating Systems Virtual Memory and Its Address Translations.

Fundamentals of Programming Languages-II Subject Code: Teaching SchemeExamination Scheme Theory: 1 Hr./WeekOnline Examination: 50 Marks Practical:

Address Translation. Recall from Last Time… Virtual addresses Physical addresses Translation table Data reads or writes (untranslated) Translation tables.

Operating Systems ECE344 Ashvin Goel ECE University of Toronto Virtual Memory Hardware.

Processes and Virtual Memory

Address Translation Mark Stanovich Operating Systems COP 4610.

Memory Management. 2 How to create a process? On Unix systems, executable read by loader Compiler: generates one object file per source file Linker: combines.

CS203 – Advanced Computer Architecture Virtual Memory.

CDA 5155 Virtual Memory Lecture 27. Memory Hierarchy Cache (SRAM) Main Memory (DRAM) Disk Storage (Magnetic media) CostLatencyAccess.

Computer Architecture Lecture 12: Virtual Memory I

CS161 – Design and Architecture of Computer

Non Contiguous Memory Allocation

ECE232: Hardware Organization and Design

Memory COMPUTER ARCHITECTURE

CS161 – Design and Architecture of Computer

CS703 - Advanced Operating Systems

Chapter 8: Main Memory Source & Copyright: Operating System Concepts, Silberschatz, Galvin and Gagne.

CSE 153 Design of Operating Systems Winter 2018

Memory Management.

Introduction to Operating Systems

Chapter 8: Main Memory.

Introduction to Operating Systems

Lecture 14 Virtual Memory and the Alpha Memory Hierarchy

CS399 New Beginnings Jonathan Walpole.

Virtual Memory Hardware

CSE 451: Operating Systems Autumn 2005 Memory Management

Virtual Memory Overcoming main memory size limitation

CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs

CSE451 Virtual Memory Paging Autumn 2002

CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management

CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs

CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management

Lecture 7: Flexible Address Translation

Lecture 8: Efficient Address Translation

Paging and Segmentation

CS703 - Advanced Operating Systems

CSE 153 Design of Operating Systems Winter 2019

CSE 542: Operating Systems

Review What are the advantages/disadvantages of pages versus segments?

Virtual Memory 1 1.

Presentation transcript:

Address Translation

Main Points Address Translation Concept Flexible Address Translation How do we convert a virtual address to a physical address? Flexible Address Translation Base and bound Segmentation Paging Multilevel translation Efficient Address Translation Translation Lookaside Buffers Virtually and Physically Addressed Caches

Address Translation Concept Two views of memory: view from the CPU -- what program sees, virtual memory view from memory -- physical memory Translation box converts between the two views. Translation helps implement protection because no way for program to even talk about other program's addresses; no way for them to touch operating system code or data.

Address Translation Goals Memory protection Memory sharing Flexible memory placement Sparse addresses Runtime lookup efficiency Compact translation tables Portability

Address Translation What can you do if you can (selectively) gain control whenever a program reads or writes a particular memory location? With hardware support With compiler-level support Memory management is one of the most complex parts of the OS Serves many different purposes

Address Translation Uses Process isolation Keep a process from touching anyone else’s memory, or the kernel’s Efficient interprocess communication Shared regions of memory between processes Shared code segments E.g., common libraries used by many different programs Program initialization Start running a program before it is entirely in memory Dynamic memory allocation Allocate and initialize stack/heap pages on demand

Address Translation (more) Cache management Page coloring Program debugging Data breakpoints when address is accessed Zero-copy I/O Directly from I/O device into/out of user memory Memory mapped files Access file data using load/store instructions Demand-paged virtual memory Illusion of near-infinite memory, backed by disk or memory on other machines

Address Translation (even more) Checkpointing/restart Transparently save a copy of a process, without stopping the program while the save happens Persistent data structures Implement data structures that can survive system reboots Process migration Transparently move processes between machines Information flow control Track what data is being shared externally Distributed shared memory Illusion of memory that is shared between machines

Virtual Base and Bounds

Virtual Base and Bounds Pros? Simple Fast (2 registers, adder, comparator) Can relocate in physical memory without changing process Cons? Can’t keep program from accidentally overwriting its own code Can’t share code/data with other processes Can’t grow stack/heap as needed Provides level of indirection: OS can move bits around behind the program's back, for instance, if program needs to grow beyond its bounds, or if need to coalesce fragments of memory. Stop program, copy bits, change base and bounds registers, restart. Because OS manages the translation, we control the vertical, we control the horizontal. Only the OS gets to change the base and bounds! Clearly, user program can't, or else lose protection. With base&bounds system, what gets saved/restored on a context switch? contents of base and bounds register; or maybe even contents of memory! Hardware cost: 2 registers adder, comparator Plus, slows down hardware because need to take time to do add/compare on every memory reference. In 60's, thought to be too expensive! But get better use of memory, and you get protection. These are really important. CPU speed (especially now) is the easy part.

Segmentation Segment is a contiguous region of memory Virtual or (for now) physical memory Each process has a segment table (in hardware) Entry in table = segment Segment can be located anywhere in physical memory Start Length Access permission Processes can share segments Same start, length, same/different access permissions

This should seem a bit strange: the virtual address space has gaps in it! Each segment gets mapped to contiguous locations in physical memory, but may be gaps between segments. This is a little like walking around in the dark, and there are huge pits in the ground where you die if you step in the pit. But! of course, a correct program will never step off into a pit, so ok. A correct program will never address gaps; if it does, trap to kernel and then core dump. Minor exception: stack, heap can grow. In UNIX, sbrk() increases size of heap segment. For stack, just take fault, system automatically increases size of stack. Detail: Protection mode in segmentation table entries. For example, code segment would be read-only (only execution and loads are allowed). Data and stack segment would be read-write (stores allowed). What must be saved/restored on context switch? Typically, segment table stored in CPU, not in memory, because it’s small. Contents must be saved/restored.

Segment start length code 0x4000 0x700 data 0x500 heap - stack 0x2000 0x500 heap - stack 0x2000 0x1000 2 bit segment # 12 bit offset Physical Memory Virtual Memory main: 240 store #1108, r2 244 store pc+8, r31 248 jump 360 24c … strlen: 360 loadbyte (r2), r3 420 jump (r31) x: 1108 a b c \0 x: 108 a b c \0 … main: 4240 store #1108, r2 4244 store pc+8, r31 4248 jump 360 424c strlen: 4360 loadbyte (r2),r3 4420 jump (r31) Initially, pc = 240. What happens first? Simulate machine, building up physical memory contents as we go. 1. fetch 240? virtual segment #? 0; offset ? 240. Physical address: 4240 from base = 4000 + offset = 240. Fetch value at 4240. Store address of x into r2 (first argument to procedure). 2. fetch 244? virtual segment #? 0; offset ? 244. Physical address: 4244. Fetch value at 4244. Instruction to Store PC + 8 into r31 (return value). 5. What is PC? (Note PC is untranslated!) 244 +8 = 24c Instruction to execute after return. 6. Fetch 248? Physical address 4248. Fetch instruction: jump 360. 7. Fetch 360? Physaddr 4360. Fetch instruction: load (r2) into r3. Contents of r2? 1108. 8. r2 is used as a pointer. Contents of r2 is a virtual address. Therefore, need to translate it! Seg # ? 1 offset ? 108. Physaddr: 108. Fetch value a. Do remainder of strlen. 9. On return, jump to (r31) -- this holds return PC in *virtual space* -- 24c

UNIX fork and Copy on Write Makes a complete copy of a process Segments allow a more efficient implementation Copy segment table into child Mark parent and child segments read-only Start child process; return to parent If child or parent writes to a segment, will trap into kernel make a copy of the segment and resume

Zero-on-Reference How much physical memory do we need to allocate for the stack or heap? Zero bytes! When program touches the heap Segmentation fault into OS kernel Kernel allocates some memory How much? Zeros the memory avoid accidentally leaking information! Restart process

Segmentation Pros? Cons? Can share code/data segments between processes Can protect code segment from being overwritten Can transparently grow stack/heap as needed Can detect if need to copy-on-write Cons? Complex memory management Need to find chunk of a particular size May need to rearrange memory from time to time to make room for new segment or growing segment External fragmentation: wasted space between chunks

Paged Translation Manage memory in fixed size units, or pages Finding a free page is easy Bitmap allocation: 0011111100000001100 Each bit represents one physical page frame Each process has its own page table Stored in physical memory Hardware registers pointer to page table start page table length Simpler, because allows use of a bitmap. What's a bitmap? 001111100000001100 Each bit represents one page of physical memory -- 1 means allocated, 0 means unallocated. Lots simpler than base&bounds or segmentation

Process View A Physical Memory B C D E 4 F 3 G H 1 I Page Table J K L Suppose page size is 4 bytes Where is virtual address 6? 9?

Paging Questions What must be saved/restored on a process context switch? Pointer to page table/size of page table Page table itself is in main memory What if page size is very small? What if page size is very large? Internal fragmentation: if we don’t need all of the space inside a fixed size chunk Means lots of space taken up with page table entries.

Paging and Copy on Write Can we share memory between processes? Set entries in both page tables to point to same page frames Need core map of page frames to track which processes are pointing to which page frames UNIX fork with copy on write at page granularity Copy page table entries to new process Mark all pages as read-only Trap into kernel on write (in child or parent) Copy page and resume execution

Paging and Fast Program Start Can I start running a program before its code is in physical memory? Set all page table entries to invalid When a page is referenced for first time Trap to OS kernel OS kernel brings in page Resumes execution Remaining pages can be transferred in the background while program is running On windows, the compiler reorganizes where procedures are in the executable, to put the initialization code in a few pages right at the beginning

Sparse Address Spaces Might want many separate segments Per-processor heaps Per-thread stacks Memory-mapped files Dynamically linked libraries What if virtual address space is sparse? On 32-bit UNIX, code starts at 0 Stack starts at 2^31 4KB pages => 500K page table entries 64-bits => 4 quadrillion page table entries Is there a solution that allows simple memory allocation, easy to share memory, and is efficient for sparse address spaces?

Multi-level Translation Tree of translation tables Paged segmentation Multi-level page tables Multi-level paged segmentation All 3: Fixed size page as lowest level unit Efficient memory allocation Efficient disk transfers Easier to build translation lookaside buffers Efficient reverse lookup (from physical -> virtual) Page granularity for protection/sharing

Paged Segmentation Process memory is segmented Segment table entry: Pointer to page table Page table length (# of pages in segment) Access permissions Page table entry: Page frame Share/protection at either page or segment-level

Multilevel Paging

x86 Multilevel Paged Segmentation Global Descriptor Table (segment table) Pointer to page table for each segment Segment length Segment access permissions Context switch: change global descriptor table register (GDTR, pointer to global descriptor table) Multilevel page table 4KB pages; each level of page table fits in one page Only fill page table if needed 32-bit: two level page table (per segment) 64-bit: four level page table (per segment)

Multilevel Translation Pros: Allocate/fill only as many page tables as used Simple memory allocation Share at segment or page level Cons: Space overhead: at least one pointer per virtual page Two or more lookups per memory reference

Portability Many operating systems keep their own memory translation data structures List of memory objects (segments) Virtual -> physical Physical -> virtual Simplifies porting from x86 to ARM, 32 bit to 64 bit Inverted page table Hash from virtual page -> physical page Space proportional to # of physical pages

Do we need multi-level page tables? Use inverted page table in hardware instead of multilevel tree IBM PowerPC Hash virtual page # to inverted page table bucket Location in IPT => physical page frame Pros/cons?

Efficient Address Translation Translation lookaside buffer (TLB) Cache of recent virtual page -> physical page translations If cache hit, use translation If cache miss, walk multi-level page table Cost of translation = Cost of TLB lookup + Prob(TLB miss) * cost of page table lookup

Software Loaded TLB Do we need a page table at all? Pros/cons? MIPS processor architecture If translation is in TLB, ok If translation is not in TLB, trap to kernel Kernel computes translation and loads TLB Kernel can use whatever data structures it wants Pros/cons?

When Do TLBs Work/Not Work?

When Do TLBs Work/Not Work? Video Frame Buffer: 32 bits x 1K x 1K = 4MB

Superpages TLB entry can be A page A superpage: a set of contiguous pages x86: superpage is set of pages in one page table x86 TLB entries 4KB 2MB 1GB

When Do TLBs Work/Not Work, part 2 What happens on a context switch? Reuse TLB? Discard TLB? Motivates hardware tagged TLB Each TLB entry has process ID TLB hit only if process ID matches current process

When Do TLBs Work/Not Work, part 3 What happens when the OS changes the permissions on a page? For demand paging, copy on write, zero on reference, … TLB may contain old translation OS must ask hardware to purge TLB entry On a multicore: TLB shootdown OS must ask each CPU to purge TLB entry

TLB Shootdown

Address Translation with TLB

Virtually Addressed Caches

Memory Hierarchy Anecdote: the system that I used for building my first OS, would fit in today’s 1st level cache i7 has 8MB as shared 3rd level cache; 2nd level cache is per-core

Hardware Design Principle The bigger the memory, the slower the memory

Virtually Addressed Caches

Memory Hierarchy i7 has 8MB as shared 3rd level cache; 2nd level cache is per-core

Translation on a Modern Processor Note that the TLB miss walks the page table in the physical cache, so you may not even need to go to physical memory on a TLB miss

Question What is the cost of a first level TLB miss? Second level TLB lookup What is the cost of a second level TLB miss? x86: 2-4 level page table walk How expensive is a 4-level page table walk on a modern processor?

Questions With a virtual cache, what do we need to do on a context switch? What if the virtual cache > page size? Page size: 4KB (x86) First level cache size: 64KB (i7) Cache block size: 32 bytes Need to invalidate the cache, or tag the cache with process ID’s If tagged, and the virtual cache is > page size, means that it is possible to have multiple cache blocks that have different contents

Aliasing Alias: two (or more) virtual cache entries that refer to the same physical memory What if we modify one alias and then context switch? Typical solution On a write, lookup virtual cache and TLB in parallel Physical address from TLB used to check for aliases

From 2009

Multicore and Hyperthreading Modern CPU has several functional units Instruction decode Arithmetic/branch Floating point Instruction/data cache TLB Multicore: replicate functional units (i7: 4) Share second/third level cache, second level TLB Hyperthreading: logical processors that share functional units (i7: 2) Better functional unit utilization during memory stalls No difference from the OS/programmer perspective Except for performance, affinity, …