Operating Systems & Memory Systems: Address Translation Computer Science 220 ECE 252 Professor Alvin R. Lebeck Fall 2006

2 © Alvin R. Lebeck 2001 CPS 220 Outline Finish Main Memory Address Translation –basics –64-bit Address Space Managing memory OS Performance Throughout Review Computer Architecture Interaction with Architectural Decisions

3 © Alvin R. Lebeck 2001 CPS 220 Fast Memory Systems: DRAM specific Multiple column (CAS) accesses per row access: several names (page mode) –64 Mbit DRAM: cycle time = 100 ns, page mode = 20 ns New DRAMs to address gap; what will they cost, will they survive? –Synchronous DRAM: Provide a clock signal to DRAM, transfers synchronous to the system clock –RAMBUS: reinvent DRAM interface (Intel will use it) »Each chip a module vs. slice of memory »Short bus between CPU and chips »Does own refresh »Variable amount of data returned »1 byte / 2 ns (500 MB/s per chip) –Cached DRAM (CDRAM): Keep entire row in SRAM

4 © Alvin R. Lebeck 2001 CPS 220 Main Memory Summary Big DRAM + Small SRAM = Cost Effective –Cray C-90 uses all SRAM (how many sold?) Wider Memory Interleaved Memory: for sequential or independent accesses Avoiding bank conflicts: SW & HW DRAM specific optimizations: page mode & Specialty DRAM, CDRAM –Niche memory or main memory? »e.g., Video RAM for frame buffers, DRAM + fast serial output IRAM: Do you know what it is?

5 © Alvin R. Lebeck 2001 CPS 220 Review: Reducing Miss Penalty Summary Five techniques –Read priority over write on miss –Subblock placement –Early Restart and Critical Word First on miss –Non-blocking Caches (Hit Under Miss) –Second Level Cache Can be applied recursively to Multilevel Caches –Danger is that time to DRAM will grow with multiple levels in between

6 © Alvin R. Lebeck 2001 CPS 220 Review: Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the cache

7 © Alvin R. Lebeck 2001 CPS 220 Review: Cache Optimization Summary (MR = miss rate, MP = miss penalty, HT = hit time; + improves, – hurts)
Technique | MR | MP | HT | Complexity
Larger Block Size | + | – | | 0
Higher Associativity | + | | – | 1
Victim Caches | + | | | 2
Pseudo-Associative Caches | + | | | 2
HW Prefetching of Instr/Data | + | | | 2
Compiler Controlled Prefetching | + | | | 3
Compiler Reduce Misses | + | | | 0
Priority to Read Misses | | + | | 1
Subblock Placement | | + | + | 1
Early Restart & Critical Word 1st | | + | | 2
Non-Blocking Caches | | + | | 3
Second Level Caches | | + | | 2
Small & Simple Caches | – | | + | 0
Avoiding Address Translation | | | + | 2
Pipelining Writes | | | + | 1

8 © Alvin R. Lebeck 2001 CPS 220 System Organization [Figure: processor with cache connected through the core chip set to main memory and to an I/O bus hosting a disk controller (disk), graphics controller (graphics), and network interface (network); devices signal the processor with interrupts]

9 © Alvin R. Lebeck 2001 CPS 220 Computer Architecture Interface Between Hardware and Software [Figure: applications, compiler, and operating system (software) above the interface; CPU, memory, I/O, multiprocessors, and networks (hardware) below] This is IT

10 © Alvin R. Lebeck 2001 CPS 220 Memory Hierarchy 101 [Figure: processor (very fast, <1 ns clock, multiple instructions per cycle) -> cache (SRAM: fast, small, expensive) -> main/physical memory (DRAM: slow, big, cheap) -> disk (magnetic: really slow, really big, really cheap) => cost-effective memory system (price/performance)]

11 © Alvin R. Lebeck 2001 CPS 220 Virtual Memory: Motivation Process = Address Space + thread(s) of control Address space = PA –programmer controls movement from disk –protection? –relocation? Linear Address space –larger than physical address space »32, 64 bits vs. 28-bit physical (256MB) Automatic management [Figure: virtual address space mapped onto physical memory]

12 © Alvin R. Lebeck 2001 CPS 220 Virtual Memory Process = virtual address space + thread(s) of control Translation –VA -> PA –What physical address does virtual address A map to –Is VA in physical memory? Protection (access control) –Do you have permission to access it?

13 © Alvin R. Lebeck 2001 CPS 220 Virtual Memory: Questions How is data found if it is in physical memory? Where can data be placed in physical memory? Fully Associative, Set Associative, Direct Mapped What data should be replaced on a miss? (Take Compsci 210 …)

14 © Alvin R. Lebeck 2001 CPS 220 Segmented Virtual Memory Virtual address (2^32, 2^64) to Physical Address mapping (2^30) Variable size, base + offset, contiguous in both VA and PA [Figure: variable-size segments in the virtual address space mapped to contiguous regions of physical memory]

15 © Alvin R. Lebeck 2001 CPS 220 Intel Pentium Segmentation [Figure: a logical address is a segment selector plus an offset; the selector picks a segment descriptor from the Global Descriptor Table (GDT), and the descriptor's segment base address plus the offset gives the location in the physical address space]

16 © Alvin R. Lebeck 2001 CPS 220 Pentium Segmentation (Continued) Segment Descriptors –Local and Global –base, limit, access rights –Can define many Segment Registers –contain segment descriptors (faster than load from mem) –Only 6 Must load segment register with a valid entry before segment can be accessed –generally managed by compiler, linker, not programmer
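As a rough illustration of the base/limit/access-rights check a segment descriptor implies, here is a minimal C sketch; the structure and field names are hypothetical and deliberately ignore the Pentium's real descriptor layout, privilege levels, and local descriptor tables.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical segment descriptor: base, limit, and simplified rights. */
typedef struct {
    uint32_t base;      /* base address of the segment                */
    uint32_t limit;     /* segment size in bytes                      */
    bool     writable;  /* simplified access rights                   */
} seg_desc_t;

/* Translate (selector, offset) to an address, or return -1 on a violation.
 * gdt[] stands in for the Global Descriptor Table; the selector is assumed
 * to be in range. */
static int64_t seg_translate(const seg_desc_t *gdt, unsigned selector,
                             uint32_t offset, bool is_write)
{
    const seg_desc_t *d = &gdt[selector];
    if (offset >= d->limit)       return -1;   /* limit (bounds) check     */
    if (is_write && !d->writable) return -1;   /* access-rights check      */
    return (int64_t)d->base + offset;          /* base + offset            */
}
```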

17 © Alvin R. Lebeck 2001 CPS 220 Paged Virtual Memory Virtual address (2^32, 2^64) to Physical Address mapping (2^28) –virtual page to physical page frame Fixed Size units for access control & translation [Figure: the virtual address (virtual page number + offset) maps fixed-size virtual pages onto physical page frames]

18 © Alvin R. Lebeck 2001 CPS 220 Page Table Kernel data structure (per process) Page Table Entry (PTE) –VA -> PA translations (if none page fault) –access rights (Read, Write, Execute, User/Kernel, cached/uncached) –reference, dirty bits Many designs –Linear, Forward mapped, Inverted, Hashed, Clustered Design Issues –support for aliasing (multiple VA to single PA) –large virtual address space –time to obtain translation
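For a concrete (and deliberately simplified) picture of a linear page table, the sketch below keeps one PTE per virtual page, checks a valid bit, and records the reference before forming the physical address; the bit layout and names are illustrative, not any particular machine's.

```c
#include <stdint.h>

#define PAGE_SHIFT 12            /* assumed 4 KB pages                     */
#define PTE_VALID  (1u << 0)
#define PTE_WRITE  (1u << 1)
#define PTE_USER   (1u << 2)
#define PTE_DIRTY  (1u << 3)
#define PTE_REF    (1u << 4)

typedef struct {
    uint32_t pfn;    /* physical page frame number                        */
    uint32_t flags;  /* access rights and status bits                     */
} pte_t;

/* Linear table indexed by VPN; returns the PA, or -1 to signal a page fault. */
static int64_t translate(pte_t *page_table, uint64_t va)
{
    uint64_t vpn    = va >> PAGE_SHIFT;
    uint64_t offset = va & ((1u << PAGE_SHIFT) - 1);
    pte_t   *pte    = &page_table[vpn];

    if (!(pte->flags & PTE_VALID))
        return -1;                        /* no translation: page fault    */
    pte->flags |= PTE_REF;                /* reference bit for replacement */
    return (int64_t)(((uint64_t)pte->pfn << PAGE_SHIFT) | offset);
}
```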

19 © Alvin R. Lebeck 2001 CPS 220 Alpha VM Mapping (Forward Mapped) “64-bit” address divided into 3 segments –seg0 (bit 63 = 0) user code/heap –seg1 (bit 63 = 1, 62 = 1) user stack –kseg (bit 63 = 1, 62 = 0) kernel segment for OS Three level page table, each one page –Alpha only 43 unique bits of VA –(future min page size up to 64KB => 55 bits of VA) PTE bits: valid, kernel & user read & write enable (No reference, use, or dirty bit) –What do you do for replacement? [Figure: the VA is split into seg 0/1, L1, L2, and L3 indices plus a page offset (PO); a base register and the three table levels yield the physical page frame number]
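A software model of a forward-mapped, multi-level walk in the spirit of this layout might look like the following; the index widths, PTE format (bit 0 = valid; intermediate PTEs hold the address of the next-level table, leaf PTEs hold the frame number in the upper bits), and names are assumptions for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 13                    /* assumed 8 KB pages             */
#define IDX_BITS   10                    /* assumed index width per level  */
#define IDX_MASK   ((1u << IDX_BITS) - 1)

typedef uint64_t pte_t;                  /* bit 0 = valid                  */

/* Three-level walk; returns the PA, or -1 on a fault at any level. */
static int64_t walk(const pte_t *l1_base, uint64_t va)
{
    unsigned i1 = (va >> (PAGE_SHIFT + 2 * IDX_BITS)) & IDX_MASK;
    unsigned i2 = (va >> (PAGE_SHIFT + IDX_BITS))     & IDX_MASK;
    unsigned i3 = (va >> PAGE_SHIFT)                  & IDX_MASK;

    pte_t l1 = l1_base[i1];
    if (!(l1 & 1)) return -1;                          /* fault, level 1   */
    const pte_t *l2_base = (const pte_t *)(uintptr_t)(l1 & ~1ull);

    pte_t l2 = l2_base[i2];
    if (!(l2 & 1)) return -1;                          /* fault, level 2   */
    const pte_t *l3_base = (const pte_t *)(uintptr_t)(l2 & ~1ull);

    pte_t l3 = l3_base[i3];
    if (!(l3 & 1)) return -1;                          /* fault, level 3   */

    uint64_t pfn = l3 >> PAGE_SHIFT;                   /* leaf: frame no.  */
    return (int64_t)((pfn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1)));
}
```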

20 © Alvin R. Lebeck 2001 CPS 220 Inverted Page Table (HP, IBM) One PTE per page frame –only one VA per physical frame Must search for virtual address More difficult to support aliasing Force all sharing to use the same VA [Figure: the virtual page number is hashed into a Hash Anchor Table (HAT), which points into the Inverted Page Table (IPT); the matching entry supplies the physical address and status (PA, ST)]
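The search an inverted page table requires can be sketched as a hash into an anchor table followed by a chain walk; the sizes and field names below are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define NFRAMES    4096                 /* assumed physical frames          */
#define HAT_SIZE   1024                 /* assumed hash anchor table size   */

typedef struct {
    uint64_t vpn;    /* virtual page currently mapped by this frame (tag)   */
    int      next;   /* next frame on the same hash chain, -1 = end         */
    int      valid;
} ipt_entry_t;

static int         hat[HAT_SIZE];      /* hash anchor table: chain heads    */
static ipt_entry_t ipt[NFRAMES];       /* one entry per physical frame      */

/* Chains must start out empty (-1) before any lookups. */
static void ipt_init(void)
{
    for (int i = 0; i < HAT_SIZE; i++) hat[i] = -1;
    for (int i = 0; i < NFRAMES; i++)  { ipt[i].next = -1; ipt[i].valid = 0; }
}

/* Returns the frame number holding vpn, or -1 (not resident => page fault). */
static int ipt_lookup(uint64_t vpn)
{
    for (int f = hat[vpn % HAT_SIZE]; f != -1; f = ipt[f].next)
        if (ipt[f].valid && ipt[f].vpn == vpn)
            return f;                   /* entry index == physical frame     */
    return -1;
}
```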

21 © Alvin R. Lebeck 2001 CPS 220 Intel Pentium Segmentation + Paging [Figure: the logical address (segment selector + offset) is translated through a segment descriptor in the Global Descriptor Table (GDT) into a linear address (dir, table, offset); the page directory and page table then map it into the physical address space]

22 © Alvin R. Lebeck 2001 CPS 220 The Memory Management Unit (MMU) Input –virtual address Output –physical address –access violation (exception, interrupts the processor) Access Violations –not present –user vs. kernel –write –read –execute

23 © Alvin R. Lebeck 2001 CPS 220 Translation Lookaside Buffers (TLB) Need to perform address translation on every memory reference –30% of instructions are memory references –4-way superscalar processor –at least one memory reference per cycle Make Common Case Fast, others correct Throw HW at the problem Cache PTEs
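For example, at 4 instructions per cycle with roughly 30% of them loads or stores, that is about 4 × 0.3 ≈ 1.2 data translations per cycle, plus instruction-fetch translations, so at least one translation must be resolved every cycle.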

24 © Alvin R. Lebeck 2001 CPS 220 Fast Translation: Translation Buffer Cache of translated addresses Alpha TLB: 48 entry fully associative [Figure: the virtual page number is compared against all 48 entries (valid, read, write bits, tag, physical frame); a 48:1 mux selects the matching physical frame, which is concatenated with the page offset]

25 © Alvin R. Lebeck 2001 CPS 220 TLB Design Must be fast, not increase critical path Must achieve high hit ratio Generally small highly associative Mapping change –page removed from physical memory –processor must invalidate the TLB entry PTE is per process entity –Multiple processes with same virtual addresses –Context Switches? Flush TLB Add ASID (PID) –part of processor state, must be set on context switch
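A minimal software model of a fully associative, ASID-tagged TLB lookup follows; the loop stands in for the parallel compare the hardware performs, and the sizes and field names are assumptions.

```c
#include <stdint.h>

#define TLB_ENTRIES 48                  /* e.g., the 48-entry Alpha TLB     */
#define PAGE_SHIFT  13

typedef struct {
    uint64_t vpn;
    uint64_t pfn;
    uint8_t  asid;                      /* address-space ID: no flush on    */
    uint8_t  valid;                     /* context switch                   */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns the PA on a hit, -1 on a miss (then walk the page table, refill). */
static int64_t tlb_lookup(uint64_t va, uint8_t cur_asid)
{
    uint64_t vpn = va >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++)     /* models the parallel compare */
        if (tlb[i].valid && tlb[i].asid == cur_asid && tlb[i].vpn == vpn)
            return (int64_t)((tlb[i].pfn << PAGE_SHIFT) |
                             (va & ((1u << PAGE_SHIFT) - 1)));
    return -1;
}
```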

26 © Alvin R. Lebeck 2001 CPS 220 Hardware Managed TLBs Hardware Handles TLB miss Dictates page table organization Complicated state machine to “walk page table” –Multiple levels for forward mapped –Linked list for inverted Exception only if access violation [Figure: CPU and TLB with hardware control logic that walks the page table in memory on a miss]

27 © Alvin R. Lebeck 2001 CPS 220 Software Managed TLBs Software Handles TLB miss Flexible page table organization Simple Hardware to detect Hit or Miss Exception if TLB miss or access violation Should you check for access violation on TLB miss? [Figure: CPU and TLB; on a miss, control traps to software, which walks the page table in memory]

28 © Alvin R. Lebeck 2001 CPS 220 Mapping the Kernel Digital Unix Kseg –kseg (bit 63 = 1, 62 = 0) Kernel has direct access to physical memory One VA->PA mapping for entire Kernel Lock (pin) TLB entry –or special HW detection [Figure: user code/data and user stack occupy user virtual space; the kernel segment maps directly onto physical memory]

29 © Alvin R. Lebeck 2001 CPS 220 Considerations for Address Translation Large virtual address space Can map more things –files –frame buffers –network interfaces –memory from another workstation Sparse use of address space Page Table Design –space –less locality => TLB misses OS structure microkernel => more TLB misses

30 © Alvin R. Lebeck 2001 CPS 220 Address Translation for Large Address Spaces Forward Mapped Page Table –grows with virtual address space »worst case 100% overhead not likely –TLB miss time: memory reference for each level Inverted Page Table –grows with physical address space »independent of virtual address space usage –TLB miss time: memory reference to HAT, IPT, list search

31 © Alvin R. Lebeck 2001 CPS 220 Hashed Page Table (HP) Combine Hash Table and IPT [Huck96] –can have more entries than physical page frames Must search for virtual address Easier to support aliasing than IPT Space –grows with physical space TLB miss –one less memory ref than IPT [Figure: the virtual page number is hashed directly into the Hashed Page Table (HPT), whose entries supply the physical address and status (PA, ST)]

32 © Alvin R. Lebeck 2001 CPS 220 Clustered Page Table (SUN) Combine benefits of HPT and Linear [Talluri95] Store one base VPN (TAG) and several PPN values –virtual page block number (VPBN) –block offset [Figure: the VPBN and block offset hash to a chain of clustered entries; each entry holds a VPBN tag, a next pointer, and several PA/attribute pairs (PA0–PA3)]

33 © Alvin R. Lebeck 2001 CPS 220 Reducing TLB Miss Handling Time Problem –must walk Page Table on TLB miss –usually incur cache misses –big problem for IPC in microkernels Solution –build a small second-level cache in SW –on TLB miss, first check SW cache »use simple shift and mask index to hash table
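A sketch of that software second-level translation cache, assuming a direct-mapped table indexed by a shift-and-mask hash of the VPN; all names, sizes, and the slow-path stub are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define STLB_BITS  10
#define STLB_SIZE  (1u << STLB_BITS)    /* assumed 1024-entry SW cache      */

typedef struct { uint64_t vpn; uint64_t pfn; int valid; } stlb_entry_t;
static stlb_entry_t stlb[STLB_SIZE];

/* Stand-in for the full page-table walk (slow path, likely cache misses). */
static int64_t full_table_walk(uint64_t vpn) { (void)vpn; return -1; }

/* TLB-miss handler: check the SW cache first, fall back to the full walk. */
static int64_t tlb_miss_handler(uint64_t va)
{
    uint64_t vpn = va >> PAGE_SHIFT;
    stlb_entry_t *e = &stlb[vpn & (STLB_SIZE - 1)];   /* shift-and-mask hash */

    if (e->valid && e->vpn == vpn)                    /* second-level hit    */
        return (int64_t)e->pfn;

    int64_t pfn = full_table_walk(vpn);               /* slow path           */
    if (pfn >= 0) {                                   /* refill the SW cache */
        e->vpn = vpn; e->pfn = (uint64_t)pfn; e->valid = 1;
    }
    return pfn;
}
```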

34 © Alvin R. Lebeck 2001 CPS 220 Cache Indexing Tag on each block –No need to check index or block offset Increasing associativity shrinks index, expands tag Fully Associative: No index Direct-Mapped: Large index [Figure: the block address is split into tag and index, followed by the block offset]
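The tag/index/offset split can be written down directly; the widths below are just an assumed example geometry (64-byte blocks, 128 sets).

```c
#include <stdint.h>

#define BLOCK_BITS 6    /* 64-byte blocks (assumed)                          */
#define INDEX_BITS 7    /* 128 sets; shrinks as associativity grows (assumed) */

static inline uint64_t block_offset(uint64_t addr) {
    return addr & ((1u << BLOCK_BITS) - 1);
}
static inline uint64_t set_index(uint64_t addr) {
    return (addr >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1);
}
static inline uint64_t tag(uint64_t addr) {
    return addr >> (BLOCK_BITS + INDEX_BITS);
}
```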

35 © Alvin R. Lebeck 2001 CPS 220 Address Translation and Caches Where is the TLB wrt the cache? What are the consequences? Most of today’s systems have more than 1 cache –Digital has 3 levels –2 levels on chip (8KB-data,8KB-inst,96KB-unified) –one level off chip (2-4MB) Does the OS need to worry about this? Definition: page coloring = careful selection of va->pa mapping

36 © Alvin R. Lebeck 2001 CPS 220 TLBs and Caches [Figure: three organizations] Conventional organization: CPU -> TLB (VA to PA) -> physically addressed cache -> memory. Virtually addressed cache: CPU -> cache indexed and tagged with the VA -> TLB -> memory; translate only on miss; alias (synonym) problem. Overlapped organization: cache access proceeds in parallel with VA translation and the physical (PA) tags are then compared; requires the cache index to remain invariant across translation; backed by an L2 cache.

37 © Alvin R. Lebeck 2001 CPS 220 Virtual Caches Send virtual address to cache. Called Virtually Addressed Cache or just Virtual Cache vs. Physical Cache or Real Cache Avoid address translation before accessing cache –faster hit time to cache Context Switches? –Just like the TLB (flush or pid) –Cost is time to flush + “compulsory” misses from empty cache –Add process identifier tag that identifies process as well as address within process: can’t get a hit if wrong process I/O must interact with cache

38 © Alvin R. Lebeck 2001 CPS 220 I/O and Virtual Caches [Figure: processor with a virtual cache on the memory bus to main memory; an I/O bridge connects to the I/O bus with disk controller (disk), graphics controller (graphics), and network interface (network); devices raise interrupts] I/O is accomplished with physical addresses DMA –flush pages from cache –need pa->va reverse translation –coherent DMA

39 © Alvin R. Lebeck 2001 CPS 220 Aliases and Virtual Caches Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address But, but... the virtual address is used to index the cache Could have data in two different locations in the cache [Figure: user code/data, user stack, and kernel virtual addresses all mapping onto the same physical memory]

40 © Alvin R. Lebeck 2001 CPS 220 Index with Physical Portion of Address If index is physical part of address, can start tag access in parallel with translation so that can compare to physical tag Limits cache to page size: what if want bigger caches and use same trick? –Higher associativity –Page coloring [Figure: address split into page address and page offset; the cache index and block offset lie within the page offset, the address tag within the page address]
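For example, with 8 KB pages a direct-mapped cache indexed this way can be at most 8 KB; a 32 KB cache then needs 4-way associativity (32 KB / 4 ways = 8 KB per way) or a page-coloring guarantee on the extra index bits.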

41 © Alvin R. Lebeck 2001 CPS 220 Page Coloring for Aliases HW that guarantees that every cache frame holds unique physical address OS guarantee: lower n bits of virtual & physical page numbers must have same value; if direct-mapped, then aliases map to same cache frame –one form of page coloring [Figure: address split into page address and page offset; the index extends into the low page-number bits, which coloring keeps identical in VA and PA]
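A minimal sketch of the coloring check the OS would apply, assuming a 64 KB direct-mapped virtually indexed cache and 4 KB pages (so 16 colors); the names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT  12
#define CACHE_SIZE  (64 * 1024)                   /* assumed, direct-mapped  */
#define NUM_COLORS  (CACHE_SIZE >> PAGE_SHIFT)    /* 16 colors here          */

/* The color is the low bits of the page number that reach the cache index. */
static inline unsigned page_color(uint64_t page_number) {
    return (unsigned)(page_number & (NUM_COLORS - 1));
}

/* A frame is an acceptable mapping for va only if the colors match, so all
 * aliases of one physical page land in the same cache frames. */
static inline int color_ok(uint64_t va, uint64_t pfn) {
    return page_color(va >> PAGE_SHIFT) == page_color(pfn);
}
```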

42 © Alvin R. Lebeck 2001 CPS 220 Page Coloring to reduce misses Notion of bin –region of cache that may contain cache blocks from a page Random vs careful mapping Selection of physical page frame dictates cache index Overall goal is to minimize cache misses [Figure: physical page frames map onto bins (regions) of the cache]

43 © Alvin R. Lebeck 2001 CPS 220 Careful Page Mapping [Kessler92, Bershad94] Select a page frame such that cache conflict misses are reduced –only choose from available pages (no VM replacement induced) static –“smart” selection of page frame at page fault time dynamic –move pages around

44 © Alvin R. Lebeck 2001 CPS 220 A Case for Large Pages Page table size is inversely proportional to the page size –memory saved Fast cache hit time easy when cache <= page size (VA caches); –bigger page makes it feasible as cache size grows Transferring larger pages to or from secondary storage, possibly over a network, is more efficient Number of TLB entries is restricted by clock cycle time, –larger page size maps more memory –reduces TLB misses
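As a rough example, a 32-bit address space with 4 KB pages and 4-byte PTEs needs a 2^20-entry (4 MB) linear page table per process; raising the page size to 64 KB shrinks the same table to 2^16 entries (256 KB).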

45 © Alvin R. Lebeck 2001 CPS 220 A Case for Small Pages Fragmentation –large pages can waste storage –data must be contiguous within page Quicker process start for small processes(??)

46 © Alvin R. Lebeck 2001 CPS 220 Superpages Hybrid solution: multiple page sizes –8KB, 16KB, 32KB, 64KB pages –4KB, 64KB, 256KB, 1MB, 4MB, 16MB pages Need to identify candidate superpages –Kernel –Frame buffers –Database buffer pools Application/compiler hints Detecting superpages –static, at page fault time –dynamically create superpages Page Table & TLB modifications

47 © Alvin R. Lebeck 2001 CPS 220 Page Coloring Make physical index match virtual index Behaves like virtual index cache –no conflicts for sequential pages Possibly many conflicts between processes –address spaces all have same structure (stack, code, heap) –modify to xor PID with address (MIPS used variant of this) Simple implementation Pick arbitrary page if necessary

48 © Alvin R. Lebeck 2001 CPS 220 Bin Hopping Allocate sequentially mapped pages (time) to sequential bins (space) Can exploit temporal locality –pages mapped close in time will be accessed close in time Search from last allocated bin until bin with available page frame Separate search list per process Simple implementation
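A minimal sketch of bin hopping under assumed sizes (the bin count, counters, and names are hypothetical): the allocator keeps a per-address-space pointer to the last bin used and scans forward, wrapping around, for a bin with a free frame, so pages mapped close in time land in different cache bins.

```c
#include <stdint.h>

#define NUM_BINS 64                           /* assumed number of cache bins */

static int free_frames_in_bin[NUM_BINS];      /* system-wide free count/bin   */

/* Returns the chosen bin (the caller then takes any free frame mapping to
 * that bin), or -1 if no frame is free anywhere.  last_bin is per process. */
static int bin_hop_select(int *last_bin)
{
    for (int i = 1; i <= NUM_BINS; i++) {
        int bin = (*last_bin + i) % NUM_BINS; /* hop to the next bin          */
        if (free_frames_in_bin[bin] > 0) {
            free_frames_in_bin[bin]--;
            *last_bin = bin;
            return bin;
        }
    }
    return -1;
}
```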

49 © Alvin R. Lebeck 2001 CPS 220 Best Bin Keep track of two counters per bin –used: # of pages allocated to this bin for this address space –free: # of available pages in the system for this bin Bin selection is based on low values of used and high values of free Low used value –reduce conflicts within the address space High free value –reduce conflicts between address spaces

50 © Alvin R. Lebeck 2001 CPS 220 Hierarchical Best bin could be linear in # of bins Build a tree –internal nodes contain sum of child values Independent of cache size –simply stop at a particular level in the tree

51 © Alvin R. Lebeck 2001 CPS 220 Benefit of Static Page Coloring Reduces cache misses by 10% to 20% Multiprogramming –want to distribute mapping to avoid inter-address space conflicts

52 © Alvin R. Lebeck 2001 CPS 220 Dynamic Page Coloring Cache Miss Lookaside (CML) buffer [Bershad94] –proposed hardware device Monitor # of misses per page If # of misses >> # of cache blocks in page –must be conflict misses –interrupt processor –move a page (recolor) Cost of moving page << benefit