CMSC 611: Advanced Computer Architecture

Memory & Virtual Memory
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spring 2003 course slides, and from Hennessy & Patterson, © 2003 Elsevier Science.

Memory Hierarchy
[Figure: staging hierarchy from CPU registers down to disk. Upper levels are faster; lower levels are larger.]
- Registers: 100s of bytes, <10s of ns; staged by the program/compiler as 1-8 byte instruction operands
- Cache: KB-MB, 10-40 ns; staged by the cache controller as 8-128 byte blocks
- Main memory: GB, 70 ns-1 us; staged by the OS as 512 B-4 KB pages
- Disk: GB-TB, ms access times

Main Memory Background
Performance of main memory:
- Latency: affects the cache miss penalty
  - Access time: time between a request and the word arriving
  - Cycle time: minimum time between successive requests
- Bandwidth: the primary concern for I/O and large block transfers
Main memory is DRAM (dynamic RAM):
- "Dynamic" because it must be refreshed periodically
- Addresses are sent in two halves (row, then column)
Cache uses SRAM (static RAM):
- No refresh needed
- 6 transistors/bit vs. 1 transistor/bit for DRAM, roughly 10x the area
- Address is not multiplexed: the full address is sent at once
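The access-time/cycle-time distinction is worth making concrete. Below is a minimal C sketch with illustrative timing numbers (not taken from the slides): a single read's latency is set by the access time, but back-to-back reads are spaced by the cycle time, so sustained bandwidth is governed by cycle time.

```c
#include <stdio.h>

/* Illustrative DRAM timing numbers (hypothetical, not from the slides). */
#define ACCESS_TIME_NS 60.0   /* request -> data available        */
#define CYCLE_TIME_NS  110.0  /* minimum spacing between requests */
#define WORD_BYTES     4

int main(void) {
    /* Latency of one isolated read is set by the access time... */
    printf("single-word latency: %.0f ns\n", ACCESS_TIME_NS);
    /* ...but back-to-back reads are limited by the cycle time, so
       sustained bandwidth depends on cycle time, not access time. */
    double bw_mb_s = WORD_BYTES / (CYCLE_TIME_NS * 1e-9) / 1e6;
    printf("sustained bandwidth: %.0f MB/s\n", bw_mb_s);
    return 0;
}
```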

DRAM Logical Organization
- A 4 Mbit DRAM is organized as a square array, so the row (RAS) and column (CAS) addresses each carry half (the square root) of the address bits
- Refreshing prevents access to the DRAM (typically 1-5% of the time)
- Reading one byte refreshes the entire row
- Reads are destructive, so data must be rewritten after each read
- Cycle time is therefore significantly longer than access time
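As a sketch of the row/column split: a 4 Mbit part has 2^22 cells, arranged as a 2^11 x 2^11 square, so an 11-bit row and an 11-bit column address are multiplexed over the same pins. The address value below is made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* 4 Mbit = 2^22 bits as a 2^11 x 2^11 square array: the 22-bit cell
   address splits into an 11-bit row (sent with RAS) and an 11-bit
   column (sent with CAS) over the same address pins. */
#define ROW_BITS 11
#define COL_BITS 11

int main(void) {
    uint32_t addr = 0x2A5F3 & ((1u << (ROW_BITS + COL_BITS)) - 1);
    uint32_t row  = addr >> COL_BITS;               /* strobed first  */
    uint32_t col  = addr & ((1u << COL_BITS) - 1);  /* strobed second */
    printf("addr=0x%05X -> row=%u, col=%u\n", addr, row, col);
    return 0;
}
```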

Processor-Memory Performance
[Figure: processor vs. DRAM performance, 1980-2000, log scale. CPU performance ("Moore's Law") improves ~60%/year (2x every 1.5 years); DRAM improves only ~9%/year (2x every 10 years); the processor-memory performance gap grows ~50%/year.]
Problem: improvements in DRAM access time are not enough to catch up.
Solution: increase the bandwidth of main memory (improve throughput).

Memory Organization
- Simple: CPU, cache, bus, and memory are all the same width (e.g., 32 bits)
- Wide: CPU/mux path is 1 word; mux/cache, bus, and memory are N words wide
- Interleaved: CPU, cache, and bus are 1 word; memory is organized as N modules (e.g., 4 word-interleaved modules)
Memory organization has a significant effect on bandwidth.

Memory Interleaving
[Figure: access pattern without interleaving vs. with 4-way interleaving. Without interleaving, the CPU starts the access for D1, waits until D1 is available, then starts the access for D2. With 4 banks, accesses to banks 0-3 are overlapped, and bank 0 can be accessed again once its cycle completes.]
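A toy timing model makes the benefit visible. The cycle counts below are hypothetical, chosen only to show how 4-way interleaving overlaps the bank access times for consecutive words.

```c
#include <stdio.h>

/* Toy timing model (hypothetical cycle counts) for reading N
   consecutive words, with and without 4-way word interleaving. */
#define SEND_ADDR  1   /* cycles to send an address        */
#define MEM_ACCESS 8   /* cycles for a bank to access data */
#define XFER_WORD  1   /* cycles to return one word        */

int main(void) {
    int n = 4; /* words in the block */
    /* No interleaving: each word pays the full access serially. */
    int plain = n * (SEND_ADDR + MEM_ACCESS + XFER_WORD);
    /* 4-way interleaving: the banks overlap their accesses, so after
       one access time the words stream back one per transfer cycle. */
    int inter = SEND_ADDR + MEM_ACCESS + n * XFER_WORD;
    printf("no interleave: %d cycles, 4-way interleave: %d cycles\n",
           plain, inter);
    return 0;
}
```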

Virtual Memory
- With virtual addressing, main memory plays the role of a cache for disk
- The virtual address space is much larger than the physical memory space
- Physical main memory contains only the active portion of the virtual space
- The address space can be divided into fixed-size blocks (pages) or variable-size blocks (segments)
Cache vs. virtual memory terminology:
- Block <-> Page
- Cache miss <-> Page fault
- Block addressing <-> Address translation
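A minimal sketch of page-based translation in C, assuming 4 KB pages and a tiny single-level page table invented for illustration: the virtual address splits into a virtual page number (VPN) and a page offset, and an invalid page-table entry corresponds to a page fault.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12   /* 4 KB pages */
#define NUM_PAGES 16   /* toy address space */

typedef struct { int valid; uint32_t frame; } PTE;

uint32_t translate(PTE *pt, uint32_t va) {
    uint32_t vpn = va >> PAGE_BITS;              /* virtual page number */
    uint32_t off = va & ((1u << PAGE_BITS) - 1); /* unchanged by translation */
    if (!pt[vpn].valid) {                        /* page not in memory  */
        printf("page fault on VPN %u\n", vpn);   /* OS fetches it from disk */
        return 0;
    }
    return (pt[vpn].frame << PAGE_BITS) | off;
}

int main(void) {
    PTE pt[NUM_PAGES] = {0};
    pt[2].valid = 1; pt[2].frame = 7;            /* VPN 2 -> frame 7 */
    printf("PA = 0x%08X\n", translate(pt, 0x2ABC)); /* translation hit */
    translate(pt, 0x5000);                          /* page fault      */
    return 0;
}
```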

Virtual Memory Advantages
- Allows efficient and safe sharing of memory among multiple programs
- Removes the programming burden of a small, limited amount of main memory
- Simplifies program loading and avoids the need for contiguous memory allocation, allowing a program to be loaded at any physical memory location

Virtual Addressing
Page faults are costly, taking millions of cycles to process (disks are slow).
Optimization strategies:
- Pages should be large enough to amortize the long access time
- Fully associative placement of pages reduces the page fault rate
- Faults are handled in software, which can use clever page-placement algorithms
- Write-through would make writes far too slow, so virtual memory uses write-back (copy-back)
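To see why "millions of cycles" is the right order of magnitude, here is a back-of-the-envelope calculation with illustrative numbers (an ~8 ms disk access and a 1 GHz clock, both assumptions):

```c
#include <stdio.h>

/* Back-of-the-envelope page-fault cost; numbers are illustrative. */
int main(void) {
    double disk_ms  = 8.0;  /* one disk access, in milliseconds */
    double cycle_ns = 1.0;  /* one CPU cycle at 1 GHz           */
    double cycles = disk_ms * 1e6 / cycle_ns;
    printf("one page fault ~ %.0f million cycles\n", cycles / 1e6);
    return 0;
}
```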

Translation Look-aside Buffer
- A special cache for recently used address translations
- TLB misses are typically handled as exceptions by the operating system
- A simple replacement strategy is used, since TLB misses happen frequently and must be handled quickly
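A minimal sketch of a fully associative TLB lookup in C (entry count, page size, and contents invented for illustration): every entry's VPN is compared against the request, and a miss would be raised to the OS as an exception.

```c
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 8
#define PAGE_BITS   12

typedef struct { int valid; uint32_t vpn, frame; } TLBEntry;

/* Returns 1 on a hit and fills *pa; on a miss, real hardware would
   raise an exception and the OS would walk the page table. */
int tlb_lookup(TLBEntry *tlb, uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_BITS;
    for (int i = 0; i < TLB_ENTRIES; i++)      /* compare all entries */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].frame << PAGE_BITS)
                | (va & ((1u << PAGE_BITS) - 1));
            return 1;
        }
    return 0; /* TLB miss */
}

int main(void) {
    TLBEntry tlb[TLB_ENTRIES] = {0};
    tlb[0] = (TLBEntry){1, 3, 42};  /* VPN 3 -> frame 42 */
    uint32_t pa;
    printf("hit=%d\n", tlb_lookup(tlb, 0x3123, &pa)); /* VPN 3: hit  */
    printf("hit=%d\n", tlb_lookup(tlb, 0x9123, &pa)); /* VPN 9: miss */
    return 0;
}
```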

Virtually Addressed Caches
(VA: virtual address; PA: physical address; TLB: translation look-aside buffer)
[Figure: three organizations. Conventional: CPU sends the VA to the TLB, and the PA indexes and tags L1. Virtually addressed cache: L1 is indexed and tagged with VAs, translating only on a miss. Overlapped: the L1 access proceeds in parallel with the VA translation.]

Indexing via Physical Addresses
- If the index comes from the physical part of the address, the tag access can start in parallel with translation
- To get the best of the physical and virtual caches, index the cache with the page offset, which is unaffected by address translation
- Drawback: a direct-mapped cache can then be no bigger than the page size (typically 4 KB)
To support bigger caches with the same technique (see the sketch below):
- Use higher associativity, since the index field gets smaller
- Have the OS implement page coloring, fixing a few least-significant bits of the page number (effectively moving part of the index into the tag)
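The size constraint follows directly: if the index must fit in the page offset, then cache size <= page size x associativity. A small C sketch of the arithmetic, assuming 4 KB pages:

```c
#include <stdio.h>

/* If the cache index must come entirely from the page offset, then
   cache_size <= page_size * associativity. Numbers are illustrative. */
int main(void) {
    unsigned page_bytes = 4096;
    unsigned ways[] = {1, 2, 4, 8};
    for (int i = 0; i < 4; i++)
        printf("%u-way: max virtually indexed, physically tagged "
               "cache = %u KB\n", ways[i], page_bytes * ways[i] / 1024);
    return 0;
}
```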

Aliases
Problems with virtually addressed caches:
- Every time the process is switched, the cache must logically be flushed; otherwise it could return false hits
- The cost is the time to flush plus the "compulsory" misses from the now-empty cache
- Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address, causing unnecessary misses or even read-after-write (RAW) hazards
- I/O must interact with the cache, so it needs virtual addresses

Solutions
Solutions to aliases:
- Hardware guarantees that every cache block has a unique physical address (simply check all cache entries)
- Software guarantee (page coloring): the lower n bits of the virtual and physical addresses must match wherever they overlap the index; as long as they cover the index field and the cache is direct mapped, blocks must be unique
Solution to cache flushes:
- Add a process identifier (PID) tag that identifies the process as well as the address within the process: a hit cannot occur if the PID is wrong (see the sketch below)
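A minimal sketch of the PID-tag idea in C, with an invented line layout: the PID participates in the tag comparison, so a context switch silently invalidates the other process's lines without a flush.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of a cache tag check extended with a process ID (PID), so a
   context switch does not force a full flush. Layout is invented. */
typedef struct { int valid; uint32_t pid, tag; } Line;

int hit(const Line *l, uint32_t pid, uint32_t tag) {
    /* A wrong PID can never hit, even if the address tag matches. */
    return l->valid && l->pid == pid && l->tag == tag;
}

int main(void) {
    Line l = {1, /*pid=*/5, /*tag=*/0xABC};
    printf("same process:  %d\n", hit(&l, 5, 0xABC)); /* 1 */
    printf("other process: %d\n", hit(&l, 6, 0xABC)); /* 0 */
    return 0;
}
```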

Impact of Using a Process ID
Miss rate vs. virtually addressed cache size for a program measured three ways:
- Without process switches (uniprocessor)
- With process switches, using a PID tag (PID)
- With process switches but without a PID, flushing on each switch (purge)

TLB and Cache in MIPS
- Fully associative TLB performs address translation and block identification
- Direct-mapped cache

TLB and Cache in MIPS (continued)
- A cache hit can only occur after a TLB hit (on a TLB miss with no page fault, the translation is loaded into the TLB from the page table)
- Write-through cache
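The ordering can be summarized as a small decision function. This is a toy model in C (the boolean inputs stand in for hardware state, which is invented here): the physically addressed cache can only be probed once the TLB, or the page table behind it, supplies a physical address.

```c
#include <stdio.h>

/* Toy model of the access ordering above. */
enum Outcome { CACHE_HIT, CACHE_MISS, TLB_MISS, PAGE_FAULT };

enum Outcome access(int tlb_hit, int page_in_memory, int in_cache) {
    if (!tlb_hit) {
        if (!page_in_memory)
            return PAGE_FAULT; /* OS copies the page in from disk */
        return TLB_MISS;       /* exception: refill TLB from page table */
    }
    return in_cache ? CACHE_HIT : CACHE_MISS; /* fetch block on a miss */
}

int main(void) {
    printf("%d\n", access(1, 1, 1)); /* 0: CACHE_HIT  */
    printf("%d\n", access(0, 1, 1)); /* 2: TLB_MISS   */
    printf("%d\n", access(0, 0, 0)); /* 3: PAGE_FAULT */
    return 0;
}
```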

Memory-Related Exceptions
Possible exceptions:
- Cache miss: the referenced block is not in the cache and must be fetched from main memory
- TLB miss: the translation for the referenced page is not in the TLB and must be looked up in the page table
- Page fault: the referenced page is not in main memory and must be copied from disk

Memory Protection
- Goal: prevent a process from corrupting the memory space of other processes
- Privileged and non-privileged execution modes
- Implementation can map independent virtual pages to separate physical pages
- Write-protection bits in the page table prevent unauthorized writes
- Pages are shared by mapping virtual pages of different processes to the same physical pages
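A minimal sketch of the write-protection check in C, with an invented page-table-entry layout: a store is only allowed when the entry is valid and its write bit is set; otherwise the hardware raises a protection exception.

```c
#include <stdio.h>

/* Sketch of a page-table write-protection check; layout is invented. */
typedef struct { int valid, writable; unsigned frame; } PTE;

int store_allowed(const PTE *pte) {
    /* A store to an invalid or read-only page raises an exception. */
    return pte->valid && pte->writable;
}

int main(void) {
    PTE code = {1, 0, 4};  /* shared, read-only page */
    PTE data = {1, 1, 9};  /* private, writable page */
    printf("store to code page allowed? %d\n", store_allowed(&code));
    printf("store to data page allowed? %d\n", store_allowed(&data));
    return 0;
}
```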

Memory Protection
To enable the operating system to implement protection, the hardware must provide at least the following capabilities:
- Support at least two modes of operation, one of which is a user mode
- Provide a portion of CPU state that a user process can read but not write, e.g., the page table pointer and the TLB
- Provide special instructions for changing between operation modes