Memory Hierarchy and Cache Design (3): Reducing Cache Miss Penalty

Memory Hierarchy and Cache Design (3)

Reducing Cache Miss Penalty
1. Giving priority to read misses over writes
2. Sub-block placement for reduced miss penalty
3. Early restart and critical word first
4. Nonblocking caches to reduce stalls on cache misses
5. Second-level caches

Giving priority to read misses over writes
When accessing main memory, give priority to reads caused by read misses over writes waiting in the write buffer.
Problem - example (direct-mapped cache; all three addresses map to cache index 0):
    SW 512(R0), R3   ; M[512] <- R3   (cache index 0)
    LW R1, 1024(R0)  ; R1 <- M[1024]  (cache index 0)
    LW R2, 512(R0)   ; R2 <- M[512]   (cache index 0)
The SW places R3 in the write buffer; the first LW misses and evicts the block holding M[512]; if the second LW then reads M[512] from memory before the buffered write has been performed, R2 receives the stale value - a read-after-write hazard through memory.
Solutions (see the sketch after this list):
(1) Wait until the write buffer becomes empty before servicing a read miss
(2) Check the addresses of the words in the write buffer; if there is no conflict, let the read miss proceed ahead of the writes
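A minimal sketch of solution (2), assuming a small fixed-depth write buffer; the structure, field names, and forwarding policy are illustrative, not from the slides:

    #include <stdbool.h>
    #include <stdint.h>

    #define WB_ENTRIES 4            /* hypothetical write-buffer depth */

    typedef struct {
        bool     valid;
        uint32_t addr;              /* address of the buffered store */
        uint32_t data;
    } WBEntry;

    static WBEntry write_buffer[WB_ENTRIES];

    /* On a read miss, check the write buffer before reading memory.
       If the missed address matches a buffered store, forward its data
       (resolving the RAW hazard); otherwise the read miss may safely
       be serviced ahead of the buffered writes. */
    bool read_miss_check(uint32_t miss_addr, uint32_t *data_out)
    {
        for (int i = 0; i < WB_ENTRIES; i++) {
            if (write_buffer[i].valid && write_buffer[i].addr == miss_addr) {
                *data_out = write_buffer[i].data;
                return true;        /* conflict: data comes from the buffer */
            }
        }
        return false;               /* no conflict: service the read miss now */
    }

Checking addresses, rather than draining the buffer, keeps the read-miss penalty low while still preserving read-after-write ordering.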

Sub-block placement for reduced miss penalty
Assumptions: write-through, direct-mapped cache; minimum unit for a write = sub-block size.
A tag is associated with a block consisting of a number of sub-blocks, each of which has its own valid bit; only the missing sub-block must be fetched on a miss, giving reduced tag storage and a reduced miss penalty.
Cases to consider on a write (see the sketch below):
1. Tag match and valid bit already set: write the sub-block (an ordinary hit)
2. Tag match and valid bit not set: write the sub-block and set its valid bit
3. Tag mismatch: write the new tag and the sub-block, set this sub-block's valid bit, and clear the others
Sub-blocks can also be used to make writes faster: if a write covers a full sub-block, none of these cases requires fetching the rest of the block from memory.
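A minimal sketch of the three write cases, assuming one valid bit per sub-block and writes that always cover a full sub-block; the line layout and sizes are illustrative, and the write-through update of memory itself is omitted:

    #include <stdbool.h>
    #include <stdint.h>

    #define SUBBLOCKS 4             /* hypothetical: 4 sub-blocks per block */

    typedef struct {
        uint32_t tag;
        bool     valid[SUBBLOCKS];  /* one valid bit per sub-block */
        uint32_t data[SUBBLOCKS];   /* one word per sub-block, illustrative */
    } CacheLine;

    void write_subblock(CacheLine *line, uint32_t tag, int sb, uint32_t value)
    {
        if (line->tag == tag) {
            /* Cases 1 and 2: tag match. Write the data; setting the
               valid bit also covers the case where it was not yet set.
               No memory fetch is needed. */
            line->data[sb]  = value;
            line->valid[sb] = true;
        } else {
            /* Case 3: tag mismatch. Claim the block for the new tag,
               invalidate the other sub-blocks, and validate only this
               one - again, no fetch of the rest of the block. */
            line->tag = tag;
            for (int i = 0; i < SUBBLOCKS; i++)
                line->valid[i] = false;
            line->data[sb]  = value;
            line->valid[sb] = true;
        }
    }

The point of case 3 is that a write never has to fetch the rest of the block: the per-sub-block valid bits record exactly which sub-blocks hold live data.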

Early restart and critical word first
Early restart: fetch the words of the block in normal order, but restart the processor as soon as the requested word arrives; the rest of the block fill proceeds in the background. For a 4-word block with word 3 requested: fetch order 1 -> 2 -> 3 -> 4, processor resumes after word 3 arrives.
Critical word first: request the missed word from memory first and send it to the processor immediately; the remaining words are then fetched, wrapping around the block. Same example: fetch order 3 -> 4 -> 1 -> 2, processor resumes as soon as word 3 arrives.
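A small sketch of the critical-word-first fetch order, using 0-based word indices (the slide's example numbers words from 1); purely illustrative:

    /* Early restart fetches words 0..n-1 in order and restarts the
       processor when the requested word arrives. Critical word first
       instead starts at the requested word and wraps around the block. */
    void cwf_order(int n_words, int requested, int order[])
    {
        for (int i = 0; i < n_words; i++)
            order[i] = (requested + i) % n_words;
    }

    /* For a 4-word block with word 3 requested (index 2 here):
       cwf_order(4, 2, order) yields {2, 3, 0, 1}, i.e. 3 -> 4 -> 1 -> 2. */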

Nonblocking caches to reduce stalls on cache misses
Nonblocking cache: does not block on a miss, but continues to accept and service requests while the miss is outstanding.
Possibilities (a sketch follows this list):
- Hit under miss (requires at least out-of-order completion capability)
- Hit under multiple misses (requires, in addition, a memory system that can service multiple misses simultaneously)
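Nonblocking caches are commonly built around miss status holding registers (MSHRs) that track outstanding misses; a minimal sketch, with the register count and fields chosen for illustration:

    #include <stdbool.h>
    #include <stdint.h>

    #define MSHRS 4                 /* hypothetical: up to 4 outstanding misses */

    typedef struct {
        bool     busy;
        uint32_t block_addr;        /* address of the block being fetched */
    } MSHR;

    static MSHR mshr[MSHRS];

    typedef enum { HIT, MISS_ALLOCATED, MISS_MERGED, STALL } Outcome;

    /* A hit is serviced immediately even if earlier misses are
       outstanding (hit under miss). A new miss allocates a free MSHR
       (miss under miss); only when all MSHRs are busy must we stall. */
    Outcome cache_access(bool cache_hit, uint32_t block_addr)
    {
        if (cache_hit)
            return HIT;             /* no blocking on earlier misses */
        for (int i = 0; i < MSHRS; i++)   /* merge with an outstanding miss */
            if (mshr[i].busy && mshr[i].block_addr == block_addr)
                return MISS_MERGED;
        for (int i = 0; i < MSHRS; i++) { /* allocate a free MSHR */
            if (!mshr[i].busy) {
                mshr[i].busy = true;
                mshr[i].block_addr = block_addr;
                return MISS_ALLOCATED;
            }
        }
        return STALL;               /* all MSHRs busy: cache must block */
    }

A single MSHR gives hit under miss; multiple MSHRs, plus a memory system that can overlap them, give hit under multiple misses.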

Nonblocking caches to reduce stalls on cache misses
[Figure: memory stall time with nonblocking caches; 8-KB direct-mapped cache, 32-byte blocks, 16-clock-cycle miss penalty]

Second-level caches
[Diagram: processor with split L1 instruction and data caches, backed by a unified second-level (L2) cache, backed by DRAM main memory]
L1 (first-level) cache: optimized for fast hit time
L2 (second-level) cache: optimized for high hit rate
Important concern: the inclusion property (data held in L1 is also present in L2)

Second-level caches
With a 32-KB L1 cache:
Average memory access time = Hit time(L1) + Miss rate(L1) x (Hit time(L2) + Miss rate(L2) x Miss penalty(L2))
where Miss rate(L2) is the local miss rate of L2, i.e. misses in L2 per access that reaches L2.
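With illustrative numbers (not from the slides), say Hit time(L1) = 1 cycle, Miss rate(L1) = 4%, Hit time(L2) = 10 cycles, local Miss rate(L2) = 50%, and Miss penalty(L2) = 100 cycles:

    Average memory access time = 1 + 0.04 x (10 + 0.5 x 100)
                               = 1 + 0.04 x 60
                               = 3.4 cycles

Without the L2 cache, the same L1 would pay the full 100-cycle penalty on every miss: 1 + 0.04 x 100 = 5 cycles.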

Second-level caches
[Figure: performance normalized to a 4096-KB L2 cache with a one-clock-cycle L2 hit (= 1.00); 32-KB L1 write-back cache]

Second-level caches
[Figure: performance of a 512-KB L2 cache, normalized to a 4096-KB L2 cache with a one-clock-cycle L2 hit (= 1.00); 32-KB L1 write-back cache]

Summary
Techniques for reducing cache miss penalty:
- Giving priority to read misses over writes
- Sub-block placement for reduced miss penalty
- Early restart and critical word first
- Nonblocking caches to reduce stalls on cache misses
- Second-level caches