1 Memory Hierarchy Design Chapter 5

2 Cache Systems
[Diagram: data objects transfer between the CPU and the cache; blocks transfer between the cache and main memory. Example clock rates: CPU 400 MHz, bus 66 MHz, main memory 10 MHz.]

3 Basic Cache Read Operation
CPU requests the contents of a memory location
Check the cache for this data
–If present, deliver from the cache (fast)
–If not present, read the required block from main memory into the cache, then deliver from the cache to the CPU
The cache includes tags to identify which block of main memory is in each cache slot
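A minimal C sketch of this read path for a direct-mapped cache. The 64-line × 16-byte geometry, the cache_read_byte name, and the array standing in for main memory are illustrative assumptions, not from the slides:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define NLINES   64
#define LINESIZE 16

struct line { int valid; uint32_t tag; uint8_t data[LINESIZE]; };
static struct line cache[NLINES];
static uint8_t main_memory[1 << 20];   /* stand-in for DRAM; addresses < 1 MB */

uint8_t cache_read_byte(uint32_t addr) {
    uint32_t offset = addr % LINESIZE;
    uint32_t index  = (addr / LINESIZE) % NLINES;
    uint32_t tag    = addr / (LINESIZE * NLINES);
    struct line *l  = &cache[index];

    if (!(l->valid && l->tag == tag)) {            /* miss: fetch the whole block */
        memcpy(l->data, &main_memory[addr - offset], LINESIZE);
        l->valid = 1;
        l->tag   = tag;
    }
    return l->data[offset];                        /* deliver from the cache */
}

int main(void) {
    main_memory[0x1234] = 42;
    printf("%d\n", cache_read_byte(0x1234));       /* miss, block fetched: prints 42 */
    printf("%d\n", cache_read_byte(0x1234));       /* hit: served from the cache */
    return 0;
}
```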

4 Elements of Cache Design
Cache size
Line (block) size
Number of caches
Mapping function
–Block placement
–Block identification
Replacement algorithm
Write policy

5 Cache Size
Cache size << main memory size
Small enough to
–Minimize cost
–Speed up access (fewer gates to address the cache)
–Keep the cache on chip
Large enough to
–Minimize the average access time
Optimum size depends on the workload
Practical size?

6 Line Size
Optimum size depends on the workload
Blocks that are too small fail to exploit the principle of locality
Larger blocks reduce the number of lines in the cache, increasing replacement overhead
Practical sizes?
[Diagram: tagged cache lines mapping to blocks of main memory]

7 Number of Caches
Increased logic density => on-chip cache
–Internal cache: level 1 (L1)
–External cache: level 2 (L2)
Unified cache
–Balances the load between instruction and data fetches
–Only one cache needs to be designed / implemented
Split caches (data and instruction)
–Suit pipelined, parallel architectures

8 Mapping Function
Number of cache lines << number of main memory blocks
Direct mapped
–Each block maps into only one possible line
–(block address) MOD (number of lines)
Fully associative
–A block can be placed anywhere in the cache
Set associative
–A block can be placed in a restricted set of lines
–(block address) MOD (number of sets in the cache)
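To make the three placement rules concrete, a small C example using the same numbers as Figure 5.13 below (memory block 12, an eight-block cache, two-way sets):

```c
#include <stdio.h>

int main(void) {
    int block = 12, lines = 8, ways = 2;

    printf("direct mapped:  line %d\n", block % lines);           /* 12 mod 8 = 4 */
    printf("%d-way set assoc: set %d\n", ways, block % (lines / ways)); /* 12 mod 4 = 0 */
    printf("fully assoc:    any of the %d lines\n", lines);
    return 0;
}
```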

9 Cache Addressing
[ Tag | Index | Block offset ]  (tag and index together form the block address)
Block offset – selects the data object within the block
Index – selects the block set
Tag – used to detect a hit
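A sketch of the field extraction in C, assuming illustrative sizes of 16-byte blocks (4 offset bits) and 64 sets (6 index bits); the slides do not fix these parameters:

```c
#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 4   /* 16-byte blocks */
#define INDEX_BITS  6   /* 64 sets */

int main(void) {
    uint32_t addr   = 0x12345678;
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("addr 0x%08x -> tag 0x%x, index %u, offset %u\n",
           addr, tag, index, offset);
    return 0;
}
```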

10 Direct Mapping

11 Associative Mapping

12 K-Way Set Associative Mapping

FIGURE 5.13 The location of a memory block whose address is 12 in a cache with eight blocks varies for direct-mapped, set-associative, and fully associative placement. In direct-mapped placement, there is only one cache block where memory block 12 can be found, and that block is given by (12 modulo 8) = 4. In a two-way set-associative cache, there would be four sets, and memory block 12 must be in set (12 mod 4) = 0; the memory block could be in either element of the set. In fully associative placement, the memory block for block address 12 can appear in any of the eight cache blocks. Copyright © 2009 Elsevier, Inc. All rights reserved.

14 Write Policy
Writes are more complex than reads
–The write and the tag comparison cannot proceed simultaneously
–Only a portion of the line has to be updated
Write policies
–Write through – write to both the cache and memory
–Write back – write only to the cache, marking the line with a dirty bit
On a write miss:
–Write allocate – load the block into the cache on a write miss
–No-write allocate – update memory directly
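The C sketch below contrasts the two common policy pairs (write through + no-write allocate vs. write back + write allocate) on one direct-mapped cache; the geometry, function names, and the flat memory array are illustrative assumptions:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define NLINES   64
#define LINESIZE 16

struct line { int valid, dirty; uint32_t tag; uint8_t data[LINESIZE]; };
static struct line cache[NLINES];
static uint8_t memory[1 << 20];                  /* stand-in for main memory */

static void write_through(uint32_t addr, uint8_t byte) {
    uint32_t idx = (addr / LINESIZE) % NLINES, tag = addr / (LINESIZE * NLINES);
    struct line *l = &cache[idx];
    if (l->valid && l->tag == tag)               /* update the cache only on a hit */
        l->data[addr % LINESIZE] = byte;
    memory[addr] = byte;                         /* always update memory (no-write allocate) */
}

static void write_back(uint32_t addr, uint8_t byte) {
    uint32_t idx = (addr / LINESIZE) % NLINES, tag = addr / (LINESIZE * NLINES);
    uint32_t base = addr - addr % LINESIZE;
    struct line *l = &cache[idx];
    if (!(l->valid && l->tag == tag)) {          /* write miss: allocate the block */
        if (l->valid && l->dirty)                /* write the dirty victim back first */
            memcpy(&memory[l->tag * (LINESIZE * NLINES) + idx * LINESIZE],
                   l->data, LINESIZE);
        memcpy(l->data, &memory[base], LINESIZE);
        l->valid = 1; l->tag = tag; l->dirty = 0;
    }
    l->data[addr % LINESIZE] = byte;             /* write only to the cache... */
    l->dirty = 1;                                /* ...and remember it via the dirty bit */
}

int main(void) {
    memory[100] = 1;
    write_back(100, 7);                          /* allocates the block, sets dirty */
    write_through(100, 9);                       /* hits the line, updates memory too */
    printf("cache sees %d, memory sees %d\n",
           cache[(100 / LINESIZE) % NLINES].data[100 % LINESIZE], memory[100]);
    return 0;
}
```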

15 Cache Performance Measures
Hit rate: fraction of accesses found at that level
–Usually so high that we talk about the miss rate instead
–Miss rate can mislead, just as MIPS can mislead about CPU performance; average memory access time is the better measure
Average memory access time = Hit time + Miss rate × Miss penalty (ns)
Miss penalty: time to replace a block from the lower level, including the time to deliver it to the CPU
–Access time to the lower level = f(latency of the lower level)
–Transfer time: time to transfer the block = f(bandwidth)
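The formula is a one-liner in code; a tiny C example with made-up numbers (1-cycle hit, 5% miss rate, 50-cycle penalty):

```c
#include <stdio.h>

/* average memory access time, straight from the slide's formula */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    printf("AMAT = %.2f cycles\n", amat(1.0, 0.05, 50.0));  /* 1 + 0.05*50 = 3.50 */
    return 0;
}
```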

16 Cache Performance Improvements
Average memory access time = Hit time + Miss rate × Miss penalty
Cache optimizations
–Reducing the miss rate
–Reducing the miss penalty
–Reducing the hit time

17 Reducing Cache Misses
Causes of misses: the 3 Cs
–Compulsory (cold-start or first-reference) misses: the very first access to a block cannot hit, so the block must first be brought into the cache
–Capacity misses: if the cache cannot contain all the blocks needed during execution of a program, blocks will be discarded and later retrieved
–Conflict (collision or interference) misses: in a set-associative or direct-mapped cache, a block can be discarded and later retrieved if too many blocks map to its set
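One operational way to read the 3 Cs: replay a trace against the real cache, against the set of blocks ever touched (compulsory), and against a fully associative LRU cache of the same capacity (capacity vs. conflict). A toy C sketch under those assumptions, with an invented trace in which several blocks collide in one direct-mapped line:

```c
#include <stdio.h>

#define NBLOCKS   4                      /* toy cache: 4 one-block lines */
#define TRACE_LEN 8

/* fully associative LRU cache of the same capacity, used as the reference */
static long fa[NBLOCKS]; static int fa_age[NBLOCKS]; static int fa_used = 0;

static int fa_access(long blk, int now) {        /* returns 1 on hit */
    int victim = 0;
    for (int i = 0; i < fa_used; i++)
        if (fa[i] == blk) { fa_age[i] = now; return 1; }
    if (fa_used < NBLOCKS) victim = fa_used++;   /* fill an empty slot */
    else for (int i = 1; i < NBLOCKS; i++)       /* otherwise evict the LRU entry */
        if (fa_age[i] < fa_age[victim]) victim = i;
    fa[victim] = blk; fa_age[victim] = now;
    return 0;
}

int main(void) {
    long trace[TRACE_LEN] = {0, 4, 0, 8, 0, 4, 12, 0};   /* blocks 0,4,8,12 all map to line 0 */
    long dm[NBLOCKS]; int dm_valid[NBLOCKS] = {0};
    long seen[TRACE_LEN]; int nseen = 0;

    for (int t = 0; t < TRACE_LEN; t++) {
        long blk = trace[t];
        int idx = blk % NBLOCKS;                         /* direct-mapped placement */
        int dm_hit = dm_valid[idx] && dm[idx] == blk;
        int fa_hit = fa_access(blk, t);
        int first = 1;
        for (int i = 0; i < nseen; i++) if (seen[i] == blk) first = 0;
        if (first) seen[nseen++] = blk;

        if (dm_hit)       printf("block %2ld: hit\n", blk);
        else if (first)   printf("block %2ld: compulsory miss\n", blk);
        else if (!fa_hit) printf("block %2ld: capacity miss\n", blk);
        else              printf("block %2ld: conflict miss\n", blk);
        dm[idx] = blk; dm_valid[idx] = 1;
    }
    return 0;
}
```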

18 Example
Which has the lower average memory access time:
–a 16-KB instruction cache with a 16-KB data cache, or
–a 32-KB unified cache?
Hit time = 1 cycle
Miss penalty = 50 cycles
Load/store hit = 2 cycles on the unified cache (the single port makes the data access wait behind the instruction fetch)
Given: 75% of memory accesses are instruction references; miss rates are 0.64% (16-KB instruction), 6.47% (16-KB data), and 1.99% (32-KB unified)
Overall miss rate for the split caches = 0.75 × 0.64% + 0.25 × 6.47% = 2.10%
Miss rate for the unified cache = 1.99%
Average memory access times:
Split = 0.75 × (1 + 0.0064 × 50) + 0.25 × (1 + 0.0647 × 50) = 2.05
Unified = 0.75 × (1 + 0.0199 × 50) + 0.25 × (2 + 0.0199 × 50) = 2.24
The split caches win despite their higher overall miss rate, because the unified cache pays an extra cycle on every load/store hit.
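The same arithmetic in C, for checking the numbers:

```c
#include <stdio.h>

int main(void) {
    double penalty = 50.0, f_instr = 0.75, f_data = 0.25;

    double split = f_instr * (1 + 0.0064 * penalty)     /* 16-KB I-cache, 0.64% misses */
                 + f_data  * (1 + 0.0647 * penalty);    /* 16-KB D-cache, 6.47% misses */

    double unified = f_instr * (1 + 0.0199 * penalty)   /* 32-KB unified, 1.99% misses */
                   + f_data  * (2 + 0.0199 * penalty);  /* 2-cycle load/store hit */

    printf("split   AMAT = %.2f cycles\n", split);      /* ~2.05 */
    printf("unified AMAT = %.2f cycles\n", unified);    /* ~2.24 */
    return 0;
}
```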

19 Cache Performance Equations
CPU time = (CPU execution cycles + Memory stall cycles) × Cycle time
Memory stall cycles = Memory accesses × Miss rate × Miss penalty
CPU time = IC × (CPI_execution + Memory accesses per instruction × Miss rate × Miss penalty) × Cycle time
Misses per instruction = Memory accesses per instruction × Miss rate
CPU time = IC × (CPI_execution + Misses per instruction × Miss penalty) × Cycle time
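Chaining the equations in C, with illustrative inputs not taken from the slides (10^9 instructions, base CPI of 1.0, 1.5 memory accesses per instruction, 2% miss rate, 50-cycle penalty, 1 ns cycle time):

```c
#include <stdio.h>

int main(void) {
    double IC = 1e9, CPI_exec = 1.0, accesses_per_instr = 1.5;
    double miss_rate = 0.02, miss_penalty = 50.0, cycle_time = 1e-9;

    /* Misses per instruction = accesses per instruction x miss rate */
    double misses_per_instr = accesses_per_instr * miss_rate;

    /* CPU time = IC x (CPI_execution + misses/instr x miss penalty) x cycle time */
    double cpu_time = IC * (CPI_exec + misses_per_instr * miss_penalty) * cycle_time;

    printf("misses/instr = %.3f\n", misses_per_instr);  /* 0.030 */
    printf("CPU time     = %.3f s\n", cpu_time);        /* 1e9 * (1 + 1.5) * 1e-9 = 2.5 s */
    return 0;
}
```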