Cache Memories September 30, 2008 Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance.

Slides:



Advertisements
Similar presentations
CS 105 Tour of the Black Holes of Computing
Advertisements

Cache Performance 1 Computer Organization II © CS:APP & McQuain Cache Memory and Performance Many of the following slides are taken with.
Practical Caches COMP25212 cache 3. Learning Objectives To understand: –Additional Control Bits in Cache Lines –Cache Line Size Tradeoffs –Separate I&D.
University of Amsterdam Computer Systems – cache characteristics Arnoud Visser 1 Computer Systems Cache characteristics.
Cs 325 virtualmemory.1 Accessing Caches in Virtual Memory Environment.
Today Memory hierarchy, caches, locality Cache organization
Spring 2003CSE P5481 Introduction Why memory subsystem design is important CPU speeds increase 55% per year DRAM speeds increase 3% per year rate of increase.
1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
1 Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon, Oct 31, 2005 Topic: Memory Hierarchy Design (HP3 Ch. 5) (Caches, Main Memory and.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Nov. 3, 2003 Topic: Memory Hierarchy Design (HP3 Ch. 5) (Caches, Main Memory and.
The Memory Hierarchy II CPSC 321 Andreas Klappenecker.
Computer ArchitectureFall 2007 © November 7th, 2007 Majd F. Sakr CS-447– Computer Architecture.
Cache Memories May 5, 2008 Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance EECS213.
Cache Memories February 24, 2004 Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance class13.ppt.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
CPSC 312 Cache Memories Slides Source: Bryant Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on.
Virtual Memory Topics Virtual Memory Access Page Table, TLB Programming for locality Memory Mountain Revisited.
Caches Oct. 22, 1998 Topics Memory Hierarchy
1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy (Part II)
Cache Memories Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance CS213.
Memory Hierarchy and Cache Design The following sources are used for preparing these slides: Lecture 14 from the course Computer architecture ECE 201 by.
Systems I Locality and Caching
ECE Dept., University of Toronto
Lecture 19: Virtual Memory
Lecture 10 Memory Hierarchy and Cache Design Computer Architecture COE 501.
University of Washington Memory and Caches I The Hardware/Software Interface CSE351 Winter 2013.
10/18: Lecture topics Memory Hierarchy –Why it works: Locality –Levels in the hierarchy Cache access –Mapping strategies Cache performance Replacement.
1 Cache Memories Andrew Case Slides adapted from Jinyang Li, Randy Bryant and Dave O’Hallaron.
CSIE30300 Computer Architecture Unit 08: Cache Hsin-Chou Chi [Adapted from material by and
3-May-2006cse cache © DW Johnson and University of Washington1 Cache Memory CSE 410, Spring 2006 Computer Systems
– 1 – , F’02 Caching in a Memory Hierarchy Larger, slower, cheaper storage device at level k+1 is partitioned into blocks.
1 Virtual Memory Main memory can act as a cache for the secondary storage (disk) Advantages: –illusion of having more physical memory –program relocation.
The Goal: illusion of large, fast, cheap memory Fact: Large memories are slow, fast memories are small How do we create a memory that is large, cheap and.
1 Seoul National University Cache Memories. 2 Seoul National University Cache Memories Cache memory organization and operation Performance impact of caches.
Memory hierarchy. – 2 – Memory Operating system and CPU memory management unit gives each process the “illusion” of a uniform, dedicated memory space.
1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
CS.305 Computer Architecture Memory: Caches Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
1 Memory Hierarchy ( Ⅲ ). 2 Outline The memory hierarchy Cache memories Suggested Reading: 6.3, 6.4.
1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.
Cache Memories February 28, 2002 Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Reading:
Systems I Cache Organization
Cache Memories Topics Generic cache-memory organization Direct-mapped caches Set-associative caches Impact of caches on performance CS 105 Tour of the.
1 Cache Memory. 2 Outline General concepts 3 ways to organize cache memory Issues with writes Write cache friendly codes Cache mountain Suggested Reading:
Carnegie Mellon Introduction to Computer Systems /18-243, spring th Lecture, Feb. 19 th Instructors: Gregory Kesden and Markus Püschel.
1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.
Cache Memories Topics Generic cache-memory organization Direct-mapped caches Set-associative caches Impact of caches on performance CS 105 Tour of the.
University of Washington Today Midterm topics end here. HW 2 is due Wednesday: great midterm review. Lab 3 is posted. Due next Wednesday (July 31)  Time.
Cache Memories Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance cache.ppt CS 105 Tour.
COMP SYSTEM ARCHITECTURE PRACTICAL CACHES Sergio Davies Feb/Mar 2014COMP25212 – Lecture 3.
Memory Hierarchy Computer Organization and Assembly Languages Yung-Yu Chuang 2007/01/08 with slides by CMU
Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Cache Organization 1 Computer Organization II © CS:APP & McQuain Cache Memory and Performance Many of the following slides are taken with.
1 Chapter Seven. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value.
Cache Memories February 26, 2008 Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance The.
Vassar College 1 Jason Waterman, CMPU 224: Computer Organization, Fall 2015 Cache Memories CMPU 224: Computer Organization Nov 19 th Fall 2015.
Memory Management memory hierarchy programs exhibit locality of reference - non-uniform reference patterns temporal locality - a program that references.
CSE 351 Section 9 3/1/12.
Cache Memory and Performance
ECE232: Hardware Organization and Design
Memory COMPUTER ARCHITECTURE
Cache Memories CSE 238/2038/2138: Systems Programming
The Hardware/Software Interface CSE351 Winter 2013
Today How’s Lab 3 going? HW 3 will be out today
Memory hierarchy.
ReCap Random-Access Memory (RAM) Nonvolatile Memory
Cache Memories September 30, 2008
Cache Memories Topics Cache memory organization Direct mapped caches
Overview Problem Solution CPU vs Memory performance imbalance
Presentation transcript:

Cache Memories September 30, 2008 Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance lecture-10.ppt “The course that gives CMU its Zip!”

, S’08 Announcements Exam grading done Everyone should have gotten with score (out of 72) mean was 50, high was 70 solution sample should be up on website soon Getting your exam back some got them in recitation working on plan for everyone else (worst case = recitation on Monday) If you think we made a mistake in grading please read the syllabus for details about the process for handling it

, S’08 General cache mechanics Larger, slower, cheaper memory is partitioned into “blocks” Data is copied between levels in block-sized transfer units Smaller, faster, more expensive memory caches a subset of the blocks Cache: Memory: From lecture-9.ppt

, S’08 Cache Performance Metrics Miss Rate Fraction of memory references not found in cache (misses / accesses) 1 – hit rate Typical numbers (in percentages): 3-10% for L1 can be quite small (e.g., < 1%) for L2, depending on size, etc. Hit Time Time to deliver a line in the cache to the processor includes time to determine whether the line is in the cache Typical numbers: 1-2 clock cycle for L clock cycles for L2 Miss Penalty Additional time required because of a miss typically cycles for main memory (Trend: increasing!)

, S’08 Lets think about those numbers Huge difference between a hit and a miss 100X, if just L1 and main memory Would you believe 99% hits is twice as good as 97%? Consider these numbers: cache hit time of 1 cycle miss penalty of 100 cycles So, average access time is: 97% hits: 1 cycle * 100 cycles = 4 cycles 99% hits: 1 cycle * 100 cycles = 2 cycles This is why “miss rate” is used instead of “hit rate”

, S’08 Many types of caches Examples Hardware: L1 and L2 CPU caches, TLBs, … Software: virtual memory, FS buffers, web browser caches, … Many common design issues each cached item has a “tag” (an ID) plus contents need a mechanism to efficiently determine whether given item is cached combinations of indices and constraints on valid locations on a miss, usually need to pick something to replace with the new item called a “replacement policy” on writes, need to either propagate change or mark item as “dirty” write-through vs. write-back Different solutions for different caches Lets talk about CPU caches as a concrete example…

, S’08 Hardware cache memories Cache memories are small, fast SRAM-based memories managed automatically in hardware Hold frequently accessed blocks of main memory CPU looks first for data in L1, then in main memory Typical system structure: main memory bus interface ALU register file CPU chip bus L1 cache

, S’08 Inserting an L1 Cache Between the CPU and Main Memory a b c d block 10 p q r s block w x y z block The big slow main memory has room for many 4-word blocks The tiny, very fast CPU register file has room for four 4-byte words The transfer unit between the cache and main memory is a 4-word block (16 bytes) The transfer unit between the CPU register file and the cache is a 4-byte word The small fast L1 cache has room for two 4-word blocks line 0 line 1

, S’08 Inserting an L1 Cache Between the CPU and Main Memory The tiny, very fast CPU register file has room for four 4-byte words The transfer unit between the cache and main memory is a 4-word block (16 bytes) The transfer unit between the CPU register file and the cache is a 4-byte word The small fast L1 cache has room for two 4-word blocks line 0 line 1 w w block 10 w w block w w block The big slow main memory has room for many 4-word blocks

, S’08 General Organization of a Cache B = 2 b bytes per cache block E lines per set S = 2 s sets t tag bits per line Cache size: C = B x E x S data bytes B–110 B–110 valid tag set 0: B–110 B–110 valid tag set 1: B–110 B–110 valid tag set S-1: Cache is an array of sets Each set contains one or more lines Each line holds a block of data 1 valid bit per line

, S’08 General Organization of a Cache B = 2 b bytes per cache block E lines per set S = 2 s sets t tag bits per line Cache size: C = B x E x S data bytes B–110 B–110 valid tag set 0: B–110 B–110 valid tag set 1: B–110 B–110 valid tag set S-1: Cache is an array of sets Each set contains one or more lines Each line holds a block of data 1 valid bit per line

, S’08 Addressing Caches t bitss bits b bits 0m-1 Address A: B–110 B–110 v v tag set 0: B–110 B–110 v v tag set 1: B–110 B–110 v v tag set S-1: The word at address A is in the cache if the tag bits in one of the lines in set match The word contents begin at offset bytes from the beginning of the block

, S’08 Addressing Caches t bitss bits b bits 0m-1 Address A: B–110 B–110 v v tag set 0: B–110 B–110 v v tag set 1: B–110 B–110 v v tag set S-1: 1.Locate the set based on 2.Locate the line in the set based on 3.Check that the line is valid 4.Locate the data in the line based on

, S’08 Example: Direct-Mapped Cache Simplest kind of cache, easy to build (only 1 tag compare required per access) Characterized by exactly one line per set. valid tag set 0: set 1: set S-1: E=1 lines per set cache block Cache size: C = B x S data bytes

, S’08 Accessing Direct-Mapped Caches Set selection Use the set index bits to determine the set of interest. t bitss bits m-1 b bits tagset indexblock offset selected set valid tag set 0: set 1: set S-1: cache block

, S’08 Accessing Direct-Mapped Caches Line matching and word selection Line matching: Find a valid line in the selected set with a matching tag Word selection: Then extract the word t bitss bits 100i0110 0m-1 b bits tagset indexblock offset selected set (i): b3b3 b0b0 b1b1 b2b =1? (1) The valid bit must be set = ? (2)The tag bits in the cache line must match the tag bits in the address If (1) and (2), then cache hit

, S’08 Accessing Direct-Mapped Caches Line matching and word selection Line matching: Find a valid line in the selected set with a matching tag Word selection: Then extract the word t bitss bits 100i0110 0m-1 b bits tagset indexblock offset selected set (i): b3b3 b0b0 b1b1 b2b (3)If cache hit, block offset selects starting byte.

, S’08 Direct-Mapped Cache Simulation M=16 byte addresses, B=2 bytes/block, S=4 sets, E=1 entry/set Address trace (reads): 0[ ], 1[ ], 7[ ], 8[ ], 0[ ] x t=1s=2b=1 xxx 0?? vtagdata miss 10M[0-1] hit miss 10M[6-7] miss 11M[8-9] miss 10M[0-1]

, S’08 Example: Set Associative Cache Characterized by more than one line per set E=2 lines per set validtag set 0: set 1: set S-1: cache block validtagcache block validtagcache block validtagcache block validtagcache block validtagcache block E-way associative cache

, S’08 Accessing Set Associative Caches Set selection identical to direct-mapped cache valid tag set 0: valid tag set 1: valid tag set S-1: cache block t bitss bits m-1 b bits tagset indexblock offset selected set

, S’08 Accessing Set Associative Caches Line matching and word selection must compare the tag in each valid line in the selected set b3b3 b0b0 b1b1 b2b selected set (i): t bitss bits 100i0110 0m-1 b bits tagset indexblock offset =1? (1) The valid bit must be set = ? (2)The tag bits in one of the cache lines must match the tag bits in the address If (1) and (2), then cache hit

, S’08 Accessing Set Associative Caches Line matching and word selection Word selection is the same as in a direct mapped cache b3b3 b0b0 b1b1 b2b selected set (i): t bitss bits 100i0110 0m-1 b bits tagset indexblock offset (3)If cache hit, block offset selects starting byte.

, S’08 2-Way Associative Cache Simulation M=16 byte addresses, B=2 bytes/block, S=2 sets, E=2 entry/set Address trace (reads): 0[ ], 1[ ], 7[ ], 8[ ], 0[ ] xx t=2s=1b=1 xx 0?? vtagdata miss 100M[0-1] hit miss 101M[6-7] miss 110M[8-9] hit

, S’08 Notice that middle bits used as index t bitss bits m-1 b bits tagset indexblock offset

, S’08 Why Use Middle Bits as Index? High-Order Bit Indexing Adjacent memory lines would map to same cache entry Poor use of spatial locality 4-line Cache High-Order Bit Indexing Middle-Order Bit Indexing Middle-Order Bit Indexing Consecutive memory lines map to different cache lines Can hold S*B*E-byte region of address space in cache at one time

, S’08 Why Use Middle Bits as Index? High-Order Bit Indexing Adjacent memory lines would map to same cache entry Poor use of spatial locality 4-line Cache High-Order Bit Indexing Middle-Order Bit Indexing Middle-Order Bit Indexing Consecutive memory lines map to different cache lines Can hold S*B*E-byte region of address space in cache at one time

, S’08 Sidebar: Multi-Level Caches Options: separate data and instruction caches, or a unified cache size: speed: $/Mbyte: line size: 200 B 3 ns 8 B 8-64 KB 3 ns 32 B 128 MB DRAM 60 ns $1.50/MB 8 KB 30 GB 8 ms $0.05/MB larger, slower, cheaper Memory Regs Unified L2 Cache Unified L2 Cache Processor 1-4MB SRAM 6 ns $100/MB 32 B disk L1 d-cache L1 i-cache

, S’08 What about writes? Multiple copies of data exist: L1 L2 Main Memory Disk What to do when we write? Write-through Write-back need a dirty bit What to do on a write-miss? What to do on a replacement? Depends on whether it is write through or write back

, S’08 Software caches are more flexible Examples File system buffer caches, web browser caches, etc. Some design differences Almost always fully associative so, no placement restrictions index structures like hash tables are common Often use complex replacement policies misses are very expensive when disk or network involved worth thousands of cycles to avoid them Not necessarily constrained to single “block” transfers may fetch or write-back in larger units, opportunistically

, S’08 Locality Example #1 Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer Question: Does this function have good locality? int sum_array_rows(int a[M][N]) { int i, j, sum = 0; for (i = 0; i < M; i++) for (j = 0; j < N; j++) sum += a[i][j]; return sum; }

, S’08 Locality Example #2 Question: Does this function have good locality? int sum_array_cols(int a[M][N]) { int i, j, sum = 0; for (j = 0; j < N; j++) for (i = 0; i < M; i++) sum += a[i][j]; return sum; }

, S’08 Locality Example #3 Question: Can you permute the loops so that the function scans the 3-d array a[] with a stride-1 reference pattern (and thus has good spatial locality)? int sum_array_3d(int a[M][N][N]) { int i, j, k, sum = 0; for (i = 0; i < M; i++) for (j = 0; j < N; j++) for (k = 0; k < N; k++) sum += a[k][i][j]; return sum; }