
1 Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science
Cornell University See P&H 5.1, 5.2 (except writes)

2 What will you do over Spring Break?
Relax / Head home / Head to a warm destination / Stay in (frigid) Ithaca / Study

3 Big Picture: Memory
[Five-stage pipeline diagram: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back, with register file, ALU, forwarding unit, and hazard detection. Code, data, and stack are all stored in memory.]

4 Big Picture: Memory
Memory: big & slow vs Caches: small & fast
[Same pipeline diagram: the Instruction Fetch and Memory stages both access memory, where code, data, and stack live.]

5 Big Picture: Memory
Memory: big & slow vs Caches: small & fast
[Same pipeline diagram, now with caches ($$) inserted between the Instruction Fetch / Memory stages and memory.]

6 Goals for Today: caches
Caches vs memory vs tertiary storage
Tradeoffs: big & slow vs small & fast
Working set: 90/10 rule
How to predict the future: temporal & spatial locality
Examples of caches: Direct Mapped, Fully Associative, N-way Set Associative
Caching questions:
How does a cache work?
How effective is the cache (hit rate / miss rate)?
How large is the cache?
How fast is the cache (AMAT = average memory access time)?
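The last question is usually answered with a simple formula. As a quick illustration (a sketch in C, not lecture code, and the example numbers are made up):

/* AMAT = hit time + miss rate x miss penalty (times in cycles). */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
/* e.g. amat(1.0, 0.05, 100.0) == 6.0: a 1-cycle hit, with 5% of accesses
   paying a 100-cycle miss penalty, averages 6 cycles per access. */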

7 Big Picture How do we make the processor fast, given that memory is VEEERRRYYYY SLLOOOWWW!!

8 Performance
CPU clock rates ~0.2ns - 2ns (5GHz - 500MHz)

Technology      Capacity   $/GB   Latency
Tape            1 TB       $      10s of seconds
Disk            4 TB       $.03   millions of cycles (ms)
SSD (Flash)     512 GB     $2     thousands of cycles (us)
DRAM            8 GB       $      10s of ns
SRAM off-chip   32 MB      $      few ns
SRAM on-chip    256 KB     ???    1-3 cycles (ns)
Others: eDRAM (aka 1T SRAM), FeRAM, CD, DVD, ...

Q: Can we create the illusion of cheap + large + fast?
Capacity and latency are closely coupled; cost is inversely proportional.
eDRAM is embedded DRAM that is part of the same ASIC and die instead of a separate module. 1T-SRAM stands for 1-transistor SRAM, an instance of eDRAM where the refresh controller is integrated so the memory can externally be treated as SRAM.

9 Memory Pyramid
RegFile: 100s of bytes, < 1 cycle access
L1 Cache (several KB): 1-3 cycle access
L2 Cache (½-32 MB): 5-15 cycle access (L3 becoming more common, often eDRAM)
Memory (128 MB - few GB): 10s of ns access
Disk (many GB - few TB): millions of cycles access
L1, L2 and even L3 are all on-chip these days, using SRAM and eDRAM.
Caches are usually made of SRAM (or eDRAM).
These are rough numbers: mileage may vary for latest/greatest.

10 Memory Hierarchy
Memory closer to processor: small & fast, stores active data
Memory farther from processor: big & slow, stores inactive data
L1 Cache (SRAM-on-chip) → L2/L3 Cache (SRAM) → Memory (DRAM)

11 Memory Hierarchy
Memory closer to processor: small & fast, stores active data
Memory farther from processor: big & slow, stores inactive data
Example: LW $R3, Mem brings a value from memory into register $R3
Registers: the 1% of data accessed the most
L1 Cache (SRAM-on-chip) and L2/L3 Cache (SRAM): the 9% of data that is "active"
Memory (DRAM): the 90% of data that is inactive (not accessed)

12 Memory Hierarchy: Insight for Caches
If Mem[x] was accessed recently...
... then Mem[x] is likely to be accessed again soon.
Exploit temporal locality: put recently accessed Mem[x] higher in the memory hierarchy, since it will likely be accessed again soon.
... then Mem[x ± ε] is likely to be accessed soon.
Exploit spatial locality: put the entire block containing Mem[x] and surrounding addresses higher in the memory hierarchy, since nearby addresses will likely be accessed soon.
(Q: What data should we cache? A: Data that will be used soon; "soon" is relative to the frequency of loads.)
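To make the two kinds of locality concrete, here is a small C example (illustrative only, not from the lecture): the loops below reuse the same few variables every iteration (temporal locality) and walk the array in address order, so successive accesses land in the same cache block (spatial locality).

#include <stdio.h>

int main(void) {
    int a[1024];
    long sum = 0;
    for (int i = 0; i < 1024; i++)
        a[i] = i;                 /* sequential writes: neighbors share a cache block  */
    for (int i = 0; i < 1024; i++)
        sum += a[i];              /* 'sum' and 'i' are reused every iteration (temporal); */
                                  /* a[i] and a[i+1] sit in the same block (spatial)      */
    printf("%ld\n", sum);
    return 0;
}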

13 Memory Hierarchy
Memory closer to processor is fast but small
usually stores a subset of memory farther away ("strictly inclusive")
Transfer whole blocks (cache lines):
4 KB: disk ↔ RAM
256 B: RAM ↔ L2
64 B: L2 ↔ L1

14 Memory Hierarchy: Memory trace
Memory trace (addresses touched while the program below runs):
0x7c9a2b18, 0x7c9a2b19, 0x7c9a2b1a, 0x7c9a2b1b, 0x7c9a2b1c, 0x7c9a2b1d, 0x7c9a2b1e, 0x7c9a2b1f, 0x7c9a2b20, 0x7c9a2b21, 0x7c9a2b22, 0x7c9a2b23, 0x7c9a2b28, 0x7c9a2b2c, ..., 0x7c9a2b04, ..., 0x7c9a2b00, ...

int n = 4;
int k[] = { 3, 14, 0, 10 };

int fib(int i) {
  if (i <= 2) return i;
  else return fib(i-1) + fib(i-2);
}

int main(int ac, char **av) {
  for (int i = 0; i < n; i++) {
    printi(fib(k[i]));
    prints("\n");
  }
}

Temporal locality: the same code and stack addresses recur as fib calls itself.
Spatial locality: consecutive addresses (the array k[], adjacent stack slots) are touched together.

15 Cache Lookups (Read)
Processor tries to access Mem[x]
Check: is the block containing Mem[x] in the cache?
Yes: cache hit. Return the requested data from the cache line.
No: cache miss. Read the block from memory (or a lower-level cache), evict an existing cache line to make room if necessary, place the new block in the cache, and return the requested data. The pipeline stalls while all of this happens.

16 Three common designs
A given data block can be placed...
... in exactly one cache line: Direct Mapped
... in any cache line: Fully Associative
... in a small set of cache lines: Set Associative

17 Direct Mapped Cache
Each block number is mapped to a single cache line index. Simplest hardware.
[Diagram: memory addresses 0x000000-0x000048 mapping into a cache with 2 cache lines of 4 words (16 bytes) each.]

18 Direct Mapped Cache
Each block number is mapped to a single cache line index. Simplest hardware.
32-bit address split: tag (27 bits) | index (1 bit) | offset (4 bits), for 2 cache lines of 4 words each.
[Diagram: addresses 0x000000-0x00000c fill cache line 0.]

19 Direct Mapped Cache
(Same breakdown as the previous slide: tag 27 bits, index 1 bit, offset 4 bits; 2 cache lines, 4 words per cache line.)

20 Direct Mapped Cache
Each block number is mapped to a single cache line index. Simplest hardware.
With 4 cache lines of 2 words each, the 32-bit address splits as: tag (27 bits) | index (2 bits) | offset (3 bits).
[Diagram: memory addresses mapping into 4 cache lines.]

21 Direct Mapped Cache

22 Direct Mapped Cache (Reading)
Tag Index Offset V Tag Block word selector = = Byte offset in word word select 0…001000 tag offset word select data index hit? hit? data 32bits
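A toy C model of this read path, using the 4-line, 8-bytes-per-line split from the earlier slide (tag 27 | index 2 | offset 3). Everything here (the names, the fake 64 KB memory array) is an illustrative sketch, not code from the course:

#include <stdint.h>
#include <string.h>

#define OFFSET_BITS 3                      /* 8-byte blocks (2 words)                 */
#define INDEX_BITS  2                      /* 4 cache lines                           */
#define LINES       (1 << INDEX_BITS)
#define BLOCK_BYTES (1 << OFFSET_BITS)

struct line { int valid; uint32_t tag; uint8_t data[BLOCK_BYTES]; };
static struct line cache[LINES];
static uint8_t memory[1 << 16];            /* pretend main memory; addr must fit here */
static int misses, hits;

uint8_t cache_read_byte(uint32_t addr) {
    uint32_t offset = addr & (BLOCK_BYTES - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & (LINES - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    struct line *l  = &cache[index];       /* the index selects exactly one line      */
    if (l->valid && l->tag == tag) {       /* stored tag matches address tag: hit     */
        hits++;
    } else {                               /* miss: evict whatever was here and       */
        misses++;                          /* fetch the whole block from memory       */
        l->valid = 1;
        l->tag   = tag;
        memcpy(l->data, &memory[addr & ~(uint32_t)(BLOCK_BYTES - 1)], BLOCK_BYTES);
    }
    return l->data[offset];                /* the offset selects the requested byte   */
}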

23 Direct Mapped Cache (Reading)
Tag Index Offset V Tag n bit index, m bit offset Q: How big is cache (data only)? = word select data hit?

24 Direct Mapped Cache (Reading)
Tag | Index | Offset; 2^m bytes per block, 2^n blocks; n-bit index, m-bit offset
Q: How big is the cache (data only)?
A: A cache of 2^n blocks with a block size of 2^m bytes holds
Cache size = 2^m bytes per block x 2^n blocks = 2^(n+m) bytes
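For example (illustrative numbers, not from the slides): n = 10 index bits and m = 6 offset bits give 2^10 = 1024 blocks of 2^6 = 64 bytes each, i.e. 2^16 bytes = 64 KB of data.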

25 Direct Mapped Cache (Reading)
Tag | Index | Offset; n-bit index, m-bit offset
Q: How much SRAM is needed (data + overhead)?

26 Direct Mapped Cache (Reading)
Tag | Index | Offset; 2^m bytes per block, 2^n blocks; n-bit index, m-bit offset
Q: How much SRAM is needed (data + overhead)?
A: Cache of 2^n blocks, block size of 2^m bytes
Tag field: 32 - (n + m) bits; valid bit: 1 bit
SRAM size = 2^n x (block size + tag size + valid bit size)
          = 2^n x (2^m bytes x 8 bits-per-byte + (32 - n - m) + 1)
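Continuing the same illustrative numbers (n = 10, m = 6): each line stores 64 x 8 = 512 data bits plus a 32 - 10 - 6 = 16-bit tag and 1 valid bit, for 529 bits per line; 1024 lines is 541,696 bits, roughly 66 KB of SRAM to hold 64 KB of data.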

27 Example: A Simple Direct Mapped Cache
Using byte addresses in this example! Addr Bus = 5 bits
Cache: 4 cache lines, 2-word blocks (initially empty)
Memory: values 100, 110, 120, ..., 250 stored at addresses 0-15
Reference trace: LB $1 ← M[1], LB $2 ← M[5], LB $3 ← M[1], LB $3 ← M[4], LB $2 ← M[0]

28 Example: A Simple Direct Mapped Cache
Same setup, with the 5-bit address split as: 2-bit tag | 2-bit index | 1-bit block offset

29-44 Walking the trace through the direct mapped cache (one access per pair of slides):
1st access, LB $1 ← M[1] (Addr 00001: tag 00, index 0, offset 1): Miss. Line 0 ← {100, 110}; $1 = 110. Misses: 1, Hits: 0
2nd access, LB $2 ← M[5] (Addr 00101: tag 00, index 2, offset 1): Miss. Line 2 ← {140, 150}; $2 = 150. Misses: 2, Hits: 0
3rd access, LB $3 ← M[1] (Addr 00001): Hit in line 0; $3 = 110. Misses: 2, Hits: 1
4th access, LB $3 ← M[4] (Addr 00100): Hit in line 2; $3 = 140. Misses: 2, Hits: 2
5th access, LB $2 ← M[0] (Addr 00000): Hit in line 0; $2 = 100. Misses: 2, Hits: 3
6th access, LB $2 ← M[10] (Addr 01010: tag 01, index 1, offset 0): Miss. Line 1 ← {200, 210}; $2 = 200. Misses: 3, Hits: 3
7th access, LB $2 ← M[15] (Addr 01111: tag 01, index 3, offset 1): Miss. Line 3 ← {240, 250}; $2 = 250. Misses: 4, Hits: 3
8th access, LB $2 ← M[8] (Addr 01000: tag 01, index 0, offset 0): Miss; conflicts with line 0 (tag 00), which is evicted. Line 0 ← {180, 190}; $2 = 180. Misses: 5, Hits: 3

45 Misses
Three types of misses:
Cold (aka Compulsory): the line is being referenced for the first time
Capacity: the line was evicted because the cache was not large enough
Conflict: the line was evicted because of another access whose index conflicted

46 Misses
Q: How to avoid...
Cold misses: unavoidable? The data was never in the cache... Prefetching!
Capacity misses: buy more SRAM
Conflict misses: use a more flexible cache design
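As one concrete way to prefetch from software (illustrative and compiler-specific, not something covered on the slide), GCC and Clang provide a __builtin_prefetch intrinsic that hints the hardware to start fetching a cache line early:

/* Ask for a[i + 16] while still summing a[i]; if the hint arrives early
   enough, the later access finds the line already in the cache. */
long sum_with_prefetch(const int *a, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16]);
        sum += a[i];
    }
    return sum;
}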

47-54 Direct Mapped, a pathological example
Using byte addresses; Addr Bus = 5 bits. Same cache and same first five accesses as before (Misses: 2, Hits: 3), but the trace continues with addresses that all map to the same cache lines:
6th access, LB $2 ← M[12] (Addr 01100: tag 01, index 2): Miss; evicts {140, 150} from line 2. Line 2 ← {220, 230}; $2 = 220. Misses: 3, Hits: 3
7th access, LB $2 ← M[8] (Addr 01000: tag 01, index 0): Miss; evicts {100, 110} from line 0. Line 0 ← {180, 190}; $2 = 180. Misses: 4, Hits: 3
8th and 9th accesses, LB $2 ← M[4] and LB $2 ← M[0]: both Miss again, since lines 2 and 0 now hold the tag-01 blocks; they are refilled with {140, 150} and {100, 110}. Misses: 4+2, Hits: 3
10th and 11th accesses (back to the conflicting tag-01 addresses): both Miss again. Every access in this pattern keeps evicting the block the next access needs (thrashing).

55 Cache Organization
How to avoid conflict misses? Three common designs:
Fully associative: a block can be anywhere in the cache
Direct mapped: a block can only be in one line in the cache
Set-associative: a block can be in a few (2 to 8) places in the cache

56 Fully Associative Cache (Reading)
[Diagram: address split into Tag | Offset (no index). Every line ("way") compares its stored tag against the address tag in parallel (=); the matching line is selected, the offset picks the word out of the 64-byte line (32-bit output), and any match signals hit?.]

57 Fully Associative Cache (Reading)
Tag | Offset; m-bit offset; 2^n blocks (cache lines); "way" = line
Q: How big is the cache (data only)?

58 Fully Associative Cache (Reading)
Tag | Offset; m-bit offset; 2^n blocks (cache lines); "way" = line
Q: How big is the cache (data only)?
A: Cache of 2^n blocks, block size of 2^m bytes
Cache size = number of blocks x block size = 2^n x 2^m bytes = 2^(n+m) bytes

59 Fully Associative Cache (Reading)
Tag | Offset; m-bit offset; 2^n blocks (cache lines); "way" = line
Q: How much SRAM is needed (data + overhead)?

60 Fully Associative Cache (Reading)
Tag | Offset; m-bit offset; 2^n blocks (cache lines); "way" = line
Q: How much SRAM is needed (data + overhead)?
A: Cache of 2^n blocks, block size of 2^m bytes
Tag field: 32 - m bits; valid bit: 1 bit
SRAM size = 2^n x (block size + tag size + valid bit size)
          = 2^n x (2^m bytes x 8 bits-per-byte + (32 - m) + 1)
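With the same illustrative sizes used for the direct mapped case (2^10 blocks of 2^6 bytes), each fully associative line needs a 32 - 6 = 26-bit tag instead of a 16-bit one, and every line needs its own tag comparator, which is where the extra SRAM and logic overhead comes from.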

61 Example: Simple Fully Associative Cache
Using byte addresses in this example! Addr Bus = 5 bits
Cache: 4 cache lines, 2-word blocks; 4-bit tag field, 1-bit block offset (no index: a block can go in any line)
Memory: values 100, 110, 120, ..., 250 stored at addresses 0-15
Reference trace: LB $1 ← M[1], LB $2 ← M[5], LB $3 ← M[1], LB $3 ← M[4], LB $2 ← M[0]

62 1st Access
(cache empty, about to execute LB $1 ← M[1])

63-79 Walking the trace through the fully associative cache (LRU replacement):
1st access, LB $1 ← M[1] (Addr 00001: tag 0000, offset 1): Miss. Way 0 ← {100, 110}; $1 = 110. Misses: 1, Hits: 0
2nd access, LB $2 ← M[5] (Addr 00101: tag 0010, offset 1): Miss. Way 1 ← {140, 150}; $2 = 150. Misses: 2, Hits: 0
3rd access, LB $3 ← M[1] (Addr 00001): Hit; $3 = 110. Misses: 2, Hits: 1
4th access, LB $3 ← M[4] (Addr 00100): Hit; $3 = 140. Misses: 2, Hits: 2
5th access, LB $2 ← M[0] (Addr 00000): Hit; $2 = 100. Misses: 2, Hits: 3
6th access, LB $2 ← M[12] (Addr 01100: tag 0110): Miss. Way 2 ← {220, 230}; $2 = 220. Misses: 3, Hits: 3
7th access, LB $2 ← M[8] (Addr 01000: tag 0100): Miss. Way 3 ← {180, 190}; $2 = 180. Misses: 4, Hits: 3
8th and 9th accesses, LB $2 ← M[4] and LB $2 ← M[0]: both Hit, since nothing had to be evicted to make room. Misses: 4
10th and 11th accesses (the addresses that thrashed the direct mapped cache): both Hit as well. Misses: 4

80 Eviction
Which cache line should be evicted from the cache to make room for a new line?
Direct mapped: no choice, must evict the line selected by the index
Associative caches:
random: select one of the lines at random
round-robin: similar to random
FIFO: replace the oldest line
LRU: replace the line that has not been used for the longest time
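A minimal sketch of how LRU bookkeeping might look for one set of an associative cache (counter-based; all names and sizes are illustrative, not from the lecture):

#include <stdint.h>

#define WAYS 4

/* One way of a set: valid bit, tag, and a "last used" stamp. The way
   with the smallest stamp is the least recently used. */
struct way { int valid; uint32_t tag; uint64_t last_used; };

/* Pick a victim within one set: prefer an empty way, else the LRU way. */
int choose_victim(const struct way set[WAYS]) {
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;                          /* free slot: no eviction needed */
        if (set[w].last_used < set[victim].last_used)
            victim = w;                        /* older stamp: better victim    */
    }
    return victim;
}

On every hit, the accessed way's last_used stamp would be bumped to a global access counter so that recently used lines survive.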

81 Cache Tradeoffs
                        Direct Mapped     Fully Associative
Tag size                Smaller (+)       Larger (-)
SRAM overhead           Less (+)          More (-)
Controller logic        Less (+)          More (-)
Speed                   Faster (+)        Slower (-)
Price                   Less (+)          More (-)
Scalability             Very (+)          Not very (-)
# of conflict misses    Lots (-)          Zero (+)
Hit rate                Low (-)           High (+)
Pathological cases?     Common (-)        ? (+)

82 Compromise: Set-associative cache
Like a direct-mapped cache: index into a location, fast
Like a fully-associative cache: can store multiple entries per index, which decreases thrashing in the cache; search each element of the set
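A sketch of the lookup described above, for a small 2-way cache (sizes and names are illustrative): the index picks one set, exactly as in a direct-mapped cache, and then every way in that set is searched.

#include <stdint.h>

#define OFFSET_BITS 3                      /* 8-byte blocks */
#define INDEX_BITS  1                      /* 2 sets        */
#define SETS        (1 << INDEX_BITS)
#define WAYS        2

struct sline { int valid; uint32_t tag; };
static struct sline cache[SETS][WAYS];

/* Returns the way that hits, or -1 on a miss. */
int lookup(uint32_t addr) {
    uint32_t set = (addr >> OFFSET_BITS) & (SETS - 1);   /* fast: index into one set   */
    uint32_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
    for (int w = 0; w < WAYS; w++)                       /* search each way in the set */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return w;
    return -1;
}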

83 3-Way Set Associative Cache (Reading)
[Diagram: address split into Tag | Index | Offset. The index selects one set of three ways; all three stored tags are compared (=) in parallel to select a line (hit?); the offset selects the word out of the 64-byte line, producing 32 bits of data.]

84 Comparison: Direct Mapped
Using byte addresses in this example! Addr Bus = 5 bits
Cache: 4 cache lines, 2-word blocks; 2-bit tag, 2-bit index, 1-bit block offset
Memory: values 100, 110, ..., 250 at addresses 0-15
Reference trace: LB $1 ← M[1], LB $2 ← M[5], LB $3 ← M[1], LB $3 ← M[4], LB $2 ← M[0], LB $2 ← M[12], ..., LB $2 ← M[5]

85 Comparison: Direct Mapped
Result: Misses: 8

86 Comparison: Fully Associative
Using byte addresses in this example! Addr Bus = 5 bits
Cache: 4 cache lines, 2-word blocks; 4-bit tag, 1-bit block offset
Same reference trace as the direct mapped comparison

87 Comparison: Fully Associative
Result: Misses: 3

88 Comparison: 2-Way Set Associative
Using byte addresses in this example! Addr Bus = 5 bits
Cache: 2 sets, 2-word blocks; 3-bit tag, 1-bit set index, 1-bit block offset
Same reference trace as the direct mapped comparison

89 Comparison: 2-Way Set Associative
Result: Misses: 4

90 Remaining Issues
To do:
Evicting cache lines
Picking cache parameters
Writing using the cache

91 Summary
Caching assumptions:
small working set: 90/10 rule
can predict future: spatial & temporal locality
Benefits:
big & fast memory built from (big & slow) + (small & fast)
Tradeoffs: associativity, line size, hit cost, miss penalty, hit rate
Fully associative → higher hit cost, higher hit rate
Larger block size → lower hit cost, higher miss penalty
Next up: other designs; writing to caches

92 Administrivia
Upcoming agenda:
HW3 was due yesterday, Wednesday, March 13th
PA2 Work-in-Progress circuit due before spring break
Spring break: Saturday, March 16th to Sunday, March 24th
Prelim2 Thursday, March 28th, right after spring break
PA2 due Thursday, April 4th

93 Have a great Spring Break!!!

