
1 CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Cache II Steve Ko Computer Science and Engineering University at Buffalo

2 CSE 490/590, Spring 2011 Last time…
Dynamic RAM (DRAM) is the main form of main memory storage in use today
– Holds values on small capacitors that need refreshing (hence "dynamic")
– Slow multi-step access: precharge, read row, read column
Static RAM (SRAM) is faster but more expensive
– Used to build on-chip memory for caches
A cache holds a small set of values in fast memory (SRAM) close to the processor
– Need a search scheme to find values in the cache, and a replacement policy to make space for newly accessed locations
Caches exploit two forms of predictability in memory reference streams
– Temporal locality: the same location is likely to be accessed again soon
– Spatial locality: neighboring locations are likely to be accessed soon
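To make the two forms of locality concrete, here is a minimal C sketch (not from the slides; the array size is an arbitrary choice):

```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];  /* zero-initialized array laid out contiguously */
    int sum = 0;

    /* Spatial locality: a[i] and a[i+1] are adjacent in memory, so
       each cache block fetched brings in several upcoming elements. */
    for (int i = 0; i < N; i++)
        sum += a[i];

    /* Temporal locality: `sum` and `i` are reused on every iteration,
       so they stay resident in the cache (or in registers). */
    printf("%d\n", sum);
    return 0;
}
```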

3 CSE 490/590, Spring 2011 Some Basics (Again)
Block: the unit of access/storage in a cache
Word: the unit of access by the CPU
A block contains multiple words.
– Why multiple words?
On a cache miss:
– Memory access
– Cache block (re)placement
– Why keep it?
Five things to decide:
– After fetching a block from memory, where do we place it inside the cache?
– If the line is taken or the cache is already full, which block do we evict?
– How many words per block?
– How big should the cache be?
– What happens on a write?

4 CSE 490/590, Spring 2011 Placement Policy
[Figure: memory blocks 0-31 and an 8-block cache (viewed as 8 direct-mapped blocks, 4 sets of 2, or one fully associative pool), showing where memory block 12 can be placed:]
– Fully associative: anywhere in the cache
– (2-way) set associative: anywhere in set 0 (12 mod 4)
– Direct mapped: only into cache block 4 (12 mod 8)
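The placement rule in the figure is just a modulus. This hypothetical C sketch reproduces the figure's numbers (memory block 12, an 8-block cache) for each associativity:

```c
#include <stdio.h>

#define CACHE_BLOCKS 8   /* matches the slide's 8-block cache */

int main(void) {
    int block = 12;      /* memory block number from the figure */

    for (int ways = 1; ways <= CACHE_BLOCKS; ways *= 2) {
        int num_sets = CACHE_BLOCKS / ways;
        int set = block % num_sets;   /* set index = block mod #sets */
        printf("%d-way: block %d maps to set %d (of %d sets)\n",
               ways, block, set, num_sets);
    }
    return 0;
}
```

With ways = 1 this prints set 4 (direct mapped), with ways = 2 set 0, and with ways = 8 set 0, the single set of a fully associative cache: exactly the three cases in the figure.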

5 CSE 490/590, Spring 2011 Direct-Mapped Cache
[Figure: the address is split into Tag (t bits), Index (k bits), and Block Offset (b bits). The index selects one of 2^k lines; each line holds a valid bit (V), a tag, and a data block. HIT is asserted when the selected line is valid and its stored tag equals the address tag; the block offset then selects the data word or byte.]
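As a rough illustration of the address split in the figure, here is a small C sketch; the field widths (b = 5 for 32-byte blocks, k = 7 for 128 lines) are assumptions chosen for the example, not values from the slide:

```c
#include <stdint.h>
#include <stdio.h>

#define B_BITS 5   /* block offset: 2^5 = 32-byte blocks (assumed)  */
#define K_BITS 7   /* index: 2^7 = 128 lines (assumed)              */
                   /* tag: t = 32 - k - b = 20 bits                 */
int main(void) {
    uint32_t addr = 0x12345678;

    uint32_t offset = addr & ((1u << B_BITS) - 1);
    uint32_t index  = (addr >> B_BITS) & ((1u << K_BITS) - 1);
    uint32_t tag    = addr >> (B_BITS + K_BITS);

    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```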

6 CSE 490/590, Spring 2011 2-Way Set-Associative Cache
[Figure: the address is split into Tag (t bits), Index (k bits), and Block Offset (b bits). The index selects a set containing two (V, Tag, Data Block) entries; both stored tags are compared with the address tag in parallel, and HIT is asserted if either valid entry matches. The matching way supplies the data word or byte.]
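A minimal C sketch of the parallel tag check the figure describes; the structure names and the set count are illustrative assumptions, and the data block is omitted:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 128   /* assumed, as in the earlier sketch */

typedef struct {
    bool     valid;
    uint32_t tag;
    /* data block omitted for brevity */
} Line;

static Line cache[NUM_SETS][2];   /* [set][way] */

/* Returns the matching way (0 or 1), or -1 on a miss.  In hardware
   the two comparisons happen in parallel; the loop is sequential
   only because this is software. */
int lookup(uint32_t tag, uint32_t index) {
    for (int way = 0; way < 2; way++)
        if (cache[index][way].valid && cache[index][way].tag == tag)
            return way;           /* HIT: this way's tag matched */
    return -1;                    /* MISS */
}
```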

7 CSE 490/590, Spring 2011 Fully Associative Cache
[Figure: the address is split into Tag (t bits) and Block Offset (b bits) only; there is no index. The address tag is compared against the stored tag of every valid line in parallel, and HIT is asserted on any match; the block offset selects the data word or byte within the matching block.]
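Fully associative lookup is the previous sketch degenerated to a single set whose ways are the whole cache; a brief sketch, assuming a small 8-line cache for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 8   /* assumed size; every line is a "way" */

typedef struct { bool valid; uint32_t tag; } Line;
static Line cache[NUM_LINES];

/* Compare the address tag against every valid line, mirroring the
   one-comparator-per-line structure in the figure. */
int lookup(uint32_t tag) {
    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return i;   /* HIT */
    return -1;          /* MISS */
}
```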

8 CSE 490/590, Spring 2011 Replacement Policy
In an associative cache, which block from a set should be evicted when the set becomes full?
– Random
– Least Recently Used (LRU)
  – LRU cache state must be updated on every access
  – True implementation only feasible for small sets (2-way)
  – Pseudo-LRU binary tree often used for 4- to 8-way (see the sketch below)
– First In, First Out (FIFO), a.k.a. Round-Robin
  – Used in highly associative caches
– Not Most Recently Used (NMRU)
  – FIFO with an exception for the most recently used block or blocks
This is a second-order effect. Why? Replacement only happens on misses.
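The pseudo-LRU binary tree can be sketched in a few lines of C. This is a hypothetical illustration for one 4-way set; the bit-polarity convention is an assumption, since real designs vary:

```c
#include <stdint.h>
#include <stdio.h>

/* Tree pseudo-LRU for one 4-way set: 3 bits instead of full LRU state.
   Convention (assumed): bit = 0 means the victim lies in the left
   half of the subtree, bit = 1 means the right half. */
static uint8_t plru[3];  /* [0]=root, [1]=over ways {0,1}, [2]=over ways {2,3} */

void touch(int way) {    /* on every hit or fill: point bits away from `way` */
    plru[0] = (way < 2);               /* accessed left half -> victim on right */
    if (way < 2) plru[1] = (way == 0);
    else         plru[2] = (way == 2);
}

int victim(void) {       /* follow the bits down the tree to the victim way */
    if (!plru[0]) return plru[1] ? 1 : 0;
    else          return plru[2] ? 3 : 2;
}

int main(void) {
    touch(0); touch(2); touch(1);
    printf("evict way %d\n", victim());  /* way 3, never touched */
    return 0;
}
```

Here pseudo-LRU agrees with true LRU; the two diverge only on some access patterns, which is acceptable precisely because replacement is a second-order effect.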

9 CSE 490/590, Spring 2011 Block Size and Spatial Locality
[Figure: a block of four words (Word0-Word3). The CPU address is split into a block address (upper 32-b bits, the tag in this sketch) and a b-bit offset, where 2^b is the block size, a.k.a. line size, in bytes. The 4-word example is labeled b = 2, counting the offset in words.]
The block is the unit of transfer between the cache and memory.
Larger block sizes have distinct hardware advantages:
– Less tag overhead
– Exploit fast burst transfers from DRAM
– Exploit fast burst transfers over wide busses
What are the disadvantages of increasing block size?
– Fewer blocks => more conflicts
– Can waste bandwidth
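To see the "less tag overhead" point numerically, here is a hypothetical worked example in C: a 32 KB direct-mapped cache with 32-bit addresses at two block sizes (the cache and block sizes are assumptions, not from the slide). The tag bits per line stay the same, but fewer lines mean less total tag storage:

```c
#include <stdio.h>

int main(void) {
    const int cache_bytes = 32 * 1024;   /* assumed 32 KB cache */
    const int addr_bits = 32;

    for (int block_bytes = 16; block_bytes <= 64; block_bytes *= 4) {
        int lines = cache_bytes / block_bytes;
        int b = 0, k = 0;
        while ((1 << b) < block_bytes) b++;   /* offset bits */
        while ((1 << k) < lines) k++;         /* index bits  */
        int t = addr_bits - k - b;            /* tag bits    */
        printf("block=%2dB: %4d lines x %2d tag bits = %5d tag bits total\n",
               block_bytes, lines, t, lines * t);
    }
    return 0;
}
```

Going from 16-byte to 64-byte blocks cuts the line count (and therefore the total tag storage) by a factor of 4: 2048 x 17 = 34816 tag bits versus 512 x 17 = 8704.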

10 CSE 490/590, Spring 2011 CPU-Cache Interaction (5-stage pipeline)
[Figure: the 5-stage pipeline with its primary caches. The PC (incremented by 0x4) addresses the primary instruction cache, whose output feeds the IR; on an I-cache miss (hit?), a nop is injected and PCen holds the PC. After decode/register fetch (A, B) and the ALU (Y), the primary data cache is accessed with addr/wdata/rdata through the MD1/MD2 staging registers in the M and E stages; misses go to memory control, and the cache is refilled with data from lower levels of the memory hierarchy.]
Stall the entire CPU on a data cache miss.

11 CSE 490/590, Spring 2011 Improving Cache Performance
Average memory access time = Hit time + Miss rate x Miss penalty
To improve performance:
– Reduce the hit time
– Reduce the miss rate
– Reduce the miss penalty
What is the simplest design strategy?
– Biggest cache that doesn't increase hit time past 1-2 cycles (approx. 8-32 KB in modern technology)
[Design issues are more complex with out-of-order superscalar processors]
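Plugging hypothetical numbers into the formula above (a 1-cycle hit time, 5% miss rate, and 100-cycle miss penalty, all assumed for illustration):

```c
#include <stdio.h>

int main(void) {
    double hit_time = 1.0;        /* cycles (assumed) */
    double miss_rate = 0.05;      /* assumed */
    double miss_penalty = 100.0;  /* cycles (assumed) */

    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);  /* 1 + 0.05 * 100 = 6.0 */
    return 0;
}
```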

12 CSE 490/590, Spring 2011 Serial versus Parallel Cache and Memory Access
α is the HIT RATIO: the fraction of references that hit in the cache
1 − α is the MISS RATIO: the remaining references
[Figure: two organizations; serial: Processor -> Cache -> Main Memory; parallel: Processor issues Addr to Cache and Main Memory simultaneously]
Average access time for serial search: t_cache + (1 − α) t_mem
Average access time for parallel search: α t_cache + (1 − α) t_mem
– Savings are usually small: t_mem >> t_cache, and the hit ratio α is high
– High bandwidth required for the memory path
– Complexity of handling parallel paths can slow t_cache
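With illustrative numbers (assumed: t_cache = 1 cycle, t_mem = 100 cycles, α = 0.95), the two averages differ only slightly, matching the "savings are usually small" note:

```c
#include <stdio.h>

int main(void) {
    double a = 0.95;        /* hit ratio (assumed)        */
    double t_cache = 1.0;   /* cycles (assumed)           */
    double t_mem = 100.0;   /* cycles (assumed)           */

    double serial   = t_cache + (1 - a) * t_mem;      /* 6.00 */
    double parallel = a * t_cache + (1 - a) * t_mem;  /* 5.95 */

    printf("serial = %.2f, parallel = %.2f cycles\n", serial, parallel);
    return 0;
}
```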

13 CSE 490/590, Spring 2011 CSE 490/590 Administrivia
Feedback on lectures
– If you have any feedback or concerns, please send them along to me
– Thanks to those who already did
– Please ask questions if things are not clear
– Or you can simply scream, "TOO FAST!"
– Please utilize my office hours (I will move them to sometime in the afternoon)
Recitations this week & next week
– Very important to attend
Quiz 1
– Fri, 2/11
– Closed book, in-class
– Covers lectures through last Monday (1/31)
– Review: Wed (2/9)

14 CSE 490/590, Spring 2011 Acknowledgements
These slides contain substantial material developed and copyrighted by
– Krste Asanovic (MIT/UCB)
– David Patterson (UCB)
And also by:
– Arvind (MIT)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
MIT material derived from course 6.823
UCB material derived from course CS252

