CSCI 232 © 2005 JW Ryder

Cache Memory Systems
Introduced by M. V. Wilkes (the "slave store")
First appeared commercially in the IBM System/360 Model 85
Motivations
Main memory access time is 5 to 25 times slower than register access
–on-chip vs. off-chip issues, among others
Can't have too many registers in the CPU
Program locality should allow a small, fast buffer between the CPU and MM
Must be managed by hardware to be effective
Motivations Continued
Most of the time, the requested MM data must be found in the cache for the cache to be worthwhile
This can only happen if dynamic locality is tracked well
Management is automatic and transparent to the Instruction Set Architecture (ISA)
Access and Cost
T_reg < T_cache < T_MM
C_reg > C_cache > C_MM (cost per bit - chip real estate)
Cache vs. Registers
Cache
–Locality: tracked dynamically
–Management: hardware
–Expandability: easy
–ISA visibility: invisible (mostly)
Registers
–Locality: static, by the compiler
–Management: software/programmer
–Expandability: not possible
–ISA visibility: visible
Simple Cache-Based System
[Diagram: CPU with registers, cache, and MM, with transfers numbered 1-5]
Read Operation
See if the desired MM word is in the cache (1)
If it is (cache hit), get it from the cache (2)
If it isn't (cache miss), get it from MM and supply it simultaneously to the CPU and the cache (3)
–Make room in the cache by selecting a victim, which may have to be written back to MM (4), and then install the copy (5)
The CPU stalls until the missing word is supplied
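The numbered steps above can be sketched in a few lines. This is a simplification of my own, not the lecture's hardware: a tiny fully associative cache modeled as a dict, with a FIFO victim choice standing in for the unspecified replacement policy.

```python
def cache_read(addr, cache, mm, capacity=2):
    """Return (data, 'hit' | 'miss'). cache maps addr -> data; mm is the backing store."""
    if addr in cache:                    # step 1: lookup; step 2: hit, serve from cache
        return cache[addr], 'hit'
    data = mm[addr]                      # step 3: fetch from MM (forwarded to CPU too)
    if len(cache) >= capacity:           # step 4: select a victim to make room
        victim = next(iter(cache))       #   simple FIFO choice (real policy elided)
        mm[victim] = cache.pop(victim)   #   write the victim back to MM
    cache[addr] = data                   # step 5: install the new copy
    return data, 'miss'
```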
Locality of Reference
Temporal
–If this word is needed now, there is a good chance it will be needed again
Spatial
–A fetch from MM actually brings in a chunk of words
–Some word near the requested word will probably also be needed
Registers exploit temporal locality (TLOR)
Caches exploit both temporal and spatial locality (TLOR, SLOR)
Selecting a Victim
The victim must not be accessed in the near future
Maintain a history of usage
The basic unit of transfer between cache and MM is a block (line) of 2^b words
–b is small (2-4)
On a miss, the block containing the missing word is loaded into the cache (by the cache controller)
This ensures neighboring words are also cached (SLOR)
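Two of the ideas above can be made concrete: with 2^b words per block, a word address splits into a block number and an offset within the block, and a usage history makes a least-recently-used victim choice possible. The function names here are my own illustration, not the slides' notation.

```python
def split_address(word_addr, b):
    """Split a word address into (block number, offset) for blocks of 2**b words."""
    return word_addr >> b, word_addr & ((1 << b) - 1)

def lru_victim(history):
    """history maps block number -> time of last use; the victim is the least recent."""
    return min(history, key=history.get)
```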
Addressing the Cache
Addressed the same as memory
The cache stores entries in the form <address, data>
The cache controller compares the address issued by the CPU with the address field of the cache entries to determine a hit or a miss
Transfers between cache and CPU are only a word or two; transfers between cache and MM are in blocks
Hit: data back from the cache in 1 clock cycle
Miss: 15-20 cycles
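A back-of-envelope check of what these latencies mean on average: the 1-cycle hit and 20-cycle miss figures are from the slide, while the 95% hit rate in the test is a made-up example value.

```python
def avg_access_time(hit_rate, hit_cycles=1, miss_cycles=20):
    """Average memory access time: hits cost hit_cycles, misses cost miss_cycles."""
    return hit_rate * hit_cycles + (1 - hit_rate) * miss_cycles
```

Even a 5% miss rate roughly doubles the average access time, which is why tracking locality well matters so much.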
Functions of the Cache Controller
Given an address issued by the CPU, the CC must determine whether the block containing the word is in the cache
–requires associative logic / comparators
The CC must keep track of the usage of blocks in the cache
Hardware logic for victim selection
May need to write a line (victim) back from the cache to MM
Must implement a placement policy that determines how blocks from MM are placed in the cache
A replacement policy is needed only if there is a choice of victim
Cache Loading Strategies
Load a block into the cache from MM only on a miss
Prefetch (anticipating a miss) a block into the cache
–Prefetch on Miss: on a miss to block i, prefetch block i + 1 too
–Always Prefetch: prefetch block i + 1 on the first reference to block i
–Tagged Prefetch: prefetch on a miss, and prefetch block i + 1 when a previously prefetched block is referenced for the first time
–Keep prefetching as long as the last prefetch was useful
–Tags distinguish not-yet-accessed blocks from the others
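The tagged-prefetch variant is the subtlest of these, so here is a minimal sketch of the idea, under my own simplifications: each cached block carries a flag that is set on its first demand reference, and both a miss and a first touch of a previously prefetched block trigger a prefetch of block i + 1.

```python
def access(block, cache):
    """cache maps block -> referenced flag. Returns the list of blocks prefetched."""
    # First demand reference = a miss, or a first touch of a prefetched block
    first_ref = block not in cache or not cache[block]
    cache[block] = True                  # block is now (fetched and) referenced
    prefetched = []
    if first_ref and block + 1 not in cache:
        cache[block + 1] = False         # prefetched, not yet referenced
        prefetched.append(block + 1)
    return prefetched
```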
More Strategies
The previous prefetches are 1 block; they can be > 1 block
Selective Fetch
–Don't fetch shared writeable blocks
–Used in many systems to avoid cache incoherence (multiprocessors)
Load-Thru / Read-Thru
The missing word is forwarded to the CPU and the cache concurrently
The remaining words of the block are then fetched in wraparound fashion
[Diagram: block words 0, 1, 2, ..., 2^k; the loading order for the remaining words starts after the missed word w and wraps around to the start of the block]
Wrapping around saves pointer resetting - the write pointer is already positioned
Not needed if the load can be done in one shot
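The wraparound fill order described above can be stated in one line. This sketch (my own) lists the order in which the remaining words of the block arrive after the missed word w has been forwarded.

```python
def wraparound_order(w, block_size):
    """Order in which the remaining words of a block are loaded after a miss on word w."""
    return [(w + i) % block_size for i in range(1, block_size)]
```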
Cache with Writeback Buffers
[Diagram: CPU, cache, and MM connected through a writeback buffer, with read (R) and write (W) paths for write-thru caches, write-back caches, and the Special case]
Writeback buffer = fast registers
Special: used with both types of caches; occurs when a word has been written to the writeback buffer and then there is a cache miss
Three speeds are involved: cache speed, buffer speed, memory speed
Write-Thru Caches
A write generated by the CPU writes into the cache and also deposits the write into the writeback buffer
–Eventually written back to MM
Delay perceived by the CPU = max(T_cache, T_WB)
–T_cache: cache access time
–T_WB: time to write into the writeback buffer
–T_cache, T_WB < T_MM
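A direct transcription of the delay formula above: with a writeback buffer, a write-thru write stalls the CPU only for the slower of the cache write and the buffer write, never for the full MM latency. The cycle counts in the test are example values, not from the slides.

```python
def write_thru_delay(t_cache, t_wb):
    """CPU-perceived delay for a write-thru write with a writeback buffer."""
    return max(t_cache, t_wb)
```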
Writeback Cache
Write to the cache
Write modified victims to MM via the writeback buffer
Delay perceived by the CPU = T_cache
The Special case happens on a miss, read or write
Cache Update Policies
Keep the MM copy and the cache copy of a word (hence a block) consistent
Write-Thru (Store-Thru)
–On a write hit, the copies in MM and the cache are both updated simultaneously
–No need to write back blocks selected as victims
–Useful for multiprocessor systems (MM always has the latest copy)
–If the cache fails, the MM copy can serve as a hot backup
–Can slow down the CPU on writes (since MM updates take place at slower rates)
Write-Back (No Write-Thru)
On a write hit, only the cache copy is updated
Faster writes on a cache hit
Need to write back dirty blocks selected as victims
–Dirty block: a block modified after being brought into the cache
Requires a clean/dirty bit for every block
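The dirty-bit bookkeeping above can be sketched as follows; this is a minimal illustration of my own, with each cache entry stored as a (data, dirty) pair.

```python
def write(block, data, cache):
    """A write hit updates only the cache copy and sets the dirty bit."""
    cache[block] = (data, True)

def evict(block, cache, mm):
    """On eviction, only dirty victims are written back to MM; clean ones are dropped."""
    data, dirty = cache.pop(block)
    if dirty:
        mm[block] = data
```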
Allocation Policies
WTWA - Write-Thru Write Allocate: allocate the missing block in the cache on both read and write misses
WTNWA - Write-Thru No Write Allocate: don't allocate on a write miss; allocate only on a read miss
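The difference between the two policies shows up only on a write miss, as this sketch (names and structure are my own) makes explicit: both write through to MM, but only WTWA also brings the block into the cache.

```python
def write_miss(block, data, cache, mm, allocate):
    """Handle a write miss under write-thru: allocate=True is WTWA, False is WTNWA."""
    mm[block] = data          # write-thru: MM is always updated
    if allocate:              # WTWA: install the missing block in the cache
        cache[block] = data   # WTNWA: leave the cache untouched
```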