CSCI 232 © 2005 JW Ryder

Cache Memory Systems
Introduced by M. V. Wilkes (as the "slave store"); first appeared commercially in the IBM System/360 Model 85.


Motivations
Main memory (MM) access time is 5 to 25 times slower than a register access
–on-chip vs. off-chip issues, among others
Can't have too many registers in the CPU
Program locality should allow a small, fast buffer between the CPU and MM
Should be managed by hardware to be effective

Motivations, Continued
Most of the time, the data the CPU needs must be found in the cache for the cache to be worthwhile
This can only happen if dynamic locality is tracked well
Management is automatic and transparent to the Instruction Set Architecture (ISA)

Access and Cost
T_reg < T_cache < T_MM
C_reg > C_cache > C_MM (cost per bit; chip real estate)
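A minimal sketch of why these inequalities matter, using the standard single-level effective-access-time model. The timing numbers are invented for illustration; they are chosen only to respect T_cache < T_MM.

```python
# Effective (average) memory access time for a single-level cache.
# T_CACHE_NS and T_MM_NS are hypothetical values, not measurements.

T_CACHE_NS = 2.0    # assumed cache access time
T_MM_NS = 40.0      # assumed main-memory access time

def effective_access_time(hit_ratio: float) -> float:
    """Hits cost T_cache; misses cost the cache probe plus an MM access."""
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * T_CACHE_NS + miss_ratio * (T_CACHE_NS + T_MM_NS)

if __name__ == "__main__":
    for h in (0.80, 0.95, 0.99):
        print(f"hit ratio {h:.2f}: {effective_access_time(h):.2f} ns")
```

Even a modest miss ratio dominates the average, which is why good locality tracking matters so much.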

Cache vs. Registers
                 Cache                  Registers
Locality         Tracked dynamically    Static, by compiler
Management       Hardware               Software / programmer
Expandability    Easy                   Not possible
ISA Visibility   Invisible (mostly)     Visible

Simple Cache-Based System
[Figure: the CPU (with its registers) connects through the cache to MM]

Read Operation
See if the desired MM word is in the cache (1)
If it is (a "cache hit"), get it from the cache (2)
If it isn't (a "cache miss"), get it from MM, supplying it simultaneously to the CPU and the cache (3)
–Make room in the cache by selecting a victim, which may have to be written back to MM (4), and then install the copy (5)
The CPU stalls until the missing word is supplied
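The numbered steps above can be sketched as follows. This is a toy model, not the actual hardware: a tiny fully associative cache of whole words, with an invented capacity and fake MM contents, and least-recently-used victim selection standing in for the history mechanism described later.

```python
# Toy model of the read path: hit, miss, victim selection, install.

CAPACITY = 4  # hypothetical number of cache entries

cache = {}                                     # address -> word
main_memory = {a: a * 10 for a in range(64)}   # fake MM contents
use_order = []                                 # usage history for victim choice

def read(addr):
    if addr in cache:                   # (1)-(2): hit, serve from cache
        use_order.remove(addr)
        use_order.append(addr)
        return cache[addr]
    word = main_memory[addr]            # (3): miss, fetch from MM
    if len(cache) >= CAPACITY:          # (4): make room, pick a victim
        victim = use_order.pop(0)       # least recently used
        del cache[victim]               # (a write-back to MM would go here)
    cache[addr] = word                  # (5): install the copy
    use_order.append(addr)
    return word                         # supplied to the CPU
```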

Locality of Reference
Temporal
–If this word is needed now, there is a good chance it will be needed again
Spatial
–When a fetch from MM is done, it actually gets a chunk of words
–Probably some word near the requested word will also be needed
Registers exploit temporal locality (TLOR)
Caches exploit both temporal and spatial locality (TLOR, SLOR)
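A tiny (and deliberately ordinary) loop illustrates both kinds of locality at once; the data values are invented.

```python
# Spatial locality: the loop walks consecutive elements of 'data',
# so each block fetched from MM supplies several upcoming accesses.
# Temporal locality: 'total' (and the loop's own instructions) are
# reused on every iteration, so they stay resident.

data = list(range(16))

total = 0
for x in data:
    total += x
```

A compiler keeps `total` in a register (temporal locality only); the cache additionally exploits the sequential sweep over `data` (spatial locality).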

Selecting a Victim
The victim must be a block that will not be accessed in the near future
Maintain a history of usage
The basic unit of transfer between the cache and MM is a block (line) consisting of 2^b words
–b is small (2 to 4)
On a miss, the block containing the missing word is loaded into the cache (by the cache controller)
This ensures neighboring words are also cached (SLOR)
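The block-granularity transfer can be sketched directly: the choice b = 2 below is an assumption within the 2-4 range the slide gives, and the model only tracks which blocks are resident.

```python
# On a miss, the whole 2**B-word block containing the word is installed,
# so the word's neighbors become cached too (spatial locality).

B = 2                      # block size = 2**B = 4 words (assumed)
cached_blocks = set()      # block numbers currently resident

def access(addr):
    """Returns 'hit' or 'miss'; a miss installs the whole block."""
    block = addr >> B      # block number = word address / block size
    if block in cached_blocks:
        return "hit"
    cached_blocks.add(block)
    return "miss"
```

After `access(8)` misses, the sequential accesses `access(9)`, `access(10)`, `access(11)` all land in the same block and hit.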

Addressing the Cache
Cache addresses are the same as memory addresses
The cache stores entries of the form (address field, data)
The cache controller compares the address issued by the CPU against the address field of the cache entries to determine a hit or a miss
Transfers between the cache and the CPU are only a word or two; transfers between the cache and MM are in blocks
Hit: data comes back from the cache in 1 clock cycle; a miss takes many more cycles
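One common concrete realization of this address comparison is the direct-mapped split into tag, index, and offset fields. The field widths below are invented for illustration, and `valid`/`tags` are simple stand-ins for the per-line valid bit and stored tag.

```python
# Sketch of the controller's tag comparison for a direct-mapped cache.

OFFSET_BITS = 4   # 2**4 = 16 words per block (assumed)
INDEX_BITS = 8    # 2**8 = 256 cache lines (assumed)

def split_address(addr):
    """Decompose a word address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def is_hit(addr, valid, tags):
    """Hit iff the indexed line is valid and its stored tag matches."""
    tag, index, _ = split_address(addr)
    return valid[index] and tags[index] == tag
```

A fully associative cache would instead compare the tag against every entry in parallel, which is the associative logic the next slide mentions.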

Functions of the Cache Controller
Given an address issued by the CPU, the cache controller (CC) must determine whether the block containing the word is in the cache
–requires associative logic / comparators
The CC needs to keep track of the usage of blocks in the cache
Hardware logic for victim selection
May need to write a victim line back from the cache to MM
Must implement a placement policy that determines how blocks from MM are placed in the cache
A replacement policy is needed only if there is a choice of victim

Cache Loading Strategies
Load a block into the cache from MM only on a miss, or prefetch a block into the cache (anticipating a miss):
–Prefetch on Miss: on a miss to block i, prefetch block i + 1 too
–Always Prefetch: prefetch block i + 1 on the first reference to block i
–Tagged Prefetch: prefetch on a miss, and also prefetch block i + 1 when a previously prefetched block is referenced for the first time
–Keep prefetching as long as the last prefetch was useful
–Tags distinguish not-yet-accessed blocks from the others
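Tagged prefetch, the subtlest of the three, can be sketched as follows. This is a minimal model at block-number granularity; the `tagged` set plays the role of the tag bits that mark prefetched-but-not-yet-demanded blocks.

```python
# Minimal sketch of tagged prefetch.

resident = set()   # blocks currently in the cache
tagged = set()     # prefetched blocks not yet demand-referenced

def reference(i):
    """Demand reference to block i; returns the blocks fetched from MM."""
    fetched = []
    if i not in resident:                 # miss: fetch i, prefetch i + 1
        resident.add(i)
        fetched.append(i)
        resident.add(i + 1)
        tagged.add(i + 1)
        fetched.append(i + 1)
    elif i in tagged:                     # first use of a prefetched block:
        tagged.discard(i)                 # the prefetch was useful,
        if i + 1 not in resident:         # so keep prefetching
            resident.add(i + 1)
            tagged.add(i + 1)
            fetched.append(i + 1)
    return fetched
```

A sequential scan thus triggers one demand miss and then stays one block ahead, which is exactly the "keep prefetching if the last prefetch was useful" behavior.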

More Strategies
The prefetches above fetch 1 block at a time, but more than 1 block can be prefetched
Selective Fetch
–Don't fetch shared writeable blocks
–Used in many systems to avoid cache incoherence (multiprocessors)

Load-Thru / Read-Thru
The missing word is forwarded to the CPU and the cache concurrently
The remaining words of the block are then fetched in wraparound fashion
[Figure: a 2^k-word block; loading starts at the missing word w, runs to the end of the block, then wraps to the beginning]
Wrapping around saves resetting the pointer: the write pointer is already positioned
Not needed if the block can be loaded in one shot
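The wraparound fill order can be written down directly; the function below just enumerates it for a block of `block_words` words whose word `w` missed.

```python
# Wraparound fill: after word w misses, the block's words are loaded
# w, w+1, ..., end of block, then 0, 1, ..., w-1.

def wraparound_order(w, block_words):
    """Order in which the block's words are loaded, starting at the
    missing word w (which goes to the CPU and cache concurrently)."""
    return [(w + i) % block_words for i in range(block_words)]

# e.g. wraparound_order(5, 8) -> [5, 6, 7, 0, 1, 2, 3, 4]
```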

Cache with Writeback Buffers
[Figure: CPU, cache, writeback buffer, and MM, showing the read (R) and write (W) paths for write-thru and write-back caches, plus a special path]
The writeback buffer is built from fast registers
Special path: used with both types of caches, when a word has been written to the writeback buffer and a cache miss then occurs
The three levels run at cache speed, buffer speed, and memory speed

Write-Thru Caches
A write generated by the CPU writes into the cache and also deposits the write into the writeback buffer
–eventually written back to MM
Delay perceived by the CPU: max(T_cache, T_WB)
T_cache = cache access time
T_WB = time to write into the writeback buffer
T_cache, T_WB < T_MM

Writeback Cache
Writes go to the cache
Modified victims are written to MM via the writeback buffer
Delay perceived by the CPU = T_cache
The special path is used on a miss, whether read or write

Cache Update Policies
Keep the MM copy and the cache copy of a word (hence a block) consistent
Write-Thru (Store-Thru)
–On a hit, if the operation is a write, the copies in MM and the cache are both updated simultaneously
–No need to write back blocks selected as victims
–Useful for multiprocessing systems (MM always has the latest copy)
–If the cache fails, the MM copy can serve as a hot backup
–Can slow the CPU on writes (since MM updates take place at the slower MM rate)

Write-Back (No Write-Thru)
On a write hit, only the cache copy is updated
Faster writes on a cache hit
Dirty blocks selected as victims must be written back
–Dirty block: a block modified after being brought into the cache
Requires a clean/dirty bit for every block

Allocation Policies
WTWA (Write-Thru, Write Allocate): allocate the missing block in the cache on both read and write misses
WTNWA (Write-Thru, No Write Allocate): don't allocate on a write miss; allocate only on a read miss
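The difference between the two policies only shows up on a write miss, which the sketch below makes explicit. It is a deliberately minimal model of a write-thru cache at block granularity; the action strings are invented labels, not hardware signals.

```python
# WTWA vs. WTNWA: both are write-thru, so MM is always updated;
# the policy only decides whether a write MISS also installs the block.

def handle_write(block, resident, allocate_on_write_miss):
    """Returns the list of actions taken for a CPU write to 'block'."""
    actions = ["update MM"]                      # write-thru: always
    if block in resident:
        actions.append("update cache copy")      # write hit
    elif allocate_on_write_miss:                 # WTWA: write miss allocates
        resident.add(block)
        actions.append("allocate block in cache")
    # WTNWA: a write miss leaves the cache untouched
    return actions
```

Under WTWA a write miss brings the block in (betting that it will be read or written again soon); under WTNWA the write simply streams through to MM.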