Cache Control and Cache Coherence Protocols: How to Manage the State of a Cache, and How to Keep Processors Reading the Correct Information
Single Cache Control  Direct mapped  Write-back  4-word blocks  16 KB size, so 1024 blocks  32-bit addresses  Valid and dirty bit for each block  Address fields: 18-bit tag, 10-bit cache index, 2-bit block (word) offset, 2-bit byte offset
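The field widths above follow directly from the cache parameters. A minimal sketch of the arithmetic (the helper name `split_address` is ours, not from the slides):

```python
# 32-bit address, 16 KB direct-mapped cache, 4-word (16-byte) blocks.
CACHE_BYTES = 16 * 1024
BLOCK_BYTES = 4 * 4                         # 4 words x 4 bytes
NUM_BLOCKS = CACHE_BYTES // BLOCK_BYTES     # 1024 blocks -> 10 index bits

BYTE_OFFSET_BITS = 2                        # selects a byte within a word
WORD_OFFSET_BITS = 2                        # selects a word within the block
INDEX_BITS = NUM_BLOCKS.bit_length() - 1    # 10
TAG_BITS = 32 - INDEX_BITS - WORD_OFFSET_BITS - BYTE_OFFSET_BITS  # 18

def split_address(addr: int):
    """Split a 32-bit address into (tag, index, word offset, byte offset)."""
    byte_off = addr & 0b11
    word_off = (addr >> 2) & 0b11
    index = (addr >> 4) & (NUM_BLOCKS - 1)
    tag = addr >> (32 - TAG_BITS)
    return tag, index, word_off, byte_off
```

On a lookup, the index selects one of the 1024 blocks and the stored 18-bit tag is compared against the tag field of the address.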

Signals: Processor ↔ Cache  1-bit read/write  1-bit request (for a cache operation)  32-bit address  32-bit data, processor to cache  32-bit data, cache to processor  1-bit ready, indicating the cache operation is complete –If a read, the data is ready

Signals: Cache ↔ Memory  1-bit read/write  1-bit request (for a memory operation)  32-bit address  128-bit data, cache to memory (one 4-word block)  128-bit data, memory to cache (one 4-word block)  1-bit ready, indicating the memory operation is complete –If a read from memory, the data is ready –If a write, the data is buffered and the cache may proceed

Communications pathways  [Diagram] The processor connects to the cache, and the cache connects to memory, each over its own set of signals: read/write, request, address, data (in both directions), and ready.

Cache States  Idle – waiting for a read/write request from the processor  Compare Tag –If valid and tag match (hit): if a write, write the data and set the valid and dirty bits; if a read, send the data. Set cache ready. –If not valid or no match (miss): set cache not ready  Allocate – read the new block from memory  Write-back – write the old (dirty) block to memory

Cache controller FSM  [State diagram]  Idle → Compare Tag: on CPU request  Compare Tag → Idle: hit (cache ready)  Compare Tag → Write-back: cache miss & block dirty  Compare Tag → Allocate: cache miss & block clean  Write-back → Allocate: memory ready (stay in Write-back while memory not ready)  Allocate → Compare Tag: memory ready (stay in Allocate while memory not ready)
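The controller FSM can be sketched as a next-state function. This is a minimal illustration of the diagram above; the `hit` and `dirty` flags stand in for the real tag-compare and dirty-bit logic:

```python
# Four-state cache controller: Idle, Compare Tag, Allocate, Write-back.
IDLE, COMPARE_TAG, ALLOCATE, WRITE_BACK = "Idle", "CompareTag", "Allocate", "WriteBack"

def next_state(state, cpu_request=False, hit=False, dirty=False, memory_ready=False):
    """Return the controller's next state for one clock of inputs."""
    if state == IDLE:
        return COMPARE_TAG if cpu_request else IDLE
    if state == COMPARE_TAG:
        if hit:
            return IDLE                         # hit: cache ready
        return WRITE_BACK if dirty else ALLOCATE  # miss: evict dirty block first
    if state == WRITE_BACK:
        return ALLOCATE if memory_ready else WRITE_BACK
    if state == ALLOCATE:
        return COMPARE_TAG if memory_ready else ALLOCATE
    raise ValueError(f"unknown state {state!r}")
```

A miss on a dirty block thus takes the long path Compare Tag → Write-back → Allocate → Compare Tag before the access can complete.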

Cache Coherence for multiple processors  [Diagram] Processor 1 and Processor 2 each have a private cache (Cache 1 and Cache 2); both caches connect to shared memory over a common bus.

Cache Coherence for multiple processors  Reads –Several processors can store the same information in their own caches for reading without problems  Writes –After one processor writes, the other processors no longer have the correct information in their caches –A write-back protocol is used to keep bus traffic lower
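A toy illustration of the problem the slide describes (not the protocol itself): with write-back caches and no coherence mechanism, a write by one processor leaves a stale copy in the other processor's cache.

```python
# Shared memory with one location, X.
memory = {"X": 0}

class NaiveCache:
    """A write-back cache with no coherence: it never announces its writes."""
    def __init__(self):
        self.lines = {}
    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = memory[addr]   # miss: fetch from memory
        return self.lines[addr]
    def write(self, addr, value):
        self.lines[addr] = value              # write-back: memory not updated yet

p1, p2 = NaiveCache(), NaiveCache()
p1.read("X")          # both processors cache X == 0
p2.read("X")
p1.write("X", 42)     # P1 updates only its own copy
stale = p2.read("X")  # P2 still sees 0 -- the caches are incoherent
```

The state machines on the following slides exist precisely to prevent this: P1's write must invalidate (or update) P2's copy before P2 can read it again.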

Cache Block States  State bits in each cache line/block indicate how that block may be used  The state depends on the actions of both the local processor and the other processors  The state of each line/block is tracked with a finite state machine model

Cache State  How can the cache of one processor change state based on the actions of another processor? –Via a "snoopy" bus protocol –Each cache uses the bus to access memory, possibly signaling the other processors –Each cache "snoops" (eavesdrops) on the bus, watching for actions by other processors that may affect its own state –A cache may change state based on another processor's bus action, without any special signals

Finite State Machine representation  State 1: Invalid –The information in this cache block is invalid, regardless of the tags  State 2: Read Only (clean, shared) –The information in this cache block may be read locally, without any bus traffic  State 3: Read/Write (dirty, exclusive) –The information in this cache block may be read or written locally, without any bus traffic –The information must be copied back to main memory when the cache block is replaced or when the block is used by another processor

Finite State Machine representation  State changes based on local processor actions  State changes based on other processors' actions –Depend on seeing an action on the bus, e.g. a miss, or an explicit signal –The cache sees the address (cache-block address and tag) on the bus, so it knows whether the action applies to this cache block  If the block is Dirty and its state changes, it must be copied back to main memory  Read Only = Clean = Shared  Read/Write = Dirty = Exclusive

[State diagram: transitions caused by the local processor]  Invalid → Read Only: read miss  Invalid → Read/Write: write miss  Read Only → Read Only: read hit or miss  Read Only → Read/Write: write hit or miss (on a hit, send invalidate)  Read/Write → Read/Write: write hit or miss; read hit  Read/Write → Read Only: read miss (copy back the dirty block)

[State diagram: the same local transitions, plus transitions caused by other processors]  Invalid → Read Only: local read miss  Invalid → Read/Write: local write miss  Read Only → Read Only: local read hit or miss  Read Only → Read/Write: local write hit or miss (on a hit, send invalidate)  Read Only → Invalid: other processor's write miss or invalidate  Read/Write → Read/Write: local write hit or miss; local read hit  Read/Write → Read Only: local read miss (copy back); or other processor's read miss (write back to main memory)  Read/Write → Invalid: other processor's write miss (write back to main memory)

Cache Control FSM (with coherence)  [State diagram]  Idle → Compare Tag / Check Valid: on CPU request  Compare Tag / Check Valid → Idle: hit (cache ready)  Compare Tag / Check Valid → Write-back: cache miss & block valid & dirty  Compare Tag / Check Valid → Allocate: cache miss & block clean or invalid  Write-back → Allocate: memory ready (stay in Write-back while memory not ready)  Allocate → Compare Tag / Check Valid: memory ready (stay in Allocate while memory not ready)