CAM (Content Addressable Memory)

CAM (Content Addressable Memory) for TAG look-up in a Fully-Associative Cache

Fully Associative Cache
[Diagram: the incoming tag (Tagin) is compared in parallel (==) against all 16 entries; each entry holds a valid bit V, a tag (Tag0..Tag15), and its data (Data0..Data15).]

CAM
[Diagram: the same structure redrawn with the valid bits, tags, and comparators (==) grouped into a CAM block, and Data0..Data15 grouped into a separate Data RAM, both driven by the incoming Tagin.]

Tag search using CAM
CAM: Content Addressable Memory. Input: data; output: address. It searches for the content (the input data) and provides the address of the location holding that content.
Tag search using a CAM. Input: the tag portion of the requested block address. Output: a one-hot coded or binary-encoded address of where the tag is found, i.e. the address of the data in the Data RAM.
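
A minimal behavioral sketch of this search in Python (the function and variable names are illustrative, not from the slides): given an input tag, it returns the address of the matching valid entry, or a miss indication.

    def cam_search(tags, valid, tag_in):
        """Search all CAM entries (in parallel in hardware, modeled here as
        a loop); return the address of the valid entry holding tag_in."""
        for addr, (v, tag) in enumerate(zip(valid, tags)):
            if v and tag == tag_in:
                return addr      # the address of the data in the Data RAM
        return None              # miss: no valid entry holds tag_in

    # Example: a 16-entry CAM as in the slides.
    tags, valid = [0] * 16, [False] * 16
    tags[5], valid[5] = 0x2A, True
    assert cam_search(tags, valid, 0x2A) == 5     # hit at address 5
    assert cam_search(tags, valid, 0x3B) is None  # miss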

Tag search using CAM
At most one match signal is high. The match signals (Match 0 .. Match 15) form a one-hot encoded address for the Data RAM: the high match line determines the correct line in the Data RAM.
[Diagram: Tag in is compared (==) against each valid entry (V, Tag0..Tag15), driving Match 0..Match 15; RE and WE control pins shown.]
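
A small sketch of the per-entry comparators, with assumed names: each valid entry drives one match line, and together the match lines form the one-hot vector described above.

    def match_lines(tags, valid, tag_in):
        """One comparator per entry: a match line is high only for a valid
        entry whose stored tag equals the incoming tag."""
        return [v and (tag == tag_in) for v, tag in zip(valid, tags)]

    m = match_lines([0x2A, 0x17, 0x2A], [True, True, False], 0x2A)
    assert m == [True, False, False]  # entry 2 also holds 0x2A but is invalid
    assert sum(m) <= 1                # at most one match signal is high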

Interfacing with the Data RAM
Use binary encoding to generate an address from the tag CAM (16 x 8), and use that address to fetch data from the Data RAM. (Hit == 1) => the address is valid; (Hit == 0) => the address is invalid.
[Diagram: two variants. With one word/block, the 4-bit CAM address indexes a 16 x 32 Data RAM directly. With four words/block, the 4-bit CAM address is concatenated with a 2-bit word field to index a 64 x 32 Data RAM; both produce 32-bit data.]
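
The following sketch, with assumed names, shows the one-hot-to-binary encoding with a Hit output, and how a 2-bit word field extends the CAM address for the four-words-per-block Data RAM.

    def encode_one_hot(match):
        """One-hot match vector -> (Hit, binary address).
        The address is meaningful only when Hit is true."""
        for addr, m in enumerate(match):
            if m:
                return True, addr
        return False, 0                # Hit == 0 => address is invalid

    def data_ram_address(block_addr, word_field):
        """Four words/block: Data RAM row = block address * 4 + word field."""
        return (block_addr << 2) | (word_field & 0b11)

    hit, addr = encode_one_hot([False] * 5 + [True] + [False] * 10)
    assert hit and addr == 5
    assert data_ram_address(addr, word_field=2) == 22  # row in the 64 x 32 RAM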

Updating the CAM
The CAM should be updated on a cache miss, using the Write Enable (WE) signal. Inputs: the location to write (Write Addr) and the tag to be written (Tag In).
Note: the address to write can be one of the empty locations if the cache is not full. If the cache is full, the "victim" address is chosen using either the random replacement algorithm or the LRU (Least Recently Used) replacement algorithm. In a write-back cache, if the victim block is dirty, it must first be copied to the MM. Then the data is written into the Data RAM while the tag (along with Valid bit == 1) is written into the CAM.
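
A sketch of this miss-handling sequence under a write-back policy; the helper names (find_empty, choose_victim, write_back_to_mm) and the dictionary layout are my own, not from the slides.

    import random

    def handle_miss(cache, tag_in, block_from_mm):
        addr = find_empty(cache)               # an empty location, if any
        if addr is None:                       # cache full: pick a victim
            addr = choose_victim(cache)        # LRU or random replacement
            if cache["dirty"][addr]:           # write-back cache: a dirty
                write_back_to_mm(cache, addr)  # victim goes to the MM first
        cache["data"][addr] = block_from_mm    # data into the Data RAM ...
        cache["tags"][addr] = tag_in           # ... tag into the CAM ...
        cache["valid"][addr] = True            # ... along with Valid == 1
        cache["dirty"][addr] = False
        return addr

    def find_empty(cache):
        empties = [i for i, v in enumerate(cache["valid"]) if not v]
        return empties[0] if empties else None

    def choose_victim(cache):
        return random.randrange(len(cache["valid"]))  # random replacement

    def write_back_to_mm(cache, addr):
        pass                                   # stand-in for the MM write

    cache = {"tags": [0] * 16, "valid": [False] * 16,
             "dirty": [False] * 16, "data": [None] * 16}
    assert handle_miss(cache, 0x2A, block_from_mm=[0, 1, 2, 3]) == 0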

Empty address
The CAM constantly generates an Empty flag and an empty address for use by the CCU when it brings a new (missing) block into the cache and stores the corresponding tag into the CAM. If multiple locations of the CAM are empty, the address of the lowest-addressed empty location is produced. This choice is deterministic rather than random, because the logic needed to make it random is more expensive.
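
A sketch of that deterministic choice, modeled as a priority scan over the valid bits (illustrative names): the lowest-addressed empty location wins.

    def empty_address(valid):
        """Return (Empty, empty address): the lowest-addressed empty
        location is produced, keeping the logic deterministic and cheap."""
        for addr, v in enumerate(valid):
            if not v:
                return True, addr
        return False, 0                # Empty == 0: the cache is full

    assert empty_address([True, True, False, True]) == (True, 2)
    assert empty_address([True] * 4) == (False, 0)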

Showing the Empty Address also
[Diagram: the 16 x 8 CAM now also outputs Empty and Empty Addr alongside Hit and Addr; Tag In, Write Addr, WE, and RE pins as before, plus an LRU Addr input.]
When you bring in a new block, you not only deposit the tag in the CAM but also validate it by setting the Valid bit to 1. If we are always writing a "1" into the Valid bit, can we avoid stating it explicitly through a pin and save a pin? No: we write a zero into the Valid bit when flushing out a block, for two reasons: (1) replacing a dirty block in preparation for bringing a new block into the cache; (2) in a multicore system (covered later), if the cache in another core requests the block because it wants to write to it, we must give away the block and invalidate it in this core's cache. So we do need the input pin for the Valid bit.
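
A tiny sketch of the CAM write path with the explicit Valid input (illustrative names): the CCU writes Valid == 1 when depositing a new tag and Valid == 0 when invalidating a block.

    def cam_write(tags, valid, write_addr, tag_in, valid_in):
        tags[write_addr] = tag_in
        valid[write_addr] = valid_in   # 1 = deposit new block, 0 = invalidate

    tags, valid = [0] * 16, [False] * 16
    cam_write(tags, valid, 3, 0x2A, True)   # bring a new block into the cache
    cam_write(tags, valid, 3, 0x2A, False)  # later: flush/invalidate the block
    assert valid[3] is False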

A CAM for fully-associative mapping only?
A CAM is certainly used for fully associative mapping (in TLBs, in routers, etc., though not in caches, as caches are too big for fully-associative mapping). But a CAM can also be used whenever the degree of set associativity is quite high (say 16 or more), where so many shallow TAG RAMs do not make sense. See Q#4.3 from the ee457_MT_Spring2017.

Q#4.3 of ee457_MT_Spring2017
Processor: 80486 (32-bit address, 32-bit data, byte-addressable)
Cache size: 1 KB (256 32-bit words in 4 byte-wide banks, each 256 x 8)
Block size: 4 words => 64 block frames
DoSA (Degree of Set Associativity) = 32; number of sets = 2
This is almost fully associative. Thirty-two segregations (with 32 two-location TAG RAMs and 32 eight-location DATA RAMs) do not make sense, so we choose to handle it with 2 CAMs and 2 DATA RAMs, as shown on the next page. Each CAM searches the block frames in one set.
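
A worked breakdown of the address fields implied by these parameters; the bit-field arithmetic is my derivation from the stated numbers, not spelled out on the slide.

    import math

    addr_bits   = 32                   # 80486: 32-bit byte address
    block_bytes = 4 * 4                # 4 words x 4 bytes
    frames      = 1024 // block_bytes  # 1 KB cache => 64 block frames
    sets        = 2
    dosa        = frames // sets       # 32-way set associative

    byte_offset = 2                        # byte within a 32-bit word
    word_field  = int(math.log2(4))        # word within the block: 2 bits
    set_bits    = int(math.log2(sets))     # 1 bit selects set 0 or set 1
    tag_bits    = addr_bits - set_bits - word_field - byte_offset

    assert (frames, dosa) == (64, 32)
    assert (set_bits, tag_bits) == (1, 27) # each CAM entry holds a 27-bit tag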

Q#4.3 of ee457_MT_Spring2017: incomplete diagram given in the question.

Q#4.3 of ee457_MT_Spring2017: solution diagram.

Explanation of the "i" and the "j" in the figure
The "i": if the processor wants to read from or write to the cache, the CCU first ascertains that the block is present in the specific set (i = 0 or 1, based on the set bit in the address from the CPU) and then allows the processor to read from or write to the Data RAM of that set.
The "j": when there is a cache miss, the cache control unit (CCU) brings the block from the MM and deposits it in either set 0 or set 1, based on the set index bit of the block's address. Within the set, it chooses an empty location if one is available, or, if the set is full, a victim location chosen by the LRU or random replacement algorithm.

Q#4.3 of ee457_MT_Spring2017: solution diagram replicated for the two sets (set 0 and set 1).