Presentation transcript:

Lecture Objectives: 1) Define set-associative cache and fully associative cache. 2) Compare and contrast the performance of set-associative caches, direct-mapped caches, and fully associative caches. 3) Explain the operation of the LRU replacement scheme. 4) Explain the three Cs model for cache misses.

Direct-Mapped Cache (4KB, 16 data bytes/block)

[Cache contents table: Index (1 byte), Dirty (1 bit), Valid (1 bit), Tag (20 bits), Data (16 bytes); 256 rows, indices 0x00 through 0xFF]

lw $t0, ($t1) # t1 = 0x… (a 4-byte address)

The last four bits of the address select the byte within the 16-byte block. The next byte is extracted as the Cache Index, and the Tag is formed from the remaining digits of the memory block address. All memory addresses with address digits 0xnnnnn04m will map to index 04 of the cache. (CS2710 Computer Organization)
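The address split described above can be sketched in a few lines; this is an illustrative helper (the function name and the example address are my own, chosen so the index digits are 04 as in the slide):

```python
def split_address(addr: int):
    """Split a 32-bit address for a direct-mapped cache with 16-byte
    blocks (4 offset bits) and 256 blocks (8 index bits)."""
    offset = addr & 0xF          # low 4 bits: byte within the block
    index = (addr >> 4) & 0xFF   # next 8 bits: cache index
    tag = addr >> 12             # remaining 20 bits: tag
    return tag, index, offset

# Any address of the form 0xnnnnn04m maps to index 0x04:
tag, index, offset = split_address(0x12345043)
print(hex(tag), hex(index), hex(offset))  # 0x12345 0x4 0x3
```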

Problems with Direct Mapping

[Cache contents table as above: Index, Dirty, Valid, Tag, Data]

A given memory address maps into one specific cache block, based on the mapping formula, so two addresses with the same index digits but different tags compete for the same block. This can result in: frequent misses due to competition for the same block; unused blocks of cache if no memory accesses map to those blocks.
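The competition described above is easy to demonstrate: any two addresses that differ only in their tag bits land on the same index. The two addresses below are hypothetical examples, not values from the slide:

```python
def index_of(addr: int) -> int:
    """Direct-mapped cache index: bits [11:4] of the address
    (16-byte blocks, 256 blocks)."""
    return (addr >> 4) & 0xFF

a, b = 0x00000040, 0x00010040  # same index digits, different tags
print(index_of(a) == index_of(b))  # True: both map to index 0x04
```

Both addresses evict each other from index 0x04 even if the other 255 blocks of the cache are empty.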

Fully Associative Cache (4KB, 16 data bytes/block)

[Cache contents table: LRU (1 byte), Dirty (1 bit), Valid (1 bit), Tag (28 bits), Data (16 bytes)]

A cache structure in which a block can be placed in any location in the cache: a given address does not map to any specific location. Motivation: this decreases cache misses. When a block must be allocated, the least recently used (LRU) block is chosen; the LRU bits maintain a "record" of when each block was last used.

lw $t0, ($t1) # t1 = 0x…
lw $t0, ($t2) # t2 = 0x…

Hit-Testing a Fully Associative Cache

[Cache contents table as above]

lw $t0, ($t1) # t1 = 0x…
lw $t0, ($t2) # t2 = 0x…
sw $zero, ($t2) # t2 = 0x…

Whenever a new memory-access instruction occurs, the cache manager has to check every block that has a valid tag to see whether the tags match (indicating a hit). After a very short period of time, every block is valid. If the checking is done sequentially, it takes a significant amount of time. Parallel comparison circuitry can help, but such circuitry is expensive (256 comparators are needed for this 256-block cache).
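The sequential hit test the slide warns about amounts to a linear scan over every valid tag. A minimal software sketch (not the hardware implementation; the block representation is assumed):

```python
def fa_lookup(cache, addr):
    """Fully associative lookup. The tag is the address with the 4
    offset bits dropped (28 bits: no index bits in this organization).
    `cache` is a list of blocks, each a dict with 'valid' and 'tag'.
    Returns the matching block number, or None on a miss."""
    tag = addr >> 4
    for i, block in enumerate(cache):       # must scan EVERY block
        if block['valid'] and block['tag'] == tag:
            return i                         # hit
    return None                              # miss

cache = [{'valid': True, 'tag': 0x0000001},
         {'valid': True, 'tag': 0x0000004}]
print(fa_lookup(cache, 0x00000043))  # 1: tag 0x0000004 matches
```

Hardware avoids the loop by comparing all 256 tags at once, which is exactly the comparator cost the slide mentions.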

Miss-Handling with LRU in a Fully Associative Cache

[Cache contents table as above, now full: every block valid]

lw $t0, ($t1) # t1 = 0x…, oldest access
…
lw $s0, ($t4) # newest access

Once the fully associative cache is full, a miss forces an existing block to be replaced; this is called a Capacity Miss. The cache manager has to find the Least Recently Used block and replace that block's contents (writing them back to memory first if needed). Searching for the oldest block takes additional time.
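The search for the oldest block can be sketched as finding the minimum last-use time. This sketch assumes a counter-based LRU (real hardware typically approximates LRU rather than keeping full timestamps):

```python
def fa_replace(cache, addr, tick):
    """On a miss in a full, fully associative cache, evict the LRU block.
    Each block is a dict with 'tag', 'lru' (last-use time), 'dirty'.
    Returns the index of the replaced block."""
    victim = min(range(len(cache)), key=lambda i: cache[i]['lru'])
    if cache[victim]['dirty']:
        pass  # write the old contents back to memory first
    cache[victim].update(tag=addr >> 4, lru=tick, dirty=False)
    return victim

cache = [{'tag': 0x1, 'lru': 7, 'dirty': False},
         {'tag': 0x2, 'lru': 3, 'dirty': True}]   # block 1 is oldest
print(fa_replace(cache, 0x00000050, tick=8))       # 1
```

Note that `min` over all blocks is itself a full scan, which is the extra time cost the slide points out.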

Problems with Fully Associative Caches

[Cache contents table as above]

Hit-testing requires comparing every tag in the cache: too slow to do sequentially, and expensive if parallel comparator circuitry is implemented. Hit-testing is therefore slower, or more expensive, than in a direct-mapped cache. Miss-handling also takes additional time due to the LRU determination.

Set-Associative Cache (8KB, 256 sets, 2 blocks/set, 16 data bytes/block)

[Cache contents table: Set Index (1 byte), then for each block in the set: LRU (1 bit), Dirty (1 bit), Valid (1 bit), Tag (20 bits), Data (16 bytes)]

A cache structure that has a fixed number of locations (e.g., two) where a given block can be placed.

lw $t0, ($t1) # t1 = 0x… (a 4-byte address)
lw $t0, ($t2) # t2 = 0x…

The last four bits of the address select the byte within the block. The next byte is extracted as the Set Index, and the Tag is formed from the remaining digits of the memory block address. All memory addresses with address digits 0xnnnnn00m will map to set 00 of the cache. The LRU bits are set to indicate that the 2nd block in the set was most recently used.

Hit-Testing in a Set-Associative Cache

[Cache contents table as above]

lw $t0, ($t1) # t1 = 0x… (a 4-byte address)
lw $t0, ($t2) # t2 = 0x…
sw $zero, ($t3) # t3 = 0x…c

The Cache Set Index is computed from the memory block address. Both blocks in the set are occupied (Valid bit = 1), so each of the two tags is checked. A hit is detected in the first block, and the data is written at offset c within the block. The LRU bits are flipped to indicate that the first block was more recently used. (The Dirty bit is also set in this case, because the access was a store.)

Miss-Handling in a Set-Associative Cache

[Cache contents table as above]

lw $t0, ($t1) # t1 = 0x… (a 4-byte address)
lw $t0, ($t2) # t2 = 0x…
sw $zero, ($t1) # t1 = 0x…c
lw $t0, ($t2) # t2 = 0x…

The Cache Set Index is computed from the memory block address. Both blocks in the set are occupied (Valid bit = 1), so each of the tags is checked, and a miss is detected. The LRU bits indicate that the 2nd block is the older one, so that block is replaced. The LRU bits are then flipped again to indicate that the 2nd block was more recently used.
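The set-associative hit test and miss handling above can be combined into one sketch for a 2-way cache matching the slide's geometry (4 offset bits, 8 set-index bits, 20-bit tags). The data array and write-back are omitted, and the per-set structure is my own representation:

```python
def sa_access(sets, addr, is_store=False):
    """One access to a 2-way set-associative cache.
    `sets[i]` is {'ways': [blk0, blk1], 'lru_way': w}, where each blk
    is a dict with 'valid', 'tag', 'dirty', and `lru_way` names the
    way to evict next. Returns 'hit' or 'miss'."""
    index = (addr >> 4) & 0xFF
    tag = addr >> 12
    meta = sets[index]
    ways = meta['ways']
    for w, blk in enumerate(ways):                    # check both tags
        if blk['valid'] and blk['tag'] == tag:
            meta['lru_way'] = 1 - w                   # other way is now LRU
            if is_store:
                blk['dirty'] = True
            return 'hit'
    w = meta['lru_way']                               # miss: evict LRU way
    ways[w].update(valid=True, tag=tag, dirty=False)  # (write back first if dirty)
    meta['lru_way'] = 1 - w
    return 'miss'

sets = [{'ways': [{'valid': False, 'tag': 0, 'dirty': False} for _ in range(2)],
         'lru_way': 0} for _ in range(256)]
print(sa_access(sets, 0x00000040))  # miss (block not yet resident)
print(sa_access(sets, 0x00000044))  # hit  (same block, different byte)
```

Because each set has two ways, two conflicting addresses with the same index can both stay resident, which a direct-mapped cache cannot do.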

The degree of associativity specifies how many blocks are in each set of a set-associative cache. 2-way set associativity = 2 blocks/set. Increasing the degree of associativity usually decreases the miss rate. A direct-mapped cache is really a 1-way set-associative cache! A significant gain is realized by going to 2-way associativity, but further increases in set size have little effect.

The Three Cs Model: a cache model in which all cache misses are classified into one of three categories. Compulsory misses arise from cache blocks that are initially empty (aka cold-start misses). Capacity misses occur, in a fully associative cache, because the cache is full. Conflict misses occur, in a set-associative or direct-mapped cache, because the block's position is already occupied even though the cache may not be full.
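The compulsory/capacity split can be made concrete with a tiny trace simulator: in a fully associative LRU cache, a first-touch miss is compulsory and every later miss is a capacity miss (conflict misses only appear once placement is restricted). This is a toy sketch under those standard definitions, not a production simulator:

```python
from collections import OrderedDict

def classify_misses(trace, capacity_blocks, block_bytes=16):
    """Run a fully associative LRU cache over an address trace and
    count compulsory misses, capacity misses, and hits."""
    cache = OrderedDict()  # block number -> None, in LRU order
    seen = set()           # every block ever touched
    counts = {'compulsory': 0, 'capacity': 0, 'hit': 0}
    for addr in trace:
        blk = addr // block_bytes
        if blk in cache:
            cache.move_to_end(blk)   # refresh LRU position
            counts['hit'] += 1
            continue
        counts['compulsory' if blk not in seen else 'capacity'] += 1
        seen.add(blk)
        cache[blk] = None
        if len(cache) > capacity_blocks:
            cache.popitem(last=False)  # evict the LRU block
    return counts

trace = [0x00, 0x10, 0x20, 0x00, 0x10]  # 3 distinct blocks, cache holds 2
print(classify_misses(trace, capacity_blocks=2))
# {'compulsory': 3, 'capacity': 2, 'hit': 0}
```

The trace touches three blocks but the cache holds only two, so the two re-references miss again: capacity misses by definition.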

Sources of Misses: compulsory misses are barely visible (about 0.006%). They happen only on a cold start, so there are relatively few of them.

Basic Design Challenges
- Increase the cache size. Effect on miss rate: decreases capacity misses. Possible negative performance impact: may increase access time.
- Increase associativity. Effect on miss rate: decreases miss rate due to conflict misses. Possible negative performance impact: may increase access time.
- Increase block size. Effect on miss rate: decreases miss rate for a wide range of block sizes due to spatial locality. Possible negative performance impact: increases miss penalty; very large blocks could increase the miss rate.