CS61C Cache Memory Lecture 17 March 31, 1999 Dave Patterson (http.cs.berkeley.edu/~patterson) www-inst.eecs.berkeley.edu/~cs61c/schedule.html

Review 1/3: Memory Hierarchy Pyramid [Figure: pyramid with the Central Processor Unit (CPU) at the top, then Level 1, Level 2, Level 3, ... Level n below; the size of memory at each level grows with increasing distance from the CPU while cost/MB decreases; levels closer to the CPU are "Upper", levels farther away are "Lower".]

Review 2/3: Hierarchy Analogy: Library °Term paper: every time you need a book -Leave some books on the desk after fetching them -Only go to the shelves when you need a new book -When you go to the shelves, bring back related books in case you need them; sometimes you'll need to return books not used recently to make space for new books on the desk -Return to the desk to work -When done, replace the books on the shelves, carrying as many as you can per trip °Illusion: the whole library is on your desktop

Review 3/3 °Principle of Locality (locality in time, locality in space) + hierarchy of memories of different speed and cost; exploit locality to improve cost-performance °Hierarchy terms: Hit, Miss, Hit Time, Miss Penalty, Hit Rate, Miss Rate, Block, Upper-level memory, Lower-level memory °Review of Big Ideas (so far): Abstraction, Stored Program, Pliable Data, compilation vs. interpretation, Performance via Parallelism, Performance Pitfalls; applies to Processor, Memory, and I/O

Outline °Review °Direct Mapped Cache °Example °Administrivia, Midterm results, Some survey results °Block Size to Reduce Misses °Associative Cache to Reduce Misses °Conclusion

Cache: 1st Level of Memory Hierarchy °How do you know if something is in the cache? °How do you find it if it is in the cache? °In a direct mapped cache, each memory address is associated with one possible block (also called a "line") within the cache -Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache

Simplest Cache: Direct Mapped [Figure: a 16-byte memory (addresses 0 through F) feeding a 4-byte direct mapped cache (cache indexes 0 through 3).] °Cache location 0 can be occupied by data from: -Memory location 0, 4, 8, ... -In general: any memory location whose 2 rightmost address bits are 0s -Address & 0x3 = cache index

Direct Mapped Questions °Which memory block is in the cache? °Also, what if the block size is > 1 byte? °Divide the memory address into 3 portions: tag, index, and byte offset within the block °The index tells where in the cache to look, the offset tells which byte in the block is the start of the desired data, and the tag tells whether the data in the cache corresponds to the memory address being looked for: ttttttttttttttttt iiiiiiiiii oooo

Size of Tag, Index, Offset Fields °If 32-bit memory address, cache size = 2^N bytes, block (line) size = 2^M bytes °Then -The leftmost (32 - N) bits are the Cache Tag -The rightmost M bits are the Byte Offset -The remaining (N - M) bits are the Cache Index °Field layout: ttttttttttttttttt iiiiiiiiii oooo, with the tag in bits 31..N, the index in bits N-1..M, and the offset in bits M-1..0
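To make the field arithmetic concrete, here is a minimal C sketch (not from the lecture; the function and constant names are invented for illustration) that splits a 32-bit address into the three fields for the 16 KB, 16-byte-block cache used on the next slides (N = 14, M = 4):

    #include <stdint.h>
    #include <stdio.h>

    #define N 14   /* cache size = 2^N bytes = 16 KB */
    #define M 4    /* block size = 2^M bytes = 16 B  */

    /* Split a 32-bit address into tag, index, and byte offset. */
    static void split_address(uint32_t addr,
                              uint32_t *tag, uint32_t *index, uint32_t *offset)
    {
        *offset = addr & ((1u << M) - 1);               /* rightmost M bits    */
        *index  = (addr >> M) & ((1u << (N - M)) - 1);  /* next N - M bits     */
        *tag    = addr >> N;                            /* leftmost 32 - N bits */
    }

    int main(void)
    {
        uint32_t tag, index, offset;
        split_address(0x00008014u, &tag, &index, &offset);
        printf("tag=%u index=%u offset=%u\n",
               (unsigned)tag, (unsigned)index, (unsigned)offset);
        /* prints: tag=2 index=1 offset=4 */
        return 0;
    }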

Accessing Data in a Direct Mapped Cache °So let's go through accessing some data in a direct mapped, 16 KB cache: 16-byte blocks x 1024 cache blocks °4 addresses, divided (for convenience) into Tag, Index, Byte Offset fields: -0x00000014 (Tag 0, Index 1, Offset 0x4) -0x0000001C (Tag 0, Index 1, Offset 0xC) -0x00000034 (Tag 0, Index 3, Offset 0x4) -0x00008014 (Tag 2, Index 1, Offset 0x4)

16 KB Direct Mapped Cache, 16B Blocks [Figure: cache table with columns Index, Valid, Tag, and data words 0x0-3, 0x4-7, 0x8-b, 0xc-f; every Valid bit starts at 0.] °Valid bit ⇒ no address match when the cache is first powered on (Not valid ⇒ no match even if tag = addr)

Read 0x00000014 °Address divides as Tag field 0, Index field 1, Offset 0x4 [Figure: cache table; all Valid bits still 0.]

So we read block 1 (Index field = 1)... [Figure: index 1 row of the cache table highlighted.]

No valid data... [Figure: the Valid bit at index 1 is 0, so this is a miss.]

So load that data into the cache, setting tag, valid... [Figure: index 1 row now has Valid = 1, Tag = 0, data words a b c d.]

Read from the cache at the offset, return word b [Figure: Offset 0x4 selects word b from the block at index 1.]

Read 0x0000001C °Address divides as Tag field 0, Index field 1, Offset 0xC [Figure: cache table with block 1 valid: Tag 0, data a b c d.]

Data valid, tag OK, so read offset, return word d... [Figure: Offset 0xC selects word d from the block at index 1.]

Read 0x00000034 °Address divides as Tag field 0, Index field 3, Offset 0x4 [Figure: cache table unchanged: only block 1 valid.]

So read block 3... [Figure: index 3 row of the cache table highlighted.]

No valid data... [Figure: the Valid bit at index 3 is 0, so this is a miss.]

Load that cache block, return word f... [Figure: index 3 row now has Valid = 1, Tag = 0, data words e f g h; Offset 0x4 selects word f.]

Read 0x00008014 °Address divides as Tag field 2, Index field 1, Offset 0x4 [Figure: cache table with blocks 1 and 3 valid.]

So read Cache Block 1, Data is Valid... [Figure: index 1 row: Valid = 1, Tag = 0, data a b c d.]

Cache Block 1 Tag does not match (0 != 2)... [Figure: the stored Tag 0 is compared against the address's Tag field 2.]

Miss, so replace block 1 with new data & tag... [Figure: index 1 row now has Valid = 1, Tag = 2, data words i j k l.]

And return word j... [Figure: Offset 0x4 selects word j from the new block at index 1.]
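The whole walkthrough can be condensed into software. Below is a hedged C sketch of a direct mapped cache read in the spirit of Project 5 (cache_read, memory, and the other names are invented for illustration, not the project's actual interface); running it on the four addresses above yields miss, hit, miss, miss, matching the slides:

    #include <stdint.h>
    #include <string.h>

    #define NUM_BLOCKS 1024            /* 16 KB cache / 16-byte blocks */
    #define BLOCK_SIZE 16

    struct cache_line {
        int      valid;
        uint32_t tag;
        uint8_t  data[BLOCK_SIZE];
    };

    static struct cache_line cache[NUM_BLOCKS];
    extern uint8_t memory[];           /* backing main memory, assumed to exist */

    /* Read one byte through the direct mapped cache. */
    uint8_t cache_read(uint32_t addr)
    {
        uint32_t offset = addr & (BLOCK_SIZE - 1);
        uint32_t index  = (addr / BLOCK_SIZE) % NUM_BLOCKS;
        uint32_t tag    = addr / ((uint32_t)BLOCK_SIZE * NUM_BLOCKS);
        struct cache_line *line = &cache[index];

        if (!line->valid || line->tag != tag) {   /* miss: cold or conflict */
            memcpy(line->data,
                   &memory[addr & ~(uint32_t)(BLOCK_SIZE - 1)],
                   BLOCK_SIZE);                   /* load the whole block   */
            line->tag   = tag;
            line->valid = 1;
        }
        return line->data[offset];                /* hit path: index into block */
    }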

Administrivia °Project 5: Due 4/14: design and implement a cache (in software) and plug it into the instruction simulator °6th homework: Due 4/7 7PM -Exercises 7.7, 7.8, 7.20, 7.22, 7.32 °Readings 7.2, 7.3, 7.4 °Midterm results: class did well, median ~40/50; more details Friday

Lecture Topic Understanding Survey

Survey

Block Size Tradeoff °In general, a larger block size takes advantage of spatial locality, BUT: -A larger block size means a larger miss penalty: it takes longer to fill up the block -If the block size is too big relative to the cache size, the miss rate will go up: too few cache blocks °In general, minimize Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
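As a worked instance of the formula (the numbers are illustrative, not from the lecture): with a 1-cycle hit time, a 20-cycle total miss time, and a 5% miss rate,

    Average Access Time = 1 x (1 - 0.05) + 20 x 0.05
                        = 0.95 + 1.00
                        = 1.95 cycles

so even a 5% miss rate nearly doubles the effective access time.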

Extreme Example: Single Big Block(!) °Cache Size = 4 bytes, Block Size = 4 bytes -Only ONE entry in the cache! [Figure: a single cache entry with a Valid bit, Cache Tag, and data bytes B0 B1 B2 B3.] °If an item is accessed, it is likely to be accessed again soon -But unlikely to be accessed again immediately! °The next access will likely be a miss again -Continually loading data into the cache but discarding it (forcing it out) before it is used again -Nightmare for a cache designer: the Ping Pong Effect

Block Size Tradeoff [Figures: three plots against Block Size. Miss Penalty rises steadily with block size. Miss Rate first falls as larger blocks exploit spatial locality, then rises when fewer blocks compromise temporal locality. Average Access Time therefore dips and then climbs, driven by the increased miss penalty & miss rate at large block sizes.]

One Reason for Misses, and Solutions °"Conflict Misses" are misses caused by different memory locations that map to the same cache index and are accessed almost simultaneously (e.g., 0x00000014 and 0x00008014 above both map to index 1) °Solution 1: Make the cache size bigger °Solution 2: Multiple entries for the same Cache Index?

Extreme Example: Fully Associative °Fully Associative Cache (e.g., 32 B block) -Forget about the Cache Index -Compare all Cache Tags in parallel [Figure: every entry holds a Valid bit, a Cache Tag (27 bits long), and data bytes B0..B31; the address's Cache Tag is compared (=) against every stored tag in parallel, and the 5-bit Byte Offset selects within the block.] °By definition: Conflict Misses = 0 for a fully associative cache

Compromise: N-way Set Associative Cache °N-way set associative: N entries for each Cache Index -N direct mapped caches operate in parallel -Select the one that gets a hit °Example: 2-way set associative cache -Cache Index selects a "set" from the cache -The 2 tags in the set are compared in parallel -Data is selected based on the tag result (whichever matched the address)

Compromise: 2-way Set Associative Cache [Figure: two direct mapped halves, each with Valid, Cache Tag, and Cache Data Block 0 columns; the Cache Index selects one set, the Addr Tag is compared against both stored tags in parallel, the two compare results are ORed to produce Hit, and a selector (also called a "Multiplexor" or "Mux") picks the Cache Block from the way that hit.]

Block Replacement Policy °N-way Set Associative or Fully Associative caches have a choice of where to place a block (which block to replace) -Of course, if there is an invalid block, use it °Whenever we get a cache hit, record the cache block that was touched °When we need to evict a cache block, choose one which hasn't been touched recently: "Least Recently Used" (LRU) -Past is prologue: history suggests it is the least likely of the choices to be used soon -Flip side of temporal locality
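A minimal C sketch of LRU bookkeeping for the 2-way case, under assumed names (lookup, fill_block, and the sizes are invented for illustration; real hardware does this with comparators and a single LRU bit per set, not software): each set keeps an lru field recording which way was touched least recently, updated on every hit and consulted on every eviction.

    #include <stdint.h>

    #define NUM_SETS   512             /* 1024 blocks / 2 ways */
    #define BLOCK_SIZE 16

    struct way { int valid; uint32_t tag; uint8_t data[BLOCK_SIZE]; };

    struct set {
        struct way way[2];
        int lru;                       /* which way was touched least recently */
    };

    static struct set cache[NUM_SETS];

    extern void fill_block(uint8_t *dst, uint32_t block_addr);  /* assumed fetch */

    /* Return a pointer to the block holding addr, filling on a miss. */
    uint8_t *lookup(uint32_t addr)
    {
        uint32_t index = (addr / BLOCK_SIZE) % NUM_SETS;
        uint32_t tag   = addr / ((uint32_t)BLOCK_SIZE * NUM_SETS);
        struct set *s  = &cache[index];

        for (int w = 0; w < 2; w++)               /* compare both tags */
            if (s->way[w].valid && s->way[w].tag == tag) {
                s->lru = 1 - w;                   /* the other way is now LRU */
                return s->way[w].data;            /* hit */
            }

        int victim = s->lru;                      /* miss: evict the LRU way... */
        if (!s->way[0].valid) victim = 0;         /* ...unless a way is invalid */
        else if (!s->way[1].valid) victim = 1;

        fill_block(s->way[victim].data, addr & ~(uint32_t)(BLOCK_SIZE - 1));
        s->way[victim].tag   = tag;
        s->way[victim].valid = 1;
        s->lru = 1 - victim;                      /* the filled way was just used */
        return s->way[victim].data;
    }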

Block Replacement Policy: Random °Sometimes it is hard to keep track of LRU if there are lots of choices °Second choice policy: pick one at random and replace that block °Advantages -Very simple to implement -Predictable behavior -No worst case behavior

"And in Conclusion..." °Tag, index, offset used to find matching data, support larger blocks, reduce misses °Where in the cache? Direct Mapped Cache -Conflict Misses if memory addresses compete °Fully Associative lets memory data be in any block: no Conflict Misses °Set Associative: a compromise, simpler hardware than Fully Associative, fewer misses than Direct Mapped °LRU: use history to predict replacement °Next: Cache Improvement, Virtual Memory