COMP3221 lec34-Cache-II.1 Saeid Nooshabadi COMP 3221 Microprocessors and Embedded Systems Lectures 34: Cache Memory - II

Slides:



Advertisements
Similar presentations
Memory Hierarchy CS465 Lecture 11. D. Barbara Memory CS465 2 Control Datapath Memory Processor Input Output Big Picture: Where are We Now?  The five.
Advertisements

CS 430 – Computer Architecture
Modified from notes by Saeid Nooshabadi COMP3221: Microprocessors and Embedded Systems Lecture 25: Cache - I Lecturer:
CS 430 Computer Architecture 1 CS 430 – Computer Architecture Caches, Part II William J. Taffe using slides of David Patterson.
CS61C L32 Caches II (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #20: Caches Andy.
Computer ArchitectureFall 2007 © November 14th, 2007 Majd F. Sakr CS-447– Computer Architecture.
CS61C L31 Caches II (1) Garcia, Fall 2006 © UCB GPUs >> CPUs?  Many are using graphics processing units on graphics cards for high-performance computing.
CS61C L22 Caches II (1) Garcia, Fall 2005 © UCB Lecturer PSOE, new dad Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.
Memory Subsystem and Cache Adapted from lectures notes of Dr. Patterson and Dr. Kubiatowicz of UC Berkeley.
CS61C L23 Cache II (1) Chae, Summer 2008 © UCB Albert Chae, Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #23 – Cache II.
Inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 32 – Caches III Prem Kumar of Northwestern has created a quantum inverter.
Modified from notes by Saeid Nooshabadi
CS61C L22 Caches III (1) A Carle, Summer 2006 © UCB inst.eecs.berkeley.edu/~cs61c/su06 CS61C : Machine Structures Lecture #22: Caches Andy.
COMP3221 lec33-Cache-I.1 Saeid Nooshabadi COMP 3221 Microprocessors and Embedded Systems Lectures 12: Cache Memory - I
CS61C L23 Caches I (1) Beamer, Summer 2007 © UCB Scott Beamer, Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #23 Cache I.
Computer ArchitectureFall 2008 © October 27th, 2008 Majd F. Sakr CS-447– Computer Architecture.
COMP3221 lec36-vm-I.1 Saeid Nooshabadi COMP 3221 Microprocessors and Embedded Systems Lectures 13: Virtual Memory - I
CS61C L21 Caches I (1) Garcia, Fall 2005 © UCB Lecturer PSOE, new dad Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.
Inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 31 – Caches II In this week’s Science, IBM researchers describe a new class.
CS61C L33 Caches III (1) Garcia, Spring 2007 © UCB Future of movies is 3D?  Dreamworks says they may exclusively release movies in this format. It’s based.
COMP3221: Microprocessors and Embedded Systems Lecture 26: Cache - II Lecturer: Hui Wu Session 2, 2005 Modified from.
CS61C L32 Caches II (1) Garcia, 2005 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures.
CS61C L31 Caches I (1) Garcia 2005 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures.
CS61C L20 Caches I (1) A Carle, Summer 2006 © UCB inst.eecs.berkeley.edu/~cs61c/su06 CS61C : Machine Structures Lecture #20: Caches Andy Carle.
361 Computer Architecture Lecture 14: Cache Memory
CS61C L32 Caches II (1) Garcia, Spring 2007 © UCB Experts weigh in on Quantum CPU  Most “profoundly skeptical” of the demo. D-Wave has provided almost.
CIS °The Five Classic Components of a Computer °Today’s Topics: Memory Hierarchy Cache Basics Cache Exercise (Many of this topic’s slides were.
Computer ArchitectureFall 2007 © November 12th, 2007 Majd F. Sakr CS-447– Computer Architecture.
11/3/2005Comp 120 Fall November 10 classes to go! Cache.
CS61C L24 Cache II (1) Beamer, Summer 2007 © UCB Scott Beamer, Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #24 Cache II.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Cs 61C L17 Cache.1 Patterson Spring 99 ©UCB CS61C Cache Memory Lecture 17 March 31, 1999 Dave Patterson (http.cs.berkeley.edu/~patterson) www-inst.eecs.berkeley.edu/~cs61c/schedule.html.
Computer ArchitectureFall 2008 © November 3 rd, 2008 Nael Abu-Ghazaleh CS-447– Computer.
CS61C L18 Cache2 © UC Regents 1 CS61C - Machine Structures Lecture 18 - Caches, Part II November 1, 2000 David Patterson
CS 61C L21 Caches II (1) Garcia, Spring 2004 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.
CS61C L32 Caches III (1) Garcia, Fall 2006 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C.
©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 14 Instructor: L.N. Bhuyan
COMP3221 lec37-vm-II.1 Saeid Nooshabadi COMP 3221 Microprocessors and Embedded Systems Lectures 13: Virtual Memory - II
Computer ArchitectureFall 2007 © November 12th, 2007 Majd F. Sakr CS-447– Computer Architecture.
DAP Spr.‘98 ©UCB 1 Lecture 11: Memory Hierarchy—Ways to Reduce Misses.
Cache Memory CSE Slides from Dan Garcia, UCB.
CS 3410, Spring 2014 Computer Science Cornell University See P&H Chapter: , 5.8, 5.15.
CS1104 – Computer Organization PART 2: Computer Architecture Lecture 10 Memory Hierarchy.
CSIE30300 Computer Architecture Unit 08: Cache Hsin-Chou Chi [Adapted from material by and
1 Computer Architecture Cache Memory. 2 Today is brought to you by cache What do we want? –Fast access to data from memory –Large size of memory –Acceptable.
CS61C L17 Cache1 © UC Regents 1 CS61C - Machine Structures Lecture 17 - Caches, Part I October 25, 2000 David Patterson
CML CML CS 230: Computer Organization and Assembly Language Aviral Shrivastava Department of Computer Science and Engineering School of Computing and Informatics.
EEL-4713 Ann Gordon-Ross 1 EEL-4713 Computer Architecture Memory hierarchies.
The Goal: illusion of large, fast, cheap memory Fact: Large memories are slow, fast memories are small How do we create a memory that is large, cheap and.
Csci 211 Computer System Architecture – Review on Cache Memory Xiuzhen Cheng
CS.305 Computer Architecture Memory: Caches Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
CPE232 Cache Introduction1 CPE 232 Computer Organization Spring 2006 Cache Introduction Dr. Gheith Abandah [Adapted from the slides of Professor Mary Irwin.
Review °We would like to have the capacity of disk at the speed of the processor: unfortunately this is not feasible. °So we create a memory hierarchy:
Inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 14 – Caches III Google Glass may be one vision of the future of post-PC.
1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.
COMP 3221: Microprocessors and Embedded Systems Lectures 27: Cache Memory - III Lecturer: Hui Wu Session 2, 2005 Modified.
CMSC 611: Advanced Computer Architecture
Address – 32 bits WRITE Write Cache Write Main Byte Offset Tag Index Valid Tag Data 16K entries 16.
The Goal: illusion of large, fast, cheap memory
Memristor memory on its way (hopefully)
Cache Memory.
CMSC 611: Advanced Computer Architecture
ECE232: Hardware Organization and Design
EE108B Review Session #6 Daxia Ge Friday February 23rd, 2007
CS-447– Computer Architecture Lecture 20 Cache Memories
Lecturer PSOE Dan Garcia
Some of the slides are adopted from David Patterson (UCB)
Memory Hierarchy Memory: hierarchy of components of various speeds and capacities Hierarchy driven by cost and performance In early days Primary memory.
Some of the slides are adopted from David Patterson (UCB)
Csci 211 Computer System Architecture – Review on Cache Memory
Presentation transcript:

COMP3221 lec34-Cache-II.1 Saeid Nooshabadi COMP 3221 Microprocessors and Embedded Systems Lectures 34: Cache Memory - II October, 2003 Saeid Nooshabadi Some of the slides are adopted from David Patterson (UCB)

COMP3221 lec34-Cache-II.2 Saeid Nooshabadi Outline °Direct-Mapped Cache °Types of Cache Misses °A (long) detailed example °Peer - to - peer education example °Block Size Tradeoff °Types of Cache Misses

COMP3221 lec34-Cache-II.3 Saeid Nooshabadi Review: Memory Hierarchy Control Datapath Memory Processor Memory Fastest Slowest Smallest Biggest Highest Lowest Speed: Size: Cost: Registers Cache L1 SRAM Cache L2 SRAM RAM/ROM DRAM EEPROM Hard Disk

COMP3221 lec34-Cache-II.4 Saeid Nooshabadi Review: Direct-Mapped Cache (#1/2) °In a direct-mapped cache, each memory address is associated with one possible block within the cache Therefore, we only need to look in a single location in the cache for the data if it exists in the cache Block is the unit of transfer between cache and memory

COMP3221 lec34-Cache-II.5 Saeid Nooshabadi Review: Direct-Mapped Cache 1 Word Block °Block size = 4 bytes °Cache Location 0 can be occupied by data from: Memory location 0 - 3, 8 - B,... In general: any 4 memory locations that is 8*n (n=0,1,2,..) °Cache Location 1 can be occupied by data from: Memory location 4 - 7, C - F,... In general: any 4 memory locations that is 8*n + 4 (n=0,1,2,..) Memory Memory Address A B C D E F 8 Byte Direct Mapped Cache Cache Index 0 1

COMP3221 lec34-Cache-II.6 Saeid Nooshabadi Direct-Mapped with 1 woed Blocks Example Byte offset block index Address tag

COMP3221 lec34-Cache-II.7 Saeid Nooshabadi Review: Direct-Mapped Cache Terminology °All fields are read as unsigned integers. °Index: specifies the cache index (which “row” or “line” of the cache we should look in) °Offset: once we’ve found correct block, specifies which byte within the block we want °Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location

COMP3221 lec34-Cache-II.8 Saeid Nooshabadi Reading Material °Steve Furber: ARM System On-Chip; 2nd Ed, Addison-Wesley, 2000, ISBN: Chapter 10.

COMP3221 lec34-Cache-II.9 Saeid Nooshabadi Accessing Data in a Direct Mapped Cache (#1/3) °Ex.: 16KB of data, direct-mapped, 4 word blocks °Read 4 addresses 0x , 0x C, 0x , 0x °Only cache/memory level of hierarchy Address (hex) Value of Word Memory C a b c d C e f g h C i j k l...

COMP3221 lec34-Cache-II.10 Saeid Nooshabadi Accessing Data in a Direct Mapped Cache (#2/3) °4 Addresses: 0x , 0x C, 0x , 0x °4 Addresses divided (for convenience) into Tag, Index, Byte Offset fields Tag Index Offset ttttttttttttttttt iiiiiiiiii oooo tag to check if have index to byte offset correct block select block within block

COMP3221 lec34-Cache-II.11 Saeid Nooshabadi Accessing Data in a Direct Mapped Cache (#3/3) °So lets go through accessing some data in this cache 16KB data, direct-mapped, 4 word blocks °Will see 3 types of events: °cache miss: nothing in cache in appropriate block, so fetch from memory °cache hit: cache block is valid and contains proper address, so read desired word °cache miss, block replacement: wrong data is in cache at appropriate block, so discard it and fetch desired data from memory

COMP3221 lec34-Cache-II.12 Saeid Nooshabadi Example Block 16 KB Direct Mapped Cache, 16B blocks °Valid bit: determines whether anything is stored in that row (when computer initially turned on, all entries are invalid)... Valid Tag 0x0-3 0x4-70x8-b0xc-f Index

COMP3221 lec34-Cache-II.13 Saeid Nooshabadi Read 0x = 0… ° Valid Tag 0x0-3 0x4-70x8-b0xc-f Index Tag field Index field Offset

COMP3221 lec34-Cache-II.14 Saeid Nooshabadi So we read block 1 ( )... Valid Tag 0x0-3 0x4-70x8-b0xc-f ° Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.15 Saeid Nooshabadi No valid data... Valid Tag 0x0-3 0x4-70x8-b0xc-f ° Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.16 Saeid Nooshabadi So load that data into cache, setting tag, valid... Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.17 Saeid Nooshabadi Read from cache at offset, return word b ° Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.18 Saeid Nooshabadi Read 0x C = 0… Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.19 Saeid Nooshabadi Data valid, tag OK, so read offset return word d... Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° Index

COMP3221 lec34-Cache-II.20 Saeid Nooshabadi Read 0x = 0… Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.21 Saeid Nooshabadi So read block 3... Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.22 Saeid Nooshabadi No valid data... Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.23 Saeid Nooshabadi Load that cache block, return word f... Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° efgh Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.24 Saeid Nooshabadi Read 0x = 0… Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° efgh Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.25 Saeid Nooshabadi So read Cache Block 1, Data is Valid... Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° efgh Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.26 Saeid Nooshabadi Cache Block 1 Tag does not match (0 != 2)... Valid Tag 0x0-3 0x4-70x8-b0xc-f abcd ° efgh Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.27 Saeid Nooshabadi Miss, so replace block 1 with new data & tag... Valid Tag 0x0-3 0x4-70x8-b0xc-f ijkl ° efgh Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.28 Saeid Nooshabadi And return word j... Valid Tag 0x0-3 0x4-70x8-b0xc-f ijkl ° efgh Index Tag fieldIndex fieldOffset

COMP3221 lec34-Cache-II.29 Saeid Nooshabadi Do an example yourself. What happens? °Chose from: Cache: Hit, Miss, Miss w. replace Values returned:a,b, c, d, e,..., k, l °Read address 0x ? °Read address 0x c ? Valid Tag 0x0-3 0x4-70x8-b0xc-f ijkl 1 0efgh Index Cache

COMP3221 lec34-Cache-II.30 Saeid Nooshabadi Answers °0x a hit Index = 3, Tag matches, Offset = 0, value = e °0x c a miss with replacment Index = 1, Tag mismatch, so replace from memory, Offset = 0xc, value = d °The Values read from Cache must equal memory values whether or not cached: 0x = e 0x c = d Address Value of Word Memory c a b c d c e f g h c i j k l...

COMP3221 lec34-Cache-II.31 Saeid Nooshabadi Block Size Tradeoff (#1/3) °Benefits of Larger Block Size Spatial Locality: if we access a given word, we’re likely to access other nearby words soon (Another Big Idea) Very applicable with Stored-Program Concept: if we execute a given instruction, it’s likely that we’ll execute the next few as well Works nicely in sequential array accesses too

COMP3221 lec34-Cache-II.32 Saeid Nooshabadi Block Size Tradeoff (#2/3) °Drawbacks of Larger Block Size Larger block size means larger miss penalty -on a miss, takes longer time to load a new block from next level If block size is too big relative to cache size, then there are too few blocks -Result: miss rate goes up °In general, minimize Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate

COMP3221 lec34-Cache-II.33 Saeid Nooshabadi Block Size Tradeoff (#3/3) °Hit Time = time to find and retrieve data from current level cache °Miss Penalty = average time to retrieve data on a current level miss (includes the possibility of misses on successive levels of memory hierarchy) °Hit Rate = % of requests that are found in current level cache °Miss Rate = 1 - Hit Rate

COMP3221 lec34-Cache-II.34 Saeid Nooshabadi Extreme Example: One Big Block °Cache Size = 4 bytesBlock Size = 4 bytes Only ONE entry in the cache! °If item accessed, likely accessed again soon But unlikely will be accessed again immediately! °The next access will likely to be a miss again Continually loading data into the cache but discard data (force out) before use it again Nightmare for cache designer: Ping Pong Effect Cache Data Valid Bit B 0B 1B 3 Tag B 2

COMP3221 lec34-Cache-II.35 Saeid Nooshabadi Block Size Tradeoff Conclusions Miss Penalty Block Size Increased Miss Penalty & Miss Rate Average Access Time Block Size Exploits Spatial Locality Fewer blocks: compromises temporal locality Miss Rate Block Size

COMP3221 lec34-Cache-II.36 Saeid Nooshabadi Things to Remember °Cache Access involves 3 types of events: °cache miss: nothing in cache in appropriate block, so fetch from memory °cache hit: cache block is valid and contains proper address, so read desired word °cache miss, block replacement: wrong data is in cache at appropriate block, so discard it and fetch desired data from memory