Cache Memory and Performance

Presentation transcript:

Cache Memory and Performance Many of the following slides are taken with permission from the Complete PowerPoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP) by Randal E. Bryant and David R. O'Hallaron (http://csapp.cs.cmu.edu/public/lectures.html). The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS 2506.

Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. They hold frequently accessed blocks of main memory. The CPU looks first for data in the caches (e.g., L1, L2, and L3), then in main memory. [Diagram: typical system structure — the CPU chip contains the register file, ALU, cache memories, and bus interface; the system bus connects through the I/O bridge and memory bus to main memory.]

General Cache Organization (S, E, B) The cache consists of S = 2^s sets, each holding E = 2^e lines (blocks); each line holds B = 2^b bytes of data plus a valid bit and a tag. Cache size: C = S x E x B data bytes. [Diagram: a grid of S sets of E lines; each line shows a valid bit, a tag, and data bytes 0 .. B-1.]

Cache Defines View of DRAM The "geometry" of the cache is defined by: S = 2^s, the number of sets in the cache; E = 2^e, the number of lines (blocks) in a set; B = 2^b, the number of bytes in a line (block). These values define a related way to think about the organization of DRAM: DRAM consists of a sequence of blocks of B bytes, and the bytes in a block (line) can be indexed by using b bits. DRAM likewise consists of a sequence of groups of S blocks (lines), and the blocks (lines) in a group can be indexed by using s bits. Each group contains S x B bytes, which can be indexed by using s + b bits.
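The geometry arithmetic above can be checked with a short Python sketch (the variable names are ours, not from the slides):

```python
# Hypothetical sketch of the cache/DRAM geometry described above.
s, e, b = 3, 1, 2            # example field widths (as in the next slides)
S, E, B = 2**s, 2**e, 2**b   # sets, lines per set, bytes per line

C = S * E * B                # cache size: total data bytes
group_bytes = S * B          # bytes per DRAM group (indexed by s + b bits)

print(C)            # 64
print(group_bytes)  # 32
```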

Cache (8, 2, 4) and 256-Byte DRAM Cache size: C = S x E x B = 64 data bytes, with S = 2^3 sets, E = 2^1 blocks (lines) per set, and B = 2^2 bytes per cache block (the data), plus a valid bit and tag per line. [Diagram: the 8-set, 2-way cache alongside a 256-byte DRAM addressed 0 .. 255.]

Example of Cache View of DRAM Assume a cache has the following geometry: S = 2^3 = 8, the number of sets in the cache; E = 2^1 = 2, the number of lines (blocks) in a set; B = 2^2 = 4, the number of bytes in a line (block). Suppose that DRAM consists of 256 bytes, so we have 8-bit addresses. Then DRAM consists of 64 blocks, each holding 4 bytes, and 8 groups, each holding 8 blocks. [Diagram: DRAM addresses 00000000 .. 00011111 shown as one group of eight 4-byte blocks, block 0 through block 7.]

Example of Cache View of DRAM Pick an address: 01110101. Read the address as group | block | byte fields: a7a6a5 | a4a3a2 | a1a0. The low-order bits a1a0 = 01 give the byte number, i.e., the offset of the byte within its block. [Diagram: a DRAM excerpt showing the four bytes of a block labeled byte 00 .. byte 11, within block 101 of group 011.]

Example of Cache View of DRAM Pick an address: 01110101. The bits a4a3a2 = 101 give the block number within the group; a4a3a200* equals the offset of the block in the group. * a4a3a200 = a4a3a2 x 2^2 = a4a3a2 x (size of a block). [Diagram: blocks 100, 101, and 110 of group 011 in DRAM.]

Example of Cache View of DRAM Pick an address: 01110101. The bits a7a6a5 = 011 give the group number; a7a6a500000* equals the offset of the group in the DRAM. * Why? a7a6a500000 = a7a6a5 x 2^5 = a7a6a5 x (size of a group), since each group holds S x B = 8 x 4 = 32 = 2^5 bytes. Here 01100000 is the starting address of group 011. [Diagram: group 011 beginning at DRAM address 01100000.]

The BIG Picture The address 01110101 splits as 011 | 101 | 01. Group 011 begins at DRAM address 01100000; within that group, block 101 begins at 01110100; within that block, byte 01 is address 01110101. [Diagram: DRAM divided into eight groups starting at 00000000, 00100000, ..., 11100000; group 011 divided into eight blocks starting at 01100000, 01100100, ..., 01111100; block 101 holding bytes 01110100 .. 01110111.]

Example of Cache View of DRAM Pick an address: 01110101. How does this address map into the cache? The DRAM block number (specifically, the block's position within its group) determines the cache set used to store the block. Note this means that two DRAM blocks from the same DRAM group cannot map into the same cache set. So the address 01110101 maps to set 101 in the cache. Each set in our cache can hold 2 blocks, so this block could be stored at either location within the corresponding set. [Diagram: DRAM blocks 100, 101, and 110 surrounding the addressed block.]

Example of Cache View of DRAM The DRAM block containing address 01110101 maps to cache set 101, stored with tag 011 (the group number) and its valid bit set. [Diagram: the block's four bytes, 01110100 .. 01110111, stored in one of the two lines of cache set 101.]

Cache View of DRAM So, to generalize, suppose a cache has S = 2^s sets, E = 2^e blocks (lines) per set, and B = 2^b bytes per block (line), and suppose that DRAM uses N-bit addresses. Then for any address a(N-1) ... a(s+b) | a(s+b-1) ... a(b) | a(b-1) ... a(0): bits a(b-1):a(0) give the byte index within the block; bits a(s+b-1):a(b) give the set index; bits a(N-1):a(s+b) become the tag for the data. Note that these tag bits are the same only for blocks that are within the same DRAM group.
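The bit-slicing just described can be sketched as a small helper (the function name is ours, not from the slides):

```python
# Hypothetical helper: split an N-bit address into (tag, set index, byte
# offset) using the a(b-1)..a0 / a(s+b-1)..ab / a(N-1)..a(s+b) fields above.
def split_address(addr, s, b):
    byte_offset = addr & ((1 << b) - 1)        # bits b-1 .. 0
    set_index = (addr >> b) & ((1 << s) - 1)   # bits s+b-1 .. b
    tag = addr >> (s + b)                      # remaining high-order bits
    return tag, set_index, byte_offset

# The slides' running example: 01110101 with s = 3, b = 2
print(split_address(0b01110101, s=3, b=2))  # (3, 5, 1) = group 011, set 101, byte 01
```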

Cache Read To read, the cache: (1) uses the s set-index bits of the address to locate the set; (2) checks whether any line in the set has a matching tag; (3) if yes, and that line's valid bit is set, it is a hit; (4) locates the data starting at the block offset. Address of word: t tag bits | s set-index bits | b block-offset bits. [Diagram: the set index selects one of the 2^s sets; the address tag is compared against the tags of the 2^e lines in that set; on a hit, the data begins at the given offset within the 2^b-byte block.]

Example: Direct Mapped Cache (E = 1) Direct mapped: one line per set. Assume the cache block size is 8 bytes. Address of int: t bits | 0...01 | 100. The set-index bits (0...01) find the set among the S = 2^s sets. [Diagram: several sets, each a single line with a valid bit, a tag, and bytes 0 .. 7.]

Example: Direct Mapped Cache (E = 1) Direct mapped: one line per set. Assume the cache block size is 8 bytes. Address of int: t bits | 0...01 | 100. Valid bit set + matching tag → hit. [Diagram: the selected line's tag is compared with the address's tag bits; the block offset (100) locates the data within the line.]

Example: Direct Mapped Cache (E = 1) Direct mapped: one line per set. Assume the cache block size is 8 bytes. Address of int: t bits | 0...01 | 100. Valid bit set + matching tag → hit: the int (4 bytes) is found at block offset 100 within the line. No match: the old line (block) is evicted and replaced by the requested block from DRAM.

Direct-Mapped Cache Simulation M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 block/set; so each address splits into 1 tag bit | 2 set bits | 1 byte-offset bit (b = 1). Address trace (reads, one byte per read): 0 [0000] miss; 1 [0001] hit; 7 [0111] miss; 8 [1000] miss (evicts M[0-1] from set 0); 0 [0000] miss (evicts M[8-9] from set 0). Final contents: Set 0 holds M[0-1] (v = 1, tag 0); Set 3 holds M[6-7] (v = 1, tag 0); Sets 1 and 2 are empty.
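The trace above can be reproduced with a small simulator sketch (our code, not the lecture's); since a direct-mapped cache has one line per set, a set is just "which tag, if any, is currently resident":

```python
# Hypothetical direct-mapped cache simulator: one line per set.
def simulate_direct_mapped(trace, s=2, b=1):
    stored = {}          # set index -> tag of the resident block
    results = []
    for addr in trace:
        set_index = (addr >> b) & ((1 << s) - 1)
        tag = addr >> (s + b)
        if stored.get(set_index) == tag:
            results.append("hit")
        else:
            results.append("miss")
            stored[set_index] = tag   # load the block, evicting any old one
    return results

print(simulate_direct_mapped([0, 1, 7, 8, 0]))
# ['miss', 'hit', 'miss', 'miss', 'miss']
```

Note that the final access to address 0 misses again: addresses 0 and 8 map to the same set, so they keep evicting each other (a conflict miss).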

E-way Set Associative Cache (Here: E = 2) E = 2: two lines per set. Assume the cache block size is 8 bytes. Address of short int: t bits | 0...01 | 100. The set-index bits find the set, as before; now each set contains two lines. [Diagram: several sets, each with two lines of a valid bit, a tag, and bytes 0 .. 7.]

E-way Set Associative Cache (Here: E = 2) E = 2: two lines per set. Assume the cache block size is 8 bytes. Address of short int: t bits | 0...01 | 100. Compare both lines: valid bit set + matching tag → hit. [Diagram: the address tag is compared against both lines of the selected set in parallel; the block offset locates the data.]

E-way Set Associative Cache (Here: E = 2) E = 2: two lines per set. Assume the cache block size is 8 bytes. Address of short int: t bits | 0...01 | 100. Compare both lines: valid bit set + matching tag → hit; the short int (2 bytes) is found at block offset 100. No match: one line in the set is selected for eviction and replacement. Replacement policies: random, least recently used (LRU), ...

2-Way Set Associative Cache Simulation M = 16 byte addresses, B = 2 bytes/block, S = 2 sets, E = 2 blocks/set; so each address splits into 2 tag bits | 1 set bit | 1 byte-offset bit (b = 1). Address trace (reads, one byte per read): 0 [0000] miss; 1 [0001] hit; 7 [0111] miss; 8 [1000] miss; 0 [0000] hit. Final contents: Set 0 holds M[0-1] (tag 00) and M[8-9] (tag 10); Set 1 holds M[6-7] (tag 01).
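A matching sketch for the 2-way case (again our code, using LRU as the replacement policy) shows why the final access to address 0 now hits:

```python
# Hypothetical 2-way set-associative simulator with LRU replacement.
from collections import OrderedDict

def simulate_set_assoc(trace, s=1, b=1, ways=2):
    # One OrderedDict per set: keys are resident tags, kept in LRU order.
    sets = [OrderedDict() for _ in range(1 << s)]
    results = []
    for addr in trace:
        idx = (addr >> b) & ((1 << s) - 1)
        tag = addr >> (s + b)
        lines = sets[idx]
        if tag in lines:
            lines.move_to_end(tag)         # mark most recently used
            results.append("hit")
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = None
            results.append("miss")
    return results

print(simulate_set_assoc([0, 1, 7, 8, 0]))
# ['miss', 'hit', 'miss', 'miss', 'hit']
```

With two lines per set, blocks M[0-1] and M[8-9] coexist in set 0, so the conflict misses seen in the direct-mapped trace disappear.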

Cache Organization Types The "geometry" of the cache is defined by S = 2^s, the number of sets in the cache; E = 2^e, the number of lines (blocks) in a set; and B = 2^b, the number of bytes in a line (block). E = 1 (e = 0): direct-mapped cache — only one possible location in the cache for each DRAM block. S > 1, E = K > 1: K-way associative cache — K possible locations (in the same cache set) for each DRAM block. S = 1 (only one set), E = number of cache blocks: fully associative cache — each DRAM block can be at any location in the cache.
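The three organizations are just different (S, E) choices for the same total size C = S x E x B, which a quick sketch can confirm (the 64-byte, 4-byte-block cache here is a hypothetical example, not from the slides):

```python
# Hypothetical 64-byte cache with 4-byte blocks: 16 blocks total.
C, B = 64, 4
blocks = C // B

direct_mapped = dict(S=blocks, E=1)       # E = 1: one line per set
two_way       = dict(S=blocks // 2, E=2)  # K = 2 lines per set
fully_assoc   = dict(S=1, E=blocks)       # S = 1: one set holds all lines

for name, g in [("direct-mapped", direct_mapped),
                ("2-way", two_way),
                ("fully associative", fully_assoc)]:
    assert g["S"] * g["E"] * B == C       # same cache size in every case
    print(name, g)
```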

Searching a Set If we have an associative cache (K-way or fully associative), how do we determine whether a given DRAM block occurs within a set? Compare the tag we're trying to match against the tags of all of the blocks in the relevant set at the same time. Then factor in the valid bits, also in parallel. And employ a MUX to select the data from the matching line.