1 Recap: Memory Hierarchy

2 Memory Hierarchy - the Big Picture
Problem: memory is too slow and/or too small.
Solution: a memory hierarchy.
[Diagram: Processor (control, datapath, registers) → L1 on-chip cache → L2 off-chip cache → main memory (DRAM) → secondary storage (disk). Speed: fastest to slowest. Size: smallest to biggest. Cost: highest to lowest.]

3 Why Hierarchy Works
The principle of locality:
–Programs access a relatively small portion of the address space at any instant of time.
–Temporal locality: recently accessed instructions/data are likely to be used again.
–Spatial locality: instructions/data near recently accessed instructions/data are likely to be used soon.
Result: the illusion of a large, fast memory.
[Figure: probability of reference plotted across the address space, 0 to 2^n - 1]

4 Example of Locality
int A[100], B[100], C[100], D;
for (i = 0; i < 100; i++) {
    C[i] = A[i] * B[i] + D;
}
[Figure: A[0..99], B[0..99], C[0..99], and D laid out in consecutive memory blocks. Each iteration touches adjacent array elements (spatial locality) and reuses D every iteration (temporal locality).]

5 Four Key Cache Questions:
1. Where can a block be placed in the cache? (block placement)
2. How can a block be found in the cache? …using a tag (block identification)
3. Which block should be replaced on a miss? (block replacement)
4. What happens on a write? (write strategy)

6 Q1: Block Placement
Where can a block be placed in the cache?
–In one predetermined place - direct-mapped
  Use a fragment of the address to calculate the block's location in the cache
  Compare the cache block's tag to test whether the block is present
–Anywhere in the cache - fully associative
  Compare the tag to every block in the cache
–In a limited set of places - set-associative
  Use an address fragment to calculate the set
  Place in any block within the set
  Compare the tag to every block in the set
  A hybrid of direct-mapped and fully associative

7 Direct Mapped Block Placement
[Figure: memory blocks mapping one-to-one into cache blocks]
Memory address maps to block: location = (block address MOD # blocks in cache)

8 Direct Mapping
[Figure: memory values 0x0F, 0x55, 0xAA, and 0xF0 mapped into cache entries by tag and index]
Direct mapping: a memory value can only be placed at a single corresponding location in the cache.

9 Fully Associative Block Placement
[Figure: memory blocks mapping into any cache block]
Arbitrary block mapping: location = any

10 Fully Associative Mapping
[Figure: memory values 0x0F, 0x55, 0xAA, and 0xF0 stored in arbitrary cache entries, identified by tag alone]
Fully-associative mapping: a memory value can be anywhere in the cache.

11 Set-Associative Block Placement
[Figure: cache divided into sets 0-3; each memory block maps to exactly one set]
Address maps to set: location = (block address MOD # sets in cache), at an arbitrary location within the set.

12 Set Associative Mapping (2-Way)
[Figure: two-way set-associative cache; values 0x0F, 0x55, 0xAA, and 0xF0 placed by tag and index into either Way 0 or Way 1]
Set-associative mapping: a memory value can be placed in any of a set of corresponding locations in the cache.

13 Q2: Block Identification
Every cache block has an address tag and index that identify its location in memory.
Hit when the tag and index of the desired word match (comparison done by hardware).
Q: What happens when a cache block is empty?
A: Mark this condition with a valid bit.
[Figure: cache entry with tag/index 0x00001C0, valid bit 1, data 0xff083c2d]

14 Direct-Mapped Cache Design
[Figure: cache SRAM addressed by the cache-index field; each entry holds a valid bit, tag, and data. The stored tag is compared with the address tag, the result is ANDed with the valid bit to produce the hit signal, and the address is split into tag, cache index, and byte offset.]

15 Set Associative Cache Design
Key idea:
–Divide the cache into sets
–Allow a block anywhere within a set
Advantage:
–Better hit rate
Disadvantages:
–More tag bits
–More hardware
–Higher access time
[Figure: a four-way set-associative cache (Fig. 7.17)]

16 Fully Associative Cache Design
Key idea: set size of one block
–One comparator required for each block
–No address decoding
–Practical only for small caches due to hardware demands
[Figure: every stored tag compared in parallel against the incoming address; the matching entry drives data out]

17 Cache Replacement Policy
Random
–Replace a randomly chosen line
LRU (Least Recently Used)
–Replace the least recently used line

18 LRU Policy
[Example: 4-way set, ordered MRU, MRU-1, LRU+1, LRU]
Initial: A B C D
Access C (hit): C A B D
Access D (hit): D C A B
Access E (MISS, replacement needed): E D C A
Access C (hit): C E D A
Access G (MISS, replacement needed): G C E D

19 Cache Write Strategies
Need to keep the cache consistent with main memory:
–Reads are easy - they require no modification.
–Writes - when does the update occur?
1 Write through: data is written to both the cache block and to a block of main memory.
 The lower level always has the most up-to-date data; an important feature for I/O and multiprocessing.
 Easier to implement than write back.
2 Write back: data is written or updated only to the cache block. The modified (dirty) cache block is written to main memory when it is replaced from the cache.
 Writes occur at the speed of the cache.
 Uses less memory bandwidth than write through.

20 Write-through Policy
[Figure: the processor writes 0x5678 over 0x1234; both the cache block and the memory block are updated.]

21 Write-back Policy
[Figure: the processor writes 0x5678, then 0x9ABC; only the cache block is updated, while memory still holds 0x1234 until the block is replaced.]

22 Write Buffer for Write Through
A write buffer is needed between the cache and memory:
–Processor: writes data into the cache and the write buffer.
–Memory controller: writes the contents of the buffer to memory.
The write buffer is just a FIFO:
–Typical number of entries: 4
–Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
[Figure: Processor → Cache; Processor → Write Buffer → DRAM]

23 Unified vs. Separate Level 1 Cache
Unified Level 1 Cache (Princeton memory architecture): a single level 1 cache is used for both instructions and data.
Separate instruction/data Level 1 caches (Harvard memory architecture): the level 1 (L1) cache is split into two caches, one for instructions (the L1 I-cache) and one for data (the L1 D-cache).
[Figure: left, a processor with a unified L1 cache (Princeton); right, a processor with separate L1 I-cache and D-cache (Harvard)]

