Dr. Hadi AL Saadi Faculty Of Information Technology

1 Computer Architecture: Introduction to Caches and Memory
Dr. Hadi AL Saadi Faculty Of Information Technology University of Petra 4/14/2019 Dr. Hadi Hassan Computer architecture

2 Random Access Memory
Large arrays of storage cells. Volatile memory: holds the stored data only as long as it is powered on. Random access: access time is practically the same for any data on a RAM chip. The Output Enable (OE) control signal specifies a read operation; the Write Enable (WE) control signal specifies a write operation. A 2^n × m RAM chip takes an n-bit address and reads or writes m-bit data.

3 Data Transfer: The Big Picture
Registers are in the datapath of the processor. If operands are in memory, we have to load them into processor registers, operate on them, and store the results back to memory. [Diagram: processor (control + datapath with registers) connected to memory and input/output devices; loads move data from memory to registers, stores move it back.]

4 Memory Technology Today: DRAM
DDR SDRAM (Double Data Rate Synchronous Dynamic RAM) is the dominant memory technology in the PC market. It delivers data on both the positive and negative edges of the clock (double rate). Generations: DDR (MemClkFreq × 2 (double rate) × 8 words), DDR2 (MemClkFreq × 2 (multiplier) × 2 × 8 words), DDR3 (MemClkFreq × 4 (multiplier) × 2 × 8 words), DDR4 (due 2014).

5 DRAM Capacity Growth
Unprecedented growth in density, but we still have a problem.

6 Static RAM Storage Cell
Static RAM (SRAM): fast but expensive RAM. 6-transistor cell; typically used for caches. Provides fast access latency of 0.5–5 ns. Cell implementation: cross-coupled inverters store the bit; two pass transistors enable the cell to be read and written; the row decoder selects the word line.

7 Dynamic RAM Storage Cell
Dynamic RAM (DRAM): slow (access latency of 50–70 ns), cheap, and dense memory. The typical choice for main memory. Cell implementation: 1-transistor cell (pass transistor) with a trench capacitor that stores the bit as a charge. Must be refreshed periodically, because charge leaks from the tiny capacitor; refreshing reads each memory row and writes it back to restore the charge.

8 Slow Memory Technologies: Magnetic Disk
A typical high-end hard disk has an average latency in the millisecond range and a capacity of hundreds of gigabytes.

9 Quality vs Quantity
Level      Capacity    Latency    Cost/GB
Register   100s bytes  20 ps      $$$$
SRAM       100s KB     0.5-5 ns   $$$
DRAM       100s MB     50-70 ns   $
Hard disk  100s GB     5-20 ms    cents
Ideal      1 GB        1 ns       cheap

10 Best of Both Worlds
What we want: a BIG and FAST memory. The memory system should perform like 1 GB of SRAM (1 ns access time) but cost like 1 GB of slow memory. Key concept: use a hierarchy of memory technologies, with small but fast memory near the CPU and large but slow memory farther away from the CPU.

11 The Need for Cache Memory
Widening speed gap between CPU and main memory: a processor operation takes less than 1 ns, while main memory requires about 100 ns to access. Each instruction involves at least one memory access: one to fetch the instruction, and a second for load and store instructions. Memory bandwidth limits the instruction execution rate. Cache memory, small in size but fast, can help bridge the CPU-memory gap.

12 Typical Memory Hierarchy
Registers are at the top of the hierarchy: typical size < 1 KB, access time < 0.5 ns. Level 1 cache (8–64 KB): access time 1 ns. L2 cache (512 KB – 8 MB): access time 3–10 ns. Main memory (4–16 GB): access time 50–100 ns. Disk storage (> 200 GB): access time 5–10 ms. Moving down the hierarchy (registers, L1 cache, L2 cache, main memory, magnetic or flash disk, connected by the memory bus and I/O bus), each level is bigger but slower.

13 Cache: The Basic Idea
Keep the frequently and recently used data in the smaller but faster memory, and refer to the bigger, slower memory only when you cannot find the data/instruction in the faster one. Why does it work? The Principle of Locality: a program accesses only a small portion of the memory address space within a small time interval.

14 Principle of Locality of Reference
Programs access a small portion of their address space: at any time, only a small set of instructions and data is needed.
Temporal locality (in time): if an item is accessed, it will probably be accessed again soon. The same loop instructions are fetched each iteration; the same procedure may be called and executed many times.
Spatial locality (in space): the tendency to access contiguous instructions/data in memory, e.g. sequential execution of instructions, or traversing arrays element by element.
Temporal locality (the loop body is re-fetched each iteration):
Loop: lw   $t0, 0($s1)
      add  $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $0, Loop
Spatial locality (sequential stack accesses):
      addi $sp, $sp, -16
      sw   $ra, 0($sp)
      sw   $s0, 4($sp)
      sw   $a0, 8($sp)
      sw   $a1, 12($sp)

15 Four Basic Questions on Caches
Q1: Where can a block be placed in a cache? Block placement: direct mapped, set associative, fully associative.
Q2: How is a block found in a cache? Block identification: block address, tag, index.
Q3: Which block should be replaced on a miss? Block replacement: FIFO, random, LRU.
Q4: What happens on a write? Write strategy: write back or write through (with a write buffer).

16 BASICS OF MEMORY HIERARCHIES
Cache memory is subdivided into cache lines (blocks). A cache line/block is the smallest unit of memory that can be transferred between main memory and the cache. Block size is typically one or more words, e.g. a 16-byte block is a 4-word block and a 32-byte block is an 8-word block. If the block size is 2^N bytes, the byte offset within a block is log2(2^N) = N bits.

17 BASICS OF MEMORY HIERARCHIES
Where to place blocks in the cache?
Direct-mapped cache: one block per set; each block is placed at the same single location. Number of blocks in cache = cache size / block size. Cache block # (index) = (block address) modulo (# of blocks in the cache).
Fully-associative cache: has one set; a block can go anywhere in the cache.
Set-associative cache: a set is a group of blocks, and a block is placed within a set; n blocks in a set means n-way set associative. Cache set # = block address mod # sets.

18 Direct Mapped Cache: Cache Index
For example, consider a 16-byte main memory and a 4-byte cache (four 1-byte blocks). Memory locations 0, 4, 8 and 12 all map to cache block 0; addresses 1, 5, 9 and 13 map to cache block 1, and so on. How can we compute this mapping? Cache block index = (block address) mod (# blocks in the cache); with the four-block cache here, address 14 maps to cache block 14 mod 4 = 2. Equivalently, with our four-byte cache we can inspect the two least-significant bits of the memory address: address 14 (1110 in binary) maps to cache block 2 (10 in binary). Taking the least-significant k bits of a binary value is the same as computing that value mod 2^k.
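As a small sketch (not from the slides), the mapping rule and the low-bits shortcut can be checked in Python:

```python
def cache_index(block_address, num_blocks):
    # Direct-mapped placement: index = block address mod number of blocks
    return block_address % num_blocks

# With the four-block cache, address 14 maps to block 2,
# and mod 2^k is the same as keeping the low k bits of the address.
print(cache_index(14, 4))   # 2
print(14 & 0b11)            # low 2 bits of 1110 -> 10 binary = 2
```

Note how addresses 0, 4, 8 and 12 all produce index 0, exactly as in the slide's picture.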

19 Example: Finding the Cache Block Number
Example: Given the following information: cache size = 32 KB, block size = 32 bytes, byte address = 8160. Find the cache block number (index) of this byte.
Cache block # (index) = (block address) modulo (# of blocks in the cache).
# blocks in cache = cache size / block size = 32 × 1024 / 32 = 1024 blocks.
Block address = byte address / bytes per block = 8160 / 32 = 255.
Cache block # = 255 mod 1024 = 255.
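The same arithmetic, written out as a quick Python check of the slide's numbers:

```python
cache_size = 32 * 1024                  # 32 KB
block_size = 32                         # bytes per block
num_blocks = cache_size // block_size   # 1024 blocks in the cache
block_address = 8160 // block_size      # 255
index = block_address % num_blocks      # 255 (fits without wrapping)
```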

20 BASICS OF MEMORY HIERARCHIES
How to find a block in the cache? We need to add tags to the cache, which supply the rest of the address bits to let us distinguish between different memory locations that map to the same cache block. Tag: part of the memory address, checked to know whether the block contains the required information. Tag number: tag = floor(block number / number of cache blocks).

21 Direct Mapped Cache: Mapping
For a cache with block size 2^N bytes and 2^M blocks, the 32-bit memory address divides into tag, index, and offset fields: offset = N bits (bits N−1..0), index = M bits (bits N+M−1..N), and tag = 32 − (N + M) bits (the remaining upper bits).

22 Example: Direct-Mapped Cache Addressing
Example: Show cache addressing for a byte-addressable memory with 32-bit addresses. Cache line (block) width W = 16 bytes; cache size L = 4096 lines (blocks), i.e. 64 KB.
Solution: byte offset in line is log2(16) = 4 bits; cache line index is log2(4096) = 12 bits; this leaves 32 − 12 − 4 = 16 bits for the tag.
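The field-width computation generalizes to any power-of-two cache. Below is a small helper (a sketch; `field_widths` is a name invented here, not from the slides) that reproduces the 16/12/4 split of this example:

```python
def field_widths(address_bits, block_bytes, num_blocks):
    # Assumes power-of-two block size and block count.
    offset = (block_bytes - 1).bit_length()   # log2(block size)
    index = (num_blocks - 1).bit_length()     # log2(# blocks in cache)
    tag = address_bits - index - offset       # remaining upper bits
    return tag, index, offset

print(field_widths(32, 16, 4096))   # (16, 12, 4)
```

The same call with 256 blocks gives the 20/8/4 split used in the next slide's example.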

23 Mapping an Address to a Cache Block
Example: Consider a direct-mapped cache with 256 blocks and block size = 16 bytes. Compute the tag, index, and byte offset of address 0x01FFF8AC.
Solution: the 32-bit address is divided into a 4-bit byte offset field (block size = 2^4 = 16 bytes), an 8-bit cache index (2^8 = 256 blocks in the cache), and a 20-bit tag field.
Byte offset = 0xC = 12 (least significant 4 bits of the address); cache index = 0x8A = 138 (next 8 bits); tag = 0x01FFF (upper 20 bits).
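Extracting the three fields is just shifting and masking; this sketch checks the slide's decomposition of 0x01FFF8AC:

```python
addr = 0x01FFF8AC
offset = addr & 0xF           # low 4 bits  -> 0xC (12)
index = (addr >> 4) & 0xFF    # next 8 bits -> 0x8A (138)
tag = addr >> 12              # upper 20 bits -> 0x01FFF
print(hex(tag), hex(index), hex(offset))
```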

24 The Valid Bit
When started, the cache is empty and does not contain valid data. We account for this by adding a valid bit to each cache block. When the system is initialized, all valid bits are set to 0; when data is loaded into a particular cache block, the corresponding valid bit is set to 1. So the cache contains more than just copies of the data in memory: it also has bits that help us find data within the cache and verify its validity.

25 What happens on a cache hit
When the CPU tries to read from memory, the address is sent to the cache controller. The lowest k bits of the address index a block in the cache. If the block is valid and the tag matches the upper (m − k) bits of the m-bit address, that data is sent to the CPU. (The slide's diagram shows a 32-bit memory address and a 2^10-byte cache.)
Cache hit: (Valid[index] = TRUE) AND (Tag[index] = Tag[memory address])

26 Example on Cache Placement and Misses
Consider a small direct-mapped cache with 32 blocks; the cache is initially empty (all valid bits = 0) and block size = 16 bytes, so the address divides into a 23-bit tag, 5-bit index, and 4-bit offset. The following memory addresses (in decimal) are referenced: 1000, 1004, 1008, 2548, 2552, 2556. Map the addresses to cache blocks and indicate whether each is a hit or a miss.
Solution:
1000 = 0x3E8: tag = 0x1, cache index = 0x1E, valid = 0 → miss (first access)
1004 = 0x3EC: tag = 0x1, cache index = 0x1E, valid = 1 → hit
1008 = 0x3F0: tag = 0x1, cache index = 0x1F, valid = 0 → miss (first access)
2548 = 0x9F4: tag = 0x4, cache index = 0x1F, valid = 1 → miss (different tag)
2552 = 0x9F8: tag = 0x4, cache index = 0x1F, valid = 1 → hit
2556 = 0x9FC: tag = 0x4, cache index = 0x1F, valid = 1 → hit
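A minimal direct-mapped cache simulator (a sketch, not the slides' code; tags stored in a dict stand in for the tag array and valid bits) reproduces this miss/hit sequence:

```python
def simulate_direct_mapped(addresses, num_blocks=32, block_size=16):
    cache = {}   # index -> stored tag; absence of an index means valid = 0
    results = []
    for addr in addresses:
        block = addr // block_size       # block address
        idx = block % num_blocks         # cache index
        tag = block // num_blocks        # tag
        results.append("hit" if cache.get(idx) == tag else "miss")
        cache[idx] = tag                 # load block on a miss (no-op on a hit)
    return results

print(simulate_direct_mapped([1000, 1004, 1008, 2548, 2552, 2556]))
```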

27 Example 1a: Cache Miss; Empty Block
[Diagram: the address's cache index selects a line whose valid bit is 0, so the tag comparison fails and the access is a cache miss.]

28 Example 1b: … Read in Data
[Diagram: on the miss, a new block of data is read from memory into the selected cache line, and the line's tag and valid bit are set.]

29 Example 1c: Cache Hit
[Diagram: the valid bit is 1 and the stored tag matches the address's tag, so the byte selected by the offset is delivered; a cache hit.]

30 Example 1d: Cache Miss; Incorrect Block
[Diagram: the selected line is valid, but its stored tag differs from the address's tag; a cache miss.]

31 Example 1e: … Replace Block
[Diagram: the old block is replaced by the new block of data fetched from memory, and the line's tag is updated.]

32 The Size of a Direct-Mapped Cache
Assuming a 32-bit address, a direct-mapped cache of size 2^n words (number of entries) with one-word (4-byte) blocks requires a tag field of 32 − (n + 2) bits, because 2 bits are used for the byte offset and n bits for the index. The total number of bits in a direct-mapped cache is 2^n × (block size + tag size + valid field size). Since the block size is one word (32 bits) and the address size is 32 bits, the number of bits in such a cache is 2^n × (32 + (32 − n − 2) + 1) = 2^n × (63 − n).

33 Example: Total Bits in a Direct-Mapped Cache
Example: How many total bits are required for a direct-mapped cache with 64 KB of data and one-word blocks, assuming a 32-bit address?
Answer: 64 KB is 64K/4 = 16K words, which is 2^10 × 2^4 = 2^14 words, and with a block size of one word, 2^14 blocks. Each block has 32 bits of data plus a tag of 32 − 14 − 2 bits, plus a valid bit. Thus the total cache size is 2^14 × (32 + (32 − 14 − 2) + 1) = 2^14 × 49 = 784 × 2^10 = 784 Kbits, or 784K/8 = 98 KB for a 64-KB cache. For this cache, the total number of bits is over 1.5 times as many as needed just for storing the data.

34 Fully Associative Cache
A block can be placed anywhere in the cache, so there is no indexing. If m blocks exist, then m comparators are needed to match the tag: every stored tag is compared against the address tag in parallel, and a multiplexer selects the matching block's data on a hit. Cache data size = m × 2^b bytes (with 2^b bytes per block).

35 Fully Associative Mapped Cache
Main memory size = m words, cache memory size = c words, block size = b words. No. of memory blocks = m/b; no. of cache blocks = c/b. Tag = log2(no. of main memory blocks); block offset = log2(b). The address consists of just the tag and the block offset.
Example: main memory size = 4 GB, cache memory size = 64 MB, block size = 16 bytes. No. of main memory blocks = 4 GB / 16 B = 2^2 × 2^30 / 2^4 = 2^28. Block offset = log2(16) = 4 bits. Tag = log2(2^28) = 28 bits.
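A quick numeric check of this fully-associative example (a sketch; variable names are invented here):

```python
import math

mem_bytes = 4 * 2**30                       # 4 GB main memory
block_bytes = 16
mem_blocks = mem_bytes // block_bytes       # 2^28 memory blocks
offset_bits = int(math.log2(block_bytes))   # 4-bit block offset
tag_bits = int(math.log2(mem_blocks))       # 28-bit tag
# tag + offset account for the whole 32-bit address: no index field at all
print(tag_bits, offset_bits)
```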

36 Set-Associative Cache
A set is a group of blocks that can be indexed: a block is first mapped onto a set, with set index = block address mod number of sets in cache. If there are m blocks in a set (m-way set associative), then m tags are checked in parallel using m comparators. If 2^n sets exist, the set index consists of n bits. Cache data size = m × 2^(n+b) bytes (with 2^b bytes per block), not counting tags and valid bits. A direct-mapped cache has one block per set (m = 1); a fully-associative cache has one set (2^n = 1, i.e. n = 0).

37 Set-Associative Cache
Let us assume that we have an 8-block cache. It can be arranged as one-way (direct mapped), two-way, or four-way set associative, or fully associative.

38 Set-Associative Cache Diagram
[Diagram: the address's index field selects a set; the tags of all blocks in the set are compared in parallel, and on a hit a multiplexer selects the matching block's data.]

39 Set-Associative Cache
[Diagram: a two-way set-associative cache holding 32 words of data in 4-word lines and 2-line sets. The word address divides into a tag, a 2-bit set index, and a 2-bit word offset within the line; the tag and the specified word are read from each way of the selected set and compared, data is delivered on a match (valid bit = 1 and tags equal), otherwise a cache miss is signaled.]

40 Example: Two-Way Set-Associative Addressing
Example: Show the cache addressing scheme for a byte-addressable memory with 32-bit addresses. Cache line width 2^W = 16 B; set size 2^S = 2 lines; cache size 2^L = 4096 lines (64 KB).
Solution: byte offset in line is log2(16) = 4 bits; cache set index is log2(4096/2) = 11 bits; this leaves 32 − 11 − 4 = 17 bits for the tag. The 32-bit address thus consists of a 17-bit line tag, an 11-bit set index, and a 4-bit byte offset in the line (block); the set index is used to read out the two candidate lines and their control info.

41 Example: Four-Way Set-Associative Cache
Example: A 64 KB four-way set-associative cache is byte-addressable and contains 32 B lines. Memory addresses are 32 bits wide.
a. How wide are the tags in this cache?
b. Which main memory addresses are mapped to set number 5?
Solution:
a. Cache size = 64 KB = 2^16 B; set size = 4 × 32 B = 128 B = 2^7 B; number of sets = 2^16/2^7 = 2^9; line width = 32 B = 2^5 B. So the 32-bit address = 5-bit byte offset + 9-bit set index + 18-bit tag, and tag width = 32 − 5 − 9 = 18 bits.
b. All addresses whose 9-bit set index field equals 5, i.e. bytes 160–191, 16544–16575, and so on (160 + 16384k through 191 + 16384k for any tag value k).

42 Example: Accessing A Set-Associative Cache
A 2-way set-associative cache contains 2 sets with 2 one-word blocks each. Find the number of misses given this sequence of memory block accesses: 0, 8, 0, 6, 8.
SA Memory Access 1: block 0 maps to set 0 (0 mod 2 = 0) → miss (cache empty); Mem[0] is placed in set 0. SA block replacement rule: replace the least recently used (LRU) block in the set.

43 Example: Accessing A Set-Associative Cache
SA Memory Access 2: block 8 maps to set 0 (8 mod 2 = 0) → miss; set 0 already holds Mem[0], so Mem[8] fills its second block.

44 Example: Accessing A Set-Associative Cache
SA Memory Access 3: block 0 maps to set 0 (0 mod 2 = 0); set 0 currently holds Mem[0] and Mem[8].

45 Example: Accessing A Set-Associative Cache
SA Memory Access 3 (result): hit; set 0, block 0 contains Mem[0].

46 Example: Accessing A Set-Associative Cache
SA Memory Access 4: block 6 maps to set 0 (6 mod 2 = 0); set 0 holds Mem[0] and Mem[8], and neither tag matches → miss.

47 Example: Accessing A Set-Associative Cache
SA Memory Access 4 (replacement): set 0, block 1 is the LRU entry, so Mem[8] is overwritten with Mem[6]; set 0 now holds Mem[0] and Mem[6].

48 Example: Accessing A Set-Associative Cache
SA Memory Access 5: block 8 maps to set 0 (8 mod 2 = 0); set 0 holds Mem[0] and Mem[6], and neither tag matches → miss.

49 Example: Accessing A Set-Associative Cache
SA Memory Access 5 (replacement): set 0, block 0 is the LRU entry, so Mem[0] is overwritten with Mem[8]; set 0 now holds Mem[8] and Mem[6]. Total for the sequence 0, 8, 0, 6, 8: four misses and one hit.
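The whole five-access walkthrough can be replayed with a small LRU simulator (a sketch, not the slides' code; each set is a Python list ordered from LRU to MRU):

```python
def simulate_2way_lru(blocks, num_sets=2):
    sets = {s: [] for s in range(num_sets)}   # set index -> [LRU ... MRU]
    results = []
    for b in blocks:
        ways = sets[b % num_sets]             # set index = block mod #sets
        if b in ways:
            ways.remove(b)                    # hit: move block to MRU position
            ways.append(b)
            results.append("hit")
        else:
            if len(ways) == 2:                # set full: evict the LRU block
                ways.pop(0)
            ways.append(b)
            results.append("miss")
    return results

print(simulate_2way_lru([0, 8, 0, 6, 8]))   # four misses, one hit
```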

50 Which Block Should Be Replaced on a Miss?
The cache can't hold all the data found in the lower level. Direct mapped: no choice. Set-associative and fully-associative caches need a replacement policy:
Least recently used (LRU): choose the block unused for the longest time. Simple for 2-way, manageable for 4-way, too hard beyond that.
Random: gives approximately the same performance as LRU for high associativity.
FIFO: approximates LRU.

51 Memory Access Time: Terminology
Hit: the data is in the cache. Hit rate: fraction of memory accesses that hit. Hit time: time to access the cache.
Miss: the data is not in the cache. Miss rate = 1 − hit rate. Miss penalty: time to replace a cache block and deliver the data.
Hit time < miss penalty.

52 Hit Rate and Miss Rate
Hit rate = hits / (hits + misses); miss rate = misses / (hits + misses). I-cache miss rate = miss rate in the instruction cache; D-cache miss rate = miss rate in the data cache.
Example: out of 1000 instructions fetched, 150 missed in the I-cache; 25% are load-store instructions, of which 50 missed in the D-cache. What are the I-cache and D-cache miss rates?
I-cache miss rate = 150 / 1000 = 15%. D-cache miss rate = 50 / (25% × 1000) = 50 / 250 = 20%.

53 Memory Access Time: Formula
Average access time = hit time + miss rate × miss penalty.
Average memory access time = L1 hit time + L1 miss rate × L1 miss penalty, where
L1 miss penalty = L2 hit time + L2 miss rate × L2 miss penalty, and
L2 miss penalty = L3 hit time + L3 miss rate × L3 miss penalty.
Example: in 1000 memory references there are 40 misses in L1 and 20 misses in L2. Assume all are reads with no bypass. L1: hit time = 1 cycle. L2: hit time = 10 cycles, miss penalty = 100 cycles. What is the average memory access time?
Answer: L1 local miss rate = L1 misses / total references = 40/1000 = 4%. L2 local miss rate = L2 misses / L1 misses = 20/40 = 50%.
Average memory access time = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × L2 miss penalty) = 1 + 4% × (10 + 50% × 100) = 3.4 clock cycles.

