Address – 32 bits WRITE Write Cache Write Main Byte Offset Tag Index Valid Tag Data 16K entries 16.


1

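2 [Diagram: WRITE to a direct-mapped cache with 16K entries. The 32-bit address is split into Tag, Index, and Byte Offset. Each entry holds Valid, Tag, and Data; a tag comparator (=) produces Hit. The write goes to both the cache and main memory.]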

3 Write-Through Performance Improvement
Every write goes to both the cache and main memory. Writes can be 10% to 15% of instructions.

4-10 Write-Through Performance Improvement
Consider a write buffer.
[Diagram: Processor, Cache, Write Buffer, Main Memory. Each write buffer entry holds Address, Data, and Valid.]
The processor writes to the cache and to the write buffer, then continues.
The memory controller writes the data from the buffer to main memory and releases the buffer entry.
The processor continues until:
1. The write buffer is full (Write Miss - HOLD).
2. A read miss occurs: wait until the write buffer is empty.
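
The write-buffer behavior described on these slides can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the class name `WriteBuffer`, the capacity of 2 entries, and the use of a dict as main memory are all hypothetical.

```python
from collections import deque


class WriteBuffer:
    """FIFO of pending (address, data) writes between the processor and main memory."""

    def __init__(self, capacity=4):  # capacity is an assumed value
        self.capacity = capacity
        self.entries = deque()

    def push(self, address, data):
        """Processor side: return False (HOLD) when the buffer is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((address, data))
        return True

    def drain_one(self, main_memory):
        """Memory-controller side: write the oldest entry to memory and release it."""
        if self.entries:
            address, data = self.entries.popleft()
            main_memory[address] = data

    def drain_all(self, main_memory):
        """On a read miss: wait until the write buffer is empty."""
        while self.entries:
            self.drain_one(main_memory)


main_memory = {}
buf = WriteBuffer(capacity=2)
# Three back-to-back writes with no draining in between: the third must hold.
accepted = [buf.push(addr, addr * 10) for addr in (0, 4, 8)]
buf.drain_all(main_memory)
```

The third write is refused because nothing drained the buffer in between, which corresponds to the "buffer full" hold condition; `drain_all` models the wait that precedes a read miss.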

11-14 Consider a Cache with a block of several adjacent words.
Read Miss: fetch a block of multiple adjacent words, which replaces a block in cache.
This predicts that if a location is accessed, then the locations in the block will be used soon (increased use of spatial locality).
Cache entry, 4-word block: Index | Valid | Tag | Word 3 | Word 2 | Word 1 | Word 0
Shared Valid and Tag: more efficient use of memory.

15-18 [Diagram: direct-mapped cache with 4-word blocks. The address is split into Tag (16 bits), Index (12 bits), Block Offset (2 bits), and Byte Offset. The Index selects one of 4K entries (blocks), each holding v (valid), Tag, Word3, Word2, Word1, Word0. The stored 16-bit Tag is compared (=) with the address Tag to produce Hit; the Block Offset drives a Mux that selects one 32-bit word as Data.]

19-23 Consider this 4K (4096) entry cache with a block of 4 words or 16 bytes.
For a byte address of 131408, what is the block number?
Block Number = Address of Cache = Index
Address = 131408 (byte)
Block address = 131408 / 16 bytes/block = 8213 (left 28 bits of address)
Block Number = (Block Addr) modulo (No. of cache blocks) = 8213 modulo 4096
8213 - 4096 = 4117; 4117 - 4096 = 21
Block Number = 21
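
As a check on the block-number arithmetic, the address split can be written out in Python. This is a minimal sketch: the function name `split_address` and the returned dict layout are hypothetical, while the field widths (16-bit tag, 12-bit index, 2-bit block offset, 2-bit byte offset) follow the 4K-entry, 4-word-block cache on the slides.

```python
BYTE_OFFSET_BITS = 2    # 4 bytes per word
BLOCK_OFFSET_BITS = 2   # 4 words per block
INDEX_BITS = 12         # 4096 cache entries (blocks)


def split_address(address):
    """Split a 32-bit byte address into the cache fields used on the slides."""
    byte_offset = address & ((1 << BYTE_OFFSET_BITS) - 1)
    block_offset = (address >> BYTE_OFFSET_BITS) & ((1 << BLOCK_OFFSET_BITS) - 1)
    # Block address = byte address / 16 bytes per block (the left 28 bits).
    block_address = address >> (BYTE_OFFSET_BITS + BLOCK_OFFSET_BITS)
    # Block number (index) = block address modulo number of cache blocks.
    index = block_address % (1 << INDEX_BITS)
    tag = block_address >> INDEX_BITS
    return {
        "tag": tag,
        "index": index,
        "block_offset": block_offset,
        "byte_offset": byte_offset,
        "block_address": block_address,
    }


fields = split_address(131408)
```

For byte address 131408 this gives block address 8213 and index 21, matching the subtraction done by hand.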

24 [Diagram: the 4K-entry, 4-word-block cache with the example address. Byte address 131408 is block address 8213; its Index field selects entry 21. The 16-bit Tag is compared (=) for Hit, and the 2-bit Block Offset drives the Mux that outputs the 32-bit Data.]

25 READ
[Diagram: the same 4K-entry cache. The Index selects an entry, the 16-bit Tag compare (=) produces Hit, and the Mux selects the 32-bit Data word.]

26 READ MISS
Load Cache with 4 Words, Tag and Valid.
[Diagram: the same 4K-entry cache.]

27 WRITE WORD
[Diagram: the same 4K-entry cache.]

28-30 Write Word for Multiword Cache Block (Write-Through)
Procedure:
1. Write the data word to cache and compare tags.
2. If Hit, done. Go to 4.
3. If not Hit (Write Miss): load the block from main memory to cache, then write the data word to cache.
4. Write the data word to main memory.
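
The four-step write procedure can be sketched as follows. This is an illustrative model, not the hardware: the class name `WriteThroughCache`, the use of word addresses rather than byte addresses, and the dict standing in for main memory (missing words read as 0) are all assumptions made for the example.

```python
WORDS_PER_BLOCK = 4
NUM_BLOCKS = 4096


class WriteThroughCache:
    """Direct-mapped, write-through cache with 4-word blocks (word-addressed)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: word address -> value
        self.valid = [False] * NUM_BLOCKS
        self.tags = [0] * NUM_BLOCKS
        self.data = [[0] * WORDS_PER_BLOCK for _ in range(NUM_BLOCKS)]

    def write_word(self, word_address, value):
        block_address, block_offset = divmod(word_address, WORDS_PER_BLOCK)
        tag, index = divmod(block_address, NUM_BLOCKS)

        # Step 1: write the data word to cache and compare tags.
        self.data[index][block_offset] = value
        hit = self.valid[index] and self.tags[index] == tag

        # Step 3: on a write miss, load the block from main memory,
        # then write the data word to cache again.
        if not hit:
            base = block_address * WORDS_PER_BLOCK
            self.data[index] = [
                self.main_memory.get(base + i, 0) for i in range(WORDS_PER_BLOCK)
            ]
            self.tags[index] = tag
            self.valid[index] = True
            self.data[index][block_offset] = value

        # Step 4: write the data word to main memory (write-through).
        self.main_memory[word_address] = value
        return hit


memory = {32852: 7, 32853: 8}   # word addresses inside block address 8213
cache = WriteThroughCache(memory)
first = cache.write_word(32852, 99)   # cold cache: write miss
second = cache.write_word(32853, 5)   # same block: write hit
```

The second write to the block is needed after a miss because the block fetched from memory overwrites the word written in step 1.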

31-35 Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty
For a constant-size cache:
[Graph: Miss Penalty vs. Block Size. Miss Penalty is made up of Access Time plus Transfer Time.]
[Graph: Miss Rate vs. Block Size, annotated "Fewer Blocks".]
[Graph: Average Access Time vs. Block Size.]
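
The average-access-time formula is easy to evaluate for a few block sizes. In the sketch below the function name and all of the (block size, miss rate, miss penalty) points are hypothetical values chosen only to illustrate the trade-off; they are not taken from the slides or from any measurement.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = Hit Time + Miss Rate * Miss Penalty (times in clock cycles)."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical (block size in words, miss rate, miss penalty in cycles) points.
candidates = [(1, 0.060, 12), (4, 0.020, 15), (16, 0.015, 27), (64, 0.020, 75)]

amat_by_block_size = {
    words: average_memory_access_time(1, miss_rate, penalty)
    for words, miss_rate, penalty in candidates
}
best_block_size = min(amat_by_block_size, key=amat_by_block_size.get)
```

With these made-up numbers the minimum falls at an intermediate block size: very small blocks miss too often, and very large blocks pay too much per miss.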

36-37 DECStation 3100
Columns: Program | Block Size | Instruction Miss Rate | Data Miss Rate | Effective Miss Rate
Rows: gcc and spice (the percentage values are not preserved in this transcript).
Write misses are included for the 4-word block, but not for the 1-word block.
Remember: Miss Penalty goes UP!


39-43 Reducing the Miss Penalty
Reduce the time to read the multiple words from main memory to the cache block.
Don't wait for the complete block to be transferred: "Early Restart".
Access and transfer each word sequentially. As soon as the requested word is in cache, restart the processor to access cache, and finish the block transfer while the cache is available.
Variation: "Requested Word First".
Disadvantages: complex control; the processor is likely to access the cache block before the transfer is complete.

44-45 Reducing the Miss Penalty
Reduce the time to read the multiple words from main memory to the cache block.
Assume memory access times:
1 clock cycle to send the address
10 clock cycles to access DRAM
1 clock cycle to send a word of data
For sequential transfer of 4 data words:
Miss Penalty = 1 + 4 * (10 + 1) = 45 clock cycles

46-47 What if we could read a block of words simultaneously from the Main Memory?
[Diagram: cache entry (Valid, Tag, Word3, Word2, Word1, Word0) connected to Main Memory.]
Miss Penalty = 1 + 10 + 1 = 12 clock cycles
Miss Penalty for Sequential = 45 clock cycles

48-49 What about 4 banks of Memory? "Interleaved Memory"
[Diagram: Cache connected to Bank 3, Bank 2, Bank 1, Bank 0, which share the Address.]
Banks are accessed in parallel; words are transferred serially.
Miss Penalty = 1 + 10 + 4 * 1 = 15 clock cycles
Miss Penalty for Parallel = 12 clock cycles
Miss Penalty for Sequential = 45 clock cycles
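
The three miss penalties (sequential, parallel, interleaved) come from the same three timing assumptions, so they can be computed side by side. The function names below are hypothetical; the cycle counts are the ones assumed on the slides.

```python
SEND_ADDRESS = 1    # clock cycles to send the address
DRAM_ACCESS = 10    # clock cycles to access DRAM
SEND_WORD = 1       # clock cycles to send one word of data
WORDS_PER_BLOCK = 4


def sequential_penalty(words=WORDS_PER_BLOCK):
    """One-word-wide memory: each word needs its own access and transfer."""
    return SEND_ADDRESS + words * (DRAM_ACCESS + SEND_WORD)


def parallel_penalty():
    """Block-wide memory: one access and one transfer for the whole block."""
    return SEND_ADDRESS + DRAM_ACCESS + SEND_WORD


def interleaved_penalty(words=WORDS_PER_BLOCK):
    """One bank per word: accesses overlap, words are transferred serially."""
    return SEND_ADDRESS + DRAM_ACCESS + words * SEND_WORD


penalties = {
    "sequential": sequential_penalty(),
    "parallel": parallel_penalty(),
    "interleaved": interleaved_penalty(),
}
```

Interleaving recovers most of the benefit of the block-wide memory because only the transfers, not the DRAM accesses, are serialized.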

50 Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty
Increase cache size
Increase block size
Main memory organization
[Graph: Average Access Time vs. Block Size.]

51-54 CPU Performance with Cache Memory
For a program (assuming no penalty for a hit):
CPU time = CPU execution time + CPU Hold time
CPU Hold time = Memory Stall Clock Cycles * Clock Cycle time
Memory Stall Clock Cycles = Read Stall Cycles + Write Stall Cycles
Read Stall Cycles = (Reads / Program) * Read Miss Rate * Read Miss Penalty

55-57 CPU Performance with Cache Memory
Write Stall Cycles = (Writes / Program) * Write Miss Rate * Write Miss Penalty + Write Buffer Stalls
Write Buffer Stalls should be << Write Miss Stalls, so, approximately, the Write Buffer Stalls term can be dropped.

58-59 CPU Performance with Cache Memory
Memory Stall Clock Cycles = Read Stall Cycles + Write Stall Cycles
= (Reads / Program) * Read Miss Rate * Read Miss Penalty + (Writes / Program) * Write Miss Rate * Write Miss Penalty
The miss penalties are approximately the same (fetch the block). So, combining the reads and writes into a weighted miss rate:
Memory Stall Cycles = (Memory Accesses / Program) * Miss Rate * Miss Penalty

60-61 CPU Performance with Cache Memory
For a program (assuming no penalty for a hit):
CPU time = CPU execution time + CPU Hold time
CPU Hold time = Memory Stall Clock Cycles * Clock Cycle time
CPU time = CPU execution time + (Memory Accesses / Program) * Miss Rate * Miss Penalty * Clock Cycle time
Dividing both sides by (Instructions / Program) and by Clock Cycle time:
Effective CPI = Execution CPI + (Memory Accesses / Instruction) * Miss Rate * Miss Penalty

62-65 CPU Performance with Cache Memory
Consider the DECStation 3100 with 4-word blocks running spice:
CPI = 1.2 without misses
Instruction Miss Rate = 0.3%
Data Miss Rate = 0.6%
For spice, frequency of loads and stores = 9%
1.) Sequential Memory: Miss penalty = 65 clock cycles
2.) 4-Bank Interleaved: Miss penalty = 20 clock cycles
Effective CPI = Execution CPI + (Memory Accesses / Instruction) * Miss Rate * Miss Penalty
Eff CPI = 1.2 + (1 * 0.003 + 0.09 * 0.006) * Miss Penalty = 1.2 + 0.00354 * Miss Penalty
1.) Eff CPI = 1.2 + 0.00354 * 65 = 1.2 + 0.230 = 1.43
2.) Eff CPI = 1.2 + 0.00354 * 20 = 1.2 + 0.071 = 1.271
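
The effective-CPI calculation for the two memory organizations can be reproduced directly from the figures given for spice. The function name and parameter names below are hypothetical; the inputs are the slide's values (one instruction fetch per instruction plus 0.09 loads and stores per instruction).

```python
def effective_cpi(execution_cpi, instr_miss_rate, data_miss_rate,
                  data_refs_per_instr, miss_penalty):
    """Effective CPI = Execution CPI + misses per instruction * Miss Penalty."""
    # Every instruction is fetched once; only some also access data.
    misses_per_instr = 1 * instr_miss_rate + data_refs_per_instr * data_miss_rate
    return execution_cpi + misses_per_instr * miss_penalty


spice = dict(execution_cpi=1.2, instr_miss_rate=0.003,
             data_miss_rate=0.006, data_refs_per_instr=0.09)

cpi_sequential = effective_cpi(miss_penalty=65, **spice)
cpi_interleaved = effective_cpi(miss_penalty=20, **spice)
```

Rounded, these agree with the 1.43 and 1.271 obtained by hand.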

66-68 CPU Performance with Cache Memory
Consider the DECStation 3100 with 4-word blocks running spice:
CPI = 1.2 without misses
Instruction Miss Rate = 0.3%
Data Miss Rate = 0.6%
For spice, frequency of loads and stores = 9%
4-Bank Interleaved: Miss penalty = 20 clock cycles
Eff CPI = 1.271 clock cycles
What if we get a new processor and cache that run at twice the clock frequency, but keep the same main memory speed?
Miss penalty = 40 clock cycles
Eff CPI = 1.2 + 0.00354 * 40 = 1.2 + 0.142 = 1.342
Performance (fast clock) / Performance (slow clock) = (1.271 * 2 * clock cycle time) / (1.342 * clock cycle time) = 1.89
Here "clock cycle time" is the fast clock's cycle time, so the slow clock's cycle time is 2 * clock cycle time.
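
The clock-doubling comparison can be checked numerically. In this sketch the variable names are hypothetical and the slow clock's cycle time is normalized to 1.0 (an arbitrary unit); the miss rates and penalties are the slide's values.

```python
MISSES_PER_INSTRUCTION = 1 * 0.003 + 0.09 * 0.006   # = 0.00354
EXECUTION_CPI = 1.2

# Same main memory, so doubling the clock frequency doubles the penalty in cycles.
slow_cpi = EXECUTION_CPI + MISSES_PER_INSTRUCTION * 20
fast_cpi = EXECUTION_CPI + MISSES_PER_INSTRUCTION * 40

slow_cycle_time = 1.0                   # arbitrary time unit
fast_cycle_time = slow_cycle_time / 2   # twice the clock frequency

# Time per instruction = CPI * cycle time; performance is its inverse.
speedup = (slow_cpi * slow_cycle_time) / (fast_cpi * fast_cycle_time)
```

The result is below 2 because the miss penalty, measured in cycles, doubles when the memory speed stays the same.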

