DECStation 3100 miss rates

Program   Block Size   Instruction   Data        Effective
          (words)      Miss Rate     Miss Rate   Miss Rate
gcc       1            6.1%          2.1%        5.4%
gcc       4            2.0%          1.7%        1.9%
spice     1            1.2%          1.3%        1.2%
spice     4            0.3%          0.6%        0.4%

Write misses are included in the 4-word-block figures, but not in the 1-word-block figures.
Remember: the miss penalty goes UP as the block size grows!

Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty

[Figure: two plots against Block Size. Miss Penalty vs. Block Size (labels: Access Time, Transfer Time) and Miss Rate vs. Block Size (labels: Constant Size Cache, Fewer Blocks).]
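The formula above can be evaluated directly. The following is a minimal sketch; the function name and the sample numbers are illustrative assumptions, not values from the slides.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (times in clock cycles)."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical inputs: 1-cycle hit, 2% miss rate, 45-cycle miss penalty.
print(average_memory_access_time(hit_time=1, miss_rate=0.02, miss_penalty=45))
```

Because the miss rate and the miss penalty both depend on block size, changing the block size moves two terms of this sum at once.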

Reducing the Miss Penalty

Reduce the time to read the multiple words from main memory into the cache block.

Don't wait for the complete block to be transferred: "Early Restart"
  Access and transfer each word sequentially.
  As soon as the requested word is in the cache, restart the processor so it can access the cache, and finish the block transfer while the cache is available.
  Variation: "Requested Word First"
  Disadvantages: complex control; the processor is likely to access the cache block before the transfer is complete.

Assume the following memory access times:
  1 clock cycle to send the address
  10 clock cycles to access DRAM
  1 clock cycle to send a word of data

For sequential transfer of 4 data words:
  Miss Penalty = 1 + 4 * (10 + 1) = 45 clock cycles

What if we could read a block of words simultaneously from main memory?

[Figure: a cache entry (Valid, Tag, Word3, Word2, Word1, Word0) connected to main memory; data paths labeled 32.]

  Miss Penalty = 1 + 10 + 1 = 12 clock cycles
  Miss Penalty for sequential = 45 clock cycles

What about 4 banks of memory? "Interleaved Memory"

[Figure: the cache connected to Bank 3, Bank 2, Bank 1, and Bank 0, all driven by the same address.]

Banks are accessed in parallel; words are transferred serially.

  Miss Penalty = 1 + 10 + 4 * 1 = 15 clock cycles
  Miss Penalty for parallel = 12 clock cycles
  Miss Penalty for sequential = 45 clock cycles
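The three memory organizations differ only in which terms are paid once per block and which are paid once per word. Here is a small sketch of the three formulas; the function and parameter names are assumptions made for illustration, and the defaults are the timings assumed on the slides (1 cycle to send the address, 10 to access DRAM, 1 to send a word).

```python
def sequential_penalty(words, addr=1, access=10, transfer=1):
    """One address send, then a full access + transfer for every word."""
    return addr + words * (access + transfer)


def wide_penalty(addr=1, access=10, transfer=1):
    """Whole block read and transferred at once over a wide path."""
    return addr + access + transfer


def interleaved_penalty(words, addr=1, access=10, transfer=1):
    """Banks are accessed in parallel; words are then sent one per cycle."""
    return addr + access + words * transfer


print("sequential :", sequential_penalty(4))
print("wide       :", wide_penalty())
print("interleaved:", interleaved_penalty(4))
```

Passing access=15 instead of the default gives 65 cycles for the sequential case and 20 for the interleaved case, the same miss penalties that appear in the CPI example later in the deck.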

Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty

[Figure: Average Access Time vs. Block Size]

What affects it:
  Increase cache size
  Increase block size
  Main memory organization

CPU Performance with Cache Memory

For a program (assuming no penalty for a hit):

  CPU time = CPU execution time + CPU Hold time
  CPU Hold time = Memory Stall Clock Cycles * Clock Cycle time
  Memory Stall Clock Cycles = Read Stall Cycles + Write Stall Cycles
  Read Stall Cycles = (Reads / Program) * Read Miss Rate * Read Miss Penalty

  Write Stall Cycles = (Writes / Program) * Write Miss Rate * Write Miss Penalty + Write Buffer Stalls

Write Buffer Stalls should be << Write Miss Stalls, so, approximately:

  Write Stall Cycles = (Writes / Program) * Write Miss Rate * Write Miss Penalty

  Memory Stall Clock Cycles = Read Stall Cycles + Write Stall Cycles
                            = (Reads / Program) * Read Miss Rate * Read Miss Penalty
                            + (Writes / Program) * Write Miss Rate * Write Miss Penalty

The miss penalties are approximately the same (fetch the block). So, combining the reads and writes into a weighted miss rate:

  Memory Stall Cycles = (Memory Accesses / Program) * Miss Rate * Miss Penalty

Substituting the memory stall cycles into CPU time = CPU execution time + CPU Hold time (still assuming no penalty for a hit):

  CPU time = CPU execution time + (Memory Accesses / Program) * Miss Rate * Miss Penalty * Clock Cycle time

Dividing both sides by Instructions / Program and by Clock Cycle time:

  Effective CPI = Execution CPI + (Memory Accesses / Instruction) * Miss Rate * Miss Penalty

Example: the DECStation 3100 with 4-word blocks running spice

  CPI = 1.2 without misses
  Instruction Miss Rate = 0.3%
  Data Miss Rate = 0.6%
  For spice, frequency of loads and stores = 9%
  1.) Sequential memory: Miss Penalty = 65 clock cycles
  2.) 4-bank interleaved: Miss Penalty = 20 clock cycles

  Eff CPI = 1.2 + (1 * 0.003 + 0.09 * 0.006) * Miss Penalty
          = 1.2 + 0.00354 * Miss Penalty

  1.) Eff CPI = 1.2 + 0.00354 * 65 = 1.2 + 0.2301 = 1.43
  2.) Eff CPI = 1.2 + 0.00354 * 20 = 1.2 + 0.071 = 1.271
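The effective-CPI calculation can be checked with a few lines of code. This is a sketch using the spice numbers from the slide; the function and parameter names are assumptions. Every instruction makes one instruction fetch, and 9% of instructions also make a data access.

```python
def effective_cpi(base_cpi, inst_miss_rate, data_miss_rate,
                  data_refs_per_inst, miss_penalty):
    """Effective CPI = execution CPI + misses per instruction * miss penalty."""
    # One instruction fetch per instruction, plus the data references.
    misses_per_inst = 1 * inst_miss_rate + data_refs_per_inst * data_miss_rate
    return base_cpi + misses_per_inst * miss_penalty


for label, penalty in [("sequential", 65), ("4-bank interleaved", 20)]:
    cpi = effective_cpi(1.2, 0.003, 0.006, 0.09, penalty)
    print(f"{label}: Eff CPI = {cpi:.3f}")
```

After rounding, the two results should agree with the 1.43 and 1.271 worked out on the slide.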

What if we get a new processor and cache that run at twice the clock frequency, but keep the same main memory speed?

  Original clock, 4-bank interleaved: Miss Penalty = 20 clock cycles, Eff CPI = 1.271
  Doubled clock: Miss Penalty = 40 clock cycles
  Eff CPI = 1.2 + 0.00354 * 40 = 1.2 + 0.1416 = 1.342

  Performance (fast clock) / Performance (slow clock)
    = (1.271 * 2 * clock cycle time) / (1.342 * clock cycle time)
    = 1.89

Doubling the clock rate therefore gives a speedup of only about 1.89, not 2.
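The clock-doubling comparison can be sketched as follows. The key assumption, taken from the slide, is that main memory speed is unchanged, so the miss penalty measured in clock cycles scales with the clock rate. The constant and function names are illustrative, and the slow clock's cycle time is normalized to 1.

```python
BASE_CPI = 1.2                    # CPI without misses (from the slide)
MISSES_PER_INSTRUCTION = 0.00354  # 1 * 0.003 + 0.09 * 0.006 (from the slide)


def time_per_instruction(clock_multiplier, base_penalty=20):
    """Time per instruction in units of the original clock cycle."""
    cycle_time = 1.0 / clock_multiplier
    # Same memory latency in seconds means more cycles on a faster clock.
    miss_penalty = base_penalty * clock_multiplier
    cpi = BASE_CPI + MISSES_PER_INSTRUCTION * miss_penalty
    return cpi * cycle_time


speedup = time_per_instruction(1) / time_per_instruction(2)
print(f"speedup from doubling the clock: {speedup:.2f}")
```

The gap between this speedup and the ideal factor of 2 widens as the miss penalty grows, because stall cycles make up a larger share of the effective CPI.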

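[Figure: direct-mapped cache with 4-word blocks and 4K entries. The 32-bit address is split into Tag (bits 31..16, 16 bits), Index (bits 15..4, 12 bits), Block Offset (bits 3..2), and Byte Offset (bits 1..0). Each entry holds a valid bit (v), a Tag, and Word3..Word0. A 16-bit tag comparison produces Hit, and a mux controlled by the 2-bit block offset selects one 32-bit word as Data.]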
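The address fields in this cache (tag in bits 31..16, index in bits 15..4, block offset in bits 3..2, byte offset in bits 1..0) can be extracted with shifts and masks. A minimal sketch, with a hypothetical function name and an arbitrary sample address:

```python
def split_address(addr):
    """Split a 32-bit byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr & 0x3            # bits 1..0: byte within the word
    block_offset = (addr >> 2) & 0x3    # bits 3..2: word within the block
    index = (addr >> 4) & 0xFFF         # bits 15..4: one of 4K entries
    tag = (addr >> 16) & 0xFFFF         # bits 31..16: compared on lookup
    return tag, index, block_offset, byte_offset


tag, index, block_offset, byte_offset = split_address(0x12345678)
print(hex(tag), hex(index), block_offset, byte_offset)  # 0x1234 0x567 2 0
```

The index selects the entry, the stored tag is compared with the address tag to decide Hit, and the block offset drives the mux that picks one of the four words.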

Consider a direct-mapped cache with 4-word blocks and a size of 8 blocks (32 words).

Reference sequence (word addresses): 6, 7, 8, 9, 80, 6, 7, 8, 9, 81

For each reference, find the block address, the cache address, and whether it is a hit or a miss.

Mapping word addresses to block addresses and cache addresses:

Block Address   Word Addresses                 Cache Address
0               3     2     1     0            0
1               7     6     5     4            1
2               11    10    9     8            2
3               15    14    13    12           3
7               31    30    29    28           7
8               35    34    33    32           0
15              63    62    61    60           7
X               4X+3  4X+2  4X+1  4X           X modulo 8

Block Address = Word Addr / 4 (integer division)

Direct-mapped cache, 4-word blocks, 8 blocks (32 words):

Cache Address = (Word Addr / 4) modulo 8

Word Address   Block Address   Cache Address   Hit or Miss
6              1               1               Miss
7              1               1               Hit
8              2               2               Miss
9              2               2               Hit
80             20              4               Miss
6              1               1               Hit
7              1               1               Hit
8              2               2               Hit
9              2               2               Hit
81             20              4               Hit

Same direct-mapped cache, but with word addresses 68 and 69 in place of 80 and 81:

Cache Address = (Word Addr / 4) modulo 8

Word Address   Block Address   Cache Address   Hit or Miss
6              1               1               Miss
7              1               1               Hit
8              2               2               Miss
9              2               2               Hit
68             17              1               Miss
6              1               1               Miss
7              1               1               Hit
8              2               2               Hit
9              2               2               Hit
69             17              1               Miss

Block 17 maps to the same cache address as block 1, so the two blocks keep evicting each other.
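Both reference sequences can be replayed with a short simulation. This is a sketch under the slide's parameters (4-word blocks, 8 blocks); the function name and the tuple layout of the results are assumptions made for illustration.

```python
def simulate_direct_mapped(word_addresses, words_per_block=4, num_blocks=8):
    """Return (word, block address, cache address, 'Hit'/'Miss') per reference."""
    cache = [None] * num_blocks  # block address currently held in each slot
    results = []
    for word in word_addresses:
        block = word // words_per_block   # block address = word addr / 4
        slot = block % num_blocks         # cache address = block modulo 8
        hit = cache[slot] == block
        if not hit:
            cache[slot] = block           # fetch the block, evicting the old one
        results.append((word, block, slot, "Hit" if hit else "Miss"))
    return results


for sequence in ([6, 7, 8, 9, 80, 6, 7, 8, 9, 81],
                 [6, 7, 8, 9, 68, 6, 7, 8, 9, 69]):
    for row in simulate_direct_mapped(sequence):
        print(*row)
    print()
```

In the first sequence block 20 lands in slot 4 and disturbs nothing; in the second, block 17 shares slot 1 with block 1, which produces the extra misses.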

How about putting a block in any unused block of the eight blocks?

Each entry: Tag, Word3, Word2, Word1, Word0

How can you find it? Expand the tag to the full block address and compare.

Fully Associative Memory: addressed by its contents.
The block address is 28 bits of the address.

For a practical hit time, the tag must be compared with the block address in all entries in parallel. This is only feasible for a small number of blocks.

[Figure: the address supplies a 28-bit block address, a block offset, and a byte offset. Each Tag/Data entry has its own comparator (=) against the block address; the comparator outputs are combined (+) to produce Hit, and a mux delivers Data. The block offset selects the word. Valid bit not shown.]

The hardware is not feasible for a large cache.
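A software model of a fully associative lookup makes the cost visible: every entry's tag has to be checked against the block address. The sketch below checks entries one after another, whereas the hardware uses one comparator per entry so that all comparisons happen at once. The entry layout (valid, tag, words) and the function names are assumptions for illustration.

```python
def find_block(entries, block_address):
    """Return the data words of the entry whose tag matches, or None on a miss."""
    for valid, tag, words in entries:
        if valid and tag == block_address:
            return words
    return None


def read_word(entries, byte_address):
    """Look up a 32-bit byte address in a fully associative cache model."""
    block_address = byte_address >> 4          # upper 28 bits form the tag
    block_offset = (byte_address >> 2) & 0x3   # selects the word in the block
    words = find_block(entries, block_address)
    return None if words is None else words[block_offset]


# Hypothetical contents: one valid entry holding block 5, one invalid entry.
entries = [(True, 5, [10, 11, 12, 13]), (False, 7, [0, 0, 0, 0])]
print(read_word(entries, 5 * 16 + 2 * 4))  # word 2 of block 5 -> 12
print(read_word(entries, 7 * 16))          # entry is invalid -> None
```

The loop length grows with the number of blocks, which in hardware corresponds to the number of comparators and is why this organization suits only small caches.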

Make sets of blocks associative: two-way set associative

[Figure: two arrays, Tag0/Data0 and Tag1/Data1, each with rows 0, 1, ..., 2^k - 1 selected by the Index. The address is split into Tag, Index, Block Offset, and Byte Offset. Valid bit not shown.]

Address the set by Index, then compare the two tags in parallel for a hit.

Block replacement strategies

For each Index there are 2, 4, ... n options for replacement.

Strategies:
  1. LRU (Least Recently Used): replace the block that has been unused for the longest time.
  2. Random: select the block to be replaced randomly.

Consider a two-way set-associative cache with 4-word blocks and a size of 8 blocks (32 words), that is, 4 sets of 2 entries.

Cache Address (Set) = (Word Addr / 4) modulo 4

Word Address   Block Address   Cache Address (Set)   Hit or Miss
6              1               1                     Miss
7              1               1                     Hit
8              2               2                     Miss
9              2               2                     Hit
68             17              1                     Miss
6              1               1                     Hit
7              1               1                     Hit
8              2               2                     Hit
9              2               2                     Hit
69             17              1                     Hit

Blocks 1 and 17 map to the same set but occupy different entries, so neither evicts the other.
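The two-way example can be replayed with a small set-associative simulation. This is a sketch: the function name is hypothetical, and LRU replacement is assumed here, although this particular sequence never needs to evict a block.

```python
def simulate_set_associative(word_addresses, words_per_block=4,
                             num_blocks=8, ways=2):
    """Return (word, block address, set, 'Hit'/'Miss') per reference, using LRU."""
    num_sets = num_blocks // ways
    # Each set is a list of block addresses, least recently used first.
    sets = [[] for _ in range(num_sets)]
    results = []
    for word in word_addresses:
        block = word // words_per_block
        index = block % num_sets
        entries = sets[index]
        hit = block in entries
        if hit:
            entries.remove(block)   # re-appended below as most recently used
        elif len(entries) == ways:
            entries.pop(0)          # set is full: evict the LRU block
        entries.append(block)
        results.append((word, block, index, "Hit" if hit else "Miss"))
    return results


for row in simulate_set_associative([6, 7, 8, 9, 68, 6, 7, 8, 9, 69]):
    print(*row)
```

Calling the same function with ways=1 models a direct-mapped cache of the same size, which makes it easy to compare the two organizations on one reference sequence.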
