1 Computer Organization CS224 Fall 2012 Lessons 39 & 40

2 Write-Through
- On a data-write hit, we could just update the block in the cache
  - But then cache and memory would be inconsistent
- Write-through: also update memory
- But this makes writes take longer
  - e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
    Effective CPI = 1 + 0.1 × 100 = 11 (see the sketch below)
- Solution: write buffer
  - Holds data waiting to be written to memory
  - CPU continues immediately
    - Only stalls on a write if the write buffer is already full
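A minimal sketch of the effective-CPI arithmetic in the example above, using only the numbers on this slide; the variable names are illustrative:

```python
# Effective CPI for write-through with no write buffer,
# using the example figures from the slide.
base_cpi = 1.0
store_fraction = 0.10      # 10% of instructions are stores
mem_write_cycles = 100     # each store stalls for the full memory write

effective_cpi = base_cpi + store_fraction * mem_write_cycles
print(effective_cpi)       # 11.0 -- memory writes dominate execution time
```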

3 Write-Back
- Alternative: on a data-write hit, just update the block in the cache
  - Keep track of whether each block is dirty
- When a dirty block is replaced
  - Write it back to memory
  - Can use a write buffer to allow the replacing block to be read first
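To illustrate the dirty-bit bookkeeping described above, here is a minimal direct-mapped write-back cache sketch; the class and field names are invented for this example and are not from any real implementation:

```python
class WriteBackCache:
    """Direct-mapped write-back cache: writes update the cache only, and a
    dirty block is written back to memory when it is evicted."""
    def __init__(self, num_blocks, memory):
        self.num_blocks = num_blocks
        self.memory = memory                  # backing store: dict of block -> data
        self.tags = [None] * num_blocks       # which block occupies each entry
        self.data = [None] * num_blocks
        self.dirty = [False] * num_blocks     # written since it was brought in?

    def write(self, block_addr, value):
        idx = block_addr % self.num_blocks
        if self.tags[idx] != block_addr:                       # write miss
            if self.tags[idx] is not None and self.dirty[idx]:
                self.memory[self.tags[idx]] = self.data[idx]   # write back dirty victim
            self.tags[idx] = block_addr
            self.data[idx] = self.memory.get(block_addr)       # fetch the block
            self.dirty[idx] = False
        self.data[idx] = value        # update the cache only
        self.dirty[idx] = True        # memory is now stale until eviction
```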

4 Write Allocation
- What should happen on a write miss?
- Alternatives for write-through
  - Allocate on miss: fetch the block
  - Write around: don't fetch the block
    - Since programs often write a whole block before reading it (e.g., initialization)
- For write-back
  - Usually fetch the block
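A hedged sketch of how the two write-through miss policies differ, using plain dicts to stand in for the cache and memory; all names here are illustrative:

```python
def handle_write_miss(cache, memory, block_addr, value, write_allocate):
    """Write-through cache handling a write miss.
    write_allocate=True  -> allocate on miss: fetch the block, then write it
    write_allocate=False -> write around: update memory only, skip the fetch"""
    if write_allocate:
        cache[block_addr] = memory.get(block_addr)   # bring the block into the cache
        cache[block_addr] = value                    # complete the write in the cache
    memory[block_addr] = value                       # write-through always updates memory

# Usage: with write-around, the cache stays untouched on a write miss.
cache, memory = {}, {7: "old"}
handle_write_miss(cache, memory, 7, "new", write_allocate=False)
print(cache, memory)   # {} {7: 'new'}
```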

5 Example: Intrinsity FastMATH
- Embedded MIPS processor
  - 12-stage pipeline
  - Instruction and data access on each cycle
- Split cache: separate I-cache and D-cache
  - Each 16KB: 256 blocks × 16 words/block
  - D-cache: write-through or write-back
- SPEC2000 miss rates
  - I-cache: 0.4%
  - D-cache: 11.4%
  - Weighted average: 3.2% (see the check below)
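The 3.2% figure is an access-weighted average of the two miss rates. A rough check, assuming about 0.34 data references per instruction (an assumed SPEC2000 instruction mix, not stated on the slide):

```python
# Rough check of the weighted-average miss rate (instruction mix is ASSUMED).
i_miss, d_miss = 0.004, 0.114    # I-cache and D-cache miss rates from the slide
data_refs_per_instr = 0.34       # assumed loads+stores per instruction

total_refs = 1 + data_refs_per_instr                       # 1 fetch + data references
weighted = (i_miss + data_refs_per_instr * d_miss) / total_refs
print(f"{weighted:.1%}")         # ~3.2%, consistent with the slide
```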

6 Example: Intrinsity FastMATH

7 Main Memory Supporting Caches
- Use DRAMs for main memory
  - Fixed width (e.g., 1 word)
  - Connected by a fixed-width clocked bus
    - Bus clock is typically slower than the CPU clock
- Example cache block read
  - 1 bus cycle for address transfer
  - 15 bus cycles per DRAM access
  - 1 bus cycle per data transfer
- For a 4-word block and 1-word-wide DRAM (worked in the sketch below)
  - Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
  - Bandwidth = 16 bytes / 65 cycles ≈ 0.25 bytes/cycle
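A small sketch of the miss-penalty and bandwidth arithmetic for the 1-word-wide case, using the per-step bus-cycle costs listed above; the variable names are ours:

```python
# One 4-word cache-block read from a 1-word-wide DRAM over a narrow bus.
addr_cycles = 1       # send the address
dram_cycles = 15      # per DRAM access
xfer_cycles = 1       # per one-word bus transfer
words       = 4       # words per cache block (4 bytes each)

miss_penalty = addr_cycles + words * dram_cycles + words * xfer_cycles
bandwidth    = (words * 4) / miss_penalty

print(miss_penalty, round(bandwidth, 2))   # 65 bus cycles, ~0.25 bytes/cycle
```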

8 Increasing Memory Bandwidth
- 4-word-wide memory
  - Miss penalty = 1 + 15 + 1 = 17 bus cycles
  - Bandwidth = 16 bytes / 17 cycles ≈ 0.94 bytes/cycle
- 4-bank interleaved memory
  - Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
  - Bandwidth = 16 bytes / 20 cycles = 0.8 bytes/cycle
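Continuing the same back-of-the-envelope style, a sketch of how the wide and interleaved organizations change the cycle count, assuming the same 1/15/1 bus-cycle costs as the previous slide:

```python
addr_cycles, dram_cycles, xfer_cycles = 1, 15, 1

# 4-word-wide memory: one DRAM access and one wide bus transfer move the block.
wide = addr_cycles + 1 * dram_cycles + 1 * xfer_cycles           # 17 bus cycles
# 4-bank interleaved: the four DRAM accesses overlap, but the four one-word
# transfers still share the narrow bus.
interleaved = addr_cycles + 1 * dram_cycles + 4 * xfer_cycles    # 20 bus cycles

print(wide, round(16 / wide, 2))                 # 17 cycles, ~0.94 bytes/cycle
print(interleaved, round(16 / interleaved, 2))   # 20 cycles, 0.8 bytes/cycle
```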

9 Advanced DRAM Organization
- Bits in a DRAM are organized as a rectangular array
  - A DRAM access reads an entire row
  - Burst mode: supply successive words from a row with reduced latency
- Double data rate (DDR) DRAM
  - Transfers on both the rising and falling clock edges
- Quad data rate (QDR) DRAM
  - Separate DDR inputs and outputs

10 DRAM Generations

  Year | Capacity | $/GB
  1980 | 64Kbit   | $1,500,000
  1983 | 256Kbit  | $500,000
  1985 | 1Mbit    | $200,000
  1989 | 4Mbit    | $50,000
  1992 | 16Mbit   | $15,000
  1996 | 64Mbit   | $10,000
  1998 | 128Mbit  | $4,000
  2000 | 256Mbit  | $1,000
  2004 | 512Mbit  | $250
  2007 | 1Gbit    | $50

11 Associative Caches (§5.3 Measuring and Improving Cache Performance)
- Fully associative
  - Allow a given block to go in any cache entry
  - Requires all entries to be searched at once
  - Comparator per entry (expensive)
- n-way set associative
  - Each set contains n entries
  - Block number determines the set: (block number) modulo (#sets in cache)
  - Search all entries in the given set at once
  - n comparators (less expensive)
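A minimal sketch of the set-mapping rule above, for a cache described only by its total number of entries and its associativity; the function name is ours:

```python
def set_index(block_number, num_entries, ways):
    """An n-way set-associative cache has num_entries // ways sets; a block
    may live in any of the `ways` entries of the set chosen here."""
    num_sets = num_entries // ways
    return block_number % num_sets

# An 8-entry cache at different associativities, for block number 12:
for ways in (1, 2, 4, 8):        # direct mapped ... fully associative
    print(f"{ways}-way: set {set_index(12, 8, ways)}")
# 1-way -> set 4, 2-way -> set 0, 4-way -> set 0, 8-way -> set 0
```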

12 Associative Cache Example

13 Spectrum of Associativity
- For a cache with 8 entries (figure)

14 Associativity Example
- Compare 4-block caches
  - Direct mapped, 2-way set associative, fully associative
  - Block access sequence: 0, 8, 0, 6, 8
- Direct mapped

  Block address | Cache index | Hit/miss | Cache content after access
                |             |          | Index 0 | Index 1 | Index 2 | Index 3
  0             | 0           | miss     | Mem[0]  |         |         |
  8             | 0           | miss     | Mem[8]  |         |         |
  0             | 0           | miss     | Mem[0]  |         |         |
  6             | 2           | miss     | Mem[0]  |         | Mem[6]  |
  8             | 0           | miss     | Mem[8]  |         | Mem[6]  |

15 Associativity Example
- 2-way set associative

  Block address | Cache index | Hit/miss | Cache content after access
                |             |          | Set 0          | Set 1
  0             | 0           | miss     | Mem[0]         |
  8             | 0           | miss     | Mem[0], Mem[8] |
  0             | 0           | hit      | Mem[0], Mem[8] |
  6             | 0           | miss     | Mem[0], Mem[6] |
  8             | 0           | miss     | Mem[8], Mem[6] |

- Fully associative

  Block address | Hit/miss | Cache content after access
  0             | miss     | Mem[0]
  8             | miss     | Mem[0], Mem[8]
  0             | hit      | Mem[0], Mem[8]
  6             | miss     | Mem[0], Mem[8], Mem[6]
  8             | hit      | Mem[0], Mem[8], Mem[6]
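A small LRU simulation sketch that reproduces the hit/miss patterns in the tables above; it assumes LRU replacement within each set (which is what the 2-way results imply), and the function name is ours:

```python
from collections import OrderedDict

def simulate(accesses, num_blocks, ways):
    """Return hit/miss per block address for an LRU set-associative cache."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]   # blocks kept in LRU order
    results = []
    for block in accesses:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)          # refresh LRU position
            results.append("hit")
        else:
            if len(s) == ways:
                s.popitem(last=False)     # evict the least recently used block
            s[block] = None
            results.append("miss")
    return results

seq = [0, 8, 0, 6, 8]
print(simulate(seq, 4, 1))   # direct mapped:     5 misses
print(simulate(seq, 4, 2))   # 2-way set assoc.:  4 misses, 1 hit
print(simulate(seq, 4, 4))   # fully associative: 3 misses, 2 hits
```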

16 How Much Associativity?
- Increased associativity decreases the miss rate
  - But with diminishing returns
- Simulation of a system with a 64KB D-cache, 16-word blocks, SPEC2000
  - 1-way: 10.3%
  - 2-way: 8.6%
  - 4-way: 8.3%
  - 8-way: 8.1%

17 Set Associative Cache Organization

