Cache Organization and Performance Evaluation Vittorio Zaccaria.

Exercise 1
How many total bits are required for a direct-mapped instruction cache with 64 KB of data and one-word blocks, assuming a 32-bit address?

Solution:
1 word = 4 bytes
Number of blocks = 64 KB / 4 bytes = 2^14 blocks
Tag bits = 32 - 14 [index] - 2 [offset] = 16
Size = [16 (tag) + 1 (valid bit) + 4 (block size in bytes) * 8] * 2^14 = 802816 bits
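The arithmetic of Exercise 1 can be checked with a small sketch. The function names `log2_exact` and `direct_mapped_total_bits` are illustrative and not part of the slides; the sketch assumes power-of-two sizes and one valid bit per block, as in the exercise.

```python
def log2_exact(n):
    """Return log2(n) for a positive power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def direct_mapped_total_bits(data_bytes, block_bytes, address_bits=32):
    """Total storage bits (tag + valid + data) of a direct-mapped cache."""
    num_blocks = data_bytes // block_bytes
    index_bits = log2_exact(num_blocks)
    offset_bits = log2_exact(block_bytes)
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_block = tag_bits + 1 + block_bytes * 8  # tag + valid bit + data
    return num_blocks * bits_per_block


# Exercise 1: 64 KB of data, one-word (4-byte) blocks, 32-bit addresses.
print(direct_mapped_total_bits(64 * 1024, 4))  # 802816
```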

Exercise 2: direct-mapped cache, 64 blocks x 32 bytes
Assuming byte addressing and 32-bit addresses:
1) How many bits are there in each of the Tag, Index, and Offset fields of the address?
2) How many total bytes of data can be stored in the cache?
3) How many bytes of memory does the cache use (including tags, valid bits, and data)?
4) How many distinct memory blocks map to the same cache block?
5) If the cache is loaded with random blocks, what is the probability that, for a given address, the tag field matches?

Solution:
1) Index = 6 bits, Offset = 5 bits, Tag = 32 - 6 - 5 = 21 bits
2) 64 * 32 bytes = 2 KB
3) (21 + 1 [valid]) * 64 / 8 + 32 * 64 = 2224 bytes
4) 2^21
5) 1/(2^21)
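To make the field widths of Exercise 2 concrete, the following sketch splits a 32-bit byte address into tag, index, and offset with shifts and masks. The function name `split_address` and the sample address are illustrative assumptions, not taken from the slides.

```python
def split_address(addr, index_bits=6, offset_bits=5):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache.

    Defaults match Exercise 2: 64 blocks (6 index bits) of 32 bytes (5 offset bits).
    """
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


addr = 0x12345678  # arbitrary sample address
tag, index, offset = split_address(addr)
print(tag, index, offset)

# Reassembling the three fields gives back the original address.
assert (tag << 11) | (index << 5) | offset == addr
```

Two addresses hit the same cache block only if their index fields are equal, and they refer to the same memory block only if their tags are equal as well.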

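Exercise 3
Assume a cache with:
Cache size = 128 bytes total
2-word blocks
2-way set associative
How many blocks does the cache have? How many bits is the index? How many bits is the tag?

Solution (assuming 4-byte words and 32-bit addresses, as in Exercise 1):
Block size = 2 * 4 bytes = 8 bytes, so Offset = 3 bits
Blocks = 128 / 8 = 16
Sets = 128 / (8 * 2) = 8
Index = log2(8 sets) = 3 bits
Tag = 32 - 3 [index] - 3 [offset] = 26 bits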
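For set-associative caches the index selects a set rather than a block, so the number of sets (blocks divided by ways) determines the index width. A minimal sketch follows; the function name `set_associative_fields` is hypothetical, and the 4-byte word size is an assumption carried over from Exercise 1.

```python
def set_associative_fields(cache_bytes, block_bytes, ways, address_bits=32):
    """Return block count, set count, and address field widths.

    All sizes are assumed to be powers of two.
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    offset_bits = block_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = num_sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return {
        "blocks": num_blocks,
        "sets": num_sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


WORD_BYTES = 4  # assumption: 32-bit words, as in Exercise 1
print(set_associative_fields(cache_bytes=128, block_bytes=2 * WORD_BYTES, ways=2))
```

Setting `ways=1` gives the direct-mapped case, and setting `ways` equal to the number of blocks gives a fully associative cache with no index bits.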

Cache Performance
CPUtime = Instruction Count x (CPIexecution + Mem accesses per instruction x Miss rate x Miss penalty in cycles) x Clock cycle time
Misses per instruction = Memory accesses per instruction x Miss rate
CPI = CPIexecution + Misses per instruction x Miss penalty in cycles
AMAT = Hit time + Miss rate x Miss penalty (can be expressed in cycles or in seconds)
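The four formulas above translate directly into code. This is a minimal sketch; the function names and the numbers in the usage lines are illustrative and not taken from the slides.

```python
def misses_per_instruction(mem_accesses_per_instr, miss_rate):
    return mem_accesses_per_instr * miss_rate


def cpi(cpi_execution, mem_accesses_per_instr, miss_rate, miss_penalty_cycles):
    """CPI including memory stall cycles."""
    mpi = misses_per_instruction(mem_accesses_per_instr, miss_rate)
    return cpi_execution + mpi * miss_penalty_cycles


def cpu_time(instruction_count, cpi_value, clock_cycle_time):
    """Execution time; the unit is that of clock_cycle_time."""
    return instruction_count * cpi_value * clock_cycle_time


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time; cycles or seconds, as long as units agree."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical machine: base CPI 1.0, 1.2 accesses/instr, 5% misses, 20-cycle penalty.
c = cpi(1.0, 1.2, 0.05, 20)
print(f"CPI  = {c:.2f}")
print(f"AMAT = {amat(1, 0.05, 20):.2f} cycles")
print(f"Time = {cpu_time(1_000_000, c, 2e-9) * 1e3:.2f} ms")
```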

Why misses?
1) Compulsory: the first access to a block is not in the cache, so the block must be brought into the cache.
2) Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved.
3) Conflict: if the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur, because a block can be discarded and later retrieved when too many blocks map to its set.
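One common way to turn the three categories into a procedure is to run the trace through the real cache and, in parallel, through a fully associative LRU cache of the same size: a miss on a never-seen block is compulsory, a miss that the fully associative cache also takes is capacity, and any remaining miss is conflict. The sketch below applies this to a direct-mapped cache; the function name `classify_misses` and the sample trace are assumptions for illustration.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_blocks):
    """Classify accesses of a direct-mapped cache as hit or one of the 3Cs."""
    direct = [None] * num_blocks   # direct-mapped cache: one block per index
    lru = OrderedDict()            # fully associative LRU cache of equal size
    seen = set()                   # every block referenced so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        # Reference (fully associative) cache lookup and update.
        fa_hit = block in lru
        if fa_hit:
            lru.move_to_end(block)
        else:
            if len(lru) == num_blocks:
                lru.popitem(last=False)  # evict the least recently used block
            lru[block] = True

        # Direct-mapped lookup and classification.
        index = block % num_blocks
        if direct[index] == block:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1

        direct[index] = block
        seen.add(block)
    return counts


# Blocks 0 and 4 share index 0 in a 4-block cache, so they evict each other.
print(classify_misses([0, 4, 0, 4], num_blocks=4))
```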

[Figure: 3Cs absolute miss rate (SPEC92)]

Exercise 4
Consider a VAX-11/780 with:
Miss penalty (MP) = 6 cycles
CPIexec = 8.5
Miss rate (MR) = 0.11
#mem_acc/instr = 3
Compute the CPI of the architecture with the cache.

Solution:
CPIrealCache = CPIexec + #mem_acc/instr * MR * MP = 8.5 + 3 * 0.11 * 6 = 10.48

Exercise 5
Compare the previous architecture in the 100% miss rate case with the same architecture in the 100% hit rate case. Compute the speedup of the ideal cache over the real one.

Solution:
100% hit: CPIidealCache = CPIexec = 8.5
100% miss: CPInoCache = 8.5 + 3 * 6 = 26.5
Speedup(idealCache, realCache) = 10.48 / 8.5 = 1.23
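Exercises 4 and 5 evaluate the same CPI formula at three miss rates (0, 0.11, and 1). A short sketch that tabulates them side by side, using the VAX-11/780 parameters from the exercise; the helper name `cpi_with_cache` is illustrative.

```python
def cpi_with_cache(cpi_exec, mem_acc_per_instr, miss_rate, miss_penalty):
    return cpi_exec + mem_acc_per_instr * miss_rate * miss_penalty


# Parameters from Exercise 4.
CPI_EXEC, MEM_ACC, MISS_PENALTY = 8.5, 3, 6

cases = [
    ("ideal cache (MR = 0)", 0.0),
    ("real cache (MR = 0.11)", 0.11),
    ("no cache (MR = 1)", 1.0),
]

cpi_ideal = cpi_with_cache(CPI_EXEC, MEM_ACC, 0.0, MISS_PENALTY)
for label, miss_rate in cases:
    value = cpi_with_cache(CPI_EXEC, MEM_ACC, miss_rate, MISS_PENALTY)
    # With equal instruction count and clock, the CPI ratio is the speedup
    # of the ideal cache over this case.
    print(f"{label:<24} CPI = {value:5.2f}   ideal speedup = {value / cpi_ideal:.2f}")
```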

Exercise 6
Compute the CPI of an architecture with a cache, given:
CPIideal = 1.5
MP = 10 cycles
MR = 0.11
#mem_acc/instr = 1.4

Solution:
CPIrealCache = 1.5 + 1.4 * 0.11 * 10 = 3.04

Exercise 6 (cont.)
Compare the case of 100% hit rate with the case of 100% miss rate. Compute the speedup of the ideal cache over the real one.

Solution:
CPInoCache = 1.5 + 1.4 * 10 = 15.5
CPIidealCache = 1.5
Speedup(idealCache, realCache) = 3.04 / 1.5 = 2.03

Exercise 7
Consider two architectures, A and B:
Tclk(A) = 20 ns, 8.5% faster than Tclk(B)
Both A and B have #mem_acc/instr = 1.3
MP(A) = MP(B) = 200 ns
MR(A) = 3.9%, MR(B) = 3.0%
Compute AMAT(A) and AMAT(B).
Compute CPI(A) and CPI(B).

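Solution 7
The solution takes a base CPI of 1.5 for both architectures, a hit time of one clock cycle, and Tclk(B) = 20 ns * (1 + 8.5%) = 21.7 ns.
Miss penalty in cycles: A = 200 ns / 20 ns = 10; B = round(200 ns / 21.7 ns) = 9
CPI(A) = 1.5 + 1.3 * 3.9% * 10 = 2.007, about 2.01
CPI(B) = 1.5 + 1.3 * 3% * 9 = 1.85
AMAT(A) = 20 ns + 200 ns * 3.9% = 27.8 ns
AMAT(B) = 21.7 ns + 200 ns * 3.0% = 27.7 ns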
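The only subtle step in Exercise 7 is converting the 200 ns miss penalty into clock cycles for each machine. The sketch below follows the slide in using `round`; a more conservative model would round up with `math.ceil`. The base CPI of 1.5 and the one-cycle hit time are the values the slide's solution uses; the function name `evaluate` is hypothetical.

```python
def evaluate(tclk_ns, miss_rate, miss_penalty_ns=200.0,
             base_cpi=1.5, mem_acc_per_instr=1.3):
    """Return (AMAT in ns, CPI) for one architecture."""
    # Hit time is taken to be one clock cycle.
    amat_ns = tclk_ns + miss_rate * miss_penalty_ns
    # Miss penalty expressed in whole clock cycles (rounded, as on the slide).
    penalty_cycles = round(miss_penalty_ns / tclk_ns)
    cpi = base_cpi + mem_acc_per_instr * miss_rate * penalty_cycles
    return amat_ns, cpi


tclk_a = 20.0
tclk_b = tclk_a * 1.085  # A's clock is 8.5% faster than B's

for name, tclk, mr in [("A", tclk_a, 0.039), ("B", tclk_b, 0.030)]:
    amat_ns, cpi = evaluate(tclk, mr)
    print(f"{name}: Tclk = {tclk:.1f} ns  AMAT = {amat_ns:.1f} ns  "
          f"CPI = {cpi:.3f}  time/instr = {cpi * tclk:.1f} ns")
```

The last column multiplies CPI by the clock period, which is the quantity to compare when the two machines run at different clock rates.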

Exercise 8
Architecture A [I$, D$]: 1 instruction issued on 85% of the cycles; NOP on the other cycles.
Architecture B [I$, D$]: 2 instructions issued on 65% of the cycles; 1 instruction on 30% of the cycles; NOP on the other cycles.
Assume:
Hit time = 1 cycle, miss time = 50 cycles
I$ hit rate = 100%
D$ hit rate = 98%
Load/store instructions = 33% of all instructions

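Exercise 8 (cont.)
1) CPI(A) and CPI(B) with a perfect memory system?
2) AMAT in cycles for the D$?

Solution:
1) CPI(A) = 100 cycles / 85 instr = 1.18; CPI(B) = 100 / (65 * 2 + 30) = 0.625
2) A miss takes 50 cycles instead of 1, so the miss penalty is 49 cycles: AMAT = 1 + 0.02 * 49 = 1.98 cycles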

Exercise 8 (cont.)
CPI(A) and CPI(B) with the actual cache?

Solution:
CPI(A) = 100/85 + 0.33 * 0.02 * 49 = 1.50
CPI(B) = 100/160 + 0.33 * 0.02 * 49 = 0.95
Speedup(B, A) = CPI(A) / CPI(B) = 1.58
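The base CPI in Exercise 8 is the reciprocal of the average number of instructions issued per cycle, and the data cache adds the same stall term to both machines. A minimal sketch with the exercise's numbers; the function name `base_cpi` and the dictionary representation of the issue profile are illustrative choices.

```python
def base_cpi(issue_profile):
    """CPI with perfect memory.

    issue_profile maps 'instructions issued in a cycle' to the fraction of
    cycles in which that happens; NOP cycles contribute zero instructions.
    """
    instr_per_cycle = sum(n * frac for n, frac in issue_profile.items())
    return 1.0 / instr_per_cycle


LOAD_STORE_FRACTION = 0.33
D_MISS_RATE = 1 - 0.98
MISS_PENALTY = 50 - 1  # a miss takes 50 cycles instead of the 1-cycle hit

stall_cpi = LOAD_STORE_FRACTION * D_MISS_RATE * MISS_PENALTY

cpi_a = base_cpi({1: 0.85, 0: 0.15}) + stall_cpi
cpi_b = base_cpi({2: 0.65, 1: 0.30, 0: 0.05}) + stall_cpi

print(f"CPI(A) = {cpi_a:.2f}")
print(f"CPI(B) = {cpi_b:.2f}")
print(f"Speedup(B, A) = {cpi_a / cpi_b:.2f}")
```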

Exercise 9
300 MHz CPU, 50 MHz bus speed
DCache has two 64-bit words per block
Bus: 2 bytes wide
Burst transfer mode: each block read takes 4-1-1-1-1-1-1-1 bus clocks
Hit time = 1 cycle
Miss rate = 6%
Ideal ICache

Exercise 9 (cont.)
Consider only read data accesses. What is the effective AMAT in ns? How would you speed it up: by doubling the bus width or by doubling the bus speed? Compute the AMAT first and then the speedup.

Solution:
1 bus clock = 300/50 = 6 CPU clocks
AMAT = (1 + 0.06 * (4 + 7) * 6) CPU clocks = 4.96 CPU clocks = 16.5 ns

Exercise 9 (cont.)
Doubling the bus width? A 16-byte block now needs 4 transfers: the first datum in 4 bus clocks, then 1-1-1.
AMAT = (1 + 0.06 * (4 + 3) * 6) CPU clocks = 3.52 CPU clocks = 11.7 ns

Exercise 9 (cont.)
Doubling the bus speed? 1 bus clock = 3 CPU cycles.
AMAT = (1 + 0.06 * (4 + 7) * 3) CPU clocks = 2.98 CPU clocks = 9.9 ns
Speedup(2Xfreq, 2Xwidth) = 3.52 / 2.98 = 1.18
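The three bus configurations of Exercise 9 differ only in the number of burst transfers per block and in the CPU-to-bus clock ratio, so one parameterized function covers them all. This is a sketch under the exercise's assumptions (4 bus clocks for the first transfer, 1 for each subsequent one); the function names are hypothetical.

```python
def burst_bus_clocks(block_bytes, bus_bytes, first=4, subsequent=1):
    """Bus clocks needed to read one block in burst mode."""
    transfers = block_bytes // bus_bytes
    return first + (transfers - 1) * subsequent


def amat(cpu_mhz, bus_mhz, block_bytes, bus_bytes, miss_rate, hit_cycles=1):
    """Return (AMAT in CPU clocks, AMAT in ns) for read accesses."""
    cpu_clocks_per_bus_clock = cpu_mhz / bus_mhz
    penalty = burst_bus_clocks(block_bytes, bus_bytes) * cpu_clocks_per_bus_clock
    cycles = hit_cycles + miss_rate * penalty
    return cycles, cycles * 1000.0 / cpu_mhz


BLOCK_BYTES = 2 * 8  # two 64-bit words per block

configs = {
    "baseline": dict(bus_mhz=50, bus_bytes=2),
    "2x width": dict(bus_mhz=50, bus_bytes=4),
    "2x speed": dict(bus_mhz=100, bus_bytes=2),
}

results = {}
for name, cfg in configs.items():
    cycles, ns = amat(300, cfg["bus_mhz"], BLOCK_BYTES, cfg["bus_bytes"], 0.06)
    results[name] = cycles
    print(f"{name:<9} AMAT = {cycles:.2f} CPU clocks = {ns:.1f} ns")

print(f"Speedup(2x speed, 2x width) = {results['2x width'] / results['2x speed']:.2f}")
```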
