1 Miss Stride Buffer
Department of Information and Computer Science, University of California, Irvine
Copyright 1998 UC, Irvine

2 Introduction
In this presentation, we present a new technique to eliminate conflict misses in the cache. We use a Miss History Buffer to record miss addresses. From the buffer we can calculate the miss stride and predict which addresses will miss again, then prefetch those addresses into the cache. Experiments show this technique is very effective at eliminating conflict misses in some applications, and it incurs little increase in memory bandwidth.

3 Overview
Importance of cache performance
Techniques to reduce cache misses
Our approach: Miss Stride Buffer
Experiments
Discussion

4 Importance of Cache Performance
Disparity between processor and memory speed
Cache misses:
–Compulsory
–Capacity
–Conflict
Increasing cache miss penalty on faster machines

5 Techniques to Reduce Cache Misses
All use some kind of prediction about the pattern of misses:
Victim cache
Stream buffer
Stride prefetch

6 Victim Cache
Mainly used to eliminate conflict misses
Prediction: the memory address of a cache line that is replaced is likely to be accessed again in the near future
Scenario for the prediction to be effective: false sharing, ugly address mappings
Architecture implementation: use an on-chip buffer to store the contents of recently replaced cache lines
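As a concrete illustration (not from the slides), here is a minimal C sketch of that lookup path, assuming a direct-mapped L1 and a tiny fully associative victim buffer; all sizes and names are illustrative:

#include <stdint.h>

#define L1_SETS     256   /* direct-mapped L1, illustrative geometry */
#define VICTIM_WAYS 4     /* small fully associative victim buffer   */

typedef struct { uint64_t blk; int valid; } Line;   /* blk = full block address */

static Line l1[L1_SETS];
static Line victim[VICTIM_WAYS];
static int  victim_next = 0;                        /* FIFO replacement in the victim buffer */

/* Returns 1 on a hit in L1 or the victim buffer, 0 on a true miss.
 * The line replaced in L1 is always pushed into the victim buffer, so a
 * conflicting address that comes back soon can be recovered without a
 * memory access. */
int cache_access(uint64_t blk)
{
    int set = (int)(blk % L1_SETS);

    if (l1[set].valid && l1[set].blk == blk)
        return 1;                                   /* L1 hit */

    for (int i = 0; i < VICTIM_WAYS; i++) {
        if (victim[i].valid && victim[i].blk == blk) {
            Line evicted = l1[set];                 /* swap the victim line back into L1 */
            l1[set] = victim[i];
            victim[i] = evicted;
            return 1;                               /* victim-buffer hit */
        }
    }

    if (l1[set].valid) {                            /* true miss: save the replaced line */
        victim[victim_next] = l1[set];
        victim_next = (victim_next + 1) % VICTIM_WAYS;
    }
    l1[set].blk = blk;
    l1[set].valid = 1;
    return 0;
}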

7 (figure only)

8 Drawbacks of Victim Cache
Ugly mappings can be rectified by a cache-aware compiler
Because of the small size of the victim cache, the probability of a memory address being reused within such a short period is very low
Experiments show the victim cache is not effective

9 Stream Buffer
Mainly used to eliminate compulsory/capacity misses
Prediction: if a memory address misses, the consecutive address is likely to miss in the near future
Scenario for the prediction to be useful: stream access
Architecture implementation: when an address misses, prefetch the consecutive address into an on-chip buffer; when there is a hit in the stream buffer, prefetch the address consecutive to the hit address
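A minimal sketch of that behaviour, assuming a single stream buffer tracked as a queue of block addresses (the depth and names are illustrative):

#include <stdint.h>
#include <string.h>

#define STREAM_DEPTH 4                      /* illustrative buffer depth */

static uint64_t stream[STREAM_DEPTH];       /* block addresses queued for prefetch */
static int      stream_valid = 0;

/* Called on an L1 miss for block 'blk'.  A hit at the head of the stream
 * buffer supplies the line and prefetches one more sequential block; a
 * miss flushes the buffer and refills it with blk+1 .. blk+STREAM_DEPTH. */
int stream_buffer_access(uint64_t blk)
{
    if (stream_valid && stream[0] == blk) {
        memmove(stream, stream + 1, (STREAM_DEPTH - 1) * sizeof stream[0]);
        stream[STREAM_DEPTH - 1] = blk + STREAM_DEPTH;   /* keep the stream full */
        return 1;                                        /* stream-buffer hit */
    }
    for (int i = 0; i < STREAM_DEPTH; i++)               /* refill on a stream miss */
        stream[i] = blk + 1 + i;
    stream_valid = 1;
    return 0;
}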

10 (figure only)

11 Stream Cache
A modification of the stream buffer
Uses a separate cache to store stream data to prevent cache pollution
When there is a hit in the stream buffer, the hit data is sent to the stream cache instead of the L1 cache
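The only change relative to the plain stream buffer is where a stream hit is installed; a minimal sketch of that routing decision, with fill_l1 and fill_stream_cache as hypothetical helpers:

#include <stdint.h>

void fill_l1(uint64_t blk);               /* hypothetical: install a line in L1              */
void fill_stream_cache(uint64_t blk);     /* hypothetical: install a line in the stream cache */
int  stream_buffer_access(uint64_t blk);  /* lookup as in the previous sketch                */

/* Lines found in the stream buffer go to the separate stream cache, so
 * streaming data cannot evict the working set already resident in L1. */
void handle_miss(uint64_t blk)
{
    if (stream_buffer_access(blk))
        fill_stream_cache(blk);
    else
        fill_l1(blk);
}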

12 (figure only)

13 Stride Prefetch
Mainly used to eliminate compulsory/capacity misses
Prediction: if a memory address misses, an address offset by some distance from the missed address is likely to miss in the near future
Scenario for the prediction to be useful: stride access
Architecture implementation: when an address misses, prefetch the address offset by that distance from the missed address; when there is a hit in the buffer, also prefetch the address offset by that distance from the hit address
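Stride prefetchers are commonly built around a small table indexed by the load instruction's address that remembers the last data address and stride; the sketch below follows that common scheme rather than anything specified in the slides, and every name and size in it is illustrative:

#include <stdint.h>

#define RPT_SIZE 64                    /* illustrative reference prediction table size */

typedef struct {
    uint64_t pc;                       /* address of the load/store instruction    */
    uint64_t last_addr;                /* last data address it accessed            */
    int64_t  stride;                   /* most recently observed stride            */
    int      confirmed;                /* the same stride was seen twice in a row  */
} RptEntry;

static RptEntry rpt[RPT_SIZE];

/* Called on every memory access; returns the address to prefetch, or 0
 * when there is no confident prediction yet. */
uint64_t stride_predict(uint64_t pc, uint64_t addr)
{
    RptEntry *e = &rpt[pc % RPT_SIZE];
    uint64_t prefetch_addr = 0;

    if (e->pc == pc) {
        int64_t s = (int64_t)(addr - e->last_addr);
        e->confirmed = (s != 0 && s == e->stride);
        e->stride = s;
        if (e->confirmed)
            prefetch_addr = addr + (uint64_t)s;     /* next element of the stride */
    } else {
        e->pc = pc;                                 /* new entry for this instruction */
        e->stride = 0;
        e->confirmed = 0;
    }
    e->last_addr = addr;
    return prefetch_addr;
}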

14 Miss Stride Buffer
Mainly used to eliminate conflict misses
Prediction: if a memory address misses again after N other misses, it is likely to miss once more after another N misses
Scenarios for the prediction to be useful:
–multiple loop nests
–some variables or array elements are reused across iterations
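As a worked example of the prediction: if address A misses, 40 other misses follow, and A then misses again, the observed miss stride is 40; the prediction is that A will miss once more after roughly another 40 misses, so its line can be prefetched shortly before that point.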

15 Advantages over Victim Cache
Eliminates conflict misses that even a cache-aware compiler cannot eliminate
–Ugly mappings are few and can be rectified
–Many more conflicts are effectively random: from a probability perspective, a given memory address will conflict with some other address after enough time, but we cannot know at compile time which address it will conflict with
There can be a much longer period before the conflicting address is reused
–longer than the victim cache's small size can cover

16 Architecture Implementation
Miss history buffer (MHB)
–FIFO buffer that records recently missed memory addresses
–Predict only when there is a hit in the buffer
–The miss stride can be calculated from the relative positions of consecutive misses to the same address
–The size of the buffer determines the number of predictions
Prefetch buffer (on-chip)
–Stores the contents of prefetched memory addresses
–The size of the buffer determines how much variation in the miss stride we can tolerate
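A minimal C sketch of the miss history buffer side of this, counting misses globally and matching repeated miss addresses; the structure and sizes are illustrative assumptions, not taken from the slides:

#include <stdint.h>

#define MHB_SIZE 64                    /* illustrative; bounds how many misses back we can match */

typedef struct { uint64_t addr; uint64_t miss_no; int valid; } MhbEntry;

static MhbEntry mhb[MHB_SIZE];
static int      mhb_head = 0;          /* FIFO insertion point               */
static uint64_t miss_count = 0;        /* running count of all cache misses  */

/* Called once per cache miss.  If the address is already in the buffer,
 * the number of misses since its previous miss is the miss stride; we
 * predict it will miss again after the same number of misses and report
 * the miss count by which the prefetch should have completed.
 * Returns 1 when a prediction is made, 0 otherwise. */
int mhb_record_miss(uint64_t addr, uint64_t *predicted_miss_no)
{
    int predicted = 0;
    miss_count++;

    for (int i = 0; i < MHB_SIZE; i++) {
        if (mhb[i].valid && mhb[i].addr == addr) {
            uint64_t stride = miss_count - mhb[i].miss_no;   /* misses between repeats */
            *predicted_miss_no = miss_count + stride;        /* expected next miss     */
            predicted = 1;
            break;
        }
    }

    mhb[mhb_head].addr = addr;         /* FIFO insert, overwriting the oldest entry */
    mhb[mhb_head].miss_no = miss_count;
    mhb[mhb_head].valid = 1;
    mhb_head = (mhb_head + 1) % MHB_SIZE;

    return predicted;
}

A larger prefetch buffer then lets the actual recurrence drift further from the predicted miss count while the prefetched line is still resident, which is the tolerance the last bullet refers to.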

17 Architecture Implementation
Prefetch scheduler
–Selects the right time to prefetch
–Avoids collisions
Prefetcher
–Prefetches the contents of the predicted miss address into the on-chip prefetch buffer
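One plausible, entirely hypothetical scheduling policy: issue each queued prefetch a few misses ahead of its predicted recurrence, and only while the bus is idle so it does not collide with demand fetches.

#include <stdint.h>

#define LEAD_MISSES 4                              /* hypothetical lead time, in misses */

typedef struct { uint64_t addr; uint64_t due_miss_no; } PrefetchReq;

void prefetch_into_buffer(uint64_t addr);          /* hypothetical prefetcher hook */

/* Called on every miss with the current global miss count; issues any
 * queued request whose predicted miss is near, but only while the bus
 * is idle. */
void schedule_prefetches(PrefetchReq *q, int n, uint64_t miss_count, int bus_idle)
{
    if (!bus_idle)
        return;
    for (int i = 0; i < n; i++) {
        if (q[i].addr != 0 && miss_count + LEAD_MISSES >= q[i].due_miss_no) {
            prefetch_into_buffer(q[i].addr);
            q[i].addr = 0;                         /* request consumed */
        }
    }
}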

18-20 (figures only)

21 Experiment
Application: Matrix Multiply

#define N 257

int main(void)
{
    int i, j, k, sum;
    static int a[N][N], b[N][N], c[N][N];   /* static: ~790 KB total, kept off the stack */

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            b[i][j] = 1;
            c[i][j] = 1;
        }

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            sum = 0;
            for (k = 0; k < N; k++)
                sum += b[i][k] + c[k][j];
            a[i][j] = sum;
        }

    return 0;
}

22-26 (figures only)

27 Discussion
The effectiveness depends on the hit ratio in the MHB
Can be combined with blocking to increase the hit ratio in the MHB
Can be used with a victim cache
–long-term vs. short-term memory address reuse
Can be used with other miss-elimination techniques
–they decrease the number of misses seen by the MHB, which is equivalent to increasing the size of the MHB
–more accurate prediction

28 Discussion
Reconfiguration
–The miss stride prefetch buffer, victim cache, and stream buffer share the same large buffer, which is dynamically partitioned
–Use a conflict counter to recognize the recent cache miss pattern: conflict dominant or not

