Tile Size Selection Using Cache Organization and Data Layout. Stephanie Coleman, Intermetrics, Inc.; Kathryn S. McKinley, Computer Science, LGRC, University of Massachusetts Amherst.


1 Tile Size Selection Using Cache Organization and Data Layout. Stephanie Coleman, Intermetrics, Inc.; Kathryn S. McKinley, Computer Science, LGRC, University of Massachusetts Amherst. 10/27/01

2 Where to Use Tiling/Blocking?
Registers
TLB
L1 cache
L2 cache
Any other level of the memory hierarchy

3 Cache Misses
Compulsory misses
Capacity misses
Interference misses
–Self-interference
–Cross-interference

4 Data Reuse and Locality
Data reuse
–Temporal reuse
–Spatial reuse
Locality: reused data remain in cache
Reuse does not necessarily result in locality

5 Without Tiling: Matrix Multiply
    for I=1 to N do
      for K=1 to N do
        R=X(K,I)
        for J=1 to N do
          Z(J,I)=Z(J,I)+R*Y(J,K)

6 Reuse Pattern without tiling

7 Reuse Pattern after tiling

8 After Tiling (tile size = TK*TJ)
    for KK=1 to N by TK do
      for JJ=1 to N by TJ do
        for I=1 to N do
          for K=KK to MIN(KK+TK-1,N) do
            R=X(K,I)
            for J=JJ to MIN(JJ+TJ-1,N) do
              Z(J,I)=Z(J,I)+R*Y(J,K)
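The two loop nests on slides 5 and 8 can be compared directly. The following is a minimal Python sketch, assuming 0-based indexing and list-of-lists matrices; the function names `matmul_untiled` and `matmul_tiled` are illustrative and do not come from the slides. It only demonstrates that tiling reorders the iterations without changing the result; it does not measure cache behavior.

```python
def matmul_untiled(X, Y, N):
    """Slide 5 loop nest: Z(J,I) += X(K,I) * Y(J,K), with 0-based indices."""
    Z = [[0] * N for _ in range(N)]
    for i in range(N):
        for k in range(N):
            r = X[k][i]
            for j in range(N):
                Z[j][i] += r * Y[j][k]
    return Z


def matmul_tiled(X, Y, N, TK, TJ):
    """Slide 8 loop nest: the K and J loops are split into TK x TJ tiles."""
    Z = [[0] * N for _ in range(N)]
    for kk in range(0, N, TK):                     # tile loop over K
        for jj in range(0, N, TJ):                 # tile loop over J
            for i in range(N):
                # min(...) handles a partial tile at the array edge
                for k in range(kk, min(kk + TK, N)):
                    r = X[k][i]
                    for j in range(jj, min(jj + TJ, N)):
                        Z[j][i] += r * Y[j][k]
    return Z
```

With integer inputs the two functions return identical matrices for any positive `TK` and `TJ`, including tile sizes that do not divide `N`.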

9 General Formula for Tiling
Before tiling:
    for I=lo to hi do
Tiled into:
    for It=floor((lo-off)/ts)*ts+off to floor((hi-off)/ts)*ts+off by ts do
      for I=max(lo,It) to min(hi,It+ts-1) do
(off: offset; ts: tile size)
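The general formula can be checked with a small Python sketch. The helper name `tile_ranges` is hypothetical; the bounds are inclusive, as in the slide's notation, and Python's `//` operator is used as the floor function. The sketch returns the element indices visited in each tile so that coverage of `lo..hi` can be verified.

```python
def tile_ranges(lo, hi, ts, off=0):
    """Split the inclusive range lo..hi into tiles of size ts aligned at off.

    Follows the slide's formula: the tile loop runs from
    floor((lo-off)/ts)*ts+off to floor((hi-off)/ts)*ts+off in steps of ts,
    and the element loop is clipped with max(lo, It) and min(hi, It+ts-1).
    """
    first = (lo - off) // ts * ts + off   # // is floor division in Python
    last = (hi - off) // ts * ts + off
    tiles = []
    for it in range(first, last + 1, ts):
        tiles.append(list(range(max(lo, it), min(hi, it + ts - 1) + 1)))
    return tiles
```

For example, `tile_ranges(3, 17, 5, off=1)` yields the tiles `[3, 4, 5]`, `[6, ..., 10]`, `[11, ..., 15]`, and `[16, 17]`: the first and last tiles are partial because `lo` and `hi` do not fall on tile boundaries.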

10 Loop Interchange
Interchange an inner tile loop with an outer element loop:
    for I=max(l1,l2,…) to min(u1,u2,…) do
      for Jt=floor((k1*I+m1)/ts)*ts+off to floor((ku*I+mu)/ts)*ts+off by ts do
The limits of the I loop do not change.
The new lower/upper limit of the Jt loop is the max of a set of expressions, where each expression is the old limit with I replaced by one of l1, l2, … (if k1>0) or one of u1, u2, … (if k1<0).

11 Tile Size Selection

12 Tile Size selection Cache layout with a tile size of 24

13 Potential Column Dimensions
Euclidean algorithm: GCD(a,b) = GCD(a-b,b)
    CS = q1*N + r1
    N = q2*r1 + r2
    r1 = q3*r2 + r3
    …
Example (CS=1024, N=200):
    1024 = 5*200 + 24
    200 = 8*24 + 8
Potential column dimensions: 24, 8.
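The remainder sequence above can be generated mechanically. This is a minimal Python sketch of that step only, assuming the cache size `CS` and the column length `N` are given in array elements; the function name `candidate_column_sizes` is illustrative and is not taken from the slides.

```python
def candidate_column_sizes(cache_size, n):
    """Return the nonzero remainders r1, r2, ... of the Euclidean algorithm
    applied to (cache_size, n), in the order they are produced."""
    sizes = []
    a, b = cache_size, n
    while b != 0:
        a, b = b, a % b        # one Euclidean step: a = q*b + r
        if b != 0:
            sizes.append(b)    # keep each nonzero remainder as a candidate
    return sizes


print(candidate_column_sizes(1024, 200))  # prints [24, 8]
```

The final step, 24 = 3*8 + 0, produces a zero remainder and ends the sequence, so the candidates match the slide's example.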

14 Computing row size for a column size

15 Improve Spatial Locality with Cache Line Size
    colSize = colSize,                   if colSize mod CLS = 0 or colSize = column length
    colSize = floor(colSize/CLS)*CLS,    otherwise
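The adjustment rule is a two-case function and can be written directly. The sketch below assumes that `colSize`, `CLS`, and the column length are all measured in array elements; the name `align_col_size` is hypothetical.

```python
def align_col_size(col_size, cls, column_length):
    """Round col_size down to a multiple of the cache line size (cls),
    unless it is already a multiple or the tile spans the whole column."""
    if col_size % cls == 0 or col_size == column_length:
        return col_size
    # Note: this yields 0 when col_size < cls; a caller would need to
    # handle that case separately (the slide does not address it).
    return (col_size // cls) * cls
```

For instance, with a line size of 4 elements, a candidate column size of 30 is trimmed to 28, while 24 is left unchanged.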

16 Minimize Cross Interference
Working set size constraint: TJ*TK + TJ + 1*CLS < CS
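The constraint can be evaluated for a given column size to see how many tile rows still fit. The following Python sketch assumes all quantities are in array elements. Reading the three terms against the tiled loop nest on slide 8 as the TJ*TK tile of Y, the TJ elements of Z, and one cache line holding the X element is an interpretation rather than something the slide states. Both function names are hypothetical.

```python
def fits_in_cache(tj, tk, cls, cs):
    """Check the slide's working set constraint: TJ*TK + TJ + 1*CLS < CS."""
    return tj * tk + tj + 1 * cls < cs


def max_tk(tj, cls, cs):
    """Largest TK satisfying the constraint for a fixed TJ.

    Rearranged: TJ*TK < CS - TJ - CLS. Returns 0 when no positive TK fits.
    """
    return max(0, (cs - tj - cls - 1) // tj)
```

With `CS = 1024`, `CLS = 4`, and `TJ = 24`, `max_tk` returns 41: the working set is 24*41 + 24 + 4 = 1012, which is below 1024, whereas `TK = 42` gives 1036 and violates the constraint.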

17 Tile Size Selection Algorithm(TSS)

18 Other Algorithms for Computing Tile Size
LRW
–improves the average cache performance
–sensitive to the array size
–ineffective cache utilization
ESS
–effective only for one-dimensional tiling
–no consideration of cross-interference

19 Conclusion
TSS incorporates the effects of cache line size and cross-interference between arrays
Performs better than ESS and LRW on direct-mapped caches and on caches of higher associativity
Sensitive to array dimension
Does not fully exploit temporal reuse for some matrix sizes

