A Data Locality Optimizing Algorithm based on A Data Locality Optimizing Algorithm by Michael E. Wolf and Monica S. Lam.

1 A Data Locality Optimizing Algorithm based on A Data Locality Optimizing Algorithm by Michael E. Wolf and Monica S. Lam

2 Outline
Introduction
The Problem
Loop Transformation Theory
  – Iteration Space
  – Matrix Form of Loop Transformations
The Localized Vector Space
  – Tiling
  – Reuse and Locality
The SRP Algorithm

3 Introduction As processor speed continues to increase faster than memory speed, optimizations to use the memory hierarchy efficiently become ever more important. Tiling is a well known technique that improves the data locality of numerical algorithms.

4 Introduction (cont.) Let's consider the example of matrix multiplication:

    for I1 := 1 to n
      for I2 := 1 to n
        for I3 := 1 to n
          C[I1,I3] += A[I1,I2] * B[I2,I3]

The same loop nest with the I2 and I3 loops tiled in blocks of size s, so that data brought into the cache can be reused:

    for II2 := 1 to n by s
      for II3 := 1 to n by s
        for I1 := 1 to n
          for I2 := II2 to min(II2 + s - 1, n)
            for I3 := II3 to min(II3 + s - 1, n)
              C[I1,I3] += A[I1,I2] * B[I2,I3]
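A minimal Python sketch of the two loop nests above, assuming 0-based indexing and plain nested lists (the function names `matmul_naive` and `matmul_tiled` are illustrative, not from the slides). It only shows that the tiled order visits the same iterations and produces the same result; it does not measure cache behaviour.

```python
def matmul_naive(A, B, n):
    """Untiled triple loop, in the same loop order as the slide."""
    C = [[0] * n for _ in range(n)]
    for i1 in range(n):
        for i2 in range(n):
            for i3 in range(n):
                C[i1][i3] += A[i1][i2] * B[i2][i3]
    return C


def matmul_tiled(A, B, n, s):
    """Loops I2 and I3 tiled with tile size s; II2 and II3 step between tiles."""
    C = [[0] * n for _ in range(n)]
    for ii2 in range(0, n, s):
        for ii3 in range(0, n, s):
            for i1 in range(n):
                # min(...) clips the last tile when s does not divide n
                for i2 in range(ii2, min(ii2 + s, n)):
                    for i3 in range(ii3, min(ii3 + s, n)):
                        C[i1][i3] += A[i1][i2] * B[i2][i3]
    return C


if __name__ == "__main__":
    n, s = 5, 2
    A = [[r + c for c in range(n)] for r in range(n)]
    B = [[r * c + 1 for c in range(n)] for r in range(n)]
    print(matmul_naive(A, B, n) == matmul_tiled(A, B, n, s))
```

Integer inputs are used so that the comparison is exact even though the two versions accumulate the products in a different order.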

5 The Problem Matrix multiplication is a particularly simple example because it is both legal and advantageous to tile the entire nest. In general, it is not always possible to tile the entire loop nest. Let's consider the example of an abstraction of hyperbolic PDE code:

    for I1 := 0 to 5
      for I2 := 0 to 6
        A[I2 + 1] := 1/3 * (A[I2] + A[I2 + 1] + A[I2 + 2])

6 Loop Transformation Theory When direct tiling is not applicable, we must first apply loop transformations such as interchange, skewing and reversal. To reason about these transformations, we need some theory.

7 Iteration Space In our model, a loop nest of depth n corresponds to a finite convex polyhedron of iteration space $\mathbb{Z}^n$, bounded by the loop bounds. Each iteration in the loop corresponds to a node in the polyhedron, and is identified by its index vector $\vec{p} = (p_1, p_2, \ldots, p_n)$; $p_i$ is the loop index of the $i$-th loop in the nest, counting from outermost to innermost.

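8 Iteration Space (cont.) A dependence vector in an n-nested loop is denoted by a vector $\vec{d} = (d_1, d_2, \ldots, d_n)$. Each component $d_i$ is a possibly infinite range of integers, represented by $[d_i^{\min}, d_i^{\max}]$, where $d_i^{\min} \in \mathbb{Z} \cup \{-\infty\}$, $d_i^{\max} \in \mathbb{Z} \cup \{\infty\}$ and $d_i^{\min} \le d_i^{\max}$. A single dependence vector therefore represents a set of distance vectors, called its distance vector set: $\mathcal{E}(\vec{d}) = \{(e_1, \ldots, e_n) \mid e_i \in \mathbb{Z} \text{ and } d_i^{\min} \le e_i \le d_i^{\max}\}$.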

9 Iteration Space (cont.)

    for I1 := 0 to 5
      for I2 := 0 to 6
        A[I2 + 1] := 1/3 * (A[I2] + A[I2 + 1] + A[I2 + 2])

[Figure: iteration space, axes I1 and I2]

So, the dependences for our last example are D = {(0, 1), (1, 0), (1, –1)}.
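The set D can be checked by brute force. The sketch below is an assumption-laden illustration, not the dependence analysis a compiler would run: it replays the loop nest, remembers for every array element which iteration last wrote it and which iterations have read it since, and records the distance between each pair of conflicting accesses (flow, anti and output dependences). The name `stencil_dependences` is hypothetical.

```python
def stencil_dependences(n1=6, n2=7):
    """Collect loop-carried distance vectors for
    A[I2+1] := f(A[I2], A[I2+1], A[I2+2]), I1 in 0..n1-1, I2 in 0..n2-1."""
    last_write = {}     # element index -> iteration that last wrote it
    pending_reads = {}  # element index -> iterations that read it since that write
    deps = set()

    def distance(src, dst):
        return (dst[0] - src[0], dst[1] - src[1])

    for i1 in range(n1):
        for i2 in range(n2):
            here = (i1, i2)
            # The right-hand side reads three neighbouring elements.
            for elem in (i2, i2 + 1, i2 + 2):
                if elem in last_write:  # flow dependence: write, then read
                    deps.add(distance(last_write[elem], here))
                pending_reads.setdefault(elem, []).append(here)
            # The left-hand side writes one element.
            target = i2 + 1
            for reader in pending_reads.get(target, []):  # anti: read, then write
                deps.add(distance(reader, here))
            if target in last_write:  # output: write, then write
                deps.add(distance(last_write[target], here))
            last_write[target] = here
            pending_reads[target] = []

    deps.discard((0, 0))  # same-iteration pairs are not loop-carried
    return deps


if __name__ == "__main__":
    print(sorted(stencil_dependences()))
```

Only the nearest conflicting accesses are recorded, which is why the result is the small set of vectors shown on the slide rather than every pair of iterations that touch the same element.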

10 Matrix Form of Loop Transformations With dependences represented as vectors in the iteration space, loop transformations such as interchange, skewing and reversal can be represented as matrix transformations. For example, the matrix form of the loop interchange transformation that maps iteration $(p_1, p_2)$ to iteration $(p_2, p_1)$ is $T = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.

11 Matrix Form of Loop Transformations (cont.) Since a matrix transformation $T$ is a linear transformation on the iteration space, $T\vec{p}_2 - T\vec{p}_1 = T(\vec{p}_2 - \vec{p}_1)$. Therefore, if $\vec{d}$ is a distance vector in the original iteration space, then $T\vec{d}$ is a distance vector in the transformed iteration space.
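As a sketch of how this is used, the Python below applies the interchange matrix to the distance vectors of the hyperbolic PDE example. The legality test is background that the slides do not state explicitly: a transformation preserves the original execution order of dependent iterations only if every transformed distance vector remains lexicographically positive. The helper names are illustrative.

```python
def apply(T, d):
    """Matrix-vector product T * d over plain Python lists and tuples."""
    return tuple(sum(T[r][c] * d[c] for c in range(len(d))) for r in range(len(T)))


def lex_positive(d):
    """True if the first non-zero component of d is positive."""
    for x in d:
        if x > 0:
            return True
        if x < 0:
            return False
    return False  # the zero vector is not lexicographically positive


D = [(0, 1), (1, 0), (1, -1)]   # distance vectors of the PDE example
INTERCHANGE = [[0, 1], [1, 0]]  # maps (p1, p2) to (p2, p1)

transformed = [apply(INTERCHANGE, d) for d in D]
legal = all(lex_positive(d) for d in transformed)

if __name__ == "__main__":
    print(transformed)  # (1, -1) becomes (-1, 1)
    print(legal)        # so interchange alone is not legal here
```

The vector (1, –1) turns into (–1, 1) under interchange, which is why this loop nest cannot simply be interchanged or tiled as written.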

12 Interchange (Permutation) A permutation $\sigma$ on a loop nest transforms iteration $(p_1, \ldots, p_n)$ to $(p_{\sigma_1}, \ldots, p_{\sigma_n})$. This transformation can be expressed in matrix form as $I_\sigma$, the $n \times n$ identity matrix $I$ with rows permuted by $\sigma$. The loop interchange above is an n = 2 example of the general permutation transformation.

13 Reversal Reversal of the $i$-th loop is represented by the identity matrix, but with the $i$-th diagonal element equal to –1 rather than 1. For example, loop reversal of the outermost loop of a two-deep loop nest is represented as $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$.

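14 Skewing Skewing loop $I_j$ by an integer factor $f$ with respect to loop $I_i$ maps iteration $(p_1, \ldots, p_i, \ldots, p_j, \ldots, p_n)$ to $(p_1, \ldots, p_i, \ldots, p_j + f p_i, \ldots, p_n)$. For example, skewing the innermost loop of a two-deep loop nest by a factor $f$ with respect to the outermost loop is represented as $\begin{bmatrix} 1 & 0 \\ f & 1 \end{bmatrix}$.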
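The three elementary matrices can be built directly from these definitions. The following Python sketch uses 0-based loop positions and hypothetical helper names; it is meant only to make the definitions concrete.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def permutation_matrix(sigma):
    """Identity with rows permuted by sigma: row k selects component sigma[k]."""
    n = len(sigma)
    return [[1 if c == sigma[r] else 0 for c in range(n)] for r in range(n)]


def reversal_matrix(n, i):
    """Identity with the i-th diagonal element set to -1."""
    T = identity(n)
    T[i][i] = -1
    return T


def skew_matrix(n, j, i, f):
    """Identity with element (j, i) set to f: loop j is skewed by f w.r.t. loop i."""
    T = identity(n)
    T[j][i] = f
    return T


def apply(T, p):
    """Matrix-vector product T * p."""
    return tuple(sum(T[r][c] * p[c] for c in range(len(p))) for r in range(len(T)))


if __name__ == "__main__":
    p = (3, 5)
    print(apply(permutation_matrix((1, 0)), p))  # interchange
    print(apply(reversal_matrix(2, 0), p))       # reverse the outer loop
    print(apply(skew_matrix(2, 1, 0, 1), p))     # skew the inner loop by 1
```

For iteration (3, 5) the three calls give (5, 3), (–3, 5) and (3, 8) respectively, matching the mappings described on slides 12 to 14.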

15 The Localized Vector Space It is important to distinguish between reuse and locality. We say that a data item is reused if the same data is used in multiple iterations in a loop nest. Thus reuse is a measure that is inherent in the computation and does not depend on the particular way the loops are written. Reuse does not necessarily save a memory access, however: intervening iterations may flush the data out of the cache between uses.

16 The Localized Vector Space (cont.) Let's consider the following example:

    for I1 := 0 to n
      for I2 := 0 to n
        f(A[I1], A[I2])

Here, reference A[I2] touches different data within the innermost loop, but reuses the same elements across the outer loop.
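The difference between reuse and locality in this example can be made visible with a toy cache model. The sketch below assumes a fully associative LRU cache that holds individual array elements (so a cache line is one element); both the model and the name `count_misses` are illustrative assumptions, not part of the slides.

```python
from collections import OrderedDict


def count_misses(n, capacity):
    """Misses of an LRU cache of `capacity` elements on the accesses of
    f(A[I1], A[I2]) for I1, I2 in 0..n."""
    cache = OrderedDict()  # keys are element indices, oldest first
    misses = 0

    def touch(elem):
        nonlocal misses
        if elem in cache:
            cache.move_to_end(elem)        # mark as most recently used
        else:
            misses += 1
            cache[elem] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used

    for i1 in range(n + 1):
        for i2 in range(n + 1):
            touch(i1)  # A[I1]
            touch(i2)  # A[I2]
    return misses


if __name__ == "__main__":
    n = 7  # the array has n + 1 = 8 elements
    print(count_misses(n, capacity=8))  # whole array fits in the cache
    print(count_misses(n, capacity=4))  # array does not fit
```

When the cache holds the whole array, only the first touch of each element misses, so the reuse of A[I2] across the outer loop is exploited. With a smaller cache the elements of A[I2] are evicted before the outer loop comes back to them, and the same reuse yields no locality.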

17 Tiling In general, tiling transforms an n-deep loop nest into a 2n-deep loop nest where the inner n loops execute a compiler-determined number of iterations. For example, the result of applying tiling to our example of an abstraction of hyperbolic PDE code will look as follows:

    for II1 := 0 to 5 by 2
      for II2 := 0 to 11 by 2
        for I1 := II1 to min(II1 + 1, 5)
          for I2 := max(I1, II2) to min(6 + I1, II2 + 1)
            A[I2 - I1 + 1] := 1/3 * (A[I2 - I1] + A[I2 - I1 + 1] + A[I2 - I1 + 2])
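A direct Python transcription of the original and the tiled loop nests, given here as a sketch to check that both orders compute the same array. The inclusive pseudocode bounds become `range(lo, hi + 1)`, exact `Fraction` arithmetic is an assumption made so that the comparison does not depend on rounding, and the function names are illustrative.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)


def run_original(a):
    """for I1 := 0 to 5, for I2 := 0 to 6 (untransformed order)."""
    a = list(a)
    for i1 in range(0, 6):
        for i2 in range(0, 7):
            a[i2 + 1] = THIRD * (a[i2] + a[i2 + 1] + a[i2 + 2])
    return a


def run_tiled(a):
    """Skewed loop nest tiled into 2 x 2 blocks, as on the slide."""
    a = list(a)
    for ii1 in range(0, 6, 2):
        for ii2 in range(0, 12, 2):
            for i1 in range(ii1, min(ii1 + 1, 5) + 1):
                # i2 is the skewed index: original I2 plus I1
                for i2 in range(max(i1, ii2), min(6 + i1, ii2 + 1) + 1):
                    k = i2 - i1  # recover the original I2
                    a[k + 1] = THIRD * (a[k] + a[k + 1] + a[k + 2])
    return a


if __name__ == "__main__":
    initial = [Fraction(k * k) for k in range(9)]  # A[0..8]
    print(run_original(initial) == run_tiled(initial))
```

The two versions agree because, after skewing, every dependence has non-negative components, so executing the iteration space tile by tile never runs an iteration before one it depends on.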

18 Tiling (cont.) Because the outer loops of the tiled code control the execution of the tiles, we refer to them as the controlling loops. By tiling we mean the partitioning of the iteration space into rectangular blocks. Non-rectangular blocks are obtained by first applying unimodular transformations to the iteration space and then applying tiling.

19 Tiling (cont.) [Figure: axes II1 and II2, accompanying the tiled code from slide 17]

20 Reuse and Locality Since unimodular transformations and tiling can modify the localized vector space, knowing where there is reuse in the iteration space can help guide the search for the transformation that delivers the best locality. Also, to choose between alternate transformations that exploit different reuses in a loop nest, we need a metric to quantify locality for a specific localized iteration space.

21 Types of Reuse Reuse occurs when a reference within a loop accesses the same data location in different iterations. We call this self-temporal reuse. Likewise, if a reference accesses data on the same cache line in different iterations, it is said to possess self-spatial reuse. Furthermore, different references may access the same locations. We say that there is group-temporal reuse if the references refer to the same location, and group-spatial reuse if they refer to the same cache line.

22 The SRP Algorithm Combining these pieces gives an algorithm known as SRP, because the unimodular transformations it performs can be expressed as the product of a skew transformation (S), a reversal transformation (R) and a permutation transformation (P).

23 The SRP Algorithm Let us illustrate the SRP algorithm using our example of an abstraction of hyperbolic PDE code:

    for I1 := 0 to 5
      for I2 := 0 to 6
        A[I2 + 1] := 1/3 * (A[I2] + A[I2 + 1] + A[I2 + 2])

First an outer loop must be chosen. I1 can be the outer loop, because its dependence components are all non-negative. Loop I2 has a dependence component that is negative, but it can be made non-negative by skewing with respect to I1.
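The skewing step just described can be sketched in Python as follows. This is a simplified illustration that assumes every dependence is a constant distance vector; the full SRP algorithm also handles dependence ranges, reversal and permutation, which are not modelled here. The names `min_skew_factor` and `skew` are hypothetical.

```python
def min_skew_factor(deps, outer=0, inner=1):
    """Smallest integer f >= 0 such that d[inner] + f * d[outer] >= 0
    for every distance vector d in deps."""
    f = 0
    for d in deps:
        if d[inner] >= 0:
            continue  # this component is already non-negative
        if d[outer] <= 0:
            raise ValueError("skewing w.r.t. the outer loop cannot fix %r" % (d,))
        # ceil(-d[inner] / d[outer]) using integer floor division
        f = max(f, -(d[inner] // d[outer]))
    return f


def skew(deps, f, outer=0, inner=1):
    """Apply the skew: the inner component gains f times the outer component."""
    result = []
    for d in deps:
        e = list(d)
        e[inner] += f * e[outer]
        result.append(tuple(e))
    return result


if __name__ == "__main__":
    D = [(0, 1), (1, 0), (1, -1)]
    f = min_skew_factor(D)
    print(f, skew(D, f))
```

For D = {(0, 1), (1, 0), (1, –1)} the factor is 1 and the skewed vectors are (0, 1), (1, 1) and (1, 0). All components are then non-negative, which is the condition under which both loops can be placed in one fully permutable nest.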

24 The SRP Algorithm (cont.)

    for I1 := 0 to 5
      for I2 := I1 to 6 + I1
        A[I2 - I1 + 1] := 1/3 * (A[I2 - I1] + A[I2 - I1 + 1] + A[I2 - I1 + 2])

[Figure: skewed iteration space, axes I1 and I2]

Loop I2 is now placed in the same fully permutable nest as I1; the loop nest is tilable.

25 The SRP Algorithm (cont.)

    for II2 := 0 to 11 by 2
      for I1 := 0 to 5
        for I2 := max(I1, II2) to min(6 + I1, II2 + 1)
          A[I2 - I1 + 1] := 1/3 * (A[I2 - I1] + A[I2 - I1 + 1] + A[I2 - I1 + 2])

[Figure: iteration space, axes I1 and I2]

Loop I2 is now placed in the same fully permutable nest as I1; the loop nest is tilable.

26 The End.

