DATA LOCALITY & ITS OPTIMIZATION TECHNIQUES Presented by Preethi Rajaram CSS 548 Introduction to Compilers Professor Carol Zander Fall 2012.


1 DATA LOCALITY & ITS OPTIMIZATION TECHNIQUES Presented by Preethi Rajaram CSS 548 Introduction to Compilers Professor Carol Zander Fall 2012

2 Why?
- Processor speed is increasing at a faster rate than memory speed.
- Computer architectures are adding more levels of cache memory.
- Caches take advantage of data locality.
- Good data locality gives good application performance.
- Poor data locality reduces the effectiveness of the cache.

3 Data Locality
- Data locality is the property that the same memory location, or adjacent locations, are referenced again within a short period of time.
- Temporal locality: the same location is reused.
- Spatial locality: adjacent locations are reused.
- Fig: Program to find the squares of the differences (a) without loop fusion (b) with loop fusion [Image from: The Dragon book 2nd edition]
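The loop fusion named in the figure caption can be sketched as follows. This is a minimal illustration, not the book's figure: the function names and the use of Python lists are assumptions, and Python will not show the cache effect itself, only the change in iteration structure.

```python
def squares_unfused(x, y):
    """Two separate passes: z[i] is written in pass 1 and reused much later in pass 2."""
    n = len(x)
    z = [0] * n
    for i in range(n):
        z[i] = x[i] - y[i]
    for i in range(n):
        z[i] = z[i] * z[i]
    return z


def squares_fused(x, y):
    """One fused pass: each z[i] is reused immediately after it is produced."""
    n = len(x)
    z = [0] * n
    for i in range(n):
        z[i] = x[i] - y[i]
        z[i] = z[i] * z[i]
    return z
```

Both versions compute the same result; the fused one shortens the time between the write and the reuse of each element, which is the temporal locality the slide describes.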

4 Matrix Multiplication - Example
- Fig: Basic Matrix Multiplication Algorithm [Image from: The Dragon book 2nd edition]
- Poor data locality:
  - N^2 multiply-add operations separate reuses of the same data element in matrix Y.
  - N operations separate reuses of the same cache line in Y.
- Solutions:
  - Changing the layout of the data structures
  - Blocking
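The two reuse distances quoted for matrix Y can be checked with a small trace. This is a sketch under stated assumptions: the loop order is i, j, k with the statement Z[i][j] += X[i][k] * Y[k][j], Y is stored row-major, and one row of Y stands in for one cache line. The function name `reuse_distances` is hypothetical.

```python
def reuse_distances(n):
    """Trace the i, j, k loop nest and record how many multiply-add steps
    separate consecutive accesses to the same element / same row of Y."""
    step = 0
    last_elem = {}  # (k, j) -> step of the previous access to Y[k][j]
    last_row = {}   # k -> step of the previous access to any element in row k
    elem_dist, row_dist = set(), set()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if (k, j) in last_elem:
                    elem_dist.add(step - last_elem[(k, j)])
                if k in last_row:
                    row_dist.add(step - last_row[k])
                last_elem[(k, j)] = step
                last_row[k] = step
                step += 1
    return elem_dist, row_dist
```

For n = 4 the trace gives a distance of 16 (n^2) between reuses of one element of Y and 4 (n) between reuses of one row of Y.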

5 Matrix Multiplication - Example Contd…
- Changing the data structure layout
  - Store Y in column-major order.
  - Improves reuse of the cache lines of matrix Y.
  - Limited applicability.
- Blocking
  - Changes the execution order of the instructions.
  - Divide the matrix into submatrices, or blocks.
  - Order the operations so that an entire block is used over a short period of time.
  - Choose the block size B so that one block from each of the matrices fits into the cache.
- Image from: The Dragon book 2nd edition
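Blocking can be sketched as below. This is an illustrative version, not the figure from the slide: the function names, the square-matrix assumption, and the block-size parameter `b` are assumptions made for the example.

```python
def matmul_naive(x, y):
    """Basic i, j, k matrix multiplication on square lists of lists."""
    n = len(x)
    z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                z[i][j] += x[i][k] * y[k][j]
    return z


def matmul_blocked(x, y, b):
    """Same arithmetic, reordered so that b-by-b blocks are used together."""
    n = len(x)
    z = [[0] * n for _ in range(n)]
    for ii in range(0, n, b):          # block row of Z and X
        for jj in range(0, n, b):      # block column of Z and Y
            for kk in range(0, n, b):  # block column of X, block row of Y
                for i in range(ii, min(ii + b, n)):
                    for j in range(jj, min(jj + b, n)):
                        for k in range(kk, min(kk + b, n)):
                            z[i][j] += x[i][k] * y[k][j]
    return z
```

The `min(..., n)` bounds let the sketch handle sizes that are not a multiple of `b`. Only the order of the multiply-add operations changes, so with integer inputs the results are identical.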

6 Data Reuse
- Locality optimization: identify the sets of iterations that access the same data or the same cache line.
- Static access: an instruction in a program, e.g. x = z[i,j].
- Dynamic access: one execution of that instruction; a static access in a loop nest is executed many times.
- Types of reuse:
  - Self: the iterations using the same data come from the same static access.
  - Group: the iterations using the same data come from different static accesses.
  - Temporal: the exact same location is referenced.
  - Spatial: the same cache line is referenced.

7 Self Temporal Reuse
- Exploiting self reuse can save a substantial number of memory accesses.
- Data with k dimensions accessed in a loop nest of depth d is reused n^(d-k) times. For example, if a 3-deep loop nest accesses one column of an array, there is a potential savings factor of n^2 accesses.
- Dimensionality of the access: the rank of the coefficient matrix in the access.
- Iterations referring to the same location: given by the null space of that matrix.
- Rank of a matrix: the number of rows (equivalently, columns) that are linearly independent.
- Null space of a matrix: a reference with rank r in a d-deep loop nest accesses O(n^r) data elements in O(n^d) iterations, so on average O(n^(d-r)) iterations must refer to the same array element.
- Example (figure): 2nd row = 1st + 3rd and 4th row = 3rd - 2 × 1st, so rank = dimensionality = 2. With loop depth = 3 and rank = 2, nullity = 3 - 2 = 1.
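The rank and nullity reasoning can be reproduced with a small exact-arithmetic routine. The matrix in the slide's figure is not part of this transcript, so the 4 × 3 matrix `F` below is an assumption: it was chosen only because it satisfies the two row relations stated on the slide. The helper names are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * p for a, p in zip(rows[i], rows[r])]
        r += 1
    return r


def nullity(matrix, depth):
    """Dimension of the null space for a loop nest of the given depth."""
    return depth - rank(matrix)


# Assumed example: row 2 = row 1 + row 3, row 4 = row 3 - 2 * row 1.
F = [[1, 2, 3],
     [5, 7, 9],
     [4, 5, 6],
     [2, 1, 0]]
```

With this `F`, the rank is 2 and the nullity for a 3-deep loop nest is 1, so the iterations that touch one array element lie along a single direction in the iteration space; for this particular matrix that direction is (1, -2, 1).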

8 Self Spatial Reuse
- Depends on the data layout of the matrix, e.g. row-major order.
- In an array of d dimensions, array elements share a cache line if they differ only in the last dimension. For example, as an approximation, two elements of a 2-D array share a cache line if and only if they are in the same row.
- The truncated matrix is obtained by dropping the last row of the coefficient matrix.
- If the truncated matrix has rank r less than the loop depth d, spatial reuse can be assured.
- Example (figure): truncated matrix with r = 1 and d = 2; r < d, so spatial reuse can be assured.
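The truncated-matrix test can be sketched in a few lines. The slide's figure is not in the transcript, so the coefficient matrix used here (the identity, for an access such as z[i,j] in a 2-deep nest) is an assumption, as are the function names. The rank helper uses exact fractions so that the sketch needs no external library.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * p for a, p in zip(rows[i], rows[r])]
        r += 1
    return r


def truncated(matrix):
    """Drop the last row, i.e. ignore the last array dimension."""
    return matrix[:-1]


def has_self_spatial_reuse(matrix, depth):
    """Slide's criterion: rank of the truncated matrix is below the loop depth."""
    return rank(truncated(matrix)) < depth


# Assumed access z[i,j] in a 2-deep loop nest: identity coefficient matrix.
Z_IJ = [[1, 0],
        [0, 1]]
```

For `Z_IJ` the truncated matrix is the single row [1, 0] with rank 1, which is below the depth of 2, matching the r = 1, d = 2 case on the slide.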

9 Group Reuse
- Group reuse is considered only among accesses in a loop that share the same coefficient matrix.
- Fig: 2-deep loop nest [Image from: The Dragon book 2nd edition]
- z[i,j] and z[i-1,j] access almost the same set of array elements.
- The data read by access z[i-1,j] is the same as the data written by z[i,j], except for i = 1.
- Coefficient matrix: rank = 2, so there is no self-temporal reuse.
- Truncated matrix: rank = 1, so there is self-spatial reuse.
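The overlap between the two accesses can be checked by enumerating them. The loop bounds below (i and j both running from 1 to n) are an assumption, since the figure with the loop nest is not part of the transcript; the function name is hypothetical.

```python
def access_sets(n):
    """Collect the elements touched by the read z[i-1,j] and the write z[i,j]
    in an assumed loop nest with i and j running from 1 to n inclusive."""
    read, written = set(), set()
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            read.add((i - 1, j))   # element read by z[i-1,j]
            written.add((i, j))    # element written by z[i,j]
    return read, written
```

Under these bounds, the only elements that are read but never written are those in row 0, which are exactly the reads performed when i = 1.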

10 Locality Optimization
- Temporal locality of data: use the results as soon as they are generated.
- Fig: Code excerpt for a multigrid algorithm (a) before partition (b) after partition [Image from: The Dragon book 2nd edition]

11 Locality Optimization Contd…
- Array contraction: reduce the dimension of the array, and with it the number of memory locations accessed.
- Fig: Code excerpt for a multigrid algorithm after partition and after array contraction [Image from: The Dragon book 2nd edition]
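Array contraction can be illustrated with a small hypothetical kernel. This is not the multigrid excerpt from the figure, which is not reproduced in the transcript; the arrays `a`, `b`, `d`, the temporary `t`, and both function names are invented for the sketch.

```python
def two_phase(a, b):
    """Before: a full 2-D temporary t is filled, then consumed in a second pass."""
    n = len(a)
    t = [[0] * n for _ in range(n)]
    d = [[0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            t[j][i] = a[j][i] + b[j][i]
    for j in range(n):
        for i in range(1, n):
            d[j][i] = t[j][i - 1] + t[j][i]
    return d


def partitioned_contracted(a, b):
    """After: each j is one partition, so t shrinks to a 1-D buffer reused per j."""
    n = len(a)
    t = [0] * n  # contracted temporary: n locations instead of n * n
    d = [[0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            t[i] = a[j][i] + b[j][i]
        for i in range(1, n):
            d[j][i] = t[i - 1] + t[i]  # results used soon after they are produced
    return d
```

Because the work for each j is independent in this sketch, the temporary only needs to hold one row at a time, which is why the dimension can be dropped.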

12 Locality Optimization Contd…
- Instead of executing each partition one after the other, we interleave a number of the partitions so that reuse among partitions occurs close together.
- Interleaving inner loops in a parallel loop
  - Fig: Interleaving four instances of the inner loop [Image from: The Dragon book 2nd edition]
- Interleaving statements in a parallel loop
  - Fig: The statement interleaving transformation [Image from: The Dragon book 2nd edition]
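Interleaving instances of the inner loop can be sketched as a change in iteration order. The figure is not in the transcript, so the shape below (strip-mining the parallel outer loop by a `width` of four and moving the strip loop innermost) is an assumption about what it shows; the function names are hypothetical.

```python
def original_order(n, m):
    """Outer parallel loop over i, inner loop over j: one partition at a time."""
    return [(i, j) for i in range(n) for j in range(m)]


def interleaved_order(n, m, width=4):
    """Run `width` instances of the inner loop together, one j step at a time."""
    order = []
    for ii in range(0, n, width):                    # strip of outer iterations
        for j in range(m):                           # shared inner loop
            for i in range(ii, min(n, ii + width)):  # interleaved instances
                order.append((i, j))
    return order
```

Both orders visit the same set of iterations. This reordering is only legal when the outer loop is parallel, that is, when its iterations do not depend on one another.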

13 References
- Wolf, Michael E., and Monica S. Lam. "A data locality optimizing algorithm." ACM SIGPLAN Notices 26.6 (1991): 30-44.
- McKinley, Kathryn S., Steve Carr, and Chau-Wen Tseng. "Improving data locality with loop transformations." ACM Transactions on Programming Languages and Systems (TOPLAS) 18.4 (1996): 424-453.
- Bodin, François, et al. "A quantitative algorithm for data locality optimization." Code Generation: Concepts, Tools, Techniques (1992): 119-145.
- Kennedy, Ken, and Kathryn S. McKinley. "Optimizing for parallelism and data locality." Proceedings of the 6th International Conference on Supercomputing. ACM, 1992.
- Aho, A., M. Lam, R. Sethi, and J. Ullman. Compilers: Principles, Techniques, and Tools. 2nd edition. Addison-Wesley.

14 Thank You! Questions??

