A Unified Framework for Schedule and Storage Optimization


A Unified Framework for Schedule and Storage Optimization William Thies, Frédéric Vivien*, Jeffrey Sheldon, and Saman Amarasinghe MIT Laboratory for Computer Science * ICPS/LSIIT, Université Louis Pasteur http://compiler.lcs.mit.edu/aov

Motivating Example for i = 1 to n for j = 1 to n A[j] = f(A[j], A[j-2])

Motivating Example The loop executes sequentially, one iteration per time step: t = 1 at (i, j) = (1, 1), t = 2 at (1, 2), ..., t = 5 at (1, 5), t = 6 at (2, 1), ..., t = 10 at (2, 5), ..., t = 25 at (5, 5), for n = 5. for i = 1 to n for j = 1 to n A[j] = f(A[j], A[j-2])
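The sequential execution shown in these frames can be sketched in a few lines of Python (an illustration, not part of the original slides; n = 5 as in the frames):

```python
# Sketch: the original loop runs one iteration per time step,
# so iteration (i, j) executes at t = (i - 1) * n + j.
n = 5
t = 0
times = {}
for i in range(1, n + 1):
    for j in range(1, n + 1):
        t += 1
        times[(i, j)] = t

assert times[(1, 1)] == 1   # t = 1
assert times[(2, 1)] == 6   # t = 6, as on the slides
assert times[(n, n)] == 25  # t = 25 for n = 5
```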

Motivating Example Array Expansion init A[0][j] for i = 1 to n for j = 1 to n A[i][j] = f(A[i-1][j], A[i][j-2]) After expansion, anti-diagonal wavefronts execute in parallel: t = 1 at (1, 1); t = 2 at {(1, 2), (2, 1)}; t = 3 at {(1, 3), (2, 2), (3, 1)}; ... t = 5 at {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}; ... t = 9 at (5, 5). The nest finishes in 9 steps instead of 25.
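The wavefront execution in these frames can be checked mechanically (a hedged sketch, not from the slides): the schedule t = i + j - 1 respects both dependences of the expanded code and finishes in 2n - 1 steps.

```python
# Sketch: after array expansion, the wavefront schedule t = i + j - 1
# satisfies both dependences and finishes in 2n - 1 steps, not n * n.
n = 5

def theta(i, j):
    return i + j - 1

for i in range(1, n + 1):
    for j in range(1, n + 1):
        if i > 1:                                 # producer (i-1, j)
            assert theta(i - 1, j) < theta(i, j)
        if j > 2:                                 # producer (i, j-2)
            assert theta(i, j - 2) < theta(i, j)

makespan = theta(n, n)
assert makespan == 2 * n - 1  # 9 steps for n = 5, vs 25 sequentially
```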

Parallelism/Storage Tradeoff Increasing storage can enable parallelism, but storage can be expensive (cache / RAM / disk). Phase ordering problem: optimizing for storage restricts parallelism, while maximizing parallelism restricts storage options. It is too complex to consider all combinations; we need an efficient framework to integrate schedule and storage optimization. Here, “efficient framework” means a linear programming problem.

Outline Abstract problem Simplifications Concrete problem Solution Method Conclusions

Abstract Problem Given a DAG of dependent operations: we must execute producers before consumers, and must store a value until all its consumers execute. Two parameters control execution: 1. A scheduling function Θ that maps each operation to an execution time (parallelism is implicit). 2. A fully associative store of size m.

Abstract Problem We can ask three questions: 1. Given Θ, what is the smallest m? 2. Given m, what is the “best” Θ? 3. What is the smallest m that is valid for all legal Θ?

Outline Abstract problem Simplifications Concrete problem Solution Method Conclusions

Simplifying the Schedule Real programs aren’t DAGs: the dependence graph is parameterized by loops, there are too many nodes to schedule, and the size may even be unknown (symbolic constants). Use the classical solution: affine schedules. Each statement has a scheduling function Θ, an affine function of the enclosing loop counters and symbolic constants. To simplify the talk, ignore symbolic constants: Θ(i) = B · i
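An affine schedule is easy to picture concretely (a sketch, not from the slides; the coefficient vector B = (1, 1) is the wavefront schedule used later in the talk):

```python
from collections import defaultdict

# Sketch: an affine schedule is just a coefficient vector B with
# theta(i) = B . i.  Parallelism is implicit: all iterations mapped
# to the same time may run in parallel.
n = 5
B = (1, 1)  # theta(i, j) = i + j

def theta(i, j):
    return B[0] * i + B[1] * j

wavefronts = defaultdict(list)
for i in range(1, n + 1):
    for j in range(1, n + 1):
        wavefronts[theta(i, j)].append((i, j))

# the widest time step holds n iterations that could execute in parallel
assert max(len(w) for w in wavefronts.values()) == n
```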

Simplifying the Storage Mapping Programs use arrays, not associative maps If size decreases, need to specify which elements are mapped to the same location


Simplifying the Storage Mapping Occupancy Vectors (Strout et al.) Specifies the unit of overwriting within an array; locations are collapsed if separated by a multiple of v. Example: v = (1, 1)

Occupancy Vectors (Strout et al.) For a given schedule, v is valid if the semantics are unchanged using the transformed array. Shorter vectors require less storage. Example: v = (1, 1)
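One way to read the occupancy-vector idea in code (a hedged sketch, not from the slides): with v = (1, 1), iterations (i, j) and (i+1, j+1) share a storage cell, so the diagonal index d = j - i addresses a collapsed array of 2n - 1 cells instead of the n * n cells of full expansion. The function f and the initial values below are made up purely for illustration; the schedule Θ(i, j) = i + j with v = (1, 1) is the valid combination the talk arrives at in Question #1.

```python
# Sketch: collapsed storage under occupancy vector v = (1, 1).
n = 5

def f(a, b):             # made-up computation for illustration
    return a + 2 * b + 1

def init(i, j):          # made-up boundary values (i = 0 or j <= 0)
    return 10 * i + j

# Reference: fully expanded n x n array.
ref = {}
def ref_val(i, j):
    return init(i, j) if i < 1 or j < 1 else ref[(i, j)]
for i in range(1, n + 1):
    for j in range(1, n + 1):
        ref[(i, j)] = f(ref_val(i - 1, j), ref_val(i, j - 2))

# Collapsed store keyed by d = j - i, run with the wavefront schedule
# theta(i, j) = i + j.  The storage constraint is inclusive, so within
# one time step all reads happen before any cell is overwritten.
store = {}
def read(i, j):
    return init(i, j) if i < 1 or j < 1 else store[j - i]

computed = {}
for t in range(2, 2 * n + 1):
    cells = [(i, t - i) for i in range(1, n + 1) if 1 <= t - i <= n]
    pending = [((i, j), f(read(i - 1, j), read(i, j - 2))) for i, j in cells]
    for (i, j), val in pending:   # commit writes only after all reads
        store[j - i] = val
        computed[(i, j)] = val

assert computed == ref            # same results with only 2n - 1 cells
assert len(store) == 2 * n - 1
```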

Outline Abstract problem Simplifications Concrete problem Solution Method Conclusions

Answering Question #1 Given Θ(i, j) = i + j, what is the shortest valid occupancy vector v?

Answering Question #1 Given Θ(i, j) = i + j, what is the shortest valid occupancy vector v? Solution: v = (1, 1)

Answering Question #1 Given Θ(i, j) = i + j, why not v = (0, 1)? Because the value produced at (i, j-2) would be overwritten at time i + j - 1 by iteration (i, j-1), before its consumer (i, j) executes at time i + j.
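This answer can be checked directly against the two constraints (a sketch, not from the slides; the dependences h1(i, j) = (i-1, j) and h2(i, j) = (i, j-2) come from the motivating example A[j] = f(A[j], A[j-2])):

```python
# Sketch: check a (theta, v) pair against the schedule and storage
# constraints of the motivating example.
n = 5
deps = [lambda i, j: (i - 1, j), lambda i, j: (i, j - 2)]

def theta(i, j):
    return i + j

def valid(v):
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for h in deps:
                pi, pj = h(i, j)
                if pi < 1 or pj < 1:
                    continue  # dependence source outside the domain
                # schedule constraint: consumer after producer
                if not theta(i, j) >= theta(pi, pj) + 1:
                    return False
                # storage constraint: consumer no later than the
                # overwriting producer h(i) + v (inclusive)
                if not theta(i, j) <= theta(pi + v[0], pj + v[1]):
                    return False
    return True

assert valid((1, 1))        # the solution from the slides
assert not valid((0, 1))    # A[j-2]'s value dies one step too early
```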

Answering Question #2 Given v = (0, 1), what is the range of valid schedules Θ?

Answering Question #2 Given v = (0, 1), what is the range of valid schedules Θ? Θ(i, j) is between: Θ(i, j) = 2·i + j (inclusive) and Θ(i, j) = i (exclusive)

Answering Question #2 Given v = (0, 1), what is the range of valid schedules Θ? Let’s try Θ(i, j) = 2·i + j

Answering Question #3 What is the shortest v that is valid for all legal affine schedules? Consider the range of legal Θ.

Answering Question #3 What is the shortest v that is valid for all legal affine schedules? Over the whole range of legal Θ: v = (2, 1). Def: such a v is an affine occupancy vector (AOV)

Outline Abstract problem Simplifications Concrete problem Solution Method Conclusions

Schedule Constraints Dependence analysis (a classical technique the audience should be familiar with) yields: iteration i depends on iteration h(i), where h is an affine function. The consumer must execute after the producer. Schedule Constraint: Θ(i) ≥ Θ(h(i)) + 1

Storage Constraints Start from dynamic single assignment, where every iteration writes a distinct location: for i = 1 to n for j = 1 to n A[i][j] = … B[i][j] = …

Storage Constraints Consumer: i. Producer: h(i). Overwriting producer: h(i) + v. The consumer must execute before the producer’s value is overwritten. Storage Constraint: Θ(i) ≤ Θ(h(i) + v)

The Constraints A given (Θ, v) combination is valid if, for all dependences h and for all iterations i in the program: Θ(i) ≥ Θ(h(i)) + 1 (schedule constraint) Θ(i) ≤ Θ(h(i) + v) (storage constraint) Given Θ, we want to find a v satisfying the constraints. This might look simple, but it is not: there are too many i’s and n’s to enumerate! We need to reduce the number of constraints.

The Vertex Method (1-D) An affine function is non-negative within an interval [x1, x2] iff it is non-negative at x1 and x2

The Vertex Method (1-D) An affine function is non-negative over an unbounded interval [x1, ∞) iff it is non-negative at x1 and is non-decreasing along the interval

The Vertex Method The same result holds in higher dimensions: an affine function is nonnegative over a bounded polyhedron D iff it is nonnegative at the vertices of D. (A polyhedron is just a region bounded by planes; everything we encounter here is a polyhedron.)
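The claim is easy to confirm numerically (a sketch, not from the slides; the affine function below is arbitrary): over a box, an affine function attains its minimum at a corner, so checking nonnegativity at the vertices suffices.

```python
from itertools import product

# Sketch: an affine function over a box takes its minimum at a corner.
def affine(i, j):
    return 3 * i - 2 * j + 7   # arbitrary affine example

lo, hi = 1, 5
corners = [(lo, lo), (lo, hi), (hi, lo), (hi, hi)]
grid_min = min(affine(i, j)
               for i, j in product(range(lo, hi + 1), repeat=2))
corner_min = min(affine(i, j) for i, j in corners)
assert grid_min == corner_min  # checking 4 points suffices, not 25
```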

Applying the Method (Quinton87) Recall the storage constraints. For all iterations i in the program: Θ(i) ≤ Θ(h(i) + v) Re-arrange: 0 ≤ Θ(h(i) + v) - Θ(i) The right-hand side is: 1. an affine function of i, 2. nonnegative over the domain D of iterations. We can apply the vertex method.

Applying the Method Replace i with the vertices w of its domain: ∀i ∈ D, Θ(h(i) + v) - Θ(i) ≥ 0 becomes Θ(h(w1) + v) - Θ(w1) ≥ 0 Θ(h(w2) + v) - Θ(w2) ≥ 0 Θ(h(w3) + v) - Θ(w3) ≥ 0 Θ(h(w4) + v) - Θ(w4) ≥ 0 where w1 through w4 are the vertices of the iteration space.

The Reduced Constraints Apply the same method to the schedule constraints. Now a given (Θ, v) combination is valid if, for all dependences h and for all vertices w of the iteration domain: Θ(w) ≥ Θ(h(w)) + 1 (schedule constraint) Θ(w) ≤ Θ(h(w) + v) (storage constraint) These are linear constraints: Θ and v are variables; h and w are constants. Given Θ, the constraints are linear in v (and vice-versa).
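With the constraints reduced to the domain vertices, finding a short occupancy vector for a fixed schedule becomes a small optimization problem. A real implementation would hand this to an LP solver, as the talk says; the sketch below (not from the slides) uses a tiny brute-force search instead, assuming the motivating example's dependences and Θ(i, j) = i + j:

```python
from itertools import product

# Sketch: search for the shortest v, checking the storage constraint
# only at the 4 vertices of the iteration domain [1, n] x [1, n].
n = 5
theta = lambda i, j: i + j
deps = [lambda i, j: (i - 1, j), lambda i, j: (i, j - 2)]
vertices = [(1, 1), (1, n), (n, 1), (n, n)]

def ok(v):
    for h in deps:
        for (i, j) in vertices:
            pi, pj = h(i, j)
            # storage constraint at vertex w: theta(w) <= theta(h(w) + v)
            if theta(i, j) > theta(pi + v[0], pj + v[1]):
                return False
    return True

best = min((v for v in product(range(-3, 4), repeat=2) if ok(v)),
           key=lambda v: (v[0] ** 2 + v[1] ** 2, v))
assert best == (1, 1)   # the answer from Question #1
```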

Answering the Questions Θ(w) ≥ Θ(h(w)) + 1 (schedule constraint) Θ(w) ≤ Θ(h(w) + v) (storage constraint) 1. Given Θ, we can “minimize” |v|: a linear programming problem. 2. Given v, we can find a “good” Θ (Feautrier, 1992). 3. To find an AOV... still too many constraints! For all Θ satisfying the schedule constraints, v must satisfy the storage constraints.

Finding an AOV Apply the vertex method again! The schedule constraints define the domain of valid Θ. The storage constraints can be written as a nonnegative affine function of the components of Θ: Expand Θ(i) = B · i: B · w ≤ B · (h(w) + v) Simplify: (h(w) + v - w) · B ≥ 0

Finding an AOV Our constraints are now as follows: for all dependences h, for all vertices w of the iteration domain, and for all vertices t of the space of valid schedules: t · w ≤ t · (h(w) + v) (AOV constraint) We can find the “shortest” AOV with a linear program: there is a finite number of constraints, and h, w, and t are known constants.

The Big Picture Input program → (dependence analysis) → Affine Dependences → Schedule & Storage Constraints → (vertex method) → Constraints without i → (linear program) → Given Θ, find v / Given v, find Θ. Applying the vertex method again yields Constraints without Θ → (linear program) → Find an AOV, valid for all Θ.

Details in Paper Symbolic constants Inter-statement dependences across loops Farkas’ Lemma for improved efficiency

Related Work Universal Occupancy Vector (Strout et al.): valid for all schedules, not just affine ones; stencil of dependences in a single loop nest; answered only the 3rd question, with a branch-and-bound algorithm. Storage for ALPHA programs (Quilleré, Rajopadhye, Wilde): polyhedral model, with an occupancy-vector analog; assumes the schedule is given. PAF compiler (Cohen, Lefebvre, Feautrier, at the University of Versailles in France): minimal expansion → scheduling → contraction; storage mapping A[i mod x][j mod y].

Future Work Allow affine left-hand-side references: A[2j][n-i] = … Consider multi-dimensional time schedules. Collapse multiple dimensions of storage.

Conclusions Unified framework for determining: 1. A good storage mapping for a given schedule 2. A good schedule for a given storage mapping 3. A good storage mapping for all valid schedules Take away: representations and techniques Occupancy vectors Affine schedules Vertex method