Automatic Parallelization of Divide and Conquer Algorithms Radu Rugina and Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.


Outline
- Example
- Information required to parallelize divide and conquer algorithms
- How the compiler extracts parallelism
- Key technique: constraint systems
- Results
- Related work
- Conclusion

Example - Divide and Conquer Sort
(Diagram slides illustrating the three phases: divide, conquer, combine)

Divide and Conquer Algorithms
- Lots of recursively generated concurrency
  - Recursively solve subproblems in parallel
  - Combine results in parallel
- Good cache performance
  - Problems naturally scale to fit in cache
  - No cache size constants in code
- Lots of programs
  - Sort programs
  - Dense matrix programs

“Sort n Items in d, Using t as Temporary Storage”

void sort(int *d, int *t, int n) {
    if (n > CUTOFF) {
        sort(d, t, n/4);
        sort(d+n/4, t+n/4, n/4);
        sort(d+n/2, t+n/2, n/4);
        sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
        merge(d, d+n/4, d+n/2, t);
        merge(d+n/2, d+3*(n/4), d+n, t+n/2);
        merge(t, t+n/2, t+n, d);
    } else {
        insertionSort(d, d+n);
    }
}

“Recursively Sort Four Quarters of d”

    sort(d, t, n/4);
    sort(d+n/4, t+n/4, n/4);
    sort(d+n/2, t+n/2, n/4);
    sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));

The subproblems are identified using pointers into the middle of the array: d, d+n/4, d+n/2, d+3*(n/4).

“Recursively Sort Four Quarters of d”
The sorted results are written back into the input array, at d, d+n/4, d+n/2, and d+3*(n/4).

“Merge Sorted Quarters of d Into Halves of t”

    merge(d, d+n/4, d+n/2, t);
    merge(d+n/2, d+3*(n/4), d+n, t+n/2);

The merged halves are written to t and t+n/2.

“Merge Sorted Halves of t Back Into d”

    merge(t, t+n/2, t+n, d);

“Use a Simple Sort for Small Problem Sizes”

    insertionSort(d, d+n);

The base case sorts the range [d, d+n-1] in place.

Parallel Execution

void sort(int *d, int *t, int n) {
    if (n > CUTOFF) {
        spawn sort(d, t, n/4);
        spawn sort(d+n/4, t+n/4, n/4);
        spawn sort(d+n/2, t+n/2, n/4);
        spawn sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
        sync;
        spawn merge(d, d+n/4, d+n/2, t);
        spawn merge(d+n/2, d+3*(n/4), d+n, t+n/2);
        sync;
        merge(t, t+n/2, t+n, d);
    } else {
        insertionSort(d, d+n);
    }
}
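The spawn/sync pattern above can be sketched in plain C with OpenMP tasks standing in for Cilk's spawn and sync. This is an illustrative sketch, not the compiler's actual output: CUTOFF, insertion_sort, and merge_runs are stand-in names for the routines on the slides, and the quarter arithmetic matches the slide code, so it assumes sizes that divide evenly into quarters at each recursion level (as in the example below).

```c
#define CUTOFF 32

/* Base case: simple insertion sort on [l, h) */
static void insertion_sort(int *l, int *h) {
    for (int *p = l + 1; p < h; p++) {
        int k = *p, *q;
        for (q = p - 1; l <= q && k < *q; q--)
            *(q + 1) = *q;
        *(q + 1) = k;
    }
}

/* Merge the sorted runs [l1, m) and [m, h2) into d */
static void merge_runs(int *l1, int *m, int *h2, int *d) {
    int *h1 = m, *l2 = m;
    while (l1 < h1 && l2 < h2)
        *d++ = (*l1 < *l2) ? *l1++ : *l2++;
    while (l1 < h1) *d++ = *l1++;
    while (l2 < h2) *d++ = *l2++;
}

/* Four-way divide and conquer sort: the recursive calls and the first
 * two merges may run in parallel because their accessed regions are
 * disjoint; taskwait plays the role of Cilk's sync */
void psort(int *d, int *t, int n) {
    if (n > CUTOFF) {
        #pragma omp task
        psort(d, t, n/4);
        #pragma omp task
        psort(d + n/4, t + n/4, n/4);
        #pragma omp task
        psort(d + n/2, t + n/2, n/4);
        psort(d + 3*(n/4), t + 3*(n/4), n - 3*(n/4));
        #pragma omp taskwait
        #pragma omp task
        merge_runs(d, d + n/4, d + n/2, t);
        merge_runs(d + n/2, d + 3*(n/4), d + n, t + n/2);
        #pragma omp taskwait
        merge_runs(t, t + n/2, t + n, d);
    } else {
        insertion_sort(d, d + n);
    }
}
```

Compiled without OpenMP the pragmas are ignored and the code runs sequentially, which is exactly the elision property the slides' Cilk output also has.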

What Do You Need to Know to Exploit this Form of Parallelism?

What Do You Need to Know to Exploit this Parallelism?

    sort(d, t, n/4);
    sort(d+n/4, t+n/4, n/4);
    sort(d+n/2, t+n/2, n/4);
    sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));

The calls to sort access disjoint parts of d and t; together, the calls access [d, d+n-1] and [t, t+n-1].

What Do You Need to Know to Exploit this Parallelism?

    merge(d, d+n/4, d+n/2, t);
    merge(d+n/2, d+3*(n/4), d+n, t+n/2);
    merge(t, t+n/2, t+n, d);

The first two calls to merge access disjoint parts of d and t; together, the calls access [d, d+n-1] and [t, t+n-1].

What Do You Need to Know to Exploit this Parallelism?

    insertionSort(d, d+n);

The call to insertionSort accesses [d, d+n-1].

What Do You Need to Know to Exploit this Parallelism?
The regions of memory accessed by complete executions of procedures.

How Hard Is it to Extract these Regions?

Challenging

How Hard Is it to Extract these Regions?

void insertionSort(int *l, int *h) {
    int *p, *q, k;
    for (p = l+1; p < h; p++) {
        for (k = *p, q = p-1; l <= q && k < *q; q--)
            *(q+1) = *q;
        *(q+1) = k;
    }
}

It is not immediately obvious that insertionSort(l, h) accesses [l, h-1].

How Hard Is it to Extract these Regions?

void merge(int *l1, int *m, int *h2, int *d) {
    int *h1 = m;
    int *l2 = m;
    while ((l1 < h1) && (l2 < h2))
        if (*l1 < *l2) *d++ = *l1++;
        else *d++ = *l2++;
    while (l1 < h1) *d++ = *l1++;
    while (l2 < h2) *d++ = *l2++;
}

It is not immediately obvious that merge(l, m, h, d) accesses [l, h-1] and [d, d+(h-l)-1].
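One way to see the claim concretely is to instrument merge so that every read and write records its offset from a base pointer. The tracking globals and the read_at/write_at helpers below are illustrative additions, not part of the original code; running the instrumented merge confirms it touches exactly [l, h-1] of the input and [d, d+(h-l)-1] of the output.

```c
#include <limits.h>

static int lo_read = INT_MAX, hi_read = INT_MIN;   /* input region touched  */
static int lo_write = INT_MAX, hi_write = INT_MIN; /* output region touched */

/* Read *p, recording its offset from base */
static int read_at(int *base, int *p) {
    int i = (int)(p - base);
    if (i < lo_read) lo_read = i;
    if (i > hi_read) hi_read = i;
    return *p;
}

/* Write v to *p, recording its offset from base */
static void write_at(int *base, int *p, int v) {
    int i = (int)(p - base);
    if (i < lo_write) lo_write = i;
    if (i > hi_write) hi_write = i;
    *p = v;
}

/* The slides' merge of [l1, m) and [m, h2) into d, with every memory
 * access routed through the instrumentation */
void merge_traced(int *in, int *out, int *l1, int *m, int *h2, int *d) {
    int *h1 = m, *l2 = m;
    while (l1 < h1 && l2 < h2) {
        if (read_at(in, l1) < read_at(in, l2))
            write_at(out, d++, *l1++);
        else
            write_at(out, d++, *l2++);
    }
    while (l1 < h1) write_at(out, d++, read_at(in, l1++));
    while (l2 < h2) write_at(out, d++, read_at(in, l2++));
}
```

The compiler, of course, must establish the same fact statically, without running the code.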

Issues
- Pervasive use of pointers
  - Pointers into the middle of arrays
  - Pointer arithmetic
  - Pointer comparison
- Multiple procedures
  - sort(int *d, int *t, int n)
  - insertionSort(int *l, int *h)
  - merge(int *l, int *m, int *h, int *t)
- Recursion

How The Compiler Does It

Structure of Compiler
- Pointer analysis: disambiguate references at the granularity of arrays
- Bounds analysis: symbolic upper and lower bounds for each memory access in each procedure
- Region analysis: symbolic regions accessed by the execution of each procedure
- Parallelization: independent procedure calls that can execute in parallel

Example

void f(char *p, int n) {
    if (n > CUTOFF) {
        f(p, n/2);        /* initialize first half */
        f(p+n/2, n/2);    /* initialize second half */
    } else {
        /* base case: initialize small array */
        int i = 0;
        while (i < n) { *(p+i) = 0; i++; }
    }
}

Bounds Analysis
- For each variable at each program point, derive upper and lower bounds for its value
- Bounds are symbolic expressions:
  - symbolic variables in the expressions represent initial values of parameters
  - linear combinations of these variables
  - multivariate polynomials

Bounds Analysis
What are the upper and lower bounds for the region accessed by the while loop in the base case?

    int i = 0;
    while (i < n) { *(p+i) = 0; i++; }
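The answer the analysis must produce is the region [p, p+n-1]. As a hedged dynamic cross-check (the compiler derives this statically), the sketch below runs the base-case loop and records the smallest and largest offset actually written; base_case and its lo/hi out-parameters are illustrative additions.

```c
/* Run the base-case loop over p[0..n-1], tracking the extreme offsets of
 * the accesses *(p+i); after the call, [p+*lo, p+*hi] is the region the
 * loop actually touched */
void base_case(char *p, int n, int *lo, int *hi) {
    *lo = n;              /* empty region if the loop never executes */
    *hi = -1;
    int i = 0;
    while (i < n) {
        int off = i;      /* offset of the access *(p+i) */
        if (off < *lo) *lo = off;
        if (off > *hi) *hi = off;
        *(p + i) = 0;
        i++;
    }
}
```

The offsets range over [0, n-1], matching the bounds l(i2) = 0 and u(i2) = n-1 that the constraint system derives below.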

Bounds Analysis, Step 1: build the control flow graph

    i = 0
    i < n
    *(p+i) = 0;
    i = i + 1

Bounds Analysis, Step 2: number the different versions of the variables

    i0 = 0
    i1 < n
    *(p+i2) = 0;
    i3 = i2 + 1

Bounds Analysis, Step 3: set up constraints for the lower bounds

    l(i0) <= 0
    l(i1) <= l(i0)
    l(i1) <= l(i3)
    l(i2) <= l(i1)
    l(i3) <= l(i2) + 1

Bounds Analysis, Step 4: set up constraints for the upper bounds

    0 <= u(i0)
    u(i0) <= u(i1)
    u(i3) <= u(i1)
    min(u(i1), n-1) <= u(i2)
    u(i2) + 1 <= u(i3)

After simplification, the min constraint on u(i2) becomes n-1 <= u(i2).

Bounds Analysis, Step 5: generate symbolic expressions for the bounds
Goal: express the bounds in terms of the parameters

    l(i0) = c1*p + c2*n + c3      u(i0) = c13*p + c14*n + c15
    l(i1) = c4*p + c5*n + c6      u(i1) = c16*p + c17*n + c18
    l(i2) = c7*p + c8*n + c9      u(i2) = c19*p + c20*n + c21
    l(i3) = c10*p + c11*n + c12   u(i3) = c22*p + c23*n + c24

Bounds Analysis, Step 6: substitute the expressions into the constraints

    c1*p + c2*n + c3 <= 0
    c4*p + c5*n + c6 <= c1*p + c2*n + c3
    c4*p + c5*n + c6 <= c10*p + c11*n + c12
    c7*p + c8*n + c9 <= c4*p + c5*n + c6
    c10*p + c11*n + c12 <= c7*p + c8*n + c9 + 1
    0 <= c13*p + c14*n + c15
    c13*p + c14*n + c15 <= c16*p + c17*n + c18
    c22*p + c23*n + c24 <= c16*p + c17*n + c18
    n - 1 <= c19*p + c20*n + c21
    c19*p + c20*n + c21 + 1 <= c22*p + c23*n + c24

Goal
- Solve the symbolic constraint system: find values for the constraint variables c1, ..., c24 that satisfy the inequality constraints
- Maximize the lower bounds
- Minimize the upper bounds

Bounds Analysis, Step 7: apply the expression ordering principle

    c1*p + c2*n + c3 <= c4*p + c5*n + c6
    if c1 <= c4, c2 <= c5, and c3 <= c6

(This is sound because p and n are known to be positive; see the positivity analysis under Details.)
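The principle reduces a symbolic inequality to coefficient-wise comparisons. A minimal sketch, with illustrative names: expr_leq checks the coefficient-wise condition for expressions c[0]*p + c[1]*n + c[2], and eval spot-checks the implied inequality at concrete nonnegative p and n.

```c
/* Does a[0]*p + a[1]*n + a[2] <= b[0]*p + b[1]*n + b[2] hold for all
 * p, n >= 0? By the expression ordering principle, yes whenever each
 * coefficient on the left is <= the corresponding one on the right. */
int expr_leq(const int a[3], const int b[3]) {
    return a[0] <= b[0] && a[1] <= b[1] && a[2] <= b[2];
}

/* Evaluate the expression e at concrete parameter values */
int eval(const int e[3], int p, int n) {
    return e[0] * p + e[1] * n + e[2];
}
```

Note the test is sufficient but not necessary: it can fail to order two expressions that are in fact ordered, which is why the analysis only ever uses it in the sound direction.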

Bounds Analysis, Step 7 (continued): apply the principle coefficient-wise to generate a linear program

Objective function: max (c1 + ... + c12) - (c13 + ... + c24)

Lower bounds:
    c1 <= 0      c2 <= 0      c3 <= 0
    c4 <= c1     c5 <= c2     c6 <= c3
    c4 <= c10    c5 <= c11    c6 <= c12
    c7 <= c4     c8 <= c5     c9 <= c6
    c10 <= c7    c11 <= c8    c12 <= c9 + 1

Upper bounds:
    0 <= c13     0 <= c14     0 <= c15
    c13 <= c16   c14 <= c17   c15 <= c18
    c22 <= c16   c23 <= c17   c24 <= c18
    0 <= c19     1 <= c20     -1 <= c21
    c19 <= c22   c20 <= c23   c21 + 1 <= c24

Bounds Analysis, Step 8: solve the linear program to extract the bounds

    l(i0) = 0    u(i0) = 0
    l(i1) = 0    u(i1) = n
    l(i2) = 0    u(i2) = n-1
    l(i3) = 0    u(i3) = n

So the access *(p+i2) always falls in the region [p, p+n-1].

Region Analysis
Goal: compute the accessed regions of memory
- Intra-procedural: use the bounds at each load or store to compute the accessed region
- Inter-procedural: use the intra-procedural results, set up another constraint system, and solve it to find the regions accessed by the entire execution of the procedure

Basic Principle of Inter-Procedural Region Analysis
For each procedure, generate symbolic expressions for the upper and lower bounds of the accessed regions.
The constraint system requires that:
- the accessed regions include the regions accessed by statements in the procedure
- the accessed regions include the regions accessed by invoked procedures

Inter-Procedural Constraints in Example

void f(char *p, int n) {
    if (n > CUTOFF) {
        f(p, n/2);
        f(p+n/2, n/2);
    } else {
        int i = 0;
        while (i < n) { *(p+i) = 0; i++; }
    }
}

    l(f,p,n) <= l(f,p,n/2)        u(f,p,n/2) <= u(f,p,n)
    l(f,p,n) <= l(f,p+n/2,n/2)    u(f,p+n/2,n/2) <= u(f,p,n)
    l(f,p,n) <= p                 p+n-1 <= u(f,p,n)

Derive Constraint System
Generate symbolic expressions:

    l(f,p,n) = C1*p + C2*n + C3
    u(f,p,n) = C4*p + C5*n + C6

Build the constraint system:

    C1*p + C2*n + C3 <= p
    p + n - 1 <= C4*p + C5*n + C6
    C1*p + C2*n + C3 <= C1*p + C2*(n/2) + C3
    C4*p + C5*(n/2) + C6 <= C4*p + C5*n + C6
    C1*p + C2*n + C3 <= C1*(p+n/2) + C2*(n/2) + C3
    C4*(p+n/2) + C5*(n/2) + C6 <= C4*p + C5*n + C6

Solve Constraint System
Simplify the constraint system:

    C1*p + C2*n + C3 <= p
    p + n - 1 <= C4*p + C5*n + C6
    C2*n <= C2*(n/2)          C5*(n/2) <= C5*n
    C2*(n/2) <= C1*(n/2)      C4*(n/2) <= C5*(n/2)

Generate and solve the linear program:

    l(f,p,n) = p
    u(f,p,n) = p+n-1
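The analysis result says a call f(p, n) accesses exactly [p, p+n-1]. As a hedged dynamic cross-check of that region (the compiler derives it statically, and for odd n the static result is a sound over-approximation), the sketch below runs the recursion and records the extreme offsets written; f_traced, its base/lo/hi parameters, and the CUTOFF value are illustrative additions.

```c
#define CUTOFF 4

/* The example procedure f, instrumented to track the lowest and highest
 * offset (relative to base) that any base-case write touches */
void f_traced(char *base, char *p, int n, int *lo, int *hi) {
    if (n > CUTOFF) {
        f_traced(base, p, n/2, lo, hi);        /* first half  */
        f_traced(base, p + n/2, n/2, lo, hi);  /* second half */
    } else {
        for (int i = 0; i < n; i++) {
            int off = (int)(p - base) + i;     /* offset of *(p+i) */
            if (off < *lo) *lo = off;
            if (off > *hi) *hi = off;
            *(p + i) = 0;
        }
    }
}
```

For a power-of-two n the recorded region is exactly [0, n-1] relative to p, matching l(f,p,n) = p and u(f,p,n) = p+n-1.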

Parallelization
- Dependence testing of two calls: do the accessed regions intersect? Based on comparing the upper and lower bounds of the accessed regions; the comparison is done using the expression ordering principle
- Parallelization: find sequences of independent calls and execute the independent calls in parallel
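At its core the dependence test is an interval disjointness check on the two calls' accessed regions. A minimal sketch with illustrative names; the concrete offsets in the usage below stand in for the symbolic bounds the compiler actually compares (e.g., the first two recursive sort calls touch [d, d+n/4-1] and [d+n/4, d+n/2-1]).

```c
/* Two accessed regions [lo1, hi1] and [lo2, hi2] are independent exactly
 * when they do not intersect, i.e. one ends before the other begins */
int regions_disjoint(long lo1, long hi1, long lo2, long hi2) {
    return hi1 < lo2 || hi2 < lo1;
}
```

In the compiler the bounds are symbolic expressions rather than numbers, so the comparisons hi1 < lo2 and hi2 < lo1 are themselves discharged with the expression ordering principle.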

Details
- Inter-procedural positivity analysis: verify that variables are positive, which is required for the correctness of the expression ordering principle
- Correlation analysis for integer division
  - Basic idea: (n-1)/2 <= floor(n/2) <= n/2
  - Generalized: (n-m+1)/m <= floor(n/m) <= n/m
- Linear system decomposition
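The integer-division inequalities can be checked numerically. The sketch below verifies (n-m+1)/m <= floor(n/m) <= n/m for positive n and m using exact integer arithmetic (cross-multiplying by m avoids both floating point and C's own truncating division on the rational endpoints); the function name is illustrative.

```c
/* For n, m > 0: floor(n/m) is C's n/m, and the two rational comparisons
 *   (n-m+1)/m <= q   <=>   n - m + 1 <= q*m
 *   q <= n/m         <=>   q*m <= n
 * hold after multiplying through by m > 0 */
int division_bounds_hold(long n, long m) {
    long q = n / m;   /* C integer division truncates = floor for n, m > 0 */
    return (n - m + 1 <= q * m) && (q * m <= n);
}
```

Writing n = q*m + r with 0 <= r < m makes both sides immediate: q*m <= n since r >= 0, and n - m + 1 <= q*m since r - m + 1 <= 0.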

Experimental Results
Implementation: SUIF, lp_solve, Cilk
(Speedup graphs for sort and for matrix multiply)
Thanks: Darko Marinov, Nate Kushman, Don Dailey

Related Work
- Shape analysis: Chase, Wegman, Zadeck (PLDI 90); Ghiya and Hendren (POPL 96); Sagiv, Reps, Wilhelm (TOPLAS 98)
- Commutativity analysis: Rinard and Diniz (PLDI 96)
- Predicated dataflow analysis: Moon, Hall, Murphy (ICS 98)

Related Work (continued)
- Array region analysis: Triolet, Irigoin, Feautrier (PLDI 86); Havlak and Kennedy (IEEE TPDS 91); Hall, Amarasinghe, Murphy, Liao, Lam (SC 95); Gu, Li, Lee (PPoPP 97)
- Symbolic analysis of loop variables: Blume and Eigenmann (IPPS 95); Haghighat and Polychronopoulos (LCPC 93)

Future
- Static race detection for explicitly parallel programs
- Static elimination of array bounds checks
- Static pointer validation checks
- Result: safety guarantees with no efficiency compromises

Context
Mainstream parallelizing compilers:
- Loop nests, dense matrices
- Affine access functions
- Key problem: solving Diophantine equations
Compilers for divide and conquer algorithms:
- Recursion, dense arrays (dynamically allocated)
- Pointers, pointer arithmetic
- Key problems: pointer analysis, symbolic region analysis, solving linear programs