Generalized Index-Set Splitting
Christopher Barton, Arie Tal, Bob Blainey, Jose Nelson Amaral



Generalized Index-Set Splitting
● Overview
● Compile-time unknown split points
● Loops with multiple split points
● Removing “dead” inductive-branches
● Nested inductive branches
● Controlling Code Growth
● Results
● Conclusions

Index-Set Splitting
● Divide the index-set (range) of a loop into sub-ranges
● Each sub-range represents a separate loop

Original loop:
    for (i=0; i < 100; i++) {
        if (i < 5)
            a[i] = 2 * a[i];
        else
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

After splitting:
    for (i=0; i < 5; i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=5; i < 100; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
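A small check of the example above, written as a sketch: it runs the original loop and the two split loops on identical inputs and compares the results. The function names and the initial array contents are assumptions made for illustration; they do not come from the slides.

```c
#include <assert.h>
#include <string.h>

#define N 100

/* Original loop: the branch on i is evaluated on every iteration. */
void original_loop(int a[N], int b[N]) {
    for (int i = 0; i < N; i++) {
        if (i < 5)
            a[i] = 2 * a[i];
        else
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* After index-set splitting at i == 5: two loops without the branch. */
void split_loops(int a[N], int b[N]) {
    for (int i = 0; i < 5; i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (int i = 5; i < N; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* Hypothetical checker: returns 1 if both versions compute the same arrays. */
int iss_matches_original(void) {
    int a1[N], b1[N], a2[N], b2[N];
    for (int i = 0; i < N; i++)
        a1[i] = a2[i] = i + 1;          /* arbitrary test data */
    original_loop(a1, b1);
    split_loops(a2, b2);
    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

Calling `iss_matches_original()` should return 1, since the two loops together visit each index exactly once and apply the same statements the branch would have selected.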

Compile-time Unknown Split Points
● When the lower bound, upper bound, or split point is unknown at compile time:

Original loop:
    for (i=0; i < 100; i++) {
        if (i < m)
            a[i] = 2 * a[i];
        else
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Naive split:
    for (i=0; i < m; i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=m; i < 100; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Problem: m may be smaller than 0 or greater than 100!

Compile-time Unknown Split Points
● To overcome this problem, we will use the following sub-ranges:
  – First loop: 0 to min(m, 100)
  – Second loop: max(m, 0) to 100

    for (i=0; i < min(m,100); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=max(m,0); i < 100; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Compile-time Unknown Split Points

    for (i=0; i < min(m,100); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=max(m,0); i < 100; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

● m ≤ 0: the first loop does not execute; the second loop iterates from 0 to 100
● 0 < m < 100: the first loop iterates from 0 to m; the second loop iterates from m to 100
● 100 ≤ m: the first loop iterates from 0 to 100; the second loop does not execute

Compile-time Unknown Split Points
● In general:
  – Given a range with lower bound lb, upper bound ub, and split-point sp, we define two sub-ranges:
    ● lb to min(sp, ub)
    ● max(sp, lb) to ub

    for (i=lb; i < min(sp,ub); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=max(sp,lb); i < ub; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

No other conditions are needed in addition to the loop structure
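The clamped bounds can be checked numerically. The sketch below computes how many iterations each of the two loops executes for a given lb, ub, and sp; the helper names (`imin`, `imax`, `then_trips`, `else_trips`) are assumptions made for illustration.

```c
#include <assert.h>

int imin(int x, int y) { return x < y ? x : y; }
int imax(int x, int y) { return x > y ? x : y; }

/* Iterations of: for (i = lo; i < hi; i++). Zero when the range is empty. */
int trip_count(int lo, int hi) { return hi > lo ? hi - lo : 0; }

/* First loop: lb to min(sp, ub), the iterations where (i < sp) holds. */
int then_trips(int lb, int ub, int sp) {
    return trip_count(lb, imin(sp, ub));
}

/* Second loop: max(sp, lb) to ub, the iterations where (i < sp) fails. */
int else_trips(int lb, int ub, int sp) {
    return trip_count(imax(sp, lb), ub);
}
```

For lb = 0 and ub = 100, a split point of -3 gives 0 and 100 iterations, 40 gives 40 and 60, and 250 gives 100 and 0. In every case the two counts add up to the original 100 iterations, which is why no extra guard around either loop is needed.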

Loops With Multiple Split Points
● Multiple split points can be handled iteratively
● Instead, we build a sub-range tree:

    for (i=0; i < 100; i++) {
        if (i < m) a[i] = 2 * a[i];
        if (i < n) a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree:
    Level 1:                       0,100
    Level 2, split on if (i < m):  0,min(m,100)    max(m,0),100
    Level 3, split on if (i < n):  0,min(m,n,100)    max(0,n),min(m,100)    max(m,0),min(n,100)    max(m,n,0),100

Loops with multiple split points
● The first level in the sub-range tree corresponds to the original loop range

    for (i=0; i < 100; i++) {
        if (i < m) a[i] = 2 * a[i];
        if (i < n) a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree:
    Level 1:  0,100
    Level 2:  0,min(m,100)    max(m,0),100
    Level 3:  0,min(m,n,100)    max(0,n),min(m,100)    max(m,0),min(n,100)    max(m,n,0),100

Loops with multiple split points
● The second level of the tree corresponds to the two loops created to remove the (i < m) inductive branch

    for (i=0; i < min(m,100); i++) {
        a[i] = 2 * a[i];
        if (i < n) a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=max(m,0); i < 100; i++) {
        if (i < n) a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree:
    Level 1:  0,100
    Level 2:  0,min(m,100)    max(m,0),100
    Level 3:  0,min(m,n,100)    max(0,n),min(m,100)    max(m,0),min(n,100)    max(m,n,0),100
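One way to picture the sub-range tree is as a complete binary tree stored in an array. The sketch below is an assumption about representation, not the implementation described in the slides: node k has its Then child at 2k+1 and its Else child at 2k+2, and every node at depth d is split by the d-th inductive branch. All names (`range_t`, `build_subrange_tree`, `example_node`) are hypothetical.

```c
#include <assert.h>

#define MAX_SPLITS 5
#define MAX_NODES  63            /* 2^(MAX_SPLITS + 1) - 1 */

typedef struct { int lo, hi; } range_t;   /* iterations lo .. hi-1 */

int imin(int x, int y) { return x < y ? x : y; }
int imax(int x, int y) { return x > y ? x : y; }

/* Fill nodes[] with the sub-range tree for branches (i < sp[0]),
 * (i < sp[1]), ... over the loop range lb..ub. Returns the node count. */
int build_subrange_tree(int lb, int ub, const int sp[], int nsp,
                        range_t nodes[]) {
    assert(nsp >= 0 && nsp <= MAX_SPLITS);
    int total = (1 << (nsp + 1)) - 1;
    nodes[0].lo = lb;
    nodes[0].hi = ub;
    for (int k = 0; 2 * k + 2 < total; k++) {
        int depth = 0;                       /* floor(log2(k + 1)) */
        for (int t = k + 1; t > 1; t >>= 1)
            depth++;
        int s = sp[depth];
        /* Then child: the part of the parent where (i < s) holds. */
        nodes[2 * k + 1].lo = nodes[k].lo;
        nodes[2 * k + 1].hi = imin(s, nodes[k].hi);
        /* Else child: the part of the parent where (i < s) fails. */
        nodes[2 * k + 2].lo = imax(s, nodes[k].lo);
        nodes[2 * k + 2].hi = nodes[k].hi;
    }
    return total;
}

/* Node k of the tree for the two-branch example loop over 0..100. */
range_t example_node(int m, int n, int k) {
    range_t nodes[MAX_NODES];
    int sp[2] = { m, n };
    int total = build_subrange_tree(0, 100, sp, 2, nodes);
    assert(k >= 0 && k < total);
    return nodes[k];
}
```

With m = 30 and n = 70, node 5 is the range 30 to 70, matching max(m,0),min(n,100), and node 4 is 70 to 30, an empty range matching max(0,n),min(m,100). Empty nodes cost nothing at run time because their loops simply execute zero iterations.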

Removing Dead Inductive-Branches
● To easily remove dead inductive-branches, we mark each edge in the tree as Then or Else, according to the outcome of the condition that created it

Loop for the sub-range max(0,n),min(m,100), before dead branches are removed:
    for (i=max(0,n); i<min(m,100); i++) {
        if (i < m) a[i] = 2 * a[i];
        if (i < n) a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree:
    Level 1:  0,100
    Level 2:  0,min(m,100)    max(m,0),100
    Level 3:  0,min(m,n,100)    max(0,n),min(m,100)    max(m,0),min(n,100)    max(m,n,0),100

Removing Dead Inductive-Branches
● We examine the path that leads from the root of the sub-range tree to the desired sub-range node

    for (i=max(0,n); i<min(m,100); i++) {
        if (i < m) a[i] = 2 * a[i];
        if (i < n) a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree:
    Level 1:  0,100
    Level 2:  0,min(m,100)    max(m,0),100
    Level 3:  0,min(m,n,100)    max(0,n),min(m,100)    max(m,0),min(n,100)    max(m,n,0),100
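The path walk can be sketched as follows. This assumes, purely for illustration, that tree nodes are numbered in heap order (root 0, Then child 2k+1, Else child 2k+2) and that the branch numbered `level` is the one that splits nodes at depth `level`. The enum and function names are hypothetical.

```c
#include <assert.h>

typedef enum {
    BRANCH_ALWAYS_TRUE,    /* keep the Then body, drop the test */
    BRANCH_ALWAYS_FALSE,   /* keep the Else body (if any), drop the test */
    BRANCH_UNKNOWN         /* not decided by this node: the test must stay */
} branch_state_t;

/* Depth of node k in a heap-ordered binary tree (root has depth 0). */
int node_depth(int k) {
    int d = 0;
    while (k > 0) {
        k = (k - 1) / 2;
        d++;
    }
    return d;
}

/* Walk from node k up to the root. Each edge on the way decides one
 * branch: a Then edge makes it always true, an Else edge always false. */
branch_state_t branch_state(int k, int level) {
    assert(k >= 0 && level >= 0);
    int d = node_depth(k);
    while (k > 0) {
        int is_then = (k % 2 == 1);   /* odd index: Then child */
        k = (k - 1) / 2;              /* move to the parent */
        d--;                          /* parent depth = branch decided */
        if (d == level)
            return is_then ? BRANCH_ALWAYS_TRUE : BRANCH_ALWAYS_FALSE;
    }
    return BRANCH_UNKNOWN;
}
```

Under this numbering, the sub-range max(0,n),min(m,100) is node 4. The walk reports that (i < m), branch 0, is always true and that (i < n), branch 1, is always false, so the loop body for that node reduces to `a[i] = 2 * a[i]; b[i] = a[i] * a[i];`.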

Nested Inductive-branches
● Nested branches only split the nodes of the ranges to which they apply

    for (i=0; i < 100; i++) {
        if (i < m)
            a[i] = 2 * a[i];
        else if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree:
    Level 1:  0,100
    Level 2:  0,min(m,100)    max(m,0),100
    Level 3 (children of max(m,0),100 only):  max(m,0),min(n,100)    max(m,n,0),100

Nested Inductive-branches
● Nested branches only split the nodes of the ranges to which they apply
● The result will be three loops:

    for (i=0; i < min(m,100); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=max(m,0); i < min(n,100); i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=max(m,n,0); i < 100; i++) {
        b[i] = a[i] * a[i];
    }

Sub-range tree:
    Level 1:  0,100
    Level 2:  0,min(m,100)    max(m,0),100
    Level 3 (children of max(m,0),100 only):  max(m,0),min(n,100)    max(m,n,0),100
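The three loops can be checked against the original nested branch for arbitrary run-time values of m and n, including n smaller than m and values outside 0..100. This is a sketch with hypothetical function names and arbitrary test data.

```c
#include <assert.h>
#include <string.h>

#define N 100

int imin(int x, int y) { return x < y ? x : y; }
int imax(int x, int y) { return x > y ? x : y; }

/* Original loop with the nested inductive branch. */
void nested_original(int a[N], int b[N], int m, int n) {
    for (int i = 0; i < N; i++) {
        if (i < m)
            a[i] = 2 * a[i];
        else if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* The three loops taken from the leaves of the sub-range tree. */
void nested_split(int a[N], int b[N], int m, int n) {
    for (int i = 0; i < imin(m, N); i++) {              /* i < m */
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (int i = imax(m, 0); i < imin(n, N); i++) {     /* m <= i < n */
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (int i = imax(imax(m, n), 0); i < N; i++) {     /* i >= m, i >= n */
        b[i] = a[i] * a[i];
    }
}

/* Hypothetical checker: returns 1 if both versions agree for this m, n. */
int nested_split_matches(int m, int n) {
    int a1[N], b1[N], a2[N], b2[N];
    for (int i = 0; i < N; i++)
        a1[i] = a2[i] = i + 1;          /* arbitrary test data */
    nested_original(a1, b1, m, n);
    nested_split(a2, b2, m, n);
    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

When n is smaller than m, the second loop's range is empty and the third loop starts at m, so each index is still visited exactly once.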

Controlling Code Growth
● We descend the sub-range tree using a greedy breadth-first approach

    for (i=0; i < 100; i++) {
        if (i < m) a[i] = 2 * a[i];
        if (i < n) a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree:
    Level 1:  0,100
    Level 2:  0,min(m,100)    max(m,0),100
    Level 3:  0,min(m,n,100)    max(0,n),min(m,100)    max(m,0),min(n,100)    max(m,n,0),100

Controlling Code Growth
● We stop when we reach all the leaf nodes, or the code growth limit
● The selected nodes designate the loops that will be generated

    for (i=0; i<min(m,100); i++) {
        a[i] = 2 * a[i];
        if (i < n) a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=max(m,0); i<min(n,100); i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i=max(m,n,0); i<100; i++) {
        b[i] = a[i] * a[i];
    }

Sub-range tree:
    Level 1:  0,100
    Level 2:  0,min(m,100)    max(m,0),100
    Level 3:  0,min(m,n,100)    max(0,n),min(m,100)    max(m,0),min(n,100)    max(m,n,0),100
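A heavily simplified sketch of a breadth-first descent under a growth limit follows. The slides do not specify how growth is measured or which node the greedy step prefers, so this version makes two loud assumptions: the limit is a maximum number of generated loops, and nodes are expanded in plain first-in, first-out order. Nodes are numbered in heap order (root 0, children 2k+1 and 2k+2); all names are hypothetical.

```c
#include <assert.h>

#define MAX_NODES 64

/* Choose which tree nodes become loops. A node is replaced by its two
 * children only while the loop count stays within max_loops (an assumed
 * cost model). Writes the chosen node indices to selected[] and returns
 * how many were chosen. */
int select_loops(int total_nodes, int max_loops, int selected[]) {
    assert(total_nodes >= 1 && total_nodes <= MAX_NODES);
    assert(max_loops >= 1);
    int queue[MAX_NODES];
    int head = 0, tail = 0;
    int nsel = 0;
    int loops = 1;                      /* the root is the original loop */
    queue[tail++] = 0;
    while (head < tail) {
        int k = queue[head++];
        int is_leaf = (2 * k + 2 >= total_nodes);
        if (!is_leaf && loops + 1 <= max_loops) {
            loops++;                    /* one loop becomes two */
            queue[tail++] = 2 * k + 1;
            queue[tail++] = 2 * k + 2;
        } else {
            selected[nsel++] = k;       /* emit this node as a loop */
        }
    }
    return nsel;
}
```

For the seven-node tree and a limit of three loops, this sketch keeps node 2 whole and emits nodes 3 and 4. The example on the slide instead keeps 0,min(m,100) whole and splits max(m,0),100, which suggests the actual heuristic ranks candidate nodes by a criterion the slides do not describe.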

ISS Opportunities in SPEC2000

Compilation Times for SPEC2000

Runtimes for SPEC2000

Conclusions
● Presented a technique to remove loop-variant branches from loops by splitting the original loop into several loops with varying ranges
● Implemented in IBM’s XL Compiler suite and showed ISS opportunities in benchmark suites
● Exploring ways ISS can be used to improve effectiveness of other optimizations