Fakultät für informatik informatik 12 technische universität dortmund Prepass Optimizations - Session 11 - Heiko Falk TU Dortmund Informatik 12 Germany.


1 Prepass Optimizations - Session 11 -
Heiko Falk, TU Dortmund, Informatik 12, Germany
Slides use Microsoft cliparts. All Microsoft restrictions apply.

2 - 2 - technische universität dortmund, fakultät für informatik, h. falk, informatik 12, 2008, TU Dortmund
Schedule of the course

Time        | Monday                           | Tuesday                                | Wednesday                              | Thursday                     | Friday
09:30-11:00 | 1: Orientation, introduction     | 5: Models of computation + specs       | 9: Mapping of applications to platforms | 13: Memory aware compilation | 17: Memory aware compilation
11:00       | Brief break
11:15-12:30 | 2: Models of computation + specs | 6: Lab*: Ptolemy                       | 10: Lab*: Scheduling                   | 14: Lab*: Mem. opt.          | 18: Lab*: Mem. opt.
12:30       | Lunch
14:00-15:20 | 3: Models of computation + specs | 7: Mapping of applications to platforms | 11: High-level optimizations*          | 15: Memory aware compilation | 19: WCET & compilers*
15:20       | Break
15:40-17:00 | 4: Lab*: Kahn process networks   | 8: Mapping of applications to platforms | 12: High-level optimizations*          | 16: Memory aware compilation | 20: Wrap-up

* Dr. Heiko Falk

3 Outline
- Motivation of Prepass Optimizations
- Loop Nest Splitting
  - Introduction and Code Examples
  - Workflow of Loop Nest Splitting
    - Condition Satisfiability
    - Condition Optimization
    - Search Space Generation
    - Search Space Exploration
  - Results
- References & Summary

4 Motivation of Prepass Optimizations
Structure of an optimizing compiler:
[Figure: compiler pipeline. Source Code -> Lexical Analysis -> Tokens -> Syntactical Analysis -> Syntax Tree -> Semantical Analysis -> High-Level IR -> Code Optimization -> High-Level IR -> Code Selection -> Low-Level IR -> Code Optimization -> Low-Level IR -> Register Allocation -> Low-Level IR -> Instruction Scheduling -> ASM Code]
Question: Does only the compiler optimize code?

5 Motivation of Prepass Optimizations
Optimizations outside a compiler are called
- postpass optimizations if applied after the compiler,
- prepass optimizations if applied before the compiler.
Advantages of prepass optimizations:
- Source code transformations are easier to understand.
- They allow manual experimentation with an optimization technique before a costly full implementation.
- Independence of the actual compiler: applicable in principle with every compiler supporting the source language.
- Independence of the actual target processor: applicable in principle to arbitrary processors.

6 Application Domain of Loop Nest Splitting
Embedded multimedia applications:
- Data flow dominated, i.e. they are applied to huge amounts of data and produce huge amounts of data as output.
- Most of the execution time is spent in (deeply) nested loops.
- Simple loop structures with known or statically analyzable lower and upper loop bounds.
- Manipulation of large multi-dimensional arrays.
- Typical example: streaming applications such as MPEG4.

7 Example: MPEG4 Motion Estimation
[Figure: reference frame with a search area of 36x36 pixels and current frame divided into blocks of 4x4 pixels; frame dimensions 144 pixels and 196 pixels; labels i, x, y, x4, y4, v4x1, v4y1.]

8 Source Code MPEG4 Motion Estimation
for (i=0; i<20; i++)
  for (x=0; x<36; x++)
    for (y=0; y<49; y++)
      for (vx=0; vx<9; vx++)
        for (vy=0; vy<9; vy++)
          for (x4=0; x4<4; x4++)
            for (y4=0; y4<4; y4++) {
              if (4*x+x4<0 || 4*x+x4>35 || 4*y+y4<0 || 4*y+y4>48)
                then_block_1; else else_block_1;
              if (4*x+vx+x4-4<0 || 4*x+vx+x4-4>35 || 4*y+vy+y4-4<0 || 4*y+vy+y4-4>48)
                then_block_2; else else_block_2;
            }

9 Observations
Compiling and executing this source code yields:
- Overall execution of 91,445,760 if-statements.
- Very irregular control flow due to the if-statements.
- Additional arithmetic overhead: multiplications, additions, comparisons, logical or, …
- The performance of this code is constrained by control flow, not by the computation of motion vectors!
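The figure of 91,445,760 if-statements can be checked by counting. The following is a minimal sketch, assuming the loop bounds of the motion estimation loop nest shown on the previous slide; the function name `count_if_executions` is made up for this illustration.

```c
/* Counts how often the two if-statements of the motion estimation
 * loop nest are evaluated. Only the loop bounds matter here, so the
 * loop body just adds 2 per innermost iteration. */
long count_if_executions(void)
{
    long count = 0;
    for (int i = 0; i < 20; i++)
        for (int x = 0; x < 36; x++)
            for (int y = 0; y < 49; y++)
                for (int vx = 0; vx < 9; vx++)
                    for (int vy = 0; vy < 9; vy++)
                        for (int x4 = 0; x4 < 4; x4++)
                            for (int y4 = 0; y4 < 4; y4++)
                                count += 2; /* two if-statements per iteration */
    return count;
}
```

The result equals the product 20 * 36 * 49 * 9 * 9 * 4 * 4 * 2 = 91,445,760.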

10 Loop Nest Splitting
Automatic analysis of loops & if-statements:
- x, y, x4 and y4 never take values for which the conditions 4*x+x4<0 and 4*y+y4<0 are satisfied.
- These conditions can be replaced by the constant truth value '0'.
- For x ≥ 10 or y ≥ 14, both if-statements are provably satisfied, so their then-parts are always executed.
- Both if-statements are satisfied for more than 92% of all executions of the innermost y4-loop.
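Both claims can be checked by brute force. The sketch below assumes that each if-statement tests its index expression against the lower bound 0 and the upper bounds 35 and 48, as the conditions 4*x+x4<0 and 4*y+y4<0 above suggest; the helper names are hypothetical.

```c
#include <stdbool.h>

/* First if-statement of the loop nest (assumed form, see lead-in). */
static bool cond1(int x, int y, int x4, int y4)
{
    return 4*x+x4 < 0 || 4*x+x4 > 35 || 4*y+y4 < 0 || 4*y+y4 > 48;
}

/* Second if-statement of the loop nest (assumed form, see lead-in). */
static bool cond2(int x, int y, int vx, int vy, int x4, int y4)
{
    return 4*x+vx+x4-4 < 0 || 4*x+vx+x4-4 > 35 ||
           4*y+vy+y4-4 < 0 || 4*y+vy+y4-4 > 48;
}

/* Returns true if x>=10 || y>=14 implies both if-conditions for
 * every iteration of the inner loops vx, vy, x4, y4. */
bool splitting_if_is_safe(void)
{
    for (int x = 0; x < 36; x++)
        for (int y = 0; y < 49; y++) {
            if (!(x >= 10 || y >= 14))
                continue; /* nothing is claimed for this (x, y) */
            for (int vx = 0; vx < 9; vx++)
                for (int vy = 0; vy < 9; vy++)
                    for (int x4 = 0; x4 < 4; x4++)
                        for (int y4 = 0; y4 < 4; y4++)
                            if (!cond1(x, y, x4, y4) ||
                                !cond2(x, y, vx, vy, x4, y4))
                                return false;
        }
    return true;
}

/* Number of (x, y) pairs, out of 36*49 = 1764, with x>=10 || y>=14. */
int covered_xy_pairs(void)
{
    int n = 0;
    for (int x = 0; x < 36; x++)
        for (int y = 0; y < 49; y++)
            if (x >= 10 || y >= 14)
                n++;
    return n;
}
```

Since every (x, y) pair runs the same number of inner iterations, the covered share of y4-loop executions is 1624 / 1764, which is slightly above 92%.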

11 Source Code after Loop Nest Splitting
for (i=0; i<20; i++)
  for (x=0; x<36; x++)
    for (y=0; y<49; y++)
      if (x>=10 || y>=14)                              // Splitting-If
        for (; y<49; y++)                              // 2nd y-Loop
          for (vx=0; vx<9; vx++) ... {
            then_block_1; then_block_2; }              // No If-Stmts
      else
        for (vx=0; vx<9; vx++) ... {
          if (0 || 4*x+x4>35 || 0 || 4*y+y4>48)        // Old
            then_block_1; else else_block_1;           // If-Stmts
          if (4*x+vx+x4-4<0 || 4*x+vx+x4-4>35 || ...)
            then_block_2; else else_block_2;
        }

12 Structure of Optimized Code
Splitting-If:
- A satisfied splitting-if implies that the conditions of all original if-statements are satisfied.
- The then-part of the splitting-if no longer contains the original if-statements, only their then-parts.
- An unsatisfied splitting-if says nothing about whether the original if-statements are satisfied.
- The else-part of the splitting-if therefore contains all original if-statements to keep the optimized code correct.

13 Why a Second Y-Loop?
Intuitive code:
  for (x=0; x<36; x++)
    for (y=0; y<49; y++)
      if (x>=10 || y>=14)
        for (vx=0; vx<9; vx++) ...
Splitting-If: 1 execution for every single y ∈ [14, 48] (y = 14, 15, 16, ...)
Optimized code:
  for (x=0; x<36; x++)
    for (y=0; y<49; y++)
      if (x>=10 || y>=14)
        for (; y<49; y++)
          for (vx=0; vx<9; vx++) ...
Splitting-If: 1 single execution for all y ∈ [14, 48]
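The effect of the second y-loop can be made concrete by counting how often the splitting-if itself is evaluated in both variants. This is a small sketch with empty inner loop bodies; the function names are hypothetical and the outer i-loop with 20 iterations is included.

```c
/* Intuitive variant: the splitting-if is evaluated once for every
 * combination of i, x and y. */
long splitting_if_evals_intuitive(void)
{
    long evals = 0;
    for (int i = 0; i < 20; i++)
        for (int x = 0; x < 36; x++)
            for (int y = 0; y < 49; y++) {
                evals++; /* evaluation of the splitting-if */
                if (x >= 10 || y >= 14) {
                    /* inner loops without if-statements go here */
                }
            }
    return evals;
}

/* Optimized variant: once the splitting-if is satisfied, the second
 * y-loop consumes all remaining y iterations, so the splitting-if is
 * not evaluated again for the current x. */
long splitting_if_evals_optimized(void)
{
    long evals = 0;
    for (int i = 0; i < 20; i++)
        for (int x = 0; x < 36; x++)
            for (int y = 0; y < 49; y++) {
                evals++; /* evaluation of the splitting-if */
                if (x >= 10 || y >= 14) {
                    for (; y < 49; y++) {
                        /* inner loops without if-statements go here */
                    }
                }
            }
    return evals;
}
```

With these bounds the intuitive variant evaluates the splitting-if 20 * 36 * 49 = 35,280 times, while the optimized variant needs 20 * (26 * 1 + 10 * 15) = 3,520 evaluations.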

14 Stages of Loop Nest Splitting
1 - Condition Satisfiability: Find single conditions of if-statements that are either always satisfied or never satisfied.
2 - Condition Optimization: For each condition C, find a "simpler" condition C' such that C' ⇒ C always holds (if C' is true, C is also true).
3 - Search Space Generation: Combine all conditions C' into a structure G modeling all if-statements including their logical structure (&&, ||).
4 - Search Space Exploration: Using G, determine a condition for the splitting-if that minimizes the overall number of if-statement executions.

15 Workflow of Loop Nest Splitting
1 - Condition Satisfiability
[Figure: the example conditions 4*x+3*x4>20, x-x4>3, 3*x+x4<0 and 6*x-20*x4<61, connected by && and ||, each shown in a diagram with axes x and x4.]

16 Workflow of Loop Nest Splitting
1 - Condition Satisfiability
2 - Condition Optimization
[Figure: the example conditions 4*x+3*x4>20, x-x4>3, 3*x+x4<0 and 6*x-20*x4<61, connected by && and ||, with diagrams over x and x4 before and after condition optimization.]

17 Workflow of Loop Nest Splitting
1 - Condition Satisfiability
2 - Condition Optimization
3 - Search Space Generation
[Figure: the example conditions 4*x+3*x4>20, x-x4>3, 3*x+x4<0 and 6*x-20*x4<61, connected by && and ||, and the generated search space over x and x4.]

18 Workflow of Loop Nest Splitting
1 - Condition Satisfiability
2 - Condition Optimization
3 - Search Space Generation
4 - Search Space Exploration
[Figure: the example conditions 4*x+3*x4>20, x-x4>3, 3*x+x4<0 and 6*x-20*x4<61, connected by && and ||, the search space over x and x4, and the resulting splitting condition x>=7 || x4>=1.]

19 Assumptions
Loop Bounds: All lower and upper bounds $(l_L, u_L)$ are constant.
If-Statements: Sequence of loop-dependent conditions, connected with logical AND or logical OR.
  Format: if ( $C_1 \otimes C_2 \otimes \dots$ ), $\otimes \in \{ \&\&, || \}$
Loop-dependent Conditions: Linear terms depending on the index variables $i_L$ of the loops.
  Format: $C_x \cong \sum_{L=1}^{N} (c_L \cdot i_L) + c \ge 0$, with $c_L, c \in \mathbb{Z}$

20 Polytopes & Linear Conditions
Definition (Polyhedron & Polytope):
- Polyhedron $P = \{ x \in \mathbb{Z}^N \mid Ax \ge b \}$, with $A \in \mathbb{Z}^{m \times N}$, $b \in \mathbb{Z}^m$
- A polyhedron P is called a polytope iff $|P| < \infty$.
Model of linear conditions in nested loops:
4*x + 3*x4 > 35 for x ∈ [0, 35], x4 ∈ [0, 3] as a polytope over $p = (x, x4)^T$:
$P = \left\{ p \in \mathbb{Z}^2 \;\middle|\; \begin{pmatrix} 4 & 3 \\ 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix} p \ge \begin{pmatrix} 36 \\ 0 \\ -35 \\ 0 \\ -3 \end{pmatrix} \right\}$
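The polytope model above translates directly into a membership test: a point belongs to P if every row of A·p is at least the corresponding entry of b. The sketch below hard-codes the matrix and vector of the example condition 4*x + 3*x4 > 35; the names `in_polytope` and `count_points` are assumptions for this illustration, not part of any polytope library.

```c
#include <stdbool.h>

#define M 5 /* number of linear constraints (rows of A) */
#define N 2 /* dimension of a point p = (x, x4) */

/* 4*x + 3*x4 >= 36, x >= 0, -x >= -35, x4 >= 0, -x4 >= -3 */
static const int A[M][N] = { {4, 3}, {1, 0}, {-1, 0}, {0, 1}, {0, -1} };
static const int b[M]    = { 36, 0, -35, 0, -3 };

/* Returns true iff A*p >= b holds in every row. */
bool in_polytope(const int p[N])
{
    for (int r = 0; r < M; r++) {
        int lhs = 0;
        for (int c = 0; c < N; c++)
            lhs += A[r][c] * p[c];
        if (lhs < b[r])
            return false;
    }
    return true;
}

/* Counts the integer points of P by scanning a box that is
 * deliberately larger than the loop bounds; the bound rows of A
 * reject everything outside [0, 35] x [0, 3]. */
int count_points(void)
{
    int n = 0;
    for (int x = -2; x <= 40; x++)
        for (int x4 = -2; x4 <= 6; x4++) {
            int p[N] = { x, x4 };
            if (in_polytope(p))
                n++;
        }
    return n;
}
```

Enumerating points like this is only practical for tiny examples; it is meant to show what the inequality system encodes. The count is finite, so this polyhedron is a polytope.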

21 Stage 1 – Condition Satisfiability
Goal: To determine loop-dependent conditions $C_x$ that evaluate to 'true' or to 'false' for all values of the index variables of all surrounding loops.
Approach:
- Translate each condition $C_x$ into a polytope $P_x$ (cf. previous slide).
- Compare with the empty set: $P_x = \emptyset \Rightarrow C_x$ is always 'false'.
- Compare with the universe: $P_x = U \Rightarrow C_x$ is always 'true'.
Modification of Source Code: Replace such constant conditions by the truth values '0' and '1', respectively.
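A minimal sketch of this stage follows. It does not perform polytope comparisons; instead it swaps in a simpler interval argument that gives the same answer under the stated assumption of constant loop bounds: a linear term reaches its minimum and maximum at corners of the iteration box, so it suffices to evaluate those two values. The type `Verdict` and the function `classify` are hypothetical names.

```c
typedef enum { ALWAYS_FALSE, ALWAYS_TRUE, DEPENDS } Verdict;

/* Classifies the condition  sum_L coeff[L]*i_L + c >= 0  for index
 * variables with constant bounds lo[L] <= i_L <= hi[L]. */
Verdict classify(int n, const int coeff[], const int lo[],
                 const int hi[], int c)
{
    long min = c, max = c;
    for (int L = 0; L < n; L++) {
        if (coeff[L] >= 0) {
            min += (long)coeff[L] * lo[L];
            max += (long)coeff[L] * hi[L];
        } else {
            min += (long)coeff[L] * hi[L];
            max += (long)coeff[L] * lo[L];
        }
    }
    if (min >= 0)
        return ALWAYS_TRUE;  /* satisfied at every point: replace by 1 */
    if (max < 0)
        return ALWAYS_FALSE; /* satisfied at no point: replace by 0 */
    return DEPENDS;
}
```

For example, 4*x+x4<0 is rewritten as -4*x - x4 - 1 >= 0; with x ∈ [0, 35] and x4 ∈ [0, 3] its maximum is -1, so the condition is always false, while 4*x+x4>35 (that is, 4*x + x4 - 36 >= 0) depends on the iteration.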

22 Stage 2 – Condition Optimization
Given: Loop-dependent condition $C = \sum_{L=1}^{N} (c_L \cdot i_L) + c \ge 0$
Goal: To determine values $l_{C,L}$ and $u_{C,L}$ per condition C and per loop L such that
- C is provably satisfied for all $l_{C,L} \le i_L \le u_{C,L}$, and
- the values $l_{C,L}$ and $u_{C,L}$ lead to a minimization of if-statement executions.
Approach:
- Use a genetic algorithm (GA) to determine the values $l_{C,L}$ and $u_{C,L}$.

23 Workflow of Genetic Algorithms (1)
- Analogy to natural evolution, "survival of the fittest".
- Optimization in a loop i = 0, 1, …
- Iteration i maintains a population $P_i$; a population consists of several individuals.
- An individual represents one possible solution of the modeled optimization problem.

24 Workflow of Genetic Algorithms (2)
- Data structure of an individual: the chromosome.
- A chromosome is a sequence of genes that store data.
- The actual value stored in a gene is called an allele.

25 Workflow of Genetic Algorithms (3)
- The fitness function computes the fitness of each individual of $P_i$.
- Selection determines the subset $P_i'$ of $P_i$ with highest/lowest fitness.
- Variation generates the next population $P_{i+1}$ by adding random individuals to $P_i'$.

26 Workflow of Genetic Algorithms (4)
Variation is based on two fundamental genetic operators:
- Crossover
- Mutation
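The two operators can be sketched on a chromosome stored as an integer array. The slides do not preserve which crossover and mutation variants are used, so the one-point crossover and single-gene mutation below are assumptions chosen for illustration. The random choices (crossover point, gene index, new allele) are passed in as parameters so that the behaviour is reproducible.

```c
#include <stddef.h>

/* One-point crossover (assumed variant): the child takes the genes
 * [0, point) from parent a and the genes [point, len) from parent b. */
void crossover(const int *a, const int *b, int *child,
               size_t len, size_t point)
{
    for (size_t g = 0; g < len; g++)
        child[g] = (g < point) ? a[g] : b[g];
}

/* Mutation (assumed variant): overwrite a single gene with a new
 * allele. Out-of-range gene indices are ignored. */
void mutate(int *chromosome, size_t len, size_t gene, int allele)
{
    if (gene < len)
        chromosome[gene] = allele;
}
```

In a full genetic algorithm, `point`, `gene` and `allele` would be drawn from a random number generator, and a chromosome for condition optimization would hold the values $l_{C,1}, u_{C,1}, \dots, l_{C,N}, u_{C,N}$.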

27 Workflow of Genetic Algorithms (5)
- Randomized variation may cause $P_i$ to contain individuals that do not represent a valid solution ↝ repair mechanism.
- Termination of the optimization if
  - the N-th iteration has been performed,
  - the best determined fitness does not improve for k iterations,
  - …
- Return the individual with the best fitness from the last population $P_i$ as the final result.

28 Result of Condition Optimization
Input of Condition Optimization:
- Linear loop-dependent condition C
- Loop bounds $[l_L, u_L]$
Output of Genetic Algorithm:
- Values $(l_{C,1}, u_{C,1}, \dots, l_{C,N}, u_{C,N})$ of the individual with the best fitness.
Output of Condition Optimization:
- Polytope $P'_C = \{ (x_1, \dots, x_N) \in \mathbb{Z}^N \mid \forall \text{ loops } L: l_{C,L} \le x_L \le u_{C,L} \}$
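The output polytope $P'_C$ is an axis-aligned box, which makes it cheap to validate and to measure. The sketch below shows one way to check that a candidate box implies the condition C and to count its points; the `Box` type and both function names are hypothetical, and the candidate values in the usage note were worked out by hand rather than by a genetic algorithm.

```c
#include <stdbool.h>

#define N 2 /* number of loops in this small example */

/* Candidate result of condition optimization: one interval
 * [lo[L], hi[L]] per loop L, i.e. the values l_{C,L} and u_{C,L}. */
typedef struct {
    int lo[N];
    int hi[N];
} Box;

/* Number of integer points in the box (0 if any interval is empty). */
long box_size(const Box *bx)
{
    long size = 1;
    for (int L = 0; L < N; L++) {
        if (bx->hi[L] < bx->lo[L])
            return 0;
        size *= (long)(bx->hi[L] - bx->lo[L] + 1);
    }
    return size;
}

/* Returns true iff  sum_L coeff[L]*x_L + c >= 0  holds at every point
 * of the box. It is enough to test the corner minimizing the term. */
bool box_satisfies(const Box *bx, const int coeff[N], int c)
{
    long min = c;
    for (int L = 0; L < N; L++)
        min += (long)coeff[L] * (coeff[L] >= 0 ? bx->lo[L] : bx->hi[L]);
    return min >= 0;
}
```

For the condition 4*x + x4 > 35, written as 4*x + x4 - 36 >= 0, the box x ∈ [9, 35], x4 ∈ [0, 3] is valid and contains 27 * 4 = 108 points, whereas extending it to x ∈ [8, 35] makes it invalid.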

29 Stage 3 – Search Space Generation
Given: If-statements, conditions & polytopes
  $IF_i = (C_{i,1} \otimes C_{i,2} \otimes \dots \otimes C_{i,n})$, $\otimes \in \{ \&\&, || \}$
  $\forall C_{i,j} \leadsto P_{i,j}$
Construction of a polytope $P_i$ for each if-statement $IF_i$:
  if $C_{i,j-1}$ && $C_{i,j}$: $\Rightarrow P_{i,j-1} \cap P_{i,j}$
  if $C_{i,j-1}$ || $C_{i,j}$: $\Rightarrow P_{i,j-1} \cup P_{i,j}$
Construction of a global polytope: The global search space G models the iteration space where all if-statements are satisfied.
  $G = \bigcap_i P_i$
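The construction can be illustrated on the (x, y) iteration space of the motion estimation example, using a boolean grid in place of a real polytope representation. This is a simplification for illustration only. The per-condition regions used below (x >= 9, y >= 13 for the first if-statement and x >= 10, y >= 14 for the second) were derived by hand as the ranges in which the respective comparisons against 35 and 48 hold for all inner index values; they are assumptions, not genetic algorithm output quoted from the slides.

```c
#include <stdbool.h>

#define NX 36 /* iterations of the x-loop */
#define NY 49 /* iterations of the y-loop */

typedef bool Region[NX][NY]; /* true = point belongs to the region */

static void fill_x_ge(Region r, int x0)
{
    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++)
            r[x][y] = (x >= x0);
}

static void fill_y_ge(Region r, int y0)
{
    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++)
            r[x][y] = (y >= y0);
}

/* || between conditions corresponds to the union of their regions. */
static void region_or(Region out, Region a, Region b)
{
    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++)
            out[x][y] = a[x][y] || b[x][y];
}

/* && between conditions, and the final combination of all
 * if-statements, corresponds to the intersection. */
static void region_and(Region out, Region a, Region b)
{
    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++)
            out[x][y] = a[x][y] && b[x][y];
}

/* Builds G = P_1 ∩ P_2 and returns the number of points in G. */
int build_global_search_space(Region g)
{
    Region a, b, p1, p2;
    int count = 0;

    fill_x_ge(a, 9);  fill_y_ge(b, 13); region_or(p1, a, b); /* 1st if */
    fill_x_ge(a, 10); fill_y_ge(b, 14); region_or(p2, a, b); /* 2nd if */
    region_and(g, p1, p2);

    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++)
            if (g[x][y])
                count++;
    return count;
}
```

The resulting G contains exactly the points with x >= 10 or y >= 14, which matches the splitting-if condition of the optimized motion estimation code.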

30 Stage 4 – Search Space Exploration
Given: Polytope G containing the points where all if-statements are satisfied.
Goal: To determine a final polytope G' ⊆ G such that translating G' into the conditions of the splitting-if leads to an overall minimization of if-statement executions.
Approach: Use a second genetic algorithm (omitted here).
Resulting Splitting-If:
- Placed into the outermost possible loop.
- Consists of all linear constraints included in G'.

31 Relative Runtimes after LNS

32 Relative Energy Dissipation (ARM7) after LNS

33 Relative Code Sizes after LNS

34 References
Loop Nest Splitting:
- H. Falk, P. Marwedel, Control Flow driven Splitting of Loop Nests at the Source Code Level, DATE Conference, Munich 2003.
- H. Falk, Control Flow Optimization by Loop Nest Splitting at the Source Code Level, University of Dortmund, Research Report No. 773, Dortmund 2003.

35 Summary
Non-Compiler Optimizations
- Postpass if performed after the compiler, e.g. at linker level
- Prepass if performed before the compiler, e.g. at source code level
Loop Nest Splitting
- Control flow optimization in data flow dominated embedded multimedia applications
- Polytopes model linear conditions and loops
- Genetic algorithms optimize polytope models
- Huge improvements in terms of ACET and energy (and, by the way, WCET), but potentially large increases in code size

36 Coffee/tea break (if on schedule)
Q&A?

