
1 Optimization
Optimization = a transformation that improves the performance of the target code.
An optimization:
- must not change the output
- must not cause errors that were not present in the original program
- must be worth the effort (profiling often helps)
Which optimizations are most important depends on the program, but generally loop optimizations, register allocation, and instruction scheduling are the most critical.
Scopes:
- Local optimizations: within basic blocks
- Superlocal optimizations: within extended basic blocks (EBBs)
- Global optimizations: within the flow graph

2 Extended Basic Block
An extended basic block (EBB) is a maximal sequence of instructions beginning with a leader that contains no join nodes other than its leader.
Some local optimizations are more effective when applied to EBBs. Such optimizations tend to treat each path through an EBB as if it were a single block.
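The EBB structure can be computed directly from the control-flow graph. A minimal sketch (the representation and function names are my own, not the lecture's): every join node (a block with more than one predecessor) and the entry block leads its own EBB; a successor with a single predecessor is absorbed into its predecessor's EBB. Loop headers have at least two predecessors, so they start their own EBBs.

```python
# Sketch: group basic blocks into extended basic blocks.
# blocks: list of block ids; preds/succs: dict id -> list of ids.
def find_ebbs(blocks, preds, succs, entry):
    ebbs = []

    def grow(block, ebb):
        ebb.append(block)
        for s in succs.get(block, []):
            # a block with exactly one predecessor is not a join node,
            # so it belongs to the current EBB
            if len(preds.get(s, [])) == 1:
                grow(s, ebb)

    for b in blocks:
        # every join node (and the entry block) is the leader of an EBB
        if b == entry or len(preds.get(b, [])) > 1:
            ebb = []
            grow(b, ebb)
            ebbs.append(ebb)
    return ebbs
```

For a diamond CFG (B1 branching to B2 and B3, both joining at B4), this yields the EBB {B1, B2, B3} plus a one-block EBB {B4}, since B4 is a join node.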

3 Algebraic simplifications
These include:
- Taking advantage of algebraic identities: (x*1) is x
- Strength reduction: (x*2) is (x << 1)
- Simplifications such as:
  - -(-x) is x
  - (1 || x) is true
  - (1 && x) is x
  - *(&x) is x
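These rewrites can be expressed as patterns over an expression tree. A minimal sketch, assuming a hypothetical tuple-based IR such as ('*', ('var', 'x'), ('const', 1)); the representation is for illustration only:

```python
# Sketch: bottom-up pattern-based algebraic simplification.
def simplify(expr):
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]          # simplify operands first
    if op == '*' and ('const', 1) in args:      # x * 1  ->  x
        return args[0] if args[1] == ('const', 1) else args[1]
    if op == '*' and ('const', 2) in args:      # x * 2  ->  x << 1 (strength reduction)
        other = args[0] if args[1] == ('const', 2) else args[1]
        return ('<<', other, ('const', 1))
    if op == 'neg' and isinstance(args[0], tuple) and args[0][0] == 'neg':
        return args[0][1]                       # -(-x)  ->  x
    return (op, *args)
```

Because operands are simplified first, nested patterns such as -(-(x*1)) collapse all the way down to x.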

4 Constant folding
Definition: the evaluation at compile time of expressions whose values are known to be constant.
Is it always safe?
- Booleans: yes
- Integers: almost always (issues: division by zero, overflow)
- Floating point: usually not (issues: compiler's vs. processor's floating-point arithmetic, exceptions, etc.)
May be combined with constant propagation.
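A minimal sketch of safe integer folding under the caveats listed above. The helper and the 32-bit signed range are my own assumptions, not part of the lecture:

```python
# Sketch: fold an integer expression, refusing the unsafe cases.
def fold(op, left, right):
    """Return the folded constant, or None when folding is unsafe/impossible."""
    if not (isinstance(left, int) and isinstance(right, int)):
        return None                              # operands not compile-time constants
    if op == '/':
        if right == 0:
            return None                          # division by zero: leave it to runtime
        result = left // right
    else:
        result = {'+': left + right,
                  '-': left - right,
                  '*': left * right}.get(op)
    if result is None or not (-2**31 <= result < 2**31):
        return None                              # unknown operator, or 32-bit overflow
    return result
```

Note that Python's // floors toward negative infinity while C truncates toward zero; a production folder must match the target language's semantics exactly, which is one reason floating point is usually left alone.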

5 Redundancy elimination
Redundancy elimination = determining that two computations are equivalent and eliminating one of them.
There are several types of redundancy elimination:
- Value numbering: associates symbolic values with computations and identifies expressions that have the same value.
- Common subexpression elimination: identifies expressions that have operands with the same names.
- Constant/copy propagation: identifies variables that have constant/copy values and uses the constants/copies in place of the variables.
- Partial redundancy elimination: inserts computations on some paths to convert partial redundancy into full redundancy.
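As an illustration of one of these techniques, here is a minimal local copy-propagation sketch over a hypothetical three-address representation (tuples (dst, op, src1, src2), with op None marking a copy); the representation is my own, not the lecture's:

```python
# Sketch: local copy propagation within one basic block.
# After a copy 'x = y', later uses of x are replaced by y until
# either variable is redefined.
def copy_propagate(block):
    copies = {}                      # dst -> src for currently live copies
    out = []
    for dst, op, a, b in block:
        a, b = copies.get(a, a), copies.get(b, b)   # rewrite uses
        # redefining dst kills any copy that mentions dst on either side
        copies = {d: s for d, s in copies.items() if d != dst and s != dst}
        if op is None:
            copies[dst] = a                          # record the new copy
        out.append((dst, op, a, b))
    return out
```

A use of x after 'x = y' becomes a use of y, which in turn exposes common subexpressions to value numbering; once y is redefined, the copy is killed and propagation stops.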

6 Redundancy elimination: examples

  read(i)
  j = i + 1
  k = i
  n = k + 1     (k is a copy of i, so n recomputes j's value)

  i = 2
  j = i * 2
  k = i + 2     (constant propagation and folding give j = 4 and k = 4)

  a = b * c
  x = b * c     (common subexpression: x can reuse a's value)

7 Value numbering
Goal:
- Assign a symbolic value (called a value number) to each expression.
- Two expressions should be assigned the same value number if the compiler can prove that they will be equal for all inputs.
- Use the value numbers to find and eliminate redundant computations.
Extensions:
- Take algebraic identities into consideration. Example: x*1 should be assigned the same value number as x.
- Take commutativity into consideration. Example: x+y should be assigned the same value number as y+x.

8 Value numbering
How does it work? The supporting data structure is a hash table.
For an expression x+y:
- Look up x and y to get their value numbers, xv and yv. At this stage we can order the operands by value number (to take advantage of commutativity), apply algebraic simplifications, or even do constant folding.
- Look up (+, xv, yv) in the hash table.
- If it is not there, insert it and give it a new value number. If the expression has a lhs, assign that value number to it. If the expression has no lhs, create a temporary, assign the value number to it, and insert a new instruction t = x+y into the basic block.
- If it is there, then the expression already has a value number. Replace its computation with a reference to the variable holding that value.

9 Value numbering
Consider this situation:

  z = x+y
  z = w
  v = x+y

The second x+y should not be replaced by z, because z has been redefined since it was assigned x+y. How do we deal with this?
- Option 1: Do not store the lhs of a computed expression in the table, but its value number instead. Then, if the lhs is redefined, its value number will be different, so we will not do an invalid replacement.
- Option 2: Every time an expression is evaluated, create a temporary to hold the result. The temporary is never redefined, so the problem is avoided. The code above would be converted to:

  t1 = x+y
  z = t1
  z = w
  v = t1

- Option 3: Apply the algorithm to the SSA form of the block. Then this problem is no longer an issue:

  z1 = x0+y0
  z2 = w0
  v0 = z1

10 Local value numbering
Algorithm sketch for local value numbering, processing instruction inst located at BB[n,i]:

  hashval = Hash(inst.op, inst.opr1, inst.opr2)
  if inst matches an instruction inst2 in HT[hashval]:
      if inst2 has a lhs, use that lhs in inst's place
  if inst has a lhs:
      remove from HT all instructions that use inst's lhs
  if inst has no lhs:
      create a new temp
      insert temp = inst.rhs before inst
      replace inst with temp
  add i to the equivalence class at hashval
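The sketch above can be turned into runnable code. This is a minimal version over a hypothetical tuple IR, not the lecture's exact algorithm: it sorts operand value numbers for commutative operators and, roughly in the spirit of Option 1 from slide 9, invalidates a table entry when the variable holding its value is redefined.

```python
# Sketch: local value numbering over instructions (dst, op, src1, src2);
# op '=' marks a copy.
def local_value_numbering(block):
    next_vn = [0]
    var_vn = {}        # variable -> current value number
    expr_vn = {}       # (op, vn1, vn2) -> (value number, variable holding it)
    out = []

    def vn_of(x):
        if x not in var_vn:
            var_vn[x] = next_vn[0]
            next_vn[0] += 1
        return var_vn[x]

    for dst, op, a, b in block:
        if op == '=':                                 # copy: dst aliases a
            vn, key = vn_of(a), None
            out.append((dst, op, a, b))
        else:
            key = (op, vn_of(a), vn_of(b))
            if op in ('+', '*'):                      # commutative: sort operand vns
                key = (op,) + tuple(sorted(key[1:]))
            if key in expr_vn:                        # redundant: rewrite as a copy
                vn, var = expr_vn[key]
                out.append((dst, '=', var, None))
            else:
                vn = next_vn[0]
                next_vn[0] += 1
                out.append((dst, op, a, b))
        # redefining dst invalidates expressions whose result lives in dst
        expr_vn = {k: v for k, v in expr_vn.items() if v[1] != dst}
        if key is not None and key not in expr_vn:
            expr_vn[key] = (vn, dst)                  # dst now holds this value
        var_vn[dst] = vn
    return out
```

On the problem block from slide 9 (z = x+y; z = w; v = x+y), the redefinition of z drops the (+, xv, yv) entry, so v = x+y is correctly recomputed rather than replaced by z.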

11 Local value numbering: example

Original block:

  s1: a = x + y
  s2: b = x + y
  s3: c = a + i
  s4: x = y
  s5: d = b + i
  s6: a = a * d
  s7: e = x + y
  s8: if (a == b) goto L

After s1: hash table (+,0,1) -> [s1]; value table x:0, y:1, a:2.
After s2: (+,0,1) is already in the hash table, so s2 is redundant; its class becomes [s1, s2] and the value table adds b:2. The block becomes:

  s1: a = x + y
  s2: b = a
  s3: c = a + i
  s4: x = y
  s5: d = b + i
  s6: a = a * d
  s7: e = x + y
  s8: if (a == b) goto L

12 Local value numbering: example (continued)

After s3: hash table adds (+,2,3) -> [s3]; value table adds i:3, c:4.
After s4: the copy x = y gives x the value number 1, the same as y. In addition, s1 is removed from the class for (+,0,1) because it uses x, which has just been redefined; the class becomes [s2].

13 Local value numbering: example (continued)

After s5: (+,2,3) is found with class [s3], so s5 is redundant and becomes d = c; the class becomes [s3, s5] and the value table adds d:4.
After s6: hash table adds (*,2,4) -> [s6]; a is redefined, so its value number becomes 5. The block is now:

  s1: a = x + y
  s2: b = a
  s3: c = a + i
  s4: x = y
  s5: d = c
  s6: a = a * d
  s7: e = x + y
  s8: if (a == b) goto L

14 Local value numbering: example (continued)

After s6, s2 is also dropped from the class for (+,0,1) (it uses a, which s6 redefined), leaving that class empty.
After s7: x now has value number 1, so x + y hashes to (+,1,1), a new entry with class [s7]; the value table adds e:6. The expression is recomputed rather than replaced: it does not compute the same value as s1's x + y.
After s8: the comparison is given a temporary: the hash table adds (==,2,5) -> [s8] and the value table adds t:7. Note how the value numbers of this expression's operands are sorted, to take advantage of commutativity. The final block:

  s1: a = x + y
  s2: b = a
  s3: c = a + i
  s4: x = y
  s5: d = c
  s6: a = a * d
  s7: e = x + y
  s8: t = a == b
  s9: if (t) goto L

15 Local value numbering: constant folding
Adding an is_constant entry to the value table, along with the value of the constant, allows us to incorporate constant folding. We use SSA numbering for a variable's value number and the actual value for a constant's value number.

  s1: a = 1 + 4         s1: a = 5
  s2: b = 4 + 1         s2: b = 5
  s3: c = a + i   =>    s3: c = a + i
  s4: d = b + i         s4: d = c
  s5: a = a * d         s5: a = a * d
  s6: e = a + 2         s6: e = a + 2

Hash table: (+,1,4) -> [s1, s2]; (+,5,i0) -> [s3, s4]; (*,5,c0) -> [s5]; (+,a1,2) -> [s6]
Value table (variable : value number, is_constant, constant value): a : a1, F (constant until redefined at s5); b : 5, T, 5; i : i0, F; c : c0, F; d : c0, F; e : e0, F.

16 Local value numbering: constant propagation
With a bit of extra work, we can also do some local constant propagation on the fly:

  s1: a = 1 + 4         s1: a = 5
  s2: b = 4 + 1         s2: b = 5
  s3: c = a + i   =>    s3: c = 5 + i
  s4: d = b + i         s4: d = c
  s5: a = a * d         s5: a = 5 * d
  s6: e = a + 2         s6: e = a + 2

Applying the same algorithm to a BB that is in SSA form simplifies things.
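The constant-handling convention just described (a constant's value number is its actual value, so 1+4 and 4+1 hash to the same key and fold immediately) can be sketched as a small helper; the function and its representation are hypothetical, not the lecture's code:

```python
# Sketch: build the hash key for a binary expression, folding when both
# operands are compile-time constants. a, b are ints (constants) or
# variable names; var_vn maps variables to symbolic value numbers.
def vn_key_and_fold(op, a, b, var_vn):
    """Return (hash_key, None) or (None, folded_constant)."""
    va = a if isinstance(a, int) else var_vn[a]
    vb = b if isinstance(b, int) else var_vn[b]
    if isinstance(va, int) and isinstance(vb, int):
        folded = {'+': va + vb, '*': va * vb}[op]   # fold at compile time
        return None, folded                          # the result is a constant
    lo, hi = sorted([va, vb], key=str)               # commutative canonical order
    return (op, lo, hi), None
```

With this convention, s1 and s2 from the slide both reduce to the constant 5 before any table lookup, and s3's key comes out as (+, 5, i0), matching the table above.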

17 Superlocal value numbering
- Each path through the EBB should be handled separately.
- However, some blocks are prefixes of more than one path. We would like to avoid recomputing the values in those blocks.
- Possible solutions:
  - Use a mechanism similar to those for lexical scope handling.
  - Save the state of the table at the end of each BB.
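The first solution can be sketched as a scoped hash table (a hypothetical structure in the spirit of lexical-scope symbol tables, not the lecture's code): entering a block of the EBB pushes a scope, and finishing the block pops it, so a shared prefix block's entries are computed once and then reused on every path through it.

```python
# Sketch: a scoped hash table for superlocal value numbering.
class ScopedTable:
    def __init__(self):
        self.scopes = [{}]
    def push(self):            # entering a child block in the EBB tree
        self.scopes.append({})
    def pop(self):             # done with this block: undo its entries
        self.scopes.pop()
    def insert(self, key, value):
        self.scopes[-1][key] = value
    def lookup(self, key):     # innermost scope wins
        for scope in reversed(self.scopes):
            if key in scope:
                return scope[key]
        return None
```

Walking the EBB tree with push on entry and pop on exit gives each path its own view of the table without ever re-running value numbering on the shared prefix blocks.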

18 Global value numbering
Main idea: variable equivalence.
- Two variables are equivalent at point P iff they are congruent and their defining assignments dominate P.
- Two variables are congruent iff their definitions have identical operators and congruent operands.
- We need SSA form.

19 Global value numbering
Data structure: the value graph.
- Nodes are labeled with operators, function symbols, or constant values.
- Nodes are named using SSA-form variables.
- Edges point from operators or functions to their operands.
- Edges are labeled with numbers that indicate operand position.

20 Global value numbering
In the value graph, two nodes are congruent iff:
- they are the same node, OR
- their labels are constants and the constants have the same value, OR
- their labels are the same operator and their operands are congruent.
Algorithm sketch:
- Partition the nodes into congruent sets.
- The initial partition is optimistic: nodes with the same label are placed together. (An alternative would be a pessimistic version, where the initial sets are empty and fill up in a monotonic way.)
- Iterate to a fixed point, splitting partitions whose members have non-congruent operands.

21 Global value numbering: example

Original code (CFG with blocks B1–B5):

  B1: entry
      read(n)
      i = 1
      j = 1
  B2: i mod 2 == 0
  B3: i = i + 1
      j = j + 1
  B4: i = i + 3
      j = j + 3
  B5: j > n
      exit

After SSA conversion:

  B1: entry
      read(n1)
      i1 = 1
      j1 = 1
  B2: i3 = φ2(i1, i2)
      j3 = φ2(j1, j2)
      i3 mod 2 == 0
  B3: i4 = i3 + 1
      j4 = j3 + 1
  B4: i5 = i3 + 3
      j5 = j3 + 3
  B5: i2 = φ5(i4, i5)
      j2 = φ5(j4, j5)
      j2 > n1
      exit

22 Global value numbering: example (continued)

[Figure: the value graph for the SSA form above. Its nodes include the constant nodes c0–c4, n1, i1 and j1, the φ2 nodes for i3 and j3, the + nodes for i4, i5, j4 and j5, the φ5 nodes for i2 and j2, and the mod, == and > nodes (t1, d1–d3).]

22 ++ 55 = mod 22 ++ 55 > c0 c4 c1 i1 c2 2 t1 c3 i3 i4i5 i2 j1 d1d2 j3 j4 j5 j2 d3 n1 Initially, nodes that have the same label are placed in the same set. The initial partition is shown on the left. Nodes that are in the same set, have the same color. i4 and j4 are congruent because their operands are congruent. Similarly, i5 and j5 are congruent. However, i4 and i5 are not. The "red" partition needs to be split Exercise: How would the partitions change if i5 contained a minus? Answer: click here

22 +– 55 = mod 22 ++ 55 > c0 c4 c1 i1 c2 2 t1 c3 i3 i4i5 i2 j1 d1d2 j3 j4 j5 j2 d3 n1 The initial partition is shown on the left. Nodes that are in the same set, have the same color. As you can see, i5 and j5 are not congruent this time, since they are labeled differently. This, in turn, means that i2 and j2 are not congruent, so that set should be split. As a result of that, i3 and j3 are now not congruent. This causes i4 and j4 to not be congruent either. The final partition is shown on the next slide.

22 +– 55 = mod 22 ++ 55 > c0 c4 c1 i1 c2 2 t1 c3 i3 i4i5 i2 j1 d1d2 j3 j4 j5 j2 d3 n1