
1 Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Optimizing Compiler. Scalar optimizations.

2 Main characteristics of an application that affect its performance: efficiency of calculations, effective memory usage, correct branch prediction, efficient use of vector instructions, effectiveness of parallelization, level of instruction-level parallelism.

3 10/17/10 Optimizing compiler role. A compiler translates the entire source program into an equivalent program in machine code or assembly language. The main objective of an optimizing compiler is to produce efficient code for the target computer system. From a developer's point of view, the program must be easy to read and modify, easy to debug, and fast to execute. A developer needs: a reliable, unified development environment; the ability to vary the levels of debugging and performance; the possibility of obtaining high-performance code for different operating systems and microprocessor architectures.

4 An optimizing compiler is a complex software system, driven by the requirements on the resulting code. Compiler developers face the complexity of proving that optimizations are legal, the difficulty of estimating their profitability, the lack of compile-time knowledge of typical input data, etc. Achieving the best results requires close cooperation with the application developer. To use the compiler's features successfully, the programmer must: have an idea of the computer systems on which the application will run; know the compiler command-line options; learn the basic performance-improvement techniques used by the compiler; be familiar with the main causes of application slowdown; have an idea of the input data the application will use; know how to analyze program performance.

5 Intel compilers. Intel provides C/C++ and Fortran compilers for the Windows, Linux and Mac OS operating systems. On Windows, the Intel compiler is provided as a plug-in for Microsoft Visual Studio. The main goals of the Intel compilers are timely support for all new computer systems, compatibility with Microsoft Visual Studio on Windows and with gcc on Linux and Mac OS, and a convenient environment for developing efficient applications. www.intel.com/software/products

6 Two-pass and single-pass compilation scheme (diagram). Labels, in order: source files; FE (C++/C or Fortran); internal representation; profiler; scalar optimizations; loop optimizations; code generation; object files. With -Qipo/-Qip: temporary files or object files with IR; interprocedural optimizations; scalar optimizations; loop optimizations; code generation; executable file or library.

7 Front End. Parsing is the process of analyzing the input characters, usually in accordance with a given formal grammar. During parsing the source code is converted into a data structure, usually a tree that reflects the syntactic structure of the input sequence and is well suited for further processing. Typically, parsing is divided into two levels: lexical analysis, in which the input stream of characters is partitioned into a linear sequence of tokens, the "words" of the language (e.g. integers, identifiers, string constants); syntactic analysis, in which tokens are converted into statements and expressions of the language according to its grammatical rules. The output is a set of FE-related tables, which are called the internal representation of the program. The usual practice is to share one internal representation among the various high-level languages.

8 Internal representation
    void sub(int *a, int k, int r)
    {
        int i;
        for (i = 0; i < k; i++)
            a[i] = r;
    }
Statements: STMT_ENTRY, STMT_ASSIGN, STMT_WHILE_DO, STMT_RETURN.
The list of statements is the base structure of the internal representation. Statements may be regarded as the smallest independent elements of the programming language. They describe assignments, flow-control commands (such as IF, GOTO, CALL, RETURN), function calls, etc.

9 The statements are usually kept in a list and can be linked in two ways: 1) lexically, where each statement has a predecessor and a successor; 2) by the control flow graph.
    struct Stmt {
        /* common members */
        int type;
        Stmt *pred;
        Stmt *succ;
        Basic_Block bblock;
        …
    }
Some simple scalar optimizations are based on walking through the list of statements to find specific statements and process them:
    For_All_Subroutine_Stmt(subroutine, stmt) {
        if (Stmt_type(stmt) == Stmt_Assign) {
            // assignment processing
        }
    }
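
The lexical walk described on this slide can be sketched in plain C. This is only an illustration under stated assumptions: the statement kinds reuse the names from slide 8, while `link_stmts` and `count_stmts` are hypothetical helpers, not part of any real compiler's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds; the names echo the slides. */
enum StmtType { STMT_ENTRY, STMT_ASSIGN, STMT_WHILE_DO, STMT_RETURN };

struct Stmt {
    enum StmtType type;
    struct Stmt *pred;   /* lexical predecessor, NULL for the first statement */
    struct Stmt *succ;   /* lexical successor, NULL for the last statement */
};

/* Link an array of n statements into a doubly linked list in array order. */
static void link_stmts(struct Stmt *s, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        s[i].pred = (i > 0) ? &s[i - 1] : NULL;
        s[i].succ = (i + 1 < n) ? &s[i + 1] : NULL;
    }
}

/* Walk the list from `first` and count the statements of the given kind. */
static int count_stmts(const struct Stmt *first, enum StmtType kind)
{
    int count = 0;
    for (const struct Stmt *s = first; s != NULL; s = s->succ)
        if (s->type == kind)
            count++;
    return count;
}
```

A pass that processes assignments would follow the same loop shape as `count_stmts`, with the counter replaced by the actual transformation.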

10 Expressions. a = b + c; Expressions are represented as expression trees. The leaves of a tree are variables or constants. The internal representation also contains many tables describing different objects such as variables, functions, types, etc.
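
As a rough sketch of such a tree, the right-hand side of a = b + c can be modelled with one node type whose leaves are variables or constants. The `Expr` layout, the variable-table indexing and the helper names are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

enum ExprKind { EXPR_CONST, EXPR_VAR, EXPR_ADD, EXPR_MUL };

struct Expr {
    enum ExprKind kind;
    int value;                     /* EXPR_CONST: the constant */
    int var;                       /* EXPR_VAR: index into a variable table */
    const struct Expr *lhs, *rhs;  /* operands of EXPR_ADD / EXPR_MUL */
};

/* Leaves are exactly the constant and variable nodes. */
static int is_leaf(const struct Expr *e)
{
    return e->kind == EXPR_CONST || e->kind == EXPR_VAR;
}

static int count_leaves(const struct Expr *e)
{
    assert(e != NULL);
    if (is_leaf(e))
        return 1;
    return count_leaves(e->lhs) + count_leaves(e->rhs);
}

/* Evaluate the tree given current values of the variables. */
static int eval(const struct Expr *e, const int *vars)
{
    switch (e->kind) {
    case EXPR_CONST: return e->value;
    case EXPR_VAR:   return vars[e->var];
    case EXPR_ADD:   return eval(e->lhs, vars) + eval(e->rhs, vars);
    case EXPR_MUL:   return eval(e->lhs, vars) * eval(e->rhs, vars);
    }
    return 0;
}
```

For b + c the tree has one EXPR_ADD node with two EXPR_VAR leaves; the assignment to a would be a statement that points at this tree.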

11 Control Flow Graph. A Control Flow Graph (CFG) represents all paths that control could traverse through a program during its execution. In a control flow graph each node represents a basic block (a straight-line piece of code without any jumps or jump targets). A jump target starts a block, and a jump ends a block. Directed edges represent transfers of control. There are two specially designated blocks: the entry block, through which control enters the flow graph, and the exit block, through which all control flow leaves. The CFG is essential to many compiler optimizations.

12 CFG example
    int main()
    {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
    }
Basic blocks of the resulting CFG:
    Entry: sum = 0; i = 1;
    L12: if (i < 11)
    Loop body: sum = sum + i; i = i + 1; goto L12
    Exit: printf(..); return
    struct BBLOCK {
        STMT first_stmt;
        STMT last_stmt;
        BBLOCK_LIST pred_list;
        BBLOCK_LIST succ_list;
        …
    }
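
The four blocks of this example and their edges can be written down as a small data structure. The sketch below uses fixed-size arrays instead of the BBLOCK_LIST type from the slide; the block numbering (0 = entry, 1 = L12 test, 2 = loop body, 3 = printf/return) and all names are assumptions of this example.

```c
#include <assert.h>

#define MAX_BLOCKS 8
#define MAX_EDGES  4

struct BBlock {
    int succ[MAX_EDGES], nsucc;   /* successor block indices */
    int pred[MAX_EDGES], npred;   /* predecessor block indices */
};

struct Cfg {
    struct BBlock block[MAX_BLOCKS];
    int nblocks;
};

/* Record the edge from -> to in both the successor and predecessor lists. */
static void add_edge(struct Cfg *g, int from, int to)
{
    assert(from >= 0 && from < g->nblocks && to >= 0 && to < g->nblocks);
    struct BBlock *f = &g->block[from];
    struct BBlock *t = &g->block[to];
    assert(f->nsucc < MAX_EDGES && t->npred < MAX_EDGES);
    f->succ[f->nsucc++] = to;
    t->pred[t->npred++] = from;
}

/* CFG of the while loop: entry -> test, test -> body, test -> exit, body -> test. */
static struct Cfg build_example_cfg(void)
{
    struct Cfg g = { .nblocks = 4 };
    add_edge(&g, 0, 1);
    add_edge(&g, 1, 2);
    add_edge(&g, 1, 3);
    add_edge(&g, 2, 1);
    return g;
}
```

The loop shows up as the back edge from block 2 to block 1, which is why the test block has two predecessors.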

13 Two-pass and single-pass compilation scheme (diagram repeated from slide 6).

14 Scalar optimizations. Well-known scalar optimizations include constant folding, constant propagation and copy propagation. Constant propagation replaces variables whose values are known constants with those constants in expressions. Constant folding evaluates constant expressions at compile time.
    int x = 14; int y = 7 - x/2;
    (constant propagation)
    int x = 14; int y = 7 - 14/2;
    (constant folding)
    int x = 14; int y = 0;
Copy propagation replaces uses of the target of a direct assignment (a copy such as y = x) with the source of that assignment.
    y = x; z = 3 + y;
    (copy propagation)
    y = x; z = 3 + x;
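
The two steps applied to y = 7 - x/2 can be sketched on a small expression tree. This is a minimal illustration, assuming a hypothetical `Node` type with only subtraction and division; it ignores overflow and leaves division by a constant zero unfolded.

```c
#include <assert.h>
#include <stddef.h>

enum Kind { K_CONST, K_VAR, K_SUB, K_DIV };

struct Node {
    enum Kind kind;
    int value;               /* valid when kind == K_CONST */
    int var;                 /* valid when kind == K_VAR: variable index */
    struct Node *lhs, *rhs;  /* operands of K_SUB / K_DIV */
};

/* Constant propagation: replace variable leaves whose value is known.
   known[v] != 0 means variable v is known to hold value[v]. */
static void propagate(struct Node *n, const int *known, const int *value)
{
    if (n == NULL)
        return;
    if (n->kind == K_VAR && known[n->var]) {
        n->kind = K_CONST;
        n->value = value[n->var];
        return;
    }
    propagate(n->lhs, known, value);
    propagate(n->rhs, known, value);
}

/* Constant folding: bottom-up, an operator whose operands are both
   constants is replaced by the computed constant. */
static void fold(struct Node *n)
{
    if (n == NULL || n->kind == K_CONST || n->kind == K_VAR)
        return;
    assert(n->lhs != NULL && n->rhs != NULL);
    fold(n->lhs);
    fold(n->rhs);
    if (n->lhs->kind != K_CONST || n->rhs->kind != K_CONST)
        return;
    int a = n->lhs->value, b = n->rhs->value;
    if (n->kind == K_DIV && b == 0)
        return;              /* keep the division for run time */
    n->value = (n->kind == K_SUB) ? a - b : a / b;
    n->kind = K_CONST;
    n->lhs = n->rhs = NULL;
}
```

Folding alone cannot simplify 7 - x/2; once `propagate` has replaced x by 14, `fold` reduces the whole tree to the constant 0, matching the slide.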

15 Common subexpression elimination (CSE). The compiler searches for identical subexpressions and saves the result of the calculation in a temporary variable for later reuse.
    a = b * c + g; d = b * c * d;
    (CSE)
    tmp = b * c; a = tmp + g; d = tmp * d;
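
One simple way to find such repeats inside a single basic block is to compare three-address instructions pairwise. The sketch below is an assumption-laden illustration, not the method of any particular compiler: variables are small integers, `Instr` and `local_cse` are hypothetical names, and commutativity (b * c versus c * b) is not recognized.

```c
#include <assert.h>

enum Op { OP_ADD, OP_MUL, OP_COPY };

/* dst = a op b; for OP_COPY the meaning is dst = a and b is unused. */
struct Instr { int dst; enum Op op; int a, b; };

/* Replace a later recomputation of code[i] by a copy of its result,
   as long as neither operand nor the saved result was overwritten.
   Returns the number of instructions turned into copies. */
static int local_cse(struct Instr *code, int n)
{
    assert(n >= 0);
    int replaced = 0;
    for (int i = 0; i < n; i++) {
        if (code[i].op == OP_COPY)
            continue;
        /* x = x * y destroys its own operand, so its result cannot be reused. */
        if (code[i].dst == code[i].a || code[i].dst == code[i].b)
            continue;
        for (int j = i + 1; j < n; j++) {
            if (code[j].op == code[i].op &&
                code[j].a == code[i].a && code[j].b == code[i].b) {
                code[j].op = OP_COPY;
                code[j].a = code[i].dst;
                code[j].b = -1;
                replaced++;
            }
            /* Stop once an operand or the saved result is redefined. */
            if (code[j].dst == code[i].a || code[j].dst == code[i].b ||
                code[j].dst == code[i].dst)
                break;
        }
    }
    return replaced;
}
```

With the slide's example lowered to t1 = b * c; a = t1 + g; t2 = b * c; d = t2 * d, the third instruction becomes the copy t2 = t1, after which copy propagation can remove t2 entirely.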

16 Dead code elimination. Removal of code that does not affect the output of the program.
    int foo()
    {
        int a = 24;
        int b = 25;
        int c;
        if (a < 0) printf("a<0");
        c = a << 2;
        return c;
    }
    (dead code elimination)
    int foo()
    {
        int a = 24;
        int c;
        c = a << 2;
        return c;
    }
Here b is never used, and the test a < 0 is always false because a is 24. Dead code can appear in many ways; it can be the result of scalar optimizations, inlining, etc.

17 Removal of redundant branches, condition propagation. Sometimes a conditional branch can be removed because an earlier condition already determines its outcome, provided the tested variable is not modified between the two tests.
    if (x > 0) { … if (x > 0) { a = x; } else { a = -x; } … }
    (condition propagation)
    if (x > 0) { … a = x; … }

18 Why is the control flow graph important for scalar optimizations?
(Figure: CFG fragment containing the statements)
    X = C1; L = X;
    Y = X; X = C2;
    Z = X;
    IF (X > C1)
When can we propagate information about the values of X? For a straight-line piece of code the answer is trivial. For branching code, the CFG resolves the ambiguity.

19 Data flow analysis. Data flow analysis is a technique for gathering information about the possible set of values of each variable at various points of a program. The control flow graph (CFG) is used to identify those parts of the program to which a value assigned to a variable can be propagated. A definition-use graph is a graph that contains an edge from each variable definition point in the program to every point where that definition is used.

20 Construction of def-use chains for a basic block is trivial. Each variable definition is associated with all subsequent uses of it. Each subsequent redefinition ends the current chain and starts a new one. To extend these local chains across the CFG, several sets that characterize the behavior of each block b are computed: Uses(b): the set of variables used in the block without a preceding definition within the block. Defsout(b): the set of definitions made in b that reach the end of the block. Killed(b): the set of definitions that are overwritten within the block by other definitions. Reaches(b): the set of all definitions, made in any block (possibly b itself), that can reach the entry of b.

21 To understand which definitions will be used in a basic block, it is important to know Reaches(b). It can be constructed by an iterative process that calculates Reaches(b) from the sets of the predecessor blocks:
    Reaches(b) = ∪ over all predecessors p of b of ( Defsout(p) ∪ ( Reaches(p) ∩ ¬Killed(p) ) )
The problem is that in the presence of loops the set Reaches(b) may depend on itself. Applying the equation repeatedly to every basic block of the CFG until no set changes yields the final solution.
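
The iteration can be sketched with one bit per definition. The example below is a minimal illustration under stated assumptions: the `Block` layout and `compute_reaches` are hypothetical names, each set is an `unsigned` bit mask, and all Reaches sets start empty so that they only grow until nothing changes.

```c
#include <assert.h>

#define NBLOCKS   4
#define MAX_PREDS 2

struct Block {
    unsigned defsout;      /* bit d set: definition d is made here and reaches the block end */
    unsigned killed;       /* bit d set: definition d is overwritten inside this block */
    int pred[MAX_PREDS];   /* indices of predecessor blocks */
    int npred;
};

/* Apply Reaches(b) = union over predecessors p of
   Defsout(p) | (Reaches(p) & ~Killed(p)) until a fixed point is reached. */
static void compute_reaches(const struct Block *b, int n, unsigned *reaches)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        reaches[i] = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            unsigned in = 0;
            for (int k = 0; k < b[i].npred; k++) {
                int p = b[i].pred[k];
                in |= b[p].defsout | (reaches[p] & ~b[p].killed);
            }
            if (in != reaches[i]) {
                reaches[i] = in;
                changed = 1;
            }
        }
    }
}
```

For the loop from slide 12, with definitions numbered sum = 0 (bit 0), i = 1 (bit 1), sum = sum + i (bit 2) and i = i + 1 (bit 3), all four definitions reach the loop test, because it is entered both from the entry block and from the loop body.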

22 The constructed sets are used for many scalar optimizations such as dead code elimination, constant propagation, etc. The main problem of this approach is the large number of edges in the def-use graph and the long time needed to calculate these sets. As a result, a lot of resources are needed for processing.
    S1: X =    S2: X =    S3: X =
    S4
    S5: = X    S6: = X    S7: = X
This example illustrates the problem. The definitions in S1, S2 and S3 all pass through S4. Since each of the three definitions reaches each of the three uses, there are nine edges. Static single assignment form (SSA) was proposed to simplify def-use chains.

23 SSA (static single assignment form). SSA form introduces a unique name for each variable definition and special pseudo-assignments.
    S1: X1 =    S2: X2 =    S3: X3 =
    S4: X4 = φ(X1, X2, X3)
    S5: = X4    S6: = X4    S7: = X4

24 SSA is designed to save developers from building complex use-def chains for local variables. The power of SSA is that each variable has exactly one definition in the program, so its use-def chain is trivial. At points of uncertainty, where several definitions merge, SSA introduces a special φ-function (Phi-function) that creates a new variable; this is the so-called pseudo-assignment. Constructing SSA form requires placing φ-functions and creating new unique variables. The new variables are generated by appending a unique version to the variable name. To insert φ-functions correctly, some concepts from graph theory are needed.

25 Node N dominates node M if every path from the entry node to M passes through N. The immediate dominator of node M is the last strict dominator of M on any path from the entry node to M. The dominance frontier of node x is the set of all nodes w such that x dominates a predecessor of w but does not strictly dominate w itself. Example: Dom[5] = {5,6,7,8}, DF[5] = {5,4,12,11}. (Figure: example CFG with numbered nodes.)

26 In SSA form, each variable definition must dominate every use of that variable. The dominator sets of the basic blocks can be constructed as follows: the set of dominators of a node N is the node N itself together with the intersection of the dominator sets of all its predecessors. A strict dominator of N is a dominator of N other than N itself. The immediate dominator is the closest node among the strict dominators. idom(N): the immediate dominator of basic block N. children(N): the set of basic blocks that N immediately dominates. (Figure: example with nodes 2, 3, 4, 5, 6.)
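
The construction rule for dominator sets can be turned into a simple fixed-point loop. The sketch below is one possible illustration, assuming node 0 is the entry, fewer than 32 nodes so that each set fits in a `uint32_t` bit mask, and every node reachable from the entry; `Node`, `compute_dominators` and `immediate_dominator` are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PREDS 4

struct Node { int pred[MAX_PREDS]; int npred; };

/* dom[i] gets bit d set when node d dominates node i.
   Dom(entry) = {entry}; Dom(N) = {N} plus the intersection of Dom(p)
   over all predecessors p of N, iterated until nothing changes. */
static void compute_dominators(const struct Node *g, int n, uint32_t *dom)
{
    assert(n > 0 && n < 32);
    uint32_t all = (UINT32_C(1) << n) - 1;
    dom[0] = UINT32_C(1);
    for (int i = 1; i < n; i++)
        dom[i] = all;                 /* optimistic start: everything dominates */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 1; i < n; i++) {
            uint32_t d = all;
            for (int k = 0; k < g[i].npred; k++)
                d &= dom[g[i].pred[k]];
            d |= UINT32_C(1) << i;
            if (d != dom[i]) {
                dom[i] = d;
                changed = 1;
            }
        }
    }
}

/* The immediate dominator of `node` is the strict dominator whose own
   dominator set equals the strict dominators of `node`.
   Returns -1 for the entry node, which has none. */
static int immediate_dominator(const uint32_t *dom, int n, int node)
{
    uint32_t strict = dom[node] & ~(UINT32_C(1) << node);
    for (int d = 0; d < n; d++)
        if (((strict >> d) & 1u) && dom[d] == strict)
            return d;
    return -1;
}
```

The bit-set formulation is easy to follow but not the fastest known approach; it is used here only because it mirrors the intersection rule on the slide directly.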

27 Dominance frontier criterion: if basic block N contains a definition of variable A, then every node on the dominance frontier of N requires a φ-function for A. Each φ-function is itself a definition, so the criterion must be applied repeatedly while there are nodes that still require a φ-function.
    Before:  B = A;  A = x;
    After:   A_2 = φ(A_1, A_3);  B = A_2;  A_3 = x;
Inserting φ-functions for node 5 of the scheme on slide 25.
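
The criterion can be sketched in two steps: compute dominance frontiers from the immediate dominators, then iterate over the frontiers of all definition blocks. This is an illustration under stated assumptions, not the slides' own algorithm: the frontier computation walks up the dominator tree from each predecessor of a join node, `idom[entry]` is -1, sets are `uint32_t` bit masks, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PREDS 4

struct Node { int pred[MAX_PREDS]; int npred; };

/* df[x] gets bit b set when block b is on the dominance frontier of x.
   For each join node b, walk from every predecessor up the dominator
   tree until idom[b] is reached; every node passed has b in its frontier. */
static void dominance_frontiers(const struct Node *g, const int *idom,
                                int n, uint32_t *df)
{
    assert(n > 0 && n < 32);
    for (int b = 0; b < n; b++)
        df[b] = 0;
    for (int b = 0; b < n; b++) {
        if (g[b].npred < 2)
            continue;
        for (int k = 0; k < g[b].npred; k++) {
            int runner = g[b].pred[k];
            while (runner != idom[b]) {
                df[runner] |= UINT32_C(1) << b;
                runner = idom[runner];
            }
        }
    }
}

/* Blocks that need a phi-function for a variable defined in def_blocks.
   Each inserted phi is a new definition, so its block is processed too. */
static uint32_t place_phi(const uint32_t *df, int n, uint32_t def_blocks)
{
    uint32_t phi = 0, work = def_blocks, done = 0;
    while (work != 0) {
        int b = 0;
        while (((work >> b) & 1u) == 0)
            b++;
        assert(b < n);
        work &= ~(UINT32_C(1) << b);
        done |= UINT32_C(1) << b;
        uint32_t fresh = df[b] & ~phi;
        phi |= fresh;
        work |= fresh & ~done;
    }
    return phi;
}
```

For the while loop of slide 12 (blocks 0 = entry, 1 = test, 2 = body, 3 = exit), the variable i is defined in blocks 0 and 2, and the only block that needs a φ-function for it is the loop test.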

28 Optimizations using the SSA form. Dead code elimination: if the variable a_ver is not used, then its definition should be removed. Constant propagation: if there is an assignment a_ver = const, then all uses of a_ver should be replaced by const; if there is a φ-function a_next = φ(c, c), then the φ should be replaced by c. Copy propagation: if there is an assignment a_n = b_k, then all uses of a_n should be replaced with b_k; if there is an assignment a_n = φ(b_k, b_k), then the φ should be replaced with b_k.
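
The φ rules above share one check: a φ-function whose operands are all the same name can be replaced by that name. A minimal sketch, assuming SSA names are encoded as non-negative integers (an encoding chosen only for this example) and using hypothetical helper names:

```c
#include <assert.h>

/* Return the common operand if every operand of the phi is the same
   SSA name, or -1 if the phi merges different names. */
static int trivial_phi_value(const int *operands, int n)
{
    assert(n > 0);
    for (int i = 1; i < n; i++)
        if (operands[i] != operands[0])
            return -1;
    return operands[0];
}

/* Replace every use of SSA name `from` with `to`.
   Returns the number of uses that were rewritten. */
static int replace_uses(int *uses, int nuses, int from, int to)
{
    int replaced = 0;
    for (int i = 0; i < nuses; i++) {
        if (uses[i] == from) {
            uses[i] = to;
            replaced++;
        }
    }
    return replaced;
}
```

Because each SSA name has a single definition, `replace_uses` needs no reaching-definition check; after it runs, the replaced definition has no uses left and dead code elimination can remove it.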

29 Thank you!

