Optimizing Compilers Background


Optimizing Compilers Background
Spring 2004 © Vivek Sarkar, Krishna V. Palem and Weng Fai Wong

Structure of Optimizing Compilers

Source Programs
  → Front ends (Front-end #1, Front-end #2, …), supported by Tools and a Program Database
  → High-level Intermediate Language (HIL)
  → High-level Optimizer (middle end) → Optimized HIL
  → Lowering of IL → Low-level Intermediate Language (LIL)
  → Low-level Optimizer → Optimized LIL
  → Back ends: Target-1 / Target-2 / Target-3 Code Generators and Linkers (…), with Runtime Systems
  → Target-1 / Target-2 / Target-3 Executables

High-level Optimizer

The following steps may be performed more than once:
  1. Global intraprocedural and interprocedural analysis of the source program's control and data flow
  2. Selection of high-level optimizations and transformations
  3. Update of the high-level intermediate language

Note: this optimization phase can optionally be bypassed, since its input and output use the same intermediate language.

Lowering of Intermediate Language
  - Linearized storage mapping of variables
  - Array/structure references → load/store operations
  - High-level control structures → low-level control flow

Low-level Optimizations
  - Instruction Scheduling (exploiting ILP)
  - Register Allocation (managing storage)

Cost of Instruction Scheduling
  - Given a program segment, the goal is to execute it as quickly as possible
  - The completion time is the objective function, or cost, to be minimized; this is referred to as the makespan of the schedule
  - It has to be balanced against the running time and space needs of the algorithm for finding the schedule, which translate to compilation cost

Instruction Scheduling Example

main(int argc, char *argv[]) {
  int a, b, c;
  a = argc;
  b = a * 255;
  c = a * 15;
  printf("%d\n", b*b - 4*a*c);
}

op 10  MPY   vr2 ← param1, 255
op 12  MPY   vr3 ← param1, 15
op 14  MPY   vr8 ← vr2, vr2
op 15  SHL   vr9 ← param1, 2
op 16  MPY   vr10 ← vr9, vr3
op 17  SUB   param2 ← vr8, vr10
op 18  MOV   param1 ← addr("%d\n")
op 27  PBRR  vb12 ← addr(printf)
op 20  BRL   ret_addr ← vb12

Example (continued): Flow Dependences

Flow-dependence edges over the operations above:
  10 → 14,  12 → 16,  15 → 16,  14 → 17,  16 → 17,  17 → 20,  18 → 20,  27 → 20

Assumptions:
  - max ILP = 4
  - multiply instruction latency = 3 cycles
  - all other instructions' latencies = 1 cycle
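A schedule for this example can be computed with a greedy list scheduler. Below is a minimal sketch (the slides give no code; the `list_schedule` function and its data representation are illustrative assumptions, while the op numbers, latencies, and dependence edges are those of the example above):

```python
# Greedy list scheduling over an acyclic dependence graph (a sketch).
from collections import defaultdict

LATENCY = {10: 3, 12: 3, 14: 3, 16: 3,          # MPY latency = 3 cycles
           15: 1, 17: 1, 18: 1, 27: 1, 20: 1}   # all others = 1 cycle
DEPS = {14: [10], 16: [12, 15], 17: [14, 16], 20: [17, 18, 27]}
WIDTH = 4  # max ILP: up to 4 ops may issue per cycle

def list_schedule(latency, deps, width):
    finish = {}                    # op -> cycle its result is available
    schedule = defaultdict(list)   # cycle -> ops issued that cycle
    remaining = set(latency)
    cycle = 0
    while remaining:
        # Ops whose predecessors have all completed by this cycle.
        ready = [op for op in sorted(remaining)
                 if all(finish.get(p, float('inf')) <= cycle
                        for p in deps.get(op, []))]
        for op in ready[:width]:   # issue at most `width` ops per cycle
            schedule[cycle].append(op)
            finish[op] = cycle + latency[op]
            remaining.discard(op)
        cycle += 1
    return dict(schedule), max(finish.values())  # makespan = last finish
```

Under these assumptions the scheduler issues ops 10, 12, 15, and 18 in cycle 0 and achieves a makespan of 8 cycles, dominated by the chained multiply latencies.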

After Scheduling (Prior to Register Allocation)

(figure: the scheduled operations, assuming a machine width of 4)

Cost of Register Allocation
  - The goal, or objective function, in this case is to maximize the duration that operands are in registers
  - Operations on register operands are much faster
  - Registers are unfortunately a limited resource
  - Therefore, when operands do not fit in registers, as is quite often the case, they are "spilled" to memory at the expense of additional cost

Cost of Register Allocation (Contd.)
  - Therefore, maximizing the duration of operands in registers, or minimizing the amount of spilling, is the goal
  - Once again, the running time (complexity) and space used by the algorithm for doing this is the compilation cost

Register Allocation - An Example

Before (virtual registers):
op 10  MPY   vr2 ← param1, 255
op 12  MPY   vr3 ← param1, 15
op 14  MPY   vr8 ← vr2, vr2
op 15  SHL   vr9 ← param1, 2
op 16  MPY   vr10 ← vr9, vr3
op 17  SUB   param2 ← vr8, vr10
op 18  MOV   param1 ← addr("%d\n")
op 27  PBRR  vb12 ← addr(printf)
op 20  BRL   ret_addr ← vb12

After (physical registers; note the spill code in ops 37 and 38, assuming only 2 branch registers are available):
op 10  MPY   r2 ← param1, 255
op 12  MPY   r3 ← param1, 15
op 15  SHL   r4 ← param1, 2
op 27  PBRR  b2 ← addr(printf)
op 18  MOV   param1 ← addr("%d\n")
op 14  MPY   r6 ← r2, r2
op 16  MPY   r5 ← r4, r3
op 17  SUB   param2 ← r6, r5
op 37  ADD   temp_reg ← sp, 24
op 38  ST    temp_reg, b2
op 20  BRL   ret_addr ← b2
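The slides do not show how the allocator picks registers; one standard technique is Chaitin-style graph coloring over an interference graph. A minimal sketch follows (the function name, input format, and spill heuristic are assumptions, not from the slides):

```python
# Chaitin-style simplify/select graph coloring (a sketch).
def color_graph(interference, K):
    """interference: dict mapping each virtual register to the set of
    virtual registers simultaneously live with it (assumed symmetric).
    K: number of available physical registers."""
    graph = {v: set(ns) for v, ns in interference.items()}
    work = {v: set(ns) for v, ns in graph.items()}   # mutable copy
    stack, spilled = [], []
    while work:
        # Simplify: remove a node of degree < K if one exists ...
        low = next((v for v in sorted(work) if len(work[v]) < K), None)
        if low is None:
            # ... otherwise pick a max-degree node as a spill candidate.
            low = max(sorted(work), key=lambda v: len(work[v]))
            spilled.append(low)
        else:
            stack.append(low)
        for n in work.pop(low):
            work[n].discard(low)
    # Select: pop in reverse order, giving each node the lowest
    # colour (register) not used by an already-coloured neighbour.
    colors = {}
    for v in reversed(stack):
        taken = {colors[n] for n in graph[v] if n in colors}
        colors[v] = next(c for c in range(K) if c not in taken)
    return colors, spilled
```

With 3 mutually interfering values and only 2 registers, one value is spilled, mirroring the spill code in the example above; with 3 registers, all three get distinct colours.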

Execution Frequencies?

What are Execution Frequencies?
  - Branch probabilities
  - Average number of loop iterations
  - Average number of procedure calls

How are Execution Frequencies Used?
  - Focus optimization on the most frequently executed regions (region-based compilation)
  - Provide a quantitative basis for evaluating the quality of optimization heuristics

How are Execution Frequencies Obtained?
  - Profiling tools:
      - Mechanism: sampling vs. counting
      - Granularity: procedure vs. basic block
  - Compile-time estimation:
      - Default values
      - Compiler analysis
      - Goal is to select the same set of program regions and optimizations that would be obtained from profiled frequencies

What are Execution Costs?
  - Cost of an intermediate code operation, parametrized according to the target architecture:
      - Number of target instructions
      - Resource requirement template
      - Number of cycles

How are Execution Costs Used?
  - In conjunction with execution frequencies:
      - Identify the most time-consuming regions of the program
      - Provide a quantitative basis for evaluating the quality of optimization heuristics
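As a toy illustration of combining frequencies with per-region costs to find the most time-consuming regions (all region names and numbers below are hypothetical, not from the slides):

```python
# Estimated time per region = execution frequency × cost in cycles.
regions = {
    'loop_body':  {'freq': 10_000, 'cycles': 12},
    'prologue':   {'freq': 1,      'cycles': 40},
    'error_path': {'freq': 3,      'cycles': 25},
}

def rank_regions(regions):
    """Rank regions by estimated total time, most expensive first."""
    return sorted(regions,
                  key=lambda r: regions[r]['freq'] * regions[r]['cycles'],
                  reverse=True)
```

Note that the cheap-per-execution loop body dominates once frequency is taken into account, which is why frequencies alone or costs alone are insufficient.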

How are Execution Costs Obtained?
  - Simplistic translation of each intermediate code operation to the corresponding instruction template for the target machine

Graphs and Preliminaries

Basic Graphs
  - A graph is made up of a set of nodes (V) and a set of edges (E)
  - Each edge has a source and a sink, both of which must be members of the node set, i.e. E ⊆ V × V
  - Edges may be directed or undirected
  - A directed graph has only directed edges
  - An undirected graph has only undirected edges
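These definitions can be encoded directly. A minimal sketch (the representation is an assumption, not from the slides):

```python
# V is the node set; E ⊆ V × V is a set of (source, sink) edges.
# An undirected edge is modelled as a pair of directed edges.
V = {'a', 'b', 'c'}
E = {('a', 'b'), ('b', 'c')}   # directed edges

def undirect(E):
    """Close an edge set under reversal, giving an undirected graph."""
    return E | {(sink, source) for (source, sink) in E}

# Every edge's source and sink must be members of the node set.
assert all(source in V and sink in V for (source, sink) in E)
```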

Examples
  (figures: an undirected graph and a directed graph)

Paths
  (figures: a path from a source node to a sink node, in an undirected graph and a directed graph)

Cycles
  (figures: an acyclic directed graph, a directed graph with a cycle, and an undirected graph with a cycle)

Connected Graphs
  (figures: an unconnected graph and a connected directed graph)

Connectivity of Directed Graphs
  - A strongly connected directed graph is one which has a path from each vertex to every other vertex
  - Is this graph strongly connected? (figure: a directed graph with vertices A through G)
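Such a question can be answered mechanically: a directed graph is strongly connected iff every vertex is reachable from some vertex s both in the graph and in its edge-reversed graph. A sketch (function names and the adjacency-list format are assumptions, not from the slides):

```python
# Strong-connectivity test via two reachability searches.
def reachable(adj, start):
    """Set of vertices reachable from `start` in adjacency-list graph."""
    seen, stack = {start}, [start]
    while stack:
        for w in adj.get(stack.pop(), []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def strongly_connected(adj):
    verts = set(adj) | {w for ws in adj.values() for w in ws}
    if not verts:
        return True
    s = next(iter(verts))
    # Build the edge-reversed graph.
    rev = {v: [] for v in verts}
    for v, ws in adj.items():
        for w in ws:
            rev[w].append(v)
    return reachable(adj, s) == verts and reachable(rev, s) == verts
```

For example, a three-node cycle a → b → c → a is strongly connected, while a simple chain a → b → c is not.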

Data Dependence Analysis
  - If two operations have potentially interfering data accesses, data dependence analysis is necessary to determine whether or not an interference actually exists
  - If there is no interference, it may be possible to reorder the operations or execute them concurrently
  - The data accesses examined for data dependence analysis may arise from array variables, scalar variables, procedure parameters, pointer dereferences, etc. in the original source program
  - Data dependence analysis is conservative, in that it may state that a data dependence exists between two statements when actually none exists

Data Dependence: Definition

A data dependence S1 → S2 exists between CFG nodes S1 and S2 with respect to variable X if and only if:
  1. there exists a path P: S1 → S2 in the CFG, with no intervening write to X, and
  2. at least one of the following is true:
     (a) (flow)   X is written by S1 and later read by S2, or
     (b) (anti)   X is read by S1 and later written by S2, or
     (c) (output) X is written by S1 and later written by S2
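For two statements in straight-line code (where condition 1 holds trivially), condition 2 can be checked directly from each statement's read and write sets. A sketch (the (reads, writes) pair representation is an assumption, not from the slides):

```python
# Classify the data dependences from statement S1 to a later statement S2.
# Each statement is a (reads, writes) pair of variable-name sets.
def dependences(s1, s2):
    reads1, writes1 = s1
    reads2, writes2 = s2
    kinds = set()
    if writes1 & reads2:
        kinds.add('flow')    # X written by S1 and later read by S2
    if reads1 & writes2:
        kinds.add('anti')    # X read by S1 and later written by S2
    if writes1 & writes2:
        kinds.add('output')  # X written by S1 and later written by S2
    return kinds
```

For example, for S1: x = a and S2: x = x + 1, S2 both reads and overwrites x, so S1 → S2 carries a flow and an output dependence.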