Synopsys University Courseware. Copyright © 2012 Synopsys, Inc. All rights reserved.
Compiler Optimization and Code Generation, Lecture 4. Developed by: Vazgen Melikyan.


Slide 1: Compiler Optimization and Code Generation
Professor: Sc.D. Vazgen Melikyan

Slide 2: Course Overview
- Introduction: Overview of Optimizations (1 lecture)
- Intermediate-Code Generation (2 lectures)
- Machine-Independent Optimizations (3 lectures)
- Code Generation (2 lectures)

Slide 3: Code Generation

Slide 4: Machine Code Generation
Input: intermediate code + symbol tables
- Three-address code in this case
- All variables have values that machines can directly manipulate
- Assume the program is free of errors: type checking has taken place and type conversions are done
Output: absolute/relocatable machine code or assembly code
- Assembly in this case
- Architecture variations: RISC, CISC, stack-based
Issues: memory management, instruction selection and scheduling, register allocation and assignment

Slide 5: Retargetable Back End
Build retargetable compilers:
- Isolate machine-dependent information
- Compilers on different machines share a common IR, so they can have common front and mid ends
- Table-based back ends share common algorithms
Table-based instruction selector:
- Create a description of the target machine and use a back-end generator
(Figure: a machine description feeds the back-end generator, which emits tables; a pattern-matching engine driven by those tables forms the instruction selector.)

Slide 6: Translating from Three-Address Code
- No more support for structured control flow: function calls become explicit memory management and goto jumps
- Every three-address instruction is translated into one or more target machine instructions, and the original evaluation order is maintained
- Memory management: every variable must have a location to store its value (register, stack, heap, or static storage)
- Memory allocation convention: scalar/atomic values and addresses go to registers or the stack; arrays go to the heap; global variables go to static storage
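The expansion of one three-address instruction into several machine instructions can be sketched as follows. This is a minimal illustration, not the lecture's actual tool: `translate_tac` and its load/op/store output format are hypothetical, and a fresh register pair is assumed for every instruction.

```python
# Sketch: translate one binary three-address instruction
# "dst := src1 op src2" into load/op/store assembly strings,
# preserving the original evaluation order.

OPS = {"+": "ADD", "-": "SUB", "*": "MULT"}

def translate_tac(dst, src1, op, src2, next_reg=0):
    """Return (instruction list, next free pseudo-register number)."""
    r1, r2 = f"r{next_reg}", f"r{next_reg + 1}"
    code = [
        f"LD {src1} => {r1}",            # load first operand
        f"LD {src2} => {r2}",            # load second operand
        f"{OPS[op]} {r1},{r2} => {r1}",  # apply the operator
        f"ST {r1} => {dst}",             # store the result
    ]
    return code, next_reg + 2

# t := a - b expands to four machine instructions:
code, _ = translate_tac("t", "a", "-", "b")
```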

Slide 7: Assigning Storage Locations
Compilers must choose storage locations for all values:
- Procedure-local storage: local variables not preserved across procedure calls
- Procedure-static storage: local variables preserved across procedure calls
- Global storage: global variables
- Run-time heap: dynamically allocated storage
- Registers: temporary storage for applying operations to values; unambiguous values can be assigned to registers with no backup storage

Slide 8: Function Call and Return
At each function call:
- Allocate a new activation record (AR) on the stack
- Save the return address in the new AR
- Set parameter values and return results
- Save SP and other registers, set the access link (AL) if necessary, and jump to the callee's code
At each function return:
- Restore SP and the registers
- Jump to the return address in the caller's AR
- Pop the caller's AR off the stack
Different languages may implement this differently.

Slide 9: Translating Function Calls
- Use a register SP to store the address of the activation record on top of the stack; SP, AL, and other registers are saved and restored around the call
- Use the C(Rs) addressing mode to access parameters and local variables

Source program:
  /* code for s */  Action1; Param 5; Call q, 1; Action2; Halt
  /* code for q */  Action3; return

Generated code:
       LD stackStart => SP       /* initialize stack */
  108: ACTION1
  128: ADD SP, ssize => SP       /* begin call sequence */
  136: ST 160 => *SP             /* push return address */
  144: ST 5 => 2(SP)             /* push param 1 */
  152: BR 300                    /* call q */
  160: SUB SP, ssize => SP       /* restore SP */
  168: ACTION2
  190: HALT
  /* code for q */
  300: save SP, AL and other regs
       ACTION3
       restore SP, AL and other regs
  400: BR *0(SP)                 /* return to caller */

Slide 10: Translating Variable Assignment
- Keep track of the locations of variables in the symbol table: the current value of a variable may reside in a register, a stack location, a static memory location, or several of these
- Allocation of variables to registers: assume an infinite number of pseudo-registers and map them to real registers afterwards

Statement    Generated code     Register descriptor   Address descriptor
t := a - b   LD a => r0         r0 contains t         t in r0
             LD b => r1         r1 contains b         b in r1
             SUB r0,r1 => r0
u := t + c   LD c => r2         r0 contains u         u in r0
             ADD r0,r2 => r0    r1 contains b         b in r1
                                r2 contains c         c in r2
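The descriptor bookkeeping in the table above can be sketched in a few lines. This is an illustrative skeleton (the `Codegen` class and its method names are hypothetical, not from the slides' toolchain): the address descriptor records where each variable's current value lives, and the register descriptor records what each register holds, so a value already in a register is reused without a reload.

```python
# Sketch of register/address descriptor tracking with unlimited
# pseudo-registers, reproducing the slide's two-statement example.

class Codegen:
    def __init__(self):
        self.code = []
        self.reg_desc = {}    # register -> variable it currently holds
        self.addr_desc = {}   # variable -> its current location
        self.next_reg = 0

    def get_reg(self, var):
        """Return a register holding var, loading it only if necessary."""
        loc = self.addr_desc.get(var)
        if loc in self.reg_desc:       # already in a register: no load
            return loc
        r = f"r{self.next_reg}"; self.next_reg += 1
        self.code.append(f"LD {var} => {r}")
        self.reg_desc[r] = var
        self.addr_desc[var] = r
        return r

    def binop(self, op, dst, a, b):
        ra, rb = self.get_reg(a), self.get_reg(b)
        self.code.append(f"{op} {ra},{rb} => {ra}")
        self.reg_desc[ra] = dst        # ra now holds dst, not a
        self.addr_desc[dst] = ra
        # (a full implementation would also invalidate a's stale descriptor)

g = Codegen()
g.binop("SUB", "t", "a", "b")   # t := a - b
g.binop("ADD", "u", "t", "c")   # u := t + c  (t reused from r0, no reload)
```

Note how the second statement finds t already in r0, matching the generated code in the table.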

Slide 11: Translating Arrays
Arrays are allocated in the heap. The code for an indexed load or store depends on where the index i lives:

a := b[i]
- i in register ri:   MULT ri,elsize => r1 ; LD b(r1) => ra
- i in memory Mi:     LD Mi => ri ; MULT ri,elsize => r1 ; LD b(r1) => ra
- i on the stack:     LD i(SP) => ri ; MULT ri,elsize => r1 ; LD b(r1) => ra

a[i] := b
- i in register ri:   MULT ri,elsize => r1 ; ST rb => a(r1)
- i in memory Mi:     LD Mi => ri ; MULT ri,elsize => r1 ; ST rb => a(r1)
- i on the stack:     LD i(SP) => ri ; MULT ri,elsize => r1 ; ST rb => a(r1)
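The pattern in the table is uniform: only the way the index reaches a register changes. A minimal sketch (the helper `load_indexed` and its string output are illustrative assumptions, not the slides' generator):

```python
# Sketch of the a := b[i] pattern: the element address is b + i*elsize,
# so the emitted code differs only in how i gets into register ri.

def load_indexed(i_location, kind):
    """Emit code for a := b[i]; kind is 'reg', 'mem', or 'stack'."""
    code = []
    if kind == "mem":                      # i in memory Mi: load it first
        code.append(f"LD {i_location} => ri")
    elif kind == "stack":                  # i on the stack at offset i(SP)
        code.append(f"LD {i_location}(SP) => ri")
    code.append("MULT ri,elsize => r1")    # scale the index
    code.append("LD b(r1) => ra")          # fetch b[i]
    return code

# With i already in a register, only the scale and load are needed:
code = load_indexed("ri", "reg")
```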

Slide 12: Translating Conditional Statements
The condition is determined after an ADD or SUB:

if x < y goto z        =>    SUB rx, ry => rt
                             BLTZ z

x := y + z             =>    ADD ry, rz => rx
if (x < 0) goto L            BLTZ L

Slide 13: Peephole Optimization
- Use a simple scheme to match IR to machine code
- Efficiently discover local improvements by examining short sequences of adjacent operations

Examples:
- A redundant load after a store becomes a register-to-register copy:
    storeAI r1 => SP, 8             storeAI r1 => SP, 8
    loadAI SP, 8 => r15      =>     r2r r1 => r15
- An identity operation is folded away:
    addI r2, 0 => r7         =>     mult r4, r2 => r10
    mult r4, r7 => r10
- A jump to an unconditional jump is retargeted:
    jumpI -> L10                    jumpI -> L11
    L10: jumpI -> L11        =>     L10: jumpI -> L11
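Two of the slide's patterns can be sketched as a tiny peephole pass over instruction tuples. This is a simplified illustration, not the slides' optimizer: the tuple encoding, the `copy` pseudo-op, and the `labelled_jump` form (a label whose first instruction is an unconditional jump) are all assumptions made for the sketch.

```python
# Sketch of a peephole pass: (1) a load that re-reads a just-stored
# location is replaced by a register copy; (2) a jump whose target is
# itself an unconditional jump is retargeted past the trampoline.

def peephole(code):
    out = []
    for ins in code:
        if (out and ins[0] == "loadAI" and out[-1][0] == "storeAI"
                and out[-1][2] == ins[1]):
            # redundant memory round trip: copy the stored register
            out.append(("copy", out[-1][1], ins[2]))
        else:
            out.append(ins)
    # collect labels whose only instruction is an unconditional jump
    trampolines = {ins[1]: ins[2] for ins in out if ins[0] == "labelled_jump"}
    result = []
    for ins in out:
        if ins[0] == "jump" and ins[1] in trampolines:
            result.append(("jump", trampolines[ins[1]]))
        else:
            result.append(ins)
    return result

code = [
    ("storeAI", "r1", ("SP", 8)),
    ("loadAI", ("SP", 8), "r15"),
    ("jump", "L10"),
    ("labelled_jump", "L10", "L11"),
]
optimized = peephole(code)
```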

Slide 14: Efficiency of Peephole Optimization
Design issues:
- Dead values: they may invalidate an otherwise legal simplification, and they need to be recognized during the expansion process
- Control-flow operations complicate the simplifier: clear the window vs. special-case handling
- Physical vs. logical windows: physically adjacent operations may be unrelated; a sliding logical window includes the operations that define or use common values
RISC vs. CISC architectures:
- RISC architectures make instruction selection easier
Additional issues:
- Automatic tools can generate large pattern libraries for different architectures
- Front ends that generate LLIR make compilers more portable

Slide 15: Register Allocation
Problem: allocate variables (pseudo-registers) to hardware registers in a procedure.
Features:
- The most important optimization: it directly reduces running time (a memory access becomes a register access)
- Useful for other optimizations, e.g. CSE assumes old values are kept in registers
Goals:
- Find an allocation for all pseudo-registers, if possible
- If there are not enough registers in the machine, choose registers to spill to memory

Slide 16: An Abstraction for Allocation and Assignment
- Two pseudo-registers interfere if at some point in the program they cannot both occupy the same register.
- Interference graph: an undirected graph where nodes are pseudo-registers and there is an edge between two nodes if their pseudo-registers interfere.
- What is not represented: the extent of the interference between uses of different variables, and where in the program the interference occurs.
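Building this graph from liveness information can be sketched directly. A minimal illustration, assuming the common conservative definition that two pseudo-registers interfere when they are live at the same program point (the input `live_sets` and the function name are assumptions of the sketch):

```python
# Sketch: build an interference graph from per-point live sets.
# Nodes are pseudo-registers; an undirected edge joins any two that
# are simultaneously live somewhere in the program.

from itertools import combinations

def interference_graph(live_sets):
    """live_sets: iterable of sets of pseudo-registers live at each point."""
    nodes, edges = set(), set()
    for live in live_sets:
        nodes |= live
        for a, b in combinations(sorted(live), 2):
            edges.add((a, b))          # stored sorted, so each edge once
    return nodes, edges

# a-b and b-c interfere; a and c never overlap, so they may share a register:
nodes, edges = interference_graph([{"a", "b"}, {"b", "c"}, {"c"}])
```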

Slide 17: Register Allocation and Coloring
- A graph is n-colorable if every node can be colored with one of n colors such that no two adjacent nodes have the same color.
- Assigning n registers without spilling is equivalent to coloring with n colors: assign each node a register (color) such that no two adjacent nodes are assigned the same register (color).
- Spilling is necessary exactly when the graph is not n-colorable.

Slide 18: Algorithm
Step 1: build an interference graph
- Refine the notion of a node
- Find the edges
Step 2: coloring
- Use heuristics to try to find an n-coloring
- Success: the graph is colorable and we have an assignment
- Failure: the graph is not colorable, or it is colorable but too expensive to color

Slide 19: Live Ranges and Merged Live Ranges
Motivation: create an interference graph that is easier to color.
- Eliminate interference in a variable's "dead" zones
- Increase flexibility in allocation: the same variable can be allocated to different registers
A live range consists of a definition and all the points in the program (e.g. the end of an instruction) at which that definition is live. Two overlapping live ranges for the same variable must be merged:
    a = ...
    ...
      = a

Slide 20: Example
(Figure: a control-flow example with definitions A1, A2, B1, C1, D1, and D2. One branch defines B1 and D2, the other defines C1 and D1; each program point is annotated with its live variables and reaching definitions, e.g. {A}{A1}, {A,B}{A1,B1}, ..., {D}{A2,B1,C1,D1,D2}. Both definitions of D reach the final "ret D", so the live ranges of D1 and D2 are merged.)

Slide 21: Merging Live Ranges
Merging definitions into equivalence classes:
- Start by putting each definition in a different equivalence class
- For each point in the program: if (i) the variable is live and (ii) there are multiple reaching definitions for the variable, merge the equivalence classes of all such definitions into one
From now on, "live range" refers to a merged live range; merged live ranges are also known as "webs".
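The equivalence-class merging above is a natural fit for union-find. A minimal sketch under the slide's rule (the `merge_webs` helper and its input format, pairs of a variable with the reaching definitions live at a point, are assumptions of the sketch):

```python
# Sketch: merge definitions into webs with union-find. Each program
# point contributes (variable, its live reaching definitions); all
# such definitions are merged into one equivalence class.

def merge_webs(points):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for _var, defs in points:
        defs = list(defs)
        for d in defs[1:]:
            union(defs[0], d)               # all reaching defs join one web
    return find

# Slide 20's D: both D1 and D2 reach the final use, so they form one
# web, while A1 and A2 stay separate.
find = merge_webs([("D", ["D1", "D2"]), ("A", ["A1"]), ("A", ["A2"])])
```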

Slide 22: Edges of the Interference Graph
- Two live ranges (necessarily of different variables) interfere if they overlap at some point in the program.
- Basic algorithm: at each point in the program, add an edge for every pair of live ranges live at that point.
- An optimized definition and algorithm for edges: check for interference only at the start of each live range. This is faster and gives better quality.

Slide 23: Coloring
- Coloring for n > 2 is NP-complete.
- Observation: a node with degree < n can always be colored successfully, whatever colors its neighbors take.
- Coloring algorithm:
  - Iterate until stuck or done: pick any node with degree < n, then remove that node and its edges from the graph
  - If done (no nodes left), reverse the process, reinserting nodes and assigning colors
- Note: the degree of a node may drop during the iteration.
(Figure: an example graph with nodes A-E and n = 3.)

Slide 24: When Coloring Fails
Use heuristics to improve the chance of success and to place spill code:

  build the interference graph
  while there are nodes left:
      if there exists a node v with fewer than n neighbors:
          place v on the stack to register allocate
      else:
          v = a node chosen by heuristics (e.g. least frequently executed, many neighbors)
          place v on the stack to register allocate (mark v as spilled)
      remove v and its edges from the graph
  while the stack is not empty:
      remove v from the stack
      reinsert v and its edges into the graph
      assign v a color that differs from all its neighbors
      (guaranteed to be possible for nodes not marked as spilled)
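The simplify/select loop above can be sketched compactly. This is an illustrative implementation under stated assumptions: the spill heuristic is simply "highest degree" (the slide's frequency heuristic would need profile data), and the graph encoding as an adjacency-set dictionary is an assumption of the sketch.

```python
# Sketch of optimistic graph coloring with spill candidates: nodes with
# degree < n are pushed; when none exists, a heuristic victim is pushed
# and marked; colors are then assigned in reverse (stack) order.

def color(graph, n):
    """graph: {node: set of neighbors}. Returns (coloring, spilled)."""
    g = {v: set(adj) for v, adj in graph.items()}   # mutable working copy
    stack, spilled = [], set()
    while g:
        v = next((v for v in sorted(g) if len(g[v]) < n), None)
        if v is None:                               # stuck: pick a victim
            v = max(sorted(g), key=lambda u: len(g[u]))
            spilled.add(v)
        stack.append((v, graph[v]))                 # remember full adjacency
        for u in g.pop(v):
            g[u].discard(v)
    coloring = {}
    while stack:
        v, neighbors = stack.pop()
        used = {coloring[u] for u in neighbors if u in coloring}
        free = [c for c in range(n) if c not in used]
        if free:
            coloring[v] = free[0]   # optimistic: may succeed even if marked
        # else: v really must be spilled to memory
    return coloring, spilled

triangle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
ok, _ = color(triangle, 3)          # 3 colors suffice for a triangle
short, spilled = color(triangle, 2) # with 2 colors, one node spills
```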

Slide 25: Register Allocation: Summary
Problem: find an assignment for all pseudo-registers, whenever possible.
Solution:
- An abstraction: the interference graph (nodes: live ranges; edges: presence of another live range at a definition point)
- The register allocation and assignment problems are equivalent to n-colorability of the interference graph
- Heuristics try to find an assignment with n colors: when successful, the graph is colorable and an assignment is found; when not, colorability is unknown and there is no assignment

Slide 26: Instruction Scheduling: The Goal
- Assume the remaining instructions are all essential; otherwise, earlier passes would have eliminated them.
- The way to perform a fixed amount of work in less time is to execute the instructions in parallel:

  Sequential (time runs downward):    Parallel (one cycle):
  a = 1 + x;                          a = 1 + x;  b = 2 + y;  c = z + 3;
  b = 2 + y;
  c = z + 3;

Slide 27: Hardware Support for Parallel Execution
Three forms of parallelism are found in modern machines:
- Pipelining (exploited by instruction scheduling)
- Superscalar processing (exploited by instruction scheduling)
- Multiprocessing (exploited by automatic parallelization)

Slide 28: Pipelining
Basic idea: break instruction execution into stages that can be overlapped.
Example: the simple 5-stage pipeline from early RISC machines. An instruction flows through the stages in successive cycles:
  IF = Instruction Fetch
  RF = Decode & Register Fetch
  EX = Execute on ALU
  ME = Memory Access
  WB = Write Back to Register File

Slide 29: Pipelining Illustration
(Figure: successive instructions enter the IF-RF-EX-ME-WB pipeline one cycle apart. In a given cycle, each instruction is in a different stage.)
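The timing shown in the figure follows from one issue per cycle. A small sketch of the ideal (stall-free) case, where the `stage_in_cycle` and `total_cycles` helpers are assumptions made for illustration:

```python
# Sketch of ideal pipeline timing: with S stages and one instruction
# issued per cycle, instruction i occupies stage s in cycle i + s, so
# N instructions finish in N + S - 1 cycles (no stalls assumed).

STAGES = ["IF", "RF", "EX", "ME", "WB"]

def stage_in_cycle(instr, cycle):
    """Stage that instruction `instr` (0-based) occupies at `cycle`, or None."""
    s = cycle - instr
    return STAGES[s] if 0 <= s < len(STAGES) else None

def total_cycles(n_instructions, n_stages=len(STAGES)):
    return n_instructions + n_stages - 1

# In cycle 4 the pipeline is full: each of 5 instructions is in a
# different stage, exactly as the figure shows.
snapshot = [stage_in_cycle(i, 4) for i in range(5)]
```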

Slide 30: Beyond Pipelining: "Superscalar" Processing
Basic idea:
- Multiple (independent) instructions can proceed simultaneously through the same pipeline stages
- Requires additional hardware
(Figure: a scalar pipeline has one ALU between pipe registers; a 2-way superscalar pipeline has two ALUs, so r1 + r2 and r3 + r4 can execute in the same cycle.)

Slide 31: Superscalar Pipeline Illustration
(Figure: two instructions enter each pipeline stage per cycle.)
- Original (scalar) pipeline: only one instruction in a given pipe stage at a given time
- Superscalar pipeline: multiple instructions in the same pipe stage at the same time

Slide 32: Limitations upon Scheduling
Hardware resources:
- Processors have finite resources, and there are often constraints on how these resources can be used
- Examples: finite issue width; a limited number of functional units (FUs) per instruction type; limited pipelining within a given FU
Data dependences:
- If a data location is read or written too early, the program may behave incorrectly
Control dependences:
- It is impractical to schedule for all possible paths
- Choosing an expected path may be difficult
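The data-dependence limitation is usually made precise by classifying hazards between an instruction pair. A minimal sketch using the standard RAW/WAR/WAW taxonomy (the `hazards` helper and its (writes, reads) tuple encoding are assumptions of the sketch):

```python
# Sketch: classify the data hazards between an earlier and a later
# instruction, each given as (set of registers written, set read).
# Any non-empty result forbids freely reordering the pair.

def hazards(first, second):
    w1, r1 = first
    w2, r2 = second
    found = set()
    if w1 & r2:
        found.add("RAW")   # true dependence: second reads what first wrote
    if r1 & w2:
        found.add("WAR")   # anti-dependence: second overwrites what first read
    if w1 & w2:
        found.add("WAW")   # output dependence: both write the same location
    return found

# r1 = r2 + r3 followed by r4 = r1 + r2 is a true (RAW) dependence:
dep = hazards(({"r1"}, {"r2", "r3"}), ({"r4"}, {"r1", "r2"}))
```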

Slide 33: Predictable Success