Chapter 14: Building a Runnable Program

14.1 Back-End Compiler Structure
14.2 Intermediate Forms
14.3 Code Generation
14.4 Address Space Organization

Where We Are...

Source code (character stream)
  -> Lexical Analysis (regular expressions)
  -> token stream
  -> Syntax Analysis (grammars)
  -> abstract syntax tree
  -> Semantic Analysis (static semantics)
  -> abstract syntax tree + symbol tables, types
  -> Intermediate Code Gen
  -> Intermediate code

Intermediate Representation (aka IR)

- The compiler's internal representation
  - Is language-independent and machine-independent
- Enables machine-independent and machine-dependent optimizations

[Figure: AST -> IR (optimize) -> back ends: Pentium, Java bytecode, Itanium, TI C5x, ARM]

What Makes a Good IR?

- Captures high-level language constructs
  - Easy to translate from AST
  - Supports high-level optimizations
- Captures low-level machine features
  - Easy to translate to assembly
  - Supports machine-dependent optimizations
- Narrow interface: small number of node types (instructions)
  - Easy to optimize
  - Easy to retarget

Multiple IRs

- Most compilers use 2 IRs:
  - High-level IR (HIR): language independent but closer to the language
  - Low-level IR (LIR): machine independent but closer to the machine
  - A significant part of the compiler is both language and machine independent!

[Figure: C++, C, Fortran -> AST -> HIR (optimize) -> LIR (optimize) -> back ends: Pentium, Java bytecode, Itanium, TI C5x, ARM]

High-Level IR

- HIR is essentially the AST
  - Must be expressive for all input languages
- Preserves high-level language constructs
  - Structured control flow: if, while, for, switch
  - Variables, expressions, statements, functions
- Allows high-level optimizations based on properties of the source language
  - Function inlining, memory dependence analysis, loop transformations

Low-Level IR

- A set of instructions which emulates an abstract machine (typically RISC)
- Has low-level constructs
  - Unstructured jumps, registers, memory locations
- Types of instructions
  - Arithmetic/logic (a = b OP c), unary operations, data movement (move, load, store), function call/return, branches

Alternatives for LIR

- 3 general alternatives
  - Three-address code or quadruples
    - a = b OP c
    - Advantage: makes compiler analysis and optimization easier
  - Tree representation
    - Was popular for CISC architectures
    - Advantage: easier to generate machine code
  - Stack machine
    - Like Java bytecode
    - Advantage: easier to generate from AST

Three-Address Code

- a = b OP c
  - Originally so named because an instruction had at most 3 addresses or operands
    - This is not enforced today, e.g., MAC (multiply-accumulate): a = b * c + d
  - May have fewer operands
- Also called quadruples: (a, b, c, OP)
- Example: a = (b+c) * (-e)

    t1 = b + c
    t2 = -e
    a = t1 * t2

  Here t1 and t2 are compiler-generated temporary variables.
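As an illustration of the quadruple form, here is a minimal Python sketch that stores the example above as (a, b, c, OP) tuples and runs them over a table of values. The names `Quad` and `evaluate`, the operator spellings, and the input values are assumptions made for this sketch, not part of the slides.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One three-address instruction, in the slide's (a, b, c, OP) order."""
    dest: str
    arg1: str
    arg2: Optional[str]  # None for unary operations
    op: str


# a = (b + c) * (-e), using compiler-generated temporaries t1 and t2
quads = [
    Quad("t1", "b", "c", "ADD"),
    Quad("t2", "e", None, "MINUS"),
    Quad("a", "t1", "t2", "MUL"),
]


def evaluate(code, env):
    """Run the quadruples over a dict of variable values (illustration only)."""
    env = dict(env)  # do not modify the caller's dict
    for q in code:
        x = env[q.arg1]
        if q.op == "MINUS":
            env[q.dest] = -x
        elif q.op == "ADD":
            env[q.dest] = x + env[q.arg2]
        elif q.op == "MUL":
            env[q.dest] = x * env[q.arg2]
        else:
            raise ValueError(f"unknown op {q.op}")
    return env


result = evaluate(quads, {"b": 2, "c": 3, "e": 4})
print(result["a"])  # (2 + 3) * (-4)
```

Keeping every instruction in the same four-field shape is what makes this form convenient for analysis: a pass can scan `dest`, `arg1`, and `arg2` without parsing nested expressions.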

IR Instructions

- Assignment instructions
  - a = b OP c (binary op)
    - arithmetic: ADD, SUB, MUL, DIV, MOD
    - logic: AND, OR, XOR
    - comparisons: EQ, NEQ, LT, GT, LEQ, GEQ
  - a = OP b (unary op)
    - arithmetic MINUS, logical NEG
  - a = b : copy instruction
  - a = [b] : load instruction
  - [a] = b : store instruction
  - a = addr b : symbolic address
- Flow of control
  - label L : label instruction
  - jump L : unconditional jump
  - cjump a L : conditional jump
- Function call
  - call f(a1, ..., an)
  - a = call f(a1, ..., an)
- IR describes the instruction set of an abstract machine
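To make the "abstract machine" idea concrete, the following is a small Python sketch of an interpreter for a subset of these instructions (copy, a few binary ops, label, jump, cjump). The tuple encoding, the `run` function, and the sample program are assumptions chosen for illustration; loads, stores, and calls are left out.

```python
import operator

# Binary operations supported by this sketch; comparisons produce 0 or 1.
BINOPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "LT": lambda x, y: int(x < y),
}


def run(program, env=None):
    """Interpret a list of instruction tuples and return the final variables."""
    env = dict(env or {})
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}

    def val(x):
        # Operands are variable names (str) or integer constants.
        return env[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        if kind == "copy":                 # a = b
            env[ins[1]] = val(ins[2])
        elif kind in BINOPS:               # a = b OP c
            env[ins[1]] = BINOPS[kind](val(ins[2]), val(ins[3]))
        elif kind == "jump":               # jump L
            pc = labels[ins[1]]
            continue
        elif kind == "cjump":              # cjump a L: jump when a is true
            if val(ins[1]):
                pc = labels[ins[2]]
                continue
        elif kind != "label":
            raise ValueError(f"unsupported instruction {ins!r}")
        pc += 1
    return env


# s = 1 + 2 + 3 + 4 + 5, written directly in the IR subset
program = [
    ("copy", "s", 0),
    ("copy", "i", 1),
    ("label", "Lloop"),
    ("ADD", "s", "s", "i"),
    ("ADD", "i", "i", 1),
    ("LT", "c", "i", 6),
    ("cjump", "c", "Lloop"),
]
final = run(program)
print(final["s"])
```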

IR Operands

- The operands in 3-address code can be:
  - Program variables
  - Constants or literals
  - Temporary variables
- Temporary variables = new locations
  - Used to store intermediate values
  - Needed because 3-address code is not as expressive as high-level languages

Translating High IR to Low IR

- May have nested language constructs
  - E.g., while nested within an if statement
- Need an algorithmic way to translate
  - Strategy for each high IR construct
  - High IR construct -> sequence of low IR instructions
- Solution
  - Start from the high IR (AST-like) representation
  - Define translation for each node in high IR
  - Recursively translate nodes

Notation

- Use the following notation:
  - [[e]] = the low IR representation of high IR construct e
- [[e]] is a sequence of low IR instructions
- If e is an expression (or statement expression), it represents a value
  - Denoted as: t = [[e]]
  - Low IR representation of e whose result value is stored in t
- For variable v: t = [[v]] is the copy instruction
  - t = v

Translating Expressions

- Binary operations: t = [[e1 OP e2]]
  - (arithmetic, logical operations and comparisons)

    t1 = [[e1]]
    t2 = [[e2]]
    t = t1 OP t2

- Unary operations: t = [[OP e]]

    t1 = [[e]]
    t = OP t1
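The two rules above, together with the copy rule for variables, can be sketched as one recursive function. This is a minimal Python sketch under assumed conventions: expressions are variable names or tuples `(op, e1, e2)` / `(op, e1)`, and `fresh_temps` and `translate` are hypothetical helper names.

```python
import itertools


def fresh_temps():
    """Yield an endless supply of temporary names: t1, t2, t3, ..."""
    return (f"t{i}" for i in itertools.count(1))


def translate(e, t, temps):
    """Return the low IR lines for t = [[e]].

    e is a variable name (str), a binary node (op, e1, e2),
    or a unary node (op, e1).
    """
    if isinstance(e, str):                      # t = [[v]] is the copy t = v
        return [f"{t} = {e}"]
    if len(e) == 3:                             # t = [[e1 OP e2]]
        op, e1, e2 = e
        t1, t2 = next(temps), next(temps)
        return (translate(e1, t1, temps)
                + translate(e2, t2, temps)
                + [f"{t} = {t1} {op} {t2}"])
    op, e1 = e                                  # t = [[OP e]]
    t1 = next(temps)
    return translate(e1, t1, temps) + [f"{t} = {op} {t1}"]


code = translate(("*", ("-", "a", "b"), ("+", "c", "d")), "t", fresh_temps())
print("\n".join(code))
```

The sketch allocates a fresh temporary for every subexpression, so the temporary numbering will differ from hand-written translations that reuse names.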

Translating Array Accesses

- Array access: t = [[ v[e] ]]
  - (type of v is array [T] and S = size of T)

    t1 = addr v
    t2 = [[e]]
    t3 = t2 * S
    t4 = t1 + t3
    t = [t4]      /* i.e., load */
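The address arithmetic in this translation is base + index * S. A short Python sketch, with hypothetical helper names `translate_array_load` and `element_address` and a made-up base address, shows both the emitted IR and the computed address:

```python
def translate_array_load(t, v, index_var, elem_size):
    """Low IR for t = v[index_var]; elem_size is S, the size of element type T."""
    return [
        f"t1 = addr {v}",
        f"t2 = {index_var}",
        f"t3 = t2 * {elem_size}",
        "t4 = t1 + t3",
        f"{t} = [t4]",
    ]


def element_address(base, index, elem_size):
    """The value that t4 holds at run time: base + index * S."""
    return base + index * elem_size


ir = translate_array_load("t", "A", "i", 4)
print("\n".join(ir))
# With an assumed base address 0x1000 and 4-byte elements, A[3] is at 0x100c.
print(hex(element_address(0x1000, 3, 4)))
```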

Translating Structure Accesses

- Structure access: t = [[ v.f ]]
  - (v is of type T, S = offset of f in T)

    t1 = addr v
    t2 = t1 + S
    t = [t2]      /* i.e., load */
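The offset S is known at compile time from the structure layout. As a rough illustration, Python's `ctypes` can report field offsets for a C-style layout on the host platform; the structure `G` and the helper `translate_field_load` below are assumptions for this sketch.

```python
import ctypes


class G(ctypes.Structure):
    """Mirror of a C declaration: struct { int e; char f; }"""
    _fields_ = [("e", ctypes.c_int), ("f", ctypes.c_char)]


def translate_field_load(t, v, struct_type, field):
    """Low IR for t = v.field, with S taken from the host C layout."""
    offset = getattr(struct_type, field).offset
    return [f"t1 = addr {v}", f"t2 = t1 + {offset}", f"{t} = [t2]"]


field_ir = translate_field_load("t", "g", G, "f")
print("\n".join(field_ir))
```

Since `f` follows `e` directly, its offset equals the size of an `int` on the host, which is the constant that ends up in the `t2 = t1 + S` instruction.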

Translating Short-Circuit OR

- Short-circuit OR: t = [[e1 SC-OR e2]]
  - e.g., || operator in C/C++

    t = [[e1]]
    cjump t Lend
    t = [[e2]]
    Lend:

- Semantics:
  1. Evaluate e1
  2. If e1 is true, then done
  3. Else evaluate e2
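A minimal Python sketch of this rule follows. Each operand is modeled as a function that emits IR storing its value in a given temporary; `var` and `sc_or` are hypothetical names, and the caller must supply distinct labels when OR expressions are nested.

```python
def var(name):
    """Emitter for t = [[v]]: a single copy instruction."""
    return lambda t: [f"{t} = {name}"]


def sc_or(e1, e2, label="Lend"):
    """Emitter for t = [[e1 SC-OR e2]]; e2 is skipped when e1 is true."""
    def emit(t):
        return e1(t) + [f"cjump {t} {label}"] + e2(t) + [f"{label}:"]
    return emit


or_code = sc_or(var("a"), var("b"))("t")
print("\n".join(or_code))
```

Both operands write into the same temporary `t`, so whichever one was evaluated last leaves the result in place at the label.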

Class Problem

- Short-circuit AND: t = [[e1 SC-AND e2]]
  - e.g., && operator in C/C++
- Semantics:
  1. Evaluate e1
  2. If e1 is true, then evaluate e2
  3. Else done

Translating Statements

- Statement sequence: [[s1; s2; ...; sN]]
- IR instructions of a statement sequence = concatenation of IR instructions of statements

    [[ s1 ]]
    [[ s2 ]]
    ...
    [[ sN ]]

Assignment Statements

- Variable assignment: [[ v = e ]]

    v = [[ e ]]

- Array assignment: [[ v[e1] = e2 ]]

    t1 = addr v
    t2 = [[e1]]
    t3 = t2 * S
    t4 = t1 + t3
    t5 = [[e2]]
    [t4] = t5     /* i.e., store */

  Recall S = sizeof(T), where v is array(T).

Translating If-Then [-Else]

- [[ if (e) then s ]]

    t1 = [[ e ]]
    t2 = NOT t1
    cjump t2 Lend
    [[ s ]]
    Lend:

- [[ if (e) then s1 else s2 ]]

    t1 = [[ e ]]
    t2 = NOT t1
    cjump t2 Lelse
    Lthen:
    [[ s1 ]]
    jump Lend
    Lelse:
    [[ s2 ]]
    Lend:

- How could I do this more efficiently?
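Both templates can be produced by one small function. This is a Python sketch under simplifying assumptions: the condition is already held in a variable, the branch bodies are given as lists of IR lines, and `translate_if` with its `suffix` parameter (used to keep labels unique) is a hypothetical helper.

```python
def translate_if(cond, then_code, else_code=None, suffix=""):
    """Low IR for 'if (cond) then ... [else ...]' following the templates above."""
    l_then, l_else, l_end = f"Lthen{suffix}", f"Lelse{suffix}", f"Lend{suffix}"
    code = [f"t1 = {cond}", "t2 = NOT t1"]
    if else_code is None:
        # if-then: skip the body when the condition is false
        return code + [f"cjump t2 {l_end}"] + then_code + [f"{l_end}:"]
    # if-then-else: jump to the else part when the condition is false
    return (code
            + [f"cjump t2 {l_else}", f"{l_then}:"]
            + then_code
            + [f"jump {l_end}", f"{l_else}:"]
            + else_code
            + [f"{l_end}:"])


if_then = translate_if("c", ["a = b"])
if_else = translate_if("c", ["x = 2"], ["x = 3"])
print("\n".join(if_else))
```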

While Statements

- [[ while (e) s ]]

  While-do translation:

    Lloop:
    t1 = [[ e ]]
    t2 = NOT t1
    cjump t2 Lend
    [[ s ]]
    jump Lloop
    Lend:

  or do-while translation:

    t1 = [[ e ]]
    t2 = NOT t1
    cjump t2 Lend
    Lloop:
    [[ s ]]
    t3 = [[ e ]]
    cjump t3 Lloop
    Lend:

- Which is better and why?
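One way to compare the two forms is to count the jump instructions in the part of the code that repeats on every iteration. The Python sketch below emits both translations and counts them; `while_do`, `do_while`, and `branches_per_iteration` are hypothetical helper names, and the condition is written inline as a comparison for brevity.

```python
def while_do(cond, body):
    """While-do translation: test at the top, unconditional jump at the bottom."""
    return (["Lloop:", f"t1 = {cond}", "t2 = NOT t1", "cjump t2 Lend"]
            + body + ["jump Lloop", "Lend:"])


def do_while(cond, body):
    """Do-while translation: one guard test up front, then test at the bottom."""
    return ([f"t1 = {cond}", "t2 = NOT t1", "cjump t2 Lend", "Lloop:"]
            + body + [f"t3 = {cond}", "cjump t3 Lloop", "Lend:"])


def branches_per_iteration(code):
    """Count jump/cjump instructions between Lloop: and Lend: (the repeated part)."""
    start, end = code.index("Lloop:"), code.index("Lend:")
    return sum(line.split()[0] in ("jump", "cjump") for line in code[start:end])


body = ["x = x + 1"]
print(branches_per_iteration(while_do("x < k", body)))
print(branches_per_iteration(do_while("x < k", body)))
```

The count covers only static instructions inside the loop region; it ignores code size, where the do-while form pays for evaluating the condition twice.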

Class Problem

Convert the following code segment to IR:

    n = 0;
    while (n < 10) {
        n = n+1;
    }

Switch Statements

- [[ switch (e) case v1:s1, ..., case vN:sN ]]

    t = [[ e ]]
    L1:
    c = t != v1
    cjump c L2
    [[ s1 ]]
    jump Lend     /* if there is a break */
    L2:
    c = t != v2
    cjump c L3
    [[ s2 ]]
    jump Lend     /* if there is a break */
    ...
    Lend:

- Can also implement switch as a table lookup
  - Table contains target labels, i.e., L1, L2, L3
  - 't' is used to index the table
  - Benefit: k branches reduced to 1
  - Negative: target of the branch is hard to figure out in hardware
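The table-lookup alternative can be sketched in a few lines of Python. The sketch assumes integer case values in a reasonably dense range; `build_jump_table` and `dispatch` are hypothetical names, and values with no case fall through to `Lend`.

```python
def build_jump_table(case_values, default_label="Lend"):
    """Map a range of integer case values to labels L1..LN; gaps get the default."""
    lo, hi = min(case_values), max(case_values)
    table = [default_label] * (hi - lo + 1)
    for i, v in enumerate(case_values, start=1):
        table[v - lo] = f"L{i}"
    return lo, table


def dispatch(t, lo, table, default_label="Lend"):
    """One bounds check plus one indexed lookup replaces the chain of compares."""
    index = t - lo
    if 0 <= index < len(table):
        return table[index]
    return default_label


lo, table = build_jump_table([1, 2, 4])
print(table)
print(dispatch(4, lo, table))
```

The lookup cost does not grow with the number of cases, but the table grows with the spread between the smallest and largest case value, so sparse case sets are usually better served by the compare chain.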

Call and Return Statements

- [[ call f(e1, e2, ..., eN) ]]

    t1 = [[ e1 ]]
    t2 = [[ e2 ]]
    ...
    tN = [[ eN ]]
    call f(t1, t2, ..., tN)

- [[ return e ]]

    t = [[ e ]]
    return t
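A minimal Python sketch of these two templates, assuming each argument is a plain variable so that [[ ei ]] is a single copy; `translate_call` and `translate_return` are hypothetical helper names.

```python
def translate_call(f, args, result=None):
    """Low IR for call f(args) or, when result is given, result = call f(args)."""
    code, temps = [], []
    for i, arg in enumerate(args, start=1):
        temps.append(f"t{i}")
        code.append(f"t{i} = {arg}")         # ti = [[ ei ]]
    call = f"call {f}({', '.join(temps)})"
    code.append(f"{result} = {call}" if result else call)
    return code


def translate_return(e):
    """Low IR for return e."""
    return [f"t = {e}", "return t"]


call_code = translate_call("f", ["x", "y"], result="r")
print("\n".join(call_code))
```

Arguments are evaluated left to right into temporaries before the call instruction, which matches the order in the template above.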

Statement Expressions

- So far: statements which do not return values
- Easy extensions for statement expressions:
  - Block statements
  - If-then-else
  - Assignment statements
- t = [[ s ]] is the sequence of low IR code for statement s, whose result is stored in t

Statement Expressions

- t = [[ if (e) then s1 else s2 ]]

    t1 = [[ e ]]
    cjump t1 Lthen
    t = [[ s2 ]]
    jump Lend
    Lthen:
    t = [[ s1 ]]
    Lend:

- t = [[ s1; s2; ...; sN ]]

    [[ s1 ]]
    [[ s2 ]]
    ...
    t = [[ sN ]]

- Result value of a block statement = value of the last statement in the sequence

Assignment Statements

- t = [[ v = e ]]

    v = [[ e ]]
    t = v

- Result value of an assignment statement = value of the assigned expression

Nested Expressions

- Translation recurses on the expression structure
- Example: t = [[ (a - b) * (c + d) ]]

    t1 = a
    t2 = b
    t3 = t1 - t2      /* [[ (a - b) ]] */
    t4 = c
    t5 = d
    t5 = t4 + t5      /* [[ (c + d) ]] */
    t = t3 * t5       /* [[ (a - b) * (c + d) ]] */

Nested Statements

- Same for statements: recursive translation
- Example: t = [[ if c then if d then a = b ]]

    t1 = c            /* [[ if c ... ]] */
    t2 = NOT t1
    cjump t2 Lend1
    t3 = d            /* [[ if d ... ]] */
    t4 = NOT t3
    cjump t4 Lend2
    t3 = b            /* [[ a = b ]] */
    a = t3
    Lend2:
    Lend1:

Class Problem

Translate the following to the generic assembly code discussed:

    for (i=0; i<100; i++) {
        A[i] = 0;
    }

    if ((a > 0) && (b > 0))
        c = 2;
    else
        c = 3;

Issues

- These translations are straightforward
- But inefficient:
  - Lots of temporaries
  - Lots of labels
  - Lots of instructions
- Can we do this more intelligently?
  - Should we worry about it?

Classes of Storage in Processor

- Registers
  - Fast access, but only a few of them
  - Address space not visible to the programmer
    - Doesn't support pointer access!
- Memory
  - Slow access, but large
  - Supports pointers
- Storage class for each variable is generally determined when mapping HIR to LIR

Storage Class Selection

- Standard (simple) approach
  - Globals/statics: memory
  - Locals
    - Composite types (structs, arrays, etc.): memory
    - Scalars
      - Accessed via the '&' operator? Memory
      - Rest: virtual register. Later we will map virtual registers to true machine registers. Note that, as a result, some local scalars may be "spilled to memory"
- All-memory approach
  - Put all variables into memory
  - Register allocation relocates some memory variables to registers
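The standard approach is a short decision procedure, sketched here in Python. The function name `storage_class`, its boolean parameters, and the sample variables are assumptions made for illustration.

```python
def storage_class(is_global, is_composite, address_taken):
    """Apply the standard (simple) rules and return the chosen storage class."""
    if is_global:
        return "memory"            # globals/statics
    if is_composite:
        return "memory"            # local structs, arrays, etc.
    if address_taken:
        return "memory"            # local scalar accessed via '&'
    return "virtual register"      # remaining local scalars; may be spilled later


# Hypothetical variables: (is_global, is_composite, address_taken)
examples = {
    "counter (global int)": (True, False, False),
    "buf (local array)": (False, True, False),
    "x (local int, &x taken)": (False, False, True),
    "y (local int)": (False, False, False),
}
for name, flags in examples.items():
    print(name, "->", storage_class(*flags))
```

The result for a local scalar is only a first assignment: register allocation may still spill a virtual register to memory when the machine runs out of registers.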

Distinct Regions of Memory

- Code space: instructions to be executed
  - Best if read-only
- Static (or Global): variables that retain their value over the lifetime of the program
- Stack: variables that live only as long as the block within which they are defined (local)
- Heap: variables that are defined by calls to the system storage allocator (malloc, new)

Memory Organization

- Code and static data sizes are determined by the compiler
- Stack and heap sizes vary at run-time
- Stack grows downward
- Heap grows upward
- Some ABIs have the stack and heap switched

[Figure: address space regions: Code, Static Data, Stack, Heap]

Class Problem

Specify whether each variable is stored in a register or in memory. For memory, which area of the memory?

    int a;
    void foo(int b, double c)
    {
        int d;
        struct { int e; char f; } g;
        int h[10];
        char i = 5;
        float j;
    }