CS 404 Introduction to Compiler Design


CS 404 Introduction to Compiler Design
Lecture 10, Ahmed Ezzat
Intermediate Representation: Three Address Code (TAC)

Outline
- What is an intermediate representation (IR), and why do we need one?
- IR classification levels
- Linear IR: TAC
- TAC examples
- Summary

1. Intermediate Representation (IR)
What is an IR? Intermediate code used by the compiler, between the source and target code. Machine dependencies are kept out of the IR as far as possible.
Why?
- Supports multiple front ends and back ends, e.g., adding support for a new machine architecture
- Subdivides and postpones tasks
- Allows machine-independent optimizations to be reused (most important)

1. Intermediate Code Generation
AST: Abstract Syntax Tree

1. Intermediate Code Generation
Back-end code generation: IR → trace formation → instruction selection → register allocation, with optimization passes (*) in between (* marks the phase-ordering problem).

1. IR Motivation
What we have so far: an abstract syntax tree
- With all the program information
- Known to be correct: well-typed, nothing missing, no ambiguities
What we need: something "executable"
- Closer to the actual machine's level of abstraction

1. IR Code
Abstract machine code (intermediate representation) allows machine-independent code generation and optimization.
Diagram: AST → IR → PowerPC / Alpha / x86, with optimization performed at the IR level.

1. Different IR Forms
Low-level IR:
- Like RISC machine instructions
- Uses registers, literals, and simple operations
High-level IR:
- Syntax tree
- Postfix notation
- Three Address Code (TAC)

1. Popular IR Forms
Any representation between the AST and assembly:
- TAC (triples): low level
- Expression tree: high level
Example: a[i] as an expression tree: MEM(BINOP(+, a, BINOP(MUL, i, CONST W)))

1. What Makes a Good IR?
- Easy to translate from the AST
- Easy to translate to assembly
- Narrow interface: a small number of node types (instructions)
- Easy to optimize
- Easy to retarget
Example: AST (>40 node types) → IR (~15 node types) → x86 (>200 opcodes)

2. IR Classification Levels
Goal: get the program closer to machine code without losing the information needed to do useful optimizations. This requires multiple IR stages:
AST → HIR (optimize) → MIR (optimize) → LIR: PowerPC / Alpha / x86

2. IR Classification Levels
High-Level IR (HIR):
- Used early in the process; usually converted to a lower form later on
- Preserves high-level language constructs: structured flow, variables, methods
- Allows high-level optimizations based on properties of the source language (e.g., inlining, reuse of constant variables)
- Example: AST

2. IR Classification Levels
Medium-Level IR (MIR):
- Tries to reflect the range of features of the source language in a language-independent way
- Intermediate between AST and assembly: unstructured jumps, registers, memory locations
- Convenient for translation to high-quality machine code
- Other MIRs: tree IR (easy to generate, allows reasonable instruction selection); quadruples a = b OP c (easy to optimize); stack-machine based (like Java bytecode)

2. IR Classification Levels
Low-Level IR (LIR):
- Assembly code plus extra pseudo-instructions
- Machine dependent; translation to assembly code is trivial
- Allows optimization of code for low-level considerations: scheduling, memory layout

2. IR Classification Levels: Example
High-level:
for i := op1 to op2 step op3
  instructions
endfor

Medium-level:
      i := op1
      if step < 0 goto L2
L1:   if i > op2 goto L3
      instructions
      i := i + step
      goto L1
L2:   if i < op2 goto L3
      instructions
      i := i + step
      goto L2
L3:
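The lowering above can be sketched as a tiny emitter. A minimal sketch under stated assumptions: the names `lower_for` and `new_label` are illustrative (not from the slides), and the loop body is passed in as already-translated lines.

```python
# Hypothetical sketch: lower a high-level "for i := op1 to op2 step op3"
# loop into the medium-level jump code shown above.

_label = 0

def new_label():
    """Return a fresh symbolic label (L1, L2, ...)."""
    global _label
    _label += 1
    return f"L{_label}"

def lower_for(i, op1, op2, step, body):
    """Emit medium-level IR lines for the structured for-loop."""
    l1, l2, l3 = new_label(), new_label(), new_label()
    return [
        f"{i} := {op1}",
        f"if {step} < 0 goto {l2}",          # negative step: use the other test
        f"{l1}: if {i} > {op2} goto {l3}",   # exit when i passes op2
        *body,
        f"{i} := {i} + {step}",
        f"goto {l1}",
        f"{l2}: if {i} < {op2} goto {l3}",   # counting downward
        *body,
        f"{i} := {i} + {step}",
        f"goto {l2}",
        f"{l3}:",
    ]

loop = lower_for("i", "op1", "op2", "step", ["instructions"])
```

Duplicating the body under both labels trades code size for avoiding a test inside the loop; a real compiler would choose based on body size.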

3. Linear IR: Why Use TAC?
- Makes complex expressions simple
- Makes complex flow of control simple
- Easy to rearrange, easy to optimize
- Syntax trees or DAGs can be represented by TAC
- Close to assembly code, so target code is easy to generate

3. Linear IR: Three-Address Code (TAC)
Basic idea: each instruction has the form X = Y op Z, where X, Y, and Z can only be registers, constants, or compiler-generated temporaries (i.e., similar to assembly), and op is an operator.
Example: the AST expression x + y * z is translated to:
t1 = y * z
t2 = x + t1
Each sub-expression has a "home."

3. Linear IR: Common TAC Statements
- Assignment: X := Y + Z
- Assignment: X := 2 + Z
- Assignment: X := -1
- Copy: X := Y
- Jump: goto L (L is a symbolic label; execute the statement labeled L next)

3. Linear IR: Common TAC Statements
- Conditional jump: if X relop Y then goto L
- Function or procedure call:
  Param x
  Param y
  Call f, 2
- Indexed assignment: x := a[10]
- Indexed assignment: a[10] := x

3. Linear IR: Common TAC Statements
- Address assignment: x := &y (x gets the location of y)
- Pointer assignment: x := *y (x gets the object pointed to by y)
- Pointer assignment: *x := y (the object pointed to by x gets the value of y)

3. Linear IR: TAC Design Tradeoffs
IRs need to be rich enough to implement the source language.
- Smaller set of operators: easier to implement, but longer sequences of statements
- Larger set of operators: more difficult to implement, but shorter sequences of statements

3. Linear IR: How to Generate TAC
Use syntax-directed translation (it can be folded into parsing if desired). For a non-terminal E, define the attributes:
- E.place: the name that will hold the value of E
- E.code: the TAC evaluating E
and let newtemp be a function that returns a new temporary variable.
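The scheme above can be sketched as a recursive walk that returns (place, code) for each node. A minimal sketch, assuming an illustrative tuple encoding for expression trees; `gen` and the node shapes are not from the slides.

```python
# Hypothetical sketch of the syntax-directed scheme: each expression
# node yields a "place" (E.place, the name holding its value) and
# "code" (E.code, the TAC computing it).

_temp = 0

def newtemp():
    """Allocate a fresh compiler temporary t1, t2, ..."""
    global _temp
    _temp += 1
    return f"t{_temp}"

def gen(e):
    """Return (place, code) for an expression tree.

    e is either a string (a variable or constant: it is its own place,
    with no code) or a tuple (op, left, right).
    """
    if isinstance(e, str):
        return e, []
    op, left, right = e
    lplace, lcode = gen(left)
    rplace, rcode = gen(right)
    place = newtemp()
    return place, lcode + rcode + [f"{place} = {lplace} {op} {rplace}"]

# x + y * z  ==>  t1 = y * z ; t2 = x + t1
place, code = gen(("+", "x", ("*", "y", "z")))
```

This reproduces the earlier example: the inner product gets its "home" t1 first, then the sum gets t2.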

3. Linear IR: Storing TAC
Store all instructions in a quadruple table. Every instruction has four fields: op, arg1, arg2, result. The label of an instruction is its index in the table.

TAC:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5

Quadruple entries:
      op      arg1  arg2  result
(0)   uminus  c           t1
(1)   mult    b     t1    t2
(2)   uminus  c           t3
(3)   mult    b     t3    t4
(4)   plus    t2    t4    t5
(5)   assign  t5          a

3. Linear IR: Implementation of TAC
Compilers can choose among quadruples, triples, and indirect triples:
- Quadruples: op, arg1, arg2, result
- Triples: avoid temporary names by referring to the location of the statement that computes a value
- Indirect triples: a separate list of pointers to triples
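The three encodings can be shown side by side on the running example a = b * (-c) + b * (-c). A hedged data sketch: the tuple shapes and the `(n,)` convention for "result of instruction n" are illustrative choices, not a fixed format.

```python
# Hypothetical sketch: the same TAC in three encodings.

# Quadruples: explicit temporaries in a result field.
quadruples = [
    ("uminus", "c",  None, "t1"),
    ("mult",   "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("mult",   "b",  "t3", "t4"),
    ("plus",   "t2", "t4", "t5"),
    ("assign", "t5", None, "a"),
]

# Triples: no temporary names; an operand (n,) refers to the result
# of the triple at index n.
triples = [
    ("uminus", "c", None),   # (0)
    ("mult",   "b", (0,)),   # (1)  b * result of (0)
    ("uminus", "c", None),   # (2)
    ("mult",   "b", (2,)),   # (3)
    ("plus",   (1,), (3,)),  # (4)
    ("assign", "a", (4,)),   # (5)
]

# Indirect triples: the statement list is a reorderable array of
# pointers into the triple table, so an optimizer can move statements
# without renumbering every operand reference.
statement_list = [0, 1, 2, 3, 4, 5]
```

This makes the tradeoff on the next slides concrete: quads carry a fourth field per instruction, triples are compact but position-dependent, and indirect triples buy back reorderability with one extra pointer array.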

3. Linear IR: Example: AST → TAC
AST: a = b * (-c) + b * (-c)
IR (TAC):
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5

3. Linear IR: Differences of TAC Implementations
Eventually these TAC instructions are placed in memory, and the next step, generating runnable assembly code, follows.
- Space: quads take the most space, while triples take the least.
- Optimizations: quads and indirect triples are easy to move around; triples are hard to move, since operands refer to statement positions.

3. Linear IR: Summary – The IR Machine
A machine with:
- An infinite number of temporaries (think registers)
- Simple 3-operand instructions
- Branching
- Calls (function calls) with a simple calling convention
- Simple code structure: an array of instructions, with labels to define branch targets

3. Linear IR: Summary – Temporaries
The machine has an infinite number of temporaries:
- Call them t0, t1, t2, ...
- Temporaries can hold values of any type; the type of a temporary is derived from the expression that generates it
- Temporaries go out of scope with each function

3. Linear IR: Summary – Mapping Names to Variables
Variables are names for values:
- Names given by programmers in the input program
- Names given by compilers for storing intermediate results of computation
Reusing temporary variables can save space, but masks context and prevents analyses and optimizations: the result of t1*t2 is no longer available after t1 is reused.
Three-address code for x - 2 * y:

Distinct names for distinct values:
t1 := 2
t2 := y
t3 := t1*t2
t4 := x
t5 := t4-t3

Reusing names for different values:
t1 := 2
t2 := y
t1 := t1*t2
t2 := x
t1 := t2-t1

3. Linear IR: Summary – Mapping Storage to Variables
Variables are placeholders for values:
- Every variable must have a location to store its value: register, stack, heap, or static storage
- Values must be loaded into registers before use
Three-address code for x - 2 * y:

x and y in registers:
t1 := 2
t2 := t1*y
t3 := x-t2

x and y in memory:
t1 := 2
t2 := y
t3 := t1*t2
t4 := x
t5 := t4-t3

void A(int b, int *p) {
  int a, d;
  a = 3;
  d = foo(a);
  *p = b + d;
}
Which variables can be kept in registers? Which variables must be stored in memory?

4. TAC Examples
An arithmetic expression and its translation into TAC instructions:

Expression:
a = b * c + b * d;

TAC representation:
_t1 = b * c;
_t2 = b * d;
_t3 = _t1 + _t2;
a = _t3;

4. TAC Examples
An example of branching and its translation to TAC instructions:

Expression:
if (a < b + c)
  a = a - c;
c = b * c;

TAC representation:
_t1 = b + c;
_t2 = a < _t1;
IfZ _t2 Goto _L0;
_t3 = a - c;
a = _t3;
_L0:
_t4 = b * c;
c = _t4;
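The branching pattern above can be sketched as one glue function: condition code, a conditional skip, the body, then the label. A minimal sketch; `newlabel` and `translate_if` are illustrative names, and the condition and body are assumed already translated.

```python
# Hypothetical sketch of the branching translation: "IfZ t Goto L"
# skips the guarded body when the condition temporary is zero.

_label = 0

def newlabel():
    """Return a fresh label (_L0, _L1, ...), matching the slide's dialect."""
    global _label
    label = f"_L{_label}"
    _label += 1
    return label

def translate_if(cond_code, cond_place, then_code):
    """Glue together: condition TAC, conditional skip, body, label."""
    label = newlabel()
    return (cond_code
            + [f"IfZ {cond_place} Goto {label}"]
            + then_code
            + [f"{label}:"])

# if (a < b + c) a = a - c;
code = translate_if(
    ["_t1 = b + c", "_t2 = a < _t1"],
    "_t2",
    ["_t3 = a - c", "a = _t3"],
)
```

Any code following the if (such as c = b * c above) is simply emitted after the label.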

4. TAC Examples
An example of a function call and its translation to TAC instructions:

Expression:
n = ReadInteger();
Binky(arr[n]);

TAC representation:
_t0 = LCall _ReadInteger;
n = _t0;
_t1 = 4;
_t2 = _t1 * n;
_t3 = arr + _t2;
_t4 = *(_t3);
PushParam _t4;
LCall _Binky;
PopParams 4;
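The call sequence above follows a fixed shape: push parameters, LCall, pop the parameter bytes. A minimal sketch, assuming 4-byte parameters (to match "PopParams 4") and an illustrative `translate_call` name; arguments are assumed already evaluated into places.

```python
# Hypothetical sketch of the call translation used in the TAC above.

WORD = 4  # assumed size of one parameter on the stack

def translate_call(fname, arg_places, result_place=None):
    """Emit TAC: push parameters, LCall, pop the parameter bytes."""
    code = [f"PushParam {p}" for p in arg_places]
    call = f"LCall _{fname}"
    if result_place is not None:
        # A value-returning call lands its result in a temporary.
        call = f"{result_place} = {call}"
    code.append(call)
    if arg_places:
        code.append(f"PopParams {WORD * len(arg_places)}")
    return code

binky = translate_call("Binky", ["_t4"])
read = translate_call("ReadInteger", [], result_place="_t0")
```

The order of pushes (left-to-right here) is a calling-convention choice; what matters is that caller and callee agree.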

5. Summary
- IRs provide the interface between the front and back ends of the compiler
- An IR can take many forms (graphical, linear, hybrid); we focused on the linear TAC model
- An IR should be machine and language independent
- An IR should be amenable to optimization

END