COMPILER DESIGN: Introduction to code generation
Concordia University, Department of Computer Science and Software Engineering


Concordia University, Department of Computer Science and Software Engineering
COMPILER DESIGN: Introduction to code generation; Intermediate representations
Joey Paquet, COMP 442/6421 – Compiler Design

Introduction to code generation

Front end: lexical analysis, syntactic analysis, intermediate code generation.
Back end: intermediate code optimization, object code generation.

The front end is machine-independent: the decisions made in its processing do not depend on the target machine on which the translated program will be executed. A well-designed front end can therefore be reused to build compilers for different target machines. The back end is machine-dependent: its steps are tied to the nature of the assembly or machine language of the target architecture.

Introduction to code generation

After syntactic analysis, we have a number of options to choose from:
- generate object code directly from the parse;
- generate intermediate code, then generate object code from it;
- generate an intermediate abstract representation, then generate object code directly from it;
- generate an intermediate abstract representation, then intermediate code, and then the object code.

All these options have one thing in common: they are all based on the syntactic information gathered during parsing and checked in semantic analysis.

Introduction to code generation

Figure: the compiler pipeline. The front end (lexical analyzer, syntactic analyzer) may feed an intermediate representation and/or intermediate code; from these, different back ends generate object code for different target machines.

Intermediate representations and intermediate code

Intermediate representations synthesize the syntactic information gathered during the parse, generally in the form of a tree or directed graph; they enable high-level code optimization. Intermediate code is a low-level textual representation of the program, directly translatable to object code; it enables low-level, architecture-dependent optimizations.

Intermediate representations

Abstract syntax tree

In a parse tree, each node represents the application of a rule in the grammar. A subtree is created only after the complete parsing of a right-hand side; pointers to subtrees are sent up and grafted as upper subtrees are completed. Parse trees (concrete syntax trees) emphasize the grammatical structure of the program. Abstract syntax trees emphasize the actual computations to be performed: they do not refer to the non-terminals defined in the grammar, hence their name.

Parse tree vs. abstract syntax tree

Figure: for x = a*b+a*b, the parse tree contains E and A non-terminal nodes for every rule applied, while the abstract syntax tree keeps only the =, + and * operators with their operands.
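The contrast above can be sketched in code. Here is a minimal set of AST node classes for x = a*b+a*b; the class names (Var, BinOp, Assign) are illustrative choices, not taken from the course material.

```python
# Minimal AST node classes: only operators and operands are kept,
# with no E/A non-terminal nodes from the grammar.

class Var:
    def __init__(self, name):
        self.name = name

class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

class Assign:
    def __init__(self, target, expr):
        self.target, self.expr = target, expr

# AST for x = a*b + a*b
ast = Assign(Var("x"),
             BinOp("+",
                   BinOp("*", Var("a"), Var("b")),
                   BinOp("*", Var("a"), Var("b"))))
```

Note that the two a*b subexpressions are two distinct subtrees here; merging them is exactly what the DAG representation does.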

Directed acyclic graph

Directed acyclic graphs (DAGs) are a relative of syntax trees: they show the syntactic structure of valid programs in the form of a “tree” in which the nodes for repeated variables and expressions are merged into a single node. DAGs are more complicated to build and use than syntax trees, but they allow the implementation of a variety of code optimization techniques by avoiding redundant operations.

Abstract syntax tree vs. directed acyclic graph

Figure: for x = a*b+a*b, the abstract syntax tree contains two identical subtrees for a*b, while the directed acyclic graph merges them into a single node shared by both operands of +.
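The merging of repeated subexpressions can be sketched with a simple hash-consing table: a node is created only if no node with the same operator and operands exists yet. The function and table names are illustrative.

```python
# Hash-consing sketch: identical (op, operands) triples map to one node id.

nodes = {}  # (op, left_id, right_id) -> node id

def make_node(op, left=None, right=None):
    key = (op, left, right)
    if key not in nodes:           # reuse the existing node if this exact
        nodes[key] = len(nodes)    # operator/operand combination was seen
    return nodes[key]

# x = a*b + a*b
a = make_node("a")
b = make_node("b")
t1 = make_node("*", a, b)
t2 = make_node("*", a, b)   # merged with t1: same operator, same operands
s = make_node("+", t1, t2)

assert t1 == t2             # a*b appears only once in the DAG
```

Because the two a*b operands resolve to the same node, a later code generator can compute the product once and reuse the result.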

Postfix notation

Every expression is rewritten with its operators at the end, e.g.:

a+b                                  →  ab+
a+b*c                                →  abc*+
if A then B else C                   →  ABC?
if A then if B then C else D else E  →  ABCD?E?
x=a*b+a*b                            →  xab*ab*+=

Postfix notation is easy to generate from a bottom-up parse, and can be generated from a syntax tree using a postorder traversal.
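The postorder traversal mentioned above can be sketched in a few lines. Expression trees are represented here as nested tuples, an illustrative encoding rather than anything from the course.

```python
# Postorder traversal of an expression tree yields its postfix form:
# emit the left subtree, then the right subtree, then the operator.

def postfix(node):
    if isinstance(node, str):     # leaf: an operand
        return node
    op, left, right = node        # interior node: (operator, left, right)
    return postfix(left) + postfix(right) + op

# x = a*b + a*b  ->  xab*ab*+=
tree = ("=", "x", ("+", ("*", "a", "b"), ("*", "a", "b")))
print(postfix(tree))   # xab*ab*+=
```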

Postfix notation

Its nature allows it to be naturally evaluated with the use of a stack: operands are pushed onto the stack; operators pop the right number of operands from the stack, perform the operation, then push the result back onto the stack. However, this notation is restricted to simple expressions, such as arithmetic, where every rule conveys an operation; it cannot express the more elaborate constructs of programming languages.
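The stack evaluation scheme just described can be sketched as follows; the environment mapping identifiers to values is an assumption made for the example.

```python
# Stack evaluation of postfix: operands are pushed, operators pop two
# values, apply the operation, and push the result back.

def eval_postfix(tokens, env):
    stack = []
    for tok in tokens:
        if tok == "+":
            r, l = stack.pop(), stack.pop()
            stack.append(l + r)
        elif tok == "*":
            r, l = stack.pop(), stack.pop()
            stack.append(l * r)
        else:                      # operand: push its value
            stack.append(env[tok])
    return stack.pop()

# a+b*c -> abc*+ , with a=2, b=3, c=4
print(eval_postfix(list("abc*+"), {"a": 2, "b": 3, "c": 4}))   # 14
```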

Three-address code

Three-address code (3AC) is an intermediate language that maps directly to “assembly pseudo-code”. It breaks the program into short statements requiring no more than three addresses (hence its name) and no more than one operator per statement, e.g.:

source       3AC
x = a+b*c    t := b*c
             x := a+t
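Generating this 3AC from an expression tree can be sketched as a traversal that allocates a fresh temporary for each inner operator node. The tuple-tree encoding and the new_temp helper are illustrative assumptions.

```python
# 3AC generation sketch: one operator per emitted statement,
# with a fresh temporary for each intermediate result.

code = []
counter = [0]

def new_temp():
    counter[0] += 1
    return f"t{counter[0]}"

def gen_expr(node):
    if isinstance(node, str):
        return node
    op, left, right = node
    t = new_temp()
    code.append(f"{t} := {gen_expr(left)}{op}{gen_expr(right)}")
    return t

def gen_assign(target, expr):
    if isinstance(expr, str):
        code.append(f"{target} := {expr}")
    else:
        op, left, right = expr
        code.append(f"{target} := {gen_expr(left)}{op}{gen_expr(right)}")

gen_assign("x", ("+", "a", ("*", "b", "c")))   # x = a+b*c
print(code)   # ['t1 := b*c', 'x := a+t1']
```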

Three-address code

The temporary variables are generated at compile time and added to the symbol table. In the generated code, the variables will refer to actual memory cells; their addresses (or aliases) are also stored in the symbol table. 3AC can also be represented as quadruples, which are even closer to assembly languages.

3AC         ASM
t := b*c    L  3,b
            M  3,c
            ST 3,t
x := a+t    L  3,a
            A  3,t
            ST 3,x

3AC         Quadruples
t := b*c    MULT t,b,c
x := a+t    ADD  x,a,t

Intermediate languages

In this case, we generate code in a language for which we already have a compiler or interpreter. Such languages are generally very low-level and dedicated to the compiler construction task: they provide the compiler writer with a “virtual machine”. Various compilers can be built using the same virtual machine, and the virtual machine's compiler can itself be compiled on different machines to provide a translator for various architectures. Many contemporary languages, such as Java, Perl, PHP, Python and Ruby, use a similar execution architecture. For the project, we have the Moon compiler, which provides a virtual assembly language and a compiler/interpreter.

Project architectural overview

Project architectural overview

Your compiler generates Moon code, and the Moon interpreter (virtual machine) is used to execute your output program; your compiler is thus retargetable by recompilation of the Moon compiler on your target processor. As we are not using any intermediate representation, Moon code is generated directly as the program is parsed. This puts a lot of diverse responsibilities on the syntax-directed translation phase:
- parsing
- semantic attributes/records migration
- symbol table generation
- semantic verification
- semantic translation

In order to handle these in an orderly manner, a good architectural design is advisable to separate the different responsibilities.

Project architectural overview

Many semantic actions need to be called by the parser in order to do the semantic verification and translation. Hard-coding these actions into the parser would quickly make it extremely complex, confusing, and error-prone. A separate semantic actions module should be created, consisting of a library of functions that the parser calls; this removes the coupling between parsing and semantic verification/translation. Similarly, semantic verification and translation actions can be uncoupled from each other as well as from the parser.
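One way to realize this decoupling, sketched here under assumed names, is a small registry of named action functions: the parser only triggers actions by name, while the checking/translation logic lives in a separate module.

```python
# Decoupling sketch: semantic actions registered by name in a library,
# looked up and called by the parser instead of being hard-coded inline.

actions = {}

def action(name):
    def register(fn):
        actions[name] = fn
        return fn
    return register

@action("varDeclSem")
def var_decl_sem(symtab, name, typ):
    # translation part: record the declaration in the symbol table
    symtab[name] = typ

# Inside the parser, reaching a {varDeclSem} placeholder becomes a lookup:
symtab = {}
actions["varDeclSem"](symtab, "x", "int")
print(symtab)   # {'x': 'int'}
```

The parser never needs to know what varDeclSem does, so checking and translation can evolve independently of the parsing code.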

Semantic actions and code generation

Semantic actions

Semantics is about giving a meaning to the compiled program. Semantic actions have two parts:
- Semantic checking: check whether the compiled program can have a meaning, e.g. variables are declared, and operators and functions are called with the right number and types of parameters.
- Semantic translation: translate declarations, statements and expressions to target code.

Semantic translation is conditional on semantic checking.

Semantic actions

Semantic actions are inserted in the grammar, thus transforming it into an attribute grammar.
- In recursive-descent parsers, they are represented by function calls embedded in the parsing functions.
- In table-driven top-down parsers, they are represented by semantic action placeholders pushed on the stack along with the right-hand sides they belong to; when a placeholder is popped from the stack, its corresponding semantic action is executed.

Most semantic actions use attributes for their resolution, and thus rely on an attribute migration mechanism being in place:
- In recursive-descent parsers, attributes are migrated using reference parameter passing.
- In table-driven top-down parsers, attributes are migrated using a semantic stack.
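For the recursive-descent case, the following sketch shows semantic action calls embedded in parsing functions, with attributes (here, types) migrated as return values, the Python analogue of reference parameter passing. The grammar fragment, the symbol table contents, and all function names are illustrative assumptions.

```python
# Recursive descent for E -> T { '+' T } over single-letter identifiers,
# with embedded semantic actions: a declaration check at each factor and
# a type check at each '+', migrating the type attribute upwards.

symtab = {"a": "int", "b": "int", "c": "float"}   # assumed declarations

def parse_expr(tokens):
    t = parse_term(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        t2 = parse_term(tokens)
        t = check_types(t, t2)       # embedded semantic action call
    return t

def parse_term(tokens):
    ident = tokens.pop(0)
    if ident not in symtab:          # semantic check: is it declared?
        raise NameError(f"undeclared identifier {ident}")
    return symtab[ident]             # migrated attribute: the type

def check_types(t1, t2):
    if t1 != t2:
        raise TypeError(f"type mismatch: {t1} + {t2}")
    return t1

print(parse_expr(list("a+b")))   # int
```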

Semantic actions

There are semantic actions associated with:
- Declarations: variable declarations, type declarations, function declarations.
- Control structures: conditional statements, loop statements, function calls.
- Assignments and expressions: assignment operations, arithmetic and logical expressions.

Processing declarations

In processing declarations, the only semantic checking to do is to ensure that every object (e.g. variable, type, class, function) is declared once and only once in the same scope. This restriction is tested using the symbol table mechanism: symbol table entries are generated as declarations are encountered, and a new symbol table is created every time a scope is entered. Afterwards, every time an identifier is encountered, a check is made in the symbol table to ensure that it has been properly declared in the scope where it appears.
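The scope mechanism above can be sketched as a chain of tables, with lookup walking outward through enclosing scopes. The class and method names are illustrative.

```python
# Scoped symbol table sketch: one table per scope, linked to its parent.

class SymbolTable:
    def __init__(self, parent=None):
        self.entries = {}
        self.parent = parent

    def declare(self, name, info):
        if name in self.entries:            # once and only once per scope
            raise NameError(f"multiple declaration of {name}")
        self.entries[name] = info

    def lookup(self, name):
        scope = self
        while scope is not None:            # search enclosing scopes
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        raise NameError(f"undeclared identifier {name}")

globals_ = SymbolTable()
globals_.declare("x", {"type": "int"})
inner = SymbolTable(parent=globals_)        # entering a new scope
inner.declare("y", {"type": "float"})
print(inner.lookup("x")["type"])   # int : found in the enclosing scope
```

Note that the same name may be declared again in an inner scope; only a second declaration in the same scope is rejected.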

Processing declarations

Code generation for declarations comes in the form of calculating the total memory size to be allocated for the objects defined. Every variable defined, no matter its type, will eventually have to be stored in the computer's memory. Memory allocation must be done according to the size of the variables defined, the data encoding used, and the word length of the computer, which depends on the target machine. For each variable identifier declared, you must generate a unique label that will be used to refer to that variable in the Moon code, and store it in the location field of its entry in the symbol table. See the Moon machine description documentation for more explanations specific to the project.
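A minimal sketch of this size-and-label bookkeeping is given below. The type sizes and the label scheme are assumptions for illustration, not taken from the Moon machine documentation.

```python
# Sketch: per-declaration allocation size and unique label generation.

SIZES = {"int": 4, "float": 8}   # assumed sizes for the target machine

label_count = [0]

def new_label(name):
    label_count[0] += 1
    return f"{name}{label_count[0]}"          # unique label per variable

def declare_var(symtab, name, typ, cardinality=1):
    symtab[name] = {
        "type": typ,
        "size": SIZES[typ] * cardinality,     # arrays: element size * count
        "label": new_label(name),             # location field for Moon code
    }

table = {}
declare_var(table, "x", "int")
declare_var(table, "arr", "float", cardinality=10)
print(table["arr"]["size"])   # 80
```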

Processing variable declarations

… ; {varDeclSem}

An entry is created in the corresponding symbol table. If successful, memory space is reserved for the variable according to the size of its type and linked to a unique label in the ASM code; the starting address (or its label) is stored in the symbol table entry of the variable. In the case of arrays, the offsets (size of the elements) are often stored in the symbol table record, though they can be calculated from the array's type.

To generate each entry (one for each element in a declaration list), the compiler must keep track of the type of the declaration. This type is an attribute that is migrated using a technique appropriate to the parsing method used.

Processing type declarations

Most programming languages allow the definition of user-defined types that are aggregates of the basic types defined in the language. These are typically arrays or record types, or even abstract data types (classes) in object-oriented programming languages.

… type … is … ; {typeDeclSem}

An entry is created in the symbol table for the new type defined. It contains the information needed to determine the size of a variable of the new type. This information is used when new objects of that type are declared in the program, to compute offsets when arrays of elements of that type are created, and when the members of a class are referred to in expressions.

Processing arrays

Static arrays are arrays whose size is defined at compile time. Most programming languages allow only integer literals for the declaration of the array size, or constant integer variables when available in the language.

Pascal:  A: array [1..10] of integer
C:       int A[10];   or   const int size=10; int A[size];

Processing arrays

This restriction comes from the fact that the memory allocated to the array has to be set at compile time, and is fixed throughout the execution of the program. When processing a static array declaration, a sufficient amount of memory is allocated to the variable depending on the size of the elements and the cardinality of the array. Only the starting address (or, in the case of Moon, a label) is stored in the symbol table; the offset (the size of the elements) is also sometimes stored there to facilitate code generation for array indexing. Dynamic arrays are generally implemented using pointers, dynamic memory allocation functions, and an execution stack or heap, which requires a runtime system to execute the programs. For simplicity, we do not have any absolute need for dynamic memory allocation in the project.
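The address computation that the stored offset enables can be sketched in one line: the element address is the base address plus the (bound-adjusted) index times the element size. The integer addresses are illustrative, not real Moon labels.

```python
# Static array indexing sketch: base address + index * element size,
# adjusted for the lower bound of the index range.

def element_address(base, elem_size, index, lower_bound=0):
    return base + (index - lower_bound) * elem_size

# int A[10] at assumed base address 1000, 4-byte elements: A[3] is at 1012
print(element_address(1000, 4, 3))   # 1012

# Pascal-style A: array [1..10], same base: A[3] is at 1008
print(element_address(1000, 4, 3, lower_bound=1))   # 1008
```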

Processing expressions

Semantic records contain the type and location of variables (in our case, labels in the Moon code), or the type and value of constant factors. Semantic records are created at the leaves of the tree when factors (F) are recognized, and then passed upwards in the tree using attribute migration. They contain the attributes that are migrated within the tree to compute a global result for the symbol at the root of the expression.

Processing expressions

As new nodes (or subtrees) are created during tree creation/traversal, intermediate results are stored in semantic records containing subresults for subexpressions. Each time an operator node is resolved, its corresponding semantic checking and translation are done, and its subresult is stored in a temporary variable for which you have to allocate some memory and generate a label. An entry is inserted in the symbol table for each intermediate result generated; it can then be used for further reference when doing semantic verification and translation while going upwards in the tree.

Processing expressions

Doing so, the code is generated sequentially as the tree is traversed. For x = a+b*c:

subtree      ASM
t1 = b*c     L  3,b
             M  3,c
             ST 3,t1
t2 = a+t1    L  3,a
             A  3,t1
             ST 3,t2
x = t2       L  3,t2
             ST 3,x
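This sequential emission can be sketched as a postorder traversal that prints an instruction group as each operator node is resolved. The L/M/A/ST pseudo-instructions follow the slide's notation rather than the actual Moon instruction set, and the tuple-tree encoding is an illustrative assumption.

```python
# Code emission sketch: each resolved operator node appends its
# load/operate/store group, so code comes out in traversal order.

asm = []
temp = [0]

def emit_expr(node):
    if isinstance(node, str):
        return node                       # leaf: variable name
    op, left, right = node
    l, r = emit_expr(left), emit_expr(right)
    temp[0] += 1
    t = f"t{temp[0]}"                     # temporary for this subresult
    opcode = {"+": "A", "*": "M"}[op]
    asm.append(f"L  3,{l}")
    asm.append(f"{opcode}  3,{r}")
    asm.append(f"ST 3,{t}")
    return t

def emit_assign(target, expr):
    result = emit_expr(expr)
    asm.append(f"L  3,{result}")
    asm.append(f"ST 3,{target}")

emit_assign("x", ("+", "a", ("*", "b", "c")))   # x = a+b*c
print("\n".join(asm))
```

Running this reproduces the instruction sequence in the table above: the b*c group first, then the a+t1 group, then the final store to x.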

Conclusions

Most compilers build an intermediate representation of the parsed program, normally an abstract syntax tree, which allows high-level optimizations to occur before the code is generated. In the project, we output Moon code, which is an intermediate language; Moon code could be the subject of low-level optimizations. Semantic actions are composed of a semantic checking part and a semantic translation part; they are inserted at appropriate places in the grammar to achieve the semantic checking and translation phase, and semantic translation is conditional on semantic checking. There are semantic actions for:
- declarations (variables, functions, types, etc.)
- expressions (arithmetic, logic, etc.)
- control structures (loops, conditionals, function calls, etc.)

References

Fischer, Cytron, LeBlanc. Crafting a Compiler. Chapters 7, 8, 9, 10, 11. Addison-Wesley.