Compilation
With an emphasis on getting the job done quickly
Copyright © 2003-2015 – Curt Hill

Introduction
One of the goals of these courses is to gain the ability to implement simple control languages
–That is what this presentation is about
There are whole courses devoted to compiler theory and the construction of compilers and interpreters
–Just not here

Stages
Compilers typically have several stages
These may be run sequentially or concurrently
These include:
–The lexical analyzer or scanner
–The syntax analyzer or parser
–The code generator
–The optimization routines
In this class there is little concern for the last two

Lexical analyzer or scanner
The front end for a parser
Takes a string of characters and produces a sequence of tagged tokens
The input contains things like comments, white space, and line breaks
–None of those makes any difference to the parser
–Neither does input buffering, along with numerous other details the scanner hides

Why separate scanner and parser?
This simplifies the parser
–Which is inherently more complicated
We can optimize the scanner in different ways than the parser
Separation makes both more modular
The parser is mostly portable
–The scanner may or may not be, since it deals more directly with files

Scanner
Lexical errors – there are just a few:
–Invalid format of a number or identifier name
–Unmatched two-part comments or quoted strings
–A character not in the alphabet
The lexical analyzer might or might not actually do something with the symbol table
–That depends on the format of the token stream

The token stream
A token is usually constructed from a record or class
One item must contain the class of the token
–This usually assigns a number or enumeration value to each reserved word and punctuation mark, and to various things like identifiers and constants
Tokens may also contain supplemental information as needed
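
A minimal sketch of such a token record in C++; the enumerator and field names are illustrative, not anything these slides prescribe:

  #include <string>

  // Illustrative token classes: one enumerator per reserved word and
  // punctuation mark, plus classes for identifiers, constants, and
  // end of input.
  enum class TokenClass {
      ProgramKw, VarKw, BeginKw, EndKw,        // reserved words
      LeftParen, RightParen, Semicolon, Colon, // punctuation
      Comma, Assign,
      Identifier, NumericConstant, EndOfInput
  };

  // One tagged token as the scanner delivers it to the parser.
  struct Token {
      TokenClass cls;      // the class of the token
      std::string lexeme;  // canonical spelling; used for identifiers
      long value = 0;      // used only for numeric constants
      int line = 0;        // source location, for error messages
      int column = 0;
  };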

Supplemental information
A reserved word is defined sufficiently by its assigned number or enumeration value
A numeric constant needs to carry the actual value and possibly its type
An identifier needs the canonical representation
–In languages that are not case sensitive this is usually every character converted to all upper or all lower case
–In case-sensitive languages it is the exact representation of the name
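
A one-function sketch of that canonicalization for a case-insensitive language such as Pascal; the function name canonical is invented for illustration:

  #include <algorithm>
  #include <cctype>
  #include <string>

  // Canonical form of an identifier in a case-insensitive language:
  // fold every character to lower case, so Writeln, WRITELN, and
  // writeln all name the same thing.
  std::string canonical(std::string name) {
      std::transform(name.begin(), name.end(), name.begin(),
                     [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
      return name;
  }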

Supplemental information
Often the location of the token is also passed along so that an error message may be pinned to a usable source location
–Line number and column position
This is not needed for parsing but helps the user determine how to fix the error
The parser merely asks for one token at a time, picking it off the token stream
–The parser sees no lines

Creating a lexical analyzer
A lexical analyzer usually recognizes a Type 3 language
–A regular language
Thus the token language can be described rather simply
–Such as with regular expressions or a Finite State Automaton
These are easy to code by hand, as sketched below
–There are programs that do so as well
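
A sketch of the hand-coded approach, assuming the Token record sketched earlier: a loop that skips white space and then dispatches on the first character, with each branch acting as a state of the automaton. Reserved-word lookup, comments, and punctuation are omitted.

  #include <cctype>
  #include <istream>

  Token nextToken(std::istream& in) {
      int c = in.get();
      while (c != EOF && std::isspace(c))  // white space never
          c = in.get();                    // reaches the parser

      Token t;
      if (c == EOF) {
          t.cls = TokenClass::EndOfInput;
      } else if (std::isalpha(c)) {        // identifier state
          while (c != EOF && std::isalnum(c)) {
              t.lexeme += static_cast<char>(std::tolower(c));
              c = in.get();
          }
          if (c != EOF) in.unget();        // one character of look ahead
          t.cls = TokenClass::Identifier;  // reserved-word lookup omitted
      } else if (std::isdigit(c)) {        // numeric constant state
          while (c != EOF && std::isdigit(c)) {
              t.value = t.value * 10 + (c - '0');
              c = in.get();
          }
          if (c != EOF) in.unget();
          t.cls = TokenClass::NumericConstant;
      }
      // punctuation, operators, and error handling omitted
      return t;
  }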

Relationship
It is possible to have the scanner operate like a preprocessor
–Read in a source code file and write out a file of tokens
Usually it is just a function called by the parser to deliver the next token
There is no reason why it could not be a co-routine as well

Generated Scanners
There are programs that generate scanners based on some type of formal description of the token language
–The most famous of these is lex on UNIX systems
Many types of parser generators either come with a scanner that is easy to modify to your particular grammar or generate one while processing the syntax

Simple Example
Source Code:
  Program demo(input,output); { a comment }
  var x:integer;
  begin
    x := 5;
    writeln('This is x',x)
  End.
Token Stream:
  program  ident demo  (  ident input  ,  ident output  )  ;
  var  ident x  :  integer  ;  begin  ident x  :=  numeric constant 5  ...
Notice that the comment and the white space never reach the token stream

The Parser
Determines whether the source file conforms to the language syntax
–Generates meaningful error messages when it does not
Builds a structure suitable for later stages to operate on
–Mostly the code generator
–This structure is usually a parse tree
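
A minimal sketch of a parse tree node, reusing the Token record from earlier; the field names and the vector of children are assumptions, not anything these slides prescribe:

  #include <memory>
  #include <string>
  #include <vector>

  // An interior node is labeled with a non-terminal; a leaf holds
  // the token the scanner delivered.
  struct ParseNode {
      std::string symbol;                           // non-terminal name
      Token token;                                  // meaningful only at leaves
      std::vector<std::unique_ptr<ParseNode>> kids; // one child per RHS symbol
  };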

Parsers
There are several types of parsers:
–Top down or bottom up
–Recursive descent
–LL
–LR
–Generated or table parsers

Top down
Uses a leftmost derivation
Builds the parse tree in a top-down, left-to-right fashion
Languages that may be parsed this way are termed LL(N)
–The first L specifies a Left-to-right scan of the source code
–The second L specifies that the Leftmost derivation is the one generated
–N is often one; it represents the number of tokens of look ahead needed to decide which rule to use next

Example
Consider a handout for this
–After the ident is either a comma or a right parenthesis
–The parser looks ahead to that item to determine what to do next
–At every fork in the syntax diagrams there is a look ahead set of tokens
–We determine which way to go by checking whether the next symbol is in one set or another
–If we never need more than a single token of look ahead, then 1 is the constant
A sketch of such a fork follows
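
A sketch of that fork in code, for a parameter list where an identifier may be followed by a comma (more parameters) or a right parenthesis (end of the list). The helpers peek, advance, expect, and error are invented for illustration:

  // Assumed helpers: peek() returns the next token without consuming
  // it, advance() consumes it, expect() consumes a token of the given
  // class or complains, error() reports a syntax error.
  Token peek();
  void advance();
  void expect(TokenClass cls);
  void error(const char* msg);

  // After an ident in a parameter list the look ahead set is
  // { comma, right parenthesis }; one token decides the path.
  void parameterTail() {
      if (peek().cls == TokenClass::Comma) {           // more parameters
          advance();
          expect(TokenClass::Identifier);
          parameterTail();
      } else if (peek().cls == TokenClass::RightParen) {
          advance();                                   // end of the list
      } else {
          error("expected ',' or ')' in parameter list");
      }
  }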

Top Down Again
For most programming languages an LL(1) grammar exists
–This requires just a single look ahead token
Let's look through the handout and see how this works

Bottom up
Starts at the leaves and works its way towards the root
Bottom-up parsers usually accept LR(N) languages
Since they start at the bottom of the parse tree they use shift-reduce algorithms
In this presentation we will skip how this actually works

LL and LR
In theory you could parse a programming language in many ways other than LL or LR
–However, doing so makes the running time O(N³), which is not good for something that is run as often as a compiler
–By forcing a Left-to-right scan of the source (the first L) we can get O(N) compiles, which makes everyone much happier

Subsets
LL(1) languages are a subset of the LR(1) languages
–Hence for any LL(1) language there is an LR(1) grammar
There may be an LR(1) language for which there is no LL(1) grammar
There are two other classes, SLR and LALR, which are simplifications of LR

Commonly
The LL, SLR, and LALR parsers have been the dominant ones because the tables needed by a full LR parser could grow exponentially in the worst case
–Hence most table-driven parsers were LL or LALR
However, since that time quite a bit of work has been done and there are now some decent LR table parsers

Recursive descent parsers
Since the grammar is recursive we can make our program follow the grammar
For each production/non-terminal, write a function that processes that non-terminal
–It simply calls another function for each non-terminal on the RHS of the production
We are less interested in these, but a sketch follows
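
A minimal recursive-descent sketch using the same invented helpers, for a hypothetical production declaration -> var ident : ident ; each non-terminal on a larger RHS would get its own function in the same style:

  // One function per production/non-terminal; this one handles the
  // hypothetical production  declaration -> 'var' ident ':' ident ';'
  void declaration() {
      expect(TokenClass::VarKw);       // the reserved word var
      expect(TokenClass::Identifier);  // the variable name
      expect(TokenClass::Colon);
      expect(TokenClass::Identifier);  // the type name
      expect(TokenClass::Semicolon);
  }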

Recursive Descent
We have seen:
–LR(N)
–LL(N)
–LALR(N)
The N determines how many tokens have to be looked at to decide which production is involved
Most programming languages have an LL(1) grammar
–A recursive descent parser can look ahead just one token and then choose the right production

Generated parsers
These parsers are usually LALR(1) or LL(1), but a few are LR(1)
There are a number of these available
–YACC is the UNIX LALR(1) example
These read in some form of a grammar
They generate a series of tables that is used by a parser
The scanner is also generated and then plugged into the parser

How do these work?
The scanner reads in so many tokens
The parser then looks for tokens that fit the pattern of a production
–That is, it looks for what is on the RHS
When it finds the pattern it does a reduction
–A reduction moves right to left across a production
–Replace the RHS with the LHS
A simplified sketch of the reduce step follows
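
A deliberately simplified sketch of that reduce step, leaving out the state tables a real LALR driver consults to decide when to reduce; the Production record and matchesRHS are invented for illustration:

  #include <algorithm>
  #include <string>
  #include <vector>

  // One grammar production: an LHS non-terminal and its RHS symbols.
  struct Production {
      std::string lhs;
      std::vector<std::string> rhs;
  };

  // Does the top of the parse stack spell out the RHS of p?
  bool matchesRHS(const std::vector<std::string>& stack, const Production& p) {
      return stack.size() >= p.rhs.size() &&
             std::equal(p.rhs.begin(), p.rhs.end(),
                        stack.end() - p.rhs.size());
  }

  // The reduction itself: pop the RHS and push the LHS.
  void reduce(std::vector<std::string>& stack, const Production& p) {
      stack.erase(stack.end() - p.rhs.size(), stack.end());
      stack.push_back(p.lhs);  // the semantic routine is called here
  }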

More
When a reduction is done a semantic routine is usually called
This routine may do any of the following:
–Check semantic things
Is a variable defined?
–Update the symbol table
–Generate code
–Do the things not possible with BNF
Eventually we should be able to reduce to the distinguished symbol; then we are done
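
A sketch of one such semantic routine, checking that a variable is defined when a use of an identifier is reduced; treating the symbol table as a map from canonical name to type is an assumption:

  #include <map>
  #include <string>

  // Hypothetical symbol table: canonical identifier name -> type name.
  std::map<std::string, std::string> symbolTable;

  // Called when an identifier-use production is reduced: a check
  // that BNF alone cannot express.
  void onIdentifierUse(const Token& t) {
      if (symbolTable.find(t.lexeme) == symbolTable.end())
          error("undefined variable");  // same invented helper as before
  }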

Finally
Lexical analysis implements a finite state automaton
Parsing implements a push-down automaton
–In recursive descent the stack is the run-time stack of function calls
–In bottom-up parsing the stack contains the tokens and non-terminals seen so far
As software engineers we are usually interested in generated parsers for the ease of construction