Yu-Chen Kuo1 Chapter 2 A Simple One-Pass Compiler.

Yu-Chen Kuo2 2.1 Overview
A programming language is defined by:
–What its programs look like (syntax: context-free grammars)
–What its programs mean (semantics: more difficult to specify)

Yu-Chen Kuo3 2.2 Syntax Definition
A context-free grammar describes the hierarchical structure of a language.
–Production: stmt → if (expr) stmt else stmt
–Tokens: if, (, ), else
–Nonterminals: expr, stmt

Yu-Chen Kuo4 Context-free Grammar
1. A set of tokens (terminals): digits, signs (+, -, <, =), if, while
2. A set of nonterminals
3. A set of productions of the form nonterminal → sequence of terminals and/or nonterminals (left side → right side)
4. One nonterminal designated as the start symbol (by convention, the nonterminal of the first production)

Yu-Chen Kuo5 Example 2.1: Grammar of the expression ‘9-5+2’
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Equivalently: list → list + digit | list - digit | digit
Nonterminals: list (start symbol), digit
Terminals (tokens): +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Yu-Chen Kuo6 Example 2.1: Grammar of the expression ‘9-5+2’
–Token strings are derived by starting from the start symbol and repeatedly replacing a nonterminal by the right side of one of its productions
–Empty string: ε
–All token strings that can be derived this way form the language defined by the grammar

Yu-Chen Kuo7 Example 2.2: Parse Tree
A parse tree shows how the start symbol derives a string.
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Yu-Chen Kuo8 Parse Trees
(Figure: an interior node A with children X, Y, Z)
1. The root is labeled by the start symbol
2. Each leaf is labeled by a token or by ε
3. Each interior node is labeled by a nonterminal
4. If A is the nonterminal labeling an interior node and X1, X2, ..., Xn are the labels of its children from left to right, then A → X1 X2 ... Xn is a production

Yu-Chen Kuo9 Example 2.3: Pascal begin-end blocks
block → begin opt_stmts end
opt_stmts → stmt_list | ε
stmt_list → stmt_list ; stmt | stmt
stmt → if (expr) stmt else stmt | assignment stmt

Yu-Chen Kuo10 Ambiguity of A Grammar A grammar is said to be ambiguous if some string it generates has more than one parse tree.

Yu-Chen Kuo11 Ambiguity of A Grammar
string → string + string | string - string
string → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
With this grammar, 9-5+2 has two parse trees, corresponding to the two expressions (9-5)+2 and 9-(5+2).

Yu-Chen Kuo12 Associativity of Operators
Left associative: 9+5-2 is read as (9+5)-2
–+, -, *, /
–Parse tree grows down towards the left
Right associative: a=b=c is read as a=(b=c)
–Parse tree grows down towards the right

Yu-Chen Kuo13 Associativity of Operators
right → letter = right | letter
letter → a | b | c | … | z

Yu-Chen Kuo14 Precedence of Operators
9+5*2 is read as 9+(5*2)
–*, / have higher precedence than +, -
–*, /, +, - are all left associative
term for *, /
–term → term * factor | term / factor | factor
expr for +, -
–expr → expr + term | expr - term | term
factor → digit | (expr)

Yu-Chen Kuo15 Precedence of Operators
Syntax of expressions:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | (expr)
Syntax of statements for Pascal (ambiguous?):
stmt → id := expr
 | if expr then stmt
 | if expr then stmt else stmt
 | while expr do stmt
 | begin opt_stmts end

Yu-Chen Kuo16 Syntax-Directed Translation
–Syntax-directed definitions and translation schemes are two formalisms for specifying translations for programming languages
–A syntax-directed definition uses a context-free grammar to specify the syntactic structure of the input
–With each grammar symbol X it associates a set of attributes, and with each production a set of semantic rules for computing the values of the attributes X.a of the symbols in that production
–The grammar and the set of semantic rules together constitute the syntax-directed definition

Yu-Chen Kuo17 Syntax-Directed Translation A syntax-directed definition for translating expressions consisting of digits separated by plus or minus signs into postfix notation

Yu-Chen Kuo18 Postfix Notation
1. If E is a variable or constant, then postfix(E) = E
2. If E is an expression of the form E1 op E2, then postfix(E) = E1' E2' op, where E1' = postfix(E1) and E2' = postfix(E2)
3. If E is an expression of the form (E1), then postfix(E) = postfix(E1)
Example: postfix(9-5+2) = 95-2+
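
The three rules above can be sketched as a recursive function over an expression tree. This is only an illustration: the `Expr` struct and the function name `postfix` are assumptions made for this sketch, not code from the slides. Rule 3 needs no code here because parentheses are already reflected in the shape of the tree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical expression node (an assumption for this sketch): a leaf
   holds an operand character, an interior node holds an operator and
   its two subexpressions. */
typedef struct Expr {
    char symbol;               /* operand (leaf) or operator (interior) */
    const struct Expr *left;   /* NULL for a leaf */
    const struct Expr *right;  /* NULL for a leaf */
} Expr;

/* Append postfix(e) to out. out must already hold a NUL-terminated
   string and have enough room for the result. */
static void postfix(const Expr *e, char *out)
{
    size_t n;

    if (e->left != NULL && e->right != NULL) {
        postfix(e->left, out);    /* rule 2: E1' first */
        postfix(e->right, out);   /* rule 2: then E2' */
    }
    n = strlen(out);
    out[n] = e->symbol;           /* rule 1 for a leaf; op last for rule 2 */
    out[n + 1] = '\0';
}
```

For the tree of (9-5)+2 this appends 95-2+ to the buffer; for the tree of 9-(5+2) it appends 952+-.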

Yu-Chen Kuo19 Postfix Notation

Yu-Chen Kuo20 Robot’s position

Yu-Chen Kuo21 Robot’s position

Yu-Chen Kuo22 Robot’s position

Yu-Chen Kuo23 Depth-First Traversals

Yu-Chen Kuo24 Translation Schemes
–A translation scheme is a context-free grammar in which semantic actions are embedded within the right sides of productions
–A translation scheme is like a syntax-directed definition, except that the order of evaluation of the semantic rules is shown explicitly

Yu-Chen Kuo25 Translation Schemes

Yu-Chen Kuo26 Parsing
–Parsing is the process of determining whether a string of tokens can be generated by a grammar
–For any context-free grammar there is a parser that takes at most O(n^3) time to parse a string of n tokens, which is too expensive in practice
–For a programming language, we can generally construct a grammar that can be parsed in linear time (a single left-to-right scan, looking ahead one token at a time)

Yu-Chen Kuo27 Parsing
–Top-down parser: parse-tree construction starts at the root and proceeds towards the leaves
–Bottom-up parser: parse-tree construction starts at the leaves and proceeds towards the root (handles a larger class of grammars)

Yu-Chen Kuo28 Top-Down Parsing
The parse tree is constructed by starting with the root, labeled with the start nonterminal, and repeatedly performing the following two steps:
1. At node n, labeled with nonterminal A, select one of the productions for A and construct children at n for the symbols on the right side of that production.
2. Find the next node at which a subtree is to be constructed.

Yu-Chen Kuo29 Example
type → simple | ↑id | array [simple] of type
simple → integer | char | num dotdot num
e.g., array [ num dotdot num ] of integer

Yu-Chen Kuo30 Example (Cont.)
type → simple | ↑id | array [simple] of type
simple → integer | char | num dotdot num
e.g., array [ num dotdot num ] of integer

Yu-Chen Kuo31 Example (Cont.)
type → simple | ↑id | array [simple] of type
simple → integer | char | num dotdot num

Yu-Chen Kuo32 Example (Cont.)
type → simple | ↑id | array [simple] of type
simple → integer | char | num dotdot num

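Yu-Chen Kuo33 Predictive Parsing Recursive-descent parsing is a top-down parsing method. Predictive parsing is the special form in which the lookahead symbol unambiguously determines the procedure to call for each nonterminal.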

Yu-Chen Kuo34 Predictive Parsing (Cont.)
type → simple | ↑id | array [simple] of type
simple → integer | char | num dotdot num

Yu-Chen Kuo35 Predictive Parsing (Cont.)
–Use the lookahead symbol and the first symbols (FIRST) of each production's right side to determine unambiguously which procedure to select for each nonterminal
–FIRST(α): the set of tokens that appear as the first symbols of one or more strings generated from α
 –FIRST(simple) = { integer, char, num }
 –FIRST(↑id) = { ↑ }
 –FIRST(array [simple] of type) = { array }
–If A → α | β, then predictive parsing requires FIRST(α) and FIRST(β) to be disjoint
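
A minimal sketch of a predictive parser for the type grammar, in which each procedure chooses its production by testing the lookahead against the FIRST sets listed above. The token names (`T_ARRAY`, `T_ARROW`, ...), the `Parser` struct, and `parse_type` are assumptions introduced for illustration; the slides' own parser code is not preserved in this transcript.

```c
#include <stddef.h>

/* Token codes for the example grammar (illustrative names). */
typedef enum {
    T_ARRAY, T_LBRACKET, T_RBRACKET, T_OF, T_INTEGER, T_CHAR,
    T_NUM, T_DOTDOT, T_ARROW, T_ID, T_END
} Token;

typedef struct {
    const Token *input;  /* token string, terminated by T_END */
    size_t pos;          /* index of the lookahead token */
    int ok;              /* cleared on the first syntax error */
} Parser;

static Token lookahead(const Parser *p) { return p->input[p->pos]; }

/* Consume the lookahead if it is the expected token, else flag an error.
   T_END is never consumed, so pos cannot run past the end of the input. */
static void match(Parser *p, Token expected)
{
    if (p->ok && lookahead(p) == expected) {
        if (expected != T_END) p->pos++;
    } else {
        p->ok = 0;
    }
}

/* simple -> integer | char | num dotdot num */
static void simple(Parser *p)
{
    switch (lookahead(p)) {
    case T_INTEGER: match(p, T_INTEGER); break;
    case T_CHAR:    match(p, T_CHAR);    break;
    case T_NUM:
        match(p, T_NUM); match(p, T_DOTDOT); match(p, T_NUM);
        break;
    default:
        p->ok = 0;
    }
}

/* type -> simple | ^id | array [ simple ] of type */
static void type(Parser *p)
{
    if (!p->ok) return;
    switch (lookahead(p)) {
    case T_INTEGER: case T_CHAR: case T_NUM:   /* FIRST(simple) */
        simple(p);
        break;
    case T_ARROW:                              /* FIRST(^id) */
        match(p, T_ARROW); match(p, T_ID);
        break;
    case T_ARRAY:                /* FIRST(array [ simple ] of type) */
        match(p, T_ARRAY); match(p, T_LBRACKET); simple(p);
        match(p, T_RBRACKET); match(p, T_OF); type(p);
        break;
    default:
        p->ok = 0;
    }
}

/* Returns 1 if the whole token string is a type, 0 otherwise. */
static int parse_type(const Token *tokens)
{
    Parser p = { tokens, 0, 1 };
    type(&p);
    match(&p, T_END);
    return p.ok;
}
```

Because the three FIRST sets are disjoint, one token of lookahead is enough for `type` to pick its production, and no backtracking is needed.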

Yu-Chen Kuo36 When to Use ε-Productions
stmt → begin opt_stmts end
opt_stmts → stmt_list | ε
While parsing opt_stmts, if the lookahead symbol is not in FIRST(stmt_list), the ε-production is used. This choice is correct if the lookahead symbol is end; any other lookahead symbol is an error.

Yu-Chen Kuo37 Designing a Predictive Parser
A predictive parser consists of a procedure for every nonterminal. Each procedure does two things:
1. It decides which production to use by looking at the lookahead symbol. The production with right side α is used if the lookahead symbol is in FIRST(α). If the lookahead symbol is not in the FIRST set of any other right side, a production with ε on the right side is used.
2. It uses the production by mimicking its right side. A nonterminal results in a call to the procedure for that nonterminal. A token matching the lookahead symbol results in reading the next input token.

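Yu-Chen Kuo38 Eliminating Left Recursion
expr → expr + term | term
–A procedure expr( ) written for this left-recursive production calls itself before consuming any input, so it loops forever
In general, A → Aα | β is rewritten as
A → βR
R → αR | ε
Applied to expr → expr + term | term:
expr → term rest
rest → + term rest | ε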

Yu-Chen Kuo39 Eliminating Left Recursion (Cont.)

Yu-Chen Kuo40 A Translator for Simple Expressions

Yu-Chen Kuo41 Adapting the Translation Scheme
Eliminate left recursion: A → Aα | Aβ | γ is rewritten as
A → γR
R → αR | βR | ε
Applied to expr → expr + term {print(‘+’)} and its companions:
expr → term rest
rest → + term {print(‘+’)} rest | - term {print(‘-’)} rest | ε
term → 0 {print(‘0’)}
…
term → 9 {print(‘9’)}
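
The adapted scheme maps almost directly onto one procedure per nonterminal. The sketch below is an illustration under stated assumptions rather than the slides' program: it reads from a string and writes into a caller-supplied buffer instead of using getchar/putchar, and the names `Translator`, `emit`, and `infix_to_postfix` are invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct {
    const char *in;   /* remaining input; *in is the lookahead */
    char *out;        /* next free position in the output buffer */
    char *end;        /* position reserved for the terminating NUL */
    int ok;           /* cleared on a syntax error or full buffer */
} Translator;

/* Plays the role of print() in the translation scheme. */
static void emit(Translator *t, char c)
{
    if (t->out < t->end) *t->out++ = c;
    else t->ok = 0;
}

static void match(Translator *t, char c)
{
    if (t->ok && *t->in == c) t->in++;
    else t->ok = 0;
}

/* term -> 0 {print('0')} | ... | 9 {print('9')} */
static void term(Translator *t)
{
    if (t->ok && isdigit((unsigned char)*t->in)) {
        char d = *t->in;
        match(t, d);
        emit(t, d);
    } else {
        t->ok = 0;
    }
}

/* rest -> + term {print('+')} rest | - term {print('-')} rest | epsilon */
static void rest(Translator *t)
{
    if (t->ok && (*t->in == '+' || *t->in == '-')) {
        char op = *t->in;
        match(t, op);
        term(t);
        emit(t, op);   /* the action sits after term, so op comes out last */
        rest(t);
    }
    /* otherwise the epsilon-production applies: consume nothing */
}

/* expr -> term rest */
static void expr(Translator *t)
{
    term(t);
    rest(t);
}

/* Returns 1 on success, 0 on a syntax error or if out is too small. */
static int infix_to_postfix(const char *infix, char *out, size_t size)
{
    Translator t;
    if (size == 0) return 0;
    t.in = infix; t.out = out; t.end = out + size - 1; t.ok = 1;
    expr(&t);
    if (*t.in != '\0') t.ok = 0;   /* leftover input is an error */
    *t.out = '\0';
    return t.ok;
}
```

With the input 9-5+2 the buffer ends up holding 95-2+, matching the postfix example given earlier in the slides.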

Yu-Chen Kuo42 Adapting the Translation Scheme (Cont.)

Yu-Chen Kuo43 Procedures for the Nonterminals expr, term, and rest

Yu-Chen Kuo44 Optimizing the Translator
Replacing tail recursion by iteration:
    void rest(void)
    {
    L:  if (lookahead == '+') {
            match('+'); term(); putchar('+'); goto L;
        }
        else if (lookahead == '-') {
            match('-'); term(); putchar('-'); goto L;
        }
        else ;   /* epsilon-production: do nothing */
    }

Yu-Chen Kuo45 Optimizing the Translator (Cont.)

Yu-Chen Kuo46 The Complete Program

Yu-Chen Kuo47 The Complete Program (Cont.)

Yu-Chen Kuo48 The Complete Program (Cont.)

Yu-Chen Kuo49 Lexical Analysis
Removal of white space and comments
–Blanks, tabs, newlines
Constants
–Adding a production to the grammar for expressions
–Creating a token num for constants
Recognizing identifiers and keywords
–Keywords are reserved
–begin /* keyword */ count = count + increment; /* id = id + id */ end

Yu-Chen Kuo50 Interface to the Lexical Analyzer A lexical analyzer reads characters, groups them into lexemes, and passes the tokens formed by the lexemes, together with their attribute values, to the later stages of the compiler.

Yu-Chen Kuo51 Interface to the Lexical Analyzer
In some situations, the lexical analyzer has to read some characters ahead before it can decide on the token to return to the parser.
–Deciding between ‘>’ and ‘>=’
–Pushing the extra character back if needed
–Using an input buffer and a pointer that keeps track of the next character
The lexical analyzer produces tokens and the parser consumes them. Usually, the parser calls the lexical analyzer to return tokens on demand.
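
The read-ahead and push-back idea can be sketched with an input buffer and an index to the next character. Everything named here (`Input`, `next_char`, `push_back`, `relop_token`, and the token codes) is an assumption for illustration only.

```c
/* Token codes above 255 leave room for single characters to stand for
   themselves (the specific numbers are illustrative). */
enum { TOK_GT = 300, TOK_GE = 301, TOK_EOF = 302 };

typedef struct {
    const char *buf;  /* NUL-terminated input buffer */
    int pos;          /* index of the next character to read */
} Input;

/* Returns the next character, or -1 at end of input (without advancing). */
static int next_char(Input *in)
{
    char c = in->buf[in->pos];
    if (c == '\0') return -1;
    in->pos++;
    return (unsigned char)c;
}

/* Undo the last next_char call; nothing was consumed at end of input. */
static void push_back(Input *in, int c)
{
    if (c != -1) in->pos--;
}

/* Returns TOK_GE for ">=", TOK_GT for ">", TOK_EOF at end of input,
   and the character itself for anything else. */
static int relop_token(Input *in)
{
    int c = next_char(in);
    if (c == -1) return TOK_EOF;
    if (c == '>') {
        int d = next_char(in);       /* read one character ahead */
        if (d == '=') return TOK_GE;
        push_back(in, d);            /* d belongs to the next token */
        return TOK_GT;
    }
    return c;
}
```

On the input >=>a successive calls return TOK_GE, TOK_GT, the character a, and then TOK_EOF; the a is read once as lookahead, pushed back, and read again as its own token.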

Yu-Chen Kuo52 A Lexical Analyzer A lexical analyzer allows white space and numbers to appear within expressions.

Yu-Chen Kuo53 A Lexical Analyzer (Cont.)
–If the implementation language does not allow a data structure to be returned from a function, tokens and their attributes have to be passed separately
–Usually, lexan returns an integer encoding of a token
–The integer 256 is used to encode num
–tokenval: the token's attribute value
 –When lexan scans the integer 13, the token num (256) and tokenval (13) are returned to the parser
 –When lexan scans the identifier initial, the token id (259) and tokenval (symbol-table index p) are returned to the parser
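
A minimal sketch of a lexan-style routine that returns the token as an integer and passes the attribute separately through tokenval. NUM = 256 comes from the slide; the `Lexer` struct, the `DONE` code, the use of -1 for "no attribute", and reading from a string rather than standard input are assumptions made for this example. Overflow of large numbers is not handled.

```c
#include <ctype.h>

enum { NUM = 256, DONE = 260 };  /* DONE is an assumed end-of-input code */

typedef struct {
    const char *src;   /* remaining input */
    int tokenval;      /* attribute value of the most recent token */
    int lineno;        /* current line number */
} Lexer;

/* Skips white space, then returns NUM (with tokenval set to the value),
   DONE at end of input, or the character itself for anything else. */
static int lexan(Lexer *lx)
{
    for (;;) {
        unsigned char c = (unsigned char)*lx->src;
        if (c == '\0') return DONE;
        lx->src++;
        if (c == ' ' || c == '\t') continue;         /* strip blanks, tabs */
        if (c == '\n') { lx->lineno++; continue; }   /* count newlines */
        if (isdigit(c)) {
            lx->tokenval = c - '0';
            while (isdigit((unsigned char)*lx->src)) {
                lx->tokenval = lx->tokenval * 10 + (*lx->src - '0');
                lx->src++;
            }
            return NUM;
        }
        lx->tokenval = -1;   /* single-character token: no attribute */
        return c;
    }
}
```

Scanning the text 13 + 7 yields NUM with tokenval 13, then the character +, then NUM with tokenval 7, then DONE.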

Yu-Chen Kuo54 A Lexical Analyzer (Cont.)
Allowing numbers within expressions requires a change in the grammar:
expr → factor
factor → (expr) | num {print(num.value)}

Yu-Chen Kuo55 A Lexical Analyzer (Cont.)

Yu-Chen Kuo56 A Lexical Analyzer (Cont.)

Yu-Chen Kuo57 Incorporating a Symbol Table
–The symbol table is filled in by the analysis phases of the compiler (lexical analysis: identifiers; syntax analysis: types) and used by the synthesis phases (code generation)
–Its primary routines save and retrieve lexemes:
 –insert(s, t): returns the index of a new entry for string s, token t
 –lookup(s): returns the index of the entry for string s, or 0 if s is not found
–The lexical analyzer uses lookup to determine whether there is an entry for a lexeme in the symbol table. If no entry exists, it uses insert to create one.

Yu-Chen Kuo58 Handling Reserved Keywords
Reserved keywords are inserted into the symbol table initially. For example, consider tokens div and mod with lexemes div and mod, respectively. We can initialize the symbol table using the calls
–insert("div", div);
–insert("mod", mod);
Any subsequent call lookup("div") returns the token div, so div cannot be used as an identifier.
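
The insert/lookup interface and the keyword preloading can be sketched with a small fixed-size table. This is a hedged illustration, not the slides' symbol.c: the table layout, the capacity constants, the numeric codes for DIV and MOD, and the helper `token_for` are all assumptions; only ID = 259 and the meaning of insert and lookup come from the slides. Here lookup returns an index, as on the symbol-table slide, and the token is read from the entry.

```c
#include <string.h>

enum { DIV = 257, MOD = 258, ID = 259 };  /* DIV and MOD codes are assumed */

#define SYMMAX 100   /* table capacity (assumed) */
#define LEXMAX 32    /* longest stored lexeme, including the NUL (assumed) */

typedef struct {
    char lexeme[LEXMAX];
    int token;
} Entry;

typedef struct {
    Entry entries[SYMMAX];  /* entry 0 is unused so 0 can mean "not found" */
    int last;               /* index of the last entry in use */
} SymTable;

/* Returns the index of the entry for s, or 0 if s is not found. */
static int lookup(const SymTable *st, const char *s)
{
    int i;
    for (i = st->last; i > 0; i--)
        if (strcmp(st->entries[i].lexeme, s) == 0) return i;
    return 0;
}

/* Returns the index of the new entry for s with token tok,
   or 0 if the table is full or the lexeme is too long. */
static int insert(SymTable *st, const char *s, int tok)
{
    if (st->last + 1 >= SYMMAX || strlen(s) >= LEXMAX) return 0;
    st->last++;
    strcpy(st->entries[st->last].lexeme, s);
    st->entries[st->last].token = tok;
    return st->last;
}

/* Preload the reserved keywords so they can never become identifiers. */
static void init(SymTable *st)
{
    st->last = 0;
    insert(st, "div", DIV);
    insert(st, "mod", MOD);
}

/* What a lexical analyzer would do with a lexeme: look it up, insert it
   as an ID if missing, and return its token (0 if it cannot be stored). */
static int token_for(SymTable *st, const char *lexeme, int *index)
{
    int p = lookup(st, lexeme);
    if (p == 0) p = insert(st, lexeme, ID);
    *index = p;
    return p == 0 ? 0 : st->entries[p].token;
}
```

After `init`, asking for div gives back the DIV token from entry 1, while a new name such as count is inserted as an ID and reuses the same entry on later calls.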

Yu-Chen Kuo59 A Symbol-Table Implementation integer 259 real (type)

Yu-Chen Kuo60 Pseudo-code for a lexical analyzer

Yu-Chen Kuo61 Pseudo-code for a lexical analyzer (Cont.)

Yu-Chen Kuo62 Putting The Techniques Together (infix → postfix translator)
–An infix-to-postfix translator for expressions
–Expressions consist of numbers, identifiers, and the operators +, -, *, /, div, and mod
–id: a sequence of letters and digits beginning with a letter
–num: a sequence of digits
–Tokens are separated by blanks, tabs, and newlines (white space)

Yu-Chen Kuo63 infix → postfix translator (Cont.)

Yu-Chen Kuo64 Modules of the infix-to-postfix translator

Yu-Chen Kuo65 The Lexical Analysis Module lexer.c

Yu-Chen Kuo66 The Parser Module parser.c

Yu-Chen Kuo67 The Emitter Module emitter.c Emit(t, tval) –Output for token t with attribute value tval

Yu-Chen Kuo68 The Symbol-Table Module symbol.c and init.c Implement the symtable data structure and functions –lookup(s) –insert(s, tok)

Yu-Chen Kuo69 The Error Module error.c Error reporting ( printf)