Chapter 2: A Simple One Pass Compiler

Presentation transcript:


The Entire Compilation Process
Grammars for Syntax Definition
Syntax-Directed Translation
Parsing - Top Down & Predictive
Pulling Together the Pieces
The Lexical Analysis Process
Symbol Table Considerations
A Brief Look at Code Generation
Concluding Remarks / Looking Ahead

Grammars for Syntax Definition
A context-free grammar (CFG) is used to describe the syntactic structure of a language.
A CFG is characterized by:
1. A set of tokens, or terminal symbols
2. A set of non-terminals
3. A set of production rules, each of the form NT → {T, NT}*
4. A non-terminal designated as the start symbol

Grammars for Syntax Definition: Example CFG
    list → list + digit
    list → list - digit
    list → digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The "|" means OR, so we could have written:
    list → list + digit | list - digit | digit

Grammars are Used to Derive Strings
Using the CFG defined on the previous slide, we can derive the string 9 - 5 + 2 as follows:
    list ⇒ list + digit              (P1: list → list + digit)
         ⇒ list - digit + digit      (P2: list → list - digit)
         ⇒ digit - digit + digit     (P3: list → digit)
         ⇒ 9 - digit + digit         (P4: digit → 9)
         ⇒ 9 - 5 + digit             (P4: digit → 5)
         ⇒ 9 - 5 + 2                 (P4: digit → 2)
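The derivation above can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the names GRAMMAR and derive are illustrative assumptions, and each step rewrites the leftmost occurrence of the named non-terminal.

```python
# Productions of the example grammar, keyed by non-terminal.
GRAMMAR = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[str(d)] for d in range(10)],
}

def derive(start, choices):
    """Apply each (non-terminal, alternative index) choice to the leftmost
    occurrence of that non-terminal and record every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for nonterminal, alt in choices:
        i = form.index(nonterminal)            # leftmost occurrence
        form[i:i + 1] = GRAMMAR[nonterminal][alt]
        steps.append(" ".join(form))
    return steps

# P1, P2, P3, then the three digit productions, as on the slide.
steps = derive("list", [("list", 0), ("list", 1), ("list", 2),
                        ("digit", 9), ("digit", 5), ("digit", 2)])
print(" => ".join(steps))
```

Each entry of steps is one sentential form, so the list reads exactly like the derivation chain on the slide.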

Grammars are Used to Derive Strings
This derivation could also be represented via a parse tree (parents on left, children on right):
    list ⇒ list + digit ⇒ list - digit + digit ⇒ digit - digit + digit ⇒ 9 - digit + digit ⇒ 9 - 5 + digit ⇒ 9 - 5 + 2
[Figure: parse tree for 9 - 5 + 2, with list nodes in the interior and digit nodes above the leaves 9, 5 and 2]

A More Complex Grammar
    block → begin opt_stmts end
    opt_stmts → stmt_list | ε
    stmt_list → stmt_list ; stmt | stmt
What is this grammar for? What does "ε" represent? What kind of production rule is this?

Defining a Parse Tree
More formally, a parse tree for a CFG has the following properties:
    The root is labeled with the start symbol.
    Each leaf node is a token or ε.
    Each interior (non-leaf) node is a non-terminal.
    If A → x1 x2 … xn, then A is an interior node and x1, x2, …, xn are the children of A; each child may be a non-terminal or a token.

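Other Important Concepts: Ambiguity
A grammar is ambiguous when it allows two derivations (parse trees) for the same token string.
Grammar:
    string → string + string | string - string | 0 | 1 | … | 9
[Figure: two parse trees for the token string 9 - 5 + 2, one grouping 5 + 2 first and the other grouping 9 - 5 first]
Why is this a problem?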
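One way to see why ambiguity is a problem is to enumerate every parse tree of 9 - 5 + 2 under the ambiguous grammar and evaluate each one. This is an illustrative sketch only; parse_trees and value are hypothetical helper names, and trees are represented as nested tuples.

```python
def parse_trees(tokens):
    """Return every parse tree of tokens under
    string -> string + string | string - string | digit."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    # Try each operator position as the top-level split.
    for i in range(1, len(tokens) - 1, 2):
        if tokens[i] in ("+", "-"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees

def value(tree):
    """Evaluate a tree; a leaf is a digit string."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    if op == "+":
        return value(left) + value(right)
    return value(left) - value(right)

trees = parse_trees("9 - 5 + 2".split())
for t in trees:
    print(t, "=", value(t))
```

The two trees evaluate to 2 and 6, so the same token string has two different meanings depending on which tree the parser happens to build.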

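Other Important Concepts: Associativity of Operators (Left vs. Right)
Left-associative:
    list → list + digit | list - digit | digit
    digit → 0 | 1 | 2 | … | 9
Right-associative:
    right → letter = right | letter
    letter → a | b | c | … | z
[Figure: parse tree for 9 - 5 + 2 under the list grammar, growing down to the left, and parse tree for a = b = c under the right grammar, growing down to the right]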

Embedding Associativity
The language of arithmetic expressions with + and -:
Ambiguous grammar that does not enforce associativity:
    string → string + string | string - string | 0 | 1 | … | 9
Non-ambiguous grammar enforcing left associativity (the parse tree grows to the left):
    string → string + digit | string - digit | digit
    digit → 0 | 1 | 2 | … | 9
Non-ambiguous grammar enforcing right associativity (the parse tree grows to the right):
    string → digit + string | digit - string | digit

Other Important Concepts: Operator Precedence
What does 9 + 5 * 2 mean?
Typically the precedence order, from highest to lowest, is: ( ), then * and /, then + and -.
This can be incorporated into a grammar via rules:
    expr → expr + term | expr - term | term
    term → term * factor | term / factor | factor
    factor → digit | ( expr )
    digit → 0 | 1 | 2 | 3 | … | 9
Precedence is achieved by:
    a separate non-terminal (expr, term) for each precedence level
    rules for each level that are left recursive, so the operators associate to the left
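As a rough illustration of how one non-terminal per precedence level gives 9 + 5 * 2 its usual meaning, here is a small evaluator with one function per non-terminal. It is a sketch under assumptions: the left-recursive rules are realized as loops (a common substitute, since a recursive procedure cannot follow a left-recursive rule directly), and evaluate is a hypothetical name.

```python
def evaluate(text):
    """Evaluate single-digit arithmetic with the usual precedence."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():                  # factor -> digit | ( expr )
        tok = peek()
        if tok == "(":
            advance()
            result = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return result
        if tok is not None and tok in "0123456789":
            advance()
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():                    # term -> term * factor | term / factor | factor
        result = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            right = factor()
            result = result * right if op == "*" else result / right
        return result

    def expr():                    # expr -> expr + term | expr - term | term
        result = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            right = term()
            result = result + right if op == "+" else result - right
        return result

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(evaluate("9 + 5 * 2"))       # multiplication binds tighter: 19
print(evaluate("(9 + 5) * 2"))     # parentheses override: 28
```

Because term is called from inside expr, every * or / is grouped before the surrounding + or - is applied, and the loops accumulate from left to right, which matches left associativity.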

Syntax-Directed Translation
Associate attributes with grammar rules and constructs, and translate as parsing occurs. The translation follows the parse tree structure, so the structure and form of the parse tree affect the translation.
First example: inductive translation of expressions from infix to postfix notation.
The translation is defined inductively as Postfix(E), where E is an expression. Rules:
1. If E is a variable or constant, then Postfix(E) = E
2. If E is E1 op E2, then Postfix(E) = Postfix(E1 op E2) = Postfix(E1) Postfix(E2) op
3. If E is (E1), then Postfix(E) = Postfix(E1)

Examples
Postfix( ( 9 - 5 ) + 2 )
    = Postfix( ( 9 - 5 ) ) Postfix( 2 ) +
    = Postfix( 9 - 5 ) Postfix( 2 ) +
    = Postfix( 9 ) Postfix( 5 ) - Postfix( 2 ) +
    = 9 5 - 2 +
Postfix( 9 - ( 5 + 2 ) )
    = Postfix( 9 ) Postfix( ( 5 + 2 ) ) -
    = Postfix( 9 ) Postfix( 5 + 2 ) -
    = Postfix( 9 ) Postfix( 5 ) Postfix( 2 ) + -
    = 9 5 2 + -
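The three inductive rules translate almost line for line into a recursive function. In this sketch the expression encoding is an assumption made for illustration: a string is a variable or constant, a 3-tuple (E1, op, E2) is a binary expression, and a 1-tuple (E1,) stands for the parenthesized expression (E1).

```python
def postfix(e):
    """Postfix(E), following the three rules for infix-to-postfix translation."""
    if isinstance(e, str):         # rule 1: variable or constant
        return e
    if len(e) == 1:                # rule 3: (E1)
        return postfix(e[0])
    e1, op, e2 = e                 # rule 2: E1 op E2
    return f"{postfix(e1)} {postfix(e2)} {op}"

# ( 9 - 5 ) + 2
print(postfix(((("9", "-", "5"),), "+", "2")))   # 9 5 - 2 +
# 9 - ( 5 + 2 )
print(postfix(("9", "-", (("5", "+", "2"),))))   # 9 5 2 + -
```

The two calls reproduce the worked examples: the parentheses disappear, and the operand order alone carries the grouping.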

Syntax-Directed Definition
Each production has a set of semantic rules.
Each grammar symbol has a set of attributes.
For the following example, a string attribute "t" is associated with each grammar symbol:
    expr → expr - term | expr + term | term
    term → 0 | 1 | 2 | 3 | … | 9
Recall: a derivation for 9 + 5 - 2?
    expr ⇒ expr - term ⇒ expr + term - term ⇒ term + term - term ⇒ 9 + term - term ⇒ 9 + 5 - term ⇒ 9 + 5 - 2

Syntax-Directed Definition (2)
Each production rule of the CFG has a semantic rule.
    Production              Semantic Rule
    expr → expr + term      expr.t := expr.t || term.t || '+'
    expr → expr - term      expr.t := expr.t || term.t || '-'
    expr → term             expr.t := term.t
    term → 0                term.t := '0'
    term → 1                term.t := '1'
    …                       …
    term → 9                term.t := '9'
Note: the semantic rules for expr define t as a "synthesized attribute", i.e., the various copies of t obtain their values from the "children t's".

Semantic Rules are Embedded in Parse Tree
[Figure: parse tree for 9 - 5 + 2 annotated with attribute values: term.t = 9, term.t = 5, term.t = 2, expr.t = 9, expr.t = 95-, and expr.t = 95-2+ at the root]
How do semantic rules work?
What type of tree traversal is being performed? ("postorder depth-first")
How can we more closely associate semantic rules with production rules?
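The postorder, depth-first evaluation of the synthesized attribute t can be sketched as follows. The Node class, the annotate function and the term helper are illustrative assumptions rather than code from the slides; the tree built at the end is the parse tree for 9 - 5 + 2.

```python
class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol           # grammar symbol or token text
        self.children = list(children)
        self.t = None                  # synthesized string attribute

def annotate(node):
    """Postorder, depth-first: every child's t is computed before the parent's."""
    for child in node.children:
        annotate(child)
    kids = node.children
    if not kids:                       # token leaf: carries no t of its own
        return
    if node.symbol == "term":          # term -> digit : term.t := 'digit'
        node.t = kids[0].symbol
    elif len(kids) == 1:               # expr -> term  : expr.t := term.t
        node.t = kids[0].t
    else:                              # expr -> expr op term
        node.t = kids[0].t + kids[2].t + kids[1].symbol

def term(digit):
    return Node("term", [Node(digit)])

# Parse tree for 9 - 5 + 2 under the left-recursive expr grammar.
inner = Node("expr", [Node("expr", [term("9")]), Node("-"), term("5")])
root = Node("expr", [inner, Node("+"), term("2")])
annotate(root)
print(inner.t, root.t)                 # 95- 95-2+
```

The values printed for the inner and root expr nodes match the annotations expr.t = 95- and expr.t = 95-2+ shown in the figure.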

Translation Schemes
Embed semantic actions into the right sides of the productions.
    expr → expr + term {print('+')}
    expr → expr - term {print('-')}
    expr → term
    term → 0 {print('0')}
    term → 1 {print('1')}
    …
    term → 9 {print('9')}
[Figure: parse tree for 9 - 5 + 2 with the actions {print('9')}, {print('5')}, {print('-')}, {print('2')}, {print('+')} attached as extra leaves]

Parsing - Top-Down & Predictive
Top-down parsing: the parse tree / derivation of a token string is built in a top-down fashion.
For example, consider the following grammar, in which type is the start symbol:
    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num
Suppose the input is: array [ num dotdot num ] of integer
Parsing would begin with type → ???

Top-Down Parse (type = start symbol)
Input: array [ num dotdot num ] of integer
    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num
[Figure: successive snapshots of the parse tree as the lookahead symbol advances through the input: type alone, then type expanded to array [ simple ] of type, then simple expanded to num dotdot num]

Top-Down Parse (type = start symbol), continued
Input: array [ num dotdot num ] of integer
    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num
[Figure: the parse continues: the tree with simple expanded to num dotdot num, then the remaining type expanded down to integer]

Top-Down Process: Recursive Descent or Predictive Parsing
The parser operates by attempting to match tokens in the input stream.
Use both the grammar and the input below to motivate the code for the algorithm:
    array [ num dotdot num ] of integer
    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num

    procedure match ( t : token ) ;
    begin
        if lookahead = t then
            lookahead := nexttoken
        else error
    end ;

Top-Down Algorithm (Continued)
    procedure simple ;
    begin
        if lookahead = integer then
            match ( integer )
        else if lookahead = char then
            match ( char )
        else if lookahead = num then begin
            match ( num ) ; match ( dotdot ) ; match ( num )
        end
        else error
    end ;

    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num

Top-Down Algorithm (Continued)
    procedure type ;
    begin
        if lookahead is in { integer, char, num } then
            simple
        else if lookahead = '↑' then begin
            match ( '↑' ) ; match ( id )
        end
        else if lookahead = array then begin
            match ( array ) ; match ( '[' ) ; simple ;
            match ( ']' ) ; match ( of ) ; type
        end
        else error
    end ;

    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num

Tracing
    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num
Input: array [ num dotdot num ] of integer
To initialize the parser: set the global variable lookahead = array, then call procedure type.
Procedure call to type with lookahead = array results in the actions:
    match ( array ) ; match ( '[' ) ; simple ; match ( ']' ) ; match ( of ) ; type
Procedure call to simple with lookahead = num results in the actions:
    match ( num ) ; match ( dotdot ) ; match ( num )
Procedure call to type with lookahead = integer results in the actions:
    simple
Procedure call to simple with lookahead = integer results in the actions:
    match ( integer )
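The procedures match, simple and type, together with the trace above, can be reproduced in a few lines. This is a sketch under assumptions: tokens are whitespace-separated strings, "$" is a made-up end-of-input marker, and the method is called type_ only to avoid clashing with Python's built-in name.

```python
class Parser:
    def __init__(self, text):
        self.tokens = text.split() + ["$"]   # "$" marks end of input (assumption)
        self.pos = 0
        self.trace = []                      # record of calls, for comparison with the slide

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead != t:
            raise SyntaxError(f"expected {t!r}, found {self.lookahead!r}")
        self.trace.append(f"match({t})")
        self.pos += 1

    def simple(self):          # simple -> integer | char | num dotdot num
        self.trace.append("simple")
        if self.lookahead in ("integer", "char"):
            self.match(self.lookahead)
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")

    def type_(self):           # type -> simple | ↑ id | array [ simple ] of type
        self.trace.append("type")
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "↑":
            self.match("↑")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")

def parse(text):
    parser = Parser(text)
    parser.type_()
    if parser.lookahead != "$":
        raise SyntaxError("trailing input")
    return parser.trace

print(parse("array [ num dotdot num ] of integer"))
```

The returned list records the calls in the order they happen: type, the matches for array and [, simple with its three matches, the matches for ] and of, and finally the nested type and simple calls that match integer.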

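Limitations
Can we apply the previous technique to every grammar? NO:
    type → simple | array [ simple ] of type
    simple → integer | array digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Consider the string "array 6". The predictive parser starts with type and lookahead = array. Should it apply the production type → simple (followed by simple → array digit) or type → array [ simple ] of type? Both alternatives can start with array, so the lookahead alone cannot decide.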

Designing a Predictive Parser
Consider A → α. FIRST(α) = the set of leftmost tokens that appear in α or in strings generated by α.
E.g. FIRST(type) = { ↑, array, integer, char, num }
Consider productions of the form A → α, A → β: the sets FIRST(α) and FIRST(β) should be disjoint.
Then we can implement predictive parsing (initially: start non-terminal + lookahead = leftmost token):
    Starting with A → ?, we find which FIRST(α) set the lookahead symbol belongs to and we use that production.
    Any non-terminal results in the corresponding procedure call.
    Terminals are matched.
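The FIRST computation and the disjointness test can be sketched as below. This is a simplified version that assumes no production derives ε (true of both grammars used here); first_of and predictive_ok are hypothetical names.

```python
TYPE_GRAMMAR = {
    "type": [["simple"], ["↑", "id"], ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["char"], ["num", "dotdot", "num"]],
}

# The grammar from the Limitations slide, for comparison.
BAD_GRAMMAR = {
    "type": [["simple"], ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["array", "digit"]],
    "digit": [[str(d)] for d in range(10)],
}

def first_of(alpha, grammar, seen=()):
    """FIRST of a right-hand side alpha (a non-empty list of symbols).
    Assumes no non-terminal derives the empty string."""
    head = alpha[0]
    if head not in grammar:            # a terminal starts the string
        return {head}
    if head in seen:                   # already being expanded: adds nothing new
        return set()
    result = set()
    for alternative in grammar[head]:
        result |= first_of(alternative, grammar, seen + (head,))
    return result

def predictive_ok(nonterminal, grammar):
    """True if the FIRST sets of the alternatives are pairwise disjoint."""
    covered = set()
    for alternative in grammar[nonterminal]:
        f = first_of(alternative, grammar)
        if f & covered:
            return False
        covered |= f
    return True

print(first_of(["type"], TYPE_GRAMMAR))
print(predictive_ok("type", TYPE_GRAMMAR), predictive_ok("type", BAD_GRAMMAR))
```

For the first grammar the alternatives of type have disjoint FIRST sets, so one lookahead token selects the production; for the second grammar array appears in two of them, which is exactly the conflict described on the Limitations slide.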

Problems with Top Down Parsing
Left recursion in a CFG may cause the parser to loop forever.
Indeed, for the production A → Aα we write the program:
    procedure A
    {
        if lookahead belongs to FIRST(Aα) then
            call the procedure A
    }
The procedure calls itself without consuming any input.
Solution: remove left recursion without changing the language defined by the grammar.

Dealing with Left Recursion
Solution: an algorithm to remove left recursion.
Basic idea:
    A → Aα | β
becomes
    A → βR
    R → αR | ε
Example:
    expr → expr + term | expr - term | term
    term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
becomes
    expr → term rest
    rest → + term rest | - term rest | ε
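The basic idea handles immediate left recursion and is short enough to sketch directly. The function name, the list-of-symbols encoding and the caller-supplied name for the new non-terminal R are assumptions for illustration; right-hand sides are assumed to be non-empty.

```python
EPSILON = "ε"

def remove_left_recursion(a, alternatives, r):
    """Rewrite A -> A α | β as A -> β R and R -> α R | ε.
    alternatives: list of right-hand sides, each a non-empty list of symbols.
    r: name chosen for the new non-terminal R."""
    alphas = [alt[1:] for alt in alternatives if alt[0] == a]   # the A α alternatives
    betas = [alt for alt in alternatives if alt[0] != a]        # the β alternatives
    if not alphas:                     # nothing to do: A is not left recursive
        return {a: alternatives}
    return {
        a: [beta + [r] for beta in betas],
        r: [alpha + [r] for alpha in alphas] + [[EPSILON]],
    }

new_grammar = remove_left_recursion(
    "expr",
    [["expr", "+", "term"], ["expr", "-", "term"], ["term"]],
    "rest",
)
for lhs, alts in new_grammar.items():
    print(lhs, "→", " | ".join(" ".join(alt) for alt in alts))
```

Applied to the expr productions, the result is expr → term rest and rest → + term rest | - term rest | ε, the same grammar as in the example.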

What happens to semantic actions?
Left-recursive translation scheme:
    expr → expr + term {print('+')}
    expr → expr - term {print('-')}
    expr → term
    term → 0 {print('0')}
    term → 1 {print('1')}
    …
    term → 9 {print('9')}
After removing left recursion:
    expr → term rest
    rest → + term {print('+')} rest
    rest → - term {print('-')} rest
    rest → ε
    term → 0 {print('0')}
    term → 1 {print('1')}
    …
    term → 9 {print('9')}

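Comparing Grammars with Left Recursion
Notice the location of the semantic actions in the tree. What is the order of processing?
[Figure: parse tree for 9 - 5 + 2 under the left-recursive grammar, with the actions {print('9')}, {print('5')}, {print('-')}, {print('2')}, {print('+')} attached to the expr and term nodes]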

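Comparing Grammars without Left Recursion
Now notice the location of the semantic actions in the tree for the revised grammar. What is the order of processing in this case?
[Figure: parse tree for 9 - 5 + 2 under the revised grammar: expr has children term and rest, each rest node expands to an operator, a term, an action and a further rest, and the last rest derives ε; the actions are {print('9')}, {print('5')}, {print('-')}, {print('2')}, {print('+')}]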
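The revised translation scheme can be run as a predictive translator in which each print action fires at the point where it sits in the production. This is a minimal sketch: tokens are assumed to be separated by spaces, "$" is a made-up end marker, and the output is collected in a list instead of being printed immediately.

```python
def infix_to_postfix(text):
    """Translate single-digit infix expressions with + and - to postfix."""
    tokens = text.split() + ["$"]      # "$" marks end of input (assumption)
    pos = 0
    out = []

    def match(t):
        nonlocal pos
        if tokens[pos] != t:
            raise SyntaxError(f"expected {t!r}, found {tokens[pos]!r}")
        pos += 1

    def term():                        # term -> digit {print(digit)}
        tok = tokens[pos]
        if not (len(tok) == 1 and tok in "0123456789"):
            raise SyntaxError(f"digit expected, found {tok!r}")
        match(tok)
        out.append(tok)

    def rest():                        # rest -> + term {print('+')} rest
        tok = tokens[pos]              #       | - term {print('-')} rest | ε
        if tok in ("+", "-"):
            match(tok)
            term()
            out.append(tok)            # the action sits between term and rest
            rest()
        # otherwise rest -> ε: do nothing

    term()                             # expr -> term rest
    rest()
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return " ".join(out)

print(infix_to_postfix("9 - 5 + 2"))   # 9 5 - 2 +
```

Even though the tree now leans to the right, the actions still fire in the order 9, 5, -, 2, +, so the output is the same postfix string as with the left-recursive scheme.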

The Lexical Analysis Process: A Graphical Depiction
[Figure: the lexical analyzer lexan( ) sitting between the input and its caller]
    lexan( ) uses getchar( ) to read a character
    lexan( ) pushes back c using ungetc(c, stdin)
    lexan( ) returns the token to its caller
    lexan( ) sets the global variable tokenval to the attribute value

The Lexical Analysis Process: Functional Responsibilities
The input token string is broken down.
White space and comments are filtered out.
Individual tokens with associated values are identified.
The symbol table is initialized and entries are constructed for each "appropriate" token.
Under what conditions will a character be pushed back?

Example of a Lexical Analyzer
    function lexan : integer ;
    var lexbuf : array [ 0 .. 100 ] of char ;
        c : char ;
    begin
        loop begin
            read a character into c ;
            if c is a blank or a tab then
                do nothing
            else if c is a newline then
                lineno := lineno + 1
            else if c is a digit then begin
                set tokenval to the value of this and following digits ;
                return NUM
            end

Algorithm for Lexical Analyzer
            else if c is a letter then begin
                place c and successive letters and digits into lexbuf ;
                p := lookup ( lexbuf ) ;
                if p = 0 then
                    p := insert ( lexbuf, ID ) ;
                tokenval := p ;
                return the token field of table entry p
            end
            else begin
                set tokenval to NONE ;   /* there is no attribute */
                return integer encoding of character c
            end
        end
    end ;
Note: insert / lookup operations occur against the symbol table!
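The lexan pseudocode can be turned into a small runnable sketch. Several details here are assumptions made for illustration: the numeric token codes, the Lexer class that replaces the global variables, the string buffer that stands in for getchar and ungetc on stdin, and the DONE token returned at end of input.

```python
NUM, ID, DIV, MOD, DONE = 256, 257, 258, 259, 260   # hypothetical token codes
NONE = -1                                            # "no attribute" marker

class Lexer:
    def __init__(self, text):
        self.text, self.pos = text, 0
        self.lineno = 1
        self.tokenval = NONE
        # Symbol table; entry 0 is unused so lookup can return 0 for "absent".
        self.lexemes = [None, "div", "mod"]
        self.tokens = [None, DIV, MOD]

    def getchar(self):
        if self.pos < len(self.text):
            self.pos += 1
            return self.text[self.pos - 1]
        return ""                                    # end of input

    def ungetc(self, c):
        if c != "":                                  # nothing to push back at end of input
            self.pos -= 1

    def lookup(self, s):
        return self.lexemes.index(s) if s in self.lexemes else 0

    def insert(self, s, token):
        self.lexemes.append(s)
        self.tokens.append(token)
        return len(self.lexemes) - 1

    def lexan(self):
        while True:
            c = self.getchar()
            if c == "":
                return DONE
            if c in " \t":
                continue                             # skip blanks and tabs
            if c == "\n":
                self.lineno += 1
            elif c.isdigit():
                digits = c
                c = self.getchar()
                while c.isdigit():
                    digits += c
                    c = self.getchar()
                self.ungetc(c)                       # read one character too far
                self.tokenval = int(digits)
                return NUM
            elif c.isalpha():
                lexbuf = c
                c = self.getchar()
                while c.isalnum():
                    lexbuf += c
                    c = self.getchar()
                self.ungetc(c)                       # read one character too far
                p = self.lookup(lexbuf) or self.insert(lexbuf, ID)
                self.tokenval = p
                return self.tokens[p]
            else:
                self.tokenval = NONE
                return ord(c)                        # integer encoding of the character

lexer = Lexer("count div 12\n+ i")
result = []
while (tok := lexer.lexan()) != DONE:
    result.append((tok, lexer.tokenval))
print(result)
```

The two ungetc calls answer the question about pushback: a character is pushed back whenever the analyzer has to read one character past the end of a number or identifier to know that the lexeme is complete.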

Symbol Table Considerations
Operations:
    Insert (string, token_ID)
    Lookup (string)
Notice:
    Reserved words are placed into the symbol table for easy lookup.
    Attributes may be associated with each entry, such as semantic actions or typing info (id → integer, etc.).
[Figure: ARRAY symtable with columns lexptr, token and attributes and entries numbered 1 to 4 (tokens div, mod, id); each lexptr points into ARRAY lexemes, which holds d i v EOS m o d EOS c o u n t EOS i EOS]
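The two-array layout in the figure, with symtable entries pointing into one flat lexemes array, might be sketched as follows. The class name, the use of "\0" for EOS, and the dictionary used for attributes are assumptions for illustration.

```python
EOS = "\0"                             # end-of-string marker (assumed encoding)

class SymbolTable:
    def __init__(self):
        self.lexemes = []              # one flat array of characters
        self.symtable = [None]         # entry 0 unused; entries are [lexptr, token, attributes]

    def _lexeme_at(self, lexptr):
        end = self.lexemes.index(EOS, lexptr)
        return "".join(self.lexemes[lexptr:end])

    def lookup(self, s):
        """Return the index of the entry for s, or 0 if there is none."""
        for p in range(1, len(self.symtable)):
            if self._lexeme_at(self.symtable[p][0]) == s:
                return p
        return 0

    def insert(self, s, token):
        """Append s to lexemes and add a new entry pointing at it."""
        lexptr = len(self.lexemes)
        self.lexemes.extend(s)
        self.lexemes.append(EOS)
        self.symtable.append([lexptr, token, {}])
        return len(self.symtable) - 1

table = SymbolTable()
for word, token in [("div", "div"), ("mod", "mod"), ("count", "id"), ("i", "id")]:
    table.insert(word, token)          # reserved words go in first
print(table.lookup("count"), table.symtable[3])
```

Storing lexemes in a single array keeps every symtable entry the same size regardless of identifier length; the linear search in lookup is only for clarity, and a real table would more likely hash.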

A Brief Look at Code Generation
Code generation is the back end of the compilation process, which will not be our emphasis; we'll focus on the front end.
Important concepts to re-emphasize:
    Abstract stack machine for intermediate code generation: (i) basic arithmetic, (ii) stack, (iii) flow control
    L-value vs. R-value of an identifier:
        I := 5 ;        L-value: location
        I := I + 1 ;    R-value: contents

A Brief Look at Code Generation
Employ statement templates for code generation. Each template characterizes the translation.
There are different templates for each major programming language construct: if, while, procedure, etc.
    IF                  WHILE
                        label test
    code for expr       code for expr
    gofalse out         gofalse out
    code for stmt       code for stmt
                        goto test
    label out           label out
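To make the WHILE template concrete, here is a toy abstract stack machine that runs code built from it. Only label, goto and gofalse come from the template; the instruction names push, rvalue, lvalue and := follow the usual textbook stack machine and are assumptions here, as is the tuple encoding (operation, argument).

```python
def while_template(expr_code, stmt_code, test="test", out="out"):
    """label test; code for expr; gofalse out; code for stmt; goto test; label out"""
    return ([("label", test)] + expr_code + [("gofalse", out)]
            + stmt_code + [("goto", test), ("label", out)])

def run(code, memory):
    """Interpret stack-machine code; memory maps identifiers to values."""
    labels = {arg: i for i, (op, arg) in enumerate(code) if op == "label"}
    stack, pc = [], 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "push":               # push a constant
            stack.append(arg)
        elif op == "rvalue":           # push the contents of a location
            stack.append(memory[arg])
        elif op == "lvalue":           # push the location itself
            stack.append(arg)
        elif op == ":=":               # store r-value on top into l-value below it
            value = stack.pop()
            location = stack.pop()
            memory[location] = value
        elif op in ("+", "-"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "+" else left - right)
        elif op == "goto":
            pc = labels[arg]
        elif op == "gofalse":          # pop; jump if the value is zero
            if stack.pop() == 0:
                pc = labels[arg]
        elif op != "label":
            raise ValueError(f"unknown instruction {op!r}")
    return memory

# while i do begin n := n + 2; i := i - 1 end
expr_code = [("rvalue", "i")]
stmt_code = [("lvalue", "n"), ("rvalue", "n"), ("push", 2), ("+", None), (":=", None),
             ("lvalue", "i"), ("rvalue", "i"), ("push", 1), ("-", None), (":=", None)]
final = run(while_template(expr_code, stmt_code), {"i": 3, "n": 0})
print(final)                           # {'i': 0, 'n': 6}
```

The assignment code also shows the L-value / R-value distinction: lvalue pushes where to store, rvalue pushes what is currently stored.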

Concluding Remarks / Looking Ahead
We've reviewed and highlighted the entire compilation process.
We introduced context-free grammars (CFGs) and illustrated their relationship to compiler theory.
We reviewed many different versions of parse trees that assist in both recognition and translation.
Next, we'll return to the beginning: lexical analysis.
We'll explore the close relationship of lexical analysis to regular expressions, grammars, and finite automata.