Chapter 2. A Simple Syntax-Directed Translator
Dr. Nidjo Sandjojo, M.Sc.

Outline
– Syntax Definition
– Syntax-Directed Translation
– Parsing
– A Translator for Simple Expressions
– Lexical Analysis
– Symbol Tables
– Intermediate Code Generation

Introduction
– An introduction to the compiling techniques covered in Chap. 3-6
– The working Java translator appears in Appendix A

A Code Fragment to be Translated
{
    int i; int j; float[100] a; float v; float x;
    while (true) {
        do i = i+1; while (a[i] < v);
        do j = j-1; while (a[j] > v);
        if (i >= j) break;
        x = a[i]; a[i] = a[j]; a[j] = x;
    }
}

Simplified Intermediate Code
 1: i = i + 1
 2: t1 = a [ i ]
 3: if t1 < v goto 1
 4: j = j - 1
 5: t2 = a [ j ]
 6: if t2 > v goto 4
 7: ifFalse i >= j goto 9
 8: goto 14
 9: x = a [ i ]
10: t3 = a [ j ]
11: a [ i ] = t3
12: a [ j ] = x
13: goto 1
14:

A Model of a Compiler Front End
source program → Lexical Analyzer → tokens → Parser → syntax tree → Intermediate Code Generator → three-address code
(Figure: the Symbol Table is shared by the phases of the front end.)

Intermediate Code for "do i=i+1; while (a[i]<v);"
1: i = i + 1
2: t1 = a [ i ]
3: if t1 < v goto 1
(Figure: syntax tree with root do-while; its body is assign(i, +(i, 1)) and its condition is <([](a, i), v).)

Syntax Definition
– Grammar: describes the hierarchical structure of programming language constructs
– Ex: the if-else statement in Java: if ( expression ) statement else statement
– A production: stmt → if ( expr ) stmt else stmt
  – terminals: if, else, (, )
  – nonterminals: expr, stmt

Definition of Grammars
A context-free grammar has four components:
– A set of terminal symbols (tokens)
– A set of nonterminals (syntactic variables)
– A set of productions
– A designation of one of the nonterminals as the start symbol

Ex. "Lists of digits separated by plus or minus signs"
list → list + digit    (2.1)
list → list - digit    (2.2)
list → digit           (2.3)
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9    (2.4)

Derivations
– A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the body of a production for that nonterminal
– The terminal strings that can be derived from the start symbol form the language defined by the grammar
– Ex. 9-5+2 is a list
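
As a worked instance of the definition above, here is one way to derive 9-5+2 from the start symbol list using productions (2.1) to (2.4). The leftmost nonterminal is replaced at each step; that order is a choice made for this sketch, not something the slide prescribes.

```latex
\begin{align*}
\mathit{list} &\Rightarrow \mathit{list} + \mathit{digit}                      && \text{by (2.1)} \\
              &\Rightarrow \mathit{list} - \mathit{digit} + \mathit{digit}     && \text{by (2.2)} \\
              &\Rightarrow \mathit{digit} - \mathit{digit} + \mathit{digit}    && \text{by (2.3)} \\
              &\Rightarrow 9 - \mathit{digit} + \mathit{digit}                 && \text{by (2.4)} \\
              &\Rightarrow 9 - 5 + \mathit{digit}                              && \text{by (2.4)} \\
              &\Rightarrow 9 - 5 + 2                                           && \text{by (2.4)}
\end{align*}
```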

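Parsing
– Parsing: the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar, and reporting syntax errors if it cannot be derived (Chap. 4)
Parse Trees
– For a production A → XYZ, the parse tree has an interior node labeled A with children X, Y, Z, from left to right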

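Example Parse Tree
Ex. Parse tree for 9-5+2
(Figure: the root list has children list, +, digit (2); its left child list has children list, -, digit (5); the innermost list has the single child digit (9).)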

Ambiguity
– A grammar is ambiguous if it has more than one parse tree generating a given string of terminals
– Ex. string → string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Two parse trees for the string 9-5+2
(Figure: one tree groups the string as (9-5)+2, the other as 9-(5+2).)

Associativity of Operators
– Left-associative: +, -, *, /
– Right-associative: ^, = (assignment)
– Ex. a=b=c
  right → letter = right | letter
  letter → a | b | … | z

Parse trees for left- and right-associative grammars
(Figure: the parse tree for 9-5-2 under the list grammar grows down to the left; the parse tree for a=b=c under the right grammar grows down to the right.)

Precedence of Operators
– Precedence: determines which operator is applied first when different operators share an operand
– Left-associative: + - (lower precedence)
– Left-associative: * / (higher precedence)
– A grammar for arithmetic expressions:
  expr → expr + term | expr - term | term
  term → term * factor | term / factor | factor
  factor → digit | ( expr )

Ex. An ambiguous grammar for a subset of Java statements
stmt → id = expr ;
     | if ( expr ) stmt
     | if ( expr ) stmt else stmt
     | while ( expr ) stmt
     | do stmt while ( expr ) ;
     | { stmts }
stmts → stmts stmt
      | ε

Syntax-Directed Translation
– Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar
– E.g. for expr → expr1 + term: translate expr1; translate term; handle +;
– Ex. Translation of infix expressions into postfix notation

Two Concepts
– Attributes: any quantity associated with a programming construct
– (Syntax-directed) translation schemes: a notation for attaching program fragments to the productions of a grammar (Chap. 5)

Postfix Notation
– If E is a variable or constant, the postfix notation for E is E itself
– If E is an expression of the form E1 op E2, the postfix notation for E is E1' E2' op, where E1' and E2' are the postfix notations for E1 and E2, respectively
– If E is a parenthesized expression of the form ( E1 ), the postfix notation for E is the same as the postfix notation for E1
– Ex:
  (9-5)+2 → 95-2+
  9-(5+2) → 952+-
  … * = ?
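
To make the notation concrete, the following is a minimal sketch of evaluating a postfix string with a stack. It is not taken from the slides: the class name PostfixEval and the restriction to single-digit operands are assumptions chosen to match the examples above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not from the slides): evaluates postfix strings
// made of single-digit operands and the operators + - * /.
final class PostfixEval {
    static int eval(String postfix) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (char c : postfix.toCharArray()) {
            if (Character.isDigit(c)) {
                // An operand is pushed as its numeric value.
                stack.push(c - '0');
                continue;
            }
            if (stack.size() < 2) {
                throw new IllegalArgumentException("operator " + c + " lacks operands");
            }
            // The right operand was pushed last, so it is popped first.
            int right = stack.pop();
            int left = stack.pop();
            switch (c) {
                case '+': stack.push(left + right); break;
                case '-': stack.push(left - right); break;
                case '*': stack.push(left * right); break;
                case '/': stack.push(left / right); break;
                default: throw new IllegalArgumentException("unexpected symbol: " + c);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("malformed postfix string");
        }
        return stack.pop();
    }
}
```

PostfixEval.eval("95-2+") yields 6 and PostfixEval.eval("952+-") yields 2, matching (9-5)+2 and 9-(5+2). No parentheses are needed because the position of each operator fixes its operands.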

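Synthesized Attributes
A syntax-directed definition associates
– With each grammar symbol, a set of attributes
– With each production, a set of semantic rules for computing the values of the attributes
An attribute is synthesized if its value at a parse-tree node is determined from the attribute values at the node's children and at the node itself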

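Attribute values at nodes in a parse tree
(Figure: annotated parse tree for 9-5+2. Root: expr.t = 95-2+. Its children: expr.t = 95- and term.t = 2. Below expr.t = 95-: expr.t = 9 and term.t = 5. Below expr.t = 9: term.t = 9.)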

A Simple Syntax-Directed Definition
Ex. (|| denotes string concatenation)
Production               Semantic rule
expr → expr1 + term      expr.t = expr1.t || term.t || '+'
expr → expr1 - term      expr.t = expr1.t || term.t || '-'
expr → term              expr.t = term.t
term → 0                 term.t = '0'
…                        …
term → 9                 term.t = '9'

Tree Traversals
Depth-first traversal
procedure visit(node N) {
    for (each child C of N, from left to right) {
        visit(C);
    }
    evaluate semantic rules at node N;
}
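
The traversal can be sketched in Java as follows. This is an illustration under assumptions, not the slides' code: the ParseNode class, its string-labeled nodes, and the field t are hypothetical, and the semantic rules are the ones for the postfix attribute t (expr.t = expr1.t || term.t || op, expr.t = term.t, term.t = digit).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parse-tree node: symbol is "expr", "term", or a terminal
// such as "9" or "+". t holds the synthesized attribute.
final class ParseNode {
    final String symbol;
    final List<ParseNode> children;
    String t = "";

    ParseNode(String symbol, ParseNode... children) {
        this.symbol = symbol;
        this.children = Arrays.asList(children);
    }

    // Depth-first traversal: children first (left to right), then the
    // semantic rule at this node, so every child's t is ready when needed.
    static void visit(ParseNode n) {
        for (ParseNode c : n.children) {
            visit(c);
        }
        if (n.symbol.equals("term")) {
            // term -> digit : term.t = 'digit'
            n.t = n.children.get(0).symbol;
        } else if (n.symbol.equals("expr")) {
            if (n.children.size() == 1) {
                // expr -> term : expr.t = term.t
                n.t = n.children.get(0).t;
            } else {
                // expr -> expr1 op term : expr.t = expr1.t || term.t || op
                n.t = n.children.get(0).t
                    + n.children.get(2).t
                    + n.children.get(1).symbol;
            }
        }
    }
}
```

Building the parse tree for 9-5+2 and calling ParseNode.visit on its root leaves 95-2+ in the root's t field.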

Translation Schemes
– Program fragments embedded within production bodies are called semantic actions
– Ex. rest → + term { print('+') } rest1
(Figure: node rest with children +, term, { print('+') }, rest1, in that order; the action appears as an extra leaf.)

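Actions translating 9-5+2 into 95-2+
(Figure: parse tree for 9-5+2 with the actions attached as extra leaves; a depth-first traversal executes print('9'), print('5'), print('-'), print('2'), print('+'), in that order.)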

Actions for translating into postfix notation
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
…
term → 9 { print('9') }

Parsing
Two classes, depending on the order in which nodes in the parse tree are constructed
– Top-down: starts at the root
  – Efficient parsers can be constructed more easily by hand
– Bottom-up: starts at the leaves
  – Can handle a larger class of grammars; often used by software tools

Top-Down Parsing
Two steps, repeated:
– At node N (labeled with nonterminal A), select one of the productions for A and construct children at N for the symbols in the production body
– Find the next node at which a subtree is to be constructed, typically the leftmost unexpanded nonterminal

Ex: A grammar for some statements in C and Java
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
optexpr → expr
        | ε

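A parse tree according to the grammar
(Figure: root stmt with children for, (, optexpr, ;, optexpr, ;, optexpr, ), stmt. The first optexpr derives ε, the other two derive expr, and the inner stmt derives other.)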

Input: for ( ; expr ; expr ) other
Parse tree:
(Figure: partially constructed parse tree with root stmt and children for, (, optexpr, ;, optexpr, ;, optexpr, ), stmt.)

Predictive Parsing
– Recursive-descent parsing: a top-down method in which a set of recursive procedures processes the input, one procedure per nonterminal
– Predictive parsing: a simple form of recursive-descent parsing, in which the lookahead symbol unambiguously determines the flow of control for each nonterminal

Pseudocode for a Predictive Parser
void stmt() {
    switch (lookahead) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('('); match(expr); match(')'); stmt();
        break;
    case for:
        match(for); match('(');
        optexpr(); match(';'); optexpr(); match(';'); optexpr();
        match(')'); stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}

void optexpr() {
    if (lookahead == expr) match(expr);
}
void match(terminal t) {
    if (lookahead == t) lookahead = nextTerminal;
    else report("syntax error");
}
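
The pseudocode above can be turned into a small runnable recognizer. The following is only a sketch under stated assumptions: tokens arrive as a list of strings ("expr", "if", "for", "other", "(", ")", ";"), the end of input is represented by the made-up marker "$", and the names StmtParser and accepts are hypothetical.

```java
import java.util.List;

// Hypothetical predictive parser for the stmt/optexpr grammar.
// It only recognizes input; it builds no tree.
final class StmtParser {
    private final List<String> tokens;
    private int pos = 0;

    StmtParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // The lookahead symbol; "$" is an assumed end-of-input marker.
    private String lookahead() {
        return pos < tokens.size() ? tokens.get(pos) : "$";
    }

    private void match(String t) {
        if (lookahead().equals(t)) {
            pos++;
        } else {
            throw new IllegalStateException(
                "syntax error: expected " + t + " but saw " + lookahead());
        }
    }

    // The lookahead alone selects the production, as in the pseudocode.
    void stmt() {
        switch (lookahead()) {
            case "expr":
                match("expr"); match(";");
                break;
            case "if":
                match("if"); match("("); match("expr"); match(")"); stmt();
                break;
            case "for":
                match("for"); match("(");
                optexpr(); match(";"); optexpr(); match(";"); optexpr();
                match(")"); stmt();
                break;
            case "other":
                match("other");
                break;
            default:
                throw new IllegalStateException("syntax error at " + lookahead());
        }
    }

    // optexpr -> expr | ε : doing nothing applies the ε-production.
    void optexpr() {
        if (lookahead().equals("expr")) {
            match("expr");
        }
    }

    // True if the whole token list is exactly one stmt.
    static boolean accepts(List<String> tokens) {
        StmtParser p = new StmtParser(tokens);
        try {
            p.stmt();
            return p.lookahead().equals("$");
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For the input for ( ; expr ; expr ) other, accepts returns true; dropping the two semicolons makes match(";") fail and accepts returns false.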

FIRST(α): the set of terminals that appear as the first symbols of one or more strings of terminals generated from α
Ex.
– FIRST(stmt) = { expr, if, for, other }
– FIRST(expr ;) = { expr }
In predictive parsing, the FIRST sets must be disjoint for different productions for the same nonterminal: for any two productions A → α and A → β, FIRST(α) and FIRST(β) share no terminal

ε-Productions
– Doing nothing corresponds to applying an ε-production
– Ex: in if (lookahead == expr) match(expr); the production optexpr → ε is applied whenever the lookahead is not expr
– (For more details, see LL(1) grammars in Section 4.4.3)

Designing a Predictive Parser
The procedure for nonterminal A:
– Decide which A-production to use by examining the lookahead symbol: the production with body α is used if the lookahead symbol is in FIRST(α)
– "Execute" the symbols of the body in turn
  – Nonterminal: call the procedure for that nonterminal
  – Terminal: match it against the lookahead symbol and read the next input symbol

Left Recursion
– Problem: a left-recursive production can make a recursive-descent parser loop forever
  – Ex: expr → expr + term
– A left-recursive production can be eliminated by rewriting the offending production
  – Original: A → Aα | β
  – Becomes: A → βR, R → αR | ε
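
As a worked instance of the rewriting rule, take the left-recursive pair expr → expr + term | term, so that A = expr, α = + term and β = term. The nonterminal name R is the placeholder from the rule; any fresh name would do.

```latex
\begin{align*}
\text{Original:}\quad & \mathit{expr} \to \mathit{expr} + \mathit{term} \;\mid\; \mathit{term} \\
\text{Becomes:}\quad  & \mathit{expr} \to \mathit{term}\; R \\
                      & R \to +\ \mathit{term}\; R \;\mid\; \epsilon
\end{align*}
```

Both grammars generate a term followed by zero or more occurrences of + term, but in the rewritten one the procedure for expr consumes a term before any recursive call is made.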

A Translator for Simple Expressions
A conflict:
– We need a grammar that facilitates translation
– We need a grammar that facilitates parsing
Solution:
– Begin with the grammar for easy translation
– Carefully transform it to facilitate parsing

Abstract and Concrete Syntax
– Abstract syntax tree (syntax tree)
  – Interior node: an operator, which can be any programming construct
  – Children of the node: the operands
– Parse tree (concrete syntax tree)
  – Interior node: a nonterminal; some nonterminals are programming constructs, others are "helpers"
– It is desirable for a translation scheme to be based on a grammar whose parse trees are as close to syntax trees as possible

Adapting the Translation Scheme
– The left-recursion-elimination technique extends to multiple productions:
  A → Aα | Aβ | γ
  becomes
  A → γR
  R → αR | βR | ε
– We need to transform productions that have embedded actions, not just terminals and nonterminals

Let
– A = expr
– α = + term { print('+') }
– β = - term { print('-') }
– γ = term
After left-recursion elimination
expr → term rest
rest → + term { print('+') } rest
     | - term { print('-') } rest
     | ε
term → 0 { print('0') }
     | 1 { print('1') }
     …
     | 9 { print('9') }

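Translation of 9-5+2 to 95-2+
(Figure: parse tree for 9-5+2 under the transformed grammar, with the actions print('9'), print('5'), print('-'), print('2'), print('+') as leaves in the order they are executed; the last rest derives ε.)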

Procedures for the Nonterminals
void expr() {
    term(); rest();
}
void rest() {
    if (lookahead == '+') {
        match('+'); term(); print('+'); rest();
    }
    else if (lookahead == '-') {
        match('-'); term(); print('-'); rest();
    }
    else { } /* do nothing with the input */
}
void term() {
    if (lookahead is a digit) {
        t = lookahead; match(lookahead); print(t);
    }
    else report("syntax error");
}

Simplifying the Translator
– Certain recursive calls can be replaced by iterations
– Tail-recursive calls: the call to rest() is the last statement executed, so it can be replaced by a jump to the beginning of the procedure
void rest() {
    while (true) {
        if (lookahead == '+') {
            match('+'); term(); print('+'); continue;
        }
        else if (lookahead == '-') {
            match('-'); term(); print('-'); continue;
        }
        break;
    }
}

The Complete Program (See Fig. 2.27)
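
The figure itself is not reproduced in this transcript. As a stand-in, here is a minimal sketch of an infix-to-postfix translator built from the procedures above, with rest() folded into a loop inside expr(). It differs from the referenced program in ways that are assumptions of this sketch: it reads from a String rather than from standard input, collects its output in a StringBuilder instead of printing, and uses the hypothetical class name PostfixTranslator.

```java
// Hypothetical translator for expressions over single digits, + and -.
final class PostfixTranslator {
    private final String input;
    private int pos = 0;
    private final StringBuilder out = new StringBuilder();

    private PostfixTranslator(String input) {
        this.input = input;
    }

    // Translates an infix string such as "9-5+2" into postfix.
    static String translate(String infix) {
        PostfixTranslator p = new PostfixTranslator(infix);
        p.expr();
        if (p.pos != infix.length()) {
            throw new IllegalArgumentException("syntax error at position " + p.pos);
        }
        return p.out.toString();
    }

    // Current input character, or '\0' once the input is exhausted.
    private char lookahead() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private void match(char t) {
        if (lookahead() == t) {
            pos++;
        } else {
            throw new IllegalArgumentException("syntax error at position " + pos);
        }
    }

    // expr -> term rest, where the tail-recursive rest is written as a loop.
    private void expr() {
        term();
        while (true) {
            char op = lookahead();
            if (op == '+' || op == '-') {
                match(op);
                term();
                out.append(op);   // action { print(op) } runs after the term
            } else {
                return;           // rest -> ε
            }
        }
    }

    // term -> digit { print(digit) }
    private void term() {
        char c = lookahead();
        if (Character.isDigit(c)) {
            match(c);
            out.append(c);
        } else {
            throw new IllegalArgumentException("syntax error at position " + pos);
        }
    }
}
```

PostfixTranslator.translate("9-5+2") returns 95-2+, the same string produced by the semantic actions on the earlier slides.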

Lexical Analysis
– Lexeme: a sequence of input characters that comprises a single token
– The extended translation scheme
expr → expr + term { print('+') }
     | expr - term { print('-') }
     | term
term → term * factor { print('*') }
     | term / factor { print('/') }
     | factor
factor → ( expr )
       | num { print(num.value) }
       | id { print(id.lexeme) }

Removal of White Space and Comments
for ( ; ; peek = next input character) {
    if (peek is a blank or a tab) do nothing;
    else if (peek is a newline) line = line+1;
    else break;
}

Reading Ahead
– Maintain an input buffer from which the lexical analyzer can read and push back characters
– The lexical analyzer reads ahead only when it must

Constants
– The job of collecting characters into integers and computing their numerical value is given to the lexical analyzer
– A sequence of digits in the input is transformed into a token <num, v>, where v is the integer value of the digits
if (peek holds a digit) {
    v = 0;
    do {
        v = v*10 + integer value of digit peek;
        peek = next input character;
    } while (peek holds a digit);
    return token <num, v>;
}

Recognizing Keywords and Identifiers
– Ex: count = count + increment;
– Using a table to hold character strings
  – Single representation
  – Reserved words

In Java,
Hashtable words = new Hashtable();
if (peek holds a letter) {
    collect letters or digits into a buffer b;
    s = string formed from the characters in b;
    w = token returned by words.get(s);
    if (w is not null) return w;
    else {
        enter the key-value pair (s, <id, s>) into words;
        return token <id, s>;
    }
}

A Lexical Analyzer
Token scan() {
    skip white space;
    handle numbers;
    handle reserved words and identifiers;
    /* otherwise, treat the read-ahead character peek as a token */
    Token t = new Token(peek);
    peek = blank;
    return t;
}
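
The three steps of scan() can be sketched in one small Java class. This is an illustration only: tokens are represented as plain strings such as <num, 31> and <id, count> instead of Token, Num and Word objects, the input is a String rather than a stream, and the class name MiniLexer and the choice of true and false as the reserved words are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lexer: skips white space, groups digits into numbers,
// and looks words up in a string table.
final class MiniLexer {
    private final String src;
    private int pos = 0;
    int line = 1;
    private final Map<String, String> words = new HashMap<>();

    MiniLexer(String src) {
        this.src = src;
        // Reserved words are entered into the table up front (assumed set).
        words.put("true", "<true>");
        words.put("false", "<false>");
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // Returns the next token, or null at the end of the input.
    String scan() {
        for (; ; pos++) {                       // skip white space
            char c = peek();
            if (c == ' ' || c == '\t') continue;
            else if (c == '\n') line++;
            else break;
        }
        if (pos >= src.length()) return null;
        char c = peek();
        if (Character.isDigit(c)) {             // handle numbers
            int v = 0;
            do {
                v = 10 * v + Character.digit(peek(), 10);
                pos++;
            } while (Character.isDigit(peek()));
            return "<num, " + v + ">";
        }
        if (Character.isLetter(c)) {            // reserved words and identifiers
            StringBuilder b = new StringBuilder();
            do {
                b.append(peek());
                pos++;
            } while (Character.isLetterOrDigit(peek()));
            String s = b.toString();
            String w = words.get(s);
            if (w == null) {                    // first occurrence: enter it
                w = "<id, " + s + ">";
                words.put(s, w);
            }
            return w;
        }
        pos++;                                  // any other character is a token
        return "<" + c + ">";
    }

    static List<String> tokenize(String src) {
        MiniLexer lexer = new MiniLexer(src);
        List<String> out = new ArrayList<>();
        for (String t = lexer.scan(); t != null; t = lexer.scan()) {
            out.add(t);
        }
        return out;
    }
}
```

MiniLexer.tokenize("count = count + 31;") gives the tokens <id, count>, <=>, <id, count>, <+>, <num, 31>, <;>. Both occurrences of count come from the same table entry, which is the "single representation" point above.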

class Token  class Num  class Word (details in Sec and Appendix A) Code: (Fig & 2.35)

Symbol Table
– Information about source program constructs
  – Identifiers: lexeme, type, position in storage, …
– Scopes are implemented by setting up a separate symbol table for each scope
– Ex: with each use of an identifier annotated by its declared type,
  { int x; char y; { bool y; x; y; } x; y; }
  becomes
  { { x:int; y:bool; } x:int; y:char; }

Symbol Table Per Scope
– The most-closely nested rule: an identifier x is in the scope of the most-closely nested declaration of x
– It can be implemented by chaining symbol tables

Chained Symbol Table
Ex: (Fig. 2.36) chained symbol tables for blocks B0, B1, B2

Chained Symbol Table Code: (Fig. 2.37)
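
The figure's code is not reproduced in this transcript. A minimal sketch of the chaining idea follows; it is an assumption-laden illustration, not the referenced code: identifiers map directly to a type name held as a String, and the names Env, put and get are chosen for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical chained symbol table: one Env per scope, each linked to
// the Env of the enclosing scope.
final class Env {
    private final Map<String, String> table = new HashMap<>();
    private final Env prev;   // enclosing scope, or null for the outermost

    Env(Env prev) {
        this.prev = prev;
    }

    // Records a declaration in the current scope only.
    void put(String name, String type) {
        table.put(name, type);
    }

    // Searches this scope first, then the enclosing scopes, which
    // implements the most-closely nested rule.
    String get(String name) {
        for (Env e = this; e != null; e = e.prev) {
            String found = e.table.get(name);
            if (found != null) {
                return found;
            }
        }
        return null;   // undeclared identifier
    }
}
```

For the block { int x; char y; { bool y; x; y; } x; y; }, an inner Env chained to the outer one resolves x to int and y to bool, while the outer Env still resolves y to char.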

The Use of Symbol Tables (Fig. 2.38)

Intermediate Code Generation
Two kinds of intermediate representations
– Trees: parse trees, syntax trees
– Linear representations, e.g. "three-address code": a sequence of elementary program steps
Static checking for syntax and semantic rules

Construction of Syntax Trees (See Fig. 2.39)

Static Checking
Static checking includes:
– Syntax checking
– Type checking
L-values and R-values
– l-value: the left side of an assignment (a location)
– r-value: the right side of an assignment (a value)

– Coercion: the type of an operand is automatically converted to the type expected by the operator
– Overloading: an operator has different meanings depending on its context

Three-Address Code
– x = y op z
– Array accesses: x [ y ] = z, x = y [ z ]
Translation of statements
– if expr then stmt1
– (Fig. 2.42):
      code to compute expr into x
      ifFalse x goto after
      code for stmt1
  after:
– (Fig. 2.43)

Translation of expressions
▫ i-j+k
    t1 = i - j
    t2 = t1 + k
▫ 2*a[i]
    t1 = a [ i ]
    t2 = 2 * t1
▫ Two functions: lvalue(), rvalue()
▫ (Fig. 2.44, 2.45)

Expr lvalue(x: Expr) {
    if (x is an Id node) return x;
    else if (x is an Access(y, z) node and y is an Id node) {
        return new Access(y, rvalue(z));
    }
    else error;
}

Expr rvalue(x: Expr) {
    if (x is an Id or a Constant node) return x;
    else if (x is an Op(op, y, z) or a Rel(op, y, z) node) {
        t = new temporary;
        emit string for t = rvalue(y) op rvalue(z);
        return a new node for t;
    }
    else if (x is an Access(y, z) node) {
        t = new temporary;
        call lvalue(x), which returns Access(y, z');
        emit string for t = Access(y, z');
        return a new node for t;
    }
    else if (x is an Assign(y, z) node) {
        z' = rvalue(z);
        emit string for lvalue(y) = z';
        return z';
    }
}

Ex:
▫ a[i] = 2*a[j-k]
▫ t3 = j - k
  t2 = a [ t3 ]
  t1 = 2 * t2
  a [ i ] = t1
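
The lvalue()/rvalue() pair can be sketched as runnable Java that reproduces the example a[i] = 2*a[j-k]. Several things here are assumptions of the sketch rather than the slides' design: the functions return the name of an address as a String instead of a new node, Rel nodes are omitted, emitted instructions are collected in a list, and all class names (ThreeAddress, CodeGen and the node classes) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical syntax-tree classes and a tiny three-address code generator.
final class ThreeAddress {
    abstract static class Expr { }
    static final class Id extends Expr {
        final String name;
        Id(String name) { this.name = name; }
    }
    static final class Constant extends Expr {
        final int value;
        Constant(int value) { this.value = value; }
    }
    static final class Op extends Expr {
        final String op; final Expr y; final Expr z;
        Op(String op, Expr y, Expr z) { this.op = op; this.y = y; this.z = z; }
    }
    static final class Access extends Expr {
        final Id array; final Expr index;
        Access(Id array, Expr index) { this.array = array; this.index = index; }
    }
    static final class Assign extends Expr {
        final Expr target; final Expr value;
        Assign(Expr target, Expr value) { this.target = target; this.value = value; }
    }

    static final class CodeGen {
        final List<String> code = new ArrayList<>();   // emitted instructions
        private int temps = 0;

        private String newTemp() { return "t" + (++temps); }

        // Location denoted by x; only the index of an access is evaluated.
        String lvalue(Expr x) {
            if (x instanceof Id) return ((Id) x).name;
            if (x instanceof Access) {
                Access a = (Access) x;
                return a.array.name + " [ " + rvalue(a.index) + " ]";
            }
            throw new IllegalArgumentException("not an l-value");
        }

        // Emits code computing x and returns the address holding its value.
        String rvalue(Expr x) {
            if (x instanceof Id) return ((Id) x).name;
            if (x instanceof Constant) return Integer.toString(((Constant) x).value);
            if (x instanceof Op) {
                Op o = (Op) x;
                String t = newTemp();      // temporary is chosen before the operands
                String y = rvalue(o.y);
                String z = rvalue(o.z);
                code.add(t + " = " + y + " " + o.op + " " + z);
                return t;
            }
            if (x instanceof Access) {
                String t = newTemp();
                code.add(t + " = " + lvalue(x));
                return t;
            }
            if (x instanceof Assign) {
                Assign a = (Assign) x;
                String z = rvalue(a.value);
                code.add(lvalue(a.target) + " = " + z);
                return z;
            }
            throw new IllegalArgumentException("unknown node");
        }
    }
}
```

Calling rvalue on the tree for a[i] = 2*a[j-k] leaves the four instructions t3 = j - k, t2 = a [ t3 ], t1 = 2 * t2, a [ i ] = t1 in code. The temporaries are numbered in the order the nodes are visited, which is why the first instruction emitted uses t3.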

Better code for expressions
▫ Reduce the number of copy instructions in a subsequent optimization phase
▫ Generate fewer instructions in the first place by taking context into account

Summary (Fig. 2.46)