1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman (1435-1436)

Presentation transcript:

1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )

2 Lexical Analyzer and Parser

3 Parser Accepts string of tokens from lexical analyzer (usually one token at a time) Verifies whether or not string can be generated by grammar Reports syntax errors (recovers if possible)

4 Errors Lexical errors (e.g. misspelled word) Syntax errors (e.g. unbalanced parentheses, missing semicolon) Semantic errors (e.g. type errors) Logical errors (e.g. infinite recursion)

5 Error Handling Report errors clearly and accurately Recover quickly if possible Poor error recovery may lead to an avalanche of errors

6 Error Recovery Panic mode: discard tokens one at a time until a synchronizing token is found Phrase-level recovery: perform a local correction that allows parsing to continue Error productions: augment the grammar to handle predicted, common errors Global correction: use a complex algorithm to compute the least-cost sequence of changes leading to parseable code
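Panic mode is the simplest of these strategies to sketch in code. The fragment below is only an illustration: the token names and the choice of `;`, `}` and end of input as synchronizing tokens are assumptions, not something the slides specify.

```c
#include <stddef.h>

/* Hypothetical token codes; the slides do not define these names. */
enum token { TOK_ID, TOK_PLUS, TOK_SEMI, TOK_RBRACE, TOK_EOF };

/* Returns 1 if tok is one of the (assumed) synchronizing tokens. */
static int is_sync(enum token tok) {
    return tok == TOK_SEMI || tok == TOK_RBRACE || tok == TOK_EOF;
}

/*
 * Panic-mode recovery: starting at index pos, discard tokens one at a
 * time until a synchronizing token is found, and return its index.
 * The token array must end with TOK_EOF, which always stops the scan.
 */
size_t panic_mode_skip(const enum token *toks, size_t pos) {
    while (!is_sync(toks[pos])) {
        pos++;
    }
    return pos;
}
```

After an error, a parser built this way would resume at the returned position, typically just after the synchronizing token.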

7 Context Free Grammars CFGs can represent recursive constructs that regular expressions cannot A CFG consists of: –Tokens (terminals, symbols) –Nonterminals (syntactic variables denoting sets of strings) –Productions (rules specifying how terminals and nonterminals can combine to form strings) –A start symbol (the set of strings it denotes is the language of the grammar)

8 Derivations (Part 1) One definition of language: the set of strings that have valid parse trees Another definition: the set of strings that can be derived from the start symbol Example: E → E + E | E * E | (E) | -E | id E => -E (read "E derives -E") E => -E => -(E) => -(id)

9 Derivations (Part 2) αAβ => αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols If α1 => α2 => … => αn, we say α1 derives αn "=>" means derives in one step "=>*" means derives in zero or more steps "=>+" means derives in one or more steps

10 Sentences and Languages Let L(G) be the language generated by the grammar G with start symbol S: –Strings in L(G) may contain only tokens of G –A string w is in L(G) if and only if S =>+ w –Such a string w is a sentence of G Any language that can be generated by a CFG is said to be a context-free language If two grammars generate the same language, they are said to be equivalent

11 Sentential Forms If S =>* α, where α may contain nonterminals, we say that α is a sentential form of G A sentence is a sentential form with no nonterminals

12 Leftmost Derivations Only the leftmost nonterminal in any sentential form is replaced at each step A leftmost step can be written as wAγ =>lm wδγ –w consists of only terminals –A → δ is the production applied –γ is a string of grammar symbols If α derives β by a leftmost derivation, then we write α =>lm* β If S =>lm* α then we say that α is a left-sentential form of the grammar Analogous terms exist for rightmost derivations

13 Parse Trees A parse tree can be viewed as a graphical representation of a derivation Every parse tree has a unique leftmost derivation (not true of every sentence) An ambiguous grammar has: –more than one parse tree for at least one sentence –more than one leftmost derivation for at least one sentence
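As a small worked example of this criterion, take the expression grammar from slide 8, E → E + E | E * E | (E) | -E | id. The sentence id + id * id has two distinct leftmost derivations, and therefore two parse trees, which is enough to show that this grammar is ambiguous:

```latex
\begin{align*}
E &\Rightarrow_{lm} E + E
   \Rightarrow_{lm} \mathbf{id} + E
   \Rightarrow_{lm} \mathbf{id} + E * E
   \Rightarrow_{lm} \mathbf{id} + \mathbf{id} * E
   \Rightarrow_{lm} \mathbf{id} + \mathbf{id} * \mathbf{id} \\
E &\Rightarrow_{lm} E * E
   \Rightarrow_{lm} E + E * E
   \Rightarrow_{lm} \mathbf{id} + E * E
   \Rightarrow_{lm} \mathbf{id} + \mathbf{id} * E
   \Rightarrow_{lm} \mathbf{id} + \mathbf{id} * \mathbf{id}
\end{align*}
```

The first derivation groups the sentence as id + (id * id), the second as (id + id) * id.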

14 Regular Expressions vs. CFGs Every construct that can be described by an RE can also be described by a CFG Why use REs at all? –Lexical rules are simpler to describe this way –REs are often easier to read –More efficient lexical analyzers can be constructed

15 Eliminating Ambiguity (1) stmt → if expr then stmt | if expr then stmt else stmt | other Example: if E1 then if E2 then S1 else S2

16 Eliminating Ambiguity (2)

17 Eliminating Ambiguity (3) stmt → matched | unmatched matched → if expr then matched else matched | other unmatched → if expr then stmt | if expr then matched else unmatched

18 Left Recursion A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α Most top-down parsing methods cannot handle left-recursive grammars

19 Eliminating Left Recursion (1) Immediate left recursion: A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn (where no βi begins with A) is replaced by: A → β1A' | β2A' | … | βnA' A' → α1A' | α2A' | … | αmA' | ε Harder case (left recursion through more than one step): S → Aa | b A → Ac | Sd | ε

20 Eliminating Left Recursion (2) First arrange the nonterminals in some order A1, A2, …, An Apply the following algorithm:
for i = 1 to n {
  for j = 1 to i-1 {
    replace each production of the form Ai → Ajγ by the productions Ai → δ1γ | δ2γ | … | δkγ, where Aj → δ1 | δ2 | … | δk are the current Aj productions
  }
  eliminate the immediate left recursion among the Ai productions
}
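To make the algorithm concrete, here is a hand-worked trace on the "harder case" grammar from slide 19, S → Aa | b, A → Ac | Sd | ε, assuming the order A1 = S, A2 = A. The algorithm is normally stated for grammars without ε-productions; in this particular grammar the production A → ε turns out to be harmless.

```latex
\begin{align*}
&i = 1:\ \text{no immediate left recursion among the } S \text{ productions, so nothing changes.} \\
&i = 2,\ j = 1:\ \text{substitute the } S \text{ productions into } A \to S d: \\
&\qquad A \to A c \mid A a d \mid b d \mid \varepsilon \\
&\text{eliminate the immediate left recursion among the } A \text{ productions:} \\
&\qquad S \to A a \mid b \\
&\qquad A \to b d A' \mid A' \\
&\qquad A' \to c A' \mid a d A' \mid \varepsilon
\end{align*}
```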

21 Left Factoring Rewriting productions to delay decisions Helpful for predictive parsing Not guaranteed to remove ambiguity A → αβ1 | αβ2 becomes: A → αA' A' → β1 | β2
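As an illustration, the rule can be applied to the if-then-else grammar from slide 15, where both alternatives share the prefix "if expr then stmt". The nonterminal name stmt' below is introduced only for this example.

```latex
\begin{align*}
\textit{stmt} &\to \mathbf{if}\ \textit{expr}\ \mathbf{then}\ \textit{stmt}
  \mid \mathbf{if}\ \textit{expr}\ \mathbf{then}\ \textit{stmt}\ \mathbf{else}\ \textit{stmt}
  \mid \mathbf{other} \\
&\text{becomes} \\
\textit{stmt} &\to \mathbf{if}\ \textit{expr}\ \mathbf{then}\ \textit{stmt}\ \textit{stmt}' \mid \mathbf{other} \\
\textit{stmt}' &\to \mathbf{else}\ \textit{stmt} \mid \varepsilon
\end{align*}
```

The factored grammar postpones the choice until the parser sees whether an else follows, but it is still ambiguous: in a nested if, the else can attach to either if. This is a concrete case of left factoring not removing ambiguity.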

22 Top Down Parsing Can be viewed two ways: –Attempt to find a leftmost derivation for the input string –Attempt to create a parse tree, starting at the root and creating nodes in preorder General form is recursive descent parsing –May require backtracking –Backtracking parsers are not used frequently because backtracking is rarely needed

23 Predictive Parsing A special case of recursive-descent parsing that does not require backtracking Must always know which production to use based on the current input symbol Can often create an appropriate grammar by: –removing left recursion –left factoring the resulting grammar

24 FIRST FIRST(α) is the set of all terminals that begin any string derived from α; if α =>* ε, then ε is also in FIRST(α) Computing FIRST: –If X is a terminal, FIRST(X) = {X} –If X → ε is a production, add ε to FIRST(X) –If X is a nonterminal and X → Y1 Y2 … Yn is a production: For all terminals a, add a to FIRST(X) if a is a member of FIRST(Yi) for some i and ε is a member of all of FIRST(Y1), FIRST(Y2), …, FIRST(Yi-1) If ε is a member of all of FIRST(Y1), FIRST(Y2), …, FIRST(Yn), add ε to FIRST(X)

25 FOLLOW FOLLOW(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form More formally, a is in FOLLOW(A) if and only if there exists a derivation of the form S =>* αAaβ $ is in FOLLOW(A) if and only if there exists a derivation of the form S =>* αA

26 Computing FOLLOW Place $ in FOLLOW(S) If there is a production A → αBβ, then everything in FIRST(β) (except for ε) is in FOLLOW(B) If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is also in FOLLOW(B)

27 Left Recursion Example E → E + T | T T → T * F | F F → (E) | id We can remove the left recursion: E → TX X → +TX | ε T → FY Y → *FY | ε F → (E) | id

28 FIRST and FOLLOW Example E → TX X → +TX | ε T → FY Y → *FY | ε F → (E) | id First(E) = First(T) = First(F) = { (, id } First(X) = { +, ε } First(Y) = { *, ε } Follow(E) = Follow(X) = { ), $ } Follow(T) = Follow(Y) = { +, ), $ } Follow(F) = { +, *, ), $ }
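The sets in this example can be checked mechanically. The sketch below applies the FIRST and FOLLOW rules from slides 24 and 26 to this grammar by repeating them until nothing changes. The encoding is an assumption made for illustration (symbol names, bitmask sets, `END` standing for $ and `EPS` for ε); it is one possible way to do it, not the course's implementation.

```c
/* Symbol codes. Terminals come first, so a terminal's code is also its
   bit position in a set. All names here are illustrative. */
enum { PLUS, STAR, LPAREN, RPAREN, ID, END, EPS, NTERM,
       NT_E = NTERM, NT_X, NT_T, NT_Y, NT_F, NSYM };

#define BIT(s) (1u << (s))

struct prod { int lhs; int len; int rhs[3]; };

/* E → TX   X → +TX | ε   T → FY   Y → *FY | ε   F → (E) | id */
static const struct prod G[] = {
    { NT_E, 2, { NT_T, NT_X } },
    { NT_X, 3, { PLUS, NT_T, NT_X } },
    { NT_X, 0, { 0 } },                    /* ε-production: empty body */
    { NT_T, 2, { NT_F, NT_Y } },
    { NT_Y, 3, { STAR, NT_F, NT_Y } },
    { NT_Y, 0, { 0 } },
    { NT_F, 3, { LPAREN, NT_E, RPAREN } },
    { NT_F, 1, { ID } },
};
enum { NPROD = sizeof G / sizeof G[0] };

unsigned first[NSYM], follow[NSYM];

/* FIRST of the string rhs[from..len-1]; includes EPS if all of it can vanish. */
static unsigned first_of(const int *rhs, int from, int len) {
    unsigned set = 0;
    for (int i = from; i < len; i++) {
        set |= first[rhs[i]] & ~BIT(EPS);
        if (!(first[rhs[i]] & BIT(EPS))) return set;
    }
    return set | BIT(EPS);
}

void compute_sets(void) {
    int changed = 1;
    for (int s = 0; s < NSYM; s++) first[s] = follow[s] = 0;
    for (int t = 0; t < NTERM; t++) first[t] = BIT(t);   /* FIRST(a) = {a} */

    while (changed) {                       /* FIRST: iterate to a fixed point */
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            unsigned old = first[G[p].lhs];
            first[G[p].lhs] |= first_of(G[p].rhs, 0, G[p].len);
            if (first[G[p].lhs] != old) changed = 1;
        }
    }

    follow[NT_E] = BIT(END);                /* $ is in FOLLOW(start symbol) */
    changed = 1;
    while (changed) {                       /* FOLLOW: iterate to a fixed point */
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            for (int i = 0; i < G[p].len; i++) {
                int b = G[p].rhs[i];
                if (b < NTERM) continue;    /* FOLLOW is defined for nonterminals */
                unsigned old = follow[b];
                unsigned rest = first_of(G[p].rhs, i + 1, G[p].len);
                follow[b] |= rest & ~BIT(EPS);
                if (rest & BIT(EPS)) follow[b] |= follow[G[p].lhs];
                if (follow[b] != old) changed = 1;
            }
        }
    }
}
```

After calling `compute_sets()`, `first[NT_X]` holds the bits for + and ε, and `follow[NT_F]` holds the bits for +, *, ) and $, matching the sets listed on the slide.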

29 Parser After removing the ambiguity and the left recursion, left factoring, and computing the FIRST and FOLLOW sets, we can write the parser as follows: 1- The parser's entry function is named after the start symbol. 2- For each nonterminal there is a function with the same name as that nonterminal. So if we have S → α, we write: void S() { T(α); /* T is the transformation defined on the next slide */ }

30 T is defined as follows: 1- if "a" is a terminal with token a: T("a") = match(a) 2- for any nonterminal A: T(A) = A() 3- T(α1 α2 … αn) = T(α1); T(α2); …; T(αn); 4- T(α1 | α2 | … | αn) = switch (lookahead) { case First(α1): T(α1); break; case First(α2): T(α2); break; … case First(αn): T(αn); break; default: error("syntax error"); }

31 5- If S → α1 | α2 | … | αn | ε, the ε alternative is chosen when the lookahead is in Follow(S): switch (lookahead) { case First(α1): T(α1); break; case First(α2): T(α2); break; … case First(αn): T(αn); break; case Follow(S): break; /* do nothing */ default: error("syntax error"); }

32 Exercise Write the parser for the last grammar, where: –plus is the token for '+' –mult is the token for '*' –closep is the token for ')' –openp is the token for '(' –id is the token for identifiers

33 Solution (assumptions: match() advances lookahead to the next token, so the functions do not call lexan() themselves; done is an assumed token for the end of input, $)
E() { T(); X(); }
X() {
  switch (lookahead) {
    case plus: match(plus); T(); X(); break;
    case closep: case done: break; /* Follow(X): X → ε */
    default: error("syntax error");
  }
}
T() { F(); Y(); }
Y() {
  switch (lookahead) {
    case mult: match(mult); F(); Y(); break;
    case plus: case closep: case done: break; /* Follow(Y): Y → ε */
    default: error("syntax error");
  }
}

34 F() {
  switch (lookahead) {
    case openp: match(openp); E(); match(closep); break;
    case id: match(id); break;
    default: error("syntax error");
  }
}
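For readers who want to try the exercise, the following is a minimal self-contained sketch of such a parser for the grammar E → TX, X → +TX | ε, T → FY, Y → *FY | ε, F → (E) | id. The token names plus, mult, openp, closep and id come from the exercise. Everything else is an assumption made for this sketch: the `done` and `bad` tokens, the toy scanner that treats any run of letters as an identifier, the error flag used instead of aborting, and the `parse()` wrapper.

```c
#include <ctype.h>

/* plus, mult, openp, closep, id: from the exercise.
   done (end of input) and bad (unknown character): assumed additions. */
enum token { plus, mult, openp, closep, id, done, bad };

static const char *src;       /* remaining input text */
static enum token lookahead;  /* current token */
static int failed;            /* set once a syntax error is seen */

/* Toy scanner: identifiers are runs of letters, blanks are skipped. */
static enum token lexan(void) {
    while (*src == ' ') src++;
    if (*src == '\0') return done;
    if (isalpha((unsigned char)*src)) {
        while (isalpha((unsigned char)*src)) src++;
        return id;
    }
    switch (*src++) {
    case '+': return plus;
    case '*': return mult;
    case '(': return openp;
    case ')': return closep;
    default:  return bad;
    }
}

static void error(void) { failed = 1; }

/* Consume the expected token and fetch the next one, or record an error. */
static void match(enum token t) {
    if (lookahead == t) lookahead = lexan();
    else error();
}

static void E(void);

static void F(void) {                       /* F → (E) | id */
    switch (lookahead) {
    case openp: match(openp); E(); match(closep); break;
    case id:    match(id); break;
    default:    error();
    }
}

static void Y(void) {                       /* Y → *FY | ε */
    switch (lookahead) {
    case mult: match(mult); F(); Y(); break;
    case plus: case closep: case done: break;   /* Follow(Y) */
    default: error();
    }
}

static void T(void) { F(); Y(); }           /* T → FY */

static void X(void) {                       /* X → +TX | ε */
    switch (lookahead) {
    case plus: match(plus); T(); X(); break;
    case closep: case done: break;          /* Follow(X) */
    default: error();
    }
}

static void E(void) { T(); X(); }           /* E → TX */

/* Returns 1 if text is a sentence of the grammar, 0 otherwise. */
int parse(const char *text) {
    src = text;
    failed = 0;
    lookahead = lexan();
    E();
    return !failed && lookahead == done;
}
```

With this sketch, `parse("a+b*c")` and `parse("(a+b)*c")` return 1, while `parse("a+")` and `parse("a)")` return 0. The lookahead is read once before E() is called and then only inside match(), which is why none of the nonterminal functions calls lexan() directly.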