Syntax Analyzer (Parser)

Slides:



Advertisements
Similar presentations
lec02-parserCFG March 27, 2017 Syntax Analyzer
Advertisements

1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
LESSON 18.
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
Context-Free Grammars Lecture 7
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Compiler Constreuction 1 Chapter 4 Syntax Analysis Topics to cover: Context-Free Grammars: Concepts and Notation Writing and rewriting a grammar Syntax.
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
1 Chapter 4: Top-Down Parsing. 2 Objectives of Top-Down Parsing an attempt to find a leftmost derivation for an input string. an attempt to construct.
COP4020 Programming Languages
Parsing II : Top-down Parsing Lecture 7 CS 4318/5331 Apan Qasem Texas State University Spring 2015 *some slides adopted from Cooper and Torczon.
1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together.
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
Chapter 3 Chang Chi-Chung Parse tree intermediate representation The Role of the Parser Lexical Analyzer Parser Source Program Token Symbol.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
Chapter 9 Syntax Analysis Winter 2007 SEG2101 Chapter 9.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Chapter 4 Syntax Analysis. Syntax Error Handling Example: 1. program prmax(input,output) 2. var 3. x,y:integer; 4. function max(i:integer, j:integer)
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
CS 321 Programming Languages and Compilers VI. Parsing.
Context-Free Grammars
Chapter 5 Context-Free Grammars
Grammars CPSC 5135.
PART I: overview material
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Chapter 4. Syntax Analysis (1). 2 Application of a production  A  in a derivation step  i   i+1.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
Bernd Fischer RW713: Compiler and Software Language Engineering.
CFG1 CSC 4181Compiler Construction Context-Free Grammars Using grammars in parsers.
Introduction to Parsing
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Top-Down Parsing.
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman Summer 2004 (1425)
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
COMP 3438 – Part II - Lecture 4 Syntax Analysis I Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Last Chapter Review Source code characters combination lexemes tokens pattern Non-Formalization Description Formalization Description Regular Expression.
Parsing COMP 3002 School of Computer Science. 2 The Structure of a Compiler syntactic analyzer code generator program text interm. rep. machine code tokenizer.
Syntax analysis Jianguo Lu School of Computer Science University of Windsor.
lec02-parserCFG May 8, 2018 Syntax Analyzer
CS510 Compiler Lecture 4.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Syntax Specification and Analysis
lec02-parserCFG July 30, 2018 Syntax Analyzer
Compiler Construction
CSE 3302 Programming Languages
Context-Free Grammars
Syntax Analysis source program lexical analyzer tokens syntax analyzer
Lecture 7: Introduction to Parsing (Syntax Analysis)
CSC 4181Compiler Construction Context-Free Grammars
R.Rajkumar Asst.Professor CSE
CSC 4181 Compiler Construction Context-Free Grammars
BNF 9-Apr-19.
lec02-parserCFG May 27, 2019 Syntax Analyzer
Context-Free Grammars
COMPILER CONSTRUCTION
Presentation transcript:

Syntax Analyzer (Parser)

Lexical Analyzer and Parser

Parser Accepts string of tokens from lexical analyzer (one token at a time) Verifies whether or not string can be generated by grammar Reports syntax errors (recovers if possible)

Errors Lexical errors (e.g. misspelled word) Syntax errors (e.g. unbalanced parentheses, missing semicolon) Semantic errors (e.g. type errors) Logical errors (e.g. infinite loop)

Error Handling Report errors clearly and accurately Recover quickly if possible Poor error recover may lead to bunch of errors Very poor programming style if source code does not handle errors!!

Context Free Grammars CFGs can represent recursive constructs that regular expressions can not A CFG consists of: A finite set of terminals (in our case, this will be the set of tokens) A finite set of non-terminals (syntactic-variables) A finite set of productions rules in the following form: A   where A is a non-terminal and  is a string of terminals and non-terminals (including the empty string) A start symbol (one of the non-terminal symbol)

Derivations (Part 1) One definition of language: the set of strings that have valid parse trees Another definition: the set of strings that can be derived from the start symbol E  E + E | E * E | (E) | – E | id E => -E (read E derives –E) E => -E => -(E) => -(id) (read E derives –(id))

Derivations (Part 2) αAβ => αγβ if A  γ is a production and α and β are arbitrary strings of grammar symbols If a1 => a2 => … => an, we say a1 derives an => means derives in one step *=> means derives in zero or more steps +=> means derives in one or more steps

Sentences and Languages Let L(G) be the language generated by the grammar G with start symbol S: Strings in L(G) may contain only tokens of G A string w is in L(G) if and only if S +=> w Such a string w is a sentence of G Any language that can be generated by a CFG is said to be a context-free language If two grammars generate the same language, they are said to be equivalent

Sentential and Sentences let S *=>  - If  contains non-terminals, it is called as a sentential form of G. - If  does not contain non-terminals, it is called as a sentence of G.

Leftmost and rightmost Derivations Leftmost Derivations -- Only the leftmost nonterminal in any sentential form is replaced at each step If α derives β by a leftmost derivation, then we write α lm*=> β Rightmost Derivations -- Only the rightmost nonterminal in any sentential form is replaced at each step If α derives β by a rightmost derivation, then we write α rm*=> β

Example Left-Most Derivation E  -E  -(E)  -(E+E)  -(id+E)  -(id+id) Right-Most Derivation E  -E  -(E)  -(E+E)  -(E+id)  -(id+id) lm lm lm lm lm rm rm rm rm rm

Parse Tree A parse tree can be seen as a graphical representation of a derivation. -Inner nodes are non-terminal symbols. -The leaves of a parse tree are terminal symbols. E  -E  -(E) E - ( ) E + - ( ) E -  -(E+E) E + - ( ) id E id + - ( )  -(id+E)  -(id+id)

Ambiguity A grammar produces more than one parse tree for a sentence is called as an ambiguous grammar. E  E+E  id+E  id+E*E  id+id*E  id+id*id E + id * E  E*E  E+E*E  id+E*E  id+id*E  id+id*id E id + *

Ambiguity (cont.) For the most parsers, the grammar must be unambiguous. unambiguous grammar  unique selection of the parse tree for a sentence We should eliminate the ambiguity in the grammar during the design phase of the compiler. We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) to disambiguate that grammar to restrict to this choice.

Ambiguity (cont.) 2 1 stmt  if expr then stmt | if expr then stmt else stmt | otherstmts if E1 then if E2 then S1 else S2 stmt if expr then stmt else stmt E1 if expr then stmt S2 E2 S1 stmt if expr then stmt E1 if expr then stmt else stmt E2 S1 S2 1 2

Ambiguity (cont.) We prefer the second parse tree (else matches with closest if) So, we have to disambiguate our grammar to reflect this choice. The unambiguous grammar will be: stmt  matchedstmt | unmatchedstmt matchedstmt  if expr then matchedstmt else matchedstmt | otherstmt unmatched  if expr then stmt | if expr then matchedstm else unmatchedstm

Ambiguity – Operator Precedence Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence rules. E  E+E | E*E | E^E | id | (E) disambiguate the grammar precedence: ^ (right to left) * (left to right) + (left to right) E  E+T | T T  T*F | F F  G^F | G G  id | (E) 

Left Recursion A grammar is left recursion if it has a non-terminal A such that there is a derivation A +=> Aα for some string  Most top-down parsing methods can not handle left-recursive grammars

Eliminate Left-Recursion A  A  |  where  does not start with A  eliminate left recursion A   A’ A’   A’ |  an equivalent grammar In general, A  Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn A  β1A’ | β2A’ | … | βnA’ A’  α1A’ | α2A’ | … | αmA’ | ε

Eliminate Left-Recursion -- Example E  E+T | T T  T*F | F F  id | (E)  eliminate left recursion E  T E’ E’  +T E’ |  T  F T’ T’  *F T’ |  F  id | (E)

Left-Recursion -- Problem S  Aa | b A  Sc | d This grammar is not immediately left-recursive, but it is still left-recursive. So, we have to eliminate all left-recursions from our grammar by following some kind of algorithm!!

Eliminate Left-Recursion -- Algorithm - Arrange non-terminals in some order: A1 ... An - for i from 1 to n do { - for j from 1 to i-1 do { replace each production Ai  Aj  by Ai  1  | ... | k  where Aj  1 | ... | k } // eliminate immediate left-recursions among Ai productions }

Eliminate Left-Recursion -- Example S  Aa | b A  Ac | Sd | f - Replace A  Sd with A  Aad | bd So, we will have A  Ac | Aad | bd | f - Eliminate the immediate left-recursion in A A  bdA’ | fA’ A’  cA’ | adA’ |  So, the resulting equivalent grammar which is not left-recursive is:

Eliminate Left-Recursion – Example2 S  Aa | b A  Ac | Sd | f - Eliminate the immediate left-recursion in A A  SdA’ | fA’ A’  cA’ |  - Replace S  Aa with S  SdA’a | fA’a - So, we will have S  SdA’a | fA’a | b - Eliminate the immediate left-recursion in S S  fA’aS’ | bS’ S’  dA’aS’ |  So, the resulting equivalent grammar which is not left-recursive is:

Left Factoring stmt  if expr then stmt else stmt | if expr then stmt when we see if, we cannot know which production rule to choose to re-write stmt in the derivation. In general, A  αβ1 | αβ2 A  αA’ A’  β1 | β2

Limitations of CFGs There are some language constructions in the programming languages which are not context-free. Example: L1 = {wcw | w is in (a|b)*} CFG can not verify repeated strings Example: L2 = {anbmcndm | n≥1 & m≥1} Can not verify repeated counts Therefore, some checks put off until semantic analysis