Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.


Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE COMPILER DESIGN (170701)

Introduction
Syntax analysis is the second phase of a compiler, following lexical analysis. It checks the syntax of the source language: it takes the tokens produced by the lexical analyzer and groups them so that programming structures can be recognized.
GPERI – CD - UNIT-3

Introduction
If, after grouping the tokens, some syntax cannot be recognized, a syntax error is reported. The syntax analyzer is a major component of the front end of a compiler. The syntax of a programming language is specified using a notation called context-free grammar.

Role of the parser
It obtains a string of tokens from the lexical analyzer.
It groups the tokens to identify larger structures in the program.
It should report any syntax error in the program.
It should recover from errors so that it can continue to process the rest of the input.

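Role of the parser
[Figure: the source program enters the lexical analyzer, which returns a token to the parser on each getNextToken request; the parser produces a parse tree or reports a syntax error; the symbol table is shared by the lexical analyzer and the parser.]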

Context-Free Grammar
A grammar involves four quantities: terminals, non-terminals, a start symbol and productions. One non-terminal is selected as the start symbol. Each production consists of a non-terminal, followed by an arrow (->) or (:=), followed by a string of non-terminals and terminals.

Context-Free Grammar
A context-free grammar (CFG) is defined as a 4-tuple (V_N, ∑, P, S), where:
V_N = set of non-terminals
∑ = set of terminals
S = start symbol
P = set of production rules, each of the form: one non-terminal -> finite string of terminals and/or non-terminals

Context-Free Grammar
Example:
stmt -> if ( expr ) stmt else stmt
where:
Non-terminals: stmt, expr
Terminals: if, (, ), else
Start symbol: stmt

Context-Free Grammar
Example:
expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor

Context-Free Grammar
Example (continued):
factor -> ( expression )
factor -> id
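The 4-tuple view of the expression grammar can be made concrete with a small sketch. The dictionary layout and the names PRODUCTIONS, START and grammar_tuple are illustrative assumptions, not part of the slides; the minus sign is written as an ASCII "-".

```python
# Productions of the expression grammar, keyed by the non-terminal on the left.
# Each body is a list of grammar symbols (hypothetical representation).
PRODUCTIONS = {
    "expression": [["expression", "+", "term"],
                   ["expression", "-", "term"],
                   ["term"]],
    "term": [["term", "*", "factor"],
             ["term", "/", "factor"],
             ["factor"]],
    "factor": [["(", "expression", ")"],
               ["id"]],
}
START = "expression"


def grammar_tuple(productions, start):
    """Return (V_N, Sigma, P, S) for a grammar stored as a dict."""
    non_terminals = set(productions)
    # Every symbol that never appears on a left-hand side is a terminal.
    terminals = {sym
                 for bodies in productions.values()
                 for body in bodies
                 for sym in body
                 if sym not in non_terminals}
    return non_terminals, terminals, productions, start


V_N, SIGMA, P, S = grammar_tuple(PRODUCTIONS, START)
print(sorted(V_N))
print(sorted(SIGMA))
```

Deriving the terminal set from the productions, rather than listing it by hand, keeps the two from drifting apart when the grammar is edited.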

Context-Free Grammar
Notational conventions for terminal symbols:
Lowercase letters such as a, b, c.
Operator symbols such as +, *, -, /.
Punctuation symbols such as parentheses and commas.
The digits 0, 1, ..., 9.
Boldface strings such as id or if, each of which represents a single terminal symbol.

Context-Free Grammar
Notational conventions for non-terminal symbols:
Uppercase letters such as A, B, C.
The letter S, which, when it appears, is usually the start symbol.
Lowercase italic names such as expr or stmt.

Derivation
The construction of a parse tree can be made precise by taking a derivational view, in which productions are treated as rewriting rules. Beginning with the start symbol, each rewriting step replaces a non-terminal by the body of one of its productions.
E -> E + E | E * E | - E | ( E ) | id

Derivation
list -> list + digit
list -> list - digit
list -> digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derivation
list => list + digit
=> list - digit + digit
=> digit - digit + digit
=> 9 - digit + digit
=> 9 - 5 + digit
=> 9 - …

Derivation
This is an example of a leftmost derivation, because the leftmost non-terminal (underlined on the slide) is replaced in each step. Likewise, a rightmost derivation replaces the rightmost non-terminal in each step.
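A leftmost derivation can be replayed mechanically. The sketch below assumes the list/digit grammar stored as a Python dict and a hypothetical helper leftmost_derivation; the final digit 2 is an assumed choice made only so that the example ends in a sentence.

```python
# The list/digit grammar as a dict from non-terminal to production bodies.
PRODUCTIONS = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[str(d)] for d in range(10)],
}


def leftmost_derivation(start, choices):
    """Apply the chosen bodies in order, always to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for body in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        index = next(i for i, sym in enumerate(form) if sym in PRODUCTIONS)
        head = form[index]
        if body not in PRODUCTIONS[head]:
            raise ValueError(f"{head} -> {' '.join(body)} is not a production")
        form = form[:index] + body + form[index + 1:]
        steps.append(" ".join(form))
    return steps


# Bodies applied at each step (the last digit, 2, is assumed for illustration).
CHOICES = [
    ["list", "+", "digit"],
    ["list", "-", "digit"],
    ["digit"],
    ["9"],
    ["5"],
    ["2"],
]
STEPS = leftmost_derivation("list", CHOICES)
print(" => ".join(STEPS))
```

A rightmost derivation would differ only in searching the sentential form from the right when picking the non-terminal to replace.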

Derivation
Construct a CFG for the language L = {w c w^R : w ∈ {a, b}*}, where w^R is the reverse of w.
Sol: G = (V_N, ∑, P, S), where V_N = {S} and ∑ = {a, b, c}. The production rules P are:
S -> a S a
S -> b S b
S -> c
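The productions S -> a S a | b S b | c wrap the symbol c in matching letters, so the right half of every generated string mirrors the left half. A minimal recognizer that follows the productions directly might look like this; the function name derives is an assumption for illustration.

```python
def derives(s):
    """Return True if s can be derived from S -> a S a | b S b | c."""
    if s == "c":                      # S -> c
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])       # S -> a S a  or  S -> b S b
    return False


print(derives("abcba"))
print(derives("abcab"))
```

Each recursive call peels off one application of S -> a S a or S -> b S b, so the recursion depth equals the length of w.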

Parse Tree
A string generated by a context-free grammar can be represented by a hierarchical structure called a tree. Trees that represent derivations are called derivation trees, parse trees or syntax trees.

Parse Tree
Characteristics of a parse tree:
The root of the tree is labeled by the start symbol.
Each leaf of the tree is labeled by a terminal (token) or ϵ.
Each interior node is labeled by a non-terminal.
If A -> X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn, where each Xi is a terminal, a non-terminal or ϵ (ϵ denotes the empty string).

Parse Tree - Example
[Figure: parse tree for the list grammar, built from nested list and digit nodes.]

Exercise
Write a CFG that generates strings having an equal number of a's and b's.
Sol: CFG G = (V_N, ∑, P, S), where V_N = {S}, ∑ = {a, b} and P is defined as:
S -> aSb
S -> bSa
S -> SS
S -> ϵ
The rule S -> SS is needed for strings such as abba, which begin and end with the same letter.

Exercise
Construct a CFG for the language L = {a^n b^n : n >= 1}.
Sol: CFG G = (V_N, ∑, P, S), where V_N = {S}, ∑ = {a, b} and P is defined as:
S -> aSb
S -> ab

Exercise
Write a CFG that generates strings of balanced parentheses.
Sol: The grammar accepts strings in which left and right parentheses are balanced, e.g. (), ((( ))).
CFG G = (V_N, ∑, P, S), where V_N = {S}, ∑ = { (, ) } and P is given by:
S -> SS
S -> (S)
S -> ϵ

Exercise
A CFG is given by the productions:
S -> a | a A S
A -> b S
Obtain the derivation tree of the word a b a a b a a.

Exercise
Given the grammar G = (V_N, ∑, P, S), where V_N = {E}, S = E, ∑ = {id, +, *, (, )} and P consists of
E -> E + E | E * E | (E) | id
obtain the derivation trees for id*id + id and (id+id)*id.

Ambiguity
A grammar is said to be ambiguous if there exists more than one parse tree for the same sentence.
Example: S -> aSbS | bSaS | ϵ
The string "abab" has two different parse trees.
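The claim about "abab" can be checked by counting parse trees. The sketch below is one way to do it for the grammar S -> aSbS | bSaS | ϵ specifically; the function name count_parse_trees and the splitting strategy are assumptions for illustration, not a general ambiguity test.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(s):
    """Count parse trees of s under S -> a S b S | b S a S | epsilon."""
    if s == "":
        return 1                      # S -> epsilon
    first = s[0]
    if first not in "ab":
        return 0
    # S -> a S b S needs a later 'b'; S -> b S a S needs a later 'a'.
    closer = "b" if first == "a" else "a"
    total = 0
    for k in range(1, len(s)):
        if s[k] == closer:
            # s[1:k] is derived by the inner S, s[k+1:] by the trailing S.
            total += count_parse_trees(s[1:k]) * count_parse_trees(s[k + 1:])
    return total


print(count_parse_trees("abab"))
```

A count greater than one for any string shows the grammar is ambiguous; a count of one for a few strings proves nothing about the grammar as a whole.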

Ambiguity
A classical example of an ambiguous grammar is the if-then-else construct of many programming languages. Most languages have both if-then and if-then-else versions of the statement. The grammar rules for it are as follows:
stmt -> if condition then stmt else stmt
| if condition then stmt

Ambiguity
Consider the following code segment:
if a>b then if c>d then x=y else x=z

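Ambiguity
Leftmost derivation
[Figure: first parse tree. The root is stmt -> if condition then stmt else stmt with condition a>b; the then-branch is the inner stmt -> if condition then stmt (condition c>d, statement x=y) and the else-branch is x=z, so the else is attached to the outer if.]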

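Ambiguity
Rightmost derivation
[Figure: second parse tree. The root is stmt -> if condition then stmt with condition a>b; the then-branch is the inner stmt -> if condition then stmt else stmt (condition c>d, then x=y, else x=z), so the else is attached to the inner if.]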

Eliminating Ambiguity
Ambiguities may be eliminated by rewriting the grammar. The if-then-else grammar may be rewritten as:
stmt -> m_stmt | unm_stmt
m_stmt -> if condition then m_stmt else m_stmt | other_stmt
unm_stmt -> if condition then stmt | if condition then m_stmt else unm_stmt

Eliminating Ambiguity
Another technique is to modify the language slightly. Many languages require that an if have a matching endif. The grammar is then modified as:
stmt -> if condition then stmt else stmt endif
| if condition then stmt endif

Eliminating Ambiguity
Example grammar:
E -> I
E -> E + E
E -> E * E
E -> (E)
I -> a | b | c
This grammar has two causes of ambiguity:
1. The precedence of operators is not respected.
2. A sequence of identical operators can group either from the left or from the right.
If the grammar is rewritten to fix the precedence and grouping, the ambiguity is removed.

Eliminating Ambiguity
The unambiguous grammar:
E -> T
E -> E + T
T -> F
T -> T * F
F -> I
F -> (E)
I -> a | b | c

Eliminating Ambiguity
The resulting parse tree for a + b * c (built up step by step on the slides):
E
├── E
│   └── T
│       └── F
│           └── I
│               └── a
├── +
└── T
    ├── T
    │   └── F
    │       └── I
    │           └── b
    ├── *
    └── F
        └── I
            └── c

Left Recursion
A grammar is left recursive if it has a non-terminal A for which there is a derivation, in one or more steps, of a string of the form Aα. The presence of left recursion creates difficulties when designing parsers; a top-down parser, for instance, can loop forever on a left-recursive rule.
Types of left recursion:
Immediate left recursion
General left recursion

Left Recursion
Immediate left recursion: It occurs when a non-terminal A has a production rule of the form A -> Aα. In other words, a production is left recursive if the leftmost symbol on its right side is the same as the non-terminal on its left side.

Left Recursion
Immediate left recursion (continued): It can be eliminated by introducing a new non-terminal symbol, say A’. The rule A -> Aα | β is rewritten as:
A -> βA’
A’ -> αA’ | ϵ

Left Recursion
Immediate left recursion (continued): In general, the rule
A -> Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
is replaced by:
A -> β1A’ | β2A’ | … | βnA’
A’ -> α1A’ | α2A’ | … | αmA’ | ϵ
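The general rewriting rule translates almost directly into code. This is a minimal sketch, assuming production bodies are non-empty lists of symbols, that no β alternative is ϵ, and that the new non-terminal can be named by appending an ASCII apostrophe; the names eliminate_immediate and EPSILON are hypothetical.

```python
EPSILON = "ϵ"


def eliminate_immediate(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | epsilon."""
    # Bodies that start with A contribute the alpha parts (A stripped off).
    alphas = [body[1:] for body in bodies if body[0] == head]
    # All other bodies are the beta alternatives.
    betas = [body for body in bodies if body[0] != head]
    if not alphas:
        return {head: bodies}         # no immediate left recursion
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[EPSILON]],
    }


RESULT = eliminate_immediate("E", [["E", "+", "T"], ["T"]])
for lhs, rhs in RESULT.items():
    print(lhs, "->", " | ".join(" ".join(body) for body in rhs))
```

The function returns a new dict rather than mutating its input, which makes it easy to apply to one non-terminal at a time.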

Left Recursion
Immediate left recursion (continued): Example.
E -> E + T | T
T -> T * F | F
F -> (E) | id
After eliminating left recursion:
E -> TE’
E’ -> +TE’ | ϵ
T -> FT’
T’ -> *FT’ | ϵ
F -> (E) | id
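Once left recursion is removed, each non-terminal of the rewritten grammar can become one procedure of a recursive-descent recognizer. The sketch below assumes the input is already a list of token strings and uses "$" as a hypothetical end marker; the class and function names are illustrative, not from the slides.

```python
class Parser:
    """Recursive-descent recognizer for the grammar without left recursion."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumed)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):            # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):            # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | epsilon
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    """Return True if the whole token list is derivable from E."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))
print(accepts(["id", "+"]))
```

With the original rule E -> E + T, the procedure for E would call itself before consuming any token and never terminate, which is the difficulty the rewriting avoids.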

Left Recursion
General left recursion: Even if there is no immediate left recursion, a number of production rules may act together to give a general left recursion. For example:
S -> Aa
A -> Sb | c
Here, S is left recursive, because S => Aa => Sba.

Left Recursion
Algorithm to eliminate left recursion:
1. Arrange the non-terminals in some order, say A1, A2, …, Am.
2. For i = 1 to m do
     for j = 1 to i-1 do
       for each pair of productions Ai -> Aj γ and Aj -> δ1 | δ2 | … | δk,
         replace Ai -> Aj γ by Ai -> δ1γ | δ2γ | … | δkγ
3. Eliminate immediate left recursion from all productions.

Left Recursion
Example:
S -> Aa
A -> Sb | c
The order of the non-terminals is S, A.
For i = 1, the rule S -> Aa has no immediate left recursion.
For i = 2, A -> Sb | c is modified to A -> Aab | c, which has immediate left recursion. It is eliminated by rewriting the rule as:
A -> cA’
A’ -> abA’ | ϵ
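The substitution algorithm and this example can be sketched together. The code below is one possible rendering under stated assumptions: the grammar is a dict of non-empty symbol lists with no ϵ-productions in the input, immediate left recursion is removed for each Ai right after its substitutions (a common variant of step 3), and new non-terminals are named with an ASCII apostrophe. All function names are hypothetical.

```python
EPSILON = "ϵ"


def eliminate_immediate(grammar, head):
    """Remove immediate left recursion on `head`, updating grammar in place."""
    bodies = grammar[head]
    alphas = [b[1:] for b in bodies if b[0] == head]
    betas = [b for b in bodies if b[0] != head]
    if not alphas:
        return
    new_head = head + "'"
    grammar[head] = [beta + [new_head] for beta in betas]
    grammar[new_head] = [alpha + [new_head] for alpha in alphas] + [[EPSILON]]


def eliminate_left_recursion(grammar, order):
    """Return a copy of grammar without left recursion, given an ordering."""
    grammar = {h: [list(b) for b in bodies] for h, bodies in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            new_bodies = []
            for body in grammar[a_i]:
                if body[0] == a_j:
                    # Replace Ai -> Aj gamma by Ai -> delta gamma
                    # for every production Aj -> delta.
                    new_bodies.extend(delta + body[1:] for delta in grammar[a_j])
                else:
                    new_bodies.append(body)
            grammar[a_i] = new_bodies
        eliminate_immediate(grammar, a_i)
    return grammar


GRAMMAR = {"S": [["A", "a"]], "A": [["S", "b"], ["c"]]}
RESULT = eliminate_left_recursion(GRAMMAR, ["S", "A"])
for lhs, rhs in RESULT.items():
    print(lhs, "->", " | ".join(" ".join(body) for body in rhs))
```

The result depends on the chosen order of non-terminals; a different order gives a different but equivalent grammar.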