Parsing. 2301373Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.

Slides:



Advertisements
Similar presentations
Mooly Sagiv and Roman Manevich School of Computer Science
Advertisements

Top-Down Parsing.
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
CS 310 – Fall 2006 Pacific University CS310 Parsing with Context Free Grammars Today’s reference: Compilers: Principles, Techniques, and Tools by: Aho,
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
1 Chapter 4: Top-Down Parsing. 2 Objectives of Top-Down Parsing an attempt to find a leftmost derivation for an input string. an attempt to construct.
Professor Yihjia Tsai Tamkang University
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
Top-Down Parsing.
1 Bottom-up parsing Goal of parser : build a derivation –top-down parser : build a derivation by working from the start symbol towards the input. builds.
Chapter 3 Chang Chi-Chung Parse tree intermediate representation The Role of the Parser Lexical Analyzer Parser Source Program Token Symbol.
CPSC 388 – Compiler Design and Construction
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
COP4020 Programming Languages Computing LL(1) parsing table Prof. Xin Yuan.
Syntax and Semantics Structure of programming languages.
Chapter 9 Syntax Analysis Winter 2007 SEG2101 Chapter 9.
Top-Down Parsing - recursive descent - predictive parsing
4 4 (c) parsing. Parsing A grammar describes the strings of tokens that are syntactically legal in a PL A recogniser simply accepts or rejects strings.
Chapter 5 Top-Down Parsing.
4 4 (c) parsing. Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces.
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
Context-Free Grammars
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
1 Compiler Construction Syntax Analysis Top-down parsing.
Syntactic Analysis Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
CSI 3120, Syntactic analysis, page 1 Syntactic Analysis and Parsing Based on A. V. Aho, R. Sethi and J. D. Ullman Compilers: Principles, Techniques and.
Syntax and Semantics Structure of programming languages.
4 4 (c) parsing. Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces.
11 Chapter 4 Grammars and Parsing Grammar Grammars, or more precisely, context-free grammars, are the formalism for describing the structure of.
COP4020 Programming Languages Parsing Prof. Xin Yuan.
Chapter 4 Top-Down Parsing Recursive-Descent Gang S. Liu College of Computer Science & Technology Harbin Engineering University.
Compiler Principle and Technology Prof. Dongming LU Mar. 27th, 2015.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
1 Context free grammars  Terminals  Nonterminals  Start symbol  productions E --> E + T E --> E – T E --> T T --> T * F T --> T / F T --> F F --> (F)
1 Nonrecursive Predictive Parsing  It is possible to build a nonrecursive predictive parser  This is done by maintaining an explicit stack.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Top-Down Parsing.
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
Parsing methods: –Top-down parsing –Bottom-up parsing –Universal.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1.
UMBC  CSEE   1 Chapter 4 Chapter 4 (b) parsing.
Bernd Fischer RW713: Compiler and Software Language Engineering.
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.
Syntax and Semantics Structure of programming languages.
Parsing COMP 3002 School of Computer Science. 2 The Structure of a Compiler syntactic analyzer code generator program text interm. rep. machine code tokenizer.
Programming Languages Translator
Context free grammars Terminals Nonterminals Start symbol productions
Short introduction to compilers
Table-driven parsing Parsing performed by a finite state machine.
Parsing.
Top-down parsing cannot be performed on left recursive grammars.
Context-Free Grammars
Top-Down Parsing CS 671 January 29, 2008.
Context-Free Grammars
CSC 4181 Compiler Construction Parsing
Context-Free Grammars
Parsers and control structures
CSC 4181Compiler Construction Context-Free Grammars
Syntax Analysis - Parsing
CSC 4181 Compiler Construction Context-Free Grammars
Context-Free Grammars
Context-Free Grammars
Presentation transcript:

Parsing

Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First and follow sets Constructing LL(1) parsing table Error recovery Bottom-up parsing Shift-reduce parsers LR(0) parsing LR(0) items Finite automata of items LR(0) parsing algorithm LR(0) grammar SLR(1) parsing SLR(1) parsing algorithm SLR(1) grammar Parsing conflict

Chapter 4 Parsing3 Introduction Parsing is a process that constructs a syntactic structure (i.e. parse tree) from the stream of tokens. We already learn how to describe the syntactic structure of a language using (context-free) grammar. So, a parser only need to do this? Stream of tokens Context-free grammar Parser Parse tree

Chapter 4 Parsing4 Top–Down Parsing Bottom–Up Parsing A parse tree is created from root to leaves The traversal of parse trees is a preorder traversal Tracing leftmost derivation Two types: Backtracking parser Predictive parser A parse tree is created from leaves to root The traversal of parse trees is a reversal of postorder traversal Tracing rightmost derivation More powerful than top-down parsing Try different structures and backtrack if it does not matched the input Guess the structure of the parse tree from the next input

Chapter 4 Parsing5 Parse Trees and Derivations E  E + E  id + E  id + E * E  id + id * E  id + id * id E  E + E  E + E * E  E + E * id  E + id * id  id + id * id Top-down parsing Bottom-up parsing id E * E + E E E + * E E E E E

Chapter 4 Parsing6 Top-down Parsing What does a parser need to decide? Which production rule is to be used at each point of time ? How to guess? What is the guess based on? What is the next token? Reserved word if, open parentheses, etc. What is the structure to be built? If statement, expression, etc.

Chapter 4 Parsing7 Top-down Parsing Why is it difficult? Cannot decide until later Next token: ifStructure to be built: St St  MatchedSt | UnmatchedSt UnmatchedSt  if ( E ) St| if ( E ) MatchedSt else UnmatchedSt MatchedSt  if ( E ) MatchedSt else MatchedSt |... Production with empty string Next token: idStructure to be built: par par  parList | parList  exp, parList | exp

Chapter 4 Parsing8 Recursive-Descent Write one procedure for each set of productions with the same nonterminal in the LHS Each procedure recognizes a structure described by a nonterminal. A procedure calls other procedures if it need to recognize other structures. A procedure calls match procedure if it need to recognize a terminal.

Chapter 4 Parsing9 Recursive-Descent: Example E  E O F | F O  + | - F  ( E ) | id procedure F {switch token {case (: match(‘(‘); E; match(‘)’); case id: match(id); default: error; } } For this grammar: We cannot decide which rule to use for E, and If we choose E  E O F, it leads to infinitely recursive loops. Rewrite the grammar into EBNF procedure E {F; while (token=+ or token=-) {O; F;} } procedure E {E; O; F; } E ::= F {O F} O ::= + | - F ::= ( E ) | id

Chapter 4 Parsing10 Match procedure procedure match(expTok) {if (token==expTok) then getToken else error } The token is not consumed until getToken is executed.

Chapter 4 Parsing11 Problems in Recursive-Descent Difficult to convert grammars into EBNF Cannot decide which production to use at each point Cannot decide when to use -production A 

Chapter 4 Parsing12 LL(1) Parsing LL(1) Read input from (L) left to right Simulate (L) leftmost derivation 1 lookahead symbol Use stack to simulate leftmost derivation Part of sentential form produced in the leftmost derivation is stored in the stack. Top of stack is the leftmost nonterminal symbol in the fragment of sentential form.

Chapter 4 Parsing13 Concept of LL(1) Parsing Simulate leftmost derivation of the input. Keep part of sentential form in the stack. If the symbol on the top of stack is a terminal, try to match it with the next input token and pop it out of stack. If the symbol on the top of stack is a nonterminal X, replace it with Y if we have a production rule X  Y. Which production will be chosen, if there are both X  Y and X  Z ?

Chapter 4 Parsing14 Example of LL(1) Parsing (n+(n))*n$ $ E E  T XE  T X X  A T X | A  + | - T  F NT  F N N  M F N | M  * F  ( E ) | n T X F N ) E ( T X F N n A T X + F N ( E ) T X F N n M F N * n Finished E  TX  FNX  (E)NX  (TX)NX  (FNX)NX  (nNX)NX  (nX)NX  (nATX)NX  (n+TX)NX  (n+FNX)NX  (n+(E)NX)NX  (n+(TX)NX)NX  (n+(FNX)NX)NX  (n+(nNX)NX)NX  (n+(nX)NX)NX  (n+(n)NX)NX  (n+(n)X)NX  (n+(n))NX  (n+(n))MFNX  (n+(n))*FNX  (n+(n))*nNX  (n+(n))*nX  (n+(n))*n

Chapter 4 Parsing15 LL(1) Parsing Algorithm Push the start symbol into the stack WHILE stack is not empty ($ is not on top of stack) and the stream of tokens is not empty (the next input token is not $) SWITCH (Top of stack, next token) CASE (terminal a, a): Pop stack;Get next token CASE (nonterminal A, terminal a): IF the parsing table entry M[A, a] is not empty THEN Get A  X 1 X 2... X n from the parsing table entry M[A, a] Pop stack; Push X n... X 2 X 1 into stack in that order ELSEError CASE ($,$):Accept OTHER:Error

Chapter 4 Parsing16 LL(1) Parsing Table If the nonterminal N is on the top of stack and the next token is t, which production rule to use? Choose a rule N  X such that X  * tY or X  * and S  * WNtY N Q t……… X Y t Y t N X

Chapter 4 Parsing17 First Set Let X be or be in V or T. First(X ) is the set of the first terminal in any sentential form derived from X. If X is a terminal or, then First(X ) ={X }. If X is a nonterminal and X  X 1 X 2... X n is a rule, then First(X 1 ) -{ } is a subset of First(X) First(X i )-{ } is a subset of First(X) if for all j<i First(X j ) contains { } is in First(X) if for all j≤n First(X j )contains

Chapter 4 Parsing18 Examples of First Set exp  exp addop term | term addop  + | - term  term mulop factor | factor mulop  * factor  (exp) | num First(addop) = {+, -} First(mulop) = {*} First(factor) = {(, num} First(term) = {(, num} First(exp) = {(, num} st  ifst | other ifst  if ( exp ) st elsepart elsepart  else st | exp  0 | 1 First(exp) = {0,1} First(elsepart) = {else, } First(ifst) = {if} First(st) = {if, other}

Chapter 4 Parsing19 Algorithm for finding First(A) For all terminals a, First(a) = {a} For all nonterminals A, First(A) := {} While there are changes to any First(A) For each rule A  X 1 X 2... X n For each X i in {X 1, X 2, …, X n } If for all j<i First(X j ) contains, Then add First(X i )-{ } to First(A) If is in First(X 1 ), First(X 2 ),..., and First(X n ) Then add to First(A) If A is a terminal or, then First(A) = {A}. If A is a nonterminal, then for each rule A  X 1 X 2... X n, First(A) contains First(X 1 ) - { }. If also for some i<n, First(X 1 ), First(X 2 ),..., and First(X i ) contain, then First(A) contains First(X i+1 )-{ }. If First(X 1 ), First(X 2 ),..., and First(X n ) contain, then First(A) also contains.

Chapter 4 Parsing20 Finding First Set: An Example exp  term exp’ exp’  addop term exp’ | addop  + | - term  factor term’ term’  mulop factor term’ | mulop  * factor  ( exp ) | num First exp exp’ addop term term’ mulop factor + - * ( num + - ( num *

Chapter 4 Parsing21 Follow Set Let $ denote the end of input tokens If A is the start symbol, then $ is in Follow(A). If there is a rule B  X A Y, then First(Y) - { } is in Follow(A). If there is production B  X A Y and is in First(Y), then Follow(A) contains Follow(B).

Chapter 4 Parsing22 Algorithm for Finding Follow(A) Follow(S) = {$} FOR each A in V-{S} Follow(A)={} WHILE change is made to some Follow sets FOR each production A  X 1 X 2... X n, FOR each nonterminal X i Add First(X i+1 X i+2...X n )-{ } into Follow(X i ). (NOTE: If i=n, X i+1 X i+2...X n = ) IF is in First(X i+1 X i+2...X n ) THEN Add Follow(A) to Follow(X i ) If A is the start symbol, then $ is in Follow(A). If there is a rule A  Y X Z, then First(Z) - { } is in Follow(X). If there is production B  X A Y and is in First(Y), then Follow(A) contains Follow(B).

Chapter 4 Parsing23 Finding Follow Set: An Example exp  term exp’ exp’  addop term exp’ | addop  + | - term  factor term’ term’  mulop factor term’ | mulop  * factor  ( exp ) | num First exp exp’ addop term term’ mulop factor + - * ( num + - ( num * Follow ) + - $( num + - * $ ( num $ * + - $ $ $ )) ) )) )

Chapter 4 Parsing24 Constructing LL(1) Parsing Tables FOR each nonterminal A and a production A  X FOR each token a in First(X) A  X is in M(A, a) IF is in First(X)THEN FOR each element a in Follow(A) Add A  X to M(A, a)

Chapter 4 Parsing25 Example: Constructing LL(1) Parsing Table First Follow exp {(, num} {$,)} exp’ {+,-, } {$,)} addop {+,-} {(,num} term {(,num} {+,-,),$} term’ {*, } {+,-,),$} mulop {*}{(,num} factor {(, num} {*,+,-,),$} 1 exp  term exp’ 2 exp’  addop term exp’ 3 exp’  4 addop  + 5 addop  - 6 term  factor term’ 7 term’  mulop factor term’ 8 term’  9 mulop  * 10 factor  ( exp ) 11 factor  num ()+-*n$ exp exp’ addop term term’ mulop factor

Chapter 4 Parsing26 LL(1) Grammar A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry.

Chapter 4 Parsing27 LL(1) Parsing Table for non-LL(1) Grammar 1 exp  exp addop term 2 exp  term 3 term  term mulop factor 4 term  factor 5 factor  ( exp ) 6 factor  num 7 addop  + 8 addop  - 9 mulop  * First(exp) = { (, num } First(term) = { (, num } First(factor) = { (, num } First(addop) = { +, - } First(mulop) = { * }

Chapter 4 Parsing28 Causes of Non-LL(1) Grammar What causes grammar being non-LL(1)? Left-recursion Left factor

Chapter 4 Parsing29 Left Recursion Immediate left recursion A  A X | Y A  A X 1 | A X 2 |…| A X n | Y 1 | Y 2 |... | Y m General left recursion A => X =>* A Y Can be removed very easily A  Y A’, A’  X A’| A  Y 1 A’ | Y 2 A’ |...| Y m A’, A’  X 1 A’| X 2 A’|…| X n A’| Can be removed when there is no empty-string production and no cycle in the grammar A=Y X* A={Y 1, Y 2,…, Y m } {X 1, X 2, …, X n }*

Chapter 4 Parsing30 Removal of Immediate Left Recursion exp  exp + term | exp - term | term term  term * factor | factor factor  ( exp ) | num Remove left recursion exp  term exp’ exp’  + term exp’ | - term exp’ | term  factor term’ term’  * factor term’ | factor  ( exp ) | num exp = term (  term)* term = factor (* factor)*

Chapter 4 Parsing31 General Left Recursion Bad News! Can only be removed when there is no empty- string production and no cycle in the grammar. Good News!!!! Never seen in grammars of any programming languages

Chapter 4 Parsing32 Left Factoring Left factor causes non-LL(1) Given A  X Y | X Z. Both A  X Y and A  X Z can be chosen when A is on top of stack and a token in First(X) is the next token. A  X Y | X Z can be left-factored as A  X A’ and A’  Y | Z

Chapter 4 Parsing33 Example of Left Factor ifSt  if ( exp ) st else st | if ( exp ) st can be left-factored as ifSt  if ( exp ) st elsePart elsePart  else st | seq  st ; seq | st can be left-factored as seq  st seq’ seq’  ; seq |

End of Top-Down Parsing Credit : the creator of the slides Jaruloj Chongstitvatana Department of Mathematics and Computer Science Faculty of Science Chulalongkorn University Chapter 4 Parsing34