Top-Down Parsing.

Slides:



Advertisements
Similar presentations
Compiler Construction
Advertisements

Chap. 5, Top-Down Parsing J. H. Wang Mar. 29, 2011.
Mooly Sagiv and Roman Manevich School of Computer Science
Top-Down Parsing.
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
Prof. Bodik CS 164 Lecture 81 Grammars and ambiguity CS164 3:30-5:00 TT 10 Evans.
Parsing III (Eliminating left recursion, recursive descent parsing)
Prof. Bodik CS 164 Lecture 61 Building a Parser II CS164 3:30-5:00 TT 10 Evans.
Prof. Fateman CS 164 Lecture 8 1 Top-Down Parsing CS164 Lecture 8.
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
CS 536 Spring Ambiguity Lecture 8. CS 536 Spring Announcement Reading Assignment –“Context-Free Grammars” (Sections 4.1, 4.2) Programming.
Professor Yihjia Tsai Tamkang University
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
Top-Down Parsing.
CPSC 388 – Compiler Design and Construction
Syntax and Semantics Structure of programming languages.
Parsing Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.
Review: –How do we define a grammar (what are the components in a grammar)? –What is a context free grammar? –What is the language defined by a grammar?
Top-Down Parsing - recursive descent - predictive parsing
4 4 (c) parsing. Parsing A grammar describes the strings of tokens that are syntactically legal in a PL A recogniser simply accepts or rejects strings.
1 Chapter 5 LL (1) Grammars and Parsers. 2 Naming of parsing techniques The way to parse token sequence L: Leftmost R: Righmost Top-down  LL Bottom-up.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
4 4 (c) parsing. Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces.
# 1 CMPS 450 Parsing CMPS 450 J. Moloney. # 2 CMPS 450 Check that input is well-formed Build a parse tree or similar representation of input Recursive.
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
-Mandakinee Singh (11CS10026).  What is parsing? ◦ Discovering the derivation of a string: If one exists. ◦ Harder than generating strings.  Two major.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Compiler Parsing 邱锡鹏 复旦大学计算机科学技术学院
4 4 (c) parsing. Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces.
COP4020 Programming Languages Parsing Prof. Xin Yuan.
Parsing Top-Down.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
1 Context free grammars  Terminals  Nonterminals  Start symbol  productions E --> E + T E --> E – T E --> T T --> T * F T --> T / F T --> F F --> (F)
1 Nonrecursive Predictive Parsing  It is possible to build a nonrecursive predictive parser  This is done by maintaining an explicit stack.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Syntax Analyzer (Parser)
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
Top-Down Predictive Parsing We will look at two different ways to implement a non- backtracking top-down parser called a predictive parser. A predictive.
Parsing methods: –Top-down parsing –Bottom-up parsing –Universal.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
COMP 3438 – Part II-Lecture 5 Syntax Analysis II Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1.
UMBC  CSEE   1 Chapter 4 Chapter 4 (b) parsing.
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Parsing #1 Leonidas Fegaras.
Programming Languages Translator
CS510 Compiler Lecture 4.
Context free grammars Terminals Nonterminals Start symbol productions
Introduction to Parsing (adapted from CS 164 at Berkeley)
4 (c) parsing.
Parsing Techniques.
Compiler Design 7. Top-Down Table-Driven Parsing
Lecture 7: Introduction to Parsing (Syntax Analysis)
Ambiguity, Precedence, Associativity & Top-Down Parsing
Syntax Analysis - Parsing
Nonrecursive Predictive Parsing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Presentation transcript:

Top-Down Parsing

A parser consumes a sequence of tokens s and produces a parse tree Review A parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s  L(G) ? A parse tree of s describes how s  L(G) Ambiguity: more than one parse tree (interpretation) for some string s Error: no parse tree for some string s How do we construct the parse tree?

Ambiguity Grammar E  E + E | E * E | ( E ) | int Strings int + int + int int * int + int

The string int + int + int has two parse trees Ambiguity. Example The string int + int + int has two parse trees E E E + E E E + E E int int E + E + int int int int + is left-associative

The string int * int + int has two parse trees Ambiguity. Example The string int * int + int has two parse trees E E E + E E E * E E int int E + E * int int int int * has higher precedence than +

Ambiguity is common in programming languages Ambiguity (Cont.) A grammar is ambiguous if it has more than one parse tree for some string Equivalently, there is more than one right-most or left-most derivation for some string Ambiguity is bad Leaves meaning of some programs ill-defined Ambiguity is common in programming languages Arithmetic expressions IF-THEN-ELSE

Dealing with Ambiguity There are several ways to handle ambiguity Most direct method is to rewrite the grammar unambiguously E  E + T | T T  T * int | int | ( E ) Enforces precedence of * over + Enforces left-associativity of + and *

Ambiguity. Example The int * int + int has ony one parse tree now E E E + T E E * T int int E + E int int T * int int

Ambiguity: The Dangling Else Consider the grammar E  if E then E | if E then E else E | OTHER This grammar is also ambiguous

The Dangling Else: Example The expression if E1 then if E2 then E3 else E4 has two parse trees if E1 E2 E3 E4 if E1 E2 E3 E4 Typically we want the second form

The Dangling Else: A Fix else matches the closest unmatched then We can describe this in the grammar (distinguish between matched and unmatched “then”) E  MIF /* all then are matched */ | UIF /* some then are unmatched */ MIF  if E then MIF else MIF | OTHER UIF  if E then E | if E then MIF else UIF Describes the same set of strings

The Dangling Else: Example Revisited The expression if E1 then if E2 then E3 else E4 if E1 E2 E3 E4 if E1 E2 E3 E4 A valid parse tree (for a UIF) Not valid because the then expression is not a MIF

No general techniques for handling ambiguity Impossible to convert automatically an ambiguous grammar to an unambiguous one Used with care, ambiguity can simplify the grammar Sometimes allows more natural definitions We need disambiguation mechanisms

Precedence and Associativity Declarations Instead of rewriting the grammar Use the more natural (ambiguous) grammar Along with disambiguating declarations Most tools allow precedence and associativity declarations to disambiguate grammars Examples …

Associativity Declarations Consider the grammar E  E + E | int Ambiguous: two parse trees of int + int + int E + int E + int Left-associativity declaration: %left +

Precedence Declarations Consider the grammar E  E + E | E * E | int And the string int + int * int E * int + E + int * Precedence declarations: %left + %left *

We can specify language syntax using CFG Review We can specify language syntax using CFG A parser will answer whether s  L(G) … and will build a parse tree … and pass on to the rest of the compiler Next: How do we answer s  L(G) and build a parse tree?

Top-Down Parsing

Intro to Top-Down Parsing Terminals are seen in order of appearance in the token stream: t1 t2 t3 t4 t5 The parse tree is constructed From the top From left to right A t1 B C t2 D t3 t4

Recursive Descent Parsing Consider the grammar E  T + E | T T  ( E ) | int | int * T Token stream is: int * int Start with top-level non-terminal E Try the rules for E in order

Recursive Descent Parsing. Example (Cont.) Try E0  T1 + E2 Then try a rule for T1  ( E3 ) But ( does not match input token int Try T1  int . Token matches. But + after T1 does not match input token * Try T1  int * T2 This will match but + after T1 will be unmatched Have exhausted the choices for T1 Backtrack to choice for E0

Recursive Descent Parsing. Example (Cont.) Try E0  T1 Follow same steps as before for T1 And succeed with T1  int * T2 and T2  int With the following parse tree E0 T1 int * T2

Recursive-Descent Parsing Parsing: given a string of tokens t1 t2 ... tn, find its parse tree Recursive-descent parsing: Try all the productions exhaustively At a given moment the fringe of the parse tree is: t1 t2 … tk A … Try all the productions for A: if A  BC is a production, the new fringe is t1 t2 … tk B C … Backtrack when the fringe doesn’t match the string Stop when there are no more non-terminals

When Recursive Descent Does Not Work Consider a production S  S a: In the process of parsing S we try the above rule What goes wrong? A left-recursive grammar has a non-terminal S S + S for some  Recursive descent does not work in such cases It goes into an ∞ loop

Elimination of Left Recursion Consider the left-recursive grammar S  S  |  S generates all strings starting with a  and followed by a number of  Can rewrite using right-recursion S   S’ S’   S’ | 

Elimination of Left-Recursion. Example Consider the grammar S  1 | S 0 (  = 1 and  = 0 ) can be rewritten as S  1 S’ S’  0 S’ | 

More Elimination of Left-Recursion In general S  S 1 | … | S n | 1 | … | m All strings derived from S start with one of 1,…,m and continue with several instances of 1,…,n Rewrite as S  1 S’ | … | m S’ S’  1 S’ | … | n S’ | 

General Left Recursion The grammar S  A  |  A  S  is also left-recursive because S + S   This left-recursion can also be eliminated

Summary of Recursive Descent Simple and general parsing strategy Left-recursion must be eliminated first … but that can be done automatically Unpopular because of backtracking Thought to be too inefficient Often, we can avoid backtracking …

Predictive parsers accept LL(k) grammars Like recursive-descent but parser can “predict” which production to use By looking at the next few tokens No backtracking Predictive parsers accept LL(k) grammars L means “left-to-right” scan of input L means “leftmost derivation” k means “predict based on k tokens of lookahead” In practice, LL(1) is used

Can be specified as a 2D table LL(1) Languages In recursive-descent, for each non-terminal and input token there may be a choice of production LL(1) means that for each non-terminal and token there is only one production that could lead to success Can be specified as a 2D table One dimension for current non-terminal to expand One dimension for next token A table entry contains one production

Predictive Parsing and Left Factoring Recall the grammar E  T + E | T T  int | int * T | ( E ) Impossible to predict because For T two productions start with int For E it is not clear how to predict A grammar must be left-factored before use for predictive parsing

Left-Factoring Example Recall the grammar E  T + E | T T  int | int * T | ( E ) Factor out common prefixes of productions E  T X X  + E |  T  ( E ) | int Y Y  * T | 

LL(1) Parsing Table Example Left-factored grammar E  T X X  + E |  T  ( E ) | int Y Y  * T |  The LL(1) parsing table ($ is a special end marker): int * + ( ) $ T int Y ( E ) E T X X + E  Y * T

LL(1) Parsing Table Example (Cont.) Consider the [E, int] entry “When current non-terminal is E and next input is int, use production E  T X This production can generate an int in the first place Consider the [Y,+] entry “When current non-terminal is Y and current token is +, get rid of Y” We’ll see later why this is so

LL(1) Parsing Tables. Errors Blank entries indicate error situations Consider the [E,*] entry “There is no way to derive a string starting with * from non-terminal E”

Method similar to recursive descent, except Using Parsing Tables Method similar to recursive descent, except For each non-terminal S We look at the next token a And choose the production shown at [S,a] We use a stack to keep track of pending non-terminals We reject when we encounter an error state We accept when we encounter end-of-input

LL(1) Parsing Algorithm initialize stack = <S $> and next (pointer to tokens) repeat case stack of <X, rest> : if T[X,*next] = Y1…Yn then stack  <Y1… Yn rest>; else error (); <t, rest> : if t == *next ++ then stack  <rest>; else error (); until stack == < >

LL(1) Parsing Example Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal T X $ int $ int Y int Y X $ int $ terminal Y X $ $  X $ $  $ $ ACCEPT

Constructing Parsing Tables LL(1) languages are those defined by a parsing table for the LL(1) algorithm No table entry can be multiply defined Once we have the table The parsing algorithm is simple and fast No backtracking is necessary We want to generate parsing tables from CFG

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal E T E + int * int + int

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal E The leaves at any point form a string bAg b contains only terminals The input string is bbd The prefix b matches The next token is b T E + int T * int * int + int

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal E The leaves at any point form a string bAg b contains only terminals The input string is bbd The prefix b matches The next token is b T E + T int T * int int * int + int

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal E The leaves at any point form a string bAg b contains only terminals The input string is bbd The prefix b matches The next token is b T E + T int T * int int int * int + int

Constructing Predictive Parsing Tables Consider the state S  * bAg With b the next token Trying to match bbd There are two possibilities: b belongs to an expansion of A Any A  a can be used if b can start a string derived from a In this case we say that b 2 First(a) Or…

Constructing Predictive Parsing Tables (Cont.) b does not belong to an expansion of A The expansion of A is empty and b belongs to an expansion of g (e.g., bw) Means that b can appear after A in a derivation of the form S  * bAbw We say that b 2 Follow(A) in this case What productions can we use in this case? Any A  a can be used if a can expand to e We say that e 2 First(A) in this case

Definition First(X) = { b | X * b}  { | X * } Computing First Sets Definition First(X) = { b | X * b}  { | X * } First(b) = { b } For all productions X  A1 … An Add First(A1) – {} to First(X). Stop if   First(A1) Add First(A2) – {} to First(X). Stop if   First(A2) … Add First(An) – {} to First(X). Stop if   First(An) Add  to First(X) (ignore Ai if it is X)

First( ( ) = { ( } First( T ) = {int, ( } First Sets. Example Recall the grammar E  T X X  + E |  T  ( E ) | int Y Y  * T |  First sets First( ( ) = { ( } First( T ) = {int, ( } First( ) ) = { ) } First( E ) = {int, ( } First( int) = { int } First( X ) = {+,  } First( + ) = { + } First( Y ) = {*,  } First( * ) = { * }

Definition Follow(X) = { b | S *  X b  } Computing Follow Sets Definition Follow(X) = { b | S *  X b  } Compute the First sets for all non-terminals first Add $ to Follow(S) (if S is the start non-terminal) For all productions Y  … X A1 … An Add First(A1) – {} to Follow(X). Stop if   First(A1) Add First(A2) – {} to Follow(X). Stop if   First(A2) … Add First(An) – {} to Follow(X). Stop if   First(An) Add Follow(Y) to Follow(X)

Follow( + ) = { int, ( } Follow( * ) = { int, ( } Follow Sets. Example Recall the grammar E  T X X  + E |  T  ( E ) | int Y Y  * T |  Follow sets Follow( + ) = { int, ( } Follow( * ) = { int, ( } Follow( ( ) = { int, ( } Follow( E ) = {), $} Follow( X ) = {$, ) } Follow( T ) = {+, ) , $} Follow( ) ) = {+, ) , $} Follow( Y ) = {+, ) , $} Follow( int) = {*, +, ) , $}

Constructing LL(1) Parsing Tables Construct a parsing table T for CFG G For each production A   in G do: For each terminal b  First() do T[A, b] =  If   * , for each b  Follow(A) do

Constructing LL(1) Tables. Example Recall the grammar E  T X X  + E |  T  ( E ) | int Y Y  * T |  Where in the line of Y we put Y  * T ? In the lines of First( *T) = { * } Where in the line of Y we put Y  e ? In the lines of Follow(Y) = { $, +, ) }

Notes on LL(1) Parsing Tables If any entry is multiply defined then G is not LL(1) If G is ambiguous If G is left recursive If G is not left-factored And in other cases as well Most programming language grammars are not LL(1) There are tools that build LL(1) tables

Review For some grammars there is a simple parsing strategy Predictive parsing (LL(1)) Once you build the LL(1) table, you can write the parser by hand Next: a more powerful parsing strategy for grammars that are not LL(1)