Table-driven parsing Parsing performed by a finite state machine.

Slides:



Advertisements
Similar presentations
Parsing V: Bottom-up Parsing
Advertisements

Compiler Construction
A question from last class: construct the predictive parsing table for this grammar: S->i E t S e S | i E t S | a E -> B.
Lesson 8 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Compiler Designs and Constructions
Compiler construction in4020 – lecture 4 Koen Langendoen Delft University of Technology The Netherlands.
Compilation (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
Compiler Principles Fall Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University.
Bottom up Parsing Bottom up parsing trys to transform the input string into the start symbol. Moves through a sequence of sentential forms (sequence of.
Chapter 3 Syntax Analysis
Pushdown Automata Consists of –Pushdown stack (can have terminals and nonterminals) –Finite state automaton control Can do one of three actions (based.
Mooly Sagiv and Roman Manevich School of Computer Science
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
Lecture #8, Feb. 7, 2007 Shift-reduce parsing,
Parsing V Introduction to LR(1) Parsers. from Cooper & Torczon2 LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
Shift/Reduce and LR(1) Professor Yihjia Tsai Tamkang University.
Bottom-up parsing Goal of parser : build a derivation
Syntax and Semantics Structure of programming languages.
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
1 Compiler Construction Syntax Analysis Top-down parsing.
Syntactic Analysis Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
Syntax and Semantics Structure of programming languages.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
1 Nonrecursive Predictive Parsing  It is possible to build a nonrecursive predictive parser  This is done by maintaining an explicit stack.
Lecture 5: LR Parsing CS 540 George Mason University.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Syntax and Semantics Structure of programming languages.
Announcements/Reading
Parsing — Part II (Top-down parsing, left-recursion removal)
Chapter 4 - Parsing CSCE 343.
Parsing Bottom Up CMPS 450 J. Moloney CMPS 450.
Programming Languages Translator
Bottom-up parsing Goal of parser : build a derivation
Compiler design Bottom-up parsing Concepts
Unit-3 Bottom-Up-Parsing.
UNIT - 3 SYNTAX ANALYSIS - II
Lecture #12 Parsing Types.
CS 488 Spring 2012 Lecture 4 Bapa Rao Cal State L.A.
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 5, 09/25/2003 Prof. Roy Levow.
Parsing IV Bottom-up Parsing
Parsing — Part II (Top-down parsing, left-recursion removal)
Parsing.
CS 404 Introduction to Compiler Design
Fall Compiler Principles Lecture 4: Parsing part 3
Bottom-Up Syntax Analysis
4 (c) parsing.
Parsing Techniques.
CS 3304 Comparative Languages
Subject Name:COMPILER DESIGN Subject Code:10CS63
Regular Grammar - Finite Automaton
Top-Down Parsing CS 671 January 29, 2008.
CSC 4181 Compiler Construction Parsing
Syntax Analysis source program lexical analyzer tokens syntax analyzer
Parsing #2 Leonidas Fegaras.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Compiler Design 7. Top-Down Table-Driven Parsing
Bottom Up Parsing.
Parsing IV Bottom-up Parsing
LR Parsing. Parser Generators.
Parsing #2 Leonidas Fegaras.
Parsing — Part II (Top-down parsing, left-recursion removal)
5. Bottom-Up Parsing Chih-Hung Wang
Kanat Bolazar February 16, 2010
Chapter Fifteen: Stack Machine Applications
Compiler Construction
Nonrecursive Predictive Parsing
Context Free Grammar – Quick Review
Chap. 3 BOTTOM-UP PARSING
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 7, 10/09/2003 Prof. Roy Levow.
Presentation transcript:

Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically from grammar. Language generator tables Input parser stack tables

Pushdown Automata A context-free grammar can be recognized by a finite state machine with a stack: a PDA. The PDA is defined by set of internal states and a transition table. The PDA can read the input and read/write on the stack. The actions of the PDA are determined by its current state, the current top of the stack, and the current input symbol. There are three distinguished states: start state: nothing seen accept state: sentence complete error state: current symbol doesn’t belong.

Top-down parsing Parse tree is synthesized from the root (sentence symbol). Stack contains symbols of rhs of current production, and pending non-terminals. Automaton is trivial (no need for explicit states) Transition table indexed by grammar symbol G and input symbol a. Entries in table are terminals or productions: P ABC…

Top-down parsing Actions: initially, stack contains sentence symbol At each step, let S be symbol on top of stack, and a be the next token on input. if T (S, a) is terminal a, read token, pop symbol from stack if T (S, a) is production P ABC…., remove S from stack, push the symbols A, B, C on the stack (A on top). If S is the sentence symbol and a is the end of file, accept. If T (S, a) is undefined, signal error. Semantic action: when starting a production, build tree node for non-terminal, attach to parent.

Table-driven parsing and recursive descent parsing Recursive descent: every production is a procedure. Call stack holds active procedures corresponding to pending non-terminals. Table-driven parser: recursion simulated with explicit stack.

Building the parse table Define two functions on the symbols of the grammar: FIRST and FOLLOW. For a non-terminal N, FIRST (N) is the set of terminal symbols that can start any derivation from N. First (If_Statement) = {if} First (Expr) = {id, ( } FOLLOW (N) is the set of non-terminals that can appear after a string derived from N: Follow (Expr) = {+, ), $ }

Computing FIRST (N) If N e First (N) includes e if N aABC First (N) includes a if N X1X2 First (N) includes First (X1) if N X1X2… and X1 e, First (N) includes First (X2) Obvious generalization to First (a) where a is X1X2...

Computing First (N) Grammar for expressions, without left-recursion: E TE’ | T E’ +TE’ | e T FT’ | F T’ *FT’ | e F id | (E) First (F) = { id, ( } First (T’) = { *, e} First (T) = { id, ( } First (E’) = { +, e} First (E) = { id, ( }

Computing Follow (N) Follow (N) is computed from productions in which N appears on the rhs For the sentence symbol S, Follow (S) includes $ if A a N b, Follow (N) includes First (b) because an expansion of N will be followed by an expansion from b if A a N, Follow (N) includes Follow (A) because N will be expanded in the context in which A is expanded if A a N B , B e, Follow (N) includes Follow (A)

Computing Follow (N) Follow (E) = { ), $ } Follow (E’) = { ), $ } E TE’ | T E’ +TE’ | e T FT’ | F T’ *FT’ | e F id | (E) Follow (E) = { ), $ } Follow (E’) = { ), $ } Follow (T) = First (E’ ) + Follow (E’) = { +, ), $ } Follow (T’) = Follow (T) = { +, ), $ } Follow (F) = First (T’) + Follow (T’) = { *, +, ), $ }

Building parse tables for each production P: A a loop for each terminal a in First (a) loop T (A, a) := P; end loop; if e in First (a), then for each terminal b in Follow (a) loop T (A, b) := P; end loop; end if; All other entries are errors. If two assignments conflict, parse table cannot be built.

LL (1) grammars If table construction is successful, grammar is LL (1): left-to right, leftmost derivation with one-token lookahead. If construction fails, can conceive of LL (2), etc. Ambiguous grammars are never LL (k) If a terminal is in First for two different productions of A, the grammar cannot be LL (1). Grammars with left-recursion are never LL (k) Some useful constructs are not LL (k)

Bottom-up parsing Synthesize tree from fragments Automaton performs two actions: shift: push next symbol on stack reduce: replace symbols on stack Automaton synthesizes (reduces) when end of a production is recognized States of automaton encode synthesis so far, and expectation of pending non-terminals Automaton has potentially large set of states Technique more general than LL (k)

LR (k) parsing Left-to-right, rightmost derivation with k-token lookahead. Most general parsing technique for deterministic grammars. In general, not practical: tables too large (10^6 states for C++, Ada). Common subsets: SLR, LALR (1).

The states of the LR(0) automaton An item is a point within a production, indicating that part of the production has been recognized: A a . B b , seen the expansion of a, expect to see expansion of B A state is a set of items Transition within states are determined by terminals and non-terminals Parsing tables are built from automaton: action: shift / reduce depending on next symbol goto: change state depending on synthesized non-terminal

Building LR (0) states If a state includes: A a . B b it also includes every state that is the start of B: B . X Y Z Informally: if I expect to see B next, I expect to start anything that B can start with: X . G H I States are built by closure from individual items.

A grammar of expressions: initial state E’ E E E + T | T; -- left-recursion ok here. T T * F | F; F id | (E) S0 = { E’ .E, E .E + T, E .T, T .T * F, T .F, F .id, F . ( E ) }

Adding states If a state has item A a .a b, and the next symbol in the input is a, we shift a on the stack and enter a state with item A a a.b and everything else brought in by closure if a state has as item A a. , this indicates the end of a production: reduce action. If a state has an item A a .N b, then after a reduction that find an N, go to a state with A a N. b

The LR (0) states S1 = { E’ E., E E. + T } S2 = { E T., T T. * F } S3 = { T F. } S4 = { F (. E), } + S0 (by closure) S5 = { F id. } S6 = { E E +. T, T .T * F, T .F, F .id, F .(E)} S7 = { T T *. F, F .id, F .(E)} S8 = { F (E.), E E.+ T} S9 = { E E + T., T T.* F} S10 = { T T * F.}, S11 = {F (E).}

Building SLR tables An arc between two states labelled with a terminal is a shift action. An arc between two states labelled with a non-terminal is a goto action. if a state contains an item A a. , (a reduce item) the action is to reduce by this production, for all terminals in Follow (A). If there are shift-reduce conflicts or reduce-reduce conflicts, more elaborate techniques are needed.