1 Compiler Construction Syntax Analysis Top-down parsing.

1 Compiler Construction Syntax Analysis Top-down parsing

2 Syntax Analysis, continued

3 Syntax analysis
Last week we covered:
- The goal of syntax analysis
- Context-free grammars
- Top-down parsing (a simple but weak parsing method)
Today, we will:
- Wrap up top-down parsing, including LL(1)
- Start on bottom-up parsing: shift-reduce parsers, and LR parsers: SLR(1), LR(1), LALR(1)

4 Top-Down Parsing

5 Recursive descent (Last Week) Recursive descent parsers simply try to build a parse tree, top-down, and BACKTRACK on failure. Recursion and backtracking are inefficient. It would be better if we always knew the correct action to take. It would be better if we could avoid recursive procedure calls during parsing. PREDICTIVE PARSERS can solve both problems.

6 Predictive parsers A predictive parser always knows which production to use, so backtracking is not necessary. Example: for the productions stmt -> if ( expr ) stmt else stmt | while ( expr ) stmt | for ( stmt expr stmt ) stmt a recursive descent parser would always know which production to use, depending on the input token.
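A minimal sketch of this idea (hypothetical code, not from the lecture): the lookahead token alone selects the production for stmt, so no backtracking is ever needed. To keep it self-contained, the third production is replaced by a placeholder "other" statement and expr is just a single id.

```python
# Hypothetical sketch: the next token picks the stmt production directly.
def parse_stmt(toks, i=0):
    t = toks[i]
    if t == "if":                      # stmt -> if ( expr ) stmt else stmt
        i = expect(toks, i + 1, "(")
        i = parse_expr(toks, i)
        i = expect(toks, i, ")")
        i = parse_stmt(toks, i)
        i = expect(toks, i, "else")
        return parse_stmt(toks, i)
    if t == "while":                   # stmt -> while ( expr ) stmt
        i = expect(toks, i + 1, "(")
        i = parse_expr(toks, i)
        i = expect(toks, i, ")")
        return parse_stmt(toks, i)
    if t == "other":                   # stand-in for any simple statement
        return i + 1
    raise SyntaxError(f"unexpected token {t!r}")

def expect(toks, i, want):
    if toks[i] != want:
        raise SyntaxError(f"expected {want!r}, got {toks[i]!r}")
    return i + 1

def parse_expr(toks, i):               # expr is just an identifier here
    return expect(toks, i, "id")
```

Each call consumes tokens left to right and returns the next unread position; a full parse consumes the whole token list.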

7 Transition diagrams
Transition diagrams can describe recursive parsers, just like they can describe lexical analyzers, but the diagrams are slightly different. Construction:
- Eliminate left recursion from G
- Left factor G
- For each non-terminal A:
  - Create an initial and final (return) state
  - For each production A -> X1 X2 ... Xn, create a path from the initial to the final state with edges X1 X2 ... Xn

8 Using transition diagrams
Begin in the start state for the start symbol.
- When we are in state s with an edge labeled by terminal a to state t: if the next input symbol is a, move to state t and advance the input pointer.
- For an edge to state t labeled with non-terminal A: jump to the transition diagram for A, and when finished, return to state t.
- For an edge labeled ε: move immediately to t.
Example (4.15 in text): parse the string "id + id * id"

9 Example transition diagrams
An expression grammar with left recursion and ambiguity removed:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
Corresponding transition diagrams:

10 Predictive parsing without recursion To get rid of the recursive procedure calls, we maintain our own stack.

11 The parsing table and parsing program
The table is a 2D array M[A,a] where A is a nonterminal symbol and a is a terminal or $. At each step, the parser considers the top-of-stack symbol X and input symbol a:
- If both are $, accept.
- If they are the same terminal, pop X and advance the input.
- If X is a nonterminal, consult M[X,a]. If M[X,a] is "ERROR", call an error recovery routine. Otherwise, if M[X,a] is a production of the grammar X -> UVW, replace X on the stack with WVU (U on top).
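A sketch of this parsing loop (hypothetical code, not the lecture's), with a hand-filled LL(1) table M for the expression grammar of slide 9; empty right-hand sides stand for ε-productions, and missing entries mean ERROR.

```python
# Hand-filled LL(1) table for: E -> T E', E' -> + T E' | eps,
# T -> F T', T' -> * F T' | eps, F -> ( E ) | id
M = {
    ("E",  "id"): ["T", "E'"],    ("E",  "("): ["T", "E'"],
    ("E'", "+"):  ["+", "T", "E'"],
    ("E'", ")"):  [],             ("E'", "$"): [],      # eps-productions
    ("T",  "id"): ["F", "T'"],    ("T",  "("): ["F", "T'"],
    ("T'", "*"):  ["*", "F", "T'"],
    ("T'", "+"):  [],  ("T'", ")"): [],  ("T'", "$"): [],
    ("F",  "id"): ["id"],         ("F",  "("): ["(", "E", ")"],
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    input_ = tokens + ["$"]
    stack = ["$", "E"]                    # start symbol on top
    i = 0
    while True:
        X, a = stack[-1], input_[i]
        if X == "$" and a == "$":
            return True                   # accept
        if X == a:                        # matching terminal: pop, advance
            stack.pop(); i += 1
        elif X in NONTERMS:
            if (X, a) not in M:
                raise SyntaxError(f"no entry M[{X},{a}]")
            stack.pop()
            stack.extend(reversed(M[X, a]))   # push RHS, leftmost on top
        else:
            raise SyntaxError(f"expected {X!r}, got {a!r}")
```

Note the stack discipline: a production X -> UVW is pushed reversed so that U ends up on top, exactly as the slide describes.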

12 Example
Use the table-driven predictive parser to parse id + id * id, assuming the parsing table for this grammar.
Initial stack is $E
Initial input is id + id * id $

13 Building a predictive parse table
We still don't know how to create M, the parse table. The construction requires two functions: FIRST and FOLLOW.
For a string of grammar symbols α, FIRST(α) is the set of terminals that begin all possible strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
FOLLOW(A) for nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form. If A can be the last symbol in a sentential form, then $ is also in FOLLOW(A).

14 How to compute FIRST(α)
If X is a terminal, FIRST(X) = {X}. Otherwise (X is a nonterminal):
1. If X -> ε is a production, add ε to FIRST(X).
2. If X -> Y1 Y2 ... Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and Y1 ... Yi-1 ⇒* ε. If every Yi derives ε, add ε to FIRST(X) as well.
Given FIRST(X) for all single symbols X, let FIRST(X1 ... Xn) = FIRST(X1); if ε ∈ FIRST(X1), then add FIRST(X2), and so on.

15 How to compute FOLLOW(A)
1. Place $ in FOLLOW(S) (for S the start symbol).
2. If A -> α B β, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A -> α B, or a production A -> α B β where β ⇒* ε, then everything in FOLLOW(A) is in FOLLOW(B).
Repeatedly apply these rules until no FOLLOW set changes.

16 Example FIRST and FOLLOW
For our favorite grammar:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
What are FIRST() and FOLLOW() for all the nonterminals?
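The rules from the two previous slides can be sketched as a fixpoint computation; this is hypothetical illustrative code (the string "eps" stands for ε), not the lecture's own implementation.

```python
EPS = "eps"  # stands for the empty string
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
NONTERMS = set(GRAMMAR)

def first_of_seq(seq, FIRST):
    """FIRST of a symbol string, given FIRST sets for single symbols."""
    out = set()
    for X in seq:
        f = FIRST[X] if X in NONTERMS else {X}   # terminal: FIRST(a) = {a}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)            # every symbol in seq can derive eps
    return out

def compute_first_follow(start="E"):
    FIRST = {A: set() for A in NONTERMS}
    FOLLOW = {A: set() for A in NONTERMS}
    FOLLOW[start].add("$")
    changed = True
    while changed:          # apply the FIRST rules until nothing changes
        changed = False
        for A, prods in GRAMMAR.items():
            for rhs in prods:
                f = first_of_seq(rhs, FIRST)
                if not f <= FIRST[A]:
                    FIRST[A] |= f; changed = True
    changed = True
    while changed:          # apply the FOLLOW rules until nothing changes
        changed = False
        for A, prods in GRAMMAR.items():
            for rhs in prods:
                for i, B in enumerate(rhs):
                    if B not in NONTERMS:
                        continue
                    tail = first_of_seq(rhs[i + 1:], FIRST)
                    add = tail - {EPS}
                    if EPS in tail:          # beta derives eps: add FOLLOW(A)
                        add |= FOLLOW[A]
                    if not add <= FOLLOW[B]:
                        FOLLOW[B] |= add; changed = True
    return FIRST, FOLLOW
```

Running it reproduces the textbook answer for this grammar, e.g. FIRST(E) = { (, id } and FOLLOW(F) = { +, *, ), $ }.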

17 Parse table construction with FIRST/FOLLOW
Basic idea: if A -> α and a is in FIRST(α), then we expand A to α any time the current input is a and the top of stack is A.
Algorithm: for each production A -> α in G:
- For each terminal a in FIRST(α), add A -> α to M[A,a].
- If ε ∈ FIRST(α), for each terminal b in FOLLOW(A), add A -> α to M[A,b].
- If ε ∈ FIRST(α) and $ is in FOLLOW(A), add A -> α to M[A,$].
Make each undefined entry in M[ ] an ERROR.
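This algorithm can be sketched directly; the code below is a hypothetical illustration that hardcodes the FIRST/FOLLOW sets of the expression grammar (the slide-16 exercise) rather than computing them.

```python
EPS = "eps"  # stands for the empty string
PRODS = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", [EPS]),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", [EPS]),
    ("F",  ["(", "E", ")"]),  ("F",  ["id"]),
]
# FIRST/FOLLOW for this grammar, hardcoded for the sketch:
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}

def first_of(seq):
    out = set()
    for X in seq:
        f = FIRST.get(X, {X})       # terminal (or eps): FIRST is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def build_table():
    M = {}
    for A, rhs in PRODS:
        f = first_of(rhs)
        targets = f - {EPS}         # one entry per terminal in FIRST(alpha)
        if EPS in f:
            targets |= FOLLOW[A]    # FOLLOW(A) already contains $ if needed
        for a in targets:
            assert (A, a) not in M, "multiply defined entry: not LL(1)"
            M[A, a] = rhs
    return M                        # undefined entries mean ERROR
```

The assert is exactly the LL(1) test of slide 19: a multiply defined entry means the grammar is not LL(1).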

18 Example predictive parse table construction
For our favorite grammar:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
What is the predictive parsing table?

19 LL(1) grammars
The predictive parser algorithm can be applied to ANY grammar. But sometimes, M[ ] might have multiply defined entries.
Example: for if-else statements, after left factoring:
stmt -> if ( expr ) stmt optelse
optelse -> else stmt | ε
When we have "optelse" on the stack and "else" in the input, we have a choice of how to expand optelse ("else" is in FIRST(else stmt) and also in FOLLOW(optelse), so either rule is possible).

20 LL(1) grammars
If the predictive parsing construction for G leads to a parse table M[ ] WITHOUT multiply defined entries, we say "G is LL(1)":
- Left-to-right scan of the input
- Leftmost derivation
- 1 symbol of lookahead

21 LL(1) grammars
Necessary and sufficient conditions for G to be LL(1): for every pair of productions A -> α | β:
- There does not exist a terminal a such that a ∈ FIRST(α) and a ∈ FIRST(β).
- At most one of α and β derives ε.
- If β ⇒* ε, then FIRST(α) does not intersect with FOLLOW(A).
This is the same as saying the predictive parser always knows what to do!

22 Top-down parsing summary
RECURSIVE DESCENT parsers are easy to build, but inefficient, and might require backtracking.
TRANSITION DIAGRAMS help us build recursive descent parsers.
For LL(1) grammars, it is possible to build PREDICTIVE PARSERS with no recursion automatically:
- Compute FIRST() and FOLLOW() for all nonterminals
- Fill in the predictive parsing table
- Use the table-driven predictive parsing algorithm

23 Bottom-Up Parsing

24 Bottom-up parsing
Now, instead of starting with the start symbol and working our way down, we will start at the bottom of the parse tree and work our way up. This style of parsing is called SHIFT-REDUCE:
- SHIFT refers to pushing input symbols onto a stack.
- REDUCE refers to "reduction steps" during a parse: we take a substring matching the RHS of a rule, then replace it with the symbol on the LHS of the rule.
If you can reduce until you have just the start symbol, you have succeeded in parsing the input string.

25 Reduction example
Grammar:
S -> aABe
A -> Abc | b
B -> d
Input: abbcbcde
Reduction steps: abbcbcde → aAbcbcde → aAbcde → aAde → aABe → S  <-- SUCCESS!
In reverse, the reduction traces out a rightmost derivation.
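The reduction steps above can be machine-checked with a small sketch (hypothetical code): each step must rewrite exactly one occurrence of some production's RHS to its LHS.

```python
PRODS = {"aABe": "S", "Abc": "A", "b": "A", "d": "B"}
STEPS = ["abbcbcde", "aAbcbcde", "aAbcde", "aAde", "aABe", "S"]

def reduces_to(s, t):
    """True if t is s with exactly one RHS occurrence replaced by its LHS."""
    for rhs, lhs in PRODS.items():
        i = s.find(rhs)
        while i != -1:                      # try every occurrence of rhs
            if s[:i] + lhs + s[i + len(rhs):] == t:
                return True
            i = s.find(rhs, i + 1)
    return False

# every adjacent pair in STEPS is one legal reduction
assert all(reduces_to(a, b) for a, b in zip(STEPS, STEPS[1:]))
```

Note that not every substring matching a RHS is a handle: reducing the second b of abbcbcde would lead to a dead end, which is exactly the point of the next slide.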

26 Handles The HANDLE is the part of a sentential form that gets reduced in a backwards rightmost derivation. Sometimes part of a sentential form will match a RHS in G, but if that string is NOT reduced in the backwards rightmost derivation, it is NOT a handle. Shift-reduce parsing, then, is really all about finding the handle at each step then reducing the handle. If we can always find the handle, we never have to backtrack. Finding the handle is called HANDLE PRUNING.

27 Shift-reduce parsing with a stack
A stack helps us find the handle for each reduction step. The stack holds grammar symbols. An input buffer holds the input string. $ marks the bottom of the stack and the end of input.
Algorithm:
- Shift 0 or more input symbols onto the stack, until a handle β is on top of the stack.
- Reduce β to the LHS of the appropriate production.
- Repeat until we see $S on the stack and $ in the input.

28 Shift-reduce example
Grammar:
E -> E + E
E -> E * E
E -> ( E )
E -> id
Input: w = id + id * id
STACK INPUT ACTION
1. $ id+id*id$ shift

29 Shift-reduce parsing actions
SHIFT: the next input symbol is pushed onto the stack.
REDUCE: when the parser knows the right end of a handle is on the stack, the handle is replaced with the corresponding LHS.
ACCEPT: announce success (input is $, stack is $S).
ERROR: the input contained a syntax error; call an error recovery routine.

30 Conflicts during shift/reduce parsing
Like predictive parsers, sometimes a shift-reduce parser won't know what to do.
A SHIFT/REDUCE conflict occurs when the parser can't decide whether to shift the input symbol or reduce the current top of stack.
A REDUCE/REDUCE conflict occurs when the parser doesn't know which of two or more rules to use for reduction.
A grammar whose shift-reduce parser contains such conflicts is said to be "not LR".

31 Example shift/reduce conflict
Ambiguous grammars are NEVER LR.
stmt -> if ( expr ) stmt
      | if ( expr ) stmt else stmt
      | other
If we have a shift-reduce parser in the configuration
STACK: ... if ( expr ) stmt    INPUT: else ... $
what should it do?
- We could reduce "if ( expr ) stmt" to "stmt" (assuming the else is part of a different surrounding if-else statement).
- We could also shift the "else" (assuming this else goes with the current if).

32 Example reduce/reduce conflict
Some languages use ( ) for function calls AND array references:
stmt -> id ( parameter_list )
stmt -> expr := expr
parameter_list -> parameter_list , parameter
parameter_list -> parameter
parameter -> id
expr -> id ( expr_list )
expr -> id
expr_list -> expr_list , expr
expr_list -> expr

33 Example reduce/reduce conflict
For the input A(I,J) we would get the token stream id(id,id). The first three tokens would certainly be shifted:
STACK: ... id ( id    INPUT: , id ) ...
The id on top of the stack needs to be reduced, but we have two choices: parameter -> id OR expr -> id. The stack gives no clues. To know which rule to use, we need to look up the first id in the symbol table to see if it is a procedure name or an array name. One solution is to have the lexer return "procid" for procedure names. Then the shift-reduce parser can look into the stack to decide which reduction to use.

34 LR (Bottom-Up) Parsers

35 Relationship between parser types

36 LR parsing
A major type of shift-reduce parsing is called LR(k):
- "L" means left-to-right scanning of the input
- "R" means rightmost derivation
- "k" means lookahead of k characters (if omitted, assume k = 1)
LR parsers have very nice properties:
- They can recognize almost all programming language constructs for which we can write a CFG.
- They are the most powerful type of shift-reduce parser, but they never backtrack, and are very efficient.
- They can parse a proper superset of the languages parsable by predictive parsers.
- They tell you as soon as possible when there's a syntax error.
DISADVANTAGE: hard to build by hand (we need something like yacc).

37 LR parsing

38 LR parsing
The parser's structure is similar to predictive parsing. The STACK now stores pairs (Xi, si), where Xi is a grammar symbol and si is a STATE. The parse table now has two parts: ACTION and GOTO.
- The action table specifies whether to SHIFT, REDUCE, ACCEPT, or flag an ERROR, given the state on the stack and the current input.
- The goto table specifies what state to go to after a reduction is performed.

39 Parser configurations
A CONFIGURATION of the LR parser is a pair (STACK, INPUT):
( s0 X1 s1 ... Xm sm,  ai ai+1 ... an $ )
The stack configuration is just a list of the states and grammar symbols currently on the stack. The input configuration is the list of unprocessed input symbols. Together, the configuration represents a right-sentential form
X1 ... Xm ai ai+1 ... an
(some intermediate step in a rightmost derivation of the input from the start symbol).

40 The LR parsing algorithm
At each step, the parser is in some configuration. The next move depends on reading ai from the input and sm from the top of the stack.
- If action[sm, ai] = shift s, we execute a SHIFT move, entering the configuration ( s0 X1 s1 ... Xm sm ai s,  ai+1 ... an $ ).
- If action[sm, ai] = reduce A -> β, we enter the configuration ( s0 X1 s1 ... Xm-r sm-r A s,  ai ai+1 ... an $ ), where r = |β| and s = goto[sm-r, A]. Note that a reduce move does not consume any input.
- If action[sm, ai] = accept, we're done.
- If action[sm, ai] = error, we call an error recovery routine.
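The driver loop above can be sketched in a few lines. This is hypothetical illustrative code using a tiny grammar, S -> ( S ) | x, whose hand-built SLR(1) table fits inline (not the expression grammar of the next slide); since each state determines the symbol under it, the stack can hold states alone.

```python
# ACTION[state, terminal] = ("s", state) | ("r", rule) | ("acc",)
ACTION = {
    (0, "("): ("s", 2), (0, "x"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "("): ("s", 2), (2, "x"): ("s", 3),
    (3, ")"): ("r", 2), (3, "$"): ("r", 2),
    (4, ")"): ("s", 5),
    (5, ")"): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}
PRODS = {1: ("S", 3),     # rule 1: S -> ( S ),  |RHS| = 3
         2: ("S", 1)}     # rule 2: S -> x,      |RHS| = 1

def lr_parse(tokens):
    input_ = tokens + ["$"]
    stack = [0]                            # states only
    i = 0
    while True:
        act = ACTION.get((stack[-1], input_[i]))
        if act is None:
            raise SyntaxError(f"error at token {input_[i]!r}")
        if act[0] == "s":                  # shift: push state, advance input
            stack.append(act[1]); i += 1
        elif act[0] == "r":                # reduce A -> beta: pop |beta|,
            A, r = PRODS[act[1]]           # then goto on A; input untouched
            del stack[len(stack) - r:]
            stack.append(GOTO[stack[-1], A])
        else:
            return True                    # accept
```

The same loop drives any LR parser; only the ACTION/GOTO tables change with the grammar.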

41 LR parsing example
Grammar:
1. E -> E + T
2. E -> T
3. T -> T * F
4. T -> F
5. F -> ( E )
6. F -> id

42 LR parsing example
CONFIGURATIONS:
STACK INPUT ACTION
0  id * id + id $  shift 5

43 LR grammars If it is possible to construct an LR parse table for G, we say “G is an LR grammar”. LR parsers DO NOT need to parse the entire stack to decide what to do (other shift-reduce parsers might). Instead, the STATE symbol summarizes all the information needed to make the decision of what to do next. The GOTO function corresponds to a DFA that knows how to find the HANDLE by reading the top of the stack downwards. In the example, we only looked at 1 input symbol at a time. This means the grammar is LR(1).

44 How to construct an LR parse table? We will look at 3 methods: Simple LR (SLR): simple but not very powerful Canonical LR: very powerful but too many states LALR: almost as powerful with many fewer states yacc uses the LALR algorithm.