CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.

CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Today’s Topics Questions / comments? Chapter 4 –Parsers Top down, recursive descent Bottom up parsers

Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Goals of the parser, given an input program: –Find all syntax errors; for each, produce an appropriate diagnostic message, and recover quickly - -- that is, be able to find as many syntax errors in the same pass through the code (is this easy?) –Produce the parse tree, or at least a trace of the parse tree, for the program Two categories of parsers –Top down parsers – build the tree from the root down to the leaves –Bottom up parsers – build the tree from the leaves upwards to the root.

Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Top down - produce the parse tree, beginning at the root –Order is that of a leftmost derivation Bottom up - produce the parse tree, beginning at the leaves –Order is that of the reverse of a rightmost derivation The parsers we'll be dealing with look only one token ahead in the input. –what does that mean again?

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 OK, as input we have a possible sentence (program) in the language that may or may not be syntactically correct. The lexical analyzer is called by the parser to get the next lexeme and associated token code from the input sentence (program.)

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 A step in the life of a top down parser... Think of this now as during some arbitrary step in the top down parser we have a sentential form that is part of a leftmost derivation, the goal is to find the next sentential form in that leftmost derivation. e.g. Given a left sentential form xA –where x is a string of terminals –A is a non-terminal – is a string of terminals and non-terminals Since we want a leftmost derivation, what would we do next?

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 We have a left sentential form xA (some string in the derivation) Expand A (the leftmost non-terminal) using a production that has A as its LHS. How might we do this?

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 We have a left sentential form xA Expand A (the leftmost non-terminal) using a production that has A as its LHS. How might we do this? –We would call the lexical analyzer to get the next lexeme. –Then determine which production to expand based on the lexeme/token returned. –For grammars that are “one lookahead”, exactly one RHS of A will have as its first symbol the lexeme/token returned. If there is no RHS of A with its first symbol being the lexeme/token returned --- what happened?

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Top down parsers can be implemented as recursive descent parsers --- that is, a program based directly on the BNF description of the language. Or they can be represented by a parsing table that contains all the information in the BNF rules in table form. These top down parsers are called LL algorithms --- L for left-to-right scan of the input and L for leftmost derivation.

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Left recursion is problematic for top-down parsers. e.g. -> b Each non-terminal is written as a function in the parser. So, a recursive descent parser function for this production would continually call itself first and never get out of the recursion. Similarly, for any top-down parser. Left recursion is problematic for top-down parsers even if it is indirect. e.g. -> b -> a Bottom up parsers do not have this problem with left recursion.

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 When a nonterminal has more than one RHS, the first terminal symbol that can be generated in a derivation for each of them must be unique to that RHS. If it is not unique, then top-down parsing with one lookahead is impossible for that grammar. Because there is only one lookahead symbol, which lex would return to the parser, that one symbol has to uniquely determine which RHS to choose.

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Example: Here are two productions in a possible grammar: -> identifier | identifier [ ] Assuming we have a left sentential form where the next nonterminal to be expanded is and we call out to lex() to get the next token and it returns identifier. Which production will we pick?

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 The pairwise disjointness test is used on grammars to determine if they have this problem. What it does is, it determines the first symbol for every RHS of a production and for those with the same LHS, the first symbols must be different for each. -> a | b b | c | d The 4 RHSs of A here each have as their first symbol {a}, {b}, {c}, {d} respectively. These sets are all disjoint so these productions would pass the test. The ones on the previous slide would not. Note the use of curly braces on this slide is NOT the EBNF usage but the standard set usage of braces.

Top Down Parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Note: if the first symbol on the RHS is a nonterminal, we would need to expand it to determine the first symbol. e.g. -> a | b b | c | d -> f -> s Here, to determine the first symbol of the first RHS of A, we need to follow then to find out that it is s. So, the 4 RHSs of A here each have as their first symbol {s}, {b}, {c}, {d} respectively. These sets are all disjoint so these productions would pass the test.

Recursive descent example Michael Eckmann - Skidmore College - CS 330 - Fall 2007 So called because many of its subroutines/functions (one for each non-terminal) are recursive and descent because it is a top-down parser. EBNF is well suited for these parsers because EBNF reduces the number of non-terminals (as compared to plain BNF.) A grammar for simple expressions:  {(+ | -) }  {(* | /) }  id | ( ) Anyone remember what the curly braces mean in EBNF?

Recursive descent example Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Assume we have a lexical analyzer named lex, which puts the next token code in nextToken The coding process when there is only one RHS: –For each terminal symbol in the RHS, compare it with the next input token; if they match, continue, else there is an error –For each nonterminal symbol in the RHS, call its associated parsing subprogram

Recursive descent example Michael Eckmann - Skidmore College - CS 330 - Fall 2007 /* Function expr (written in the C language) Parses strings in the language generated by the rule: → {(+ | -) } */ void expr() { /* Parse the first term */ term(); /* term() is it's own function representing the non-terminal */ /* As long as the next token is + or -, call lex to get the next token, and parse the next term */ while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) { lex(); term(); }

Recursive descent example Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Note: by convention each function leaves the most recent token in nextToken A nonterminal that has more than one RHS requires an initial process to determine which RHS it is to parse –The correct RHS is chosen on the basis of the next token of input (the lookahead) –The next token is compared with the first token that can be generated by each RHS until a match is found –If no match is found, it is a syntax error Let's now look at an example of how the parser would handle multiple RHS's.

Recursive descent example Michael Eckmann - Skidmore College - CS 330 - Fall 2007 What does each function do by convention?

Recursive descent example Michael Eckmann - Skidmore College - CS 330 - Fall 2007 What does each function do by convention? each function leaves the most recent token in nextToken This is important!

Recursive descent example Michael Eckmann - Skidmore College - CS 330 - Fall 2007 /* Function factor Parses strings in the language generated by the rule: -> id | ( ) */ void factor() { if (nextToken) == ID_CODE) /*Determine which RHS*/ /* For the RHS id, just call lex */ lex(); /* If the RHS is ( ) – call lex to pass over the leftparenthesis, call expr, and check for the right parenthesis */ else if (nextToken == LEFT_PAREN_CODE) { lex(); expr(); if (nextToken == RIGHT_PAREN_CODE) lex(); else error(); } /* End of else if (nextToken ==... */ else error(); /* Neither RHS matches */ }

Recursive descent example Michael Eckmann - Skidmore College - CS 330 - Fall 2007 The term function for the non-terminal will be similar to the expr function for the non-terminal. See the text for this function.

Bottom Up parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Bottom-up parsers –Given a right sentential form, , determine what substring of  is the right-hand side of the rule in the grammar that must be reduced to produce the previous sentential form in the right derivation –This substring is called the handle –This is backwards from the way top down parsing works –The most common bottom-up parsing algorithms are in the LR family (L=left-to-right scan of input, R= rightmost derivations)

Bottom Up parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Bottom-up parsers –The whole process is: Starting with the input sentence (all terminal symbols) (also this is the last sentential form in a derivation), produce the sequence of sentential forms (up) until all that remains is the start symbol. –One step in the process of bottom-up parsing is: given a right sentential form, find the correct RHS of a production to reduce to get the previous right- sentential form in the derivation –What are we reducing a RHS to?

Bottom Up parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Example: E -> E + T | T T -> T * F | F F -> ( E ) | id E => E + T => E + T * F => E + T * id => E + F * id => E + id * id => T + id * id => F + id * id => id + id * id Read from the bottom going up. Remember --- we start with the sentence (in this case id + id + id.) The underlined symbol is the one that gets reduced.

Bottom Up parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Example: E -> E + T | T T -> T * F | F F -> ( E ) | id E => E + T => E + T * F => E + T * id => E + F * id => E + id * id => T + id * id => F + id * id => id + id * id Note: Sometimes it is not obvious which RHS to reduce in each step. For instance, the right-sentential form E + T * id includes more than one RHS. Which RHSs are included in this sentential form? And why can we not choose the others?

Bottom Up parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 The correct choice of the RHS to reduce in any given step is called the handle (of that right-sentential form.) The text contains intuition on how to find the handle --- we won't cover that here. Just know that the handle of any right-sentential form is unique. To decide what part of the rightmost sentential form is the handle, one should understand what a phrase is and what a simple phrase is. Then the handle is defined to be the leftmost simple phrase within a right sentential form.

Bottom Up parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Most bottom-up parsers are LR --- what is LR again? They use small programs and a parsing table.

Bottom Up parsers Michael Eckmann - Skidmore College - CS 330 - Fall 2007 LR parsers consist of –A parse stack –An input string (the sentence to be determined if it is syntactically correct) –A parse table created from the grammar beforehand. Its rows are states, and its columns are terminal and nonterminal symbols. –So, given an input symbol and a state, the program will lookup in the table what to do.

arithmetic expression grammar Michael Eckmann - Skidmore College - CS 330 - Fall 2007 1. E -> E + T 2. E -> T 3. T -> T * F 4. T -> F 5. F -> ( E ) 6. F -> id For next slide, R means reduce, S means shift. e.g. R4 means reduce using production 4 above. S6 means shift the next symbol of input onto the stack and push state 6 onto the stack.

LR parsing table for arithmetic expression grammar Michael Eckmann - Skidmore College - CS 330 - Fall 2007

Error in table Michael Eckmann - Skidmore College - CS 330 - Fall 2007 The S4 in state 0 row should be under the left paren, not the *.

ACTION & GOTO Michael Eckmann - Skidmore College - CS 330 - Fall 2007 The GOTO portion of the table is used to determine which state to push onto the stack after a reduction. The ACTION portion of the table is used to determine whether to shift or reduce based on the current state and next input symbol. Blank cells in the table imply syntax errors. accept means the sentence is syntactically correct. The stack starts with only state 0 on it. The input string is the complete sentence followed by a termination symbol, usually a $.

LR Parser structure Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Top of stack is to the right, the next input symbol for the input is the leftmost symbol. S's are State #'s and X's are grammar symbols.

Let's go through an example parse Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Before we go through an example parse a few things should be stated. Recall that a shift pushes the next input symbol on the stack and then pushes the specified state on the stack as well. That's pretty straightforward.

Let's go through an example parse Michael Eckmann - Skidmore College - CS 330 - Fall 2007 A reduce is more complex. When a reduce occurs, 1. the handle (the whole RHS) is popped off the stack (along with each state per symbol in the handle) 2. then the LHS is pushed onto the stack 3. followed by another state pushed onto the stack. The state to be pushed is determined by the GOTO portion of the parse table. –column is the LHS just pushed –row is the state that was on the top of the stack after the handle and it's associated states were popped.

Let's go through an example parse Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Now we're ready to go through an example. We'll determine if id + id * id is syntactically correct for the example grammar a few slides ago. We'll do this by viewing the table on screen and I'll show the stack and input and action on the board. Also, I'll write on the board what happens during a shift and a reduce.

Let's go through an example parse Michael Eckmann - Skidmore College - CS 330 - Fall 2007 How about we determine if ( id ) id + id is syntactically correct for the example grammar a few slides ago. We'll do this by viewing the table on screen and I'll show the stack and input and action on the board. Also, I'll write on the board what happens during a shift and a reduce.

CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.

Similar presentations

Presentation on theme: "CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.

Similar presentations

Presentation on theme: "CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann."— Presentation transcript:

Similar presentations

About project

Feedback