Parsing. Goals of Parsing Check the input for syntactic accuracy Return appropriate error messages Recover if possible Produce, or at least traverse,

Parsing

Goals of Parsing
- Check the input for syntactic accuracy
- Return appropriate error messages
- Recover if possible
- Produce, or at least traverse, a complete parse tree
- The parse tree (or trace) is the basis for translation

Top-down Parsers
- The parse tree is built from the root down to the leaves
- Builds the parse tree in preorder
- Corresponds to a leftmost derivation
- Parsing decision problem: choosing the correct rule to expand the leftmost non-terminal
- Two most common algorithms:
  - Recursive descent, implemented directly in code
  - Table-driven implementation
- Both are LL algorithms (Left-to-right scan, Leftmost derivation)

Bottom-up Parsers
- The parse tree is built from the leaves up to the root
- Builds the parse tree in the reverse order of a rightmost derivation
- Requires finding the handle, that is, the RHS in the current sentential form that must be reduced next
- Most common algorithms are LR (Left-to-right scan, Rightmost derivation in reverse)

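Complexity
- The most general parsing algorithms work for any unambiguous grammar, but they are complicated and inefficient: O(n^3), where n is the length of the input
- Practical parsers trade generality for efficiency: they work only for a subset of all possible grammars
- Parsers in commercial compilers run in linear time, O(n)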

Recursive Descent
- The parser is made up of a collection of subprograms, one for each non-terminal
- Each subprogram is responsible for generating the parse tree rooted at its non-terminal
- It pulls tokens from the tokenizer and leaves the first token that is not part of its rule in nextToken
- If multiple rules are associated with the current non-terminal, the subprogram must first determine which rule to apply, usually by examining nextToken

Function Factor

    /* <factor> ::= id | ( <expr> )
       Assumes nextToken, lex(), expr() and error() are defined elsewhere. */
    void factor(void) {
        if (nextToken == ID_CODE)
            lex();
        else if (nextToken == LEFT_PAREN_CODE) {
            lex();
            expr();
            if (nextToken == RIGHT_PAREN_CODE)
                lex();
            else
                error();   /* missing ')' */
        }
        else
            error();       /* Neither RHS matches */
    }

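<ifstmt> ::= if ( <boolexpr> ) <statement> [else <statement>]

    /* Parses an if statement; the else clause is optional */
    void ifstmt(void) {
        if (nextToken != IF_CODE)
            error();
        else {
            lex();
            if (nextToken != LEFT_PAREN_CODE)
                error();
            else {
                lex();
                boolexpr();
                if (nextToken != RIGHT_PAREN_CODE)
                    error();
                else {
                    lex();
                    statement();
                    if (nextToken == ELSE_CODE) {   /* optional else clause */
                        lex();
                        statement();
                    }
                }
            }
        }
    }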

Grammar Restrictions
- Left recursion is a problem: for a rule such as A ::= A + B, the subprogram for A calls itself before consuming any input, so parsing would never terminate!
- In some cases, left recursion can be eliminated by refactoring the grammar:
  E ::= E + T | T
  becomes
  E ::= T E'
  E' ::= + T E' | ε

Grammar Restrictions (continued)
- The parser must be able to choose the correct production based on a single next token
- The pairwise disjointness test indicates whether or not this choice can be made: for each non-terminal, the first terminal that can be generated from each of its rules (the rule's FIRST set) must be unique, i.e. the FIRST sets of its RHSs must not overlap
- Example 1:
  A ::= aB | bAb | Bb
  B ::= cB | d
  FIRST sets of A's RHSs: {a}, {b}, {c, d}
  Disjoint, so recursive descent parsable
- Example 2:
  A ::= aB | Bab
  B ::= aB | b
  FIRST sets of A's RHSs: {a}, {a, b}
  Not disjoint, so not recursive descent parsable

Table-driven Parsers
- Encode the production choice in a table
  - Rows indicate the current top of the stack
  - Columns indicate each input token
  - Each entry in the matrix gives the production number to apply
- Preferred for large grammars: the algorithm is fixed, and only the table size grows

Bottom-up Parsing
- Often called shift-reduce algorithms
- An integral piece of every bottom-up parser is a stack
  - Shift moves the next input token onto the stack
  - Reduce replaces a RHS on the top of the stack with the corresponding LHS
- Most bottom-up parsing algorithms are variations of the LR process
  - Originally designed by Donald Knuth
  - Consists of a relatively small program and a parsing table

Advantages of LR Parsers
- They work for nearly all grammars that describe programming languages
- They work on a larger class of grammars than other bottom-up algorithms, yet are as efficient as any other bottom-up parser
- They detect syntax errors as soon as it is possible to do so in a left-to-right scan
- The LR class of grammars is a superset of the class parsable by LL parsers

Disadvantages
- For anything but very small grammars, it is difficult to produce the parsing table by hand
  - But this is exactly what tools such as yacc and bison do for us automatically!
- The original version was computationally intensive, in terms of both time and memory
- Variations (such as SLR and LALR) were developed that require fewer computer resources but are not as general

Key Insight
- A bottom-up parser can use the entire history of the parse, up to the current point, to make parsing decisions
- There are only a finite and relatively small number of different parse situations that could have occurred, so the history can be stored in a parser state on the parse stack

Parser Configuration
- Made up of both the stack and the remaining input
- For each state on the stack, there is an associated grammar symbol, e.g.
  $(S_0 X_1 S_1 X_2 S_2 \ldots X_m S_m,\ a_i a_{i+1} \ldots a_n \$)$
  where $S_i$ indicates a state and $X_i$ indicates a grammar symbol
- Initial configuration: $(S_0,\ a_0 \ldots a_n \$)$

Table-driven Bottom-up Parsing
The table has two components:
- ACTION table: specifies the action of the parser, given the parser state and the next token
  - Rows are state names; columns are terminals
- GOTO table: specifies which state to push onto the stack after a reduce operation
  - Rows are state names; columns are non-terminals

Structure of an LR parser

Parser Actions
- If $\mathrm{ACTION}[S_m, a_i] = \text{Shift } S$, the next configuration is
  $(S_0 X_1 S_1 X_2 S_2 \ldots X_m S_m\, a_i S,\ a_{i+1} \ldots a_n \$)$
- If $\mathrm{ACTION}[S_m, a_i] = \text{Reduce } A \to \beta$ and $S = \mathrm{GOTO}[S_{m-r}, A]$, where $r$ = the length of $\beta$, the next configuration is
  $(S_0 X_1 S_1 X_2 S_2 \ldots X_{m-r} S_{m-r}\, A S,\ a_i a_{i+1} \ldots a_n \$)$
- If $\mathrm{ACTION}[S_m, a_i] = \text{Accept}$, the parse is complete and no errors were found
- If $\mathrm{ACTION}[S_m, a_i] = \text{Error}$, the parser calls an error-handling routine

Example Grammar
1. E ::= E + T
2. E ::= T
3. T ::= T * F
4. T ::= F
5. F ::= ( E )
6. F ::= id

Example LR Parsing Table (for productions 1-6 of the example grammar above)

Trace of the parse of id + id * id

Stack           Input            Action
0               id + id * id $   Shift 5
0id5            + id * id $      Reduce 6 (use GOTO[0, F])
0F3             + id * id $      Reduce 4 (use GOTO[0, T])
0T2             + id * id $      Reduce 2 (use GOTO[0, E])
0E1             + id * id $      Shift 6
0E1+6           id * id $        Shift 5
0E1+6id5        * id $           Reduce 6 (use GOTO[6, F])
0E1+6F3         * id $           Reduce 4 (use GOTO[6, T])
0E1+6T9         * id $           Shift 7
0E1+6T9*7       id $             Shift 5
0E1+6T9*7id5    $                Reduce 6 (use GOTO[7, F])
0E1+6T9*7F10    $                Reduce 3 (use GOTO[6, T])
0E1+6T9         $                Reduce 1 (use GOTO[0, E])
0E1             $                Accept
