Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter 2.2.5
Efficient Parsers Pushdown automata Deterministic Report an error as soon as the input is not a prefix of a valid program Not usable for all context free grammars bison context free grammar tokens parser “Ambiguity errors” parse tree
Kinds of Parsers Top-Down (Predictive Parsing) LL –Construct parse tree in a top-down matter –Find the leftmost derivation –For every non-terminal and token predict the next production Bottom-Up LR –Construct parse tree in a bottom-up manner –Find the rightmost derivation in a reverse order – For every potential right hand side and token decide when a production is found
Bottom-Up Syntax Analysis Input –A context free grammar –A stream of tokens Output –A syntax tree or error Method –Construct parse tree in a bottom-up manner –Find the rightmost derivation in (reversed order) –For every potential right hand side and token decide when a production is found –Report an error as soon as the input is not a prefix of valid program
Plan Pushdown automata Bottom-up parsing (informal) Bottom-up parsing (given a parser table) Constructing the parser table Interesting non LR grammars
Pushdown Automaton control parser-table input stack $ $utw V
Informal Example S E$ E T | E + T T i | ( E ) i + i $ input $ stack tree input + i $ i$i$ stack tree i shift
Informal Example S E$ E T | E + T T i | ( E ) input + i $ i$i$ stack tree i reduce T i input + i $ T$T$ stack tree i T
Informal Example S E$ E T | E + T T i | ( E ) input + i $ T$T$ stack tree i reduce E T T input + i $ E$E$ stack tree i T E
Informal Example S E$ E T | E + T T i | ( E ) shift input + i $ E$E$ stack tree i T E input i $ +E$+E$ stack tree i T E +
Informal Example S E$ E T | E + T T i | ( E ) shift input i $ +E$+E$ stack tree i T E + input $ i+ E $ stack tree i T E + i
Informal Example S E$ E T | E + T T i | ( E ) reduce T i input $ i+ E $ stack tree i T E + i input $ T+E$T+E$ stack tree i T E + i T
Informal Example S E$ E T | E + T T i | ( E ) reduce E E + T input $ T+E$T+E$ stack tree i T E + i T input $ E$E$ stack tree i T E + T E
Informal Example S E$ E T | E + T T i | ( E ) input $ E$E$ stack tree i T E + T E shift input $E$$E$ stack tree i T E + T E
Informal Example S E$ E T | E + T T i | ( E ) input $ $E$$E$ stack tree i T E + T E reduce Z E $
Informal Example reduce E E + T reduce T i reduce E T reduce T i reduce Z E $
Handles Identify the leftmost node (nonterminal) that has not been constructed but all whose children have been constructed –A
Identifying Handles Create a finite state automaton over grammar symbols –Accepting states identify handles Report if the grammar is inadequate Push automaton states onto parser stack Use the automaton states to decide –Shift/Reduce
Example Finite State Automaton
Example Control Table
i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E) Example Control Table (2)
i + i $ input s0($) stack shift 5 stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
+ i $ input s5 (i) s0 ($) reduce T i stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
+ i $ input s6 (T) s0 ($) reduce E T stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
+ i $ input s1(E) s0 ($) shift 3 stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
i $ input s3 (+) s1(E) s0 ($) shift 5 stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
$ input s5 (i) s3 (+) s1(E) s0($) reduce T i i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
$ input s4 (T) s3 (+) s1(E) s0($) reduce E E + T stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
$ input s1 (E) s0 ($) shift 2 stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
input s2 ($) s1 (E) s0 ($) reduce Z E $ stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
((i) $ input s0($) shift 7 stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
(i) $ input s7(() s0($) shift 7 stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
i) $ input s7 (() s0($) shift 5 stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
) $ input s5 (i) s7 (() s0($) reduce T i stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
) $ input s6 (T) s7 (() s0($) reduce E T stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
) $ input s8 (E) s7 (() s0($) shift 9 stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
$ input s9 ()) s8 (E) s7 (() s0($) reduce T ( E ) stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
$ input s6 (T) s7(() s0($) reduce E T stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
$ input s8 (E) s7(() s0($) err stack i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
Grammar Hierarchy Non-ambiguous CFG CLR(1) LALR(1) SLR(1) LL(1) LR(0)
Constructing LR(0) parsing table Add a production S’ S$ Construct a finite automaton accepting “valid stack symbols” States are set of items A –The states of the automaton becomes the states of parsing-table –Determine shift operations –Determine goto operations –Determine reduce operations
Example Finite State Automaton
Constructing a Finite Automaton NFA –For X X 1 X 2 … X n X X 1 X 2 …X i X i+1 … X n –“prefixes of rhs (handles)” –X 1 X 2 … X i is at the top of the stack and we expect X i+1 … X n The initial state S’ S$ – (X X 1 …X i X i+1 … X n, X i+1 ) = X X 1 …X i X i+1 … X n –For every production X i+1 (X X 1 X 2 …X X i+1 … X n, ) = X i+1 Convert into DFA
S E$ E T | E + T T i | ( E ) S E $ E T E E + T T i T ( E ) E T T i T i T ( E ) ( T (E ) ) E E + T E E E + T + E E + T T S E $ E S E $ $ T (E ) E
Example Finite State Automaton
Filling Parsing Table A state s i reduce A –A s i Shift –A t s i Goto(s i, X) = s j –A X s i – (s i, X) = s j When conflicts occurs the grammar is not LR(0)
Example Finite State Automaton
Example Control Table i+()$ET 0s5errs7err 16 1 s3err s2 2acc 3s5errs7err 4 4 reduce E E+T 5 reduce T i 6 reduce E T 7s5errs7err 86 8 s3errs9err 9 reduce T (E)
Example S E $ E E + E | i S E$ E E+E E i 0 S E $ E E +E 2 E E E+ E E E+E E i 4 + E E+ E E E +E 5 E + E i i 1 S E $ $ 3 i
S E$ E E+E E i 0 S E $ E E +E 2 E E E+ E E E+E E i 4 + E E+ E E E +E 5 E + E i i 1 S E $ $ 3 i i+$E 0s1err 2 1 reduce E i 2errs4s3 3accept 4s15 5reds4/redred
Interesting Non LR(0) Grammar S’ S$ S L = R | R L *R | id R L S’ S$ S L=R S R L *R L id R L S L =R R L L = Partial DFA S L= R R L L *R L id
LR(1) Parser Item A , t – is at the top of the stack and we are expecting t LR(1) State –Sets of items LALR(1) State –Merge items with the same look-ahead
Interesting Non LR(1) Grammars Ambiguous –Arithmetic expressions –Dangling-else Common derived prefix –A B 1 a b | B 2 a c –B 1 –B 2 Optional non-terminals –St OptLab Ass –OptLab id : | –Ass id := Exp
A motivating example Create a desk calculator Challenges –Non trivial syntax –Recursive expressions (semantics) Operator precedence
Solution (lexical analysis) /* desk.l */ % [0-9]+ { yylval = atoi(yytext); return NUM ;} “+” { return PLUS ;} “-” { return MINUS ;} “/” { return DIV ;} “*” { return MUL ;} “(“ { return LPAR ;} “)” { return RPAR ;} “//”.*[\n] ; /* comment */ [\t\n ] ; /* whitespace */. { error (“illegal symbol”, yytext[0]); }
Solution (syntax analysis) /* desk.y */ %token NUM %left PLUS, MINUS %left MUL, DIV %token LPAR, RPAR %start P % P : E {printf(“%d\n”, $1) ;} ; E : NUM { $$ = $1 ;} | LPAR e RPAR { $$ = $2; } | e PLUS e { $$ = $1 + $3 ; } | e MINUS e { $$ = $1 - $3 ; } | e MUL e { $$ = $1 * $3; } | e DIV e {$$ = $1 / $3; } ; % #include “lex.yy.c” flex desk.l bison desk.y cc y.tab.c –ll -ly
Summary LR is a powerful technique Generates efficient parsers Generation tools exit –Bison, yacc, CUP But some grammars need to be tuned –Shift/Reduce conflicts –Reduce/Reduce conflicts –Efficiency of the generated parser