Syntax Analysis (chapter 4) SLR, LR1, LALR: Improving the parser From the previous examples: => LR0 parsing is rather weak. Cannot handle many languages.

Why? And how can we improve it?
Example (recap):
    S ::= E $
    E ::= T | E + T
    T ::= i | ( E ) | i [ E ]
This grammar has an LR0 shift-reduce conflict in the state containing the items:
    T ::= i •
    T ::= i • [ E ]
Q: How could we decide between the two actions?

SLR: an improved LR0 parser
LR0 parsing is rather weak. It cannot handle many languages.
Why? Because it uses a lookahead of 0 tokens to determine the next action (i.e. the parser-action decision is based only on the parser state).
Example (recap):
    S ::= E $
    E ::= T | E + T
    T ::= i | ( E ) | i [ E ]
Conflicting items:
    T ::= i •
    T ::= i • [ E ]
Example: Which tokens can follow a T?
SLR parsing: use the LR0 GOTO table, but only reduce to a non-terminal if the next input symbol can follow that non-terminal.
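The question "which tokens can follow a T?" can be answered mechanically. The following is a minimal sketch, not part of the original slides: the class and method names are made up for illustration, and it assumes the grammar has no empty productions (true for the example grammar), so only the first symbol after a non-terminal has to be inspected.

```java
import java.util.*;

class SlrFollow {
    // Grammar from the slides; each row is {lhs, rhs symbols...}.
    static final String[][] PRODUCTIONS = {
        {"S", "E", "$"},
        {"E", "T"},
        {"E", "E", "+", "T"},
        {"T", "i"},
        {"T", "(", "E", ")"},
        {"T", "i", "[", "E", "]"},
    };
    static final Set<String> NONTERMINALS = Set.of("S", "E", "T");

    // Tokens that can start symbol x, given the FIRST sets computed so far.
    static Set<String> startersOf(String x, Map<String, Set<String>> first) {
        return NONTERMINALS.contains(x) ? new TreeSet<>(first.get(x)) : Set.of(x);
    }

    // FIRST sets by fixpoint iteration (assumes no empty productions).
    static Map<String, Set<String>> firstSets() {
        Map<String, Set<String>> first = new TreeMap<>();
        for (String n : NONTERMINALS) first.put(n, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODUCTIONS) {
                changed |= first.get(p[0]).addAll(startersOf(p[1], first));
            }
        }
        return first;
    }

    // FOLLOW sets by fixpoint iteration.
    static Map<String, Set<String>> followSets() {
        Map<String, Set<String>> first = firstSets();
        Map<String, Set<String>> follow = new TreeMap<>();
        for (String n : NONTERMINALS) follow.put(n, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODUCTIONS) {
                for (int k = 1; k < p.length; k++) {
                    if (!NONTERMINALS.contains(p[k])) continue;
                    Set<String> add = (k + 1 < p.length)
                            ? startersOf(p[k + 1], first)      // next symbol starts what follows
                            : new TreeSet<>(follow.get(p[0])); // at the end: inherit FOLLOW(lhs)
                    changed |= follow.get(p[k]).addAll(add);
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        System.out.println("FOLLOW(T) = " + followSets().get("T"));
    }
}
```

For this grammar the follow set of T contains $, +, ) and ], but not [. So in the conflicting state an SLR parser shifts when the next token is [ and reduces with T ::= i otherwise.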

LR1: LR items with lookahead information
SLR parsing is better than LR0, but still rather weak. It still cannot handle many languages.
Why? The lookahead decision is based on what can follow a particular non-terminal anywhere in the grammar; it does not take the context into account.
LR1 parsing: similar to LR0, but based on LR1 items.
An LR1 item looks like this:
    N ::= α • β {lookahead set}
In the next slides we'll construct a small part of the LR1 items of our simple example expression language as we discuss a parsing example.

Notation annoyances
LR1 items have three different concrete syntaxes:
    1) N ::= α • β { x y z }
    2) N ::= α • β , x y z
    3) N ::= α • β , x
       N ::= α • β , y
       N ::= α • β , z
Why? Historical accident / different views.
What to do? Live with it!

LR1: Example
Recap: the grammar
    S ::= E $
    E ::= T | E + T
    T ::= i | ( E ) | i [ E ]
Q: What is the starting set s0 of the handle-matching DFA?
A: The item S ::= • E $ {$} and its epsilon closure. Following the first productions gives:
    S ::= • E $ {$}
    E ::= • T {$}
    E ::= • E + T {$}
    T ::= • i {$}
    T ::= • ( E ) {$}
    T ::= • i [ E ] {$}
(Closing over E ::= • E + T then also adds + to the lookahead sets of the E and T items.)

Epsilon Closure for LR1 Items
Algorithm:
For each item of the form
    M ::= α • N β {γ}
find each grammar production of the form
    N ::= δ
and add a new item
    N ::= • δ {starters[β {γ}]}
Repeat this until no more new items can be added.

starters[β {γ}]???
starters[β {γ}] is the union over all elements s in γ of starters[β s].
Example:
    starters[β {x y z}] = starters[β x] ∪ starters[β y] ∪ starters[β z]
Note that these lookahead sets will always have at least one terminal in them. Why?
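The closure algorithm can be sketched in Java as follows. This is an illustration under stated assumptions, not code from the course: an item is encoded as a hypothetical pair (production index, dot position) mapped to its lookahead set, and because the example grammar has no empty productions, starters[β s] is just the starters of the first symbol of β, or s itself when β is empty.

```java
import java.util.*;

class Lr1Closure {
    // Productions of the slide grammar as {lhs, rhs...}; none of them is empty.
    static final String[][] P = {
        {"S", "E", "$"}, {"E", "T"}, {"E", "E", "+", "T"},
        {"T", "i"}, {"T", "(", "E", ")"}, {"T", "i", "[", "E", "]"},
    };
    static final Set<String> NT = Set.of("S", "E", "T");
    static final Map<String, Set<String>> FIRST = computeFirst();

    // Starter sets of the non-terminals, by fixpoint iteration.
    static Map<String, Set<String>> computeFirst() {
        Map<String, Set<String>> first = new HashMap<>();
        for (String n : NT) first.put(n, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : P) {
                Set<String> add = NT.contains(p[1]) ? new TreeSet<>(first.get(p[1])) : Set.of(p[1]);
                changed |= first.get(p[0]).addAll(add);
            }
        }
        return first;
    }

    // An item is keyed by (production index, dot position); the value is its lookahead set.
    static Map<List<Integer>, Set<String>> closure(Map<List<Integer>, Set<String>> kernel) {
        Map<List<Integer>, Set<String>> items = new LinkedHashMap<>();
        kernel.forEach((k, v) -> items.put(k, new TreeSet<>(v)));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (List<Integer> key : new ArrayList<>(items.keySet())) {
                String[] prod = P[key.get(0)];
                int pos = key.get(1) + 1; // index in prod of the symbol after the dot
                if (pos >= prod.length || !NT.contains(prod[pos])) continue;
                // Lookaheads of the new items: starters[beta s] for every s in the lookahead set.
                Set<String> look = new TreeSet<>();
                if (pos + 1 < prod.length) {          // beta is non-empty: its first symbol decides
                    String b = prod[pos + 1];
                    look.addAll(NT.contains(b) ? FIRST.get(b) : Set.of(b));
                } else {                              // beta is empty: inherit the lookaheads
                    look.addAll(items.get(key));
                }
                for (int q = 0; q < P.length; q++) {
                    if (!P[q][0].equals(prod[pos])) continue;
                    Set<String> target = items.computeIfAbsent(List.of(q, 0), x -> new TreeSet<>());
                    changed |= target.addAll(look);
                }
            }
        }
        return items;
    }

    // Prints an item with "." marking the dot position.
    static String show(List<Integer> key, Set<String> look) {
        String[] prod = P[key.get(0)];
        StringBuilder sb = new StringBuilder(prod[0] + " ::=");
        for (int k = 1; k <= prod.length; k++) {
            if (k == key.get(1) + 1) sb.append(" .");
            if (k < prod.length) sb.append(' ').append(prod[k]);
        }
        return sb + " " + look;
    }

    public static void main(String[] args) {
        // Kernel of s0: S ::= . E $ with lookahead {$}.
        closure(Map.of(List.of(0, 0), Set.of("$")))
            .forEach((key, look) -> System.out.println(show(key, look)));
    }
}
```

Run on the kernel item S ::= • E $ {$}, the sketch yields six items; the E and T items end up with the lookahead set {$ +}, because E ::= • E + T contributes + as a possible follower of E.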

LR1: Parsing Example
Input: i [ i ] $
Stack: s0
s0:
    S ::= • E $ {$}
    E ::= • T {$ +}
    E ::= • E + T {$ +}
    T ::= • i {$ +}
    T ::= • ( E ) {$ +}
    T ::= • i [ E ] {$ +}
Q: What state do we arrive in after shifting an i token?

LR1: Example
Stack: s0 i s1
Remaining input: [ i ] $
s1:
    T ::= i • {$ +}
    T ::= i • [ E ] {$ +}
Q: Shift or reduce? (Is there a shift-reduce conflict here?)

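LR1: Example
The next token [ is not in the lookahead set {$ +} of T ::= i •, so there is no conflict: the parser shifts.
Stack: s0 i s1 [ s2
Remaining input: i ] $
s1:
    T ::= i • {$ +}
    T ::= i • [ E ] {$ +}
s2 (kernel item):
    T ::= i [ • E ] {$ +}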

LR1: Example
After shifting i:
Stack: s0 i s1 [ s2 i s3
Remaining input: ] $
s2:
    T ::= i [ • E ] {$ +}
    E ::= • T {] +}
    E ::= • E + T {] +}
    T ::= • i {] +}
    T ::= • ( E ) {] +}
    T ::= • i [ E ] {] +}
(+ enters these lookahead sets through the closure of E ::= • E + T.)
s3:
    T ::= i • {] +}
    T ::= i • [ E ] {] +}

LR1: Example
After reducing with T ::= i:
Sentential form: i [ T ] $
Stack: s0 i s1 [ s2 T s4
Remaining input: ] $
s4:
    E ::= T • {] +}

LR1: Example
After reducing with E ::= T:
Sentential form: i [ E ] $
Stack: s0 i s1 [ s2 E s5
Remaining input: ] $
s5:
    T ::= i [ E • ] {$ +}
    E ::= E • + T {] +}

LR1: Example
After shifting ]:
Stack: s0 i s1 [ s2 E s5 ] s6
Remaining input: $
s6:
    T ::= i [ E ] • {$ +}

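LR1: Example
After reducing with T ::= i [ E ]:
Sentential form: T $
Stack: s0 T s7
Remaining input: $
Parse tree so far: T( i [ E( T( i ) ) ] )
s0:
    S ::= • E {$}
    E ::= • T {$ +}
    E ::= • E + T {$ +}
    T ::= • i {$ +}
    T ::= • ( E ) {$ +}
    T ::= • i [ E ] {$ +}
s7:
    E ::= T • {$ +}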

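LR1: Example
After reducing with E ::= T:
Sentential form: E $
Stack: s0 E s8
Remaining input: $
Parse tree so far: E( T( i [ E( T( i ) ) ] ) )
s0:
    S ::= • E {$}
    E ::= • T {$ +}
    E ::= • E + T {$ +}
    T ::= • i {$ +}
    T ::= • ( E ) {$ +}
    T ::= • i [ E ] {$ +}
s8:
    S ::= E • {$}
    E ::= E • + T {$ +}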

LR1: Example
After reducing with S ::= E:
Sentential form: S $
Stack: s0 S
Complete parse tree: S( E( T( i [ E( T( i ) ) ] ) ) )

LR1 and LALR
LR1 parsers are very powerful (it can be proven theoretically that they are the most powerful bottom-up parsers possible with one lookahead token!).
But they have very big parsing tables. (For a normal programming language, the order of magnitude is megabytes!)
SLR and LR0 tables only require on the order of 100 KB (but these parsers are not very strong).
LALR comes to the rescue!
The LALR algorithm is based on LR1 but reduces the number of states of the automaton => less memory (the same number of states as LR0 and SLR).

LR1 and LALR
Example grammar:
    S ::= A | xB
    A ::= aAb | B
    B ::= x
The LR0 automaton: picture handed out in class
The LR1 automaton: picture handed out in class
The LALR1 automaton: construct this yourself

LR1 and LALR
The LR1 automaton: picture handed out in class
Note that the automaton has several states which look very similar: the states are identical except for the lookahead sets.
Definition: The core LR0 set of an LR1 item set is the set of LR0 items obtained by removing the lookaheads of the LR1 items.
Example:
    LR1 items:                  Core LR0 items:
    T ::= i [ E • ] {$ +}       T ::= i [ E • ]
    E ::= E • + T {] +}         E ::= E • + T

LR1 and LALR
An LALR automaton can be obtained from an LR1 automaton by "merging" all states which have the same core items into a single state.
=> The LALR automaton has precisely the same number of states as the LR0 automaton!
It is possible that this introduces conflicts, but in practice it almost never does!
LALR is now the most widely used algorithm.
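The merging step can be sketched as follows. This is a simplified illustration, not a full LALR construction: a state is modelled as a hypothetical map from core LR0 item (a plain string, with "." as the dot) to its lookahead set, the three sample states are hand-written, and transitions between states are left out.

```java
import java.util.*;

class LalrMerge {
    // The core of a state: its LR0 items, with the lookahead sets dropped.
    static Set<String> core(Map<String, Set<String>> state) {
        return new TreeSet<>(state.keySet());
    }

    // Merges all states with the same core; lookahead sets are united per item.
    static List<Map<String, Set<String>>> merge(List<Map<String, Set<String>>> lr1States) {
        Map<Set<String>, Map<String, Set<String>>> byCore = new LinkedHashMap<>();
        for (Map<String, Set<String>> state : lr1States) {
            Map<String, Set<String>> merged =
                    byCore.computeIfAbsent(core(state), c -> new TreeMap<>());
            state.forEach((item, look) ->
                    merged.computeIfAbsent(item, i -> new TreeSet<>()).addAll(look));
        }
        return new ArrayList<>(byCore.values());
    }

    // Three hand-written LR1 states; the first two differ only in their lookaheads.
    static List<Map<String, Set<String>>> sampleStates() {
        Map<String, Set<String>> a = Map.of(
                "T ::= i .", Set.of("$", "+"),
                "T ::= i . [ E ]", Set.of("$", "+"));
        Map<String, Set<String>> b = Map.of(
                "T ::= i .", Set.of("]", "+"),
                "T ::= i . [ E ]", Set.of("]", "+"));
        Map<String, Set<String>> c = Map.of(
                "E ::= T .", Set.of("$", "+"));
        return List.of(a, b, c);
    }

    public static void main(String[] args) {
        for (Map<String, Set<String>> state : merge(sampleStates())) {
            System.out.println(state);
        }
    }
}
```

The first two sample states collapse into one state whose items carry the union of the lookaheads, so three LR1 states become two LALR states. A real generator would also redirect the transitions of the merged states.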

Parser Conflict Resolution
Most programming language grammars are LR1. But in practice one still encounters grammars which have parsing conflicts.
=> A common cause is an ambiguous grammar.
Ambiguous grammars always have parsing conflicts (because they are ambiguous, this is unavoidable).
In practice, parser generators still generate a parser for such grammars, using a "resolution rule" to resolve parsing conflicts deterministically.
=> The resolution rule may or may not do what you want/expect.
=> You will get a warning message. If you know what you are doing, this can be ignored. Otherwise, try to solve the conflict by disambiguating the grammar.

Parser Conflict Resolution
Example: (from Mini-triangle grammar)
    single-Command ::= if Expression then single-Command
                     | if Expression then single-Command else single-Command
Input: if a then if b then c1 else c2
Two parse trees are possible. This parse tree?
    if a then ( if b then c1 else c2 )      (the else belongs to the inner if)
Or this one?
    if a then ( if b then c1 ) else c2      (the else belongs to the outer if)

Parser Conflict Resolution
Example: "dangling-else" problem (from Mini-triangle grammar)
    single-Command ::= if Expression then single-Command
                     | if Expression then single-Command else single-Command
LR1 items (in some state of the parser):
    sC ::= if E then sC • {… else …}
    sC ::= if E then sC • else sC {…}
Resolution rule: shift has priority over reduce.
Q: Does this resolution rule solve the conflict? What is its effect on the parse tree?
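The effect of "shift has priority" can be tried out with a small sketch. Note the assumptions: this is a hand-written recursive-descent parser, not the generated LR parser; it consumes an else as soon as it sees one, which mirrors preferring shift over reduce; expressions and non-if commands are simplified to single tokens; all names are made up for illustration.

```java
import java.util.*;

class DanglingElse {
    private final List<String> tokens;
    private int pos = 0;

    DanglingElse(String input) {
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "$";
    }

    private String next() {
        return tokens.get(pos++);
    }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("expected " + t + " but saw " + peek());
        }
        pos++;
    }

    // single-Command ::= if Expression then single-Command [ else single-Command ]
    //                  | identifier      (simplification for this sketch)
    String command() {
        if (!peek().equals("if")) return next();
        expect("if");
        String cond = next(); // expressions are single tokens in this sketch
        expect("then");
        String thenPart = command();
        // Take the else immediately if there is one: the analogue of "shift over reduce".
        if (peek().equals("else")) {
            expect("else");
            return "(if " + cond + " then " + thenPart + " else " + command() + ")";
        }
        return "(if " + cond + " then " + thenPart + ")";
    }

    // Returns the parse as a fully parenthesized string.
    static String parse(String input) {
        return new DanglingElse(input).command();
    }

    public static void main(String[] args) {
        // prints (if a then (if b then c1 else c2))
        System.out.println(parse("if a then if b then c1 else c2"));
    }
}
```

The else ends up attached to the nearest (inner) if, which is the interpretation most programming languages prescribe.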

Parser Conflict Resolution
There is usually also a resolution rule for reduce-reduce conflicts, for example: the rule which appears first in the grammar description has priority.
Reduce-reduce conflicts usually mean there is a real problem with your grammar.
=> You need to fix it! Don't rely on the resolution rule!

JavaCUP: A LALR generator for Java
Scanner generation:
    Definition of tokens (regular expressions) -> JFlex -> Java file: Scanner class (recognizes tokens)
Parser generation:
    Grammar (BNF-like specification) -> JavaCUP -> Java file: Parser class (uses the Scanner to get tokens, parses the stream of tokens)
Scanner class + Parser class = syntactic analyzer

Example: Mini Scheme Parser
Example: an implementation of a simplistic Scheme parser with CUP and Flex.
1) AST node representation
2) Flex scanner
3) CUP parser

Example: Mini Scheme Parser
1) Mini Scheme AST Node representation
    public abstract class Sexpr {
        public static SAtom nil = new SAtom("()");
        public SPair cons(Sexpr cdr) { return new SPair(... }
    }
    public class SAtom extends Sexpr {
        private String lexeme;
        public SAtom(String s) { lexeme = s; }
        ...
    }
    public class SPair extends Sexpr {
        private Sexpr car, cdr;
        ...
    }
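To see what these classes are for, here is one way the elided parts might be filled in. Everything beyond the slide is an assumption made for illustration: the SPair constructor, both toString methods and the SexprDemo class are not the original course code, and the classes are package-private so that the sketch fits in one file.

```java
class SexprDemo {
    // Builds (quote s), the same construction the parser uses for the QUOTE rule.
    static Sexpr quoted(Sexpr s) {
        return new SAtom("quote").cons(s.cons(Sexpr.nil));
    }

    public static void main(String[] args) {
        System.out.println(quoted(new SAtom("x")));               // prints (quote x)
        System.out.println(new SAtom("a").cons(new SAtom("b"))); // prints (a . b)
    }
}

abstract class Sexpr {
    static final SAtom nil = new SAtom("()");

    // cons builds the pair (this . cdr).
    SPair cons(Sexpr cdr) {
        return new SPair(this, cdr);
    }
}

class SAtom extends Sexpr {
    private final String lexeme;

    SAtom(String s) {
        lexeme = s;
    }

    @Override
    public String toString() {
        return lexeme;
    }
}

class SPair extends Sexpr {
    private final Sexpr car, cdr;

    // Assumed constructor: the slide elides the arguments of "new SPair(...".
    SPair(Sexpr car, Sexpr cdr) {
        this.car = car;
        this.cdr = cdr;
    }

    // Prints proper lists as (a b c) and improper pairs as (a . b).
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("(").append(car);
        Sexpr rest = cdr;
        while (rest instanceof SPair) {
            SPair p = (SPair) rest;
            sb.append(' ').append(p.car);
            rest = p.cdr;
        }
        if (rest != Sexpr.nil) sb.append(" . ").append(rest);
        return sb.append(')').toString();
    }
}
```

With this sketch, a list is a chain of pairs ending in nil, so the parser can build the AST with nothing but cons calls.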

Example: Mini Scheme Parser
2) Flex Scanner
    ... blah blah ...
    Ident = {ALPHA}({ALPHA}|{DIGIT}|_)*
    %%
    "("       { return token(sym.LPAREN); }
    ")"       { return token(sym.RPAREN); }
    "'"       { return token(sym.QUOTE); }
    "."       { return token(sym.PERIOD); }
    "+"       { return token(sym.PLUS); }
    ... blah blah ...
    {DIGIT}+  { return token(sym.NUMBER); }
    {Ident}   { return token(sym.IDENTIFIER); }
    ... blah blah ...

Example: Mini Scheme Parser
3) Cup Parser
    /* Simplified Scheme parser for CUP.
     * Copyright (C) 2000
     * Norman C. Hutchinson
     *
     * Modifications 27-01-2002
     * Kris De Volder
     *  => somewhat more object oriented representation
     *     for Sexprs
     */
    parser code {: ... declarations to be added in generated parser ... :};
    ... parser definitions ...

Example: Mini Scheme Parser
    ...
    parser code {:
        // This code is inserted in generated parser
        Scanner lexer;
        public parser(Scanner l) { this(); lexer = l; }
        ... blah blah ...
    :};
    scan with {: return lexer.next_token(); :};
    ... parser definitions continued next page ...

Example: Mini Scheme Parser
    scan with {: return lexer.next_token(); :};
    terminal Token IDENTIFIER;
    terminal Token MULT, EQ, LPAREN, RPAREN;
    terminal Token PLUS, MINUS, DIV;
    terminal Token LT, GT, LTEQ, GTEQ;
    terminal Token NOTEQ;
    terminal Token QUOTE;
    terminal Token NUMBER;
    terminal Token PERIOD;
    non terminal goal;
    non terminal Sexpr sexpr, sexprlist;
    ... parser definitions continued next page ...

Example: Mini Scheme Parser
    non terminal Sexpr sexpr, sexprlist;
    start with goal;
    goal ::= /* empty */
           | goal sexpr:s {: System.out.println(s.toString()); :}
           ;
    sexpr ::= NUMBER:i     {: RESULT = new SAtom(i.text); :}
            | IDENTIFIER:i {: RESULT = new SAtom(i.text); :}
            | MULT:i       {: RESULT = new SAtom(i.text); :}
            ...
    ... parser definitions continued next page ...

Example: Mini Scheme Parser
    sexpr ::= NUMBER:i {: RESULT = new SAtom(i.text); :}
            ... blah blah ...
            | QUOTE sexpr:s
                {: RESULT = new SAtom("quote").cons(s.cons(Sexpr.nil)); :}
            | LPAREN sexpr:s sexprlist:sl RPAREN
                {: RESULT = s.cons(sl); :}
            | LPAREN RPAREN
                {: RESULT = Sexpr.nil; :}
            | LPAREN sexpr:left PERIOD sexpr:right RPAREN
                {: RESULT = left.cons(right); :}
            ;
    sexprlist ::= /*epsilon*/ {: RESULT = Sexpr.nil; :}
                | sexpr:l sexprlist:r {: RESULT = l.cons(r); :}
                ;

Example: Mini Scheme Parser
Driver class
    public static void main(String argv[]) {
        try {
            // try to scan and parse input files
            for (int i = 0; i < argv.length; i++) {
                Scanner s;
                parser p;
                ...
                s = new Scanner(... get input stream ...);
                p = new parser(s);
                ...
                p.parse();
            }
        } catch ... all kinds of nasty exceptions ...
    }
