Presentation is loading. Please wait.

Presentation is loading. Please wait.

Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University.

Similar presentations


Presentation on theme: "Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University."— Presentation transcript:

1 Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University

2 Today Shift-reduce parsing – How does a shift-reduce parser work? – How do we construct a shift-reduce parse table? – LR(1), SLR(1), LALR(1) – Meaning of shift-reduce / reduce-reduce conflicts – Behavior on left-recursion/right-recursion Automatic parser generation (CUP) – Handling ambiguity 2

3 Model of an LR parser 3 LR Parsing program 0 T 2 + 7 id 5 Stack $id+ + Output state symbol gotoaction Input

4 LR parser stack Sequence made of state, symbol pairs For instance a possible stack for the grammar S  E $ E  T E  E + T T  id T  ( E ) could be: 0 T 2 + 7 id 5 4

5 Form of LR parsing table 5 state terminals non-terminals shift/reduce actions goto part 0 1...... snsn rkrk shift state nreduce by rule k gmgm goto state m acc accept error

6 LR parser table example 6 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) gotoactionSTATE TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9

7 Shift move 7 LR Parsing program q...... Stack $…a… Output gotoaction Input If action[q, a] = sn

8 Result of shift If action[q, a] = sn 8 LR Parsing program n a q...... Stack $…a… Output gotoaction Input

9 Reduce move If action[qn, a] = rk Production: (k) A  β If β = σ 1 … σ n Top of stack looks like q1 σ 1 … qn σ n goto[q, A] = qm 9 LR Parsing program qn … q … Stack $…a… Output gotoaction Input 2*|β|

10 Result of reduce move If action[qn, a] = rk Production: (k) A  β If β = σ 1 … σ n Top of stack looks like q1 σ 1 … qn σ n goto[q, A] = qm 10 LR Parsing program Stack Output gotoaction 2*|β| qm A q … $…a… Input

11 Accept move 11 LR Parsing program q...... Stack $a… Output gotoaction Input If action[q, a] = accept parsing completed

12 Error move 12 LR Parsing program q...... Stack $…a… Output gotoaction Input If action[q, a] = error parsing discovered a syntactic error

13 Parsing id+id$ 13 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) StackInputAction 0id + id $s5 Initialize with state 0

14 Parsing id+id$ 14 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) StackInputAction 0id + id $s5 Initialize with state 0

15 Parsing id+id$ 15 StackInputAction 0id + id $s5 0 id 5+ id $r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

16 Parsing id+id$ 16 StackInputAction 0id + id $s5 0 id 5+ id $r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) pop id 5

17 Parsing id+id$ 17 StackInputAction 0id + id $s5 0 id 5+ id $r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) push T 6

18 Parsing id+id$ 18 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

19 Parsing id+id$ 19 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

20 Parsing id+id$ 20 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

21 Parsing id+id$ 21 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 0 E 1 + 3 id 5$r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

22 Parsing id+id$ 22 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 0 E 1 + 3 id 5$r4 0 E 1 + 3 T 4$r3 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

23 Parsing id+id$ 23 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 0 E 1 + 3 id 5$r4 0 E 1 + 3 T 4$r3 0 E 1$s2 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

24 Constructing an LR parsing table 1.Construct a (determinized) transition diagram from LR items 2.If there are conflicts – stop 3.Fill table entries from diagram 24

25 Form of LR(0) items N  α  β Already matched To be matched Input Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β 25

26 Types of LR(0) items N  α  β Shift Item N  αβ  Reduce Item 26

27 LR(0) items enumeration example All items can be obtained by placing a dot at every position for every production: 27 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) 1: S   E$ 2: S  E  $ 3: S  E $  4: E   T 5: E  T  6: E   E + T 7: E  E  + T 8: E  E +  T 9: E  E + T  10: T   i 11: T  i  12: T   (E) 13: T  (  E) 14: T  (E  ) 15: T  (E)  Grammar LR(0) items

28 Operations for transition diagram construction Initial = {S’   S$} For an item set I Closure(I) = Closure(I) + {X   µ is in grammar| N  α  Xβ in I} Goto(I, X) = { N  αX  β | N  α  Xβ in I} 28

29 Initial example Initial = { S   E $ } 29 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) Grammar

30 Closure example Initial = { S   E $ } Closure({ S   E $ }) = S   E $ E   T E   E + T T   id T   ( E ) 30 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) Grammar

31 Goto example Initial = { S   E $ } Closure({ S   E $ }) = S   E $ E   T E   E + T T   id T   ( E ) Goto({S   E $, E   E + T, T   id}, E) = {S  E  $, E  E  + T} 31 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) Grammar

32 Constructing the transition diagram 1.Start with state 0 containing item Closure({ S   E $ }) 2.Repeat until no new states are discovered – For every state p containing item set Ip, and symbol N, compute state q containing item set Iq = Closure(goto(Ip, N)) 32

33 Automaton construction example 33 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ E   T E   E + T T   i T   (E) T  (  E) E   T E   E + T T   i T   (E) E  E + T  T  (E)  S  E$  S  E  $ E  E  + T E  E+  T T   i T   (E) T  i  T  (E  ) E  E  +T E  T  q0q0 q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 q9q9 T ( i E + $ T ) + E i T ( i (

34 Automaton construction example 34 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ q0q0 Initialize

35 Automaton construction example 35 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ E   T E   E + T T   i T   (E) q0q0 apply Closure

36 Automaton construction example 36 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ E   T E   E + T T   i T   (E) q0q0 E  T  q6q6 T T  (  E) E   T E   E + T T   i T   (E) ( T  i  q5q5 i S  E  $ E  E  + T q1q1 E

37 Automaton construction example 37 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ E   T E   E + T T   i T   (E) T  (  E) E   T E   E + T T   i T   (E) E  E + T  T  (E)  S  E$  Z  E  $ E  E  + T E  E+  T T   i T   (E) T  i  T  (E  ) E  E  +T E  T  q0q0 q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 q9q9 T ( i E + $ T ) + E i T ( i ( terminal transition corresponds to shift action in parse table non-terminal transition corresponds to goto action in parse table a single reduce item corresponds to reduce action

38 Conflicts Can construct a diagram for every grammar but some may introduce conflicts shift-reduce conflict: an item set contains at least one shift item and one reduce item reduce-reduce conflict: an item set contains two reduce items 38

39 LR(0) conflicts 39 S  E $ E  T E  E + T T  i T  ( E ) T  i[E] S   E$ E   T E   E + T T   i T   (E) T   i[E] T  i  T  i  [E] q0q0 q5q5 T ( i E Shift/reduce conflict … … …

40 LR(0) conflicts 40 S  E $ E  T E  E + T T  i V  i T  ( E ) S   E$ E   T E   E + T T   i T   (E) T   i[E] T  i  V  i  q0q0 q5q5 T ( i E reduce/reduce conflict … … …

41 LR(0) conflicts Any grammar with an  -rule cannot be LR(0) Inherent shift/reduce conflict – A   – reduce item – P  α  Aβ – shift item – A   can always be predicted from P  α  Aβ 41

42 LR variants LR(0) – what we’ve seen so far SLR(0) – Removes infeasible reduce actions via FOLLOW set reasoning LR(1) – LR(0) with one lookahead token in items LALR(0) – LR(1) with merging of states with same LR(0) component 42

43 SRL parsing A handle should not be reduced to a non- terminal N if the lookahead is a token that cannot follow N A reduce item N  α  is applicable only when the lookahead is in FOLLOW(N) – If b is not in FOLLOW(N) we just proved there is no derivation S =>* βNαb and thus it is safe to remove the reduce item from the conflicted state Differs from LR(0) only on the ACTION table – Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take 43

44 SLR action table Statei+()[]$ 0shift 1 accept 2 3shift 4E  E+T 5TiTiTiTishiftTiTi 6ETETETETETET 7 8 9T  (E) vs. stateaction q0shift q1shift q2 q3shift q4E  E+T q5TiTi q6ETET q7shift q8shift q9TETE SLR – use 1 token look-aheadLR(0) – no look-ahead … as before… T  i T  i[E] Lookahead token from the input 44

45 Going beyond SLR(0) Some common language constructs introduce conflicts even for SLR (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L 45

46 S’ →  S S →  L = R S →  R L →  * R L →  id R →  L S’ → S  S → L  = R R → L  S → R  L → *  R R →  L L →  * R L →  id L → id  S → L =  R R →  L L →  * R L →  id L → * R  R → L  S → L = R  S L R id * = R * R L * L q0 q4 q7 q1 q3 q9 q6 q8 q2 q5 46

47 shift/reduce conflict S → L  = R vs. R → L  FOLLOW(R) contains = – S ⇒ L = R ⇒ * R = R SLR cannot resolve conflict 47 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

48 Inputs requiring shift/reduce For the input id the rightmost derivation S’ => S => R => L => id requires reducing in q2 For the input id = id S’ => S => L = R => L = L => L = id => id = id requires shifting 48 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

49 LR(1) grammars In SLR: a reduce item N  α  is applicable only when the lookahead is in FOLLOW(N) But FOLLOW(N) merges lookahead for all alternatives for N – Insensitive to the context of a given production LR(1) keeps lookahead with each LR item Idea: a more refined notion of follows computed per item 49

50 LR(1) items LR(1) item is a pair – LR(0) item – Lookahead token Meaning – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token Example – The production L  id yields the following LR(1) items 50 [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L [L → ● id] [L → id ●] LR(0) items LR(1) items

51  -closure for LR(1) For every [A → α ● Bβ, c] in S – for every production B→δ and every token b in the grammar such that b  FIRST(βc) – Add [B → ● δ, b] to S 51

52 (S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 52

53 Back to the conflict Is there a conflict now? 53 (S → L ∙ = R, $) (R → L ∙, $) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) = q6 q2

54 LALR(1) LR(1) tables have huge number of entries Often don’t need such refined observation (and cost) Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts LALR(1) not as powerful as LR(1) in theory but works quite well in practice – Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice 54

55 (S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 55

56 (S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, $) q10 R q13 id 56

57 Left/Right- recursion At home: create a simple grammar with left- recursion and one with right-recursion Construct corresponding LR(0) parser – Any conflicts? Run on simple input and observe behavior – Attempt to generalize observation for long inputs 57

58 Automated Parser Generation (via CUP)CUP 58

59 High-level structure JFlexjavac Lexer spec Lexical analyzer text tokens.java CUPjavac Parser spec.javaParser AST TPL.cup TPL.lex sym.java Parser.java Lexer.java (Token.java) 59

60 Expression calculator expr  expr + expr | expr - expr | expr * expr | expr / expr | - expr | ( expr ) | number Goals of expression calculator parser: Is 2+3+4+5 a valid expression? What is the meaning (value) of this expression? 60

61 Syntax analysis with CUP CUPjavac Parser spec.javaParser AST CUP – parser generator Generates an LALR(1) Parser Input: spec file Output: a syntax analyzer tokens 61

62 CUP spec file Package and import specifications User code components Symbol (terminal and non-terminal) lists – Terminals go to sym.java – Types of AST nodes Precedence declarations The grammar – Semantic actions to construct AST 62

63 Expression Calculator – 1 st Attempt terminal Integer NUMBER; terminal PLUS, MINUS, MULT, DIV; terminal LPAREN, RPAREN; non terminal Integer expr; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr | LPAREN expr RPAREN | NUMBER ; Symbol type explained later 63

64 Ambiguities a * b + c a bc + * a bc * + a + b + c a bc + + a bc + + 64

65 terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; non terminal Integer expr; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Expression Calculator – 2 nd Attempt Increasing precedence Contextual precedence 65

66 Parsing ambiguous grammars using precedence declarations Each terminal assigned with precedence – By default all terminals have lowest precedence – User can assign his own precedence – CUP assigns each production a precedence Precedence of last terminal in production or user-specified contextual precedence On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce In case of equal precedences left / right help resolve conflicts – left means reduce – right means shift More information on precedence declarations in CUP’s manualprecedence declarations 66

67 Resolving ambiguity a + b + c a bc + + a bc + + precedence left PLUS 67

68 Resolving ambiguity a * b + c a bc + * a bc * + precedence left PLUS precedence left MULT 68

69 Resolving ambiguity - a - b a b - - MINUS expr %prec UMINUS a - b - 69

70 Resolving ambiguity terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Rule has precedence of UMINUS UMINUS never returned by scanner (used only to define precedence) 70

71 More CUP directives precedence nonassoc NEQ – Non-associative operators: == != etc. – 1<2<3 identified as an error (semantic error?) start non-terminal – Specifies start non-terminal other than first non-terminal – Can change to test parts of grammar Getting internal representation – Command line options: -dump_grammar -dump_states -dump_tables -dump 71

72 import java_cup.runtime.*; % %cup %eofval{ return new Symbol(sym.EOF); %eofval} NUMBER=[0-9]+ % ”+” { return new Symbol(sym.PLUS); } ”-” { return new Symbol(sym.MINUS); } ”*” { return new Symbol(sym.MULT); } ”/” { return new Symbol(sym.DIV); } ”(” { return new Symbol(sym.LPAREN); } ”)” { return new Symbol(sym.RPAREN); } {NUMBER} { return new Symbol(sym.NUMBER, new Integer(yytext())); } \n { }. { } Parser gets terminals from the scanner Scanner integration Generated from token declarations in.cup file 72

73 Recap Package and import specifications and user code components Symbol (terminal and non-terminal) lists – Define building-blocks of the grammar Precedence declarations – May help resolve conflicts The grammar – May introduce conflicts that have to be resolved 73

74 Assigning meaning So far, only validation Add Java code implementing semantic actions expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; 74

75 Symbol labels used to name variables RESULT names the left-hand side symbol expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 MULT expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIV expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | MINUS expr:e1 {: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS | LPAREN expr:e1 RPAREN {: RESULT = e1; :} | NUMBER:n {: RESULT = n; :} ; Assigning meaning 75

76 Building an AST More useful representation of syntax tree – Less clutter – Actual level of detail depends on your design Basis for semantic analysis Later annotated with various information – Type information – Computed values 76

77 Parse tree vs. AST + expr 12+3 ()() 12 + 3 + 77

78 AST construction AST Nodes constructed during parsing – Stored in push-down stack Bottom-up parser – Grammar rules annotated with actions for AST construction – When node is constructed all children available (already constructed) – Node (RESULT) pushed on stack Top-down parser – More complicated 78

79 1 + (2) + (3) expr + (expr) + (3) + expr 12+3 expr + (3) expr ()() expr + (expr) expr expr + (2) + (3) int_const val = 1 plus e1e2 int_const val = 2 int_const val = 3 plus e1e2 expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :} | LPAREN expr:e RPAREN {: RESULT = e; :} | INT_CONST:i {: RESULT = new int_const(…, i); :} AST construction 79

80 terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI; terminal UMINUS; non terminal Integer expr; non terminal expr_list, expr_part; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr_list ::= expr_list expr_part | expr_part ; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI ; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Designing an AST 80

81 See you next time


Download ppt "Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University."

Similar presentations


Ads by Google