Presentation is loading. Please wait.

Presentation is loading. Please wait.

Theory of Compilation 236360 Erez Petrank Lecture 3: Syntax Analysis: Bottom-up Parsing 1.

Similar presentations


Presentation on theme: "Theory of Compilation 236360 Erez Petrank Lecture 3: Syntax Analysis: Bottom-up Parsing 1."— Presentation transcript:

1 Theory of Compilation 236360 Erez Petrank Lecture 3: Syntax Analysis: Bottom-up Parsing 1

2 You are here 2 Executable code Executable code exe Source text Source text txt Compiler Lexical Analysis Syntax Analysis Parsing Semantic Analysis Inter. Rep. (IR) Inter. Rep. (IR) Code Gen.

3 Parsers: from tokens to AST 3 Lexical Analysis Syntax Analysis Sem. Analysis Inter. Rep. Code Gen. ‘b’ ‘4’ ‘b’ ‘a’ ‘c’ ID factor term factor MULT term expression factor term factor MULT term expression term MULTfactor MINUS Syntax Tree

4 Parsing A context free language can be recognized by a non- deterministic pushdown automaton Parsing can be seen as a search problem – Can you find a derivation from the start symbol to the input word? – Easy (but very expensive) to solve with backtracking We want efficient parsers – Linear in input size – Deterministic pushdown automata – We will sacrifice generality for efficiency 4

5 Reminder: Efficient Parsers Top-down (predictive parsing) 5  Bottom-up (shift reduce) to be read… already read…

6 LR(k) Grammars A grammar is in the class LR(K) when it can be derived via: – Bottom-up derivation – Scanning the input from left to right (L) – Producing the rightmost derivation (R) – With lookahead of k tokens (k) A language is said to be LR(k) if it has an LR(k) grammar The simplest case is LR(0), which we discuss next. 6

7 LR is More Powerful, But… Any LL(k) language is also in LR(k) (and not vice versa), i.e., LL(k) ⊂ LR(k). But the lookahead is counted differently in the two cases. In an LL(k) derivation the algorithm sees k tokens of the right- hand side of the rule and then must select the derivation rule. With LR(k), the algorithm sees all right-hand side of the derivation rule plus k more tokens. – LR(0) sees the entire right-side. LR is more popular today. We will see Yacc, a parser generator for LR parsing, in the exercises and homework. 7

8 An Example: a Simple Grammar E → E * B | E + B | B B → 0 | 1 Let us number the rules: (1) E → E * B (2) E → E + B (3) E → B (4) B → 0 (5) B → 1

9 Goal: Reduce the Input to the Initial Variable Example: 0 + 0 * 1 B + 0 * 1 E + 0 * 1 E + B * 1 E * 1 E * B E E → E * B | E + B | B B → 0 | 1 E BE * B 1 0 We’d like to go over the input so far, and upon seeing a right- hand side of a rule, “invoke” the rule and replace the right-hand side with the left-hand side (which is a single varuable). B 0 E +

10 Use Shift & Reduce In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules. Example: “0+0*1”. E → E * B | E + B | B B → 0 | 1 E BE * B 1 0 StackInputaction 0+0*1$shift 0 +0*1$reduce B +0*1$reduce E +0*1$shift E+0 *1$reduce E+B *1$reduce E *1$shift E*1 $reduce E*B $reduce E $accept B 0 E +

11 How does the parser know what to do? A state will keep the info gathered so far. A table will tell it “what to do” based on current state and next token. Some info will be kept in a stack. 11

12 LR(0) Parsing Stack Parser Input Output Action Table Goto table

13 States and LR(0) Items The state will “remember” the potential derivation rules given the part that was already identified. For example, if we have already identified E then the state will remember the two alternatives: (1) E → E * B, (2) E → E + B Actually, we will also remember where we are in each of them: (1) E → E ● * B, (2) E → E ● + B A derivation rule with a location marker is called LR(0) item. The state is actually a set of LR(0) items. E.g., q 13 = { E → E ● * B, E → E ● + B} E → E * B | E + B | B B → 0 | 1

14 Intuition We gather the input token by token until we find a right-hand side of a rule and then we replace it with the variable on the left side. Go over a token and remembering it in the stack is a shift. Each shift moves to a state that remembers what we’ve seen so far. A reduce replaces a string in the stack with the variable that derives it.

15 Why do we need a stack? Suppose so far we have discovered E → B → 0 and gather information on “E +”. In the given grammar this can only mean E → E + ● B Suppose state q 6 represents this possibility. Now, the next token is 0, and we need to ignore q 6 for a minute, and work on B → 0 to obtain E+B. Therefore, we push q 6 to the stack, and after identifying B, we pop it to continue. E BE + B 0 0 0 (q4,E) (q6,+) (q1,0) top

16 The Stack The stack contains states. For readability we will also include variables and tokens. (The algorithm does not need it.) Except for the q 0 in the bottom of the stack, the rest of the stack will contains pairs of (state, token) or (state, variable). The initial stack contains the initial variable only. At each stage we need to decide if to shift the next token to the stack (and change to the appropriate state) or apply a “reduce”. The action table tells us what to do based on current state and next token: s n : shift and move to q n. r m: reduce according to production rule m. (also accept and error).

17 The Goto Table The second table serves reduces. After reducing a right-hand side to the deriving variable, we need to decide what the next state is. This is determined by the previous state (which is on the stack) and the variable we got. Suppose we reduce according to M → β. We remove β from the stack, and look at the state q that is currently at the top. GOTO(q,M) specifies the next state.

18 For example… (1) E → E * B (2) E → E + B (3) E → B (4) B → 0 (5) B → 1 טבלת goto טבלת הפעולות BE$10+* 43s2s1q0 r4 q1 r5 q2 accs6s5q3 r3 q4 7s2s1q5 8s2s1q6 r1 q7 r2 q8

19 An Execution Example: 0 + 0 * 1 q0 טבלת goto טבלת הפעולות BE$10+* 43s2s1 q0 r4 q1 r5 q2 accs6s5 q3 r3 q4 7s2s1 q5 8s2s1 q6 r1 q7 r2 q8 q0 q1 0 (1) E → E * B (2) E → E + B (3) E → B (4) B → 0 (5) B → 1

20 An Execution Example: 0 + 0 * 1 q0 q1 0 q0 q4 B q0 q3 E q0 q3 E + q6 ● ● ● (1) E → E * B (2) E → E + B (3) E → B (4) B → 0 (5) B → 1

21 The Algorithm Formally Initiate the stack to q 0. Repeat until halting: – Consider action [t,q] for q at the head of the stack and t the next token. – “Shift n”: remove t from the input and push t and then q n to the stack. – “Reduce m”, where Rule m is: M → β: Remove |β| pairs from the stack, let q be the state at the top of the stack. Push M and the the state Goto[q,M] to the stack. – “accept”: halt successfully. – Empty cell: halt with an error.

22 LR(0) Item 22 E ➞  β Already matched To be matched Input So far we’ve matched α, expecting to see β

23 Using LR Items to Build the Tables Typically a state consists of several LR items. For example, if we identified a string that is reduced to E, then we may be in one of the following LR items: E → E ● + B orE → E ● * B Therefore one of the states will be: q = {E → E ● + B, E → E ● * B}. But if the current state includes E → E + ● B, then we must be able to let B be derived: Closure! 23

24 Adding the Closure If the current state has the item E → E + ● B, then it should be possible to add derivation of B, e.g., to 0 or 1. Generally, if a state includes an item in which the dot stands before a non-terminal, then we add all the rules for this non- terminal with a dot in the beginning. Definition: the closure set for a set of items is the set of items in which for every item in the set of the form A → ● B β, the set contains an item B → ● δ, for each rule of the form B → δ in the grammar. Building the closure set is iterative, as δ may also begin with a variable.

25 Closure: an Example For example, consider the grammar: and the set: C = { E → E + ● B } C’s closure is: clos(C) = { E → E + ● B, B → ● 0, B → ● 1 } And this will be the state that will be used for the parser. E → E * B | E + B | B B → 0 | 1

26 Extended Grammar Goal: easy termination. We will assume that the initial variable only appears in a single rule. This guarantee that the last reduction can be (easily) determined. A grammar can be (easily) extended to have such structure. Example: the grammar (1) E → E * B (2) E → E + B (3) E → B (4) B → 0 (5) B → 1 Can be extended into : (0) S → E (1) E → E * B (2) E → E + B (3) E → B (4) B → 0 (5) B → 1

27 The Initial State To build the action/goto table, we go through all possible states possible during derivation. Each state represents a (closure) set of LR(0) items. The initial state q 0 is the closure of the initial rule. In our example the initial rule is S → ● E, and therefore the initial state is q 0 = clos({S → ● E }) = {S → ● E, E → ● E * B, E → ● E + B, E → ● B, B → ● 0, B → ● 1} We go on and build all possible continuations by seeing a single token (or variable).

28 The Next States For each possible terminal or variable x, and each possible state (closure set) q, 1.Find all items in the set of q in which the dot is before an x. We denote this set by q|x 2.Move the dot ahead of the x in all items in q|x. 3.Find the closure of the obtained set: this is the state into which we move from q upon seeing x. Formally, the next set from a set S and the next symbol X – step(S,X) = { N ➞ αX  β | N ➞ α  Xβ in the item set S} – nextSet(S,X) =  -closure(step(S,X))

29 The Next States Recall that in our example: q 0 = clos({S → ● E }) = {S → ● E, E → ● E * B, E → ● E + B, E → ● B, B → ● 0, B → ● 1} Let us check which states are reachable from it.

30 States reachable from q 0 in the example: State q1: x = 0 q 0 |0= { B → ● 0 } q1 = clos(q 0 |0) = { B → 0 ● } State q2: x = 1 q 0 |1= { B → ● 1 } q 2 = clos({ B → 1 ● }) = { B → 1 ∙ } State q3: x = E q 0 |E= {S → ∙ E, E → ∙ E * B, E → ∙ E + B} q 3 = clos ({S → E ∙, E → E ∙ * B, E → E ∙ + B}) ={S → E∙, E → E∙*B, E → E ∙ + B} State q4: x = B q 0 |B = { E → ∙ B } q 4 = clos ({ E → B ∙ }) = {E →B∙}

31 From these new states there are more reachable states From q1, q2, q4, there are no continuations because the dot is in the end of any item in their sets. From state q3 we can reach the following two states: מצב 5q: x = * q 3 |* = { E → E ∙ * B } q 5 = clos ({ E → E * ∙ B }) = { E → E * ∙ B, B → ∙ 0, B → ∙ 1 } מצב 6q: q 6 = { E → E + ∙ B, B → ∙ 0, B → ∙ 1 }

32 Finally, From q5 we can proceed with x=0, or x=1, or x=B. But for x=0 we reach q1 again and for x=1 we reach q2. For x=B we get q7: Similarly, from q6 with x=B we get q8: These two states have no continuations. (Why?) State q7: clos(q 7 ) = { E → E * B ∙ } State q8: clos(q 8 ) = { E → E + B ∙ }

33 Building the Tables A row for each state. If qj was obtained when x comes after qi, then in row qi and column x, we write j. טבלת goto טבלת הפעולות מצב BE$10+* 4321 q0 q1 q2 65 q3 q4 721 q5 821 q6 q7 q8

34 Building the tables: accept Add “acc” in column $ for each state that has S → E ● as an item. טבלת goto טבלת הפעולות מצב BE$10+* 4321 q0 q1 q2 acc65 q3 q4 721 q5 821 q6 q7 q8

35 Building the Tables: Shift Any number n in the action table becomes s n. טבלת goto טבלת הפעולות מצב BE$10+* 43s2s1 q0 q1 q2 accs6s5 q3 q4 7s2s1 q5 8s2s1 q6 q7 q8

36 Building the Tables: Reduce For any state whose set includes the item A → α ●, such that A → α is production rule m (m>o) : Fill all columns of that state in the action table with “r m”. טבלת goto טבלת הפעולות מצב BE$10+* 43s2s1 q0 r4 q1 r5 q2 accs6s5 q3 r3 q4 7s2s1 q5 8s2s1 q6 r1 q7 r2 q8

37 Note LR(0) When a reduce is possible, we execute it without checking the next token. טבלת goto טבלת הפעולות מצב BE$10+* 43s2s1 q0 r4 q1 r5 q2 accs6s5 q3 r3 q4 7s2s1 q5 8s2s1 q6 r1 q7 r2 q8

38 Another Example 38 Z ➞ expr EOF expr ➞ term | expr + term term ➞ ID | ( expr ) Z ➞ E $ E ➞ T | E + T T ➞ i | ( E ) (just shorthand of the grammar on the top)

39 Example: Parsing with LR Items 39 Z ➞ E $ E ➞ T | E + T T ➞ i | ( E ) E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ } Closure Input: i + i $

40 40 T ➞ i  Reduce item! i E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E ) Shift

41 Input: i + i $ 41 i E ➞ T  Reduce item! T Z ➞ E $ E ➞ T | E + T T ➞ i | ( E ) E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $

42 Input: i + i $ 42 T E ➞ T  Reduce item! i E E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E )

43 Input: i + i $ 43 T i E+ E ➞ E  + T Z ➞ E  $ E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E )

44 Input: i + i $ 44 E ➞ E  + T Z ➞ E  $ E ➞ E+  T T ➞  i T ➞  ( E ) E+i T i E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E )

45 Input: i + i $ 45 E ➞ E  + T Z ➞ E  $ E ➞ E+  T T ➞  i T ➞  ( E ) E+i$ T i E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E ) T ➞ i  Reduce item!

46 Input: i + i $ E+T T i E ➞ E  + T Z ➞ E  $ E ➞ E+  T T ➞  i T ➞  ( E ) i E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E )

47 Input: i + i $ 47 E+T T i E ➞ E  + T Z ➞ E  $ E ➞ E+  T T ➞  i T ➞  ( E ) i E ➞ E+T  $ Reduce item! E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E )

48 Input: i + i $ 48 E$ E T i +T Z ➞ E  $ E ➞ E  + T i E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E )

49 Input: i + i $ 49 E$ E T i +T Z ➞ E  $ E ➞ E  + T Z ➞ E$  Reduce item! i E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E )

50 Input: i + i $ 50 Z E T i +T Z ➞ E  $ E ➞ E  + T Z ➞ E$  Reducing the initial rule means accept E$ i E ➞  T E ➞  E + T T ➞  i T ➞  ( E ) Z ➞  E $ Z ➞ E $ E ➞ T | E + T T ➞ i | ( E )

51 View as an LR(0) Automaton 51 Z ➞  E$ E ➞  T E ➞  E + T T ➞  i T ➞  (E) T ➞ (  E) E ➞  T E ➞  E + T T ➞  i T ➞  (E) E ➞ E + T  T ➞ (E)  Z ➞ E$  Z ➞ E  $ E ➞ E  + T E ➞ E+  T T ➞  i T ➞  (E) T ➞ i  T ➞ (E  ) E ➞ E  +T E ➞ T  q0q0 q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 q9q9 T ( i E + $ T ) + E i T ( i (

52 GOTO/ACTION Table 52 Statei+()$ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 q5r4 q6r2 q7s5s786 q8s3s9 q9r5 (1)Z ➞ E $ (2)E ➞ T (3)E ➞ E + T (4)T ➞ i (5)T ➞ ( E ) rn = reduce using rule number n sm = shift to state m

53 GOTO/ACTION Table 53 sti+()$ET q0s5s7s1s6 q1s3s2 q2r1 q3s5s7s4 q4r3 q5r4 q6r2 q7s5s7s8s6 q8s3s9 q9r5 StackInputAction q0i + i $s5 q0 i q5+ i $r4 q0 T q6+ i $r2 q0 E q1+ i $s3 q0 E q1 + q3i $s5 q0 E q1 + q3 i q5$r4 q0 E q1 + q3 T q4$r3 q0 E q1$s2 q0 E q1 $ q2r1 q0 Z top is on the right (1)Z ➞ E $ (2)E ➞ T (3)E ➞ E + T (4)T ➞ i (5)T ➞ ( E )

54 Are we done? Can make a transition diagram for any grammar Can make a GOTO table for every grammar But the states are not always clear on what to do Cannot make a deterministic ACTION table for every grammar 54

55 LR(0) Conflicts 55 Z ➞ E $ E ➞ T E ➞ E + T T ➞ i T ➞ ( E ) T ➞ i[E] Z ➞  E$ E ➞  T E ➞  E + T T ➞  i T ➞  (E) T ➞  i[E] T ➞ i  T ➞ i  [E] q0q0 q5q5 T ( i E Shift/reduce conflict … … …

56 View in Action/Goto Table shift/reduce conflict… 56 Statei+()$[ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 q5r4 r4/s10r4 q6r2 q7s5s786 q8s3s9 q9r5

57 LR(0) Conflicts 57 Z ➞ E $ E ➞ T E ➞ E + T T ➞ i V ➞ i T ➞ (E) Z ➞  E$ E ➞  T E ➞  E + T T ➞  i V ➞  i T ➞  (E) T ➞ i  V ➞ i  q0q0 q5q5 T ( i E reduce/reduce conflict … … … Can there be a shift/shift conflict?

58 View in Action/Goto Table reduce/reduce conflict… 58 Statei+()$ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 q5r4/r5 q6r2 q7s5s786 q8s3s9 q9r5

59 LR(0) Conflicts Any grammar with an  -rule cannot be LR(0) You should always be able to tell if an (invisible)  should be reduced to A, without looking at the next token. Inherent shift/reduce conflict – A ➞  - reduce item – P ➞ α  Aβ – shift item – A ➞  can always be predicted from P ➞ α  Aβ (and should be predicted without looking at any token). 59

60 Back to Action/Goto Table Note that reductions ignore the input… 60 Statei+()$ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 q5r4 q6r2 q7s5s786 q8s3s9 q9r5

61 SLR Grammars A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N A reduce item N ➞  is applicable only when the look-ahead is in FOLLOW(N) Differs from LR(0) only on the originally “reduce” rows. We will not reduce sometimes and will shift (or do nothing = error) instead. 61

62 GOTO/ACTION Table 62 Statei+()$ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 q5r4 q6r2 q7s5s786 q8s3s9 q9r5 (1)Z ➞ E (2)E ➞ T (3)E ➞ E + T (4)T ➞ i (5)T ➞ ( E ) Rule 1 can only be used with the end of input “$” sign.

63 GOTO/ACTION Table 63 Statei+()$ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 q5r4 q6r2 q7s5s786 q8s3s9 q9r5 (1)Z ➞ E (2)E ➞ T (3)E ➞ E + T (4)T ➞ i (5)T ➞ ( E ) The things that can follow E are ‘+’ ‘)’ and ‘$’.

64 GOTO/ACTION Table 64 Statei+()$ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 q5r4 q6r2 q7s5s786 q8s3s9 q9r5 (1)Z ➞ E (2)E ➞ T (3)E ➞ E + T (4)T ➞ i (5)T ➞ ( E ) The things that can follow E are ‘+’ ‘)’ and ‘$’.

65 GOTO/ACTION Table 65 Statei+()$ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 q5r4 q6r2 q7s5s786 q8s3s9 q9r5 (1)Z ➞ E (2)E ➞ T (3)E ➞ E + T (4)T ➞ i (5)T ➞ ( E ) Same for T

66 GOTO/ACTION Table 66 Statei+()$ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 q5r4 q6r2 q7s5s786 q8s3s9 q9r5 (1)Z ➞ E (2)E ➞ T (3)E ➞ E + T (4)T ➞ i (5)T ➞ ( E ) Same for T

67 The shift/reduce conflict 67 Z ➞ E $ E ➞ T E ➞ E + T T ➞ i T ➞ ( E ) T ➞ i[E] Z ➞  E$ E ➞  T E ➞  E + T T ➞  i T ➞  (E) T ➞  i[E] T ➞ i  T ➞ i  [E] q0q0 q5q5 T ( i E Shift/reduce conflict … … …

68 Now let’s add “T ➞ i[E]” 68 Statei+()$[]ET q0s5s716 q1s3s2 q2r1 q3s5s74 q4r3 s 10 q5r4 q6r2 q7s5s786 q8s3s9 q9r5 (1)Z ➞ E (2)E ➞ T (3)E ➞ E + T (4)T ➞ i (5)T ➞ ( E ) (6)T ➞ i[E]

69 SLR: check next token when reducing Simple LR(1), or SLR(1), or SLR. Example demonstrates elimination of a shift/reduce conflict. Can eliminate reduce/reduce conflict when conflicting rule’s variables satisfy FOLLOW(T) ≠ FOLLOW(V). But cannot solve all conflicts. 69

70 Consider the following grammar 70 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

71 71 S’ →  S S →  L = R S →  R L →  * R L →  id R →  L S’ → S  S → L  = R R → L  S → R  L → *  R R →  L L →  * R L →  id L → id  S → L =  R R →  L L →  * R L →  id L → * R  R → L  S → L = R  S L R id * = R * R L * L q0 q4 q7 q1 q3 q9 q6 q8 q2 q5

72 Shift/reduce conflict S → L  = R vs. R → L  FOLLOW(R) contains = – S ⇒ L = R ⇒ * R = R SLR cannot resolve the conflict either 72 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

73 Resolving the Conflict In SLR: a reduce item N ➞ α  is applicable only when the look- ahead is in FOLLOW(N). But there is a full sentential form that we have discovered so far. We can ask what the next token may be given all the derivation made so far. For example, even looking at the follow of the entire sentential form is more restrictive than looking at the follow of the last variable. In a way, FOLLOW(N) merges look-ahead for all alternatives for N follow(σA)  follow(A) LR(1) keeps look-ahead with each LR item

74 LR(1) Item LR(1) item is a pair: [ LR(0) item, Look-ahead token ] Meaning: We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token. Example: t he production L ➞ id yields the following LR(1) items 74 [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

75 Creating the states for LR(1) We start with the initial state: q0 will be the closure of: (S’ → ● S, $) Closure for LR(1): For every [A → α ● Bβ, c] in the state: – for every production B → δ and every token b in the grammar such that b  FIRST(βc) – Add [B → ● δ, b] to S 75

76 Closure of (S’ → ● S, $) We would like to add rules that start with S, but keep track of possible lookahead. (S’ → ● S, $) (S → ● L = R, $) :Rules for S (S → ● R, $) (L → ● * R, = ) :Rules for L (L → ● id, = ) (R → ● L, $ ) :Rules for R (L → ● id, $ ) :More rules for L (L → ● * R, $ ) (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

77 The State Machine מצב 0 (S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) מצב 1 (S’ → S ∙, $) מצב 2 (S → L ∙ = R, $) (R → L ∙, $) מצב 3 (S → R ∙, $) מצב 4 (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) מצב 5 (L → id ∙, $) (L → id ∙, =) מצב 6 (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) מצב 7 (L → * R ∙, =) (L → * R ∙, $) מצב 8 (R → L ∙, =) (R → L ∙, $) S L R id * = R * L (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

78 מצב 0 (S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) מצב 1 (S’ → S ∙, $) מצב 2 (S → L ∙ = R, $) (R → L ∙, $) מצב 3 (S → R ∙, $) מצב 4 (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) מצב 5 (L → id ∙, $) (L → id ∙, =) מצב 6 (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) מצב 7 (L → * R ∙, =) (L → * R ∙, $) מצב 8 (R → L ∙, =) (R → L ∙, $) מצב 9 (S → L = R ∙, $) S L R id * = R * R L * L מצב 11 מצב 12 מצב 10 The State Machine

79 מצב 1 (S’ → S ∙, $) מצב 2 (S → L ∙ = R, $) (R → L ∙, $) מצב 3 (S → R ∙, $) מצב 5 (L → id ∙, $) (L → id ∙, =) מצב 6 (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) מצב 7 (L → * R ∙, =) (L → * R ∙, $) מצב 8 (R → L ∙, =) (R → L ∙, $) מצב 9 (S → L = R ∙, $) S L R id = R R L * L מצב 10 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) מצב 11 (L → id ∙, $) מצב 12 (R → L ∙, $) מצב 13 (L → * R ∙, $) id R L * The State Machine

80 Back to the conflict Is there a conflict now? 80 (S → L ∙ = R, $) (R → L ∙, $) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) = q6 q2

81 Bottom-up Parsing LR(K) SLR LALR All follow the same pushdown-based algorithm Differ on type of “LR Items” 81

82 Building the Tables Similarly to SLR, we start with the automaton. Turn each transition in a token column to a shift. The variables columns form the goto section. The “acc” is put in the $ column, for any rule that contains (S’ → S●, $). For any state that contains an item of the form [A → β●,a], and any rule of the form A → β whose number is m, use “reduce m” for the row of this state and the column of token a.

83 Building the Table GotoActions LRS$=*id 231s4s50 acc1 r5s62 r23 87s4s54 r4 5 129s10s116 r3 7 r5 8 r19 1213s10s1110 r411 r512 r113

84 LALR LR tables have large number of entries Often don’t need such refined observation (and cost) LALR idea: find states with the same LR(0) component and merge their look-ahead component as long as there are no conflicts LALR not as powerful as LR(1) For a language like C: – SLR cannot handle some of the structures and requires hundreds of states. – CLR handles everything but requires thousands of states. – LALR handles everything with hundreds of states (not much higher than SLR). 84

85 Chomsky Hierarchy 85 Regular Context free Context sensitive Recursively enumerable Finite-state automaton Non-deterministic pushdown automaton Linear-bounded non-deterministic Turing machine Turing machine

86 Grammar Hierarchy 86 Non-ambiguous CFG CLR(1) LALR(1) SLR(1) LL(1) LR(0)

87 Summary Bottom up derivation LR(k) can decide on a reduce after seeing the entire right side of the rule plus k look-ahead tokens. particularly LR(0). Using a table and a stack to derive. LR Items and the automaton. Creating the table from the automaton. LR parsing with pushdown automata LR(0), SLR, LR(1) – different kinds of LR items, same basic algorithm LALR: in the exercise. 87

88 Next time Semantic analysis 88


Download ppt "Theory of Compilation 236360 Erez Petrank Lecture 3: Syntax Analysis: Bottom-up Parsing 1."

Similar presentations


Ads by Google