Presentation is loading. Please wait.

Presentation is loading. Please wait.

Parsers and control structures

Similar presentations


Presentation on theme: "Parsers and control structures"— Presentation transcript:

1 Parsers and control structures
Grammar checking

2 Outline Control Structures Parsers Expressions Selection statements
Iterative statements Context-free grammar Parsers Recursive-descent parsing LL(1) parsing SLR(1) parsing Chapter 4 Parsers and Control Structures 24/11/61

3 Control Structures Expressions Selection statements
Iterative statements Chapter 4 Parsers and Control Structures 24/11/61

4 Expressions Control flow in an expression is determined by
Operator precedence Operators with higher precedence are executed before. Operator associativity If an operator  is left/right associative, then in the expression …  a  …, a is used by the operator  on the left/ right first. When  is left-associative, a  b  c  d means (((a  b)  c)  d) . When  is right-associative, a  b  c  d means (a  (b  (c  d))). Chapter 4 Parsers and Control Structures 24/11/61

5 Selection Statements Two-way selections: If statements
Nested ifs Explicit: using blocks, such as {…} or begin…end Implicit: using semantic rule “else go with the nearest if” Multiway selections: switch-case statements Type of control expression Selectable segment Execution flow Unrepresented expression Chapter 4 Parsers and Control Structures 24/11/61

6 Type of Control Expression
Real number FORTRAN Use 3-way selector : if (arithmeticExpression) N1, N2,N3 Ordinal (int, char, boolean, enum) Pascal, Ada Integer only C, C++, JAVA Why control expression must be ordinal Jump tables are used for implementation Chapter 4 Parsers and Control Structures 24/11/61

7 Execution flow FORTRAN PASCAL C case … of const1: … const2: otherwise:
end IF (…) N1, N2, N3 N1 … GOTO OUT N2 … N3 … OUT switch (…) { case const1: case const2: break; default: } Chapter 4 Parsers and Control Structures 24/11/61

8 Selectable segments Single statement Compound statement
Allowed in most languages Compound statement C, C++, JAVA Chapter 4 Parsers and Control Structures 24/11/61

9 Unrepresented Expression
Unspecified Error occurs during execution Some Pascal Can be specified, but not required Use default, else, otherwise as keywords C, other Pascal Require lists to be exhaustive Ada Chapter 4 Parsers and Control Structures 24/11/61

10 Iteration Statements Counter-controlled loops
Logically-controlled loops Should both be combined into one construct ? Example: for loop in C Chapter 4 Parsers and Control Structures 24/11/61

11 Loop Variables in Counter-controlled Loops
Type Only integer in FORTRAN Integer and real in ALGOL Ordinal in Pascal Discrete range in Ada scope only in loop in Pascal, Ada As defined Value at loop termination Can be changed in loop body ? FORTRAN, Pascal: Allowed, but not effect loop control ALGOL: Allowed and effect loop control Ada: not allowed Evaluation of loop parameters Once in FORTRAN, Pascal, Ada Every iteration in ALGOL ALGOL allows var := list Chapter 4 Parsers and Control Structures 24/11/61

12 Logically-controlled Loops
Pre-test or post-test ? While or until ? Pascal has while-do and repeat- until. C, C++ and Java have while-do and do-while. Ada has a pretest version, but no posttest. FORTRAN 77 and 90 have none. Perl has two pretest loops, while and until, but no posttest loop. Chapter 4 Parsers and Control Structures 24/11/61

13 User-located Loop Control
Break or exit Bad for readability Allow in most languages Chapter 4 Parsers and Control Structures 24/11/61

14 Iteration Based on Data Structures
Foreach in Perl: for arrays Chapter 4 Parsers and Control Structures 24/11/61

15 Context-free Grammar 2301380 Chapter 4 Parsers and Control Structures
24/11/61

16 Grammar Rules V is a finite set of nonterminals, containing S,
T is a finite set of terminals, P is a set of production rules in the form of aβ where  is in V and β is in (V UT )*, and S is the start symbol Terminology sentential form Derivation Leftmost derivation Rightmost derivation Chapter 4 Parsers and Control Structures 24/11/61

17 Left/Right Recursive A grammar is a left recursive if its production rules can generate a derivation of the form A * A X. Examples: E  F + id | (E) | id F  E * id | id E  F + id E * id + id A grammar is a right recursive if its production rules can generate a derivation of the form A  * X A. Examples: E  id + F | (E) | id F  id * E | id E  id + F  id + id * E Chapter 4 Parsers and Control Structures 24/11/61

18 Parse Tree A labeled tree in which
the interior nodes are labeled by nonterminals leaf nodes are labeled by terminals the children of an interior node represent a replacement of the associated nonterminal in a derivation corresponding to a derivation Chapter 4 Parsers and Control Structures 24/11/61

19 Abstract Syntax Tree + + * * Representation of actual source tokens
Interior nodes represent operators. Leaf nodes represent operands. + E * id3 id2 id1 + id1 id3 * id2 Chapter 4 Parsers and Control Structures 24/11/61

20 Abstract Syntax Tree for If Statement
) exp true ( if else elsePart return if true st return Chapter 4 Parsers and Control Structures 24/11/61

21 Ambiguous Grammar A grammar is ambiguous if it can generate two different parse trees for one string. Ambiguous grammars can cause inconsistency in parsing. E  E + E | E – E | E * E | E / E | id * E id2 + id1 id3 + E * id3 id2 id1 Chapter 4 Parsers and Control Structures 24/11/61

22 Ambiguity in Expressions
Which operation is to be done first? solved by precedence An operator with higher precedence is done before one with lower precedence. An operator with higher precedence is placed in a rule (logically) further from the start symbol. solved by associativity Right-associated : W + (X + (Y + Z)) Left-associated : ((W + X) + Y) + Z Chapter 4 Parsers and Control Structures 24/11/61

23 Parsers Recursive-descent parsers LL(1) parsers SLR(1) parsers
Chapter 4 Parsers and Control Structures 24/11/61

24 Parsing Parsing is a process that constructs a syntactic structure (i.e. parse tree) from the stream of tokens. We already learn how to describe the syntactic structure of a language using (context-free) grammar. Stream of tokens Parser Parse tree Context-free grammar Chapter 4 Parsers and Control Structures 24/11/61

25 Top-down Parsing Bottom-up Parsing
A parse tree is created from leaves to root Tracing rightmost derivation  id + id * id  E + id * id  E + E * id  E + E * E E  E + E A parse tree is created from root to leaves Tracing leftmost derivation E  E + E  id + E  id + E * E  id + id * E  id + id * id id E * + Chapter 4 Parsers and Control Structures 24/11/61

26 Recursive-Descent Parsing
Write one procedure for each set of productions with the same nonterminal in the LHS Each procedure recognizes a structure described by a nonterminal. A procedure calls other procedures if it need to recognize other structures. A procedure calls match procedure if it need to recognize a terminal. Chapter 4 Parsers and Control Structures 24/11/61

27 Recursive-Descent: Example
E  E O F | F O  + | - F  ( E ) | id procedure F { switch token { case ‘(‘: match(‘(‘); E; match(‘)’); case id: match(id); default: error; } } We cannot decide which rule to use for E If we choose E  E O F, it leads to infinitely recursive loops. Rewrite the grammar into E  F { O F } procedure E { F; while (token==‘+’ or token==‘-’) { O; F; } } Chapter 4 Parsers and Control Structures 24/11/61

28 Problems in Recursive-Descent
Difficult to convert grammars into EBNF Cannot decide which production to use at each point Cannot decide when to use -production A  Chapter 4 Parsers and Control Structures 24/11/61

29 LL(1) Parsers 2301380 Chapter 4 Parsers and Control Structures
24/11/61

30 LL(1) Parsing LL(1) Use stack to simulate leftmost derivation
Read input from (L) left to right Simulate (L) leftmost derivation 1 lookahead symbol Use stack to simulate leftmost derivation Part of sentential form produced in the leftmost derivation is stored in the stack. Top of stack is the leftmost nonterminal symbol in the fragment of sentential form. Chapter 4 Parsers and Control Structures 24/11/61

31 How LL(1) Parsing Works Keep part of sentential form in the stack.
If the symbol on the top of stack is a terminal, try to match it with the next input token and pop it out of stack. If the symbol on the top of stack is a nonterminal X, and there is a production rule X  Y, then replace X with Y. Which production will be chosen, if there are both X  Y and X  Z ? Chapter 4 Parsers and Control Structures 24/11/61

32 Example: LL(1) Parsing F n ( T N E X ) A + F F n Finished N ( T T N M
E  TX  FNX  (E)NX (TX)NX (FNX)NX (nNX)NX (nX)NX (nATX)NX (n+TX)NX (n+FNX)NX (n+(E)NX)NX (n+(TX)NX)NX (n+(FNX)NX)NX (n+(nNX)NX)NX (n+(nX)NX)NX (n+(n)NX)NX (n+(n)X)NX (n+(n))NX (n+(n))MFNX (n+(n))*FNX (n+(n))*nNX (n+(n))*nX (n+(n))*n E  T X X  A T X |  A  + | - T  F N N  M F N | M * F  ( E ) | n ) A + F F n Finished N ( T T N M X * E X n ) F F N N T X E ( n + ) * $ $ Chapter 4 Parsers and Control Structures 24/11/61

33 (LL(1) Parsing Algorithm
Push the start symbol into the stack WHILE stack is not empty ($ is not on top of stack) and the stream of tokens is not empty (the next input token is not $) SWITCH (Top of stack, next token) CASE (terminal a, a): Pop stack; Get next token CASE (nonterminal A, terminal a): IF the parsing table entry M[A, a] is not empty THEN Get A X1 X2 ... Xn from the parsing table entry M[A, a] Pop stack; Push Xn ... X2 X1 into stack in that order ELSE Error CASE ($,$): Accept OTHER: Error Chapter 4 Parsers and Control Structures 24/11/61

34 What is in LL(1) Parsing Table
Which production to use if nonterminal N is on the top of stack and the next token is t ? Choose a production such that Left handside is N Right handside X FINALLY produces t X * tY or X *  and S * WNtY Chapter 4 Parsers and Control Structures 24/11/61

35 First Let X be  or be in V or T.
First(X ) is the set of the first terminal in any sentential form derived from X. If X is a terminal or , then First(X)={X}. If X is a nonterminal and there is X  X1 X2 ... Xn First(X1) -{} is a subset of First(X) First(Xi )-{} is a subset of First(X) if for all j<i First(Xj) contains {}  is in First(X) if for all j≤n First(Xj)contains  Chapter 4 Parsers and Control Structures 24/11/61

36 Example: First exp  exp addop term | term addop  + | -
term  term mulop factor|factor mulop  * factor  (exp) | num First(addop) = {+, -} First(mulop) = {*} First(factor) = {(, num} First(term) = {(, num} First(exp) = {(, num} st  ifst | other ifst  if ( exp ) st elsepart elsepart  else st |  exp  0 | 1 First(exp) = {0, 1} First(elsepart) = {else, } First(ifst) = {if} First(st) = {if, other} Chapter 4 Parsers and Control Structures 24/11/61

37 Algorithm for finding First
FOR ALL terminals a, First(a) = {a} FOR ALL nonterminals A, First(A) := {} WHILE there is any change to any First(A) FOR ALL rule A  X1 X2 ... Xn FOR ALL Xi in {X1, X2, …, Xn } IF FOR ALL j<i First(Xj) contains , THEN add First(Xi)-{} to First(A) IF  is in First(X1), First(X2), ..., and First(Xn) THEN add  to First(A) Chapter 4 Parsers and Control Structures 24/11/61

38 Example: Finding First
exp  term exp’ exp’  addop term exp’ |  addop  + | - term  factor term’ term’ mulop factor term’|  mulop  * factor  ( exp ) | num exp ( num exp’ + - addop + - term term’ * mulop factor Chapter 4 Parsers and Control Structures 24/11/61

39 Follow Let $ denote the end of input tokens
If A is the start symbol, then $ is in Follow(A). If there is a rule B  X A Y, then First(Y) - {} is in Follow(A). If there is production B  X A Y and  is in First(Y), then Follow(A) contains Follow(B). Chapter 4 Parsers and Control Structures 24/11/61

40 Algorithm for Finding Follow(A)
Follow(S) = {$} FOR ALL A in V-{S} Follow(A) = {} WHILE change is made to some Follow sets FOR each production A  X1 X2 ... Xn, FOR ALL nonterminal Xi Add First(Xi+1 Xi+2...Xn) – {} into Follow(Xi). (NOTE: If i=n, Xi+1 Xi+2...Xn= ) IF  is in First(Xi+1 Xi+2...Xn) THEN Add Follow(A) to Follow(Xi). Chapter 4 Parsers and Control Structures 24/11/61

41 Example: Finding Follow Set
exp  term exp’ exp’  addop term exp’ |  addop  + | - term  factor term’ term’mulop factor term’ | mulop  * factor  ( exp ) | num First Follow exp ( num $ ) exp’  + - addop + - term + - $ ) term’  * mulop * factor Chapter 4 Parsers and Control Structures 24/11/61

42 Constructing LL(1) Parsing Tables
FOR each nonterminal A and a production A  X FOR each token a in First(X) A  X is in M(A, a) IF  is in First(X) THEN FOR each element a in Follow(A) Add A  X to M(A, a) Chapter 4 Parsers and Control Structures 24/11/61

43 Example: Constructing LL(1) Parsing Table
First Follow exp {(, num} {$,)} exp’ {+,-, } {$,)} addop {+,-} {(,num} term {(,num} {+,-, ),$} term’ {*, } {+,-, ),$} mulop {*} {(,num} factor {(, num} {+,-,*,),$} 1 exp  term exp’ 2 exp’  addop term exp’ 3 exp’   4 addop  + 5 addop  - 6 term  factor term’ 7 term’  mulop factor term’ 8 term’   9 mulop  * 10 factor  ( exp ) 11 factor  num ( ) + - * n $ exp exp’ addop term term’ mulop factor 1 1 3 2 2 3 4 5 6 6 8 8 8 7 8 9 10 11 Chapter 4 Parsers and Control Structures 24/11/61 43

44 LL(1) Grammar A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry. Non-LL(1) grammar More than 1 production in at least one entry of the table. Causes: Left-recursion Left-factor Chapter 4 Parsers and Control Structures 24/11/61

45 Non-LL(1) Grammar 1 exp  exp addop term 2 exp  term 3 term  term mulop factor 4 term  factor 5 factor  ( exp ) 6 factor  num 7 addop  + 8 addop  - 9 mulop  * First(exp) = { (, num } First(term) = { (, num } First(factor) = { (, num } First(addop) = { +, - } First(mulop) = { * } Chapter 4 Parsers and Control Structures 24/11/61

46 Left Recursion Immediate left recursion General left recursion
A  A X | Y (A=Y X*) A  A X1 | A X2 |…| A Xn | Y1 | Y2 |... | Ym A=(Y1,Y2,..,Ym) (X1,X2,…,Xn)* General left recursion A => X =>* A Y Can be removed easily A  Y A’, A’  X A’|  A  Y1 A’ | Y2 A’ |...| Ym A’, A’  X1 A’| X2 A’|…| Xn A’|  Can be removed when there is no empty-string production and no cycle in the grammar Chapter 4 Parsers and Control Structures 24/11/61

47 Removal of Immediate Left Recursion
exp  exp + term | exp - term | term term  term * factor | factor factor  ( exp ) | num Remove left recursion exp  term exp’ exp’  + term exp’ | - term exp’ |  term  factor term’ term’  * factor term’ |  Chapter 4 Parsers and Control Structures 24/11/61

48 Left Factoring Left factor causes non-LL(1) A  X Y | X Z
Given A  X Y | X Z. Both A  X Y and A  X Z can be chosen when A is on top of stack and a token in First(X) is the next token. A  X Y | X Z can be left-factored as A  X A’ and A’  Y | Z Chapter 4 Parsers and Control Structures 24/11/61

49 Example: Left-factor ifSt  if ( exp ) st else st | if ( exp ) st
can be left-factored as ifSt  if ( exp ) st elsePart elsePart  else st |  seq  st ; seq | st seq  st seq’ seq’  ; seq |  Chapter 4 Parsers and Control Structures 24/11/61

50 SLR(1) Parsers 2301380 Chapter 4 Parsers and Control Structures
24/11/61

51 SLR(1) Parsing Bottom-up parsing
Simulate rightmost derivation (R) from left (L) to right, thus called LR parsing More powerful than top-down parsing Left recursion does not cause problem Two actions Shift: take next input token into the stack Reduce: replace a string B on top of stack by a nonterminal A, given a production A  B Chapter 4 Parsers and Control Structures 24/11/61

52 Example of Shift-reduce Parsing
Grammar S’  S S  (S)S |  Parsing actions Stack Input Action $ ( ( ) ) $ shift $ ( ( ) ) $ shift $ ( ( ) ) $ reduce S   $ ( ( S ) ) $ shift $ ( ( S ) ) $ reduce S   $ ( ( S ) S ) $ reduce S  ( S )S $ ( S ) $ shift $ ( S ) $ reduce S   $ ( S ) S $ reduce S  ( S )S $ S $ accept Reverse of rightmost derivation from left to right 1  ( ( ) ) 2  ( ( ) ) 3  ( ( ) ) 4  ( ( S ) ) 5  ( ( S ) ) 6  ( ( S ) S ) 7  ( S ) 8  ( S ) 9  ( S ) S 10 S’  S Chapter 4 Parsers and Control Structures 24/11/61

53 Terminologies $ ( ( ) ) $ shift $ ( ( ) ) $ shift $ ( ( ) ) $ reduce S   $ ( ( S ) ) $ shift $ ( ( S ) ) $ reduce S   $ ( ( S ) S ) $ reduce S  ( S )S $ ( S ) $ shift $ ( S ) $ reduce S   $ ( S ) S $ reduce S  ( S )S $ S $ accept 1  ( ( ) ) 2  ( ( ) ) 3  ( ( ) ) 4  ( ( S ) ) 5  ( ( S ) ) 6  ( ( S ) S ) 7  ( S ) 8  ( S ) 9  ( S ) S 10 S’  S Right sentential form: sentential form in a rightmost derivation Viable prefix: sequence of symbols on the parsing stack Handle: right sentential form + position where reduction can be performed + production used for reduction Chapter 4 Parsers and Control Structures 24/11/61

54 Items Production with distinguished position in its RHS
Keep track of what is left to be done in the parsing process by using finite automata of items An item A  w . B y means: A  w B y might be used for the reduction in the future, at the time, we know we already construct w in the parsing process, if B is constructed next, we get the new item A  w B . Y Chapter 4 Parsers and Control Structures 24/11/61

55 Types of LR(0) Items Initial Item Complete Item Closure Item of x
Item with the distinguished position on the leftmost of the production Complete Item Item with the distinguished position on the rightmost of the production Closure Item of x Item x together with items which can be reached from x via -transition Kernel Item Original item, not including closure items Chapter 4 Parsers and Control Structures 24/11/61

56 Finite Automata of Items
Grammar: S’  S S  (S)S S   Items: S’  .S S’  S. S  .(S)S S  (.S)S S  (S.)S S  (S).S S  (S)S. S  . S S’  .S S’  S. S  .(S)S S  . ( S  (.S)S S S  (S.)S ) S  (S).S S  (S)S. S Chapter 4 Parsers and Control Structures 24/11/61

57 DFA of LR(0) Items S’  .S S’  S. S  .(S)S S  . S  (.S)S S  (S.)S
S  .(S)S S  . ( ( S  (S.)S S S  .(S)S S  (.S)S S  . S  (.S)S S ) S  (S).S S  .(S)S S  . S  (S.)S ( ( ) S  (S).S S S S  (S)S. S  (S)S. Chapter 4 Parsers and Control Structures 24/11/61

58 LR(0) Parsing Algorithm
Let B be a terminal. Item in state token Action A  x.By B shift B and push state s containing A  xB.y not B error A  x. - reduce with A  x (i.e. pop x, backup to the state s on top of stack) and push A with new state d(s,A) S’  S. none accept any Chapter 4 Parsers and Control Structures 24/11/61

59 LR(0) Parsing Table State Action Rule ( a ) A shift 3 2 1 reduce
shift 3 2 1 reduce A’  A A  a 4 5 A  (A) A’  .A A  .(A) A  .a A A’  A. 1 a ( a A  a. 2 A  (.A) A  .(A) A  .a A  (A.) 4 3 A ) ( A  (A). 5 Chapter 4 Parsers and Control Structures 24/11/61

60 Example of LR(0) Parsing
Stack Input Action $0 ( ( a ) ) $ shift $0(3 ( a ) ) $ shift $0(3(3 a ) ) $ shift $0(3(3a2 ) ) $ reduce $0(3(3A4 ) ) $ shift $0(3(3A4)5 ) $ reduce $0(3A4 ) $ shift $0(3A4)5 $ reduce $0A1 $ accept State Action Rule ( a ) A shift 3 2 1 reduce A’  A A  a 4 5 A (A) Chapter 4 Parsers and Control Structures 24/11/61

61 Non-LR(0) Grammar Conflict
Shift-reduce conflict A state contains a complete item A  x. and a shift item A  x.By Reduce-reduce conflict A state contains more than one complete items. A grammar is a LR(0) grammar if there is no conflict in the grammar S’  .S S  .(S)S S  . S S’  S. ( S  (S.)S S S  .(S)S S  (.S)S S  . ) S  (S).S S  .(S)S S  . ( ( S S  (S)S. Chapter 4 Parsers and Control Structures 24/11/61

62 SLR(1) Parsing Simple LR with 1 lookahead symbol
Examine the next token before deciding to shift or reduce If the next token is the token expected in an item, then it can be shifted into the stack. If a complete item A  x. is constructed and the next token is in Follow(A), then reduction can be done using A  x. Otherwise, error occurs. Can avoid some conflict. Chapter 4 Parsers and Control Structures 24/11/61

63 SLR(1) parsing algorithm
Let B be a terminal. Item in state token Action A x.By B shift B & push state s containing AxB.y not B error A  x. Follow(A) reduce with Ax (pop x, backup to the state s on top of stack) & push A with state d(s,A) Follow(A) S’  S. none accept any Chapter 4 Parsers and Control Structures 24/11/61

64 Constructing SLR(1) Parsing Table
A  (A) | a State ( a ) $ A S3 S2 1 AC 2 R2 3 4 S5 5 R1 A A’  A. 1 A’  .A A  .(A) A  .a 0 a A  a 2 ( a A  (.A) A  .(A) A  .a 3 A  (A.) 4 A ) ( A  (A). 5 Chapter 4 Parsers and Control Structures 24/11/61

65 SLR(1), but not LR(0), Grammar
S  (S)S |  S’  .S S  .(S)S S  S S’  S. 1 State ( ) $ S S2 R2 1 AC 2 3 S4 4 5 R1 ( S  (S.)S 3 S S  .(S)S S  (.S)S S  ) S  (S).S S  .(S)S S  ( ( S S  (S)S. 5 Chapter 4 Parsers and Control Structures

66 Disambiguating Rules for Parsing Conflict
Shift-reduce conflict Prefer shift over reduce In case of nested if statements, preferring shift over reduce implies most closely nested rule for dangling else Reduce-reduce conflict Error in design Chapter 4 Parsers and Control Structures 24/11/61

67 Dangling Else S’  S. 1 S’  .S 0 S  .I S  .other I  .if S
state if else other $ S I S4 S3 1 2 ACC R1 3 R2 4 5 S6 R3 6 7 R4 S’  S. 1 S S’  .S S  .I S  .other I  .if S I  .if S else S I S  I. 2 I I  if S else .S 6 S  .I S  .other I  .if S I  .if S else S other I if other S  other. 3 if I  if .S I  if .S else S S  .I S  .other I  .if S I  .if S else S I  .if S else S 7 S other else S I  if S I  if S. else S if Chapter 4 Parsers and Control Structures 24/11/61


Download ppt "Parsers and control structures"

Similar presentations


Ads by Google