1 Context Free Grammars Xiaoyin Wang CS 5363 Spring 2016.

Presentation on theme: "1 Context Free Grammars Xiaoyin Wang CS 5363 Spring 2016."— Presentation transcript:

1 1 Context Free Grammars Xiaoyin Wang CS 5363 Spring 2016

2 2 Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm

3 3 Derivation Trees Illustrate the derivation of a certain sentence from a grammar Different derivation trees with different derivation orders

4 4 Derivation Order Consider the following example grammar with 5 productions:

5 5 Leftmost derivation order of string : At each step, we substitute the leftmost variable

6 6 Rightmost derivation order of string : At each step, we substitute the rightmost variable

7 7 Rightmost derivation of : Leftmost derivation of :

8 8 Derivation Trees Consider the same example grammar: And a derivation of :

9 9 yield

10 10 yield

11 11 yield

12 12 yield

13 13 yield Derivation Tree (parse tree)

14 14 Give same derivation tree Sometimes, derivation order doesn’t matter Leftmost derivation: Rightmost derivation:

15 15 Ambiguity: A grammar can have multiple parse trees for the same sentence. Inherent ambiguity and non-inherent ambiguity.

16 16 Grammar for mathematical expressions Example strings: Denotes any number

17 17 A leftmost derivation for

18 18 Another leftmost derivation for

19 19 Two derivation trees for

20 20 take

21 21 Good Tree / Bad Tree: Compute expression result using the tree

22 22 Two different derivation trees may cause problems in applications which use the derivation trees: Evaluating expressions In general, in compilers for programming languages
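
To make the problem concrete, here is a minimal Python sketch that evaluates two parse trees of the same expression. The expression `a + a * a`, the value `a = 2`, and the tuple encoding of trees are assumptions for illustration; the slide's own expression did not survive in the transcript.

```python
# Parse trees as nested tuples: (operator, left, right), or a number for a leaf.
def evaluate(tree):
    """Evaluate an expression tree bottom-up."""
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r

a = 2  # assumed value for the operand
good_tree = ("+", a, ("*", a, a))  # multiplication grouped first
bad_tree = ("*", ("+", a, a), a)   # addition grouped first

print(evaluate(good_tree))  # 6
print(evaluate(bad_tree))   # 8
```

Both trees yield the same string, yet they produce different values, which is why an evaluator or compiler cannot work from an ambiguous grammar without extra rules.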

23 23 Ambiguous Grammar: A context-free grammar is ambiguous if there is a string which has: two different derivation trees or two leftmost derivations (Two different derivation trees give two different leftmost derivations and vice-versa)
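
One way to see the definition at work is to count the parse trees of a string. The sketch below assumes a hypothetical expression grammar `E -> E + E | E * E | a` (not taken from the slides) and counts trees by trying every operator as the root.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count parse trees of `tokens` under the assumed grammar
    E -> E + E | E * E | a."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "a" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the root operator; combine both sides.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_trees("a+a"))    # 1
print(count_trees("a+a*a"))  # 2
```

A count above 1 for any string shows that this grammar is ambiguous. A count of 1 for a single string proves nothing about the grammar as a whole.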

24 24 Example: this grammar is ambiguous since the string has two derivation trees

25 25 This grammar is ambiguous also because the string has two leftmost derivations

26 26 Another ambiguous grammar: IF_STMT → if EXPR then STMT | if EXPR then STMT else STMT
Variables: IF_STMT, EXPR, STMT. Terminals: if, then, else.
Very common piece of grammar in programming languages

27 27 Two derivation trees for: if expr1 then if expr2 then stmt1 else stmt2
Tree 1 (else belongs to the outer if): if expr1 then [ if expr2 then stmt1 ] else stmt2
Tree 2 (else belongs to the inner if): if expr1 then [ if expr2 then stmt1 else stmt2 ]

28 28 In general, ambiguity is bad and we want to remove it Sometimes it is possible to find a non-ambiguous grammar for a language But, in general we cannot do so

29 29 Ambiguous Grammar Non-Ambiguous Grammar Equivalent generates the same language A successful example:

30 30 Unique derivation tree for

31 31 An un-successful example: every grammar that generates this language is ambiguous is inherently ambiguous:

32 32 Example (ambiguous) grammar for :

33 33 The string has always two different derivation trees (for any grammar) For example

34 34 Ambiguity: Summary A grammar can have multiple parse trees for the same sentence. Inherently ambiguous language – all grammars for it are ambiguous. Non-inherently ambiguous language – there exists at least one grammar that is not ambiguous. Checking ambiguity of a grammar or a language: undecidable problem

35 35 Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm

36 36 Normal Forms Chomsky Normal Forms BNF Normal Forms

37 37 A context-free grammar is said to be in Chomsky Normal Form if all productions have one of the following forms: A → BC or A → α, where A, B and C are nonterminal symbols and α is a terminal symbol

38 38 There are three preliminary simplifications: 1. Eliminate useless symbols 2. Eliminate ε productions 3. Eliminate unit productions

39 39 Eliminate Useless Symbols We determine whether a symbol is useful by checking that it is both generating and reachable. X is generating if X ⇒* ω for some terminal string ω. X is reachable if there is a derivation S ⇒* αXβ for some α and β, where S is the start symbol.

40 40 Example: Removing non-generating symbols
Initial context-free grammar: S → AB | a, A → b
Identify generating symbols: a and b are terminals; S (by S → a) and A (by A → b) are generating; B has no productions, so it is not generating
Remove non-generating: S → a, A → b

41 41 Example: Removing non-reachable symbols
Grammar after removing non-generating symbols: S → a, A → b
Identify reachable symbols: S and a (A and b cannot be reached from S)
Eliminate non-reachable: S → a

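42 42 The order is important. Looking first for non-reachable symbols and then for non-generating symbols can still leave some useless symbols. In S → AB | a, A → b every symbol is reachable, so the first pass removes nothing; removing the non-generating B afterwards leaves S → a, A → b, where A and b are no longer reachable but are still present.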

43 43 Finding generating symbols Every terminal is generating. If there is a production A → α and every symbol of α is already known to be generating, then A is generating. Example: S → AB | a, A → b. We cannot use S → AB because B has not been established to be generating
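
The rule above is a fixpoint computation. A minimal Python sketch follows; the representation of a grammar as a list of `(head, body)` pairs and the function name are assumptions for illustration.

```python
def generating_symbols(productions, terminals):
    """Return the set of generating symbols.

    productions: list of (head, body) pairs, where body is a tuple of symbols.
    terminals:   set of terminal symbols (generating by definition).
    """
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # A head becomes generating once its whole body is generating.
            if head not in generating and all(s in generating for s in body):
                generating.add(head)
                changed = True
    return generating

# The example grammar: S -> AB | a, A -> b
productions = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(sorted(generating_symbols(productions, {"a", "b"})))
# ['A', 'S', 'a', 'b']
```

B never enters the set, so every production that mentions B (here S → AB) can be dropped.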

44 44 Finding reachable symbols S is surely reachable. All symbols in the body of a production whose head is reachable are reachable. Example: S → AB | a, A → b. In this example the symbols {S, A, B, a, b} are reachable.
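
Reachability is a graph search from the start symbol. The following Python sketch uses the same assumed `(head, body)` representation and a hypothetical function name.

```python
def reachable_symbols(productions, start):
    """Return every symbol reachable from `start`.

    productions: list of (head, body) pairs, where body is a tuple of symbols.
    """
    reachable = {start}
    worklist = [start]
    while worklist:
        current = worklist.pop()
        for head, body in productions:
            if head != current:
                continue
            for symbol in body:
                if symbol not in reachable:
                    reachable.add(symbol)
                    worklist.append(symbol)
    return reachable

# Before removing the non-generating symbol B: everything is reachable.
before = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(sorted(reachable_symbols(before, "S")))  # ['A', 'B', 'S', 'a', 'b']

# After removing B and the production S -> AB: A and b become unreachable.
after = [("S", ("a",)), ("A", ("b",))]
print(sorted(reachable_symbols(after, "S")))   # ['S', 'a']
```

The two calls illustrate why non-generating symbols should be removed before non-reachable ones.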

45 45 There are three preliminary simplifications: 1. Eliminate useless symbols 2. Eliminate ε productions 3. Eliminate unit productions

46 46 Eliminate ε Productions In a grammar, ε productions are convenient but not essential. If L has a CFG, then L – {ε} has a CFG without ε productions. A variable A is nullable if A ⇒* ε

47 47 If A is a nullable variable, then whenever A appears in the body of a production, A might or might not derive ε
Example: S → ASA | aB, A → B | S, B → b | ε
Nullable: {A, B}

48 48 Eliminate ε Productions Create two versions of each production, one with the nullable variable and one without it. Eliminate productions with ε bodies.
Before: S → ASA | aB, A → B | S, B → b | ε
After: S → ASA | aB | AS | SA | S | a, A → B | S, B → b
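
The construction can be sketched in Python as follows. The `(head, body)` representation, with the empty tuple standing for ε, and both function names are assumptions for illustration.

```python
from itertools import product

def nullable_variables(productions):
    """Variables that derive ε; the body () stands for ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable

def eliminate_epsilon(productions):
    """Return an equivalent set of productions (minus ε) with no ε bodies."""
    nullable = nullable_variables(productions)
    result = set()
    for head, body in productions:
        # Each nullable symbol is either kept or dropped.
        choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for combo in product(*choices):
            new_body = tuple(s for part in combo for s in part)
            if new_body:  # discard bodies that became ε
                result.add((head, new_body))
    return result

# S -> ASA | aB, A -> B | S, B -> b | ε
grammar = [
    ("S", ("A", "S", "A")), ("S", ("a", "B")),
    ("A", ("B",)), ("A", ("S",)),
    ("B", ("b",)), ("B", ()),
]
print(sorted(nullable_variables(grammar)))  # ['A', 'B']
for head, body in sorted(eliminate_epsilon(grammar)):
    print(head, "->", " ".join(body))
```

For the example grammar this yields S → ASA | AS | SA | S | aB | a, A → B | S, B → b, matching the slide.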

51 51 There are three preliminary simplifications: 1. Eliminate useless symbols 2. Eliminate ε productions 3. Eliminate unit productions

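52 52 Eliminate unit productions A unit production is one of the form A → B where both A and B are variables. Identify unit pairs: (A, B) is a unit pair if A ⇒* B using only unit productions. If (A, B) is a unit pair and B → ω is a non-unit production, then add A → ω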

53 53 Example:
I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T * F
E → T | E + T
Terminals: {*, +, (, ), a, b, 0, 1}
Basis: (A, A) is a unit pair for any variable A, since A ⇒* A in 0 steps.
Pairs      Productions
(E, E)     E → E + T
(E, T)     E → T * F
(E, F)     E → (E)
(E, I)     E → a | b | Ia | Ib | I0 | I1
(T, T)     T → T * F
(T, F)     T → (E)
(T, I)     T → a | b | Ia | Ib | I0 | I1
(F, F)     F → (E)
(F, I)     F → a | b | Ia | Ib | I0 | I1
(I, I)     I → a | b | Ia | Ib | I0 | I1

54 54 Example:
Pairs      Productions
……
(T, T)     T → T * F
(T, F)     T → (E)
(T, I)     T → a | b | Ia | Ib | I0 | I1
……
Resulting grammar without unit productions:
E → E + T | T * F | (E) | a | b | Ia | Ib | I0 | I1
T → T * F | (E) | a | b | Ia | Ib | I0 | I1
F → (E) | a | b | Ia | Ib | I0 | I1
I → a | b | Ia | Ib | I0 | I1
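
A minimal Python sketch of unit-pair computation and unit-production removal is given below. The `(head, body)` representation and the helper names `rules` and `eliminate_unit_productions` are assumptions for illustration.

```python
def rules(head, *bodies):
    """Helper: build (head, body) pairs from space-separated bodies."""
    return [(head, tuple(body.split())) for body in bodies]

def eliminate_unit_productions(productions, variables):
    """Return an equivalent set of productions without unit productions."""
    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    # Basis: (A, A) for every variable; then close under unit productions.
    pairs = {(v, v) for v in variables}
    changed = True
    while changed:
        changed = False
        for first, second in list(pairs):
            for head, body in productions:
                if head == second and is_unit(body):
                    if (first, body[0]) not in pairs:
                        pairs.add((first, body[0]))
                        changed = True

    # For each unit pair (A, B), copy B's non-unit productions to A.
    result = set()
    for first, second in pairs:
        for head, body in productions:
            if head == second and not is_unit(body):
                result.add((first, body))
    return result

grammar = (
    rules("I", "a", "b", "I a", "I b", "I 0", "I 1")
    + rules("F", "I", "( E )")
    + rules("T", "F", "T * F")
    + rules("E", "T", "E + T")
)
result = eliminate_unit_productions(grammar, {"E", "T", "F", "I"})
for head in "ETFI":
    print(head, "->", " | ".join(sorted(" ".join(b) for h, b in result if h == head)))
```

For the example grammar, E ends up with 9 bodies, T with 8, F with 7 and I with 6, in agreement with the table above.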

55 55 Chomsky Normal Form (CNF) Starting with a context-free grammar on which the preliminary simplifications have been performed: 1. Arrange that all bodies of length 2 or more consist only of variables. 2. Break bodies of length 3 or more into a cascade of productions, each with a body consisting of two variables.

56 56 Step 1: For every terminal α that appears in a body of length 2 or more, create a new variable that has only one production.
Before:
E → E + T | T * F | (E) | a | b | Ia | Ib | I0 | I1
T → T * F | (E) | a | b | Ia | Ib | I0 | I1
F → (E) | a | b | Ia | Ib | I0 | I1
I → a | b | Ia | Ib | I0 | I1
After:
E → EPT | TMF | LER | a | b | IA | IB | IZ | IO
T → TMF | LER | a | b | IA | IB | IZ | IO
F → LER | a | b | IA | IB | IZ | IO
I → a | b | IA | IB | IZ | IO
A → a, B → b, Z → 0, O → 1, P → +, M → *, L → (, R → )

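57 57 Step 2: Break bodies of length 3 or more, adding more variables
Before:
E → EPT | TMF | LER | a | b | IA | IB | IZ | IO
T → TMF | LER | a | b | IA | IB | IZ | IO
F → LER | a | b | IA | IB | IZ | IO
I → a | b | IA | IB | IZ | IO
A → a, B → b, Z → 0, O → 1, P → +, M → *, L → (, R → )
New variables: C1 → PT, C2 → MF, C3 → ER
After: the bodies EPT, TMF and LER become EC1, TC2 and LC3
E → EC1 | TC2 | LC3 | a | b | IA | IB | IZ | IO
T → TC2 | LC3 | a | b | IA | IB | IZ | IO
F → LC3 | a | b | IA | IB | IZ | IO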
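
Steps 1 and 2 can be sketched together in Python. The `(head, body)` representation, the helper `rules`, and the generated variable names (`T_+`, `C1`, and so on) are assumptions for illustration; the sketch also assumes those names do not clash with existing symbols.

```python
def rules(head, *bodies):
    """Helper: build (head, body) pairs from space-separated bodies."""
    return [(head, tuple(body.split())) for body in bodies]

def to_cnf(productions, terminals):
    """Convert a grammar without ε and unit productions to CNF."""
    term_var = {}  # terminal -> variable that produces only that terminal
    tail_var = {}  # body tail -> variable that produces that tail
    result = []

    def binarize(head, body):
        # Step 2: split bodies of length 3 or more into a cascade.
        if len(body) <= 2:
            result.append((head, body))
            return
        tail = body[1:]
        if tail not in tail_var:
            tail_var[tail] = f"C{len(tail_var) + 1}"
            binarize(tail_var[tail], tail)
        result.append((head, (body[0], tail_var[tail])))

    for head, body in productions:
        if len(body) >= 2:
            # Step 1: replace terminals in long bodies by variables.
            body = tuple(
                term_var.setdefault(s, f"T_{s}") if s in terminals else s
                for s in body
            )
        binarize(head, body)
    result += [(var, (t,)) for t, var in term_var.items()]
    return result

ident = ["a", "b", "I a", "I b", "I 0", "I 1"]
grammar = (
    rules("E", "E + T", "T * F", "( E )", *ident)
    + rules("T", "T * F", "( E )", *ident)
    + rules("F", "( E )", *ident)
    + rules("I", *ident)
)
terminals = set("+*()ab01")
cnf = to_cnf(grammar, terminals)
for head, body in cnf:
    print(head, "->", " ".join(body))
```

Tails are shared, so `T * F` is broken up once and reused by both E and T, which mirrors the single C2 on the slide.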

58 58 Normal Forms Chomsky Normal Forms BNF Normal Forms

59 59 BNF BNF stands for either Backus-Naur Form or Backus Normal Form BNF is used to describe the grammar of a programming language BNF is formal and precise –BNF is a notation for context-free grammars BNF is essential in compiler construction

60 60 BNF Angle brackets < > indicate a nonterminal that needs to be further expanded. Symbols not enclosed in < > are terminals; they represent themselves, e.g. if, while, ( The symbol ::= means "is defined as". The symbol | means "or"; it separates alternatives, e.g. + | - This is all there is to “plain” BNF; but we will discuss extended BNF (EBNF) later in this lecture

61 61 BNF uses recursion ::= | or ::= | Recursion is all that is needed (at least, in a formal sense) "Extended BNF" allows repetition as well as recursion Repetition is usually better when using BNF to construct a compiler

62 62 BNF Examples I
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<if_statement> ::= if ( <condition> ) <statement> | if ( <condition> ) <statement> else <statement>

63 63 BNF Examples II ::= | ::= | + | -

64 64 BNF Examples III ::= | | ::= { } ::= |

65 65 BNF Examples IV ::= | |...

66 66 Extended BNF The following are pretty standard: –[ ] enclose an optional part of the rule Example: ::= if ( ) [ else ] –{ } mean the enclosed can be repeated any number of times (including zero) Example: ::= ( ) | ( {, } )

67 67 Variations The preceding notation is the original and most common notation –BNF was designed before we had boldface, color, more than one font, etc. –A typical modern variation might: – Use boldface to indicate multi-character terminals –Quote single-character terminals (because boldface isn’t so obvious in this case) Example: –if_statement ::= if "(" condition ")" statement [ else statement ]

68 68 Limitations of BNF No easy way to impose length limitations, such as maximum length of variable names No easy way to describe ranges, such as 1 to 31 No way at all to impose distributed requirements, such as, a variable must be declared before it is used Describes only syntax, not semantics

69 69 Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm

70 70 The CYK Algorithm The membership problem:
– Problem: Given a context-free grammar G and a string w
– G = (V, ∑, P, S) where
» V: finite set of variables
» ∑ (the alphabet): finite set of terminal symbols
» P: finite set of rules
» S: start symbol (distinguished element of V)
» V and ∑ are assumed to be disjoint
– G is used to generate the strings of a language
– Question: Is w in L(G)?

71 71 The CYK Algorithm J. Cocke, D. Younger, and T. Kasami independently developed an algorithm to answer this question.

72 72 The CYK Algorithm Basics –Relies on the structure of the rules in a Chomsky Normal Form grammar –Uses a “dynamic programming” or “table-filling” algorithm

73 73 Chomsky Normal Form A normal form is described by a set of conditions that each rule in the grammar must satisfy. A context-free grammar is in CNF if each rule has one of the following forms:
– A → BC (at most 2 symbols on right side)
– A → a (terminal symbol), or
– S → ε (null string)
where B, C ∈ V – {S}

74 74 Construct a Triangular Table Each row corresponds to one length of substrings –Bottom Row – Strings of length 1 –Second from Bottom Row – Strings of length 2. –Top Row – string ‘w’

75 75 Construct a Triangular Table X_{i,i} is the set of variables A such that A → w_i is a production of G. To compute X_{i,j}, compare at most n pairs of previously computed sets: (X_{i,i}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}), …, (X_{i,j-1}, X_{j,j})
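
The pairs of cells consulted for one table entry can be enumerated with a short Python sketch. The function name and the use of index tuples are assumptions for illustration; indices are 1-based as on the slides.

```python
def split_pairs(i, j):
    """Index pairs ((i, k), (k + 1, j)) combined to fill cell X_{i,j}, for i < j."""
    return [((i, k), (k + 1, j)) for k in range(i, j)]

print(split_pairs(1, 2))  # [((1, 1), (2, 2))]
print(split_pairs(1, 3))  # [((1, 1), (2, 3)), ((1, 2), (3, 3))]
```

A cell covering a substring of length m needs m - 1 pairs, which is where the "at most n pairs" bound comes from.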

76 76 Construct a Triangular Table
X_{1,5}
X_{1,4}  X_{2,5}
X_{1,3}  X_{2,4}  X_{3,5}
X_{1,2}  X_{2,3}  X_{3,4}  X_{4,5}
X_{1,1}  X_{2,2}  X_{3,3}  X_{4,4}  X_{5,5}
w_1      w_2      w_3      w_4      w_5
Table for string ‘w’ that has length 5

77 77 Construct a Triangular Table Looking for pairs to compare
X_{1,5}
X_{1,4}  X_{2,5}
X_{1,3}  X_{2,4}  X_{3,5}
X_{1,2}  X_{2,3}  X_{3,4}  X_{4,5}
X_{1,1}  X_{2,2}  X_{3,3}  X_{4,4}  X_{5,5}
w_1      w_2      w_3      w_4      w_5

78 78 Example CYK Algorithm Show the CYK Algorithm with the following example:
– CNF grammar G:
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
– w is baaba
– Question: Is baaba in L(G)?

79 79 Constructing The Triangular Table Calculating the bottom row
Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

80 80 Constructing The Triangular Table X_{1,2} = (X_{i,i}, X_{i+1,j}) = (X_{1,1}, X_{2,2}) → {B}{A, C} = {BA, BC}
Steps: –Look for production rules to generate BA or BC –There are two: S and A –X_{1,2} = {S, A}
Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a

81 81 Constructing The Triangular Table
{S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

82 82 Constructing The Triangular Table X_{2,3} = (X_{i,i}, X_{i+1,j}) = (X_{2,2}, X_{3,3}) → {A, C}{A, C} = {AA, AC, CA, CC} = Y
Steps: –Look for production rules to generate Y –There is one: B –X_{2,3} = {B}
Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a

83 83 Constructing The Triangular Table
{S, A}  {B}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

84 84 Constructing The Triangular Table X_{3,4} = (X_{i,i}, X_{i+1,j}) = (X_{3,3}, X_{4,4}) → {A, C}{B} = {AB, CB} = Y
Steps: –Look for production rules to generate Y –There are two: S and C –X_{3,4} = {S, C}
Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a

85 85 Constructing The Triangular Table
{S, A}  {B}  {S, C}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

86 86 Constructing The Triangular Table X_{4,5} = (X_{i,i}, X_{i+1,j}) = (X_{4,4}, X_{5,5}) → {B}{A, C} = {BA, BC} = Y
Steps: –Look for production rules to generate Y –There are two: S and A –X_{4,5} = {S, A}
Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a

87 87 Constructing The Triangular Table
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

88 88 Constructing The Triangular Table X_{1,3} = (X_{i,i}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}) = (X_{1,1}, X_{2,3}), (X_{1,2}, X_{3,3}) → {B}{B} ∪ {S, A}{A, C} = {BB, SA, SC, AA, AC} = Y
Steps: –Look for production rules to generate Y –There are none –X_{1,3} = Ø (the empty set)
Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a

89 89 Constructing The Triangular Table
Ø
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

90 90 Constructing The Triangular Table X_{2,4} = (X_{i,i}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}) = (X_{2,2}, X_{3,4}), (X_{2,3}, X_{4,4}) → {A, C}{S, C} ∪ {B}{B} = {AS, AC, CS, CC, BB} = Y
Steps: –Look for production rules to generate Y –There is one: B –X_{2,4} = {B}
Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a

91 91 Constructing The Triangular Table
Ø  {B}
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

92 92 Constructing The Triangular Table X_{3,5} = (X_{i,i}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}) = (X_{3,3}, X_{4,5}), (X_{3,4}, X_{5,5}) → {A, C}{S, A} ∪ {S, C}{A, C} = {AS, AA, CS, CA, SA, SC, CC} = Y
Steps: –Look for production rules to generate Y –There is one: B –X_{3,5} = {B}
Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a

93 93 Constructing The Triangular Table
Ø  {B}  {B}
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

94 94 Final Triangular Table
{S, A, C}  ← X_{1,5}
Ø  {S, A, C}
Ø  {B}  {B}
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
- Table for string ‘w’ that has length 5
- The algorithm populates the triangular table

95 95 Example (Result) Is baaba in L(G)? Yes. S is in the set X_{1,n}, where n = 5: the cell X_{1,5} = {S, A, C}, and since S ∈ X_{1,5}, baaba ∈ L(G)

96 96 Theorem The CYK Algorithm correctly computes X_{i,j} for all i and j; thus w is in L(G) if and only if S is in X_{1,n}. The running time of the algorithm is O(n^3).
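
The table-filling procedure can be sketched in Python as follows. The function name `cyk`, the `(head, body)` representation of a CNF grammar, and the dictionary keyed by 1-based `(i, j)` are assumptions for illustration.

```python
def cyk(word, productions, start="S"):
    """Return (accepted, table) for a grammar in Chomsky Normal Form.

    productions: list of (head, body) pairs; body is a tuple of one terminal
                 or two variables (or the empty tuple for start -> ε).
    table[i, j] holds the variables deriving word[i-1:j] (1-based, inclusive).
    """
    n = len(word)
    table = {}
    if n == 0:
        return (start, ()) in productions, table

    # Bottom row: substrings of length 1.
    for i in range(1, n + 1):
        table[i, i] = {h for h, body in productions if body == (word[i - 1],)}

    # Longer substrings, shortest first.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split into (i, k) and (k + 1, j)
                for head, body in productions:
                    if (len(body) == 2
                            and body[0] in table[i, k]
                            and body[1] in table[k + 1, j]):
                        cell.add(head)
            table[i, j] = cell
    return start in table[1, n], table

# S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
grammar = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]
accepted, table = cyk("baaba", grammar)
print(accepted)              # True
print(sorted(table[1, 5]))   # ['A', 'C', 'S']
```

The three nested loops over length, start position and split point give the O(n^3) bound for a fixed grammar.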

97 97 Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm

