
1 Context Free Grammars

2 Context Free Languages (CFL)
The pumping lemma showed that there are languages that are not regular.
– There are many classes "larger" than that of regular languages.
– One of these classes is called the "context-free" languages.
They are described by Context-Free Grammars (CFGs).
– Why named context-free?
– Because we can substitute strings for variables regardless of context (which implies that context-sensitive languages exist).
CFGs are useful in many applications:
– Describing the syntax of programming languages
– Parsing
– Structure of documents, e.g. XML
Analogy of the day:
– DFA : Regular Expression as Pushdown Automaton : CFG

3 [Figure: Regular Languages drawn inside Context-Free Languages]

4 Example: CFG for $\{0^n1^n \mid n \ge 1\}$
Productions: S -> 01, S -> 0S1
Basis: 01 is in the language.
Induction: if w is in the language, then so is 0w1.
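The basis/induction reading of S -> 01 | 0S1 translates directly into code. The sketch below is illustrative only; the function names `generate` and `in_language` are hypothetical. Generation applies the induction step (wrap w as 0w1) repeatedly, and recognition undoes it by peeling off an outer 0...1 pair until only the basis string 01 remains.

```python
def generate(max_n):
    """Strings of {0^n 1^n | n >= 1} for n = 1..max_n, built by the basis/induction rules."""
    w = "01"  # basis: S -> 01
    result = []
    for _ in range(max_n):
        result.append(w)
        w = "0" + w + "1"  # induction: S -> 0S1 turns w into 0w1
    return result


def in_language(s):
    """True iff s = 0^n 1^n with n >= 1, by peeling the outer 0...1 pair (undoing S -> 0S1)."""
    if s == "01":
        return True  # basis
    return len(s) > 2 and s[0] == "0" and s[-1] == "1" and in_language(s[1:-1])


print(generate(3))  # ['01', '0011', '000111']
```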

5 CFG Example
Language of palindromes
– We can easily show using the pumping lemma that the language $L = \{ w \mid w = w^R \}$ is not regular.
– However, we can describe this language by the following context-free grammar over the alphabet {0,1}:
P -> ε
P -> 0
P -> 1
P -> 0P0
P -> 1P1
This is an inductive definition. More compactly: P -> ε | 0 | 1 | 0P0 | 1P1
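Because the palindrome grammar is an inductive definition, a recursive recognizer can follow it case by case. This is a minimal sketch (the name `is_pal` is hypothetical): strings of length 0 or 1 match the basis productions P -> ε | 0 | 1, and longer strings must match P -> 0P0 or P -> 1P1, i.e. equal outer symbols around a middle that is itself derivable from P.

```python
def is_pal(w):
    """True iff w can be derived from P -> ε | 0 | 1 | 0P0 | 1P1."""
    if any(c not in "01" for c in w):
        return False  # the grammar only uses the alphabet {0,1}
    if len(w) <= 1:
        return True  # P -> ε, P -> 0, P -> 1
    # P -> 0P0 or P -> 1P1: matching outer symbols around a palindrome
    return w[0] == w[-1] and is_pal(w[1:-1])
```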

6 Grammars
Grammars express languages.
Example: the English language grammar

7 Fall 2006, Costas Busch - RPI

8 Derivation of string "the dog walks":

9 Derivation of string "a cat runs":

10 Language of the grammar: L = { "a cat runs", "a cat sleeps", "the cat runs", "the cat sleeps", "a dog runs", "a dog sleeps", "the dog runs", "the dog sleeps" }
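Because this toy grammar has no recursion, its language is finite and can be listed by expanding every variable in every possible way. The grammar's productions are not in this transcript, so the productions below are a hypothetical reconstruction, chosen only so that they generate exactly the eight strings listed above; the variable names and the `expand` helper are also illustrative.

```python
from itertools import product

# Hypothetical productions, chosen to generate exactly the eight listed strings
GRAMMAR = {
    "<sentence>": [["<noun phrase>", "<verb>"]],
    "<noun phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["cat"], ["dog"]],
    "<verb>": [["runs"], ["sleeps"]],
}


def expand(symbol):
    """All terminal word sequences derivable from symbol (works because the grammar is non-recursive)."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # a terminal derives only itself
    results = []
    for body in GRAMMAR[symbol]:
        # pick one expansion for each symbol of the body, in every combination
        for parts in product(*(expand(s) for s in body)):
            results.append([word for part in parts for word in part])
    return results


language = {" ".join(words) for words in expand("<sentence>")}
print(sorted(language))
```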

11 [Figure: annotated grammar with the labels Variables, Sequence of Terminals (symbols), Productions, Sequence of Variables]

12 CFG Formalism
Terminals = symbols of the alphabet of the language being defined.
Variables = nonterminals = a finite set of other symbols, each of which represents a language.
Start symbol = the variable whose language is the one being defined.

13 Productions
A production has the form variable (head) -> string of variables and terminals (body).
Convention:
– A, B, C, … and also S are variables.
– a, b, c, … are terminals.
– …, X, Y, Z are either terminals or variables.
– …, w, x, y, z are strings of terminals only.
– α, β, γ, … are strings of terminals and/or variables.

14 CFG Notation
A CFG G may then be represented by these four components, denoted G = (V, T, R, S):
– V is the set of variables
– T is the set of terminals
– R is the set of production rules
– S is the start symbol
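The four-tuple G = (V, T, R, S) maps naturally onto a small immutable record. The sketch below is one possible encoding, not a standard library type: the class name `CFG`, its field names, and the choice of (head, body) pairs with an empty tuple for ε are all assumptions made for illustration. It also checks the constraints implied by the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """G = (V, T, R, S); each production in R is (head, body), body a tuple of symbols, () for ε."""
    variables: frozenset   # V
    terminals: frozenset   # T
    productions: tuple     # R
    start: str             # S

    def __post_init__(self):
        # Checks implied by the definition of a CFG
        if not self.variables.isdisjoint(self.terminals):
            raise ValueError("V and T must be disjoint")
        if self.start not in self.variables:
            raise ValueError("the start symbol must be a variable")
        symbols = self.variables | self.terminals
        for head, body in self.productions:
            if head not in self.variables:
                raise ValueError(f"production head {head!r} is not a variable")
            if not all(s in symbols for s in body):
                raise ValueError(f"production body {body!r} uses an unknown symbol")


# The grammar S -> 01 | 0S1 from slide 4
G = CFG(
    variables=frozenset({"S"}),
    terminals=frozenset({"0", "1"}),
    productions=(("S", ("0", "1")), ("S", ("0", "S", "1"))),
    start="S",
)
```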

15 Derivations – Intuition
We derive strings in the language of a CFG by starting with the start symbol and repeatedly replacing some variable A by the body of one of its productions.
– That is, the "productions for A" are those that have head A.

16 Derivations – Formalism
We say αAβ => αγβ if A -> γ is a production.
Example: S -> 01; S -> 0S1. S => 0S1 => 00S11 => 000111.
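The relation αAβ => αγβ can be computed directly: pick one occurrence of a variable A, keep the prefix α and suffix β, and splice in a body γ of A. In this minimal sketch (the names `PRODUCTIONS` and `one_step` are hypothetical), sentential forms are Python strings and a symbol counts as a variable exactly when it is a key of `PRODUCTIONS`.

```python
# S -> 01 | 0S1; keys of this dict are the variables, every other symbol is a terminal
PRODUCTIONS = {"S": ["01", "0S1"]}


def one_step(form):
    """All forms δ with form => δ: replace one variable A (form = αAβ) by a body γ of A."""
    successors = []
    for i, symbol in enumerate(form):
        for body in PRODUCTIONS.get(symbol, []):
            successors.append(form[:i] + body + form[i + 1:])  # α + γ + β
    return successors


print(one_step("S"))    # ['01', '0S1']
print(one_step("0S1"))  # ['0011', '00S11']
```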

17 Iterated Derivation
=>* means "zero or more derivation steps."
Basis: α =>* α for any string α.
Induction: if α =>* β and β => γ, then α =>* γ.

18 Example: Iterated Derivation
S -> 01; S -> 0S1.
S => 0S1 => 00S11 => 000111.
Thus S =>* S; S =>* 0S1; S =>* 00S11; S =>* 000111.
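The basis and induction for =>* suggest a breadth-first search: start from α (zero steps) and keep taking single => steps until β turns up. This is a sketch under one stated assumption: no production of this grammar shortens a sentential form, so forms longer than β can be pruned, which also guarantees the search terminates. Grammars with ε-productions would need a different bound. The helper names are hypothetical.

```python
from collections import deque

PRODUCTIONS = {"S": ["01", "0S1"]}  # S -> 01 | 0S1


def one_step(form):
    """Every δ with form => δ (replace one variable occurrence by one of its bodies)."""
    return [form[:i] + body + form[i + 1:]
            for i, sym in enumerate(form)
            for body in PRODUCTIONS.get(sym, [])]


def derives(alpha, beta):
    """Decide alpha =>* beta by breadth-first search over sentential forms.

    Assumes no production shortens a form, so forms longer than beta are pruned.
    """
    seen, queue = {alpha}, deque([alpha])
    while queue:
        form = queue.popleft()
        if form == beta:
            return True  # reached beta in zero or more steps
        for nxt in one_step(form):
            if len(nxt) <= len(beta) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

With it, each claim on the slide can be checked, e.g. `derives("S", "00S11")` and `derives("S", "000111")` are both true.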

19 Sample CFG
1. E -> I // Expression is an identifier
2. E -> E+E // Add two expressions
3. E -> E*E // Multiply two expressions
4. E -> (E) // Parenthesize an expression
5. I -> L // Identifier is a letter
6. I -> ID // Identifier followed by a digit
7. I -> IL // Identifier followed by a letter
8. D -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 // Digits
9. L -> a | b | c | … | A | B | … | Z // Letters
Note: identifiers are regular; they could be described by the regular expression (letter)(letter + digit)*.
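Since rules 5 to 9 only ever generate a letter followed by letters and digits, the identifier part of the grammar can be replaced by the regular expression from the note. A minimal sketch using Python's `re` module, assuming letters are the ASCII ranges a–z and A–Z (the names `IDENTIFIER` and `is_identifier` are hypothetical):

```python
import re

# (letter)(letter + digit)*: the strings generated from I by rules 5-9
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(s):
    """True iff the whole string s is an identifier."""
    return IDENTIFIER.fullmatch(s) is not None
```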

20 Derivation
Expand the start symbol first and work downward so that the result matches the input string.
For example, given a*(a+b1) we can derive it by:
– E => E*E => I*E => L*E => a*E => a*(E) => a*(E+E) => a*(I+E) => a*(L+E) => a*(a+E) => a*(a+I) => a*(a+ID) => a*(a+LD) => a*(a+bD) => a*(a+b1)
Note that at each step we could have chosen any of the variables in the current string to replace using one of its productions.

21 Multiple-Step Derivations
We saw an example of => in deriving a*(a+b1). We could have used =>* to condense the derivation.
– E.g. we could go straight to E =>* E*(E+E), or even straight to the final step, E =>* a*(a+b1).
Going straight to the end is not recommended on a homework or exam problem if you are supposed to show the derivation.

22 Leftmost and Rightmost Derivations
Derivations allow us to replace any of the variables in a string.
– This leads to many different derivations of the same string.
By forcing the leftmost variable (or alternatively, the rightmost variable) to be replaced, we avoid these "distinctions without a difference."

23 Leftmost Derivation
In the previous example we used a leftmost derivation. We can denote a leftmost derivation explicitly with the subscript "lm", as in =>lm or =>*lm.
In a leftmost derivation, every step replaces the leftmost variable of the current sentential form by one of its production bodies, so the string is filled in from left to right.
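A leftmost derivation is fully determined by the sequence of productions it uses, because the variable to rewrite is always the leftmost one. The sketch below replays the derivation of a*(a+b1) from slide 20 that way and rejects any step whose head is not the leftmost variable. Assumptions: only E, I, D and L are treated as variables, and the letters used are lowercase, so the uppercase letters of rule 9 never appear; the names `leftmost_step` and `STEPS` are hypothetical.

```python
# Variables of the sample grammar; every other character is treated as a terminal
VARIABLES = set("EIDL")


def leftmost_step(form, head, body):
    """Apply head -> body to the leftmost variable of form, which must be head."""
    for i, sym in enumerate(form):
        if sym in VARIABLES:
            if sym != head:
                raise ValueError(f"leftmost variable is {sym}, not {head}")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no variable left to replace")


# Productions used, in order, by the leftmost derivation of a*(a+b1) on slide 20
STEPS = [("E", "E*E"), ("E", "I"), ("I", "L"), ("L", "a"), ("E", "(E)"),
         ("E", "E+E"), ("E", "I"), ("I", "L"), ("L", "a"), ("E", "I"),
         ("I", "ID"), ("I", "L"), ("L", "b"), ("D", "1")]

form = "E"
for head, body in STEPS:
    form = leftmost_step(form, head, body)
    print("=>lm", form)
```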

24 Example: Leftmost Derivations
Balanced-parentheses grammar: S -> SS | (S) | ()
S =>lm SS =>lm (S)S =>lm (())S =>lm (())()
Thus, S =>*lm (())()
S => SS => S() => (S)() => (())() is a derivation, but not a leftmost derivation.

25 Rightmost Derivation
Not surprisingly, we also have a rightmost derivation, which we can denote explicitly with =>rm or =>*rm.
In a rightmost derivation, every step replaces the rightmost variable of the current sentential form by one of its production bodies, so the string is filled in from right to left.

26 Rightmost Derivation Example
a*(a+b1) was already derived previously using a leftmost derivation. We can also give a rightmost derivation, but we must make the replacements in a different order:
– E =>rm E*E =>rm E*(E) =>rm E*(E+E) =>rm E*(E+I) =>rm E*(E+ID) =>rm E*(E+I1) =>rm E*(E+L1) =>rm E*(E+b1) =>rm E*(I+b1) =>rm E*(L+b1) =>rm E*(a+b1) =>rm I*(a+b1) =>rm L*(a+b1) =>rm a*(a+b1)
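The rightmost derivation can be replayed the same way, scanning from the right instead of the left. This sketch makes the same assumptions as any leftmost replay would (only E, I, D, L are variables; only lowercase letters occur) and uses hypothetical names. It shows that the same fourteen productions are used, just in a different order.

```python
# Variables of the sample grammar; every other character is treated as a terminal
VARIABLES = set("EIDL")


def rightmost_step(form, head, body):
    """Apply head -> body to the rightmost variable of form, which must be head."""
    for i in range(len(form) - 1, -1, -1):  # scan from the right
        if form[i] in VARIABLES:
            if form[i] != head:
                raise ValueError(f"rightmost variable is {form[i]}, not {head}")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no variable left to replace")


# Productions used, in order, by the rightmost derivation of a*(a+b1) above
STEPS = [("E", "E*E"), ("E", "(E)"), ("E", "E+E"), ("E", "I"), ("I", "ID"),
         ("D", "1"), ("I", "L"), ("L", "b"), ("E", "I"), ("I", "L"),
         ("L", "a"), ("E", "I"), ("I", "L"), ("L", "a")]

form = "E"
for head, body in STEPS:
    form = rightmost_step(form, head, body)
    print("=>rm", form)
```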

27 Example: Rightmost Derivations
Balanced-parentheses grammar: S -> SS | (S) | ()
S =>rm SS =>rm S() =>rm (S)() =>rm (())()
Thus, S =>*rm (())()
S => SS => SSS => S()S => ()()S => ()()() is neither a rightmost nor a leftmost derivation.
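Whether a given derivation is leftmost, rightmost, or neither can be checked mechanically: for each step, find which variable positions could have produced the next sentential form, then ask whether the leftmost (or rightmost) variable is among them. A sketch for the balanced-parentheses grammar, with hypothetical helper names; it labels the three derivations from slides 24 and 27.

```python
# Balanced-parentheses grammar S -> SS | (S) | (); S is the only variable
BODIES = {"S": ["SS", "(S)", "()"]}


def step_positions(before, after):
    """Positions i where replacing the variable before[i] by one of its bodies yields after."""
    return {i for i, sym in enumerate(before)
            for body in BODIES.get(sym, [])
            if before[:i] + body + before[i + 1:] == after}


def classify(forms):
    """Label a derivation (list of sentential forms) as leftmost, rightmost, both, or neither."""
    leftmost = rightmost = True
    for before, after in zip(forms, forms[1:]):
        positions = step_positions(before, after)
        if not positions:
            return "invalid"  # after does not follow from before in one => step
        variables = [i for i, sym in enumerate(before) if sym in BODIES]
        # the step is leftmost (rightmost) if rewriting the leftmost (rightmost) variable produces it
        leftmost = leftmost and variables[0] in positions
        rightmost = rightmost and variables[-1] in positions
    if leftmost and rightmost:
        return "both"
    return "leftmost" if leftmost else "rightmost" if rightmost else "neither"


print(classify(["S", "SS", "(S)S", "(())S", "(())()"]))          # leftmost
print(classify(["S", "SS", "S()", "(S)()", "(())()"]))           # rightmost
print(classify(["S", "SS", "SSS", "S()S", "()()S", "()()()"]))  # neither
```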

28 Left or Right?
Does it matter which method you use?
Answer: No.
Any derivation of a terminal string has an equivalent leftmost derivation and an equivalent rightmost derivation. That is, for any string of terminals w, A =>* w iff A =>*lm w iff A =>*rm w.

29 Sentential Forms
Any string of variables and/or terminals derived from the start symbol is called a sentential form.
Formally, α is a sentential form iff S =>* α.

30 Language of a Grammar
If G is a CFG, then L(G), the language of G, is {w | w is a string of terminals and S =>* w}.
Example: G has productions S -> ε and S -> 0S1. Then L(G) = $\{0^n1^n \mid n \ge 0\}$.
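L(G) keeps only the sentential forms that contain no variables. The sketch below enumerates the members of L(G) up to a length bound by searching over sentential forms, writing ε as the empty string. Pruning on the number of terminals is sound because no production ever removes a terminal. Termination, however, relies on a property of this particular grammar (every sentential form contains at most one S); the names are hypothetical.

```python
from collections import deque

PRODUCTIONS = {"S": ["", "0S1"]}  # S -> ε | 0S1, with ε written as the empty string


def language_up_to(max_len, start="S"):
    """Terminal strings w with start =>* w and len(w) <= max_len, shortest first."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        if not positions:
            found.add(form)  # no variables left, so form is in L(G)
            continue
        for i in positions:
            for body in PRODUCTIONS[form[i]]:
                nxt = form[:i] + body + form[i + 1:]
                terminals = sum(sym not in PRODUCTIONS for sym in nxt)
                if terminals <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return sorted(found, key=len)


print(language_up_to(6))  # ['', '01', '0011', '000111']
```

The empty string appears in the output, which is why the bound in the example is n ≥ 0.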

31 Context-Free Languages
A language that is defined by some CFG is called a context-free language.
There are CFLs that are not regular languages, such as the example just given.
But not all languages are CFLs.
Intuitively: CFLs can count two things, not three.

32 CFG Exercise
$\{a^i b^j c^k \mid i \ne j \text{ or } j \ne k\}$

