Theory of Computation Automata Theory Dr. Ayman Srour.

2 TOPIC 3: Context free grammar OUTLINE 3.1 Introduction 3.2 Context free grammars (CFGs) 3.3 Formal Definition of Context free grammars 3.4 Ambiguity 3.5 Tips for Designing Grammars 3.6 CFG Simplification 3.7 Chomsky Normal Form

3 3.1 Introduction Context-free grammars are a more powerful method of describing languages than finite automata and regular expressions. Such grammars can describe certain features that have a recursive structure, which makes them useful in a variety of applications. Context-free grammars were first used in the study of human languages. One way of understanding the relationship of terms such as noun, verb, and preposition and their respective phrases leads to a natural recursion, because noun phrases may appear inside verb phrases and vice versa. Context-free grammars can capture important aspects of these relationships.

4 3.1 Introduction An important application of context-free grammars occurs in the specification and compilation of programming languages. A grammar for a programming language often appears as a reference for people trying to learn the language syntax. Designers of compilers and interpreters for programming languages often start by obtaining a grammar for the language. Most compilers and interpreters contain a component called a parser that extracts the meaning of a program prior to generating the compiled code or performing the interpreted execution.

5 3.1 Introduction A number of methodologies facilitate the construction of a parser once a context-free grammar is available. Some tools even generate the parser automatically from the grammar. The collection of languages associated with context-free grammars is called the context-free languages. They include all the regular languages and many additional languages.

6 3.2 Context-free Grammars The following is an example of a context-free grammar, which we call G1: A → 0A1, A → B, B → #. A grammar consists of a collection of substitution rules, also called productions. Each rule appears as a line in the grammar, comprising a symbol and a string separated by an arrow. The symbol is called a variable. The string consists of variables and other symbols called terminals.

7 3.2 Context-free Grammars The variable symbols are often represented by capital letters. The terminals are analogous to the input alphabet and are often represented by lowercase letters, numbers, or special symbols. One variable is designated as the start variable. It usually occurs on the left-hand side of the topmost rule. For example, grammar G1 contains three rules. G1's variables are A and B, where A is the start variable. Its terminals are 0, 1, and #.

8 3.2 Context-free Grammars You use a grammar to describe a language by generating each string of that language in the following manner. 1. Write down the start variable. It is the variable on the left-hand side of the top rule, unless specified otherwise. 2. Find a variable that is written down and a rule that starts with that variable. Replace the written down variable with the right-hand side of that rule. 3. Repeat step 2 until no variables remain.

9 3.2 Context-free Grammars For example, grammar G1 generates the string 000#111. The sequence of substitutions to obtain a string is called a derivation. A derivation of string 000#111 in grammar G1 is A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111.
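The three-step generation procedure can be replayed mechanically. The sketch below assumes G1's rules are A → 0A1, A → B, B → # (consistent with the three rules, variables, and terminals described above) and encodes the grammar as a Python dict from each variable to its right-hand sides; `derive` and `leftmost_step` are hypothetical helper names. Each step replaces the leftmost variable, so the printed sequence is a leftmost derivation.

```python
# Assumed rules of G1: A -> 0A1 | B,  B -> #   (A is the start variable)
G1 = {"A": ["0A1", "B"], "B": ["#"]}

def leftmost_step(form: str, rhs: str, grammar: dict) -> str:
    """Replace the leftmost variable in `form` by `rhs` (step 2 of the procedure)."""
    for i, sym in enumerate(form):
        if sym in grammar:                   # first variable found
            if rhs not in grammar[sym]:
                raise ValueError(f"{sym} -> {rhs} is not a rule of the grammar")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no variable left to replace")

def derive(start: str, choices: list, grammar: dict) -> list:
    """Write down the start variable (step 1), then apply the chosen rules in order."""
    forms = [start]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs, grammar))
    return forms

steps = derive("A", ["0A1", "0A1", "0A1", "B", "#"], G1)
print(" => ".join(steps))
# A => 0A1 => 00A11 => 000A111 => 000B111 => 000#111
```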

10 3.2 Context-free Grammars You may also represent the same information pictorially with a parse tree. An example of a parse tree is shown in Figure 2.1

11 3.2 Context-free Grammars All strings generated in this way constitute the language of the grammar. We write L(G1) for the language of grammar G1. Some experimentation with the grammar G1 shows that L(G1) is {0^n#1^n | n ≥ 0}. Any language that can be generated by some context-free grammar is called a context-free language (CFL).
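The experimentation that reveals L(G1) can also be automated. This minimal sketch uses the same assumed dict encoding of G1 (every character is one symbol) and a hypothetical helper `strings_up_to` that expands sentential forms breadth-first, always replacing the leftmost variable, and keeps every all-terminal string up to a length bound. Pruning on the number of terminals is safe because a derivation step only replaces a variable, so terminals are never removed; termination is not guaranteed for every grammar, but it holds for G1, whose sentential forms contain at most one variable.

```python
from collections import deque

# Assumed rules of G1: A -> 0A1 | B,  B -> #
G1 = {"A": ["0A1", "B"], "B": ["#"]}

def strings_up_to(grammar: dict, start: str, max_len: int) -> set:
    """Terminal strings of length <= max_len derivable from `start`.
    Every character is one symbol; a character is a variable iff it is a key."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:                        # no variables left
            found.add(form)
            continue
        for rhs in grammar[form[idx]]:         # expand the leftmost variable
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for s in new if s not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

print(sorted(strings_up_to(G1, "A", 7), key=len))
# ['#', '0#1', '00#11', '000#111']
```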

12 3.2 Context-free Grammars For convenience when presenting a context-free grammar, we abbreviate several rules with the same left-hand variable, such as A → 0A1 and A → B, into a single line A → 0A1 | B, using the symbol "|" as an "or."

13 3.2 Context-free Grammars The following is a second example of a context-free grammar, called G2, which describes a fragment of the English language.

14 3.2 Context-free Grammars Grammar G2 has 10 variables (the capitalized grammatical terms written inside brackets), 27 terminals (the standard English alphabet plus a space character), and 18 rules. Strings in L(G2) include "a boy sees" and "the boy sees a flower". Each of these strings has a derivation in grammar G2. The following is a derivation of the first string:

15 3.2 Context-free Grammars

16 3.2 Context-free Grammars Example This grammar generates strings such as abab, aaabbb, and aababb.

17 3.2 Context-free Grammars Example: The two strings a+a×a and (a+a)×a can be generated with grammar G4. The parse trees are shown in the following slide.

18 3.2 Context-free Grammars

19 3.3 Formal Definition Of A Context-free Grammar
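In the standard textbook formulation, a context-free grammar is a 4-tuple (V, Σ, R, S), where V is a finite set of variables, Σ is a finite set of terminals disjoint from V, R is a finite set of rules, each pairing a variable with a string of variables and terminals, and S ∈ V is the start variable. The sketch below mirrors those four components as a Python dataclass (the class and method names are hypothetical) and lists every way a candidate grammar violates the definition, using G1 with its assumed rules A → 0A1, A → B, B → #.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V: finite set of variables
    terminals: frozenset   # Sigma: finite set of terminals, disjoint from V
    rules: tuple           # R: pairs (variable, right-hand side as a tuple of symbols)
    start: str             # S: the start variable, a member of V

    def problems(self) -> list:
        """List every way the four components violate the definition (empty if none)."""
        issues = []
        if self.variables & self.terminals:
            issues.append("V and Sigma are not disjoint")
        if self.start not in self.variables:
            issues.append("start variable is not in V")
        symbols = self.variables | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.variables:
                issues.append(f"left-hand side {lhs!r} is not a variable")
            if any(s not in symbols for s in rhs):
                issues.append(f"rule {lhs} -> {''.join(rhs)} uses an unknown symbol")
        return issues

G1 = CFG(
    variables=frozenset({"A", "B"}),
    terminals=frozenset({"0", "1", "#"}),
    rules=(("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))),
    start="A",
)
print(G1.problems())  # []
```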

20 3.4 Ambiguity Sometimes a grammar can generate the same string in several different ways. A grammar is ambiguous if some string it generates has multiple leftmost (or rightmost) derivations or, equivalently, multiple parse trees. If a string has several different parse trees, then the same string may have several different meanings.

21 3.4 Ambiguity Definition : “A string w is derived ambiguously in a context-free grammar G if it has two or more distinct leftmost derivations. A grammar G is said to be ambiguous if there exists at least one string w that can be ambiguously derived.”

22 3.4 Ambiguity Example 1: Grammar: S -> SS | () | (S). String: ()()(). It has 2 distinct leftmost derivations (and parse trees):
S ⇒ SS ⇒ SSS ⇒ ()SS ⇒ ()()S ⇒ ()()()
S ⇒ SS ⇒ ()S ⇒ ()SS ⇒ ()()S ⇒ ()()()
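Ambiguity of one particular string can be checked by counting its parse trees. The sketch below is a small dynamic program specialized to the grammar S -> SS | () | (S) (the function name is hypothetical): count(i, j) is the number of parse trees deriving the substring w[i:j], summed over the three rules and, for S -> SS, over every split point. A count greater than 1 means the string is derived ambiguously. This tests only a single string; whether a grammar is ambiguous in general is undecidable, so no program can settle the question for every grammar.

```python
from functools import lru_cache

def count_parse_trees(w: str) -> int:
    """Number of distinct parse trees (equivalently, leftmost derivations)
    of w in the grammar S -> SS | () | (S)."""
    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:       # trees for S =>* w[i:j]
        total = 0
        if w[i:j] == "()":                  # S -> ()
            total += 1
        if j - i > 2 and w[i] == "(" and w[j - 1] == ")":
            total += count(i + 1, j - 1)    # S -> (S)
        for k in range(i + 1, j):           # S -> SS, split at position k
            total += count(i, k) * count(k, j)
        return total
    return count(0, len(w)) if w else 0

print(count_parse_trees("()()()"))  # 2
```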

23 3.4 Ambiguity Dealing with programming languages: our goal is to describe programming languages with CFGs. There are three main things we should consider: Associativity: (a-b)-c vs. a-(b-c). Precedence: (a-b)*c vs. a-(b*c). Control flow: if (if else) vs. if (if) else. We had the following example, which describes limited arithmetic expressions: E -> a | b | c | E+E | E-E | E*E | (E). What's wrong with using this grammar? It's ambiguous!

24 3.4 Ambiguity Example a-b-c:

25 3.4 Ambiguity Example a-b*c:
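To see that the two parse trees of a-b-c (and of a-b*c) really carry different meanings, the sketch below enumerates every parse tree of a token string under E -> a | b | c | E+E | E-E | E*E | (E) and evaluates each one. The values a = 5, b = 3, c = 2 are arbitrary assumptions, `parse_trees` is a hypothetical name, and square brackets in the output mark the grouping each tree imposes.

```python
from functools import lru_cache

ENV = {"a": 5, "b": 3, "c": 2}   # hypothetical values, only to show the meanings differ
OPS = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}

@lru_cache(maxsize=None)
def parse_trees(tokens: tuple) -> tuple:
    """All parse trees of `tokens` under E -> a | b | c | E+E | E-E | E*E | (E),
    each returned as (bracketed text, value)."""
    results = []
    if len(tokens) == 1 and tokens[0] in ENV:                        # E -> a | b | c
        results.append((tokens[0], ENV[tokens[0]]))
    if len(tokens) >= 3 and tokens[0] == "(" and tokens[-1] == ")":  # E -> (E)
        results.extend((f"({t})", v) for t, v in parse_trees(tokens[1:-1]))
    for k in range(1, len(tokens) - 1):                              # E -> E op E
        op = tokens[k]
        if op in OPS:
            for lt, lv in parse_trees(tokens[:k]):
                for rt, rv in parse_trees(tokens[k + 1:]):
                    results.append((f"[{lt}{op}{rt}]", OPS[op](lv, rv)))
    return tuple(results)

for expr in ("a-b-c", "a-b*c"):
    print(expr, parse_trees(tuple(expr)))
```

Each string yields two trees with different values: a-b-c gives 0 for [[a-b]-c] and 4 for [a-[b-c]], and a-b*c gives 4 for [[a-b]*c] and -1 for [a-[b*c]].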

26 3.4 Ambiguity Another Example: If-Then-Else: <stmt> -> <assign> | <if-stmt> | ... ; <if-stmt> -> if (<expr>) <stmt> | if (<expr>) <stmt> else <stmt> (Note: <>'s are used to denote nonterminals.) Consider the following program fragment: if (x > y) if (x < z) a = 1; else a = 2; (Note: ignore newlines.)

27 3.4 Ambiguity Parse Tree #1: Else belongs to inner if

28 3.4 Ambiguity Parse Tree #2: Else belongs to outer if

29 3.4 Disambiguate Ambiguity is bad: the syntax is correct, but the semantics differ depending on which parse tree is chosen. Associativity: (a-b)-c vs. a-(b-c). Precedence: (a-b)*c vs. a-(b*c). Control flow: if (if else) vs. if (if) else. So we need to rewrite grammars to fix the ambiguity.

30 3.4 Fixing the Expression Grammar Require the right operand not to be a bare expression: E -> E+T | E-T | E*T | T, T -> a | b | c | (E). This corresponds to left-associativity. Now there is only one parse tree for a-b-c. Exercise: find its derivation.

31 3.4 Fixing the Expression Grammar What if we wanted right-associativity? Left-recursive productions are used for left-associative operators, for example E -> E+T | E-T | E*T | T, T -> a | b | c | (E). Right-recursive productions are used for right-associative operators, for example E -> T+E | T-E | T*E | T, T -> a | b | c | (E).

32 3.4 A Different Problem How about the string a+b*c? With E -> E+T | E-T | E*T | T, T -> a | b | c | (E), the grammar does not give * the correct precedence: when a nonterminal has productions for several operators, those operators effectively have the same precedence.

33 3.4 Final Expression Grammar E -> E+T | E-T | T (lowest-precedence operators); T -> T*P | P (higher precedence); P -> a | b | c | (E) (highest precedence: parentheses). Practice: construct the parse tree and the leftmost and rightmost derivations for a+b*c, a*(b+c), a*b+c, and a-b-c. See what happens if you change the last set of productions to P -> a | b | c | E | (E). See what happens if you change the first set of productions to E -> E+T | E-T | T | P.
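Because the final grammar gives each string exactly one parse tree, it can drive a parser directly. A minimal recursive-descent sketch follows, again with hypothetical values for a, b, and c; the class and helper names are illustrative. One caveat: recursive descent would recurse forever on the left-recursive rule E -> E+T, so the sketch uses the standard rewrite of E -> E+T | E-T | T as "a T followed by any number of +T or -T", folding results to the left to keep left-associativity (and likewise for T).

```python
ENV = {"a": 5, "b": 3, "c": 2}   # hypothetical values for the identifiers

class Parser:
    """Recursive-descent parser for E -> E+T | E-T | T, T -> T*P | P,
    P -> a | b | c | (E), with the left recursion turned into loops."""

    def __init__(self, text: str):
        self.toks = [ch for ch in text if not ch.isspace()]
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at {self.pos}")
        return tree

    def expr(self):                          # E: lowest precedence, + and -
        tree = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            tree = (op, tree, self.term())   # folding leftwards = left-associative
        return tree

    def term(self):                          # T: higher precedence, *
        tree = self.primary()
        while self.peek() == "*":
            self.advance()
            tree = ("*", tree, self.primary())
        return tree

    def primary(self):                       # P: identifiers and parentheses
        tok = self.advance()
        if tok in ENV:
            return tok
        if tok == "(":
            tree = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        raise SyntaxError(f"unexpected {tok!r}")

def show(tree) -> str:
    return tree if isinstance(tree, str) else f"[{show(tree[1])}{tree[0]}{show(tree[2])}]"

def evaluate(tree) -> int:
    if isinstance(tree, str):
        return ENV[tree]
    op, left, right = tree
    lv, rv = evaluate(left), evaluate(right)
    return lv + rv if op == "+" else lv - rv if op == "-" else lv * rv

for text in ("a-b-c", "a-b*c", "a*(b+c)"):
    tree = Parser(text).parse()
    print(text, show(tree), evaluate(tree))
```

This prints [[a-b]-c] 0, [a-[b*c]] -1, and [a*[b+c]] 25: a-b-c groups to the left and * binds tighter than -, matching the single parse tree of each string.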

34 3.5 Tips for Designing Grammars The following techniques are helpful, singly or in combination, when you're faced with the problem of constructing a CFG. First, many CFLs are the union of simpler CFLs. If you must construct a CFG for a CFL that you can break into simpler pieces, do so and then construct individual grammars for each piece. These individual grammars can be easily merged into a grammar for the original language by combining their rules and then adding the new rule S → S1 | S2 | ··· | Sk, where the variables Si are the start variables for the individual grammars.

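35 3.5 Tips for Designing Grammars For example, to get a grammar for the language {0^n1^n | n ≥ 0} ∪ {1^n0^n | n ≥ 0}, first construct the grammar S1 → 0S11 | ε for the language {0^n1^n | n ≥ 0} and the grammar S2 → 1S20 | ε for the language {1^n0^n | n ≥ 0}

36 3.5 Tips for Designing Grammars and then add the rule S → S1 | S2 to give the grammar S → S1 | S2, S1 → 0S11 | ε, S2 → 1S20 | ε.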
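The union construction is mechanical enough to sketch. The helper names below are hypothetical, and because the code treats each character as one symbol, the start variables of the example grammars S1 → 0S11 | ε and S2 → 1S20 | ε are renamed X and Y, with the empty string standing for ε. A bounded enumerator lists the strings the merged grammar generates.

```python
from collections import deque

def union_grammar(g1: dict, s1: str, g2: dict, s2: str, start: str = "S") -> dict:
    """Combine two grammars with disjoint variables and add start -> s1 | s2."""
    assert not (g1.keys() & g2.keys()), "variables must be renamed apart first"
    assert start not in g1 and start not in g2
    merged = {**g1, **g2}
    merged[start] = [s1, s2]
    return merged

def strings_up_to(grammar: dict, start: str, max_len: int) -> set:
    """Terminal strings of length <= max_len (one character = one symbol).
    Pruning on terminal count is safe: derivation steps never delete terminals."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            found.add(form)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if sum(1 for s in new if s not in grammar) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

G_X = {"X": ["0X1", ""]}    # plays the role of S1 -> 0S11 | epsilon
G_Y = {"Y": ["1Y0", ""]}    # plays the role of S2 -> 1S20 | epsilon
G = union_grammar(G_X, "X", G_Y, "Y")
print(sorted(strings_up_to(G, "S", 4)))
# ['', '0011', '01', '10', '1100']
```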

37 3.5 Tips for Designing Grammars Second, constructing a CFG for a language that happens to be regular is easy if you can first construct a DFA for that language. You can convert any DFA into an equivalent CFG as follows. Make a variable Ri for each state qi of the DFA. Add the rule Ri → aRj to the CFG if δ(qi, a) = qj is a transition in the DFA. Add the rule Ri → ε if qi is an accept state of the DFA.

38 3.5 Tips for Designing Grammars Make R0 the start variable of the grammar, where q0 is the start state of the machine. Verify on your own that the resulting CFG generates the same language that the DFA recognizes.
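The DFA-to-CFG conversion translates almost line for line into code. The DFA below (binary strings with an even number of 1s, states 0 and 1) is a hypothetical example, not one from the slides; variables are named R0, R1, ... after the states, right-hand sides are tuples with () for ε, and both function names are illustrative. A second helper lists the generated strings up to a length bound so the grammar can be compared with the DFA.

```python
from collections import deque

def dfa_to_cfg(states, alphabet, delta, start, accept):
    """R_i -> a R_j for each transition delta(q_i, a) = q_j; R_i -> epsilon for
    each accept state q_i. Right-hand sides are tuples; () stands for epsilon."""
    def var(q):
        return f"R{q}"
    rules = {var(q): [] for q in states}
    for q in states:
        for a in alphabet:
            rules[var(q)].append((a, var(delta[(q, a)])))
        if q in accept:
            rules[var(q)].append(())
    return rules, var(start)

def right_linear_strings(rules, start, max_len):
    """Strings of length <= max_len generated by the grammar. Every sentential
    form here is a terminal prefix followed by exactly one variable."""
    found, queue = set(), deque([("", start)])
    while queue:
        prefix, v = queue.popleft()
        for rhs in rules[v]:
            if rhs == ():
                found.add(prefix)
            elif len(prefix) < max_len:
                queue.append((prefix + rhs[0], rhs[1]))
    return found

# Hypothetical DFA over {0, 1} accepting strings with an even number of 1s
delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
rules, start = dfa_to_cfg([0, 1], "01", delta, start=0, accept={0})
print(rules)
# {'R0': [('0', 'R0'), ('1', 'R1'), ()], 'R1': [('0', 'R1'), ('1', 'R0')]}
```

For this DFA the grammar is R0 → 0R0 | 1R1 | ε, R1 → 0R1 | 1R0, and the strings it generates up to length 4 are exactly the 16 strings of length at most 4 with an even number of 1s.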

39 3.5 Tips for Designing Grammars Third, certain context-free languages contain strings with two substrings that are "linked" in the sense that a machine for such a language would need to remember an unbounded amount of information about one of the substrings to verify that it corresponds properly to the other substring. This situation occurs in the language {0^n1^n | n ≥ 0} because a machine would need to remember the number of 0s in order to verify that it equals the number of 1s. You can construct a CFG to handle this situation by using a rule of the form R → uRv, which generates strings wherein the portion containing the u's corresponds to the portion containing the v's.

40 3.5 Tips for Designing Grammars Use recursive productions to generate an arbitrary number of symbols: A -> xA | ε (zero or more x's); A -> yA | y (one or more y's). Use separate nonterminals to generate disjoint parts of a language, and then combine them in a production.

41 3.5 Tips for Designing Grammars To generate languages with matching, balanced, or related numbers of symbols, write productions which generate strings from the middle

42 3.6 Context-Free Grammar Simplification In a CFG, not all production rules and symbols may be needed for the derivation of strings. Besides, there may be some null productions and unit productions. Eliminating these productions and symbols is called simplification of CFGs. Simplification essentially comprises the following steps: reduction of the CFG, removal of unit productions, and removal of null productions.

43 3.6 context-Free Grammar Simplification  Reduction of CFG: CFGs are reduced in two phases: Phase 1: Derivation of an equivalent grammar, G’, from the CFG, G, such that each variable derives some terminal string. Derivation Procedure : Step 1; Include all symbols, W1, that derive some terminal and initialize i=1. Step 2 : Include all symbols, Wi+1, that derive Wi. Step 3 : Increment i and repeat Step 2, until Wi+1 = Wi. Step 4: Include all production rules that have Wi in it.

44 3.6 context-Free Grammar Simplification  Reduction of CFG: CFGs are reduced in two phases: Phase 2: Derivation of an equivalent grammar, G”, from the CFG, G’, such that each symbol appears in a sentential form. Derivation Procedure: Step 1 : Include the start symbol in Y1 and initialize i = 1. Step 2 : Include all symbols, Yi+1, that can be derived from Yi and include all production rules that have been applied. Step 3 : Increment i and repeat Step 2, until Yi+1 = Yi.

45 3.6 context-Free Grammar Simplification  Reduction of CFG Example : Find a reduced grammar equivalent to the grammar G, having production rules, P: S → AC | B, A → a, C → c | BC, E → aA | e. Solution: Phase 1 : 1.T = { a, c, e }. W 1 = { A, C, E } from rules A → a, C → c and E → aA]e 2.W 2 = { A, C, E } ∪ { S } from rule S → AC 3.W 3 = { A, C, E, S } ∪ ∅ 4.Since W 2 = W 3, we can derive G’ as :G’ = { { A, C, E, S }, { a, c, e }, P, {S}} where P: S → AC, A → a, C → c, E → aA | e

46 3.6 context-Free Grammar Simplification  Reduction of CFG Example : Find a reduced grammar equivalent to the grammar G, having production rules, P: S → AC | B, A → a, C → c | BC, E → aA | e. Solution cont.: Phase 2 : 1.Y1 = { S } 2.Y2 = { S, A, C } from rule S → AC 3.Y3 = { S, A, C, a, c } from rules A → a and C → c 4.Y4 = { S, A, C, a, c } 5.Since Y3 = Y4, we can derive G” as G” = { { A, C, S }, { a, c }, P, {S}} where P: S → AC, A → a, C → c

47 3.6 context-Free Grammar Simplification  Removal of Unit Productions: Any production rule in the form A → B where A, B ∈ Non-terminal is called unit production. Removal Procedure cont.: Step 1: To remove A → B, add production A → x to the grammar rule whenever B → x occurs in the grammar. [x ∈ Terminal, x can be epsilon] Step 2: Delete A → B from the grammar. Step 3: Repeat from step 1 until all unit productions are removed.

48 3.6 context-Free Grammar Simplification  Removal of Unit Productions: Any production rule in the form A → B where A, B ∈ Non-terminal is called unit production. Example: Remove unit production from the following S → XY, X → a, Y → Z | b, Z → M, M → N, N → a Solution cont.: There are 3 unit productions in the grammar Y → Z, Z → M, and M → N At first, we will remove M → N. As N → a, we add M → a, and M → N is removed. The production set becomes S → XY, X → a, Y → Z | b, Z → M, M → a, N → a

49 3.6 context-Free Grammar Simplification  Removal of Unit Productions: Any production rule in the form A → B where A, B ∈ Non- terminal is called unit production. Example: Remove unit production from the following S → XY, X → a, Y → Z | b, Z → M, M → N, N → a Solution cont.: Now we will remove Z → M.. As M → a, we add Z → a, and Z → M is removed. The production set becomes S → XY, X → a, Y → Z | b, Z → a, M → a, N → a Now Z, M, and N are unreachable, hence we can remove those. The final CFG is unit production free: S → XY, X → a, Y → a | b

50 3.6 context-Free Grammar Simplification  Removal of Epsilon/Nullable Productions: In a CFG, a non-terminal symbol ‘A’ is a nullable variable if there is a production A → ε or there is a derivation that starts at A and finally ends up with ε: A →.......… → ε Removal Procedure: Step 1: Find out nullable non-terminal variables which derive ε. Step 2: For each production A → a, construct all productions A → x where x is obtained from ‘a’ by removing one or multiple non-terminals from Step 1. Step 3: Combine the original productions with the result of step 2 and remove ε-productions.

51 3.6 context-Free Grammar Simplification  Removal of Epsilon/Nullable Productions: In a CFG, a non-terminal symbol ‘A’ is a nullable variable if there is a production A → ε or there is a derivation that starts at A and finally ends up with ε: A →.......… → ε Example: Remove null production from the following S → ASA | aB | b, A → B, B → b | ε Solution : At first, we will remove B → ε. After removing B → ε, the production set becomes S → ASA | aB | b | a, A → B| b | ε, B → b Now we will remove A → ε. fter removing A → ε, the production set becomes : S → ASA | aB | b | a | SA | AS | S, A → B| b, B → b

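52 3.7 Chomsky Normal Form A CFG is in Chomsky Normal Form if every production has one of the following forms: A -> a, A -> BC, S -> ε, where a is a terminal, A, B, and C are non-terminals, B and C are not the start symbol, and the rule S -> ε is allowed only for the start symbol S.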

53 3.7 Chomsky Normal Form Algorithm to Convert into Chomsky Normal Form: Step 1: If the start symbol S occurs on some right side, create a new start symbol S’ and a new production S’ → S. Step 2: Remove null productions (using the null production removal algorithm discussed earlier). Step 3: Remove unit productions (using the unit production removal algorithm discussed earlier). Step 4: Replace each production A → B1…Bn, where n > 2, with A → B1C, where C → B2…Bn. Repeat this step for all productions having more than two symbols on the right side. Step 5: If the right side of any production is of the form A → aB, where a is a terminal and A, B are non-terminals, then the production is replaced by A → XB and X → a. Repeat this step for every production of the form A → aB.
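Steps 4 and 5 can be sketched for a grammar that is already free of null and unit productions. Assumptions: a symbol is a variable exactly when it is a key of the dict, new variables get hypothetical names X1, X2, ... (Step 4) and T_a (Step 5), and identical right-hand-side suffixes share one new variable, so a single X1 → SA serves every A → ASA-style rule. The sketch applies the Step 5 substitution to every terminal in a right-hand side of length two or more, not only the A → aB case, and performs it before binarizing; the resulting grammar is the same either way. The sample input is the production set of the worked example in these slides after Step 3.

```python
def cnf_steps_4_and_5(rules: dict) -> dict:
    """Steps 4 and 5 for a grammar already free of null and unit productions.
    A symbol is a variable iff it is a key of `rules`; anything else is a terminal."""
    out = {A: [] for A in rules}
    tail_var = {}                      # right-hand-side suffix -> variable deriving it
    term_var = {}                      # terminal -> variable deriving it

    def var_for_tail(tail: tuple) -> str:
        # Step 4: one shared variable per suffix, built recursively: C -> B2 (B3 ... Bn)
        if tail not in tail_var:
            name = f"X{len(tail_var) + 1}"
            tail_var[tail] = name
            out[name] = [tail if len(tail) == 2 else (tail[0], var_for_tail(tail[1:]))]
        return tail_var[tail]

    def var_for_terminal(a: str) -> str:
        # Step 5: T_a -> a, used wherever a appears in a right-hand side of length >= 2
        if a not in term_var:
            term_var[a] = f"T_{a}"
            out[term_var[a]] = [(a,)]
        return term_var[a]

    for A, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) <= 1:                       # A -> a (or a start -> epsilon rule)
                out[A].append(rhs)
                continue
            syms = tuple(s if s in rules else var_for_terminal(s) for s in rhs)
            if len(syms) == 2:
                out[A].append(syms)
            else:
                out[A].append((syms[0], var_for_tail(syms[1:])))
    return out

G = {
    "S0": [("A", "S", "A"), ("a", "B"), ("a",), ("A", "S"), ("S", "A")],
    "S":  [("A", "S", "A"), ("a", "B"), ("a",), ("A", "S"), ("S", "A")],
    "A":  [("b",), ("A", "S", "A"), ("a", "B"), ("a",), ("A", "S"), ("S", "A")],
    "B":  [("b",)],
}
for lhs, rhss in cnf_steps_4_and_5(G).items():
    print(lhs, "->", " | ".join(" ".join(r) for r in rhss))
```

Here X1 plays the role of the slides' X (X1 → S A) and T_a the role of Y (T_a → a).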

54 3.7 Chomsky Normal Form Example: Convert the following CFG into CNF: S → ASA | aB, A → B | S, B → b | ε. Solution: Step 1: Since S appears on the R.H.S., we add a new start symbol S0, and S0 → S is added to the production set, which becomes S0 → S, S → ASA | aB, A → B | S, B → b | ε. Step 2: Now we remove the null productions B → ε and A → ε. After removing B → ε, the production set becomes S0 → S, S → ASA | aB | a, A → B | S | ε, B → b. After removing A → ε, the production set becomes S0 → S, S → ASA | aB | a | AS | SA | S, A → B | S, B → b.

55 3.7 Chomsky Normal Form Example: Convert the following CFG into CNF: S → ASA | aB, A → B | S, B → b | ε. Solution cont.: Step 3: Now we remove the unit productions. After removing S → S, the production set becomes S0 → S, S → ASA | aB | a | AS | SA, A → B | S, B → b. After removing S0 → S, the production set becomes S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA, A → B | S, B → b. After removing A → B, the production set becomes S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA, A → S | b, B → b. After removing A → S, the production set becomes S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA, A → b | ASA | aB | a | AS | SA, B → b.

56 3.7 Chomsky Normal Form Example: Convert the following CFG into CNF: S → ASA | aB, A → B | S, B → b | ε. Solution cont.: Step 4: Now we look for productions with more than two variables on the R.H.S. Here, S0 → ASA, S → ASA, and A → ASA have three non-terminals on the R.H.S. Hence we apply Step 4, introducing X → SA, to get the following production set: S0 → AX | aB | a | AS | SA, S → AX | aB | a | AS | SA, A → b | AX | aB | a | AS | SA, B → b, X → SA

57 3.7 Chomsky Normal Form Example: Convert the following CFG into CNF: S → ASA | aB, A → B | S, B → b | ε. Solution cont.: Step 5: We have to change the productions S0 → aB, S → aB, and A → aB. The final production set becomes: S0 → AX | YB | a | AS | SA, S → AX | YB | a | AS | SA, A → b | AX | YB | a | AS | SA, B → b, X → SA, Y → a
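To spot-check that a conversion preserved the language, a CNF grammar can be fed to the CYK algorithm, a standard membership test that relies on every rule having the form A → BC or A → a (the algorithm itself is not part of these slides). The sketch below encodes the final production set of this example; `cyk` is a hypothetical name, and table[i][l] holds the variables deriving the substring of length l + 1 starting at position i.

```python
CNF = {
    "S0": [("A", "X"), ("Y", "B"), ("a",), ("A", "S"), ("S", "A")],
    "S":  [("A", "X"), ("Y", "B"), ("a",), ("A", "S"), ("S", "A")],
    "A":  [("b",), ("A", "X"), ("Y", "B"), ("a",), ("A", "S"), ("S", "A")],
    "B":  [("b",)],
    "X":  [("S", "A")],
    "Y":  [("a",)],
}

def cyk(grammar: dict, start: str, w: str) -> bool:
    """CYK membership test for a grammar in Chomsky Normal Form."""
    n = len(w)
    if n == 0:
        return () in grammar.get(start, [])   # only a start -> epsilon rule yields ""
    # table[i][l]: variables deriving the substring of length l + 1 starting at i
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][0] = {A for A, rhss in grammar.items() if (ch,) in rhss}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):      # the left part has `split` symbols
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for A, rhss in grammar.items():
                    if any(len(r) == 2 and r[0] in left and r[1] in right for r in rhss):
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]

for w in ("a", "ab", "ba", "aa", "b", "bb"):
    print(w, cyk(CNF, "S0", w))
```

For example, a, ab, ba, and aa are accepted (ba comes from S → ASA with the first A ⇒ b, S ⇒ a, and the second A ⇒ ε in the original grammar), while b and bb are rejected, since every string of the original grammar contains an a.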

58 Quiz Convert the following CFG into CNF S → aXbX X → aY | bY | ε Y → X | c

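59 Quiz Convert the following CFG into CNF S → aXbX X → aY | bY | ε Y → X | c Solution: X and Y are nullable, so removing the null productions gives S → aXbX | abX | aXb | ab, X → aY | bY | a | b, Y → X | c; removing the unit production Y → X and then introducing A → a and B → b and binarizing gives: S → AP | AQ | AR | AB, P → XQ, Q → BX, R → XB, X → AY | BY | a | b, Y → AY | BY | a | b | c, A → a, B → b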

