Finite Automata and Formal Languages


1 Finite Automata and Formal Languages
Subject Name: Finite Automata and Formal Languages
Subject Code: 10CS56
Prepared By: Swetha G, Pramela
Department: ISE & CSE
Date: 07/10/2014

2 Chapter 4: Context Free Grammars and Languages

3 Topics Covered:
CFG Definition & Derivations
Parse Trees
Ambiguity in Grammars and Languages
Problems

4 Context Free Grammars - CFG
Context-free grammars (CFGs) define the context-free languages. They were developed in the mid-1950s by Noam Chomsky. CFGs play an important role in the description and design of programming languages and compilers and in the implementation of parsers; they are also used in XML DTDs. A grammar's derivations can be represented by parse trees, a pictorial representation of how a string is generated. The context-free languages (CFLs) are recognized by pushdown automata, the corresponding automaton model.

5 Context-Free Grammars
Definition: A context-free grammar is a 4-tuple G = (V, T, P, S) where:
V is a finite set of variables (non-terminals); each variable represents a language, i.e. a set of strings.
T is a finite set of terminal symbols.
P is a finite set of rules or productions which recursively define the language. Each production has a head (the variable being defined), the symbol "→", and a body: a string of zero or more terminals and variables.
S ∈ V is the start variable; it represents the language we are defining. The other variables define the auxiliary classes of strings needed to define our language.
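As a concrete illustration (a sketch, not from the original slides), the 4-tuple can be written down directly in Python; the class name Grammar and its field names are assumptions made for this example.

from dataclasses import dataclass

@dataclass
class Grammar:
    variables: set      # V: finite set of variables (non-terminals)
    terminals: set      # T: finite set of terminal symbols
    productions: dict   # P: maps each head variable to a list of bodies (tuples of symbols)
    start: str          # S: the start variable

# The palindrome grammar of the next slide, written as a 4-tuple:
G_pal = Grammar(
    variables={"P"},
    terminals={"0", "1"},
    productions={"P": [(), ("0",), ("1",), ("0", "P", "0"), ("1", "P", "1")]},
    start="P",
)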

6 Example: Palindromes over the alphabet Σ = {0, 1}
The productions are defined by induction.
Basis: the palindromes of length 0 and 1 give P → є, P → 0, P → 1.
Induction (n >= 1): P → 0P0, P → 1P1.
The grammar is G_pal = ({P}, {0,1}, A, P), where
V = {P} is the set of non-terminals,
T = {0,1} is the set of terminals,
A is the set of productions defined above, and
P is the start variable.
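A quick sketch (not from the slides) of how membership in L(G_pal) can be checked by running the productions backwards; the function name is an assumption.

def in_Gpal(w: str) -> bool:
    # Basis: P -> є, P -> 0, P -> 1
    if w in ("", "0", "1"):
        return True
    # Induction: P -> 0P0 and P -> 1P1 both require equal first and last symbols
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return in_Gpal(w[1:-1])
    return False

print(in_Gpal("0110"), in_Gpal("010"), in_Gpal("01"))   # True True False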

7 Example 2: Expressions. Here we define two classes of strings: those representing simple numerical expressions (denoted by E) and those representing Boolean expressions (denoted by B).
E → 0
E → 1
E → E + E
E → if B then E else E
B → True
B → False
B → E < E
B → E == E

8 Compact Notation
We can group all productions with the same head into one rule to get a more compact notation.
Example: Palindromes can be defined as
P → є | 0 | 1 | 0P0 | 1P1
Example: The expressions can be defined as
E → 0 | 1 | E + E | if B then E else E
B → True | False | E < E | E == E

9 Chomsky Hierarchy
Noam Chomsky, the founder of formal language theory, classified grammars into four categories:
Type 0: Phrase structure grammar
Type 1: Context-sensitive grammar
Type 2: Context-free grammar
Type 3: Regular grammar

10 Type 0: Phrase structure grammar
A grammar G = (V, T, P, S) is a Type 0 / unrestricted / phrase structure grammar if all productions are of the form α → β, where α belongs to (V ∪ T)+ and β belongs to (V ∪ T)*.
There are no restrictions on the lengths of α and β. α cannot be є (it must be non-empty), but β can be є.

11 Type 1: Context-sensitive grammar
A grammar G = (V, T, P, S) is a Type 1 / context-sensitive grammar if it is Type 0 and all productions are of the form α → β, where α and β belong to (V ∪ T)+.
There is a restriction on the length of β: it must be at least the length of α, i.e. |β| >= |α|. Neither α nor β can contain є, so this is an є-free grammar.

12 Type 2 : Context Free grammar
A grammar G = (V, T, P, S) is a Type 2 / context-free grammar if all productions are of the form A → α, where A is a non-terminal and α belongs to (V ∪ T)*. Here є can appear on the right-hand side of a production.

13 Type 3: Regular grammar
A grammar G = (V, T, P, S) is a Type 3 / regular grammar if it is right-linear or left-linear.
A grammar is right-linear if all productions are of the form A → wB or A → w.
A grammar is left-linear if all productions are of the form A → Bw or A → w.
Here A and B are non-terminals and w is a string of terminals.
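For instance (an illustrative example, not on the original slide), a right-linear grammar for strings over {a, b} that end in ab is S → aS | bS | ab: every production has the form A → wB or A → w, so the grammar is regular.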

14 Defining the language of a grammar:
When the productions of a CFG are applied, certain strings can be derived, and we say these strings are in the language of a particular variable. The procedure used to derive a string is known as inference. There are two types of inference: 1. Recursive inference 2. Derivation.

15 Recursive inference
To infer that a string is in the language defined by a CFG, one applies productions from body to head.
Example: Grammar G = ({E, I}, T, P, E), where T = {+, ∗, (, ), a, b, 0, 1} and P is the set of productions
E → I | E + E | E ∗ E | (E)
I → a | b | Ia | Ib | I0 | I1
Example: infer the string a ∗ (a + b00).

16 Recursive inference of a ∗ (a + b00):
        String inferred    For language of    Production used    String(s) used
(i)     a                  I                  I → a              –
(ii)    b                  I                  I → b              –
(iii)   b0                 I                  I → I0             (ii)
(iv)    b00                I                  I → I0             (iii)
(v)     a                  E                  E → I              (i)
(vi)    b00                E                  E → I              (iv)
(vii)   a + b00            E                  E → E + E          (v), (vi)
(viii)  (a + b00)          E                  E → (E)            (vii)
(ix)    a ∗ (a + b00)      E                  E → E ∗ E          (v), (viii)
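The table above can be checked mechanically. Below is a brute-force sketch (not from the slides) that applies the productions from body to head by trying every way of splitting the string; the function names and the guard against left recursion are assumptions, and the guard is safe here only because this grammar has no є-productions and no cyclic unit productions.

# Productions of the example grammar; bodies are tuples of grammar symbols.
PRODS = {
    "E": [("I",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")],
    "I": [("a",), ("b",), ("I", "a"), ("I", "b"), ("I", "0"), ("I", "1")],
}

def inferable(var, w, _active=None):
    """True if string w can be inferred to be in the language of variable var."""
    _active = _active or set()
    if (var, w) in _active:      # already trying to infer this pair on the current path:
        return False             # stop, so left recursion like E -> E + E terminates
    _active = _active | {(var, w)}
    return any(_matches(body, w, _active) for body in PRODS[var])

def _matches(body, w, _active):
    # Try every way of splitting w among the symbols of the production body.
    if not body:
        return w == ""
    first, rest = body[0], body[1:]
    if first in PRODS:           # a variable: try every prefix of w for it
        return any(inferable(first, w[:i], _active) and _matches(rest, w[i:], _active)
                   for i in range(len(w) + 1))
    return w.startswith(first) and _matches(rest, w[len(first):], _active)

print(inferable("E", "a*(a+b00)"))   # True, matching line (ix) of the table
print(inferable("E", "a(a+b00)"))    # False: the grammar has no production E -> E E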

17 Language of a Grammar
Definition: Let G = (V, T, R, S) be a CFG. The language of G, denoted L(G), is the set of terminal strings that can be derived from the start variable, that is, L(G) = {w ∈ T∗ | S ⇒∗ w}.
Theorem: L(G_pal) is exactly the set of palindromes over {0, 1}.

18 Examples:
1) Construct a context-free grammar to generate the language L = {w | w starts and ends with the same symbol}, where Σ = {0, 1}.
Solution: S → 0A0 | 1A1, A → 0A | 1A | є
2) Obtain a grammar to generate strings consisting of at least one a.
Solution: S → a, S → aS, i.e. S → a | aS. The language generated can be formally written as L = {a^n : n >= 1}.
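For instance, the grammar in (1) derives the string 010 as S ⇒ 0A0 ⇒ 01A0 ⇒ 010, using S → 0A0, then A → 1A, then A → є.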

19 Examples:
3) Obtain a grammar to generate strings of a's and b's whose length is a multiple of 3.
Solution: S → є | AAAS, A → a | b
4) Obtain a grammar to generate strings of a's and b's that end with "ab".
Solution: S → Aab, A → є | aA | bA

20 Derivations
Applying productions from head to body gives a derivation. This requires a new relational symbol, "⇒". At every derivation step we may choose to replace any variable by the right-hand side (body) of one of its productions.
Example: To generate arithmetic expressions:
E → E + E
E → E - E
E → E * E
E → E / E
E → (E)
E → id
Exercise: obtain id + id * id from E.
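One possible answer (there are others, since this grammar is ambiguous) is the leftmost derivation E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id.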

21 Derivation of a ∗ (a + b00)
E ⇒ E ∗ E ⇒ I ∗ E ⇒ a ∗ E ⇒ a ∗ (E) ⇒ a ∗ (E + E) ⇒ a ∗ (I + E) ⇒ a ∗ (a + E) ⇒ a ∗ (a + I) ⇒ a ∗ (a + I0) ⇒ a ∗ (a + I00) ⇒ a ∗ (a + b00)
Note 1: At each step we may have several rules to choose from, e.g. I ∗ E ⇒ a ∗ E ⇒ a ∗ (E), versus I ∗ E ⇒ I ∗ (E) ⇒ a ∗ (E).
Note 2: Not all choices lead to successful derivations of a particular string; for instance E ⇒ E + E (at the first step) won't lead to a derivation of a ∗ (a + b00).
Important: Recursive inference and derivation are equivalent. A string of terminals w is inferred to be in the language of a variable A iff A ⇒∗ w.

22 Derivations are of two types:
Leftmost derivation and rightmost derivation.
Leftmost derivation (⇒lm and ⇒∗lm): at each step we replace the leftmost variable.
Rightmost derivation (⇒rm and ⇒∗rm): at each step we replace the rightmost variable.
Example: Consider the grammar
S → L
L → (L) | (L,S) | a
and the string "(a,a)".
Leftmost derivation: S ⇒ L ⇒ (L,S) ⇒ (a,S) ⇒ (a,L) ⇒ (a,a)
Rightmost derivation: S ⇒ L ⇒ (L,S) ⇒ (L,L) ⇒ (L,a) ⇒ (a,a)
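A minimal sketch (not from the slides) of how a leftmost-derivation step can be performed mechanically: the leftmost variable in the sentential form is replaced by the body of a chosen production. The function name and the production indices below are assumptions made to reproduce the leftmost derivation of "(a,a)".

def leftmost_step(form, prods, choice):
    """Replace the leftmost variable of the sentential form using production index `choice`."""
    i = next(j for j, sym in enumerate(form) if sym in prods)
    return form[:i] + prods[form[i]][choice] + form[i + 1:]

# Grammar from the example: S -> L ; L -> (L) | (L,S) | a
prods = {
    "S": [("L",)],
    "L": [("(", "L", ")"), ("(", "L", ",", "S", ")"), ("a",)],
}

# Leftmost derivation of "(a,a)": S => L => (L,S) => (a,S) => (a,L) => (a,a)
form = ("S",)
for choice in (0, 1, 2, 0, 2):
    form = leftmost_step(form, prods, choice)
    print("".join(form))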

23 Sentential Forms
Derivations from the start variable have a special role.
Definition: Let G = (V, T, R, S) be a CFG and let α ∈ (V ∪ T)∗. If S ⇒∗ α, then α is called a sentential form. We say α is a left sentential form if S ⇒∗lm α, and a right sentential form if S ⇒∗rm α.
Example: 010P010 is a sentential form of the palindrome grammar, since P ⇒∗ 010P010.

24 Ambiguity
Ambiguity: a word or expression that can be understood in two or more possible ways. The following grammar produces ambiguity in programming languages with conditionals:
C → if b then C else C
C → if b then C
C → s
The expression "if b then if b then s else s" can be interpreted in the following two different ways:
1. if b then (if b then s else s)
2. if b then (if b then s) else s
How should the parser of this language understand the expression?

25 Example: Consider the following grammar
E → 0 | 1 | E + E | E ∗ E
The sentential form E + E ∗ E has the following two possible derivations:
1. E ⇒ E + E ⇒ E + E ∗ E
2. E ⇒ E ∗ E ⇒ E + E ∗ E
Consequently there are two possible meanings for a word like 1 + 1 ∗ 0:
1. 1 + (1 ∗ 0) = 1
2. (1 + 1) ∗ 0 = 0
Ambiguous Grammars. Definition: A CFG G = (V, T, R, S) is ambiguous if there is at least one string w ∈ T∗ for which we can find two (or more) parse trees, each with root S and yield w. If each string has at most one parse tree, we say that the grammar is unambiguous.
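To make the two meanings concrete, here is a small sketch (not from the slides) that represents the two parse trees for 1 + 1 ∗ 0 as nested tuples and evaluates them; the function name and the tuple encoding are assumptions.

def evaluate(tree):
    """Evaluate an expression tree given as an int leaf or a tuple (op, left, right)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r

# Two parse trees for the same word 1 + 1 * 0:
plus_on_top = ("+", 1, ("*", 1, 0))    # from derivation 1: E => E + E => E + E * E
times_on_top = ("*", ("+", 1, 1), 0)   # from derivation 2: E => E * E => E + E * E
print(evaluate(plus_on_top), evaluate(times_on_top))   # 1 0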

26 Parse Trees
Definition: Let G = (V, T, R, S) be a CFG. The parse trees for G are trees satisfying the following conditions:
Each interior node is labelled with a variable.
Each leaf is labelled with a variable, a terminal, or є.
If a node is labelled A and its children are labelled X1, X2, ..., Xn respectively from left to right, then there must exist in R a production of the form A → X1X2...Xn.
Note: If a leaf is є, it must be the only child of its parent A, and there must be a production A → є.
Height of a Parse Tree
Definition: The height of a parse tree is the maximum length of a path from the root of the tree to one of its leaves. Observe: we count the edges on the path, not the number of nodes.
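As a working illustration (a sketch, not from the slides), a parse-tree node can be stored as a label plus a list of children; the height defined above and the yield defined on slide 29 then become short recursive methods. The class and method names are assumptions.

class ParseTree:
    def __init__(self, label, children=None):
        self.label = label              # a variable, a terminal, or "є"
        self.children = children or []  # empty list for a leaf

    def height(self):
        """Maximum number of edges on a path from this node down to a leaf."""
        if not self.children:
            return 0
        return 1 + max(child.height() for child in self.children)

    def yield_(self):
        """Concatenation of the leaf labels from left to right; an є leaf contributes nothing."""
        if not self.children:
            return "" if self.label == "є" else self.label
        return "".join(child.yield_() for child in self.children)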

27 Parse Trees
Parse trees are a way to represent derivations. A parse tree reflects the internal structure of a word of the language. Using parse trees it becomes very clear which variable was replaced at each step, and which terminal symbols were generated/derived from a particular variable. Parse trees are very important in compiler theory: a parser converts the source code into its parse tree, and this parse tree is the structure of the program.

28 Example of a Parse Tree :
Parse tree for the derivation of the string a ∗ (a + b00)

29 Yield of a Parse Tree
Definition: The yield of a parse tree is the string that results from concatenating all the leaves of the tree from left to right.
Note: The yield of a tree is a string derived from the root of the tree. Of particular importance are the yields that consist only of terminals, that is, where every leaf is either a terminal or є. When, in addition, the root is S, the yield is a string in the language of the grammar. Yields can therefore be used to describe the language of a grammar.
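Continuing the sketch from slide 26 (an illustration, not from the slides), the parse tree for a ∗ (a + b00) in the expression grammar E → I | E + E | E ∗ E | (E), I → a | b | Ia | Ib | I0 | I1 can be built and its yield and height computed as follows:

T = ParseTree   # alias for the class sketched after slide 26
b00 = T("I", [T("I", [T("I", [T("b")]), T("0")]), T("0")])
a_plus_b00 = T("E", [T("E", [T("I", [T("a")])]), T("+"), T("E", [b00])])
tree = T("E", [
    T("E", [T("I", [T("a")])]),
    T("*"),
    T("E", [T("("), a_plus_b00, T(")")]),
])
print(tree.yield_())   # a*(a+b00), a terminal string derived from the root E
print(tree.height())   # 7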

30 From Inference to Trees:
Let G = (V, T, P, S) be a CFG. If the recursive inference procedure tells us that a terminal string w is in the language of a variable A, then there is a parse tree with root A and yield w.
Proof: We do an induction on the length of the inference.
Basis: n = 1 means exactly one production A → w was used. The desired parse tree is then a root labelled A whose children are the symbols of w, in order.

31 Induction: Suppose w is inferred in n + 1 steps, and assume the theorem holds for all strings x and variables B such that x was inferred to be in the language of B using n or fewer inference steps. Consider the last step of the inference of w: it uses some production of A, say A → X1X2...Xk, where each Xi belongs to (V ∪ T). Break w up as w = w1w2...wk.
Case 1: If Xi is a terminal, then wi = Xi.
Case 2: If Xi is a non-terminal, then wi is a string that was previously inferred to be in the language of Xi, that is, using at most n steps. By the IH there is a parse tree Ti with root Xi and yield wi.
Then the tree with root A whose children are the roots of T1, ..., Tk (in order) is a parse tree for G with root A and yield w.

32 (Only-if) Let w ∈ L(G), that is P ⇒∗ w.
We prove by induction on the length of the derivation of w that w is a palindrome, that is, w = rev(w).
Base case: If the derivation has one step, then it must be P ⇒ є, P ⇒ 0, or P ⇒ 1. In all three cases w is a palindrome.
Inductive step: Our IH is that if P ⇒n w′ with n > 0, then w′ = rev(w′). Suppose P ⇒n+1 w. Then we have two cases:
P ⇒ 0P0 ⇒n 0w′0 = w;
P ⇒ 1P1 ⇒n 1w′1 = w.
Observe that in both cases P ⇒n w′ with n > 0. Hence by the IH w′ is a palindrome, and then so is w.

