Regular Grammars Formal definition of a regular expression.

Languages associated with regular expressions. Introduction to regular grammars. Regular languages and homomorphism. The Chomsky Hierarchy.

Regular Expression
The regular expressions over a set I are defined recursively by:
the symbol ∅ is a regular expression;
the symbol λ is a regular expression;
the symbol x is a regular expression whenever x ∈ I;
the symbols (AB), (A ∪ B), and A* are regular expressions whenever A and B are regular expressions.
Each regular expression represents a set of strings:
∅ represents the empty set, that is, the set with no strings;
λ represents the set {λ}, which contains only the empty string;
x represents the set {x} containing the string with one symbol x;
(AB) represents the concatenation of the sets represented by A and by B;
(A ∪ B) represents the union of the sets represented by A and by B;
A* represents the Kleene closure of the set represented by A.

Example What are the strings in the regular sets specified by the regular expressions 10*, (10)*, 0 ∪ 01, 0(0 ∪ 1)*, and (0*1)*?
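One way to explore this example is to enumerate short bit strings and keep those that match each expression. The sketch below assumes Python's `re` module as a stand-in for the slide notation (the union ∪ is written `|`); the helper name `strings_matching` and the length bound of 4 are illustrative choices, not part of the original slides.

```python
import re
from itertools import product

def strings_matching(pattern, alphabet="01", max_len=4):
    """Return all strings over `alphabet` of length <= max_len that fully match `pattern`."""
    compiled = re.compile(pattern)
    matches = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                matches.append(s)
    return matches

# Slide notation -> Python `re` syntax (union is written with |).
patterns = {
    "10*": r"10*",
    "(10)*": r"(10)*",
    "0 U 01": r"0|01",
    "0(0 U 1)*": r"0(0|1)*",
    "(0*1)*": r"(0*1)*",
}

for name, pat in patterns.items():
    print(name, "->", strings_matching(pat))
```

The listing suggests the general pattern: 10* gives a 1 followed by any number of 0s, (10)* gives any number of copies of 10 (including the empty string), 0 ∪ 01 gives only the strings 0 and 01, 0(0 ∪ 1)* gives every bit string beginning with 0, and (0*1)* gives the empty string together with every bit string ending with 1.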

Example Find a regular expression that specifies each of these sets: (a) the set of bit strings with even length; (b) the set of bit strings ending with a 0 and not containing 11.
Solution: (a) The set of strings of two bits is specified by the regular expression (00 ∪ 01 ∪ 10 ∪ 11). Consequently, the set of strings with even length is specified by (00 ∪ 01 ∪ 10 ∪ 11)*. (b) A bit string that ends with a 0 and does not contain 11 must be the concatenation of one or more strings, each of which is either 0 or 10. It follows that the regular expression (0 ∪ 10)*(0 ∪ 10) specifies the set of bit strings that do not contain 11 and end with a 0.
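Both answers can be sanity-checked by brute force over all short bit strings. This is a minimal sketch, assuming Python's `re` syntax for the two expressions; the names `bit_strings` and `check` and the default bound of 10 are hypothetical, and a bounded check is supporting evidence rather than a proof.

```python
import re
from itertools import product

def bit_strings(max_len):
    """Yield every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

EVEN_LENGTH = re.compile(r"(00|01|10|11)*")
ENDS_0_NO_11 = re.compile(r"(0|10)*(0|10)")

def check(max_len=10):
    """Compare each regular expression with the property it should specify."""
    for s in bit_strings(max_len):
        assert bool(EVEN_LENGTH.fullmatch(s)) == (len(s) % 2 == 0)
        assert bool(ENDS_0_NO_11.fullmatch(s)) == (s.endswith("0") and "11" not in s)
    return True

print(check())
```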

Base cases of the definition: the symbol ∅; the symbol λ; the symbol a whenever a ∈ I.

Example Construct a nondeterministic finite-state automaton that recognizes the regular set 1* ∪ 01.
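One possible automaton for this set, not necessarily the one drawn in the original slides, uses four states. The state names s0 to s3, the transition table, and the `accepts` helper below are assumptions made for illustration: s0 is the start state and is accepting because the empty string belongs to 1*; s1 loops on 1; the path s0, s2, s3 reads 01.

```python
# Transition relation: (state, input symbol) -> set of possible next states.
TRANSITIONS = {
    ("s0", "1"): {"s1"},  # first 1 of a string in 1*
    ("s1", "1"): {"s1"},  # further 1s
    ("s0", "0"): {"s2"},  # the 0 of the string 01
    ("s2", "1"): {"s3"},  # the 1 of the string 01
}
START = "s0"
ACCEPTING = {"s0", "s1", "s3"}

def accepts(word):
    """Simulate the automaton by tracking the set of states reachable so far."""
    current = {START}
    for symbol in word:
        current = set().union(
            *(TRANSITIONS.get((q, symbol), set()) for q in current)
        )
    return bool(current & ACCEPTING)

for w in ["", "1", "111", "01", "0", "011", "10"]:
    print(repr(w), accepts(w))
```

Missing entries in the table mean that the automaton has no move, so a string such as 011 is rejected because no state remains reachable after the final 1.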

Languages associated with regular expressions
Definition: The language L(r) denoted by a regular expression r over an alphabet Σ is defined by the following rules.
∅ is a regular expression denoting the empty set;
λ is a regular expression denoting {λ};
for every a ∈ Σ, a is a regular expression denoting {a}.
If r1 and r2 are regular expressions, then
L(r1 + r2) = L(r1) ∪ L(r2),
L(r1.r2) = L(r1)L(r2),
L((r1)) = L(r1),
L(r1*) = (L(r1))*.
In this notation, + denotes union and . denotes concatenation.

Example: Exhibit the language L(a*.(a + b)) in set notation.
Solution:
L(a*.(a + b)) = L(a*)L(a + b) (from L(r1.r2) = L(r1)L(r2))
= (L(a))* L(a + b) (from L(r1*) = (L(r1))*)
= (L(a))* (L(a) ∪ L(b)) (from L(r1 + r2) = L(r1) ∪ L(r2)).
But (L(a))* = {λ, a, aa, aaa, ...}, L(a) = {a}, and L(b) = {b}, so L(a) ∪ L(b) = {a, b}. Therefore
L(a*.(a + b)) = {λ, a, aa, aaa, ...}{a, b} = {a, b, aa, ab, aaa, aab, ...}.
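The rules of the definition can be applied mechanically. The sketch below evaluates L(r) restricted to strings of bounded length, which keeps every set finite. The tuple encoding of expressions (`"sym"`, `"union"`, `"concat"`, `"star"`, and so on) and the function name `language` are assumptions introduced here for illustration.

```python
def language(r, max_len):
    """Return the strings of length <= max_len in L(r).

    `r` is a nested tuple: ("empty",), ("lambda",), ("sym", a),
    ("union", r1, r2), ("concat", r1, r2) or ("star", r1).
    """
    kind = r[0]
    if kind == "empty":    # the empty set
        return set()
    if kind == "lambda":   # {lambda}, the empty string written as ""
        return {""}
    if kind == "sym":      # {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":    # L(r1 + r2) = L(r1) U L(r2)
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "concat":   # L(r1.r2) = L(r1)L(r2)
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":     # L(r1*) = (L(r1))*
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:    # keep appending base strings until nothing new fits
            frontier = {
                u + v for u in frontier for v in base if len(u + v) <= max_len
            } - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind}")

# a*.(a + b) from the example above
r = ("concat", ("star", ("sym", "a")), ("union", ("sym", "a"), ("sym", "b")))
print(sorted(language(r, 3), key=lambda s: (len(s), s)))
```

With a bound of 3 this lists a, b, aa, ab, aaa, aab, the first members of the set found in the solution.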

Example: For Σ = {a, b}, the expression r = (a + b)*(a + bb) is a regular expression. Write its language.
Solution: (It is easy to check that r is a regular expression.)
L(r) = L((a + b)*(a + bb))
= L((a + b)*) L(a + bb)
= (L(a + b))* (L(a) ∪ L(bb))
= (L(a) ∪ L(b))* (L(a) ∪ L(bb))
= {a, b}* {a, bb}.
The star of a union is in general larger than the union of the stars: {a, b}* = {λ, a, b, aa, ab, ba, bb, ...} is the set of all strings on {a, b}, including mixed strings such as ab and ba that belong to neither {a}* nor {b}*. So
L((a + b)*(a + bb)) = {λ, a, b, aa, ab, ba, bb, ...}{a, bb} = {a, bb, aa, abb, ba, bbb, ...}.
In other words, L(r) is the set of all strings on {a, b} terminated by either a or bb.

Example: Write the language for the following expression: r = (aa)*(bb)*b
Solution:
L(r) = L((aa)*(bb)*b)
= L((aa)*) L((bb)*) L(b)
= (L(aa))* (L(bb))* L(b)
= {aa}*{bb}*{b}
= {λ, aa, aaaa, aaaaaa, ...}{λ, bb, bbbb, bbbbbb, ...}{b}
= {a^{2n} : n ≥ 0}{b^{2m} : m ≥ 0}{b}
= {a^{2n} b^{2m+1} : n ≥ 0, m ≥ 0}.

Regular Grammars
Regular grammars are of two types:
1) Right-Linear Grammar: A grammar G = (V, T, S, P) is said to be right-linear if all productions are of the form A → xB or A → x, where A, B ∈ V and x ∈ T*.
2) Left-Linear Grammar: A grammar G = (V, T, S, P) is said to be left-linear if all productions are of the form A → Bx or A → x, where A, B ∈ V and x ∈ T*.
A regular grammar is one that is either right-linear or left-linear.
Regular languages as languages generated by FSA
V: finite set of non-terminals (upper case)
T: finite set of terminals (lower case)
S: start symbol
P: finite set of rewriting rules of the form A → xB or A → x, where A and B stand for non-terminals and x stands for a terminal

Example: 1) The grammar G1 = ({S}, {a, b}, S, P1), with P1 given as S → abS | a, is right-linear. 2) The grammar G2 = ({S, S1, S2}, {a, b}, S, P2), with productions S → S1ab, S1 → S1ab | S2, S2 → a, is left-linear. Both G1 and G2 are regular grammars.
Example: Write the regular expressions for the languages generated by these grammars.
1) S ⇒ abS ⇒ ababS ⇒ ababa, so r = (ab)*a.
2) S ⇒ S1ab ⇒ S1abab ⇒ S2abab ⇒ aabab, so r = aab(ab)*, since every derivation starts with S → S1ab and therefore ends in at least one ab.
Example: The grammar G = ({S, A, B}, {a, b}, S, P), with productions S → A, A → aB | λ, B → Ab. Is it a regular grammar?
Solution: It is not a regular grammar. Each production is individually in right-linear or left-linear form, but the grammar as a whole is neither right-linear nor left-linear.
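The languages of G1 and G2 can be explored by carrying out derivations mechanically and comparing the results with a candidate regular expression. The sketch below assumes a dictionary encoding of productions and Python's `re` syntax; the helper names `generate` and `regex_language` are hypothetical. G2 is compared against aab(ab)*, because its only start production S → S1ab forces at least one trailing ab. The search is meant for linear grammars like these, where each sentential form holds at most one non-terminal.

```python
import re
from collections import deque
from itertools import product

def generate(productions, start, max_len):
    """Terminal strings with at most max_len symbols derivable from `start`.

    `productions` maps each non-terminal to a list of right-hand sides,
    each a tuple of symbols; keys of `productions` are the non-terminals.
    """
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the first non-terminal in the sentential form, if any.
        idx = next((i for i, sym in enumerate(form) if sym in productions), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for sym in new_form if sym not in productions)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results

def regex_language(pattern, alphabet, max_len):
    """Strings over `alphabet` of length <= max_len that fully match `pattern`."""
    compiled = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(chars))
    }

G1 = {"S": [("a", "b", "S"), ("a",)]}
G2 = {
    "S": [("S1", "a", "b")],
    "S1": [("S1", "a", "b"), ("S2",)],
    "S2": [("a",)],
}

print(sorted(generate(G1, "S", 5), key=len))
print(generate(G1, "S", 7) == regex_language(r"(ab)*a", "ab", 7))
print(generate(G2, "S", 7) == regex_language(r"aab(ab)*", "ab", 7))
```

Agreement up to a length bound is only evidence for the conjectured expression, not a proof of it.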

Homomorphism: Suppose Σ and T are alphabets. Then a function h : Σ → T* is called a homomorphism. In words, a homomorphism is a substitution in which a single letter is replaced with a string. The domain of the function h is extended to strings in the obvious fashion: if w = a1a2a3…an, then h(w) = h(a1)h(a2)h(a3)…h(an). Remark: If L is a language on Σ, then its homomorphic image is defined as h(L) = {h(w) : w ∈ L}.

Example: Let Σ = {a, b} and T = {a, b, c}, and define h by h(a) = ab, h(b) = bbc. Find the homomorphic image h(L) of L = {aa, aba}.
Solution: h(aa) = abab and h(aba) = abbbcab, so the homomorphic image of L = {aa, aba} is the language h(L) = {abab, abbbcab}.
Example: Let Σ = {a, b} and T = {b, c, d}, and define h by h(a) = dbcc, h(b) = bdc. If L is the regular language denoted by r = (a + b*)(aa)*, find the regular language h(L).
Solution: Replacing each symbol of r by its image under h gives r' = (dbcc + (bdc)*)(dbccdbcc)*, which denotes the regular language h(L).
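Both examples can be reproduced with a few lines of code. In this sketch a homomorphism is assumed to be stored as a dictionary from symbols to strings; the function names `apply_homomorphism` and `image` are illustrative. For the second example the code checks only one direction on short strings, namely that the image of each word of L matches r'.

```python
import re
from itertools import product

def apply_homomorphism(h, word):
    """Extend h from single symbols to strings: h(a1...an) = h(a1)...h(an)."""
    return "".join(h[symbol] for symbol in word)

def image(h, language):
    """Homomorphic image h(L) = {h(w) : w in L} of a finite language."""
    return {apply_homomorphism(h, w) for w in language}

# First example: h(a) = ab, h(b) = bbc, L = {aa, aba}.
h1 = {"a": "ab", "b": "bbc"}
print(sorted(image(h1, {"aa", "aba"})))

# Second example: h(a) = dbcc, h(b) = bdc, r = (a + b*)(aa)*.
h2 = {"a": "dbcc", "b": "bdc"}
r = re.compile(r"(a|b*)(aa)*")
r_image = re.compile(r"(dbcc|(bdc)*)(dbccdbcc)*")

words = ["".join(c) for n in range(7) for c in product("ab", repeat=n)]
words_in_L = [w for w in words if r.fullmatch(w)]
print(all(r_image.fullmatch(apply_homomorphism(h2, w)) for w in words_in_L))
```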

The Chomsky Hierarchy
Noam Chomsky, a founder of formal language theory, provided an initial classification into four language types, type 0, 1, 2, and 3, described as follows:
Type 0: Type 0 languages are those generated by unrestricted grammars, that is, the recursively enumerable languages. This family is denoted L_RE.
Type 1: Type 1 consists of the context-sensitive languages. This family is denoted L_CS.
Type 2: Type 2 consists of the context-free languages. This family is denoted L_CF.
Type 3: Type 3 consists of the regular languages. This family is denoted L_REG.

The relationship between these types is shown in the diagram. It is clear that L_REG ⊆ L_CF ⊆ L_CS ⊆ L_RE.

Home Work
Q1: Find all strings in L((a + b)*b(a + ab)*) of length less than four.
Q2: If r = ((0 + 1)(0 + 1)*)*00(0 + 1)*, give the language L(r).
Q3: Give regular expressions for the following languages on {a, b, c}.
a) All strings containing exactly one a.
b) All strings containing no more than three a's.
c) All strings that contain at least one occurrence of each symbol in a given set.
Q4: Find regular grammars that generate the languages L(aa*(ab + a)*) and L((aab)*ab).
Q5: What are the strings generated by the regular expressions 10*, (10)*, (0 + 01), 0(0 + 1)*, and (0*1)*?
Q6: Solve questions 3, 4, 5, and 6 at page DMA-826.
