Regular Grammars Formal definition of a regular expression.

Presentation transcript:

Regular Grammars
Formal definition of a regular expression. Languages associated with regular expressions. Introduction to regular grammars. Regular languages and homomorphism. The Chomsky hierarchy.

Regular Expression
The regular expressions over a set I are defined recursively by:
the symbol ∅ is a regular expression;
the symbol λ is a regular expression;
the symbol x is a regular expression whenever x ∈ I;
the symbols (AB), (A ∪ B), and A* are regular expressions whenever A and B are regular expressions.
Each regular expression represents a set of strings:
∅ represents the empty set, that is, the set with no strings;
λ represents the set {λ}, which contains only the empty string;
x represents the set {x} containing the string with one symbol x;
(AB) represents the concatenation of the sets represented by A and by B;
(A ∪ B) represents the union of the sets represented by A and by B;
A* represents the Kleene closure of the set represented by A.

Example What are the strings in the regular sets specified by the regular expressions 10*, (10)*, 0 ∪ 01, 0(0 ∪ 1)*, and (0*1)*?

Example Find a regular expression that specifies each of these sets:
(a) the set of bit strings with even length
(b) the set of bit strings ending with a 0 and not containing 11
Solution:
(a) The set of strings of two bits is specified by the regular expression (00 ∪ 01 ∪ 10 ∪ 11). Consequently, the set of strings with even length is specified by (00 ∪ 01 ∪ 10 ∪ 11)*.
(b) A bit string that ends with a 0 and does not contain 11 must have a 0 after every 1, so it is the concatenation of one or more strings where each string is either a 0 or a 10. It follows that the regular expression (0 ∪ 10)*(0 ∪ 10) specifies the set of bit strings that do not contain 11 and end with a 0.
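Both answers can be checked by brute force on short strings. The sketch below is only an illustration, not part of the original slides: it assumes Python's `re` module, where the slide's ∪ is written `|`, and the helper names `bit_strings` and `check` are hypothetical.

```python
import re
from itertools import product

def bit_strings(max_len):
    """Yield every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# The slide's union symbol is written | in Python's re syntax.
EVEN_LENGTH = re.compile(r"(00|01|10|11)*")
ENDS_0_NO_11 = re.compile(r"(0|10)*(0|10)")

def check(max_len=10):
    """Compare each regular expression with its verbal description."""
    for w in bit_strings(max_len):
        # (a) even length
        assert bool(EVEN_LENGTH.fullmatch(w)) == (len(w) % 2 == 0)
        # (b) ends with 0 and does not contain 11
        assert bool(ENDS_0_NO_11.fullmatch(w)) == (w.endswith("0") and "11" not in w)
    return True
```

Calling `check()` raises an `AssertionError` on the first string where an expression and its description disagree. A passing run is evidence up to the chosen length only, not a proof.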

Base cases: the symbol ∅; the symbol λ; the symbol a whenever a ∈ I.

Construct a nondeterministic finite-state automaton that recognizes the regular set 1∗ ∪ 01.
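One possible automaton for this exercise can be written down and simulated. This is a sketch of one candidate answer, not the solution from the slides: the state names s0 to s3 are hypothetical, and missing transitions simply lead nowhere, which a nondeterministic automaton allows.

```python
# Hypothetical state names. DELTA maps (state, symbol) to a set of next states.
START = "s0"
ACCEPT = {"s0", "s1", "s3"}      # s0 accepts λ, s1 accepts 1, 11, ..., s3 accepts 01
DELTA = {
    ("s0", "1"): {"s1"},         # first 1 of a string in 1*
    ("s1", "1"): {"s1"},         # further 1s
    ("s0", "0"): {"s2"},         # the 0 of 01
    ("s2", "1"): {"s3"},         # the 1 of 01
}

def accepts(w):
    """Track the set of states the automaton could be in after each symbol."""
    current = {START}
    for ch in w:
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
    return bool(current & ACCEPT)
```

With these transitions, `accepts("")`, `accepts("111")` and `accepts("01")` hold, while `accepts("011")` and `accepts("10")` do not.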

Languages associated with regular expressions
Definition: The language L(r) denoted by any regular expression r is defined by the following rules.
∅ is a regular expression denoting the empty set,
λ is a regular expression denoting {λ},
for every a ∈ ∑, a is a regular expression denoting {a}.
If r1 and r2 are regular expressions, then
L(r1 + r2) = L(r1) ∪ L(r2),
L(r1.r2) = L(r1)L(r2),
L((r1)) = L(r1),
L(r1*) = (L(r1))*.
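The rules above translate almost line by line into a recursive function. The following is a minimal sketch under stated assumptions: regular expressions are encoded as nested tuples (an encoding chosen here for illustration, not taken from the slides), the empty Python string stands for λ, and only strings of length at most n are kept so that the sets stay finite. The parenthesis rule L((r1)) = L(r1) is implicit in the nesting.

```python
def lang(r, n):
    """Return the strings of length <= n in L(r), for r given as a nested tuple."""
    kind = r[0]
    if kind == "empty":            # ∅ denotes the empty set
        return set()
    if kind == "lambda":           # λ denotes {λ}; "" stands for λ
        return {""}
    if kind == "sym":              # a denotes {a}
        return {r[1]} if n >= 1 else set()
    if kind == "union":            # L(r1 + r2) = L(r1) ∪ L(r2)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":              # L(r1.r2) = L(r1)L(r2)
        return {u + v for u in lang(r[1], n) for v in lang(r[2], n)
                if len(u + v) <= n}
    if kind == "star":             # L(r1*) = (L(r1))*
        base = lang(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:            # keep appending pieces until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

# The expression a*.(a + b), restricted to strings of length at most 3.
a, b = ("sym", "a"), ("sym", "b")
r = ("cat", ("star", a), ("union", a, b))
print(sorted(lang(r, 3), key=lambda w: (len(w), w)))
# ['a', 'b', 'aa', 'ab', 'aaa', 'aab']
```

Truncating at length n is safe for these operators because every string of length at most n in a union, concatenation, or star is built from pieces that are themselves no longer than n.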

Example: Exhibit the language L(a*.(a + b)) in set notation.
Solution:
L(a*.(a + b)) = L(a*) L(a + b) (from L(r1.r2) = L(r1)L(r2))
= (L(a))* L(a + b) (from L(r1*) = (L(r1))*)
= (L(a))* (L(a) ∪ L(b)) (from L(r1 + r2) = L(r1) ∪ L(r2))
But (L(a))* = {λ, a, aa, aaa, …}, L(a) = {a} and L(b) = {b}, so L(a) ∪ L(b) = {a, b}. Hence
L(a*.(a + b)) = {λ, a, aa, aaa, …}{a, b} = {a, b, aa, ab, aaa, aab, …}.

Example: For ∑ = {a, b}, the expression r = (a + b)*(a + bb) is a regular expression. Write its language.
Solution: (we can easily prove that r is a regular expression)
L(r) = L((a + b)*(a + bb))
= L((a + b)*) L(a + bb)
= (L(a + b))* (L(a) ∪ L(bb))
= (L(a) ∪ L(b))* (L(a) ∪ L(bb))
= {a, b}* {a, bb}
But {a, b}* = {λ, a, b, aa, ab, ba, bb, aaa, …} is the set of all strings on {a, b}. Note that it is larger than {a}* ∪ {b}*, which contains no mixed strings such as ab. Also L(a) ∪ L(bb) = {a, bb}.
So L((a + b)*(a + bb)) = {λ, a, b, aa, ab, ba, bb, …}{a, bb} = {a, bb, aa, ba, abb, bbb, …}.
In other words, L(r) is the set of all strings on {a, b} terminated by either a or bb.
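The closing description (all strings on {a, b} terminated by a or bb) can be compared against the expression itself. This is an illustrative sketch only, assuming Python's `re` syntax, where the slide's + is written `|`; the helper names `strings` and `mismatches` are hypothetical.

```python
import re
from itertools import product

R = re.compile(r"(a|b)*(a|bb)")   # the slide's + is written | in re syntax

def strings(alphabet, max_len):
    """Yield every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def mismatches(max_len=8):
    """Strings on which the expression and the verbal description disagree."""
    return [w for w in strings("ab", max_len)
            if bool(R.fullmatch(w)) != (w.endswith("a") or w.endswith("bb"))]
```

An empty list from `mismatches()` means the two agree on every string up to the chosen length. Mixed strings such as aba are matched, which shows that (a + b)* is more than a* together with b*.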

Example: Write the language for the expression r = (aa)*(bb)*b.
Solution:
L(r) = L((aa)*(bb)*b)
= L((aa)*) L((bb)*) L(b)
= (L(aa))* (L(bb))* L(b)
= {aa}* {bb}* {b}
= {λ, aa, aaaa, aaaaaa, …} {λ, bb, bbbb, bbbbbb, …} {b}
= {a^(2n) : n ≥ 0} {b^(2m) : m ≥ 0} {b}
= {a^(2n) b^(2m+1) : n ≥ 0, m ≥ 0}

Regular Grammars
Regular grammars are of two types:
1) Right-linear grammar: A grammar G = (V, T, S, P) is said to be right-linear if all productions are of the form A → xB or A → x, where A, B ∈ V and x ∈ T*.
2) Left-linear grammar: A grammar G = (V, T, S, P) is said to be left-linear if all productions are of the form A → Bx or A → x, where A, B ∈ V and x ∈ T*.
Regular languages as languages generated by FSA:
V: finite set of non-terminals (upper case)
T: finite set of terminals (lower case)
S: start symbol
P: finite set of rewriting rules of the form A → xB or A → x, where A and B stand for non-terminals and x stands for a terminal

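Example:
1) The grammar G1 = ({S}, {a, b}, S, P1), with P1 given as S → abS | a, is right-linear.
2) The grammar G2 = ({S, S1, S2}, {a, b}, S, P2), with productions S → S1ab, S1 → S1ab | S2, S2 → a, is left-linear.
Both G1 and G2 are regular grammars.
Example: Write a regular expression for the language generated by each of these grammars.
1) S ⇒ abS ⇒ ababS ⇒ ababa, so r = (ab)*a.
2) S ⇒ S1ab ⇒ S1abab ⇒ S2abab ⇒ aabab. Every derivation starts with S → S1ab, so each string ends with at least one ab, and r = a(ab)*ab (equivalently, aab(ab)*).
Example: Consider the grammar G = ({S, A, B}, {a, b}, S, P), with productions S → A, A → aB | λ, B → Ab. Is it a regular grammar?
Solution: It is not a regular grammar, because it is neither right-linear nor left-linear: A → aB is a right-linear production and B → Ab is a left-linear one, and a regular grammar must use only one of the two forms throughout.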
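The derivations in these examples can be replayed mechanically. The sketch below is one way to do so under stated assumptions: the function name `generate` and the dictionary encoding of productions are hypothetical, and the search is intended for linear grammars like the ones here (at most one nonterminal on each right-hand side), where bounding the number of terminals guarantees termination.

```python
from collections import deque

def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    productions maps a nonterminal to a list of right-hand sides, each a
    tuple of symbols; a symbol is a nonterminal iff it is a key of the map.
    """
    seen, results = {(start,)}, set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:                      # no nonterminal left: a sentence
            results.add("".join(form))
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminals = sum(1 for s in new if s not in productions)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

G1 = {"S": [("a", "b", "S"), ("a",)]}                  # S → abS | a
G2 = {"S": [("S1", "a", "b")],                         # S → S1ab
      "S1": [("S1", "a", "b"), ("S2",)],               # S1 → S1ab | S2
      "S2": [("a",)]}                                  # S2 → a
G = {"S": [("A",)], "A": [("a", "B"), ()], "B": [("A", "b")]}

print(sorted(generate(G1, "S", 5), key=len))   # ['a', 'aba', 'ababa']
print(sorted(generate(G2, "S", 7), key=len))   # ['aab', 'aabab', 'aababab']
print(sorted(generate(G, "S", 4), key=len))    # ['', 'ab', 'aabb']
```

The last line suggests that the third grammar generates strings of the form a^n b^n, a language that is known not to be regular. The enumeration is only an illustration up to a length bound, not a proof.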

Homomorphism: Suppose ∑ and T are alphabets. Then a function h : ∑ → T* is called a homomorphism. In words, a homomorphism is a substitution in which a single letter is replaced with a string. The domain of the function h is extended to strings in the obvious fashion: if w = a1a2a3…an, then h(w) = h(a1)h(a2)h(a3)…h(an).
Remark: If L is a language on ∑, then its homomorphic image is defined as h(L) = {h(w) : w ∈ L}.

Example: Let ∑ = {a, b} and T = {a, b, c}, and define h by h(a) = ab, h(b) = bbc. Find the homomorphic image h(L) of L = {aa, aba}.
Solution: h(aa) = abab and h(aba) = abbbcab, so the homomorphic image of L = {aa, aba} is the language h(L) = {abab, abbbcab}.
Example: Let ∑ = {a, b} and T = {b, c, d}, and define h by h(a) = dbcc, h(b) = bdc. If L is the regular language denoted by r = (a + b*)(aa)*, find the regular language h(L).
Solution: Replacing each letter of r by its image under h gives r' = (dbcc + (bdc)*)(dbccdbcc)*, which denotes the regular language h(L).
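For a finite language the homomorphic image can be computed directly from the definition. A minimal sketch, assuming the homomorphism is stored as a Python dictionary from letters to strings; the function names `h_word` and `h_language` are hypothetical.

```python
def h_word(h, w):
    """Extend h from letters to a string: h(a1...an) = h(a1)...h(an)."""
    return "".join(h[ch] for ch in w)

def h_language(h, language):
    """Homomorphic image h(L) = {h(w) : w in L} of a finite language."""
    return {h_word(h, w) for w in language}

# First example: h(a) = ab, h(b) = bbc, L = {aa, aba}.
h = {"a": "ab", "b": "bbc"}
print(sorted(h_language(h, {"aa", "aba"})))   # ['abab', 'abbbcab']
```

For an infinite regular language such as the one in the second example, the same letter-by-letter substitution is applied to the regular expression instead of to each string.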

The Chomsky Hierarchy
Noam Chomsky, a founder of formal language theory, provided an initial classification into four language types, type 0, 1, 2, and 3, described as follows:
Type 0: Type 0 languages are those generated by unrestricted grammars, that is, the recursively enumerable languages. This family is denoted L_RE.
Type 1: Type 1 consists of the context-sensitive languages. This family is denoted L_CS.
Type 2: Type 2 consists of the context-free languages. This family is denoted L_CF.
Type 3: Type 3 consists of the regular languages. This family is denoted L_REG.

The relationship between these types is shown in the diagram. It is clear that L_REG ⊆ L_CF ⊆ L_CS ⊆ L_RE.

Homework
Q1: Find all strings in L((a + b)*b(a + ab)*) of length less than four.
Q2: If r = ((0 + 1)(0 + 1)*)*00(0 + 1)*, give the language L(r).
Q3: Give regular expressions for the following languages on {a, b, c}.
a) All strings containing exactly one a.
b) All strings containing no more than three a's.
c) All strings that contain at least one occurrence of each symbol in a given set.
Q4: Find regular grammars that generate the languages L(aa*(ab + a)*) and L((aab)*ab).
Q5: What are the strings generated by the regular expressions 10*, (10)*, (0 + 01), 0(0 + 1)*, and (0*1)*?
Q6: Solve questions 3, 4, 5, and 6 at page DMA-826.