CS 3813: Introduction to Formal Languages and Automata

CS 3813: Introduction to Formal Languages and Automata Chapter 3 Regular languages and expressions These class notes are based on material from our textbook, An Introduction to Formal Languages and Automata, 3rd ed., by Peter Linz, published by Jones and Bartlett Publishers, Inc., Sudbury, MA, 2001. They are intended for classroom use only and are not a substitute for reading the textbook.

Operations on formal languages Let L1 = {10} and L2 = {011, 11}. Union: L1 ∪ L2 = {10, 011, 11}. Concatenation: L1L2 = {10011, 1011}. Kleene star: L1* = {λ, 10, 1010, 101010, …}. Other operations: intersection, complement, difference.
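To make these operations concrete, here is a minimal Python sketch that treats a finite language as a set of strings, with the empty string "" standing for λ. The function names `union`, `concat`, and `star_up_to` are illustrative, not from the textbook; because L1* is infinite, `star_up_to` lists only the strings built from at most k copies of strings in the language.

```python
from itertools import product


def union(l1, l2):
    """Union of two finite languages, each given as a set of strings."""
    return l1 | l2


def concat(l1, l2):
    """Concatenation: every string of l1 followed by every string of l2."""
    return {x + y for x, y in product(l1, l2)}


def star_up_to(l, k):
    """The strings of L* built from at most k factors.

    L* itself is infinite whenever L contains a nonempty string, so only a
    finite slice can be listed. The empty string "" plays the role of λ.
    """
    result = {""}   # zero factors: λ
    layer = {""}
    for _ in range(k):
        layer = concat(layer, l)   # strings made of exactly one more factor
        result |= layer
    return result


L1 = {"10"}
L2 = {"011", "11"}
print(sorted(union(L1, L2)))               # ['011', '10', '11']
print(sorted(concat(L1, L2)))              # ['10011', '1011']
print(sorted(star_up_to(L1, 3), key=len))  # ['', '10', '1010', '101010']
```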

Definition of regular languages A regular language over an alphabet Σ is either the empty language ∅, a language consisting of a single string of length 0 or 1, or a language that can be obtained from these by using the operations of union, concatenation, and Kleene star.

Alternative definition of regular languages The simplest possible regular languages are the empty set and languages consisting of a single string that is either the empty string or has length one. For example, if Σ = {a, b}, the simplest languages are ∅, {λ}, {a}, and {b}. A regular language is a language that can be built from these simple languages by using the three operations of union, concatenation, and Kleene star.

Regular languages correspond to regular expressions
L = ∅ : RE = ∅
L = {λ} : RE = λ
L = {a} : RE = a
L = L1 ∪ L2 : RE = (r1 + r2)
L = L1L2 : RE = (r1r2)
L = L1* : RE = (r1*)

Regular expressions A useful shorthand for describing regular languages. Compare to arithmetic expressions, such as (x + 3)/2. An arithmetic expression is constructed using arithmetic operators, such as addition and division. A regular expression is constructed using operations on languages, such as concatenation, union, and Kleene star. The value of an arithmetic expression is a number. The value of a regular expression is a language.

Recursive definition of a regular expression ∅ is a regular expression corresponding to the language ∅. λ is a regular expression corresponding to the language {λ}. For each symbol a ∈ Σ, a is a regular expression corresponding to the language {a}. For any regular expressions r and s, corresponding to the regular languages L(r) and L(s), respectively, each of the following is a regular expression: (r + s) corresponds to the language L(r) ∪ L(s); (r · s) or (rs) corresponds to the language L(r)L(s); (r*) corresponds to the language (L(r))*.
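The recursive definition translates directly into a recursive data type. The sketch below is a hypothetical illustration: the class names `Empty`, `Lam`, `Sym`, `Plus`, `Cat`, `Star` and the function `lang` are not from the textbook. `lang(e, n)` computes the finite part of L(e) consisting of strings of length at most n, with one case per clause of the definition.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:        # ∅
    pass


@dataclass(frozen=True)
class Lam:          # λ
    pass


@dataclass(frozen=True)
class Sym:          # a single symbol a ∈ Σ
    a: str


@dataclass(frozen=True)
class Plus:         # (r + s)
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Cat:          # (r · s), also written (rs)
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Star:         # (r*)
    r: "Regex"


Regex = Union[Empty, Lam, Sym, Plus, Cat, Star]


def lang(e, n):
    """The strings of L(e) of length at most n, one case per clause of the definition."""
    if isinstance(e, Empty):
        return set()
    if isinstance(e, Lam):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if n >= 1 else set()
    if isinstance(e, Plus):                     # L(r) ∪ L(s)
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Cat):                      # L(r)L(s)
        return {x + y for x in lang(e.r, n) for y in lang(e.s, n) if len(x + y) <= n}
    if isinstance(e, Star):                     # (L(r))*
        base = lang(e.r, n) - {""}              # λ adds nothing when repeated
        result, frontier = {""}, {""}
        while frontier:                         # add one more factor per round
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")


r = Cat(Cat(Star(Sym("a")), Sym("b")), Star(Sym("a")))      # a*ba*
print(sorted(lang(r, 3), key=lambda w: (len(w), w)))
# ['b', 'ab', 'ba', 'aab', 'aba', 'baa']
```

Because each case only combines the (bounded) languages of its subexpressions, the function follows the same induction on the structure of r that the definition uses.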

Examples (over Σ = {a, b}) a* + b = {λ, a, b, aa, aaa, aaaa, aaaaa, …}; a*ba* = {w ∈ Σ* : w has exactly one b}; (a + b)* = any string of a’s and b’s; (a + b)*aa(a + b)* = {w ∈ Σ* : w contains aa}; (a + b)*aa(a + b)* + (a + b)*bb(a + b)* = {w ∈ Σ* : w contains aa or bb}; (a + λ)b* = {ab^n : n ≥ 0} ∪ {b^n : n ≥ 0}. As with arithmetic expressions, there is an order of precedence for operators, unless you change it using parentheses. The order is: star closure first, then concatenation, then union.

More examples (over Σ = {a, b, c}) All strings containing no more than two a’s: (b + c)*(λ + a)(b + c)*(λ + a)(b + c)*. All strings containing no runs of a’s of length greater than two: (b + c)*(λ + a + aa)(b + c)*((b + c)(b + c)*(λ + a + aa)(b + c)*)*. All strings in which all runs of a’s have lengths that are multiples of three: (aaa + b + c)*.
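One way to gain confidence in answers like these is to test them exhaustively on short strings. The sketch below assumes a translation into Python's `re` syntax (`+` becomes `|`, (b + c) becomes the class `[bc]`, and λ becomes an empty alternative) and compares each pattern with a direct description of the language for every string over {a, b, c} of length at most 7. Agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

SIGMA = "abc"

# Assumed translations of the slide's expressions into Python's re syntax:
# "+" -> "|", (b + c) -> [bc], and λ -> an empty alternative, e.g. (|a|aa).
AT_MOST_TWO_A = re.compile(r"[bc]*(|a)[bc]*(|a)[bc]*")
NO_RUN_LONGER_THAN_TWO = re.compile(r"[bc]*(|a|aa)[bc]*([bc][bc]*(|a|aa)[bc]*)*")
RUNS_MULTIPLE_OF_THREE = re.compile(r"(aaa|b|c)*")


def runs_of_a(w):
    """Lengths of the maximal runs of a's in w."""
    return [len(run) for run in re.findall(r"a+", w)]


def all_strings(max_len):
    """Every string over SIGMA of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            yield "".join(letters)


def agrees(pattern, predicate, max_len=7):
    """True if pattern matches exactly the strings satisfying predicate,
    checked for every string over SIGMA of up to max_len symbols."""
    return all((pattern.fullmatch(w) is not None) == predicate(w)
               for w in all_strings(max_len))


print(agrees(AT_MOST_TWO_A, lambda w: w.count("a") <= 2))
print(agrees(NO_RUN_LONGER_THAN_TWO, lambda w: all(r <= 2 for r in runs_of_a(w))))
print(agrees(RUNS_MULTIPLE_OF_THREE, lambda w: all(r % 3 == 0 for r in runs_of_a(w))))
```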

Hints for writing regular expressions Assume Σ = {a, b, c}. Zero or more a’s: a*. One or more a’s: aa*. Any string at all: (a + b + c)*. Any nonempty string: (a + b + c)(a + b + c)*. Any string that does not contain a: (b + c)*. Any string containing exactly one a: (b + c)*a(b + c)*.


Practice Let Σ = {a, b, c}. Give a regular expression for the following languages: (a) all strings containing exactly two a’s: (b + c)*a(b + c)*a(b + c)* (b) all strings containing no more than three a’s: (b + c)*(λ + a)(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*

Practice What languages correspond to the following regular expressions? a*b (aaa + bba) (ab)*

More practice Give regular expressions for the following languages, where the alphabet is Σ = {a, b, c}: all strings ending in b; all strings containing no more than two a’s; all strings of even length.

More practice Give regular expressions for the following languages, where the alphabet is Σ = {0, 1}: all strings of one or more 0’s followed by a 1; all strings of two or more symbols followed by three or more 0’s; all strings that do not end with 01.

Do these strings match the regular expression?
Regular expression | String
(01* + 1) | 0101
(a + λ)b | b
(ab)*a* | λ
(a + b)(ab) | bb
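If you want to check your answers, the same questions can be put to Python's `re.fullmatch`, assuming the translation `+` → `|` and λ → an empty alternative, with the empty string "" standing for the input λ. This is a convenience check, not part of the textbook's method.

```python
import re

# (textbook expression, assumed Python re translation, string to test)
CASES = [
    ("(01* + 1)",   r"(01*|1)",    "0101"),
    ("(a + λ)b",    r"(a|)b",      "b"),
    ("(ab)*a*",     r"(ab)*a*",    ""),      # "" stands for λ
    ("(a + b)(ab)", r"(a|b)(ab)",  "bb"),
]

results = [re.fullmatch(pattern, s) is not None for _, pattern, s in CASES]
for (textbook, _, s), matched in zip(CASES, results):
    print(f"{textbook:12} {s or 'λ':5} {matched}")
```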

Accepting (review) Let M = (Q, Σ, q0, δ, A) be an FA. A string x ∈ Σ* is accepted by M if δ*(q0, x) ∈ A. The language accepted (or recognized) by M is the set L(M) = {x ∈ Σ* | x is accepted by M}. A language L over the alphabet Σ is regular iff there is a finite automaton that accepts L.
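The definition of acceptance can be run directly. In this sketch the DFA is a Python tuple (Q, Σ, q0, δ, A), with δ assumed to be a total dictionary from (state, symbol) pairs to states; `delta_star` computes δ* one symbol at a time. The example DFA is hypothetical.

```python
def delta_star(delta, q, x):
    """Extended transition function: δ*(q, λ) = q and δ*(q, wa) = δ(δ*(q, w), a)."""
    for a in x:
        q = delta[(q, a)]
    return q


def accepts(M, x):
    """x is accepted by M = (Q, Σ, q0, δ, A) iff δ*(q0, x) ∈ A."""
    Q, Sigma, q0, delta, A = M
    return delta_star(delta, q0, x) in A


# Example DFA over {a, b} accepting the strings that contain at least one b.
M = (
    {"q0", "q1"},
    {"a", "b"},
    "q0",
    {("q0", "a"): "q0", ("q0", "b"): "q1",
     ("q1", "a"): "q1", ("q1", "b"): "q1"},
    {"q1"},
)
print(accepts(M, "aab"), accepts(M, "aaa"), accepts(M, ""))  # True False False
```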

Kleene’s theorem 1) For any regular expression r that represents language L(r), there is a finite automaton that accepts that same language. 2) For any finite automaton M that accepts language L(M), there is a regular expression that represents the same language. Therefore, the class of languages that can be represented by regular expressions is the same as the class of languages accepted by finite automata: the regular languages.

[Figure: diagram linking regular expression, NFA, and DFA, with arrows labeled "Kleene’s theorem part 1" and "proved"]

Theorem 3.1 1st half of Kleene’s theorem: Let r be a regular expression. Then there exists some nondeterministic finite accepter that accepts L(r). Consequently, L(r) is a regular language. Proof strategy: for any regular expression, we show how to construct an equivalent NFA. Because regular expressions are defined recursively, the proof is by induction.

Base step: Give an NFA that accepts each of the simple or “base” languages, ∅, {λ}, and {a} for each a ∈ Σ. [Figure: base-case NFAs for ∅, {λ}, and {a}]

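Inductive step: For each of the operations (union, concatenation, and Kleene star), show how to construct an accepting NFA. Closure under union: [Figure: a new start state with λ-transitions to M1 and M2, whose final states have λ-transitions to a new final state]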

Closure under concatenation: [Figure: M1 followed by M2, with a λ-transition from the final state of M1 to the start state of M2]

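Closure under Kleene star: [Figure: M1 placed between a new start state and a new final state, joined to them by λ-transitions, plus λ-transitions that let the NFA skip M1 or repeat it]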
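The base cases and the three inductive constructions can be sketched in Python as follows. The representation (integer states, transition triples, None for λ) and all function names are assumptions of this illustration; each constructed NFA keeps a single start state and a single final state. A small λ-closure simulator is included so the result can be tested on the exercise language ab*aa + bba*ab.

```python
from itertools import count

# An NFA is (start, final, transitions); each transition is a
# (source, label, target) triple, and label None stands for λ.
_fresh = count()


def new_state():
    return next(_fresh)


def nfa_empty():
    """Base case ∅: start and final states with no path between them."""
    return (new_state(), new_state(), [])


def nfa_lambda():
    """Base case {λ}: a single λ-transition from start to final."""
    s, f = new_state(), new_state()
    return (s, f, [(s, None, f)])


def nfa_symbol(a):
    """Base case {a}: a single transition labeled a."""
    s, f = new_state(), new_state()
    return (s, f, [(s, a, f)])


def nfa_union(m1, m2):
    """New start branches by λ into M1 and M2; both finals lead by λ to a new final."""
    (s1, f1, t1), (s2, f2, t2) = m1, m2
    s, f = new_state(), new_state()
    return (s, f, t1 + t2 + [(s, None, s1), (s, None, s2),
                             (f1, None, f), (f2, None, f)])


def nfa_concat(m1, m2):
    """λ-transition from the final state of M1 to the start state of M2."""
    (s1, f1, t1), (s2, f2, t2) = m1, m2
    return (s1, f2, t1 + t2 + [(f1, None, s2)])


def nfa_star(m1):
    """New start/final around M1, plus λ-transitions to skip M1 or repeat it."""
    s1, f1, t1 = m1
    s, f = new_state(), new_state()
    return (s, f, t1 + [(s, None, s1), (f1, None, f),
                        (s, None, f),    # skip M1 entirely: accepts λ
                        (f, None, s)])   # go around again: M1 repeated


def lambda_closure(states, trans):
    """All states reachable from `states` using only λ-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, label, r in trans:
            if p == q and label is None and r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(m, x):
    start, final, trans = m
    current = lambda_closure({start}, trans)
    for a in x:
        moved = {r for p, label, r in trans if p in current and label == a}
        current = lambda_closure(moved, trans)
    return final in current


def word(w):
    """Hypothetical helper: NFA for a fixed nonempty word, built by concatenation."""
    m = nfa_symbol(w[0])
    for a in w[1:]:
        m = nfa_concat(m, nfa_symbol(a))
    return m


# ab*aa + bba*ab. A fragment must not be reused in two places, since its
# states would then be shared, so every piece is built fresh.
m = nfa_union(
    nfa_concat(nfa_concat(nfa_symbol("a"), nfa_star(nfa_symbol("b"))), word("aa")),
    nfa_concat(nfa_concat(word("bb"), nfa_star(nfa_symbol("a"))), word("ab")),
)
print([nfa_accepts(m, w) for w in ["abbaa", "aaa", "bbaab", "ab", "bba"]])
# [True, True, True, False, False]
```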

Closure properties of regular languages The union and concatenation of two regular languages, and the Kleene star of a regular language, are regular, since we can write a regular expression for them. The intersection and difference of two regular languages, and the complement of a regular language, are also regular. The class of regular languages is said to be closed under these operations. (More in Ch. 4.)
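Closure under intersection is usually shown with the product construction (the details are deferred to Ch. 4). As a preview, here is a minimal sketch assuming DFAs are given as tuples (Q, Σ, q0, δ, A) with a total δ; complement swaps accepting and non-accepting states, and difference is intersection with a complement. The function names and the two example DFAs are hypothetical.

```python
from itertools import product


def accepts(M, x):
    """M = (Q, Σ, q0, δ, A) with δ a total dict {(state, symbol): state}."""
    Q, Sigma, q, delta, A = M
    for a in x:
        q = delta[(q, a)]
    return q in A


def intersect(M1, M2):
    """Product construction: run M1 and M2 in parallel on the same input and
    accept exactly when both accept, so L(M) = L(M1) ∩ L(M2)."""
    Q1, Sigma, p0, d1, A1 = M1
    Q2, Sigma2, r0, d2, A2 = M2
    assert Sigma == Sigma2, "both DFAs must use the same alphabet"
    Q = set(product(Q1, Q2))
    delta = {((p, r), a): (d1[(p, a)], d2[(r, a)]) for (p, r) in Q for a in Sigma}
    A = {(p, r) for (p, r) in Q if p in A1 and r in A2}
    return (Q, Sigma, (p0, r0), delta, A)


def complement(M):
    """For a DFA with a total δ, swapping accepting and non-accepting states
    accepts Σ* - L(M)."""
    Q, Sigma, q0, delta, A = M
    return (Q, Sigma, q0, delta, Q - A)


def difference(M1, M2):
    """L(M1) - L(M2) = L(M1) ∩ (Σ* - L(M2))."""
    return intersect(M1, complement(M2))


# Two example DFAs over {a, b}: an even number of a's; at least one b.
EVEN_A = ({"e", "o"}, {"a", "b"}, "e",
          {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"},
          {"e"})
HAS_B = ({"n", "y"}, {"a", "b"}, "n",
         {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "y", ("y", "b"): "y"},
         {"y"})

both = intersect(EVEN_A, HAS_B)
print([accepts(both, w) for w in ["aab", "ab", "aa"]])                 # [True, False, False]
print([accepts(difference(EVEN_A, HAS_B), w) for w in ["aa", "aab"]])  # [True, False]
```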


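Exercise Use the construction of the first half of Kleene’s theorem to construct an NFA that accepts the language L(ab*aa + bba*ab). [Figure: start state q0 with λ-transitions into an FA accepting ab*aa and an FA accepting bba*ab, each connected by a λ-transition to the final state qf]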

Exercise Construct an NFA that accepts the language corresponding to the regular expression ((b(a + b)*a) + a).

Theorem 3.2 Kleene’s theorem part 2: Let L be a regular language. Then there exists a regular expression r such that L = L(r). Any language accepted by a finite automaton can be represented by a regular expression. The proof strategy: for any DFA, we show how to create an equivalent regular expression. In other words, we describe an algorithm for converting any DFA to a regular expression.

Expression diagram A labeled directed graph (similar to a finite state diagram) in which transitions are labeled by regular expressions. It has a single start state with no incoming transitions and a single accepting state with no outgoing transitions. Example: [Figure: expression diagram with transitions labeled (a+b), ab, and a*]

Algorithm for converting a DFA into an equivalent regular expression Initial step: Change every transition labeled a,b to (a+b). Add a single start state with an outgoing λ-transition to the current start state, and add a single final state with incoming λ-transitions from every previous final state (the previous final states are no longer accepting). Main step: Until the expression diagram has only two states (the initial state and the final state), repeat the following: pick some non-start, non-final state; remove it from the diagram and re-label transitions with regular expressions so that the same language is accepted.

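The key step is removing states and re-labeling transitions with regular expressions. Here are some examples of how to do this: [Figure: removing a state that has a self-loop labeled b; the paths through it are replaced by transitions labeled ab*a and ab*b] In general, if the removed state has an incoming transition from p labeled r1, a self-loop labeled r2, and an outgoing transition to s labeled r3, then the new transition from p to s is labeled r1r2*r3, or r4 + r1r2*r3 if there was already a transition from p to s labeled r4.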
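The elimination step can be sketched in Python with regular expressions kept as plain strings in the slides' notation (`+` for union, `λ` for the empty string). In this sketch `None` stands for a missing transition (∅), the new start and final states are assumed to be named `START` and `FINAL`, and all helper names are hypothetical. Removing a state k replaces every path p → k → q by a transition labeled r_pq + r_pk r_kk* r_kq.

```python
LAMBDA = "λ"   # an edge label of None means "no transition" (∅)


def alt(r, s):
    """r + s, treating None as ∅."""
    if r is None:
        return s
    if s is None:
        return r
    return f"({r}+{s})"


def cat(r, s):
    """rs, with ∅r = r∅ = ∅ and λr = rλ = r."""
    if r is None or s is None:
        return None
    if r == LAMBDA:
        return s
    if s == LAMBDA:
        return r
    return r + s


def star(r):
    """r*, with ∅* = λ* = λ."""
    if r is None or r == LAMBDA:
        return LAMBDA
    return f"({r})*"


def dfa_to_regex(states, delta, start, finals):
    """State elimination. delta maps (state, symbol) -> state. The names
    START and FINAL are assumed not to clash with existing state names."""
    START, FINAL = "START", "FINAL"
    edges = {}

    def add(p, q, r):
        edges[(p, q)] = alt(edges.get((p, q)), r)

    # Initial step: parallel edges a, b merge into (a+b); add new start/final.
    for (p, a), q in delta.items():
        add(p, q, a)
    add(START, start, LAMBDA)
    for f in finals:
        add(f, FINAL, LAMBDA)

    # Main step: remove each original state, re-labeling paths through it.
    for k in states:
        loop = star(edges.pop((k, k), None))
        ins = [(p, r) for (p, q), r in edges.items() if q == k]
        outs = [(q, r) for (p, q), r in edges.items() if p == k]
        for p, _ in ins:
            del edges[(p, k)]
        for q, _ in outs:
            del edges[(k, q)]
        for p, r_in in ins:
            for q, r_out in outs:
                add(p, q, cat(cat(r_in, loop), r_out))
    return edges.get((START, FINAL))   # None means the language is ∅


# Example DFA over {a, b} accepting the strings with at least one b.
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q1", ("q1", "b"): "q1"}
print(dfa_to_regex(["q0", "q1"], delta, "q0", ["q1"]))   # (a)*b((a+b))*
```

The output is correct but heavily parenthesized; with the redundant parentheses dropped it reads a*b(a+b)*. A practical version would also simplify labels as it goes.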

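Exercise [Figure: a DFA with transitions labeled a, b, and a,b; in the initial step, a,b is relabeled (a+b) and λ-transitions connect the new start and final states] Continue ...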

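Exercise [Figure: state elimination in progress, with intermediate labels a*b and (a+b); the final diagram has a single transition labeled a*b(a+b)*]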

Exercise Find a regular expression that corresponds to the language accepted by the following DFA. [Figure: DFA with transitions labeled a and b]

Exercise [Figure: state elimination with intermediate labels a*bb*a and a*bb*; the final diagram has a single transition labeled (a*bb*a)*a*bb*]

Homework Find a regular expression that corresponds to the language accepted by the following DFA. [Figure: DFA with states q0, q1, and q2 and transitions labeled 1]

Applications of regular expressions Validation: checking that an input string is in a valid format (example 1: checking the format of an email address on a WWW entry form; example 2: the UNIX regex library routines). Search and selection: looking for strings that match a certain pattern (example: the UNIX grep command). Tokenization: converting a sequence of characters (a string) into a sequence of tokens (e.g., keywords, identifiers), as in the lexical analysis phase of a compiler.
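As an illustration of tokenization, here is a small lexer built on Python's `re` module: each token class is a regular expression, and the classes are combined into one pattern with named groups. The token classes are hypothetical and not taken from any particular compiler; the alternatives are tried from left to right, which is why keywords are listed before identifiers.

```python
import re

# Hypothetical token classes for a tiny language; order matters.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",  r"[0-9]+"),
    ("OP",      r"[-+*/=<>]"),
    ("SKIP",    r"\s+"),
    ("ERROR",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup          # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {m.group()!r} at position {m.start()}")
        tokens.append((kind, m.group()))
    return tokens


print(tokenize("while x1 < 10"))
# [('KEYWORD', 'while'), ('IDENT', 'x1'), ('OP', '<'), ('NUMBER', '10')]
```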

Grammar A grammar G = (V, T, S, P) is a quadruple consisting of: a set V of variables (nonterminal symbols); a set T of terminals (the same as an alphabet, Σ); a start symbol S ∈ V; and a set P of production rules. Example: S → aS | A, A → bA | λ

Derivation Strings are “derived” from a grammar. Example of a derivation: S ⇒ aS ⇒ aaS ⇒ aaA ⇒ aabA ⇒ aab. At each step, a nonterminal in the current sentential form is replaced by the right-hand side of one of its rules (a sentential form can contain nonterminals and/or terminals). Automata recognize languages; grammars generate languages.
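Derivations can be explored mechanically. The sketch below performs a breadth-first search over leftmost derivations and collects the terminal strings of length at most n. It assumes single-character symbols, with uppercase letters as variables and "" on a right-hand side standing for λ; the function name and the sentential-form length bound (a heuristic that guarantees termination but could cut off derivations in grammars with many λ-rules) are assumptions of this illustration.

```python
from collections import deque


def generate(grammar, start, n, max_form_len=None):
    """Terminal strings of length <= n derivable from start, found by
    breadth-first search over leftmost derivations."""
    if max_form_len is None:
        max_form_len = 2 * n + 2          # heuristic bound on sentential forms
    results, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:                      # no variables left: a derived string
            results.add(form)
            continue
        for rhs in grammar[form[i]]:       # rewrite the leftmost variable
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(1 for c in new if not c.isupper())
            # Terminals are never erased, so forms with too many can be pruned.
            if terminals <= n and len(new) <= max_form_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


G = {"S": ["aS", "A"], "A": ["bA", ""]}   # S → aS | A, A → bA | λ
print(sorted(generate(G, "S", 3), key=lambda w: (len(w), w)))
```

For this grammar the result is exactly the strings a^i b^j with i + j ≤ 3, including the empty string.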

Context-free grammar A grammar is said to be context-free if every rule has a single nonterminal on the left-hand side. This means you can apply the rule in any context. More complicated languages (such as English) have context-dependent rules. A language generated by a context-free grammar is called a context-free language.

Regular grammar A grammar is said to be right-linear if all productions are of the form A → xB or A → x, where A and B are variables and x is a string of terminals. A grammar is said to be left-linear if all productions are of the form A → Bx or A → x. A regular grammar is either right-linear or left-linear.

Linear grammar A grammar can be linear without being right- or left-linear. A linear grammar is a grammar in which at most one variable can occur on the right side of any production rule, without any restriction on the position of the variable. Example: S → aS | A, A → Ab | λ (S → aS is right-linear, but A → Ab is left-linear).

Another formalism for regular languages Every regular grammar generates a regular language, and every regular language can be generated by a regular grammar. A regular grammar is a simpler, special case of a context-free grammar. The regular languages are a proper subset of the context-free languages.


Exercises Find a regular grammar that generates the language on Σ = {a, b} consisting of all strings with no more than three a’s: S → bS | aA | λ, A → bA | aB | λ, B → bB | aC | λ, C → bC | λ


Exercises Find a regular grammar that generates the language consisting of even-length strings over {a, b}: S → aaS | abS | baS | bbS | λ

Exercise What language is generated by the following context-free (but not regular) grammar? S → aSa | bSb | a | b | λ

Exercise What language is generated by the following context-free grammar? S → aSa | bSb | a | b | λ The language of odd- and even-length palindromes over {a, b}: L = {wxw^R : w ∈ {a, b}*, x ∈ {a, b, λ}}

Exercise Given a grammar, you should be able to say what language it generates. Use set notation to define the language generated by each of the following grammars: 1) S → aaSB | λ, B → bB | b 2) S → aSbb | A, A → cA | c

Exercise S → aaSB | λ, B → bB | b It helps to list some of the strings that can be formed:
S ⇒ aaSB ⇒ aaB ⇒ aab
S ⇒ aaSB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ aaSB ⇒ aaB ⇒ aabB ⇒ aabbB ⇒ aabbb
S ⇒ aaSB ⇒ aaB ⇒ aabB ⇒ aabbB ⇒ aabbbB ⇒ aabbbb
S ⇒ aaSB ⇒ aaaaSBB ⇒ aaaaBB ⇒ aaaaBb ⇒ aaaabb
S ⇒ aaSB ⇒ aaaaSBB ⇒ aaaaBB ⇒ aaaaBbB ⇒ aaaaBbb ⇒ aaaabbb
What is the pattern? Each use of S → aaSB adds aa on the left and one B, and each B eventually produces at least one b; S → λ alone gives λ. So L = {(aa)^n b^m : n ≥ 1, m ≥ n} ∪ {λ}.

Exercise Given a language, you should be able to give a grammar that generates it. For example, give a regular (right-linear) grammar for the language consisting of all strings over {a, b, c} that begin with a, contain exactly two b’s, and end with cc.

Exercise Give a regular (right-linear) grammar for the language consisting of all strings over {a, b, c} that begin with a, contain exactly two b’s, and end with cc: S → aA, A → bB | aA | cA, B → bC | aB | cB, C → aC | cC | cD, D → c

Theorem 3.3 Every language generated by a right-linear grammar is regular. Proof: Specify a procedure for automatically constructing an NFA that mimics the derivations of a right-linear grammar.

Theorem 3.3 Justification: Each sentential form produced by a right-linear grammar contains at most one variable, and that variable occurs as the rightmost symbol. Assume that our grammar has a production rule D → dE and that, during the derivation of a string, there is a step wcD ⇒ wcdE. We can construct an NFA which has states D and E, and an arc labeled d from D to E. NFAs can be converted to DFAs, and all languages accepted by DFAs are regular.

Theorem 3.3 Construction: For each variable Vi in the grammar there will be a state in the automaton labeled Vi. The initial state of the automaton will be labeled V0 and will correspond to the variable S in the grammar. For each production rule Vi → a1a2…amVj, the automaton will have transitions such that Vj ∈ δ*(Vi, a1a2…am). For each production rule Vi → a1a2…am, the automaton will have transitions such that Vfinal ∈ δ*(Vi, a1a2…am), where Vfinal is the final state.

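Theorem 3.3 Construct an NFA that accepts the language generated by the grammar S → aA, A → abS | b. Convert to: V0 → aV1, V1 → abV0 | b. [Figure: NFA with states V0, V1, and final state Vf; V0 goes to V1 on a, V1 returns to V0 on ab through an intermediate state, and V1 goes to Vf on b]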
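The construction of Theorem 3.3 can be sketched in Python as follows, assuming single-character symbols (uppercase variables, lowercase terminals) and "" for λ. A right-hand side with several terminals is split into a chain of new intermediate states (named t0, t1, … here, an arbitrary choice), and a rule with no trailing variable leads to the final state, called "Vf" in this sketch. The function names are hypothetical.

```python
from itertools import count


def grammar_to_nfa(grammar, start):
    """grammar maps each variable to right-hand sides of the form x or xB,
    where x is a (possibly empty) string of terminals. Returns
    (start_state, final_state, transitions); label None means λ."""
    fresh = count()
    final = "Vf"
    trans = []
    for var, rhss in grammar.items():
        for rhs in rhss:
            if rhs and rhs[-1].isupper():
                x, target = rhs[:-1], rhs[-1]     # Vi -> x Vj
            else:
                x, target = rhs, final            # Vi -> x
            if not x:                             # Vi -> Vj or Vi -> λ
                trans.append((var, None, target))
                continue
            state = var
            for a in x[:-1]:                      # chain of intermediate states
                nxt = f"t{next(fresh)}"
                trans.append((state, a, nxt))
                state = nxt
            trans.append((state, x[-1], target))
    return start, final, trans


def accepts(nfa, w):
    """Simulate the NFA, following λ-transitions after every step."""
    start, final, trans = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for p, label, r in trans:
                if p == q and label is None and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for a in w:
        current = closure({r for p, label, r in trans if p in current and label == a})
    return final in current


nfa = grammar_to_nfa({"S": ["aA"], "A": ["abS", "b"]}, "S")   # S → aA, A → abS | b
print([accepts(nfa, w) for w in ["ab", "aabab", "aab", ""]])   # [True, True, False, False]
```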

Theorem 3.4 Every regular language can be generated by a right-linear grammar. Proof: Generate a DFA for the language. Specify a procedure for automatically constructing a right-linear grammar from the DFA.

Theorem 3.4 Given a regular language L, let M = (Q, Σ, δ, q0, F) be a DFA that accepts L. Let Q = {q0, q1, …, qn} and Σ = {a1, a2, …, am}. Construct the grammar G = (V, T, S, P) with: V = {q0, q1, …, qn}, T = {a1, a2, …, am}, S = q0, and P = ∅ initially. P, the set of production rules, is constructed as follows:

Theorem 3.4 For each transition δ(qi, aj) = qk in the transition table of M, add to P the production qi → ajqk. If qk is in F, then add to P the production qk → λ.

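Theorem 3.4 Example: Construct a right-linear grammar for the language L = L(aab*a). First, build an NFA for L: [Figure: NFA with transitions q0 to q1 on a, q1 to q2 on a, a self-loop on q2 labeled b, and q2 to the final state qf on a]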

Theorem 3.4 P = ∅ initially. Add to P a rule for each transition in the NFA: q0 → aq1, q1 → aq2, q2 → bq2, q2 → aqf. Since qf is in F, add to P the production qf → λ.

Theorem 3.4 Now P = {q0 → aq1, q1 → aq2, q2 → bq2, q2 → aqf, qf → λ}. You can convert to normal grammar notation by renaming q0, q1, q2, qf as S, A, B, C: S → aA, A → aB, B → bB, B → aC, C → λ.
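The procedure of Theorem 3.4 is short enough to write out. In this sketch the automaton is given as a list of (qi, aj, qk) transition triples plus a collection of final states, and productions are stored as (terminal, next state) pairs with ("", None) for λ; `automaton_to_grammar` and `show` are hypothetical names.

```python
def automaton_to_grammar(transitions, finals):
    """One production qi -> aj qk per transition, plus qk -> λ for each final qk."""
    P = {}
    for qi, a, qk in transitions:
        P.setdefault(qi, []).append((a, qk))      # qi -> aj qk
    for qk in finals:
        P.setdefault(qk, []).append(("", None))   # qk -> λ
    return P


def show(P):
    """Render the productions in the slides' notation."""
    lines = []
    for lhs, rhss in P.items():
        alts = [f"{a}{q}" if q is not None else "λ" for a, q in rhss]
        lines.append(f"{lhs} → " + " | ".join(alts))
    return lines


# The NFA for aab*a from the example.
T = [("q0", "a", "q1"), ("q1", "a", "q2"), ("q2", "b", "q2"), ("q2", "a", "qf")]
for line in show(automaton_to_grammar(T, ["qf"])):
    print(line)
# q0 → aq1
# q1 → aq2
# q2 → bq2 | aqf
# qf → λ
```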

Theorem 3.5 A language L is regular if and only if there exists a left-linear grammar G such that L = L(G). Proof: The strategy here is a little tricky. We describe an algorithm to construct a right-linear grammar that generates the reverse of all the strings generated by the left-linear grammar.

Theorem 3.5 Given any left-linear grammar G, we can construct from it a right-linear grammar G’ by replacing productions of the form A → Bv with A → v^R B, and productions of the form A → v with A → v^R. Since L(G’) is generated by a right-linear grammar, it is regular. It can be demonstrated that L(G) = (L(G’))^R. It can be proven that the reverse of any regular language is also regular (see exercise 12, section 2.3 in the Linz text). Hence, L is regular.
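The rewriting step of Theorem 3.5 is mechanical. This sketch assumes single-character uppercase variables and lowercase terminals, with each left-linear right-hand side written as Bv or v; the function name is hypothetical. It reverses the terminal string and moves the variable to the end.

```python
def left_to_right_linear(grammar):
    """Replace A -> Bv with A -> v^R B, and A -> v with A -> v^R."""
    converted = {}
    for A, rhss in grammar.items():
        new_rhss = []
        for rhs in rhss:
            if rhs and rhs[0].isupper():          # A -> Bv
                B, v = rhs[0], rhs[1:]
                new_rhss.append(v[::-1] + B)
            else:                                 # A -> v
                new_rhss.append(rhs[::-1])
        converted[A] = new_rhss
    return converted


print(left_to_right_linear({"S": ["Sab", "c"]}))   # {'S': ['baS', 'c']}
```

For example, S → Sab | c generates c(ab)*, while the converted grammar S → baS | c generates (ba)*c, which is its reverse.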

Theorem 3.6 A language L is regular if and only if there exists a regular grammar G such that L = L(G). Proof: Combine our definition of regular grammars, which includes the statement, “A regular grammar is either right-linear or left-linear”, with Theorems 3.3, 3.4, and 3.5.

Three ways of specifying regular languages: regular expressions describe them, DFAs and NFAs accept them, and regular grammars generate them.