Regular Languages and Regular Grammars Chapter 3.



Regular Languages
A regular expression describes a regular language; a finite state machine accepts it.

Operators on Regular Expressions
In order of precedence:
( )  Parentheses
*    Star closure
.    Concatenation
+    Union
Note: The concatenation symbol is often omitted.
Example: Over Σ = {a, b, c}, (a + (b . c))* produces {λ, a, bc, aa, abc, bcbc, …}

Regular Expressions
Let Σ be a given alphabet. Then
1. ∅, λ, and a ∈ Σ are all primitive regular expressions.
2. If r1 and r2 are regular expressions, so are r1 + r2, r1 . r2, r1*, and (r1).
3. A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in (2).

Languages Associated with Regular Expressions
If r is a regular expression, L(r) is the language associated with r. Rules for determining the language associated with r:
L(∅) = ∅
L(λ) = {λ}
L(a) = {a}
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 . r2) = L(r1) L(r2)
L((r1)) = L(r1)
L(r1*) = (L(r1))*
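The rules above translate almost line for line into a recursive function. The following is a minimal sketch, not part of the original slides: the tuple encoding of regular expressions and the name `language` are assumptions made for illustration, and because L(r) may be infinite the function only returns strings up to a chosen length.

```python
# Regular expressions as nested tuples (a hypothetical encoding for this sketch):
#   ("empty",)          the empty set
#   ("lambda",)         the empty string
#   ("sym", "a")        a single symbol
#   ("union", r1, r2)   r1 + r2
#   ("concat", r1, r2)  r1 . r2
#   ("star", r1)        r1*

def language(r, max_len):
    """Return the strings of L(r) whose length is at most max_len."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "lambda":
        return {""}
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "concat":
        left = language(r[1], max_len)
        right = language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":
        # Repeatedly append non-empty strings of L(r1) until nothing new fits.
        base = language(r[1], max_len) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

a, b = ("sym", "a"), ("sym", "b")
r = ("concat", ("star", ("union", a, b)), b)   # (a + b)* b
print(sorted(language(r, 2)))                  # ['ab', 'b', 'bb']
```

The length bound is only there to keep the sets finite; the recursion itself mirrors the seven rules one case at a time.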

Analyzing a Regular Expression
L((a + b)*b) = L((a + b)*) L(b)
= (L(a + b))* L(b)
= (L(a) ∪ L(b))* L(b)
= ({a} ∪ {b})* {b}
= {a, b}* {b}
The strings of a’s and b’s that end with b.

Analyzing a Regular Expression
L(a*b*) = L(a*) L(b*) = {a}*{b}*
A string of zero or more a’s followed by a string of zero or more b’s.

Given a Language, Find a Regular Expression
L = {w ∈ {a, b}* : |w| is even}
((a + b)(a + b))* or (aa + ab + ba + bb)*

Examples
L = {w ∈ {a, b}* : w contains an odd number of a’s}
b*(ab*ab*)*ab* or b*ab*(ab*ab*)*
Both expressions require that there be a single a somewhere. There can also be other a’s, but they must occur in pairs.
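A claim like this can be sanity-checked by brute force. The sketch below is an illustration added here, not the slides' method: it uses Python's `re` module as a stand-in matcher and compares each expression against the defining property on every string over {a, b} up to a small length. The helper names are hypothetical. Such a check can expose a wrong expression but does not prove a correct one.

```python
import re
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def disagreements(pattern, predicate, max_len=8):
    """Strings on which the regex and the predicate give different answers."""
    return [w for w in strings_up_to("ab", max_len)
            if bool(pattern.fullmatch(w)) != predicate(w)]

def odd_as(w):
    return w.count("a") % 2 == 1

r1 = re.compile(r"b*(ab*ab*)*ab*")
r2 = re.compile(r"b*ab*(ab*ab*)*")

print(disagreements(r1, odd_as), disagreements(r2, odd_as))   # [] []
```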

More Regular Expression Examples
Try these:
L = {w ∈ {a, b}* : there is no more than one b in w}
L(r) = {a^(2n) b^(2m+1) : n ≥ 0, m ≥ 0}

More Regular Expression Examples
Try these:
L = {w ∈ {a, b}* : there is no more than one b in w}
a*(λ + b)a* or a* + a*ba*
L(r) = {a^(2n) b^(2m+1) : n ≥ 0, m ≥ 0}
(aa)*(bb)*b

The Details Matter
a* + b* ≠ (a + b)*
(ab)* ≠ a*b*
For example, ab is in (a + b)* but not in a* + b*, and aab is in a*b* but not in (ab)*.

Rex to NFA
Finite state machines and regular expressions define the same class of languages.
Theorem: Any language that can be defined with a regular expression can be accepted by some NFA and so is regular.
Proof by Construction: Must show that an NFA can be constructed using rules for: ∅, λ, any symbol in Σ, union, concatenation, and star closure.

For Every Regular Expression There is a Corresponding FSM
We’ll show this by construction. An FSM for:
∅: [figure]
A single element of Σ: [figure]
λ: [figure]

Union
FSA that recognizes s + t. [Figure: M1 (recognizes string s) and M2 (recognizes string t) combined using λ-transitions]

Concatenation
FSA that recognizes st. [Figure: M1 (recognizes string s) followed by M2 (recognizes string t), linked by λ-transitions]

Star Closure
FSA that recognizes s*. [Figure: M1 (recognizes string s) wrapped with λ-transitions]
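The construction can be sketched in code. Since the slide figures did not survive, the wiring below follows one common textbook arrangement (a fresh start state and a fresh accepting state for every operator) and may differ in detail from the original diagrams. The class and function names are assumptions made for this illustration.

```python
from itertools import count

_ids = count()   # source of fresh state numbers

class NFA:
    """An NFA with one start state, one accepting state, and lambda-moves."""
    def __init__(self):
        self.start = next(_ids)
        self.accept = next(_ids)
        self.edges = []   # (source, label, target); label None means lambda

    def absorb(self, *machines):
        for m in machines:
            self.edges.extend(m.edges)
        return self

def empty():              # machine for the empty set: no path to accept
    return NFA()

def lam():                # machine for lambda
    m = NFA()
    m.edges.append((m.start, None, m.accept))
    return m

def symbol(a):            # machine for a single symbol
    m = NFA()
    m.edges.append((m.start, a, m.accept))
    return m

def union(m1, m2):
    m = NFA().absorb(m1, m2)
    m.edges += [(m.start, None, m1.start), (m.start, None, m2.start),
                (m1.accept, None, m.accept), (m2.accept, None, m.accept)]
    return m

def concat(m1, m2):
    m = NFA().absorb(m1, m2)
    m.edges += [(m.start, None, m1.start), (m1.accept, None, m2.start),
                (m2.accept, None, m.accept)]
    return m

def star(m1):
    m = NFA().absorb(m1)
    m.edges += [(m.start, None, m1.start), (m1.accept, None, m.accept),
                (m.start, None, m.accept), (m.accept, None, m.start)]
    return m

def closure(m, states):
    """All states reachable from the given states using only lambda-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in m.edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, w):
    current = closure(m, {m.start})
    for ch in w:
        moved = {dst for src, label, dst in m.edges
                 if src in current and label == ch}
        current = closure(m, moved)
    return m.accept in current

m = star(union(symbol("b"), concat(symbol("a"), symbol("b"))))   # (b + ab)*
print([w for w in ["", "b", "ab", "bab", "a", "aab"] if accepts(m, w)])
# ['', 'b', 'ab', 'bab']
```

Each operand machine must be a separate object, because combining a machine with itself would share its states.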

An Example: (b + ab)*
An FSM for a, an FSM for b, and an FSM for ab (joined by a λ-transition). [Figures]

An Example: (b + ab)*
An FSM for (b + ab), built with λ-transitions. [Figure]

An Example
An FSM for (b + ab)*, built with further λ-transitions. [Figure]

An Example
A simplified FSM for (b + ab)*. [Figure]

For Every FSM There is a Corresponding Regular Expression Theorem: Every regular language (i.e., every language that can be accepted by some DFSM) can be defined with a regular expression. Proof by Construction: Use generalized transition graphs (GTGs) to convert FSM to REX. A GTG is a transition graph whose edges are labeled with regular expressions.

A Simple Example
Let M be: [figure]
Suppose we rip out state 2: [figure]

The Algorithm fsmtoregexheuristic
fsmtoregexheuristic(M: FSM) =
1. Remove unreachable states from M.
2. If M has no accepting states then return ∅.
3. If the start state of M is part of a loop, create a new start state s and connect s to M’s start state via a λ-transition.
4. If there is more than one accepting state of M or there are any transitions out of any of them, create a new accepting state and connect each of M’s accepting states to it via a λ-transition. The old accepting states no longer accept.
5. If M has only one state then return λ.
6. Until only the start state and the accepting state remain do:
   6.1 Select rip (not s or an accepting state).
   6.2 Remove rip from M.
   6.3 Modify the transitions among the remaining states so M accepts the same strings.
7. Return the regular expression that labels the one remaining transition from the start state to the accepting state.
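The steps above can be sketched as a short state-elimination routine. This is a simplified illustration, not the slides' exact procedure: it always adds a fresh start state and a fresh accepting state (instead of only when steps 3 and 4 require it), it skips step 1 because ripping an unreachable state does not change the result, and it rips states in the order given rather than by a heuristic. Edge labels are assumed to be strings already valid in Python `re` syntax, with `None` standing for ∅ and the empty string for λ. All names are hypothetical.

```python
import re
from itertools import product

def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"

def concat(r, s):
    if r is None or s is None:
        return None
    return r + s

def star(r):
    if r is None or r == "":
        return ""            # the star of the empty set or of lambda is lambda
    return f"(?:{r})*"

def fsm_to_regex(states, start, accepting, edges):
    """edges maps (p, q) to a regex label; parallel arcs are already unioned."""
    R = dict(edges)
    s, f = object(), object()            # fresh start and accepting states
    R[(s, start)] = ""
    for q in accepting:
        R[(q, f)] = ""
    for rip in states:
        loop = star(R.get((rip, rip)))
        preds = [p for (p, q) in R if q == rip and p != rip]
        succs = [q for (p, q) in R if p == rip and q != rip]
        # Replace every path p -> rip -> q with a direct arc p -> q.
        for p in preds:
            for q in succs:
                through = concat(concat(R[(p, rip)], loop), R[(rip, q)])
                R[(p, q)] = union(R.get((p, q)), through)
        R = {(p, q): r for (p, q), r in R.items() if rip not in (p, q)}
    return R.get((s, f))                 # None means the empty language

# A two-state DFA accepting strings with an odd number of a's.
edges = {(0, 0): "b", (0, 1): "a", (1, 1): "b", (1, 0): "a"}
regex = fsm_to_regex([0, 1], 0, {1}, edges)

words = ("".join(t) for n in range(8) for t in product("ab", repeat=n))
print(all(bool(re.fullmatch(regex, w)) == (w.count("a") % 2 == 1)
          for w in words))               # True
```

The expression produced is usually far from minimal, which is why the simplification identities matter in practice.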

Example 1 1.Create a new initial state and a new, unique accepting state, neither of which is part of a loop. Note:   λ

Example 1, Continued
2. Remove states and arcs and replace with arcs labeled with larger and larger regular expressions.

Example 1, Continued
Remove state 3: [figure]

Example 1, Continued
Remove state 2: [figure]

Remove state 1: [figure]

Example 2
a*(a + b)c*

Example 3
a* + a*(a + b)c*

Simplifying Regular Expressions
Regexes describe sets:
● Union is commutative: α + β = β + α.
● Union is associative: (α + β) + γ = α + (β + γ).
● ∅ is the identity for union: α + ∅ = ∅ + α = α.
● Union is idempotent: α + α = α.
Concatenation:
● Concatenation is associative: (αβ)γ = α(βγ).
● λ is the identity for concatenation: αλ = λα = α.
● ∅ is a zero for concatenation: α∅ = ∅α = ∅.
Concatenation distributes over union:
● (α + β)γ = (αγ) + (βγ).
● γ(α + β) = (γα) + (γβ).
Kleene star:
● ∅* = λ.
● λ* = λ.
● (α*)* = α*.
● α*α* = α*.
● (α + β)* = (α*β*)*.

Applications of Regular Expressions: Pattern Matching
Many applications allow pattern matches: unix, perl, Excel, Access, …
Pattern-matching programs use automata:
pattern → rex → nfa → dfa → transition table → driver

A Biology Example – BLAST Given a protein or DNA sequence, find others that are likely to be evolutionarily close to it. ESGHDTTTYYNKNRYPAGWNNHHDQMFFWV Build a DFSM that can examine thousands of other sequences and find those that match any of the selected patterns.

Regular Expressions in Perl

Syntax     Name               Description
abc        Concatenation      Matches a, then b, then c, where a, b, and c are any regexes
a | b | c  Union (Or)         Matches a or b or c, where a, b, and c are any regexes
a*         Kleene star        Matches 0 or more a’s, where a is any regex
a+         At least one       Matches 1 or more a’s, where a is any regex
a?                            Matches 0 or 1 a’s, where a is any regex
a{n, m}    Replication        Matches at least n but no more than m a’s, where a is any regex
a*?, a+?   Parsimonious       Turns off greedy matching so the shortest match is selected
.          Wild card          Matches any character except newline
^          Left anchor        Anchors the match to the beginning of a line or string
$          Right anchor       Anchors the match to the end of a line or string
[a-z]                         Assuming a collating sequence, matches any single character in range
[^a-z]                        Assuming a collating sequence, matches any single character not in range
\d         Digit              Matches any single digit, i.e., string in [0-9]
\D         Nondigit           Matches any single nondigit character, i.e., [^0-9]
\w         Alphanumeric       Matches any single “word” character, i.e., [a-zA-Z0-9_]
\W         Nonalphanumeric    Matches any character in [^a-zA-Z0-9_]
\s         White space        Matches any character in [space, tab, newline, etc.]
\S         Nonwhite space     Matches any character not matched by \s
\n         Newline            Matches newline
\r         Return             Matches return
\t         Tab                Matches tab
\f         Formfeed           Matches formfeed
\b         Backspace          Matches backspace inside []
\b         Word boundary      Matches a word boundary outside []
\B         Nonword boundary   Matches a non-word boundary
\0         Null               Matches a null character
\nnn       Octal              Matches an ASCII character with octal value nnn
\xnn       Hexadecimal        Matches an ASCII character with hexadecimal value nn
\cX        Control            Matches an ASCII control character
\char      Quote              Matches char; used to quote symbols such as . and \
(a)        Store              Matches a, where a is any regex, and stores the matched string in the next variable
\1         Variable           Matches whatever the first parenthesized expression matched
\2                            Matches whatever the second parenthesized expression matched
…                             For all remaining variables

Using Regular Expressions in the Real World
Matching numbers: -? ([0-9]+(\.[0-9]*)? | \.[0-9]+)
Matching IP addresses: S! ([0-9]{1,3} (\. [0-9] {1,3}){3}) ! $1 !
Finding doubled words: \
From Friedl, J., Mastering Regular Expressions, O’Reilly, 1997.

More Regular Expressions
Identifying spam: \badv\(?ert\)?\b
Trawl for addresses: (\.[A-Za-z]+){1,4}\b

Using Substitution
Building a chatbot: on input of the form “X is Y”, the chatbot will reply “Why is X Y?”

Chatbot Example
Input: The food there is awful
Reply: Why is the food there awful?
Assume that the input text is stored in the variable $text:
$text =~ s/^([A-Za-z]+)\sis\s([A-Za-z]+)\.?$/Why is \1 \2?/;
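The same substitution idea can be tried in Python's `re` module. This is a sketch under stated assumptions rather than a translation of the Perl line: as transcribed, the character classes `[A-Za-z]` contain no space, so the pattern would only match one-word phrases and would not match the sample input. The variant below adds a space to each class and lowercases the first letter of the subject so the reply reads like the slide's. The function name `reply` is hypothetical.

```python
import re

# Assumption: phrases may contain spaces, unlike the classes in the Perl line.
PATTERN = re.compile(r"^([A-Za-z ]+)\sis\s([A-Za-z ]+)\.?$")

def reply(text):
    """Turn 'X is Y' into 'Why is X Y?'; leave any other input unchanged."""
    m = PATTERN.match(text)
    if m is None:
        return text
    subject = m.group(1)
    subject = subject[0].lower() + subject[1:]
    return f"Why is {subject} {m.group(2)}?"

print(reply("The food there is awful"))   # Why is the food there awful?
```

Because the first group is greedy, the split happens at the last " is " in the input.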

Regular Grammars
A regular grammar G is a quadruple (V, T, S, P) that is either consistently right-linear or consistently left-linear.
● V - Variables
● T - Terminals
● S - Start variable, S ∈ V
● P - Productions

Right-Linear Grammar
All production rules are of the form A → xB or A → x, where:
A, B ∈ V (A and B are variables)
x ∈ T* (x is a string over the alphabet)
Example: G = ({S}, {a, b}, S, P)
P: S → abS | a
Corresponding regular expression: (ab)*a

Left-Linear Grammar
All production rules are of the form A → Bx or A → x, where:
A, B ∈ V (A and B are variables)
x ∈ T* (x is a string over the alphabet)
Example: G = ({S, S1, S2}, {a, b}, S, P)
P: S → S1ab
   S1 → S1ab | S2
   S2 → a
Corresponding regular expression: aab(ab)*

Focus on Right-Linear Grammars
A language generated by a right-linear grammar is always regular. Proof by construction of FA on page 91 of text.
Example: Construct an FA that accepts the language generated by the grammar:
V0 → aV1
V1 → abV0 | b

Focus on Right-Linear Grammars
V0 → aV1
V1 → b
V1 → abV0
Complete FA: [figure]
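Since the FA figure is missing, the construction can be illustrated in code instead. This is a minimal sketch of the usual idea, not necessarily the textbook's exact proof: each variable becomes a state, a production A → xB becomes a path from A to B that spells x (with fresh intermediate states when x has more than one symbol), and A → x becomes such a path into a single final state. The tuple encoding of productions and the state names `FINAL` and `_0`, `_1`, … are assumptions and must not clash with variable names.

```python
def grammar_to_nfa(productions, start):
    """productions: list of (head, terminals, next_variable or None)."""
    edges = set()            # (state, symbol, state); symbol None means lambda
    final = "FINAL"
    fresh = 0
    for head, terminals, nxt in productions:
        target = nxt if nxt is not None else final
        if not terminals:    # A -> B or A -> lambda becomes a lambda-move
            edges.add((head, None, target))
            continue
        state = head
        for i, ch in enumerate(terminals):
            if i == len(terminals) - 1:
                dest = target
            else:
                dest = f"_{fresh}"       # intermediate state inside x
                fresh += 1
            edges.add((state, ch, dest))
            state = dest
    return edges, start, final

def accepts(nfa, w):
    edges, start, final = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for p, label, r in edges:
                if p == q and label is None and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for ch in w:
        current = closure({r for p, label, r in edges
                           if p in current and label == ch})
    return final in current

# V0 -> aV1,  V1 -> abV0 | b
P = [("V0", "a", "V1"), ("V1", "ab", "V0"), ("V1", "b", None)]
nfa = grammar_to_nfa(P, "V0")
print([w for w in ["ab", "aabab", "a", "aab", ""] if accepts(nfa, w)])
# ['ab', 'aabab']
```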

Right-Linear Grammars Every regular language can be generated by some right-linear grammar. Proof by reverse construction of an FA, page 93 of text. Example: Find a right-linear grammar that generates the language accepted by the FA shown below.

G = ({Q0, Q1, Q2}, {0, 1}, Q0, P)
P: Q0 → 1Q1 | Q2 | λ
   Q1 → 0Q0 | 0Q2
   Q2 → 1Q2
Each state in the FA is represented by a variable in the grammar.
Each transition symbol in the FA is a terminal in the grammar.
Each transition in the FA is represented by a rule in the grammar.
If a state q_k is a final state, include the production q_k → λ.
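These four rules amount to a direct loop over the transition list. The sketch below applies them to a small hypothetical FA (two states E and O tracking whether the number of a's read so far is even or odd), since the slide's own FA figure is not available. The function name and the representation of rules as strings are assumptions for illustration.

```python
def fa_to_grammar(transitions, finals):
    """transitions: list of (state, symbol, next_state); returns head -> bodies."""
    rules = {}
    for q, a, r in transitions:
        # Each transition q --a--> r becomes the production q -> a r.
        rules.setdefault(q, []).append(f"{a}{r}")
    for q in finals:
        # Each final state q gets the production q -> lambda.
        rules.setdefault(q, []).append("λ")
    return rules

transitions = [("E", "a", "O"), ("E", "b", "E"),
               ("O", "a", "E"), ("O", "b", "O")]
for head, bodies in fa_to_grammar(transitions, ["O"]).items():
    print(head, "→", " | ".join(bodies))
# E → aO | bE
# O → aE | bO | λ
```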