MA/CSSE 474 Theory of Computation Regular Expressions Intro.

Your Questions?
Previous class days' material
Reading Assignments
HW5 problems
Anything else
Still more language ambiguity!

Regular Languages
A regular expression describes a regular language.
A finite state machine accepts a regular language.

Regular Expressions
The regular expressions over an alphabet Σ are the strings that can be obtained as follows:
1. ∅ is a regular expression.
2. ε is a regular expression.
3. Every element of Σ is a regular expression.
4. If α, β are regular expressions, then so is αβ.
5. If α, β are regular expressions, then so is α ∪ β.
6. If α is a regular expression, then so is α*.
7. If α is a regular expression, then so is α+.
8. If α is a regular expression, then so is (α).
#7 is here for convenience only (syntactic sugar); many authors do not include + in the list of r.e. builders.

Regular Expression Examples
If Σ = {a, b}, the following are regular expressions:
∅
ε
a
(a ∪ b)*
(abba ∪ ε)+ (a ∪ bab)

Regular Expressions Define Languages
Define L, a semantic interpretation function for regular expressions (let α and β be arbitrary regular expressions over alphabet Σ).
1. L(∅) = ∅.
2. L(ε) = {ε}.
3. If c ∈ Σ, L(c) = {c}.
4. L(αβ) = L(α) L(β).
5. L(α ∪ β) = L(α) ∪ L(β).
6. L(α*) = (L(α))*.
7. L(α+) = L(αα*) = L(α) (L(α))*. If L(α) is equal to ∅, then L(α+) is also equal to ∅. Otherwise L(α+) is the language that is formed by concatenating together one or more strings drawn from L(α).
8. L((α)) = L(α).
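The interpretation function L can be sketched as a recursive evaluator over a syntax tree. This is only an illustration, not part of the course material: the node classes (`Empty`, `Eps`, `Sym`, `Cat`, `Union`, `Star`, `Plus`) and the length bound `n` are assumptions introduced here so that every language stays finite. Rule 8 needs no node of its own, because parentheses only determine the shape of the tree.

```python
from dataclasses import dataclass

# Hypothetical AST node types, one per syntactic rule (names are illustrative).
@dataclass(frozen=True)
class Empty:            # rule 1: ∅
    pass

@dataclass(frozen=True)
class Eps:              # rule 2: ε
    pass

@dataclass(frozen=True)
class Sym:              # rule 3: a single symbol c
    c: str

@dataclass(frozen=True)
class Cat:              # rule 4: αβ
    left: object
    right: object

@dataclass(frozen=True)
class Union:            # rule 5: α ∪ β
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # rule 6: α*
    body: object

@dataclass(frozen=True)
class Plus:             # rule 7: α+
    body: object

def cat(A, B, n):
    """Concatenation of two languages, keeping only strings of length <= n."""
    return {x + y for x in A for y in B if len(x) + len(y) <= n}

def star(A, n):
    """Kleene star of A, keeping only strings of length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, A, n) - result   # strings not seen before
        result |= frontier
    return result

def lang(r, n):
    """The strings of length <= n in L(r), following rules 1 to 7."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.c} if n >= 1 else set()
    if isinstance(r, Cat):
        return cat(lang(r.left, n), lang(r.right, n), n)
    if isinstance(r, Union):
        return lang(r.left, n) | lang(r.right, n)
    if isinstance(r, Star):
        return star(lang(r.body, n), n)
    if isinstance(r, Plus):
        body = lang(r.body, n)
        return cat(body, star(body, n), n)
    raise TypeError(f"not a regular expression node: {r!r}")

# (a ∪ b)*b, restricted to strings of length at most 2
ab_star_b = Cat(Star(Union(Sym("a"), Sym("b"))), Sym("b"))
print(sorted(lang(ab_star_b, 2)))   # ['ab', 'b', 'bb']
```

Bounding the length is a design choice of the sketch: it makes L(α*) computable as a finite set while still agreeing with the true language on all short strings.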

The Role of the Rules
Rules 1, 3, 4, 5, and 6 give the language its power to define sets.
Rule 8 has as its only role grouping other operators.
Rules 2 and 7 appear to add functionality to the regular expression language, but they don't.
2. ε is a regular expression.
7. If α is a regular expression, then so is α+.
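One way to see why rules 2 and 7 add nothing is to check, on finite samples, that L(∅*) = {ε} and that "one or more strings from A" equals A followed by A*. The sketch below works directly on sets of strings truncated to length `n`; the helper names (`cat`, `powers_union`, `star`, `plus`) are assumptions made for this illustration, and a finite check is evidence rather than a proof.

```python
def cat(A, B, n):
    """Concatenation of two languages, keeping only strings of length <= n."""
    return {x + y for x in A for y in B if len(x) + len(y) <= n}

def powers_union(A, n, include_zero):
    """A^1 ∪ A^2 ∪ ... (plus A^0 = {ε} if include_zero), truncated to length <= n."""
    result = {""} if include_zero else set()
    power = {x for x in A if len(x) <= n}      # A^1
    while not power <= result:                 # stop once a power adds nothing new
        result |= power
        power = cat(power, A, n)               # next power of A
    return result

def star(A, n):
    return powers_union(A, n, include_zero=True)

def plus(A, n):
    # Computed directly as "one or more strings from A", without using star.
    return powers_union(A, n, include_zero=False)

# Rule 2 is redundant: ∅* already denotes {ε}.
print(star(set(), 5) == {""})                  # True

# Rule 7 is redundant: A+ agrees with A A* on every sample.
samples = [set(), {"a"}, {"a", "bab"}, {"", "ab"}]
print(all(plus(A, 6) == cat(A, star(A, 6), 6) for A in samples))   # True
```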

Operator Precedence in Regular Expressions

           Regular Expressions      Arithmetic Expressions
Highest    Kleene star and +        exponentiation
           concatenation            multiplication
Lowest     union                    addition

Example    a b* ∪ c d*              x y^2 + i j^2
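The precedence table can be tried out with Python's `re` module, which writes union as `|` and uses the same ordering (star binds tightest, then concatenation, then alternation). The helper `language` below is an assumption of this sketch; it enumerates every string up to a length bound and keeps those the pattern matches in full.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that fully match `pattern`."""
    rx = re.compile(pattern)
    words = ("".join(t) for k in range(max_len + 1)
             for t in product(alphabet, repeat=k))
    return {w for w in words if rx.fullmatch(w)}

implicit = language(r"ab*|cd*", "abcd", 4)                  # a b* ∪ c d*
explicit = language(r"(?:a(?:b*))|(?:c(?:d*))", "abcd", 4)  # fully parenthesized
other = language(r"a(?:b*|c)d*", "abcd", 4)                 # a different grouping

print(implicit == explicit)                       # True
print(implicit == other)                          # False
print(sorted(language(r"ab*|cd*", "abcd", 2)))    # ['a', 'ab', 'c', 'cd']
```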

Analyzing a Regular Expression
L((a ∪ b)*b) = L((a ∪ b)*) L(b)
             = (L((a ∪ b)))* L(b)
             = (L(a) ∪ L(b))* L(b)
             = ({a} ∪ {b})* {b}
             = {a, b}* {b}.

From English to reg exps
L = {w ∈ {a, b}* : |w| is even}
L = {w ∈ {0, 1}* : w is a binary representation of a multiple of 4}
L = {w ∈ {a, b}* : w contains an odd number of a's}

Hidden: Going the Other Way
L = {w ∈ {a, b}* : |w| is even}
    ((a ∪ b)(a ∪ b))*
    (aa ∪ ab ∪ ba ∪ bb)*
L = {w ∈ {0, 1}* : w is a binary representation of a multiple of 4}
    0 ∪ 1(0 ∪ 1)*00
L = {w ∈ {a, b}* : w contains an odd number of a's}
    b*(ab*ab*)*ab*
    b*ab*(ab*ab*)*
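These answers can be checked by brute force: enumerate all short strings and compare each regular expression with the English description. The sketch below uses Python's `re` syntax (`|` for ∪). The helpers `words` and `agrees` are assumptions of this example, and so is the reading of "binary representation" as having no leading zeros, which is what the expression 0 ∪ 1(0 ∪ 1)*00 appears to encode.

```python
import re
from itertools import product

def words(alphabet, max_len):
    """Every string over `alphabet` of length 0..max_len."""
    for k in range(max_len + 1):
        for t in product(alphabet, repeat=k):
            yield "".join(t)

def agrees(pattern, predicate, alphabet, max_len=8):
    """True if `pattern` and `predicate` accept exactly the same short strings."""
    rx = re.compile(pattern)
    return all((rx.fullmatch(w) is not None) == predicate(w)
               for w in words(alphabet, max_len))

def even_length(w):
    return len(w) % 2 == 0

def multiple_of_4(w):
    # Assumption: no leading zeros, so "0" is the only string starting with 0.
    return w == "0" or (w.startswith("1") and int(w, 2) % 4 == 0)

def odd_number_of_as(w):
    return w.count("a") % 2 == 1

checks = [
    (r"((a|b)(a|b))*",  even_length,      "ab"),
    (r"(aa|ab|ba|bb)*", even_length,      "ab"),
    (r"0|1(0|1)*00",    multiple_of_4,    "01"),
    (r"b*(ab*ab*)*ab*", odd_number_of_as, "ab"),
    (r"b*ab*(ab*ab*)*", odd_number_of_as, "ab"),
]
for pattern, predicate, alphabet in checks:
    print(pattern, agrees(pattern, predicate, alphabet))   # True for each row
```

Agreement up to length 8 does not prove the expressions correct, but a mismatch would show up as `False` together with a concrete counterexample length to investigate.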

The Details Matter
a* ∪ b* ≠ (a ∪ b)*
(ab)* ≠ a*b*
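A short search for witness strings makes both inequalities concrete. As before, this is a sketch using Python's `re` syntax (`|` for ∪); the helpers `language` and `witnesses` are names made up for the illustration.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that fully match `pattern`."""
    rx = re.compile(pattern)
    words = ("".join(t) for k in range(max_len + 1)
             for t in product(alphabet, repeat=k))
    return {w for w in words if rx.fullmatch(w)}

def witnesses(p1, p2, alphabet="ab", max_len=4):
    """Strings matched by exactly one of the two patterns, shortest first."""
    difference = language(p1, alphabet, max_len) ^ language(p2, alphabet, max_len)
    return sorted(difference, key=lambda w: (len(w), w))

print(witnesses(r"a*|b*", r"(a|b)*")[:3])   # ['ab', 'ba', 'aab']
print(witnesses(r"(ab)*", r"a*b*")[:3])     # ['a', 'b', 'aa']
```

For the second pair the difference runs both ways: `abab` is in L((ab)*) but not in L(a*b*), while `a` is in L(a*b*) but not in L((ab)*).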

More Regular Expression Examples
L((aa*) ∪ ε) =
L((a ∪ ε)*) =
L = {w ∈ {a, b}* : there is no more than one b in w}
L = {w ∈ {a, b}* : no two consecutive letters in w are the same}

The Details Matter
L_1 = {w ∈ {a, b}* : every a is immediately followed by a b}
A regular expression for L_1:
An FSM for L_1:
L_2 = {w ∈ {a, b}* : every a has a matching b somewhere}
A regular expression for L_2:
An FSM for L_2:

Kleene's Theorem
Finite state machines and regular expressions define the same class of languages. To prove this, we must show:
Theorem: Any language that can be defined by a regular expression can be accepted by some FSM and so is regular.
Theorem: Every regular language (i.e., every language that can be accepted by some DFSM) can be defined with a regular expression.

For Every Regular Expression There is a Corresponding FSM
We'll show this by construction. An FSM for:
∅:
A single element c of Σ:
ε:

Union
If α is the regular expression β ∪ γ and if both L(β) and L(γ) are regular:

Concatenation
If α is the regular expression βγ and if both L(β) and L(γ) are regular:

Kleene Star
If α is the regular expression β* and if L(β) is regular:

An Example
(b ∪ ab)*
An FSM for b:
An FSM for a:
An FSM for b:
An FSM for ab:

An Example
(b ∪ ab)*
An FSM for (b ∪ ab):

An Example
(b ∪ ab)*
An FSM for (b ∪ ab)*:
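The constructions for ∅, ε, single symbols, union, concatenation, and Kleene star can be sketched as a small program that builds a nondeterministic FSM with ε-transitions and then simulates it. The diagrams from the slides are not reproduced in this transcript, so the wiring below is one common variant and may differ in detail from the one drawn in class. The tuple encoding of expressions and the names `build`, `closure`, and `accepts` are assumptions of this sketch.

```python
from collections import defaultdict
from itertools import count

EPS = ""   # label for ε-transitions (a convention of this sketch)

def build(r, delta, fresh):
    """Add an ε-NFA for the regex tree r to delta; return (start, accepting states)."""
    kind = r[0]
    if kind == "empty":                  # ∅: a start state and no accepting states
        return next(fresh), set()
    if kind == "eps":                    # ε: the start state is accepting
        s = next(fresh)
        return s, {s}
    if kind == "sym":                    # c: a single transition labeled c
        s, t = next(fresh), next(fresh)
        delta[s, r[1]].add(t)
        return s, {t}
    if kind == "union":                  # new start with ε-moves into both machines
        s = next(fresh)
        s1, a1 = build(r[1], delta, fresh)
        s2, a2 = build(r[2], delta, fresh)
        delta[s, EPS] |= {s1, s2}
        return s, a1 | a2
    if kind == "cat":                    # ε-moves from the first machine's accepting states
        s1, a1 = build(r[1], delta, fresh)
        s2, a2 = build(r[2], delta, fresh)
        for q in a1:
            delta[q, EPS].add(s2)
        return s1, a2
    if kind == "star":                   # new accepting start; ε-moves back to the old start
        s = next(fresh)
        s1, a1 = build(r[1], delta, fresh)
        delta[s, EPS].add(s1)
        for q in a1:
            delta[q, EPS].add(s1)
        return s, {s} | a1
    raise ValueError(f"unknown node kind: {kind}")

def closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPS), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def accepts(r, w):
    """Build the machine for r and run it on the string w."""
    delta = defaultdict(set)
    start, final = build(r, delta, count())
    current = closure({start}, delta)
    for c in w:
        step = set()
        for q in current:
            step |= delta.get((q, c), set())
        current = closure(step, delta)
    return bool(current & final)

# (b ∪ ab)*
example = ("star", ("union", ("sym", "b"), ("cat", ("sym", "a"), ("sym", "b"))))
print([w for w in ["", "b", "ab", "bab", "abb", "ba", "aab"] if accepts(example, w)])
# ['', 'b', 'ab', 'bab', 'abb']
```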

For Every FSM There is a Corresponding Regular Expression
We'll show this by construction. The construction is different from the textbook's.
Let M = ({q_1, …, q_n}, Σ, δ, q_1, A) be a DFSM. Define R_ijk to be the set of all strings x ∈ Σ* such that (q_i, x) |-*_M (q_j, ε), and if (q_i, y) |-*_M (q_ℓ, ε) for any prefix y of x (except y = ε and y = x), then ℓ ≤ k.
That is, R_ijk is the set of all strings that take us from q_i to q_j without passing through any intermediate states numbered higher than k. In this case, "passing through" means both entering and leaving. Note that either i or j (or both) may be greater than k.

Example: R_ijk
R_ijk is the set of all strings that take us from q_i to q_j without passing through any intermediate states numbered higher than k. In this case, "passing through" means both entering and leaving. Note that either i or j (or both) may be greater than k.
R_000   R_010   R_011   R_021   R_022   R_232   R_233

DFA → Reg. Exp. construction
R_ijk is the set of all strings that take M from q_i to q_j without passing through any intermediate states numbered higher than k.
Examples: R_ijn is
Also note that L(M) is the union of R_1jn over all q_j in A.
We will show that for all i, j ∈ {1, …, n} and all k ∈ {0, …, n}, R_ijk is defined by a regular expression.
– We already know that the union of languages defined by reg. exps. is defined by a reg. exp.

DFA → Reg. Exp. continued
R_ijk is the set of all strings that take M from q_i to q_j without passing through any intermediate states numbered higher than k. It can be computed recursively:
Base cases (k = 0):
– If i ≠ j, R_ij0 = {a ∈ Σ : δ(q_i, a) = q_j}
– If i = j, R_ii0 = {a ∈ Σ : δ(q_i, a) = q_i} ∪ {ε}
Recursive case (k > 0):
R_ijk is R_ij(k-1) ∪ R_ik(k-1) (R_kk(k-1))* R_kj(k-1)
We show by induction that each R_ijk is defined by some regular expression r_ijk.
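The base cases and the recursive case translate almost line for line into a memoized function that returns a regular expression for each R_ijk. The sketch below emits Python `re` syntax (`|` for ∪) so the result can be tested. Several things here are assumptions of the illustration rather than part of the slides: the function name `dfa_to_regex`, the encoding of ∅ as `None` and ε as the empty pattern, the dictionary representation of δ, and the small simplifications in `union`, `cat`, and `star`. The two-state sample DFA (odd number of a's) is also made up for the test.

```python
import re
from functools import lru_cache
from itertools import product

def dfa_to_regex(n, delta, accepting):
    """States are 1..n with start state 1; delta maps (state, symbol) -> state.
    Returns a Python `re` pattern for L(M), or None if L(M) is empty."""
    def union(x, y):
        if x is None: return y              # ∅ ∪ y = y
        if y is None: return x
        if x == y: return x
        if x == "": return f"(?:{y})?"      # ε ∪ y
        if y == "": return f"(?:{x})?"
        return f"(?:{x}|{y})"

    def cat(x, y):
        if x is None or y is None:          # anything concatenated with ∅ is ∅
            return None
        return x + y

    def star(x):
        if x is None or x == "":            # ∅* = ε* = ε
            return ""
        return f"(?:{x})*"

    @lru_cache(maxsize=None)
    def r(i, j, k):
        if k == 0:                          # base cases: direct transitions only
            result = "" if i == j else None
            for (q, a), p in sorted(delta.items()):
                if q == i and p == j:
                    result = union(result, re.escape(a))
            return result
        # recursive case: avoid q_k entirely, or go to q_k, loop on it, then leave
        through_k = cat(cat(r(i, k, k - 1), star(r(k, k, k - 1))), r(k, j, k - 1))
        return union(r(i, j, k - 1), through_k)

    total = None                            # L(M) is the union of R_1jn over q_j in A
    for j in sorted(accepting):
        total = union(total, r(1, j, n))
    return total

# Hypothetical 2-state DFA: state 2 is reached after an odd number of a's.
delta = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 2}
pattern = dfa_to_regex(2, delta, {2})

def dfa_accepts(w):
    q = 1
    for c in w:
        q = delta[q, c]
    return q == 2

all_words = ["".join(t) for k in range(7) for t in product("ab", repeat=k)]
print(all((re.fullmatch(pattern, w) is not None) == dfa_accepts(w) for w in all_words))  # True
```

The expressions produced this way grow quickly with n, since each level combines four expressions from the level below; the construction shows existence and is not meant to give a short expression.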

DFA → Reg. Exp. Proof pt. 1
Base case definition (k = 0):
– If i ≠ j, R_ij0 = {a ∈ Σ : δ(q_i, a) = q_j}
– If i = j, R_ii0 = {a ∈ Σ : δ(q_i, a) = q_i} ∪ {ε}
Base case proof: R_ij0 is a finite set of strings, each of which is either ε or a single symbol from Σ. So R_ij0 can be defined by the reg. exp. r_ij0 = a_1 ∪ a_2 ∪ … ∪ a_p (or a_1 ∪ a_2 ∪ … ∪ a_p ∪ ε if i = j), where {a_1, a_2, …, a_p} is the set of all symbols a such that δ(q_i, a) = q_j.
Note that if M has no direct transitions from q_i to q_j, then r_ij0 is ∅ (it is ε if i = j and there is no "loop" on that state).

DFA → Reg. Exp. Proof pt. 2
Recursive definition (k > 0): R_ijk is R_ij(k-1) ∪ R_ik(k-1) (R_kk(k-1))* R_kj(k-1)
Induction hypothesis: For each ℓ and m, there is a regular expression r_ℓm(k-1) such that L(r_ℓm(k-1)) = R_ℓm(k-1).
Induction step: By the recursive parts of the definition of regular expressions and the languages they define, and by the above recursive definition of R_ijk:
R_ijk = L(r_ij(k-1) ∪ r_ik(k-1) (r_kk(k-1))* r_kj(k-1))

DFA → Reg. Exp. Proof pt. 3
We showed by induction that each R_ijk is defined by some regular expression r_ijk.
In particular, for all q_j ∈ A, there is a regular expression r_1jn that defines R_1jn.
Then L(M) = L(r_1(j_1)n ∪ … ∪ r_1(j_p)n), where A = {q_(j_1), …, q_(j_p)}

An Example
(Diagram: DFSM with start state q_1 and states q_2, q_3; edge labels 0, 0, 1, 1, and "0,1".)

           k = 0      k = 1      k = 2
r_11k      ε          ε          (00)*
r_12k      0          0          0(00)*
r_13k      1          1          0*1
r_21k      0          0          0(00)*
r_22k      ε          ε ∪ 00     (00)*
r_23k      1          1 ∪ 01     0*1
r_31k      ∅          ∅          (0 ∪ 1)(00)*0
r_32k      0 ∪ 1      0 ∪ 1      (0 ∪ 1)(00)*
r_33k      ε          ε          ε ∪ (0 ∪ 1)0*1
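The k = 2 column can be cross-checked against the definition of R_ijk by brute force. The sketch below reads the transitions off the k = 0 column (for example, r_12 = 0 means δ(q_1, 0) = q_2), which is an inference made here because the diagram itself is not in the transcript. The names `DELTA`, `in_R`, and `K2` are assumptions of this illustration, and the expressions are written in Python `re` syntax with `|` for ∪.

```python
import re
from itertools import product

# Transitions inferred from the k = 0 column of the table.
DELTA = {(1, "0"): 2, (1, "1"): 3,
         (2, "0"): 1, (2, "1"): 3,
         (3, "0"): 2, (3, "1"): 2}

def in_R(i, j, k, w):
    """True if w takes q_i to q_j and every intermediate state is numbered <= k."""
    q = i
    for pos, c in enumerate(w):
        q = DELTA[q, c]
        if pos < len(w) - 1 and q > k:   # a state entered and later left
            return False
    return q == j

# The k = 2 column, translated to Python syntax (ε is the empty alternative).
K2 = {
    (1, 1): r"(00)*",       (1, 2): r"0(00)*",     (1, 3): r"0*1",
    (2, 1): r"0(00)*",      (2, 2): r"(00)*",      (2, 3): r"0*1",
    (3, 1): r"(0|1)(00)*0", (3, 2): r"(0|1)(00)*", (3, 3): r"|(0|1)0*1",
}

all_words = ["".join(t) for n in range(7) for t in product("01", repeat=n)]
for (i, j), pattern in sorted(K2.items()):
    ok = all((re.fullmatch(pattern, w) is not None) == in_R(i, j, 2, w)
             for w in all_words)
    print(f"r_{i}{j}2 = {pattern!r}: {ok}")   # True for every entry
```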
