MA/CSSE 474 Theory of Computation Regular Expressions Intro.

1 MA/CSSE 474 Theory of Computation Regular Expressions Intro

2 Your Questions?
Previous class days' material; reading assignments; HW5 problems; anything else. Still more language ambiguity!

3 Regular Languages
A regular expression describes a regular language. A finite state machine accepts a regular language.

4 Regular Expressions
The regular expressions over an alphabet Σ are the strings that can be obtained as follows:
1. ∅ is a regular expression.
2. ε is a regular expression.
3. Every element of Σ is a regular expression.
4. If α, β are regular expressions, then so is αβ.
5. If α, β are regular expressions, then so is α ∪ β.
6. If α is a regular expression, then so is α*.
7. If α is a regular expression, then so is α+.
8. If α is a regular expression, then so is (α).
Rule 7 is here for convenience only (syntactic sugar); many authors do not include + in the list of r.e. builders.

5 Regular Expression Examples
If Σ = {a, b}, the following are regular expressions:
∅
ε
a
(a ∪ b)*
(abba ∪ ε)+ (a ∪ bab)

6 Regular Expressions Define Languages
Define L, a semantic interpretation function for regular expressions (let α and β be arbitrary regular expressions over alphabet Σ).
1. L(∅) = ∅.
2. L(ε) = {ε}.
3. If c ∈ Σ, L(c) = {c}.
4. L(αβ) = L(α) L(β).
5. L(α ∪ β) = L(α) ∪ L(β).
6. L(α*) = (L(α))*.
7. L(α+) = L(αα*) = L(α) (L(α))*. If L(α) is equal to ∅, then L(α+) is also equal to ∅. Otherwise L(α+) is the language that is formed by concatenating together one or more strings drawn from L(α).
8. L((α)) = L(α).
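The semantic function L can be tried out directly on small cases. The following is a minimal Python sketch, not part of the slides: the tuple encoding of expressions and the length bound n are assumptions made so that the (possibly infinite) languages stay finite. Rule 8 needs no case of its own because grouping is already captured by the nesting of the tuples.

```python
# Hypothetical encoding of regular expressions as nested tuples:
#   ("empty",)       the expression ∅
#   ("eps",)         the expression ε
#   ("sym", c)       a single symbol c of Σ
#   ("cat", x, y)    concatenation xy
#   ("union", x, y)  union x ∪ y
#   ("star", x)      x*
#   ("plus", x)      x+

def concat(A, B, n):
    """Concatenate two languages, keeping only strings of length <= n."""
    return {u + v for u in A for v in B if len(u) + len(v) <= n}

def star(A, n):
    """Kleene star of A, truncated to strings of length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        # Append one more string from A to everything found in the last round.
        frontier = concat(frontier, A, n) - result
        result |= frontier
    return result

def L(regex, n):
    """The strings of length <= n in the language defined by regex."""
    kind = regex[0]
    if kind == "empty":                       # rule 1
        return set()
    if kind == "eps":                         # rule 2
        return {""}
    if kind == "sym":                         # rule 3
        return {regex[1]} if n >= 1 else set()
    if kind == "cat":                         # rule 4
        return concat(L(regex[1], n), L(regex[2], n), n)
    if kind == "union":                       # rule 5
        return L(regex[1], n) | L(regex[2], n)
    if kind == "star":                        # rule 6
        return star(L(regex[1], n), n)
    if kind == "plus":                        # rule 7: L(α+) = L(α) (L(α))*
        inner = L(regex[1], n)
        return concat(inner, star(inner, n), n)
    raise ValueError(f"unknown node: {kind}")

a, b = ("sym", "a"), ("sym", "b")
ends_in_b = ("cat", ("star", ("union", a, b)), b)   # (a ∪ b)*b
print(sorted(L(ends_in_b, 2)))
```

With this encoding, L(("star", ("empty",)), n) is {""} and L(("plus", x), n) always equals L(("cat", x, ("star", x)), n), which is one way to see why rules 2 and 7 are only conveniences.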

7 The Role of the Rules
Rules 1, 3, 4, 5, and 6 give the language its power to define sets.
Rule 8 has as its only role grouping other operators.
Rules 2 and 7 appear to add functionality to the regular expression language, but they don’t: ε defines the same language as ∅*, and α+ defines the same language as αα*.
2. ε is a regular expression.
7. If α is a regular expression, then so is α+.

8 Operator Precedence in Regular Expressions
           Regular Expressions    Arithmetic Expressions
Highest    Kleene star and +      exponentiation
           concatenation          multiplication
Lowest     union                  addition
Example    a b* ∪ c d*            x y^2 + i j^2
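The same precedence order can be observed in Python's re module, which writes union as "|". This is only an illustrative sketch: the helper name matches and the sample patterns are assumptions, and re is a practical regex dialect rather than the formal notation of the slides.

```python
import re

def matches(pattern, s):
    """True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

IMPLICIT = r"ab*|cd*"            # a b* ∪ c d*, relying on precedence
EXPLICIT = r"(a(b*))|(c(d*))"    # the same expression, fully parenthesized
REGROUPED = r"a(b*|c)d*"         # a different grouping, hence a different language

samples = ["", "a", "abbb", "c", "cdd", "ad", "acd", "abab"]
for s in samples:
    # Precedence alone yields the fully parenthesized reading.
    assert matches(IMPLICIT, s) == matches(EXPLICIT, s)

print([s for s in samples if matches(IMPLICIT, s) != matches(REGROUPED, s)])
```

The printed list contains the sample strings on which the regrouped expression disagrees with the original, showing that parentheses (rule 8) are needed to override the default order.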

9 Analyzing a Regular Expression
L((a ∪ b)*b) = L((a ∪ b)*) L(b)
= (L((a ∪ b)))* L(b)
= (L(a) ∪ L(b))* L(b)
= ({a} ∪ {b})* {b}
= {a, b}* {b}.

10 From English to reg exps
L = {w ∈ {a, b}*: |w| is even}
L = {w ∈ {0, 1}*: w is a binary representation of a multiple of 4}
L = {w ∈ {a, b}*: w contains an odd number of a’s}

11 Hidden: Going the Other Way
L = {w ∈ {a, b}*: |w| is even}
((a ∪ b)(a ∪ b))*
(aa ∪ ab ∪ ba ∪ bb)*
L = {w ∈ {0, 1}*: w is a binary representation of a multiple of 4}
0 ∪ 1(0 ∪ 1)*00
L = {w ∈ {a, b}*: w contains an odd number of a’s}
b* (ab*ab*)* a b*
b* a b* (ab*ab*)*
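A candidate regular expression for an English description can be sanity-checked by brute force over all short strings. The sketch below uses Python's re module ("|" for ∪); the helper names, the length bound, and the reading of "binary representation" as "no leading zeros, nonempty" are assumptions. Agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over alphabet with length <= max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def agrees(pattern, predicate, alphabet, max_len=8):
    """True if pattern and predicate accept exactly the same short strings."""
    regex = re.compile(pattern)
    return all((regex.fullmatch(w) is not None) == predicate(w)
               for w in strings(alphabet, max_len))

def is_multiple_of_4(w):
    # Assumption: no leading zeros, and the empty string is not a number.
    if w == "" or (len(w) > 1 and w[0] == "0"):
        return False
    return int(w, 2) % 4 == 0

def even_length(w):
    return len(w) % 2 == 0

def odd_number_of_a(w):
    return w.count("a") % 2 == 1

results = {
    "((a ∪ b)(a ∪ b))*":    agrees(r"((a|b)(a|b))*", even_length, "ab"),
    "(aa ∪ ab ∪ ba ∪ bb)*": agrees(r"(aa|ab|ba|bb)*", even_length, "ab"),
    "0 ∪ 1(0 ∪ 1)*00":      agrees(r"0|1(0|1)*00", is_multiple_of_4, "01"),
    "b*(ab*ab*)*ab*":       agrees(r"b*(ab*ab*)*ab*", odd_number_of_a, "ab"),
    "b*ab*(ab*ab*)*":       agrees(r"b*ab*(ab*ab*)*", odd_number_of_a, "ab"),
}
for expression, ok in results.items():
    print(expression, "agrees" if ok else "DISAGREES")
```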

12 The Details Matter
a* ∪ b* ≠ (a ∪ b)*
(ab)* ≠ a*b*
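Each inequality can be witnessed by a single string that belongs to one language but not the other. The following Python sketch searches for the shortest such string; the function names and the search bound are assumptions, and Python's re syntax ("|" for ∪) stands in for the slide notation.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield strings over alphabet in order of increasing length."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def first_difference(p1, p2, alphabet="ab", max_len=6):
    """Return the first string accepted by exactly one pattern, or None."""
    r1, r2 = re.compile(p1), re.compile(p2)
    for w in strings(alphabet, max_len):
        if (r1.fullmatch(w) is None) != (r2.fullmatch(w) is None):
            return w
    return None

print(first_difference(r"a*|b*", r"(a|b)*"))  # ab: in (a ∪ b)* but not in a* ∪ b*
print(first_difference(r"(ab)*", r"a*b*"))    # a: in a*b* but not in (ab)*
```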

13 More Regular Expression Examples
L((aa*) ∪ ε) =
L((a ∪ ε)*) =
L = {w ∈ {a, b}*: there is no more than one b in w}
L = {w ∈ {a, b}*: no two consecutive letters in w are the same}

14 The Details Matter
L_1 = {w ∈ {a, b}*: every a is immediately followed by a b}
A regular expression for L_1:
An FSM for L_1:
L_2 = {w ∈ {a, b}*: every a has a matching b somewhere}
A regular expression for L_2:
An FSM for L_2:

15 Kleene’s Theorem Finite state machines and regular expressions define the same class of languages. To prove this, we must show: Theorem: Any language that can be defined by a regular expression can be accepted by some FSM and so is regular. Theorem: Every regular language (i.e., every language that can be accepted by some DFSM) can be defined with a regular expression.

16 For Every Regular Expression There is a Corresponding FSM
We’ll show this by construction. An FSM for:
∅:
A single element c of Σ:
ε:

17 Union
If α is the regular expression β ∪ γ and if both L(β) and L(γ) are regular:

18 Concatenation
If α is the regular expression βγ and if both L(β) and L(γ) are regular:

19 Kleene Star
If α is the regular expression β* and if L(β) is regular:

20 An Example
(b ∪ ab)*
An FSM for b
An FSM for a
An FSM for b
An FSM for ab:

21 An Example
(b ∪ ab)*
An FSM for (b ∪ ab):

22 An Example
(b ∪ ab)*
An FSM for (b ∪ ab)*:
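The machine diagrams did not survive in this transcript, so the following Python sketch builds the machines with one common version of the construction (ε-transitions gluing smaller machines together). It is an assumption that the slides' diagrams match it in every detail; the class and function names are hypothetical.

```python
from itertools import count

_ids = count()  # source of fresh state names, so machines never share states

class NFA:
    """Nondeterministic FSM with epsilon moves (label None means epsilon)."""
    def __init__(self, start, accepting, edges):
        self.start = start
        self.accepting = set(accepting)
        self.edges = edges  # dict: (state, label) -> set of target states

def _merge(*machines):
    """Copy the transitions of several machines into one fresh dict."""
    edges = {}
    for m in machines:
        for key, targets in m.edges.items():
            edges.setdefault(key, set()).update(targets)
    return edges

def _add(edges, src, label, dst):
    edges.setdefault((src, label), set()).add(dst)

def empty():                      # machine for ∅: no accepting state
    return NFA(next(_ids), [], {})

def epsilon():                    # machine for ε: the start state accepts
    s = next(_ids)
    return NFA(s, [s], {})

def symbol(c):                    # machine for a single element c of Σ
    s, t = next(_ids), next(_ids)
    return NFA(s, [t], {(s, c): {t}})

def union(m1, m2):                # new start state with epsilon moves to both
    s = next(_ids)
    edges = _merge(m1, m2)
    _add(edges, s, None, m1.start)
    _add(edges, s, None, m2.start)
    return NFA(s, m1.accepting | m2.accepting, edges)

def concat(m1, m2):               # accepting states of m1 lead into m2
    edges = _merge(m1, m2)
    for q in m1.accepting:
        _add(edges, q, None, m2.start)
    return NFA(m1.start, m2.accepting, edges)

def star(m):                      # new accepting start; loop back from accepting states
    s = next(_ids)
    edges = _merge(m)
    _add(edges, s, None, m.start)
    for q in m.accepting:
        _add(edges, q, None, m.start)
    return NFA(s, m.accepting | {s}, edges)

def closure(m, states):
    """All states reachable from states using only epsilon moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in m.edges.get((q, None), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(m, w):
    current = closure(m, {m.start})
    for c in w:
        moved = set()
        for q in current:
            moved |= m.edges.get((q, c), set())
        current = closure(m, moved)
    return bool(current & m.accepting)

# (b ∪ ab)*, assembled bottom-up in the same order as the example slides.
machine = star(union(symbol("b"), concat(symbol("a"), symbol("b"))))
print([w for w in ["", "b", "ab", "bab", "a", "ba", "aab"] if accepts(machine, w)])
```

Each call to symbol creates fresh states, so every sub-machine should be built separately rather than reused in two places; reusing one NFA object twice would make the two copies share states.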

23 For Every FSM There is a Corresponding Regular Expression
We’ll show this by construction. The construction is different from the textbook's.
Let M = ({q_1, …, q_n}, Σ, δ, q_1, A) be a DFSM. Define R_{i,j,k} to be the set of all strings x ∈ Σ* such that (q_i, x) |-*_M (q_j, ε), and if (q_i, y) |-*_M (q_ℓ, ε) for any prefix y of x (except y = ε and y = x), then ℓ ≤ k.
That is, R_{i,j,k} is the set of all strings that take us from q_i to q_j without passing through any intermediate states numbered higher than k. In this case, "passing through" means both entering and leaving. Note that either i or j (or both) may be greater than k.

24 Example: R_{i,j,k}
R_{i,j,k} is the set of all strings that take us from q_i to q_j without passing through any intermediate states numbered higher than k. In this case, "passing through" means both entering and leaving. Note that either i or j (or both) may be greater than k.
R_{0,0,0}
R_{0,1,0}
R_{0,1,1}
R_{0,2,1}
R_{0,2,2}
R_{2,3,2}
R_{2,3,3}

25 DFA → Reg. Exp. construction
R_{i,j,k} is the set of all strings that take M from q_i to q_j without passing through any intermediate states numbered higher than k.
Examples: R_{i,j,n} is
Also note that L(M) is the union of R_{1,j,n} over all q_j in A.
We will show that for all i, j ∈ {1, …, n} and all k ∈ {0, …, n}, R_{i,j,k} is defined by a regular expression.
We already know that the union of languages defined by reg. exps. is defined by a reg. exp.

26 DFA → Reg. Exp. continued
R_{i,j,k} is the set of all strings that take M from q_i to q_j without passing through any intermediate states numbered higher than k. It can be computed recursively:
Base cases (k = 0):
If i ≠ j, R_{i,j,0} = {a ∈ Σ : δ(q_i, a) = q_j}
If i = j, R_{i,i,0} = {a ∈ Σ : δ(q_i, a) = q_i} ∪ {ε}
Recursive case (k > 0):
R_{i,j,k} is R_{i,j,k-1} ∪ R_{i,k,k-1} (R_{k,k,k-1})* R_{k,j,k-1}
We show by induction that each R_{i,j,k} is defined by some regular expression r_{i,j,k}.
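The recursion can be run directly on languages if every set is truncated to strings of length at most n. The Python sketch below does that and compares the result with a direct simulation of the machine. The two-state DFA for "odd number of a's" is a hypothetical example chosen for illustration, and all function names are assumptions.

```python
from itertools import product

def concat(A, B, n):
    """Concatenate two languages, keeping only strings of length <= n."""
    return {u + v for u in A for v in B if len(u) + len(v) <= n}

def star(A, n):
    """Kleene star of A, truncated to strings of length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, A, n) - result
        result |= frontier
    return result

def r_sets(num_states, delta, alphabet, n):
    """Return R with R[i, j] equal to R_{i,j,num_states}, truncated to length <= n (n >= 1)."""
    states = range(1, num_states + 1)
    # Base cases (k = 0): direct transitions, plus the empty string when i = j.
    R = {}
    for i in states:
        for j in states:
            R[i, j] = {a for a in alphabet if delta[i, a] == j}
            if i == j:
                R[i, j].add("")
    # Recursive case: allow state k as an intermediate state, for k = 1, 2, ...
    for k in states:
        R = {(i, j): R[i, j] | concat(concat(R[i, k], star(R[k, k], n), n), R[k, j], n)
             for i in states for j in states}
    return R

def dfa_accepts(delta, accepting, w):
    """Run the DFA from start state 1 on w."""
    q = 1
    for c in w:
        q = delta[q, c]
    return q in accepting

# Hypothetical DFA: state 1 = even number of a's (start), state 2 = odd (accepting).
DELTA = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 2}
ACCEPTING = {2}
N = 6

R = r_sets(2, DELTA, "ab", N)
language = set().union(*(R[1, j] for j in ACCEPTING))   # union of R_{1,j,n} over q_j in A
by_simulation = {"".join(p) for m in range(N + 1) for p in product("ab", repeat=m)
                 if dfa_accepts(DELTA, ACCEPTING, "".join(p))}
print(language == by_simulation)
```

Truncating at every step is safe because any string of length at most n in a concatenation or star is built from pieces that are themselves no longer than n.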

27 DFA → Reg. Exp. Proof pt. 1
Base case definition (k = 0):
If i ≠ j, R_{i,j,0} = {a ∈ Σ : δ(q_i, a) = q_j}
If i = j, R_{i,i,0} = {a ∈ Σ : δ(q_i, a) = q_i} ∪ {ε}
Base case proof: R_{i,j,0} is a finite set of strings, each of which is either ε or a single symbol from Σ. So R_{i,j,0} can be defined by the reg. exp. r_{i,j,0} = a_1 ∪ a_2 ∪ … ∪ a_p (or a_1 ∪ a_2 ∪ … ∪ a_p ∪ ε if i = j), where {a_1, a_2, …, a_p} is the set of all symbols a such that δ(q_i, a) = q_j.
Note that if M has no direct transitions from q_i to q_j, then r_{i,j,0} is ∅ (it is ε if i = j and there is no "loop" on that state).

28 DFA → Reg. Exp. Proof pt. 2
Recursive definition (k > 0): R_{i,j,k} is R_{i,j,k-1} ∪ R_{i,k,k-1} (R_{k,k,k-1})* R_{k,j,k-1}
Induction hypothesis: For each i and j, there is a regular expression r_{i,j,k-1} such that L(r_{i,j,k-1}) = R_{i,j,k-1}.
Induction step: By the recursive parts of the definition of regular expressions and the languages they define, and by the above recursive definition of R_{i,j,k}:
R_{i,j,k} = L(r_{i,j,k-1} ∪ r_{i,k,k-1} (r_{k,k,k-1})* r_{k,j,k-1})

29 DFA → Reg. Exp. Proof pt. 3
We showed by induction that each R_{i,j,k} is defined by some regular expression r_{i,j,k}.
In particular, for all q_j ∈ A, there is a regular expression r_{1,j,n} that defines R_{1,j,n}.
Then L(M) = L(r_{1,j_1,n} ∪ … ∪ r_{1,j_p,n}), where A = {q_{j_1}, …, q_{j_p}}

30 An Example
[Figure: DFA with start state q_1 and states q_2, q_3; edge labels 0, 0, 1, 1, and 0,1]

             k = 0     k = 1      k = 2
r_{1,1,k}    ε         ε          (00)*
r_{1,2,k}    0         0          0(00)*
r_{1,3,k}    1         1          0*1
r_{2,1,k}    0         0          0(00)*
r_{2,2,k}    ε         ε ∪ 00     (00)*
r_{2,3,k}    1         1 ∪ 01     0*1
r_{3,1,k}    ∅         ∅          (0 ∪ 1)(00)*0
r_{3,2,k}    0 ∪ 1     0 ∪ 1      (0 ∪ 1)(00)*
r_{3,3,k}    ε         ε          ε ∪ (0 ∪ 1)0*1
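The table entries can be checked against the definition of R_{i,j,k} by brute force. In the Python sketch below the transition function is read off the k = 0 column of the table (an assumption, since the diagram is not reproduced here), and each entry is transcribed into Python re syntax: "|" for ∪, the empty pattern for ε, None for ∅, and "(...)?" for "ε ∪ ...". The function names and the length bound are hypothetical.

```python
import re
from itertools import product

# Transitions implied by the k = 0 column: r_{1,2,0} = 0, r_{1,3,0} = 1, and so on.
DELTA = {(1, "0"): 2, (1, "1"): 3,
         (2, "0"): 1, (2, "1"): 3,
         (3, "0"): 2, (3, "1"): 2}

# (i, j) -> patterns for k = 0, 1, 2.
TABLE = {
    (1, 1): ("", "", "(00)*"),
    (1, 2): ("0", "0", "0(00)*"),
    (1, 3): ("1", "1", "0*1"),
    (2, 1): ("0", "0", "0(00)*"),
    (2, 2): ("", "(00)?", "(00)*"),
    (2, 3): ("1", "1|01", "0*1"),
    (3, 1): (None, None, "(0|1)(00)*0"),
    (3, 2): ("0|1", "0|1", "(0|1)(00)*"),
    (3, 3): ("", "", "((0|1)0*1)?"),
}

def in_R(i, j, k, w):
    """Definition of R_{i,j,k}: w leads from q_i to q_j, and every state
    entered before the last symbol is read is numbered at most k."""
    q = i
    for pos, c in enumerate(w):
        q = DELTA[q, c]
        if pos < len(w) - 1 and q > k:
            return False
    return q == j

def table_is_consistent(max_len=7):
    """Compare every table entry with the definition on all short strings."""
    for (i, j), row in TABLE.items():
        for k, pattern in enumerate(row):
            for n in range(max_len + 1):
                for letters in product("01", repeat=n):
                    w = "".join(letters)
                    by_table = pattern is not None and re.fullmatch(pattern, w) is not None
                    if by_table != in_R(i, j, k, w):
                        return False
    return True

print(table_is_consistent())
```

The check stops at k = 2, as the table does; one more application of the recursive case with k = 3 would give the expressions r_{1,j,3} needed for L(M).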

