Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 3 Regular Expressions, Nondeterminism, and Kleene’s Theorem Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction.

Similar presentations


Presentation on theme: "Chapter 3 Regular Expressions, Nondeterminism, and Kleene’s Theorem Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction."— Presentation transcript:

1 Chapter 3 Regular Expressions, Nondeterminism, and Kleene’s Theorem Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1

2 Introduction to Computation2 Regular Languages and Regular Expressions Many simple languages can be expressed by a formula involving languages containing a single string of length 1 and the operations of union, concatenation and Kleene star. Here are three examples –Strings ending in aa: {a, b}* {aa} –(This is a simplification of ({a} ∪ {b})*{a}{a} ) –Strings containing ab or bba: {a, b}* {ab, bba} {a, b}* –The language {a, b}* {aa, aab}* {b} These are called regular languages

3 Regular Languages and Regular Expressions (cont’d.) Definition: If  is an alphabet, the set R of regular languages over  is defined as follows: –The language  is an element of R, and for every   , the language {  } is in R –For every two languages L 1 and L 2 in R, the three languages L 1 ∪ L 2, L 1 L 2, and L 1 * are elements of R Examples: –{  }, because  * = {  } –{a, b}*{aa} = ({a} ∪ {b})* ({a}{a}) Introduction to Computation3

4 Regular Languages and Regular Expressions (cont’d.) A regular expression for a language is a slightly more user-friendly formula –Parentheses replace curly braces, and are used only when needed, and the union symbol is replaced by + Regular languageRegular Expression  {}{}  {a,b}*(a+b)* {aab}*{a,ab}(aab)*(a+ab) Introduction to Computation4

5 Regular Languages and Regular Expressions (cont’d.) Two regular expressions are equal if the languages they describe are equal. For example, –(a*b*)* = (a+b)* –(a+b)*ab(a+b)*+b*a* = (a+b)* The first half of the left-hand expression describes the strings that contain the substring ab and the second half describes those that don’t Introduction to Computation5

6 Nondeterministic Finite Automata Kleene’s Theorem asserts that regular languages are precisely the languages accepted by finite automata To prove this, we introduce a more general “device,” a nondeterministic finite automaton (NFA) –These make it much easier to start with a regular expression and draw a transition diagram for a device that accepts the corresponding language and has an obvious connection to the regular expression –We will show that allowing nondeterminism doesn’t change the languages that can be accepted Introduction to Computation6

7 Nondeterministic Finite Automata (cont’d.) This NFA closely resembles the regular expression (aa + aab)*b –The top loop is aa –The bottom loop is aab –By following the links we can generate any string in the language This is not the transition diagram for an FA; some nodes have more than one arc leaving them, some have none Introduction to Computation7

8 Nondeterministic Finite Automata (cont’d.) For this reason, we should not think of an NFA as describing an algorithm for recognizing a language Instead, consider it as describing a number of different sequences of steps that might be followed Introduction to Computation8

9 Nondeterministic Finite Automata (cont’d.) This is the “computation tree” for aaaabaab –Each level corresponds to a prefix of the input string –Each state on a level is one the machine could be in after processing that prefix –There is an accepting path for the input string (as well as other paths that are not accepting) Introduction to Computation9

10 Nondeterministic Finite Automata (cont’d.) Definition: A nondeterministic finite automaton (NFA) is a 5-tuple (Q, , q 0, A,  ), where: –Q is a finite set of states, –  is a finite input alphabet –q 0  Q is the initial state –A  Q is the set of accepting states –  : Q  (  ∪ {  })  2 Q is the transition function. (The values of  are not single states, but sets of states) For every element q of Q and every element  of   {  }, we interpret  (q,  ) as the set of states to which the NFA can move from state q on input  Introduction to Computation10

11 Nondeterministic Finite Automata (cont’d.) Defining  * is a little harder than for an FA, since  *(q, x) is a set, as is  (p,  ) for any p in the first set: ∪ {  (p,  ) | p   *(q, x)) is a first step towards  * We must also consider  -transitions, which could potentially occur at any stage Introduction to Computation11

12 Nondeterministic Finite Automata (cont’d.) Definition: Suppose M = (Q, , q 0, A,  ) is an NFA, and S  Q is a set of states –The  -closure of S is the set  (S) that can be defined recursively as follows: S   (S) For every q   (S),  (q,  )   (S) Introduction to Computation12

13 Nondeterministic Finite Automata (cont’d.) As for any finite set that is defined recursively, we can easily formulate an algorithm to calculate  (S): –Initialize T to be S, as in the basis part of the definition –Make a sequence of passes, in each pass considering every q  T and adding every state in  (q,  ) not already there –Stop after the first pass in which T does not change –The final value of T is  (S) Introduction to Computation13

14 Nondeterministic Finite Automata (cont’d.) Definition: Let M=(Q, , q 0, A,  ) be an NFA Define the extended transition function  * : Q   *  2 Q as follows: –For every q  Q,  *(q,  ) =  ({q}) –For every q  Q, every y   *, and every     *(q, y  ) =  ( ∪ {  (p,  ) | p   *(q, y)}) –A string x   * is accepted by M if  *(q 0, x) ∩ A   (i.e., some sequence of transitions involving the symbols of x and  ’s leads from q 0 to an accepting state) The language L(M) accepted by M is the set of all strings accepted by M Introduction to Computation14

15 The Nondeterminism in an NFA Can Be Eliminated Two types of nondeterminism have arisen: –Different arcs for the same input symbol, and  -transitions –Both can be eliminated For the second type, introduce new transitions so that we no longer need the  -transitions –When there is no  -transition from p to q but the NFA can go from p to q by using one or more  -transitions as well as , we introduce the  -transition –The resulting NFA may have more nondeterminism of the first type, but it will have no  -transitions Introduction to Computation15

16 The Nondeterminism in an NFA Can Be Eliminated (cont’d.) Theorem: For every language L   * accepted by an NFA M = (Q, , q 0, A,  ), there is an NFA M 1 with no  -transitions that also accepts L Define M 1 = (Q, , q 0, A 1,  1 ), where for every q  Q,  1 (q,  ) = , and for every q  Q and every  ,  1 (q,  ) =  *(q,  ) Define A 1 = A ∪ {q 0 } if   L, and A 1 = A otherwise We can prove, by structural induction on x, that for every q and every x with |x|  1,  1 *(q, x) =  * (q, x) Introduction to Computation16

17 The Nondeterminism in an NFA Can Be Eliminated (cont’d.) Theorem: For every language L   * accepted by an NFA M = (Q, , q 0, A,  ), there is an FA M 1 = (Q 1, , q 1, A 1,  1 ) that also accepts L We can assume M has no  -transitions. Let Q 1 = 2 Q (for this reason, this is called the subset construction); q 1 = {q 0 }; for every q  Q 1 and   ,  1 (q,  ) = ∪ {  (p,  ) | p  q}; A 1 = {q  Q 1 | q ∩ A   } M 1 is clearly an FA –It accepts the same language as M because for every x   *,  1 *(q 1, x) =  *(q 0, x) The proof is by structural induction on x Introduction to Computation17

18 Kleene’s Theorem, Part 1 Theorem: For every alphabet , every regular language over  can be accepted by a finite automaton Because of what we have just shown, it is enough to show that every regular language over  can be accepted by an NFA The proof is by structural induction, based on the recursive definition of the set of regular languages over  Introduction to Computation18

19 Kleene’s Theorem, Part 1 (cont’d.) The basis cases are easy The machines pictured below accept the languages  and {  }, respectively The next slide shows schematic diagrams of NFAs that accept L(M 1 ) ∪ L(M 2 ), L(M 1 )L(M 2 ), and L(M 1 )* (each FA is shown as having 2 accepting states) Introduction to Computation19

20 Kleene’s Theorem, Part 1 (cont’d.) Introduction to Computation20

21 Kleene’s Theorem, Part 1 (cont’d.) By using these constructions, we can create for every regular expression an NFA that accepts the corresponding language Introduction to Computation21

22 Kleene’s Theorem, Part 2 Theorem: For every finite automaton M=(Q, , q 0, A,  ), the language L(M) is regular Proof: First, for two states p and q, we define the language L(p, q) = {x   * |  *(p, x)=q} If we can show that for every p and q in Q, L(p, q) is regular, then it will follow that L(M) is, because –L(M) = ∪ {L(q 0, q) | q  A} –The union of a finite collection of regular languages is regular We will show that L(p, q) is regular by expressing it in terms of simpler languages that are regular Introduction to Computation22

23 Kleene’s Theorem, Part 2 (cont’d.) We will consider the distinct states through which M passes as it moves from p to q If x  L(p, q), we say x causes M to go from p to q through a state r if there are non-null strings x 1 and x 2 such that x = x 1 x 2,  *(p, x 1 ) = r, and  *(r, x 2 ) = q –In using a string of length 1 to go from p to q, M does not go through any state –In using a string of length n  2, it goes through a state n - 1 times (but if n > 2, these states may not be distinct) Introduction to Computation23

24 Kleene’s Theorem, Part 2 (cont’d.) Assume Q has n elements numbered 1 to n For p, q  Q and j  0, we let L(p, q, j) be the set of strings in L(p, q) that cause M to go from p to q without going through any state numbered higher than j Suppose that for some number k  0, L(p, q, k) is regular for every p, q  Q and consider how a string can be in L(p, q, k+1) –The easiest way is for it to be in L(p, q, k) –If not, it causes M to go to k+1 one or more times, but M goes through nothing higher Introduction to Computation24

25 Kleene’s Theorem, Part 2 (cont’d.) Every string in L(p, q, k+1) can be described in one of those two ways and every string that has one of these two forms is in L(p, q, k+1). This leads to the formula –L(p, q, k+1) = L(p, q, k) ∪ L(p, k+1, k) L(k+1, k+1, k)* L(k+1, q, k) This is the starting point for a proof by induction on k and for an algorithm Introduction to Computation25


Download ppt "Chapter 3 Regular Expressions, Nondeterminism, and Kleene’s Theorem Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction."

Similar presentations


Ads by Google