Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 07: Theory of Automata:08 Properties of Regular Languages.

Similar presentations


Presentation on theme: "Lecture 07: Theory of Automata:08 Properties of Regular Languages."— Presentation transcript:

1 Lecture 07: Theory of Automata:08 Properties of Regular Languages

2 Lecture 07: Theory of Automata:08 2 Lecture Objective Closure Properties Complementation Intersection

3 Lecture 07: Theory of Automata:08 3 Closure Properties A language that can be defined by a regular expression is called a regular language. Not all languages are regular, as we shall see in the next lecture. In this lecture we will focus on the class of all regular languages and discuss some of their properties.

4 Lecture 07: Theory of Automata:08 4 Theorem 10 If L 1 and L 2 are regular languages, then L 1 + L 2, L 1 L 2, and L* 1 are also regular languages. Notes: L 1 + L 2 is the language of all words in either L 1 or L 2. L 1 L 2 is the product language of all words formed by concatenating a word form L 1 with a word from L 2. L 1 is the language of all words that are the concatenation of arbitrarily many factors from L 1 *. The result stated in Theorem 10 is often expressed as The set of regular languages is closed under union, concatenation, and Kleene closure.

5 Lecture 07: Theory of Automata:08 5 Proof by Machines Because L 1 and L 2 are regular languages, there must be TGs that accept them (by Kleenes theorem). Let TG 1 accepts L 1 and TG 2 accepts L 2. Assume that TG 1 and TG 2 each have a unique start state and a unique separate final state. If this is not the case originally, then we can modify the TGs so that this becomes true as in Kleenes theorem,Part 2 of the proof (page 93).

6 Lecture 07: Theory of Automata:08 6 Proof contd. Then the TG described below accepts the language L 1 + L 2. By Kleenes theorem, since L 1 + L 2 is defined by this TG, it is also defined by a regular expression and hence is a regular language.

7 Lecture 07: Theory of Automata:08 7 The TG described below accepts the language L 1 L 2 where state 1 is the former + of TG 1 and state 2 is the former - of TG 2. Since L 1 L 2 is defined by this TG, it is also defined by a regular expression by Kleenes theorem, and therefore it is a regular language.

8 Lecture 07: Theory of Automata:08 8 The TG described below accepts the language L 1 *. We begin at the - state and trace a path to the + state of TG1. At this point, we cold stop and accept the string or jump back, at no cost, to the - state and run another segment of the input string.

9 Lecture 07: Theory of Automata:08 Complements. Intersections

10 Lecture 07: Theory of Automata:08 10 Complements Definition: If L is a language over the alphabet, we define its complement L` to be the language of all strings of letters from that are not words in L. Example: – Let L be the language over the alphabet = {a; b} of all words that have a double a in them. – Then, L` is the language of all words that do not have a double a in them. Note that the complement of L` is L. That is (L`)` = L

11 Lecture 07: Theory of Automata:08 11 Theorem 11 If L is a regular language, then L` is also a regular language. In other words, the set of regular languages is closed under complementation.

12 Lecture 07: Theory of Automata:08 12 Proof of theorem 11 If L is a regular language, then by Kleenes theorem, there is some FA that accepts the language L. Some states of this FA are final states and some are not. Let us reverse the status of each state: If it was a final state, make it a non-final state. If it was a non-final state, make it a final state. The start state gets reversed as follows: - ± If an input string formerly ended in a non-final state, it now ends in a final state, and vice versa. The new machine we have just built accepts all input strings that were not accepted by the original FA, and it rejects all the input strings that used to be accepted by FA. Therefore, this machine accepts exactly the language L`. So, by Kleenes theorem, L` is regular.

13 Lecture 07: Theory of Automata:08 13 Intersection: Theorem 12 If L 1 and L 2 are regular languages, then L 1 L 2 is also a regular language. In other words, the set of regular languages is closed under intersection.

14 Lecture 07: Theory of Automata:08 14 Proof of Theorem 12 By DeMorgans law (for sets of any kind): L 1 L 2 = (L` 1 + L` 2 )` This means that the language L 1 L 2 consists of all words that are not in either L` 1 or L` 2. Because L 1 and L 2 are regular, then so are L` 1 and L` 2 by Theorem 11. Since L` 1 and L` 2 are regular, so is L` 1 + L` 2 by Theorem 10. Now, since L` 1 + L` 2 is regular, so is (L` 1 + L` 2 )` by Theorem 11. This means L1 \ L2 is regular, because L1 L2 = (L` 1 + L` 2 )` by DeMorgans law.

15 Lecture 07: Theory of Automata:08 Pumping Lemma

16 Lecture 07: Theory of Automata:08 16 Introduction By using FAs and regular expressions, we have been able to define many languages. Although these languages have many different structures, they take only a few basic forms: – Languages with required substrings – Languages that forbid some substrings – Languages that begin or end with certain substrings – Languages with certain even (or odd) properties, and so on. We now turn our attention to some new forms, such as the language PALINDROME, or the language PRIME of all words a p, where p is a prime number. We shall see that neither of these is a regular language. We can describe them in English, but they can not be defined by an FA. We need to build more powerful machines to define them.

17 Lecture 07: Theory of Automata:08 17 Definition of Non-Regular Languages Definition: A language that cannot be defined by a regular expression is called a non-regular language. Notes: – By Kleenes theorem, a non-regular language can also not be accepted by any FA or TG. – All languages are either regular or non-regular; none are both.

18 Lecture 07: Theory of Automata:08 18 Case Study Consider the langugage L = {Λ; ab; aabb; aaabbb; aaaabbbb; aaaaabbbbb; …} The language L can also be written as L = {a n b n for n = 0; 1; 2; 3; …} or for short L = {a n b n } Note that although L is a subset of many regular languages, such as a*b*; the language defined by a*b* also includes such strings as aab and bb that are not in L.

19 Lecture 07: Theory of Automata:08 19 Example We shall show that the language L = {a n b n } is non-regular. Suppose on the contrary that L were regular. Then there must exist some FA that accepts L. Just for the sake of argument, let us assume that this FA has 95 states. We know that this FA must accept the word a 96 b 96. The first 96 letter as of this input string trace a path through this machine. The path cannot visit a new state when each input letter is read, because there are only 95 states. Therefore, at some point the path returns to a state that it has already visited. In other words, the path must contain a circuit in it. (A circuit is a loop that can be made of several edges.)

20 Lecture 07: Theory of Automata:08 20 So, the path first wanders up to the circuit and then starts to loop around the circuit (maybe many times) until a b is read from the input. At this point, the path can take a different turn following the b-edges, and eventually end up at a final state where the word a 96 b 96 is accepted. Just for the sake of argument again, let us say that the circuit that the a-edge path loops around has 7 states in it. Then what would happen to the input string a 96+7 b 96 ? Just as in the case of the input string a 96 b 96, the input string a 96+7 b 96 would produce a path through the machine and loop around the circuit exactly in the same way, but one more time than the path produced by the input string a 96 b 96. Both paths, at exactly the same state in the circuit, begin to branch off on the b-road.

21 Lecture 07: Theory of Automata:08 21 Once on the b-road, they both go the same 96 b-steps to arrive at the same final state. But this would mean that the input string a 103 b 96 is also accepted by this machine. This is a contradiction since we assume that this FA accepts exactly the words in L = {a n b n}. This contradiction means that the FA that accepts exactly the words in L does not exist. In other words, L is non-regular.

22 Lecture 07: Theory of Automata:08 22 In summary, we can always choose a word in L that is large enough so that its path through the FA has to contain a circuit. –Once we find that some path with a circuit can reach a final state, we ask ourselves what happens to a path that is just like the first one, but that loops around the circuit one extra time and then proceeds identically through the machine. –The new path also leads to the same final state, but it is generated by a different input string which is not in the language L. –We then can conclude that there is no FA that can accepts all the words in L and only the words in L. Therefore, L is non-regular. This idea is called the pumping lemma discovered by Bar-Hillel, Perles, and Shamir in It is called pumping because we pump more letters into the middle of the words. It is called lemma because it is used as a tool to prove other results (i.e., certain specific languages are non-regular).

23 Lecture 07: Theory of Automata:08 23 Theorem 13 Let L be any regular language that has infinitely many words. Then there exist some three strings x, y, and z (where y is not the null string) such that all the strings of the form xy n z for n = 1, 2, 3, … are words in L.

24 Lecture 07: Theory of Automata:08 24 Proof of theorem 13 Since L is regular, there is an FA that accepts exactly the words in L. Let w be some word in L that has more letters than there are states in FA. When w generates a path through the machine, the path cannot visit a new state for each letter read, because there are more letters than states. Therefore, the path must at some point revisit a state that it has already visited. In other words, the path contains a circuit in it.

25 Lecture 07: Theory of Automata:08 25 Proof of theorem 13 contd. Lets break the word w up into 3 parts: 1. Part 1: Starting at the beginning, let x denote all the letters of w that lead up to the first state that is revisited. Note that x may be the null string if the path revisits the start state as its first revisit. 2. Part 2: Starting at the letter after the substring x, let y denote the substring of w that travels around the circuit coming back to the same state the circuit began with. Because there must be a circuit, y cannot be the null string, and y contains the letters of w for exactly one loop around this circuit. 3. Part 3: Let z be the rest of w, starting at the letter after y and going to the end of the string w. Note that z could be null, or the path for z could also loop around the y-circuit or any other. That means that what z does is arbitrary. Clearly, from the definitions of these substrings, we have w = xyz. Recall that w is accepted by the FA.

26 Lecture 07: Theory of Automata:08 26 Proof of Theorem 13 contd. What is the path for the input string xyyz? This path follows the path for w in the first part x and leads up to the beginning of the place where w looped around a circuit. Then like w, it inputs the substring y, which causes the machine to loop back to this same state again. Then, again like w, it inputs the substring y, which causes the machine to loop back to this same state another time. Finally, just like w, it proceeds along the path dictated by the substring z and ends at the same final state that w did. Hence, the string xyyz is accepted by this machine and therefore must be in the language L.

27 Lecture 07: Theory of Automata:08 27 Proof of Theorem 13 contd. Similarly, the strings xyyyz, xyyyyz,... must also be in L. In other words, L must contain all strings of the form: xy n z for n = 1; 2; 3; …

28 Lecture 07: Theory of Automata:08 28 Example Consider the following FA that accepts an infinite language and has only six states: Consider the word w = bbbababa

29 Lecture 07: Theory of Automata:08 29 Example contd. The x-part goes from the - state up to the first circuit: substring b The y-part goes around the circuit consisting of states 2, 3, and 5: substring bba The z-part is substring baba What happens to the input string xyyz = (b)(bba)(bba)(baba)? This string will loop twice around the circuit and is accepted. The same thing happens with xyyyz, xyyyyz, and in general, for xy n z.

30 Lecture 07: Theory of Automata:08 30 Let us use Theorem 13 to show again that the language L = {a n b n } is not regular. If L is regular then Theorem 13 says that there must be strings x, y, and z such that all words of the form xy n z are in L. A typical word of L looks like this: aaa…aaaabbbb…bbb. How to break it into x, y, and z? If y contains entirely as, then when we pump it to xyyz, this string will have more as than bs, which is not allowed in L. Similarly, if y is composed of only bs then xyyz will have more bs than as, and is not allowed in L either.

31 Lecture 07: Theory of Automata:08 31 If y have some as followed by some bs then y must contain the substring ab. – So, xyyz must have 2 substrings abs. – But every word in L has exactly one substring ab. – Therefore, xyyz can not be a word in L. The above arguments show that the pumping lemma cannot apply to L and therefore L is not regular.

32 Lecture 07: Theory of Automata:08 32 Example Let EQUAL be the language of all words (over the alphabet = {a; b}) that have the same total number of as and bs: EQUAL = {Λ; ab; ba; aabb; abab; abba; baab; baba; bbaa; aaabbb; …} Can you show that EQUAL is not regular? Let L = {a n ba n } = {b; aba; aabaa; …} Can you show that L is not regular?

33 Lecture 07: Theory of Automata:08 33 Theorem 14 Let L be an infinite language accepted by a finite automaton with N states. Then, for all words w in L that have more than N letters, there are strings x, y, and z, where y is not null and length(x) + length(y) does not exceed N, such that w = xyz and all strings of the form xy n z for n = 1; 2; 3; … are in L. This is obviously just another version of Theorem 13 (the pumping lemma), for which we have already provided the proof. The purpose of stressing the issue of lengths is illustrated in the following example.

34 Lecture 07: Theory of Automata:08 34 Example We will show that the language PALINDROME is not regular. We cannot use the first version of the pumping lemma (Theorem 13)vbecause the strings x = a; y = b; z = a satisfy the lemma and do not contradict the language, since all the strings of the form xy n z = ab n a are words in PALINDROME. So, we will use the second version of the pumping lemma (Theorem 14) to show that PALINDROME is non- regular.

35 Lecture 07: Theory of Automata:08 35 Example contd. Suppose for the contrary that PALINDROME were regular, then there would exist some FA that accepts it. For the sake of argument, assume that this FA has 77 states. Then, the palindrome w = a 80 ba 80 must be accepted by this FA. Because w has more letters than the FA has states, by Theorem 14 we can break w into three parts: x, y, and z. Since length(x) + length(y) 77 (by Theorem 14), the strings x and y must both be made of all as, since the first 77 letters of w are all as.

36 Lecture 07: Theory of Automata:08 36 Hence, when we form xyyz, we are adding more as to the front of w, but we are not adding more as to the back of w. Thus, the string xyyz will be of the form a (more than 80) ba 80 and obviously is NOT a palindrome. This is a contradiction, since Theorem 14 says that xyyz must be a palindrome. Hence, the language PALINDROME is NOT regular.

37 Lecture 07: Theory of Automata:08 37 Example Consider the language PRIME = {a p where p is a prime} Recall that a prime is a positive integer greater than 1 whose only positive divisors are 1 and itself, for example 2, 3, 5, 7... Hence, PRIME = {a p where p is a prime} = {aa; aaa; aaaaa; aaaaaaa; …} Can you show that PRIME is non-regular?


Download ppt "Lecture 07: Theory of Automata:08 Properties of Regular Languages."

Similar presentations


Ads by Google