Deterministic Finite Automata

Deterministic Finite Automata
Alphabets, Strings, and Languages Transition Graphs and Tables Some Proof Techniques We are now going to express the ideas behind finite automata more formally. We’ll see how to represent finite automata either as graphs or tables – they are two representations of the same idea. And we’re going to do a few proofs in considerable detail and learn some useful proof techniques, especially inductive proofs.

Alphabets An alphabet is any finite set of symbols. Examples:
ASCII, Unicode, {0,1} (binary alphabet ), {a,b,c}, {s,o}, set of signals used by a protocol. Click 1 An ALPHABET is simply a finite set of symbols. Click 2 There are many examples. Click 3 Two standard alphabets we see frequently are the ASCII and Unicode alphabets. Click 4 An alphabet we’ll use frequently is the binary alphabet consisting of only the symbols 0 and 1. Click 5 We’ll also sometimes use alphabets of letters, like a, b, and c. And in the tennis example, we used the alphabet of outcomes consisting of the letters s and o. Click 6 Each protocol has a set of signals, and the set of all signals used by a protocol is an alphabet.

Strings A string over an alphabet Σ is a list, each element of which is a member of Σ. Strings shown with no commas or quotes, e.g., abc or Σ* = set of all strings over alphabet Σ. The length of a string is its number of positions. ε stands for the empty string (string of length 0). Click 1 Strings are the usual data type you’ve seen in languages like C or Java. That is, a string is a sequence of symbols chosen from some alphabet Sigma. The strings “over” Sigma are those strings whose symbols are each members of Sigma. For example, 011 (DRAW at top) is a string over alphabet {0,1} (DRAW Sigma = {0,1}). It is also a string over alphabet {0,1, and 2}; it just happens not to have any 2’s. Click 2 Unlike programming languages, which typically put quotes around strings, we’re not going to do that. And even though strings are lists of symbols, we’re not going to separate the symbols by commas or other separators. We’ll just write down the strings like this (POINT to abc, 01101). Click 3 We use the expression Sigma-star for the set of all strings over alphabet Sigma. Click 4 The *length* of a string is the number of positions in that string. Click 5 A special string is the EMPTY STRING, which has length 0. We represent it by epsilon (POINT).

Example: Strings {0,1}* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, . . . }
Subtlety: 0 as a string, 0 as a symbol look the same. Context determines the type. Click 1 For example, the language {0,1}* consists of all strings that can be made from the symbols 0 and 1. For example, epsilon, the empty string (POINT) is in this language. There are two strings of length one, the strings 0 and 1 (POINT). And there are four strings of length two (POINT), and so on. Click 2 Notice that we do not make a distinction between a symbol, like 0, and the string of length 1 that consists of a single instance of that symbol, the string 0. We trust that context will make clear which we mean. Programming languages do make this distinction. So in C, for example, we’d put single quotes around the 0 (DRAW) to indicate the symbol or character 0, while we’d put double quotes around 0 (DRAW) to indicate the string 0.

Languages A language is a subset of Σ* for some alphabet Σ.
Example: The set of strings of 0’s and 1’s with no two consecutive 1’s. L = {ε, 0, 1, 00, 01, 10, 000, 001, 010, 100, 101, 0000, 0001, 0010, 0100, 0101, 1000, 1001, 1010, } Click 1 Languages are sets of strings, and they can be finite sets or infinite sets. The only limitation we place on languages is that there is some alphabet, that is, a finite set of symbols, from which all the strings of one language are composed. Click 2 Here’s an example of a language L we’re going to meet several times. It’s alphabet is {0,1}, and it contains all strings of 0’s and 1’s that do not have two consecutive 1’s. Click 3 So the empty string is there (POINT), and both strings of length 1 (POINT). Of the strings of length two, all are there except 11 (POINT), because that string obviously has two consecutive 1’s. Five of the eight possible strings of length three are there (POINT), and eight of the sixteen strings of length four (POINT). Click 4 Here’s a little puzzle for you to think about. For the lengths zero through four, we see there are 1, 2, 3, 5, and 8 strings. Do you recognize this sequence? And can you extend it to strings of length 5, 6 ,and so on? Can you prove your prediction is correct? Hmm… 1 of length 0, 2 of length 1, 3, of length 2, 5 of length 3, 8 of length 4. I wonder how many of length 5?

Deterministic Finite Automata
A formalism for defining languages, consisting of: A finite set of states (Q, typically). An input alphabet (Σ, typically). A transition function (δ, typically). A start state (q0, in Q, typically). A set of final states (F ⊆ Q, typically). “Final” and “accepting” are synonyms. Click 1 Now we’ll give the formal notation for deterministic finite automata. Click 2 There is a finite set of states, and we usually use Q as the set of states. Don’t ask me why it is Q; it just is. Click 3 And there is a finite input alphabet. The traditional name for the input alphabet is Sigma, again for no known reason. Click 4 There is a transition function, which we’ll discuss on the next slide. It is the guts of the automaton, since it tells how the automaton moves from state to state in response to inputs. The traditional symbol for the transition function is delta. Click 5 One of the states in Q is the start state, and we designate it q_0 typically. Click 6 And some of the states are designated final states, or synonymously “accepting states.” The final states are traditionally denoted F.

The Transition Function
Takes two arguments: a state and an input symbol. δ(q, a) = the state that the DFA goes to when it is in state q and input a is received. Note: always a next state – add a dead state if no transition (Example on next slide). The thing that makes the automaton work is the transition function. Click 1 This function, usually denoted by delta, takes two arguments – a state q and an input symbol “a”. Click 2 It gives you back the state that the automaton goes to when it is in state q and the next input symbol to arrive is “a”. Click 3 The function delta is “total,” that is, it has a value for every state and symbol. There are examples of automata where we really don’t want to continue in certain situations. For example, our tennis automaton did not have transitions out of the two states where one player or the other has won the game. To fix up such situations, we have to introduce a “dead state.” A “dead state” is a state that is not accepting and that has a transition to itself on every input symbol. Once you get to a dead state, you cannot leave and you can never accept, so “dead” is a pretty good description of what is going on. After discussing slide: We’re going to use the abbreviation DFA for “Deterministic Finite Automaton.” The “deterministic” means that there is a unique transition for every state and input symbol. We’re going to meet nondeterministic automata soon, and there it is possible to transition to many states from one state on one input.

Server Wins Opp’nt s o s o Love-40 15-30 30-15 40-Love s o deuce s o
Dead s, o 30-40 40-30 s o Ad-out Ad-in s o 40-15 15-40 30-all s o Love-30 15-all 30-Love s o Start 15-Love s Love o Love-15 Here is our tennis example. Notice that the two final states have no transitions out (POINT). Click 1 So we add a dead state, and all the missing transitions go to that state. The transitions from the dead state are to itself on all possible input symbols.

Graph Representation of DFA’s
Nodes = states. Arcs represent transition function. Arc from state p to state q labeled by all those input symbols that have transitions from p to q. Arrow labeled “Start” to the start state. Final states indicated by double circles. Like the tennis automaton, we can represent any finite automaton by a graph. Click 1 The nodes of the graph are the states of the automaton. Click 2 The arcs are the transitions. There is an arc from state (or node) p to state or node q labeled by all the input symbols “a” such that delta(p,a) = q. For example, this arc (DRAW above: p and q and an arc labeled a,b) would be present if delta(p,a) and delta(p,b) were both q. Click 3 We represent the start state by adding an arrow labeled “Start” into that state. Click 4 And we indicate final states by double circles.

Example: Recognizing Strings Ending in “ing”
Not i or g Not i Not i or n i nothing Saw i Saw in Saw ing i n g This is an interesting example of an automaton that processes text. The goal is to recognize that the string read so far ends in “ing”. The start state (POINT) represents the condition where we have made no progress toward seeing i-n-g. If, in that state, we next see “i,” we have made some progress, so we go to the state that says “i” was the last symbol seen (POINT). Otherwise, we stay in the start state, because what we’ve seen so far is no help in seeing i-n-g. From the “Saw i” state, if we next see “n,” then we have made more progress. We go to a state “Saw in.” (POINT) If we see another “i” in state “Saw i,” then we have not made progress, but neither have we lost ground. We may be reading a word like “Skiing,” with a double “i.” Thus, the transition from state “Saw i” on “i” is to itself (POINT to transition). On any symbol other than “i” or “n,” we go from”Saw i” back to the start state (POINT to transition). In state “Saw in,” if we next see a “g,” then we win – we have just seen i-n-g as the last three symbols, so we go to the accepting state “Saw ing.” (POINT) On the other hand, if from State “Saw in,” we next see an “i,” then the pattern i-n is broken but a new pattern beginning with “i” has started, so we go to the “Saw i” state (POINT to transition). On any input other than “i” or “g,” including “n,” we have no progress at all, so we go back to the start state (POINT to transition). From state “Saw ing,” we can only go to state “Saw i” on “i,” (POINT to transition) and otherwise we must go back to the start state (POINT to transition). i Start i Not i

Example: Protocol for Sending Data
timeout data in Ready Sending Here is an automaton that represents the simplest possible protocol for sending data. The program is in one of two states, Ready and Sending. It starts in the Ready state (POINT). Eventually, it gets a signal that some data has been loaded into its buffer (POINT to transition). At that point, it enters the Sending state (POINT), where it does what is necessary to transmit the contents of the buffer. The receiver will send an ack signal when the contents are received, in which case we return to the Ready state (POINT to transition). However, if the receiver is down, we may instead get a local timeout signal that warns something is wrong, and the buffer must be retransmitted (POINT to transition). This automaton is not complete. There are no final states, because this automaton is designed to run forever without rendering a decision. Also, there are missing transitions. It is OK to ignore ack or timeout signals in the Ready state, staying in the Ready state (DRAW). However, a data-in signal in the Sending state is an indication of an error, so we might want to go to an error state (DRAW). And the error state is a dead state, so we need transitions to itself on Any input. A typical question that is asked about protocols is whether you can get to an error state. We could, for example, make the error state final (DRAW double circle), and then ask if the language of the automaton is empty. As we shall see, there is an algorithm to tell whether the language of a finite automaton is empty – a question we could not ask about programs in general. In this example, however, it is really easy to see that there is a path in the graph from the start state to the final state, so its language is not empty and errors are possible. Start ack

Example: Strings With No 11
Start 1 A C B 0,1 Consecutive 1’s have been seen. String so far has no 11, does not end in 1. String so far has no 11, but ends in a single 1. For a running example, we are going to use this automaton. Its language is the set of binary strings that do not contain two consecutive 0’s. Click 1: State A is where the automaton will be whenever the input string seen so far is “good,” that is, it contains no consecutive 1’s, and also, it does not end in a 1. Surely this state should be the start state, since when no input has been received, there are not two consecutive 1’s, and moreover, the input so far does not end in 1. Click 2 We get to state B when the input is good, that is, no two consecutive 1’s, but the last symbol seen is a 1. Notice the only way to get to B is to be in A and then get input 1. Click 3 C is actually a dead state. We are there whenever two consecutive 1’s have been received. We arrive at C for the first time from B, which you reacall means that the previous input was 1, and in state B we receive a second 1. Once in C, we stay there, because once a string has 11, you can never undo that fact, no matter how many 0’s you see.

Alternative Representation: Transition Table
Final states starred * Columns = input symbols 1 A A B B A C C C C Arrow for start state Each entry is δ of the row and column. Rows = states We can also represent automata by tables. Here’s our example automaton for strings without consecutive 1’s, shown as a transition graph in the corner. And we also see a representation of the same automaton as a table. Click 1 The rows each correspond to one of the states. Click 2 And the columns correspond to the input symbols. Click 3 We indicate the final states by putting a star next to their name. Click 4 And the start state is indicated by an arrow. Click 5 The entries of the table are the values of the transition function applied to the state that is the row and the symbol that is the column. For example, this entry (POINT) is C because it is in the row for state B and the column for input symbol 1, and we know that delta(B,1) is C (POINT to graph). Start 1 A C B 0,1

Convention: Strings and Symbols
… w, x, y, z are strings. a, b, c,… are single input symbols. There are two important conventions that we are going to use. They let us know the types of things without declaring types. In particular we can distinguish between strings and characters. Click 1 We use lower-case letters at the end of the alphabet, w. x. y .and z, and sometimes u or v, to represent strings. When you see one of these letters, you need to think “aha –that’s a string.” Click 2 And lower-case letters near the beginning of the alphabet will represent single symbols. Again, we’ll try to remind you at first, but you have to think “single symbol” when you see one.

Extended Transition Function
We describe the effect of a string of inputs on a DFA by extending δ to a state and a string. Intuition: Extended δ is computed for state q and inputs a1a2…an by following a path in the transition graph, starting at q and selecting the arcs with labels a1, a2,…, an in turn. The extended transition function delta is a function that takes a state q and a string w of any length, including 0, and tells us where the automaton gets to if it follows the path in the transition diagram from q where the arcs are labeled by each of the symbols of w in order. That is, we look for the unique path whose labels form w. In some materials, you will see a hat over the delta to remind you that it is the extended version. However, as we shall see, the extended delta agrees with the given delta when the string w is of length 1, that is, it is a single symbol. Thus, there is not really a need to distinguish the extended and original deltas.

Inductive Definition of Extended δ
Induction on length of string. Basis: δ(q, ε) = q Induction: δ(q,wa) = δ(δ(q,w),a) Remember: w is a string; a is an input symbol, by convention. Click 1 The definition of the extended transition function delta is an induction on the length of the string to which this function is applied. Click 2 for the basis, Delta(q,epsilon) is q. That is, if you are in state q, and no input symbols arrive, then you stay in state q. Click 3 For the induction, suppose the input string is wa (POINT). By our convention, w is a string of some length, possibly zero, and a is a single symbol. The inductive rule says that we first see where you get from state q on string of inputs w (DRAW circle around delta(q,w) ) and then see where you get from that state, whatever it is, on the last input a.

Example: Extended Delta
1 A A B B A C C C C DRAW ALL OF THIS: 011 is split into w=01, a=1. Then we have to compute delta(B,01) and see where it goes on input 1. Similarly, 01 is broken into w=0 and a=1, so we need to compute delta(B,0) and see where we get to on input 1. But delta(B,0) is A. Then we need to compute delta(A,1), which is B, and finally delta(B,1), which is C. δ(B,011) = δ(δ(B,01),1) = δ(δ(δ(B,0),1),1) = δ(δ(A,1),1) = δ(B,1) = C

Language of a DFA Automata of all kinds define languages.
If A is an automaton, L(A) is its language. For a DFA A, L(A) is the set of strings labeling paths from the start state to a final state. Formally: L(A) = the set of strings w such that δ(q0, w) is in F. Click 1 There are many different kinds of automata. We’ve seen only the deterministic finite automaton so far, but there are many others. But no matter what kind of automaton, its job is to define some language. Click 2 We’ll use L(A) to denote the language defined by automaton A. Click 3 For a deterministic finite automaton, the language it defines is the set of strings that take the start state to a final state. Click 4 Formally, the language of a DFA is the set of strings w such that the extended delta applied to the start state and this w is a final state. (DRAW q_0 ->w -> state in F)

Example: String in a Language
String 101 is in the language of the DFA below. Start at A. Start 1 A C B 0,1 For example, consider our running example of the automaton that accepts strings without consecutive 1’s, and consider the input 101. It should accept this string, because it obviously does not have consecutive 1’s. The start state is A, and if we follow the path labeled 101, we first go to B.

String 101 is in the language of the DFA below. Follow arc labeled 1. 0,1 1 1 A B C Then from B we follow the arc with a 0. Start

String 101 is in the language of the DFA below. Then arc labeled 0 from current state B. 0,1 1 1 A B C That leads us back to A. The third input is another 1, so that gets us back to B again. Start

String 101 is in the language of the DFA below. Finally arc labeled 1 from current state A. Result is an accepting state, so 101 is in the language. 0,1 1 1 A B C Since the path from A labeled 101 gets us to a final state, 101 is indeed accepted by the automaton. Start

Example – Concluded The language of our example DFA is:
{w | w is in {0,1}* and w does not have two consecutive 1’s} Read a set former as “The set of strings w… Such that… These conditions about w are true. This expression is called a SET FORMER.” It describes sets, and in particular we’ll use it to describe languages. IN this example, it is describing the language of the automaton that has served as our running example. The set former starts with a curly brace, and then an expression representing the things that we want to put in the set. Click 1 In this case, the expression simply says that the set consists of some strings w. We know they’re strings because of our convention. Click 2 The vertical bar can be read “such that.” Click 3 Then, there follows a description of what must be true for something to be a member of the set. In our example, we say that the string consists of 0’s and 1’s and does not have two consecutive 1’s.

Regular Languages A language L is regular if it is the language accepted by some DFA. Note: the DFA must accept only the strings in L, no others. Some languages are not regular. Intuitively, regular languages “cannot count” to arbitrarily high integers. Now, we introduce a class of languages called the regular languages. Click 1 These are the languages that have a deterministic finite automaton accepting them. That means that the language is exactly the set of strings accepted by this automaton. Soon, we shall see that there are several other ways to describe the regular languages, including by regular expressions and nondeterministic automata. Click 2 While many common languages are regular, there are also many that are not. Intuitively, finite automata cannot count beyond a fixed number. Thus, they cannot do things like check whether they have seen the same number of 0’s as 1’s on their input, or check that parentheses are balanced in an arithmetic expression. For those tasks, we need more powerful mechanisms, such as context-free grammars which we shall meet soon enough.

Example: A Nonregular Language
L1 = {0n1n | n ≥ 1} Note: ai is conventional for i a’s. Thus, 04 = 0000, e.g. Read: “The set of strings consisting of n 0’s followed by n 1’s, such that n is at least 1. Thus, L1 = {01, 0011, ,…} Click 1 Here is an example of a language that is simple to understand, but is not a regular language. Click 2 To understand what this notation is saying, we first need to know that an exponent i on a symbol is shorthand for the string consisting of i copies of that symbol. Thus, 0 to the 4th means the string 0000. Click 3 Thus, we read this set former as “0 to the n, 1 to the n such that n >= 1,” or more verbosely, “ the set of strings consisting of n 0’s followed by n 1’s for some n >= 1.” Click 4 The strings in the language L_1 are thus 01, 0011, and so on.

Another Example L2 = {w | w in {(, )}* and w is balanced }
Balanced parentheses are those sequences of parentheses that can appear in an arithmetic expression. E.g.: (), ()(), (()), (()()),… Here’s another example of a nonregular language: the set of strings that are balanced parentheses. The strings of balanced parentheses are those that can appear in an arithmetic expression. For example, the string () (POINT) can appear in an expression like (a+b) times c (DRAW). Or ()() (POINT) can appear in an expression like (a+b)(c+d) (DRAW). Examples of strings that are not balanced are )( (DRAW). This is not balanced because no prefix of a balanced string can have more left parens than right parens. And ((()) (DRAW) is not balanced because it has more left parens than right.

But Many Languages are Regular
They appear in many contexts and have many useful properties. Example: the strings that represent floating point numbers in your favorite language is a regular language. However, regular languages are common. For example, in each language there is a format for floating point numbers, and this format can be quite complicated, with optional E’s or decimal points, and strings of digits that could be empty. But in all programming languages I know about, the set of strings that represent some floating-point number is a regular language.

Example: A Regular Language
L3 = { w | w in {0,1}* and w, viewed as a binary integer is divisible by 23} The DFA: 23 states, named 0, 1,…,22. Correspond to the 23 remainders of an integer divided by 23. Start and only final state is 0. Click 1 This is a very interesting case illustrating what finite memory means. We want to know if a binary number is divisible by 23. We’re going to read the bits high-order first. But if we have only finite memory, how can we remember exactly what sequence of bits has been read, since the sequence can grow very long? Click 2 But there is a trick. We don’t really need to remember everything about the bits read. It is sufficient to remember what the remainder is, when divided by 23. Click 3 Thus, we’ll have 23 states, 0 through 22. Click 4 These correspond to the 23 possible remainders when an integer is divided by 23. Click 5 The start state is 0, because we interpret the empty string as representing zero. That may be a bit of an assumption, but nothing else makes sense, and treating it as zero makes everything work out right. State 0 is also the only final state, since we want the inputs that leave a remainder of 0 when divided by 23.

Transitions of the DFA for L3
If string w represents integer i, then assume δ(0, w) = i%23. Then w0 represents integer 2i, so we want δ(i%23, 0) = (2i)%23. Similarly: w1 represents 2i+1, so we want δ(i%23, 1) = (2i+1)%23. Example: δ(15,0) = 30%23 = 7; δ(11,1) = 23%23 = 0. Click 1 We’re going to assume that things are working right after reading a binary string w. That is w takes state 0 to the state that is the correct remainder when w is divided by 23. I’m using the C-style percent operator to denote “remainder.” You can read the % as “modulo” or “mod,” which is just another way of saying “the remainder when divided by…” Click 2 The transition from each state i on input 0 is to the state that is the remainder of 2i divided by 23. To see why this works, we know that i = 23a + b (DRAW at bottom) for some integer a and for some integer b in the range 0 to 22. That is, b = i%23. Then 2i = 46a + 2b (DRAW). Since 46 is divisible by 23, 2i%23 = 2b%23 (CROSS out 46a). That is, we can take the state b, which is the remainder, double it, subtract 23 if it is 23 or more, and that gives us the new remainder and therefore the new state. We never need to know what a is and we never need to know i exactly. Click 3 For the same reason, when a 1 arrives at the input, we can go from state i to the state that is the remainder of 2i+1 divided by 23. Click 4 For example twice 15 is 30, so from state 15, we go to the state that is the remainder of 30 divided by 23, or state 7. From state eleven, on input 1 we go to the state that is the remainder of twice eleven plus one, divided by 23, which is state 0. So, whenever a string gets you to state 11, i.e., the string leaves a remainder of 11 when divided by 23, that string followed by a 1 MUST be divisible by 23 and therefore must be accepted.

Another Example L4 = { w | w in {0,1}* and w, viewed as the reverse of a binary integer is divisible by 23} Example: is in L4, because its reverse, is 46 in binary. Hard to construct the DFA. But there is a theorem that says the reverse of a regular language is also regular. Click 1 Interestingly, another regular language is the set of all the binary strings that if we read the strings in backwards – that is, low-order bit first, form a binary integer that is divisible by 23. And there’s nothing special about 23. It could be any number in both this example and the previous one. Click 2 For example, is in this language because when we reverse it, we get , which has the value 46 in decimal, and 46 is, of course, divisible by 23. Click 3 It is a really tricky to design a DFA to accept this reverse language. However, there is a theorem, which we shall see soon, that says if a language is regular, than its reversal – what you get by reversing each of its strings – is also a regular language. The proof of this theorem will let us construct the DFA for the reverse language from the DFA we just saw.

Deterministic Finite Automata

Similar presentations

Presentation on theme: "Deterministic Finite Automata"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Deterministic Finite Automata

Similar presentations

Presentation on theme: "Deterministic Finite Automata"— Presentation transcript:

Similar presentations

About project

Feedback