 # FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY


15-453 (Lecture7x.ppt)

Chomsky Normal Form and TURING MACHINES

CHOMSKY NORMAL FORM

A context-free grammar is in Chomsky normal form if every rule is of the form:

A → BC (where B and C aren't start variables)
A → a (where a is a terminal)
S → ε (where S is the start variable)

Speaker notes: Chomsky normal form is a special way of writing a CFG: each rule produces either two variables or a single terminal, and only the start variable can produce ε. Note this doesn't mean that the grammar must generate ε; it just says that the production rule that can generate ε must have the start variable on its left-hand side. Any variable A (except the start variable) can only generate strings of length > 0.
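The three rule shapes can be checked mechanically. A minimal sketch (not from the lecture; the grammar representation is an assumption of this sketch): a grammar is a dict mapping variables to lists of right-hand-side tuples, and a symbol counts as a variable exactly when it has an entry in the dict.

```python
def is_cnf(rules, start):
    """rules: dict variable -> list of bodies (tuples of symbols).
    A symbol is treated as a variable iff it has an entry in rules."""
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if body == ():                              # A -> epsilon
                if head != start:
                    return False                        # only the start may derive epsilon
            elif len(body) == 1:                        # A -> a, a a terminal
                if body[0] in variables:
                    return False                        # unit rules are not allowed
            elif len(body) == 2:                        # A -> BC, non-start variables
                if not all(s in variables and s != start for s in body):
                    return False
            else:
                return False                            # right-hand side too long
    return True

# The example CNF grammar that appears on the next slide:
cnf = {
    "S0": [("T", "U"), ("T", "V"), ()],
    "S":  [("T", "U"), ("T", "V")],
    "T":  [("0",)],
    "U":  [("S", "V")],
    "V":  [("1",)],
}
```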

CHOMSKY NORMAL FORM: EXAMPLE

A context-free grammar is in Chomsky normal form if every rule is of the form A → BC (B and C aren't start variables), A → a (a is a terminal), or S → ε (S is the start variable).

The grammar

S → 0S1
S → TT
T → ε

is not in Chomsky normal form; an equivalent grammar that is:

S0 → TU | TV | ε
S → TU | TV
T → 0
U → SV
V → 1

Theorem: If G is in CNF, w ∈ L(G) and |w| > 0, then any derivation of w in G has length 2|w| - 1

Proof (by induction on |w|):

Base case: If |w| = 1, then any derivation of w must have length 1 (a single rule S → a).

Inductive step: Assume the claim holds for any string of length at most k ≥ 1, and let |w| = k + 1. Since |w| > 1, the derivation starts with S → AB. So w = xy where A ⇒* x, |x| > 0 and B ⇒* y, |y| > 0. By the inductive hypothesis, the length of any derivation of w must be 1 + (2|x| - 1) + (2|y| - 1) = 2(|x| + |y|) - 1 = 2|w| - 1.

Speaker note: Chomsky normal form is useful in several ways; here's one of them: the length of a derivation of w is proportional to the length of w.

Theorem: Any context-free language can be generated by a context-free grammar in Chomsky normal form. ("We can transform any CFG into Chomsky normal form.")

Speaker note: Another interesting property of CNF is that every context-free language has a CNF grammar.

Theorem: Any context-free language can be generated by a context-free grammar in Chomsky normal form
Proof idea:
1. Add a new start variable.
2. Eliminate all A → ε rules; repair the grammar.
3. Eliminate all A → B rules; repair.
4. Convert each A → u1u2...uk (k ≥ 3) to A → u1A1, A1 → u2A2, ... If ui is a terminal, replace ui with a new variable Ui and add the rule Ui → ui.

Speaker note: The high-level idea is to just remove those rules that aren't in the proper form, and replace them with equivalent rules that are in the proper form.
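Step 4 is the easiest to mechanize. A sketch of that step alone (the names `binarize`, `U_...` and the `A1, A2, ...` chain variables are illustrative choices of this sketch, not of the slides):

```python
import itertools

def binarize(head, body, rules, fresh):
    """Replace head -> u1 u2 ... uk (k >= 2) by chained two-symbol rules,
    lifting terminals into proxy variables U_sym -> sym along the way.
    rules: dict variable -> set of bodies; fresh: counter for new names."""
    def lift(sym):
        if sym in rules:                           # already a variable
            return sym
        proxy = f"U_{sym}"                         # terminal: add U_sym -> sym
        rules.setdefault(proxy, set()).add((sym,))
        return proxy

    syms = [lift(s) for s in body]
    current = head
    while len(syms) > 2:                           # A -> u1 A1, A1 -> u2 A2, ...
        nxt = f"A{next(fresh)}"
        rules.setdefault(current, set()).add((syms[0], nxt))
        current, syms = nxt, syms[1:]
    rules.setdefault(current, set()).add(tuple(syms))

# Binarizing S -> 0S1, as in the lecture's running example:
rules = {"S": set()}
binarize("S", ("0", "S", "1"), rules, itertools.count(1))
```

After the call, `rules` contains S → U_0 A1, A1 → S U_1, U_0 → 0 and U_1 → 1, which is the chained form step 4 describes (with different variable names than the slides use).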

1. Add a new start variable S0 and add the rule S0 → S.

   Starting grammar: S → 0S1 | T#T | T, T → ε.

2. Remove all A → ε rules (where A is not S0). For each occurrence of A on the right-hand side of a rule, add a new rule with that occurrence deleted. If we have the rule B → A, add B → ε, unless we have previously removed B → ε.

   Removing T → ε: S → T#T yields S → T# | #T | #, and S → T yields S → ε.
   Removing S → ε: S → 0S1 yields S → 01, and S0 → S yields S0 → ε.

3. Remove unit rules A → B. Whenever B → w appears, add the rule A → w, unless this was a unit rule previously removed.

   We can simply remove S → T: there are no rules of the form T → w left. Replacing S0 → S gives S0 → 0S1 | T#T | T# | #T | # | 01 | ε.

4. Convert all remaining rules into the proper form:

   S0 → 0S1 becomes S0 → A1A2, with A1 → 0, A2 → SA3 and A3 → 1; S0 → 01 becomes S0 → A1A3. We'd also have to convert S → 0S1 and S → 01 in the same way. All the rules involving T (S0 → T#T, S → T#T, S → T#, S → #T) can simply be removed: there are no rules with T on the left-hand side, so nothing can ever be derived from them.

   Final grammar:
   S0 → A1A2 | A1A3 | # | ε
   S → A1A2 | A1A3 | #
   A1 → 0
   A2 → SA3
   A3 → 1

Convert the following into Chomsky normal form:

A → BAB | B | ε
B → 00 | ε

Add a start variable:
S0 → A, A → BAB | B | ε, B → 00 | ε

Remove ε-rules:
S0 → A | ε, A → BAB | B | BB | AB | BA, B → 00

Remove unit rules:
S0 → BAB | 00 | BB | AB | BA | ε, A → BAB | 00 | BB | AB | BA, B → 00

Convert the remaining rules:
S0 → BC | DD | BB | AB | BA | ε, C → AB, A → BC | DD | BB | AB | BA, B → DD, D → 0

Speaker note: 1. Get rid of the ε-productions. 2. Get rid of the unit productions. 3. Then clean up the remaining rules so each right-hand side has two variables or one terminal. So that's how you get a CNF grammar from an arbitrary one.

TURING MACHINE

[Figure: a finite state control (with states q0, q1, ...) attached to a head reading an INFINITE TAPE that holds the INPUT.]

Speaker notes: That concludes our discussion of context-free languages. Now we'll shift to a much more general computational model, called the Turing machine, named after its inventor. It has a finite control, like that of a DFA, but it can read and write its input, which we think of as being presented on an infinite tape. Only a finite part of the tape has input symbols on it; the rest of the tape is filled with a special "blank" symbol in both directions. The pointer in the figure is the TAPE HEAD; it begins at the leftmost input symbol. Here's what a "step" of a Turing machine looks like: from a state q0, it will read the symbol under the head, write a symbol, change its state, and move its head left or right by one cell.

[State diagram of a Turing machine: edges labeled "read → write, move" (for example, σ → σ, R), with a qaccept state and a qreject state.]

Speaker notes: Now the state diagrams for a TM look something like this. We have a TM here, with both accept states and reject states. What language does it recognize? Suppose we take away the reject state and replace it with this loop.

[The same state diagram with the reject state removed and replaced by a loop.]

Speaker note: We will say that this new TM still recognizes this language, even though there are inputs on which it will run forever.

TMs VERSUS FINITE AUTOMATA
A TM can both write to and read from the tape.
The head can move left and right.
The input doesn't have to be read entirely, and the computation can continue after all the input has been read.
Accept and Reject take immediate effect.

Speaker note: But note that a Turing machine need not accept OR reject on an input: it could run forever.

Testing membership in L = { w#w | w ∈ {0,1}* }

[Animation: the control moves through states (q0 FIND #, q1 FIND #, q# FIND ⊔, q0 FIND ⊔, q1 FIND ⊔, qGO LEFT) while crossing off matched bits on a tape such as X1X1#X111.]

PSEUDOCODE:
1. If there's no # on the tape, reject.
2. While there is a 0 or 1 to the left of #:
   Replace the first bit with X, and check whether the first bit to the right of the # is identical. (If not, reject.) Replace that bit with an X too.
3. If there's a bit to the right of #, then reject; else accept.
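The crossing-off procedure can be re-enacted in ordinary code (a sketch, not a real TM; the machine would do all of this with head movements):

```python
def accepts_w_hash_w(s):
    """Decide membership in { w#w | w in {0,1}* } by crossing off bits."""
    tape = list(s)
    if tape.count("#") != 1:
        return False                         # no single # on the tape: reject
    sep = tape.index("#")
    i = 0
    while i < sep:                           # a 0 or 1 remains left of #
        bit = tape[i]
        tape[i] = "X"                        # cross off the first bit
        j = sep + 1                          # find first un-crossed bit right of #
        while j < len(tape) and tape[j] == "X":
            j += 1
        if j == len(tape) or tape[j] != bit:
            return False                     # mismatch or nothing left: reject
        tape[j] = "X"                        # cross off its partner
        i += 1
    # accept iff no un-crossed bit remains right of #
    return all(c == "X" for c in tape[sep + 1:])
```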

Definition: A Turing machine is a 7-tuple T = (Q, Σ, Γ, δ, q0, qaccept, qreject), where:

Q is a finite set of states
Σ is the input alphabet, where ⊔ ∉ Σ
Γ is the tape alphabet, where ⊔ ∈ Γ and Σ ⊆ Γ
δ : Q × Γ → Q × Γ × {L, R}
q0 ∈ Q is the start state
qaccept ∈ Q is the accept state
qreject ∈ Q is the reject state, and qreject ≠ qaccept

(⊔ denotes the blank symbol.)
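A direct way to internalize the 7-tuple is to simulate it. A minimal sketch, assuming transitions are stored as a dict `delta[(state, symbol)] = (state', symbol', move)`; the `max_steps` cutoff is an artifact of the sketch (a real TM has no such bound and may run forever):

```python
from collections import defaultdict

def run_tm(delta, q0, q_accept, q_reject, w, blank="_", max_steps=10_000):
    """Run a TM on input w: True = accept, False = reject, None = cut off."""
    tape = defaultdict(lambda: blank, enumerate(w))   # cell index -> symbol
    state, head = q0, 0
    for _ in range(max_steps):
        if state == q_accept:
            return True
        if state == q_reject:
            return False
        state, tape[head], move = delta[(state, tape[head])]
        head = head + 1 if move == "R" else max(head - 1, 0)
    return None                                       # may be looping forever

# Example machine (illustrative, not from the slides): over {a},
# accept exactly the strings of even length.
even_delta = {
    ("q0", "a"): ("q1", "a", "R"),
    ("q1", "a"): ("q0", "a", "R"),
    ("q0", "_"): ("qacc", "_", "R"),
    ("q1", "_"): ("qrej", "_", "R"),
}
```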

CONFIGURATIONS

The configuration 11010q700110 corresponds to: tape contents 1101000110, current state q7, and the tape head on the 0 immediately to the right of q7.

Speaker notes: We can completely describe a Turing machine at any step of its computation using strings called configurations. These are strings over the tape alphabet and the states (treating the states as symbols). Configurations are like "snapshots" of the TM at a given point. With configurations, we can give a formal definition of a TM accepting a string. (Read Sipser.) On input w, the start configuration is q0w. From any particular configuration, there is a "next" configuration we can get to by applying the transition function. And if we reach a configuration that has an accept state in it, then the machine accepts.

A TM recognizes a language iff it accepts all and only those strings in the language.
A language L is called Turing-recognizable or recursively enumerable iff some TM recognizes L.
A TM decides a language L iff it accepts all strings in L and rejects all strings not in L.
A language L is called decidable or recursive iff some TM decides L.

Speaker note: Since a TM can run forever on some inputs, we have two different ways for a TM to "compute" a language.

A language is called Turing-recognizable or recursively enumerable (r.e.) if some TM recognizes it.
A language is called decidable or recursive if some TM decides it.

[Diagram: the recursive languages are a subset of the r.e. languages.]

Theorem: If A and A are r.e. then A is recursive
Given: a TM that recognizes A and a TM that recognizes A, we can build a new machine that decides A So what does this theorem mean? (Click) How might we prove this? (Don’t go on until it’s proved.)
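One way to see the proof: run the two recognizers in parallel, alternating single steps; every w is accepted by exactly one of them, so the alternation always terminates. A sketch with Python generators standing in for TMs (the toy recognizers `rec_even` and `rec_odd` are illustrative; each yields None per computation step and True on acceptance):

```python
def decide(recognize_A, recognize_co_A, w):
    """Alternate single steps between a recognizer for A and one for its
    complement; whichever accepts w first settles membership."""
    machines = [(recognize_A(w), True), (recognize_co_A(w), False)]
    while True:
        for gen, verdict in machines:       # one step of each machine
            if next(gen) is True:
                return verdict              # True: w in A; False: w not in A

# Toy stand-ins: A = strings of even length. A machine that does not
# accept simply yields None forever (it "runs forever").
def rec_even(w):
    for _ in w:
        yield None                          # pretend to take computation steps
    if len(w) % 2 == 0:
        yield True
    while True:
        yield None

def rec_odd(w):
    for _ in w:
        yield None
    if len(w) % 2 == 1:
        yield True
    while True:
        yield None
```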

{ 0^(2^n) | n ≥ 0 }

PSEUDOCODE:
1. Sweep from left to right, cross out every other 0.
2. If in stage 1 the tape had only one 0, accept.
3. If in stage 1 the tape had an odd number of 0's, reject.
4. Move the head back to the first input symbol.
5. Go to stage 1.

Speaker notes: How can we get a TM that decides this language? As with the last lecture, we'll want to use precise pseudocode whenever we can, because it's just so much easier to understand. But for every line of pseudocode you write, you need to think about how you'd convert that into a formal specification. (Every execution of stage 1 halves the number of 0's. If we get to "only one 0" and we never got an odd number of 0's greater than one, then we must have had a power of two.)
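The pseudocode above, re-enacted at the level of counts (a sketch; the TM does the halving by crossing out 0's on the tape):

```python
def is_power_of_two_zeros(s):
    """Decide { 0^(2^n) | n >= 0 } by repeated halving."""
    if set(s) != {"0"}:
        return False                  # empty or non-0 input: reject
    count = len(s)
    while True:
        if count == 1:
            return True               # stage 2: only one 0 left -> accept
        if count % 2 == 1:
            return False              # stage 3: odd number of 0's -> reject
        count //= 2                   # stage 1: cross out every other 0
```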

{ 0^(2^n) | n ≥ 0 }

[State diagram with states q0, q1, q2, q3, q4, qaccept and qreject, implementing the pseudocode. The transitions shown include 0 → ⊔, R; 0 → x, R; x → x, R; 0 → 0, R; 0 → 0, L; x → x, L; and ⊔ → ⊔, R/L edges into q2, qaccept and qreject.]

Speaker notes: This is a state diagram corresponding to the pseudocode of the previous slide. Let's think about how the two correspond. We have an extra symbol "x" that is used to cross out symbols. We start by writing the first 0 over with a blank. If there was just a single 0, we accept (this corresponds to stage 2). In states q3 and q4, we cross out every other 0, until we see a blank (stage 1). If we see a blank in state q4, then the number of 0's left was odd, so we reject (stage 3). If we see a blank in state q3, then the number was even, so now in q2 we move left until we hit a blank (stage 4). Then we move back to q1.

{ 0^(2^n) | n ≥ 0 }: a run on input 0000, as configurations

q00000 ⊢ q1000 ⊢ xq300 ⊢ x0q40 ⊢ x0xq3 ⊢ x0q2x ⊢ xq20x ⊢ q2x0x ⊢ ...

Speaker notes: Let's look at what this TM does by looking at its configurations on an input with four zeroes (leading blanks are omitted here). And the process will continue from q1 at this point.

C = { a^i b^j c^k | k = ij, and i, j, k ≥ 1 }

PSEUDOCODE:
1. If the input doesn't match a*b*c*, reject.
2. Move the head back to the leftmost symbol.
3. Cross off an a, scan to the right until b. Sweep between the b's and c's, crossing off one of each, until all b's are crossed off.
4. Uncross all the b's. If there's another a left, then repeat stage 3. If all a's are crossed out, check whether all c's are crossed off. If yes, then accept, else reject.

Speaker note: (If not much time left, skip this.) We can have a TM that does some basic arithmetic. If we run out of c's in stage 3, we'll reject in that case too.
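The staged cancellation can be sketched in Python (the real machine crosses symbols off on the tape; here the counts are canceled directly):

```python
import re

def in_C(s):
    """Decide C = { a^i b^j c^k | k = ij, i, j, k >= 1 }."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)   # stage 1 (with i, j, k >= 1)
    if not m:
        return False
    i, j, k = (len(g) for g in m.groups())
    remaining_c = k
    for _ in range(i):                     # stage 3: one sweep per a
        if remaining_c < j:
            return False                   # ran out of c's mid-sweep: reject
        remaining_c -= j                   # cross off one c per b
    return remaining_c == 0                # stage 4: all c's crossed off?
```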

C = { a^i b^j c^k | k = ij, and i, j, k ≥ 1 }

An example run:

aabbbcccccc
xabbbcccccc
xayyyzzzccc
xabbbzzzccc
...
xxyyyzzzzzz

Speaker note: Here's an example.

MULTITAPE TURING MACHINES
[Figure: one finite state control with k tapes and one head per tape.]

Speaker notes: One amazing aspect of Turing machines is that you can define all sorts of variations on them: give them more complicated ways to store information, faster ways to access information, whatever. But as long as your new TM only reads and writes a finite number of symbols in each step, it doesn't matter: an old TM can simulate it! Here's an example. We can define Turing machines with multiple tapes. We can imagine the input coming in on one tape, and the other tapes being used for scratch work. This can easily be stuck into the formalism, by redefining the transition function: now it takes k tape symbols, and outputs k tape symbols and k tape directions:

δ : Q × Γ^k → Q × Γ^k × {L, R}^k

Theorem: Every Multitape Turing Machine can be transformed into a single tape Turing Machine
[Figure: the k tapes of the multitape machine written on a single tape, separated by #'s, with a mark over each cell where a tape head sits.]

Speaker notes: We can simulate these multiple tapes with one tape, in the following way. We write down the contents of all these tapes in order, and put a # between two tapes. To keep track of where all the heads of the multitape machine are, we double the size of the alphabet to include "marked" symbols, and a mark above a symbol indicates a head position.

Theorem: Every Multitape Turing Machine can be transformed into a single tape Turing Machine
Speaker notes: When some tape head moves, we move the mark. When the multitape machine makes one of its tape contents "longer" (by writing over some blanks), the simulation moves all the tape content beyond it over by one. Like, suppose the multitape machine adds a 0 to the end of the first tape. Then everything from the # onward is shifted over to the right, and the 0 is written.
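The packing described above can be sketched concretely. Here a marked symbol is rendered with a `*` suffix instead of a dot over it (an encoding choice of this sketch, not of the proof):

```python
def encode_tapes(tapes, heads):
    """Pack k tapes (strings) onto one tape: '#' separates consecutive
    tapes, and the cell under each head gets a '*' mark."""
    parts = []
    for tape, h in zip(tapes, heads):
        cells = [c + ("*" if pos == h else "") for pos, c in enumerate(tape)]
        parts.append("".join(cells))
    return "#" + "#".join(parts) + "#"
```

For example, two tapes `010` and `ab_` with heads on cell 0 and cell 2 pack into `#0*10#ab_*#`; moving a head just moves the `*`, and lengthening a tape shifts everything after its `#` one cell to the right.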

THE CHURCH-TURING THESIS

Intuitive notion of algorithms EQUALS Turing machines

Speaker notes: The facts that (1) we can change the definition of a TM so much without changing what can be decided by them, and (2) everything we can think of that can be computed with an intuitive idea of an algorithm can be done with TMs, led mathematicians to state the Church-Turing thesis: if some language can be computed, in our intuitive sense of what it means to "compute", then we can build a Turing machine that decides it.

We can encode a TM as a string of 0s and 1s:

0^n 1 0^m 1 0^k 1 0^s 1 0^t 1 0^r 1 0^u 1 ...

where n is the number of states, m the number of tape symbols (the first k of which are input symbols), s the blank symbol, t the start state, r the accept state, and u the reject state.

Speaker notes: An important observation that is used time and time again in computability theory and complexity theory is that we can encode TMs. This means that we can have TMs compute on the codes of other TMs. Now if we think of the states and tape symbols as integers, then we can encode transitions as follows:

δ(p, a) = (q, b, L) is encoded as 0^p 1 0^a 1 0^q 1 0^b 1 0
δ(p, a) = (q, b, R) is encoded as 0^p 1 0^a 1 0^q 1 0^b 1 1
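The transition encoding can be written out directly (a sketch; states and tape symbols are assumed to already be numbered 1, 2, 3, ... so that the unary blocks 0^p etc. make sense):

```python
def encode_transition(p, a, q, b, move):
    """delta(p, a) = (q, b, move)  ->  0^p 1 0^a 1 0^q 1 0^b 1 d,
    where the final bit d is 0 for a left move and 1 for a right move."""
    return ("0" * p + "1" + "0" * a + "1" +
            "0" * q + "1" + "0" * b + "1" +
            ("0" if move == "L" else "1"))
```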

Similarly, we can encode DFAs, NFAs, CFGs, etc. into strings of 0s and 1s. So we can define the following languages:

ADFA = { (B, w) | B is a DFA that accepts string w }
ANFA = { (B, w) | B is an NFA that accepts string w }
ACFG = { (G, w) | G is a CFG that generates string w }

ADFA = { (B, w) | B is a DFA that accepts string w }
Theorem: ADFA is decidable. Proof idea: simulate B on w.

ANFA = { (B, w) | B is an NFA that accepts string w }
Theorem: ANFA is decidable.

ACFG = { (G, w) | G is a CFG that generates string w }
Theorem: ACFG is decidable. Proof idea: transform G into Chomsky normal form, then try all derivations of length 2|w| - 1.
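For ACFG, the bounded search over derivations is equivalent to the following recursive membership test for CNF grammars: A derives a nonempty string s iff A → a with s = a, or A → BC with some split s = xy where B derives x and C derives y. A sketch (memoization keeps it from re-deriving the same substring):

```python
from functools import lru_cache

def generates(rules, start, w):
    """rules: variable -> list of bodies; a body is (), (terminal,) or (B, C).
    The grammar is assumed to be in Chomsky normal form."""
    @lru_cache(maxsize=None)
    def derives(var, s):              # does var derive the nonempty string s?
        for body in rules[var]:
            if body == ():
                continue              # epsilon only matters for w == ""
            if len(body) == 1:
                if s == body[0]:      # A -> a and s = a
                    return True
            else:
                B, C = body           # A -> BC: try every nonempty split
                if any(derives(B, s[:i]) and derives(C, s[i:])
                       for i in range(1, len(s))):
                    return True
        return False

    if w == "":
        return () in rules.get(start, [])
    return derives(start, w)

# The CNF grammar for { 0^n 1^n | n >= 0 } shown earlier in the lecture:
cnf = {
    "S0": [("T", "U"), ("T", "V"), ()],
    "S":  [("T", "U"), ("T", "V")],
    "T":  [("0",)],
    "U":  [("S", "V")],
    "V":  [("1",)],
}
```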

Read Chapter 3 of the book for next time