Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.

Slides:



Advertisements
Similar presentations
Why empty strings? A bit like zero –you may think you can do without –but it makes definitions & calculations easier Definitions: –An alphabeth is a finite.
Advertisements

Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.
Automata Theory Part 1: Introduction & NFA November 2002.
Formal Languages Languages: English, Spanish,... PASCAL, C,... Problem: How do we define a language? i.e. what sentences belong to a language? e.g.Large.
Kleene's Theorem We have defined the regular languages, using regular expressions, which are convenient to write down and use. We have also defined the.
Theory of Computing Lecture 23 MAS 714 Hartmut Klauck.
Finite-State Machines with No Output Ying Lu
4b Lexical analysis Finite Automata
CMPS 3223 Theory of Computation
Lecture 24 MAS 714 Hartmut Klauck
Theory Of Automata By Dr. MM Alam
Finite Automata CPSC 388 Ellen Walker Hiram College.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Chapter Section Section Summary Set of Strings Finite-State Automata Language Recognition by Finite-State Machines Designing Finite-State.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture4: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture7: PushDown Automata (Part 1) Prof. Amos Israeli.
Intro to DFAs Readings: Sipser 1.1 (pages 31-44) With basic background from Sipser 0.
Validating Streaming XML Documents Luc Segoufin & Victor Vianu Presented by Harel Paz.
1 Lecture 16 FSA’s –Defining FSA’s –Computing with FSA’s Defining L(M) –Defining language class LFSA –Comparing LFSA to set of solvable languages (REC)
PZ02B Programming Language design and Implementation -4th Edition Copyright©Prentice Hall, PZ02B - Regular grammars Programming Language Design.
CS5371 Theory of Computation Lecture 4: Automata Theory II (DFA = NFA, Regular Language)
Finite State Machines Data Structures and Algorithms for Information Processing 1.
Rosen 5th ed., ch. 11 Ref: Wikipedia
Finite-State Machines with No Output Longin Jan Latecki Temple University Based on Slides by Elsa L Gunter, NJIT, and by Costas Busch Costas Busch.
Finite-State Machines with No Output
1 Unit 1: Automata Theory and Formal Languages Readings 1, 2.2, 2.3.
PZ02B Programming Language design and Implementation -4th Edition Copyright©Prentice Hall, PZ02B - Regular grammars Programming Language Design.
Theory of Computation, Feodor F. Dragan, Kent State University 1 Regular expressions: definition An algebraic equivalent to finite automata. We can build.
Chapter 9. Chapter Summary Relations and Their Properties n-ary Relations and Their Applications (not currently included in overheads) Representing Relations.
Lecture 05: Theory of Automata:08 Kleene’s Theorem and NFA.
Chapter 9. Section 9.1 Binary Relations Definition: A binary relation R from a set A to a set B is a subset R ⊆ A × B. Example: Let A = { 0, 1,2 } and.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
CHAPTER 1 Regular Languages
Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.
Complexity and Computability Theory I Lecture #11 Instructor: Rina Zviel-Girshin Lea Epstein.
Chapter 9. Chapter Summary Relations and Their Properties n-ary Relations and Their Applications (not currently included in overheads) Representing Relations.
Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.
Finite State Machines 1.Finite state machines with output 2.Finite state machines with no output 3.DFA 4.NDFA.
MA/CSSE 474 Theory of Computation Minimizing DFSMs.
Modeling Computation: Finite State Machines without Output
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
BİL711 Natural Language Processing1 Regular Expressions & FSAs Any regular expression can be realized as a finite state automaton (FSA) There are two kinds.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
1 Chapter Pushdown Automata. 2 Section 12.2 Pushdown Automata A pushdown automaton (PDA) is a finite automaton with a stack that has stack operations.
1 Regular grammars Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
  An alphabet is any finite set of symbols.  Examples: ASCII, Unicode, {0,1} ( binary alphabet ), {a,b,c}. Alphabets.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Lexical analysis Finite Automata
Lecture2 Regular Language
Regular grammars Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Two issues in lexical analysis
Chapter 2 FINITE AUTOMATA.
Some slides by Elsa L Gunter, NJIT, and by Costas Busch
THEORY OF COMPUTATION Lecture One: Automata Theory Automata Theory.
Non-Deterministic Finite Automata
Non-Deterministic Finite Automata
Non Deterministic Automata
Introduction to Finite Automata
Finite Automata.
4b Lexical analysis Finite Automata
Deterministic Finite Automaton (DFA)
4b Lexical analysis Finite Automata
Regular grammars Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Regular grammars Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
What is it? The term "Automata" is derived from the Greek word "αὐτόματα" which means "self-acting". An automaton (Automata in plural) is an abstract self-propelled.
PZ02B - Regular grammars Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section PZ02B.
Presentation transcript:

Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising infinite languages? i.e. given a language description and a string, is there an algorithm which will answer yes or no correctly? We will define an abstract machine which takes a candidate string and produces the answer yes or no. The abstract machine will be the specification of the language.

Finite State Automata A finite state automaton is an abstract model of a simple machine (or computer). The machine can be in a finite number of states. It receives symbols as input, and the result of receiving a particular input in a particular state moves the machine to a specified new state. Certain states are finishing states, and if the machine is in one of those states when the input ends, it has ended successfully (or has accepted the input). Example: A a b b b a a a,b

FSA: Formal Definition A Finite State Automaton (FSA) is a 5-tuple (Q, I, F, T, E) where: Q = states = a finite set; I = initial states = a subset of Q; F = final states = a subset of Q; T = an alphabet; E = edges = a subset of Q (T + ) Q. FSA = labelled, directed graph =set of nodes (some final/initial) + each arc may have a label from an alphabet + directed arcs (arrows) between nodes. Example: formal definition of A 1 Q = {1, 2, 3, 4} I = {1} F = {4} T = {a, b} E = { (1,a,2), (1,b,4), (2,a,3), (2,b,4), (3,a,3), (3,b,3), (4,a,2), (4,b,4) } A1A a b b b a a a,b

Try going backwards Given: Q = {1, 2, 3, 4, 5} I = {1} F = {3} T = {a, c, h, m, t} E = { (1,c,2), (1,h,2), (1,m,2), (1,a,3), (1,h,5), (2,a,3), (3,t,4), (5,a,3) } Whats the diagram?

Definitions If (x,a,y) is an edge, x is its start state and y is its end state. A path is a sequence of edges such that the end state of one is the start state of the next. path p 1 = (2,b,4), (4,a,2), (2,a,3) A path is successful if the start state of the first edge is an initial state, and the end state of the last is a final state. path p 2 = (1,b,4),(4,a,2),(2,b,4),(4,b,4) The label of a path is the sequence of edge labels. label(p 1 ) = baa.

Definitions (cont.) A string is accepted by a FSA if it is the label of a successful path. Let A be a FSA. The language accepted by A is the set of strings accepted by A, denoted L(A). babb = label(p 2 ) is accepted by A 1. The language accepted by A 1 is the set of strings of a's and b's which end in a b, and in which no two a's are adjacent a b b b a a a,b

Non-Determinism We would like to use FSAs as a mechanism to recognise languages, simply following the edges as we input symbols from a string. But problem – non-deterministic. That is, we may have a choice. 1.From one state there could be a number of edges with the same label. have to remember possible branching points as we trace out a path, and investigate all branches. 2.Some of the edges could be labelled with, the empty string how does this affect our algorithm? 3.May be more than one initial state where do we start? a a a b b Not Algorithmic

Deterministic FSA A FSA is deterministic if: (i) there are no -labelled edges; (ii) for any pair of state and symbol (q,t), there is at most one edge (q,t, p); and (iii) there is only one initial state. Notation: DFSA and NDFSA stand for deterministic and non-deterministic FSA respectively. A DFSA is a specification of an algorithm for determining whether or not a string is a member of a given language. All three conditions must hold

Transition Functions The transition function of A is the function : (x, t) {y : edge (x, t, y) in A} If A = (Q,I,F,T,E), then the transition matrix for A is a matrix with one row for each state in Q, one column for each symbol in T, s.t. the entry in row x and column t is (x,t). Example: ab 1{2}{4} 2{3}{4} 3{3}{3} 4{2}{4} where x denotes an initial state and x denotes a finish state.

Recognition Algorithm Problem: Given a DFSA, A = (Q,I,F,T,E), and a string w, determine whether w L(A). Note: denote the current state by q, and the current input symbol t. Since A is deterministic, (q,t) will always be a singleton set or will be undefined. If it is undefined, denote it by ( Q). Algorithm: Add symbol # to end of w. q := initial state t := first symbol of w#. while (t # and q ) begin q := (q,t) t := next symbol of w# end return ((t == #) & (q F))

Equivalence of DFSA's and NDFSA's Theorem: DFSA = NDFSA Let L be a language. L is accepted by a NDFSA iff L is accepted by a DFSA Algorithm: NDFSA -> DFSA create unique initial state remove -edges remove edge choices There is a simple algorithm to create a DFSA equivalent (i.e. accepts same language) to a given NDFSA. The reverse direction is trivial.

Algorithms Algorithm: create unique initial state (informal) Input: (Q, I, F, T, E) Q := Q {i} (where i Q) for each q I, E:= E {(i,,q)} I := {i} a b a b a b 7 8

Algorithms Algorithm: remove -edges (informal) remove all single edges of form (q,,q) from E while there are cycles (paths starting and ending at the same state) of -edges in E begin select a cycle merge all states in path into a single state, keeping all edges into and out of path end while there are -edges in E begin select an -edge (p,,q) for each edge (q,t,r) add edge (p,t,r) if q F, add p to F remove (p,,q) from E end

Removing Edge Choices Algorithm: remove edge choices (subset construction) Input: (Q,I,F,T,E) I' := {I}F' := {}E' := {}S := {I}Q' : = {I} while S is not empty begin select X S for each t T begin S' = {q Q:(p,t,q) E, for some p X} if S' F {}, then F' := F' {S'} if S' {}, then E' := E' {(X,t,S')} S := S {S'}\Q' Q' := Q' {S'} end S := S\{X} end return (Q',I',F',T,E') a,b ab (Q = {1,2,3}, I = {1}, F={3}, T={a,b}, E = {(1,a,1),(1,a,2),(1,b,1), (2,b,3),(3,a,3),(3,b,3)}) REMEMBER S is a set of sets!

Removing Edge Choices I' F' E' S Q' X t S' {1} ({1},a,{1,2}) ({1},b,{1}) ({1,2},a,{1,2}) ({1,2},b,{1,3}) ({1,3),a,{1,2,3}) ({1,3},b,{1,3}) ({1,2,3},a,{1,2,3}) ({1,2,3},b,{1,3}) {1} {1,2} -{1} {1,3} -{1,2} {1,2,3} -{1,3} -{1,2,3} {1} {1,2} {1,3} {1,2,3} {1} {1,2} {1,3} {1,2,3} abababababababab {1,2} {1} {1,2} {1,3} {1,2,3} {1,3} {1,2,3} {1,3} {1,2,3} a 1 1,2 1,2,3 b ab a 1,3 b b a REMEMBER S is a set of sets!

Minimum Size of FSA's Let A = (Q,I,F,T,E). For any two strings, x, y in T*, x and y are distinguishable w.r.t. A if there is a string z T* s.t. exactly one of xz and yz are in L(A). z distinguishes x and y w.r.t. A. This means that with x and y as input, A must end in different states - A has to distinguish x and y in order to give the right results for xz and yz. This is used to prove the next result: Theorem: L T*. If, for some integer n, there are n elements of T* s.t. any two are distinguishable w.r.t. A, then any FSA that recognises L must have at least n states.

Unique Minimal DFSA's Theorem: Minimal DFSA For a given language L, there exists a minimal DFSA accepting L, and it is unique. Algorithm: DFSA => minimal DFSA Input: (Q,I,F,T,E) R := { } for all edges (p,t,q) E do add (q,t,p) to R remove edge choices from (Q,I,F,T,R) to get (Q',I',F',T',E') Z := equivalent_states(Q, Q') (Q'',I'',F'',T,E'') := merge(Z,Q,I,F,T,E) return (Q'',I'',F'',T,E'') a b a b 3 a,b b a b b a b a 3,4 4 a,b b

Equivalent States Algorithm: equivalent_states Input: Q, Q' set all cells of M to t for all p in Q do for all sets S in Q' do if p is in S then for each q in Q do if q not in S then M[p][q] := f else for each q in Q do if q in S then M[p][q] := f S := { } for all p in Q do Z := { } if M[p][ ] = t then for each q in Q do if M[p][q] = t then add q to Z M[q] [ ] := f add Z to S return S Note: M is a 2d array, indexed by Q and { } Q

merging equivalent states Algorithm: merge Input: Z,(Q,I,F,T,E) for all S in Z do select p in S for all q in S s.t. q p do delete q from Q for each edge (q,t,w) in E do delete (q,t,w) from E if (p,t,w) not in E then add (p,t,w) to E for each edge (w,t,q) in E do delete (w,t,q) from E if (w,t,p) not in E then add (w,t,p) to E if q in I then delete q from I if p not in I then add p to I if q in F then delete q from F if p not in F then add p to F return (Q,I,F,T,E)

Example Minimisation a b a b 3 a,b b a b b reversed the set Q from remove edge choices is: { {3,5}, {2,3,5}, {2,3,4,5}, {1,2,3,4,5} } a b a b 3 a,b b a b b tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt