Finite state automaton (FSA)

Slides:



Advertisements
Similar presentations
CSE 311 Foundations of Computing I
Advertisements

4b Lexical analysis Finite Automata
CSC 361NFA vs. DFA1. CSC 361NFA vs. DFA2 NFAs vs. DFAs NFAs can be constructed from DFAs using transitions: Called NFA- Suppose M 1 accepts L 1, M 2 accepts.
Nondeterministic Finite Automata CS 130: Theory of Computation HMU textbook, Chapter 2 (Sec 2.3 & 2.5)
CPSC Compiler Tutorial 4 Midterm Review. Deterministic Finite Automata (DFA) Q: finite set of states Σ: finite set of “letters” (input alphabet)
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
YES-NO machines Finite State Automata as language recognizers.
Chapter Section Section Summary Set of Strings Finite-State Automata Language Recognition by Finite-State Machines Designing Finite-State.
Finite Automata Great Theoretical Ideas In Computer Science Anupam Gupta Danny Sleator CS Fall 2010 Lecture 20Oct 28, 2010Carnegie Mellon University.
Finite-State Automata Shallow Processing Techniques for NLP Ling570 October 5, 2011.
1 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY (For next time: Read Chapter 1.3 of the book)
CS5371 Theory of Computation
FSA and HMM LING 572 Fei Xia 1/5/06.
Lecture 3 Goals: Formal definition of NFA, acceptance of a string by an NFA, computation tree associated with a string. Algorithm to convert an NFA to.
Languages, grammars, and regular expressions
1 Single Final State for NFAs and DFAs. 2 Observation Any Finite Automaton (NFA or DFA) can be converted to an equivalent NFA with a single final state.
Fall 2005 CSE 467/567 1 Formal languages regular expressions regular languages finite state machines.
1 Finite state automaton (FSA) LING 570 Fei Xia Week 2: 10/07/09 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA.
Lecture 3 Goals: Formal definition of NFA, acceptance of a string by an NFA, computation tree associated with a string. Algorithm to convert an NFA to.
1 Languages, grammars, and regular expressions LING 570 Fei Xia Week 2: 10/03/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete.
Topics Automata Theory Grammars and Languages Complexities
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
CSC 361Finite Automata1. CSC 361Finite Automata2 Formal Specification of Languages Generators Grammars Context-free Regular Regular Expressions Recognizers.
CS Chapter 2. LanguageMachineGrammar RegularFinite AutomatonRegular Expression, Regular Grammar Context-FreePushdown AutomatonContext-Free Grammar.
FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY
Languages and Machines Unit two: Regular languages and Finite State Automata.
CSE 311: Foundations of Computing Fall 2014 Lecture 23: State Minimization, NFAs.
CPSC 388 – Compiler Design and Construction
Regular Languages A language is regular over  if it can be built from ;, {  }, and { a } for every a 2 , using operators union ( [ ), concatenation.
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
::ICS 804:: Theory of Computation - Ibrahim Otieno SCI/ICT Building Rm. G15.
Nondeterministic Finite Automata CS 130: Theory of Computation HMU textbook, Chapter 2 (Sec 2.3 & 2.5)
Theory of Languages and Automata
Theory of Computation, Feodor F. Dragan, Kent State University 1 Regular expressions: definition An algebraic equivalent to finite automata. We can build.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Lexical Analyzer (Checker)
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
CHAPTER 1 Regular Languages
CMSC 330: Organization of Programming Languages Finite Automata NFAs  DFAs.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
CMSC 330: Organization of Programming Languages Theory of Regular Expressions Finite Automata.
INHERENT LIMITATIONS OF COMPUTER PROGAMS CSci 4011.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
Finite Automata & Regular Languages Sipser, Chapter 1.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2007.
Lecture Notes 
Algorithms for hard problems Automata and tree automata Juris Viksna, 2015.
Chapter 5 Finite Automata Finite State Automata n Capable of recognizing numerous symbol patterns, the class of regular languages n Suitable for.
1 Language Recognition (11.4) Longin Jan Latecki Temple University Based on slides by Costas Busch from the courseCostas Busch
BİL711 Natural Language Processing1 Regular Expressions & FSAs Any regular expression can be realized as a finite state automaton (FSA) There are two kinds.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
Conversions Regular Expression to FA FA to Regular Expression.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
1 Section 11.2 Finite Automata Can a machine(i.e., algorithm) recognize a regular language? Yes! Deterministic Finite Automata A deterministic finite automaton.
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
1 Finite Automata. 2 Introductory Example An automaton that accepts all legal Pascal identifiers: Letter Digit Letter or Digit "yes" "no" 2.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
Lexical analysis Finite Automata
Two issues in lexical analysis
Chapter 2 FINITE AUTOMATA.
REGULAR LANGUAGES AND REGULAR GRAMMARS
Finite Automata.
4b Lexical analysis Finite Automata
Chapter Five: Nondeterministic Finite Automata
4b Lexical analysis Finite Automata
Chapter 1 Regular Language
Finite-State Machines with No Output
Presentation transcript:

Finite state automaton (FSA) LING 570 Fei Xia Week 3: 10/8/2007 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAA

Hw1 Need to test your code on patas Windows carriage return vs. unix newline Your machine could yield different results Compile your code and include the binary Include the shell scripts Make sure that your code does not crash Set the executable bit in *.(sh|pl|py|…) Including path names in the code could be a problem. See Bill’s GoPost message: “how to make sure that we can run your code”

Hw1 (cont) Whitespace means \s+ Your code should be able to handle empty lines, etc. make_voc.* runs slowly  use a hash table It is really important to follow the instructions: Ex: Retain the same line break Ex: the output of make_voc.sh Should be of 1152 instead > of 1152 1: (‘of’, 1152)

Hw1 (cont) Grading criteria: Grades: 20 students Incorrect input / output format: -2 Runtime error (e.g., it does not process entire file): -10  -5 for hw1 Missing shell script: -2 No binary, Unix-incompatible: 0 only for this time. Later it will be treated as a runtime error. Grades: 20 students Average: 57.9 Median: 60 (12 got 60) Time spent on the homework: 13 students Average: 12.2 hours Median: 10.5 hours

Hw2 Any questions? n >= 0

From last time

Formal grammar A formal grammar is a 4-tuple: Grammars generate languages. Chomsky Hierarchy: Unrestricted, context sensitive, context free, regular There are other types of grammars. Human languages are beyond context-free.

Formal language A language is a set of strings. Regular language: defined by recursion They can be generated by regular grammars They can be expressed by regular expressions Given a regular language, grammar, or expression, how can we tell whether a string belongs to a language?  creating an FSA as an acceptor

Finite state automaton (FSA)

FSA / FST It is one of the most important techniques in NLP. Multiple FSTs can be combined to form a larger, more powerful FST. Any regular language can be recognized by an FSA. Any regular relation can be recognized by an FST.

FST Toolkits AT&T: http://www.research.att.com/~fsmtools/fsm/man.html NLTK: http://nltk.sf.net/docs.html ISI: Carmel …

Outline Deterministic FSA (DFA) Non-deterministic FSA (NFA) Probabilistic FSA (PFA) Weighted FSA (WFA)

DFA

Definition of DFA An automaton is a 5-tuple = An alphabet input symbols A finite set of states Q A start state q0 A set of final states F A transition function:

= {a, b} S = {q0, q1} F = {q1} q0 £ b ! q1, q1 £ b ! q1 } a b q1 q0 = { q0 £ a ! q0, q0 £ b ! q1, q1 £ b ! q1 } a b q0 q1 What about q1 £ a ?

Representing an FSA as a directed graph The vertices denote states: Final states are represented as two concentric circles. The transitions forms the edges. The edges are labeled with symbols.

An example a b b q1 q0 b a q2 a a b b a a a b b a b q0 q0 q1 q1 q2 q1

DFA as an acceptor A string is said to be accepted by an FSA if the FSA is in a final state when it stops working. that is, there is a path from the initial state to a final state which yields the string. Ex: does the FSA accept “abab”? The set of the strings that can be accepted by an FSA is called the language accepted by the FSA.

An algorithm for deterministic recognition of DFAs

An example a b q0 q1 FSA: Regular language: {b, ab, bb, aab, abb, …} Regular expression: a* b+ Regular grammar: q0  a q0 q0  b q1 q1  b q1 q1  ²

NFA

NFA A transition can lead to more than one state. There could be multiple start states. Transitions can be labeled with ², meaning states can be reached without reading any input.  now the transition function is:

NFA example b q0 q1 q2 a a b b a b b b b a b b q0 q0 q1 q1 q2 q1

Definition of regular expression The set of regular expressions is defined as follows: (1) Every symbol of is a regular expression (2) ² is a regular expression (3) If r1 and r2 are regular expressions, so are (r1), r1 r2, r1 | r2 , r1* (4) Nothing else is a regular expression.

Regular expression  NFA Base case: Concatenation: connecting the final states of FSA1 to the initial state of FSA2 by an ²-translation. Union: Creating a new initial state and add ²-transitions from it to the initial states of FSA1 and FSA2. Kleene closure:

Regular expression  NFA (cont) Kleene closure: An example: \d+(\.\d+)?(e\-?\d+)?

Regular grammar and FSA Conversion between them  Hw3

Relation between DFA and NFA DFA and NFA are equivalent. The conversion from NFA to DFA: Create a new state for each equivalent class in NFA The max number of states in DFA is 2N, where N is the number of states in NFA. Why do we need both?

Common algorithms for FSA packages Converting regular expressions to NFAs Converting NFAs to regular expressions Determinization: converting NFA to DFA Other useful closure properties: union, concatenation, Kleene closure, intersection

So far A DFA is a 5-tuple: A NFA is a 5-tuple: DFA and NFA are equivalent. Any regular language can be recognized by an FSA. Regular language  Regex  NFA  DFA  Regular grammar

Outline Deterministic finite state automata (DFA) Non-deterministic finite state automata (NFA) Probabilistic finite state automata (PFA) Weighted Finite state automata (WFA)

An example of PFA F(q0)=0 F(q1)=0.2 I(q0)=1.0 I(q1)=0.0 b:0.8 a:1.0 I(q0)=1.0 I(q1)=0.0 P(abn)=I(q0)*P(q0,abn,q1)*F(q1) =1.0*1.0*0.8n*0.2

Formal definition of PFA A PFA is Q: a finite set of N states Σ: a finite set of input symbols I: Q R+ (initial-state probabilities) F: Q R+ (final-state probabilities) : the transition relation between states. P: (transition probabilities)

Constraints on function: Probability of a string:

PFA Informally, in a PFA, each arc is associated with a probability. The probability of a path is the multiplication of the arcs on the path. The probability of a string x is the sum of the probabilities of all the paths for x. Tasks: Given a string x, find the best path for x. Given a string x, find the probability of x in a PFA. Find the string with the highest probability in a PFA …

Another PFA example: A bigram language model P( BOS w1 w2 … wn EOS) = P(BOS) * P(w1 | BOS) P(w2 | w1) * …. P(wn | wn-1) * P(EOS | wn) Examples: I bought two/to/too books How many states?

Weighted finite-state automata (WFA) Each arc is associated with a weight. “Sum” and “Multiplication” can have other meanings. Ex: weight is –log prob - “multiplication”  addition - “Sum”  power

Summary DFA and NFA are 5-tuple: PFA and WFA are 6-tuple: They are equivalent Algorithm for constructing NFAs for Regexps PFA and WFA are 6-tuple: Existing packages for FSA/FSM algorithms: Ex: intersection, union, Kleene closure, difference, complementation, …