Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Finite state automaton (FSA) LING 570 Fei Xia Week 2: 10/07/09 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA.

Similar presentations


Presentation on theme: "1 Finite state automaton (FSA) LING 570 Fei Xia Week 2: 10/07/09 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA."— Presentation transcript:

1 1 Finite state automaton (FSA) LING 570 Fei Xia Week 2: 10/07/09 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA A

2 2 FSA / FST It is one of the most important techniques in NLP. Multiple FSAs/FSTs can be combined to form a larger, more powerful FSAs/FSTs. Any regular language can be recognized by an FSA. Any regular relation can be recognized by an FST.

3 3 FST Toolkits AT&T: http://www.research.att.com/~fsmtools/fsm/man. html http://www.research.att.com/~fsmtools/fsm/man. html NLTK: http://nltk.sf.net/docs.htmlhttp://nltk.sf.net/docs.html ISI: Carmel …

4 4 Outline Deterministic FSA (DFA) Non-deterministic FSA (NFA) Probabilistic FSA (PFA) Weighted FSA (WFA)

5 5 DFA

6 6 Definition of DFA An automaton is a 5-tuple = An alphabet input symbols A finite set of states Q A start state q 0 A set of final states F A transition function:

7 7 a b b q0q0 q1q1 = {a, b} S = {q 0, q 1 } F = {q 1 } = { q 0 £ a ! q 0, q 0 £ b ! q 1, q 1 £ b ! q 1 } What about q 1 £ a ?

8 8 Representing an FSA as a directed graph The vertices denote states: –Final states are represented as two concentric circles. The transitions forms the edges. The edges are labeled with symbols.

9 9 An example a b b q0q0 q1q1 q2q2 a b a a b b a a q 0 q 0 q 1 q 1 q 2 q 0 a b b a b q 0 q 0 q 1 q 1 q 2 q 1

10 10 DFA as an acceptor A string is said to be accepted by an FSA if the FSA is in a final state when it stops working. –that is, there is a path from the initial state to a final state which yields the string. –Ex: does the FSA accept “abab”? The set of the strings that can be accepted by an FSA is called the language accepted by the FSA.

11 11 An example Regular expression: a* b + a b b q0q0 q1q1 q 0  a q 0 q 0  b q 1 q 1  b q 1 q 1  ² FSA: Regular grammar: Regular language: {b, ab, bb, aab, abb, …}

12 12 NFA

13 13 NFA A transition can lead to more than one state. There could be multiple start states. Transitions can be labeled with ², meaning states can be reached without reading any input.  now the transition function is:

14 14 NFA example b q0q0 q1q1 q2q2 b b a b a b b a b b q 0 q 0 q 1 q 1 q 2 q 1 b b a b b q 0 q 1 q 2 q 0 q 0 q 1 q 0 q 1 q 2 q 0 q 1 q 2 q 0 q 0 q 0

15 15 Regular grammar and FSA Regular grammar: FSA: Conversion between them  Q1 in Hw2

16 16 Relation between DFA and NFA DFA and NFA are equivalent. The conversion from NFA to DFA: –Create a new state for each equivalent class in NFA –The max number of states in DFA is 2 N, where N is the number of states in NFA. Why do we need both?

17 17 Common algorithms for FSA packages Converting regular expressions to NFAs Converting NFAs to regular expressions Determinization: converting NFA to DFA Other useful closure properties: union, concatenation, Kleene closure, intersection

18 18 So far A DFA is a 5-tuple: A NFA is a 5-tuple: DFA and NFA are equivalent. Any regular language can be recognized by an FSA. –Reg lang  Regex  NFA  DFA  Reg grammar

19 19 Outline Deterministic finite state automata (DFA) Non-deterministic finite state automata (NFA) Probabilistic finite state automata (PFA) Weighted Finite state automata (WFA)

20 20 An example of PFA q 0 :0 q 1 :0.2 b:0.8 a:1.0 I(q 0 )=1.0 I(q 1 )=0.0 P(ab n )=I(q 0 )*P(q 0,ab n,q 1 )*F(q 1 ) =1.0*(1.0*0.8 n )*0.2 F(q 0 )=0 F(q 1 )=0.2

21 21 Formal definition of PFA A PFA is Q: a finite set of N states Σ: a finite set of input symbols I: Q  R + (initial-state probabilities) F: Q  R + (final-state probabilities) : the transition relation between states. P: (transition probabilities)

22 22 Constraints on function: Probability of a string:

23 23 PFA Informally, in a PFA, each arc is associated with a probability. The probability of a path is the multiplication of the arcs on the path. The probability of a string x is the sum of the probabilities of all the paths for x. Tasks: –Given a string x, find the best path for x. –Given a string x, find the probability of x in a PFA. –Find the string with the highest probability in a PFA –…–…

24 24 Weighted finite-state automata (WFA) Each arc is associated with a weight. “Sum” and “Multiplication” can have other meanings. Ex: weight is –log prob - “multiplication”  addition - “Sum”  power

25 25 Summary DFA and NFA are 5-tuple: –They are equivalent –Algorithm for constructing NFAs for Regexps PFA and WFA are 6-tuple: Existing packages for FSA/FSM algorithms: –Ex: intersection, union, Kleene closure, difference, complementation, …

26 Additional slides 26

27 27 An algorithm for deterministic recognition of DFAs

28 28 Definition of regular expression The set of regular expressions is defined as follows: (1) Every symbol of is a regular expression (2) ² is a regular expression (3) If r 1 and r 2 are regular expressions, so are (r 1 ), r 1 r 2, r 1 | r 2, r 1 * (4) Nothing else is a regular expression.

29 29 Regular expression  NFA Base case: Concatenation: connecting the final states of FSA 1 to the initial state of FSA 2 by an ² -translation. Union: Creating a new initial state and add ² -transitions from it to the initial states of FSA 1 and FSA 2. Kleene closure:

30 30 Regular expression  NFA (cont) An example: \d+(\.\d+)?(e\-?\d+)? Kleene closure:

31 31 Another PFA example: A bigram language model P( BOS w 1 w 2 … w n EOS) = P(BOS) * P(w 1 | BOS) P(w 2 | w 1 ) * …. P(w n | w n-1 ) * P(EOS | w n ) Examples: I bought two/to/too books How many states?


Download ppt "1 Finite state automaton (FSA) LING 570 Fei Xia Week 2: 10/07/09 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA."

Similar presentations


Ads by Google