# FSA and HMM (LING 572, Fei Xia, 1/5/06)


Outline
- FSA
- HMM
- Relation between FSA and HMM

FSA

Definition of FSA An FSA is a tuple (Q, Σ, I, F, δ):
- Q: a finite set of states
- Σ: a finite set of input symbols
- I ⊆ Q: the set of initial states
- F ⊆ Q: the set of final states
- δ ⊆ Q × (Σ ∪ {ε}) × Q: the transition relation between states

An example of FSA [figure: two states q0 and q1, with arcs labeled a and b]
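An FSA like the one in the figure can be simulated directly. A minimal sketch in Python, assuming (since the figure is ambiguous) that the arcs are q0 --a--> q1 and q1 --b--> q0, with q0 both initial and final:

```python
# Minimal FSA-acceptance sketch. The arcs are a guess at the slide's figure:
# q0 --a--> q1, q1 --b--> q0, with I = {q0} and F = {q0}, accepting (ab)*.
def accepts(delta, initial, final, s):
    """Track the set of reachable states (this handles nondeterminism for free)."""
    states = set(initial)
    for sym in s:
        states = {q2 for q in states for (a, q2) in delta.get(q, []) if a == sym}
    return bool(states & final)

delta = {"q0": [("a", "q1")], "q1": [("b", "q0")]}
print(accepts(delta, {"q0"}, {"q0"}, "abab"))   # True: a string in (ab)*
print(accepts(delta, {"q0"}, {"q0"}, "a"))      # False: ends in q1, not final
```

Representing δ as a state-indexed list of (symbol, next-state) pairs keeps the sketch close to the relational definition above.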

Definition of FST An FST is a tuple (Q, Σ, Γ, I, F, δ):
- Q: a finite set of states
- Σ: a finite set of input symbols
- Γ: a finite set of output symbols
- I ⊆ Q: the set of initial states
- F ⊆ Q: the set of final states
- δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q: the transition relation between states
→ An FSA can be seen as a special case of an FST.

The extended transition relation δ* is the smallest set such that:
- (q, ε, ε, q) ∈ δ* for every q ∈ Q;
- if (q, x, y, q′) ∈ δ* and (q′, a, b, q″) ∈ δ, then (q, xa, yb, q″) ∈ δ*.
T transduces a string x into a string y if there exists a path from an initial state to a final state whose input is x and whose output is y: that is, (q, x, y, q′) ∈ δ* for some q ∈ I and q′ ∈ F.

An example of FST [figure: states q0 and q1, with arcs labeled a:x and b:y]

Operations on FSTs
- Union: T1 ∪ T2
- Concatenation: T1 T2
- Composition: T1 ∘ T2

An example of the composition operation [figures: T1 with states q0, q1 and arcs a:x, b:y; T2 with states q0, q1 and arcs x:ε, y:z]
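The effect of composition can be checked by feeding one transducer's output into the other. A sketch, assuming (as a guess at the two figures) that T1 has arcs q0 --a:x--> q1 and q1 --b:y--> q0, T2 has arcs q0 --x:ε--> q1 and q1 --y:z--> q0, and q0 is initial and final in both:

```python
# Simulating composition by chaining transductions. Arc layout is a guess at
# the slide's figures; both machines are deterministic, which keeps this short.
def transduce(arcs, initial, final, s):
    """Follow in:out arcs from the initial state; return the output, or None."""
    q, out = initial, []
    for sym in s:
        if (q, sym) not in arcs:
            return None
        out_sym, q = arcs[(q, sym)]
        if out_sym:                      # an ε output is the empty string
            out.append(out_sym)
    return "".join(out) if q in final else None

t1 = {("q0", "a"): ("x", "q1"), ("q1", "b"): ("y", "q0")}   # a:x, b:y
t2 = {("q0", "x"): ("", "q1"), ("q1", "y"): ("z", "q0")}    # x:ε, y:z

mid = transduce(t1, "q0", {"q0"}, "ab")      # T1 maps "ab" to "xy"
print(transduce(t2, "q0", {"q0"}, mid))      # T1 ∘ T2 maps "ab" to "z"
```

Running T2 on T1's output gives the same input/output pairs as the composed transducer would accept directly.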

Probabilistic finite-state automata (PFA)
Informally, in a PFA each arc is associated with a probability. The probability of a path is the product of the probabilities of the arcs on the path. The probability of a string x is the sum of the probabilities of all the paths for x. Tasks:
- Given a string x, find the best path for x.
- Given a string x, find the probability of x in a PFA.
- Find the string with the highest probability in a PFA.

Formal definition of PFA
A PFA is a tuple (Q, Σ, δ, I, F, P):
- Q: a finite set of N states
- Σ: a finite set of input symbols
- δ ⊆ Q × Σ × Q: the transition relation between states
- I: Q → R+ (initial-state probabilities)
- F: Q → R+ (final-state probabilities)
- P: δ → R+ (transition probabilities)

Constraints on the functions:
- Σ_{q ∈ Q} I(q) = 1
- For every state q: F(q) + Σ_{a, q′} P(q, a, q′) = 1
Probability of a string: P(x) = the sum, over all paths for x, of I(first state) × (the product of P over the arcs on the path) × F(last state).

Consistency of a PFA Let A be a PFA.
- Def: P(x | A) = the sum of the probabilities of all the valid paths for x in A.
- Def: a valid path in A is a path for some string x with probability greater than 0.
- Def: A is called consistent if Σ_x P(x | A) = 1.
- Def: a state of a PFA is useful if it appears in at least one valid path.
- Proposition: a PFA is consistent if all its states are useful. → Q1 of Hw1

An example of PFA [figure: arcs q0 --a:1.0--> q1 and a loop q1 --b:0.8--> q1; I(q0)=1.0, I(q1)=0.0, F(q1)=0.2] P(ab^n) = 0.2 × 0.8^n
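The claim P(ab^n) = 0.2 × 0.8^n can be checked numerically. A sketch of string probability in a PFA, assuming (from the figure) the arcs q0 --a:1.0--> q1 and q1 --b:0.8--> q1 with I(q0)=1.0 and F(q1)=0.2:

```python
# String probability in a PFA: sum over all paths of
# I(start) * (product of arc probabilities) * F(end).
def string_prob(arcs, I, F, s):
    """weights[q] = total probability mass of paths reading the prefix so far."""
    weights = dict(I)
    for sym in s:
        new = {}
        for q, w in weights.items():
            for (a, p, q2) in arcs.get(q, []):
                if a == sym:
                    new[q2] = new.get(q2, 0.0) + w * p
        weights = new
    return sum(w * F.get(q, 0.0) for q, w in weights.items())

arcs = {"q0": [("a", 1.0, "q1")], "q1": [("b", 0.8, "q1")]}
I, F = {"q0": 1.0}, {"q1": 0.2}
print(string_prob(arcs, I, F, "abb"))   # 0.2 * 0.8**2 = 0.128
```

Note that F(q1) + P(q1 --b--> q1) = 0.2 + 0.8 = 1, so the constraints above are satisfied.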

Weighted finite-state automata (WFA)
Each arc is associated with a weight, and "sum" and "multiplication" can be given other meanings; probabilities are just one choice of weights.

HMM

Two types of HMMs
- State-emission HMM (Moore machine): the emission probability depends only on the state (from-state or to-state).
- Arc-emission HMM (Mealy machine): the emission probability depends on the (from-state, to-state) pair.

State-emission HMM [figure: states s1, s2, ..., sN, each emitting symbols such as w1, w3, w4, w5]
Two kinds of parameters:
- Transition probability: P(sj | si)
- Output (emission) probability: P(wk | si)
→ # of parameters: O(NM + N²)

Arc-emission HMM [figure: states s1, s2, ..., sN, with symbols such as w1, ..., w5 emitted on the arcs]
Same kinds of parameters, but the emission probabilities depend on both states: P(wk, sj | si) → # of parameters: O(N²M + N²)

Are the two types of HMMs equivalent?
For each state-emission HMM1 there is an arc-emission HMM2 such that, for any sequence O, P(O | HMM1) = P(O | HMM2). The reverse is also true. → Q3 and Q4 of hw1

Definition of arc-emission HMM
An HMM is a tuple (S, Σ, π, A, B):
- A set of states S = {s1, s2, ..., sN}.
- A set of output symbols Σ = {w1, ..., wM}.
- Initial state probabilities π = {πi}.
- State transition probabilities: A = {aij}.
- Symbol emission probabilities: B = {bijk}.
State sequence: X1,n. Output sequence: O1,n.

Constraints For any integer n and any HMM:
- Σi πi = 1
- For each state si: Σj aij = 1
- For each state pair (si, sj): Σk bijk = 1
- Consequently, Σ over all O1,n of P(O1,n) = 1. → Q2 of hw1

Properties of HMM Limited horizon: P(Xt+1 = sj | X1,t) = P(Xt+1 = sj | Xt).
Time invariance: the probabilities do not change over time: P(Xt+1 = sj | Xt = si) = aij for all t. The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don't know which state sequence generated a particular output.

Applications of HMM
- N-gram POS tagging:
  - Bigram tagger: oi is a word, and si is a POS tag.
  - Trigram tagger: oi is a word, and si is ??
- Other tagging problems: word segmentation, chunking, NE tagging, punctuation prediction
- Other applications: ASR, ...

Three fundamental questions for HMMs
- Finding the probability of an observation
- Finding the best state sequence
- Training: estimating parameters

(1) Finding the probability of the observation
Forward probability: the probability of producing O1,t-1 while ending up in state si: αi(t) = P(O1,t-1, Xt = si)

Calculating forward probability
- Initialization: αi(1) = πi
- Induction: αj(t+1) = Σi αi(t) aij bijot
- Total: P(O1,T) = Σi αi(T+1)
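The recursion above can be sketched in a few lines. This version is written for a state-emission HMM (the slide's formulas are for the arc-emission case; the state-emission form is shorter and the idea is the same), and all probabilities are made-up illustration values:

```python
# Hedged sketch of the forward algorithm for a state-emission HMM.
def forward(pi, A, B, obs):
    """pi[i]: initial; A[i][j]: transition; B[i][o]: emission. Returns P(obs)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]            # initialization
    for o in obs[1:]:                                           # induction
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)                                           # sum over final states

pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [{"x": 0.5, "y": 0.5}, {"x": 0.1, "y": 0.9}]
print(forward(pi, A, B, ["x", "y"]))   # 0.175 + 0.135 = 0.31
```

The trellis needs only the previous column of α values, so the computation is O(N²T) time and O(N) space.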

(2) Finding the best state sequence
Given the observation O1,T = o1...oT, find the state sequence X1,T+1 = X1 ... XT+1 that maximizes P(X1,T+1 | O1,T). → Viterbi algorithm [figure: hidden states X1, X2, ..., XT+1 emitting o1, o2, ..., oT]

Viterbi algorithm The probability of the best path that produces O1,t-1 while ending up in state si: δi(t) = max over X1,t-1 of P(X1,t-1, O1,t-1, Xt = si)
- Initialization: δi(1) = πi
- Induction: δj(t+1) = maxi δi(t) aij bijot, keeping a back-pointer to the maximizing i
Modify it to allow epsilon emission: → Q5 of hw1
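A hedged sketch of Viterbi decoding, again in state-emission form to keep it short (the slide's formulas are arc-emission, and the epsilon-emission extension from Q5 is not handled here); the probabilities are made-up illustration values:

```python
# Viterbi decoding sketch: replace the forward algorithm's sum with max
# and keep back-pointers to recover the best state sequence.
def viterbi(pi, A, B, obs):
    """Return the most probable state sequence (as state indices) for obs."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]        # initialization
    back = []
    for o in obs[1:]:                                       # induction
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] * A[i][j])
            ptr.append(best)
            delta.append(prev[best] * A[best][j] * B[j][o])
        back.append(ptr)
    state = max(range(n), key=lambda i: delta[i])           # best final state
    path = [state]
    for ptr in reversed(back):                              # follow back-pointers
        state = ptr[state]
        path.append(state)
    return path[::-1]

pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [{"x": 0.5, "y": 0.5}, {"x": 0.1, "y": 0.9}]
print(viterbi(pi, A, B, ["x", "y", "y"]))   # [0, 1, 1]
```

In practice the products are usually computed as sums of log probabilities to avoid underflow on long sequences.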

Summary of HMM
- Two types of HMMs: state-emission and arc-emission
- Properties: Markov assumption
- Applications: POS tagging, etc.
- Finding the probability of an observation: forward probability
- Decoding: Viterbi decoding

Relation between FSA and HMM

Relation between WFA and HMM
An HMM can be seen as a special type of WFA. Given an HMM, how do we build an equivalent WFA?

Converting HMM into WFA
Given an HMM, build a WFA such that, for any input sequence O, P(O | HMM) = P(O | WFA):
- Build a WFA: add a final state and arcs to it.
- Show that there is a one-to-one mapping between the paths in the HMM and the paths in the WFA.
- Prove that the probabilities in the HMM and in the WFA are identical.
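The steps above can be illustrated concretely. In this sketch (all probabilities are made-up illustration values), each arc-emission HMM arc si → sj emitting wk becomes a WFA arc with weight aij × bijk, and a new final state qf is added with a weight-1 edge from every state; a brute-force path sum then confirms P(O | HMM) = P(O | WFA):

```python
# HMM -> WFA construction sketch. Note each state's outgoing weights now sum
# to 2 (1 for the HMM arcs plus 1 to qf), so the result is a WFA, not a PFA.
pi = [1.0, 0.0]
A  = [[0.4, 0.6], [1.0, 0.0]]
B  = [[{"x": 1.0}, {"y": 1.0}], [{"x": 1.0}, {}]]   # bijk, arc-emission

def hmm_prob(obs):
    """Direct sum over state paths: P(O) = sum over paths of pi * prod aij*bijk."""
    paths = [[i] for i in range(2)]
    for _ in obs:
        paths = [p + [j] for p in paths for j in range(2)]
    total = 0.0
    for p in paths:
        w = pi[p[0]]
        for t, o in enumerate(obs):
            w *= A[p[t]][p[t + 1]] * B[p[t]][p[t + 1]].get(o, 0.0)
        total += w
    return total

# WFA arcs: (from, symbol, weight, to); qf is the added final state.
wfa = [(i, o, A[i][j] * p, j) for i in range(2) for j in range(2)
       for o, p in B[i][j].items()] + [(i, None, 1.0, "qf") for i in range(2)]

def wfa_weight(obs):
    weights = {0: pi[0], 1: pi[1]}
    for o in obs:
        new = {}
        for q, w in weights.items():
            for (q1, a, p, q2) in wfa:
                if q1 == q and a == o:
                    new[q2] = new.get(q2, 0.0) + w * p
        weights = new
    # finish by taking the weight-1 epsilon arc into the final state qf
    return sum(w * p for q, w in weights.items()
               for (q1, a, p, q2) in wfa if q1 == q and a is None and q2 == "qf")

print(hmm_prob(["x", "y"]), wfa_weight(["x", "y"]))
```

The one-to-one path mapping is visible in the construction: every HMM path extends to exactly one WFA path ending in qf, with the same weight.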

HMM  WFA  The WFA is not a PFA.
Need to create a new state (the final state) and add edges to it.  The WFA is not a PFA.

A slightly different definition of HMM
An HMM is a tuple (S, Σ, π, A, B, qf):
- A set of states S = {s1, s2, ..., sN}.
- A set of output symbols Σ = {w1, ..., wM}.
- Initial state probabilities π = {πi}.
- State transition probabilities: A = {aij}.
- Symbol emission probabilities: B = {bijk}.
- qf is the final state: there are no outgoing edges from qf.

Constraints For any HMM (under this new definition): the sum of P(O) over all output sequences O, of any length, is 1.

HMM  PFA

PFA  HMM Need to add a new final state and edges to it

Project: Part 1 Learn to use Carmel (a WFST package).
Use Carmel as an HMM Viterbi decoder for a trigram POS tagger. The instructions will be handed out on 1/12, and the project is due on 1/19.

Summary
- FSA
- HMM
- Relation between FSA and HMM:
  - An HMM (under the common definition) is a special case of a WFA.
  - An HMM (under a slightly different definition) is equivalent to a PFA.