Input / Output
An HMM is a statistical model that describes a probability distribution over a number of possible sequences.
Input: a sequence of feature vectors
Output: the words with the highest probability of having been spoken
Given a sequence of feature vectors, what words are most probably meant?
Basics State transition probability matrix –States –State transition probabilities –Symbol emission probabilities
A simple HMM
Formal definition of an HMM:
–An output observation alphabet
–The set of states
–A transition probability matrix A
–An output probability matrix B
–An initial state distribution π
Assumptions:
–Markov assumption
–Output independence assumption
–Made for ease of use; no significant effect
Formal notation for the whole parameter set: Φ = (A, B, π)
Markov assumption “probability of the random variable at a given time depends only on the value at the preceding time.”
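Written out as a formula, the assumption might look as follows. This is a sketch that borrows the notation X_t = i and a_ij from the HMM program on the next slide; the final equality additionally assumes that the transition probabilities do not change over time.

```latex
P(X_{t+1} = j \mid X_1 = x_1, \dots, X_{t-1} = x_{t-1}, X_t = i)
  = P(X_{t+1} = j \mid X_t = i)
  = a_{ij}
```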
HMM Program
t := 1
Start in state s_i with probability π_i (i.e., X_1 = i)
Forever do
  Move from state s_i to state s_j with probability a_ij (i.e., X_{t+1} = j)
  Emit observation symbol o_t = k with probability b_ijk
  t := t + 1
end
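The program above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `sample_hmm`, the fixed length `T` (replacing "Forever do"), the `seed` argument, and all model values are assumptions made for the example. Emissions are attached to transitions, as the b_ijk notation suggests.

```python
import random


def sample_hmm(pi, A, B, T, seed=0):
    """Generate T observation symbols from an arc-emission HMM.

    pi[i]      : probability of starting in state i
    A[i][j]    : probability of moving from state i to state j
    B[i][j][k] : probability of emitting symbol k on the transition i -> j
    """
    rng = random.Random(seed)
    n_states = len(pi)
    # Start in state i with probability pi[i].
    state = rng.choices(range(n_states), weights=pi)[0]
    states, observations = [state], []
    for _ in range(T):  # the slide loops forever; we stop after T steps
        # Move from state i to state j with probability A[i][j].
        nxt = rng.choices(range(n_states), weights=A[state])[0]
        # Emit symbol k with probability B[i][j][k].
        n_symbols = len(B[state][nxt])
        symbol = rng.choices(range(n_symbols), weights=B[state][nxt])[0]
        observations.append(symbol)
        states.append(nxt)
        state = nxt
    return states, observations


# Hypothetical 2-state, 2-symbol model (values chosen only for illustration).
pi = [1.0, 0.0]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.2, 0.8], [0.3, 0.7]]]

states, observations = sample_hmm(pi, A, B, T=10)
```

Only `observations` would be visible to a recognizer; `states` is the hidden part of the process.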
HMM Features
[Figure: a speech signal divided into frames along the time axis; labels: frame, frame shift, time]
A symbol sequence (or observation sequence) is generated by starting at an initial state and moving from state to state until a terminal state is reached.
The state sequence is "hidden". Only the symbol sequence that the hidden states emit is observable.
Problems
The Evaluation Problem: Given the observation sequence O and the model Φ, how do we efficiently compute P(O|Φ), the probability of the observation sequence given the model?
The Decoding Problem: Given O and Φ, how do we find the sequence of hidden states that most probably generated the observed sequence?
The Learning Problem: How can we adjust the model parameters to maximize the joint probability (likelihood) of the training data?
How to evaluate an HMM
Given multiple HMMs (one for each word) and an observation sequence, which HMM most probably generated the sequence?
Simple (expensive) solution:
–Enumerate all possible state sequences S of length T
–Compute the probability of each path S: state sequence probability * joint output probability
–Sum these probabilities over all paths
The Forward algorithm computes the same quantity much more efficiently:
–Complexity O(N^2 T), where N is the number of states (enumeration has to visit on the order of N^T paths)
–Recursive use of partially computed probabilities for efficiency
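The two approaches above can be compared in a short Python sketch. It assumes a simplified HMM in which the emission depends only on the current state (`B[i][k]`), not on the transition as in b_ijk; the function names and the model values are hypothetical and chosen only for illustration.

```python
from itertools import product


def forward(pi, A, B, obs):
    """Return P(obs | model) using the Forward algorithm, O(N^2 T)."""
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over all states at the final time step.
    return sum(alpha)


def brute_force(pi, A, B, obs):
    """Return P(obs | model) by enumerating all N^T state sequences."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Hypothetical 2-state, 2-symbol model (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.5],
     [0.1, 0.9]]

obs = [0, 1, 1, 0, 1]
p_forward = forward(pi, A, B, obs)
p_brute = brute_force(pi, A, B, obs)  # same value, exponentially more work
```

For word recognition, one would compute this probability under each word's HMM and pick the word whose model gives the largest value. For long sequences, practical implementations usually scale the alpha values or work with logarithms to avoid underflow.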
How to evaluate an HMM (2)
[Figure: isolated-word recognition. Speech (example: "Seoul") → feature extraction → likelihood computation 1 ... V, giving P(X | HMM for word 1) ... P(X | HMM for word V) → select maximum → recognized word]
How to decode an HMM
–The Forward algorithm does not find the best state sequence (the "best path")
–Exhaustive search for the best path is expensive
The Viterbi algorithm is used instead:
–Also uses partially computed results recursively
–The partially computed results are the best paths so far
–Each computed state remembers the best previous state leading to it
–Complexity O(N^2 T)
Finding the best path is very important for continuous speech recognition
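A minimal Python sketch of the Viterbi recursion described above follows. As an assumption for the example, emissions depend only on the current state (`B[i][k]`); the function name and model values are hypothetical.

```python
def viterbi(pi, A, B, obs):
    """Return (best_path, probability of that path jointly with obs)."""
    n = len(pi)
    # delta[i]: probability of the best path so far that ends in state i.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []  # one list per time step: best predecessor of each state
    for o in obs[1:]:
        prev, new_delta = [], []
        for j in range(n):
            # Each state remembers the best previous state leading to it.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            prev.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(prev)
        delta = new_delta
    # Pick the best final state, then follow the backpointers in reverse.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for prev in reversed(backpointers):
        path.append(prev[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical 2-state, 2-symbol model (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.5],
     [0.1, 0.9]]

best_path, best_prob = viterbi(pi, A, B, [0, 1])
```

The only structural difference from the Forward recursion is that the sum over predecessor states is replaced by a maximum, plus the bookkeeping needed to recover the path.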
How to estimate HMM Parameters (learning)
Baum-Welch (or Forward-Backward) algorithm
Estimation of the model parameters Φ = (A, B, π):
–First make an initial guess of the parameters (which may well be entirely wrong)
–Refine it by assessing how well it fits the given data, and re-estimate the parameters to improve that fit
–Performs a form of iterative hill climbing (an instance of expectation-maximization): no iteration decreases the likelihood, but the result is a local maximum, not necessarily the global one
–Uses a forward probability term and a backward probability term
–Similar to Forward and Viterbi (recursive use of incomplete data) but more complex
Unsupervised learning: feed sample speech data along with phonemes of spoken words
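One re-estimation pass can be sketched in Python as follows. This is a simplified illustration under several assumptions: a single training sequence, emissions that depend only on the current state (`B[i][k]`), no scaling against underflow, and strictly positive starting parameters. All names and values are hypothetical.

```python
def forward_backward(pi, A, B, obs):
    """Return the forward (alpha) and backward (beta) probability tables."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]  # beta at the last step is 1
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta


def baum_welch_step(pi, A, B, obs):
    """One re-estimation pass; also returns P(obs | old parameters)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward_backward(pi, A, B, obs)
    likelihood = sum(alpha[T - 1])
    # gamma[t][i]: probability of being in state i at time t, given obs.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j at time t, given obs.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
            / likelihood for j in range(n)] for i in range(n)]
          for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood


# Initial guess (illustrative values only) and a toy observation sequence.
est_pi = [0.6, 0.4]
est_A = [[0.7, 0.3], [0.4, 0.6]]
est_B = [[0.5, 0.5], [0.1, 0.9]]
train_obs = [0, 1, 0, 0, 1, 1, 0, 1]

history = []  # likelihood of the data before each update
for _ in range(5):
    est_pi, est_A, est_B, lik = baum_welch_step(est_pi, est_A, est_B, train_obs)
    history.append(lik)
```

In practice the loop would run until the likelihood stops improving by more than some threshold, and the statistics would be accumulated over many training utterances.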
How to estimate HMM Parameters (learning) (2)
[Figure: training flowchart. Speech database (waveform) → feature extraction (feature) → Baum-Welch re-estimation of the word HMMs (labels: i, il, chil; 1, 2, 7) → Converged? (Yes: end; No: re-estimate again)]