Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls, Dan Klein

Acoustic Modeling

Motivation
  Standard acoustic models impose many structural constraints
  We propose an automatic approach
  Use TIMIT Dataset
  MFCC features
  Full covariance Gaussians (Young and Woodland, 1994)

Phone Classification

HMMs for Phone Classification

Temporal Structure

Standard subphone/mixture HMM
  Figure labels: Temporal Structure, Gaussian Mixtures
  Model           Error rate
  HMM Baseline    25.1%

Our Model vs. Standard Model
  Figure labels: Single Gaussians, Fully Connected

Hierarchical Baum-Welch Training
  HMM Baseline      25.1%
  5 Split rounds    21.4%
  Other error rates shown in the figure: 32.1%, 28.7%, 25.6%, 23.9%
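A minimal sketch of a single split round, to make the hierarchical training idea concrete. It assumes each round duplicates every substate, perturbs the two copies slightly, and shares the transition mass equally between them; the Baum-Welch re-estimation that would follow each round is omitted. The function name `split_states`, the 1-D means, and the `noise` parameter are hypothetical and are not taken from the slides.

```python
import random

def split_states(means, trans, noise=0.01, seed=0):
    """Split every substate in two (hypothetical helper, 1-D means for brevity).

    means: list of floats, one Gaussian mean per substate
    trans: row-stochastic transition matrix as a list of lists
    """
    rng = random.Random(seed)
    n = len(means)
    new_means = []
    for m in means:
        # Small symmetric perturbation so that EM can pull the two copies apart.
        eps = noise * (1.0 + abs(m)) * rng.uniform(0.5, 1.0)
        new_means.extend([m - eps, m + eps])
    # Copies 2k and 2k+1 come from old substate k; each incoming transition
    # probability is halved so that every row still sums to one.
    new_trans = [[trans[i // 2][j // 2] / 2.0 for j in range(2 * n)]
                 for i in range(2 * n)]
    return new_means, new_trans

# Toy usage: start from one substate and apply five split rounds.
means, trans = [0.0], [[1.0]]
for _ in range(5):
    means, trans = split_states(means, trans)
    # (re-estimate all parameters with Baum-Welch here)
print(len(means))
```

In this toy setting the number of substates doubles with every round; in a real system the EM step between rounds is what makes the copies specialize.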

Phone Classification Results
  Method                                      Error Rate
  GMM Baseline (Sha and Saul, 2006)           26.0%
  HMM Baseline (Gunawardana et al., 2005)     25.1%
  SVM (Clarkson and Moreno, 1999)             22.4%
  Hidden CRF (Gunawardana et al., 2005)       21.7%
  Our Work                                    21.4%
  Large Margin GMM (Sha and Saul, 2006)       21.1%

Phone Recognition

Standard State-Tied Acoustic Models

No more State-Tying

No more Gaussian Mixtures

Fully connected internal structure

Fully connected external structure

Refinement of the /ih/-phone

Refinement of the /l/-phone

Hierarchical Refinement Results
  HMM Baseline      41.7%
  5 Split Rounds    28.4%

Merging
  Not all phones are equally complex
  Compute log likelihood loss from merging
  Figure panels: Split model; Merged at one node (time steps t-1, t, t+1)

Merging Criterion
  Figure: two panels over time steps t-1, t, t+1
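The formula on this slide did not survive in the transcript, so the following is only a rough sketch of one way a "merge at one node" loss could be approximated. It assumes forward and backward scores are available for each time step, that merging two substates sums their forward scores and averages their backward scores by relative frequency, and that the per-time-step losses are added up. The function `merge_log_loss` and all of its parameters are hypothetical.

```python
import math

def merge_log_loss(alpha, beta, i, j, p_i, p_j):
    """Approximate log-likelihood loss from merging substates i and j.

    alpha[t][s]: forward score of substate s at time t
    beta[t][s]:  backward score of substate s at time t
    p_i, p_j:    relative frequencies of the two substates (p_i + p_j == 1)
    """
    loss = 0.0
    for a_t, b_t in zip(alpha, beta):
        # Likelihood with the split model, read off at time t.
        split = sum(a * b for a, b in zip(a_t, b_t))
        # Merge the two substates at this time step only.
        merged_alpha = a_t[i] + a_t[j]
        merged_beta = p_i * b_t[i] + p_j * b_t[j]
        merged = (split - a_t[i] * b_t[i] - a_t[j] * b_t[j]
                  + merged_alpha * merged_beta)
        loss += math.log(split) - math.log(merged)
    return loss

# Toy usage: two substates with clearly different backward scores.
print(merge_log_loss([[0.3, 0.1]], [[0.9, 0.1]], 0, 1, 0.5, 0.5))
```

Under these assumptions, two substates with identical backward scores can be merged at no cost, which matches the intuition that splits which learned nothing useful are the ones to undo.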

Split and Merge Results
  Split Only       28.4%
  Split & Merge    27.3%

HMM states per phone

Alignment Results
  Hand Aligned    27.3%
  Auto Aligned    26.3%

Alignment State Distribution

Inference
  State sequence: d_1 - d_6 - d_6 - d_4 - ae_5 - ae_2 - ae_3 - ae_0 - d_2 - d_2 - d_3 - d_7 - d_5
  Phone sequence: d - d - d - d - ae - ae - ae - ae - d - d - d - d - d
  Transcription: d - ae - d
  Figure labels: Viterbi, Variational, ???
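The three levels on this slide can be illustrated with a short sketch that maps the example state sequence to its phone sequence and then to the transcription. It assumes each state is a (phone, substate index) pair and that consecutive repeats of a phone collapse into one symbol; the helper names `states_to_phones` and `collapse` are hypothetical.

```python
from itertools import groupby

def states_to_phones(states):
    """Drop the substate index from each (phone, substate) pair."""
    return [phone for phone, _ in states]

def collapse(phones):
    """Collapse runs of the same phone into a single symbol."""
    return [phone for phone, _ in groupby(phones)]

# The state sequence from the slide, as (phone, substate index) pairs.
states = [("d", 1), ("d", 6), ("d", 6), ("d", 4),
          ("ae", 5), ("ae", 2), ("ae", 3), ("ae", 0),
          ("d", 2), ("d", 2), ("d", 3), ("d", 7), ("d", 5)]

phones = states_to_phones(states)
print(" - ".join(phones))
print(" - ".join(collapse(phones)))  # d - ae - d
```

Many distinct state sequences map to the same transcription, which is why the best single state sequence need not yield the best transcription.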

Variational Inference
  Variational Approximation and Solution: equations not preserved in this transcript; they involve posterior edge marginals
  Viterbi        26.3%
  Variational    25.1%

Phone Recognition Results
  Method                                                        Error Rate
  State-Tied Triphone HMM (HTK) (Young and Woodland, 1994)      27.7%
  Gender Dependent Triphone HMM (Lamel and Gauvain, 1993)       27.1%
  Our Work                                                      26.1%
  Bayesian Triphone HMM (Ming and Smith, 1998)                  25.6%
  Heterogeneous classifiers (Halberstadt and Glass, 1998)       24.4%

Conclusions
  Minimalist, Automatic Approach
    Unconstrained
    Accurate
  Phone Classification
    Competitive with state-of-the-art discriminative methods despite being generative
  Phone Recognition
    Better than standard state-tied triphone models

Thank you! http://nlp.cs.berkeley.edu
