
Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.


1 Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein

2 Acoustic Modeling

3 Motivation
- Standard acoustic models impose many structural constraints
- We propose an automatic approach
- Use TIMIT dataset
  - MFCC features
  - Full-covariance Gaussians (Young and Woodland, 1994)
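Slide 3 lists full-covariance Gaussians as the emission model. As an illustration only (not the authors' code), the log-density of one full-covariance Gaussian can be evaluated through a Cholesky factorization, which yields both the Mahalanobis term and the log-determinant. The names `cholesky` and `full_cov_log_density` are hypothetical, and a real system would call a linear-algebra library rather than these pure-Python loops.

```python
import math


def cholesky(cov):
    """Return lower-triangular L with cov = L L^T (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = cov[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def full_cov_log_density(x, mean, cov):
    """log N(x; mean, cov) for a single Gaussian with a full covariance matrix."""
    n = len(x)
    L = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves L z = diff, so z . z = diff^T cov^{-1} diff.
    z = [0.0] * n
    for i in range(n):
        z[i] = (diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    mahalanobis = sum(zi * zi for zi in z)
    # log det(cov) = 2 * sum_i log L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + mahalanobis)
```

With a full rather than diagonal covariance, correlations between feature dimensions enter the Mahalanobis term through the off-diagonal entries of `cov`.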

4 Phone Classification

5 æ

6 HMMs for Phone Classification

7 Temporal Structure

8 Standard subphone/mixture HMM
Temporal Structure; Gaussian Mixtures
Model            Error rate
HMM Baseline     25.1%

9 Our Model
Standard Model vs. Our Model: Single Gaussians, Fully Connected

10 Hierarchical Baum-Welch Training
(Figure: error rates across split rounds: 32.1%, 28.7%, 25.6%, 23.9%)
Model            Error rate
HMM Baseline     25.1%
5 Split rounds   21.4%
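Slide 10 reports error rates after successive split rounds of hierarchical Baum-Welch training. A minimal sketch of one split step follows, assuming each substate is split into two children whose means copy the parent's plus small random noise and whose transition probabilities are divided evenly (a common split-merge recipe, not necessarily the authors' exact one). `split_round` and its `noise` parameter are hypothetical; covariances are omitted, and the Baum-Welch re-estimation that would follow each split is not shown.

```python
import random


def split_round(means, trans, noise=0.01, rng=None):
    """One split round: every substate becomes two children (hypothetical helper).

    means: one mean vector per substate; trans[i][j] = P(next = j | current = i).
    Children 2k and 2k + 1 both descend from parent k.
    """
    rng = rng if rng is not None else random.Random(0)
    new_means = []
    for mu in means:
        for _ in range(2):
            # Copy the parent mean plus small noise to break the symmetry between
            # the two children; EM (Baum-Welch) would then pull them apart.
            new_means.append([m + noise * rng.gauss(0.0, 1.0) for m in mu])
    n2 = 2 * len(means)
    # Each child keeps its parent's outgoing probability, split evenly over the
    # two children of each target state, so every row still sums to 1.
    new_trans = [[trans[i // 2][j // 2] / 2.0 for j in range(n2)] for i in range(n2)]
    return new_means, new_trans


rng = random.Random(0)
means, trans = [[0.0, 0.0]], [[1.0]]
for _ in range(5):
    means, trans = split_round(means, trans, rng=rng)
    # ... Baum-Welch re-estimation would run here in a real training loop ...
print(len(means))  # 32
```

Starting from a single substate, five rounds give 2^5 = 32 substates if every split is kept.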

11 Phone Classification Results
Method                                     Error Rate
GMM Baseline (Sha and Saul, 2006)          26.0%
HMM Baseline (Gunawardana et al., 2005)    25.1%
SVM (Clarkson and Moreno, 1999)            22.4%
Hidden CRF (Gunawardana et al., 2005)      21.7%
Our Work                                   21.4%
Large Margin GMM (Sha and Saul, 2006)      21.1%

12 Phone Recognition

13 Standard State-Tied Acoustic Models

14 No more State-Tying

15 No more Gaussian Mixtures

16 Fully connected internal structure

17 Fully connected external structure

18 Refinement of the /ih/-phone

19

20

21

22 Refinement of the /l/-phone

23 Hierarchical Refinement Results
Model            Error rate
HMM Baseline     41.7%
5 Split Rounds   28.4%

24 Merging
- Not all phones are equally complex
- Compute the log-likelihood loss from merging
(Figure: split model vs. model merged at one node, over frames t-1, t, t+1)

25 Merging Criterion
(Figure: two trellises over frames t-1, t, t+1)
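Slides 24 and 25 describe undoing splits whose likelihood gain is small, by computing the log-likelihood lost when two substates are merged at one node. The sketch below assumes the local approximation familiar from split-merge grammar refinement: the two substates are merged at one frame at a time, incoming probability mass is summed, and the outgoing score (emission times backward probability) is averaged with weights p1 and p2. `merge_loss` and its inputs `incoming` and `outgoing` are hypothetical precomputed forward/backward quantities, and the exact criterion in the talk may differ.

```python
import math


def merge_loss(incoming, outgoing, s1, s2, p1, p2):
    """Approximate log-likelihood loss from merging substates s1 and s2 (hypothetical helper).

    incoming[t][s]: forward mass arriving in substate s at frame t (before emitting frame t).
    outgoing[t][s]: emission of frame t under s times the backward score from s.
    p1, p2: relative frequencies of s1 and s2 (p1 + p2 == 1).
    """
    loss = 0.0
    for inc, out in zip(incoming, outgoing):
        # For exact forward/backward inputs this equals P(x) at every frame.
        total = sum(a * b for a, b in zip(inc, out))
        split_part = inc[s1] * out[s1] + inc[s2] * out[s2]
        # Merge only at this frame: incoming mass adds up, and the outgoing
        # behaviour becomes the frequency-weighted average of the two substates.
        merged_part = (inc[s1] + inc[s2]) * (p1 * out[s1] + p2 * out[s2])
        loss += math.log(total) - math.log(total - split_part + merged_part)
    return loss
```

Under this criterion, candidate merges with the smallest loss would be merged first, so simpler phones can end up with fewer substates than complex ones.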

26 Split and Merge Results
Model            Error rate
Split Only       28.4%
Split & Merge    27.3%

27 HMM states per phone

28

29

30 Alignment Results
Alignment        Error rate
Hand Aligned     27.3%
Auto Aligned     26.3%

31 Alignment State Distribution

32 Inference
- State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
- Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
- Transcription: d-ae-d
(Figure: Viterbi vs. Variational)
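The chain on slide 32 (substates, then phones, then a transcription) can be reproduced with a short sketch: strip the substate index from each frame, then collapse runs of identical phones. The helper names are hypothetical. This corresponds to reading a transcription off one best (Viterbi) state sequence; run-collapsing alone cannot separate two back-to-back tokens of the same phone, which a real decoder would have to track explicitly.

```python
from itertools import groupby

# Substate sequence from slide 32, written as (phone, substate index) pairs.
STATES = [("d", 1), ("d", 6), ("d", 6), ("d", 4),
          ("ae", 5), ("ae", 2), ("ae", 3), ("ae", 0),
          ("d", 2), ("d", 2), ("d", 3), ("d", 7), ("d", 5)]


def states_to_phones(states):
    """Drop the substate index, keeping only the phone label of each frame."""
    return [phone for phone, _ in states]


def collapse_runs(phones):
    """Merge runs of identical consecutive phones into one transcription symbol."""
    return [phone for phone, _ in groupby(phones)]


phones = states_to_phones(STATES)
print("-".join(phones))                 # d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
print("-".join(collapse_runs(phones)))  # d-ae-d
```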

33 Variational Inference
Variational approximation and its solution (equations not preserved), expressed in terms of the posterior edge marginals
Method           Error rate
Viterbi          26.3%
Variational      25.1%
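Slide 33 expresses its variational solution in terms of posterior edge marginals. The sketch below shows only the standard way such marginals are computed for an HMM, not the slide's approximation itself: the scaled forward-backward algorithm gives xi[t][i][j] = P(s_t = i, s_{t+1} = j | x). Summing substate edges into phone-pair edges (`phone_edge_marginals`) is an assumption about how they would feed a phone-level approximation; both function names are hypothetical.

```python
def forward_backward_edges(init, trans, emit):
    """Posterior edge marginals xi[t][i][j] = P(s_t = i, s_{t+1} = j | x) for an HMM.

    init[i]: initial probabilities; trans[i][j]: P(j | i);
    emit[t][i]: likelihood of frame t under state i (precomputed).
    Per-frame scaling keeps the recursions from underflowing.
    """
    T, n = len(emit), len(init)
    alpha = [[0.0] * n for _ in range(T)]
    scale = [0.0] * T
    alpha[0] = [init[i] * emit[0][i] for i in range(n)]
    scale[0] = sum(alpha[0])
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, T):
        alpha[t] = [emit[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                    for j in range(n)]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    # Scaled backward pass, using the same per-frame constants.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]
    # With this scaling, each xi[t] already sums to 1.
    return [[[alpha[t][i] * trans[i][j] * emit[t + 1][j] * beta[t + 1][j] / scale[t + 1]
              for j in range(n)] for i in range(n)] for t in range(T - 1)]


def phone_edge_marginals(xi, phone_of):
    """Sum substate edge marginals into (phone, phone) marginals per time step."""
    out = []
    for m in xi:
        agg = {}
        for i, row in enumerate(m):
            for j, p in enumerate(row):
                key = (phone_of[i], phone_of[j])
                agg[key] = agg.get(key, 0.0) + p
        out.append(agg)
    return out
```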

34 Phone Recognition Results
Method                                                      Error Rate
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994)    27.7%
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993)     27.1%
Our Work                                                    26.1%
Bayesian Triphone HMM (Ming and Smith, 1998)                25.6%
Heterogeneous classifiers (Halberstadt and Glass, 1998)     24.4%

35 Conclusions
- Minimalist, automatic approach
  - Unconstrained
  - Accurate
- Phone classification
  - Competitive with state-of-the-art discriminative methods despite being generative
- Phone recognition
  - Better than standard state-tied triphone models

36 Thank you!

