Published by Adriana Chinnock
Modified about 1 year ago
Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls, Dan Klein
Motivation
Standard acoustic models impose many structural constraints.
We propose an automatic approach.
Setup: TIMIT dataset, MFCC features, full covariance Gaussians (Young and Woodland, 1994).
Phone Classification
HMMs for Phone Classification
Standard subphone/mixture HMM
Temporal structure, Gaussian mixtures.
Model           Error rate
HMM Baseline    25.1%
Our Model vs. Standard Model
Our model: single Gaussians, fully connected.
Hierarchical Baum-Welch Training
Error rates shown on the slide: 32.1%, 28.7%, 25.6%, 23.9%.
HMM Baseline      25.1%
5 split rounds    21.4%
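The split step behind the hierarchical training can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one-dimensional Gaussians instead of full covariance ones, assumes every substate is split in two in each round, and the names `split_state`, `split_round`, and `eps` are hypothetical. The Baum-Welch re-estimation that would follow each split is left out.

```python
import random

def split_state(mean, var, weight, eps=0.01, rng=None):
    """Split one Gaussian substate into two slightly perturbed copies (1-D sketch)."""
    rng = rng or random.Random(0)
    # Nudge the means apart by a small fraction of the standard deviation
    # so that a later EM pass can pull the two copies in different directions.
    delta = eps * (var ** 0.5) * rng.uniform(0.5, 1.0)
    child_a = {"mean": mean + delta, "var": var, "weight": weight / 2}
    child_b = {"mean": mean - delta, "var": var, "weight": weight / 2}
    return child_a, child_b

def split_round(states):
    """Apply one split round: every substate becomes two (an assumption)."""
    new_states = []
    for s in states:
        new_states.extend(split_state(s["mean"], s["var"], s["weight"]))
    return new_states

states = [{"mean": 0.0, "var": 1.0, "weight": 1.0}]
for _ in range(5):
    states = split_round(states)  # Baum-Welch re-estimation would run here
print(len(states))  # 32 substates after 5 split rounds
```

Starting from a single substate, five rounds of binary splitting give 2^5 = 32 substates, and the total weight stays at 1.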
Phone Classification Results
Method                                       Error Rate
GMM Baseline (Sha and Saul, 2006)            26.0%
HMM Baseline (Gunawardana et al., 2005)      25.1%
SVM (Clarkson and Moreno, 1999)              22.4%
Hidden CRF (Gunawardana et al., 2005)        21.7%
Our Work                                     21.4%
Large Margin GMM (Sha and Saul, 2006)        21.1%
Phone Recognition
Standard State-Tied Acoustic Models
No more State-Tying
No more Gaussian Mixtures
Fully connected internal structure
Fully connected external structure
Refinement of the /ih/-phone
Refinement of the /l/-phone
Hierarchical Refinement Results
HMM Baseline      41.7%
5 Split Rounds    28.4%
Merging
Not all phones are equally complex.
Compute the log likelihood loss from merging.
Figure: split model vs. model merged at one node, over time steps t-1, t, t+1.
Merging Criterion
Figure: split and merged models over time steps t-1, t, t+1.
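The idea of scoring a merge by its log likelihood loss can be sketched as below. The slide's exact criterion is not reproduced here: this simplified stand-in ignores HMM transitions and state posteriors, uses one-dimensional Gaussians, and merges by moment matching. The function names and the example numbers are hypothetical.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def merge_gaussians(m1, v1, w1, m2, v2, w2):
    """Moment-match two weighted Gaussians into a single Gaussian."""
    w = w1 + w2
    mean = (w1 * m1 + w2 * m2) / w
    var = (w1 * (v1 + (m1 - mean) ** 2) + w2 * (v2 + (m2 - mean) ** 2)) / w
    return mean, var

def merge_loss(frames, g1, g2):
    """Log likelihood lost on `frames` when two substates are replaced by their merge."""
    (m1, v1, w1), (m2, v2, w2) = g1, g2
    w = w1 + w2
    mm, mv = merge_gaussians(m1, v1, w1, m2, v2, w2)
    loss = 0.0
    for x in frames:
        # Likelihood under the split model: a two-component mixture.
        split_ll = math.log(
            (w1 / w) * math.exp(gauss_logpdf(x, m1, v1))
            + (w2 / w) * math.exp(gauss_logpdf(x, m2, v2))
        )
        # Likelihood under the merged model: a single Gaussian.
        loss += split_ll - gauss_logpdf(x, mm, mv)
    return loss

frames = [-3.0, 3.0, -2.5, 3.2]
well_separated = merge_loss(frames, (-3.0, 1.0, 0.5), (3.0, 1.0, 0.5))
redundant = merge_loss(frames, (0.0, 1.0, 0.5), (0.0, 1.0, 0.5))
print(well_separated > redundant)  # True
```

Under this sketch, a pair of substates that model the same region loses essentially nothing when merged, while a pair that captures two distinct modes loses a lot, which matches the slide's point that not all phones need the same number of states.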
Split and Merge Results
Split Only       28.4%
Split & Merge    27.3%
HMM states per phone
Alignment Results
Hand Aligned    27.3%
Auto Aligned    26.3%
Alignment State Distribution
Inference
State sequence: d1 - d6 - d6 - d4 - ae5 - ae2 - ae3 - ae0 - d2 - d2 - d3 - d7 - d5
Phone sequence: d - d - d - d - ae - ae - ae - ae - d - d - d - d - d
Transcription: d - ae - d
Viterbi or Variational?
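The mapping from state sequence to phone sequence to transcription on this slide can be sketched in a few lines. The state naming (phone label followed by a substate index, such as `ae5`) and the helper names are assumptions made for illustration. Collapsing adjacent repeats is also an assumption: it cannot represent the same phone occurring twice in a row.

```python
from itertools import groupby

def state_to_phone(state):
    """Drop the trailing substate index, e.g. 'ae5' -> 'ae'."""
    return state.rstrip("0123456789")

def transcribe(states):
    """Map substates to phones, then collapse runs of the same phone."""
    phones = [state_to_phone(s) for s in states]
    return [p for p, _ in groupby(phones)]

states = "d1 d6 d6 d4 ae5 ae2 ae3 ae0 d2 d2 d3 d7 d5".split()
print(transcribe(states))  # ['d', 'ae', 'd']
```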
Variational Inference
Variational approximation, with a solution expressed in terms of posterior edge marginals.
Viterbi        26.3%
Variational    25.1%
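Why the choice of decoder matters can be shown with a toy example. This is not the variational method from the slide: it brute-forces the sum over state paths, which is only feasible for tiny cases. The path probabilities are hypothetical, and the state naming follows the phone-plus-index pattern used on the Inference slide.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical posterior over complete state paths (probabilities sum to 1).
path_probs = {
    ("a1", "a1"): 0.40,
    ("b1", "b1"): 0.25,
    ("b1", "b2"): 0.20,
    ("b2", "b2"): 0.15,
}

def transcription(path):
    """Strip substate indices and collapse repeated phones."""
    phones = [s.rstrip("0123456789") for s in path]
    return tuple(p for p, _ in groupby(phones))

# Viterbi-style answer: transcription of the single most probable state path.
best_path = max(path_probs, key=path_probs.get)
viterbi_answer = transcription(best_path)

# Summed answer: add up all state paths that share a transcription.
totals = defaultdict(float)
for path, prob in path_probs.items():
    totals[transcription(path)] += prob
summed_answer = max(totals, key=totals.get)

print(viterbi_answer)  # ('a',)
print(summed_answer)   # ('b',)
```

The single best state path yields phone a, but the paths that yield phone b together carry more probability. When each phone has many substates, probability mass is spread across many paths in this way, so the best state sequence need not give the best transcription.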
Phone Recognition Results
Method                                                       Error Rate
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994)     27.7%
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993)      27.1%
Our Work                                                     26.1%
Bayesian Triphone HMM (Ming and Smith, 1998)                 25.6%
Heterogeneous classifiers (Halberstadt and Glass, 1998)      24.4%
Conclusions
Minimalist, automatic approach: unconstrained and accurate.
Phone classification: competitive with state-of-the-art discriminative methods despite being generative.
Phone recognition: better than standard state-tied triphone models.