Published by Adriana Chinnock
Modified about 1 year ago
Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls, Dan Klein
Motivation
  Standard acoustic models impose many structural constraints.
  We propose an automatic approach.
  Setup: TIMIT dataset, MFCC features, full-covariance Gaussians (Young and Woodland, 1994)
Phone Classification
HMMs for Phone Classification
Standard subphone/mixture HMM
  Components: temporal structure, Gaussian mixtures
  Model          Error rate
  HMM Baseline   25.1%
Our Model vs. Standard Model
  Our model: single Gaussians, fully connected
Hierarchical Baum-Welch Training
  HMM Baseline     25.1%
  5 split rounds   21.4%
  Intermediate error rates shown in the figure: 32.1%, 28.7%, 25.6%, 23.9%
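The slide does not spell out the split step, so the following is only a minimal sketch of what one split round might look like. It assumes 1-D Gaussians instead of the full-covariance MFCC Gaussians used in the talk, and the function name `split_states` and the perturbation size `eps` are hypothetical. Each substate is copied into two children whose means are nudged apart so that later Baum-Welch iterations (not shown) can pull them in different directions.

```python
import random

def split_states(means, variances, trans, eps=0.01, seed=0):
    """Split every substate into two children (illustrative sketch).

    means, variances: per-state parameters of 1-D Gaussians (an assumption;
                      the talk uses full-covariance Gaussians).
    trans: row-stochastic transition matrix as a list of lists.
    """
    rng = random.Random(seed)
    new_means, new_vars = [], []
    for m, v in zip(means, variances):
        # Small random offset, scaled by the standard deviation,
        # so the two children are not exactly identical.
        delta = eps * (v ** 0.5) * rng.uniform(0.5, 1.0)
        new_means += [m - delta, m + delta]
        new_vars += [v, v]
    new_trans = []
    for row in trans:
        # Mass going to each old target is shared equally by its two
        # children, and both children of a source inherit the same row.
        child_row = [p / 2.0 for p in row for _ in range(2)]
        new_trans += [list(child_row), list(child_row)]
    return new_means, new_vars, new_trans

means, variances, trans = split_states(
    [0.0, 1.0], [1.0, 4.0], [[0.9, 0.1], [0.2, 0.8]]
)
print(len(means), [round(sum(r), 6) for r in trans])
```

Repeating this step and retraining after each round is what would produce a hierarchy of progressively refined models.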
Phone Classification Results
  Method                                      Error Rate
  GMM Baseline (Sha and Saul, 2006)           26.0%
  HMM Baseline (Gunawardana et al., 2005)     25.1%
  SVM (Clarkson and Moreno, 1999)             22.4%
  Hidden CRF (Gunawardana et al., 2005)       21.7%
  Our Work                                    21.4%
  Large Margin GMM (Sha and Saul, 2006)       21.1%
Phone Recognition
Standard State-Tied Acoustic Models
No more State-Tying
No more Gaussian Mixtures
Fully connected internal structure
Fully connected external structure
Refinement of the /ih/-phone
Refinement of the /l/-phone
Hierarchical Refinement Results
  HMM Baseline     41.7%
  5 Split Rounds   28.4%
Merging
  Not all phones are equally complex.
  Compute the log-likelihood loss from merging.
  [Figure: split model vs. model merged at one node, shown at times t-1, t, t+1]
Merging Criterion
  [Figure: split and merged models at times t-1, t, t+1]
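The merging criterion itself did not survive on the slide, so the sketch below is an assumption and should not be read as the authors' formula. It shows one common local approximation: at each time step, the two substates are replaced by a single merged state whose forward score is the sum of the two forward scores and whose backward score is their frequency-weighted average, and the resulting change in log likelihood is accumulated over time. The names `merge_loss`, `alpha`, and `beta` are hypothetical.

```python
import math

def merge_loss(alpha, beta, a, b):
    """Approximate log-likelihood loss from merging substates a and b.

    alpha[t][s]: forward score  P(x_1..x_t, state_t = s)
    beta[t][s]:  backward score P(x_{t+1}..x_T | state_t = s)
    Assumes both substates have nonzero total posterior mass.
    """
    # Relative frequencies of the two substates, from posterior mass.
    occ_a = sum(al[a] * be[a] for al, be in zip(alpha, beta))
    occ_b = sum(al[b] * be[b] for al, be in zip(alpha, beta))
    p_a = occ_a / (occ_a + occ_b)
    p_b = 1.0 - p_a

    loss = 0.0
    for al, be in zip(alpha, beta):
        split = sum(x * y for x, y in zip(al, be))
        # Replace the contribution of a and b with the merged state's.
        merged_term = (al[a] + al[b]) * (p_a * be[a] + p_b * be[b])
        merged = split - al[a] * be[a] - al[b] * be[b] + merged_term
        loss += math.log(split) - math.log(merged)
    return loss

# If both substates have identical backward scores, merging changes nothing.
alpha = [[0.2, 0.3], [0.1, 0.1]]
beta = [[0.5, 0.5], [1.0, 1.0]]
print(abs(merge_loss(alpha, beta, 0, 1)) < 1e-12)
```

Under this reading, splits whose loss is close to zero would be candidates for merging back, which fits the slide's point that not all phones need the same number of states.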
Split and Merge Results
  Split Only      28.4%
  Split & Merge   27.3%
HMM states per phone
Alignment Results
  Hand Aligned   27.3%
  Auto Aligned   26.3%
Alignment State Distribution
Inference
  State sequence: d1 - d6 - d6 - d4 - ae5 - ae2 - ae3 - ae0 - d2 - d2 - d3 - d7 - d5
  Phone sequence: d - d - d - d - ae - ae - ae - ae - d - d - d - d - d
  Transcription: d - ae - d
  Decoding: Viterbi or variational?
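The mapping from state sequence to transcription on this slide can be sketched in a few lines. The helper names are hypothetical, and the substate labels are written without spaces (for example `ae5`). Note that collapsing runs this way would also merge two genuinely adjacent instances of the same phone, which is a simplification.

```python
from itertools import groupby

def states_to_phones(states):
    """Drop the substate index from each label: 'ae5' -> 'ae'."""
    return [s.rstrip("0123456789") for s in states]

def collapse(phones):
    """Merge each run of identical consecutive phones into one symbol."""
    return [p for p, _ in groupby(phones)]

states = "d1 d6 d6 d4 ae5 ae2 ae3 ae0 d2 d2 d3 d7 d5".split()
phones = states_to_phones(states)
print(" - ".join(phones))
print(" - ".join(collapse(phones)))  # prints "d - ae - d"
```

Many different state sequences map to the same transcription, which is why the best single state path (Viterbi) need not give the best transcription.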
Variational Inference
  Viterbi       26.3%
  Variational   25.1%
  [Equations lost in extraction: the variational approximation and its solution, expressed in terms of posterior edge marginals]
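Since the formulas on this slide are missing, the following is only one plausible reading of "posterior edge marginals", offered as an assumption: the posterior probability of each substate transition at each time step is summed over substates to give a posterior for the corresponding phone-level transition. The function name and input layout are hypothetical, and the search over phone sequences that would use these scores is not shown.

```python
from collections import defaultdict

def phone_edge_marginals(state_edge_post, phone_of):
    """Sum substate edge posteriors into phone-level edge posteriors.

    state_edge_post: list over time of {(state, next_state): posterior}
    phone_of: function mapping a substate label to its phone label
    """
    result = []
    for edges in state_edge_post:
        q = defaultdict(float)
        for (s, s_next), p in edges.items():
            q[(phone_of(s), phone_of(s_next))] += p
        result.append(dict(q))
    return result

def strip_index(label):
    """'ae5' -> 'ae' (assumes labels end in a numeric substate index)."""
    return label.rstrip("0123456789")

edges = [{("d1", "d6"): 0.5, ("d1", "ae5"): 0.2, ("d4", "ae5"): 0.3}]
print(phone_edge_marginals(edges, strip_index))
```

Scoring transcriptions with such phone-level marginals pools evidence from all substate paths, rather than committing to a single best path as Viterbi decoding does.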
Phone Recognition Results
  Method                                                       Error Rate
  State-Tied Triphone HMM (HTK) (Young and Woodland, 1994)     27.7%
  Gender Dependent Triphone HMM (Lamel and Gauvain, 1993)      27.1%
  Our Work                                                     26.1%
  Bayesian Triphone HMM (Ming and Smith, 1998)                 25.6%
  Heterogeneous classifiers (Halberstadt and Glass, 1998)      24.4%
Conclusions
  Minimalist, automatic approach: unconstrained and accurate
  Phone classification: competitive with state-of-the-art discriminative methods despite being generative
  Phone recognition: better than standard state-tied triphone models
Thank you! http://nlp.cs.berkeley.edu