A Study on Speaker Adaptation of Continuous Density HMM Parameters
By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang
Presented by: 陳亮宇
1990 ICASSP/IEEE

2/23 Outline
- Introduction
- Adaptive estimation of CDHMM parameters
- Bayesian adaptation of Gaussian parameters
- Experimental setup and recognition results
- Summary

3/23 Introduction
Adaptive learning:
- Adapting reference speech patterns or models to handle situations unseen in the training phase
- For example: varying channel characteristics, changing environmental noise, varying transducers

4/23 Introduction – MAP
Maximum a posteriori (MAP) estimation:
- Also called Bayesian adaptation
- Given a prior distribution over the parameters, MAP estimation maximizes the posterior distribution of the parameters

5/23 Adaptive estimation of CDHMM parameters
- Y = {y_1, y_2, ..., y_T} is a sequence of SD (speaker-dependent) observations
- λ is the parameter set of the distribution function
- Given training data Y, we want to estimate λ
- If λ is assumed random with a prior distribution P_0(λ), the MAP estimate for λ is obtained by solving
  λ_MAP = argmax_λ p(λ | Y) = argmax_λ f(Y | λ) P_0(λ),
  where p(λ | Y) is the posterior distribution, f(Y | λ) is the likelihood function (the quantity maximized in MLE), and P_0(λ) is the prior distribution, playing a role analogous to the language model in recognition.

6/23 Adaptive segmental k-means algorithm
The state likelihood of the observation sequences is maximized iteratively using the segmental k-means training algorithm (s = state sequence):
1. For a given model, find the optimal state sequence
2. Based on that state sequence, find the MAP estimate of the parameters
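A compact sketch of the two alternating steps, written in the notation of the previous slide (the iteration index k and the joint density f(Y, s | λ) are assumed here, not copied from the slide):

s^{(k)} = \arg\max_{s} f\big(Y, s \mid \lambda^{(k)}\big) \quad \text{(optimal state segmentation)}
\lambda^{(k+1)} = \arg\max_{\lambda} f\big(Y, s^{(k)} \mid \lambda\big)\, P_0(\lambda) \quad \text{(MAP re-estimation given the segmentation)}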

7/23 The choice of prior distributions
Non-informative prior:
- Parameters are treated as fixed but unknown and are estimated from the data
- No preference as to what the parameter values should be
- MAP reduces to MLE (as sketched below)
Informative prior:
- Knowledge about the parameters to be estimated is available
- The choice of prior distribution depends on the acoustic models used to characterize the data
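Why MAP reduces to MLE under a non-informative (flat) prior, as a one-line reasoning step (standard derivation, not reproduced from the slide): when P_0(λ) is constant in λ it does not affect the maximization,

\hat{\lambda}_{\mathrm{MAP}} = \arg\max_{\lambda} f(Y \mid \lambda)\, P_0(\lambda) = \arg\max_{\lambda} f(Y \mid \lambda) = \hat{\lambda}_{\mathrm{ML}}.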

8/23 Conjugate prior
- The prior and posterior distributions belong to the same distribution family
- Analytical forms of some conjugate priors are available
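A standard example of conjugacy (illustrative, not taken from the slide): for i.i.d. Gaussian observations with known variance, a Gaussian prior on the mean is conjugate, since the posterior over the mean is again Gaussian,

y_t \mid \mu \sim \mathcal{N}(\mu, \sigma^2), \quad \mu \sim \mathcal{N}(m, \tau^2) \;\Rightarrow\; \mu \mid y_1,\ldots,y_n \sim \mathcal{N}(\hat{\mu}, \hat{\tau}^2),

which matches the setting used for mean adaptation on the following slides.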

9/23 Bayesian adaptation of Gaussian parameters
Three implementations of Bayesian adaptation:
1. Gaussian mean
2. Gaussian variance
3. Gaussian mean and precision
Here μ is the mean and σ² the variance of one component of a state observation distribution, and the precision is θ = 1/σ².

10/23 Bayesian adaptation of the Gaussian mean
- The mean μ is random; the variance σ² is fixed and known
- The MAP estimate for μ combines the prior mean with the sample mean of the adaptation observations (a reconstruction is given below)
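The formula itself is not preserved in the transcript; the following is the standard MAP estimate under the slide's assumptions, with an assumed Gaussian prior μ ~ N(m, τ²) (τ² matches the prior variance referenced on the next slide; m and n are illustrative names for the prior mean and the number of adaptation samples):

\hat{\mu} = \frac{n\tau^{2}}{n\tau^{2} + \sigma^{2}}\,\bar{y} + \frac{\sigma^{2}}{n\tau^{2} + \sigma^{2}}\,m,
\qquad \bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t .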

11/23 Bayesian adaptation of the Gaussian mean (cont.)
The MAP estimate converges to the MLE when:
- A large number of samples is used for training, or
- A relatively large value of the prior variance τ² is chosen (τ² >> σ²/n), i.e., a non-informative prior
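A minimal numerical sketch of this mean adaptation in Python, assuming the Gaussian prior N(m, τ²) and known σ² from the reconstruction above (the function and variable names are illustrative, not from the paper):

import numpy as np

def map_adapt_mean(y, prior_mean, prior_var, obs_var):
    """MAP estimate of a Gaussian mean from adaptation samples y, assuming
    a prior mu ~ N(prior_mean, prior_var) and known observation variance obs_var."""
    y = np.asarray(y, dtype=float)
    n = y.size
    y_bar = y.mean()
    # Weight on the sample mean grows with n and with the prior variance,
    # so the estimate moves from the prior mean toward the MLE (y_bar).
    w = (n * prior_var) / (n * prior_var + obs_var)
    return w * y_bar + (1.0 - w) * prior_mean

# Example: with few adaptation samples the estimate stays near the prior mean;
# with many samples it approaches the sample mean (the MLE), as this slide notes.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.5, size=5)
print(map_adapt_mean(samples, prior_mean=0.0, prior_var=0.1, obs_var=0.25))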

12/23 Bayesian adaptation of the Gaussian variance
- The mean μ is estimated by the sample mean
- The variance σ² is given an informative prior involving σ²_min, where σ²_min is estimated from a large collection of speech data

13/23 Bayesian adaptation of the Gaussian variance (cont.)
- The variance estimate is expressed in terms of the sample variance S²_y (a possible reconstruction is given below)
- This is effective when an insufficient amount of adaptation data is available
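The exact expression is not preserved in the transcript; one plausible reconstruction, consistent with σ²_min acting as a floor learned from a large speech corpus (this specific form is an assumption, not quoted from the paper), is

\hat{\sigma}^{2} = \max\big(S_y^{2},\; \sigma_{\min}^{2}\big),

i.e., the sample variance is used unless it falls below the prior lower bound σ²_min, which guards against badly underestimated variances when few adaptation samples are available.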

14/23 Bayesian adaptation of both Gaussian mean and precision
- Both the mean and the precision parameters are random
- The joint conjugate prior P_0(μ, θ) is a normal-gamma distribution: a gamma distribution over the precision θ combined with a normal distribution over the mean μ given θ
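For reference, the normal-gamma density has the following general form (the hyperparameter names α, β, κ, μ_0 are illustrative and need not match the paper's notation):

P_0(\mu, \theta) \propto \theta^{\alpha - 1/2} \exp(-\beta\theta)\, \exp\!\Big(-\tfrac{\kappa\theta}{2}(\mu - \mu_0)^2\Big), \qquad \theta = 1/\sigma^2,

so that θ ~ Gamma(α, β) and μ | θ ~ N(μ_0, (κθ)^{-1}).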

15/23 Bayesian adaptation of both Gaussian mean and precision (cont.)
- The MAP estimates of μ and σ² can be derived in closed form (see the sketch below)
- The prior parameters can likewise be estimated
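The closed-form expressions are not preserved in the transcript; the following are the standard joint-mode (MAP) estimates under the normal-gamma prior sketched above, written with the same illustrative hyperparameters (the constants in the paper's equation (3.7) may differ depending on parameterization):

\hat{\mu} = \frac{\kappa\mu_0 + n\bar{y}}{\kappa + n},
\qquad
\hat{\sigma}^{2} = \frac{2\beta + \sum_{t=1}^{n}(y_t - \bar{y})^{2} + \frac{\kappa n}{\kappa + n}(\bar{y} - \mu_0)^{2}}{2\alpha + n - 1}.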

16/23 Experimental setup
- 39-word vocabulary: the 26 English letters, the 10 digits, and 3 command words (stop, error, repeat)
- Two sets of speech data: SI data for the SI model (100 speakers, 50 female / 50 male) and SD data for adaptation (4 speakers, 2 female / 2 male)
- SD training data: 5 training utterances per word for each male speaker and 7 per word for each female speaker
- SD testing data: 10 utterances per word per speaker
- Recorded over local dialed-up telephone lines; sampling rate = 6.67 kHz

17/23 Experimental setup (cont.)
- Models are obtained with the segmental k-means training procedure
- Maximum number of mixture components per state = 9
- Diagonal covariance matrices
- 5-state HMMs
- Two sets of SI models: the first as described above, the second with a single Gaussian distribution per state

18/23 Experimental results 1
[Table of baseline recognition rates not preserved in the transcript. The SD baseline uses 2 Gaussian mixtures per state per word.]

19/23 Experimental results 2
Five adaptation experiments:
- EXP1: SD mean and SD variance (regular MLE)
- EXP2: SD mean and a fixed variance estimate
- EXP3: SA mean (3.1) with prior parameters
- EXP4: SD mean and SA variance (3.5)
- EXP5: SA estimates (3.7) with prior parameters

20/23 Experimental results 3
[Figure/table not preserved in the transcript.]

21/23 Experimental results 4
[Figure not preserved in the transcript; its legend compared: SD mean, SD variance (MLE); SD mean, fixed variance; SA mean (method 1); SD mean, SA variance (method 2); SA mean and precision (method 3).]

22/23 Experimental results 5
[Figure/table not preserved in the transcript.]

23/23 Conclusions
- Average recognition rate with all tokens incorporated = 96.1%
- Performance improves when more adaptation data are used and when both the mean and the precision are adapted