
Slide 1: A Study on Speaker Adaptation of Continuous Density HMM Parameters
By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang (1990 ICASSP/IEEE)
Presented by: 陳亮宇

Slide 2: Outline
- Introduction
- Adaptive estimation of CDHMM parameters
- Bayesian adaptation of Gaussian parameters
- Experimental setup and recognition results
- Summary

Slide 3: Introduction
- Adaptive learning: adapting reference speech patterns or models to handle situations unseen in the training phase, for example:
  - Varying channel characteristics
  - Changing environmental noise
  - Varying transducers

Slide 4: Introduction – MAP
- Maximum a posteriori (MAP) estimation, also called Bayesian adaptation
- Given a prior distribution over the parameters, MAP chooses the parameter values that maximize the posterior distribution

Slide 5: Adaptive estimation of CDHMM parameters
- Y = {y_1, y_2, ..., y_T}: sequence of speaker-dependent (SD) observations
- λ: parameter set of the distribution function
- Given training data Y, we want to estimate λ
- If λ is assumed random with a prior distribution P_0(λ), the MAP estimate of λ is obtained by solving
  λ_MAP = argmax_λ P(λ | Y) = argmax_λ P(Y | λ) P_0(λ),
  where P(Y | λ) is the likelihood function, P_0(λ) the prior distribution, and P(λ | Y) the posterior distribution; with a non-informative prior the MAP estimate reduces to the MLE

Slide 6: Adaptive segmental k-means algorithm
- Maximize the state-likelihood of the observation sequences iteratively, using the segmental k-means training algorithm (s = state sequence), as sketched in the code below:
  1. For the given model, find the optimal state sequence
  2. Based on that state sequence, find the MAP estimate of the parameters
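A minimal Python sketch of this two-step loop, assuming a left-to-right HMM with one Gaussian per state and scalar observations; the function names and the MAP update of the means only (with prior means taken from the SI model and prior variances τ²) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def log_gauss(y, mean, var):
    # Log-likelihood of a scalar frame y under a state's Gaussian.
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def viterbi_states(Y, means, variances, log_trans):
    # Step 1: optimal state sequence for Y under the current model
    # (assumes the model starts in state 0).
    T, S = len(Y), len(means)
    delta = np.full((T, S), -np.inf)
    psi = np.zeros((T, S), dtype=int)
    delta[0, 0] = log_gauss(Y[0], means[0], variances[0])
    for t in range(1, T):
        for j in range(S):
            scores = delta[t - 1] + log_trans[:, j]
            psi[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[psi[t, j]] + log_gauss(Y[t], means[j], variances[j])
    states = np.empty(T, dtype=int)
    states[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states

def adaptive_segmental_kmeans(Y, means, variances, log_trans,
                              prior_means, tau2, n_iter=5):
    # Alternate segmentation (step 1) and MAP re-estimation of the state
    # means (step 2); variances are kept fixed for simplicity.
    Y = np.asarray(Y, dtype=float)
    means = np.asarray(means, dtype=float).copy()
    for _ in range(n_iter):
        states = viterbi_states(Y, means, variances, log_trans)
        for j in range(len(means)):
            frames = Y[states == j]
            n = len(frames)
            if n == 0:
                continue  # state not visited: keep the prior (SI) mean
            ybar = frames.mean()
            w = n * tau2[j] / (n * tau2[j] + variances[j])
            means[j] = w * ybar + (1.0 - w) * prior_means[j]  # shrink toward prior
    return means
```

Step 1 is the Viterbi segmentation; step 2 applies the mean-adaptation formula discussed on slide 10, the simplest of the three Bayesian adaptation schemes.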

Slide 7: The choice of prior distributions
- Non-informative prior
  - Parameters are treated as fixed but unknown, to be estimated from the data
  - No preference for what the parameter values should be
  - MAP reduces to MLE
- Informative prior
  - Knowledge about the parameters to be estimated is available
  - The choice of prior distribution depends on the acoustic models used to characterize the data

Slide 8: Conjugate prior
- The prior and posterior distributions belong to the same distribution family
- Analytical forms of some conjugate priors are available
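A standard textbook illustration of conjugacy (not taken from the slides): with a Gaussian likelihood of known variance, a Gaussian prior on the mean yields a Gaussian posterior, so prior and posterior stay in the same family:

```latex
% Likelihood and conjugate prior:
%   y_i \mid \mu \sim \mathcal{N}(\mu, \sigma^2), \qquad \mu \sim \mathcal{N}(\mu_0, \tau^2)
% Posterior after n observations with sample mean \bar{y} (again Gaussian):
\mu \mid y_1,\dots,y_n \sim \mathcal{N}\!\left(
  \frac{n\tau^2\,\bar{y} + \sigma^2\,\mu_0}{n\tau^2 + \sigma^2},\;
  \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}\right)
```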

Slide 9: Bayesian adaptation of Gaussian parameters
- Three implementations of Bayesian adaptation:
  1. Gaussian mean
  2. Gaussian variance
  3. Gaussian mean and precision
- μ = mean and σ² = variance of one component of a state observation distribution; the precision is θ = 1/σ²

Slide 10: Bayesian adaptation of the Gaussian mean
- Observations y_1, ..., y_n are drawn from N(μ, σ²); μ is random with prior mean μ_0 and prior variance τ², while σ² is fixed and known
- The MAP estimate of μ is the shrinkage estimate
  μ_MAP = (n τ² / (n τ² + σ²)) ȳ + (σ² / (n τ² + σ²)) μ_0,
  where ȳ is the sample mean of the n adaptation samples

Slide 11: Bayesian adaptation of the Gaussian mean (cont.)
- The MAP estimate converges to the MLE (the sample mean) when:
  - A large number of samples is used for training, or
  - A relatively large value of the prior variance τ² is chosen (τ² >> σ²/n), i.e. a non-informative prior
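A small numeric check of this behaviour (purely illustrative; the prior mean, τ², σ², and the true mean are made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_prior, tau2 = 0.0, 1.0      # assumed prior mean and prior variance
mu_true, sigma2 = 2.0, 4.0     # assumed true mean and known observation variance

for n in (1, 10, 100, 1000):
    y = rng.normal(mu_true, np.sqrt(sigma2), size=n)
    w = n * tau2 / (n * tau2 + sigma2)          # weight on the sample mean
    mu_map = w * y.mean() + (1 - w) * mu_prior  # MAP (shrinkage) estimate
    print(n, round(w, 3), round(mu_map, 3), round(y.mean(), 3))
# As n grows (or when tau2 >> sigma2 / n) the weight w -> 1,
# so the MAP estimate approaches the MLE (the sample mean).
```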

Slide 12: Bayesian adaptation of the Gaussian variance
- The mean μ is estimated by the sample mean
- The variance σ² is given an informative prior, where σ_min² is estimated from a large collection of speech data

Slide 13: Bayesian adaptation of the Gaussian variance (cont.)
- The variance estimate is obtained from the sample variance S_y² together with the prior parameter σ_min²
- This is effective when only an insufficient amount of sample data is available
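The exact estimate on the original slide is not reproduced in this transcript; the sketch below assumes σ_min² acts as a variance floor applied to the sample variance, which is one common way such an informative prior is used:

```python
import numpy as np

def adapted_variance(frames, var_min):
    # Sample variance S_y^2 around the sample mean, floored at sigma_min^2
    # (var_min), which is estimated offline from a large speech collection.
    # Flooring guards against underestimated variances when only a small
    # amount of adaptation data is available.
    frames = np.asarray(frames, dtype=float)
    if frames.size == 0:
        return var_min
    s_y2 = np.mean((frames - frames.mean()) ** 2)
    return max(s_y2, var_min)
```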

Slide 14: Bayesian adaptation of both Gaussian mean and precision
- Both the mean and the precision parameters are treated as random
- The joint conjugate prior P_0(μ, θ) is a normal-gamma distribution: a gamma distribution over the precision θ and a normal distribution over the mean μ

Slide 15: Bayesian adaptation of both Gaussian mean and precision (cont.)
- The MAP estimates of μ and σ² can be derived in closed form
- The prior parameters can themselves be estimated from data
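The slide's equations do not survive in this transcript; what follows is the standard normal-gamma conjugate update that such estimates are typically built from, using μ_0, κ, α, β for the prior parameters (this notation is an assumption, not necessarily the paper's):

```latex
% Prior:  \theta \sim \mathrm{Gamma}(\alpha, \beta), \quad
%         \mu \mid \theta \sim \mathcal{N}(\mu_0, (\kappa\theta)^{-1}), \quad \theta = 1/\sigma^2
% Data:   n frames with sample mean \bar{y} and S = \sum_i (y_i - \bar{y})^2
% Posterior parameters (same normal-gamma family):
\mu_n = \frac{\kappa\mu_0 + n\bar{y}}{\kappa + n}, \quad
\kappa_n = \kappa + n, \quad
\alpha_n = \alpha + \frac{n}{2}, \quad
\beta_n = \beta + \frac{S}{2} + \frac{\kappa n (\bar{y} - \mu_0)^2}{2(\kappa + n)}
```

Under this parameterization the joint MAP estimates are μ̂ = μ_n and σ̂² = β_n / (α_n - 1/2).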

Slide 16: Experimental setup
- 39-word vocabulary:
  - 26 English letters
  - 10 digits
  - 3 command words (stop, error, repeat)
- 2 sets of speech data:
  - SI data for the SI models: 100 speakers (50 female, 50 male)
  - SD data for adaptation: 4 speakers (2 female, 2 male)
- SD training data: 5 training utterances per word for each male speaker and 7 utterances per word for each female speaker
- SD testing data: 10 utterances per word per speaker
- Recorded over local dial-up telephone lines
- Sampling rate = 6.67 kHz

Slide 17: Experimental setup (cont.)
- Models are obtained using the segmental k-means training procedure
- Maximum number of mixture components per state = 9
- Diagonal covariance matrices
- 5-state HMMs
- 2 sets of SI models:
  - 1st set: as described above
  - 2nd set: a single Gaussian distribution per state

Slide 18: Experimental results 1
- Baseline recognition rates (table not transcribed)
- SD models: 2 Gaussian mixture components per state per word

Slide 19: Experimental results 2
Five adaptation experiments:
- EXP1: SD mean and SD variance (regular MLE)
- EXP2: SD mean and a fixed variance estimate
- EXP3: SA mean (3.1) with prior parameters (3.2-3.3)
- EXP4: SD mean and SA variance (3.5)
- EXP5: SA estimates (3.7) with prior parameters (3.8-3.9)

Slide 20: Experimental results 3 (figure not transcribed)

Slide 21: Experimental results 4 (figure not transcribed; legend entries below)
- SD mean, SD variance (MLE)
- SD mean, fixed variance
- SA mean (method 1)
- SD mean, SA variance (method 2)
- SA mean and precision (method 3)

Slide 22: Experimental results 5 (figure not transcribed)

Slide 23: Conclusions
- Average recognition rate with all adaptation tokens incorporated = 96.1%
- Performance improves when:
  - More adaptation data are used
  - Both the mean and precision are adapted

