Presentation transcript:

ECE 8443 – Pattern Recognition / ECE 8423 – Adaptive Signal Processing
LECTURE 19: PRACTICAL ISSUES IN MLLR
Objectives: Example, Clustered Transformations, MAP Adaptation
Resources: ECE 7000: MLLR; MG: MLLR Transformations; TAM: Adaptation for ASR; ECE 8463: Adaptation; AM: Transform Sharing
URL: .../publications/courses/ece_8423/lectures/current/lecture_19.ppt
MP3: .../publications/courses/ece_8423/lectures/current/lecture_19.mp3

ECE 8423: Lecture 19, Slide 1: MLLR Example
Let's begin with a simple example involving a single state and a two-dimensional feature vector. We observe two new data points and estimate the new mean and covariance from them (noting that these are noisy estimates because there are only two points). Recall that we assumed a diagonal covariance matrix and derived an equation for the estimates of the elements of the transformation matrix. Let's also assume values for the state occupancies; these are arbitrary here and would normally be accumulated during training of the model, and they represent the probability of being in state 1 at times t = 1 and 2.
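Since the slide's actual numbers did not survive the transcript, the following is a minimal sketch of this setup with assumed placeholder values (the observations, occupancies, and means are illustrative only); the same placeholders are reused in the sketches after the later slides.

```python
# Minimal sketch of the Slide 1 setup; all numeric values are assumptions.
import numpy as np

O = np.array([[1.0, 2.0],
              [2.0, 3.0]])        # assumed adaptation observations o(1), o(2)

mu_new  = O.mean(axis=0)          # noisy mean estimate from only two points
var_new = O.var(axis=0)           # diagonal covariance estimate

gamma = np.array([0.8, 0.6])      # assumed state occupancies gamma(1), gamma(2);
                                  # normally accumulated during training
print(mu_new, var_new, gamma)
```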

ECE 8423: Lecture 19, Slide 2: MLLR Example (Cont.)
Then, recalling our extended mean vector, we can compute Z from the observations and occupancies. For a diagonal covariance, we also defined G^(i), one matrix per feature dimension i.

ECE 8423: Lecture 19, Slide 3: MLLR Example (Cont.)
Now we can solve for the G^(i) (there are i = 1, ..., n of these, where n = 2).
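Because the slide's equations did not survive the transcript, here is a hedged sketch of the standard accumulations used for MLLR mean adaptation with a diagonal covariance (following Leggetter and Woodland): the extended mean vector, Z, and the per-dimension G^(i). All numeric inputs are the assumed placeholders from the Slide 1 sketch, not the slide's own values.

```python
# Hedged sketch of the Z and G^(i) accumulations; all values are assumptions.
import numpy as np

mu_old = np.array([0.5, 1.0])                # assumed existing model mean
sigma2 = np.array([1.0, 1.0])                # assumed diagonal covariance terms
O      = np.array([[1.0, 2.0], [2.0, 3.0]])  # assumed adaptation observations
gamma  = np.array([0.8, 0.6])                # assumed state occupancies

xi = np.concatenate(([1.0], mu_old))         # extended mean vector [1, mu_1, ..., mu_n]

# Z = sum_t gamma(t) * Sigma^{-1} o(t) xi^T   -> n x (n+1)
Z = sum(gamma[t] * np.outer(O[t] / sigma2, xi) for t in range(len(O)))

# G^(i) = sum_t gamma(t) * (1/sigma_i^2) * xi xi^T, one (n+1)x(n+1) matrix per dimension
G = [gamma.sum() / sigma2[i] * np.outer(xi, xi) for i in range(len(mu_old))]

print(Z)
print(np.linalg.matrix_rank(G[0]))           # rank 1: each G^(i) is singular here
```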

ECE 8423: Lecture 19, Slide 4: MLLR Example (Cont.)
Next, we must solve for (G^(i))^-1, but there is a problem: the G^(i) are singular (linearly dependent rows in this case). We typically use Singular Value Decomposition (e.g., as implemented in Numerical Recipes in C) to find a pseudo-inverse. With the pseudo-inverses in hand, we can compute the components of W.

ECE 8423: Lecture 19, Slide 5: MLLR Example (Cont.)
We can finally compute the adapted means. Comparing them with the mean estimated directly from the new data on Slide 1, we see that MLLR has pushed the new mean very close to the observed data mean, but it has done this using a transformation matrix.
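The following sketch pulls the steps on Slides 2 through 5 together: it re-declares the same assumed placeholder inputs, forms Z and the G^(i), uses an SVD-based pseudo-inverse (numpy's pinv) in place of the singular inverses, assembles W row by row, and applies it to the extended mean vector. It is an illustration under those assumptions, not the slide's original numbers.

```python
# End-to-end hedged sketch of the single-Gaussian MLLR mean update.
import numpy as np

mu_old = np.array([0.5, 1.0])                # assumed existing model mean
sigma2 = np.array([1.0, 1.0])                # assumed diagonal covariance terms
O      = np.array([[1.0, 2.0], [2.0, 3.0]])  # assumed adaptation observations
gamma  = np.array([0.8, 0.6])                # assumed state occupancies

xi = np.concatenate(([1.0], mu_old))         # extended mean vector
Z  = sum(gamma[t] * np.outer(O[t] / sigma2, xi) for t in range(len(O)))
G  = [gamma.sum() / sigma2[i] * np.outer(xi, xi) for i in range(len(mu_old))]

# Each row of W solves G^(i) w_i = z_i; pinv performs the SVD-based pseudo-inverse.
W = np.vstack([np.linalg.pinv(G[i]) @ Z[i] for i in range(len(G))])

mu_adapted = W @ xi                          # adapted mean
print(mu_adapted)                            # lands near the occupancy-weighted data mean
```

With these placeholder values the adapted mean comes out at the occupancy-weighted average of the two observations, which is the behavior the slide describes.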

ECE 8423: Lecture 19, Slide 6: Observations
The state occupancy probabilities determine the speed of adaptation, much as in our adaptive filters. The more probable a state is, the more it influences the overall calculation (another example of maximum likelihood), and the larger the occupancy probabilities, the faster the adapted mean moves to the new mean. In general, the mean moves fairly quickly.
Question: if all we are doing is replacing the old mean with the new mean, why go through all this trouble? The payoff comes when the transformation is shared across many Gaussians, as discussed on the next slide.
The quality of the new model depends on the amount and richness of the new data. Also, note that this is an unsupervised method, meaning it does not need "truth-markings" (transcriptions) of the adaptation data. Many variants of the MLLR approach exist today, including supervised versions.

ECE 8423: Lecture 19, Slide 7: Transform Sharing
Recall that in our HMM we had many states and many Gaussians per state. Transform sharing provides a means of dealing with small amounts of adaptation data: under this scheme, even components that are not observed in the adaptation data can be adapted. A common approach is the use of a binary regression class tree. The leaves of the tree are termed the "base regression classes," and each Gaussian mixture component of a model set belongs to a single base class. The example tree on the slide has four base classes, C4, C5, C6, and C7. During adaptation, occupation counts are accumulated for each of the base classes; in the figure, dashed circles indicate clusters that have insufficient adaptation observations. The details of this approach are beyond the scope of this course; the key point is that the number of adaptation parameters can be controlled in a manner directly related to the overall likelihood.
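As an illustration of the idea (not the slide's own example), a small sketch can show how occupation counts pooled up a binary regression class tree let a sparsely observed or unobserved base class borrow its parent's transform; the node numbering, component names, and threshold below are all assumptions.

```python
# Hedged sketch of transform sharing via a binary regression class tree.
from collections import defaultdict

parent = {4: 2, 5: 2, 6: 3, 7: 3, 2: 1, 3: 1}     # assumed tree; C1 is the root
base_class_of = {"gauss_a": 4, "gauss_b": 5,       # assumed Gaussian -> base class map
                 "gauss_c": 6, "gauss_d": 7}
counts = defaultdict(float)

# Accumulate occupation counts from (component, occupancy) pairs of adaptation data
for comp, occ in [("gauss_a", 12.0), ("gauss_b", 0.4), ("gauss_c", 7.5)]:
    node = base_class_of[comp]
    while True:                                    # counts propagate up to the root
        counts[node] += occ
        if node == 1:
            break
        node = parent[node]

def transform_node(base, threshold=5.0):
    """Climb the tree until a node has enough data to estimate its own transform."""
    node = base
    while counts[node] < threshold and node != 1:
        node = parent[node]
    return node

# gauss_d was never observed, yet it still receives a transform (from node 3)
print({c: transform_node(b) for c, b in base_class_of.items()})
```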

ECE 8423: Lecture 19, Slide 8: Maximum A Posteriori (MAP) Adaptation
The MAP approach to adaptation attempts to maximize the posterior probability of the model parameters given the adaptation data. If we have no prior information, we can assume a uniform (non-informative) prior, and the MAP estimate is then equivalent to the ML estimate. However, we can often estimate the prior from the training data. The MAP estimate can be derived using a combination of the auxiliary function for the ML estimate and the prior. If we assume the prior distribution of the parameters can be modeled as a multivariate Gaussian, we can derive an expression for the MAP estimate of the new mean as a weighted combination of the existing mean estimated on the training data and the ML estimate of the mean on the adaptation data, where the weighting depends on the number of observations in the adaptation data and a balancing factor.
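The slide's equation did not survive the transcript; the standard form of this update, written here with assumed symbol names (tau for the balancing factor, N for the adaptation-data count, mu_0 for the training mean, mu_ML for the adaptation-data mean), is mu_MAP = (tau * mu_0 + N * mu_ML) / (tau + N). A minimal sketch under those assumptions:

```python
# Hedged sketch of the MAP mean update as an interpolation between the prior
# (training) mean and the ML mean of the adaptation data; tau and all sample
# values are assumptions.
import numpy as np

def map_mean(mu_prior, mu_ml, n_obs, tau=10.0):
    """mu_MAP = (tau * mu_prior + N * mu_ML) / (tau + N)."""
    return (tau * mu_prior + n_obs * mu_ml) / (tau + n_obs)

mu_prior = np.array([0.5, 1.0])      # assumed mean from the original training data
mu_ml    = np.array([1.5, 2.5])      # assumed ML mean of the adaptation data

print(map_mean(mu_prior, mu_ml, n_obs=2))    # little data: stays close to the prior
print(map_mean(mu_prior, mu_ml, n_obs=200))  # much data: approaches the ML mean
```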

ECE 8423: Lecture 19, Slide 9: MLLR and MAP Comparison
We can gain some insight into these methods by examining their performance on a speech recognition task.

ECE 8423: Lecture 19, Slide 10: Summary
Demonstrated MLLR on a simple example. Discussed some practical issues in its implementation. Introduced MAP adaptation. Compared the performance of the two on a speech recognition application. Next: take one more look at MLLR and MAP.