Design and Implementation of Speech Recognition Systems
Fall 2014, Ming Li
Special topic: the Expectation-Maximization algorithm and GMM
Sep 30, 2014
Some graphics are from Pattern Recognition and Machine Learning, Bishop, Copyright © Springer. Some equations are from the slides edited by Muneem S.

Parametric density estimation
How do we estimate those parameters?
– ML (maximum likelihood)
– MAP (maximum a posteriori)
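
As a minimal illustration (a sketch added here, not from the slides; the data array is made up), the ML estimates for a single Gaussian are simply the sample mean and the divide-by-N sample covariance:

import numpy as np

# Hypothetical data: N samples of a d-dimensional source.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=(1000, 3))

# ML estimates for a single Gaussian: the sample mean and the
# biased (divide-by-N) sample covariance maximize the likelihood.
mu_ml = data.mean(axis=0)
sigma_ml = np.cov(data, rowvar=False, bias=True)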

What about multimodal cases?

Mixture Models
If you believe that the data set is comprised of several distinct populations, a mixture model is appropriate. It has the following form:
$p(x \mid \Theta) = \sum_{j=1}^{M} \alpha_j\, p_j(x \mid \theta_j)$
with $\sum_{j=1}^{M} \alpha_j = 1$, $\alpha_j \ge 0$, and $\Theta = (\alpha_1, \dots, \alpha_M, \theta_1, \dots, \theta_M)$.
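
To make "several distinct populations" concrete, here is a small sketch (all parameter values are illustrative) that samples from a two-component 1-D mixture by first picking a source with probability $\alpha_j$ and then drawing from that source:

import numpy as np

rng = np.random.default_rng(1)
alphas = np.array([0.3, 0.7])   # mixing weights, sum to 1
means = np.array([-2.0, 3.0])   # one mean per component
stds = np.array([0.5, 1.0])     # one std-dev per component

# Two-stage sampling: pick a source y_i, then draw x_i from it.
y = rng.choice(len(alphas), size=5000, p=alphas)
x = rng.normal(loc=means[y], scale=stds[y])
# A histogram of x shows two modes, one per population.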

Mixture Models
Let $y_i \in \{1, \dots, M\}$ denote the source that generates data point $x_i$.

Mixture Models
With the source label made explicit, $P(y_i = j) = \alpha_j$ and $p(x_i \mid y_i = j, \Theta) = p_j(x_i \mid \theta_j)$, so the complete-data density is
$p(x_i, y_i \mid \Theta) = \alpha_{y_i}\, p_{y_i}(x_i \mid \theta_{y_i})$

Given $x$ and $\Theta$, the conditional density of $y$ can be computed by Bayes' rule:
$p(y_i = l \mid x_i, \Theta) = \dfrac{\alpha_l\, p_l(x_i \mid \theta_l)}{\sum_{k=1}^{M} \alpha_k\, p_k(x_i \mid \theta_k)}$
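
A short sketch of this computation for a 1-D GMM (names and values are illustrative); each row of resp holds $p(y_i = l \mid x_i, \Theta)$ and sums to one:

import numpy as np
from scipy.stats import norm

x = np.array([-2.1, 0.3, 2.9])               # toy observations
alphas = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])

# Numerator: alpha_l * p_l(x_i | theta_l) for every (i, l) pair.
num = alphas * norm.pdf(x[:, None], loc=means, scale=stds)
resp = num / num.sum(axis=1, keepdims=True)   # normalize over components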

Complete-Data Likelihood Function
$\log p(X, Y \mid \Theta) = \sum_{i=1}^{N} \log\big(\alpha_{y_i}\, p_{y_i}(x_i \mid \theta_{y_i})\big)$
But we don't know the latent variables $y_i$, so this cannot be evaluated directly.

Expectation-Maximization (1)
If we want to find a local maximum of $L(\Theta) = \log p(X \mid \Theta)$:
– Define an auxiliary function $\mathcal{L}(\Theta, \Theta^{old})$ at the current estimate $\Theta^{old}$
– Satisfying $\mathcal{L}(\Theta, \Theta^{old}) \le L(\Theta)$ for all $\Theta$, and $\mathcal{L}(\Theta^{old}, \Theta^{old}) = L(\Theta^{old})$

Expectation-Maximization (2)
The goal: finding ML solutions for probabilistic models with latent variables.
Direct maximization of $\log p(X \mid \Theta) = \log \sum_{Y} p(X, Y \mid \Theta)$ is difficult: the sum over the latent variables sits inside the logarithm.
Since $\log(x)$ is concave, Jensen's inequality gives, for any distribution $q(Y)$:
$\log \sum_{Y} q(Y)\, \dfrac{p(X, Y \mid \Theta)}{q(Y)} \;\ge\; \sum_{Y} q(Y)\, \log \dfrac{p(X, Y \mid \Theta)}{q(Y)} \;=\; \mathcal{L}(q, \Theta)$
The right-hand side $\mathcal{L}(q, \Theta)$ is the auxiliary function, a lower bound on the log-likelihood.

Expectation-Maximization (2)
What is the gap between $\log p(X \mid \Theta)$ and the bound $\mathcal{L}(q, \Theta)$? A direct calculation shows
$\log p(X \mid \Theta) - \mathcal{L}(q, \Theta) = \mathrm{KL}\big(q(Y) \,\|\, p(Y \mid X, \Theta)\big) \ge 0$
so the gap vanishes exactly when $q(Y) = p(Y \mid X, \Theta)$.

Expectation-Maximization (2)
We have shown that $\log p(X \mid \Theta) = \mathcal{L}(q, \Theta) + \mathrm{KL}\big(q \,\|\, p(Y \mid X, \Theta)\big)$.
If we evaluate the bound with the old parameters, i.e. set $q(Y) = p(Y \mid X, \Theta^{old})$, the KL term vanishes and the bound touches the log-likelihood at $\Theta = \Theta^{old}$.
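
A worked step, added here for completeness (this is the standard monotonicity argument for EM): with $q(Y) = p(Y \mid X, \Theta^{old})$,

$\log p(X \mid \Theta^{new}) \;\ge\; \mathcal{L}(q, \Theta^{new}) \;\ge\; \mathcal{L}(q, \Theta^{old}) \;=\; \log p(X \mid \Theta^{old})$

The first inequality holds because the bound is valid for every $\Theta$; the second because the M step chooses $\Theta^{new}$ to maximize the bound; the final equality holds because the bound is tight at $\Theta^{old}$. Hence the log-likelihood never decreases.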

Expectation-Maximization (3)
Any physical meaning of the auxiliary function? With $q(Y) = p(Y \mid X, \Theta^{old})$,
$\mathcal{L}(q, \Theta) = \sum_{Y} p(Y \mid X, \Theta^{old}) \log p(X, Y \mid \Theta) \;-\; \sum_{Y} p(Y \mid X, \Theta^{old}) \log p(Y \mid X, \Theta^{old})$
The second (entropy) term is independent of $\Theta$, so maximizing the bound amounts to maximizing
$Q(\Theta, \Theta^{old}) = \sum_{Y} p(Y \mid X, \Theta^{old}) \log p(X, Y \mid \Theta)$
the conditional expectation of the complete-data log-likelihood.

Expectation-Maximization (4)
1. Choose an initial setting for the parameters $\Theta^{old}$
2. E step: evaluate $p(Y \mid X, \Theta^{old})$
3. M step: evaluate $\Theta^{new}$ given by $\Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^{old})$
4. Check for convergence of the log-likelihood or of the parameters; if not converged, set $\Theta^{old} \leftarrow \Theta^{new}$ and return to step 2

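A minimal, model-agnostic sketch of this loop (the e_step, m_step, and log_lik callables are hypothetical placeholders standing in for steps 2 and 3, not functions from any particular library):

def run_em(X, theta, e_step, m_step, log_lik, tol=1e-6, max_iter=200):
    """Generic EM loop: alternate E and M steps until the
    log-likelihood improvement falls below tol."""
    ll_old = log_lik(X, theta)
    for _ in range(max_iter):
        stats = e_step(X, theta)      # step 2: posterior over latent variables
        theta = m_step(X, stats)      # step 3: maximize Q(theta, theta_old)
        ll_new = log_lik(X, theta)
        if ll_new - ll_old < tol:     # step 4: convergence check
            break
        ll_old = ll_new
    return theta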

Gaussian Mixture Model (GMM)
Gaussian model of a $d$-dimensional source, say $j$:
$p_j(x \mid \mu_j, \Sigma_j) = \dfrac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \exp\!\big(-\tfrac{1}{2}(x - \mu_j)^{T} \Sigma_j^{-1} (x - \mu_j)\big)$
GMM with $M$ sources:
$p(x \mid \Theta) = \sum_{j=1}^{M} \alpha_j\, p_j(x \mid \mu_j, \Sigma_j)$
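
A short sketch of these two densities using SciPy (the parameter values are illustrative):

import numpy as np
from scipy.stats import multivariate_normal

d = 2
means = [np.zeros(d), np.array([3.0, 3.0])]   # mu_j for each component
covs = [np.eye(d), 0.5 * np.eye(d)]           # Sigma_j for each component
alphas = np.array([0.4, 0.6])                 # mixing weights

def gmm_pdf(x):
    """p(x | Theta) = sum_j alpha_j * N(x; mu_j, Sigma_j)."""
    return sum(a * multivariate_normal.pdf(x, mean=m, cov=c)
               for a, m, c in zip(alphas, means, covs))

print(gmm_pdf(np.array([1.0, 1.0])))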

Mixture Model
E step: compute the posteriors $p(l \mid x_i, \Theta^{g})$ under the current guess $\Theta^{g}$.
M step: maximize
$Q(\Theta, \Theta^{g}) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l \mid x_i, \Theta^{g}) \;+\; \sum_{l=1}^{M} \sum_{i=1}^{N} \log\big(p_l(x_i \mid \theta_l)\big)\, p(l \mid x_i, \Theta^{g})$
subject to $\sum_{l=1}^{M} \alpha_l = 1$.
The first term is correlated with $\alpha_l$ only; the second term is correlated with $\theta_l$ only, so the two can be maximized separately.

Finding $\alpha_l$
Due to the constraint on the $\alpha_l$'s, we introduce a Lagrange multiplier $\lambda$ and solve the following equation:
$\dfrac{\partial}{\partial \alpha_l} \left[ \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l \mid x_i, \Theta^{g}) + \lambda \Big( \sum_{l=1}^{M} \alpha_l - 1 \Big) \right] = 0$

Finding $\alpha_l$
The derivative gives $\sum_{i=1}^{N} \dfrac{1}{\alpha_l}\, p(l \mid x_i, \Theta^{g}) + \lambda = 0$. Multiplying by $\alpha_l$ and summing over $l$ (using $\sum_l \alpha_l = 1$ and $\sum_l p(l \mid x_i, \Theta^{g}) = 1$) yields $\lambda = -N$, hence
$\alpha_l = \dfrac{1}{N} \sum_{i=1}^{N} p(l \mid x_i, \Theta^{g})$

Finding $\theta_l$
Only the second term of $Q$ needs to be maximized:
$\sum_{l=1}^{M} \sum_{i=1}^{N} \log\big(p_l(x_i \mid \theta_l)\big)\, p(l \mid x_i, \Theta^{g})$
Consider the GMM case, where $\theta_l = (\mu_l, \Sigma_l)$:
$\log p_l(x_i \mid \mu_l, \Sigma_l) = -\tfrac{d}{2}\log(2\pi) - \tfrac{1}{2}\log|\Sigma_l| - \tfrac{1}{2}(x_i - \mu_l)^{T} \Sigma_l^{-1} (x_i - \mu_l)$
The constant $-\tfrac{d}{2}\log(2\pi)$ is unrelated to the parameters and can be dropped.

Finding $\theta_l$
Therefore, we want to maximize:
$\sum_{l=1}^{M} \sum_{i=1}^{N} \Big( -\tfrac{1}{2}\log|\Sigma_l| - \tfrac{1}{2}(x_i - \mu_l)^{T} \Sigma_l^{-1} (x_i - \mu_l) \Big)\, p(l \mid x_i, \Theta^{g})$
How? Take derivatives with respect to $\mu_l$ and $\Sigma_l$ and set them to zero.

Finding $\theta_l$
Setting the derivatives to zero gives the update equations:
$\mu_l = \dfrac{\sum_{i=1}^{N} x_i\, p(l \mid x_i, \Theta^{g})}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^{g})}$
$\Sigma_l = \dfrac{\sum_{i=1}^{N} p(l \mid x_i, \Theta^{g})\, (x_i - \mu_l)(x_i - \mu_l)^{T}}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^{g})}$
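
In code, all three M-step updates are weighted averages over the data; a sketch assuming resp[i, l] holds the posterior $p(l \mid x_i, \Theta^{g})$ from the E step:

import numpy as np

def m_step(X, resp):
    """GMM M-step. X is (N, d); resp is (N, M) with rows summing to 1."""
    N, d = X.shape
    Nl = resp.sum(axis=0)                # effective count per component
    alphas = Nl / N                      # alpha_l = (1/N) sum_i p(l | x_i)
    means = (resp.T @ X) / Nl[:, None]   # posterior-weighted means
    covs = []
    for l in range(resp.shape[1]):
        diff = X - means[l]              # (N, d) deviations from mu_l
        covs.append((resp[:, l, None] * diff).T @ diff / Nl[l])
    return alphas, means, np.stack(covs)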

Summary: EM algorithm for GMM
Given an initial guess $\Theta^{g}$, find $\Theta^{new}$ as follows:
E step: compute $p(l \mid x_i, \Theta^{g})$ for all $i$ and $l$.
M step: update $\alpha_l$, $\mu_l$, $\Sigma_l$ using the reestimation equations above.
If not converged, set $\Theta^{g} \leftarrow \Theta^{new}$ and repeat.
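
Putting the pieces together, a compact end-to-end sketch of this loop (initializing the means at randomly chosen data points is one simple choice among many; all names are illustrative, and no numerical safeguards such as the covariance flooring common in speech systems are included):

import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, M, n_iter=100, tol=1e-6, seed=0):
    """EM for a full-covariance GMM. X is (N, d)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    alphas = np.full(M, 1.0 / M)
    means = X[rng.choice(N, size=M, replace=False)]  # init at random points
    covs = np.stack([np.cov(X, rowvar=False)] * M)
    ll_old = -np.inf
    for _ in range(n_iter):
        # E step: responsibilities p(l | x_i, Theta^g).
        dens = np.column_stack([
            alphas[l] * multivariate_normal.pdf(X, mean=means[l], cov=covs[l])
            for l in range(M)])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M step: reestimate weights, means, covariances.
        Nl = resp.sum(axis=0)
        alphas = Nl / N
        means = (resp.T @ X) / Nl[:, None]
        for l in range(M):
            diff = X - means[l]
            covs[l] = (resp[:, l, None] * diff).T @ diff / Nl[l]
        # Convergence check on the total log-likelihood.
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - ll_old < tol:
            break
        ll_old = ll
    return alphas, means, covs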