Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models. C. J. Leggetter and P. C. Woodland, Department of Engineering, University of Cambridge.


Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models. C. J. Leggetter and P. C. Woodland, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, U.K. Computer Speech and Language (1995). Presented by Hsu Ting-Wei.

2 Introduction
Speaker adaptation techniques fall into two main categories:
–Speaker normalization: the input speech is normalized to match the speaker the system was trained to model.
–Model adaptation: the parameters of the model set are adjusted to improve the modelling of the new speaker.
The MAP method updates only the parameters of models which are observed in the adaptation data. With the MLLR method (maximum likelihood linear regression), all model states can be adapted even if no model-specific data is available.

3 MLLR’s adaptation approach
This method requires an initial speaker independent continuous density HMM system. MLLR takes some adaptation data from a new speaker and updates the model mean parameters to maximize the likelihood of the adaptation data. The other HMM parameters are not adapted, since the main differences between speakers are assumed to be characterized by the means.

4 MLLR’s adaptation approach (cont.)
Consider the case of a continuous density HMM system with Gaussian output distributions. A particular distribution s is characterized by a mean vector \mu_s and a covariance matrix \Sigma_s. Given a parameterized speech frame vector o, the probability density of that vector being generated by distribution s is

b_s(o) = \frac{1}{(2\pi)^{n/2} |\Sigma_s|^{1/2}} \exp\!\left( -\tfrac{1}{2} (o - \mu_s)^T \Sigma_s^{-1} (o - \mu_s) \right)

where n is the dimension of the observation vector.
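As a concrete illustration, the Gaussian density above can be evaluated in log form. A minimal numpy sketch (the function name and test values are ours, not from the paper):

```python
import numpy as np

def log_gaussian_density(o, mu, sigma):
    """Log of b_s(o) for a single Gaussian distribution s.

    o, mu: length-n vectors; sigma: n x n covariance matrix Sigma_s.
    """
    n = len(o)
    diff = o - mu
    _, logdet = np.linalg.slogdet(sigma)           # log |Sigma_s|
    maha = diff @ np.linalg.solve(sigma, diff)     # (o-mu)' Sigma^{-1} (o-mu)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + maha)

# Sanity check against the 1-D standard normal: log N(0; 0, 1) = -0.5 log(2*pi)
val = log_gaussian_density(np.zeros(1), np.zeros(1), np.eye(1))
```

Working in the log domain avoids the underflow that the raw exponential form suffers for high-dimensional speech frames.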

5 MLLR’s adaptation approach (cont.)
The adapted mean is obtained by applying an n × (n+1) transformation matrix W_s to the extended mean vector \xi_s:

\hat{\mu}_s = W_s \xi_s    (1)

where

\xi_s = [\omega, \mu_{s1}, \ldots, \mu_{sn}]^T

is the (n+1) × 1 extended mean vector, i.e. the mean values of the distribution to be adapted concatenated after an offset term. Setting \omega = 1 includes an offset in the regression; \omega = 0 ignores offsets. The offset is a parameter that can be added when the adaptation speaker’s recording environment differs from that of the initial models. The probability density function for the adapted system becomes

b_s(o) = \frac{1}{(2\pi)^{n/2} |\Sigma_s|^{1/2}} \exp\!\left( -\tfrac{1}{2} (o - W_s \xi_s)^T \Sigma_s^{-1} (o - W_s \xi_s) \right)
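The mean transformation can be sketched as follows (the helper name and values are hypothetical; \omega is passed as `offset`):

```python
import numpy as np

def adapt_mean(W, mu, offset=1.0):
    """Apply mu_hat = W @ xi with xi = [offset, mu_1, ..., mu_n]'.

    W: n x (n+1) regression matrix; offset=1.0 includes a bias term,
    offset=0.0 ignores offsets.
    """
    xi = np.concatenate(([offset], mu))   # extended mean vector, (n+1)-dim
    return W @ xi                         # adapted mean, n-dim

# With W = [0 | I] the transform leaves the mean unchanged
mu = np.array([1.0, 2.0, 3.0])
W_identity = np.hstack([np.zeros((3, 1)), np.eye(3)])
mu_hat = adapt_mean(W_identity, mu)
```

Initializing W to [0 | I] in this way is a natural starting point, since it reproduces the unadapted speaker-independent means.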

6 MLLR’s adaptation approach (cont.)
The transformation matrices are calculated to maximize the likelihood of the adaptation data; the estimation can be implemented using the forward–backward algorithm. A more general approach is adopted in which the same transformation matrix is used for several distributions. If some of the distributions are not observed in the adaptation data, a transformation may still be applied (a global transformation).

7 Estimation of MLLR regression matrices
1. Definition of the auxiliary function (E-step). The regression matrices are estimated with the EM algorithm. For an observation sequence of speech frame vectors O = o_1, \ldots, o_T and hidden state sequence \theta, the auxiliary (objective) function is

Q(\lambda, \bar{\lambda}) = \sum_{\theta} P(O, \theta \mid \lambda)\, \log P(O, \theta \mid \bar{\lambda})

where \lambda is the current model set and \bar{\lambda} the adapted model set; increasing Q guarantees an increase in the likelihood of the adaptation data.

8 Estimation of MLLR regression matrices (cont.)
2. Maximization of the auxiliary function. Only the terms of Q that involve the Gaussian means depend on W_s, so maximizing Q with respect to W_s is equivalent to minimizing

\sum_{t=1}^{T} \gamma_s(t)\, (o_t - W_s \xi_s)^T \Sigma_s^{-1} (o_t - W_s \xi_s)    (2)

where the occupation probability

\gamma_s(t) = P(q_t = s \mid O, \lambda)    (3)

is computed with the forward–backward algorithm.

9 Estimation of MLLR regression matrices (cont.)
2. Maximization of the auxiliary function (cont.). Expanding the quadratic term,

(o_t - W_s \xi_s)^T \Sigma_s^{-1} (o_t - W_s \xi_s) = o_t^T \Sigma_s^{-1} o_t - 2\, o_t^T \Sigma_s^{-1} W_s \xi_s + \xi_s^T W_s^T \Sigma_s^{-1} W_s \xi_s    (4)

Differentiating with respect to W_s and equating to zero gives the general form for estimating W_s (M-step):

\sum_{t=1}^{T} \gamma_s(t)\, \Sigma_s^{-1} o_t\, \xi_s^T = \sum_{t=1}^{T} \gamma_s(t)\, \Sigma_s^{-1} W_s\, \xi_s\, \xi_s^T    (5)

12 Estimation of MLLR regression matrices (cont.)
3. Re-estimation formula for tied regression matrices. When the adaptation data is sparse, states whose distributions are strongly correlated can be grouped into the same class, and the data collected within the class pooled to estimate a shared W_s. If one matrix W is tied across R distributions s_1, \ldots, s_R, the general form becomes

\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, \Sigma_{s_r}^{-1} o_t\, \xi_{s_r}^T = \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, \Sigma_{s_r}^{-1} W\, \xi_{s_r}\, \xi_{s_r}^T    (6)

Note that \xi \xi^T is [(n+1)×1][1×(n+1)] = (n+1)×(n+1).

13 Estimation of MLLR regression matrices (cont.)
3. Re-estimation formula for tied regression matrices (cont.). The left-hand side of the tied re-estimation equation is denoted by an n × (n+1) matrix Z, and the right-hand side, written in terms of W, by an n × (n+1) matrix Y. Setting Y = Z and solving (row by row in the diagonal-covariance case) yields the tied regression matrix W.
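A minimal sketch of the tied estimation, under the simplifying assumption of identity covariances so that the \Sigma_s^{-1} factors cancel and W = Z G^{-1}, with Z accumulating \gamma\, o\, \xi^T and G accumulating \gamma\, \xi\, \xi^T. All numbers are hypothetical, and \gamma here comes from a hard alignment rather than a real forward–backward pass:

```python
import numpy as np

n = 2                                    # feature dimension

# "True" speaker transform we try to recover (hypothetical values)
W_true = np.array([[0.5, 1.1, 0.0],
                   [-0.3, 0.2, 0.9]])    # n x (n+1)

# Several tied distributions sharing one W; xi_r = [1, mu_r]'
mus = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
       np.array([0.0, 1.0]), np.array([1.0, 2.0])]

Z = np.zeros((n, n + 1))                 # accumulates gamma * o * xi'
G = np.zeros((n + 1, n + 1))             # accumulates gamma * xi * xi'
for mu in mus:
    xi = np.concatenate(([1.0], mu))     # extended mean vector
    for _ in range(5):                   # frames aligned to this distribution
        gamma = 1.0                      # occupation probability (hard alignment)
        o = W_true @ xi                  # noiseless adaptation frame
        Z += gamma * np.outer(o, xi)
        G += gamma * np.outer(xi, xi)

W_est = Z @ np.linalg.inv(G)             # W = Z G^{-1}
```

Tying is what makes G invertible here: a single distribution contributes only the rank-one term \xi \xi^T, so at least n+1 distributions with linearly independent extended mean vectors are needed to determine W.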

14 Special cases of MLLR
1. Least squares regression. If the covariance terms are ignored (all distributions treated as having equal, identity covariances), the re-estimation reduces to standard least squares. Collecting the adaptation observations as columns of a matrix Y and the corresponding extended mean vectors as columns of a matrix X, the solution is

W = Y X^T (X X^T)^{-1}
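The least-squares special case can be checked numerically: with hypothetical observations as columns of Y and extended mean vectors as columns of X, the closed form Y X^T (X X^T)^{-1} agrees with the solution from `numpy.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical transform; columns of X are extended mean vectors
# xi = [1, mu]', columns of Y are the adaptation observations.
W_true = np.array([[1.2, 0.8, -0.1],
                   [0.3, 0.0, 1.1]])                    # 2 x 3
X = np.vstack([np.ones(10), rng.normal(size=(2, 10))])  # 3 x 10
Y = W_true @ X                                          # 2 x 10, noiseless

# Closed form W = Y X' (X X')^{-1} ...
W_closed = Y @ X.T @ np.linalg.inv(X @ X.T)
# ... equals the least-squares solution of X' W' = Y'
W_lstsq = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T
```

In practice `lstsq` (or a Cholesky solve) is preferable to forming the explicit inverse, since X X^T can be poorly conditioned.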

16 Special cases of MLLR (cont.)
2. Single variable linear regression. Each component of the mean vector is adapted independently,

\hat{\mu}_{si} = a_i \mu_{si} + b_i

i.e. W_s is restricted to a diagonal scaling plus an offset column. The M-step then reduces to the familiar scalar linear regression estimates for each a_i and b_i.

18 Defining regression classes
When regression matrices are tied across mixture components, each matrix is associated with many mixture components. For the tied approach to be effective it is desirable to put all the mixture components which will use similar transforms into the same class. Two approaches for defining regression classes were considered:
–Based on broad phonetic classes: all mixture components in any model representing the same broad phonetic class (e.g. fricatives, nasals, etc.) were placed in the same regression class.
–Based on clustering of mixture components: the mixture components were compared using a likelihood measure and similar components placed in the same regression class.

19 Experiment: full regression matrix vs. diagonal regression matrix
(Figure: recognition results for the SD and SI baselines and for diagonal and full regression matrices; the full matrix involves many more parameters.)

20 Experiment: full matrix using a global regression class
(Figure: recognition results for the SD and SI baselines and for the globally adapted models.)

21 Experiment: supervised vs. unsupervised adaptation
(Figure: recognition results for the SD and SI baselines under supervised and unsupervised adaptation.)

22 Conclusion MLLR can be applied to continuous density HMMs with a large number of Gaussians and is effective with small amounts of adaptation data.