Flexible Speaker Adaptation using Maximum Likelihood Linear Regression Authors: C. J. Leggetter P. C. Woodland Presenter: 陳亮宇 Proc. ARPA Spoken Language.

Flexible Speaker Adaptation using Maximum Likelihood Linear Regression
Authors: C. J. Leggetter, P. C. Woodland
Presenter: 陳亮宇
Proc. ARPA Spoken Language Technology Workshop, 1995

2 Outline
Introduction
MLLR Overview
Fixed and Dynamic Regression Classes
Supervised Adaptation vs. Unsupervised Adaptation
Evaluation on WSJ Data
Conclusion

3 Introduction
Speaker Independent (SI) recognition systems
– Poor performance
– Easy to get lots of training data
Speaker Dependent (SD) recognition systems
– Better performance
– Difficult to get enough training data
Solution: SI system + adaptation with little SD data
– Advantage: little SD data is required
– Problem: some models are not updated

4 Introduction (aim of the paper)
MLLR (Maximum Likelihood Linear Regression) approach
– Parameter transformation technique
– All models are updated with little adaptation data
– Adapts the SI system by transforming the mean parameters with a set of linear transforms
Dynamic Regression Classes approach
– Optimizes the adaptation procedure at run time
– Allows all modes of adaptation to be performed in a single framework

5 MLLR Overview
Regression Classes
– The set of Gaussians that share the same transformation
[Figure: SD data, mixture components 1 to 9 grouped into regression classes, and a transformation matrix (W) per class; arrows labelled "estimate" and "transform"]
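The idea of a regression class can be illustrated with a minimal Python sketch: every Gaussian looks up the transform of its class and applies it to its SI mean. The function names, the dictionary layout, and the use of 1.0 as the offset term of the extended mean vector are assumptions made for illustration; they are not taken from the slides.

```python
from typing import Dict, List, Sequence


def adapt_mean(transform: Sequence[Sequence[float]],
               si_mean: Sequence[float]) -> List[float]:
    """Apply an n x (n+1) transform to the extended mean [1, mu_1, ..., mu_n].

    The leading 1.0 is an assumed offset term; it lets the first column
    of the transform act as a bias.
    """
    extended = [1.0] + list(si_mean)
    return [sum(w * x for w, x in zip(row, extended)) for row in transform]


def adapt_all(means: Dict[str, Sequence[float]],
              class_of: Dict[str, str],
              transforms: Dict[str, Sequence[Sequence[float]]]
              ) -> Dict[str, List[float]]:
    """Adapt every Gaussian mean with the transform shared by its regression class."""
    return {name: adapt_mean(transforms[class_of[name]], mean)
            for name, mean in means.items()}
```

Because the transform is shared, a Gaussian that was never observed in the adaptation data is still moved, as long as some other member of its class was observed.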

6 MLLR Overview (cont.)
The transformation maps the SI mean to an estimate of the SD mean.
Therefore, for a single Gaussian distribution, the probability density function of state j generating a speech observation vector o of dimension n is:
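The equations on this slide did not survive in the transcript. The following LaTeX is a reconstruction of the standard MLLR mean transform and the resulting Gaussian density, not a copy of the slide. The symbols \xi_j (extended mean vector), \omega (offset term), and \Sigma_j (covariance of state j) are assumed notation.

```latex
% Adapted mean: an n x (n+1) matrix W_j applied to the extended SI mean
\hat{\mu}_j = W_j \, \xi_j ,
\qquad
\xi_j = [\, \omega,\ \mu_{j1},\ \dots,\ \mu_{jn} \,]^{\top}

% Density of state j with the adapted mean
b_j(o) = \frac{1}{(2\pi)^{n/2} \, |\Sigma_j|^{1/2}}
         \exp\!\left( -\tfrac{1}{2}\,
           (o - W_j \xi_j)^{\top} \Sigma_j^{-1} (o - W_j \xi_j) \right)
```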

7 Estimation of MLLR matrices
Gaussian covariance matrices are diagonal
A set of T frames of adaptation data O = o_1, o_2, …, o_T
W_j is tied between R Gaussians j_1, j_2, …, j_R
W_j can be updated column by column

8 Estimation of MLLR matrices (cont.)
z_i = i-th column of Z
Z is accumulated using the probability of occupying state j at time t while generating O
c^(r)_ii is the i-th diagonal element of the r-th tied state's inverse covariance, scaled by the total state occupation probability
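The closed-form estimate for diagonal covariances can be sketched with NumPy as below. This is an illustration under stated assumptions rather than the paper's exact formulation: the function name and array layout are hypothetical, the offset term of the extended mean is taken to be 1, and W is stored as an n x (n+1) matrix that multiplies the extended mean from the left. In that layout the independently solved pieces are the rows of W; whether this matches the slides' "column by column" wording depends on how the original equations lay out W and Z, which the transcript does not preserve.

```python
import numpy as np


def estimate_mllr_transform(obs, gamma, means, variances):
    """Estimate one transform W shared by R diagonal-covariance Gaussians.

    obs:       (T, n) adaptation frames o_t
    gamma:     (T, R) occupation probabilities of each Gaussian at each frame
    means:     (R, n) SI means
    variances: (R, n) diagonal covariance entries
    Returns W of shape (n, n + 1).
    """
    n = obs.shape[1]
    R = means.shape[0]
    xi = np.hstack([np.ones((R, 1)), means])   # extended means, (R, n+1)
    occ = gamma.sum(axis=0)                    # total occupation per Gaussian, (R,)
    weighted_obs = gamma.T @ obs               # sum_t gamma_r(t) o_t, (R, n)

    W = np.zeros((n, n + 1))
    for i in range(n):
        inv_var = 1.0 / variances[:, i]
        # G_i = sum_r occ_r / var_{r,i} * xi_r xi_r^T
        G = (xi * (occ * inv_var)[:, None]).T @ xi
        # z_i = sum_r (1 / var_{r,i}) * [sum_t gamma_r(t) o_{t,i}] * xi_r
        z = (weighted_obs[:, i] * inv_var) @ xi
        W[i] = np.linalg.solve(G, z)
    return W
```

The linear system for each dimension is only solvable when the class has seen enough distinct Gaussians (at least n + 1 with linearly independent extended means), which is one reason the amount of data per regression class matters.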

9 MLLR for Incremental Adaptation
Can be implemented by accumulating the time-dependent components separately
Accumulate the observation vectors associated with each Gaussian and the associated occupation probability
– The MLLR equations can then be applied at any time
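A minimal sketch of the accumulation step, assuming one accumulator object per Gaussian. The class and method names are hypothetical; the point is only that the running occupation count and the occupation-weighted sum of observations are the time-dependent quantities, so they can be updated frame by frame and read out whenever a new transform is wanted.

```python
from typing import List, Optional, Sequence


class GaussianAccumulator:
    """Running statistics for one Gaussian (hypothetical helper)."""

    def __init__(self, dim: int) -> None:
        self.occupancy = 0.0                 # sum_t gamma(t)
        self.weighted_sum = [0.0] * dim      # sum_t gamma(t) * o_t

    def add_frame(self, observation: Sequence[float], gamma: float) -> None:
        """Fold one frame into the statistics, weighted by its occupation probability."""
        self.occupancy += gamma
        for i, value in enumerate(observation):
            self.weighted_sum[i] += gamma * value

    def mean_observation(self) -> Optional[List[float]]:
        """Occupation-weighted average of the frames seen so far, or None if unseen."""
        if self.occupancy == 0.0:
            return None
        return [s / self.occupancy for s in self.weighted_sum]
```

Since the statistics are plain sums, processing utterances one at a time gives the same totals as processing them in a single batch, and the transform can be re-estimated after any number of utterances.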

10 Fixed Regression Classes
Regression classes are predetermined by assessing
– The amount of adaptation data
– A mixture component clustering procedure based on a likelihood measure
The number of regression classes is roughly proportional to the amount of adaptation data
Disadvantages:
– The amount of adaptation data must be known in advance
– Some regression classes might not have a sufficient amount of data
  – Poor estimates of the transformations
  – A class may be dominated by a specific mixture component

11 Dynamic Regression Classes
Mixture components are arranged into a tree
Leaves of the tree are:
– For a small HMM system: individual mixture components
– For a large HMM system: base classes, each containing a set of mixture components that are similar under a divergence measure
Leaves of the tree are then merged into groups of similar components based on a distance measure (divergence)
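The slides do not spell out how the tree is used at run time. The sketch below assumes the usual regression class tree scheme: each leaf takes its transform from the most specific node (itself or an ancestor) whose accumulated occupation count reaches a threshold. The `Node` type, the function names, and the threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    occupancy: float = 0.0   # set on leaves; internal nodes sum their children


def total_occupancy(node: Node) -> float:
    """Occupation count of a node: its own for a leaf, the sum of its leaves otherwise."""
    if not node.children:
        return node.occupancy
    return sum(total_occupancy(child) for child in node.children)


def assign_transforms(node: Node, threshold: float,
                      inherited: Optional[str] = None,
                      out: Optional[Dict[str, Optional[str]]] = None
                      ) -> Dict[str, Optional[str]]:
    """Map each leaf to the lowest node on its path with enough data.

    A leaf maps to None when no node on its path, including the root,
    reaches the threshold. Occupancies are recomputed on each visit,
    which is simple but not efficient for large trees.
    """
    if out is None:
        out = {}
    here = node.name if total_occupancy(node) >= threshold else inherited
    if not node.children:
        out[node.name] = here
    for child in node.children:
        assign_transforms(child, threshold, here, out)
    return out
```

With this scheme the number of transforms grows automatically as more adaptation data arrives, because more nodes lower in the tree pass the threshold.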

12 Supervised Adaptation vs. Unsupervised Adaptation
Note: the fixed regression class approach was used
Figure: Supervised vs. unsupervised adaptation using the RM corpus

13 Evaluation on WSJ Data
Experiment settings
– Dynamic regression classes approach
– Baseline Speaker Independent system (refer to 5.1)
S3 test:
– Static supervised adaptation for non-native speakers
S4 test:
– Incremental unsupervised adaptation for native speakers

14 Regression Class Tree Settings
Distance measure:
– Divergence between mixture components
A clustering algorithm is used to generate 750 base classes
– 750 mixture components were chosen
– The nearest 10 components are assigned to each base class
– The remaining components are assigned to base classes using an average distance measure from all the existing members
The regression tree was then built using a similar distance measure
– Base classes are compared on a pairwise basis using the average divergence between all members of each class
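The slides name the divergence but do not give its formula. The sketch below assumes the symmetric Kullback-Leibler divergence between diagonal-covariance Gaussians, which is one common choice and may differ from the measure used in the paper. A Gaussian is represented as a hypothetical `(mean, variance)` pair of equal-length lists.

```python
from typing import List, Sequence, Tuple

Gaussian = Tuple[Sequence[float], Sequence[float]]   # (mean, diagonal variance)


def divergence(mean_p: Sequence[float], var_p: Sequence[float],
               mean_q: Sequence[float], var_q: Sequence[float]) -> float:
    """Symmetric KL divergence, KL(p||q) + KL(q||p), for diagonal Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += vp / vq + vq / vp - 2.0                   # variance mismatch
        total += (mp - mq) ** 2 * (1.0 / vp + 1.0 / vq)    # mean mismatch
    return 0.5 * total


def average_divergence(class_a: List[Gaussian], class_b: List[Gaussian]) -> float:
    """Average pairwise divergence between all members of two classes."""
    pairs = [(g, h) for g in class_a for h in class_b]
    return sum(divergence(*g, *h) for g, h in pairs) / len(pairs)
```

`average_divergence` corresponds to the pairwise comparison of base classes described above; the same function could serve when assigning a leftover component to its nearest base class by treating the component as a one-member class.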

15 S3 Test Results
% Word Error by regression class type and number of MLLR iterations

Regression Classes          MLLR Iterations   S3-dev 94   S3-Nov 94
Native speaker recognizer   n/a               27.14       20.72
Baseline                                      20.82       16.67
Tree                        1                 12.80       11.52
Tree                        2                 12.34       10.99
Tree                        3                 12.20       10.99
Global                      2                 16.10       13.81
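To compare rows of the results table, a relative word error reduction is often more informative than the absolute difference. The helper below is a small illustration; the function name is hypothetical, and the figures in the usage note are the Baseline and Tree rows quoted from the S3 table.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative reduction in word error rate, in percent of the baseline error."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer
```

For example, `relative_wer_reduction(16.67, 10.99)` gives roughly 34% for the tree-based system with two iterations on S3-Nov 94, and `relative_wer_reduction(20.82, 12.20)` gives roughly 41% for three iterations on S3-dev 94.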

16 S4 Test Results
% Word Error by regression class type and update interval

Regression Classes   Update Interval   S4-dev 94   S4-Nov 94
Baseline                               9.08        7.76
Tree                 1                 6.66        6.43
Tree                 5                 6.69        6.58
Tree                 10                6.76        6.62
Global               1                 7.27        7.04

Note: Increasing the update interval gives a large reduction in adaptation computation and only a small drop in performance

17 Number of classes vs. number of sentences (S4 Test)

18 Adaptation in Nov’94 Hi-P0 HTK System
Unsupervised adaptation
Adaptation on 15 sentences from each speaker, from unfiltered newspaper articles
About 15 million parameters in this HMM set
Used 750 base classes

% Word Error
Adaptation   H2-dev’94   H1 Nov’94
No           8.30        7.93
Yes          7.28        7.18

19 Conclusion
The MLLR approach can be used for both static and incremental adaptation
The MLLR approach can be used for both supervised and unsupervised adaptation
Dynamic regression classes
