
1 International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting, Sh-Hussain Salleh, Tian-Swee Tan, A. K. Ariff, Jain-De Lee

2  INTRODUCTION  GMM SPEAKER IDENTIFICATION SYSTEM  EXPERIMENTAL EVALUATION  CONCLUSION

3  Speaker recognition is generally divided into two tasks ◦ Speaker Verification (SV) ◦ Speaker Identification (SI)  Speaker model ◦ Text-dependent (TD) ◦ Text-independent (TI)

4  Many approaches have been proposed for TI speaker recognition ◦ VQ-based method ◦ Hidden Markov Models ◦ Gaussian Mixture Model  VQ-based method

5  Hidden Markov Models ◦ State probability ◦ Transition probability  Acoustic events corresponding to HMM states are classified to characterize each speaker in the TI task  TI performance is unaffected by discarding the transition probabilities in the HMM models

6  Gaussian Mixture Model ◦ Corresponds to a single-state continuous ergodic HMM ◦ Obtained by discarding the transition probabilities in the HMM models  Motivation for using GMM for speaker identity modeling ◦ The Gaussian components represent general speaker-dependent spectral shapes ◦ A Gaussian mixture can model arbitrary densities

7  The GMM speaker identification system consists of the following elements ◦ Speech processing ◦ Gaussian mixture model ◦ Parameter estimation ◦ Identification

8  Mel-scale frequency cepstral coefficient (MFCC) extraction is used in front-end processing: Input speech signal → Pre-emphasis → Framing → Hamming window → FFT → Triangular band-pass filter bank → Logarithm → DCT → Mel-scale cepstral features
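The front-end pipeline above can be sketched in a few lines of numpy; the frame length, hop, filter count, and FFT size below are illustrative defaults, not values taken from the paper.

```python
import numpy as np

def mfcc_frontend(signal, sr=16000, frame_len=400, hop=160,
                  n_filters=26, n_ceps=16, pre_emph=0.97, nfft=512):
    """Sketch of the slide's MFCC front end: pre-emphasis -> framing ->
    Hamming window -> FFT -> mel triangular filter bank -> log -> DCT.
    Parameter values are illustrative, not taken from the paper."""
    # Pre-emphasis boosts the high-frequency content
    x = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # Overlapping frames
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # Hamming window reduces spectral leakage
    frames = frames * np.hamming(frame_len)
    # Magnitude spectrum
    spec = np.abs(np.fft.rfft(frames, n=nfft))
    # Mel-spaced triangular band-pass filter bank
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    inv_mel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    pts = inv_mel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((nfft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filter-bank energies, decorrelated by a DCT-II
    loge = np.log(spec @ fbank.T + 1e-10)
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return loge @ dct.T
```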

9  The Gaussian mixture density is a weighted linear combination of M unimodal Gaussian component densities:

p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, p_i(\mathbf{x})

where \mathbf{x} is a D-dimensional feature vector, p_i(\mathbf{x}), i = 1, \dots, M, are the component densities, and w_i, i = 1, \dots, M, are the mixture weights. The mixture weights satisfy the constraint \sum_{i=1}^{M} w_i = 1.

10  Each component density is a D-variate Gaussian function of the form

p_i(\mathbf{x}) = \frac{1}{(2\pi)^{D/2} |\Sigma_i|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \mu_i)^{\mathsf{T}} \Sigma_i^{-1} (\mathbf{x} - \mu_i) \right)

where \mu_i is the mean vector and \Sigma_i is the covariance matrix. The Gaussian mixture density model is denoted as \lambda = \{ w_i, \mu_i, \Sigma_i \}, i = 1, \dots, M.
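The mixture density above can be evaluated directly; the sketch below assumes diagonal covariance matrices (a common choice for GMM speaker models, though the slides do not fix this) and uses log-sum-exp for numerical stability.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, covs):
    """Per-vector log p(x | lambda) for the mixture density above.
    Diagonal covariances (covs[i] holds the D variances of component i)
    are an assumption of this sketch."""
    N, D = X.shape
    M = len(weights)
    log_comp = np.empty((N, M))
    for i in range(M):
        diff = X - means[i]
        # log( w_i * N(x; mu_i, Sigma_i) ) for a diagonal Sigma_i
        log_comp[:, i] = (np.log(weights[i])
                          - 0.5 * (D * np.log(2 * np.pi)
                                   + np.sum(np.log(covs[i]))
                                   + np.sum(diff ** 2 / covs[i], axis=1)))
    # log-sum-exp over the M components
    m = log_comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
```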

11  Conventional GMM training process: Input training vectors → LBG algorithm (initialization) → EM algorithm → convergence? If yes, end; if no, repeat the EM step.

12  LBG algorithm: Input training vectors → Overall average (initial centroid) → Split each centroid → Clustering → Cluster's average → Calculate distortion D → if (D − D′)/D < δ then (if m < M, split again; else end), otherwise set D′ = D and re-cluster.
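The LBG flow above might be sketched as follows, under the assumption that the target codebook size M is a power of two (as the slides' M = 4 and M = 16 are):

```python
import numpy as np

def lbg(X, M, eps=0.01, delta=1e-3):
    """LBG codebook construction sketched from the slide's flow chart:
    start from the overall average, split each centroid, re-cluster until
    the relative distortion drop (D - D') / D falls below delta, and keep
    splitting until M centroids exist (M assumed a power of two)."""
    centroids = X.mean(axis=0, keepdims=True)          # overall average
    while len(centroids) < M:
        # Split each centroid into a perturbed pair
        centroids = np.concatenate([centroids * (1 + eps),
                                    centroids * (1 - eps)])
        prev = None
        while True:
            # Clustering: assign each vector to its nearest centroid
            d2 = ((X[:, None, :] - centroids[None]) ** 2).sum(-1)
            labels = d2.argmin(axis=1)
            dist = d2[np.arange(len(X)), labels].mean()  # average distortion
            # Cluster's average becomes the new centroid
            for k in range(len(centroids)):
                if np.any(labels == k):
                    centroids[k] = X[labels == k].mean(axis=0)
            if prev is not None and (prev - dist) / max(prev, 1e-12) < delta:
                break
            prev = dist
    return centroids
```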

13  Speaker model training estimates the GMM parameters via maximum likelihood (ML) estimation, conventionally using the expectation-maximization (EM) algorithm
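One EM iteration for the ML estimation step might look as follows; diagonal covariances and the small variance floor are assumptions of this sketch, not details given on the slides.

```python
import numpy as np

def em_step(X, weights, means, covs):
    """One EM iteration for ML estimation of the GMM parameters.
    Diagonal covariances (covs[i] holds the D variances of component i)
    and the variance floor are assumptions of this sketch."""
    N, D = X.shape
    M = len(weights)
    # E-step: posterior responsibility of each component for each vector
    log_r = np.empty((N, M))
    for i in range(M):
        diff = X - means[i]
        log_r[:, i] = (np.log(weights[i])
                       - 0.5 * (D * np.log(2 * np.pi)
                                + np.sum(np.log(covs[i]))
                                + np.sum(diff ** 2 / covs[i], axis=1)))
    resp = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and variances from soft counts
    nk = resp.sum(axis=0)
    new_w = nk / N
    new_mu = (resp.T @ X) / nk[:, None]
    new_cov = np.empty_like(covs, dtype=float)
    for i in range(M):
        diff = X - new_mu[i]
        new_cov[i] = (resp[:, i][:, None] * diff ** 2).sum(axis=0) / nk[i] + 1e-6
    return new_w, new_mu, new_cov
```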

14  This paper proposes a training algorithm consisting of two steps

15  Step 1: cluster the training vectors to the mixture component with the highest likelihood  Step 2: re-estimate the parameters of each component ◦ w_i = n_i / N (number of vectors classified in cluster i / total number of training vectors) ◦ \mu_i = sample mean of the vectors classified in cluster i ◦ \Sigma_i = sample covariance matrix of the vectors classified in cluster i
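The two steps might be sketched as a single hard-assignment iteration; diagonal covariances and the variance floor are assumptions of this sketch, and empty clusters simply keep their previous parameters here.

```python
import numpy as np

def hard_cluster_step(X, weights, means, covs):
    """One iteration of the proposed two-step training: assign each
    training vector to the mixture component with the highest weighted
    likelihood, then re-estimate w_i = n_i/N, the sample mean, and the
    sample covariance of each cluster.  Diagonal covariances and the
    variance floor are assumptions of this sketch."""
    N, D = X.shape
    M = len(weights)
    log_lik = np.empty((N, M))
    for i in range(M):
        diff = X - means[i]
        log_lik[:, i] = (np.log(weights[i])
                         - 0.5 * (D * np.log(2 * np.pi)
                                  + np.sum(np.log(covs[i]))
                                  + np.sum(diff ** 2 / covs[i], axis=1)))
    # Step 1: hard assignment to the highest-likelihood component
    labels = log_lik.argmax(axis=1)
    # Step 2: re-estimate parameters from the hard clusters
    new_w = np.array([(labels == i).mean() for i in range(M)])
    new_mu, new_cov = means.copy(), covs.copy()
    for i in range(M):
        Xi = X[labels == i]
        if len(Xi):                    # empty clusters keep old parameters
            new_mu[i] = Xi.mean(axis=0)            # sample mean of cluster i
            new_cov[i] = Xi.var(axis=0) + 1e-6     # sample variances of cluster i
    return new_w, new_mu, new_cov
```

Compared with EM's soft responsibilities, the hard assignment needs only one likelihood evaluation and one pass of sample statistics per iteration, which is where the computational saving claimed in the conclusion comes from.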

16  A feature vector sequence is classified to the speaker whose model likelihood is highest  Formulated in logarithmic terms: \hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(\mathbf{x}_t \mid \lambda_k)
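The decision rule can be sketched as an argmax over summed per-frame log-likelihoods; each model is assumed to be a diagonal-covariance GMM given as a (weights, means, covariances) triple.

```python
import numpy as np

def identify_speaker(frames, models):
    """Decision rule from the slide: pick the speaker k maximizing
    sum_t log p(x_t | lambda_k).  Each model is a (weights, means, covs)
    triple for a diagonal-covariance GMM -- an assumption of this sketch."""
    N, D = frames.shape
    best, best_score = -1, -np.inf
    for k, (w, mu, cov) in enumerate(models):
        # Per-frame log mixture likelihood via log-sum-exp
        lc = np.stack([np.log(w[i])
                       - 0.5 * (D * np.log(2 * np.pi)
                                + np.sum(np.log(cov[i]))
                                + np.sum((frames - mu[i]) ** 2 / cov[i], axis=1))
                       for i in range(len(w))], axis=1)
        m = lc.max(axis=1, keepdims=True)
        score = (m[:, 0] + np.log(np.exp(lc - m).sum(axis=1))).sum()
        if score > best_score:
            best, best_score = k, score
    return best
```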

17  Database and Experiment Conditions ◦ 7 male and 3 female speakers ◦ 40 sentence utterances per speaker, each with different text ◦ The average sentence duration is approximately 3.5 s  Performance comparison between EM and highest-mixture-likelihood clustering training ◦ 16 Gaussian components ◦ 16-dimensional MFCCs ◦ 20 utterances are used for training

18  Convergence condition

19  The comparison between EM and highest likelihood clustering training on identification rate ◦ 10 sentences were used for training ◦ 25 sentences were used for testing ◦ 4 Gaussian components ◦ 8 iterations

20  Effect of different numbers of Gaussian mixture components and amounts of training data ◦ The MFCC feature dimension is fixed to 12 ◦ 25 sentences are used for testing

21  Effect of feature set on performance for different numbers of Gaussian mixture components ◦ Combinations with first- and second-order difference coefficients were tested ◦ 10 sentences are used for training ◦ 30 sentences are used for testing

22  The proposed training performs comparably to conventional EM training but with less computation time  First-order difference coefficients are sufficient to capture the transitional information with reasonable dimensional complexity  A 12-dimensional, 16-component GMM trained on 5 sentences achieved a 98.4% identification rate
