International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting Sh-Hussain Salleh Tian-Swee Tan A. K. Ariff. Jain-De,Lee.

 INTRODUCTION  GMM SPEAKER IDENTIFICATION SYSTEM  EXPERIMENTAL EVALUATION  CONCLUSION

  Speaker recognition is generally divided into two tasks
   ◦ Speaker Verification (SV)
   ◦ Speaker Identification (SI)
  Speaker model
   ◦ Text-dependent (TD)
   ◦ Text-independent (TI)

  Many approaches have been proposed for TI speaker recognition
   ◦ VQ based method
   ◦ Hidden Markov Models
   ◦ Gaussian Mixture Model
  VQ based method

  Hidden Markov Models
   ◦ State probability
   ◦ Transition probability
  Acoustic events are classified according to the HMM states to characterize each speaker in the TI task
  TI performance is unaffected by discarding the transition probabilities in the HMM models

  Gaussian Mixture Model
   ◦ Corresponds to a single-state continuous ergodic HMM
   ◦ Obtained by discarding the transition probabilities in the HMM models
  The use of GMM for speaker identity modeling
   ◦ The Gaussian components represent some general speaker-dependent spectral shapes
   ◦ A Gaussian mixture is capable of modeling arbitrary densities

  The GMM speaker identification system consists of the following elements
   ◦ Speech processing
   ◦ Gaussian mixture model
   ◦ Parameter estimation
   ◦ Identification

  Mel-scale frequency cepstral coefficient (MFCC) extraction is used in front-end processing
  Mel-scale cepstral feature analysis: Input speech signal → Pre-emphasis → Frame → Hamming window → FFT → Triangular band-pass filter → Logarithm → DCT
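The first stages of the front end listed above (pre-emphasis, framing, Hamming window) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pre-emphasis coefficient 0.97, the frame length, and the hop size are assumed values that the slides do not state, and the function names are hypothetical.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed, commonly used value."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(length):
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def windowed_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames and apply the window to each frame."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames
```

Each windowed frame would then go through the FFT, the triangular Mel filter bank, the logarithm, and the DCT to give one MFCC vector.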

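  The Gaussian mixture density is a weighted linear combination of M unimodal Gaussian component densities
   $p(\vec{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, b_i(\vec{x})$
  The mixture weights satisfy the constraint that
   $\sum_{i=1}^{M} w_i = 1$
 where
   ◦ $\vec{x}$ is a D-dimensional vector
   ◦ $b_i(\vec{x})$, i = 1,…,M are the component densities
   ◦ $w_i$, i = 1,…,M are the mixture weights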

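  Each component density is a D-variate Gaussian function of the form
   $b_i(\vec{x}) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu}_i)^{T} \Sigma_i^{-1} (\vec{x} - \vec{\mu}_i) \right\}$
  The Gaussian mixture density model is denoted as
   $\lambda = \{ w_i, \vec{\mu}_i, \Sigma_i \}$, i = 1,…,M
 where
   ◦ $\vec{\mu}_i$ is the mean vector
   ◦ $\Sigma_i$ is the covariance matrix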
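As a rough illustration of the mixture density described above, the sketch below evaluates the log of the weighted sum of Gaussian components for one feature vector. It assumes diagonal covariance matrices (each stored as a list of variances), which the slides do not specify, and it works in the log domain to avoid underflow. The function names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log of a D-variate Gaussian density with diagonal covariance (an assumption)."""
    log_det = sum(math.log(v) for v in var)  # log |Sigma| for a diagonal matrix
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (len(x) * math.log(2 * math.pi) + log_det + maha)

def log_gmm_density(x, weights, means, variances):
    """log of sum_i w_i * b_i(x), computed with the log-sum-exp trick."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

With full covariance matrices, the determinant and the quadratic form would need a matrix routine instead of the per-dimension sums used here.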

  Conventional GMM training process
   Input training vectors → LBG algorithm → EM algorithm → Convergence? (N: repeat EM; Y: End)

  LBG algorithm flow
   Input training vectors → Overall average → Split → Clustering → Cluster's average → Calculate distortion → (D-D')/D < δ? (N: D' = D, repeat clustering; Y: m < M? (Y: split again; N: End))
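The flowchart steps above (overall average, split, clustering, cluster averages, distortion check, repeat until M codewords) can be sketched roughly as below. This is an illustration under assumptions: squared Euclidean distance, an additive split offset `epsilon`, a relative distortion threshold `delta` measured against the previous pass, and a target size that is a power of two. None of these values come from the slides.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(vectors, num_codewords, delta=1e-3, epsilon=0.01):
    """Grow a codebook by repeated splitting; num_codewords is assumed a power of two."""
    codebook = [mean_vector(vectors)]  # overall average
    while len(codebook) < num_codewords:
        # Split every codeword into two slightly offset copies (assumed additive offset).
        codebook = [[c + s * epsilon for c in cw] for cw in codebook for s in (1, -1)]
        prev = float("inf")
        while True:
            # Clustering: assign each vector to its nearest codeword.
            clusters = [[] for _ in codebook]
            distortion = 0.0
            for v in vectors:
                dists = [squared_distance(v, cw) for cw in codebook]
                best = min(range(len(codebook)), key=dists.__getitem__)
                clusters[best].append(v)
                distortion += dists[best]
            distortion /= len(vectors)
            # Cluster averages become the new codewords (empty clusters keep the old one).
            codebook = [mean_vector(c) if c else cw for c, cw in zip(clusters, codebook)]
            # Stop this pass when the relative drop in distortion is small.
            if distortion == 0 or (prev - distortion) / distortion < delta:
                break
            prev = distortion
    return codebook
```

The resulting codewords would typically serve as initial component means for the subsequent GMM training.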

  Speaker model training estimates the GMM parameters by maximum likelihood (ML) estimation
  The ML estimates are obtained iteratively with the expectation-maximization (EM) algorithm

  This paper proposes an algorithm that consists of two steps

  Cluster the training vectors: assign each vector to the mixture component with the highest likelihood
  Re-estimate the parameters of each component
   ◦ $w_i$ = number of vectors classified in cluster i / total number of training vectors
   ◦ $\vec{\mu}_i$ = sample mean of the vectors classified in cluster i
   ◦ $\Sigma_i$ = sample covariance matrix of the vectors classified in cluster i
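One pass of the two steps above might look like the following sketch. It is not the authors' code. It assumes diagonal covariances, picks the cluster by the unweighted component density (the slides do not say whether the mixture weight enters the comparison), divides by the cluster size when computing the sample variance, and floors the variances at an assumed `var_floor`. All names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log of a D-variate Gaussian density with diagonal covariance (an assumption)."""
    log_det = sum(math.log(v) for v in var)
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (len(x) * math.log(2 * math.pi) + log_det + maha)

def highest_likelihood_step(vectors, means, variances, var_floor=1e-3):
    """Hard-assign each vector to one component, then re-estimate all parameters."""
    num_components = len(means)
    clusters = [[] for _ in range(num_components)]
    # Step 1: cluster by the component with the highest likelihood.
    for x in vectors:
        scores = [log_gaussian_diag(x, means[i], variances[i])
                  for i in range(num_components)]
        clusters[max(range(num_components), key=scores.__getitem__)].append(x)
    # Step 2: re-estimate weight, mean, and variance from each cluster.
    weights, new_means, new_vars = [], [], []
    for i, members in enumerate(clusters):
        weights.append(len(members) / len(vectors))
        if not members:
            # Empty cluster: keep the previous mean and variance (assumed policy).
            new_means.append(list(means[i]))
            new_vars.append(list(variances[i]))
            continue
        n = len(members)
        mu = [sum(col) / n for col in zip(*members)]
        var = [max(sum((v - m) ** 2 for v in col) / n, var_floor)
               for col, m in zip(zip(*members), mu)]
        new_means.append(mu)
        new_vars.append(var)
    return weights, new_means, new_vars
```

Unlike EM, each vector contributes to exactly one component, so no per-vector posterior probabilities need to be accumulated.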

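  The feature sequence $X = \{\vec{x}_1, \dots, \vec{x}_T\}$ is assigned to the speaker whose model likelihood is the highest
   $\hat{S} = \arg\max_{1 \le k \le S} p(X \mid \lambda_k)$
  The above can be formulated in logarithmic terms
   $\hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(\vec{x}_t \mid \lambda_k)$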
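The decision rule above can be sketched as follows: sum the per-frame log-likelihoods under each speaker's model and pick the largest total. The sketch assumes diagonal covariances and that each model is stored as a `(weights, means, variances)` tuple. Both the storage layout and the function names are illustrative choices, not something the slides specify.

```python
import math

def log_gmm_density(x, weights, means, variances):
    """log p(x | model) for a diagonal-covariance GMM, via log-sum-exp."""
    terms = []
    for w, mean, var in zip(weights, means, variances):
        log_b = -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                           for xi, mi, v in zip(x, mean, var))
        terms.append(math.log(w) + log_b)
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def identify(features, speaker_models):
    """Return the best-scoring speaker and the total log-likelihood per speaker."""
    scores = {name: sum(log_gmm_density(x, *model) for x in features)
              for name, model in speaker_models.items()}
    return max(scores, key=scores.get), scores

# Usage with two toy one-dimensional, single-component models (made-up values).
models = {
    "speaker_a": ([1.0], [[0.0]], [[1.0]]),
    "speaker_b": ([1.0], [[5.0]], [[1.0]]),
}
best, totals = identify([[0.1], [-0.2]], models)
```

Summing log-likelihoods over frames treats the feature vectors as independent, which is the usual assumption behind this form of the rule.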

  Database and Experiment Conditions
   ◦ 7 male and 3 female speakers
   ◦ The same 40 sentence utterances, each with different text
   ◦ The average sentence duration is approximately 3.5 s
  Performance Comparison between EM and Highest Mixture Likelihood Clustering Training
   ◦ Number of Gaussian components: 16
   ◦ 16-dimensional MFCCs
   ◦ 20 utterances are used for training

 Convergence condition

  Comparison between EM and highest likelihood clustering training on identification rate
   ◦ 10 sentences were used for training
   ◦ 25 sentences were used for testing
   ◦ 4 Gaussian components
   ◦ 8 iterations

  Effect of Different Number of Gaussian Mixture Components and Amount of Training Data
   ◦ MFCC feature dimension is fixed to 12
   ◦ 25 sentences are used for testing

  Effect of Feature Set on Performance for Different Number of Gaussian Mixture Components
   ◦ Combination with first and second order difference coefficients was tested
   ◦ 10 sentences are used for training
   ◦ 30 sentences are used for testing

  The proposed training performs comparably to conventional EM training but with less computation time
  First order difference coefficients are sufficient to capture the transitional information with reasonable dimensional complexity
  A 16-component GMM with 12-dimensional features, trained on 5 sentences, achieved a 98.4% identification rate
