Language and Speaker Identification using Gaussian Mixture Model Prepare by Jacky Chau The Chinese University of Hong Kong 18th September, 2002.

Language and Speaker Identification using Gaussian Mixture Model Prepare by Jacky Chau The Chinese University of Hong Kong 18th September, 2002

Outline Motivation of having Speaker Identification and Language Identification System Motivation of having Speaker Identification and Language Identification System Reasons of using Gaussian Mixture Model Reasons of using Gaussian Mixture Model Experiment Experiment Results Results Conclusion Conclusion

Motivation of having Language Identification To detect multi-lingual speech for further speech recognition system such as IBM ViaVoice and Microsoft SAPI To detect multi-lingual speech for further speech recognition system such as IBM ViaVoice and Microsoft SAPI Demo Demo

Motivation of having Speaker Identification Speaker Identification Systems using speech recognition technique: Speaker Identification Systems using speech recognition technique:  Security Lock System  Speaker Tracking System to locating the specified person in the video clips (Demo)  Voice Mail System

Reasons of using Gaussian Mixture Model Gaussian Mixture Model is a type of density model which includes a number of component functions Gaussian Mixture Model is a type of density model which includes a number of component functions These component functions are combined to provide a multi-model density These component functions are combined to provide a multi-model density

Reasons of using Gaussian Mixture Model (cont’) Because speakers and languages have their own statistical density Because speakers and languages have their own statistical density

Reasons of using Gaussian Mixture Model (cont’) Expectation-Maximization (EM) is a well established maximum likelihood algorithm for fitting a mixture model to a set of training data. Expectation-Maximization (EM) is a well established maximum likelihood algorithm for fitting a mixture model to a set of training data. By EM, a set of GMMs is trained and used for recognition. By EM, a set of GMMs is trained and used for recognition.

Steps of the Identification Systems Pre-processing Stage: Pre-processing Stage:  1. Collect the specific speakers or languages sound clips  2. Train the GMM using the collected sound clips Testing Stage: Testing Stage:  1. Calculate the scores of each GMM  2. Select the maxium score as the result

System Diagram

Probability Score Calculation After EM training, the GMM is include means and variances for each mixture component functions. After EM training, the GMM is include means and variances for each mixture component functions. Using the equation shown below, the score is calculated: Using the equation shown below, the score is calculated:

Experiment Setup In Common: In Common:  Testing and Training Data  Score: News Video Clips  Audio Data: 22kHz  Feature: 24 MFCC  No. of Mixture: 256  Duration: >6 seconds

Language Identification Experiment Setup For Language Identification System: For Language Identification System:  Three Languages are tested: 1. English, 2. Cantonese and 3. Putonghua  14 sound clips are tested for each case, i.e. 42 sound clips in total

Result of Language Identification Overall it can get 36 out of 42 correct. (~85% accurate) Overall it can get 36 out of 42 correct. (~85% accurate) Thus, the language is identified and the appropriated speech recognition engine is selected for “speech to text” process. Thus, the language is identified and the appropriated speech recognition engine is selected for “speech to text” process.

Speaker Identification Experiment Setup For Speaker Identification System: For Speaker Identification System:  Testing speakers: five males and five females, totally 50 sound clips  Can be close-set or open-set experiment  For close-set experiment, we select the maximum score achieved as our result.  For open-set experiment, the result score is normalized with the score of silence model

Result of Speaker Identification Close-set Close-set  i.e. select the best speaker within the group  correct: 49 out of 50 (~98%) Open-set Open-set  correct: 45 out of 50 (~90%)  false alarm (accepted wrong speaker): 1.5%  false reject (rejected correct speaker): 6%

Conclusion GMM is suitable for statistical analysis GMM is suitable for statistical analysis Language Identification ==> 85% Language Identification ==> 85% Speaker Identification ==> ~90% Speaker Identification ==> ~90%

Language and Speaker Identification using Gaussian Mixture Model Prepare by Jacky Chau The Chinese University of Hong Kong 18th September, 2002.

Similar presentations

Presentation on theme: "Language and Speaker Identification using Gaussian Mixture Model Prepare by Jacky Chau The Chinese University of Hong Kong 18th September, 2002."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Language and Speaker Identification using Gaussian Mixture Model Prepare by Jacky Chau The Chinese University of Hong Kong 18th September, 2002.

Similar presentations

Presentation on theme: "Language and Speaker Identification using Gaussian Mixture Model Prepare by Jacky Chau The Chinese University of Hong Kong 18th September, 2002."— Presentation transcript:

Similar presentations

About project

Feedback