International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting Sh-Hussain Salleh Tian-Swee Tan A. K. Ariff. Jain-De,Lee.



• INTRODUCTION
• GMM SPEAKER IDENTIFICATION SYSTEM
• EXPERIMENTAL EVALUATION
• CONCLUSION

• Speaker recognition is generally divided into two tasks
  ◦ Speaker Verification (SV)
  ◦ Speaker Identification (SI)
• Speaker model
  ◦ Text-dependent (TD)
  ◦ Text-independent (TI)

• Many approaches have been proposed for TI speaker recognition
  ◦ VQ-based method
  ◦ Hidden Markov Models
  ◦ Gaussian Mixture Model
• VQ-based method

• Hidden Markov Models
  ◦ State probability
  ◦ Transition probability
• Classify acoustic events corresponding to HMM states to characterize each speaker in the TI task
• TI performance is unaffected by discarding the transition probabilities in the HMM models

• Gaussian Mixture Model
  ◦ Corresponds to a single-state continuous ergodic HMM
  ◦ Discards the transition probabilities of the HMM models
• The use of GMM for speaker identity modeling
  ◦ The Gaussian components represent some general speaker-dependent spectral shapes
  ◦ A Gaussian mixture is capable of modeling arbitrary densities

• The GMM speaker identification system consists of the following elements
  ◦ Speech processing
  ◦ Gaussian mixture model
  ◦ Parameter estimation
  ◦ Identification

• Mel-scale frequency cepstral coefficient (MFCC) extraction is used in front-end processing
  ◦ Mel-scale cepstral feature analysis: Input speech signal → Pre-emphasis → Framing → Hamming window → FFT → Triangular band-pass filters → Logarithm → DCT
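The front-end chain above can be sketched in NumPy as follows. This is only an illustration of the listed stages, not the authors' implementation: the function name `mfcc` and every numeric setting (16 kHz sampling, 400-sample frames, 160-sample hop, 512-point FFT, 20 filters, pre-emphasis factor 0.97) are assumptions, since the slides do not give them.

```python
import numpy as np

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=20, n_ceps=12, pre_emph=0.97):
    """Return MFCC vectors, shape (n_frames, n_ceps). All defaults are assumed."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: y[n] = x[n] - a * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])

    # Framing, then a Hamming window on every frame
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)

    # FFT power spectrum (frames are zero-padded to n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular band-pass filters spaced evenly on the mel scale
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    # Logarithm of the filter-bank energies (floored to avoid log(0))
    log_energy = np.log(np.maximum(power @ fbank.T, 1e-10))

    # DCT-II of the log energies; keep coefficients 1..n_ceps
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filters)
    return log_energy @ basis.T
```

With these assumed settings, one second of audio at 16 kHz yields 98 frames of 12 coefficients each.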

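• The Gaussian mixture density is a weighted linear combination of M unimodal Gaussian component densities
  $$p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, b_i(\mathbf{x})$$
  where $\mathbf{x}$ is a D-dimensional vector, $b_i(\mathbf{x})$, $i = 1, \ldots, M$, are the component densities, and $w_i$, $i = 1, \ldots, M$, are the mixture weights
• The mixture weights satisfy the constraint that
  $$\sum_{i=1}^{M} w_i = 1$$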

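• Each component density is a D-variate Gaussian function of the form
  $$b_i(\mathbf{x}) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\top} \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right\}$$
  where $\boldsymbol{\mu}_i$ is the mean vector and $\Sigma_i$ is the covariance matrix
• The Gaussian mixture density model is denoted as
  $$\lambda = \{ w_i, \boldsymbol{\mu}_i, \Sigma_i \}, \quad i = 1, \ldots, M$$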
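As a rough illustration of the mixture density described above, the sketch below evaluates the log of the weighted sum of D-variate Gaussian components for a batch of feature vectors. The function name `gmm_log_density` and the array layout are assumptions made for this example. Working in the log domain with a log-sum-exp step is a numerical convenience, not something stated in the slides.

```python
import numpy as np

def gmm_log_density(X, weights, means, covs):
    """log p(x_t | lambda) for each row x_t of X, using full covariance matrices.

    X: (T, D), weights: (M,), means: (M, D), covs: (M, D, D)  (assumed layout)
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    T, D = X.shape
    log_terms = np.empty((T, len(weights)))
    for i, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        diff = X - mu
        _, logdet = np.linalg.slogdet(cov)
        # Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu) for every row
        maha = np.einsum("td,td->t", diff, np.linalg.solve(cov, diff.T).T)
        log_terms[:, i] = np.log(w) - 0.5 * (D * np.log(2 * np.pi) + logdet + maha)
    # log of the weighted sum over components, computed stably
    peak = log_terms.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.exp(log_terms - peak).sum(axis=1))
```

For two identical unit-covariance components in two dimensions, the value at the mean is -log(2π), the same as for a single Gaussian.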

• Conventional GMM training process
  ◦ Input training vectors → LBG algorithm → EM algorithm → Convergence? (N: repeat EM; Y: End)

• Input training vectors → Overall average → Split → Clustering → Cluster averages → Calculate distortion D
  ◦ (D-D')/D < δ? N: set D' = D and return to Clustering; Y: continue
  ◦ m < M? Y: return to Split; N: End
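The flowchart above (overall average, split, clustering, cluster averages, distortion test, codebook-size test) can be sketched as a binary-splitting vector quantizer. This is a sketch under assumptions: the function name `lbg`, the additive split offset `eps` scaled by the data standard deviation, and the reading of the stopping test as the relative decrease in distortion are not taken from the slides. M is assumed to be a power of two.

```python
import numpy as np

def lbg(X, M, delta=1e-3, eps=0.01):
    """Grow a codebook of M centroids by repeated splitting and re-clustering."""
    X = np.asarray(X, dtype=float)
    codebook = X.mean(axis=0, keepdims=True)           # overall average
    offset = eps * X.std(axis=0)                       # assumed split perturbation
    while len(codebook) < M:                           # m < M ?
        codebook = np.vstack([codebook + offset,       # split every centroid in two
                              codebook - offset])
        prev = np.inf                                  # D' (previous distortion)
        while True:
            # Clustering: nearest centroid by squared Euclidean distance
            d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Cluster averages (an empty cluster keeps its old centroid)
            for i in range(len(codebook)):
                members = X[labels == i]
                if len(members):
                    codebook[i] = members.mean(axis=0)
            dist = d2.min(axis=1).mean()               # distortion D
            if dist == 0 or (prev - dist) / dist < delta:
                break
            prev = dist                                # D' = D
    return codebook
```

The resulting centroids are commonly used as initial means for the EM stage, which matches the order of the blocks in the training flow.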

• Speaker model training estimates the GMM parameters via maximum likelihood (ML) estimation
• Expectation-maximization (EM) algorithm
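For reference, one iteration of the standard EM update for a GMM might look like the sketch below. The slides do not show the re-estimation formulas, so this follows the textbook form: posterior-weighted counts, means and covariances. The function name `em_step` and the small ridge added to each covariance are assumptions.

```python
import numpy as np

def em_step(X, weights, means, covs):
    """One EM iteration. Returns new (weights, means, covs) and the
    log-likelihood of X under the parameters that were passed in."""
    X = np.asarray(X, dtype=float)
    T, D = X.shape
    M = len(weights)
    log_r = np.empty((T, M))
    for i in range(M):
        diff = X - means[i]
        _, logdet = np.linalg.slogdet(covs[i])
        maha = np.einsum("td,td->t", diff, np.linalg.solve(covs[i], diff.T).T)
        log_r[:, i] = np.log(weights[i]) - 0.5 * (D * np.log(2 * np.pi) + logdet + maha)

    # E-step: posterior probability of each component given each vector
    peak = log_r.max(axis=1, keepdims=True)
    log_px = peak + np.log(np.exp(log_r - peak).sum(axis=1, keepdims=True))
    resp = np.exp(log_r - log_px)

    # M-step: re-estimate weights, means and covariances from the posteriors
    n = resp.sum(axis=0)
    new_w = n / T
    new_mu = (resp.T @ X) / n[:, None]
    new_cov = np.empty((M, D, D))
    for i in range(M):
        diff = X - new_mu[i]
        # Small ridge (an assumption) keeps the matrix invertible
        new_cov[i] = (resp[:, i, None] * diff).T @ diff / n[i] + 1e-6 * np.eye(D)
    return new_w, new_mu, new_cov, float(log_px.sum())
```

Calling `em_step` repeatedly until the returned log-likelihood stops improving gives the convergence loop of the conventional training process.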

• This paper proposes an algorithm that consists of two steps

• Cluster the training vectors to the mixture component with the highest likelihood
• Re-estimate the parameters of each component
  ◦ Mixture weight: number of vectors classified in cluster i / total number of training vectors
  ◦ Mean vector: sample mean of the vectors classified in cluster i
  ◦ Covariance matrix: sample covariance matrix of the vectors classified in cluster i
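The two steps above amount to a hard-assignment counterpart of EM. A minimal sketch of one iteration follows, with several assumptions labeled: the function name `hard_cluster_step` is hypothetical, each vector is assigned by the component density alone (without the mixture weight), the covariance uses the 1/n normalization, and a cluster with too few vectors keeps its previous mean and covariance.

```python
import numpy as np

def hard_cluster_step(X, means, covs):
    """One iteration: assign vectors to components, then re-estimate parameters."""
    X = np.asarray(X, dtype=float)
    T, D = X.shape
    M = len(means)

    # Step 1: log-density of every vector under every component
    log_b = np.empty((T, M))
    for i in range(M):
        diff = X - means[i]
        _, logdet = np.linalg.slogdet(covs[i])
        maha = np.einsum("td,td->t", diff, np.linalg.solve(covs[i], diff.T).T)
        log_b[:, i] = -0.5 * (D * np.log(2 * np.pi) + logdet + maha)
    labels = log_b.argmax(axis=1)          # component with the highest likelihood

    # Step 2: re-estimate each component from the vectors in its cluster
    weights = np.zeros(M)
    new_means = np.array(means, dtype=float)
    new_covs = np.array(covs, dtype=float)
    for i in range(M):
        members = X[labels == i]
        weights[i] = len(members) / T      # cluster count / total count
        if len(members) > D:               # assumed guard against degenerate clusters
            new_means[i] = members.mean(axis=0)
            new_covs[i] = np.cov(members, rowvar=False, bias=True)
    return weights, new_means, new_covs, labels
```

Because each vector contributes to exactly one component, no posterior probabilities have to be accumulated across components, which is consistent with the lower computation time reported in the conclusion.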

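• The feature sequence is assigned to the speaker whose model likelihood is the highest
  $$\hat{S} = \arg\max_{1 \le k \le S} p(X \mid \lambda_k)$$
  where $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_T\}$ is the test feature sequence and $\lambda_k$ is the model of speaker k among S speakers
• The above can be formulated in logarithmic terms
  $$\hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(\mathbf{x}_t \mid \lambda_k)$$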
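The decision rule in logarithmic form can be sketched as follows: sum the per-frame log-likelihoods under each speaker's GMM and pick the largest total. The names `total_log_likelihood` and `identify`, and the dictionary of `(weights, means, covs)` tuples, are hypothetical choices for this example. Summing over frames assumes the frames are treated as independent.

```python
import numpy as np

def total_log_likelihood(X, model):
    """Sum over frames of log p(x_t | lambda) for one speaker model."""
    weights, means, covs = model
    X = np.asarray(X, dtype=float)
    T, D = X.shape
    log_terms = np.empty((T, len(weights)))
    for i in range(len(weights)):
        diff = X - means[i]
        _, logdet = np.linalg.slogdet(covs[i])
        maha = np.einsum("td,td->t", diff, np.linalg.solve(covs[i], diff.T).T)
        log_terms[:, i] = np.log(weights[i]) - 0.5 * (D * np.log(2 * np.pi) + logdet + maha)
    peak = log_terms.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_terms - peak).sum(axis=1))
    return float(frame_ll.sum())

def identify(X, speaker_models):
    """Return the name of the best-scoring speaker and all scores."""
    scores = {name: total_log_likelihood(X, m) for name, m in speaker_models.items()}
    return max(scores, key=scores.get), scores
```

A possible call is `identify(test_features, {"spk1": model1, "spk2": model2})`, where each model is a tuple produced by the training stage.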

• Database and experiment conditions
  ◦ 7 male and 3 female speakers
  ◦ The same 40 sentence utterances with different text
  ◦ The average sentence duration is approximately 3.5 s
• Performance comparison between EM and highest mixture likelihood clustering training
  ◦ 16 Gaussian components
  ◦ 16-dimensional MFCCs
  ◦ 20 utterances are used for training

• Convergence condition

• Comparison between EM and highest likelihood clustering training on identification rate
  ◦ 10 sentences were used for training
  ◦ 25 sentences were used for testing
  ◦ 4 Gaussian components
  ◦ 8 iterations

• Effect of different numbers of Gaussian mixture components and amounts of training data
  ◦ The MFCC feature dimension is fixed to 12
  ◦ 25 sentences are used for testing

• Effect of feature set on performance for different numbers of Gaussian mixture components
  ◦ Combination with first and second order difference coefficients was tested
  ◦ 10 sentences are used for training
  ◦ 30 sentences are used for testing
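The slides do not say how the difference coefficients were computed. As one possible illustration, the sketch below appends first and second order differences along the time axis using `np.gradient`. The function name `add_deltas` and the choice of central differences are assumptions; regression-based delta formulas over several frames are also common.

```python
import numpy as np

def add_deltas(feats, order=2):
    """Append difference coefficients up to `order` to each feature vector.

    feats: (n_frames, n_ceps) array; returns (n_frames, n_ceps * (order + 1)).
    """
    blocks = [np.asarray(feats, dtype=float)]
    for _ in range(order):
        # Central differences inside the sequence, one-sided at both ends
        blocks.append(np.gradient(blocks[-1], axis=0))
    return np.hstack(blocks)
```

With 12 static coefficients, `order=1` gives 24-dimensional vectors and `order=2` gives 36-dimensional vectors.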

• Performance is comparable to conventional EM training, but with less computation time
• First order difference coefficients are sufficient to capture the transitional information with reasonable dimensional complexity
• A 16-component GMM with 12-dimensional features, trained on 5 sentences, achieved a 98.4% identification rate