Speaker Recognition
Sharat S. Chikkerur, S. Anand Mantravadi, Rajeev K. Srinivasan
Speaker Identification
Speaker recognition: definition
- Speaker recognition is the method of recognizing a person based on his or her voice.
- It is one of the forms of biometric identification.
- It depends on speaker-dependent characteristics of the voice.
Figure: taxonomy of speaker recognition into speaker identification, speaker verification and speaker detection, with a further split into text-dependent and text-independent systems.
EE 516 Term Project, Fall 2003
Speech Production
Figures: speech production mechanism; speech production model.
In the model, the excitation is either an impulse train generator controlled by the pitch and shaped by the glottal pulse model G(z) (gain Av), or a noise source (gain AN). The excitation then passes through the vocal tract V(z) and the radiation model R(z).
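The model in the figure can be summarized in the z-domain. This is a sketch that assumes the standard source-filter arrangement, in which the noise source bypasses G(z). P(z) and U(z) are symbols introduced here for the transforms of the excitations P[n] and u[n] named on the Cepstral Coefficients slide.

```latex
S(z) =
\begin{cases}
A_v \, P(z) \, G(z) \, V(z) \, R(z) & \text{voiced speech (impulse train at the pitch period)} \\
A_N \, U(z) \, V(z) \, R(z) & \text{unvoiced speech (noise source)}
\end{cases}
```

In the time domain the excitation is convolved with the filter impulse responses. Taking the logarithm of the spectrum turns this product into a sum, which is the property that cepstral analysis relies on.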
Generic Speaker Recognition System
Block diagram (figure): speech signal → preprocessing → analysis frames → feature extraction → feature vectors → pattern matching against the speaker model → score → verification. During enrollment, the same preprocessing and feature extraction stages are used to build the speaker model.
- Preprocessing: A/D conversion, end-point detection, pre-emphasis filter, segmentation.
- Features: LAR, cepstrum, LPCC, MFCC.
- Speaker models: stochastic models (GMM, HMM) and template models (DTW, distance measures).
Choice of features
- The factors that differentiate speakers include vocal tract shape and behavioral traits.
- Features should have high inter-speaker variation and low intra-speaker variation.
Our Approach
- Preprocessing: silence removal.
- Feature extraction: cepstral coefficients, cepstral normalization (long-time average), polynomial function expansion.
- Speaker model: reference template.
- Matching: dynamic time warping, distance computation.
Silence Removal
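The slide does not say how silence is detected. A common approach, assumed here, is a short-time energy threshold: split the signal into frames and drop any frame whose energy falls below a fraction of the loudest frame's energy. The helper name `remove_silence`, the 160-sample frame (20 ms at an assumed 8 kHz sampling rate) and the 1% threshold are all hypothetical choices made for illustration.

```python
def remove_silence(signal, frame_len=160, threshold_ratio=0.01):
    """Energy-based silence removal (hypothetical helper, illustrative parameters).

    Splits the signal into non-overlapping frames and keeps only frames whose
    energy exceeds threshold_ratio times the maximum frame energy.
    """
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    energies = [sum(x * x for x in frame) for frame in frames]
    if not energies:
        return []
    cutoff = threshold_ratio * max(energies)
    kept = []
    for frame, energy in zip(frames, energies):
        if energy > cutoff:  # frame is loud enough to count as speech
            kept.extend(frame)
    return kept
```

A fixed ratio to the maximum energy keeps the sketch simple. A practical system would more likely estimate the noise floor from the first few frames.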
Pre-emphasis
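Pre-emphasis is usually a first-order high-pass filter, H(z) = 1 - αz^-1. It boosts the high frequencies that the glottal source and radiation leave weak. The slide gives no coefficient, so α = 0.95 below is an assumed, commonly used value, and `pre_emphasis` is a hypothetical helper name.

```python
def pre_emphasis(signal, alpha=0.95):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1] (alpha is an assumption)."""
    if not signal:
        return []
    out = [signal[0]]  # no previous sample exists for n = 0
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out
```

A constant (DC) input is attenuated to (1 - α) of its level after the first sample, while rapidly alternating samples are amplified.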
Segmentation
Short-time analysis
- The speech signal is segmented into overlapping ‘analysis frames’.
- The speech signal is assumed to be stationary within each frame.
Figure: overlapping analysis frames.
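The short-time analysis above can be sketched as follows. The frame length of 256 samples (32 ms at an assumed 8 kHz rate), the hop of 128 samples (50% overlap) and the Hamming window are assumptions, since the deck does not state them. `frame_signal` and `hamming` are hypothetical helper names.

```python
import math


def hamming(n_points):
    """Hamming window w[n] = 0.54 - 0.46 cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def frame_signal(signal, frame_len=256, hop=128):
    """Split a signal into overlapping, Hamming-windowed analysis frames.

    A new frame starts every `hop` samples; a trailing partial frame is dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

The window tapers each frame toward its edges, which reduces spectral leakage when the frame is treated as a short stationary segment.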
Feature Representation
Figure: speech signal and spectrum of two users uttering ‘ONE’.
Vocal Tract Modeling
Figure: signal spectrum and smoothed signal spectrum.
- The smoothed spectrum indicates the locations of each user's formants.
- The smoothed spectrum is obtained from the cepstral coefficients.
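One standard way to obtain a smoothed spectrum from cepstral coefficients is low-time liftering. The method computes the real cepstrum, keeps only the low-quefrency coefficients (which describe the slowly varying vocal tract envelope), and transforms back. The sketch below assumes this technique. `n_keep = 20` is an illustrative choice, and the naive DFT is used only to keep the example self-contained. It is not the authors' Matlab code.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for a short sketch)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def idft(spectrum):
    """Naive inverse DFT."""
    n_pts = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def real_cepstrum(frame, eps=1e-12):
    """c[n] = IDFT{ log |DFT{x}| }; eps avoids log(0)."""
    log_mag = [math.log(abs(X) + eps) for X in dft(frame)]
    return [c.real for c in idft(log_mag)]


def smoothed_log_spectrum(frame, n_keep=20):
    """Low-time lifter: keep quefrencies |n| < n_keep (both ends of the even
    cepstrum), zero the rest, and transform back to a smoothed log spectrum."""
    ceps = real_cepstrum(frame)
    n_pts = len(ceps)
    liftered = [c if (n < n_keep or n > n_pts - n_keep) else 0.0
                for n, c in enumerate(ceps)]
    return [X.real for X in dft(liftered)]
```

Lowering `n_keep` gives a smoother envelope. Raising it toward the frame length reproduces the original log-magnitude spectrum, including the pitch harmonics.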
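Cepstral Coefficients
Figure: the speech production model (excitations P[n] and u[n], gains Av and AN, filters G(z), V(z), R(z)).
Homomorphic system for convolution: the characteristic system D[] maps a convolution x1[n]*x2[n] into a sum x1'[n]+x2'[n]. A linear system L[] turns this into y1'[n]+y2'[n], and the inverse system D^-1[] maps the result back to a convolution y1[n]*y2[n].
D[] is computed as DFT[] → LOG[] → IDFT[]:
$$x_1[n]*x_2[n] \xrightarrow{\;\mathrm{DFT}\;} X_1(z)\,X_2(z) \xrightarrow{\;\log\;} \log X_1(z) + \log X_2(z) \xrightarrow{\;\mathrm{IDFT}\;} x_1'[n] + x_2'[n]$$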
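Applied to speech, take x1[n] as the excitation and x2[n] as the combined impulse response of G(z), V(z) and R(z). The derivation below uses the real cepstrum (log magnitude only), a common simplification of the complex cepstrum that D[] describes.

```latex
\begin{aligned}
s[n] &= x_1[n] * x_2[n] \\
S(e^{j\omega}) &= X_1(e^{j\omega}) \, X_2(e^{j\omega}) \\
\log\lvert S(e^{j\omega})\rvert &= \log\lvert X_1(e^{j\omega})\rvert + \log\lvert X_2(e^{j\omega})\rvert \\
c_s[n] &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\lvert S(e^{j\omega})\rvert \, e^{j\omega n} \, d\omega = c_{x_1}[n] + c_{x_2}[n]
\end{aligned}
```

The vocal tract term c_{x_2}[n] is concentrated at low quefrencies. A voiced excitation contributes peaks near multiples of the pitch period. This separation is why the low-order cepstral coefficients are used to characterize the vocal tract of each speaker.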
Speaker Model
The speaker model is a sequence of feature vectors, one per analysis frame:
F1 = [a1…a10, b1…b10]
F2 = [a1…a10, b1…b10]
…
FN = [a1…a10, b1…b10]
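The slide does not spell out what the a and b terms are. The Our Approach slide lists cepstral normalization by the long-time average followed by polynomial function expansion. One plausible reading, which is an assumption, is that a1…a10 are mean-normalized cepstral coefficients and b1…b10 are their first-order orthogonal polynomial (delta) coefficients over a window of ±K frames. The helper names and K = 2 below are hypothetical.

```python
def cepstral_mean_normalize(frames):
    """Subtract the long-time average of each coefficient over all frames."""
    n_frames = len(frames)
    n_coef = len(frames[0])
    means = [sum(f[i] for f in frames) / n_frames for i in range(n_coef)]
    return [[f[i] - means[i] for i in range(n_coef)] for f in frames]


def polynomial_deltas(frames, half_width=2):
    """First-order polynomial (regression slope) of each coefficient over frames
    t-K..t+K: b_i(t) = sum_k k * a_i(t+k) / sum_k k^2, clamping at the edges."""
    n_frames = len(frames)
    n_coef = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, half_width + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for i in range(n_coef):
            acc = 0.0
            for k in range(1, half_width + 1):
                ahead = frames[min(t + k, n_frames - 1)][i]
                behind = frames[max(t - k, 0)][i]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def build_template(cepstra, n_coef=10):
    """Hypothetical layout F_t = [a_1..a_n, b_1..b_n] for each frame t."""
    a = cepstral_mean_normalize([c[:n_coef] for c in cepstra])
    b = polynomial_deltas(a)
    return [a_t + b_t for a_t, b_t in zip(a, b)]
```

Subtracting the long-time mean removes a fixed channel (microphone) component, which is additive in the cepstral domain. The polynomial slope adds information about how the spectrum changes over time.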
Dynamic Time Warping
The DTW warping path through the n-by-m cost matrix is the path with the minimum average cumulative cost. The unmarked area of the figure is the constraint region within which the path is allowed to go.
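A minimal sketch of DTW matching between a reference template and a test utterance follows. It assumes a Euclidean local distance and the symmetric step pattern, in which horizontal and vertical steps have weight 1 and diagonal steps have weight 2. It also assumes an optional band |i - j| <= band standing in for the allowed area. With these weights every path from (0, 0) to (n, m) has total weight n + m, so minimizing the cumulative cost and dividing by n + m gives the minimum average cost. The step weights, band and function names are assumptions, not details taken from the authors' implementation.

```python
import math


def euclidean(u, v):
    """Local distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(ref, test, band=None):
    """Length-normalized DTW distance between two feature-vector sequences.

    Symmetric step pattern: D[i][j] = min(D[i-1][j] + d, D[i][j-1] + d,
    D[i-1][j-1] + 2d). Cells with |i - j| > band are excluded when band is set
    (if |n - m| > band, no path exists and the result is inf).
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if band is not None and abs(i - j) > band:
                continue  # outside the allowed region
            d = euclidean(ref[i - 1], test[j - 1])
            D[i][j] = min(D[i - 1][j] + d, D[i][j - 1] + d, D[i - 1][j - 1] + 2 * d)
    return D[n][m] / (n + m)
```

Because the step pattern treats both sequences alike, dtw_distance(a, b) equals dtw_distance(b, a). This is consistent with the symmetric distance matrix reported on the Results slide. A time-stretched copy of an utterance scores zero against the original.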
Results
- Distances are normalized with respect to the length of the speech signal.
- The intra-speaker distance is less than the inter-speaker distance.
- The distance matrix is symmetric.
Matlab Implementation
THANK YOU