Speaker Recognition
Sharat.S.Chikkerur S.Anand Mantravadi Rajeev.K.Srinivasan

Speaker Identification
Definition
Speaker recognition is the method of recognizing a person based on their voice.
It is one form of biometric identification.
It depends on speaker-dependent characteristics.
Taxonomy (from the slide diagram): Speaker Recognition covers Speaker Identification, Speaker Verification, and Speaker Detection; systems are either text dependent or text independent.
EE 516 Term Project, Fall 2003

Speech production
Figure: speech production mechanism.
Figure: speech production model. An impulse train generator, controlled by the pitch period, drives the glottal pulse model G(z) with gain Av; a noise source is scaled by gain AN; the excitation passes through the vocal tract model V(z) and the radiation model R(z).

Generic Speaker Recognition System
Verification path: speech signal -> Preprocessing -> analysis frames -> Feature Extraction -> feature vector -> Pattern Matching -> score
Enrollment path: Preprocessing -> Feature Extraction -> Speaker Model
Preprocessing: A/D conversion, end point detection, pre-emphasis filter, segmentation
Features: LAR, cepstrum, LPCC, MFCC
Stochastic models: GMM, HMM
Template models: DTW, distance measures
Choice of features
Differentiating factors between speakers include vocal tract shape and behavioral traits.
Features should have high inter-speaker and low intra-speaker variation.

Our Approach
Pipeline stages: Preprocessing, Feature Extraction, Speaker model, Matching
Blocks in the diagram: Silence Removal; Cepstrum Coefficients; Cepstral Normalization; Long time average; Polynomial Function Expansion; Dynamic Time Warping; Distance Computation; Reference Template

Silence Removal
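The slide does not say how silence is detected. A minimal sketch, assuming a simple frame-energy threshold (the function name `remove_silence`, the frame length, and the threshold ratio are all illustrative assumptions, not the project's actual method):

```python
def remove_silence(signal, frame_len=160, threshold_ratio=0.1):
    """Drop low-energy frames from a list of samples.

    Assumption: a frame counts as silence when its mean energy is below
    threshold_ratio times the energy of the loudest frame.
    """
    # Cut the signal into non-overlapping frames (the last one may be shorter).
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    if not frames:
        return []
    # Mean energy of each frame.
    energies = [sum(s * s for s in f) / len(f) for f in frames]
    threshold = threshold_ratio * max(energies)
    kept = []
    for frame, energy in zip(frames, energies):
        # Keep only frames that are non-zero and at or above the threshold.
        if energy > 0 and energy >= threshold:
            kept.extend(frame)
    return kept
```

A relative threshold is used here so the sketch does not depend on the recording level; an absolute threshold or an end point detector would be equally plausible choices.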

Pre-emphasis
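Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] - alpha * x[n-1]. The sketch below assumes that form; the function name `pre_emphasis` and the coefficient 0.95 are assumptions, since the slide gives no value.

```python
def pre_emphasis(signal, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n-1] to a list of samples.

    alpha is an assumed coefficient; values near 1 boost high frequencies more.
    """
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor, so it passes through
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out
```

For a constant input the output after the first sample is (1 - alpha) times the input, which shows how the filter attenuates low frequencies.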

Segmentation
Short-time analysis
The speech signal is segmented into overlapping "analysis frames".
The speech signal is assumed to be stationary within each frame.
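The segmentation step can be sketched as follows. The function name `split_into_frames` and its parameters are illustrative assumptions; the slide does not give the frame length or the overlap.

```python
def split_into_frames(signal, frame_len, hop):
    """Split a list of samples into overlapping analysis frames.

    Consecutive frames start hop samples apart, so they overlap by
    frame_len - hop samples. Trailing samples that do not fill a
    whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames
```

For example, a 10-sample signal with `frame_len=4` and `hop=2` yields four frames, each sharing two samples with its neighbor.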

Feature Representation
Figure: speech signal and spectrum of two users uttering "ONE".

Vocal Tract Modeling
Figure panels: signal spectrum; smoothed signal spectrum.
The smoothed spectrum indicates the locations of the formants of each user.
The smoothed spectrum is obtained from the cepstral coefficients.

Cepstral coefficients
Speech production model (diagram labels): P[n], G(z), V(z), R(z), u[n], Pitch, Av, AN
Homomorphic system for convolution: x1[n]*x2[n] -> D[] -> x1'[n]+x2'[n] -> L[] -> y1'[n]+y2'[n] -> D-1[] -> y1[n]*y2[n]
Characteristic system D[]: x1[n]*x2[n] -> DFT[] -> X1(z)X2(z) -> LOG[] -> log(X1(z)) + log(X2(z)) -> IDFT[] -> x1'[n]+x2'[n]
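The DFT, LOG, IDFT chain above can be sketched in a few lines. This is a simplified illustration, assuming the real cepstrum (log of the spectrum magnitude rather than the complex log) and a naive O(N^2) DFT; all function names are assumptions. It also shows the key property of the chain: convolution in time becomes addition in the cepstral domain.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a list of numbers."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def real_cepstrum(x, floor=1e-12):
    """Real cepstrum: IDFT of the log magnitude of the DFT.

    floor guards against log(0) when a spectral bin is exactly zero.
    """
    log_mag = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [v.real for v in idft(log_mag)]

def circular_convolve(a, b):
    """Circular convolution of two equal-length lists."""
    n = len(a)
    return [sum(a[k] * b[(t - k) % n] for k in range(n)) for t in range(n)]
```

Because the DFT of a circular convolution is the product of the DFTs, `real_cepstrum(circular_convolve(a, b))` matches the element-wise sum of `real_cepstrum(a)` and `real_cepstrum(b)` up to rounding error.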

Speaker Model
F1 = [a1…a10, b1…b10]
F2 = [a1…a10, b1…b10]
…
FN = [a1…a10, b1…b10]

Dynamic Time Warping
The DTW warping path in the n-by-m matrix is the path with the minimum average cumulative cost.
The unmarked area in the figure is the constraint region within which the path is allowed to go.
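A minimal sketch of DTW with a band constraint, under stated assumptions: Euclidean distance between feature vectors, the three standard step directions, a band of half-width `window` around the diagonal as the allowed region, and normalization by n + m. The slides do not specify the local distance, the exact constraint shape, or the normalization, so these choices and the name `dtw_distance` are illustrative.

```python
import math

def dtw_distance(a, b, window=None):
    """Normalized DTW cost between two sequences of feature vectors.

    a, b: lists of equal-dimension vectors (lists of floats).
    window: maximum allowed |i - j| along the path; None means unconstrained.
    Returns the minimum cumulative cost divided by len(a) + len(b).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    if window is None:
        window = max(n, m)
    # The band must be wide enough to reach the final cell (n, m).
    window = max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            # Extend the cheapest of the three predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)
```

With this formulation, a sequence compared against a time-stretched copy of itself gives a cost of zero, and swapping the two arguments gives the same value.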

Results
Distances are normalized with respect to the length of the speech signal.
Intra-speaker distance is less than inter-speaker distance.
The distance matrix is symmetric.

Matlab Implementation

THANK YOU
