Speaker Recognition Sharat.S.Chikkerur Center for Unified Biometrics and Sensors

Speaker Recognition Sharat.S.Chikkerur Center for Unified Biometrics and Sensors http://www.cubs.buffalo.edu

Speech Fundamentals
Characterizing speech:
  Content (speech recognition)
  Signal representation (vocoding): waveform; parametric (excitation, vocal tract)
  Signal analysis (gender determination, speaker recognition)
Terminology:
  Phonemes: basic discrete units of speech; language specific. English has around 42 phonemes.
  Types of speech: voiced speech, unvoiced speech (fricatives), plosives
  Formants

Speech production
Figures: speech production mechanism; speech production model
Model blocks: impulse train generator (pitch input), glottal pulse model G(z), vocal tract model V(z), radiation model R(z), noise source
Gains: A_v, A_N
Figure annotation: 17 cm

Nature of speech
Figure: spectrogram

Vocal Tract Modeling
Figure panels: signal spectrum; smoothed signal spectrum
The smoothed spectrum indicates the locations of each speaker's formants
The smoothed spectrum is obtained from the cepstral coefficients

Parametric Representations: Formants
Formant frequencies:
  Characterize the frequency response of the vocal tract
  Used to characterize vowels
  Can be used to determine gender

Parametric Representations: LPC
Linear predictive coefficients:
  Used in vocoding
  Spectral estimation
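The slide names linear predictive coefficients but not how they are computed. As a minimal sketch, assuming the common autocorrelation method solved with the Levinson-Durbin recursion (the function names `autocorrelation`, `levinson_durbin`, and `lpc` are illustrative and not taken from the presentation), the coefficients of the predictor x_hat[n] = a1*x[n-1] + ... + ap*x[n-p] could be estimated like this:

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], with r[k] = sum over n of frame[n] * frame[n + k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, err), where a[k - 1] weights x[n - k] in the predictor and
    err is the remaining prediction error energy. Assumes r[0] > 0.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the coefficients
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc(frame, order):
    """LPC coefficients of one analysis frame (autocorrelation method)."""
    return levinson_durbin(autocorrelation(frame, order), order)


# Illustrative input: a decaying exponential, which a first-order predictor fits.
coeffs, err = lpc([0.5 ** n for n in range(50)], order=2)
```

For this input the first coefficient comes out close to 0.5 and the second close to 0, since each sample is half the previous one. An all-zero frame would divide by zero, so real code would need a guard.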

Parametric Representations: Cepstrum
Speech production model: P[n], G(z), V(z), R(z), u[n]; pitch; gains A_v, A_N
Homomorphic system: D[] -> L[] -> D^-1[]
  x1[n] * x2[n]  ->  x1'[n] + x2'[n]  ->  y1'[n] + y2'[n]  ->  y1[n] * y2[n]
Characteristic system D[]: DFT[] -> LOG[] -> IDFT[]
  x1[n] * x2[n]  ->  X1(z) X2(z)  ->  log(X1(z)) + log(X2(z))  ->  x1'[n] + x2'[n]
(* denotes convolution)
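The DFT -> LOG -> IDFT chain can be sketched in a few lines. This is a minimal illustration, not the presentation's implementation. It assumes the real cepstrum (log of the magnitude spectrum, so phase is discarded) and a naive O(N^2) DFT to stay dependency-free. The helper names and the `floor` parameter are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x, floor=1e-12):
    """IDFT of the log magnitude spectrum; `floor` avoids log(0)."""
    log_mag = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [c.real for c in idft(log_mag)]


def circular_convolve(x1, x2):
    """Circular convolution of two equal-length sequences."""
    n = len(x1)
    return [sum(x1[m] * x2[(t - m) % n] for m in range(n)) for t in range(n)]


# Convolution in time becomes addition in the cepstral domain.
x1 = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x2 = [1.0, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
c_sum = [a + b for a, b in zip(real_cepstrum(x1), real_cepstrum(x2))]
c_conv = real_cepstrum(circular_convolve(x1, x2))
```

Here `c_conv` and `c_sum` agree up to rounding, which is the property the homomorphic system on the slide relies on to separate excitation from vocal tract response.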

Speaker Recognition
Definition:
  The method of recognizing a person based on their voice
  One of the forms of biometric identification
  Depends on speaker-dependent characteristics
Speaker recognition tasks:
  Speaker identification: text dependent, text independent
  Speaker verification: text dependent, text independent
  Speaker detection

Generic Speaker Recognition System
Verification: Preprocessing -> Feature Extraction -> Pattern Matching
Enrollment: Preprocessing -> Feature Extraction -> Speaker Model
Data flow: speech signal -> analysis frames -> feature vector -> score
Preprocessing: A/D conversion, end point detection, pre-emphasis filter, segmentation
Feature extraction: LAR, cepstrum, LPCC, MFCC
Pattern matching: stochastic models (GMM, HMM); template models (DTW, distance measures)
Choice of features:
  Differentiating factors between speakers include vocal tract shape and behavioral traits
  Features should have high inter-speaker and low intra-speaker variation

Our Approach
Stages: Preprocessing -> Feature Extraction -> Speaker model -> Matching
Blocks: silence removal; cepstrum coefficients; cepstral normalization; long time average; polynomial function expansion; dynamic time warping; distance computation; reference template
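The slide lists cepstral normalization next to a long time average without giving details. One common reading, assumed here and not confirmed by the presentation, is cepstral mean subtraction: the average of each cepstral coefficient over the whole utterance is subtracted from every frame. A fixed channel filter adds a constant offset in the cepstral domain, so removing the mean reduces its effect. The function name is illustrative.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient long-time average from every frame.

    `frames` is a list of equal-length cepstral vectors, one per analysis frame.
    """
    if not frames:
        return []
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


normalized = cepstral_mean_normalize([[1.0, 2.0], [3.0, 6.0]])
# The per-coefficient mean is [2.0, 4.0], so the result is [[-1.0, -2.0], [1.0, 2.0]].
```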

Silence Removal
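The transcript keeps only the slide title, so the method used for silence removal is not known. A minimal sketch of one simple option, assuming a short-time energy threshold relative to the loudest frame (the function name, `frame_len`, and `threshold_ratio` are all hypothetical choices):

```python
def remove_silence(signal, frame_len=160, threshold_ratio=0.1):
    """Drop non-overlapping frames whose mean energy is below
    threshold_ratio times the energy of the loudest frame."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    energies = [sum(s * s for s in frame) / len(frame) for frame in frames]
    peak = max(energies, default=0.0)
    kept = []
    for frame, energy in zip(frames, energies):
        if peak > 0 and energy >= threshold_ratio * peak:
            kept.extend(frame)
    return kept


# Tiny illustrative signal: silence, a loud burst, then near-silence.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0, 0.01, 0.0, 0.0, 0.0]
voiced = remove_silence(signal, frame_len=4)
# Only the middle frame survives: [1.0, -1.0, 1.0, -1.0]
```

A relative threshold adapts to recording level, but it can fail on noisy recordings. End point detectors often combine energy with zero-crossing rate.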

Pre-emphasis
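Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] - alpha * x[n-1], which boosts the higher frequencies before analysis. The slide does not give the filter or its coefficient, so the form and the default `alpha=0.95` below are assumptions.

```python
def pre_emphasis(signal, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n - 1]; the first sample passes through."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


# A constant (zero-frequency) input is strongly attenuated after the first sample.
flat = pre_emphasis([1.0, 1.0, 1.0], alpha=0.5)
# flat == [1.0, 0.5, 0.5]
```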

Segmentation
Short-time analysis:
  The speech signal is segmented into overlapping 'analysis frames'
  The speech signal is assumed to be stationary within each frame
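Splitting a signal into overlapping analysis frames can be sketched as follows. The frame length and hop size are not stated on the slide, so `frame_len` and `hop` are illustrative parameters. A hop smaller than the frame length produces the overlap.

```python
def split_into_frames(signal, frame_len, hop):
    """Return overlapping frames; trailing samples that do not fill a
    complete frame are dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
# frames == [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

In practice each frame is usually multiplied by a tapering window (for example a Hamming window) before spectral analysis. That step is omitted here.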

Feature Representation
Figure: speech signal and spectrum of two users uttering 'ONE'

Speaker Model
F_1 = [a1…a10, b1…b10]
F_2 = [a1…a10, b1…b10]
…
F_N = [a1…a10, b1…b10]

Dynamic Time Warping
The DTW warping path in the n-by-m matrix is the path with the minimum average cumulative cost.
The unmarked area of the figure is the constraint region within which the path is allowed to go.
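A minimal sketch of DTW between two sequences of feature vectors. The presentation does not specify the local distance, the step pattern, the shape of the constraint region, or the normalization. The code below assumes Euclidean local distance, the basic three-way step pattern, an optional band around the diagonal (`band`, a hypothetical parameter standing in for the constraint region), and division of the cumulative cost by n + m.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b, band=None):
    """Length-normalized DTW cost between two sequences of feature vectors.

    If `band` is given, cells with |i - j| > band are excluded; it must be at
    least |len(seq_a) - len(seq_b)| or no path exists and the result is inf.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if band is not None and abs(i - j) > band:
                continue  # outside the allowed region
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b
                                     cost[i][j - 1],      # stretch seq_a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


template = [[1.0], [2.0], [3.0]]
slow = [[1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]  # same shape at half speed
shifted = [[2.0], [3.0], [4.0]]
same = dtw_distance(template, slow)        # 0.0: warping absorbs the tempo change
different = dtw_distance(template, shifted)  # positive
```

With this symmetric step pattern and band, swapping the two arguments gives the same distance.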

Results
Distances are normalized with respect to the length of the speech signal
Intra-speaker distance is less than inter-speaker distance
The distance matrix is symmetric

Matlab Implementation

THANK YOU
