Published by Gary Fowler. Modified over 9 years ago.
2
Speaker Recognition
Sharat S. Chikkerur
Center for Unified Biometrics and Sensors
http://www.cubs.buffalo.edu
3
Speech Fundamentals
Characterizing speech
  Content (speech recognition)
  Signal representation (vocoding)
    Waveform
    Parametric (excitation, vocal tract)
  Signal analysis (gender determination, speaker recognition)
Terminology
  Phonemes: basic discrete units of speech. They are language specific; English has around 42 phonemes.
  Types of speech
    Voiced speech
    Unvoiced speech (fricatives)
    Plosives
  Formants
4
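Speech production
[Figure: speech production mechanism; label: 17 cm]
[Figure: speech production model. Voiced branch: impulse train generator (driven by the pitch), glottal pulse model G(z), gain A_V. Unvoiced branch: noise source, gain A_N. The excitation passes through the vocal tract model V(z) and the radiation model R(z).]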
5
Nature of speech
[Figure: spectrogram]
6
Vocal tract modeling
[Figure panels: signal spectrum; smoothed signal spectrum]
The smoothed spectrum indicates the locations of each user's formants.
The smoothed spectrum is obtained from the cepstral coefficients.
7
Parametric Representations: Formants
Formant frequencies
  Characterize the frequency response of the vocal tract
  Used to characterize vowels
  Can be used to determine gender
8
Parametric Representations: LPC
Linear predictive coefficients
  Used in vocoding
  Used in spectral estimation
[Figure: plot labels 5, 2, 20, 40, 200]
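The slide names linear predictive coefficients without saying how they are computed. The sketch below uses one common route, the autocorrelation method with the Levinson-Durbin recursion; that choice, and the function names `autocorr` and `levinson_durbin`, are assumptions made for illustration rather than the presenter's implementation.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, err), where the predictor is
    x_hat[n] = a[0]*x[n-1] + ... + a[order-1]*x[n-order]
    and err is the final prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        err *= (1.0 - k * k)
    return a, err
```

For an autocorrelation sequence that decays as 0.5 per lag (a first-order process), the recursion recovers a first coefficient of 0.5 and a second coefficient of 0.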
9
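Parametric Representations: Cepstrum
[Figure: speech production model. P[n] (pitch) with gain A_V through G(z), or u[n] with gain A_N, then V(z) and R(z)]
Homomorphic system: D[] -> L[] -> D^-1[] (* denotes convolution)
  Input: x1[n] * x2[n]
  After D[]: x1'[n] + x2'[n]
  After L[]: y1'[n] + y2'[n]
  After D^-1[]: y1[n] * y2[n]
Characteristic system D[]: DFT[] -> LOG[] -> IDFT[]
  Input: x1[n] * x2[n]
  After DFT[]: X1(z) X2(z)
  After LOG[]: log(X1(z)) + log(X2(z))
  After IDFT[]: x1'[n] + x2'[n]
[Figure: plot labels 5, 10, 40]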
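The DFT, LOG, IDFT chain can be sketched in a few lines. This is a minimal illustration under some assumptions: it computes the real cepstrum (log of the magnitude only, so phase is discarded), it uses a naive O(N^2) DFT to stay dependency-free, and the helper names and the `floor` parameter are hypothetical. `smoothed_log_spectrum` shows how keeping only the low-order cepstral coefficients yields a smoothed log spectrum.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x, floor=1e-12):
    """IDFT of the log magnitude spectrum; floor avoids log(0)."""
    log_mag = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [c.real for c in idft(log_mag)]


def smoothed_log_spectrum(x, n_keep):
    """Zero all but the n_keep lowest-order coefficients (and their
    symmetric counterparts), then transform back to the log spectrum."""
    c = real_cepstrum(x)
    n = len(c)
    liftered = [c[q] if (q <= n_keep or q >= n - n_keep) else 0.0 for q in range(n)]
    return [v.real for v in dft(liftered)]
```

Because the log turns a product of spectra into a sum, the cepstrum of a (circular) convolution of two sequences equals the sum of their individual cepstra, which is the property the slide's diagram expresses.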
10
Speaker Recognition
Definition
  It is the method of recognizing a person based on his or her voice
  It is one form of biometric identification
  It depends on speaker-dependent characteristics
[Diagram: Speaker Recognition divides into Speaker Identification (text dependent or text independent), Speaker Verification (text dependent or text independent), and Speaker Detection]
11
Generic Speaker Recognition System
[Diagram: Verification: Preprocessing -> Feature Extraction -> Pattern Matching. Enrollment: Preprocessing -> Feature Extraction -> Speaker Model.]
Preprocessing (speech signal to analysis frames)
  A/D conversion
  End point detection
  Pre-emphasis filter
  Segmentation
Feature extraction (analysis frames to feature vector)
  LAR
  Cepstrum
  LPCC
  MFCC
Pattern matching (feature vector to score)
  Stochastic models: GMM, HMM
  Template models: DTW, distance measures
Choice of features
  Differentiating factors between speakers include vocal tract shape and behavioral traits
  Features should have high inter-speaker and low intra-speaker variation
12
Our Approach
Preprocessing: silence removal
Feature extraction: cepstrum coefficients, cepstral normalization, long-time average, polynomial function expansion
Speaker model: reference template
Matching: dynamic time warping, distance computation
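The slide lists cepstral normalization next to a long-time average but gives no formula. One common reading, assumed here, is cepstral mean normalization: subtract the per-coefficient average over all frames of the utterance from every frame. The function name `cepstral_mean_normalize` is hypothetical.

```python
def cepstral_mean_normalize(frames):
    """Subtract the long-time (utterance-level) mean of each cepstral
    coefficient from every frame.

    frames: list of equal-length feature vectors, one per analysis frame.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]
```

After this step each coefficient has zero mean across the utterance, which removes a constant offset shared by all frames.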
13
Silence Removal
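The slide does not state how silence is detected. A simple approach, assumed here purely for illustration, is to drop fixed-length blocks whose short-time energy falls below a fraction of the loudest block's energy. The names `remove_silence`, `frame_len` and `threshold_ratio`, and their default values, are hypothetical.

```python
def remove_silence(signal, frame_len=160, threshold_ratio=0.1):
    """Keep only the blocks whose mean energy exceeds
    threshold_ratio * (energy of the loudest block).

    A trailing partial block shorter than frame_len is discarded.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    energies = [sum(s * s for s in f) / frame_len for f in frames]
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    kept = []
    for frame, energy in zip(frames, energies):
        if energy > threshold:
            kept.extend(frame)
    return kept
```

A relative threshold adapts to the recording level, but it will also discard quiet speech in a recording that contains one very loud segment, so the ratio usually needs tuning.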
14
Pre-emphasis
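Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] - alpha * x[n-1]. The sketch below assumes that form; the coefficient value 0.95 is a typical choice, not one taken from the slides.

```python
def pre_emphasis(signal, alpha=0.95):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    The first sample is passed through unchanged.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out
```

The filter boosts high frequencies relative to low ones, which compensates for the spectral tilt of voiced speech before feature extraction.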
15
Segmentation
Short-time analysis
  The speech signal is segmented into overlapping "analysis frames"
  The speech signal is assumed to be stationary within each frame
[Figure: frames labelled Q31, Q32, Q33, Q34]
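Splitting a signal into overlapping analysis frames can be sketched as follows. The slides give neither the frame length nor the overlap, so `frame_len` and `hop` are left as parameters; the function name is hypothetical.

```python
def frame_signal(signal, frame_len, hop):
    """Split signal into frames of frame_len samples, starting a new
    frame every hop samples. Frames overlap when hop < frame_len.

    Samples at the end that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

For example, `frame_signal(samples, 240, 80)` would give 240-sample frames in which consecutive frames share 160 samples.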
16
Feature Representation
[Figure: speech signal and spectrum of two users uttering "ONE"]
17
Speaker Model
F1 = [a1…a10, b1…b10]
F2 = [a1…a10, b1…b10]
…
FN = [a1…a10, b1…b10]
18
Dynamic Time Warping
The DTW warping path in the n-by-m matrix is the path with the minimum average cumulative cost.
The unmarked area is the constraint region within which the path is allowed to go.
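A minimal sketch of DTW between two sequences of feature vectors follows. Several details are assumptions, since the slide does not specify them: the local cost is Euclidean distance, the allowed steps are the three basic ones (horizontal, vertical, diagonal), the path constraint is a simple band around the diagonal, and the cumulative cost is normalized by n + m. The names `euclidean`, `dtw_distance` and `band` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b, band=None):
    """Length-normalized DTW cost between two sequences of vectors.

    band: optional half-width of the allowed region around the diagonal;
    it is widened to |n - m| if needed so the end point stays reachable.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    if band is not None:
        band = max(band, abs(n - m))
    inf = float("inf")
    # cost[i][j]: minimum cumulative cost aligning the first i and j frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = 1, m
        if band is not None:
            lo, hi = max(1, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)
```

With a symmetric local cost and symmetric step pattern, `dtw_distance(a, b)` equals `dtw_distance(b, a)`, and a sequence compared with a time-stretched copy of itself scores 0.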
19
Results
Distances are normalized with respect to the length of the speech signal
Intra-speaker distance is less than inter-speaker distance
The distance matrix is symmetric
20
Matlab Implementation
21
THANK YOU