Robust Speech Features: Decorrelated and Liftered Filter-Bank Energies (DLFBE), proposed by K. K. Paliwal in EuroSpeech '99
DLFBE --- Preliminary
* MFCC is very successful in speech recognition.
* MFCCs are computed from the speech signal in three steps (as sketched below):
  1. Compute the FFT power spectrum of the speech signal.
  2. Apply a mel-spaced filter-bank to the power spectrum to obtain N filter-bank energies (N = 20~60).
  3. Compute the discrete cosine transform (DCT) of the log filter-bank energies to obtain (approximately) uncorrelated MFCCs (M = 10).
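As a rough illustration of these three steps, here is a minimal Python sketch. The numpy/scipy calls, the hypothetical mel_fbank argument (a pre-built matrix of mel-spaced triangular filters), and the FFT size are assumptions for illustration, not details taken from the slides.

```python
import numpy as np
from scipy.fftpack import dct  # type-II DCT for the cepstral transform

def mfcc_frame(frame, mel_fbank, n_ceps=10):
    """Compute MFCCs for one windowed speech frame.

    frame     : 1-D array of speech samples
    mel_fbank : (N, n_fft // 2 + 1) matrix of mel-spaced triangular filters
                (N = 20..60 on the slide; building it is omitted here)
    n_ceps    : number of cepstral coefficients to keep (M = 10 on the slide)
    """
    n_fft = 2 * (mel_fbank.shape[1] - 1)
    # 1. FFT power spectrum of the speech frame
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    # 2. Mel-spaced filter-bank energies, then their logarithm
    log_fbe = np.log(mel_fbank @ power + 1e-10)
    # 3. DCT of the log filter-bank energies -> (approximately) uncorrelated MFCCs
    return dct(log_fbe, type=2, norm='ortho')[:n_ceps]
```

The DCT in step 3 is what (approximately) decorrelates the log energies; DLFBE instead keeps the filter-bank energies themselves and decorrelates them by other means, as discussed below.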
DLFBE --- Motivation
* MFCC has two drawbacks:
  1. It does not have any physical interpretation.
  2. Liftering of the cepstral coefficients has no effect in modern speech recognizers (discussed later).
* The two problems (i.e., the number of coefficients and their correlation) of the filter-bank energies (FBEs) used in the 50s, 60s, and 70s can be solved today.
Liftering --- What and How
* A lifter is a reweighting of the cepstral coefficients, used in the DTW framework of speech recognition, where the dissimilarity between the test vector and the reference vector is the Euclidean distance
$d = \sum_{i=1}^{M} \left(c_i^{(test)} - c_i^{(ref)}\right)^2$
Liftering --- What and How (cont'd)
The liftered coefficients are $\hat{c}_i = w_i\, c_i$, where $c_i$ is the i-th cepstral coefficient, $w_i$ is the corresponding liftering coefficient, and $\{w_i\}_{i=1}^{M}$ is the lifter. So the distance becomes
$d = \sum_{i=1}^{M} w_i^2 \left(c_i^{(test)} - c_i^{(ref)}\right)^2,$
and in a more general form
$d = \left(\mathbf{c}^{(test)} - \mathbf{c}^{(ref)}\right)^{T} W \left(\mathbf{c}^{(test)} - \mathbf{c}^{(ref)}\right)$
with a weighting matrix $W$.
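A minimal sketch of this weighted (liftered) distance, assuming the lifter is supplied as a vector w of length M; the names are illustrative.

```python
import numpy as np

def liftered_distance(c_test, c_ref, w):
    """Weighted Euclidean distance between two cepstral vectors,
    d = sum_i w_i^2 * (c_test_i - c_ref_i)^2, as used in DTW matching."""
    diff = np.asarray(w) * (np.asarray(c_test) - np.asarray(c_ref))
    return float(np.dot(diff, diff))
```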
Liftering --- What and How (cont’d)
The types of lifters are listed below (commonly used forms are sketched after the list):
  1. Linear lifter
  2. Statistical lifter
  3. Sinusoidal lifter
  4. Exponential lifter
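Commonly used forms for these lifter families could look roughly as follows. The raised-sine (sinusoidal) and exponential formulas, the inverse-standard-deviation form of the statistical lifter, and the constants L, s, and tau are assumptions based on standard practice, not formulas taken from the slides.

```python
import numpy as np

def linear_lifter(M):
    """Type 1: weight grows linearly with the coefficient index."""
    return np.arange(1, M + 1, dtype=float)

def statistical_lifter(cepstra):
    """Type 2 (assumed form): weight each coefficient by the inverse of its
    standard deviation estimated over training data (cepstra: frames x M)."""
    return 1.0 / (np.std(cepstra, axis=0) + 1e-10)

def sinusoidal_lifter(M, L=22):
    """Type 3: raised-sine lifter, w_i = 1 + (L/2) * sin(pi * i / L)."""
    i = np.arange(1, M + 1)
    return 1.0 + (L / 2.0) * np.sin(np.pi * i / L)

def exponential_lifter(M, s=1.5, tau=5.0):
    """Type 4 (assumed form): w_i = i^s * exp(-i^2 / (2 * tau^2))."""
    i = np.arange(1, M + 1, dtype=float)
    return i ** s * np.exp(-i ** 2 / (2.0 * tau ** 2))
```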
Liftering --- Discussion and Why
* Multiplicative weighting in the cepstral domain is equivalent to convolution in the spectral domain.
  - Types 1 and 2: emphasize the higher cepstral coefficients; act as a high-pass (HP) filter in the spectral domain.
  - Types 3 and 4: de-emphasize the lower and higher cepstral coefficients; act as a band-pass (BP) filter in the spectral domain.
Liftering --- Experiment on DTW
Liftering on CDHMM (??) --- Why
In a CDHMM the observation probability is Gaussian, so the relevant distance in its exponent is the Mahalanobis distance
$d = (\mathbf{c} - \boldsymbol{\mu})^{T}\, \Sigma^{-1}\, (\mathbf{c} - \boldsymbol{\mu}).$
Liftering on CDHMM (??) --- Why
Write liftering of the MFCCs as a liftering matrix $W = \mathrm{diag}(w_1, \ldots, w_M)$, so that $\hat{\mathbf{c}} = W\mathbf{c}$; the Gaussian mean and covariance of the liftered features become $W\boldsymbol{\mu}$ and $W \Sigma W^{T}$.
Liftering on CDHMM (??) --- Why
$(W\mathbf{c} - W\boldsymbol{\mu})^{T} (W \Sigma W^{T})^{-1} (W\mathbf{c} - W\boldsymbol{\mu}) = (\mathbf{c} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{c} - \boldsymbol{\mu})$
Thus cepstral liftering has no effect on the recognition process when used with continuous-observation Gaussian-density HMMs.
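A quick numerical check of this invariance (a sketch; the dimension, the random model parameters, and the particular diagonal lifter W are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10
c = rng.normal(size=M)                          # an observed MFCC vector
mu = rng.normal(size=M)                         # Gaussian mean
A = rng.normal(size=(M, M))
Sigma = A @ A.T + M * np.eye(M)                 # a valid covariance matrix
W = np.diag(np.arange(1.0, M + 1.0))            # an arbitrary invertible lifter

def mahalanobis(x, m, S):
    d = x - m
    return float(d @ np.linalg.solve(S, d))

# Liftering the features AND the model parameters consistently changes nothing:
print(np.isclose(mahalanobis(c, mu, Sigma),
                 mahalanobis(W @ c, W @ mu, W @ Sigma @ W.T)))   # True
```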
Decorrelation of FBE --- Why/How
* FBEs are correlated, so they cannot be used directly with CDHMMs.
* We can use linear prediction (LP) techniques to remove this defect: the decorrelated FBEs can be obtained with the covariance method of LP analysis (a sketch follows).
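A minimal sketch of covariance-method LP analysis and its prediction residual. Whether the prediction runs across the filter-bank channels of one frame or along time, and the predictor order p, are not spelled out here and are treated as assumptions; the helper names are hypothetical.

```python
import numpy as np

def lp_covariance(x, p):
    """Covariance-method LP analysis: returns a[1..p] minimizing
    sum_n (x[n] - sum_k a[k] * x[n - k])^2 over n = p .. len(x) - 1."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    phi = np.empty((p + 1, p + 1))
    for i in range(p + 1):
        for k in range(p + 1):
            phi[i, k] = np.dot(x[p - i:N - i], x[p - k:N - k])
    return np.linalg.solve(phi[1:, 1:], phi[1:, 0])

def lp_residual(x, p=2):
    """Prediction residual of x; applied to the log FBEs of one frame,
    this yields (approximately) decorrelated coefficients."""
    x = np.asarray(x, dtype=float)
    a = lp_covariance(x, p)
    e = x.copy()
    for n in range(p, len(x)):
        e[n] = x[n] - np.dot(a, x[n - p:n][::-1])  # a[0] is the lag-1 coefficient
    return e
```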
Liftering of FBE --- How
Liftering is applied to the FBEs as an FIR filter, with N = M + L.
DLFBE --- Experiment
* Speaker-independent (SI), isolated-word recognition using the ISOLET spoken-letter database.
* 90 training utterances from 90 speakers (45 female, 45 male); 30 test utterances from 30 speakers (15 female, 15 male).
DLFBE --- Experiment (cont’d)
Robust Speech Features: Noise-Invariant Representation for Speech Signals, the Group Delay Function (GDF) Method, proposed by Bayya & Yegnanarayana in EuroSpeech '99
GDF --- Motivation
* Background noise is a prominent source of mismatch; it is handled, roughly, by the following methods:
  1. Compensation (can cause over- and under-estimation side effects):
     - Pre-processing: SS (spectral subtraction), HP/BP filtering, FN (feature normalization)
     - Model adaptation: parameter transformation
GDF --- Motivation (cont'd)
  2. New features, e.g., LPCMEL, PLP (projection concept): not completely noise-resistant.
* All of the above use power/amplitude as the speech feature. Why not use phase information as a feature? Phase information may be helpful in speech recognition.
GDF --- What/How
* The group delay function is defined as the negative derivative of the Fourier-transform phase:
$\tau(\omega) = -\dfrac{d\theta(\omega)}{d\omega}$ (#.1)
Here it is computed from $r(n)$, the normalized autocorrelation of a short segment of the signal.
GDF --- What/How (cont'd)
* The GDF can also be computed directly from the DFT:
$\tau(\omega) = \dfrac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{|X(\omega)|^{2}}$ (#.2)
where $X(\omega)$ is the Fourier transform of the segment and $Y(\omega)$ is the Fourier transform of $n\,x(n)$. Comparing (#.1) and (#.2): the GDF can be obtained without explicit phase unwrapping.
The truncated version of the GDF is easy to implement.
GDF --- What/How (cont'd)
where $w(n)$ is a Hanning window (a computational sketch follows).
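A sketch of the standard FFT-based group-delay computation of (#.2), applied here to the windowed normalized autocorrelation of a short frame. The FFT size, the small regularization constant, and the use of the Hanning window as a taper on the autocorrelation are assumptions, not values from the paper.

```python
import numpy as np

def group_delay(x, n_fft=512, eps=1e-8):
    """tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, with Y the DFT of n*x[n];
    equivalent to -d(theta)/d(omega) but needs no phase unwrapping."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

def gdf_of_frame(frame):
    """Normalized autocorrelation of one short frame, tapered by a Hanning
    window, then its group delay (a rough stand-in for the truncated GDF)."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    r = r / (r[0] + 1e-10)                 # normalized autocorrelation
    return group_delay(r * np.hanning(len(r)))
```

In the experiment setup on the next slide, such autocorrelation sequences are averaged over 20 consecutive 5 ms frames before the GDF is computed.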
GDF --- Why & Experiment
* Frame length = 5 ms, frame rate = 1 ms; the modified autocorrelation sequence is averaged over 20 frames, and the GDF is then computed as defined above.
GDF --- Why & Experiment (cont’d)
GDF --- Experiment
* Isolated-digit recognition:
        Clean    Noisy
  SI    97%      95%      yes
  SD    96.5%    94.5%    no
  Due to the large dynamic range?