LORIA: Irina Illina, Dominique Fohr. Chania Meeting, May 9-10, 2007.
Missing Data: previous approach

Hypothesis: some coefficients of the feature vector are masked by noise.
–Marginalization: replace p(Y|M) by an integration over the unknown clean values.

Approach presented before: y = x + n (additive, since we work in the spectral domain). Two cases:
–If SNR > 0 (x > n), then y/2 < x < y.
–If SNR < 0 (x < n), then 0 < x < y/2.
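Under the additive assumption y = x + n, the two SNR cases pin the unknown clean energy x into an interval derived from the observed y. A minimal sketch (the helper name is hypothetical, not from the slides):

```python
def marginalization_interval(y, snr_positive):
    """Interval for the unknown clean energy x, given observed y = x + n.

    If SNR > 0 (speech dominates, x > n): y/2 < x < y.
    If SNR < 0 (noise dominates, x < n): 0 < x < y/2.
    """
    return (y / 2.0, y) if snr_positive else (0.0, y / 2.0)

lo, hi = marginalization_interval(8.0, snr_positive=True)
```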
WP1: Missing Data, a modified version of the approach presented before.
–Goal: a better approximation of the marginalization interval.
Missing Data: new approach. Choose the integration limits as a function of the mask estimate; the marginalization interval becomes smaller.
Proposed masks

Clean speech spectrum X, noisy speech spectrum Y (time-frequency units X1/Y1 through X8/Y8 shown in the figure). Each time-frequency unit of the mask is a scalar in [0, 1] giving the relative contribution of speech energy to the observed signal. This differs from an SNR-based mask, where each unit gives the probability that the corresponding pixel is missing.
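The "relative contribution of speech energy" per time-frequency unit can be sketched as an oracle mask X/Y, clipped to [0, 1] (a hypothetical illustration, assuming magnitude spectra; not the slides' exact estimator):

```python
import numpy as np

def oracle_mask(clean_spec, noisy_spec, eps=1e-10):
    """Relative contribution of speech energy per time-frequency unit.

    Each entry is X/Y clipped to [0, 1]; unlike an SNR-based mask,
    entries are energy ratios, not missing-pixel probabilities.
    """
    m = clean_spec / np.maximum(noisy_spec, eps)
    return np.clip(m, 0.0, 1.0)

X = np.array([[3.0, 1.0], [0.5, 4.0]])   # clean spectrum (toy values)
Y = np.array([[4.0, 2.0], [2.0, 4.0]])   # noisy spectrum (toy values)
M = oracle_mask(X, Y)
```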
Proposed masks (clusters 1 to 4 shown in the figure). Each cluster k is represented by:
–a mean vector: μ_k = (μ_1, ..., μ_N)
–a diagonal covariance matrix: Σ_k = diag(σ_1, ..., σ_N)

Clusters can be seen as pdfs of the contribution of speech energy in the noisy observed signal. We propose to use these clusters as potential missing-data masks for any noisy input frame.
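The per-cluster mean vector and diagonal covariance can be computed directly from the mask vectors assigned to that cluster; a minimal sketch (the clustering step itself is omitted, and the helper name is an assumption):

```python
import numpy as np

def cluster_stats(mask_vectors):
    """Mean vector and per-band standard deviation for one cluster
    of mask vectors, i.e. mu_k and the sqrt of diag(Sigma_k)."""
    M = np.asarray(mask_vectors, dtype=float)   # shape: (frames, N bands)
    mu_k = M.mean(axis=0)
    sigma_k = M.std(axis=0)
    return mu_k, sigma_k

mu, sigma = cluster_stats([[0.2, 0.8], [0.4, 0.6]])
```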
Missing data: training
–For each mask k, a GMM is trained on the noisy frames Y aligned with M_k.
–An ergodic HMM is then built from these GMMs.
Missing data: recognition. Use the ergodic HMM to find the mask k for each frame:
–each frame y(t) is assigned to one state, hence one mask.
Use μ_k and σ_k of M_k to define the marginalization interval:
–[μ_k - 2σ_k, μ_k + 2σ_k]
Marginalization:
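The interval from the selected mask's statistics, and the Gaussian marginalization over it, can be sketched as follows (hypothetical helper names; the slides' exact marginalization formula was an image and is assumed here to be the integral of a Gaussian output pdf over the interval):

```python
import math

def marginalization_bounds(mu_k, sigma_k):
    """Interval [mu_k - 2*sigma_k, mu_k + 2*sigma_k] from mask statistics."""
    return mu_k - 2.0 * sigma_k, mu_k + 2.0 * sigma_k

def gaussian_marginal(mu, sigma, lo, hi):
    """Integral of N(x; mu, sigma^2) over [lo, hi], via the error function."""
    z = lambda t: (t - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (math.erf(z(hi)) - math.erf(z(lo)))

lo, hi = marginalization_bounds(0.5, 0.1)   # -> (0.3, 0.7)
p = gaussian_marginal(0.0, 1.0, -2.0, 2.0)  # mass within 2 sigma
```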
Missing data: delta coefficients. The formulas presented before apply only to static coefficients. Computation of the delta, as X_i is unknown:
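The slides' delta formulas were images and are lost; as a stand-in, here is the standard HTK-style delta regression over static coefficients (an assumption about which formula was intended):

```python
def delta(frames, win=2):
    """HTK-style delta regression:
    d_t = sum_{k=1..win} k*(c_{t+k} - c_{t-k}) / (2 * sum_k k^2),
    with edge frames replicated."""
    denom = 2.0 * sum(k * k for k in range(1, win + 1))
    n = len(frames)
    out = []
    for t in range(n):
        num = 0.0
        for k in range(1, win + 1):
            c_fwd = frames[min(t + k, n - 1)]   # replicate last frame at edge
            c_bwd = frames[max(t - k, 0)]       # replicate first frame at edge
            num += k * (c_fwd - c_bwd)
        out.append(num / denom)
    return out

d = delta([0.0, 1.0, 2.0, 3.0, 4.0])   # linear ramp: interior deltas are 1.0
```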
Missing Data: experiments

Parameterization: spectral domain, 12 Mel bands + Δ + ΔΔ.

Training:
–HMM models trained on clean Aurora4, adapted with the 50 first clean HIWIRE sentences.
–M_k: trained on noisy HIWIRE (50 first sentences), LN + MN + HN + clean.

Test: noisy HIWIRE (50 last sentences).
Visualisation of the marginalisation intervals on an example (one spectral coefficient of the word "standby"): clean and LN conditions, new method vs. previous method (figure).
Visualisation of the marginalisation intervals on an example: new method vs. previous method, MN and HN conditions (figure).
WER evaluation: new vs. previous method (chart).
WER-based evaluation: comparison of the new method with the ETSI AFE (chart).
Results. Oracle: X/Y -> M_k -> marginalisation. WER (%): previous vs. new method (chart).
New method: the high-noise problem. The true value can lie outside the marginalization interval.
Conclusion. A better approximation of the marginalization interval gives better recognition results, especially in the LN and MN conditions. However, the mask estimation must be improved in the MN and HN conditions.
WP2: non-native speech recognition

Previous work:
–Two sets of models: TIMIT HMM models and native (Fr, It, Gr, Sp) HMM models
–Confusion rules
–Integration of the rules into the HMMs

New study: different sets of models.
Different sets of models:
–TIMIT models (canonical English models)
–Native models, L = {Fr, It, Sp, Gr}
–MLLR-adapted models: TIMIT HMMs adapted on HIWIRE L
–MAP-adapted models: TIMIT HMMs adapted on HIWIRE L
–Re-estimated models: TIMIT HMMs + Baum-Welch iterations using HIWIRE L
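MAP adaptation of a Gaussian mean interpolates the prior (TIMIT) mean with the adaptation-data statistics, weighted by a prior count tau. A single-Gaussian sketch (hypothetical helper; real MAP adaptation also updates variances and mixture weights):

```python
import numpy as np

def map_adapt_mean(prior_mean, data, tau=10.0):
    """MAP update of a Gaussian mean: (tau * mu0 + sum(x)) / (tau + N).

    tau controls how strongly the prior mean resists the new data.
    """
    data = np.asarray(data, dtype=float)
    mu0 = np.asarray(prior_mean, dtype=float)
    return (tau * mu0 + data.sum(axis=0)) / (tau + len(data))

# With 10 adaptation frames of value 1.0 and tau = 10, the mean moves
# halfway from the prior (0.0) toward the data mean (1.0).
mu_new = map_adapt_mean([0.0], np.ones((10, 1)))
```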
Experimental conditions. Adaptation and re-estimation use a leave-one-out cross-validation scheme: all speakers except one for adaptation or re-estimation, and the remaining speaker for testing.
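The leave-one-out scheme above can be sketched as a generator over speaker splits (hypothetical speaker IDs):

```python
def leave_one_out(speakers):
    """Yield (train_set, test_speaker) splits: all speakers except one
    for adaptation/re-estimation, the remaining one for testing."""
    for i, test_spk in enumerate(speakers):
        train = speakers[:i] + speakers[i + 1:]
        yield train, test_spk

splits = list(leave_one_out(["spk1", "spk2", "spk3"]))
```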
Results (table; figures not recoverable). Systems compared: TIMIT HMMs; TIMIT + native models; retraining on HIWIRE; MLLR adaptation with HIWIRE; MAP adaptation with HIWIRE. Grammars: word-loop grammar and HIWIRE grammar.
Results with confusion rules integrated in the HMMs (HIWIRE grammar). Reported WER/SER pairs: 5.3/10.2, 5.8/11.8, 4.8/10.9, 3.5/8.1, 2.8/6.4, 2.8/6.5, 2.1/5.0; baseline: 7.2/14.6 (system-to-figure mapping not recoverable from the source). The best result is obtained with TIMIT HMM models (canonical English) + retrained models.
Results with speaker adaptation. Starting from the best system of the previous slide (confusion rules integrated in the TIMIT HMMs + re-estimation), we add a speaker-adaptation step:
–50 first sentences per speaker for adaptation
–MAP adaptation
–HIWIRE grammar
WER: 1.4%, SER: 3.2%.
Conclusion. Different sets of models have been tested. Baseline results: WER 7.2%, SER 14.6%. The best result is obtained with confusion rules integrated in the TIMIT HMMs + re-estimation + MAP speaker adaptation: WER 1.4%, SER 3.2%.
Example of acoustic model modification for the English phone /t/ (figure). Extracted rules map the English phone /t/ to French phones (e.g. /t/, /k/, and others not recoverable from the source), and the HMM structure of the /t/ model is modified accordingly: the English model gains alternative paths through the corresponding French models.