LORIA: Irina Illina, Dominique Fohr. Chania Meeting, May 9-10, 2007.
Missing Data: previous approach

Hypothesis: some coefficients of the feature vector are masked by noise.
–Marginalization: replace p(Y|M) by an integration over the unknown clean values.

Approach presented before: y = x + n (additive, since we work in the spectral domain). Two cases:
–If SNR > 0 (x > n), then y/2 < x < y.
–If SNR < 0 (x < n), then 0 < x < y/2.
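Under the additive assumption y = x + n, the two SNR cases pin the unknown clean energy x into an interval derived from the observed y. A minimal sketch (the helper name is hypothetical, not from the slides):

```python
def marginalization_interval(y, snr_positive):
    """Interval for the unknown clean energy x, given observed y = x + n.

    If SNR > 0 (speech dominates, x > n): y/2 < x < y.
    If SNR < 0 (noise dominates, x < n): 0 < x < y/2.
    """
    return (y / 2.0, y) if snr_positive else (0.0, y / 2.0)

lo, hi = marginalization_interval(8.0, snr_positive=True)
```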
WP1: Missing Data, a modified version of the approach presented before.
–Goal: a better approximation of the marginalization interval.
Missing Data: new approach. Choose the integration limits as a function of the mask estimate; the marginalization interval becomes smaller.
Proposed masks

Clean speech spectrum X, noisy speech spectrum Y (time-frequency units X1/Y1 through X8/Y8 shown in the figure). Each time-frequency unit of the mask is a scalar in [0, 1] giving the relative contribution of speech energy to the observed signal. This differs from an SNR-based mask, where each unit gives the probability that the corresponding pixel is missing.
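The "relative contribution of speech energy" per time-frequency unit can be sketched as an oracle mask X/Y, clipped to [0, 1] (a hypothetical illustration, assuming magnitude spectra; not the slides' exact estimator):

```python
import numpy as np

def oracle_mask(clean_spec, noisy_spec, eps=1e-10):
    """Relative contribution of speech energy per time-frequency unit.

    Each entry is X/Y clipped to [0, 1]; unlike an SNR-based mask,
    entries are energy ratios, not missing-pixel probabilities.
    """
    m = clean_spec / np.maximum(noisy_spec, eps)
    return np.clip(m, 0.0, 1.0)

X = np.array([[3.0, 1.0], [0.5, 4.0]])   # clean spectrum (toy values)
Y = np.array([[4.0, 2.0], [2.0, 4.0]])   # noisy spectrum (toy values)
M = oracle_mask(X, Y)
```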
Proposed masks (clusters 1 to 4 shown in the figure). Each cluster k is represented by:
–a mean vector: μ_k = (μ_1, ..., μ_N)
–a diagonal covariance matrix: Σ_k = diag(σ_1, ..., σ_N)

Clusters can be seen as pdfs of the contribution of speech energy in the noisy observed signal. We propose to use these clusters as potential missing-data masks for any noisy input frame.
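The per-cluster mean vector and diagonal covariance can be computed directly from the mask vectors assigned to that cluster; a minimal sketch (the clustering step itself is omitted, and the helper name is an assumption):

```python
import numpy as np

def cluster_stats(mask_vectors):
    """Mean vector and per-band standard deviation for one cluster
    of mask vectors, i.e. mu_k and the sqrt of diag(Sigma_k)."""
    M = np.asarray(mask_vectors, dtype=float)   # shape: (frames, N bands)
    mu_k = M.mean(axis=0)
    sigma_k = M.std(axis=0)
    return mu_k, sigma_k

mu, sigma = cluster_stats([[0.2, 0.8], [0.4, 0.6]])
```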
Missing data: training
–For each mask k, a GMM is trained on the noisy frames Y aligned with M_k.
–An ergodic HMM is then built from these GMMs.
Missing data: recognition. Use the ergodic HMM to find the mask k for each frame:
–each frame y(t) is assigned to one state, hence one mask.
Use μ_k and σ_k of M_k to define the marginalization interval:
–[μ_k - 2σ_k, μ_k + 2σ_k]
Marginalization:
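The interval from the selected mask's statistics, and the Gaussian marginalization over it, can be sketched as follows (hypothetical helper names; the slides' exact marginalization formula was an image and is assumed here to be the integral of a Gaussian output pdf over the interval):

```python
import math

def marginalization_bounds(mu_k, sigma_k):
    """Interval [mu_k - 2*sigma_k, mu_k + 2*sigma_k] from mask statistics."""
    return mu_k - 2.0 * sigma_k, mu_k + 2.0 * sigma_k

def gaussian_marginal(mu, sigma, lo, hi):
    """Integral of N(x; mu, sigma^2) over [lo, hi], via the error function."""
    z = lambda t: (t - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (math.erf(z(hi)) - math.erf(z(lo)))

lo, hi = marginalization_bounds(0.5, 0.1)   # -> (0.3, 0.7)
p = gaussian_marginal(0.0, 1.0, -2.0, 2.0)  # mass within 2 sigma
```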
Missing data: delta coefficients. The formulas presented before apply only to static coefficients. Computation of the delta, as X_i is unknown:
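The slides' delta formulas were images and are lost; as a stand-in, here is the standard HTK-style delta regression over static coefficients (an assumption about which formula was intended):

```python
def delta(frames, win=2):
    """HTK-style delta regression:
    d_t = sum_{k=1..win} k*(c_{t+k} - c_{t-k}) / (2 * sum_k k^2),
    with edge frames replicated."""
    denom = 2.0 * sum(k * k for k in range(1, win + 1))
    n = len(frames)
    out = []
    for t in range(n):
        num = 0.0
        for k in range(1, win + 1):
            c_fwd = frames[min(t + k, n - 1)]   # replicate last frame at edge
            c_bwd = frames[max(t - k, 0)]       # replicate first frame at edge
            num += k * (c_fwd - c_bwd)
        out.append(num / denom)
    return out

d = delta([0.0, 1.0, 2.0, 3.0, 4.0])   # linear ramp: interior deltas are 1.0
```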
Missing Data: experiments

Parameterization: spectral domain, 12 Mel bands + Δ + ΔΔ.

Training:
–HMM models trained on clean Aurora4, adapted with the 50 first clean HIWIRE sentences.
–M_k: trained on noisy HIWIRE (50 first sentences), LN + MN + HN + clean.

Test: noisy HIWIRE (50 last sentences).
Visualisation of the marginalisation intervals on an example (one spectral coefficient of the word "standby"): clean and LN conditions, new method vs. previous method (figure).
Visualisation of the marginalisation intervals on an example: new method vs. previous method, MN and HN conditions (figure).
WER evaluation: new vs. previous method (chart).
WER-based evaluation: comparison of the new method with the ETSI AFE (chart).
Results. Oracle: X/Y -> M_k -> marginalisation. WER (%): previous vs. new method (chart).
New method: the high-noise problem. The true value can lie outside the marginalization interval.
Conclusion. A better approximation of the marginalization interval gives better recognition results, especially in the LN and MN conditions. However, the mask estimation must be improved in the MN and HN conditions.
WP2: non-native speech recognition

Previous work:
–Two sets of models: TIMIT HMM models and native (Fr, It, Gr, Sp) HMM models
–Confusion rules
–Integration of the rules into the HMMs

New study: different sets of models.
Different sets of models:
–TIMIT models (canonical English models)
–Native models, L = {Fr, It, Sp, Gr}
–MLLR-adapted models: TIMIT HMMs adapted on HIWIRE L
–MAP-adapted models: TIMIT HMMs adapted on HIWIRE L
–Re-estimated models: TIMIT HMMs + Baum-Welch iterations using HIWIRE L
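MAP adaptation of a Gaussian mean interpolates the prior (TIMIT) mean with the adaptation-data statistics, weighted by a prior count tau. A single-Gaussian sketch (hypothetical helper; real MAP adaptation also updates variances and mixture weights):

```python
import numpy as np

def map_adapt_mean(prior_mean, data, tau=10.0):
    """MAP update of a Gaussian mean: (tau * mu0 + sum(x)) / (tau + N).

    tau controls how strongly the prior mean resists the new data.
    """
    data = np.asarray(data, dtype=float)
    mu0 = np.asarray(prior_mean, dtype=float)
    return (tau * mu0 + data.sum(axis=0)) / (tau + len(data))

# With 10 adaptation frames of value 1.0 and tau = 10, the mean moves
# halfway from the prior (0.0) toward the data mean (1.0).
mu_new = map_adapt_mean([0.0], np.ones((10, 1)))
```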
Experimental conditions. Adaptation and re-estimation use a leave-one-out cross-validation scheme: all speakers except one for adaptation or re-estimation, and the remaining speaker for testing.
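The leave-one-out scheme above can be sketched as a generator over speaker splits (hypothetical speaker IDs):

```python
def leave_one_out(speakers):
    """Yield (train_set, test_speaker) splits: all speakers except one
    for adaptation/re-estimation, the remaining one for testing."""
    for i, test_spk in enumerate(speakers):
        train = speakers[:i] + speakers[i + 1:]
        yield train, test_spk

splits = list(leave_one_out(["spk1", "spk2", "spk3"]))
```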
Results (table; figures not recoverable). Systems compared: TIMIT HMMs; TIMIT + native models; retraining on HIWIRE; MLLR adaptation with HIWIRE; MAP adaptation with HIWIRE. Grammars: word-loop grammar and HIWIRE grammar.
Results with confusion rules integrated in the HMMs (HIWIRE grammar). Reported WER/SER pairs: 5.3/10.2, 5.8/11.8, 4.8/10.9, 3.5/8.1, 2.8/6.4, 2.8/6.5, 2.1/5.0; baseline: 7.2/14.6 (system-to-figure mapping not recoverable from the source). The best result is obtained with TIMIT HMM models (canonical English) + retrained models.
Results with speaker adaptation. Starting from the best system of the previous slide (confusion rules integrated in the TIMIT HMMs + re-estimation), we add a speaker-adaptation step:
–50 first sentences per speaker for adaptation
–MAP adaptation
–HIWIRE grammar
WER: 1.4%, SER: 3.2%.
Conclusion. Different sets of models have been tested. Baseline results: WER 7.2%, SER 14.6%. The best result is obtained with confusion rules integrated in the TIMIT HMMs + re-estimation + MAP speaker adaptation: WER 1.4%, SER 3.2%.
Example of acoustic model modification for the English phone /t/ (figure). Extracted rules map the English phone /t/ to French phones (e.g. /t/, /k/, and others not recoverable from the source), and the HMM structure of the /t/ model is modified accordingly: the English model gains alternative paths through the corresponding French models.