LORIA Irina Illina Dominique Fohr Christophe Cerisara Torino Meeting March 9-10, 2006.



2 LORIA Irina Illina Dominique Fohr Christophe Cerisara Torino Meeting March 9-10, 2006

3 HIWIRE
Work package 1:
–Missing Data
Work package 2:
–Non Native Speech Recognition

4 WP1 : Missing Data
New approach for noisy speech recognition
Two steps :
–Training of mask models
–Recognition with masks

5 Missing Data : Training of Mask Models
Computation of mask vectors (« oracle ») for each frame
–Spectrum with cube-root compression
–Spectrum for clean data and noisy data
–For each frequency band f (1..12): if SNR > 0 dB then mask(f)=0 else mask(f)=1
Clustering of mask vectors
–Euclidean distance
–N clusters are used (N=31): each element of a cluster is represented by a mask vector and the corresponding frame vector (MFCC)
Training of one GMM per cluster
–Observations: observation vectors associated with frames (MFCC+D)
Training of one ergodic HMM (N states)
–Each state is one of the previous GMMs
–Only state transition probabilities are trained
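
The oracle mask rule and the Euclidean cluster assignment on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-band energies are given in the linear power domain, that the noise energy can be estimated as noisy minus clean, and the function names `oracle_mask` and `nearest_cluster` are hypothetical.

```python
import math

def oracle_mask(clean_energy, noisy_energy, eps=1e-10):
    """Return one mask value per frequency band for a single frame.

    Follows the slide's convention: mask(f)=0 if SNR > 0 dB, else mask(f)=1.
    Assumption: inputs are linear-domain band energies, and the noise
    energy is approximated as (noisy - clean), floored at eps.
    """
    mask = []
    for c, y in zip(clean_energy, noisy_energy):
        noise = max(y - c, eps)
        snr_db = 10.0 * math.log10(max(c, eps) / noise)
        mask.append(0 if snr_db > 0.0 else 1)
    return mask

def nearest_cluster(mask, centroids):
    """Index of the centroid closest to the mask vector (Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sq_dist(mask, centroids[k]))
```

With 12 bands per frame and N=31 centroids, `nearest_cluster` would assign each oracle mask to the cluster whose GMM is later trained on the matching MFCC vectors.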

6 Missing Data : Recognition with Masks
Compute mask vector for each frame
–MFCC coefficients
–Viterbi alignment using ergodic HMM
–Each frame -> one state -> mask
Perform marginalization with the masked frames
–Spectrum with cube-root compression
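
The "each frame -> one state -> mask" step can be sketched as a log-domain Viterbi pass over the ergodic HMM, followed by a lookup of the mask attached to each decoded state. This is only an illustration under stated assumptions: the per-frame log-likelihoods `log_emis[t][s]` are taken as already computed by the cluster GMMs, and the names `viterbi`, `masks_for_path` and `cluster_masks` are hypothetical.

```python
def viterbi(log_init, log_trans, log_emis):
    """Most likely state sequence for a non-empty list of frames.

    log_init[s]     : log prior of state s
    log_trans[p][s] : log transition probability from state p to state s
    log_emis[t][s]  : log-likelihood of frame t under the GMM of state s
    """
    n_states = len(log_init)
    delta = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emis)):
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            best = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            ptr.append(best)
            new_delta.append(delta[best] + log_trans[best][s] + log_emis[t][s])
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def masks_for_path(path, cluster_masks):
    """Map each decoded state to the mask vector of its cluster."""
    return [cluster_masks[s] for s in path]
```

The resulting per-frame masks would then drive the marginalization in the recognizer, which is not shown here.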

7 Missing Data: Experiments
Training
–Aurora2
–4 noises (test A), 4 SNRs (5, 10, 15, 20 dB)
Test
–Aurora2
–Test A and B

8 Missing Data: Experiments
             Baseline (multi-style)    Missing Data
             Test A     Test B         Test A     Test B
clean        98.6       98.5           96.0
SNR 20       97.5       97.2           94.3       93.6
SNR 15       96.6       95.8           92.6       91.8
SNR 10       94.3       93.1           88.5       87.0
SNR 5        89.0       84.7           77.8       75.6
Average      95.2       93.8           89.8       88.8

9 WP2 : Non Native Speech Recognition
Method based on phone confusion
–Presented in Granada meeting
–Extract confusion rules between English phones and native acoustic models
–English phone -> French phone
–ah -> a
–ah -> 
Method based on graphemic constraint
–Presented in Athens meeting
–Phone pronunciation depends on word grapheme
–English phone [grapheme] -> French phone
–ah [A] -> a (Approach)
–ah [E] ->   (cancEl)

10 Non Native Speech Recognition : Method based on Graphemic Constraint
Idea :
–Example 1 : APPROACH /ah p r ow ch/
 APPROACH (A, ah) (PP, p) (R, r) (OA, ow) (CH, ch)
–Example 2 : POSITION /p ah z ih sh ah n/
 POSITION (P, p) (O, ah) (S, z) (I, ih) (TI, sh) (O, ah) (N, n)
Alignment between graphemes and phones for each word of the lexicon
–Using discrete HMM
–Each state of HMM is a phone symbol
Lexicon modification: add graphemes for each word (as in examples 1, 2)
Confusion rules extraction
–(grapheme, English phone) → list of non native phones
–Example: (A, ah) → a
Confusion rules integration in acoustic models
Recognition
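
The rule extraction step, (grapheme, English phone) → list of non native phones, could be sketched as a simple counting procedure. The slides do not say how rules are selected, so the relative-frequency threshold `min_ratio`, the function name `extract_rules`, and the input format (triples already aligned frame by frame) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def extract_rules(aligned, min_ratio=0.2):
    """Build confusion rules from aligned observations.

    aligned   : iterable of (grapheme, english_phone, native_phone) triples,
                assumed to come from aligning the lexicon transcription with
                the output of a phone recognizer using native models
    min_ratio : hypothetical pruning threshold on relative frequency
    Returns {(grapheme, english_phone): sorted list of native phones}.
    """
    counts = defaultdict(Counter)
    for grapheme, eng_phone, native_phone in aligned:
        counts[(grapheme, eng_phone)][native_phone] += 1
    rules = {}
    for key, phone_counts in counts.items():
        total = sum(phone_counts.values())
        rules[key] = sorted(
            p for p, n in phone_counts.items() if n / total >= min_ratio
        )
    return rules
```

For instance, if most occurrences of (A, ah) are recognized as the French phone a, the extracted rule is (A, ah) → a, as in the slide's example. Keying on the pair rather than on the English phone alone is what lets the same phone map differently depending on the grapheme.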

11 Example of acoustic model modification for English phone /t  /
[Figure: extracted rules mapping the English phone /t  / to French phones, including /t/ and /k/; modified structure of the HMM for model /t  /, combining the English model with the French models]

12 Experiments : HIWIRE Database
Training French acoustic models : Broadcast News corpus
Training English acoustic models : TIMIT
Non native speech recognition : 50 sentences per speaker for rules extraction, 50 sentences per speaker for test
Grammar             Used approach           French         Italian        Spanish
                                            WER    SER     WER    SER     WER    SER
Thales grammar      baseline                6      12.8    10.5   19.6    7.0    14.9
                    confusion               4.6    10.2    6.9    14.1    5.1    11.8
                    +graphemes confusion    4.9    11.3    8.2    15.9    6.2    13.6
Word loop grammar   baseline                35.7   47.9    43.5   52.0    39.9   53.5
                    confusion               27.3   42.1    31.3   46.2    31.3   44.5
                    +graphemes confusion    26.2   41.9    30.5   45.5    31.3   46.5
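
The slides do not say how the WER and SER figures were scored. As a reminder of the usual definitions, assumed here, the sketch below computes word error rate as the word-level edit distance divided by the number of reference words, and sentence error rate as the share of sentences with at least one error. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (sub, del, ins cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def wer_ser(pairs):
    """Return (WER %, SER %) for a list of (reference, hypothesis) strings."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    n_words = sum(len(r.split()) for r, _ in pairs)
    n_wrong = sum(r.split() != h.split() for r, h in pairs)
    return 100.0 * errors / n_words, 100.0 * n_wrong / len(pairs)
```

Because a single word error makes the whole sentence wrong, SER is always at least as sensitive as WER to scattered errors, which matches the SER columns being higher than the WER columns throughout the table.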

13 Questions about prototype
Which noise robustness approaches will be put in the prototype?
Which speaker robustness approaches will be put in the prototype?
How to integrate noise and speaker robustness approaches at the same time?
Which grammar to use : Thales grammar or large vocabulary grammar?
Real time recognition?

