LID/SID - Research Stay at BUT Last Presentation Luis Fernando D’Haro Polytechnical University of Madrid Granted by “José Castillejo” fellowship Education.

1 LID/SID - Research Stay at BUT: Last Presentation
Luis Fernando D’Haro, Polytechnical University of Madrid
Granted by “José Castillejo” fellowship, Education Ministry - Spanish Government
February 20th, 2012

2 Outline
- Research stay goals
- Work on phonotactic LID
  - Discriminative n-grams
  - New phonotactic system
    - Using i-vectors and multinomial subspace model
- Work on LID-RATS
  - VAD and LID
- Future work

3 Research Stay Goals
- To work with the most recent techniques for LID, such as:
  - i-Vectors, sGMM, WCCN, score calibration/fusion
- To test our ranking templates and discriminative n-gram selection approach with the acoustic i-Vector system for the LID task. Ideas:
  - Fusion of scores
  - Selection of discriminative n-grams
- Collaboration on current BUT campaigns
  - RATS, LRE, SRE
- Publications

4 Work on Phonotactic LID
- LID based on ranking positions and distance
- Original idea: [Cavnar and Trenkle, 1994]

5 Improvements to the Ranking approach
- One ranking for each n-gram order
- Golf position
  - All n-grams with the same number of occurrences share the same position in the ranking
- Discriminative positions in the ranking
  - Put the most relevant n-grams for each language in the higher positions of the ranking, i.e. those that are very frequent in one language but not in the others
  - A new formula inspired by tf-idf, providing normalized scores (between 1 and -1)
- Advantages: high-order n-grams (up to 5-grams)
- More details in [Caraballo et al, 2010]
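As an illustration of the ranking idea on this slide, the following minimal Python sketch builds an n-gram ranking with "golf" positions and computes the out-of-place distance described in [Cavnar and Trenkle, 1994]. The function names, the tie convention (tied n-grams share a position and the next position is skipped) and the `max_penalty` value are assumptions made for illustration; the discriminative re-ranking is not included, and this is not the system's actual implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token (phone) sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def golf_ranking(tokens, n):
    """Rank n-grams by frequency; n-grams with equal counts share a position."""
    counts = Counter(ngrams(tokens, n))
    ranking = {}
    position = 0
    previous_count = None
    for index, (gram, count) in enumerate(counts.most_common(), start=1):
        if count != previous_count:
            # A new count value opens a new position (assumed "1, 1, 3" convention).
            position = index
            previous_count = count
        ranking[gram] = position
    return ranking


def out_of_place(doc_ranking, lang_ranking, max_penalty):
    """Sum of position differences; unseen n-grams receive the maximum penalty."""
    total = 0
    for gram, position in doc_ranking.items():
        if gram in lang_ranking:
            total += min(abs(position - lang_ranking[gram]), max_penalty)
        else:
            total += max_penalty
    return total
```

For example, `golf_ranking("a b a b c".split(), 2)` places the bigram `('a', 'b')` at position 1 and the two bigrams that occur once at a shared position 2. The language whose ranking gives the smallest distance to the test ranking would be the hypothesis.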

6 Experiments on LRE09
- Baseline: phonotactic PCA [Mikolov et al, 2010]
  - Uses n-gram soft-counts from different phone recognizers
- Our system uses only the normalized score generated by the system, not the classifier
  - Our baseline classifier, based on the distance among languages, did not work well
- Approaches:
  - Comparison/fusion with the PCA system
  - Fusion with the acoustic iVector system (400 iVectors, 2048 Gaussians)
  - Selection of discriminative n-grams
    - Goal: reduce the input vector of n-gram soft-counts
- Database:
  - Train: 9763 segments (345 hours, ~500 utt. per language)
  - Dev: segments from the 23 languages of LRE09
  - Test: segments

7 Comparison with phonotactic PCA
- Baseline approach:
  - Feature vector: expected n-gram phoneme counts estimated from lattices
  - For all possible trigrams and the most frequent four-grams, e.g. for the Hungarian phone recognizer: 3-grams: 33^3 = 35,937; 4-grams: 33^4 = 1,185,921
  - Then, apply PCA to reduce the vector size (baseline: 1000)
- Discriminative approach
  - Original templates (up to 4-grams)
    - Engl: 45_2025_100K_200K
    - Russ: 47_2209_100K_200K
    - Hung: 33_1089_35K_200K
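A quick check of the sizes involved shows why the four-gram space has to be truncated before PCA. The sketch below is only arithmetic; the phone-set sizes are taken from the first field of the template names above, and the helper name is hypothetical. Note that the second field of each template name matches the number of possible bigrams (2025, 2209, 1089).

```python
def ngram_space_size(num_phones, order):
    """Number of distinct n-grams of a given order over a phone set."""
    return num_phones ** order


# Phone-set sizes read from the template names on the slide.
phone_sets = {"Engl": 45, "Russ": 47, "Hung": 33}

# Map each recognizer to its bigram, trigram and four-gram space sizes.
sizes = {
    name: [ngram_space_size(count, order) for order in (2, 3, 4)]
    for name, count in phone_sets.items()
}
```

With 33 phones the full trigram space has 35,937 entries, while the full four-gram space already exceeds one million.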

8 Results*
- *: problems reproducing the results reported in the paper
- Poor results in almost all cases; large gap with respect to the baseline, which uses only 3-grams and PCA
- Cavg table (values not preserved in this transcript), rows: Baseline_3g_Hung_PCA1000*, DiscriminativeRanking_Engl, DiscriminativeRanking_Russ, DiscriminativeRanking_Hung

9 Selection of discriminative n-grams
- Goal: help PCA reduce the size of the feature vector by first selecting the most discriminative n-grams and then applying PCA
  - Reducing from 35K to approx. 8K for 3-grams
  - Using 16K 4-grams instead of the 80K most frequent ones [Mikolov et al, 2010], and concatenating them with the 8K trigrams
  - Selection based on the discriminability among all languages
  - We also tried using probabilities instead of the vector of counts
- Fusion with the acoustic i-Vector system
  - 600 iVectors Gaussians
  - Cavg for the baseline iVector system: 30s: 2.40%; 10s: 4.93%; 3s: 14.04%
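The selection step can be pictured with the sketch below. The slides do not give the actual scoring formula, so the score used here (a normalized difference between an n-gram's relative frequency in one language and its average relative frequency in the others, which lies between -1 and 1) is an assumption chosen only to mimic the described behaviour. All function names are hypothetical.

```python
def to_probs(counts):
    """Turn (soft) counts into relative frequencies, removing the effect of file length."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def discriminative_scores(lang_counts):
    """Score each (language, n-gram) pair in [-1, 1]; needs at least two languages."""
    if len(lang_counts) < 2:
        raise ValueError("at least two languages are required")
    probs = {lang: to_probs(counts) for lang, counts in lang_counts.items()}
    vocab = set()
    for p in probs.values():
        vocab.update(p)
    scores = {}
    for lang, p in probs.items():
        others = [q for other, q in probs.items() if other != lang]
        for gram in vocab:
            own = p.get(gram, 0.0)
            rest = sum(q.get(gram, 0.0) for q in others) / len(others)
            denom = own + rest
            # +1: seen only in this language; -1: seen only in the others.
            scores[(lang, gram)] = (own - rest) / denom if denom else 0.0
    return scores


def select_top(scores, k):
    """Keep the k n-grams with the highest score for any language."""
    best = {}
    for (_, gram), score in scores.items():
        best[gram] = max(score, best.get(gram, -1.0))
    ordered = sorted(best, key=lambda gram: (-best[gram], gram))
    return ordered[:k]
```

The selected n-grams would then define the reduced count (or probability) vector that is passed to PCA.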

10 Results – Disc. Phonotactic System
- Systems compared (values not preserved in this transcript): BASE3G_1KPCA, 3gCounts_1KPCA, 3gProbs_1KPCA, 3g-4gCounts_1KPCA, 3g-4gProbs_1KPCA, Base+3gProbs_1KPCA

11 Results – Disc. Phonotactic System + iVectors
- Systems compared (values not preserved in this transcript): BASE3G_1KPCA, 3gCounts_1KPCA, 3gProbs_1KPCA, 3g-4gCounts_1KPCA, 3g-4gProbs_1KPCA, Base+3gProbs_1KPCA, iVectors

12 Conclusions phonotactic
- For the LID system based on templates, we need to find better solutions for score normalization
- Discriminative n-gram selection helps both the phonotactic PCA system and the iVector system
- Better results using probabilities instead of counts, because of problems with the different lengths of the files
  - ToDo: test length normalization
- Find a better approach to the selection of high-order n-grams
  - ToDo: use clusters of scores in the discriminative approach to be able to handle high-order n-grams (currently implemented, but we did not try it this time)

13 New Phonotactic system
- Baseline: [Soufifar et al, 2011]
  - Uses n-gram soft-counts from lattices
  - Uses subspace multinomial distributions for estimating iVectors
  - Uses iVectors for classification with logistic regression (libLinear)
- Differences
  - Instead of n-gram soft-counts we use posterior-gram conditional counts
  - Use original features, or iVectors, or PCA on the original features
  - Use multiclass logistic regression + length normalization
  - Results on bigrams and trigrams (no time for fine tuning)
- Same training, test and dev sets as for LRE09
  - Fusion with the acoustic iVector system
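The classification back end mentioned on this slide can be sketched as follows: each iVector is length-normalized (scaled to unit Euclidean norm) and then scored by a multiclass logistic regression. The weights and biases below are hypothetical placeholders that would normally come from training; the sketch does not reproduce libLinear or the system's actual training procedure.

```python
import math


def length_normalize(vector):
    """Scale a vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def language_posteriors(ivector, weights, biases):
    """Multiclass logistic regression: one weight row and one bias per language."""
    x = length_normalize(ivector)
    scores = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)
```

Length normalization removes differences in iVector magnitude, so the classifier only sees the direction of each vector.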

14 Results new phonotactic iVector
- Cavg table (values not preserved in this transcript)
- Columns: 30 s, Fusion, 10 s, Fusion, 3 s, Fusion
- Rows: Baseline iVector (600); g_Hu1089_originalFeat; g_Hu_1089toPCA100_MCLR; g_100iVector_MCLR_LN; Mehdi’s 600 iVectors HU; Trig_600iVector_1089Multi_MultiClassLR_LengthNorm

15 Work on LID-RATS & VAD-RATS
- Goals:
  - Test different noise reduction and speech enhancement algorithms
  - Test different robust features
  - Test different BUT VADs
  - Combine with iVectors
- Database
  - Eight noise conditions + clean data
  - Experiments on the 2-minute condition and short list
  - Train: 3458 files (115 h)
  - Dev: 7331 files (244 h)

16 Work on LID-RATS
- Noise tools and algorithms
  - Ctucopy, developed at SpeechLab (FEE CTU - Prague)
    - Extended spectral subtraction [Sovka and Pollák, 1996]
    - Spectral subtraction with full-wave rectification
  - Using internal and external VAD (i.e. BUT-VAD)
  - Wiener filter [Zavarehei, 2005]
  - QIO Aurora front end from OGI [QIO, 2009]
    - Internal NN_VAD + CMN/CVN + RASTA-LDA + Wiener filter
  - ETSI: Advanced Front End [ETSI, 2007]
    - 2-pass adaptive Wiener filter + internal VAD (uses energy info from the whole spectrum and F0 regions)
  - Kalman filter [Murphy, 1998]
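To make the difference between rectification variants concrete, here is a minimal sketch of magnitude spectral subtraction on a single frame. The noise spectrum is estimated as the average of the frames that a VAD marks as non-speech. The function names and the `alpha` over-subtraction factor are assumptions for illustration, and this is not the Ctucopy implementation.

```python
def estimate_noise(frames, is_speech):
    """Average the magnitude spectra of frames the VAD labels as non-speech."""
    noise_frames = [f for f, speech in zip(frames, is_speech) if not speech]
    if not noise_frames:
        raise ValueError("no non-speech frames available for noise estimation")
    bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(bins)]


def spectral_subtraction(frame, noise, alpha=1.0, full_wave=True):
    """Subtract the noise estimate from one magnitude spectrum.

    full_wave=True  -> negative differences are replaced by their absolute value
    full_wave=False -> negative differences are set to zero (half-wave)
    """
    cleaned = []
    for magnitude, noise_level in zip(frame, noise):
        diff = magnitude - alpha * noise_level
        cleaned.append(abs(diff) if full_wave else max(diff, 0.0))
    return cleaned
```

The quality of the VAD matters here because any speech frame mislabelled as noise inflates the noise estimate, which is one reason to compare an internal VAD with an external one.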

17 Work on LID-RATS
- Common and new features
  - MFCC/PLP + Delta and Delta-Delta
  - PNCC: proposed by [Kim and Stern, 2010] at CMU
  - Spectral Delta-Delta: proposed by [Kumar et al, 2011] at CMU
  - SDC: Shifted Delta Cepstra
  - RPLP: proposed by [Rajnoha and Pollák, 2011] at SpeechLab, FEE CTU Prague
    - Hybrid between MFCC and PLP
- Tests with/without RASTA, VTLN, CMN/CVN
- Test new positions of the filterbank
  - After studying the spectrogram and the noise reduction effects
  - woNR: , wNR:
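Shifted Delta Cepstra stack several delta vectors computed at shifted time offsets, which gives each frame a longer temporal context than plain deltas. The sketch below follows the common N-d-P-k formulation, where block i at frame t is c(t + iP + d) - c(t + iP - d). The default values d=1, P=3, k=7 and the clamping of indices at the utterance edges are assumptions; the slides do not state the exact configuration used.

```python
def shifted_delta_cepstra(frames, d=1, P=3, k=7):
    """Compute SDC features for a list of cepstral frames.

    frames: list of equal-length lists (N cepstral coefficients per frame)
    returns: one list of N * k values per input frame
    """
    last = len(frames) - 1

    def frame_at(t):
        # Assumed edge handling: repeat the first/last frame outside the utterance.
        return frames[min(max(t, 0), last)]

    output = []
    for t in range(len(frames)):
        stacked = []
        for i in range(k):
            ahead = frame_at(t + i * P + d)
            behind = frame_at(t + i * P - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        output.append(stacked)
    return output
```

With 7 cepstral coefficients and k=7 this yields 49 SDC values per frame, which are typically appended to the static coefficients.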

18 Work on LID-RATS (120s)

System without Noise Reduction | VAD3
Baseline: 7MFCC+CMN/CVN+RASTA+7SDC+VTLN |
RPLP+CMN/CVN+RASTA+7SDC |
PNCC+DeltaDelta+CMN/CVN+RASTA | 2.17
Baseline with spectral DD instead of SDC or Delta_Delta | 2.48

System with Noise Reduction | VAD1
Baseline: 7MFCC+CMN/CVN+RASTA+7SDC+VTLN | 2.03
Baseline + Extended Spectral Subtraction | 2.75
Baseline + Spectral subtraction with full-wave rectification + BUT-VAD | 3.31
Baseline + Wiener | 9.24
Baseline + Qio | 2.09

19 Conclusions RATS-LID
- No improvement when using de-noising techniques
  - Among them, the QIO toolkit provided the best result
- Important improvements due to the correct selection of the low and high frequency bands
- RPLP: new robust features for LID
- PNCC: promising features for LID, but training time is high
- Spectral Delta-Delta is slightly better than traditional delta-deltas, but not better than SDC
- Use of RASTA and CMN/CVN is essential for high performance
  - Short-term CMN/CVN did not provide better results
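Since CMN/CVN turned out to be essential, a minimal sketch of per-utterance cepstral mean and variance normalization may help: each coefficient is shifted to zero mean and scaled to unit variance over the whole utterance. The function name and the `eps` guard for constant coefficients are assumptions; the short-term variant mentioned above would compute the same statistics over a sliding window instead and is not shown.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization.

    frames: non-empty list of equal-length feature vectors
    returns: normalized copy with zero mean and unit variance per coefficient
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(dim)]
    stds = [
        math.sqrt(sum((f[j] - means[j]) ** 2 for f in frames) / n)
        for j in range(dim)
    ]
    return [
        [
            # Constant coefficients are only mean-subtracted to avoid dividing by zero.
            (f[j] - means[j]) / (stds[j] if stds[j] > eps else 1.0)
            for j in range(dim)
        ]
        for f in frames
    ]
```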

20 Future work
- Discriminative n-grams
  - New techniques for working with higher n-gram orders
  - Better combination of information from parallel phoneme recognizers
  - Write a joint paper based on the LRE09 experiments
- Phonotactic iVector: promising results
  - Check the combination of parallel phone recognizers
  - Incorporation of discriminative information
- LRE/SRE
  - Try collaborations on upcoming NIST competitions

22 Bibliography I
- Caraballo, M. A. et al. (2010). “A Discriminative Text Categorization Technique for Language Identification built into a PPRLM System”. FALA.
- Cavnar, W. B. and Trenkle, J. M. (1994). “N-Gram-Based Text Categorization”. SDAIR-94.
- ETSI (2007). Advanced Front End.
- Kim, C. and Stern, R. M. (2010). “Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring”. ICASSP.
- Mikolov et al. (2010). “PCA-based feature extraction for phonotactic language recognition”. Odyssey.
- Murphy, K. (1998). “Kalman filter toolbox for Matlab”.

23 Bibliography II
- Qualcomm-ICSI-OGI (QIO) (2009). Aurora front end.
- Rajnoha, J. and Pollák, P. (2011). “ASR systems in Noisy Environment: Analysis and Solutions for Increasing Noise Robustness”. Radioengineering, Vol. 20, No. 1, April 2011.
- Soufifar, M. et al. (2011). “iVector approach to phonotactic language recognition”. Interspeech.
- Sovka, P. and Pollák, P. (1996). “Extended spectral subtraction”. Eurospeech.
- Zavarehei, E. (2005). Wiener filter implementation in Matlab. Available at filter/content/WienerScalart96.m

24 Results - Discriminative Phonotactic System
- Cavg table (values not preserved in this transcript)
- Columns: Phon., +iVec, Phon., +iVec, Phon., +iVec
- Rows: Baseline_3g_Hung + PCA(1000)*; Disc3g + PCA(1000) Counts; Disc3g + PCA(1000) Probs; Disc3g + Disc4g + PCA(1000) Counts; Disc3g + Disc4g + PCA(1000) Probs; Fusion: Baseline + 3Disc3g + PCA(1000) Probs

25 Posterior-gram system
