
# iVector approach to Phonotactic LRE, Mehdi Soufifar, 2nd May 2011


Phonotactic LRE

▫ Train: utterance → recognizer (HVite, BUTPR, ...) with acoustic model (AM) → phoneme sequence → extract n-gram statistics → n-gram counts → train language-dependent classifier (LR, SVM, LM, GLC, ...)
▫ Test: utterance → recognizer (HVite, BUTPR, ...) with AM → phoneme sequence → extract n-gram statistics → n-gram counts → classifier → language-dependent score
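The "extract n-gram statistics" step of the pipeline can be sketched as follows. This is a minimal illustration that assumes the recognizer output is a single 1-best phoneme string; the function name `ngram_counts` and the toy phoneme labels are hypothetical, and the real system may use a different kind of count (for example, counts taken from recognizer lattices).

```python
from collections import Counter

def ngram_counts(phonemes, n=3):
    """Count every run of n consecutive phoneme labels in one utterance."""
    return Counter(
        tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)
    )

# Hypothetical phoneme string for one utterance (not real recognizer output).
utterance = ["a", "b", "a", "b", "a"]
counts = ngram_counts(utterance)
print(counts)
```

Each utterance then becomes one sparse vector of counts indexed by 3-gram, which is what the classifier consumes.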

N-gram counts

▫ N^3 = 226981 for the RU phoneme set (N = 61)
▫ Problem: huge vector of n-gram counts
▫ Solutions:
  ▪ Choose the most frequent n-grams
  ▪ Choose the top N n-grams discriminatively (LL)
  ▪ Compress the n-gram counts
    ▪ Singular Value Decomposition (SVD): decompose the document matrix D, then use the transformation matrix U to reduce the n-gram vector dimensionality
    ▪ PCA-based dimensionality reduction
  ▪ iVector feature selection
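The SVD option can be sketched with NumPy as below. This is an illustration under stated assumptions, not the configuration used in the talk: the document matrix D is assumed to hold one n-gram per row and one utterance per column, the sizes (50 n-grams, 20 utterances, 5 retained dimensions) are toy values, and the helper names `fit_svd_projection` and `project` are hypothetical.

```python
import numpy as np

def fit_svd_projection(D, k):
    """Return the first k left singular vectors of D (n-grams x utterances)."""
    U, _, _ = np.linalg.svd(D, full_matrices=False)
    return U[:, :k]

def project(U_k, counts):
    """Map one n-gram count vector to k dimensions."""
    return U_k.T @ counts

# Toy random count matrix standing in for real training data.
rng = np.random.default_rng(0)
D = rng.poisson(1.0, size=(50, 20)).astype(float)

U_k = fit_svd_projection(D, k=5)
reduced = project(U_k, D[:, 0])
print(reduced.shape)
```

For a vector of the size quoted on the slide, a dense SVD like this would be expensive; a truncated or sparse solver would be the more realistic choice.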

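Sub-space multinomial modeling

▫ Every vector of n-gram counts consists of E events (the number of n-grams); N^3 = 226981 for the RU phoneme set
▫ The log probability of the n-th utterance under the multinomial (MN) distribution is: [equation not preserved in the transcript]
▫ in which the event probability can be defined as: [equation not preserved in the transcript]
▫ The model parameters to be estimated by maximum likelihood (ML) are t and w
▫ No analytical solution! We use Newton-Raphson updates as a numerical solution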
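The two equations on this slide are missing from the transcript. As a hedged reminder of what they most likely looked like, the usual subspace multinomial formulation is sketched below. Only t and w are named on the slide; the symbols $\gamma_{ne}$ (count of event e in utterance n), $\phi_{ne}$ (probability of event e in utterance n) and $m_e$ (an utterance-independent offset) are assumptions.

```latex
\log P(\gamma_n) = \sum_{e=1}^{E} \gamma_{ne} \log \phi_{ne},
\qquad
\phi_{ne} = \frac{\exp\left(m_e + \mathbf{t}_e \mathbf{w}_n\right)}
                 {\sum_{i=1}^{E} \exp\left(m_i + \mathbf{t}_i \mathbf{w}_n\right)}
```

Here $\mathbf{t}_e$ is the e-th row of the subspace matrix T and $\mathbf{w}_n$ is the iVector of utterance n. Because w sits inside the softmax normalizer, the likelihood has no closed-form maximizer, which is consistent with the slide's use of Newton-Raphson updates.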

Sub-space multinomial modeling

1st solution:
▫ Consider all 3-grams to be components of a Bernoulli trial
▫ Model the entire vector of 3-gram counts with one multinomial distribution
▫ N-gram events are not independent (not consistent with the Bernoulli trial presumption!)

2nd solution:
▫ Cluster 3-grams based on their histories
▫ Model each history with a separate MN distribution
  ▪ Data sparsity problem!
  ▪ Clustering 3-grams based on a binary decision tree
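The idea behind the 2nd solution, one multinomial per history, can be illustrated with plain maximum-likelihood estimates. This sketch assumes that the "history" of a 3-gram is its first two phonemes and it does not include the subspace (T, w) parameterization; the function name and toy counts are hypothetical.

```python
from collections import defaultdict

def per_history_multinomials(trigram_counts):
    """Group 3-gram counts by history and normalize within each group."""
    by_history = defaultdict(dict)
    for (p1, p2, p3), count in trigram_counts.items():
        nxt = by_history[(p1, p2)]
        nxt[p3] = nxt.get(p3, 0) + count
    models = {}
    for history, nxt in by_history.items():
        total = sum(nxt.values())
        # ML estimate of the multinomial over the final phoneme.
        models[history] = {p: c / total for p, c in nxt.items()}
    return models

# Toy counts (hypothetical phoneme labels).
toy_counts = {("a", "b", "c"): 3, ("a", "b", "d"): 1, ("b", "c", "a"): 2}
models = per_history_multinomials(toy_counts)
print(models)
```

With many histories and few observations per history, most of these distributions are estimated from very little data, which is the sparsity problem the slide points out.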

Training of iVector extractor

▫ Number of iterations: 5-7 (depends on sub-space dimension)
▫ Sub-space dimension: 600

[Figure panels: 3 seconds, 10 seconds, 30 seconds]

Classifiers

▫ Configuration: L one-vs-all linear classifiers
  ▪ L: number of target languages
▫ Classifiers:
  ▪ SVM
  ▪ LR
  ▪ Linear Generative Classifier (GLC)
  ▪ MLR (to be done!)
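The one-vs-all configuration can be sketched as L independent binary logistic-regression detectors, one per language. This is a toy illustration only: the 2-dimensional features, the plain gradient-descent training loop, the learning rate and the function names are all assumptions, and the talk does not say which LR or SVM implementation was used.

```python
import numpy as np

def train_one_vs_all(X, y, num_langs, lr=0.1, epochs=2000):
    """Train one binary logistic-regression detector per language."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])        # append a bias feature
    W = np.zeros((num_langs, d + 1))            # one weight row per language
    for lang in range(num_langs):
        target = (y == lang).astype(float)      # this language vs. all others
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(Xb @ W[lang])))
            W[lang] -= lr * (Xb.T @ (p - target)) / n
    return W

def language_scores(W, X):
    """Return an (utterances x languages) matrix of linear scores."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W.T

# Toy, well-separated 2-D "features" for three hypothetical languages.
X = np.array([[0, 0], [1, 0], [0, 1],
              [5, 0], [6, 1], [5, 1],
              [0, 5], [1, 6], [1, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

W = train_one_vs_all(X, y, num_langs=3)
scores = language_scores(W, X)
preds = scores.argmax(axis=1)
print(preds)
```

In the real system the inputs would be the reduced feature vectors (PCA outputs or iVectors), and each row of scores would be the language-dependent score vector for one test utterance.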

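Results on different classifiers

Task: NIST LRE 2009

| System | Dev-3s | Dev-10s | Dev-30s | Eval-3s | Eval-10s | Eval-30s |
|---|---|---|---|---|---|---|
| PCA-SVM | 2.83 | 7.05 | 17.77 | 3.62 | 8.82 | 21.00 |
| PCA-LR | 2.22 | 6.22 | 17.26 | 2.93 | 8.29 | 22.60 |
| PCA-GLC | 2.81 | 8.25 | 19.83 | 3.50 | 9.88 | 22.88 |
| iVec-SVM | 6.54 | 14.07 | 26.79 | 8.54 | 17.51 | 8.06 |
| iVec-LR | 2.44 | 6.88 | 18.01 | 3.05 | 8.10 | 21.39 |
| iVec-GLC | 2.58 | 7.13 | 18.18 | 2.92 | 8.03 | 21.13 |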

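Results of different systems

Task: LRE09

| System | Dev-3s | Dev-10s | Dev-30s | Eval-3s | Eval-10s | Eval-30s |
|---|---|---|---|---|---|---|
| BASE-HU-SVM | 2.83 | 7.05 | 17.77 | 3.62 | 8.82 | 21.00 |
| PCA-HU-LR | 2.22 | 6.22 | 17.26 | 2.93 | 8.29 | 22.60 |
| iVect-HU-LR | 2.81 | 8.25 | 19.83 | 3.05 | 8.10 | 21.05 |
| iVec+PCA-HU-LR | 2.05 | 5.74 | 16.71 | 2.79 | 7.63 | 21.05 |
| iVec-RU-LR | 2.66 | 6.46 | 17.50 | 2.59 | 7.42 | 19.83 |
| iVec-LR HU+RU | 1.54 | 4.44 | 13.30 | 2.09 | 5.34 | 16.53 |
| iVec-LR HURU | 1.90 | 5.10 | 14.69 | 2.06 | 5.80 | 17.79 |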

N-gram clustering

▫ Remove all 3-grams that occur fewer than 10 times over all training utterances
▫ Model each history with a separate MN distribution
▫ 1084 histories, up to 33 3-grams each

| System | Dev-3s | Dev-10s | Dev-30s | Eval-3s | Eval-10s | Eval-30s |
|---|---|---|---|---|---|---|
| >10 3-gram | 8.84 | 16.04 | 27.94 | 10.34 | 19.92 | 32.35 |
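The pruning rule on this slide can be sketched as below. It assumes that "repetition" means the total count of a 3-gram summed over all training utterances, so a 3-gram is kept when that total is at least 10; the function name and toy counts are hypothetical.

```python
from collections import Counter

def prune_rare_ngrams(utterance_counts, min_count=10):
    """Drop n-grams whose total count over all utterances is below min_count."""
    totals = Counter()
    for counts in utterance_counts:
        totals.update(counts)
    kept = {gram for gram, total in totals.items() if total >= min_count}
    pruned = [
        Counter({gram: c for gram, c in counts.items() if gram in kept})
        for counts in utterance_counts
    ]
    return pruned, kept

# Toy per-utterance counts (hypothetical phoneme labels).
train = [
    Counter({("a", "b", "c"): 6, ("a", "b", "d"): 1}),
    Counter({("a", "b", "c"): 5, ("b", "c", "d"): 9}),
]
pruned, kept = prune_rare_ngrams(train)
print(kept)
```

The surviving 3-grams can then be grouped by history so that each history gets its own multinomial.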

Merging histories using BDT

▫ In the case of a 3-gram $P_i P_j P_k$
▫ Merge histories whose merging does not increase the entropy by more than a certain value
▫ Models 1 (separate): $P_i P_{22} P_k$ and $P_i P_{33} P_k$, with E1 = Entropy(Model1)
▫ Models 2 (merged): $P_i P_{33+22} P_k$, with E2 = Entropy(Model2)
▫ D = E1 - E2
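The merge test D = E1 - E2 can be illustrated as follows. The slide does not say how Entropy(Model) is computed, so this sketch assumes a count-weighted entropy (total count times the entropy of the normalized counts, in nats). Under that assumption merging never lowers the entropy, D is zero or negative, and the size of D is compared with a threshold. All function names and the threshold values are hypothetical.

```python
import math

def weighted_entropy(counts):
    """Total count times the entropy (in nats) of the normalized counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total)
             for c in counts.values() if c > 0)
    return total * h

def merge_delta(h1, h2):
    """D = E1 - E2, with E1 for separate models and E2 for the merged one."""
    merged = dict(h1)
    for phoneme, c in h2.items():
        merged[phoneme] = merged.get(phoneme, 0) + c
    e1 = weighted_entropy(h1) + weighted_entropy(h2)
    e2 = weighted_entropy(merged)
    return e1 - e2

def should_merge(h1, h2, threshold):
    """Merge when the entropy change stays within the threshold."""
    return abs(merge_delta(h1, h2)) <= threshold

# Two histories with the same distribution merge at no cost.
similar = merge_delta({"a": 1, "b": 1}, {"a": 2, "b": 2})
# Two histories with disjoint continuations are costly to merge.
disjoint = merge_delta({"a": 2}, {"b": 2})
print(similar, disjoint)
```

A binary decision tree would apply a test like this repeatedly, merging the cheapest pair of histories at each step until every remaining merge exceeds the threshold.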

Results on DT

▫ Hist. merging 1089-60
▫ With more training iterations, the T matrix moves toward the zero matrix!

| System | Dev-3s | Dev-10s | Dev-30s | Eval-3s | Eval-10s | Eval-30s |
|---|---|---|---|---|---|---|
| DT | 4.36 | 10.41 | 22.20 | 5.46 | 12.80 | 27.09 |
| >10 3-gram | 8.84 | 16.04 | 27.94 | 10.34 | 19.92 | 32.35 |

Deeper insight into the iVector extractor

Strange results

▫ 3-grams that never occur anywhere in the training set should not affect system performance!
▫ Remove all the 3-grams that never occur in the whole training set: 35973 -> 35406 (a reduction of 567)
▫ Even worse results if we prune more!

| # 3-grams | Dev-30s | Dev-10s | Dev-3s | Eval-30s | Eval-10s | Eval-3s |
|---|---|---|---|---|---|---|
| 35973 | 2.44 | 6.88 | 18.01 | 3.05 | 8.10 | 21.39 |
| 35406 | 3.35 | 8.05 | 19.73 | 3.63 | 9.18 | 22.60 |

DT clustering of n-gram histories

▫ The overall likelihood is an order of magnitude higher than with the 1st solution
▫ The change in model likelihood is quite notable in each iteration!
▫ The T matrix is mainly zero after some iterations!

[Figures: one slide per training iteration, 1st iteration through 6th iteration]

Closer look at TRAIN set amhaamha boSnboSn cantcant creocreo croacroa daridari engiengi englengl farsfars frenfren georgeor haushaus hindhind korekore mandmand pashpash portport russruss spanspan turkturk ukraukra urduurdu vietviet TRAIN voa ✔✔✔✔✔✔✗✔✗✔✔✔✔✔✔✔✔✔✔✔✔✔✔ TRAIN cts ✗✗✔✗✗✗✔✔✔✔✗✗✔✔✔✗✔✔✔✗✗✔✔ DEV voa ✔✔✗✔✗✔✗✗✗✔✔✔✗✗✗✔✔✗✗✔✔✗✗ DEV cts ✗✗✔✗✔✗✔✔✔✔✗✗✔✔✔✗✗✔✔✗✗✔✔ EVAL voa ✔✔✔✔✔✔✗✔✔✔✔✔✔✔✔✔✔✔✔✔✔✔✔ EVAL cts ✗✗✔✗✗✗✔✔✔✗✗✗✔✔✔✗✗✔✗✗✗✔✔

iVector inspection

[Figure panels: Cant, Engl]

iVector inspection

▫ Multiple data sources cause bimodality
▫ We also see this effect in some single-source languages

[Figure panel: Amha]
