iVector approach to Phonotactic LRE
Mehdi Soufifar
2nd May 2011
Phonotactic LRE
Train: language-dependent utterance -> recognizer (HVite, BUTPR, ...) with AM -> phoneme sequence -> extract n-gram statistics -> n-gram counts -> train classifier (LR, SVM, LM, GLC, ...)
Test: utterance -> recognizer (HVite, BUTPR, ...) with AM -> phoneme sequence -> extract n-gram statistics -> n-gram counts -> classifier -> language-dependent score
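The "extract n-gram statistics" step can be sketched as below. This is a minimal illustration, not the presenter's tooling: the function name `ngram_counts` and the toy phoneme labels are assumptions, and a real system would read the recognizer's phoneme output instead of a hard-coded list.

```python
from collections import Counter


def ngram_counts(phonemes, n=3):
    """Count every run of n consecutive phonemes in one utterance."""
    return Counter(
        tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)
    )


# Hypothetical recognizer output for one utterance (toy labels).
utterance = ["a", "b", "a", "b", "a"]
counts = ngram_counts(utterance)
```

An utterance shorter than n phonemes simply yields an empty counter.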
N-gram Counts
Problem: huge vector of n-gram counts (N^3 = 226981 for the RU phoneme set)
Solutions:
▫Choose the most frequent n-grams
▫Choose the top N n-grams discriminatively (LL)
▫Compress the n-gram counts
  Singular Value Decomposition (SVD)
    Decompose the document matrix D
    Use the transformation matrix U to reduce the n-gram vector dimensionality
  PCA-based dimensionality reduction
▫iVector feature selection
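The simplest of the listed solutions, keeping only the most frequent n-grams, might look like the sketch below. The helper names `top_k_ngrams` and `to_vector` and the toy counts are assumptions made for illustration; the SVD, PCA and iVector options are not shown.

```python
from collections import Counter


def top_k_ngrams(per_utterance_counts, k):
    """Return the k n-grams with the highest total count over all utterances."""
    total = Counter()
    for counts in per_utterance_counts:
        total.update(counts)
    return [ngram for ngram, _ in total.most_common(k)]


def to_vector(counts, vocabulary):
    """Map one utterance's counts onto the fixed, reduced vocabulary."""
    return [counts.get(ngram, 0) for ngram in vocabulary]


# Toy training data: n-grams written as short strings for brevity.
training = [
    Counter({"aba": 3, "bab": 1}),
    Counter({"aba": 2, "bba": 4, "bab": 1}),
]
vocabulary = top_k_ngrams(training, k=2)
vector = to_vector(training[0], vocabulary)
```

Every utterance is then represented by a vector of length k rather than N^3.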
Sub-space multinomial modeling
Every vector of n-gram counts consists of E events (#n-grams; N^3 = 226981 for the RU phoneme set)
The log probability of the n-th utterance under the MN distribution is: [equation not preserved in the transcript]
where [symbol and equation not preserved in the transcript] can be defined as:
The model parameters to be estimated by ML estimation are t and w
No analytical solution!
We use the Newton-Raphson update as a numerical solution
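The slide's equations did not survive extraction, so the sketch below assumes a common form of the subspace multinomial model: the probability of event e in utterance n is a softmax over m_e + t_e · w_n, and the log probability is the count-weighted sum of the log event probabilities. The offset vector `m` is an assumption (the slide names only t and w). For brevity the update shown is one plain gradient-ascent step on w, a simpler stand-in for the Newton-Raphson update the slide mentions.

```python
import math


def smm_probs(m, T, w):
    """Event probabilities: softmax over events of m[e] + T[e] . w."""
    logits = [m_e + sum(t * x for t, x in zip(row, w)) for m_e, row in zip(m, T)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [v / norm for v in exps]


def log_likelihood(counts, m, T, w):
    """Multinomial log-likelihood (up to a constant): sum_e c_e * log(phi_e)."""
    phi = smm_probs(m, T, w)
    return sum(c * math.log(p) for c, p in zip(counts, phi))


def gradient_step(counts, m, T, w, lr=0.01):
    """One gradient-ascent step on w; gradient = sum_e (c_e - N * phi_e) * t_e."""
    phi = smm_probs(m, T, w)
    total = sum(counts)
    grad = [
        sum((c - total * p) * row[d] for c, p, row in zip(counts, phi, T))
        for d in range(len(w))
    ]
    return [x + lr * g for x, g in zip(w, grad)]


# Toy problem: 4 events, 2-dimensional subspace (all values hypothetical).
counts = [5, 1, 0, 2]
m = [0.0, 0.0, 0.0, 0.0]
T = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
w0 = [0.0, 0.0]
w1 = gradient_step(counts, m, T, w0)
```

With T held fixed the objective is concave in w, so a sufficiently small step cannot decrease the likelihood.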
Sub-space multinomial modeling
1st solution:
▫Consider all 3-grams to be components of a Bernoulli trial
▫Model the entire vector of 3-gram counts with one multinomial distribution
▫N-gram events are not independent (not consistent with the Bernoulli trial presumption!)
2nd solution:
▫Cluster 3-grams based on their histories
▫Model each history with a separate MN distribution
Data sparsity problem!
Clustering 3-grams based on a binary decision tree
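The second solution, one multinomial per history, can be illustrated as follows. This sketch assumes the history of a 3-gram is its first two phonemes and uses maximum-likelihood relative frequencies in place of the subspace model; the function names and toy counts are hypothetical.

```python
from collections import defaultdict


def group_by_history(trigram_counts):
    """Group 3-gram counts by their history (the first two phonemes)."""
    by_history = defaultdict(dict)
    for (p1, p2, p3), count in trigram_counts.items():
        by_history[(p1, p2)][p3] = count
    return dict(by_history)


def per_history_probs(by_history):
    """Normalise each history separately: one multinomial per history."""
    probs = {}
    for history, counts in by_history.items():
        total = sum(counts.values())
        probs[history] = {p3: c / total for p3, c in counts.items()}
    return probs


# Toy 3-gram counts keyed by (p1, p2, p3).
trigram_counts = {
    ("a", "b", "a"): 3,
    ("a", "b", "c"): 1,
    ("b", "a", "b"): 2,
}
histories = group_by_history(trigram_counts)
probs = per_history_probs(histories)
```

A history seen with only one or two distinct continuations gets a very poorly estimated distribution, which is the data sparsity problem the slide points out.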
Training of iVector extractor
Number of iterations: 5-7 (depends on sub-space dimension)
Sub-space dimension: 600
[Figure panels: 3 seconds, 10 seconds, 30 seconds]
Classifiers
Configuration: L one-vs-all linear classifiers
▫L: number of target languages
Classifiers:
▫SVM
▫LR
▫Linear Generative Classifier
▫MLR (to be done!)
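Scoring with L one-vs-all linear classifiers can be sketched as below, assuming each language has its own weight vector and bias that were already trained (by SVM, LR or any other linear method). The weights, biases and language labels here are made-up values, not taken from the presentation.

```python
def one_vs_all_scores(x, weights, biases):
    """Return one linear score w_l . x + b_l per target language."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]


def predict(x, weights, biases, languages):
    """Pick the language whose one-vs-all classifier scores highest."""
    scores = one_vs_all_scores(x, weights, biases)
    best = max(range(len(scores)), key=scores.__getitem__)
    return languages[best], scores


# Hypothetical 2-dimensional feature vector and three target languages.
languages = ["lang_a", "lang_b", "lang_c"]
weights = [[2.0, 0.0], [-1.0, 1.0], [0.0, 3.0]]
biases = [0.0, 0.5, -1.0]
label, scores = predict([1.0, 0.0], weights, biases, languages)
```

In a detection setting the L scores would be kept and thresholded per language rather than reduced to a single label.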
Results on different classifiers
Task: NIST LRE 2009

System     Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
PCA-SVM    2.83    7.05     17.77    3.62     8.82      21.00
PCA-LR     2.22    6.22     17.26    2.93     8.29      22.60
PCA-GLC    2.81    8.25     19.83    3.50     9.88      22.88
iVec-SVM   6.54    14.07    26.79    8.54     17.518.06
iVec-LR    2.44    6.88     18.01    3.05     8.10      21.39
iVec-GLC   2.58    7.13     18.18    2.92     8.03      21.13
Results of different systems
LRE09

System           Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
BASE-HU-SVM      2.83    7.05     17.77    3.62     8.82      21.00
PCA-HU-LR        2.22    6.22     17.26    2.93     8.29      22.60
iVect-HU-LR      2.81    8.25     19.83    3.05     8.10      21.05
iVec+PCA-HU-LR   2.05    5.74     16.71    2.79     7.63      21.05
iVec-RU-LR       2.66    6.46     17.50    2.59     7.42      19.83
iVec-LR HU+RU    1.54    4.44     13.30    2.09     5.34      16.53
iVec-LR HURU     1.90    5.10     14.69    2.06     5.80      17.79
N-gram clustering
Remove all 3-grams that occur fewer than 10 times over all training utterances
Model each history with a separate MN distribution
1084 histories, up to 33 3-grams each

System       Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
>10 3-gram   8.84    16.04    27.94    10.34    19.92     32.35
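The pruning step can be sketched as below, assuming the threshold applies to the total count of a 3-gram over all training utterances. The function name `prune_rare` and the toy counts are assumptions.

```python
from collections import Counter


def prune_rare(per_utterance_counts, min_count=10):
    """Drop every n-gram whose total count over all utterances is below min_count."""
    total = Counter()
    for counts in per_utterance_counts:
        total.update(counts)
    keep = {ngram for ngram, n in total.items() if n >= min_count}
    pruned = [
        Counter({g: n for g, n in counts.items() if g in keep})
        for counts in per_utterance_counts
    ]
    return pruned, keep


# Toy training set: "aba" occurs 12 times in total, "bab" only 3 times.
training = [Counter({"aba": 7, "bab": 2}), Counter({"aba": 5, "bab": 1})]
pruned, kept = prune_rare(training, min_count=10)
```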
Merging histories using BDT
In the case of a 3-gram P_i P_j P_k: merge histories whose merging does not increase the entropy by more than a certain value
Models 1 (separate histories): P_i P_22 P_k and P_i P_33 P_k, E1 = Entropy(Model1)
Models 2 (merged history): P_i P_33+22 P_k, E2 = Entropy(Model2)
D = E1 - E2
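The merge criterion can be illustrated with the sketch below. It assumes "entropy of a model" means the count-weighted entropy of the per-history maximum-likelihood distributions, which may differ from the definition used in the presentation; the threshold name `max_increase` and the toy counts are also assumptions. Under this definition merging never lowers the entropy, so D = E1 - E2 is zero or negative and the test is on -D.

```python
import math


def weighted_entropy(histories):
    """Sum over histories of N_h * H(p_h) in nats; each history is a count dict."""
    total = 0.0
    for counts in histories:
        n = sum(counts.values())
        for c in counts.values():
            if c > 0:
                total -= c * math.log(c / n)
    return total


def merge_counts(a, b):
    """Pool the continuation counts of two histories."""
    merged = dict(a)
    for key, c in b.items():
        merged[key] = merged.get(key, 0) + c
    return merged


def entropy_change(a, b):
    """D = E1 - E2: separate models (E1) versus the merged model (E2)."""
    e1 = weighted_entropy([a, b])
    e2 = weighted_entropy([merge_counts(a, b)])
    return e1 - e2


def should_merge(a, b, max_increase):
    """Merge when the entropy increase (-D) stays within the allowed value."""
    return -entropy_change(a, b) <= max_increase


# Two histories with the same continuation distribution, and two that disagree.
similar = ({"x": 2, "y": 2}, {"x": 4, "y": 4})
different = ({"x": 4}, {"y": 4})
```

Histories with near-identical continuation distributions merge at almost no cost, while histories that predict different phonemes are kept apart.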
Results on DT Hist. merging
1089-60
More training iterations on T => the T matrix moves toward the zero matrix!

Iteration    Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
DT           4.36    10.41    22.20    5.46     12.80     27.09
>10 3-gram   8.84    16.04    27.94    10.34    19.92     32.35
Strange results
3-grams with no repetition throughout the whole training set should not affect system performance!
Remove all the 3-grams with no repetition throughout the whole training set
35973 -> 35406 (567 reduction)
Even worse results if we prune more!

#3-grams   Dev-30s  Dev-10s  Dev-3s  Eval-30s  Eval-10s  Eval-3s
35973      2.44     6.88     18.01   3.05      8.10      21.39
35406      3.35     8.05     19.73   3.63      9.18      22.60
DT clustering of n-gram histories
The overall likelihood is an order of magnitude higher than with the 1st solution
The change in model likelihood is quite notable in each iteration!
The T matrix is mainly zero after some iterations!