iVector approach to Phonotactic LRE, Mehdi Soufifar, 2nd May 2011
Phonotactic LRE
Train: Utterance → Recognizer (HVite, BUTPR, ...) with AM → Phoneme sequence → Extract n-gram statistics → N-gram counts → Train L language-dependent classifiers (LR, SVM, LM, GLC, ...)
Test: Utterance → Recognizer (HVite, BUTPR, ...) with AM → Phoneme sequence → Extract n-gram statistics → N-gram counts → Classifier → Language-dependent score
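For concreteness, a minimal sketch of the "extract n-gram statistics" step, assuming the recognizer has already produced a phoneme sequence; the function name and data structures are illustrative, not the actual implementation behind the slides:

    from collections import Counter

    def ngram_counts(phonemes, n=3):
        """Count all n-grams in a decoded phoneme sequence."""
        return Counter(tuple(phonemes[i:i + n])
                       for i in range(len(phonemes) - n + 1))

    # Toy phoneme sequence standing in for the recognizer output
    seq = ["a", "b", "a", "b", "c"]
    print(ngram_counts(seq))  # Counter({('a','b','a'): 1, ('b','a','b'): 1, ('a','b','c'): 1})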
N-gram Counts
Problem: huge vector of n-gram counts (N³ for the RU phoneme set).
Solutions:
▫Choose the most frequent n-grams
▫Choose the top N n-grams discriminatively (LL)
▫Compress the n-gram counts: Singular Value Decomposition (SVD): decompose the document matrix D and use the transformation matrix U to reduce the n-gram vector dimensionality (see the sketch below); PCA-based dimensionality reduction
▫iVector feature selection
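A minimal sketch of the SVD-based compression, assuming the document matrix D holds one column of n-gram counts per training utterance; the dimensions and random data are illustrative only:

    import numpy as np

    rng = np.random.default_rng(0)
    # Document matrix D: one column of n-gram counts per training utterance
    D = rng.poisson(1.0, size=(4096, 1000)).astype(float)

    # Thin SVD: D = U S V^T
    U, S, Vt = np.linalg.svd(D, full_matrices=False)

    # Keep the top-k columns of U as the transformation matrix
    k = 200
    x = rng.poisson(1.0, size=4096).astype(float)  # one utterance's n-gram counts
    x_low = U[:, :k].T @ x                         # k-dimensional representation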
Sub-space multinomial modeling
Every vector of n-gram counts consists of E events (#n-grams).
The log probability of the n-th utterance under the MN distribution is:
$$\log P(X_n) = \sum_{e=1}^{E} c_{ne} \log \phi_{ne}$$
where $\phi_{ne}$ can be defined as:
$$\phi_{ne} = \frac{\exp(m_e + \mathbf{t}_e \mathbf{w}_n)}{\sum_{e'=1}^{E} \exp(m_{e'} + \mathbf{t}_{e'} \mathbf{w}_n)}$$
Here $m_e$ is the origin, $\mathbf{t}_e$ is the e-th row of the sub-space matrix T, $\mathbf{w}_n$ is the utterance iVector, and $c_{ne}$ is the count of event e in utterance n.
Model parameters to be estimated in ML estimation are t and w.
No analytical solution! We use Newton-Raphson updates as a numerical solution (a sketch follows below).
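A minimal sketch of one Newton-Raphson update of w_n for a single utterance under the model above; the ridge term, the toy dimensions, and the random data are assumptions added for numerical stability and demonstration, not part of the original recipe:

    import numpy as np

    def newton_step_w(c, m, T, w, ridge=1e-6):
        """One Newton-Raphson update of the iVector w for count vector c."""
        a = m + T @ w
        a -= a.max()                           # stabilize the softmax
        phi = np.exp(a) / np.exp(a).sum()      # phi_ne for all E events
        C = c.sum()                            # total n-gram count of the utterance
        grad = T.T @ (c - C * phi)             # gradient of log P(X_n) w.r.t. w
        tp = T.T @ phi
        hess = -C * (T.T @ (phi[:, None] * T) - np.outer(tp, tp))
        hess -= ridge * np.eye(len(w))         # keep the Hessian well conditioned
        return w - np.linalg.solve(hess, grad)

    # Toy example: E events, k-dimensional sub-space
    rng = np.random.default_rng(1)
    E, k = 500, 20
    T = 0.01 * rng.standard_normal((E, k))
    m = np.full(E, -np.log(E))                 # uniform log-origin
    c = rng.poisson(0.5, size=E).astype(float)
    w = np.zeros(k)
    for _ in range(5):                         # a few iterations usually suffice
        w = newton_step_w(c, m, T, w)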
Sub-space multinomial modeling
1st solution:
▫Consider all 3-grams to be components of a Bernoulli trial
▫Model the entire vector of 3-gram counts with one multinomial distribution
▫N-gram events are not independent (not consistent with the Bernoulli-trial presumption!)
2nd solution:
▫Cluster 3-grams based on their histories (a grouping sketch follows below)
▫Model each history with a separate MN distribution
▫Data sparsity problem! Cluster 3-grams based on a binary decision tree
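A minimal sketch of the history-based grouping in the 2nd solution, assuming 3-gram counts keyed by phoneme triples; the names are illustrative:

    from collections import defaultdict

    def group_by_history(counts):
        """Split 3-gram counts into one count table per 2-phoneme history."""
        hists = defaultdict(dict)
        for (p1, p2, p3), c in counts.items():
            hists[(p1, p2)][p3] = c   # history = first two phonemes
        return hists

    # Each value of `hists` is then modeled by its own MN distribution
    # over the third phonemes that can follow that history.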
Training of the iVector extractor
Number of iterations: 5-7 (depends on the sub-space dimension)
Sub-space dimension: [Plot: results for the 3-second, 10-second, and 30-second conditions; values not preserved in the transcript]
Classifiers
Configuration: L one-to-all linear classifiers (sketched below)
▫L: number of targeted languages
Classifiers:
▫SVM
▫LR
▫Linear Generative Classifier
▫MLR (to be done!)
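A sketch of the L one-to-all configuration with logistic regression; scikit-learn and the data shapes are assumptions, since the slides do not name a toolkit:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.standard_normal((600, 200))   # iVectors (or PCA-reduced counts)
    y = rng.integers(0, 8, size=600)      # labels for 8 target languages

    # One binary LR per target language, trained language-vs-rest
    models = [LogisticRegression(max_iter=1000).fit(X, (y == lang).astype(int))
              for lang in range(8)]

    # Language-dependent scores for one test utterance
    scores = [mdl.decision_function(X[:1])[0] for mdl in models]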
Results on different classifiers
Task: NIST LRE 2009
[Table: results for PCA-SVM, PCA-LR, PCA-GLC, iVec-SVM, iVec-LR, and iVec-GLC on the Dev-3s, Dev-10s, Dev-30s, Eval-3s, Eval-10s, and Eval-30s conditions; numeric values not preserved in the transcript]
Results of different systems
Task: LRE09
[Table: results for BASE-HU-SVM, PCA-HU-LR, iVect-HU-LR, iVec+PCA-HU-LR, iVec-RU-LR, iVec-LR HU+RU, and iVec-LR HURU on the Dev-3s, Dev-10s, Dev-30s, Evl-3s, Evl-10s, and Evl-30s conditions; numeric values not preserved in the transcript]
N-gram clustering
Remove all 3-grams repeated fewer than 10 times over all training utterances.
Model each history with a separate MN distribution: 1084 histories, up to 33 3-grams each.
[Table: results of the ">10 3-gram" system on the Dev-3s, Dev-10s, Dev-30s, Eval-3s, Eval-10s, and Eval-30s conditions; numeric values not preserved in the transcript]
Merging histories using a BDT
For a 3-gram P_i P_j P_k, merge histories which do not increase the entropy by more than a certain value.
[Diagram: two candidate models (Model 1, Model 2) for histories such as P_i P_22 P_k and P_i P_33 P_k]
E1 = Entropy(Model 1), E2 = Entropy(Model 2), D = E1 - E2
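A minimal sketch of the entropy criterion, assuming each history is represented by its vector of 3-gram counts; the count weighting, the threshold, and all names are assumptions, since the slide only gives D as an entropy difference:

    import numpy as np

    def mn_entropy(counts):
        """Entropy of the ML multinomial fitted to a count vector."""
        p = counts / counts.sum()
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def merge_increase(c1, c2):
        """Entropy increase caused by merging two histories (D on the slide)."""
        n1, n2 = c1.sum(), c2.sum()
        e_split = (n1 * mn_entropy(c1) + n2 * mn_entropy(c2)) / (n1 + n2)
        e_merged = mn_entropy(c1 + c2)
        return e_merged - e_split

    c1 = np.array([5.0, 1.0, 0.0])
    c2 = np.array([4.0, 2.0, 0.0])
    if merge_increase(c1, c2) < 0.05:   # merge only below a chosen threshold
        merged = c1 + c2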
Results on DT history merging
More iterations of training T => the T matrix moves toward the zero matrix!
[Table: results per training iteration for the DT and ">10 3-gram" systems on the Dev-3s, Dev-10s, Dev-30s, Eval-3s, Eval-10s, and Eval-30s conditions; numeric values not preserved in the transcript]
Deeper insight into the iVector extractor
Strange results
3-grams with no repetition throughout the whole training set should not affect system performance!
Remove all the 3-grams with no repetition throughout the whole training set: >35406 (a reduction of 567).
Even worse results if we prune more!
[Table: Dev-30s, Dev-10s, Dev-3s, Eval-30s, Eval-10s, and Eval-3s results; numeric values not preserved in the transcript]
DT clustering of n-gram histories
The overall likelihood is an order of magnitude higher than with the 1st solution.
The change in model likelihood is quite notable in each iteration!
The T matrix is mainly zero after some iterations!