Variational Inference Algorithms for Acoustic Modeling in Speech Recognition

John Steinberg and Dr. Joseph Picone
Department of Electrical and Computer Engineering, College of Engineering, Temple University
www.isip.piconepress.com

Abstract

The focus of this work is to assess the performance of three new variational inference algorithms for the acoustic modeling task in speech recognition: accelerated variational Dirichlet process mixtures (AVDPM), collapsed variational stick breaking (CVSB), and collapsed Dirichlet priors (CDP).

• Speech recognition (SR) performance is highly dependent on the data the system was trained on.
• Dirichlet process mixtures (DPMs) can learn underlying structure from data and can potentially improve an SR system's ability to generalize to new test data.
• Inference algorithms are needed to make calculations tractable for DPMs.

Speech Recognition Systems

Speech Recognition Applications
• Mobile technology
• Auto/GPS
• National intelligence
• Other applications: translators, prostheses, language education, media search

English vs. Mandarin: A Phonetic Comparison

What is a phoneme?

  Word       about
  Syllable   a - bout
  Phoneme    ax - b - aw - t

English
• ~10,000 syllables
• ~42 phonemes
• Non-tonal language

Mandarin
• ~1,300 syllables
• ~92 phonemes
• Tonal language (4 tones, 1 neutral)
• 7 instances of "ma"

The Data & Setup

Why English and Mandarin?
• Phonetically very different
• Can help identify language-specific artifacts that affect performance

Corpora:
• CALLHOME English (CH-E) and CALLHOME Mandarin (CH-M)
• Conversational telephone speech
• ~300,000 (CH-E) and ~250,000 (CH-M) training samples

Basic Setup:
• Compare DPMs to the more commonly used Gaussian mixture model (GMM)
• Find the optimal number of mixtures
• Find error rates
• Compare model complexity

Probabilistic Modeling: DPMs and Variational Inference

An Example
• Training features: number of study hours, age
• Training labels: previous grades
• QUESTION: Given a new set of features, what is the predicted grade?

Dirichlet Processes
• How many classes are there? 1? 2? 3?
• DPMs model distributions of distributions
• Can find the best number of classes automatically!
[1]

Variational Inference
• DPMs require an infinite number of parameters
• Variational inference is used to estimate DPM models

Results

Gaussian Mixture Models

CH-E
  k      Error (%) (Val / Evl)
  4      63.23% / 63.28%
  8      61.00% / 60.62%
  16     64.19% / 63.55%
  32     62.00% / 61.74%
  64     59.41% / 59.69%
  128    58.36% / 58.41%
  192    58.72% / 58.37%

CH-M
  k      Error (%) (Val / Evl)
  4      66.83% / 68.63%
  8      64.97% / 66.32%
  16     67.74% / 68.27%
  32     63.64% / 65.30%
  64     60.71% / 62.65%
  128    61.95% / 63.53%
  192    62.13% / 63.57%

Variational Inference Results

CALLHOME English
  Algorithm   Best Error Rate: CH-E   Avg. k per Phoneme
  GMM         58.41%                  128
  AVDPM       56.65%                  3.45
  CVSB        56.54%                  11.60
  CDP         57.14%                  27.93*

*This experiment has not been fully completed yet, and this number is expected to decrease dramatically.

CALLHOME Mandarin
  Algorithm   Best Error Rate: CH-M   Avg. k per Phoneme
  GMM         62.65%                  64
  AVDPM       62.59%                  2.15
  CVSB        63.08%                  3.86
  CDP         62.89%                  9.45

Conclusions
• On CH-E, AVDPM, CVSB, and CDP yielded slightly lower error rates than GMMs (56.54% to 57.14% vs. 58.41%); on CH-M, error rates were comparable (62.59% to 63.08% vs. 62.65%).
• AVDPM, CVSB, and CDP found far fewer mixtures per phoneme than GMMs.
• The CH-E and CH-M performance gap is most likely due to the number of class labels.
• Results can possibly be improved by reducing the number of classes (i.e., phoneme labels).

References
[1] Picone, J. (2012). HTK Tutorials. Retrieved from http://www.isip.piconepress.com/projects/htk_tutorials/
[2] Kurihara, K., Welling, M., & Teh, Y. W. (2007). Collapsed Variational Dirichlet Process Mixture Models. Twentieth International Joint Conference on Artificial Intelligence.
[3] Kurihara, K., Welling, M., & Vlassis, N. (2006). Accelerated Variational Dirichlet Process Mixtures. NIPS.
[4] Frigyik, B., Kapila, A., & Gupta, M. (2010). Introduction to the Dirichlet Distribution and Related Processes. Seattle, Washington, USA. Retrieved from https://www.ee.washington.edu/techsite/papers/refer/UWEETR-2010-0006.html
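The statement that DPMs require an infinite number of parameters, and the name "stick breaking" in CVSB, can be made concrete with a small sketch. The code below is an illustration only and is not the authors' implementation: it draws mixture weights from a truncated stick-breaking construction, where the concentration parameter `alpha`, the truncation level, and the function name `stick_breaking_weights` are all assumptions chosen for the example. Truncating the stick at a finite number of pieces is one common way to make the infinite model computable.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking construction.

    alpha and truncation are illustrative values, not taken from the source.
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for i in range(truncation):
        # The last component takes whatever is left so the weights sum to 1.
        if i == truncation - 1:
            fraction = 1.0
        else:
            fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


rng = random.Random(0)
weights = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)

# Count components that carry a non-negligible share of the probability mass.
effective = sum(1 for w in weights if w > 0.01)
print(f"sum of weights: {sum(weights):.6f}")
print(f"components with weight > 1%: {effective} of {len(weights)}")
```

With a small `alpha`, most of the mass typically falls on the first few pieces, which gives some intuition for why a DPM can settle on far fewer mixtures than a GMM with a fixed k.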
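The comparison of error rates and model complexity can be worked through with a few lines of arithmetic. The sketch below copies the evaluation error rates and average mixtures per phoneme from the results tables and computes, for each variational algorithm, the relative error reduction over the GMM baseline and the ratio of mixture counts. The dictionary layout and the function name `compare_to_gmm` are assumptions made for illustration; the CDP value for CH-E is the preliminary number flagged in the source.

```python
# (evaluation error in %, average mixtures per phoneme), copied from the results tables.
# The CH-E CDP mixture count (27.93) is marked as preliminary in the source.
results = {
    "CH-E": {"GMM": (58.41, 128), "AVDPM": (56.65, 3.45),
             "CVSB": (56.54, 11.60), "CDP": (57.14, 27.93)},
    "CH-M": {"GMM": (62.65, 64), "AVDPM": (62.59, 2.15),
             "CVSB": (63.08, 3.86), "CDP": (62.89, 9.45)},
}


def compare_to_gmm(corpus_results):
    """Return {algorithm: (relative error reduction in %, GMM k / algorithm k)}."""
    base_error, base_k = corpus_results["GMM"]
    summary = {}
    for name, (error, k) in corpus_results.items():
        if name == "GMM":
            continue
        # Positive values mean the algorithm has a lower error than the GMM.
        relative_reduction = 100.0 * (base_error - error) / base_error
        summary[name] = (relative_reduction, base_k / k)
    return summary


for corpus, table in results.items():
    for name, (reduction, ratio) in compare_to_gmm(table).items():
        print(f"{corpus} {name}: relative error reduction {reduction:+.2f}%, "
              f"{ratio:.1f}x fewer mixtures than GMM")
```

A negative reduction indicates that the algorithm did slightly worse than the GMM on that corpus, which is the case for some CH-M entries.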
