Speaker Adaptation for Vowel Classification

Presentation transcript:

Speaker Adaptation for Vowel Classification
Xiao Li, Electrical Engineering Dept., University of Washington

Outline
- Introduction
- Background on statistical classifiers
- Proposed adaptation strategies
- Experiments and results
- Conclusion

Application
"Vocal Joystick" (VJ)
- Human-computer interaction for people with motor impairments
- Acoustic parameters: energy, pitch, vowel quality, discrete sound
Vowel classification
- Vowels: /ae/ (bat); /aa/ (bought); /uh/ (boot); /iy/ (beat)
- Each vowel controls a motion direction

Features
Formants
- Peaks in the spectrum
- Low dimension (F1, F2, F3, F4 + dynamics)
- Hard to estimate
Mel-frequency cepstral coefficients (MFCC)
- Cosine transform of the log spectrum
- High dimension (26, including deltas)
- Easy to compute
Our choice: MFCCs
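To make the feature choice concrete, here is a minimal sketch of MFCC extraction, assuming the librosa library; the 13-static-plus-13-delta split is one common way to arrive at the 26 dimensions mentioned above, not necessarily the authors' exact front end.

```python
import librosa
import numpy as np

# Load a vowel recording (the file name is hypothetical)
y, sr = librosa.load("vowel.wav", sr=16000)

# 13 static MFCCs per frame: a cosine transform of the log mel spectrum
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# First-order deltas double the dimension to 26, as on the slide
delta = librosa.feature.delta(mfcc)
features = np.vstack([mfcc, delta])  # shape: (26, n_frames)
```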

User-Independent vs. User-Dependent
User-independent models
- NOT optimized for a specific speaker
- Easy to get a large train set
User-dependent models
- Optimized for a specific speaker
- Difficult to get a large train set

Adaptation
What is adaptation?
- Adapting user-independent models to a specific user, using a small set of user-dependent data
Adaptation methodology for vowel classification
- Train speaker-independent vowel models
- Ask a speaker to articulate a few seconds of vowels for each class
- Adapt the classifier on this small amount of speaker-dependent data

Outline
- Introduction
- Background on statistical classifiers
- Proposed adaptation strategies
- Experiments and results
- Conclusion

Gaussian mixture models (GMM)
- Generative models
- Training objective: maximum likelihood via EM; for training samples O_{1:T}, find λ* = argmax_λ Σ_{t=1}^{T} log p(o_t | λ)
Classification
- Compute the likelihood score for each class and choose the class with the highest likelihood
Limitations
- A class model is trained using only the data of that class
- Constraints on the discriminant functions
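As an illustration of this pipeline, here is a minimal sketch assuming scikit-learn: one GMM per vowel class, trained by EM, with classification by the highest summed log-likelihood. The data layout and names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(data_by_class, n_components=16):
    """Fit one maximum-likelihood GMM (via EM) per vowel class."""
    return {label: GaussianMixture(n_components=n_components).fit(X)
            for label, X in data_by_class.items()}

def classify(gmms, frames):
    """Sum per-frame log-likelihoods and pick the highest-scoring class."""
    scores = {label: gmm.score_samples(frames).sum()
              for label, gmm in gmms.items()}
    return max(scores, key=scores.get)
```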

Neural Networks (NN)
Three-layer perceptron
- # input nodes: feature dimension × window size
- # hidden nodes: chosen empirically
- # output nodes: # of classes
Training objective
- Minimum relative entropy between the outputs and the targets y_k
Classification
- Compare the output values
Advantages
- Discriminative training
- Nonlinearity
- Features taken from multiple frames
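A minimal sketch of such a three-layer perceptron, assuming PyTorch; the sizes (26-dim MFCCs, a 7-frame window, 50 hidden nodes, 8 classes) follow the slides, while the activation and optimizer choices are assumptions.

```python
import torch
import torch.nn as nn

n_classes = 8
model = nn.Sequential(
    nn.Linear(26 * 7, 50),     # input nodes = feature dimension x window size
    nn.Sigmoid(),              # nonlinearity (the exact activation is assumed)
    nn.Linear(50, n_classes),  # one output node per vowel class
)

# Cross-entropy against the targets y_k implements the minimum
# relative entropy objective (up to the fixed entropy of the targets).
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(x, y):  # x: (batch, 182) frame windows, y: (batch,) labels
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```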

NN-SVM Hybrid Classifier
Idea: replace the hidden-to-output layer of the NN with linear-kernel SVMs
Training objective
- Maximum margin, which carries a theoretical guarantee on the test-error bound
Classification
- Compare the output values of the binary classifiers
Advantages
- Compared to a pure NN: an optimal solution in the last layer
- Compared to a pure SVM: efficiently handles features from multiple frames; no need to choose a kernel
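Continuing the PyTorch sketch above, the hybrid can be assembled by freezing the input-to-hidden mapping and fitting linear-kernel SVMs on the hidden activations, here with scikit-learn; x_train and y_train are hypothetical training tensors.

```python
import torch
from sklearn.svm import SVC

# Reuse the trained input-to-hidden mapping (Linear + Sigmoid) as a
# fixed feature extractor.
hidden = torch.nn.Sequential(*list(model.children())[:2])

with torch.no_grad():
    H = hidden(x_train).numpy()  # hidden activations as SVM features

# Linear-kernel SVMs (pairwise binary classifiers) replace the
# hidden-to-output layer.
svm_layer = SVC(kernel="linear").fit(H, y_train.numpy())
```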

Outline
- Introduction
- Background on statistical classifiers
- Proposed adaptation strategies
- Experiments and results
- Conclusion

MLLR for GMM Adaptation
Maximum Likelihood Linear Regression
- Apply a linear transformation to the Gaussian means
- The same transformation is shared by all Gaussians in a class's mixture
- The covariance matrices can be adapted in a similar fashion, but this is less effective

MLLR Formulas
- Objective: maximum likelihood for the adaptation samples O_{1:T}: maximize Σ_t log p(o_t | λ̂), with each adapted mean μ̂ = W ξ, where ξ = [1, μᵀ]ᵀ is the extended mean vector
- The first-order derivative with respect to W vanishes at the optimum, giving a linear equation
- The transform W is obtained by solving that linear equation
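The sketch below illustrates the closed-form solve in the simplified case of identity covariances and a single shared transform (with general covariances, W is solved row by row, as in Leggetter and Woodland); all names are illustrative.

```python
import numpy as np

def mllr_transform(obs, gammas, means):
    """
    obs:    (T, d) adaptation frames o_t
    gammas: (T, M) Gaussian occupation probabilities from the E-step
    means:  (M, d) speaker-independent Gaussian means
    Returns W of shape (d, d+1) so that each adapted mean is W @ xi.
    """
    T, d = obs.shape
    M = means.shape[0]
    xi = np.hstack([np.ones((M, 1)), means])  # extended mean vectors
    Z = np.zeros((d, d + 1))
    G = np.zeros((d + 1, d + 1))
    for t in range(T):
        for m in range(M):
            g = gammas[t, m]
            Z += g * np.outer(obs[t], xi[m])
            G += g * np.outer(xi[m], xi[m])
    # The vanished first-order derivative gives Z = W G, a linear equation.
    return Z @ np.linalg.inv(G)
```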

NN Adaptation
- Idea: fix the nonlinear mapping and adapt the last layer (a linear classifier)
- Adaptation objective: minimum relative entropy
- Start from the original weights and update them with gradient descent
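In the PyTorch sketch from earlier, this amounts to freezing the input-to-hidden layers and running gradient descent on the output layer alone, starting from the speaker-independent weights; the learning rate is an assumption.

```python
# Fix the nonlinear mapping: no gradients for the input-to-hidden layers.
for layer in list(model.children())[:2]:
    for p in layer.parameters():
        p.requires_grad = False

# Gradient descent on the last (linear) layer only, from its original weights.
adapt_opt = torch.optim.SGD(model[2].parameters(), lr=0.01)

def adapt_step(x_adapt, y_adapt):  # a small speaker-dependent batch
    adapt_opt.zero_grad()
    loss = loss_fn(model(x_adapt), y_adapt)
    loss.backward()
    adapt_opt.step()
    return loss.item()
```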

NN-SVM Classifier Adaptation
- Idea: again, fix the nonlinear mapping and adapt the last layer
- Adaptation objective: maximum margin
Adaptation procedure
- Keep the support vectors of the training data
- Combine these support vectors with the adaptation data
- Retrain the linear-kernel SVMs of the last layer
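Following the scikit-learn sketch above, the procedure keeps only the support vectors of the trained SVM layer, pools them with the new speaker's data (mapped through the frozen hidden layer), and refits; H_adapt and y_adapt are hypothetical adaptation features and labels.

```python
import numpy as np
from sklearn.svm import SVC

# Keep the training data's support vectors from the hybrid's SVM layer.
sv_idx = svm_layer.support_
H_sv, y_sv = H[sv_idx], y_train.numpy()[sv_idx]

# Combine them with the adaptation data and retrain the linear SVMs.
H_new = np.vstack([H_sv, H_adapt])
y_new = np.concatenate([y_sv, y_adapt])
adapted_svm = SVC(kernel="linear").fit(H_new, y_new)
```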

Outline
- Introduction
- Background on statistical classifiers
- Proposed adaptation strategies
- Experiments and results
- Conclusion

Database
Pure vowel recordings with varying energy and pitch
- Duration: long, short
- Energy: loud, normal, quiet
- Pitch: rising, level, falling
Statistics
- Train set: 10 speakers
- Test set: 5 speakers
- 4, 8, or 9 vowel classes
- 18 utterances (2000 samples) per vowel per speaker

Adaptation and Evaluation Set
6-fold cross-validation for each speaker
- The 18 utterances are divided into 6 subsets
- We adapt on each subset in turn and evaluate on the remaining five
- This yields 6 accuracy scores per vowel; we compute their mean and deviation
- Results are averaged over the 5 test speakers
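A minimal sketch of this protocol for one speaker, assuming numpy; utts, adapt, and evaluate are hypothetical stand-ins for the utterance list and the adaptation and scoring routines described above.

```python
import numpy as np

subsets = np.array_split(np.arange(18), 6)  # 18 utterances into 6 subsets
scores = []
for held in subsets:
    adapt_utts = [utts[i] for i in held]            # adapt on one subset
    rest = np.setdiff1d(np.arange(18), held)
    test_utts = [utts[i] for i in rest]             # evaluate on the other five
    adapted = adapt(model, adapt_utts)
    scores.append(evaluate(adapted, test_utts))

print(np.mean(scores), np.std(scores))  # per-speaker mean and deviation
```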

Speaker-Independent Classifiers

% Accuracy                     4-class       8-class       9-class
GMM (mixture # = 16)           85.13±0.67    55.88±0.64    51.21±0.54
NN (window = 7, hidden = 50)   89.19±0.65    60.05±0.72    53.75±0.61
NN-SVM                         89.89±0.55    --            --

Notes
- The individual scores for different speakers vary a lot
- With NN window = 1, performance is similar to the GMM

Adapted Classifiers (unadapted → adapted accuracy)

% Accuracy                  4-class                    8-class                    9-class
MLLR for GMM                85.13±0.67 → 90.73±0.82    55.88±0.64 → 67.52±1.27    51.21±0.54 → 62.94±1.37
Gradient descent for NN     89.19±0.65 → 91.85±1.30    60.05±0.72 → 74.33±1.41    53.75±0.61 → 71.06±1.62
Maximum margin for NN-SVM   89.89±0.55 → 94.70±0.30    --                         --

Conclusion
- For speaker-independent models, the NN classifier (with multiple-frame input) works well
- For speaker-adapted models, the NN classifier is effective, and the NN-SVM hybrid achieves the best performance so far