Speech Lab, ECE, State University of New York at Binghamton

Dimensionality Reduction of Speech Features Using Nonlinear Principal Components Analysis
Stephen A. Zahorian, Tara Singh*, Hongbing Hu
Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, USA
* Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA, USA

Introduction
• Difficulties in automatic speech recognition
  – Large dimensionality of acoustic feature spaces
  – Significant training load ("curse of dimensionality")
• Linear dimensionality reduction methods
  – Principal Components Analysis (PCA)
  – Linear Discriminant Analysis (LDA)
• Drawback of linear methods
  – Can result in poor data representations: the straight-line fit obtained by linear PCA does not accurately represent the original distribution of the data

NLPCA Approaches
• Nonlinear Principal Components Analysis (NLPCA)
  – A nonlinear transformation is applied to obtain a transformed version of the data for PCA
• Nonlinear transformation
  – Φ(x) ∈ R^M: the transformed feature of the data point x used for machine learning
  – R^M: the M-dimensional feature space
  – Φ(·): a neural network mapping that produces data more suitable for linear transformations
• Two approaches (NLPCA1 and NLPCA2) were used for training the neural network
• NLPCA1
  – The neural network is trained as an identity map: minimize mean square error using targets that are the same as the inputs
  – Training with regularization is often needed to "guide" the network to a better minimum in error
• NLPCA2
  – The neural network is trained as a classifier: the network is trained to maximize discrimination

Figure: Input data → bottleneck neural network → dimensionality-reduced data
Figure (simulation of NLPCA1): Plot of input and output for semi-random 2-D data; the output is data reconstructed by an NLPCA1-trained neural network with 1 hidden node
Figure: An example with 3-D data; input and output plots of 3-D Gaussian data before and after using a neural network with 2 hidden nodes
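The poster specifies the two training criteria but not the exact network configuration or training recipe. The sketch below is a minimal illustration of the bottleneck-network idea in PyTorch, assuming illustrative layer widths, optimizer settings, and epoch count (none of which are taken from the poster); NLPCA1 and NLPCA2 differ only in the loss applied to the network outputs.

```python
# Minimal sketch of NLPCA1 (identity-map training) and NLPCA2 (classifier
# training) with a bottleneck network. Layer sizes, optimizer settings, and
# epoch counts are illustrative assumptions, not the poster's settings.
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    def __init__(self, in_dim=39, bottleneck=10, hidden=50, n_classes=10):
        super().__init__()
        # Encoder: input features -> low-dimensional bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, bottleneck))
        # NLPCA1 head: reconstruct the input (identity map)
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.Tanh(),
            nn.Linear(hidden, in_dim))
        # NLPCA2 head: predict the class (maximize discrimination)
        self.classifier = nn.Linear(bottleneck, n_classes)

    def forward(self, x):
        z = self.encoder(x)                      # reduced features
        return z, self.decoder(z), self.classifier(z)

def train(model, x, y, mode="NLPCA2", epochs=100, weight_decay=1e-4):
    # weight_decay stands in for the regularization mentioned for NLPCA1
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=weight_decay)
    mse, xent = nn.MSELoss(), nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        z, recon, logits = model(x)
        # NLPCA1: targets are the inputs themselves; NLPCA2: targets are class labels
        loss = mse(recon, x) if mode == "NLPCA1" else xent(logits, y)
        loss.backward()
        opt.step()
    return model

# Usage sketch: train on 39-D features, then take the bottleneck activations
# as the dimensionality-reduced features fed to the back-end classifier.
x = torch.randn(1000, 39)             # placeholder for DCTC-DCS features
y = torch.randint(0, 10, (1000,))     # placeholder vowel labels
model = train(BottleneckNet(), x, y, mode="NLPCA2")
reduced = model.encoder(x).detach()   # 10-D reduced features
```

In either case the reduced features are the bottleneck activations; only the training target changes between the two approaches.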
Experimental Evaluation
• Database (NTIMIT)
     Target (vowels):  /ah/, /ee/, /ue/, /ae/, /ur/, /ih/, /eh/, /aw/, /uh/, /oo/
     Training data:    31,300 tokens
     Testing data:     11,625 tokens
     Features:         39 DCTC-DCS
• Transformation methods compared
  – Original features, LDA, PCA, NLPCA1, and NLPCA2
• Classifiers
  – Neural network and MXL (maximum likelihood, Mahalanobis distance based classifier with a Gaussian assumption)

Experiment 1
• The same training data were used to train the transformations and the classifiers
• The number of features varied from 1 to 39
• Variable percentages of the training data (1%, 2%, 5%, 10%, 25%, 50%, and 100%) were used

Experiment 1 Results
Figure: Classification accuracies of neural network (left) and MXL (right) classifiers with various types of features using all available training data
Figure: Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of training data using NLPCA2 (10 features) and original features (10 and 39 features)
Figure: Classification accuracies of original features and NLPCA2-reduced features with 2% (left) and 50% (right) of the training data

For both classifiers, the highest accuracy was obtained with NLPCA2, especially with a small number of features. NLPCA2 performs better than the 10-D original features when 10% or more of the training data is used, and performs similarly to the 39-D original features. Using 50% of the training data, NLPCA2 performs substantially better than the original features, at least for 12 or fewer features.

Experiment 2
• 50% of the training data was used for training the transformations, and a variable percentage (1% to 100%) of the other half of the training data was used for training the classifiers

Experiment 2 Results
Figure: Classification accuracies of neural network (left) and MXL (right) classifiers using 10% of the classifier training data
Figure: Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of classifier training data using 4 features

For both the neural network and MXL classifiers, NLPCA2 clearly performs much better than the other transformations or the original features. NLPCA2 yields the best performance, with about 68% accuracy in both cases. Similar trends were observed for 1, 2, 8, 16, and 32 features.

Conclusions
• The nonlinear technique that minimizes mean square reconstruction error (NLPCA1) can be very effective for representing data that lies in curved subspaces, but it does not appear to offer any advantages over linear dimensionality reduction methods for a speech classification task
• The nonlinear technique based on minimizing classification error (NLPCA2) is quite effective for accurate classification in low-dimensional spaces
• The reduced features appear to be well modeled as Gaussian features with a common covariance matrix
• Nonlinear PCA (NLPCA2) is much more effective than normal PCA for reducing dimensionality; however, with a "good" classification method, neither dimensionality reduction method improves classification accuracy

Acknowledgement
• This work was partially supported by JWFC 900
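For a concrete picture of the MXL back end mentioned under Experimental Evaluation, the sketch below implements a minimum-Mahalanobis-distance classifier under a Gaussian assumption with a pooled (common) covariance matrix, consistent with the common-covariance observation in the conclusions. The pooled-covariance estimate, equal class priors, and the small ridge term are assumptions made for this example, not a documented description of the classifier used in the experiments.

```python
# Sketch of an MXL-style classifier: Gaussian classes with a pooled (common)
# covariance matrix, decisions by minimum Mahalanobis distance to class means.
# Pooled covariance and equal class priors are assumptions for this example.
import numpy as np

class MXLClassifier:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Pooled within-class covariance (common covariance assumption)
        centered = np.vstack([X[y == c] - X[y == c].mean(axis=0)
                              for c in self.classes_])
        cov = centered.T @ centered / (len(X) - len(self.classes_))
        # Small ridge term keeps the inverse numerically stable
        self.cov_inv_ = np.linalg.inv(cov + 1e-6 * np.eye(X.shape[1]))
        return self

    def predict(self, X):
        # Squared Mahalanobis distance to each class mean; pick the smallest
        diffs = X[:, None, :] - self.means_[None, :, :]           # (N, C, D)
        d2 = np.einsum('ncd,de,nce->nc', diffs, self.cov_inv_, diffs)
        return self.classes_[np.argmin(d2, axis=1)]

# Usage sketch on reduced features (placeholder data shown here)
X_train = np.random.randn(500, 10)
y_train = np.random.randint(0, 10, 500)
X_test = np.random.randn(50, 10)
clf = MXLClassifier().fit(X_train, y_train)
pred = clf.predict(X_test)
```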