Time Series Classification – Phoneme Recognition in Reconstructed Phase Space
Sanjay Patil, Intelligent Electronics Systems, Human and Systems Engineering, Center for Advanced Vehicular Systems


Page 0 of 8: Introduction to Time Series Classification – an approach in reconstructed phase space for phoneme recognition

Page 1 of 8: Abstract
- Existing nonlinear classifiers: clustering and similarity-measurement techniques, e.g., NN and SVM.
- Existing time-domain approaches: an underlying pattern learned a priori from a template base.
- Frequency-based techniques: spectral patterns based on first- and second-order characteristics of the system.
- Current work (as described in the paper): modeling of signals directly in the reconstructed phase space.

Page 2 of 8: Motivation (why did I read it?)
- An attempt to model the speech signal with a nonlinear modeling technique.
- Builds on Takens and Sauer: a time series of observations sampled from a single state variable of a system yields a reconstructed space equivalent to the original system, which enables a new signal-classification algorithm.
- Uses slightly different notation than most other researchers.

Page 3 of 8: The Approach
Two methods to tackle the issue:
1. Build global vector reconstructions and differentiate signals in a coefficient space [Kadtke, 1995].
2. Build GMMs of signal-trajectory densities in an RPS and differentiate between signals using Bayesian classifiers [the authors, 2004].

The steps (algorithm):
1. Data analysis – normalize the signals; estimate the time lag and dimension of the RPS.
2. Learn a GMM for each signal class – decide the number of Gaussian mixtures; learn the parameters with the Expectation-Maximization (EM) algorithm.
3. Classification – run the signal under test (SUT) through the steps above, then apply a Bayesian maximum-likelihood classifier.
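The core construction both methods rely on is the time-delay embedding itself: each scalar sample is mapped to a vector of lagged samples. A minimal sketch in Python (the function name and parameters are my own illustration, not taken from the paper):

```python
import numpy as np

def reconstruct_phase_space(x, dim, lag):
    """Embed a 1-D signal x into a dim-dimensional RPS with time lag `lag`.

    Returns an (N - (dim-1)*lag, dim) matrix whose rows are the delay
    vectors [x[n], x[n+lag], ..., x[n+(dim-1)*lag]].
    """
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag : i * lag + n_points] for i in range(dim)])

# Example: embed a short sine wave with dim=3, lag=2
signal = np.sin(np.linspace(0, 4 * np.pi, 50))
rps = reconstruct_phase_space(signal, dim=3, lag=2)
print(rps.shape)  # (46, 3)
```

Each row of `rps` is one point of the reconstructed trajectory; the GMMs in step 2 are fit over exactly such point clouds.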

Page 4 of 8: Algorithm in Detail, and Issues
1. Data analysis
   a. Normalizing the signals: each signal is normalized to zero mean and unit standard deviation.
   b. Estimating the time lag τ: use the first minimum of the automutual information function; the overall time lag τ is the mode of the histogram of the first minima over all signals.
   c. Estimating the dimension d of the RPS: use the global false nearest-neighbor technique; the overall RPS dimension is the mean plus two standard deviations of the distribution of individual-signal RPS dimensions.

Open questions:
1. How do you normalize a signal to zero mean and unit standard deviation?
2. What is the automutual information function?
3. How do you implement the global false nearest-neighbor technique?
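The first two open questions can be sketched directly: normalization is a shift-and-scale, and the automutual information is the mutual information between the signal and a lagged copy of itself, estimated here with histograms. The bin count and all names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def normalize(x):
    """Scale a signal to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def auto_mutual_information(x, lag, bins=16):
    """Histogram (plug-in) estimate of I(x[t]; x[t+lag]) in nats."""
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    nz = p_xy > 0  # avoid log(0) on empty histogram cells
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / np.outer(p_x, p_y)[nz])))

def first_minimum_lag(x, max_lag=50):
    """Return the first lag at which the automutual information has a local minimum."""
    ami = [auto_mutual_information(x, lag) for lag in range(1, max_lag + 1)]
    for i in range(1, len(ami)):
        if ami[i] > ami[i - 1]:
            return i  # ami[i-1], i.e. lag i, was the first local minimum
    return max_lag

# Example: a noisy sinusoid; the first AMI minimum gives a per-signal lag estimate
signal = normalize(np.sin(0.2 * np.arange(1000))
                   + 0.01 * np.random.default_rng(0).standard_normal(1000))
print(first_minimum_lag(signal))
```

Per the paper, the lag actually used is the mode of these per-signal first minima over the whole class.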

Page 5 of 8: Algorithm in Detail, and Issues (continued)
2. Gaussian mixture models
   Insert all the signals for a particular class into the RPS, using the d and τ selected in the previous step, and model the resulting point density with a GMM:

       p(x) = \sum_{m=1}^{M} w_m \, \mathcal{N}(x; \mu_m, \Sigma_m), \qquad \sum_{m=1}^{M} w_m = 1

   where M is the number of mixtures, \mathcal{N}(x; \mu, \Sigma) is the normal distribution with mean \mu and covariance matrix \Sigma, and w_m are the mixture weights. The GMMs are estimated with the Expectation-Maximization (EM) algorithm.

Open questions:
1. How is the EM algorithm implemented?
2. Classification accuracy depends on M, so how is the value of M determined?
3. Is the value of M determined from the underlying distribution of the RPS density?
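To make Question 1 concrete, here is a minimal full-covariance EM sketch in numpy. The initialization scheme, the regularization term, and all names are my own assumptions (the 25-iteration default matches the count the paper reports using); a library such as scikit-learn would be used in practice:

```python
import numpy as np

def log_gauss(X, mu, cov):
    """Row-wise log N(x; mu, cov) for the rows of X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def gmm_em(X, M, n_iter=25, seed=0):
    """Fit an M-component GMM to the rows of X with EM; returns (w, mu, cov)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(M, 1.0 / M)
    mu = X[rng.choice(n, M, replace=False)]            # random-point initialization
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * M)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, m] = P(component m | x_i)
        log_r = np.stack([np.log(w[m]) + log_gauss(X, mu[m], cov[m])
                          for m in range(M)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)      # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, covariances
        Nk = r.sum(axis=0)
        w = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        cov = np.array([((r[:, m, None] * (X - mu[m])).T @ (X - mu[m])) / Nk[m]
                        + 1e-6 * np.eye(d) for m in range(M)])
    return w, mu, cov
```

The small diagonal term added to each covariance guards against singular matrices when a component collapses onto few points.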

Page 6 of 8: Algorithm in Detail, and Issues (continued)
3. Classification
   The maximum-likelihood estimates from the previous step are the mean \mu, covariance matrix \Sigma, and mixture weight w_m of each component. Using a Bayesian maximum-likelihood classifier:
   - Compute the conditional likelihood of the signal under each learned model.
   - Select the model with the highest likelihood.

Open question:
1. How are the conditional likelihoods computed?
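One way the conditional likelihoods can be computed: embed the SUT with the same d and τ, sum the per-point GMM log-densities under each class model, and take the argmax. A sketch under that assumption (all names here are mine, not the paper's):

```python
import numpy as np

def gmm_log_likelihood(X, w, mu, cov):
    """Total log-likelihood of the RPS points X under one class's GMM."""
    M, d = len(w), X.shape[1]
    log_p = np.empty((X.shape[0], M))
    for m in range(M):
        diff = X - mu[m]
        _, logdet = np.linalg.slogdet(cov[m])
        quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov[m]), diff)
        log_p[:, m] = np.log(w[m]) - 0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)
    mx = log_p.max(axis=1, keepdims=True)  # log-sum-exp over components, per point
    return float(np.sum(mx.ravel() + np.log(np.exp(log_p - mx).sum(axis=1))))

def classify(X, models):
    """models: {label: (w, mu, cov)}. Bayesian ML decision: highest likelihood wins."""
    return max(models, key=lambda c: gmm_log_likelihood(X, *models[c]))

# Toy check: two single-component models centered at 0 and at 5
models = {
    'near': (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([np.eye(2)])),
    'far':  (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([np.eye(2)])),
}
X = np.full((10, 2), 0.1)
print(classify(X, models))  # the points sit near 0, so 'near' wins
```

Summing per-point log-densities treats the embedded points as independent draws from the class density, which is the modeling assumption the GMM approach makes.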

Page 7 of 8: Experiment Details and Issues
- TIMIT speech corpus: 417 phonemes for speaker MJDE0; 6 phonemes are spoken only once; 47 classes in total (out of the standard 48 classes).
- Sampling frequency 16 kHz; signal lengths range from 227 to 5,201 samples.
- Phoneme boundaries and class labels were determined by a group of experts.
- 25 iterations of the EM algorithm are used.
- Classification accuracy is around 50% (for 32 mixtures); the authors attribute this to insufficient training data.
- The approach is compared with a time-delay NN using a nonlinear one-step predictor and a minimum-prediction-error classifier.

Open questions:
1. Details of how the testing is done are missing.
2. Why does insufficient training data reduce accuracy as the number of Gaussian mixtures increases?

Page 8 of 8: References
- R. Povinelli, M. Johnson, A. Lindgren, and J. Ye, "Time Series Classification Using Gaussian Mixture Models of Reconstructed Phase Spaces," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 6, June 2004. (the reviewed paper)
- F. Takens, "Detecting Strange Attractors in Turbulence," Proceedings of Dynamical Systems and Turbulence, 1980. (background theory)
- T. Sauer, J. Yorke, and M. Casdagli, "Embedology," Journal of Statistical Physics, vol. 65, 1991. (background theory)
- A. Petry, D. Augusto, and C. Barone, "Speaker Identification Using Nonlinear Dynamical Features," Chaos, Solitons and Fractals, vol. 13, 2002. (speech-related dynamical systems)
- H. Boshoff and M. Grotepass, "The Fractal Dimension of Fricative Speech Sounds," Proceedings of the South African Symposium on Communication and Signal Processing, 1991. (speech-related dynamical systems)
- D. Sciamarella and G. Mindlin, "Topological Structure of Chaotic Flows from Human Speech Chaotic Data," Physical Review Letters, vol. 82, 1999. (speech-related dynamical systems)
- T. Moon, "The Expectation-Maximization Algorithm," IEEE Signal Processing Magazine, 1996. (EM algorithm details)
- Q. Ding, Z. Zhuang, L. Zhu, and Q. Zhang, "Application of the Chaos, Fractal, and Wavelet Theories to the Feature Extraction of Passive Acoustic Signal," Acta Acustica, vol. 24, 1999. (frequency-based analysis of speech dynamical systems)
- J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallet, N. Dahlgren, and V. Zue, "TIMIT Acoustic-Phonetic Continuous Speech Corpus," Linguistic Data Consortium. (speech data set used for the experiments)