Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model Mark Skowronski and John Harris Computational Neuro-Engineering Lab, University of Florida.


Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model Mark Skowronski and John Harris Computational Neuro-Engineering Lab University of Florida

Automatic Speech Recognition Using an Echo State Network Mark Skowronski and John Harris Computational Neuro-Engineering Lab University of Florida

Transformation of a graduate student

Motivation: Man vs. Machine Wall Street Journal/Broadcast News readings, 5000 words. Untrained human listeners vs. the Cambridge HTK LVCSR system.

Overview Why is ASR so poor? Hidden Markov Model (HMM) Echo state network (ESN) ESN applied to speech Conclusions

ASR State of the Art Feature extraction: MFCC vs. HFCC*. Acoustic pattern recognition: HMM. Language models. *Skowronski & Harris, JASA, (3):1774–1780. [Figure: filter bank m1…m6 along the frequency axis, producing the cepstral coefficients]

Hidden Markov Model Premier stochastic model of non-stationary time series used for decision making. Assumptions: 1) Speech is a piecewise-stationary process. 2) Features are independent. 3) State durations are exponentially distributed. 4) State transition probabilities depend on the previous and next state only (first-order Markov).
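The assumptions above can be made concrete with a small forward-algorithm sketch. This is not from the talk: it is a hypothetical simplification that takes per-frame log emission likelihoods as given (the talk's models use GMM emissions) and scores a sequence under a left-to-right model.

```python
import numpy as np

def forward_log_likelihood(A, pi, log_b):
    """Scaled forward algorithm: log P(observations | model).

    A     -- (S, S) state transition matrix, A[i, j] = P(next = j | prev = i)
    pi    -- (S,) initial state probabilities
    log_b -- (T, S) per-frame log emission likelihoods
    """
    alpha = pi * np.exp(log_b[0])                 # frame 0
    log_norm = 0.0
    for t in range(1, log_b.shape[0]):
        alpha = (alpha @ A) * np.exp(log_b[t])    # first-order Markov recursion
        s = alpha.sum()                           # rescale to avoid underflow
        log_norm += np.log(s)
        alpha /= s
    return log_norm + np.log(alpha.sum())

# Toy 2-state left-to-right model with uninformative emissions (0.5 per frame)
A = np.array([[0.9, 0.1],
              [0.0, 1.0]])
pi = np.array([1.0, 0.0])
log_b = np.log(np.full((5, 2), 0.5))
ll = forward_log_likelihood(A, pi, log_b)         # -> 5 * log(0.5)
```

Because the rows of A sum to one and the emissions are uninformative, the score reduces to 5·log(0.5), which makes the recursion easy to check by hand.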

ASR Example –Isolated English digits “zero”–“nine” from TI46: 8 male, 8 female speakers, 26 utterances each, fs = 12.5 kHz. –10 word models, various numbers of states and Gaussians per state. –Features: 13 HFCC, 100 frames/s, Hamming window, pre-emphasis (α = 0.95), CMS, Δ+ΔΔ (±4 frames). –Pre-processing: zero-mean and whitening transform. –M1/F1: testing; M2/F2: validation; M3–M8/F3–F8: training. –Test set corrupted by additive noise from real-world recordings (subway, babble, car, exhibition hall, restaurant, street, airport terminal, train station).
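The zero-mean and whitening pre-processing step can be sketched as follows. The slide does not say which variant was used, so PCA whitening is assumed here; the mixing matrix and seed are arbitrary toy values.

```python
import numpy as np

def fit_whitening(X, eps=1e-8):
    """Fit PCA whitening on training features X of shape (N, d)."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs / np.sqrt(evals + eps)    # scale each principal axis to unit variance
    return mu, W

def whiten(X, mu, W):
    """Apply zero-mean and whitening transform."""
    return (X - mu) @ W

# Toy correlated features standing in for the 39-dim HFCC+deltas
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.0, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.0, 3.0]])
X = rng.normal(size=(500, 3)) @ mix
mu, W = fit_whitening(X)
Z = whiten(X, mu, W)                    # covariance of Z is ~identity
```

The same (mu, W) fitted on the training speakers would then be applied unchanged to validation and test data.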

HMM Test Results

Overcoming the limitations of HMMs HMMs do not take advantage of the dynamics of speech. Well-known HMM limitations include: –Only the present state affects transition probabilities –Successive observations are independent –Static density models Need an architecture that better captures the dynamics of speech.

Echo State Network Recurrent neural network proposed by Jaeger (2001). Recurrent “reservoir” of nonlinear processing elements with random, untrained weights (W, W_in). Linear readout (W_out) with easily trained weights. Note similarities to the Liquid State Machine. [Diagram: input (dim d_x) → W_in → reservoir W → linear mapper W_out → output (dim d_y)]
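A minimal reservoir sketch, using M = 60 PEs and the r = 2.0, r_in = 0.1 values quoted later in the talk. Interpreting r as the spectral radius of W, r_in as the input-weight scale, and tanh as the nonlinearity are assumptions; only the random, untrained nature of W and W_in is stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, M = 39, 60                      # feature dimension and reservoir size

# Random, untrained weights; W rescaled to spectral radius r = 2.0 (assumed)
W_in = 0.1 * rng.uniform(-1, 1, size=(M, d_in))
W = rng.uniform(-1, 1, size=(M, M))
W *= 2.0 / np.max(np.abs(np.linalg.eigvals(W)))

def update(x, u):
    """One reservoir step: x(n+1) = tanh(W x(n) + W_in u(n+1))."""
    return np.tanh(W @ x + W_in @ u)

x = np.zeros(M)
for u in rng.normal(size=(10, d_in)):  # drive the reservoir with 10 random frames
    x = update(x, u)
```

Only the readout that maps x back to features is trained; W and W_in stay fixed after this random initialization.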

ESN Diagram & Equations
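The equations on this slide did not survive extraction. A standard ESN formulation (Jaeger, 2001), consistent with the diagram labels W, W_in, and W_out, is:

```latex
x(n+1) = f\bigl(W\,x(n) + W^{\mathrm{in}}\,u(n+1)\bigr), \qquad
y(n) = W^{\mathrm{out}}\,x(n)
```

where x is the reservoir state, u the input frame, y the linear readout, and f is typically tanh. Only W_out is trained.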

How to classify with predictors Build 10 word models, each trained to predict the future of its own digit. The best predictor determines the class. Not a new idea!
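The classify-by-prediction idea can be sketched as: run each class's one-step predictor over the sequence and pick the class with minimum prediction MSE. This is a hypothetical minimal version; in the talk each predictor is an ESN word model.

```python
import numpy as np

def classify_by_prediction(frames, predictors):
    """Pick the class whose one-step predictor has minimum MSE.

    frames     -- (T, d) feature sequence
    predictors -- dict: label -> callable mapping frame(n) to predicted frame(n+1)
    """
    scores = {}
    for label, predict in predictors.items():
        pred = np.array([predict(f) for f in frames[:-1]])
        scores[label] = np.mean((pred - frames[1:]) ** 2)   # prediction MSE
    return min(scores, key=scores.get)

# Toy example: a constant signal and two trivial predictors
frames = np.ones((5, 3))
predictors = {"zero": lambda f: f * 1.0,   # perfect predictor of a constant signal
              "one":  lambda f: f * 0.5}   # biased predictor
label = classify_by_prediction(frames, predictors)   # -> "zero"
```

This replaces the HMM's likelihood score with a prediction-error score, which is the core of the minimum-MSE classifier in the title.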

ESN Training Minimize mean-squared error between y(n) and desired signal d(n). Wiener solution:
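The Wiener solution referenced here, written in its standard form with the slide's symbols (reservoir state x(n), desired signal d(n)):

```latex
W^{\mathrm{out}} = R^{-1} P, \qquad
R = E\!\left[x(n)\,x(n)^{T}\right], \qquad
P = E\!\left[x(n)\,d(n)^{T}\right]
```

In practice R and P are estimated from training data, often with a small ridge term added to R for numerical stability.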

Multiple Readout Filters Need good predictors for separation of classes, but one linear filter gives mediocre prediction. Question: how to divide the reservoir space and use multiple readout filters? Answer: a competitive network of filters. Question: how to train/test a competitive network of K filters? Answer: mimic the HMM.
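A sketch of the winner-take-all competition among K readout filters: every frame is predicted by all K filters, and each frame is assigned to the filter with minimum squared error, mimicking HMM state occupancy. The shapes and the squared-error assignment rule are assumptions; the talk trains the filters with segmental K-means.

```python
import numpy as np

def wta_predict(x_states, filters):
    """Predict every frame with all K readout filters.

    x_states -- (T, M) reservoir states
    filters  -- (K, d, M) one linear readout filter per state
    Returns (T, K, d) predictions.
    """
    return np.einsum('kdm,tm->tkd', filters, x_states)

def wta_assign(preds, targets):
    """Assign each frame to the filter with minimum squared error."""
    err = ((preds - targets[:, None, :]) ** 2).sum(axis=-1)   # (T, K)
    return err.argmin(axis=1)                                 # winner per frame

# Toy shapes: 8 frames, reservoir size 6, 4-dim targets, K = 3 filters
rng = np.random.default_rng(2)
x_states = rng.normal(size=(8, 6))
filters = rng.normal(size=(3, 4, 6))
targets = rng.normal(size=(8, 4))
winners = wta_assign(wta_predict(x_states, filters), targets)
```

Retraining each filter on only the frames it wins, then reassigning, gives the segmental K-means loop mentioned in the comparison table.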

ASR Example Same spoken-digit experiment as before. ESN: M = 60 PEs, r = 2.0, r_in = 0.1; 10 word models, various numbers of states and filters per state. Identical pre-processing and input features. Desired signal: next frame of 39-dimensional features.

ESN Results

ESN/HMM Comparison

Conclusions ESN classifies by predicting. Multiple filters mimic the sequential nature of HMMs. ESN classifier is noise-robust compared to the HMM: –Averaged over all noise sources, 0–20 dB SNR: +21 percentage points –Averaged over all sources: +9 dB SNR The ESN reservoir provides a dynamical model of the history of the speech. Questions?

HMM vs. ESN Classifier

                     HMM                        ESN Classifier
Output               Likelihood                 MSE
Architecture         States, left-to-right      States, left-to-right
Minimum element      Gaussian kernel            Readout filter
Elements combined    GMM                        Winner-take-all
Transitions          State transition matrix    Binary switching matrix
Training             Segmental K-means          Segmental K-means
                     (Baum-Welch)
Discriminatory       No                         Maybe, depends on desired signal