Isolated word recognition with the Liquid State Machine: a case study


Isolated word recognition with the Liquid State Machine: a case study
D. Verstraeten, B. Schrauwen, D. Stroobandt, J. Van Campenhout
Written by: Gassan Tabajah, Ron Adar, Gil Rapaport

Abstract The Liquid State Machine (LSM) is a recently developed computational model with interesting properties. It can be used for pattern classification, function approximation and other complex tasks. Contrary to most common computational models, the LSM does not require information to be stored in some stable state of the system: The inherent dynamics of the system are used by a memoryless readout function to compute the output.

Abstract In this paper we present a case study of the performance of the Liquid State Machine based on a recurrent spiking neural network by applying it to a well-known and well-studied problem: speech recognition of isolated digits. We evaluate different ways of coding the speech into spike trains. In its optimal configuration, the performance of the LSM approximates that of a state-of-the-art recognition system. Another interesting conclusion is that the most biologically realistic encoding performs far better than more conventional methods.

1. Introduction Many complex computational problems have a strong temporal aspect: not only are the values of the inputs important, but also their specific sequence and precise occurrence in time. Tasks such as speech recognition, object tracking, robot control or biometrics are inherently temporal, as are many of the tasks that are usually viewed as 'requiring intelligence'.

1. Introduction However, most computational models do not explicitly take the temporal aspect of the input into account, or they transform the time-dependent inputs into a static input using, e.g., a tapped delay line. These methods disregard the temporal information contained in the inputs in two ways:
- The time-dependence of the inputs within a certain time window is compressed into a static snapshot and is therefore partially lost.
- The temporal correlation between different windows is not preserved.
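As an illustration of the approach criticized here, a tapped delay line simply stacks delayed copies of the input into one static vector per time step. This is a minimal sketch for illustration only, not the method of any system discussed in the paper:

import numpy as np

def tapped_delay_line(x, n_taps):
    # x: 1-D time series; returns one static vector per position,
    # containing the current sample and the previous n_taps - 1 samples.
    return np.array([x[i:i + n_taps] for i in range(len(x) - n_taps + 1)])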

Liquid State Machine (LSM) The Liquid State Machine (LSM) avoids these problems by construction. The LSM is a computational concept (its structure is depicted in Fig. 1):

Liquid State Machine (LSM)
- A reservoir of recurrently interacting nodes is stimulated by the input u(t).
- A running or liquid state x(t) is extracted.
- A readout function fM converts the high-dimensional liquid state x(t) into the desired output y(t) for the given task.

Liquid State Machine (LSM) The loops and loops-within-loops that occur in the recurrent connections between the nodes in the reservoir cause a short-term memory effect: the influence of inputs fed into the network 'resonates' for a while before dying out. Hence the name liquid: ripples in a pond also show the influence of past inputs. This property is called temporal integration.

Types of liquids The reservoir or liquid can be any type of network that has sufficient internal dynamics. Several types of liquids have been used so far:
- An echo state network, using a recurrent analog neural network.
- A real liquid (water) in a bucket.
- A delayed threshold logic network.
- A Spiking Neural Network (SNN): the type of liquid used in this article.

Readout function The actual identity of the readout function is also not explicitly specified; indeed, it can be any method of statistical analysis or pattern recognition. Possible readout functions include a linear projection, a Fisher discriminant, a perceptron, a feedforward MLP trained with backpropagation, or even a Support Vector Machine. Note that the liquid itself is not trained but chosen a priori: a heuristic is used to construct 'interesting' liquids in a random manner. Only the readout function is adapted so that the LSM performs the required task.

Separation between the liquid and its readout The separation between the liquid and its readout function offers two considerable advantages over traditional Recurrent Neural Networks (RNN). First, the readout function is generally far easier to train than the liquid itself, which is a recurrently connected network. Furthermore, the structure of the LSM permits several different readout functions to be used with the same liquid, which can each be trained to perform different tasks on the same inputs—and the same liquid. This means that the liquid only needs to be computed once, which gives the LSM an inherent parallel processing capability without requiring much additional processing power.

Similarity between LSM and SVM The LSM is a computational model that bears a strong resemblance to another well-known paradigm: that of kernel methods and the Support Vector Machine (SVM). Here, inputs are classified by first using a non-linear kernel function to project them into a very high-dimensional (often even infinite-dimensional) feature space, where the separation between the classes is easier to compute (and the classes can often even be linearly separated). The LSM can be viewed as a variation: here too, the inputs are non-linearly projected into a high-dimensional space (the space of the liquid state) before the readout function computes the output. The main difference is that the LSM, contrary to SVMs, has an inherent temporal nature.

Similarity between LSM and RNN Another similarity can be found in the state-of-the-art RNN training algorithm called Backpropagation Decorrelation presented in [6], which was derived mathematically. It has been shown that training an RNN with this algorithm implicitly keeps a pool of connections almost constant and only trains the connections to the output nodes. This behavior is very similar to the principle of the LSM, specifically when a perceptron is used as the readout function.

Filter approximation In [7] it has been shown that any time-invariant filter with fading memory can be approximated with arbitrary precision by an LSM, under very unrestrictive conditions. In practice, every filter that is relevant from a biological or engineering point of view can be approximated.

Article structure This article is structured as follows: In Section 2 we first detail the setup used to perform the experiments. In Section 3 we briefly introduce and subsequently test three different speech front ends which are used to transform the speech into spike trains. The effects of different types of noise that are commonly found in real world applications are tested in Section 4. In Section 5 we make some relevant comparisons with related isolated word recognition systems. In Section 6 we draw some conclusions about the speech recognition capabilities of this computational model. In Section 7 we point out some possible opportunities for further research.

Experimental Setup
- Matlab LSM toolbox.
- The liquid is a recurrent network (it contains loops) of leaky integrate-and-fire neurons arranged in a 3D column (pile).
- Parameters taken from the original LSM paper.
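For reference, a minimal sketch of a leaky integrate-and-fire update step; the time constant, threshold and reset value below are illustrative assumptions, not the parameter values from the original LSM paper:

import numpy as np

def lif_step(v, i_in, dt=1e-3, tau=30e-3, v_thresh=15e-3, v_reset=0.0):
    # Leaky integration of the input drive, followed by threshold and reset.
    v = v + (dt / tau) * (-v + i_in)
    spike = v >= v_thresh
    v = np.where(spike, v_reset, v)
    return v, spike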

Network structure
- Connections between neurons are allocated stochastically.
- D = Euclidean distance between neurons a and b.
- C = 0.3 (EE), 0.2 (EI), 0.4 (IE), 0.1 (II).
- lambda = 2.
- Effectively: mainly local connectivity, with limited global connectivity.
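In the original LSM paper these parameters enter the connection probability P(a -> b) = C * exp(-(D(a, b) / lambda)^2), with C depending on whether the pre- and postsynaptic neurons are excitatory (E) or inhibitory (I). A minimal sketch of that rule; the column dimensions and the 20% inhibitory fraction below are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Neurons on a 3D grid, e.g. 4 x 4 x 20 (assumed dimensions).
positions = np.array([(x, y, z) for x in range(4) for y in range(4) for z in range(20)], dtype=float)
n = len(positions)
inhibitory = rng.random(n) < 0.2            # assumed fraction of inhibitory neurons

# C keyed by (pre is inhibitory, post is inhibitory): EE, EI, IE, II.
C = {(False, False): 0.3, (False, True): 0.2, (True, False): 0.4, (True, True): 0.1}
lam = 2.0

connections = []
for a in range(n):
    for b in range(n):
        if a == b:
            continue
        d = np.linalg.norm(positions[a] - positions[b])          # Euclidean distance D(a, b)
        p = C[(bool(inhibitory[a]), bool(inhibitory[b]))] * np.exp(-(d / lam) ** 2)
        if rng.random() < p:
            connections.append((a, b))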

Dataset – subset of TI46
- 46-word speaker-dependent isolated word speech database.
- 500 samples: 5 different speakers, digits 'zero' through 'nine', 10 different utterances each.
- 300 samples for training, 200 for testing.

Readout function
- Simple linear projection: y = W x, where y is the output vector, x the liquid state and W the weight matrix.
- Winner-take-all selection: the maximal y value is taken as the result.
- The output is recomputed every 20 ms, and the maximum is taken again.
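A minimal sketch of such a readout, assuming the weights are trained by least squares on liquid states sampled every 20 ms with one-hot digit targets (the slide only specifies a linear projection followed by winner-take-all):

import numpy as np

def train_readout(states, targets):
    # states: (T, N) liquid states sampled every 20 ms; targets: (T, 10) one-hot digit labels.
    W, *_ = np.linalg.lstsq(states, targets, rcond=None)
    return W.T                                # (10, N) weight matrix

def classify(W, states):
    # Winner-take-all: compute y = W x at every sample and pick the class
    # that wins most often over the utterance.
    y = states @ W.T                          # (T, 10) outputs over time
    winners = np.argmax(y, axis=1)            # per-sample winner
    return np.bincount(winners, minlength=10).argmax()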

Performance metric WER (Word Error Rate): the fraction of incorrectly classified words, as a percentage of the total number of words.
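For isolated words this reduces to the misclassification rate in percent:

def word_error_rate(predicted, reference):
    # Both arguments are lists of word labels of equal length.
    errors = sum(p != r for p, r in zip(predicted, reference))
    return 100.0 * errors / len(reference)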

Transforming signals to spikes
- Voice is an analog signal, usually even after preprocessing.
- The SNN liquid needs spike trains.
- BSA: a heuristic algorithm that decides at each time step whether a spike should be emitted.
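A rough sketch of the BSA idea: at each time step a spike is emitted if subtracting a low-pass filter kernel from the signal reduces the reconstruction error by more than a threshold. The filter and threshold below are placeholders, not the values used in the paper:

import numpy as np

def bsa_encode(signal, filt, threshold=0.9):
    signal = np.asarray(signal, dtype=float).copy()
    filt = np.asarray(filt, dtype=float)
    spikes = np.zeros(len(signal), dtype=int)
    L = len(filt)
    for i in range(len(signal) - L):
        window = signal[i:i + L]
        error_with_spike = np.sum(np.abs(window - filt))
        error_without = np.sum(np.abs(window))
        if error_with_spike <= error_without - threshold:
            spikes[i] = 1
            signal[i:i + L] -= filt   # the emitted spike accounts for this part of the signal
    return spikes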

Preprocessing
- Common practice in speech recognition: it enhances speech-specific features.
- Algorithms used: Hopfield-Brody, MFCC, Lyon Passive Ear.

Hopfield-Brody
- FFT the sound and examine 20 specific frequencies.
- Look for onset, offset and peak events.
- Constantly monitor the same 40 events; each one is considered a spike train.
- Treat the event time as a spike, so there is no need for BSA.

Hopfield-Brody (cont'd)
- Not suitable at all: the WER first drops when the network size increases, but then increases again.
- Even the best liquids identified only 1 in 5 words correctly.

Mel Frequency Cepstral Coefficients (MFCC)
- De facto standard for speech preprocessing.
- Steps: Hamming windowing, FFT of the signal, Mel-scale filtering of the magnitude, log10 of the values, cosine transform (reduces dimensionality, enhances speech features).

Hamming windowing

Mel Scale

MFCC (cont'd)
- The result is the so-called 'cepstrum': 13 coefficients per frame of the processed analog signal.
- Computing the first and second time-derivatives gives a total of 39 coefficients.
- BSA turns them into 39 spike trains that are fed into the LSM.
- Performance: identifies 1 in 2 words correctly.
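A hedged sketch of this 39-coefficient front end using librosa (the paper used its own MFCC implementation; librosa and the file name are assumptions here):

import numpy as np
import librosa

y, sr = librosa.load("digit.wav", sr=None)           # hypothetical input utterance
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 cepstral coefficients per frame
delta = librosa.feature.delta(mfcc)                  # first time-derivative
delta2 = librosa.feature.delta(mfcc, order=2)        # second time-derivative
features = np.vstack([mfcc, delta, delta2])          # 39 x frames; BSA then yields 39 spike trains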

Lyon passive ear A model of the human inner ear (cochlea), named after Richard F. Lyon. It describes the way acoustic energy is transformed and converted into neural representations. It is considered a simple model compared to others.

Lyon passive ear (cont'd) The model consists of:
1) A filter bank that closely resembles the selectivity of the human ear to certain frequencies.
2) A series of half-wave rectifiers (HWRs) and adaptive gain controllers (AGCs), both modeling the hair cell response.
3) Each filter output, after detection and AGC, is called a channel.

The Peripheral Auditory System (figure: outer ear, middle ear, inner ear, cochlea, auditory nerve)

Performance Performance results for the Lyon Passive Ear (LPE) front end (WER = word error rate).

Cochleagram The full time sequence of the outputs of the last stage of the model.

The whole process

Input with noise To test the robustness of the recognition performance in noisy environments, noise is added to the input. Three types of noise were used: 1) speech babble, 2) white noise, 3) interior noise. The LSM was compared to the best results from one of the referenced works: the "Log Auditory Model" (LAM) with a noise-robust front end.

Input with noise (cont'd) The LAM is designed for noise robustness and is followed by a Hidden Markov Model (HMM). A table compares the LSM to the LAM for the different kinds of noise at levels of 10, 20 and 30 dB, tested on a single liquid with 1232 neurons, trained on clean speech and tested on noisy speech.
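How the noise was mixed in is not detailed on the slide; the usual way to add a noise recording to clean speech at a target SNR in dB is to scale it as in this sketch (assumes the noise recording is at least as long as the clean signal):

import numpy as np

def add_noise(clean, noise, snr_db):
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that 10 * log10(p_clean / p_noise_scaled) equals snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise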

Comparison to other machines Sphinx4 is a recent speech recognition system (by Sun Microsystems) using HMMs and an MFCC front end. On the TI46 database it achieves an error rate of 0.168%, compared to 0.5% for the best LSM from the experiment. But the LSM also has a couple of advantages over HMMs:
- HMMs tend to be sensitive to noisy inputs.
- They are usually biased towards a certain speech database.
- They do not offer a way to perform additional tasks, such as speaker identification or word separation, on the same input without a dramatic increase in computational cost.

Conclusion The paper applied the SNN interpretation of the LSM to the task of isolated word recognition with a limited vocabulary. Several methods of transforming the sounds into spike trains were explored. The results showed that the LSM is well suited for the task, and that performance is far better when using the biologically inspired model (the Lyon Passive Ear). The LSM also coped well with noise on the input (noise robustness).

Future work Find out what causes the big difference between the LPE model and the traditional MFCC method. Further research is also needed to find a parallel hardware implementation of the LSM that preserves the dynamics of the current model.