Automatic Speech Recognition II: Hidden Markov Models and Neural Networks


Hidden Markov Model
- DTW and VQ recognize a pattern by using distance measurements.
- HMM: a statistical method for characterizing the properties of the frames of a pattern.

Discrete-time Markov Processes  Consider a system with:  N distinct states  A set of probabilities associated with the state. => probabilities to change from one state to another state  Time instants

Discrete-time Markov Processes
- First-order Markov chain: the probability of the current state depends only on the preceding state.
- The set of state-transition probabilities: a_{ij} = P[q_{t+1} = j | q_t = i], 1 <= i, j <= N, with a_{ij} >= 0 and sum_j a_{ij} = 1.

Discrete-time Markov Processes  Ex. Consider a simple three-state Markov model of the weather.  What is the probability that the weather for the next seven consecutive days is “sun-sun-snow-snow-sun-cloudy-sun”. Given the weather for today is “sun” and the weather condition for each day depends on the condition on a previous day.  O=(sun, sun, sun, snow, snow, sun, cloudy, sun)  O=(3, 3, 3, 1, 1, 3, 2, 3) State1: snow State2: cloudy State3: sunny Prob. for initial state

Discrete-time Markov Processes  Ex. Given a single fair coin, i.e., P(Heads)=P(Tails)=0.5  What is the probability that the next 10 tosses will provide the sequence (HHTHTTHTTH)  What is the probability that 5 of the next 10 tosses will be tails?

Coin-Toss Models  You are in a room with a barrier which you cannot see what is happening.  On the other side of the barrier is another person who is performing a coin- tossing experiment (using one or more coins).  The person will not tell you which coin he selects at any time; he will only tell you the result of each coin flip.  How do we build an HMM to explain the observe sequence of heads and tails?  What the states in the model correspond to  How many states should be in the model

Coin-Toss Models  Single coin  Two states: heads or tails  Observable Markov Model => not hidden headstails P(H) 1-P(H) P(H)

Coin-Toss Models  Two coins (Hidden Markov Model)  Two state: coin 1, coin 2  Each state(coin) is characterized by a probability distribution of heads and tails  There are probabilities of state transition (state transition matrix) Coin 1Coin 2 a11 a22 1-a11 1-a22 P(H)=P1 P(T)=1-P1) P(H)=P2 P(T)=1-P2)

Coin-Toss Models  Three coins (Hidden Markov Model)  Three state: coin 1, coin 2, coin 3  Each state(coin) is characterized by a probability distribution of heads and tails  There are probabilities of state transition (state transition matrix) a11 a22 a12 a21 a33 a31 a13 a23 a32 P(H)=P1 P(T)=1-P1) P(H)=P3 P(T)=1-P3) P(H)=P2 P(T)=1-P2)

The Urn-and-Ball Model
- There are N glass urns in the room.
- Each urn contains a large quantity of colored balls in M distinct colors.
- A genie in the room chooses an initial urn. From this urn, a ball is chosen at random and its color is recorded as the observation. The ball is then replaced in the same urn.
- A new urn is then selected according to a random selection process associated with the current urn, and the ball-selection process is repeated.

Elements of an HMM
- The number of states in the model, N: S = {1, 2, ..., N}
- The number of distinct observation symbols per state, M: V = {v_1, v_2, ..., v_M}
- The state-transition probability distribution A = {a_{ij}}, where a_{ij} = P[q_{t+1} = j | q_t = i]
- The observation symbol probability distribution B = {b_j(k)}, where b_j(k) = P[o_t = v_k | q_t = j]
- The initial state distribution π = {π_i}, where π_i = P[q_1 = i]
- The complete parameter set of the model: λ = (A, B, π)
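As a minimal sketch, the parameter set λ = (A, B, π) can be held in a few NumPy arrays; the numbers below are illustrative (they mirror the three-coin example on the following slides), not values prescribed by this slide.
import numpy as np
N, M = 3, 2                       # number of states and of distinct observation symbols
A = np.full((N, N), 1.0 / N)      # a_{ij} = P[q_{t+1} = j | q_t = i] (uniform here)
B = np.array([[0.50, 0.50],       # b_j(k) = P[o_t = v_k | q_t = j]; columns: H, T
              [0.75, 0.25],
              [0.25, 0.75]])
pi = np.full(N, 1.0 / N)          # initial state distribution pi_i = P[q_1 = i]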

HMM Generator of Observations  Given appropriate values of N, M, A, B, and , the HMM can be used as a generator to give an observation sequence O=(o 1 o 2 …o T )  Choose an initial state q 1 =i  Set t=1  Choose o t =v k according to the symbol probability distribution in state i, b j (k)  Transit to the new state q t+1 = j according to the state-transition probability distribution for state i, a ij  Set t=t+1; return to step 3 if t<T, otherwise, terminate the procedure.

HMM Generator of Observations  Ex. Consider an HMM representation of a coin-tossing problem. Assume a three-state model (three coins) with probabilities:  All state transition probabilities = 1/3 State1State2State3 P(H) P(T)

HMM Generator of Observations
1. You observe the sequence O = (H H H H T H T T T T). What state sequence is most likely? What is the probability of the observation sequence and this most likely state sequence?
- Because all state-transition probabilities are equal, the most likely state sequence is the one in which each individual observation has maximum probability. Thus for each H the most likely state is 2, and for each T the most likely state is 3.
- The most likely state sequence is therefore q = (2 2 2 2 3 2 3 3 3 3), and the probability of the observations given this state sequence is P(O | q, λ) = (0.75)^10.

HMM Generator of Observations
2. What is the probability that the observation sequence came entirely from state 1?
- O = (H H H H T H T T T T), q = (1 1 1 1 1 1 1 1 1 1)
- The probability that the first H comes from state 1 is 0.5 · 1/3, the probability that the second H comes from state 1 is 0.5 · 1/3, and so on for every H and T.
- Each toss therefore contributes a factor of 0.5 · 1/3, giving P(O, q | λ) = (0.5 · 1/3)^10.

HMM Generator of Observations  If the state-transition probabilities were:  What is the most likely state sequence for O=(H H H H T H T T T T). a 11 =0.9a 21 =0.45a 31 =0.45 a 12 =0.05a 22 =0.1a 32 =0.45 a 13 =0.05a 23 =0.45a 33 =0.1

The three basic problems for HMM  Problem 1: How do we compute P(O| )  Problem 2: How do we choose the state sequence q=(q 1, q 2,…q T ) that is optimal? (most likely)  Problem 3: How do we adjust the model to maximize P(O| )  Speech recognition sense Training Model Samples of W word vocab W1 Model Wn Model Problem3

The three basic problems for HMM
- Problem 2 (finding the optimal state sequence) is used to study the physical meaning of the model states, e.g., which frames of a word belong to its initial, vowel, and final states.

The three basic problems for HMM
- Problem 1 is used for recognition: given an unknown word, calculate P(O | λ_1), ..., P(O | λ_n), compare the scores, and predict the word whose model gives the highest probability.
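Problem 1 is solved with the forward algorithm, and recognition then picks the word model with the largest score. A minimal sketch, assuming each λ is stored as (A, B, pi) NumPy arrays as in the earlier sketches (the dictionary of word models here is hypothetical):
import numpy as np
def forward_prob(A, B, pi, O):
    """P(O | lambda) via the forward algorithm, alpha_t(j) = P(o_1..o_t, q_t = j | lambda)."""
    alpha = pi * B[:, O[0]]                 # initialization
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]       # induction: alpha_t(j) = [sum_i alpha_{t-1}(i) a_{ij}] b_j(o_t)
    return float(alpha.sum())               # termination: sum over the final states
# Recognition over a hypothetical dictionary of word models {word: (A, B, pi)}:
# best_word = max(models, key=lambda w: forward_prob(*models[w], O))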

Artificial Neural Network
- An artificial neural network (ANN), usually called a "neural network" (NN), is a mathematical or computational model that tries to simulate the structure and/or functional aspects of biological neural networks.

Composition of NN
- Input nodes: together they hold the feature vector of a sample (one node per feature component).
- Hidden nodes: there can be more than one hidden layer.
- Output nodes: produce the output for the corresponding input sample.
- The connections between input, hidden, and output nodes are specified by weight values.

Feedforward operation and classification  A simple three-layer NN x1x1 x2x2 y1y1 y2y2 zkzk bias Output k Hidden j Input i w ji w kj

Feedforward operation and classification  Net activation: the inner product of the inputs with the weights at the hidden unit.  Where i = index of input layer, j =index of hidden layer node  Each hidden unit emits an output that is a nonlinear function of its activation, f(net) that is:  Simple of sign function:

Feedforward operation and classification  Each output unit computes its net activation based on the hidden unit signals as  The output unit computes the nonlinear function of its net:

Back propagation
- Backpropagation is one of the simplest and most general methods for supervised training of multilayer NNs.
- The basic learning approach starts with an untrained network and follows these steps (see the sketch below):
  1. Present a training pattern to the input layer.
  2. Pass the signals through the net and determine the output.
  3. Compare the output with the target values; the difference is the error.
  4. Adjust the weights to reduce the error.
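A sketch of one training step following the steps above; sigmoid units and a squared-error criterion are assumed, since the slide does not fix either choice:
import numpy as np
def backprop_step(x, t, W_ji, w_j0, W_kj, w_k0, eta=0.1):
    """One backpropagation update for a single training pattern x with target t."""
    y = 1.0 / (1.0 + np.exp(-(W_ji @ x + w_j0)))    # steps 1-2: forward pass, hidden outputs
    z = 1.0 / (1.0 + np.exp(-(W_kj @ y + w_k0)))    # steps 1-2: forward pass, network outputs
    delta_k = (t - z) * z * (1 - z)                 # step 3: output error times f'(net_k) for a sigmoid
    delta_j = (W_kj.T @ delta_k) * y * (1 - y)      # propagate the error back to the hidden units
    W_kj += eta * np.outer(delta_k, y)              # step 4: adjust hidden-to-output weights
    w_k0 += eta * delta_k
    W_ji += eta * np.outer(delta_j, x)              # step 4: adjust input-to-hidden weights
    w_j0 += eta * delta_j
    return 0.5 * float(np.sum((t - z) ** 2))        # squared error, for monitoring convergence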

Exercise
- Implement a vowel classifier using a neural network (see the sketch below).
- Use the same speech samples that you used in the VQ exercise.
- What are the important features for classifying vowels?
- Separate your samples into two groups: training and testing.
- Label the class of each training sample.
- Train a multilayer perceptron on the training samples and evaluate it on the testing data.
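One possible shape for this exercise, sketched with scikit-learn's MLPClassifier; the feature files and their format are hypothetical placeholders (reuse whatever feature vectors and labels you prepared for the VQ exercise):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
X = np.load("vowel_features.npy")    # hypothetical file: one feature vector (e.g. MFCCs/formants) per sample
y = np.load("vowel_labels.npy")      # hypothetical file: the vowel class label of each sample
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)            # train the multilayer perceptron on the training group
print("test accuracy:", clf.score(X_test, y_test))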