Cognitive Computer Vision Kingsley Sage and Hilary Buxton Prepared under ECVision Specific Action 8-3

Lecture 12: Learning the parameters for a continuous valued Hidden Markov Model
– Given O, find λ to maximise the likelihood p(λ|O) – Baum Welch (model parameter) learning
– Stochastic sampling

So why are HMMs relevant to Cognitive CV?
– Provides a well-founded methodology for reasoning about temporal events
– One method that you can use as a basis for our model of expectation
– In this lecture we shall see how we can learn the HMM model parameters for a task from training observation data

Reminder: What is a Hidden Markov Model?
– Formally, a Hidden Markov Model λ = (π, A, B)
– The π vector is a 1×N vector that specifies the probability of being in a particular hidden state at time t=0
– A is the State Transition Matrix (an N×N matrix)
– B are the confusion parameters for N Gaussian components (N mean vectors and 1 or N co-variance matrices)
– O is the observation sequence (a 1×|O| vector)
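As a concrete illustration (a minimal sketch in Python; the variable names and values are my own, not from the lecture), the parameters λ = (π, A, B) for the continuous case could be held as NumPy arrays, with B represented by N mean vectors and a single shared covariance matrix as assumed later in this lecture:

```python
import numpy as np

N, k = 3, 2          # N hidden states, k features per observation (illustrative values)
rng = np.random.default_rng(0)

# pi: 1*N initial state probabilities
pi = np.full(N, 1.0 / N)

# A: N*N state transition matrix, each row normalised to sum to 1
A = rng.random((N, N))
A /= A.sum(axis=1, keepdims=True)

# B: Gaussian "confusion" parameters - N mean vectors and one shared covariance
means = rng.random((N, k))
cov = np.eye(k)
```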

Learning for a visual task
– 2D hand trajectory tracking (movie © ICS, FORTH, Crete, GREECE)
– We use a hand tracker to create positional data for functional gestures (a hand going in circles in this case)

Baum Welch learning
– Given O, find λ to maximise the likelihood p(λ|O) – Baum Welch (model parameter) learning
– A type of Expectation Maximisation (EM) learning
– Start with random parameters for λ = (π, A, B)
– Apply an iteration of BW learning to define λ′
– The initial model either defines a critical point of the likelihood function, in which case λ′ = λ, or
– model λ′ is more likely in the sense that P(O|λ′) > P(O|λ), i.e. we have found another model λ′ from which the observation sequence O is more likely to have been produced

Baum Welch learning
– For continuous valued data we are also learning the parameters of the Gaussian components, starting from random values
– Here we assume that there is only one covariance matrix

Getting the notation right (1)
First, we need to set out some more precise mathematical notation and terms…
– p(O|λ): fit of O given the model
– p(λ|O): likelihood function
– O = [o_1, o_2, …, o_T]
– N hidden states (we choose this value ourselves)
– M symbols in the observation sequence
– α = the forwards evaluation trellis (N×T matrix)
– β = the backwards evaluation trellis (N×T matrix)
– k = number of features in the Gaussian components
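To make the α and β trellises concrete, here is a hedged sketch of a forward/backward pass with Gaussian emission densities b_j(o_t) (SciPy's multivariate normal is used for the density; the function names are my own, not from the lecture):

```python
import numpy as np
from scipy.stats import multivariate_normal

def emission_probs(O, means, cov):
    """b[j, t] = N(o_t; mu_j, cov) for each hidden state j and time t."""
    N, T = means.shape[0], O.shape[0]
    b = np.zeros((N, T))
    for j in range(N):
        b[j] = multivariate_normal(mean=means[j], cov=cov).pdf(O)
    return b

def forward_backward(pi, A, b):
    """Return the alpha and beta trellises (both N*T matrices)."""
    N, T = b.shape
    alpha = np.zeros((N, T))
    beta = np.ones((N, T))
    alpha[:, 0] = pi * b[:, 0]
    for t in range(1, T):
        # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij * b_j(o_t)
        alpha[:, t] = (alpha[:, t - 1] @ A) * b[:, t]
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[:, t] = A @ (b[:, t + 1] * beta[:, t + 1])
    # In practice each column is rescaled to avoid underflow (see the scaling remark later)
    return alpha, beta
```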

Getting the notation right (2)
[Slide shows example values for the π vector (1×N) and the state transition matrix A (N×N).]

Re-estimation procedure – just the same as before…
Summary of procedure:
Choose λ = (π, A, B) at random (subject to probability constraints, of course…)
LOOP
  Calculate p(O|λ)
  Use the re-estimation formulae to calculate λ′ = (π′, A′, B′)
  Calculate p(O|λ′)
  IF |p(O|λ) − p(O|λ′)| < ε THEN λ = λ′, Stop ELSE λ = λ′
END LOOP
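The loop might be coded roughly as follows (a sketch only: `reestimate` stands in for the re-estimation formulae described next, `log_likelihood` computes log p(O|λ) for numerical stability, and `epsilon` is the convergence threshold ε):

```python
def baum_welch(O, pi, A, means, cov, reestimate, log_likelihood,
               epsilon=1e-4, max_iter=100):
    """Iterate re-estimation until |p(O|lambda) - p(O|lambda')| < epsilon."""
    ll = log_likelihood(O, pi, A, means, cov)
    for _ in range(max_iter):
        # lambda' = (pi', A', B') from one pass of the re-estimation formulae
        pi, A, means, cov = reestimate(O, pi, A, means, cov)
        ll_new = log_likelihood(O, pi, A, means, cov)
        if abs(ll_new - ll) < epsilon:
            break            # converged: keep lambda = lambda'
        ll = ll_new
    return pi, A, means, cov
```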

Calculating  (i,j) The difference for continuous valued data is how we calculate the term b j (o t+1 )

Putting it all together…
– As we saw in the seminar, we can ignore P(O|λ) as it is a constant, and use combinations of scaling and normalisation when calculating π, A and the Gaussian parameters

Re-estimation formulae (1)
– The mean for Gaussian index m is formed by weighting the observation data according to the count parameters γ
– Normalising constants for the γ terms cancel out
– Easily extends to multiple sequences
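For illustration, a hedged sketch of the mean update using state occupancy weights derived from α and β (the γ here corresponds to the count parameters above; function and variable names are my own):

```python
import numpy as np

def reestimate_means(alpha, beta, O):
    """mu_m = sum_t gamma_t(m) * o_t / sum_t gamma_t(m)."""
    gamma = alpha * beta                           # (N, T) occupancy counts, unnormalised
    gamma /= gamma.sum(axis=0, keepdims=True)      # normalising constants cancel out
    return (gamma @ O) / gamma.sum(axis=1, keepdims=True)   # (N, k) mean vectors
```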

Re-estimation formulae (2)
– The covariance calculation is comparable with (O − μ)². If γ was scaled correctly, the normalising constants cancel out
– ⊙ is the element-wise product
– superscript T is matrix transpose
– N is the number of hidden states
– T is time
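And a corresponding sketch of the covariance update under the single shared covariance assumption made earlier (gamma and means as in the previous sketch; again an illustration, not the lecture's exact formulation):

```python
import numpy as np

def reestimate_shared_cov(gamma, O, means):
    """Sigma = sum_{m,t} gamma_t(m) (o_t - mu_m)(o_t - mu_m)^T / sum_{m,t} gamma_t(m)."""
    N, _ = gamma.shape
    k = O.shape[1]
    cov = np.zeros((k, k))
    for m in range(N):
        d = O - means[m]                       # (T, k) deviations from state m's mean
        cov += (gamma[m, :, None] * d).T @ d   # gamma-weighted outer products
    return cov / gamma.sum()
```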

Stochastic sampling
– We can use our continuous valued model to generate data (just like for the discrete case)
– Let's assume that some observation data is missing (e.g. in a visual tracker where our target has become occluded)
– We assume we are applying the correct motion model to our target and that we have some historical data

Stochastic sampling
Summary of procedure:
– Given a model λ = (π, A, B) with N hidden states and observation data O, we can calculate a forwards evaluation trellis α up until observation data is no longer available (say at time t=u)
– For the distribution α(N, t=u), calculate the values from α(N, t=u−1) based on the state transition matrix A alone (there is no value o_u)
– Stochastically sample α(N, t=u) and select one state q. Set α(n=q, t=u) = 1.0 and all other values α(n ≠ q, t=u) = 0.0. q is the hidden state we have selected to be in
– Generate o_u by sampling the Gaussian for state q, i.e. (μ_q, Σ)
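A sketch of one such sampling step, assuming the α trellis has already been computed up to t = u−1 and using NumPy's random generator (the names are illustrative, not from the lecture):

```python
import numpy as np

def sample_missing_observation(alpha, A, means, cov, rng=None):
    """Predict o_u when it is occluded, given alpha computed up to t = u-1."""
    rng = rng or np.random.default_rng()
    # Propagate alpha(:, u-1) through A alone (there is no o_u to weight by)
    alpha_u = alpha[:, -1] @ A
    alpha_u /= alpha_u.sum()
    # Stochastically sample the distribution and select one hidden state q
    q = rng.choice(len(alpha_u), p=alpha_u)
    one_hot = np.zeros_like(alpha_u)
    one_hot[q] = 1.0                      # alpha(n=q, t=u) = 1.0, all other values 0.0
    # Generate o_u by sampling the Gaussian for state q
    o_u = rng.multivariate_normal(means[q], cov)
    return o_u, one_hot
```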

Stochastic sampling in action
– Green: constant velocity; Blue: constant acceleration; Red: first order HMM; Purple: Variable Length Markov Model
– Black shows observed data from our earlier hand trajectory example; the rest of the circle is occluded
– Extrapolation over many timesteps gives a wide variation in prediction, because the memory is only first order
– The Variable Length Markov Model (VLMM) tracker uses a longer temporal history to build a stiffer model

Summary
– Much of the HMM learning for the continuous case is similar to the discrete case, but we use Gaussian models parameterised by μ and Σ
– The major additional computation is in the re-estimation process, since it involves the Gaussian models
– We can fill in the observation sequence for a model (π, A, B) by using stochastic sampling

Next time … Learning in Bayesian Belief Networks