Cognitive Computer Vision Kingsley Sage and Hilary Buxton Prepared under ECVision Specific Action 8-3


1 Cognitive Computer Vision Kingsley Sage khs20@sussex.ac.uk and Hilary Buxton hilaryb@sussex.ac.uk Prepared under ECVision Specific Action 8-3 http://www.ecvision.org

2 Lecture 12 Learning the parameters for a continuous valued Hidden Markov Model – Given O, find λ to maximise the likelihood p(λ|O) – Baum Welch (model parameter) learning – Stochastic sampling

3 So why are HMMs relevant to Cognitive CV? They provide a well-founded methodology for reasoning about temporal events, and one method that you can use as a basis for our model of expectation In this lecture we shall see how we can learn the HMM model parameters for a task from training observation data

4 Reminder: What is a Hidden Markov Model? Formally, a Hidden Markov Model is λ = (π, A, B) The π vector is a 1*N vector that specifies the probability of being in a particular hidden state at time t=0 A is the state transition matrix (an N*N matrix) B is the set of confusion parameters for N Gaussian components (N mean vectors and 1 or N covariance matrices) O is the observation sequence (a 1*|O| vector)
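
To make the model concrete, here is a minimal sketch of a container for λ = (π, A, B) with continuous Gaussian emissions. The names (ContinuousHMM, random_model, n_states, n_features) are illustrative rather than from the lecture, and the single shared covariance matrix follows the assumption made later in the lecture.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ContinuousHMM:
    pi: np.ndarray      # (N,)    initial hidden-state probabilities at t=0
    A: np.ndarray       # (N, N)  state transition matrix
    means: np.ndarray   # (N, k)  one mean vector per Gaussian component
    cov: np.ndarray     # (k, k)  single shared covariance matrix

def random_model(n_states, n_features, rng=np.random):
    """Random starting parameters, respecting the probability constraints."""
    return ContinuousHMM(
        pi=rng.dirichlet(np.ones(n_states)),
        A=rng.dirichlet(np.ones(n_states), size=n_states),  # each row sums to 1
        means=rng.normal(size=(n_states, n_features)),
        cov=np.eye(n_features),
    )
```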

5 Learning for a visual task 2D hand trajectory tracking (movie © ICS, FORTH, Crete GREECE) We use a hand tracker to create positional data for functional gestures (hand going in circles in this case)

6 Baum Welch learning Given O, find λ to maximise the likelihood p(λ|O) – Baum Welch (model parameter) learning is a type of Expectation Maximisation (EM) learning Start with random parameters for λ = (π, A, B) – Apply an iteration of BW learning to define λ' – The initial model λ either defines a critical point of the likelihood function, in which case λ' = λ, or – Model λ' is more likely in the sense that P(O|λ') > P(O|λ), i.e. we have found another model λ' from which the observation sequence O is more likely to be produced

7 Baum Welch learning For continuous valued data we are also learning the parameters of the Gaussian components, starting from random values Here we assume that there is only one (shared) covariance matrix

8 Getting the notation right (1) First, we need to set out some more precise mathematical notation and terms … – p(O|λ): fit of O given the model – p(λ|O): likelihood function – O = [o_1, o_2, …, o_T] – N hidden states (we choose this value ourselves) – M symbols in the observation sequence – α = the forwards evaluation trellis (N*T matrix) – β = the backwards evaluation trellis (N*T matrix) – k = number of features in the Gaussian components
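
As a sketch of how the α and β trellises can be filled in, assuming the emission densities b_j(o_t) have already been evaluated into an N*T matrix (the variable name dens is my own, not from the slides). In practice each column would also be scaled to avoid numerical underflow, which is the scaling the lecture refers to later.

```python
import numpy as np

def forward_trellis(pi, A, dens):
    """alpha[j, t] = p(o_1..o_t, state j at time t); dens[j, t] = b_j(o_t)."""
    N, T = dens.shape
    alpha = np.zeros((N, T))
    alpha[:, 0] = pi * dens[:, 0]
    for t in range(1, T):
        # sum over previous states i of alpha[i, t-1] * A[i, j], times b_j(o_t)
        alpha[:, t] = (A.T @ alpha[:, t - 1]) * dens[:, t]
    return alpha

def backward_trellis(A, dens):
    """beta[j, t] = p(o_{t+1}..o_T | state j at time t)."""
    N, T = dens.shape
    beta = np.ones((N, T))
    for t in range(T - 2, -1, -1):
        beta[:, t] = A @ (dens[:, t + 1] * beta[:, t + 1])
    return beta
```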

9 Getting the notation right (2) The slide shows worked examples of the π vector (initial state probabilities) and the A matrix (state transitions) as figures.

10 Re-estimation procedure – just the same as before … Summary of procedure: Choose λ = (π, A, B) at random (subject to probability constraints, of course …) LOOP Calculate p(O|λ) Use re-estimation formulae to calculate λ' = (π', A', B') Calculate p(O|λ') IF |p(O|λ) - p(O|λ')| < ε THEN λ = λ', Stop ELSE λ = λ' END LOOP
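
A sketch of this outer loop in Python. The reestimate and likelihood callables stand in for the re-estimation formulae and the forward evaluation of p(O|λ); their names are placeholders of mine, not part of the lecture material.

```python
def baum_welch(model, obs, reestimate, likelihood, eps=1e-6, max_iter=200):
    """Outer loop from the slide: re-estimate until p(O|lambda) changes by less than eps."""
    for _ in range(max_iter):
        p_old = likelihood(model, obs)          # Calculate p(O|lambda)
        new_model = reestimate(model, obs)      # lambda' = (pi', A', B') via the re-estimation formulae
        p_new = likelihood(new_model, obs)      # Calculate p(O|lambda')
        model = new_model                       # lambda = lambda'
        if abs(p_new - p_old) < eps:            # critical point reached: stop
            break
    return model
```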

11 Calculating ξ(i,j) The difference for continuous valued data is how we calculate the term b_j(o_t+1)
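
For illustration, a sketch of the continuous-case emission term and the ξ counts, using SciPy's multivariate normal density and the single shared covariance matrix assumed in the lecture. The function names are my own; normalising each ξ_t slice is equivalent to dividing by P(O|λ) when α and β are unscaled.

```python
import numpy as np
from scipy.stats import multivariate_normal

def emission_densities(means, cov, obs):
    """Matrix of b_j(o_t): entry [j, t] is the Gaussian density for state j at observation o_t."""
    return np.array([multivariate_normal.pdf(obs, mean=mu, cov=cov) for mu in means])

def xi_counts(alpha, beta, A, dens):
    """xi[t, i, j] proportional to alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)."""
    N, T = dens.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[:, t, None] * A * (dens[:, t + 1] * beta[:, t + 1])[None, :]
        xi[t] /= xi[t].sum()    # normalising the slice plays the role of dividing by P(O|lambda)
    return xi
```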

12 Putting it all together … As we saw in the seminar, we can ignore P(O|λ) as it is a constant, and use combinations of scaling and normalisation when calculating π, A and the Gaussian parameters

13 Re-estimation formulae (1) The mean for Gaussian index m is formed by weighting the observation data according to the count parameters γ Normalising constants for the γ terms cancel out Easily extends to multiple sequences
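
A sketch of the mean update under these assumptions: the state-occupancy counts γ_t(j) are formed from α and β, and each mean is the γ-weighted average of the observations. The helper names are illustrative, not from the slides.

```python
import numpy as np

def gamma_counts(alpha, beta):
    """State-occupancy counts gamma[j, t]; the normalising constants cancel in the ratio."""
    g = alpha * beta                          # (N, T)
    return g / g.sum(axis=0, keepdims=True)

def reestimate_means(alpha, beta, obs):
    """obs: (T, k) observations; returns (N, k) re-estimated mean vectors."""
    g = gamma_counts(alpha, beta)             # (N, T)
    return (g @ obs) / g.sum(axis=1, keepdims=True)
```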

14 Re-estimation formulae (2) The covariance calculation is comparable with (O - μ)² If the γ counts were scaled correctly, normalising constants cancel out – ⊙ is the element-wise product – superscript T denotes matrix transpose – N is the number of hidden states – T is time
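
A corresponding sketch of the shared-covariance update: outer products of the centred observations (O - μ_j), weighted by the same γ counts as the mean update. Again, the function name is a placeholder of mine.

```python
import numpy as np

def reestimate_cov(gamma, obs, means):
    """gamma: (N, T) counts, obs: (T, k), means: (N, k); returns one shared (k, k) covariance."""
    N, T = gamma.shape
    k = obs.shape[1]
    cov = np.zeros((k, k))
    for j in range(N):
        centred = obs - means[j]                          # (T, k), i.e. O - mu_j
        cov += (gamma[j][:, None] * centred).T @ centred  # gamma-weighted outer products
    return cov / gamma.sum()                              # total weight sums to T
```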

15 Stochastic sampling Can use our continuous valued model to generate data (just like for the discrete case) Let’s assume that some observation data is missing (e.g. in a visual tracker where our target has become occluded) We assume we are applying the correct motion model to our target and that we have some historical data

16 Stochastic sampling Summary of procedure: Given a model λ = (π, A, B) with N hidden states and observation data O, we can calculate a forwards evaluation trellis α up until observation data is no longer available (say at time t=u) For the distribution α(N, t=u), calculate the values based on the state transition matrix A alone from α(N, t=u-1) (there is no value o_u) Stochastically sample α(N, t=u) and select one state q. Set α(n=q, t=u)=1.0 and all other values α(n ≤ N: n ≠ q, t=u)=0.0. q is the hidden state we have selected to be in Generate o_u by sampling the Gaussian for state q with parameters μ_q, Σ
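
A sketch of one step of this procedure, assuming the model container from the earlier sketch (fields pi, A, means, cov) and a normalised α column at t = u-1 (alpha_prev). The function name is my own.

```python
import numpy as np

def sample_missing_observation(model, alpha_prev, rng=np.random):
    """One step at time u, where the observation o_u is unavailable (e.g. occlusion)."""
    # Predict the hidden-state distribution at t=u from the transition matrix A alone
    predicted = model.A.T @ alpha_prev
    predicted = predicted / predicted.sum()
    # Stochastically sample one hidden state q and collapse the trellis column onto it
    q = rng.choice(len(predicted), p=predicted)
    alpha_u = np.zeros_like(predicted)
    alpha_u[q] = 1.0
    # Generate o_u by sampling the Gaussian component for state q
    o_u = rng.multivariate_normal(model.means[q], model.cov)
    return alpha_u, o_u
```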

17 Stochastic sampling in action Green: constant velocity Blue: constant acceleration Red: First order HMM Purple: Variable Length MM Black shows observed data from our earlier hand trajectory example The rest of the circle is occluded Extrapolation over many timesteps gives a wide variation in prediction. This is because the memory is only first order The Variable Length Markov Model (VLMM) tracker uses a longer temporal history to build a stiffer model

18 Summary Much of the HMM learning for the continuous case is similar to the discrete case, but we use Gaussian models parameterised by μ and Σ The major additional computation is in the re-estimation process, since it involves the Gaussian models We can fill in the observation sequence for a model λ = (π, A, B) by using stochastic sampling

19 Next time … Learning in Bayesian Belief Networks

