Cognitive Computer Vision Kingsley Sage and Hilary Buxton Prepared under ECVision Specific Action 8-3


Cognitive Computer Vision
Kingsley Sage (khs20@sussex.ac.uk) and Hilary Buxton (hilaryb@sussex.ac.uk)
Prepared under ECVision Specific Action 8-3, http://www.ecvision.org

Lecture 13: Learning Bayesian Belief Networks
Taxonomy of methods
Learning BBNs for the case of fully observable data and known structure

So why are BBNs relevant to cognitive CV? They provide a well-founded methodology for reasoning under uncertainty. These methods are the basis for our model of perception guided by expectation. We can develop well-founded methods of learning rather than being stuck with hand-coded models.

Reminder: What is a BBN?

Nodes A, B, O, C and N, where A and B are the parents of O, and O is the parent of both N and C.

p(a=detect) = 0.2
p(b=detect) = 0.1

a | b | p(o=T | A,B)
T | T | 0.95
T | F | 0.6
F | T | 0.5
F | F | 0.01

o | p(n=T | O)
T | 0.7
F | 0.2

o | p(c=T | O)
T | 0.7
F | 0.1

A BBN is a compact representation of the joint probability. Each variable is represented as a node, and conditional independence assumptions are encoded using a set of arcs. Different types of graph exist; the one shown is a directed acyclic graph (DAG).
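To make "compact representation of the joint probability" concrete, here is a minimal sketch that multiplies the tables above to get the probability of one full assignment, p(a, b, o, n, c) = p(a) p(b) p(o|a,b) p(n|o) p(c|o). It assumes that "detect" corresponds to the value T; the function and variable names are illustrative only.

```python
from itertools import product

# Root-node priors (assumption: "detect" is treated as the value True).
P_A = 0.2  # p(a=T)
P_B = 0.1  # p(b=T)

# p(o=T | a, b), keyed by the parent values (a, b).
P_O_GIVEN_AB = {
    (True, True): 0.95,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.01,
}

# p(n=T | o) and p(c=T | o), keyed by the value of o.
P_N_GIVEN_O = {True: 0.7, False: 0.2}
P_C_GIVEN_O = {True: 0.7, False: 0.1}


def bernoulli(p_true: float, value: bool) -> float:
    """Probability of `value` for a binary variable with p(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(a: bool, b: bool, o: bool, n: bool, c: bool) -> float:
    """Factorised joint: p(a) p(b) p(o|a,b) p(n|o) p(c|o)."""
    return (
        bernoulli(P_A, a)
        * bernoulli(P_B, b)
        * bernoulli(P_O_GIVEN_AB[(a, b)], o)
        * bernoulli(P_N_GIVEN_O[o], n)
        * bernoulli(P_C_GIVEN_O[o], c)
    )


# The factorised joint still sums to 1 over all 2**5 assignments.
total = sum(joint(*values) for values in product([True, False], repeat=5))

print(joint(True, False, True, True, False))  # 0.2 * 0.9 * 0.6 * 0.7 * 0.3
print(total)
```

The five small tables hold 1 + 1 + 4 + 2 + 2 = 10 numbers, whereas an unconstrained joint table over five binary variables needs 2^5 - 1 = 31; that saving is what "compact" refers to.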

Why is learning important in the context of BBNs? Knowledge acquisition can be an expensive process. Experts may not be readily available (scarce knowledge) or may simply not exist. But you might have a lot of data from (say) case studies. Learning allows us to construct BBN models from the data and, in the process, gain insight into the nature of the problem domain.

The process of learning: the model structure (if known) and the data (which may be full or partial) are the inputs to the learning process.

What do we mean by “partial” data? Training data where there are missing values (marked ?), e.g. for a discrete-valued BBN with 3 nodes A, B and O:

a | b | o
T | T | F
F | T | F
F | ? | T
T | F | ?
…
F | T | F

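What do we mean by “known” and “unknown” structure? (Figure: "Known structure" shows nodes A, O and B with the arcs between them given; "Unknown structure" shows only the nodes A, O and B, so the arcs must be learned.)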

Taxonomy of learning methods

Observability | Known model structure | Unknown model structure
Full | Maximum likelihood estimation | Search through model space
Partial | Expectation Maximisation (EM) or gradient descent | EM + search through model space (structural EM)

In this lecture we will look at the full observability and known model structure case in detail. In the next lecture we will take an overview of the other three cases.

Full observability & known structure: getting the notation right
The model parameters (the CPDs) are represented as $\theta$ (example later). The training data set is D. We want to find the parameters that maximise $P(\theta \mid D)$. The likelihood function $L(\theta : D)$ is $P(D \mid \theta)$.
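The link between the quantity we want to maximise, $P(\theta \mid D)$, and the likelihood is Bayes' rule. This is a standard identity, added here as the missing reasoning step; the denominator $P(D)$ does not depend on $\theta$, so it can be ignored when maximising:

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)} \;\propto\; L(\theta : D)\, P(\theta)
```

So when the prior $P(\theta)$ is uniform, maximising $P(\theta \mid D)$ and maximising $L(\theta : D)$ give the same answer.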

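Full observability & known structure: getting the notation right
For the three-node network with nodes A, O and B, the training data D consists of M fully observed cases, $D = \{(a[m], b[m], o[m])\}_{m=1}^{M}$.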

Factorising the likelihood expression for the network with nodes A, O and B
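A sketch of the factorisation, assuming the M training cases are independent and identically distributed and that the arcs are A → O ← B (as in the earlier BBN, where O has parents A and B). The per-node parameter names $\theta_A$, $\theta_B$ and $\theta_{O \mid A,B}$ are notation introduced here for illustration:

```latex
\begin{aligned}
L(\theta : D) &= \prod_{m=1}^{M} P\big(a[m], b[m], o[m] \mid \theta\big) \\
&= \prod_{m=1}^{M} P\big(a[m] \mid \theta_A\big)\, P\big(b[m] \mid \theta_B\big)\, P\big(o[m] \mid a[m], b[m], \theta_{O \mid A,B}\big) \\
&= \Big[\prod_{m=1}^{M} P\big(a[m] \mid \theta_A\big)\Big]
   \Big[\prod_{m=1}^{M} P\big(b[m] \mid \theta_B\big)\Big]
   \Big[\prod_{m=1}^{M} P\big(o[m] \mid a[m], b[m], \theta_{O \mid A,B}\big)\Big] \\
&= L_A(\theta_A : D)\; L_B(\theta_B : D)\; L_O(\theta_{O \mid A,B} : D)
\end{aligned}
```

Each bracketed factor depends on the parameters of only one node, so each factor can be maximised on its own.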

Decomposition in general: the likelihood decomposes into one term per node, so the parameters for each node can be estimated separately.
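The decomposition means each CPD can be fitted from counts over just that node and its parents. A minimal sketch for A → O ← B with fully observed binary data; the eight training cases are hypothetical, made up purely for illustration, and the function names are not from the lecture:

```python
from collections import Counter
from itertools import product

# Hypothetical, fully observed training data: one (a, b, o) tuple per case.
DATA = [
    (True, True, True),
    (True, False, True),
    (False, True, False),
    (False, False, False),
    (True, True, True),
    (False, True, True),
    (True, False, False),
    (False, False, False),
]


def mle_root(values):
    """ML estimate of p(x=T) for a root node: #[x=T] / M."""
    return sum(values) / len(values)


def mle_child(data):
    """ML estimate of p(o=T | a, b) as #[a, b, o=T] / #[a, b].

    Parent configurations never seen in the data get None, because the
    ML ratio would divide by a zero count.
    """
    parent_counts = Counter((a, b) for a, b, _ in data)
    true_counts = Counter((a, b) for a, b, o in data if o)
    estimates = {}
    for parents in product((True, False), repeat=2):
        n_parents = parent_counts[parents]
        estimates[parents] = true_counts[parents] / n_parents if n_parents else None
    return estimates


# Each CPD is estimated separately, using only its own columns of the data.
theta_a = mle_root([a for a, _, _ in DATA])
theta_b = mle_root([b for _, b, _ in DATA])
theta_o = mle_child(DATA)

print(theta_a, theta_b)
print(theta_o)
```

With this toy data the estimate for p(o=T | a=F, b=F) is 0.0: with little data, ML can assign zero probability to an outcome simply because it was never observed, which is one motivation for the priors discussed below.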

Example: estimating the parameter for a root node
Let's say our training data D contains these values for A: {T, F, T, T, F, T, T, T}. We represent our single parameter $\theta$ as the probability that a=T. The likelihood for the sequence is:
$L(\theta : D) = \theta (1-\theta) \theta \theta (1-\theta) \theta \theta \theta = \theta^{6} (1-\theta)^{2}$
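With six T values and two F values, the likelihood is $\theta^{6}(1-\theta)^{2}$. Setting the derivative of its logarithm to zero gives the maximising value; this standard step is added here to show where the estimate of 0.75 comes from:

```latex
\begin{aligned}
\log L(\theta : D) &= 6 \log \theta + 2 \log (1-\theta) \\
\frac{d}{d\theta} \log L(\theta : D) &= \frac{6}{\theta} - \frac{2}{1-\theta} = 0
\quad\Longrightarrow\quad 6(1-\theta) = 2\theta
\quad\Longrightarrow\quad \hat{\theta} = \frac{6}{6+2} = 0.75
\end{aligned}
```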

So what about the prior on  ? We have an expression for P(a[1],…,a[M]), all we need to do now is to say something about P(  ) If all values of  were equally likely at the outset, then we have a MAXIMUM LIKELIHOOD ESTIMATE (MLE) for P(  |a[1],…,a[M]) which for our example is  = 0.75 I.e. p(a=T is 0.75)

So what about the prior on  ? If P(  ) is not uniform, we need to take that into account when computing our estimate for a model parameter. In that case P(  |x[1],…,x[M]) would be a MAXIMUM APOSTERIORI PROBABILITY (MAP) estimate There are many different forms of prior, one of the more common ones in this application is the DIRICHLET prior …

The Dirichlet prior  p(  ) Dirichlet(  T,  F )

Semantic priors: If the training data D is sorted into known classes, the priors can be estimated beforehand. These are called “semantic priors”. This involves an element of hand coding and loses the advantage of gaining some insight into the problem domain, but it does give the advantage of mapping onto expert knowledge of the classes in the problem.

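Summary: Estimation relies on sufficient statistics. For the ML estimate for discrete-valued nodes, we use counts #: $\hat{\theta}_{x \mid u} = \#[x, u] / \#[u]$, where u is a configuration of the node's parents. For the MAP estimate, we also have to account for the prior.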

Next time: an overview of methods for learning BBNs:
– Full data and unknown structure
– Partial data and known structure
– Partial data and unknown structure
There is an excellent tutorial by Koller and Friedman at www.cs.huji.ac.il/~nir/Nips01-Tutorial/. Some of today's slides were adapted from that tutorial.

