Cognitive Computer Vision Kingsley Sage and Hilary Buxton Prepared under ECVision Specific Action 8-3


1 Cognitive Computer Vision
Kingsley Sage (khs20@sussex.ac.uk) and Hilary Buxton (hilaryb@sussex.ac.uk)
Prepared under ECVision Specific Action 8-3
http://www.ecvision.org

2 Lecture 14: Learning Bayesian Belief Networks
Overview of learning methods for:
– Full data and unknown structure
– Partial data and known structure
– Partial data and unknown structure

3 So why are BBNs relevant to Cognitive CV?
– They provide a well-founded methodology for reasoning with uncertainty
– These methods are the basis for our model of perception guided by expectation
– We can develop well-founded methods of learning rather than just being stuck with hand-coded models

4 Why is learning important in the context of BBNs?
– Knowledge acquisition can be an expensive process
– Experts may not be readily available (scarce knowledge) or may simply not exist
– But you might have a lot of data from (say) case studies
– Learning allows us to construct BBN models from the data and, in the process, gain insight into the nature of the problem domain

5 Taxonomy of learning methods

Observability | Known structure                                    | Unknown structure
Full          | Maximum likelihood estimation                      | Search through model space
Partial       | Expectation Maximisation (EM) or gradient descent  | EM + search through model space (structural EM)

In the last lecture we saw the full observability and known model structure case in detail. In this lecture we will take an overview of the other three cases.

6 Fully observable data and unknown structure
e.g. 3 discrete nodes A, B and O:

a  b  o
T  T  F
F  T  F
F  F  T
T  F  T
...
F  T  F

We need to establish a structure and the CPTs.

7 Fully observable data and unknown structure
Any particular connection configuration over the nodes A, O and B gives rise to a graph g. The set of all possible graphs is G. We need to find a particular graph that fits the data best. We saw in the previous lecture how we can learn the CPT parameters (in the discrete case) for any particular graph.
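
As a brief recap of that previous lecture, here is a minimal sketch of maximum likelihood CPT estimation by counting. It assumes an illustrative structure A → O ← B (the slides do not fix the arrows) and uses the five samples from the table on slide 6:

    from collections import Counter

    # Fully observed samples (a, b, o) from the table on slide 6.
    data = [
        (True, True, False),
        (False, True, False),
        (False, False, True),
        (True, False, True),
        (False, True, False),
    ]

    # Maximum likelihood CPT for P(O = T | A, B): count co-occurrences
    # of each parent configuration and normalise by how often it occurs.
    joint = Counter((a, b, o) for (a, b, o) in data)
    parent_counts = Counter((a, b) for (a, b, _) in data)

    cpt_o = {ab: joint[ab + (True,)] / n for ab, n in parent_counts.items()}
    print(cpt_o)   # relative frequencies, one entry per observed (a, b)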

8 Fully observable data and unknown structure
To find the best solution (the one that maximises P(G|D)) we need to search through some set of possible model configurations. But the maximum likelihood model would be a complete graph, since this would have the maximum number of parameters and hence fit the data best. This would likely constitute OVERFITTING.
[Diagrams: three alternative connection configurations over A, O and B]

9 Fully observable data and unknown structure
To get around the problem of overfitting, we introduce a complexity penalty term into the expression for the score of a graph:

L(G : D) = log P(D | θ̂_G, G) − (log N / 2) · dim(G)

– θ̂_G is the maximum likelihood estimate for the parameters θ for a particular graph
– N is the # of data samples
– dim(G) is the dimension of the model (in the fully observable case, this is the number of free parameters)
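
A minimal sketch of how such a penalised score trades fit against complexity, assuming the BIC-style form above; the log-likelihoods and dimensions below are made-up numbers purely for illustration:

    import math

    def penalised_score(log_lik, n_samples, dim_g):
        """Log-likelihood of the ML fit minus a complexity penalty
        that grows with the number of free parameters dim(G)."""
        return log_lik - (math.log(n_samples) / 2.0) * dim_g

    # A complete graph fits slightly better but pays a bigger penalty,
    # so the sparser candidate wins overall here.
    print(penalised_score(log_lik=-48.1, n_samples=100, dim_g=7))  # complete graph
    print(penalised_score(log_lik=-50.3, n_samples=100, dim_g=3))  # sparser graph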

10 Partial data and known structure

a  b  o
T  T  F
F  T  ?
F  ?  T
T  F  ?
...
F  T  F

Incomplete data means that the likelihood function L(θ|D) can have multiple maxima.

11 Partial data and known structure
Expectation Maximisation (Dempster, '77):
– A general purpose method for learning from incomplete data
– If we had complete data, we could estimate the parameters directly
– But with missing data, the true counts needed are unknown
– However, we can estimate the true counts using probabilistic inference based on our current model
– We can then use the "completed" counts as if they were real to re-estimate the parameters
– Perform this as an iterative process until the solution converges (a toy version is sketched below)
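
A runnable toy version of this loop, for a hypothetical two-node network A → O with binary variables where some values of a are missing; the network, data and starting parameters are invented for illustration (the slides' three-node example would work the same way):

    # Parameters: p_a = P(A=T); p_o[a] = P(O=T | A=a).
    data = [(True, False), (None, False), (True, True),
            (None, True), (False, True)]        # None = missing value of a
    p_a = 0.5
    p_o = {True: 0.5, False: 0.5}

    for _ in range(50):
        # E-step: accumulate expected counts, splitting each sample with
        # missing a according to P(A=T | o) under the current parameters.
        exp_a = 0.0                               # expected count of A=T
        n_o_true = {True: 0.0, False: 0.0}        # expected count of (A=a, O=T)
        n_parent = {True: 0.0, False: 0.0}        # expected count of A=a
        for a, o in data:
            if a is None:
                lik_t = p_o[True] if o else 1 - p_o[True]
                lik_f = p_o[False] if o else 1 - p_o[False]
                w = p_a * lik_t / (p_a * lik_t + (1 - p_a) * lik_f)
            else:
                w = 1.0 if a else 0.0
            exp_a += w
            for val, wv in ((True, w), (False, 1.0 - w)):
                n_parent[val] += wv
                if o:
                    n_o_true[val] += wv
        # M-step: re-estimate parameters from the "completed" counts.
        p_a = exp_a / len(data)
        p_o = {val: n_o_true[val] / n_parent[val] for val in (True, False)}

    print(p_a, p_o)   # converged estimates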

12 Partial data and known structure

Data:
a  b  o
T  F  ?
F  T  F
T  ?  ?
F  F  T
T  T  T

Expected counts:
a  b   #(o=T)       #(o=F)
T  T   0.2 + 1      (1 − 0.2)
T  F   0.3 + 0.2    (1 − 0.3) + (1 − 0.2)
F  T   0            1
F  F   1            0

The fractional values are determined using the inference techniques we saw in a previous lecture.
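
For example, the 0.3 entry in the a=T, b=F row can be read as a fractional count: the incomplete sample (a=T, b=F, o=?) is split between the o=T and o=F columns according to the current model. The CPT value 0.3 is taken from the table above; the rest is illustration:

    # Current model's P(o=T | a=T, b=F), obtained by inference.
    p_o_given_ab = 0.3
    count_o_true = p_o_given_ab          # fractional contribution to #(o=T)
    count_o_false = 1.0 - p_o_given_ab   # fractional contribution to #(o=F)
    print(count_o_true, count_o_false)   # 0.3 0.7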

13 Partial data and known structure
– First compute all θ_i that you can (ignoring all data that is missing)
– Choose random values for the remaining θ_i
– Apply the EM process until L(θ|D) converges (a generic driver is sketched below)
EM guarantees that:
– L(θ^(k+1)|D) ≥ L(θ^k|D), where k is the iteration number
– If L(θ^(k+1)|D) = L(θ^k|D), then θ^(k+1) is a stationary point, which usually means a local maximum
– The solution is not guaranteed to be globally optimal
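
A generic driver sketching these guarantees, with the problem-specific E-step, M-step and likelihood passed in as functions; all names here are hypothetical:

    def run_em(theta, e_step, m_step, log_lik, tol=1e-6, max_iter=200):
        """Iterate EM until the log-likelihood stops improving."""
        prev = log_lik(theta)
        for _ in range(max_iter):
            counts = e_step(theta)        # expected counts under theta_k
            theta = m_step(counts)        # theta_{k+1} from completed counts
            cur = log_lik(theta)          # EM guarantees cur >= prev
            if cur - prev < tol:          # stationary point: usually a
                break                     # local, not global, maximum
            prev = cur
        return theta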

14 Partial data and known structure
[Diagram: one Expectation Maximisation iteration — the E-step uses the training data and the current network to compute expected counts #(A), #(B) and #(O,A,B); the M-step re-estimates the network parameters from these counts]

15 Partial data and unknown structure

a  b  o
T  T  ?
F  ?  F
F  F  T
T  ?  ?
...
F  T  F

The hardest case: we need to establish the structure and the CPTs of the network over A, O and B using incomplete data.

16 Partial data and unknown structure
One approach would be to conduct a search through model structure space (as we previously saw) and perform EM for each candidate graph (see the sketch after this list):
– Computationally expensive
– Parameter optimisation through EM is not trivial
– We spend a lot of time computing poor graph candidates
– It rapidly becomes computationally intractable
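
A sketch of this naive approach, to make its cost explicit; enumerate_dags, fit_em (as above) and score are assumed problem-specific helpers, not part of the lecture material:

    def naive_structure_search(nodes, data, enumerate_dags, fit_em, score):
        """Score every candidate graph by fitting it with EM; the number
        of DAGs grows super-exponentially in the number of nodes."""
        best_g, best_s = None, float("-inf")
        for g in enumerate_dags(nodes):     # huge candidate space
            theta = fit_em(g, data)         # expensive inner EM run per graph
            s = score(g, theta, data)       # e.g. the penalised score above
            if s > best_s:
                best_g, best_s = g, s
        return best_g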

17 Partial data and unknown structure
An approximate solution uses Structural EM. This is complex in practice:
– It performs a search in (structure, parameters) space
– At each iteration it uses the current model either to improve the parameters (a "parametric" EM step) or to improve the model structure (a "structural" EM step)
Further details are beyond the scope of this course, but a rough skeleton is sketched below.
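
Purely as orientation, a heavily simplified skeleton of the alternation described above; every helper name is hypothetical and the real algorithm involves considerably more machinery:

    def structural_em(g, theta, data, n_iters, parametric_step, structural_step):
        """Alternate between improving parameters for the current graph
        and improving the graph given the current expected statistics."""
        for _ in range(n_iters):
            theta = parametric_step(g, theta, data)   # "parametric" EM step
            g = structural_step(g, theta, data)       # "structural" EM step
        return g, theta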

18 Summary
– We can generalise the fully observable, known structure case to more complex BBN learning
– To cope with unknown structure, we need to iterate over the possible model structures
– To cope with partial data, we use the EM algorithm, which "fills in" the missing data until the model converges
– We cannot guarantee that we obtain the best global solution

19 Next time …
Research issues:
– Active cameras
– Future challenges
Excellent tutorial by Koller and Friedman at: www.cs.huji.ac.il/~nir/Nips01-Tutorial/
Also one by Murphy at: www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
Some of today's slides were adapted from these sources.

