Cognitive Computer Vision Kingsley Sage and Hilary Buxton Prepared under ECVision Specific Action 8-3

Lecture 14: Learning Bayesian Belief Networks
Overview of learning methods for:
– Full data and unknown structure
– Partial data and known structure
– Partial data and unknown structure

So why are BBNs relevant to Cognitive CV?
– They provide a well-founded methodology for reasoning with uncertainty
– These methods are the basis for our model of perception guided by expectation
– We can develop well-founded methods of learning rather than just being stuck with hand-coded models

Why is learning important in the context of BBNs?
– Knowledge acquisition can be an expensive process
– Experts may not be readily available (scarce knowledge) or may simply not exist
– But you might have a lot of data from (say) case studies
– Learning allows us to construct BBN models from the data and, in the process, gain insight into the nature of the problem domain

Taxonomy of learning methods
Observability vs. model structure:
– Full observability, known structure: maximum likelihood estimation
– Full observability, unknown structure: search through model space
– Partial observability, known structure: Expectation Maximisation (EM) or gradient descent
– Partial observability, unknown structure: EM + search through model space (structural EM)
In the last lecture we saw the full observability and known model structure case in detail. In this lecture we take an overview of the other three cases.

Fully observable data and unknown structure
e.g. 3 discrete nodes A, B and O, with fully observed data:
a=  b=  o=
T   T   F
F   T   F
F   F   T
T   F   T
...
F   T   F
We need to establish both a structure and the CPTs.

Fully observable data and unknown structure
Any particular connection configuration between the nodes A, O and B gives rise to a graph g; the set of all possible graphs is G
We need to find the particular graph that fits the data best
We saw in the previous lecture how we can learn the CPT parameters (in the discrete case) for any particular graph, as in the sketch below
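As a reminder of the fully observed, known structure case, here is a minimal Python sketch (not from the original slides) of maximum likelihood CPT estimation by normalised counting. The toy data set and the candidate structure A → O ← B are illustrative assumptions.

```python
from collections import Counter

# Toy fully observed data over three Boolean nodes A, B and O (illustrative values).
data = [
    {"A": True,  "B": True,  "O": False},
    {"A": False, "B": True,  "O": False},
    {"A": False, "B": False, "O": True},
    {"A": True,  "B": False, "O": True},
    {"A": False, "B": True,  "O": False},
]

def ml_cpt(data, child, parents):
    """Maximum likelihood CPT for P(child | parents) from fully observed data:
    simply the normalised counts for each observed parent configuration."""
    joint = Counter()     # counts of (parent configuration, child value)
    marginal = Counter()  # counts of parent configuration alone
    for record in data:
        pa = tuple(record[p] for p in parents)
        joint[(pa, record[child])] += 1
        marginal[pa] += 1
    return {key: n / marginal[key[0]] for key, n in joint.items()}

# Example: assume the candidate structure A -> O <- B, so O has parents {A, B}.
print(ml_cpt(data, "O", ["A", "B"]))
```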

Fully observable data and unknown structure
To find the best solution (the one that maximises P(G|D)) we need to search through some set of possible model configurations
But the maximum likelihood model would be a complete graph, since this would have the maximum number of parameters and hence fit the data best
This would likely constitute OVERFITTING
(Slide figure: three candidate graph structures over A, O and B)

Fully observable data and unknown structure To get around the problem of overfitting, we introduce a complexity penalty term into the expression for L(G:D) = P(D|G) –  G is the maximum likelihood estimate for the parameters  for a particular graph – N is the # of data samples – dim G is the dimension of the model (in the fully observable case, this is the number of free parameters)

Partial data and known structure
Now some of the data values are missing:
a=  b=  o=
T   T   F
F   T   ?
F   ?   T
T   F   ?
...
F   T   F
Incomplete data means that the likelihood function L(θ|D) can have multiple maxima

Partial data and known structure
Expectation Maximisation (Dempster, ‘77)
– A general purpose method for learning from incomplete data
– If we had complete data, we could estimate the parameters directly
– But with missing data, the true counts needed are unknown
– However, we can estimate the true counts using probabilistic inference based on our current model
– We can then use the “completed” counts as if they were real to re-estimate the parameters
– Perform this as an iterative process until the solution converges

Partial data and known structure
Example data with missing values:
a=  b=  o=
T   F   ?
F   T   F
T   ?   ?
F   F   T
T   T   T
Expected counts of o for each (a, b) configuration under the current model:
a=  b=   # o=T       # o=F
T   T    1 + (·)     (·)
T   F    (·) + (·)   (·) + (·)
F   T    0           1
F   F    1           0
The (·) entries are fractional expected counts contributed by the records with missing values; they are determined by probabilistic inference using the techniques we saw in a previous lecture.
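The “completed” counts come from an E-step like the following minimal sketch (an illustration, not the slide’s own code): each record with missing values is spread over its possible completions, weighted by the current model. The joint_prob argument is an assumed helper, any function returning the probability of a complete assignment under the current parameters.

```python
from collections import Counter
from itertools import product

# The slide's example data, with None marking a missing value.
partial_data = [
    {"A": True,  "B": False, "O": None},
    {"A": False, "B": True,  "O": False},
    {"A": True,  "B": None,  "O": None},
    {"A": False, "B": False, "O": True},
    {"A": True,  "B": True,  "O": True},
]

def expected_counts(records, joint_prob):
    """E-step: distribute each record over its possible completions, weighted by
    the current model's probability of that completion given the observed values."""
    counts = Counter()
    for record in records:
        missing = [v for v in record if record[v] is None]
        completions = []
        for values in product([True, False], repeat=len(missing)):
            full = dict(record, **dict(zip(missing, values)))
            completions.append((full, joint_prob(full)))
        total = sum(w for _, w in completions)
        for full, w in completions:
            counts[(full["A"], full["B"], full["O"])] += w / total
    return counts

# With a uniform joint, each missing value is split evenly between True and False.
print(expected_counts(partial_data, lambda full: 0.125))
```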

Partial data and known structure First compute all  i that you can (ignoring all data that is missing) Choose random values for the remaining  i Apply EM process until L(  |D) converges EM guarantees that: – L(  k+1 |D)  L(  k |D) where k is the iteration number – If L(  k+1 |D) = L(  k |D) then  k+1 is a stationary point which usually means a local maximum – Solution not guaranteed globally optimal

Partial data and known structure
(Slide figure: the Expectation Maximisation iteration. The E-step combines the current network over A, B and O with the training data to compute expected counts #(A), #(B) and #(O,A,B); the M-step uses these expected counts to re-estimate the network parameters, and the cycle repeats.)

Partial data and unknown structure
The hardest case: we need to establish both the structure and the CPTs using incomplete data, e.g.:
a=  b=  o=
T   T   ?
F   ?   F
F   F   T
T   ?   ?
...
F   T   F

Partial data and unknown structure
One approach would be to conduct a search through model structure space (as we saw previously) and perform EM for each candidate graph, but:
– It is computationally expensive
– Parameter optimisation through EM is not trivial
– A lot of time is spent on poor graph candidates
– It rapidly becomes computationally intractable

Partial data and unknown structure
Approximate solution: Structural EM
Complex in practice:
– Performs a search in (structure, parameters) space
– At each iteration, uses the current model either to improve the parameters (the “parametric” EM step) or to improve the model structure (the “structural” EM step)
Further details are beyond the scope of this course

Summary
– We can generalise the fully observable, known structure case to more complex BBN learning
– To cope with unknown structure, we need to iterate over the possible model structures
– To cope with partial data, we use the EM algorithm, which “fills in” the missing data until the model converges
– We cannot guarantee that we obtain the best global solution

Next time …
Research issues:
– Active cameras
– Future challenges
Excellent tutorial by Koller and Friedman at:
Also by Murphy at:
Some of today’s slides were adapted from these sources