Parameter Learning in Markov Networks. Outline: CRF learning; CRFs for 2-D image segmentation; IPF parameter sharing revisited.

Presentation transcript:

Parameter Learning in Markov Networks (MN)

Outline
- CRF learning
- CRFs for 2-D image segmentation
- IPF parameter sharing revisited

Log-linear Markov network (most common representation)
A feature is some function φ[D] for some subset of variables D, e.g., an indicator function.
A log-linear model over a Markov network H consists of:
- a set of features φ_1[D_1], …, φ_k[D_k], where each D_i is a subset of a clique in H (two φ's can be over the same variables), and
- a set of weights w_1, …, w_k, usually learned from data.
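For reference, the resulting distribution can be written out in the standard log-linear form (a reconstruction; the notation follows the bullets above):

P(X_1, \dots, X_n) \;=\; \frac{1}{Z} \exp\!\Big( \sum_{i=1}^{k} w_i \, \phi_i[D_i] \Big), \qquad Z \;=\; \sum_{x_1, \dots, x_n} \exp\!\Big( \sum_{i=1}^{k} w_i \, \phi_i[D_i] \Big)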

Generative vs. Discriminative classifiers – a review
Want to learn h: X → Y
- X – features
- Y – target classes
Bayes optimal classifier: P(Y|X)
Generative classifier, e.g., Naïve Bayes:
- Assume some functional form for P(X|Y) and P(Y)
- Estimate the parameters of P(X|Y) and P(Y) directly from training data
- Use Bayes rule to calculate P(Y|X = x)
- This is a 'generative' model: P(Y|X) is computed indirectly through Bayes rule, but the model can generate samples of the data, since P(X) = Σ_y P(y) P(X|y)
Discriminative classifier, e.g., Logistic Regression:
- Assume some functional form for P(Y|X)
- Estimate the parameters of P(Y|X) directly from training data
- This is the 'discriminative' model: it learns P(Y|X) directly, but cannot generate samples of the data, because P(X) is not available
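As a concrete illustration of the two estimation strategies, here is a minimal scikit-learn sketch; the toy dataset and all variable names are made up for the example and are not from the lecture.

import numpy as np
from sklearn.naive_bayes import GaussianNB             # generative: fits P(X|Y) and P(Y)
from sklearn.linear_model import LogisticRegression    # discriminative: fits P(Y|X) directly

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))              # toy features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # toy binary labels

gen = GaussianNB().fit(X, y)               # estimates per-class Gaussians and class priors, then applies Bayes rule
disc = LogisticRegression().fit(X, y)      # maximizes the conditional likelihood of P(Y|X)

print(gen.predict_proba(X[:3]))            # P(Y|X) obtained indirectly, via Bayes rule
print(disc.predict_proba(X[:3]))           # P(Y|X) obtained directly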

Log-linear CRFs (most common representation)
Graph H: only over the hidden variables Y_1, …, Y_P
- No assumptions about dependency on the observed variables X
- You must always observe all of X
A feature is some function φ[D] for some subset of variables D, e.g., an indicator function.
A log-linear model over a CRF H consists of:
- a set of features φ_1[D_1], …, φ_k[D_k], where each D_i is a subset of a clique in H (two φ's can be over the same variables), and
- a set of weights w_1, …, w_k, usually learned from data.
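The corresponding conditional distribution, written out (again a reconstruction in the same notation); note that the partition function now depends on X, which is why P(X) is never needed:

P(Y_1, \dots, Y_P \mid X) \;=\; \frac{1}{Z(X)} \exp\!\Big( \sum_{i=1}^{k} w_i \, \phi_i[D_i] \Big), \qquad Z(X) \;=\; \sum_{y_1, \dots, y_P} \exp\!\Big( \sum_{i=1}^{k} w_i \, \phi_i[D_i] \Big)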

Example: Image Segmentation
A set of features φ_1[D_1], …, φ_k[D_k]: each D_i is a subset of a clique in H, and two φ's can be over the same variables.
(Figure: a 3×3 grid of segmentation variables y1, …, y9, one per pixel.)
We will define the features as follows:
- a node feature that measures the compatibility of a node's color and its segmentation label
- a set of indicator features, one triggered for each edge labeling pair in {ff, bb, fb, bf}
This is allowed since we can define many features over the same subset of variables.
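A minimal sketch of how such features might be coded for the 3×3 grid; all function and variable names here are hypothetical, not from the homework.

LABELS = ("f", "b")   # foreground / background

def node_feature(pixel_color, label):
    # toy compatibility score between a pixel's grayscale color and its label:
    # bright pixels prefer foreground, dark pixels prefer background
    return pixel_color if label == "f" else 1.0 - pixel_color

def edge_features(label_i, label_j):
    # one indicator feature per edge labeling pair in {ff, bb, fb, bf}
    return {pair: float(label_i + label_j == pair) for pair in ("ff", "bb", "fb", "bf")}

def grid_edges(rows=3, cols=3):
    # edges of a 4-connected grid over y1..y9 (indices 0..8)
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                yield (r * cols + c, r * cols + c + 1)
            if r + 1 < rows:
                yield (r * cols + c, (r + 1) * cols + c)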

Example: Image Segmentation (continued)
The same set of features φ_1[D_1], …, φ_k[D_k] over the 3×3 grid; each D_i is a subset of a clique in H, and two φ's can be over the same variables.

Example: Image Segmentation (continued)
We need to learn the parameters w_m. Now we just need to sum these features over the grid, as written out below.
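Summing the node and edge features gives the segmentation CRF. This is my reconstruction of the model the slide sums up, with w_node and w_m as the weights and C_m(y) counting the edges with labeling m:

P(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \exp\!\Big( \sum_{i} w_{\mathrm{node}} \, \phi(y_i, x_i) \;+\; \sum_{(i,j) \in E} \sum_{m \in \{ff, bb, fb, bf\}} w_m \, \mathbf{1}[(y_i, y_j) = m] \Big) \;=\; \frac{1}{Z(\mathbf{x})} \exp\!\Big( \sum_{i} w_{\mathrm{node}} \, \phi(y_i, x_i) \;+\; \sum_{m} w_m \, C_m(\mathbf{y}) \Big)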

Example: Image Segmentation – learning the weights
Given N data points (images and their segmentations), the learning update uses the count of feature m in each data point n, and computing it requires inference using the current parameter estimates.
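The gradient of the conditional log-likelihood that this slide refers to has the standard "empirical counts minus expected counts" form (my reconstruction, with C_m(y[n], x[n]) the count of feature m in data point n):

\frac{\partial \ell(\mathbf{w})}{\partial w_m} \;=\; \sum_{n=1}^{N} \Big( C_m\big(\mathbf{y}[n], \mathbf{x}[n]\big) \;-\; E_{P_{\mathbf{w}}}\!\big[ C_m \mid \mathbf{x}[n] \big] \Big)

The expectation in the second term is what requires inference under the current parameters.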

Example: Inference for Learning
How do we compute E[C_fb | X[n]], the expected count of edges labeled (f, b) given image X[n]?

Example: Inference for Learning (continued)
E[C_fb | X[n]] is obtained from the pairwise marginals P(y_i, y_j | X[n]) over the edges of the grid.
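A sketch of how the expected count could be computed once pairwise edge beliefs are available, e.g., from converged loopy BP; the belief data structure below is assumed for illustration, not taken from the course code.

def expected_fb_count(edge_beliefs):
    # edge_beliefs: dict mapping each edge (i, j) to a dict
    #   {(l_i, l_j): P(y_i = l_i, y_j = l_j | X[n])} with labels in {"f", "b"}
    total = 0.0
    for belief in edge_beliefs.values():
        total += belief[("f", "b")]   # E[C_fb | X[n]] = sum over edges of P(y_i = f, y_j = b | X[n])
    return total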

Representation Equivalence
The log-linear representation vs. the tabular MN representation from HW4, over the same 3×3 grid.

Representation Equivalence (continued)
Now do the same for the edge potential. This is correct because, for every assignment to (y_i, y_j), we select exactly one value from the table.

Continuing with the edge potential of the tabular MN representation from HW4: for every assignment to (y_i, y_j) we select exactly one value from the table. Apply the cheap exp(log(·)) trick, which is just algebra. Now combine it over all edges, assuming parameter sharing, and use the same C_m (feature count) trick.

Representation Equivalence (continued)
Now substitute. The two representations are equivalent, with w_m = log θ_m, where θ_m is the corresponding value in the tabular potential.
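Spelled out, the substitution is (a reconstruction; θ(y_i, y_j) denotes the tabular edge potential):

\theta(y_i, y_j) \;=\; \exp\!\big( \log \theta(y_i, y_j) \big) \;=\; \exp\!\Big( \sum_{m \in \{ff, bb, fb, bf\}} w_m \, \mathbf{1}[(y_i, y_j) = m] \Big), \qquad w_m = \log \theta_m

Multiplying the shared edge potentials over all edges then gives \exp\big( \sum_m w_m C_m(\mathbf{y}) \big), where C_m counts the edges with labeling m.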

Outline
- CRF learning
- CRFs for 2-D image segmentation
- IPF parameter sharing revisited

Iterative Proportional Fitting (IPF)
(Figure: the student Markov network with variables Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy, Coherence.)
Setting the derivative to zero yields a fixed-point equation. Iterate it and converge to the optimal parameters; at each iteration we must compute the current model's clique marginals.
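The IPF fixed-point update, in its standard form (my reconstruction; \hat{P} is the empirical distribution and P_t the current model distribution):

\phi_{t+1}(\mathbf{c}) \;=\; \phi_{t}(\mathbf{c}) \cdot \frac{\hat{P}(\mathbf{c})}{P_{t}(\mathbf{c})}

so each iteration must compute the current model marginals P_t(\mathbf{c}) over every clique.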

Parameter Sharing in your HW
- Note that I am using Y for the label.
- All edge potentials are shared.
- Also, we are learning a conditional model.

IPF with parameter sharing
How do we calculate these quantities using parameter sharing? In total we have 4 shared parameters, as opposed to 4 parameters per edge. We can cancel |E| thanks to the division. For the model marginals over an edge (Y_i, Y_j), run loopy BP and, once it has converged, read off the edge beliefs. We only have one data point (image) in this example, so we drop X[n] and write just X.
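A sketch of the shared-parameter computation: aggregate both the empirical and the expected edge-labeling counts over all edges before forming the update; using summed counts in the numerator and denominator is what cancels |E|. The LBP edge beliefs are assumed to come from elsewhere, and all names are hypothetical.

EDGE_LABELINGS = ("ff", "bb", "fb", "bf")

def empirical_edge_counts(segmentation, edges):
    # counts of each edge labeling in the single observed segmentation, summed over all edges
    counts = {m: 0.0 for m in EDGE_LABELINGS}
    for (i, j) in edges:
        counts[segmentation[i] + segmentation[j]] += 1.0
    return counts

def expected_edge_counts(edge_beliefs):
    # expected counts under the current model, from converged loopy-BP edge beliefs
    counts = {m: 0.0 for m in EDGE_LABELINGS}
    for belief in edge_beliefs.values():
        for (l_i, l_j), p in belief.items():
            counts[l_i + l_j] += p
    return counts

def ipf_shared_update(theta, empirical, expected):
    # multiplicative IPF-style update on the 4 shared edge parameters
    return {m: theta[m] * empirical[m] / expected[m] for m in EDGE_LABELINGS}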