Lecture 5: Unsupervised Learning in Fully Observed Directed and Undirected Graphical Models

Last Time Last time we saw how to learn parameters in a maximum likelihood setting from labeled data: fully observed, supervised learning. We looked at both generative (Naive Bayes) and discriminative (linear and logistic regression) models. Now we turn to unsupervised learning of general Bayes nets and Markov random field models. Directed models: very easy and intuitive. Undirected models: very easy for decomposable models; hard for non-decomposable models, where we need IPF.

Directed Models Parameterize each conditional by a full probability table. With no hidden variables, the log-likelihood decomposes into a sum of independent terms, one per table. Use the counts m(x) as the relevant (sufficient) statistics of the data. The normalization constraints are enforced using Lagrange multipliers, and the ML estimates of the parameters are simply the appropriately normalized counts.
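For concreteness, here is a minimal sketch of this counting recipe for a hypothetical two-node network X → Y over binary variables; the variable names, the toy samples, and the structure are illustrative assumptions rather than anything from the slides.

```python
from collections import Counter

# Hypothetical fully observed samples (x, y) for a two-node model X -> Y.
data = [
    (0, 0), (0, 1), (0, 1), (1, 1), (1, 0), (1, 1), (0, 0), (1, 1),
]
N = len(data)

# Counts m(x) and m(x, y) are the sufficient statistics.
m_x = Counter(x for x, _ in data)
m_xy = Counter(data)

# ML estimate of P(X = x) is the normalized count m(x) / N.
p_x = {x: m_x[x] / N for x in m_x}

# ML estimate of P(Y = y | X = x) is m(x, y) / m(x).
p_y_given_x = {(x, y): m_xy[(x, y)] / m_x[x] for (x, y) in m_xy}

print("P(X):", p_x)
print("P(Y|X):", p_y_given_x)
```

The same pattern extends to any fully observed directed model: for each node, count the joint configurations of the node and its parents, then normalize over the node's own values.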

Undirected GM The normalization constant Z spoils the factorization property. There is still an easy characterization of the ML solution in terms of counts; however, this does not automatically provide the ML estimates of the potentials. When the model is decomposable, however, we can still write down the solution by inspection. In all other cases we compute the solution iteratively by means of the iterative proportional fitting (IPF) procedure. Each IPF update leaves Z invariant and sets the model's marginal on a clique equal to the empirical marginal. IPF is coordinate ascent on the log-likelihood, and it is guaranteed to converge.
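As a concrete illustration, below is a minimal brute-force sketch of IPF on a hypothetical toy model: three binary variables with cliques (0, 1) and (1, 2), all-ones initial potentials, and a made-up data set (the cliques, the data, and the numerical details are illustrative assumptions). Each update rescales one clique potential by the ratio of the empirical to the current model marginal, which is exactly the step that leaves Z invariant and matches that clique's marginal.

```python
import itertools
import numpy as np

n_vars = 3
cliques = [(0, 1), (1, 2)]                      # hypothetical toy structure
states = list(itertools.product([0, 1], repeat=n_vars))

# Initialize all clique potentials to ones (the uniform model).
potentials = {c: np.ones((2,) * len(c)) for c in cliques}

def joint(potentials):
    """Brute-force normalized joint distribution over all 2^n states."""
    p = np.zeros((2,) * n_vars)
    for x in states:
        val = 1.0
        for c, psi in potentials.items():
            val *= psi[tuple(x[i] for i in c)]
        p[x] = val
    return p / p.sum()

def marginal(p, clique):
    """Marginalize the joint p onto the (sorted) variables in `clique`."""
    other_axes = tuple(i for i in range(n_vars) if i not in clique)
    return p.sum(axis=other_axes)

# Empirical clique marginals from a made-up fully observed data set.
data = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]])
empirical = {}
for c in cliques:
    counts = np.zeros((2,) * len(c))
    for row in data:
        counts[tuple(row[i] for i in c)] += 1
    empirical[c] = counts / counts.sum()

# IPF sweeps: set each clique's model marginal to the empirical marginal.
for sweep in range(50):
    for c in cliques:
        model_marg = marginal(joint(potentials), c)
        potentials[c] *= empirical[c] / np.maximum(model_marg, 1e-12)

p = joint(potentials)
for c in cliques:
    print(c, "model:", marginal(p, c).ravel(), "empirical:", empirical[c].ravel())
```

In a real implementation the model marginals would come from an inference routine (e.g., a junction tree) rather than from brute-force enumeration of the joint, but the update itself is unchanged.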