Markov Networks.

Markov Networks

Overview
- Markov networks
- Inference in Markov networks
  - Computing probabilities
  - Markov chain Monte Carlo
  - Belief propagation
  - MAP inference
- Learning Markov networks
  - Weight learning
    - Generative
    - Discriminative (a.k.a. conditional random fields)
  - Structure learning

Markov Networks
Undirected graphical models. Example graph over the variables Smoking, Cancer, Asthma, Cough.
Potential functions defined over cliques, e.g. a potential Ф(S,C) over the clique {Smoking, Cancer}, with example entries Ф = 4.5 for a configuration with Smoking = False and Ф = 2.7 for one with Smoking = True.
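These potentials define the joint distribution in the standard way (the general formula, stated here for completeness; it is not specific to this example):

  P(x) = \frac{1}{Z} \prod_{C} \phi_C(x_C), \qquad Z = \sum_{x} \prod_{C} \phi_C(x_C)

where the product runs over the cliques C of the graph and Z is the partition function.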

Markov Networks
Undirected graphical models (same example graph: Smoking, Cancer, Asthma, Cough).
Log-linear model: P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right), where w_i is the weight of feature i and f_i is feature i.
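As a minimal, self-contained illustration of the log-linear form (not from the slides; the single feature and its weight below are hypothetical), such a model can be evaluated by brute force on a tiny domain:

    import itertools
    import math

    # Hypothetical feature: f1(Smoking, Cancer) = 1 if "Smoking implies Cancer" holds.
    def features(smoking, cancer):
        return [1.0 if (not smoking or cancer) else 0.0]

    weights = [1.5]  # illustrative weight for f1

    def unnormalized(smoking, cancer):
        # exp( sum_i w_i * f_i(x) )
        return math.exp(sum(w * f for w, f in zip(weights, features(smoking, cancer))))

    # Partition function Z, by summing over all joint states (feasible only for tiny models).
    Z = sum(unnormalized(s, c) for s, c in itertools.product([False, True], repeat=2))

    def prob(smoking, cancer):
        return unnormalized(smoking, cancer) / Z

    print(prob(True, True))  # P(Smoking = True, Cancer = True)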

Hammersley-Clifford Theorem
If the distribution is strictly positive (P(x) > 0) and the graph encodes its conditional independences, then the distribution is a product of potentials over the cliques of the graph. The converse also holds. ("Markov network = Gibbs distribution")

Markov Nets vs. Bayes Nets

  Property          Markov Nets            Bayes Nets
  Form              Prod. of potentials    Prod. of potentials
  Potentials        Arbitrary              Cond. probabilities
  Cycles            Allowed                Forbidden
  Partition func.   Z = ?                  Z = 1
  Indep. check      Graph separation       D-separation
  Indep. props.     Some                   Some
  Inference         MCMC, BP, etc.         Convert to Markov net

Inference in Markov Networks
- Computing probabilities
- Markov chain Monte Carlo
- Belief propagation
- MAP inference

Computing Probabilities
Goal: compute marginals and conditionals of the log-linear distribution P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right).
Exact inference is #P-complete.
Approximate inference:
- Monte Carlo methods
- Belief propagation
- Variational approximations

Markov Chain Monte Carlo
General algorithm: Metropolis-Hastings
- Sample the next state given the current one according to a transition probability
- Reject the new state with some probability to maintain detailed balance
Simplest (and most popular) algorithm: Gibbs sampling
- Sample one variable at a time given the rest
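A minimal sketch of a single Metropolis-Hastings step (illustrative, not the slides' code; it assumes the proposal `propose` is symmetric, so the Hastings correction cancels, and that `log_p` returns an unnormalized log-probability):

    import math
    import random

    def metropolis_hastings_step(state, propose, log_p):
        # Propose a new state and accept it with probability min(1, p(new) / p(current)).
        candidate = propose(state)
        accept_prob = math.exp(min(0.0, log_p(candidate) - log_p(state)))
        if random.random() < accept_prob:
            return candidate  # accept
        return state          # reject: keep the current state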

Gibbs Sampling

  state ← random truth assignment
  for i ← 1 to num-samples do
      for each variable x
          sample x according to P(x | neighbors(x))
          state ← state with new value of x
  P(F) ← fraction of states in which F is true
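A runnable sketch of the loop above for binary variables (illustrative; `unnormalized` and `query` are placeholders for the model's unnormalized probability and the formula F):

    import random

    def gibbs(variables, unnormalized, num_samples, query):
        # Gibbs sampling: resample each binary variable from its conditional,
        # P(x | rest), which is proportional to the unnormalized joint probability.
        state = {v: random.choice([False, True]) for v in variables}
        hits = 0
        for _ in range(num_samples):
            for v in variables:
                scores = {}
                for value in (False, True):
                    state[v] = value
                    scores[value] = unnormalized(state)
                # Only the potentials touching v actually matter here; using the full
                # joint is correct but wasteful.
                state[v] = random.random() < scores[True] / (scores[True] + scores[False])
            hits += 1 if query(state) else 0
        return hits / num_samples  # estimated P(F)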

Belief Propagation
Form a factor graph: a bipartite network of variables and features.
Repeat until convergence:
- Nodes send messages to their features
- Features send messages to their variables
Messages: current approximations to the node marginals; initialize them to 1.
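For reference, the standard sum-product message updates on a factor graph (the usual form, not taken verbatim from the slides; nb(·) denotes neighbors in the factor graph) are:

  \mu_{x \to f}(x) = \prod_{g \in \mathrm{nb}(x) \setminus \{f\}} \mu_{g \to x}(x)

  \mu_{f \to x}(x) = \sum_{\mathbf{x}_f \setminus x} f(\mathbf{x}_f) \prod_{y \in \mathrm{nb}(f) \setminus \{x\}} \mu_{y \to f}(y)

  b(x) \propto \prod_{f \in \mathrm{nb}(x)} \mu_{f \to x}(x)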

Belief Propagation
[Figure: messages passed between feature nodes (f) and variable nodes (x) in the factor graph]

MAP/MPE Inference
Goal: find the most likely state of the world given evidence, \arg\max_y P(y \mid x), where y is the query and x the evidence.

MAP Inference Algorithms
- Iterated conditional modes
- Simulated annealing
- Belief propagation (max-product)
- Graph cuts
- Linear programming relaxations
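As a concrete example of the first item, a sketch of iterated conditional modes for binary variables (illustrative; `unnormalized` is a placeholder for the model's unnormalized probability):

    def iterated_conditional_modes(variables, unnormalized, state):
        # ICM: repeatedly set each variable to the value that maximizes the
        # (unnormalized) probability given all the others, until nothing changes.
        changed = True
        while changed:
            changed = False
            for v in variables:
                old_value = state[v]
                best_value, best_score = old_value, float("-inf")
                for value in (False, True):
                    state[v] = value
                    score = unnormalized(state)
                    if score > best_score:
                        best_value, best_score = value, score
                state[v] = best_value
                if best_value != old_value:
                    changed = True
        return state  # a local MAP (mode) of the distribution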

Learning Markov Networks
- Learning parameters (weights): generatively or discriminatively
- Learning structure (features)
In this lecture: assume complete data. (If not: EM versions of the algorithms.)

Generative Weight Learning
- Maximize likelihood or posterior probability
- Numerical optimization (gradient or 2nd-order)
- No local maxima
- Requires inference at each step (slow!)
The gradient with respect to weight i is the number of times feature i is true in the data minus the expected number of times feature i is true according to the model.
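In symbols, the standard log-linear likelihood gradient (with n_i(x) the number of times feature i is true in x):

  \frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(X)]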

Pseudo-Likelihood
- Likelihood of each variable given its neighbors in the data
- Does not require inference at each step
- Consistent estimator
- Widely used in vision, spatial statistics, etc.
- But PL parameters may not work well for long inference chains
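Written out (the usual definition; \mathrm{MB}(x_i) denotes the Markov blanket, i.e. the neighbors, of variable i):

  \mathrm{PL}_w(x) = \prod_i P_w\big(x_i \mid \mathrm{MB}(x_i)\big)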

Discriminative Weight Learning (a.k.a. Conditional Random Fields)
- Maximize the conditional likelihood of the query (y) given the evidence (x)
- The gradient with respect to weight i is the number of true groundings of clause i in the data minus the expected number of true groundings according to the model
- Voted perceptron: approximate the expected counts by the counts in the MAP state of y given x
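In symbols (standard form; n_i(x, y) is the number of true groundings of clause i):

  \frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, Y)] \approx n_i(x, y) - n_i(x, y^*)

where y^* = \arg\max_y P_w(y \mid x) is the MAP state used by the voted perceptron.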

Other Weight Learning Approaches
- Generative: iterative scaling
- Discriminative: max margin

Structure Learning
- Start with atomic features
- Greedily conjoin features to improve the score
- Problem: need to re-estimate the weights for each new candidate
- Approximation: keep the weights of previous features constant
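A rough sketch of this greedy loop (illustrative only; `score` stands in for whatever likelihood-plus-penalty measure is being optimized, and a conjunction is represented as an opaque tuple):

    def greedy_structure_search(atomic_features, score, max_features):
        # Start from atomic features; repeatedly add the conjunction of two current
        # features that most improves the score, stopping when no conjunction helps.
        features = list(atomic_features)
        while len(features) < max_features:
            current = score(features)
            best_gain, best_candidate = 0.0, None
            for f in features:
                for g in features:
                    if f is g:
                        continue
                    candidate = ("and", f, g)  # conjunction of two existing features
                    gain = score(features + [candidate]) - current
                    if gain > best_gain:
                        best_gain, best_candidate = gain, candidate
            if best_candidate is None:
                break
            features.append(best_candidate)
        return features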