Graphical Models and Applications, CNS/EE148. Instructors: M. Polito, P. Perona, R. McEliece. TA: C. Fanti.

Example from Medical Diagnostics. The classic "Asia" Bayesian network: patient information (Visit to Asia, Smoking), medical difficulties (Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Cancer), and diagnostic tests (X-Ray Result, Dyspnea).

What is a graphical model? A graphical model is a way of representing probabilistic relationships between random variables. Variables are represented by nodes, and conditional (in)dependencies are represented by (missing) edges. Undirected edges express correlations between variables (Markov Random Field, or undirected graphical model); directed edges express causal relationships (Bayesian Network, or directed graphical model).

“Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering – uncertainty and complexity – and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms. Fundamental to the idea of a graphical model is the notion of modularity – a complex system is built by combining simpler parts.

Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms. Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism -- examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models.

The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism. This view has many advantages -- in particular, specialized techniques that have been developed in one field can be transferred between research communities and exploited more widely. Moreover, the graphical model formalism provides a natural framework for the design of new systems." --- Michael Jordan, 1998.

We already know many graphical models. (The slide shows a picture by Zoubin Ghahramani and Sam Roweis.)

Plan for the class
- Introduction to Graphical Models (Polito)
  - Basics on graphical models and statistics
  - Learning from data
  - Exact inference
  - Approximate inference
- Applications to Vision (Perona)
- Applications to Coding Theory (McEliece)
- Belief Propagation and Spin Glasses

Basics on graphical models and statistics: basics of graph theory; families of probability distributions associated with directed and undirected graphs; Markov properties and conditional independence; statistical concepts as building blocks for graphical models; density estimation, classification, and regression.

Basics on graphical models and statistics: graphs and families of probability distributions. (The slide shows a directed graph over the nodes X1, ..., X7.) There is a family of probability distributions that can be represented with this graph. 1) Every probability distribution exhibiting (at least) the conditional independencies that can be derived from the graph belongs to that family. 2) Every probability distribution that can be factorized as p(x1,...,x7) = p(x1) p(x2) p(x3|x2) p(x4|x1,x2) p(x5|x4) p(x6|x5,x2) p(x7|x4) belongs to that family.
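To make point 2) concrete, here is a minimal Python sketch (not from the course) that evaluates this factorized joint for binary variables. The conditional probability tables are made up purely for illustration; only the factorization structure comes from the graph above.

```python
# Factorized joint for the 7-node DAG: one factor p(node | parents) per node.
# CPT values below are hypothetical; all variables are binary (0/1).

import itertools

p1_table = 0.3                                 # p(x1 = 1)
p2_table = 0.6                                 # p(x2 = 1)
p3_given_2  = {0: 0.1, 1: 0.7}                 # p(x3 = 1 | x2)
p4_given_12 = {(0, 0): 0.2, (0, 1): 0.5,
               (1, 0): 0.6, (1, 1): 0.9}       # p(x4 = 1 | x1, x2)
p5_given_4  = {0: 0.3, 1: 0.8}                 # p(x5 = 1 | x4)
p6_given_52 = {(0, 0): 0.1, (0, 1): 0.4,
               (1, 0): 0.5, (1, 1): 0.9}       # p(x6 = 1 | x5, x2)
p7_given_4  = {0: 0.25, 1: 0.75}               # p(x7 = 1 | x4)

def bern(p_one, value):
    """p(value) when p(x = 1) = p_one."""
    return p_one if value == 1 else 1 - p_one

def cond(table, value, *parents):
    """p(value | parents) from a table that stores p(x = 1 | parents)."""
    key = parents[0] if len(parents) == 1 else parents
    return bern(table[key], value)

def joint(x1, x2, x3, x4, x5, x6, x7):
    """p(x1, ..., x7) as the product of the local conditionals."""
    return (bern(p1_table, x1) * bern(p2_table, x2)
            * cond(p3_given_2, x3, x2)
            * cond(p4_given_12, x4, x1, x2)
            * cond(p5_given_4, x5, x4)
            * cond(p6_given_52, x6, x5, x2)
            * cond(p7_given_4, x7, x4))

# Sanity check: the joint sums to 1 over all 2^7 assignments.
total = sum(joint(*xs) for xs in itertools.product([0, 1], repeat=7))
print(total)  # ~1.0
```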

Basics on graphical models and statistics: building blocks for graphical models. Density estimation (estimate p(X); parametric and nonparametric methods), regression (estimate p(Y|X); linear, conditional mixture, nonparametric), and classification (generative and discriminative approaches). Bayesian approach: every unknown quantity (including parameters) is treated as a random variable.

Learning from data: model structure and parameter estimation. Complete observations and latent variables. MAP and ML estimation. The EM algorithm. Model selection.

Structure   Observability   Method
Known       Full            ML or MAP estimation
Known       Partial         EM algorithm
Unknown     Full            Model selection or model averaging
Unknown     Partial         EM + model selection or model averaging
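As an illustration of the "known structure, partial observability" row, here is a minimal EM sketch for a two-component, one-dimensional Gaussian mixture, where the mixture label plays the role of the latent variable. The data and starting values are synthetic and purely illustrative; this is not course code.

```python
# Minimal EM for a 2-component 1-D Gaussian mixture (latent variable = component label).

import numpy as np

rng = np.random.default_rng(0)
# Synthetic data drawn from two Gaussians.
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.5, 300)])

# Initial parameter guesses.
pi = np.array([0.5, 0.5])          # mixing weights
mu = np.array([-1.0, 1.0])         # means
sigma = np.array([1.0, 1.0])       # standard deviations

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibilities r[n, k] = p(label = k | x_n, current parameters).
    lik = np.stack([pi[k] * gauss(data, mu[k], sigma[k]) for k in range(2)], axis=1)
    r = lik / lik.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from the expected sufficient statistics.
    nk = r.sum(axis=0)
    pi = nk / len(data)
    mu = (r * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (data[:, None] - mu) ** 2).sum(axis=0) / nk)

print(pi, mu, sigma)   # should approach the generating values
```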

Exact inference: the junction tree and related algorithms. Belief propagation and belief revision. The generalized distributive law. Hidden Markov Models and Kalman filtering with graphical models.

Exact inference: conditional independencies. Given a probability distribution p(X,Y,Z,W), how can we decide whether the groups of variables X and Y are conditionally independent of each other once the value of the variables in Z is assigned? With graphical models, we can implement an algorithm that reduces this global problem to a series of local problems (see the Matlab demo of the Bayes-Ball algorithm).
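The Bayes-Ball demo itself is not reproduced here, but the question it answers can be sketched compactly with the equivalent "ancestral moralization" criterion for d-separation. The Python sketch below is illustrative only (it is not the course's Matlab demo), and the dictionary-of-parents representation of the DAG is a hypothetical choice.

```python
# Check a graph-implied conditional independence X _|_ Y | Z in a DAG, using the
# ancestral-moralization criterion (same answers as Bayes-Ball, different algorithm).
# The DAG is given as {node: set_of_parents}.

from collections import deque

def ancestors(dag, nodes):
    """The given nodes plus all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in dag[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(dag, xs, ys, zs):
    """True if the graph implies Xs independent of Ys given Zs."""
    keep = ancestors(dag, set(xs) | set(ys) | set(zs))
    # Moralize the ancestral subgraph: undirected parent-child edges, and "marry" parents.
    adj = {n: set() for n in keep}
    for child in keep:
        parents = [p for p in dag[child] if p in keep]
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(parents):
            for q in parents[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Remove the conditioning set, then look for any path from Xs to Ys.
    blocked = set(zs)
    frontier, seen = deque(set(xs) - blocked), set(xs) - blocked
    while frontier:
        n = frontier.popleft()
        if n in ys:
            return False                 # found a connecting path: not independent
        for nxt in adj[n] - blocked - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return True

# Example: chain A -> B -> C. A and C are dependent, but independent given B.
dag = {"A": set(), "B": {"A"}, "C": {"B"}}
print(d_separated(dag, {"A"}, {"C"}, set()))   # False
print(d_separated(dag, {"A"}, {"C"}, {"B"}))   # True
```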

Exact inference: variable elimination and the distributive law. (The slide shows a directed graph over the nodes X1, ..., X7.) The joint factorizes as p(x1,...,x7) = p(x1) p(x2) p(x3|x2) p(x4|x1,x2) p(x5|x4) p(x6|x4) p(x7|x2,x5). Marginalize over x7: p(x1,...,x6) = Σ_x7 [ p(x1) p(x2) p(x3|x2) p(x4|x1,x2) p(x5|x4) p(x6|x4) p(x7|x2,x5) ]. Applying a "distributive law": p(x1,...,x6) = p(x1) p(x2) p(x3|x2) p(x4|x1,x2) p(x5|x4) p(x6|x4) Σ_x7 p(x7|x2,x5). The language of graphical models allows a general formalization of this method.
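The saving from the distributive-law step can be checked numerically. The following Python/numpy sketch (not from the course) builds the factors above with random, made-up binary conditional probability tables and verifies that pushing the sum over x7 inward gives the same marginal as summing the full joint.

```python
# Numerical check of the distributive-law step for the factorization on this slide.

import numpy as np

rng = np.random.default_rng(1)

def random_cpt(n_parents):
    """Random CPT with axes (parent_1, ..., parent_k, child), normalized over the child."""
    p1 = rng.random((2,) * n_parents)          # p(child = 1 | parents)
    return np.stack([1 - p1, p1], axis=-1)

p_x1 = random_cpt(0)        # p(x1)
p_x2 = random_cpt(0)        # p(x2)
p_x3_g2 = random_cpt(1)     # p(x3 | x2)
p_x4_g12 = random_cpt(2)    # p(x4 | x1, x2)
p_x5_g4 = random_cpt(1)     # p(x5 | x4)
p_x6_g4 = random_cpt(1)     # p(x6 | x4)
p_x7_g25 = random_cpt(2)    # p(x7 | x2, x5)

# Full joint p(x1,...,x7), with einsum axes a..g standing for x1..x7.
joint = np.einsum('a,b,bc,abd,de,df,beg->abcdefg',
                  p_x1, p_x2, p_x3_g2, p_x4_g12, p_x5_g4, p_x6_g4, p_x7_g25)

# Naive elimination: build the full 7-dimensional table, then sum out x7.
naive = joint.sum(axis=6)

# Distributive law: the sum over x7 touches only p(x7 | x2, x5) (and equals 1),
# so the 7-dimensional table is never needed.
inner = p_x7_g25.sum(axis=2)                   # shape (x2, x5), identically 1
smart = np.einsum('a,b,bc,abd,de,df,be->abcdef',
                  p_x1, p_x2, p_x3_g2, p_x4_g12, p_x5_g4, p_x6_g4, inner)

print(np.allclose(naive, smart))       # True
print(np.isclose(joint.sum(), 1.0))    # True: the joint is normalized
```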

Exact inference: junction graph and message passing. Group random variables that are "fully connected"; connect group-nodes that share members to obtain the "junction graph". Every node only needs to "communicate" with its neighbors. If the junction graph is a tree, there is a "message passing" protocol that allows exact inference.
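For a tree, the protocol reduces to the familiar sum-product recursions. Below is a toy Python sketch (arbitrary made-up potentials, not course code) on a three-node chain, showing that one inward and one outward sweep of messages reproduce the exact marginals.

```python
# Exact message passing on a tree: the chain x1 - x2 - x3 with pairwise potentials.

import numpy as np

rng = np.random.default_rng(2)
phi12 = rng.random((2, 2)) + 0.1     # potential on (x1, x2)
phi23 = rng.random((2, 2)) + 0.1     # potential on (x2, x3)

# Messages: m_{a->b}(x_b) = sum over x_a of phi_ab(x_a, x_b) * (messages into a, except from b).
m1_to_2 = phi12.sum(axis=0)                        # from leaf x1
m3_to_2 = phi23.sum(axis=1)                        # from leaf x3
m2_to_1 = phi12 @ m3_to_2                          # x2 forwards what it heard from x3
m2_to_3 = (m1_to_2[:, None] * phi23).sum(axis=0)   # x2 forwards what it heard from x1

# Beliefs (normalized products of incoming messages) are the exact marginals on a tree.
b1 = m2_to_1 / m2_to_1.sum()
b2 = m1_to_2 * m3_to_2
b2 = b2 / b2.sum()
b3 = m2_to_3 / m2_to_3.sum()

# Check against brute force over the 2^3 joint.
joint = phi12[:, :, None] * phi23[None, :, :]
joint /= joint.sum()
print(np.allclose(b1, joint.sum(axis=(1, 2))))   # True
print(np.allclose(b2, joint.sum(axis=(0, 2))))   # True
print(np.allclose(b3, joint.sum(axis=(0, 1))))   # True
```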

Approximate inference Kullback-Leibler divergence and entropy. Variational methods. Monte Carlo Methods. Loopy junction graphs and loopy belief propagation. Performance of loopy belief propagation. Bethe approximation of free energy and belief propagation.

Approximate inference: Kullback-Leibler divergence and graphical models. The graphical model associated with a probability distribution p(x) is too complicated. And now?! We choose to approximate p(x) with q(x), obtained by making assumptions on the junction graph corresponding to the graphical model. Example: eliminate loops, or bound the number of nodes linked to each node. A good criterion for choosing q: minimize the cross-entropy, or Kullback-Leibler divergence, D(q||p) = Σ_x q(x) log [ q(x) / p(x) ].
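For discrete distributions the divergence is straightforward to evaluate. A short illustrative Python sketch follows; the distributions are made up, and the convention 0 log 0 = 0 is an assumption stated in the code.

```python
# Kullback-Leibler divergence D(q || p) for discrete distributions given as numpy arrays.

import numpy as np

def kl(q, p):
    """D(q || p) = sum_x q(x) log(q(x) / p(x)), with 0 log 0 taken as 0."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

p = np.array([0.70, 0.20, 0.10])          # "true" distribution
q_good = np.array([0.65, 0.25, 0.10])     # a close approximation
q_bad = np.array([0.10, 0.10, 0.80])      # a poor approximation

print(kl(q_good, p))   # small
print(kl(q_bad, p))    # much larger
```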

Approximate inference: variational methods. Example: the QMR-DT database, with diseases d1, d2, d3 and symptoms (findings) f1, f2, f3, f4. The findings are conditionally independent given the diseases, p(f|d) = Π_i p(fi|d), and the diseases have independent priors Π_j p(dj). By using a variational inequality to bound a finding's likelihood, we get an approximation in which the node fi is "unlinked" from the graph.

Approximate inference: loopy belief propagation. When the junction graph is not a tree, message passing is no longer exact: roughly speaking, a message may pass through the same node more than once, causing trouble. However, in many cases, iterated application of the message-passing algorithm converges to a good approximation of the exact solution.
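A minimal sketch of loopy belief propagation on the smallest loopy graph, a triangle of binary variables with made-up potentials, is given below. It is not the course's implementation, just an illustration that the iterated messages give marginals close to, but generally different from, the exact ones; a synchronous update schedule is an arbitrary choice here.

```python
# Loopy belief propagation on a 3-node binary pairwise MRF (a triangle).

import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 3
edges = [(0, 1), (1, 2), (0, 2)]                       # a triangle: one loop
phi = [rng.random(2) + 0.5 for _ in range(n)]          # unary potentials
psi = {}                                               # directed views of each pairwise potential
for (i, j) in edges:
    table = rng.random((2, 2)) + 0.5
    psi[(i, j)] = table                                # axes (x_i, x_j)
    psi[(j, i)] = table.T

neighbors = {i: sorted({j for (a, b) in edges for j in (a, b) if i in (a, b) and j != i})
             for i in range(n)}
m = {(i, j): np.ones(2) / 2 for i in range(n) for j in neighbors[i]}   # messages m_{i->j}(x_j)

for _ in range(100):                                   # synchronous message updates
    new_m = {}
    for (i, j) in m:
        prod_in = phi[i].copy()                        # phi_i(x_i) * messages into i, except from j
        for k in neighbors[i]:
            if k != j:
                prod_in = prod_in * m[(k, i)]
        msg = psi[(i, j)].T @ prod_in                  # sum over x_i of psi(x_i, x_j) * prod_in(x_i)
        new_m[(i, j)] = msg / msg.sum()
    m = new_m

beliefs = np.array([phi[i] * np.prod([m[(k, i)] for k in neighbors[i]], axis=0) for i in range(n)])
beliefs /= beliefs.sum(axis=1, keepdims=True)

# Exact marginals by brute force over the 2^3 assignments, for comparison.
exact = np.zeros((n, 2))
for xs in itertools.product([0, 1], repeat=n):
    w = np.prod([phi[i][xs[i]] for i in range(n)]) * \
        np.prod([psi[(i, j)][xs[i], xs[j]] for (i, j) in edges])
    for i in range(n):
        exact[i, xs[i]] += w
exact /= exact.sum(axis=1, keepdims=True)

print(np.round(beliefs, 3))
print(np.round(exact, 3))    # close, but generally not identical: the graph has a loop
```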