Expressive Graphical Models in Variational Approximations: Chain-Graphs and Hidden Variables
Tal El-Hay & Nir Friedman
School of Computer Science & Engineering, Hebrew University

Inference in Graphical Models
- Exact inference is NP-hard in general, but can be efficient for certain classes of models.
- What do we do when exact inference is intractable? Resort to approximate methods.
- Approximate inference is also NP-hard, but specific approximation methods work for specific classes of models.
- Hence the need to enrich the available approximate methods.

Variational Approximations
- Approximate the posterior of a complex model using a simpler distribution.
- The choice of the simpler model determines the method: mean field, structured approximations, or mixture models.

Enhancing Variational Approximations
- Basic tradeoff: accuracy vs. complexity.
- Goal: new families of approximating distributions with a better tradeoff.

Outline
- Structured variational approximations [review]
- Using chain graphs
- Adding hidden variables
- Discussion

Structured Approximations
- Target model: a complex distribution P(T, o) over hidden variables T and observations o, given by a graphical model whose posterior P(T | o) is intractable.
- Approximation: a simpler, tractable distribution Q(T) with a sparser dependency structure, e.g. Q(T) = ∏_i Q(T_i | Pa_i), where the parent sets Pa_i come from the approximating network.

Structured Approximations
- Goal: maximize the functional F[Q] = E_Q[log P(T, o)] - E_Q[log Q(T)] = log P(o) - KL(Q(T) || P(T | o)).
- Since the KL distance is >= 0, F[Q] is a lower bound on the log-likelihood log P(o).
- If Q is tractable then F[Q] might be tractable as well.
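For concreteness, a minimal Python sketch (not from the slides; the model and all numbers are hypothetical) that evaluates F[Q] for a toy two-variable binary model with a fully factorized Q and compares it to log P(o):

    import numpy as np

    # Hypothetical toy joint for a fixed observation o:
    # p_joint[t1, t2] = P(t1, t2, o); the entries sum to P(o) = 0.6.
    p_joint = np.array([[0.15, 0.05],
                        [0.10, 0.30]])

    def lower_bound(q1, q2):
        """F[Q] = E_Q[log P(T, o)] - E_Q[log Q(T)] for Q(t1, t2) = q1(t1) q2(t2)."""
        f = 0.0
        for t1 in (0, 1):
            for t2 in (0, 1):
                q = q1[t1] * q2[t2]
                f += q * (np.log(p_joint[t1, t2]) - np.log(q))
        return f

    q1 = np.array([0.4, 0.6])
    q2 = np.array([0.3, 0.7])
    print(lower_bound(q1, q2), "<=", np.log(p_joint.sum()))  # F[Q] <= log P(o)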

Structured Approximations
- To characterize the maximum point we define a generalized functional that adds Lagrange multipliers for the normalization constraints on the local distributions of Q.
- Differentiation yields fixed-point equations: each local distribution of Q is set to the choice that maximizes the lower bound given the rest of Q, i.e., it approximates the corresponding local distribution of P through the lower bound.
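As an illustration, in the fully factorized special case (plain mean field, which the structured update generalizes) the fixed-point equation takes the familiar form

    \[
      Q(t_i) \;\propto\; \exp\Big( \mathbb{E}_{Q(\mathbf{T}_{-i})}\big[\log P(t_i, \mathbf{T}_{-i}, \mathbf{o})\big] \Big),
    \]

so each factor of Q matches the geometric average of P under the remaining factors.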

Structured Approximations: Optimization
- Asynchronous (one-variable-at-a-time) updates guarantee convergence, since each update can only increase F[Q].
- The update formulas can be calculated efficiently by running inference in the tractable network Q.
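A minimal sketch of such asynchronous updates, reusing the hypothetical fully factorized toy model from the previous sketch (the paper's updates operate on richer structures):

    import numpy as np

    # Same hypothetical toy joint P(t1, t2, o) as above.
    p_joint = np.array([[0.15, 0.05],
                        [0.10, 0.30]])
    log_p = np.log(p_joint)

    q1 = np.array([0.5, 0.5])
    q2 = np.array([0.5, 0.5])
    for _ in range(50):
        # Update q1 with q2 held fixed: q1(t1) ∝ exp(E_{q2}[log P(t1, t2, o)]),
        # then the symmetric update for q2; each step can only increase F[Q].
        q1 = np.exp(log_p @ q2); q1 /= q1.sum()
        q2 = np.exp(q1 @ log_p); q2 /= q2.sum()

    print(q1, q2)  # a local maximum of the lower bound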

Chain Graph Approximations
- Posterior distributions can be modeled as chain graphs: conditioning a graphical model on evidence turns each local conditional into a potential, so P(T | o) = (1/Z) ∏_i φ_i(D_i), a normalized product of potentials over subsets of T.

Chain Graph Approximations
- Chain graph distributions factor as Q(T) = ∏_K Q(K | Pa(K)), where each chain component K is a locally normalized product of potentials, Q(K | Pa(K)) = (1/Z(Pa(K))) ∏_j φ_j(D_j), and the φ_j are potential functions on subsets of T.
- Chain graphs generalize both Bayesian networks and Markov networks.
- A simple approximation example: [figure].
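A small numeric sketch of the chain-component factorization (all potentials hypothetical): a component K = (A, B) with parent C, whose conditional is a locally normalized product of potentials:

    import numpy as np

    phi_ab = np.array([[2.0, 1.0], [1.0, 3.0]])  # potential on (A, B)
    phi_ac = np.array([[1.0, 2.0], [4.0, 1.0]])  # potential on (A, C)

    def q_component(c):
        """Q(A, B | C=c) = phi_ab(a, b) * phi_ac(a, c) / Z(c)."""
        unnorm = phi_ab * phi_ac[:, [c]]  # broadcast phi_ac(a, c) over b
        return unnorm / unnorm.sum()      # local normalization Z(c)

    # Together with a root distribution Q(C), this defines a chain graph over (A, B, C).
    for c in (0, 1):
        print(c, q_component(c))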

Chain Graph Approximations: Optimization
- The same machinery applies: each potential φ_j is updated in turn so as to maximize F[Q] given the others, with the local normalizations Z(Pa(K)) recomputed accordingly.

Adding Hidden Variables
- Potential pitfall: multi-modal posterior distributions.
- Jaakkola & Jordan: use mixture models, under the modeling assumption of factorized mixture components.
- Generalization: a structured approximation with an extra set of hidden variables V.
- Approximating distribution: Q(T) = Σ_V Q(V) Q(T | V).

Adding Hidden Variables: Prospects
- Lower bound improvement potential: F[Q] = Σ_V Q(V) F[Q(· | V)] + I(T;V), where I(T;V) is the mutual information between T and the extra hidden variables.
- The extra variables can capture correlations in a compact manner.
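The identity above can be checked numerically; a self-contained sketch with a single discrete hidden variable T (4 states) and a two-component mixture, all numbers hypothetical:

    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    p_joint = np.array([0.05, 0.25, 0.30, 0.10])  # hypothetical P(t, o); sums to P(o)

    def F(q):
        """F[Q] = E_Q[log P(t, o)] + H(Q)."""
        return (q * np.log(p_joint)).sum() + entropy(q)

    q_v = np.array([0.5, 0.5])                    # mixture weights Q(v)
    q_t_v = np.array([[0.70, 0.10, 0.10, 0.10],   # components Q(t | v)
                      [0.05, 0.05, 0.70, 0.20]])
    q_mix = q_v @ q_t_v                           # Q(t) = sum_v Q(v) Q(t | v)

    mi = entropy(q_mix) - sum(q_v[v] * entropy(q_t_v[v]) for v in (0, 1))  # I(T;V)
    print(F(q_mix))                                        # mixture bound
    print(sum(q_v[v] * F(q_t_v[v]) for v in (0, 1)) + mi)  # equals it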

Relaxing the Lower Bound
- Rewriting the lower bound on the log-likelihood as F[Q] = Σ_V Q(V) F[Q(· | V)] + H(V) - H(V | T), where H(V | T) is the conditional entropy of V given T.
- The conditional entropy does not decompose, so the lower bound is intractable.

Relaxing the Lower Bound
- Using a convexity bound: for any distribution R, E_Q[log Q(V | T)] >= E_Q[log R(V | T)], which gives a tractable lower bound on the negative conditional entropy -H(V | T).
- Introducing R(V | T) as extra variational parameters, the relaxed lower bound becomes tractable.
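A numeric check of the convexity bound, continuing the hypothetical mixture above: for any smoothing distribution R(V | T), E_Q[log R(V | T)] never exceeds E_Q[log Q(V | T)] = -H(V | T):

    import numpy as np

    q_tv = np.array([[0.350, 0.050, 0.050, 0.050],   # hypothetical joint Q(t, v);
                     [0.025, 0.025, 0.350, 0.100]])  # rows v, columns t; sums to 1
    q_v_given_t = q_tv / q_tv.sum(axis=0)            # exact Q(v | t)

    def expected_log(r):
        """E_Q[log r(V | T)] = sum_{t,v} Q(t, v) log r(v | t)."""
        return (q_tv * np.log(r)).sum()

    r = np.random.default_rng(0).random((2, 4))
    r /= r.sum(axis=0)                               # arbitrary smoothing R(v | t)
    # The bound is tight at R = Q(V | T):
    print(expected_log(q_v_given_t), ">=", expected_log(r))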

Optimization
- Bayesian network parameters of Q: updated via fixed-point equations, as before.
- Smoothing parameters R(V | T): updated to tighten the entropy bound.
- Asynchronous updates guarantee convergence.

Results
[Figure: KL bound as a function of the number of time slices.]

Discussion
- Extending the representational features of approximating distributions: does it yield a better tradeoff?
- The addition of hidden variables improves the approximation.
- The derivations of the different methods use a uniform machinery.

Future directions
- Saving computations by planning the order of updates.
- Choosing the structure of the approximating distribution.