Variational Inference Amr Ahmed Nov. 6th, 2008

Outline: Approximate inference – Variational inference formulation – Mean field (examples) – Structured VI (examples)

Approximate Inference
– Exact inference is exponential in clique size
– The posterior is often highly peaked, so it can be approximated by a simpler distribution
– Formulate inference as an optimization problem:
  – Define an objective: how good is Q?
  – Define a family of simpler distributions to search over
  – Find the Q* that best approximates P
[Figure: within the space of all distributions, Q* is the member of the tractable family closest to P]
– Today we will cover variational inference – just one possible way of making such a formulation
– There are many other ways – e.g., variants of loopy BP (later in the semester)

What is Variational Inference?
[Figure: the student network P (Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy, Coherence) next to a simplified network Q over the same variables]
– P: the posterior is "hard" to compute, but we know the CPTs (factors)
– Q: a tractable posterior family; setting its CPTs (factors) gives a fully instantiated distribution
– Q* enables exact inference: it has low tree-width, so clique trees, variable elimination, etc. apply

VI Questions
– Which family to choose? Basically we want to remove some edges
  – Mean field: fully factorized
  – Structured: keep tractable sub-graphs
– How to carry out the optimization? Assume P is a Markov network – a product of factors (that is all)

Outline: Approximate inference – Variational inference formulation – Mean field (examples) – Structured VI (examples)

Mean-field Variational Inference
[Figure: the student network P next to a mean-field Q over the same variables]
– P: the posterior is "hard" to compute, but we know the CPTs (factors)
– Q: a tractable posterior family; setting its CPTs (factors) gives a fully instantiated distribution
– Q* enables exact inference

D(Q||P) for Mean Field
– We use the KL divergence in the "reverse" direction, D(Q||P)
– Its cross-entropy term is an expectation of ln P under Q – which is what makes it usable here
[Figure: sketches of p and q]
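
For concreteness, a minimal expansion of the reverse KL (using the notation of the next slide: $\tilde P$ is the unnormalized product of factors and $Z$ its partition function):

```latex
D(Q\|P) \;=\; \mathbb{E}_Q[\ln Q(\mathcal{X})] - \mathbb{E}_Q[\ln P(\mathcal{X})]
        \;=\; -H_Q(\mathcal{X}) \;-\; \mathbb{E}_Q[\ln \tilde P(\mathcal{X})] \;+\; \ln Z
```

Both expectations are under Q, which is tractable by construction; in the forward direction D(P||Q) they would be under P, which is exactly the distribution we cannot handle.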

The Energy Functional
– Theorem: $\ln Z = F[\tilde P, Q] + D(Q\|P)$, where the energy functional is $F[\tilde P, Q] = \mathbb{E}_Q[\ln \tilde P(\mathcal{X})] + H_Q(\mathcal{X})$
– Our problem "minimize $D(Q\|P)$" therefore becomes "maximize $F[\tilde P, Q]$"
– Since $D(Q\|P) \ge 0$, you get a lower bound on $\ln Z$: maximizing $F[\tilde P, Q]$ tightens the bound and gives better probability estimates
– $F[\tilde P, Q]$ is tractable by construction (all expectations are under $Q$)

The Energy Functional (cont.)
– Our problem is now: maximize $F[\tilde P, Q]$ over the fully factorized family $Q(\mathcal{X}) = \prod_j Q(X_j)$
– Theorem: $Q$ is a stationary point of the mean field approximation iff for each $j$:
  $Q(x_j) = \frac{1}{Z_j} \exp\Big\{ \sum_{\phi:\, X_j \in \mathrm{Scope}[\phi]} \mathbb{E}_{Q}\big[\ln \phi \mid x_j\big] \Big\}$
– Tractable by construction: each expectation is over the factorized $Q$ of the remaining variables in the factor's scope

Outline: Approximate inference – Variational inference formulation – Mean field (examples) – Structured VI (examples)

Example
[Figure: P is a pairwise Markov network over a grid of variables X1–X12; Q is a fully factorized MN over the same variables]
– Given the factors in P, we want to get the factors of Q
– Iterative procedure: fix all q_{-i}, compute q_i via the above equation
– Iterate until you reach a fixed point

Example (cont.)
[Figure: the same grid, with X1 highlighted]
– The update for X1 is in your homework

Example: Intuition
[Figure: the same grid, with X6 and its update highlighted]
– q(X6) gets tied to q(X_i) for every X_i that appears in a factor with X6 in P
– i.e., the fixed point equations are coupled according to the structure of P
– What q(X6) receives are expectations, under those q(X_i), of what each factor looks like
– This can also be interpreted as message passing (but we won't cover that here)

Example: Intuition (cont.)
[Figure: only X6 and its neighbors in P (X2, X5, X7, X10) remain relevant to the update of q(X6)]
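
The iterative procedure from these example slides is short to code. Below is a minimal sketch (not from the slides) of mean-field coordinate ascent on a small pairwise grid MN like the one in the figures; the grid size, number of states, and random potentials are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: a 3x4 grid pairwise MN with K-state variables.
# Node potentials phi_i(x_i) and edge potentials phi_ij(x_i, x_j) are random.
rng = np.random.default_rng(0)
H, W, K = 3, 4, 2
nodes = [(r, c) for r in range(H) for c in range(W)]
edges = [((r, c), (r, c + 1)) for r in range(H) for c in range(W - 1)] + \
        [((r, c), (r + 1, c)) for r in range(H - 1) for c in range(W)]
log_node = {i: rng.normal(size=K) for i in nodes}          # ln phi_i
log_edge = {e: rng.normal(size=(K, K)) for e in edges}     # ln phi_ij

# Fully factorized Q: one categorical q_i per variable, initialized uniformly.
q = {i: np.full(K, 1.0 / K) for i in nodes}

def update(i):
    """Mean-field fixed-point update for q_i:
    ln q_i(x_i) = ln phi_i(x_i) + sum_{j in N(i)} E_{q_j}[ln phi_ij(x_i, X_j)] + const."""
    s = log_node[i].copy()
    for (a, b), lp in log_edge.items():
        if a == i:                      # factor phi_{i,b}
            s += lp @ q[b]
        elif b == i:                    # factor phi_{a,i}
            s += lp.T @ q[a]
    s -= s.max()                        # numerical stability before exponentiating
    p = np.exp(s)
    return p / p.sum()

# Coordinate ascent: sweep over variables until the q_i's stop changing.
for sweep in range(100):
    delta = 0.0
    for i in nodes:
        new = update(i)
        delta = max(delta, np.abs(new - q[i]).max())
        q[i] = new
    if delta < 1e-6:
        break

print({i: np.round(q[i], 3) for i in list(q)[:3]})
```

Each sweep does not decrease the energy functional F[P̃, Q], so the procedure converges to a (local) fixed point of the equations above.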

Outline: Approximate inference – Variational inference formulation – Mean field (examples) – Structured VI (examples) – Inference in LDA (time permitting)

Structured Variational Inference
[Figure: the student network P next to a sparser, but not fully disconnected, network Q over the same variables]
– P: the posterior is "hard" to compute, but we know the CPTs (factors)
– Q: a tractable posterior family; setting its CPTs (factors) gives a fully instantiated distribution
– Q* enables exact inference: it has low tree-width
– You don't have to remove all edges – just keep tractable sub-graphs: trees, chains

Structured Variational Inference (cont.)
[Figure: P is the pairwise grid MN over X1–X12; Q keeps only a tractable subset of its edges]
– Goal: given the factors in P, get the factors of Q that maximize the energy functional
– Q is still tractable by construction

Fixed Point Equations
[Figure: the grid P and the tractable Q]
– The update for a factor over C_j in Q involves the factors whose scope is not independent of C_j in Q

Fixed Point Equations (cont.)
[Figure: the Q factor over (X1, X5) is highlighted]
– The update for the factor over C_j involves the factors whose scope is not independent of C_j in Q
– What are the factors in Q that interact with X1, X5?

Fixed Point Equations (cont.)
[Figure: the factor (X1, X5) and its neighborhood in the grid]
– What are the factors in P whose scope is not separated from X1, X5 in Q?
  – First, get the variables that cannot be separated from X1 and X5 in Q
  – Second, collect the factors of P in which those variables appear

Fixed Point Equations (cont.)
[Figure: the variables not separated from X1, X5 in Q, and the P factors over them, are highlighted]
– The expectation of each such factor is taken over Q(Scope[factor] | c_j)
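
A minimal sketch (not from the slides) of the two-step procedure above on the grid example. It assumes, purely for illustration, that Q keeps only the vertical chains of the grid (X1–X5–X9, X2–X6–X10, ...); with that assumption it reproduces the factor sets listed on the next slide.

```python
def connected_vars(edges, seeds):
    """Variables reachable from `seeds` in the graph given by `edges`
    (i.e., the variables that cannot be separated from the seeds)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, stack = set(seeds), list(seeds)
    while stack:
        v = stack.pop()
        for u in adj.get(v, ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

# P: pairwise grid MN over X1..X12 (3 rows x 4 columns of variables).
grid = {(r, c): 4 * r + c + 1 for r in range(3) for c in range(4)}
p_edges = [(grid[r, c], grid[r, c + 1]) for r in range(3) for c in range(3)] + \
          [(grid[r, c], grid[r + 1, c]) for r in range(2) for c in range(4)]

# Q: illustrative structured choice -- keep only the vertical chains.
q_edges = [(grid[r, c], grid[r + 1, c]) for r in range(2) for c in range(4)]

target = {1, 5}                                   # scope of the Q factor over (X1, X5)
reachable = connected_vars(q_edges, target)       # step 1: {1, 5, 9}
A = [e for e in p_edges if set(e) & reachable]    # step 2: P factors touching them
B = [e for e in q_edges if set(e) & reachable and set(e) != target]  # other Q factors

print("not separated from {1, 5} in Q:", sorted(reachable))
print("factors of P involved:", sorted(A))
print("other factors of Q involved:", sorted(B))
```

The helper names and the choice of chain-structured Q are assumptions for the sketch; the expectations of the collected P factors are then taken under Q(Scope[factor] | c_j), as the slide states.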

Fixed Point Equations (cont.)
[Figure: the relevant factors are highlighted in P and Q]
– For the Q factor over (X1, X5), two sets of factors appear in its update (labelled A and B on the slide): the Q factor over (X5, X9), and the P factors over (X1, X5), (X5, X9), (X1, X2), (X5, X6), (X9, X10)

Some Intuition
[Figure: the grid P and the structured Q, with the factor (X1, X5) highlighted]
– What have we picked from P when dealing with the first factor (X1, X5)? The factors that interact with (X1, X5) in both P and Q
– The fixed point equations are coupled in the same way these variables are coupled in P and Q
– This can also be interpreted as passing messages over Q carrying expectations of these factors
– Question: how do we compute these marginals in Q efficiently? Homework!
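
Since the slide leaves that question open, here is a minimal, purely illustrative sketch of one standard way to get the marginals a chain-structured Q needs: forward–backward (sum–product) message passing along the chain. The chain length, number of states, and potential values are assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5, 3                                   # chain X0-...-X4, K states each
node = rng.random((n, K)) + 0.1               # psi_i(x_i)
edge = rng.random((n - 1, K, K)) + 0.1        # psi_{i,i+1}(x_i, x_{i+1})

# Forward messages: alpha[i] accumulates everything up to and including node i.
alpha = np.zeros((n, K))
alpha[0] = node[0]
for i in range(1, n):
    alpha[i] = node[i] * (alpha[i - 1] @ edge[i - 1])

# Backward messages: beta[i] accumulates everything strictly to the right of i.
beta = np.ones((n, K))
for i in range(n - 2, -1, -1):
    beta[i] = edge[i] @ (node[i + 1] * beta[i + 1])

# Singleton marginals Q(x_i) and pairwise marginals Q(x_i, x_{i+1}).
single = alpha * beta
single /= single.sum(axis=1, keepdims=True)

pair = np.zeros((n - 1, K, K))
for i in range(n - 1):
    m = alpha[i][:, None] * edge[i] * (node[i + 1] * beta[i + 1])[None, :]
    pair[i] = m / m.sum()

print(np.round(single, 3))
```

Conditioning on some assignment c_j (as in Q(Scope[factor] | c_j)) amounts to clamping the corresponding node potentials to the observed values before running the same two passes.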

Variational Inference in Practice
– Highly used: fast and efficient
– Approximation quality might not be that good:
  – Answers are always peaked (we are using the "wrong" KL)
  – It captures modes of P, but the spread of P is usually lost
– Famous applications of VI:
  – Topic models (LDA): you can derive the LDA equations in a few lines using the fully factorized update equations (exercise); we might go over it in a special recitation, but try it yourself if you are doing a project on topic models
  – Involved temporal models
  – Many more…