Probabilistic Inference Lecture 1

Probabilistic Inference Lecture 1 M. Pawan Kumar pawan.kumar@ecp.fr Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/

About the Course 7 lectures + 1 exam Probabilistic Models – 1 lecture Energy Minimization – 4 lectures Computing Marginals – 2 lectures Related Courses Probabilistic Graphical Models (MVA) Structured Prediction

Instructor Assistant Professor (2012 – Present) Center for Visual Computing 12 Full-time Faculty Members 2 Associate Faculty Members Research Interests Probabilistic Models Machine Learning Computer Vision Medical Image Analysis

Students Third year at ECP Specializing in Machine Learning and Vision Prerequisites Probability Theory Continuous Optimization Discrete Optimization

Outline Probabilistic Models Conversions Exponential Family Inference Example (on board) !!

Outline Probabilistic Models Markov Random Fields (MRF) Bayesian Networks Factor Graphs Conversions Exponential Family Inference

MRF Unobserved Random Variables Neighbors Edges define a neighborhood over random variables

MRF Variable Va takes a value, or label, va from a discrete and finite set L = {l1, l2,…, lh}. The joint assignment V = v is called a labeling.

MRF An MRF assumes the Markovian property for P(v). (Figure: grid of unobserved variables V1, V2, … connected by neighborhood edges.)

MRF Va is conditionally independent of any non-neighboring variable Vb given Va’s neighbors (Hammersley-Clifford Theorem).

MRF Potential ψ12(v1,v2) Potential ψ56(v5,v6) Probability P(v) can be decomposed into clique potentials

MRF Potential ψ1(v1,d1) Observed Data Probability P(v) proportional to Π(a,b) ψab(va,vb) Probability P(d|v) proportional to Πa ψa (va,da)

MRF Probability P(v,d) = Πa ψa(va,da) Π(a,b) ψab(va,vb) / Z, where Z is known as the partition function.

MRF High-order potential ψ4578(v4,v5,v7,v8). (Figure: grid MRF with observed data da attached to each variable Va.)

Pairwise MRF Unary potential ψ1(v1,d1), pairwise potential ψ56(v5,v6). Probability P(v,d) = Πa ψa(va,da) Π(a,b) ψab(va,vb) / Z, where Z is known as the partition function.
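
To make the factorization concrete, here is a minimal Python sketch (not from the slides: the three-variable chain and all potential tables are made up) that evaluates P(v,d) by multiplying the unary and pairwise potentials and normalizing with a brute-force partition function Z:

```python
# Minimal sketch (hypothetical model): a pairwise MRF over three binary variables.
# Unary potentials psi_a(v_a, d_a) have the observed data d_a already folded in;
# Z is computed by brute-force enumeration of all labelings.
from itertools import product

labels = [0, 1]
nodes = [0, 1, 2]                  # unobserved variables
edges = [(0, 1), (1, 2)]           # neighborhood structure

unary = {a: {0: 1.0, 1: 2.0} for a in nodes}                 # psi_a(v_a, d_a)
pairwise = {e: {(0, 0): 3.0, (0, 1): 1.0,
                (1, 0): 1.0, (1, 1): 3.0} for e in edges}    # psi_ab(v_a, v_b)

def unnormalised(v):
    """Product of all clique potentials for the labeling v."""
    p = 1.0
    for a in nodes:
        p *= unary[a][v[a]]
    for (a, b) in edges:
        p *= pairwise[(a, b)][(v[a], v[b])]
    return p

Z = sum(unnormalised(v) for v in product(labels, repeat=len(nodes)))
print("P(v, d) =", unnormalised((0, 1, 0)) / Z)              # probability of one labeling
```

Enumerating Z this way is exponential in the number of variables, which is exactly why the inference algorithms of the later lectures are needed.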

MRF A is conditionally independent of B given C if there is no path from A to B when C is removed

Conditional Random Fields (CRF) A CRF assumes the Markovian property for P(v|d) (Hammersley-Clifford Theorem). (Figure: grid of variables V1, …, V9 with observed data attached to each variable.)

CRF Probability P(v|d) proportional to Πa ψa(va;d) Π(a,b) ψab(va,vb;d) Clique potentials that depend on the data

CRF Probability P(v|d) = Πa ψa(va;d) Π(a,b) ψab(va,vb;d) / Z, where Z is known as the partition function.

MRF and CRF Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z. (Figure: variables V1, V2, V3.)

Outline Probabilistic Models Markov Random Fields (MRF) Bayesian Networks Factor Graphs Conversions Exponential Family Inference

Bayesian Networks Directed Acyclic Graph (DAG) – no directed loops. Ignoring the directionality of the edges, a DAG can still have loops. (Figure: DAG over variables V1, …, V8.)

Bayesian Networks A Bayesian network concisely represents the probability P(v). (Figure: DAG over variables V1, …, V8.)

Bayesian Networks Probability P(v) = Πa P(va|Parents(va)). For the example DAG above: P(v1)P(v2|v1)P(v3|v1)P(v4|v2)P(v5|v2,v3)P(v6|v3)P(v7|v4,v5)P(v8|v5,v6).
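
As an illustration of the factorization P(v) = Πa P(va|Parents(va)), here is a small sketch with hypothetical conditional probability tables for a three-node network V1 → V2, V1 → V3; unlike the MRF case, no partition function is needed because the local conditionals are already normalized:

```python
# Minimal sketch (hypothetical CPTs): evaluating P(v) = prod_a P(v_a | Parents(v_a)).
parents = {"V1": [], "V2": ["V1"], "V3": ["V1"]}

# Conditional probability tables, indexed by (value, tuple of parent values).
cpt = {
    "V1": {(0, ()): 0.6, (1, ()): 0.4},
    "V2": {(0, (0,)): 0.7, (1, (0,)): 0.3, (0, (1,)): 0.2, (1, (1,)): 0.8},
    "V3": {(0, (0,)): 0.5, (1, (0,)): 0.5, (0, (1,)): 0.9, (1, (1,)): 0.1},
}

def joint(assignment):
    """P(v) as the product of the local conditionals."""
    p = 1.0
    for var, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p *= cpt[var][(assignment[var], pa_vals)]
    return p

print(joint({"V1": 1, "V2": 0, "V3": 1}))   # 0.4 * 0.2 * 0.1 = 0.008
```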

Bayesian Networks Courtesy Kevin Murphy

Bayesian Networks Va is conditionally independent of its ancestors given its parents. (Figure: DAG over variables V1, …, V8.)

Bayesian Networks Conditional independence of A and B given C Courtesy Kevin Murphy

Outline Probabilistic Models Markov Random Fields (MRF) Bayesian Networks Factor Graphs Conversions Exponential Family Inference

Factor Graphs Two types of nodes: variable nodes and factor nodes Bipartite graph between the two types of nodes

Factor Graphs A factor graph concisely represents the probability P(v): each factor node holds a potential over the variables in its scope, for example ψa(v1,v2) and ψb(v2,v3), written in general as ψa({v}a) and ψb({v}b). (Figure: variable nodes V1–V6 connected to factor nodes a–g.)

Factor Graphs Probability P(v) = Πa ψa({v}a) / Z, where Z is known as the partition function.
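
One possible way to hold a factor graph in code is as a list of (scope, table) pairs, one per factor node; the sketch below (entirely hypothetical factors) evaluates P(v) as the product of the factor potentials divided by a brute-force Z:

```python
# Minimal sketch (hypothetical factors): a factor graph over three binary variables.
from itertools import product

labels = [0, 1]
n_vars = 3
factors = [
    ((0, 1), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}),  # psi_a(v0, v1)
    ((1, 2), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}),  # psi_b(v1, v2)
    ((2,),   {(0,): 1.0, (1,): 3.0}),                                 # psi_c(v2)
]

def score(v):
    """Product of all factor potentials for the labeling v."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(v[i] for i in scope)]
    return p

Z = sum(score(v) for v in product(labels, repeat=n_vars))
print("P(0,1,1) =", score((0, 1, 1)) / Z)
```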

Outline Probabilistic Models Conversions Exponential Family Inference

MRF to Factor Graphs

Bayesian Networks to Factor Graphs

Factor Graphs to MRF

Outline Probabilistic Models Conversions Exponential Family Inference

Motivation Random variable V, label set L = {l1, l2,…, lh}. Samples V1, V2, …, Vm that are i.i.d. Functions ϕα: L → Reals, where α indexes a set of functions. Empirical expectations: μα = (Σi ϕα(Vi))/m. Expectation w.r.t. a distribution P: EP[ϕα(V)] = Σi ϕα(li)P(li). Given the empirical expectations, find a compatible distribution; the problem is underdetermined.

Maximum Entropy Principle max Entropy of the distribution s.t. Distribution is compatible

Maximum Entropy Principle max -Σi P(li)log(P(li)) s.t. Distribution is compatible

Maximum Entropy Principle max -Σi P(li)log(P(li)) s.t. Σi ϕα(li)P(li) = μα for all α Σi P(li) = 1 P(v) proportional to exp(-Σα θαϕα(v))
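
For completeness, a brief sketch of the Lagrangian argument behind the last line, with multipliers θα for the moment constraints and λ for normalization:

```latex
\begin{align}
\mathcal{L} &= -\sum_i P(l_i)\log P(l_i)
  - \sum_\alpha \theta_\alpha\Big(\sum_i \phi_\alpha(l_i)P(l_i) - \mu_\alpha\Big)
  - \lambda\Big(\sum_i P(l_i) - 1\Big) \\
\frac{\partial \mathcal{L}}{\partial P(l_i)} &=
  -\log P(l_i) - 1 - \sum_\alpha \theta_\alpha \phi_\alpha(l_i) - \lambda = 0 \\
\Rightarrow\quad P(l_i) &\propto \exp\Big(-\sum_\alpha \theta_\alpha \phi_\alpha(l_i)\Big)
\end{align}
```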

Exponential Family Random variable V = {V1, V2, …, Vn}, label set L = {l1, l2,…, lh}. Labeling V = v with va ∈ L for all a ∈ {1, 2,…, n}. Functions Φα: Ln → Reals, where α indexes a set of functions. P(v) = exp{-Σα θαΦα(v) - A(θ)}, with parameters θα, sufficient statistics Φα(v) and normalization constant A(θ).

Minimal Representation P(v) = exp{-Σα θαΦα(v) - A(θ)}, with parameters θα, sufficient statistics Φα(v) and normalization constant A(θ). The representation is minimal: there is no non-zero c such that Σα cαΦα(v) is constant for all v.

Ising Model P(v) = exp{-Σα θαΦα(v) - A(θ)} Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2}

Ising Model P(v) = exp{-Σα θαΦα(v) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {-1, +1}, neighborhood over the variables specified by edges E. Sufficient statistics and parameters: va with parameter θa for all Va ∈ V; vavb with parameter θab for all (Va,Vb) ∈ E.

Ising Model P(v) = exp{-Σa θava - Σa,b θabvavb - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {-1, +1}, neighborhood over the variables specified by edges E. Sufficient statistics and parameters: va with parameter θa for all Va ∈ V; vavb with parameter θab for all (Va,Vb) ∈ E.
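
The following sketch (made-up parameter values on a three-variable chain) evaluates the Ising probability exactly by enumerating all 2^3 labelings; this brute-force computation of A(θ) is feasible only for toy models:

```python
# Minimal sketch: P(v) = exp{-sum_a theta_a v_a - sum_ab theta_ab v_a v_b - A(theta)}
# on a 3-node chain with labels {-1, +1}; all parameter values are made up.
import math
from itertools import product

nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
theta_unary = {0: 0.5, 1: -0.2, 2: 0.1}
theta_pair = {(0, 1): -0.3, (1, 2): -0.3}

def neg_energy(v):
    """-sum_a theta_a v_a - sum_ab theta_ab v_a v_b for the labeling v."""
    s = -sum(theta_unary[a] * v[a] for a in nodes)
    s -= sum(theta_pair[(a, b)] * v[a] * v[b] for (a, b) in edges)
    return s

# log-partition function by brute force over the 8 labelings
A = math.log(sum(math.exp(neg_energy(v)) for v in product([-1, 1], repeat=3)))
print("P(v) =", math.exp(neg_energy((1, 1, -1)) - A))
```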

Interactive Binary Segmentation

Interactive Binary Segmentation Foreground histogram of RGB values FG Background histogram of RGB values BG ‘+1’ indicates foreground and ‘-1’ indicates background

Interactive Binary Segmentation More likely to be foreground than background

Interactive Binary Segmentation θa proportional to -log(FG(da)) + log(BG(da)) More likely to be background than foreground

Interactive Binary Segmentation More likely to take the same label

Interactive Binary Segmentation θab proportional to -exp(-(da-db)^2). Less likely to take the same label.
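
Putting the two proportionalities together, a minimal sketch of the segmentation parameters might look as follows; the one-dimensional "image" and the FG/BG likelihood functions standing in for the colour histograms are invented for illustration:

```python
# Minimal sketch (hypothetical data): Ising parameters for segmentation, following
#   theta_a  proportional to -log FG(d_a) + log BG(d_a)
#   theta_ab proportional to -exp(-(d_a - d_b)^2)
import math

d = [0.9, 0.85, 0.2, 0.15]                     # pretend 1-D image of grey values
edges = [(a, a + 1) for a in range(len(d) - 1)]  # adjacent pixels are neighbours

def FG(x):   # hypothetical foreground likelihood (bright pixels)
    return 0.2 + 0.8 * x

def BG(x):   # hypothetical background likelihood (dark pixels)
    return 1.0 - 0.8 * x

theta_unary = {a: -math.log(FG(d[a])) + math.log(BG(d[a])) for a in range(len(d))}
theta_pair = {(a, b): -math.exp(-(d[a] - d[b]) ** 2) for (a, b) in edges}

print(theta_unary)   # negative for bright (foreground-like) pixels
print(theta_pair)    # strong coupling between similar neighbouring pixels
```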

Rest of lecture 1 ….

Exponential Family P(v) = exp{-Σα θαΦα(v) - A(θ)}, with parameters θα, sufficient statistics Φα(v) and log-partition function A(θ). Random variables V = {V1, V2,…, Vn}; each Va takes a value or label va ∈ L = {l1, l2,…, lh}; V = v is a labeling.

Overcomplete Representation P(v) = exp{-Σα θαΦα(v) - A(θ)}, with parameters θα, sufficient statistics Φα(v) and log-partition function A(θ). The representation is overcomplete: there exists a non-zero c such that Σα cαΦα(v) is constant for all v.

Ising Model P(v) = exp{-Σα θαΦα(v) - A(θ)} Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2}

Ising Model P(v) = exp{-Σα θαΦα(v) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {0, 1}, neighborhood over the variables specified by edges E. Sufficient statistics and parameters: Ia;i(va) with parameter θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) with parameter θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L. Ia;i(va): indicator for va = li. Iab;ik(va,vb): indicator for va = li, vb = lk.

Ising Model P(v) = exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {0, 1}, neighborhood over the variables specified by edges E. Sufficient statistics and parameters: Ia;i(va) with parameter θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) with parameter θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L. Ia;i(va): indicator for va = li. Iab;ik(va,vb): indicator for va = li, vb = lk.
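
The overcomplete sufficient statistics of a labeling are simply the indicator values; the sketch below builds them for a toy chain and checks that exactly one unary indicator fires per variable and one pairwise indicator per edge, which is why a non-zero c with Σα cαΦα(v) constant exists:

```python
# Minimal sketch: indicator sufficient statistics I_{a;i}(v_a) and I_{ab;ik}(v_a, v_b).
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
labels = [0, 1]

def sufficient_statistics(v):
    unary = {(a, i): int(v[a] == i) for a in nodes for i in labels}
    pair = {(a, b, i, k): int(v[a] == i and v[b] == k)
            for (a, b) in edges for i in labels for k in labels}
    return unary, pair

unary, pair = sufficient_statistics((0, 1, 1))
print(sum(unary.values()), sum(pair.values()))  # 3 and 2: one indicator per node / edge
```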

Interactive Binary Segmentation Foreground histogram of RGB values FG Background histogram of RGB values BG ‘1’ indicates foreground and ‘0’ indicates background

Interactive Binary Segmentation More likely to be foreground than background

Interactive Binary Segmentation θa;0 proportional to -log(BG(da)) θa;1 proportional to -log(FG(da)) More likely to be background than foreground

Interactive Binary Segmentation More likely to take the same label

Interactive Binary Segmentation θab;ik proportional to exp(-(da-db)^2) if i ≠ k; θab;ik = 0 if i = k. Less likely to take the same label.

Metric Labeling P(v) = exp{-Σα θαΦα(v) - A(θ)} Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2, …, lh}

Metric Labeling P(v) = exp{-Σα θαΦα(v) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {0, …, h-1}, neighborhood over the variables specified by edges E. Sufficient statistics and parameters: Ia;i(va) with parameter θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) with parameter θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L. θab;ik is a metric distance function over labels.

Metric Labeling P(v) = exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {0, …, h-1}, neighborhood over the variables specified by edges E. Sufficient statistics and parameters: Ia;i(va) with parameter θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) with parameter θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L. θab;ik is a metric distance function over labels.

Stereo Correspondence Disparity Map

Stereo Correspondence L = {disparities}. Pixel (xa,ya) in the left image corresponds to pixel (xa+va,ya) in the right image.

Stereo Correspondence L = {disparities}. θa;i is proportional to the difference in RGB values between the pixels put in correspondence by disparity li.

Stereo Correspondence L = {disparities}. θab;ik = wab d(i,k), where d(i,k) is a metric over disparities and wab is proportional to exp(-(da-db)^2).
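
A minimal sketch of these stereo potentials, using made-up one-dimensional intensity rows in place of the left and right images and assuming a truncated linear distance for the metric d(i,k):

```python
# Minimal sketch (made-up signals standing in for image rows):
#   theta_{a;i}   proportional to |left(x_a) - right(x_a + i)|
#   theta_{ab;ik} = w_ab * d(i, k),  w_ab proportional to exp(-(d_a - d_b)^2)
import math

left  = [0.2, 0.4, 0.9, 0.8, 0.3]
right = [0.4, 0.9, 0.8, 0.3, 0.2]
disparities = [0, 1, 2]
trunc = 2.0

def unary(a, i):
    j = min(a + i, len(right) - 1)          # clamp at the image border
    return abs(left[a] - right[j])

def metric(i, k):
    return min(abs(i - k), trunc)           # truncated linear: a metric over labels

def pairwise(a, b, i, k):
    w_ab = math.exp(-(left[a] - left[b]) ** 2)
    return w_ab * metric(i, k)

print(unary(1, 1), pairwise(1, 2, 0, 2))
```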

Pairwise MRF P(v) = exp{-Σα θαΦα(v) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}, neighborhood over the variables specified by edges E. Sufficient statistics and parameters: Ia;i(va) with parameter θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) with parameter θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L.

Pairwise MRF P(v) = exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}, neighborhood over the variables specified by edges E. Equivalently, P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z, with A(θ) = log Z, ψa(li) = exp(-θa;i) and ψab(li,lk) = exp(-θab;ik). The parameters θ are sometimes also referred to as potentials.

Pairwise MRF P(v) = exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}, neighborhood over the variables specified by edges E. A labeling can be written as a function f : {1, 2, …, n} → {1, 2, …, h}, where variable Va takes the label lf(a).

Pairwise MRF P(f) = exp{-Σa θa;f(a) - Σa,b θab;f(a)f(b) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}, neighborhood over the variables specified by edges E. Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}, where variable Va takes the label lf(a). Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b).

Pairwise MRF P(f) = exp{-Q(f) - A(θ)}. Random variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}, neighborhood over the variables specified by edges E. Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}, where variable Va takes the label lf(a). Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b).
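
A small sketch (made-up parameter values on a three-variable chain) of evaluating the energy Q(f) for a given labeling f:

```python
# Minimal sketch: Q(f) = sum_a theta_{a;f(a)} + sum_ab theta_{ab;f(a)f(b)}.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]

theta_unary = {0: [0.0, 1.0], 1: [0.5, 0.2], 2: [1.0, 0.0]}   # theta_{a;i}
theta_pair = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}     # theta_{ab;ik} (Potts-like)

def energy(f):
    """Energy of the labeling f, given as a tuple of label indices."""
    q = sum(theta_unary[a][f[a]] for a in nodes)
    q += sum(theta_pair[(a, b)][f[a]][f[b]] for (a, b) in edges)
    return q

print(energy((0, 0, 1)))   # 0.0 + 0.5 + 0.0 + 0.0 + 1.0 = 1.5
```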

Outline Probabilistic Models Conversions Exponential Family Inference

Inference Maximum a Posteriori (MAP) estimation: maxv P(v) = exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}. Energy minimization: minf Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b). Computing marginals: P(va = li) = Σv P(v)δ(va = li) and P(va = li, vb = lk) = Σv P(v)δ(va = li)δ(vb = lk).
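
For a toy model all three inference problems can be solved by brute-force enumeration; the sketch below reuses the toy energy from the previous sketch to compute the MAP labeling (equivalently, the minimum-energy labeling) and the unary marginals:

```python
# Minimal sketch: brute-force MAP estimation and marginals for a toy pairwise MRF.
import math
from itertools import product

nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
labels = [0, 1]
theta_unary = {0: [0.0, 1.0], 1: [0.5, 0.2], 2: [1.0, 0.0]}
theta_pair = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}

def energy(f):
    return (sum(theta_unary[a][f[a]] for a in nodes)
            + sum(theta_pair[(a, b)][f[a]][f[b]] for (a, b) in edges))

all_f = list(product(labels, repeat=len(nodes)))
Z = sum(math.exp(-energy(f)) for f in all_f)

f_map = min(all_f, key=energy)                       # MAP = energy minimization
marg = {(a, i): sum(math.exp(-energy(f)) for f in all_f if f[a] == i) / Z
        for a in nodes for i in labels}              # P(v_a = l_i)

print("MAP labeling:", f_map)
print("P(v_0 = l_1):", marg[(0, 1)])
```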

Next Lecture … Energy minimization for tree-structured pairwise MRF