Probabilistic Inference Lecture 1


1 Probabilistic Inference Lecture 1
M. Pawan Kumar Slides available online

2 About the Course
7 lectures + 1 exam
Probabilistic Models – 1 lecture
Energy Minimization – 4 lectures
Computing Marginals – 2 lectures
Related Courses: Probabilistic Graphical Models (MVA), Structured Prediction

3 Instructor
Assistant Professor (2012 – Present), Center for Visual Computing
12 Full-time Faculty Members, 2 Associate Faculty Members
Research Interests: Probabilistic Models, Machine Learning, Computer Vision, Medical Image Analysis

4 Students
Third year at ECP, specializing in Machine Learning and Vision
Prerequisites: Probability Theory, Continuous Optimization, Discrete Optimization

5 Outline
Probabilistic Models
Conversions
Exponential Family
Inference
Example (on board)!!

6 Outline
Probabilistic Models: Markov Random Fields (MRF), Bayesian Networks, Factor Graphs
Conversions
Exponential Family
Inference

7 MRF
Unobserved random variables. Edges define a neighborhood over the random variables (neighbors).

8 MRF
Variable Va takes a value or a label va from a set L = {l1, l2, …, lh} (discrete, finite). V = v is called a labeling.

9 MRF
MRF assumes the Markovian property for P(v). (Figure: MRF graph over variables V1, V2, ….)

10 MRF
Va is conditionally independent of Vb given Va's neighbors.
Hammersley-Clifford Theorem relates this Markov property to a factorization of P(v) into clique potentials.

11 MRF
Potential ψ12(v1,v2), potential ψ56(v5,v6).
The probability P(v) can be decomposed into clique potentials.

12 MRF
Potential ψ1(v1,d1), where d1 is observed data.
Probability P(v) proportional to Π(a,b) ψab(va,vb)
Probability P(d|v) proportional to Πa ψa(va,da)

13 MRF
Probability P(v,d) = Πa ψa(va,da) Π(a,b) ψab(va,vb) / Z
Z is known as the partition function
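A minimal sketch of this factorization, assuming a hypothetical tiny chain MRF with randomly chosen potential tables and the observed data d held fixed; Z is computed by brute-force enumeration, which is only feasible for models this small:

import itertools
import numpy as np

# Hypothetical tiny model: 3 binary variables in a chain, observed data d held fixed.
labels = [0, 1]
n = 3
edges = [(0, 1), (1, 2)]

rng = np.random.default_rng(0)
unary = rng.uniform(0.5, 2.0, size=(n, len(labels)))                # psi_a(v_a, d_a)
pairwise = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in edges}   # psi_ab(v_a, v_b)

def unnormalized(v):
    """Product of all unary and pairwise potentials for labeling v."""
    p = 1.0
    for a in range(n):
        p *= unary[a, v[a]]
    for (a, b) in edges:
        p *= pairwise[(a, b)][v[a], v[b]]
    return p

# Partition function Z by brute-force enumeration over all labelings.
Z = sum(unnormalized(v) for v in itertools.product(labels, repeat=n))

print(unnormalized((0, 1, 1)) / Z)   # P(v, d) for one labeling
print(sum(unnormalized(v) / Z for v in itertools.product(labels, repeat=n)))  # 1.0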

14 MRF
High-order potential ψ4578(v4,v5,v7,v8) over a clique of four variables. (Figure: grid MRF with variables V1, V2, … and observed data d1, d2, ….)

15 Pairwise MRF
Unary potential ψ1(v1,d1), pairwise potential ψ56(v5,v6)
Probability P(v,d) = Πa ψa(va,da) Π(a,b) ψab(va,vb) / Z
Z is known as the partition function

16 MRF A is conditionally independent of B given C if
there is no path from A to B when C is removed
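A minimal sketch of this separation test on an undirected graph (hypothetical adjacency lists), removing the conditioning set C and checking reachability from A to B with a breadth-first search:

from collections import deque

def separated(adj, A, B, C):
    """True if no path connects A to B once the nodes in C are removed."""
    blocked = set(C)
    frontier = deque(a for a in A if a not in blocked)
    visited = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in B:
            return False          # found a path from A to B avoiding C
        for w in adj[u]:
            if w not in blocked and w not in visited:
                visited.add(w)
                frontier.append(w)
    return True

# Hypothetical 2x2 grid MRF: edges 1-2, 1-3, 2-4, 3-4
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
print(separated(adj, A={1}, B={4}, C={2, 3}))   # True: {2,3} separates 1 from 4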

17 Conditional Random Fields (CRF)
CRF assumes the Markovian property for P(v|d). Hammersley-Clifford Theorem. (Figure: grid of variables V1–V9 with observed data.)

18 CRF Probability P(v|d) proportional to Πa ψa(va;d) Π(a,b) ψab(va,vb;d)
Clique potentials that depend on the data

19 CRF
Probability P(v|d) = Πa ψa(va;d) Π(a,b) ψab(va,vb;d) / Z
Z is known as the partition function

20 MRF and CRF
Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z

21 Outline
Probabilistic Models: Markov Random Fields (MRF), Bayesian Networks, Factor Graphs
Conversions
Exponential Family
Inference

22 Bayesian Networks
Directed Acyclic Graph (DAG) – no directed loops. Ignoring the directionality of edges, a DAG can have loops. (Figure: DAG over variables V1–V8.)

23 Bayesian Networks
A Bayesian network concisely represents the probability P(v).

24 Bayesian Networks
Probability P(v) = Πa P(va|Parents(va))
For the example DAG: P(v) = P(v1)P(v2|v1)P(v3|v1)P(v4|v2)P(v5|v2,v3)P(v6|v3)P(v7|v4,v5)P(v8|v5,v6)
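A minimal sketch of this factorization, assuming a hypothetical three-variable DAG (V1 → V2, V1 → V3) with made-up conditional probability tables:

import numpy as np

# Hypothetical DAG on three binary variables: V1 -> V2, V1 -> V3.
parents = {1: [], 2: [1], 3: [1]}

# Conditional probability tables, indexed by (parent values..., own value).
cpt = {
    1: np.array([0.6, 0.4]),                      # P(v1)
    2: np.array([[0.7, 0.3], [0.2, 0.8]]),        # P(v2 | v1)
    3: np.array([[0.9, 0.1], [0.5, 0.5]]),        # P(v3 | v1)
}

def joint(v):
    """P(v) = prod_a P(v_a | Parents(v_a)); v maps variable -> value in {0, 1}."""
    p = 1.0
    for a, pa in parents.items():
        index = tuple(v[b] for b in pa) + (v[a],)
        p *= cpt[a][index]
    return p

print(joint({1: 1, 2: 0, 3: 1}))   # 0.4 * 0.2 * 0.5 = 0.04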

25 Bayesian Networks Courtesy Kevin Murphy

26 Bayesian Networks
Va is conditionally independent of its ancestors given its parents.

27 Bayesian Networks Conditional independence of A and B given C
Courtesy Kevin Murphy

28 Outline
Probabilistic Models: Markov Random Fields (MRF), Bayesian Networks, Factor Graphs
Conversions
Exponential Family
Inference

29 Factor Graphs Two types of nodes: variable nodes and factor nodes
Bipartite graph between the two types of nodes

30 Factor Graphs
Factor ψa(v1,v2). A factor graph concisely represents the probability P(v). (Figure: variable nodes V1–V6 and factor nodes a–g.)

31 Factor Graphs
Factor ψa({v}a), where {v}a denotes the set of variables connected to factor a. A factor graph concisely represents the probability P(v).

32 Factor Graphs
Factor ψb(v2,v3). A factor graph concisely represents the probability P(v).

33 Factor Graphs
Factor ψb({v}b). A factor graph concisely represents the probability P(v).

34 Factor Graphs
Probability P(v) = Πa ψa({v}a) / Z
Z is known as the partition function

35 Outline Probabilistic Models Conversions Exponential Family Inference

36 MRF to Factor Graphs
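A minimal sketch of the standard construction for a pairwise MRF (hypothetical data structures, not the slide's figure): one factor node per unary potential and one per pairwise potential, connected to the variables in its scope.

def mrf_to_factor_graph(n_vars, edges):
    """Build a bipartite factor graph from a pairwise MRF:
    one factor per unary potential and one per pairwise potential."""
    factors = []                      # each factor = tuple of the variables it touches
    for a in range(n_vars):
        factors.append((a,))          # unary factor psi_a
    for (a, b) in edges:
        factors.append((a, b))        # pairwise factor psi_ab
    # adjacency: factor index -> variable indices (the bipartite edges)
    return {f_idx: scope for f_idx, scope in enumerate(factors)}

print(mrf_to_factor_graph(3, [(0, 1), (1, 2)]))
# {0: (0,), 1: (1,), 2: (2,), 3: (0, 1), 4: (1, 2)}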

37 Bayesian Networks to Factor Graphs

38 Factor Graphs to MRF

39 Outline Probabilistic Models Conversions Exponential Family Inference

40 Motivation
Random Variable V, label set L = {l1, l2, …, lh}
Samples V1, V2, …, Vm that are i.i.d.
Functions ϕα: L → Reals, where α indexes a set of functions
Empirical expectations: μα = (Σi ϕα(Vi))/m
Expectation wrt distribution P: EP[ϕα(V)] = Σi ϕα(li)P(li)
Given empirical expectations, find a compatible distribution. This problem is underdetermined.
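A minimal sketch (hypothetical samples and feature functions) of the empirical expectations μα = (Σi ϕα(Vi))/m that a compatible distribution must match:

import numpy as np

labels = [0, 1, 2]                       # L = {l1, l2, l3}
samples = [0, 2, 2, 1, 0, 2]             # i.i.d. samples V1, ..., Vm

# Hypothetical feature functions phi_alpha : L -> Reals
features = [
    lambda v: float(v == 2),             # indicator of label l3
    lambda v: float(v),                  # identity feature
]

# Empirical expectations mu_alpha = (sum_i phi_alpha(V_i)) / m
mu = np.array([np.mean([phi(v) for v in samples]) for phi in features])
print(mu)                                # e.g. [0.5, 1.1666...]

# Expectation of the same features under a candidate distribution P over L
P = np.array([0.3, 0.2, 0.5])
E_P = np.array([sum(phi(l) * P[i] for i, l in enumerate(labels)) for phi in features])
print(E_P)                               # compatible iff E_P matches mu for every alpha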

41 Maximum Entropy Principle
max Entropy of the distribution s.t. Distribution is compatible

42 Maximum Entropy Principle
max -Σi P(li)log(P(li)) s.t. Distribution is compatible

43 Maximum Entropy Principle
max -Σi P(li)log(P(li))
s.t. Σi ϕα(li)P(li) = μα for all α
Σi P(li) = 1
Solution: P(v) proportional to exp(-Σα θαϕα(v))
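A short sketch of how this form follows from the constrained problem, introducing Lagrange multipliers θα for the moment constraints and λ for normalization (this step is not spelled out on the slide):

L = -Σi P(li) log(P(li)) + Σα θα (μα - Σi ϕα(li)P(li)) + λ (1 - Σi P(li))
∂L/∂P(li) = -log(P(li)) - 1 - Σα θα ϕα(li) - λ = 0
⇒ P(li) = exp(-1 - λ) exp(-Σα θα ϕα(li)) ∝ exp(-Σα θα ϕα(li))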

44 Exponential Family
Random Variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}
Labeling V = v, va ∈ L for all a ∈ {1, 2, …, n}
Functions ϕα: Ln → Reals, where α indexes a set of functions
P(v) = exp{-Σα θαΦα(v) - A(θ)}
θα: parameters; Φα: sufficient statistics; A(θ): normalization constant

45 Minimal Representation
P(v) = exp{-Σα θαΦα(v) - A(θ)}
θα: parameters; Φα: sufficient statistics; A(θ): normalization constant
Minimal: there is no non-zero c such that Σα cαΦα(v) = constant for all v.

46 Ising Model P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2}

47 Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {-1, +1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters: va ↔ θa for all Va ∈ V; vavb ↔ θab for all (Va,Vb) ∈ E

48 Ising Model
P(v) = exp{-Σa θava - Σa,b θabvavb - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {-1, +1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters: va ↔ θa for all Va ∈ V; vavb ↔ θab for all (Va,Vb) ∈ E
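A minimal sketch (hypothetical parameters on a small chain) evaluating the exponent -Σa θava - Σa,b θabvavb and, for a model this small, the log-partition function A(θ) by enumeration:

import itertools
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3)]
theta_unary = np.array([0.2, -0.5, 0.1, 0.3])      # theta_a (hypothetical values)
theta_pair = {e: -0.4 for e in edges}              # theta_ab (same value for illustration)

def neg_energy(v):
    """Exponent of the Ising model: -sum_a theta_a v_a - sum_ab theta_ab v_a v_b."""
    return (-sum(theta_unary[a] * v[a] for a in range(n))
            - sum(theta_pair[(a, b)] * v[a] * v[b] for (a, b) in edges))

# Log-partition function A(theta), by brute force over the 2^n spin configurations.
configs = list(itertools.product([-1, +1], repeat=n))
A = np.log(sum(np.exp(neg_energy(v)) for v in configs))

def prob(v):
    return np.exp(neg_energy(v) - A)

print(prob((+1, +1, +1, +1)))
print(sum(prob(v) for v in configs))   # sums to 1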

49 Interactive Binary Segmentation

50 Interactive Binary Segmentation
Foreground histogram of RGB values: FG. Background histogram of RGB values: BG.
'+1' indicates foreground and '-1' indicates background.

51 Interactive Binary Segmentation
More likely to be foreground than background

52 Interactive Binary Segmentation
θa proportional to -log(FG(da)) + log(BG(da))
More likely to be background than foreground.

53 Interactive Binary Segmentation
More likely to belong to same label

54 Interactive Binary Segmentation
θab proportional to -exp(-(da - db)²)
Less likely to belong to the same label.
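A minimal sketch (hypothetical grayscale image and foreground/background histograms) of how the Ising parameters for segmentation could be set up, with θa from the histograms and θab from the difference of neighboring pixel values:

import numpy as np

# Hypothetical grayscale image and 8-bin foreground/background histograms.
image = np.array([[0.1, 0.2, 0.8],
                  [0.1, 0.7, 0.9],
                  [0.2, 0.8, 0.9]])
bins = np.linspace(0.0, 1.0, 9)
FG = np.array([0.01, 0.01, 0.02, 0.06, 0.1, 0.2, 0.3, 0.3])   # P(value | foreground)
BG = np.array([0.3, 0.3, 0.2, 0.1, 0.06, 0.02, 0.01, 0.01])   # P(value | background)

def bin_of(d):
    return min(np.digitize(d, bins) - 1, 7)

# Unary parameters: theta_a proportional to -log FG(d_a) + log BG(d_a)
theta_unary = np.array([[-np.log(FG[bin_of(d)]) + np.log(BG[bin_of(d)])
                         for d in row] for row in image])

# Pairwise parameters on neighboring pixels: theta_ab proportional to -exp(-(d_a - d_b)^2)
def theta_pair(da, db):
    return -np.exp(-(da - db) ** 2)

print(theta_unary)                                  # negative where the pixel looks like foreground
print(theta_pair(0.1, 0.2), theta_pair(0.1, 0.9))   # stronger coupling across similar pixels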

55 Rest of lecture 1 ….

56 Exponential Family
P(v) = exp{-Σα θαΦα(v) - A(θ)}
θα: parameters; Φα: sufficient statistics; A(θ): log-partition function
Random Variables V = {V1, V2, …, Vn}
Random Variable Va takes a value or label va, va ∈ L = {l1, l2, …, lh}
Labeling V = v

57 Overcomplete Representation
P(v) = exp{-Σα θαΦα(v) - A(θ)}
θα: parameters; Φα: sufficient statistics; A(θ): log-partition function
Overcomplete: there exists a non-zero c such that Σα cαΦα(v) = constant for all v.

58 Ising Model P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2}

59 Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {0, 1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters: Ia;i(va) ↔ θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) ↔ θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L
Ia;i(va): indicator for va = li. Iab;ik(va,vb): indicator for va = li, vb = lk.

60 Ising Model
P(v) = exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {0, 1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters: Ia;i(va) ↔ θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) ↔ θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L
Ia;i(va): indicator for va = li. Iab;ik(va,vb): indicator for va = li, vb = lk.

61 Interactive Binary Segmentation
Foreground histogram of RGB values: FG. Background histogram of RGB values: BG.
'1' indicates foreground and '0' indicates background.

62 Interactive Binary Segmentation
More likely to be foreground than background

63 Interactive Binary Segmentation
θa;0 proportional to -log(BG(da)); θa;1 proportional to -log(FG(da))
More likely to be background than foreground.

64 Interactive Binary Segmentation
More likely to belong to same label

65 Interactive Binary Segmentation
θab;ik proportional to exp(-(da - db)²) if i ≠ k; θab;ik = 0 if i = k
Less likely to belong to the same label.

66 Metric Labeling P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2, …, lh}

67 Metric Labeling
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {0, …, h-1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters: Ia;i(va) ↔ θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) ↔ θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L
θab;ik is a metric distance function over labels

68 Metric Labeling
P(v) = exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {0, …, h-1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters: Ia;i(va) ↔ θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) ↔ θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L
θab;ik is a metric distance function over labels

69 Stereo Correspondence
Disparity Map

70 Stereo Correspondence
L = {disparities} Pixel (xa,ya) in left corresponds to pixel (xa+va,ya) in right

71 Stereo Correspondence
L = {disparities}
θa;i is proportional to the difference in RGB values between pixel (xa,ya) in the left image and pixel (xa+i,ya) in the right image.

72 Stereo Correspondence
L = {disparities}
θab;ik = wab d(i,k), where wab is proportional to exp(-(da - db)²) and d(i,k) is a metric over the labels.
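A minimal sketch of these pairwise parameters, assuming hypothetical pixel values and a truncated linear metric as one possible choice of d(i,k):

import numpy as np

h = 5                                     # number of disparity labels
truncation = 2.0

def metric(i, k):
    """Truncated linear metric over disparity labels (a hypothetical choice of d)."""
    return min(abs(i - k), truncation)

def pairwise_parameters(da, db):
    """theta_ab;ik = w_ab * d(i, k), with w_ab proportional to exp(-(d_a - d_b)^2)."""
    w_ab = np.exp(-(da - db) ** 2)
    return np.array([[w_ab * metric(i, k) for k in range(h)] for i in range(h)])

theta_ab = pairwise_parameters(da=0.3, db=0.35)   # similar pixels -> strong smoothing
print(theta_ab)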

73 Pairwise MRF
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters: Ia;i(va) ↔ θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) ↔ θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L

74 Pairwise MRF
P(v) = exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Equivalently, P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z, with A(θ) = log Z, ψa(li) = exp(-θa;i), ψab(li,lk) = exp(-θab;ik)
Parameters θ are sometimes also referred to as potentials.

75 Pairwise MRF
P(v) = exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}
Variable Va takes a label lf(a)

76 Pairwise MRF
P(f) = exp{-Σa θa;f(a) - Σa,b θab;f(a)f(b) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}
Variable Va takes a label lf(a)
Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b)
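A minimal sketch (hypothetical parameter tables) evaluating the energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b) of a labeling f:

import numpy as np

n, h = 4, 3                                 # variables and labels
edges = [(0, 1), (1, 2), (2, 3)]
rng = np.random.default_rng(0)
theta_unary = rng.normal(size=(n, h))       # theta_{a;i}
theta_pair = {e: rng.normal(size=(h, h)) for e in edges}   # theta_{ab;ik}

def energy(f):
    """Q(f) = sum_a theta_{a;f(a)} + sum_ab theta_{ab;f(a)f(b)}."""
    return (sum(theta_unary[a, f[a]] for a in range(n))
            + sum(theta_pair[(a, b)][f[a], f[b]] for (a, b) in edges))

f = [0, 2, 2, 1]                            # one labeling
print(energy(f))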

77 Pairwise MRF
P(f) = exp{-Q(f) - A(θ)}
Random Variable V = {V1, V2, …, Vn}, label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}
Variable Va takes a label lf(a)
Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b)

78 Outline Probabilistic Models Conversions Exponential Family Inference

79 Inference
Maximum a Posteriori (MAP) Estimation: maxv P(v) = maxv exp{-Σa Σi θa;iIa;i(va) - Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}
Energy Minimization: minf Q(f) = minf (Σa θa;f(a) + Σa,b θab;f(a)f(b))
Computing Marginals: P(va = li) = Σv P(v)δ(va = li), P(va = li, vb = lk) = Σv P(v)δ(va = li)δ(vb = lk)
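A minimal sketch (hypothetical parameter tables) of these inference problems solved by brute-force enumeration, which is only feasible for very small models:

import itertools
import numpy as np

n, h = 3, 2
edges = [(0, 1), (1, 2)]
rng = np.random.default_rng(0)
theta_unary = rng.normal(size=(n, h))
theta_pair = {e: rng.normal(size=(h, h)) for e in edges}

def energy(f):
    return (sum(theta_unary[a, f[a]] for a in range(n))
            + sum(theta_pair[(a, b)][f[a], f[b]] for (a, b) in edges))

labelings = list(itertools.product(range(h), repeat=n))

# Energy minimization = MAP estimation: the labeling with minimum energy
# has maximum probability P(f) = exp{-Q(f) - A(theta)}.
f_map = min(labelings, key=energy)

# Computing marginals: P(v_a = l_i) sums P(v) over all labelings with v_a = l_i.
Z = sum(np.exp(-energy(f)) for f in labelings)
marginal = np.zeros((n, h))
for f in labelings:
    p = np.exp(-energy(f)) / Z
    for a in range(n):
        marginal[a, f[a]] += p

print(f_map)
print(marginal)        # each row sums to 1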

80 Next Lecture … Energy minimization for tree-structured pairwise MRF

