Published by Grady How. Modified over 3 years ago.

1
Variational Methods for Graphical Models. Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul. Presented by: Afsaneh Shirazi

2
2 Outline: Motivation. Inference in graphical models. Exact inference is intractable. Variational methodology –Sequential approach –Block approach. Conclusions.

3
3 Motivation (Example: Medical Diagnosis) [Figure: network of diseases and symptoms] What is the most probable disease?

4
4 Motivation: We want to answer queries about our data. A graphical model is a way to model data. Exact inference in some graphical models is intractable (NP-hard). Variational methods simplify inference in graphical models by using approximation.

5
5 Graphical Models. Directed (Bayesian network): nodes S1, ..., S5 with conditional probabilities P(S1), P(S2), P(S3|S1,S2), P(S4|S3), P(S5|S3,S4). Undirected: potentials defined on the cliques C1, C2, C3.
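As a minimal sketch of what the directed factorization on this slide means, the joint probability is the product of the five conditional probabilities listed. The numeric tables below are hypothetical (the slide gives none), and all nodes are assumed binary.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
p_s1 = 0.3                                                      # P(S1 = 1)
p_s2 = 0.6                                                      # P(S2 = 1)
p_s3 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}     # P(S3 = 1 | S1, S2)
p_s4 = {0: 0.2, 1: 0.7}                                         # P(S4 = 1 | S3)
p_s5 = {(0, 0): 0.05, (0, 1): 0.3, (1, 0): 0.6, (1, 1): 0.95}   # P(S5 = 1 | S3, S4)

def bern(p, x):
    """Probability that a binary variable with P(1) = p takes the value x."""
    return p if x == 1 else 1.0 - p

def joint(s1, s2, s3, s4, s5):
    """P(S1, ..., S5) as the product of the local conditionals."""
    return (bern(p_s1, s1) * bern(p_s2, s2) * bern(p_s3[(s1, s2)], s3)
            * bern(p_s4[s3], s4) * bern(p_s5[(s3, s4)], s5))

def marginal_s5(value):
    """P(S5 = value) by brute-force summation over the other four nodes."""
    return sum(joint(s1, s2, s3, s4, value)
               for s1, s2, s3, s4 in product((0, 1), repeat=4))
```

The brute-force marginal touches every configuration of the remaining nodes, which is the cost that the later slides set out to avoid.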

6
6 Inference in Graphical Models. Inference: given a graphical model, the process of computing answers to queries. How computationally hard is this decision problem? Theorem: Computing P(X = x) in a Bayesian network is NP-hard.

7
7 Why Is Exact Inference Intractable? [Figure: network of diseases and symptoms] Task: diagnose the most probable disease.

8
8 Why Is Exact Inference Intractable? [Figure: network of diseases and symptoms, with the observed symptoms marked]

9
9 Why Is Exact Inference Intractable? [Figure: network of diseases and symptoms under the noisy-OR model; node labels 1, 0, 1]

10
10 Why Is Exact Inference Intractable? [Figure: network of diseases and symptoms under the noisy-OR model; node labels 1, 0, 1]

11
11 Why Is Exact Inference Intractable?

12
12 Why Is Exact Inference Intractable? [Figure: network of diseases and symptoms, with the observed symptoms marked]

13
13 Why Is Exact Inference Intractable? [Figure: network of diseases and symptoms, with the observed symptoms marked]
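The formulas on these slides did not survive in the transcript. As a hedged sketch, the code below uses the standard noisy-OR form, P(f = 1 | d) = 1 - (1 - leak) * prod_j (1 - q_j)^(d_j), and computes the probability of the observed symptoms by summing over every disease configuration. The function names, the leak term, and the parameter layout are assumptions made for illustration.

```python
from itertools import product

def p_finding_positive(d, leak, q):
    """Noisy-OR: P(f = 1 | d) = 1 - (1 - leak) * prod_j (1 - q[j]) ** d[j]."""
    p_off = 1.0 - leak
    for dj, qj in zip(d, q):
        if dj:
            p_off *= 1.0 - qj
    return 1.0 - p_off

def evidence_probability(findings, priors, leaks, q):
    """P(findings) by brute force over all 2**n disease configurations."""
    n = len(priors)
    total = 0.0
    for d in product((0, 1), repeat=n):
        # Prior probability of this disease configuration (diseases independent).
        p_d = 1.0
        for dj, pj in zip(d, priors):
            p_d *= pj if dj else 1.0 - pj
        # Likelihood of the observed findings given the diseases.
        p_f = 1.0
        for i, fi in enumerate(findings):
            p1 = p_finding_positive(d, leaks[i], q[i])
            p_f *= p1 if fi else 1.0 - p1
        total += p_d * p_f
    return total
```

The outer loop runs 2**n times for n diseases, so this direct approach is only usable for very small networks.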

14
14 Reducing the Computational Complexity. Variational methods: obtain a simple graph that exact methods can handle. Approximate the probability distribution. Exploit convexity.

15
15 Express a Function Variationally: [formula not preserved] is a concave function

16
16 Express a Function Variationally: [formula not preserved] is a concave function
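The function shown on these slides is not preserved. A standard example of a variational representation of a concave function is the logarithm, ln(x) = min over lam > 0 of (lam * x - ln(lam) - 1), with the minimum at lam = 1/x. The sketch below checks this numerically; treating the logarithm as the slide's example is an assumption.

```python
import math

def log_upper_bound(x, lam):
    """Linear-in-x upper bound on ln(x): lam * x - ln(lam) - 1, valid for any lam > 0."""
    return lam * x - math.log(lam) - 1.0

x = 2.0
# Any lam gives an upper bound; lam = 1 / x makes the bound exact.
bounds = {lam: log_upper_bound(x, lam) for lam in (0.1, 0.3, 0.5, 1.0, 3.0)}
tightest = log_upper_bound(x, 1.0 / x)
```

Each fixed lam replaces the nonlinear function by a line, and the variational parameter lam is what gets optimized to tighten the bound.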

17
17 Express a Function Variationally. If the function is neither convex nor concave, transform it to a desired form. Example: logistic function. Steps: transformation, approximation, transforming back.
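A hedged sketch of the three steps for the logistic function f(x) = 1 / (1 + exp(-x)): taking the logarithm gives a concave function, a linear bound on it is ln f(x) <= lam * x - H(lam) with H the binary entropy, and exponentiating transforms back to f(x) <= exp(lam * x - H(lam)). The function names below are illustrative.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_entropy(lam):
    """H(lam) = -lam * ln(lam) - (1 - lam) * ln(1 - lam), for 0 < lam < 1."""
    return -lam * math.log(lam) - (1.0 - lam) * math.log(1.0 - lam)

def logistic_upper_bound(x, lam):
    """exp(lam * x - H(lam)), an upper bound on logistic(x) for any 0 < lam < 1."""
    return math.exp(lam * x - binary_entropy(lam))
```

For a given x the bound is tight at lam = logistic(-x), which is the slope of ln f at x.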

18
18 Approaches to Variational Methods. Sequential approach (on-line): nodes are transformed in an order that is determined during the inference process. Block approach (off-line): used when the graph has obvious substructures.

19
19 Sequential Approach (Two Methods). Method 1: start from the untransformed graph and transform one node at a time until a simple graph for exact methods is obtained. Method 2: start from the completely transformed graph and reintroduce one node at a time, keeping a simple graph for exact methods.

20
20 Sequential Approach (Example) [Figure: network of diseases and symptoms] Log concave

21
21 Sequential Approach (Example) [Figure: network of diseases and symptoms] Log concave

22
22 Sequential Approach (Example) symptoms diseases 1

23
23 Sequential Approach (Example) symptoms diseases 1

24
24 Sequential Approach (Example) symptoms diseases 1

25
25 Sequential Approach (Upper Bound and Lower Bound) We need both a lower bound and an upper bound.

26
26 How to Compute a Lower Bound for a Concave Function? Lower bound for concave functions: [formula not preserved]. The variational parameter is a probability distribution.
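The bound itself is missing from the transcript. One standard way to get a lower bound for a concave function f whose variational parameter is a probability distribution q is Jensen's inequality: f(a + sum_j z_j) >= sum_j q_j * f(a + z_j / q_j). The sketch below checks it for f(x) = ln(1 - exp(-x)), which is concave for x > 0; the choice of f and all numbers are assumptions for illustration.

```python
import math

def f(x):
    """A concave function on x > 0, used here only as an example."""
    return math.log(1.0 - math.exp(-x))

def jensen_lower_bound(a, z, q):
    """sum_j q[j] * f(a + z[j] / q[j]), a lower bound on f(a + sum(z)).

    q must be a probability distribution with strictly positive entries.
    """
    return sum(qj * f(a + zj / qj) for zj, qj in zip(z, q))

a, z = 0.1, [0.5, 1.0, 0.2]
exact = f(a + sum(z))
loose = jensen_lower_bound(a, z, [1.0 / 3.0] * 3)           # uniform q
tight = jensen_lower_bound(a, z, [zj / sum(z) for zj in z])  # q proportional to z
```

With q proportional to z all arguments of f coincide, so the bound is exact; other choices of q give a strictly looser bound.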

27
27 Block Approach (Overview) Off-line application of the sequential approach –Identify some structure amenable to exact inference –Define a family of probability distributions via the introduction of parameters –Choose the best approximation based on the evidence

28
28 Block Approach (Details) KL divergence: [formula not preserved]. From the family of approximating distributions, choose the member that minimizes the KL divergence.
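As a small sketch of the quantity being minimized, the code below computes KL(Q || P) for discrete distributions and checks the identity ln P(E) = bound + KL(Q || P(H|E)), where bound = sum_h Q(h) * ln(P(h, E) / Q(h)). The three-state joint table and the choice of Q are hypothetical.

```python
import math

def kl_divergence(q, p):
    """KL(Q || P) = sum_h Q(h) * ln(Q(h) / P(h)); terms with Q(h) = 0 contribute 0."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, p) if qh > 0.0)

def evidence_lower_bound(q, joint):
    """sum_h Q(h) * ln(P(h, E) / Q(h)), a lower bound on ln P(E)."""
    return sum(qh * math.log(jh / qh) for qh, jh in zip(q, joint) if qh > 0.0)

# Hypothetical values of P(H = h, E = e) for three hidden states and fixed evidence e.
joint = [0.10, 0.25, 0.05]
p_evidence = sum(joint)                        # P(E)
posterior = [j / p_evidence for j in joint]    # P(H | E)
q = [0.2, 0.5, 0.3]                            # an approximating Q(H | E)

gap = math.log(p_evidence) - evidence_lower_bound(q, joint)
```

Because the gap equals the KL divergence, minimizing the divergence over the family is the same as maximizing the lower bound on the log probability of the evidence.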

29
29 Block Approach (Example – Boltzmann machine) [Figure: nodes S_i and S_j]

30
30 Block Approach (Example – Boltzmann machine) [Figure: nodes S_i and S_j, with S_j = 1]

31
31 Block Approach (Example – Boltzmann machine) [Figure: nodes s_i and s_j]

32
32 Block Approach (Example – Boltzmann machine) [Figure: nodes s_i and s_j] Minimize the KL divergence.

33
33 Block Approach (Example – Boltzmann machine) [Figure: nodes s_i and s_j] Minimize the KL divergence. Mean field equations: solve for a fixed point.
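The mean field equations are not preserved in the transcript. For a Boltzmann machine over binary units S_i in {0, 1} with symmetric weights theta[i][j] and biases, the usual mean field form is mu_i = sigmoid(sum_j theta[i][j] * mu_j + bias_i). The sketch below solves it by repeated substitution; the update schedule, the starting point, and the stopping rule are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_field(theta, bias, iters=200, tol=1e-10):
    """Iterate mu_i = sigmoid(bias_i + sum_{j != i} theta[i][j] * mu_j) to a fixed point."""
    n = len(bias)
    mu = [0.5] * n                      # start from the uninformative value
    for _ in range(iters):
        max_change = 0.0
        for i in range(n):              # update one unit at a time, in place
            field = bias[i] + sum(theta[i][j] * mu[j] for j in range(n) if j != i)
            new = sigmoid(field)
            max_change = max(max_change, abs(new - mu[i]))
            mu[i] = new
        if max_change < tol:
            break
    return mu

# Hypothetical three-unit machine.
theta = [[0.0, 0.5, -0.3], [0.5, 0.0, 0.2], [-0.3, 0.2, 0.0]]
bias = [0.1, -0.2, 0.3]
mu = mean_field(theta, bias)
```

Each mu_i approximates P(S_i = 1) under the fully factorized distribution; with strong couplings the iteration can have several fixed points, so the result may depend on the starting point.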

34
34 Conclusions: The time or space complexity of exact calculation is unacceptable. Complex graphs can be probabilistically simple. Inference in simplified models provides bounds on probabilities in the original model.

35
35

36
36 Extra Slides

37
37 Concerns: Approximation accuracy. Strong dependencies can be identified. Not based on convexity transformation. Not able to assure that the framework will transfer to other examples. Not straightforward to develop a variational approximation for new architectures.

38
38 Justification for KL Divergence: it yields the best lower bound on the probability of the evidence.

39
39 EM. Maximum likelihood parameter estimation: the following function is a lower bound on the log likelihood: [formula not preserved]. KL divergence between Q(H|E) and P(H|E, θ).

40
40 EM 1. Maximize the bound with respect to Q. 2. Fix Q, maximize with respect to θ. Traditional EM. Approximation to the EM algorithm.
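The bound and the two steps can be written out as follows. This is a reconstruction in standard notation, not the slide's own formula; the iteration index k and the symbol for the bound are assumptions.

```latex
\ln P(E \mid \theta) \;\ge\; \mathcal{L}(Q, \theta)
  = \sum_{H} Q(H \mid E) \ln P(H, E \mid \theta)
  - \sum_{H} Q(H \mid E) \ln Q(H \mid E),

\ln P(E \mid \theta) - \mathcal{L}(Q, \theta)
  = \mathrm{KL}\bigl( Q(H \mid E) \,\|\, P(H \mid E, \theta) \bigr),

\text{Step 1: } Q^{(k+1)} = \arg\max_{Q} \mathcal{L}(Q, \theta^{(k)}),
\qquad
\text{Step 2: } \theta^{(k+1)} = \arg\max_{\theta} \mathcal{L}(Q^{(k+1)}, \theta).
```

If step 1 is unrestricted, the maximizer is Q = P(H | E, θ) and the bound is tight, which is traditional EM. Restricting Q to a tractable family gives an approximation to the EM algorithm that still increases a lower bound on the log likelihood.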

41
41 Principle of Inference: DAG -> Junction Tree -> (Initialization) -> Inconsistent Junction Tree -> (Propagation) -> Consistent Junction Tree -> (Marginalization)

42
42 Example: Create Join Tree. HMM with 2 time steps: hidden states X1, X2 and observations Y1, Y2. Junction tree: clusters (X1,Y1), (X1,X2), (X2,Y2), with sepsets X1 and X2.

43
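43 Example: Initialization. Variable -> associated cluster: X1 -> (X1,Y1); Y1 -> (X1,Y1); X2 -> (X1,X2); Y2 -> (X2,Y2). [Potential function column not preserved] [Figure: junction tree with clusters (X1,Y1), (X1,X2), (X2,Y2) and sepsets X1, X2]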

44
44 Example: Collect Evidence. Choose an arbitrary clique, e.g. X1,X2, where all potential functions will be collected. Recursively call neighboring cliques for messages: 1. Call X1,Y1. –1. Projection: –2. Absorption:

45
45 Example: Collect Evidence (cont.) 2. Call X2,Y2: –1. Projection: –2. Absorption: [Figure: junction tree with clusters (X1,Y1), (X1,X2), (X2,Y2) and sepsets X1, X2]

46
46 Example: Distribute Evidence Pass messages recursively to neighboring nodes Pass message from X1,X2 to X1,Y1: –1. Projection: –2. Absorption:

47
47 Example: Distribute Evidence (cont.) Pass message from X1,X2 to X2,Y2: –1. Projection: –2. Absorption: [Figure: junction tree with clusters (X1,Y1), (X1,X2), (X2,Y2) and sepsets X1, X2]

48
48 Example: Inference with evidence Assume we want to compute: P(X2|Y1=0,Y2=1) (state estimation) Assign likelihoods to the potential functions during initialization:

49
49 Example: Inference with evidence (cont.) Repeating the same steps as in the previous case, we obtain:
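The numeric result on this slide is not preserved. As a sketch of the computation on the two-step HMM, the code below collects evidence at the root cluster (X1,X2) and marginalizes to get P(X2 | Y1, Y2). The potentials follow the variable-to-cluster assignment from the initialization slide; all probability tables are hypothetical.

```python
# Hypothetical HMM parameters (binary states and observations).
prior = [0.6, 0.4]                    # P(X1)
trans = [[0.7, 0.3], [0.2, 0.8]]      # trans[x1][x2] = P(X2 = x2 | X1 = x1)
emit = [[0.9, 0.1], [0.3, 0.7]]       # emit[x][y]   = P(Y = y | X = x)

def posterior_x2(y1, y2):
    """P(X2 | Y1 = y1, Y2 = y2) by collecting messages at cluster (X1, X2)."""
    # Cluster (X1,Y1) holds P(X1) * P(Y1 | X1); evidence keeps only Y1 = y1.
    # Projecting onto the sepset X1 gives the message to the root.
    msg_x1 = [prior[x1] * emit[x1][y1] for x1 in (0, 1)]
    # Cluster (X2,Y2) holds P(Y2 | X2); projecting onto the sepset X2.
    msg_x2 = [emit[x2][y2] for x2 in (0, 1)]
    # Root cluster (X1,X2) holds P(X2 | X1) and absorbs both messages.
    root = [[trans[x1][x2] * msg_x1[x1] * msg_x2[x2] for x2 in (0, 1)]
            for x1 in (0, 1)]
    # Marginalize out X1, then normalize by the probability of the evidence.
    unnorm = [root[0][x2] + root[1][x2] for x2 in (0, 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

post = posterior_x2(0, 1)
```

The normalizer z is P(Y1 = 0, Y2 = 1), so the same pass also yields the likelihood of the observations.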

50
50 Variable Elimination. General idea: write the query in the form [formula not preserved]. Iteratively: –Move all irrelevant terms outside of the innermost sum –Perform the innermost sum, getting a new term –Insert the new term into the product
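The steps above can be sketched on the simplest case, a chain X1 -> X2 -> ... -> Xn, where the innermost sum always involves one transition table and the factor produced by the previous sum. The function name and the table layout are assumptions.

```python
def eliminate_chain(prior, transitions):
    """Marginal of the last variable of a chain, summing out one variable at a time.

    prior[a] = P(X1 = a); each t in transitions has t[a][b] = P(next = b | current = a).
    """
    factor = prior[:]                       # current factor over the variable to eliminate
    for t in transitions:
        # Innermost sum: sum_a factor[a] * t[a][b] gives a new term over the next variable.
        factor = [sum(factor[a] * t[a][b] for a in range(len(factor)))
                  for b in range(len(t[0]))]
    return factor

# Hypothetical three-variable binary chain.
prior = [0.6, 0.4]
t1 = [[0.7, 0.3], [0.2, 0.8]]
t2 = [[0.9, 0.1], [0.5, 0.5]]
result = eliminate_chain(prior, [t1, t2])
```

Each step costs a number of operations proportional to the size of one table, instead of the exponential cost of enumerating all joint configurations.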

51
51 Complexity of Variable Elimination. Suppose in one elimination step we compute [formula not preserved]. This requires [count not preserved] multiplications and [count not preserved] additions. Complexity is exponential in the number of variables in the intermediate factor.

52
52 Chordal Graphs. Elimination ordering -> undirected chordal graph. Graph: maximal cliques are factors in elimination; factors in elimination are cliques in the graph; complexity is exponential in the size of the largest clique in the graph. [Figure: network over nodes V, S, L, T, A, B, X, D and the corresponding chordal graph]

53
53 Induced Width. The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination. This quantity is called the induced width of a graph according to the specified ordering. Finding a good ordering for a graph is equivalent to finding the minimal induced width of the graph.
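A small sketch of how the ordering affects this quantity: eliminating a node connects all of its remaining neighbors, and the node together with those neighbors forms a clique of the induced graph. The function below returns the size of the largest such clique, following the slide's wording (some texts define induced width as this size minus one). The star-shaped graph is a made-up example.

```python
def largest_elimination_clique(edges, order):
    """Size of the largest clique created when eliminating nodes in the given order."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    largest = 0
    for v in order:
        neighbors = nbrs.pop(v, set())
        largest = max(largest, len(neighbors) + 1)   # v plus its remaining neighbors
        for u in neighbors:
            nbrs[u].discard(v)
            nbrs[u] |= neighbors - {u}               # fill-in edges among the neighbors
    return largest

# Hypothetical star graph: center "c" connected to three leaves.
star = [("c", "x"), ("c", "y"), ("c", "z")]
center_first = largest_elimination_clique(star, ["c", "x", "y", "z"])
leaves_first = largest_elimination_clique(star, ["x", "y", "z", "c"])
```

Eliminating the center first joins all three leaves into one clique of size 4, while eliminating the leaves first never creates a clique larger than 2.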

54
54 Properties of Junction Trees In every junction tree: –For each cluster (or sepset), –The probability distribution of any variable, using any cluster (or sepset) that contains

55
55 Exact Inference Using Junction Trees. Undirected tree. Each node is a cluster. Running intersection property: –Given two clusters X and Y, all clusters on the path between X and Y contain X ∩ Y. Separator sets (sepsets): –Intersection of adjacent clusters. [Figure: clusters ABD, ADE, DEF with sepsets AD, DE]

56
56 Constructing Junction Trees: Marrying Parents [Figure: DAG over X1, X2, X3, X4, X5, X6]

57
57 Moral Graph [Figure: moral graph over X1, X2, X3, X4, X5, X6]
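Moralization connects ("marries") all parents of each node and then drops edge directions. The sketch below does this for a DAG given as a parent map. The six-node DAG used here is an assumption, since the figure is not preserved; it is one structure that is consistent with the cliques listed on the later slides.

```python
def moralize(parents):
    """Undirected edge set of the moral graph of a DAG.

    parents maps each node to the list of its parents.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))       # keep each edge, without direction
        for i, p in enumerate(pars):
            for other in pars[i + 1:]:
                edges.add(frozenset((p, other)))   # marry every pair of co-parents
    return edges

# Assumed DAG: X1->X2, X1->X3, X2->X4, X3->X5, X2->X6, X5->X6.
dag = {"X1": [], "X2": ["X1"], "X3": ["X1"], "X4": ["X2"],
       "X5": ["X3"], "X6": ["X2", "X5"]}
moral = moralize(dag)
```

In this assumed graph the only added edge is X2-X5, because X2 and X5 are both parents of X6.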

58
58 Triangulation [Figure: triangulated graph over X1, X2, X3, X4, X5, X6]

59
59 Identify Cliques [Figure: triangulated graph over X1, X2, X3, X4, X5, X6] Cliques: X2X5X6, X1X2X3, X2X3X5, X2X4

60
60 Junction Tree. A junction tree is a subgraph of the clique graph satisfying the running intersection property. [Figure: clique graph and junction tree over the cliques X1X2X3, X2X3X5, X2X5X6, X2X4, with sepsets X2X3, X2X5, X2]

61
61 Constructing Junction Trees: DAG -> Moral Graph -> Triangulated Graph -> Identify Cliques -> Junction Tree

62
62 Sequential Approach (Example) Lower bound for the medical diagnosis example:
