1 Use graphs and not pure logic Variables are represented by nodes and dependencies by edges. Common in our language: “threads of thought”, “lines of reasoning”, “connected ideas”, “far-fetched arguments”. Still, capturing the essence of dependence is not an easy task. When modeling causation, association, and relevance, it is hard to distinguish between direct and indirect neighbors. If we simply connect every pair of “dependent variables”, we will get cliques.

2 Undirected Graphs can represent Independence Let G be an undirected graph (V,E). Define I_G(X,Z,Y), for disjoint sets of nodes X, Y, and Z, to hold if and only if every path between a node in X and a node in Y passes through a node in Z. In the textbook the alternative notation <X|Z|Y>_G is also used.
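This separation test is mechanical. Below is a minimal sketch (my own, not from the course; the function name separated and the adjacency-dictionary graph representation are assumptions) that decides I_G(X, Z, Y) by searching for an X-to-Y path that avoids Z:

```python
from collections import deque

def separated(adj, X, Z, Y):
    """I_G(X, Z, Y): True iff every path from X to Y passes through Z.
    adj maps each node to the set of its neighbours; X, Y, Z are disjoint sets."""
    blocked = set(Z)
    frontier = deque(set(X) - blocked)
    seen = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in Y:
            return False  # found an X-Y path avoiding Z, so not separated
        for v in adj[u]:
            if v not in blocked and v not in seen:
                seen.add(v)
                frontier.append(v)
    return True
```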

3 M = { I_G(M_1, {F_1,F_2}, M_2), I_G(F_1, {M_1,M_2}, F_2) + symmetry }
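Assuming the example is the four-cycle in which each male is adjacent to both females (my reading of the slide's figure), the two statements in M check out, while conditioning on only one female does not separate the males:

```python
# Hypothetical adjacency for the two-males/two-females example.
adj = {
    "M1": {"F1", "F2"}, "M2": {"F1", "F2"},
    "F1": {"M1", "M2"}, "F2": {"M1", "M2"},
}
print(separated(adj, {"M1"}, {"F1", "F2"}, {"M2"}))  # True
print(separated(adj, {"F1"}, {"M1", "M2"}, {"F2"}))  # True
print(separated(adj, {"M1"}, {"F1"}, {"M2"}))        # False: M1-F2-M2 avoids F1
```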

4 Dependency Models – abstraction of Probability distributions

7 Recall that composition and contraction are implied.

8 (namely, Composition holds)

11 The set of all independence statements defined by (3.11) is called the pairwise basis of G. These are the independence statements that define the graph.
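Concretely, the pairwise basis contributes one statement I(a, V − {a,b}, b) per non-adjacent pair, which is what I take (3.11) to say. A small sketch, reusing the adjacency dictionary adj from the males/females example above:

```python
def pairwise_basis(adj):
    """One statement I(a, V - {a, b}, b) for each non-adjacent pair a, b."""
    V = set(adj)
    return [(a, V - {a, b}, b)
            for a in sorted(V) for b in sorted(V)
            if a < b and b not in adj[a]]

# On the males/females four-cycle defined earlier:
print(pairwise_basis(adj))
# [('F1', {'M1', 'M2'}, 'F2'), ('M1', {'F1', 'F2'}, 'M2')]
```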

13 Edge-minimal and unique.

14 The set of all independence statements defined by (3.12) is called the neighboring basis of G.

17 Testing I-mapness Proof: (2) ⇒ (1) ⇒ (3) ⇒ (2). (2) ⇒ (1): holds because G is an I-map of G_0, which is an I-map of P. (1) ⇒ (3): true by the definition of I-mapness of G. (3) ⇒ (2):

18 Insufficiency of local tests for probability distributions that are not strictly positive Consider the case X=Y=Z=W. What is a Markov network for it? Is it unique? The intersection property is critical!
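A small numeric check of why strict positivity matters (my own construction, not from the slides): put all mass on the two states where X=Y=Z=W. Every pairwise test I(X, rest, Y) then succeeds, which would suggest a graph with no edges, yet X and Y are fully dependent, so the empty graph is not an I-map and local tests mislead:

```python
from math import log2

# X = Y = Z = W: copies of one fair coin; the joint is NOT strictly positive.
P = {(0, 0, 0, 0): 0.5, (1, 1, 1, 1): 0.5}

def marginal(idxs):
    """Marginal of P over the variable positions listed in idxs."""
    m = {}
    for state, p in P.items():
        key = tuple(state[i] for i in idxs)
        m[key] = m.get(key, 0.0) + p
    return m

def cond_mutual_info(a, b, cond):
    """I(X_a ; X_b | X_cond) in bits, summing only over states with positive mass."""
    pabz = marginal([a, b] + cond)
    paz, pbz, pz = marginal([a] + cond), marginal([b] + cond), marginal(cond)
    total = 0.0
    for state, p in pabz.items():
        xa, xb, z = state[0], state[1], state[2:]
        total += p * log2(p * pz[z] / (paz[(xa,) + z] * pbz[(xb,) + z]))
    return total

print(cond_mutual_info(0, 1, [2, 3]))  # 0.0: the pairwise test would drop the X-Y edge
print(cond_mutual_info(0, 1, []))      # 1.0: yet X and Y are fully dependent
```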

19 Markov Networks that represent probability distributions (rather than just independence)

20 The two males and females example

22 Theorem 6 does not guarantee that every dependency of P will be represented by G. However, one can show the following claim: Theorem X: Every undirected graph G has a distribution P such that G is a perfect map of P. (In light of the previous notes, P must have the form of a product over cliques.)

23 Proof Sketch Given a graph G, it is sufficient to show that for every independence statement σ = I(α, Z, β) that does NOT hold in G, there exists a probability distribution P_σ that satisfies all independence statements that hold in the graph and does not satisfy σ. Simply pick a path in G between α and β that does not contain a node from Z. Define a probability distribution that is a perfect map of this chain and multiply it by arbitrary marginal probabilities on all other nodes, forming P_σ. Now “multiply” all the P_σ (an Armstrong relation) to obtain P. Interesting task (replacing HMW #4): given an undirected graph over binary variables, construct a perfect-map probability distribution. (Note: most Markov random fields are perfect maps!)

24 Recall: the set of all independence statements defined by (3.12) was called the neighboring basis of G. An interesting conclusion of Theorem X: all independence statements that follow from the neighborhood basis for strictly positive probability distributions are derivable via symmetry, decomposition, intersection, and weak union. These axioms are sound and complete for neighborhood bases! The same conclusion holds for pairwise bases. In fact, for saturated statements, independence and separation have the same characterization. See paper P2 in the recitation class.

25 Drawback: interpreting the links is not simple. Another drawback is the difficulty with extreme probabilities. Both drawbacks disappear in the class of decomposable models, which are a special case of Bayesian networks.

26 Decomposable Models Example: Markov chains and Markov trees. Assume the following chain is an I-map of some P(x_1,x_2,x_3,x_4) and was constructed using the methods we just described. The “compatibility functions” on all links can be easily interpreted in the case of chains, and likewise for trees. This idea actually works for all chordal graphs.
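For the chain, the interpretation is concrete: P(x_1,x_2,x_3,x_4) = P(x_1,x_2) P(x_2,x_3) P(x_3,x_4) / (P(x_2) P(x_3)), i.e. link marginals divided by the marginals of the shared nodes. A sketch verifying this numerically (the random binary CPTs are my own setup, not from the slides):

```python
import random
from itertools import product

random.seed(0)

def bern(p, v):
    """Probability that a biased coin with P(1) = p shows value v."""
    return p if v == 1 else 1 - p

p_x1 = random.random()                            # P(x1 = 1)
cpts = [{0: random.random(), 1: random.random()}  # cpts[i][parent] = P(child = 1 | parent)
        for _ in range(3)]

# Joint distribution of the chain x1 - x2 - x3 - x4
P = {}
for x1, x2, x3, x4 in product([0, 1], repeat=4):
    P[(x1, x2, x3, x4)] = (bern(p_x1, x1) * bern(cpts[0][x1], x2) *
                           bern(cpts[1][x2], x3) * bern(cpts[2][x3], x4))

def marg(idxs):
    """Marginal of P over the variable positions listed in idxs."""
    m = {}
    for s, p in P.items():
        k = tuple(s[i] for i in idxs)
        m[k] = m.get(k, 0.0) + p
    return m

p12, p23, p34 = marg([0, 1]), marg([1, 2]), marg([2, 3])
p2, p3 = marg([1]), marg([2])

# Check P(x1,x2,x3,x4) = P(x1,x2) P(x2,x3) P(x3,x4) / (P(x2) P(x3)) state by state.
for (x1, x2, x3, x4), p in P.items():
    rhs = p12[(x1, x2)] * p23[(x2, x3)] * p34[(x3, x4)] / (p2[(x2,)] * p3[(x3,)])
    assert abs(p - rhs) < 1e-12
print("chain decomposition verified")
```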

27 Chordal Graphs

28 Interpretation of the links (Figure: three overlapping cliques labeled Clique 1, Clique 2, Clique 3.) A probability distribution that can be written as a product of low-order marginals (on the cliques) divided by a product of low-order marginals (on their intersections) is said to be decomposable.
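Stated generally (this is the standard decomposable-model form, given here as a reminder rather than quoted from the slide): if C_1, …, C_k are the maximal cliques of a chordal graph listed along a join tree, then

```latex
P(V) \;=\; \frac{\prod_{i=1}^{k} P(C_i)}{\prod_{i=2}^{k} P(S_i)},
\qquad S_i \;=\; C_i \cap \bigl(C_1 \cup \cdots \cup C_{i-1}\bigr).
```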

29 Importance of Decomposability When assigning compatibility functions, it suffices to use marginal probabilities on cliques and just make sure they are locally consistent. Marginals can be assessed from experts or estimated directly from data.

30 The Diamond Example – the smallest non-chordal graph Adding one more link makes the graph chordal. Turning a general undirected graph into a chordal graph in some optimal way is the key to all exact computations done on Markov and Bayesian networks.
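A brute-force illustration for tiny graphs (my own sketch; it relies on the standard fact that a graph is chordal iff it admits a perfect elimination order, and assumes the diamond is the chordless four-cycle on A, B, C, D):

```python
from itertools import permutations

def is_perfect_elimination(adj, order):
    """True iff, for each vertex, its neighbours later in the order form a clique."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        later = [u for u in adj[v] if pos[u] > pos[v]]
        if any(b not in adj[a] for i, a in enumerate(later) for b in later[i + 1:]):
            return False
    return True

def is_chordal(adj):
    """Brute force over all vertex orders; fine for graphs this small."""
    return any(is_perfect_elimination(adj, list(o)) for o in permutations(adj))

# The diamond: four-cycle A-B-D-C-A with no chord.
diamond = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
print(is_chordal(diamond))  # False: no elimination order works

diamond["B"].add("C"); diamond["C"].add("B")  # add one link
print(is_chordal(diamond))  # True
```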

31 Chordal Graphs

32 Example of the Theorem
1. Each cycle has a chord.
2. There is a way to direct the edges legally, namely A → B, A → C, B → C, B → D, C → D, C → E.
3. A legal removal order (e.g.): start with E, then D, then the rest (checked in the sketch below).
4. The maximal cliques form a join (clique) tree.
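Reusing is_perfect_elimination from the diamond sketch, the removal order in item 3 can be checked directly (edges taken from item 2, read as undirected):

```python
G = {"A": {"B", "C"}, "B": {"A", "C", "D"},
     "C": {"A", "B", "D", "E"}, "D": {"B", "C"}, "E": {"C"}}

# Remove E, then D, then the rest: each removed vertex's not-yet-removed
# neighbours form a clique, so this is a legal (perfect) elimination order.
print(is_perfect_elimination(G, ["E", "D", "A", "B", "C"]))  # True
```

The maximal cliques here are {A,B,C}, {B,C,D}, and {C,E}, which indeed assemble into a join (clique) tree.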