Dependency Models – abstraction of Probability distributions

1 Dependency Models – abstraction of Probability distributions
A dependency model M over a finite set of elements U is a rule that assigns truth values to the predicate IM(X,Z,Y), where X, Y, and Z are disjoint subsets of U.
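As a data structure, a dependency model is nothing but a truth assignment to triplets of disjoint subsets. A minimal sketch in Python (the class and its extensional storage scheme are illustrative, not from the lecture):

class DependencyModel:
    """A dependency model M over U: a rule assigning truth values to I_M(X, Z, Y)."""
    def __init__(self, universe):
        self.universe = frozenset(universe)
        self.true_triplets = set()   # triplets (X, Z, Y) for which I_M holds

    def assert_independence(self, X, Z, Y):
        X, Z, Y = frozenset(X), frozenset(Z), frozenset(Y)
        assert not (X & Y or X & Z or Y & Z), "X, Y, Z must be disjoint"
        self.true_triplets.add((X, Z, Y))
        self.true_triplets.add((Y, Z, X))   # close under symmetry

    def I(self, X, Z, Y):
        """Truth value of the predicate I_M(X, Z, Y)."""
        return (frozenset(X), frozenset(Z), frozenset(Y)) in self.true_triplets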

2 Important Properties of Independence
๐ผ ๐‘‹,๐‘,๐‘Œ โ†’ ๐ผ ๐‘‹,๐‘,๐‘Œโˆช๐‘Š โ†”๐ผ(๐‘‹,๐‘โˆช๐‘Œ,๐‘Š) ๐ผ ๐‘‹,๐‘,๐‘Œ โ†’{ ๐ผ ๐‘‹,๐‘,๐‘Œโˆช๐‘Š โ†’๐ผ(๐‘‹,๐‘โˆช๐‘Œ,๐‘Š) โˆง [ ยฌ๐ผ ๐‘‹,๐‘,๐‘Œโˆช๐‘Š โ†’ยฌ๐ผ ๐‘‹,๐‘โˆช๐‘Œ,๐‘Š ]} When learning an irrelevant fact Y, the relevance status of every fact W remains unchanged.

3 Undirected Graphs can represent Independence
Let G be an undirected graph (V,E). Define IG(X,Z,Y), for disjoint sets of nodes X, Y, and Z, to hold if and only if every path between a node in X and a node in Y passes through a node in Z. The textbook also uses the notation <X|Z|Y>G.
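Vertex separation can be checked mechanically: delete Z from the graph and test whether X can still reach Y. A minimal sketch in Python (adjacency-dictionary representation; the example graph assumes the picture on the next slide is the four-cycle M1-F1-M2-F2):

from collections import deque

def separated(adj, X, Z, Y):
    """I_G(X, Z, Y): True iff every path from X to Y passes through Z.
    adj maps each node to the set of its neighbors."""
    X, Z, Y = set(X), set(Z), set(Y)
    visited, queue = set(X), deque(X)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in Z or v in visited:   # Z blocks the path at v
                continue
            if v in Y:
                return False             # unblocked path reached Y
            visited.add(v)
            queue.append(v)
    return True

adj = {"M1": {"F1", "F2"}, "M2": {"F1", "F2"},
       "F1": {"M1", "M2"}, "F2": {"M1", "M2"}}
assert separated(adj, {"M1"}, {"F1", "F2"}, {"M2"})
assert separated(adj, {"F1"}, {"M1", "M2"}, {"F2"})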

4 M = { IG(M1,{F1,F2},M2), IG(F1,{M1,M2},F2) + symmetry }

5 Definitions: 1. G=(U,E) is an I-map of a model M over U if IG(X,Z,Y) ⇒ IM(X,Z,Y) for all disjoint subsets X, Y, Z of U. 2. G is a perfect map of M if IG(X,Z,Y) ⇔ IM(X,Z,Y) for all disjoint subsets X, Y, Z of U. 3. M is graph-isomorph if there exists a graph G such that G is a perfect map of M.

6 Undirected Graphs cannot always be perfect maps
Strong Union: IG(X, Z, Y) ⇒ IG(X, Z∪W, Y). This property holds for graph separation but not for conditional independence in probability. So if G is an I-map of P, it can represent IP(X, Z, Y) but cannot represent the negation ¬IP(X, Z∪W, Y). Needed property of separation for Theorem 3 below: IG(X, S, Y) ⇒ [ IG(X, S∪Y, δ) or IG(Y, S∪X, δ) ].
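A standard counterexample (not from the slide): let X and Y be independent fair coins and W = X xor Y. Then IP(X, ∅, Y) holds while IP(X, {W}, Y) fails, so strong union is violated by probabilistic independence. A quick numerical check in Python:

from itertools import product

# Joint distribution over (x, y, w) with X, Y fair coins and W = X xor Y.
P = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def p(**fixed):
    """Marginal probability of the partial assignment in `fixed`."""
    keys = ("x", "y", "w")
    return sum(pr for point, pr in P.items()
               if all(point[keys.index(k)] == v for k, v in fixed.items()))

# I_P(X, {}, Y): X and Y are marginally independent.
assert abs(p(x=1, y=1) - p(x=1) * p(y=1)) < 1e-12
# But not I_P(X, {W}, Y): P(x,y|w) differs from P(x|w) * P(y|w).
assert abs(p(x=1, y=1, w=0) / p(w=0)
           - (p(x=1, w=0) / p(w=0)) * (p(y=1, w=0) / p(w=0))) > 0.1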

7 Undirected Graphs as I-map
Definition: An undirected graph G is a minimal I-map of a dependency model M if deleting any edge of G makes G cease to be an I-map of M. Such a graph is called a Markov network of M.

8 THEOREM 3 [Pearl and Paz 1985]: Every dependency model M satisfying symmetry, decomposition, and intersection has a unique minimal I-map G0 = (U, E0), where the vertices U are the elements of M and the edges E0 are defined by (α,β) ∉ E0 ⇔ IM(α, U-{α,β}, β) (3.11). Proof: by descending induction, as done for THEOREM 2 in the HMWs.
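Eq. (3.11) translates directly into a construction procedure: one independence query per pair of elements, keeping the edge exactly when the pairwise statement fails. A sketch in Python, assuming a hypothetical oracle I_M(X, Z, Y) for the model M:

from itertools import combinations

def markov_network(U, I_M):
    """Build the edge set E0 of the minimal I-map G0 via Eq. (3.11)."""
    U = frozenset(U)
    edges = set()
    for a, b in combinations(sorted(U), 2):
        # (a, b) is an edge iff a and b are dependent given all other elements.
        if not I_M({a}, U - {a, b}, {b}):
            edges.add(frozenset({a, b}))
    return edges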

9 Proof: First we prove that G0 is an I-map of M, namely, that
for every three disjoint non-empty subsets of U, IG(X,S,Y) ⇒ IM(X,S,Y) (Eq. 3.II).
(i) Let n = |U|. For |S| = n-2, Eq. 3.II follows from (3.11).
(ii) Assume the claim holds for every S' of size k ≤ |S'| ≤ n-2, and let S be any set with |S| = k-1 such that IG(X,S,Y). We consider two cases: X∪S∪Y equals U, or X∪S∪Y is a proper subset of U.
(iii) If X∪S∪Y = U, then X or Y has at least two elements. Assume it is Y, and write Y = Y' ∪ {γ}. From IG(X,S,Y) we get, by graph separation, IG(X, S∪{γ}, Y') and IG(X, S∪Y', {γ}). The separating sets have size at least k, so by the induction hypothesis IM(X, S∪{γ}, Y') and IM(X, S∪Y', {γ}), which imply, by Intersection, IM(X,S,Y), as claimed.

10 (iv) If X∪S∪Y ≠ U, then there exists an element δ not in X∪S∪Y.
From IG(X,S,Y) we get IG(X, S∪{δ}, Y), and also [ IG(X, S∪Y, {δ}) or IG({δ}, S∪X, Y) ].
The separating sets all have size at least k, and hence by the induction hypothesis either IM(X, S∪{δ}, Y) and IM(X, S∪Y, {δ}), or IM(X, S∪{δ}, Y) and IM({δ}, S∪X, Y).
Applying Intersection and then Decomposition in either case yields IM(X,S,Y), as claimed.

11 Next we prove that the graph G0 is minimal, namely, that no edge can be
removed from G0 without it ceasing to be an I-map of M. Deleting an edge (α,β) leaves α separated from β by U-{α,β}. So if the remaining graph were still an I-map, then IM(α, U-{α,β}, β) would hold, and by the definition of G0 (Eq. 3.11) the edge (α,β) would not have been in G0 in the first place. Hence no edge can be removed, and G0 is edge-minimal.
Finally, we claim that G0 is the unique Markov network of M. Let G be a minimal I-map of M. Every edge (α,β) satisfying Eq. 3.11 must be removed from G, since G0 shows the graph remains an I-map without it, so a minimal I-map cannot keep it. No further edge can be removed without violating the I-map property, because deleting an edge (α,β) with ¬IM(α, U-{α,β}, β) would display a separation that does not hold in M. Hence G has exactly the edge set E0, and G0 is the unique Markov network of M.

12 Pairwise Basis of a Markov Network
The set ๏“ of all independence statements defined by ฮฑ,ฮฒ โˆ‰ ๐ธ0 โ†” IM(ฮฑ, U-{ฮฑ, ฮฒ}, ฮฒ) (3.11) is called the pairwise basis of G. This set consists of at most n(n-1) independence statements, one per missing edge, that define the Markov network of M.

13 Neighboring Basis of a Markov Network
The set ๏“ of all independence statements defined by (3.12) is called the neighboring basis of G. This set consists of n independence statements, one per vertex, that define the neighbors of each vertex and hence defines a graph. Is this graph the Unique Markov network G0 of M ???

14 Alternative Construction of the Markov Network
THEOREM 4 [Pearl and Paz 1985]: Every element α ∈ U in a dependency model M satisfying symmetry, decomposition, intersection, and weak union has a unique Markov boundary BI(α). Moreover, BI(α) equals the set BG0(α) of vertices neighboring α in the minimal I-map G0 (the Markov network). Proof: (i) First we show that BI(α) is unique. Take two Markov blankets B1 and B2, so that I(α, B1, U-B1-{α}) and I(α, B2, U-B2-{α}). By Intersection, also I(α, B1∩B2, U-(B1∩B2)-{α}) {HMW!}. Hence the blankets are closed under intersection, and BI(α), the intersection of the set BL*I(α) of all blankets, is itself a blanket, the unique minimal one.

15 Proof continues: It remains to show the second claim, that the graph G1, constructed by connecting each vertex α to its boundary BI(α), is the same as G0, constructed from the edge definition (Eq. 3.11). (ii) For every Markov boundary BI(α) and every element β not in it, weak union gives: I(α, BI(α), {β} ∪ rest-of-vertices) ⇒ I(α, U-{α,β}, β). Hence every edge absent from G1 is also absent from G0; in other words, the set of neighbors of each vertex α in G0 is a subset of the set of neighbors in G1: BG0(α) ⊆ BI(α). However, equality actually holds: BG0(α) is by itself a Markov blanket of α, due to the I-map property of G0, and the boundary BI(α) is the intersection of all blankets, so BI(α) ⊆ BG0(α). Hence equality holds. Thus G0 = G1.

16 Insufficiency of local tests for non-strictly-positive probability distributions
Consider the case X=Y=Z=W (four identical variables). What is a Markov network for it? Is it unique? The Intersection property is critical!

17 Markov Networks with probabilities
1. Define for each (maximal) clique Ci a non-negative function g(Ci), called the compatibility function. 2. Take the product Πi g(Ci) over all cliques. 3. Define P(x1,…,xn) = K·Πi g(Ci), where K is a normalizing constant (the inverse of the sum of the product over all value assignments).
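A minimal sketch of steps 1-3 in Python for three binary variables; the chain-shaped clique structure and the numeric compatibility values are illustrative assumptions, not from the slides:

from itertools import product

variables = ("X1", "X2", "X3")
# Maximal cliques of an assumed chain X1 - X2 - X3.
cliques = (("X1", "X2"), ("X2", "X3"))
# Step 1: an arbitrary non-negative compatibility function per clique.
g = {("X1", "X2"): lambda a: 2.0 if a["X1"] == a["X2"] else 1.0,
     ("X2", "X3"): lambda a: 3.0 if a["X2"] == a["X3"] else 1.0}

def unnormalized(a):
    prod = 1.0
    for C in cliques:          # step 2: product over all cliques
        prod *= g[C](a)
    return prod

assignments = [dict(zip(variables, vals))
               for vals in product((0, 1), repeat=len(variables))]
K = 1.0 / sum(unnormalized(a) for a in assignments)   # step 3: normalize

P = {tuple(a[v] for v in variables): K * unnormalized(a) for a in assignments}
assert abs(sum(P.values()) - 1.0) < 1e-12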

18 P(๏ก, BG(๏ก), U- ๏ก -BG(๏ก)) = f1(๏ก,BG(๏ก)) f2 (U-๏ก) (*)
Theorem 6 [Hammersley and Clifford 1971]: If a probability function P is formed by a normalized product of non-negative functions on the cliques of G, then G is an I-map of P.
Proof: It suffices to show (Theorem 4) that the neighborhood basis of G holds in P, namely, that I(α, BG(α), U-α-BG(α)) holds in P, or just that:
P(α, BG(α), U-α-BG(α)) = f1(α, BG(α))·f2(U-α) (*)
Let Jα stand for the set of indices marking all cliques in G that include α. Then
P(x1,…,xn) = K · Π{j∈Jα} g(Cj) · Π{j∉Jα} g(Cj) = f1(α, BG(α))·f2(U-α).
The first product contains only α and variables adjacent to α, because each Cj with j ∈ Jα is a clique containing α. The second product does not contain α. Hence (*) holds.

19 Note: The theorem and its converse hold also for extreme probabilities but the presented proof does not apply due to the use of Intersection in Theorem 4. Theorem X: Every undirected graph G has a distribution P such that G is a perfect map of P. (In light of previous notes, it must have the form of a product over cliques).

20 Proof Sketch of Theorem X
Theorem Y (Completeness): Given a graph G, for every independence statement σ = I(α,Z,β) that does NOT hold in G, there exists a probability distribution Pσ that satisfies all independence statements that hold in the graph G and does not satisfy σ = I(α,Z,β). Proof of Theorem Y: Pick a path in G between α and β that does not contain a node from Z. Define a probability distribution that is a perfect map of this path (a chain) and multiply it by arbitrary marginal probabilities on all other nodes not on the path, forming Pσ. Sketch for Theorem X (Strong Completeness): "Multiply" all the Pσ (via an Armstrong relation) to obtain a P that is a perfect map of G. (Continue here with "Proof by intimidation".)

21 Interesting conclusion of Theorem Y:
All independence statements that follow, for strictly positive probability, from the neighborhood basis are derivable via symmetry, decomposition, intersection, and weak union. These axioms are (sound and) complete for neighborhood bases. These axioms are (sound and) complete also for pairwise bases. In fact, for saturated statements (those whose span of variables is all of U), conditional independence and vertex separation have exactly the same axioms. Isn't that amazing? (See paper P2.)

22 Drawback: Interpreting the Links is not simple
Another drawback is the difficulty with extreme probabilities: there is no local test for I-mapness. Both drawbacks disappear in the class of decomposable models, which are a special case of Bayesian networks.

23 Decomposable Models Example: Markov Chains and Markov Trees
Assume the chain X1 - X2 - X3 - X4 below is an I-map of some P(x1,x2,x3,x4) and was constructed using the methods we just described. The "compatibility functions" on the links can be easily interpreted in the case of chains, and the same holds for trees. This idea actually works for all chordal graphs.
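For instance, assuming the pictured chain is X1 - X2 - X3 - X4, the chain's Markov property gives
P(x1,x2,x3,x4) = P(x1,x2) · P(x3|x2) · P(x4|x3),
so the links can carry g(x1,x2) = P(x1,x2), g(x2,x3) = P(x3|x2), g(x3,x4) = P(x4|x3): every compatibility function is a (conditional) marginal, and the product needs no further normalization (K = 1).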

24 Chordal Graphs

25 Interpretation of the links
[Figure: a chordal graph with three overlapping cliques, Clique 1, Clique 2, Clique 3.]
A probability distribution that can be written as a product of low-order marginals divided by a product of low-order marginals is said to be decomposable.
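Concretely, if the three pictured cliques form a join tree with separators S12 = C1 ∩ C2 and S23 = C2 ∩ C3 (an illustrative reading of the figure), the decomposable form is
P(U) = P(C1) · P(C2) · P(C3) / ( P(S12) · P(S23) ).
For the chain X1 - X2 - X3 - X4 this reads P = P(x1,x2) · P(x2,x3) · P(x3,x4) / ( P(x2) · P(x3) ), which agrees with the factorization on slide 23.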

26 Importance of Decomposability
When assigning compatibility functions, it suffices to use marginal probabilities on the cliques, making sure only that they are locally consistent. Marginals can be assessed by experts or estimated directly from data.


28 Main results on d-separation
The definition of ID(X, Z, Y) (every path between X and Y is blocked by Z: it contains either a non-converging node that is in Z, or a converging node that is not in Z and has no descendant in Z) is such that:
Soundness [Theorem 9]: ID(X, Z, Y) = yes implies that IP(X, Z, Y) follows from the boundary Basis(D).
Completeness [Theorem 10]: ID(X, Z, Y) = no implies that IP(X, Z, Y) does not follow from the boundary Basis(D).
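One mechanical test for ID(X, Z, Y), equivalent to d-separation though not the procedure on these slides, is the ancestral moral graph criterion: restrict the DAG to the ancestors of X∪Y∪Z, moralize it (connect every pair of parents of a common child and drop edge directions), delete Z, and test ordinary vertex separation. A sketch in Python, reusing separated() from the sketch under slide 3:

from itertools import combinations

def d_separated(parents, X, Z, Y):
    """I_D(X, Z, Y) in the DAG given by `parents` (node -> set of parents)."""
    X, Z, Y = set(X), set(Z), set(Y)
    # Ancestral set of X ∪ Y ∪ Z.
    anc, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    # Moralize: link each node to its parents, and marry co-parents.
    adj = {v: set() for v in anc}
    for v in anc:
        for p in parents[v]:
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(parents[v], 2):
            adj[p].add(q); adj[q].add(p)
    return separated(adj, X, Z, Y)   # vertex separation in the moral graph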

Claim 1: Each vertex Xi in a Bayesian network is d-separated from all its non-descendants, given its parents pai. Proof: Each vertex Xi is connected to its non-descendants via its parents or via its descendants. All paths via its parents are blocked because pai are given, and all paths via descendants are blocked because they pass through converging edges → Z ← where Z is not given. Hence, by the definition of d-separation, the claim holds: ID(Xi, pai, non-descendantsi).

30 Claim 2: Each topological order d in a BN entails the same set of independence assumptions.
Proof: By Claim 1, ID(Xi, pai, non-descendantsi) holds. For each topological order d on {1,…,n}, ID(Xd(i), pad(i), non-descendantsd(i)) holds as well. From soundness (Theorem 9), IP(Xd(i), pad(i), non-descendantsd(i)) holds as well. By the decomposition property of conditional independence, IP(Xd(i), pad(i), S) holds for every S that is a subset of non-descendantsd(i). Hence, Xi is independent, given its parents, also of S = {all variables that precede Xi in an arbitrary topological order d}.

