Bayesian Networks — Presentation transcript

1 Bayesian Networks A causal probabilistic network, or Bayesian network,
is a directed acyclic graph (DAG) whose nodes represent variables and whose links represent dependency relations between variables, e.g. of the cause-effect type, quantified by (conditional) probabilities. Qualitative component + quantitative component

2 Bayesian Networks Qualitative component: relations of conditional dependence / independence. I(A, B | C): A and B are independent given C. I(A, B) = I(A, B | Ø): A and B are a priori independent. Formal study of the properties of the ternary relation I. A Bayesian network may encode three fundamental types of relations among neighbouring variables.

3 Qualitative Relations : type I
F → G → H Ex: F: smoke, G: bronchitis, H: respiratory problems (dyspnea) Relations: ¬I(F, H) I(F, H | G)

4 Qualitative Relations : type II
E ← F → G Ex: F: smoke, G: bronchitis, E: lung cancer Relations: ¬I(E, G) I(E, G | F)

5 Qualitative Relations : type III
B → C ← E Ex: C: alarm, B: movement detection, E: rain Relations: I(B, E) ¬I(B, E | C)

6 Probabilistic component
Qualitative knowledge: a directed acyclic graph G (DAG). Nodes(G) = V = {X1, …, Xn} -- discrete variables -- Edges(G) ⊆ V × V. Parents(Xi) = {Xj : (Xj, Xi) ∈ Edges(G)} Probabilistic knowledge: P(Xi | parents(Xi)). These probabilities determine a joint probability distribution P over V = {X1, …, Xn}: P(X1, …, Xn) = P(X1 | parents(X1)) · ... · P(Xn | parents(Xn)) Bayesian Network = (G, P)

7 P(X1, X2, ..., Xn) = ∏i=1..n P(Xi | parents(Xi))
Joint distribution (chain rule): P(X1,X2,...,Xn) = P(Xn | Xn-1,...,X1) · ... · P(X3 | X2,X1) · P(X2 | X1) · P(X1). Independence of each variable Xi from its non-parent predecessors Y1, ..., Yk given its parents: P(Xi | parents(Xi), Y1, ..., Yk) = P(Xi | parents(Xi)), hence P(X1, X2, ..., Xn) = ∏i=1..n P(Xi | parents(Xi)) • Having in each node Xi the conditional probability distribution P(Xi | parents(Xi)) is therefore enough to determine the full joint probability distribution P(X1,X2,...,Xn)

8 Example A: visit to Asia B: tuberculosis F: smoke E: lung cancer
G: bronchitis C: B or E D: X-ray H: dyspnea P(A): P(a) = 0.01 P(B | A): P(b | a) = 0.05, P(b | ¬a) = 0.01 P(C | B,E): P(c | b, e) = 1, P(c | b, ¬e) = 1, P(c | ¬b, e) = 1, P(c | ¬b, ¬e) = 0 P(F): P(f) = 0.5 P(D | C): P(d | c) = 0.98, P(d | ¬c) = 0.05 P(E | F): P(e | f) = 0.1, P(e | ¬f) = 0.01 P(G | F): P(g | f) = 0.6, P(g | ¬f) = 0.3 P(H | C, G): P(h | c,g) = 0.9, P(h | c,¬g) = 0.7, P(h | ¬c,g) = 0.8, P(h | ¬c,¬g) = 0.1 P(A,B,C,D,E,F,G,H) = P(D | C) P(H | C, G) P(C | B, E) P(G | F) P(E | F) P(F) P(B | A) P(A) P(a,¬b,c,¬d,e,f,g,¬h) = P(¬d | c) P(¬h | c,g) P(c | ¬b,e) P(g | f) P(e | f) P(f) P(¬b | a) P(a) = (1−0.98) × (1−0.9) × 1 × 0.6 × 0.1 × 0.5 × (1−0.05) × 0.01 = 5.7 × 10⁻⁷
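To make the arithmetic above reproducible, here is a minimal Python sketch (variable names are chosen here for illustration) that multiplies the eight CPT factors of the example network:

```python
# Minimal sketch: evaluate the joint probability of one complete
# assignment of the example network by multiplying its eight CPT entries.

p_a = 0.01                                   # P(a)
p_b = {True: 0.05, False: 0.01}              # P(b | A)
p_e = {True: 0.1,  False: 0.01}              # P(e | F)
p_f = 0.5                                    # P(f)
p_g = {True: 0.6,  False: 0.3}               # P(g | F)
p_c = {(True, True): 1.0, (True, False): 1.0,        # P(c | B, E):
       (False, True): 1.0, (False, False): 0.0}      # C = B or E
p_d = {True: 0.98, False: 0.05}              # P(d | C)
p_h = {(True, True): 0.9, (True, False): 0.7,        # P(h | C, G)
       (False, True): 0.8, (False, False): 0.1}

def bernoulli(p_true, value):
    """P(X = value) given P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d, e, f, g, h):
    """P(A,B,C,D,E,F,G,H) as the product of the eight local factors."""
    return (bernoulli(p_a, a) * bernoulli(p_b[a], b) *
            bernoulli(p_f, f) * bernoulli(p_e[f], e) *
            bernoulli(p_g[f], g) * bernoulli(p_c[(b, e)], c) *
            bernoulli(p_d[c], d) * bernoulli(p_h[(c, g)], h))

# P(a, ¬b, c, ¬d, e, f, g, ¬h) = 5.7e-07, as on the slide
print(joint(True, False, True, False, True, True, True, False))
```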

9 D-separation relations and probabilistic independence
Goal: to determine precisely which independence relations are (graphically) encoded by a DAG. Previous definitions: A path is a sequence of connected nodes in the graph. A non-directed path is a path that does not take into account the directions of the arrows. A “head-to-head” link at a node is a (non-directed) path of the form x → y ← w; the node y is called a “head-to-head” node.

10 D-separation • A path c is said to be activated by a set of nodes Z if the following two conditions are satisfied: Every head-to-head node in c is in Z or has a descendant in Z. No other node in c belongs to Z. Otherwise, the path c is said to be blocked by Z. Definition. If X, Y and Z are three disjoint subsets of nodes in a DAG G, then Z d-separates X from Y, or equivalently X and Y are graphically independent given Z, when all the paths between any node from X and any node from Y are blocked by Z
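A minimal Python sketch of the blocking test above, assuming the DAG is given as plain dicts of parents and children (names chosen for illustration); full d-separation would repeat this test over all non-directed paths between X and Y:

```python
# Sketch of the activation test; the DAG is given as two dicts,
# parents[node] and children[node].

def descendants(children, node):
    """All descendants of `node` in the DAG."""
    out, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), set()):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

def blocked(path, parents, children, z):
    """True iff the non-directed `path` is blocked by the node set `z`.
    An inner node y is head-to-head when both its neighbours on the
    path point into it (x -> y <- w)."""
    for i in range(1, len(path) - 1):
        x, y, w = path[i - 1], path[i], path[i + 1]
        if x in parents[y] and w in parents[y]:       # head-to-head node
            if y not in z and not (descendants(children, y) & z):
                return True
        elif y in z:                 # chain/fork node inside z blocks
            return True
    return False

# Type-III example (slide 5): B -> C <- E
parents = {"B": set(), "E": set(), "C": {"B", "E"}}
children = {"B": {"C"}, "E": {"C"}, "C": set()}
print(blocked(["B", "C", "E"], parents, children, set()))   # True: I(B, E)
print(blocked(["B", "C", "E"], parents, children, {"C"}))   # False: ¬I(B, E | C)
```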

11 D-separation {B} and {C} are d-separated by {A}:
Path B-E-C: E, G ∉ {A} (E is head-to-head and neither E nor a descendant of E is in {A}) ⟹ {A} blocks the path B-E-C. Path B-A-C: A ∈ {A} ⟹ {A} blocks the path B-A-C. Theorem. Let G be a DAG and let X, Y and Z be subsets of nodes such that X and Y are d-separated by Z. Then X and Y are conditionally independent given Z for any probability P such that (G, P) is a causal network over G, that is, P(X | Y,Z) = P(X | Z) and P(Y | X,Z) = P(Y | Z).

12 Inference in Bayesian Networks
Knowledge about a domain is encoded by a Bayesian network BN = (G, P). Inference = updating probabilities: evidence E on the values taken by some variables modifies the probabilities of the rest of the variables: P(X) ---> P’(X) = P(X | E) Direct method: BN = < G = {A,B,C,D,E}, P(A,B,C,D,E) > Evidence: A = ai, B = bj P(C = ck | A = ai, B = bj) = ΣD,E P(ai, bj, ck, D, E) / ΣC,D,E P(ai, bj, C, D, E)
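A brute-force sketch of this direct method in Python (function and variable names are illustrative). Its cost is exponential in the number of variables, which is precisely what motivates the propagation methods of the next slides:

```python
# Brute-force sketch of the direct method: condition by summing the
# joint distribution over all assignments compatible with the evidence.
from itertools import product

def conditional(joint, variables, query, evidence):
    """P(query_var = query_val | evidence) by full enumeration.
    joint: function from a dict {variable: bool} to P(assignment)."""
    qvar, qval = query
    num = den = 0.0
    for values in product([True, False], repeat=len(variables)):
        assign = dict(zip(variables, values))
        if any(assign[v] != val for v, val in evidence.items()):
            continue                      # incompatible with the evidence
        p = joint(assign)
        den += p                          # accumulates P(evidence)
        if assign[qvar] == qval:
            num += p                      # accumulates P(query, evidence)
    return num / den

# Tiny example: A -> B with P(a) = 0.3, P(b|a) = 0.9, P(b|¬a) = 0.2
def tiny_joint(assign):
    pa = 0.3 if assign["A"] else 0.7
    pb = 0.9 if assign["A"] else 0.2
    return pa * (pb if assign["B"] else 1 - pb)

print(conditional(tiny_joint, ["A", "B"], ("A", True), {"B": True}))
# P(a | b) = 0.27 / (0.27 + 0.14) ≈ 0.659
```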

13 Inference in Bayesian Networks
Bayesian networks allow local computations that exploit the independence relations among variables explicitly induced by the DAG of the network. They allow updating the probability of a variable using only the probabilities of its immediate predecessor nodes (parents), and in this way, step by step, updating the probabilities of all non-instantiated variables in the network ---> propagation methods. Two main propagation methods: Pearl's method: message passing over the DAG Lauritzen & Spiegelhalter's method: prior transformation of the DAG into a tree of cliques

14 Propagation method in trees of cliques
Transformation of the initial network into another graphical structure, a tree of cliques (subsets of nodes), with equivalent probabilistic information: BN = (G, P) ----> [Tree, P] Propagation algorithm over the new structure

15 Graphical Transformation
Definition: a “clique” in a non-directed graph is a complete and maximal subgraph. To transform a DAG G into a tree of cliques: Delete the directions of the edges of G: G’ Moralization of G’: add edges between nodes with common children in the original DAG G: G’’ Triangulation of G’’: G* Identification of the cliques in G* Suitable enumeration of the cliques (Running Intersection Property) Construction of the tree according to the enumeration
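Steps 1-3 can be sketched in a few lines of Python (a sketch under the assumption that the DAG is stored as parents[node]; the elimination order used for triangulation is taken as given here):

```python
# Sketch of steps 1-3: build the moral graph of a DAG given as
# parents[node] = set of parents, then triangulate it by node elimination.

def moralize(parents):
    """Undirected 'moral' graph G'': drop directions and marry the
    parents of every common child."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    adj = {v: set() for v in nodes}
    for child, ps in parents.items():
        for p in ps:                       # 1) drop edge directions
            adj[child].add(p)
            adj[p].add(child)
        for p in ps:                       # 2) marry parents of child
            for q in ps:
                if p != q:
                    adj[p].add(q)
    return adj

def triangulate(adj, order):
    """3) Naive fill-in triangulation: eliminate nodes in `order`,
    pairwise-connecting the remaining neighbours of each eliminated node."""
    work = {v: set(ns) for v, ns in adj.items()}    # working copy
    filled = {v: set(ns) for v, ns in adj.items()}  # result: edges of G*
    for v in order:
        nbrs = list(work[v])
        for i, u in enumerate(nbrs):                # add fill-in edges
            for w in nbrs[i + 1:]:
                work[u].add(w); work[w].add(u)
                filled[u].add(w); filled[w].add(u)
        del work[v]                                 # eliminate v
        for u in work:
            work[u].discard(v)
    return filled

# The DAG of the running example (slide 8)
parents = {"A": set(), "B": {"A"}, "F": set(), "E": {"F"}, "G": {"F"},
           "C": {"B", "E"}, "D": {"C"}, "H": {"C", "G"}}
moral = moralize(parents)      # adds B-E (parents of C) and C-G (of H)
chordal = triangulate(moral, ["A", "D", "H", "B", "F", "E", "C", "G"])
```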

16 Example (1) (figure: the example DAG after steps 1 and 2 — edge directions removed, G’, and moralized, G’’)

17 Example (2): triangulation
3) (figure: the triangulated graph G*)

18 Example (3): cliques (step 4) Cliques:
{A,B}, {B,C,E}, {E,F,G}, {C,E,G}, {C,G,H}, {C,D}

19 Ordering of cliques Enumeration of the cliques Clq1, Clq2, …, Clqn such that the following property holds: Running Intersection Property: for all i = 1, …, n there exists j < i such that Si ⊆ Clqj, where Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ ... ∪ Clqi-1). This property is guaranteed if: (i) the nodes of the graph are enumerated following the criterion of “maximum cardinality search”; (ii) the cliques are ordered according to the node of the clique with the highest rank in the former enumeration.
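A sketch of criterion (i), maximum cardinality search, on an undirected graph stored as adjacency sets (tie-breaking is arbitrary, as in the usual formulation):

```python
# Sketch of maximum cardinality search: repeatedly number the node
# with the most already-numbered neighbours.

def max_cardinality_search(adj):
    numbered, order = set(), []
    while len(order) < len(adj):
        best = max((v for v in adj if v not in numbered),
                   key=lambda v: len(adj[v] & numbered))
        numbered.add(best)
        order.append(best)        # order[k] is the node with rank k + 1
    return order

# Tiny example; on the running example one would apply this to the
# triangulated graph G* to obtain the node ranks of the next slide.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(max_cardinality_search(adj))   # e.g. ['A', 'B', 'C']
```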

20 Example (4): ordering cliques
(figure: node ranks 1-8 assigned by maximum cardinality search) Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G}, Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D}

21 Tree Construction Let [Clq1, Clq2, …, Clqn] be an ordering satisfying the R.I.P. For each clique Clqi, define Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ ... ∪ Clqi-1) and Ri = Clqi − Si. Tree of cliques: - (hyper) nodes: cliques - root: Clq1 - for each clique Clqi, its “father” candidates are the cliques Clqk with k < i such that Si ⊆ Clqk (if there is more than one candidate, one is selected at random)
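A sketch of the construction, reusing the cliques of Example (4); note this version attaches each clique to the first candidate father, whereas the slide selects one at random (the running example attaches Clq6 to Clq5 rather than Clq2):

```python
# Sketch of the tree construction: cliques are given in an order
# satisfying the R.I.P.; each Clq_i is attached to an earlier clique
# containing its separator S_i.

def build_clique_tree(cliques):
    """cliques: list of frozensets in R.I.P. order.
    Returns a list of (S_i, R_i, father-index) triples."""
    tree, seen = [], set()
    for i, clq in enumerate(cliques):
        s = clq & seen                                   # separator S_i
        r = clq - s                                      # residual  R_i
        father = next((j for j in range(i) if s <= cliques[j]), None)
        tree.append((s, r, father))                      # None for the root
        seen |= clq
    return tree

cliques = [frozenset("AB"), frozenset("BEC"), frozenset("ECG"),
           frozenset("EGF"), frozenset("CGH"), frozenset("CD")]
for i, (s, r, f) in enumerate(build_clique_tree(cliques), start=1):
    father = "-" if f is None else f"Clq{f + 1}"
    print(f"Clq{i}: S = {sorted(s)}, R = {sorted(r)}, father = {father}")
```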

22 Example (5): trees S2 = Clq2 ∩ Clq1 = {B} ⊆ Clq1
S3 = Clq3 ∩ (Clq1 ∪ Clq2) = {E,C} ⊆ Clq2 S4 = Clq4 ∩ (Clq1 ∪ Clq2 ∪ Clq3) = {E,G} ⊆ Clq3 S5 = Clq5 ∩ (Clq1 ∪ Clq2 ∪ Clq3 ∪ Clq4) = {C,G} ⊆ Clq3 S6 = Clq6 ∩ (Clq1 ∪ Clq2 ∪ Clq3 ∪ Clq4 ∪ Clq5) = {C} ⊆ Clq2, Clq3, Clq5

23 Propagation Algorithm
Potential representation of the distribution P(X1, …, Xn): ([W1, ..., Wp], Ψ) is a potential representation of P, where the Wi are subsets of V = {X1, …, Xn}, if P(V) = Ψ(W1) · Ψ(W2) · ... · Ψ(Wp). In a Bayesian network (G, P): P(X1, ..., Xn) = P(Xn | parents(Xn)) · ... · P(X1 | parents(X1)) admits a potential representation P(X1, ..., Xn) = Ψ(Clq1) · Ψ(Clq2) · ... · Ψ(Clqm) with Ψ(Clqi) = ∏{ P(Xj | parents(Xj)) : Xj ∈ Clqi, parents(Xj) ⊆ Clqi }, each factor being assigned to exactly one clique.

24 Propagation Algorithm (2)
Fundamental property of the potential representations: Let ([W1, ..., Wm], Ψ) be a potential representation for P. Evidence: X3 = a and X5 = b. Problem: update the probability P’(X1, ..., Xn) = P(X1, ..., Xn | X3 = a, X5 = b) ?? Define: W^i = Wi − {X3, X5} Ψ^(W^i) = Ψ(Wi restricted to X3 = a, X5 = b) Then ([W^1, ..., W^m], Ψ^) is a potential representation for P'.
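The restriction Ψ → Ψ^ can be sketched as follows, representing a potential as a dict from value tuples (in a fixed variable order) to numbers; names are illustrative:

```python
# Sketch of the restriction psi -> psi^: keep only the entries that are
# consistent with the evidence, dropping the observed variables.

def restrict(potential, variables, evidence):
    keep = [i for i, v in enumerate(variables) if v not in evidence]
    idx = {v: i for i, v in enumerate(variables)}
    new_pot = {}
    for assign, value in potential.items():
        if all(assign[idx[v]] == val for v, val in evidence.items()):
            new_pot[tuple(assign[i] for i in keep)] = value
    return [variables[i] for i in keep], new_pot

# psi(Clq6) = P(D | C) from the running example, variable order (C, D)
psi6 = {(True, True): 0.98, (True, False): 0.02,
        (False, True): 0.05, (False, False): 0.95}
print(restrict(psi6, ["C", "D"], {"C": True}))
# -> (['D'], {(True,): 0.98, (False,): 0.02})
```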

25 Example (6): potentials
Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G}, Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D} P(A,B,C,D,E,F,G,H) = P(D | C) P(H | C, G) P(C | B, E) P(G | F) P(E | F) P(F) P(B | A) P(A) Ψ(Clq1) = P(A) · P(B | A) Ψ(Clq2) = P(C | B,E), Ψ(Clq3) = 1 Ψ(Clq4) = P(F) · P(E | F) · P(G | F), Ψ(Clq5) = P(H | C, G) Ψ(Clq6) = P(D | C) P(A,B,C,D,E,F,G,H) = Ψ(Clq1) · ... · Ψ(Clq6)
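As a sketch of how this assignment of CPTs to cliques can be computed (one clique per CPT, chosen greedily here), the following reproduces the table above:

```python
# Sketch: each CPT P(Xj | parents(Xj)) is absorbed by exactly one
# clique containing Xj and all its parents.

def assign_cpts_to_cliques(cliques, families):
    """cliques: list of frozensets; families: dict Xj -> frozenset of
    {Xj} union parents(Xj). Returns clique index -> list of CPT owners."""
    assignment = {i: [] for i in range(len(cliques))}
    for xj, family in families.items():
        i = next(i for i, clq in enumerate(cliques) if family <= clq)
        assignment[i].append(xj)      # one clique per CPT
    return assignment

cliques = [frozenset("AB"), frozenset("BEC"), frozenset("ECG"),
           frozenset("EGF"), frozenset("CGH"), frozenset("CD")]
families = {"A": frozenset("A"), "B": frozenset("AB"),
            "C": frozenset("BEC"), "D": frozenset("CD"),
            "E": frozenset("EF"), "F": frozenset("F"),
            "G": frozenset("FG"), "H": frozenset("CGH")}
print(assign_cpts_to_cliques(cliques, families))
# -> {0: ['A', 'B'], 1: ['C'], 2: [], 3: ['E', 'F', 'G'], 4: ['H'], 5: ['D']}
# i.e. psi(Clq3) = 1, exactly as on the slide
```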

26 Example (6): potentials
Ψ(Clq1) = P(A) · P(B | A): Ψ(a,b) = P(a) · P(b | a) = 0.0005 Ψ(¬a,b) = P(¬a) · P(b | ¬a) = 0.0099 Ψ(a,¬b) = P(a) · P(¬b | a) = 0.0095 Ψ(¬a,¬b) = P(¬a) · P(¬b | ¬a) = 0.9801 Ψ(Clq5) = P(H | C, G): Ψ(c,g,h) = P(h | c,g) = 0.9 Ψ(c,g,¬h) = P(¬h | c,g) = 0.1 Ψ(c,¬g,h) = P(h | c,¬g) = 0.7 Ψ(c,¬g,¬h) = P(¬h | c,¬g) = 0.3 Ψ(¬c,g,h) = P(h | ¬c,g) = 0.8 Ψ(¬c,g,¬h) = P(¬h | ¬c,g) = 0.2 Ψ(¬c,¬g,h) = P(h | ¬c,¬g) = 0.1 Ψ(¬c,¬g,¬h) = P(¬h | ¬c,¬g) = 0.9

27 (figure-only slide)

28 Propagation algorithm: theoretical results
Causal network (G, P); ([Clq1, ..., Clqp], Ψ) is a potential representation for P. 1) P(Clqi) = P(Ri | Si) · P(Si) 2) P(Rp | Sp) = Ψ(Clqp) / λp(Sp), where λp(Sp) = ΣRp Ψ(Clqp) is the marginal of the function Ψ with respect to the variables of Rp. 3) If father(Clqp) = Clqj, then ([Clq1, ..., Clqp-1], Ψ') is a potential representation for the marginal distribution P(V − Rp), where: Ψ'(Clqi) = Ψ(Clqi) for all i ≠ j, i < p Ψ'(Clqj) = Ψ(Clqj) · λp(Sp)

29 Propagation algorithm: step by step (2)
Goal: to compute P(Clqi) for all cliques. Two graph traversals: one bottom-up and one top-down. BU) Start with clique Clqp. Combining properties 2 and 3 we have an iterative way of computing the conditional distributions P(Ri | Si) in each clique until reaching the root clique Clq1. Root: P(Clq1) = P(R1 | S1). TD) P(S2) = ΣClq1−S2 P(Clq1), and from there P(Si) = ΣClqj−Si P(Clqj) -- we can always compute in a clique Clqi the distribution P(Si) whenever we have already computed the distribution of its father clique Clqj --
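One bottom-up step, the computation of the message λi(Si) = ΣRi Ψ(Clqi), can be sketched with the same dict representation of potentials used earlier (illustrative names); the example reproduces the λ6 computed a few slides below:

```python
# Sketch of one bottom-up step: marginalize the residual R_i out of the
# clique potential to obtain the message lambda_i(S_i) for the father.

def marginalize_out(potential, variables, residual):
    """lambda_i(S_i) = sum over R_i of psi(Clq_i)."""
    keep = [i for i, v in enumerate(variables) if v not in residual]
    out = {}
    for assign, value in potential.items():
        key = tuple(assign[i] for i in keep)
        out[key] = out.get(key, 0.0) + value
    return [variables[i] for i in keep], out

# Clq6 = {C, D}, R6 = {D}: reproduces lambda_6 of Example (7) below
psi6 = {(True, True): 0.98, (True, False): 0.02,
        (False, True): 0.05, (False, False): 0.95}   # order (C, D)
print(marginalize_out(psi6, ["C", "D"], {"D"}))
# -> (['C'], {(True,): 1.0, (False,): 1.0})  (up to float rounding)
```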

30 (figure: a clique Clqi in the tree) P(Clqi) = P(Ri, Si) = P(Ri | Si) · P(Si)

31 Case 1) Leaf clique Clqi: λi(Si) = ΣRi Ψ(Clqi), and P(Ri | Si) = Ψ(Clqi) / λi(Si) Case 2) Inner clique Clqi whose children Clqj, Clqk have already sent their messages: Ψ'(Clqi) = Ψ(Clqi) · λj(Sj) · λk(Sk)

32 2(S2) 3(S3) 4(S4) 5(S5) 6(S6)

33 Example (7) A) Bottom-up traversal: passing λk(Sk) = ΣRk Ψ(Clqk).
Clique Clq6 = {C,D} (R6 = {D}, S6 = {C}). P(R6 | S6) = P(D | C) = Ψ(Clq6) / λ6(C), where λ6(c) = Ψ(c, d) + Ψ(c, ¬d) = 0.98 + 0.02 = 1 λ6(¬c) = Ψ(¬c, d) + Ψ(¬c, ¬d) = 0.05 + 0.95 = 1, so P(d | c) = 0.98, P(¬d | c) = 0.02, P(d | ¬c) = 0.05, P(¬d | ¬c) = 0.95

34 Example (7) Clique Clq5 = {C, G, H} (R5 = {H}, S5 = {C, G}).
This node is clique Clq6's father. According to point 3), we modify the potential function of the clique Clq5: Ψ'(Clq5) = Ψ(Clq5) · λ6(S6) P(R5 | S5) = P(H | C,G) = Ψ'(Clq5) / λ5(C,G), where λ5(c,g) = Ψ'(c, g, h) + Ψ'(c, g, ¬h) = 0.9 + 0.1 = 1 λ5(c,¬g) = Ψ'(c, ¬g, h) + Ψ'(c, ¬g, ¬h) = 0.7 + 0.3 = 1 λ5(¬c,g) = … = 1 λ5(¬c,¬g) = ... = 1

35 Example (7) Clique Clq3 = {E,C,G} (R3 = {G}, S3 = {E,C})
Clq3 is the father of two cliques, Clq4 and Clq5, both already processed: Ψ'(Clq3) = Ψ(Clq3) · λ4(S4) · λ5(S5), i.e. Ψ'(E,C,G) = Ψ(E,C,G) · λ4(E,G) · λ5(C,G) P(R3 | S3) = P(G | E, C) = Ψ'(Clq3) / λ3(E,C), where λ3(E,C) = ΣG Ψ'(E,C,G)

36 Example (7) Root: Clique Clq1 = {A, B} (R1 = {A, B}, S1 = ∅).
Ψ'(A,B) = Ψ(A,B) · λ2(B) P(R1) = P(R1 | S1) = Ψ'(A,B) / λ1, where λ1 = Ψ'(a,b) + Ψ'(a,¬b) + Ψ'(¬a,b) + Ψ'(¬a,¬b). P(A,B) = Ψ'(A,B): P(a,b) = 0.0005, P(a,¬b) = 0.0095, P(¬a,b) = 0.0099, P(¬a,¬b) = 0.9801

37 (figure: clique Clqi with children Clqj, Clqk) P(Clqi) = P(Ri | Si) · P(Si) P(Sj) = ΣClqi−Sj P(Clqi) = πi(Sj) P(Sk) = ΣClqi−Sk P(Clqi) = πi(Sk)

38 1(S2) 2(S3) 3(S4) 3(S5) 5(S6)

39 Example (7) Top-down traversal:
Clique Clq2 = {B,E,C} (R2 = {E,C}, S2 = {B}). P(B) = P(S2): P(b) = P(a, b) + P(¬a, b) = 0.0005 + 0.0099 = 0.0104, P(¬b) = P(a, ¬b) + P(¬a, ¬b) = 0.0095 + 0.9801 = 0.9896 P(Clq2) = P(R2 | S2) · P(S2)

40 Example (7) Clique Clq3 = {E,C,G} (R3 = {G}, S3 = {E,C}).
We compute P(S3) and P(Clq3). Clique Clq4 = {E, G, F} (R4 = {F}, S4 = {E,G}): we compute P(S4) and P(Clq4). Clique Clq5 = {C, G, H} (R5 = {H}, S5 = {C, G}): we compute P(S5) and P(Clq5). Clique Clq6 = {C,D} (R6 = {D}, S6 = {C}): we compute P(S6) and P(Clq6).

41 (figure-only slide)

42 Summary Given a Bayesian network BN = (G, P), we have seen how:
1) To transform G into a tree of cliques and factorize P as P(X1, ..., Xn) = Ψ(Clq1) · Ψ(Clq2) · ... · Ψ(Clqm), where Ψ(Clqi) = ∏{ P(Xj | parents(Xj)) : Xj ∈ Clqi, parents(Xj) ⊆ Clqi } 2) To compute the probability distributions P(Clqi) with a propagation algorithm, and from there to compute the probabilities P(Xj) for Xj ∈ Clqi, by marginalization.

43 P(X1, ..., Xn) = Ψ(Clq1) · Ψ(Clq2) · ... · Ψ(Clqm)
Probability updating It remains to see how to perform inference, i.e. how to update the probabilities P(Xj) when some information (evidence E) is available about some variables: P(Xj) ---> P*(Xj) = P(Xj | E) The updating mechanism is based on a fundamental property of the potential representations, applied to P(X1, ..., Xn) and its potential representation in terms of cliques: P(X1, ..., Xn) = Ψ(Clq1) · Ψ(Clq2) · ... · Ψ(Clqm)

44 Updating mechanism Recall:
Let ([Clq1, ..., Clqm], Ψ) be a potential representation for P(X1, …, Xn). We observe: X3 = a and X5 = b. Updating the probability: P*(X1,X2,X4,X6, ..., Xn) = P(X1, ..., Xn | X3 = a, X5 = b) Define: Clq^i = Clqi − {X3, X5} Ψ^(Clq^i) = Ψ(Clqi restricted to X3 = a, X5 = b) Then ([Clq^1, ..., Clq^m], Ψ^) is a potential representation for P*.

45 Updating mechanism Based on three steps:
A) build the new tree of cliques obtained by deleting from the original tree the instantiated variables, B) re-compute the new potential functions Ψ^ corresponding to the new cliques, and finally, C) apply the propagation algorithm over the new tree of cliques and potential functions.
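Steps A) and B) are the clique-by-clique application of the restriction property of slide 24; a compact sketch follows (illustrative names), with step C) being the unchanged propagation algorithm:

```python
# Sketch of steps A) and B): restrict every clique potential by the
# evidence, producing the reduced cliques Clq^_i and potentials psi^.

def enter_evidence(clique_potentials, evidence):
    """clique_potentials: list of (variable-order, potential-dict) pairs.
    Returns the reduced pairs, with the observed variables removed."""
    reduced = []
    for variables, pot in clique_potentials:
        keep = [i for i, v in enumerate(variables) if v not in evidence]
        new_pot = {}
        for assign, value in pot.items():
            if all(assign[i] == evidence[v]
                   for i, v in enumerate(variables) if v in evidence):
                new_pot[tuple(assign[i] for i in keep)] = value
        reduced.append(([variables[i] for i in keep], new_pot))
    return reduced

# e.g. instantiating C = true in the clique Clq6 = {C, D}
clq6 = (["C", "D"], {(True, True): 0.98, (True, False): 0.02,
                     (False, True): 0.05, (False, False): 0.95})
print(enter_evidence([clq6], {"C": True}))
# -> [(['D'], {(True,): 0.98, (False,): 0.02})]
```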

46 (figure: the original tree of cliques and the reduced tree after instantiating A = a and H = h)
Clq1 = {A,B} ---> Clq’1 = {B} Clq2 = {B,E,C} ---> Clq’2 = {B,E,C} Clq3 = {E,C,G} ---> Clq’3 = {E,C,G} Clq4 = {E,G,F} ---> Clq’4 = {E,G,F} Clq5 = {C,G,H} ---> Clq’5 = {C,G} Clq6 = {C,D} ---> Clq’6 = {C,D} P(Xj) ---> P*(Xj) = P(Xj | A = a, H = h)

47 (figure: propagation with evidence A = a, H = h)

48 (figure: propagation with evidence A = a, H = h, continued)

49 (figure-only slide)

50 P(D = d | A = a, H = h) ?

