
1 Directed Graphical Probabilistic Models: the sequel
William W. Cohen, Machine Learning, Feb 2008

2 Summary of Monday(1): Bayes nets
Many problems can be solved using the joint probability P(X1,…,Xn). Bayes nets describe a way to compactly write the joint. Running example: A = first guess, B = the money, C = the goat, D = stick or swap?, E = second guess, with CPTs P(A), P(B), P(C|A,B), P(E|A,C,D). Conditional independence: [statement, network diagram, and CPT tables shown on slide].
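A minimal sketch of "compactly writing the joint" (structure and numbers are illustrative, and only a few CPT rows are filled in, not the slides' example in full): each node stores a CPT given its parents, and any joint entry is the product of one CPT entry per node.

parents = {"A": [], "B": [], "C": ["A", "B"]}
cpts = {
    "A": {((), 1): 1/3, ((), 2): 1/3, ((), 3): 1/3},
    "B": {((), 1): 1/3, ((), 2): 1/3, ((), 3): 1/3},
    "C": {((1, 1), 2): 0.5, ((1, 1), 3): 0.5, ((1, 2), 3): 1.0},  # remaining rows omitted
}

def joint_prob(assignment):
    """P(assignment) = product over nodes X of P(X = x | parents(X) = u)."""
    p = 1.0
    for node, cpt in cpts.items():
        u = tuple(assignment[q] for q in parents[node])
        p *= cpt.get((u, assignment[node]), 0.0)
    return p

print(joint_prob({"A": 1, "B": 2, "C": 3}))  # (1/3) * (1/3) * 1.0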

3 Aside: Conditional Independence?
(Chain rule – always true.) (Fancier version of the chain rule.) (Definition of conditional independence.) Caveat divisor: we'll usually assume no probabilities are zero, so division is safe. [Equations shown on slide.]
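The equations on this slide are images in the transcript; one standard form of the derivation the parenthetical notes refer to (a reconstruction, not necessarily the slide's exact notation) is:

\[
\begin{aligned}
P(X, Y \mid E) &= P(X \mid E)\, P(Y \mid X, E) && \text{(chain rule, always true)} \\
P(X, Y \mid E) &= P(X \mid E)\, P(Y \mid E)    && \text{(definition of } I\langle X, E, Y\rangle \text{)} \\
\Rightarrow \; P(Y \mid X, E) &= P(Y \mid E)   && \text{(divide; safe when no probabilities are zero)}
\end{aligned}
\]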

4 Summary of Monday(2): d-separation
There are three ways a path from X to Y given evidence E can be blocked (at some node Z along the path). X is d-separated from Y given E iff all paths from X to Y given E are blocked. If X is d-separated from Y given E, then I<X,E,Y>. [Diagram of the three blocking configurations shown on slide.]
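A minimal sketch of the blocking test for a single path, under the standard three cases (an assumed helper, not code from the lecture; anc_of_evidence is assumed to be the precomputed set of nodes with an observed descendant):

def path_blocked(path, edges, evidence, anc_of_evidence):
    """True if the undirected path (a list of nodes) is blocked given the evidence set."""
    for i in range(1, len(path) - 1):
        a, z, b = path[i - 1], path[i], path[i + 1]
        converging = (a, z) in edges and (b, z) in edges   # a -> z <- b
        if converging:
            # head-to-head: blocked unless z or one of its descendants is observed
            if z not in evidence and z not in anc_of_evidence:
                return True
        else:
            # serial (a -> z -> b) or diverging (a <- z -> b): blocked iff z is observed
            if z in evidence:
                return True
    return False

# X is d-separated from Y given E iff path_blocked(...) holds for every undirected path from X to Y.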

5 d-separation continued…
Chain X → E → Y, with E unobserved. Question: is X independent of Y? It depends… on the CPTs. This is why d-separation => independence, but not the converse. [CPTs P(X), P(E|X), and two alternative P(Y|E) tables shown on slide.]
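A numeric version of "it depends on the CPTs" (the values echo the numbers visible in the transcript but are otherwise illustrative): in the chain X → E → Y with E unobserved, X and Y are not d-separated, yet they may or may not be independent.

p_e1_given_x = {0: 0.01, 1: 0.99}               # P(E=1 | X=x)

def p_y1_given_x(x, p_y1_given_e):
    """P(Y=1 | X=x) = sum over e of P(Y=1 | e) P(e | x)."""
    pe1 = p_e1_given_x[x]
    return p_y1_given_e[1] * pe1 + p_y1_given_e[0] * (1 - pe1)

flat  = {0: 0.5, 1: 0.5}     # Y ignores E
sharp = {0: 0.01, 1: 0.99}   # Y tracks E
print(p_y1_given_x(0, flat),  p_y1_given_x(1, flat))    # equal  -> X, Y independent
print(p_y1_given_x(0, sharp), p_y1_given_x(1, sharp))   # differ -> X, Y dependent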

6 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? Yes!

7 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? Yes! Derivation: Bayes rule; a fancier version of Bayes rule; the result from the previous slide. [Equations shown on slide.]

8 An aside: computations with Bayes Nets
Nodes X, E, Y. Question: what is [the query shown on slide]? Main point: inference has no preferred direction in a Bayes net.
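The query on this slide is an image; as a hedged illustration of the "no preferred direction" point, Bayes rule lets evidence flow against the direction of an edge X → E:

\[
P(X \mid E) \;=\; \frac{P(E \mid X)\, P(X)}{P(E)}
\;=\; \frac{P(E \mid X)\, P(X)}{\sum_{x'} P(E \mid x')\, P(x')}
\]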

9 d-separation [Diagram: the three blocking configurations for a path from X to Y given evidence E, each involving a node Z.]

10 d-separation [Diagram repeated, with question marks highlighting particular configurations.]

11 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? Yes!

12 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? No. [CPT P(E) shown on slide.]

13 d-separation [Diagram repeated, with question marks highlighting particular configurations.]

14 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? Yes!

15 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? [Tables of P(E|X,Y) and P(E,X,Y) shown on slide.]

16 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? No! [Tables of P(E|X,Y) and P(E,X,Y) shown on slide.]

17 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? No! [Tables of P(E|X,Y) and P(E,X,Y) shown on slide.]

18 “Explaining away”
[Diagram: X and Y are both causes of E.] This is “explaining away”:
E is a common symptom of two causes, X and Y.
After observing E=1, both X and Y become more probable.
After observing E=1 and X=1, Y becomes less probable, since X alone is enough to “explain” E.
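A small numeric check of the explaining-away pattern (the CPT values here are hypothetical, not the slide's): X and Y are independent causes of the common symptom E, and we enumerate the eight joint entries directly.

from itertools import product

p_x1, p_y1 = 0.1, 0.1
p_e1 = {(0, 0): 0.01, (1, 0): 0.90, (0, 1): 0.90, (1, 1): 0.99}   # P(E=1 | X, Y)

def joint(x, y, e):
    """P(X=x, Y=y, E=e) = P(x) P(y) P(e | x, y)."""
    px = p_x1 if x else 1.0 - p_x1
    py = p_y1 if y else 1.0 - p_y1
    pe = p_e1[x, y] if e else 1.0 - p_e1[x, y]
    return px * py * pe

def p_y1_given(keep):
    """P(Y=1 | condition) by brute-force enumeration of the joint."""
    rows = [v for v in product((0, 1), repeat=3) if keep(*v)]
    return sum(joint(*v) for v in rows if v[1] == 1) / sum(joint(*v) for v in rows)

print(p_y1_given(lambda x, y, e: True))                # P(Y=1)            = 0.10
print(p_y1_given(lambda x, y, e: e == 1))              # P(Y=1 | E=1)      rises
print(p_y1_given(lambda x, y, e: e == 1 and x == 1))   # P(Y=1 | E=1, X=1) falls back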

19 “Explaining away” and common-sense
Historical note: Classical logic is monotonic: the more you know, the more you can deduce. “Common-sense” reasoning is not monotonic: birds fly, but not after being cooked for 20 min/lb at 350°F. This led to numerous “non-monotonic logics” for AI. This example shows that Bayes nets are not monotonic: if P(Y|E) is “your belief” in Y after observing E, and P(Y|X,E) is “your belief” in Y after observing E and X, then your belief in Y can decrease after you discover X.

20 A special case: linear chain networks
Chain X1 → … → Xj → … → Xn. By d-separation, the chain splits at Xj into a “forward” part and a “backward” part. [Diagram shown on slide.]

21 A special case: linear chain networks
Chain X1 → … → Xj → … → Xn. Forward: [recursion shown on slide] – a recursive forward term times a CPT entry.
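The forward recursion on this slide is an image; its standard form for a chain (a reconstruction, consistent with the iterative version on slide 23) is:

\[
P(x_j \mid x_1) \;=\; \sum_{x_{j-1}} \underbrace{P(x_j \mid x_{j-1})}_{\text{CPT entry}} \; \underbrace{P(x_{j-1} \mid x_1)}_{\text{recursive forward term}}
\]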

22 A special case: linear chain networks
Chain X1 → … → Xj → … → Xn. Backward: [recursion shown on slide] – chain rule, a CPT entry, and a recursive backward term.
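Likewise, a standard form of the backward recursion (again a reconstruction of the equation image):

\[
P(x_n \mid x_j) \;=\; \sum_{x_{j+1}} \underbrace{P(x_{j+1} \mid x_j)}_{\text{CPT entry}} \; \underbrace{P(x_n \mid x_{j+1})}_{\text{recursive backward term}}
\]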

23 A special case: linear chain networks
Chain X1 → … → Xj → … → Xn, with “forward” and “backward” parts as before. Instead of recursion:
iteratively compute P(Xj|x1) from P(Xj-1|x1) – the forward probabilities;
iteratively compute P(xn|Xj) from P(xn|Xj+1) – the backward probabilities.
We can view the forward computations as passing a “message” forward, and vice versa.
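A minimal sketch of these iterative computations for a chain of binary variables (names and CPT values are illustrative, not the slides'): cpts is a list of per-edge tables, with cpts[k][a][b] playing the role of P(next = b | current = a).

def forward(cpts, x1):
    """fwd[j][b] = P(X_{j+1} = b | X_1 = x1), filled left to right (0-based j)."""
    fwd = [{x1: 1.0, 1 - x1: 0.0}]
    for cpt in cpts:
        prev = fwd[-1]
        fwd.append({b: sum(prev[a] * cpt[a][b] for a in (0, 1)) for b in (0, 1)})
    return fwd

def backward(cpts, xn):
    """bwd[j][a] = P(X_n = xn | X_{j+1} = a), filled right to left (0-based j)."""
    bwd = [{0: float(xn == 0), 1: float(xn == 1)}]
    for cpt in reversed(cpts):
        nxt = bwd[0]
        bwd.insert(0, {a: sum(cpt[a][b] * nxt[b] for b in (0, 1)) for a in (0, 1)})
    return bwd

cpt = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # one table reused on every edge
print(forward([cpt, cpt, cpt], x1=1))    # forward probabilities P(X_j | X_1 = 1)
print(backward([cpt, cpt, cpt], xn=0))   # backward probabilities P(X_4 = 0 | X_j)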

24 Linear-chain message passing
How long is this line? Person j in the line asks: “How many ahead of me?” (j-1) and “How many behind me?” (n-j). [Diagram: X1 … Xj … Xn with evidence E+ on one side and E- on the other.]

25 Linear-chain message passing
P(Xj|E) ∝ P(Xj|E+) · P(E-|Xj) … true by d-separation. [Diagram: X1 … Xj … Xn with evidence E+ on one side and E- on the other.]
Pass forward: P(Xj|E+), computed from P(Xj-1|E+) and the CPT for Xj.
Pass backward: P(E-|Xj), computed from P(E-|Xj+1) and the CPT for Xj+1.

26 Inference in Bayes nets
General problem: given evidence E1,…,Ek, compute P(X|E1,…,Ek) for any X.
Big assumption: the graph is a “polytree” – at most one undirected path between any pair of nodes X, Y.
Notation: [polytree diagram shown on slide, with U1, U2 the parents of X; Y1, Y2 the children of X; and Z1, Z2 the children's other parents].

27 Inference in Bayes nets: P(X|E)

28 Inference in Bayes nets: P(X|E+)
Steps (equations shown on slide): d-separation – write as a product; d-separation.

29 Inference in Bayes nets: P(X|E+)
Steps (equations shown on slide): d-separation – write as a product; d-separation; CPT table lookup; recursive call to P(·|E+). So far: a simple way of propagating “belief due to causal evidence” up the tree.
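The equations here are images; a standard form of this causal-support recursion (following Pearl's polytree algorithm, with the exact evidence subset for each parent elided and written loosely as E+ over U_i) is:

\[
P(x \mid E^{+}) \;=\; \sum_{u_1, \dots, u_m} P(x \mid u_1, \dots, u_m) \; \prod_{i=1}^{m} P(u_i \mid E^{+}_{U_i})
\]

i.e., a CPT table lookup combined with a recursive call to P(· | E+) for each parent U_i.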

30 Inference in Bayes nets: P(E-|X)
recursion

31 Inference in Bayes nets: P(E-|X)
recursion

32 Inference in Bayes nets: P(E-|X)

33 Inference in Bayes nets: P(E-|X)
Steps (equations shown on slide): recursive call to P(·|E); CPT; recursive call to P(E-|·); where [definition shown on slide].

34 More on Message Passing
We reduced P(X|E) to the product of two recursively calculated parts:
P(X=x|E+), i.e., the CPT for X combined with the product of “forward” messages from X's parents;
P(E-|X=x), i.e., a combination of “backward” messages from X's children, CPTs, and P(Z|E_{Z\Yk}), a simpler instance of P(X|E).
This can also be implemented by message-passing (belief propagation).

35 Learning for Bayes nets
Input: a sample of the joint; the graph structure of the variables (for i=1,…,N, you know Xi and parents(Xi)).
Output: estimated CPTs.
Method (discrete variables): estimate each CPT independently, using an MLE or MAP estimate. [Example network and CPT tables shown on slide.]

36 Learning for Bayes nets
Method (discrete variables): estimate each CPT independently, using an MLE or MAP estimate. MLE: [count-and-normalize formula shown on slide]. [Example network and CPT tables shown on slide.]
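A minimal sketch of the MLE step (the data format and names are illustrative: each row is a dict mapping variable name to value):

from collections import Counter

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) by counting and normalizing."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    return {(u, c): n / parent_counts[u] for (u, c), n in joint.items()}

data = [{"A": 1, "B": 2, "C": 2}, {"A": 1, "B": 2, "C": 3}, {"A": 1, "B": 1, "C": 2}]
print(mle_cpt(data, child="C", parents=["A", "B"]))
# {((1, 2), 2): 0.5, ((1, 2), 3): 0.5, ((1, 1), 2): 1.0}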

37 MAP estimates
The beta distribution: “pseudo-data” – like hallucinating a few heads and a few tails. [Formula shown on slide.]
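The beta formula on this slide is an image; the usual pseudo-count form it refers to, for a coin with a few hallucinated heads (alpha_H) and tails (alpha_T), is (this is the smoothed/posterior-mean form; the strict MAP subtracts 1 from each pseudo-count):

\[
\hat{\theta} \;=\; \frac{\#\text{heads} + \alpha_H}{\#\text{heads} + \#\text{tails} + \alpha_H + \alpha_T}
\]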

38 MAP estimates
The Dirichlet distribution: “pseudo-data” – like hallucinating αi examples of X=i, for each value of i. [Formula shown on slide.]
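Similarly for the Dirichlet (a reconstruction of the missing formula, in the same pseudo-count form), hallucinating α_i examples of X = i:

\[
\hat{P}(X = i) \;=\; \frac{n_i + \alpha_i}{\sum_k \left( n_k + \alpha_k \right)}
\]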

39 Learning for Bayes nets
Method (discrete variables): estimate each CPT independently, using an MLE or MAP estimate. MAP: [pseudo-count formula shown on slide]. [Example network and CPT tables shown on slide.]
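The MAP version of the counting sketch above just adds pseudo-counts before normalizing (again illustrative names; alpha plays the role of one hallucinated example per child value):

from collections import Counter

def map_cpt(data, child, parents, child_values, alpha=1.0):
    """Like mle_cpt, but with alpha pseudo-examples of each child value."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    configs = {tuple(r[p] for p in parents) for r in data}
    cpt = {}
    for u in configs:
        total = sum(joint[(u, v)] for v in child_values) + alpha * len(child_values)
        for c in child_values:
            cpt[(u, c)] = (joint[(u, c)] + alpha) / total
    return cpt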

40 Additional reading
A Tutorial on Learning With Bayesian Networks, Heckerman. ftp://ftp.research.microsoft.com/pub/tr/tr pdf

