
1 Directed Graphical Probabilistic Models: the sequel
William W. Cohen, Machine Learning, Feb 2008

2 Summary of Monday(1): Bayes nets
Many problems can be solved using the joint probability P(X1,…,Xn). Bayes nets describe a way to compactly write the joint. Running example: A = first guess, B = the money, C = the goat, D = stick or swap?, E = second guess, with CPTs P(A), P(B), P(C|A,B), P(E|A,C,D). Conditional independence: [statement, network diagram, and CPT tables shown on slide].
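A minimal sketch of "compactly writing the joint" (structure and numbers are illustrative, and only a few CPT rows are filled in, not the slides' example in full): each node stores a CPT given its parents, and any joint entry is the product of one CPT entry per node.

parents = {"A": [], "B": [], "C": ["A", "B"]}
cpts = {
    "A": {((), 1): 1/3, ((), 2): 1/3, ((), 3): 1/3},
    "B": {((), 1): 1/3, ((), 2): 1/3, ((), 3): 1/3},
    "C": {((1, 1), 2): 0.5, ((1, 1), 3): 0.5, ((1, 2), 3): 1.0},  # remaining rows omitted
}

def joint_prob(assignment):
    """P(assignment) = product over nodes X of P(X = x | parents(X) = u)."""
    p = 1.0
    for node, cpt in cpts.items():
        u = tuple(assignment[q] for q in parents[node])
        p *= cpt.get((u, assignment[node]), 0.0)
    return p

print(joint_prob({"A": 1, "B": 2, "C": 3}))  # (1/3) * (1/3) * 1.0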

3 Aside: Conditional Independence?
(Chain rule – always true.) (Fancier version of the chain rule.) (Definition of conditional independence.) Caveat divisor: we'll usually assume no probabilities are zero, so division is safe. [Equations shown on slide.]
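The equations on this slide are images in the transcript; one standard form of the derivation the parenthetical notes refer to (a reconstruction, not necessarily the slide's exact notation) is:

\[
\begin{aligned}
P(X, Y \mid E) &= P(X \mid E)\, P(Y \mid X, E) && \text{(chain rule, always true)} \\
P(X, Y \mid E) &= P(X \mid E)\, P(Y \mid E)    && \text{(definition of } I\langle X, E, Y\rangle \text{)} \\
\Rightarrow \; P(Y \mid X, E) &= P(Y \mid E)   && \text{(divide; safe when no probabilities are zero)}
\end{aligned}
\]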

4 Summary of Monday(2): d-separation
There are three ways a path from X to Y given evidence E can be blocked (at some node Z along the path). X is d-separated from Y given E iff all paths from X to Y given E are blocked. If X is d-separated from Y given E, then I<X,E,Y>. [Diagram of the three blocking configurations shown on slide.]
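A minimal sketch of the blocking test for a single path, under the standard three cases (an assumed helper, not code from the lecture; anc_of_evidence is assumed to be the precomputed set of nodes with an observed descendant):

def path_blocked(path, edges, evidence, anc_of_evidence):
    """True if the undirected path (a list of nodes) is blocked given the evidence set."""
    for i in range(1, len(path) - 1):
        a, z, b = path[i - 1], path[i], path[i + 1]
        converging = (a, z) in edges and (b, z) in edges   # a -> z <- b
        if converging:
            # head-to-head: blocked unless z or one of its descendants is observed
            if z not in evidence and z not in anc_of_evidence:
                return True
        else:
            # serial (a -> z -> b) or diverging (a <- z -> b): blocked iff z is observed
            if z in evidence:
                return True
    return False

# X is d-separated from Y given E iff path_blocked(...) holds for every undirected path from X to Y.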

5 d-separation continued…
Chain X → E → Y, with E unobserved. Question: is X independent of Y? It depends… on the CPTs. This is why d-separation => independence, but not the converse. [CPTs P(X), P(E|X), and two alternative P(Y|E) tables shown on slide.]
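A numeric version of "it depends on the CPTs" (the values echo the numbers visible in the transcript but are otherwise illustrative): in the chain X → E → Y with E unobserved, X and Y are not d-separated, yet they may or may not be independent.

p_e1_given_x = {0: 0.01, 1: 0.99}               # P(E=1 | X=x)

def p_y1_given_x(x, p_y1_given_e):
    """P(Y=1 | X=x) = sum over e of P(Y=1 | e) P(e | x)."""
    pe1 = p_e1_given_x[x]
    return p_y1_given_e[1] * pe1 + p_y1_given_e[0] * (1 - pe1)

flat  = {0: 0.5, 1: 0.5}     # Y ignores E
sharp = {0: 0.01, 1: 0.99}   # Y tracks E
print(p_y1_given_x(0, flat),  p_y1_given_x(1, flat))    # equal  -> X, Y independent
print(p_y1_given_x(0, sharp), p_y1_given_x(1, sharp))   # differ -> X, Y dependent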

6 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? Yes!

7 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? Yes! Derivation: Bayes rule; a fancier version of Bayes rule; the result from the previous slide. [Equations shown on slide.]

8 An aside: computations with Bayes Nets
Nodes X, E, Y. Question: what is [the query shown on slide]? Main point: inference has no preferred direction in a Bayes net.
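The query on this slide is an image; as a hedged illustration of the "no preferred direction" point, Bayes rule lets evidence flow against the direction of an edge X → E:

\[
P(X \mid E) \;=\; \frac{P(E \mid X)\, P(X)}{P(E)}
\;=\; \frac{P(E \mid X)\, P(X)}{\sum_{x'} P(E \mid x')\, P(x')}
\]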

9 d-separation [Diagram: the three blocking configurations for a path from X to Y given evidence E, each involving a node Z.]

10 d-separation [Diagram repeated, with question marks highlighting particular configurations.]

11 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? Yes!

12 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? No. [CPT P(E) shown on slide.]

13 d-separation [Diagram repeated, with question marks highlighting particular configurations.]

14 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? Yes!

15 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? [Tables of P(E|X,Y) and P(E,X,Y) shown on slide.]

16 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? No! [Tables of P(E|X,Y) and P(E,X,Y) shown on slide.]

17 d-separation continued…
Nodes X, E, Y. Question: [independence query shown on slide]? No! [Tables of P(E|X,Y) and P(E,X,Y) shown on slide.]

18 “Explaining away”
[Diagram: X and Y are both causes of E.] This is “explaining away”:
E is a common symptom of two causes, X and Y.
After observing E=1, both X and Y become more probable.
After observing E=1 and X=1, Y becomes less probable, since X alone is enough to “explain” E.
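A small numeric check of the explaining-away pattern (the CPT values here are hypothetical, not the slide's): X and Y are independent causes of the common symptom E, and we enumerate the eight joint entries directly.

from itertools import product

p_x1, p_y1 = 0.1, 0.1
p_e1 = {(0, 0): 0.01, (1, 0): 0.90, (0, 1): 0.90, (1, 1): 0.99}   # P(E=1 | X, Y)

def joint(x, y, e):
    """P(X=x, Y=y, E=e) = P(x) P(y) P(e | x, y)."""
    px = p_x1 if x else 1.0 - p_x1
    py = p_y1 if y else 1.0 - p_y1
    pe = p_e1[x, y] if e else 1.0 - p_e1[x, y]
    return px * py * pe

def p_y1_given(keep):
    """P(Y=1 | condition) by brute-force enumeration of the joint."""
    rows = [v for v in product((0, 1), repeat=3) if keep(*v)]
    return sum(joint(*v) for v in rows if v[1] == 1) / sum(joint(*v) for v in rows)

print(p_y1_given(lambda x, y, e: True))                # P(Y=1)            = 0.10
print(p_y1_given(lambda x, y, e: e == 1))              # P(Y=1 | E=1)      rises
print(p_y1_given(lambda x, y, e: e == 1 and x == 1))   # P(Y=1 | E=1, X=1) falls back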

19 “Explaining away” and common-sense
Historical note: Classical logic is monotonic: the more you know, the more you can deduce. “Common-sense” reasoning is not monotonic: birds fly, but not after being cooked for 20 min/lb at 350°F. This led to numerous “non-monotonic logics” for AI. This example shows that Bayes nets are not monotonic: if P(Y|E) is “your belief” in Y after observing E, and P(Y|X,E) is “your belief” in Y after observing E and X, then your belief in Y can decrease after you discover X.

20 A special case: linear chain networks
Chain X1 → … → Xj → … → Xn. By d-separation, the chain splits at Xj into a “forward” part and a “backward” part. [Diagram shown on slide.]

21 A special case: linear chain networks
Chain X1 → … → Xj → … → Xn. Forward: [recursion shown on slide] – a recursive forward term times a CPT entry.
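The forward recursion on this slide is an image; its standard form for a chain (a reconstruction, consistent with the iterative version on slide 23) is:

\[
P(x_j \mid x_1) \;=\; \sum_{x_{j-1}} \underbrace{P(x_j \mid x_{j-1})}_{\text{CPT entry}} \; \underbrace{P(x_{j-1} \mid x_1)}_{\text{recursive forward term}}
\]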

22 A special case: linear chain networks
Chain X1 → … → Xj → … → Xn. Backward: [recursion shown on slide] – chain rule, a CPT entry, and a recursive backward term.
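Likewise, a standard form of the backward recursion (again a reconstruction of the equation image):

\[
P(x_n \mid x_j) \;=\; \sum_{x_{j+1}} \underbrace{P(x_{j+1} \mid x_j)}_{\text{CPT entry}} \; \underbrace{P(x_n \mid x_{j+1})}_{\text{recursive backward term}}
\]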

23 A special case: linear chain networks
Chain X1 → … → Xj → … → Xn, with “forward” and “backward” parts as before. Instead of recursion:
iteratively compute P(Xj|x1) from P(Xj-1|x1) – the forward probabilities;
iteratively compute P(xn|Xj) from P(xn|Xj+1) – the backward probabilities.
We can view the forward computations as passing a “message” forward, and vice versa.
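A minimal sketch of these iterative computations for a chain of binary variables (names and CPT values are illustrative, not the slides'): cpts is a list of per-edge tables, with cpts[k][a][b] playing the role of P(next = b | current = a).

def forward(cpts, x1):
    """fwd[j][b] = P(X_{j+1} = b | X_1 = x1), filled left to right (0-based j)."""
    fwd = [{x1: 1.0, 1 - x1: 0.0}]
    for cpt in cpts:
        prev = fwd[-1]
        fwd.append({b: sum(prev[a] * cpt[a][b] for a in (0, 1)) for b in (0, 1)})
    return fwd

def backward(cpts, xn):
    """bwd[j][a] = P(X_n = xn | X_{j+1} = a), filled right to left (0-based j)."""
    bwd = [{0: float(xn == 0), 1: float(xn == 1)}]
    for cpt in reversed(cpts):
        nxt = bwd[0]
        bwd.insert(0, {a: sum(cpt[a][b] * nxt[b] for b in (0, 1)) for a in (0, 1)})
    return bwd

cpt = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # one table reused on every edge
print(forward([cpt, cpt, cpt], x1=1))    # forward probabilities P(X_j | X_1 = 1)
print(backward([cpt, cpt, cpt], xn=0))   # backward probabilities P(X_4 = 0 | X_j)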

24 Linear-chain message passing
How long is this line? Person j in the line asks: “How many ahead of me?” (j-1) and “How many behind me?” (n-j). [Diagram: X1 … Xj … Xn with evidence E+ on one side and E- on the other.]

25 Linear-chain message passing
P(Xj|E) ∝ P(Xj|E+) · P(E-|Xj) … true by d-separation. [Diagram: X1 … Xj … Xn with evidence E+ on one side and E- on the other.]
Pass forward: P(Xj|E+), computed from P(Xj-1|E+) and the CPT for Xj.
Pass backward: P(E-|Xj), computed from P(E-|Xj+1) and the CPT for Xj+1.

26 Inference in Bayes nets
General problem: given evidence E1,…,Ek, compute P(X|E1,…,Ek) for any X.
Big assumption: the graph is a “polytree” – at most one undirected path between any pair of nodes X, Y.
Notation: [polytree diagram shown on slide, with U1, U2 the parents of X; Y1, Y2 the children of X; and Z1, Z2 the children's other parents].

27 Inference in Bayes nets: P(X|E)

28 Inference in Bayes nets: P(X|E+)
Steps (equations shown on slide): d-separation – write as a product; d-separation.

29 Inference in Bayes nets: P(X|E+)
Steps (equations shown on slide): d-separation – write as a product; d-separation; CPT table lookup; recursive call to P(·|E+). So far: a simple way of propagating “belief due to causal evidence” up the tree.
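The equations here are images; a standard form of this causal-support recursion (following Pearl's polytree algorithm, with the exact evidence subset for each parent elided and written loosely as E+ over U_i) is:

\[
P(x \mid E^{+}) \;=\; \sum_{u_1, \dots, u_m} P(x \mid u_1, \dots, u_m) \; \prod_{i=1}^{m} P(u_i \mid E^{+}_{U_i})
\]

i.e., a CPT table lookup combined with a recursive call to P(· | E+) for each parent U_i.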

30 Inference in Bayes nets: P(E-|X)
recursion

31 Inference in Bayes nets: P(E-|X)
recursion

32 Inference in Bayes nets: P(E-|X)

33 Inference in Bayes nets: P(E-|X)
Steps (equations shown on slide): recursive call to P(·|E); CPT; recursive call to P(E-|·); where [definition shown on slide].

34 More on Message Passing
We reduced P(X|E) to the product of two recursively calculated parts:
P(X=x|E+), i.e., the CPT for X combined with the product of “forward” messages from X's parents;
P(E-|X=x), i.e., a combination of “backward” messages from X's children, CPTs, and P(Z|E_{Z\Yk}), a simpler instance of P(X|E).
This can also be implemented by message-passing (belief propagation).

35 Learning for Bayes nets
Input: a sample of the joint; the graph structure of the variables (for i=1,…,N, you know Xi and parents(Xi)).
Output: estimated CPTs.
Method (discrete variables): estimate each CPT independently, using an MLE or MAP estimate. [Example network and CPT tables shown on slide.]

36 Learning for Bayes nets
Method (discrete variables): estimate each CPT independently, using an MLE or MAP estimate. MLE: [count-and-normalize formula shown on slide]. [Example network and CPT tables shown on slide.]
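A minimal sketch of the MLE step (the data format and names are illustrative: each row is a dict mapping variable name to value):

from collections import Counter

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) by counting and normalizing."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    return {(u, c): n / parent_counts[u] for (u, c), n in joint.items()}

data = [{"A": 1, "B": 2, "C": 2}, {"A": 1, "B": 2, "C": 3}, {"A": 1, "B": 1, "C": 2}]
print(mle_cpt(data, child="C", parents=["A", "B"]))
# {((1, 2), 2): 0.5, ((1, 2), 3): 0.5, ((1, 1), 2): 1.0}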

37 MAP estimates
The beta distribution: “pseudo-data” – like hallucinating a few heads and a few tails. [Formula shown on slide.]
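The beta formula on this slide is an image; the usual pseudo-count form it refers to, for a coin with a few hallucinated heads (alpha_H) and tails (alpha_T), is (this is the smoothed/posterior-mean form; the strict MAP subtracts 1 from each pseudo-count):

\[
\hat{\theta} \;=\; \frac{\#\text{heads} + \alpha_H}{\#\text{heads} + \#\text{tails} + \alpha_H + \alpha_T}
\]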

38 MAP estimates
The Dirichlet distribution: “pseudo-data” – like hallucinating αi examples of X=i, for each value of i. [Formula shown on slide.]
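Similarly for the Dirichlet (a reconstruction of the missing formula, in the same pseudo-count form), hallucinating α_i examples of X = i:

\[
\hat{P}(X = i) \;=\; \frac{n_i + \alpha_i}{\sum_k \left( n_k + \alpha_k \right)}
\]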

39 Learning for Bayes nets
Method (discrete variables): estimate each CPT independently, using an MLE or MAP estimate. MAP: [pseudo-count formula shown on slide]. [Example network and CPT tables shown on slide.]
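The MAP version of the counting sketch above just adds pseudo-counts before normalizing (again illustrative names; alpha plays the role of one hallucinated example per child value):

from collections import Counter

def map_cpt(data, child, parents, child_values, alpha=1.0):
    """Like mle_cpt, but with alpha pseudo-examples of each child value."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    configs = {tuple(r[p] for p in parents) for r in data}
    cpt = {}
    for u in configs:
        total = sum(joint[(u, v)] for v in child_values) + alpha * len(child_values)
        for c in child_values:
            cpt[(u, c)] = (joint[(u, c)] + alpha) / total
    return cpt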

40 Additional reading
A Tutorial on Learning With Bayesian Networks, Heckerman. ftp://ftp.research.microsoft.com/pub/tr/tr pdf

