
1 Inference in Bayesian Networks

2 Agenda
- Reading off independence assumptions
- Efficient inference in Bayesian networks
- Top-down inference
- Variable elimination
- Monte Carlo methods

3 Some Applications of BNs
- Medical diagnosis
- Troubleshooting of hardware/software systems
- Fraud/uncollectible debt detection
- Data mining
- Analysis of genetic sequences
- Data interpretation, computer vision, image understanding

4 More Complicated Singly Connected Belief Net
Nodes: Battery, Radio, SparkPlugs, Gas, Starts, Moves

5 [Image-labeling example: regions R1-R4 related by "Above"; each Region takes a value in {Sky, Tree, Grass, Rock}]

6 BN to evaluate insurance risks

7 BN from Last Lecture
Burglary, Earthquake → Alarm → JohnCalls, MaryCalls (causes at the top, effects below)
A directed acyclic graph. Intuitive meaning of an arc from x to y: "x has direct influence on y".

8 Arcs Do Not Necessarily Encode Causality!
A → B → C and C → B → A: two BNs that can encode the same joint probability distribution.

9 Reading Off Independence Relationships
Network: A → B → C
Given B, does the value of A affect the probability of C? Is P(C|B,A) = P(C|B)?
No, it does not: C's parent (B) is given, so C is independent of its non-descendants (here, A).
Independence is symmetric: C ⊥ A | B ⇒ A ⊥ C | B

10 What Does the BN Encode?
Burglary ⊥ Earthquake
JohnCalls ⊥ MaryCalls | Alarm
JohnCalls ⊥ Burglary | Alarm
JohnCalls ⊥ Earthquake | Alarm
MaryCalls ⊥ Burglary | Alarm
MaryCalls ⊥ Earthquake | Alarm
A node is independent of its non-descendants, given its parents.

11 Reading Off Independence Relationships
How about Burglary ⊥ Earthquake | Alarm? No! Why?

12 Reading Off Independence Relationships
How about Burglary ⊥ Earthquake | Alarm? No! Why?
P(B ∧ E | A) = P(A|B,E) P(B ∧ E) / P(A) ≈ 0.00075
P(B|A) P(E|A) ≈ 0.086
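A quick numeric check of this "explaining away" effect, sketched in Python with the CPT values from the alarm network:

```python
# Verify that Burglary and Earthquake are dependent given Alarm:
# P(B and E | A=1) differs from P(B|A=1) * P(E|A=1).
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|B,E)

def p(b, e):
    """Joint probability P(B=b, E=e, A=1)."""
    return (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E) * P_A[(b, e)]

p_alarm = sum(p(b, e) for b in (0, 1) for e in (0, 1))   # P(A=1), ~0.00252
p_be_given_a = p(1, 1) / p_alarm                          # ~0.00075
p_b_given_a = (p(1, 1) + p(1, 0)) / p_alarm
p_e_given_a = (p(1, 1) + p(0, 1)) / p_alarm
print(p_be_given_a, p_b_given_a * p_e_given_a)            # ~0.00075 vs ~0.086
```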

13 Reading Off Independence Relationships
How about Burglary ⊥ Earthquake | JohnCalls? No! Why?
Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent.

14 Independence Relationships
Rough intuition (this holds for tree-like graphs, i.e. polytrees):
- Evidence on the (directed) path between two variables makes them independent
- Evidence on an "A" node (a common ancestor) makes its descendants independent
- Evidence on a "V" node (a common descendant), or below the V, makes the node's ancestors dependent (otherwise they are independent)
Formal property in the general case: d-separation ⇒ independence (see R&N)

15 Benefits of Sparse Models
Modeling
- Fewer relationships need to be encoded (whether through expert understanding or statistics)
- Large networks can be built up from smaller ones
Intuition
- Dependencies/independencies between variables can be read off the network structure
Tractable inference

16 Top-Down Inference

Burglary, Earthquake → Alarm → JohnCalls, MaryCalls

P(B) = 0.001    P(E) = 0.002

B E | P(A|B,E)      A | P(J|A)      A | P(M|A)
T T | 0.95          T | 0.90        T | 0.70
T F | 0.94          F | 0.05        F | 0.01
F T | 0.29
F F | 0.001

Suppose we want to compute P(Alarm)

17 Top-Down Inference (network and CPTs as on slide 16)
Suppose we want to compute P(Alarm)
1. P(Alarm) = Σ_{b,e} P(A,b,e)
2. P(Alarm) = Σ_{b,e} P(A|b,e) P(b) P(e)

18 Top-Down Inference (network and CPTs as on slide 16)
Suppose we want to compute P(Alarm)
1. P(Alarm) = Σ_{b,e} P(A,b,e)
2. P(Alarm) = Σ_{b,e} P(A|b,e) P(b) P(e)
3. P(Alarm) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)

19 Top-Down Inference (network and CPTs as on slide 16)
Suppose we want to compute P(Alarm)
1. P(A) = Σ_{b,e} P(A,b,e)
2. P(A) = Σ_{b,e} P(A|b,e) P(b) P(e)
3. P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)
4. P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998 = 0.00252

20 Top-Down Inference (network and CPTs as on slide 16)
Now, suppose we want to compute P(MaryCalls)

21 Top-Down Inference (network and CPTs as on slide 16)
Now, suppose we want to compute P(MaryCalls)
1. P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)

22 Top-Down Inference (network and CPTs as on slide 16)
Now, suppose we want to compute P(MaryCalls)
1. P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
2. P(M) = 0.70*0.00252 + 0.01*(1-0.00252) = 0.0117

23 Top-Down Inference with Evidence (network and CPTs as on slide 16)
Suppose we want to compute P(Alarm|Earthquake)

24 Top-Down Inference with Evidence (network and CPTs as on slide 16)
Suppose we want to compute P(A|e)
1. P(A|e) = Σ_b P(A,b|e)
2. P(A|e) = Σ_b P(A|b,e) P(b)

25 Top-Down Inference with Evidence (network and CPTs as on slide 16)
Suppose we want to compute P(A|e)
1. P(A|e) = Σ_b P(A,b|e)
2. P(A|e) = Σ_b P(A|b,e) P(b)
3. P(A|e) = 0.95*0.001 + 0.29*0.999 = 0.29066

26 Top-Down Inference
- Only works if the graph of ancestors of the query variable is a polytree
- Evidence must be given on ancestor(s) of the query variable
- Efficient: O(d·2^k) time, where d is the number of ancestors of the variable and k is a bound on the number of parents
- Evidence on an ancestor cuts off the influence of the portion of the graph above the evidence node
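The three computations above, P(Alarm), P(MaryCalls), and P(Alarm|Earthquake), can be sketched in Python with the CPT values from the slides:

```python
# Top-down inference in the alarm network.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|B,E)
P_M = {1: 0.70, 0: 0.01}                                          # P(M=1|A)

def prior(p, v):
    """P(X=v) for a root node with P(X=1) = p."""
    return p if v else 1 - p

# P(A=1) = sum_{b,e} P(A=1|b,e) P(b) P(e)
p_alarm = sum(P_A[(b, e)] * prior(P_B, b) * prior(P_E, e)
              for b in (0, 1) for e in (0, 1))

# P(M=1) = P(M=1|A=1) P(A=1) + P(M=1|A=0) P(A=0)
p_mary = P_M[1] * p_alarm + P_M[0] * (1 - p_alarm)

# Evidence Earthquake=1: P(A=1|E=1) = sum_b P(A=1|b,E=1) P(b)
p_alarm_given_e = sum(P_A[(b, 1)] * prior(P_B, b) for b in (0, 1))

print(round(p_alarm, 5), round(p_mary, 4), round(p_alarm_given_e, 5))
# → 0.00252 0.0117 0.29066
```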

27 Querying the BN
Network: Cavity → Toothache, with P(C) = 0.1, P(T|C) = 0.4, P(T|¬C) = 0.01111
The BN gives P(T|C). What about P(C|T)?

28 Bayes' Rule
P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
So… P(A|B) = P(B|A) P(A) / P(B)

29 Applying Bayes' Rule
Let A be a cause and B an effect, and suppose we know P(B|A) and P(A) (the conditional probability tables). What is P(B)?

30 Applying Bayes' Rule
Let A be a cause and B an effect, and suppose we know P(B|A) and P(A) (the conditional probability tables). What is P(B)?
P(B) = Σ_a P(B, A=a)  [marginalization]
P(B, A=a) = P(B|A=a) P(A=a)  [conditional probability]
So, P(B) = Σ_a P(B|A=a) P(A=a)

31 Applying Bayes' Rule
Let A be a cause and B an effect, and suppose we know P(B|A) and P(A) (the conditional probability tables). What is P(A|B)?

32 Applying Bayes' Rule
Let A be a cause and B an effect, and suppose we know P(B|A) and P(A) (the conditional probability tables). What is P(A|B)?
P(A|B) = P(B|A) P(A) / P(B)  [Bayes' rule]
P(B) = Σ_a P(B|A=a) P(A=a)  [last slide]
So, P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]

33 How Do We Read This?
P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]
[An equation that holds for all values A can take on, and all values B can take on]
P(A=a|B=b) =

34 How Do We Read This?
P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]
[An equation that holds for all values A can take on, and all values B can take on]
P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_a P(B=b|A=a) P(A=a)]
Are these the same a?

35 How Do We Read This?
P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]
[An equation that holds for all values A can take on, and all values B can take on]
P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_a P(B=b|A=a) P(A=a)]
Are these the same a? NO!

36 How Do We Read This?
P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]
[An equation that holds for all values A can take on, and all values B can take on]
P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_{a'} P(B=b|A=a') P(A=a')]
Be careful about indices!
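The index warning is easy to get right in code, where the summation variable is necessarily distinct from the query value. A minimal sketch; the cause/effect names and numbers below are illustrative assumptions, not from the slides:

```python
def posterior(prior, likelihood, b):
    """Bayes' rule: P(A=a|B=b) = P(B=b|A=a) P(A=a) / sum_{a'} P(B=b|A=a') P(A=a').

    prior: {a: P(A=a)}; likelihood: {(b, a): P(B=b|A=a)}.
    """
    # The denominator sums over its own index a2 (the slide's a'),
    # not over the query value a.
    p_b = sum(likelihood[(b, a2)] * prior[a2] for a2 in prior)
    return {a: likelihood[(b, a)] * prior[a] / p_b for a in prior}

# Hypothetical two-valued cause/effect example:
prior = {"cause": 0.1, "no_cause": 0.9}
likelihood = {("effect", "cause"): 0.8, ("effect", "no_cause"): 0.2}
print(posterior(prior, likelihood, "effect"))
```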

37 Querying the BN
Network: Cavity → Toothache, with P(C) = 0.1, P(T|C) = 0.4, P(T|¬C) = 0.01111
The BN gives P(T|C). What about P(C|T)?
P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache)  [Bayes' rule]
The denominator is computed by summing the numerator over Cavity and ¬Cavity.
Querying a BN is just applying Bayes' rule on a larger scale…
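Carrying this computation out with the slide's numbers (a quick Python check):

```python
p_c = 0.1         # P(Cavity)
p_t_c = 0.4       # P(Toothache | Cavity)
p_t_nc = 0.01111  # P(Toothache | no Cavity)

p_t = p_t_c * p_c + p_t_nc * (1 - p_c)   # sum out Cavity: P(Toothache) ~ 0.05
p_c_t = p_t_c * p_c / p_t                # Bayes' rule: P(Cavity | Toothache)
print(round(p_t, 3), round(p_c_t, 3))    # → 0.05 0.8
```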

38 Performing Inference
Variables X; evidence set E = e; query variable Q.
We want to compute the posterior probability distribution over Q, given E = e.
Let the non-evidence variables be Y (= X \ E).
Straightforward method:
1. Compute the joint P(Y, E = e)
2. Marginalize to get P(Q, E = e)
3. Divide by P(E = e) to get P(Q | E = e)

39 Inference in the Alarm Example (network and CPTs as on slide 16)
P(J|M) = ?? Query Q = JohnCalls; evidence E = e is MaryCalls = true.

40 Inference in the Alarm Example (network and CPTs as on slide 16)
P(J|MaryCalls) = ??
1. P(J, A, B, E, MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
Using P(x1 ∧ x2 ∧ … ∧ xn) = Π_{i=1..n} P(xi | parents(Xi)); the full joint distribution table has 2^4 entries.

41 Inference in the Alarm Example (network and CPTs as on slide 16)
P(J|MaryCalls) = ??
1. P(J, A, B, E, MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
2. P(J, MaryCalls) = Σ_{a,b,e} P(J, A=a, B=b, E=e, MaryCalls)
2 entries: one for JohnCalls, the other for ¬JohnCalls.

42 Inference in the Alarm Example (network and CPTs as on slide 16)
P(J|MaryCalls) = ??
1. P(J, A, B, E, MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
2. P(J, MaryCalls) = Σ_{a,b,e} P(J, A=a, B=b, E=e, MaryCalls)
3. P(J|MaryCalls) = P(J, MaryCalls) / P(MaryCalls) = P(J, MaryCalls) / (Σ_j P(j, MaryCalls))
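Steps 1-3 can be sketched as inference by enumeration over the full joint, with the CPT values from the slides:

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|B,E)
P_J = {1: 0.90, 0: 0.05}   # P(J=1|A)
P_M = {1: 0.70, 0: 0.01}   # P(M=1|A)

def bern(p, v):
    return p if v else 1 - p

def joint(j, m, a, b, e):
    """Step 1: P(j,m,a,b,e) = P(j|a) P(m|a) P(a|b,e) P(b) P(e)."""
    return (bern(P_J[a], j) * bern(P_M[a], m) * bern(P_A[(b, e)], a)
            * bern(P_B, b) * bern(P_E, e))

def p_jm(j, m):
    """Step 2: sum out a, b, e."""
    return sum(joint(j, m, a, b, e) for a, b, e in product((0, 1), repeat=3))

# Step 3: normalize by P(MaryCalls=1).
p_j_given_m = p_jm(1, 1) / (p_jm(1, 1) + p_jm(0, 1))
print(round(p_j_given_m, 3))   # → 0.178
```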

43 How Expensive?
P(X) = P(x1 ∧ x2 ∧ … ∧ xn) = Π_{i=1..n} P(xi | parents(Xi))
Straightforward method:
1. Use the above to compute P(Y, E = e)
2. P(Q, E = e) = Σ_{y1} … Σ_{yk} P(Y, E = e)
3. P(E = e) = Σ_q P(Q, E = e)
Step 1 produces O(2^(n-|E|)) entries!
The normalization factor is no big deal once we have P(Q, E = e).
Can we do better?

44 Variable Elimination
Consider the linear network X1 → X2 → X3
P(X) = P(X1) P(X2|X1) P(X3|X2)
P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)

45 Variable Elimination
Consider the linear network X1 → X2 → X3
P(X) = P(X1) P(X2|X1) P(X3|X2)
P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
      = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1)   (rearrange the equation…)

46 Variable Elimination
Consider the linear network X1 → X2 → X3
P(X) = P(X1) P(X2|X1) P(X3|X2)
P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
      = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1)
      = Σ_{x2} P(X3|x2) P(x2)
The inner sum is computed once for each value of X2; cache P(x2) and reuse it for both values of X3!

47 Variable Elimination
Consider the linear network X1 → X2 → X3
P(X) = P(X1) P(X2|X1) P(X3|X2)
P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
      = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1)
      = Σ_{x2} P(X3|x2) P(x2)   (computed once for each value of X2)
How many multiplications and additions are saved?
×: 2·4·2 = 16 vs. 4+4 = 8
+: 2·3 = 6 vs. 2+2 = 4
This can lead to huge gains in larger networks.
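The caching idea can be sketched on the chain with explicit factors. The CPT values below are made-up illustrations (the slides leave them unspecified):

```python
# Variable elimination on X1 -> X2 -> X3 with assumed, illustrative CPTs.
p_x1 = {0: 0.6, 1: 0.4}                    # P(X1)
p_x2 = {(0, 0): 0.7, (0, 1): 0.3,
        (1, 0): 0.1, (1, 1): 0.9}          # p_x2[(x1, x2)] = P(X2=x2 | X1=x1)
p_x3 = {(0, 0): 0.9, (0, 1): 0.1,
        (1, 0): 0.5, (1, 1): 0.5}          # p_x3[(x2, x3)] = P(X3=x3 | X2=x2)

# Eliminate X1 first: f(x2) = sum_{x1} P(x1) P(x2|x1), computed once per x2.
f = {x2: sum(p_x1[x1] * p_x2[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)}

# Reuse the cached factor for every value of X3: P(x3) = sum_{x2} P(x3|x2) f(x2).
p_x3_marg = {x3: sum(p_x3[(x2, x3)] * f[x2] for x2 in (0, 1)) for x3 in (0, 1)}
print(f, p_x3_marg)
```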

48 VE in the Alarm Example
P(E|j,m) = P(E,j,m) / P(j,m)
P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)

49 VE in the Alarm Example
P(E|j,m) = P(E,j,m) / P(j,m)
P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)

50 VE in the Alarm Example
P(E|j,m) = P(E,j,m) / P(j,m)
P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) P(j,m|E,b)   (computed for all values of E and b)

51 VE in the Alarm Example
P(E|j,m) = P(E,j,m) / P(j,m)
P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) P(j,m|E,b)
         = P(E) P(j,m|E)   (computed for all values of E)

52 What Order to Perform VE?
For tree-like BNs (polytrees), order the eliminations so that parents come before children.
The number of entries in each intermediate probability table is then 2^(number of parents of a node).
If the number of parents of each node is bounded, VE runs in linear time!
In other networks, intermediate factors may become large.

53 Non-Polytree Networks
Network: A → B, A → C, and B, C → D (a diamond).
P(D) = Σ_a Σ_b Σ_c P(a) P(b|a) P(c|a) P(D|b,c) = Σ_b Σ_c P(D|b,c) Σ_a P(a) P(b|a) P(c|a)
No more simplifications…

54 Approximate Inference Techniques
Based on the idea of Monte Carlo simulation.
Basic idea: to estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed.
Conditional simulation: to estimate the probability P(H) that a coin picked out of bucket B flips heads, I can:
1. Pick a coin C out of B (occurs with probability P(C))
2. Flip C and observe whether it comes up heads (occurs with probability P(H|C))
3. Put C back and repeat from step 1 many times
4. Return the fraction of heads observed (an estimate of P(H))

55 Approximate Inference: Monte Carlo Simulation (network and CPTs as on slide 16)
Sample from the joint distribution. Example sample: B=0 E=0 A=0 J=1 M=0

56 Approximate Inference: Monte Carlo Simulation
As more samples are generated, the distribution of the samples approaches the joint distribution!
Samples:
B=0 E=0 A=0 J=1 M=0
B=0 E=0 A=0 J=0 M=0
B=0 E=0 A=0 J=0 M=0
B=1 E=0 A=1 J=1 M=0

57 Approximate Inference: Monte Carlo Simulation
Inference: given evidence E = e (e.g., J=1), remove the samples that conflict with it.
The distribution of the remaining samples approximates the conditional distribution!
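This sample-then-discard scheme (rejection sampling), applied to the alarm network with the CPTs from the slides, looks roughly like:

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|B,E)
P_J = {1: 0.90, 0: 0.05}   # P(J=1|A)
P_M = {1: 0.70, 0: 0.01}   # P(M=1|A)

def flip(p):
    return 1 if random.random() < p else 0

def sample():
    """Draw (b, e, a, j, m) from the joint, parents before children."""
    b, e = flip(P_B), flip(P_E)
    a = flip(P_A[(b, e)])
    return b, e, a, flip(P_J[a]), flip(P_M[a])

random.seed(0)
kept = [s for s in (sample() for _ in range(200_000)) if s[3] == 1]  # evidence J=1
est = sum(s[0] for s in kept) / len(kept)   # estimate of P(B=1 | J=1)
print(len(kept), round(est, 4))
```

Note that only about 5% of samples survive the evidence J=1, and B=1 is itself rare, so the estimate is noisy; this previews the rare-event problem discussed on the following slides.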

58 How Many Samples?

59 Rare Event Problem
What if some events are really rare (e.g., burglary ∧ earthquake)? The number of samples must be huge to get a reasonable estimate.
Solution: likelihood weighting.
Enforce that each sample agrees with the evidence. While generating a sample, keep track of the ratio
(how likely the sampled value is to occur in the real world) / (how likely you were to generate the sampled value)

60 Likelihood Weighting (network and CPTs as on slide 16)
Suppose the evidence is Alarm and MaryCalls. Sample B and E each with P = 0.5.
Start: w = 1

61 Likelihood Weighting
Sample B=0, E=1: w = (0.999/0.5)·(0.002/0.5) ≈ 0.008

62 Likelihood Weighting
A=1 is enforced, and the weight is updated to reflect the likelihood that this occurs:
B=0 E=1 A=1: w = 0.008 × P(A=1|B=0,E=1) = 0.008 × 0.29 ≈ 0.0023

63 Likelihood Weighting
M=1 is enforced (× 0.70) and J is sampled:
B=0 E=1 A=1 M=1 J=1: w ≈ 0.0016

64 Likelihood Weighting
New sample, B=0, E=0: w = (0.999/0.5)·(0.998/0.5) ≈ 3.988

65 Likelihood Weighting
A=1 is enforced: B=0 E=0 A=1: w = 3.988 × 0.001 ≈ 0.004

66 Likelihood Weighting
M=1 is enforced: B=0 E=0 A=1 M=1 J=1: w ≈ 0.0028

67 Likelihood Weighting
New sample, B=1, E=0, with A=1 enforced: w = (0.001/0.5)·(0.998/0.5)·0.94 ≈ 0.00375

68 Likelihood Weighting
M=1 is enforced: B=1 E=0 A=1 M=1 J=1: w ≈ 0.0026

69 Likelihood Weighting
New sample, B=1, E=1: B=1 E=1 A=1 M=1 J=1: w = (0.001/0.5)·(0.002/0.5)·0.95·0.7 ≈ 5×10⁻⁶

70 Likelihood Weighting
Samples (evidence Alarm and MaryCalls; B, E sampled with P = 0.5):
B=0 E=1 A=1 M=1 J=1: w ≈ 0.0016
B=0 E=0 A=1 M=1 J=1: w ≈ 0.0028
B=1 E=0 A=1 M=1 J=1: w ≈ 0.0026
B=1 E=1 A=1 M=1 J=1: w ≈ 0.000005
N = 4 gives P(B|A,M) ≈ 0.371 (the weighted fraction of samples with B=1)
Exact inference gives P(B|A,M) ≈ 0.374
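The whole likelihood-weighting scheme for evidence Alarm=1, MaryCalls=1 can be sketched as follows, with the CPTs and the 0.5 proposal for B and E taken from the slides:

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|B,E)
P_M = {1: 0.70, 0: 0.01}                                          # P(M=1|A)

def weighted_sample():
    """Return (b, weight) for one sample consistent with A=1, M=1."""
    w = 1.0
    b = 1 if random.random() < 0.5 else 0
    w *= (P_B if b else 1 - P_B) / 0.5   # correct for the 0.5 proposal on B
    e = 1 if random.random() < 0.5 else 0
    w *= (P_E if e else 1 - P_E) / 0.5   # correct for the 0.5 proposal on E
    w *= P_A[(b, e)]                      # evidence A=1 enforced
    w *= P_M[1]                           # evidence M=1 enforced
    return b, w

random.seed(0)
samples = [weighted_sample() for _ in range(100_000)]
total = sum(w for _, w in samples)
est = sum(w for b, w in samples if b == 1) / total
print(round(est, 3))   # weighted estimate of P(B=1 | A=1, M=1), ~0.37
```

The P(M=1|A=1) factor multiplies every sample equally, so it cancels in the ratio; it is kept here to match the per-sample weights shown on the slides.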

71 Recap
- Efficient inference in BNs
- Variable elimination
- Approximate methods: Monte Carlo sampling

72 Next Lecture
Statistical learning: from data to distributions
R&N 20.1-2

