Probabilistic Reasoning [Ch. 14] Bayes Networks – Part 1 ◦Syntax ◦Semantics ◦Parameterized distributions Inference – Part 2 ◦Exact inference by enumeration.


1 Probabilistic Reasoning [Ch. 14] Bayes Networks – Part 1: ◦Syntax ◦Semantics ◦Parameterized distributions. Inference – Part 2: ◦Exact inference by enumeration ◦Exact inference by variable elimination ◦Approximate inference by stochastic simulation ◦Approximate inference by Markov chain Monte Carlo

2 Bayesian networks A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions. Syntax: ◦a set of nodes, one per variable ◦a directed, acyclic graph (link ≈ “directly influences”) ◦a conditional distribution for each node given its parents: $P(X_i \mid \mathrm{Parents}(X_i))$. In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values.

3 Bayes’ Nets A Bayes’ net is an efficient encoding of a probabilistic model of a domain Questions we can ask: ◦Inference: given a fixed BN, what is P(X | e)? ◦Representation: given a BN graph, what kinds of distributions can it encode? ◦Modeling: what BN is most appropriate for a given domain?

4 Bayes’ Net Semantics Let’s formalize the semantics of a Bayes’ net: ◦A set of nodes, one per variable X ◦A directed, acyclic graph ◦A conditional distribution for each node: a collection of distributions over X, one for each combination of parents’ values, stored as a CPT (conditional probability table) and describing a noisy “causal” process. A Bayes’ net = topology (graph) + local conditional probabilities.

5 Topology Limits Distributions Given some graph topology G, only certain joint distributions can be encoded. The graph structure guarantees certain (conditional) independences (there might be more independence). Adding arcs increases the set of distributions that can be encoded, but has several costs. Full conditioning can encode any distribution.

6 Independence in a BN Important question about a BN: ◦Are two nodes independent given certain evidence? ◦If yes, can prove using algebra (tedious in general) ◦If no, can prove with a counterexample ◦Example: X → Y → Z ◦Question: are X and Z necessarily independent? Answer: no. Example: low pressure causes rain, which causes traffic. X can influence Z, and Z can influence X (via Y). Addendum: they could be independent: how?

7 Causal Chains This configuration is a “causal chain”: X → Y → Z, with X: Low pressure, Y: Rain, Z: Traffic ◦Is X independent of Z given Y? Yes! ◦Evidence along the chain “blocks” the influence
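
The "Yes!" for the causal chain can be checked from the factorization the chain encodes, $P(x, y, z) = P(x)\,P(y \mid x)\,P(z \mid y)$. A short derivation, assuming $P(x, y) > 0$ so that the conditional is defined:

```latex
\begin{aligned}
P(z \mid x, y)
  &= \frac{P(x, y, z)}{P(x, y)}
   = \frac{P(x)\,P(y \mid x)\,P(z \mid y)}{P(x)\,P(y \mid x)}
   = P(z \mid y)
\end{aligned}
```

Once Y is known, X contributes nothing further to the distribution over Z, which is what "blocking" means here.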

8 Common Cause Another basic configuration: two effects of the same cause, X ← Y → Z, with Y: Project due, X: Newsgroup busy, Z: Lab full ◦Are X and Z independent? Not in general. ◦Are X and Z independent given Y? Yes! ◦Observing the cause blocks influence between effects.
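
For the common-cause configuration, with Y the cause of both X and Z, the network factorizes as $P(x, y, z) = P(y)\,P(x \mid y)\,P(z \mid y)$. Under that factorization, and assuming $P(x, y) > 0$, conditional independence given the cause follows in one line:

```latex
\begin{aligned}
P(z \mid x, y)
  &= \frac{P(x, y, z)}{P(x, y)}
   = \frac{P(y)\,P(x \mid y)\,P(z \mid y)}{P(y)\,P(x \mid y)}
   = P(z \mid y)
\end{aligned}
```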

9 Common Effect Last configuration: two causes of one effect (v-structures), X → Y ← Z, with X: Raining, Z: Ballgame, Y: Traffic ◦Are X and Z independent? Yes: the ballgame and the rain cause traffic, but they are not correlated. Still need to prove they must be (try it!) ◦Are X and Z independent given Y? No: seeing traffic puts the rain and the ballgame in competition as explanations. ◦This is backwards from the other cases: observing an effect activates influence between possible causes.
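
One way to carry out the "try it!" proof for the common-effect case: with Y the shared effect of X and Z, the network factorizes as $P(x, y, z) = P(x)\,P(z)\,P(y \mid x, z)$. Summing out the unobserved effect gives marginal independence of the two causes:

```latex
\begin{aligned}
P(x, z)
  &= \sum_{y} P(x)\,P(z)\,P(y \mid x, z)
   = P(x)\,P(z) \sum_{y} P(y \mid x, z)
   = P(x)\,P(z)
\end{aligned}
```

The same cancellation is not available for $P(x, z \mid y)$, because $P(y \mid x, z)$ then stays inside the expression and generally couples x and z.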

10 The General Case Any complex example can be analyzed using these three canonical cases General question: in a given BN, are two variables independent (given evidence)? Solution: analyze the graph

11 Reachability Recipe: shade evidence nodes. Attempt 1: if every undirected path between two nodes is blocked by a shaded node, they are conditionally independent. Almost works, but not quite ◦Where does it break? ◦Answer: the v-structure at T doesn’t count as a link in a path unless “active” [Figure: network over nodes L, R, T, B, D]

12 Reachability (D-Separation) Question: Are X and Y conditionally independent given evidence vars {Z}? ◦Yes, if X and Y are “separated” by Z ◦Look for active paths from X to Y ◦No active paths = independence! A path is active if each triple along it is active: ◦Causal chain X → B → Y where B is unobserved (either direction) ◦Common cause X ← B → Y where B is unobserved ◦Common effect (aka v-structure) X → B ← Y where B or one of its descendants is observed. All it takes to block a path is a single inactive segment. [Figure: active and inactive triples over X, B, Y]
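
The active-triple rules above can be sketched in Python as a brute-force check: enumerate simple undirected paths and reject any path that contains an inactive triple. This is a minimal illustration for small graphs, not an efficient algorithm. The function names are made up for this sketch, the example graph combines the low pressure → rain chain with the rain → traffic ← ballgame v-structure from the earlier slides (an assumption, since the slide figures are not in the transcript), and the `Late` node is a hypothetical descendant of `Traffic` added only to exercise the descendant rule.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def triple_active(edges, children, a, b, c, evidence):
    """Is the consecutive triple a - b - c active given the evidence set?"""
    if (a, b) in edges and (c, b) in edges:
        # Common effect a -> b <- c: active iff b or a descendant is observed.
        return b in evidence or bool(descendants(children, b) & evidence)
    # Causal chain (either direction) or common cause: active iff b is unobserved.
    return b not in evidence


def d_separated(edges, x, y, evidence):
    """True if no active undirected path connects x and y given evidence."""
    edges, evidence = set(edges), set(evidence)
    children, neighbors = {}, {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
        neighbors.setdefault(parent, set()).add(child)
        neighbors.setdefault(child, set()).add(parent)

    def active_path_exists(path):
        last = path[-1]
        if last == y:
            return True
        for nxt in neighbors.get(last, ()):
            if nxt in path:
                continue  # simple paths only
            if len(path) >= 2 and not triple_active(
                edges, children, path[-2], last, nxt, evidence
            ):
                continue  # a single inactive triple blocks this path
            if active_path_exists(path + [nxt]):
                return True
        return False

    return not active_path_exists([x])


# Directed edges (parent, child); "Late" is a hypothetical extra node.
edges = [
    ("LowPressure", "Rain"),
    ("Rain", "Traffic"),
    ("Ballgame", "Traffic"),
    ("Traffic", "Late"),
]

print(d_separated(edges, "Rain", "Ballgame", set()))          # unobserved v-structure
print(d_separated(edges, "Rain", "Ballgame", {"Traffic"}))    # effect observed
print(d_separated(edges, "LowPressure", "Traffic", {"Rain"})) # chain blocked by evidence
```

Path enumeration is exponential in the worst case; it is used here because it mirrors the triple-by-triple reasoning on the slide.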

13 Example [Figure: network over R, B, T, T′ with d-separation queries, shown beside the active and inactive triples; the one answer surviving in the transcript is “Yes”]

14 Example [Figure: network over L, R, B, T, D, T′ with d-separation queries, shown beside the active triples; the answers surviving in the transcript are three “Yes”]

15 Example Variables: ◦R: Raining ◦T: Traffic ◦D: Roof drips ◦S: I’m sad Questions: [Figure: network and d-separation queries, shown beside the active and inactive triples; the one answer surviving in the transcript is “Yes”]

16 Causality? When Bayes’ nets reflect the true causal patterns: ◦Often simpler (nodes have fewer parents) ◦Often easier to think about ◦Often easier to elicit from experts BNs need not actually be causal ◦Sometimes no causal net exists over the domain ◦E.g. consider the variables Traffic and Drips ◦End up with arrows that reflect correlation, not causation What do the arrows really mean? ◦Topology may happen to encode causal structure ◦Topology only guaranteed to encode conditional independence

17 Changing Bayes’ Net Structure The same joint distribution can be encoded in many different Bayes’ nets ◦Causal structure tends to be the simplest Analysis question: given some edges, what other edges do you need to add? ◦One answer: fully connect the graph ◦Better answer: don’t make any false conditional independence assumptions

18 Example Topology of network encodes conditional independence assertions: ◦Weather is independent of the other variables ◦Toothache and Catch are conditionally independent, given Cavity

19 Example I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar? Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects “causal” knowledge: ◦A burglar can set the alarm off ◦An earthquake can set the alarm off ◦The alarm can cause Mary to call ◦The alarm can cause John to call

20 Example contd. Probabilities derived from prior observations

21 Compactness A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values. Each row requires one number $p$ for $X_i = \text{true}$ (the number for $X_i = \text{false}$ is just $1 - p$). If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers, i.e., it grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution. For the burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$).
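
The counting argument can be reproduced with a few lines of Python. This is a sketch assuming all variables are Boolean; the dictionary layout (node name mapped to its list of parents) and the name `cpt_numbers` are choices made for this illustration, and the parent sets follow the burglary topology described on the earlier slide.

```python
def cpt_numbers(parents):
    """Independent numbers needed: one per CPT row, 2**k rows for k Boolean parents."""
    return sum(2 ** len(ps) for ps in parents.values())


# Burglary network: node -> list of parents.
burglary_net = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

network_numbers = cpt_numbers(burglary_net)      # 1 + 1 + 4 + 2 + 2
full_joint_numbers = 2 ** len(burglary_net) - 1  # all assignments minus one (sum to 1)

print(network_numbers, full_joint_numbers)  # 10 31
```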

22 Global semantics “Global” semantics defines the full joint distribution as the product of the local conditional distributions: $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$, e.g., $P(j \wedge m \wedge a \wedge \neg b \wedge \neg e) = P(j \mid a)\,P(m \mid a)\,P(a \mid \neg b, \neg e)\,P(\neg b)\,P(\neg e) = 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00063$
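
The product formula can be evaluated directly. The sketch below hard-codes CPTs for the burglary network; only the five entries used in the slide's example (0.9, 0.7, 0.001, 0.999, 0.998) appear in this transcript, and the remaining entries are assumed here to be the values usually quoted for this textbook example. The helper names are hypothetical.

```python
# P(variable = true) for each combination of parent values.
P_B = 0.001  # so P(not b) = 0.999, as on the slide
P_E = 0.002  # so P(not e) = 0.998, as on the slide
P_A = {      # keyed by (burglary, earthquake); only (False, False) is from the slide
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # keyed by alarm; 0.05 is an assumed entry
P_M = {True: 0.70, False: 0.01}  # keyed by alarm; 0.01 is an assumed entry


def prob(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Full joint entry as the product of the local conditional probabilities."""
    return (
        prob(P_B, b)
        * prob(P_E, e)
        * prob(P_A[(b, e)], a)
        * prob(P_J[a], j)
        * prob(P_M[a], m)
    )


p = joint(b=False, e=False, a=True, j=True, m=True)
print(f"{p:.5f}")  # 0.00063
```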

23 Local semantics Local semantics: each node is conditionally independent of its nondescendants given its parents Theorem: Local semantics ⇔ global semantics

24 Markov blanket Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
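
Reading a Markov blanket off the graph is mechanical. A minimal sketch, assuming the network is stored as a mapping from each node to its parents (a representation chosen for this example) and using the burglary topology from the earlier slides:

```python
def markov_blanket(parents, node):
    """Parents, children, and the children's other parents of `node`."""
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | co_parents) - {node}


# Burglary network: node -> list of parents.
burglary_net = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

print(sorted(markov_blanket(burglary_net, "Burglary")))  # ['Alarm', 'Earthquake']
```

Earthquake enters Burglary's blanket only as the other parent of their shared child Alarm.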

25 Constructing Bayesian networks Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics. 1. Choose an ordering of variables $X_1, \ldots, X_n$ 2. For $i = 1$ to $n$: add $X_i$ to the network and select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid \mathrm{Parents}(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$. This choice of parents guarantees the global semantics: $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule) $= \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$ (by construction)

26 Example: Problem formulation I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar? Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects “causal” knowledge: ◦A burglar can set the alarm off ◦An earthquake can set the alarm off ◦The alarm can cause Mary to call ◦The alarm can cause John to call

27 Example Suppose we choose the ordering M, J, A, B, E (MaryCalls, JohnCalls, Alarm, Burglary, Earthquake): ◦$P(J \mid M) = P(J)$? No ◦$P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No ◦$P(B \mid A, J, M) = P(B \mid A)$? Yes ◦$P(B \mid A, J, M) = P(B)$? No ◦$P(E \mid B, A, J, M) = P(E \mid A)$? No ◦$P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes

28 Example contd. [Figure: resulting network over MaryCalls, JohnCalls, Alarm, Burglary, Earthquake] Deciding conditional independence is hard in non-causal directions. (Causal models and conditional independence seem hardwired for humans!) Assessing conditional probabilities is hard in non-causal directions. Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

29 Example: Car diagnosis Initial evidence: car won't start. Testable variables (green), “broken, so fix it” variables (orange). Hidden variables (gray) ensure sparse structure, reduce parameters.

30 Example: Car insurance

31 Compact conditional distributions CPT grows exponentially with number of parents. CPT becomes infinite with continuous-valued parent or child. Solution: canonical distributions that are defined compactly. Deterministic nodes are the simplest case: $X = f(\mathrm{Parents}(X))$ for some function $f$. E.g., Boolean functions: $\mathit{NorthAmerican} \Leftrightarrow \mathit{Canadian} \vee \mathit{US} \vee \mathit{Mexican}$. E.g., numerical relationships among continuous variables: $\frac{\partial \mathit{Level}}{\partial t} = \mathit{inflow} + \mathit{precipitation} - \mathit{outflow} - \mathit{evaporation}$

32 Compact conditional distributions contd. Noisy-OR distributions model multiple noninteracting causes: 1) Parents $U_1, \ldots, U_k$ include all causes (can add leak node) 2) Independent failure probability $q_i$ for each cause alone ⇒ $P(X \mid U_1 \ldots U_j, \neg U_{j+1} \ldots \neg U_k) = 1 - \prod_{i=1}^{j} q_i$. Number of parameters linear in number of parents. The three single-cause rows are known (they give $q_{Cold} = 0.6$, $q_{Flu} = 0.2$, $q_{Malaria} = 0.1$); the remaining rows are inferred:

Cold  Flu  Malaria  P(Fever)  P(¬Fever)
F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02 = 0.2 × 0.1
T     F    F        0.4       0.6
T     F    T        0.94      0.06 = 0.6 × 0.1
T     T    F        0.88      0.12 = 0.6 × 0.2
T     T    T        0.988     0.012 = 0.6 × 0.2 × 0.1
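
The fever table can be regenerated from the three failure probabilities alone, which is the point of the noisy-OR parameterization. A minimal sketch, assuming no leak node; the function name `noisy_or` and the dictionary layout are choices made for this illustration, and the q values are the P(¬Fever) entries of the single-cause rows.

```python
from itertools import product

# Failure probability q_i: chance that cause i alone fails to produce a fever.
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}


def noisy_or(q, active_causes):
    """P(effect | exactly these causes present) = 1 - product of their q_i."""
    p_no_effect = 1.0
    for cause in active_causes:
        p_no_effect *= q[cause]
    return 1.0 - p_no_effect


# Rebuild the full 2**3-row CPT from the 3 parameters.
for values in product([False, True], repeat=len(q)):
    active = [cause for cause, present in zip(q, values) if present]
    p_fever = noisy_or(q, active)
    row = "  ".join("T" if v else "F" for v in values)
    print(f"{row}  P(Fever)={p_fever:.3f}  P(not Fever)={1 - p_fever:.3f}")
```

With k parents this stores k numbers instead of the 2^k rows of an explicit CPT.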

