1
EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS
Lecture 15, 5/25/2005. University of Washington, Department of Electrical Engineering, Spring 2005. Instructor: Professor Jeff A. Bilmes
2
Uncertainty & Bayesian Networks
Chapters 13 and 14
3
Outline
Chapter 13: Inference; Independence and Bayes' Rule
Chapter 14: Syntax; Semantics; Parameterized distributions
4
Homework Last HW of the quarter. Due next Wed, June 1st, in class:
Chapter 13: 13.3, 13.7, 13.16; Chapter 14: 14.2, 14.3, 14.10
5
Inference by enumeration
Start with the joint probability distribution: For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω)
6
Inference by enumeration
Start with the joint probability distribution: For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω). P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
7
Inference by enumeration
Start with the joint probability distribution: For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω). P(toothache ∨ cavity) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28
8
Inference by enumeration
Start with the joint probability distribution: Can also compute conditional probabilities: P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
9
Normalization Denominator can be viewed as a normalization constant α
P(Cavity | toothache) = α P(Cavity, toothache) = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)] = α [<0.108, 0.016> + <0.012, 0.064>] = α <0.12, 0.08> = <0.6, 0.4>. General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.
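A minimal Python sketch of this normalization step, using the numbers from the slide (the variable names are only for illustration):

```python
# Unnormalized vector <P(cavity, toothache), P(not cavity, toothache)>,
# obtained by summing out the hidden variable Catch (numbers from the slide).
unnormalized = [0.108 + 0.012, 0.016 + 0.064]    # -> [0.12, 0.08]

alpha = 1.0 / sum(unnormalized)                  # normalization constant
posterior = [alpha * p for p in unnormalized]    # -> [0.6, 0.4]
print(posterior)
```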
10
Inference by enumeration, contd.
Let X be all the variables. Typically, we are interested in the posterior joint distribution of the query variables Y given specific values e for the evidence variables E. Let the hidden variables be H = X − Y − E. Then the required summation of joint entries is done by summing out the hidden variables: P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h). The terms in the summation are joint entries because Y, E, and H together exhaust the set of random variables. Obvious problems: worst-case time complexity O(d^n), where d is the largest arity; space complexity O(d^n) to store the joint distribution; and how do we find the numbers for the O(d^n) entries?
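A small Python sketch of inference by enumeration over the dentist example (the joint entries are the textbook's numbers; the function name and dictionary representation are illustrative, not from the lecture):

```python
# Full joint P(Toothache, Catch, Cavity), keyed by (toothache, catch, cavity).
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
VARS = ("Toothache", "Catch", "Cavity")

def enumerate_ask(query_var, evidence):
    """P(query_var | evidence): sum out the hidden variables, then normalize."""
    dist = {}
    for value in (True, False):
        total = 0.0
        for world, p in joint.items():
            assignment = dict(zip(VARS, world))
            if assignment[query_var] == value and \
               all(assignment[var] == val for var, val in evidence.items()):
                total += p            # add every joint entry consistent with query + evidence
        dist[value] = total
    alpha = 1.0 / sum(dist.values())  # normalize over the query variable
    return {v: alpha * t for v, t in dist.items()}

print(enumerate_ask("Cavity", {"Toothache": True}))   # -> {True: 0.6, False: 0.4}
```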
11
Independence A and B are independent iff
P(A | B) = P(A) or P(B | A) = P(B) or P(A, B) = P(A) P(B). P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather): 16 entries reduced to 10; for n independent biased coins, O(2^n) → O(n). Absolute independence is powerful but rare. Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
12
Conditional independence
P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries. If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: (1) P(catch | toothache, cavity) = P(catch | cavity). The same independence holds if I haven't got a cavity: (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity). Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity). Equivalent statements: P(Toothache | Catch, Cavity) = P(Toothache | Cavity) and P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
13
Conditional independence contd.
Write out the full joint distribution using the chain rule: P(Toothache, Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity), i.e., 2 + 2 + 1 = 5 independent numbers. In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n. Conditional independence is our most basic and robust form of knowledge about uncertain environments.
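A quick numerical check of this factorization in Python: the five numbers below are derived from the dentist joint table used on the earlier slides (the helper name is illustrative):

```python
# Five independent numbers, computed from the dentist joint distribution:
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

def factored_joint(toothache, catch, cavity):
    """P(toothache, catch, cavity) under the conditional-independence factorization."""
    p_c = p_cavity if cavity else 1 - p_cavity
    p_t = p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
    p_k = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    return p_t * p_k * p_c

print(factored_joint(True, True, True))   # 0.6 * 0.9 * 0.2 = 0.108, matching the full table
```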
14
Bayes' Rule Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
Bayes' rule: P(a | b) = P(b | a) P(a) / P(b), or in distribution form P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y). Useful for assessing diagnostic probability from causal probability: P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect). E.g., let M be meningitis, S be stiff neck: P(m | s) = P(s | m) P(m) / P(s) = 0.8 × P(m) / 0.1. Note: the posterior probability of meningitis is still very small!
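To make that last point concrete, assuming purely for illustration a prior P(m) = 0.0001 (the prior's value did not survive extraction from the slide), a one-line check in Python:

```python
p_s_given_m = 0.8      # P(stiff neck | meningitis), from the slide
p_s = 0.1              # P(stiff neck), from the slide
p_m = 0.0001           # P(meningitis): assumed prior, for illustration only

print(p_s_given_m * p_m / p_s)   # 0.0008, so the posterior is still very small
```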
15
Bayes' Rule and conditional independence
P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity) = α P(toothache | Cavity) P(catch | Cavity) P(Cavity). This is an example of a naïve Bayes model: P(Cause, Effect_1, …, Effect_n) = P(Cause) ∏_i P(Effect_i | Cause). The total number of parameters is linear in n.
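A minimal naïve Bayes sketch in Python, reusing the dentist numbers from earlier slides (the function name is illustrative):

```python
# P(Cavity | toothache, catch) via the naive Bayes factorization.
# The conditional probabilities below are derived from the dentist joint table.
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

def naive_bayes_posterior(toothache, catch):
    unnormalized = {}
    for cavity in (True, False):
        prior = p_cavity if cavity else 1 - p_cavity
        p_t = p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
        p_k = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
        unnormalized[cavity] = prior * p_t * p_k   # P(Cause) * prod_i P(Effect_i | Cause)
    alpha = 1.0 / sum(unnormalized.values())
    return {c: alpha * v for c, v in unnormalized.items()}

print(naive_bayes_posterior(True, True))   # approx {True: 0.871, False: 0.129}
```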
16
Key Benefit Probabilistic reasoning (using tools such as conditional probability, conditional independence, and Bayes' rule) makes it possible to choose sensibly among a set of actions where, without probability (as in propositional or first-order logic), we would have to resort to random guessing. Example: Wumpus World
17
Bayesian Networks Chapter 14
18
Bayesian networks A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions. Syntax: a set of nodes, one per variable; a directed, acyclic graph (link ≈ "directly influences"); a conditional distribution for each node given its parents: P(Xi | Parents(Xi)). In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
19
Example Topology of network encodes conditional independence assertions: Weather is independent of the other variables; Toothache and Catch are conditionally independent given Cavity
20
Example I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar? Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls. Network topology reflects "causal" knowledge: a burglar can set the alarm off; an earthquake can set the alarm off; the alarm can cause Mary to call; the alarm can cause John to call
21
Example contd.
22
Compactness A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values. Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p). If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
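A one-line check of those counts in Python (the parent counts come from the burglary network's structure):

```python
# Number of Boolean parents of each node in the burglary network.
parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2, "JohnCalls": 1, "MaryCalls": 1}

print(sum(2 ** k for k in parents.values()))   # 1 + 1 + 4 + 2 + 2 = 10 numbers
print(2 ** len(parents) - 1)                   # 31 for the full joint distribution
```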
23
Semantics The full joint distribution is defined as the product of the local conditional distributions: P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi)). E.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
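A compact Python sketch of this semantics for the burglary network. The CPT numbers here are the textbook's usual values and are stated as assumptions, since they appear only in a figure on the slides:

```python
# CPTs for the burglary network (assumed textbook values, for illustration).
P_B = 0.001                       # P(Burglary)
P_E = 0.002                       # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm | Burglary, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls | Alarm)

def pick(p_true, value):
    """Probability of a Boolean variable taking the given value."""
    return p_true if value else 1 - p_true

def joint(j, m, a, b, e):
    """P(J=j, M=m, A=a, B=b, E=e) as the product of the local conditional distributions."""
    return (pick(P_J[a], j) * pick(P_M[a], m) *
            pick(P_A[(b, e)], a) * pick(P_B, b) * pick(P_E, e))

# The slide's example: P(j, m, a, no burglary, no earthquake)
print(joint(True, True, True, False, False))   # approx 0.000628
```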
24
Local Semantics Local semantics: each node is conditionally independent of its nondescendants given its parents. Thm: local semantics ⇔ global semantics
25
Markov Blanket Each node is conditionally independent of all others given its "Markov blanket", i.e., its parents, children, and children's parents.
26
Constructing Bayesian networks
Key point: we need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics. 1. Choose an ordering of variables X1, …, Xn. 2. For i = 1 to n: add Xi to the network and select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1). This choice of parents guarantees: P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1) (chain rule) = ∏_{i=1}^{n} P(Xi | Parents(Xi)) (by construction)
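A brute-force Python sketch of this construction on the small dentist joint from earlier in the lecture (the helper names and the exhaustive subset search are illustrative; real networks are built from domain knowledge, not by enumerating a joint table):

```python
from itertools import combinations, product

# Full joint P(Toothache, Catch, Cavity) from the earlier slides.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
VARS = ("Toothache", "Catch", "Cavity")

def prob(assignment):
    """Marginal probability of a partial assignment {variable: value}."""
    return sum(p for world, p in joint.items()
               if all(dict(zip(VARS, world))[v] == val for v, val in assignment.items()))

def cond(x, given):
    """P(x = true | given)."""
    denom = prob(given)
    return prob({**given, x: True}) / denom if denom else 0.0

def smallest_parents(x, predecessors, tol=1e-9):
    """Smallest subset S of predecessors with P(x | S) = P(x | all predecessors)."""
    for size in range(len(predecessors) + 1):
        for subset in combinations(predecessors, size):
            for values in product((True, False), repeat=len(predecessors)):
                full = dict(zip(predecessors, values))
                sub = {v: full[v] for v in subset}
                if prob(full) > 0 and abs(cond(x, full) - cond(x, sub)) > tol:
                    break             # this subset fails; try the next one
            else:
                return subset         # the condition held for every parent assignment
    return tuple(predecessors)

# Build the network in the (causal) ordering Cavity, Toothache, Catch.
order = ("Cavity", "Toothache", "Catch")
for i, x in enumerate(order):
    print(x, "<- parents:", smallest_parents(x, order[:i]))
# Cavity <- (), Toothache <- ('Cavity',), Catch <- ('Cavity',)
```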
27
Example Suppose we choose the ordering M, J, A, B, E P(J | M) = P(J)?
28
Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?
29
Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No P(B | A, J, M) = P(B | A)? P(B | A, J, M) = P(B)?
30
Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No P(B | A, J, M) = P(B | A)? Yes P(B | A, J, M) = P(B)? No P(E | B, A, J, M) = P(E | A)? P(E | B, A, J, M) = P(E | A, B)?
31
Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No P(B | A, J, M) = P(B | A)? Yes P(B | A, J, M) = P(B)? No P(E | B, A, J, M) = P(E | A)? No P(E | B, A, J, M) = P(E | A, B)? Yes
32
Example contd. Deciding conditional independence is hard in noncausal directions (causal models and conditional independence seem hardwired for humans!). Assessing conditional probabilities is hard in non-causal directions. The network is also less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
33
Example: car diagnosis
Initial evidence: car won't start. Testable variables (green); "broken, so fix it" variables (orange); hidden variables (gray) ensure sparse structure and reduce parameters.
34
Example: car insurance
35
Compact conditional distributions
CPTs grow exponentially with the number of parents, and a CPT becomes infinite with a continuous-valued parent or child. Solution: canonical distributions that are defined compactly. Deterministic nodes are the simplest case: X = f(Parents(X)) for some deterministic function f (could be a logical form). E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican. E.g., numerical relationships among continuous variables
36
Compact conditional distributions
"Noisy-OR" distributions model multiple non-interacting causes: 1) Parents U1, …, Uk include all possible causes. 2) Independent failure probability qi for each cause alone, so that, absent noise, ¬X ⇔ ¬U1 ∧ ¬U2 ∧ … ∧ ¬Uk, and P(X | u1, …, uj, ¬uj+1, …, ¬uk) = 1 − ∏_{i=1}^{j} qi. The number of parameters is linear in the number of parents.
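A short Python sketch of a noisy-OR CPT; the cause names and failure probabilities below are assumed, textbook-style illustration values, not taken from the lecture:

```python
from itertools import product

# Noisy-OR for Fever with causes Cold, Flu, Malaria.
# q[c] = probability that cause c alone fails to produce a fever.
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_fever(true_causes):
    """P(fever | listed causes true, all others false) = 1 - product of q over true causes."""
    p_no_fever = 1.0
    for cause in true_causes:
        p_no_fever *= q[cause]          # each present cause is inhibited independently
    return 1.0 - p_no_fever

# The whole 2^3-row CPT is determined by only k = 3 parameters.
for combo in product((False, True), repeat=3):
    present = [c for c, on in zip(q, combo) if on]
    print(present, round(p_fever(present), 3))
```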
37
Hybrid (discrete+cont) networks
Discrete variables (Subsidy? and Buys?); continuous variables (Harvest and Cost). Option 1: discretization (possibly large errors and large CPTs). Option 2: finitely parameterized canonical families, e.g., Gaussians and logistic distributions (as used in neural networks). Two cases: continuous variables with discrete + continuous parents (e.g., Cost), and discrete variables with continuous parents (e.g., Buys?)
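For the discrete-child case, a tiny sketch of a logistic (sigmoid) CPD for Buys? given Cost; the threshold and steepness parameters are made-up illustrative values:

```python
import math

def p_buys(cost, threshold=100.0, steepness=0.1):
    """Logistic CPD: P(Buys = true | Cost = cost). Parameters are illustrative only."""
    return 1.0 / (1.0 + math.exp(steepness * (cost - threshold)))

for cost in (50, 100, 150):
    print(cost, round(p_buys(cost), 3))   # probability of buying falls as cost rises
```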
38
Summary Bayesian networks provide a natural representation for (causally induced) conditional independence. Topology + CPTs = compact representation of the joint distribution. Generally easy for domain experts to construct. Take my Graphical Models class if you are interested in more (it has much more theoretical depth).