EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS


EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 15, 5/25/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes

Uncertainty & Bayesian Networks Chapter 13/14

Outline: Inference; Independence and Bayes' Rule. Chapter 14: Syntax; Semantics; Parameterized Distributions.

Homework Last HW of the quarter. Due next Wed, June 1st, in class: Chapter 13: 13.3, 13.7, 13.16; Chapter 14: 14.2, 14.3, 14.10.

Inference by enumeration Start with the joint probability distribution. For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω). E.g., P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2, and P(toothache ∨ cavity) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28. Can also compute conditional probabilities: P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4.
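To make the enumeration concrete, here is a minimal Python sketch (not part of the original slides). The four entries not quoted above (0.072, 0.008, 0.144, 0.576) are the remaining values of the standard AIMA dentist joint table:

```python
# The full joint distribution over (Toothache, Catch, Cavity),
# using the standard AIMA dentist-domain values.
joint = {
    (True,  True,  True):  0.108,
    (True,  False, True):  0.012,
    (True,  True,  False): 0.016,
    (True,  False, False): 0.064,
    (False, True,  True):  0.072,
    (False, False, True):  0.008,
    (False, True,  False): 0.144,
    (False, False, False): 0.576,
}

def prob(event):
    """Sum the atomic events (worlds) where the proposition holds."""
    return sum(p for world, p in joint.items() if event(*world))

p_toothache = prob(lambda t, c, cav: t)               # 0.2
p_tooth_or_cavity = prob(lambda t, c, cav: t or cav)  # 0.28
# A conditional probability is a ratio of two such sums:
p_not_cavity_given_tooth = (prob(lambda t, c, cav: t and not cav)
                            / p_toothache)            # 0.4
print(p_toothache, p_tooth_or_cavity, p_not_cavity_given_tooth)
```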

Normalization The denominator can be viewed as a normalization constant α: P(Cavity | toothache) = α P(Cavity, toothache) = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)] = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩] = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩. General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.

Inference by enumeration, contd. Let X be all the variables. Typically, we are interested in the posterior joint distribution of the query variables Y given specific values e for the evidence variables E. Let the hidden variables be H = X - Y - E. Then the required summation of joint entries is done by summing out the hidden variables: P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h). The terms in the summation are joint entries because Y, E, and H together exhaust the set of random variables. Obvious problems: worst-case time complexity O(d^n), where d is the largest arity; space complexity O(d^n) to store the joint distribution; and how do we find the numbers for the O(d^n) entries?
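A sketch of the general query P(Y | E = e) = α Σ_h P(Y, E = e, H = h), reusing the `joint` table from the previous sketch; the helper name `enumerate_query` is illustrative, not from the slides:

```python
from itertools import product

# Variable order matches the tuple keys of `joint` above.
VARS = ("Toothache", "Catch", "Cavity")

def enumerate_query(query_var, evidence, joint):
    """Return the normalized distribution P(query_var | evidence),
    summing out every variable that is neither query nor evidence."""
    dist = {}
    for qval in (True, False):
        total = 0.0
        for world in product((True, False), repeat=len(VARS)):
            assignment = dict(zip(VARS, world))
            if assignment[query_var] != qval:
                continue
            if all(assignment[v] == val for v, val in evidence.items()):
                total += joint[world]   # hidden variables are summed out
        dist[qval] = total
    alpha = 1.0 / sum(dist.values())    # the normalization constant
    return {v: alpha * p for v, p in dist.items()}

# P(Cavity | toothache): Catch is the hidden variable summed out.
print(enumerate_query("Cavity", {"Toothache": True}, joint))
# -> {True: 0.6, False: 0.4}
```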

Independence A and B are independent iff P(A|B) = P(A), or P(B|A) = P(B), or P(A, B) = P(A) P(B). P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather). 16 entries reduced to 10; for n independent biased coins, O(2^n) → O(n). Absolute independence is powerful but rare. Dentistry is a large field with hundreds of variables, none of which are independent. What to do?

Conditional independence P(Toothache, Cavity, Catch) has 2^3 - 1 = 7 independent entries. If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: (1) P(catch | toothache, cavity) = P(catch | cavity). The same independence holds if I haven't got a cavity: (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity). Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity). Equivalent statements: P(Toothache | Catch, Cavity) = P(Toothache | Cavity); P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity).

Conditional independence contd. Write out the full joint distribution using the chain rule: P(Toothache, Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity). I.e., 2 + 2 + 1 = 5 independent numbers. In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n. Conditional independence is our most basic and robust form of knowledge about uncertain environments.
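A sketch of rebuilding the full joint from those five numbers; the parameter values below are assumptions chosen so that the factored form reproduces the joint table used earlier:

```python
# The 5 independent numbers: P(cavity), P(catch | Cavity), P(toothache | Cavity).
p_cavity = 0.2
p_catch_given = {True: 0.9, False: 0.2}   # P(catch | Cavity = cav)
p_tooth_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity = cav)

def joint_prob(toothache, catch, cavity):
    """P(t, c, cav) = P(t | cav) P(c | cav) P(cav), by conditional independence."""
    p = p_cavity if cavity else 1 - p_cavity
    p *= p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    p *= p_tooth_given[cavity] if toothache else 1 - p_tooth_given[cavity]
    return p

print(joint_prob(True, True, True))   # 0.2 * 0.9 * 0.6 = 0.108
```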

Bayes' Rule Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a) ⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b), or in distribution form: P(Y|X) = P(X|Y) P(Y) / P(X) = α P(X|Y) P(Y). Useful for assessing diagnostic probability from causal probability: P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect). E.g., let M be meningitis, S be stiff neck: P(m|s) = P(s|m) P(m) / P(s) = 0.8 × 0.0001 / 0.1 = 0.0008. Note: the posterior probability of meningitis is still very small!
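The meningitis numbers as a one-line check:

```python
# P(m | s) = P(s | m) P(m) / P(s), with the slide's values.
p_s_given_m, p_m, p_s = 0.8, 0.0001, 0.1
print(p_s_given_m * p_m / p_s)   # 0.0008
```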

Bayes' Rule and conditional independence P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity) = α P(toothache | Cavity) P(catch | Cavity) P(Cavity). This is an example of a naïve Bayes model: P(Cause, Effect_1, …, Effect_n) = P(Cause) Π_i P(Effect_i | Cause). The total number of parameters is linear in n.
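A minimal naïve Bayes sketch; the prior and effect tables below are invented for illustration, not from the slides:

```python
p_cause = {True: 0.2, False: 0.8}           # illustrative prior P(Cause)
p_effect_given_cause = [                    # P(effect_i = true | Cause)
    {True: 0.9, False: 0.1},                # effect 1
    {True: 0.7, False: 0.05},               # effect 2
]

def posterior(observed_effects):
    """P(Cause | effects) ∝ P(Cause) Π_i P(effect_i | Cause), normalized."""
    scores = {}
    for cause in (True, False):
        p = p_cause[cause]
        for table, e in zip(p_effect_given_cause, observed_effects):
            p *= table[cause] if e else 1 - table[cause]
        scores[cause] = p
    alpha = 1.0 / sum(scores.values())
    return {c: alpha * s for c, s in scores.items()}

print(posterior([True, True]))   # both effects observed true
```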

Key Benefit Probabilistic reasoning (using tools such as conditional probability, conditional independence, and Bayes' rule) makes it possible to choose reasonably among a set of actions where otherwise (without probability, as in propositional or first-order logic) we would have to resort to random guessing. Example: Wumpus World.

Bayesian Networks Chapter 14

Bayesian networks A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions. Syntax: a set of nodes, one per variable; a directed, acyclic graph (link ≈ "directly influences"); a conditional distribution for each node given its parents: P(X_i | Parents(X_i)). In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values.
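One possible encoding of this syntax as a plain data structure, sketched here because the slide figures did not survive the transcript; the CPT numbers are the standard AIMA burglary-network values used in the example that follows:

```python
# Each node stores its parents and a CPT mapping parent-value tuples
# to P(node = true).
network = {
    "Burglary":   {"parents": (), "cpt": {(): 0.001}},
    "Earthquake": {"parents": (), "cpt": {(): 0.002}},
    "Alarm":      {"parents": ("Burglary", "Earthquake"),
                   "cpt": {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29, (False, False): 0.001}},
    "JohnCalls":  {"parents": ("Alarm",),
                   "cpt": {(True,): 0.90, (False,): 0.05}},
    "MaryCalls":  {"parents": ("Alarm",),
                   "cpt": {(True,): 0.70, (False,): 0.01}},
}
# P(Alarm = true | Burglary = true, Earthquake = false):
print(network["Alarm"]["cpt"][(True, False)])   # 0.94
```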

Example Topology of network encodes conditional independence assertions: Weather is independent of the other variables; Toothache and Catch are conditionally independent given Cavity.

Example I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar? Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls. Network topology reflects "causal" knowledge: a burglar can set the alarm off; an earthquake can set the alarm off; the alarm can cause Mary to call; the alarm can cause John to call.

Example contd. (Figure: the burglary network and its CPTs.)

Compactness A CPT for Boolean X_i with k Boolean parents has 2^k rows, one for each combination of parent values. Each row requires one number p for X_i = true (the number for X_i = false is just 1 - p). If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31).
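The two counts as a quick computation:

```python
# Number of parents of each Boolean node in the burglary net.
parents = {"B": 0, "E": 0, "A": 2, "J": 1, "M": 1}
bn_params = sum(2 ** k for k in parents.values())   # 1+1+4+2+2 = 10
full_joint = 2 ** len(parents) - 1                  # 2^5 - 1 = 31
print(bn_params, full_joint)
```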

Semantics The full joint distribution is defined as the product of the local conditional distributions: P(X_1, …, X_n) = Π_{i=1}^{n} P(X_i | Parents(X_i)). E.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e).
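Evaluating that product with the standard AIMA CPT values (the slide's own CPT figure is not reproduced in this transcript):

```python
# P(j ^ m ^ a ^ ~b ^ ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
p_j_a, p_m_a = 0.90, 0.70            # P(j | a), P(m | a)
p_a_nb_ne = 0.001                    # P(a | ~b, ~e)
p_nb, p_ne = 1 - 0.001, 1 - 0.002    # P(~b), P(~e)
print(p_j_a * p_m_a * p_a_nb_ne * p_nb * p_ne)   # ~ 0.000628
```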

Local Semantics Local semantics: each node is conditionally independent of its nondescendants given its parents. Thm: local semantics ⇔ global semantics.

Markov Blanket Each node is conditionally independent of all others given its "Markov blanket", i.e., its parents, children, and children's parents.

Constructing Bayesian networks Key point: we need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics. 1. Choose an ordering of variables X_1, …, X_n. 2. For i = 1 to n: add X_i to the network; select parents from X_1, …, X_{i-1} such that P(X_i | Parents(X_i)) = P(X_i | X_1, …, X_{i-1}). This choice of parents guarantees: P(X_1, …, X_n) = Π_{i=1}^{n} P(X_i | X_1, …, X_{i-1}) (chain rule) = Π_{i=1}^{n} P(X_i | Parents(X_i)) (by construction).

Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No. P(A | J, M) = P(A | J)? No. P(A | J, M) = P(A)? No. P(B | A, J, M) = P(B | A)? Yes. P(B | A, J, M) = P(B)? No. P(E | B, A, J, M) = P(E | A)? No. P(E | B, A, J, M) = P(E | A, B)? Yes.

Example contd. Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!) Assessing conditional probabilities is also hard in noncausal directions. The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

Example: car diagnosis Initial evidence: car won't start. Testable variables (green), "broken, so fix it" variables (orange). Hidden variables (gray) ensure sparse structure and reduce parameters.

Example: car insurance

Compact conditional dists. A CPT grows exponentially with the number of parents, and becomes infinite with a continuous-valued parent or child. Solution: canonical distributions that are defined compactly. Deterministic nodes are the simplest case: X = f(Parents(X)) for some deterministic function f (could be a logical form). E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican. E.g., numerical relationships among continuous variables.
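The slide's deterministic-node example as code:

```python
# The child is a pure function of its parents, so its "CPT" has no
# free parameters.
def north_american(canadian, us, mexican):
    # NorthAmerican <=> Canadian v US v Mexican
    return canadian or us or mexican

print(north_american(False, True, False))   # True
```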

Compact conditional dists. "Noisy-OR" distributions model multiple interacting causes: 1) parents U_1, …, U_k include all possible causes; 2) independent failure probability q_i for each cause alone: ¬X ⇔ ¬U_1 ∧ ¬U_2 ∧ … ∧ ¬U_k, so P(X | U_1, …, U_j, ¬U_{j+1}, …, ¬U_k) = 1 - Π_{i=1}^{j} q_i. The number of parameters is linear in the number of parents.
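A noisy-OR sketch; the cause names and q values follow the textbook's fever example (cold, flu, malaria) and are not from this slide:

```python
from math import prod

# q[i]: probability that cause U_i alone fails to turn X on.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def p_effect(present_causes):
    """P(X = true | the named causes present, all other parents absent).
    The empty product is 1, so with no causes present P(X) = 0."""
    return 1 - prod(q[c] for c in present_causes)

print(p_effect({"cold", "flu"}))   # 1 - 0.6 * 0.2 = 0.88
```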

Hybrid (discrete+continuous) networks Discrete (Subsidy? and Buys?); continuous (Harvest and Cost). Option 1: discretization (large errors and large CPTs). Option 2: finitely parameterized canonical families: Gaussians, logistic distributions (as used in neural networks). Continuous variables with discrete+continuous parents (e.g., Cost); discrete variables with continuous parents (e.g., Buys?).
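A sketch of Option 2 for a discrete child of a continuous parent, e.g. P(buys? | cost) as a logistic curve; the threshold and steepness values here are illustrative assumptions, not values from the slide:

```python
from math import exp

def p_buys(cost, threshold=6.0, steepness=1.0):
    """Probability of buying falls off smoothly as cost rises."""
    return 1.0 / (1.0 + exp(steepness * (cost - threshold)))

print(p_buys(4.0), p_buys(8.0))   # high below the threshold, low above
```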

Summary Bayesian networks provide a natural representation for (causally induced) conditional independence. Topology + CPTs = compact representation of the joint distribution. Generally easy for domain experts to construct. If you're interested, take my Graphical Models class (much more theoretical depth).