1 22c:145 Artificial Intelligence: Bayesian Networks. Reading: Ch. 14, Russell & Norvig.

2 Review of Probability Theory: Random Variables. The probability that a random variable X has value val is written P(X = val). P: domain → [0, 1], and the probabilities sum to 1 over the domain, e.g. P(Raining = true) = P(Raining) = 0.2 and P(Raining = false) = P(¬Raining) = 0.8. Joint distribution: P(X1, X2, …, Xn) assigns a probability to every combination of values of the random variables and provides complete information about them. A JPD table for n random variables, each ranging over k distinct values, has k^n entries! (Example: a 2×2 joint table over Toothache/¬Toothache and Cavity/¬Cavity.)

3 Review of Probability Theory: Conditioning. P(A) = P(A | B) P(B) + P(A | ¬B) P(¬B) = P(A ∧ B) + P(A ∧ ¬B). A and B are independent iff P(A ∧ B) = P(A) · P(B), i.e. P(A | B) = P(A) and P(B | A) = P(B). A and B are conditionally independent given C iff P(A | B, C) = P(A | C), P(B | A, C) = P(B | C), and P(A ∧ B | C) = P(A | C) · P(B | C). Bayes' Rule: P(A | B) = P(B | A) P(A) / P(B), and with background evidence C: P(A | B, C) = P(B | A, C) P(A | C) / P(B | C).
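A quick numerical sanity check of the conditioning and Bayes' rule identities above; the joint values in this sketch are hypothetical, chosen only so the table sums to 1:

```python
# Toy joint distribution over two Boolean variables A and B.
# The numbers are hypothetical, chosen only so the table sums to 1.
joint = {
    (True, True): 0.2, (True, False): 0.2,
    (False, True): 0.1, (False, False): 0.5,
}

def p_a(a):                   # marginal P(A=a), summing out B
    return sum(v for (x, _), v in joint.items() if x == a)

def p_b(b):                   # marginal P(B=b), summing out A
    return sum(v for (_, y), v in joint.items() if y == b)

def p_a_given_b(a, b):        # conditional P(A=a | B=b)
    return joint[(a, b)] / p_b(b)

# Conditioning: P(A) = P(A|B) P(B) + P(A|~B) P(~B)
assert abs(p_a(True) -
           (p_a_given_b(True, True) * p_b(True) +
            p_a_given_b(True, False) * p_b(False))) < 1e-12

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
p_b_given_a = joint[(True, True)] / p_a(True)
assert abs(p_a_given_b(True, True) -
           p_b_given_a * p_a(True) / p_b(True)) < 1e-12
```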

4 Bayesian Networks. To do probabilistic reasoning, you need to know the joint probability distribution, but in a domain with N propositional variables one needs 2^N numbers to specify it. We want to exploit independences in the domain. Two components: structure and numerical parameters.

5 Bayesian networks. A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions. Syntax: a set of nodes, one per variable; a directed, acyclic graph (link ≈ "directly influences"); a conditional distribution for each node given its parents, P(Xi | Parents(Xi)). In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.

6 Bayesian (Belief) Networks. A set of random variables, each with a finite set of values, and a set of directed arcs between them forming an acyclic graph and representing causal relations. Every node A with parents B1, …, Bn has P(A | B1, …, Bn) specified.
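One way to hold the two components (structure plus numbers) in code is shown below. This is a minimal sketch, not any particular library's API; the dictionary layout is an illustrative choice, and the CPT numbers anticipate the icy-roads example that appears later in these slides:

```python
# Each node stores its parent list and a CPT mapping a tuple of parent values
# to P(node = true | those values). Boolean variables only, for brevity.
network = {
    "Icy":         {"parents": [],      "cpt": {(): 0.7}},
    "WatsonCrash": {"parents": ["Icy"], "cpt": {(True,): 0.8, (False,): 0.1}},
    "HolmesCrash": {"parents": ["Icy"], "cpt": {(True,): 0.8, (False,): 0.1}},
}

def prob(node, value, assignment):
    """P(node = value | parent values taken from assignment)."""
    parent_vals = tuple(assignment[p] for p in network[node]["parents"])
    p_true = network[node]["cpt"][parent_vals]
    return p_true if value else 1.0 - p_true

# e.g. P(WatsonCrash = true | Icy = true) = 0.8
print(prob("WatsonCrash", True, {"Icy": True}))
```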

7 Key Advantage The conditional independencies (missing arrows) mean that we can store and compute the joint probability distribution more efficiently How to design a Belief Network? Explore the causal relations

8–12 Icy Roads ("Causal" Component). Inspector Smith is waiting for Holmes and Watson, who are driving (separately) to meet him. It is winter. His secretary tells him that Watson has had an accident. He says, "It must be that the roads are icy. I bet that Holmes will have an accident too. I should go to lunch." But his secretary says, "No, the roads are not icy, look at the window." So he says, "I guess I better wait for Holmes." (Network: Icy → Watson Crash, Icy → Holmes Crash.) H and W are dependent, but conditionally independent given I.

13–18 Holmes and Watson in IA. Holmes and Watson have moved to IA. Holmes wakes up to find his lawn wet. He wonders if it has rained or if he left his sprinkler on. He looks at his neighbor Watson's lawn and sees it is wet too. So he concludes it must have rained. (Network: Rain → Holmes Lawn Wet, Rain → Watson Lawn Wet, Sprinkler → Holmes Lawn Wet.) Given W, P(R) goes up and P(S) goes down ("explaining away").

19 Inference in Bayesian Networks: Query Types. Given a Bayesian network, what questions might we want to ask? Conditional probability query: P(x | e). Maximum a posteriori probability: what value of x maximizes P(x | e)? General question: what is the whole probability distribution over variable X given evidence e, P(X | e)?

20 Using the joint distribution To answer any query involving a conjunction of variables, sum over the variables not involved in the query.
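To make the summing-out step concrete, the sketch below answers P(Cavity | toothache) from a full joint table. The slide's own table did not survive the transcript, so the numbers used here are the textbook's Toothache/Catch/Cavity values (R&N Ch. 13) and should be treated as illustrative:

```python
from itertools import product

# Full joint over (Toothache, Catch, Cavity); the values are the textbook's
# table -- illustrative, since the slide's own table is not reproduced here.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
vars_ = ("Toothache", "Catch", "Cavity")

def marginal(query):
    """P(query), where query fixes some variables; the rest are summed out."""
    total = 0.0
    for assignment in product([True, False], repeat=len(vars_)):
        world = dict(zip(vars_, assignment))
        if all(world[v] == val for v, val in query.items()):
            total += joint[assignment]
    return total

p_cavity_and_toothache = marginal({"Cavity": True, "Toothache": True})  # 0.12
p_toothache = marginal({"Toothache": True})                             # 0.20
print(p_cavity_and_toothache / p_toothache)                             # P(Cavity | toothache) = 0.6
```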

22 Chain Rule. Variables: V1, …, Vn; values: v1, …, vn. P(V1 = v1, V2 = v2, …, Vn = vn) = ∏_i P(Vi = vi | parents(Vi)). Example network: A → C, B → C, C → D, with CPTs P(A), P(B), P(C | A, B), P(D | C).

23–28 Chain Rule (example). Write P(ABCD) for P(A = true, B = true, C = true, D = true). Then
P(ABCD) = P(D | ABC) P(ABC)
= P(D | C) P(ABC)   (D is independent of A and of B given C)
= P(D | C) P(C | AB) P(AB)
= P(D | C) P(C | AB) P(A) P(B)   (A is independent of B)
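The same factorization, written out as runnable code; the CPT numbers are invented purely so the arithmetic can be executed, and are not part of the slides:

```python
# Hypothetical CPT numbers for the network A -> C <- B, C -> D (invented values).
P_A = 0.3                                            # P(A = true)
P_B = 0.6                                            # P(B = true)
P_C = {(True, True): 0.9, (True, False): 0.5,
       (False, True): 0.7, (False, False): 0.1}      # P(C = true | A, B)
P_D = {True: 0.8, False: 0.2}                        # P(D = true | C)

# P(A=t, B=t, C=t, D=t) = P(D | C) P(C | A, B) P(A) P(B)
p_abcd = P_D[True] * P_C[(True, True)] * P_A * P_B
print(p_abcd)   # 0.8 * 0.9 * 0.3 * 0.6 = 0.1296
```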

29–31 Icy Roads with Numbers (t = true, f = false).
P(I=t) = 0.7, P(I=f) = 0.3
P(W=t | I=t) = 0.8, P(W=f | I=t) = 0.2; P(W=t | I=f) = 0.1, P(W=f | I=f) = 0.9
P(H=t | I=t) = 0.8, P(H=f | I=t) = 0.2; P(H=t | I=f) = 0.1, P(H=f | I=f) = 0.9
The right-hand column in each table is redundant, since the entries in each row must add to 1. Note: the columns need NOT add to 1.

32–33 Probability that Watson Crashes. With P(I) = 0.7, P(W | I) = P(H | I) = 0.8, and P(W | ¬I) = P(H | ¬I) = 0.1:
P(W) = P(W | I) P(I) + P(W | ¬I) P(¬I) = 0.8 × 0.7 + 0.1 × 0.3 = 0.56 + 0.03 = 0.59

34–35 Probability of Icy given Watson.
P(I | W) = P(W | I) P(I) / P(W) = 0.8 × 0.7 / 0.59 ≈ 0.95
We started with P(I) = 0.7; knowing that Watson crashed raises the probability to 0.95.

36–37 Probability of Holmes given Watson.
P(H | W) = P(H, I | W) + P(H, ¬I | W)
= P(H | W, I) P(I | W) + P(H | W, ¬I) P(¬I | W)
= P(H | I) P(I | W) + P(H | ¬I) P(¬I | W)
= 0.8 × 0.95 + 0.1 × 0.05 = 0.765
We started with P(H) = 0.59; knowing that Watson crashed raises the probability to 0.765.

38 Probability of Holmes given Icy and Watson. P(H | W, I) = P(H | I) = 0.8 and P(H | W, ¬I) = P(H | ¬I) = 0.1, because H and W are conditionally independent given I.
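The three calculations above can be reproduced with a few lines of code, using the slides' numbers (P(I) = 0.7, P(W | I) = P(H | I) = 0.8, P(W | ¬I) = P(H | ¬I) = 0.1). This is a brute-force sketch that enumerates the three-variable joint rather than a general inference routine:

```python
from itertools import product

# CPT numbers from the slides: P(I) = 0.7, P(crash | I) = 0.8, P(crash | ~I) = 0.1.
P_I = 0.7
P_CRASH_GIVEN_I = {True: 0.8, False: 0.1}   # same CPT for Watson and Holmes

def joint(i, w, h):
    """P(I=i, W=w, H=h) = P(I) * P(W | I) * P(H | I)."""
    def f(p_true, val):
        return p_true if val else 1.0 - p_true
    return f(P_I, i) * f(P_CRASH_GIVEN_I[i], w) * f(P_CRASH_GIVEN_I[i], h)

def query(target, evidence):
    """P(target | evidence); both are dicts such as {"W": True}."""
    def total(constraints):
        s = 0.0
        for i, w, h in product([True, False], repeat=3):
            world = {"I": i, "W": w, "H": h}
            if all(world[k] == v for k, v in constraints.items()):
                s += joint(i, w, h)
        return s
    return total({**evidence, **target}) / total(evidence)

print(query({"W": True}, {}))                       # P(W) = 0.59
print(query({"I": True}, {"W": True}))              # P(I | W) ~ 0.949 (slide rounds to 0.95)
print(query({"H": True}, {"W": True}))              # P(H | W) ~ 0.764 (slide's 0.765 uses the rounded 0.95)
print(query({"H": True}, {"W": True, "I": False}))  # P(H | W, ~I) = P(H | ~I) = 0.1
```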

Example. The topology of the network encodes conditional independence assertions: Weather is independent of the other variables; Toothache and Catch are conditionally independent given Cavity.

Example. I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar? Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls. The network topology reflects "causal" knowledge: a burglar can set the alarm off; an earthquake can set the alarm off; the alarm can cause Mary to call; the alarm can cause John to call.

Example contd. (Figure: the burglary network with its CPTs: Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls.)

Compactness. A CPT for a Boolean variable Xi with k Boolean parents has 2^k rows, one for each combination of parent values. Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p). If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).
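The parameter count can be checked mechanically; the parent counts below are those of the burglary network described earlier (Burglary and Earthquake have no parents, Alarm has two, JohnCalls and MaryCalls have one each):

```python
def cpt_numbers(num_parents):
    # One number per combination of parent values (the "false" entry is 1 - p).
    return 2 ** num_parents

parent_counts = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
                 "JohnCalls": 1, "MaryCalls": 1}

network_numbers = sum(cpt_numbers(k) for k in parent_counts.values())   # 1+1+4+2+2 = 10
full_joint_numbers = 2 ** len(parent_counts) - 1                        # 2^5 - 1 = 31
print(network_numbers, full_joint_numbers)
```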

Semantics. The full joint distribution is defined as the product of the local conditional distributions:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
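Evaluating that product for the burglary example looks like the sketch below. The CPT values are the textbook's standard alarm-network numbers, which are assumed here because the slides' own figure is not included above:

```python
# Assumed CPTs (the textbook's standard alarm-network values; illustrative only).
P_B = 0.001                                           # P(Burglary = true)
P_E = 0.002                                           # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(Alarm = true | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls = true | A)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls = true | A)

# P(j, m, a, ~b, ~e) = P(j | a) P(m | a) P(a | ~b, ~e) P(~b) P(~e)
p = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(p)   # ~0.000628
```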

Constructing Bayesian networks.
1. Choose an ordering of variables X1, …, Xn.
2. For i = 1 to n: add Xi to the network and select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1).
This choice of parents guarantees:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)   (chain rule)
= ∏_{i=1}^{n} P(Xi | Parents(Xi))   (by construction)
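A sketch of this construction loop is shown below. It relies on a hypothetical oracle same_conditional(x, subset, predecessors) that reports whether P(x | subset) equals P(x | predecessors); in practice that knowledge would come from a domain expert or from data, so treat this as an illustration of the procedure rather than a usable algorithm:

```python
from itertools import combinations

def build_network(ordering, same_conditional):
    """Return {variable: parent list}, following the construction procedure above.

    same_conditional(x, subset, predecessors) is a hypothetical oracle that
    answers True iff P(x | subset) == P(x | predecessors) in the domain.
    """
    parents = {}
    for i, x in enumerate(ordering):
        preds = ordering[:i]
        parents[x] = list(preds)                 # fallback: all predecessors
        for size in range(len(preds) + 1):       # prefer the smallest adequate parent set
            candidates = [list(s) for s in combinations(preds, size)
                          if same_conditional(x, list(s), preds)]
            if candidates:
                parents[x] = candidates[0]
                break
    return parents
```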

Example. Suppose we choose the ordering M, J, A, B, E.
P(J | M) = P(J)? No.
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No.
P(B | A, J, M) = P(B | A)? Yes. P(B | A, J, M) = P(B)? No.
P(E | B, A, J, M) = P(E | A)? No. P(E | B, A, J, M) = P(E | A, B)? Yes.

Example contd. Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!) The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

51 Exercises. P(J, M, A, B, E) = ? P(M, A, B) = ? P(¬M, A, B) = ? P(A, B) = ? P(M, B) = ? P(A | J) = ?
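One way to check answers to these exercises is brute-force enumeration over all assignments. The sketch below again assumes the textbook's standard CPT values for the alarm network (they are not given in the slides above):

```python
from itertools import product

VARS = ("B", "E", "A", "J", "M")   # Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

def prob(var, val, w):
    """P(var = val | its parents' values in world w), using the assumed CPTs."""
    if var == "B":
        p = 0.001
    elif var == "E":
        p = 0.002
    elif var == "A":
        p = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}[(w["B"], w["E"])]
    elif var == "J":
        p = 0.90 if w["A"] else 0.05
    else:  # "M"
        p = 0.70 if w["A"] else 0.01
    return p if val else 1.0 - p

def joint(w):
    """P(world w) as the product of the local conditional probabilities."""
    result = 1.0
    for v in VARS:
        result *= prob(v, w[v], w)
    return result

def p(query, evidence=None):
    """P(query | evidence), by brute-force enumeration of all 2^5 worlds."""
    evidence = evidence or {}
    def total(constraints):
        s = 0.0
        for vals in product([True, False], repeat=len(VARS)):
            w = dict(zip(VARS, vals))
            if all(w[k] == v for k, v in constraints.items()):
                s += joint(w)
        return s
    return total({**evidence, **query}) / total(evidence)

print(p({"J": True, "M": True, "A": True, "B": True, "E": True}))  # P(J, M, A, B, E)
print(p({"M": True, "A": True, "B": True}))                        # P(M, A, B)
print(p({"A": True}, {"J": True}))                                 # P(A | J)
```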

Summary. Bayesian networks provide a natural representation for (causally induced) conditional independence. Topology + CPTs = a compact representation of the joint distribution. Generally easy for domain experts to construct.