Artificial Intelligence CS 165A, Tuesday, November 27, 2007: Probabilistic Reasoning (Ch 14)

2 Notes
HW #5 now posted
– Due in one week
– No programming problems
– Work on your own
Schedule from here on...

3 Marginalization
Given P(X, Y), derive P(X) ("marginalize away Y"):
P(X) = Σ_y P(X, Y = y)
Equivalent: Given P(X | Y) and P(Y), derive P(X):
P(X) = Σ_y P(X | Y = y) P(Y = y)

4 Marginalization (cont.)
Marginalization is a common procedure
– E.g., used to normalize Bayes' rule: P(H | D) = P(D | H) P(H) / P(D). Is the denominator P(D) known?
By marginalization, P(D) = Σ_H P(D, H) = Σ_H P(D | H) P(H) = P(D | H1) P(H1) + P(D | H2) P(H2) + … + P(D | HN) P(HN)
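A minimal sketch of this summation in Python; the prior P(H) and likelihoods P(D | H) below are hypothetical numbers chosen just to illustrate the computation:

```python
# Hypothetical prior over three hypotheses and likelihoods P(D | H_i).
p_h = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
p_d_given_h = {"h1": 0.9, "h2": 0.4, "h3": 0.1}

# Marginalization: P(D) = sum over H of P(D | H) P(H)
p_d = sum(p_d_given_h[h] * p_h[h] for h in p_h)
print(p_d)  # 0.59
```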

5 Computing probabilities
From the joint distribution P(X1, X2, …, XN) we can compute P(some variables) by summing over all the other variables. For example:
– P(X2, …, XN) = Σ_i P(X1 = x1i, X2, …, XN)
– P(X1) = Σ_i Σ_j … Σ_p P(X1, X2 = x2i, X3 = x3j, …, XN = xNp)
For binary variables, from P(X, Y):
– P(Y) = Σ_i P(X = xi, Y) = P(¬X, Y) + P(X, Y)
– P(X) = Σ_i P(X, Y = yi) = P(X, ¬Y) + P(X, Y)
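As a sketch, summing a variable out of a small joint table (hypothetical values that sum to 1):

```python
# Joint P(X, Y) over binary variables, keyed by (x, y); hypothetical values.
joint = {(True, True): 0.2, (True, False): 0.1,
         (False, True): 0.4, (False, False): 0.3}

# P(Y) = sum over x of P(X = x, Y)
p_y = {y: sum(joint[(x, y)] for x in (True, False)) for y in (True, False)}
print(p_y)  # {True: 0.6, False: 0.4}
```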

6 Independence
Absolute independence: X and Y are independent iff
– P(X, Y) = P(X) P(Y) [by definition]
– P(X | Y) = P(X), since P(X | Y) = P(X, Y)/P(Y) = P(X) P(Y)/P(Y) = P(X)
Conditional independence: if X and Y are (conditionally) independent given Z, then
– P(X | Y, Z) = P(X | Z)
– Example: P(WetGrass | Season, Rain) = P(WetGrass | Rain)

7 Conditional Independence
In practice, conditional independence is more common than absolute independence.
– P(Final exam grade | Weather) ≠ P(Final exam grade), i.e., they are not independent
– P(Final exam grade | Weather, Effort) = P(Final exam grade | Effort), so they are conditionally independent given Effort
This leads to simplified rules for updating with Bayes' rule, and then onward to Bayesian networks.
[Diagram: nodes F, E, W, with P(F | W, E) = P(F | E)]
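A sketch of checking conditional independence numerically: the joint over F, E, W below is built from hypothetical CPTs so that F is independent of W given E by construction, and the loop verifies that P(F | W, E) = P(F | E) for every assignment:

```python
# Hypothetical CPTs; the joint is constructed so that F ⟂ W | E holds.
p_w = {True: 0.3, False: 0.7}
p_e = {True: 0.6, False: 0.4}
p_f_given_e = {True: 0.9, False: 0.2}   # P(F = true | E)

def joint(f, e, w):
    pf = p_f_given_e[e] if f else 1 - p_f_given_e[e]
    return p_w[w] * p_e[e] * pf

for e in (True, False):
    for w in (True, False):
        num = joint(True, e, w)
        den = sum(joint(f, e, w) for f in (True, False))
        lhs = num / den          # P(F = true | W = w, E = e)
        rhs = p_f_given_e[e]     # P(F = true | E = e)
        assert abs(lhs - rhs) < 1e-12
```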

8 Joint Probability
Joint probability: P(X1, X2, …, XN)
– Defines the probability for any possible state of the world
– 2^N entries in the case of binary variables, defined by 2^N − 1 independent numbers
– What if the variables are not binary?
If the variables are independent, then P(X1, X2, …, XN) = P(X1) P(X2) … P(XN)
– Binary case: defined by N independent numbers
– What if the variables are not binary?

9 The Chain Rule again

10 The Chain Rule again (cont.)
Recursive definition:
P(X1, X2, …, XN) = P(X1 | X2, …, XN) P(X2 | X3, …, XN) … P(XN−1 | XN) P(XN)
or equivalently
= P(X1) P(X2 | X1) P(X3 | X2, X1) … P(XN | XN−1, …, X1)
How many values are needed to represent this (assuming binary variables)? 2^N − 1 = (2^(N−1) − 1) + 2^(N−1)
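A small sketch verifying the chain-rule factorization on an arbitrary (hypothetical) joint over three binary variables:

```python
from itertools import product

# Arbitrary joint P(X1, X2, X3); any non-negative table summing to 1 works.
vals = [0.10, 0.05, 0.20, 0.15, 0.05, 0.15, 0.10, 0.20]
joint = {bits: v for bits, v in zip(product((True, False), repeat=3), vals)}

def marg(fixed):
    """Probability of the assignment in `fixed` (position -> value), other variables summed out."""
    return sum(p for bits, p in joint.items()
               if all(bits[i] == v for i, v in fixed.items()))

# Chain rule: P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2)
for (x1, x2, x3), p in joint.items():
    p_x1 = marg({0: x1})
    p_x2_given_x1 = marg({0: x1, 1: x2}) / p_x1
    p_x3_given_x1x2 = p / marg({0: x1, 1: x2})
    assert abs(p_x1 * p_x2_given_x1 * p_x3_given_x1x2 - p) < 1e-12
```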

11 Note on number of independent values…
Random variables W, X, Y, and Z:
– W = {w1, w2, w3, w4}
– X = {x1, x2}
– Y = {y1, y2, y3}
– Z = {z1, z2, z3, z4}
How many (independent) numbers are needed to describe:
– P(W)? 3
– P(X, Y)? 2·3 − 1 = 5
– P(W, X, Y, Z)? 4·2·3·4 − 1 = 95
– P(X | Y)? (2 − 1)·3 = 3
– P(W | X, Y, Z)? (4 − 1)·(2·3·4) = 72
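A small helper, as a sketch, that reproduces these counts from the variable cardinalities (|W| = 4, |X| = 2, |Y| = 3, |Z| = 4):

```python
from math import prod

card = {"W": 4, "X": 2, "Y": 3, "Z": 4}

def n_independent(query, given=()):
    """Independent numbers in P(query | given):
    (product of query cardinalities - 1) * product of given cardinalities."""
    return (prod(card[v] for v in query) - 1) * prod(card[v] for v in given)

print(n_independent("W"))                   # 3
print(n_independent(["X", "Y"]))            # 5
print(n_independent(["W", "X", "Y", "Z"]))  # 95
print(n_independent("X", "Y"))              # 3
print(n_independent("W", ["X", "Y", "Z"]))  # 72
```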

12 Benefit of conditional independence
If some variables are conditionally independent, the joint probability can be specified with many fewer than 2^N − 1 numbers (or 3^N − 1, or 10^N − 1, or …)
For example (for binary variables W, X, Y, Z):
– P(W, X, Y, Z) = P(W) P(X | W) P(Y | W, X) P(Z | W, X, Y): 1 + 2 + 4 + 8 = 15 numbers to specify
– But if Y and W are independent given X, and Z is independent of W and X given Y, then P(W, X, Y, Z) = P(W) P(X | W) P(Y | X) P(Z | Y): 1 + 2 + 2 + 2 = 7 numbers
This is often the case in real problems, and belief networks take advantage of this.
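A quick sketch of the same count: for binary variables, each factor P(node | parents) needs 2^|parents| independent numbers, so the two factorizations cost 15 and 7 numbers respectively:

```python
# Independent numbers for P(node | parents) with binary variables: 2 ** len(parents).
full = {"W": [], "X": ["W"], "Y": ["W", "X"], "Z": ["W", "X", "Y"]}
sparse = {"W": [], "X": ["W"], "Y": ["X"], "Z": ["Y"]}

def count(factors):
    return sum(2 ** len(parents) for parents in factors.values())

print(count(full))    # 1 + 2 + 4 + 8 = 15
print(count(sparse))  # 1 + 2 + 2 + 2 = 7
```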

13 Belief Networks (a.k.a. Bayesian Networks)
a.k.a. probabilistic networks, belief nets, Bayes nets, etc.
Belief network:
– A data structure (depicted as a graph) that represents the dependence among variables and allows us to concisely specify the joint probability distribution
– The graph itself is known as an "influence diagram"
A belief network is a directed acyclic graph where:
– The nodes represent the set of random variables (one node per random variable)
– Arcs between nodes represent influence, or causality: a link from node X to node Y means that X "directly influences" Y
– Each node has a conditional probability table (CPT) that defines P(node | parents)

14 Example
Random variables X and Y:
– X: it is raining
– Y: the grass is wet
X has a causal effect on Y. Or, Y is a symptom of X.
Draw two nodes and link them: X → Y
Define the CPT for each node: P(X) and P(Y | X)
Typical use: we observe Y and we want to query P(X | Y)
– Y is an evidence variable: TELL(KB, Y)
– X is a query variable: ASK(KB, X)

15 Try it…
ASK(KB, X): What is P(X | Y)?
– Given that we know the CPTs of each node in the graph X → Y, i.e., P(X) and P(Y | X)
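A sketch of the query using Bayes' rule; the CPT values for P(X) and P(Y | X) below are hypothetical:

```python
# Hypothetical CPTs: X = "it is raining", Y = "the grass is wet".
p_x = 0.2                                # P(X = true)
p_y_given_x = {True: 0.9, False: 0.1}    # P(Y = true | X)

# Bayes' rule: P(X | Y) = P(Y | X) P(X) / P(Y),
# where P(Y) is obtained by marginalizing over X.
p_y = p_y_given_x[True] * p_x + p_y_given_x[False] * (1 - p_x)
p_x_given_y = p_y_given_x[True] * p_x / p_y
print(p_x_given_y)  # ≈ 0.692
```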

16 Belief nets represent the joint probability
The joint probability function can be calculated directly from the network:
– It's the product of the CPTs of all the nodes
– P(var1, …, varN) = Π_i P(vari | Parents(vari))
For the two-node network X → Y (CPTs P(X), P(Y | X)): P(X, Y) = P(X) P(Y | X)
For a three-node network with X → Z ← Y (CPTs P(X), P(Y), P(Z | X, Y)): P(X, Y, Z) = P(X) P(Y) P(Z | X, Y)
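As a sketch, the joint can be read off any network by multiplying each node's CPT entry for its parents' values in the assignment; the structure and numbers below are illustrative only:

```python
# Network: X and Y are parents of Z; CPTs give P(node = true | parent values).
parents = {"X": [], "Y": [], "Z": ["X", "Y"]}
cpt = {
    "X": {(): 0.3},
    "Y": {(): 0.6},
    "Z": {(True, True): 0.95, (True, False): 0.4,
          (False, True): 0.5, (False, False): 0.05},
}

def joint(assignment):
    """P(assignment) = product over nodes of P(node | parents)."""
    p = 1.0
    for node, pars in parents.items():
        key = tuple(assignment[q] for q in pars)
        p_true = cpt[node][key]
        p *= p_true if assignment[node] else 1 - p_true
    return p

print(joint({"X": True, "Y": False, "Z": True}))  # 0.3 * 0.4 * 0.4 = 0.048
```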

17 Example
I'm at work and my neighbor John calls to say my home alarm is ringing, but my neighbor Mary doesn't call. The alarm is sometimes triggered by minor earthquakes. Was there a burglar at my house?
Random (Boolean) variables: JohnCalls, MaryCalls, Earthquake, Burglary, Alarm
The belief net shows the causal links. This defines the joint probability P(JohnCalls, MaryCalls, Earthquake, Burglary, Alarm).
What do we want to know? P(B | J, ¬M)
Why not P(B | J, A, ¬M)?

18 Example Links and CPTs?

19 Example
Joint probability? P(J, ¬M, A, B, ¬E)?

20 Calculate P(J, ¬M, A, B, ¬E)
Read the joint pf from the graph: P(J, M, A, B, E) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)
Plug in the desired values: P(J, ¬M, A, B, ¬E) = P(B) P(¬E) P(A | B, ¬E) P(J | A) P(¬M | A) = 0.001 * 0.998 * 0.94 * 0.9 * 0.3 ≈ 0.00025 (using the CPT values from the network)
How about P(B | J, ¬M)? Remember, this means P(B = true | J = true, M = false)

21 Calculate P(B | J, ¬M)
By marginalization: P(B | J, ¬M) = P(B, J, ¬M) / P(J, ¬M) = Σ_A Σ_E P(B, E, A, J, ¬M) / Σ_B Σ_A Σ_E P(B, E, A, J, ¬M)
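A sketch of that computation by brute-force enumeration. The CPT numbers below are the values commonly used with this alarm example in Russell & Norvig; treat them as assumed values rather than as part of the slide:

```python
from itertools import product

# Assumed CPTs (standard Russell & Norvig values for the alarm example).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J | A)
P_M = {True: 0.70, False: 0.01}                      # P(M | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of the CPT entries."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    pa = P_A[(b, e)]
    p *= pa if a else 1 - pa
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(B | J, ¬M) = P(B, J, ¬M) / P(J, ¬M), summing out A and E (and B in the denominator).
num = sum(joint(True, e, a, True, False)
          for e, a in product((True, False), repeat=2))
den = sum(joint(b, e, a, True, False)
          for b, e, a in product((True, False), repeat=3))
print(num / den)   # ≈ 0.005 with these assumed CPT values
```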

22 Example
Conditional independence is seen here:
– P(JohnCalls | MaryCalls, Alarm, Earthquake, Burglary) = P(JohnCalls | Alarm)
– So JohnCalls is independent of MaryCalls, Earthquake, and Burglary, given Alarm
Does this mean that an earthquake or a burglary does not influence whether or not John calls?
– No, but the influence is already accounted for in the Alarm variable
– JohnCalls is conditionally independent of Earthquake, but not absolutely independent of it

23 Naive Bayes model
A common situation is when a single cause directly influences several variables, which are all conditionally independent, given the cause.
[Diagram: cause C (Rain) with children e1 (Wet grass), e2 (People with umbrellas), e3 (Car accidents)]
P(C, e1, e2, e3) = P(C) P(e1 | C) P(e2 | C) P(e3 | C)
In general, P(C, e1, …, en) = P(C) Π_i P(ei | C)

24 Naive Bayes model
Typical query for naive Bayes:
– Given some evidence, what's the probability of the cause?
– P(C | e1) = ?
– P(C | e1, e3) = ?
[Same diagram: C (Rain) with children e1 (Wet grass), e2 (People with umbrellas), e3 (Car accidents)]
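A sketch of answering such queries, assuming illustrative CPT values for P(C) and P(ei | C); variables not in the evidence simply sum out of the naive Bayes model:

```python
# Hypothetical naive Bayes CPTs: C = Rain; e1 = WetGrass, e2 = Umbrellas, e3 = CarAccidents.
p_c = 0.3
p_e_given_c = {                       # P(e_i = true | C)
    "e1": {True: 0.9, False: 0.2},
    "e2": {True: 0.8, False: 0.1},
    "e3": {True: 0.3, False: 0.1},
}

def posterior(evidence):
    """P(C = true | evidence); evidence maps e_i -> observed Boolean value."""
    def score(c):
        prior = p_c if c else 1 - p_c
        lik = 1.0
        for e, val in evidence.items():
            p = p_e_given_c[e][c]
            lik *= p if val else 1 - p
        return prior * lik
    s_true, s_false = score(True), score(False)
    return s_true / (s_true + s_false)

print(posterior({"e1": True}))              # P(C | e1)
print(posterior({"e1": True, "e3": True}))  # P(C | e1, e3)
```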

25 Drawing belief nets
What would a belief net look like if all the variables were fully dependent?
[Diagram: fully connected net over X1, X2, X3, X4, X5]
P(X1, X2, X3, X4, X5) = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X1, X2, X3) P(X5 | X1, X2, X3, X4)
But this isn't the only way to draw the belief net when all the variables are fully dependent.

26 Fully connected belief net
In fact, there are N! ways of connecting up a fully-connected belief net; that is, there are N! ways of ordering the nodes.
For N = 2: X1 → X2 or X2 → X1. P(X1, X2) = ?
For N = 5: one ordering over X1, …, X5 (and 119 others…). P(X1, X2, X3, X4, X5) = ?

27 Drawing belief nets (cont.)
The fully-connected net displays the joint distribution:
P(X1, X2, X3, X4, X5) = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X1, X2, X3) P(X5 | X1, X2, X3, X4)
But what if there are conditionally independent variables?
P(X1, X2, X3, X4, X5) = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X2, X3) P(X5 | X3, X4)

28 Drawing belief nets (cont.)
What if the variables are all independent?
P(X1, X2, X3, X4, X5) = P(X1) P(X2) P(X3) P(X4) P(X5)
[Diagram: five unconnected nodes X1, …, X5]
What if the links form a directed cycle through X1, …, X5? Not allowed: the graph is not a DAG.

29 Drawing belief nets (cont.) What if the links are drawn like this: X1X1 X2X2 X3X3 X4X4 X5X5 P(X 1, X 2, X 3, X 4, X 5 ) = P(X 1 ) P(X 2 | X 3 ) P(X 3 | X 1 ) P(X 4 | X 2 ) P(X 5 | X 4 ) It can be redrawn like this: X1X1 X3X3 X2X2 X4X4 X5X5 All arrows going left-to-right