
1 Artificial Intelligence, CS 165A, Tuesday, November 27, 2007: Probabilistic Reasoning (Ch 14)

2 Notes
HW #5 now posted
– Due in one week
– No programming problems
– Work on your own
Schedule from here on...

3 Marginalization
Given P(X, Y), derive P(X) ("marginalize away Y"):
P(X) = Σ_Y P(X, Y)
Equivalent: Given P(X|Y) and P(Y), derive P(X):
P(X) = Σ_Y P(X | Y) P(Y)
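A minimal numeric sketch of both derivations (the table values below are made up for illustration):

```python
# Made-up joint table P(X, Y) over two binary variables.
P_XY = {
    (True, True): 0.3, (True, False): 0.2,
    (False, True): 0.1, (False, False): 0.4,
}

# P(X) = sum_Y P(X, Y)
P_X = {x: sum(p for (xv, _), p in P_XY.items() if xv == x) for x in (True, False)}
print(P_X)  # {True: 0.5, False: 0.5}

# Equivalent: P(X) = sum_Y P(X | Y) P(Y)
P_Y = {y: sum(p for (_, yv), p in P_XY.items() if yv == y) for y in (True, False)}
P_X_given_Y = {(x, y): P_XY[(x, y)] / P_Y[y] for (x, y) in P_XY}
P_X2 = {x: sum(P_X_given_Y[(x, y)] * P_Y[y] for y in (True, False)) for x in (True, False)}
print(P_X2)  # same marginal as P_X
```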

4 Marginalization (cont.)
Marginalization is a common procedure
– E.g., used to normalize Bayes' rule: P(D | H) and P(H) are known, and P(D) is the unknown denominator
By marginalization,
P(D) = Σ_H P(D, H) = Σ_H P(D | H) P(H)
     = P(D | H1) P(H1) + P(D | H2) P(H2) + … + P(D | HN) P(HN)

5 Computing probabilities
From the joint distribution P(X1, X2, …, XN) we can compute P(some variables) by summing over all the other variables. For example:
– P(X2, …, XN) = Σ_i P(X1 = x1i, X2, …, XN)
– P(X1) = Σ_i Σ_j … Σ_p P(X1, X2 = x2i, X3 = x3j, …, XN = xNp)
For binary variables, from P(X, Y):
– P(Y) = Σ_i P(X = xi, Y) = P(¬X, Y) + P(X, Y)
– P(X) = Σ_i P(X, Y = yi) = P(X, ¬Y) + P(X, Y)
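A sketch of the general recipe, summing a joint table over the unwanted variables (the uniform joint here is just a placeholder):

```python
from itertools import product

# A placeholder joint over three binary variables X1, X2, X3,
# stored as {(x1, x2, x3): probability}.
joint = {v: 1 / 8 for v in product([False, True], repeat=3)}

def marginal(joint, keep):
    """Sum the joint over every variable position not in `keep`."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

print(marginal(joint, keep=[0]))     # P(X1)
print(marginal(joint, keep=[1, 2]))  # P(X2, X3)
```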

6 Independence
Absolute independence: X and Y are independent iff
– P(X, Y) = P(X) P(Y) [by definition]
– P(X | Y) = P(X), since P(X | Y) = P(X, Y)/P(Y) = P(X) P(Y)/P(Y) = P(X)
Conditional independence: if X and Y are (conditionally) independent given Z, then
– P(X | Y, Z) = P(X | Z)
– Example: P(WetGrass | Season, Rain) = P(WetGrass | Rain)
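A quick numeric check of the definition, on a made-up table that does factor as P(X) P(Y):

```python
# Made-up joint constructed so that P(X=true)=0.4 and P(Y=true)=0.3.
P_XY = {
    (True, True): 0.12, (True, False): 0.28,
    (False, True): 0.18, (False, False): 0.42,
}
P_X = {x: sum(p for (xv, _), p in P_XY.items() if xv == x) for x in (True, False)}
P_Y = {y: sum(p for (_, yv), p in P_XY.items() if yv == y) for y in (True, False)}

# X and Y are independent iff every joint entry equals the product of marginals.
independent = all(
    abs(P_XY[(x, y)] - P_X[x] * P_Y[y]) < 1e-12
    for x in (True, False) for y in (True, False)
)
print(independent)  # True: this table factors as P(X) P(Y)
```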

7 Conditional Independence
In practice, conditional independence is more common than absolute independence.
– P(Final exam grade | Weather) ≠ P(Final exam grade)
  I.e., they are not independent
– P(Final exam grade | Weather, Effort) = P(Final exam grade | Effort)
  But they are conditionally independent given Effort
This leads to simplified rules for updating Bayes' Rule, and then onward to Bayesian networks.
[Figure: nodes W (Weather), E (Effort), F (Final exam grade), with P(F | W, E) = P(F | E)]
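A sketch of the same idea in code: a hypothetical joint over W, E, F built so that F depends only on E, then a check that P(F | W, E) does not change with W:

```python
from itertools import product

# Illustrative numbers only: F's distribution depends on E, not W.
P_W = {True: 0.3, False: 0.7}
P_E = {True: 0.6, False: 0.4}
P_F_given_E = {True: 0.9, False: 0.2}  # P(F=true | E)

joint = {
    (w, e, f): P_W[w] * P_E[e] * (P_F_given_E[e] if f else 1 - P_F_given_E[e])
    for w, e, f in product([True, False], repeat=3)
}

def cond(f, w, e):
    """P(F=f | W=w, E=e) computed from the joint table."""
    den = joint[(w, e, True)] + joint[(w, e, False)]
    return joint[(w, e, f)] / den

for e in (True, False):
    # Same answer for both values of W: F is independent of W given E.
    print(e, cond(True, True, e), cond(True, False, e))
```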

8 Joint Probability
Joint probability: P(X1, X2, …, XN)
– Defines the probability for any possible state of the world
– 2^N entries in the case of binary variables
  Defined by 2^N − 1 independent numbers
– What if not binary?
If the variables are independent, then P(X1, X2, …, XN) = P(X1) P(X2) … P(XN)
– Binary case: defined by N independent numbers
– What if not binary?

9 The Chain Rule again
P(X1, X2, …, XN) = P(X1 | X2, …, XN) P(X2, …, XN)
Applying this repeatedly expands the joint into a product of conditionals (next slide).

10 The Chain Rule again (cont.)
Recursive definition:
P(X1, X2, …, XN) = P(X1 | X2, …, XN) P(X2 | X3, …, XN) … P(XN-1 | XN) P(XN)
or equivalently
= P(X1) P(X2 | X1) P(X3 | X2, X1) … P(XN | XN-1, …, X1)
How many values are needed to represent this (assuming binary variables)?
The factors need 1, 2, 4, … values respectively, so in total
2^N − 1 = 1 + 2 + 4 + ⋯ + 2^(N-1)
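A numeric check of the second factorization on a small random joint (variable count and numbers are arbitrary):

```python
from itertools import product
import random

# Random joint over three binary variables, normalized to sum to 1.
random.seed(0)
raw = {v: random.random() for v in product([False, True], repeat=3)}
z = sum(raw.values())
joint = {v: p / z for v, p in raw.items()}

def marg(idx, vals):
    """Marginal probability that the variables at positions idx equal vals."""
    return sum(p for v, p in joint.items()
               if all(v[i] == x for i, x in zip(idx, vals)))

for x1, x2, x3 in product([False, True], repeat=3):
    p1 = marg([0], [x1])                               # P(x1)
    p2 = marg([0, 1], [x1, x2]) / p1                   # P(x2 | x1)
    p3 = joint[(x1, x2, x3)] / marg([0, 1], [x1, x2])  # P(x3 | x1, x2)
    assert abs(p1 * p2 * p3 - joint[(x1, x2, x3)]) < 1e-12
print("chain rule factorization verified")
```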

11 Note on number of independent values…
Random variables W, X, Y, and Z
– W = { w1, w2, w3, w4 }
– X = { x1, x2 }
– Y = { y1, y2, y3 }
– Z = { z1, z2, z3, z4 }
How many (independent) numbers are needed to describe:
– P(W): 4 − 1 = 3
– P(X, Y): 2·3 − 1 = 5
– P(W, X, Y, Z): 4·2·3·4 − 1 = 95
– P(X | Y): (2 − 1)·3 = 3
– P(W | X, Y, Z): (4 − 1)·(2·3·4) = 72
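The counting rule behind these answers, as a small sketch: a distribution over k outcomes needs k − 1 independent numbers, once per joint setting of the conditioning variables:

```python
from math import prod

def n_independent(target_sizes, given_sizes=()):
    """Independent numbers needed for P(targets | givens).

    One (prod(targets) - 1)-number distribution per joint setting
    of the givens; math.prod(()) == 1 handles the unconditional case.
    """
    return (prod(target_sizes) - 1) * prod(given_sizes)

print(n_independent([4]))             # P(W)          -> 3
print(n_independent([2, 3]))          # P(X, Y)       -> 5
print(n_independent([4, 2, 3, 4]))    # P(W, X, Y, Z) -> 95
print(n_independent([2], [3]))        # P(X | Y)      -> 3
print(n_independent([4], [2, 3, 4]))  # P(W | X,Y,Z)  -> 72
```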

12 Benefit of conditional independence
If some variables are conditionally independent, the joint probability can be specified with many fewer than 2^N − 1 numbers (or 3^N − 1, or 10^N − 1, or…)
For example (for binary variables W, X, Y, Z):
– P(W,X,Y,Z) = P(W) P(X|W) P(Y|W,X) P(Z|W,X,Y)
  1 + 2 + 4 + 8 = 15 numbers to specify
– But if Y and W are independent given X, and Z is independent of W and X given Y, then
  P(W,X,Y,Z) = P(W) P(X|W) P(Y|X) P(Z|Y)
  1 + 2 + 2 + 2 = 7 numbers
This is often the case in real problems, and belief networks take advantage of this.
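A sketch of the same count done per CPT, assuming binary variables: each node costs (2 − 1) · 2^(number of parents) numbers:

```python
def cpt_params(parents_per_node, domain=2):
    """Total independent numbers for a factorization, one CPT per node."""
    return sum((domain - 1) * domain ** len(parents)
               for parents in parents_per_node)

full   = [[], ['W'], ['W', 'X'], ['W', 'X', 'Y']]  # P(W)P(X|W)P(Y|W,X)P(Z|W,X,Y)
sparse = [[], ['W'], ['X'], ['Y']]                 # P(W)P(X|W)P(Y|X)P(Z|Y)
print(cpt_params(full))    # 15
print(cpt_params(sparse))  # 7
```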

13 Belief Networks (a.k.a. Bayesian Networks)
a.k.a. Probabilistic networks, belief nets, Bayes nets, etc.
Belief network
– A data structure (depicted as a graph) that represents the dependence among variables and allows us to concisely specify the joint probability distribution
– The graph itself is known as an "influence diagram"
A belief network is a directed acyclic graph where:
– The nodes represent the set of random variables (one node per random variable)
– Arcs between nodes represent influence, or causality
  A link from node X to node Y means that X "directly influences" Y
– Each node has a conditional probability table (CPT) that defines P(node | parents)
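One way such a structure might look in code; this is a sketch, not a prescribed representation, and the rain/wet-grass numbers are illustrative:

```python
# Per node: a parent list and a CPT mapping each parent assignment
# to P(node = true | parents).
network = {
    "Rain":     {"parents": [],       "cpt": {(): 0.2}},
    "WetGrass": {"parents": ["Rain"], "cpt": {(True,): 0.9, (False,): 0.1}},
}

def p_node(net, name, value, assignment):
    """P(name = value | parents) read from the node's CPT."""
    parents = tuple(assignment[p] for p in net[name]["parents"])
    p_true = net[name]["cpt"][parents]
    return p_true if value else 1.0 - p_true

print(p_node(network, "WetGrass", True, {"Rain": True}))  # 0.9
```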

14 Example
Random variables X and Y
– X: It is raining
– Y: The grass is wet
X has a causal effect on Y. Or, Y is a symptom of X.
Draw two nodes and link them: X → Y
Define the CPT for each node
– P(X) and P(Y | X)
Typical use: we observe Y and we want to query P(X | Y)
– Y is an evidence variable: TELL(KB, Y)
– X is a query variable: ASK(KB, X)

15 Try it…
ASK(KB, X): What is P(X | Y)?
– Given that we know the CPTs of each node in the graph (X → Y, with P(X) and P(Y|X))
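A worked sketch of this query using Bayes' rule plus marginalization (the CPT numbers are made up):

```python
# CPTs for the two-node net X -> Y, illustrative values only.
P_X = 0.2                              # P(X = true)
P_Y_given_X = {True: 0.9, False: 0.1}  # P(Y = true | X)

# Marginalize: P(Y) = sum_X P(Y | X) P(X)
P_Y = P_Y_given_X[True] * P_X + P_Y_given_X[False] * (1 - P_X)

# Bayes' rule: P(X | Y) = P(Y | X) P(X) / P(Y)
P_X_given_Y = P_Y_given_X[True] * P_X / P_Y
print(P_X_given_Y)  # 0.18 / 0.26 ≈ 0.692
```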

16 Belief nets represent the joint probability
The joint probability function can be calculated directly from the network
– It's the product of the CPTs of all the nodes
– P(var1, …, varN) = Π_i P(var_i | Parents(var_i))
Examples:
– X → Y: P(X, Y) = P(X) P(Y|X)
– X → Z ← Y: P(X, Y, Z) = P(X) P(Y) P(Z|X, Y)
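A sketch of the product rule over the three-node example, using the same dictionary encoding as above (all numbers illustrative):

```python
from itertools import product

network = {
    "X": {"parents": [],         "cpt": {(): 0.2}},
    "Y": {"parents": [],         "cpt": {(): 0.5}},
    "Z": {"parents": ["X", "Y"], "cpt": {(True, True): 0.95, (True, False): 0.6,
                                         (False, True): 0.4, (False, False): 0.05}},
}

def joint_prob(net, assignment):
    """P(assignment) = product over nodes of P(var_i | parents(var_i))."""
    total = 1.0
    for name, node in net.items():
        parents = tuple(assignment[p] for p in node["parents"])
        p_true = node["cpt"][parents]
        total *= p_true if assignment[name] else 1.0 - p_true
    return total

# Sanity check: the joint sums to (approximately) 1 over all assignments.
print(sum(joint_prob(network, dict(zip("XYZ", v)))
          for v in product([True, False], repeat=3)))
```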

17 Example
I'm at work and my neighbor John calls to say my home alarm is ringing, but my neighbor Mary doesn't call. The alarm is sometimes triggered by minor earthquakes. Was there a burglar at my house?
Random (boolean) variables:
– JohnCalls, MaryCalls, Earthquake, Burglar, Alarm
The belief net shows the causal links
This defines the joint probability
– P(JohnCalls, MaryCalls, Earthquake, Burglar, Alarm)
What do we want to know? P(B | J, ¬M)
Why not P(B | J, A, ¬M)? (We observe only the calls, not the alarm itself.)

18 Example
Links and CPTs?
[Figure: the burglary network, B → A ← E with A → J and A → M, and each node's CPT]

19 Example
Joint probability? P(J, ¬M, A, B, ¬E)?

20 Calculate P(J, ¬M, A, B, ¬E)
Read the joint probability function from the graph:
P(J, M, A, B, E) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
Plug in the desired values:
P(J, ¬M, A, B, ¬E) = P(B) P(¬E) P(A|B,¬E) P(J|A) P(¬M|A)
= 0.001 × 0.998 × 0.94 × 0.9 × 0.3 = 0.0002532924
How about P(B | J, ¬M)? Remember, this means P(B=true | J=true, M=false).

21 Calculate P(B | J, ¬M)
By marginalization, summing the joint over the unobserved variables A and E:
P(B | J, ¬M) = α Σ_A Σ_E P(B) P(E) P(A|B,E) P(J|A) P(¬M|A)
where α = 1 / P(J, ¬M) normalizes the result over B and ¬B.
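A sketch of the full calculation. The slide itself supplies only P(B) = 0.001, P(¬E) = 0.998, P(A | B, ¬E) = 0.94, P(J | A) = 0.9, and P(¬M | A) = 0.3; the remaining CPT entries below are the standard values used with this textbook example and are assumptions here:

```python
from itertools import product

P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(A=true | B, E);
       (False, True): 0.29, (False, False): 0.001}  # off-slide entries assumed
P_J = {True: 0.90, False: 0.05}                   # P(J=true | A); 0.05 assumed
P_M = {True: 0.70, False: 0.01}                   # P(M=true | A); 0.01 assumed

def bern(p, v):
    """P(X = v) given the probability p that X is true."""
    return p if v else 1.0 - p

def joint(b, e, a, j, m):
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

print(joint(True, False, True, True, False))  # 0.0002532924, as on the slide

# P(B | J, ¬M) by enumeration: sum out A and E, then normalize over B.
unnorm = {b: sum(joint(b, e, a, True, False)
                 for e, a in product([True, False], repeat=2))
          for b in (True, False)}
print(unnorm[True] / (unnorm[True] + unnorm[False]))
```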

22 Example
Conditional independence is seen here
– P(JohnCalls | MaryCalls, Alarm, Earthquake, Burglary) = P(JohnCalls | Alarm)
– So JohnCalls is independent of MaryCalls, Earthquake, and Burglary, given Alarm
Does this mean that an earthquake or a burglary does not influence whether or not John calls?
– No, but the influence is already accounted for in the Alarm variable
– JohnCalls is conditionally independent of Earthquake, but not absolutely independent of it

23 Naive Bayes model
A common situation is when a single cause directly influences several variables, which are all conditionally independent, given the cause.
[Figure: cause C with children e1, e2, e3; e.g., Rain causing Wet grass, People with umbrellas, and Car accidents]
P(C, e1, e2, e3) = P(C) P(e1 | C) P(e2 | C) P(e3 | C)
In general, P(C, e1, …, eN) = P(C) Π_i P(ei | C)

24 Naive Bayes model
Typical query for naive Bayes:
– Given some evidence, what's the probability of the cause?
– P(C | e1) = ?
– P(C | e1, e3) = ?
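A sketch of such a query with invented numbers for the rain example; the evidence names are hypothetical labels for e1, e2, e3:

```python
P_C = 0.3                       # P(Rain), illustrative
P_e_given_C = {                 # P(e_i = true | C), illustrative
    "wet_grass": {True: 0.9, False: 0.2},
    "umbrellas": {True: 0.8, False: 0.1},
    "accidents": {True: 0.3, False: 0.1},
}

def posterior(evidence):
    """P(C = true | evidence), where evidence = {name: bool} for observed e_i.

    Uses P(C | e's) proportional to P(C) * product of P(e_i | C), then normalizes.
    """
    scores = {}
    for c in (True, False):
        s = P_C if c else 1 - P_C
        for name, val in evidence.items():
            p = P_e_given_C[name][c]
            s *= p if val else 1 - p
        scores[c] = s
    return scores[True] / (scores[True] + scores[False])

print(posterior({"wet_grass": True}))                     # P(C | e1)
print(posterior({"wet_grass": True, "accidents": True}))  # P(C | e1, e3)
```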

25 Drawing belief nets
What would a belief net look like if all the variables were fully dependent?
[Figure: a fully connected DAG in which each Xi has links to every later node]
P(X1,X2,X3,X4,X5) = P(X1) P(X2|X1) P(X3|X1,X2) P(X4|X1,X2,X3) P(X5|X1,X2,X3,X4)
But this isn't the only way to draw the belief net when all the variables are fully dependent.

26 Fully connected belief net
In fact, there are N! ways of connecting up a fully-connected belief net
– That is, there are N! ways of ordering the nodes
For N=2: two nets, X1 → X2 and X2 → X1. P(X1, X2) = ?
For N=5: one fully connected ordering of X1 … X5 shown, and 119 others…

27 Drawing belief nets (cont.)
A fully-connected net displays the joint distribution:
P(X1, X2, X3, X4, X5) = P(X1) P(X2|X1) P(X3|X1,X2) P(X4|X1,X2,X3) P(X5|X1,X2,X3,X4)
But what if there are conditionally independent variables?
P(X1, X2, X3, X4, X5) = P(X1) P(X2|X1) P(X3|X1,X2) P(X4|X2,X3) P(X5|X3,X4)

28 Drawing belief nets (cont.)
What if the variables are all independent?
P(X1, X2, X3, X4, X5) = P(X1) P(X2) P(X3) P(X4) P(X5)
– The net is five nodes with no links at all
What if the links are drawn so that they form a directed cycle among X1 … X5?
– Not allowed: not a DAG

29 Drawing belief nets (cont.)
What if the links are drawn like this: X1 → X3, X3 → X2, X2 → X4, X4 → X5?
P(X1, X2, X3, X4, X5) = P(X1) P(X2 | X3) P(X3 | X1) P(X4 | X2) P(X5 | X4)
It can be redrawn as the chain X1 → X3 → X2 → X4 → X5, with all arrows going left-to-right.

