1 Bayesian Networks CS182/CogSci110/Ling109 Spring 2007 Leon Barrett lbarrett@eecs.berkeley.edu a.k.a. belief nets, Bayes nets

2 Bayes Nets
Representation of probabilistic information
–reasoning with uncertainty
Example tasks
–Diagnose a disease from symptoms
–Predict real-world information from noisy sensors
–Process speech
–Parse natural language

3 This lecture
Basic probability
–distributions
–conditional distributions
–Bayes' rule
Bayes nets
–representation
–independence
–algorithms
–specific types of nets
 Markov chains, HMMs

4 Probability
Random Variables
–Boolean/Discrete
 True/false
 Cloudy/rainy/sunny
 e.g. die roll, coin flip
–Continuous
 [0,1] (i.e. 0.0 <= x <= 1.0)
 e.g. thrown dart position, amount of rainfall

5 Unconditional Probability
Probability Distribution
–In absence of any other info
–Sums to 1
–For a discrete variable, it's a table
–E.g. P(Sunny) = .65 (thus, P(¬Sunny) = .35)

6 Continuous Probability
Probability Density Function
–Continuous variables
–E.g. Uniform, Gaussian, Poisson…

7 Joint Probability
Probability of several variables being set at the same time
–e.g. P(Weather, Season)
Still sums to 1
For two variables it's a 2-D table: P(Weather, Season)
The full joint is a joint of all variables in the model
Can get the “marginal” of one variable
–sum over the ones we don't care about
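A minimal Python sketch of such a table and of marginalization (all the probabilities below are made up for illustration):

```python
# A joint distribution P(Weather, Season) as a 2-D table (dict keyed by pairs).
# Hypothetical numbers; they just need to sum to 1.
joint = {
    ("sunny",  "summer"): 0.30, ("sunny",  "winter"): 0.10,
    ("rainy",  "summer"): 0.10, ("rainy",  "winter"): 0.25,
    ("cloudy", "summer"): 0.10, ("cloudy", "winter"): 0.15,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # a joint still sums to 1

# Marginal of Weather: sum over the variable we don't care about (Season).
marginal_weather = {}
for (weather, season), p in joint.items():
    marginal_weather[weather] = marginal_weather.get(weather, 0.0) + p

print(marginal_weather)  # roughly {'sunny': 0.4, 'rainy': 0.35, 'cloudy': 0.25}
```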

8 Conditional Probability
P(Y | X) is the probability of Y given that all we know is the value of X
–E.g. P(cavity | toothache) = .8
 thus P(¬cavity | toothache) = .2
Product Rule
–P(X, Y) = P(Y | X) P(X)
–P(Y | X) = P(X, Y) / P(X)   (P(X) is the normalizer, so values add up to 1)

9 Conditional Probability Example
P(disease=true) = 0.001 ; P(disease=false) = 0.999
Test is 99% accurate:
–P(test=positive | disease=true) = 0.99
–P(test=positive | disease=false) = 0.01
Compute joint probabilities
–P(test=positive, disease=true) = 0.001 * 0.99 = 0.00099
–P(test=positive, disease=false) = 0.999 * 0.01 = 0.00999
–P(test=positive) = 0.00099 + 0.00999 = 0.01098

10 Bayes' Rule
Result of product rule
–P(X, Y) = P(Y | X) P(X) = P(X | Y) P(Y)
P(X | Y) = P(Y | X) P(X) / P(Y)
P(disease | test) = P(test | disease) * P(disease) / P(test)

11 Conditional Probability Example (Revisited)
P(disease=true) = 0.001 ; P(disease=false) = 0.999
Test is 99% accurate, as above
P(disease=true | test=positive) = P(disease=true, test=positive) / P(test=positive)
= 0.00099 / 0.01098 ≈ 0.090 = 9%
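Slides 9-11 as a short Python check, using only the numbers given above:

```python
# Disease-test example: prior and test accuracy from the slides.
p_disease = {True: 0.001, False: 0.999}
p_pos_given_disease = {True: 0.99, False: 0.01}   # P(test=positive | disease)

# Product rule: P(test=positive, disease=d) = P(positive | d) * P(d)
joint_positive = {d: p_pos_given_disease[d] * p_disease[d] for d in (True, False)}
p_positive = sum(joint_positive.values())   # 0.00099 + 0.00999 = 0.01098

# Bayes' rule: P(disease=true | test=positive)
posterior = joint_positive[True] / p_positive
print(posterior)   # ~0.090: only about 9%, despite the "99% accurate" test
```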

12 Important equations
P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
P(Y | X) = P(X | Y) P(Y) / P(X)
Chain Rule of Probability
P(x1, x2, x3, …, xk) = P(x1) P(x2 | x1) P(x3 | x1, x2) … P(xk | x1, x2, …, xk-1)
–e.g. P(x1, x2, x3) = P(x1, x2) P(x3 | x1, x2) = P(x1) P(x2 | x1) P(x3 | x1, x2)

13 Bayes Nets
(diagram: Disease → Test result)

14 Causal reasoning
(diagram: Disease → Test result ≠ Test result → Disease)

15 Causal reasoning
(diagram: Disease → Test result)
Not just probabilistic reasoning
Causal reasoning
–arrow direction has important meaning
Manipulating causes changes outcomes
Manipulating outcomes does not change causes

16 Bayes Nets
(diagram: Disease → Test result, with the observed node shaded)
Shaded means observed
–we know the value of the variable
–then we calculate P(net | observed)

17 Example: Markov Chain
(diagram: A → B → C → D)
Joint probability = P(A,B,C,D)
= P(A) P(B|A) P(C|A,B) P(D|A,B,C)   (by the chain rule)

18 Example: Markov Chain
(diagram: A → B → C → D)
Joint probability = P(A,B,C,D)
= P(A) P(B|A) P(C|A,B) P(D|A,B,C)   (by the chain rule)
But in the chain, each node depends only on its parent: P(D|A,B,C) = P(D|C)

19 Example: Markov Chain
(diagram: A → B → C → D)
Joint probability = P(A,B,C,D)
= P(A) P(B|A) P(C|A,B) P(D|A,B,C)   (by the chain rule)
= P(A) P(B|A) P(C|B) P(D|C)

20 Example: Markov Chain
(diagram: A → B → C → D)
Joint probability = P(A) P(B|A) P(C|B) P(D|C)
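A sketch of this factored joint in Python for a binary chain (the CPT numbers below are invented for illustration):

```python
# Markov chain A -> B -> C -> D over booleans, with made-up parameters.
p_a = {True: 0.6, False: 0.4}                     # P(A)
trans = {True:  {True: 0.7, False: 0.3},          # P(next | prev=True)
         False: {True: 0.2, False: 0.8}}          # P(next | prev=False)

def chain_joint(a, b, c, d):
    # Each factor conditions only on the previous node: P(A)P(B|A)P(C|B)P(D|C).
    return p_a[a] * trans[a][b] * trans[b][c] * trans[c][d]

print(chain_joint(True, True, False, False))      # 0.6 * 0.7 * 0.3 * 0.8 = 0.1008
```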

22 Variable Elimination
General idea:
Write the query in the form P(X, e) = Σ_{xk} … Σ_{x1} Π_i P(xi | Parents(Xi))
Iteratively:
–Move all irrelevant terms outside of the innermost sum
–Perform the innermost sum, getting a new term
–Insert the new term into the product
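For illustration, here is this elimination pattern applied to the chain A → B → C → D from the previous slides, computing the marginal P(D) with the same made-up tables as the earlier sketch:

```python
# Chain A -> B -> C -> D with hypothetical tables, as in the sketch above.
states = (True, False)
p_a = {True: 0.6, False: 0.4}
trans = {True: {True: 0.7, False: 0.3}, False: {True: 0.2, False: 0.8}}

# P(d) = sum_c P(d|c) * [ sum_b P(c|b) * [ sum_a P(a) P(b|a) ] ]:
# each innermost sum is performed once and becomes a new, smaller table,
# instead of summing the full joint over all 2^4 assignments.
f_b = {b: sum(p_a[a] * trans[a][b] for a in states) for b in states}   # eliminate A
f_c = {c: sum(f_b[b] * trans[b][c] for b in states) for c in states}   # eliminate B
p_d = {d: sum(f_c[c] * trans[c][d] for c in states) for d in states}  # eliminate C

print(p_d)   # marginal P(D); the two entries sum to 1
```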

23 Example: Alarm
Five state features:
–A: Alarm
–B: Burglary
–E: Earthquake
–J: JohnCalls
–M: MaryCalls

24 A Simple Bayes Net
(diagram: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls; arrows run from causes to effects)
Directed acyclic graph (DAG)

25 Assigning Probabilities to Roots
(diagram as above)
P(B) = 0.001
P(E) = 0.002

26 Conditional Probability Tables
(diagram as above, with P(B) = 0.001, P(E) = 0.002)

B  E  P(A|B,E)
T  T  0.95
T  F  0.94
F  T  0.29
F  F  0.001

Size of the CPT for a node with k parents: 2^(k+1)

27 Conditional Probability Tables
(diagram and tables as above, plus:)

A  P(J|A)
T  0.90
F  0.05

A  P(M|A)
T  0.70
F  0.01

28 What the BN Means
(diagram and CPTs as above)
P(x1, x2, …, xn) = Π_{i=1,…,n} P(xi | Parents(Xi))

29 Calculation of Joint Probability
(diagram and CPTs as above)
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00062
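A quick check of this product in Python, using the CPT values from slides 25-27:

```python
# Root priors and CPTs as given in the slides.
p_b, p_e = 0.001, 0.002                     # P(Burglary), P(Earthquake)
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
p_j = {True: 0.90, False: 0.05}             # P(JohnCalls | Alarm)
p_m = {True: 0.70, False: 0.01}             # P(MaryCalls | Alarm)

# P(J ^ M ^ A ^ ~B ^ ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E)
joint = p_j[True] * p_m[True] * p_a[(False, False)] * (1 - p_b) * (1 - p_e)
print(joint)   # 0.000628..., the slide's ~0.00062
```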

30 Background: Independence
Marginal independence:
–X ⊥ Y := P(X,Y) = P(X) P(Y)
–in other words, P(X|Y) = P(X) and P(Y|X) = P(Y)
Conditional independence:
–X ⊥ Y | Z := P(X, Y | Z) = P(X | Z) P(Y | Z)
–or equivalently := P(X | Y, Z) = P(X | Z)
Recall that P(x|y) = P(x,y) / P(y)
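A quick numeric check of the marginal-independence definition, on a made-up joint that factors by construction:

```python
# Toy joint built as P(X)P(Y), so X and Y are independent by construction.
vals = (True, False)
p_x = {True: 0.3, False: 0.7}
p_y = {True: 0.6, False: 0.4}
joint = {(x, y): p_x[x] * p_y[y] for x in vals for y in vals}

# Marginals recovered from the joint, then the definition tested directly.
m_x = {x: sum(joint[(x, y)] for y in vals) for x in vals}
m_y = {y: sum(joint[(x, y)] for x in vals) for y in vals}

independent = all(abs(joint[(x, y)] - m_x[x] * m_y[y]) < 1e-9
                  for x in vals for y in vals)
print(independent)   # True: P(X, Y) = P(X) P(Y) for every assignment
```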

31 What the BN Encodes
(diagram as above)
Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm
The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm
For example, John does not observe any burglaries directly

32 What the BN Encodes
(diagram and independences as on the previous slide)
For instance, the reasons why John and Mary may not call if there is an alarm are unrelated

33 Independence
Say we want to know the probability of some variable (e.g. JohnCalls) given evidence on another (e.g. Alarm). What variables are relevant to this calculation?
I.e.: given an arbitrary graph G = (V,E), is X_A ⊥ X_B | X_C for some A, B, and C?
The answer can be read directly off the graph, using a notion called d-separation

34 Independence
Three cases:

35 Independence
(1) Markov Chain (linear)
(diagram: A → B → C)
¬(X_A ⊥ X_C), but X_A ⊥ X_C | X_B

36 Independence
Three cases:
(2) Common Cause Model (diverging)
(diagram: A ← B → C)
¬(X_A ⊥ X_C), but X_A ⊥ X_C | X_B

37 Independence
Three cases:
(3) “Explaining away” (converging)
(diagram: A → B ← C)
X_A ⊥ X_C, but ¬(X_A ⊥ X_C | X_B)

38 Structure of BN
The relation P(x1, x2, …, xn) = Π_{i=1,…,n} P(xi | Parents(Xi)) means that each belief is independent of its predecessors in the BN given its parents
Said otherwise, the parents of a belief Xi are all the beliefs that “directly influence” Xi
E.g., JohnCalls is influenced by Burglary, but not directly: JohnCalls is directly influenced by Alarm

39 Locally Structured Domain
Size of CPT: 2^(k+1), where k is the number of parents
In a locally structured domain, each belief is directly influenced by relatively few other beliefs, so k is small
BNs are better suited for locally structured domains

40 Inference Patterns
(diagrams: the alarm network marked up four ways: Diagnostic, Causal, Intercausal, and Mixed)
Basic use of a BN: given new observations, compute the new strengths of some (or all) beliefs
Other use: given the strength of a belief, which observation should we gather to make the greatest change in this belief's strength?

41 What can Bayes nets be used for?
Posterior probabilities
–Probability of any event given any evidence
Most likely explanation
–Scenario that explains evidence
Rational decision making
–Maximize expected utility
–Value of Information
Effect of intervention
–Causal analysis
(diagram: Earthquake, Burglary → Alarm → Call; Earthquake → Radio; illustrates the “explaining away” effect; figure from N. Friedman)

42 Inference Ex. 2
(diagram: Cloudy → Sprinkler, Rain; Sprinkler, Rain → WetGrass)
The algorithm computes not individual probabilities but entire tables
Two ideas are crucial to avoiding exponential blowup:
–because of the structure of the BN, some subexpressions in the joint depend only on a small number of variables
–by computing them once and caching the results, we can avoid generating them exponentially many times

43 Hidden Markov Models
Observe effects of hidden state
Hidden state changes over time
We have a model of how it changes
E.g. speech recognition
(diagram: hidden chain A → B → C → D, each state emitting an observation a, b, c, d)
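A small forward-algorithm sketch for an HMM; the states, transition table, and emission table below are all invented for illustration:

```python
# Two hidden states; observations are symbols "a" and "b". Numbers are made up.
states = (0, 1)
init  = {0: 0.5, 1: 0.5}                                    # P(state_1)
trans = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}          # P(state_t | state_{t-1})
emit  = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.2, "b": 0.8}}  # P(obs | state)

def forward(obs):
    # alpha[s] = P(observations so far, current state = s);
    # at each step, sum out the previous state (the hidden past).
    alpha = {s: init[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())   # total probability of the observation sequence

print(forward(["a", "b", "b"]))  # likelihood of observing a, b, b
```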

44 Types Of Nodes On A Path
(diagram: Battery → Radio, Battery → Starts, SparkPlugs → Starts, Gas → Starts, Starts → Moves; nodes on a path are labeled linear, converging, or diverging)

45 Independence Relations In BN
(diagram as above)
Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:
1. A node on the path is linear and in E
2. A node on the path is diverging and in E
3. A node on the path is converging and neither this node, nor any descendant, is in E

46 Independence Relations In BN
(diagram and rules as above)
Gas and Radio are independent given evidence on SparkPlugs

47 Independence Relations In BN
(diagram and rules as above)
Gas and Radio are independent given evidence on Battery

48 Independence Relations In BN
(diagram and rules as above)
Gas and Radio are independent given no evidence, but they are dependent given evidence on Starts or Moves
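These three conditions can be checked mechanically. Below is a sketch (the path_blocked helper is my own construction, not from the slides) for the car network above, testing whether a single undirected path is blocked by evidence E; Gas and Radio have only one connecting path, so blocking it implies independence:

```python
# Car network from the slides: Battery -> Radio, Battery -> Starts,
# SparkPlugs -> Starts, Gas -> Starts, Starts -> Moves.
parents = {"Battery": set(), "SparkPlugs": set(), "Gas": set(),
           "Radio": {"Battery"},
           "Starts": {"Battery", "SparkPlugs", "Gas"},
           "Moves": {"Starts"}}
children = {n: {c for c, ps in parents.items() if n in ps} for n in parents}

def descendants(node):
    out, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def path_blocked(path, evidence):
    # Apply the slide's three conditions to each interior node of the path.
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if prev in parents[node] and nxt in parents[node]:
            # Converging node: blocks unless it or a descendant is observed.
            if node not in evidence and not (descendants(node) & evidence):
                return True
        elif node in evidence:
            return True   # linear or diverging node in E: blocked
    return False

path = ["Gas", "Starts", "Battery", "Radio"]
print(path_blocked(path, set()))           # True: independent with no evidence
print(path_blocked(path, {"SparkPlugs"}))  # True: still blocked (slide 46)
print(path_blocked(path, {"Moves"}))       # False: observing Moves unblocks Starts
```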

