Download presentation

Presentation is loading. Please wait.

Published byJulianne Duty Modified over 2 years ago

1
Lirong Xia Bayesian networks (1) Thursday, Feb 21, 2014

2
Reassigned Project 1 due today midnight –please upload as a new assignment on LMS –dont update old project 1 For Q6 and Q7, remember to briefly write done what is your heuristic Project 2 due on Feb 28 Homework 1 out on Feb 28 Ask questions on piazza, come to OH 1 Reminder

3
Probability –marginal probability of events –joint probability of events –conditional probability of events –independent of events Random variables –marginal distribution –joint distribution –conditional distribution –independent variables 2 Last class

4
A sample space Ω –states of the world, or equivalently, outcomes –only need to model the states that are relevant to the problem A probability mass function p over the sample space –for all a Ω, p( a )0 –Σ a Ω p( a )=1 3 Probability space Temperature Weathe r p Ω hotsun0.4 hotrain0.1 coldsun0.2 coldrain0.3

5
An event is a subset of the sample space An atomic event is a single outcome in Ω Given –a probability space ( Ω, p) –an event A The marginal probability of A, denoted by p( A ) is Σ a A p( a ) Example –p(sun) = 0.6 –p(hot) = Events and marginal probability of an event Temperature Weathe r p Ω hotsun0.4 hotrain0.1 coldsun0.2 coldrain0.3

6
Given –a probability space –two events A and B The joint probability of A and B is p(A and B) 5 Joint probability A B A and B Sample space

7
Conditional probability p(A | B) = p(A and B) / p(B) A B A and B Sample space p(A | B)p(B) = p(A and B) = p(B | A)P(A) p(A | B) = p(B | A)p(A)/p(B) –Bayes rule

8
Given –a probability space –two events A and B A and B are independent, if and only if –p(A and B) = p(A)p(B) –equivalently, p(A|B) = p(A) Interpretation –Knowing that the state of the world is in B does not affect the probability that the state of the world is in A (vs. not in A) Different from [A and B are disjoint] 7 Independent events

9
Random variables and joint distributions 8 A random variable is a variable with a domain –Random variables: capital letters, e.g. W, D, L –values: small letters, e.g. w, d, l A joint distribution over a set of random variables: X 1, X 2, …, X n specifies a real number for each assignment (or outcome) –p(X 1 = x 1, X 2 = x 2, …, X n = x n ) –p( x 1, x 2, …, x n ) This is a special (structured) probability space –Sample space Ω: all combinations of values –probability mass function is p A probabilistic model is a joint distribution over a set of random variables –will be our focus of this course TWp hotsun0.4 hotrain0.1 coldsun0.2 coldrain0.3

10
Marginal Distributions 9 Marginal distributions are sub-tables which eliminate variables Marginalization (summing out): combine collapsed rows by adding TWp hotsun0.4 hotrain0.1 coldsun0.2 coldrain W: sun rain T: hot cold

11
Conditional Distributions 10 Conditional distributions are probability distributions over some variables given fixed values of others TWp hotsun0.4 hotrain0.1 coldsun0.2 coldrain0.3 Wp sun0.8 rain0.2 Wp sun0.4 rain0.6 Conditional DistributionsJoint Distributions

12
Independence 11 Two variables are independent in a joint distribution if for all x, y, the events X= x and Y= y are independent: –The joint distribution factors into a product of two simple ones –Usually variables arent independent! Can use independence as a modeling assumption

13
The Chain Rule 12 Write any joint distribution as an incremental product of conditional distributions Why is this always true?

14
Conditional independence Bayesian networks –definitions –independence 13 Todays schedule

15
Conditional Independence among random variables 14 p(Toothache, Cavity, Catch) If I dont have a cavity, the probability that the probe catches in it doesnt depend on whether I have a toothache: – p(+Catch|+Toothache,-Cavity) = p(+Catch|-Cavity) The same independence holds if I have a cavity: – p(+Catch|+Toothache,+Cavity) = p(+Catch|+Cavity) Catch is conditionally independent of toothache given cavity: –p(Catch|Toothache,Cavity) = p(Catch|Cavity) Equivalent statements: –p(Toothache|Catch,Cavity) = p(Toothache|Cavity) –p(Toothache,Catch|Cavity) = p(Toothache|Cavity)×p(Catch|Cavity) –One can be derived from the other easily (part of Homework 1)

16
Conditional Independence 15 Unconditional (absolute) independence very rare Conditional independence is our most basic and robust form of knowledge about uncertain environments: Definition: X and Y are conditionally independent given Z, if – x, y, z : p( x, y | z )=p( x | z ) × p( y | z ) –or equivalently, x, y, z : p( x | z, y )=p( x | z ) –X, Y, Z are random variables –written as X Y|Z What about Traffic, Umbrella, raining? What about fire, smoke, alarm? Brain teaser: in a probabilistic model with three random variables XYZ –If X and Y are independent, can we say X and Y are conditionally independent given Z –If X and Y are conditionally independent given Z, can we say X and Y are independent? –Bonus questions in Homework 1

17
The Chain Rule 16 p(X 1,…, X n )=p(X 1 ) p(X 2 |X 1 ) p(X 3 |X 1, X 2 )… Trivial decomposition: –p(Traffic, Rain, Umbrella) = p(Rain) p(Traffic|Rain) p(Umbrella|Traffic,Rain) With assumption of conditional independence: –Traffic Umbrella|Rain –p(Traffic, Rain, Umbrella) = p(Rain) p(Traffic|Rain) p(Umbrella|Rain) Bayesian networks/ graphical models help us express conditional independence assumptions

18
Bayesian networks: Big Picture 17 Using full joint distribution tables –Representation: n random variables, at least 2 n entries –Computation: hard to learn (estimate) anything empirically about more than a few variables at a time STWp summerhotsun0.30 summerhotrain0.05 summercoldsun0.10 summercoldrain0.05 winterhotsun0.10 winterhotrain0.05 wintercoldsun0.15 wintercoldrain0.20 Bayesian networks: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities) –More properly called graphical models –We describe how variables locally interact –Local interactions chain together to give global, indirect interactions

19
Example Bayesian networks: Car 18 Initial observation: car wont start Orange: broken nodes Green: testable evidence Gray: hidden variables to ensure sparse structure, reduce parameters

20
Graphical Model Notation 19 Nodes: variables (with domains) Arcs: interactions –Indicate direct influence between variables –Formally: encode conditional independence (more later) For now: imagine that arrows mean direct causation (in general, they dont!)

21
Example: Coin Flips 20 n independent coin flips (different coins) No interactions between variables: independence –Really? How about independent flips of the same coin? –How about a skillful coin flipper? –Bottom line: build an application-oriented model

22
Example: Traffic 21 Variables: –R: It rains –T: There is traffic Model 1: independence Model 2: rain causes traffic Which model is better?

23
Example: Burglar Alarm Network 22 Variables: –B: Burglary –A: Alarm goes off –M: Mary calls –J: John calls –E: Earthquake!

24
Bayesian network 23 Definition of Bayesian network (Bayes net or BN) A set of nodes, one per variable X A directed, acyclic graph A conditional distribution for each node –A collection of distributions over X, one for each combination of parents values p(X| a 1,…, a n ) –CPT: conditional probability table –Description of a noisy causal process A Bayesian network = Topology (graph) + Local Conditional Probabilities

25
Probabilities in BNs 24 Bayesian networks implicitly encode joint distributions –As a product of local conditional distributions –Example: This lets us reconstruct any entry of the full joint Not every BN can represent every joint distribution –The topology enforces certain conditional independencies

26
Example: Coin Flips 25 h0.5 t … h t h t Only distributions whose variables are absolutely independent can be represented by a Bayesian network with no arcs.

27
Example: Traffic 26 +r r r+t r-t r+t r-t 0.50

28
Example: Alarm Network 27 BEAp(A|B,E) +b+e+a0.95 +b+e-a0.05 +b-e+a0.94 +b-e-a0.06 -b+e+a0.29 -b+e-a0.71 -b-e+a b-e-a0.999 AJp(J|A) +a+j0.9 +a-j0.1 -a+j0.05 -a-j0.95 AMp(M|A) +a+m0.7 +a-m0.3 -a+m0.01 -a-m0.99 Bp(B) +b b0.999 Ep(E) +e e0.998

29
Size of a Bayesian network 28 How big is a joint distribution over N Boolean variables? –2 N How big is an N-node net if nodes have up to k parents? –O(N×2 k+1 ) Both give you the power to calculate p(X 1,…, X n ) BNs: Huge space savings! Also easier to elicit local CPTs Also turns out to be faster to answer queries

30
Bayesian networks 29 So far: how a Bayesian network encodes a joint distribution Next: how to answer queries about that distribution –Key idea: conditional independence –Main goal: answer queries about conditional independence and influence from the graph After that: how to answer numerical queries (inference)

31
Conditional Independence in a BN 30 Important question about a BN: –Are two nodes independent given certain evidence? –If yes, can prove using algebra (tedious in general) –If no, can prove with a counter example –Example: X: pressure, Y: rain, Z: traffic –Question: are X and Z necessarily independent? Answer: no. Example: low pressure causes rain, which causes traffic X can influence Z, Z can influence X (via Y)

32
Causal Chains 31 This configuration is a causal chain –Is X independent of Z given Y? –Evidence along the chain blocks the influence X: Low pressure Y: Rain Z: Traffic Yes!

33
Common Cause 32 Another basic configuration: two effects of the same cause –Are X and Z independent? –Are X and Z independent given Y? –Observing the cause blocks influence between effects. Y: Project due X: Newsgroup busy Z: Lab full Yes!

34
Common Effect 33 Last configuration: two causes of one effect (v-structure) –Are X and Z independent? Yes: the ballgame and the rain cause traffic, but they are not correlated Still need to prove they must be (try it!) –Are X and Z independent given Y? No: seeing traffic puts the rain and the ballgame in competition as explanation? –This is backwards from the other cases Observing an effect activates influence between possible causes. X: Raining Z: Ballgame Y: Traffic

35
The General Case 34 Any complex example can be analyzed using these three canonical cases General question: in a given BN, are two variables independent (given evidence)? Solution: analyze the graph

36
Reachability 35 Recipe: shade evidence nodes Attempt 1: if two nodes are connected by an undirected path not blocked by a shaded node, they are NOT conditionally independent Almost works, but not quite –Where does it break? –Answer: the v-structure at T doesnt count as a link in a path unless active

37
Reachability (D-Separation) 36 Question: are X and Y conditionally independent given evidence vars {Z}? –Yes, if X and Y separated by Z –Look for active paths from X to Y –No active paths = independence! A path is active if each triple is active: –Causal chain where B is unobserved (either direction) –Common cause where B is unobserved –Common effect where B or one of its descendents is observed All it takes to block a path is a single inactive segment

38
Example 37 Yes!

39
Example 38 Yes!

40
Example 39 Yes! Variables: –R: Raining –T: Traffic –D: Roof drips –S: I am sad Questions:

Similar presentations

© 2016 SlidePlayer.com Inc.

All rights reserved.

Ads by Google