CHAPTER 5 Probability Theory (continued) Introduction to Bayesian Networks.


1 CHAPTER 5 Probability Theory (continued) Introduction to Bayesian Networks

2 Joint Probability

3 Marginal Probability

4 Conditional Probability

5 The Chain Rule I

6 Bayes’ Rule

7 More Bayes’ Rule

8 The Chain Rule II

9 Independence

10 Example: Independence

11 Example: Independence?

12 Conditional Independence

13

14 The Chain Rule III

15 Expectations

16 Expectations

17 Estimation

18 Estimation
Problems with maximum likelihood estimates:
- If I flip a coin once and it's heads, what's the estimate for P(heads)?
- What if I flip it 50 times with 27 heads?
- What if I flip it 10M times with 8M heads?
Basic idea:
- We have some prior expectation about parameters (here, the probability of heads).
- Given little evidence, we should skew toward the prior.
- Given lots of evidence, we should listen to the data.
How can we accomplish this? Stay tuned!
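To make the estimates above concrete, here is a minimal Python sketch (ours, not from the slides) comparing the maximum likelihood estimate with a pseudocount-smoothed one; the smoothed() function and its prior_heads/prior_strength parameters are illustrative names, and pseudocount smoothing is just one simple way to blend a prior with data:

    # Maximum likelihood vs. a prior-smoothed estimate of P(heads).
    def mle(heads, flips):
        """Maximum likelihood: the raw empirical fraction."""
        return heads / flips

    def smoothed(heads, flips, prior_heads=0.5, prior_strength=2):
        """Blend a prior with the data via pseudocounts: act as if we had
        already seen prior_strength imaginary flips, a fraction prior_heads
        of which came up heads."""
        return (heads + prior_heads * prior_strength) / (flips + prior_strength)

    for heads, flips in [(1, 1), (27, 50), (8_000_000, 10_000_000)]:
        print(f"{heads}/{flips}: MLE = {mle(heads, flips):.3f}, "
              f"smoothed = {smoothed(heads, flips):.3f}")

With one flip, the smoothed estimate stays near the prior (0.667 instead of 1.0); with 10M flips, the prior washes out and the two estimates agree, which is exactly the behavior the slide asks for.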

19 Lewis Carroll's Pillow Problem

20

21 Bayesian Networks: Big Picture
Two big problems with joint probability distributions:
- Unless there are only a few variables, the distribution is too big to represent explicitly (Why?)
- It's hard to estimate anything empirically about more than a few variables at a time (Why?)
It's also hard to compute answers to queries of the form P(y | a) (Why?)
Bayesian networks are a technique for describing complex joint distributions (models) using a collection of simple, local distributions:
- A Bayes net describes how variables interact locally.
- Local interactions chain together to give global, indirect interactions.
- For about 10 minutes, we'll be very vague about how these interactions are specified.
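To see why explicit joint tables hurt, here is a small Python sketch (ours, with made-up probabilities) that answers a query of the form P(y | a) by brute-force summation over a joint table; with N variables the table has 2^N rows, so this approach dies quickly as N grows:

    # A made-up joint distribution over three Boolean variables
    # (rain, late, traffic), stored as one row per assignment.
    joint = {
        (True,  True,  True):  0.12, (True,  True,  False): 0.02,
        (True,  False, True):  0.10, (True,  False, False): 0.06,
        (False, True,  True):  0.05, (False, True,  False): 0.10,
        (False, False, True):  0.08, (False, False, False): 0.47,
    }
    assert abs(sum(joint.values()) - 1.0) < 1e-9  # a proper distribution

    RAIN, LATE, TRAFFIC = 0, 1, 2  # index of each variable in the tuples

    def conditional(query, evidence):
        """P(query | evidence): sum the matching rows, then normalize."""
        def matches(assignment, constraints):
            return all(assignment[i] == v for i, v in constraints.items())
        numer = sum(p for a, p in joint.items() if matches(a, {**evidence, **query}))
        denom = sum(p for a, p in joint.items() if matches(a, evidence))
        return numer / denom

    print(conditional({RAIN: True}, {TRAFFIC: True}))  # ≈ 0.629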

22 Graphical Model Notation

23 Example: Coin Flips

24 Example: Traffic

25 Example: Traffic II

26 Example: Alarm Network

27 Bayesian Network Semantics

28 Example: Alarm Network

29 Size of a Bayes' Net
How big is a joint distribution over N Boolean variables? 2^N entries.
How big is a Bayes net if each node has at most k parents? About N · 2^k entries.
Both give you the power to calculate P(X1, X2, …, XN).
Bayesian networks = huge space savings!
Also easier to elicit local CPTs.
Also turns out to be faster to answer queries (future class).
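The two counts above are easy to sanity-check; a quick sketch (ours) tabulating the sizes for Boolean variables:

    # Entries in a full joint over N Boolean variables vs. a Bayes net
    # whose N nodes each have at most k parents (one CPT of ~2**k rows per node).
    def joint_size(n):
        return 2 ** n

    def bayes_net_size(n, k):
        return n * 2 ** k

    for n in (10, 20, 30):
        print(f"N={n:2d}: joint = {joint_size(n):>13,}   "
              f"Bayes net (k=3) = {bayes_net_size(n, 3):,}")
    # N=30: 1,073,741,824 joint entries vs. 240 local CPT entries.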

30 Building the (Entire) Joint

31 Example: Traffic

32 Example: Reverse Traffic

33 Causality?
When Bayes' nets reflect the true causal patterns:
- Often simpler (nodes have fewer parents)
- Often easier to think about
- Often easier to elicit from experts
BNs need not actually be causal:
- Sometimes no causal net exists over the domain (e.g., consider the variables Traffic and RoofDrips)
- We end up with arrows that reflect correlation, not causation
What do the arrows really mean?
- Topology may happen to encode causal structure
- Topology really encodes conditional independencies
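A small numeric check (ours, with made-up CPTs) of the last bullet: in the chain X → Y → Z, the topology itself asserts that X is independent of Z given Y, no matter what numbers fill the tables:

    # Chain network X -> Y -> Z with made-up conditional probability tables.
    # The factorization P(x, y, z) = P(x) P(y|x) P(z|y) is exactly what the
    # topology encodes; from it, P(z | x, y) must equal P(z | y) for every x.
    p_x = {True: 0.3, False: 0.7}
    p_y_given_x = {True: {True: 0.8, False: 0.2},
                   False: {True: 0.1, False: 0.9}}
    p_z_given_y = {True: {True: 0.6, False: 0.4},
                   False: {True: 0.25, False: 0.75}}

    def joint(x, y, z):
        return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

    for x in (True, False):
        for y in (True, False):
            # P(z=True | x, y) computed from the joint...
            pz_xy = joint(x, y, True) / (joint(x, y, True) + joint(x, y, False))
            # ...equals P(z=True | y), independent of x.
            assert abs(pz_xy - p_z_given_y[y][True]) < 1e-12
    print("Conditional independence X ⊥ Z | Y holds in the chain network.")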

34 Creating Bayes' Nets
So far, we've talked about how any fixed Bayes' net encodes a joint distribution.
Next: how to represent a fixed distribution as a Bayes' net.
Key ingredient: conditional independence. The exercise we did in "causal" assembly of BNs was a kind of intuitive use of conditional independence; now we have to formalize the process.
After that: how to answer queries (inference).

35 Conditional Independence

