CHAPTER 5 Probability Theory (continued) Introduction to Bayesian Networks

Joint Probability

Marginal Probability
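The slide body here is an image; for reference, the standard definition of marginalization (summing out) over a joint distribution is:
\[ P(x) = \sum_{y} P(x, y) \]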

Conditional Probability
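Again only the title survives; the standard definition of conditional probability is:
\[ P(a \mid b) = \frac{P(a, b)}{P(b)} \quad \text{whenever } P(b) > 0 \]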

The Chain Rule I

Bayes’ Rule
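The slide itself is an image; the rule it presents is the standard one:
\[ P(a \mid b) = \frac{P(b \mid a)\, P(a)}{P(b)} \]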

More Bayes’ Rule

The Chain Rule II
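The slide body is missing; presumably (an assumption, since only the title survives) this is the general product form of the chain rule:
\[ P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \dots, x_{i-1}) \]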

Independence
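The slide is an image; the standard definition: X and Y are independent iff
\[ P(x, y) = P(x)\, P(y) \quad \text{for all } x, y \]
(equivalently, \( P(x \mid y) = P(x) \)).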

Example: Independence

Example: Independence?

Conditional Independence
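Again only the title survives; the standard definition: X is conditionally independent of Y given Z iff
\[ P(x, y \mid z) = P(x \mid z)\, P(y \mid z) \quad \text{for all } x, y, z \]
(equivalently, \( P(x \mid y, z) = P(x \mid z) \)).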

The Chain Rule III

Expectations

Estimation

Estimation
Problems with maximum likelihood estimates:
- If I flip a coin once, and it’s heads, what’s the estimate for P(heads)?
- What if I flip it 50 times with 27 heads?
- What if I flip 10M times with 8M heads?
Basic idea:
- We have some prior expectation about parameters (here, the probability of heads)
- Given little evidence, we should skew toward the prior
- Given lots of evidence, we should listen to the data
How can we accomplish this? Stay tuned! (A smoothed estimator is sketched below.)
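A minimal sketch of the idea (my illustration, not from the slides), contrasting the maximum likelihood estimate with a Laplace-smoothed one that encodes a prior toward a fair coin:

def mle(heads, flips):
    # Maximum likelihood estimate of P(heads): just the empirical fraction.
    return heads / flips

def smoothed(heads, flips, pseudo=1):
    # Laplace (add-k) smoothing: behaves as if k imaginary heads and k
    # imaginary tails were observed, pulling toward 0.5 when data is scarce.
    return (heads + pseudo) / (flips + 2 * pseudo)

for h, n in [(1, 1), (27, 50), (8_000_000, 10_000_000)]:
    print(f"{h}/{n}: MLE = {mle(h, n):.3f}, smoothed = {smoothed(h, n):.3f}")

With one flip the smoothed estimate stays near 0.5; with ten million flips the prior is drowned out and the two estimates agree, which is exactly the behavior described above.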

Lewis Carroll's Pillow Problem

Bayesian Networks: Big Picture
Big problems with joint probability distributions:
- Unless there are only a few variables, the distribution is too big to represent explicitly (Why?)
- Hard to estimate anything empirically about more than a few variables at a time (Why?)
- Hard to compute answers to queries of the form P(y | a) (Why?)
Bayesian networks are a technique for describing complex joint distributions (models) using a bunch of simple, local distributions:
- A Bayes net describes how variables interact locally
- Local interactions chain together to give global, indirect interactions
- For about 10 min, we’ll be very vague about how these interactions are specified
(A brute-force joint-table query is sketched below, to make the blow-up concrete.)
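As a concrete warm-up (my illustration with made-up numbers, not from the slides): an explicit joint over three binary variables, and a conditional query answered by brute-force summation. With n variables the table needs 2^n rows, which is the blow-up Bayes nets avoid.

import itertools

# Hypothetical joint P(A, B, C); the eight numbers are arbitrary but sum to 1.
probs = [0.2, 0.1, 0.05, 0.05, 0.25, 0.15, 0.1, 0.1]
joint = dict(zip(itertools.product([False, True], repeat=3), probs))

def query(target, evidence):
    # P(X_target = True | evidence) by summing matching rows, then normalizing.
    # evidence maps a variable index to its observed value.
    match = lambda row: all(row[i] == v for i, v in evidence.items())
    num = sum(p for row, p in joint.items() if match(row) and row[target])
    den = sum(p for row, p in joint.items() if match(row))
    return num / den

print(query(0, {2: True}))  # P(A = true | C = true)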

Graphical Model Notation

Example: Coin Flips

Example: Traffic

Example: Traffic II

Example: Alarm Network

Bayesian Network Semantics
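The slide body is an image; the semantics it refers to is the standard factorization: a Bayes net defines the full joint as the product of its local conditional distributions,
\[ P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i)) \]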

Example: Alarm Network
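Only the slide title survives here; below is a minimal Python sketch of the classic burglary alarm network, with structure and CPT numbers as commonly given in Russell & Norvig (ch. 14). If the slide used different numbers, only the tables change; the point is that five small local tables define the whole joint via the factorization above.

P_B = {True: 0.001, False: 0.999}   # P(Burglary)
P_E = {True: 0.002, False: 0.998}   # P(Earthquake)
P_A = {  # P(Alarm = true | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}     # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}     # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    # P(b, e, a, j, m) as the product of the five local tables.
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# E.g. P(no burglary, no earthquake, alarm sounds, both call):
print(joint(False, False, True, True, True))  # ≈ 0.000628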

Size of a Bayes’ Net
How big is a joint distribution over N Boolean variables? 2^N entries.
How big is a Bayes’ net if each node has k parents? N · 2^k entries.
Both give you the power to calculate P(X_1, X_2, ..., X_N).
Bayesian networks = huge space savings!
Also easier to elicit local CPTs.
Also turns out to be faster to answer queries (future class).
(A quick numeric check follows below.)
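A quick numeric check of the claim (my numbers, for a hypothetical network):

N, k = 30, 3  # 30 Boolean variables, 3 parents per node
print(f"full joint: {2**N:,} entries")      # 1,073,741,824
print(f"Bayes net:  {N * 2**k:,} entries")  # 240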

Building the (Entire) Joint

Example: Traffic

Example: Reverse Traffic

Causality?
When Bayes’ nets reflect the true causal patterns:
- Often simpler (nodes have fewer parents)
- Often easier to think about
- Often easier to elicit from experts
BNs need not actually be causal:
- Sometimes no causal net exists over the domain (e.g. consider the variables Traffic and RoofDrips)
- End up with arrows that reflect correlation, not causation
What do the arrows really mean?
- Topology may happen to encode causal structure
- Topology really encodes conditional independencies

Creating Bayes’ Nets
So far, we talked about how any fixed Bayes’ net encodes a joint distribution.
Next: how to represent a fixed distribution as a Bayes’ net.
- Key ingredient: conditional independence
- The exercise we did in “causal” assembly of BNs was a kind of intuitive use of conditional independence
- Now we have to formalize the process
After that: how to answer queries (inference).

Conditional Independence