Bayesian Networks CS182/CogSci110/Ling109, Spring 2007, Leon Barrett (a.k.a. belief nets, Bayes nets)

Bayes Nets
Representation of probabilistic information
–reasoning with uncertainty
Example tasks
–Diagnose a disease from symptoms
–Predict real-world information from noisy sensors
–Process speech
–Parse natural language

This lecture
Basic probability
–distributions
–conditional distributions
–Bayes' rule
Bayes nets
–representation
–independence
–algorithms
–specific types of nets: Markov chains, HMMs

Probability
Random Variables
–Boolean/Discrete
  –true/false; cloudy/rainy/sunny
  –e.g. die roll, coin flip
–Continuous
  –e.g. values in [0,1] (i.e. 0.0 <= x <= 1.0)
  –e.g. thrown dart position, amount of rainfall

Unconditional Probability
Probability Distribution
–In absence of any other info
–Sums to 1
–For a discrete variable, it's a table
–E.g. P(Sunny) = 0.65 (thus, P(¬Sunny) = 0.35)
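A minimal sketch of such a table in Python (the weather values and the variable name are made up for illustration):

```python
# A discrete distribution is just a table mapping values to probabilities.
P_weather = {"sunny": 0.65, "not_sunny": 0.35}

assert abs(sum(P_weather.values()) - 1.0) < 1e-9   # a distribution must sum to 1
print(P_weather["not_sunny"])                       # 0.35 = 1 - P(sunny)
```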

Continuous Probability
Probability Density Function
–Continuous variables
–E.g. uniform, Gaussian, exponential…

Joint Probability
–Probability of several variables being set at the same time, e.g. P(Weather, Season)
–Still sums to 1
–For two discrete variables, it's a 2-D table, e.g. P(Weather, Season)
–The full joint is a joint distribution over all variables in the model
–Can get the "marginal" of one variable: sum over the ones we don't care about
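A rough sketch of marginalization on a tiny invented joint table P(Weather, Season): summing out Season gives the marginal of Weather.

```python
from collections import defaultdict

# Invented joint table P(Weather, Season); the entries sum to 1.
joint = {
    ("sunny", "summer"): 0.30, ("rainy", "summer"): 0.10,
    ("sunny", "winter"): 0.15, ("rainy", "winter"): 0.45,
}

# Marginal of Weather: sum over the variable we don't care about (Season).
marginal_weather = defaultdict(float)
for (weather, season), p in joint.items():
    marginal_weather[weather] += p

print(dict(marginal_weather))   # ≈ {'sunny': 0.45, 'rainy': 0.55}
```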

Conditional Probability
P(Y | X) is the probability of Y given that all we know is the value of X
–E.g. P(cavity | toothache) = 0.8, thus P(¬cavity | toothache) = 0.2
Product Rule
–P(X, Y) = P(Y | X) P(X)
–P(Y | X) = P(X, Y) / P(X)  (P(X) is the normalizer that makes the values add up to 1)

Conditional Probability Example
P(disease=true) = 0.001; P(disease=false) = 0.999
Test is 99% accurate: P(test=positive | disease=true) = 0.99, P(test=negative | disease=false) = 0.99
Compute joint probabilities
–P(test=positive, disease=true) = 0.001 * 0.99 = 0.00099
–P(test=positive, disease=false) = 0.999 * 0.01 = 0.00999
–P(test=positive) = 0.00099 + 0.00999 = 0.01098

Bayes' Rule
Result of the product rule
–P(X, Y) = P(Y | X) P(X) = P(X | Y) P(Y)
–P(X | Y) = P(Y | X) P(X) / P(Y)
–P(disease | test) = P(test | disease) * P(disease) / P(test)

Conditional Probability Example (Revisited)
P(disease=true) = 0.001; P(disease=false) = 0.999
Test is 99% accurate
P(disease=true | test=positive) = P(disease=true, test=positive) / P(test=positive)
= 0.00099 / 0.01098 ≈ 0.09 = 9%
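A small sketch that reproduces this calculation (product rule for the joints, then Bayes' rule), assuming the 0.001 prior implied by the 9% answer:

```python
# Prior and test accuracy from the example above.
p_disease = 0.001
p_pos_given_disease = 0.99       # sensitivity
p_pos_given_no_disease = 0.01    # false-positive rate

# Joint probabilities via the product rule P(X, Y) = P(Y | X) P(X).
p_pos_and_disease = p_disease * p_pos_given_disease                # 0.00099
p_pos_and_no_disease = (1 - p_disease) * p_pos_given_no_disease    # 0.00999
p_pos = p_pos_and_disease + p_pos_and_no_disease                   # 0.01098

# Bayes' rule: P(disease=true | test=positive).
posterior = p_pos_and_disease / p_pos
print(round(posterior, 3))   # 0.09, i.e. about 9%
```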

Important equations
–P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
–P(Y | X) = P(X | Y) P(Y) / P(X)
Chain Rule of Probability
–P(x1, x2, x3, …, xk) = P(x1) P(x2 | x1) P(x3 | x1, x2) … P(xk | x1, x2, …, xk-1)
–e.g. P(x3 | x1, x2) = P(x1, x2, x3) / P(x1, x2)

Bayes Nets
[Two-node net: Disease → Test result]

Causal reasoning
[Disease → Test result is not the same net as Test result → Disease]

Causal reasoning
[Disease → Test result]
–not just probabilistic reasoning, but causal reasoning
–arrow direction has important meaning
–manipulating causes changes outcomes
–manipulating outcomes does not change causes

Bayes Nets
[Disease → Test result; the shaded node is observed]
Shaded means observed
–we know the value of the variable
–then we calculate P(net | observed)

Example: Markov Chain
[A → B → C → D]
Joint probability = P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)  (by the chain rule)

Example: Markov Chain
[A → B → C → D]
Joint probability = P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)  (by the chain rule)
In the chain, P(D|A,B,C) = P(D|C)

Example: Markov Chain
[A → B → C → D]
Joint probability = P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)  (by the chain rule)
= P(A) P(B|A) P(C|B) P(D|C)

Example: Markov Chain
[A → B → C → D]
Joint probability = P(A) P(B|A) P(C|B) P(D|C)

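A quick sketch of this factorization with made-up binary CPTs: the joint is just the product P(A) P(B|A) P(C|B) P(D|C), and it still sums to 1 over all assignments.

```python
# Made-up CPTs for a binary Markov chain A -> B -> C -> D.
P_A = {True: 0.6, False: 0.4}
P_B_given_A = {True: {True: 0.7, False: 0.3}, False: {True: 0.2, False: 0.8}}  # P_B_given_A[a][b]
P_C_given_B = {True: {True: 0.9, False: 0.1}, False: {True: 0.5, False: 0.5}}
P_D_given_C = {True: {True: 0.4, False: 0.6}, False: {True: 0.1, False: 0.9}}

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = P(a) P(b|a) P(c|b) P(d|c)."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c] * P_D_given_C[c][d]

# The factored joint still sums to 1 over all 16 assignments.
total = sum(joint(a, b, c, d)
            for a in (True, False) for b in (True, False)
            for c in (True, False) for d in (True, False))
print(round(joint(True, True, True, True), 4), round(total, 6))   # ≈ 0.1512 1.0
```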

Variable Elimination
General idea:
–Write the query in the form P(X, e) = Σ_xk … Σ_x1 Π_i P(xi | Parents(Xi))  (summing over the hidden variables)
–Iteratively:
  –Move all irrelevant terms outside of the innermost sum
  –Perform the innermost sum, getting a new term
  –Insert the new term into the product
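A rough illustration of the innermost-sum step on dict-based factors (the factor representation and helper names are my own, not from the slides), reusing the chain CPTs above to compute P(B) by summing out A:

```python
from itertools import product

# A factor is a pair (variables, table), where table maps a tuple of values to a number.

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for vals in product([True, False], repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(assign[v] for v in vars1)] *
                       t2[tuple(assign[v] for v in vars2)])
    return (out_vars, table)

def sum_out(factor, var):
    """Eliminate `var` from a factor by summing it out."""
    vars_, table = factor
    idx = vars_.index(var)
    out_vars = [v for v in vars_ if v != var]
    out = {}
    for vals, p in table.items():
        key = vals[:idx] + vals[idx + 1:]
        out[key] = out.get(key, 0.0) + p
    return (out_vars, out)

# Example: P(B) = sum_a P(a) P(B|a) for the chain A -> B -> ...
fA = (["A"], {(True,): 0.6, (False,): 0.4})
fB_given_A = (["A", "B"], {(True, True): 0.7, (True, False): 0.3,
                           (False, True): 0.2, (False, False): 0.8})
print(sum_out(multiply(fA, fB_given_A), "A"))   # (['B'], {...}) with P(B=True) = P(B=False) = 0.5
```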

Example: Alarm
Five state features
–A: Alarm
–B: Burglary
–E: Earthquake
–J: JohnCalls
–M: MaryCalls

A Simple Bayes Net
[Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls; causes above, effects below]
Directed acyclic graph (DAG)

Assigning Probabilities to Roots
[Same net; the root nodes get the prior tables P(B) = 0.001 and P(E) = 0.002]

Conditional Probability Tables
[Same net and priors, now with a CPT for Alarm:]
B E | P(A|B,E)
T T | 0.95
T F | 0.94
F T | 0.29
F F | 0.001
Size of the CPT for a node with k (Boolean) parents: 2^(k+1) entries

Conditional Probability Tables
[Same net, priors, and Alarm CPT, plus CPTs for the calls:]
A | P(J|A)
T | 0.90
F | 0.05
A | P(M|A)
T | 0.70
F | 0.01

What the BN Means
[Same net and CPTs as above]
P(x1, x2, …, xn) = Π_{i=1,…,n} P(xi | Parents(Xi))

Calculation of Joint Probability
[Same net and CPTs as above]
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
= P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.000628
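A small Python sketch of the same product. The CPT values are the standard Russell & Norvig Chapter 14 numbers, assumed here because the slide's tables did not survive extraction:

```python
# Alarm-network CPTs (standard Russell & Norvig Chapter 14 values, assumed).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                     # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                     # P(M=true | A)

# P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
p = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(round(p, 6))   # ≈ 0.000628
```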

Background: Independence
Marginal independence:
–X ⊥ Y := P(X,Y) = P(X) P(Y)
–in other words, P(X|Y) = P(X) and P(Y|X) = P(Y)
Conditional independence:
–X ⊥ Y | Z := P(X, Y | Z) = P(X | Z) P(Y | Z)
–or, equivalently, := P(X | Y, Z) = P(X | Z)
Recall that P(x|y) = P(x,y) / P(y)
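As a rough numeric illustration, the sketch below builds a joint P(X, Y, Z) that satisfies X ⊥ Y | Z by construction and verifies the definition on every assignment; all names and numbers are invented:

```python
from itertools import product

# Build a joint that satisfies X ⊥ Y | Z by construction: P(x, y, z) = P(z) P(x|z) P(y|z).
P_Z = {True: 0.3, False: 0.7}
P_X_given_Z = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
P_Y_given_Z = {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}}

joint = {(x, y, z): P_Z[z] * P_X_given_Z[z][x] * P_Y_given_Z[z][y]
         for x, y, z in product([True, False], repeat=3)}

def prob(pred):
    """Sum of joint entries whose assignment (x, y, z) satisfies the predicate."""
    return sum(p for assignment, p in joint.items() if pred(*assignment))

# Check P(X, Y | Z) = P(X | Z) P(Y | Z) for every assignment, using P(a|b) = P(a,b) / P(b).
for x, y, z in product([True, False], repeat=3):
    pz = prob(lambda X, Y, Z: Z == z)
    lhs = prob(lambda X, Y, Z: (X, Y, Z) == (x, y, z)) / pz
    rhs = (prob(lambda X, Y, Z: X == x and Z == z) / pz) * \
          (prob(lambda X, Y, Z: Y == y and Z == z) / pz)
    assert abs(lhs - rhs) < 1e-12
print("X ⊥ Y | Z holds for this joint")
```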

What the BN Encodes
–Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm (for example, John does not observe any burglaries directly)
–The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm (for instance, the reasons why John and Mary may not call if there is an alarm are unrelated)

Independence
–Say we want to know the probability of some variable (e.g. JohnCalls) given evidence on another (e.g. Alarm). What variables are relevant to this calculation?
–I.e.: given an arbitrary graph G = (V,E), is X_A ⊥ X_B | X_C for some sets of nodes A, B, and C?
–The answer can be read directly off the graph, using a notion called d-separation

Independence
Three cases:

Independence
(1) Markov chain (linear): A → B → C
–¬(X_A ⊥ X_C)
–X_A ⊥ X_C | X_B

Independence
(2) Common cause model (diverging): A ← B → C
–¬(X_A ⊥ X_C)
–X_A ⊥ X_C | X_B

Independence
(3) "Explaining away" (converging): A → B ← C
–X_A ⊥ X_C
–¬(X_A ⊥ X_C | X_B)

Structure of BN
–The relation P(x1, x2, …, xn) = Π_{i=1,…,n} P(xi | Parents(Xi)) means that each belief is independent of its predecessors in the BN given its parents
–Said otherwise, the parents of a belief Xi are all the beliefs that "directly influence" Xi
–E.g., JohnCalls is influenced by Burglary, but not directly; JohnCalls is directly influenced by Alarm

Locally Structured Domain
–Size of a CPT: 2^(k+1) entries, where k is the number of (Boolean) parents
–In a locally structured domain, each belief is directly influenced by relatively few other beliefs, and k is small
–BNs are better suited for locally structured domains

Inference Patterns
[Four copies of the alarm net illustrating diagnostic, causal, intercausal, and mixed inference]
–Basic use of a BN: given new observations, compute the new strengths of some (or all) beliefs
–Other use: given the strength of a belief, which observation should we gather to make the greatest change in this belief's strength?

What can Bayes nets be used for?
–Posterior probabilities: probability of any event given any evidence
–Most likely explanation: scenario that explains the evidence
–Rational decision making: maximize expected utility, value of information
–Effect of intervention: causal analysis
[Net: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call, illustrating the explaining-away effect. Figure from N. Friedman]

Inference Ex. 2
[Net: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass ← Rain]
–The algorithm computes not individual probabilities, but entire tables
–Two ideas are crucial to avoiding exponential blowup:
  –because of the structure of the BN, some subexpressions in the joint depend only on a small number of variables
  –by computing them once and caching the results, we can avoid generating them exponentially many times

Hidden Markov Models
–Observe effects of hidden state
–Hidden state changes over time
–We have a model of how it changes
–E.g. speech recognition
[Chain of hidden states A → B → C → D, each emitting an observation a, b, c, d]
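A minimal forward-algorithm sketch for a two-state HMM, computing the distribution over the current hidden state given the observations so far; the states, observations, and probabilities are all invented:

```python
# Tiny HMM: hidden state in {'rain', 'dry'}, observation in {'umbrella', 'no_umbrella'}.
states = ["rain", "dry"]
start = {"rain": 0.5, "dry": 0.5}
trans = {"rain": {"rain": 0.7, "dry": 0.3},              # P(next state | current state)
         "dry": {"rain": 0.3, "dry": 0.7}}
emit = {"rain": {"umbrella": 0.9, "no_umbrella": 0.1},   # P(observation | state)
        "dry": {"umbrella": 0.2, "no_umbrella": 0.8}}

def forward(observations):
    """P(hidden state at the last step | all observations), via the forward algorithm."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emit[s][obs] * sum(alpha[prev] * trans[prev][s] for prev in states)
                 for s in states}
    total = sum(alpha.values())
    return {s: alpha[s] / total for s in states}

print(forward(["umbrella", "umbrella", "no_umbrella"]))
```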

Types Of Nodes On A Path
[Car net: Battery → Radio, Battery → Starts, SparkPlugs → Starts, Gas → Starts, Starts → Moves]
Along a path, a node can be linear, diverging, or converging

Independence Relations In BN
[Same car net]
Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:
1. A node on the path is linear and in E
2. A node on the path is diverging and in E
3. A node on the path is converging and neither this node nor any of its descendants is in E

Independence Relations In BN
[Same car net and the same three conditions]
Example: Gas and Radio are independent given evidence on SparkPlugs

Independence Relations In BN
[Same car net and the same three conditions]
Example: Gas and Radio are independent given evidence on Battery

Independence Relations In BN
[Same car net and the same three conditions]
Example: Gas and Radio are independent given no evidence, but they are dependent given evidence on Starts or Moves
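A rough path-based d-separation checker for small graphs; it enumerates all undirected paths, so it is only a sketch, and the edge list encodes the car network as reconstructed above (itself an assumption about the original figure):

```python
from itertools import chain

# Directed edges of the car network (reconstructed; an assumption about the original figure).
children = {
    "Battery": ["Radio", "Starts"],
    "SparkPlugs": ["Starts"],
    "Gas": ["Starts"],
    "Starts": ["Moves"],
    "Radio": [], "Moves": [],
}
parents = {n: [p for p, cs in children.items() if n in cs] for n in children}

def descendants(node):
    out, stack = set(), list(children[node])
    while stack:
        n = stack.pop()
        if n not in out:
            out.add(n)
            stack.extend(children[n])
    return out

def undirected_paths(start, goal, path=None):
    """All simple paths between start and goal, ignoring edge direction."""
    path = [start] if path is None else path
    if start == goal:
        yield path
        return
    for nxt in chain(children[start], parents[start]):
        if nxt not in path:
            yield from undirected_paths(nxt, goal, path + [nxt])

def blocked(path, evidence):
    """A path is blocked if some interior node satisfies one of the three conditions above."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        converging = node in children[prev] and node in children[nxt]
        if converging:
            if node not in evidence and not (descendants(node) & evidence):
                return True   # condition 3: converging, neither it nor a descendant in E
        elif node in evidence:
            return True       # condition 1 (linear) or 2 (diverging): node is in E
    return False

def d_separated(x, y, evidence):
    return all(blocked(p, set(evidence)) for p in undirected_paths(x, y))

print(d_separated("Gas", "Radio", []))            # True:  independent with no evidence
print(d_separated("Gas", "Radio", ["Starts"]))    # False: dependent given Starts
print(d_separated("Gas", "Radio", ["Battery"]))   # True:  independent given Battery
```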