Bayesian networks Chapter 14

Outline
–Syntax
–Semantics
–Exact inference
–Approximate inference (sampling)

Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
Syntax:
–a set of nodes, one per variable
–a directed, acyclic graph (link ≈ "directly influences")
–a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
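As a concrete sketch of the syntax, one node of the burglary network used in the following slides can be written as its parent list plus a CPT giving P(node = true) for every combination of parent values. This is a minimal Python illustration; the variable names are mine and the CPT numbers are the standard textbook values for this example.

# One Bayes-net node: its parents and its CPT. Only P(Alarm = true | ...) is
# stored; P(Alarm = false | ...) is 1 minus the stored entry.
alarm_node = {
    "parents": ("Burglary", "Earthquake"),
    "cpt": {
        (True, True): 0.95,
        (True, False): 0.94,
        (False, True): 0.29,
        (False, False): 0.001,
    },
}

def p_alarm(value, burglary, earthquake):
    """P(Alarm = value | Burglary = burglary, Earthquake = earthquake)."""
    p_true = alarm_node["cpt"][(burglary, earthquake)]
    return p_true if value else 1.0 - p_true

print(p_alarm(True, burglary=False, earthquake=False))   # 0.001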

Example
Topology of the network encodes conditional independence assertions:
–Weather is independent of the other variables
–Toothache and Catch are conditionally independent given Cavity

Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:
–A burglar can set the alarm off
–An earthquake can set the alarm off
–The alarm can cause Mary to call
–The alarm can cause John to call

Example contd.

Compactness
A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values.
Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p).
If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.
For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).

Semantics
The full joint distribution is defined as the product of the local conditional distributions:
P(X1, …, Xn) = ∏ i=1..n P(Xi | Parents(Xi))
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
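To make the product concrete, the following minimal Python sketch multiplies the five local CPT entries of the burglary network for one complete assignment. The helper names are mine; the CPT numbers are the standard textbook values for this example.

# Joint probability of one complete assignment = product of CPT entries.
P = {
    "B": 0.001,                       # P(b)
    "E": 0.002,                       # P(e)
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},   # P(a | B, E)
    "J": {True: 0.90, False: 0.05},   # P(j | A)
    "M": {True: 0.70, False: 0.01},   # P(m | A)
}

def joint(b, e, a, j, m):
    def bern(p_true, v):              # P(X = v) from P(X = true)
        return p_true if v else 1.0 - p_true
    return (bern(P["B"], b) * bern(P["E"], e) * bern(P["A"][(b, e)], a)
            * bern(P["J"][a], j) * bern(P["M"][a], m))

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
print(joint(b=False, e=False, a=True, j=True, m=True))    # ≈ 0.000628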

Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n
–add Xi to the network
–select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)
This choice of parents guarantees:
P(X1, …, Xn) = ∏ i=1..n P(Xi | X1, …, Xi−1)  (chain rule)
             = ∏ i=1..n P(Xi | Parents(Xi))  (by construction)

Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)?

Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?

Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? P(B | A, J, M) = P(B)?

Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? P(E | B, A, J, M) = P(E | A, B)?

Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes

Example contd.
Deciding conditional independence is hard in noncausal directions.
(Causal models and conditional independence seem hardwired for humans!)
The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

Inference
Complexity of inference with a Bayes net:
–a Bayes net reduces space complexity
–a Bayes net does not reduce time complexity in the general case

Conditional independence and d-separation
Two sets of nodes, X and Y, are conditionally independent given an evidence set of nodes E if every undirected path from a node in X to a node in Y is d-separated by E.
A set of nodes E d-separates two sets of nodes, X and Y, if every undirected path from a node in X to a node in Y is blocked by E.
A path is blocked given E if there is a node Z on the path for which one of the following holds:
–Z is in E and the path arrows at Z go one in, one out (a chain through Z)
–Z is in E and both path arrows at Z point away from Z (a common cause)
–both path arrows at Z point into Z (a common effect), and neither Z nor any descendant of Z is in E
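These blocking conditions can be checked mechanically for a single path. The following small Python sketch (function names are mine) assumes the network is given as a set of directed edges and the path as a list of node names, and classifies each interior node as a chain/fork or a collider.

def descendants(node, edges):
    """All nodes reachable from `node` by following directed edges."""
    out, frontier = set(), [node]
    while frontier:
        n = frontier.pop()
        for (a, b) in edges:
            if a == n and b not in out:
                out.add(b)
                frontier.append(b)
    return out

def path_blocked(path, edges, evidence):
    """True if some interior node Z blocks `path` given the evidence set."""
    for i in range(1, len(path) - 1):
        prev, z, nxt = path[i - 1], path[i], path[i + 1]
        arrows_into_z = ((prev, z) in edges, (nxt, z) in edges)
        if arrows_into_z == (True, True):      # collider: -> Z <-
            if z not in evidence and not (descendants(z, edges) & evidence):
                return True
        else:                                  # chain or fork at Z
            if z in evidence:
                return True
    return False

# Burglary network: J <- A -> M is a fork at A, so observing A blocks it;
# B -> A <- E is a collider at A, blocked when A (and its descendants) are unobserved.
edges = {("B", "A"), ("E", "A"), ("A", "J"), ("A", "M")}
print(path_blocked(["J", "A", "M"], edges, evidence={"A"}))   # True
print(path_blocked(["B", "A", "E"], edges, evidence=set()))   # True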

Conditional independence and D-separation - example

Inference by enumeration
P(B | j, m) = P(B, j, m) / P(j, m)
= α P(B, j, m)
= α Σe Σa P(B, e, a, j, m)
= α P(B) Σe P(e) Σa P(a | B, e) P(j | a) P(m | a)
= α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩
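A direct transcription of this enumeration in Python, assuming the standard textbook CPT values for the burglary network (helper names are mine):

# Enumeration: sum the full joint over the hidden variables E and A.
P_B_true = 0.001
P_E_true = 0.002
P_A_true = {(True, True): 0.95, (True, False): 0.94,
            (False, True): 0.29, (False, False): 0.001}   # P(a | B, E)
P_J_true = {True: 0.90, False: 0.05}                      # P(j | A)
P_M_true = {True: 0.70, False: 0.01}                      # P(m | A)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def unnormalized(b):
    """Sum over e, a of P(b) P(e) P(a | b, e) P(j | a) P(m | a), with j, m observed true."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            total += (bern(P_B_true, b) * bern(P_E_true, e)
                      * bern(P_A_true[(b, e)], a)
                      * P_J_true[a] * P_M_true[a])
    return total

num = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / (num[True] + num[False])
print({b: alpha * v for b, v in num.items()})   # ≈ {True: 0.284, False: 0.716}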

Enumeration algorithm

Time complexity
Enumeration is inefficient: it repeats work (e.g., it computes P(j | a) P(m | a) once for each value of e).
Dynamic programming avoids this repetition (for example, via variable elimination).
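A minimal sketch of the idea on the same query: the factor over A containing P(j | a) P(m | a) is built once and reused for every (b, e), instead of being recomputed inside the nested sums. This is a hand-rolled illustration for this specific network (standard textbook CPT values), not a general variable-elimination implementation.

P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(j | A)
P_M = {True: 0.70, False: 0.01}                      # P(m | A)
P_E = 0.002
P_B = 0.001

# Factor over A only: computed once, reused for every (b, e).
g = {a: P_J[a] * P_M[a] for a in (True, False)}

def f(b, e):
    """A eliminated: sum over a of P(a | b, e) * g(a)."""
    p_a = P_A[(b, e)]
    return p_a * g[True] + (1 - p_a) * g[False]

def posterior_burglary():
    num = {}
    for b in (True, False):
        pb = P_B if b else 1 - P_B
        num[b] = pb * sum((P_E if e else 1 - P_E) * f(b, e) for e in (True, False))
    z = num[True] + num[False]
    return {b: v / z for b, v in num.items()}

print(posterior_burglary())   # ≈ {True: 0.284, False: 0.716}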

Inference for polytrees

Time complexity
Dynamic programming:
–linear space and time complexity?
–works well for polytrees (where there is at most one undirected path between any two nodes)
–doesn't work for multiply connected networks (exponential space and time complexity in the worst case)

Inference in multiply connected networks
Clustering
Cutset conditioning
Stochastic methods
–Monte Carlo
–likelihood weighting
–Markov chain Monte Carlo
Will be skipped in this course

Clustering Merge variables, replace their CPTs by their combined CPT

Cutset conditioning
Cut cycles to obtain a polytree, then sum over all instantiations of the cutset variables.

Cutset conditioning - example
P(W | S) = ΣR,C P(W | S, R, C) P(R, C | S) = ΣR,C P(W | S, R) P(R, C | S)
P(R, C | S) = P(R, S | C) P(C) / P(S) = P(R | C) P(S | C) P(C) / P(S)
(Notice here that node C d-separates R and S)
The resulting polytrees are evaluated for each cutset instantiation and combined, weighted by P(C = +) and P(C = −).

Stochastic Inference
It's expensive to work with the full joint distribution… whether as a table or as a Bayes network.
Is approximation good enough?
Monte Carlo

Use samples to approximate the solution
–Simulated annealing used Monte Carlo theory to justify why random guesses, and sometimes going uphill, can lead to optimality
More samples = better approximation
–How many are needed?
–Where should you take the samples?

Prior sampling
Sample each variable in topological order from its CPT, given the already-sampled values of its parents; the resulting samples are drawn from the prior joint distribution defined by the network.

Approximating the true distribution
With enough samples, the sample frequencies converge to the true distribution.

Rejection sampling
Compute P(X | e):
–use PriorSample (S_PS) to create N samples
–let N_e be the number of samples consistent with the evidence e (namely E = e)
–let N_ex be the number of samples consistent with E = e AND X = x
–P(x | e) can be estimated as N_ex / N_e
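A sketch of prior sampling plus rejection for the sprinkler network used in the next example. Function names are mine, and the CPT values are the standard textbook sprinkler numbers; treat them as illustrative.

import random

P_C = 0.5
P_S = {True: 0.10, False: 0.50}                         # P(Sprinkler = true | Cloudy)
P_R = {True: 0.80, False: 0.20}                         # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}       # P(WetGrass = true | S, R)

def prior_sample(rng):
    """Sample every variable in topological order from its CPT."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return {"Cloudy": c, "Sprinkler": s, "Rain": r, "WetGrass": w}

def rejection_query(n, seed=0):
    """Estimate P(Rain = true | Sprinkler = true) by discarding inconsistent samples."""
    rng = random.Random(seed)
    kept = rain_true = 0
    for _ in range(n):
        sample = prior_sample(rng)
        if sample["Sprinkler"]:          # keep only samples consistent with the evidence
            kept += 1
            rain_true += sample["Rain"]
    return rain_true / kept if kept else float("nan")

print(rejection_query(100_000))          # ≈ 0.30 (exact answer is 0.30)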

Example
–P(Rain | Sprinkler = true)
–Use the Bayes net to generate 100 samples
Suppose 73 have Sprinkler = false
Suppose 27 have Sprinkler = true
–of those 27, 8 have Rain = true
–and 19 have Rain = false
–P(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) ≈ ⟨0.296, 0.704⟩

Problems with rejection sampling
–The standard deviation of the error in the estimated probability is proportional to 1/√n, where n is the number of samples consistent with the evidence
–As problems become complex, the number of samples consistent with the evidence becomes small, and it becomes harder to construct accurate estimates

Likelihood weighting
We only want to generate samples that are consistent with the evidence e
–we'll sample the Bayes net, but we won't let every random variable be sampled; some (the evidence variables) will be forced to produce their observed values

Example P (Rain | Sprinkler=true, WetGrass=true)

Example
P(Rain | Sprinkler=true, WetGrass=true)
–First, the weight w is set to 1.0

Example
Notice that the weight is reduced according to how likely an evidence variable's value is given its parents
–so the final probability is a function of what comes from sampling the free variables while constraining the evidence variables
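A sketch of likelihood weighting for this query, using the same illustrative sprinkler CPTs as in the rejection-sampling sketch (names are mine): evidence variables are never sampled, they only multiply the weight.

import random

P_C = 0.5
P_S = {True: 0.10, False: 0.50}                         # P(s | Cloudy)
P_R = {True: 0.80, False: 0.20}                         # P(r | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}       # P(w | Sprinkler, Rain)

def weighted_sample(rng):
    w = 1.0
    c = rng.random() < P_C                               # Cloudy: sampled
    w *= P_S[c]                                          # Sprinkler = true: evidence, weight
    r = rng.random() < P_R[c]                            # Rain: sampled
    w *= P_W[(True, r)]                                  # WetGrass = true: evidence, weight
    return r, w

def likelihood_weighting(n, seed=0):
    rng = random.Random(seed)
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        r, w = weighted_sample(rng)
        totals[r] += w
    z = totals[True] + totals[False]
    return {r: v / z for r, v in totals.items()}

print(likelihood_weighting(100_000))    # P(Rain = true | s, w) ≈ 0.32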

Comparing techniques
–In likelihood weighting, attention is paid to the evidence variables before samples are collected
–In rejection sampling, the evidence variables are considered only after sampling
–Likelihood weighting does not sample from the true posterior distribution P(z | e), because each sampled variable ignores evidence among its non-ancestors

Likelihood weighting
–uses all the samples
–as the number of evidence variables increases, it becomes harder to keep the sample weights high, and estimate quality drops

Gibbs sampling
Start with an arbitrary state of the network, with the evidence variables set to their observed values
Repeatedly sample a value for one non-evidence variable at a time, conditioned on its Markov blanket: its parents, its children, and its children's parents
Each state sampled contributes equally to the estimate of the query
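A sketch of Gibbs sampling for P(Rain | Sprinkler=true, WetGrass=true), again with the illustrative sprinkler CPTs (names are mine): Cloudy and Rain are resampled in turn from their Markov-blanket conditionals while the evidence stays fixed.

import random

P_C = 0.5
P_S = {True: 0.10, False: 0.50}                          # P(s | Cloudy)
P_R = {True: 0.80, False: 0.20}                          # P(r | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}        # P(w | Sprinkler, Rain)

def bern(p_true, v):
    return p_true if v else 1.0 - p_true

def gibbs_rain(n, seed=0):
    rng = random.Random(seed)
    c, r = True, True                                     # arbitrary initial state
    s, w = True, True                                     # evidence, held fixed
    rain_true = 0
    for _ in range(n):                                    # no burn-in, for brevity
        # Resample Cloudy given its Markov blanket {Sprinkler, Rain}:
        # P(C | s, r) is proportional to P(C) P(s | C) P(r | C).
        wt = {cv: bern(P_C, cv) * bern(P_S[cv], s) * bern(P_R[cv], r)
              for cv in (True, False)}
        c = rng.random() < wt[True] / (wt[True] + wt[False])
        # Resample Rain given its Markov blanket {Cloudy, Sprinkler, WetGrass}:
        # P(R | c, s, w) is proportional to P(R | c) P(w | s, R).
        wt = {rv: bern(P_R[c], rv) * bern(P_W[(s, rv)], w)
              for rv in (True, False)}
        r = rng.random() < wt[True] / (wt[True] + wt[False])
        rain_true += r
    return rain_true / n

print(gibbs_rain(100_000))                                # ≈ 0.32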

Some Applications of BNs
–Medical diagnosis, e.g., lymph-node diseases
–Troubleshooting of hardware/software systems
–Fraud/uncollectible debt detection
–Data mining
–Analysis of genetic sequences
–Data interpretation, computer vision, image understanding

Case study: the Pathfinder system
Diagnostic system for lymph-node diseases: 60 diseases, 100 symptoms and test results, 14,000 probabilities.
Experts were consulted to create the network:
–8 hours to determine the variables
–35 hours for the network topology
–40 hours for the probability table values
Apparently, the experts found it quite easy to provide the causal links and probabilities.
Pathfinder now outperforms the world experts in diagnosis, and is being extended to several dozen other medical domains.