Marginalization & Conditioning Marginalization (summing out): for any sets of variables Y and Z, P(Y) = Σ_z P(Y, z), where the sum ranges over all combinations of values z of the variables in Z. Conditioning (a variant of marginalization, obtained via the product rule): P(Y) = Σ_z P(Y | z) P(z).

Example of Marginalization Using the full joint distribution: P(cavity) = P(cavity, toothache, catch) + P(cavity, toothache, ¬catch) + P(cavity, ¬toothache, catch) + P(cavity, ¬toothache, ¬catch) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
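A minimal sketch of this computation in Python, assuming the standard dentist-domain joint table with the example values above (the dictionary layout and function name are illustrative, not part of the slides):

# Full joint distribution over (Cavity, Toothache, Catch).
# The numbers are the usual textbook example values (assumed here).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def marginal_cavity(joint):
    """Sum out Toothache and Catch to obtain P(cavity)."""
    return sum(p for (cavity, _, _), p in joint.items() if cavity)

print(marginal_cavity(joint))   # 0.2 (up to floating-point rounding)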

Inference By Enumeration using Full Joint Distribution Let X be a random variable whose probabilities we want to know, given some evidence (values e for a set E of other variables). Let the remaining (unobserved, so-called hidden) variables be Y. The query is P(X | e), and it can be answered using the full joint distribution by P(X | e) = α P(X, e) = α Σ_y P(X, e, y), where α is a normalization constant and the sum ranges over all combinations of values y of Y.

Example of Inference By Enumeration using Full Joint Distribution For instance, P(Cavity | toothache) = α P(Cavity, toothache) = α ⟨0.108 + 0.012, 0.016 + 0.064⟩ = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩.
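A rough sketch of this enumeration in Python, reusing the joint dictionary from the marginalization sketch above (the helper name and index convention are illustrative):

def enumerate_query(joint, query_index, evidence):
    """P(X | e) by enumeration: sum the matching joint entries, then normalize.
    joint maps (cavity, toothache, catch) tuples to probabilities;
    evidence maps variable indices to required values."""
    dist = {True: 0.0, False: 0.0}
    for assignment, p in joint.items():
        if all(assignment[i] == v for i, v in evidence.items()):
            dist[assignment[query_index]] += p
    total = sum(dist.values())                    # this is 1/alpha
    return {value: p / total for value, p in dist.items()}

# P(Cavity | toothache = true); indices: 0 = Cavity, 1 = Toothache, 2 = Catch
print(enumerate_query(joint, 0, {1: True}))       # approximately {True: 0.6, False: 0.4}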

Independence Propositions a and b are independent if and only if P(a ∧ b) = P(a) P(b). Equivalently (by the product rule): P(a | b) = P(a). Equivalently: P(b | a) = P(b).

Illustration of Independence We know (product rule) that P(Toothache, Catch, Cavity, Weather) = P(Weather | Toothache, Catch, Cavity) P(Toothache, Catch, Cavity). If the weather is independent of the dental variables, the first factor simplifies to P(Weather), so the full joint factors as P(Weather) P(Toothache, Catch, Cavity).

Illustration continued This allows us to represent the 32-element table for the full joint on Weather, Toothache, Catch, Cavity (Weather has four values, so 4 × 2 × 2 × 2 = 32) by an 8-element table for the joint of Toothache, Catch, Cavity and a 4-element table for Weather. If we add a Boolean variable X to the 8-element table, we get 16 elements; with independence, a new 2-element table suffices.

Difficulty with Bayes’ Rule with More than Two Variables Applying Bayes’ rule to a query such as P(Cause | Effect_1, …, Effect_n) requires the conditional joint P(Effect_1, …, Effect_n | Cause), whose size grows exponentially in n, so without further assumptions it is nearly as hard to specify as the full joint itself.

Conditional Independence X and Y are conditionally independent given Z if and only if P(X, Y | Z) = P(X | Z) P(Y | Z). Y_1, …, Y_n are conditionally independent given X_1, …, X_m if and only if P(Y_1, …, Y_n | X_1, …, X_m) = P(Y_1 | X_1, …, X_m) P(Y_2 | X_1, …, X_m) … P(Y_n | X_1, …, X_m). We’ve reduced a table of 2^n · 2^m entries to 2n · 2^m entries. Additional conditional independencies may reduce the 2^m factor further.
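A quick arithmetic check of that reduction, with made-up values of n and m chosen only for illustration:

n, m = 5, 3                     # hypothetical numbers of Boolean Y and X variables
full_table = 2**n * 2**m        # one table for P(Y_1,...,Y_n | X_1,...,X_m)
factored   = 2 * n * 2**m       # n separate tables P(Y_i | X_1,...,X_m)
print(full_table, factored)     # 256 vs. 80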

Conditional Independence As with absolute independence, the equivalent forms of X and Y being conditionally independent given Z can also be used: P(X|Y, Z) = P(X|Z) and P(Y|X, Z) = P(Y|Z)

Benefits of Conditional Independence Allows probabilistic systems to scale up (tabular representations of full joint distributions quickly become too large.) Conditional independence is much more commonly available than is absolute independence.

Decomposing a Full Joint by Conditional Independence Might assume Toothache and Catch are conditionally independent given Cavity: P(Toothache,Catch|Cavity) = P(Toothache|Cavity) P(Catch|Cavity). Then P(Toothache,Catch,Cavity) = [product rule] P(Toothache,Catch|Cavity) P(Cavity) = [conditional independence] P(Toothache|Cavity) P(Catch|Cavity) P(Cavity).
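As a small numerical check of this decomposition (again assuming the example joint table above), the sketch below rebuilds every entry of P(Toothache, Catch, Cavity) from P(Toothache | Cavity), P(Catch | Cavity), and P(Cavity); the helper name prob is illustrative:

from itertools import product

def prob(joint, **conds):
    """Probability of the event specified by the given (cavity, toothache, catch) conditions."""
    names = ("cavity", "toothache", "catch")
    return sum(p for a, p in joint.items()
               if all(a[names.index(k)] == v for k, v in conds.items()))

for cav, tooth, catch in product([True, False], repeat=3):
    decomposed = (prob(joint, toothache=tooth, cavity=cav) / prob(joint, cavity=cav)   # P(Toothache | Cavity)
                  * prob(joint, catch=catch, cavity=cav) / prob(joint, cavity=cav)     # P(Catch | Cavity)
                  * prob(joint, cavity=cav))                                           # P(Cavity)
    assert abs(decomposed - joint[(cav, tooth, catch)]) < 1e-9    # matches the full joint entry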

Naive Bayes Algorithm Let F_i be the i-th feature, taking values v_j, and let Out be the target feature. We can use training data to estimate: P(F_i = v_j), P(F_i = v_j | Out = True), P(F_i = v_j | Out = False), P(Out = True), and P(Out = False).

Naive Bayes Algorithm For a test example described by F_1 = v_1, ..., F_n = v_n, we need to compute P(Out = True | F_1 = v_1, ..., F_n = v_n). Applying Bayes’ rule: P(Out = True | F_1 = v_1, ..., F_n = v_n) = P(F_1 = v_1, ..., F_n = v_n | Out = True) P(Out = True) / P(F_1 = v_1, ..., F_n = v_n).

Naive Bayes Algorithm By the (naive) independence assumption: P(F_1 = v_1, ..., F_n = v_n) = P(F_1 = v_1) × ... × P(F_n = v_n). The corresponding conditional independence assumption gives: P(F_1 = v_1, ..., F_n = v_n | Out = True) = P(F_1 = v_1 | Out = True) × ... × P(F_n = v_n | Out = True).

Naive Bayes Algorithm Putting these together: P(Out = True | F_1 = v_1, ..., F_n = v_n) = P(F_1 = v_1 | Out = True) × ... × P(F_n = v_n | Out = True) × P(Out = True) / [P(F_1 = v_1) × ... × P(F_n = v_n)]. All terms are computed from the training data! Naive Bayes works well despite its strong assumptions (see [Domingos and Pazzani, MLJ 1997]) and thus provides a simple benchmark test-set accuracy for a new data set.
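A minimal sketch of such a classifier, under the independence assumptions stated above; the feature values and tiny data set below are made up purely for illustration, probabilities are estimated by simple counting (no smoothing), and the constant denominator is dropped since only the argmax is needed:

from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Estimate P(Out) and P(F_i = v | Out) by counting, as described in the slides."""
    prior = Counter(labels)                 # counts of each Out value
    cond = defaultdict(Counter)             # (feature index, Out value) -> counts of feature values
    for x, y in zip(examples, labels):
        for i, v in enumerate(x):
            cond[(i, y)][v] += 1
    return prior, cond, len(labels)

def predict(x, prior, cond, n):
    """Return the Out value maximizing P(Out) * prod_i P(F_i = v_i | Out)."""
    best, best_score = None, -1.0
    for label, count in prior.items():
        score = count / n                          # P(Out = label)
        for i, v in enumerate(x):
            score *= cond[(i, label)][v] / count   # P(F_i = v | Out = label)
        if score > best_score:
            best, best_score = label, score
    return best

# Made-up data set: features = (Refund, MaritalStatus), target Out = Cheat.
X = [("Yes", "Single"), ("No", "Married"), ("No", "Single"), ("Yes", "Married")]
y = [False, False, True, False]
prior, cond, n = train_naive_bayes(X, y)
print(predict(("No", "Single"), prior, cond, n))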

Bayesian Networks: Motivation Although the full joint distribution can answer any question about the domain, it can become intractably large as the number of variables grows. Specifying probabilities for atomic events is also rather unnatural and may be very difficult. Instead, we use a graphical representation for which we can more easily investigate the complexity of inference and search for efficient inference algorithms.

Bayesian Networks A Bayesian network captures independence and conditional independence where they exist, thus reducing the number of probabilities that need to be specified. It represents dependencies among variables and encodes a concise specification of the full joint distribution.

A Bayesian Network is a... Directed Acyclic Graph (DAG) in which the nodes denote random variables and each node X has a conditional probability distribution P(X | Parents(X)). The intuitive meaning of an arc from X to Y is that X directly influences Y.

Additional Terminology If X and its parents are discrete, we can represent the distribution P(X|Parents(X)) by a conditional probability table (CPT) specifying the probability of each value of X given each possible combination of settings for the variables in Parents(X). A conditioning case is a row in this CPT (a setting of values for the parent nodes). Each row must sum to 1.
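As a sketch of how such CPTs might be stored in code, here is the familiar burglary-alarm network with its usual textbook numbers (the values and variable names are assumed for illustration); for a Boolean variable only P(X = true | parents) is kept per conditioning case, since the complementary entry follows from each row summing to 1:

# Burglary network CPTs: each entry is P(Variable = true | parent values).
# Numbers are the usual textbook example values (assumed here).
P_burglary   = 0.001                        # no parents
P_earthquake = 0.002                        # no parents
P_alarm = {                                 # parents: (Burglary, Earthquake)
    (True,  True):  0.95, (True,  False): 0.94,
    (False, True):  0.29, (False, False): 0.001,
}
P_john_calls = {True: 0.90, False: 0.05}    # parent: Alarm
P_mary_calls = {True: 0.70, False: 0.01}    # parent: Alarm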

Bayesian Network Semantics A Bayesian Network completely specifies a full joint distribution over its random variables, as below -- this is its meaning. P(x_1, …, x_n) = Π_{i=1..n} P(x_i | parents(X_i)). In the above, P(x_1, …, x_n) is shorthand notation for P(X_1 = x_1, …, X_n = x_n).

Inference Example What is the probability that the alarm sounds, but neither a burglary nor an earthquake has occurred, and both John and Mary call? Using j for JohnCalls, a for Alarm, etc.: P(j, m, a, ¬b, ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e) = 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.00063.
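The same product, computed from the CPT sketch above:

p = (P_john_calls[True] * P_mary_calls[True]       # P(j | a) * P(m | a)
     * P_alarm[(False, False)]                     # P(a | ¬b, ¬e)
     * (1 - P_burglary) * (1 - P_earthquake))      # P(¬b) * P(¬e)
print(p)                                           # about 0.00063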

Chain Rule A generalization of the product rule, easily proven by repeated application of the product rule. Chain Rule: P(x_1, …, x_n) = P(x_n | x_{n-1}, …, x_1) P(x_{n-1} | x_{n-2}, …, x_1) … P(x_2 | x_1) P(x_1) = Π_{i=1..n} P(x_i | x_{i-1}, …, x_1).

Chain Rule and BN Semantics Comparing the chain rule with the Bayesian network semantics shows that the network is a correct representation of the domain only if, for a node ordering consistent with the graph, each variable is conditionally independent of its other predecessors given its parents: P(X_i | X_{i-1}, …, X_1) = P(X_i | Parents(X_i)).

Example of the Key Property The following conditional independence holds: P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)

Procedure for BN Construction Choose relevant random variables. While there are variables left: (1) pick the next variable X_i and add a node for it; (2) set Parents(X_i) to a minimal set of already-added nodes such that P(X_i | X_{i-1}, …, X_1) = P(X_i | Parents(X_i)); (3) define the conditional probability table for X_i.

Principles to Guide Choices Goal: build a locally structured (sparse) network -- each component interacts with a bounded number of other components. Add root causes first, then the variables that they influence.