Bayesian Belief Networks

What does it mean for two variables to be independent? Consider a multidimensional distribution p(x). If for two features we know that p(xi, xj) = p(xi) p(xj), we say the features are statistically independent. If we know which features are independent and which are not, we can simplify the computation of joint probabilities.
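As a concrete illustration, here is a minimal Python sketch of this check; the joint table and the use of NumPy are illustrative assumptions, not part of the slides:

```python
import numpy as np

# Hypothetical 2x2 joint distribution p(xi, xj) over two binary features.
joint = np.array([[0.24, 0.36],
                  [0.16, 0.24]])

p_xi = joint.sum(axis=1)          # marginal p(xi)
p_xj = joint.sum(axis=0)          # marginal p(xj)
product = np.outer(p_xi, p_xj)    # p(xi) * p(xj)

# The features are statistically independent if the joint equals the product.
independent = np.allclose(joint, product)
print(independent)                # True for this table: it factorizes exactly
```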

[Figure 2.23]

Bayesian Belief Networks A Bayesian Belief Network is a method to describe the joint probability distribution of a set of variables. It is also called a causal network or belief net. Let x1, x2, …, xn be a set of variables or features. A Bayesian Belief Network or BBN will tell us the probability of any combination of x1, x2, …, xn.

An Example A set of Boolean variables and their relations: Storm, BusTourGroup, Lightning, Thunder, Campfire, ForestFire. [Network diagram: Storm and BusTourGroup point to Campfire; Storm points to Lightning; Lightning points to Thunder; Lightning, Storm, and Campfire point to ForestFire.]

Conditional Probabilities [CPT for Campfire (C) given Storm (S) and BusTourGroup (B): columns S,B | S,~B | ~S,B | ~S,~B; rows C, ~C.]

Conditional Independence We say x1 is conditionally independent of x2 given x3 if the probability of x1 given x3 does not depend on x2: P(x1|x2,x3) = P(x1|x3) The same can be said for sets of variables: x1,x2,x3 is independent of y1,y2,y3 given z1,z2,z3: P(x1,x2,x3|y1,y2,y3,z1,z2,z3) = P(x1,x2,x3|z1,z2,z3)
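A small sketch of the same idea (the three conditional tables are hypothetical): the joint is constructed so that x1 and x2 are conditionally independent given x3, and the identity P(x1|x2,x3) = P(x1|x3) is then verified numerically.

```python
import numpy as np

# Hypothetical joint p(x1, x2, x3) built so that x1 and x2 are
# conditionally independent given x3 (all variables binary).
p_x3 = np.array([0.3, 0.7])
p_x1_given_x3 = np.array([[0.9, 0.1],   # rows: value of x3, columns: value of x1
                          [0.2, 0.8]])
p_x2_given_x3 = np.array([[0.5, 0.5],
                          [0.6, 0.4]])

# joint[i, j, k] = P(x1=i, x2=j, x3=k) = P(x3=k) P(x1=i|x3=k) P(x2=j|x3=k)
joint = np.einsum('k,ki,kj->ijk', p_x3, p_x1_given_x3, p_x2_given_x3)

# Check P(x1 | x2, x3) == P(x1 | x3) for every (x2, x3) combination.
p_x2x3 = joint.sum(axis=0)                 # P(x2, x3)
p_x1_given_x2x3 = joint / p_x2x3           # broadcasts over the x1 axis
for j in range(2):
    for k in range(2):
        assert np.allclose(p_x1_given_x2x3[:, j, k], p_x1_given_x3[k, :])
print("x1 is conditionally independent of x2 given x3")
```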

Representation A BBN represents the joint probability distribution of a set of variables by explicitly indicating the assumptions of conditional independence through: a) a directed acyclic graph and b) local conditional probabilities (one table per node).

Representation Each variable is independent of its non-descendants given its predecessors. We say x1 is a descendant of x2 if there is a directed path from x2 to x1. Example: Predecessors of Campfire: Storm, BusTourGroup (Campfire is a descendant of these two variables). Campfire is independent of Lightning given its predecessors.

[Figure 2.25]

Joint Probability Distribution To compute the joint probability distribution of a set of variables given a Bayesian Belief Network we simply use the formula: P(x1,x2,…,xn) = Πi P(xi | Parents(xi)) where Parents(xi) are the immediate predecessors of xi. Example: P(Campfire, Storm, BusTourGroup, Lightning, Thunder, ForestFire) = P(Storm) P(BusTourGroup) P(Campfire|Storm,BusTourGroup) P(Lightning|Storm) P(Thunder|Lightning) P(ForestFire|Lightning,Storm,Campfire).
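A minimal sketch of this formula for the Storm/Campfire network. Every probability value below is a hypothetical placeholder except P(Campfire=true | Storm=true, BusTourGroup=true) = 0.4, which appears later in the slides:

```python
# CPTs map a tuple of parent values to P(node = True | parents).
cpt = {
    "Storm":        {(): 0.10},
    "BusTourGroup": {(): 0.30},
    "Campfire":     {(True, True): 0.4, (True, False): 0.2,
                     (False, True): 0.8, (False, False): 0.1},
    "Lightning":    {(True,): 0.70, (False,): 0.05},
    "Thunder":      {(True,): 0.95, (False,): 0.01},
    "ForestFire":   {(True, True, True): 0.50, (True, True, False): 0.40,
                     (True, False, True): 0.30, (True, False, False): 0.20,
                     (False, True, True): 0.10, (False, True, False): 0.05,
                     (False, False, True): 0.02, (False, False, False): 0.001},
}
parents = {
    "Storm": [], "BusTourGroup": [],
    "Campfire": ["Storm", "BusTourGroup"],
    "Lightning": ["Storm"],
    "Thunder": ["Lightning"],
    "ForestFire": ["Lightning", "Storm", "Campfire"],
}

def joint_probability(assignment):
    """P(x1,...,xn) = product over nodes of P(xi | Parents(xi))."""
    result = 1.0
    for node, table in cpt.items():
        key = tuple(assignment[p] for p in parents[node])
        p_true = table[key]
        result *= p_true if assignment[node] else 1.0 - p_true
    return result

print(joint_probability({"Storm": True, "BusTourGroup": True, "Campfire": True,
                         "Lightning": True, "Thunder": True, "ForestFire": False}))
```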

Joint Distribution, An Example P(Storm) P(BusTourGroup) P(Campfire|Storm,BusTourGroup) P(Lightning|Storm) P(Thunder|Lightning) P(ForestFire|Lightning,Storm,Campfire). [Network diagram with nodes Storm, BusTourGroup, Lightning, Thunder, Campfire, ForestFire.]

Conditional Probabilities, An Example From the CPT for Campfire (columns S,B | S,~B | ~S,B | ~S,~B; rows C, ~C): P(Campfire=true | Storm=true, BusTourGroup=true) = 0.4

Learning Belief Networks We can learn a BBN in different ways. Three basic approaches follow: 1. Assume we know the network structure: we can estimate the conditional probabilities for each variable from the data. 2. Assume we know the structure but some variables are unobserved: this is like learning the hidden units of a neural network; one can use a gradient ascent method to train the BBN. 3. Assume nothing is known: we can learn both the structure and the conditional probabilities by searching the space of possible networks.
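A minimal sketch of approach 1 (structure known, data fully observed): each CPT entry can be estimated by simple counting. The data records below are hypothetical:

```python
from collections import Counter

# Hypothetical fully observed samples of (Storm, BusTourGroup, Campfire).
data = [
    {"Storm": True,  "BusTourGroup": True,  "Campfire": True},
    {"Storm": True,  "BusTourGroup": True,  "Campfire": False},
    {"Storm": False, "BusTourGroup": True,  "Campfire": True},
    {"Storm": False, "BusTourGroup": False, "Campfire": False},
    {"Storm": True,  "BusTourGroup": False, "Campfire": False},
]

def estimate_cpt(data, node, parents):
    """Estimate P(node=True | parents) for each observed parent configuration."""
    trues, totals = Counter(), Counter()
    for record in data:
        key = tuple(record[p] for p in parents)
        totals[key] += 1
        if record[node]:
            trues[key] += 1
    return {key: trues[key] / totals[key] for key in totals}

print(estimate_cpt(data, "Campfire", ["Storm", "BusTourGroup"]))
# e.g. {(True, True): 0.5, (False, True): 1.0, (False, False): 0.0, (True, False): 0.0}
```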

Naïve Bayes What is the connection between a BBN and classification? Suppose one of the variables is the target variable. Can we compute the probability of the target variable given the other variables? In Naïve Bayes the class node wj (the Concept) is the only parent of every feature X1, X2, …, Xn, so: P(x1,x2,…,xn,wj) = P(wj) P(x1|wj) P(x2|wj) … P(xn|wj)
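A minimal Naïve Bayes sketch using this factorization; all priors and likelihoods below are hypothetical placeholders:

```python
# priors[class] = P(wj); likelihoods[class][feature] = P(feature = True | wj)
priors = {"w1": 0.6, "w2": 0.4}
likelihoods = {
    "w1": {"x1": 0.8, "x2": 0.3, "x3": 0.5},
    "w2": {"x1": 0.1, "x2": 0.7, "x3": 0.5},
}

def joint_with_class(x, wj):
    # P(x1,...,xn, wj) = P(wj) * prod_i P(xi | wj)
    p = priors[wj]
    for feature, value in x.items():
        p_true = likelihoods[wj][feature]
        p *= p_true if value else 1.0 - p_true
    return p

def classify(x):
    # The class maximizing the joint also maximizes P(wj | x1,...,xn),
    # because the evidence term is the same for every class.
    return max(priors, key=lambda wj: joint_with_class(x, wj))

print(classify({"x1": True, "x2": False, "x3": True}))   # -> "w1" with these numbers
```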

General Case In the general case we can use a BBN to specify independence assumptions among the features themselves. Example with features X1, X2, X3, X4 and class wj, where X3 has parents X1, X2, and wj: P(x1,x2,x3,x4,wj) = P(wj) P(x1|wj) P(x2|wj) P(x3|x1,x2,wj) P(x4|wj)

Judea Pearl – coined the term "Bayesian network" in 1985; Professor at the University of California, Los Angeles; pioneer of Bayesian networks.

Application – Medical Domain: Simple Diagnosis [Network diagram with nodes Smoker, Cancer, Lung Disease, Positive Results, Congestion.]

Constructing Bayesian Networks Choose the right variable ordering, from causes to effects. P(x1,x2,…,xn) = P(xn|xn-1,…,x1) P(xn-1,…,x1) = Π P(xi|xi-1,…,x1) -- chain rule Example: P(x1,x2,x3) = P(x1|x2,x3) P(x2|x3) P(x3)

How to construct a BBN Example: P(x1,x2,x3), where x3 is a root cause and x1 is a leaf. Correct order: add root causes first, and then the "leaves", which have no influence on other nodes.

Compactness BBNs are locally structured systems. They represent joint distributions compactly. Assume n Boolean random variables, each influenced by at most k nodes. Size of the BBN: n·2^k numbers. Size of the full joint distribution: 2^n. For example, with n = 30 and k = 5 the network needs 30·2^5 = 960 numbers, versus 2^30 (over a billion) entries for the full joint.

Representing Conditional Distributions Even if k is small, O(2^k) entries per table may be unmanageable. Solution: use canonical distributions. Example (a deterministic node): NorthAmerican is a simple disjunction of U.S., Canada, and Mexico.

Noisy-OR [Network diagram: Cold, Flu, and Malaria are parents of Fever.] A link may be inhibited due to uncertainty.

Noisy-OR Inhibition probabilities: P(~fever | cold, ~flu, ~malaria) = 0.6 P(~fever | ~cold, flu, ~malaria) = 0.2 P(~fever | ~cold, ~flu, malaria) = 0.1

Noisy-OR Now the whole CPT can be built by multiplying the inhibition probabilities of the causes that are present: P(~fever | cold, ~flu, malaria) = 0.6 x 0.1 P(~fever | cold, flu, ~malaria) = 0.6 x 0.2 P(~fever | ~cold, flu, malaria) = 0.2 x 0.1 P(~fever | cold, flu, malaria) = 0.6 x 0.2 x 0.1 P(~fever | ~cold, ~flu, ~malaria) = 1.0
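A short sketch that rebuilds this CPT from the three inhibition probabilities; the numbers come from the slides, the code structure is illustrative:

```python
from itertools import product

# Noisy-OR: P(~fever | parents) is the product of the inhibition probabilities
# of the parents that are present (1.0 if no parent is present).
inhibition = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}   # from the slide

def p_no_fever(present):
    p = 1.0
    for parent, q in inhibition.items():
        if parent in present:
            p *= q
    return p

# Build the whole CPT from just three numbers instead of 2^3 independent entries.
for cold, flu, malaria in product([True, False], repeat=3):
    present = {name for name, v in
               [("cold", cold), ("flu", flu), ("malaria", malaria)] if v}
    print(sorted(present), "P(~fever) =", round(p_no_fever(present), 4))
```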

Continuous Variables Continuous variables can be discretized, or we can define probability density functions for them (example: the Gaussian distribution). A network with both discrete and continuous variables is called a Hybrid Bayesian Network.

Continuous Variables [Example hybrid network: Subsidy and Harvest are parents of Cost; Cost is a parent of Buys.]

Continuous Variables P(cost | harvest, subsidy) and P(cost | harvest, ~subsidy) can each be modeled with a normal distribution. [Plot: normal density P(x) versus x.]
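One common way to implement such a conditional density is a linear-Gaussian model, where the mean of the normal distribution depends linearly on the continuous parent; the linear form and all parameter values below are assumptions for illustration, not taken from the slides:

```python
import math

# Assumed linear-Gaussian model:
#   p(cost | harvest, subsidy) = Normal(a * harvest + b, sigma^2),
# with separate (hypothetical) parameters for subsidy present and absent.
params = {
    True:  {"a": -0.5, "b": 5.0,  "sigma": 1.0},   # subsidy present
    False: {"a": -0.5, "b": 10.0, "sigma": 1.0},   # no subsidy
}

def p_cost(cost, harvest, subsidy):
    p = params[subsidy]
    mean = p["a"] * harvest + p["b"]
    sigma = p["sigma"]
    # Gaussian density evaluated at the observed cost.
    return math.exp(-0.5 * ((cost - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

print(p_cost(cost=6.0, harvest=4.0, subsidy=True))
```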

Summary Bayesian networks are directed acyclic graphs that concisely represent conditional independence relations among random variables. BBNs specify the full joint probability distribution of a set of variables. BBNs can be hybrid, combining categorical variables with numeric variables.