Presentation is loading. Please wait.

Presentation is loading. Please wait.

Bayesian Belief Networks. What does it mean for two variables to be independent? Consider a multidimensional distribution p(x). If for two features we.

Similar presentations


Presentation on theme: "Bayesian Belief Networks. What does it mean for two variables to be independent? Consider a multidimensional distribution p(x). If for two features we."— Presentation transcript:

1 Bayesian Belief Networks

2 What does it mean for two variables to be independent? Consider a multidimensional distribution p(x). If for two features we know that p(xi,xj) = p(xi)p(xj) we say the features are statistically independent. If we know which features are independent and which not, we can simplify the computation of joint probabilities.

3 Figure 2.23

4 Bayesian Belief Networks A Bayesian Belief Network is a method to describe the joint probability distribution of a set of variables. It is also called a casual network or belief net. Let x1, x2, …, xn be a set of variables or features. A Bayesian Belief Network or BBN will tell us the probability of any combination of x1, x2,.., xn.

5 An Example Storm Bus Tour Group Lightning Thunder Campfire Forest Fire Set of Boolean variables and their relations:

6 Conditional Probabilities C 0.4 0.1 0.8 0.2 S,B S,~B ~S,B ~S,~B ~C 0.6 0.9 0.2 0.8 Storm Bus Tour Group Campfire

7 Conditional Independence We say x1 is conditionally independent of x2 given x3 if the probability of x1 is independent of x2 given x3: P(x1|x2,x3) = P(x1|x3) The same can be said for a set of variables: x1,x2,x3 is independent of y1,y2,y3 given z1,z2,z3: P(x1,x2,x2|y1,y2,y3,z1,z2,z3) = P(x1,x2,x3|z1,z2,z3)

8 Representation A BBN represents the joint probability distribution of a set of variables by explicitly indicating the assumptions of conditional independence through : a)directed acyclic graph and b)local conditional probabilities. conditional probabilities Storm Bus Tour Group Campfire

9 Representation Each variable is independent of its non-descendants given its predecessors We say x1 is a descendant of x2 if there is a direct path from x2 to x1. Example: Predecessors of Campfire: Storm, Bus Tour Group. (Campfire is a descendant of these two variables). Campfire is independent of Lightning given its predecessors. Lightning Storm Bus Tour Group Campfire

10 Figure 2.25

11 Joint Probability Distribution To compute the joint probability distribution of a set of variables given a Bayesian Belief Network we simply use the formula: P(x1,x2,…,xn) = Π i P(xi | Parents(xi)) Where parents are the immediate predecessors of xi. Example: P(Campfire, Storm, BusGroupTour, Lightning, Thunder, ForestFire)? P(Storm)P(BusTourGroup)P(Campfire|Storm,BusTourGroup) P(Lightning|Storm)P(Thunder|Lightning) P(ForestFire|Lightning,Storm,Campfire).

12 Joint Distribution, An Example P(Storm)P(BusTourGroup)P(Campfire|Storm,BusTourGroup) P(Lightning|Storm)P(Thunder|Lightning) P(ForestFire|Lightning,Storm,Campfire). Storm Bus Tour Group Lightning Thunder Campfire Forest Fire

13 Conditional Probabilities, An Example C 0.4 0.1 0.8 0.2 S,B S,~B ~S,B ~S,~B ~C 0.6 0.9 0.2 0.8 P(Campfire=true|Storm=true,BusTourGroup=true) = 0.4 Storm Bus Tour Group Campfire

14 Learning Belief Networks We can learn BBN in different ways. Two basic approaches follow: 1.Assume we know the network structure: We can estimate the conditional probabilities for each variable from the data. 2.Assume we know part of the structure but some variables are missing: This is like learning hidden units in a neural network. One can use a gradient ascent method to train the BBN. 3.Assume nothing is known. We can learn the structure and conditional probabilities by looking in the space of possible networks.

15 Naïve Bayes What is the connection between a BBN and classification? Suppose one of the variables is the target variable. Can we compute the probability of the target variable given the other variables? In Naïve Bayes: X1 X2 Xn … Concept wj P(x1,x2,…xn,wj) = P(wj) P(x1|wj) P(x2|wj) … P(xn|wj)

16 General Case In the general case we can use a BBN to specify independence assumptions among variables. General Case: X1 X2 x4 Concept wj P(x1,x2,…xn,wj) = P(wj) P(x1|wj) P(x2|wj) P(x3|x1,x2,wj)P(x4,wj) X3

17 Judea Pearl – coined the term in 1985 Professor at Univ. of California at LA Pioneer of Bayesian Networks

18 Application – Medical Domain Simple Diagnosis Smoker Cancer Lung Disease Positive Results Congestion

19 Constructing Bayesian Networks Choose the right order from causes to effects. P(x1,x2,…,xn) = P(xn|xn-1,..,x1)P(xn-1,…,x1) = Π P(xi|xi-1,…,x1) -- chain rule Example: P(x1,x2,x3) = P(x1|x2,x3)P(x2|x3)P(x3)

20 How to construct BBN P(x1,x2,x3) x3 x2 x1 root cause leaf Correct order: add root causes first, and then “leaves”, with no influence on other nodes.

21 Compactness BBN are locally structured systems. They represent joint distributions compactly. Assume n random variables, each influenced by k nodes. Size BBN: n2 k Full size: 2 n

22 Representing Conditional Distributions Even if k is small O(2 k ) may be unmanageable. Solution: use canonical distributions. Example: U.S. Canada Mexico North America simple disjunction

23 Noisy-OR Cold Flu Malaria Fever A link may be inhibited due to uncertainty

24 Noisy-OR Inhibitions probabilities: P(~fever | cold, ~flu, ~malaria) = 0.6 P(~fever | ~cold, flu, ~malaria) = 0.2 P(~fever | ~cold, ~flu, malaria) = 0.1

25 Noisy-OR Now the whole probability can be built: P(~fever | cold, ~flu, malaria) = 0.6 x 0.1 P(~fever | cold, flu, ~malaria) = 0.6 x 0.2 P(~fever | ~cold, flu, malaria) = 0.2 x 0.1 P(~fever | cold, flu, malaria) = 0.6 x 0.2 x 0.1 P(~fever | ~cold, ~flu, ~malaria) = 1.0

26 Continuous Variables Continuous variables can be discretized. Or define probability density functions Example: Gaussian distribution. A network with both variables is called a Hybrid Bayesian Network.

27 Continuous Variables Subsidy Harvest Cost Buys

28 Continuous Variables P(cost | harvest, subsidy) P(cost | harvest, ~subsidy) Normal distribution x P(x)

29 Summary Bayesian networks are directed acyclic graphs that concisely represent conditional independence relations among random variables. BBN specify the full joint probability distribution of a set of variables. BBN can be hybrid, combining categorical variables with numeric variables.


Download ppt "Bayesian Belief Networks. What does it mean for two variables to be independent? Consider a multidimensional distribution p(x). If for two features we."

Similar presentations


Ads by Google