Presentation is loading. Please wait.

Presentation is loading. Please wait.

Bayesian Network. Introduction Independence assumptions Seems to be necessary for probabilistic inference to be practical. Naïve Bayes Method Makes independence.

Similar presentations


Presentation on theme: "Bayesian Network. Introduction Independence assumptions Seems to be necessary for probabilistic inference to be practical. Naïve Bayes Method Makes independence."— Presentation transcript:

1 Bayesian Network

2 Introduction Independence assumptions Seems to be necessary for probabilistic inference to be practical. Naïve Bayes Method Makes independence assumptions that are often not true Also called Idiot Bayes Method for this reason. Bayesian Network Explicitly models the independence relationships in the data. Use these independence relationships to make probabilistic inferences. Also known as: Belief Net, Bayes Net, Causal Net, …

3 Why Bayesian Networks? Intuitive language Can utilize causal knowledge in constructing models Domain experts comfortable building a network General purpose “inference” algorithms P(Bad Battery | Has Gas, Won’t Start) Exact: Modular specification leads to large computational efficiencies Gas Start Battery

4 Random Variables A random variable is a set of exhaustive and mutually exclusive possibilities. Example: throwing a die:  small {1,2}  medium: {3,4}  large: {5,6} Medical data  patient’s age  blood pressure Variable vs. Event a variable taking a value = an event.

5 Independence of Variables Instantiation of a variable is an event. A set of variables are independent iff all possible instantiations of the variables are independent. Example: X: patient blood pressure {high, medium, low} Y: patient sneezes {yes, no} P(X=high, Y=yes) = P(X=high) x P(Y=yes) P(X=high, Y=no) = P(X=high) x P(Y=no)... P(X=low, Y=yes) = P(X=low) x P(Y=yes) P(X=low, Y=no) = P(X=low) x P(Y=no) Conditional independence between a set of variables holds iff the conditional independence between all possible instantiations of the variables holds.

6 Bayesian Networks: Definition Bayesian networks are directed acyclic graphs (DAGs). Nodes in Bayesian networks represent random variables, which is normally assumed to take on discrete values. The links of the network represent direct probabilistic influence. The structure of the network represents the probabilistic dependence/independence relationships between the random variables represented by the nodes.

7 Example

8 Bayesian Network: Probabilities The nodes and links are quantified with probability distributions. The root nodes (those with no ancestors) are assigned prior probability distributions. The other nodes are assigned with the conditional probability distribution of the node given its parents.

9 Example Conditional Probability Tables (CPTs)

10 Noisy-OR-Gate Exception Independence Noisy-OR-Gate—the exceptions to the causations are independent. P(E|C1, C2)=1-(1-P(E|C1))(1-P(E|C2))

11 Inference in Bayesian Networks Given a Bayesian network and its CPTs, we can compute probabilities of the following form: P(H | E 1, E 2,......, E n ) where H, E 1, E 2,......, E n are assignments to nodes (random variables) in the network. Example: The probability of family-out given lights out and hearing bark: P( fo |  lo, hb).

12 Example: Car Diagnosis

13 MammoNet

14 Applications of Bayesian Networks

15 Application of Bayesian Network: Diagnosis

16 ARCO1: Forecasting Oil Prices

17

18 Semantics of Belief Networks Two ways understanding the semantics of belief network Representation of the joint probability distribution Encoding of a collection of conditional independence statements

19 Terminology Descendent Ancestor Parent Non-descendent X Y1Y1 Y2Y2

20 Three Types of Connections

21 Connection Pattern and Independence Linear connection: The two end variables are usually dependent on each other. The middle variable renders them independent. Converging connection: The two end variables are usually independent on each other. The middle variable renders them dependent. Divergent connection: The two end variables are usually dependent on each other. The middle variable renders them independent.

22 D-Separation A variable a is d-separated from b by a set of variables E if there does not exist a d-connecting path between a and b such that None of its linear or diverging nodes is in E For each of the converging nodes, either it or one of its descendents is in E. Intuition: The influence between a and b must propagate through a d-connecting path

23 If a and b are d-separated by E, then they are conditionally independent of each other given E: P(a, b | E) = P(a | E) x P(b | E)

24 Example Independence Relationships

25 Chain Rule A joint probability distribution can be expressed as a product of conditional probabilities: P(X 1, X 2,..., Xn) =P(X 1 ) x P(X 2, X 1,..., Xn | X 1 ) =P(X 1 ) x P(X 2 |X 1 ) x P(X 3, X 4,..., Xn| X 1, X 2 ) =P(X 1 ) x P(X 2 |X 1 ) x P(X 3 |X 1,X 2 ) x P(X 4,..., Xn|X 1,X 2, X 3 )...... = P(X 1 ) x P(X 2 |X 1 ) x P(X 3 |X 1, X 2 )x...x P(Xn| X 1,...Xn -1 ) This has nothing to do with any independence assumption!

26 Compute the Joint Probability Given a Bayesian network, let X 1, X 2,..., Xn be an ordering of the nodes such that only the nodes that are indexed lower than i may have directed path to X i. Since the parents of X i, d-separates X i and all the other nodes that are indexed lower than i, P(X i | X 1,...X i -1 )=P(X i | parents(X i )) This probability is available in the Bayesian network. Therefore, P(X 1, X 2,..., Xn) can be computed from the probabilities available in Beyesian network.

27 P(fo, ¬lo, do, hb, ¬bp) = ?

28 What can Bayesian Networks Compute? The inputs to a Bayesian Network evaluation algorithm is a set of evidences: e.g., E = { hear-bark=true, lights-on=true } The outputs of Bayesian Network evaluation algorithm are P(X i =v | E) where X i is an variable in the network. For example: P(family-out=true| E) is the probability of family being out given hearing dog's bark and seeing the lights on.

29 Computation in Bayesian Networks Computation in Bayesian networks is NP-hard. All algorithms for computing the probabilities are exponential to the size of the network. There are two ways around the complexity barrier: Algorithms for special subclass of networks, e.g., singly connected networks. Approximate algorithms. The computation for singly connected graph is linear to the size of the network.

30 Bayesian Network Inference Inference: calculating P(X |Y ) for some variables or sets of variables X and Y. Inference in Bayesian networks is #P-hard! Reduces to How many satisfying assignments? I1I2I3I4I5 O Inputs: prior probabilities of.5 P(O) must be (#sat. assign.)*(.5^#inputs) www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

31 Bayesian Network Inference But…inference is still tractable in some cases. Let’s look a special class of networks: trees / forests in which each node has at most one parent. www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

32 Decomposing the probabilities Suppose we want P(X i | E ) where E is some set of evidence variables. Let’s split E into two parts: E i - is the part consisting of assignments to variables in the subtree rooted at X i E i + is the rest of it XiXi www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

33 Decomposing the probabilities XiXi Where:  is a constant independent of X i  (X i ) = P(X i |E i + ) (X i ) = P(E i - | X i ) www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

34 Using the decomposition for inference We can use this decomposition to do inference as follows. First, compute (X i ) = P(E i - | X i ) for all X i recursively, using the leaves of the tree as the base case. If X i is a leaf: If X i is in E : (X i ) = 0 if X i matches E, 1 otherwise If X i is not in E : E i - is the null set, so P(E i - | X i ) = 1 (constant) www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

35 Quick aside: “Virtual evidence” For theoretical simplicity, but without loss of generality, let’s assume that all variables in E (the evidence set) are leaves in the tree. Why can we do this WLOG: XiXi XiXi Xi’Xi’ Observe X i Equivalent to Observe X i ’ Where P( X i ’| X i ) =1 if X i ’=X i, 0 otherwise www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

36 Calculating (X i ) for non- leaves Suppose X i has one child, X j Then: XiXi XjXj www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

37 Calculating (X i ) for non- leaves Now, suppose X i has a set of children, C. Since X i d-separates each of its subtrees, the contribution of each subtree to (X i ) is independent: where j (X i ) is the contribution to P(E i - | X i ) of the part of the evidence lying in the subtree rooted at one of X i ’s children X j. www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

38 We are now -happy So now we have a way to recursively compute all the (X i )’s, starting from the root and using the leaves as the base case. If we want, we can think of each node in the network as an autonomous processor that passes a little “ message” to its parent. www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

39 The other half of the problem Remember, P(X i |E) =  (X i ) (X i ). Now that we have all the (X i )’s, what about the  (X i )’s?  (X i ) = P(X i |E i + ). What about the root of the tree, X r ? In that case, E r + is the null set, so  (X r ) = P(X r ). No sweat. Since we also know (X r ), we can compute the final P(X r ). So for an arbitrary X i with parent X p, let’s inductively assume we know  (X p ) and/or P(X p |E). How do we get  (X i )? www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

40 Computing  (X i ) XpXp XiXi Where  i (X p ) is defined as www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

41 We’re done. Yay! Thus we can compute all the  (X i )’s, and, in turn, all the P(X i |E)’s. Can think of nodes as autonomous processors passing and  messages to their neighbors    www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

42 Conjunctive queries What if we want, e.g., P(A, B | C) instead of just marginal distributions P(A | C) and P(B | C)? Just use chain rule: P(A, B | C) = P(A | C) P(B | A, C) Each of the latter probabilities can be computed using the technique just discussed. www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

43 Polytrees Technique can be generalized to polytrees: undirected versions of the graphs are still trees, but nodes can have more than one parent www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

44 Dealing with cycles Can deal with undirected cycles in graph by clustering variables together Conditioning A BC D A D BC Set to 0 Set to 1 www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

45 Join trees Arbitrary Bayesian network can be transformed via some evil graph-theoretic magic into a join tree in which a similar method can be employed. A B ED F C G ABC BCD DF In the worst case the join tree nodes must take on exponentially many combinations of values, but often works well in practice www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt


Download ppt "Bayesian Network. Introduction Independence assumptions Seems to be necessary for probabilistic inference to be practical. Naïve Bayes Method Makes independence."

Similar presentations


Ads by Google