Bayes’ Rule



Bayes’ Rule and Reasoning. Allows the use of uncertain causal knowledge. Knowledge: given a cause, what is the likelihood of seeing particular effects (conditional probabilities)? Reasoning: seeing some effects, how do we infer the likelihood of a cause? This can be very complicated: in general it requires the joint probability distribution of (k+1) variables, i.e., 2^(k+1) numbers. Use conditional independence to simplify expressions. Allows sequential, step-by-step computation.
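As a concrete illustration, here is a minimal Python sketch of Bayes’ rule on a single cause/effect pair; the numbers are invented for the example and are not from the slides.

```python
# Bayes' rule: P(cause | effect) = P(effect | cause) * P(cause) / P(effect)
# Illustrative (made-up) numbers, not taken from the slides.

p_cause = 0.01                     # prior P(cause)
p_effect_given_cause = 0.9         # likelihood P(effect | cause)
p_effect_given_not_cause = 0.05    # P(effect | no cause)

# Total probability of the effect (marginalizing over the cause).
p_effect = (p_effect_given_cause * p_cause
            + p_effect_given_not_cause * (1 - p_cause))

# Posterior probability of the cause given that the effect was observed.
p_cause_given_effect = p_effect_given_cause * p_cause / p_effect
print(p_cause_given_effect)        # ~0.154
```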

Bayesian/Belief Networks. To avoid the problem of enumerating large joint probabilities, use causal knowledge and independence to simplify reasoning and draw inferences.

Bayesian Networks. Also called a belief network or probabilistic network. Nodes: random variables, one variable per node. Directed links between pairs of nodes: A → B means A has a direct influence on B. No directed cycles are allowed. A conditional distribution is given for each node given its parents. (Example variables: Cavity, Toothache, Catch, Weather.) The domain-specific topology must be determined first.

Bayesian Networks. The next step is to determine the conditional probability distribution for each variable, represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values. Once the CPTs are determined, the full joint probability distribution is represented by the network: the network provides a complete description of the domain.

Belief Networks: Example. If you go to college, this will affect the likelihood that you will study and the likelihood that you will party. Studying and partying affect your chances of exam success, and partying affects your chances of having fun. Variables: College, Study, Party, Exam (success), Fun. Causal relations: College affects studying; College affects partying; Studying and partying affect exam success; Partying affects having fun. (Network: College → Study, College → Party, Study → Exam, Party → Exam, Party → Fun.)

College example: CPTs (discrete variables only in this format). Network: College is the parent of Study and Party; Study and Party are the parents of Exam; Party is the parent of Fun.

P(C = true) = 0.2

P(S = true | C):     C = true: 0.8     C = false: 0.2
P(P = true | C):     C = true: 0.6     C = false: 0.5
P(E = true | S, P):  S = true,  P = true: 0.6     S = true,  P = false: 0.9
                     S = false, P = true: 0.1     S = false, P = false: 0.2
P(F = true | P):     P = true: 0.9     P = false: 0.7
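For illustration, the CPTs above could be held in code roughly as follows; this is a minimal Python sketch, and the dictionary layout is an assumption rather than anything prescribed by the slides.

```python
# College-example CPTs, stored as P(variable = True | parent values).
p_college = 0.2
p_study_given_college = {True: 0.8, False: 0.2}    # P(S=True | C)
p_party_given_college = {True: 0.6, False: 0.5}    # P(P=True | C)
p_exam_given_study_party = {                       # P(E=True | S, P)
    (True, True): 0.6, (True, False): 0.9,
    (False, True): 0.1, (False, False): 0.2,
}
p_fun_given_party = {True: 0.9, False: 0.7}        # P(F=True | P)

def prob(p_true, value):
    """Return P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

def joint(c, s, p, e, f):
    """Full joint P(C=c, S=s, P=p, E=e, F=f) as a product of CPT entries."""
    return (prob(p_college, c)
            * prob(p_study_given_college[c], s)
            * prob(p_party_given_college[c], p)
            * prob(p_exam_given_study_party[(s, p)], e)
            * prob(p_fun_given_party[p], f))

# The worked example from a later slide: P(c, s, not-p, e, not-f)
print(joint(True, True, False, True, False))   # 0.2*0.8*0.4*0.9*0.3 = 0.01728
```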

Belief Networks: Compactness. A CPT for a Boolean variable Xi with k Boolean parents has 2^k rows, one for each combination of parent values. Each row requires one number p for Xi = true (the number for Xi = false is 1 − p), so each row of the conditional distribution sums to 1. If each variable has no more than k parents, then the complete network requires O(n · 2^k) numbers, i.e., the numbers grow linearly in n, vs. O(2^n) for the full joint distribution. The college net has 1 + 2 + 2 + 4 + 2 = 11 numbers.

Belief Networks: Joint Probability Distribution Calculation. Global semantics defines the full joint distribution as the product of the local distributions: P(x1, …, xn) = Π_i P(xi | parents(Xi)). We can use the network to make inferences: every entry of the full joint probability distribution can be calculated this way. For example, P(college, study, ¬party, exam, ¬fun) = 0.2 × 0.8 × 0.4 × 0.9 × 0.3 = 0.01728.

College example: CPTs (as above), with the highlighted joint entry P(c, s, ¬p, e, ¬f) = 0.2 × 0.8 × 0.4 × 0.9 × 0.3 = 0.01728.

Network Construction. We must ensure that the network and the distribution are good representations of the domain, relying on conditional independence relationships. First, rewrite the joint distribution in terms of a conditional probability, and repeat for each conjunctive probability (the chain rule): P(x1, …, xn) = P(xn | xn−1, …, x1) P(xn−1 | xn−2, …, x1) … P(x2 | x1) P(x1) = Π_i P(xi | xi−1, …, x1).

Network Construction. Note that the chain rule is equivalent to the product of local distributions provided that P(Xi | Xi−1, …, X1) = P(Xi | Parents(Xi)), where the partial order is defined by the graph structure. This says that the network correctly represents the domain only if each node is conditionally independent of its other predecessors in the node ordering, given the node’s parents. In other words, Parents(Xi) must contain all nodes in X1, …, Xi−1 that have a direct influence on Xi.

College example: P(F | C, S, P, E) = P(F | P). Fun is conditionally independent of College, Study, and Exam given Party (CPTs as above).

Compact Networks. Bayesian networks are sparse and therefore much more compact than the full joint distribution. Sparse: each subcomponent interacts directly with only a bounded number of other nodes, independent of the total number of components; this usually gives linearly bounded complexity. The college net has 1 + 2 + 2 + 4 + 2 = 11 numbers. A fully connected network is equivalent to the full joint distribution. The correct network topology must be determined: add the “root causes” first, then the variables that they influence.

Network Construction. We need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics. Choose an ordering of variables X1, …, Xn. For i = 1 to n: add Xi to the network, and select its parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | Xi−1, …, X1). This choice of parents guarantees the global semantics.
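A minimal Python sketch of this construction loop, assuming a hypothetical oracle is_independent(x, parents, predecessors) that can test the required conditional independence in the domain (the oracle and all names here are illustrative assumptions):

```python
from itertools import combinations

def build_network(ordering, is_independent):
    """Sketch of the construction loop: for each variable, pick a smallest
    set of predecessors that renders it independent of the remaining ones.
    is_independent(x, parents, predecessors) is a hypothetical oracle that
    checks P(x | parents) = P(x | predecessors) in the domain."""
    parents = {}
    for i, x in enumerate(ordering):
        predecessors = ordering[:i]
        # Try candidate parent sets from smallest to largest.
        found = False
        for size in range(len(predecessors) + 1):
            for candidate in combinations(predecessors, size):
                if is_independent(x, set(candidate), predecessors):
                    parents[x] = set(candidate)
                    found = True
                    break
            if found:
                break
    return parents
```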

Constructing Bayes’ networks: Example. Choose the ordering F, E, P, S, C.
Add Fun.
Add Exam: P(E | F) = P(E)?
Add Party: P(P | F, E) = P(P)?
Add Study: P(S | F, E, P) = P(S | E)? P(S | F, E, P) = P(S)?
Add College: P(C | F, E, P, S) = P(C | P, S)? P(C | F, E, P, S) = P(C)?
Note that this network has additional dependencies.

Compact Networks. (Figure: the original network, College → Study, College → Party, Study and Party → Exam, Party → Fun, compared with the denser network produced by the ordering F, E, P, S, C.)

Network Construction: Alternative. Start with the topological semantics that specifies the conditional independence relationships, defined by either: (1) a node is conditionally independent of its non-descendants, given its parents; or (2) a node is conditionally independent of all other nodes given its parents, children, and children’s parents (its Markov blanket). Then reconstruct the CPTs.

Network Construction: Alternative. Each node X is conditionally independent of its non-descendants given its parents. Local semantics ⇒ global semantics. For example, Exam is independent of College, given the values of Study and Party.

Network Construction: Alternative. Each node X is conditionally independent of all other nodes given its Markov blanket: its parents (U1, …, Um), its children (Y1, …, Yn), and its children’s other parents (the Zij). For example, College is independent of Fun, given Party.

Canonical Distributions. Completing a node’s CPT requires specifying up to O(2^k) numbers (k = number of parents). If the parent–child relationship is arbitrary, this can be difficult to do. Instead, standard patterns can be named, along with a few parameters, to fill in the CPT: a canonical distribution.

Deterministic Nodes. The simplest form is a deterministic node: its value is specified exactly by its parents’ values, with no uncertainty. But what about relationships that are uncertain? If someone has a fever, do they have a cold, the flu, or a stomach bug? Can you have a cold or a stomach bug without a fever?

Noisy-OR Relationships. A noisy-OR relationship permits uncertainty about each parent’s ability to cause the child to be true: the causal relationship may be inhibited. It assumes: (1) all possible causes are known (a miscellaneous “leak node” can be added if necessary); (2) inhibition of a particular parent is independent of inhibition of the other parents. Can you have a cold or stomach bug without a fever? In the deterministic version, Fever is true iff Cold, Flu, or Malaria is true; noisy-OR allows each cause to be inhibited independently.
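In symbols (the standard noisy-OR form, reconstructed here, where q_j is the probability that the j-th present cause is inhibited):

```latex
P(x_i \mid \mathrm{parents}(X_i)) \;=\; 1 - \prod_{\{j \,:\, x_j = \mathrm{true}\}} q_j
```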

Example. Given: P(fever | cold) = 0.4, P(fever | flu) = 0.8, P(fever | malaria) = 0.9, so the inhibition probabilities are 0.6, 0.2, and 0.1, respectively.

Example. The full CPT can be reconstructed from these parameters, requiring only O(k) parameters rather than O(2^k) entries:

Cold   Flu    Malaria   P(Fever)   P(¬Fever)
F      F      F         0.0        1.0
F      F      T         0.9        0.1
F      T      F         0.8        0.2
F      T      T         0.98       0.02  = 0.2 × 0.1
T      F      F         0.4        0.6
T      F      T         0.94       0.06  = 0.6 × 0.1
T      T      F         0.88       0.12  = 0.6 × 0.2
T      T      T         0.988      0.012 = 0.6 × 0.2 × 0.1
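A minimal Python sketch of the noisy-OR computation, using the inhibition probabilities above (0.6, 0.2, 0.1); it regenerates the whole CPT from k parameters.

```python
from itertools import product

# Inhibition probabilities q_j = P(no fever | only cause j present),
# taken from the table above: q_cold = 0.6, q_flu = 0.2, q_malaria = 0.1.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def p_fever(**causes):
    """Noisy-OR: fever is false only if every present cause is inhibited,
    so P(fever | causes) = 1 - product of q_j over the causes present."""
    p_no_fever = 1.0
    for cause, present in causes.items():
        if present:
            p_no_fever *= q[cause]
    return 1.0 - p_no_fever

# Reproduce the full CPT from O(k) parameters instead of O(2^k) entries.
for cold, flu, malaria in product([False, True], repeat=3):
    print(cold, flu, malaria, p_fever(cold=cold, flu=flu, malaria=malaria))
```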

Networks with Continuous Variables. How are continuous variables represented? (1) Discretization using intervals, which can result in loss of accuracy and large CPTs. (2) Probability density functions specified by a finite number of parameters, e.g., a Gaussian distribution.

Hybrid Bayesian Networks. These contain both discrete and continuous variables. Specifying such a network requires: (1) a conditional distribution for a continuous variable with discrete or continuous parents; (2) a conditional distribution for a discrete variable with continuous parents.

Example. Network: Subsidy and Harvest are the parents of Cost; Cost is the parent of Buys. Cost is a continuous child with a discrete parent (Subsidy) and a continuous parent (Harvest). The continuous parent is handled via a distribution: Cost c depends on the distribution over h, and a linear Gaussian distribution can be used. The discrete parent is handled by explicit enumeration: the distribution must be defined for both values of Subsidy.
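For example, a linear Gaussian conditional for Cost might look like the following (the parameter names a_t, b_t, σ_t and a_f, b_f, σ_f are illustrative assumptions, one set per value of Subsidy):

```latex
P(c \mid h, \mathit{subsidy}) = \mathcal{N}\!\left(c;\; a_t h + b_t,\; \sigma_t^2\right),
\qquad
P(c \mid h, \lnot\mathit{subsidy}) = \mathcal{N}\!\left(c;\; a_f h + b_f,\; \sigma_f^2\right)
```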

Example. Buys is a discrete child with a continuous parent (Cost). Set a soft threshold for cost: the underlying decision process has a hard threshold, but the threshold’s location moves according to random Gaussian noise. This gives the probit distribution, which uses an integral of the standard normal distribution.

Example. The probit distribution is usually a better fit for real problems. The logit distribution uses the sigmoid function to determine the threshold and can be mathematically easier to work with.
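In symbols (standard forms, reconstructed here; μ and σ are the assumed soft-threshold location and width, and Φ is the standard normal CDF):

```latex
\text{Probit: } P(\mathit{buys} \mid \mathit{Cost} = c) = \Phi\!\left(\frac{-c + \mu}{\sigma}\right),
\qquad
\text{Logit: } P(\mathit{buys} \mid \mathit{Cost} = c) = \frac{1}{1 + \exp\!\left(-2\,\frac{-c + \mu}{\sigma}\right)}
```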

Bayes’ Networks and Exact Inference: Notation. X: the query variable. E: the set of evidence variables E1, …, Em; e: a particular observed event. Y: the set of non-evidence variables Y1, …, Yl, also called hidden variables. The complete set of variables is X = {X} ∪ E ∪ Y. A query asks for P(X | e).

College example: CPTs (repeated for reference; the same tables as given above).

Example Query If you succeeded on an exam and had fun, what is the probability of partying? P(Party|Exam=true, Fun=true)

Inference by Enumeration. From Chapter 13 we know how to answer a query by summing the hidden variables out of the full joint distribution; from this chapter we know that each entry P(x, e, y) of the joint distribution can be written as a product of conditional probabilities from the network.
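The two equations being referenced are, in standard form (reconstructed; α is the normalization constant and y ranges over assignments to the hidden variables Y):

```latex
P(X \mid \mathbf{e}) = \alpha\, P(X, \mathbf{e}) = \alpha \sum_{\mathbf{y}} P(X, \mathbf{e}, \mathbf{y}),
\qquad
P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))
```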

Inference by Enumeration. A query can therefore be answered using a Bayes net by computing sums of products of conditional probabilities from the network.

Example Query. If you succeeded on an exam and had fun, what is the probability of partying? P(Party | Exam = true, Fun = true). What are the hidden variables? College and Study.

Example Query. Let C = College, PR = Party, S = Study, E = Exam, F = Fun. Then from equation 13.6 (p. 476): P(PR | e, f) = α P(PR, e, f) = α Σ_c Σ_s P(c, s, PR, e, f).

Example Query. Using the product form of the joint distribution, this can be written in terms of the CPT entries. The worst-case complexity of evaluating this expression is O(n · 2^n) for n variables.
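Expanded into CPT entries (a reconstruction consistent with the CPTs above):

```latex
P(PR \mid e, f) \;=\; \alpha \sum_{c} \sum_{s} P(c)\, P(s \mid c)\, P(PR \mid c)\, P(e \mid s, PR)\, P(f \mid PR)
```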

Example Query: improving the calculation. P(f | PR) is a constant with respect to C and S, so it can be moved outside the summations over C and S. Then move the factors that involve only C (and not S) outside the summation over S.
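After moving the constant P(f | PR) out and nesting the sums (again a reconstruction):

```latex
P(PR \mid e, f) \;=\; \alpha\, P(f \mid PR) \sum_{c} P(c)\, P(PR \mid c) \sum_{s} P(s \mid c)\, P(e \mid s, PR)
```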

College example: CPTs (repeated for reference; the same tables as above, with Party written PR).

Example Query. Evaluating the nested expression for P(pr, e, f) with pr = true:
Inner sum Σ_s P(s | c) P(e | s, pr): for c = true, 0.8 × 0.6 + 0.2 × 0.1 = 0.48 + 0.02 = 0.5; for c = false, 0.2 × 0.6 + 0.8 × 0.1 = 0.12 + 0.08 = 0.2.
Outer sum Σ_c P(c) P(pr | c) (…): 0.2 × 0.6 × 0.5 + 0.8 × 0.5 × 0.2 = 0.06 + 0.08 = 0.14.
Multiplying by P(f | pr) = 0.9 gives 0.9 × 0.14 = 0.126.
Similarly for P(¬pr, e, f). The computation is still O(2^n).
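For illustration, a minimal Python enumeration sketch over the college network (CPTs as above) that reproduces the 0.126 figure and then normalizes to obtain P(Party | exam, fun); the helper names are my own.

```python
from itertools import product

p_college = 0.2
p_study = {True: 0.8, False: 0.2}                     # P(S=True | C)
p_party = {True: 0.6, False: 0.5}                     # P(PR=True | C)
p_exam = {(True, True): 0.6, (True, False): 0.9,
          (False, True): 0.1, (False, False): 0.2}    # P(E=True | S, PR)
p_fun = {True: 0.9, False: 0.7}                       # P(F=True | PR)

def pr(p_true, value):
    """Return P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

def joint(c, s, party, e, f):
    """Full joint probability as a product of CPT entries."""
    return (pr(p_college, c) * pr(p_study[c], s) * pr(p_party[c], party)
            * pr(p_exam[(s, party)], e) * pr(p_fun[party], f))

# Enumerate the hidden variables C and S for each value of the query variable.
unnormalized = {}
for party in (True, False):
    unnormalized[party] = sum(joint(c, s, party, True, True)
                              for c, s in product((True, False), repeat=2))

print(unnormalized[True])                              # 0.126, as computed above
total = sum(unnormalized.values())
print({k: v / total for k, v in unnormalized.items()}) # P(Party | exam, fun)
```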

Variable Elimination A problem with the enumeration method is that particular products can be computed multiple times, thus reducing efficiency. Reduce the number of duplicate calculations by doing the calculation once and saving it for later. Variable elimination evaluates expressions from right to left, stores the intermediate results and sums over each variable for the portions of the expression dependent upon the variable.

Variable Elimination. First, factor the equation: P(PR | e, f) = α P(f | PR) Σ_c P(c) P(PR | c) Σ_s P(s | c) P(e | s, PR). Second, store the factor for E as a 2×2 matrix f_E(S, PR). Third, store the factor for S as a 2×2 matrix f_S(S, C).

Variable Elimination. Fourth, sum out S from the pointwise product of the first two factors: f1(C, PR) = Σ_s f_S(s, C) × f_E(s, PR). A pointwise product creates a new factor whose variables are the union of the variables of the two factors in the product. Any factor that does not depend on the variable to be summed out can be moved outside the summation.
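A minimal Python sketch of a factor, the pointwise product, and summing out a variable; the Factor representation is an assumption for illustration, and the example reproduces the 0.5 and 0.2 values computed earlier.

```python
from itertools import product

class Factor:
    """A factor: a list of variable names plus a table mapping
    assignments (tuples of booleans, in variable order) to numbers."""
    def __init__(self, variables, table):
        self.variables = list(variables)
        self.table = dict(table)

def pointwise_product(f1, f2):
    """New factor over the union of the two factors' variables; each entry
    is the product of the matching entries of f1 and f2."""
    variables = f1.variables + [v for v in f2.variables if v not in f1.variables]
    table = {}
    for assignment in product((True, False), repeat=len(variables)):
        env = dict(zip(variables, assignment))
        v1 = f1.table[tuple(env[v] for v in f1.variables)]
        v2 = f2.table[tuple(env[v] for v in f2.variables)]
        table[assignment] = v1 * v2
    return Factor(variables, table)

def sum_out(var, factor):
    """Sum a variable out of a factor, leaving a factor over the rest."""
    remaining = [v for v in factor.variables if v != var]
    idx = factor.variables.index(var)
    table = {}
    for assignment, value in factor.table.items():
        key = tuple(v for i, v in enumerate(assignment) if i != idx)
        table[key] = table.get(key, 0.0) + value
    return Factor(remaining, table)

# Factors named on the slides: f_S(S, C) = P(S | C) and f_E(S, PR) = P(e | S, PR).
f_S = Factor(["S", "C"], {(True, True): 0.8, (True, False): 0.2,
                          (False, True): 0.2, (False, False): 0.8})
f_E = Factor(["S", "PR"], {(True, True): 0.6, (True, False): 0.9,
                           (False, True): 0.1, (False, False): 0.2})
f1 = sum_out("S", pointwise_product(f_S, f_E))   # a factor over (C, PR)
print(f1.variables, f1.table)                    # includes 0.5 and 0.2 entries
```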

Variable Elimination. Fifth, store the factor for PR as a 2×2 matrix f_PR(PR, C). Sixth, store the factor for C, f_C(C).

Variable Elimination. Seventh, sum out C from the product of the factors: f2(PR) = Σ_c f_C(c) × f_PR(PR, c) × f1(c, PR), where f1 is the factor produced by summing out S.

Variable Elimination. Next, store the factor for F, f_F(PR), with F fixed to true. Finally, calculate the final result: P(PR | e, f) = α f_F(PR) × f2(PR).

Elimination Simplification Any leaf node that is not a query variable or an evidence variable can be removed. Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant to the query and can be eliminated.

Elimination Simplification. Book example (the burglary network: Burglary and Earthquake → Alarm → JohnCalls and MaryCalls): what is the probability that John calls if there is a burglary? Does MaryCalls matter? No: MaryCalls is a leaf node that is neither a query nor an evidence variable, so it can be removed.

Complexity of Exact Inference. Variable elimination is more efficient than enumeration. Time and space requirements are dominated by the size of the largest factor constructed, which is determined by the order of variable elimination and the network structure.

Polytrees. Polytrees are singly connected networks: there is at most one undirected path between any two nodes. Time and space requirements for exact inference are linear in the size of the network, where size is the number of CPT entries.

Polytrees. Are these networks polytrees? (The college network: College → Study, Party; Study, Party → Exam; Party → Fun. The burglary network: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls.) The college network is not a polytree, since College–Study–Exam–Party–College forms an undirected cycle, while the burglary network is. Applying variable elimination to multiply connected networks has worst-case exponential time and space complexity.