
UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 580 Artificial Intelligence Section 6.4.1: Probabilistic Inference and Variable Elimination Fall 2009 Marco Valtorta

Probability does not exist. --Bruno de Finetti, 1970

It is remarkable that a science which began with the consideration of games of chance should become the most important object of human knowledge... The most important questions of life are, for the most part, really only problems of probability... The theory of probabilities is at bottom nothing but common sense reduced to calculus. --Pierre Simon de Laplace, 1812

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Acknowledgment
The slides are based on the textbook [P] and other sources, including other fine textbooks:
–[AIMA-2]
–David Poole, Alan Mackworth, and Randy Goebel. Computational Intelligence: A Logical Approach. Oxford, 1998. A second edition (by Poole and Mackworth) is under development; Dr. Poole allowed us to use a draft of it in this course.
–Ivan Bratko. Prolog Programming for Artificial Intelligence, Third Edition. Addison-Wesley, 2001. The fourth edition is under development.
–George F. Luger. Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Sixth Edition. Addison-Wesley, 2009.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Probabilistic Inference Methods
Four main approaches to determining posterior distributions in belief networks:
–Exploiting the structure of the network to eliminate (sum out) the non-observed, non-query variables one at a time.
–Stochastic simulation, in which random cases are generated according to the probability distributions (a small sketch follows below).
–Search-based approaches that enumerate some of the possible worlds and estimate posterior probabilities from the worlds generated.
–Variational approaches, in which the idea is to find an approximation to the problem that is easy to compute. First choose a class of representations that are easy to compute; this could be as simple as the set of disconnected belief networks (with no arcs). Then try to find the member of the class that is closest to the original problem, that is, an easy-to-compute distribution that is as close as possible to the posterior distribution that needs to be computed. The problem thus reduces to an optimization problem of minimizing the error.
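The following is a minimal Python sketch of the stochastic simulation approach mentioned above: sample each variable given its already-sampled parents and estimate a posterior by rejection, keeping only the samples consistent with the evidence. The two-variable network and all numbers are illustrative, not taken from the textbook.

import random

def sample_world(rng):
    # Forward-sample a tiny two-variable network A -> B.
    a = rng.random() < 0.3                      # P(A = true) = 0.3
    b = rng.random() < (0.9 if a else 0.2)      # P(B = true | A)
    return a, b

def estimate_p_a_given_b_true(n_samples=100000, seed=0):
    rng = random.Random(seed)
    kept = 0
    hits = 0
    for _ in range(n_samples):
        a, b = sample_world(rng)
        if b:                                   # rejection: keep only samples with B = true
            kept += 1
            hits += a
    return hits / kept

print(estimate_p_a_given_b_true())              # close to 0.27 / 0.41, about 0.66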

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Factors A factor is a function from a tuple of random variables to a number In probabilistic inference, factors usually represent joint probability distributions, conditional probability distributions, or non-normalized probability distributions (potentials) Factors are usually realized as tables, but they can exploit context-specific independence, and be realized as decision trees, rules with probabilities, and tables with contexts
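As an illustration of the tabular realization of factors, here is a minimal Python sketch; the Factor class, variable names, and numbers are illustrative, not from [P].

class Factor:
    """A factor: a function from a tuple of variable values to a number, stored as a table."""
    def __init__(self, variables, table):
        self.variables = tuple(variables)   # e.g. ("A", "B")
        self.table = dict(table)            # maps value tuples to numbers

    def __call__(self, assignment):
        # Look up the value for a full assignment given as a dict {variable: value}.
        return self.table[tuple(assignment[v] for v in self.variables)]

# Example: the conditional probability table P(B | A) realized as a factor on (A, B).
p_b_given_a = Factor(("A", "B"), {
    (True, True): 0.9,  (True, False): 0.1,
    (False, True): 0.2, (False, False): 0.8,
})
print(p_b_given_a({"A": True, "B": False}))     # prints 0.1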

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Examples of Factors with context- specific independence

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Example factors

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Multiplying factors
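A hedged sketch of factor multiplication: the product of a factor on X1 and a factor on X2 is a factor on X1 ∪ X2 whose value at each assignment is the product of the corresponding table entries. The representation (a variable list plus a table keyed by value tuples) and the numbers are illustrative.

from itertools import product

def multiply(vars1, t1, vars2, t2, domains):
    """Pointwise product of two tabular factors; domains maps each variable to its values."""
    out_vars = list(vars1) + [v for v in vars2 if v not in vars1]
    out_table = {}
    for values in product(*(domains[v] for v in out_vars)):
        a = dict(zip(out_vars, values))
        out_table[values] = (t1[tuple(a[v] for v in vars1)] *
                             t2[tuple(a[v] for v in vars2)])
    return out_vars, out_table

# Example: P(A) * P(B | A) yields the joint P(A, B).
domains = {"A": [True, False], "B": [True, False]}
p_a = {(True,): 0.4, (False,): 0.6}
p_b_given_a = {(True, True): 0.9, (True, False): 0.1,
               (False, True): 0.2, (False, False): 0.8}
joint_vars, joint = multiply(["A"], p_a, ["A", "B"], p_b_given_a, domains)
print(joint[(True, True)])                      # 0.4 * 0.9 = 0.36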

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Summing out variables
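A sketch of summing out (marginalizing) a variable from a tabular factor: the result is a factor on the remaining variables, each entry being the sum over the eliminated variable's values. Representation and numbers are illustrative.

from itertools import product

def sum_out(var, vars_, table, domains):
    """Sum var out of a tabular factor on vars_, returning a factor on the remaining variables."""
    out_vars = [v for v in vars_ if v != var]
    out_table = {}
    for values in product(*(domains[v] for v in out_vars)):
        a = dict(zip(out_vars, values))
        out_table[values] = sum(table[tuple({**a, var: x}[v] for v in vars_)]
                                for x in domains[var])
    return out_vars, out_table

# Example: summing A out of the joint P(A, B) gives the marginal P(B).
domains = {"A": [True, False], "B": [True, False]}
joint = {(True, True): 0.36, (True, False): 0.04,
         (False, True): 0.12, (False, False): 0.48}
vars_b, p_b = sum_out("A", ["A", "B"], joint, domains)
print(p_b[(True,)])                             # 0.36 + 0.12 = 0.48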

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Evidence
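A sketch of conditioning a factor on evidence: an observed variable is restricted by keeping only the table rows consistent with the observation and dropping the variable from the factor's scope. Names and numbers are illustrative.

def restrict(var, value, vars_, table):
    """Condition a tabular factor on the observation var = value."""
    i = vars_.index(var)
    out_vars = [v for v in vars_ if v != var]
    out_table = {key[:i] + key[i + 1:]: p
                 for key, p in table.items() if key[i] == value}
    return out_vars, out_table

# Example: observe B = true in a factor on (A, B).
p_ab = {(True, True): 0.36, (True, False): 0.04,
        (False, True): 0.12, (False, False): 0.48}
vars_a, f_a = restrict("B", True, ["A", "B"], p_ab)
print(f_a)      # {(True,): 0.36, (False,): 0.12} -- the unnormalized P(A, B=true)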

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Probability of a conjunction

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Computing sums of products

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Variable elimination algorithm (see Figure 6.8 [P])
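The following self-contained Python sketch illustrates the same idea as the algorithm of Figure 6.8 [P], but it is not a transcription of that figure: restrict the factors by the evidence, repeatedly multiply the factors that mention an elimination variable and sum that variable out, multiply what remains, and normalize. The three-variable chain network at the bottom is illustrative.

from itertools import product

def multiply(f1, f2, domains):
    # Pointwise product of two factors, each represented as (variables, table).
    v1, t1 = f1
    v2, t2 = f2
    out_vars = list(v1) + [v for v in v2 if v not in v1]
    out = {}
    for values in product(*(domains[v] for v in out_vars)):
        a = dict(zip(out_vars, values))
        out[values] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return out_vars, out

def sum_out(var, f, domains):
    # Marginalize var out of a factor.
    vars_, table = f
    out_vars = [v for v in vars_ if v != var]
    out = {}
    for values in product(*(domains[v] for v in out_vars)):
        a = dict(zip(out_vars, values))
        out[values] = sum(table[tuple({**a, var: x}[v] for v in vars_)]
                          for x in domains[var])
    return out_vars, out

def restrict(var, val, f):
    # Condition a factor on the observation var = val.
    vars_, table = f
    i = vars_.index(var)
    return ([v for v in vars_ if v != var],
            {k[:i] + k[i + 1:]: p for k, p in table.items() if k[i] == val})

def variable_elimination(query, evidence, factors, domains, order):
    # Restrict each factor by the evidence it mentions.
    fs = []
    for f in factors:
        for var, val in evidence.items():
            if var in f[0]:
                f = restrict(var, val, f)
        fs.append(f)
    # Eliminate each non-query, non-evidence variable in the given order.
    for var in order:
        if var == query or var in evidence:
            continue
        relevant = [f for f in fs if var in f[0]]
        fs = [f for f in fs if var not in f[0]]
        prod = relevant[0]
        for f in relevant[1:]:
            prod = multiply(prod, f, domains)
        fs.append(sum_out(var, prod, domains))
    # Multiply what is left and normalize over the query variable.
    result = fs[0]
    for f in fs[1:]:
        result = multiply(result, f, domains)
    total = sum(result[1].values())
    return {k: v / total for k, v in result[1].items()}

# Tiny illustrative chain A -> B -> C; query P(A | C = true).
domains = {"A": [True, False], "B": [True, False], "C": [True, False]}
factors = [
    (["A"], {(True,): 0.3, (False,): 0.7}),
    (["A", "B"], {(True, True): 0.9, (True, False): 0.1,
                  (False, True): 0.2, (False, False): 0.8}),
    (["B", "C"], {(True, True): 0.8, (True, False): 0.2,
                  (False, True): 0.1, (False, False): 0.9}),
]
print(variable_elimination("A", {"C": True}, factors, domains, ["B"]))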

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Summing out a variable

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Variable elimination example

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Visit to Asia Example Shortness of breath (dyspnoea) may be due to tuberculosis, lung cancer or bronchitis, or none of them, or more than one of them. A recent visit to Asia increases the chances of tuberculosis, while smoking is known to be a risk factor for both lung cancer and bronchitis. The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, as neither does the presence of dyspnoea [Lauritzen and Spiegelhalter, 1988].

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Visit to Asia Example Tuberculosis and lung cancer can cause shortness of breath (dyspnea) with equal likelihood. The same is true for a positive chest X-ray (i.e., a positive chest X-ray is also equally likely given either tuberculosis or lung cancer). Bronchitis is another cause of dyspnea. A recent visit to Asia increases the likelihood of tuberculosis, while smoking is a possible cause of both lung cancer and bronchitis [Neapolitan, 1990].

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Visit to Asia Example: network structure and conditional probability tables
Arcs: α→τ; σ→λ; σ→β; τ→ε; λ→ε; ε→ξ; ε→δ; β→δ.
α (Asia): P(a)=.01
τ (TB): P(t|a)=.05, P(t|~a)=.01
σ (Smoking): P(s)=.5
λ (Lung cancer): P(l|s)=.1, P(l|~s)=.01
β (Bronchitis): P(b|s)=.6, P(b|~s)=.3
ε (λ or τ): P(e|l,t)=1, P(e|l,~t)=1, P(e|~l,t)=1, P(e|~l,~t)=0
ξ (X-ray): P(x|e)=.98, P(x|~e)=.05
δ (Dyspnea): P(d|e,b)=.9, P(d|e,~b)=.7, P(d|~e,b)=.8, P(d|~e,~b)=.1

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Three Computational Problems
For a Bayesian network, we present algorithms for:
–Belief Assessment
–Most Probable Explanation (MPE)
–Maximum a Posteriori Hypothesis (MAP)

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Belief Assessment
Definition – The belief assessment task of X_k = x_k is to find
bel(x_k) = P(X_k = x_k | evidence) = k · Σ_{X \ {X_k}} Π_i P(x_i | x_pa(i)),
where k is a normalizing constant.
In the Visit to Asia example, the belief assessment problem answers questions like: What is the probability that a person has tuberculosis, given that he/she has dyspnea and has visited Asia recently?
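To make the query concrete, here is a hedged Python encoding of the Visit to Asia conditional probabilities listed earlier, with a brute-force enumeration over all 2^8 worlds to answer exactly this belief-assessment question. Enumeration is exponential and is shown only to make the numbers tangible; it is not the elimination algorithm.

from itertools import product

VARS = ["a", "t", "s", "l", "b", "e", "x", "d"]   # asia, tb, smoking, lung cancer,
                                                  # bronchitis, tb-or-lung-cancer, x-ray, dyspnea

def joint(w):
    """Joint probability of one complete world w (a dict of Booleans)."""
    p = 0.01 if w["a"] else 0.99                                               # P(a)
    p *= (0.05 if w["a"] else 0.01) if w["t"] else (0.95 if w["a"] else 0.99)  # P(t | a)
    p *= 0.5                                                                   # P(s)
    p *= (0.1 if w["s"] else 0.01) if w["l"] else (0.9 if w["s"] else 0.99)    # P(l | s)
    p *= (0.6 if w["s"] else 0.3) if w["b"] else (0.4 if w["s"] else 0.7)      # P(b | s)
    pe = 1.0 if (w["t"] or w["l"]) else 0.0                                    # e = t OR l (deterministic)
    p *= pe if w["e"] else 1.0 - pe
    p *= (0.98 if w["e"] else 0.05) if w["x"] else (0.02 if w["e"] else 0.95)  # P(x | e)
    pd = (0.9 if w["b"] else 0.7) if w["e"] else (0.8 if w["b"] else 0.1)      # P(d | e, b)
    p *= pd if w["d"] else 1.0 - pd
    return p

worlds = [dict(zip(VARS, vals)) for vals in product([True, False], repeat=len(VARS))]
num = sum(joint(w) for w in worlds if w["t"] and w["d"] and w["a"])
den = sum(joint(w) for w in worlds if w["d"] and w["a"])
print("P(tuberculosis | dyspnea, visit to Asia) =", num / den)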

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Most Probable Explanation (MPE)
Definition – The MPE task is to find an assignment x° = (x°_1, …, x°_n) such that
P(x°) = max_{x_1, …, x_n} Π_i P(x_i | x_pa(i)),
where the maximization is over assignments consistent with the evidence.
In the Visit to Asia example, the MPE problem answers questions like: What are the most probable values for all variables such that a person doesn't have dyspnea?
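Similarly, a brute-force check of this MPE query on the same (reconstructed) Visit to Asia tables: enumerate all worlds with dyspnea = false and keep the one of highest joint probability. Again, this only illustrates what the max-product elimination algorithm on the later slides computes; it is not that algorithm.

from itertools import product

CPTS = [  # (child, parents, table mapping (child value, parent values...) -> probability)
    ("a", (), {(True,): 0.01, (False,): 0.99}),
    ("t", ("a",), {(True, True): 0.05, (True, False): 0.01,
                   (False, True): 0.95, (False, False): 0.99}),
    ("s", (), {(True,): 0.5, (False,): 0.5}),
    ("l", ("s",), {(True, True): 0.1, (True, False): 0.01,
                   (False, True): 0.9, (False, False): 0.99}),
    ("b", ("s",), {(True, True): 0.6, (True, False): 0.3,
                   (False, True): 0.4, (False, False): 0.7}),
    ("e", ("t", "l"), {(True, True, True): 1.0, (True, True, False): 1.0,
                       (True, False, True): 1.0, (True, False, False): 0.0,
                       (False, True, True): 0.0, (False, True, False): 0.0,
                       (False, False, True): 0.0, (False, False, False): 1.0}),
    ("x", ("e",), {(True, True): 0.98, (True, False): 0.05,
                   (False, True): 0.02, (False, False): 0.95}),
    ("d", ("e", "b"), {(True, True, True): 0.9, (True, True, False): 0.7,
                       (True, False, True): 0.8, (True, False, False): 0.1,
                       (False, True, True): 0.1, (False, True, False): 0.3,
                       (False, False, True): 0.2, (False, False, False): 0.9}),
]
VARS = [child for child, _, _ in CPTS]

def joint(w):
    """Product of all conditional probabilities for the complete world w."""
    p = 1.0
    for child, parents, table in CPTS:
        p *= table[(w[child],) + tuple(w[pa] for pa in parents)]
    return p

worlds = [dict(zip(VARS, vals)) for vals in product([True, False], repeat=len(VARS))]
best = max((w for w in worlds if not w["d"]), key=joint)   # evidence: dyspnea = "no"
print(best, joint(best))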

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Maximum A Posteriori Hypothesis (MAP)
Definition – Given a set of hypothesized variables A = {A_1, …, A_k}, the MAP task is to find an assignment a° = (a°_1, …, a°_k) such that
P(a°) = max_{a_1, …, a_k} Σ_{X \ A} Π_i P(x_i | x_pa(i)),
with the sum over assignments consistent with the evidence.
In the Visit to Asia example, the MAP problem answers questions like: What are the most probable values for a person having both lung cancer and bronchitis, given that he/she has dyspnea and that his/her X-ray is positive?

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Axioms for Local Computation

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Comments on the Axioms (part I)
Presentation of the axioms is from Madsen’s dissertation (section 3.1.1), after Shenoy and Shafer.
The best description of the axioms is in: Shenoy, Prakash P. “Valuation-Based Systems for Discrete Optimization.” Uncertainty in Artificial Intelligence, 6 (P.P. Bonissone, M. Henrion, L.N. Kanal, eds.), pp.
The first axiom is written in quite a different form in that reference, but Shenoy notes that his axiom “can be interpreted as saying that the order in which we delete the variables does not matter,” “if we regard marginalization as a reduction of a valuation by deleting variables.” This seems to be what Madsen emphasizes in his axiom 1.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Comments on the Axioms (part II)
Another key reference is: S. Bistarelli, U. Montanari, and F. Rossi. “Semiring-Based Constraint Satisfaction and Optimization,” Journal of the ACM 44, 2 (March 1997), pp.
–This is an abstract algebraic treatment.
–The authors explicitly mention Shenoy’s axioms as a special case in section 5, where they also discuss the solution of the secondary problem of Non-Serial Dynamic Programming, as introduced in: Bertelè and Brioschi, Non-Serial Dynamic Programming, Academic Press, 1972.
An alternative algebraic generalization is in: S.L. Lauritzen and F.V. Jensen, “Local Computations with Valuations from a Commutative Semigroup,” Annals of Mathematics and Artificial Intelligence 21 (1997), pp.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Some Algorithms for Belief Update
Construct joint first (not based on local computation)
Stochastic Simulation (not based on local computation)
Conditioning (not based on local computation)
Direct Computation
–Variable elimination: bucket elimination (described next), variable elimination proper, peeling
–Combination of potentials: SPI, factor trees
Junction trees: L&S (Lauritzen-Spiegelhalter), Shafer-Shenoy, Hugin, Lazy propagation
Polynomials: Castillo et al., Darwiche

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Ordering the Variables (illustrated on the graph of the Visit to Asia network)
Method 1 (Minimum deficiency): begin elimination with the node that adds the fewest edges.
Method 2 (Minimum degree): begin elimination with the node that has the lowest degree.
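A sketch of the two greedy ordering heuristics, operating on an undirected (moral) graph given as an adjacency dictionary; the example graph is illustrative, not the Visit to Asia network.

def min_degree_order(adj):
    """Repeatedly eliminate a vertex of minimum degree, connecting its neighbors."""
    adj = {v: set(ns) for v, ns in adj.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        neighbors = adj.pop(v)
        for u in neighbors:                       # fill in: the neighbors become a clique
            adj[u] |= neighbors - {u}
            adj[u].discard(v)
        order.append(v)
    return order

def fill_in(adj, v):
    """Number of edges that eliminating v would add (its 'deficiency')."""
    ns = list(adj[v])
    return sum(1 for i in range(len(ns)) for j in range(i + 1, len(ns))
               if ns[j] not in adj[ns[i]])

def min_deficiency_order(adj):
    """Repeatedly eliminate a vertex that adds the fewest fill-in edges."""
    adj = {v: set(ns) for v, ns in adj.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: fill_in(adj, u))
        neighbors = adj.pop(v)
        for u in neighbors:
            adj[u] |= neighbors - {u}
            adj[u].discard(v)
        order.append(v)
    return order

graph = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"}, "D": {"B", "C"}}
print(min_degree_order(graph))
print(min_deficiency_order(graph))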

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Elimination Algorithm for Belief Assessment
Each conditional probability table is placed in the bucket of the latest of its variables in the elimination ordering; the observed variables are fixed to their observed values (here ξ=“yes” and δ=“yes”).
The query is
P(α | ξ=“yes”, δ=“yes”) = k · Σ_{X \ {α}} ( P(α) · P(τ|α) · P(σ) · P(λ|σ) · P(β|σ) · P(ε|τ,λ) · P(ξ|ε) · P(δ|ε,β) ),
where k is a normalizing constant.
Buckets are processed from the last to the first: the functions in a bucket are multiplied and the bucket variable is summed out, producing a new function H that is placed in the bucket of the latest remaining variable in its scope:
H_n(u) = Σ_{x_n} Π_{i=1..j} C_i(x_n, u_{S_i}), with the final result multiplied by the normalizing constant k.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Elimination Algorithm for Most Probable Explanation
The same bucket scheme is used, with maximization in place of summation; the evidence here is δ=“no”.
Finding the MPE means computing
MPE = max_{α,σ,τ,λ,β,ε,δ,ξ} ( P(α) · P(τ|α) · P(σ) · P(λ|σ) · P(β|σ) · P(ε|τ,λ) · P(ξ|ε) · P(δ|ε,β) )
over assignments consistent with the evidence. Processing a bucket now maximizes over the bucket variable:
H_n(u) = max_{x_n} Π_{C ∈ F_n} C(x_n | x_pa).
The value obtained after the last bucket is processed is the MPE probability.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Elimination Algorithm for Most Probable Explanation (continued)
Forward part: after the buckets have been processed, the maximizing assignment is recovered by visiting the variables in the elimination ordering and assigning to each the value that maximizes the product of the functions in its bucket, given the values already assigned to the earlier variables; observed variables keep their observed values (δ’=“no”).
Return: the assignment (α’, τ’, σ’, λ’, β’, ε’, δ’, ξ’).

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Complexity and junction trees
The complexity of the algorithm depends on a measure of complexity of the network. The size of a tabular representation of a factor is exponential in the number of variables in the factor.
The treewidth of a network, given an elimination ordering, is the maximum number of variables in a factor created by summing out a variable, given the elimination ordering.
–The treewidth of a belief network is the minimum treewidth over all elimination orderings.
–The treewidth depends only on the graph structure and is a measure of the sparseness of the graph.
–The complexity of variable elimination is exponential in the treewidth and linear in the number of variables.
–Finding the elimination ordering with minimum treewidth is NP-hard, but there are good elimination-ordering heuristics, as discussed for CSP variable elimination (page 130 [P]).
There are two main ways to speed up this algorithm. Irrelevant variables can be pruned given the observations and the query. Alternatively, it is possible to compile the graph into a secondary structure that allows for caching of values. This leads to the justly celebrated junction tree algorithm.
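A small sketch relating an elimination ordering to the width measure used above: simulate the elimination on the moral graph and record the largest set of neighbors encountered when a vertex is eliminated (conventions differ by one on whether the eliminated variable itself is counted). The graph and ordering are illustrative.

def induced_width(adj, order):
    """Width induced by eliminating the vertices of adj in the given order."""
    adj = {v: set(ns) for v, ns in adj.items()}
    width = 0
    for v in order:
        neighbors = adj.pop(v)
        width = max(width, len(neighbors))        # variables in the factor created by summing out v
        for u in neighbors:                       # connect the remaining neighbors
            adj[u] |= neighbors - {u}
            adj[u].discard(v)
    return width

graph = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"}, "D": {"B", "C"}}
print(induced_width(graph, ["A", "C", "B", "D"]))   # width for this particular ordering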