
1 UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 580 Artificial Intelligence Section 6.4.1: Probabilistic Inference and Variable Elimination Fall 2009 Marco Valtorta mgv@cse.sc.edu
Probability does not exist. --Bruno de Finetti, 1970
It is remarkable that a science which began with the consideration of games of chance should become the most important object of human knowledge... The most important questions of life are, for the most part, really only problems of probability... The theory of probabilities is at bottom nothing but common sense reduced to calculus. --Pierre Simon de Laplace, 1812

2 Acknowledgment The slides are based on the textbook [P] and other sources, including other fine textbooks:
–[AIMA-2]
–David Poole, Alan Mackworth, and Randy Goebel. Computational Intelligence: A Logical Approach. Oxford, 1998. A second edition (by Poole and Mackworth) is under development; Dr. Poole allowed us to use a draft of it in this course.
–Ivan Bratko. Prolog Programming for Artificial Intelligence, Third Edition. Addison-Wesley, 2001. The fourth edition is under development.
–George F. Luger. Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Sixth Edition. Addison-Wesley, 2009.

3 Probabilistic Inference Methods Four main approaches to determining posterior distributions in belief networks:
–Exploiting the structure of the network to eliminate (sum out) the non-observed, non-query variables one at a time.
–Stochastic simulation, where random cases are generated according to the probability distributions.
–Search-based approaches that enumerate some of the possible worlds and estimate posterior probabilities from the worlds generated.
–Variational approaches, where the idea is to find an approximation to the problem that is easy to compute. First choose a class of representations that are easy to compute (this could be as simple as the set of disconnected belief networks, with no arcs). Next try to find the member of the class that is closest to the original problem: that is, find an easy-to-compute distribution that is as close as possible to the posterior distribution that needs to be computed. The problem thus reduces to an optimization problem of minimizing the error.

4 Factors A factor is a function from a tuple of random variables to a number. In probabilistic inference, factors usually represent joint probability distributions, conditional probability distributions, or non-normalized probability distributions (potentials). Factors are usually realized as tables, but they can exploit context-specific independence and be realized as decision trees, rules with probabilities, and tables with contexts.
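The slide does not prescribe a data structure, but a factor-as-table is easy to sketch. The following is a minimal sketch, assuming Boolean variables and representing a factor as a pair (tuple of variable names, table); the name `make_factor` and the example variable names are illustrative, not from the slides.

```python
from itertools import product

DOMAIN = (True, False)  # assumption: all variables are Boolean

def make_factor(variables, fn):
    """Tabulate fn over every assignment of the given variables.

    A factor is a pair: a tuple of variable names and a table
    mapping each assignment tuple to a number.
    """
    return (tuple(variables),
            {assign: fn(dict(zip(variables, assign)))
             for assign in product(DOMAIN, repeat=len(variables))})

# Illustrative example: P(tuberculosis | visit to Asia), with the
# numbers from the Visit to Asia slides later in the deck.
f_tb = make_factor(("T", "A"),
                   lambda v: (0.05 if v["A"] else 0.01) if v["T"]
                   else (0.95 if v["A"] else 0.99))
```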

5 Examples of factors with context-specific independence

6 Example factors

7 Multiplying factors
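The operation is standard: the product of two factors is a factor on the union of their variables, with each entry the product of the matching entries of the operands. A minimal sketch in the (variables, table) representation from the earlier sketch:

```python
from itertools import product

DOMAIN = (True, False)  # assumption: Boolean variables, as above

def multiply(f1, f2):
    """Pointwise product of two factors on the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for assign in product(DOMAIN, repeat=len(out_vars)):
        value = dict(zip(out_vars, assign))
        # Look up each operand on its own projection of the assignment.
        table[assign] = (t1[tuple(value[v] for v in vars1)]
                         * t2[tuple(value[v] for v in vars2)])
    return (out_vars, table)
```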

8 Summing out variables
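Summing out (marginalizing) a variable removes it from a factor by adding together the entries that agree on all the other variables. A sketch in the same representation:

```python
def sum_out(var, factor):
    """Eliminate var from a factor by summing over its values."""
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for assign, value in table.items():
        key = assign[:i] + assign[i + 1:]  # drop var's position
        out[key] = out.get(key, 0.0) + value
    return (out_vars, out)
```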

9 Evidence
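Evidence is incorporated by restricting each factor that mentions an observed variable to the observed value, which drops that variable from the factor. A sketch:

```python
def restrict(factor, var, value):
    """Condition a factor on the observation var = value."""
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    # Keep only the rows consistent with the observation.
    out = {assign[:i] + assign[i + 1:]: val
           for assign, val in table.items() if assign[i] == value}
    return (out_vars, out)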

10 Probability of a conjunction
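In outline: variable elimination computes the non-normalized probability of the conjunction of the query and the evidence, and the posterior follows by normalization:
P(Y = y | E = e) = P(Y = y ∧ E = e) / P(E = e) = P(Y = y ∧ E = e) / Σ_y' P(Y = y' ∧ E = e).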

11 Computing sums of products
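The key to computing sums of products efficiently is distributivity: a sum over a variable can be pushed inside the product of the factors that do not mention it. For example, with factors f(A,B) and g(B,C),
Σ_B Σ_C f(A,B) * g(B,C) = Σ_B f(A,B) * (Σ_C g(B,C)),
so g is summed over C once per value of B instead of the product being recomputed for every joint assignment. Variable elimination applies this rewriting systematically, one variable at a time.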

12 Variable elimination algorithm See Figure 6.8 [P]
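A minimal sketch of the algorithm, assuming the `multiply`, `sum_out`, and `restrict` helpers sketched above; it follows the outline of Figure 6.8 [P] but is not a transcription of it:

```python
def variable_elimination(query_var, evidence, factors, ordering):
    """Posterior distribution of query_var given evidence.

    evidence: dict mapping observed variable names to values.
    ordering: an elimination ordering over the network's variables.
    """
    # Restrict every factor that mentions an observed variable.
    for var, value in evidence.items():
        factors = [restrict(f, var, value) if var in f[0] else f
                   for f in factors]
    # Eliminate the non-query, non-observed variables one at a time.
    for var in ordering:
        if var == query_var or var in evidence:
            continue
        relevant = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = relevant[0]
        for f in relevant[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(var, prod))
    # Multiply the remaining factors and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    total = sum(result[1].values())
    return {assign: val / total for assign, val in result[1].items()}
```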

13 Summing out a variable

14 Variable elimination example

15 Visit to Asia Example Shortness of breath (dyspnoea) may be due to tuberculosis, lung cancer or bronchitis, or none of them, or more than one of them. A recent visit to Asia increases the chances of tuberculosis, while smoking is known to be a risk factor for both lung cancer and bronchitis. The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, and neither does the presence of dyspnoea [Lauritzen and Spiegelhalter, 1988].

16 Visit to Asia Example Tuberculosis and lung cancer can cause shortness of breath (dyspnea) with equal likelihood. The same is true for a positive chest X-ray (i.e., a positive chest X-ray is also equally likely given either tuberculosis or lung cancer). Bronchitis is another cause of dyspnea. A recent visit to Asia increases the likelihood of tuberculosis, while smoking is a possible cause of both lung cancer and bronchitis [Neapolitan, 1990].

17 Visit to Asia Example The network has eight Boolean variables: α (visit to Asia), τ (tuberculosis), σ (smoking), λ (lung cancer), β (bronchitis), ε (λ or τ), ξ (positive chest X-ray), δ (dyspnea). Its conditional probability tables:
α (Asia): P(a) = .01
τ (TB): P(t|a) = .05, P(t|~a) = .01
σ (Smoking): P(s) = .5
λ (Lung cancer): P(l|s) = .1, P(l|~s) = .01
β (Bronchitis): P(b|s) = .6, P(b|~s) = .3
ε (λ or τ): P(e|l,t) = 1, P(e|l,~t) = 1, P(e|~l,t) = 1, P(e|~l,~t) = 0
ξ (X-ray): P(x|e) = .98, P(x|~e) = .05
δ (Dyspnea): P(d|e,b) = .9, P(d|e,~b) = .7, P(d|~e,b) = .8, P(d|~e,~b) = .1
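The same network can be transcribed directly into the (variables, table) representation used in the sketches above. The single-letter Latin names (A, T, S, L, B, E, X, D for α, τ, σ, λ, β, ε, ξ, δ) are a transliteration, with True standing for the positive value:

```python
asia_factors = [
    (("A",), {(True,): 0.01, (False,): 0.99}),                   # P(α)
    (("T", "A"), {(True, True): 0.05, (True, False): 0.01,
                  (False, True): 0.95, (False, False): 0.99}),   # P(τ|α)
    (("S",), {(True,): 0.5, (False,): 0.5}),                     # P(σ)
    (("L", "S"), {(True, True): 0.1, (True, False): 0.01,
                  (False, True): 0.9, (False, False): 0.99}),    # P(λ|σ)
    (("B", "S"), {(True, True): 0.6, (True, False): 0.3,
                  (False, True): 0.4, (False, False): 0.7}),     # P(β|σ)
    (("E", "L", "T"),                                            # P(ε|λ,τ): ε is λ or τ
     {(e, l, t): (1.0 if e == (l or t) else 0.0)
      for e in (True, False) for l in (True, False)
      for t in (True, False)}),
    (("X", "E"), {(True, True): 0.98, (True, False): 0.05,
                  (False, True): 0.02, (False, False): 0.95}),   # P(ξ|ε)
    (("D", "E", "B"),                                            # P(δ|ε,β)
     {(True, True, True): 0.9, (True, True, False): 0.7,
      (True, False, True): 0.8, (True, False, False): 0.1,
      (False, True, True): 0.1, (False, True, False): 0.3,
      (False, False, True): 0.2, (False, False, False): 0.9}),
]
```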

18 Three Computational Problems For a Bayesian network, we present algorithms for
–Belief Assessment
–Most Probable Explanation (MPE)
–Maximum A Posteriori Hypothesis (MAP)

19 Belief Assessment Definition
–The belief assessment task of X_k = x_k is to find
bel(x_k) = P(X_k = x_k | e) = k * Σ_{X \ {X_k}} Π_i P(x_i | x_pa(i), e)
where k is a normalizing constant.
In the Visit to Asia example, the belief assessment problem answers questions like
–What is the probability that a person has tuberculosis, given that he/she has dyspnea and has visited Asia recently?
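As a usage sketch, the tuberculosis question above can be posed with the `variable_elimination` function and `asia_factors` list from the earlier sketches; the elimination ordering shown is one arbitrary choice among many:

```python
posterior = variable_elimination(
    query_var="T",
    evidence={"D": True, "A": True},  # dyspnea = yes, visited Asia = yes
    factors=asia_factors,
    ordering=["X", "E", "B", "L", "S", "T", "A"],
)
# posterior[(True,)] is P(tuberculosis | dyspnea, visit to Asia);
# the final normalization plays the role of the constant k.
print(posterior)
```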

20 Most Probable Explanation (MPE) Definition
–The MPE task is to find an assignment x° = (x°_1, ..., x°_n) such that
P(x°) = max_x Π_i P(x_i | x_pa(i), e)
In the Visit to Asia example, the MPE problem answers questions like
–What are the most probable values for all variables such that a person does not have dyspnea?
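Computing the MPE probability needs only one change to the elimination machinery: variables are maximized out instead of summed out. A sketch of the max counterpart of `sum_out` (recovering the maximizing assignment additionally requires the forward pass shown on a later slide):

```python
def max_out(var, factor):
    """MPE counterpart of sum_out: keep the maximum over var."""
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for assign, value in table.items():
        key = assign[:i] + assign[i + 1:]
        out[key] = max(out.get(key, float("-inf")), value)
    return (out_vars, out)
```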

21 Maximum A Posteriori Hypothesis (MAP) Definition
–Given a set of hypothesized variables A = {A_1, ..., A_k}, the MAP task is to find an assignment a° = (a°_1, ..., a°_k) such that
P(a°) = max_a Σ_{X \ A} Π_i P(x_i | x_pa(i), e)
In the Visit to Asia example, the MAP problem answers questions like
–What are the most probable values for a person having both lung cancer and bronchitis, given that he/she has dyspnea and that his/her X-ray is positive?

22 Axioms for Local Computation
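For reference, a standard statement of the three axioms after Shenoy and Shafer, in the usual valuation-algebra notation (a reconstruction from the cited sources, not a transcription of the slide):
1. Order of deletion does not matter: for a valuation φ on variables V and distinct X, Y in V, (φ↓V\{X})↓V\{X,Y} = (φ↓V\{Y})↓V\{X,Y}.
2. Combination is commutative and associative: φ ⊗ ψ = ψ ⊗ φ and φ ⊗ (ψ ⊗ θ) = (φ ⊗ ψ) ⊗ θ.
3. Distributivity of marginalization over combination: if φ is a valuation on U and ψ a valuation on V, then (φ ⊗ ψ)↓U = φ ⊗ (ψ↓U∩V).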

23 Comments on the Axioms (part I) Presentation of the axioms is from Madsen's dissertation (section 3.1.1), after Shenoy and Shafer. The best description of the axioms is in: Shenoy, Prakash P. "Valuation-Based Systems for Discrete Optimization." Uncertainty in Artificial Intelligence 6 (P.P. Bonissone, M. Henrion, L.N. Kanal, eds.), pp. 385-400. The first axiom is written in quite a different form in that reference, but Shenoy notes that his axiom "can be interpreted as saying that the order in which we delete the variables does not matter," "if we regard marginalization as a reduction of a valuation by deleting variables." This seems to be what Madsen emphasizes in his axiom 1.

24 Comments on the Axioms (part II) Another key reference is: S. Bistarelli, U. Montanari, and F. Rossi. "Semiring-Based Constraint Satisfaction and Optimization," Journal of the ACM 44, 2 (March 1997), pp. 201-236.
–This is an abstract algebraic treatment.
–The authors explicitly mention Shenoy's axioms as a special case in section 5, where they also discuss the solution of the secondary problem of Non-Serial Dynamic Programming, as introduced in: Bertelè and Brioschi, Non-Serial Dynamic Programming, Academic Press, 1972.
An alternative algebraic generalization is in: S.L. Lauritzen and F.V. Jensen, "Local Computations with Valuations from a Commutative Semigroup," Annals of Mathematics and Artificial Intelligence 21 (1997), pp. 51-69.

25 Some Algorithms for Belief Update
Construct joint first (not based on local computation)
Stochastic simulation (not based on local computation)
Conditioning (not based on local computation)
Direct computation
–Variable elimination: bucket elimination (described next), variable elimination proper, peeling
–Combination of potentials: SPI, factor trees
Junction trees
–L&S, Shafer-Shenoy, Hugin, Lazy propagation
Polynomials
–Castillo et al., Darwiche

26 Ordering the Variables
Method 1 (Minimum deficiency) Begin elimination with the node which adds the fewest number of edges:
1. α, ξ, δ (nothing added)
2. τ (nothing added)
3. σ, λ, β, ε (one edge added)
Method 2 (Minimum degree) Begin elimination with the node which has the lowest degree:
1. α, ξ (degree = 1)
2. τ, σ, δ (degree = 2)
3. λ, β, ε (degree = 2)
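Both heuristics are greedy and easy to implement. A sketch on an undirected moral graph, assuming a {node: set of neighbours} representation (the function names are illustrative):

```python
def fill_in_count(graph, node):
    """Edges that eliminating node would add among its neighbours."""
    nbrs = list(graph[node])
    return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
               if nbrs[j] not in graph[nbrs[i]])

def degree(graph, node):
    return len(graph[node])

def greedy_order(graph, cost):
    """Repeatedly eliminate the node that minimizes the given cost."""
    graph = {v: set(ns) for v, ns in graph.items()}  # work on a copy
    order = []
    while graph:
        best = min(graph, key=lambda v: cost(graph, v))
        for u in graph[best]:  # connect best's neighbours (the fill-in)
            graph[u] |= graph[best] - {u, best}
            graph[u].discard(best)
        del graph[best]
        order.append(best)
    return order

# greedy_order(moral_graph, fill_in_count)  # minimum deficiency
# greedy_order(moral_graph, degree)         # minimum degree
```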

27 Elimination Algorithm for Belief Assessment
P(α | ξ="yes", δ="yes") = k * Σ_{X \ {α}} (P(α) * P(τ|α) * P(σ) * P(λ|σ) * P(β|σ) * P(ε|λ,τ) * P(ξ|ε) * P(δ|ε,β))
Processing the buckets from top to bottom, each eliminated variable leaves behind a function H over the remaining variables in its bucket:
Bucket ξ: P(ξ|ε), ξ="yes" → H_ξ(ε)
Bucket δ: P(δ|ε,β), δ="yes" → H_δ(ε,β)
Bucket ε: P(ε|λ,τ), H_ξ(ε), H_δ(ε,β) → H_ε(λ,τ,β)
Bucket β: P(β|σ), H_ε(λ,τ,β) → H_β(λ,τ,σ)
Bucket λ: P(λ|σ), H_β(λ,τ,σ) → H_λ(τ,σ)
Bucket σ: P(σ), H_λ(τ,σ) → H_σ(τ)
Bucket τ: P(τ|α), H_σ(τ) → H_τ(α)
Bucket α: P(α), H_τ(α) → P(α | ξ="yes", δ="yes")
In general, H_n(u) = Σ_{x_n} Π_{i=1..j} C_i(x_n, u_{s_i}) * k, where k is the normalizing constant.

28 Elimination Algorithm for Most Probable Explanation
Finding the MPE requires computing
MPE = max_{α,τ,σ,λ,β,ε,δ,ξ} P(α,τ,σ,λ,β,ε,δ,ξ) = max_{α,τ,σ,λ,β,ε,δ,ξ} (P(α) * P(τ|α) * P(σ) * P(λ|σ) * P(β|σ) * P(ε|λ,τ) * P(ξ|ε) * P(δ|ε,β))
The buckets are processed as for belief assessment, but each variable is maximized out instead of summed out (here with evidence δ="no"):
Bucket ξ: P(ξ|ε) → H_ξ(ε)
Bucket δ: P(δ|ε,β), δ="no" → H_δ(ε,β)
Bucket ε: P(ε|λ,τ), H_ξ(ε), H_δ(ε,β) → H_ε(λ,τ,β)
Bucket β: P(β|σ), H_ε(λ,τ,β) → H_β(λ,τ,σ)
Bucket λ: P(λ|σ), H_β(λ,τ,σ) → H_λ(τ,σ)
Bucket σ: P(σ), H_λ(τ,σ) → H_σ(τ)
Bucket τ: P(τ|α), H_σ(τ) → H_τ(α)
Bucket α: P(α), H_τ(α) → MPE probability
In general, H_n(u) = max_{x_n} Π_{C ∈ F_n} C(x_n | x_pa).

29 Elimination Algorithm for Most Probable Explanation
Forward part: the variables are assigned in reverse elimination order, using the H functions recorded during the backward (bucket) pass:
α' = arg max_α P(α) * H_τ(α)
τ' = arg max_τ P(τ|α') * H_σ(τ)
σ' = arg max_σ P(σ) * H_λ(τ',σ)
λ' = arg max_λ P(λ|σ') * H_β(λ,τ',σ')
β' = arg max_β P(β|σ') * H_ε(λ',τ',β)
ε' = arg max_ε P(ε|λ',τ') * H_ξ(ε) * H_δ(ε,β')
δ' = "no"
ξ' = arg max_ξ P(ξ|ε')
Return: (α', τ', σ', λ', β', ε', δ', ξ')

30 Complexity and junction trees The complexity of the algorithm depends on a measure of complexity of the network. The size of a tabular representation of a factor is exponential in the number of variables in the factor. The treewidth of a network, given an elimination ordering, is the maximum number of variables in a factor created by summing out a variable, given the elimination ordering.
–The treewidth of a belief network is the minimum treewidth over all elimination orderings.
–The treewidth depends only on the graph structure and is a measure of the sparseness of the graph.
–The complexity of variable elimination is exponential in the treewidth and linear in the number of variables.
–Finding the elimination ordering with minimum treewidth is NP-hard, but there are some good elimination ordering heuristics, as discussed for CSP variable elimination (page 130 [P]).
There are two main ways to speed up this algorithm. Irrelevant variables can be pruned given the observations and the query. Alternatively, it is possible to compile the graph into a secondary structure that allows for caching of values. This leads to the justly celebrated junction tree algorithm.
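The treewidth of a given ordering (its induced width) can be computed by simulating the elimination on the moral graph; a sketch, using the same {node: set of neighbours} representation as the ordering-heuristics sketch above:

```python
def induced_width(graph, order):
    """Largest number of neighbours any node has when it is eliminated.

    The minimum of this quantity over all orderings is the treewidth,
    and finding that minimum is NP-hard in general.
    """
    graph = {v: set(ns) for v, ns in graph.items()}  # work on a copy
    width = 0
    for v in order:
        width = max(width, len(graph[v]))
        for u in graph[v]:  # connect v's neighbours (the fill-in)
            graph[u] |= graph[v] - {u, v}
            graph[u].discard(v)
        del graph[v]
    return width
```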

