Approximate Inference. Slides by Nir Friedman.


When can we hope to approximate?
Two situations:
- Highly stochastic distributions: “far” evidence is discarded
- “Peaked” distributions: improbable values are ignored

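Stochasticity & Approximations
- Consider a chain $X_1 \to X_2 \to X_3 \to \dots \to X_{n+1}$ with
  $P(X_{i+1} = t \mid X_i = t) = 1 - \epsilon$ and $P(X_{i+1} = f \mid X_i = f) = 1 - \epsilon$
- Computing the probability of $X_{n+1}$ given $X_1$, we get
  Even # of flips: $P(X_{n+1} = t \mid X_1 = t) = \sum_{k \text{ even}} \binom{n}{k} \epsilon^k (1 - \epsilon)^{n-k} = \frac{1 + (1 - 2\epsilon)^n}{2}$
  Odd # of flips: $P(X_{n+1} = f \mid X_1 = t) = \sum_{k \text{ odd}} \binom{n}{k} \epsilon^k (1 - \epsilon)^{n-k} = \frac{1 - (1 - 2\epsilon)^n}{2}$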

[Figure: plot of P(X_n = t | X_1 = t) for n = 5, n = 10 and n = 20; horizontal axis from 0 to 0.5, vertical axis from 0.5 to 1.]
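
The curves in the plot can be recomputed with a few lines of Python. This is a minimal sketch, not the code behind the original figure: the name `eps` for the per-step flip probability and both function names are assumptions made for illustration. It propagates the probability of being in state t one step at a time and compares the result with the closed form that counts an even number of flips.

```python
def prob_same_state(eps, steps):
    """P(state after `steps` transitions = t | start state = t).

    `eps` is the (assumed) probability that one transition flips the state.
    """
    p_t = 1.0  # we start in state t with certainty
    for _ in range(steps):
        # Stay in t with probability 1 - eps, or return to t from f with eps.
        p_t = p_t * (1.0 - eps) + (1.0 - p_t) * eps
    return p_t


def prob_same_state_closed_form(eps, steps):
    """Same quantity: the state is unchanged after an even number of flips."""
    return 0.5 * (1.0 + (1.0 - 2.0 * eps) ** steps)


if __name__ == "__main__":
    # X_n is reached from X_1 after n - 1 transitions.
    for n in (5, 10, 20):
        values = [prob_same_state(eps, n - 1) for eps in (0.05, 0.25, 0.45)]
        print(n, [round(v, 4) for v in values])
```

As `eps` moves away from 0, or as the chain gets longer, the value approaches 0.5, which is the sense in which evidence about X_1 stops mattering.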

Stochastic Processes
- This behavior of a chain (a Markov process) is called mixing.
- General Bayes nets show similar behavior.
- If probabilities are far from 0 and 1, then the effect of “far” evidence vanishes (and so it can be discarded in approximations).

“Peaked” distributions
- If the distribution is “peaked”, then most of the mass is on a few instances
- If we can focus on these instances, we can ignore the rest
[Figure: probability mass over instances.]

Global conditioning
- Fixing the values of A & B at the beginning of the summation can shrink the tables formed by variable elimination. In this way space is traded for time.
- Special case: choose to fix a set of nodes that “break all loops”. This method is called cutset conditioning.
[Figure: a network over the nodes A, B, C, D, E, I, J, K, L, M, shown before and after fixing A = a and B = b.]

Bounded conditioning
- Fixing the values of A & B
- By examining only the probable assignments of A & B, we perform several simple computations instead of one complex computation.

Bounded conditioning
- Choose A and B so that P(Y, e | a, b) can be computed easily, e.g., a cycle cutset.
- Search for highly probable assignments to A, B.
  - Option 1: select a, b with high P(a, b).
  - Option 2: select a, b with high P(a, b | e).
- We need to search for such high-mass assignments, and that can be hard.

Bounded Conditioning
Advantages:
- Combines exact inference within an approximation
- Continuous: more time can be used to examine more cases
- Bounds: the unexamined mass is used to compute error bars
Possible problems:
- P(a, b) is the prior mass, not the posterior.
- If the posterior P(a, b | e) is significantly different from the prior, computation can be wasted on irrelevant assignments.
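
The error bars mentioned above follow from P(y, e, a, b) ≤ P(a, b): every unexamined assignment contributes at most its prior mass. The sketch below illustrates this with Option 1 (ranking by prior mass). The function name, the toy numbers, and the `likelihood` callback standing in for an exact computation of P(y, e | a, b) are all hypothetical.

```python
def bounded_conditioning(prior, likelihood, max_cases):
    """Bound P(y, e) by examining the `max_cases` most probable cutset assignments.

    prior:      dict mapping an assignment (a, b) to P(a, b)
    likelihood: function returning P(y, e | a, b), assumed cheap and exact
    Returns (lower, upper) with lower <= P(y, e) <= upper.
    """
    ranked = sorted(prior, key=prior.get, reverse=True)  # highest prior mass first
    lower = 0.0
    examined_mass = 0.0
    for assignment in ranked[:max_cases]:
        lower += prior[assignment] * likelihood(assignment)
        examined_mass += prior[assignment]
    # Each unexamined assignment adds at most its prior mass to P(y, e).
    upper = lower + (1.0 - examined_mass)
    return lower, upper


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    prior = {(0, 0): 0.7, (0, 1): 0.2, (1, 0): 0.08, (1, 1): 0.02}
    cond = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.3}
    for k in range(1, 5):
        print(k, bounded_conditioning(prior, cond.get, k))
```

The interval narrows as more cases are examined, which is the "continuous" property: the computation can be stopped at any point and still report valid bounds.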

Network Simplifications
- In these approaches, we try to replace the original network with a simpler one
  - the resulting network allows fast exact methods

Network Simplifications
Typical simplifications:
- Remove parts of the network
- Remove edges
- Reduce the number of values (value abstraction)
- Replace a sub-network with a simpler one (model abstraction)
These simplifications are often made with respect to the particular evidence and query.

Stochastic Simulation
- Suppose our goal is to compute the likelihood of evidence P(e), where e is an assignment to some variables in {X_1,…,X_n}.
- Assume that we can sample instances according to the distribution P(x_1,…,x_n).
- What is then the probability that a random sample satisfies e? Answer: simply P(e), which is what we wish to compute.
- Each sample simulates the tossing of a biased coin with probability P(e) of “Heads”.

Stochastic Sampling
- Intuition: given a sufficient number of samples x[1],…,x[N], we can estimate
  $\hat{p} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{x[i] \text{ satisfies } e\}$
  where each term of the sum is zero or one.
- The law of large numbers implies that as N grows, our estimate converges to p = P(e) with high probability.
- How many samples do we need to get a reliable estimate? We will not discuss this issue here.
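
The estimate is just the fraction of samples that satisfy the evidence. A minimal sketch, assuming a hypothetical `sample_satisfies_e` callback that draws one instance and reports whether it agrees with e; here it is replaced by a biased coin with a known probability so the result can be checked.

```python
import random


def estimate_probability(sample_satisfies_e, num_samples, rng):
    """Fraction of samples for which the evidence holds (a sum of zeros and ones)."""
    hits = sum(1 for _ in range(num_samples) if sample_satisfies_e(rng))
    return hits / num_samples


if __name__ == "__main__":
    p_true = 0.3  # stand-in for P(e); chosen arbitrarily for the demonstration
    rng = random.Random(0)
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_probability(lambda r: r.random() < p_true, n, rng))
```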

Sampling a Bayesian Network
- If P(X_1,…,X_n) is represented by a Bayesian network, can we efficiently sample from it?
- Idea: sample according to the structure of the network
  - Write the distribution using the chain rule, and then sample each variable given its parents

Logic sampling: example
Network: Burglary (B) and Earthquake (E) are the parents of Alarm (A); Alarm is the parent of Call (C); Earthquake is the parent of Radio (R).
  P(b) = 0.03
  P(e) = 0.001
  P(a | B, E): 0.98, 0.4, 0.7, 0.01 (one entry per assignment to B and E)
  P(c | a) = 0.8, P(c | ¬a) = 0.05
  P(r | e) = 0.3, P(r | ¬e) = 0.001
Building one sample over B, E, A, C, R:
  Step 1: sample B using P(b) = 0.03
  Step 2: sample E using P(e) = 0.001
  Step 3: sample A using the entry of P(a | B, E) for the sampled values of B and E, here 0.4
  Step 4: sample C using P(c | a) = 0.8
  Step 5: sample R using P(r | e) = 0.3
  Step 6: the sample is complete

Logic Sampling
- Let X_1, …, X_n be an order of the variables consistent with arc direction
- for i = 1, …, n do
    sample x_i from P(X_i | pa_i)
    (Note: since Pa_i ⊆ {X_1,…,X_{i-1}}, we have already assigned values to them)
- return x_1, …, x_n
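
The procedure above can be sketched in Python for the Burglary/Earthquake/Alarm network. The probabilities are the ones listed on the example slides, but the mapping of the four P(a | B, E) entries to parent values is an assumption, as are all names in the code. Treat it as an illustration rather than a reference implementation.

```python
import random

# Probability that each variable is True, given its parents.
P_B = 0.03
P_E = 0.001
# Keyed by (b, e). ASSUMPTION: which entry belongs to which parent assignment.
P_A = {(True, True): 0.98, (False, True): 0.4,
       (True, False): 0.7, (False, False): 0.01}
P_C = {True: 0.8, False: 0.05}   # keyed by a
P_R = {True: 0.3, False: 0.001}  # keyed by e


def logic_sample(rng):
    """Sample every variable in an order where parents come before children."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    c = rng.random() < P_C[a]
    r = rng.random() < P_R[e]
    return {"B": b, "E": e, "A": a, "C": c, "R": r}


def estimate(event, num_samples, seed=0):
    """Estimate P(event) as the fraction of samples in which it holds."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_samples) if event(logic_sample(rng)))
    return hits / num_samples


if __name__ == "__main__":
    print("P(c) is roughly", estimate(lambda s: s["C"], 50_000))
```

Each sample costs one random draw per variable, so the cost is linear in the number of variables whatever the network structure.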

Logic Sampling
- Sampling a complete instance is linear in the number of variables
  - Regardless of the structure of the network
- However, if P(e) is small, we need many samples to get a decent estimate

Can we sample from P(X_i | e)?
- If the evidence e is in the roots of the Bayes network, easily
- If the evidence is in the leaves of the network, we have a problem:
  - Our sampling method proceeds according to the order of nodes in the network.
[Figure: a network over Z, R, B, X and the observed node A = a.]

Likelihood Weighting
- Can we ensure that all of our samples satisfy e?
- One simple (but wrong) solution: when we need to sample a variable Y that is assigned a value by e, use its specified value.
- For example, in the network X → Y, we know Y = 1:
  - Sample X from P(X)
  - Then take Y = 1
- Is this a sample from P(X, Y | Y = 1)? No.

Likelihood Weighting
- Problem: these samples of X are from P(X)
- Solution: penalize samples in which P(Y = 1 | X) is small
- We now sample as follows:
  - Let x_i be a sample from P(X)
  - Let w_i = P(Y = 1 | X = x_i)
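
A small numerical check makes the difference visible. In the sketch below the two-node network X → Y and its probabilities are invented for illustration. It returns both the naive estimate (clamp Y = 1 and count) and the weighted estimate of P(X = 1 | Y = 1), whose exact value for these numbers is 0.18 / 0.26.

```python
import random

# Hypothetical parameters for a binary network X -> Y.
P_X1 = 0.2                       # P(X = 1)
P_Y1_GIVEN_X = {1: 0.9, 0: 0.1}  # P(Y = 1 | X = x)


def estimate_x1_given_y1(num_samples, seed=0):
    """Return (naive, weighted) estimates of P(X = 1 | Y = 1)."""
    rng = random.Random(seed)
    naive_hits = 0
    weight_x1 = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        x = 1 if rng.random() < P_X1 else 0  # x_i sampled from P(X)
        w = P_Y1_GIVEN_X[x]                  # w_i = P(Y = 1 | X = x_i)
        naive_hits += x
        weight_total += w
        weight_x1 += w * x
    return naive_hits / num_samples, weight_x1 / weight_total


if __name__ == "__main__":
    naive, weighted = estimate_x1_given_y1(50_000)
    print("naive:", naive, "weighted:", weighted, "exact:", 0.18 / 0.26)
```

The naive estimate stays near the prior P(X = 1), while the weighted one moves toward the posterior.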

Likelihood Weighting
- Let X_1, …, X_n be an order of the variables consistent with arc direction
- w = 1
- for i = 1, …, n do
    if X_i = x_i has been observed
      w ← w · P(X_i = x_i | pa_i)
    else
      sample x_i from P(X_i | pa_i)
- return x_1, …, x_n, and w
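
The loop above can be sketched in Python for the Burglary/Earthquake/Alarm network. The numbers come from the example slides, but the mapping of the four P(a | B, E) entries to parent values is an assumption, and the `NETWORK` layout and function names are hypothetical.

```python
import random

# (variable, parents, table of P(variable = True | parent values)),
# listed so that parents precede children.
NETWORK = [
    ("B", (), {(): 0.03}),
    ("E", (), {(): 0.001}),
    # ASSUMPTION: which entry belongs to which (b, e) assignment.
    ("A", ("B", "E"), {(True, True): 0.98, (False, True): 0.4,
                       (True, False): 0.7, (False, False): 0.01}),
    ("C", ("A",), {(True,): 0.8, (False,): 0.05}),
    ("R", ("E",), {(True,): 0.3, (False,): 0.001}),
]


def weighted_sample(evidence, rng):
    """Return one sample that agrees with `evidence`, together with its weight."""
    sample, weight = {}, 1.0
    for var, parents, cpt in NETWORK:
        p_true = cpt[tuple(sample[p] for p in parents)]
        if var in evidence:
            # Observed: keep the observed value and multiply in its probability.
            sample[var] = evidence[var]
            weight *= p_true if evidence[var] else 1.0 - p_true
        else:
            sample[var] = rng.random() < p_true
    return sample, weight


def posterior(query_var, evidence, num_samples, seed=0):
    """Weighted estimate of P(query_var = True | evidence)."""
    rng = random.Random(seed)
    numerator = denominator = 0.0
    for _ in range(num_samples):
        sample, w = weighted_sample(evidence, rng)
        denominator += w
        if sample[query_var]:
            numerator += w
    return numerator / denominator


if __name__ == "__main__":
    print("P(a | c) is roughly", posterior("A", {"C": True}, 50_000))
```

With evidence A = False and R = True, a sample in which B = False and E = True gets weight (1 - 0.4) · 0.3 under these tables.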

Likelihood Weighting: example
Same network as in the logic sampling example, with evidence A = ¬a and R = r.
  P(b) = 0.03
  P(e) = 0.001
  P(a | B, E): 0.98, 0.4, 0.7, 0.01 (one entry per assignment to B and E)
  P(c | a) = 0.8, P(c | ¬a) = 0.05
  P(r | e) = 0.3, P(r | ¬e) = 0.001
Building one weighted sample over B, E, A, C, R:
  Step 1: sample B using P(b) = 0.03
  Step 2: sample E using P(e) = 0.001
  Step 3: A is observed; the entry of P(a | B, E) for the sampled parents is 0.4, so Weight = 1 - 0.4 = 0.6
  Step 4: sample C using P(c | ¬a) = 0.05
  Step 5: R is observed; P(r | e) = 0.3, so Weight = 0.6 * 0.3

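Likelihood Weighting
- Why does this make sense?
- When N is large, we expect about N P(X = x) samples with x[i] = x
- Thus, $\sum_{i : x[i] = x} w_i \approx N \, P(X = x) \, P(Y = 1 \mid X = x) = N \, P(X = x, Y = 1)$, so the normalized weighted count $\sum_{i : x[i] = x} w_i \,/\, \sum_{i} w_i$ approximates $P(X = x \mid Y = 1)$.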

Summary
Approximate inference is needed for large pedigrees. We have seen a few methods today. Some may suit genetic linkage analysis and some may not. There are many other approximation algorithms: variational methods, MCMC, and others. In next semester’s project of Bioinformatics (236524), we will offer projects that seek to implement some approximation methods and embed them in the superlink software.
