# Exact Inference in Bayesian Networks, Lecture 9

## Presentation transcript


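### 2. Queries

There are many types of queries. Most queries involve evidence: an evidence e is an assignment of values to a set E of variables in the domain. For example:

- $P(\text{Dyspnea} = \text{Yes} \mid \text{Visit\_to\_Asia} = \text{Yes}, \text{Smoking} = \text{Yes})$
- $P(\text{Smoking} = \text{Yes} \mid \text{Dyspnea} = \text{Yes})$

[Figure: example network over the variables V, S, L, T, A, B, X, D.]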
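In principle, any such query can be answered by brute force: sum the joint distribution (the product of the local tables) over every assignment consistent with E = e, then normalize. The sketch below does exactly that for a hypothetical three-variable chain S → B → D (smoking, bronchitis, dyspnea). The network, the helper names `joint` and `query`, and all probabilities are invented for illustration. The cost is exponential in the number of variables, which is what the elimination methods on the next slides try to avoid.

```python
from itertools import product

# Hypothetical chain S -> B -> D (smoking -> bronchitis -> dyspnea).
# All probabilities below are made up for illustration.
P_S = {True: 0.5, False: 0.5}
P_B_GIVEN_S = {True: {True: 0.6, False: 0.4}, False: {True: 0.3, False: 0.7}}
P_D_GIVEN_B = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}


def joint(s, b, d):
    """Joint probability as the product of the local probability tables."""
    return P_S[s] * P_B_GIVEN_S[s][b] * P_D_GIVEN_B[b][d]


def query(target, evidence):
    """P(target | evidence): sum the joint over every assignment that agrees
    with the evidence, grouped by the target's value, then normalize."""
    scores = {True: 0.0, False: 0.0}
    for s, b, d in product([True, False], repeat=3):
        assignment = {"S": s, "B": b, "D": d}
        if all(assignment[var] == val for var, val in evidence.items()):
            scores[assignment[target]] += joint(s, b, d)
    total = sum(scores.values())
    return {value: p / total for value, p in scores.items()}


# P(Smoking = Yes | Dyspnea = Yes) in this toy network.
print(query("S", {"D": True})[True])
```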

### 3. Computing a posteriori belief in Bayesian networks

Input: a Bayesian network, a set of nodes E with evidence E = e, and an ordering $X_1, \dots, X_m$ of all variables not in E.

Output: $P(x_1, e)$ for every value $x_1$ of $X_1$. The query $P(x_1 \mid e)$ is then obtained by normalizing over $x_1$.

- Set the evidence in all local probability tables that are defined over some variables from E.
- Iteratively:
  - Move all irrelevant terms outside of the innermost sum.
  - Perform the innermost sum, getting a new term.
  - Insert the new term into the product.

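### 4. Belief update I

[Figure: network over V, S, L, T, A, G, X, D.]

Suppose we get evidence $V = v_0$, $S = s_0$, $D = d_0$. We wish to compute $P(l, v_0, s_0, d_0)$ for every value l of L:

$$
\begin{aligned}
P(l, v_0, s_0, d_0) &= \sum_{t,g,x,a} P(v_0, s_0, l, t, g, a, x, d_0) \\
&= p(v_0)\,p(s_0)\,p(l \mid s_0) \sum_t p(t \mid v_0) \sum_g p(g \mid s_0) \sum_a p(a \mid t,l)\,p(d_0 \mid a,g) \sum_x p(x \mid a) \\
&= p(v_0)\,p(s_0)\,p(l \mid s_0) \sum_t p(t \mid v_0) \sum_g p(g \mid s_0) \sum_a p(a \mid t,l)\,p(d_0 \mid a,g)\,b_x(a) \\
&= p(v_0)\,p(s_0)\,p(l \mid s_0) \sum_t p(t \mid v_0) \sum_g p(g \mid s_0)\,b_a(t,l,g) \\
&= p(v_0)\,p(s_0)\,p(l \mid s_0) \sum_t p(t \mid v_0)\,b_g(t,l) \\
&= p(v_0)\,p(s_0)\,p(l \mid s_0)\,b_t(l)
\end{aligned}
$$

Each $b$ term is the result of one innermost sum; for instance, $b_x(a) = \sum_x p(x \mid a) = 1$. To obtain the posterior belief in L given the evidence, we normalize the result to 1.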
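The derivation can be checked numerically. The sketch below uses the slide's factorization with invented CPT numbers (all variables binary, evidence $v_0 = s_0 = d_0 = 1$). It computes $P(l, v_0, s_0, d_0)$ once by summing the full joint and once by the nested sums $b_x$, $b_a$, $b_g$, $b_t$, then normalizes. The helpers `bern`, `brute_force` and `nested_sums` are hypothetical names for this illustration.

```python
from itertools import product

VALUES = (0, 1)  # all variables binary: 1 = yes, 0 = no


def bern(p1, value):
    """P(value) for a binary variable whose probability of 1 is p1."""
    return p1 if value == 1 else 1.0 - p1


# Hypothetical CPTs with the structure used on the slide (numbers invented).
p_v = lambda v: bern(0.01, v)
p_s = lambda s: bern(0.5, s)
p_l = lambda l, s: bern(0.1 if s else 0.01, l)
p_t = lambda t, v: bern(0.05 if v else 0.01, t)
p_g = lambda g, s: bern(0.6 if s else 0.3, g)
p_a = lambda a, t, l: bern(1.0 if (t or l) else 0.0, a)
p_d = lambda d, a, g: bern({(1, 1): 0.9, (1, 0): 0.7, (0, 1): 0.8, (0, 0): 0.1}[(a, g)], d)
p_x = lambda x, a: bern(0.98 if a else 0.05, x)

v0, s0, d0 = 1, 1, 1  # evidence V = v0, S = s0, D = d0


def brute_force(l):
    """P(l, v0, s0, d0) by summing the full joint over t, g, a, x."""
    return sum(p_v(v0) * p_s(s0) * p_l(l, s0) * p_t(t, v0) * p_g(g, s0)
               * p_a(a, t, l) * p_d(d0, a, g) * p_x(x, a)
               for t, g, a, x in product(VALUES, repeat=4))


def nested_sums(l):
    """The same quantity, summing x, a, g, t innermost-first as on the slide."""
    b_x = {a: sum(p_x(x, a) for x in VALUES) for a in VALUES}  # b_x(a) = 1
    b_a = {(t, g): sum(p_a(a, t, l) * p_d(d0, a, g) * b_x[a] for a in VALUES)
           for t in VALUES for g in VALUES}  # b_a(t, l, g) with l fixed
    b_g = {t: sum(p_g(g, s0) * b_a[(t, g)] for g in VALUES) for t in VALUES}  # b_g(t, l)
    b_t = sum(p_t(t, v0) * b_g[t] for t in VALUES)  # b_t(l)
    return p_v(v0) * p_s(s0) * p_l(l, s0) * b_t


unnormalized = {l: nested_sums(l) for l in VALUES}
z = sum(unnormalized.values())
posterior = {l: p / z for l, p in unnormalized.items()}  # P(l | v0, s0, d0)
print(posterior)
```

The nested version never builds a table larger than three dimensions, while the brute-force sum enumerates every assignment of the four summed variables.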

### 5. Belief update II

[Figure: network in which A is the parent of T, X, L and D.]

Suppose we get evidence $D = d_0$. We wish to compute $P(l, d_0)$ for every value l of L.

Good summation order (variable A is summed last):

$$P(l, d_0) = \sum_{a,t,x} P(a,t,x,l,d_0) = \sum_a p(a)\,p(l \mid a)\,p(d_0 \mid a) \sum_t p(t \mid a) \sum_x p(x \mid a)$$

Bad summation order (variable A is summed first):

$$P(l, d_0) = \sum_{a,t,x} P(a,t,x,l,d_0) = \sum_x \sum_t \sum_a p(a)\,p(l \mid a)\,p(d_0 \mid a)\,p(t \mid a)\,p(x \mid a)$$

The bad order yields a three-dimensional temporary table, over (l, t, x). How do we choose a reasonable order?
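The difference between the two orders is visible from the scopes of the intermediate tables alone. The sketch below (hypothetical helper `intermediate_scopes`) eliminates variables one at a time: it collects every table that mentions the variable and records the union of their scopes minus that variable. It assumes the evidence on D has already been set, so D appears in no scope.

```python
def intermediate_scopes(factor_scopes, order):
    """Eliminate variables in `order`; return (variable, scope of the new table)."""
    factors = [frozenset(s) for s in factor_scopes]
    created = []
    for var in order:
        touching = [f for f in factors if var in f]
        rest = [f for f in factors if var not in f]
        new_scope = frozenset().union(*touching) - {var}
        created.append((var, set(new_scope)))
        factors = rest + [new_scope]
    return created


# Slide 5 tables after setting D = d0: p(a), p(l|a), p(d0|a), p(t|a), p(x|a).
tables = [{"A"}, {"L", "A"}, {"A"}, {"T", "A"}, {"X", "A"}]

good = intermediate_scopes(tables, ["X", "T", "A"])  # innermost x, then t, then a
bad = intermediate_scopes(tables, ["A", "T", "X"])   # innermost a first
for name, steps in (("good", good), ("bad", bad)):
    print(name, [(v, sorted(s)) for v, s in steps])
```

With the good order every intermediate table has a single variable; with the bad order the first one spans L, T and X.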

### 6. The algorithm to compute $P(x_1, e)$

Initialization:

- Set the evidence in all (local probability) tables that are defined over some variables from E. Set an order $X_1, \dots, X_m$ on all variables not in E.
- Partition all tables into buckets such that bucket i contains all tables whose highest-indexed variable is $X_i$.

For p = m downto 2 do: suppose $\lambda_1, \dots, \lambda_j$ are the tables in bucket p being processed, and $S_1, \dots, S_j$ are the respective sets of variables in these tables.

- $U_p \leftarrow$ the union of $S_1, \dots, S_j$ with $X_p$ excluded.
- max $\leftarrow$ the largest-indexed variable in $U_p$.
- For every assignment $U_p = u$ compute $\lambda_p(u) = \sum_{x_p} \prod_{i=1}^{j} \lambda_i(x_p, u_{S_i})$, where $u_{S_i}$ is the value of u projected on $S_i$.
- Add $\lambda_p(u)$ into bucket max.

Return the vector $P(x_1, e) = \prod_{\lambda \in \text{bucket } 1} \lambda(x_1)$.
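A compact sketch of the bucket procedure is given below. Tables are represented as `(scope, {row: value})` pairs, and the function names `set_evidence` and `bucket_elimination` are hypothetical. Tables whose scope becomes empty, either after setting evidence or because $U_p$ is empty, are folded into a scalar. The example is an invented chain A → B → C with evidence C = 1.

```python
from itertools import product


def set_evidence(factor, evidence):
    """Restrict a table (scope, {row: value}) to E = e: keep only rows that
    agree with the evidence and drop the evidence variables from the scope."""
    scope, table = factor
    keep = [i for i, var in enumerate(scope) if var not in evidence]
    restricted = {}
    for row, value in table.items():
        if all(row[i] == evidence[var] for i, var in enumerate(scope) if var in evidence):
            restricted[tuple(row[i] for i in keep)] = value
    return tuple(scope[i] for i in keep), restricted


def bucket_elimination(factors, order, domains):
    """Return {x1: P(x1, e)}. `order` = [X1, ..., Xm] lists the variables not
    in E, and the evidence has already been set in `factors`."""
    index = {var: i for i, var in enumerate(order)}
    buckets = {var: [] for var in order}
    scale = 1.0  # product of tables whose scope became empty
    for scope, table in factors:
        if scope:
            buckets[max(scope, key=index.get)].append((scope, table))
        else:
            scale *= table[()]
    for xp in reversed(order[1:]):  # p = m downto 2
        tables = buckets[xp]
        if not tables:
            continue
        u_vars = sorted({v for scope, _ in tables for v in scope} - {xp}, key=index.get)
        new_table = {}
        for u in product(*(domains[v] for v in u_vars)):
            ctx = dict(zip(u_vars, u))
            total = 0.0
            for value in domains[xp]:  # sum over x_p
                ctx[xp] = value
                term = 1.0
                for scope, table in tables:  # product of the bucket's tables
                    term *= table[tuple(ctx[v] for v in scope)]
                total += term
            new_table[u] = total
        if u_vars:  # bucket "max" = highest-indexed variable of U_p
            buckets[u_vars[-1]].append((tuple(u_vars), new_table))
        else:
            scale *= new_table[()]
    x1 = order[0]
    result = {}
    for value in domains[x1]:
        prob = scale
        for scope, table in buckets[x1]:
            prob *= table[(value,)]
        result[value] = prob
    return result


# Hypothetical chain A -> B -> C with evidence C = 1 and order (A, B).
domains = {"A": (0, 1), "B": (0, 1)}
p_a = (("A",), {(0,): 0.6, (1,): 0.4})
p_b = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
p_c = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.25, (1, 1): 0.75})
factors = [set_evidence(f, {"C": 1}) for f in (p_a, p_b, p_c)]
print(bucket_elimination(factors, ["A", "B"], domains))
```

Bucket 1 is not summed, so the result is the vector over $x_1$. Dividing by its sum gives $P(x_1 \mid e)$.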

### 7. The computational task at hand

Multidimensional multiplication/summation. Example: matrix multiplication, … versus …
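Matrix products are a small instance of a multidimensional multiplication/summation. The sketch below assumes the comparison is between summing over both inner indices at once, $D_{xw} = \sum_{y,z} A_{xy} B_{yz} C_{zw}$, and summing one index at a time, $D_{xw} = \sum_z \big(\sum_y A_{xy} B_{yz}\big) C_{zw}$. Both give the same matrix. The helper names and random inputs are illustrative, and the code counts multiplications.

```python
import random


def all_at_once(A, B, C):
    """D[x][w] = sum over y, z of A[x][y] * B[y][z] * C[z][w]."""
    n = len(A)
    mults = 0
    D = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for w in range(n):
            for y in range(n):
                for z in range(n):
                    D[x][w] += A[x][y] * B[y][z] * C[z][w]
                    mults += 2
    return D, mults


def one_index_at_a_time(A, B, C):
    """Sum y first into E[x][z], then sum z: the elimination idea."""
    n = len(A)
    mults = 0
    E = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for z in range(n):
            for y in range(n):
                E[x][z] += A[x][y] * B[y][z]
                mults += 1
    D = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for w in range(n):
            for z in range(n):
                D[x][w] += E[x][z] * C[z][w]
                mults += 1
    return D, mults


rng = random.Random(0)
n = 6
A, B, C = ([[rng.random() for _ in range(n)] for _ in range(n)] for _ in range(3))
D1, m1 = all_at_once(A, B, C)
D2, m2 = one_index_at_a_time(A, B, C)
print(m1, m2)  # 2 * n**4 versus 2 * n**3 multiplications
```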

### 8. Complexity of variable elimination

Space and time complexity are at least exponential in the number of variables in the largest intermediate factor. They can be as large as the sum of the sizes of the intermediate factors, where the size of a factor is the product of the numbers of values of its variables.

### 9. A graph-theoretic view

$N_G(v)$ is the set of vertices that are adjacent to v in G. Eliminating vertex v from a (weighted) undirected graph G is the process of making $N_G(v)$ a clique and removing v and its incident edges from G.
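The elimination operation is short in code. In the sketch below, a graph is a dict mapping each vertex to its set of neighbors, and `eliminate` is a hypothetical helper name. The function returns the residual graph together with the edges it had to add (the "fill" edges) and leaves the input graph unchanged.

```python
from itertools import combinations


def eliminate(graph, v):
    """Make N(v) a clique, then remove v and its incident edges.
    Returns (residual graph, list of added edges); `graph` is not modified."""
    nbrs = graph[v]
    g = {u: set(adj) for u, adj in graph.items() if u != v}
    for u in g:
        g[u].discard(v)
    fill = []
    for a, b in combinations(sorted(nbrs), 2):
        if b not in g[a]:
            g[a].add(b)
            g[b].add(a)
            fill.append((a, b))
    return g, fill


# Hypothetical example: path a - v - b plus an edge b - c.
G = {"a": {"v"}, "v": {"a", "b"}, "b": {"v", "c"}, "c": {"b"}}
G2, added = eliminate(G, "v")
print(G2, added)
```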

### 10. Example

Weights of vertices (number of states): yellow nodes: w = 2; blue nodes: w = 4.

[Figure: the original Bayesian network over V, S, T, L, A, B, D, X, and its undirected graph representation over the same vertices.]
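The usual way to turn a Bayesian network into an undirected graph for elimination is moralization: drop the arc directions and connect every two parents of a common child. The sketch below assumes the slide's undirected representation is this moral graph. It reads the parent sets off the factors of the Belief Update I slide, with B here in the role of G there. `PARENTS` and `moralize` are hypothetical names.

```python
from itertools import combinations

# Parent sets from the factors p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b).
PARENTS = {"V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
           "A": ["T", "L"], "X": ["A"], "D": ["A", "B"]}


def moralize(parents):
    """Undirected graph: drop arc directions and connect every pair of
    parents that share a child."""
    graph = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:
            graph[child].add(p)
            graph[p].add(child)
        for p, q in combinations(ps, 2):
            graph[p].add(q)
            graph[q].add(p)
    return graph


moral = moralize(PARENTS)
for v in sorted(moral):
    print(v, sorted(moral[v]))
```

Moralization adds the edges T–L (co-parents of A) and A–B (co-parents of D). Every factor's scope is therefore a clique, which is what lets graph elimination mirror variable elimination.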

### 11. Elimination sequence

An elimination sequence of G is an order of the vertices of G, written as $X_\alpha = (X_{\alpha(1)}, \dots, X_{\alpha(n)})$, where α is a permutation on {1, …, n}. The residual graph $G_i$ is the graph obtained from $G_{i-1}$ by eliminating vertex $X_{\alpha(i-1)}$ ($G_1 \equiv G$). The cost of eliminating vertex v from a graph $G_i$ is the product of the weights of the vertices in $N_{G_i}(v)$. The cost of an elimination sequence $X_\alpha$ is the sum of the costs of eliminating $X_{\alpha(i)}$ from $G_i$, for all i:

$$\text{cost}(X_\alpha) = \sum_{i=1}^{n} \prod_{u \in N_{G_i}(X_{\alpha(i)})} w(u)$$
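The cost definition translates directly into code. The sketch below uses the hypothetical names `eliminate` and `sequence_cost`. It assumes the undirected graph is the moral graph of the example network, with edges V–T, S–L, S–B, T–A, L–A, T–L, A–X, A–D, B–D, A–B, and, purely for illustration, gives every vertex weight 2 instead of the slide's yellow/blue weights. The sequence begins with V, B, S as in the next slide's example; the remaining vertices are an arbitrary choice.

```python
from itertools import combinations
from math import prod


def eliminate(graph, v):
    """Residual graph: make N(v) a clique, then delete v."""
    g = {u: set(adj) - {v} for u, adj in graph.items() if u != v}
    for a, b in combinations(graph[v], 2):
        g[a].add(b)
        g[b].add(a)
    return g


def sequence_cost(graph, weights, sequence):
    """Sum over i of the product of weights of N_{G_i}(X_alpha(i))."""
    total, g = 0, graph
    for v in sequence:
        total += prod(weights[u] for u in g[v])  # empty neighborhood costs 1
        g = eliminate(g, v)
    return total


edges = [("V", "T"), ("S", "L"), ("S", "B"), ("T", "A"), ("L", "A"),
         ("T", "L"), ("A", "X"), ("A", "D"), ("B", "D"), ("A", "B")]
G = {v: set() for v in "VSTLABDX"}
for a, b in edges:
    G[a].add(b)
    G[b].add(a)
w = {v: 2 for v in G}  # hypothetical: every variable binary
print(sequence_cost(G, w, ["V", "B", "S", "X", "D", "T", "L", "A"]))
```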

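### 12. Example

Suppose the elimination sequence is $X_\alpha = (V, B, S, \dots)$.

[Figure: residual graphs $G_1$ (the full graph over V, S, T, L, A, B, D, X), $G_2$ (after eliminating V) and $G_3$ (after eliminating V and B).]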

### 13. Optimal elimination sequence

An optimal elimination sequence is one with minimal cost.
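For very small graphs, the optimum can be found by trying all n! orders. The sketch below does this with hypothetical helper names. It uses a star graph, a hub joined to three leaves, all with weight 2, because there the order matters a lot: eliminating the hub first turns the leaves into a clique.

```python
from itertools import combinations, permutations
from math import prod


def eliminate(graph, v):
    """Residual graph: make N(v) a clique, then delete v."""
    g = {u: set(adj) - {v} for u, adj in graph.items() if u != v}
    for a, b in combinations(graph[v], 2):
        g[a].add(b)
        g[b].add(a)
    return g


def sequence_cost(graph, weights, sequence):
    total, g = 0, graph
    for v in sequence:
        total += prod(weights[u] for u in g[v])
        g = eliminate(g, v)
    return total


def optimal_sequence(graph, weights):
    """Exhaustive search over all n! orders; only feasible for tiny graphs."""
    best = min(permutations(graph), key=lambda seq: sequence_cost(graph, weights, seq))
    return list(best), sequence_cost(graph, weights, best)


star = {"h": {"x1", "x2", "x3"}, "x1": {"h"}, "x2": {"h"}, "x3": {"h"}}
w = {v: 2 for v in star}  # hypothetical: all variables binary
print(optimal_sequence(star, w))
```

Eliminating the hub first costs 8 + 4 + 2 + 1 = 15, while removing leaves first costs 7.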

### 14. Several greedy algorithms

1. In each step, a variable with minimal elimination cost is selected.
2. In each step, a variable is selected that adds the smallest number of edges.
3. In each step, a variable is selected that adds edges whose sum of weights is minimal.

Since these algorithms are very fast compared to the actual likelihood computation, all three can be tried and the best of the three orders selected.
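The three greedy rules differ only in the score used to pick the next vertex, so one loop can serve all of them. In the sketch below, the names `CRITERIA`, `greedy_order` and `best_greedy` are hypothetical. For rule 3, the weight of an added edge is taken to be the product of its endpoints' weights; the slide does not define edge weights, so this is an assumption. Ties are broken alphabetically.

```python
from itertools import combinations
from math import prod


def eliminate(graph, v):
    """Residual graph: make N(v) a clique, then delete v."""
    g = {u: set(adj) - {v} for u, adj in graph.items() if u != v}
    for a, b in combinations(graph[v], 2):
        g[a].add(b)
        g[b].add(a)
    return g


def fill_edges(graph, v):
    """Edges that eliminating v would add (non-adjacent neighbor pairs)."""
    return [(a, b) for a, b in combinations(sorted(graph[v]), 2) if b not in graph[a]]


# Rule 3 assumes an edge's weight is the product of its endpoints' weights.
CRITERIA = {
    "min_cost": lambda g, w, v: prod(w[u] for u in g[v]),
    "min_fill": lambda g, w, v: len(fill_edges(g, v)),
    "min_fill_weight": lambda g, w, v: sum(w[a] * w[b] for a, b in fill_edges(g, v)),
}


def greedy_order(graph, weights, criterion):
    """Repeatedly eliminate the vertex minimizing `criterion`; ties go to
    the alphabetically first vertex. Returns (order, sequence cost)."""
    g, order, total = graph, [], 0
    while g:
        v = min(sorted(g), key=lambda u: criterion(g, weights, u))
        total += prod(weights[u] for u in g[v])
        order.append(v)
        g = eliminate(g, v)
    return order, total


def best_greedy(graph, weights):
    """Run all three heuristics (cheap compared with inference) and keep the best."""
    return min((greedy_order(graph, weights, c) for c in CRITERIA.values()),
               key=lambda result: result[1])


star = {"h": {"x1", "x2", "x3"}, "x1": {"h"}, "x2": {"h"}, "x3": {"h"}}
w = {v: 2 for v in star}  # hypothetical weights
print(best_greedy(star, w))
```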

### 15. Stochastic greedy algorithm

- Iteration i: three (say) variables with minimal elimination cost are found, and a coin is flipped to choose among them.
- Repeat many times (say, 100), stopping early if the cost becomes low enough.

The coin could be weighted according to the elimination costs of the vertices or a function of these costs. For example, with two candidates:

$$p_1 = \frac{\log_2(\text{cost}_1)}{\log_2(\text{cost}_1) + \log_2(\text{cost}_2)}, \qquad p_2 = 1 - p_1$$
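A sketch of the randomized variant follows. For simplicity it uses an unweighted coin, choosing uniformly among the k cheapest vertices. A weighted coin such as the log-cost rule above could be substituted by calling `rng.choices(candidates, weights=...)` instead of `rng.choice`. The parameters `k`, `runs` and `good_enough` mirror the "three (say)", "say, 100" and "cost becomes low" of the slide; their names and the helper functions are hypothetical.

```python
import random
from itertools import combinations
from math import prod


def eliminate(graph, v):
    """Residual graph: make N(v) a clique, then delete v."""
    g = {u: set(adj) - {v} for u, adj in graph.items() if u != v}
    for a, b in combinations(graph[v], 2):
        g[a].add(b)
        g[b].add(a)
    return g


def elimination_cost(graph, weights, v):
    return prod(weights[u] for u in graph[v])


def stochastic_greedy(graph, weights, k=3, runs=100, good_enough=None, seed=None):
    """Randomized greedy: at each step choose uniformly among the k vertices
    with the lowest elimination cost. Keep the best of `runs` passes, stopping
    early once the best cost is <= good_enough (if given)."""
    rng = random.Random(seed)
    best_order, best_cost = None, float("inf")
    for _ in range(runs):
        g, order, total = graph, [], 0
        while g:
            candidates = sorted(g, key=lambda u: elimination_cost(g, weights, u))[:k]
            v = rng.choice(candidates)
            total += elimination_cost(g, weights, v)
            order.append(v)
            g = eliminate(g, v)
        if total < best_cost:
            best_order, best_cost = order, total
        if good_enough is not None and best_cost <= good_enough:
            break
    return best_order, best_cost


star = {"h": {"x1", "x2", "x3"}, "x1": {"h"}, "x2": {"h"}, "x3": {"h"}}
w = {v: 2 for v in star}  # hypothetical weights
print(stochastic_greedy(star, w, seed=0))
```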

### 16. Case study: genetic linkage analysis via Bayesian networks

We speculate a locus with alleles H (healthy) / D (affected). If the expected number of recombinants is low (close to zero), then the speculated locus and the marker are tentatively physically close.

[Figure: pedigree of persons 1 to 5 showing disease-locus alleles H/D and marker genotypes over alleles A1, A2 (e.g. A1/A1, A2/A2, A1/A2), with the inferred phase and a recombinant marked.]

### 17. The variables involved

- $L_{ijm}$ = maternal allele at locus i of person j. The values of this variable are the possible alleles $l_i$ at locus i.
- $L_{ijf}$ = paternal allele at locus i of person j. The values of this variable are the possible alleles $l_i$ at locus i (the same as for $L_{ijm}$).
- $X_{ij}$ = unordered allele pair at locus i of person j, "the genotype". The values are pairs of i-th-locus alleles $(l_i, l'_i)$.
- $Y_j$ = whether person j is affected or not affected, "the phenotype".
- $S_{ijm}$ = a binary variable {0,1} that determines which of the mother's two alleles at locus i person j receives. Similarly, $S_{ijf}$ = a binary variable {0,1} that determines which of the father's two alleles at locus i person j receives.

It remains to specify the joint distribution that governs these variables. Bayesian networks turn out to be a perfect choice.

### 18. The Bayesian network for linkage

This network depicts the qualitative relations between the variables. We have already specified the local conditional probability tables.

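### 19. Details regarding recombination

[Figure: network fragment for loci 1 and 2 of three persons, where person 3 is the child of mother 1 and father 2. It shows $L_{ijm}$, $L_{ijf}$ and $X_{ij}$ for $i = 1, 2$ and $j = 1, 2, 3$, the selectors $S_{i3m}$ and $S_{i3f}$, and the phenotypes $Y_1, Y_2, Y_3$.]

$\theta$ is the recombination fraction between loci 1 and 2.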
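In the standard linkage model, θ enters the network through the selector variables. A child's selector at locus 2 keeps its value from locus 1 unless a recombination occurs, which happens with probability θ. The selector then decides deterministically which parental allele is transmitted. The sketch below writes both tables as small functions. The names `selector_cpt` and `transmitted_allele` are hypothetical, and the convention that 0 selects the parent's maternal allele is an assumption.

```python
def selector_cpt(theta):
    """P(S_{2jm} = s2 | S_{1jm} = s1): the selector keeps its value from
    locus 1 to locus 2 unless a recombination occurs (probability theta)."""
    return {(s1, s2): (1.0 - theta) if s1 == s2 else theta
            for s1 in (0, 1) for s2 in (0, 1)}


def transmitted_allele(mother_maternal, mother_paternal, selector):
    """Deterministic table for L_{ijm}: selector 0 passes on the mother's
    maternal allele, selector 1 her paternal allele (assumed convention)."""
    return mother_maternal if selector == 0 else mother_paternal


cpt = selector_cpt(0.05)  # hypothetical recombination fraction
print(cpt)
print(transmitted_allele("A1", "A2", 0))
```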

### 20. Details regarding the loci

The phenotype variables $Y_j$, which are 0 or 1 (e.g. affected or not affected), are connected to the $X_{ij}$ variables only at the disease locus. For example, a model of a perfectly recessive disease yields the penetrance probabilities:

- $P(Y_1 = \text{sick} \mid X_{11} = (a,a)) = 1$
- $P(Y_1 = \text{sick} \mid X_{11} = (A,a)) = 0$
- $P(Y_1 = \text{sick} \mid X_{11} = (A,A)) = 0$

[Figure: network fragment for locus i showing $L_{i1m}$, $L_{i1f}$, $X_{i1}$, $Y_1$, $S_{i3m}$ and $L_{i3m}$.]

$P(L_{11m} = a)$ is the frequency of allele a. $X_{11}$, the unordered allele pair at locus 1 of person 1, is "the data". $P(x_{11} \mid l_{11m}, l_{11f})$ is 0 or 1 depending on consistency: it is 1 exactly when $x_{11}$ is the unordered pair $\{l_{11m}, l_{11f}\}$.
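The three kinds of local tables on this slide can be combined into a small calculation. The sketch below defines the recessive penetrance table and the 0/1 genotype-consistency table. It then computes the marginal probability that a founder is sick by summing over both alleles and the unordered genotype. The allele frequencies and function names are hypothetical.

```python
# Hypothetical allele frequencies at the disease locus ("a" = disease allele).
ALLELE_FREQ = {"A": 0.9, "a": 0.1}


def penetrance_recessive(genotype):
    """P(Y = sick | X = genotype) for a perfectly recessive disease."""
    return 1.0 if sorted(genotype) == ["a", "a"] else 0.0


def genotype_cpt(x, l_m, l_f):
    """P(x | l_m, l_f): 1 when the unordered pair x matches the ordered
    alleles (l_m, l_f), else 0."""
    return 1.0 if sorted(x) == sorted((l_m, l_f)) else 0.0


# P(Y = sick) for a founder: sum over both alleles and the unordered genotype.
genotypes = [("A", "A"), ("A", "a"), ("a", "a")]
p_sick = sum(ALLELE_FREQ[lm] * ALLELE_FREQ[lf] * genotype_cpt(x, lm, lf)
             * penetrance_recessive(x)
             for lm in ALLELE_FREQ for lf in ALLELE_FREQ for x in genotypes)
print(p_sick)
```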

### 21. SUPERLINK

- Stage 1: each pedigree is translated into a Bayesian network.
- Stage 2: value elimination is performed on each pedigree (i.e., some of the impossible values of the variables of the network are eliminated).
- Stage 3: an elimination order for the variables is determined, according to some heuristic.
- Stage 4: the likelihood of the pedigrees given the θ values is calculated using variable elimination according to the elimination order determined in stage 3.
- Allele recoding and special matrix multiplication are used.

### 22. Comparing to the HMM model

[Figure: HMM over loci, with hidden inheritance vectors $S_1, S_2, S_3, \dots, S_{i-1}, S_i, S_{i+1}$ and observed locus data $X_1, X_2, X_3, \dots, X_{i+1}$, with $Y$ at the disease locus.]

The compounded variable $S_i = (S_{i,1,m}, S_{i,1,f}, \dots, S_{i,n,m}, S_{i,n,f})$ is called the inheritance vector. It has $2^{2n}$ states, where n is the number of persons that have parents in the pedigree (non-founders). The compounded variable $X_i$, which collects the data $X_{ij}$ of all persons j at locus i, is the data regarding locus i. Similarly, for the disease locus we use $Y_i$.

REMARK: The HMM approach is equivalent to the Bayesian network approach provided that we sum out variables locus after locus, say from left to right.
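The size of the HMM state space follows from the inheritance vector. The sketch below builds the transition matrix between inheritance vectors at two adjacent loci. It assumes each of the 2n meioses recombines independently with the same probability θ (no interference), so moving from vector s to s' has probability $\theta^d (1-\theta)^{2n-d}$, where d is the Hamming distance. `inheritance_transition` is a hypothetical name.

```python
from itertools import product


def inheritance_transition(n_nonfounders, theta):
    """States and transition matrix between inheritance vectors at adjacent
    loci; P(s -> t) = theta**d * (1 - theta)**(2n - d), d = Hamming distance."""
    m = 2 * n_nonfounders  # one maternal and one paternal selector per non-founder
    states = list(product((0, 1), repeat=m))
    T = [[theta ** sum(a != b for a, b in zip(s, t))
          * (1 - theta) ** sum(a == b for a, b in zip(s, t))
          for t in states] for s in states]
    return states, T


states, T = inheritance_transition(2, 0.1)
print(len(states))  # 2 ** (2 * 2) = 16 inheritance vectors
```

The matrix has $2^{2n} \times 2^{2n}$ entries. This is why the locus-by-locus (HMM) order becomes impractical as the number of non-founders grows.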

### 23. Experiment A (V1.0)

- Same topology (57 people, no loops).
- Increasing number of loci (each one with 4-5 alleles).
- Run time is in seconds.

[Figure: run times for the general, person-by-person and locus-by-locus (HMM) elimination orders (axis label: pedigree size); annotations mark runs over 100 hours, out-of-memory runs, and a case too big for Genehunter.]

### 24. Experiment C (V1.0)

- Same topology (5 people, no loops).
- Increasing number of loci (each one with 3-6 alleles).
- Run time is in seconds.

[Table: run times by order type and software; some entries report out-of-memory or a bus error.]

### 25. Some options for improving efficiency

1. Multiplying special probability tables efficiently.
2. Grouping alleles together and removing inconsistent alleles.
3. Optimizing the elimination order of variables in a Bayesian network.
4. Performing approximate calculations of the likelihood.

### 26. Standard usage of linkage

- There are usually 5-15 markers.
- 20-30% of the persons in large pedigrees are genotyped (namely, their $x_{ij}$ is measured). For each genotyped person, about 90% of the loci are measured correctly.
- The recombination fraction between every two loci is known from previous studies (available genetic maps).

The user adds a locus called the "disease locus" and places it between two markers i and i+1. The recombination fraction θ' between the disease locus and marker i and the recombination fraction θ'' between the disease locus and marker i+1 are the unknown parameters, estimated using the likelihood function. This computation is done for every gap between the given markers on the map. The MLE hints at the whereabouts of a single gene causing the disease (if a single one exists).
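In the setting above, the likelihood of θ' and θ'' comes from variable elimination over the whole pedigree network. The sketch below illustrates only the final maximization step, a grid search for the MLE. It uses a deliberately simple stand-in likelihood for a single θ: the binomial likelihood of k recombinants among n phase-known meioses. The function names and counts are hypothetical.

```python
import math


def log_likelihood(theta, recombinants, meioses):
    """Toy stand-in: log of theta**k * (1 - theta)**(n - k) for n phase-known
    meioses with k recombinants."""
    k, n = recombinants, meioses
    if theta == 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def grid_mle(recombinants, meioses, steps=500):
    """Maximize the likelihood over a grid of theta values in [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, recombinants, meioses))


theta_hat = grid_mle(2, 20)  # hypothetical counts: 2 recombinants in 20 meioses
print(theta_hat)
```

With a real pedigree likelihood, each grid point costs one full variable-elimination run. This is one reason the elimination order matters so much in practice.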

### 27. Relation to treewidth

The unconstrained elimination problem reduces to finding treewidth if the weight of each vertex is constant and the cost function is $\text{cost}(X_\alpha) = \max_i |N_{G_i}(X_{\alpha(i)})|$. Deciding whether a graph has treewidth at most k is NP-complete (Arnborg et al., 1987). When no edges are added during elimination, the elimination sequence is perfect and the graph is chordal.
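With constant weights, what matters for an order is its width (the largest neighborhood met during elimination) and whether it adds any edges. The sketch below computes both for a given order and finds the treewidth of tiny graphs by exhaustive search over orders, which is consistent with the NP-completeness result for general graphs. All helper names are hypothetical. A 4-cycle needs a fill edge under every order (it is not chordal). A triangle with a pendant vertex has a perfect order.

```python
from itertools import combinations, permutations


def eliminate(graph, v):
    """Residual graph: make N(v) a clique, then delete v."""
    g = {u: set(adj) - {v} for u, adj in graph.items() if u != v}
    for a, b in combinations(graph[v], 2):
        g[a].add(b)
        g[b].add(a)
    return g


def width_and_fill(graph, order):
    """Return (max_i |N_{G_i}(X_alpha(i))|, number of edges added)."""
    g, width, fill = graph, 0, 0
    for v in order:
        width = max(width, len(g[v]))
        fill += sum(1 for a, b in combinations(g[v], 2) if b not in g[a])
        g = eliminate(g, v)
    return width, fill


def treewidth_brute_force(graph):
    """Minimum width over all elimination orders (tiny graphs only)."""
    return min(width_and_fill(graph, order)[0] for order in permutations(graph))


def undirected(edges):
    g = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


cycle4 = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
chordal = undirected([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(width_and_fill(cycle4, "abcd"), width_and_fill(chordal, "dabc"))
print(treewidth_brute_force(cycle4))
```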