A Robust Procedure for Gaussian Graphical Model Search from Microarray Data with p Larger Than n
Robert Castelo and Alberto Roverato, JMLR 7, December 2006. Presented by Kuan-ming Lin, 11/30/2018.
Outline
- Motivation and the challenge when p (#genes) > n (#microarrays)
- Theory of undirected Gaussian graphical models (aka Markov random fields, Markov networks)
- Full-order vs. partial-order (q-partial) graph approximations
- The qp-procedure algorithm for building partial graphs
- Simulations and an experiment
- Discussion
Motivation and Challenge
A microarray experiment can be modeled as a p-variate random variable XV ~ PV, where V = {1, ..., p} and PV is some distribution over the p genes determined by biological function. A p × n microarray table can then be seen as n draws of XV.
- We can construct a graph G = (V, E) describing gene interactions by computing cor(Xi, Xj | XQ), the partial correlation of the ith and jth genes given a conditioning set Q, for every pair (i, j).
- If Q = V \ {i, j} (the full-order case), cor(Xi, Xj | XQ) can only be obtained from the joint distribution PV, which is unknown and hard to estimate from n samples when p > n.
- This paper proposes the q-order partial correlation cor(Xi, Xj | XQ), with |Q| ≤ q < n, as an approximation to the full-order one.
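As a concrete illustration (not code from the paper), a q-order partial correlation can be computed from the sample covariance by inverting only the (q+2)×(q+2) submatrix over {i, j} ∪ Q, which stays well-posed even when p > n as long as q + 2 ≤ n. A minimal numpy sketch, with random data standing in for a microarray table:

```python
import numpy as np

def partial_corr(S, i, j, Q):
    """cor(Xi, Xj | XQ) from a covariance matrix S, via the inverse of
    the (q+2)x(q+2) submatrix over {i, j} + Q (its local 'concentration')."""
    idx = [i, j] + list(Q)
    K = np.linalg.inv(S[np.ix_(idx, idx)])
    return -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))        # n = 40 "microarrays", p = 10 "genes"
S = np.cov(X, rowvar=False)              # p x p sample covariance
r = partial_corr(S, 0, 1, Q=[2, 3, 4])   # q = 3: only a 5x5 matrix is inverted
```

With Q empty this reduces to the ordinary correlation, which gives a quick sanity check of the formula.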
Gaussian Graphical Model and q-partial Graph
Gaussian graphical model: assume XV ~ PV = N(μV, ΣVV).
- If p > n, the MLE of (μV, ΣVV) does not exist.[1]
- The q-partial graph G(q) = (V, E(q)) is defined by: (i, j) ∈ Ē(q) (i.e., the edge is absent) iff cor(Xi, Xj | XQ) = 0 for some Q with |Q| ≤ q.
- q = 0: Ē(0) corresponds to the zero entries of the covariance matrix ΣVV (the covariance graph).
- q = p - 2: Ē(p-2) corresponds to the zero entries of the concentration matrix K = (ΣVV)^-1 (the concentration graph).[2]
- In the covariance graph an edge is present whenever there is any association, even an indirect one; the covariance graph is therefore usually much denser than the concentration graph and may be inadequate for describing gene networks.
[1] Theorem 5.1 in Lauritzen (1996): "Graphical Models."
[2] Dykstra (1970): "Establishing the positive definiteness of the sample covariance matrix." Ann. Math. Statist. 41(6), pp. 2153–4.
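To see why q = 0 (covariance graph) is denser than q = p-2 (concentration graph), consider a standard 3-variable chain example (not taken from the paper): the concentration matrix K has a structural zero between the chain's endpoints, but its inverse Σ does not, because the endpoints are indirectly associated through the middle variable.

```python
import numpy as np

# Chain X1 - X2 - X3: K (concentration) has a zero at (1,3),
# so the concentration graph has no edge 1-3 ...
K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

# ... but the covariance Sigma = K^{-1} has no zero at (1,3),
# so the covariance graph picks up the indirect association.
Sigma = np.linalg.inv(K)
```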
Graph Theory Tool: Outer Connectivity
[Definition 3: outer connectivity] Here S is a collection of sets S ⊆ V that separate i from j. (Examples given in figures.)
Use Outer Connectivity to Prove Properties of q-partial Graphs
Assume G, the concentration graph, is a "perfect map," which means it is
- Markov: cor(XI, XJ | XU) = 0 whenever U separates I and J;
- faithful: cor(XI, XJ | XU) = 0 implies that U separates I and J.
Then the paper shows: [Proposition 1] [Proposition 2] [Corollary 3 (hierarchy)]
- If we set a = max d(Ē|G), Proposition 2 tells us that G = G(p-2) = G(p-3) = ... = G(a).
- Corollary 3 implies the hierarchy G = G(p-2) = ... = G(a) ⊆ G(a-1) ⊆ ... ⊆ G(0) ⊆ complete graph.
Sufficient Conditions and Relation to Graph Sparseness
[Theorem 4] [Corollary 5] [Theorem 6: sparseness]
- Corollary 5 gives us a way to certify that G = G(q) when G is unknown but G(q) is known.
- Theorem 6 implies that to obtain a sparser G(q), q has to be increased.
Tests for Conditional Independence in the qp-Procedure: the Non-Rejection Rate
Here the expectation is taken over all possible conditioning sets Q, and T is the indicator variable of whether the null hypothesis H0: ρij.Q = cor(Xi, Xj | XQ) = 0 (against HA: cor(Xi, Xj | XQ) ≠ 0) is not rejected by the following regression-coefficient test.
- The usual t-test for a zero coefficient, where A = Q ∪ {i, j}.
- This regression-coefficient test is optimal in the sense that it is uniformly most powerful unbiased (UMPU).[1]
Cox and Wermuth (1996): "Multivariate Dependencies: Models, Analysis and Interpretation."
[1] Page 397 in E.L. Lehmann (1986): "Testing Statistical Hypotheses, 2nd ed."
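The test can be sketched via the classical t statistic on the sample partial correlation, which for Gaussian data is equivalent to the t-test on the regression coefficient described above. This is an illustrative reconstruction, not the paper's code; the significance level alpha is a made-up default.

```python
import numpy as np
from scipy import stats

def partial_corr_test(X, i, j, Q, alpha=0.05):
    """Return True iff H0: cor(Xi, Xj | XQ) = 0 is NOT rejected.
    Uses t = r * sqrt(dof / (1 - r^2)) with dof = n - |Q| - 2,
    equivalent to the t-test on the partial regression coefficient."""
    n = X.shape[0]
    idx = [i, j] + list(Q)
    K = np.linalg.inv(np.cov(X[:, idx], rowvar=False))
    r = -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])   # sample partial correlation
    dof = n - len(Q) - 2
    t = r * np.sqrt(dof / (1.0 - r * r))
    pval = 2.0 * stats.t.sf(abs(t), dof)        # two-sided p-value
    return bool(pval >= alpha)
```

The returned boolean is exactly the indicator T averaged by the non-rejection rate.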
The qp-Procedure Algorithm
[Section 5.2, p. 2632] β* (the type-II error rate) indicates the power of the tests. Here we see the reason for choosing the t-test on regression coefficients: it is optimized for power. In practice not every Q is considered; a Monte Carlo method is applied instead.
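The Monte Carlo step for one pair (i, j) might look like the following sketch; the uniform sampling of Q, the number of draws, and alpha are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np
from scipy import stats

def non_rejection_rate(X, i, j, q, n_mc=100, alpha=0.05, seed=0):
    """Fraction of randomly drawn size-q conditioning sets Q for which
    H0: cor(Xi, Xj | XQ) = 0 is not rejected by the t-test."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    others = [k for k in range(p) if k not in (i, j)]
    kept = 0
    for _ in range(n_mc):
        Q = list(rng.choice(others, size=q, replace=False))
        K = np.linalg.inv(np.cov(X[:, [i, j] + Q], rowvar=False))
        r = -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
        dof = n - q - 2
        t = r * np.sqrt(dof / max(1.0 - r * r, 1e-12))  # guard r ~ 1
        if 2.0 * stats.t.sf(abs(t), dof) >= alpha:      # H0 not rejected
            kept += 1
    return kept / n_mc

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 10))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(40)  # make edge (0,1) real
rate = non_rejection_rate(X, 0, 1, q=3)            # should be near 0
```

A present edge survives conditioning on any Q, so its non-rejection rate stays low; absent edges accumulate non-rejections and get rates near 1.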
Explain qp-Procedure via Toy Example
p = 164 variables; the concentration matrix K has 20 non-zero blocks:
- each block is 10 × 10;
- each block overlaps its neighbors by 4 × 4;
- 1206 (9%) of the pairs are present edges.
G(3) is the complete graph; G(20) = G. This K is used to generate n = 40 samples. In the qp-procedure, 500 Q's are sampled and the t-test is applied for each of the 13,366 (i, j) pairs.
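The sampling step (n draws from a Gaussian whose concentration matrix is K) can be done via a Cholesky factor of K, without forming the full inverse: if K = L Lᵀ and Z ~ N(0, I), then X = Z L⁻¹ has covariance L⁻ᵀL⁻¹ = K⁻¹. The tiny 2×2 K below is a made-up stand-in for the paper's 164×164 block matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])              # stand-in concentration matrix
L = np.linalg.cholesky(K)                 # K = L @ L.T
Z = rng.standard_normal((40, 2))          # n = 40 independent N(0, I) rows
X = np.linalg.solve(L.T, Z.T).T           # each row of X ~ N(0, K^{-1})
```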
Toy Example (continued)
- Figure 3 shows that a larger q leads to better separation of non-rejection rates between present and absent edges.
- Figure 4 gives the sizes of the largest cliques; every circle below the dotted line (the sample size) corresponds to a model whose dimension is small enough for standard techniques such as MLE.
- With q = 20 and β* = 0.975, only 34 (2.8%) of the 1206 present edges are wrongly removed.
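The edge-removal rule applied here (drop an edge when its non-rejection rate exceeds β*) is a simple threshold; the rates below are made-up numbers for illustration, not values from the figures.

```python
import numpy as np

# Hypothetical non-rejection rates for 6 candidate edges (made up):
# low rates suggest present edges, rates near 1 suggest absent ones.
nrr = np.array([0.10, 0.99, 0.42, 0.98, 0.80, 1.00])
beta_star = 0.975
kept_edges = np.flatnonzero(nrr <= beta_star)   # edges retained in G(q)
```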
Experiment on Simulated Data: p = 150; n = 20, 150 (G1) or 50 (G2)
d(E1|G1) = 5, |E1| = 375 (3.4%); d(E2|G2) = 20, |E2| = 1499 (13.4%).
- Figure 6 shows that for G1 a sparse graph can be obtained by increasing q.
- Figure 9 shows that for G2 a sparse graph cannot be obtained for various values of n and q.
- The results suggest that the intrinsic graph (G2) may be too dense for the q-partial graph heuristic to work.
Experiment on Breast Tumor Microarrays: p=150, n=49
The 150 genes are associated with the estrogen receptor pathway. With q = 20 and β* = 0.975: |E| = 7240 (64.8%), largest clique size = 24.
Discussion
This paper makes two main contributions (both independent of the actual microarray analysis):
- theory related to q-partial graphs;
- the qp-procedure.
The qp-procedure is robust with respect to the faithfulness assumption: violations in a few conditional distributions do little harm because the statistical tests are averaged over many Q's. But when p becomes large, as is usual for microarrays, efficiency forces the procedure to perform fewer tests per pair, so this robustness can no longer be guaranteed. The qp-procedure can also be generalized to the non-Gaussian case; different assumptions on microarray distributions may then apply.