Robert Castelo and Alberto Roverato, JMLR 7, December 2006


1 A Robust Procedure for Gaussian Graphical Model Search from Microarray Data with p Larger Than n
Robert Castelo and Alberto Roverato, JMLR 7, December 2006
Presented by Kuan-ming Lin, 11/30/2018

2 Gaussian Graphical Model Search From Microarray
Outline
- Motivation and challenge when p (#genes) > n (#microarrays)
- Theory of the undirected Gaussian graphical model (a.k.a. Markov random field, Markov network)
- Full-order graphs vs. partial-order (q-partial) graph approximations
- The qp-procedure algorithm for building partial graphs
- Simulations and experiment
- Discussion
Comment: Can you please explain p > n in a few words? I believe most of our group is unclear about the relationship between genes and microarrays. Also, what are q and n?

3 Motivation and Challenge
- A microarray can be modeled as a p-variate random variable X_V ~ P_V, where V = {1, …, p} and P_V is some distribution over the p genes determined by biological function. A p × n microarray table can then be seen as n draws of X_V.
- We can construct a graph G = (V, E) describing gene interactions by computing cor(X_i, X_j | X_Q), the partial correlation of the ith and jth genes given Q; an edge (i, j) is absent iff this correlation is zero.
- If Q = V \ {i, j}, cor(X_i, X_j | X_Q) can only be obtained from the joint probability distribution P_V, which is unknown and hard to estimate from n samples when p > n.
- This paper proposes the q-order partial correlation cor(X_i, X_j | X_Q), |Q| ≤ q < n, as an approximation to the full-order one.
Comment: Could you give a simple example, or an equation, for the relationship between a microarray and p? An equation may be easier to understand. You could also express the joint and fully-marginal probability distributions in mathematical form.
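A minimal sketch (not from the paper) of estimating a partial correlation with NumPy, using the standard identity that cor(X_i, X_j | X_Q) can be read off the inverse covariance matrix of the variables {i, j} ∪ Q; the toy data below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_corr(X, i, j, Q):
    """Sample partial correlation cor(X_i, X_j | X_Q), estimated by
    inverting the sample covariance of the variables {i, j} union Q."""
    idx = [i, j] + list(Q)
    S = np.cov(X[:, idx], rowvar=False)   # sample covariance of the subset
    K = np.linalg.inv(S)                  # concentration (precision) matrix
    return -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])

# toy data: X0, X1 independent; X2 = X0 + X1 + small noise
n = 200
X = rng.standard_normal((n, 3))
X[:, 2] = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(n)

r_marginal = partial_corr(X, 0, 1, [])    # expected near 0
r_given_2 = partial_corr(X, 0, 1, [2])    # expected strongly negative
print(r_marginal, r_given_2)
```

Note how the conditioning set changes the answer: X0 and X1 are marginally uncorrelated, yet conditioning on their common effect X2 induces a strong negative partial correlation.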

4 Gaussian Graphical Model and q-partial Graph
- Gaussian graphical model: assume X_V ~ P_V = N(μ_V, Σ_VV). If p > n, the MLE of (μ_V, Σ_VV) does not exist.[1]
- The q-partial graph G(q) is defined as G(q) = (V, E(q)), where (i, j) ∈ Ē(q) iff cor(X_i, X_j | X_Q) = 0 for some |Q| ≤ q.
- q = 0: Ē(q) consists of the zero entries of the covariance matrix Σ_VV (the covariance graph).
- q = p-2: Ē(q) consists of the zero entries of the concentration matrix K = (Σ_VV)^-1 (the concentration graph).[2]
- In the covariance graph, an edge is present whenever there is any indirect association. The covariance graph is therefore usually much denser than the concentration graph and may be inadequate for describing gene networks.
Comment: What is n in terms of X_V and P_V? So P_V = Gaussian(μ_V, Σ_VV)? Most people in our group are probably unfamiliar with q-partial graphs; could you give a definition, or perhaps add a figure? So V are the nodes of the q-partial graph, and G(q) is the q-partial graph?
[1] Theorem 5.1 in Lauritzen (1996): "Graphical Models."
[2] Dykstra (1970): "Establishing the positive definiteness of the sample covariance matrix." Ann. Math. Statist. 41(6), pp. 2153–4.
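The covariance-vs-concentration contrast can be seen numerically in a sketch (my own toy example, not the paper's): for a Markov chain X0 → X1 → X2, the covariance matrix has no zeros (X0 and X2 are indirectly associated), while the concentration matrix has a zero in the (0, 2) entry:

```python
import numpy as np

# Chain X0 -> X1 -> X2: X1 depends on X0, X2 depends on X1 only.
rng = np.random.default_rng(1)
n = 100_000
x0 = rng.standard_normal(n)
x1 = x0 + rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)
X = np.column_stack([x0, x1, x2])

Sigma = np.cov(X, rowvar=False)   # covariance matrix (the q = 0 view)
K = np.linalg.inv(Sigma)          # concentration matrix (the q = p-2 view)

# The covariance graph has edge (0,2) (indirect association through x1);
# the concentration graph does not, since x0 is independent of x2 given x1.
print(Sigma[0, 2])   # clearly nonzero
print(K[0, 2])       # near zero
```

This is the sense in which the covariance graph (q = 0) is denser than the concentration graph (q = p-2).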

5 Graph Theory Tool: Outer Connectivity
- [Definition 3: outer connectivity] Here S is a collection of subsets S ⊆ V that separate i from j.
- Examples
Comment: I don't think anyone will read the second graph here. If it is the core of the method, explain it in your own words, perhaps with the help of figures. If it is not a key point, just drop it.

6 Use Outer Connectivity to Prove Properties of q-partial Graphs
- Assume G, the concentration graph, is a "perfect map," which means:
- Markov: cor(X_I, X_J | X_U) = 0 whenever U separates I and J.
- Faithful: cor(X_I, X_J | X_U) = 0 implies U separates I and J.
- Then this paper shows: [Proposition 1] [Proposition 2] [Corollary 3 (hierarchy)]
- If we define a = max d(Ē|G), Proposition 2 tells us that G = G(p-2) = G(p-3) = … = G(a).
- Corollary 3 implies a hierarchy G = G(p-2) = … = G(a) ⊆ G(a-1) ⊆ … ⊆ G(0) ⊆ complete graph.
Comment: At least there is less text here, but I don't understand it, since I was completely lost on the last slide. If you can explain what those propositions mean, that would be great. The 5th bullet is very good; it helped me understand.

7 Sufficient Conditions and Relation to Graph Sparseness
- [Theorem 4] [Corollary 5] [Theorem 6: sparseness]
- Corollary 5 gives us a way to ensure G = G(q) when G is unknown but G(q) is known.
- Theorem 6 implies that to obtain a sparser G(q), q has to be increased.
Comment: This one is very good.

8 Tests for conditional independence in qp-Procedure: Non-Rejection Rate
- Here the expectation is taken over all possible Q's, and T is the indicator variable of whether the null hypothesis H0: ρ_ij.Q = cor(X_i, X_j | X_Q) = 0 (against HA: cor(X_i, X_j | X_Q) ≠ 0) is not rejected by the following regression-coefficient test.
- The usual t-test for a zero coefficient. Here A = Q ∪ {i, j}.
- This regression-coefficient test is optimal in the sense that it is uniformly most powerful unbiased (UMPU).[1]
Comment: The paragraph is long.
Cox and Wermuth (1996): "Multivariate Dependencies: Models, Analysis and Interpretation."
[1] Page 397 in E.L. Lehmann (1986): "Testing Statistical Hypotheses, 2nd ed."

9 The qp-Procedure Algorithm
- [Section 5.2 on p. 2632] β* (the type-II error rate) reflects the power of the tests.
- Here we see the reason for choosing the t-test on regression coefficients: it is optimized for power.
- In practice, not every Q is considered; a Monte Carlo method is applied.
Comment: Very good.
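The Monte Carlo step above can be sketched as follows: for each pair (i, j), sample conditioning sets Q of size q at random and record the fraction of non-rejections. This is a simplified reading of the procedure, not the paper's exact algorithm; the toy data and the simple t-test inside are my assumptions:

```python
import numpy as np
from scipy import stats

def non_rejection_rate(X, i, j, q, n_samples=100, alpha=0.05, rng=None):
    """Monte Carlo estimate of the non-rejection rate for pair (i, j):
    the fraction of sampled conditioning sets Q (|Q| = q) for which the
    t-test does NOT reject H0: cor(X_i, X_j | X_Q) = 0."""
    rng = rng or np.random.default_rng()
    n, p = X.shape
    others = [v for v in range(p) if v not in (i, j)]
    kept = 0
    for _ in range(n_samples):
        Q = list(rng.choice(others, size=q, replace=False))
        K = np.linalg.inv(np.cov(X[:, [i, j] + Q], rowvar=False))
        r = -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
        df = n - q - 2
        t = abs(r) * np.sqrt(df / (1.0 - r * r))
        if 2 * stats.t.sf(t, df) > alpha:   # H0 not rejected for this Q
            kept += 1
    return kept / n_samples

# toy: 10 mutually independent variables, except X1 = X0 + noise
rng = np.random.default_rng(3)
X = rng.standard_normal((50, 10))
X[:, 1] = X[:, 0] + 0.3 * rng.standard_normal(50)

nrr_present = non_rejection_rate(X, 0, 1, q=3, rng=rng)  # low: keep edge
nrr_absent = non_rejection_rate(X, 2, 3, q=3, rng=rng)   # high: remove edge
print(nrr_present, nrr_absent)
```

Edges whose non-rejection rate exceeds a threshold (β* in the slides) are removed from the graph.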

10 Explain qp-Procedure via Toy Example
- p = 164 variables; 20 non-zero blocks in the concentration matrix K, each 10 × 10, each overlapping 4 × 4 with neighboring blocks; 1206 (9%) present edges.
- G(3) is the complete graph; G(20) = G.
- Use this K to generate n = 40 samples.
- In the qp-procedure, 500 Q's are sampled and t-tests applied for each of the 13,366 (i, j) pairs.
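The sampling step ("use this K to generate n samples") works for any positive-definite concentration matrix: invert K to get Σ and draw from N(0, Σ) via a Cholesky factor. A generic sketch with a small random K (the paper's actual K is the 164 × 164 block-structured matrix described above):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 5, 40
A = rng.standard_normal((p, p))
K = A @ A.T + p * np.eye(p)        # an arbitrary positive-definite concentration matrix
Sigma = np.linalg.inv(K)           # implied covariance matrix
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, p)) @ L.T   # each row ~ N(0, Sigma)
print(X.shape)
```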

11 Gaussian Graphical Model Search From Microarray
Toy Example continued
- Figure 3 shows that larger q leads to better separation of non-rejection rates between present and absent edges.
- Figure 4 gives the size of the largest cliques; every circle below the dotted line (the sample size) corresponds to a model whose dimension is small enough for standard techniques such as MLE.
- With q = 20 and β* = 0.975, only 34 (2.8%) of the 1206 edges are wrongly removed.

12 Experiment on Simulated Data: p = 150, n = 20, 150 (G1) or 50 (G2)
- d(E1|G1) = 5, |E1| = 375 (3.4%); d(E2|G2) = 20, |E2| = 1499 (13.4%)
- Figure 6 shows that for G1, a sparse graph can be obtained by increasing q.
- Figure 9 shows that for G2, a sparse graph cannot be obtained for the various values of n and q tried.
- The results suggest that the underlying graph (G2) may be too dense for the q-partial graph heuristic to work.

13 Experiment on Breast Tumor Microarrays: p=150, n=49
- The 150 genes are associated with the estrogen receptor pathway.
- q = 20, β* = 0.975, |E| = 7240 (64.8%), largest clique size = 24

14 Gaussian Graphical Model Search From Microarray
Discussion
- This paper makes two main contributions (both independent of actual microarray analysis): the theory of q-partial graphs, and the qp-procedure.
- The qp-procedure is robust with respect to the faithfulness assumption: violations in a few conditional distributions do not hurt much, because the statistical tests are averaged over many Q's.
- But when p becomes large, as is usual for microarrays, the procedure has to perform fewer tests per pair for efficiency, so this robustness can no longer be guaranteed.
- The qp-procedure can be generalized to the non-Gaussian case; different assumptions on microarray distributions may apply.

