Applying meta-analysis to Genotype-Tissue Expression data from multiple tissues to find eQTLs and eGenes Dat Duong, Lisa Gai, Sagi Snir, Eun Yong Kang,


1 Applying meta-analysis to Genotype-Tissue Expression data from multiple tissues to find eQTLs and eGenes Dat Duong, Lisa Gai, Sagi Snir, Eun Yong Kang, Buhm Han, Jae Hoon Sul, Eleazar Eskin

2 Genotype-Tissue Expression (GTEx) data
44 tissues samples/tissue. 15,336 genes. The same person can donate samples from many tissues. In other words, the data in the tissues are not independent.

3 One concept from GTEx: eGenes
What are eGenes? Genes that have at least one SNP associated with their expression. Must find eQTLs to find eGenes. Why do we care? Genes cause diseases ↔ eGenes

4 Observation from the GTEx data
Some tissues have very few samples (for example, brain tissues). To identify eQTLs and eGenes, we need to pool samples across tissues (more statistical power).

5 Another concept from GTEx: eQTL study done jointly across many tissues.
Observation: the same SNP can be an eQTL in similar tissues, but can behave differently in unrelated tissues. There are a variety of methods to model how a SNP behaves across different tissues. From this observation, people came up with another important concept for analyzing the GTEx data: doing the eQTL study jointly across many tissues. What people have seen is that the same SNP can be an eQTL in similar tissues, but can behave differently in unrelated tissues. For example, the same SNP can be an eQTL for the same gene in many brain cells, but this observation may not hold true when you compare a brain cell and a skin cell. In any case, people have come up with different ways to model how a SNP behaves across different tissues.

6 Neither is scalable to more than 20 tissues.
Bayesian approach: A Statistical Framework for Joint eQTL Analysis in Multiple Tissues. Timothée Flutre, Xiaoquan Wen, Jonathan Pritchard, Matthew Stephens.
Meta-analysis + linear mixed model: Effectively identifying eQTLs from multiple tissues by combining mixed model and meta-analytic approaches. Jae Hoon Sul, Buhm Han, Chun Ye, Ted Choi, and Eleazar Eskin.

7 This is the intuition for this talk.
Both rely on the idea: There is some pattern among the eQTL studies in different tissues that we can exploit. This is the intuition for this talk. We will also follow the meta-analysis approach. It is more scalable (44 tissues).

8 To illustrate the intuition
Condition on a gene's expression. Compute the correlation of SNP effect sizes across the 44 tissues. To illustrate this intuition, conditioning on a gene, we can compute the correlation of SNP effect sizes in the 44 tissues. This plot shows how different tissues correlate with each other. I have omitted the tissue names from the labels. In this plot, for this gene, the correlations are high among the brain tissues. So, if we want to find SNPs that are eQTLs in at least one tissue with respect to this gene, we can combine eQTL results from these brain tissues.
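As a rough sketch of the computation behind such a plot, the correlation between tissues can be taken over the SNP effect sizes for one gene. The effect-size matrix below is simulated and hypothetical (the first five columns are made to behave alike, standing in for a group of related tissues); real values would come from the per-tissue eQTL results.

```python
import numpy as np

rng = np.random.default_rng(0)

n_snps, n_tissues = 200, 44
# Hypothetical effect-size matrix for one gene: rows are SNPs, columns are tissues.
shared = rng.normal(size=(n_snps, 1))           # component shared by tissues 0-4
effects = rng.normal(size=(n_snps, n_tissues))  # tissue-specific component
effects[:, :5] += 2.0 * shared                  # make tissues 0-4 behave alike

# rowvar=False treats each column (tissue) as a variable,
# so the result is a 44 x 44 tissue-by-tissue correlation matrix.
tissue_corr = np.corrcoef(effects, rowvar=False)
```

In this toy input, the block of high correlations among the first five columns plays the role of the brain tissues in the plot.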

9 To quickly recap For the GTEx data, in each tissue, we have the effect size of the SNP for the gene. Some tissues have low sample sizes: Low power to test if a SNP is an eQTL for a gene in one tissue. Low power to find eGenes. Idea: Combine the outcome from each tissue to gain power to find eQTLs and also eGenes.

10 SNP-gene relationships vary across tissues
Meta-analysis: a statistical method to combine outcomes from different experiments (i.e., tissues). SNP-gene relationships vary across tissues, so we need Random Effects (RE) meta-analysis, which allows a SNP to have unique true effects on a gene across tissues. That is, we allow the SNP to have a unique true effect in each of the 44 tissues.

11 Let’s consider a very simple example.
Assume: We have only one SNP X. We are interested in only one gene G, expressed in tissues A, B, and C. Tissues A, B, and C do not share samples from the same individuals. Goal: test if X is an eQTL in at least one of A, B, C. If X is an eQTL, then G is an eGene.

12 SNP X and one gene in 3 tissues.
Example: SNP X and one gene in 3 tissues. $\hat{\beta}_{XA}$ = estimated effect of X in tissue A, with variance $\mathrm{var}(\hat{\beta}_{XA})$. $\hat{\beta}_{XB}$ = estimated effect of X in tissue B, with variance $\mathrm{var}(\hat{\beta}_{XB})$. $\hat{\beta}_{XC}$ = estimated effect of X in tissue C, with variance $\mathrm{var}(\hat{\beta}_{XC})$. To provide an example that is concrete and easy to follow, consider SNP X and a gene expressed in 3 tissues A, B, and C. We use $\hat{\beta}_{XA}$ to denote the estimated effect of X in tissue A, $\hat{\beta}_{XB}$ for tissue B, and similarly $\hat{\beta}_{XC}$ for tissue C. Because these beta-hats are estimated from the data, we also have their variances.
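One common way to obtain an estimated effect and its variance in a single tissue is a simple linear regression of expression on genotype. The sketch below assumes that setup; the function name `estimate_effect`, the sample sizes, the allele frequency, and the true effects are all made up for illustration, and the slides do not say which regression model was actually used.

```python
import numpy as np

def estimate_effect(genotype, expression):
    """Simple linear regression of expression on genotype.

    Returns (beta_hat, var_beta_hat) for the slope.
    """
    g = genotype - genotype.mean()
    y = expression - expression.mean()
    sxx = np.sum(g ** 2)
    beta_hat = np.sum(g * y) / sxx
    residuals = y - beta_hat * g
    sigma2 = np.sum(residuals ** 2) / (len(g) - 2)  # residual variance
    return beta_hat, sigma2 / sxx

rng = np.random.default_rng(1)
results = {}
# Hypothetical tissues with different sample sizes and no shared individuals.
for tissue, n, true_beta in [("A", 300, 0.5), ("B", 150, 0.4), ("C", 80, 0.0)]:
    genotype = rng.binomial(2, 0.3, size=n).astype(float)  # 0/1/2 allele counts
    expression = true_beta * genotype + rng.normal(size=n)
    results[tissue] = estimate_effect(genotype, expression)
```

Note how the tissue with the fewest samples ends up with the largest variance, which is the low-power situation the talk is trying to address.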

13 Random Effects Meta-analysis
Assumption #1: each estimated effect is the sum of 2 terms, the true effect and the noise: $\hat{\beta}_{XA} = \beta_{XA} + \epsilon_{XA}$, and likewise for B and C. $\beta_{XA}$ = true effect of X in A, $\beta_{XB}$ = true effect of X in B, $\beta_{XC}$ = true effect of X in C, and $\epsilon$ denotes the noise.

14 Assumption #2: The true effects come from some distribution.
Here we assume the true effects are normally distributed and independent, with mean $\mu$ and variance $\tau^2$. The noise is assumed to be normally distributed and independent, with mean 0 and variance equal to the variance of the corresponding beta-hat.
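The two assumptions can be read as a recipe for generating data, which the following sketch simulates. The values of `mu`, `tau2`, and the per-tissue variances are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

mu, tau2 = 0.3, 0.05                     # hypothetical mean and variance of true effects
var_hat = np.array([0.01, 0.02, 0.04])   # made-up var(beta_hat) in tissues A, B, C
n_rep = 100_000                          # number of simulated replicates

# Assumption #2: true effects and noise are independent normal draws.
true_effects = rng.normal(mu, np.sqrt(tau2), size=(n_rep, 3))
noise = rng.normal(0.0, np.sqrt(var_hat), size=(n_rep, 3))

# Assumption #1: estimated effect = true effect + noise.
beta_hat = true_effects + noise

empirical_cov = np.cov(beta_hat, rowvar=False)
expected_cov = np.diag(tau2 + var_hat)   # independence implies zero off-diagonals
```

Comparing `empirical_cov` with `expected_cov` shows that, under these assumptions alone, the estimated effects in different tissues are uncorrelated.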

15 Random Effects Meta-analysis
Estimated effect has distribution: Null hypothesis: SNP X does not have an effect in tissues A, B, and C. Off-diagonals are zeros. Under these 2 assumptions, the estimated beta-hat has this distribution. Because we have a distribution for the estimated effects, we can test the hypothesis that…. This is equivalent to testing that X is not an eQTL in any of the tissues.
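As a sketch, Assumptions #1 and #2 together imply a distribution of roughly the following form. The symbol $V$ for the diagonal matrix of estimation variances is notation introduced here, and writing the null hypothesis as $\mu = 0$ and $\tau^2 = 0$ follows the usual random-effects formulation; it is an assumption rather than something the slides state explicitly.

```latex
\begin{pmatrix} \hat{\beta}_{XA} \\ \hat{\beta}_{XB} \\ \hat{\beta}_{XC} \end{pmatrix}
\sim \mathcal{N}\!\left( \mu \mathbf{1},\; \tau^2 I + V \right),
\qquad
V = \operatorname{diag}\!\left(
  \operatorname{var}(\hat{\beta}_{XA}),\,
  \operatorname{var}(\hat{\beta}_{XB}),\,
  \operatorname{var}(\hat{\beta}_{XC})
\right)

H_0 :\; \mu = 0 \ \text{and} \ \tau^2 = 0
```

Both $\tau^2 I$ and $V$ are diagonal, which is what "off-diagonals are zeros" refers to.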

16 Recall: a SNP can have similar effects on the same gene across different tissues.

17 Random Effects Meta-analysis + Covariance
U → SNP X can have similar effects on the same gene in tissues A, B, and C

18 Random Effects Meta-analysis + Covariance
Estimated effect has distribution: Null hypothesis: SNP X does not have an effect in tissues A, B, and C. Off-diagonals can be non-zero. Under this new modification, the estimated effect has this new distribution.
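To make the role of the off-diagonal terms concrete, here is a minimal sketch that evaluates a multivariate normal log-likelihood with and without a covariance matrix U for the true effects. It assumes U enters additively alongside a diagonal matrix `V` of estimation variances, and that the null model has mean 0 and covariance `V`; the slides do not show the exact form. All numbers are made up, and `mu` and `U` are fixed by hand here rather than estimated by maximum likelihood, so `lr_stat` is only illustrative and not the test statistic from the paper.

```python
import numpy as np

def mvn_loglik(x, mean, cov):
    """Log-density of a multivariate normal evaluated at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

beta_hat = np.array([0.40, 0.35, 0.45])   # made-up estimates in A, B, C
V = np.diag([0.01, 0.02, 0.04])           # made-up var(beta_hat) per tissue
U = 0.05 * np.array([[1.0, 0.8, 0.8],
                     [0.8, 1.0, 0.8],
                     [0.8, 0.8, 1.0]])    # hypothetical covariance of true effects
mu = np.full(3, 0.4)                      # hypothetical shared mean effect

loglik_null = mvn_loglik(beta_hat, np.zeros(3), V)  # no effect in any tissue
loglik_alt = mvn_loglik(beta_hat, mu, U + V)        # correlated effects across tissues
lr_stat = 2.0 * (loglik_alt - loglik_null)
```

A large `lr_stat` would favor the model in which X has an effect; in practice the parameters would have to be fitted and the statistic calibrated against its null distribution.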

19 How do we estimate U?
We have other SNPs and their effects on the same gene. Select SNPs in low LD with SNP X. Treat these SNPs like independent replications of X. Estimate the covariance of $\beta_{XA}$, $\beta_{XB}$ in the matrix U: compute the sample covariance using 2 vectors. To estimate the matrix U, which describes how a SNP behaves in different tissues, we can rely on the other SNPs in the data. The 2 vectors are the sets of SNP effects in tissue A and in tissue B, using only SNPs that are in low LD with SNP X. Given these 2 vectors, we can compute the covariance of X in tissues A and B.
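The sample-covariance step can be sketched as follows. The effect sizes are simulated from a hypothetical `true_cov` so that the result can be checked; in practice the rows would be the effects of the selected low-LD SNPs on the gene. This is the plain sample covariance only and does not include any further adjustment the paper may apply.

```python
import numpy as np

rng = np.random.default_rng(3)

n_snps = 500
# Hypothetical covariance between tissues A, B, C used to simulate effects.
true_cov = np.array([[0.05, 0.04, 0.00],
                     [0.04, 0.05, 0.00],
                     [0.00, 0.00, 0.05]])
# Rows are low-LD SNPs (treated as replicates of X), columns are tissues A, B, C.
snp_effects = rng.multivariate_normal(np.zeros(3), true_cov, size=n_snps)

# Each column is one of the vectors from the slide; rowvar=False gives
# the 3 x 3 sample covariance between tissues.
U_hat = np.cov(snp_effects, rowvar=False)
cov_AB = U_hat[0, 1]   # estimated covariance between tissues A and B
```

Here `cov_AB` recovers the simulated A-B covariance, while the entries involving tissue C stay near zero.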

20 What about overlapping samples?
The Genotype-Tissue Expression (GTEx) data have overlapping samples. We can do simulations: assume the SNPs have no effect in any tissue; assume no LD among the SNPs; create data with overlapping samples and data without overlapping samples; in both data sets, apply meta-analysis + covariance. In the GTEx data, the tissues have overlapping samples. In the paper, we have a very long explanation of how to handle overlapping samples. In this talk, we will just briefly mention the approach.
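A small simulation along these lines shows why overlap matters. The sketch below covers only the data-generation part for two tissues and stops before the meta-analysis step; the sample size, allele frequency, and within-person correlation `rho` are hypothetical, and this is not the procedure from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_snps, rho = 200, 1000, 0.8   # rho: hypothetical within-person correlation

def null_betas(genotypes, expression):
    """Regression slope of expression on each genotype column."""
    g = genotypes - genotypes.mean(axis=0)
    y = expression - expression.mean()
    return (g * y[:, None]).sum(axis=0) / (g ** 2).sum(axis=0)

def draw_genotypes():
    # Independent SNPs (no LD), none of which affects expression.
    return rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)

# Overlapping samples: the same n people contribute to both tissues,
# so their expression values are correlated across tissues.
geno = draw_genotypes()
expr = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
corr_overlap = np.corrcoef(null_betas(geno, expr[:, 0]),
                           null_betas(geno, expr[:, 1]))[0, 1]

# Non-overlapping samples: different people in each tissue.
corr_disjoint = np.corrcoef(null_betas(draw_genotypes(), rng.normal(size=n)),
                            null_betas(draw_genotypes(), rng.normal(size=n)))[0, 1]
```

With shared individuals, the estimated effects are correlated across tissues even though no SNP has any effect, whereas with disjoint individuals the correlation stays near zero.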

21

22 Results

23 Genotype-Tissue Expression (GTEx) data
Goal: to find eGenes in at least one tissue. Compare 3 methods: (1) Tissue-by-tissue: for each tissue, test if the gene is an eGene (i.e., test if there is at least one eQTL for this gene). (2) Random effects meta-analysis. (3) Random effects meta-analysis + covariance.

24 TBT = Tissue-by-tissue; RE = Random Effects meta-analysis; RECOV = Random Effects + Covariance meta-analysis
Using RECOV, we detected ~81% potential eGenes. This is a 3% increase from the traditional RE model, and a 20% increase from the basic TBT.

25 → SNP X can have similar effects on the same gene across tissues.
In this slide, we look at some of the cases to understand why some eGenes are detected only by certain methods. Our method relies on the assumption that a SNP has similar effects on the same gene across different tissues. In the first case, the tissues don't show any correlation pattern for the SNP effects. The data don't fit our assumption. In the second case, the tissues do have a very strong correlation pattern for the SNP effects. The data fit our assumption. In the last case, the tissues have a very weak correlation pattern for the SNP effects. In this case, the traditional RE model seems to do best. → SNP X can have similar effects on the same gene across tissues.

26 Dat Duong, Lisa Gai, Sagi Snir, Eun Yong Kang, Buhm Han, Jae Hoon Sul, Eleazar Eskin

27 Funding Genomic Analysis Training Grant at UCLA (Ken Lange).

28 Questions Code: github.com/datduong/RECOV

