
SHI Meng

Abstract The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants, but also more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations.

Abstract We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site as sharing of eQTLs among populations increases, highlighting that variants close to TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.

Background
Human population differentiation
– neutral DNA sequence
– functional variants
  – non-synonymous variants
  – eQTLs
Previous eQTL studies
– limited to only several well-defined populations
– have not contrasted geographically proximate populations
First analysis of eQTL differentiation among eight human population samples

Materials
LCLs (lymphoblastoid cell lines)
Samples:
– 726 individuals from 8 HapMap populations
– CEU, CHB, GIH, JPT, LWK, MEX, MKK, YRI
Expression data:
– Sentrix Human-6 Expression BeadChip version 2
– 47,294 transcripts, plus controls
– 21,800 probes: 18,226 unique autosomal Ensembl genes
Genotype data:
– MAF > 0.05, < 20% missing data
– 1.1 to 1.3 million per population
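
The genotype filter listed on this slide (MAF > 0.05, < 20% missing data) can be sketched as below. This is only an illustration, not the authors' pipeline: the genotype coding (0, 1 or 2 copies of one allele, None for a missing call) and the function name passes_qc are assumptions made for the example.

```python
from typing import Optional, Sequence


def passes_qc(genotypes: Sequence[Optional[int]],
              min_maf: float = 0.05,
              max_missing: float = 0.20) -> bool:
    """Return True if a SNP passes the MAF and missingness filters.

    genotypes holds one allele count (0, 1 or 2) per individual, or None
    when the call is missing. This coding is assumed for illustration.
    """
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    n_missing = len(genotypes) - len(called)
    # Require strictly less than max_missing missing calls.
    if not called or n_missing / len(genotypes) >= max_missing:
        return False
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    maf = min(freq, 1 - freq)               # minor allele frequency
    return maf > min_maf
```

In practice the filter would be applied separately within each population, since allele frequencies differ between populations.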

Methods
Raw expression data normalization
– log2 scale
– quantile normalization across replicates of a single individual
– mean normalization across all individuals of the eight populations
Population stratification correction of expression data
– admixed populations: GIH, LWK, MEX, MKK
– EIGENSTRAT: principal components based on genotype
– expression values were adjusted for each population using the ten primary axes of variation from the corresponding intra-population PCA
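
The first two normalization steps (log2 scale, then quantile normalization across the replicates of one individual) might look roughly like the sketch below. The function names and the list-of-lists data layout are assumptions for this example, and ties are broken by probe position, which is a simplification.

```python
import math
from typing import List


def log2_transform(values: List[float]) -> List[float]:
    """Put raw intensities on the log2 scale (values must be positive)."""
    return [math.log2(v) for v in values]


def quantile_normalize(replicates: List[List[float]]) -> List[List[float]]:
    """Force every replicate (one list of probe values) onto one distribution.

    Each value is replaced by the mean, across replicates, of the values
    that hold the same rank. Ties are broken by probe position.
    """
    n_probes = len(replicates[0])
    # Probe indices of each replicate, ordered from lowest to highest value.
    orders = [sorted(range(n_probes), key=rep.__getitem__) for rep in replicates]
    # Reference distribution: mean across replicates at each rank.
    reference = [
        sum(rep[order[rank]] for rep, order in zip(replicates, orders)) / len(replicates)
        for rank in range(n_probes)
    ]
    normalized = [[0.0] * n_probes for _ in replicates]
    for out, order in zip(normalized, orders):
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
    return normalized
```

After this step all replicates of an individual share the same set of values, so differences between replicates that come only from overall intensity shifts are removed.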

Methods
Correction for known and unknown factors: "REDUCED" dataset generation
– probabilistic estimation of expression residuals (PEER) framework
Structure of gene expression variation among populations
– $V_{ST} = (V_T - V_S)/V_T$, with $V_S = (V_1 n_1 + V_2 n_2)/(n_1 + n_2)$
– top 5% probes: GO term enrichments
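
A small worked sketch of the V_ST statistic for one probe and one pair of populations follows. It assumes that V_T is the variance of expression across the individuals of both populations combined, and that V_1, V_2 and n_1, n_2 are the within-population variances and sample sizes. The slide does not spell these definitions out, and the choice of the sample variance is also an assumption.

```python
from statistics import variance
from typing import Sequence


def vst(pop1: Sequence[float], pop2: Sequence[float]) -> float:
    """V_ST = (V_T - V_S) / V_T for one probe and two populations.

    V_T: variance over all individuals of both populations (assumed).
    V_S: size-weighted average of the two within-population variances.
    """
    n1, n2 = len(pop1), len(pop2)
    v_total = variance(list(pop1) + list(pop2))
    v_within = (variance(pop1) * n1 + variance(pop2) * n2) / (n1 + n2)
    return (v_total - v_within) / v_total
```

Values near 1 mean that almost all expression variance lies between the two populations. Values near 0 (or slightly below 0 with small samples) mean the populations are not differentiated for that probe.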

Methods
Association and multiple-test correction (individual populations)
– cis: <= 1 Mb from the TSS
– association: Spearman rank correlation (SRC) model
– significance assessment
  – 10,000 permutations of each phenotype (probe) relative to the genotypes
  – threshold: 0.01
– FDR: 1 - (number of genes with replication / total number of significant genes)
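
The association test and the per-probe permutation threshold can be sketched as follows. This is a minimal stdlib version under stated assumptions: the null statistic is the strongest |rho| across the cis SNPs in each permutation (used here in place of the smallest p-value, which ranks SNPs the same way when no data are missing), and the function names are made up for this example.

```python
import random
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_threshold(expression: Sequence[float],
                          snps: Sequence[Sequence[float]],
                          n_perm: int = 10_000,
                          alpha: float = 0.01,
                          seed: int = 0) -> float:
    """|rho| that a SNP must reach to be significant for this probe."""
    rng = random.Random(seed)
    snp_ranks = [ranks(s) for s in snps]
    shuffled = ranks(expression)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between phenotype and genotype
        null.append(max(abs(pearson(shuffled, r)) for r in snp_ranks))
    null.sort()
    return null[min(len(null) - 1, int((1 - alpha) * len(null)))]
```

Shuffling the phenotype rather than each SNP keeps the correlation structure among the SNPs intact, so the threshold accounts for the number of effectively independent tests around each gene.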

Methods
Stepwise association model
– determines whether independent cis-regulatory signals exist for a given gene
– steps:
  – regress the effect of the most significant SNP out of the expression levels
  – re-run the SRC analysis
  – store those SNPs with p-values more significant than the gene's permutation threshold
  – repeat until no SNPs from the initial significant eQTL list are left to test
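
The stepwise procedure can be sketched as a loop that repeatedly conditions on the strongest remaining SNP. This is a simplified reading of the slide, not the authors' code: association strength is supplied as a function, and the default abs_correlation (absolute Pearson correlation) is only a stand-in for the SRC test. The threshold is expressed on the same scale as that score, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def abs_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; a stand-in score for this sketch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def regress_out(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residuals of y after a least-squares fit on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def stepwise_eqtls(expression: Sequence[float],
                   snps: Dict[str, Sequence[float]],
                   threshold: float,
                   score: Callable[[Sequence[float], Sequence[float]], float] = abs_correlation
                   ) -> List[str]:
    """Names of SNPs that remain significant after conditioning on earlier ones."""
    residual = list(expression)
    candidates = dict(snps)
    independent: List[str] = []
    while candidates:
        scores = {name: score(g, residual) for name, g in candidates.items()}
        # Keep only SNPs that still pass the gene's threshold.
        candidates = {n: g for n, g in candidates.items() if scores[n] >= threshold}
        if not candidates:
            break
        best = max(candidates, key=scores.__getitem__)
        independent.append(best)
        # Remove the effect of the strongest SNP before the next round.
        residual = regress_out(residual, candidates.pop(best))
    return independent
```

A SNP in strong linkage disequilibrium with the top SNP loses its signal once the top SNP is regressed out, so only SNPs carrying a separate effect survive into later rounds.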

Results
Structure of gene expression variation among populations
– expression-based PCA plot: populations do not separate distinctly
– $V_{ST}$:
  – $V_{ST}$ values are heavily skewed toward values near 0
  – the amount of $V_{ST}$ between a pair of populations is correlated with the degree of genetic distance
  – the vast majority of genes do not exhibit highly differentiated expression variation between populations
– probes exhibiting the top 5% $V_{ST}$ scores: enriched in GO terms
  – significant population-specific GO term enrichment
  – GO terms corresponding to genes significantly diverged in expression in one population are also diverged in expression in the other, closely related populations

Results
Cis associations of gene expression with SNPs

Results
Multiple effects underlying cis-eQTLs
– genes with at least two significant cis-eQTL SNPs at the 0.01 permutation threshold
– a total of 33 genes (0 to 2% across the 8 populations) with multiple eQTLs
– at most, a single gene had five independently associated SNPs

Results
Population sharing of cis-eQTLs
– 1,074 (34%) of 3,130 genes had a significant cis-eQTL in at least two populations
– more closely related populations tend to share more cis-associated genes than more distantly related populations
– 98.9–100% concordance of allelic direction
– effect size (fold difference between homozygotes of the two different genotypic states of a SNP) is shared between any two populations when the association is also shared
– whether an eQTL is discovered in a given population is mainly due to allele frequency differences, not to differences in absolute effect size
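
The two quantities compared between populations on this slide, effect size and allelic direction, could be computed along these lines. The sketch assumes log2-scale expression, so the fold difference between homozygotes becomes a difference of group means, and genotypes coded 0 and 2 for the two homozygous classes. Both assumptions and the function names are illustrative only.

```python
from statistics import mean
from typing import List, Optional, Sequence


def log2_fold_difference(expression_log2: Sequence[float],
                         genotypes: Sequence[int]) -> Optional[float]:
    """Mean log2 expression of genotype-2 homozygotes minus genotype-0 homozygotes.

    Returns None when either homozygous class is absent in the sample.
    """
    hom0 = [e for e, g in zip(expression_log2, genotypes) if g == 0]
    hom2 = [e for e, g in zip(expression_log2, genotypes) if g == 2]
    if not hom0 or not hom2:
        return None
    return mean(hom2) - mean(hom0)


def direction_concordance(effects_a: Sequence[Optional[float]],
                          effects_b: Sequence[Optional[float]]) -> Optional[float]:
    """Fraction of shared eQTLs whose effect has the same sign in both populations."""
    pairs: List[tuple] = [
        (a, b) for a, b in zip(effects_a, effects_b)
        if a is not None and b is not None and a != 0 and b != 0
    ]
    if not pairs:
        return None
    agree = sum((a > 0) == (b > 0) for a, b in pairs)
    return agree / len(pairs)
```

A rare allele can leave one homozygous class empty in a population, in which case no effect size is returned. This is one way allele frequency differences, rather than effect size differences, limit where an eQTL can be assessed.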

Results
Genomic properties of eQTLs
– the majority of association signals are approximately symmetrically centered on the TSS
– the strongest statistical signals are located directly at the TSS
– as population sharing increases from only one population to all eight populations, the distribution gradually tightens around the TSS
– SNPs associated with more than one gene
  – 264 genes
  – 52 clusters of 2 or more genes in at least two populations
  – the distance to the TSS is larger

Results
eQTLs and disease
– 62 SNPs from the GWAS catalog were the most significant SNP of a cis-eQTL in at least one population
  – 57 Ensembl genes and 51 traits
  – e.g. alcohol dependence, Crohn's disease
– (24%) were the most significant SNP of the same gene in at least one additional population
– can assist in fine-mapping causal variants for complex traits

Discussion
– extensive sharing of eQTLs across human populations
– effect size and direction of effect for eQTLs: highly conserved
– symmetric distribution of eQTLs around the TSS
– additional cell types under a variety of different cellular and developmental conditions
– how the frequency spectrum of regulatory variants has been shaped by selective and demographic processes
– how these functional variants contribute to higher-order phenotypes
– methods to preprocess microarray data and detect eQTLs
– comprehensive analysis of eQTLs and functional association

Thank you!