Inference of cis and trans regulatory variation in the human genome


1 Inference of cis and trans regulatory variation in the human genome
Manolis Dermitzakis The Wellcome Trust Sanger Institute Wellcome Trust Genome Campus Cambridge, UK

2 Gene expression
Altered patterns of gene expression → disease, e.g., Type 1 diabetes, Burkitt’s lymphomas.
Widespread intraspecific variation.
Heritable genetic variation for transcript levels.
Familial aggregation of expression profiles (Cheung et al. 2003).
In humans, ~30% of surveyed loci exhibited a genetic component for expression differences (Monks et al. 2004; Schadt et al. 2003).
Much of the influential variation is located cis- to the coding locus.
In humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis- to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005, etc.).
As an introduction, I’d like to give you a couple of quick facts about gene expression. In general, normal cell function and many aspects of development are highly dependent on having the right genes transcribed at the right time and place, and certainly some diseases are associated with altered patterns of gene expression. These can be extreme effects or subtle changes in expression. However, in many species there is also quite a lot of variation among individuals with respect to gene expression patterns. Much of this variation has a genetic component; for example, in humans nearly 30% of surveyed loci exhibited a genetic component for expression differences. More studies are showing that much of the genetic component influencing expression is located cis- to the coding locus; for example, a survey of humans, mouse, and maize estimates that approximately 30-50% is attributable to cis-located variants.

3 Nature of regulatory variation
Figure labels: DNA, REG, GENE, Expression; i) Pre-mRNA, ii) mRNA, iii) Protein, iv) DNA.
Stranger and Dermitzakis, Human Genomics 2005

4 Effects of Copy Number Variation on gene expression
Copy number variation: kb to Mb in size; variable DNA copy number; contributes to disease; heritable; common in humans, recently appreciated.

5 Gene expression association mapping
Figure: quantitative phenotype by genotype (AA, AG, GG).
Stranger et al., PLoS Genet 2005

6 Phenotypic variation space

7 illumina Human 6 x 2 gene GEX arrays

8 Beads in Wells
Focus on the fact that at the heart of Illumina’s technology is the fundamental concept of beads in wells and, more specifically, the random self-assembly of those beads into wells. Shown here is an SEM of the end of a fiber strand after population with the assay beads, showing that the etched pits are now completely occupied by the highly regular silica beads. Refer to the cross-section view and to the oblique view, which has been artificially colored. The artist-rendered view helps explain that each bead is completely covered in approximately 800 thousand copies of one specific type of illumiCode oligo. It is the different illumiCode oligos that determine the bead types; the beads are not directly fluorescently labeled.

9

10

11 Whole-genome gene expression
~48,000 transcripts: 24,000 RefSeq, 24,000 other transcripts.
270 HapMap individuals:
CEU: 30 trios, 90 total
CHB: 45 unrelated
JPT: 45 unrelated
YRI: 30 trios, 90 total
2 IVTs for each person; 2 replicate hybridizations for each IVT.
Quantile normalization of all replicates of each individual.
Median normalization across all individuals of a population.
Diagram labels: Cell line, RNA, IVT1, IVT2, rep1, rep2, rep3, rep4.
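The two normalization steps named on this slide can be sketched as follows. This is a minimal illustration, not the pipeline used for the data: the function names are hypothetical, ties in quantile normalization are broken by position, and "median normalization" is assumed here to mean shifting every individual's profile to a common median, which the slide does not spell out.

```python
from statistics import median


def quantile_normalize(replicates):
    """Quantile-normalize equal-length intensity vectors (one per replicate).

    Each value is replaced by the mean, across replicates, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n_probes = len(replicates[0])
    sorted_reps = [sorted(rep) for rep in replicates]
    # Reference distribution: mean of the k-th smallest value across replicates.
    reference = [sum(rep[k] for rep in sorted_reps) / len(replicates)
                 for k in range(n_probes)]
    normalized = []
    for rep in replicates:
        order = sorted(range(n_probes), key=lambda i: rep[i])
        out = [0.0] * n_probes
        for rank, probe_idx in enumerate(order):
            out[probe_idx] = reference[rank]
        normalized.append(out)
    return normalized


def median_center(individuals):
    """Shift each individual's profile so that all share one median.

    Assumption: the common target is the median of the per-individual medians.
    """
    medians = [median(profile) for profile in individuals]
    target = median(medians)
    return [[value - m + target for value in profile]
            for profile, m in zip(individuals, medians)]
```

In this sketch `quantile_normalize` would be called once per individual on that person's replicate hybridizations, and `median_center` once per population on the resulting per-individual profiles.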

12 Within- and between-individual variation
2 replicates; single YRI individual: r² (all genes) = 0.990; detected genes (0.98 in both samples): 12,076; r² (detected) = 0.994.
2 YRI individuals: r² (all genes) = 0.964; detected genes (0.98 in both samples): 11,529; r² (detected) = 0.964.

13 HapMap SNPs
60 CEU, 45 CHB, 44 JPT, 60 YRI; 14,925 genes.
Phase I HapMap; MAF > 0.05:
CEU: 762,447 SNPs
CHB: 695,601
JPT: 689,295
YRI: 799,242
~1 SNP per 5 kb.
The number of expression phenotypes does not correspond directly to the number of genes in these regions, because there were 2 probes per gene.

14 Copy Number Variation dataset
Genome Structural Variation Consortium (Redon et al., Nature, in press).
Array-CGH using a whole-genome tile path array.
Median clone size ~170 kb.
All 270 HapMap individuals.
Quantitative values (log2 ratios) representing diploid genome copy number, not genotypes.
1117 CNVs called from log2 ratios; calls based on the standard deviation of log2 ratios.
Many CNVs experimentally verified.
26,563 clones; 93.7% of the euchromatic genome.

15 SNP cis-analysis: SNPs within 1Mb of probe midpoint
Figure labels: 1Mb window, probe, gene, SNPs.
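The cis window described on this slide can be sketched as a simple position filter. The function name, the coordinate convention, and the inclusive window boundaries are assumptions made for illustration; the slides only state that SNPs within 1Mb of the probe midpoint were tested.

```python
from bisect import bisect_left, bisect_right

CIS_WINDOW_BP = 1_000_000  # 1 Mb on either side of the probe midpoint


def cis_snps(probe_start, probe_end, snp_positions, window=CIS_WINDOW_BP):
    """Return positions of SNPs within `window` bp of the probe midpoint.

    `snp_positions` must be sorted and lie on the same chromosome as the probe.
    Boundaries are treated as inclusive (an assumption).
    """
    midpoint = (probe_start + probe_end) // 2
    lo = bisect_left(snp_positions, midpoint - window)
    hi = bisect_right(snp_positions, midpoint + window)
    return snp_positions[lo:hi]
```

Because the positions are sorted, each probe costs two binary searches, which matters when the filter is repeated for every probe against several hundred thousand SNPs. The same function would cover the CNV analysis by passing clone midpoints and a 2 Mb window.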

16 Association analysis
Additive association model: linear regression, e.g. CC = 0, CT = 1, TT = 2.
- slope of line
- p-value
- r²
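A minimal sketch of the additive model, assuming genotypes are given as two-letter strings and coded by the count of one allele (here `T`, so CC = 0, CT = 1, TT = 2). The function name and return values are hypothetical. The p-value is not computed here: it would follow from the returned t statistic and a t distribution with n - 2 degrees of freedom, for which a statistics library would normally be used.

```python
from math import sqrt


def additive_regression(genotypes, expression, counted_allele="T"):
    """Regress expression on allele count (0, 1, 2) by least squares.

    Returns (slope, r_squared, t_statistic) for the test of slope == 0.
    """
    x = [g.count(counted_allele) for g in genotypes]  # e.g. "CT" -> 1
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(expression) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, expression))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    if r_squared >= 1.0:
        # Perfect fit: the t statistic is unbounded.
        t_stat = float("inf") if slope > 0 else float("-inf")
    else:
        t_stat = sqrt(r_squared * (n - 2) / (1.0 - r_squared))
        if slope < 0:
            t_stat = -t_stat
    return slope, r_squared, t_stat
```

For example, `additive_regression(["CC", "CT", "TT", "CT"], [1.0, 2.0, 3.5, 2.5])` gives a slope of 1.25 expression units per copy of the T allele.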

17 CNV cis-analysis: clone midpoint within 2Mb of probe midpoint
Figure labels: 1Mb window, probe, gene, clones.

18 Linear regression for CNV and expression
Clone signal (log2 ratio)

19 Multiple-test correction
1. Bonferroni: whole-genome and cis-
2. False Discovery Rate (FDR): whole-genome and cis-
3. Permutations: whole-genome and cis-
In order to assess significance, we applied 3 different standard methodologies to correct for multiple testing. This correction is an absolute requirement to control the number of false positive associations that could arise from performing so many tests. With nearly 400 phenotypes, this experiment is essentially 400 whole-genome association studies. I just want to be clear that there is no ‘right answer’ yet about which method is the one to use. Nor is it yet obvious that what is deemed statistically significant by any one of these methods is necessarily biologically significant. For each of the tests here, we wanted to examine both the genome-wide distribution of p-values and a subset of SNPs within 1Mb of the genes tested. The rationale for the 1Mb subset is that most of the cis regulatory regions of a gene are located within a small distance of the gene, and therefore the density of relevant sequences around each gene is high. This can be viewed as a “candidate region” approach, similar to the candidate gene approach used in disease studies.
Bonf-wg: all genes, all SNPs.
Bonf-cis: all genes, only cis SNPs.
Perm-wg: 12,5000 permutations, all genes, all SNPs; overall 0.05 threshold.
Perm-cis: permute all genes, all cis- SNPs (1Mb); overall 0.05 threshold.
FDR-cis: all genes, all cis- SNPs; calculate FDR 0.05.
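Two of the three corrections can be sketched in a few lines. The slide names "False Discovery Rate" without saying which procedure was used; the Benjamini-Hochberg step-up procedure below is an assumption, and both function names are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests with p <= alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_significant(p_values, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant
```

With p-values `[0.001, 0.012, 0.025, 0.038, 0.6]`, Bonferroni at 0.05 keeps only the first test, while the FDR procedure keeps the first four. Restricting the input to cis tests shrinks the number of tests, which is what makes the cis thresholds less stringent than the whole-genome ones.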

20 Permutation design
GENOTYPES
g11 g12 g13 g14 … g1n
gi1 gi2 gi3 gi4 … gin
GENE EXPRESSION (permute)
Exp1 Exp2 Exp3 Expi
- 10,000 permutations; each time keep lowest p-value
- Null distribution of 10,000 extreme p-values
- Compare observed p-values to the tails of the null
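The permutation scheme can be sketched for a single gene as follows. This is an illustration under stated assumptions, not the code behind the results: the function names are hypothetical, and the largest r² per permutation stands in for the lowest p-value, which is equivalent only when every SNP is tested in the same set of individuals. The `+ 1` in the empirical p-value is a common convention that avoids reporting zero; the slides do not say whether it was used.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def permutation_null(genotype_rows, expression, n_perm=10_000, seed=0):
    """Null distribution of the most extreme statistic for one gene.

    `genotype_rows` holds one list of allele counts per SNP. Expression
    values are shuffled relative to the genotypes, and each permutation
    contributes only its largest r^2 over all SNPs.
    """
    rng = random.Random(seed)
    shuffled = list(expression)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(max(r_squared(row, shuffled) for row in genotype_rows))
    return null


def empirical_p(observed_r2, null):
    """Share of permutations at least as extreme as the observed value."""
    hits = sum(1 for value in null if value >= observed_r2)
    return (hits + 1) / (len(null) + 1)
```

Keeping only the extreme value per permutation is what lets the null account for all SNPs tested against a gene at once, so the resulting threshold is already corrected for the number of SNPs.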

21 Significant expression – cis-SNP associations
CEU: 323 genes
CHB: 348 genes
JPT: 370 genes
YRI: 411 genes
888 non-redundant genes
67 genes in all 4 populations (8%)
333 genes in at least 2 populations (37%)
~6% of genes exhibit significant cis- association
Permutation threshold 0.001; SNP-probe distance < 1Mb
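The overlap figures on this slide (non-redundant genes, genes shared by all populations, genes shared by at least two) are counts over the per-population gene lists. A minimal sketch of that bookkeeping, with a hypothetical function name and placeholder gene names rather than the real lists:

```python
from collections import Counter


def population_overlap(genes_by_population):
    """Count how many populations each associated gene appears in."""
    counts = Counter(gene
                     for genes in genes_by_population.values()
                     for gene in set(genes))
    n_populations = len(genes_by_population)
    return {
        "non_redundant": len(counts),
        "in_all": sum(1 for c in counts.values() if c == n_populations),
        "in_at_least_2": sum(1 for c in counts.values() if c >= 2),
    }


# Placeholder gene names, for illustration only.
example = {
    "CEU": ["geneA", "geneB", "geneC"],
    "CHB": ["geneA", "geneC"],
    "JPT": ["geneA", "geneD"],
    "YRI": ["geneA", "geneC", "geneE"],
}
summary = population_overlap(example)
```

On the slide's own numbers, 67 of 888 non-redundant genes is about 8% and 333 of 888 is about 37%, and 888 of the 14,925 genes tested is about 6%.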

22 Significant expression – cis-CNV clones associations
CEU: 40 genes
CHB: 32 genes
JPT: 40 genes
YRI: 42 genes
99 non-redundant genes
7 genes associated in all 4 populations (7%)
34 genes in at least 2 populations (34%)
Permutation threshold 0.001; clone-probe distance < 2Mb

23

24 Some genes ABC1, ABHD6, ACY1L2, ADAT1, ARNT, ARSA, ASAHL, ATP13A, B7, BBS2, BLK, C14orf130, C14orf4, C14orf52, C1orf16, C20orf22, C21orf107, C7orf13, C7orf29, C7orf31, C8orf13, C9orf95, CARD8, CAT, CD151, CD79B, CDKN1A, CDKN2B, CGI-111, CGI-62, CGI-96, CHCHD2, CHI3L2, CHRNE, CNN2, CP110, CPEB4, CPNE1, CRIPT, CSTB, CTNS, CTSH, CTSK, DCLRE1B, DCTD, DERP6, dJ383J4.3, DKFZp434N035, DKFZP566H073, DKFZP566J2046, DKFZP586D0919, DKFZp761A132, DNAJD1, DOM3Z, DPYSL4, DSCR5, DTNB, ECHDC3, EGFL5, EIF2B2, ENTPD1, ERMAP, FCGR2A, FDX1, FKBP1A, FLJ10252, FLJ10904, FLJ12994, FLJ12998, FLJ13576, FLJ14009, FLJ14753, FLJ20444, FLJ20635, FLJ21347, FLJ21616, FLJ22374, FLJ22573, FLJ22635, FLJ23235, FLJ34443, FLJ35827, FLJ36888, FLJ37970, FLJ40432, FLJ46603, FLJ90036, FUT10, GAA, GSTM1, GSTM2, GSTT1, H17, HABP4, HIBCH, HLA-C, HLA-DQA1, HLA-DQA2, hmm1412, hmm23621, hmm26268, hmm31752, hmm31999, hmm3577, hmm3587, hmm5445, hmm665, hmm8232, HNLF, Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs.26039, Hs , Hs , Hs , Hs , Hs , Hs , Hs.40696, Hs , Hs.43687, Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs , Hs.5855, Hs.6637, HSRTSBETA, IFIT5, IL16, IL21R, IMAGE , IMMT, IPP, IREB2, IRF5, KIAA0265, KIAA0483, KIAA0643, KIAA0748, KIAA1463, KIAA1627, LCMT1, LOC113386, LOC132001, LOC132321, LOC135043, LOC151963, LOC282956, LOC283710, LOC283970, LOC284184, LOC284293, LOC285407, LOC286353, LOC339231, LOC339803, LOC339804, LOC340435, LOC347981, LOC348094, LOC348180, LOC374758, LOC375097, LOC375399, LOC378075, LOC388918, LOC389362, LOC389763, LOC399987, LOC400410, LOC400566, LOC400642, LOC400684, LOC400933, LOC401075, LOC401135, LOC401284, LOC51240, LOC90637, LOC90693, MAN1A2, MCMDC1, MGC10120, MGC12458, MGC13186, MGC19764, MGC20235, MGC20481, MGC20781, MGC22773, MGC24665, MGC2752, MGC3794, MGC9084, MMRP19, MRPL21, MRPL43, MTERF, MYOM2, NDUFA10, NDUFS5, NMNAT3, NUDT2, OAS1, PACSIN2, PASK, PBX4, PCTAIRE2BP, PEX5, PEX6, PGS1, PHACS, PHC2, PHEMX, PIP5K1C, PIP5K2A, PKHD1L1, POLR2J, PP3856, 
PP784, PPA2, PPFIA1, PPIL3, PTER, QRSL1, R29124_1, RABEP1, RAPGEFL1, RDH5, RPAP1, RPL13, RPL36AL, RPL8, RPLP2, RPS16, RPS6KB2, SARS2, SERPINB10, SF1, SH3GLB2, SHMT1, SIAT4C, SIVA, SKIV2L, SNAP29, SNX11, SOD2, SPG7, SQSTM1, ST7L, STAT6, STK25, SYNGR1, SYNGR3, TAP2, TAPBP-R, TBC1D4, TCL6, TEF, TGM5, THAP5, THAP6, THOC3, TIMM10, TINP1, TMEM8, TMPIT, TRAPPC4, TRIM4, TSGA10, TSGA2, TUBB, UBE2G1, UGT2B11, UGT2B17, UGT2B7, UROS, USMG5, VPS28, WARS2, WBSCR27, WWOX, XRRA1, ZNF266, ZNF384, ZNF493, ZNF587, ZNF79, ZNF85, ZRANB1, UGT2B7, 11, 17 GSTM1

25 Genomic location of associations
SNP CNV

26 SNPs CNVs

27 Effects of Copy Number Variation on gene expression
Copy number variation: kb to Mb in size; variable DNA copy number; contributes to disease; heritable; common in humans, recently appreciated.
Figure labels: POSITIVE; POSITIVE OR NEGATIVE; NEGATIVE; POSITIVE OR NEGATIVE.

28 Negative or positive slope in CNV associations
80% positive 20% negative

29

30

31

32 What is the overlap between SNP and CNV effects?
Do SNPs capture the CNV effects through Linkage Disequilibrium?

33 LD between CNV and SNP
Figure labels: Gene X, A, 2x expression, G.

34 CNVs and SNPs mostly capture different effects
Relative impact on gene expression: 82% SNPs, 18% CNVs.
Only 13% of genes with a CNV association also had a SNP association in the same population; these are biased toward large effect size.
CNV and SNP variation are highly correlated (p-value 0.001).
Lack of overlapping effects is not due to CNVs in regions of segmental duplications (SDs), which have few HapMap SNPs.
The percentage of associated clones overlapping SDs does not differ from that of all clones overlapping SDs (p-value: 0.016).
Also, the probability that a CNV signal is captured by SNPs does not depend on whether the CNV is in an SD (17.3%) or outside of SDs (15.9%).

35 Phase II HapMap (2.2m SNPs)

36 Direction of allelic effect
POP2 POP1 AGREEMENT OPPOSITE

37 Direction of allelic effects
95% have the same direction

38 Trans effects
mirnaSNPs, rSNPs, nsSNPs, spliceSNPs
Figure labels: REG, GENE, DNA.
Genome-wide associations.
Dissect regulatory networks.

39

40 Regulatory variants have the highest impact on regulatory networks

41 Conclusions
Large number of genes with significant expression variation within and between human population samples, and strong association between individual genes and specific SNPs and CNVs.
Little overlap between SNP and CNV signals.
Replication of significant signals across populations.
Promising approach for identification of functionally variable regulatory regions.
Cis regulatory variation is mostly responsible for genome-wide regulatory variation.

42 Pre-publication data release www.sanger.ac.uk/genevar/

43 Acknowledgements
Cambridge University, Cornell University, illumina
Mark Dunning, Simon Tavaré, Barbara Stranger, Matthew Forrest, Catherine Ingle, Antigone Dimas, Christine Bird, Alexandra Nica, Claude Beazley, Panos Deloukas
Cornell University: Andy Clark
illumina: Jill Orwick, Mark Gibbs
Genome Structural Variation Consortium: Matt Hurles, Richard Redon, Nigel Carter, Charles Lee, Chris Tyler-Smith, Stephen Scherer
The HapMap Consortium
Wellcome Trust for funding

44 Working with the HapMap
Wellcome Trust Advanced Courses: Working with the HapMap
2-5 April 2007. Closing date for applications: 10 January 2007.
Wellcome Trust Genome Campus, Hinxton, Cambridge
This 4-day residential workshop will provide a comprehensive overview of the International HapMap Project, including practical experience of working with the HapMap data to map phenotypic traits to locations in the human genome. Theoretical lectures will be combined with hands-on practical sessions and an introduction to relevant databases and tools.
Course instructors: Paul de Bakker (MIT), Manolis Dermitzakis (Sanger Institute), Mike Feolo (NIH/NCBI), Jonathan Marchini (Oxford University), Gil McVean (Oxford University), Steve Sherry (NIH/NCBI), Albert Vernon Smith (CSHL), Barbara Stranger (Sanger Institute), Eleftheria Zeggini (Wellcome Trust Centre for Human Genetics)
Speakers: Lon Cardon (Wellcome Trust Centre for Human Genetics), Panos Deloukas (Sanger Institute), John Todd (Cambridge University)
Full information and application details at:

