Gene-gene and gene-environment interactions
Manuel Ferreira
Massachusetts General Hospital, Harvard Medical School
Center for Human Genetic Research
Slides can be found at:
Outline
1. G-G and G-E interactions in the context of gene mapping
2. What is epistasis?
3. Study designs and tests to detect epistasis
4. Application to genome-wide datasets
1. G-G and G-E in context
The Human Genome: find disease-causing variation
[Figure: chromosome 4 DNA sequence with a SNP (single nucleotide polymorphism) marked]
Gene-environment interaction (G x E interaction)
The environment modifies the effect of a gene
A gene modifies the effect of an environment
[Figure: gene effect and environmental effect; S.Purcell ©]
Epistasis (gene × gene interaction): one gene modifies the effect of another
[Figure: gene effect; S.Purcell ©]
2. Definition(s) of epistasis
Epistasis or not?
[Figure: genotypes AA, Aa, aa at one locus against BB, Bb, bb at the other]
Definitions of epistasis
Biological: individual-level phenomenon
Statistical: population-level phenomenon
S.Purcell ©
[Figure sequence, Bateson (1909): Gene RED and Gene YELLOW acting on Pigment 1, Pigment 2 and the final pigment; successive panels show genotypes AA, Aa, aa at one gene and BB, Bb, bb at the other, with an X marked on the pathway]
Bateson (1909)
Introduced the concept of epistasis as a “masking effect”, whereby a variant or allele at one locus prevents the variant at another locus from manifesting its effect.
Mendelian concept, closer to the biological definition of an interaction between 2 molecules.
[Figure: Gene RED and Gene YELLOW, Pigment 1, Pigment 2, final pigment; genotypes AA, Aa, aa and BB, Bb, bb]
Fisher (1918)
Epistasis defined as the extent to which the joint contribution of two alleles at different loci towards a phenotype deviates from that expected under a purely additive model.
Mathematical concept, closer to the statistical definition of an interaction between 2 variables on a linear scale.
[Figure: expected vs observed values for genotypes AA, Aa, aa at Gene RED and BB, Bb, bb at Gene YELLOW]
Dominance is defined as the extent to which the joint contribution of two alleles at the same locus towards a phenotype deviates from that expected under a purely additive model.
Epistasis is defined as the extent to which the joint contribution of two alleles at different loci towards a phenotype deviates from that expected under a purely additive model.
[Figure: genotypic mean for AA, Aa, aa under additive, dominant and recessive models]
Epistasis is very similar: deviation from additivity between loci.
[Figure: genotypic mean for AA, Aa, aa within a locus; between loci, Locus A (additive or no effect) against Locus B (additive or no effect), with genotypes BB, Bb, bb]
Between loci: Additive (ie. NO epistasis)
[Figure: grid of Locus A models (additive, dominant, recessive) by Locus B models (additive, dominant, recessive); each panel plots AA, Aa, aa with separate lines for BB, Bb, bb]
Between loci: Non-Additive (ie. epistasis)
[Figure: panels plotting AA, Aa, aa with separate lines for BB, Bb, bb]
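One way to make "deviation from additivity between loci" concrete is to take a 3 × 3 table of two-locus genotypic means and subtract the best row-plus-column fit. The sketch below does this by double-centring the table. The function name, the two example tables and the equal weighting of the nine genotype classes are assumptions made for illustration; a population-level analysis would weight cells by genotype frequency.

```python
import numpy as np

def interaction_deviations(means):
    """Residual of each two-locus genotypic mean after removing
    row (locus A) and column (locus B) main effects.
    Cells are weighted equally, which is an illustrative assumption."""
    m = np.asarray(means, dtype=float)
    row = m.mean(axis=1, keepdims=True)   # locus A main effects (dominance allowed)
    col = m.mean(axis=0, keepdims=True)   # locus B main effects (dominance allowed)
    return m - row - col + m.mean()

# Hypothetical genotypic means: rows AA, Aa, aa; columns BB, Bb, bb
additive_table = np.array([[0, 1, 2],
                           [1, 2, 3],
                           [2, 3, 4]])
epistatic_table = np.array([[0, 0, 0],
                            [0, 1, 2],
                            [0, 2, 4]])

no_epistasis = interaction_deviations(additive_table)   # all zeros
epistasis = interaction_deviations(epistatic_table)     # non-zero residuals
```

Because row and column effects are fitted freely, dominance within a locus does not count as epistasis here; only the between-locus residual does.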
Epistasis or not?
[Figure: genotypes AA, Aa, aa at one locus against BB, Bb, bb at the other]
The statistical definition of epistasis is scale dependent.
We defined epistasis as a departure from an additive model across loci.
Crucial assumption: genotype effects are measured on the appropriate scale.
[Figure: the same values for AA, Aa, aa plotted on the original scale (x) and on the log(x) scale; one panel shows no departure from additivity, the other a significant departure from additivity]
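A small numerical illustration of scale dependence, under assumed values: if two loci combine multiplicatively, the table of genotypic means departs from additivity on the original scale but not after a log transformation. The per-genotype effects and the helper name below are hypothetical.

```python
import numpy as np

def max_interaction(means):
    """Largest absolute residual from a row-plus-column (additive) fit,
    with all genotype classes weighted equally (an assumption)."""
    m = np.asarray(means, dtype=float)
    resid = (m - m.mean(axis=1, keepdims=True)
               - m.mean(axis=0, keepdims=True) + m.mean())
    return float(np.abs(resid).max())

# Hypothetical per-genotype effects that multiply across loci
locus_a = np.array([1.0, 2.0, 4.0])   # AA, Aa, aa
locus_b = np.array([1.0, 3.0, 9.0])   # BB, Bb, bb
means = np.outer(locus_a, locus_b)

departure_raw = max_interaction(means)          # clearly non-zero
departure_log = max_interaction(np.log(means))  # zero up to rounding error
```

The same data would therefore be called epistatic or not depending on the scale chosen for the analysis.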
Disease trait: penetrances, relative risks, odds ratios
Continuous trait: genotype means
Genotype effects measured on:     Epistasis defined as departure from:
Penetrance scale, linear scale    Additive model
RR scale, OR scale                Multiplicative model

Additive: y = LocusA + LocusB
Multiplicative: y = LocusA × LocusB
3. Designs and methods to detect epistasis
Study designs: Family-based, Case-Control, Case-only
Moving from family-based towards case-only, designs go from more robust (fewer assumptions) to more efficient and powerful.
Methods
1. Regression
2. “Linkage Disequilibrium” or allelic-association
3. Transmission distortion
Methods: 1. Regression
y = m1.LocusA + m2.LocusB + m3.(LocusA × LocusB)
y = (m1 + m3.LocusB).LocusA + m2.LocusB
Effect of LocusA on y is modified by LocusB
Continuous trait: linear regression
Disease trait: logistic regression
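The regression model can be tried on simulated data. The following is a minimal sketch for a continuous trait, assuming genotypes coded 0/1/2, hypothetical allele frequencies and effect sizes, and ordinary least squares via NumPy; it is not a substitute for a dedicated package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical unlinked loci, coded as allele counts 0/1/2
locus_a = rng.binomial(2, 0.3, size=n)
locus_b = rng.binomial(2, 0.3, size=n)

# Simulate y = m1.LocusA + m2.LocusB + m3.(LocusA x LocusB) + noise
m1, m2, m3 = 0.4, 0.2, 0.5   # assumed effect sizes
y = m1 * locus_a + m2 * locus_b + m3 * locus_a * locus_b + rng.normal(0, 1, size=n)

# Design matrix: intercept, two main effects, interaction term
X = np.column_stack([np.ones(n), locus_a, locus_b, locus_a * locus_b])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept_hat, m1_hat, m2_hat, m3_hat = coef

# Effect of LocusA on y at each LocusB genotype: (m1 + m3.LocusB)
effect_of_a = {b: m1_hat + m3_hat * b for b in (0, 1, 2)}
```

A test of epistasis in this framework is a test of whether m3 differs from zero; `effect_of_a` shows how the slope for LocusA changes with the LocusB genotype.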
Methods: 1. Regression
y = m1.LocusA + m2.Env + m3.(LocusA × Env)
y = (m1 + m3.Env).LocusA + m2.Env
Effect of LocusA on y is modified by Env
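For a disease trait the same interaction term goes into a logistic regression. The sketch below fits one with a hand-written Newton-Raphson loop so that it only needs NumPy; the simulated exposure, allele frequency and coefficients are all assumed values, and the fitting routine is a bare-bones illustration without convergence checks.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

rng = np.random.default_rng(1)
n = 20000
locus_a = rng.binomial(2, 0.3, size=n).astype(float)   # genotype coded 0/1/2
env = rng.binomial(1, 0.5, size=n).astype(float)       # binary exposure

# Assumed log-odds coefficients: intercept, m1 (LocusA), m2 (Env), m3 (LocusA x Env)
true_beta = np.array([-2.0, 0.2, 0.3, 0.6])
X = np.column_stack([np.ones(n), locus_a, env, locus_a * env])
risk = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
disease = rng.binomial(1, risk).astype(float)

beta_hat = fit_logistic(X, disease)

# Per-allele odds ratio for LocusA in unexposed and exposed individuals
odds_ratio_by_env = {e: float(np.exp(beta_hat[1] + beta_hat[3] * e)) for e in (0, 1)}
```

Here the interaction is defined on the log-odds scale, so a non-zero m3 means the odds ratio for LocusA differs between exposure groups.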
Methods: 2. LD-based
Epistasis induces “LD” in cases, even for unlinked loci.
Epistasis model: p(a) = 0.2, p(b) =
[Figure: two-locus genotype frequencies (AA, Aa, aa by BB, Bb, bb) and “haplotype frequencies” (A, a by B, b), shown separately for cases and controls; “LD” is ~ 0 in controls and ~ 0.05 in cases]
Two-locus genotypes
Locus A: alleles A (pA) and a (qA), pA + qA = 1
Locus B: alleles B (pB) and b (qB), pB + qB = 1

              AA (pA^2)   Aa (2pAqA)   aa (qA^2)
BB (pB^2)     AABB        AaBB         aaBB
Bb (2pBqB)    AABb        AaBb         aaBb
bb (qB^2)     AAbb        Aabb         aabb

AAbb (2-locus genotype) = Ab / Ab (haplotype) if and only if [diagram: A and b drawn together on each chromosome]
AAbb ≠ Ab / Ab if [diagram: A, A drawn separately from b, b]
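When the two loci are independent in the population, each two-locus genotype frequency is simply the product of the single-locus Hardy-Weinberg frequencies. A short sketch of that calculation follows; it assumes Hardy-Weinberg equilibrium at both loci, takes pA = 0.8 from p(a) = 0.2, and uses a hypothetical pB = 0.7 because the slide's value for p(b) is not available.

```python
import numpy as np

def hwe_frequencies(p):
    """Hardy-Weinberg genotype frequencies (p^2, 2pq, q^2) for allele frequency p."""
    q = 1.0 - p
    return np.array([p * p, 2.0 * p * q, q * q])

def two_locus_frequencies(p_a, p_b):
    """Expected 3 x 3 table under independence between loci.
    Rows: BB, Bb, bb. Columns: AA, Aa, aa."""
    return np.outer(hwe_frequencies(p_b), hwe_frequencies(p_a))

p_a = 0.8   # frequency of allele A, from p(a) = 0.2
p_b = 0.7   # hypothetical value, not taken from the slides
expected = two_locus_frequencies(p_a, p_b)
freq_AAbb = expected[2, 0]   # row bb, column AA: pA^2 * qB^2
```

Departures of the observed two-locus table from this product are what the LD-based methods pick up.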
Methods: 2. LD-based
In the presence of epistasis: LD cases > 0 and LD cases > LD controls
Statistics that measure the strength of association (δ) between two loci: LD (D, r^2), correlation
Case-only: H0: δ = 0
Case-Control: H0: δ Cases = δ Controls
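Both null hypotheses can be illustrated with a simple stand-in for δ. The sketch below uses the Pearson correlation between genotype counts at the two loci and Fisher's z transformation, which is only approximate for genotype data and is an illustrative choice, not the statistic used by any particular program. The population size, penetrances and p(b) = 0.3 are assumed values.

```python
import numpy as np

def ld_z_scores(a_cases, b_cases, a_controls, b_controls):
    """Approximate z statistics for the case-only test (delta = 0 in cases)
    and the case-control test (delta equal in cases and controls),
    using Fisher's z transformation of the genotype correlation."""
    r_cases = np.corrcoef(a_cases, b_cases)[0, 1]
    r_controls = np.corrcoef(a_controls, b_controls)[0, 1]
    n_cases, n_controls = len(a_cases), len(a_controls)
    z_case_only = np.arctanh(r_cases) * np.sqrt(n_cases - 3)
    se_diff = np.sqrt(1.0 / (n_cases - 3) + 1.0 / (n_controls - 3))
    z_case_control = (np.arctanh(r_cases) - np.arctanh(r_controls)) / se_diff
    return float(z_case_only), float(z_case_control)

rng = np.random.default_rng(2)
n_pop = 200_000
a = rng.binomial(2, 0.2, size=n_pop)   # count of allele a, p(a) = 0.2
b = rng.binomial(2, 0.3, size=n_pop)   # count of allele b, hypothetical p(b) = 0.3

# Assumed epistatic penetrances: risk is raised only in carriers at both loci
risk = np.where((a > 0) & (b > 0), 0.10, 0.02)
disease = rng.random(n_pop) < risk

cases = np.flatnonzero(disease)
controls = rng.choice(np.flatnonzero(~disease), size=cases.size, replace=False)

z_case_only, z_case_control = ld_z_scores(a[cases], b[cases], a[controls], b[controls])
```

The loci are simulated independently, so any correlation among cases arises purely from the interaction in the penetrance model.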
Genes in 5q GABA cluster: cases (Scz) vs controls
[Figure: pairwise results for cases and for controls]
Pamela Sklar, Tracey Petryshen, C&M Pato
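Methods: 3. Transmission distortion
If the effect of locus A on disease risk is modified by Locus B, transmission at locus A differs by proband genotype at locus B:
BB probands: 50%
Bb probands: 52%
bb probands: 56%
[Figure: trios with locus A genotypes AA, Aa, Aa for each proband group]
Same applies for Env instead of Locus B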
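One simple way to ask whether transmission differs across the three proband groups is a chi-square test of homogeneity on transmitted versus non-transmitted counts. The sketch below assumes the percentages refer to transmissions from heterozygous (Aa) parents and uses hypothetical counts of 1000 informative transmissions per group chosen to match 50%, 52% and 56%; the function name is also an assumption.

```python
import math

def transmission_heterogeneity(counts):
    """counts maps each proband group to (transmitted, not_transmitted).
    Returns per-group transmission rates, the chi-square statistic for
    homogeneity across groups, and its p-value (three groups, 2 df)."""
    if len(counts) != 3:
        raise ValueError("this sketch handles exactly three groups (2 df)")
    total_t = sum(t for t, _ in counts.values())
    total_u = sum(u for _, u in counts.values())
    pooled = total_t / (total_t + total_u)
    chi2 = 0.0
    for t, u in counts.values():
        n = t + u
        chi2 += (t - n * pooled) ** 2 / (n * pooled)
        chi2 += (u - n * (1 - pooled)) ** 2 / (n * (1 - pooled))
    p_value = math.exp(-chi2 / 2.0)   # chi-square survival function, exact for 2 df
    rates = {group: t / (t + u) for group, (t, u) in counts.items()}
    return rates, chi2, p_value

# Hypothetical counts chosen to match the percentages on the slide
counts = {"BB": (500, 500), "Bb": (520, 480), "bb": (560, 440)}
rates, chi2, p_value = transmission_heterogeneity(counts)
```

With these assumed counts the statistic is about 7.5 on 2 degrees of freedom; with fewer transmissions per group the same percentages would not reach significance.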
TDT requires the assumption of independence between loci.
If variants A and B are in LD (common haplotypes AB / ab), transmission in a subset of probands defined by genotype at locus B can be pushed to extremes:
[Figure: trios in the subset of bb probands and in the subset of BB probands, with transmission → 100%, → 0%, → 100%]
This produces false positive interactions (due to linkage or population stratification).
Design & Methods
[Table: designs (Case-Control, Case-only, Family-based) against methods (Regression, LD-based, TDT)]
Case-only designs offer efficient detection of epistasis
Case-only design isn’t always valid
1. Physical distance
2. Population substructure in case sample
[Figure: Gene A and Gene B, linked by physical proximity or by stratification]
Designs         Pros                         Cons
Case-Control    Applicable to linked loci    Less efficient
Case-only       Efficient, powerful          Assumptions (unlinked loci, no stratification, etc)
Family-based                                 Few methods that efficiently handle relatives

Methods         Pros                                               Cons
LD              Fast, often more powerful                          Less useful for continuous traits and/or family data
Regression      Many extensions possible (GxE, covariates, etc)    Slow(er)

Software: PLINK
4. Application to genome-wide datasets
An “all pairs of SNPs” approach to epistasis does not scale well…

# SNPs     # pairs
250,000    31,249,875,000
500,000    124,999,750,000

… but it is feasible! ~1 week, running PLINK using ~200 CPUs, for >3000 individuals.
Multiple testing increases false positives
P-value required for experiment-wide significance must be adjusted for the number of tests performed

# SNPs     # pairs            P-value needed
50         1,225              4e-5
500        124,750            4e-7
250,000    31,249,875,000     2e-12
500,000    124,999,750,000    4e-13
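The arithmetic behind these numbers is short enough to check directly. The sketch below counts unordered SNP pairs as n(n-1)/2 and applies a Bonferroni correction; the experiment-wide alpha of 0.05 is an assumption, chosen because it is consistent with the 4e-13 threshold quoted for 500,000 SNPs.

```python
def n_pairs(n_snps):
    """Number of unordered pairs among n_snps markers: n(n-1)/2."""
    return n_snps * (n_snps - 1) // 2

def bonferroni_threshold(n_snps, alpha=0.05):
    """Per-test p-value needed for experiment-wide significance at alpha
    when every pair of SNPs is tested once (alpha = 0.05 is an assumption)."""
    return alpha / n_pairs(n_snps)

pairs_500k = n_pairs(500_000)                    # 124,999,750,000 pairs
threshold_500k = bonferroni_threshold(500_000)   # about 4e-13
pairs_250k = n_pairs(250_000)                    # 31,249,875,000 pairs
threshold_250k = bonferroni_threshold(250_000)   # about 1.6e-12
```

Bonferroni treats all pairwise tests as independent, which is conservative when SNPs are correlated.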
Genome-wide epistasis screen in bipolar disorder
[Figure panels: Chromosome 13; Chromosomes 1 to 22]
[Figure: genes A to J, each with markers 1 to 8 (A1 to A8, B1 to B8, …, J6, J7, J8)]
A single gene-based test vs 80 allele-based tests
Gene-environment Science 2003, 301: 306
Gene-environment The Journal of Nutrition 2002, 8S: 132
Gene-Gene Nature 2005, 436: 701
Further reading
Cordell HJ (2002) Human Molecular Genetics 11. A statistical review of epistasis, methods and definitions.
Clayton D & McKeigue P (2001) The Lancet 358. A critical appraisal of GxE research.
Marchini J, Donnelly P & Cardon LR (2005) Nature Genetics 37. Epistasis in whole-genome association studies.