Gene Hunting: Design and statistics
Population-based Association Design: Qualitative Phenotype. Cross-tabulate genotype (AA, AC, CC) against diagnosis (Schiz, Not Schiz). Do χ² test for association.
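The test for association on the genotype-by-diagnosis table can be sketched as below. This is a minimal illustration, not the slide author's procedure: the counts are invented, the function names `chi_square` and `p_value_df2` are hypothetical, and the p-value shortcut holds only for 2 degrees of freedom (which is what a 2 x 3 table gives).

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if genotype and diagnosis were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_df2(chi2):
    """Upper-tail p-value; this closed form is valid only for df = 2."""
    return math.exp(-chi2 / 2.0)

# Invented counts for illustration; columns are ordered AA, AC, CC.
counts = [
    [30, 50, 20],  # Schiz
    [40, 45, 15],  # Not Schiz
]
chi2, df = chi_square(counts)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value_df2(chi2):.3f}")
```

For tables of other sizes the statistic is computed the same way, but the p-value needs the general chi-square distribution from a statistics library.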
Population-based Association Design: Quantitative Phenotype. Score each genotype by its number of C alleles: 0 (AA), 1 (AC), 2 (CC). Compute the correlation (or regression slope) between that count and the phenotype.
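A minimal sketch of this calculation follows, assuming genotypes are given as two-letter strings and using invented phenotype values. The helper names `allele_count` and `correlation_and_slope` are hypothetical.

```python
import math

def allele_count(genotype, allele="C"):
    """Number of copies of the allele in a genotype string such as 'AC'."""
    return genotype.count(allele)

def correlation_and_slope(x, y):
    """Pearson correlation and least-squares slope of y on x.

    Assumes both x and y vary (otherwise the denominators are zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope

# Invented data for illustration only.
genotypes = ["AA", "AC", "CC", "AC", "AA", "CC"]
phenotype = [10.1, 10.9, 12.2, 11.4, 9.8, 11.9]

c_alleles = [allele_count(g) for g in genotypes]
r, slope = correlation_and_slope(c_alleles, phenotype)
print(f"r = {r:.3f}, slope per C allele = {slope:.3f}")
```

The slope is read as the average change in phenotype for each additional C allele.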
GWAS: Genome-wide Association Study. DNA arrays with 1,000s of SNPs scattered throughout the genome (current chips have several million different SNPs). Select the SNPs so that they cover ALL the genome using haplotype blocks (some DNA chips oversample SNPs in protein-coding regions). Genotype patients and controls on all the SNPs (or genotype a random sample of the population). Find the SNPs that differentiate patients from controls (or that have a significant correlation with a quantitative phenotype). Problem: number of statistical tests.
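To see why the number of tests is a problem, the arithmetic can be sketched as below. The slide does not say how to correct for multiple testing; Bonferroni correction is used here as one common choice, and the figure of one million tests is an assumption for illustration.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of 'significant' results if no SNP has any real effect."""
    return alpha * n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the overall error rate near alpha."""
    return alpha / n_tests

alpha = 0.05
n_tests = 1_000_000  # assumed number of SNPs tested

print(f"Expected false positives at p < {alpha}: "
      f"{expected_false_positives(alpha, n_tests):,.0f}")
print(f"Bonferroni per-SNP threshold: {bonferroni_threshold(alpha, n_tests):.1e}")
```

With these assumed numbers the per-SNP cutoff works out to 0.05 divided by one million, which matches the conventional genome-wide significance level of 5 × 10⁻⁸.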
GWAS results as of 2012 From http://www.genome.gov/multimedia/illustrations/GWAS_2012-12.pdf
GWAS and Quantitative Phenotype: Height (Weedon et al, 2007) Note: Effect size = c. 0.2 inches, length of a housefly
Problems with GWAS (1) Expensive. (2) Large number of statistical tests. (3) Need very, very large samples (10,000 or more).
Results from GWAS (1) Good success in medicine. (2) More limited success for psychiatric disorders (but things are improving) (3) Success for normal behavioral traits (personality, IQ) just starting (4) Genetics of behavior is hyper-polygenic: many, many, many genes (5) Predictive power is poor but getting better (6) Pointing to biological mechanisms
Used to be hard to find genes From The Consortium on Tobacco and Genetics (2010)
But things are changing … Manhattan plot for IQ From: Coleman et al. (2018) Molecular Psychiatry.
After GWAS: After detecting a “hit” what do you do? Enrichment Analysis aka functional [enrichment] analysis aka gene set enrichment analysis (GSEA) aka pathway analysis. A conglomeration of different techniques aimed at uncovering the coding areas, function(s), tissue specificity, networks, pathways, etc. for the “hits” in a GWAS.
First Question: Where is it? Near a coding region: Exon (synonymous = same amino acid; nonsynonymous = different amino acid); Intron (splice variant; enhancer); Near promoter (actively transcribed: H3K4me3). Not near a coding region: Nearest coding region(s); Enhancer (eQTL = expression quantitative trait locus).
If the “hit” is in or very close to a coding region (< 10% of all GWAS hits): Exon (see next slide); Intron; “Header” area (promoter; technically 5’ UTR); “Trailer” area (technically 3’ UTR).
Exon SNP: Synonymous (same amino acid) or Nonsynonymous (different amino acid). Nonsynonymous: Missense (amino acid codon) or Nonsense (chain-terminating codon).
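The exon SNP categories can be illustrated with a short sketch that compares the reference and altered codons under the standard genetic code. The function name `classify_exon_snp` is hypothetical, codons are assumed to be given as DNA triplets on the coding strand, and cases outside these three categories (for example, a stop codon changing to an amino acid) are not handled separately.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_exon_snp(ref_codon, alt_codon):
    """Classify a single-codon change as synonymous, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid
    if alt_aa == "*":
        return "nonsense"     # new chain-terminating (stop) codon
    # Simplification: a stop codon changing to an amino acid also lands here.
    return "missense"         # different amino acid

for ref, alt in [("GAA", "GAG"), ("GAG", "GTG"), ("TGG", "TGA")]:
    print(ref, "->", alt, ":", classify_exon_snp(ref, alt))
```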
Intron SNP: Splice variant (influences the type(s) of mRNA) or Enhancer (influences rate of transcription).
If the “hit” is not close to a coding region (c. 90% of all GWAS hits): Nearest coding region. Linear: nearest in base pairs. Chromosome conformation: nearest in 3D. Regulatory role (eQTL): which coding region mRNAs does the “hit” influence? Expression in which tissue(s)? Expression at which developmental stage(s)? eQTL = expression quantitative trait locus (contributes to variation in the amount of mRNA expressed).
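The "linear" notion of nearest coding region (nearest in base pairs) can be sketched as below. The gene names and coordinates are hypothetical, all positions are assumed to be on the same chromosome, and the 3D (chromosome conformation) notion of nearness is not covered, since it needs experimental contact data.

```python
def distance_to_gene(position, start, end):
    """Base-pair distance from a SNP to a gene; 0 if the SNP lies inside it."""
    if start <= position <= end:
        return 0
    return min(abs(position - start), abs(position - end))

def nearest_gene(position, genes):
    """Name of the gene with the smallest linear distance to the SNP.

    genes is a list of (name, start, end) tuples on one chromosome.
    """
    name, _, _ = min(genes, key=lambda g: distance_to_gene(position, g[1], g[2]))
    return name

# Hypothetical gene coordinates for illustration only.
genes = [
    ("GENE_A", 100_000, 150_000),
    ("GENE_B", 400_000, 480_000),
]
hit_position = 210_000
print(nearest_gene(hit_position, genes),
      distance_to_gene(hit_position, 100_000, 150_000))
```

The linearly nearest gene is not necessarily the one the "hit" regulates, which is why the slide also lists 3D proximity and eQTL evidence.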
Other Questions: What tissues are the [nearest] coding region[s] expressed in? Are histone markers nearby? (H3K4me3 = active promoter region; H3K27ac = enhancer.) Which mRNAs, and how much of each, are associated with the region = eQTL (expression quantitative trait locus)? E.g., does the amount of mRNA differ in patients and controls? What other “hits” are also functionally related to this “hit” = Network analysis.
IQ Genes From: Coleman et al. (2018) Molecular Psychiatry.
Polygenic Risk Score (PRS), AKA Genomic Polygenic Score (GPS). Use the top predictors in a GWAS to predict the phenotype. Always more loci than just the significant loci. Validate in a new sample.
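A polygenic score is commonly computed as a weighted sum of allele counts, with the GWAS effect sizes as weights, and then checked against the phenotype in an independent sample. The sketch below assumes that form. The SNP labels, weights, genotypes, and phenotypes are all invented, and `polygenic_score` and `r_squared` are hypothetical helper names.

```python
import math

def polygenic_score(allele_counts, weights):
    """Weighted sum of allele counts (0, 1, or 2) over the SNPs in weights."""
    return sum(w * allele_counts.get(snp, 0) for snp, w in weights.items())

def r_squared(x, y):
    """Squared Pearson correlation between scores and phenotypes."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy / math.sqrt(sxx * syy)) ** 2

# Invented effect sizes, as if estimated in a discovery GWAS.
weights = {"snp_1": 0.20, "snp_2": -0.10, "snp_3": 0.05}

# Invented validation sample: allele counts and phenotypes for new people.
validation_genotypes = [
    {"snp_1": 2, "snp_2": 0, "snp_3": 1},
    {"snp_1": 0, "snp_2": 2, "snp_3": 0},
    {"snp_1": 1, "snp_2": 1, "snp_3": 2},
    {"snp_1": 2, "snp_2": 1, "snp_3": 2},
]
validation_phenotypes = [12.0, 9.5, 11.0, 11.2]

scores = [polygenic_score(g, weights) for g in validation_genotypes]
print(f"R^2 in validation sample = {r_squared(scores, validation_phenotypes):.3f}")
```

The R² from the new sample, not from the discovery sample, is the figure to report as predictive power.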
Polygenic Risk Scores for Education
Year  Phenotype               R²   Study
2013  Years of Education      .02  Rietveld et al.
2016                          .04  Okbay et al.
2017  Educational Attainment  .16  Selzam et al.