Gene Hunting: Design and statistics

1 Gene Hunting: Design and statistics

2 Population-based Association Design: Qualitative Phenotype
Cross-tabulate genotype (AA, AC, CC) against diagnosis (Schiz, Not Schiz). Do a χ² test for association.
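As a minimal sketch of the χ² test on this slide, the following computes the Pearson statistic for a 2 x 3 table of diagnosis by genotype. The counts are invented for illustration and do not come from the presentation; the function name `chi_square_2x3` is likewise hypothetical. The p-value uses the closed form that holds only for 2 degrees of freedom, which is the case for a 2 x 3 table.

```python
import math

def chi_square_2x3(table):
    """Pearson chi-square test for a 2 x 3 status-by-genotype table.

    table: two rows (cases, controls), each holding counts for AA, AC, CC.
    Returns (chi2, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if genotype and diagnosis were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability; this closed form is valid only for
    # df = (2 - 1) * (3 - 1) = 2.
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value

# Hypothetical counts, invented for illustration only.
counts = [
    [30, 50, 20],  # Schiz:     AA, AC, CC
    [20, 50, 30],  # Not Schiz: AA, AC, CC
]
chi2, p = chi_square_2x3(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

A statistics package would normally be used for this in practice; the hand computation is shown only to make the steps visible.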

3 Population-based Association Design: Quantitative Phenotype
[Plot: phenotype (y-axis) against number of C alleles (x-axis): 0 (AA), 1 (AC), 2 (CC).] Compute the correlation (or regression slope).
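The computation on this slide can be sketched as follows: code each genotype as its number of C alleles, then compute the regression slope and the Pearson correlation with the phenotype. The genotypes and phenotype scores below are invented for illustration, and `slope_and_correlation` is a hypothetical helper name.

```python
import math

def slope_and_correlation(allele_counts, phenotype):
    """Least-squares slope and Pearson r of phenotype on allele count."""
    n = len(allele_counts)
    mean_x = sum(allele_counts) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(allele_counts, phenotype))
    slope = sxy / sxx                # phenotype change per extra C allele
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation
    return slope, r

# Hypothetical sample, invented for illustration only.
genotypes = ["AA", "AC", "CC", "AC", "AA", "CC"]
c_alleles = [g.count("C") for g in genotypes]   # 0, 1, or 2 C alleles
scores = [10.0, 11.0, 13.0, 12.0, 9.0, 12.0]

slope, r = slope_and_correlation(c_alleles, scores)
print(f"slope = {slope:.2f} per C allele, r = {r:.2f}")
```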

4 GWAS: Genome-wide Association Study
DNA arrays with 1,000s of SNPs scattered throughout the genome. (Current chips have several million different SNPs.)
Select the SNPs so that they cover ALL of the genome using haplotype blocks. (Some DNA chips oversample SNPs in protein-coding regions.)
Genotype patients and controls on all the SNPs (or genotype a random sample of the population).
Find the SNPs that differentiate patients from controls (or that have a significant correlation with a quantitative phenotype).
Problem: number of statistical tests.
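To illustrate why the number of statistical tests is a problem, here is a small sketch using a Bonferroni correction, one common (and conservative) way to handle multiple testing; the slide itself does not name a correction method. The figure of one million tests is an assumption chosen for round numbers.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise rate at alpha."""
    return alpha / n_tests

def expected_false_positives(alpha, n_tests):
    """Expected number of 'significant' results if no SNP has any real effect."""
    return alpha * n_tests

n_snps = 1_000_000   # assumed number of tests, for illustration
alpha = 0.05

print(expected_false_positives(alpha, n_snps))  # uncorrected testing
print(bonferroni_threshold(alpha, n_snps))      # corrected per-SNP threshold
```

With these assumed numbers the corrected threshold is 5 x 10^-8, which matches the conventional genome-wide significance level.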

5 GWAS results as of 2012 From

6 GWAS and Quantitative Phenotype: Height (Weedon et al, 2007)
Note: Effect size = c. 0.2 inches, length of a housefly
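To put an effect of this size in perspective, the sketch below computes the share of phenotypic variance explained by one SNP under a simple additive model with Hardy-Weinberg proportions, 2p(1 - p)β²/σ². Only the 0.2 inch effect comes from the slide; the allele frequency (0.5) and the standard deviation of height (3 inches) are assumptions made here for illustration and are not taken from Weedon et al.

```python
def variance_explained(effect_per_allele, allele_freq, phenotype_sd):
    """Fraction of phenotypic variance explained by one additive SNP.

    Assumes Hardy-Weinberg proportions, so the variance of the allele
    count is 2 * p * (1 - p).
    """
    genotype_variance = 2 * allele_freq * (1 - allele_freq)
    return genotype_variance * effect_per_allele ** 2 / phenotype_sd ** 2

effect = 0.2        # inches per allele, from the slide
freq = 0.5          # assumed allele frequency (illustrative)
sd_height = 3.0     # assumed SD of height in inches (illustrative)

share = variance_explained(effect, freq, sd_height)
print(f"variance explained: {share:.2%}")
```

Under these assumptions a single SNP accounts for a fraction of a percent of the variance, which is consistent with the need for very large samples.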

7 Problems with GWAS
(1) Expensive.
(2) Large number of statistical tests.
(3) Need very, very large samples (10,000 or more).

8 Results from GWAS
(1) Good success in medicine.
(2) More limited success for psychiatric disorders (but things are improving).
(3) Success for normal behavioral traits (personality, IQ) just starting.
(4) Genetics of behavior is hyper-polygenic: many, many, many genes.
(5) Predictive power is poor but getting better.
(6) Pointing to biological mechanisms.

9 Used to be hard to find genes
From The Consortium on Tobacco and Genetics (2010)

10 But things are changing … Manhattan plot for IQ
From: Coleman et al. (2018) Molecular Psychiatry.

11 After GWAS: Enrichment Analysis
After detecting a “hit,” what do you do? Enrichment analysis, aka functional [enrichment] analysis, aka gene set enrichment analysis (GSEA), aka pathway analysis: a conglomeration of different techniques aimed at uncovering the coding areas, function(s), tissue specificity, networks, pathways, etc. for the “hits” in a GWAS.
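One of the simplest techniques in this family is an over-representation test: given a pathway (gene set), ask whether more GWAS hit genes fall in it than chance would predict. The sketch below uses the hypergeometric distribution for that question. It is not GSEA proper, which works on ranked gene lists, and all counts and the function name `enrichment_p_value` are hypothetical.

```python
from math import comb

def enrichment_p_value(n_genes, n_in_pathway, n_hits, n_hits_in_pathway):
    """Hypergeometric upper tail: P(overlap >= n_hits_in_pathway).

    n_genes:           total genes considered (the background)
    n_in_pathway:      genes annotated to the pathway
    n_hits:            genes implicated by GWAS hits
    n_hits_in_pathway: hit genes that belong to the pathway
    """
    largest_overlap = min(n_in_pathway, n_hits)
    total = comb(n_genes, n_hits)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_genes - n_in_pathway, n_hits - k)
        for k in range(n_hits_in_pathway, largest_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 background genes, a 200-gene pathway,
# 50 hit genes, 5 of which are in the pathway (0.5 expected by chance).
p_example = enrichment_p_value(20_000, 200, 50, 5)
print(f"enrichment p-value = {p_example:.2e}")
```

In a real analysis many pathways are tested at once, so these p-values would themselves need a multiple-testing correction.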

12 First Question: Where is it?
Near a coding region:
  Exon
    Synonymous (same amino acid)
    Nonsynonymous (different amino acid)
  Intron
    Splice variant
    Enhancer
  Near promoter
    Actively transcribed (H3K4me3)
Not near a coding region:
  Nearest coding region(s)
  Enhancer (eQTL = expression quantitative trait locus)

13 If the “hit” is in or very close to a coding region (< 10% of all GWAS hits)
Exon (see next slide)
Intron
“Header” area (promoter; technically 5’ UTR)
“Trailer” area (technically, 3’ UTR)

14 Exon SNP
Synonymous (same amino acid)
Nonsynonymous (different amino acid)
  Missense (amino acid codon)
  Nonsense (chain-terminating codon)
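The classification on this slide can be sketched in code: translate the reference and the altered codon with the standard genetic code and compare the results. The function name `classify_exon_snp` is hypothetical, and the example codons are chosen only to exercise each branch. A change from a stop codon to an amino acid is not on the slide and is labeled separately.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_exon_snp(ref_codon, alt_codon):
    """Classify a single-codon change using DNA (coding-strand) codons."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"   # not covered on the slide
    return "missense"

print(classify_exon_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_exon_snp("GAG", "GTG"))  # Glu -> Val
print(classify_exon_snp("TAC", "TAA"))  # Tyr -> stop
```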

15 Intron SNP
Splice variant (influences the type(s) of mRNA)
Enhancer (influences rate of transcription)

16 If the “hit” is not close to a coding region (c. 90% of all GWAS hits)
Nearest coding region
  Linear: nearest in base pairs
  Chromosome conformation: nearest in 3D
Regulatory role (eQTL): what coding region mRNAs does the “hit” influence?
  Expression in which tissue(s)
  Expression at which developmental stage(s)
eQTL = expression quantitative trait locus (contributes to variation in the amount of mRNA expressed)
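The "linear" notion of nearest coding region can be sketched as a distance computation in base pairs along one chromosome. The gene names and coordinates below are hypothetical, as are the helper names. Nearest in 3D would need chromosome conformation data and is not attempted here.

```python
def distance_to_gene(position, start, end):
    """Base-pair distance from a SNP to a gene; 0 if the SNP lies inside it."""
    if start <= position <= end:
        return 0
    return min(abs(position - start), abs(position - end))

def nearest_gene(position, genes):
    """Return the (name, start, end) tuple closest to the SNP position.

    genes: list of (name, start, end) on the same chromosome as the SNP.
    """
    return min(genes, key=lambda g: distance_to_gene(position, g[1], g[2]))

# Hypothetical genes on one chromosome, invented for illustration.
genes = [
    ("GENE_A", 1_000, 5_000),
    ("GENE_B", 20_000, 30_000),
    ("GENE_C", 45_000, 60_000),
]
hit_position = 36_000
name, start, end = nearest_gene(hit_position, genes)
print(name, distance_to_gene(hit_position, start, end))
```

The linearly nearest gene is not necessarily the one a hit regulates, which is why the slide lists the eQTL question separately.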

17 Other Questions:
What tissues are the [nearest] coding region[s] expressed in?
Are histone markers nearby?
  H3K4me3 → active promoter region
  H3K27ac → enhancer
What mRNAs, and how much mRNA, are associated with the region = eQTL (expression quantitative trait locus)
  E.g., does the amount of mRNA differ in patients and controls?
What other “hits” are also functionally related to this “hit” = network analysis

18 IQ Genes
From: Coleman et al. (2018) Molecular Psychiatry.

19 Polygenic Risk Score (PRS) AKA Genomic Polygenic Score (GPS)
Use the top predictors in GWAS that predict the phenotype.
Always more loci than just the significant loci.
Validate in a new sample.
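A polygenic score of this kind is commonly computed as a weighted sum: each person's count of the effect allele at each selected SNP, multiplied by that SNP's GWAS effect size. The sketch below assumes that form; the SNP identifiers, weights, genotypes, and phenotype values are all invented. The `r_squared` helper shows one way the score could be checked against the phenotype in a new sample.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of effect-allele counts over the SNPs that have weights."""
    return sum(
        weights[snp] * count
        for snp, count in allele_counts.items()
        if snp in weights
    )

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Hypothetical GWAS effect sizes (per effect allele), for illustration only.
weights = {"rs_hyp_1": 0.10, "rs_hyp_2": -0.05, "rs_hyp_3": 0.02}

# Hypothetical validation sample: effect-allele counts (0, 1, or 2) per person.
sample = [
    {"rs_hyp_1": 2, "rs_hyp_2": 1, "rs_hyp_3": 0},
    {"rs_hyp_1": 0, "rs_hyp_2": 2, "rs_hyp_3": 1},
    {"rs_hyp_1": 1, "rs_hyp_2": 0, "rs_hyp_3": 2},
    {"rs_hyp_1": 1, "rs_hyp_2": 1, "rs_hyp_3": 1},
]
phenotype = [12.0, 9.0, 13.0, 10.0]   # invented values

scores = [polygenic_score(person, weights) for person in sample]
print(scores)
print(f"R^2 in validation sample = {r_squared(scores, phenotype):.2f}")
```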

20 Polygenic Risk Scores for Education
Year  Phenotype               R²   Study
2013  Years of Education      .02  Rietveld et al.
2016  Years of Education      .04  Okbay et al.
2017  Educational Attainment  .16  Selzam et al.

