Genome-Wide Association Study (GWAS)


1 Genome-Wide Association Study (GWAS)
Presented by Karen Xu

2 What you need to know
Basic genetic concepts behind GWAS
Genotyping technologies and common study designs
Statistical concepts for GWAS analysis
Replication, interpretation and follow-up of association results

3 Central Goal of Human Genetics
To identify genetic risk factors for common, complex diseases

4 Goal of GWAS
To use genetic risk factors to predict who is at risk
To identify the biological underpinnings of disease susceptibility in order to develop new prevention and treatment strategies

5 Application in pharmacology
Identifying DNA sequence variations associated with drug metabolism and efficacy, as well as adverse effects
Example: warfarin (determining the appropriate dose)
Personalized medicine

6 Concepts underlying the study design
SNP: single nucleotide polymorphism
Single base-pair changes in the DNA sequence that occur with high frequency in the human genome
SNP (common) vs. mutation (rare)
Cystic fibrosis: mutations in the CFTR gene
Linkage analysis: genotyping families affected by cystic fibrosis using a collection of genetic markers across the genome, and examining how these markers segregate with the disease across multiple families

7 Common Disease Common Variant Hypothesis
Common disorders are likely influenced by genetic variation that is also common in the population.
1. If common genetic variants influence disease, the effect size (or penetrance) for any one variant must be small relative to that found for rare disorders.
2. If common alleles have small genetic effects (low penetrance), but common disorders show heritability (inheritance in families), then multiple common alleles must influence disease susceptibility.

8 Figure 1. Spectrum of Disease Allele Effects.
Bush WS, Moore JH (2012) Chapter 11: Genome-Wide Association Studies. PLoS Comput Biol 8(12): e doi: /journal.pcbi

9 Capturing Common Variation
1. The location and density of commonly occurring SNPs are needed to identify the genomic regions and individual sites that must be examined by genetic studies.
2. Population-specific differences in genetic variation must be cataloged so that studies of phenotypes in different populations can be conducted with the proper design.
3. Correlations among common genetic variants must be determined so that genetic studies do not collect redundant information.

10 International HapMap Project
Used a variety of sequencing techniques to discover and catalog SNPs in populations of European descent, the Yoruba population of African origin, Han Chinese individuals from Beijing, and Japanese individuals from Tokyo
Has since been expanded to include 11 human populations

11 Linkage Disequilibrium
A property of SNPs on a contiguous stretch of genomic sequence that describes the degree to which an allele of one SNP is inherited or correlated with an allele of another SNP within a population
Linkage between markers on a population scale
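As a rough illustration of how this correlation is usually quantified, the sketch below computes the common LD measures D, D' and r² for two biallelic SNPs. The function name and the input frequencies are hypothetical; the slides do not give a formula, so the standard textbook definitions are assumed here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # D is the deviation of the observed haplotype frequency from
    # the frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b

    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two alleles.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical frequencies: alleles A and B always travel together (complete LD).
print(ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3))
# Hypothetical frequencies: alleles are independent (no LD).
print(ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5))
```

An r² near 1 means one SNP is a near-perfect proxy for the other, which is the situation a tag SNP relies on.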

12 Figure 2. Linkage and Linkage Disequilibrium.
Bush WS, Moore JH (2012) Chapter 11: Genome-Wide Association Studies. PLoS Comput Biol 8(12): e doi: /journal.pcbi

13 Direct vs. Indirect Association
LD creates two possible positive outcomes from a genetic association study:
1. Direct association: the SNP influencing a biological system that leads to the phenotype is directly genotyped in the study.
2. Indirect association: the influential SNP is not directly typed, but a tag SNP in high LD with the influential SNP is typed.
Therefore, a significant SNP association from a GWAS should not be assumed to be the causal variant.

14 Genotyping Technologies
Chip-based microarray technology
Illumina: DNA molecules and primers are first attached to a slide and amplified with polymerase so that local clonal DNA colonies, later termed "DNA clusters", are formed. To determine the sequence, four types of reversible terminator bases (RT-bases) are added and non-incorporated nucleotides are washed away. A camera takes images of the fluorescently labeled nucleotides; then the dye, along with the terminal 3' blocker, is chemically removed from the DNA, allowing the next cycle to begin.

15 Study Design
Case-control vs. quantitative design
Two primary classes of phenotypes: categorical or quantitative
From the statistical perspective, quantitative traits are preferred, but they are not required for a successful study

16 Association Test
1. Single-locus analysis
When a well-defined phenotype has been selected for a study population and genotypes have been collected using sound techniques, the statistical analysis can begin.
Quantitative traits: analyzed using ANOVA (analysis of variance). The null hypothesis is that there is no difference between the trait means of any genotype group.
Dichotomous case/control traits: analyzed using logistic regression. The null hypothesis is that there is no association between the phenotype and genotype.
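To make the quantitative-trait test concrete, here is a minimal sketch of a one-way ANOVA F statistic for a single SNP, with individuals grouped by genotype. The function name and the trait values are made up for illustration, and the sketch stops at the F statistic; converting it to a p-value would need the F distribution, which a real analysis would take from a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of trait values.

    Each group holds the trait values of individuals sharing one genotype.
    Assumes at least two non-empty groups and some within-group variation.
    """
    groups = [g for g in groups if g]  # ignore genotype groups with no individuals
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the genotype-group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individuals around their own genotype-group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical trait values for the three genotype groups at one SNP.
trait_by_genotype = {
    "AA": [1.0, 2.0, 3.0],
    "Aa": [2.0, 3.0, 4.0],
    "aa": [3.0, 4.0, 5.0],
}
print(one_way_anova_f(list(trait_by_genotype.values())))
```

A large F means the genotype-group means differ more than the within-group scatter would suggest, which is evidence against the null hypothesis stated above.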

17 Statistical replication
Replication studies should be conducted in an independent dataset drawn from the same population as the GWAS.
Once an effect is confirmed in the target population, other populations may be sampled to determine whether the SNP has an ethnic-specific effect.
Identical phenotype criteria should be used in both the GWAS and replication studies.
A similar effect should be seen in the replication set from the same SNP, or from a SNP in high LD with the GWAS-identified SNP.

18 Meta-analysis of multiple analysis results
Meta-analysis was developed to examine and refine significance and effect size estimates from multiple studies examining the same hypothesis in the published literature.
However, it is rare to find multiple studies that match perfectly on all criteria.
Study heterogeneity is often statistically quantified in a meta-analysis to determine the degree to which studies differ.
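One common way to combine per-study results for a single SNP is a fixed-effect, inverse-variance weighted meta-analysis, with heterogeneity summarized by Cochran's Q and I². The slides do not say which method is meant, so this choice is an assumption, and the function name and input numbers below are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates (betas) and their standard errors (ses)."""
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)

    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

    # Cochran's Q: weighted squared deviations of study effects from the pooled effect.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: percentage of variation attributed to heterogeneity rather than chance.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {"beta": beta, "se": se, "z": z, "p": p_value, "Q": q, "I2": i_squared}


# Hypothetical log odds ratios and standard errors for the same allele in three studies.
print(fixed_effect_meta(betas=[0.10, 0.30, 0.20], ses=[0.10, 0.10, 0.15]))
```

The effect estimates must refer to the same allele in every study before they are pooled.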

19 Data Imputation
To conduct a meta-analysis properly, the effect of the same allele across multiple distinct studies must be assessed. This can prove difficult if different studies use different genotyping platforms (which use different SNP marker sets). As this is often the case, GWAS datasets can be imputed to generate results for a common set of SNPs across all studies. Genotype imputation exploits known LD patterns and haplotype frequencies from the HapMap or 1000 Genomes project to estimate genotypes for SNPs not directly genotyped in the study [50].

20 Logistic regression
Predicting the likelihood that Y is equal to 1 (rather than 0) given certain values of X
Example: we try to predict whether or not a small business will succeed based on the number of years of experience the owner has in the field prior to starting the business.
We presume that people who have more experience will be more likely to succeed.
As X (the number of years of experience) increases, the probability that Y will be equal to 1 (success in the business) will tend to increase.
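The business example can be sketched as a one-predictor logistic regression fitted by plain gradient ascent on the log-likelihood. This is only an illustration under stated assumptions: the data are invented, the function names are hypothetical, and real analyses would use a statistics package instead of a hand-written optimizer. In a GWAS the same model is typically applied with X as a SNP genotype and Y as case/control status.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.05, n_iter=5000):
    """Fit P(Y=1 | X=x) = sigmoid(b0 + b1 * x) by gradient ascent; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed outcome minus predicted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Invented data: years of prior experience (X) and business success (Y = 1) or failure (Y = 0).
years = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
success = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

b0, b1 = fit_logistic(years, success)
print("intercept:", b0, "slope:", b1)
print("P(success | 1 year): ", sigmoid(b0 + b1 * 1))
print("P(success | 8 years):", sigmoid(b0 + b1 * 8))
```

A positive slope b1 corresponds to the statement above: as X increases, the predicted probability that Y equals 1 increases.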

21 Logistic Regression

22 Logistic Regression

23 Logistic Regression

