Molecular & Genetic Epi 217 Association Studies


1 Molecular & Genetic Epi 217 Association Studies
John Witte

2 Association Studies

3 Association Studies
Use of association studies is rapidly expanding, reflecting a number of laudable properties, including their ease, since one need not collect large pedigrees, and their potential to be more powerful than conventional linkage-based approaches.

4 Linkage vs. Association
Risch & Merikangas, Science 1996

5 Association Study Approaches
Direct vs. indirect
Candidate genes: functional variants
Candidate genes: all common variants
All common variants in genome (GWAS)
All variants in genome (sequencing): expensive; includes rare variants

6 Genomics Revolution
Human Genome Project: 13 years, $3B for 1 sequence
Now: 1 week, $10K (> 500 times faster, < 1/100,000th the cost!)
Soon: 1 hour, $1K (#1 Innovation, 2010)
Improving our ability to study genomics of health and disease

Computing has, famously, increased in potency according to Moore's law. This says that computers double in power roughly every two years, an increase of more than 30 times over the course of a decade, with concomitant reductions in cost. […] calculates that the cost of DNA sequencing at the institute has fallen to a hundred-thousandth of what it was a decade ago (see chart 1). The genome sequenced by the International Human Genome Sequencing Consortium (actually a composite from several individuals) took 13 years and cost $3 billion. Now, using the latest sequencers from Illumina, of San Diego, California, a human genome can be read in eight days at a cost of about $10,000. Nor is that the end of the story. Another Californian firm, Pacific Biosciences, of Menlo Park, has a technology that can read genomes from single DNA molecules. It thinks that in three years' time this will be able to map a human genome in 15 minutes for less than $1,000. (The Economist, 2010)
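The ratios on this slide can be checked with a few lines of arithmetic. This is a minimal Python sketch that uses only the figures quoted above (13 years and $3B versus 1 week and $10K, and a doubling every two years for Moore's law); the 52-weeks-per-year conversion is an assumption of the sketch.

```python
# Figures quoted on the slide and in The Economist note.
# Assumption: 52 weeks per year when converting 13 years to weeks.
HGP_WEEKS = 13 * 52          # Human Genome Project: 13 years
HGP_COST = 3_000_000_000     # $3B
NOW_WEEKS = 1                # 1 week
NOW_COST = 10_000            # $10K

speedup = HGP_WEEKS / NOW_WEEKS     # how many times faster sequencing became
cost_ratio = NOW_COST / HGP_COST    # new cost as a fraction of the original cost

# Moore's law: power doubles roughly every two years, compounded over ten years.
moore_decade = 2 ** (10 / 2)

print(f"{speedup:.0f}x faster")
print(f"cost is 1/{1 / cost_ratio:,.0f} of the original")
print(f"Moore's law over 10 years: {moore_decade:.0f}x")
```

This gives a 676-fold speed-up, a cost of 1/300,000 of the original, and a 32-fold gain per decade for Moore's law. These figures are consistent with the slide's "> 500 times faster" and "< 1/100,000th the cost" and with the note's "more than 30 times".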


8 Control Selection A critical aspect of association studies is that controls should be selected from the cases’ source population. That is, controls should be those individuals who, if they were diseased, would become cases.

9 Population Stratification
Confounding bias that may occur if one's sample comprises sub-populations with different allele frequencies and different disease rates. Cases are more likely than controls to arise from the sub-population with the higher baseline disease rate. Consequently, cases and controls will have different allele frequencies regardless of whether the locus is causal.
[Figure: sub-population as a confounder, associated with both the gene (allele frequency) and the disease (disease rate)]
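This mechanism can be reproduced with expected counts. The following minimal Python sketch uses two hypothetical, equal-sized sub-populations and a locus that is not causal. Within each sub-population, cases and controls therefore have the same allele frequency, but the sub-population with the higher disease risk also has the higher allele frequency. The numbers are invented for illustration. Stratum adjustment uses the Mantel-Haenszel odds ratio, one standard choice among several.

```python
from fractions import Fraction

# Hypothetical example: two equal-sized sub-populations. The allele is NOT causal,
# so within each sub-population cases and controls share the same allele frequency.
SUBPOPS = [
    # (people, allele frequency p, disease risk)
    (10_000, Fraction("0.2"), Fraction("0.01")),
    (10_000, Fraction("0.6"), Fraction("0.05")),
]

def allele_table(n, p, risk):
    """Expected allele counts (a, b, c, d): case risk/other, control risk/other."""
    cases, controls = n * risk, n * (1 - risk)
    return (2 * cases * p, 2 * cases * (1 - p),
            2 * controls * p, 2 * controls * (1 - p))

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Stratum-adjusted OR: sum(a*d/n) / sum(b*c/n) over the strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

strata = [allele_table(*s) for s in SUBPOPS]
pooled = tuple(sum(col) for col in zip(*strata))  # ignore sub-population

for t in strata:
    print("within sub-population OR:", float(odds_ratio(*t)))
print("crude (pooled) OR:", round(float(odds_ratio(*pooled)), 2))
print("Mantel-Haenszel OR:", float(mantel_haenszel_or(strata)))
```

Both within-stratum ORs are exactly 1, while the pooled OR is about 1.74, a spurious association. Adjusting for sub-population brings it back to 1. Here the spurious OR is above 1 because the higher-risk group also carries the allele more often. In the Pima example that follows, the directions are reversed, which produces an inverse association.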

10 Example of Population Stratification
Higher levels of Native American heritage defined sub-populations with higher risks of NIDDM (non-insulin-dependent diabetes mellitus) and lower frequencies of the immunoglobulin haplotype Gm3;5,13,14.
[Figure: Gm3;5,13,14 frequency and NIDDM risk in Pima Indians versus mixed Caucasians]
Cases were more likely to have higher Native American heritage, and therefore less likely to carry the haplotype. Ignoring stratification gave a false inverse association: OR = 0.3. Adjusting for heritage gave OR = 0.8 (95% CI = ).
(Knowler et al. Am J Hum Genet 1988) Cardon & Palmer, 2003

11 Family-Based Association Studies
[Figure: family-based designs with sibling, parent, and cousin controls]

12 Continuum of Assoc Study Designs
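Population-based → "Ethnicity"-matched → Structured association → Family-based
Bias versus efficiency: designs toward the population-based end are more exposed to population stratification (sub-population confounding the gene–disease association). Designs toward the family-based end risk overmatching: relatives share genes and environment, which reduces efficiency.
Also, recruitment issues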

13 Association Analysis
Genotype   Cases   Controls   OR
GG         A       D          1
GT         B       E          BD/AE
TT         C       F          CD/AF
Simple chi-square test comparing genotype frequencies (2 d.f.); called a co-dominant analysis
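The co-dominant analysis takes only a few lines of code. This minimal Python sketch uses the A–F cell notation of the table, with hypothetical counts. It relies on a standard fact: a chi-square distribution with exactly 2 d.f. has upper tail exp(-x/2), so no statistics library is needed.

```python
import math

def codominant_test(cases, controls):
    """Genotype ORs (GG as reference) and Pearson chi-square (2 d.f.) for a 3x2 table.

    cases = (A, B, C) and controls = (D, E, F) for genotypes GG, GT, TT.
    """
    A, B, C = cases
    D, E, F = controls
    or_gt = (B * D) / (A * E)
    or_tt = (C * D) / (A * F)
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    chi2 = 0.0
    for case_n, ctrl_n in zip(cases, controls):
        col = case_n + ctrl_n  # number of people with this genotype
        for observed, row_total in ((case_n, n_cases), (ctrl_n, n_controls)):
            expected = row_total * col / total
            chi2 += (observed - expected) ** 2 / expected
    # For exactly 2 d.f., the chi-square upper tail probability is exp(-x/2).
    p_value = math.exp(-chi2 / 2)
    return or_gt, or_tt, chi2, p_value

# Hypothetical counts for GG, GT, TT.
print(codominant_test(cases=(100, 80, 20), controls=(120, 70, 10)))
```

With these made-up counts, the ORs are about 1.37 (GT) and 2.4 (TT), and the 2 d.f. statistic is about 5.82 (P ≈ 0.055).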

14 Genetic Model
ORs depend on the genetic model (assuming a positive association):
Genotype   OR
GG         1
GT         r
TT         R
R = r = 1: T is not a risk allele
R > r = 1: recessive
R = r > 1: dominant
R = r^2 > 1: log additive
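In logistic regression, each genetic model corresponds to a genotype coding x. With logit(P) = a + b·x, the OR for a genotype coded x relative to GG (coded 0) is exp(b·x), which reproduces the patterns above. This is a minimal Python sketch. The per-unit OR of 1.5 is an arbitrary illustrative value, and the names `CODING` and `implied_ors` are hypothetical.

```python
import math

# Genotype coding under each genetic model (x = 0 for the reference genotype GG).
CODING = {
    "dominant":     {"GG": 0, "GT": 1, "TT": 1},
    "recessive":    {"GG": 0, "GT": 0, "TT": 1},
    "log-additive": {"GG": 0, "GT": 1, "TT": 2},
}

def implied_ors(model, beta):
    """ORs for GG, GT, TT implied by a per-unit log-OR `beta` under a given coding."""
    return {g: math.exp(beta * x) for g, x in CODING[model].items()}

for model in CODING:
    ors = implied_ors(model, beta=math.log(1.5))
    print(model, {g: round(v, 2) for g, v in ors.items()})
```

Dominant coding gives R = r, recessive gives r = 1, and log-additive gives R = r^2 (here 1.5 and 2.25).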

15 Tests of association
If the genetic model is known:
Collapse genotypes into a 2x2 table and use a 1 d.f. test
Use a trend test for the log-additive model
Use logistic regression with the coding for that model; this also allows covariates
We rarely know the genetic model:
Use all three models (dominant, recessive, log additive)
Compare the fit of each with the co-dominant (2 d.f.) model (LR test)
The LR test cannot be used to compare these models with each other, as they are not nested
Is the model with the best fit and smallest P the best? Picking the smallest of several P values inflates the type I error, so use a permutation test here (MAX test)
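This is a minimal Python sketch of the 1 d.f. tests and a permutation MAX test, assuming genotype counts for GG, GT and TT. One assumption should be stated clearly: instead of fitting logistic-regression models and LR statistics, it uses Cochran-Armitage trend statistics with scores (0,0,1), (0,1,1) and (0,1,2) for the recessive, dominant and log-additive models. With 0/1 scores, this statistic equals the Pearson chi-square for the collapsed 2x2 table. The counts and function names are hypothetical.

```python
import math
import random

# Scores for genotypes (GG, GT, TT) under each model.
SCORES = {
    "recessive": (0, 0, 1),
    "dominant": (0, 1, 1),
    "log-additive": (0, 1, 2),
}

def trend_chi2(cases, controls, w):
    """Cochran-Armitage trend statistic (1 d.f.); cases/controls are (GG, GT, TT) counts."""
    n = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    swr = sum(wi * ri for wi, ri in zip(w, cases))
    swn = sum(wi * ni for wi, ni in zip(w, n))
    sw2n = sum(wi * wi * ni for wi, ni in zip(w, n))
    return N * (N * swr - R * swn) ** 2 / (R * S * (N * sw2n - swn ** 2))

def chi2_1df_pvalue(x):
    """Upper tail of a 1 d.f. chi-square: P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))

def max_test(cases, controls, n_perm=1000, seed=1):
    """Largest of the three model statistics, with a permutation P value."""
    def stat(cs, ct):
        return max(trend_chi2(cs, ct, w) for w in SCORES.values())

    observed = stat(cases, controls)
    # Expand counts to one genotype per person so case/control labels can be shuffled.
    genotypes = [g for g in range(3) for _ in range(cases[g] + controls[g])]
    n_cases = sum(cases)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(genotypes)  # first n_cases people become the permuted "cases"
        perm_cases = [0, 0, 0]
        for g in genotypes[:n_cases]:
            perm_cases[g] += 1
        perm_controls = [cases[g] + controls[g] - perm_cases[g] for g in range(3)]
        if stat(perm_cases, perm_controls) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

cases, controls = (100, 80, 20), (120, 70, 10)  # hypothetical GG, GT, TT counts
for model, w in SCORES.items():
    x = trend_chi2(cases, controls, w)
    print(f"{model}: chi2 = {x:.2f}, nominal P = {chi2_1df_pvalue(x):.4f}")
observed, p_perm = max_test(cases, controls)
print(f"MAX statistic = {observed:.2f}, permutation P = {p_perm:.3f}")
```

The nominal P of the best-fitting model is too optimistic, because three tests were tried. The permutation P accounts for the selection by recomputing the maximum under shuffled case/control labels. The "+1" in the numerator and denominator is a common convention that keeps the estimate away from zero.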

16 Candidate Gene Studies
Selection of candidates: linkage regions? Biological support?
“I am interested in a candidate gene and have samples ready to study. What SNPs do I genotype?”

17 Candidate Gene: Where do I Start?
Location: What chromosome? What position on the chromosome? Exons/UTRs: How many exons? Which UTR regions? Size: How large is the gene? Use the UCSC Genome Browser.

18 SNP Picking: Things to Consider
Validation: What is the quality of the SNPs? Informativity: Are these SNPs informative in my population? How common are they? Location? Potentially Functional: Do these SNPs have a potential biological impact? Missense variants? Previously Associated: Have previous studies found SNPs in the candidate gene associated with the outcome?

19 SNP Picking: Validation

20 SNP Picking: Validation

21 SNP Picking: Validation

22 SNP Picking: Informative

23 SNP Picking: Potentially Functional

24 SNP Picking: Previously Associated

25 MTHFR Summary
Chromosome 1: 11,780,053-11,800,381; size: 20,329 bp
Exons: 12
Potentially functional: 5 missense SNPs, of which 3 have MAF > 5%
Previously associated: 3 (C677T, A1298C, A2756G)

26 MTHFR SNPs 102 SNPs across MTHFR Too Many SNPs to Genotype!

27 Too many MTHFR SNPs Solution: Tag SNP Selection
Nearby SNPs are correlated (i.e., in linkage disequilibrium, LD).
[Figure: six SNPs in LD; pairwise tagging selects SNP 1, SNP 3 and SNP 6 (3 tags in total), each in high r^2 with the SNPs it represents; the tags are then tested for association]
Carlson et al. (2004) AJHG 74:106
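Pairwise tagging can be sketched as a greedy cover. This minimal Python sketch is in the spirit of Carlson et al. (2004) but is a simplified greedy variant, not their published algorithm. It computes r^2 from phased 0/1 haplotypes and repeatedly picks the SNP that covers the most still-untagged SNPs at r^2 ≥ 0.8. It also lets you force in SNPs, such as potentially functional ones. The haplotypes are hypothetical and were built so that SNPs 1–2 and SNPs 3–5 are in perfect LD.

```python
def r2_matrix(haplotypes):
    """Pairwise LD r^2 from phased haplotypes coded 0/1 (rows = haplotypes, cols = SNPs)."""
    n, m = len(haplotypes), len(haplotypes[0])
    freq = [sum(h[j] for h in haplotypes) / n for j in range(m)]
    r2 = [[1.0] * m for _ in range(m)]  # each SNP tags itself
    for i in range(m):
        for j in range(i + 1, m):
            p_ij = sum(h[i] * h[j] for h in haplotypes) / n
            d = p_ij - freq[i] * freq[j]  # LD coefficient D
            denom = freq[i] * (1 - freq[i]) * freq[j] * (1 - freq[j])
            r2[i][j] = r2[j][i] = d * d / denom if denom > 0 else 0.0
    return r2

def greedy_tags(r2, threshold=0.8, forced=()):
    """Pick tags until every SNP has r^2 >= threshold with at least one tag."""
    untagged = set(range(len(r2)))
    tags = []
    for t in forced:  # e.g. potentially functional / previously associated SNPs
        tags.append(t)
        untagged -= {j for j in untagged if r2[t][j] >= threshold}
    while untagged:
        # SNP covering the most still-untagged SNPs (ties go to the lowest index).
        best = max(sorted(untagged),
                   key=lambda s: sum(r2[s][j] >= threshold for j in untagged))
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags

# Hypothetical haplotypes for 6 SNPs.
haps = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1],
]
tags = greedy_tags(r2_matrix(haps))
print("tag SNPs (1-based):", sorted(t + 1 for t in tags))
```

For these haplotypes the sketch returns SNPs 1, 3 and 6, that is, three tags for six SNPs.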

28 Coverage: Measurement Error in TagSNPs
Complete set: the full set of SNPs; a subset of these makes up the genotyping set.
For a given SNP in the complete set, compute r^2 between that SNP and each SNP in the genotyping set, and take the highest value. This is called the maximum r^2.

29 Common Measures of Coverage
Threshold measures: e.g., 73% of SNPs in the complete set are in LD with at least one SNP in the genotyping set at r^2 > 0.8
Average measures: e.g., average maximum r^2 = 0.84
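Both coverage measures follow directly from the maximum r^2. This minimal Python sketch assumes a precomputed r^2 matrix for the complete set and a list of genotyped SNP indices. The matrix is hypothetical. The strict "> threshold" follows the wording above, though some tools use "≥".

```python
def coverage(r2, genotyped, threshold=0.8):
    """Maximum r^2 of each complete-set SNP with the genotyping set,
    plus the threshold and average coverage measures."""
    max_r2 = [max(r2[i][g] for g in genotyped) for i in range(len(r2))]
    frac_above = sum(v > threshold for v in max_r2) / len(max_r2)
    mean_max = sum(max_r2) / len(max_r2)
    return max_r2, frac_above, mean_max

# Hypothetical r^2 matrix for a complete set of 4 SNPs; SNPs at indices 0 and 2 are genotyped.
R2 = [
    [1.0, 0.9, 0.5, 0.8],
    [0.9, 1.0, 0.4, 0.7],
    [0.5, 0.4, 1.0, 0.6],
    [0.8, 0.7, 0.6, 1.0],
]
max_r2, frac, mean_max = coverage(R2, genotyped=[0, 2])
print(max_r2, frac, round(mean_max, 3))
```

Here the maximum r^2 values are 1.0, 0.9, 1.0 and 0.8. That gives a threshold coverage of 75% at r^2 > 0.8 and an average maximum r^2 of 0.925.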

30 Coverage and Sample Size
Sample size required for direct association: n
Sample size required for indirect association: n* = n / r^2
For r^2 = 0.8, the increase is 25%; for r^2 = 0.5, it is 100%
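The inflation factor 1/r^2 can be checked directly. This minimal Python sketch takes r^2 as the LD between the genotyped marker and the causal SNP. The direct-association sample size of 1,000 is a hypothetical value.

```python
def indirect_sample_size(n_direct, r2):
    """n* = n / r^2: sample size when testing a marker in LD (r^2) with the causal SNP."""
    return n_direct / r2

N_DIRECT = 1000  # hypothetical sample size for direct association
for r2 in (1.0, 0.8, 0.5):
    n_star = indirect_sample_size(N_DIRECT, r2)
    print(f"r2 = {r2}: n* = {n_star:.0f} ({n_star / N_DIRECT - 1:.0%} increase)")
```

With n = 1,000, r^2 = 0.8 requires 1,250 subjects (25% more) and r^2 = 0.5 requires 2,000 (100% more).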

31 Tag SNPs Database Resources

32 HapMap
Re-sequencing to discover millions of additional SNPs, deposited in dbSNP
SNPs from dbSNP were genotyped, aiming for 1 SNP every 5 kb
SNP validation: polymorphic? frequency?
Haplotype and linkage disequilibrium estimation
LD tagging SNPs

33 HapMap Phase III Populations
ASW: African ancestry in Southwest USA
CEU: Utah residents with Northern and Western European ancestry from the CEPH collection
CHB: Han Chinese in Beijing, China
CHD: Chinese in Metropolitan Denver, Colorado
GIH: Gujarati Indians in Houston, Texas
JPT: Japanese in Tokyo, Japan
LWK: Luhya in Webuye, Kenya
MEX: Mexican ancestry in Los Angeles, California
MKK: Maasai in Kinyawa, Kenya
TSI: Toscani in Italia
YRI: Yoruba in Ibadan, Nigeria

34 Tag SNPs: HapMap

35 Tag SNPs: HapMap

36 Tag SNPs: HapMap & Haploview

37 Tag SNPs: HapMap & Haploview

38 Tag SNPs: HapMap & Haploview

39 Tag SNPs: HapMap & Haploview

40 Tag SNPs: HapMap & Haploview

41 Tag SNPs: HapMap Summary
Identified 33 common MTHFR SNPs (MAF > 5%) among Caucasians
Forced in 3 potentially functional/previously associated SNPs
Identified tags based on pairwise tagging: 15 tag SNPs captured all 33 MTHFR SNPs (mean r^2 = 0.97)
Note: the number of SNPs required varies from gene to gene and from population to population

42 1K Genomes Project

43 Taster Project: 3 SNPs in the TAS2R38 Gene
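Haplotype definition: the alleles at the three SNPs on a single chromosome. With two alleles per SNP, there are 8 possible haplotypes: PAV, PAI, PVV, PVI, AAV, AAI, AVV, AVI.
Each individual has two haplotypes, which together form a diplotype.
Haplotype is to allele as diplotype is to genotype.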
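The haplotype/diplotype counting can be made concrete. This minimal Python sketch takes the amino-acid allele labels for the three TAS2R38 SNPs from the slide. It enumerates the possible haplotypes and diplotypes and then reduces each diplotype to its unphased genotype. That last step is an addition here: it shows that several diplotypes can share one genotype, so the genotype alone does not reveal the phase. The function name `genotype` is hypothetical.

```python
from itertools import combinations_with_replacement, product

# Alleles at the three TAS2R38 SNPs, as labelled on the slide.
SNP_ALLELES = [("P", "A"), ("A", "V"), ("V", "I")]

# A haplotype is one allele per SNP on a single chromosome.
haplotypes = ["".join(h) for h in product(*SNP_ALLELES)]

# A diplotype is an unordered pair of haplotypes (one per chromosome).
diplotypes = list(combinations_with_replacement(haplotypes, 2))

def genotype(diplotype):
    """Unphased genotype: the pair of alleles at each SNP, ignoring which chromosome."""
    return tuple("/".join(sorted(pair)) for pair in zip(*diplotype))

print(haplotypes)
print(len(diplotypes), "diplotypes,", len({genotype(d) for d in diplotypes}), "genotypes")
print(genotype(("PAV", "AVI")) == genotype(("PAI", "AVV")))
```

Three biallelic SNPs give 8 haplotypes, 36 diplotypes and 27 genotypes. For example, PAV/AVI and PAI/AVV have the same genotype.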

44 TASR: 3 SNPs form Haplotypes
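Taster haplotype: PAV; non-taster haplotype: AVI
A 3rd haplotype, AAV, is the result of recombination: it carries the A of the non-taster at the 1st SNP and the AV of the taster at the 2nd and 3rd SNPs.
This allows us to compare the effect of the 1st SNP versus the 2nd and 3rd.
Rare: not all combinations are observed.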
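The comparison works because AAV differs from each parental haplotype at complementary positions. This minimal Python sketch models a single crossover between the 1st and 2nd SNPs. Both helper functions are hypothetical illustrations.

```python
def recombinant(h_left, h_right, k):
    """Haplotype with SNPs 1..k from h_left and the remaining SNPs from h_right."""
    return h_left[:k] + h_right[k:]

def differing_snps(h1, h2):
    """1-based positions at which two haplotypes carry different alleles."""
    return [i + 1 for i, (a, b) in enumerate(zip(h1, h2)) if a != b]

TASTER, NON_TASTER = "PAV", "AVI"
aav = recombinant(NON_TASTER, TASTER, k=1)  # crossover between SNP 1 and SNP 2
print(aav)
print("vs taster:", differing_snps(TASTER, aav))
print("vs non-taster:", differing_snps(aav, NON_TASTER))
```

AAV differs from the taster haplotype only at SNP 1 and from the non-taster haplotype only at SNPs 2 and 3. Contrasting these pairs therefore separates the effect of the 1st SNP from that of the 2nd and 3rd.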

45 TAS2R38 Haplotype Function

