
Diabetes Genome Wide Association. Alessandra C Goulart, Ida Hatoum, Stalo Karageorgi, Mara Meyer. EPI293, January 2008, Harvard School of Public Health.


1 Diabetes Genome Wide Association
Alessandra C Goulart, Ida Hatoum, Stalo Karageorgi, Mara Meyer
EPI293, January 2008, Harvard School of Public Health

2 Background - Type 2 Diabetes Mellitus
- Disorder characterized by impaired glucose/insulin function
- >170 million affected worldwide

3 Background - Genetic Justification
- The explosion of diabetes, plus a rapidly decreasing age of onset, argues for an environmental rather than a genetic etiology
- Genetic justification
  - Clustering in families
  - Leveling off of risk by BMI
  - Mouse data
- Pattern argues for a polygenic trait: GWAS!

4 Methods
- 3 separate studies, all working collaboratively
- Different populations, different analyses

5 FUSION/Finrisk Population
- Genome-wide scan study (N = 2,335; Finland)
- Population-based: 1161 T2D cases, 1174 controls
- Matched on age, sex, birth province

6 DGI Population
- Genome-wide scan study (N = 2,931; Finland/Sweden)
- Population-based: 1022 T2D cases, 1075 controls; matched on gender, age, BMI, place of origin
- Family-based: 422 T2D cases, 392 controls; discordant siblings matched on age

7 WTCCC Population
- Genome-wide scan study (N = 4,862; British/Irish)
- 1924 T2D cases, from a diabetes "repository"
- 2938 controls, from a 1958 birth cohort
- No matching

8 Methods - General Outline
- All three studies start with study populations of between 2,335 and 4,862 subjects
- All three run genome-wide association scans, initially analyzing 300,000-400,000 SNPs, and reduce that number with certain criteria
- All three studies then run second waves of replication or conduct replication studies in independent populations
- Findings are compared with previously published reports and across the three studies
  - Weighted meta-analysis
- Findings are fairly consistent between the three study populations, with many replicated associations

9 Population Stratification
All three studies investigated potential population stratification by
- Cochran-Armitage tests
- Genomic control inflation factor (λ)
- Principal components analysis using EIGENSTRAT
- Adjustment for region/birthplace
  - Matching, choice of study population
- Replication in independent datasets
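As a rough illustration of the inflation factor λ listed on this slide, the sketch below estimates it as the median observed 1-df chi-square statistic divided by the median expected under the null (about 0.455). It is a minimal sketch, not the studies' own pipeline: the function name `genomic_inflation` is hypothetical, and it assumes two-sided p-values strictly between 0 and 1 from 1-df tests.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_inflation(p_values):
    """Estimate lambda = median(observed chi-square) / median(null chi-square).

    Assumes each p-value is two-sided, comes from a 1-df test,
    and lies strictly between 0 and 1 (hypothetical helper).
    """
    norm = NormalDist()
    # Map each p-value back to a 1-df chi-square statistic: chi2 = z^2.
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p-values mimic a scan with no inflation.
    null_like = [(i + 0.5) / 1000 for i in range(1000)]
    print(round(genomic_inflation(null_like), 3))
```

Values close to 1, such as the λ values quoted for the three scans, indicate little inflation of the test statistics.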

10 Methods - Platform
Genotyping platform for GWAS
- Affymetrix GeneChip Human Mapping 500K Array Set
  - Wellcome Trust Case Control Consortium (WTCCC), UK
  - Diabetes Genetics Initiative (DGI)
    - Both population-based (matched on gender, age, BMI and region of origin) and family-based samples
- Illumina HumanHap300 BeadChip
  - Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics (FUSION)
    - 1161 Finnish T2D cases and 1174 normal glucose-tolerant controls from the FUSION and Finrisk 2002 studies (matched by province, sex and age)

11 Methods - FUSION
- FUSION analyzed 315,635 SNPs with MAF > 0.002 using a model that is additive on the log-odds scale
  - They observed an excess of low p-values (P < 10^-4), suggesting many common variants with modest effects (λ = 1.026)
  - Imputed >2 million SNPs using data from HapMap CEU to cover 89.1% of SNPs with MAF > 1%
- Compared stage 1 results with DGI and WTCCC to increase statistical power and select SNPs for stage 2
  - An association was "genome-wide significant" if p < 5 x 10^-8
- Stage 2 replication sample of 1215 Finnish T2D cases and 1258 Finnish NGT controls
  - 80 of 82 selected SNPs genotyped
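The genome-wide significance threshold on this slide is commonly motivated as a Bonferroni correction of a 0.05 significance level across roughly one million independent common-variant tests. The slides do not state this derivation, so the test count below is an assumption for illustration, and both function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, alpha=0.05, n_tests=1_000_000):
    """Compare a p-value with the Bonferroni threshold.

    n_tests = 1,000,000 is an assumed number of independent tests,
    not a figure taken from the slides.
    """
    return p_value < bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 1_000_000))
    # The WTCCC TCF7L2 p-value quoted later in the deck.
    print(is_genome_wide_significant(6.7e-13))
```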

12 Methods - FUSION
- Stage 2 analysis selected SNPs based on
  - FUSION genotyped and imputed SNPs from stage 1, using a prioritization algorithm that gave preference to genotyped SNPs
  - Combined analysis of GWA results from FUSION, DGI, and WTCCC
  - Previous T2D association results
- Joint analysis of Stage 1 + Stage 2
- All-data meta-analysis of FUSION, DGI, WTCCC and follow-up samples

13 Methods - DGI
- DGI analyzed 386,731 SNPs after applying strict quality control filters and developed 284,968 additional two-marker (haplotype) tests, for a total of 671,699 tests
  - Each SNP and haplotype was tested for association with T2D and each of 18 clinical traits
  - Population and family-based samples combined with a weighted meta-analysis
  - Quantitative traits assessed by linear or logistic regression
  - "Genome-wide significant" associations at p < 5 x 10^-8
- Three strategies to search for systematic bias
  - P-value distribution in population sample (λ = 1.05), principal components analysis, and independent genotyping of 114 SNPs with extreme p-values

14 Methods - DGI
- Observed an excess of low p-values
  - 1000 permuted whole-genome analyses, with phenotype data randomized within matched case-control groups, to evaluate the significance of the excess of low p-values
  - Suggests many variants with modest effects, not few variants with large effects
- Replication in an independent sample of 10,850 subjects from case-control samples of European ancestry (Sweden, USA, Poland) under the same model
  - Replication set of 107 SNPs selected on the basis of this study and comparisons with WTCCC and FUSION
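The permutation step on this slide, randomizing phenotypes within matched case-control groups, can be sketched as follows. This shows only the label-shuffling idea under simple assumptions (one phenotype label and one group identifier per subject). The function name `permute_within_groups` is hypothetical, and the actual DGI procedure reran the whole-genome analysis on each of the 1000 permuted datasets.

```python
import random


def permute_within_groups(phenotypes, groups, rng):
    """Shuffle case/control labels only among subjects in the same matched group.

    phenotypes: list of labels (e.g. 1 = case, 0 = control)
    groups:     list of matched-group identifiers, same length
    rng:        a random.Random instance, so runs can be reproduced
    """
    # Collect the subject indices belonging to each matched group.
    members_by_group = {}
    for index, group in enumerate(groups):
        members_by_group.setdefault(group, []).append(index)

    permuted = list(phenotypes)  # leave the input untouched
    for members in members_by_group.values():
        labels = [phenotypes[i] for i in members]
        rng.shuffle(labels)
        for i, label in zip(members, labels):
            permuted[i] = label
    return permuted


if __name__ == "__main__":
    phenotypes = [1, 0, 1, 0, 0, 1]  # made-up labels
    groups = ["a", "a", "a", "b", "b", "b"]  # made-up matched groups
    print(permute_within_groups(phenotypes, groups, random.Random(0)))
```

Shuffling within groups keeps the number of cases per matched group fixed, so the matching structure is preserved under the null.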

15 Methods - WTCCC
- Analyzed 393,453 autosomal SNPs with minor allele frequencies >1% in both cases and controls and no extreme departure from Hardy-Weinberg equilibrium (HWE; extreme defined as P < 10^-4)
- Additional quality controls to find true associations included cluster-plot visualization and validation genotyping on a second platform
  - P-value distribution indicates no substantial confounding by population substructure or genotyping bias (λ = 1.08)
- The WTCCC group used 3 replication sets with an additional 3757 cases and 5346 controls from two other UK studies
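To illustrate the HWE filter mentioned on this slide, here is a minimal sketch of the textbook 1-df chi-square test for departure from Hardy-Weinberg equilibrium at one biallelic SNP. The slides do not say which HWE test WTCCC used (an exact test is a common alternative), so this is an assumption for illustration. The function name `hwe_chi2_test` is hypothetical, and the code assumes both alleles are observed.

```python
import math


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi2, p_value). Assumes both alleles are present,
    so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a 1-df chi-square: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts; a SNP would fail the filter if p_value < 1e-4.
    chi2, p_value = hwe_chi2_test(300, 450, 250)
    print(chi2, p_value, p_value < 1e-4)
```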

16 Methods - WTCCC
- First wave: selected 21 representative SNPs from the 30 SNPs, in 9 distinct chromosomal regions, with the most extreme p-values from the initial scan (p < 10^-5), to limit false discovery
- Second wave: relaxed the p-value threshold to detect modest associations (p ~ 10^-2 to 10^-5) and found 5367 SNPs
  - Prioritized SNPs by evidence of association in DGI and FUSION; presence of multiple, independent associations within the same locus; and biological candidacy, leaving 56 SNPs to analyze

17 Results - FUSION GWAS

18 Results - FUSION GWAS
[Figure legend: common in all 3 studies; common in 2 studies]

19 Results - FUSION GWAS
- 10 loci identified:
  - 5 new: near genes IGF2BP2, CDKAL1, CDKN2A/2B, FTO, and an intergenic region on ch. 11
  - 5 previously published: near PPARG, SLC30A8, HHEX, TCF7L2, KCNJ11
- All loci have biological plausibility, except the non-coding region on ch. 11, for which it is unknown
- FUSION study found:
  - Strong evidence for
    - TCF7L2 (stage 1+2)
    - SLC30A8 (stage 1)
    - IGF2BP2 (stage 1)
    - Intergenic region ch. 11 (stage 1)
  - Modest evidence for
    - HHEX
    - CDKAL1 (stage 1)
    - CDKN2A/2B (stage 1+2)
    - FTO (stage 1+2)
  - Some evidence for
    - PPARG (imputed)
    - KCNJ11 (imputed)

20 Results - FUSION GWAS
- Compared results to DGI and WTCCC scans
  - HHEX, CDKAL1 and FTO, which had modest evidence in FUSION, showed stronger evidence in the WTCCC scan
  - SLC30A8: subsequent genotyping in other studies resulted in stronger evidence in the combined sample
- All SNPs or genes in this study overlap with the corresponding SNP/gene in at least one of the other studies, except the intergenic region on ch. 11
- Intergenic region on ch. 11
  - Includes 3 sets of spliced Expressed Sequence Tags
  - Nearby regions reported in another GWA study (Sladek 2007)

21 Meta-Analysis
- All-data meta-analysis of FUSION, DGI, WTCCC and follow-up samples
  - Weighted log ORs from each study by the inverse of the variance
  - Total sample size: 32,544 (increased 7-fold from FUSION alone)
  - Increased sample size, power to detect modest effects
- All 10 loci reached genome-wide significance in the meta-analysis (helping to confirm loci with only some evidence, emphasizing the importance of combining data)
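The inverse-variance weighting described on this slide can be sketched as a fixed-effect meta-analysis of per-study log odds ratios. This is a minimal sketch under the assumption that each study reports a log OR and its standard error for the same allele. The function name `inverse_variance_meta` and the example numbers are hypothetical, not values from the three scans.

```python
import math


def inverse_variance_meta(log_ors, std_errors):
    """Fixed-effect meta-analysis: weight each log OR by 1 / variance.

    Returns (pooled_log_or, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


if __name__ == "__main__":
    # Made-up per-study estimates for one SNP (illustration only).
    log_ors = [math.log(1.12), math.log(1.15), math.log(1.10)]
    std_errors = [0.05, 0.04, 0.03]
    pooled, pooled_se, p_value = inverse_variance_meta(log_ors, std_errors)
    print(math.exp(pooled), pooled_se, p_value)
```

Because the pooled standard error shrinks as studies are added, a locus with only modest evidence in each scan can reach genome-wide significance in the combined data.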

22 DGI GWA

23 Results - DGI GWAS
[Figure: confirmed T2D susceptibility variants. Legend: common in all 3 studies; common in 2 studies]

24 DGI GWAS
- T2D was the trait associated with novel and previously published candidate genes
- Association with HHEX was confirmed in this GWA, in WTCCC/UKT2D and by other studies (Sladek 2007)
- Association with SLC30A8 was consistently confirmed by WTCCC/UKT2D and FUSION
- No evidence for association: LOC387761, EXT2-ALX4
- Additional loci: FLJ393370, PKN2

25 DGI GWAS
- Current WGA and collaborators: evidence for association with T2D risk was verified at 3 previously unknown loci (CDKN2B, IGF2BP2 and CDKAL1)
- 15 common variants for T2D and lipid levels were identified
- New T2D genes suggest a primary role of the pancreatic beta cell

26 Results - WTCCC GWA

27 Results - WTCCC GWA
[Figure: confirmed T2D susceptibility variants. Legend: common in all 3 studies; common in 2 studies]

28 Results - WTCCC GWA
- In the WTCCC, the strongest association signals were found for SNPs in TCF7L2 (P = 6.7 x 10^-13)
- From the first wave of SNPs, replication was found for SNPs in CDKAL1
  - "Compelling" evidence across all studies (P ~ 4.1 x 10^-11)
  - SNPs map to a 90 kb intron; may be involved in regulation of pancreatic beta cell function
- An association at FTO on chromosome 16 (rs8050136) was found to be mediated through a primary effect on adiposity
- Confirmed a previously reported association at HHEX
  - The HHEX signal is in an area of LD also containing genes encoding KIF11 and IDE, which have biological plausibility

29 Results - WTCCC GWA
- The second wave found modest associations with SNPs in CDKN2A/CDKN2B, replicated across the studies
  - CDKN2A is a known tumor suppressor and produces p16INK4a, which inhibits CDK4, a regulator of pancreatic beta cell replication
- SNPs from the promoter and first 2 exons of IGF2BP2 were replicated in WTCCC, DGI, and FUSION
  - Combined evidence was strong (P ~ 8.6 x 10^-16); biological plausibility
- Independent genotyping of SLC30A8 (rs13266634) replicated previously reported findings (P = 7.0 x 10^-5 in all UK data)
  - Affymetrix chip does not capture this locus

30 Results - WTCCC GWA
- This study identified several T2D susceptibility loci
- Confirmed previously reported loci including
  - TCF7L2: the largest association signal
  - FTO: the effect disappeared after adjustment for BMI
  - HHEX/IDE: strong replication, biological plausibility
- Three novel loci
  - CDKAL1, IGF2BP2, and CDKN2A: replicated across the 3 studies in this analysis

31 Conclusions - Differences Across Studies
- Study populations
  - Location
  - Family-based vs. unrelated
  - Matching factors
  - Definition of diabetes
- Genotyping platforms
  - Illumina vs. Affymetrix
- Analysis plans
  - Individual tests
  - Haplotype analysis
  - Imputation methods
  - P-value criteria

32 Conclusions - Theoretical Considerations
- Agnostic/statistical vs. prior information/biological plausibility
- Relaxed vs. strict criteria
- Ability to replicate

33 Conclusions - Future Directions
- Non-coding regions may be important
- Many more variants yet to be determined: larger studies needed
- Resequencing and functional studies are necessary to determine causal variants
- Generalizability concerns
- Collaborative model will benefit science!

