Selecting Initial GWAS and replication studies


1 Selecting Initial GWAS and replication studies
David Hunter
Harvard School of Public Health
Brigham and Women’s Hospital
Broad Institute of MIT and Harvard

2 Initial Study for GWAS
Cases and controls well matched with respect to ancestry to minimize population stratification (restriction to one self-identified group)
Genomic control or other methods, e.g. Eigenstrat (Price et al., 2006), may compensate for looser matching

3 Control of population stratification
Example: hair color in the Nurses’ Health Study (European ancestry)
Chi-squared inflation factors and Q-Q plots of -log10 p-values with no adjustment for population stratification, and after adjusting for the top four and the top fifty eigenvectors (Price et al., 2006)
45, 19 and 19 SNPs (respectively) with p < 10^-7 not shown
Kraft P, unpublished
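The two diagnostics named on this slide can be sketched in a few lines. This is a minimal illustration, not the analysis behind the slide: it assumes 1-df chi-squared association statistics, uses the standard median-based definition of the inflation factor, and the function names (inflation_factor, gc_correct, qq_points) are hypothetical.

```python
import math
from statistics import median

# Median of a chi-squared distribution with 1 degree of freedom (known constant).
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median statistic over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct(chi2_stats):
    """Divide every statistic by lambda (lambda is not allowed to drop below 1)."""
    lam = max(inflation_factor(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats]


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p, both sorted ascending."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected quantiles of a uniform(0, 1) null, using the (rank - 0.5) / n convention.
    expected = sorted(-math.log10((rank - 0.5) / n) for rank in range(1, n + 1))
    return list(zip(expected, observed))
```

An inflation factor close to 1 and Q-Q points lying on the diagonal (apart from the extreme tail) suggest little residual stratification; plotting qq_points before and after eigenvector adjustment reproduces the kind of comparison described above.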

4 Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
The Wellcome Trust Case Control Consortium
Article, Nature 447, (7 June 2007) | doi: /nature05911; Received 26 March 2007; Accepted 11 May 2007

5 Conclusions
Broad matching on ancestry and region is adequate for discovery of the strongest hits
Statistical methods for control of population stratification (within populations of European ancestry) are adequate to assist in discovery of the strongest hits
Will more rigorous designs permit discovery of weaker associations?
When the signal-to-noise ratio is low, how does noise due to multiple comparisons compare with noise due to poor matching of controls?
False negatives are the biggest problem (false positives can be dealt with via replication)

6 Criteria for follow-up of initial reports of genotype–phenotype associations
Replication studies should be of sufficient sample size to convincingly distinguish the proposed effect from no effect
Replication studies should preferably be conducted in independent data sets, to avoid the tendency to split one well-powered study into two less conclusive ones
The same or a very similar phenotype should be analysed
A similar population should be studied, and notable differences between the populations studied in the initial and attempted replication studies should be described
Similar magnitude of effect and significance should be demonstrated, in the same direction, with the same SNP or a SNP in perfect or very high linkage disequilibrium with the prior SNP (r2 close to 1.0)
Statistical significance should first be obtained using the genetic model reported in the initial study
When possible, a joint or combined analysis should lead to a smaller P-value than that seen in the initial report
A strong rationale should be provided for selecting SNPs to be replicated from the initial study, including linkage-disequilibrium structure, putative functional data or published literature
Replication reports should include the same level of detail for study design and analysis plan as reported for the initial study
Chanock, Manolio et al. Nature, June 7th 2007
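The first criterion (sufficient sample size to distinguish the proposed effect from no effect) can be explored with a rough power calculation. The sketch below is an approximation under stated assumptions, not a method from the cited paper: it treats the two alleles of each person as independent draws, uses a two-sided normal approximation for the difference in allele frequencies, and the function name allelic_power is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided allelic test for a per-allele odds ratio."""
    # Allele frequency in cases implied by the per-allele odds ratio.
    case_odds = odds_ratio * control_freq / (1.0 - control_freq)
    case_freq = case_odds / (1.0 + case_odds)

    # Each person contributes two alleles.
    se = math.sqrt(
        case_freq * (1.0 - case_freq) / (2 * n_cases)
        + control_freq * (1.0 - control_freq) / (2 * n_controls)
    )
    z_effect = abs(case_freq - control_freq) / se
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    normal = NormalDist()
    # Probability that the test statistic falls in either rejection tail.
    return normal.cdf(z_effect - z_alpha) + normal.cdf(-z_effect - z_alpha)
```

For instance, allelic_power(4500, 4500, 0.3, 1.2, 1e-4) estimates the chance that a follow-up of that size detects a per-allele odds ratio of 1.2 at a control allele frequency of 0.3; all of those input values are illustrative, not taken from the slides.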

7 Initial Study for GWAS: technical issues
Standard advice: case and control samples handled exactly the same at every stage
Source of DNA:
Blood/buffy coat: mostly good results
Buccal cell: variable results (Feigelson et al. CEBP, encouraging)
Whole genome amplified DNA (Affy OK, Illumina in development)

8 Replication studies
For statistical replication, prefer:
Similar phenotype
Similar ancestry
For generalizability, prefer:
Different populations
Different ancestry backgrounds (may also help with fine mapping)

9 Study design?
Prospective:
Protect from survivor bias
Protect from selection bias
Interpretability of gene-environment analyses
Possibility of interpretable biomarkers

10 Study quality?
Importance depends on strength of signal
To date: little apparent relation between probability of replication and quality
May matter more for weak signals
Sample size may trump quality (within limits)

11 NCI BPC3 Results: 7909 cases, 8683 controls
Rs : Overall p, trend 4 x 10^-19
Schumacher et al. Can Res, April 2007

12 Forest plots of the per-allele odds ratios for each of the five SNPs reaching genome-wide significance for breast cancer
Panels: a, rs2981582; b, rs3803662; c, rs889312; d, rs13281615; and e, rs3817198
Locus labelled on the figure: FGFR2
Easton et al. Nature, May 2007
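A forest plot of per-allele odds ratios is usually summarized by a pooled estimate. The sketch below shows one common way to compute it, a fixed-effect inverse-variance combination on the log odds-ratio scale; this is a generic technique assumed for illustration, not necessarily the one used by Easton et al., and the function name pooled_odds_ratio is hypothetical.

```python
import math
from statistics import NormalDist


def pooled_odds_ratio(estimates):
    """Fixed-effect pooling of (odds_ratio, ci_low, ci_high) tuples with 95% CIs.

    Returns (pooled_or, ci_low, ci_high, two_sided_p).
    """
    z95 = NormalDist().inv_cdf(0.975)
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_low, ci_high in estimates:
        # Recover the standard error of log(OR) from the width of the 95% CI.
        se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight

    log_or = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    p_value = 2.0 * NormalDist().cdf(-abs(log_or / se_pooled))
    return (
        math.exp(log_or),
        math.exp(log_or - z95 * se_pooled),
        math.exp(log_or + z95 * se_pooled),
        p_value,
    )
```

Each input tuple corresponds to one row of the forest plot. Because weights are inverse variances, larger studies pull the pooled estimate toward their own value and the pooled confidence interval is narrower than any single study's.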

13 Cancer Genetic Markers of Susceptibility (CGEMS):

14 General Strategy for Multistage analysis of Prostate & Breast Cancer
Initial GWAS Study: 1150 cases/1150 controls, 540,000 Tag SNPs
Follow-up Study #1: 4500 cases/4500 controls, ~28,000 SNPs
Follow-up Study #2: 3500 cases/3500 controls, at least 1,500 SNPs
30 ±20 loci
Fine Mapping
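The arithmetic implied by this staged design can be made explicit. The sketch below only restates the slide's own figures (treating "~28,000" and "at least 1,500" as exact for illustration); the variable and function names are hypothetical.

```python
# (stage name, cases, controls, SNPs genotyped), as listed on the slide.
STAGES = [
    ("Initial GWAS", 1150, 1150, 540_000),
    ("Follow-up #1", 4500, 4500, 28_000),
    ("Follow-up #2", 3500, 3500, 1_500),
]


def carry_forward_fractions(stages):
    """Fraction of SNPs from each stage that is genotyped in the next one."""
    return [nxt[3] / cur[3] for cur, nxt in zip(stages, stages[1:])]


def cumulative_subjects(stages):
    """Running totals of (cases, controls) across stages."""
    totals = []
    cases = controls = 0
    for _, n_cases, n_controls, _ in stages:
        cases += n_cases
        controls += n_controls
        totals.append((cases, controls))
    return totals


for (name, *_), frac in zip(STAGES, carry_forward_fractions(STAGES)):
    print(f"{name}: {frac:.1%} of SNPs carried forward")
```

Roughly 5% of SNPs move forward at each step, so if SNPs were selected purely by p-value rank, each stage would behave like a filter at about that significance level while the number of subjects genotyped per SNP grows.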

15 Committed Studies CGEMS
Breast Cancer: NHS (GWAS), PLCO, WHI, Polish C/C, ACS, EPIC, MEC
Prostate Cancer: PLCO (GWAS), ACS, HPFS, PHS, ATBC, CeRePP, EPIC, MEC

16 CGEMS: caBIG Posting
Pre-Computed Analysis: No Restrictions
Raw Genotype, Case/control, Age (in 5 yrs), Family Hx (+/-): Registration

17 Association Tests
Prostate: 10/06
Breast: 04/07
~528,000 SNPs, Illumina 550k
Instant Replication!

18 Additional in silico replication possibilities
dbGAP: ncbi.nlm.nih.gov/dbgap
Framingham: nhlbi.nih.gov/about/framingham
WTCCC: wtccc.org.uk
DGI: broad.mit.edu/diabetes

19 [Figure: genome-wide plot of Log10(p-value) by chromosome (1 to 22 and X, p and q arms), with the FGFR2 locus labelled]

20 The six SNPs with the smallest P values of the 528,173 tested among 1,145 cases of postmenopausal invasive breast cancer and 1,141 controls (full results available at ).
Table columns: SNP ID, Χ2*, P*, ORhet*, ORhomo*, Chromosome, Gene
Genes listed against four of the six SNPs: FGFR2, RELN, FGFR2, and TLR1,TLR6
*From analyses adjusting for age, matching factors (see Methods), and three eigenvectors of the principal components identified by Eigenstrat. P value obtained by a score test with 2df.
Hunter et al, Nat Gen, May 2007
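To make the "2df" test and the scale of 528,173 comparisons concrete, here is a simplified sketch. It swaps in an unadjusted Pearson chi-squared test on a 2 x 3 genotype table rather than the covariate-adjusted score test described in the footnote, and it uses the fact that the upper tail of a chi-squared distribution with 2 degrees of freedom is exactly exp(-x/2). The function names are hypothetical.

```python
import math


def genotype_chi2_2df(case_counts, control_counts):
    """Pearson chi-squared on a 2 x 3 table of genotype counts.

    Each argument is (n_common_homozygote, n_heterozygote, n_rare_homozygote).
    Returns (chi2, p_value) with 2 degrees of freedom.
    """
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    total = n_cases + n_controls
    chi2 = 0.0
    for case_n, control_n in zip(case_counts, control_counts):
        column_total = case_n + control_n
        if column_total == 0:
            continue  # an empty genotype class contributes nothing
        for observed, row_total in ((case_n, n_cases), (control_n, n_controls)):
            expected = row_total * column_total / total
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of chi-squared with 2 df has the closed form exp(-x / 2).
    return chi2, math.exp(-chi2 / 2.0)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests
```

With the slide's 528,173 SNPs, bonferroni_threshold(0.05, 528_173) is a little under 1e-7, which indicates how small a P value must be before a single-study hit survives a strict multiple-comparison correction.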

21 Scatterplot of P values for the FGFR2 locus from the GWAS.

22 Results of associations of rs in the Nurses’ Health Study, Nurses’ Health Study 2, and the PLCO study.
Table columns: Study Population (N cases/N controls), Allele Frequency in Cases (%) and Controls (%), ORhet (95% CI), ORhomo (95% CI), Ptrend
Nurses’ Health Study (1,145/1,141): Ptrend x 10^-6
Nurses’ Health Study 2 (302/594)
PLCO (919/922)
ACS CPS-II (555/556)
Pooled estimates (2,921/3,213): Ptrend x 10^-10

23 UNFINISHED AGENDA
Where is the causal variant?
What does this tell us about mechanisms of breast carcinogenesis?

24 THE HITS KEEP COMING….
UNFINISHED EPIDEMIOLOGIC/PUBLIC HEALTH AGENDA
Gene-environment interaction: what do the genes tell us about environmental exposures?
Gene-gene interaction
Pathway analysis
Clinical implications: risk stratification for screening? Intervention?
Health policy implications?
Much of the substrate data is publicly available or relatively cheap.

25 NHS/HPFS/PHS GENETIC STUDIES
Immaculata De Vivo, Peter Kraft, Hardeep Ranu, Crystal Arnone, Carolyn Guo, Pati Soule, Craig Labadie, Jiali Han, Monica Macgrath, Chunyan He, Patrick Dennett, David Cox, Tim Niu, Aditi Hazra, Jing Ma, Fred Schumacher
NHS/HPFS: Sue Hankinson, Shelley Tworoger, Eric Rimm, Frank Hu, Meir Stampfer, Walt Willett, Frank Speizer, Charles Fuchs, Ed Giovannucci, Andy Chan, Debra Schaumberg, Fran Grodstein, Jae Hee Kang
PHS: Mike Gaziano, P Ridker

26 NCI BPC3
Harvard cohorts, EPIC cohorts, ACS cohort, Multiethnic Cohort, PLCO cohort, ATBC cohort, BROAD INSTITUTE, NCI Core Gen Facility, CEPH
NCI BPC3 STEERING COMMITTEE:
Harvard: David Hunter, Michael Gaziano, Julie Buring, Graham Colditz, Walter Willett
EPIC, CEPH, Cambridge: Elio Riboli, Rudolf Kaaks, Federico Canzian, Gilles Thomas
ACS: Michael Thun, Heather Feigelson, Jeanne Calle
NCI: Richard Hayes, Demetrius Albanes, Bob Hoover, Stephen Chanock; Program - Mukesh Verma
MEC & Broad: Brian Henderson, Laurence Kolonel, David Altshuler, Malcolm Pike
SECRETARIAT: David Hunter, Elio Riboli
GENOMICS subgroup: David Altshuler (Chair), Steve Chanock, Gilles Thomas
Genotyping subgroup: Chris Haiman (Chair), Federico Canzian, Alison Dunning, Steve Chanock, David Cox, David Hunter, Loic LeMarchand, James Mackay
STATISTICS subgroup: Dan Stram (Chair), Peter Kraft, Rudolf Kaaks, Paul Pharoah, Malcolm Pike, Gilles Thomas, Shalom Wacholder
PUBLICATIONS COMMITTEE: Michael Thun (Chair), Elio Riboli, Brian Henderson, David Hunter, Graham Colditz, Richard Hayes, Demetrius Albanes

27 CGEMS Acknowledgements
HSPH: David Hunter, Peter Kraft, Fred Schumacher, David Cox
ACS: Heather Feigelson, Carmen Rodriguez, Eugenia Calle, Michael Thun
PLCO: Regina Ziegler, Chris Berg, Saundra Buys, Chris MacCarty
NCI: Stephen Chanock, Gilles Thomas, Robert Hoover, Joseph Fraumeni, Daniela Gerhard, Kevin Jacobs, Zhaoming Wang, Meredith Yeager, Robert Welch, Richard Hayes, Sholom Wacholder, Nilanjan Chatterjee, Kai Yu, Margaret Tucker, Marianne Rivera-Silva
NCICB

29 Selecting initial and replication samples from existing studies
I. What studies of the same phenotype exist?
II. Can a consortium or collaborative approach provide a study with adequate power for the initial GWAS, along with pre-planned replication studies?
III. Do any of these studies have pre-existing data that would increase power, e.g. “free” controls from a prior GWAS of another phenotype?
IV. Is the phenotype defined in the same or similar manner?
V. Are covariate data available, and defined similarly?
VI. Do any of the studies have additional phenotypic information, e.g. biomarkers, that would create opportunities for “added value” analyses, if these are the subjects of the GWAS?

