Selecting Initial GWAS and replication studies

Presentation transcript:

Selecting Initial GWAS and replication studies
David Hunter
Harvard School of Public Health, Brigham and Women’s Hospital, Broad Institute of MIT and Harvard

Initial Study for GWAS
- Cases and controls well matched with respect to ancestry to minimize population stratification (restriction to one self-identified group)
- Genomic control or other methods, e.g. Eigenstrat (Price et al, 2006), may compensate for looser matching

Control of population stratification, e.g. hair color in Nurses’ Health Study (European ancestry)
Chi-squared inflation factors and Q-Q plots of –log10 p-values with no adjustment for population stratification and adjusting for the top four and fifty eigenvectors (Price et al, 2006). 45, 19 and 19 SNPs (respectively) with p<10^-7 not shown.
Kraft P, unpublished
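The chi-squared inflation factor mentioned on this slide can be illustrated with a minimal sketch. It assumes the usual genomic-control definition (median observed 1-df chi-squared statistic divided by the median expected under the null) and uses simulated statistics, not the Nurses' Health Study data. The function names and the 10% inflation are hypothetical.

```python
import random
import statistics

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(chi2_stats):
    """Genomic-control lambda: median observed statistic / expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

def genomic_control(chi2_stats):
    """Divide every statistic by lambda (lambda floored at 1; an assumed convention)."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]

# Simulated data (hypothetical): squared standard normals are 1-df chi-squared.
rng = random.Random(0)
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]
inflated = [1.1 * x for x in null_stats]  # mimic 10% inflation from stratification

print("lambda, null statistics:     ", round(inflation_factor(null_stats), 3))
print("lambda, inflated statistics: ", round(inflation_factor(inflated), 3))
print("lambda, after genomic control:", round(inflation_factor(genomic_control(inflated)), 3))
```

A lambda near 1 suggests little inflation. Adjusting for eigenvectors, as on the slide, is a different correction that this sketch does not implement.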

The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-678 (7 June 2007). doi:10.1038/nature05911. Received 26 March 2007; accepted 11 May 2007.

Conclusions
- Broad matching on ancestry and region adequate for discovery of strongest hits
- Statistical methods for control of population stratification (within populations of European ancestry) adequate to assist in discovery of strongest hits
- Will more rigorous designs permit discovery of weaker associations?
- When signal-to-noise is low, how does noise due to multiple comparisons compare with noise due to poor matching of controls?
- False negatives the biggest problem (can deal with false positives via replication)

Criteria for follow-up of initial reports of genotype–phenotype associations
- Replication studies should be of sufficient sample size to convincingly distinguish the proposed effect from no effect
- Replication studies should preferably be conducted in independent data sets, to avoid the tendency to split one well-powered study into two less conclusive ones
- The same or a very similar phenotype should be analysed
- A similar population should be studied, and notable differences between the populations studied in the initial and attempted replication studies should be described
- Similar magnitude of effect and significance should be demonstrated, in the same direction, with the same SNP or a SNP in perfect or very high linkage disequilibrium with the prior SNP (r^2 close to 1.0)
- Statistical significance should first be obtained using the genetic model reported in the initial study
- When possible, a joint or combined analysis should lead to a smaller P-value than that seen in the initial report
- A strong rationale should be provided for selecting SNPs to be replicated from the initial study, including linkage-disequilibrium structure, putative functional data or published literature
- Replication reports should include the same level of detail for study design and analysis plan as reported for the initial study
Chanock, Manolio et al. Nature, June 7th 2007
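The linkage-disequilibrium criterion above (r^2 close to 1.0) can be made concrete with a short sketch of the standard r^2 formula for two biallelic SNPs. The function name and all frequencies are hypothetical, chosen only for illustration.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A (SNP 1) and allele B (SNP 2)
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical haplotype and allele frequencies.
print("perfect proxy:", round(ld_r2(0.40, 0.40, 0.40), 3))
print("strong proxy: ", round(ld_r2(0.38, 0.40, 0.39), 3))
print("independent:  ", round(ld_r2(0.16, 0.40, 0.40), 3))
```

Under this definition, a replication SNP is a good stand-in for the original SNP only when r^2 is close to 1.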

Initial Study for GWAS: technical issues
- Standard advice: case and control samples handled exactly the same at every stage
- Source of DNA
  - Blood/buffy coat: mostly good results
  - Buccal cell: variable results (Feigelson et al. CEBP, 2007 - encouraging)
  - Whole genome amplified DNA (Affy OK, Illumina in development)

Replication studies
- For statistical replication, prefer:
  - Similar phenotype
  - Similar ancestry
- For generalizability, prefer:
  - Different populations
  - Different ancestry backgrounds (may also help with fine mapping)

Study design?
- Prospective
  - Protect from survivor bias
  - Protect from selection bias
  - Interpretability of gene-environment analyses
  - Possibility of interpretable biomarkers

Study quality?
- Importance depends on strength of signal
- To date, little apparent relation between probability of replication and quality
- May matter more for weak signals
- Sample size may trump quality (within limits)

NCI BPC3 Results: 7909 cases, 8683 controls
rs1447295: overall p (trend) = 4 x 10^-19
Schumacher et al. Can Res, April 2007

Forest plots of the per-allele odds ratios for each of the five SNPs reaching genome-wide significance for breast cancer: a, rs2981582; b, rs3803662; c, rs889312; d, rs13281615; and e, rs3817198. Figure label: FGFR2.
Easton et al. Nature, May 2007

Cancer Genetic Markers of Susceptibility (CGEMS): http://cgems.cancer.gov

General Strategy for Multistage Analysis of Prostate & Breast Cancer
- Initial GWAS study: 1150 cases/1150 controls, 540,000 tag SNPs
- Follow-up study #1: 4500 cases/4500 controls, ~28,000 SNPs
- Follow-up study #2: 3500 cases/3500 controls, at least 1,500 SNPs
- 30 ± 20 loci: fine mapping
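A rough sketch of why a multistage design is attractive: it counts genotype calls per stage from the numbers on this slide and compares the total with a hypothetical single-stage design that types the full panel on every subject. The single-stage comparison is an assumption for illustration, and the follow-up SNP counts are the slide's approximate figures.

```python
# (label, cases, controls, SNPs) from the slide; follow-up SNP counts are
# approximate ("~28,000", "at least 1,500").
stages = [
    ("Initial GWAS", 1150, 1150, 540_000),
    ("Follow-up #1", 4500, 4500, 28_000),
    ("Follow-up #2", 3500, 3500, 1_500),
]

def genotype_calls(cases, controls, snps):
    """Number of individual genotype calls needed for one stage."""
    return (cases + controls) * snps

staged = sum(genotype_calls(c, k, s) for _, c, k, s in stages)
subjects = sum(c + k for _, c, k, _ in stages)
# Hypothetical alternative: the full 540,000-SNP panel on every subject.
single_stage = subjects * stages[0][3]

for label, c, k, s in stages:
    print(f"{label}: {genotype_calls(c, k, s):,} genotype calls")
print(f"staged total:       {staged:,}")
print(f"single-stage total: {single_stage:,}")
print(f"staged / single-stage: {staged / single_stage:.2f}")
```

The count ignores per-sample and per-array costs, so it indicates scale only.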

Committed Studies, CGEMS
- Breast Cancer: NHS (GWAS), PLCO, WHI, Polish C/C, ACS, EPIC, MEC
- Prostate Cancer: PLCO (GWAS), ACS, HPFS, PHS, ATBC, CeRePP, EPIC, MEC

CGEMS: caBIG Posting
- Pre-Computed Analysis: No Restrictions
- Raw Genotype, Case/control, Age (in 5 yrs), Family Hx (+/-): Registration
http://cgems.cancer.gov/data

Association Tests
- Prostate: 10/06
- Breast: 04/07
- ~528,000 SNPs, Illumina 550k
- Instant Replication!
http://cgems.cancer.gov

Additional in silico replication possibilities
- dbGAP: ncbi.nlm.nih.gov/dbgap
- Framingham: nhlbi.nih.gov/about/framingham
- WTCCC: wtccc.org.uk
- DGI: broad.mit.edu/diabetes

[Figure: genome-wide plot of log10(p-value) by chromosome (1-22 and X, p and q arms); the FGFR2 locus is labeled.]

The six SNPs with the smallest P values of the 528,173 tested among 1,145 cases of postmenopausal invasive breast cancer and 1,141 controls (full results available at http://cgems.cancer.gov ).

SNP ID        χ2*     P*          ORhet*   ORhomo*   Chromosome   Gene
rs10510126    25.37   0.0000031   0.59     0.62      10
rs1219648     23.56   0.0000076   1.24     1.81      10           FGFR2
rs17157903    23.39   0.0000083   1.60     0.79      7            RELN
rs2420946     23.17   0.0000095   1.25     1.81      10           FGFR2
rs7696175     22.40   0.0000137   1.38     0.86      4            TLR1, TLR6
rs12505080    21.99   0.0000168   1.21     0.52      4

*From analyses adjusting for age, matching factors (see Methods), and three eigenvectors of the principal components identified by Eigenstrat. P value obtained by a score test with 2 df.
Hunter et al, Nat Gen, May 2007
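As a worked check on the table, the upper-tail probability of a chi-squared statistic with 2 df has the closed form exp(-x/2), so the reported P values can be recomputed from the reported statistics. The sketch also compares them with a Bonferroni threshold for 528,173 tests. The 0.05 family-wise level is an assumption, not something the slide states, and small differences from the reported values reflect rounding of the statistics.

```python
import math

def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared statistic with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

# (SNP, chi-squared statistic, reported P) copied from the table.
top_snps = [
    ("rs10510126", 25.37, 0.0000031),
    ("rs1219648", 23.56, 0.0000076),
    ("rs17157903", 23.39, 0.0000083),
    ("rs2420946", 23.17, 0.0000095),
    ("rs7696175", 22.40, 0.0000137),
    ("rs12505080", 21.99, 0.0000168),
]

N_TESTS = 528_173
ALPHA = 0.05  # assumed family-wise error rate
bonferroni = ALPHA / N_TESTS

print(f"Bonferroni threshold: {bonferroni:.2e}")
for snp, chi2, reported in top_snps:
    p = chi2_sf_2df(chi2)
    print(f"{snp}: recomputed P = {p:.2e}, reported P = {reported:.1e}, "
          f"below threshold: {p < bonferroni}")
```

None of the six SNPs passes this threshold in the initial scan alone, which is consistent with the deck's emphasis on replication.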

Scatterplot of P values for the FGFR2 locus from the GWAS.

Results of associations of rs1219648 in the Nurses’ Health Study, Nurses’ Health Study 2, and the PLCO study.

Study population (N cases/N controls)    Allele frequency (%)    ORhet (95% CI)      ORhomo (95% CI)     Ptrend
                                         Cases     Controls
Nurses’ Health Study (1,145/1,141)       45.54     38.47         1.24 (1.04-1.50)    1.81 (1.43-2.31)    2.0 x 10^-6
Nurses’ Health Study 2 (302/594)         48.18     40.57         1.29 (0.95-1.75)    1.93 (1.31-2.86)    0.002
PLCO (919/922)                           44.50     41.49         1.06 (0.86-1.30)    1.22 (0.94-1.58)    0.13
ACS CPS-II (555/556)                     44.95     37.41         1.32 (1.02-1.72)    2.06 (1.42-2.97)    0.0002
Pooled estimates (2,921/3,213)                                   1.20 (1.07-1.34)    1.64 (1.42-1.90)    1.1 x 10^-10
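One way to see how per-study estimates combine is a fixed-effect, inverse-variance meta-analysis of the heterozygote odds ratios, with each standard error recovered from the width of the reported 95% CI. This is a sketch under that assumption. The slide does not say how its pooled estimates were obtained, so the result should only be expected to land near the reported 1.20 (1.07-1.34).

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile

# (study, ORhet, lower 95% limit, upper 95% limit) copied from the table.
studies = [
    ("Nurses' Health Study", 1.24, 1.04, 1.50),
    ("Nurses' Health Study 2", 1.29, 0.95, 1.75),
    ("PLCO", 1.06, 0.86, 1.30),
    ("ACS CPS-II", 1.32, 1.02, 1.72),
]

def pool_fixed_effect(rows):
    """Inverse-variance weighted average of log odds ratios, with a 95% CI."""
    num = den = 0.0
    for _, odds_ratio, lower, upper in rows:
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)  # SE from CI width
        weight = 1.0 / se ** 2
        num += weight * math.log(odds_ratio)
        den += weight
    est = num / den
    se_pooled = math.sqrt(1.0 / den)
    return (math.exp(est),
            math.exp(est - Z95 * se_pooled),
            math.exp(est + Z95 * se_pooled))

pooled_or, ci_lo, ci_hi = pool_fixed_effect(studies)
print(f"pooled ORhet = {pooled_or:.2f} ({ci_lo:.2f}-{ci_hi:.2f})")
```

The pooled interval is narrower than any single study's interval, which is the sense in which a joint analysis can give a smaller P value than the initial report.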

UNFINISHED AGENDA
- Where is the causal variant?
- What does this tell us about mechanisms of breast carcinogenesis?

THE HITS KEEP COMING…
UNFINISHED EPIDEMIOLOGIC/PUBLIC HEALTH AGENDA
- Gene-environment interaction: what do the genes tell us about environmental exposures?
- Gene-gene interaction
- Pathway analysis
- Clinical implications: risk stratification for screening? Intervention?
- Health policy implications?
- Much of the substrate data is publicly available or relatively cheap

NHS/HPFS/PHS GENETIC STUDIES
Immaculata De Vivo, Peter Kraft, Hardeep Ranu, Crystal Arnone, Carolyn Guo, Pati Soule, Craig Labadie, Jiali Han, Monica Macgrath, Chunyan He, Patrick Dennett, David Cox, Tim Niu, Aditi Hazra, Fred Schumacher
NHS/HPFS: Sue Hankinson, Shelley Tworoger, Eric Rimm, Frank Hu, Meir Stampfer, Walt Willett, Frank Speizer, Charles Fuchs, Ed Giovannucci, Andy Chan, Debra Schaumberg, Fran Grodstein, Jae Hee Kang
PHS: Jing Ma, Mike Gaziano, P Ridker

NCI BPC3
Harvard cohorts, EPIC cohorts, ACS cohort, Multiethnic Cohort, PLCO cohort, ATBC cohort, BROAD INSTITUTE, NCI Core Gen Facility, CEPH
STEERING COMMITTEE:
- Harvard: David Hunter, Michael Gaziano, Julie Buring, Graham Colditz, Walter Willett
- EPIC, CEPH, Cambridge: Elio Riboli, Rudolf Kaaks, Federico Canzian, Gilles Thomas
- ACS: Michael Thun, Heather Feigelson, Jeanne Calle
- NCI: Richard Hayes, Demetrius Albanes, Bob Hoover, Stephen Chanock; Program: Mukesh Verma
- MEC & Broad: Brian Henderson, Laurence Kolonel, David Altshuler, Malcolm Pike
SECRETARIAT: David Hunter, Elio Riboli
GENOMICS subgroup: David Altshuler (Chair), Steve Chanock, Gilles Thomas
Genotyping subgroup: Chris Haiman (Chair), Federico Canzian, Alison Dunning, Steve Chanock, David Cox, David Hunter, Loic LeMarchand, James Mackay
STATISTICS subgroup: Dan Stram (Chair), Peter Kraft, Rudolf Kaaks, Paul Pharoah, Malcolm Pike, Gilles Thomas, Sholom Wacholder
PUBLICATIONS COMMITTEE: Michael Thun (Chair), Elio Riboli, Brian Henderson, David Hunter, Graham Colditz, Richard Hayes, Demetrius Albanes

CGEMS Acknowledgements
- HSPH: David Hunter, Peter Kraft, Fred Schumacher, David Cox
- ACS: Heather Feigelson, Carmen Rodriguez, Eugenia Calle, Michael Thun
- PLCO: Regina Ziegler, Chris Berg, Saundra Buys, Chris MacCarty
- NCI: Stephen Chanock, Gilles Thomas, Robert Hoover, Joseph Fraumeni, Daniela Gerhard, Kevin Jacobs, Zhaoming Wang, Meredith Yeager, Robert Welch, Richard Hayes, Sholom Wacholder, Nilanjan Chatterjee, Kai Yu, Margaret Tucker, Marianne Rivera-Silva
- NCICB

Selecting initial and replication samples from existing studies
I. What studies of the same phenotype exist?
II. Can a consortium or collaborative approach provide a study with adequate power for the initial GWAS, along with pre-planned replication studies?
III. Do any of these studies have pre-existing data that would increase power, e.g. “free” controls from a prior GWAS of another phenotype?
IV. Is the phenotype defined in the same or similar manner?
V. Are covariate data available, and defined similarly?
VI. Do any of the studies have additional phenotypic information, e.g. biomarkers, that would create opportunities for “added value” analyses, if these are the subjects of the GWAS?
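The point about power and “free” controls can be sketched with a normal-approximation power calculation for a two-sided allelic test. The allele frequencies, sample sizes, number of extra controls, and the 10^-7 significance level used here are all hypothetical illustration values, and the approximation is deliberately simple.

```python
from math import sqrt
from statistics import NormalDist

def power_allele_test(p_case, p_ctrl, n_cases, n_ctrls, alpha=1e-7):
    """Approximate power of a two-sided test comparing allele frequencies.

    Each subject contributes two alleles, so the binomial sample sizes are 2n.
    Uses the unpooled variance and a normal approximation.
    """
    nd = NormalDist()
    se = sqrt(p_case * (1 - p_case) / (2 * n_cases)
              + p_ctrl * (1 - p_ctrl) / (2 * n_ctrls))
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_ctrl) / se
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Hypothetical scenario: 1,150 cases, allele frequency 0.45 in cases vs 0.40 in controls.
base = power_allele_test(0.45, 0.40, 1150, 1150)
more = power_allele_test(0.45, 0.40, 1150, 1150 + 3000)  # add 3,000 "free" controls

print(f"power with 1,150 controls: {base:.3f}")
print(f"power with 4,150 controls: {more:.3f}")
```

Extra controls help, but with the number of cases fixed the gain levels off, which is one reason item II asks whether a consortium can supply more cases as well.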