WHOLE GENOME SEQUENCING FOR COLORECTAL CANCER Ulrike (Riki) Peters Fred Hutchinson Cancer Research Center University of Washington.

1 WHOLE GENOME SEQUENCING FOR COLORECTAL CANCER Ulrike (Riki) Peters Fred Hutchinson Cancer Research Center University of Washington

2 Overview
- Significance and rationale
- Current efforts on rare and less frequent variants
- Specific aims and design of whole genome sequencing grant

3 Progress of Genomic Research (adapted from Green and Guyer Nature 2011)
- Domains: structure of genomes; biology of genomes; biology of diseases; advancing medicine; improving healthcare & prevention
- Periods: 1990-2003 (Human Genome Project); 2004-2010; 2011-2020; beyond 2020

4 Examples of GWAS for Drug Targets
Drug; drug target; drug indication; GWAS trait
- Statins; HMGCR; hypercholesterolemia; LDL, cholesterol
- Znt8 agonists; SLC30A8; type 2 diabetes
- Ustekinumab; IL12B; psoriasis, Crohn's disease; psoriasis, Crohn's
Examples of GWAS for Drug Repositioning
Drug; drug target; current drug indication; GWAS trait
- Nepicastat; DBH; post-traumatic stress disorder; smoking cessation
- Denosumab/AMG-162; TNFSF11; osteoporosis/bone cancer; Crohn's disease
- Biib-003; LINGO-1; multiple sclerosis; essential tremor
For additional examples, see Sanseau et al. Nat Biotechnol 2012

5 Use of GWAS Findings to Inform Screening Decisions (using breast cancer as example)
So et al. Am J Hum Genet 2010
- Colors show 10-year risk of breast cancer at different risk percentiles based on 13 GWAS loci
- Average 10-year risk of breast cancer for a 50-year-old woman is 2.4%

6 What is Known About the Genetic Contribution of Colorectal Cancer
Scandinavian Twin Registry, Lichtenstein et al. New Engl J Med 2000
Cancer site   Heritable factors   Shared environmental factors   Non-shared environmental factors
Prostate      0.42 (0.29-0.50)    0 (0-0.09)                     0.58 (0.50-0.67)
Colorectal    0.35 (0.10-0.48)    0.05 (0-0.23)                  0.60 (0.52-0.70)
Bladder       0.31 (0.00-0.45)    0 (0-0.28)                     0.69 (0.53-0.86)
Breast        0.27 (0.04-0.41)    0.06 (0-0.22)                  0.67 (0.56-0.76)
Lung          0.26 (0.00-0.49)    0.12 (0-0.34)                  0.62 (0.51-0.73)
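
The three columns of the twin-study table are proportions of variance in cancer liability, so each row sums to 1. A short worked check follows; the symbols h^2, c^2 and e^2 are notation assumed here for the heritable, shared environmental and non-shared environmental proportions and do not appear on the slide.

```latex
\[
  h^2 + c^2 + e^2 = 1
\]
% Colorectal cancer row, using the point estimates from the table:
\[
  0.35 + 0.05 + 0.60 = 1.00
\]
```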

7 Colorectal Cancer GWAS
- 21 GWAS loci
- Each SNP associated with a modest increase in risk
Published and Newly Discovered Colorectal Cancer Susceptibility Loci (figure legend: Identified within GECCO)
Houlston Nat Genet 2010; Tomlinson Nat Genet 2008; Zanke Nat Genet 2007; Haiman Nat Genet 2007; Hutter BMC Cancer 2010; Tenesa Nat Genet 2008; Tomlinson Nat Genet 2011; COGENT Nat Genet 2008; Jaeger Nat Genet 2008; Broderick Nat Genet 2007; Peters, Hunter Hum Genet 2011; Dunlop Nat Genet 2012; Peters Gastroenterol (submitted)

8 Estimated Total Number of GWAS Hits
Park et al. Nat Genet 2010
Phenotype                                Estimated number of GWAS hits (95% CI)   Total genetic variance explained (95% CI)
Height                                   201 (75, 494)                            16.4 (10.6, 30.6)
Crohn's disease                          142 (71, 244)                            20.0 (15.7, 28.0)
Breast, prostate and colorectal cancer   67 (31, 173)                             17.1 (11.6, 35.8)
=> Known familial syndromes, such as FAP and Lynch syndrome, explain less than 3-5%

10 What Explains Missing Heritability of Cancer?
- Additional familial syndromes
- Heritable epigenomic variability
- Gene-gene and gene-environment interaction
- Less frequent and rare variants
- Structural variations/copy number variation (CNV)
- Others, or heritability may be overestimated

12 Most Genetic Variation is Rare
- GWAS only investigated ~15% of genetic variation
- Next-generation sequencing can identify rare variants
[Figure: variants by minor allele frequency, with labels "all" and "rare variants"; green = ESP, orange = ENCODE, blue = HapMap]

13 Feasibility to Identify Genetic Variants by Risk Allele Frequency and Strength of Genetic Effect Manolio et al. Nature 2009

15 Overview
- Significance and rationale
- Current efforts on rare and less frequent variants
- Specific aims and design of whole genome sequencing grant

16 Current Efforts in GECCO to Search for Less Frequent and Rare Variants
GECCO (Genetics and Epidemiology of Colorectal Cancer Consortium): the global view of genetic contribution to colorectal cancer
- Studies: WHI, ARCTIC, VITAL, DACHS, PLCO, CPS, ASTERISK, DALS, Colo 2&3, MEC, PHS, HPFS, NHS, CCFR, MECC, NGCC, HRT-CCFR
- GECCO Coordinating Center at FHCRC; ~30,000 subjects; U01 and X01, Peters, 2009-2013
- Imputation to 1000 Genomes Project in ~28,000 samples with GWAS
- Exome chip genotyping on about 25,000 samples, CIDR
- Pilot whole exome sequencing on 130 high-risk colorectal cancer cases + 30 controls

17 NHLBI Exome Sequencing Project
Whole exome sequencing of 7,000 European and African Americans to identify rare variants associated with common complex diseases
- Sequencing centers: Broad, University of Washington
- Cohorts: Women's Health Initiative; HeartGo (ARIC, CARDIA, CHS, FHS, JHS, MESA); LungGo
- Phenotypes: early-onset MI; early-onset/FH+ stroke; extreme BMI/T2D; extreme lipids; extreme blood pressure; COPD; pulmonary hypertension; cystic fibrosis

18 Whole Exome vs Whole Genome
- Exome covers only 1-2% of genome
- 88% of all GWAS findings are outside of the well-studied protein-coding regions
- 78% of GWAS findings with MAF<5%

19 Junk No More: ENCODE Project Finds "Biochemical Functions for 80% of the Genome"
The ENCODE Project Consortium, "An integrated encyclopedia of DNA elements in the human genome," Nature 2012

20 Overview
- Significance and rationale
- Current efforts in GECCO on rare and less frequent variants
- Specific aims and design of whole genome sequencing grant

21 Aims of the U01 Sequencing Grant
Aim 1. To identify novel CRC susceptibility variants across the genome, mainly variants with allele frequency 0.1-5%
- Rare variants: <1%
- Less frequent variants: 1-5%
- Common variants: >5%
Aim 2. To investigate whether known environmental risk factors for CRC modify genetic susceptibility to CRC (gene-environment interactions)
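
The frequency classes in Aim 1 can be written as a small helper. This is a minimal sketch, not part of the grant: the function name `classify_maf` is hypothetical, and because the slide does not say which class a variant at exactly 1% or 5% belongs to, both boundaries are assigned to "less frequent" here as an assumption.

```python
def classify_maf(maf: float) -> str:
    """Bin a minor allele frequency (a proportion between 0 and 0.5)
    into the three classes named on the slide."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    if maf < 0.01:
        return "rare"            # slide: rare variants <1%
    if maf <= 0.05:
        return "less frequent"   # slide: 1-5% (boundary handling is assumed)
    return "common"              # slide: common variants >5%


if __name__ == "__main__":
    for maf in (0.002, 0.03, 0.20):
        print(maf, classify_maf(maf))
```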

22 Study Design Overview R01; PI: Peters

23 Funding Information
- 17% budget cut
- 4 years instead of 5 years
- U01 designation
- Expected start date: before 9/31/12
- Total budget cut: 33%

24 Study Design
- Aim 1.1: Whole genome sequencing; N=1,600 cases, 1,600 controls
- Aim 1.2: Imputation of WGS data; N=9,129 cases, 11,728 controls
- Aim 1.2: Association testing of individual & aggregated variants; N=10,729 cases, 13,328 controls; ~18M variants
- Aim 1.3: Replication; N=3,100 cases, 3,100 controls; ~3,000 variants
- Aim 2: Gene-environment interaction analyses; 2-stage screening, weighted hypothesis, empirical Bayes
Total sample size is 13,829 cases and 16,428 controls
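
The sample sizes on this slide add up as follows: sequenced plus imputed samples give the association-testing set, and adding the replication samples gives the stated total. The sketch below only rechecks that arithmetic; the variable and function names are illustrative and not taken from the grant.

```python
# Counts as stated on the slide.
wgs = {"cases": 1_600, "controls": 1_600}
imputed = {"cases": 9_129, "controls": 11_728}
replication = {"cases": 3_100, "controls": 3_100}


def add_counts(a: dict, b: dict) -> dict:
    """Add case and control counts of two sample sets."""
    return {key: a[key] + b[key] for key in a}


# Sequenced + imputed samples form the association-testing set.
discovery = add_counts(wgs, imputed)
# Adding the replication samples gives the total sample size.
total = add_counts(discovery, replication)

print(discovery)  # {'cases': 10729, 'controls': 13328}
print(total)      # {'cases': 13829, 'controls': 16428}
```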

25 Classes of Genetic Variants Being Examined
Variant type                            Definition in this proposal                    Expected #
Single nucleotide variant (SNV)         Single base pair change with MAF >0.1% & <5%   ~13-15M
Single nucleotide polymorphism (SNP)    Single base pair change with MAF >5%           ~5M
Insertion/deletion (indel)              Insertion/deletion or inversion <50 bp         ~1.5-2M
Copy number variant (CNV)               Insertion/deletion or inversion >5 kb          ~20K

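26 Studies
Study: cases, controls; GWAS #SNPs
Studies with GWAS (sequencing and imputation)
- ARCTIC: 850 cases, 800 controls; 100K, 500K
- DACHS: 2,900 cases, 2,400 controls; 300K
- DALS: 1,100 cases, 1,200 controls; 300K, 550K, 610K
- HPFS: 850; 730K
- MEC: 400; 300K
- NHS: 500 cases, 900 controls; 730K
- PHS: 400; 730K
- PLCO: 1,200 cases, 1,800 controls; 300K, 610K, 500K
- ASTERISK: 1,000; 300K
- VITAL: 300; 300K
- WHI: 1,300 cases, 2,200 controls; 300K, 550K
Studies with no GWAS (replication)
- North German CCS: 4,000; N/A
- CPS-II: 1,000; N/A
- MECC: 3,400 cases, 3,000 controls; N/A
Non-whites: 400 cases, 650 controls
Total: 20,000 cases, 21,000 controls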

27 Data Harmonization of Environmental Risk Factors
- Collecting 74 variables in 11 categories
- Multi-step collaborative process leading to common data elements with standardized definitions, permissible values and coding
- Meta-analysis across 15 studies

28 Sequencing and Genotyping
At Genome Sciences, University of Washington
- Whole genome sequencing: at lower depth; Illumina HiSeq; in years 1 to 3; total ~1,600 cases and 1,600 controls (Year 1: ~600; Year 2: ~1,000; Year 3: ~1,700)
- Replication genotyping: in years 3 and 4; 6,200 samples for 3,000 SNPs; 2,400 samples for 384 SNPs

29 Variant Calling Based on Sequencing Data
- Variant calling: depends on depth of sequencing; multi-sample calling improves accuracy, so we will call in batches with an increasing number of samples
- Structural variation/copy number variant (CNV) calling: indel and CNV calling is error-prone and requires genotyping follow-up
- Follow-up genotyping on 384 SNPs in 1,600 samples

30 Imputation of Sequencing Data into GWAS
- Imputation: use whole genome sequencing data as reference panel to impute into samples with only GWAS data
- Important points raised:
  - Imputation accuracy improves with increasing sample size of the reference panel (samples with whole genome sequencing data)
  - Imputation accuracy improves with denser GWAS platforms
- Follow-up genotyping on 384 SNPs in 800 samples
- Whole genome sequence: 3,200 samples, ~18M variants; GWAS: 19,000 samples
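
The slide does not say how imputation accuracy would be measured against the follow-up genotypes. One common summary, assumed here purely for illustration, is the squared Pearson correlation between imputed allele dosages (values from 0 to 2) and directly genotyped allele counts for the same samples at one variant. The function name `dosage_r2` and the toy values are hypothetical.

```python
def dosage_r2(imputed, genotyped):
    """Squared Pearson correlation between imputed dosages and
    genotyped allele counts at a single variant."""
    n = len(imputed)
    if n != len(genotyped) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(imputed) / n
    mean_y = sum(genotyped) / n
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in genotyped)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, genotyped))
    if sxx == 0 or syy == 0:
        # No variation in one of the inputs: correlation is undefined.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Toy data: four samples, imputed dosages vs. genotyped counts.
    print(dosage_r2([0.1, 0.9, 1.8, 0.2], [0, 1, 2, 0]))
```

Values near 1 indicate that the imputed dosages track the genotyped calls closely; in practice this would be computed per variant across the follow-up samples.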

31 Statistical Analysis
- Marginal and burden testing
  - Single variant test
  - Aggregated tests to test all rare variants across a defined region, such as a gene
  - Motivation: Mendelian diseases show that multiple different mutations can lead to disease; rare variants tested individually have limited power to show association (unless highly penetrant)
- Gene-environment interaction testing
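
To illustrate the idea of an aggregated test, the sketch below collapses all rare variants in a region into a single carrier indicator per person and compares carrier frequency between cases and controls with a two-sided Fisher's exact test. This simple collapsing approach is an assumption chosen for illustration; the slide does not name the specific burden test to be used, and all function names and toy genotypes here are hypothetical.

```python
from math import comb


def count_carriers(genotypes):
    """genotypes: one list per person of rare-allele counts (0/1/2) for each
    variant in the region. A carrier has at least one rare allele."""
    return sum(1 for person in genotypes if any(g > 0 for g in person))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sum the probabilities of all tables no more likely than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def collapsing_burden_test(case_genotypes, control_genotypes):
    """P-value for a difference in rare-variant carrier frequency."""
    a = count_carriers(case_genotypes)
    c = count_carriers(control_genotypes)
    return fisher_exact_two_sided(a, len(case_genotypes) - a,
                                  c, len(control_genotypes) - c)


if __name__ == "__main__":
    # Toy region with two rare variants: every case carries one, no control does.
    cases = [[1, 0], [0, 1], [1, 0], [0, 1]]
    controls = [[0, 0], [0, 0], [0, 0], [0, 0]]
    print(collapsing_burden_test(cases, controls))
```

No single variant in the toy data is carried by more than two people, yet the collapsed carrier status separates cases from controls, which is the motivation given on the slide for aggregating rare variants.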

32 Advisory Committee
- NCI: Stephen Chanock, Daniela Seminara, Peggy Tucker
- Suggestions for external investigators: Mike Boehnke (U of Michigan); Elaine Mardis (Washington U in St. Louis); Nicole Soranzo (Wellcome Trust / Sanger Inst); Stephen Thibodeau (Mayo Clinic, Rochester)

33 Timeline
Activities (columns: Yr 1, Yr 2, Yr 3, Yr 4, Yr 5)
- Sample preparation and QA/QC
- Whole genome sequencing and variant calling (Aim 1.1)
- Imputation and association testing (Aim 1.2)
- Replication genotyping (Aim 1.3)
- GxE analysis (Aim 2)
- Preparation of manuscripts
