Big data challenges in personalized cancer medicine Bioinformatics activities in the Norwegian Cancer Genomics Consortium (NCGC) Sigve Nakken Postdoctoral.

Presentation transcript:

1 Big data challenges in personalized cancer medicine
Bioinformatics activities in the Norwegian Cancer Genomics Consortium (NCGC)
Sigve Nakken, Postdoctoral fellow, Eivind Hovig's group
Norwegian Cancer Genomics Consortium (NCGC)
Department of Tumor Biology, ICR, OUS

2 Norwegian Cancer Genomics Consortium (NCGC)
- Founded by oncologists and cancer scientists across the country (Tromsø, Trondheim, Bergen, Oslo)
- Contributing to and following the national prioritization of "Individualized cancer treatment based on the gene profile of the tumour" as the most important topic in cancer research
- Has obtained grants of NOK 75 million (≈ USD 10 million) from the Research Council
- Industrial partners: OCC, PubGene, BergenBio
- Project divided into work packages
- WP4: Data handling and establishment of national infrastructure

3 NCGC sample cohorts
Columns: Cancer type | REK approvals | Sequencing | Samples | Analysis
Melanoma | Approved | Done | 115 | On-going
Colon cancer | 100
Multiple myeloma
Lymphoma | 76
Leukemia | 41
Sarcoma | -
Prostate | 75
Breast cancer
Ovarian cancer | Submitted

4 NCGC cancer genome sequencing
Exome sequencing
Goal: identify & characterize the acquired genetic changes in the tumor sample by massively parallel deep sequencing
- SNVs & insertions/deletions
- Copy number aberrations
- Structural rearrangements

5 Cancer genome sequencing (II)
Variant calling pipeline

6 Cancer genome sequencing (III)
How deep should I sequence my tumor sample? (to detect a mutant subpopulation at X percent?)
Biological complexity
- Tumor purity
- Ploidy
- Local CNAs
Technical biases
- Uneven coverage (GC)
- PCR artefacts
- Sequencing quality/errors
- Oxidation (DNA extraction + library prep)
Other
- Tumor-control mismatch
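The depth question above can be made concrete with a simple binomial model. The following is a minimal sketch, not the NCGC method: it assumes variant reads are drawn independently at the expected allele fraction, and it ignores the technical biases listed on the slide (uneven coverage, PCR artefacts, sequencing errors). The function names, the `min_alt_reads` threshold, and the 95% power target are illustrative assumptions.

```python
from math import comb


def expected_vaf(purity, clonal_fraction, tumor_copies=2.0, mutant_copies=1):
    """Expected variant allele fraction of a somatic SNV (simplified model).

    purity          -- fraction of cells in the sample that are tumor cells
    clonal_fraction -- fraction of tumor cells that carry the mutation
    tumor_copies    -- local copy number in tumor cells (assumption: uniform)
    mutant_copies   -- mutated copies per mutated tumor cell
    Normal cells are assumed diploid at the locus.
    """
    mutant_alleles = purity * clonal_fraction * mutant_copies
    total_alleles = purity * tumor_copies + (1.0 - purity) * 2.0
    return mutant_alleles / total_alleles


def detection_power(depth, vaf, min_alt_reads=3):
    """P(at least min_alt_reads variant reads) for reads ~ Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


def required_depth(vaf, min_alt_reads=3, power=0.95, max_depth=100_000):
    """Smallest depth reaching the requested power, or None if not reached."""
    if not 0.0 < vaf <= 1.0:
        raise ValueError("vaf must be in (0, 1]")
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_power(depth, vaf, min_alt_reads) >= power:
            return depth
    return None
```

For instance, `required_depth(expected_vaf(purity=0.5, clonal_fraction=0.2))` estimates the depth needed for a subclone present in 20% of tumor cells in a 50% pure, locally diploid sample. Real requirements will be higher once the listed technical biases are accounted for.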

7 Somatic variant calling
Two key components
- Read alignment – mapping each read to its proper position in the genome
- Mutation calling – quantify the likelihood of a true somatic mutation
Best-practice workflows defined
Still many different algorithms to choose from
Need for a benchmark

8 ICGC mutation benchmark
Purpose:
- Assess concordance & accuracy of somatic SNV/indel calling among variant calling pipelines used in different research groups
- Evaluate impact of different algorithms (aligner, caller etc.)
- NCGC: optimize and verify running pipeline (“ICGC stamp”)
Participants were given raw sequence reads from a medulloblastoma (MB99) genome (tumor + normal), ~40X coverage
- Task: submit somatic indels + SNVs
Coordinated by CNAG, Barcelona (Ivo Gut’s lab)
Weekly global telephone conferences
BM1.2

9 SNVs – how well do we agree?

10 InDels – how well do we agree?

11 Verification of calls – GOLD set
- 300X sequencing of the same genome
- Six different pipelines called somatic SNVs and InDels
- SNVs with concordance of > 3 accepted
- SNVs with concordance < 3 and all indels reviewed manually
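The concordance rule above can be sketched as a vote count across pipelines. This is an illustrative sketch only: the variant key `(chrom, pos, ref, alt)`, the function name, and the handling of SNVs called by exactly three pipelines (sent to manual review here, since the slide does not specify that case) are assumptions.

```python
from collections import Counter


def split_by_concordance(call_sets, threshold=3):
    """Split calls into auto-accepted SNVs and calls needing manual review.

    call_sets -- dict mapping pipeline name to a set of variants, each a
                 (chrom, pos, ref, alt) tuple (hypothetical representation)
    threshold -- SNVs called by more than this many pipelines are accepted
    """
    counts = Counter()
    for calls in call_sets.values():
        counts.update(set(calls))  # each pipeline votes once per variant

    accepted, review = set(), set()
    for variant, n_pipelines in counts.items():
        _, _, ref, alt = variant
        is_snv = len(ref) == 1 and len(alt) == 1
        if is_snv and n_pipelines > threshold:
            accepted.add(variant)
        else:
            # Low-concordance SNVs and all indels go to manual review.
            # Assumption: SNVs with concordance exactly equal to the
            # threshold are also reviewed.
            review.add(variant)
    return accepted, review
```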

12 Accuracy – SNV/InDels
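Accuracy against a verified GOLD set is commonly summarized as precision, recall, and F1. The sketch below assumes these metrics and a set-based variant representation; the slide itself does not state which measures the benchmark reported, so treat the choice as illustrative.

```python
def score_calls(called, gold):
    """Compare a pipeline's calls with a gold set of verified variants.

    Both arguments are iterables of hashable variant keys, for example
    (chrom, pos, ref, alt) tuples (hypothetical representation).
    """
    called, gold = set(called), set(gold)
    true_pos = len(called & gold)   # called and verified
    false_pos = len(called - gold)  # called but not in the gold set
    false_neg = len(gold - called)  # verified but missed

    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Scoring SNVs and indels separately, as the slide title suggests, only requires calling the function once per variant class.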

13 Impact of aligner-caller combination

14 Benchmark manuscript

15 Improved accuracy – SNVs/InDels
Figure label: EH_rev

16 Interpretation of variants
Which variants/genes are of functional relevance?
- Is my variant a frequent mutation? Which cancer types?
- Is my variant likely to alter the activity of the encoded protein?
- Is my variant known as a drug sensitivity marker?
- Which mutant genes are known drug targets?
Annotation pipeline: Variant calling → Functional annotation → Prioritization

17 Variants – phenotypic effect?
Computational prediction of damaging variants
- Machine learning
- Numerous algorithms: SIFT, PolyPhen2, MutationTaster, MutationAssessor, Provean, FATHMM, etc.
- Challenge: many have been trained with Mendelian disease mutations
- Gain-of-function mutations hard to predict

18 Variants – clinical associations?
Recent promising resources/data on clinically associated variants

19 Which genes are key drivers?
Which genes show significantly more mutations than random expectation?
- Requires sophisticated modeling of the background mutation rates
- MutSigCV
Which genes are enriched with functionally biased variants?
- IntOGen
Lawrence et al., Nature (2013)
Gonzalez-Perez et al., Nature Methods (2013)
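To illustrate what "more mutations than random expectation" means, the sketch below tests a gene's mutation count against a single uniform background rate with a binomial upper tail. This is deliberately naive and is not MutSigCV: it does not model the gene- and patient-specific background rates that the slide says are required. All function and parameter names are assumptions.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    mean = n * p
    total = 0.0
    for i in range(k, n + 1):
        # Binomial probability mass at i, computed in log space for stability.
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p)
        )
        term = exp(log_pmf)
        total += term
        # Past the mean the terms only shrink; stop once they are negligible.
        if i > mean and term < total * 1e-16:
            break
    return min(total, 1.0)


def gene_p_value(n_mutations, gene_length_bp, n_samples, background_rate):
    """P-value for seeing at least n_mutations in a gene across a cohort.

    background_rate -- assumed uniform mutation probability per base per
                       sample (a strong simplification)
    """
    n_sites = gene_length_bp * n_samples
    return binomial_upper_tail(n_mutations, n_sites, background_rate)
```

In practice, p-values for all genes would also need correction for multiple testing before any gene is called significantly mutated.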

20 NCGC – data trends

21 Mutational heterogeneity – across cancer types

22 Mutational heterogeneity – within cancer types
Figure panels: CRC, Melanoma

23 Functional heterogeneity

24 Mutational signatures
- Distinct mutational patterns (mutation types & sequence context) that reflect underlying mutational processes
- Mathematical framework to infer the k mutational signatures contributing to a cohort
- What is the relative contribution of each process in each sample?
S1 – Alkylating agents (?)
S2 – UV damage
S3 – Aging
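"Mutation types & sequence context" are commonly encoded as 96 channels: six substitution classes expressed from the pyrimidine strand, each with 16 combinations of flanking bases. The sketch below builds such a per-sample count vector, which is the kind of input a signature-inference framework would factorize. Whether NCGC used exactly this encoding is an assumption, and the function names are illustrative.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_channel(context, alt):
    """Label an SNV by substitution type and trinucleotide context.

    context -- three reference bases centered on the mutated position
    alt     -- the variant base
    Returns a label such as 'A[C>T]G', always relative to the strand on
    which the reference base is a pyrimidine (C or T).
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or any(b not in COMPLEMENT for b in context + alt):
        raise ValueError("expected a 3-base context and a single A/C/G/T alt")
    if context[1] == alt:
        raise ValueError("alt equals the reference base")
    if context[1] in "AG":
        # Purine reference: describe the mutation on the opposite strand.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def count_channels(snvs):
    """Count channel labels for an iterable of (context, alt) pairs."""
    return Counter(mutation_channel(context, alt) for context, alt in snvs)
```

Stacking these count vectors for every sample in a cohort gives a channels-by-samples matrix, from which the k signatures and their per-sample contributions are then estimated.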

25 In progress/future plans
- Evaluation of more read aligners/variant callers
- Integration of improved calling of copy number aberrations
- Inference of clonal population structure
- Report per tumor case – QC, mutated cancer genes, actionable targets etc.
- Improved tools for visualization of results

26 Other activities

27 Acknowledgements
NCGC
- Principal investigators
Department of Tumor Biology
- Leonardo Meza-Zepeda, Susanne Lorenz, Ola Myklebost
- Daniel Vodak, Ghislain Fournous, Lars Birger Aasheim, Eivind Hovig
ICGC Technical Validation group

