Big data challenges in personalized cancer medicine Bioinformatics activities in the Norwegian Cancer Genomics Consortium (NCGC) Sigve Nakken Postdoctoral.

Presentation transcript:

Big data challenges in personalized cancer medicine: Bioinformatics activities in the Norwegian Cancer Genomics Consortium (NCGC). Sigve Nakken, Postdoctoral fellow, Eivind Hovig's group, Norwegian Cancer Genomics Consortium (NCGC), Department of Tumor Biology, ICR, OUS

Norwegian Cancer Genomics Consortium (NCGC): Founded by oncologists and cancer scientists across the country (Tromsø, Trondheim, Bergen, Oslo). Contributes to and follows the national prioritization of "Individualized cancer treatment based on the gene profile of the tumour" as the most important topic in cancer research. Has obtained grants of 75 Mkr (≈ 10 MUSD) from the Research Council. Industrial partners: OCC, PubGene, BergenBio. Project divided into work packages; WP4: data handling and establishment of national infrastructure.

NCGC sample cohorts
Cancer type | REK approvals | Sequencing | Samples | Analysis
Melanoma: Approved, Done, 115, On-going
Colon cancer: 100
Multiple myeloma
Lymphoma: 76
Leukemia: 41
Sarcoma: -
Prostate: 75
Breast cancer
Ovarian cancer: Submitted

NCGC cancer genome sequencing: Exome sequencing. Goal: identify and characterize the acquired genetic changes in the tumor sample by massively parallel deep sequencing: SNVs and insertions/deletions, copy number aberrations, structural rearrangements.

Cancer genome sequencing (II) Variant calling pipeline

Cancer genome sequencing (III): How deep should I sequence my tumor sample (to detect a mutant subpopulation at X percent)? Biological complexity: tumor purity, ploidy, local CNAs. Technical biases: uneven coverage (GC), PCR artefacts, sequencing quality/errors, oxidation (DNA extraction + library prep). Other: tumor-control mismatch.
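
The depth question can be approximated with a simple binomial model. The sketch below is illustrative only and is not the NCGC method: it assumes reads sample alleles independently, ignores sequencing errors and the technical biases listed above, and treats a variant as detected once a minimum number of supporting reads is seen. The function names, the default threshold of 5 reads, and the 95% power target are assumptions made for the example.

```python
from math import comb


def expected_vaf(clonal_fraction, purity=1.0, tumor_copies=2,
                 normal_copies=2, mutant_copies=1):
    """Expected variant allele fraction for a mutation carried by
    `clonal_fraction` of the tumor cells, given purity and local copy number."""
    mutant_alleles = purity * clonal_fraction * mutant_copies
    total_alleles = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant_alleles / total_alleles


def detection_probability(depth, vaf, min_alt_reads=5):
    """P(at least `min_alt_reads` variant reads) under Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf ** k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


def required_depth(vaf, min_alt_reads=5, power=0.95, max_depth=5000):
    """Smallest depth reaching the requested detection probability, or None."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= power:
            return depth
    return None


if __name__ == "__main__":
    # Hypothetical case: a subclone in 20% of tumor cells, 50% tumor purity.
    vaf = expected_vaf(0.2, purity=0.5)
    print(f"expected VAF: {vaf:.3f}")
    print(f"depth needed: {required_depth(vaf)}")
```

Lower purity or a higher local copy number reduces the expected allele fraction, which raises the required depth. This is why purity, ploidy and local CNAs appear on the slide as biological complexity factors.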

Somatic variant calling: Two key components. Read alignment: mapping each read to its proper position in the genome. Mutation calling: quantifying the likelihood of a true somatic mutation. Best-practice workflows are defined, but there are still many different algorithms to choose from. Need for a benchmark.
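
As a rough illustration of what "quantifying the likelihood of a true somatic mutation" can mean, the sketch below applies a one-sided Fisher's exact test to reference and variant read counts in the tumor and matched normal sample. This is one simple count-based statistic chosen for the example; it is not a description of the callers evaluated in the presentation, and the function name and inputs are hypothetical.

```python
from math import comb


def somatic_p_value(tumor_ref, tumor_alt, normal_ref, normal_alt):
    """One-sided Fisher's exact test on a 2x2 table of read counts.

    Returns the probability of seeing at least `tumor_alt` variant reads in
    the tumor if variant reads were distributed between tumor and normal
    purely in proportion to depth (i.e. no tumor-specific variant).
    """
    tumor_depth = tumor_ref + tumor_alt
    normal_depth = normal_ref + normal_alt
    total_alt = tumor_alt + normal_alt
    total = tumor_depth + normal_depth

    # Hypergeometric tail: k variant reads fall in the tumor sample.
    denominator = comb(total, total_alt)
    upper = min(total_alt, tumor_depth)
    tail = sum(
        comb(tumor_depth, k) * comb(normal_depth, total_alt - k)
        for k in range(tumor_alt, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical site: 10 of 40 tumor reads and 0 of 40 normal reads
    # carry the variant.
    print(somatic_p_value(tumor_ref=30, tumor_alt=10, normal_ref=40, normal_alt=0))
```

A small p-value indicates that the variant reads are concentrated in the tumor. It does not account for base quality, mapping quality or strand bias, which production callers model explicitly.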

ICGC mutation benchmark. Purpose: assess concordance and accuracy of somatic SNV/indel calling among variant calling pipelines used in different research groups; evaluate the impact of different algorithms (aligner, caller, etc.). NCGC: optimize and verify the running pipeline ("ICGC stamp"). Participants were given raw sequence reads from a medulloblastoma (MB99) genome (tumor + normal), ~40X coverage. Task: submit somatic indels and SNVs. Coordinated by CNAG, Barcelona (Ivo Gut's lab). Weekly global telephone conferences. BM1.2

SNVs – how well do we agree?

InDels – how well do we agree?

Verification of calls – GOLD set: 300X sequencing of the same genome. Six different pipelines called somatic SNVs and InDels. SNVs with concordance > 3 accepted; SNVs with concordance < 3 and all indels reviewed manually.
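
The concordance rule for SNVs can be sketched as a vote count across pipelines. The code below is an assumption-laden illustration: variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples, and because the slide does not say how an SNV called by exactly 3 pipelines was handled, such calls are returned in a separate set rather than assigned to either group.

```python
from collections import Counter


def triage_by_concordance(call_sets, threshold=3):
    """Split SNVs by the number of pipelines that called them.

    `call_sets` maps a pipeline name to an iterable of variant keys.
    Returns (accepted, manual_review, unspecified):
      accepted      - called by more than `threshold` pipelines
      manual_review - called by fewer than `threshold` pipelines
      unspecified   - called by exactly `threshold` pipelines
    """
    support = Counter()
    for calls in call_sets.values():
        support.update(set(calls))  # each pipeline votes once per variant

    accepted = {v for v, n in support.items() if n > threshold}
    manual_review = {v for v, n in support.items() if n < threshold}
    unspecified = {v for v, n in support.items() if n == threshold}
    return accepted, manual_review, unspecified


if __name__ == "__main__":
    a = ("1", 1000, "C", "T")
    b = ("2", 2000, "G", "A")
    calls = {"p1": [a, b], "p2": [a], "p3": [a], "p4": [a], "p5": [], "p6": []}
    accepted, review, unspecified = triage_by_concordance(calls)
    print(len(accepted), len(review), len(unspecified))
```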

Accuracy – SNV/InDels
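
Accuracy against a verified call set is commonly summarized as precision and recall. The sketch below shows that calculation under the assumption that both the submitted calls and the gold set are collections of comparable variant keys; the function name and the returned dictionary layout are illustrative and not taken from the benchmark itself.

```python
def accuracy_metrics(calls, gold):
    """Precision, recall and F1 of a call set against a gold set."""
    calls, gold = set(calls), set(gold)
    true_pos = len(calls & gold)   # called and verified
    false_pos = len(calls - gold)  # called but not in the gold set
    false_neg = len(gold - calls)  # verified but missed

    precision = true_pos / (true_pos + false_pos) if calls else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical variant keys; real keys would include chromosome,
    # position and alleles.
    print(accuracy_metrics(calls={"v1", "v2", "v3"}, gold={"v2", "v3", "v4", "v5"}))
```

Reporting both numbers matters because a pipeline can trade one for the other: stricter filtering raises precision at the cost of recall.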

Impact of aligner-caller combination

Benchmark manuscript

Improved accuracy – SNVs/InDels EH_rev

Interpretation of variants: Which variants/genes are of functional relevance? Is my variant a frequent mutation? In which cancer types? Is my variant likely to alter the activity of the encoded protein? Is my variant known as a drug sensitivity marker? Which mutant genes are known drug targets? Annotation pipeline: variant calling, functional annotation, prioritization.

Variants – phenotypic effect? Computational prediction of damaging variants: machine learning; numerous algorithms (SIFT, PolyPhen2, MutationTaster, MutationAssessor, Provean, FATHMM, etc.). Challenge: many have been trained with Mendelian disease mutations; gain-of-function mutations are hard to predict.

Variants – clinical associations? Recent promising resources/data on clinically associated variants

Which genes are key drivers? Which genes show significantly more mutations than random expectation? Requires sophisticated modeling of the background mutation rates: MutSigCV (Lawrence et al., Nature (2013)). Which genes are enriched with functionally biased variants? IntOGen (Gonzalez-Perez et al., Nature Methods (2013)).
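
To make "more mutations than random expectation" concrete, the sketch below tests a gene's mutation count against a Poisson expectation derived from a single uniform background rate. This is a deliberately naive baseline, not MutSigCV: a uniform rate is exactly the simplification that sophisticated background models are meant to replace. All names and the example rate are assumptions.

```python
from math import exp


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = exp(-expected)  # P(X = 0)
    below = 0.0
    for k in range(observed):
        below += term              # accumulate P(X = k)
        term *= expected / (k + 1)  # step to P(X = k + 1)
    return max(0.0, 1.0 - below)


def gene_burden_p_value(n_mutations, gene_length_bp, n_samples, background_rate):
    """P-value for observing `n_mutations` or more in a gene across a cohort,
    assuming one uniform per-base, per-sample background mutation rate."""
    expected = background_rate * gene_length_bp * n_samples
    return poisson_upper_tail(n_mutations, expected)


if __name__ == "__main__":
    # Hypothetical gene: 1,500 coding bases, 100 tumors, 1e-6 mutations/base.
    print(gene_burden_p_value(5, 1500, 100, 1e-6))
```

With a uniform rate, long genes and genes in highly mutable regions tend to look significant for reasons unrelated to selection, so the p-values from this baseline should be read as an upper bound on how informative a simple count can be.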

NCGC – data trends

Mutational heterogeneity – across cancer types

Mutational heterogeneity – within cancer types CRC Melanoma

Functional heterogeneity

Mutational signatures: Distinct mutational patterns (mutation types and sequence context) that reflect underlying mutational processes. Mathematical framework to infer the k mutational signatures contributing to a cohort. What is the relative contribution of each process in each sample? S1: alkylating agents (?); S2: UV damage; S3: aging.
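
Signature inference starts from a per-sample catalog that counts each mutation type in its sequence context. The sketch below builds the commonly used 96-channel catalog (6 substitution types on the pyrimidine strand, each with 16 flanking-base combinations). It covers only this input-preparation step; the factorization that infers the k signatures (commonly non-negative matrix factorization) is not shown. The input format, uppercase A/C/G/T bases given as `(ref, alt, five_prime, three_prime)`, is an assumption for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_channel(ref, alt, five_prime, three_prime):
    """Return a channel label such as 'T[C>T]C'.

    Substitutions with a purine reference base (A or G) are converted to
    the complementary strand so that the mutated base is always C or T.
    """
    if ref in "AG":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        # Reverse-complementing swaps and complements the flanking bases.
        five_prime, three_prime = COMPLEMENT[three_prime], COMPLEMENT[five_prime]
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


def all_channels():
    """All 96 channel labels: 2 reference bases x 3 alternates x 16 contexts."""
    bases = "ACGT"
    return [
        f"{five}[{ref}>{alt}]{three}"
        for ref in "CT"
        for alt in bases if alt != ref
        for five in bases
        for three in bases
    ]


def build_catalog(mutations):
    """Count the mutations of one sample per channel (zero-filled)."""
    catalog = {channel: 0 for channel in all_channels()}
    for ref, alt, five_prime, three_prime in mutations:
        catalog[mutation_channel(ref, alt, five_prime, three_prime)] += 1
    return catalog


if __name__ == "__main__":
    sample = [("G", "A", "G", "A"), ("C", "T", "T", "C"), ("T", "G", "A", "A")]
    catalog = build_catalog(sample)
    print({k: v for k, v in catalog.items() if v})
```

Stacking these catalogs for all samples in a cohort gives the samples-by-channels count matrix that signature extraction decomposes into signatures and per-sample contributions.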

In progress/future plans: Evaluation of more read aligners/variant callers. Integration of improved calling of copy number aberrations. Inference of clonal population structure. Report per tumor case: QC, mutated cancer genes, actionable targets, etc. Improved tools for visualization of results.

Other activities

Acknowledgements. NCGC: Principal investigators; Department of Tumor Biology: Leonardo Meza-Zepeda, Susanne Lorenz, Ola Myklebost, Daniel Vodak, Ghislain Fournous, Lars Birger Aasheim, Eivind Hovig. ICGC Technical Validation group.