Importance of Semantics in Precision Oncology at NCI

Presentation transcript:

Importance of Semantics in Precision Oncology at NCI (4/25/2017). Sherri de Coronado, MS, MBA, NCI CBIIT. May 15, 2015. National Cancer Institute, U.S. Department of Health and Human Services, National Institutes of Health.

Mind Map of Precision Oncology Space (May 12, 2015). If precision oncology is the goal, good semantics is important. This is a crude mind map of the complex, multi-dimensional space, used to explore the areas where we are concerned about semantics. Branches: basic ingredients underlying all big data science; calls for precision medicine; science and compute capability; clinical research and care; semantics-related challenges.

Precision medicine: prevention and treatment strategies that take individual variability into account. (Mind map labels added on this slide: + Reusable, + BD2K.)
Several of the talks have been directly or indirectly related to precision medicine. This mind map shows some major NCI efforts towards precision medicine, and how important semantics is, and will be, to their success.
Open Science: BD2K activities apply here, as Phil Bourne discussed, and to other areas of this mind map as well. A research commons ties together data description and access, software description and access, description with standards…
Interoperability through Metadata: NCI has worked in that space for many years with caDSR. Off the graph, examples of other important efforts include the NIH CDE portal, PROMIS, NeuroQOL, and NLM’s VSAC (Value Set Authority Center).
Semantic Interoperability through Integration of Ontologies: an active area and also a challenge area, with needs to reuse existing ontologies, integrate existing ontologies, and integrate research and clinical data streams.
Data Sharing: critical. Towards that end, the talk describes the new Genomic Data Sharing Policy.
Resources: of course! Data sharing is hard and takes a lot of resources (people, time, and money for IRBs, clearances, consenting, publishing, …).
Tools: a challenge area. Of recent note is the Pistoia Alliance mapping effort to make it easier to map ontologies.
Calls for Precision Medicine: see Francis S. Collins, M.D., Ph.D., and Harold Varmus, M.D., "A New Initiative on Precision Medicine," N Engl J Med 2015; 372:793-795, February 26, 2015. DOI: 10.1056/NEJMp1500523

Acronyms on the mind map (label added on this slide: + BD2K):
TCGA = The Cancer Genome Atlas: the project that needed to be done to move precision medicine forward. It set out to comprehensively characterize the genomic and molecular features of ovarian cancer and glioblastoma (GBM), and expanded to over 20 tumor types, with systematic protocols to generate the data.
TARGET = Therapeutically Applicable Research To Generate Effective Treatments (a consortium effort with a comprehensive approach to studying genomic drivers of childhood cancers, identifying therapeutic targets and prognostic markers).
ALCHEMIST = Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trials (studying treatments for certain genetic changes in two genes, ALK and EGFR).
COSMIC = Catalogue of Somatic Mutations in Cancer.
MedDRA = Medical Dictionary for Regulatory Activities.
CTCAE = Common Terminology Criteria for Adverse Events.
MATCH = Molecular Analysis for Therapy Choice.
ICGC = International Cancer Genome Consortium.

Semantics Related Opportunities

New Genomic Data Sharing Policy
The new Genomic Data Sharing (GDS) Policy was released in draft form in September 2013 (NOT-OD-13-119).
The draft policy was put out in the Federal Register for a 60-day public comment period.
November 2013: public comments collected by the Office of Science Policy.
The policy was modified with feedback from the IC Directors and the NIH GWAS data sharing governance committees (TSDS, PPDM, SOC).
The final Genomic Data Sharing (GDS) Policy was released August 27, 2014 (NOT-OD-14-124).
See also NOT-HG-10-006, Notice on Development of Data Sharing Policy for Sequence and Related Genomic Data.

Trans-NCI Data Sharing WG
Responsible for the activities necessary for the Institute to implement and maintain the GDS policy framework: develop a plan and recommend any resources needed; propose governance needs; develop and disseminate materials for implementation.
Focus areas:
Data Standards: define baseline expectations, including data types and timelines.
Process: develop processes and resources to facilitate implementation and compliance.
Resources: consider all resource needs to implement and oversee policy expectations.
Governance: consider governance needs and procedures for adjudication of implementation issues, and oversight.

Extending Genomic Data Sharing Policies: GWAS Policy vs. GDS Policy
Scope
  GWAS Policy: Applies to human GWAS data.
  GDS Policy: Applies to all genomic data types, human and non-human.
Consent Standard, Existing Collections (before the effective date of the GDS policy)
  GWAS Policy: If research consent exists, the IRB reviews for consistency. If no research consent exists, data may still be submitted to NIH databases.
  GDS Policy: Same.
Consent Standard, Future Collections (after the effective date of the GDS policy)
  GWAS Policy: N/A.
  GDS Policy: Samples or cell lines should be consented for research use and broad data sharing. Exceptions can be requested.
Data Submission
  GWAS Policy: Data submitted as soon as quality control procedures are completed.
  GDS Policy: Timelines vary by data type, but generally as soon as quality control procedures are complete.
Data Release
  GWAS Policy: Immediate data release. 12-month publication embargo.
  GDS Policy: 6-month deferral of data release. No publication embargo.
Source: Elizabeth Gillanders, Ph.D., for NCAB Informatics WG, September 26th, 2014.
Notes: The GDS Policy expects (with exceptions) explicit consent for research use for materials collected after the policy’s effective date. The GDS Policy is applicable to any NIH-funded research project involving non-human organisms or human specimens that generates genomic, metagenomic, epigenomic, or transcriptomic data. Quality control procedures include data cleaning, so timelines run from the date data cleaning is completed.
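The release timelines in the comparison above can be turned into simple date arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that both the 6-month deferral and the 12-month embargo are counted from the date quality control (including data cleaning) is completed, which simplifies what the policy documents actually specify.

```python
from datetime import date
from typing import Dict, Optional


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping the day to the month's end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    days_in_month = [31, 29 if leap else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31][month - 1]
    return date(year, month, min(d.day, days_in_month))


def key_dates(policy: str, qc_complete: date) -> Dict[str, Optional[date]]:
    """Return illustrative release and embargo dates for a policy.

    Assumption (not stated in the slide): all periods start on the
    date quality control is completed.
    """
    if policy == "GWAS":
        # Immediate release, 12-month publication embargo.
        return {"data_release": qc_complete,
                "publication_embargo_ends": add_months(qc_complete, 12)}
    if policy == "GDS":
        # Release deferred by 6 months, no publication embargo.
        return {"data_release": add_months(qc_complete, 6),
                "publication_embargo_ends": None}
    raise ValueError(f"unknown policy: {policy!r}")
```

Calling `key_dates("GDS", date(2015, 1, 15))` would give the earliest release date under these assumptions; real timelines vary by data type, as the table notes.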

New NCI MATCH Trial
"Precision Medicine uses genetic information from a person’s cancer to determine a patient’s treatment with a treatment targeted to that particular genetic abnormality."
http://dctd.cancer.gov/MajorInitiatives/NCI-sponsored_trials_in_precision_medicine.htm
http://dctd.cancer.gov/MajorInitiatives/NCI-MATCH.pdf
http://www.seminoncol.org/article/S0093-7754%2814%2900122-5/abstract
MATCH is one of several NCI precision medicine initiatives. The initial set of trials will focus on different questions:
(1) Exceptional Responders Initiative: why do a minority of patients with solid tumors or lymphoma respond very well to some drugs even if the majority do not?
(2) NCI MATCH trial: can molecular markers predict response to targeted therapies in patients with advanced cancer resistant to standard treatment?
(3) ALCHEMIST trial: will targeted epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) inhibitors improve survival for adenocarcinoma of the lung in the adjuvant setting?
http://meetinglibrary.asco.org/content/114000071-144

NCI MATCH Trial
Question: Can molecular markers predict response to targeted therapies in patients with advanced cancer resistant to standard treatment?
Biopsies from the tumors of up to 3,000 patients undergo DNA/RNA extraction, followed by an assay workflow to identify actionable mutations.
ECOG-ACRIN is leading the study with NCI. There are multiple arms, each matching a particular molecular profile to specific available drugs.
Objectives: Assess response and time to progression based on tumor profile, regardless of tumor origin.
See: Seminars in Oncology, Vol 41 No 3, June 2014, pp 297-299.
Notes: Patients come from sites participating in the NCTN network. Many tumor biopsies need to be processed and sequenced, and people matched to multiple trials/arms. The assay workflow takes 11-14 days. Tumors from a patient may need to be sequenced multiple times to study genomic changes at progression. At progression, a patient could move to a different arm. Pediatric MATCH is still in development (to be led by the Children’s Oncology Group).
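The matching step described above (a tumor's molecular profile is matched to an arm, and a patient may move to a different arm at progression) can be sketched as a simple lookup. This is a toy illustration, not the trial's actual assignment logic: the arm names, gene labels, and the `eligible_arms` function are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Arm:
    """A hypothetical trial arm keyed on one actionable gene."""
    name: str
    target_gene: str


def eligible_arms(actionable_genes: Iterable[str],
                  arms: List[Arm],
                  already_tried: FrozenSet[str] = frozenset()) -> List[str]:
    """Return names of arms whose target gene appears in the tumor profile.

    `already_tried` lets a re-sequenced tumor at progression be matched
    to a different arm than the ones the patient was previously on.
    """
    genes = set(actionable_genes)
    return [arm.name for arm in arms
            if arm.target_gene in genes and arm.name not in already_tried]


# Placeholder arms; real arms and their eligibility rules are not given in the slides.
ARMS = [Arm("ARM-1", "GENE_A"), Arm("ARM-2", "GENE_B"), Arm("ARM-3", "GENE_A")]
```

A first biopsy with `{"GENE_A"}` would match `ARM-1` and `ARM-3`; after progression on `ARM-1`, a second call with `already_tried=frozenset({"ARM-1"})` excludes that arm.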

TCGA History
About three years after the Human Genome Project: large-scale tumor profiling in a systematic way. Initiated in 2005, pilots in 2006, extended in 2009.
A collaboration of NHGRI and NCI to examine GBM, lung, and ovarian cancer using genomic techniques in 2006. Expanded to 20+ tumor types.
Notes: Began with ovarian, lung, and GBM, then extended to 20-25 tumor types (33?). Most have sequenced exomes at this point. Since TCGA began, there have been gains in our ability to understand the many mechanisms of gene regulation and protein maturation, and to have data to support systems biology. The tools, techniques, and our ability to analyze these data have changed immensely since the beginning of TCGA in 2006. Our knowledge of cancer, and the kinds of questions we want to ask and can ask, have also changed.

TCGA Drivers
Provide high-quality reference sets for 20+ tissue types.
Provide a platform for systems biology and hypothesis generation.
Provide a test bed for understanding the real-world implications of consent and data access policies on genomic and clinical data.
Notes: Data collection is now over, but there are MANY users and many pan-cancer and other papers (>2700). The kinds of questions we want to ask and CAN ask have changed and grown.

We now understand underlying basic and cancer biology thanks to the Human Genome Project (HGP) and the technologies emerging from it. TCGA activities and analyses were built upon the success of the HGP.

Genomic Data Commons (GDC)
In transition from The Cancer Genome Atlas (TCGA) to the GDC, a commons to host TCGA, TARGET, and other future genomic data sets.
The University of Chicago and NCI are collaborating to initiate the Genomic Data Commons (Robert Grossman, Dir.).
Goal: to enable any researcher to test their ideas, and to bring their analytics to the data.
From: "Transforming Cancer Research: The Genomic Data Commons," posted on December 2, 2014 by Kevin Jiang in At the Bench.
Now, a wealth of data: “However, this wealth of data has come with limitations. These data are gathered by different research groups, with different technologies and protocols. They’re stored in different locations, using different software and management systems. They’re complex and just plain huge. A cancer researcher would need millions of dollars, several years and a dedicated team to set up the infrastructure necessary to analyze these datasets. Just downloading can take months. This has impeded research at all but the largest groups and institutions, and has stymied collaboration.”

[Diagram: NCI Cancer Genomics Data Commons. Labels: Genomic + clinical data; GDC; Cancer information donor.]

NCI Genomic Data Commons
Unified repository for cancer genomics data.
Accepts data from both the NCI Center for Cancer Genomics (CCG) and external projects, including submissions from small laboratories.
Performs reproducible, consistent bioinformatics pipelines to generate standard higher-level data (e.g., tumor variant calls).
Pipelines are designed and updated with community input to represent the best practices of the field.
Notes: The availability of genomic data will make it possible for researchers to better classify disease. From the NCI News Note "NCI establishes Genomic Data Commons to facilitate identification of molecular subtypes of cancer and potential drug targets," posted December 2, 2014: The GDC will facilitate access to data generated by many existing and forthcoming NCI programs. The GDC will be built out over a number of years to ensure that results of individual projects can be combined to create broadly useful and accessible datasets, and will be operated with funding from NCI to the University of Chicago under a subcontract from Leidos Biomedical Research at the Frederick National Laboratory for Cancer Research. NCI’s Center for Cancer Genomics is establishing the data service with the assistance of NCI’s bioinformatics and cloud research program in the NCI Center for Biomedical Informatics and Information Technology.
Re: consistent pipelines. It is important to be able to specify all the software tools, data, parameters, and compute environment, so that the pipeline will run the same way every time for each set of input data to get a particular output product.
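One way to make the point about consistent pipelines concrete is to record every ingredient of a run (tools, parameters, compute environment, inputs) and derive a single fingerprint from that record. The sketch below is a generic illustration and does not describe how the GDC actually tracks its pipelines; the function name, field names, and example values are assumptions.

```python
import hashlib
import json
from typing import Dict, List


def pipeline_fingerprint(tools: Dict[str, str],
                         parameters: Dict[str, object],
                         environment: str,
                         input_checksums: List[str]) -> str:
    """Hash a complete description of a pipeline run.

    Two runs with the same tool versions, parameters, environment
    identifier, and input checksums get the same fingerprint; changing
    any one of them changes it.
    """
    spec = {
        "tools": tools,                    # tool name -> version string
        "parameters": parameters,          # parameter name -> value
        "environment": environment,        # e.g. a container image identifier
        "inputs": sorted(input_checksums)  # order of inputs should not matter
    }
    # Canonical JSON (sorted keys, fixed separators) so that equal
    # specifications always serialize to identical bytes.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing such a fingerprint next to each output product would make it easy to check whether two products were generated the same way, which is the property the note above asks for.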

GDC Context From: Mark Jensen, GDC

GDC ConOps From: Mark Jensen, GDC

Clinical Data at GDC
Key issues:
Low barriers to data submission.
Minimal number of required data elements.
Ongoing curation and semantic assignment.
Balance acceptance of submitter-provided semantic information with GDC curation.
Provide cross-project searches over clinical data elements to filter genomic data.
Allow users to acquire data intuitively, but also provide semantic sources and IDs as available.
Ideal: Expose clinical data intuitively, but manage it with rigorous semantic information.
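The balance described above (few required elements, human-readable values, semantic sources and IDs attached where available, curation filling the gaps) can be sketched as a small data structure. Everything here is hypothetical: the required element names, the `EXAMPLE_VOCAB` source, and the concept ID are placeholders, not GDC fields or real terminology codes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical minimal set of required elements (kept short to lower
# the barrier to submission).
REQUIRED_ELEMENTS = ("primary_site", "diagnosis")


@dataclass
class ClinicalValue:
    """A submitted value, with optional semantic source and concept ID."""
    label: str                        # intuitive, human-readable value
    source: Optional[str] = None      # terminology the ID comes from
    concept_id: Optional[str] = None  # identifier within that terminology

    @property
    def is_coded(self) -> bool:
        return self.source is not None and self.concept_id is not None


def missing_required(record: Dict[str, ClinicalValue]) -> List[str]:
    """Required elements the submitter did not provide."""
    return [name for name in REQUIRED_ELEMENTS if name not in record]


def needs_curation(record: Dict[str, ClinicalValue]) -> List[str]:
    """Elements accepted as submitted but still lacking a semantic ID."""
    return sorted(name for name, value in record.items() if not value.is_coded)


record = {
    "primary_site": ClinicalValue("Lung", "EXAMPLE_VOCAB", "X0001"),  # placeholder code
    "diagnosis": ClinicalValue("adenocarcinoma of the lung"),         # uncoded, for curation
}
```

In this sketch a submission is accepted as long as `missing_required` is empty, while `needs_curation` lists the values a curator would later map to a terminology.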

Cancer Genome Cloud Pilots
Three pilots, initiated in Fall 2014, to be public "cancer knowledge clouds" in which data repositories would be co-located with advanced computing resources:
Broad Institute, UCSC, UC Berkeley
ISB-led team, Google, SRA
Seven Bridges Genomics
Begin piloting components and gathering feedback required by Jan 2016.
Could be a template (or templates) for hosting public multi-omics data.
To host TCGA plus other optional data (e.g., 1000 Genomes).

Cancer Genome Cloud Pilots
Goals:
democratize access to large-scale data repositories and computational infrastructure
co-locate data and compute to minimize unnecessary data transfer
integrate public and private datasets
allow web-based exploration of hosted data
transform and accelerate collaborative cancer research
Notes:
Broad Team: (1) The Broad; (2) UCSC; (3) UC Berkeley.
ISB (Institute for Systems Biology) Team: (1) ISB; (2) Google; (3) SRA.
Seven Bridges Genomics Team: a 90-person team with headquarters in Cambridge, MA, and also in London and Belgrade. Developing an open standard for reproducible genomic pipelines (at rabix.org).

Cancer Genome Cloud Pilots
People can register at any or all of these sites if they are interested in getting involved:
Seven Bridges: cancergenomicscloud.org
Broad: Firecloud.org
Institute for Systems Biology: cgc.systemsbiology.net

Precision Medicine Opportunities Involve Semantics
"The era of precision medicine and precision oncology is predicated on the integration of research, care, and molecular medicine and the availability of data for modeling, risk analysis, and optimal care." (Warren Kibbe)
"The promise of precision medicine will only be fully realized if the research community can adapt its clinical trials methodology to study molecularly characterized tumors instead of the traditional histologic classification." (Abrams et al., National Cancer Institute's Precision Medicine Initiatives for the New National Clinical Trials Network, 2014)

Semantic Opportunities: Heard from this meeting and beyond
Imaging: pathology imaging ontology gaps (terms and formal definitions to characterize histopathology images and algorithms); an NLP effort to automate image annotation with ontologies, creating metadata for large image collections by training classifiers; QHIO (terms and relationships for the whole lifecycle of images).
Proteomics (Chris Kinsinger, CPTAC): better clinical biospecimen annotation.
Cancer phenotypes: cohorts and finding patients; cancer pathology; protocol changes.
Modeling tumor microenvironments: integration of multiscale cancer data; an effort to model the cancer state as an ecological problem.
Cancer classification: data needs vs. ontological classification; pan-cancer analyses can be improved using DO (Hive).
Notes:
Rebecca Crowley: Precise phenotype information is needed to advance translational cancer research, particularly to unravel the effects of genetic, epigenetic, and other factors on tumor behavior and responsiveness. Examples of phenotypic variables in cancer include: tumor morphology (e.g. histopathologic diagnosis), co-morbid conditions (e.g. associated immune disease), laboratory findings (e.g. gene amplification status), specific tumor behaviors (e.g. metastasis), and response to treatment (e.g. effect of a chemotherapeutic agent on tumor).
eMERGE: a national network organized and funded by the National Human Genome Research Institute (NHGRI) that combines DNA biorepositories with electronic medical record (EMR) systems for large-scale, high-throughput genetic research in support of implementing genomic medicine.
Ilya Golderge: “urgent need to provide set of terms and formal definitions necessary to characterize both the histopathological images and the algorithms that operate on them.”

Semantic Opportunities (2): Heard from this meeting and beyond
Tools / Resources / Standards:
Getting usable, effective, efficient software into people's hands will increase uptake of semantically well-described metadata, terms, and ontologies, and better integration of metadata and terminology.
Integrated use of a variety of ontologies.
Ways to manage and bridge research and clinical data streams.
Tools to help harmonize and use metadata and terminology.
Provenance: use of checklists early on. Bottom up.
Research Commons.

Thank you
Sherri de Coronado, decorons@mail.nih.gov
Thanks to content contributors: Gilberto Fragoso, Mark Jensen, Warren Kibbe, Juli Klemm, Elizabeth Gillanders, and others.