Presentation on theme: "Controlled Terminology for Clinical Research"— Presentation transcript:
1. Controlled Terminology for Clinical Research. RCRIM Vocabulary and Controlled Terminology for Clinical Research. Margaret Haber, RN, OCN, Co-Director, Enterprise Vocabulary Services, National Cancer Institute.
2. Clinical Research Challenges. We have fundamental new capacities to characterize and intervene in biological systems and the disease process. Progress is hampered by our inability to integrate huge volumes of data due to information fragmentation. Many diverse research and delivery platforms are disconnected due to a lack of common, interoperable systems and semantics. The problem is international in scope, with enormous implications for our ability to translate information into knowledge.
3. No Controlled Terminology? No Interoperability. Systems cannot exchange or use information if they use incompatible codes or tokens to signify meaning. Terminology services provide tokens and codes. Proper use of them assures consistent meaning across and among enterprises.
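To make the point about incompatible codes concrete, here is a minimal sketch of two systems that record the same fact with different local codes and can only be compared after translation into a shared concept code. The local codes, the shared codes, and the function names are all hypothetical illustrations, not values from HL7, CDISC, or NCI EVS.

```python
# Hypothetical local code tables for two systems (illustrative values only).
SITE_A_TO_SHARED = {"M": "SEX_MALE", "F": "SEX_FEMALE"}
SITE_B_TO_SHARED = {"1": "SEX_MALE", "2": "SEX_FEMALE"}


def to_shared(local_code, mapping):
    """Translate a local code into the shared concept code; fail loudly if unmapped."""
    try:
        return mapping[local_code]
    except KeyError:
        raise ValueError(f"unmapped local code: {local_code!r}") from None


def same_meaning(code_a, code_b):
    """Compare a site A code with a site B code by their shared concept codes."""
    return to_shared(code_a, SITE_A_TO_SHARED) == to_shared(code_b, SITE_B_TO_SHARED)
```

Comparing the raw values "M" and "1" directly would never match; `same_meaning("M", "1")` does, because both resolve to the same shared code. An unmapped code raises an error instead of silently passing through.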
4. The Pillars of Interoperability (necessary but not sufficient). Common information models across all domains of interest. A foundation of rigorously defined data types (metadata). A methodology for interfacing with controlled vocabularies.
5. Interoperability Keys for Terminology. Use of industry standards, where feasible: must allow for extensions to core standards; specialty terminology remains common; mapping is therefore essential. Conformance with data models: for process (logical models), for data flow (messages), and for data at rest (database design).
6. Clinical Data Interchange Standards Consortium (CDISC). CDISC is an open, multidisciplinary, non-profit organization committed to the development of worldwide industry standards to support the electronic acquisition, exchange, submission and archiving of clinical trials data and metadata for medical and biopharmaceutical product development.
7. HL7 (Health Level Seven). HL7 is a volunteer, ANSI-accredited Standards Developing Organization (SDO) that focuses on clinical and administrative healthcare data. Mission: "To provide standards for the exchange, management and integration of data that support clinical patient care and the management, delivery and evaluation of healthcare services. Specifically, to create flexible, cost effective approaches, standards, guidelines, methodologies, and related services for interoperability between healthcare information systems."
8. Bringing It All Together: RCRIM. The HL7 “Regulated Clinical Research Information Management” Technical Committee was formed as a collaboration of CDISC, FDA, and HL7. It aims to facilitate the development of common standards for clinical research information management across a variety of organizations, including government agencies, private research efforts, and sponsored research. It also aims to develop standards for interchange of regulated data that are interoperable with general healthcare standards.
9. HL7 Vocabulary (including RCRIM). Value sets are associated with certain domain portions of HL7 models. Most vocabulary domains are published as informative references only. Those domains that have a formal ballot status are shown in bold in the HL7 vocabulary tables on the HL7 web site. There are current initiatives to map these values to standard controlled terminologies.
10. HL7 Vocabulary - Access. HL7 publishes at [URL missing]. There are approximately 8,000 terms or “concepts” in the current HL7 vocabulary. Scroll down to select a specific “table” or set of terms. Also available through an NCI-developed “HL7 SDK” (software development kit) application tool. Conversion notes are included; see “HL7_Design.pdf” on NCI’s website.
11. What’s Happening Now? CDISC, RCRIM and NCI. The CDISC terminology group has established an independent working environment at NCI for the specification and development of broad-based clinical trials standard terminology, based on CDISC models (SDTM). The group uses the NCI Data Standards Repository (caDSR), which draws controlled terminology from NCI EVS systems, including but not limited to NCI Thesaurus resources for novel terminology development.
12. Collaboration. These open standards, developed in collaboration with FDA, NIH, HL7 and industry experts, can provide the basis for a controlled terminology set submitted to HL7 RCRIM as proposed standards for adoption by the clinical trials community.
13. Where? NCI Enterprise Vocabulary Services (EVS). Services and resources that address the controlled vocabulary needs of NCI and its partners. A collaboration between the NCI Office of Communications (Physician Data Query (PDQ), Clinical Trials Portal, Cancer Information Service and the NCI web portal) and the NCI Center for Bioinformatics (Bioinformatics Core Infrastructure (caCORE), including a metadata repository (caDSR) and object models built using EVS terminology for their core semantics). OC is using EVS and working to integrate EVS terminology into PDQ and cancer.gov.
14. NCI EVS Goal – Integration by Meaning. Clinical, translational, and basic research terminologies have overlapping but specialized needs, so EVS helps to integrate different conceptual frameworks and to create terminological and taxonomic conventions across systems. Vocabulary products: NCI Thesaurus, an ontology-like terminology; NCI Metathesaurus, which maps vocabularies; and external vocabularies maintained and served (MedDRA, HL7, NDF-RT, LOINC, etc.).
15. NCI Thesaurus (NCIt). Reference terminology for NCI and partners. A federal standard terminology. Broad coverage of the cancer, other research, and clinical domains, including prevention and treatment trials: neoplastic and other diseases; findings and abnormalities; anatomy, tissues, subcellular structures; agents, drugs, chemicals; genes, gene products, biological processes; animal models (mouse, other); research techniques and management, apparatus, clinical trials, lab, radiology, imagery. Neoplastic disease – clinical goes to the Molecular – findings and abnormalities
16. NCI Thesaurus (2). Published monthly. Public domain, open content license. Available on-line and by download (OWL, Ontylog XML, flat files). 55,000+ “Concepts”, hierarchically organized. Description-logic based. “Roles” establish machine-readable semantic relationships between Concepts. Kind == (disjoint classes).
Terms, Concepts, Category:
1,691, ,158 NCI Metathesaurus
100,000 34,000 NCI Thesaurus
10,521 Disease, Abnormality, Finding
(5,901) ... Neoplasm (within Disease)
3,531 Drug
2,773 Chemotherapy Regimens
4,320 Anatomy
1,767 Gene
2,200 Protein (Gene Product)
17. NCI Thesaurus is Deployed: (full documentation). API: caCORE public access. Fulfills NCI and collaborators’ needs for controlled vocabulary. Public domain, open content license.
18. Example Disease Concept: Gastric Mucosa-Associated Lymphoid Tissue Lymphoma. A low grade, indolent B-cell lymphoma, usually associated with Helicobacter pylori infection. Morphologically it is characterized by a dense mucosal atypical lymphocytic (centrocyte-like cell) infiltrate with often prominent lymphoepithelial lesions and plasmacytic differentiation. Approximately 40% of gastric MALT lymphomas carry the t(11;18)(q21;q21). Such cases are resistant to Helicobacter pylori therapy.
Molecular abnormalities:
  Disease_May_Have_Cytogenetic_Abnormality: Trisomy 3
  Disease_May_Have_Cytogenetic_Abnormality: Trisomy 18
  Role group 1:
    Disease_May_Have_Cytogenetic_Abnormality: t(11;18)(q21;q21)
    Disease_May_Have_Molecular_Abnormality: API2-MLT fusion protein expression
Histogenesis:
  Disease_Has_Normal_Cell_Origin: Post-germinal center marginal zone B-lymphocyte
Pathology:
  Disease_Has_Abnormal_Cell: Centrocyte-like cell
  Disease_May_Have_Abnormal_Cell: Neoplastic monocytoid B-lymphocyte
  Disease_May_Have_Abnormal_Cell: Neoplastic plasma cell
  Disease_May_Have_Finding: Lymphoepithelial lesion
Anatomy:
  Disease_Has_Primary_Anatomic_Site: Stomach
  Disease_Has_Normal_Tissue_Origin: Gut associated lymphoid tissue
Clinical information:
  Disease_May_Have_Finding: Indolent clinical course
  Disease_May_Have_Associated_Disease: Hepatitis C
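The role assertions on this slide are what make the concept machine readable. A minimal sketch of how such role/target pairs could be held and queried follows. The role names and targets are copied from the slide; the data structure, the function names, and the reading of "Has" versus "May_Have" as "always" versus "sometimes" are assumptions for illustration, not the NCI Thesaurus data model or API.

```python
from collections import defaultdict

# A subset of the (role, target) pairs shown on the slide for
# Gastric Mucosa-Associated Lymphoid Tissue Lymphoma.
ROLES = [
    ("Disease_May_Have_Cytogenetic_Abnormality", "Trisomy 3"),
    ("Disease_May_Have_Cytogenetic_Abnormality", "Trisomy 18"),
    ("Disease_May_Have_Cytogenetic_Abnormality", "t(11;18)(q21;q21)"),
    ("Disease_Has_Abnormal_Cell", "Centrocyte-like cell"),
    ("Disease_May_Have_Abnormal_Cell", "Neoplastic plasma cell"),
    ("Disease_Has_Primary_Anatomic_Site", "Stomach"),
    ("Disease_May_Have_Associated_Disease", "Hepatitis C"),
]


def index_roles(pairs):
    """Group role targets by role name, preserving the order they were given in."""
    index = defaultdict(list)
    for role, target in pairs:
        index[role].append(target)
    return dict(index)


def split_by_certainty(index):
    """Separate 'Has' roles from 'May_Have' roles by name (an assumed convention)."""
    always = {r: t for r, t in index.items() if "_May_Have_" not in r}
    sometimes = {r: t for r, t in index.items() if "_May_Have_" in r}
    return always, sometimes


role_index = index_roles(ROLES)
always, sometimes = split_by_certainty(role_index)
```

With the roles indexed this way, a question such as "which diseases have Stomach as primary anatomic site" becomes a lookup rather than a free-text search.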
19. A holistic view of information exchange also requires broader interoperability, but where do we place the fences? Clinical data, regulatory submissions, discovery research? Industry agreements, nationally accredited, global standardization? One answer is mapping: Relating Terminologies for Effective Data Exchange.
20. Mapping: NCI Metathesaurus. A filtered version of the NLM UMLS Metathesaurus, extended with additional required vocabularies. 1,100,000 concepts, 2,200,000+ terms and phrases with definitions. Mappings among over 55 vocabularies. Extensive synonymy: over 40,000 terms for neoplasms mapped to 7,000 concepts. Used as an online dictionary and thesaurus, and for mapping and document indexing. The current numbers are a little lower than this, 1.6 M, but they will be somewhat higher after our next build (will include CT).
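The synonymy idea (many terms resolving to one concept) can be sketched as a normalized term index. The concept identifiers below are hypothetical placeholders, the third synonym is invented for illustration, and the normalization (case folding and whitespace collapsing) is an assumption; the Metathesaurus's actual matching rules are not described in these slides.

```python
# Hypothetical concept identifiers mapped to their terms (illustrative only).
CONCEPT_TERMS = {
    "CONCEPT_0001": [
        "Gastric Mucosa-Associated Lymphoid Tissue Lymphoma",
        "Gastric MALT lymphoma",
        "MALT lymphoma of the stomach",  # invented synonym for the example
    ],
    "CONCEPT_0002": ["Aerosol Dosage Form", "Aerosol", "AER"],
}


def normalize(term):
    """Fold case and collapse runs of whitespace so trivial variants match."""
    return " ".join(term.casefold().split())


def build_index(concept_terms):
    """Build a term -> concept id lookup from a concept id -> terms table."""
    index = {}
    for concept_id, terms in concept_terms.items():
        for term in terms:
            index[normalize(term)] = concept_id
    return index


def lookup(term, index):
    """Return the concept id for a term, or None if the term is unknown."""
    return index.get(normalize(term))


TERM_INDEX = build_index(CONCEPT_TERMS)
```

Two documents indexed under different synonyms then land on the same concept identifier, which is what makes mapping and document indexing across vocabularies possible.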
21. NCI Metathesaurus (2). Minor releases monthly, major releases two to three times a year. Provides a mapped overlap and partial inter-relation of current versions of the vocabularies required by NCI and its partners, e.g. the ICDs, MedDRA, SNOMED, MeSH (NLM Medical Subject Headings), HCPCS (procedures), LOINC (lab values), and drug terminologies (VA NDF-RT, AOD, RxNorm, Multum, NCI Thesaurus drugs, etc.).
23. EVS Products & Services Are Open. NCI Thesaurus is open content (ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusTermsofUse.htm). NCI Metathesaurus is mostly open source; see each source’s license. NCI EVS servers are freely accessible. On the Web: [URL missing]. Via API: [URL missing]. All software developed by NCI EVS is public open source and free for the asking: [URLs missing].
24. NCI builds on EVS via caCORE Infrastructure. Enhanced information integration. Cross-discipline reasoning capabilities. (Diagram labels: biomedical objects, common data elements, controlled vocabulary.)
25. Enterprise Vocabulary. NCI Metathesaurus (cross-mapped standard vocabularies, e.g. ICDs, MedDRA, SNOMED): semantic integration, inter-vocabulary mapping among 55+ vocabularies; UMLS Metathesaurus extended with numerous additional vocabularies; 1,100,000+ Concepts, 2,200,000 terms and phrases. NCI Thesaurus: description logic-based; 55,000+ “Concepts”; the Concept is the semantic unit; one or more terms describe a Concept (synonymy); semantic relationships between Concepts. Freestanding terminologies: MedDRA, MGED, NDF-RT, GO, SNOMED, etc.
26. Common Data Elements (caDSR). Structured data reporting elements. Precisely defined, harmonized questions and answers: standardized questions for forms, and standard lists of coded valid values for answers.
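A common data element pairs a standardized question with a closed list of coded valid values. The sketch below shows that pairing and a validation check. The class, the question text, and the codes are hypothetical and are not drawn from caDSR.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """A standardized question plus its coded valid values (code -> display label)."""
    question: str
    valid_values: dict = field(default_factory=dict)

    def is_valid(self, code):
        """True only if the answer is one of the coded valid values."""
        return code in self.valid_values

    def label(self, code):
        """Return the display label for a valid code, or raise for anything else."""
        if not self.is_valid(code):
            raise ValueError(f"{code!r} is not a valid value for {self.question!r}")
        return self.valid_values[code]


# Hypothetical element and codes, for illustration only.
SEX_ELEMENT = DataElement(
    question="Sex of participant",
    valid_values={"F": "Female", "M": "Male", "U": "Unknown"},
)
```

Because the answer list is closed, a form built on this element rejects free-text answers such as "female" at data entry, rather than leaving the cleanup to whoever aggregates the data later.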
27. Biomedical Information Objects (caBIO). UML object models representing clinical and research entities such as genes, sequences, chromosomes, pathways, etc. Public access APIs provide an information interface independent of back-end data platforms.
28. Controlled Terminology is integrated into NCI’s standards-supporting infrastructure. Enterprise Vocabulary Services (EVS): core semantics for caCORE and many other applications; public access browsers; APIs. cancer Data Standards Repository (caDSR): ISO metadata repository; Common Data Elements (CDEs) for multiple templates, such as Case Report Forms, drawn from EVS terminology. cancer Bioinformatics Infrastructure Objects (caBIO): UML models annotated with EVS concepts/terms, loadable into caDSR; public access APIs.
29. EVS: Extending Interoperability Beyond the Enterprise. Leverage collaborations. Federal: FDA, VA, CDC, other NIH Institutes. Major standards organizations: HL7, CDISC, W3C. Cancer centers and cooperative groups (caBIG, caGRID). Many research collaborators, such as the Microarray Gene Expression Data Society (MGED).
30. FDA-NCI MOU: Significance of MOU. NCI is leveraging its terminology-related resources to address FDA needs. Avoids expenditure at FDA to replicate existing, available resources at NCI, and increases return on investment for NIH/NCI. Leverages multiple efforts. FDA collaboration with NIH/NCI will result in improved trial drug and related regulatory terminology for the broader clinical trials community. FDA and NCI are to coordinate regarding terminology standards efforts such as HL7 RCRIM (including CDISC).
31. Example: NCI EVS and FDA SPL. NCI EVS maintains and provides access to FDA SPL terminology. NCI Thesaurus will be a primary namespace used. Also FDA standard terminology for the ICSR, IND/NDA, device nomenclature, and others. Access via: download at ftp://ftp1.nci.nih.gov/pub/cacore/EVS/; public, open API; web servlet at [URL missing].
32. Concept Details
URI:dictionary=NCI_Thesaurus&code=C42887
Version: December 30, 2004 (04.12g)
Aerosol Dosage Form
Identifiers:
  name: Aerosol_Dosage_Form
  code: C42887
Information about this concept:
  Preferred_Name: Aerosol Dosage Form
  Semantic_Type: Manufactured Object
  DEFINITION: FDA|A product that is packaged under pressure and contains therapeutically active ingredients that are released upon activation of an appropriate valve system; it is intended for topical application to the skin as well as local application into the nose (nasal aerosols), mouth (lingual aerosols), or lungs (inhalation aerosols).
  Synonym with source data: AER|AB|FDA_CDER|246
  Synonym with source data: Aerosol Dosage Form|PT|NCI
  Synonym with source data: Aerosol|PT|FDA|246
  Synonym: AER
  Synonym: Aerosol
  Synonym: Aerosol Dosage Form
  Synonym: Aerosol Dose Form
Superconcepts: Pharmaceutical Dosage Form
Subconcepts: Aerosol Foam Dosage Form, Aerosol Spray Dosage Form, Metered Aerosol Dosage Form, Powder Aerosol Dosage Form
This indicates the concept is used in the FDA Structured Product Label (SPL)
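The "Synonym with source data" entries on this slide are pipe-delimited. A minimal sketch of splitting them into fields follows. The field order (term, term type, source, optional source code) is inferred from the three examples shown and is an assumption, as are the type and function names; this is not an NCI EVS API.

```python
from typing import NamedTuple, Optional


class SourcedSynonym(NamedTuple):
    term: str
    term_type: str              # e.g. "PT" (preferred term) or "AB" as shown on the slide
    source: str                 # e.g. "NCI", "FDA", "FDA_CDER"
    source_code: Optional[str]  # present in some entries only, e.g. "246"


def parse_synonym(raw):
    """Split a 'term|type|source[|code]' string into named fields."""
    parts = raw.split("|")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 pipe-delimited fields, got {len(parts)}: {raw!r}")
    term, term_type, source = parts[:3]
    source_code = parts[3] if len(parts) == 4 else None
    return SourcedSynonym(term, term_type, source, source_code)


# The three entries from the slide.
SLIDE_SYNONYMS = [
    parse_synonym("AER|AB|FDA_CDER|246"),
    parse_synonym("Aerosol Dosage Form|PT|NCI"),
    parse_synonym("Aerosol|PT|FDA|246"),
]

# Terms that some source marks as its preferred term.
PREFERRED = {(s.source, s.term) for s in SLIDE_SYNONYMS if s.term_type == "PT"}
```

Keeping the source alongside each term is what lets one concept carry a different preferred term for each contributing organization.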
33. A Vital Collaboration: CDISC and NCI. Shared models, metadata standards, and core semantics drawn from standard terminology. The CDISC terminology group is working with NCI tools through EVS for the specification and development of broad-based clinical trials standard terminology, based on CDISC models. CDISC is using the NCI Data Standards Repository and controlled terminology from NCI EVS, including but not limited to NCI Thesaurus, for novel terminology development. These open CDISC standards, developed in collaboration with FDA, NIH, HL7 and others, can provide the basis for a controlled terminology set that can be adopted across the clinical trials community.
35. NCI Thesaurus Concept: Race. Terminology concept for Race showing harmonization across different users, including CDISC, NCI, CDC, etc. This first slide is a screen shot from an NCI Thesaurus search on the terminology concept RACE. The middle area shows the results of the search. The yellow area depicts what the CDISC tag for this concept will look like and how the CDISC term for RACE appears in the terminology. So, RACE is a CDISC preferred term (PT). You can also see the harmonization of this term with other major stakeholders, and that RACE is a shared terminology concept with other context users such as NCI, CDC, etc. It could also be FDA, etc. If we referred back to our example of SEX, the GENDER context would likely appear here as well, as another preferred term. Also, for this term there is a direct link to the UMLS concept identifier. The parent/child relationships are listed at the bottom of the form.
36. Benefits of Terminology Development in a Common Environment: A Step Towards Semantic Interoperability. Support and maintenance of terminologies in NCI EVS provides access to and common usage of standard terminologies. Enables use of controlled terminology by clinicians and researchers for data encoding, retrieval, reporting, and aggregation. Facilitates collaboration and information exchange by increasing the ability to predictably use information that is gathered. Leverages the power of shared knowledge. Controlled vocabulary: A restricted set of preferred terms used within an organization for a given purpose. (van Bemmel & Musen) A controlled vocabulary should be a restricted set of terminological phrases used within an organization for a given purpose in a specific subject field…. should provide unambiguous transformation tables towards relevant coding systems. (CEN) Controlled vocabularies provide significant advantages: non-ambiguous representation of concepts, lossless data transformation, facilitation of mapping among terminologies, and data re-use in different contexts.
37. You can collaborate. Joint participation: in standards groups such as HL7 RCRIM, in order to inform relevant standards decisions. Joint development: contributing to clinical trials standard terminology development efforts, e.g. through the CDISC terminology group. Providing validation and testing: content and modeling developed with industry input is more robust and better able to meet your needs, and you can better plan for and anticipate implementation impacts on your organization.
38. Participate in HL7 RCRIM. The HL7 “Regulated Clinical Research Information Management” Technical Committee was formed as a collaboration of CDISC, FDA, and HL7. It aims to facilitate the development of common standards for clinical research information management across a variety of organizations, including government agencies, private research efforts, and sponsored research. It also aims to develop standards for interchange of regulated data that are interoperable with general healthcare standards.
39. Contact: Margaret W. Haber, RN, OCN, Co-Director, NCI Enterprise Vocabulary Services, NCI Office of the Director.