ECO R European Centre for Ontological Research Application of Ontology in Cancer Bioinformatics. Dr. Werner Ceusters, MD Executive Director European Centre for Ontological Research Saarland University Saarbrücken, Germany
11th World Conference on Medical Informatics (MEDINFO 2004), San Francisco, 7-11 September 2004
759 papers:
– 48 contain the word "bioinformatics"
– 124 contain "cancer"
– 1 contains "cancer bioinformatics"
– But: about 50 deal with cancer bioinformatics
– 89 contain "ontology"

Ontology-related cancer bioinformatics at MEDINFO 2004
– A Log Likelihood Predictor for Genomic Classification of Oral Cancer using Principle Component Analysis for Feature Selection
– Methods for Multi-Category Cancer Diagnosis from Gene Expression Data: A Comprehensive Evaluation to Inform Decision Support System Development
– A Text Mining Approach to Enable Detection of Candidate Risk Factors
– Cancer-related Complementary and Alternative Medicine Online: Factors Affecting Information Retrieval (by patients)
– Development of the ICNP based cancer nursing information system
– NCI Thesaurus: Using Science-Based Terminology to Integrate Cancer Research Results
– Extraction of Diagnosis Related Terminological Info from Discharge Summary
– Automated Clinical Annotation of Tissue Bank Specimens
– Mining OMIM for Insight into Complex Diseases
– A new parameter enhancing breast cancer detection in computer aided diagnosis of X-ray mammograms
– Tools for the Performance of Clinical Trials Research
– Formal Representation of Medical Goals for Medical Guidelines
– Using Internet Survey Among Cancer Patients

Goals of Cancer Bioinformatics
To integrate molecular, biological and clinical knowledge about cancer with analytic methods from bioinformatics. The ultimate aim is to create comprehensive prognostic and predictive models as aids to diagnosis, treatment and the design of new therapeutics.

Task descriptions (number of tasks per category)
Sequence similarity searching:
– Nucleic acid vs nucleic acid: 28
– Protein vs protein: 39
– Translated nucleic acid vs protein: 6
– Unspecified sequence type: 29
– Search for non-coding DNA: 9
Functional motif searching: 35
Sequence retrieval: 27
Multiple sequence alignment: 21
Restriction mapping: 19
Secondary and tertiary structure prediction: 14
Other DNA analysis including translation: 14
Primer design: 12
ORF analysis: 11
Literature searching: 10
Phylogenetic analysis: 9
Protein analysis: 10
Sequence assembly: 8
Location of expression: 7
Miscellaneous: 7
Total: 315
Stevens R, Goble C, Baker P, and Brass A. A Classification of Tasks in Bioinformatics. Bioinformatics 2001; 17(2):180-188.

Three major challenges
Analyse massive amounts of data:
– E.g. high-throughput technologies based on cDNA or oligonucleotide microarrays for analysis of gene expression, analysis of sequence polymorphisms and mutations, and sequencing
Appropriately link clinical histories to molecular or other biomarker data generated by genomic and proteomic technologies.
Develop user-friendly computer-based platforms:
– that can be accessed and used by the average researcher for searching, retrieval, manipulation, and analysis of information from large-scale datasets

Words of Wisdom
Ontology is too often not taken seriously, and only a few people understand that. But there is hope:
– The promise of Web Services, augmented with the Semantic Web, is to provide THE major solution for integration, the largest IT cost / sector, at $500 BN/year.
– The Web Services and Semantic Web trends are heading for a major failure (i.e., the most recent Silver Bullet).
– In reality, Web Services, as a technology, is in its infancy.... There is no technical solution (i.e., no basis) other than fantasy for the rest of the Web Services story.
– Analyst claims of maturity and adoption (...) are already false....
– Verizon must understand it so as not to invest too heavily in technologies that will fail or that will not produce a reasonable ROI.
Dr. Michael L. Brodie, Chief Scientist, Verizon IT. OntoWeb Meeting, Innsbruck, Austria, December 16-18, 2002

Setup of this presentation
– Look at some popular views, statements, claims, systems, beliefs,... about ontology, and indicate where and how they fail to do justice to what ontology is actually about;
– Explain the basics of the principled approach that we use and give examples of practical applications;
– Some comments on the future of ontology in Buffalo and the US.

Data Integration approaches
1. Data Warehousing: Data from various data sources are converted, merged and stored in a centralized DBMS. Example: Integrated Genomic Database
2. Hyperlinking approaches: Links are set up between related information and data sources. Examples: SRS, Entrez (NCBI)
3. Standardization: Efforts which address the need for a common metadata model for various application domains.
4. Integration systems: Systems that can gather and integrate information from multiple sources. Some of these systems have a Mediator-Wrapper architecture; others are language-based systems like Bio-Kleisli.
5. Federated Database: Cooperating, yet autonomous, databases map their individual schemas to a single global schema. Operations are performed against the federated schema.
Steve Brady, System Integration approaches

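The federated approach in item 5 can be sketched in a few lines. This is a minimal illustration only, assuming two hypothetical local sources with their own field names and a hand-written mapping to one global schema; the source names, field names and records are all invented for the example.

```python
# Hypothetical local sources, each keeping its own schema (all names invented).
SOURCE_A = [{"gene_symbol": "TP53", "chrom": "17"}]
SOURCE_B = [{"symbol": "BRCA1", "chromosome_no": "17"}]

# Per-source mapping from local field names to the global (federated) schema.
MAPPINGS = {
    "A": {"gene_symbol": "gene", "chrom": "chromosome"},
    "B": {"symbol": "gene", "chromosome_no": "chromosome"},
}


def to_global(source_id, record):
    """Translate one local record into the global schema."""
    mapping = MAPPINGS[source_id]
    return {mapping[field]: value for field, value in record.items()}


def federated_query(chromosome):
    """Answer one query phrased against the global schema.

    The local sources are left untouched; only their records are translated.
    """
    results = []
    for source_id, records in (("A", SOURCE_A), ("B", SOURCE_B)):
        for record in records:
            row = to_global(source_id, record)
            if row["chromosome"] == chromosome:
                results.append(row["gene"])
    return sorted(results)
```

A mapping of this kind aligns field names only. Whether two fields in different sources really denote the same kind of entity is a separate question, and it is the one the remainder of the talk is concerned with.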
Data integration approaches
– Protein interaction databases
– Small molecule databases
– Genome databases
– Pathway databases
– Protein databases
– Enzyme databases
– Gene Ontology
at least, the beginnings of...

GO deals with basic ontological notions very haphazardly
GO's three main term-hierarchies are: component, function and process.
But GO confuses functions with structures, and also with executions of functions, and has no clear account of the relation between functions and processes.

A flavour of ontology

Mereo-topology
Relations shown in the diagram: HAS-PARTIAL-SPATIAL-OVERLAP, IS-TOPO-INSIDE-OF, IS-GEO-INSIDE-OF, IS-INSIDE-CONVEX-HULL-OF, IS-PARTLY-IN-CONVEX-HULL-OF, IS-OUTSIDE-CONVEX-HULL-OF, HAS-DISCONNECTED-REGION, HAS-EXTERNAL-CONNECTING-REGION, HAS-DISCRETED-REGION, HAS-TANG.-SPAT.-PART, HAS-NON-TANG.-SPAT.-PART, IS-SPAT.-EQUIV.-OF, IS-TANG.-SPAT.-PART-OF, IS-NON-TANG.-SPAT.-PART-OF, HAS-PROPER-SPATIAL-PART, IS-PROPER-SPAT.-PART-OF, HAS-SPATIAL-PART, IS-SPATIAL-PART-OF, HAS-OVERLAPPING-REGION, HAS-CONNECTING-REGION, HAS-SPATIAL-POINT-REFERENCE

caCORE: The NCICB Cancer Informatics Infrastructure Backbone
– cancer Bioinformatics Infrastructure Objects: biomedical objects to facilitate the communication and integration of information from the various initiatives supported by the NCICB
– cancer Data Standards Repository: metadata used for cancer research
– NCI Enterprise Vocabulary Services: standard vocabularies for a variety of settings in the life sciences

caBIO architecture
Connectivity at programming interface level, NOT content

CoMeDIAS (France)

GenesTrace™: Biological Knowledge Discovery via Structured Terminology

But... talking to each other does not mean understanding each other

Cancer Data Standards Repository (caDSR)
One of the problems confronting the biomedical data management community is the panoply of ways that similar or identical concepts are described.
Amen!? But it would be more appropriate to say:
– THE problem confronting the biomedical data management community is that concepts are described.

Triadic models of meaning: The Semiotic/Semantic triangle
– Sign: Language / Term / Symbol
– Referent: Reality / Object
– Reference: Concept / Sense / Model / View

Ontology
In Information Science:
– An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.
In Philosophy:
– Ontology is the science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.

Why are concepts not enough?
Why must our theory also address the referents in reality?
– Because referents are observable fixed points in relation to which we can work out how the concepts used by different communities relate to each other;
– Because only by looking at referents can we establish the degree to which concepts are good for their purpose.

NCI Enterprise Vocabulary Services environment

NCI Thesaurus
– a biomedical thesaurus created specifically to meet the needs of the NCI
– semantically modeled cancer-related terminology
– built using description logic

Why description logics are not enough
SNOMED-RT (2000), SNOMED-CT (2003)

Underspecification
Figure labels: new-1, new-2

Use of description logics does not guarantee correct representations!

It's not just a problem in healthcare
Ontologies for Legal Information Serving and Knowledge Management. Joost Breuker, Abdullatif Elhag, Emil Petkov and Radboud Winkels

Ontology versus Description Logics
In the Description Logic world:
– terms and definitions come first,
– the job is to validate them and reason with them
In the realist ontology world:
– robust ontology (with all its reasoning power) comes first
– and terms and term-hierarchies must be subjected to the constraints of ontological coherence

Search for cancer

NCI Thesaurus Root concepts
Anatomic Structure, Anatomic System, or Anatomic Substance?
– Or? Does the NCI not know to which category any item classified there belongs?
Anatomic Substance?
– If yes, why is gene product not subsumed by it?
– If no, why are drugs and chemicals not subsumed by it?

Conceptual entity
Definition: none
Semantic type:
– Conceptual entity
– Classification
Subconcepts:
– Action: Definition: action; a thing done
– And: Definition: an article which expresses the relation of connection or addition, used to conjoin a word with a word,...
– Classification: Definition: the grouping of things into classes or categories

Definition of cancer gene

NCI Thesaurus architecture
Diagram labels: Disease, Breast, Breast neoplasm, Disease-has-associated-anatomy, ISA, Findings-And-Disorders-Kind, Anatomy-Kind
– Formal subsumption or inheritance
– Associative relationships providing differentiae
– Kinds restrict the domain and range of associative relationships
What diseases have a diameter of over 3 cm?

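How kinds restrict the domain and range of an associative relationship can be illustrated with a small check. This is a sketch under stated assumptions, not the NCI Thesaurus implementation: the ISA link and the kind assignments below are read off the diagram labels and may not match the original figure exactly, and the function names are hypothetical.

```python
# ISA link assumed from the diagram labels (child -> parent).
IS_A = {"Breast neoplasm": "Disease"}

# Assumed assignment of top-level concepts to kinds.
KIND_OF = {
    "Disease": "Findings-And-Disorders-Kind",
    "Breast": "Anatomy-Kind",
}

# Each associative relationship declares the kinds allowed as (domain, range).
RELATION_KINDS = {
    "Disease-has-associated-anatomy": ("Findings-And-Disorders-Kind", "Anatomy-Kind"),
}


def kind(concept):
    """Walk up the ISA chain until a concept with a declared kind is reached."""
    while concept not in KIND_OF:
        concept = IS_A[concept]  # KeyError if no kind is reachable
    return KIND_OF[concept]


def is_well_formed(subject, relation, obj):
    """True if subject and object fall in the relation's domain and range kinds."""
    domain, range_ = RELATION_KINDS[relation]
    return kind(subject) == domain and kind(obj) == range_
```

A check of this kind constrains which concepts may be linked by a relationship. It does not by itself settle whether a question such as "What diseases have a diameter of over 3 cm?" is meaningful.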
Problems with C - rel - C
Ad hoc readings of statements of the type C1-relationship-C2:
– Human has-part head // Human has-part finger
– California is-part-of United States // California isa name
– labial vein isa vein of head // labial vein isa vulval vein
Concepts do not necessarily correspond to something that exists, existed or will exist:
– Sorcerer, unicorn, leprechaun,...
Definitions set the conditions under which terms may be used, and may not be abused as conditions an entity must satisfy to be what it is.
Language can make strings of words look as if they were terms:
– Middle lobe of left lung

NCI Metathesaurus
– based on NLM's Unified Medical Language System Metathesaurus, supplemented with additional cancer-centric vocabulary
– a database of many biomedical terminologies, mapped where possible to NCI Thesaurus terms and shared conceptual meanings

NCI and Partner Data Sources
– SAGE Data (CGAP): NCI and Duke University SAGE experiment data
– Expression Measurements (NCICB GEDP): probe sets
– Sequence Trace Files (GAI): EST traces and full-length mRNA clone traces
– Genetic Annotation Initiative (GAI): SNPs
– Sequence Verified Clones (as of caBIO version 2.0) (NCICB internal pre-processed): human and mouse sequence-verified clone information
– Cancer Clinical Trials (NCI CTEP and PDQ): trials and drug agent information
– CMAP Annotation Data (CMAP): drug targets, anomalies
– Cancer Vocabulary (NCI): cancer-related terminology and concepts

External Data Sources
– Unigene (NCBI): human and mouse genes, sequences, map locations, clones, proteins and protein homologs
– Homologene (NCBI): human and mouse gene homologs
– LocusLink (NCBI): genes, gene ontologies, gene aliases, taxons
– RefSeq (NCBI): reference sequences
– EST Data (NCICB): tissue-specific expression level ESTs
– cDNA library information (NCICB): cDNA libraries for disease and tissue
– Human Genome via UCSC DAS server (UCSC): genomic sequences, annotations, and map coordinates
– BioCarta (BioCarta): pathways
– Gene Ontology: hierarchy of gene functions

Metathesaurus traps: UMLS example

IFOMIS: Institute for Formal Ontology and Medical Information Science
The Institute for Formal Ontology and Medical Information Science was founded in April 2002 as part of the Faculty of Medicine of the University of Leipzig, utilizing a grant of the Alexander von Humboldt Foundation. It comprises an interdisciplinary research group with members from Philosophy, Computer and Information Science, Logic, Medicine, and Medical Informatics. IFOMIS established itself as a center of theoretically grounded research in both formal and applied ontology. Its goal is to develop a formal ontology that will be applied and tested in the domain of medical and biomedical information science. In August 2004 IFOMIS moved its base of operations from Leipzig to Saarland University in Saarbrücken.
IFOMIS, Universität des Saarlandes, Postfach 151150, D-66041 Saarbrücken, Germany
Secretariat: Tel.: +49 (0)681-302-64770, Fax: +49 (0)681-302-64772

IFOMIS's long-term goal
Build a robust high-level BFO-MedO framework (THE WORLD'S FIRST INDUSTRIAL-STRENGTH PHILOSOPHY) which can serve as the basis for an ontologically coherent unification of medical knowledge and terminology

IFOMIS research in Formal Ontology
– Formal treatment of universals, individuals, endurants, perdurants, scales, functions, collections,...
– Universals / Concepts
– Mereology and topology
– Vagueness and granularity
– Applicability to domain ontologies, terminologies,...

Reference Ontology
– a theory of a domain of entities in the world
– based on realizing the goals of maximal expressiveness and adequacy to reality
– sacrificing computational tractability for the sake of representational adequacy

Basic Ontological Notions
– Identity: How are instances of a class distinguished from each other?
– Unity: How are all the parts of an instance isolated?
– Essence: Can a property change over time?
– Dependence: Can an entity exist without some others?

(Simplified) Logic of classes
Primitive:
– entities: particulars versus universals
– relation inst such that:
  – all classes are universals; all instances are particulars
  – some universals are not classes, hence have no instances: pet, adult, physician
  – some particulars are not instances, e.g. some mereological sums
Subsumption defined by resorting to instances:

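The formula that followed on this slide is not preserved in the transcript. One common instance-based reading, assumed here rather than taken from the slide, is that class A is subsumed by class B when every instance of A is also an instance of B. A minimal sketch under that assumption, with invented particulars and hypothetical function names:

```python
# inst(particular, class) facts; the particulars are invented examples.
INST = {
    ("tumour_1", "breast neoplasm"),
    ("tumour_1", "neoplasm"),
    ("tumour_2", "neoplasm"),
}


def instances(cls):
    """All particulars recorded as instances of cls."""
    return {particular for (particular, c) in INST if c == cls}


def is_a(a, b):
    """Assumed reading: a is_a b iff every instance of a is an instance of b.

    A class with no recorded instances is rejected, so that subsumption
    never holds vacuously (a design choice of this sketch).
    """
    inst_a = instances(a)
    return bool(inst_a) and inst_a <= instances(b)
```

Note that a check over a finite fact table is weaker than the intended definition, which quantifies over all instances in reality and not only over those that happen to be recorded.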
Basic Formal Ontology
Basic Formal Ontology consists of a series of sub-ontologies (most properly conceived as a series of perspectives on reality), the most important of which are:
– SnapBFO, a series of snapshot ontologies (O_ti), indexed by times: continuants
– SpanBFO, a single videoscopic ontology (O_v): occurrents
Each O_ti is an inventory of all entities existing at a time. O_v is an inventory (processory) of all processes unfolding through time.

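The split between time-indexed snapshot inventories and a single inventory of processes can be pictured with two small data structures. This is only an illustrative sketch: the class names, the integer time points and the example entities are assumptions made here, not part of BFO itself.

```python
from dataclasses import dataclass, field


@dataclass
class SnapOntology:
    """O_ti: inventory of the continuants existing at one time."""
    time: int
    continuants: set = field(default_factory=set)


@dataclass
class SpanOntology:
    """O_v: inventory of processes, each with the interval over which it unfolds."""
    processes: dict = field(default_factory=dict)  # name -> (start, end)

    def unfolding_at(self, time):
        """Names of the processes whose interval includes the given time."""
        return {
            name
            for name, (start, end) in self.processes.items()
            if start <= time <= end
        }


# One snapshot per time point (invented example data).
snap = {t: SnapOntology(t) for t in (0, 1, 2)}
snap[0].continuants |= {"patient", "cell"}
snap[1].continuants |= {"patient", "cell", "tumour"}
snap[2].continuants |= {"patient", "tumour"}

# A single span ontology covering the whole period.
span = SpanOntology({"cell division": (0, 1), "tumour growth": (1, 2)})
```

In this picture a continuant such as the tumour appears as a whole in each snapshot in which it exists, while a process such as tumour growth is never present in any single snapshot and is only recorded in the span inventory.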
Occurrents and continuants
Picture by Vladimir Brajic

Levels of granularity in biomedical ontology
Granularity level | Continuants | Occurrents
Population | environment | screening
Person | Race, age, disease, symptom | ADL, working, treatment, prevention
Organ | Liver, lung, organ part, sign | Heart beat, digestion, surgery
Tissue | Elasticity, Turgor, Strength | Resorption, protection
Cell | Bone cell, Alveolar cell, Cell size, bacterium | Phagocytosis, Cell growth, Reparation, hormone production
Subcellular | Cell membrane, Protein, DNA, Oncogene, Protooncogene, Virus, oncogenic molecule | Transcription, Splicing, Mutation, Gene regulation

Missed subsumption detection in SNOMED-CT
Missing: ISA neoplasm of heart

Correction of MGED's ontology upper part
MGEDOntology, MGEDCoreOntology: The MGED Ontology is a top level container for the MGEDCoreOntology and the MGEDExtendedOntology. The MGED ontology describes microarray experiments and is split into the MGEDCoreOntology, which supports MAGE-OM v1.0 and is organized consistently with MAGE, and the MGEDExtendedOntology, which expands MAGE v1.0 and contains concepts and relationships which are not included in MAGE.
Diagram labels: Cancer Site; SubClassOf; Primary site; Metastatic site; InstanceOf; BioMaterial Package; BioMaterial Characteristics; OrganismPart; DiseaseLocation; has_cancer_site; has-class; one-of
Definitions shown in the diagram:
– the organism part in which additional tumors are identified remote from the primary site
– Anatomical location(s) of disease.

Text mining and classification
Diagram labels: Having a healthcare phenomenon; Generalised Possession; Healthcare phenomenon; Human; Patient; Cancer patient; Malignant neoplasm; lung carcinoma; relations IS-A, Has-possessor, Has-possessed, Is-possessor-of, Has-Healthcare-phenomenon
Example sentence: Mr. Smith has a pulmonary carcinoma

The near future: International Cancer Ontology Project
Healthcare Informatics call, 6th Framework Programme (FP6) of the EU
Applying realist ontology to:
– Connect relevant databases for combatting cancer, covering all levels of granularity (from molecules to entire patients)
– at a deep semantic level
– independent of the data format (text, structured, coded,...)

Knowledge discovery and use

Towards US-based XCORs
BCOR: Buffalo Centre for Ontological Research
NCOR: National Centre for Ontological Research
– Involving Stanford
Introducing realist ontology (as a sound analytical philosophical discipline) to improve ontologies (as representations).