1 The OBO Foundry A Gold Standard Approach to Ontology Evaluation Barry Smith
2 Two types of ontology natural-science ontologies capture terminology-level knowledge underlying the best current science contrasted with administrative ontologies (e.g. billing ontologies, bloodbank ontologies, lab workflow ontologies) prepared for specific, local purposes
3 scientific ontologies have special features Every term in a scientific ontology must be such that the developers of the ontology believe it to refer to some entity on the basis of the best current evidence scientific ontologies are realism-based
4 For scientific ontologies reusability is crucial compatibility with neighboring scientific ontologies it is generalizations that are important = universals, types, kinds
5 An ontology is a representation of universals We learn about universals in reality from looking at the results of scientific experiments in the form of scientific theories experiments relate to what is particular science describes what is general
6 what is the difference between an ontology and a scientific theory? an ontology is also a terminological standardization WHAT DOES THIS MEAN?
7 1st aspect: additivity cell = def. plant cell, consisting of protoplast and cell wall;... [Plant Ontology] what happens when the users of the Plant Ontology need to consider bacterial pathogens in plants?
8 2nd aspect: calibration with reality gold standard kilogram the same universal is defined by reference either to some artifact or to some universal physical constant (for realists there is no problem here)
9 VIM: the International Vocabulary of Metrology (i) repeated measurements always give rise to some variation in values, (ii) one can never be sure (fallibilism) that one has got the true value, Hence: (iii) there are no true values. To keep happy those who dismiss the notion of the true value, the international community is agreeing to a set of terms which intentionally allow two possible interpretations. Once again: bad philosophy leads to bad standards. Compare: http://ontology.buffalo.edu/medo/Wuesteria.pdf
10 from: The NIST Reference on Constants, Units and Uncertainty The creation of the decimal Metric System at the time of the French Revolution and the subsequent deposition of two platinum standards representing the meter and the kilogram, on 22 June 1799, in the Archives de la République in Paris can be seen as the first step in the development of the present International System of Units.
11 from: The NIST Reference on Constants, Units and Uncertainty In the 1860s Maxwell and Thomson formulated the requirement for a coherent system of units with base units and derived units. In 1874 the British Association for the Advancement of Science introduced the CGS system, a three-dimensional coherent unit system based on the three mechanical units centimeter, gram and second, using prefixes ranging from micro to mega to express decimal submultiples and multiples. The following development of physics as an experimental science was largely based on this system.
13 Base and Derived Units Units based on undefined SI dimensions: meter, second, kilogram, ampere, candela, kelvin, mole. Units based on defined SI dimensions: volume, area, velocity, acceleration, newton, joule, pascal, coulomb, farad, henry, hertz, lumen, lux, ohm, etc. Dimensions can be multiplied and divided (meters/second).
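The remark that dimensions can be multiplied and divided can be sketched in Python as follows. This is only an illustration: the `Dimension` class and its exponent encoding are assumptions of the sketch, not part of the slides or of the SI documents, and only three of the seven base dimensions are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """Exponents of (length, mass, time); a hypothetical encoding for illustration."""
    length: int = 0
    mass: int = 0
    time: int = 0

    def __mul__(self, other):
        # Multiplying two dimensions adds their exponents.
        return Dimension(self.length + other.length,
                         self.mass + other.mass,
                         self.time + other.time)

    def __truediv__(self, other):
        # Dividing two dimensions subtracts their exponents.
        return Dimension(self.length - other.length,
                         self.mass - other.mass,
                         self.time - other.time)


# Base dimensions (undefined, in the slide's terms).
LENGTH = Dimension(length=1)
MASS = Dimension(mass=1)
TIME = Dimension(time=1)

# Derived dimensions (defined from the base ones).
VELOCITY = LENGTH / TIME           # meters/second
ACCELERATION = VELOCITY / TIME     # meters/second^2
FORCE = MASS * ACCELERATION        # dimension of the newton
```

Encoding a dimension as a vector of exponents is one common design choice; it makes "derived" literally mean "computed from the base dimensions".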
14 The SI System of Units is a qualitative ontology: it captures qualitative dimensions of reality to which quantities can be applied (it captures measurable dimensions of reality) there is a degree of conventionality in the choice of basic vs. derived units, and in the standard [e.g. the Paris meter] that is used to define the unit in each dimension
15 but the dimensions themselves exist independently of our conventions so that an ontology of these dimensions is a true representation of an independently existing reality
16 Quantities are Universals Ingvar Johansson: Many different things can simultaneously have a mass of 5kg (length of 4m, etc.). Determinate quantities are universals, which means that they have many instances
17 Units Ontology developed in conjunction with PATO, the Phenotypic qualities ontology obo.sourceforge.net/cgi-bin/detail.cgi?quality
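18 fiat subtypes of qualities [diagram: is_a hierarchy with nodes quality, spatial quality, length, weight, temperature, and fiat subtypes 1 mm, 1 cm, 1 g, 1 kg, …]
19 Representation of measurements [diagram: is_a hierarchy with nodes quality, spatial quality, length, weight, temperature; unit nodes mm, cm, kg, g; relation measurement_of]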
20 Ingvar Johansson: (a) no object can possibly at one and the same time take two values of the same quantity dimension (b) in case of additive quantities, only quantities of the same dimension can be added together to give rise to a sum: no material object can have two masses, and masses can only be added to other masses
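Johansson's two constraints can be sketched in Python as below. The `Quantity` and `MaterialObject` classes are hypothetical names introduced for illustration; the sketch assumes all magnitudes are expressed in the SI base unit of their dimension and ignores the temporal qualifier ("at one and the same time").

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Quantity:
    value: float      # magnitude, assumed to be in the SI base unit
    dimension: str    # e.g. "mass" or "length"

    def __add__(self, other):
        # Constraint (b): only quantities of the same dimension can be summed.
        if self.dimension != other.dimension:
            raise TypeError(f"cannot add {self.dimension} to {other.dimension}")
        return Quantity(self.value + other.value, self.dimension)


@dataclass
class MaterialObject:
    name: str
    # Constraint (a): keyed by dimension, so at most one value per dimension.
    quantities: dict = field(default_factory=dict)

    def assign(self, quantity):
        if quantity.dimension in self.quantities:
            raise ValueError(f"{self.name} already has a {quantity.dimension}")
        self.quantities[quantity.dimension] = quantity


brick = MaterialObject("brick")
brick.assign(Quantity(5.0, "mass"))
brick.assign(Quantity(4.0, "length"))

# Masses can be added to other masses.
total = Quantity(5.0, "mass") + Quantity(2.0, "mass")
```

Trying `Quantity(5.0, "mass") + Quantity(4.0, "length")` raises `TypeError`, and a second `brick.assign(Quantity(6.0, "mass"))` raises `ValueError`.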
21 Controlled vocabulary Each SI unit is represented by a symbol, not an abbreviation. The use of unit symbols is regulated by precise rules. These symbols are the same in every language of the world, even though the names of the units themselves vary in spelling according to national conventions.
22 The SI system of units gives you: a gold standard controlled vocabulary for the expression of scientific results, which makes these results comparable and integrable: my hypotheses can be checked against your data; my measuring equipment can be calibrated against your measuring equipment (because each can be calibrated against the same gold standard). The SI system of units can serve as a gold standard because it is a true reflection of an independent reality
23 a system of units is a legend for measurement data heartrate cadence speed torque power
24 compare: legends for maps
25 Creating a system of units is not easy; it has to match the way the measurable dimensions are interconnected in reality it may need to be revised in light of new discoveries about how reality is structured
26 after Maxwell and Thomson the subsequent development of physics as an experimental science was largely based on their system of standardized units.
27 analogous achievements also in chemistry IUPAC InChI and in molecular biology, for proteins, enzymes, genes, etc. IUBMB HUGO Gene Nomenclature Committee, etc.
28 Periodic Table
29 the goal of realist ontology to generalize this achievement –specifically in biology –and in medicine (where forces are at work which tend to thwart standardization of vocabulary) to move from standardizations of nouns to standardizations of sentences
gene expression data realist ontologies are legends for data
31 where in the body ? what kind of disease process ? need for semantic annotation of data in what kind of cell?
33 the Gene Ontology is already a de facto standard
34 natural language labels organized in a graph-theoretic structure, designed to make the data cognitively accessible to human beings, algorithmically accessible to machines, and linked up to other data resources because the same labels have been used
35 compare: legends for cartoons (for diagrams in scientific texts)
36 x_i = vector of measurements of gene i; k = the state of the gene (as on or off); θ_i = set of parameters of the Gaussian model ... ontologies are legends for mathematical equations
37 or chemistry diagrams Prasanna, et al. Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds PROTEINS: Structure, Function, and Bioinformatics 63:907–917 (2006)
38 annotation using common ontologies yields integration of databases MouseEcotope GlyProt DiabetInGene GluChem Holliday junction helicase complex
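How shared annotation yields integration can be sketched in Python as below. The database names and the term "Holliday junction helicase complex" come from the slide; the entry identifiers, the second term, and the `integrate` function are invented for illustration and do not describe any real database.

```python
from collections import defaultdict

# Hypothetical records: (entry identifier, ontology term used to annotate it).
mouse_ecotope = [("entry_A", "Holliday junction helicase complex")]
gly_prot = [
    ("entry_B", "Holliday junction helicase complex"),
    ("entry_C", "some other term"),
]


def integrate(**databases):
    """Index entries from several databases by the shared term annotating them."""
    index = defaultdict(list)
    for db_name, records in databases.items():
        for entry_id, term in records:
            index[term].append((db_name, entry_id))
    return dict(index)


linked = integrate(MouseEcotope=mouse_ecotope, GlyProt=gly_prot)
# Entries from different databases now meet under the same label.
hits = linked["Holliday junction helicase complex"]
```

No mapping step is needed here: the entries are linked simply because both databases used the same label.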
39 What is mapping (1) Given two ontologies A and B, mapping one ontology with another means that for each concept (node) in ontology A, we try to find a corresponding concept (node), which has the same or similar semantics, in ontology B and vice versa. M. Ehrig and Y. Sure, Ontology mapping - an integrated approach. In Proceedings of the First European Semantic Web Symposium, ESWS 2004, volume 3053 of Lecture Notes in Computer Science, pages 76–91, Heraklion, Greece, May Springer Verlag.
40 What is mapping (2) the task of relating the vocabulary of two ontologies in such a way that the mathematical structure of ontological signatures and their intended interpretations, as specified by the ontological axioms, are respected. [ontological signature = a hierarchy of concept symbols together with a set of relation symbols whose arguments are defined over the concepts of the concept hierarchy] Y. Kalfoglou and M. Schorlemmer, Ontology mapping: the state of the art. Knowl. Eng. Rev., 18(1): 2003.
41 What is mapping (3) a formal expression that states the semantic relation between two entities belonging to different ontologies. Simple examples are: concept c1 in ontology O1 is equivalent to concept c2 in ontology O2; concept c1 in ontology O1 is similar to concept c2 in ontology O2; individual i1 in ontology O1 is the same as individual i2 in ontology O2. P. Bouquet et al. KnowledgeWeb deliverable D Specification of a common framework for characterizing alignment.
42 One way to support ontology matching (and evaluation) have experts manually prepare for each given matching problem a gold standard to which matching efforts could be compared. –M. Ehrig and J. Euzenat, Relaxed Precision and Recall for Ontology Matching, in: Proc. K-Cap 2005 workshop on Integrating ontology, Banff (CA), p , 2005.
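Comparing a matching effort to a manually prepared gold standard can be sketched in Python as below. The sketch uses the standard (strict) definitions of precision and recall, not the relaxed measures proposed in the cited paper; the concept names and the `precision_recall` function are hypothetical.

```python
def precision_recall(found, reference):
    """Strict precision and recall of a proposed alignment against a gold standard.

    Both arguments are collections of (concept_in_A, concept_in_B) pairs.
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical gold standard prepared by experts.
gold = {
    ("A:cell", "B:Cell"),
    ("A:nucleus", "B:Nucleus"),
    ("A:lung", "B:Lung"),
}
# Hypothetical output of a matching tool: one right pair, one wrong pair.
proposed = {
    ("A:cell", "B:Cell"),
    ("A:nucleus", "B:Nucleolus"),
}

precision, recall = precision_recall(proposed, gold)
```

Here half of the proposed pairs are correct (precision 0.5) and one of the three gold-standard pairs was found (recall 1/3).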
43 Gold standard methodology for ontology evaluation is very expensive who are the experts? sometimes cannot be done for political reasons UMLS metathesaurus even a gold standard can contain errors
44 Solution: The OBO Foundry 1.some large pieces already exist (especially Gene Ontology, Foundational Model of Anatomy) 2.processes of unification and reform already in place 3.all participants aiming for additivity 4.procedures for constant update in light of scientific advance
45 The GO methodology of annotations science basis of the GO: trained experts curating peer-reviewed literature RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form Contrast: data-mining based approaches to ontology construction
46 Systematic annotation of references to gene products in literature leads to improvements and extensions of the ontology leads to better annotations leads to a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself
47 Five bangs for your GO buck science base cross-species database integration cross-granularity database integration through links to the entities in biological reality semantic searchability links people to software
48 a shared portal for (so far) 58 ontologies (low regimentation) NCBO BioPortal First step (2003)
50 Second step (2004) reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality
id: CL:
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:
relationship: develops_from CL:
relationship: develops_from CL:
GO + Cell type = New Definition
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
51 The OBO Foundry Third step (2006)
52 A prospective standard designed to guarantee interoperability of ontologies from the very start (contrast to: post hoc mapping) established March initial candidate OBO ontologies – focused primarily on basic science domains several being constructed ab initio by influential consortia who have the authority to impose their use on large parts of the relevant communities.
53 The OBO Foundry. Status labels on the slide: undergoing rigorous reform; new. Ontologies: GO (Gene Ontology), ChEBI (Chemical Ontology), CL (Cell Ontology), FMA (Foundational Model of Anatomy), PaTO (Phenotype Quality Ontology), SO (Sequence Ontology), CARO (Common Anatomy Reference Ontology), CTO (Clinical Trial Ontology), FuGO (Functional Genomics Investigation Ontology), PrO (Protein Ontology), RnaO (RNA Ontology), RO (Relation Ontology)
54 Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | | Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
55 to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation of new ontologies by each clinical research group REUSABILITY: if data-schemas are formulated using a single well-integrated framework ontology system in widespread use, then this data will to this degree itself become more widely accessible and usable GOALS The OBO Foundry
56 to serve as BENCHMARK FOR IMPROVEMENTS: once a system of interoperable reference ontologies is there, it will make sense to calibrate existing terminologies in its terms in order to achieve more robust alignment and greater domain coverage GOALS The OBO Foundry
57 Gold standard Two aspects: 1. an expression of practice carried out perfectly (for example, the optimal therapy for a given medical problem) 2. based on complete acceptance or consensus: everyone qualified to render a judgement would agree to what the gold standard is. Friedman CP, Wyatt J. Evaluation Methods in Medical Informatics
58 Gold standards are worth approximating. That is, tarnished or fuzzy standards are better than no standards at all.... studies comparing the performance of information resources against imperfect standards, so long as the degree of imperfection has been estimated, represent a stronger approach than those that bypass the issue of a standard altogether. Friedman CP, Wyatt J. Evaluation Methods in Medical Informatics
59 Gold standards can also be partial: to serve ontology matching and evaluation it is enough to have ontologies comprehending even selected aspects of biomedical reality, provided the assertions contained in these ontologies are universally true. In non-closed worlds, gold standards will always be partial; in complex disciplines, gold standards will always be evolving
60 the constraint of universality OBO Foundry ontologies accept only those relations between their terms which obtain universally (= for all instances): lung is_a anatomical structure; lobe of lung part_of lung. Compare: "electrons have a negative electric charge" with "electrons have a negative electric charge of 1.6 × 10^-19 coulomb"
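The universality constraint can be sketched in Python as below. The sketch assumes an "all-some" reading of a type-level relation (every instance of the first type stands in the relation to some instance of the second); the slide itself says only "for all instances". The instance names and the `holds_universally` function are invented for illustration, and checking a finite list of instances is a toy stand-in for a claim about all instances in reality.

```python
# Hypothetical instance-level data: instance name -> type it instantiates.
instance_of = {
    "lobe_1": "lobe of lung",
    "lobe_2": "lobe of lung",
    "lung_1": "lung",
    "cell_1": "cell",
    "cell_2": "cell",
}
# Hypothetical instance-level part_of facts: (part, whole).
part_of_pairs = {
    ("lobe_1", "lung_1"),
    ("lobe_2", "lung_1"),
    ("cell_1", "lung_1"),   # cell_2 is not part of any lung
}


def holds_universally(type_a, type_b):
    """True if every instance of type_a is part_of some instance of type_b."""
    instances_a = [i for i, t in instance_of.items() if t == type_a]
    return bool(instances_a) and all(
        any((a, b) in part_of_pairs and instance_of[b] == type_b
            for b in instance_of)
        for a in instances_a
    )


lobe_ok = holds_universally("lobe of lung", "lung")   # every lobe is part of a lung
cell_ok = holds_universally("cell", "lung")           # some cells are not
```

Under this reading "lobe of lung part_of lung" would be accepted, while "cell part_of lung" would be rejected because it holds only for some instances.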
61 Principle of Low Hanging Fruit Ontologies should include even absolutely trivial assertions (assertions you know to be universally true) herpes virus is_a virus Computers need to be led by the hand
62 if the standard is to work it has to simulate the achievements of the SI system of units: simple controlled vocabulary; wide acceptance; uncontroversial; allows cross-disciplinary, cross-experimenter calibration; my data can confirm or disconfirm your hypothesis