1 Ontology (Science) vs. Ontology (Engineering) Barry Smith University at Buffalo


1 Ontology (Science) vs. Ontology (Engineering)
Barry Smith, University at Buffalo
http://ontology.buffalo.edu/smith

2 Working in ontology since 1975
Working with biomedical ontologists since 2002
– Gene Ontology
– Protein Ontology
– Infectious Disease Ontology
– OBO (Open Biomedical Ontologies) Foundry

3 NCBO: National Center for Biomedical Ontology
Dissemination and Ontology Best Practices
http://bioontology.org

4 ICBO: International Conference on Biomedical Ontology
Buffalo, NY, July 24-26, 2009
http://icbo.buffalo.edu

5 Example ontologies
– Basic Formal Ontology (BFO)
– Common Anatomy Reference Ontology (CARO)
– Environment Ontology (EnvO)
– Foundational Model of Anatomy (FMA)
– Infectious Disease Ontology (IDO)
– Ontology for Biomedical Investigations (OBI)
– Ontology for Clinical Investigations (OCI)
– Phenotypic Quality Ontology (PATO)
– Relation Ontology (RO)

6 Multiple kinds of data in multiple kinds of silos
– Lab / pathology data
– Electronic Health Record data
– Clinical trial data
– Patient histories
– Medical imaging
– Microarray data
– Protein chip data
– Flow cytometry

7 How to find your data?
How to find other people’s data?
How to reason with data when you find it?
How to understand the significance of the data you collected 3 years earlier?
To solve the silo problem, medical researchers need the help of ontology engineers

8 Ontologies facilitate retrieval of data by allowing grouping of annotations
Annotations per term: brain 20, hindbrain 15, rhombomere 10
Query ‘brain’ without ontology: 20
Query ‘brain’ with ontology: 45
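The arithmetic on this slide can be sketched in a few lines of Python. The counts (20, 15, 10) come from the slide; the part_of edges from rhombomere to hindbrain to brain are an assumption about how the slide's terms are related, and the function names are illustrative rather than any real ontology API.

```python
from collections import defaultdict

# Annotation counts taken from the slide.
ANNOTATION_COUNTS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# Assumed part_of edges (child -> parent); implied by the slide, not stated.
PART_OF = {"hindbrain": "brain", "rhombomere": "hindbrain"}


def term_with_parts(term, part_of):
    """Return the term plus every term that is transitively part_of it."""
    children = defaultdict(list)
    for child, parent in part_of.items():
        children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def query_without_ontology(term):
    """Count only the annotations made directly to the query term."""
    return ANNOTATION_COUNTS.get(term, 0)


def query_with_ontology(term):
    """Count annotations to the term and to all of its parts."""
    return sum(ANNOTATION_COUNTS.get(t, 0) for t in term_with_parts(term, PART_OF))


print(query_without_ontology("brain"))  # 20
print(query_with_ontology("brain"))     # 45
```

The grouped query returns more data only because the hierarchy is available to the search; the annotations themselves are unchanged.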

9 Uses of ‘ontology’ in PubMed abstracts

10 Biologists need help from ontology engineers

11 MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES IPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVIS VMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVY TLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLER CHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKY GYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERL KRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRAC ALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVC KLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDD NNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGI SLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLK TLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPW MDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEY ATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSG TTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV How to do biology across the genome?

12 MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDR KRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTL SLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYM FLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRA CALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCAC TARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTR RIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDP NQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCS FSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEI YMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPV RNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQS QFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMF NLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVV WIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGG LCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIE RMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTAST NVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATT TESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTS ATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTN SNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSEN MNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEAL AVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTR GKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKG GVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSM LIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG RFDILLCRDSSREVGE 12

13 Gene Ontology: three types of questions
– what cellular component?
– what molecular function?
– what biological process?

14 [Figure: ontology graph with part_of and is_a relations. Source: Clark et al., 2005]

15 – what cellular component?
– what molecular function?
– what biological process?
and through curation of literature

16 The Idea of Common Controlled Vocabularies
Databases: MouseEcotope, GlyProt, DiabetInGene, GluChem
Shared term: sphingolipid transporter activity

17 The Idea of Common Controlled Vocabularies
Databases: MouseEcotope, GlyProt, DiabetInGene, GluChem
Shared term: Holliday junction helicase complex
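A minimal sketch of why a shared term matters for retrieval, in Python. The database names and the two term labels come from the slides; the record identifiers and their assignment to databases are invented for illustration.

```python
# Database names and term labels come from the slides.
# Record identifiers and which database holds which term are hypothetical.
ANNOTATIONS = {
    "MouseEcotope": {"rec-001": "sphingolipid transporter activity"},
    "GlyProt": {"rec-002": "Holliday junction helicase complex"},
    "DiabetInGene": {"rec-003": "sphingolipid transporter activity"},
    "GluChem": {"rec-004": "Holliday junction helicase complex"},
}


def find_records(term):
    """Return sorted (database, record_id) pairs annotated with this term."""
    hits = []
    for database, records in ANNOTATIONS.items():
        for record_id, label in records.items():
            if label == term:
                hits.append((database, record_id))
    return sorted(hits)


# One query reaches every database because all of them use the same label.
print(find_records("sphingolipid transporter activity"))
```

If each database used its own wording for the same activity, the exact-match lookup above would miss records, which is the situation a common controlled vocabulary is meant to avoid.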

18 Gene Ontology: ca. 25,000 nodes
Example term: male courtship behavior, orientation prior to leg tapping and wing vibration

21 Benefits of GO
1. rooted in basic experimental biology
2. links people to data and to literature
3. links data to data
– across species (human, mouse, yeast, fly...)
– across granularities (molecule, cell, organ, organism, population)

22 Benefits of GO
4. links medicine to biological science
5. allows cumulation of scientific knowledge in algorithmically tractable form
LET’S GENERALIZE THESE BENEFITS TO OTHER AREAS OF BIOLOGY AND MEDICINE …

23 The standard engineering methodology
Pragmatics (‘usefulness’) is everything
Usefulness = we get to write software which runs on our machines

24 The standard engineering methodology
It is easier to write useful software if one works with a simplified model
(“…we can’t know what reality is like in any case; we only have our concepts…”)
This looks like a useful model to me
(One week goes by:) This other thing looks like a useful model to him
Data in Pittsburgh does not interoperate with data in Vancouver

25 Pragmatics (‘usefulness’) is everything → Science is siloed

26 Why build scientific ontologies
There are many ways to create ontologies
Multiple ontologies only make our data silo problems worse
We need to constrain ontologies so that they converge

27 Science-based ontology development
Q: What is to serve as constraint in order to avoid silo creation?
A: Reality, as revealed, incrementally, by experimentally-based science

28 Ontological realism
Find out what the world is like by doing science
Ontology is ineluctably a multidisciplinary enterprise – it cannot be left to the engineers
Build representations adequate to this world, not to some simplified model in your laptop

29 In the olden days people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Portugal, varas of Texas, etc., etc.

30 On June 22, 1799, in Paris, everything changed

31 We now have the International System of Units

32 The SI is a Controlled Vocabulary
Each SI unit is represented by a symbol, not an abbreviation.
The use of unit symbols is regulated by precise rules.
The symbols are designed to be the same in every language.
Use of the SI system makes scientific results comparable

33 The SI is an Ontology
Quantities are universals, one for each measurable dimension of reality
Can we provide an analogue of the SI system for (basic dimensions of) biology?

34 First step
The OBO (Open Biomedical Ontologies) library comprehends some 70 ontologies
Now made available also on the NCBO Bioportal
The majority of these ontologies are built to work well with the Gene Ontology

35 Goal of the OBO Foundry
All biomedical research data should cumulate to form a single, algorithmically processable, whole
Smith, et al. Nature Biotechnology, Nov 2007

36 Goal of the OBO Foundry
To provide a suite of controlled structured vocabularies for the calibrated annotation of data to support integration and algorithmic reasoning across the entire domain of biomedicine
As biomedical knowledge grows, these ontologies must be evolved in tandem

37 The Gene Ontology within the OBO Foundry
Ontologies arranged by RELATION TO TIME (CONTINUANT: INDEPENDENT or DEPENDENT; OCCURRENT) and by GRANULARITY
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)

38 Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

39 OBO FOUNDRY CRITERIA
– The ontology is open and available to be used by all.
– The ontology is instantiated in a common formal language and shares a common formal architecture.
– The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.

40 OBO FOUNDRY CRITERIA
– The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
– They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
http://obofoundry.org

41 Orthogonality = modularity
– one ontology for each domain
– no need for semantic matching
– no need for ontology integration
– no need for mappings (which are in any case too expensive, too fragile, very difficult to keep up-to-date as mapped ontologies change)
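One way to picture "one ontology for each domain" is as a check that no term is claimed by more than one ontology. The sketch below is written in Python under stated assumptions: the term sets are tiny illustrative samples, "LocalMembraneOntology" is a hypothetical project-specific ontology, and the function is not part of any OBO Foundry tooling.

```python
from collections import defaultdict

# Illustrative term sets only; LocalMembraneOntology is hypothetical.
ONTOLOGIES = {
    "GO": {"plasma membrane", "nucleus"},
    "CL": {"neuron", "hepatocyte"},
    "LocalMembraneOntology": {"plasma membrane", "lipid raft"},
}


def overlapping_terms(ontologies):
    """Map each term claimed by more than one ontology to its claimants."""
    owners = defaultdict(list)
    for name, terms in ontologies.items():
        for term in terms:
            owners[term].append(name)
    return {term: sorted(names) for term, names in owners.items() if len(names) > 1}


# An empty result would mean the set of ontologies is orthogonal.
print(overlapping_terms(ONTOLOGIES))
```

Every overlap reported here is a place where a mapping would otherwise have to be written and maintained.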

42 Orthogonality is our best (perhaps our only) hope of solving the data silo problem
Why do computer engineers hate orthogonality so much (and like ‘relativism’: every project its own, new ontology)?

43 All OBO Foundry ontologies work in the same way
– we have data (biosample, haplotype, clinical data, survey data,...)
– we need to make this data available not just for string-based search but also for algorithmic processing
– we create a consensus-based ontology for annotating the data

44 We have data
– BioHealthBase: Tuberculosis Database
– VFDB: Virulence Factor DB
– TropNetEurop: Dengue Case Data
– BioHealthBase: Influenza Database
– PathPort: Pathogen Portal Project
– IMBB: Malaria Data

45 We need to annotate this data to allow retrieval and integration of
– sequence and protein data for pathogens
– case report data for patients
– clinical trial data for drugs, vaccines
– epidemiological data for surveillance, prevention
– ...
Goal: to make data deriving from different sources comparable and computable

46 We need common controlled vocabularies to describe these data in ways that will assure comparability and cumulation
What content is needed to adequately cover the infectious disease domain?
– Host-related terms (e.g. carrier, susceptibility)
– Pathogen-related terms (e.g. virulence)
– Vector-related terms (e.g. reservoir)
– Terms for the biology of disease pathogenesis (e.g. evasion of host defense)
– Population-level terms (e.g. epidemic, endemic, pandemic)

47 IDO provides a common template
It contains terms (like ‘pathogen’, ‘vector’, ‘host’) which apply to organisms of all species involved in infectious disease and its transmission
Disease- and organism-specific ontologies are then built as refinements of the IDO core – the common core guarantees some level of comparability of data
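The comparability that a common core provides can be sketched as follows, in Python. The three core terms come from the slide; the extension terms, the simple term-to-parent dictionaries, and the function names are hypothetical stand-ins for real disease-specific refinements of IDO.

```python
from collections import Counter

# Core terms named on the slide.
IDO_CORE = {"pathogen", "vector", "host"}

# Hypothetical extension terms, each mapped to the core term it refines.
MALARIA_TERMS = {"malaria pathogen": "pathogen", "malaria vector": "vector"}
DENGUE_TERMS = {"dengue pathogen": "pathogen", "dengue vector": "vector"}


def to_core(term, extension):
    """Resolve a core or extension term to the IDO core term it falls under."""
    if term in IDO_CORE:
        return term
    parent = extension.get(term)
    if parent not in IDO_CORE:
        raise ValueError(f"{term!r} does not refine an IDO core term")
    return parent


def count_by_core(annotations, extension):
    """Count annotations per core term, whatever extension they came from."""
    return Counter(to_core(term, extension) for term in annotations)


malaria = count_by_core(["malaria vector", "malaria pathogen", "host"], MALARIA_TERMS)
dengue = count_by_core(["dengue vector", "dengue vector"], DENGUE_TERMS)

# Counts from the two disease-specific data sets add up at the core level.
print(malaria + dengue)
```

A term that cannot be traced back to the core raises an error, which is the sketch's stand-in for the requirement that extensions be built as refinements rather than as free-standing vocabularies.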

48 IDO Processes

49 IDO Qualities

50 IDO Roles

51 IDO needs to work well with
– Gene Ontology Immunology Branch
– Phenotypic Quality Ontology (PATO)
– Protein Ontology (PRO)
– Sequence Ontology (SO)
– Environment Ontology (EnvO)
– ...

52 Ontology (Science)
Experimental results are being described in algorithmically useful ways with the help of ontologies like the GO
Such ontologies are authored and maintained by scientists to support the sharing, retrieval, integration and analysis of their data
Thesis: these ontologies are part of science.

53 Ontologies like the GO are part of science
They must be associated with computer implementations (with engineering artifacts)
But the ontologies are not themselves engineering artifacts
The same ontology can be associated with multiple engineering artifacts

54 Ontologies like the GO are comparable to
– scientific theories
– scientific databases
– scientific journal publications

55 Ontologies like the GO are being used by scientific journal publications
– to provide more useful access to article content via controlled structured keyword lists
– to provide a basis for creating formally structured versions of journal articles themselves

56 The OBO Foundry is working with journal publishers to advance orthogonality by creating a methodology for expert peer review of ontologies

57 A good solution to the silo problem must be:
– modular
– incremental
– bottom-up
– evidence-based
– revisable
It must also incorporate a strategy for motivating potential developers and users

58 Because the ontologies in the Foundry are built as orthogonal modules which form an incrementally evolving network
– scientists are motivated to commit to developing ontologies because they will need in their own work ontologies that fit into this network
– users are motivated by the assurance that the ontologies they turn to are maintained by experts

59 More benefits of orthogonality
– helps those new to ontology to find what they need
– to find models of good practice
– ensures mutual consistency of ontologies (trivially)
– and thereby ensures additivity of annotations

60 More benefits of orthogonality
– it rules out the sorts of simplification and partiality which may be acceptable under more pluralistic regimes
– it thereby brings an obligation on the part of ontology developers to commit to scientific accuracy and domain-completeness

61 More benefits of orthogonality
– orthogonality helps to eliminate redundancy
– serves the division of ontological labor: allows experts to focus on their own domains of expertise
– makes possible the establishment of clear lines of authority

62 Orthogonality is no different from one basic goal of science
It is a pillar of the scientific method that scientists should strive always to seek out and resolve conflicts between competing theories
Hence the importance of
– experimental tests
– comparability, e.g. via common units of measure
– expert peer review

63 Is there a problem with orthogonality?
What if I need my own ontology of cellular membranes to meet my own special purposes?
Strategy: application ontologies should be developed from the start using terms whose definitions employ the resources of orthogonal ontologies like those within the Foundry
Any other approach creates silos

64 The orthogonal reference ontologies do not have to be perfect
Consider the Ontology for Biomedical Investigations (http://obi.sourceforge.net).
Its goal is to provide controlled, structured representations for the design, protocols, instrumentation, materials, data and data analysis in biological and biomedical investigations of all types.
OBI is, like every other ontology, a work in progress – it is constantly subject to improvement

65 OBI serves multiple purposes imperfectly
OBI is an OBO Foundry reference ontology because some 20 major communities involved in high throughput experimentation recognize the advantages brought by creating a single ontology that can serve as a stable attractor and as basis for incremental improvement and expansion in the future

66 Better to have one consensus ontology serving multiple purposes imperfectly
because multiple ontologies addressing the same domain, whether they are good ones or bad ones, create silos

67 Goals of ontology (science)
1. to determine, empirically, the consensus core of ontology (science) – which high-level principles work best?
2. to train a community of ontology experts who will be in a position to apply and to extend this core in their scientific work.
3. to establish ontology development as being, like statistics, a recognized part of the scientific enterprise.

68 Goals of ontology (science)
4. to establish empirical methods of ontology evaluation
5. to establish a system of expert peer review for ontologies
6. to work with journals to institute publishing of peer-reviewed ontologies

69 Benefits of ontology peer review
1. will provide an impetus to the improvement of scientific knowledge over time
2. brings benefits to readers, since they need only absorb and collate vetted ontologies, as opposed to all the ontologies available e.g. on the semantic web.

70 Such filtering is especially useful in biomedical ontology
Bill Bug: Until there is a reliable vetting procedure, we cannot expect to re-use existing ontologies effectively for the purpose of bringing like data together in novel ways.... Without vetting, we cannot expect to provide other developers with clear advice on what are the reliable ontological shoulders to build on.

71 Peer review creates incentives for investment of effort in ontology work
– It gives career-related credit to both authors and reviewers (university promotions and funding based on peer review credit)
– Supports creation of a professional career path for ontologists
– It gives credit to experts for investment of expertise
– It allows measurement of citations of ontologies
– It magnifies the motivating potential of the factor of influence

72 The special features of ontologies created to serve scientific purposes:
1. they are developed to be common resources (thus they cannot be bought or sold)
2. for representation of well-demarcated scientific domains
3. subject to constant maintenance by domain experts
4. designed to be used in tandem with other, complementary ontologies
5. independent of format and implementation

73 For engineers, ontologies need possess none of these features
1. they can be bought and sold
2. they need have no well-demarcated scientific domains
3. they need not be subject to further maintenance
4. they can be stand-alone products
5. they are typically tied to one specific implementation
Ontology (engineering) thereby makes the silo problem worse

74 The solution
If we are to have a chance of resolving the silo problem, ontologists working in support of scientific research should not be trained as engineers
The authoring, maintenance and evaluation of scientific ontologies is an incremental, empirical, cumulative, and collaborative (i.e., precisely, scientific) activity

