The CROP (Common Reference Ontologies for Plants) Initiative. Barry Smith, September 13, 2013


1 The CROP (Common Reference Ontologies for Plants) Initiative. Barry Smith, September 13, 2013. http://ontology.buffalo.edu/smith

2 Agenda
– The OBO Foundry Principles
– Reference ontologies vs. application ontologies
– Other ontology consortia
– The CROP Initiative
– Examples of ontologies within CROP

3 On June 22, 1799, in Paris, everything changed

4 International System of Units

5 How to find data? How to find other people’s data? How to reason with data when you find it? How to work out what data does not yet exist?

6 How to solve the problem of making the data we find queryable and reusable by others? Part of the solution must involve standardized terminologies and coding schemes.

7 But there are multiple kinds of standardization for biological data, and they do not work well together. Proposed solution: ontology-based annotation of data.

8 ontologies = standardized labels designed for use in annotations to make the data cognitively accessible to human beings and algorithmically accessible to computers

9 ontologies = high-quality controlled structured vocabularies for the annotation (description) of data, images, journal articles …

10 Ramirez et al., “Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology”, Syst. Biol. 56(2):283–294, 2007

11 Ontologies used in curation of literature: what cellular component? what molecular function? what biological process?

12 Proposed framework: the Semantic Web
– HTML demonstrated the power of the Web to allow sharing of information
– can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information based on a common Web Ontology Language (OWL)?
– can we use netcentricity, common URLs, to break down silos and create useful integration of online data and information?

13 Ontology success stories, and some reasons for failure
A fragment of the “Linked Open Data” in the biomedical domain

14 http://bioportal.bioontology.org/


19 The more ontology-building is successful, the more it fails
– OWL breaks down data silos via controlled vocabularies for the description of data dictionaries
– Unfortunately, the very success of this approach led to the creation of multiple new semantic silos, because multiple ontologies are being created in ad hoc ways

20 http://bioportal.bioontology.org/
– Many ontologies in BioPortal are created by importing content from existing ontologies and giving the imported terms new names and new IDs
– The result is chaos, with bits and pieces of the same ontologies chopped up in multiple different places
– This leads to massively redundant effort, forking and doom

21 A standard engineering methodology
– It is easier to write useful software if one works with a simplified model (“…we can’t know what reality is like in any case; we only have our concepts…”)
– This looks like a useful model to me
– (One week goes by:) This other thing looks like a useful model to him
– Data in Pittsburgh does not interoperate with data in Vancouver
– Science is siloed

22 A good solution to this silo problem must be:
– modular
– incremental
– independent of hardware and software
– bottom-up
– evidence-based
– revisable
It must also incorporate a strategy for motivating potential developers and users

23 Uses of ‘ontology’ in PubMed abstracts


25 Main reason for GO’s success: the Gene Ontology and associated databases “make it possible to systematically dissect large gene lists in an attempt to assemble a summary of the most enriched and pertinent biology” (PMC2615629)

26 GO provides a controlled system of terms for use in annotating (describing, tagging) data
– multi-species, multi-disciplinary, open source
– contributing to the cumulativity of scientific results obtained by distinct research communities
– compare use of kilograms, meters, seconds in formulating experimental results

27 GO is 3 ontologies: biological process, cellular component, molecular function

28 Top-Level Architecture
– Continuant: Independent Continuant, Dependent Continuant
– Occurrent (Process, Event)
[Figure: universals and instances]

29 Problem with the GO: it covers only three types of entities
– no diseases
– no laboratory artifacts
– no anatomy (above the cell)
– only species-terms for development
– no phenotypes

30 The Open Biomedical Ontologies (OBO) Foundry
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
– ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
– CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
– MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

31 Rationale of OBO Foundry coverage
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
– ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Organism-Level Process (GO)
– CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO); Cellular Process (GO)
– MOLECULE: Molecule (ChEBI, SO, RNAO, PRO); Molecular Function (GO); Molecular Process (GO)

32 First step (2001): a shared portal for (so far) 58 ontologies (low regimentation), http://obo.sourceforge.net → NCBO BioPortal


34 OBO builds on the principles successfully implemented by the GO, recognizing that ontologies need to be developed in tandem

35 Second step (2006): The OBO Foundry, http://obofoundry.org/

36 Building out from the original GO
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
– ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
– CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
– MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

37 Initial OBO Foundry coverage
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
– ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Organism-Level Process (GO)
– CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO); Cellular Process (GO)
– MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

38 OBO Foundry Principles
– common formal architecture
– clearly delineated content (redundant: overlaps with orthogonality)
– the ontology is well-documented (overlaps with rules for definitions; needs expanding, for developers, for users, minimal metadata)
– plurality of independent users
– single locus of authority, trackers, help desk

39 OBO Foundry Principles
– textual definitions plus formal definitions
– all definitions should be of the genus-species form: A =def. a B which Cs, where B is the parent term of A in the ontology hierarchy
– formal definitions use OBO format or OWL
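The genus-species pattern lends itself to a mechanical check. What follows is a minimal sketch, not Foundry tooling: the `Term` record, `check_definition` and `render` are hypothetical names introduced for illustration, and the choice of "ecosystem" as the parent of "habitat" is an assumption based on the habitat definition given later in this talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Term:
    """Hypothetical term record; not an OBO-format or OWL structure."""
    label: str
    parent: Optional[str]   # B: label of the parent term in the hierarchy
    genus: str              # genus named in the textual definition
    differentia: str        # the "which Cs" clause


def check_definition(term: Term) -> List[str]:
    """Return the problems found; an empty list means the pattern is satisfied."""
    problems = []
    if term.parent is None:
        return problems  # a root term has no parent to serve as genus
    if term.genus != term.parent:
        problems.append(f"genus {term.genus!r} is not the parent {term.parent!r}")
    if not term.differentia.strip():
        problems.append("missing differentia")
    return problems


def render(term: Term) -> str:
    """Format the definition as 'A =def. a B which Cs'."""
    article = "an" if term.genus[:1].lower() in "aeiou" else "a"
    return f"{term.label} =def. {article} {term.genus} which {term.differentia}"


# Assumed example: parent "ecosystem" is taken from the habitat definition in this talk.
habitat = Term(
    label="habitat",
    parent="ecosystem",
    genus="ecosystem",
    differentia="can support the life of a given organism, population, or community",
)
```

Calling `check_definition(habitat)` returns an empty list, while a term whose definition names a genus other than its parent is reported as a problem.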

40 Orthogonality
– For each domain, there should be convergence upon a single ontology that is recommended for use by those who wish to become involved with the Foundry initiative
– Part of the goal here is to avoid the need for mappings, which are in any case too expensive, too fragile, and too difficult to keep up to date as mapped ontologies change
– Orthogonality means: everyone knows where to look to find out how to annotate each kind of data, and everyone knows where to look to find content for application ontologies

41 Orthogonality = non-redundancy for the reference ontologies inside the Foundry
– application ontologies can overlap, but then only in those areas where common coverage is supplied by a reference ontology
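Non-redundancy among reference ontologies can be pictured as a simple coverage check. This is a sketch under stated assumptions: the domain labels, the `find_overlaps` helper and the ontology named "LocalCellOnt" are hypothetical, and the Foundry's actual review process is not described in these slides.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_overlaps(coverage: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each domain claimed by more than one ontology to the ontologies claiming it."""
    claimed = defaultdict(list)
    for ontology, domains in coverage.items():
        for domain in domains:
            claimed[domain].append(ontology)
    return {d: sorted(names) for d, names in claimed.items() if len(names) > 1}


# Illustrative coverage table; domain labels are assumptions, not Foundry metadata.
reference_coverage = {
    "GO": {"biological process", "molecular function"},
    "CL": {"cell"},
    "PATO": {"phenotypic quality"},
}

# A hypothetical newcomer that duplicates the coverage of CL.
with_duplicate = dict(reference_coverage, LocalCellOnt={"cell"})

orthogonal = find_overlaps(reference_coverage)   # no domain is claimed twice
conflicts = find_overlaps(with_duplicate)        # "cell" is claimed twice
```

An empty result corresponds to the orthogonal case; any entry in the result marks a domain where convergence on a single reference ontology is still needed.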

42 PRINCIPLES: COMMON FORMAL ARCHITECTURE. The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the Basic Formal Ontology (BFO), http://www.ifomis.uni-saarland.de/bfo/ (‘formal’ = domain neutral)

43 Basic Formal Ontology
– Continuant: Independent Continuant (cell component), Dependent Continuant (molecular function)
– Occurrent (biological process)

44 OBO Foundry provides guidelines (traffic laws) to new groups of ontology developers in ways which can counteract current dispersion of effort

45 New principle: employ the methodology of cross-products
– compound terms in ontologies are to be defined as cross-products of simpler terms
– e.g. elevated blood glucose is a cross-product of PATO: increased concentration with FMA: blood and ChEBI: glucose
– = factoring out of ontologies into discipline-specific modules (orthogonality)
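The cross-product idea can be illustrated with a small data structure in which a compound term carries no content of its own beyond references to simpler terms. This is only a sketch: `TermRef` and `CompoundTerm` are hypothetical names, terms are identified by label rather than by real ontology IDs, and the relations that would link the parts in a formal definition are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TermRef:
    """Reference to a term in a source ontology (label only; no real IDs are used)."""
    ontology: str
    label: str


@dataclass(frozen=True)
class CompoundTerm:
    """A compound term defined purely by reference to simpler terms."""
    label: str
    parts: Tuple[TermRef, ...]

    def source_ontologies(self) -> List[str]:
        """Ontologies that must be consulted (and kept in step) for this term."""
        return sorted({part.ontology for part in self.parts})

    def definition(self) -> str:
        """Human-readable rendering of the cross-product."""
        return " x ".join(f"{part.ontology}:{part.label}" for part in self.parts)


# The example from the slide above.
elevated_blood_glucose = CompoundTerm(
    label="elevated blood glucose",
    parts=(
        TermRef("PATO", "increased concentration"),
        TermRef("FMA", "blood"),
        TermRef("ChEBI", "glucose"),
    ),
)
```

Because the compound term only points at its parts, a revision to any of the three source ontologies is visible to every compound term that reuses them.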

46 The methodology of cross-products
– enforcing use of common relations in linking terms drawn from Foundry ontologies serves to ensure that the ontologies are maintained and revised in tandem
– logically defined relations serve to bind terms in different ontologies together to create a network

47 Building out from the original GO
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
– ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
– CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
– MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

48 Population-level ontologies
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
– COMPLEX OF ORGANISMS: Family, Community, Deme, Population; Population Phenotype; Population Process
– ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
– CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
– MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

49 Environment Ontology (environments)
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
– ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
– CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
– MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

50 Extension Strategy + Modular Organization
– top level: Basic Formal Ontology (BFO)
– mid-level: Information Artifact Ontology (IAO), Ontology for Biomedical Investigations (OBI), Spatial Ontology (BSPO)
– domain level: Anatomy Ontology (FMA*, CARO), Environment Ontology (EnvO), Infectious Disease Ontology (IDO*), Biological Process Ontology (GO*), Cell Ontology (CL), Cellular Component Ontology (FMA*, GO*), Phenotypic Quality Ontology (PaTO), Subcellular Anatomy Ontology (SAO), Sequence Ontology (SO*), Molecular Function (GO*), Protein Ontology (PRO*)

51 Third step: creation of new ontology consortia, modeled on the OBO Foundry
– OBO Foundry: Open Biological and Biomedical Ontologies
– NIF Standard: Neuroscience Information Framework
– eagle-i Ontologies: used by VIVO and CTSAconnect
– IDO Consortium: Infectious Disease Ontology

52 A good solution to the silo problem must be:
– modular
– incremental
– independent of software and hardware
– bottom-up
– evidence-based
– revisable
It must also incorporate a strategy for motivating potential developers and users

53 Because the ontologies in the Foundry are built as orthogonal modules which form an incrementally evolving network:
– scientists are motivated to commit to developing ontologies because they will need in their own work ontologies that fit into this network
– users are motivated by the assurance that the ontologies they turn to are maintained by experts

54 More benefits of orthogonality
– helps those new to ontology to find what they need
– to find models of good practice
– ensures mutual consistency of ontologies (trivially)
– and thereby ensures additivity of annotations

55 More benefits of orthogonality
– it rules out the sorts of simplification and partiality which may be acceptable under more pluralistic regimes
– it thereby brings an obligation on the part of ontology developers to commit to scientific accuracy and domain-completeness

56 More benefits of orthogonality
– No need to reinvent the wheel for each new domain
– Can profit from storehouse of lessons learned
– Can more easily reuse what is made by others
– Can more easily reuse training
– Can more easily inspect and criticize results of others’ work
– Leads to innovations (e.g. MIREOT, OntoFox) in strategies for combining ontologies

57 Reference Ontologies vs. Application Ontologies
Reference ontology = an ontology that captures generic content and is designed for aggressive reuse in multiple different types of context. Our assumption is that most reference ontologies will be created manually on the basis of explicit assertion of the taxonomical and other relations between their terms.

58 Reference Ontologies vs. Application Ontologies
By ‘application ontology’ we mean an ontology that is tied to specific local applications. Each application ontology is created by using ontology merging software to combine new, local content with generic content taken over from relevant reference ontologies.
Xiang et al., “OntoFox: Web-Based Support for Ontology Reuse”, BMC Research Notes 2010, 3:175.
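The combination step can be sketched as follows. This is not how OntoFox works internally; it is a minimal illustration, assuming ontologies are plain ID-to-label dictionaries, that all IDs shown (`REF:…`, `APP:…`) are hypothetical, and that the key design rule is to reuse reference terms under their original IDs rather than minting new ones.

```python
from typing import Dict, Iterable


def build_application_ontology(
    reference: Dict[str, str],
    wanted_ids: Iterable[str],
    local_terms: Dict[str, str],
) -> Dict[str, str]:
    """Combine selected reference terms (original IDs kept) with new local terms."""
    wanted = list(wanted_ids)
    missing = [term_id for term_id in wanted if term_id not in reference]
    if missing:
        raise KeyError(f"not in reference ontology: {missing}")
    # Reuse generic content under its original ID and label.
    app = {term_id: reference[term_id] for term_id in wanted}
    # Local content must not redefine a reference term under the same ID.
    clashes = sorted(set(app) & set(local_terms))
    if clashes:
        raise ValueError(f"local terms redefine reference IDs: {clashes}")
    app.update(local_terms)
    return app


# Hypothetical IDs and labels, for illustration only.
REFERENCE = {
    "REF:0000001": "infectious disease",
    "REF:0000002": "pathogen",
    "REF:0000003": "host",
}
LOCAL = {"APP:0000001": "isolate record"}

app = build_application_ontology(REFERENCE, ["REF:0000001", "REF:0000002"], LOCAL)
```

Keeping the reference IDs intact is what lets two application ontologies built this way overlap only where a reference ontology supplies the shared coverage.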

59 Normalization of the ontology space: content from reference ontologies is maximally re-used, e.g. in formulation of compound terms and of cross-product definitions (compare normalization of a vector space; compare, again, the SI System of Units)

60 International System of Units

61 Infectious Disease Ontology (IDO)

62 We have data, e.g.:
– TBDB: Tuberculosis Database, including Microarray data
– VFDB: Virulence Factor DB
– TropNetEurop Dengue Case Data
– ISD: Influenza Sequence Database at LANL
– MPD/MRD/CPP: Protein Data of PIR Resource Center for Biodefense Proteomics Research
– PathPort: Pathogen Portal Project

63 Purpose of Infectious Disease Ontology (IDO)
Retrieval and integration of infectious disease relevant data:
– Sequence and protein data for pathogens
– Case report data for patients
– Clinical trial data for drugs, vaccines
– Epidemiological data for surveillance, prevention
– ...
Goal: to make data deriving from different sources comparable and computable

64 IDO Strategy
– Reference ontology (IDO Core) with terms relevant to any infectious disease
– Disease- and organism-specific application ontologies for different types of host, types of vector, types of pathogen, types of disease

65 Infectious Disease Ontology (IDO)
– Member of the OBO Foundry
– A suite of ontologies: IDO Core (general terms in the ID domain; a hub for all IDO extensions) and IDO Extensions (disease-specific; developed by subject matter experts)
– Provides clear, precise, and consistent natural language definitions, and computable logical representations (OWL, OBO)

66 How IDO evolves
CORE and SPOKES: domain ontologies. SEMI-LATTICE: by subject matter experts in different communities of interest.
Ontologies shown: IDOCore, IDOSa, IDOHumanSa, IDORatSa, IDOStrep, IDORatStrep, IDOHumanStrep, IDOMRSa, IDOHumanBacterial, IDOAntibioticResistant, IDOMAL, IDOHIV, IDOFLU

67 IDO Process Model

68 Sample Application: A lattice of infectious disease application ontologies from NARSA isolate data
Expose the value of Genotype-Phenotype Linked Data by converting a free-text database from NARSA (Network on Antimicrobial Resistance in Staphylococcus aureus) into a computational resource

69 Ways of differentiating Staphylococcus aureus infectious diseases
Infectious Disease
– By host type
– By (sub-)species of pathogen
– By antibiotic resistance
– By anatomical site of infection
Bacterial Infectious Disease
– By PFGE (Strain)
– By MLST (Sequence Type)
– By BURST (Clonal Complex)
Sa Infectious Disease
– By SCCmec type (by ccr type, by mec class)
– By spa type
http://www.sccmec.org/Pages/SCC_ClassificationEN.html

70 ido.owl narsa.owl narsa-isolates.owl ndf-rt NRS701’s resistance to clindamycin

71 Further extensions of IDO
– Vaccine (Vaccine Ontology)
– Plant IDO (from ICBO 2012)

72 Founding CROP

73 The ontologies in CROP
General ontologies taken over from OBO Foundry:
– ChEBI: Chemistry ontology
– GO: Gene Ontology
– PRO: Protein Ontology
– ENVO: Environment Ontology, plus GAZ: Gazetteer built on ontological principles
– PATO: Phenotype Ontology

74 Plant-specific ontologies to be developed by CROP group
– PO: Plant Ontology
– TO: Trait Ontology
– EO: Plant Environment Ontology
– Plant IDO: Plant Disease
Action items:
– fix relation between EnvO and EO
– fix relation between PATO and TO

75 Taxonomy resource (for diseases of host and causal organisms + vectors/secondary hosts) NCBI Taxonomy has most of the hosts, but not the viruses

76 Next steps in CROP: PRO-PO-GO Meeting, Buffalo, Spring 2013 (PRO = protein ontology, PO = plant ontology, GO = gene ontology)

77 The Environment Ontology
OBO Foundry, Genomic Standards Consortium, Natural Environment Research Council (UK), USDA, Gramene, J. Craig Venter Institute ...

78 Applications of EnvO in biology


82 How EnvO currently works for information retrieval
Retrieve all experiments on organisms obtained from:
– deep-sea thermal vents
– arctic ice cores
– rainforest canopy
– alpine melt zone
Retrieve all data on organisms sampled from:
– hot and dry environments
– cold and wet environments
– a height above 5,000 meters
Retrieve all the omic data from soil organisms subject to:
– moderate heavy metal contamination
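Queries of this kind depend on subsumption: a sample annotated with a specific environment term should be returned when the query names a more general one. A minimal sketch follows, assuming a toy is_a hierarchy; the parent assignments, the "desert dune" class and the sample IDs are invented for illustration and are not taken from EnvO.

```python
from typing import Dict, List, Set, Tuple

# Toy is_a hierarchy (child -> parents); assumed for illustration, not EnvO content.
IS_A: Dict[str, List[str]] = {
    "hot and dry environment": ["environment"],
    "cold and wet environment": ["environment"],
    "desert dune": ["hot and dry environment"],
    "alpine melt zone": ["cold and wet environment"],
}


def ancestors(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from `term` by following is_a links upward."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen


def retrieve(samples: List[Tuple[str, str]], query: str,
             is_a: Dict[str, List[str]]) -> List[str]:
    """IDs of samples annotated with the query term or any of its subclasses."""
    return [
        sample_id
        for sample_id, env in samples
        if env == query or query in ancestors(env, is_a)
    ]


# Hypothetical sample annotations: (sample ID, environment term).
SAMPLES = [
    ("S1", "desert dune"),
    ("S2", "alpine melt zone"),
    ("S3", "hot and dry environment"),
]

hot_dry_hits = retrieve(SAMPLES, "hot and dry environment", IS_A)
all_hits = retrieve(SAMPLES, "environment", IS_A)
```

With these toy annotations the query for hot and dry environments returns S1 and S3: S3 matches the query term directly, and S1 matches because its annotation is a subclass of it.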

83 Extending EnvO to clinical and translational research
– we have public health, community and population data
– we need to make this data available for search and algorithmic processing
– we create a consensus-based ontology which can interoperate with ontologies for neighboring domains of medicine and basic biology

84 Environment = totality of circumstances external to a living organism or group of organisms
– pH
– evapotranspiration
– turbidity
– available light
– predominant vegetation
– predatory pressure
– nutrient limitation
– …

85 Extend EnvO to the clinical domain
– dietary patterns (Food Ontology: FAO, USDA) ... allergies
– neighborhood patterns: built environment, living conditions; climate; social networking; crime, transport; education, religion, work; health, hygiene
– disease patterns: bio-environment (bacteriological, ...); patterns of disease transmission (links to IDO)

86 Aligning EnvO to the Basic Formal Ontology
Terms shown in the hierarchy: continuant, system, ecosystem, biome, environmental feature, site, pond, mountain slope, …, object, organism, spatial region

87 habitat
Habitat =def. An ecosystem which can support the life of a given organism, population, or community
Realized niche =def. An ecosystem which is that part of a habitat which supports the life of a given organism, population, or community

88 Aligning EnvO to the Basic Formal Ontology
Terms shown in the hierarchy: continuant, system, ecosystem, biome, habitat, environmental feature, site, pond, mountain slope, …, object, organism, spatial region

89 Hutchinsonian niche (niche as volume in a functionally defined hyperspace) =def. an n-dimensional hyper-volume whose dimensions correspond to resource gradients over which species are distributed
– degree of slope, exposure to sunlight, soil fertility, foliage density, salinity ...
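One way to make the hyper-volume concrete is to model each resource gradient as a tolerated interval. This is a simplifying sketch: an axis-aligned box is an assumption made here (the definition above does not require the volume to be a box), and the dimension names and numeric ranges are invented for illustration.

```python
from typing import Dict, Tuple

# A niche as one (low, high) interval per resource gradient.
Niche = Dict[str, Tuple[float, float]]


def in_niche(niche: Niche, conditions: Dict[str, float]) -> bool:
    """True if every gradient of the niche is measured and lies within its interval."""
    return all(
        dim in conditions and low <= conditions[dim] <= high
        for dim, (low, high) in niche.items()
    )


def hypervolume(niche: Niche) -> float:
    """Product of the interval lengths (volume of the n-dimensional box)."""
    volume = 1.0
    for low, high in niche.values():
        volume *= high - low
    return volume


# Invented two-dimensional example; the ranges are not empirical values.
example_niche: Niche = {
    "slope_degrees": (0.0, 30.0),
    "salinity_ppt": (0.0, 5.0),
}

site_ok = in_niche(example_niche, {"slope_degrees": 10.0, "salinity_ppt": 2.0})
site_too_salty = in_niche(example_niche, {"slope_degrees": 10.0, "salinity_ppt": 8.0})
```

Adding a further gradient such as soil fertility or foliage density only adds a key to the dictionary, which mirrors the n-dimensional character of the definition.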

90 G.E. Hutchinson (1957, 1965)


92 Aligning EnvO to the Basic Formal Ontology
Terms shown in the hierarchy: continuant, system, ecosystem, biome, habitat, niche, environmental feature, site, pond, mountain slope, …, object, organism, spatial region; relation shown: part_of


