1 Introduction to Ontology Barry Smith

2 Who am I? NCBO: National Center for Biomedical Ontology (NIH Roadmap Center): Stanford Medical Informatics; University of San Francisco Medical Center; Berkeley Drosophila Genome Project; Cambridge University Department of Genetics; The Mayo Clinic; University at Buffalo Department of Philosophy

3 Who am I? NYS Center of Excellence in Bioinformatics and Life Sciences; Ontology Research Group; Buffalo Clinical and Translational Science Institute (CTSI)

4 Who am I? Cleveland Clinic Semantic Database; Gene Ontology; Ontology for Biomedical Investigations; Open Biomedical Ontologies Consortium; Institute for Formal Ontology and Medical Information Science; BIRN Ontology Task Force...

6 natural language labels to make the data cognitively accessible to human beings and algorithmically tractable to computers
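A minimal Python sketch of the idea, assuming a toy "legend" that maps stable identifiers to natural language labels. The identifiers (EX:...), records and function names are hypothetical. Computers filter on the identifiers; people see the labels.

```python
# Hypothetical legend: stable identifiers -> natural language labels.
# The IDs are illustrative placeholders, not real ontology identifiers.
LEGEND = {
    "EX:0000001": "heart",
    "EX:0000002": "liver",
}

# Data records refer to types by identifier only.
records = [
    {"sample": "s1", "anatomy": "EX:0000001"},
    {"sample": "s2", "anatomy": "EX:0000002"},
    {"sample": "s3", "anatomy": "EX:0000001"},
]


def select(records, term_id):
    """Algorithmic access: filter records by exact identifier match."""
    return [r["sample"] for r in records if r["anatomy"] == term_id]


def render(record):
    """Human access: show the label in place of the identifier."""
    return f'{record["sample"]}: {LEGEND[record["anatomy"]]}'


print(select(records, "EX:0000001"))  # ['s1', 's3']
print(render(records[1]))             # s2: liver
```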

7 compare: legends for maps

8 common legends allow (cross-border) integration

9 ontologies are legends for data

10 legends: help human beings use and understand complex representations of reality; help human beings create useful complex representations of reality; help computers process complex representations of reality; help glue data together

11 annotations using common ontologies can yield integration of image data

12 computationally tractable legends help human beings find things in very large complex representations of reality

13 where in the body? where in the cell? what kind of organism? what kind of disease process?

14 Creating broad-coverage semantic annotation systems for biomedicine, to yield: distributed accessibility of the data to humans; reasoning with the data; cumulation for purposes of research; incrementality and evolvability; integration with clinical data

The Gene Ontology

18 The Gene Ontology

21 The Idea of Common Controlled Vocabularies (figure: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, all annotated with the common term "sphingolipid transporter activity")

22 The Idea of Common Controlled Vocabularies (figure: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, all annotated with the common term "Holliday junction helicase complex")
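The point of a common controlled vocabulary can be sketched in a few lines of Python. The database names come from the slide, but the entries and the shape of the data are hypothetical: because every database uses the same term string, one query reaches all of them, instead of one query per local vocabulary.

```python
# Each (hypothetical) database keeps its own records, but all of them use
# the same controlled-vocabulary term for annotation.
TERM = "Holliday junction helicase complex"

databases = {
    "MouseEcotope": [("mouse_entry_1", TERM), ("mouse_entry_2", "other term")],
    "GlyProt": [("glyprot_entry_7", TERM)],
    "DiabetInGene": [("diab_entry_3", "other term")],
    "GluChem": [("gluchem_entry_4", TERM)],
}


def query_all(databases, term):
    """Return (database, entry) pairs annotated with exactly this term."""
    return [
        (db_name, entry)
        for db_name, rows in databases.items()
        for entry, annotation in rows
        if annotation == term
    ]


print(query_all(databases, TERM))
```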

23 Multiple kinds of data in multiple kinds of silos: lab / pathology data; Electronic Health Record data; clinical trial data; patient histories; medical imaging; microarray data; protein chip data; flow cytometry; mass spec; genotype / SNP data

24 How to find your data? How to find other people’s data? How to reason with data when you find it? How to work out what data does not yet exist?

25 Multiple kinds of standardization for data: terminologies (SNOMED, UMLS); CDEs (clinical research); information exchange standards (HL7 RIM); LIMS (LOINC); MGED standards for microarray data; etc.

26 how to solve the problem of making such data queryable and reusable by others, to address NIH mandates? Part of the solution must involve standardized terminologies and coding schemes

27 most successful, thus far: UMLS, a collection of separate terminologies built by trained experts, massively useful for information retrieval and information integration. UMLS Metathesaurus: a system of post hoc mappings between overlapping source vocabularies
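A rough sketch of what "post hoc mapping" means, assuming a simple lookup table (vocabulary names, codes and concept identifiers are all made up; this is not the UMLS data model or API). The source vocabularies stay as they are; equivalence is recorded afterwards, which is why local usage is respected but cross-framework consistency is not enforced.

```python
# Hypothetical post hoc mapping in the spirit of a metathesaurus:
# terms from independently built source vocabularies are mapped, after the
# fact, onto shared concept identifiers. All codes below are made up.
MAPPING = {
    ("VOCAB_A", "A-101"): "CONCEPT_1",
    ("VOCAB_B", "B-7"): "CONCEPT_1",
    ("VOCAB_C", "C-33"): "CONCEPT_1",
    ("VOCAB_A", "A-102"): "CONCEPT_2",
}


def equivalents(source, code):
    """All other (source, code) pairs mapped to the same concept as the input."""
    concept = MAPPING.get((source, code))
    if concept is None:
        return []
    return sorted(
        key for key, c in MAPPING.items() if c == concept and key != (source, code)
    )


print(equivalents("VOCAB_A", "A-101"))  # [('VOCAB_B', 'B-7'), ('VOCAB_C', 'C-33')]
```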

28 for UMLS: local usage respected; regimentation frowned upon; cross-framework consistency not important; no concern to establish consistency with basic science; different grades of formal rigor, different degrees of completeness, different update policies

29 caBIG approach: BRIDG (top-down imposition)

31 A new approach for science: where do you find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy and development in different model organisms?

32 caBIG BRIDG. Approaches to standardization along two axes:
 | Top-down (master-model-based) | Bottom-up (evidence-based)
Prospective standardization | caBIG, SNOMED, HL7 | OBO Foundry
Retrospective mapping | UMLS (multiple authorities) | NLP / data + text-mining

33 SNOMED: Ultimately, as data become attached to the samples (e.g., pathology data, genotypes), these will be linked to the patient records.

34 where in the body? where in the cell? what kind of organism? what kind of disease process?

35 ontologies = high-quality controlled structured vocabularies for the annotation (description) of data

36 compare: legends for diagrams

or chemistry diagrams: legends for chemistry diagrams. Prasanna et al., "Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds", PROTEINS: Structure, Function, and Bioinformatics 63:907–917 (2006)

Ramirez et al., "Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology", Syst. Biol. 56(2):283–294, 2007

39 The Network Effects of Synchronization (figure: MouseEcotope, GlyProt, DiabetInGene and GluChem linked through the common term "Holliday junction helicase complex")

40 Five bangs for your GO buck:
1. based in biological science
2. incremental approach (evidence-based evolutionary pathway)
3. cross-species data comparability (human, mouse, yeast, fly...)
4. cross-granularity data integration (molecule, cell, organ, organism)
5. cumulation of scientific knowledge in algorithmically tractable form, links people to software

41 The methodology of annotations: Model organism databases employ scientific curators who use the experimental observations reported in the biomedical literature to associate GO terms with entries in gene product and other molecular biology databases ($4 million per annum in NIH funding)
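A minimal model of one curator annotation, under the assumption that it records at least the gene product, the GO term, an evidence code and the literature source. Field names, entry IDs and term IDs are hypothetical placeholders; IDA (inferred from direct assay) and IMP (inferred from mutant phenotype) are real GO evidence codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One curator assertion linking a gene product to a GO term (illustrative fields)."""
    gene_product: str  # database entry for the gene product (hypothetical ID)
    go_term: str       # GO term identifier (placeholder)
    evidence: str      # GO evidence code, e.g. "IDA" or "IMP"
    reference: str     # literature source the curator read (hypothetical)


annotations = [
    Annotation("MOD:gp1", "GO:example-1", "IDA", "REF:paper-1"),
    Annotation("MOD:gp2", "GO:example-1", "IMP", "REF:paper-2"),
    Annotation("MOD:gp1", "GO:example-2", "IDA", "REF:paper-1"),
]


def products_for(term, annotations):
    """Gene products annotated to a term, regardless of which paper supports it."""
    return sorted({a.gene_product for a in annotations if a.go_term == term})


print(products_for("GO:example-1", annotations))  # ['MOD:gp1', 'MOD:gp2']
```

Keeping the evidence code and reference on every record is what lets later users judge, or filter, the support behind each annotation.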

42 How to extend the GO methodology to other domains of clinical and translational medicine?

43 the problem: existing clinical vocabularies are of variable quality and low mutual consistency; current proliferation of tiny ontologies by different groups with urgent annotation needs

44 the solution: establish common rules governing best practices for creating ontologies in coordinated fashion, with an evidence-based pathway to incremental improvement

45 How to build an ontology:
– work with scientists to create an initial top-level classification
– find the ~50 most commonly used terms corresponding to types in reality
– arrange these terms into an informal is_a hierarchy according to the universality principle: A is_a B if and only if every instance of A is an instance of B
– fill in missing terms to give a complete hierarchy
– (leave it to domain scientists to populate the lower levels of the hierarchy)
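The universality principle makes is_a transitive: an instance of a type is also an instance of every type above it. A small Python sketch, assuming the informal hierarchy is stored as a child-to-parents dictionary (the cell terms are illustrative, not copied from the Cell Ontology; the function names are hypothetical):

```python
# Informal is_a hierarchy: term -> list of direct parents (illustrative terms).
IS_A = {
    "cell": [],
    "blood cell": ["cell"],
    "erythrocyte": ["blood cell"],
    "leukocyte": ["blood cell"],
    "lymphocyte": ["leukocyte"],
}


def ancestors(term, is_a=IS_A):
    """All B such that term is_a B, following is_a transitively."""
    seen = set()
    stack = list(is_a[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a[parent])
    return seen


def types_of(instance_type, is_a=IS_A):
    """Universality: an instance of A is an instance of A and of every ancestor of A."""
    return {instance_type} | ancestors(instance_type, is_a)


def missing_terms(is_a=IS_A):
    """Parents referenced but not yet entered: the gaps to fill in."""
    return {p for parents in is_a.values() for p in parents if p not in is_a}


print(sorted(types_of("lymphocyte")))  # ['blood cell', 'cell', 'leukocyte', 'lymphocyte']
```

`missing_terms` corresponds to the "fill in missing terms" step: it returns an empty set only when every parent mentioned in the hierarchy has itself been entered.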

46 First step (2003): a shared portal for (so far) 58 ontologies (low regimentation) → NCBO BioPortal

48 OBO is now the principal entry point for creation of web-accessible biomedical data. OBO and OBO-Edit: low-tech, to encourage users. Simple (web-service-based) tools created to support the work of biologists in creating annotations (data entry). OBO ↔ OWL DL converters make OBO Foundry-annotated data immediately accessible to Semantic Web data integration projects
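To illustrate what an OBO to OWL converter does, here is a simplified sketch that emits OWL 2 functional-style axioms for one term. It assumes the widely used mapping of is_a to SubClassOf and of "relationship: R X" to an existential restriction (every A stands in R to some X), and the OBO PURL convention of writing EX:0000001 as EX_0000001. Relation names are kept verbatim for readability, whereas real converters map them to Relation Ontology IRIs; the term IDs and function names are hypothetical.

```python
def to_iri(obo_id):
    """'EX:0000001' -> 'obo:EX_0000001' (the PURL convention replaces ':' with '_')."""
    prefix, local = obo_id.split(":", 1)
    return f"obo:{prefix}_{local}"


def term_to_owl(term):
    """Translate one OBO term, given as a dict, into OWL 2 functional-style axioms."""
    cls = to_iri(term["id"])
    axioms = [f"Declaration(Class({cls}))"]
    for parent in term.get("is_a", []):
        # is_a: every instance of the term is an instance of the parent.
        axioms.append(f"SubClassOf({cls} {to_iri(parent)})")
    for relation, target in term.get("relationship", []):
        # relationship: R X  ->  every instance stands in R to some X.
        axioms.append(
            f"SubClassOf({cls} ObjectSomeValuesFrom(obo:{relation} {to_iri(target)}))"
        )
    return axioms


term = {
    "id": "EX:0000002",
    "is_a": ["EX:0000001"],
    "relationship": [("develops_from", "EX:0000003")],
}
for axiom in term_to_owl(term):
    print(axiom)
```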

49 Second step (2004): reform efforts initiated, e.g. linking GO formally to other ontologies and data sources
id: CL:
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:
relationship: develops_from CL:
relationship: develops_from CL:
GO + Cell type = New Definition
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
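The "GO + Cell type = New Definition" idea can be sketched as a template that reuses the cell type's definition inside the GO process definition. The slide elides the CL identifiers, so the IDs below (CL:EXAMPLE, CL:PARENT, ...) are hypothetical placeholders, the precursor names are passed in by hand, and the parser handles only simple "tag: value" lines, not the full OBO format. The function names are likewise illustrative.

```python
# Placeholder IDs: the real CL identifiers are not given on the slide.
STANZA = '''\
id: CL:EXAMPLE
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:PARENT
relationship: develops_from CL:PRECURSOR_1
relationship: develops_from CL:PRECURSOR_2
'''


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict of lists (tags may repeat)."""
    tags = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, value = line.split(":", 1)
        tags.setdefault(tag.strip(), []).append(value.strip())
    return tags


def definition_text(tags):
    """Text between the first pair of quotes in the def tag."""
    return tags["def"][0].split('"')[1]


def differentiation_def(tags, precursors):
    """Compose a GO-style differentiation definition from the cell type's definition."""
    name = tags["name"][0]
    first_sentence = definition_text(tags).split(". ")[0].rstrip(".")
    genus = first_sentence[0].lower() + first_sentence[1:]
    article = "an" if name[0] in "aeiou" else "a"  # crude article heuristic
    return (f"{name.capitalize()} differentiation: Processes whereby {precursors} "
            f"acquires the specialized features of {article} {name}, {genus}.")


print(differentiation_def(parse_stanza(STANZA),
                          "an osteoprogenitor cell or a cranial neural crest cell"))
```

The output differs slightly from the slide's wording ("an extracellular matrix"), as expected from a mechanical template; the benefit is that a change to the cell type definition propagates to the process definition.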

50 Third step (2006): The OBO Foundry

51 Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms (under development) | | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications (under development) | | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures (under development) | | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

52 Building out from the original GO (axes: granularity; relation to time)
Granularity | Continuant, independent | Continuant, dependent | Occurrent
Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Biological Process (GO)
Molecule | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

53 Initial OBO Foundry coverage (axes: granularity; relation to time)
Granularity | Continuant, independent | Continuant, dependent | Occurrent
Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
Molecule | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

54 Continuants (aka endurants): have continuous existence in time; preserve their identity through change; exist in toto whenever they exist at all. Occurrents (aka processes): have temporal parts; unfold themselves in successive phases; exist only in their phases

55 You are a continuant; your life is an occurrent. You are 3-dimensional; your life is 4-dimensional.

56 Dependent entities require independent continuants as their bearers. There is no run without a runner. There is no grin without a cat.

57 Dependent vs. independent continuants: independent continuants (organisms, buildings, environments); dependent continuants (quality, shape, role, propensity, function, status, power, right)

58 All occurrents are dependent entities: they are dependent on those independent continuants which are their participants (agents, patients, media...)

59 BFO Top-Level Ontology:
Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (always dependent on one or more independent continuants)
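One way to make the dependence claims concrete is a toy Python encoding of the BFO top level in which a dependent continuant cannot be created without a bearer and an occurrent cannot be created without participants. The class names mirror BFO's categories; everything else (fields, checks, the Process subclass) is an illustrative assumption, not BFO's own formalization.

```python
from dataclasses import dataclass


class Entity:
    """Root of the toy hierarchy."""


class Continuant(Entity):
    """Exists in toto whenever it exists at all."""


class Occurrent(Entity):
    """Unfolds in successive temporal phases."""


@dataclass
class IndependentContinuant(Continuant):
    name: str


@dataclass
class DependentContinuant(Continuant):
    name: str
    bearer: IndependentContinuant  # no grin without a cat

    def __post_init__(self):
        if not isinstance(self.bearer, IndependentContinuant):
            raise TypeError("a dependent continuant needs an independent continuant as bearer")


@dataclass
class Process(Occurrent):
    name: str
    participants: tuple  # independent continuants: agents, patients, media...

    def __post_init__(self):
        if not self.participants or not all(
            isinstance(p, IndependentContinuant) for p in self.participants
        ):
            raise ValueError("an occurrent needs at least one independent continuant participant")


cat = IndependentContinuant("cat")
grin = DependentContinuant("grin", bearer=cat)
run = Process("run", participants=(IndependentContinuant("runner"),))
```

Trying `Process("run", participants=())` raises `ValueError`: in this encoding there is no run without a runner.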

60 The GO as a representation of top-level types:
Continuant
  Independent Continuant: cell component
  Dependent Continuant: molecular function
Occurrent: biological process

61 Top-Level Ontology:
Continuant
  Independent Continuant
  Dependent Continuant: Function
Occurrent: Functioning; Side-Effect, Stochastic Process, ...

63 Top-Level Ontology:
Continuant
  Independent Continuant
  Dependent Continuant: Quality, Function
  Spatial Region
Occurrent: Functioning; Side-Effect, Stochastic Process, ...
(below the types: instances, in space and time)

64 CRITERIA
– The ontology is open and available to be used by all.
– The ontology is in, or can be instantiated in, a common formal language.
– The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.

65 CRITERIA
– UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
– ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.

66 ORTHOGONALITY: communities must work together to ensure consistency
– orthogonality
– modular development plus additivity of annotations: if we annotate a database or body of literature with one OBO Foundry ontology, we should be able to add annotations from a second such ontology without conflicts
– ontologies do not need to create tiny theories of anatomy or chemistry within themselves
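Additivity of annotations can be illustrated with a hypothetical registry that assigns each ontology a domain. Under that assumption, adding a second annotation layer is a plain set union, and a conflict shows up only when two different ontologies cover the same domain for the same record. The registry entries, term IDs, function names and the MYCELLS ontology are invented for the example.

```python
# Hypothetical registry: ID prefix -> the domain that ontology is responsible for.
REGISTRY = {
    "GO": "processes, functions and cellular components",
    "CL": "cell types",
    "CHEBI": "chemical entities",
}


def prefix(term_id):
    """'CL:example-2' -> 'CL'."""
    return term_id.split(":", 1)[0]


def conflicts(annotations, registry):
    """Records annotated, for the same domain, by two different ontologies."""
    first_source = {}  # (record, domain) -> prefix that annotated it first
    clashes = set()
    for record, term in annotations:
        p = prefix(term)
        key = (record, registry[p])
        if key in first_source and first_source[key] != p:
            clashes.add(record)
        first_source.setdefault(key, p)
    return clashes


def add_layer(existing, new, registry):
    """Additivity: union a second annotation layer with the first,
    refusing only if the result contains cross-ontology clashes."""
    merged = existing | new
    if conflicts(merged, registry):
        raise ValueError("non-orthogonal ontologies: overlapping annotations")
    return merged


layer1 = {("sample1", "GO:example-1")}
layer2 = {("sample1", "CL:example-2"), ("sample2", "CHEBI:example-3")}
print(sorted(add_layer(layer1, layer2, REGISTRY)))
```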

67 CRITERIA
– IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
– VERSIONING: The ontology provider has procedures for identifying distinct successive versions.
– The ontology includes textual definitions for all terms.

68 CRITERIA
– CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
– DOCUMENTATION: The ontology is well-documented.
– USERS: The ontology has a plurality of independent users.

69 CRITERIA
– COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
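Several of the criteria above (identifiers, textual definitions, common relations) are mechanically checkable. A minimal sketch, assuming terms are dictionaries, identifiers follow a PREFIX:nnnnnnn pattern, and a small illustrative set of relation names stands in for the Relation Ontology; the checks, the EX prefix, the function name and the relation subset are hypothetical. Criteria such as documentation or a plurality of users cannot be checked this way.

```python
import re

# Illustrative subset of relation names standing in for the Relation Ontology.
RO_RELATIONS = {"is_a", "part_of", "develops_from", "has_participant"}
# Assumed identifier pattern: letters, a colon, then a 7-digit local ID.
ID_PATTERN = re.compile(r"^[A-Za-z]+:\d{7}$")


def check_terms(prefix, terms):
    """Return a list of problems found in a list of term dicts."""
    problems = []
    seen = set()
    for t in terms:
        tid = t["id"]
        # IDENTIFIERS: every ID must live in this ontology's own ID space.
        if not ID_PATTERN.match(tid) or not tid.startswith(prefix + ":"):
            problems.append(f"{tid}: identifier outside the {prefix} ID space")
        if tid in seen:
            problems.append(f"{tid}: duplicate identifier")
        seen.add(tid)
        # Textual definitions for all terms.
        if not t.get("def"):
            problems.append(f"{tid}: missing textual definition")
        # COMMON ARCHITECTURE: only relations from the shared relation set.
        for rel, _target in t.get("relations", []):
            if rel not in RO_RELATIONS:
                problems.append(f"{tid}: relation '{rel}' not defined in the relation set")
    return problems


terms = [
    {"id": "EX:0000001", "def": "A thing.", "relations": [("part_of", "EX:0000002")]},
    {"id": "EX:0000002", "def": "", "relations": [("near", "EX:0000001")]},
    {"id": "EX:12", "def": "x"},
]
for problem in check_terms("EX", terms):
    print(problem)
```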

70 Consequences
– OBO Foundry is serving as a benchmark for improvements in discipline-focused terminology resources
– yielding calibration of existing terminologies and data resources and alignment of different views

71 Foundry ontologies all work in the same way: all are built to represent the types existing in a pre-existing domain, and the relations between these types, in a way which can support reasoning:
– we have data
– we need to make this data available for semantic search and algorithmic processing
– we create a consensus-based ontology for annotating the data
– and ensure that it can interoperate with Foundry ontologies for neighboring domains

72 Mature OBO Foundry ontologies (now undergoing reform): Cell Ontology (CL); Chemical Entities of Biological Interest (ChEBI); Foundational Model of Anatomy (FMA); Gene Ontology (GO); Phenotypic Quality Ontology (PaTO); Relation Ontology (RO); Sequence Ontology (SO)

73 Ontologies being built to satisfy Foundry principles ab initio: Ontology for Clinical Investigations (OCI); Common Anatomy Reference Ontology (CARO); Ontology for Biomedical Investigations (OBI); Protein Ontology (PRO); RNA Ontology (RnaO); Subcellular Anatomy Ontology (SAO)

74 Ontologies in planning phase: Biobank/Biorepository Ontology (BrO, part of OBI); Environment Ontology (EnvO); Immunology Ontology (ImmunO); Infectious Disease Ontology (IDO); Mouse Adult Neurogenesis Ontology (MANGO)

75 OBO Foundry provides a method for handling legacy databases

76 Senselab/NeuronDB*: NeuronDB comprehends three types of neuronal properties: voltage-gated conductances, neurotransmitter receptors and neurotransmitter substances. Many questions immediately arise: what are receptors? Proteins? Protein complexes? The Foundry framework provides an opportunity to evaluate such choices. *

77 Senselab/NeuronDB: The GO Molecular Function (MF) ontology already has classes such as receptor activity (GO_ ), plus subclasses describing receptor activities already referred to in NeuronDB. This provides a roadmap for further development: review the 130 receptor classes to see if they exist in MF; where not, create subclasses and submit them to GO for future inclusion. We can then, e.g., take advantage of GO Annotations to find the proteins that correspond to these receptor classes in different species.
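The roadmap can be sketched as two steps: triage the legacy classes against MF, then use GO annotations to collect proteins per species. All labels, IDs, function names and annotations below are hypothetical, and exact label matching is a simplifying assumption; in practice curators would review each match.

```python
# Hypothetical inputs: local receptor classes from a legacy database, and the
# labels of receptor-activity classes already present in GO Molecular Function.
local_receptor_classes = ["receptor X activity", "receptor Y activity", "receptor Z activity"]
go_mf_labels = {"receptor X activity": "GO:example-x", "receptor Z activity": "GO:example-z"}

# Hypothetical GO annotations: (protein, species, GO term ID).
go_annotations = [
    ("protein-1", "human", "GO:example-x"),
    ("protein-2", "mouse", "GO:example-x"),
    ("protein-3", "human", "GO:example-z"),
]


def triage(local_classes, mf_labels):
    """Split local classes into those already in MF and candidates to submit to GO."""
    found = {c: mf_labels[c] for c in local_classes if c in mf_labels}
    to_submit = [c for c in local_classes if c not in mf_labels]
    return found, to_submit


def proteins_by_species(term_id, annotations):
    """Collect the proteins annotated to one receptor class, grouped by species."""
    result = {}
    for protein, species, term in annotations:
        if term == term_id:
            result.setdefault(species, []).append(protein)
    return result


found, to_submit = triage(local_receptor_classes, go_mf_labels)
print(to_submit)                                            # ['receptor Y activity']
print(proteins_by_species("GO:example-x", go_annotations))  # {'human': ['protein-1'], 'mouse': ['protein-2']}
```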

78 OBO Foundry Success Story: Model organism research seeks results valuable for the understanding of human disease. This requires the ability to make reliable cross-species comparisons, and for this, anatomy is crucial. But different MOD communities have developed their anatomy ontologies in uncoordinated fashion.

Multiple axes of classification:
Functional: cardiovascular system, nervous system
Spatial: head, trunk, limb
Developmental: endoderm, germ ring, lens placode
Structural: tissue, organ, cell
Stage: developmental staging series

80 Developmental terms are often lumped together for lack of a way to categorize them. Stages are represented in a variety of ways: terms can be children of superstages, stages can be integrated into each term, or stages can be assigned to terms from a separate ontology.

81 Ontologies facilitate grouping of annotations. Direct annotations: brain 20, hindbrain 15, rhombomere 10. Query "brain" without ontology: 20 results; query "brain" with ontology: 45 results
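The arithmetic on the slide (20 + 15 + 10 = 45) is what happens when a query for "brain" also collects annotations to the parts of the brain. A sketch, assuming the hierarchy is a simple part_of chain rhombomere → hindbrain → brain (the relation type and the function names are assumptions; the counts are the slide's):

```python
# Assumed part_of chain: rhombomere -> hindbrain -> brain.
PART_OF = {"rhombomere": "hindbrain", "hindbrain": "brain", "brain": None}

# Direct annotation counts from the slide.
direct = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def within(term, target):
    """True if term is target or is (transitively) part of it."""
    while term is not None:
        if term == target:
            return True
        term = PART_OF[term]
    return False


def query(target, use_ontology):
    """Count annotations for target, optionally rolling up annotations to its parts."""
    if not use_ontology:
        return direct.get(target, 0)
    return sum(n for term, n in direct.items() if within(term, target))


print(query("brain", use_ontology=False))  # 20
print(query("brain", use_ontology=True))   # 45
```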

82 CARO (Common Anatomy Reference Ontology): for the first time provides guidelines for model organism researchers who wish to achieve comparability of annotations; for the first time provides guidelines for those new to ontology work. See Haendel et al., "CARO: The Common Anatomy Reference Ontology", in: Burger (ed.), Anatomy Ontologies for Bioinformatics, Springer, in press.

83 CARO-conformant ontologies already in development: Fish Multi-Species Anatomy Ontology (NSF funding received); Ixodidae and Argasidae (Tick) Anatomy Ontology; Mosquito Anatomy Ontology (MAO); Spider Anatomy Ontology; Xenopus Anatomy Ontology (XAO). Undergoing reform: Drosophila and Zebrafish Anatomy Ontologies

84 OBI / OCI. Ontology for Biomedical Investigations: overarching terminology resource for the MIBBI Foundry. Ontology for Clinical Investigations: collaboration with the EPOCH ontology for clinical trial management and with CDISC (FDA-mandated vocabulary for clinical trial reports)

85 INDEPENDENT CONTINUANTS: organism, system, organ, organ part, tissue, cell, acellular anatomical structure, biological molecule, genome. DEPENDENT CONTINUANTS: physiology (functions), pathology. Stages: acute stage, progressive stage, resolution stage. Next step: a repertoire of disease ontologies built out of OBO Foundry elements

86 Scope of Draft Ontology for Multiple Sclerosis