How Ontologies Create Research Communities
Barry Smith, University at Buffalo


1 How Ontologies Create Research Communities. Barry Smith, University at Buffalo, http://ontology.buffalo.edu/smith

2 Who am I?
NCBO: National Center for Biomedical Ontology (NIH Roadmap Center)
- Stanford Medical Informatics
- University of California, San Francisco Medical Center
- Berkeley Drosophila Genome Project
- Cambridge University Department of Genetics
- The Mayo Clinic
- University at Buffalo Department of Philosophy

3 Who am I?
- NYS Center of Excellence in Bioinformatics and Life Sciences
- Ontology Research Group
- Buffalo Clinical and Translational Science Institute (CTSI)
- Duke/Dallas/Houston CTSA Ontology Consortium

4 Who am I?
- Cleveland Clinic Semantic Database
- Gene Ontology
- Ontology for Biomedical Investigations
- Open Biomedical Ontologies Consortium
- Institute for Formal Ontology and Medical Information Science
- BIRN Ontology Task Force ...

5 Multiple kinds of data in multiple kinds of silos:
- Lab / pathology data
- Electronic Health Record data
- Clinical trial data
- Patient histories
- Medical imaging
- Microarray data
- Protein chip data
- Flow cytometry
- Mass spec
- Genotype / SNP data

6
- How to find your data?
- How to find other people's data?
- How to reason with data when you find it?
- How to work out what data does not yet exist?

7 Multiple kinds of standardization for data:
- Terminologies (SNOMED, UMLS)
- CDEs (Clinical research)
- Information Exchange Standards (HL7 RIM)
- LIMS (LOINC)
- MGED standards for microarray data, etc.

8 How do we make such data queryable and reusable by others, so as to address NIH mandates? Part of the solution must involve standardized terminologies and coding schemes.

9 Most successful thus far: UMLS
- a collection of separate terminologies built by trained experts
- massively useful for information retrieval and information integration
- UMLS Metathesaurus: a system of post hoc mappings between overlapping source vocabularies
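The idea of post hoc mapping can be sketched as follows: terms from independently built source vocabularies are grouped, after the fact, under shared concept identifiers. This is a minimal illustration only; the vocabulary names ("VocabA", "VocabB"), local codes and concept identifiers below are hypothetical placeholders, not actual UMLS sources or concept IDs.

```python
from collections import defaultdict

# Two hypothetical source vocabularies, each with its own local codes.
# These are NOT real UMLS sources, and the concept IDs are invented.
vocab_a = {"A-001": "heart attack", "A-002": "high blood pressure"}
vocab_b = {"B-17": "myocardial infarction", "B-42": "hypertension"}

# Post hoc mapping table built after both vocabularies already exist:
# (source name, local code) -> shared concept identifier.
concept_map = {
    ("VocabA", "A-001"): "CONCEPT-1",
    ("VocabB", "B-17"): "CONCEPT-1",
    ("VocabA", "A-002"): "CONCEPT-2",
    ("VocabB", "B-42"): "CONCEPT-2",
}

def labels_by_concept(sources, mapping):
    """Group every mapped source label under its shared concept identifier."""
    grouped = defaultdict(set)
    for source_name, vocab in sources.items():
        for code, label in vocab.items():
            concept = mapping.get((source_name, code))
            if concept is not None:  # unmapped codes are simply left out
                grouped[concept].add(label)
    return dict(grouped)

groups = labels_by_concept({"VocabA": vocab_a, "VocabB": vocab_b}, concept_map)
print(sorted(groups["CONCEPT-1"]))  # ['heart attack', 'myocardial infarction']
```

Neither source vocabulary is modified here; whatever consistency exists lives only in the external mapping table, which is in keeping with the point that local usage is respected.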

10 For UMLS:
- local usage respected
- regimentation frowned upon
- cross-framework consistency not important
- no concern to establish consistency with basic science
- different grades of formal rigor, different degrees of completeness, different update policies

11 caBIG approach: BRIDG (top-down imposition)


13 A new approach for science: where do you find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, and development in different model organisms?


15 Where in the body? Where in the cell?

16 Where in the body? Where in the cell? What kind of organism?

17 Where in the body? Where in the cell? What kind of organism? What kind of disease process?

18 We need ontologies; we need semantic annotation of data

19 Ontologies = natural language labels designed for use in annotations, to make the data cognitively accessible to human beings and algorithmically tractable to computers

20 Compare: legends for maps

21 Compare: legends for maps. Common legends allow (cross-border) integration.

22 Ontologies are legends for data

23 Ontologies = high-quality, controlled, structured vocabularies for the annotation (description) of data

24 Compare: legends for diagrams

25 Or chemistry diagrams: legends for chemistry diagrams. Prasanna et al., "Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds", PROTEINS: Structure, Function, and Bioinformatics 63:907–917 (2006)

26 Ramirez et al., "Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology", Syst. Biol. 56(2):283–294 (2007)

27 Computationally tractable legends:
- help integrate complex representations of reality
- help human beings find things in complex representations of reality
- help computers reason with complex representations of reality



30 The Gene Ontology

31 What cellular component? What molecular function? What biological process?

32 The Idea of Common Controlled Vocabularies
(diagram: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, linked through the shared term "sphingolipid transporter activity")

33 The Network Effects of Synchronization
(diagram: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, linked through the shared term "Holliday junction helicase complex")
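The network effect of a shared vocabulary can be sketched as a single query that spans several databases, which becomes possible once each database annotates its records with the same term. The database names come from the slides; the record identifiers and which terms each record carries are invented purely for illustration.

```python
# Hypothetical contents of four independently maintained databases.
# Each record is annotated with labels drawn from one shared vocabulary (GO).
databases = {
    "MouseEcotope": {"mouse_rec_1": {"Holliday junction helicase complex"}},
    "GlyProt": {"glyprot_rec_7": {"sphingolipid transporter activity"}},
    "DiabetInGene": {"dig_rec_3": {"Holliday junction helicase complex"}},
    "GluChem": {"gluchem_rec_9": {"Holliday junction helicase complex",
                                  "sphingolipid transporter activity"}},
}

def find_annotated(dbs, term):
    """Return sorted (database, record) pairs annotated with the shared term."""
    return sorted(
        (db_name, record_id)
        for db_name, records in dbs.items()
        for record_id, terms in records.items()
        if term in terms
    )

print(find_annotated(databases, "Holliday junction helicase complex"))
# [('DiabetInGene', 'dig_rec_3'), ('GluChem', 'gluchem_rec_9'), ('MouseEcotope', 'mouse_rec_1')]
```

Without the shared vocabulary, the same search would need one query per database, each with its own synonym handling.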

34 Five bangs for your GO buck:
1. based in biological science
2. incremental approach (evidence-based evolutionary pathway)
3. cross-species data comparability (human, mouse, yeast, fly...)
4. cross-granularity data integration (molecule, cell, organ, organism)
5. cumulation of scientific knowledge in algorithmically tractable form, linking people to software

35 The methodology of annotations: model organism databases employ scientific curators who use the experimental observations reported in the biomedical literature to associate GO terms with entries in gene product and other molecular biology databases ($4 million per year in NIH funding).

36 What cellular component? What molecular function? What biological process?

37 How can the GO methodology be extended to other domains of clinical and translational medicine?

38 The problem:
- existing clinical vocabularies are of variable quality and low mutual consistency
- a current proliferation of tiny ontologies built by different groups with urgent annotation needs

39 http://ontologist.com

40 The solution: establish common rules governing best practices for creating ontologies in coordinated fashion, with an evidence-based pathway to incremental improvement

41 First step (2003): a shared portal for (so far) 58 ontologies (low regimentation)
- http://obo.sourceforge.net
- NCBO BioPortal


43
- OBO is now the principal entry point for the creation of web-accessible biomedical data
- OBO and OBOEdit are low-tech, to encourage users
- simple (web-service-based) tools created to support the work of biologists in creating annotations (data entry)
- OBO-to-OWL DL converters make OBO Foundry annotated data immediately accessible to Semantic Web data integration projects
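A minimal sketch of the kind of translation an OBO-to-OWL converter performs, restricted here to is_a links. It assumes the OBO PURL naming convention (http://purl.obolibrary.org/obo/ followed by the identifier with ":" replaced by "_"), which is an assumption about naming and postdates these slides, and it emits OWL functional-syntax SubClassOf axioms as plain text. Real converters also handle relationships, synonyms, definitions and much more.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"  # assumed IRI prefix

def obo_id_to_iri(obo_id: str) -> str:
    """Turn an OBO identifier such as 'CL:0000062' into a bracketed IRI."""
    prefix, local = obo_id.split(":", 1)
    return f"<{OBO_PURL_BASE}{prefix}_{local}>"

def is_a_to_owl(child: str, parents: list[str]) -> list[str]:
    """Emit one OWL functional-syntax SubClassOf axiom per is_a parent."""
    return [f"SubClassOf({obo_id_to_iri(child)} {obo_id_to_iri(p)})" for p in parents]

for axiom in is_a_to_owl("CL:0000062", ["CL:0000055"]):
    print(axiom)
# SubClassOf(<http://purl.obolibrary.org/obo/CL_0000062> <http://purl.obolibrary.org/obo/CL_0000055>)
```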

44 Second step (2004): reform efforts initiated, e.g. linking GO formally to other ontologies and data sources

Cell type (CL):
    id: CL:0000062
    name: osteoblast
    def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
    is_a: CL:0000055
    relationship: develops_from CL:0000008
    relationship: develops_from CL:0000375

GO + Cell type = New Definition:
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
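The cell-type stanza above follows the OBO flat-file convention of one "tag: value" pair per line, with tags such as relationship allowed to repeat. A minimal Python sketch of reading such a stanza into a dictionary follows; it deliberately ignores quoting subtleties, trailing modifiers and comments that a full OBO parser must handle.

```python
from collections import defaultdict

STANZA = """\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""

def parse_stanza(text: str) -> dict[str, list[str]]:
    """Collect every value for each tag; tags such as is_a may repeat."""
    tags = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so 'CL:0000062' stays intact.
        tag, _, value = line.partition(":")
        tags[tag.strip()].append(value.strip())
    return dict(tags)

term = parse_stanza(STANZA)
# Split each 'relationship' value into a (relation, target) pair.
relations = [tuple(v.split(None, 1)) for v in term["relationship"]]
print(term["name"], term["is_a"], relations)
# ['osteoblast'] ['CL:0000055'] [('develops_from', 'CL:0000008'), ('develops_from', 'CL:0000375')]
```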

45 Third step (2006): The OBO Foundry, http://obofoundry.org/

46
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms (under development) | - | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications (under development) | - | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures (under development) | - | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

47 Building out from the original GO
Granularity | Continuant: independent | Continuant: dependent | Occurrent
Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | -
Molecule | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

48 Initial OBO Foundry coverage
Granularity | Continuant: independent | Continuant: dependent | Occurrent
Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
Molecule | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
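The coverage matrix on slide 48 can be read as a lookup table keyed by granularity and by the continuant/occurrent split. A minimal sketch that encodes its cells as a Python dictionary, so that an annotator could ask which Foundry ontologies should supply a term; the key strings are our own labels, not part of any OBO specification.

```python
# Keys: (granularity, category); values: ontologies listed in that cell of slide 48.
COVERAGE = {
    ("organ and organism", "independent continuant"): ["NCBI Taxonomy", "FMA", "CARO"],
    ("organ and organism", "dependent continuant"): ["FMP", "CPRO", "PaTO"],
    ("organ and organism", "occurrent"): ["GO"],
    ("cell and cellular component", "independent continuant"): ["CL", "FMA", "GO"],
    ("cell and cellular component", "dependent continuant"): ["GO"],
    ("cell and cellular component", "occurrent"): ["GO"],
    ("molecule", "independent continuant"): ["ChEBI", "SO", "RnaO", "PrO"],
    ("molecule", "dependent continuant"): ["GO"],
    ("molecule", "occurrent"): ["GO"],
}

def ontologies_for(granularity: str, category: str) -> list[str]:
    """Return the ontologies covering one cell of the matrix, or [] if uncovered."""
    return COVERAGE.get((granularity, category), [])

print(ontologies_for("molecule", "independent continuant"))
# ['ChEBI', 'SO', 'RnaO', 'PrO']
```

An empty result flags a combination the initial coverage does not address.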

49 CRITERIA
- The ontology is open and available to be used by all.
- The ontology is in, or can be instantiated in, a common formal language.
- The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.

50 CRITERIA
- UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
- ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.

51 ORTHOGONALITY
For science, communities must work together to ensure consistency:
- orthogonality
- modular development plus additivity of annotations: if we annotate a database or body of literature with one OBO Foundry ontology, we should be able to add annotations from a second such ontology without conflicts
- ontologies do not need to create tiny theories of anatomy or chemistry within themselves
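Additivity of annotations can be illustrated, under one simple modeling assumption, by storing annotations per term identifier and requiring that each ontology only contributes terms from its own identifier space. Because orthogonal ontologies own disjoint prefixes, adding a second ontology's annotations is a set union that never rewrites the first. The record name is hypothetical and GO:0001649 is used only as an illustrative identifier.

```python
from collections import defaultdict

def prefix(term_id: str) -> str:
    """Identifier space of a term, e.g. 'CL' for 'CL:0000062'."""
    return term_id.split(":", 1)[0]

def add_annotations(store, new_annotations, ontology_prefix):
    """Add one ontology's annotations; refuse terms outside its identifier space.

    store maps entity -> set of term IDs and is updated in place.
    """
    for entity, term_ids in new_annotations.items():
        for term_id in term_ids:
            if prefix(term_id) != ontology_prefix:
                raise ValueError(f"{term_id} is not in the {ontology_prefix} space")
            store[entity].add(term_id)
    return store

store = defaultdict(set)
# Hypothetical record; GO:0001649 is an illustrative identifier.
add_annotations(store, {"sample_1": {"CL:0000062"}}, "CL")
before = set(store["sample_1"])
add_annotations(store, {"sample_1": {"GO:0001649"}}, "GO")
# Additivity: the CL annotation survives the added GO layer unchanged.
assert before <= store["sample_1"]
print(sorted(store["sample_1"]))  # ['CL:0000062', 'GO:0001649']
```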

52 CRITERIA
- IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
- VERSIONING: The ontology provider has procedures for identifying distinct successive versions.
- The ontology includes textual definitions for all terms.
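The identifier and definition criteria lend themselves to automated checks. A minimal sketch, assuming terms are held as simple (id, name, definition) records and that identifiers follow a PREFIX:seven-digit shape like CL:0000062; that pattern is an assumption drawn from the identifiers on these slides, not a stated Foundry rule, and versioning is not checked here.

```python
import re
from typing import NamedTuple

class Term(NamedTuple):
    id: str
    name: str
    definition: str

# Assumed identifier shape, modeled on IDs such as CL:0000062.
ID_PATTERN = re.compile(r"^[A-Za-z]+:\d{7}$")

def check_terms(terms, expected_prefix):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    seen = set()
    for t in terms:
        if not ID_PATTERN.match(t.id):
            problems.append(f"{t.id}: malformed identifier")
        elif not t.id.startswith(expected_prefix + ":"):
            problems.append(f"{t.id}: not in the {expected_prefix} identifier space")
        if t.id in seen:
            problems.append(f"{t.id}: duplicate identifier")
        seen.add(t.id)
        if not t.definition.strip():
            problems.append(f"{t.id}: missing textual definition")
    return problems

terms = [
    Term("CL:0000062", "osteoblast", "A bone-forming cell which secretes an extracellular matrix."),
    Term("CL:0000062", "duplicate entry", "Hypothetical duplicate for illustration."),
    Term("CL:123", "bad id", ""),
]
for problem in check_terms(terms, "CL"):
    print(problem)
```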

53 CRITERIA
- CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
- DOCUMENTATION: The ontology is well-documented.
- USERS: The ontology has a plurality of independent users.

54 CRITERIA
- COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
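The common-architecture criterion can also be checked mechanically: every relation an ontology uses should come from a shared registry. A minimal sketch, assuming a small hypothetical registry holding only is_a, part_of and develops_from; the real OBO Relation Ontology defines its own list of relations and their definitions. The HYP: identifiers and the grows_into relation are invented for illustration.

```python
# Hypothetical stand-in for a shared relation registry such as the OBO Relation Ontology.
SHARED_RELATIONS = {"is_a", "part_of", "develops_from"}

def undefined_relations(edges):
    """Relation names used in (subject, relation, object) edges but absent from the registry."""
    return sorted({rel for _, rel, _ in edges} - SHARED_RELATIONS)

edges = [
    ("CL:0000062", "is_a", "CL:0000055"),
    ("CL:0000062", "develops_from", "CL:0000008"),
    ("HYP:0000001", "grows_into", "HYP:0000002"),  # hypothetical ad hoc relation
]
print(undefined_relations(edges))  # ['grows_into']
```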

55 Consequences
- OBO Foundry is serving as a benchmark for improvements in discipline-focused terminology resources,
- yielding calibration of existing terminologies and data resources and alignment of different views

56 Foundry ontologies all work in the same way: all are built to represent the types existing in a pre-existing domain, and the relations between these types, in a way which can support reasoning:
- we have data
- we need to make this data available for semantic search and algorithmic processing
- we create a consensus-based ontology for annotating the data
- and we ensure that it can interoperate with Foundry ontologies for neighboring domains

57 Mature OBO Foundry ontologies (now undergoing reform):
- Cell Ontology (CL)
- Chemical Entities of Biological Interest (ChEBI)
- Foundational Model of Anatomy (FMA)
- Gene Ontology (GO)
- Phenotypic Quality Ontology (PaTO)
- Relation Ontology (RO)
- Sequence Ontology (SO)

58 Ontologies being built to satisfy Foundry principles ab initio:
- Ontology for Clinical Investigations (OCI)
- Common Anatomy Reference Ontology (CARO)
- Ontology for Biomedical Investigations (OBI)
- Protein Ontology (PRO)
- RNA Ontology (RnaO)
- Subcellular Anatomy Ontology (SAO)

59 Ontologies in planning phase:
- Biobank/Biorepository Ontology (BrO, part of OBI)
- Environment Ontology (EnvO)
- Immunology Ontology (ImmunO)
- Infectious Disease Ontology (IDO)
- Mouse Adult Neurogenesis Ontology (MANGO)

60 OBO Foundry Success Story: Model organism research seeks results valuable for the understanding of human disease. This requires the ability to make reliable cross-species comparisons, and for this anatomy is crucial. But different MOD communities have developed their anatomy ontologies in uncoordinated fashion.

61 Ontologies facilitate grouping of annotations
- Annotations: brain 20, hindbrain 15, rhombomere 10
- Query "brain" without ontology: 20
- Query "brain" with ontology: 45
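The arithmetic on the slide (20 + 15 + 10 = 45) follows from the ontology's part-of hierarchy: a query for brain also collects annotations made to its parts. A minimal sketch using the slide's counts; the part_of links (hindbrain part of brain, rhombomere part of hindbrain) are our own assumption, stated to make the example concrete.

```python
# Direct annotation counts from the slide.
direct_counts = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# Assumed part_of links (child -> parent) in the anatomy ontology.
part_of = {"hindbrain": "brain", "rhombomere": "hindbrain"}

def ancestors_or_self(term):
    """Yield the term followed by everything it is transitively part of."""
    while term is not None:
        yield term
        term = part_of.get(term)

def count_with_ontology(query):
    """Count annotations to the query term or to any of its (transitive) parts."""
    return sum(n for t, n in direct_counts.items() if query in ancestors_or_self(t))

print(direct_counts["brain"])        # 20: exact match on 'brain' only
print(count_with_ontology("brain"))  # 45: brain + hindbrain + rhombomere
```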

62 CARO (Common Anatomy Reference Ontology)
- for the first time provides guidelines for model organism researchers who wish to achieve comparability of annotations
- for the first time provides guidelines for those new to ontology work
See Haendel et al., "CARO: The Common Anatomy Reference Ontology", in: Burger (ed.), Anatomy Ontologies for Bioinformatics, Springer, in press.

63 CARO-conformant ontologies already in development:
- Fish Multi-Species Anatomy Ontology (NSF funding received)
- Ixodidae and Argasidae (Tick) Anatomy Ontology
- Mosquito Anatomy Ontology (MAO)
- Spider Anatomy Ontology
- Xenopus Anatomy Ontology (XAO)
Undergoing reform: Drosophila and Zebrafish Anatomy Ontologies

64 Minimal Information Checklists
June 2006: establishment of MICheck, reflecting the growing need for prescriptive checklists specifying the key information to include when reporting experimental results (concerning methods, data, analyses and results).

65 The vision is spreading
- MIBBI: 'a common resource for minimum information checklists', analogous to OBO / NCBO BioPortal
- MIBBI Foundry: will create 'a suite of self-consistent, clearly bounded, orthogonal, integrable checklist modules'*
* Taylor CF, et al. Nature Biotech, in press
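A minimal sketch of how self-contained checklist modules might be combined and applied to an experiment report, including a check that the selected modules are orthogonal (no field required by two modules). The module names and required fields are hypothetical and are not taken from MIAME or any other published checklist.

```python
from collections import Counter

# Hypothetical checklist modules: each lists the fields it requires.
MODULES = {
    "sample_description": {"organism", "tissue"},
    "assay_protocol": {"protocol", "instrument"},
}

def missing_fields(report: dict, module_names: list[str]) -> dict[str, set[str]]:
    """For each selected module, list required fields absent or empty in the report."""
    gaps = {}
    for name in module_names:
        missing = {f for f in MODULES[name] if not report.get(f)}
        if missing:
            gaps[name] = missing
    return gaps

def overlapping_fields(module_names: list[str]) -> set[str]:
    """Fields required by more than one selected module (empty if orthogonal)."""
    counts = Counter(f for n in module_names for f in MODULES[n])
    return {f for f, c in counts.items() if c > 1}

report = {"organism": "Mus musculus", "tissue": "liver", "protocol": "RNA extraction v2"}
print(missing_fields(report, ["sample_description", "assay_protocol"]))
# {'assay_protocol': {'instrument'}}
```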

66 MIBBI Foundry communities:
- Transcriptomics (MIAME Working Group / MGED)
- Proteomics (Proteomics Standards Initiative)
- Metabolomics (Metabolomics Standards Initiative)
- Genomics and Metagenomics (Genomic Standards Consortium)
- In Situ Hybridization and Immunohistochemistry (MISFISHIE Working Group)
- Phylogenetics (Phylogenetics Community)
- RNA Interference (RNAi Community)
- Toxicogenomics (Toxicogenomics WG)
- Environmental Genomics (Environmental Genomics WG)
- Nutrigenomics (Nutrigenomics WG)
- Flow Cytometry (Flow Cytometry Community)

67 OBI / OCI
- Ontology for Biomedical Investigations: overarching terminology resource for the MIBBI Foundry
- Ontology for Clinical Investigations: collaboration with the EPOCH ontology for clinical trial management and with CDISC (FDA-mandated vocabulary for clinical trial reports)

68 Next step: a repertoire of disease ontologies built out of OBO Foundry elements
- Independent continuants: organism, system, organ, organ part, tissue, cell, acellular anatomical structure, biological molecule, genome
- Dependent continuants: physiology (functions); pathology
- acute stage, progressive stage, resolution stage

69 Draft Ontology for Multiple Sclerosis: to apprehend what is unknown requires a complete demarcation of the relevant space of alternatives

70 CTSA Ontology Consortium
- Duke Clinical Research Institute (DCRI)
- Dallas: University of Texas Southwestern Medical Center, Clinical and Translational Science Initiative, Division of Biomedical Informatics
- University of Texas Health Science Center at Houston, Center for Clinical and Translational Sciences

71 Multiple kinds of standardization for data:
- Terminologies (SNOMED, UMLS)
- CDEs (Clinical research)
- Ontologies (Biology, Disease Models)
- Information Exchange Standards (HL7 RIM)
- LIMS (LOINC)
Duke DCRI project to deal with 3 of these

72 Houston CTSA Biomedical Informatics, Specific aim 1: "To design and implement the biological data interface... based on existing biological ontologies, specifically those included in the NIH Roadmap funded Open Biomedical Ontologies (OBO) project, and [to] leverage previous informatics research in ontology management."

73 Houston CTSA proposal: "providing a coherent and integrated framework for CTSI investigators to integrate disparate sources of data, improve the communication among researchers, and establish better contact between researchers and the community. Of critical importance, by combining isolated data clusters the biomedical informatics component will empower investigators to redefine human disease and the response to diagnostic and therapeutic strategies through the use of combined clinical and molecular profiling."

74 PAR-07-425: Data Ontologies for Biomedical Research (R01): "Adoption of ontologies also depends on the ontology being in a format that is broadly supported, fully machine interpretable and not subject to intellectual property restrictions.... Another determinate of ontology acceptance is the degree to which the ontology conforms to best practices governing ontology design and construction. Criteria have been developed, and are undergoing empirical validation, by the Vocabulary and Common Data Element Work Group of caBIG. Other criteria have been specified by the OBO Foundry (http://obofoundry.org)."

75 caBIG BRIDG
Top-down (master-model-based) vs. bottom-up (evidence-based):
- prospective standardization: caBIG, SNOMED, HL7, OBO Foundry
- retrospective mapping: UMLS (multiple authorities), NLP / data + text-mining

76 SNOMED: "Ultimately, as data become attached to the samples (e.g., pathology data, genotypes), these will be linked to the patient records."

