BioOntologies SIG, ISMB/ECCB 2007
Daniel Schober, EMBL-EBI

Naming conventions for ontology engineering
Daniel Schober, PhD
The European Bioinformatics Institute (EBI)
NET Project – Postdoctoral Ontologist
www.ebi.ac.uk/net-project

Collaborative Efforts – Scenario
Metabolomics Standards Initiative (MSI)
  - Describe metabolomics laboratory workflows
      Minimal requirements, augmenting exchange formats
  - Ontology working group under OBI…
Ontology for Biomedical Investigations (OBI)
  - Larger collaborative, multi-domain effort
      Brings together various omics and biomedical communities
  - Describe general laboratory workflow
      Experimental design, protocols, data analysis etc.
  - Developed under OBO Foundry…
Open Biomedical Ontologies (OBO) Foundry
  - Provides best practices for ontology engineering
  - Creates a complete suite of orthogonal and interoperable ontologies
      Over 60 ontologies and ~10 core Foundry

Collaborative Efforts – Challenges
Create networked orthogonal ontologies
  - Integrating MSI ontology with OBI
  - Integrating OBI with BFO and other OBO Foundry ontologies, e.g. PATO (qualities), ChEBI (chemicals), …
Integrate modular developments
  - Parallel branch development
  - OWL-import, referencing
Improve the communication among developers
  - Database developers and biologists
  - Semantic web and text miners
-> We need common naming conventions
  - To harmonize the appearance and design of modules

Common Naming Conventions – Why?
Representational artefacts built according to different
  - Engineering methodologies
      METHONTOLOGY, TOVE, Enterprise, …
  - Engineering tools
      Protégé, OBO-Edit, OntoEdit, …
  - Representation languages and semantics
      OBO, OWL and CLIPS-Frames, …
  - Engineering schools and philosophies
      GO, semantic web, AI (Protégé Frames), …
      Manchester, Saarbruecken, Stanford, Trento, Karlsruhe, …
      Realists, Conceptualists, …
The naming conventions applied are as diverse as these backgrounds!
  - Diverse ad hoc ways to name what is represented

Separator: space vs. underscore vs. nil
Case: UpperCamelCase vs. underscore
Namespace prefix
Acronyms
Synonyms
Administrative helper classes
Compound name
Singular vs. plural, xref
Instance convention
ID convention: uppercase prefix, underscore, number vs. lowercase prefix, colon, string; or no name, just an ID string
Omissions

Existing Naming Conventions – Status
Semantic Web Best Practices and Deployment group (web)
  - Format specific: OWL
  - Limited visibility: information dispersed and embedded in many documents
BioPAX manual
  - Limited visibility: naming conventions only implicitly dealt with in general documentation
  - Implementation specific: naming conventions discussed at implementation level (Protégé/OWL)
  - Limited coverage: IDs addressed marginally (page 53, Technical Notes RDF:ID), no conventions on relations
GO developers style guide
  - Format specific: mainly OBO; has its own definition of namespace, which differs from the one in OWL/semantic web
  - Limited visibility: naming conventions dispersed throughout websites, e.g. GO namespace, term names and identifiers are explained in different documents

Existing Naming Conventions – Status
ISO standards
  - Information overflow: about 40 documents that contain closely related guidelines
  - Limited access: commercial
ANSI/NISO Z39.19-2005
  - Semantics specific: controlled vocabulary, i.e. about terms, not classes
  - Limited coverage: no term ID handling or versioning addressed
Law and order - Assessing and enforcing compliance with ontological modeling principles in the Foundational Model of Anatomy (FMA), S Zhang, O Bodenreider, Computers in Biology and Medicine 36 (2006)
  - Scientific domain dependent: anatomy
  - Hardly visible: paper access
Acceptance and visibility are limited to the specific target community
We need universally applicable conventions

Our Goals
Overcome diversity and fragmentation
  - Collect existing naming conventions
      Make them accessible via repository
  - Review and compare
Create a single common document
  - Distil universally valid aspects for OWL and OBO
  - Ensure visibility for target domains
  - Move towards a common resource for the OBO Foundry groups
Provide best practice guidelines
  - Provide robust names for ontology classes
  - Not a knowledge representation language for names, as e.g. HUGO provides for gene symbols (awg Tg(GBtslenv)832Pkw)
Engage in discussion with other groups
  - A two-phase approach …

Towards Common Naming Conventions
Phase 1: Straw man document
  - "Working towards naming conventions for use in controlled vocabulary and ontology engineering"
      See Bio-Ontologies SIG Proceedings, p. 29-32
  - Created for MSI Ontology WG, targeting the larger OBI group
  - Implementation and format independent
Phase 2: Survey OBO Foundry groups
  - Questionnaire (work in progress)
      Ontology and engineering process
      Current practice in naming entities
      Envisioned benefits of common conventions
      In-depth questions on particular conventions
  - Results to be posted on the OBO Foundry wiki

Naming Convention Straw Man - Examples
Explicit and concise names
  - Avoid omissions and ellipses
      Plant Ontology (PO) used 'cell' for 'plant cell'
  - Avoid negative names like non-separation device
  - Avoid ambiguous words
      30 meanings of set; e.g. plurality (protocol set) or action (parameter set)
  - Brand name convention: use [company name+brand name+superclass]
      US 2 becomes Bruker US 2 NMR magnet
  To ensure shared understanding of intended meaning
Typographical issues
  - Use lowercase as in natural language
      most flexible, e.g. pH, DNA_hybridisation (no acronym border problems)
  - Avoid punctuation, sub/superscripts
  - Resolve special characters consistently, e.g. α -> alpha
  To ensure readability, reduce diversity in appearance
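
Several of the conventions on this slide (no negative names, no punctuation, special characters spelled out) could in principle be checked mechanically. The following is only a minimal sketch in Python: the helper names `resolve_special_characters` and `check_name`, the small Greek-letter table, the example name "α-tubulin", and the choice to count a hyphen as punctuation are all assumptions made for illustration, not part of the straw man document.

```python
import re
from typing import List

# Hypothetical mapping; the slide only gives "alpha" as an example.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma"}


def resolve_special_characters(name: str) -> str:
    """Spell out known Greek letters so the name uses plain characters."""
    return "".join(GREEK.get(ch, ch) for ch in name)


def check_name(name: str) -> List[str]:
    """Return the list of straw man conventions a candidate name violates."""
    problems = []
    # Negative names such as "non-separation device".
    if re.search(r"(^|[ _])non[-_ ]", name, re.IGNORECASE):
        problems.append("negative name")
    # Anything that is not a letter, digit, underscore or whitespace
    # (assumption: hyphens count as punctuation here).
    if re.search(r"[^\w\s]", name):
        problems.append("punctuation")
    # Non-ASCII characters: Greek letters, sub/superscripts, etc.
    if not name.isascii():
        problems.append("special character")
    return problems


print(check_name("non-separation device"))      # ['negative name', 'punctuation']
print(check_name("Bruker US 2 NMR magnet"))     # []
print(resolve_special_characters("α-tubulin"))  # alpha-tubulin
```

Checks such as "avoid ambiguous words" or "avoid omissions" need domain knowledge and are deliberately left out of this sketch.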

Naming Convention Straw Man - Examples
Lexical issues
  - Reuse words and avoid synonyms within compound names
      x_part_of_process, y_part_of_process and z_part_of_process instead of x_component_of_process, y_portion_of_process, z_part_of_process
    To decrease the learning and search burden on the user side, to ease text mining by reducing string variability
  - Use underscore or space separator (instead of CamelCase)
      prevents distortions like CapNMRProbe and pHValue, yet allows brand names like SampleJet
    To ease text mining and readability (demarcated word borders)
  - Use singular nominal word form
      Avoid inconsistencies like biphenyl (CHEBI:17097) under an IUPAC-required biphenyls (CHEBI:22888)
    To harmonize appearance, to avoid redundancy, to ease ontology cross-referencing and import
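
The case against CamelCase can be illustrated by trying to recover word borders from CamelCase names after the fact. The sketch below uses a common regular-expression heuristic; the function name `split_camel_case` is hypothetical and the heuristic is an assumption chosen for illustration, not a method from the slides.

```python
import re

# Insert a border between a lowercase letter or digit and an uppercase
# letter, and before the last capital of an acronym run that is followed
# by a lowercase letter.
_BORDER = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_camel_case(name: str) -> str:
    """Naively convert a CamelCase name to underscore-separated words."""
    return _BORDER.sub("_", name)


print(split_camel_case("CapNMRProbe"))  # Cap_NMR_Probe
print(split_camel_case("pHValue"))      # p_H_Value  (pH is torn apart)
print(split_camel_case("SampleJet"))    # Sample_Jet (brand name is altered)
```

The heuristic handles the acronym run in CapNMRProbe but cannot know that pH is a single word or that SampleJet is a brand name. With explicit separators in the name itself (cap_NMR_probe, pH_value) no such guessing is needed.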

Common Naming Convention – Open Issues
Syntactic issues
  - Qualifier order: put the qualifier term before the part being qualified?
      NMR_instrument in place of instrument_for_NMR
  - Helper strings in class names: establish general ones?
      E.g. sensu postfix in GO to indicate species specificity, fruiting body development (sensu Bacteria) (GO:0030583)
Semantic issues
  - Administrative helper classes: how to name these metadata bins?
      unclassified (OBI_200067), ChEBI_objects (OBI_336), toBeDiscussed, _collected_relations
  - Identifiers and namespace: are conventions useful?
      OBI uses [group prefix+underscore+unique number], e.g. OBI_334
      BFO uses [meaningful string], e.g. IndependentContinuant
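
The identifier styles mentioned in this talk (OBI_334, GO:0030583, IndependentContinuant) differ enough that a tool has to recognize each one explicitly. A minimal sketch, assuming a hypothetical `parse_identifier` helper and a pattern of [letters + underscore or colon + digits] inferred from the examples on the slides rather than from any official ID policy:

```python
import re
from typing import Optional, Tuple

# Assumed shape: alphabetic prefix, one separator ("_" or ":"), digits.
ID_PATTERN = re.compile(r"(?P<prefix>[A-Za-z]+)(?P<sep>[_:])(?P<number>\d+)")


def parse_identifier(identifier: str) -> Optional[Tuple[str, str, str]]:
    """Split a prefix+separator+number ID; return None for other styles."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        return None  # e.g. a meaningful-string ID as used by BFO
    return match.group("prefix"), match.group("sep"), match.group("number")


print(parse_identifier("OBI_334"))                # ('OBI', '_', '334')
print(parse_identifier("GO:0030583"))             # ('GO', ':', '0030583')
print(parse_identifier("IndependentContinuant"))  # None
```

The number is kept as a string so that leading zeros, as in GO:0030583, are preserved.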

Common Naming Convention - Benefits
Communication has improved …
  - In geographically distributed, collaborative efforts
  - Between developers from different domains and backgrounds
Appearance of what we represent has been normalized
  - Not just a matter of aesthetics
  - Manoeuvring within the hierarchy became faster
… we further envision …
Facilitated access to ontologies through meta-tools
  Reducing the diversity that ontology libraries and tools have to cope with, e.g. OLS, BioPortal, PROMPT and text mining tools
Facilitating ontology integration and cross-referencing
  Comparison, alignment (OWL-import) and mapping
Serving as guideline for new communities

Acknowledgements and Resources
Authors and those contributing to the discussion
  - Susanna-Assunta Sansone, Philippe Rocca-Serra, Suzi Lewis, Waclaw Kusnierczyk, Barry Smith, Chris Mungall, Jane Lomax, Robert Stevens, Frank Gibson, Luisa Montecchi-Palazzi, Dietrich Rebholz
Members of MSI, PSI, OBI groups and OBO Foundry coordinators
  - http://msi-ontology.sf.net
  - http://psidev.sf.net
  - http://obi.sf.net
  - http://obofoundry.org
Further info
  - "Working towards naming conventions for use in controlled vocabulary and ontology engineering", Bio-Ontologies SIG Proceedings, p. 29-32
Funding sources (supporting my work)
  - UK BBSRC e-Science BB/D524283/1 and BB/E025080/1
  - Semantic Mining NoE (visits to IFOMIS and Manchester)