Ontologies 101 Melissa Haendel and Jim Balhoff INCF taskforce cross program meeting Stockholm, Aug. 30, 2013.

Presentation transcript:


All about ontologies: 1. What is an ontology? 2. A little logic 3. Upper ontologies 4. OWL and RDF 5. Data integration

Common controlled vocabularies indicate the same meaning under different annotation circumstances. [Figure: four resources (MouseEcotope, GlyProt, DiabetInGene, GluChem) each linked to the term "sphingolipid transporter activity".]

What is a controlled vocabulary? Any closed, prescribed list of terms used for classifying data. Key features: terms are not usually defined; relationships between the terms are not usually defined; can be a list. Here is a CV of wines: Pinot noir, red, chardonnay, Chianti, Bordeaux, Riesling... These are all different types (color, location, varietal) and are present in a single list. Another example would be the list of map locations at the end of your gazetteer.

What is a taxonomy? Any controlled vocabulary that is arranged in a hierarchy. Key features: terms are not usually defined; relationships between the terms are not usually defined; terms are arranged in a hierarchy. Here is a wine taxonomy: Wine > Red (merlot, zinfandel, cabernet, pinot noir); Wine > White (chardonnay, pinot gris, Riesling).

What is a thesaurus? A taxonomy that contains additional information about the use of the terms. Key features: terms are not usually defined; relationships between the terms are not usually defined; terms are arranged in a hierarchy; statements about the terms are included, such as scope notes or instructions for use. Some well-known thesauri are: WordNet, NCI cancer thesaurus, MeSH.

What is an ontology? A formal conceptualization of a specified domain of interest. Key features: terms are defined; relationships between the terms are defined, allowing logical inference; terms are arranged in a hierarchy; expressed in a knowledge representation language such as RDFS, OBO, or OWL. Some well-known ontologies are: SNOMED, Foundational Model of Anatomy, Gene Ontology, Linnaean taxonomy of species. [Image reproduced with permission, Jason Freeny.]

The ontology spectrum. Bottom line: you get what you pay for. [Figure label: OBO]

Are ontologies about terms or things? What matters are things. Labels are handles, not things. Data annotated to the right schema will be more consistent. [Figure: a dessert hierarchy with the terms Dessert, Ice cream, gelato, sno-cone, soft-serve, custard cream, Cake, torte, double-layer, galette, angel food cake, each shown with a DES: identifier. Example text definition: “Frozen milk and/or cream with various sugars, and flavoring such as fresh fruit and nut purees.”[1]]

Why build an ontology? A simple example. Number of genes annotated to each of the following brain parts in an ontology: brain 20; hindbrain (part_of brain) 15; rhombomere (part_of hindbrain) 10. Query brain without ontology: 20. Query brain with ontology: 45. Ontologies can facilitate grouping and retrieval of data.
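The arithmetic behind the two query results can be sketched in a few lines of Python. This is only an illustration: the dictionaries and function names are made up for the example, and it assumes each gene is annotated to exactly one term so that the counts can simply be added.

```python
# Direct annotation counts from the slide (genes annotated to each term).
direct_counts = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# part_of edges from the slide, stored as child -> parent.
part_of = {"hindbrain": "brain", "rhombomere": "hindbrain"}


def parts_of(term):
    """Return the term plus every term that is (transitively) part_of it."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in part_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def query_without_ontology(term):
    """Count only the genes annotated directly to the term."""
    return direct_counts.get(term, 0)


def query_with_ontology(term):
    """Count genes annotated to the term or to any of its parts."""
    return sum(direct_counts.get(t, 0) for t in parts_of(term))


print(query_without_ontology("brain"))  # 20
print(query_with_ontology("brain"))     # 45
```

If genes could be annotated to several terms at once, the counts would have to be replaced by sets of gene identifiers and merged with a union instead of a sum.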

Classes, subclasses, and instances. Subtyping relation: is_a = SubClassOf. [Figure: an is_a hierarchy of the classes entity, organism, animal, mammal, cat, and human, with the OWL individuals Peanut and Chris Shaffer attached by instance_of.]

How do you tell if it is an instance or a class? Is there more than one in existence? Is the entity referencing a group of things with common properties? Class or instance? There is only one Snoopy; there is a class of things labeled “Snoopy toys”. There is only one Alaska; there is a class of things labeled “States”. There is only one blue morpho in my specimen collection; there is a class of things labeled “Blue morpho butterflies”.

General principle for logical definitions. Definitions are of the following genus-differentia form: X = a Y which has one or more differentiating characteristics, where Y is the is_a parent of X. Definition: cylinder = surface formed by the set of lines perpendicular to a plane, which pass through a given circle in that plane. Definition: blue cylinder = cylinder that has color blue. Definition: red cylinder = cylinder that has color red. Both blue cylinder and red cylinder are is_a children of cylinder.

The True Path Rule: the pathway from a subClass all the way up to its top-level parent(s) must be universally true.

GO before:
cuticle synthesis
--[i] chitin metabolism
cell wall biosynthesis
--[i] chitin metabolism
----[i] chitin biosynthesis
----[i] chitin catabolism

BUT: a fly chitin synthase gene could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls.

GO after:
chitin metabolism
--[i] chitin biosynthesis
--[i] chitin catabolism
--[i] cuticle chitin metabolism
----[i] cuticle chitin biosynthesis
----[i] cuticle chitin catabolism
--[i] cell wall chitin metabolism
----[i] cell wall chitin biosynthesis
----[i] cell wall chitin catabolism

NOW: all the subClass terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true.

Where does the True Path Rule come from? Transitivity. Some relations are transitive and apply across all levels of the hierarchy. For example, a cat is_a mammal, and a mammal is_a vertebrate, SO a cat is_a vertebrate. => This is the True Path Rule, and it holds because the is_a relation is transitive. Some properties are not transitive. For example, head has_quality round, and head part_of organism. So is the organism round? Of course not! BUT eye part_of head, and head part_of organism, SO eye part_of organism is true, because part_of is a transitive relation. Relations are logically defined in a common relation ontology or within each ontology that uses them.
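A minimal Python sketch of the same idea, assuming relations are stored as sets of (subject, object) pairs. The helper name transitive_closure is invented for this illustration; a real reasoner does considerably more than this.

```python
def transitive_closure(edges):
    """Return all (a, c) pairs implied by chaining (a, b) and (b, c)."""
    closure = set(edges)
    while True:
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        }
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


# Facts taken from the slide.
is_a = {("cat", "mammal"), ("mammal", "vertebrate")}
part_of = {("eye", "head"), ("head", "organism")}
has_quality = {("head", "round")}

# is_a and part_of are declared transitive, so their closures are computed.
is_a_closure = transitive_closure(is_a)
part_of_closure = transitive_closure(part_of)

print(("cat", "vertebrate") in is_a_closure)    # True
print(("eye", "organism") in part_of_closure)   # True

# has_quality is not transitive and is never chained with part_of,
# so nothing states that the organism is round.
print(("organism", "round") in has_quality)     # False
```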

Relationships and definitions A relationship from one class to another is a formalized part of its definition (an object property restriction in OWL) A subtype relation (is_a in OBO, SubClassOf in OWL) specifies necessary conditions for membership of a class. For example, finger part_of hand (all finger part_of some hand) states that a necessary condition of being in the class finger is to be part of some hand. So… if a finger exists, it is part of some hand. But…this does not mean that if a hand exists, it has as a part a finger.

There are many useful ways to classify parts of organisms: by its parts and their arrangement; its relation to other structures (what is it part of, connected to, adjacent to, overlapping?); its shape; its function; its developmental origins; its species or clade; its evolutionary history. Cajal 1915: “Accept the view that nothing in nature is useless, even from the human point of view.”

Not all classification is useful. Be practical: build ontologies for what you need and for what can be reused. “About thirty years ago there was much talk that geologists ought only to observe and not theorise; and I well remember some one saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours.” C. Darwin

Some classes are declared to never share any instances in common. OWL: DisjointWith. OBO: disjoint_from. [Figure: neuron is an anatomical structure; lumen and epidural space are anatomical spaces; anatomical structure and anatomical space are disjoint, so placing a class under both is marked ✗ NO!]
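A disjointness check of this kind can be sketched in Python. The class names come from the slide, but the data layout and function names are assumptions made for illustration, and the double parentage of "epidural space" is a deliberately wrong, hypothetical assertion added to trigger the check.

```python
# Asserted is_a parents (child -> set of parents). The last entry is a
# deliberately wrong, hypothetical assertion used to trigger the check.
is_a = {
    "neuron": {"anatomical structure"},
    "lumen": {"anatomical space"},
    "epidural space": {"anatomical space", "anatomical structure"},
}

# Pairs of classes declared disjoint (DisjointWith / disjoint_from).
disjoint = {frozenset({"anatomical structure", "anatomical space"})}


def ancestors(term):
    """Return every class reachable from term by following is_a upward."""
    seen = set()
    stack = list(is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def disjointness_violations():
    """List classes that sit under both members of a disjoint pair."""
    problems = []
    for term in sorted(is_a):
        supers = ancestors(term)
        for pair in disjoint:
            if pair <= supers:
                problems.append((term, tuple(sorted(pair))))
    return problems


print(disjointness_violations())
# [('epidural space', ('anatomical space', 'anatomical structure'))]
```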

About reasoners. A reasoner is a piece of software able to infer logical consequences from a set of asserted facts or axioms. Reasoners are used to check the logical consistency of ontologies and to extend ontologies with "inferred" facts or axioms. For example, a reasoner would infer: Major premise: All mortals die. Minor premise: Some men are mortals. Conclusion: Some men die. Different reasoners can behave slightly differently. There are a number of reasoners to be aware of: ELK, HermiT, Pellet, etc.

Different kinds of definitions. An ontology is a collection of axioms. An axiom is simply a sentence or a statement. Axioms can be: (1) non-logical (aka “annotations” or “text definitions”), e.g. GO_ has synonym ‘mtDNA’, opaque to reasoners; (2) logical, with well-defined semantics understood by reasoners, for example SubClassOf axioms. Arguments can be classes or class expressions (for example, the class of things that are parts of the midbrain).

Classifying anatomy. [Figure terms: appendage, antenna, fore wing, hind wing.]

Relationships record classifications too. ‘substantia nigra’ SubClassOf part_of some ‘midbrain’. [Figure: the class expression part_of some ‘midbrain’ groups substantia nigra and trochlear nucleus.]

The knowledge in an ontology can make the reasons for classification explicit. Any sense organ that functions in the detection of smell is an olfactory sense organ. Logical form: olfactory sense organ = sense organ that is capable_of some detection of smell.

Classifying. Asserted: nose SubClassOf sense organ; nose SubClassOf capable_of some detection of smell. Definition: olfactory sense organ = sense organ and capable_of some detection of smell. Inferred: nose SubClassOf olfactory sense organ. => These are necessary and sufficient conditions, also called an equivalent class axiom.
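The classification step can be imitated with a toy classifier in Python. This is a sketch, not how an OWL reasoner is implemented: the dictionary layout and the classify function are invented for the example, and the "eye" entry is a hypothetical extra class added to show a non-match.

```python
# Asserted necessary conditions for each class: named superclasses and
# existential restrictions written as (property, filler) pairs.
# "eye" is a hypothetical addition, not taken from the slide.
asserted = {
    "nose": {
        "superclasses": {"sense organ"},
        "restrictions": {("capable_of", "detection of smell")},
    },
    "eye": {
        "superclasses": {"sense organ"},
        "restrictions": {("capable_of", "detection of light")},
    },
}

# Equivalent class axiom: genus plus differentia are necessary AND sufficient.
defined = {
    "olfactory sense organ": {
        "genus": "sense organ",
        "differentia": {("capable_of", "detection of smell")},
    },
}


def classify(asserted, defined):
    """Infer which asserted classes fall under each defined class."""
    inferred = {}
    for name, facts in asserted.items():
        for defined_name, definition in defined.items():
            has_genus = definition["genus"] in facts["superclasses"]
            has_differentia = definition["differentia"] <= facts["restrictions"]
            if has_genus and has_differentia:
                inferred.setdefault(name, set()).add(defined_name)
    return inferred


print(classify(asserted, defined))  # {'nose': {'olfactory sense organ'}}
```

The sketch only matches directly asserted superclasses and restrictions; a real reasoner would also follow the subclass hierarchy of the genus and of the fillers.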

Compositionality and avoiding asserted multiple inheritance. We can logically define composed classes and create complex definitions from simpler ones (aka building blocks, cross-products, logical definitions). Descriptions can be composed at any time: at ontology construction time (pre-composition) or at annotation time (post-composition). Formal necessary and sufficient definitions + a reasoner = automatic (and therefore manageable) classification. This requires subtype classification, so apart from the root term(s), no term should lack a superclass parent. Let the reasoner do the work!

Post-composed versus pre-composed anatomical entities are logically equivalent. Named class: plasma membrane of neuron. Has the same semantic meaning as: plasma membrane and part_of some neuron, where plasma membrane (Gene Ontology) is the genus and part_of (Relation Ontology) some neuron (Cell Ontology) is the differentia.

Many perspectives, many ontologies that overlap in content. [Figure domains: chemical entities, gross anatomy, tissues, cells, cell anatomy, proteins, phenotypes, clinical disorders, processes, physiological processes, development, reactions, cellular processes, behavior, evolutionary characters, nervous system, neural crest.]

Domain ontologies are organized according to upper ontologies (Basic Formal Ontology) that specify the general types of things that exist. The table crosses RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT, versus OCCURRENT) with GRANULARITY:
ORGAN AND ORGANISM. Independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO). Dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO). Occurrent: Biological Process (GO).
CELL AND CELLULAR COMPONENT. Independent continuant: Cell (CL), Cellular Component (FMA, GO). Dependent continuant: Cellular Function (GO).
MOLECULE. Independent continuant: Molecule (ChEBI, SO, RnaO, PrO). Dependent continuant: Molecular Function (GO). Occurrent: Molecular Process (GO).
=> Classification according to these higher-level types helps ensure the True Path Rule holds.

The Common Anatomy Reference Ontology. CARO is a structural classification based on granularity. From the bottom up: cell component, cell, tissue, multi-tissue structure. From the top down: organism subdivision, anatomical system. Acellular structures. CARO is an upper reference ontology that can be used to structure new anatomy ontologies.

Example of complexity arising from multiple species-contexts. [Figure: erythrocyte is a cell, but the subclasses nucleate cell and enucleate cell are not applicable in all contexts.]

Example of complexity arising from multiple species-contexts: species ontologies attached at the appropriate level. [Figure terms: cell, nucleate cell, enucleate cell, erythrocyte, nucleate erythrocyte, enucleate erythrocyte, zebrafish nucleate erythrocyte, human erythrocyte. Identifiers shown: ZFA: … … CL: CL: CL: FMA:81100]

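Using reasoners to detect errors. [Figure: fruit fly FBbt ‘tibia’ and human FMA ‘tibia’ are both placed is_a UBERON: tibia, which is_a UBERON: bone; bone is only_in_taxon Vertebrata; FBbt ‘tibia’ is part_of Drosophila melanogaster and FMA ‘tibia’ is part_of Homo sapiens, which is_a Vertebrata; Drosophila melanogaster is disjoint_with Vertebrata, so the fly assertion is marked ✗. Image source: Developmental Biology, Scott Gilbert, 6th ed.]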

Resource Description Framework (RDF). Knowledge and data are represented in simple statements called “triples”. Engineers shouldn’t be allowed to name things.

RDF. Ontologies support semantic classification of data. RDF is the data standard for the semantic web. Extremely simple and flexible data model (not a file format). A W3C recommendation.

RDF. Relate “things” with simple statements: triples. Everything has a global, unique identifier: a URI. Objects/values can be “things” or literal values (text, number, date).

Linked Data Principles: 1. Use URIs as names for things. 2. Use HTTP URIs so we can look up the names. 3. Return useful information using standards when someone looks up a name. 4. Link to other URIs so we can discover more things (“follow your nose”). edData.html

RDF data integration. Easy to combine separate datasets: shared identifiers; unordered collection of triples (facts).

RDF data integration: Dataset 1 and Dataset 2
example:john example:hasPet example:fido
example:john rdfs:label “John”
example:fido rdfs:label “Fido”
example:john rdf:type example:Person
example:fido rdf:type example:Dog
example:fido example:age “6”

RDF data integration: combined dataset
example:john example:hasPet example:fido
example:john rdfs:label “John”
example:fido rdfs:label “Fido”
example:john rdf:type example:Person
example:fido rdf:type example:Dog
example:fido example:age “6”
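Because a dataset is an unordered collection of triples with shared identifiers, merging can be sketched as a plain set union. In the Python below, triples are tuples of strings and the lowercase identifier example:john is used throughout. The transcript does not record which triples belonged to Dataset 1 and which to Dataset 2, so the split shown here is an assumption.

```python
# Assumed split: the transcript does not say which triples were in which dataset.
dataset_1 = {
    ("example:john", "example:hasPet", "example:fido"),
    ("example:john", "rdfs:label", "John"),
    ("example:john", "rdf:type", "example:Person"),
}
dataset_2 = {
    ("example:fido", "rdfs:label", "Fido"),
    ("example:fido", "rdf:type", "example:Dog"),
    ("example:fido", "example:age", "6"),
}

# Integration is a set union: order does not matter and duplicates collapse.
combined = dataset_1 | dataset_2


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for (s, p, o) in triples if s == subject and p == predicate)


# This question needs facts from both datasets: what are the labels of John's pets?
pets = objects(combined, "example:john", "example:hasPet")
pet_labels = [label for pet in pets for label in objects(combined, pet, "rdfs:label")]

print(len(combined))  # 6
print(pet_labels)     # ['Fido']
```

The shared identifier example:fido is what joins the two datasets; no schema alignment step is needed for the union itself.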

An Introduction to OWL

What is OWL? Web Ontology Language (OWL) is a language for writing ontologies for the Web. OWL 2 is the current version of OWL and has been a W3C recommendation since October 2009. The previous version of OWL (OWL 1) became a W3C recommendation in 2004. An overview of the language can be found at:

OWL. The things or objects about which knowledge is represented (e.g. Melissa, Jim, Sweden) are called individuals (aka instances). Groups of things (e.g. person, conference) are called classes. Relations between things (e.g. siblings) are called properties.

What does OWL add to RDF? More meaningful semantics: OWL allows one to specify characteristics of properties and classes. For example, “Melissa isMarriedTo John” implies “John isMarriedTo Melissa” (symmetric property). Complex class definitions (“expressions”) can be composed of properties and other classes, e.g. muscle and (attached_to some bone and (part_of some head)). Enables reasoning.
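The symmetric-property inference can be sketched in Python with triples as tuples. The function name apply_symmetric is invented for this illustration, and the second triple is only there to show that a non-symmetric property is left alone.

```python
def apply_symmetric(triples, symmetric_properties):
    """Add the reversed triple for every property declared symmetric."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in symmetric_properties:
            inferred.add((o, p, s))
    return inferred


asserted = {
    ("Melissa", "isMarriedTo", "John"),
    ("Melissa", "teaches", "INCF_ontology_lecture"),
}

# Only isMarriedTo is declared symmetric; teaches is not.
result = apply_symmetric(asserted, {"isMarriedTo"})

print(("John", "isMarriedTo", "Melissa") in result)               # True
print(("INCF_ontology_lecture", "teaches", "Melissa") in result)  # False
```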

OWL: added semantics. Infer new RDF triples (facts) based on asserted triples and OWL axioms. OWL 2 Full: direct extension of RDF. OWL 2 DL: practical reasoning based on Description Logic; strict separation of classes and instances; entails some constraints on RDF.

Class versus individual assertions. In OWL, the things you can say about relationships between classes are basically limited to: EquivalentTo, SubClassOf, DisjointWith. All other relationships (aka properties) are between OWL individuals. Built-in properties: sameAs, differentFrom. Relationship between individual and class: Type. Object properties: user-added content in the ontology.

OWL primer This is an EXCELLENT resource

Object properties. Object properties link an individual to another individual. [Figure: Melissa teaches INCF Ontology Lecture.] This representation means that the individuals Melissa and INCF Ontology Lecture are related through the teaches property.

Object properties. Properties may have a domain and a range specified. For instance, the property teaches can have domain Person and range Lecture: teaches rdf:type owl:ObjectProperty; teaches rdfs:domain foaf:Person; teaches rdfs:range Lecture. [Figure: Melissa (a Person) teaches INCF Ontology Lecture.]

Complex class definition (equivalent class). In OWL, we can say that an instructor is equivalent to a person that teaches some lecture. Class: Instructor EquivalentTo: Person and (teaches some Lecture)

Reasoning example. Assume we have defined the object property teaches with domain foaf:Person and range Lecture. We assert that Melissa teaches INCF_ontology_lecture. From the domain and range, a reasoner infers: Melissa rdf:type Person; INCF_ontology_lecture rdf:type Lecture. And because of the equivalent class, we also infer that: Melissa rdf:type Instructor.
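These inference steps can be written out as a small Python sketch. It hard-codes the one domain, the one range, and the one equivalent class from the slides; the names DOMAIN, RANGE, and infer_types are invented for the example, and a real OWL reasoner is far more general.

```python
# Domain and range declarations for the teaches property.
DOMAIN = {"teaches": "Person"}
RANGE = {"teaches": "Lecture"}


def infer_types(property_assertions):
    """Infer rdf:type facts from domain, range, and the Instructor definition."""
    types = set()
    # Step 1: domain and range give a type to each subject and object.
    for s, p, o in property_assertions:
        if p in DOMAIN:
            types.add((s, DOMAIN[p]))
        if p in RANGE:
            types.add((o, RANGE[p]))
    # Step 2: Instructor EquivalentTo Person and (teaches some Lecture).
    for s, p, o in property_assertions:
        if p == "teaches" and (s, "Person") in types and (o, "Lecture") in types:
            types.add((s, "Instructor"))
    return types


asserted = {("Melissa", "teaches", "INCF_ontology_lecture")}
inferred = infer_types(asserted)

for individual, cls in sorted(inferred):
    print(individual, "rdf:type", cls)
```

A single asserted triple yields three type facts: the lecture is a Lecture, and Melissa is both a Person and an Instructor.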

Reasoning example. Assume we have defined the object property teaches with domain foaf:Person and range Lecture. We assert that: Melissa teaches INCF_ontology_lecture; INCF_ontology_lecture rdf:type Book; Book owl:disjointWith Lecture. => ??

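Reasoning example. Assume we have defined the object property teaches with domain foaf:Person and range Lecture. We assert that: Melissa teaches INCF_ontology_lecture; INCF_ontology_lecture rdf:type Book; Book owl:disjointWith Lecture. => Error!! The range of teaches makes INCF_ontology_lecture a Lecture, but it is also asserted to be a Book, and Book is disjoint with Lecture.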
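The error can be reproduced with a short Python consistency check. As with the other sketches, the data structures and the check_consistency function are assumptions made for illustration and only cover the range and disjointness axioms used in this example.

```python
# Range declaration and disjointness axiom from the example.
RANGE = {"teaches": "Lecture"}
DISJOINT = {frozenset({"Book", "Lecture"})}


def check_consistency(property_assertions, type_assertions):
    """Return individuals whose asserted plus inferred types are disjoint."""
    types = {}
    for individual, cls in type_assertions:
        types.setdefault(individual, set()).add(cls)
    # The range of a property gives a type to the object of each assertion.
    for s, p, o in property_assertions:
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    errors = []
    for individual in sorted(types):
        for pair in DISJOINT:
            if pair <= types[individual]:
                errors.append((individual, tuple(sorted(pair))))
    return errors


errors = check_consistency(
    {("Melissa", "teaches", "INCF_ontology_lecture")},
    {("INCF_ontology_lecture", "Book")},
)
print(errors)  # [('INCF_ontology_lecture', ('Book', 'Lecture'))]
```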

Satisfiability. Unsatisfiable class: impossible to have instances. Incoherent ontology: contains unsatisfiable classes; still usable. Inconsistent ontology: contains impossible instances (such as instances of unsatisfiable classes); not really usable: “anything follows from a contradiction”.

Open world vs. closed world. Open world assumption: everything we don’t know is undefined. Closed world assumption: everything we don’t know is false. Example: Statement: Vertebrate has_part spinal cord. Query: “Do fruit flies have spinal cords?” Closed world response = no. Open world response = unknown.
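The difference between the two assumptions can be sketched as two query functions over the same fact base. The function names and the optional known_false argument are invented for this Python illustration.

```python
# The only fact we know, taken from the slide.
facts = {("vertebrate", "has_part", "spinal cord")}


def closed_world(query):
    """Closed world: anything not stated as a fact is false."""
    return "yes" if query in facts else "no"


def open_world(query, known_false=frozenset()):
    """Open world: anything not stated is unknown unless explicitly negated."""
    if query in facts:
        return "yes"
    if query in known_false:
        return "no"
    return "unknown"


query = ("fruit fly", "has_part", "spinal cord")
print(closed_world(query))  # no
print(open_world(query))    # unknown
```

Under the open world assumption, answering "no" requires an explicit negative statement, which is what the known_false argument stands in for here.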

OWL syntaxes. Many flavors: Functional OWL Syntax; RDF-based syntaxes (RDF/XML, Turtle); OWL/XML; Manchester Syntax. Still, they can all be represented as a graph or a set of triples!

So what can we do with all this ontology “stuff”?

Querying across anatomical granularity. CJ Mungall, C Torniai, GV Gkoutos, SE Lewis, MA Haendel. Uberon, an integrative multi-species anatomy ontology. Genome Biology 13 (1), R5. Uberon.org

[Figure: top hit in mouse; top hit in fish.]

Exomiser: Using comparative phenotype analyses to improve variant prioritization

Ontology considerations. An ontology is a classification, and there are many useful ways to classify biology. Everybody makes mistakes: let the computer find errors and manage knowledge for you. Re-use other people’s work where possible: import class hierarchies, use common patterns. Cautionary note: formal languages have limitations. Don’t expect or want to express everything! RDF is a model for semantic, linked data. OWL is an ontology language for RDF and the web. Ontologies are a tool for data exchange and integration via a variety of formats (tomorrow’s discussion).

Querying for genes in similar structures across species. Distal-less orthologs participate in distal-proximal pattern formation and appendage morphogenesis (Panganiban et al., PNAS, 1997). [Figure panels A to E. Taxa: Vertebrata, Ascidians, Arthropoda, Annelida, Mollusca, Echinodermata. Structures: tetrapod limbs, ampullae, tube feet, parapodia. Examples: mouse limb, sea urchin tube feet, ascidian ampulla, polychaete parapodia.]