Ontologies 101 Melissa Haendel and Jim Balhoff INCF taskforce cross program meeting Stockholm, Aug. 30, 2013.


1 Ontologies 101 Melissa Haendel and Jim Balhoff INCF taskforce cross program meeting Stockholm, Aug. 30, 2013

2 All about ontologies 1.What is an ontology? 2.A little logic 3.Upper ontologies 4.OWL and RDF 5.Data integration

3 MouseEcotope GlyProt DiabetInGene GluChem sphingolipid transporter activity Common controlled vocabularies indicate the same meaning under different annotation circumstances

4 What is a controlled vocabulary? Any closed, prescribed list of terms used for classifying data. Key Features: Terms are not usually defined. Relationships between the terms are not usually defined. Can be a list. Here is a CV of wines: Pinot noir, red, chardonnay, Chianti, Bordeaux, Riesling…. These are all different types (color, location, varietal) and are present in a single list. Another example would be the list of map locations at the end of your gazetteer.

5 What is a Taxonomy? Any controlled vocabulary that is arranged in a hierarchy. Key Features: Terms are not usually defined. Relationships between the terms are not usually defined. Terms are arranged in a hierarchy. Here is a wine taxonomy: Wine > Red (merlot, zinfandel, cabernet, pinot noir); Wine > White (chardonnay, pinot gris, Riesling).

6 What is a Thesaurus? A taxonomy that contains additional information about use of the terms. Key Features: Terms are not usually defined. Relationships between the terms are not usually defined. Terms are arranged in a hierarchy. Statements about the terms are included such as scope notes or instructions for use. Some well known thesauri are: WordNet, NCI cancer thesaurus, MeSH

7 What is an ontology? A formal conceptualization of a specified domain of interest. Key Features: Terms are defined. Relationships between the terms are defined, allowing logical inference. Terms are arranged in a hierarchy. Expressed in a knowledge representation language such as RDFS, OBO, or OWL. Some well known ontologies are: SNOMED, Foundational Model of Anatomy, Gene Ontology, Linnaean taxonomy of species. Reproduced with permission, Jason Freeny http://web.mac.com/moistproduction/flash/index.html

8 http://www.mkbergman.com/?m=20070516 The Ontology spectrum: Bottom line: you get what you pay for. OBO

9 Are ontologies about terms or things? What matters are things Data annotated to the right schema will be more consistent Dessert Ice cream gelato sno-cone soft-serve custard cream Cake torte double-layer galette angel food cake DES:000038 DES:000034 DES:000456 DES:005566 DES:000564 DES:000744 DES:0003221 DES:000667 DES:000975 DES:000544 DES:000765 DES:000034 - Frozen milk and/or cream with various sugars, and flavoring such as fresh fruit and nut purees.[1] Labels are handles, not things

10 Why build an ontology? A simple example Number of genes annotated to each of the following brain parts in an ontology: brain 20 part_of hindbrain 15 part_of rhombomere 10 Query brain without ontology 20 Query brain with ontology 45 Ontologies can facilitate grouping and retrieval of data
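
The grouping effect on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts and part_of edges are the ones on the slide, the function names are made up for the example, and the gene sets of the three structures are assumed not to overlap (which is what summing to 45 implies).

```python
# Gene counts annotated directly to each term (numbers from the slide).
direct_gene_counts = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# part_of edges as on the slide, stored as child -> parent.
part_of = {"rhombomere": "hindbrain", "hindbrain": "brain"}


def parts_of(term):
    """Return the term plus every structure that is transitively part_of it."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in part_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def count_without_ontology(term):
    """Only annotations made directly to the queried term are returned."""
    return direct_gene_counts[term]


def count_with_ontology(term):
    """Annotations to the term and to all of its parts are grouped together."""
    return sum(direct_gene_counts[t] for t in parts_of(term))


print(count_without_ontology("brain"))  # 20
print(count_with_ontology("brain"))     # 45
```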

11 Classes, subclasses, and instances. Subtyping relation: is_a = SubClassOf. Diagram labels (linked by is_a): entity, organism, animal, mammal, cat, human. OWL individuals (linked by instance_of): Peanut, Chris Shaffer.

12 How do you tell if it is an instance or a class? Is there more than one in existence? Is the entity referencing a group of things with common properties? Class or instance? There is only one Snoopy There is a class of things labeled “Snoopy toys” Class or instance? There is only one Alaska There is a class of things labeled “States” There is only one blue morpho in my specimen collection There is a class of things labeled “Blue morpho butterflies”

13 General Principle for Logical Definitions. Definitions are of the following Genus-Differentia form: X = a Y which has one or more differentiating characteristics, where Y is the is_a parent of X. Definition: cylinder = Surface formed by the set of lines perpendicular to a plane, which pass through a given circle in that plane. Definition: Blue cylinder = Cylinder that has color blue (Blue cylinder is_a cylinder). Definition: Red cylinder = Cylinder that has color red (Red cylinder is_a cylinder).

14 The True Path Rule: The pathway from a subClass all the way up to its top level parent(s) must be universally true. GO Before: cuticle synthesis --[i] chitin metabolism; cell wall biosynthesis --[i] chitin metabolism ----[i] chitin biosynthesis ----[i] chitin catabolism. BUT: A fly chitin synthase gene could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls. GO After: chitin metabolism --[i] chitin biosynthesis --[i] chitin catabolism --[i] cuticle chitin metabolism ----[i] cuticle chitin biosynthesis ----[i] cuticle chitin catabolism --[i] cell wall chitin metabolism ----[i] cell wall chitin biosynthesis ----[i] cell wall chitin catabolism. NOW: all the subClass terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true.
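
A rough way to see the problem is to walk the is_a links upward from the term a fly gene would be annotated to. The sketch below encodes only the [i] (is_a) links visible on the slide; the dictionary layout and the helper name `ancestors` are assumptions made for illustration, not part of GO tooling.

```python
def ancestors(term, is_a):
    """Collect every class reachable from term by following is_a upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# "GO Before": chitin metabolism sits under both cuticle and cell wall terms.
go_before = {
    "chitin metabolism": ["cuticle synthesis", "cell wall biosynthesis"],
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
}

# "GO After": cuticle and cell wall variants are separate subclasses.
go_after = {
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
    "cuticle chitin metabolism": ["chitin metabolism"],
    "cuticle chitin biosynthesis": ["cuticle chitin metabolism"],
    "cuticle chitin catabolism": ["cuticle chitin metabolism"],
    "cell wall chitin metabolism": ["chitin metabolism"],
    "cell wall chitin biosynthesis": ["cell wall chitin metabolism"],
    "cell wall chitin catabolism": ["cell wall chitin metabolism"],
}

# Before: the fly gene's term leads up to a cell wall term (a false path).
print("cell wall biosynthesis" in ancestors("chitin biosynthesis", go_before))

# After: no ancestor of the cuticle term mentions the cell wall.
fly_paths = ancestors("cuticle chitin biosynthesis", go_after)
print(any("cell wall" in name for name in fly_paths))
```

The first check prints True (the false path exists) and the second prints False (every path is now true for a fly gene).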

15 Where does the True Path Rule come from? Transitivity. Some relations are transitive, and apply across all levels of the hierarchy. For example, a cat is_a mammal, and a mammal is_a vertebrate, SO a cat is_a vertebrate. => This is the true path rule; it holds because the is_a relation is transitive. Some properties are not transitive. For example, head has_quality round, and head part_of organism. So is the organism round? Of course not! BUT, eye part_of head, and head part_of organism, SO eye part_of organism is true, because part_of is a transitive relation. Relations are logically defined in a common relation ontology or within each ontology that uses them.
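
The difference between transitive and non-transitive relations can be sketched as a small closure computation. The triples are the slide's examples; the `TRANSITIVE` set and the function name are assumptions for illustration, and a real reasoner does far more than this.

```python
# Relations declared transitive (as stated on the slide).
TRANSITIVE = {"is_a", "part_of"}

facts = {
    ("cat", "is_a", "mammal"),
    ("mammal", "is_a", "vertebrate"),
    ("eye", "part_of", "head"),
    ("head", "part_of", "organism"),
    ("head", "has_quality", "round"),
}


def transitive_closure(triples):
    """Chain A r B and B r C into A r C, but only for transitive relations."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(inferred):
            if r1 not in TRANSITIVE:
                continue
            for c, r2, d in list(inferred):
                if r2 == r1 and c == b and (a, r1, d) not in inferred:
                    inferred.add((a, r1, d))
                    changed = True
    return inferred


closure = transitive_closure(facts)
print(("cat", "is_a", "vertebrate") in closure)        # True
print(("eye", "part_of", "organism") in closure)       # True
print(("organism", "has_quality", "round") in closure) # False
```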

16 Relationships and definitions A relationship from one class to another is a formalized part of its definition (an object property restriction in OWL) A subtype relation (is_a in OBO, SubClassOf in OWL) specifies necessary conditions for membership of a class. For example, finger part_of hand (all finger part_of some hand) states that a necessary condition of being in the class finger is to be part of some hand. So… if a finger exists, it is part of some hand. But…this does not mean that if a hand exists, it has as a part a finger.

17 There are many useful ways to classify parts of organisms:  its parts and their arrangement  its relation to other structures what is it: part of; connected to; adjacent to, overlapping?  its shape  its function  its developmental origins  its species or clade  its evolutionary history Cajal 1915, “Accept the view that nothing in nature is useless, even from the human point of view.”

18 Not all classification is useful Be practical: Build ontologies for what you need and for what can be reused About thirty years ago there was much talk that geologists ought only to observe and not theorise; and I well remember some one saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours. C. Darwin

19 neuron anatomical structure lumen anatomical space epidural space anatomical space anatomical structure ✗ Some classes are declared to never share any instances in common OWL DisjointWith OBO: disjoint_from NO!
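
A disjointness check can be sketched as follows. This reads the slide's diagram as "epidural space" being placed under both "anatomical space" and "anatomical structure", which is an interpretation of the figure; the data layout and function names are hypothetical.

```python
# Asserted is_a parents (reading of the slide's diagram, see note above).
is_a = {
    "neuron": {"anatomical structure"},
    "lumen": {"anatomical space"},
    "epidural space": {"anatomical space", "anatomical structure"},
}

# Classes declared to share no instances (DisjointWith / disjoint_from).
disjoint_pairs = [("anatomical structure", "anatomical space")]


def superclasses(term):
    """Follow is_a upward and return every superclass of term."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes():
    """Classes that fall under two disjoint classes can have no instances."""
    clashes = []
    for term in sorted(is_a):
        supers = superclasses(term)
        for left, right in disjoint_pairs:
            if left in supers and right in supers:
                clashes.append(term)
    return clashes


print(unsatisfiable_classes())  # ['epidural space']
```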

20 About reasoners. A reasoner is a piece of software able to infer logical consequences from a set of asserted facts or axioms. Reasoners are used to check the logical consistency of ontologies and to extend ontologies with "inferred" facts or axioms. For example, a reasoner would infer: Major premise: All mortals die. Minor premise: Some men are mortals. Conclusion: Some men die. Different reasoners can perform slightly differently. There are a number of reasoners to be aware of: ELK, HermiT, Pellet, etc.

21 Different kinds of definitions  An ontology is a collection of axioms An axiom is simply a sentence or a statement  Axioms can be  non-logical (aka “annotations” or “text definitions”) E.G. GO_0000262 has synonym ‘mtDNA’ opaque to reasoners  logical well-defined semantics understood by reasoners Example: SubClassOf axioms Arguments can be classes or class expressions (for example, the class of things that are parts of the midbrain)

22 Classifying anatomy appendage antenna fore wing fore wing hind wing

23 Relationships record classifications too substantia nigra part_of some ‘midbrain’ trochlear nucleus ‘substantia nigra’ SubClassOf part_of some ‘midbrain’

24 The knowledge in an ontology can make the reasons for classification explicit Any sense organ that functions in the detection of smell is an olfactory sense organ sense organ capable_of some detection of smell olfactory sense organ

25 nose sense organ nose capable_of some detection of smell Classifying sense organ capable_of some detection of smell olfactory sense organ nose => These are necessary and sufficient conditions, also called an equivalent class axiom
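
Classification by necessary and sufficient conditions can be approximated with set inclusion: a class falls under a defined class when it meets all of that class's conditions. This is a simplified sketch, not how a Description Logic reasoner works internally; the "eye" entry and "detection of light" are hypothetical additions used only as a contrast.

```python
# Asserted conditions per class, as (relation, filler) pairs.
asserted = {
    "nose": {("is_a", "sense organ"), ("capable_of", "detection of smell")},
    # Hypothetical contrast case, not on the slide.
    "eye": {("is_a", "sense organ"), ("capable_of", "detection of light")},
}

# Equivalent class axiom: the conditions are necessary AND sufficient.
equivalent_definitions = {
    "olfactory sense organ": {
        ("is_a", "sense organ"),
        ("capable_of", "detection of smell"),
    },
}


def classify():
    """Infer, for each class, the defined classes whose conditions it meets."""
    inferred = {}
    for name, conditions in asserted.items():
        inferred[name] = sorted(
            defined
            for defined, required in equivalent_definitions.items()
            if required <= conditions
        )
    return inferred


print(classify())
# {'nose': ['olfactory sense organ'], 'eye': []}
```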

26 Compositionality and avoiding asserted multiple inheritance We can logically define composed classes and create complex definitions from simpler ones  aka: building blocks, cross-products, logical definitions Descriptions can be composed at any time  Ontology construction time (pre-composition)  Annotation time (post-composition) Formal necessary and sufficient definitions + a reasoner  Automatic (and therefore manageable) classification  Requires subtype classification, so apart from the root term(s), no term should lack a superclass parent. Let the reasoner do the work!

27 Post-composed versus pre-composed anatomical entities are logically equivalent. Named class: Plasma membrane of neuron. Has same semantic meaning as: plasma membrane and part_of some neuron. Diagram labels: Genus, Differentia; Gene Ontology, Relation Ontology, Cell Ontology.

28 chemical entities Many perspectives, many ontologies – that overlap in content gross anatomy gross anatomy tissues cells cell anatomy cell anatomy proteins phenotypes clinical disorders processes physiological processes development reactions cellular processes behavior evolutionary characters evolutionary characters nervous system neural crest

29 Domain ontologies are organized according to upper ontologies (Basic Formal Ontology) that specify the general types of things that exist. RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT. GRANULARITY: ORGAN AND ORGANISM: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO), Organ Function (FMP, CPRO), Phenotypic Quality (PATO), Biological Process (GO). CELL AND CELLULAR COMPONENT: Cell (CL), Cellular Component (FMA, GO), Cellular Function (GO). MOLECULE: Molecule (ChEBI, SO, RnaO, PrO), Molecular Function (GO), Molecular Process (GO). => Classification according to these higher level types helps ensure the True Path Rule holds

30 The Common Anatomy Reference Ontology CARO is a structural classification based on granularity From the bottom up: Cell component Cell Tissue Multi-tissue structure From the top down: Organism subdivision Anatomical system Acellular structures CARO is an upper reference ontology that can be used to structure new anatomy ontologies

31 Example of complexity arising from multiple species-contexts erythrocyte cell nucleate cell enucleate cell not applicable in all contexts

32 Example of complexity arising from multiple species-contexts erythrocyte nucleate erythrocyte enucleate erythrocyte cell nucleate cell enucleate cell zebrafish nucleate erythrocyte human erythrocyte ZFA:0009256 … … CL:0000562 CL:0000232 CL:0000592 FMA:81100 species ontologies attached at appropriate level

33 Developmental Biology, Scott Gilbert, 6th ed. Using reasoners to detect errors Fruit fly FBbt ‘tibia’ Human FMA ‘tibia’ UBERON: tibia UBERON: bone is_a Vertebrata Drosophila melanogaster part_of Homo sapiens is_a only_in_taxon part_of disjoint_with ✗

34 Resource Description Framework (RDF)  Knowledge and data represented in simple statements called “triples”  Engineers shouldn’t be allowed to name things

35 RDF: Ontologies support semantic classification of data. RDF is the data standard for the semantic web. Extremely simple and flexible data model (not a file format). A W3C recommendation: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/

36 RDF Relate “ things ” with simple statements: triples Everything has a global, unique identifier: URI Objects/values can be “ things ” or literal values (text, number, date)

37 Linked Data Principles Use URIs as names for things. Use HTTP URIs so we can look up the names. Return useful information using standards when someone looks up a name. Link to other URIs so we can discover more things (“follow your nose”). http://www.w3.org/DesignIssues/LinkedData.html

38 RDF data integration Easy to combine separate datasets – Shared identifiers – Unordered collection of triples (facts)

39 RDF data integration. Dataset 1, Dataset 2: example:john example:hasPet example:fido; example:john rdfs:label “John”; example:fido rdfs:label “Fido”; example:john rdf:type example:Person; example:fido rdf:type example:Dog; example:fido example:age “6”

40 RDF data integration. Combined dataset: example:john example:hasPet example:fido; example:john rdfs:label “John”; example:fido rdfs:label “Fido”; example:john rdf:type example:Person; example:fido rdf:type example:Dog; example:fido example:age “6”
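
Because an RDF dataset is an unordered collection of triples with shared identifiers, combining two datasets amounts to a set union. In the sketch below, triples are plain Python tuples; how the six statements are split between Dataset 1 and Dataset 2 is an assumption (the transcript does not preserve it), and `pets_with_age` is a hypothetical helper.

```python
# Assumed split of the slide's six triples across two sources.
dataset_1 = {
    ("example:john", "example:hasPet", "example:fido"),
    ("example:john", "rdfs:label", "John"),
    ("example:john", "rdf:type", "example:Person"),
}
dataset_2 = {
    ("example:fido", "rdfs:label", "Fido"),
    ("example:fido", "rdf:type", "example:Dog"),
    ("example:fido", "example:age", "6"),
}

# Integration is a set union: shared identifiers line up automatically.
combined = dataset_1 | dataset_2


def pets_with_age(triples):
    """Find (owner, pet, age) by joining hasPet and age on the pet's URI."""
    results = []
    for owner, predicate, pet in triples:
        if predicate != "example:hasPet":
            continue
        for subject, prop, value in triples:
            if subject == pet and prop == "example:age":
                results.append((owner, pet, value))
    return sorted(results)


print(pets_with_age(dataset_1))  # [] (the age lives in the other dataset)
print(pets_with_age(combined))   # [('example:john', 'example:fido', '6')]
```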

41 An Introduction to OWL

42 What is OWL? Web Ontology Language (OWL) is a language for writing ontologies for the Web. OWL 2 is the current version of OWL and a W3C recommendation as of October 2009. The previous version of OWL (OWL 1) was a W3C recommendation in 2004. An overview of the language can be found at: http://www.w3.org/TR/owl2-overview/

43 OWL  The things or objects about which knowledge is represented (e.g. Melissa, Jim, Sweden) are called individuals (aka instances)  Group of things (e.g. person, conference) are called classes  Relations between things (e.g. siblings) are called properties

44 What does OWL add to RDF? More meaningful semantics – OWL allows one to specify characteristics of properties and classes: “Melissa isMarriedTo John” implies “John isMarriedTo Melissa” (symmetric property). Complex class definitions composed of properties and other classes (“expressions”) – muscle and (attached_to some bone and (part_of some head)). Enables reasoning.
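
The symmetric-property example can be sketched as a single inference rule over triples. The `SYMMETRIC` set stands in for an OWL declaration that isMarriedTo is symmetric; the names are illustrative only.

```python
# Properties declared symmetric (stand-in for an OWL property characteristic).
SYMMETRIC = {"isMarriedTo"}

asserted = {("Melissa", "isMarriedTo", "John")}


def apply_symmetry(triples):
    """Add the reversed triple for every statement using a symmetric property."""
    reversed_triples = {(o, p, s) for s, p, o in triples if p in SYMMETRIC}
    return triples | reversed_triples


inferred = apply_symmetry(asserted)
print(("John", "isMarriedTo", "Melissa") in inferred)  # True
```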

45 OWL - added semantics  Infer new RDF triples (facts) based on asserted triples and OWL axioms  OWL 2 Full - direct extension of RDF  OWL 2 DL - practical reasoning based on Description Logic – strict separation of classes and instances – entails some constraints on RDF

46 Class versus individual assertions. In OWL, the things you can say about relationships between classes are basically limited to: EquivalentTo, subClassOf, disjointWith. All other relationships (aka properties) are between OWL individuals: Built in properties: sameAs, differentFrom. Relationship between individual and class: Type. Object properties: user-added content in the ontology.

47 OWL primer http://www.w3.org/TR/owl2-primer/ This is an EXCELLENT resource

48 Object Properties. Object properties link one individual to another individual. Diagram: Melissa teaches INCF Ontology Lecture. The above representation means that the individuals Melissa and INCF Ontology Lecture are related through the teaches property.

49 Object Properties Properties may have a domain and a range specified. For instance, the property teaches can have domain Person and range Lecture teaches rdf:type owl:ObjectProperty teaches rdfs:domain foaf:Person teaches rdfs:range Lecture Melissa INCF Ontology Lecture Person teaches

50 Complex Class definition (Equivalent Class). In OWL, we can say that an instructor is equivalent to a person that teaches some lecture. Diagram: Person teaches some Lecture, Instructor. Class: Instructor EquivalentTo: Person and (teaches some Lecture)

51 Reasoning Example. Assuming we have defined the object property teaches with domain foaf:person and range lecture. We assert that Melissa teaches INCF_ontology_lecture. From the domain and range we infer: Melissa rdf:type person; INCF_ontology_lecture rdf:type lecture. And because of the equivalent class, we also infer that: Melissa rdf:type instructor. Diagram: Melissa, INCF Ontology Lecture, Person, teaches, Instructor.
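
The chain of inferences on this slide can be sketched by hand-coding the three axioms involved (domain, range, and the Instructor equivalent class). This is a toy illustration under those assumptions, not a general OWL reasoner; the dictionary and function names are made up.

```python
# Domain and range of the teaches property, as described on the slide.
domain_of = {"teaches": "Person"}
range_of = {"teaches": "Lecture"}

asserted = {("Melissa", "teaches", "INCF_ontology_lecture")}


def infer_types(triples):
    """Return (individual, class) pairs implied by the assertions."""
    types = set()
    # Domain and range axioms type the subject and object of each statement.
    for subject, prop, obj in triples:
        if prop in domain_of:
            types.add((subject, domain_of[prop]))
        if prop in range_of:
            types.add((obj, range_of[prop]))
    # Instructor EquivalentTo: Person and (teaches some Lecture)
    for subject, prop, obj in triples:
        if (
            prop == "teaches"
            and (subject, "Person") in types
            and (obj, "Lecture") in types
        ):
            types.add((subject, "Instructor"))
    return types


for individual, cls in sorted(infer_types(asserted)):
    print(individual, "rdf:type", cls)
```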

52 Reasoning Example. Assuming we have defined the object property teaches with domain foaf:person and range lecture. We assert that Melissa teaches INCF_ontology_lecture; INCF_ontology_lecture rdf:type book; book owl:disjointWith lecture. => ??

53 Reasoning Example. Assuming we have defined the object property teaches with domain foaf:person and range lecture. We assert that Melissa teaches INCF_ontology_lecture; INCF_ontology_lecture rdf:type book; book owl:disjointWith lecture. => Error!!
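
Why this is an error: the range of teaches forces INCF_ontology_lecture to be a lecture, while it is also asserted to be a book, and the two classes are disjoint. A minimal sketch of that check, with hypothetical names and only the range axiom implemented:

```python
range_of = {"teaches": "lecture"}
disjoint_classes = {frozenset({"book", "lecture"})}

asserted_facts = {("Melissa", "teaches", "INCF_ontology_lecture")}
asserted_types = {("INCF_ontology_lecture", "book")}


def find_clashes(facts, types):
    """Return (individual, class, class) triples that violate disjointness."""
    all_types = set(types)
    # The range axiom adds an inferred type for the object of each statement.
    for _subject, prop, obj in facts:
        if prop in range_of:
            all_types.add((obj, range_of[prop]))
    clashes = []
    for individual, cls in sorted(all_types):
        for other_individual, other_cls in sorted(all_types):
            if (
                individual == other_individual
                and cls < other_cls
                and frozenset({cls, other_cls}) in disjoint_classes
            ):
                clashes.append((individual, cls, other_cls))
    return clashes


print(find_clashes(asserted_facts, asserted_types))
# [('INCF_ontology_lecture', 'book', 'lecture')]
```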

54 Satisfiability. Unsatisfiable class: impossible to have instances. Incoherent ontology: contains unsatisfiable classes – still usable. Inconsistent ontology: contains impossible instances (such as instances of unsatisfiable classes) – not really usable: “anything follows from a contradiction” – http://ontogenesis.knowledgeblog.org/1329

55 Open World vs. Closed World Open world assumption: everything we don’t know is undefined Closed world assumption: everything we don’t know is false Example: Statement: Vertebrate has_part spinal cord Query: “Do fruit flies have spinal cords?” Closed world response= no Open world response= unknown
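
The two assumptions differ only in how a missing fact is treated. A minimal sketch, assuming a knowledge base of plain triples and hypothetical function names:

```python
# The only statement we have (from the slide).
known_facts = {("Vertebrate", "has_part", "spinal cord")}


def closed_world_answer(query):
    """Closed world: anything not stated is taken to be false."""
    return "yes" if query in known_facts else "no"


def open_world_answer(query):
    """Open world: anything not stated is simply unknown."""
    return "yes" if query in known_facts else "unknown"


query = ("fruit fly", "has_part", "spinal cord")
print(closed_world_answer(query))  # no
print(open_world_answer(query))    # unknown
```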

56 OWL Syntaxes Many flavors – Functional OWL Syntax – RDF based Syntaxes RDF/XML Turtle OWL/XML Manchester Syntax Still they all can be represented as a graph or a set of triples!

57 So what can we do with all this ontology “stuff”?

58 Querying across anatomical granularity CJ Mungall, C Torniai, GV Gkoutos, SE Lewis, MA Haendel. 2012. Uberon, an integrative multi-species anatomy ontology. Genome biology 13 (1), R5 Uberon.org

59

60 Top hit mouse Top hit fish

61 Exomiser: Using comparative phenotype analyses to improve variant prioritization

62 Ontology considerations  An ontology is a classification and there are many useful ways to classify biology  Everybody makes mistakes Let the computer find errors, manage knowledge for you  Re-use other people’s work where possible Import class hierarchies, use common patterns  Cautionary note – formal languages have limitations. Don’t expect or want to express everything!  RDF is a model for semantic, linked data  OWL is an ontology language for RDF and the web  Ontologies are a tool for data exchange and integration via a variety of formats (tomorrow’s discussion)

63 A C B D E Vertebrata Ascidians Arthropoda Annelida Mollusca Echinodermata tetrapod limbs ampullae tube feet parapodia Querying for genes in similar structures across species Panganiban et al., PNAS, 1997 Distal-less orthologs participate in distal-proximal pattern formation and appendage morphogenesis mouse limb sea urchin tube feet ascidian ampulla polychaete parapodia

