Published by Noah Francis. Modified over 9 years ago.
1
Building and Using Ontologies
Dr. Robert Stevens
Department of Computer Science, University of Manchester
Robert.stevens@cs.man.ac.uk
2
Introduction
Knowledge & metadata
The nature of bioinformatics resources
A shared understanding
Terminologies and ontologies
Building an ontology
Using an ontology
3
What is Knowledge?
Knowledge: all information and an understanding to carry out tasks and to infer new information
Information: data equipped with meaning
Data: un-interpreted signals that reach our senses
Example (from the slide figure):
Data: Michael Ashburner; Professor; University of Cambridge; UK; ISMB
Information (labels): Name; Job; Institution; Country; Conf
Knowledge (inferences): man; academic, senior; ancient university, 5 rated; European; important figure in biology
4
What is Metadata?
Metadata is data about data (information about information)
A schema is part of a database's metadata, as are the administrator's name, the creator, the date of creation and the documentation
The label on an Eppendorf tube in a freezer is metadata
A database entry's annotation is metadata on the sequence data
5
Syntax & Semantics
Infix: 2 + 3 = 5
Prefix: = + 2 3 5
Postfix: 2 3 + 5 =
Binary: 010 + 011 = 101
Roman: II + III = V
7 + 3 = 42
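The point of this slide can be sketched in code: different syntaxes for "two plus three" all denote the same value, while a syntactically valid sentence such as 7 + 3 = 42 can still be semantically false. The evaluator functions below are illustrative helpers, not part of the original presentation.

```python
def eval_prefix(tokens):
    """Evaluate a prefix expression such as ['+', '2', '3'] (consumes tokens)."""
    token = tokens.pop(0)
    if token == "+":
        left = eval_prefix(tokens)
        right = eval_prefix(tokens)
        return left + right
    return int(token)


def eval_postfix(tokens):
    """Evaluate a postfix expression such as ['2', '3', '+'] with a stack."""
    stack = []
    for token in tokens:
        if token == "+":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            stack.append(int(token))
    return stack.pop()


ROMAN = {"I": 1, "V": 5, "X": 10}


def roman_to_int(numeral):
    """Convert a small Roman numeral (I, V, X only) to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # A smaller symbol before a larger one is subtracted (e.g. IV).
        if i + 1 < len(numeral) and ROMAN[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


# Five syntaxes, one meaning.
results = {
    "infix": 2 + 3,
    "prefix": eval_prefix("+ 2 3".split()),
    "postfix": eval_postfix("2 3 +".split()),
    "binary": int("010", 2) + int("011", 2),
    "roman": roman_to_int("II") + roman_to_int("III"),
}

# Well-formed syntax does not guarantee a true statement.
last_line_is_true = (7 + 3 == 42)
```

Every entry in `results` is the same integer, which is what "shared semantics" means here; `last_line_is_true` is `False` under ordinary arithmetic.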
6
Types of Semantics
–An operational semantics for a language is defined by what a sentence in that language will do.
–Denotational semantics is a precise mathematical definition of the objects and relations of a language, in which each sentence of the language names, or denotes, a mathematical object, such as a function.
–Natural semantics are the loose ordinary language sense, in which the semantics of a statement is its "meaning".
–The term logistic semantics refers to formal models that attempt to represent the natural semantics of some external domain.
7
Knowledge in Bioinformatics
8
A Shared Understanding
Synonyms and homonyms are rife
Need to know that terms in one resource mean the same in another resource
This makes comparisons much easier: we can ask questions over many resources
Structure enables discovery and query abstractions
Useful for both humans and computers
The Gene Ontology allows queries outside one model organism
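A minimal sketch of why a shared vocabulary helps with synonyms: two resources label the same thing differently, and mapping each label to one shared concept identifier lets a single query run over both. The resource contents, entry identifiers and concept identifiers below are all hypothetical.

```python
# Hypothetical synonym table: many labels, one shared concept identifier.
SYNONYMS = {
    "dna": "NucleicAcid:DNA",
    "deoxyribonucleic acid": "NucleicAcid:DNA",
    "rna": "NucleicAcid:RNA",
    "ribonucleic acid": "NucleicAcid:RNA",
}

# Two hypothetical resources that annotate entries with free-text terms.
RESOURCE_A = {"a1": "DNA", "a2": "RNA"}
RESOURCE_B = {"b1": "deoxyribonucleic acid", "b2": "Ribonucleic acid"}


def normalise(term):
    """Map a free-text term to its shared concept id, or None if unknown."""
    return SYNONYMS.get(term.strip().lower())


def query(concept_id, *resources):
    """Return the ids of all entries, in any resource, annotated with the concept."""
    hits = []
    for resource in resources:
        for entry_id, term in resource.items():
            if normalise(term) == concept_id:
                hits.append(entry_id)
    return sorted(hits)


dna_entries = query("NucleicAcid:DNA", RESOURCE_A, RESOURCE_B)
```

Without the shared identifiers, a search for "DNA" would miss the entry in the second resource; homonyms would need the reverse treatment (one label, several identifiers), which this sketch does not cover.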
9
London Bills of Mortality
10
Aggregated Stats
11
What is an Ontology?
A means of capturing knowledge in a computationally amenable form
A shared understanding for humans and computers
A set of vocabulary terms that represents a community’s understanding of a domain
A set of definitions for those terms
The relations between those terms
A formal semantics
A conceptual model whose labels provide a vocabulary
[Figure: example concept hierarchy with nodes Nucleic acid, DNA, RNA, tRNA, rRNA, Ribosome]
12
The art of ranking things in genera and species is of no small importance and very much assists our judgment as well as our memory. You know how much it matters in botany, not to mention animals and other substances, or again moral and notional entities as some call them. Order largely depends on it, and many good authors write in such a way that their whole account could be divided and subdivided according to a procedure related to genera and species. This helps one not merely to retain things, but also to find them. And those who have laid out all sorts of notions under certain headings or categories have done something very useful.
Gottfried Wilhelm Leibniz, New Essays on Human Understanding
13
Components of an Ontology: Concepts
Concepts: A unit of thought
–AKA: Class, Set, Type, Predicate
–Gene, Reaction, Macromolecule
Terms are labels of concepts
Taxonomy of concepts
–Generalization ordering among concepts
–Concept A is a parent of concept B iff every instance of B is also an instance of A
–Superset / subset
–“A kind of” vs. “a part of”
[Figure: example concept hierarchy with nodes Nucleic acid, DNA, RNA, tRNA, rRNA, Ribosome]
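The parent rule on this slide ("A is a parent of B iff every instance of B is also an instance of A") can be sketched with sets. The is-a and part-of edges below are an assumed reading of the slide's figure, and the instance names are made up; a finite sample of instances can only approximate "every instance".

```python
# Assumed reading of the figure: "a kind of" edges (child -> parent).
IS_A = {
    "DNA": "Nucleic acid",
    "RNA": "Nucleic acid",
    "tRNA": "RNA",
    "rRNA": "RNA",
}

# Assumed "a part of" edge; deliberately kept apart from IS_A.
PART_OF = {"rRNA": "Ribosome"}

# Hypothetical instances asserted directly under each concept.
INSTANCES = {
    "DNA": {"dna_1"},
    "tRNA": {"trna_1"},
    "rRNA": {"rrna_1"},
    "Ribosome": {"ribosome_1"},
}


def ancestors(concept):
    """Follow is-a links upwards and return every more general concept."""
    found = []
    while concept in IS_A:
        concept = IS_A[concept]
        found.append(concept)
    return found


def extension(concept):
    """All instances of a concept, including instances of its is-a descendants."""
    members = set(INSTANCES.get(concept, set()))
    for child, parent in IS_A.items():
        if parent == concept:
            members |= extension(child)
    return members


def is_parent_of(a, b):
    """Slide's rule: A is a parent of B iff every instance of B is an instance of A."""
    return extension(b) <= extension(a)
```

With these assumptions, `is_parent_of("Nucleic acid", "tRNA")` holds, but `is_parent_of("Ribosome", "rRNA")` does not: an rRNA is a part of a ribosome, not a kind of ribosome, so part-of links must not be used for subsumption.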
14
Components of an Ontology: Relations
Relations and Attributes
–AKA: Slots, properties, roles
–Product of Gene, Map-Position of Gene
–Reactants of Reaction, Keq of Reaction
Meta information about relations
–Cardinality, optionality, type restrictions on filler
–Transitive, symmetric, functional role properties
–Role hierarchies
Slot: Expresses
Range: Polypeptide or RNA
Domain: Genes
Cardinality: At-least-1
General Axioms (constraints)
–Nucleic acids < 20 residues are oligonucleotides
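As a rough illustration, the Expresses slot and the oligonucleotide axiom from this slide could be written down as checkable data. The `Slot` class, its field names and the violation messages are assumptions made for this sketch, not a standard ontology API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """A relation with meta information: domain, range and minimum cardinality."""
    name: str
    domain: str
    range_types: frozenset
    min_cardinality: int

    def violations(self, subject_type, filler_types):
        """Return human-readable problems for one use of the slot."""
        problems = []
        if subject_type != self.domain:
            problems.append(f"{self.name}: subject must be a {self.domain}")
        if len(filler_types) < self.min_cardinality:
            problems.append(
                f"{self.name}: needs at least {self.min_cardinality} filler(s)"
            )
        for filler in filler_types:
            if filler not in self.range_types:
                problems.append(f"{self.name}: {filler} is outside the range")
        return problems


# The slot from the slide: Genes express at least one Polypeptide or RNA.
EXPRESSES = Slot(
    name="Expresses",
    domain="Gene",
    range_types=frozenset({"Polypeptide", "RNA"}),
    min_cardinality=1,
)


def is_oligonucleotide(residue_count):
    """General axiom from the slide: nucleic acids under 20 residues."""
    return residue_count < 20
```

For example, `EXPRESSES.violations("Gene", ["Polypeptide"])` returns an empty list, while an empty filler list or a filler of type "Lipid" each produce one violation.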
15
Gene Ontology
http://www.geneontology.org
“a dynamic controlled vocabulary that can be applied to all eukaryotes”
Built by the community for the community
Three organising principles: Molecular function, Biological process, Cellular component
Is-a and Part-of taxonomy
~15,000 concepts
16
Components of an Ontology: Instances
Instances
–AKA: objects, individuals, set members
–trpA Gene, Reaction 1.1.2.4, Death-receptor-3
–Strictly speaking, an ontology with instances is a knowledge base
–The distinction between an instance and a concept is difficult
–Lard-binding-proteins are all those that bind Death-receptor-3
17
Components of an Ontology: Properties
Primitive: properties are necessary
–Globular protein must have hydrophobic core, but a protein with a hydrophobic core need not be a globular protein
Defined: properties are necessary + sufficient
–Eukaryotic cells must have a nucleus. Every cell that contains a nucleus must be Eukaryotic.
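The practical difference between primitive and defined concepts can be sketched as follows: a primitive concept only lets us check an asserted membership, whereas a defined concept lets us infer membership. The `Entity` class and the feature strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    asserted: set = field(default_factory=set)  # concepts stated by a curator
    features: set = field(default_factory=set)  # observed properties


def check_primitive_globular(entity):
    """Primitive concept: a hydrophobic core is necessary but not sufficient.

    We can reject a wrong assertion, but we can never infer membership.
    """
    if "GlobularProtein" in entity.asserted:
        return "hydrophobic core" in entity.features
    return True  # nothing asserted, so nothing is violated


def infer_defined_eukaryotic(entity):
    """Defined concept: being a cell with a nucleus is necessary and sufficient,
    so membership can be inferred from the properties alone."""
    return "Cell" in entity.asserted and "nucleus" in entity.features


yeast_cell = Entity("yeast cell", {"Cell"}, {"nucleus"})
some_protein = Entity("some protein", {"Protein"}, {"hydrophobic core"})
bad_record = Entity("bad record", {"GlobularProtein"}, set())
```

Here `yeast_cell` is inferred to be eukaryotic, `some_protein` passes the check but is not thereby classified as a globular protein, and `bad_record` fails because the necessary property is missing.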
18
An Ontology Building Life-cycle
[Figure: life-cycle diagram with the following labels]
Identify purpose and scope
Knowledge acquisition
Evaluation
Language and representation
Available development tools
Conceptualisation
Integrating existing ontologies
Encoding
Building
Ontology Learning
Consistency Checking
19
How to do it
Collect terms: MacroMolecule, Protein, Enzyme, Holoprotein, Holoenzyme
Arrange into a Polyhierarchy (by hand)
Write a definition for each term
Encode in some representation
Carry on
Test against scope, requirements and competency questions
20
How to do it
Enzyme:
–is-a MacroMolecule
–polymerOf AminoAcid
–catalyses Reaction
HoloEnzyme:
–is-a MacroMolecule
–polymerOf AminoAcid
–binds ProstheticGroup
–catalyses Reaction
HoloProtein:
–is-a MacroMolecule
–polymerOf AminoAcid
–binds ProstheticGroup
Protein:
–is-a MacroMolecule
–polymerOf AminoAcid
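Once the four definitions on this slide are encoded, the polyhierarchy no longer has to be arranged by hand. The sketch below assumes each definition is a complete (necessary and sufficient) set of restrictions, so that one concept subsumes another when its restrictions are a strict subset of the other's; a real description logic reasoner does considerably more than this.

```python
# The slide's definitions as sets of (relation, filler) restrictions.
BASE = {("is-a", "MacroMolecule"), ("polymerOf", "AminoAcid")}

DEFINITIONS = {
    "Protein": set(BASE),
    "Enzyme": BASE | {("catalyses", "Reaction")},
    "HoloProtein": BASE | {("binds", "ProstheticGroup")},
    "HoloEnzyme": BASE
    | {("binds", "ProstheticGroup"), ("catalyses", "Reaction")},
}


def subsumers(concept):
    """Concepts whose restrictions are a strict subset of this concept's."""
    own = DEFINITIONS[concept]
    return {name for name, restrictions in DEFINITIONS.items() if restrictions < own}


def direct_parents(concept):
    """Subsumers that are not themselves above another subsumer."""
    candidates = subsumers(concept)
    return sorted(
        parent
        for parent in candidates
        if not any(parent in subsumers(other) for other in candidates)
    )


hierarchy = {name: direct_parents(name) for name in DEFINITIONS}
```

Under that assumption, HoloEnzyme ends up with two direct parents, Enzyme and HoloProtein, which is exactly the polyhierarchy the previous slide asks to be built by hand.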
21
Tips for Building your Terminology
Choose a narrow, but useful area
Build using domain experts
Regard computer scientists as a service
You’ll never be complete or correct: Publish early
Be practical: Truth and beauty are a bonus
Be open
A large commitment and a never-ending process
Start simple and migrate to expressivity and “correctness” as you develop
OWL supports this migration path