Weaving and untangling the GO is_a completeness ~9 slides granularity & BP ~3 slides Linking MF to BP ~15 slides Sensu ~13 slides –linguistic qualifiers.

Presentation transcript:

Weaving and untangling the GO is_a completeness ~9 slides granularity & BP ~3 slides Linking MF to BP ~15 slides Sensu ~13 slides –linguistic qualifiers vs relations Linking GO to other ontologies ~40 slides –GO+Cell

Tangled DAGs and complexity paths increasing GO process in general has multiple axes of classification –qualifier -ve +ve –anatomy structural spatial –chemical structural functional

is_a completeness

GO and is_a completeness Why? What’s wrong with every term having at least one is_a or part_of parent? –this is the way we’ve always done things

Ontologies should be complete No errors of omission is_a completeness is the ontologically correct thing to do –every entity type is a subtype of some other thing Accurate ontologies = accurate queries –currently a query for “find all kinds of development” does not return “ovarian follicle development” this is wrong

missing is_as hinder common tool use We should play nicely with the others in the playground Most (non-GOC) tools expect is_a completeness –GO looks funny when viewed in other tools the standard is to show only is_a relations in default tree view –missing is_as break reasoners

Filling is_a gaps brings practical benefits Easier for tools to find inconsistencies in GO We can start to untangle displays

Example: current displays mix relations it’s a mess

untangling is_a and part_of difficult if is_a hierarchy is incomplete –is_a orphans show up at root node in pure is_a display not everything must have an asserted part_of parent –can infer from is_a parents

The new complete cellular component Current CC: –277 is_a orphans / 1688 terms –avg is-a-paths-to-root 1.4 –avg mixed-paths-to-root 6.97 Jane’s fixed CC: –0 is_a orphans –avg is-a-paths-to-root 3.36 –avg mixed-paths-to-root 38.6
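The orphan and paths-to-root figures above can be reproduced in miniature. The following is a minimal Python sketch, assuming a toy five-term fragment whose terms and links are illustrative only and are not taken from the real cellular component ontology; the function names are hypothetical.

```python
# Toy fragment: term -> asserted parents (illustrative, not real GO content).
IS_A = {
    "cellular_component": [],
    "cell": ["cellular_component"],
    "organelle": ["cellular_component"],
    "nucleus": ["organelle"],
    "nucleolus": [],  # is_a orphan: it only has a part_of parent
}
PART_OF = {"nucleus": ["cell"], "nucleolus": ["nucleus"]}
ROOT = "cellular_component"


def is_a_orphans(is_a, root):
    """Terms other than the root that have no asserted is_a parent."""
    return sorted(t for t, parents in is_a.items() if t != root and not parents)


def mixed_parents(is_a, part_of):
    """Merge is_a and part_of links into a single parent map."""
    terms = set(is_a) | set(part_of)
    return {t: is_a.get(t, []) + part_of.get(t, []) for t in terms}


def count_paths_to_root(term, parents_of, root):
    """Number of distinct parent paths leading from term up to root."""
    if term == root:
        return 1
    return sum(count_paths_to_root(p, parents_of, root)
               for p in parents_of.get(term, []))


MIXED = mixed_parents(IS_A, PART_OF)
```

In this toy graph the orphan has zero pure is_a paths to the root even though it is reachable through part_of, which is why it would surface at the root of a pure is_a display.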

Granularity and the organisation of GO:BP

Fixing the upper levels of BP The upper portion of any ontology is very important for organisation Design decisions percolate down Many users exploring GO top-down see this first Diamonds are particularly bad in the upper level –significantly increases tangledness

biological process cellular process physiological process organismal physiological process cellular physiological process others

biological process: A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products
cellular process: Processes that are carried out at the cellular level, but are not necessarily restricted to a single cell. For example, cell communication occurs among more than one cell, but occurs at the cellular level
physiological process: Those processes specifically pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms
organismal physiological process: The processes pertinent to the function of an organism above the cellular level; includes the integrated processes of tissues and organs
cellular physiological process: The processes pertinent to the integrated function of a cell

Consider… (long term view) Making top division by granularity of the process itself –biological process molecular level process? cellular level process (multi-cellular) level process These types are disjoint But what about physiological process? –this is not disjoint from the granularity of the process itself

Relations between GO ontologies

Outline We focus on MF & BP biological example from David the types and relations in reality –maintaining the ALL-SOME definition of relations how should this be implemented in the GO? –what links should be manifested –retain some level of redundancy, or eliminate it?

GO: Histidine catabolism GO: Histidine ammonia lyase activity GO: Urocanate hydratase activity GO: imidazolopropionase activity GO: Glutamate- Formimidoyl transferase GO: Formimidoyl- Glutamase activity GO: N-formylglutamate deformylase activity GO: Formimidoylglutamate deiminase activity GO: Histidine catabolism to glutamate and formate GO: Histidine catabolism to glutamate and formamide GO:???????? Histidine catabolism to glutamate and formiminotetrahydrofolate Overbeek, et al. The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes. NAR 2005, 33-17:

Ontological Representation I will try and be clear when I am talking about –types in reality –types we wish to manifest as terms in the GO (or in other ontologies) all GO terms should be types not all types need to have terms created - we limit for practical reasons

What are the relations in reality? Between types in the same ontology, different levels of granularity –part_of Between functions and processes (at the same level of granularity) –functioning_of Between component and function –has_function Between process and component –located_in

What are the instances and relations in reality? some molecular function instance some molecular functionING instance some multistep process instance functioning of part_of some gene product instance has function process

What are the types and type-level relations in reality? some type of molecular function some type of molecular functionING some type of multistep process functioning of part (direction?) some type of gene product has function process

types example histidine ammonia lyase function histidine ammonia lyase reaction histidine catabolism functioning of part? issues: -- ALL-SOME structure function process coarse fine

What are the types and relations in reality? Formimidoylglutamate deiminase function Formimidoylglutamate deiminase reaction histidine catabolism to glutamate and formate functioning of issues: -- ALL-SOME structure function process has part? coarse fine

We want to capture these real relationships between biological types Between granular levels Between orthogonal ontologies But first we must be clear on the definitions of these types, and which types should be manifested as GO terms

Can we just manifest this in the GO? some type of molecular function some type of molecular functionING some type of multistep process functioning of has part(?) issues: -- not all function terms have a corresponding functionING term -- even if they do, redundancy is generally to be avoided coarse fine function process

We already have some redundancy function & process redundancy iron transport (BP) iron transporter (MF) function & component redundancy voltage-gated ion channel function voltage-gated ion channel complex If we retain this redundancy, these relations can be trivially added But we don’t always have this redundancy –not all functions have a corresponding functioning term

Manifest shortcut relationships some type of molecular function some type of molecular functionING some type of process functioning of has part(?) coarse fine function process one relation standing for two

most functionings are implicit histidine ammonia lyase function histidine ammonia lyase REACTION histidine catabolism functioning of has part(?) coarse fine function process current paradigm

When do we manifest functions and processes? Need consistent stable policy Nothing in function ontology should have activity suffix –even though to a biochemist activity==potential, this is still confusing Beyond this, do we retain current policy –some redundancy Or take a more extreme approach –eliminate redundancy –eliminate current ‘activity’ MF terms and manifest corresponding reaction terms in BP (Amelia)

‘purist process’ approach histidine ammonia lyase reaction histidine ammonia lyase function histidine catabolism functioning of function process some type of gene product has function part

When is it safe to eliminate redundancy? Does functioning always imply function? –iron transport does not imply iron transporter –but we could still extend annotation to allow for specification of functioning-as-function Reactions and other ‘single-step’ processes involving no helper –function and corresponding functioning imply one another Redundancy between function and component should be retained Any obsoletion obviously causes disruption

Difficult functionings Structural constituents functioning happens at lower level of granularity than is covered by GO these will not be linked to process - for now

Implementation Still need to curate the actual links –trivial links can be computed automatically Can proceed independently of resolving ontological issues –most likely retain current policy re: manifesting terms –need to maintain 3 kinds of links granular (part, same ontology) functioning_of (function and functioning) ‘diagonal’ –ALL-SOME definition

Sensu

Sensu - outline Original use –A linguistic qualifier –denote differing community usage of a terminological entity (a term) Perverted use –A type qualifier –Used for when the part_of structure is specific to an organism type The fix –provide separate mechanisms for each

Terms vs kinds The term ‘term’ is confusing –Term (sensu GO) –Term (sensu normal usage) strings, tokens GO is not a terminology A GO ID identifies a type of entity –a kind of entity –a universal (as opposed to instance) –more specific than a class –but not a concept

Sensu - original usage Sometimes the same string refers to different types –nucleus (sensu particle physicist) –nucleus (sensu astrophysicist) –nucleus (sensu biologist) Canonical GO example: –bud no longer relevant, terms obsoleted –trichome

Linguistic qualifiers are about language, not biological reality No ontological requirement for linguistically related terms to be ontologically related –current GO docs are not correct trichome, sensu plant community –should not state that there is some biological relation between an instance of a trichome and the plant community

The original usage has been conflated Organism type specificity is a genuine challenge for the GO –‘contextual’ part_ofs –e.g. X part_of Y in species Z Sensu has been wrongly recruited to fix this –standard pattern: X, sensu Z part_of Y X, sensu Z is_a Z Two problems –conflation of meaning of sensu –conflation results in lack of precision “as in, but not restricted to taxon” not rigorous enough

Two problems, two solutions Retain sensu as a linguistic qualifier only –re-interpret as: sensu S community –no requirement for taxon IDs –no ontology structure requirements Introduce a new relation for genuine organism-type specific terms –in_organism –standard inference rules can be used e.g. –X in_organism X’, Y in_organism Y’, X is_a Y X’ is_a Y’

Contextual synonyms
[Term]
name: trichome (sensu insecta)
synonym: EXACT "hair" []
synonym: EXACT "trichome" [] {context=insecta}
def: "A polarized cellular extension that covers much of the insect epidermis"

[Term]
name: trichome (sensu plant)
synonym: EXACT "trichome" [] {context=plant}
def: "An outgrowth from the epidermis. Trichomes vary in size and complexity and include hairs, scales, and other structures and may be glandular. In Arabidopsis, patterning of trichome development is not random but does not appear to be lineage-based like stomata"

Advantages Lexical qualifiers are dealt with using lexical oboedit tags No need to be as specific as a taxon –only as specific as is needed to decontextualise No false reasoning is done over synonyms –cellular component types and cell types should not be siblings Big user-friendliness win? –Displays customised for particular users may choose to display contextual exact synonyms in place of the wordier sensu name
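The display customisation mentioned in the last point could work roughly as follows. This is a minimal Python sketch under stated assumptions: it reads the synonym lines in the layout used on the contextual synonyms slide (scope before the quoted text, written here with straight quotes), which differs from the released OBO 1.2 layout, and it treats the `{context=...}` modifier as the slide's proposal. `parse_term` and `display_name` are hypothetical helper names, not oboedit functions.

```python
import re

# Matches the slide's layout: synonym: SCOPE "text" [] {context=community}
SYNONYM_RE = re.compile(
    r'^synonym:\s+(?P<scope>\w+)\s+"(?P<text>[^"]*)"\s+\[\]'
    r'(?:\s+\{context=(?P<context>\w+)\})?\s*$'
)


def parse_term(stanza):
    """Collect the name and synonyms of a single [Term] stanza."""
    term = {"name": None, "synonyms": []}
    for line in stanza.strip().splitlines():
        line = line.strip()
        if line.startswith("name:"):
            term["name"] = line[len("name:"):].strip()
            continue
        match = SYNONYM_RE.match(line)
        if match:
            term["synonyms"].append(match.groupdict())
    return term


def display_name(term, community):
    """Prefer an EXACT synonym tagged for this community, else the full name."""
    for syn in term["synonyms"]:
        if syn["scope"] == "EXACT" and syn["context"] == community:
            return syn["text"]
    return term["name"]


INSECT_TRICHOME = parse_term('''
[Term]
name: trichome (sensu insecta)
synonym: EXACT "hair" []
synonym: EXACT "trichome" [] {context=insecta}
''')
```

A viewer configured for the insect community would show "trichome"; any other viewer falls back to the wordier sensu name, and no reasoning is done over the synonym itself.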

in_organism Standard ALL-SOME, type-level definition: –P in_organism O if and only if, for all instances p of P, there exists some organism o of type O, and some time t, such that p in_organism o at time t More specific relation than located_in in OBO relations ontology Standard logical rules can be applied

photosystem I photosystem I, in cyanobacteria is_a cyanobacteria in organism thylakoid thylakoid, in cyanobacteria is_a in organism part of

Open question Sometimes the relation between two types is largely lexical –eg trichome Sometimes it isn’t so clear Can we have both a relation to a taxon and contextual synonyms? Is ‘eye’ an exact contextual synonym for ‘compound eye’ for the arthropod community?

Practical considerations Use NCBI Taxonomy as our organism ontology xref or relationship tags? –xrefs are more lightweight –relationship tags are more accurate –relationship tags would be ‘dangling’ unless organism ontology is loaded See next section…

Composite terms in GO - finally…

Composite terms - outline The problems inherent in composite terms and diamonds - brief review Actively managing composite terms in GO –big change: parseable logical definitions Implementation plan Progress so far: logical definitions referring to cell types Pre vs post composition –composite terms in ontologies and annotations

biosynthesis is_a metabolism

cysteine is_a serine family amino acid is_a amino acid is_a amine

cysteine is_a serine family amino acid is_a amino acid is_a serine

Composed terms currently cause problems –No link to external ontology term –Redundancy –Inconsistency –Extra work –Annotation bottleneck –Tangled DAGs and confusing displays we have no way to disentangle Solution so far: –fix errors based on results of term name parsing (Obol) reactive, not proactive

Solution: actively manage composed terms Composed terms should now/soon be generated using oboedit plugin –building block terms are recorded in ontology along with composite term Correct DAG structure can be inferred from external ontologies –placement & consistency checking automated –additional work can be automated synonyms, text definitions

How will composite terms be recorded by oboedit? How do we record a definition for a composite term? –using a logical definition (computational essence) A logical definition consists of: –a generic term (aka genus) –relationships to other terms which serve to discriminate this specific term from other is_a children of the generic term (aka differentiae) Can be written in natural language as: –A [generic term] which [discriminating characteristics]

Example of composite term record cysteine biosynthesis
–generic term: biosynthesis
–discriminating characteristics: outputs cysteine
–a biosynthesis process which outputs cysteine
id: GO: ! cysteine biosynthesis
intersection_of: GO: ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine

Now we have the ability to untangle Process axis view (primary is_as, via generic term): –biological_process metabolism –biosynthesis »cysteine biosynthesis Process participant axis view: –amine amino acid –serine family amino acid »cysteine Combined view –(same as current tangled diamond lattice)

Recording the relationship is important Why not just a simple cross-product? –e.g. biosynthesis x cysteine Relationships are important for reasoning and querying –Consider: cysteine biosynthesis from serine mRNA export from nucleus during heat stress Without the relations, the logical definition is not specific enough –the essence is not captured

Multiple discriminating characteristics are allowed Cysteine biosynthesis from serine
–Generic term: biosynthesis
–Discriminating characteristics: output cysteine, input serine
intersection_of: GO:
intersection_of: outputs CHEBI:15356
intersection_of: input CHEBI:17822

Composite terms can be nested
regulation of cysteine biosynthesis
intersection_of: GO: ! regulation of biological process
intersection_of: regulates GO: ! cysteine biosynthesis

id: GO: ! cysteine biosynthesis
intersection_of: GO:
intersection_of: outputs CHEBI:15356

Composite terms can optionally be manufactured in bulk Generic term: {metabolism,biosynthesis} Differentia: has_output {serine, cysteine, …} With caution… –Sparse vs dense matrices –not all combinations are types
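Bulk manufacture with the "not all combinations are types" caution can be sketched as a filtered cross-product. This is a minimal Python illustration: the `APPROVED` set stands in for a curator's judgement and its contents are assumed for the example, and `manufacture` is a hypothetical name, not an oboedit plugin function.

```python
from itertools import product

GENERA = ["metabolism", "biosynthesis"]
OUTPUTS = ["serine", "cysteine"]

# Hypothetical curator-approved combinations: the matrix is sparse, so
# only pairs that correspond to real biological types are listed.
APPROVED = {
    ("metabolism", "cysteine"),
    ("biosynthesis", "cysteine"),
    ("biosynthesis", "serine"),
}


def manufacture(genera, fillers, approved, relation="has_output"):
    """Build composite term records for approved genus/filler pairs only."""
    terms = []
    for genus, filler in product(genera, fillers):
        if (genus, filler) not in approved:
            continue  # skip combinations that are not real types
        terms.append({
            "name": f"{filler} {genus}",
            "genus": genus,
            "differentia": (relation, filler),
        })
    return terms


COMPOSITES = manufacture(GENERA, OUTPUTS, APPROVED)
```

Keeping the approval step explicit is the design point: the full cross-product is cheap to generate, but only the filtered subset should ever become terms.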

On the importance of necessary and sufficient conditions Why intersection_of? Why not just make normal links in the GO DAG? –normal relationships are for necessary conditions only –we want both necessary and sufficient conditions captures the essence of the term

Normal DAG links only capture necessary conditions, not essence immune cell activation inflammatory response part_of A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor text def: macrophage activation

Normal DAG links only capture necessary conditions, not essence macrophage activation immune cell activation is_a inflammatory response part_of macrophage activates

essence captured by genus-differentia macrophage activation immune cell activation is_a inflammatory response part_of
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage

essence captured by genus-differentia macrophage activation immune cell activation is_a inflammatory response part_of
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
text def: A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor

essence captured by genus-differentia macrophage activation immune cell activation is_a inflammatory response part_of cell activation macrophage (genus) activates

The power of reason with genus-differentia definitions that are computationally parseable, we can do a lot more consistency checking
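One kind of check that parseable definitions make possible is inferring is_a links instead of asserting them. The sketch below is a simplified Python subsumption test, not the oboedit reasoner: the toy is_a links in the cell ontology and the logical definition given to immune cell activation are assumptions made for the example, and only asserted is_a links are followed when comparing genera and fillers.

```python
# Asserted is_a links (toy fragment; the CL hierarchy here is assumed).
IS_A = {
    "CL:macrophage": {"CL:immune_cell"},
    "CL:immune_cell": {"CL:cell"},
}

# term -> (genus, set of (relation, filler) differentiae)
LDEFS = {
    "GO:macrophage_activation":
        ("GO:cell_activation", {("activates", "CL:macrophage")}),
    "GO:immune_cell_activation":  # assumed definition, for illustration
        ("GO:cell_activation", {("activates", "CL:immune_cell")}),
}


def ancestors(term, is_a):
    """Reflexive, transitive closure of the asserted is_a links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(child, parent, ldefs, is_a):
    """True if child's definition satisfies every condition of parent's."""
    child_genus, child_diffs = ldefs[child]
    parent_genus, parent_diffs = ldefs[parent]
    if parent_genus not in ancestors(child_genus, is_a):
        return False
    return all(
        any(rel == p_rel and p_filler in ancestors(filler, is_a)
            for rel, filler in child_diffs)
        for p_rel, p_filler in parent_diffs
    )


def infer_is_a(ldefs, is_a):
    """All (child, parent) pairs among defined terms that can be inferred."""
    return sorted((c, p) for c in ldefs for p in ldefs
                  if c != p and subsumed_by(c, p, ldefs, is_a))


INFERRED = infer_is_a(LDEFS, IS_A)
```

Comparing the inferred pairs against the asserted DAG gives both kinds of report mentioned earlier: inferred links that are missing are errors of omission, and asserted links that contradict the definitions are inconsistencies.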

Pre- vs post- composition It makes sense to pre-compose terms and maintain them as part of GO Annotations can post-compose terms if they choose to do so –MGI, DictyBase are doing this already results remain local to MOD –AmiGO-NG will allow querying of these The two approaches are complementary and compatible –proviso: if done properly

SO already contains composite terms A silenced gene is a gene which has the quality of being silenced

Plan: outline We want all new composite terms to be created using appropriate oboedit plugin –logical definitions automatically recorded –term management automated Changes: –editors must now be ‘OBO-aware’ –annotators and end-users can remain unaware of changes if they choose to do so but using the logical defs can bring benefits But first we need to find logical definitions for all the existing composite terms

Where we were at, 2005 Lots of terms to be retrofitted –Where to start? Previous strategy: –Obol guesses logical def for each term –Obol uses logical def to reason errors of omission inconsistencies –Batch reports to curators

go.obo oboedit obol report cell.obo cjm GO editor OBO editor obol config name parser go+ ldefs reasoner go ‘fixed’ obol

go.obo oboedit obol report cell.obo cjm GO editor OBO editor obol config name parser Ego.obo reasoner go ‘fixed’ Obol produces genus-differentia logical definitions

Limitations of this approach Good as proof-of-principle But.. –only the end results are evaluated –Obol makes the identical mistakes in guessing logical definitions each iteration –we want to evaluate and preserve the logical definitions that are generated by Obol

What we’ve been doing since then Focused on OBO Cell ontology Used Obol to infer logical defs Manually curate logical defs Feed back results to improve Obol Iterate and refine Use oboedit reasoner to check consistency between GO & CellO Next: incorporate into curation process

go.obo oboedit obol cell.obo cjm GO editor OBO editor obol config name parser ego-cell.obo

Results so far Test set of 337 logical definitions curated –only a fraction of the composite terms in GO Relations not finalised Composite terms involving CellO present some interesting challenges …but first, here’s a demo

Open issues: what relations do we use? We are concerned for now with relations between processes and cells –neuroblast activation & neuroblast –T cell differentiation & T cell –T cell homeostasis & T cell –cell homeostasis & homeostasis –sperm incapacitation & sperm –sperm motility & sperm

OBO Relations ontology OBO Relations ontology has –has_participant sub-relations: –has_agent (active participant) –has_patient (inactive participant) »(not in obo-rel yet) –between a process and a continuant –follows standard ALL-SOME structure

has_participant P has_participant C if and only if: given any process p that instantiates P there is some continuant c, and some time t, such that: c instantiates C at t and c participates in p at t has_participant is a primitive instance-level relation between a process, a continuant, and a time at which the continuant participates in some way in the process. The relation obtains, for example, when this particular process of oxygen exchange across this particular alveolar membrane has_participant this particular sample of hemoglobin at this particular time
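The ALL-SOME reading of this definition can be checked mechanically against instance data. The following is a minimal Python sketch; the instance identifiers, the data layout, and the function signature are all assumptions for illustration, with the type-level function written out directly from the wording above.

```python
from typing import NamedTuple


class Participation(NamedTuple):
    process: str     # process instance id
    continuant: str  # continuant instance id
    time: int        # time at which the participation holds


def has_participant(P, C, process_types, instantiates_at, participations):
    """Type-level check: every instance p of P needs some c and t such that
    c instantiates C at t and c participates in p at t."""
    instances = [p for p, ptype in process_types.items() if ptype == P]
    return all(
        any(
            part.process == p
            and C in instantiates_at.get((part.continuant, part.time), set())
            for part in participations
        )
        for p in instances
    )


# Hypothetical instance data.
PROCESS_TYPES = {"p1": "T cell differentiation", "p2": "T cell differentiation"}
INSTANTIATES_AT = {
    ("c1", 1): {"T cell", "cell"},
    ("c2", 2): {"B cell", "cell"},
}
PARTICIPATIONS = [Participation("p1", "c1", 1), Participation("p2", "c2", 2)]
```

With this data the relation to "cell" holds but the relation to "T cell" does not, because one process instance has no T cell participant: a single counterexample instance is enough to falsify a type-level ALL-SOME statement. Note that the check is vacuously true when P has no instances.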

Is this the appropriate relation? neuroblast activation has_participant neuroblast T cell differentiation has_participant T cell T cell homeostasis has_participant T cell cell homeostasis has_participant homeostasis sperm incapacitation has_participant sperm sperm motility has_participant sperm these are all correct… …but are they too general?

more specific kinds of participation has_agent (has_active_participant) –As for has_participant, but with the additional condition that the component instance is causally active in the relevant process has_patient (has_inactive_participant) –Yes, this is a daft name –The component instance is acted upon (not yet in OBO REL)

Cell differentiation T cell differentiation –A cell differentiation instance in which a cell acquires_features_of T cell problem: –not a simple relation between the process (T cell differentiation) and the cell (T cell) 3-place relation: process, instance, type

Cell differentiation, attempt 2 T cell differentiation has_output T cell –Compare to: cysteine biosynthesis has_output cysteine We should distinguish between participation relations in which the continuant relations are –transformation_of –derives_from e.g. something made (biosynthesis) vs something transformed (differentiation)

Cell differentiation, attempt 3 T cell differentiation has_transformed_output_participant T cell –…not exactly catchy…

has_primary_participant T cell differentiation has_primary_participant T cell –aka has_theme ontologically a good relation? Meaning partly resides in the process term Can be migrated to other relations later

To decompose or not to decompose We could have a logical definition for sperm incapacitation –genus: incapacitation –differentia: has_participant sperm Requires creating a new term –incapacitation Not used in any other logical def Logical def does not capture full essence –this term is a little more complex involves at least three continuants Instead just use a relationship to capture necessary conditions only

‘Anonymous’ terms border follicle cell delamination –The splitting off of border cells from the anterior epithelium genus: delamination –no such term we can create as ‘anonymous’ term –exists only in order to make logical definitions..or we can just create a normal term

Implementation We have 337 logical definitions (nearly) ready When can we merge them into the GO?

adding logical defs to the GO Will this cause disruption to users? gene_ontology.obo file exactly the same as before, but will have –fewer inconsistencies! –new intersection_of tags specified in obo v1.2 can easily be ignored by parsers oboedit users must either: –load cell.obo, relationship.obo at same time as go.obo –OR select “allow dangling terms” may still confuse some users –‘anonymous’ terms
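The claim that parsers can easily ignore the new tags can be illustrated with a deliberately small stanza reader. This is a Python sketch, not a full OBO parser: it handles only `[Term]` stanzas and `tag: value` lines, strips `!` comments naively, and `parse_stanzas` is a hypothetical name. The stanza uses the illustrative identifiers from the macrophage activation slides.

```python
def parse_stanzas(text, ignore_tags=("intersection_of",)):
    """Read [Term] stanzas into dicts of tag -> list of values,
    skipping any tag listed in ignore_tags."""
    terms = []
    current = None
    for raw in text.splitlines():
        # Naive comment stripping: assumes "!" never occurs inside a value.
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line == "[Term]":
            current = {}
            terms.append(current)
            continue
        if current is None or ": " not in line:
            continue  # header lines and anything unrecognised
        tag, value = line.split(": ", 1)
        if tag in ignore_tags:
            continue
        current.setdefault(tag, []).append(value.strip())
    return terms


SAMPLE = """
[Term]
id: GO:macrophage_activation
name: macrophage activation
is_a: GO:immune_cell_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
"""

LEGACY_VIEW = parse_stanzas(SAMPLE)
FULL_VIEW = parse_stanzas(SAMPLE, ignore_tags=())
```

A consumer that skips the tag sees exactly the term it saw before; one that keeps it gains the logical definition without any change to the rest of the record.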

cvs gene_ontology _edit.obo oboedit cell.obo GO editor CellO editor cvs rel.obo gene_ontology.obo filter normal downstream stuff (website, amigo, users) unaffected power users & advanced applications

Applications may want to take advantage of enhanced GO enhanced GO isn’t just to help curation queries possible with ego: –find genes associated with blood cells annotations to microglial cell activation –differentiation of any microglial precursor annotations to monocyte differentiation

Post-composition This approach is highly compatible with post-composition We should extend the annotation format to allow denoting more specific classes –e.g. cholesterol transport in liver –advanced applications can query this –standard applications suffer no loss –extended annotations can be used to help seed new terms in the ontology This is already being done (MGI, Dicty) –we just want to capture this in an interoperable way

Post-composition in gene association files New column in file format
Gene Product   Term ID                            …   Slots
AABC1          GO: (cholesterol transport)        …   OBOREL:located_in[MA:liver]
AABC2          GO: (neuron fate development)      …   OBOREL:has_primary_participant[FBbt:Y_neuron]
AABC3          GO:000003

Important note on post-composition This is not an either-or situation We will retain pre-composed terms –terms will continue to be created for real biological types Annotation post-composition can be used to further refine existing pre-composed terms –if the post-composed term is later created in the GO, the annotation can be automatically migrated Tools can ignore post-composition for small loss in specificity –defaults to the current paradigm
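How a tool might read the proposed slots column, or ignore it, can be sketched briefly. This Python example assumes the `relation[filler]` slot syntax shown in the gene association example; the term identifier `GO:example` is a placeholder, and `parse_slot` and `effective_annotation` are hypothetical names.

```python
import re

# relation[filler], e.g. OBOREL:located_in[MA:liver]
SLOT_RE = re.compile(r"^(?P<relation>[\w:]+)\[(?P<filler>[^\]]+)\]$")


def parse_slot(slot):
    """Split one slot into its relation and filler identifiers."""
    match = SLOT_RE.match(slot.strip())
    if match is None:
        raise ValueError(f"unrecognised slot: {slot!r}")
    return match["relation"], match["filler"]


def effective_annotation(term_id, slots, understands_slots):
    """Slot-aware tools get the refined class; other tools fall back to
    the pre-composed term alone, losing only some specificity."""
    if not understands_slots:
        return term_id, []
    return term_id, [parse_slot(s) for s in slots]


ADVANCED = effective_annotation(
    "GO:example", ["OBOREL:located_in[MA:liver]"], understands_slots=True)
STANDARD = effective_annotation(
    "GO:example", ["OBOREL:located_in[MA:liver]"], understands_slots=False)
```

Because the pre-composed term ID is unchanged in both cases, the extra column refines the annotation without breaking tools that never look at it.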

Avoiding diamonds Surely larval locomotory behavior involves a diamond? yes, but we can disentangle the two axes of classification

Solution
Curator asserts:
id: GO:larval_locomotory_behavior
intersection_of: GO:locomotory_behavior
intersection_of: occurs_in FBbt:larval_stage
Oboedit infers diamond:
id: GO:larval_locomotory_behavior
intersection_of: GO:locomotory_behavior
intersection_of: occurs_in FBbt:larval_stage
is_a: GO:locomotory_behavior ! genus
is_a: GO:larval_behavior ! inferred

Next Steps Tidy up cell logical definitions integrate them into curation process Look at composite terms within GO –larval locomotory behaviour –regulation Chemicals Anatomical entities