GO terms implicitly refer to other terms: cysteine biosynthesis, myoblast fusion, hydrogen ion transporter activity, snoRNA catabolism, wing disc pattern formation.


2 GO terms implicitly refer to other terms
–cysteine biosynthesis
–myoblast fusion
–hydrogen ion transporter activity
–snoRNA catabolism
–wing disc pattern formation
–epidermal cell differentiation
–regulation of flower development
–interleukin-18 receptor complex
–B-cell differentiation
–dorsal ectoderm


4 biosynthesis is_a metabolism

5 cysteine is_a serine family amino acid is_a amino acid is_a amine

6 cysteine is_a serine family amino acid is_a amino acid is_a serine
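The is_a chains on slides 5 and 6 are transitive: a term inherits every ancestor of its parents. As a minimal sketch (the dict layout is hypothetical; the edges are the ones on slide 5), the ancestor set can be computed by walking direct is_a links breadth-first:

```python
from collections import deque

# Direct is_a parents, taken from the chain on slide 5 (illustrative only).
IS_A = {
    "cysteine": ["serine family amino acid"],
    "serine family amino acid": ["amino acid"],
    "amino acid": ["amine"],
    "amine": [],
}

def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen

print(sorted(ancestors("cysteine")))
# ['amine', 'amino acid', 'serine family amino acid']
```

Because every query runs over this closure, a single wrong direct link, such as the "is_a serine" that slide 6 ends on, silently propagates to everything below it.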

7 Composed terms currently cause problems
–No link to external ontology terms
–Redundancy
–Inconsistency
–Extra work
–Annotation bottleneck
–Tangled DAGs and confusing displays that we have no way to disentangle
Solution so far:
–fix errors based on the results of term name parsing (Obol)
–this is reactive, not proactive

8 Solution: actively manage composed terms
Explicit pre-coordination
–Composed terms should now (or soon) be coordinated using an oboedit plugin
–the building block terms are recorded in the ontology along with the composite term
Benefits:
–Correct DAG structure can be inferred from external ontologies, e.g. making sure GO and CHEBI “align”
–placement and consistency checking are automated
–additional work can be automated, e.g. synonyms and text definitions

9 How will terms be pre-coordinated by oboedit?
How do we record a definition for a composite term?
–using a logical definition (its computational essence)
A logical definition consists of:
–a generic term (aka genus)
–relationships to other terms which serve to discriminate this specific term from the other is_a children of the generic term (aka differentiae)
This can be written in natural language as:
–a <genus> which <differentiae>

10 Example of pre-coordination: cysteine biosynthesis
Generic term:
–biosynthesis
Discriminating characteristics:
–outputs cysteine
Natural language (Aristotelian style):
–a biosynthesis process which outputs cysteine
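The genus-plus-differentiae structure on slides 9 and 10 maps directly onto a small record, and the Aristotelian sentence can be generated from it. This is a minimal sketch; the class name `LogicalDefinition` and the naive "and"-joined rendering are assumptions for illustration, not part of oboedit:

```python
from dataclasses import dataclass, field

@dataclass
class LogicalDefinition:
    genus: str                                        # the generic term
    differentiae: list = field(default_factory=list)  # (relation, filler) pairs

    def to_text(self) -> str:
        """Render as 'a <genus> which <rel> <filler> [and ...]'."""
        clauses = " and ".join(f"{rel} {filler}" for rel, filler in self.differentiae)
        return f"a {self.genus} which {clauses}"

cysteine_biosynthesis = LogicalDefinition("biosynthesis process", [("outputs", "cysteine")])
print(cysteine_biosynthesis.to_text())
# a biosynthesis process which outputs cysteine
```

The rendering is deliberately naive; producing fluent text for several differentiae would need per-relation phrasing.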

11 Example in Obo format
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
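In the slide 11 stanza, a single-ID `intersection_of` line names the genus and a two-token line gives a relation plus filler. A minimal sketch that recovers the logical definition from that stanza; it is a simplified reader for one stanza, not a full OBO 1.2 parser (no quoting, escapes, or trailing qualifiers), and the function names are hypothetical:

```python
# Simplified reader for a single OBO [Term] stanza (not a full OBO 1.2 parser).
SLIDE_11 = """
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
"""

def parse_stanza(text):
    """Map each tag to its list of values, dropping '!' comments."""
    tags = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(":")
        value = value.split("!", 1)[0].strip()
        tags.setdefault(tag.strip(), []).append(value)
    return tags

def logical_definition(tags):
    """Return (genus, [(relation, filler), ...]) from the intersection_of tags."""
    genus, differentiae = None, []
    for value in tags.get("intersection_of", []):
        parts = value.split()
        if len(parts) == 1:
            genus = parts[0]                           # bare ID: the generic term
        else:
            differentiae.append((parts[0], parts[1]))  # relation + filler ID
    return genus, differentiae

stanza = parse_stanza(SLIDE_11)
print(logical_definition(stanza))
# ('GO:0009058', [('outputs', 'CHEBI:15356')])
print(stanza["is_a"])
# ['GO:0009070', 'GO:0006534']
```

Partitioning on the first colon keeps prefixed IDs such as `GO:0019344` intact in the value.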

12 Alternate syntax
–used in pheno-syntax
–more compact
–similar to OWL abstract syntax
I use Obo 1.2 format or natural language in the rest of this presentation
GO:cysteine_biosynthesis == GO:biosynthesis ⊓ outputs(CHEBI:cysteine)

13 This allows us to dynamically untangle
Process axis view (primary is_a links, via the generic term):
–biological_process
  –metabolism
    –biosynthesis
      »cysteine biosynthesis
Process participant axis view:
–amine
  –amino acid
    –serine family amino acid
      »cysteine
Combined view:
–(same as the current tangled diamond lattice)
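Slides 8 and 13 say that, once composite terms carry logical definitions, their placement can be inferred rather than hand-curated. A minimal sketch of that inference by structural subsumption (not a full OWL reasoner): B is placed above A when A's genus is_a B's genus and every differentia of B is matched by a differentia of A whose relation and filler are equal or more specific. The miniature hierarchies, the `TERMS` table, and the assumption that `outputs` implies `has_participant` are illustrative and not taken from GO, CHEBI, or RO:

```python
# Miniature hierarchies (child -> direct parents); illustrative only.
IS_A = {
    "biosynthesis": {"metabolism"},
    "cysteine": {"serine family amino acid"},
    "serine family amino acid": {"amino acid"},
}
# Assumption: anything a process outputs is also a participant in it.
SUBRELATION = {"outputs": {"has_participant"}}

# Composite terms: name -> (genus, set of (relation, filler) differentiae).
TERMS = {
    "biosynthesis": ("biosynthesis", set()),
    "cysteine biosynthesis": ("biosynthesis", {("outputs", "cysteine")}),
    "serine family amino acid biosynthesis": (
        "biosynthesis", {("outputs", "serine family amino acid")}),
    "cysteine metabolism": ("metabolism", {("has_participant", "cysteine")}),
}

def closure(x, parents):
    """x together with everything reachable through `parents`."""
    result, stack = {x}, [x]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def subsumes(general, specific):
    """Structural test: is every instance of `specific` an instance of `general`?"""
    g_genus, g_diffs = general
    s_genus, s_diffs = specific
    if g_genus not in closure(s_genus, IS_A):
        return False
    # Each differentia of the general term must be matched by an equal or
    # more specific differentia (relation and filler) of the specific term.
    return all(
        any(g_rel in closure(s_rel, SUBRELATION) and g_fill in closure(s_fill, IS_A)
            for s_rel, s_fill in s_diffs)
        for g_rel, g_fill in g_diffs
    )

def inferred_ancestors(name):
    """All other composite terms that subsume `name` (not reduced to direct parents)."""
    return sorted(other for other in TERMS
                  if other != name and subsumes(TERMS[other], TERMS[name]))

print(inferred_ancestors("cysteine biosynthesis"))
# ['biosynthesis', 'cysteine metabolism', 'serine family amino acid biosynthesis']
```

Following only genus links reproduces the process-axis view, following only filler links reproduces the participant-axis view, and together they give back the two asserted is_a parents from slide 11 without anyone curating them.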

14 Obol demo http://yuri.lbl.gov/amigo/obol

15 Recording the relationship is important
Why not just a simple cross-product?
–e.g. biosynthesis x cysteine
Relationships are important for reasoning and querying
–Consider: cysteine biosynthesis from serine; mRNA export from nucleus during heat stress
Without the relations, the logical definition is not specific enough
–the essence is not captured
Relations should come from RO
–more are required

16 Multiple discriminating characteristics are allowed
Cysteine biosynthesis from serine
–Generic term: biosynthesis
–Discriminating characteristics: outputs cysteine; input serine
[Term]
name: cysteine biosynthesis from serine
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
intersection_of: input CHEBI:17822 ! serine
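Slides 15 and 16 argue that the relations must be recorded, not only the combined terms. The sketch below shows why. It uses slide 16's definition and a hypothetical reverse pathway ("serine biosynthesis from cysteine"), which serves only to demonstrate the collision. Once the relation names are dropped, the two definitions reduce to the same bare cross-product:

```python
# Two definitions sharing the genus and the same two fillers (second one hypothetical).
cys_from_ser = ("biosynthesis", frozenset({("outputs", "cysteine"), ("input", "serine")}))
ser_from_cys = ("biosynthesis", frozenset({("outputs", "serine"), ("input", "cysteine")}))

def drop_relations(defn):
    """Reduce a definition to a bare cross-product: genus x set of fillers."""
    genus, diffs = defn
    return genus, frozenset(filler for _, filler in diffs)

print(cys_from_ser == ser_from_cys)                                  # False
print(drop_relations(cys_from_ser) == drop_relations(ser_from_cys))  # True
```

Storing differentiae as a set of (relation, filler) pairs also makes their order irrelevant, which matches the intersection semantics of `intersection_of`.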

17 Composite terms can be nested
[Term]
id: GO:xxxxxxx
name: regulation of cysteine biosynthesis
intersection_of: GO:0050789 ! regulation of biological process
intersection_of: regulates GO:0019344 ! cysteine biosynthesis

[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine

regulation^regulates(biosynthesis^outputs(cysteine))   YES
regulation^regulates(biosynthesis)^outputs(cysteine)   NO
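The YES/NO contrast on slide 17 is about where the `outputs` differentia attaches. A minimal sketch that represents a composite term as a nested expression (a filler may itself be an expression) and prints it in the slide's `genus^relation(filler)` notation; the `Expr` class is a hypothetical illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    genus: str
    differentiae: tuple = ()  # (relation, filler) pairs; a filler is a str or an Expr

    def render(self) -> str:
        parts = [self.genus]
        for rel, filler in self.differentiae:
            inner = filler.render() if isinstance(filler, Expr) else filler
            parts.append(f"{rel}({inner})")
        return "^".join(parts)

cysteine_biosynthesis = Expr("biosynthesis", (("outputs", "cysteine"),))
nested = Expr("regulation", (("regulates", cysteine_biosynthesis),))
flat = Expr("regulation", (("regulates", "biosynthesis"), ("outputs", "cysteine")))

print(nested.render())  # regulation^regulates(biosynthesis^outputs(cysteine))
print(flat.render())    # regulation^regulates(biosynthesis)^outputs(cysteine)
```

In the flat form, `outputs(cysteine)` sits on the regulation process itself, which is why slide 17 rejects it.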

18 Composite terms can optionally be manufactured in bulk
Generic term: {metabolism, biosynthesis}
Differentia: has_output {serine, cysteine, …}
With caution…
–Sparse vs dense matrices
–not all combinations are types
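Bulk manufacture as on slide 18 amounts to a cross-product of genera and fillers, and the "with caution" point corresponds to a filter. A minimal sketch that emits candidate OBO stanzas only for combinations a curator has approved. The whitelist is hypothetical. Term names stand in for IDs, which would still have to be allocated:

```python
from itertools import product

GENERA = ["metabolism", "biosynthesis"]
FILLERS = ["serine", "cysteine"]

def manufacture(genera, fillers, relation="has_output", keep=lambda genus, filler: True):
    """Yield (name, genus, relation, filler) for each combination accepted by `keep`."""
    for genus, filler in product(genera, fillers):
        if keep(genus, filler):
            yield f"{filler} {genus}", genus, relation, filler

def to_obo(name, genus, relation, filler):
    """Render one candidate as an OBO stanza; IDs still have to be allocated."""
    return "\n".join([
        "[Term]",
        f"name: {name}",
        f"intersection_of: {genus}",
        f"intersection_of: {relation} {filler}",
    ])

# Curators decide which cells of the (possibly sparse) matrix are real types.
approved = {("biosynthesis", "cysteine"), ("metabolism", "cysteine")}
for candidate in manufacture(GENERA, FILLERS, keep=lambda g, f: (g, f) in approved):
    print(to_obo(*candidate), end="\n\n")
```

The default `keep` accepts every cell. That is the dense case the slide warns about.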

19 On the importance of necessary and sufficient conditions
Why intersection_of? Why not just make normal links in the GO DAG?
–normal relationships are for necessary conditions only
–we want both necessary and sufficient conditions
–these capture the essence of the term

20 Normal DAG links only capture necessary conditions, not essence
macrophage activation
–is_a immune cell activation
–part_of inflammatory response
–text def: A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor

21 Indistinguishable by DAG
monocyte activation
–is_a immune cell activation
–part_of inflammatory response
–text def: A change in morphology and behavior of a monocyte resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor

22 Essence captured by genus-differentia
macrophage activation
–is_a immune cell activation
–part_of inflammatory response
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage

23 Essence captured by genus-differentia
macrophage activation
–is_a immune cell activation
–part_of inflammatory response
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
In the diagram: macrophage activation is_a cell activation (the genus) and activates CL:macrophage
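Slides 20 to 23 contrast necessary-only links with a necessary-and-sufficient definition. In the sketch below, the DAG links of the two terms are identical, but the genus-differentia definitions differ, so only the definitions can classify a description. The monocyte definition is a hypothetical one built on the same pattern, `CL:monocyte` is a placeholder ID in the slides' style, and the genus is matched exactly rather than through an is_a closure:

```python
# Necessary-only links (slides 20-21): identical for both terms.
DAG_LINKS = {
    "macrophage activation": {("is_a", "immune cell activation"), ("part_of", "inflammatory response")},
    "monocyte activation": {("is_a", "immune cell activation"), ("part_of", "inflammatory response")},
}

# Genus-differentia definitions; the monocyte one is hypothetical.
DEFINITIONS = {
    "macrophage activation": ("cell activation", frozenset({("activates", "CL:macrophage")})),
    "monocyte activation": ("cell activation", frozenset({("activates", "CL:monocyte")})),
}

def classify(genus, facts):
    """Terms whose definition is satisfied: same genus, all differentiae present in facts."""
    return sorted(name for name, (g, diffs) in DEFINITIONS.items()
                  if g == genus and diffs <= facts)

print(DAG_LINKS["macrophage activation"] == DAG_LINKS["monocyte activation"])  # True
print(classify("cell activation", {("activates", "CL:macrophage")}))
# ['macrophage activation']
```

Because the definition is sufficient as well as necessary, any cell activation that activates a macrophage is recognized as macrophage activation. The DAG links alone never support that inference.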

24 Current status of pre-coordinated terms
SO already contains composite terms
–46 pre-coordinated terms
–e.g. a silenced gene is a gene which has the quality of being silenced
GO-BP/CL integration is underway
–retrospectively pre-coordinated terms
The Obol page has pre-coordinated terms from automatic parsing
–http://www.fruitfly.org/~cjm/obol

25 Pre- vs post-coordination
Pre-coordination
–terms are in the ontology with IDs and computable definitions
–increases the complexity of the ontology
–complexity can be managed by tools, e.g. new oboedit features
Post-coordination
–terms are combined in the database
–forces more complexity into the database schema and database applications

26 Pre-coordination is useful in moderation
Commonly used terms should be pre-coordinated
–e.g. cysteine biosynthesis; oocyte differentiation; pectoral fin
Avoid taking it to extremes
–cf. ICD-9
Where do we draw the line?
–ontologies should be built around one or a few axes of classification
–term ‘explosion’ typically gets large when multiple axes are combined
–we can change our minds later: pre- and post-coordination are commensurable

27 Commensurability
An annotator annotates to
–nucleus^part_of(astrocyte)
An anatomy editor creates a new term
–using the oboedit cross-product plugin
–astrocyte_nucleus = nucleus^part_of(astrocyte)
The annotation can be dynamically ‘promoted’ to the new term in answer to queries
–various software techniques exist for achieving this
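One simple technique for the ‘promotion’ on slide 27 is to index the pre-coordinated terms by their logical definitions and look up each post-coordinated annotation. This minimal sketch matches definitions exactly. A reasoner could also catch equivalent definitions written differently. The gene names are hypothetical:

```python
# Post-coordinated annotations: gene -> (genus, differentiae). Gene names are hypothetical.
annotations = {
    "geneA": ("nucleus", frozenset({("part_of", "astrocyte")})),
    "geneB": ("nucleus", frozenset()),
}
# Pre-coordinated term added later by the anatomy editor.
terms = {"astrocyte_nucleus": ("nucleus", frozenset({("part_of", "astrocyte")}))}

def promote(annotations, terms):
    """Replace each annotation expression that exactly matches a term definition by that term."""
    by_definition = {definition: name for name, definition in terms.items()}
    promoted = {}
    for gene, (genus, diffs) in annotations.items():
        if not diffs:
            promoted[gene] = genus  # already a plain term
        else:
            promoted[gene] = by_definition.get((genus, diffs), (genus, diffs))
    return promoted

print(promote(annotations, terms))
# {'geneA': 'astrocyte_nucleus', 'geneB': 'nucleus'}
```

Annotations whose expression has no matching term stay post-coordinated, so nothing is lost when the ontology has not caught up yet.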

28 Post-coordination in GO annotations
Pre- and post-coordination are compatible and commensurable
We should extend the annotation format to allow denoting more specific classes
–e.g. cholesterol transport in liver
–advanced applications can query this
–standard applications suffer no loss
–extended annotations can be used to help seed new terms in the ontology
This is already being done (MGI, Dicty)
–we just want to capture this in an interoperable way

29 Post-composition in gene association files
New column in the GA file format
Gene Product | Term ID | … | Properties
AABC1 | GO:0030301 (cholesterol transport) | … | located_in(MA:liver)
AABC2 | GO:0048663 (neuron fate development) | … | has_participant(FBbt:Y_neuron)
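A consumer of the new Properties column on slide 29 needs to split each `relation(filler)` entry back into its parts. This is a minimal sketch. It assumes multiple properties in one cell are comma-separated, although the slide shows only one per row. It also uses a simplified three-column row with the intermediate columns left out:

```python
import re

# relation(filler), e.g. located_in(MA:liver)
PROPERTY = re.compile(r"([A-Za-z_]+)\(([^()]+)\)")

def parse_properties(cell):
    """Split a Properties cell into (relation, filler) pairs.

    Assumes multiple properties are comma-separated; the slide shows only one per row.
    """
    pairs = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue
        match = PROPERTY.fullmatch(item)
        if match is None:
            raise ValueError(f"unparseable property: {item!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs

row = ["AABC1", "GO:0030301", "located_in(MA:liver)"]  # gene product, term ID, properties
print(row[1], parse_properties(row[2]))
# GO:0030301 [('located_in', 'MA:liver')]
```

Rejecting unparseable entries outright, rather than skipping them, keeps a malformed extension from silently turning into a less specific annotation.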

30 Database issues
Chado and GO DB can handle pre- and post-coordination
–in theory, anyway; not yet fully tested
How does it work?
–an ‘anonymous term’ is created for the coordinated term
–documentation is in chado CVS: chado/modules/cv/doc/

