Ontologies for Informatics. Infrastructure for Systems Biology. Oxford October 19 2004.

1 Ontologies for Informatics. Infrastructure for Systems Biology. Oxford October 19 2004

2 To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.

3 Manifesto of Liberation Bioinformatics
- Be open source
- Use open standards
- Make data & code available without constraint
- Involve your community

4 Gene Ontology - 1998
- FlyBase (Drosophila): Cambridge, EBI, Harvard, Berkeley & Bloomington
- SGD (Saccharomyces): Stanford
- MGI (Mus): Jackson Labs., Bar Harbor

5 Gene Ontology - 2004
- Fruitfly - FlyBase
- Budding yeast - Saccharomyces Genome Database (SGD)
- Mouse - Mouse Genome Database (MGD & GXD)
- Rat - Rat Genome Database (RGD)
- Weed - The Arabidopsis Information Resource (TAIR)
- Worm - WormBase
- Dictyostelium discoideum - dictyBase
- InterPro/UniProt at EBI - InterPro
- Fission yeast - Pombase
- Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
- Parasites - Plasmodium, Trypanosoma, Leishmania - GeneDB - Sanger
- Microbes - Vibrio, Shewanella, B. anthracis, … - TIGR
- Grasses - rice & maize - Gramene database
- Zebrafish - ZFIN
- Coming: Xenopus, Chlamydomonas, Tetrahymena, Gallus & more.

6 GO Three (Orthogonal) Ontologies
- Biological Process: goal or objective within cell, tissue..
- Molecular Function: elemental activity or task
- Cellular Component: location or complex

7 Content of GO
- molecular function: 7422 terms
- biological process: 8972 terms
- cellular component: 1472 terms
- all: 17,866 terms
- definitions: 16,600 (93%)

8 What data structure to use?
What is the least complex data structure that is sufficient?
- Key word list?
- Hierarchical tree?
- Directed acyclic graph?
- Other?

9 Directed Acyclic Graph
[Figure panels: tree; directed acyclic graph]

10 Classes of parent-child relationship
- ISA (hypernymy/hyponymy), as in: an elephant is a mammal
- PARTOF (meronymy/holonymy), as in: a trunk is part of an elephant
- REGULATES, as in: regulation of carbohydrate metabolism regulates carbohydrate metabolism

11 Structure of the Ontologies
Relationship symbols: ISA (%), PARTOF (<)
Flat-file view: Cellular component %membrane %vacuolar membrane %nuclear membrane %intracellular %cell <cytoplasm <vacuole <vacuolar membrane <vacuolar lumen <nucleus <nuclear membrane
[Figure: the same terms drawn as a graph, with nodes Cellular component, vacuolar membrane, intracellular, vacuole, vacuolar lumen, cytoplasm, nucleus, nuclear membrane, cell]
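The reason a tree is not enough can be sketched in a few lines of Python. This is a minimal illustration, not the GO project's own code: the edge list below is an assumed reading of the slide (the original indentation did not survive), and the names `EDGES`, `build_parents`, and `ancestors` are hypothetical. The point it shows is that one term, here "vacuolar membrane", can have two parents through two different relationship types.

```python
from collections import defaultdict

# Assumed edges (child, relationship, parent). The exact nesting on the slide
# is a guess; what matters is that "vacuolar membrane" has two parents.
EDGES = [
    ("membrane", "is_a", "cellular component"),
    ("vacuolar membrane", "is_a", "membrane"),
    ("vacuolar membrane", "part_of", "vacuole"),
    ("vacuole", "part_of", "cytoplasm"),
    ("cytoplasm", "part_of", "cell"),
    ("cell", "is_a", "cellular component"),
]


def build_parents(edges):
    """Map each child term to its list of (relationship, parent) pairs."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        parents[child].append((rel, parent))
    return parents


def ancestors(term, parents):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _rel, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
```

Under these assumed edges, `ancestors("vacuolar membrane", PARENTS)` walks both the ISA path through "membrane" and the PARTOF path through "vacuole", which a strict tree could not represent.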

12 GO terms are defined & have unique IDs
term: chloroplast
go_id: GO:0009507
definition: A chlorophyll-containing plastid with thylakoids organized into grana and frets, or stroma thylakoids, and embedded in a stroma.
definition_reference: ISBN:0471245208

term: ketone catabolism
go_id: GO:0042182
definition: The breakdown into simpler components of ketones, a class of organic compounds that contain the carbonyl group, CO, and in which the carbonyl group is bonded only to carbon atoms. The general formula for a ketone is RCOR', where R and R' are alkyl or aryl groups.
definition_reference: GO:curators

13 Annotation of GO terms to gene products
literature curation:
- Inferred from Mutant Phenotype
- Inferred from Direct Assay
- Inferred from Genetic Interaction
- Inferred from Physical Interaction
- Inferred from Expression Pattern
- Traceable Author Statement
- Non-traceable Author Statement
"homologies":
- Inferred from Sequence Similarity
computed annotation:
- Inferred from Electronic Annotation
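The three groups of evidence can be captured as a small lookup table. The sketch below uses the standard three-letter GO evidence codes for the names listed on the slide; the group labels are taken from the slide, while `EVIDENCE_CODES`, `EVIDENCE_GROUPS`, and `evidence_group` are hypothetical names chosen for illustration.

```python
# Three-letter GO evidence codes for the evidence types named on the slide.
EVIDENCE_CODES = {
    "IMP": "Inferred from Mutant Phenotype",
    "IDA": "Inferred from Direct Assay",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "ISS": "Inferred from Sequence Similarity",
    "IEA": "Inferred from Electronic Annotation",
}

# Grouping as presented on the slide.
EVIDENCE_GROUPS = {
    "literature curation": {"IMP", "IDA", "IGI", "IPI", "IEP", "TAS", "NAS"},
    "homologies": {"ISS"},
    "computed annotation": {"IEA"},
}


def evidence_group(code):
    """Return the slide's group name for an evidence code, or None if unknown."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return None
```

A typical use would be to keep only curated annotations, for example by dropping every record whose code falls in the "computed annotation" group.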

14 GO Gene Association Tables
- Herpes viruses
- Vibrio cholerae, B. anthracis, Coxiella burnetii, Pseudomonas syringae, Shewanella oneidensis …
- Dictyostelium discoideum
- Saccharomyces cerevisiae, Schizosaccharomyces pombe
- Trypanosoma brucei, Leishmania major, Plasmodium falciparum
- Caenorhabditis elegans
- Drosophila melanogaster, Glossina morsitans
- Danio rerio
- Mus "domesticus", Rattus norvegicus, Homo sapiens bioinformaticus
- Arabidopsis thaliana, Oryza sativa

15 go/gene_associations/
FB   FBgn0015567  α-Adaptin  GO:0005886  FB:FBrf0093110|PMID:9118220    IDA  C
FB   FBgn0015567  α-Adaptin  GO:0007269  FB:FBrf0108281|PMID:10218159   NAS  P
FB   FBgn0015567  α-Adaptin  GO:0016192  FB:FBrf0124164                 NAS  P
FB   FBgn0015567  α-Adaptin  GO:0030122  FB:FBrf0115359                 NAS  C
FB   FBgn0015567  α-Adaptin  GO:0030122  FB:FBrf0124164                 NAS  C
FB   FBgn0015567  α-Adaptin  GO:0006901  FB:FBrf0108281|PMID:10218159   TAS  P
FB   FBgn0015567  α-Adaptin  GO:0008021  FB:FBrf0108281|PMID:10218159   TAS  C
FB   FBgn0015567  α-Adaptin  GO:0016181  FB:FBrf0141528|PMID:11697879   TAS  P
FB   FBgn0015567  α-Adaptin  GO:0016183  FB:FBrf0108281|PMID:10218159   TAS  P
FB   FBgn0015567  α-Adaptin  GO:0030135  FB:FBrf0108281|PMID:10218159   TAS  C
FB   FBgn0010215  α-Cat      GO:0003779  FB:FBrf0132100                 ISS  F
FB   FBgn0010215  α-Cat      GO:0007016  FB:FBrf0129868|PMID:10908592   ISS  P
FB   FBgn0010215  α-Cat      GO:0008092  FB:FBrf0132100                 ISS  F
FB   FBgn0010215  α-Cat      GO:0016342  FB:FBrf0129868|PMID:10908592   ISS  C
FB   FBgn0010215  α-Cat      GO:0016343  FB:FBrf0129868|PMID:10908592   ISS  F
FB   FBgn0010215  α-Cat      GO:0005912  FB:FBrf0151280|PMID:12147138   NAS  C
SGD  S0004660     AAC1       GO:0005743  SGD_REF:12031|PMID:2167309     TAS  C
SGD  S0004660     AAC1       GO:0006854  SGD_REF:12031|PMID:2167309     IDA  P
SGD  S0004660     AAC1       GO:0005471  SGD_REF:12031|PMID:2167309     IDA  F
SGD  S0000289     AAC3       GO:0005743  SGD_REF:13606|PMID:1915842     TAS  C
SGD  S0000289     AAC3       GO:0006854  SGD_REF:13606|PMID:1915842     IMP  P
SGD  S0000289     AAC3       GO:0005471  SGD_REF:13606|PMID:1915842     IMP  F  ADP/ATP translocator  YBR085W|ANC3  gene  taxid:4932  20010213  SGD
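A row like the last SGD entry above can be split into named fields with a few lines of Python. This is a sketch under stated assumptions: it assumes the file is tab-delimited with the 15-column layout of GO gene association files, and that the two columns not visible on the slide (qualifier and "with") are empty in this row. The field names in `COLUMNS` and the function `parse_association` are my own labels, not an official API.

```python
# Assumed 15-column layout of a GO gene association line (labels are my own).
COLUMNS = [
    "db", "db_object_id", "symbol", "qualifier", "go_id",
    "references", "evidence", "with", "aspect", "name",
    "synonyms", "object_type", "taxon", "date", "assigned_by",
]


def parse_association(line):
    """Split one tab-delimited association line into a dict of named fields."""
    fields = line.rstrip("\n").split("\t")
    # Pad short rows so that every column name is present in the result.
    fields += [""] * (len(COLUMNS) - len(fields))
    record = dict(zip(COLUMNS, fields))
    # References and synonyms are pipe-separated lists within a single column.
    for key in ("references", "synonyms"):
        record[key] = [item for item in record[key].split("|") if item]
    return record


# The AAC3 row from the slide, with the qualifier and "with" columns assumed empty.
EXAMPLE_LINE = (
    "SGD\tS0000289\tAAC3\t\tGO:0005471\tSGD_REF:13606|PMID:1915842\tIMP\t\tF\t"
    "ADP/ATP translocator\tYBR085W|ANC3\tgene\ttaxid:4932\t20010213\tSGD"
)
EXAMPLE_RECORD = parse_association(EXAMPLE_LINE)
```

Keeping the pipe-separated columns as lists makes it straightforward to look up an annotation by PubMed ID or by any synonym of the gene.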

16 Curated GO Annotations
                 1.12.2001   1.12.2003
Gene products    42421       253962
GO terms         4262        7741

17 Expression studies: Human ontogenic tumor gene expression; Human breast cancer gene expression; Human endothelial cell gene expression; Human fibrosarcoma cell cDNAs; Human osteoblast progenitor cell gene expression; Human fibrosarcoma cell gene expression; Mouse cDNAs - FANTOM/FANTOM2 Projects; Mouse lung gene expression; Mouse dendritic cell gene expression; Mouse hepatic and hippocampal gene expression; Mouse liver tumor gene expression; Drosophila gene expression during aging; Drosophila embryo gene expression; Affymetrix Probe Sets
Protein annotation: Vertebrate nuclear proteins; Human GPCR proteins; Mouse proteome; PANTHER protein families
EST collections: Cattle ESTs, Pig ESTs, Dog ESTs; Paracoccidioides brasiliensis ESTs; Plasmodium falciparum ESTs; Honey bee ESTs; Schizophyllum commune ESTs; Meloidogyne incognita ESTs; Plasmodium vivax ESTs; Amblyomma variegatum ESTs
Genomic annotation: Drosophila melanogaster genome; Caenorhabditis briggsae genome; Anopheles gambiae genome; Schizosaccharomyces pombe genome; Plasmodium yoelii genome; Plasmodium falciparum genome; Dictyostelium genome; Rice genome; Plant alternatively spliced genes; Human pseudogenes
http://www.geneontology.org/GO.biblio.html

18 Database annotations. SGD: Dwight et al. 2002

19 Annotation summaries. Meloidogyne incognita: McCarter et al. 2003

20 The combinatorial nightmare

21 Combinatoric explosion
Columns: Process; Body part
Regulation (Negative or Positive): 2 * 1 * (# of processes - 1)
Induction: 2 * 2 * (# of processes - 2)
2 * 2 * (# of processes - 2) * (# of body parts)
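To get a feel for the scale, the three expressions on this slide can be evaluated directly. The sketch below is only arithmetic on the slide's formulas; the function names are my own reading of which row each formula belongs to, the process count of 8972 is borrowed from the "Content of GO" slide, and the figure of 100 body parts is purely hypothetical.

```python
def regulation_terms(n_processes):
    """Slide formula: 2 * 1 * (# of processes - 1)."""
    return 2 * 1 * (n_processes - 1)


def induction_terms(n_processes):
    """Slide formula: 2 * 2 * (# of processes - 2)."""
    return 2 * 2 * (n_processes - 2)


def induction_terms_by_body_part(n_processes, n_body_parts):
    """Slide formula: 2 * 2 * (# of processes - 2) * (# of body parts)."""
    return induction_terms(n_processes) * n_body_parts


N_PROCESSES = 8972   # biological process term count from the "Content of GO" slide
N_BODY_PARTS = 100   # hypothetical value, chosen only for illustration

print(regulation_terms(N_PROCESSES))                            # 17942
print(induction_terms(N_PROCESSES))                             # 35880
print(induction_terms_by_body_part(N_PROCESSES, N_BODY_PARTS))  # 3588000
```

Even with a modest number of body parts, pre-composing every combination would dwarf the 17,866 terms GO held at the time, which is the motivation for composing terms on demand instead.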

24 OBOL - Open Biological Ontologies Language Chris Mungall

25 The OBOL System
- Approach: annotation-time term composition vs tools for maintenance of large directed acyclic graphs
- Requires new generalization hierarchies
- Term decomposition using grammars
- Generating computable logical definitions
- Using logical definitions: term creation and error checking

26 A Formal Grammar for OBO terms
- All GO terms are NOUN-PHRASES
- A NOUN-PHRASE is (recursively) made from:
  - a NOUN (includes inflected verbs; e.g. binding)
  - an ADJECTIVE followed by a NOUN-PHRASE
  - a NOUN-PHRASE preceded by a NOUN-PHRASE acting as ADJECTIVE; e.g. clathrin coat
  - a NOUN-PHRASE then PREPOSITION then NOUN-PHRASE; e.g. regulation of transcription
  - an (optional) NOUN-PHRASE then a RELATIONAL ADJECTIVE then a NOUN-PHRASE; e.g. clathrin-coated vesicle
- Precedence rules are also required to prune parse forest
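One production of this grammar, NOUN-PHRASE then PREPOSITION then NOUN-PHRASE, is simple enough to sketch in Python. The code below is not OBOL itself (which is not shown on the slides); it is a toy decomposer that splits a term at its leftmost preposition and recurses on the right-hand side. The preposition list and the names `PREPOSITIONS` and `decompose` are assumptions for illustration, and the other productions and the precedence rules are deliberately left out.

```python
# Assumed small set of prepositions; a real grammar would need a fuller lexicon.
PREPOSITIONS = ("of", "to", "in", "by")


def decompose(term):
    """Split a term at its leftmost preposition, recursing on what follows.

    Returns the term unchanged when no preposition with words on both
    sides is found.
    """
    words = term.split()
    for i, word in enumerate(words):
        if word in PREPOSITIONS and 0 < i < len(words) - 1:
            return {
                "head": " ".join(words[:i]),
                "preposition": word,
                "object": decompose(" ".join(words[i + 1:])),
            }
    return term
```

For example, `decompose("regulation of transcription")` yields a head of "regulation" and an object of "transcription", while "clathrin coat" comes back unchanged because it contains no preposition. Splitting at the leftmost preposition is one arbitrary choice among several possible parses, which is why the slide notes that precedence rules are needed.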

27 Gene Ontology Software
- Browsers - AmiGO
- Database - MySQL
- Editor - DAG-Edit
- geneontology.sourceforge.net
- Third party software (e.g. Spotfire; TreeMap; GoFish; FatiGO)

32 OBO-Edit - a powerful editor for directed acyclic graphs.
- data adaptors
- multiple edits on same graph
- define your own relationship types
- plug in architecture - e.g. add an external in-line dictionary

34 The importance of community feedback
Everyone can suggest new terms for GO and tell us what errors we have made.
geneontology.sourceforge.net

