Tools in Bioinformatics: Ontologies and pathways


1 Tools in Bioinformatics: Ontologies and pathways

2 Why are ontologies needed? Free text is the best way to describe what a protein does to a human reader. However, it is a lousy way to tell that to a computer. When are we interested in a computer-interpretable annotation? We want all the proteins associated with a certain disease. We want all the proteins localized to the lysosome. We found a cluster of “interesting” genes and we want to know what they are involved in. We want to measure the similarity between gene pairs.

3 Simple solution: The simplest solution is to use a set of keywords for every protein. Why is this a bad solution?

4 What’s in a name? Glucose synthesis, glucose biosynthesis, glucose formation, glucose anabolism, gluconeogenesis: all refer to the process of making glucose from simpler components.
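The synonym problem on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any GO tool is implemented: the lookup table and the `normalize` helper are hypothetical, and the identifier GO:0006094 (the GO term for gluconeogenesis) is added here and does not appear on the slide.

```python
# Hypothetical synonym table: every free-text phrase from the slide
# is mapped to one controlled-vocabulary identifier.
# GO:0006094 is the GO term "gluconeogenesis" (an addition, not from the slide).
SYNONYM_TO_TERM = {
    "glucose synthesis": "GO:0006094",
    "glucose biosynthesis": "GO:0006094",
    "glucose formation": "GO:0006094",
    "glucose anabolism": "GO:0006094",
    "gluconeogenesis": "GO:0006094",
}


def normalize(phrase):
    """Return the term ID for a free-text phrase, or None if it is unknown."""
    # Lower-case and collapse whitespace so trivial spelling variants match.
    key = " ".join(phrase.lower().split())
    return SYNONYM_TO_TERM.get(key)


print(normalize("Glucose  Biosynthesis"))
print(normalize("Gluconeogenesis"))
print(normalize("glycolysis"))
```

Once every phrase resolves to the same identifier, a query for that identifier finds all five wordings, which a plain keyword search would not.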

5 What’s in a name? The problem: the same name is used for different concepts, and different names are used for the same concept. With vast amounts of biological data from different sources, cross-species or cross-database comparison is difficult.

6 What is the Gene Ontology? A (part of the) solution: the Gene Ontology, “a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”. It is a controlled vocabulary to describe gene products (proteins and RNA) in any organism.

7 What is GO? One of the Open Biological Ontologies. A standard, species-neutral way of representing biology. Three structured networks of defined terms to describe gene product attributes. More like a phrase book than a biology textbook.

8 How does GO work? What information might we want to capture about a gene product? What does the gene product do? Molecular function. Where and when does it act? Cellular component. What is the purpose of these activities? Biological process.

9 Molecular Function: the activities or “jobs” of a gene product, e.g. insulin binding, insulin receptor activity.

10 Cellular Component: where a gene product acts.

11 Cellular Component

12 Enzyme complexes in the component ontology refer to places, not activities.

13 Biological Process: a commonly recognized series of events, e.g. cell division.

14 Biological Process: e.g. transcription.

15 Ontology Structure: Ontologies are structured as a hierarchical directed acyclic graph. Terms can have more than one parent and zero, one or more children. Terms are linked by two relationships: is-a and part-of.

16 Ontology Structure [Figure: graph of the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of edges]
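The structure described on these two slides can be sketched as a small directed acyclic graph in Python. The term names come from the figure, but the exact edges are an assumed reading of it (the transcript does not preserve them), and the `PARENTS` table and `ancestors` helper are hypothetical names used only for illustration.

```python
# child term -> list of (relationship, parent term).
# ASSUMPTION: these edges are one plausible reading of the slide's figure.
PARENTS = {
    "membrane": [("part-of", "cell")],
    "chloroplast": [("part-of", "cell")],
    "mitochondrial membrane": [("is-a", "membrane")],
    # A term with two parents: this is what makes the structure a DAG
    # rather than a simple tree.
    "chloroplast membrane": [("is-a", "membrane"), ("part-of", "chloroplast")],
}


def ancestors(term):
    """Collect every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("chloroplast membrane")))
```

Because "chloroplast membrane" has two parents, walking upwards reaches "membrane", "chloroplast" and "cell", even though "cell" is reachable along two different paths.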

17 True Path Rule: The path from a child term all the way up to its top-level parent(s) must always be true. Example: cell → nucleus → chromosome. But what about bacteria, which have a chromosome but no nucleus?

18 True Path Rule: Resolved component ontology structure: cell → cytoplasm; cell → chromosome → nuclear chromosome; cell → nucleus → nuclear chromosome.
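A small Python sketch can make the true path rule concrete. It compares the two structures from these slides and asks what an annotation to "chromosome" implies in each. The dictionaries and the `implied_terms` helper are hypothetical and ignore the distinction between is-a and part-of; they are meant only as an illustration of the rule.

```python
# child term -> list of parent terms, as drawn on the two slides.
ORIGINAL = {
    "nucleus": ["cell"],
    "chromosome": ["nucleus"],
}
RESOLVED = {
    "cytoplasm": ["cell"],
    "chromosome": ["cell"],
    "nucleus": ["cell"],
    "nuclear chromosome": ["chromosome", "nucleus"],
}


def implied_terms(term, parents):
    """Every ancestor of `term`; an annotation to `term` implies all of them."""
    implied = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in implied:
                implied.add(parent)
                stack.append(parent)
    return implied


# A bacterial gene product annotated to "chromosome":
print("nucleus" in implied_terms("chromosome", ORIGINAL))  # rule violated
print("nucleus" in implied_terms("chromosome", RESOLVED))  # rule holds
```

In the original structure the bacterial annotation would wrongly imply a nucleus; in the resolved structure only the more specific term "nuclear chromosome" carries that implication.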

19 GO Annotation: using GO terms to represent the activities and localizations of a gene product. Annotations are contributed by members of the GO Consortium: model organism databases and cross-species databases, e.g. UniProt. Annotations are freely available from the GO website.

20 GO Annotation. Electronic annotation: from mapping files, e.g. UniProt keyword2go; high quantity but low quality; annotations to low level terms; not checked by curators. Manual annotation: from literature curation; time consuming but high quality.

21 Where do we see GO annotations? Entrez Gene / GeneCards / SwissProt; organism-specific databases; amigo.geneontology.org/

22 Pathways – beyond terms: Saying that a gene participates in gluconeogenesis and binds pyruvate in the nucleus does not provide us with all the information. Pathway databases specify the place of a specific gene/protein with respect to other genes doing similar jobs.

23 KEGG – Kyoto Encyclopedia of Genes and Genomes: www.genome.jp/kegg/ and http://www.genome.jp/kegg/pathway.html. Manually annotated “reference maps” linked to hundreds of genomes. Focus on metabolic pathways. Can be used to answer questions: Give me all the genes involved in pathway X! Given a set of genes, is there a pathway that has a lot of genes in our set?
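The second question on this slide is commonly approached with an over-representation test. The sketch below uses a one-sided hypergeometric test, which is a standard choice but is an assumption here: the slide does not name a method. The gene identifiers, the pathway contents and the function name are all made up for illustration and do not come from KEGG.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value.

    Probability of seeing at least `n_overlap` pathway genes when
    `n_selected` genes are drawn without replacement from a background of
    `n_background` genes, of which `n_pathway` belong to the pathway.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical data: a 6-gene pathway and a 5-gene "interesting" set.
pathway = {"geneA", "geneB", "geneC", "geneD", "geneE", "geneF"}
our_set = {"geneA", "geneB", "geneC", "geneX", "geneY"}
overlap = len(pathway & our_set)

# Assumed background of 1000 annotated genes.
p = enrichment_p_value(1000, len(pathway), len(our_set), overlap)
print(overlap, p)
```

A small p-value suggests the pathway contains more of our genes than chance would explain. When many pathways are tested at once, the p-values would also need a multiple-testing correction, which this sketch leaves out.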

24 KEGG

25 BioCarta: http://www.biocarta.com/genes/index.asp Focus on human signaling pathways.

26 MSigDB: So far we saw curated databases, which focus on established knowledge and are always lagging behind. MSigDB combines “established” gene sets with gene sets that came up in some experiment, e.g. up-regulated after UV exposure, down in colorectal cancers, or predicted targets of some transcription factor. Frequently more useful than GO/KEGG.

