I529: Lab5 02/20/2009 AI: Kwangmin Choi


1 I529: Lab5 02/20/2009 AI: Kwangmin Choi

2 Today’s topics
Gene Ontology prediction/mapping
– AmiGO http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
– PFP http://dragon.bio.purdue.edu/pfp/
– GOtcha http://www.compbio.dundee.ac.uk/gotcha/
Pathway prediction/mapping
– KAAS http://www.genome.jp/kegg/kaas

3 Gene Ontology
The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products, in a species-independent manner, in terms of their associated biological processes, molecular functions, and cellular components.

4 GO: biological process
A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions.
– E.g. (general) cellular physiological process or signal transduction.
– E.g. (specific) pyrimidine metabolic process or alpha-glucoside transport.
It can be difficult to distinguish between a biological process and a molecular function, but the general rule is that a process must have more than one distinct step.
A biological process is not equivalent to a pathway; at present, GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway.

5 GO: molecular function
Molecular function describes activities, such as catalytic or binding activities, that occur at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and they do not specify where, when, or in what context the action takes place.
– E.g. (general) catalytic activity, transporter activity, or binding.
– E.g. (specific) adenylate cyclase activity or Toll receptor binding.

6 GO: cellular component
A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object. This may be
– an anatomical structure, e.g. rough endoplasmic reticulum or nucleus, or
– a gene product group, e.g. ribosome, proteasome, or a protein dimer.
Slide note: less informative.

7 AmiGO
URL: http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
– AmiGO is the official tool for searching and browsing the Gene Ontology database.
– A simple BLAST search is provided (not useful).
– The GO database consists of a controlled vocabulary of terms covering biological concepts, and a large number of genes or gene products whose attributes have been annotated using GO terms.

8 PFP (Automated Protein Function Prediction Server)
Hawkins, T., Luban, S. and Kihara, D. 2006. Enhanced Automated Function Prediction Using Distantly Related Sequences and Contextual Association by PFP. Protein Science 15: 1550-6.
– The PFP algorithm has been shown to increase coverage of sequence-based function annotation more than fivefold by extending a PSI-BLAST search to extract and score GO terms individually.
– It applies the Function Association Matrix (FAM) to score significantly associating pairs of annotations.

9 PFP method
PFP uses a scoring scheme to rank the GO annotations assigned to all of the most similar sequences according to
– (1) their frequency of occurrence in those sequences, and
– (2) the degree of similarity of the originating sequence to the query.
This is similar to the scoring basis for the R-value used by the GOtcha method to score annotations from pairwise alignment matches (Martin et al. 2004).

10 PFP method
Scoring a GO term f_a:
– s(f_a) is the final score assigned to the GO term f_a.
– N is the number of similar sequences retrieved by PSI-BLAST.
– E_value(i) is the E-value given to sequence i.
– b = 2 (= log10[100]) allows the use of sequence matches up to an E-value of 100.
Function Association Matrix (FAM):
– f_j is a GO term assigned to sequence i.
– P(f_a | f_j) is the conditional probability that f_a is associated with f_j.
– c(f_a, f_j) is the number of times f_a and f_j are assigned simultaneously to a sequence in UniProt.
– c(f_j) is the total number of times f_j appears in UniProt.
– μ is the size of one dimension of the FAM (i.e., the total number of unique GO terms).
– ε is the pseudo-count.
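
The formulas themselves did not survive in this transcript. Going by the symbol definitions above and the cited paper (Hawkins et al. 2006), the score appears to be s(f_a) = Σ_i Σ_j (−log10(E_value(i)) + b) · P(f_a | f_j), with P(f_a | f_j) = (c(f_a, f_j) + ε) / (c(f_j) + μ·ε), where i runs over the N retrieved sequences and j over the GO terms of sequence i. The Python sketch below is a minimal illustration under that reading, not the PFP implementation. The function names, the toy counts, the default pseudo-count of 0.05, and the treatment of a term as always co-occurring with itself are all assumptions made for the example.

```python
import math
from collections import defaultdict


def fam_probability(f_a, f_j, pair_counts, term_counts, mu, eps):
    """P(f_a | f_j) = (c(f_a, f_j) + eps) / (c(f_j) + mu * eps)."""
    c_j = term_counts.get(f_j, 0)
    # Assumption: a term co-occurs with itself every time it appears.
    c_pair = c_j if f_a == f_j else pair_counts.get((f_a, f_j), 0)
    return (c_pair + eps) / (c_j + mu * eps)


def pfp_scores(hits, candidate_terms, pair_counts, term_counts,
               mu, b=2.0, eps=0.05):
    """Score each candidate GO term f_a over all hits.

    hits: list of (e_value, [GO terms annotated on that hit]).
    mu:   number of unique GO terms (one dimension of the FAM).
    """
    scores = defaultdict(float)
    for e_value, hit_terms in hits:
        # Guard against log10(0) for exact matches reported as E = 0.
        weight = -math.log10(max(e_value, 1e-300)) + b
        for f_j in hit_terms:
            for f_a in candidate_terms:
                p = fam_probability(f_a, f_j, pair_counts, term_counts, mu, eps)
                scores[f_a] += weight * p
    return dict(scores)


# Toy data with hypothetical term IDs and counts.
toy_hits = [(1e-10, ["GO:A"]), (10.0, ["GO:B"])]
toy_pairs = {("GO:B", "GO:A"): 5, ("GO:A", "GO:B"): 5}
toy_terms = {"GO:A": 10, "GO:B": 5}
toy_scores = pfp_scores(toy_hits, ["GO:A", "GO:B"], toy_pairs, toy_terms, mu=2)
for term, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 3))
```

With b = 2, a hit at an E-value of 100 contributes a weight of zero, which matches the slide's remark that b allows matches up to E = 100 to be used.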

11 PFP
Web server
– http://dragon.bio.purdue.edu/pfp/queue/1168_kw.f.result.html
Local installation
– http://dragon.bio.purdue.edu/pfp/dist
– Installed in /home/kwchoi/public_html/PFP
– You need to specify the path of blastpgp
– You also need BLOSUM62

12 PFP (Automated Protein Function Prediction Server)
PFP output
– /home/kwchoi/public_html/I529-09-lab/Lab5/Data/pfp_data
Columns
– 1: predicted GO term
– 2: GO category (f/p/c)
– 3: raw term score
– 4: term p-value
– 5: rank (by p-value)
– 6: confidence to be exact match
– 7: rank (by column 6)
– 8: confidence within 2 edges on the GO DAG
– 9: rank (by column 8)
– 10: confidence within 4 edges on the GO DAG
– 11: rank (by column 10)
– 12: GO term short definition
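
The column layout above can be read into named fields with a few lines of Python. This is a sketch under assumptions: the slide does not state the delimiter, so the code assumes whitespace-separated columns with the free-text definition last, and it assumes the rank columns hold integers. The field names and the sample line (including all of its numeric values) are invented for illustration.

```python
from typing import NamedTuple


class PfpPrediction(NamedTuple):
    go_term: str         # column 1
    category: str        # column 2: f, p or c
    raw_score: float     # column 3
    p_value: float       # column 4
    rank_p_value: int    # column 5
    conf_exact: float    # column 6
    rank_exact: int      # column 7 (rank by column 6)
    conf_2_edges: float  # column 8
    rank_2_edges: int    # column 9
    conf_4_edges: float  # column 10
    rank_4_edges: int    # column 11
    definition: str      # column 12, may contain spaces


def parse_pfp_line(line):
    """Parse one PFP output line (assumed whitespace-delimited)."""
    # Split at most 11 times so the definition keeps its internal spaces.
    f = line.split(None, 11)
    if len(f) < 12:
        raise ValueError("expected 12 columns, got %d" % len(f))
    return PfpPrediction(
        f[0], f[1], float(f[2]), float(f[3]), int(f[4]),
        float(f[5]), int(f[6]), float(f[7]), int(f[8]),
        float(f[9]), int(f[10]), f[11].strip(),
    )


# Invented sample line; only the column order follows the slide.
sample = "GO:0005524 f 1234.5 0.001 1 0.85 1 0.90 2 0.95 3 ATP binding\n"
prediction = parse_pfp_line(sample)
print(prediction.go_term, prediction.category, prediction.definition)
```

If the real files turn out to be tab-delimited, replacing `line.split(None, 11)` with `line.rstrip("\n").split("\t")` would be the natural adjustment.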

13 GOtcha
The GOtcha method: Martin et al. BMC Bioinformatics (2004) 5:178.
– GOtcha assigns functional terms transitively based upon sequence similarity.
– These terms are ranked by probability and displayed graphically on a subtree of the Gene Ontology.

14 GOtcha method
– GOtcha performs a BLAST search of the query sequence against individual well-annotated genomes.
– Annotations are transitively assigned from all hits with a score corresponding to the E-value, so individual GO terms receive cumulative scores from multiple sequence similarity matches.
– Cumulative scores are normalized and, for each term, two scores are obtained:
  – the I-score, which is normalized to the root node, and
  – the C-score, which is the cumulative score at the root node.
– For each GO term, a precomputed scoring table is used to establish the assignment likelihood for that term given that I-score and that C-score. This is represented as a probability.
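
The cumulative scoring described above can be illustrated with a toy ontology. The sketch below is a simplified reading, not GOtcha itself: it assumes each hit contributes −log10(E-value) to every term it is annotated with and to all ancestors of those terms, takes the root total as the C-score, and divides each term's total by the root total to get an I-score. The published method's exact normalization and its precomputed likelihood tables are not reproduced, and the toy term names are hypothetical.

```python
import math
from collections import defaultdict

# Toy ontology (hypothetical): each term maps to its parent terms.
TOY_PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ATP binding": ["binding"],
}


def ancestors_inclusive(term, parents):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def gotcha_like_scores(hits, parents, root="root"):
    """hits: list of (e_value, [annotated terms]). Returns (I-scores, C-score)."""
    cumulative = defaultdict(float)
    for e_value, terms in hits:
        # Per-hit score; negative values (E > 1) are clipped to zero.
        r = max(-math.log10(max(e_value, 1e-300)), 0.0)
        touched = set()
        for term in terms:
            touched |= ancestors_inclusive(term, parents)
        # Each hit counts once per term, even if reached by several paths.
        for term in touched:
            cumulative[term] += r
    c_score = cumulative.get(root, 0.0)
    if c_score <= 0.0:
        return {}, 0.0
    i_scores = {term: total / c_score for term, total in cumulative.items()}
    return i_scores, c_score


toy_hits = [(1e-10, ["ATP binding"]), (1e-5, ["catalysis"])]
i_scores, c_score = gotcha_like_scores(toy_hits, TOY_PARENTS)
for term in sorted(i_scores, key=i_scores.get, reverse=True):
    print(term, round(i_scores[term], 3))
print("C-score:", c_score)
```

Because scores propagate upward, a general term can never score below its more specific descendants, which is why the root always has an I-score of 1 in this sketch.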

15 Pathway mapping
E.g. E. coli K-12 pathway (00300)

16 KAAS
KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes in a genome by BLAST comparisons against a manually curated set of ortholog groups in KEGG GENES. The result contains KO (KEGG Orthology) assignments and automatically generated KEGG pathways.
Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A., and Kanehisa, M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182-W185 (2007).

17 KAAS
Web server: http://www.genome.jp/kegg/kaas/
– KAAS works best when a complete set of genes in a genome is known. Prepare query amino acid sequences and use the BBH (bi-directional best hit) method to assign orthologs.
– KAAS can also be used for a limited number of genes. Prepare query amino acid sequences and use the SBH (single-directional best hit) method to assign orthologs.
– When ESTs are comprehensive enough, a set of consensus contigs can be generated by the EGassembler server and used as a gene set for KAAS with the BBH method. Otherwise, use ESTs as they are with the SBH method.
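
The difference between the two assignment modes can be shown on a handful of scores. The sketch below only illustrates the general idea of single-directional versus bi-directional best hits; it does not reproduce the score thresholds or ranking that KAAS itself applies. The function names, gene IDs, and bit scores are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: {(query, target): bit_score}
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def assign_sbh(forward):
    """Single-directional best hit: keep each query's top target."""
    return best_hits(forward)


def assign_bbh(forward, reverse):
    """Bi-directional best hit: keep a pair only if each is the other's top hit.

    forward: query genes searched against the reference set.
    reverse: reference genes searched back against the query set.
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {q: t for q, t in fwd.items() if rev.get(t) == q}


# Hypothetical scores: q1 and q2 both hit gA best, but gA prefers q1.
forward = {("q1", "gA"): 200.0, ("q1", "gB"): 150.0, ("q2", "gA"): 180.0}
reverse = {("gA", "q1"): 200.0, ("gA", "q2"): 180.0, ("gB", "q1"): 150.0}

print("SBH:", assign_sbh(forward))
print("BBH:", assign_bbh(forward, reverse))
```

The reverse search is only meaningful when the query set is close to a complete gene set, which is consistent with the advice above to use BBH for complete genomes and SBH for partial sets.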

18 KAAS workflow

19 Pathway mapping
KAAS returns
– KO list
– KEGG Atlas metabolism map [Create atlas]
– Pathway maps [Create all maps]
– Hierarchy files
You can highlight KEGG maps using the KEGG API
– http://www.genome.jp/kegg/soap/doc/keggapi_manual.html
– See: color_pathway_by_objects
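
As a starting point for highlighting, the KO list can be turned into the parallel lists that color_pathway_by_objects takes. The sketch below rests on assumptions: it treats the KO list as one gene per line, with the gene ID followed by whitespace and a KO identifier, and with unassigned genes listed by ID only; it also assumes the argument order (pathway ID, object ID list, foreground color list, background color list) and the "ko:" object prefix. Check both against the manual linked above. No network call is made, and the gene IDs and colors are invented.

```python
def parse_ko_list(text):
    """Return {gene_id: KO} for genes that received a KO assignment."""
    assignments = {}
    for line in text.splitlines():
        fields = line.split()
        # Lines holding only a gene ID are genes without an assignment.
        if len(fields) >= 2:
            assignments[fields[0]] = fields[1]
    return assignments


def coloring_arguments(assignments, fg_color="black", bg_color="yellow"):
    """Build the three parallel lists for a pathway-coloring request."""
    kos = sorted(set(assignments.values()))
    object_ids = ["ko:" + ko for ko in kos]  # prefix is an assumption
    fg_colors = [fg_color] * len(kos)
    bg_colors = [bg_color] * len(kos)
    return object_ids, fg_colors, bg_colors


# Hypothetical KO list excerpt (gene IDs invented).
ko_list_text = """gene001\tK00928
gene002
gene003\tK01714
gene004\tK00928
"""

assigned = parse_ko_list(ko_list_text)
objects, fg, bg = coloring_arguments(assigned)
print(len(assigned), "genes assigned to", len(objects), "KOs")
print(objects)
```

Deduplicating the KOs before building the lists keeps the three lists the same length and avoids sending the same object twice when several genes share an ortholog group.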
