2 Introduction
This presentation is designed to show the features of four ‘third-party’ GO analysis tools. These tools and others listed on can be used in proteomics studies to view the GO terms associated with a list of proteins obtained from high-throughput experiments, together with their statistical significance compared with a reference set of proteins.* Each presentation was prepared by the developers of the respective tool, using for the analysis a list of human cardiovascular-related protein accessions (or, in the case of Blast2GO, the equivalent bovine protein sequences). The article's authors do not intend to recommend any tool; they merely demonstrate how GO analysis of proteome sets could be performed using some of these tools. We advise researchers to try several different tools to find one that suits their needs.
*All of these tools have been created outside of the GO Consortium.

3 Contents
Blast2GO: Slide 4
FatiGO: Slide 13
Onto-Express: Slide 20
Ontologizer: Slide 27
Accession list I: Slide 35
Accession list II: Slide 36

4 Blast2GO in Babelomics
Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF)
Functional annotation: first, the BLAST step obtains homologous sequences for the query sequences. Second, the actual GO annotation applies the Blast2GO method, which transfers the most confident and appropriate GO annotations to the novel sequences. Statistical charts help to understand and interpret the annotation results.
Visualization: this step gives users an overall view of the GO annotations assigned to the sequence dataset, making use of GO's graph structure.
Conesa, A., Götz, S., García-Gómez, J.M., Terol, J., Talón, M. & Robles, M. (2005). Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674-3676.

5 Functional Annotation with Blast2GO
Annotation is the process of assigning functional categories to genes or gene products. In Blast2GO this assignment is performed for each sequence based on the information available for the homologous sequences retrieved by BLAST. Blast2GO annotation proceeds through a two-step strategy:
1. All GO terms for the BLAST hit sequences are collected. The BLAST results are parsed, and the identifiers of the BLAST hits are used to query the Gene Ontology database to recover the associated functional terms. The evidence code of each annotation is also recovered; evidence codes indicate how the functional assignment in the Gene Ontology database was obtained.
2. GO terms are selected from this original pool to extract the most reliable annotation. Once all this information is gathered, an annotation score is computed for each {GO term, query sequence} pair. Only the most specific GO term within a branch of the GO is assigned to the query sequence, and this assignment depends on the annotation score, whose threshold is set by the user. The annotation score is computed as:
$$\text{Annotation score}_{\{GO,\,Seq\}} = \text{max.sim} \times EC_w + (\#GO - 1) \times GO_w$$
where:
max.sim is the maximal similarity between the query and the hit sequences that carry the given GO annotation;
EC_w is the weight given to the evidence code of the original annotation. Blast2GO defines values for these weights, which can also be modified by the user. In general, EC_w = 1 for experimental evidence codes and EC_w < 1 for non-experimental evidence codes;
#GO is the number of annotated child terms;
GO_w is the weight given to the contribution of each annotated child term to a given term.
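
As a worked illustration of the annotation score, the Python sketch below evaluates the formula for one {GO term, query sequence} pair and applies a user threshold. The evidence-code weights, the GO weight of 5 and the threshold of 55 are hypothetical values chosen for the example, not Blast2GO's documented defaults, and the function names are invented for this sketch.

```python
# Illustrative evidence-code weights (hypothetical values, not Blast2GO defaults):
# experimental codes get 1.0, non-experimental codes get less than 1.0.
EC_WEIGHTS = {"IDA": 1.0, "IMP": 1.0, "ISS": 0.8, "IEA": 0.7}


def annotation_score(max_sim: float, evidence_code: str, n_go: int, go_weight: float) -> float:
    """Annotation score for one {GO term, query sequence} pair:
    max.sim * EC_w + (#GO - 1) * GO_w, as stated on this slide.

    max_sim       -- maximal similarity between the query and hits carrying the term
    evidence_code -- evidence code of that original annotation (looked up in EC_WEIGHTS)
    n_go          -- number of annotated child terms (#GO)
    go_weight     -- weight of each annotated child's contribution (GO_w)
    """
    ec_weight = EC_WEIGHTS.get(evidence_code, 0.0)  # unknown codes contribute nothing
    return max_sim * ec_weight + (n_go - 1) * go_weight


def is_transferred(score: float, threshold: float) -> bool:
    """A term is assigned only if its score reaches the user-set threshold."""
    return score >= threshold
```

With these illustrative numbers, an IDA-supported term with max.sim = 80 and #GO = 1 scores 80 and is transferred, while an IEA-supported term with max.sim = 60 and #GO = 3 scores 60 × 0.7 + 2 × 5 = 52 and falls below the threshold of 55.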

6 The BLAST Step (1/2)
The homology search is the first and most time-consuming step when attempting to transfer functional information from similar sequences to uncharacterized sequence data. This simple tool lets you perform high-throughput BLAST searches against several protein databases, keep processes running until they finish while monitoring their current status, and save the generated alignments as XML files. These XML files can then be used as input data for the Blast2GO annotation method.
Upload your sequence file in FASTA format, choose the appropriate BLAST parameters and database (blastp for protein sequences) and press RUN.
In this tab you can see the current status of your job and, for big datasets, come back later to retrieve the results.
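
Because the BLAST step expects FASTA input, it can help to check a file locally before uploading it. The sketch below is a minimal FASTA reader written for this example (the function name `parse_fasta` is hypothetical and not part of Blast2GO); it keeps the first token of each header line as the identifier and joins the sequence lines that follow.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>';
    sequence lines are concatenated with all whitespace removed.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # store the previous record
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header without an identifier")
            current_id, chunks = tokens[0], []
        else:
            if current_id is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append("".join(line.split()))
    if current_id is not None:
        records[current_id] = "".join(chunks)  # flush the last record
    return records
```

An open file object can be passed directly, since iterating over it yields one line at a time.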

7 The BLAST Step (2/2)
Open the results with this link. Save your results as an XML file.

8 The Annotation Step
Upload and parse your BLAST results in NCBI's XML format, applying several filters. The annotation rule parameters are: an e-value cut-off as the minimal quality criterion; the annotation rule cut-off (coverage vs. exactness); the GO weight (more general vs. more specific terms); and a minimal alignment length allowed for function transfer. Evidence code weights can be set to increase or decrease the influence of different kinds of annotation evidence, e.g. automatically generated annotations. Then start the annotation assignment.
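
The e-value cut-off and minimal alignment length act as filters on the parsed BLAST hits. Here is a minimal sketch of that filtering, assuming the NCBI BLAST XML layout (`Iteration`, `Hit` and `Hsp` elements with `Hsp_evalue` and `Hsp_align-len` children). The function name and the keep-the-best-passing-HSP rule are choices made for this example, not a description of Blast2GO's internal code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def filtered_hits(xml_text: str, max_evalue: float,
                  min_align_len: int) -> Dict[str, List[Tuple[str, float]]]:
    """Map each query (Iteration_query-def) to [(hit accession, best e-value), ...].

    A hit is kept only if at least one of its HSPs has an e-value <= max_evalue
    and an alignment length >= min_align_len; the best passing e-value is reported.
    """
    root = ET.fromstring(xml_text)
    result: Dict[str, List[Tuple[str, float]]] = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        kept: List[Tuple[str, float]] = []
        for hit in iteration.iter("Hit"):
            accession = hit.findtext("Hit_accession", default="")
            best: Optional[float] = None
            for hsp in hit.iter("Hsp"):
                evalue = float(hsp.findtext("Hsp_evalue", default="inf"))
                align_len = int(hsp.findtext("Hsp_align-len", default="0"))
                if evalue <= max_evalue and align_len >= min_align_len:
                    best = evalue if best is None else min(best, evalue)
            if best is not None:
                kept.append((accession, best))
        result[query] = kept
    return result
```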

9 The Blast2GO web tool generates a multitude of statistical charts that help to understand the underlying dataset and to interpret the generated annotation results. They include a chart of the e-value distribution of the BLAST results and a chart showing which source databases the transferred GO terms originally came from. The result table lets you review, browse and export the generated annotations.
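
An e-value distribution chart is essentially a histogram of e-values on a logarithmic scale. The sketch below bins e-values by order of magnitude; the function name and the separate bin for e-values reported as exactly 0 are assumptions of this example rather than a description of how Blast2GO draws its chart.

```python
import math
from collections import Counter
from typing import Iterable


def evalue_histogram(evalues: Iterable[float]) -> Counter:
    """Count e-values per order of magnitude, keyed by floor(log10(e)).

    E-values of exactly 0 (reported by BLAST for very strong hits) go into a
    separate 'zero' bin, because log10(0) is undefined.
    """
    bins: Counter = Counter()
    for e in evalues:
        if e == 0:
            bins["zero"] += 1
        else:
            bins[math.floor(math.log10(e))] += 1
    return bins
```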

10 Further charts show: the most frequent GO terms throughout the dataset; the success of the annotation process, giving the number of successfully ‘BLASTed’, GO-mapped and annotated sequences; how many GO terms were assigned to how many sequences; the distribution of the different evidence codes across the GO terms per sequence; the number of sequences annotated at a given GO level and category; the distribution of BLAST sequence similarities; the distribution of the different evidence codes across the GO terms per BLAST hit; and the distribution of the species from which the BLAST hits originate.
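
Two of these charts, the most frequent GO terms and the number of GO terms per sequence, can be derived directly from (sequence, GO term) pairs. The following sketch shows one way to count them; the input layout and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def annotation_stats(pairs: Iterable[Tuple[str, str]],
                     top_n: int = 3) -> Tuple[List[Tuple[str, int]], Dict[int, int]]:
    """From (sequence id, GO term) pairs, return:
    - the top_n most frequent GO terms across the dataset, with their counts
    - {number of GO terms: number of sequences carrying that many terms}
    """
    unique_pairs = set(pairs)  # ignore duplicate annotation rows
    term_counts = Counter(go for _, go in unique_pairs)
    terms_per_seq = Counter(seq for seq, _ in unique_pairs)
    distribution = Counter(terms_per_seq.values())
    return term_counts.most_common(top_n), dict(distribution)
```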

11 Saving and exporting results
Browse the generated annotations in the result table. Open and save the results in a tabular format for further use in the GO-Graph-Viewer, or download them in Blast2GO project format for direct import into Blast2GO. Blast2GO annotations are exported in a tabular format: SeqId GOterm SeqDesc.
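
Since the export is a simple three-column table, it is straightforward to load elsewhere. The sketch below assumes tab-separated columns in the order SeqId, GOterm, SeqDesc, with one annotation per row and an optional header line; the actual delimiter and any extra columns in Blast2GO's export may differ, and the function name is hypothetical.

```python
import csv
import io
from typing import Dict, Set, Tuple


def read_annotation_table(text: str) -> Dict[str, Tuple[str, Set[str]]]:
    """Parse 'SeqId<TAB>GOterm<TAB>SeqDesc' rows into {SeqId: (SeqDesc, {GO terms})}."""
    table: Dict[str, Tuple[str, Set[str]]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2 or not row[0].strip() or row[0].strip() == "SeqId":
            continue  # skip blank lines, malformed rows and an optional header
        seq_id, go_term = row[0].strip(), row[1].strip()
        description = row[2].strip() if len(row) > 2 else ""
        if seq_id not in table:
            table[seq_id] = (description, set())
        table[seq_id][1].add(go_term)  # collect every GO term for this sequence
    return table
```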

12 Visualization: The GO-Graph-Viewer
The DAG viewer tool generates joined Gene Ontology graphs (DAGs) to give an overview of the functional context of groups of sequences. Interactive graph visualization makes it possible to navigate the large and unwieldy graphs often generated when biologically exploring large sets of sequence annotations. Zooming and graph navigation are provided by the DAG viewer Java Web Start tool.
Upload your Blast2GO-generated annotations, define graph filtering parameters for denser and more informative graphs, and start the interactive graph visualization tool with Java Web Start. Save parts of your graphs as high-resolution images to better communicate your results.
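
A joined graph combines the annotations of many sequences on a single DAG, so that each node reflects how many sequences reach it through their own annotations. The sketch below propagates each annotation to all ancestor terms and counts distinct sequences per node. The term names are placeholders, and the node score that the GO-Graph-Viewer itself uses is not described on this slide and may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable from `term` via parent links, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def joined_graph_counts(annotations: Dict[str, Iterable[str]],
                        parents: Dict[str, Set[str]]) -> Dict[str, int]:
    """Distinct sequences per node of the joined DAG: a sequence counts for a
    term if it is annotated to that term or to any of its descendants."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for seq, terms in annotations.items():
        for term in terms:
            for node in ancestors(term, parents):
                members[node].add(seq)
    return {node: len(seqs) for node, seqs in members.items()}
```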

13 FatiGO: Functional enrichment analysis
Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF)
Al-Shahrour, F., et al. (2005). Babelomics: a suite of web-tools for functional annotation and analysis of group of genes in high-throughput experiments. Nucleic Acids Research 33: W460-W464.
Al-Shahrour, F., et al. (2004). FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578-580.

14 Select your organism. Enter your list or file of genes/proteins.* In this example, list #1 is a list of BHF-UCL annotated cardiovascular-related proteins (see Slide 35) and list #2 is the “Rest of genome”. Select the database(s) you want to query, and click Options to filter the database (optional).
*Several types of identifier are acceptable, such as UniProtKB, Ensembl IDs, HGNC symbols, RefSeq, Entrez Gene etc.

15 Filter Tool
Babelomics allows sub-selection of the gene annotations on which gene modules are based, in order to test hypotheses in a more focused and sensitive manner. Removing from the analysis modules whose testing is unnecessary and superfluous increases the power of the tests in the multiple-testing adjustment step. Use the level of the DAG and the evidence code as filtering criteria, and select subsets of annotations based on keywords and on the size of the gene module.
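
The level, evidence-code and module-size filters can be pictured with a small sketch. It assumes that a term's level is one plus its shortest distance from the root (FatiGO's exact level definition is not given on this slide), builds modules from direct annotations only, and uses hypothetical function names and toy terms.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


def term_levels(root: str, children: Dict[str, Set[str]]) -> Dict[str, int]:
    """Level of each term, assumed here to be 1 + shortest distance from the root."""
    levels = {root: 1}
    queue = deque([root])
    while queue:  # breadth-first search assigns the shortest distance first
        term = queue.popleft()
        for child in children.get(term, set()):
            if child not in levels:
                levels[child] = levels[term] + 1
                queue.append(child)
    return levels


def filter_modules(annotations: List[Tuple[str, str, str]], levels: Dict[str, int],
                   level: int, allowed_codes: Set[str],
                   min_size: int, max_size: int) -> Dict[str, Set[str]]:
    """Build gene modules {GO term: genes} from (gene, term, evidence code) triples,
    keeping only allowed evidence codes, terms at the chosen level, and modules
    whose size lies in [min_size, max_size]."""
    modules: Dict[str, Set[str]] = defaultdict(set)
    for gene, term, code in annotations:
        if code in allowed_codes and levels.get(term) == level:
            modules[term].add(gene)
    return {t: genes for t, genes in modules.items() if min_size <= len(genes) <= max_size}
```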

16 Results of GO analysis
The number of annotated proteins per GO level is displayed. Level 3 contains less granular terms; level 9 contains more granular terms.

17 FatiGO returns a list of GO terms that are over-represented in the list of interest, in this case the BHF-UCL list. For Biological Process terms at level 3 of the ontology, the over-represented terms include muscle contraction, cell cycle and anatomical structure development. The proteins from your query set that are annotated to each GO term are listed. A lower p-value indicates a more significant result.
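
Over-representation of a term in list #1 relative to list #2 is commonly tested with a 2 × 2 contingency table and Fisher's exact test. The sketch below computes the one-sided (over-representation) Fisher p-value from the hypergeometric distribution using only the standard library. FatiGO's own test may be two-sided and is followed by a multiple-testing adjustment, so treat this as an illustration of the idea rather than a reproduction of its output; the function name is hypothetical.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for over-representation of a term in list 1.

    a -- list-1 proteins annotated to the term   b -- list-1 proteins not annotated
    c -- list-2 proteins annotated to the term   d -- list-2 proteins not annotated
    """
    n = a + b          # size of list 1
    k_total = a + c    # proteins annotated to the term in either list
    N = a + b + c + d  # all proteins in both lists
    # P(X >= a) for X ~ Hypergeometric(N, k_total, n): tables at least as extreme as observed
    numerator = sum(comb(k_total, i) * comb(N - k_total, n - i)
                    for i in range(a, min(n, k_total) + 1))
    return numerator / comb(N, n)
```

For example, if all three annotated proteins fall in a three-protein list drawn from six proteins in total, the p-value is 1/20 = 0.05.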

18 Best p-value
At level 6, deeper in the ontology, FatiGO shows terms that are over-represented in the BHF-UCL list (though not necessarily significantly; compare the p-values), such as regulation of progression through cell cycle, heart development and cholesterol absorption. These are all processes in which you would expect cardiovascular-related proteins to be involved.
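
Raw p-values overstate significance when hundreds of terms are tested at once, which is why an over-represented term need not be significant. As an example of such an adjustment, the sketch below implements the Benjamini-Hochberg false discovery rate procedure. Which adjustment FatiGO applies is not stated on the slide, so this shows one common choice rather than its documented method.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```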

19 GO-Graph-Viewer Tool
You can upload your FatiGO results to the interactive graph visualization tool. The DAG viewer tool allows visualization of the significant GO terms as a GO graph, with the GO term names displayed together with the annotation score.

20 Onto-Express: Features at a Glance
Purvesh Khatri, Sorin Draghici
Intelligent Systems and Bioinformatics Lab, Department of Computer Science, Wayne State University

21 Input interface
Select the organism.
Select the type of IDs in the input file. Supported input types are GenBank accession numbers, UniGene cluster IDs, Entrez Gene IDs, gene symbols, Affymetrix probe IDs and any of the IDs used in the GO database.
Choose from more than 300 microarrays. If the array of choice is not available, use your own reference.
Choose a statistical distribution from: 1. hypergeometric, 2. binomial, 3. chi-square.
Choose a correction for multiple hypotheses from: 1. Bonferroni, 2. FDR, 3. Holm, 4. Sidak.
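
The correction options listed above differ in how strictly they adjust p-values. The sketch below implements the Bonferroni, Holm and Sidak adjustments (FDR is the remaining option in the list). The function names are ours, and Onto-Express's own implementation details are not described in these slides.

```python
from typing import List


def bonferroni(p: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p)
    return [min(1.0, x * m) for x in p]


def sidak(p: List[float]) -> List[float]:
    """Probability of at least one p-value this small among m independent tests."""
    m = len(p)
    return [1.0 - (1.0 - x) ** m for x in p]


def holm(p: List[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p-value
        # Multiplier shrinks as we step down; running max keeps results monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p[i]))
        adjusted[i] = running_max
    return adjusted
```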

22 Results – Flat view

23 Results – tree view
Choose a level to expand the GO tree and click the “Expand” button. Only the GO terms with at least one input gene are displayed in the tree.

24 Results – chromosome view
Chromosome information is supported for human, mouse and rat. The view displays the number of genes on each chromosome and their positions. Clicking “NCBI Genome view” links out to the NCBI Map Viewer.

25 Results – single gene view Selecting “show in gene view” in the tree view displays the annotations for the selected gene in the GO hierarchy in the single gene view.

26 References
Khatri, P., Draghici, S., Ostermeier, G.C. & Krawetz, S.A. (2002). Profiling gene expression using Onto-Express. Genomics 79(2): 266-270.
Draghici, S., Khatri, P., Martins, R.P., Ostermeier, G.C. & Krawetz, S.A. (2003). Global functional profiling of gene expression. Genomics 81(2): 98-104.
Khatri, P. & Draghici, S. (2005). Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21(18): 3587-3595.

27 Ontologizer
Institute for Medical Genetics, Charité Universitätsmedizin Berlin
Ontologizer Open Source Team located at
Robinson, P.N., Wollstein, A., Böhme, U. & Beattie, B. (2004). Ontologizing gene-expression microarray data: characterizing clusters with Gene Ontology. Bioinformatics 20(6): 979-981.
Grossmann, S., Bauer, S., Robinson, P.N. & Vingron, M. (2007). Improved detection of overrepresentation of Gene Ontology annotations with parent child analysis. Bioinformatics 23(22): 3024-3031.

28 Ontologizer – Setting up a Project
Inputs: the ontology, which defines the GO structure, and the annotations, which map genes to GO terms. There are several predefined entries for various settings, or you may specify the fields manually.

29 Ontologizer – Editing Sets of Identifiers
Annotated identifiers are highlighted on the fly, and hovering the mouse over an identifier reveals its direct annotations (one identifier in the example has no annotation). The induced graph of these terms can be displayed.

30 Ontologizer – Overview
Multiple projects may reside in the workspace. Of interest here are two lists of identifiers: study and population.* Choose the analysis method: parent-child takes account of the ontology structure, whereas term-for-term treats each term independently.
*In this example the study list is a list of BHF-UCL annotated cardiovascular-related proteins (see Slide 35) and the population list is a random list of human UniProtKB accessions.
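
To make the difference between the two methods concrete, the sketch below computes a term-for-term p-value and a parent-child p-value in the "union" variant described by Grossmann et al. (2007), where a term is tested only against genes annotated to at least one of its parents. It assumes the annotation sets already include genes annotated to descendant terms and that the study set is a subset of the population; the gene and term names are toy placeholders and the function names are hypothetical.

```python
from math import comb
from typing import Dict, Set


def hyper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n items drawn from N, of which K are annotated."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def term_for_term(term: str, study: Set[str], population: Set[str],
                  annotated: Dict[str, Set[str]]) -> float:
    """Test the term on its own against the whole population."""
    pop_term = annotated[term] & population
    return hyper_tail(len(pop_term & study), len(study), len(pop_term), len(population))


def parent_child_union(term: str, study: Set[str], population: Set[str],
                       annotated: Dict[str, Set[str]], parents: Dict[str, Set[str]]) -> float:
    """Test the term only among genes annotated to at least one of its parents."""
    pop_parents = set().union(*(annotated[p] for p in parents[term])) & population
    study_parents = pop_parents & study
    pop_term = annotated[term] & pop_parents
    return hyper_tail(len(pop_term & study_parents), len(study_parents),
                      len(pop_term), len(pop_parents))
```

For example, with a population of ten genes, a study set of three genes, a parent term annotated to four population genes and a child term annotated to exactly the three study genes, term-for-term gives p = 1/120, while parent-child gives p = 0.25, because the child's enrichment is largely explained by its parent's.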

31 Ontologizer – Results
A list of terms is displayed. The shading indicates significance: darker shading means more significant. Click on a term to display its position in the ontology, its definition, and the proteins annotated to it and to its parents.

32 Ontologizer – Graphical View of Results
The term highlighted in the table is also highlighted in red in the graph. Colors: yellow = Molecular Function, pink = Cellular Component, green = Biological Process.

33 Ontologizer – What Else?
Ontologizer can easily be invoked from the Web, and input files can be located remotely. Several multiple-testing correction procedures are supported. Results can be filtered and stored in tabular as well as graphical form. A command-line version is available.

34 Acknowledgments
The authors wish to thank the developers of the tools for preparing these presentations, as follows:
FatiGO: Fatima Al-Shahrour
Blast2GO: Stefan Götz
Ontologizer: Sebastian Bauer and Peter Robinson
Onto-Express: Sorin Draghici and Purvesh Khatri

35 List of human UniProtKB accessions used in FatiGO, Onto-Express and Ontologizer analyses:
O00273 O60543 O75955 O95477 P00519 P01127 P01137 P01375 P01584 P02647 P02649 P02652 P02655 P02656 P04114 P04180 P05231 P05976 P06727 P06741 P06858 P07203 P08590 P09493 P09958 P10253 P10636 P10916 P11597 P11802 P12643 P12829 P12830 P13501 P16519 P17947 P18510 P22301 P24385 P25098 P25103 P29120 P30279 P30281 P34947 P35226 P36897 P37173 P38936 P40337 P42684 P42771 P42772 P42773 P45379 P45844 P46527 P49918 P50150 P55273 P55290 P61812 P84022 Q00534 Q00872 Q01449 Q13485 Q14114 Q14896 Q15796 Q16665 Q5JRA6 Q6PGN9 Q6Q788 Q86Y82 Q8N726 Q8TBM5 Q92673 Q96AB3 Q96N67 Q9BQE4 Q9H172 Q9H1R3 Q9H221 Q9H222 Q9HC96 Q9UKX2 Q9UNQ0 Q9UPY8 Q9Y5C1 Q9Y623

36 List of bovine UniProtKB accessions used in Blast2GO analysis:
A0JNJ5 A1A3Z1 A4FUX1 A4FUZ9 A4IFM7 A5PJI9 A5PKM2 A6QLS3 A6QP89 A7MBB9 O46680 O77482 O97919 P00435 P05363 P09428 P11151 P13789 P15497 P18341 P19034 P19035 P21146 P21214 P26892 P43249 P43480 P81644 P85100 Q03247 Q06599 Q08DE0 Q0P5D3 Q0VC16 Q0VC37 Q0VD56 Q1HE26 Q1RMM7 Q1W668 Q24JY8 Q28193 Q29RJ9 Q29RV0 Q2KI22 Q2KI76 Q2KIW4 Q2KJB3 Q2KJD8 Q2TBI0 Q32KX0 Q32KX7 Q32KY4 Q32PJ1 Q32PJ2 Q3B7N0 Q3MHH5 Q3SYR3 Q3SZE5 Q4GZT4 Q4TTZ1 Q4ZJV8 Q4ZJV9 Q58D48 Q5E9I5 Q5KR49 Q6R8F2 Q9BE40 Q9BE41 Q9GLR0 Q9GLR1 Q9MYM4 Q9XTA5
