The European Nutrigenomics Organisation Gene Ontology (GO) analysis Chris Evelo and Lars Eijssen, Maastricht University.

1 the European Nutrigenomics Organisation Gene Ontology (GO) analysis Chris Evelo and Lars Eijssen, Maastricht University

2 Gene Ontology (GO) levels (I)
GO consortium: The Gene Ontology (GO) project gives a consistent description of gene products based on information from different databases.
AmiGO browser

3 Gene Product annotation with GO terms
Human DNA topoisomerase IIA (P11388)
• Cellular Component
  – nucleus
  – chromosome
  – DNA topoisomerase complex
• Biological Process
  – DNA replication
  – DNA topological change
  – DNA ligation
  – DNA repair
• Molecular Function
  – chromatin binding
  – DNA topoisomerase activity
  – DNA-dependent ATPase activity

4

5 Gene Ontology (GO) levels (II)

6 Gene Ontology analysis tools
Freely available from the internet:
• Onto-Express
• GOToolbox
• MAPPFinder (GenMAPP)
• GOstat
• GO-Elite
• GeneMerge
• GOSurfer
• DAVID/EASE
• FatiGO
Not free:
• MetaCore (NuGO has licences)

7 GO analysis versus pathway analysis
• Biological pathways contain more information; GO classes are just sets of genes that share an annotation
• Pathways are generally more curated
• GO classes are, however, organised in a tree; biological pathways are (in practice) not
• GO classes also cover the space of biological processes more uniformly; pathway analysis depends heavily on which pathways have been contributed/added
• GO also covers cellular localisation and biochemical function

8 NuGO GenePattern modules
• NuGO Quality Control Analysis
  – Quality control; Bioconductor packages affy, affyPLM, simpleaffy
• NuGO Expression File Creator
  – Normalisation (RMA, GCRMA, MAS 5.0, dChip); Bioconductor packages rma, gcrma
• Limma Analysis
  – Differential gene expression; Bioconductor package limma
• TopGO Analysis
  – Gene Ontology based functional analysis; Bioconductor package topGO
• Get Result for GO
  – Functional analysis; Bioconductor R script to filter data for genes associated with one particular GO identifier
Slide from: Caroline Reiff, RRI, Aberdeen
If you use these modules for your publication, please cite: De Groot, P.J., Reiff, C., Mayer, C., Mueller, M. NuGO contributions to GenePattern. Genes Nutrition 2008; 3:

9 TopGO analysis
• Runs topGO (Bioconductor)
  – Gene enrichment analysis tool that uses the relationships between GO terms (BP, MF, CC) when calculating statistical significance (Alexa et al., 2006)
• 2 test statistics
  – Fisher's exact test (needs a threshold that defines the significant genes, e.g. FDR < 0.05)
  – Kolmogorov-Smirnov (KS) test (looks at the distribution of P values)
• 3 GO scoring algorithms (classic, elim, weight)
  – classic scores each node independently
  – elim scores nodes bottom-up; a parent node is scored after eliminating the genes present in significant child nodes
  – weight scores nodes bottom-up and assigns weights to genes based on the P values obtained for each node
Slide from: Caroline Reiff, RRI, Aberdeen
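To make the Fisher's exact test option concrete, here is a minimal sketch of a one-sided enrichment test for a single GO term, written in plain Python. It is not topGO's code; the function name, parameter names and the numbers in the usage line are illustrative assumptions.

```python
from math import comb

def fisher_enrichment_p(total_genes, total_significant, annotated, significant_annotated):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of finding at least `significant_annotated` significant genes
    among the `annotated` genes of a GO term, if the `total_significant`
    genes were spread at random over all `total_genes`.
    """
    max_overlap = min(annotated, total_significant)
    tail = sum(
        comb(annotated, k) * comb(total_genes - annotated, total_significant - k)
        for k in range(significant_annotated, max_overlap + 1)
    )
    return tail / comb(total_genes, total_significant)

# Hypothetical numbers: 1000 genes measured, 100 pass the FDR threshold,
# the term annotates 20 genes and 7 of those are significant.
p_value = fisher_enrichment_p(1000, 100, 20, 7)
```

The test only sees which genes pass the threshold, which is why a cut-off has to be chosen first; the KS test instead works on the full ranking of P values.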

10 Scoring the tree (I)
• Classic
• Tree figure values: 2/20; 2/20 (20/100); 5/10 (7/30); 7/10; 3/25 (11/50); 1/15
  – Plain value: this node
  – Value in parentheses: this node plus subtree
• The values for the node plus its subtree are used to score, because the genes in the subtree in fact belong to that term as well
• Suppose all the bold values are significant: the classic algorithm would return all these processes!
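The "node plus subtree" counts can be reproduced with a short recursive sum. The tree layout below is an assumption: it is one arrangement that is consistent with the figure values (5+2 = 7 and 10+20 = 30; 3+7+1 = 11 and 25+10+15 = 50; 2+7+11 = 20 and 20+30+50 = 100), and the node names are hypothetical. Adding counts also assumes every gene is annotated directly to only one node; with real GO data the gene sets would be merged as sets instead.

```python
# Hypothetical layout: node -> (own significant, own annotated, children)
TREE = {
    "root": (2, 20, ["A", "B"]),
    "A": (5, 10, ["A1"]),
    "A1": (2, 20, []),
    "B": (3, 25, ["B1", "B2"]),
    "B1": (7, 10, []),
    "B2": (1, 15, []),
}

def subtree_counts(tree, node):
    """Return (significant, annotated) for a node plus its whole subtree."""
    significant, annotated, children = tree[node]
    for child in children:
        child_significant, child_annotated = subtree_counts(tree, child)
        significant += child_significant
        annotated += child_annotated
    return significant, annotated

# The classic algorithm tests every node on these cumulative counts.
classic_counts = {node: subtree_counts(TREE, node) for node in TREE}
```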

11 Scoring the tree (II)
• However, it would be better to return only the best term in every branch
  – Best could mean: the most specific significant one
  – This can be achieved by removing the genes that are present in significant child nodes from the parent's score
• Elim does this
• Tree figure values: 2/20; 2/20 (20/100); 5/10 (7/30); 7/10; 3/25 (11/50); 1/15; (4/40)
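A minimal sketch of the elimination idea follows. It is not topGO's implementation: the tree layout, node names, gene identifiers and the 0.01 cut-off are assumptions chosen so that the counts match the figure values, and the ontology is treated as a simple tree rather than a full GO graph.

```python
from math import comb

def upper_tail_p(total, total_sig, annotated, hits):
    """One-sided hypergeometric P value: P(overlap >= hits)."""
    top = min(annotated, total_sig)
    tail = sum(comb(annotated, k) * comb(total - annotated, total_sig - k)
               for k in range(hits, top + 1))
    return tail / comb(total, total_sig)

def elim_scores(children, direct_genes, sig_genes, root, alpha=0.01):
    """Return {node: (significant, annotated, p)} after elim-style removal."""
    total = len(set().union(*direct_genes.values()))
    results = {}

    def visit(node):
        genes, removed = set(direct_genes[node]), set()
        for child in children.get(node, []):
            child_genes, child_removed = visit(child)
            genes |= child_genes
            removed |= child_removed
        genes -= removed  # hide genes claimed by significant descendants
        hits = len(genes & sig_genes)
        p = upper_tail_p(total, len(sig_genes), len(genes), hits)
        results[node] = (hits, len(genes), p)
        if p < alpha:
            removed |= genes  # node is significant: hide its genes further up
        return genes, removed

    visit(root)
    return results

# Hypothetical data: (own significant, own annotated) per node.
OWN = {"root": (2, 20), "A": (5, 10), "A1": (2, 20),
       "B": (3, 25), "B1": (7, 10), "B2": (1, 15)}
CHILDREN = {"root": ["A", "B"], "A": ["A1"], "B": ["B1", "B2"]}
direct_genes, sig_genes = {}, set()
for node, (n_sig, n_ann) in OWN.items():
    genes = [f"{node}_gene{i}" for i in range(n_ann)]
    direct_genes[node] = set(genes)
    sig_genes |= set(genes[:n_sig])

elim = elim_scores(CHILDREN, direct_genes, sig_genes, "root")
```

With this made-up data only the 7/10 node falls below the cut-off, so its parent is scored on 4/40 instead of 11/50, which is the extra value shown in the figure.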

12 Scoring the tree (III)
• Another option to score branches would be to compute the significance of each node just as the classic algorithm does
• Afterwards, for every branch, only the most significant node is reported back
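One possible reading of this option, sketched in Python: take classic P values as given and, for every root-to-leaf path, keep only the node with the smallest P value. Treating a "branch" as a root-to-leaf path is an assumption, and the tree, node names and P values below are made up for illustration.

```python
def best_per_branch(children, p_values, root):
    """For every root-to-leaf path, keep the node with the smallest P value."""
    selected = set()

    def walk(node, best):
        if p_values[node] < p_values[best]:
            best = node
        kids = children.get(node, [])
        if not kids:
            selected.add(best)  # reached a leaf: report the best node on this path
        for kid in kids:
            walk(kid, best)

    walk(root, root)
    return selected

# Hypothetical tree and classic P values.
CHILDREN = {"root": ["A", "B"], "A": ["A1"], "B": ["B1", "B2"]}
P_VALUES = {"root": 0.5, "A": 0.04, "A1": 0.2, "B": 0.03, "B1": 0.001, "B2": 0.6}

reported = best_per_branch(CHILDREN, P_VALUES, "root")
```

In this example node B is still reported through the path that ends in B2, even though its child B1 scores better. That illustrates that a per-path selection does not by itself prevent a parent and its child from both being returned.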

13

14 GO-Elite
• Compatible with GenMAPP MAPPFinder
• Smart algorithm
• Done in Python
  – Fast
  – Runs on Windows and Linux (incl. NBX)
• Still under development
• Collaborative development

15 GO-Elite
• Searches relationships in the hierarchy
• Identifies the most significant GO term: the one with a higher score than all its sibling terms
• For sibling terms, if one sibling branch scores higher than the parent and another branch does not, the highest scoring term from the latter branch is also selected for the GO-Elite output, but the parent term is not

16 TopGO Analysis (GenePattern) implements the Bioconductor package topGO
Input: a Limma results table (renamed to contain characters only), or a table obtained from another analysis that contains the 3 required columns.
topGO Analysis tests performed in GenePattern:

GenePattern       topGO statistic        topGO algorithm
classic Fisher    Fisher's exact test    classic
classic KS        KS test                classic
elim Fisher       Fisher's exact test    elim
elim KS           KS test                elim
Weight Fisher     Fisher's exact test    weight

Slide from: Caroline Reiff, RRI, Aberdeen

17
• Load limma table
• Enter threshold (P value or FDR)
• Enter cdf name within quotes and with the .db extension
Slide from: Caroline Reiff, RRI, Aberdeen

18 TopGO analysis output
Example results table for the elim Fisher test (top 15 GO biological processes).
Columns: GO.ID, Term, Annotated, Significant, Expected, elim.
The GO identifiers and most numeric values are missing from this transcript; the terms are listed in rank order, with the remaining exponent of the value where one is present.
 1. immune response (E-12)
 2. electron transport (E-07)
 3. fatty acid metabolic process (E-06)
 4. DNA replication (E-06)
 5. antigen processing and presentation of e (E-06)
 6. mitosis (E-05)
 7. myelination
 8. cell division
 9. immunoglobulin mediated immune response
10. regulation of fatty acid metabolic proce
11. RNA splicing
12. regulation of progression through cell c
13. mast cell activation
14. cholesterol biosynthetic process
15. rRNA processing
Slide from: Caroline Reiff, RRI, Aberdeen
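The Expected column in enrichment tables of this kind is commonly the number of significant genes a term would receive if significant genes were spread evenly over all annotated genes. A minimal sketch, with hypothetical names and numbers (the original table values are not available here):

```python
def expected_significant(annotated, total_significant, total_genes):
    """Expected number of significant genes in a term under random assignment."""
    return annotated * total_significant / total_genes

# Hypothetical example: a term annotating 50 of 10000 genes, 200 of which are significant.
expected = expected_significant(annotated=50, total_significant=200, total_genes=10000)
```

A term is a candidate for enrichment when its Significant count is clearly above this value; the test statistic then decides whether the excess is larger than chance would explain.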

19 A GO graph for each of the 5 tests (squares = the 15 most significant GO IDs)
Slide from: Caroline Reiff, RRI, Aberdeen

20 Get Results For GO
• Load Limma table
• Enter chip name
• Click run
Slide from: Caroline Reiff, RRI, Aberdeen

21 Example: results FA metabolism
Highly significant FDR plus strong down-regulation
Slide from: Caroline Reiff, RRI, Aberdeen

22 GenePattern also has a GSEA module
• Load GCT, CLS and CHIP files
• Click run
• Wait for the result
Slide from: Caroline Reiff, RRI, Aberdeen

23 Example results of GSEA
Enrichment in phenotype: C (6 samples)
• 64 / 162 gene sets are upregulated in phenotype C
• 1 gene sets are significant at FDR < 25%
• 1 gene sets are significantly enriched at nominal pvalue < 1%
• 5 gene sets are significantly enriched at nominal pvalue < 5%
Enrichment in phenotype: ILko (6 samples)
• 98 / 162 gene sets are upregulated in phenotype ILko
• 0 gene sets are significantly enriched at FDR < 25%
• 3 gene sets are significantly enriched at nominal pvalue < 1%
• 13 gene sets are significantly enriched at nominal pvalue < 5%
For each phenotype the report also offers: a snapshot of enrichment results, detailed enrichment results in html format, detailed enrichment results in excel format (tab delimited text), and a guide to interpret results.
Slide from: Caroline Reiff, RRI, Aberdeen

24 What to use for gene sets
You can use whatever you like… (meaning one GSEA is not the same as another, even if it uses the same statistics)
• Gene sets from WikiPathways pathways (PathVisio now also has GSEA)
• Metabolite sets (you could do MSEA…)
• Could even use GO classes
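Whatever the source of the gene sets, GSEA tools commonly read them from a tab-delimited GMT file: one set per line, with the set name, a description, and then the member genes. The sketch below formats a dictionary of gene sets in that layout. The set names and gene lists are illustrative only and were not taken from GO or WikiPathways.

```python
def to_gmt_lines(gene_sets, description="na"):
    """Format {set name: iterable of gene identifiers} as GMT lines."""
    lines = []
    for name, genes in gene_sets.items():
        fields = [name, description] + sorted(set(genes))
        lines.append("\t".join(fields))
    return lines

# Hypothetical gene sets; the membership is for illustration only.
example_sets = {
    "GO_fatty_acid_metabolic_process": ["Acox1", "Cpt1a", "Acadm"],
    "WP_hypothetical_pathway": ["Il10", "Stat3"],
}
gmt_lines = to_gmt_lines(example_sets)
```

Writing the lines to a file, for example with "\n".join(gmt_lines), gives a gene set collection that can be swapped for another one without changing the rest of the analysis. This is why two GSEA runs with the same statistics can still answer different questions.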

25 Acknowledgements
Caroline Reiff, Philip De Groot, Sarah Wieland, Kenneth Strouts, Claus Mayer, Tony Travis, NuGO, Nathan Salmonis, Stan Gaj, Lars Eijssen

