GO enrichment and GOrilla

Roy Navon, Agilent Labs, Tel-Aviv

Gene Ontology (GO)
The Gene Ontology (GO) project is a major bioinformatics initiative that aims to standardize the representation of gene and gene product attributes across species and databases. GO terms are organized hierarchically as a Directed Acyclic Graph (DAG), in which a term may have more than one parent. Most GO terms are annotated to several genes, and each gene may be annotated to several GO terms.
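A minimal sketch of how gene annotations relate to the DAG, assuming (as GO enrichment tools commonly do) that a gene annotated to a term is also counted under all of that term's ancestors. The term IDs, gene names and the `PARENTS`/`DIRECT` tables are hypothetical toy data, not real GO content, and `ancestors`/`genes_per_term` are illustrative helpers.

```python
from collections import defaultdict

# Hypothetical toy DAG: each GO term maps to its parent terms.
# GO:C has two parents, which a tree could not express.
PARENTS = {
    "GO:C": ["GO:A", "GO:B"],
    "GO:B": ["GO:A"],
    "GO:A": [],
}

# Hypothetical direct annotations: gene -> GO terms it is annotated to.
DIRECT = {
    "gene1": ["GO:C"],
    "gene2": ["GO:B"],
    "gene3": ["GO:A", "GO:B"],
}

def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def genes_per_term(direct, parents):
    """Propagate each gene's annotations up the DAG and group genes by term."""
    table = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for t in ancestors(term, parents):
                table[t].add(gene)
    return dict(table)
```

With this toy data, GO:A and GO:B each end up with all three genes, while GO:C keeps only gene1: one gene contributes to several terms, and one term collects several genes.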

Gene Ontology (GO) - 2
The ontology covers three domains:
- cellular component: the parts of a cell or its extracellular environment, such as the rough endoplasmic reticulum or the nucleus.
- molecular function: the elemental activities of a gene product at the molecular level, such as binding or catalysis.
- biological process: operations or sets of molecular events with a defined beginning and end, such as the cell cycle or the immune response.

Motivation
High-throughput experiments (such as microarrays) often produce a list of genes as their result. Instead of analyzing these genes one by one, we can take a more global approach and use the GO database to find genes in our list that share a common annotation.

GO Enrichment Tools
Several tools that perform GO enrichment analysis are currently available. Most of them take as input a target set of genes and a background set, and look for GO terms that are enriched in the target set compared to the background set. Typically, the hypergeometric distribution is used to test this enrichment.
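For a single GO term, a target/background test reduces to four counts. The sketch below shows one way to derive them, assuming the target set is a subset of the background set and that only background genes count as annotated; `enrichment_counts` is a hypothetical helper, not part of any tool's API. The counts play the roles of N, B, n and b in the sock scenario.

```python
def enrichment_counts(target, background, term_genes):
    """Return (N, B, n, b) for one GO term.

    N: number of background genes
    B: background genes annotated to the term
    n: number of target genes
    b: target genes annotated to the term
    Assumes the target set is a subset of the background set.
    """
    target, background = set(target), set(background)
    if not target <= background:
        raise ValueError("target must be a subset of background")
    # Annotations outside the background are ignored.
    annotated = set(term_genes) & background
    return len(background), len(annotated), len(target), len(annotated & target)
```

For example, with a background of 10 genes, a target of 3 genes and a term annotated to 2 of the target genes plus 1 other background gene, the counts are N = 10, B = 3, n = 3, b = 2.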

The hypergeometric distribution
Consider the following scenario: a drawer contains N socks. Exactly B of the socks are black and the remaining N − B are white. We pick n socks at random and b of them are black. Do the n socks we picked contain significantly more black socks than expected? In other words, are black socks enriched among the n socks we chose?

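The hypergeometric distribution (2)
If the n socks are drawn uniformly at random without replacement, the probability of finding exactly b black socks among them is given by the hypergeometric distribution:
$$\mathrm{HG}(b; N, B, n) = \frac{\binom{B}{b}\binom{N-B}{n-b}}{\binom{N}{n}}$$
We are usually interested in the tail probability of finding b or more black socks:
$$\mathrm{HGT}(b; N, B, n) = \sum_{i=b}^{\min(n,B)} \mathrm{HG}(i; N, B, n)$$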
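The two formulas translate directly into code. This sketch uses exact rational arithmetic (`fractions.Fraction`) so that small examples can be checked by hand. The function names `hg` and `hgt` simply mirror the notation above and are illustrative.

```python
from fractions import Fraction
from math import comb

def hg(b, N, B, n):
    """P(exactly b black socks among n drawn from N socks, B of them black)."""
    return Fraction(comb(B, b) * comb(N - B, n - b), comb(N, n))

def hgt(b, N, B, n):
    """Tail probability P(at least b black socks among the n drawn)."""
    return sum((hg(i, N, B, n) for i in range(b, min(n, B) + 1)), Fraction(0))

# 20 socks, 5 black; we draw 4 and 3 of them are black.
p = hgt(3, 20, 5, 4)   # (150 + 5) / 4845 = 31/969, about 0.032
```

For large N, exact fractions become slow. `scipy.stats.hypergeom.sf(b - 1, N, B, n)` returns the same tail in floating point (SciPy's argument order is total, successes, draws).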

Flexible Threshold
The hypergeometric method requires the user to define the target set and the background set. In most experiments (such as differential expression), the user ranks all genes (for example, by fold change) and must then choose an arbitrary threshold (such as fold change > x, p-value < y, or the top 50 genes) to define the target set. A better solution is to use the entire ranked list and find GO terms enriched at the top of the list, without defining what "top" means.

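mHG score
[Figure: a ranked binary vector v of length |v| = N containing B 1s, cut by a threshold at position n; b(n) is the number of 1s above the threshold.]
For each threshold n, b(n) is the number of 1s among the first n entries of v. The minimum hypergeometric (mHG) score is the smallest tail probability over all thresholds:
$$\mathrm{mHG}(v) = \min_{1 \le n < N} \mathrm{HGT}(b(n); N, B, n)$$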
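A sketch of the mHG score, assuming the ranked gene list has first been turned into a binary vector (1 if the gene at that rank is annotated to the GO term). The helpers `to_binary` and `mhg_score` are hypothetical names, and the code follows the definition literally rather than any optimized implementation.

```python
from fractions import Fraction
from math import comb

def hgt(b, N, B, n):
    """Hypergeometric tail HGT(b; N, B, n) as an exact fraction."""
    return sum(
        (Fraction(comb(B, i) * comb(N - B, n - i), comb(N, n))
         for i in range(b, min(n, B) + 1)),
        Fraction(0),
    )

def to_binary(ranked_genes, term_genes):
    """1 if the gene at this rank is annotated to the GO term, else 0."""
    term_genes = set(term_genes)
    return [1 if g in term_genes else 0 for g in ranked_genes]

def mhg_score(v):
    """mHG(v) = min over thresholds 1 <= n < N of HGT(b(n); N, B, n)."""
    N, B = len(v), sum(v)
    best, b = Fraction(1), 0          # HGT never exceeds 1
    for n in range(1, N):
        b += v[n - 1]                 # b(n): number of 1s among the first n entries
        best = min(best, hgt(b, N, B, n))
    return best
```

For v = [1, 1, 0, 0] the best threshold is n = 2, where both 1s are above the cut, giving mHG = 1/6. No threshold had to be chosen in advance.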

mHG p-values
Consider a random vector V drawn uniformly from the vectors in $\{0,1\}^N$ with exactly B 1s. What is the distribution of mHG(V)? That is, what is the probability $P(\mathrm{mHG}(V) \le s)$?
- Union bound (Bonferroni): $\text{p-val}(s) \le N \cdot s$.
- A more subtle bound (Eden et al.): $\text{p-val}(s) \le B \cdot s$.
- Dynamic programming in $O(N^2)$ yields the exact distribution (Eden et al.).
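A sketch of the exact computation under a path-counting view: every vector with B 1s is a path through the prefix positions, and we count the paths whose HGT never drops to s or below at any threshold 1 ≤ n < N. The p-value is one minus the fraction of surviving paths. This is written for clarity rather than speed and is not GOrilla's own implementation. `mhg_pvalue` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def hgt(b, N, B, n):
    """Hypergeometric tail HGT(b; N, B, n) as an exact fraction."""
    return sum(
        (Fraction(comb(B, i) * comb(N - B, n - i), comb(N, n))
         for i in range(b, min(n, B) + 1)),
        Fraction(0),
    )

def mhg_pvalue(score, N, B):
    """Exact P(mHG(V) <= score) for V uniform over {0,1}^N vectors with B ones."""
    # ways[k]: number of length-n prefixes with k ones whose HGT stayed above score
    ways = [1] + [0] * B
    for n in range(1, N + 1):
        new = [0] * (B + 1)
        for k in range(0, min(n, B) + 1):
            zeros = n - k
            if zeros > N - B:
                continue                  # cannot be completed with exactly B ones
            total = 0
            if k >= 1:
                total += ways[k - 1]      # extend a prefix by a 1
            if zeros >= 1:
                total += ways[k]          # extend a prefix by a 0
            new[k] = total
        if n < N:                         # thresholds 1 <= n < N
            for k in range(B + 1):
                if new[k] and hgt(k, N, B, n) <= score:
                    new[k] = 0            # this path reaches mHG <= score
        ways = new
    return 1 - Fraction(ways[B], comb(N, B))
```

For N = 4, B = 2 and s = 1/6, only [1, 1, 0, 0] reaches that score, so the exact p-value is 1/6. This is below both the B·s = 1/3 and N·s = 2/3 bounds.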

GOrilla
GOrilla is a web-based tool we developed for GO enrichment analysis. Its main advantages over other GO enrichment tools are:
- A flexible threshold and an exact p-value (no simulations).
- Graphical output: the GO DAG, color-coded by enrichment p-values.
- Speed and ease of use: an analysis takes only a few seconds, while other tools take minutes.

GOrilla – GO enrichment analysis tool
[Figure: a ranked gene list (gene 1 to gene 15) shown alongside the -log HG p-value at each threshold.]

Summary of GOrilla's advantages
- While most other tools require the user to explicitly define a target list and a background list, GOrilla searches for GO terms enriched at the top of a ranked list, without requiring the user to set a threshold that defines what "top" is.
- An exact p-value for the enrichment of each GO term is reported as part of the output.
- GOrilla provides an easy-to-use, intuitive web-based interface.
- The enriched GO terms are presented graphically in the context of the complete GO DAG, in addition to tabular results.
- GOrilla is very fast, taking only a few seconds per analysis.
- It accepts RefSeq accessions, gene symbols and other identifiers.

Comparison to other GO enrichment tools (as of late 2008)

GOrilla usage statistics
http://cbl-gorilla.cs.technion.ac.il/

Thanks to: Israel Steinfeld, Eran Eden, Doron Lipson, Zohar Yakhini

Demo and Hands-On

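Rank by t-test:
- t-test p-value: =TTEST(classA,classB,2,2) (two tails, type 2 = two-sample equal variance)
Up/down regulated:
- Calculate the two class averages: =AVERAGE(classA) and =AVERAGE(classB)
- Calculate the fold change: average1 - average2 (a log fold change when the data are log-transformed)
- -log(p-value): =-LOG(ttest p-value)
- Signed score: =SIGN(fold change)*(-log p-value)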
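The spreadsheet steps above can be sketched in Python. This assumes SciPy is available, that expression values are already log-scale, and that the input is a hypothetical dict mapping each gene to its row of values. `rank_genes` and its parameters are illustrative names. `ttest_ind(..., equal_var=True)` is the two-sided, equal-variance test that matches `TTEST(...,2,2)`, and Excel's `LOG` is base 10.

```python
import math
from statistics import mean

from scipy.stats import ttest_ind

def rank_genes(expr, class_a, class_b):
    """Return (gene, score) pairs sorted by SIGN(fold change) * -log10(p), highest first.

    expr: hypothetical dict gene -> list of log-scale expression values
    class_a, class_b: column indices of the two sample groups
    """
    scored = []
    for gene, values in expr.items():
        a = [values[i] for i in class_a]
        b = [values[i] for i in class_b]
        p = ttest_ind(a, b, equal_var=True).pvalue   # like =TTEST(classA,classB,2,2)
        fold = mean(a) - mean(b)                      # average1 - average2
        # SIGN(0) is 0 in Excel, so a zero fold change gives a zero score.
        score = math.copysign(1, fold) * -math.log10(p) if fold else 0.0
        scored.append((gene, score))
    return sorted(scored, key=lambda gs: gs[1], reverse=True)
```

Genes up-regulated in class A end up at the top of the list and down-regulated genes at the bottom. The ordered gene names are the kind of ranked list GOrilla takes as input.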

1. Van't Veer:
   - Rank all genes according to the t-test.
   - Run GOrilla (and go over all the parameters).
   - Rank the genes again according to up-regulated genes.
   - Run GOrilla again.
   - Random permutation HG.
2. Espen: correlation (positive) with miR-18 (cell cycle).
3. Kittelson: ischemic vs. non-ischemic.
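One possible reading of the "random permutation" step is to compare the observed mHG score with the scores of randomly shuffled versions of the same ranked vector. This is our interpretation, not necessarily what the demo did. The sketch below estimates a p-value that way. It is a simulation, so it only approximates the exact p-value GOrilla reports. `permutation_pvalue` and its `n_perm`/`seed` parameters are hypothetical.

```python
import random
from fractions import Fraction
from math import comb

def hgt(b, N, B, n):
    """Hypergeometric tail HGT(b; N, B, n) as an exact fraction."""
    return sum(
        (Fraction(comb(B, i) * comb(N - B, n - i), comb(N, n))
         for i in range(b, min(n, B) + 1)),
        Fraction(0),
    )

def mhg_score(v):
    """mHG(v) = min over thresholds 1 <= n < N of HGT(b(n); N, B, n)."""
    N, B = len(v), sum(v)
    best, b = Fraction(1), 0
    for n in range(1, N):
        b += v[n - 1]
        best = min(best, hgt(b, N, B, n))
    return best

def permutation_pvalue(v, n_perm=1000, seed=0):
    """Estimate P(mHG(V) <= mHG(v)) by shuffling the ranked binary vector."""
    rng = random.Random(seed)
    observed = mhg_score(v)
    shuffled = list(v)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mhg_score(shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

For [1, 1, 0, 0] the estimate should be close to the exact value 1/6. Each permutation recomputes every threshold, so this approach becomes slow for genome-sized lists.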

GOrilla webpage: http://cbl-gorilla.cs.technion.ac.il/
Eden, Navon et al., BMC Bioinformatics