Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.

Slides:



Advertisements
Similar presentations
Microarray statistical validation and functional annotation
Advertisements

Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
1 Statistics Achim Tresch UoC / MPIPZ Cologne treschgroup.de/OmicsModule1415.html
Gene Set Enrichment Analysis (GSEA)
CAVEAT 1 MICROARRAY EXPERIMENTS ARE EXPENSIVE AND COMPLICATED. MICROARRAY EXPERIMENTS ARE THE STARTING POINT FOR RESEARCH. MICROARRAY EXPERIMENTS CANNOT.
Introduction to Microarry Data Analysis - II BMI 730
Data mining with the Gene Ontology Josep Lluís Mosquera April 2005 Grup de Recerca en Estadística i Bioinformàtica GOing into Biological Meaning.
Introduction to Functional Analysis J.L. Mosquera and Alex Sanchez.
Using Gene Ontology Models and Tests Mark Reimers, NCI.
Differentially expressed genes
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
Gene Set Analysis 09/24/07. From individual gene to gene sets Finding a list of differentially expressed genes is only the starting point. Suppose we.
. Differentially Expressed Genes, Class Discovery & Classification.
Biological Interpretation of Microarray Data Helen Lockstone DTC Bioinformatics Course 9 th February 2010.
Significance Tests P-values and Q-values. Outline Statistical significance in multiple testing Statistical significance in multiple testing Empirical.
Final Project Week 3 - 5/7/09 GSEA and Cluster Computing in Protein Research Leon Kay, Yan Tran, Chris Thomas Yan Gary Chris Leon.
Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Gene Set Enrichment Analysis Petri Törönen petri(DOT)toronen(AT)helsinki.fi.
Different Expression Multiple Hypothesis Testing STAT115 Spring 2012.
1Module 2: Analyzing Gene Lists Canadian Bioinformatics Workshops
Multiple testing in high- throughput biology Petter Mostad.
GO::TermFinder Gavin Sherlock Department of Genetics Stanford University
1 Identifying differentially expressed sets of genes in microarray experiments Lecture 23, Statistics 246, April 15, 2004.
Frédéric Schütz Statistics and bioinformatics applied to –omics technologies Part II: Integrating biological knowledge Center.
Gene Set Enrichment Analysis (GSEA)
Essential Statistics in Biology: Getting the Numbers Right
Jesse Gillis 1 and Paul Pavlidis 2 1. Department of Psychiatry and Centre for High-Throughput Biology University of British Columbia, Vancouver, BC Canada.
Suppose we have analyzed total of N genes, n of which turned out to be differentially expressed/co-expressed (experimentally identified - call them significant)
Basic features for portal users. Agenda - Basic features Overview –features and navigation Browsing data –Files and Samples Gene Summary pages Performing.
Differential Gene Expression Dennis Kostka, Christine Steinhoff Slides adapted from Rainer Spang.
GSEA Overview -- Workflow GSEA is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant.
Course on Functional Analysis
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
BIOS6660 shRNAseq Gene Set Enrichment Analysis Tzu L Phang PhD Robert Stearman PhD April 16, 2014.
‘Omics’ - Analysis of high dimensional Data
Integrating Biology and Statistics: Gene Set Methods BIOS Winter/Spring 2010.
Summarizing Differential Expression Using Mann-Whitney U-tests.
Statistical Testing with Genes Saurabh Sinha CS 466.
Extracting binary signals from microarray time-course data Debashis Sahoo 1, David L. Dill 2, Rob Tibshirani 3 and Sylvia K. Plevritis 4 1 Department of.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
Getting the story – biological model based on microarray data Once the differentially expressed genes are identified (sometimes hundreds of them), we need.
The Broad Institute of MIT and Harvard Differential Analysis.
GO enrichment and GOrilla
1 Paper Outline Specific Aim Background & Significance Research Description Potential Pitfalls and Alternate Approaches Class Paper: 5-7 pages (with figures)
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Gene Set Analysis using R and Bioconductor Daniel Gusenleitner
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Estimating the False Discovery Rate in Genome-wide Studies BMI/CS 576 Colin Dewey Fall 2008.
1 A Discussion of False Discovery Rate and the Identification of Differentially Expressed Gene Categories in Microarray Studies Ames, Iowa August 8, 2007.
Gene Set Enrichment Analysis. GSEA: Key Features Ranks all genes on array based on their differential expression Identifies gene sets whose member genes.
Canadian Bioinformatics Workshops
David Amar, Tom Hait, and Ron Shamir
Module 2: Analyzing gene lists: over-representation analysis
::: Schedule. Biological (Functional) Databases
Differential Gene Expression
Statistical Testing with Genes
Canadian Bioinformatics Workshops
Significance analysis of microarrays (SAM)
Genesets and Enrichment
Functional enrichment of differentially expressed genes.
Varying Intolerance of Gene Pathways to Mutational Classes Explain Genetic Convergence across Neuropsychiatric Disorders  Shahar Shohat, Eyal Ben-David,
Statistical Testing with Genes
Statistical chart of significantly differentially expressed genes
Extended analysis of differential expression datasets.
Distinct molecular and clinical correlates of H3F3A mutation subgroups
Global analysis of the chemical–genetic interaction map.
Transcriptome profiling of PD-L1 antibody–treated macrophages showed inflammatory phenotype, increased survival and proliferation, and decreased apoptosis.
Presentation transcript:

Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

 Gene expression profiling  Which molecular processes/functions are involved in a certain phenotype (e.g., disease, stress response, etc.)  The Gene Ontology (GO) Project  Provides shared vocabulary/annotation  GO terms are linked in a complex structure  Enrichment analysis:  Find the “most” differentially expressed genes  Identify functional annotations that are over-represented  Modified Fisher's exact test A quick review

A quick review: Modified Fisher's exact test Differentially expressed (DE) genes/balls 4 out of 810 out of 50 2 out of 8 4 out of 81 out of 82 out of 85 out of 83 out of 8 Null model: the 8 genes/balls are selected randomly … So, if you have 50 balls, 10 of them are blue, and you pick 8 balls randomly, what is the probability that k of them are blue? Do I have a surprisingly high number of blue genes? Genes/balls

A quick review: Modified Fisher's exact test m=50, m t =10, n=8 Hypergeometric distribution So … do I have a surprisingly high number of blue genes? Can such high numbers (4 or above) occur by change? What is the probability of getting at least 4 blue genes in the null model? P(σ t >=4) Probability k

Enrichment Analysis ClassA ClassB Genes ranked by expression correlation to Class A Cutoff Biological function?

Enrichment Analysis ClassA ClassB Genes ranked by expression correlation to Class A Cutoff Biological function? 2 / 10 Function 1 (e.g., metabolism) 5 / 11 Function 2 (e.g., signaling) 3 / 10 Function 3 (e.g., regulation)

 After correcting for multiple hypotheses testing, no individual gene may meet the threshold due to noise.  Alternatively, one may be left with a long list of significant genes without any unifying biological theme.  The cutoff value is often arbitrary!  We are really examining only a handful of genes, totally ignoring much of the data Problems with cutoff-based analysis

 MIT, Broad Institute  V 2.0 available since Jan 2007 Gene Set Enrichment Analysis (Subramanian et al. PNAS )

 Calculates a score for the enrichment of a entire set of genes rather than single genes!  Does not require setting a cutoff!  Identifies the set of relevant genes as part of the analysis!  Provides a more robust statistical framework! GSEA key features

Gene Set Enrichment Analysis ClassA ClassB Genes ranked by expression correlation to Class A Cutoff Biological function? 2 / 10 5 / 11 3 / 10 Function 1 (e.g., metabolism) Function 2 (e.g., signaling) Function 3 (e.g., regulation)

Gene Set Enrichment Analysis ClassA ClassB Genes ranked by expression correlation to Class A Running sum: Increase when gene is in set Decrease otherwise Function 1 (e.g., metabolism) Function 2 (e.g., signaling) Function 3 (e.g., regulation)

Gene Set Enrichment Analysis What would you expect if the hits were randomly distributed? What would you expect if most of the hits cluster at the top of the list?

Gene Set Enrichment Analysis Genes within functional set (hits) Running sum Enrichment score (ES) = max deviation from 0 Leading Edge genes

Gene Set Enrichment Analysis Low ES (evenly distributed) ES = 0.43ES = -0.45

Ducray et al. Molecular Cancer :41 Gene Set Enrichment Analysis

1.Calculation of an enrichment score (ES) for each functional category 2.Estimation of significance level of the ES  An empirical permutation test  Phenotype labels are shuffled and the ES for this functional set is recomputed. Repeat 1000 times.  Generating a null distribution 3.Adjustment for multiple hypotheses testing  Necessary if comparing multiple gene sets (i.e.,functions)  Computes FDR (false discovery rate) GSEA Steps