Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.

Slides:



Advertisements
Similar presentations
Microarray statistical validation and functional annotation
Advertisements

Shibing Deng Pfizer, Inc. Efficient Outlier Identification in Lung Cancer Study.
Charlie Whittaker – BIG meeting 12/3/14
1 Statistics Achim Tresch UoC / MPIPZ Cologne treschgroup.de/OmicsModule1415.html
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Gene Set Enrichment Analysis (GSEA)
CAVEAT 1 MICROARRAY EXPERIMENTS ARE EXPENSIVE AND COMPLICATED. MICROARRAY EXPERIMENTS ARE THE STARTING POINT FOR RESEARCH. MICROARRAY EXPERIMENTS CANNOT.
Introduction to Microarry Data Analysis - II BMI 730
Data mining with the Gene Ontology Josep Lluís Mosquera April 2005 Grup de Recerca en Estadística i Bioinformàtica GOing into Biological Meaning.
Using Gene Ontology Models and Tests Mark Reimers, NCI.
Differentially expressed genes
Gene Set Analysis 09/24/07. From individual gene to gene sets Finding a list of differentially expressed genes is only the starting point. Suppose we.
. Differentially Expressed Genes, Class Discovery & Classification.
Biological Interpretation of Microarray Data Helen Lockstone DTC Bioinformatics Course 9 th February 2010.
Significance Tests P-values and Q-values. Outline Statistical significance in multiple testing Statistical significance in multiple testing Empirical.
Final Project Week 3 - 5/7/09 GSEA and Cluster Computing in Protein Research Leon Kay, Yan Tran, Chris Thomas Yan Gary Chris Leon.
Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.
Gene Set Enrichment Analysis Petri Törönen petri(DOT)toronen(AT)helsinki.fi.
Different Expression Multiple Hypothesis Testing STAT115 Spring 2012.
1Module 2: Analyzing Gene Lists Canadian Bioinformatics Workshops
Multiple testing in high- throughput biology Petter Mostad.
GO::TermFinder Gavin Sherlock Department of Genetics Stanford University
1 Identifying differentially expressed sets of genes in microarray experiments Lecture 23, Statistics 246, April 15, 2004.
Frédéric Schütz Statistics and bioinformatics applied to –omics technologies Part II: Integrating biological knowledge Center.
Gene Set Enrichment Analysis (GSEA)
Essential Statistics in Biology: Getting the Numbers Right
Jesse Gillis 1 and Paul Pavlidis 2 1. Department of Psychiatry and Centre for High-Throughput Biology University of British Columbia, Vancouver, BC Canada.
Basic features for portal users. Agenda - Basic features Overview –features and navigation Browsing data –Files and Samples Gene Summary pages Performing.
Differential Gene Expression Dennis Kostka, Christine Steinhoff Slides adapted from Rainer Spang.
GSEA Overview -- Workflow GSEA is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant.
Course on Functional Analysis
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
BIOS6660 shRNAseq Gene Set Enrichment Analysis Tzu L Phang PhD Robert Stearman PhD April 16, 2014.
‘Omics’ - Analysis of high dimensional Data
Integrating Biology and Statistics: Gene Set Methods BIOS Winter/Spring 2010.
Summarizing Differential Expression Using Mann-Whitney U-tests.
Statistical Testing with Genes Saurabh Sinha CS 466.
Extracting binary signals from microarray time-course data Debashis Sahoo 1, David L. Dill 2, Rob Tibshirani 3 and Sylvia K. Plevritis 4 1 Department of.
Cluster validation Integration ICES Bioinformatics.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
The Broad Institute of MIT and Harvard Differential Analysis.
1 Paper Outline Specific Aim Background & Significance Research Description Potential Pitfalls and Alternate Approaches Class Paper: 5-7 pages (with figures)
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Gene Set Analysis using R and Bioconductor Daniel Gusenleitner
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Fewer permutations, more accurate P-values Theo A. Knijnenburg 1,*, Lodewyk F. A. Wessels 2, Marcel J. T. Reinders 3 and Ilya Shmulevich 1 1Institute for.
Estimating the False Discovery Rate in Genome-wide Studies BMI/CS 576 Colin Dewey Fall 2008.
1 A Discussion of False Discovery Rate and the Identification of Differentially Expressed Gene Categories in Microarray Studies Ames, Iowa August 8, 2007.
Gene Set Enrichment Analysis. GSEA: Key Features Ranks all genes on array based on their differential expression Identifies gene sets whose member genes.
Canadian Bioinformatics Workshops
David Amar, Tom Hait, and Ron Shamir
Module 2: Analyzing gene lists: over-representation analysis
::: Schedule. Biological (Functional) Databases
Differential Gene Expression
Statistical Testing with Genes
Canadian Bioinformatics Workshops
Genesets and Enrichment
Functional enrichment of differentially expressed genes.
Varying Intolerance of Gene Pathways to Mutational Classes Explain Genetic Convergence across Neuropsychiatric Disorders  Shahar Shohat, Eyal Ben-David,
Metabolite predictability across metabolic categories (A) and disease states (B) in the vaginal microbiome. Metabolite predictability across metabolic.
Statistical Testing with Genes
Identification of aging-related genes and affected biological processes. Identification of aging-related genes and affected biological processes. (A) Experimental.
Statistical chart of significantly differentially expressed genes
Extended analysis of differential expression datasets.
Distinct molecular and clinical correlates of H3F3A mutation subgroups
Global analysis of the chemical–genetic interaction map.
Transcriptome profiling of PD-L1 antibody–treated macrophages showed inflammatory phenotype, increased survival and proliferation, and decreased apoptosis.
Presentation transcript:

Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

 Gene expression profiling  Which molecular processes/functions are involved in a certain phenotype (e.g., disease, stress response, etc.)  The Gene Ontology (GO) Project  Provides shared vocabulary/annotation  Terms are linked in a complex structure  Enrichment analysis:  Find the “most” differentially expressed genes  Identify over-represented annotations  Modified Fisher's exact test A quick review

Enrichment Analysis ClassA ClassB Genes ranked by expression correlation to Class A Cutoff Biological function?

Enrichment Analysis ClassA ClassB Genes ranked by expression correlation to Class A Cutoff Biological function? 2 / 10 Function 1 (e.g., metabolism) 5 / 11 Function 2 (e.g., signaling) 3 / 10 Function 3 (e.g., regulation)

 After correcting for multiple hypotheses testing, no individual gene may meet the threshold due to noise.  Alternatively, one may be left with a long list of significant genes without any unifying biological theme.  The cutoff value is often arbitrary!  We are really examining only a handful of genes, totally ignoring much of the data Problems with cutoff-based analysis

 MIT, Broad Institute  V 2.0 available since Jan 2007 Gene Set Enrichment Analysis (Subramanian et al. PNAS )

 Calculates a score for the enrichment of a entire set of genes rather than single genes!  Does not require setting a cutoff!  Identifies the set of relevant genes as part of the analysis!  Provides a more robust statistical framework! GSEA key features

Gene Set Enrichment Analysis ClassA ClassB Genes ranked by expression correlation to Class A Cutoff Biological function? 2 / 10 5 / 11 3 / 10 Function 1 (e.g., metabolism) Function 2 (e.g., signaling) Function 3 (e.g., regulation)

Gene Set Enrichment Analysis ClassA ClassB Genes ranked by expression correlation to Class A Running sum: Increase when gene is in set Decrease otherwise Function 1 (e.g., metabolism) Function 2 (e.g., signaling) Function 3 (e.g., regulation)

Gene Set Enrichment Analysis What would you expect if the hits were randomly distributed? What would you expect if most of the hits cluster at the top of the list?

Gene Set Enrichment Analysis Genes within functional set (hits) Running sum Enrichment score (ES) = max deviation from 0 Leading Edge genes

Gene Set Enrichment Analysis Low ES (evenly distributed) ES = 0.43ES = -0.45

Ducray et al. Molecular Cancer :41 Gene Set Enrichment Analysis

1.Calculation of an enrichment score (ES) for each functional category 2.Estimation of significance level of the ES  An empirical permutation test  Phenotype labels are shuffled and the ES for this functional set is recomputed. Repeat 1000 times.  Generating a null distribution 3.Adjustment for multiple hypotheses testing  Necessary if comparing multiple gene sets (i.e.,functions)  Computes FDR (false discovery rate) GSEA Steps