Charlie Whittaker – BIG meeting 12/3/14

Presentation transcript:

ssGSEA

From documentation: Where GSEA generates a gene set’s enrichment score with respect to phenotypic differences across a collection of samples within a dataset, ssGSEA calculates a separate enrichment score for each pairing of sample and gene set, independent of phenotype labeling. In this manner, ssGSEA transforms a single sample's gene expression profile to a gene set enrichment profile. A gene set's enrichment score represents the activity level of the biological process in which the gene set's members are coordinately up- or down-regulated. This transformation allows researchers to characterize cell state in terms of the activity levels of biological processes and pathways rather than through the expression levels of individual genes. ssGSEA projection transforms the data to a higher-level (pathways instead of genes) space representing a more biologically interpretable set of features on which analytic methods can be applied.

Barbie et al., 2009 and Verhaak et al., 2010 are the references. There is no publication devoted to the tool because reviewers felt it was too closely related to GSEA.

Very useful when you lack phenotypic contrast (Barbie and Verhaak examples), when you wish to compare results from multiple contrasts (example 1), or in extremely complex experiments (example 2).

ssGSEA – from Barbie et al., 2009

The ‘single sample’ extension of GSEA allows one to define an enrichment score that represents the degree of absolute enrichment of a gene set in each sample within a given data set. The gene expression values for a given sample were rank-normalized, and an enrichment score was produced using the Empirical Cumulative Distribution Functions (ECDF) of the genes in the signature and the remaining genes. This procedure is similar to GSEA but the list is ranked by absolute expression (in one sample). The enrichment score is obtained by an integration of the difference between the ECDF.

[Figure labels: Gene Set – Remaining Genes]

As you progress along the rank-ordered list of genes, the algorithm looks for a difference in how soon it encounters the genes in the gene set compared to the non-gene set genes. If the gene set genes are encountered relatively early in the list, the ES is negative; if they are encountered late in the list, the ES is positive; and if they are encountered at roughly the same rate as the non-gene set genes, the ES is near 0.
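The ECDF description above can be sketched in a few lines of R. This is a simplified, unweighted illustration and not the GenePattern implementation: it ignores any rank weighting the real module may apply, and the function name `es_sketch` and the toy data are hypothetical. The sign depends on two assumptions stated in the comments: the list is walked from lowest to highest expression, and the gene set ECDF is subtracted from the remaining-genes ECDF, so that a set met early scores negative and a set met late scores positive.

```r
# Unweighted sketch of a single-sample enrichment score (illustrative only).
# expr:     named numeric vector of expression values for ONE sample
# gene_set: character vector of gene names
es_sketch <- function(expr, gene_set) {
  # Assumed convention: walk the list from lowest to highest expression.
  ordered_genes <- names(sort(expr))
  in_set <- ordered_genes %in% gene_set
  stopifnot(any(in_set), any(!in_set))
  # Running fraction of gene set genes and of remaining genes seen so far.
  ecdf_set  <- cumsum(in_set)  / sum(in_set)
  ecdf_rest <- cumsum(!in_set) / sum(!in_set)
  # "Integrate" the gap between the two curves by summing it over the list.
  sum(ecdf_rest - ecdf_set)
}

set.seed(1)
expr   <- setNames(rnorm(200), paste0("g", 1:200))  # invented expression values
ranked <- names(sort(expr))

early_set  <- ranked[1:20]      # met early in the list (lowest expression)
late_set   <- ranked[181:200]   # met late in the list (highest expression)
random_set <- sample(ranked, 20)

es_early  <- es_sketch(expr, early_set)
es_late   <- es_sketch(expr, late_set)
es_random <- es_sketch(expr, random_set)
```

With these toy sets, `es_early` is negative, `es_late` is positive, and `es_random` falls in between.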

Input is a gct file of expression data and a gmx or gmt file of gene sets.

Running from GenePattern
http://genepattern.broadinstitute.org/gp/pages/index.jsf
Module and documentation are here:
http://genepattern.broadinstitute.org/gp/pages/index.jsf?lsid=urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00270:5
http://www.broadinstitute.org/cancer/software/genepattern/modules/docs/ssGSEAProjection/5

Running from R
Download from GenePattern by selecting Export from the ssGSEA module page:
http://genepattern.broadinstitute.org/gp/pages/index.jsf?lsid=urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00270:5
Set up the working directory, source the relevant files and execute ssGSEA:
http://rowley.mit.edu/caw_web/ssGSEAProjection/run_ssGSEA.r

    setwd("Z:/charliew/caw_web/ssGSEAProjection")
    source('Z:/charliew/caw_web/ssGSEAProjection/common.R')
    source('Z:/charliew/caw_web/ssGSEAProjection/ssGSEAProjection.R')
    source('Z:/charliew/caw_web/ssGSEAProjection/ssGSEAProjection.Library.R')
    ssGSEA.project.dataset(javaexec = "ssgseaprojection.jar",
                           jardir = getwd(),
                           input.ds = "testSet_rand1200.gct",
                           output.ds = "test",
                           gene.sets.dbfile.list = "randomSets.gmx")

Output is a gct file with one row per gene set and a column for each sample. Projected data can be visualized and analyzed in the same way as gene expression data.
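Because the projection is written as a gct file, it can be read back into R as a plain matrix for downstream work. The sketch below assumes the usual GCT 1.2 layout (a version line, a dimensions line, then a tab-delimited table whose first two columns are Name and Description). The helper name `read_gct` and the toy file are hypothetical stand-ins, not part of the ssGSEA module.

```r
# Read a GCT file into a numeric matrix (gene sets or genes x samples).
read_gct <- function(path) {
  # Skip the "#1.2" version line and the "<rows>\t<cols>" dimensions line.
  tab <- read.delim(path, skip = 2, check.names = FALSE,
                    stringsAsFactors = FALSE)
  mat <- as.matrix(tab[, -(1:2), drop = FALSE])  # drop Name and Description
  rownames(mat) <- tab[[1]]
  mat
}

# Toy stand-in for a projection result: 2 gene sets x 3 samples (invented).
toy <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2\t3",
             "Name\tDescription\tS1\tS2\tS3",
             "setA\tna\t100.5\t-20\t3",
             "setB\tna\t-5\t40.25\t0"), toy)

proj <- read_gct(toy)
```

The resulting matrix has one row per gene set and one column per sample, so it can be passed to the same heatmap or clustering code used for expression data.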

[Figure labels: X2, Y1, Up In Y, *, X3, Y1, Up In X]

1200 randomly selected genes
5 random gene sets
6 gene sets randomly selected from 6 different levels of expression
All gene sets consist of about 50 genes

[Figure labels: Level 2, Level 6, Level 12, rand 4]
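A test set of this shape could be built along the following lines. The slides do not say how the expression levels were defined, so the binning here is an assumption: genes are split into 12 equal-sized levels by mean expression, and six of those levels are picked arbitrarily. The expression values and all object names are invented for illustration.

```r
set.seed(42)
n_genes <- 1200

# Invented mean expression per gene (stand-in for the real data set).
mean_expr <- setNames(rexp(n_genes, rate = 0.1),
                      paste0("gene", seq_len(n_genes)))

# Assumption: 12 equal-sized expression levels of 100 genes, level 1 = lowest.
level <- ceiling(rank(mean_expr, ties.method = "first") / 100)

# Level-based sets: 50 genes drawn from each of six arbitrarily chosen levels.
chosen_levels <- c(2, 4, 6, 8, 10, 12)
level_sets <- lapply(chosen_levels, function(l) {
  sample(names(mean_expr)[level == l], 50)
})
names(level_sets) <- paste0("level_", chosen_levels)

# Random sets: 50 genes drawn without regard to expression level.
rand_sets <- lapply(1:5, function(i) sample(names(mean_expr), 50))
names(rand_sets) <- paste0("rand_", 1:5)
```

Sets drawn from a single level should project to clearly separated scores, while the random sets should score near 0.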

Gene Set Sizes and Enrichment Scores

[Figure axis label: Size of Gene Set]

Barbie et al., 2009

Fig 3: b, RAS signatures in mutant KRAS lung adenocarcinomas correlate with NF-κB but not IRF3 signatures (red denotes activation, blue denotes inactivation). c, RAS and NF-κB signature expression in wild-type KRAS lung adenocarcinomas and normal lung tissue.

No phenotype contrast and downstream manipulation of projection results.

Verhaak et al., 2010

Gene expression signatures of different GBM subtypes were identified and validated. ssGSEA was used to compare these signatures to gene expression profiles from normal cells.

Figure 4. Single Sample GSEA Scores of GBM Subtypes Show a Relationship to Specific Cell Types. Gene expression signatures of oligodendrocytes, astrocytes, neurons, and cultured astroglial cells were generated from murine brain cell types (Cahoy et al., 2008). Single sample GSEA was used to project the four gene sets onto samples of the Proneural, Classical, Neural, and Mesenchymal subtypes. A positive enrichment score indicates a positive correlation between genes in the gene set and the tumor sample expression profile; a negative enrichment score indicates the reverse. Also see Figure S6 (shows histological data).

No phenotype contrast, cross-species analysis.

ssGSEA and multiple GSEA contrasts

Enrichment of gene set in treatment “R” supports a working hypothesis.
NES work – Treatment vs Control structure is available:
B – 0.94
R – 1.23
M – 1.42
Row-centered ssGSEA projections: visualize replicates and controls.
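Row-centering subtracts each gene set's mean score across samples, so replicates and controls can be compared on a common scale. A minimal base R sketch follows; the matrix values and sample names are invented, and `proj` stands in for a projection matrix with gene sets in rows and samples in columns.

```r
# Invented projection scores: 2 gene sets x 4 samples.
proj <- matrix(c(10, 12, 20, 22,
                 -5, -4, -6, -5),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("setA", "setB"),
                               c("ctrl_1", "ctrl_2", "trt_1", "trt_2")))

# Subtract the row mean from every value in that row.
centered <- sweep(proj, 1, rowMeans(proj), FUN = "-")
```

After centering, every row has mean 0, so a heatmap of `centered` shows each sample relative to that gene set's average rather than the absolute score.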

ssGSEA facilitates analysis of high complexity experiments

5 strains derived from 3 different organisms
3 genome sequences: 2 closely related, one more distant
Variant analysis between close relatives
RNAseq data for 16 culture conditions
16 relevant intra-organism comparisons
Many inter-organism comparisons
3 replicates of each condition
47 pathways or gene sets of critical interest

ssGSEA and Functional Analysis - Gene Sets and Strains

ssGSEA and Differential Expression Analysis (Jie)

48 gene expression samples (for each strain)
146 gene sets @ LogFC1, 0.05FDR – 16 comparisons, 5 strains, up+down
A/B_0/6_G upInB6G
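One way to turn a differential expression table into up and down gene sets at these thresholds is sketched below. It reads "LogFC1, 0.05FDR" as an absolute log fold change of at least 1 and an FDR below 0.05, which is an assumption. The column names (`gene`, `logFC`, `FDR`), the helper `make_sets`, the set naming scheme, and all values are hypothetical. The GMT layout written at the end is the standard one: set name, description, then member genes, tab-separated.

```r
# Toy differential expression table for ONE comparison (invented values).
de <- data.frame(gene  = c("g1", "g2", "g3", "g4", "g5"),
                 logFC = c(2.1, -1.5, 0.4, 1.2, -3.0),
                 FDR   = c(0.01, 0.03, 0.001, 0.20, 0.04),
                 stringsAsFactors = FALSE)

# Build one "up" and one "down" gene set from a comparison.
make_sets <- function(de, label, lfc = 1, fdr = 0.05) {
  sig  <- de[de$FDR < fdr, ]                  # keep significant genes only
  sets <- list(sig$gene[sig$logFC >=  lfc],   # up-regulated
               sig$gene[sig$logFC <= -lfc])   # down-regulated
  names(sets) <- paste0(label, c("_up", "_down"))
  sets
}

sets <- make_sets(de, "A_vs_B")

# One GMT line per set: name, description, then genes, tab-separated.
gmt_lines <- vapply(names(sets), function(n) {
  paste(c(n, "na", sets[[n]]), collapse = "\t")
}, character(1))
writeLines(gmt_lines, file.path(tempdir(), "de_sets.gmt"))
```

Repeating this for every comparison and concatenating the lines gives a single gene set file that can be projected onto all samples at once.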

ssGSEA and pathway analysis

~35 non-synonymous point mutants detected between 2 strains (Duan).
Are pathways surrounding these genes transcriptionally altered?

PDR16 pathway analysis

[Figure labels: Strain A, Strain B]

An assembly issue results in multiple copies of PDR16 in one strain but not the other. Differences in expression are caused by low mapping quality of PDR16 reads in one strain.