Working with enriched gene sets in R Peter Svensson Micheline Giphart-Gassler Harry Vrieling.

P-values of genes
Starting with a vector of p-values from
– t.test(irradiated, control)
– wilcox.test(irradiated, control)
– lm(formula, data)
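As a rough illustration of how such a vector of p-values might be produced, the sketch below runs one t-test per gene on simulated data. The matrices `control` and `irradiated`, their sizes, and the shift applied to the first 20 genes are all assumptions made for the example; they are not the data behind the slides.

```r
# Simulated log-expression matrices (assumed layout: rows = genes, columns = samples).
set.seed(1)
n_genes <- 200
control    <- matrix(rnorm(n_genes * 4), nrow = n_genes)
irradiated <- matrix(rnorm(n_genes * 4), nrow = n_genes)

# Shift the first 20 genes so that some genuinely changed genes exist.
irradiated[1:20, ] <- irradiated[1:20, ] + 3

# One two-sided Welch t-test per gene; keep only the p-value.
pvals <- vapply(
  seq_len(n_genes),
  function(i) t.test(irradiated[i, ], control[i, ])$p.value,
  numeric(1)
)

# Histogram of the p-values: a peak near 0 suggests changed genes.
hist(pvals, breaks = 20, main = "Distribution of p-values")
```

Swapping `t.test` for `wilcox.test` in the same loop would give the rank-based alternative.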

Distribution of p-values two-tailed

Distribution of p-values one-tailed

Distribution of p-values
Proportion of unchanged genes, π0
library(qvalue) (Storey & Tibshirani 2001)
qvalue(pvals)$pi0
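The idea behind estimating the proportion of unchanged genes can be sketched in base R. The version below is a simplified fixed-threshold estimator, not the procedure the qvalue package runs by default: it assumes p-values of unchanged genes are uniform, so the fraction above a cutoff `lambda`, rescaled by `1 - lambda`, approximates π0. The function name `estimate_pi0` and the simulated p-values are made up for the example.

```r
# Fixed-lambda estimate of pi0 (simplified; an assumption-laden sketch).
# Null p-values are uniform, so about pi0 * (1 - lambda) of all p-values
# are expected to fall above lambda.
estimate_pi0 <- function(pvals, lambda = 0.5) {
  min(1, mean(pvals > lambda) / (1 - lambda))
}

# Simulated mixture: 800 unchanged genes (uniform p-values)
# and 200 changed genes (p-values concentrated near 0).
set.seed(2)
pvals <- c(runif(800), rbeta(200, 0.2, 5))
pi0_hat <- estimate_pi0(pvals)
```

With this mixture the estimate should land in the neighbourhood of the true proportion of 0.8, up to sampling noise.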

Annotation
Annotation of the genes is available from Bioconductor
– MetaData for commercial arrays
– AnnBuilder for home-made arrays
– Unigene name, code, symbol, Entrez Gene, GO terms, KEGG pathways, PubMed ids...

Gene Set Enrichment Analysis
Mootha et al., Nat Genet. 2003, 34:267
Use gene sets defined by GO terms, KEGG terms, names containing 'kinase', or genes that cluster together.
Make a vector with one value per gene (N genes in total, G of them in the group)
– all not in group: -sqrt(G/(N-G))
– all in group: sqrt((N-G)/G)

Running sum
The values in the vector sum to 0.
Plot the running sum: the peak is at the point p=0.1
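A minimal sketch of the running sum, assuming the genes have already been ranked (for instance by p-value). The step sizes follow the Mootha et al. scoring, +sqrt((N-G)/G) for a gene in the set and -sqrt(G/(N-G)) otherwise; the function name `running_sum` and the toy gene names are invented for the example.

```r
# ranked_genes: gene identifiers in ranked order (assumed input).
# gene_set:     identifiers of the genes belonging to the set.
running_sum <- function(ranked_genes, gene_set) {
  in_set <- ranked_genes %in% gene_set
  N <- length(ranked_genes)   # all genes
  G <- sum(in_set)            # genes in the set
  stopifnot(G > 0, G < N)
  # Step up for set members, down for all other genes; the steps sum to 0.
  steps <- ifelse(in_set, sqrt((N - G) / G), -sqrt(G / (N - G)))
  cumsum(steps)
}

# Toy ranking of 10 genes with a 3-gene set sitting near the top.
ranked_genes <- paste0("g", 1:10)
gene_set <- c("g1", "g2", "g4")

rs <- running_sum(ranked_genes, gene_set)
es <- max(rs)             # enrichment score: height of the peak
peak_at <- which.max(rs)  # rank at which the peak occurs

plot(rs, type = "l", xlab = "Rank", ylab = "Running sum")
```

Because the steps cancel exactly, the curve always returns to 0 at the last gene, and a set concentrated near the top of the ranking produces an early, high peak.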

GSEA
The enrichment score can be used to determine the importance of a gene set.
A permutation technique is used to get its significance.
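One way the permutation step could look is sketched below. Note the simplification: it shuffles set membership across the ranking (a gene-label permutation), which is easier to show in a few lines than permuting the sample labels and recomputing the ranking. The helper `enrichment_score`, the set positions, and the number of permutations are assumptions for the example.

```r
# Enrichment score = maximum of the running sum over a ranked membership vector.
enrichment_score <- function(in_set) {
  N <- length(in_set)
  G <- sum(in_set)
  max(cumsum(ifelse(in_set, sqrt((N - G) / G), -sqrt(G / (N - G)))))
}

set.seed(3)
N <- 500
in_set <- rep(FALSE, N)
# Hypothetical set of 20 genes, 15 of which sit at the very top of the ranking.
in_set[c(1:15, 100, 250, 300, 420, 480)] <- TRUE

observed <- enrichment_score(in_set)

# Null distribution: scores obtained when membership is shuffled at random.
n_perm <- 1000
null_es <- replicate(n_perm, enrichment_score(sample(in_set)))

# Permutation p-value, with +1 so that it can never be exactly 0.
p_perm <- (sum(null_es >= observed) + 1) / (n_perm + 1)
```

A small `p_perm` indicates that few random sets reach a peak as high as the observed one.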

Hypergeometric probability
Used in dChip and DAVID.
Input is
– # genes in the gene set (n), # genes on array (n+m)
– # selected genes in the gene set (x), # selected genes (N)
dhyper() gives the density
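The hypergeometric calculation can be sketched with base R as follows. All counts are invented for illustration, and descriptive variable names are used because R's `dhyper(x, m, n, k)` argument letters do not line up with the letters on the slide. The use of `phyper()` for the upper tail is an addition here, shown because an enrichment p-value is usually the probability of x or more hits rather than exactly x.

```r
# Hypothetical counts.
set_size   <- 40      # genes in the gene set
array_size <- 10000   # genes on the array
n_selected <- 200     # genes passing the p-value threshold
x          <- 6       # selected genes that are also in the gene set

# Density: probability of exactly x set members among the selected genes.
p_exact <- dhyper(x, set_size, array_size - set_size, n_selected)

# Upper tail: probability of x or more set members among the selected genes.
p_enrich <- phyper(x - 1, set_size, array_size - set_size, n_selected,
                   lower.tail = FALSE)
```

Here fewer than one set member is expected by chance (200 × 40 / 10000 = 0.8), so observing six gives a small tail probability.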

Selecting genes
Have to set a threshold, p0, for the p-values.
p < p0: selected
p0 = is not informative
p0 = 0.1 at the maximum of the peak
dissect(pvals)
– (BMC Bioinformatics, to appear)

Will get a p-value
Tested 4000 GO terms, so correction for multiple testing is needed
p.adjust(pvals, "fdr")
Look at significant terms, p<0.001
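A short sketch of the correction step, using simulated p-values for 4000 terms. The mix of 3960 null terms and 40 strongly enriched ones is an assumption made so that something survives the cutoff; `p.adjust()` with method `"fdr"` applies the Benjamini-Hochberg adjustment.

```r
set.seed(4)
# Hypothetical per-term p-values: 3960 null terms, then 40 very small ones.
term_pvals <- c(runif(3960), runif(40, 0, 1e-6))

# Benjamini-Hochberg false discovery rate adjustment.
adj <- p.adjust(term_pvals, method = "fdr")

# Indices of the terms that remain significant at the 0.001 level.
significant <- which(adj < 0.001)
```

The adjusted values are never smaller than the raw p-values, so the list of significant terms can only shrink after correction.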

Cisplatin data
Mouse embryonic stem cells exposed to various doses (low, medium and high).
Harvested at 0<t<24
Low doses, early time points
– Few genes changed
– Few pathways changed
Indications of what will come

Preprocessing
For internal use at Not updated
Code for working with widgets, defining a MIAME-compliant object, AffyBatch (exprSet), doing tests, building linear models, correlation tests, GSEA
Updating together with Agata Meglicz. It will be improved soon.

Demonstration
cdf="hgu133a"
source("gsea.R")
gsea()
dissectGUI()