GO::TermFinder Gavin Sherlock Department of Genetics Stanford University

Slides:



Advertisements
Similar presentations
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Advertisements

Genetic Statistics Lectures (5) Multiple testing correction and population structure correction.
1 Health Warning! All may not be what it seems! These examples demonstrate both the importance of graphing data before analysing it and the effect of outliers.
1 Statistics Achim Tresch UoC / MPIPZ Cologne treschgroup.de/OmicsModule1415.html
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Fundamentals of Forensic DNA Typing Slides prepared by John M. Butler June 2009 Appendix 3 Probability and Statistics.
Data mining with the Gene Ontology Josep Lluís Mosquera April 2005 Grup de Recerca en Estadística i Bioinformàtica GOing into Biological Meaning.
Introduction to Functional Analysis J.L. Mosquera and Alex Sanchez.
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Detecting Differentially Expressed Genes Pengyu Hong 09/13/2005.
Using Gene Ontology Models and Tests Mark Reimers, NCI.
Gene Expression Data Analyses (3)
Chapter 14 Conducting & Reading Research Baumgartner et al Chapter 14 Inferential Data Analysis.
Differentially expressed genes
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
Gene Set Analysis 09/24/07. From individual gene to gene sets Finding a list of differentially expressed genes is only the starting point. Suppose we.
ICA-based Clustering of Genes from Microarray Expression Data Su-In Lee 1, Serafim Batzoglou 2 1 Department.
Inferences About Process Quality
On Comparing Classifiers: Pitfalls to Avoid and Recommended Approach Published by Steven L. Salzberg Presented by Prakash Tilwani MACS 598 April 25 th.
Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Different Expression Multiple Hypothesis Testing STAT115 Spring 2012.
1Module 2: Analyzing Gene Lists Canadian Bioinformatics Workshops
Probability Distributions and Test of Hypothesis Ka-Lok Ng Dept. of Bioinformatics Asia University.
Multiple testing in high- throughput biology Petter Mostad.
Confidence Intervals and Hypothesis Testing - II
Hypothesis testing. Want to know something about a population Take a sample from that population Measure the sample What would you expect the sample to.
Fundamentals of Hypothesis Testing: One-Sample Tests
1 Identifying differentially expressed sets of genes in microarray experiments Lecture 23, Statistics 246, April 15, 2004.
Gene Set Enrichment Analysis (GSEA)
Essential Statistics in Biology: Getting the Numbers Right
Comparing Two Proportions
Suppose we have analyzed total of N genes, n of which turned out to be differentially expressed/co-expressed (experimentally identified - call them significant)
Inferring Function From Known Genes Naomi Altman Nov. 06.
Chapter 12 The Analysis of Categorical Data and Goodness-of-Fit Tests.
1 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 12 The Analysis of Categorical Data and Goodness-of-Fit Tests.
Microarray data analysis David A. McClellan, Ph.D. Introduction to Bioinformatics Brigham Young University Dept. Integrative Biology.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Back to basics – Probability, Conditional Probability and Independence Probability of an outcome in an experiment is the proportion of times that.
Lecture 16 Section 8.1 Objectives: Testing Statistical Hypotheses − Stating hypotheses statements − Type I and II errors − Conducting a hypothesis test.
Introduction to Microarrays Dr. Özlem İLK & İbrahim ERKAN 2011, Ankara.
Statistical Significance The power of ALPHA. “ Significant ” in the statistical sense does not mean “ important. ” It means simply “ not likely to happen.
Hypothesis Testing. Why do we need it? – simply, we are looking for something – a statistical measure - that will allow us to conclude there is truly.
Statistical Testing with Genes Saurabh Sinha CS 466.
Gene set analyses of genomic datasets Andreas Schlicker Jelle ten Hoeve Lodewyk Wessels.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 13: One-way ANOVA Marshall University Genomics Core.
Extracting binary signals from microarray time-course data Debashis Sahoo 1, David L. Dill 2, Rob Tibshirani 3 and Sylvia K. Plevritis 4 1 Department of.
ANOVA P OST ANOVA TEST 541 PHL By… Asma Al-Oneazi Supervised by… Dr. Amal Fatani King Saud University Pharmacy College Pharmacology Department.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
Modeling Promoter and Untranslated Regions in Yeast Abstract T ranscriptional regulation is the primary form of gene regulation in eukaryotes. Approaches.
Getting the story – biological model based on microarray data Once the differentially expressed genes are identified (sometimes hundreds of them), we need.
The Broad Institute of MIT and Harvard Differential Analysis.
GO enrichment and GOrilla
Chapter 22 Comparing Two Proportions.  Comparisons between two percentages are much more common than questions about isolated percentages.  We often.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 6 –Multiple hypothesis testing Marshall University Genomics.
A Quantitative Overview to Gene Expression Profiling in Animal Genetics Armidale Animal Breeding Summer Course, UNE, Feb Analysis of (cDNA) Microarray.
The Birthday Problem. The Problem In a group of 50 students, what is the probability that at least two students share the same birthday?
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Statistics 22 Comparing Two Proportions. Comparisons between two percentages are much more common than questions about isolated percentages. And they.
Micro array Data Analysis. Differential Gene Expression Analysis The Experiment Micro-array experiment measures gene expression in Rats (>5000 genes).
Estimating the False Discovery Rate in Genome-wide Studies BMI/CS 576 Colin Dewey Fall 2008.
Canadian Bioinformatics Workshops
Module 2: Analyzing gene lists: over-representation analysis
Statistical Testing with Genes
Subspace Clustering for Microarray Data Analysis:
I. Statistical Tests: Why do we use them? What do they involve?
Statistical Testing with Genes
Identification of aging-related genes and affected biological processes. Identification of aging-related genes and affected biological processes. (A) Experimental.
Presentation transcript:

GO::TermFinder Gavin Sherlock Department of Genetics Stanford University

GO::TermFinder includes: A way to determine statistically significant GO terms shared by a set of genes A module to visualize the results A set of software modules to access GO information

Inspiration… Tavazoie et al, 1999, used the hypergeometric distribution to determine enrichment of MIPS categories in clusters of cell cycle regulated genes.

Hypergeometric Distribution: We can calculate the probability of observing x of n events as having a particular property, given that in the general population, M of N things have that property, using the hypergeometric distribution, as:

Where, generically which is the number of permutations by which r ‘things’ can be chosen from a set of n ‘things’.

Calculating a P-value To calculate a P-value, we calculate the probability of having at least x of n events:

Translating this to GO Many analyses result in a list of interesting genes Typically biologists can make up a story about any random list Look at all GO annotations for the genes in a list, and see if the number of annotations for any is significant

Multiple hypothesis correction If we choose a P-value cutoff of 0.05, we have a 1 in 20 chance of falsely picking something as significant that is not. If we test multiple hypotheses (GO nodes), each one has a 1 in 20 chance of being wrong. Thus if we test 10 nodes, we have a 0.4 chance of falsely picking one as significant.

Multiple hypothesis correction (continued) Correct for multiple hypotheses to keep the overall chance of picking a false positive at 1 in 20. Bonferroni correction simply divides the alpha value by the number of hypotheses - assumes independence, which is not the case for our GO nodes.

Correction (continued) Use simulation to determine a p-value. Turns out Bonferroni is not conservative enough Why? Should all nodes be corrected equally?

Using GO::TermFinder to look at microarray data Our general assumption is guilt by association: i.e. genes with similar expression patterns are more likely to participate in the same biological process. So let’s take this assumption and exploit the Gene Ontology to examine our expression clusters:

YPL250C MET11 MXR1 MET17* SAM3 MET28 STR3 MMP1 MET1 YIL074C MHT1 MET14 MET16 MET3 MET10 ECM17 MET2* MUP1 MET17 MET6 YPL250C MET11 YER042W YLR302C YPL274W MET28 YGL184C YLL061W MET1 YIL074C YLL062C MET14 MET16 MET3 MET10 ECM17 YNL276C MUP1 MET17 MET6 Methionine Cluster

Visualization module Developed by SGD Takes output from GO::TermFinder Uses Perl interface to AT & T GraphViz tool for graph layout

GO Annotations sulfur metabolism : 1.77e-26 (13/19 vs 33/6911) methionine metabolism: 8.08e-19 (9/19 vs 19/6911)

The API

Included example tools ancestors.pl children.pl termFinderClient.pl analyze.pl batchGOView.pl

Acknowledgements Ellie Boyle Shuai Weng Jeremy Gollub Heng Jin Mike Cherry David Botstein

GO::TermFinder URL Full source code available under the MIT license from: