Microarray Data Analysis - A Brief Overview R Group Rongkun Shen 2008-02-11.

Slides:



Advertisements
Similar presentations
Differential Expression Analysis Introduction to Systems Biology Course Chris Plaisier Institute for Systems Biology.
Advertisements

Capturing Best Practice for Microarray Gene Expression Data Analysis Gregory Piatetsky-Shapiro Tom Khabaza Sridhar Ramaswamy Presented briefly by Joey.
From the homework: Distribution of DNA fragments generated by Micrococcal nuclease digestion mean(nucs) = bp median(nucs) = 110 bp sd(nucs+ = 17.3.
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Introduction to Microarry Data Analysis - II BMI 730
Genomic Profiles of Brain Tissue in Humans and Chimpanzees II Naomi Altman Oct 06.
Microarray technology and analysis of gene expression data Hillevi Lindroos.
DNA Microarray Bioinformatics - #27612 Normalization and Statistical Analysis.
Microarray Data Preprocessing and Clustering Analysis
Differentially expressed genes
Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical.
Statistical Analysis of Microarray Data
Introduction to Microarry Data Analysis - II BMI 730 Kun Huang Department of Biomedical Informatics Ohio State University.
GCB/CIS 535 Microarray Topics John Tobias November 8th, 2004.
Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical.
1 Test of significance for small samples Javier Cabrera.
Microarray Data Analysis Data quality assessment and normalization for affymetrix chips.
Microarray Data Analysis Data quality assessment and normalization for affymetrix chips.
Microarray Analysis Jesse Mecham CS 601R. Microarray Analysis It all comes down to Experimental Design Experimental Design Preprocessing Preprocessing.
ViaLogy Lien Chung Jim Breaux, Ph.D. SoCalBSI 2004 “ Improvements to Microarray Analytical Methods and Development of Differential Expression Toolkit ”
Generate Affy.dat file Hyb. cRNA Hybridize to Affy arrays Output as Affy.chp file Text Self Organized Maps (SOMs) Functional annotation Pathway assignment.
On Comparing Classifiers: Pitfalls to Avoid and Recommended Approach Published by Steven L. Salzberg Presented by Prakash Tilwani MACS 598 April 25 th.
Different Expression Multiple Hypothesis Testing STAT115 Spring 2012.
Microarray Preprocessing
Microarray Data Analysis Illumina Gene Expression Data Analysis Yun Lian.
Gene expression profiling identifies molecular subtypes of gliomas
A Multivariate Biomarker for Parkinson’s Disease M. Coakley, G. Crocetti, P. Dressner, W. Kellum, T. Lamin The Michael L. Gargano 12 th Annual Research.
Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1) 1.
An Introduction to Bioconductor Bethany Wolf Statistical Computing I April 9, 2014.
Introduction to DNA Microarray Technology Steen Knudsen Uma Chandran.
Essential Statistics in Biology: Getting the Numbers Right
STAT115 STAT225 BIST512 BIO298 - Intro to Computational Biology Lab 4 R and Bioconductor II Feb 15, 2012 Alejandro Quiroz and Daniel Fernandez
Panu Somervuo, March 19, cDNA microarrays.
Abstract BarleyBase is a USDA-funded public repository for plant microarray data. BarleyBase houses raw and normalized expression data from the 22K Affymetrix.
Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies.
RNAseq analyses -- methods
Agenda Introduction to microarrays
Microarray - Leukemia vs. normal GeneChip System.
Course on Functional Analysis
Carlo Colantuoni – Summer Inst. Of Epidemiology and Biostatistics, 2009: Gene Expression Data Analysis 8:30am-12:00pm in Room W2017.
CS 5263 Bioinformatics Lecture 23 Microarray Data Analysis.
Microarray data analysis David A. McClellan, Ph.D. Introduction to Bioinformatics Brigham Young University Dept. Integrative Biology.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
A A R H U S U N I V E R S I T E T Faculty of Agricultural Sciences Introduction to analysis of microarray data David Edwards.
A Short Overview of Microarrays Tex Thompson Spring 2005.
Lawrence Hunter, Ph.D. Director, Computational Bioscience Program University of Colorado School of Medicine
Intro to Microarray Analysis Courtesy of Professor Dan Nettleton Iowa State University (with some edits)
Introduction to Microarrays Dr. Özlem İLK & İbrahim ERKAN 2011, Ankara.
Introduction to Statistical Analysis of Gene Expression Data Feng Hong Beespace meeting April 20, 2005.
Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002.
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
MRNA Expression Experiment Measurement Unit Array Probe Gene Sequence n n n Clinical Sample Anatomy Ontology n 1 Patient 1 n Disease n n ProjectPlatform.
Analysis of GEO datasets using GEO2R Parthav Jailwala CCR Collaborative Bioinformatics Resource CCR/NCI/NIH.
Nuria Lopez-Bigas Methods and tools in functional genomics (microarrays) BCO17.
GeWorkbench John Watkinson Columbia University. geWorkbench The bioinformatics platform of the National Center for the Multi-scale Analysis of Genomic.
Gene set analyses of genomic datasets Andreas Schlicker Jelle ten Hoeve Lodewyk Wessels.
Statistical Analysis of Microarray Data By H. Bjørn Nielsen.
Introduction to Microarrays Kellie J. Archer, Ph.D. Assistant Professor Department of Biostatistics
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
The Broad Institute of MIT and Harvard Differential Analysis.
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
基于 R/Bioconductor 进行生物芯片数据分析 曹宗富 博奥生物有限公司
Differential Gene Expression
Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer Yang et al Presented by Yves A. Lussier MD PhD The University.
Course: Statistics in Bioinformatics Date: 指導教授: 陳光琦 學生: 吳昱賢
Statistics for genomics
Presentation transcript:

Microarray Data Analysis - A Brief Overview R Group Rongkun Shen

R R is an environment and a computer programming language R is free, open-source, and runs on UNIX/Linux, Windows and Mac R language has a powerful, easy-to-learn syntax with many built-in statistical functions R has excellent built-in help system R has excellent graphing capabilities R has many user-written packages, e.g. BioC

Affymetrix microarray data analysis -- a simple example > library(gcrma) > data.rma = justRMA() Background correcting Normalizing Calculating Expression > head(exprs(data.rma) # view expression > write.table(exprs(data.rma), file="data.rma.txt", sep='\t') # output to file

Overview Introduction to R and Bioconductor Affymetrix microarray preprocessing and quality assessment Differential expression Machine learning Gene set enrichment analysis

Intro to R Atomic data types: –Numeric – 1, -2, 3, , 1.2e-10, etc –Character – ‘AbczyZ’, ‘256’, ‘y8e3.!$^*&’, etc –Complex – 1.2+3i –Logical – TRUE, FALSE

Intro to R – cont’d Data Structures –vector - arrays of the same type –list - can contain objects of different types –environment - hashtable –data.frame - table-like –factor - categorical –classes - arbitrary record type –function

Intro to R – cont’d Matrix – 2-D array Array – multi-D [vector is 1-D array] Subsetting –vector, list, matrix, array Packages – such as Bioconductor

Intro to R – cont’d Get help > ?plot > help.search(“wilcoxon”) Graph > plot(1:10) Write a function > x.sqr = function (x) { x*x } > x.sqr(2) [1] 4

Overview Introduction to R and Bioconductor Affymetrix microarray preprocessing and quality assessment Differential expression Machine learning Gene set enrichment analysis

Affymetrix Microarray Preprocessing and Quality Assessment Affymetrix Microarray Technology Quality Assessment and Quality Control Preprocessing –Background correction –Normalization –Summary

DNA microarrays

The experimental process involved in using a DNA microarray

Affy – cont’d How to check individual array quality? –image plot

Affy – cont’d Histogram: examine probe intensity behavior between arrays > affy.data <- ReadAffy() > hist(affy.data)

Affy – cont’d Boxplot: identify differences in the level of raw probe-intensities > boxplot(affy.data)

Affy – cont’d Background adjustment –RMA –gcRMA –MAS 5.0 Normalization –RMA –gcRMA –vsn (Variance Stabilizing Normalization)

Affy – cont’d Summarization –RMA –gcRMA –expresso Which method is better? affycomp –

Affymetrix microarray data analysis -- a gcRMA example > library(affy) > library(gcrma) > affy.data = ReadAffy() > data.gcrma = gcrma(affy.data) > head(exprs(data.gcrma)) # view expression > write.table (exprs(data.gcrma), file="data.gcrma.txt", sep='\t') # output to file

Affymetrix microarray data analysis -- a gcRMA example Os2a-1.CEL Os2a-2.CEL Os2a-3.CEL AFFX-BioB-3_at AFFX-BioB-5_at AFFX-BioB-M_at AFFX-BioC-3_at AFFX-BioC-5_at AFFX-BioDn-3_at

Overview Introduction to R and Bioconductor Affymetrix microarray preprocessing and quality assessment Differential expression Machine learning Gene set enrichment analysis

Differential Expression Goal – find statistically significant associations of biological conditions or phenotypes with gene expression Gene-by-gene approach Fold change vs. p-value

DE – cont’d Gene by gene tests –t-test > t.test(x) –Wilcoxon test > wilcox.test(x,…) –paired t-test > pairwise.t.test(x,…) –F-test (ANOVA) > library(limma)

p-value adjustment/correction > ?p.adjust "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" FDR (false discovery rate): ROC curve analysis –TP rate vs. FP rate DE – cont’d

Data reduction –Genes unexpressed should be filtered –Genes with unchanged expression levels across conditions Top 5 genes? –Select according to p-values DE – cont’d

Overview Introduction to R and Bioconductor Affymetrix microarray preprocessing and quality assessment Differential expression Machine learning Gene set enrichment analysis

Machine Learning Supervised Learning –classification Unsupervised Learning –clustering –class discovery

ML – cont’d Features: pick variables or attributes Distance: choose method to decide whether 2 samples are similar or different Model: how to cluster or classify –kNN, neural nets, hierarchical clustering, HMM

ML – cont’d Get to know your data Measure the distance –Phenotype –Time course –Transcription factors

ML – cont’d Cross-validation –Make use of all the data without bias –Leave-one-out CV

Overview Introduction to R and Bioconductor Affymetrix mciroarray preprocessing and quality assessment Differential expression Machine learning Gene set enrichment analysis

Gene Set Enrichment Analysis Which of 1000’s of probes are differentially expressed? Interested in genes in a pathway or other biological process?

GSEA cont’d Overall approach –Identify a priori biologically interesting sets KEGG or GO pathways –Preprocessing and quality assessment as usual –Non-specific filtering Remove probes with no KEGG or GO annotations

GSEA cont’d Overall approach –Compute a test statistic (e.g., t-test) for each probe –Calculate the average of the test statistic (z k ) in each set –Compare to Normal distribution of z k across sets > qqnorm(z.k)

R/BioC Workshop Fred Hutchinson Cancer Research Center Seattle, WA For more details, visit

Sequencing vs. Microarray Will sequencing replace microarray?

Acknowledgement Todd Mockler Robert Gentleman (Hutch) Martin Morgan (Hutch) Peter Dolan Brian Knaus Yi Cao (Hutch)