Introduction to Microarrays
Kellie J. Archer, Ph.D., Assistant Professor, Department of Biostatistics

Microarrays
A snapshot that captures the activity pattern of thousands of genes at once. Platforms: custom spotted arrays and Affymetrix GeneChip arrays.

Spotted Microarray Process
[Figure: spotted microarray process with a control (CTRL) and a test (TEST) sample]

Affymetrix GeneChip® Probe Arrays
Each probe cell, or feature (24 µm), contains millions of copies of a specific oligonucleotide probe. A single 1.28 cm GeneChip probe array carries over 250,000 different probes complementary to genetic information of interest. The single-stranded, fluorescently labeled DNA target hybridizes to these oligonucleotide probes.
[Figure: image of a hybridized probe array and a close-up of a hybridized probe cell]

Applications of Microarrays
–Cancer research: molecular characterization of tumors on a genomic scale, for more reliable diagnosis and more effective treatment of cancer
–Immunology: study of host genomic responses to bacterial infections
–Model organisms: multifactorial experiments monitoring the expression response to different treatments and doses, over time, or in different cell types

Applications of Microarrays
Compare mRNA transcript levels in different types of cells, i.e., vary the
–Tissue (liver vs. brain)
–Treatment (drugs A, B, and C)
–State (tumor vs. normal)
–Organism (yeast, different strains)
–Timepoint
–etc.

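Affymetrix Design
PM: GCGCCGGCTGCAGGAGCAGGAGGAG
MM: GCGCCGGCTGCACGAGCAGGAGGAG
Each gene is interrogated by 11–20 probe pairs. Each pair consists of a perfect match (PM) probe and a mismatch (MM) probe that is identical except at the middle (13th) base.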
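The PM and MM sequences on this slide differ at exactly one position: the 13th base is G in the PM probe and its complement C in the MM probe. A minimal sketch of that construction, assuming 25-base probes (the length of the slide's sequences); the function `mismatch_probe` is hypothetical and only illustrates the rule.

```python
# Watson-Crick complement of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm: str) -> str:
    """Hypothetical helper: build the MM probe for a 25-mer PM probe
    by replacing its 13th (middle) base with the complementary base."""
    if len(pm) != 25:
        raise ValueError("expected a 25-base probe")
    mid = 12  # 0-based index of the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm = "GCGCCGGCTGCAGGAGCAGGAGGAG"
mm = mismatch_probe(pm)
print(mm)  # GCGCCGGCTGCACGAGCAGGAGGAG
```

Applied to the PM sequence shown above, the sketch reproduces the slide's MM sequence.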

Image Analysis: Pixel-Level Data
On the HG-U133A GeneChip, each PM and MM probe cell is imaged as a 6 x 6 matrix of pixels.
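Each probe cell's pixel matrix must be reduced to a single intensity before PM and MM values can be combined. The sketch below assumes the border pixels are discarded and the 75th percentile of the remaining interior pixels is used; we believe this matches the convention in Affymetrix's image-analysis software, but treat it as an assumption. The function `cell_intensity` and the simulated pixels are hypothetical.

```python
import numpy as np


def cell_intensity(pixels: np.ndarray, percentile: float = 75.0) -> float:
    """Hypothetical summary of one probe cell's pixel matrix.

    Assumption: border pixels are unreliable, so they are dropped and the
    requested percentile of the interior pixels is returned.
    """
    if pixels.ndim != 2 or min(pixels.shape) < 3:
        raise ValueError("need a 2-D pixel matrix of at least 3 x 3")
    interior = pixels[1:-1, 1:-1]  # a 6 x 6 cell leaves a 4 x 4 interior
    return float(np.percentile(interior, percentile))


# Simulated 6 x 6 cell: dim border pixels around a bright interior.
rng = np.random.default_rng(0)
cell = rng.normal(200.0, 10.0, size=(6, 6))
cell[1:-1, 1:-1] += 800.0
print(cell_intensity(cell))
```

Using a percentile of the interior rather than the plain mean keeps the dim edge pixels, which partly fall outside the hybridized feature, from pulling the estimate down.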

Expression Quantification
PM and MM intensities are combined to form an expression measure for the probe set (gene).

Expression Quantification
Initially, the Affymetrix signal was calculated as

$$\text{AvgDiff}_A = \frac{1}{|A|}\sum_{j \in A}\left(PM_j - MM_j\right)$$

where j indexes the probe pairs in probe set A and |A| is the number of probe pairs. This is known as the "Average Difference" method. Problems:
–Large variability in PM − MM
–MM probes may be measuring signal for another gene/EST
–PM − MM differences are sometimes negative
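The Average Difference is the mean of PM_j − MM_j over the probe pairs of one probe set. A minimal sketch, using hypothetical intensities for a probe set with 11 probe pairs in which several MM probes are brighter than their PM partners, shows how the measure can come out negative, the third problem listed above.

```python
import numpy as np


def average_difference(pm, mm) -> float:
    """Average Difference for one probe set: mean of PM_j - MM_j over its probe pairs."""
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    if pm.shape != mm.shape:
        raise ValueError("PM and MM need one value per probe pair")
    return float(np.mean(pm - mm))


# Hypothetical intensities for 11 probe pairs; many MM values exceed PM.
pm = [50, 60, 45, 70, 55, 40, 65, 50, 48, 52, 58]
mm = [80, 55, 90, 60, 75, 70, 50, 85, 66, 49, 72]
print(round(average_difference(pm, mm), 2))  # -14.45
```

A negative "expression level" has no physical meaning, which is one reason later methods moved away from simple PM − MM differences.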

Expression Quantification
The mean of a random variable X, $\mu = E(X)$, is a measure of the central location of the density of X. The variance of a random variable is a measure of the spread or dispersion of the density of X:

$$\operatorname{Var}(X) = \sigma^2 = E\left[(X-\mu)^2\right] = E(X^2) - \mu^2$$

Standard deviation $= \sigma = \sqrt{\operatorname{Var}(X)}$
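The two forms of the variance above can be checked numerically. This sketch treats a handful of hypothetical intensity values as the whole distribution of X, so the expectations are plain averages (divide by n, the same convention as NumPy's default `np.var` with `ddof=0`).

```python
import numpy as np

# Hypothetical log2 intensities, treated as the full distribution of X.
x = np.array([6.1, 7.4, 5.9, 8.2, 7.0, 6.6])

mu = x.mean()                        # E(X)
var_def = np.mean((x - mu) ** 2)     # E[(X - mu)^2]
var_alt = np.mean(x ** 2) - mu ** 2  # E(X^2) - mu^2
sd = np.sqrt(var_def)                # sigma = sqrt(Var(X))

print(mu, var_def, var_alt, sd)
```

The definitional form is usually preferred in practice: E(X²) − μ² subtracts two large, nearly equal numbers and can lose precision when the mean is large relative to the spread.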

Expression Quantification
Illustration: Average Difference.xls

Sources of Obscuring Variation in Microarray Measurements
–Sample handling (degree of physical manipulation, time from extirpation to freezing)
–Microarray manufacture
–Sample processing (extraction procedure, RNA integrity and purity, RNA labeling)
–Processing differences (hybridization chambers, washing modules, scanners)
–Personnel differences
–Random differences in signal intensity in a data set that covary with the biological process

Normalization
The purpose of normalization is to remove experimental artifacts of no direct interest, that is, to remove systematic effects other than differential expression. Normalization procedures often include
–background subtraction,
–detection of outliers,
–and removal of variation due to differences in sample preparation, array differences, differences in dye labeling efficiencies, and scanning differences.

[Figure: 16 replicate HG-U133A GeneChips, before normalization]

[Figure: 16 replicate HG-U133A GeneChips, after normalization]
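The slides do not name the normalization method behind the before/after comparison. As one common choice (it is, for example, the normalization step in RMA), quantile normalization forces every array to share the same intensity distribution. The sketch below assumes a probes x arrays matrix; the function name and the toy data are hypothetical, and tied values are broken by order of appearance rather than averaged, which some implementations handle differently.

```python
import numpy as np


def quantile_normalize(data: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix.

    Each column is replaced by a common reference distribution: the mean,
    across arrays, of the k-th smallest value. Ties get distinct ranks.
    """
    order = np.argsort(data, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0)                # rank of each value in its column
    reference = np.sort(data, axis=0).mean(axis=1)   # mean of each order statistic
    return reference[ranks]


# Hypothetical intensities: 4 probes measured on 3 arrays.
raw = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 4.0, 6.0],
                [4.0, 2.0, 8.0]])
norm = quantile_normalize(raw)
print(norm)
```

After normalization every column contains exactly the same set of values, so array-wide shifts in brightness (for example from scanner or labeling differences) are removed, while the ordering of probes within each array is preserved.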

Taxonomy of Microarray Data Analysis Methods
–Unsupervised learning: the statistical analysis seeks to find structure in the data without knowledge of class labels.
–Supervised learning: class or group labels are known a priori, and the goal of the statistical analysis is to identify differentially expressed genes (AKA feature selection) or combinations of genes that are predictive of class or group membership.

Unsupervised Learning
Unsupervised learning, or clustering, involves aggregating samples into groups based on the similarity of their expression patterns, without knowledge of class labels. Examples of unsupervised learning methods include
–Hierarchical clustering
–k-means
–k-medoids
–Self-organizing maps
–Principal components
–Multidimensional scaling
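Of the methods listed, k-means is the simplest to sketch: alternate between assigning each sample to its nearest center and moving each center to the mean of its samples. This is a minimal NumPy version for a samples x genes matrix; the farthest-first initialization is our assumption (k-means++ is a common randomized alternative), and the two simulated groups are hypothetical.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Cluster the rows of x (samples x genes) into k groups; return one label per row."""
    # Farthest-first initialization (an assumption): start at sample 0, then
    # repeatedly add the sample farthest from every center chosen so far.
    idx = [0]
    for _ in range(1, k):
        d = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centers = x[idx].astype(float)
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its samples (keep it if it has none).
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 1.0, size=(5, 50))  # 5 hypothetical samples, 50 genes
group_b = rng.normal(5.0, 1.0, size=(5, 50))
labels = kmeans(np.vstack([group_a, group_b]), k=2)
print(labels)
```

No class labels are used anywhere: the two groups are recovered purely from the similarity of the expression profiles, which is what distinguishes this from the supervised methods that follow.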

Supervised Learning
Example methods for class comparison/feature selection include
–t-test / Wilcoxon rank sum test
–F-test / Kruskal–Wallis test
–etc.
Example methods for class prediction include
–Weighted voting
–k nearest neighbors
–Compound covariate predictors
–Classification trees
–Support vector machines
–etc.
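For class comparison, the t-test is applied gene by gene, and genes are ranked by the size of the statistic. This sketch computes the Welch (unequal-variance) form of the two-sample t statistic for every row of a genes x samples matrix at once; Welch is our choice of variant, and the simulated data are hypothetical. Only the statistic is computed here. `scipy.stats.ttest_ind(..., equal_var=False)` returns the same statistic together with p-values.

```python
import numpy as np


def welch_t(x: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Welch two-sample t statistic per gene for a genes x samples matrix x.

    groups holds 0 or 1 for each sample (column).
    """
    a, b = x[:, groups == 0], x[:, groups == 1]
    diff = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return diff / se


rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))          # 1000 hypothetical genes, 8 samples
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x[:10, groups == 1] += 3.0              # genes 0-9 shifted in group 1
t = welch_t(x, groups)
top = np.argsort(-np.abs(t))[:10]       # ten genes with the largest |t|
print(sorted(top.tolist()))
```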

Supervised Learning: Class Prediction
There is a risk of over-fitting the data: a model may discriminate perfectly on the data set at hand yet perform poorly on independent data sets. Most prediction methods are intended for datasets with large n (samples) and small p (covariates), whereas microarray datasets typically have small n and very large p. The process is to
–Fit the model
–Check model adequacy
–Make an inference
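The over-fitting risk is easy to demonstrate in the small n, large p setting. In this sketch the class labels are pure noise, yet a 1-nearest-neighbor rule (one of the k nearest neighbor methods listed earlier) classifies its own training samples perfectly, because every sample's nearest neighbor is itself. On fresh data from the same hypothetical source its error is near chance. All data here are simulated.

```python
import numpy as np


def one_nn_predict(train_x, train_y, test_x):
    """1-nearest-neighbor prediction using Euclidean distance."""
    d = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    return train_y[d.argmin(axis=1)]


rng = np.random.default_rng(42)
n, p = 40, 2000                      # few samples, many genes
x = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)       # labels carry no information
x_new = rng.normal(size=(n, p))      # an independent data set
y_new = rng.integers(0, 2, size=n)

train_error = np.mean(one_nn_predict(x, y, x) != y)
new_error = np.mean(one_nn_predict(x, y, x_new) != y_new)
print(train_error, new_error)
```

The training error is exactly zero here, which is why the apparent (resubstitution) error says nothing about how the rule will perform on independent samples.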

Class Prediction: Checking Model Adequacy
Regardless of the algorithm used, once the prediction rule has been defined, it is essential to calculate an unbiased estimate of its true error rate.

Class Prediction: Checking Model Adequacy
In a data-rich situation,
–randomly divide the dataset into two parts, a training dataset and a test dataset;
–build the prediction algorithm using the training dataset;
–once a final model has been developed, apply the prediction rule to the test dataset to estimate the misclassification error.
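The train/test procedure above can be sketched with a k nearest neighbor classifier. The one-third test fraction, k = 3, and the simulated two-class data are our assumptions; the key point is that the rule is built from the training samples only and scored on samples it never saw.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=3):
    """k-nearest-neighbor majority vote for 0/1 labels (Euclidean distance)."""
    d = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)


def holdout_error(x, y, test_fraction=1 / 3, k=3, seed=0):
    """Misclassification rate on a randomly withheld test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    # Build the rule from the training part only...
    pred = knn_predict(x[train], y[train], x[test], k=k)
    # ...and measure its error on the withheld test part.
    return float(np.mean(pred != y[test]))


rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)            # 60 hypothetical samples, two classes
x = rng.normal(size=(60, 500))       # 500 hypothetical genes
x[y == 1, :50] += 1.5                # first 50 genes shifted in class 1
print(holdout_error(x, y))
```

An odd k avoids tied votes between the two classes.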

Class Prediction: Checking Model Adequacy
For small sample sizes, withholding a large portion of the data for validation may limit the ability to develop a prediction rule. Therefore, cross-validation techniques are used to assess the error.

Class Prediction: Checking Model Adequacy
K-fold cross-validation randomly splits the dataset into K equally sized groups. The model is then fit to K − 1 of the groups, and the generalization error is calculated on the remaining Kth group. This procedure is repeated so that each of the K groups serves once as the test set, providing an overall estimate of the generalization error and its associated standard error.

Class Prediction: Checking Model Adequacy
Example with 10 groups:
–Leave out the data in group 3
–Fit the model to the data in groups 1–2 and 4–10 (learning dataset)
–Calculate the error using the observations in group 3 as the test dataset
–Do this for each of the 10 partitions
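The 10-group procedure generalizes to any K. This sketch uses a k nearest neighbor classifier (k = 3) as a stand-in for whatever prediction rule is being assessed. It averages the per-fold error rates and reports their standard error as the fold-to-fold standard deviation divided by the square root of K. That is a simple, commonly used approximation we chose; the folds share training data, so it tends to understate the true uncertainty. Fold sizes differ by at most one when n is not divisible by K. The simulated data are hypothetical.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=3):
    """k-nearest-neighbor majority vote for 0/1 labels (Euclidean distance)."""
    d = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)


def kfold_error(x, y, n_folds=10, k=3, seed=0):
    """K-fold cross-validated misclassification rate and an approximate standard error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)  # near-equal random groups
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)  # the other K - 1 groups
        pred = knn_predict(x[train], y[train], x[test], k=k)
        errors.append(np.mean(pred != y[test]))
    errors = np.array(errors)
    return float(errors.mean()), float(errors.std(ddof=1) / np.sqrt(n_folds))


rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)            # 60 hypothetical samples, two classes
x = rng.normal(size=(60, 500))       # 500 hypothetical genes
x[y == 1, :50] += 1.5                # first 50 genes shifted in class 1
print(kfold_error(x, y, n_folds=10))
```

Every sample is used for testing exactly once and for training K − 1 times, which is what makes cross-validation attractive when the sample size is too small to set aside a separate test set.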