M. Kathleen Kerr “Design Considerations for Efficient and Effective Microarray Studies” Biometrics 59, 822-828; December 2003 Biostatistics Article Oncology.

Presentation transcript:

M. Kathleen Kerr, “Design Considerations for Efficient and Effective Microarray Studies,” Biometrics 59, 822-828; December 2003. Biostatistics Article, Oncology Journal Club, May 28, 2004

A couple of introductory points. There are different kinds of microarrays, with two main distinctions: one-color (e.g. Affymetrix, long oligo) and two-color (e.g. spotted cDNA). Some of the statistical tools are the same and some are different. Using two-color arrays is slightly more complicated in terms of design.

Statistics and Microarrays. Statistical principles certainly apply to microarray analyses. We should consider some of the same basic tenets when performing microarray studies: randomization, sample size and replication issues, and experimental design. Good design is critical to making efficient and valid inferences.

Randomization. Might not sound applicable, but: if you have a ‘treatment’ you are giving, samples should be randomly assigned to treatment groups. Randomize the order in which samples are processed, the order in which hybridizations are performed, and the order in which arrays are chosen from the array batch. Example: a dosing study looking for genetic changes in cells as a function of dose. Suppose you perform all dose=0 experiments first, then dose=1, then dose=2, etc. As you proceed, you learn more and get better at processing samples, performing hybridizations, and using the scanner. Your results may then be associated with dose even if dose has no effect on genetic changes: CONFOUNDING!
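
The randomization advice above can be sketched in a few lines of Python. This is only an illustration, not a procedure from the paper: the function name `randomized_run_order`, the sample labels, and the seed are hypothetical. The same idea applies to hybridization order and to the order in which arrays are drawn from a batch.

```python
import random

def randomized_run_order(sample_ids, seed=None):
    """Return the sample IDs in a random processing order.

    A fixed seed makes the order reproducible, so it can be recorded
    in the lab notebook before any sample is processed.
    """
    rng = random.Random(seed)
    order = list(sample_ids)  # copy, so the caller's list is not modified
    rng.shuffle(order)
    return order

# Hypothetical dosing study: three doses, four samples per dose.
samples = [f"dose{d}_rep{r}" for d in (0, 1, 2) for r in range(1, 5)]
run_order = randomized_run_order(samples, seed=2004)

for position, sample in enumerate(run_order, start=1):
    print(position, sample)
```

Because the doses are interleaved at random, any gradual improvement in technique over the course of the study is spread across all dose groups instead of lining up with dose.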

Sample Size and Replication. Three types of ‘replication’ in microarrays: (A) spotting genes multiple times on the same array; (B) hybridizing multiple arrays to the same RNA samples; (C) using multiple individuals of a certain type. A and B are considered ‘technical’ replicates; C describes ‘random sampling’ from the population. THESE ARE CRITICALLY DIFFERENT!

Sample Size and Replication. Technical replication DOES NOT address biological variability; it DOES address the measurement error of the assay. Usually, we are interested in how a condition affects individuals in general, NOT in how a condition affects any given individual. Example: AML. Do we want to make inferences about differences in gene expression across AML subpopulations? Or do we want to make inferences about differences in gene expression in two particular AML patients, each of whom has a different type of AML?

Sample Size and Replication. Why/when would we be interested in technical replication? Medical diagnosis: we need to know how precise the measures are, because the sensitivity and specificity of the assay depend on that.

Sample Size and Replication. Biological replicates tell us about the variability across samples of the same type. Biological variability is critical for finding differences in gene expression across populations, and for classification procedures that try to use gene expression patterns to differentiate individuals of different types. If you use just one sample or cell line to make inferences about the population of interest, you are making a BIG assumption: “Population is relatively homogeneous.” You cannot evaluate that assumption based on the data from the study.

Sample Size and Replication. For a fixed sample size, it is preferable to sample NEW individuals rather than perform technical replicates. Why? It is more efficient in terms of variance, power, etc.: you gain much less from technical replicates than from new samples. But sampling new individuals can be expensive. Examples: samples are very rare, recruitment is difficult, or the procedure for acquiring samples is risky or expensive. In this case, it might be worthwhile to perform some technical replicates, based on a “cost-benefit” analysis. GENERAL RULE: TRUE REPLICATION BEATS TECHNICAL REPLICATION FOR GAINS IN PRECISION WHEN ESTIMATING PARAMETERS
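
A small calculation can make the general rule concrete. The sketch below assumes a simple two-level model that is not spelled out in the slides: each individual deviates from the group mean with variance `var_bio`, and each array adds independent measurement error with variance `var_tech`. Under that assumption, the variance of the estimated group mean is var_bio / n + var_tech / (n * m) for n individuals with m technical replicates each. The numeric values are illustrative only.

```python
def variance_of_group_mean(n_individuals, n_tech_reps, var_bio, var_tech):
    """Variance of the estimated group mean under the assumed two-level model.

    Biological variance is averaged over individuals only; technical
    variance is averaged over every array (individuals x replicates).
    """
    return var_bio / n_individuals + var_tech / (n_individuals * n_tech_reps)

# Illustrative variance components (assumptions, not estimates from data).
var_bio, var_tech = 1.0, 0.5
total_arrays = 12

# Spend the same 12 arrays in different ways.
for n_individuals in (12, 6, 4, 3):
    n_tech_reps = total_arrays // n_individuals
    v = variance_of_group_mean(n_individuals, n_tech_reps, var_bio, var_tech)
    print(f"{n_individuals:2d} individuals x {n_tech_reps} replicate(s): variance = {v:.4f}")
```

With the number of arrays held fixed, the technical term is the same in every row (var_tech / 12), while the biological term shrinks only as the number of individuals grows. That is why new individuals improve precision more than technical replicates in this model.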

Pooling of Samples. Pooling is often motivated by an insufficient quantity of RNA, which is reasonable. Sometimes it is proposed as a way to ‘control’ for biological variability. Bad idea! We need to understand, not eliminate, biological variability. To understand the differences in mean expression across two populations (e.g. normal karyotype and t(15;17)), we need to be able to estimate the population means. We cannot do that if we have pooled RNA. We can estimate the mean difference between two groups based on pooled samples, but we cannot make inferences about whether or not there is a difference in mean expression.

Pooling of Samples. Pooling is ALWAYS bad if your goal is finding a classification scheme or discovering unknown subtypes. There is an ‘in between’ strategy for pooling when we are interested in determining whether average expression differs between two phenotypes (Kendziorski et al. (2003)). Pooling RNA for use as a ‘reference’ is OK (more in a minute).

Experimental Layout. This discussion is specific to two-color arrays, where design is complicated by the pairing of samples on arrays; one-color array design considerations are usually more straightforward. Layout is a critical determinant of design efficiency. Three main types of designs in two-color arrays: reference, loop, and dye swap.

Reference Design. Each arrow represents an array. Let's say that the origin of the arrow is green and the head of the arrow is red. Each sample of interest is paired with the same “reference” sample. AML example: the reference was 11 pooled cell lines. Here, each sample is labeled with red (Cy5) and the reference is labeled with green (Cy3). Each sample is hybridized to only ONE array (each with the reference). [Figure labels: Reference sample, Type 1, Type 2]

Loop Design. [Figure labels: Type 1, Type 2] Each sample is paired with a sample of the other type (no reference!). Each sample is hybridized to TWO arrays and is labeled both red and green. Any two samples can be compared through the arrays between them in the loop. Relative efficiency is 4 to 1 comparing loop to reference. Downside: what if just ONE array goes bad? The loop is not a loop anymore! Good design for a small number of samples: it uses information very effectively.

Dye Swap Design. [Figure labels: Type 1, Type 2] Each sample is paired with the same sample of the other type TWICE. Each sample is hybridized to TWO arrays, and the dyes are swapped. Relative efficiency is 4 to 1 comparing loop to reference. More robust than loop. Less complicated than loop. Direct comparisons are not as easy because samples are not linked through other samples as in the other two designs.
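
The three layouts described above can be written down as lists of arrays. In this sketch, each array is a `(green, red)` pair, following the arrow convention from the reference design slide (origin green, head red). The function names and sample labels are hypothetical, and the loop and dye-swap helpers assume equal numbers of samples of each type.

```python
def reference_design(samples, reference="reference"):
    """One array per sample: reference in green (Cy3), sample in red (Cy5)."""
    return [(reference, sample) for sample in samples]

def loop_design(type1, type2):
    """Alternate the two types around a closed loop.

    Each sample appears on two arrays, once in each dye, and is always
    paired with samples of the other type. Assumes equal group sizes.
    """
    order = [sample for pair in zip(type1, type2) for sample in pair]
    n = len(order)
    return [(order[i], order[(i + 1) % n]) for i in range(n)]

def dye_swap_design(type1, type2):
    """Pair each type 1 sample with one type 2 sample on two arrays,
    swapping the dyes on the second array. Assumes equal group sizes."""
    arrays = []
    for a, b in zip(type1, type2):
        arrays.append((a, b))
        arrays.append((b, a))
    return arrays

type1 = ["A1", "A2"]
type2 = ["B1", "B2"]

print("reference:", reference_design(type1 + type2))
print("loop:     ", loop_design(type1, type2))
print("dye swap: ", dye_swap_design(type1, type2))
```

All three layouts use four arrays for these four samples. Printing them side by side shows the trade-off: the reference design spends half of every array on the reference, while the loop and dye swap put a sample of interest in both channels.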

Why reference so often? As population variance increases, loop and dye swap designs have less advantage. Sample comparisons must go ‘through’ the loop. Direct comparisons are not easy in a dye swap if samples are not on the same chip. If you have a large number of samples, a loop is risky due to ‘bad chips’. Logically, however, by using a reference on every chip, we are ‘wasting’ a resource. But there is less efficiency advantage in complex designs as the number of RNAs increases.
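
The first point, that direct-comparison designs lose their advantage as population variance grows, can be illustrated with a simplified variance calculation. The model here is an assumption made for this sketch, not the model from the paper: each individual deviates from its type mean with variance `var_bio`, and each array's log-ratio carries independent error with variance `var_array`. Both designs use n individuals per type and 2n arrays. For the reference design, the difference in type means has variance 2 * (var_bio + var_array) / n. For the dye swap, each pair contributes 2 * var_bio + var_array / 2, averaged over n pairs.

```python
def var_reference(n_per_type, var_bio, var_array):
    """Variance of the estimated type difference, reference design.

    One array per individual; the shared reference cancels in the
    difference of the two type means.
    """
    return 2 * (var_bio + var_array) / n_per_type

def var_dye_swap(n_per_type, var_bio, var_array):
    """Variance of the estimated type difference, dye-swap design.

    Each type 1 individual is paired with one type 2 individual on two
    arrays with dyes swapped; the two arrays are averaged per pair.
    """
    return (2 * var_bio + var_array / 2) / n_per_type

def relative_efficiency(var_bio, var_array, n_per_type=4):
    """Reference-design variance divided by dye-swap variance."""
    return (var_reference(n_per_type, var_bio, var_array)
            / var_dye_swap(n_per_type, var_bio, var_array))

# Illustrative values: array error fixed at 1.0, biological variance growing.
for var_bio in (0.0, 0.5, 1.0, 4.0, 16.0):
    eff = relative_efficiency(var_bio, 1.0)
    print(f"var_bio = {var_bio:5.1f}  relative efficiency = {eff:.2f}")
```

Under these assumptions the ratio is 4 to 1 when there is no biological variability, and it falls toward 1 as biological variance dominates array error, since neither design can average away differences between individuals.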

Robustness. Two robust alternatives, each requiring 2x as many arrays: “double reference” and “double loop”.

Practical Considerations. Simplicity: important in a large study with many technicians. Extendability: an open-ended design lets you add additional samples at a later time, depending on what early results suggest (reference and “symmetric” reference designs). Useful subdesigns: “subgroup analyses”. Example: all AMLs vs. normal karyotype.