Experimental design for microarrays Presented by Alex Sánchez and Carmen Ruíz de Villa Departament d’Estadística. Universitat de Barcelona.

Presentation transcript:

Experimental design for microarrays Presented by Alex Sánchez and Carmen Ruíz de Villa Departament d’Estadística. Universitat de Barcelona

2 Outline
–Introduction
–Design issues in microarray experiments
–Applying experimental design principles
–The choice of experimental layout
–Hints and conclusions
–Acknowledgments & Disclaimer

3 And so said the master…
"To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of."
Sir Ronald A. Fisher (geneticist, experimentalist, statistician), Indian Statistical Congress, 1938

4 Why experimental design?
The objective of experimental design is to make the analysis of the data and the interpretation of the results
–As simple and as powerful as possible,
–Given the purpose of the experiment and
–The constraints of the experimental material.

5 First things first
Experimental design must
–Look backward
  What is the relevant question to be answered?
–Consider the current situation
  Which factors influence the experiment's result? Which ones are limiting? Which ones can be controlled?
–Look ahead
  How are the results to be processed and analyzed?

6 [Figure: workflow of a microarray study. Boxes: Biological question; Experimental design; Microarray experiment; Image analysis; Quality measurement (Pass / Failed); Normalization; Analysis (Estimation, Testing, Clustering, Discrimination); Biological verification and interpretation]

Design issues for microarray experiments

8 Designing microarray experiments
The appropriate design of a microarray experiment must consider
–Design of the array
–Allocation of mRNA samples to the slides
Both aspects are influenced by different sets of parameters. Ultimately, the decisions must be guided by the questions that have to be answered.


10 Some aspects of design (1): Layout of the array
Which sequences to print?
–cDNAs → selection of cDNA from a library (Riken, NIA, etc.)
–Affymetrix → PMs and MMs
  Oligo probe selection (from Operon, Agilent, etc.)
–Control probes
  What percentage? Where should the controls be put?
How many sequences to print?
–Duplicate or replicate spots within a slide
Ask a statistician…

11 Some aspects of design (2): Allocation of samples to the slides
Types of samples
–Replication: technical vs biological
–Pooled vs individual samples
–Pooled vs amplified samples
Different design layouts / data analyses
–Scientific aim of the experiment
–Efficiency, robustness, extensibility
Physical limitations (cost)
–Number of slides
–Amount of material

12 Scientific aims and design choice
Different studies have different objectives
–To identify differentially expressed genes (class comparison)
–To search for specific gene-expression patterns (class discovery)
–To identify phenotypic subclasses (class prediction)
So they may require different designs
–Sometimes only one option is available
–Sometimes a choice must be made

13 Principles of experimental design
To attain the objectives of experimental design, the following principles should usually be applied
–Randomization
–Replication
–Local control
The next slides give some hints on their application in this context.

14 Randomization
Where to print duplicate spots?
–Printing them together → high concordance, but a higher risk of missing values if there are local problems on the slide
–Printing them at random may be a better choice
How to assign…
–Treatments to individuals?
–Samples to arrays?
→ At random
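A minimal sketch of what "at random" can mean in practice for the allocation of samples to arrays. The function name, the sample labels and the array labels are hypothetical, chosen only for illustration; fixing a seed is a design choice that makes the allocation reproducible and auditable.

```python
import random


def randomize_allocation(samples, arrays, seed=None):
    """Assign each sample to exactly one array, in random order.

    `samples` and `arrays` are illustrative labels (hypothetical names);
    `seed` makes the allocation reproducible so it can be documented.
    """
    if len(samples) != len(arrays):
        raise ValueError("need exactly one array per sample")
    rng = random.Random(seed)      # private generator, does not touch global state
    shuffled = list(samples)       # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return dict(zip(arrays, shuffled))


# Example: four mRNA samples allocated to four slides.
allocation = randomize_allocation(
    ["treated_1", "treated_2", "control_1", "control_2"],
    ["slide_A", "slide_B", "slide_C", "slide_D"],
    seed=2024,
)
for slide, sample in allocation.items():
    print(slide, "<-", sample)
```

The same idea applies to assigning treatments to individuals: shuffle the list of treatments instead of the list of samples.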

15 Replication
It is important
–To increase precision
–As a formal basis for inferential procedures
Different types of replicates
–Technical
  Duplicate spots
  Multiple hybridizations from the same sample
–Biological
["If I had to replicate all my experiments I could only do half as much" (Botstein 1999)]

16 The 3 layers of experimental design
[Figure; source: Nature Reviews and G. Churchill (2002)]

17 Replication (1): Duplicate spots
In general good practice, but…
–Good for quality control
  But: don't use internal controls to normalize!
–Worse for statistical inference
  Highly correlated → not really iid samples
  Ideally: spot duplicates at random
We want to have duplicates → how many?
–A minimum of 3 is reasonable
  Helps detect outliers
  Decreases the number of false positives and false negatives (Lee 2000)

18 Replication (2): Technical replicates
The goal is not to measure biological variability
–$X$: expression level, $E(X)=\mu_x$, $\mathrm{Var}(X)=\sigma_x^2$
Good for assessing measurement precision (intra-individual variability)
–$Y$: measure of $X$, $Y_i = X + \varepsilon_i$, $E(\varepsilon_i)=0$, $\mathrm{Var}(\varepsilon_i)=\sigma^2$
Sometimes yields valuable information
–If interest is in individual mRNAs (diagnostics)
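A short worked derivation, under the added assumption (not stated on the slide) that the measurement errors are independent of each other and of the expression level, shows why technical replicates improve measurement precision but leave biological variability untouched. Here $m$ is a hypothetical number of technical replicates taken from the same sample.

```latex
\bar{Y} = \frac{1}{m}\sum_{i=1}^{m} Y_i
        = X + \frac{1}{m}\sum_{i=1}^{m}\varepsilon_i ,
\qquad
E(\bar{Y}) = \mu_x ,
\qquad
\mathrm{Var}(\bar{Y}) = \sigma_x^2 + \frac{\sigma^2}{m}.
```

As $m$ grows, the measurement term $\sigma^2/m$ vanishes, but the biological term $\sigma_x^2$ stays the same: only additional biological samples can reduce it.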

19 Replication (3): Biological replicates
Hybridizations involving mRNA from different extractions (individuals, cell lines…)
Their main use is to assess and account for biological variability
–In very homogeneous populations a single individual is sometimes taken as representative
–This relies on strong, unverifiable assumptions
–We cannot assess the error committed
Don't substitute technical replicates for biological ones!

20 Pooling: To pool or not to pool?
mRNA from different samples is combined to form a pool. Why?
–If each sample does not yield enough mRNA
  But… one can also amplify
–To compensate for an excess of variability
  But we cannot estimate it when pooling
Pooling should in general be avoided, but…
–If the goal of the study is to test for differential expression
  Under certain restrictions it may still be used
–If the goal of the study requires information on individuals
  It cannot be used

21 Experimental layout
Local control = experimental layout = how mRNA samples are assigned to arrays
The experimental layout has to be chosen so that the resulting analysis can be done as efficiently and robustly as possible
–Sometimes there is only one reasonable choice
–Sometimes several choices are available

22 Example 1: Only one design choice
Case 1: Meaningful biological control (C)
–Samples: liver tissue from 4 mice treated with cholesterol-modifying drugs.
–Question 1: Which genes respond differently between the treatments (T) and the control (C)?
–Question 2: Which genes respond similarly across two or more treatments relative to control?
–[Figure: treatments T1, T2, T3, T4, each compared with C]
Case 2: Use of a universal reference
–Samples: different tumor samples.
–Question: to discover tumor subtypes.
–[Figure: samples T1, T2, …, Tn-1, Tn, each compared with Ref]

23 Example 2: A number of different designs are suitable for use
Direct comparison between two treatments

24 Dye swap experiment

25 Repeated dye-swaps
Useful for reducing technical variation
Conclusions limited to the samples
[Figure: repeated dye-swap hybridizations of samples A and B]

26 Replicated dye swap
Accounts for biological and technical variation
Significance may be harder to achieve
Conclusions apply to the population

27 Reference design
Widely used
Dye effect confounded with treatments (use a dye swap to avoid this)
Poor efficiency
–Half of the measurements are made on the reference
Path between any 2 samples: short
Easy to extend
[Figure: varieties V1, V2, V3, each hybridized against the reference R]

28 Loop design
Efficient alternative to the reference design
Large loops are inefficient
–Interweave several loops
[Figure: a loop through V1, V2, V3 and a loop through A1, A2, A3]
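The efficiency claim can be checked on a small case. The sketch below compares a 3-array reference design with a 3-array loop for three varieties, assuming each array yields one log-ratio with independent error of variance σ² and ignoring dye effects; these simplifications, the parametrization and the function names are assumptions made for illustration. The variance of a least-squares contrast is c'(X'X)⁻¹c, in units of σ².

```python
from fractions import Fraction


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination (exact arithmetic).

    Raises StopIteration if the matrix is singular (no non-zero pivot found).
    """
    n = len(matrix)
    # Augment the matrix with the identity.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def contrast_variance(design, contrast):
    """Variance of a least-squares contrast, in units of sigma^2: c' (X'X)^-1 c."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    inv = invert(xtx)
    return sum(contrast[i] * inv[i][j] * contrast[j]
               for i in range(p) for j in range(p))


# Reference design: parameters (V1-R, V2-R, V3-R); one array per variety vs R.
reference = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
reference_var = contrast_variance(reference, [1, -1, 0])   # V1 - V2

# Loop design: parameters (V1-V3, V2-V3); arrays V1 vs V2, V2 vs V3, V3 vs V1.
loop = [[1, -1], [0, 1], [-1, 0]]
loop_var = contrast_variance(loop, [1, -1])                # V1 - V2

print("reference:", reference_var, "x sigma^2")
print("loop:     ", loop_var, "x sigma^2")
```

With the same number of arrays, the loop estimates the V1 vs V2 comparison with one third of the variance of the reference design in this idealized setting. Real experiments also have dye effects and correlated replicates, so the gain may be smaller.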

29 How can we decide?
A-optimality: choose the design which minimizes the variance of the estimates of the effects of interest
A simple example: direct vs indirect estimates
–Direct (A hybridized against B): estimate average(log(A/B)), variance $\sigma^2/2$
–Indirect (A and B each hybridized against R): estimate log(A/R) – log(B/R), variance $2\sigma^2$
These calculations assume independence of replicates: the reality is not so simple.
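The arithmetic behind the direct vs indirect comparison can be sketched as follows, assuming two arrays in each design and independent log-ratios that all have the same variance σ² (the independence assumption is the one the slide itself flags as optimistic). The helper name is hypothetical.

```python
from fractions import Fraction


def variance_of_combination(weights):
    """Variance, in units of sigma^2, of a weighted sum of independent
    log-ratios that each have variance sigma^2: sum of squared weights."""
    return sum(Fraction(w) ** 2 for w in weights)


# Direct: two arrays of A vs B, estimate = average of the two log(A/B) values.
direct = variance_of_combination([Fraction(1, 2), Fraction(1, 2)])

# Indirect: one array A vs R and one array B vs R,
# estimate = log(A/R) - log(B/R).
indirect = variance_of_combination([1, -1])

print("direct:  ", direct, "x sigma^2")
print("indirect:", indirect, "x sigma^2")
print("ratio indirect/direct:", indirect / direct)
```

Under these assumptions the indirect estimate has four times the variance of the direct one for the same number of arrays.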

30 How can we reduce variation?
–Replicated spots: increased precision and quality control
–Multiple arrays per sample: estimate measurement error
–Multiple samples per treatment group: estimate biological variation
–Pooling: reduce biological variation

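31 About resource allocation in a microarray experiment
–Measurement error: $\sigma^2_e$
–Technical variance: $\sigma^2_A$
–Biological variance: $\sigma^2_B$
[Figure: effect of pooling on the total variance, expressed in terms of $\sigma^2_e$, $\sigma^2_A$ and $\sigma^2_B$]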
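The slide's own total-variance formula did not survive in this transcript. As an illustration only, the sketch below uses a common nested variance-components form, which is an assumption and not necessarily the formula shown on the slide: biological variance is divided by the number of samples (and by the pool size, assuming a pool behaves like an average of its individuals), technical variance by the number of arrays, and measurement error by the number of spots. All names are hypothetical.

```python
def variance_of_treatment_mean(var_bio, var_tech, var_err,
                               n_samples, n_arrays, n_spots, pool_size=1):
    """Variance of a treatment-group mean under an ASSUMED nested model.

    var_bio, var_tech, var_err: biological, technical (array) and
        measurement-error (spot) variance components.
    n_samples: biological samples (or pools) per treatment group.
    n_arrays:  arrays per sample.
    n_spots:   replicated spots per array.
    pool_size: individuals per pool (1 = no pooling); assumes ideal pooling,
        i.e. the pool acts as the average of its individuals.
    """
    bio = var_bio / (pool_size * n_samples)
    tech = var_tech / (n_samples * n_arrays)
    err = var_err / (n_samples * n_arrays * n_spots)
    return bio + tech + err


# Same number of arrays, with and without pooling two individuals per sample.
no_pool = variance_of_treatment_mean(4.0, 2.0, 1.0, n_samples=2, n_arrays=2, n_spots=2)
pooled = variance_of_treatment_mean(4.0, 2.0, 1.0, n_samples=2, n_arrays=2, n_spots=2,
                                    pool_size=2)
print("no pooling:", no_pool)
print("pools of 2:", pooled)
```

In this form, adding spots or arrays only shrinks the smaller terms; the biological term shrinks only with more samples or larger pools, and pooling gives up the ability to estimate biological variance between individuals.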

32 Tips on experimental design
The WEHI link offers some ideas and a case study on which information should be collected to produce an adequate experimental design.

33 Summary
Two important issues
–Selection of mRNA samples
  Most important: biological replicates
  Technical replicates are also useful, but different
  Try to avoid making a big pool
–Choice of experimental layout
  May be guided by the scientific question
  Also by efficiency and robustness considerations
  In general, direct comparisons are better than indirect ones
  Loop designs may be preferred to reference designs
  Robust variants for each version yield similar conclusions

34 References
Churchill (2002) Fundamentals of cDNA microarray design. Nature Genetics 32(suppl. 2): 490-5.
Draghici, S. (2003) Data analysis tools for microarrays. Chapman & Hall.
Kerr (2003) Design considerations for effective and efficient microarray studies. Biometrics 59.
Lee, M.L.T. et al. (2000) Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. PNAS 97(18).
Speed, T. (2003) Statistical analysis of gene expression microarray data. CRC Press.
Yang & Speed (2002) Design issues for cDNA microarrays. Nature Reviews Genetics 3.

35 Acknowledgments
Special thanks to
–Yee Hwa Yang (UCSF), for allowing me to use some of her materials
–G. Churchill and Kathleen Kerr, for writing their papers and making their slides available
–Sandrine Dudoit & Terry Speed, U.C. Berkeley
–M. Carme Ruíz de Villa, U. Barcelona
–Sara Marsal, U. Reumatología, HVH Barcelona

36 Disclaimer
The goal of this presentation is to discuss the contents of the paper indicated in the title.
Copyrighted images have been taken from the corresponding journals, or from slide shows found on the internet, with the sole aim of facilitating the discussion.
All merit for them belongs to the authors of the papers or the slide shows, and we wish to thank them for making them available.