Statistical Issues in the Design of Microarray Experiments

Statistical Issues in the Design of Microarray Experiments
Lara Lusa, U.O. Statistica Medica e Biometria, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano
NETTAB 2003, Bologna, 28 November 2003

Outline
– Biostatistics and microarrays
– Study objectives
– Design of microarray experiments
– A case study: a designed experiment

Biostatistics and microarrays
– Microarray research: a unique challenge for interdisciplinary collaboration
– Can biostatisticians be useful in microarray research?
– Are available software tools a valid substitute for collaboration with biostatisticians?

What can biostatisticians do?
Active collaboration with researchers from the biomedical and bioinformatics fields:
– to develop and critically evaluate methods for the design of microarray experiments and the analysis of the data
– to perform data analysis
– to develop software tools and train biomedical researchers to use them

Italian inter-university research group: Statistical issues in design and analysis of microarray data (MIUR grant)
– Firenze (Annibale Biggeri)
– Milano (Giuseppe Gallus)
– Padova (Monica Chiogna)
– Torino (Mauro Gasparini)
– Udine (Corrado Lagazio)

Collaborations
– Milano: Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano (statisticians, biologists, molecular oncologists); IFOM, Milano (biologists, bioinformatics)
– Biometric Research Branch, NCI, Bethesda (statisticians)
– Edo Tempia Foundation, Biella
– Bioconductor project (software development)

Study objectives
– Class comparison (supervised): establish differences in gene expression between predetermined classes
– Class prediction (supervised): predict the phenotype from gene expression data
– Class discovery (unsupervised): discover groups of samples or genes with similar expression

Design of microarray experiments
– Design of arrays
– Allocation of samples
  – Replication
  – Labeling of samples (cDNA): reference design, balanced block design, loop design
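For two-colour (cDNA) arrays, the labeling designs listed above differ in which samples share an array and which dye each one receives. The Python sketch below is illustrative only: it assumes Cy3/Cy5 as the two dyes, uses hypothetical sample labels and helper names, and enumerates the hybridizations for a reference design and a loop design. It does not cover balanced block designs or any optimality criterion.

```python
from typing import List, Tuple

# One two-colour array: (sample labelled with Cy3, sample labelled with Cy5).
Array = Tuple[str, str]


def reference_design(samples: List[str], reference: str = "REF") -> List[Array]:
    """Hypothetical helper: every sample is hybridized against a common reference.

    The reference always receives the same dye (Cy3 here, an arbitrary choice),
    so one array is used per sample and each sample is measured once.
    """
    return [(reference, s) for s in samples]


def loop_design(samples: List[str]) -> List[Array]:
    """Hypothetical helper: sample i is hybridized against sample i+1, closing the loop.

    Each sample appears once with each dye, so dye effects are balanced and
    no array space is spent on a reference.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("a loop design needs at least two samples")
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


if __name__ == "__main__":
    samples = ["C1", "C2", "T1", "T2"]  # hypothetical control (C) and treated (T) samples
    print("reference:", reference_design(samples))
    print("loop:     ", loop_design(samples))
```

With the same four arrays, the reference design measures each sample once and uses half of the hybridized material on the reference, whereas the loop design measures each sample twice, once per dye.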

Levels of replication
– Biological replicates: multiple samples from different subjects (independent biological units)
– Technical replicates:
  – multiple samples from the same subject
  – multiple hybridizations of the same mRNA
  – multiple clones or probes of the same gene on the array

How many replicates?
– Biological replicates are essential to make inferences about the population
– Technical replicates are useful for quality control and for increasing precision
– How to determine sample size?
  – it is problem-dependent
  – simple methods are available for class comparison problems
  – it is not yet clear what to use for class discovery
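As one example of a simple method for class comparison, a common normal-approximation formula gives the number of arrays per class needed to detect a difference δ in mean log2 expression between two classes: n = 2 (z_{1−α/2} + z_{1−β})² σ² / δ², where σ is the within-class standard deviation (assumed equal in both classes), α is a per-gene significance level kept small because thousands of genes are tested, and 1 − β is the power. The Python sketch below, with a hypothetical helper name, assumes this formula; it is not necessarily the method used by the author.

```python
import math
from statistics import NormalDist


def arrays_per_class(delta: float, sigma: float, alpha: float = 0.001, power: float = 0.95) -> int:
    """Hypothetical helper: normal-approximation sample size for one gene.

    delta: difference in mean log2 expression to detect (1.0 = two-fold change)
    sigma: within-class standard deviation of log2 expression (assumed common)
    alpha: two-sided per-gene significance level
    power: desired probability of detecting a true difference of size delta
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    n = 2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole number of arrays


if __name__ == "__main__":
    # Illustrative inputs only: two-fold change, sigma = 0.5 on the log2 scale.
    print(arrays_per_class(delta=1.0, sigma=0.5))
```

With δ = 1, σ = 0.5, α = 0.001 and 95% power the sketch returns 13 arrays per class; these inputs are illustrative, not estimates from this study. Because normal rather than t quantiles are used, the formula slightly underestimates n when the answer is small.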

Common pitfalls in microarray experiments
– Too little or no replication
– Replication at the wrong level
– Experiments with cell lines: assuming no variability among cell lines of the same type
– Inappropriate use of pooling
  – acceptable: use of multiple independent pools
  – but is it useful?
  – individual information is lost
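Whether pooling is useful can be framed with a simple variance model. Assume a biological variance σ_b² between subjects, a technical variance σ_t² per array, and, as an idealization, that a pool of r subjects averages biological variation exactly. The variance of a class mean estimated from n_a arrays is then σ_b²/(n_a r) + σ_t²/n_a. The Python sketch below (hypothetical helper name, illustrative variance values that are not from this study) compares individual hybridizations with pooled ones for the same number of subjects.

```python
def var_class_mean(sigma2_bio: float, sigma2_tech: float, n_arrays: int, pool_size: int = 1) -> float:
    """Hypothetical helper: variance of a class mean under an idealized pooling model.

    Assumes each array measures a pool of `pool_size` independent subjects,
    that pooling averages biological variation exactly, and that technical
    error is independent across arrays. pool_size=1 means individual samples.
    """
    if n_arrays < 1 or pool_size < 1:
        raise ValueError("n_arrays and pool_size must be positive")
    return sigma2_bio / (n_arrays * pool_size) + sigma2_tech / n_arrays


if __name__ == "__main__":
    s2_bio, s2_tech = 0.20, 0.05  # hypothetical variances on the log2 scale
    individual = var_class_mean(s2_bio, s2_tech, n_arrays=9)           # 9 subjects on 9 arrays
    pooled = var_class_mean(s2_bio, s2_tech, n_arrays=3, pool_size=3)  # 9 subjects in 3 pools
    print(f"individual: {individual:.4f}  pooled: {pooled:.4f}")
```

Under these assumptions, pooling lowers the variance only relative to the same number of individual arrays; it cannot beat hybridizing every subject individually, and it cannot recover subject-level information.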

Case study: a designed experiment
Biological aim: assess the effect of toremifene on the MCF-7 breast cancer cell line in terms of gene expression

[Figure: experimental design, Week 1. Labels: BATCH; A1, A2, A3; B1, B2, B3; Control; Treatment; POOL; each sample and pool hybridized on both cDNA and Affymetrix arrays]

[Figure: experimental design, Weeks 2 and 3. Labels: BATCH; A1, A2, A3; B1, B2, B3; Control; Treatment; POOL; cDNA and Affymetrix arrays]

Statistical aims
– Comparison of microarray platforms (cDNA vs Affymetrix)
– Hybridization of individual samples vs pools
– Variability of cell lines
– Robustness of commonly used statistical methods

Data available (so far)
– Hybridizations on Affymetrix HGU133 chips
– Summary measure of intensities: MAS5 (Affymetrix, 2002), the most commonly used, but other possibilities are available:
  – Robust Multichip Analysis, RMA (Irizarry et al., 2002)
  – Model-Based Expression Index, MBEI (Li and Wong, 2001) (at least 16 chips!)
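RMA summarizes each probe set by fitting an additive model (log2 intensity = overall + probe effect + array effect + residual) with Tukey's median polish, after background correction and quantile normalization. The Python sketch below shows only the median-polish step on a probes × arrays matrix of already preprocessed log2 PM intensities. It is not the Bioconductor implementation, the helper names are hypothetical, and the stopping rule mirrors the usual median-polish convention (relative change in total absolute residual).

```python
import numpy as np


def median_polish(y, max_iter: int = 10, eps: float = 0.01):
    """Hypothetical helper: Tukey median polish of a probes x arrays matrix.

    Fits y[i, j] = overall + row[i] + col[j] + resid[i, j] by alternately
    sweeping out row and column medians. Returns (overall, row, col, resid).
    """
    z = np.array(y, dtype=float)  # residual matrix (a copy), updated in place
    overall = 0.0
    row = np.zeros(z.shape[0])
    col = np.zeros(z.shape[1])
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (probe) medians into the probe effects.
        r_delta = np.median(z, axis=1)
        z -= r_delta[:, None]
        row += r_delta
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep column (array) medians into the array effects.
        c_delta = np.median(z, axis=0)
        z -= c_delta[None, :]
        col += c_delta
        delta = np.median(row)
        row -= delta
        overall += delta
        # Stop when the total absolute residual stops changing.
        new_sum = np.abs(z).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row, col, z


def probe_set_expression(log2_pm):
    """Hypothetical helper: one expression value per array (overall + array effect)."""
    overall, _, col, _ = median_polish(log2_pm)
    return overall + col
```

Because medians rather than means are used, a single outlying probe has limited influence on the array effects, which is the main reason for preferring median polish to a least-squares fit here.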

Brief summary of data
– HGU133A: chip A: […] probe sets; chip B: […] probe sets
– Present: chip A 48.5%; chip B 38.2%
– pm<mm: chip A 27%; chip B 31%
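The pm<mm figures are the share of probe pairs whose perfect-match (PM) intensity is below the corresponding mismatch (MM) intensity. A minimal Python sketch, assuming the PM and MM intensities of one chip are available as paired arrays; the helper name and the numbers are hypothetical.

```python
import numpy as np


def fraction_pm_below_mm(pm, mm) -> float:
    """Hypothetical helper: fraction of probe pairs with PM strictly below MM."""
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    if pm.shape != mm.shape:
        raise ValueError("pm and mm must have the same shape")
    return float(np.mean(pm < mm))


if __name__ == "__main__":
    pm = [120.0, 80.0, 300.0, 45.0]  # hypothetical PM intensities
    mm = [100.0, 95.0, 150.0, 60.0]  # hypothetical MM intensities
    print(f"pm<mm: {100 * fraction_pm_below_mm(pm, mm):.0f}%")
```

A large fraction of probe pairs with PM < MM is one motivation for PM-only summaries such as RMA.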

Methods for exploring reproducibility among arrays
– Pearson's correlation coefficient (common, but wrong: it measures linear association, not agreement)
– Coefficient of variation
– Distribution of differences of intensities
– Altman and Bland's plot (MA plot)
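The measures above can be computed for a pair of arrays as in the Python sketch below, which assumes two vectors of positive expression values for the same probe sets (hypothetical helper names and data). M is the log2 ratio and A the mean log2 intensity, the quantities shown in an MA plot. The example also shows why the correlation coefficient is misleading: a constant two-fold bias leaves the correlation at 1, while every M value is −1.

```python
import numpy as np


def ma_values(x, y):
    """Hypothetical helper: M (log2 ratio) and A (mean log2 intensity) for two arrays."""
    lx = np.log2(np.asarray(x, dtype=float))
    ly = np.log2(np.asarray(y, dtype=float))
    return lx - ly, (lx + ly) / 2


def limits_of_agreement(m):
    """Hypothetical helper: mean difference +/- 1.96 SD of the differences (Bland and Altman)."""
    bias = float(np.mean(m))
    sd = float(np.std(m, ddof=1))
    return bias - 1.96 * sd, bias + 1.96 * sd


def coefficient_of_variation(values):
    """Hypothetical helper: per-gene CV across replicate arrays (rows genes, columns arrays)."""
    v = np.asarray(values, dtype=float)
    return v.std(axis=1, ddof=1) / v.mean(axis=1)


if __name__ == "__main__":
    x = np.array([100.0, 200.0, 400.0, 800.0])  # hypothetical array 1
    y = 2 * x                                   # array 2 with a systematic two-fold bias
    m, a = ma_values(x, y)
    print("Pearson r:", np.corrcoef(x, y)[0, 1])  # perfect correlation despite the bias
    print("M values:", m)                          # constant offset, i.e. poor agreement
    print("limits of agreement:", limits_of_agreement(m))
```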

Class comparison
Identification of genes differentially expressed between treated and untreated cell lines
– t-tests (adjusting for multiple comparisons)
  – all arrays
  – only pooled arrays
  – only individual arrays
– ANOVA (linear) model
  – estimation of the treatment effect, adjusting for pool effect and week effect
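A minimal sketch of the t-test step: a pooled-variance (equal-variance) two-sample t-test for every gene, followed by a Bonferroni adjustment. It assumes a genes × arrays matrix of log2 expression values and a boolean vector marking the treated arrays, uses SciPy's `scipy.stats.ttest_ind` with `equal_var=True`, and the 0.05 threshold and helper name are illustrative.

```python
import numpy as np
from scipy import stats


def pooled_t_tests(expr, treated, alpha=0.05):
    """Hypothetical helper: per-gene pooled-variance t-tests, treated vs control.

    expr:    genes x arrays matrix of log2 expression values
    treated: boolean vector, True for treated arrays
    Returns t statistics, raw p-values, Bonferroni-adjusted p-values and
    a boolean mask of genes significant after adjustment.
    """
    expr = np.asarray(expr, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    t, p = stats.ttest_ind(expr[:, treated], expr[:, ~treated], axis=1, equal_var=True)
    # Bonferroni: multiply by the number of genes tested, capped at 1.
    p_bonf = np.minimum(p * expr.shape[0], 1.0)
    return t, p, p_bonf, p_bonf < alpha
```

Restricting the columns of `expr` (and `treated`) to the pooled arrays only, or to the individual arrays only, gives the other two analyses listed above.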

Some results...
Pooled-variance t-test on the whole data, treated versus controls:
– chip A: 1948 with p<0.001; […]2); 240 after Bonferroni correction
– chip B: 743 with p<0.001; […]2); 76 after Bonferroni correction

Some results...
Pooled-variance t-test on the pooled data, treated versus controls:
– chip A
  – 204 with p<0.001: 189 in common with the 1948 of the overall analysis
  – 82 with p[…]2: 82 in common with the 356 of the overall analysis
– chip B
  – 80 with p<0.001: 69 in common with the 743 of the overall analysis
  – 37 with p[…]2: 37 in common with the 143 of the overall analysis

Some results...
Pooled-variance t-test on the individual data, treated versus controls:
– chip A
  – 669 with p<0.001: 594 in common with the 1948 of the overall analysis
  – 226 with p[…]2: 221 in common with the 356 of the overall analysis
– chip B
  – 245 with p<0.001: 196 in common with the 743 of the overall analysis
  – 80 with p[…]2: 77 in common with the 143 of the overall analysis

Some results...
Pooled-variance ANOVA results, treated versus controls:
– chip A
  – 1913 with p<0.001: 1624 in common with the 1948 of the overall analysis
  – 343 with p[…]2: 339 in common with the 356 of the overall analysis
– chip B
  – 245 with p<0.001: 196 in common with the 743 of the overall analysis
  – 80 with p[…]2: 77 in common with the 143 of the overall analysis
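For a single gene, the ANOVA (linear) model of the class comparison can be written as y = μ + treatment + pool + week + error, with the treatment coefficient estimated while adjusting for the other two factors. The least-squares sketch below uses NumPy and a hypothetical helper name. It assumes, since the slides do not say how the factors were coded, a 0/1 treatment indicator, a 0/1 indicator for pooled versus individual samples, and one indicator per week after the first.

```python
import numpy as np


def treatment_effect(y, treated, pooled, week):
    """Hypothetical helper: least-squares treatment effect for one gene.

    y:       log2 expression values, one per array
    treated: 0/1 indicator of treatment
    pooled:  0/1 indicator of a pooled (vs individual) sample (assumed coding)
    week:    week label per array; the first week is the baseline
    Returns the treatment coefficient and its standard error.
    """
    y = np.asarray(y, dtype=float)
    week = np.asarray(week)
    cols = [np.ones_like(y), np.asarray(treated, dtype=float), np.asarray(pooled, dtype=float)]
    cols += [(week == w).astype(float) for w in np.unique(week)[1:]]
    X = np.column_stack(cols)
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    df = len(y) - X.shape[1]
    if rank < X.shape[1] or df < 1:
        raise ValueError("design is not estimable: check rank and residual df")
    resid = y - X @ beta
    sigma2 = resid @ resid / df  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[1]), float(np.sqrt(cov[1, 1]))
```

Applied gene by gene, the resulting t statistics (coefficient divided by standard error, on `df` degrees of freedom) would then need the same multiple-comparison adjustment as the t-tests.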

Conclusions?
– So far, no evidence for the usefulness of pooling samples from cell lines: no evidence of decreased variability
– The differences between the individual and pooled results need further investigation
– A plan of biological (quantitative) validation of the expression measures is needed