High-dimensional data analysis: Microarrays and multiple testing
Mark van de Wiel (1, 2)
1. Dep. of Mathematics, VU University Amsterdam
2. Dep. of Biostatistics & Dep. of Pathology, VU University medical center, Amsterdam

Genomics: a short history (1)
Some history
1. Watson & Crick: double helix structure of DNA (1953)
Source:

Genomics: a short history (2)
2. Human Genome Project: identification of all human genes
THE WHITE HOUSE, Office of the Press Secretary, For Immediate Release June 25, 2000
PRESIDENT CLINTON ANNOUNCES THE COMPLETION OF THE FIRST SURVEY OF THE ENTIRE HUMAN GENOME
Hails Public and Private Efforts Leading to This Historic Achievement
June 26, 2000: Today, at a historic White House event with British Prime Minister Tony Blair, President Clinton announced that the international Human Genome Project and Celera Genomics Corporation have both completed an initial sequencing of the human genome -- the genetic blueprint for human beings.

Genomics: a short history (3)
3a. DNA hybridisation discovered
3b. Introduction of robotics (Hoheisel et al.)
3c. First microarray publication (Schena et al.)
3d. First whole-genome microarray experiments (De Risi et al.)
3e. First publication on microarrays for cancer classification (Golub et al.): leukemia, Affymetrix arrays

Central dogma
1. DNA is the same in each cell (tumours are an exception).
2. The function of a cell is determined by its proteins.
3. The path from DNA to proteins goes via messenger RNA (mRNA).
4. DNA is transcribed into mRNA according to the needs of the cell.
5. mRNA contains the instructions for which proteins to build.
Microarrays measure the amount of mRNA: DNA → mRNA → protein

Microarrays (1) Source:

Microarrays (2)
1. Isolation of mRNA (single-stranded RNA transcribed from the genes)
2. Labeling with a coloured (fluorescent) dye
3. The chip contains probes that uniquely correspond to genes
4. Hybridization to the chip
5. A laser reads the labeled molecules
6. Image analysis converts the colours into numbers (intensities)
7. Result: a data matrix with two intensities (one per dye) for each probe on each array

The result
The number of rows is determined by the number of probes (which exceeds the number of genes).
More genes than samples: the high-dimensional setting.

Statistical issues before data analysis
1. Design of the experiment (not discussed)
2. Quality control (not discussed)
3. Normalization
Data are visualized by an MA plot. The use of different dyes (colours) may lead to a non-linear dye bias, which must be removed because it is artificial.
$M = \log_2(R/G) = \log_2 R - \log_2 G$
$A = \log_2(R \cdot G) = \log_2 R + \log_2 G$
where R and G are the red and green channel intensities of a spot.
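A minimal sketch of the M and A transformation, using the slide's definitions. The function name `ma_values` and the example intensities are hypothetical. Many references define A with an extra factor ½ (the mean log-intensity); that only rescales the horizontal axis of the MA plot and is noted in a comment rather than applied.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    Follows the slide: M = log2(R) - log2(G), A = log2(R) + log2(G).
    (Many texts use A = (log2(R) + log2(G)) / 2 instead; same shape, rescaled axis.)
    """
    M = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    A = [math.log2(r) + math.log2(g) for r, g in zip(red, green)]
    return M, A


# Hypothetical intensities for three spots
M, A = ma_values([400.0, 100.0, 250.0], [100.0, 400.0, 250.0])
print(M)  # 4-fold up, 4-fold down, no change
print(A)
```

Plotting M against A for all spots gives the MA plot in which a curved trend reveals the non-linear dye bias.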

Normalization
Purpose: remove artificial dye effects to obtain unbiased M values.
Most popular method: loess (local regression).
Assumption: the mean M value equals 0 in every intensity range.
Algorithm:
1. Sort the A values: $A'_1, \dots, A'_p$, and let $M'_i$ be the M value of the spot with A value $A'_i$.
2. For each $A'_i$, take the window $W_i = [A'_i - L,\, A'_i + L]$.
3. Within each $W_i$, fit the linear regression $M = a_i + b_i A + \varepsilon$.
4. Predict $M'_i(\text{pred}) = a_i + b_i A'_i$.
5. Subtract $M'_i(\text{pred})$ from $M'_i$.
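Steps 1 to 5 can be sketched in plain Python. This follows the slide's simplified version: an unweighted least-squares line in a fixed window of half-width L around each A value. Full loess (for example R's `loess()`) additionally down-weights distant points with a tricube kernel, so treat this as an illustration of the idea rather than a production normalization. The function name `window_normalize` and the value of L are hypothetical.

```python
def window_normalize(A, M, L):
    """Remove intensity-dependent dye bias from M values (windowed linear regression).

    A, M: equal-length sequences, one entry per spot.
    L: half-width of the window on the A scale (hypothetical tuning choice).
    Returns normalized M values in the original spot order.
    """
    n = len(A)
    # Step 1: sort spots by A, keeping track of the original positions.
    order = sorted(range(n), key=lambda j: A[j])
    A_sorted = [A[j] for j in order]
    M_sorted = [M[j] for j in order]
    normalized = [0.0] * n
    for i in range(n):
        centre = A_sorted[i]
        # Step 2: window W_i = [A'_i - L, A'_i + L] (always contains spot i itself).
        idx = [k for k in range(n) if abs(A_sorted[k] - centre) <= L]
        xs = [A_sorted[k] for k in idx]
        ys = [M_sorted[k] for k in idx]
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        # Step 3: ordinary least squares M = a + b*A within the window.
        b = sxy / sxx if sxx > 0 else 0.0
        a = y_bar - b * x_bar
        # Step 4: predicted M at A'_i.
        m_pred = a + b * centre
        # Step 5: subtract the prediction; store at the spot's original position.
        normalized[order[i]] = M_sorted[i] - m_pred
    return normalized


A = [3.0, 1.0, 2.0, 5.0, 4.0]
M = [0.5 + 0.1 * a for a in A]          # purely linear dye bias
print(window_normalize(A, M, L=1.5))    # values close to 0
```

A larger L gives a smoother correction curve; a window that contains only one spot returns 0 for that spot. The double loop is quadratic in the number of spots, which is fine for a sketch but not for a full array.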

Loess
[Figure: panels "Before" and "After" normalization]

After normalization
Log2-ratios are used for further analysis. Taking ratios cancels out the experimental spot effect; taking logs gives a symmetric scale. However, log-intensities (of both dyes) are nowadays used more and more often.
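Both reasons can be written out in two lines, assuming (this is not stated on the slide) that the spot effect acts as a common multiplicative factor s on both channels of a spot, with r and g denoting hypothetical spot-effect-free red and green intensities:

```latex
\begin{aligned}
R = s\,r,\; G = s\,g \;&\Longrightarrow\; \log_2\frac{R}{G} = \log_2\frac{s\,r}{s\,g} = \log_2\frac{r}{g} && \text{(spot effect cancels)}\\
\log_2\frac{r}{g} &= -\log_2\frac{g}{r}, \quad \text{e.g. } \log_2 4 = 2,\ \log_2\tfrac{1}{4} = -2 && \text{(symmetric scale)}
\end{aligned}
```

On the raw ratio scale, a 4-fold increase (4) and a 4-fold decrease (0.25) lie at very different distances from 1; after the log they are equally far from 0.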

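Data: type of response
Nominal, e.g. tumor type: $R = \{\text{benign}, \text{malignant}\}$
Ordinal, e.g. stage of a tumor: $R = \{1, 2, 3, 4\}$
Continuous, e.g. disease severity score: $R = \mathbb{R}^+$
Censored, e.g. survival: $R = \mathbb{R}^+ \times \{0, 1\}$ (survival time and event indicator)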

Typical data analyses for microarrays (1)
Multivariate
Unsupervised: clustering, principal component analysis
Classification (statistical learning, discriminant analysis, supervised clustering)
Multivariate regression with a penalty for overfitting (e.g. lasso / ridge regression)
Prognostic multivariate survival models

Typical data analyses for microarrays (2)
Univariate
Inference (hypothesis testing): the expression of each gene is related to the clinical response using, for example,
– ANOVA
– linear regression
– Cox regression (survival)
– permutation (nonparametric) tests
Hybrid
Inference for sets of genes that are functionally related
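Of the univariate tests listed, the permutation test is easy to write from scratch. A minimal sketch for one gene and two groups of samples, assuming the absolute difference in group means as test statistic (a t-statistic is an equally common choice); the function name, the number of permutations and the seed are hypothetical. The group labels are reshuffled repeatedly, and the p-value is the fraction of relabellings whose statistic is at least as extreme as the observed one:

```python
import random


def permutation_pvalue(x, y, n_perm=2000, seed=1):
    """Two-sided permutation p-value for a difference in group means (one gene)."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the samples, valid under H0
        gx, gy = pooled[:n_x], pooled[n_x:]
        stat = abs(sum(gx) / len(gx) - sum(gy) / len(gy))
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


tumour = [5.1, 5.3, 5.0, 5.2, 5.4]   # hypothetical expression values
normal = [1.0, 1.2, 0.9, 1.1, 1.3]
print(permutation_pvalue(tumour, normal))
```

For a microarray, the function is applied to each row of the data matrix, giving one raw p-value per gene.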

Two-step ANOVA (1)
(1) is the normalization model; genes enter it only through the residual u, i.e. u contains all gene-specific effects.
(2) is the differential expression model.
Indices: a = array; c = condition; d = dye; g = gene.

Two-step ANOVA (2)
Use of the two-step ANOVA: first fit (1) on all data, then estimate the residuals u for each gene, then fit (2) for each gene separately.
Main advantage over a one-step model: computational. A one-step model would require fitting many parameters simultaneously in a single ANOVA.
Raw p-values are computed as in a standard ANOVA.
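The model equations (1) and (2) are not reproduced in this transcript, so the sketch below uses a deliberately simplified, hypothetical pair of models for single-channel log-intensities $y_{ag}$ without dye effects. Step 1 fits $y_{ag} = \mu + A_a + u_{ag}$ on all data, which amounts to subtracting each array's mean; step 2 fits, for each gene separately, a one-way ANOVA of the residuals $u_{ag}$ on condition. It illustrates the computational point: step 2 is a small, independent fit per gene. The function returns F-statistics only; raw p-values would follow from the F distribution with $(k-1, n-k)$ degrees of freedom, e.g. via `scipy.stats.f.sf`.

```python
def two_step_anova(y, condition):
    """y[a][g]: log-intensity of gene g on array a; condition[a]: label of array a.

    Hypothetical simplified two-step model; requires more arrays than conditions.
    Returns one F statistic per gene.
    """
    n_arrays, n_genes = len(y), len(y[0])
    # Step 1 (normalization model on all data): y_ag = mu + A_a + u_ag.
    # The least-squares estimate of mu + A_a is the array mean over all genes.
    residuals = []
    for row in y:
        array_mean = sum(row) / n_genes
        residuals.append([v - array_mean for v in row])
    labels = sorted(set(condition))
    k = len(labels)
    df_b, df_w = k - 1, n_arrays - k
    f_stats = []
    for g in range(n_genes):
        # Step 2 (per-gene model): one-way ANOVA of u_ag on condition.
        groups = {c: [] for c in labels}
        for a in range(n_arrays):
            groups[condition[a]].append(residuals[a][g])
        values = [v for vs in groups.values() for v in vs]
        grand = sum(values) / len(values)
        ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in groups.values())
        ss_within = sum((v - sum(vs) / len(vs)) ** 2 for vs in groups.values() for v in vs)
        f_stats.append((ss_between / df_b) / (ss_within / df_w) if ss_within > 0 else float("inf"))
    return f_stats


# Hypothetical data: 4 arrays, 4 genes; genes 0 and 1 respond to treatment.
base, offsets = [5.0, 6.0, 7.0, 8.0], [0.0, 0.4, -0.3, 0.2]
cond = ["ctrl", "ctrl", "trt", "trt"]
effect = [1.0, -1.0, 0.0, 0.0]
noise = [[0.1, -0.1, 0.1, -0.1], [-0.1, 0.1, -0.1, 0.1],
         [0.1, -0.1, 0.1, -0.1], [-0.1, 0.1, -0.1, 0.1]]
y = [[base[g] + offsets[a] + (effect[g] if cond[a] == "trt" else 0.0) + noise[a][g]
      for g in range(4)] for a in range(4)]
print(two_step_anova(y, cond))
```

The array offsets are removed in step 1, so only genes 0 and 1 show a large F statistic.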

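Multiple Testing, Motivation
[Figure: histogram of p-values generated under $H_0$]
Even when all null hypotheses are true, we expect $m \times 0.05$ p-values smaller than $\alpha = 0.05$ when $m$ genes are tested, because p-values are uniformly distributed under $H_0$.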
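A quick simulation makes this expectation concrete. Assuming continuous test statistics, p-values are uniform on [0, 1] under $H_0$, so drawing uniform random numbers mimics a study in which no gene is differentially expressed. The number of genes m = 10,000, the function name and the seed are hypothetical choices:

```python
import random


def count_small_pvalues_under_null(m, alpha=0.05, seed=2024):
    """Simulate m p-values under H0 (uniform on [0, 1)) and count those below alpha."""
    rng = random.Random(seed)
    pvalues = [rng.random() for _ in range(m)]
    return sum(p < alpha for p in pvalues)


m = 10_000  # hypothetical number of genes tested
false_hits = count_small_pvalues_under_null(m)
print(false_hits, "p-values below 0.05; expected about", m * 0.05)
```

With m = 10,000, roughly 500 genes would be declared significant at α = 0.05 although none is truly differentially expressed.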

Multiple Testing. Illustration of Benjamini-Hochberg procedure
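The Benjamini-Hochberg step-up procedure controls the false discovery rate at level q. A minimal sketch in plain Python (function name and example p-values are hypothetical): sort the m p-values, find the largest rank k with $p_{(k)} \le kq/m$, and reject the hypotheses with the k smallest p-values. It also returns BH-adjusted p-values, $\min_{j \ge i} \min(1, m\,p_{(j)}/j)$, which are at most q exactly for the rejected hypotheses.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return (rejected, adjusted) for the BH step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Step-up rule: largest rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    # Adjusted p-values: running minimum of m * p_(j) / j, from the largest p down.
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, m * pvalues[i] / rank)
        adjusted[i] = running_min
    return rejected, adjusted


rejected, adjusted = benjamini_hochberg([0.001, 0.03, 0.032, 0.6], q=0.05)
print(rejected)  # [True, True, True, False]
print(adjusted)
```

In the example, the p-value 0.03 (rank 2) exceeds its own threshold 2q/m = 0.025 but is still rejected because rank 3 passes; that is what makes the procedure step-up. Library versions exist, e.g. `p.adjust(p, method = "BH")` in R or `multipletests(..., method="fdr_bh")` in statsmodels.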

Multiple Testing M