Microarray Data Analysis Stuart M. Brown NYU School of Medicine.

What is a Microarray?
A simple concept: Dot Blot + Northern
Reverse the hybridization: put the probes on the filter and label the bulk RNA
Make probes for lots of genes: a massively parallel experiment
Make it tiny so you don’t need so much RNA from your experimental cells
Make quantitative measurements

A Filter Array

DNA Chip Microarrays
Put a large number (~100K) of cDNA sequences or synthetic DNA oligomers onto a glass slide (or other substrate) in known locations on a grid
Label an RNA sample and hybridize
Measure amounts of RNA bound to each square in the grid
Make comparisons
–Cancerous vs. normal tissue
–Treated vs. untreated
–Time course
Many applications in both basic and clinical research

cDNA Microarray Technologies
Spot cloned cDNAs onto a glass microscope slide
–usually PCR amplified segments of plasmids
Label 2 RNA samples with 2 different colors of fluorescent dye: control vs. experimental
Mix the two labeled RNAs and hybridize to the chip
Make two scans, one for each color
Combine the images to calculate ratios of the amounts of each RNA that bind to each spot
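The ratio calculation in the last step can be sketched as follows. This is a minimal illustration, not the method of any particular scanner package: the gene names and intensities are made up, and the `floor` value that keeps the logarithm defined is an assumption. Log2 ratios are used so that a 4-fold increase (+2) and a 4-fold decrease (-2) are symmetric around zero.

```python
import math

def log2_ratio(experimental, control, floor=1.0):
    """Return log2(experimental / control) for one spot.

    Intensities below `floor` are raised to `floor` so the ratio and the
    logarithm stay defined (the floor value is an assumption).
    """
    e = max(experimental, floor)
    c = max(control, floor)
    return math.log2(e / c)

# Hypothetical background-corrected intensities: (experimental, control).
spots = {
    "geneA": (2000.0, 500.0),
    "geneB": (300.0, 1200.0),
    "geneC": (800.0, 800.0),
}

ratios = {gene: log2_ratio(e, c) for gene, (e, c) in spots.items()}
for gene, m in ratios.items():
    print(f"{gene}: log2 ratio = {m:+.2f}")
```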

Spot your own Chip (plans available for free from Pat Brown’s website)
Robot spotter
Ordinary glass microscope slide

Combine scans for Red & Green
False color image is made from digitized fluorescence data, not by superimposing scanned images

cDNA Spotted Microarrays

Affymetrix “Gene chip” system
Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
RNA labeled and scanned in a single “color”
–one sample per chip
Can have as many as 20,000 genes on a chip
Arrays get smaller every year (more genes)
Chips are expensive
Proprietary system: “black box” software, can only use their chips

Affymetrix Gene Chip

Affymetrix Technology

“Long Oligos”
Like cDNAs, but instead of using a cloned gene, design a base probe to represent each gene
Relies on genome sequence database and bioinformatics
Reduces cross hybridization
Cheaper and possibly more sensitive than Affy. system

Data Acquisition
Scan the arrays
Quantitate each spot
Subtract background
Normalize
Export a table of fluorescent intensities for each gene in the array
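The background subtraction and export steps can be sketched as below. This is a simplified illustration under stated assumptions: the spot records, field names and tab-separated layout are hypothetical, and negative corrected values are simply clipped to zero, which real image-analysis software may handle differently.

```python
import csv
import io

def background_correct(foreground, background, floor=0.0):
    """Subtract local background from a spot's foreground intensity.

    Negative results are clipped to `floor` (an assumption, not a standard).
    """
    return max(foreground - background, floor)

# Hypothetical quantitated spots, one record per gene.
raw_spots = [
    {"gene": "geneA", "foreground": 1520.0, "background": 120.0},
    {"gene": "geneB", "foreground": 640.0, "background": 110.0},
    {"gene": "geneC", "foreground": 90.0, "background": 110.0},
]

def export_table(spots):
    """Return a tab-separated table of corrected intensities as text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "intensity"])
    for spot in spots:
        corrected = background_correct(spot["foreground"], spot["background"])
        writer.writerow([spot["gene"], f"{corrected:.1f}"])
    return buffer.getvalue()

table = export_table(raw_spots)
print(table)
```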

Automate!!
All of this can be done automatically by software
Much more consistent
Mistakes will be made (especially in the spot quantitation), but you can’t manually check hundreds of thousands of spots

Affymetrix Software
Affymetrix System is totally automated
Computes a single value for each gene from 40 probes (using surprisingly kludgy math)
Highly reproducible (re-scan of same chip or hyb. of duplicate chips with same labeled sample gives very similar results)
Incorporates false results due to image artefacts
–dust, bubbles
–pixel spillover from bright spot to neighboring dark spots

Basic Data Analysis
Fold change (relative increase or decrease in intensity for each gene)
Set cutoff filter for low values (background + noise)
Cluster genes by similar changes: only really meaningful across multiple treatments or time points
Cluster samples by similar gene expression profiles
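Fold change with a low-intensity cutoff can be sketched as follows. The intensities and the cutoff of 100 are hypothetical. Two design choices here are assumptions rather than a standard: genes below the cutoff in both samples are dropped, and a single low value is raised to the cutoff so that near-background readings do not inflate the ratio. Decreases are reported as negative fold changes.

```python
def signed_fold_change(control, treated):
    """Fold change as a signed ratio: +3 means 3-fold up, -4 means 4-fold down."""
    if treated >= control:
        return treated / control
    return -control / treated

def filtered_fold_changes(control, treated, cutoff=100.0):
    """Return {gene: signed fold change}, skipping genes that are below
    `cutoff` in both samples and raising single low values to `cutoff`."""
    changes = {}
    for gene, c in control.items():
        t = treated[gene]
        if c < cutoff and t < cutoff:
            continue  # indistinguishable from background + noise
        changes[gene] = signed_fold_change(max(c, cutoff), max(t, cutoff))
    return changes

# Hypothetical normalized intensities.
control = {"g1": 500.0, "g2": 1000.0, "g3": 20.0, "g4": 50.0}
treated = {"g1": 1500.0, "g2": 250.0, "g3": 60.0, "g4": 400.0}

changes = filtered_fold_changes(control, treated)
for gene, fc in changes.items():
    print(f"{gene}: {fc:+.1f}-fold")
```

Without the cutoff, g3 would be reported as a 3-fold increase even though both readings sit in the noise.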

Scatter plot of all genes in a simple comparison of two controls (A) and two treatments (B: high vs. low glucose), showing changes in expression greater than 2.2-fold and 3-fold.

Cluster by color difference

Microarray Data Variability
Microarray data are inherently highly variable: you are measuring mRNA levels
Any kind of measurement of thousands of values across 2 samples will find some large differences due to chance (normal distribution)
Must have replication and statistics to show that differences are real
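A small simulation makes the point about chance differences concrete. It assumes, purely for illustration, that no gene changes at all and that each measurement carries independent normal noise with a standard deviation of 0.5 on the log2 scale; the gene count, noise level and seed are arbitrary choices.

```python
import random

def count_chance_hits(n_genes=10000, noise_sd=0.5, threshold=1.0, seed=0):
    """Count genes whose two noisy log2 measurements differ by at least
    `threshold` (1.0 = 2-fold) even though the true expression is identical."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        sample_a = rng.gauss(0.0, noise_sd)  # true log2 level is 0 in both samples
        sample_b = rng.gauss(0.0, noise_sd)
        if abs(sample_a - sample_b) >= threshold:
            hits += 1
    return hits

hits = count_chance_hits()
print(f"{hits} of 10000 unchanged genes show an apparent 2-fold difference")
```

Under these assumptions roughly 16% of unchanged genes cross the 2-fold line in a single pairwise comparison, which is why one comparison of two samples cannot separate real changes from noise.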

Sources of Variability
Image analysis (identifying and quantitating each spot on the array)
Scanning (laser and detector, chemistry of the fluorescent label)
Hybridization (temperature, time, mixing, etc.)
Probe labeling
RNA extraction
Biological variability

Normalization
Can control for many of the experimental sources of variability (systematic, not random or gene specific)
Bring each image to the same average brightness
Can use simple math or fancy:
–divide by the mean (whole chip or by sectors)
–LOESS (locally weighted regression)
No sure biological standards
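The "simple math" option, bringing each array to the same average brightness, can be sketched as below. The chip names and intensities are hypothetical, and scaling to the mean of the per-chip means is one choice among several. LOESS is not shown.

```python
def scale_to_common_mean(arrays):
    """Rescale every array so that its mean intensity equals the average
    of all the per-array means (whole-chip normalization)."""
    means = {name: sum(values) / len(values) for name, values in arrays.items()}
    target = sum(means.values()) / len(means)
    return {
        name: [x * target / means[name] for x in values]
        for name, values in arrays.items()
    }

# Hypothetical intensities: chip2 was scanned twice as bright as chip1.
arrays = {
    "chip1": [100.0, 200.0, 300.0],
    "chip2": [200.0, 400.0, 600.0],
}

normalized = scale_to_common_mean(arrays)
for name, values in normalized.items():
    print(name, [round(v, 1) for v in values])
```

After scaling, both chips have a mean of 300 and identical values, so the overall brightness difference no longer looks like a 2-fold change in every gene.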

Real Differences?
Spots with low intensity will show much greater percent variability than bright spots
–Background and machine variability represent a much larger fraction of the total measurement
Fold change is often much greater for low intensity samples (absolute amount of RNA is small)
If you normalize by dividing all samples by the mean, then genes that express at this level will have their variation suppressed

Thomas Hudson, Montreal Genome Center

Multiple Comparisons
In a microarray experiment, each gene (each probe or probe set) is really a separate experiment
You can’t look at a set of microarray data and ask if the overall average gene expression is different between two treatments
Yet if you treat each gene as an independent comparison, you will always find some with significant differences
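To put a number on this: testing 20,000 genes at p < 0.05 is expected to produce about 1,000 "significant" genes when nothing has changed. The slides do not name a correction method; the sketch below shows two common ones, Bonferroni and Benjamini-Hochberg, as an illustration on a made-up list of p-values.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag p-values that pass the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Flag p-values significant under the Benjamini-Hochberg procedure,
    which controls the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most (k / m) * alpha.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff_rank = rank
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant

# Hypothetical per-gene p-values.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

uncorrected = sum(p < 0.05 for p in p_values)
print("uncorrected:", uncorrected)
print("Bonferroni:", sum(bonferroni(p_values)))
print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))
```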

Gene-Specific Variability
Different probes will hybridize to mRNAs with different efficiency
–microarrays can only measure relative change of expression, not absolute levels
Cross-hybridization
–Gene families
–Chance similarity of short oligo sequence
Affy mis-match >> perfect match for many probes
Different Affy probes for the same gene show huge differences in hybridization intensity
Alternative splicing!!

Statistics
When you have variability in measurements, you need replication and statistics to find real differences
It’s not just the genes with 2 fold increase, but those with a significant p-value across replicates
Non-parametric (i.e. rank) or paired value statistics may be more appropriate
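One non-parametric option is an exact permutation test on the replicates of a single gene. The sketch below is an illustration, not a method named in the slides: it uses the difference in means as the test statistic, and the log2 intensities are made up. It enumerates every way of splitting the pooled values into two groups, so it is only practical for small numbers of replicates.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_difference(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_difference(sum(group_a)))
    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        split_sum = sum(pooled[i] for i in indices)
        # Small tolerance guards against floating-point rounding.
        if abs(mean_difference(split_sum)) >= observed - 1e-12:
            extreme += 1
    return extreme / splits

# Hypothetical log2 intensities for one gene, 4 replicates per condition.
control = [7.1, 7.3, 6.9, 7.0]
treated = [8.4, 8.6, 8.2, 8.5]
p_clear = permutation_p_value(control, treated)

# A second hypothetical gene whose replicates overlap heavily.
noisy_control = [7.1, 8.0, 6.9, 7.6]
noisy_treated = [7.4, 7.9, 7.2, 7.7]
p_noisy = permutation_p_value(noisy_control, noisy_treated)

print(f"clear gene: p = {p_clear:.4f}")
print(f"noisy gene: p = {p_noisy:.4f}")
```

With 4 replicates per group there are only 70 possible splits, so the smallest achievable two-sided p-value is 2/70. That floor is one reason more replicates matter.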

Experimental Design
Real replicates! (same treatment, same biological source, different RNA prep, labeling, hybridization, and scanning)
Dye reversal for two color hybs.
Block design (don’t do exp. on one day and control on another)
Work with a Statistician!!
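Why dye reversal helps can be shown with a small calculation. The sketch assumes a simple additive model on the log2 scale, where each measured log ratio is the true effect plus a gene-specific dye bias; the model and the numbers are illustrative assumptions. Swapping the dyes flips the sign of the true effect but not of the bias, so the two can be separated.

```python
def dye_swap_estimate(m_forward, m_reversed):
    """Separate true log2 change from dye bias using a dye-swap pair.

    m_forward:  log2(red / green) with the experimental sample labeled red
    m_reversed: log2(red / green) with the experimental sample labeled green
    Assumed model: m_forward = effect + bias, m_reversed = -effect + bias.
    """
    effect = (m_forward - m_reversed) / 2.0
    bias = (m_forward + m_reversed) / 2.0
    return effect, bias

# Hypothetical log ratios for one gene on the two arrays of a dye-swap pair.
effect, bias = dye_swap_estimate(1.3, -0.7)
print(f"estimated log2 change = {effect:.2f}, dye bias = {bias:.2f}")
```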

Higher Level Microarray Data Analysis
Clustering and pattern detection
Data mining and visualization
Controls and normalization of results
Statistical validation
Linkage between gene expression data and gene sequence/function/metabolic pathways databases
Discovery of common sequences in co-regulated genes
Meta-studies using data from multiple experiments

Types of Clustering
Hierarchical
–Link similar genes, build up to a tree of all
Self Organizing Maps (SOM)
–Split all genes into similar sub-groups
–Finds its own groups (machine learning)
Principal Component
–every gene is a dimension (vector), find a single dimension that best represents the differences in the data
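Hierarchical clustering, "link similar genes, build up to a tree of all", can be sketched in a few lines. This is a minimal illustration: the choice of 1 minus Pearson correlation as the distance and of average linkage are assumptions, the expression profiles are made up, and profiles are assumed not to be constant (a constant profile would give a zero denominator).

```python
import math

def pearson_distance(x, y):
    """Distance between two expression profiles: 1 - Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)

def hierarchical_cluster(profiles):
    """Average-linkage agglomerative clustering.

    Returns the merges in order as (members_a, members_b, distance);
    read from first to last, the merges describe the tree bottom-up.
    """
    clusters = [[name] for name in profiles]
    merges = []

    def linkage(a, b):
        dists = [pearson_distance(profiles[g], profiles[h]) for g in a for h in b]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical expression profiles across four time points.
profiles = {
    "up1": [1.0, 2.0, 3.0, 4.0],
    "up2": [2.0, 4.0, 6.0, 8.5],
    "down1": [4.0, 3.0, 2.0, 1.0],
    "down2": [8.0, 6.0, 4.5, 2.0],
}

merges = hierarchical_cluster(profiles)
for members_a, members_b, distance in merges:
    print(f"merge {members_a} + {members_b} at distance {distance:.3f}")
```

Because the distance is correlation-based, genes group by the shape of their profile rather than by absolute intensity, so `up1` and `up2` join even though their levels differ.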

Microarray Databases
Large experiments may have hundreds of individual array hybridizations
Core lab at an institution or multiple investigators using one machine: data archive and validate across experiments
Data-mining: look for similar patterns of gene expression across different experiments

Public Databases
Gene Expression data is an essential aspect of annotating the genome
Publication and data exchange for microarray experiments
Data mining/Meta-studies
Common data format: XML
MIAME (Minimal Information About a Microarray Experiment)

GEO at the NCBI

ArrayExpress at EMBL