Lecture 22 Introduction to Microarray

CS 5263 Bioinformatics

Outline
What is a microarray?
Basic categories of microarrays
How microarrays can be used
Computational and statistical methods involved in microarray analysis:
  Probe design
  Image processing
  Pre-processing
  Differentially expressed gene identification
  Clustering / classification
  Network / pathway modeling

Gene expression
Reverse transcription (in the lab) converts mRNA into DNA; the product is called cDNA.
Genes have different activities at different times and locations.
DNA microarrays measure gene transcription (the amount of mRNA) in a high-throughput fashion, as a surrogate for gene activity.

Northern blot (an old technique for measuring mRNA expression)
1. mRNA is extracted and purified.
2. mRNA is loaded for electrophoresis. Lane 1: size standards. Lane 2: RNA to be tested.
3. The gel is charged and the RNA "swims" through the gel (from the - end toward the + end) according to its size.
4. The mRNA is transferred from the gel to a membrane.
5. A labeled probe specific for the RNA fragment is incubated with the blot, so the RNA of interest can be detected by hybridization.
Needs a relatively large amount of mRNA.
Source: http://www.escience.ws/b572/L13/north.html

RT-PCR (reverse transcription-polymerase chain reaction)
RNA is reverse transcribed to DNA.
PCR is then used to amplify the DNA at an exponential rate.
The amplified product is quantified on a gel: a semi-quantitative method.
A smaller amount of sample is needed.
See the animation of RT-PCR: http://www.bio.davidson.edu/courses/Immunology/Flash/RT_PCR.html

Real-time RT-PCR
The PCR amplification can be monitored by fluorescence in "real time".
The fluorescence values recorded in each cycle represent the amount of amplified product: a quantitative method.
Currently the most advanced and accurate method for measuring mRNA abundance.
Often used to validate microarray results.
http://www.ambion.com/techlib/basics/rtpcr/

Limitations of the old techniques
Labor intensive.
Can only detect up to dozens of genes (gene-by-gene analysis).

What is a microarray
Conceptually similar to a (reverse) Northern blot.
(Many) probes, rather than mRNAs, are fixed on a surface in an ordered way.
(Figure: spots labeled by gene, e.g. Gene 102, Gene 305.)

What is a microarray (2)
A 2D array of DNA sequences from thousands of genes.
Each spot has many copies of the same gene sequence (the probe).
mRNAs from a sample are allowed to hybridize to the probes.
The amount of hybridization at each spot is measured.

Goals of a microarray experiment
Find the genes that change expression between experimental and control samples.
Classify samples based on a gene expression profile.
Find patterns: groups of biologically related genes that change expression together across samples/treatments.

Microarray categories
cDNA microarray
  Each probe is the cDNA of a gene (hundreds to thousands of bp).
  Stanford, Brown Lab.
Oligonucleotide microarray
  Each probe is a short synthesized DNA (uniquely corresponding to a substring of a gene).
  Affymetrix: ~25-mers
  Agilent: ~60-mers
  Others

Spotted cDNA microarray

Array manufacturing
Each tube contains cDNAs corresponding to a unique gene.
The cDNAs are pre-amplified and spotted onto a glass slide.

Experiment (figure: two samples labeled with Cy3 and Cy5)

Data acquisition
Computer programs are used to process the image into digital signals.
Segmentation: determine the boundary between signal and background.
Results: gene expression ratios between two samples.

cDNA Microarray Methodology Animation

Affymetrix GeneChip®

Array design
25-mer unique oligos.
A mismatch probe differs from its perfect-match partner at the middle nucleotide.
Multiple probes (11-16) for each gene.
(Figure from Affymetrix Inc.)

Array manufacturing
In situ synthesis of oligonucleotides.
Technology adapted from the semiconductor industry (photolithography and combinatorial chemistry).
(Figure from Affymetrix Inc.)

GeneChip® Probe Arrays
(Figure labels: GeneChip probe array, 1.28 cm; hybridized probe cell, 24 µm; millions of copies of a specific oligonucleotide probe; >200,000 different complementary probes; single-stranded, labeled RNA target; image of hybridized probe array.)

Overview of the Affymetrix GeneChip technology
Each probe set combines to give an absolute expression level.
Image segmentation is relatively easy.
How to use the MM signal, however, is debatable.
(Figure from Affymetrix Inc.)

Comparison of cDNA array and GeneChip
Probe preparation. cDNA: probes are cDNA fragments, usually amplified by PCR and spotted by robot. GeneChip: probes are short oligos synthesized using a photolithographic approach.
Colors. cDNA: two-color (measures relative intensity). GeneChip: one-color (measures absolute intensity).
Gene representation. cDNA: one probe per gene. GeneChip: 11-16 probe pairs per gene.
Probe length. cDNA: long, varying lengths (hundreds to 1K bp). GeneChip: 25-mers.
Density. cDNA: maximum of ~15,000 probes. GeneChip: 38,500 genes * 11 probes = 423,500 probes.

Why the difference?
Affymetrix GeneChip: one-color design.
cDNA microarray: two-color design.

Affymetrix GeneChip: photolithography (the amount of oligos per probe is well controlled).
cDNA microarray: robotic spotting (the amount of cDNA spotted per probe may vary greatly).

Advantages and disadvantages of cDNA array and GeneChip
cDNA microarray
  The data can be noisy and of variable quality.
  Cross (non-specific) hybridization can often happen.
  May need an RNA amplification procedure.
  Image analysis is more difficult.
  Need to search the database for gene annotation.
  Cheap (both initial cost and per-slide cost).
  Can be custom made for special species.
  Do not need to know the exact DNA sequence.
Affymetrix GeneChip
  Specific and sensitive. Results are very reproducible.
  Hybridization is more specific.
  Can use a small amount of RNA.
  Image analysis and intensity extraction are easier.
  More widely used. Better quality of gene annotation.
  Expensive (~$400 per array, plus labeling and hybridization).
  Only several popular species are available.
  Need the DNA sequence for probe selection.

Computational aspects
Probe design
Image processing
Pre-processing
Differentially expressed gene identification
Clustering / classification
Network / pathway modeling

First step: pre-processing
Transformation
  Transforms intensities or ratios to a different scale.
  Why? For convenience, and to convert the data into a distribution (e.g. normal) assumed by many other statistical procedures.
Normalization
  Corrects for systematic errors.
  Makes data from different samples comparable.
Garbage in => garbage out.

Where could errors come from?
Random errors
  Repeating the same experiment twice gives different results.
  Using multiple replicates reduces the problem.
Systematic errors
  Arrays manufactured at different times.
  On the same array, probes printed with different printer tips may have different biases.
  Dye effect: difference between Cy5 and Cy3 labeling.
  Experimental factors:
    More mRNA applied to array A than to array B.
    Sample preparation procedure.
    Experiments carried out at different times, by different users, etc.

cDNA microarray data preprocessing

Typical experiments
Wild-type cells vs mutated cells.
Diseased cells vs normal cells.
Cells under normal growth conditions vs cells treated with chemicals.
Typically repeated several times.
(Figure: data matrix of ratios, with probes (genes) along one axis.)

Transforming cDNA microarray data
Data: Cy5/Cy3 ratios as well as raw intensities.
The most common choice is the log2 transformation:
  2-fold increase => log2(2) = 1
  2-fold decrease => log2(1/2) = -1
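
A minimal sketch of the log2 transformation of Cy5/Cy3 ratios. The gene names and intensities are made up for illustration, and `log2_ratio` is a hypothetical helper, not part of any package named in the lecture.

```python
import math

def log2_ratio(cy5, cy3):
    """Return log2(cy5 / cy3) for one spot; both intensities must be positive."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)

# Hypothetical (Cy5, Cy3) intensities, invented for this example.
spots = {
    "geneA": (2000.0, 1000.0),  # 2-fold increase
    "geneB": (500.0, 1000.0),   # 2-fold decrease
    "geneC": (800.0, 800.0),    # no change
}

log_ratios = {gene: log2_ratio(cy5, cy3) for gene, (cy5, cy3) in spots.items()}
print(log_ratios)
```

On the log2 scale, increases and decreases of the same fold size are symmetric around zero, which is the convenience the transformation is chosen for.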

Dye effect
In cDNA microarray experiments using two identical samples, Cy5 is consistently lower than Cy3.
Solution: dye swapping.

Dye swapping
Chip 1: label test with Cy5 and control with Cy3.
Chip 2: label test with Cy3 and control with Cy5.
Ideally, cy5/cy3 on chip 1 = cy3/cy5 on chip 2; in practice this does not hold because of the dye effect.
Compute the average log ratio: ½ log2(cy5/cy3 on chip 1) + ½ log2(cy3/cy5 on chip 2)
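
A small sketch of the dye-swap average. It assumes, purely for illustration, that the dye effect is a constant multiplicative factor on the Cy5 channel (0.8 here); the function name and all numbers are hypothetical.

```python
import math

def dye_swap_log_ratio(chip1_cy5, chip1_cy3, chip2_cy5, chip2_cy3):
    """Average test/control log2 ratio over a dye-swapped pair of chips.

    Chip 1: test labeled with Cy5, control with Cy3.
    Chip 2: test labeled with Cy3, control with Cy5.
    """
    m1 = math.log2(chip1_cy5 / chip1_cy3)  # test/control on chip 1
    m2 = math.log2(chip2_cy3 / chip2_cy5)  # test/control on chip 2
    return 0.5 * m1 + 0.5 * m2

# Assumed scenario: the test sample is truly 2-fold higher than the control,
# and Cy5 always reads at 0.8 times the true signal.
dye_bias = 0.8
test, control = 2000.0, 1000.0
chip1 = (test * dye_bias, control)   # (cy5, cy3) on chip 1
chip2 = (control * dye_bias, test)   # (cy5, cy3) on chip 2

averaged = dye_swap_log_ratio(chip1[0], chip1[1], chip2[0], chip2[1])
print(averaged)  # close to 1, i.e. the true 2-fold change
```

Under this assumption the bias enters the two chips with opposite sign on the log scale, so it cancels in the average. A bias that varies with intensity would not cancel completely.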

Total intensity normalization
Even after dye swapping, systematic biases may remain.
Assume the total amount of mRNA does not change between the two samples (not necessarily true).
Rescale so that the two colors have the same total intensity.
Alternatively, rescale according to a subset of genes:
  House-keeping genes
  Middle 90% (for example) of genes
  Spike-in genes
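
A minimal sketch of total intensity normalization for one two-color array, rescaling the Cy5 channel so that its total matches the Cy3 total. The function name and the intensities are invented for illustration.

```python
def total_intensity_normalize(cy5, cy3):
    """Rescale the Cy5 channel so both channels have the same total intensity.

    Returns the rescaled Cy5 values and the scaling factor used.
    """
    if len(cy5) != len(cy3):
        raise ValueError("channels must have the same number of spots")
    factor = sum(cy3) / sum(cy5)
    return [value * factor for value in cy5], factor

# Hypothetical intensities where Cy5 reads uniformly lower than Cy3.
cy3 = [100.0, 200.0, 500.0, 200.0]
cy5 = [80.0, 160.0, 400.0, 160.0]

cy5_scaled, factor = total_intensity_normalize(cy5, cy3)
print(factor, cy5_scaled)
```

To normalize on house-keeping or spike-in genes instead, the same factor could be computed from the sums over that subset only and then applied to every spot.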

M-A plot
Also known as the ratio-intensity plot.
M: log2(cy5 / cy3) = log2(cy5) - log2(cy3)
A: ½ log2(cy5 * cy3) = (log2(cy5) + log2(cy3)) / 2
Ideal: M is centered at zero and its variance does not depend on A.
However, in practice:
  Systematic dependence between M and A.
  High variance of M for smaller A.
(Figure: M plotted against A.)
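
The M and A definitions can be computed per spot as below. This is a sketch with made-up intensities; `ma_values` is a hypothetical helper name.

```python
import math

def ma_values(cy5, cy3):
    """Return (M, A) for one spot from its positive Cy5 and Cy3 intensities."""
    log_cy5 = math.log2(cy5)
    log_cy3 = math.log2(cy3)
    m = log_cy5 - log_cy3          # log ratio
    a = (log_cy5 + log_cy3) / 2.0  # average log intensity
    return m, a

# Hypothetical (Cy5, Cy3) pairs, chosen as powers of two so the values are exact.
spots = [(1024.0, 256.0), (512.0, 512.0), (64.0, 256.0)]
ma = [ma_values(cy5, cy3) for cy5, cy3 in spots]
print(ma)
```

Plotting M on the vertical axis against A on the horizontal axis gives the ratio-intensity plot.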

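Lowess normalization
Lowess: locally weighted regression.
Fit local polynomial functions of M against A.
M is adjusted according to the fitted line: M' = M - (fitted value of M at the same A).
(Figure: M vs A before normalization, M' vs A after.)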
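
A simplified, pure-Python sketch of the idea: fit a local straight line of M on A around each spot, using tricube weights over the nearest fraction of points, and subtract the fitted value. It omits the robustness iterations of full lowess, and the function names, the `frac` parameter and the data are assumptions for illustration. In practice an existing implementation, such as those in Bioconductor, would normally be used.

```python
import math

def tricube(u):
    """Tricube weight: largest at u = 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear_fit(a_values, m_values, a0, frac):
    """Weighted straight-line fit of M on A around a0, evaluated at a0."""
    n = len(a_values)
    k = min(n, max(3, math.ceil(frac * n)))            # points in the local window
    h = sorted(abs(a - a0) for a in a_values)[k - 1]   # window half-width
    if h > 0.0:
        w = [tricube((a - a0) / h) for a in a_values]
    else:
        w = [1.0 if a == a0 else 0.0 for a in a_values]
    if sum(w) == 0.0:
        # All neighbours sit exactly on the window edge: weight them equally.
        w = [1.0 if abs(a - a0) <= h else 0.0 for a in a_values]
    sw = sum(w)
    a_bar = sum(wi * a for wi, a in zip(w, a_values)) / sw
    m_bar = sum(wi * m for wi, m in zip(w, m_values)) / sw
    sxx = sum(wi * (a - a_bar) ** 2 for wi, a in zip(w, a_values))
    if sxx < 1e-12:
        return m_bar  # no spread in A inside the window: use the weighted mean
    sxy = sum(wi * (a - a_bar) * (m - m_bar)
              for wi, a, m in zip(w, a_values, m_values))
    return m_bar + (sxy / sxx) * (a0 - a_bar)

def lowess_normalize(m_values, a_values, frac=0.5):
    """Return M' = M - fitted(A) for every spot."""
    fitted = [local_linear_fit(a_values, m_values, a0, frac) for a0 in a_values]
    return [m - f for m, f in zip(m_values, fitted)]

# Made-up data with a purely systematic dependence of M on A.
a = [6.0 + i for i in range(10)]
m = [0.5 * ai - 2.0 for ai in a]
m_adjusted = lowess_normalize(m, a)
print(m_adjusted)  # all values close to zero
```

This version refits the line for every spot, so its cost grows roughly quadratically with the number of spots; it is meant to show the mechanics rather than to be used on a full array.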

Replicate filtering
Experiments are repeated.
Genes with very high variability between replicates are questionable.
(Figure: log2(ratio 1) plotted against log2(ratio 2).)
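
One possible way to express replicate filtering in code, assuming two replicate ratios per gene and a hypothetical cutoff of 1 log2 unit on their disagreement. The lecture does not specify a rule, so both the criterion and the data are illustrative assumptions.

```python
import math

def filter_replicates(ratios, max_log2_diff=1.0):
    """Keep genes whose two replicate log2 ratios differ by at most the cutoff.

    `ratios` maps gene name -> (ratio in replicate 1, ratio in replicate 2).
    The cutoff of 1.0 (a 2-fold disagreement) is an arbitrary example value.
    """
    kept = []
    for gene, (r1, r2) in ratios.items():
        if abs(math.log2(r1) - math.log2(r2)) <= max_log2_diff:
            kept.append(gene)
    return kept

# Made-up replicate ratios.
ratios = {
    "g1": (2.0, 2.2),   # consistent
    "g2": (0.5, 4.0),   # down in one replicate, up in the other: questionable
    "g3": (1.0, 0.9),   # consistent
}
kept = filter_replicates(ratios)
print(kept)
```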

oligo microarray data preprocessing (Affymetrix chip)

Typical experiments
Multiple microarrays, for example:
  n samples (from different times, locations, conditions, treatments, etc.)
  k replicates for each sample
For example:
  Samples collected from 100 healthy people and 100 cancer patients.
  Cells treated with some drugs, with samples taken every 10 minutes.
Repeat on 3-5 microarrays for each sample:
  Improves the reliability of the results.
  Replicates are often averaged after some pre-processing.

Main characteristics
For each gene, there are multiple PM and MM probes (11-16 pairs).
  How do we obtain an overall intensity from these probe-level intensities?
Array outputs are absolute values rather than ratios.
  Cross-array normalization is important to make them comparable.

How to use MM information?
Earlier approach:
  First remove outlier probes.
  Actual intensity = Ipm - Imm
  This assumes Ipm = Imm + Ispecific, which is questionable.
Recent trend:
  Tend to ignore Imm, or use it in a different way.
Various software packages:
  MAS5 (by Affymetrix)
  dChip
  RMA
  GCRMA
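
A rough sketch of the earlier PM - MM approach for one probe set. The outlier rule used here (drop the single lowest and highest difference before averaging) is an assumption made for illustration; it is not the rule of MAS5, dChip, RMA or GCRMA, and the intensities are invented.

```python
def probe_set_signal(pm, mm, trim=1):
    """Average PM - MM over a probe set after trimming extreme differences.

    `trim` differences are dropped from each end of the sorted list; this
    trimming rule is a placeholder for "remove outlier probes".
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    diffs = sorted(p - q for p, q in zip(pm, mm))
    if len(diffs) > 2 * trim:
        diffs = diffs[trim:len(diffs) - trim]
    return sum(diffs) / len(diffs)

# Hypothetical probe-level intensities for one gene (5 probe pairs).
pm = [1200.0, 1100.0, 1300.0, 5000.0, 1250.0]
mm = [200.0, 150.0, 300.0, 400.0, 1240.0]

signal = probe_set_signal(pm, mm)
print(signal)
```

When an MM probe is brighter than its PM partner the difference is negative, which is one practical difficulty with relying on Ipm - Imm.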

Normalization
Similar to cDNA microarrays.
Total intensity normalization
  Each array is rescaled to have the same mean intensity.
  Can be based on all genes or on a selected subset of genes:
    House-keeping genes
    Middle 90% (for example) of genes
    Spike-in genes
Lowess with a common reference.
Many useful tools are implemented in Bioconductor.
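
A sketch of cross-array scaling to a common mean, optionally computing the scale factors from a subset of reference genes (for example house-keeping or spike-in genes). The function, the array names and the values are hypothetical, and the target mean is arbitrarily taken as the average of the per-array means.

```python
def scale_to_common_mean(arrays, reference_genes=None):
    """Rescale each array so the reference genes share the same mean intensity.

    `arrays` maps array name -> list of intensities (same gene order in each).
    `reference_genes` is an optional list of gene indices; all genes are used
    when it is None.
    """
    def reference_mean(values):
        indices = reference_genes if reference_genes is not None else range(len(values))
        picked = [values[i] for i in indices]
        return sum(picked) / len(picked)

    means = {name: reference_mean(values) for name, values in arrays.items()}
    target = sum(means.values()) / len(means)  # assumed choice of common mean
    return {
        name: [value * target / means[name] for value in values]
        for name, values in arrays.items()
    }

# Made-up data: array B is uniformly twice as bright as array A.
arrays = {"A": [100.0, 200.0, 300.0], "B": [200.0, 400.0, 600.0]}

normalized = scale_to_common_mean(arrays)
normalized_ref = scale_to_common_mean(arrays, reference_genes=[0])
print(normalized)
```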

Conclusions
Microarrays provide a way to measure thousands of genes simultaneously and make global monitoring of cellular activities possible.
The method produces noisy data, so normalization is crucial.
Real-time RT-PCR is used to validate a small number of genes.

Limitations
Measures mRNA instead of proteins; actual protein abundance and post-translational modifications cannot be detected.
Suitable for global monitoring; results should be used to generate further hypotheses or combined with other carefully designed experiments.

Microarray pre-processing questions
What kind of array is it? Two-color or one-color? Oligo array or cDNA array?
How is the experiment designed? Time series? Test vs control?
What kind of pre-processing has been done?
  What values: raw intensities or ratios?
  Transformation: log scale or linear scale?
  Normalization: within-array? Cross-array?
What are the next steps? Identifying differentially expressed genes? Clustering?

Some real data
Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown, "Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale", Science, 278: 680-686, 1997.