An Introduction to Microarrays
Ellen Wisman, Michigan State University, AFGC
Uses of Microarrays
– RNA expression profiles
– DNA profiling
– Comparative genomic hybridizations
– Transcription binding mapping
– SNP mapping
– Mini-sequencing
Proposal Facts
Cycles 1 through 3:
– Number of customer and collaborator slides: 236
– Number of slides publicly available in SMD: 178
Through Cycle 5:
– Total number of proposals: 115
– Total number of customers: 111
– Projected total number of slides: 408
– AFGC experiments: 40
Types of experiments
– 45% Genotype comparisons/transgene/antisense
– 34% Treated plants vs. wild type
– 8% Studying the effects of pathogens
– 7% Treatment of mutant
– 4% Studying development
– 2% Tissue comparisons
Organisms studied
– Arabidopsis thaliana: 88 experiments (95%)
– Brassica: 2 experiments (2%)
– Thlaspi caerulescens: 2 experiments (2%)
– Alyssum lesbiacum: 1 experiment (1%)
Principles of array production
In situ oligo synthesis (= directly on the slide)
– Photolithography, light-directed synthesis using a mask, 25-mers (Affymetrix)
– Ink-jet printing process, 60-mers (Rosetta)
– Others
Principles of array production
Spotted arrays
– cDNA clones
– ESTs
– GSTs, gene specific primers
– Oligonucleotides
– Genomic clones
– Genomic DNA
cDNA microarrays
Labeled RNA (RNA–cy5 + RNA–cy3; 1) Reference, 2) Experimental) is hybridized to the array of cDNA clones.
Expression                Ratio
higher in 1 than 2        > 2
same between 1 and 2      1
lower in 1 than 2         < 0.5
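The ratio cutoffs on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `classify_ratio` and the label "within 2-fold" for ratios between 0.5 and 2 are assumptions, not part of the original slide.

```python
def classify_ratio(ratio, up=2.0, down=0.5):
    """Classify a sample-1 / sample-2 signal ratio with the slide's cutoffs.

    The default cutoffs (> 2 and < 0.5) come from the slide; the label for
    the range in between is an assumption made for this sketch.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio > up:
        return "higher in 1 than 2"
    if ratio < down:
        return "lower in 1 than 2"
    return "within 2-fold"


# Hypothetical ratios for three spots.
for r in (4.0, 1.0, 0.25):
    print(r, classify_ratio(r))
```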
Data Analysis & Presentation
– Acquisition (e.g. ScanAlyze®, GenePix®)
– Input/Storage/Retrieval (Stanford Microarray Database)
– Analysis/Pattern Recognition/Visualization (TreeView, Clustering, Self-Organizing Maps, K-means, R, GeneSpring®)
– Interpretation/Annotation
– Publication/Repository (TAIR, GenBank)
SMD: M. Cherry, Stanford University
The AFGC array
Microarray design
– >100,000 Arabidopsis ESTs
– Compared to each other (BLASTN), then to all Arabidopsis proteins (BLASTX)
– Analysis performed by Rob Ewing, Stanford University
– 9,200 EST clones from all tissues
– 2,000 EST clones from developing seeds
– 3,000 GSTs, gene specific tags
Former array: >11,000 clones
99% of the re-sequenced clones were correct
PCR products classified as:
– No band
– Low concentration
– Double band
– Smear
Microarray Controls
Available from NASC
Negative/spiking controls
– Heterologous sequences spotted either to determine background and non-specific hybridization or to serve as external controls by spiking the corresponding RNA into the labeling reaction (human clones)
Transgenes
– For quantification of reporter constructs (can also serve as negative controls) (BAR, BT, BASTA, Luciferase…)
Positive controls
– Dilutions of chromosomal DNA
Control spots: Genomic DNA
– Arabidopsis genomic DNA digested with RsaI
– Spotted at different concentrations
(figure labels: carryover, intensity)
Data manipulation
– Remove flagged spots
– Remove spots with bad quality, defined as % of pixels 1.5% above background
– Excel, Access, other commercially available programs
Normalization = adjust the signal intensities for each channel to make the two channels comparable
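One simple way to make the two channels comparable is a global scaling so that the median log ratio across all spots is zero. The slide does not say which normalization method was used, so the sketch below is an assumption chosen for illustration; the function name `normalize_channels` and the example intensities are hypothetical.

```python
import math
from statistics import median


def normalize_channels(cy3, cy5):
    """Rescale cy5 so that the median log2(cy5/cy3) over all spots is zero.

    Global median scaling is assumed here for illustration only; it is not
    stated in the original slides. Intensities must be positive.
    """
    log_ratios = [math.log2(r / g) for g, r in zip(cy3, cy5)]
    factor = 2.0 ** median(log_ratios)   # overall cy5/cy3 bias
    cy5_scaled = [r / factor for r in cy5]
    return cy5_scaled, factor


# Hypothetical background-subtracted intensities for three spots.
cy3 = [100.0, 200.0, 400.0]
cy5 = [200.0, 400.0, 800.0]
scaled, factor = normalize_channels(cy3, cy5)
print(factor, scaled)
```

After scaling, a spot with equal expression in both samples is expected to have a ratio near 1, which is what the 2-fold cutoffs assume.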
Why Normalization
Data distribution before and after normalization
(figure: cy3 and cy5 histograms; x-axis: log of intensities; y-axis: number of clones)
Distribution of ratios
(figure: histogram of ratios with 2x marks; y-axis: number of clones; table of fold change vs. number of clones, last entry: >2.2503)
Identification of false positives in slides hybridized with identical RNAs in both channels
Slide    2-fold    Class
4        6         a
3        29        a
5        85        b
1        140       b
2        188       c
6        365       c
Slide 4, class (a), and Slide 6, class (c)
(figure panels: (I) Intensities; (II) Ratios (M) vs. Intensities (A); (III) Spatial Ratios)
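Panel (II) plots ratios (M) against intensities (A). Under the usual convention, M is the log2 ratio of the two channels and A is the mean of their log2 intensities. The slide does not spell out these formulas, so the sketch below assumes that convention; the function name `ma_values` is hypothetical.

```python
import math


def ma_values(cy5, cy3):
    """Return (M, A) for one spot.

    M = log2(cy5 / cy3)                 (log ratio)
    A = (log2(cy5) + log2(cy3)) / 2     (mean log intensity)
    The conventional M/A definitions are assumed; intensities must be > 0.
    """
    m = math.log2(cy5 / cy3)
    a = 0.5 * (math.log2(cy5) + math.log2(cy3))
    return m, a


# Hypothetical spot: cy5 = 256, cy3 = 64.
print(ma_values(256.0, 64.0))
```

In a self-versus-self hybridization such as the slides shown here, M should scatter around zero at every value of A, so a trend or a wide spread in this plot points to a problem with the slide.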
Frequency of false positives among 6 slides
(figure: x-axis: ratio; y-axis: number of clones, log scale; series: a, a+b, a+b+c, c, b+c, 1(a)+1(c))
A 2-fold cutoff yields 25-35% non-reproducible spots that can be removed following multiple replicates (R. Gutierrez)
(figure 1: slide pairs A3/A4, A7/A8, A5/A6; y-axis: % clones in final set, reproducible vs. non-reproducible)
(figure 2: percentage of non-reproducible spots vs. number of slides; series: worst pair to best pair, best pair to worst pair)
Repetitions
How many repetitions?
– Minimum is a technical repetition (dye swap)
– Recommended: at least one other biological repetition
– Experimental considerations: small changes need more repetitions
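A dye-swap pair can be combined by flipping the sign of the swapped slide's log ratio and averaging, and a spot can be kept only if both slides agree on a change of at least 2-fold. The following is a minimal sketch under those assumptions; the function names and the exact reproducibility rule are hypothetical, not the procedure used by AFGC.

```python
import math


def dye_swap_log_ratio(ratio_forward, ratio_swapped):
    """Average log2 ratio (sample A vs. sample B) from a dye-swap pair.

    ratio_forward: cy5/cy3 on the slide where sample A is labeled with cy5.
    ratio_swapped: cy5/cy3 on the slide where sample B is labeled with cy5.
    """
    return 0.5 * (math.log2(ratio_forward) - math.log2(ratio_swapped))


def is_reproducible(ratio_forward, ratio_swapped, fold=2.0):
    """True if both slides show at least `fold` change in the same direction."""
    cutoff = math.log2(fold)
    m1 = math.log2(ratio_forward)
    m2 = -math.log2(ratio_swapped)   # flip sign so both read as A vs. B
    same_direction = (m1 > 0) == (m2 > 0)
    return same_direction and abs(m1) >= cutoff and abs(m2) >= cutoff


# Hypothetical spot: 4-fold up on the forward slide, 4-fold down after the swap.
print(dye_swap_log_ratio(4.0, 0.25), is_reproducible(4.0, 0.25))
```

A spot that passes the cutoff on one slide but keeps the same dye bias after the swap is rejected, which is how a dye swap removes dye-specific false positives.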
Uses of Microarrays: RNA expression profiles
– Type 1: Direct comparison of two different RNAs
– Type 2: Multiple comparisons of RNA, via a common reference
Type 2: Multiple comparisons
Use of a common reference allows experiments to be compared directly
– Direct comparison of many different growth conditions or tissues
– Time course
DNA as common reference
– Hybridizes equally to each spot
– Consistent between slides
– Unlimited supply
Distribution of intensities of genomic DNA
(figure: RNA intensities and DNA intensities; x-axis: relative intensities; y-axis: no. clones)
High intensity clones      4x stdev    6x stdev
chloroplast                50          42
mitochondrial              5           2
ribosomal                  24          16
repetitive sequence        2           1
multigene families         49          9
single genes               54          8
Identification of low intensity spots
Low intensity spots include:
– defects in printing
– poorly amplified clones
(figure axis: relative intensities)
Comparison of common reference vs. direct comparison
6 slides:
– Time 0 hr - Time 12 hrs
– Time 0 hr - Common Reference
– Time 12 hrs - Common Reference
(figure: direct ratio vs. ratio via common reference)
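The ratio via a common reference follows from dividing the two sample-to-reference ratios, since the reference signal cancels. A minimal sketch, assuming each slide yields a sample/reference ratio for the same spot; the function names and the example numbers are hypothetical.

```python
import math


def ratio_via_reference(t0_vs_ref, t12_vs_ref):
    """Indirect Time 0 / Time 12 ratio from two reference hybridizations.

    (T0 / Ref) / (T12 / Ref) = T0 / T12, so the reference cancels out.
    """
    return t0_vs_ref / t12_vs_ref


def log_ratio_via_reference(t0_vs_ref, t12_vs_ref):
    """Same quantity on a log2 scale: a difference of two log ratios."""
    return math.log2(t0_vs_ref) - math.log2(t12_vs_ref)


# Hypothetical spot: Time 0 is 3x the reference, Time 12 is 1.5x the reference.
print(ratio_via_reference(3.0, 1.5), log_ratio_via_reference(3.0, 1.5))
```

Because the indirect ratio combines two measurements, their errors add, so indirect ratios are generally noisier than direct ones.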
Comparison of direct vs. DNA as reference
(figure: direct ratio (log) vs. ratio via DNA (log))
Ratios correlate well for higher values; smaller differences may not be detected in type II experiments.
Consistent trends between heterologous species
– Conserved genes expressed preferentially in shoot apices
– Conserved genes expressed preferentially in leaves
D. Horvath
Differential expression of the homologous genes confirmed in leafy spurge
– Adenosylhomocysteinase (Cytokinin-binding)
– Ubiquitin-conjugating Enzyme
– Glyceraldehyde 3-Phosphate Dehydrogenase
– ATP Synthase
– Guanine Nucleotide Binding Protein
– Elongation Factor 1B
(figure lanes: L, M)
D. Horvath
Summary: AFGC contributions
– Good source of clones
– Spotting conditions
– Hybridizing and labeling techniques
  » Comparison of labeling from total and polyA+ RNA
  » Slide coatings
  » Hybridization solutions
  » Labeling from low amounts of RNA
– Experimental set up
  » Type II experiments
  » Heterologous probes
  » Number of repetitions
  » Comparison between spotted arrays and Affymetrix chips
– Data analysis tools via SMD
Future perspective: global gene expression studies
– Gene discovery
– Global view (circadian rhythm, epigenetic changes)
– Approach old problems (hybrid vigor)
– Diagnostic tool (transgenes)
– New hypotheses (looking for patterns over many experiments)
Acknowledgements
PRL-MSU: Robert Schaffer, Jeff Landgraf, Matt Larson, Dave Green, Monica Accerbi, Verna Simon, Kim Trouten, Sergei Mekhedov, Ellen Wisman, John Ohlrogge, Ken Keegstra, Pam Green
Stanford University: Shauna Somerville, Mike Cherry