Microarray Pitfalls Stem Cell Network Microarray Course, Unit 3 October 2006
Goals To provide some guidelines on Affymetrix microarrays: –How to use them –How not to use them –Things to keep in mind when designing experiments and analyzing data This is a general discussion of issues and is by no means exhaustive
Inconsistent Annotations Affymetrix-provided probeset annotations change over time. The gene symbol associated with a given probeset is not necessarily stable. This is due to changes in gene prediction as new information becomes available.
Inconsistent Annotations (2) Perez-Iratxeta, C. and M.A. Andrade. Inconsistencies over time in 5% of NetAffx probe-to-gene annotations. BMC Bioinformatics. 6, 183. –5% of probesets have gene identifiers that change over the two-year time span covered by this analysis. [Figure: An inconsistently annotated probeset]
Inconsistent Annotations (3) How do we deal with this? –Always note the annotation version used in an analysis, especially when it is for publication –Report the probeset name as well as the gene symbol –Remember that re-analysis with later annotations may yield different results –Keep your annotation files up to date
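As an illustration of why the annotation version matters, here is a minimal sketch that compares two annotation releases and lists the probesets whose gene symbol changed. The probeset IDs, gene symbols and version names are hypothetical placeholders, not real NetAffx content, and a real annotation file would need to be parsed into this dictionary form first.

```python
def changed_annotations(old, new):
    """Return {probeset: (old_symbol, new_symbol)} for every probeset
    whose gene symbol differs between two annotation releases."""
    changed = {}
    for probeset, old_symbol in old.items():
        # A probeset missing from the newer release is reported with None.
        new_symbol = new.get(probeset)
        if new_symbol != old_symbol:
            changed[probeset] = (old_symbol, new_symbol)
    return changed


# Hypothetical probeset-to-gene-symbol tables for two releases.
annotation_v1 = {"0001_at": "GENEA", "0002_at": "GENEB", "0003_at": "GENEC"}
annotation_v2 = {"0001_at": "GENEA", "0002_at": "GENEX", "0003_at": "GENEC"}

diff = changed_annotations(annotation_v1, annotation_v2)
for probeset, (before, after) in diff.items():
    print(f"{probeset}: {before} -> {after}")
```

Reporting the probeset name alongside the symbol, as recommended above, is what makes this kind of comparison possible later.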
Old chips, new data Expression microarrays are designed based on the best available model of the genome of interest. The model for the HG-U133 microarrays was a human genome assembly that was only 25% complete! The human assembly is >99% complete now.
Old chips, new data (2) How do we deal with this? –A number of groups provide re-mappings of probes to probesets based upon the latest data available, for example: Dai M, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 2005;33:e175
Multiple Testing Corrections A single expression microarray experiment actually consists of hundreds of thousands of simultaneous parallel experiments. This means you can test many hypotheses simultaneously. This is not free: the significance of any given result decreases as a function of the number of hypotheses tested.
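To make the cost of simultaneous testing concrete, the sketch below computes the chance of at least one false positive when many tests are run at the same per-test significance level. It assumes the tests are independent, which is a simplification for gene expression data, and the function name is chosen here for illustration.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The per-test level stays at 0.05 while the number of tests grows.
for n in (1, 10, 100, 10_000):
    print(n, round(familywise_error_rate(0.05, n), 4))
```

Under this assumption a false positive becomes close to certain well before the number of tests reaches the number of probesets on an array.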
Multiple Testing Corrections (2) How do we deal with this? –Limit the number of hypotheses you are testing instead of just ‘fishing’ in the whole data set. –Do this by selecting a set of candidate genes ahead of time based on your knowledge of the biology of the system.
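The benefit of a pre-selected candidate list can be sketched with the Bonferroni correction, used here only because it is the simplest correction to write down; the slides do not prescribe a particular method. The p-value, the array size of 50,000 probesets and the candidate list of 20 genes are hypothetical numbers.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the overall error rate near alpha."""
    return alpha / n_tests


def passes(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value survives Bonferroni correction for n_tests tests."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


p = 0.001  # hypothetical p-value for one gene of interest

whole_array = passes(p, alpha=0.05, n_tests=50_000)  # 'fishing' in every probeset
candidates = passes(p, alpha=0.05, n_tests=20)       # candidate list chosen in advance

print("significant when testing the whole array:", whole_array)
print("significant within the candidate set:", candidates)
```

The same observation is judged against a far stricter cutoff when every probeset is tested, which is the reason for choosing candidates before looking at the data.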
Multiple Testing (3) Sandrine Dudoit, Juliet Popper Shaffer and Jennifer C. Boldrick Multiple Hypothesis Testing in Microarray Experiments Statistical Science 2003, Vol. 18, No. 1, 71–103 –“The biological question of differential expression can be restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and the responses” Talk to a statistician if you have doubts
Not everything is in the array Probesets are designed with a bias towards the 3’ end of the gene. –They won’t distinguish splice variants –They won’t pick up alternative 3’ ends
Not everything is in the array (2) What can we do about this? –You should be aware of this, but not much can be done. –Use other technologies to complement your microarray results (PCR, sequencing)
What are you measuring? Remember that you are detecting the average mRNA over a population of cells. Is your sample homogeneous? If it’s not homogeneous then what are you measuring? How many types of cells, and in what state? Time series of differentiating cells are particularly problematic.
Inhomogeneous Samples? Many sources of inhomogeneity –Source organism gender –Cell cycle –Tissue source –Diet Some can be eliminated. All should be documented where possible.
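A small sketch of why a population average is ambiguous: the measured signal is modeled here as a weighted mean of per-cell-type expression levels, which is a simplifying assumption, and the fractions and levels are made-up numbers.

```python
def population_average(fractions, levels):
    """Weighted mean expression for a mixture of cell types.

    fractions: share of each cell type in the sample (must sum to 1)
    levels:    mRNA level of the gene in each cell type (arbitrary units)
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(f * x for f, x in zip(fractions, levels))


# Half the cells express the gene strongly, half not at all.
mixed = population_average([0.5, 0.5], [100.0, 0.0])
# Every cell expresses the gene at a moderate level.
uniform = population_average([1.0], [50.0])

print(mixed, uniform)
```

Both samples give the same average, so the array signal alone cannot tell a uniform population from a mixture of very different cells.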
Chips don’t detect protein Central assumption of microarray analysis: The level of mRNA is positively correlated with protein expression levels. –Higher mRNA levels mean higher protein expression, lower mRNA means lower protein expression Other factors: –Protein degradation, mRNA degradation, polyadenylation, codon preference, translation rates, ...
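To show how the other factors can break the central assumption, the sketch below uses a deliberately simple steady-state model in which protein is produced in proportion to mRNA and lost at a constant rate. The model and all rate values are illustrative assumptions, not something taken from the presentation or fitted to data.

```python
def steady_state_protein(mrna, translation_rate, protein_degradation_rate):
    """Steady-state protein level for the toy model
    dP/dt = translation_rate * mrna - protein_degradation_rate * P."""
    return mrna * translation_rate / protein_degradation_rate


# Gene A: less mRNA, but a stable protein (slow degradation).
gene_a = steady_state_protein(mrna=10.0, translation_rate=2.0,
                              protein_degradation_rate=0.5)
# Gene B: twice the mRNA, but a rapidly degraded protein.
gene_b = steady_state_protein(mrna=20.0, translation_rate=2.0,
                              protein_degradation_rate=4.0)

print("gene A protein:", gene_a)
print("gene B protein:", gene_b)
```

In this toy case the gene with more mRNA ends up with less protein, so a higher array signal does not by itself guarantee higher protein expression.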
Conclusion This is a general discussion of issues and doesn’t cover all pitfalls. Please contact if you have any comments, corrections or See associated bibliography for references from this presentation and further reading. Thanks for your attention!