Data analytical issues with high-density oligonucleotide arrays A model for gene expression analysis and data quality assessment.

Outline Description of high-density oligonucleotide expression array data Derivation of a model for gene expression estimation Application of the model for data quality assessment

Gene Expression Analysis Central Dogma: DNA -> mRNA -> Protein By comparing the abundance of mRNA in different cells we can deduce the genes associated with cell condition. Oligonucleotide arrays enable quantitative, highly parallel measurements of gene expression.

Probe Selection Probes are 25-mers selected from the target sequence. 5-20K target fragments are interrogated by probe sets of 11-20 probes each.

Data preparation RNA samples are prepared, labeled, and hybridized with arrays. Arrays are scanned and the resulting image analyzed to produce an intensity value for each probe cell indicating how much hybridization occurred. Of interest is to find a way to combine probe intensities for a given gene to produce an index of expression – an indicator of mRNA abundance.

Oligonucleotide Arrays [Figure: GeneChip probe array (1.28 cm) carrying >450,000 different probes; each feature (18 µm) holds 10^6-10^7 copies of a specific oligonucleotide probe. Panels: single-stranded, labeled RNA target bound to oligonucleotide probes; hybridized probe cell; image of hybridized probe array. Compliments of D. Gerhold.]

Outline Description of high-density oligonucleotide expression array data Derivation of a model for gene expression analysis Application of the model for data quality assessment

Probe Intensity vs conc ex 1

The probe intensity model On a probe set by probe set basis, the log of the probe intensities, $Y_{jk}$ say, is modelled as the sum of a probe effect and a chip effect: $Y_{jk} = \phi_j + \alpha_k + \varepsilon_{jk}$, where $j$ indexes probes and $k$ indexes chips. To make this model identifiable, we constrain the sum of the probe effects to be zero, $\sum_j \phi_j = 0$. The $\phi_j$'s can be interpreted as relative non-specific binding effects for probes. The parameters $\alpha_k$ provide an index of expression for each chip.
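As a rough illustration of the additive model above, the sketch below fits it by ordinary least squares to a single probe set. With a complete probes-by-chips table and the sum-to-zero constraint, the least-squares chip effects are the column means and the probe effects are the row means minus the grand mean. The function name `fit_additive` and the intensity values are hypothetical, and this plain least-squares fit is only a starting point, not the robust procedure described later.

```python
import numpy as np

def fit_additive(Y):
    """Least-squares fit of Y[j, k] = probe[j] + chip[k] + error,
    with the probe effects constrained to sum to zero."""
    Y = np.asarray(Y, dtype=float)
    chip = Y.mean(axis=0)                # chip effects: column means
    probe = Y.mean(axis=1) - Y.mean()    # probe effects: row means minus grand mean
    resid = Y - probe[:, None] - chip[None, :]
    return probe, chip, resid

# Hypothetical probe set: 4 probes (rows) on 3 chips (columns), raw intensities.
intensity = np.array([[200.0, 400.0, 800.0],
                      [100.0, 200.0, 400.0],
                      [400.0, 800.0, 1600.0],
                      [200.0, 400.0, 800.0]])

# The model is fitted on the log scale (log2 is assumed here).
probe, chip, resid = fit_additive(np.log2(intensity))
```

In this made-up table every chip doubles the previous one, so the data are exactly additive on the log2 scale: the residuals vanish and consecutive chip effects differ by 1.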

Example - detecting differential expression Fit the model to 24 chips hybridized with a common source of RNA plus 12 spiked-in RNAs whose pM concentrations differ 2-fold between the two groups of 12 chips.
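A 2-fold concentration difference should show up as a difference of about 1 between the two groups on the log2 scale. The sketch below computes, per probe set, the log ratio M and the average log level A from the expression indices of two groups of chips, which are the quantities usually plotted in an M-versus-A display. The function name `m_and_a` and all numbers are hypothetical, and log2 expression indices are assumed.

```python
import numpy as np

def m_and_a(index_a, index_b):
    """Per-probe-set M (difference of group means) and A (average of group means).

    index_a, index_b: arrays of shape (n_probe_sets, n_chips) holding
    expression indices on the log2 scale for groups A and B.
    """
    mean_a = np.asarray(index_a, dtype=float).mean(axis=1)
    mean_b = np.asarray(index_b, dtype=float).mean(axis=1)
    return mean_a - mean_b, (mean_a + mean_b) / 2.0

# Hypothetical data: 3 probe sets on 12 chips per group.
# Only the second probe set is spiked in at a 2-fold higher level in group A.
group_b = np.tile(np.array([[8.0], [6.0], [10.0]]), (1, 12))
group_a = group_b + np.array([[0.0], [1.0], [0.0]])

M, A = m_and_a(group_a, group_b)
```

Here M is 1 for the spiked-in probe set and 0 for the others, so it stands out from the unchanged background.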

MVA A vs B

Index vs Conc

Robust procedures Robust procedures perform well under a range of possible models and greatly facilitate the detection of anomalous data points. Why robust? Image artifacts, bad probes, bad chips, quality assessment.

Robust fit example A

Robust fit example B

Residuals from fit

Outline Description of high-density oligonucleotide expression array data Derivation of a model for gene expression estimation Application of the model to data quality assessment

Chip manufacturer QA protocols Starting RNA QA – look at gel patterns and RNA quantification. Post-hybridization QA – image examination, chip intensity parameters, expression of control genes of various sorts, housekeeping genes, percent present calls.

Goal: measuring expression data quality Manufacturer QA guidelines emphasize maintenance of data comparability across chips in an analysis set. We seek assessments that measure data quality as it pertains to expression values. In particular, we would like to provide quantitative measures that can help in making decisions – Accept, Reject or Adjust.

Model components – role in QA Probe effects: can only be compared across fitting sets. Chip effects (expression indices): can examine the distribution of relative expressions across arrays. Residuals (more than 200K per chip): view as a chip image and summarize spatial patterns; summarize in batches by chip; combine to estimate the SE of the expression indices, which are then pooled and summarized by chip.

Robust fit by IRLS for each probe set Starting with a robust fit, at each iteration: $S = \mathrm{mad}(r_{jk})$ – robust estimate of the scale $\sigma$; $u_{jk} = r_{jk}/S$ – standardized residuals; $w_{jk} = \psi(|u_{jk}|)$ – weights to reduce the effect of deviant points on the next fit. The SE of the final expression index $a_k$ (the estimated chip effect for chip $k$) is given by $\mathrm{SE}(a_k) = S/\sqrt{\sum_j w_{jk}}$. Unscaled $\mathrm{SE}(a_k) = 1/\sqrt{\sum_j w_{jk}}$.
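The iteration above can be sketched for one probe set as follows. Several choices here are assumptions rather than anything stated in the slides: the weight function is taken to be Huber's with tuning constant 1.345, the MAD is rescaled by 0.6745 for consistency with the normal standard deviation, the starting fit uses column and row medians, and each weighted fit is done by alternating weighted means. All function names and data are hypothetical.

```python
import numpy as np

def huber_weight(u, c=1.345):
    """Huber weights (assumed choice): 1 for |u| <= c, c/|u| beyond."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def mad(r):
    """Median absolute deviation of all residuals, rescaled for the normal."""
    return np.median(np.abs(r - np.median(r))) / 0.6745

def weighted_additive_fit(Y, W, n_sweeps=50):
    """Weighted least squares for Y[j, k] = probe[j] + chip[k],
    by alternating weighted means, with sum(probe) = 0."""
    probe = np.zeros(Y.shape[0])
    chip = np.zeros(Y.shape[1])
    for _ in range(n_sweeps):
        chip = (W * (Y - probe[:, None])).sum(axis=0) / W.sum(axis=0)
        probe = (W * (Y - chip[None, :])).sum(axis=1) / W.sum(axis=1)
        shift = probe.mean()
        probe = probe - shift            # enforce the sum-to-zero constraint
        chip = chip + shift              # fitted values are unchanged
    return probe, chip

def irls_fit(Y, n_iter=20):
    """Robust fit of the probe/chip model for one probe set (rows = probes)."""
    Y = np.asarray(Y, dtype=float)
    # Robust starting fit: column medians, then row medians of the remainder.
    chip = np.median(Y, axis=0)
    probe = np.median(Y - chip[None, :], axis=1)
    chip, probe = chip + probe.mean(), probe - probe.mean()
    for _ in range(n_iter):
        r = Y - probe[:, None] - chip[None, :]
        S = max(mad(r), 1e-12)           # guard against a zero scale
        W = huber_weight(r / S)
        probe, chip = weighted_additive_fit(Y, W)
    # Final scale and weights, then the standard errors per chip.
    r = Y - probe[:, None] - chip[None, :]
    S = max(mad(r), 1e-12)
    W = huber_weight(r / S)
    unscaled_se = 1.0 / np.sqrt(W.sum(axis=0))
    return probe, chip, W, S * unscaled_se, unscaled_se

# Hypothetical probe set: 11 probes on 6 chips, with one corrupted cell.
rng = np.random.default_rng(0)
probe_true = np.linspace(-1.0, 1.0, 11)
chip_true = np.array([8.0, 8.5, 9.0, 9.5, 10.0, 10.5])
Y = probe_true[:, None] + chip_true[None, :] + rng.normal(0.0, 0.05, (11, 6))
Y[0, 0] += 3.0                           # simulated image artifact

probe, chip, W, se, unscaled_se = irls_fit(Y)
```

The corrupted cell ends up with a weight far below 1, so it has little influence on the expression index for that chip, and the smaller total weight on that chip is reflected in a larger unscaled SE.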

$\psi$ function

Images of weights For 24 chips from Affymetrix, look at patterns of weights on chip real estate.

Images of weights

Images of sign of residuals For 24 chips from Affymetrix, look at patterns of the sign of residuals on chip real estate.
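Viewing weights or residual signs "on chip real estate" amounts to placing each probe's value at its physical position on the array. A minimal sketch, assuming each probe comes with integer x (column) and y (row) coordinates; the function name `chip_image` and the tiny 2-by-3 layout are hypothetical.

```python
import numpy as np

def chip_image(values, x, y, n_rows, n_cols):
    """Place per-probe values at their (row y, column x) chip positions.
    Cells with no probe are left as NaN."""
    img = np.full((n_rows, n_cols), np.nan)
    img[np.asarray(y), np.asarray(x)] = np.asarray(values, dtype=float)
    return img

# Hypothetical residuals for six probes on a 2 x 3 corner of a chip.
x = np.array([0, 1, 2, 0, 1, 2])
y = np.array([0, 0, 0, 1, 1, 1])
resid = np.array([0.3, -0.1, 0.2, -0.4, 0.0, 0.5])

sign_img = chip_image(np.sign(resid), x, y, n_rows=2, n_cols=3)
```

The resulting array can be passed to any image display routine; the same helper works for the robust weights in place of the residual signs.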

Images of sign of residuals

Residual summaries

MVA exp index

Future developments Develop quality assessment measures for routine use in high-throughput environments. Assess relationships among various QA measures. Develop diagnostics to assign causes to departures from quality standards. Other applications - identify non-performing or cross-hybridizing probes, qualify probe sets.

Background correction Background correction - to correct for differential background due to experimental processing effects and to put the estimated differential expression on a proper scale. Normalization – to correct for systematic differences in the distribution of probe intensities
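One widely used way to remove systematic differences in the distribution of probe intensities is quantile normalization, which forces every chip to share the same empirical distribution. The slide does not say which normalization is used, so the sketch below is an illustration of that one technique under this assumption; the function name and data are hypothetical, and ties are broken arbitrarily by sort order.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize the columns (chips) of X (rows = probes)."""
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0)                # sort order within each chip
    ranks = np.argsort(order, axis=0)            # rank of each probe on its chip
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)  # reference distribution
    return mean_quantiles[ranks]                 # replace each value by its quantile

# Hypothetical intensities: 4 probes on 3 chips.
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 6.0, 6.0],
              [4.0, 2.0, 8.0]])

X_norm = quantile_normalize(X)
```

After normalization each column contains the same set of values, while the ordering of probes within each chip is preserved.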
