Modeling sequence dependence of microarray probe signals Li Zhang Department of Biostatistics and Applied Mathematics MD Anderson Cancer Center
Wide use of short oligonucleotide microarrays: gene expression assay; genotyping (SNP detection); comparative genome hybridization; DNA methylation detection; gene structure discovery; genome resequencing.
Protocol of a microarray experiment
Affymetrix GeneChip® Probe Arrays. Each probe cell or feature (24 µm) contains millions of copies of a specific oligonucleotide probe. A GeneChip probe array (1.28 cm) carries over 250,000 different probes complementary to genetic information of interest. [Figure labels: single-stranded, fluorescently labeled DNA target; oligonucleotide probe; hybridized probe cell; image of hybridized probe array]
Double helix on microarrays. The probe is a 25-mer DNA oligo:
  ATCAGCATACGAGAGAATGATGGAT
  |||||||||||||||||||||||||
AAUAGUCGUAUGCUCUCUUACUACCUAGC   (cRNA fragment from solution)
  ATCAGCATACGACAGAATGATGGAT
Average distance between probes is 80 Å.
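The pairing drawn on this slide can be checked mechanically. The sketch below is an illustration only: the three sequences are copied from the slide, while the helper names (`paired_rna`, `differing_positions`) are hypothetical. It confirms that the 25-mer probe pairs base by base with an interior stretch of the cRNA fragment, and it locates the single base at which the second probe sequence differs from the first.

```python
# Sequences copied from the slide.
PROBE = "ATCAGCATACGAGAGAATGATGGAT"
CRNA = "AAUAGUCGUAUGCUCUCUUACUACCUAGC"
SECOND_PROBE = "ATCAGCATACGACAGAATGATGGAT"

# DNA base on the probe -> RNA base it pairs with.
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def paired_rna(probe):
    """RNA bases that pair with the probe, in the left-to-right order drawn on the slide."""
    return "".join(PAIR[base] for base in probe)


def differing_positions(a, b):
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Where the fully paired stretch starts inside the cRNA fragment (0-based).
offset = CRNA.find(paired_rna(PROBE))

# Positions where the two probe sequences differ.
diffs = differing_positions(PROBE, SECOND_PROBE)

print(offset, diffs)
```

The cRNA fragment overhangs the probe by two bases on each side, and the two probe sequences differ only at base 13 (G versus C).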
Technical factors affecting gene expression measurements: interaction between base pairs (stacking); interaction with microarray surface; interaction with unintended targets (cross hybridization); kinetic process (equilibration & washing); physical properties of RNA sample: degradation (missing 5’ ends), alternative splicing (missing exons), secondary structure (RNA hairpins & loops), biotinylation.
Technical factors affecting gene expression measurements, with the models used for each: interaction between base pairs (stacking): nearest-neighbor model; interaction with microarray surface: position-dependent weights for stacking energies; interaction with unintended targets (cross hybridization): PDNN, mean field theory; kinetic process (equilibration & washing): Langmuir and Sips model; physical properties of RNA sample: degradation (missing 5’ ends), alternative splicing (missing exons), secondary structure (RNA hairpins & loops), biotinylation.
Assumption: two types of binding. 1. Gene-specific binding: 25-nt exactly complementary sequences (binding with the intended target). 2. Non-specific binding: many (>5) mismatches or short stretches (binding with unintended targets).
Positional Dependent Nearest-Neighbor (PDNN) model of molecular interactions. Gene-specific binding energy and non-specific binding energy are each a weighted sum of base-pair stacking energies. [Equations on slide]
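A minimal sketch of a weighted sum of base-pair stacking energies is given here for illustration. It assumes one stacking energy per pair of adjacent bases and one weight per position along the probe. The function name `binding_energy` and all numeric values are hypothetical placeholders, not fitted parameters from this talk.

```python
from itertools import product

BASES = "ACGT"


def binding_energy(probe, weights, stacking):
    """Weighted sum of nearest-neighbor stacking energies along a probe.

    probe    : probe sequence, e.g. a 25-mer
    weights  : one weight per adjacent base pair (len(probe) - 1 values)
    stacking : dict mapping a dinucleotide such as "AC" to its stacking energy
    """
    pairs = [probe[k:k + 2] for k in range(len(probe) - 1)]
    if len(weights) != len(pairs):
        raise ValueError("need exactly one weight per adjacent base pair")
    return sum(w * stacking[p] for w, p in zip(weights, pairs))


# Hypothetical stacking energies: more negative when G or C is involved.
stacking = {
    a + b: -1.0 - 0.5 * ((a in "GC") + (b in "GC"))
    for a, b in product(BASES, repeat=2)
}

# Hypothetical weights for a 25-mer: largest near the center, smaller at the ends.
n_pairs = 24
weights = [1.0 - abs(k - (n_pairs - 1) / 2) / n_pairs for k in range(n_pairs)]

energy = binding_energy("ATCAGCATACGAGAGAATGATGGAT", weights, stacking)
print(energy)
```

In this sketch gene-specific and non-specific binding would use the same function with two separate sets of weights and stacking energies.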
PDNN model of probe signals. Probe signal: [equation on slide]. Fitness: [equation on slide]. Minimization of T: energy parameters, B, N*, N_j. Constraints: N* and B are the same on a microarray; N_j is the same in a probe set. Software available at:
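As a rough illustration of how a probe signal and a fitness value could be computed from the quantities named on this slide, the sketch below assumes a commonly cited form of the PDNN signal: a gene-specific term scaled by N_j, a non-specific term scaled by N*, and a baseline B, with the fitness taken as the mean squared difference of log signals. Both forms and all function names are assumptions made for this example, not formulas taken from the slide.

```python
import math


def probe_signal(e_specific, e_nonspecific, n_j, n_star, b):
    """Assumed signal form: N_j / (1 + exp(E)) + N* / (1 + exp(E*)) + B."""
    gene_specific = n_j / (1.0 + math.exp(e_specific))
    non_specific = n_star / (1.0 + math.exp(e_nonspecific))
    return gene_specific + non_specific + b


def fitness(observed, predicted):
    """Assumed fitness: mean squared difference between log signals."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared = [(math.log(p) - math.log(o)) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared) / len(squared)


# Hypothetical numbers, for illustration only.
weak = probe_signal(e_specific=0.0, e_nonspecific=0.0, n_j=100.0, n_star=20.0, b=5.0)
strong = probe_signal(e_specific=-2.0, e_nonspecific=0.0, n_j=100.0, n_star=20.0, b=5.0)
print(weak, strong, fitness([weak, strong], [weak, strong]))
```

Under this assumed form, a more negative binding energy gives a larger predicted signal, and a fitting routine would adjust the energy parameters, B, N* and N_j to make the fitness as small as possible.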
Fitting PDNN model [Figure: ln(signal) versus probe index]
Energy parameters in PDNN model [Figure panels: weight factors; stacking energy terms]
Baseline of non-specific binding [Figure; axis label: non-specific binding energy]
Effects of mismatches: A mismatch disrupts double helix formation. Energetically, it is unfavorable for binding. Its effect depends on the context of the DNA sequence.
Effect of mismatch at base 13 depends on the nearest neighbors [Figure; labels: A, C, G, T]
Sequence dependence of free energy cost of single mismatch in DNA duplexes
Pattern of cross hybridization: MM and PM probes bind to different molecules. [Figure: Var(ln PM) versus Var(ln MM)] Data source: Affymetrix HG-U133 spike-in data set. Large variation indicates response to spike-ins. Number of arrays: 42. Number of probes on an array: ~0.5 million.
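The quantities Var(ln PM) and Var(ln MM) are the variances of a probe's log signal across arrays. A minimal sketch of that calculation follows. The signal values are invented for illustration and are not taken from the HG-U133 spike-in data set, and the function name `log_variance` is hypothetical.

```python
import math
from statistics import variance


def log_variance(signals):
    """Sample variance of ln(signal) for one probe across arrays."""
    return variance([math.log(s) for s in signals])


# Invented signals for one PM/MM probe pair across four arrays.
pm_signals = [100.0, 400.0, 1600.0, 6400.0]  # varies strongly from array to array
mm_signals = [100.0, 110.0, 95.0, 105.0]     # nearly constant

var_ln_pm = log_variance(pm_signals)
var_ln_mm = log_variance(mm_signals)
print(var_ln_pm, var_ln_mm)
```

In this invented example the PM probe shows a large variance and the MM probe a small one, which is the kind of contrast the slide reads as the two probes responding to different molecules.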
Microarray surface effects: DNA and RNA are negatively charged. The glass surface is also charged, which leads to repulsion.
Pattern of cross hybridization: bias towards the 5’ end [Figure label: 5’ end]
Sense and antisense: Upon binding, sense and antisense probes form the same double helix structure. The same interactions should lead to the same binding energy. The observed data contradict this prediction.
Contrast of sense and antisense probe signals. Ŷ = Nt – 0.05 Na Ng; R² = 0.67; sample size = 875. [Figure: ln(sense probe signal / antisense probe signal) versus model fitted]
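To make the fitted quantity concrete, the sketch below assumes that Nt, Na and Ng denote the numbers of T, A and G bases in a probe, and that the response is the log ratio of sense to antisense signal. The helper names and the coefficient used in the example are hypothetical; they are not the fitted values from this slide.

```python
import math
from collections import Counter


def base_counts(probe):
    """Number of A, C, G and T bases in a probe sequence."""
    counts = Counter(probe)
    return {base: counts.get(base, 0) for base in "ACGT"}


def log_ratio(sense_signal, antisense_signal):
    """ln(sense probe signal / antisense probe signal)."""
    return math.log(sense_signal / antisense_signal)


def linear_prediction(counts, coefficients, intercept=0.0):
    """Linear model in the base counts; coefficients are supplied by the caller."""
    return intercept + sum(coefficients.get(b, 0.0) * n for b, n in counts.items())


counts = base_counts("ATCAGCATACGAGAGAATGATGGAT")
# Hypothetical coefficient, chosen only to show how the function is called.
predicted = linear_prediction(counts, {"T": 0.1})
print(counts, predicted, log_ratio(2.0, 2.0))
```

A regression of the observed log ratios on such counts over many probes would give coefficients and an R² of the kind reported on the slide.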
Summary. Binding on array surface: Probe binding free energy can be approximated by a weighted sum of base-pair stacking energies, with the probe ends contributing less. Mismatches: Mismatches disrupt hybridization, especially in cross hybridization. The effects of mismatches depend on sequences. The surface also has an effect. Surface effects: Cross hybridization is biased towards the 5’ end of the probes. Repulsion from the surface depends on nucleotides.