Excursions into GeneChip data analysis
Felix Naef & Marcelo Magnasco, GL meeting, Nov

Outline
- Background subtraction
- Probeset statistics
Background estimation
- estimate both the mean B and its fluctuations (std)
- needed in the low-intensity regime
- includes light reflection from the substrate, photodetector dark current, some cross-hybridization (i.e. small residues)
- by the CLT, the background is expected to be a Gaussian variable
- idea: B is insensitive to MM and visible at low intensity
- select probes such that |PM-MM| is below a threshold (locally?)
- use a threshold of 50 (new) or 100 (old settings)
- P(PM) or P(MM) is the convolution of a Gaussian and a step function
[Figure: schematic, Gaussian "+" step function = real P(PM); axis marks at 0 and B]
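The selection rule above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function name `estimate_background` is hypothetical, the threshold is applied globally rather than locally, and the mean and std are taken directly from the pooled PM and MM values of the selected pairs instead of fitting the Gaussian-convolved-with-step model mentioned on the slide.

```python
import numpy as np

def estimate_background(pm, mm, threshold=50.0):
    """Rough background mean and std from pairs with |PM - MM| < threshold.

    Assumption: a plain mean/std of the selected values stands in for a
    proper fit of the Gaussian (x) step-function model.
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    near_equal = np.abs(pm - mm) < threshold  # pairs where PM and MM nearly agree
    if not near_equal.any():
        raise ValueError("no probe pairs satisfy |PM - MM| < threshold")
    # Pool PM and MM of the selected pairs; both are treated as background samples.
    pooled = np.concatenate([pm[near_equal], mm[near_equal]])
    return float(pooled.mean()), float(pooled.std(ddof=1))
```

Because the selection truncates the PM-MM difference, the std returned here tends to come out somewhat below the true background std, and any real signal among the selected pairs biases the mean upward; a fit to the low-intensity edge of the distribution would be less sensitive to both effects.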
Example [figure; the plotted quantities and the parameter they depend on did not survive extraction]
trick for dealing with negative values
PM vs. MM distribution
[Figure: PM vs. MM plot with a zoomed panel; the MM>PM region is marked "make a histogram in this region"]
PM vs. MM histogram
MM>PM across different chips
- MM>PM is not concentrated at low intensities: 27% of probe pairs with MM>PM are in the top quartile
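A statistic of this kind can be computed as follows. This is a sketch under stated assumptions: the slide does not say which intensity defines the quartiles, so the mean of PM and MM is used here, and the function name `fraction_in_top_quartile` is hypothetical.

```python
import numpy as np

def fraction_in_top_quartile(pm, mm):
    """Fraction of MM>PM probe pairs whose intensity lies in the top quartile.

    Assumption: pair intensity is taken as (PM + MM) / 2.
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    intensity = 0.5 * (pm + mm)
    cutoff = np.quantile(intensity, 0.75)  # lower edge of the top quartile
    flagged = mm > pm                      # pairs where the mismatch is brighter
    if not flagged.any():
        return float("nan")
    return float(np.mean(intensity[flagged] >= cutoff))
```

If MM>PM pairs were spread evenly over intensities, this fraction would be about 0.25; a value far below 0.25 would indicate that they cluster at low intensity.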
Probe pair trajectories (~80 chips)
- take all (PM, MM) for a given probe set
- compute the center of mass (x, y) and the ellipsoid of inertia [...] > [...]
- histogram the centers of mass
- color code according to s = [...] / (min(x, y) [...]
- ~ noise detrending
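The center of mass and inertia ellipsoid of one trajectory can be sketched as below. Several choices are assumptions rather than details from the slides: the principal axes are taken from the eigenvalues of the 2x2 covariance matrix of the (PM, MM) points, "eccentricity" is read as the ratio of major to minor axis, and no log transform is applied. The function name `trajectory_shape` is hypothetical.

```python
import numpy as np

def trajectory_shape(pm, mm):
    """Center of mass and principal axes of the (PM, MM) cloud of one probe pair.

    Assumption: eccentricity is the major/minor axis ratio of the
    covariance ellipse. Requires at least two points.
    """
    points = np.column_stack([pm, mm]).astype(float)
    center = points.mean(axis=0)                    # center of mass (x, y)
    cov = np.cov(points, rowvar=False)              # 2x2 covariance matrix
    eigvals = np.linalg.eigvalsh(cov)               # eigenvalues in ascending order
    minor, major = np.sqrt(np.clip(eigvals, 0.0, None))
    eccentricity = float(major / minor) if minor > 0 else float("inf")
    return center, float(major), float(minor), eccentricity
```

With this definition, an elongated trajectory, where PM and MM move together across chips, gives a large ratio, while a round cloud gives a value near 1.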
All probe sets
[Figure legend: blue = large s, green = mid, red = small]
Probes with 'well' defined trajectories (eccentricity > 3)
- ~1/3 of probes
[Figure legend: blue = large, green = mid, red = small]
PM within a probe set
- Is the brightness of the probes reasonably uniform?
- Or do different probes have very different hybridization efficiencies?
So what can possibly be happening?
- sequence-dependent hybridization efficiencies
- are kinetic effects important?
- cross-hybridization beyond what is detectable by MM probes
- this is hard to assess without sequence info
- sequence-dependent fabrication efficiencies?
- variable probe densities
Composite scores
What have we learned from the previous slides?
- MM are not consistently behaving as expected
  - What about not using them?
- The probe set intensities vary over decades
  - difficult to estimate absolute intensities using 'averages' (alternative: Li and Wong)
  - we focus on ratio scores
Outline of algorithm
1. estimate background (mean and std)
2. discard noisy and saturated probes; use either only PM or PM-MM as raw intensities
3. average the remaining log-ratios in an outlier-robust way (robust regression to intercept), with standard error (SE)
4. normalize by centering the (possibly local) log-ratio distribution
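The four steps can be sketched for a two-chip comparison as follows. This is an illustration under several assumptions, not the authors' implementation: only PM is used as raw intensity, one background estimate is shared by both chips, the median stands in for the robust regression to intercept, the SE is a rough MAD-based figure, and centering is global rather than local. The names `probe_set_log_ratio` and `center_log_ratios` and the defaults `n_sigma` and `saturation` are hypothetical.

```python
import numpy as np

def probe_set_log_ratio(pm_a, pm_b, bg_mean, bg_std, n_sigma=3.0, saturation=40000.0):
    """Robust log2 ratio of one probe set between chips A and B (PM only).

    Assumptions: the median replaces robust regression to intercept, and
    n_sigma and saturation are illustrative cutoffs.
    """
    pm_a = np.asarray(pm_a, dtype=float)
    pm_b = np.asarray(pm_b, dtype=float)
    sig_a = pm_a - bg_mean  # step 1 output (bg_mean, bg_std) is applied here
    sig_b = pm_b - bg_mean
    # Step 2: drop probes near the noise floor or at saturation on either chip.
    keep = ((sig_a > n_sigma * bg_std) & (sig_b > n_sigma * bg_std)
            & (pm_a < saturation) & (pm_b < saturation))
    n = int(keep.sum())
    if n == 0:
        return float("nan"), float("nan"), 0
    # Step 3: outlier-robust average of the remaining log-ratios.
    log_ratios = np.log2(sig_a[keep] / sig_b[keep])
    score = np.median(log_ratios)
    mad = np.median(np.abs(log_ratios - score))
    se = 1.4826 * mad / np.sqrt(n)  # rough standard error from the MAD
    return float(score), float(se), n

def center_log_ratios(scores):
    """Step 4: center the log-ratio distribution (global centering only)."""
    scores = np.asarray(scores, dtype=float)
    return scores - np.nanmedian(scores)
```

A typical use would call `probe_set_log_ratio` once per probe set and then pass the collected scores to `center_log_ratios`; a local variant would subtract an intensity-dependent median instead of a single global one.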