Lecture Topic 5 Pre-processing AFFY data. Probe Level Analysis The Purpose –Calculate an expression value for each probe set (gene) from the 11-25 PM.


1 Lecture Topic 5 Pre-processing AFFY data

2 Probe Level Analysis
The Purpose
–Calculate an expression value for each probe set (gene) from its 11-25 PM and MM intensities
–Critical for later analysis: avoids "garbage in, garbage out" (GIGO)
–A VERY recent field, but one that has made significant progress

3 Difficulties
–Large variability
–Few measurements (11-25 at most)
–MM is very complex: it is signal plus background
–Signal has to be SCALED
–Probe-level effects

4 Different Methods
–MAS 4, Affymetrix 1996
–MAS 5, Affymetrix 2002
–Model Based Expression Index (MBEI), Li and Wong 2001
–Robust Multichip Analysis (RMA), 2002
–GC-RMA, 2004

5 MAS 4 A- probe pairs selected

6 Avg Diff
Calculated by taking the difference PM - MM for every probe pair and averaging over the probe pairs
–Excluded OUTLIER pairs whose PM - MM was more than 3 SD from the mean
–Was NOT a robust average
–NOT log-transformed
–COULD be negative (about 1/3 of the time)
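A minimal sketch of an Avg Diff style calculation, assuming the outlier rule means "drop pairs whose PM - MM lies more than 3 SD from the mean difference". The function name `avg_diff` and the parameter `n_sd` are illustrative, and the exact MAS 4 exclusion procedure may differ in detail.

```python
from statistics import mean, stdev

def avg_diff(pm, mm, n_sd=3.0):
    """Average of PM - MM over probe pairs, after dropping outlying pairs.

    Assumption: a pair is an outlier when its difference is more than
    n_sd sample standard deviations away from the mean difference.
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    center, spread = mean(diffs), stdev(diffs)
    kept = [d for d in diffs if abs(d - center) <= n_sd * spread]
    # Plain (non-robust) mean on the raw scale: no log transform.
    return mean(kept)

# MM brighter than PM on every pair gives a negative expression value.
print(avg_diff([100, 120, 110], [150, 160, 170]))
```

The last line illustrates the weakness noted on the slide: nothing stops the result from being negative when MM exceeds PM.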

7 MAS 5
Signal = TukeyBiweight{$\log_2(PM_j - IM_j)$}
–Discussed earlier; requires calculating the ideal mismatch (IM)
–The adjusted differences (PM - IM rather than PM - MM) are log transformed, and the Tukey biweight makes the average robust to outlying observations
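The Tukey biweight step can be sketched as a one-step weighted mean. The tuning constant `c=5` and the small `eps` follow the values commonly quoted for the MAS 5 algorithm, but they are assumptions here, as are the function name and the toy PM and IM values.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers."""
    values = list(values)
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    total_w = total_wx = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Weight falls smoothly to zero; points with |u| >= 1 are ignored.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        total_w += w
        total_wx += w * v
    return total_wx / total_w

# Hypothetical probe set: PM and ideal mismatch (IM) intensities.
pm = [300.0, 320.0, 290.0, 310.0, 5000.0]
im = [50.0, 60.0, 45.0, 55.0, 40.0]
log_signal = tukey_biweight([math.log2(p - i) for p, i in zip(pm, im)])
```

Because the last probe pair is far from the median on the log scale, it receives zero weight and barely affects `log_signal`.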

8 Robust Multichip Analysis
–ONLY uses PM and ignores MM
–SACRIFICES accuracy for major gains in PRECISION
Basic Steps:
–1. Calculate chip background (*BG) and subtract it from PM
–2. Carry out intensity-dependent normalization of PM-*BG: Lowess, Quantile Normalization (discussed before)
–Normalized PM-*BG are log transformed
–Robust multichip analysis of all probes in the set, using Tukey's median polish procedure. Signal is the antilog of the result.

9 RMA- Step 1: Background Correction
Irizarry et al. (2003)
–Finds the conditional expectation of the TRUE signal given the observed signal (which is assumed to be the true signal plus noise): $E(s_i \mid s_i + b_i)$
–$s_i$ is assumed to follow an Exponential distribution with parameter $\lambda$
–$b_i$ is assumed to follow $N(\mu_e, \sigma_e^2)$
–Estimate $\mu_e$ and $\sigma_e$ as the mean and standard deviation of empty spots

10 RMA- BG Corrected Value
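The formula for this slide did not survive in the transcript. As a hedged sketch, the conditional expectation for an exponential signal plus normal noise can be derived directly: given the observed value o, the signal follows a normal distribution with mean a = o - mu_e - rate * sigma_e^2 and standard deviation sigma_e, truncated to positive values. The code below computes the mean of that truncated normal. The names `rate`, `mu_e` and `sigma_e` are chosen here, and the formula on the original slide may include additional terms.

```python
import math

def _normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _normal_cdf(z):
    # erfc keeps precision in the lower tail better than 1 + erf.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def bg_corrected(observed, mu_e, sigma_e, rate):
    """E(signal | observed) for signal ~ Exponential(rate), noise ~ N(mu_e, sigma_e^2).

    Assumes only that the signal is positive. For observations extremely far
    below mu_e the normal CDF underflows, which this sketch does not guard against.
    """
    a = observed - mu_e - rate * sigma_e ** 2
    z = a / sigma_e
    return a + sigma_e * _normal_pdf(z) / _normal_cdf(z)

# Bright probes are shifted down by roughly mu_e; dim probes stay positive.
bright = bg_corrected(1000.0, mu_e=100.0, sigma_e=20.0, rate=0.01)
dim = bg_corrected(50.0, mu_e=100.0, sigma_e=20.0, rate=0.01)
```

Unlike simple subtraction of a background estimate, the corrected value is always positive, so the later log transform is well defined.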

11 RMA-Normalization
Use the background corrected intensities B(PM) to carry out normalization
–Lowess (for spatial effects)
–Quantile Normalization (to allow comparability amongst replicate slides)
–Normalized B(PM) are log transformed
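Quantile normalization can be sketched in a few lines: sort each array, average across arrays at each rank to form a reference distribution, then give every value the reference value for its rank. The function name and the list-of-columns layout are choices made for this illustration, and ties are broken by position rather than averaged.

```python
def quantile_normalize(columns):
    """Force every array (column) to share the same empirical distribution.

    columns: list of equal-length lists, one list of intensities per array.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across arrays at each rank.
    reference = [sum(sc[r] for sc in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from smallest to largest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

arrays = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
normalized = quantile_normalize(arrays)
```

After the call, every array contains exactly the same set of values, and only the assignment of those values to probes differs between arrays.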

12 RMA summarization
Use MEDIAN POLISH to fit a linear model
Given a MATRIX of data:
–Data = overall effect + row effects + column effects + residual
–Find the row and column effects by subtracting the row medians and the column medians in turn until all the medians are smaller than some epsilon
–Gives the estimated row, column and overall effects when done
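The procedure above can be sketched as follows. This is a minimal illustration of Tukey's median polish, with the function name, `max_iter` and `eps` chosen here rather than taken from the slides.

```python
from statistics import median

def median_polish(data, max_iter=10, eps=1e-6):
    """Decompose data[i][j] into overall + row[i] + col[j] + residual[i][j]."""
    resid = [[float(v) for v in row] for row in data]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row medians out of the residuals and into the row effects.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep column medians out of the residuals and into the column effects.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
        resid = [[resid[i][j] - col_med[j] for j in range(n_cols)] for i in range(n_rows)]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once the medians just removed were all negligible.
        if max(abs(m) for m in row_med + col_med) < eps:
            break
    return overall, row_eff, col_eff, resid

overall, rows, cols, resid = median_polish([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For this perfectly additive toy matrix the residuals are all zero. With real data, outlying cells end up in the residuals instead of distorting the row and column effects, which is what makes the fit robust.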

13 Median Polish of RMA
–For each probe set we have a matrix (probes in rows and arrays in columns)
–We assume: Signal = probe affinity effect + log-scale expression level + error
–We also assume the probe affinities sum to 0
–Use MEDIAN polish to estimate the expression level in each array
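Written out for probe i and array j, the model on this slide might look as follows. The symbols $Y_{ij}$, $a_i$, $e_j$ and $\varepsilon_{ij}$ are notation chosen here for illustration, not taken from the slides.

```latex
\begin{aligned}
Y_{ij} &= a_i + e_j + \varepsilon_{ij}, \qquad \sum_i a_i = 0, \\
\hat{e}_j &= \text{overall effect} + \text{column effect}_j
\end{aligned}
```

Here $Y_{ij}$ is the log-transformed, normalized intensity, $a_i$ the probe affinity effect and $e_j$ the log-scale expression level in array j. Median polish leaves the row effects with median zero rather than sum zero, so it should be read as a robust stand-in for the constraint above rather than an exact fit of it.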

14

15 GC-RMA the Basic Idea of Background
Uses MM and PM in a more statistical framework.
–$PM = O_{PM} + N_{PM} + S$
–$MM = O_{MM} + N_{MM} + \phi S$
O represents optical noise, N represents non-specific binding (NSB) noise, and S is a quantity proportional to RNA expression (the quantity of interest). The parameter $0 < \phi < 1$ accounts for the fact that for some probe pairs the MM detects signal.
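One consequence of this model can be seen by subtracting the MM equation from the PM equation, writing $\phi$ for the parameter between 0 and 1 that multiplies S in the MM equation:

```latex
PM - MM = (O_{PM} - O_{MM}) + (N_{PM} - N_{MM}) + (1 - \phi)\, S
```

Even if the two noise differences averaged out to zero, PM - MM would recover only the fraction $1 - \phi$ of S, and the difference of two noise terms is more variable than either term alone. This is a sketch of the motivation for treating background adjustment as a prediction problem instead of subtracting MM directly.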

16 Distributional Assumptions
–Assume O follows a log-normal distribution
–$\log(N_{PM})$ and $\log(N_{MM})$ follow a bivariate-normal distribution with means $\mu_{PM}$ and $\mu_{MM}$, variance $\mathrm{var}[\log(N_{PM})] = \mathrm{var}[\log(N_{MM})] = \sigma^2$, and correlation $\rho$ constant across probes
–$\mu_{PM} \equiv h(\alpha_{PM})$ and $\mu_{MM} \equiv h(\alpha_{MM})$, with $h$ a smooth (almost linear) function and the $\alpha$ defined next
–Because we do not expect NSB to be affected by optics, we assume O and N are independent
–The parameters $\mu_{PM}$, $\mu_{MM}$, $\rho$ and $\sigma^2$ can be estimated from the large amount of data
–A background adjustment procedure can then be formalized as the statistical problem of predicting S given that we observed PM and MM, assuming we know $h$, $\rho$, $\sigma^2$ and $\phi$

17 GC-RMA
Naef and Magnasco (2003) defined the probe affinity as
$\alpha = \sum_{k=1}^{25} \sum_{j} \mu_{j,k} \, I_{b_k = j}$
where
–k = 1,…,25 indicates the position along the probe
–j indicates the base letter
–$b_k$ represents the base at position k
–$I_{b_k = j}$ is an indicator function that is 1 when the k-th base is of type j and 0 otherwise
–$\mu_{j,k}$ represents the contribution to affinity of base j in position k
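Because the indicator selects exactly one base per position, the affinity reduces to a table lookup summed along the probe sequence. A minimal sketch follows, with an entirely hypothetical contribution table `toy_mu`; real values would have to be estimated from data.

```python
def probe_affinity(sequence, mu):
    """Sum the position-specific contribution of each base along the probe.

    sequence: probe sequence, e.g. a 25-character string over A, C, G, T.
    mu: dict mapping base letter -> list of contributions, one per position.
    """
    return sum(mu[base][k] for k, base in enumerate(sequence))

# Hypothetical table: same contribution at every position, for illustration only.
toy_mu = {
    "A": [0.0] * 25,
    "C": [0.2] * 25,
    "G": [0.1] * 25,
    "T": [-0.1] * 25,
}
probe = "ACGT" * 6 + "A"  # 25 bases
affinity = probe_affinity(probe, toy_mu)
```

In a fitted table the contributions would vary with position k, so two probes with the same base composition but different base order could have different affinities.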

18 Assumptions and Notations needed for applying GC-RMA
Assumptions:
–1. $\phi$ is 0 (although we know $\phi > 0$)
–2. O is an array-dependent constant
Notations:
–Let m be the minimum value allowed for S (generally 0)
–$\hat{h}$, $\hat{\rho}$ and $\hat{\sigma}^2$ are plug-in estimators

19 MLE estimates Under the above described assumptions, the maximum likelihood estimate (MLE) of S=

20

