Lecture 3 From Images to Data

1 Lecture 3 From Images to Data
A lot of this is not so relevant NOW, but I think it's good basic knowledge.

2 Recap: How Microarrays are used to measure gene expression
Basic idea: measure the activity level of a gene (its expression level) in a particular cell at a particular time by measuring the concentration of that gene's mRNA transcript in the cell's total RNA.
- Immobilize DNA probes (oligos, cDNA) onto glass.
- Hybridize labelled target mRNA (in reality its cDNA equivalent) with the probes on the glass.
- Measure how much target binds to each probe (i.e. forms double-stranded DNA).
(In two-channel arrays, equal amounts of two differently labelled target cDNAs are hybridized to the same probes.)
Recall: the dyes are chosen to have well-separated emission spectra. Cy3 and Cy5 are excited by scanner lasers at 532 nm and 635 nm respectively, and emit with peaks near 570 nm and 670 nm.

3 What is measured in Microarrays?
We measure the amount of labelled target cDNA bound to each immobilized probe by exciting the labelling molecules (the dye) with a laser, then collecting and counting the emitted photons. In practice the entire glass slide or chip is scanned (laser excitation, dye emission, counting of emitted photons), and the result is a digital image. This image then has to be processed to locate the probes and assign an intensity measurement to each of them. There can be anywhere from hundreds to millions of different probes on each slide or chip.
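The step "locate the probes in the image and assign intensity measurements" can be sketched as below. This is a minimal sketch assuming the scanned image is already aligned so that probe cells form a regular grid of `cell_size` x `cell_size` pixels; the helper name `cell_intensities`, the one-pixel border trim and the 75th-percentile summary are illustrative assumptions, not the scanner software's actual procedure.

```r
# Hypothetical helper: summarize each probe cell of an aligned scan image.
# Assumes cells form a regular grid of cell_size x cell_size pixels.
cell_intensities <- function(img, cell_size, border = 1, prob = 0.75) {
  n_row <- nrow(img) %/% cell_size
  n_col <- ncol(img) %/% cell_size
  inner <- (1 + border):(cell_size - border)  # pixel offsets kept in each cell
  out <- matrix(NA_real_, n_row, n_col)
  for (i in seq_len(n_row)) {
    for (k in seq_len(n_col)) {
      rows <- (i - 1) * cell_size + inner
      cols <- (k - 1) * cell_size + inner
      # border pixels are dropped because they may mix in neighbouring cells
      out[i, k] <- quantile(img[rows, cols], probs = prob, names = FALSE)
    }
  }
  out
}

# Example: a 12 x 12 image with one bright 6 x 6 cell in the top-left corner
img <- matrix(0, 12, 12)
img[1:6, 1:6] <- 50
cell_intensities(img, cell_size = 6)  # 2 x 2 matrix: 50 top-left, 0 elsewhere
```

Real gridding must first find the cell boundaries in the image, which this sketch skips.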

4 The AFFY Chip

5 Affymetrix GeneChips®
Probes = 25 bp sequences.
Probe sets = sets of probes corresponding to a particular gene or EST.
Older human chips had 20 probe pairs per probe set and mouse chips 16, while the Human GeneChip® HG-U133A has 11. Most genes or ESTs are represented by one probe set, but quite a few have more than one.

9 DATA AND NOTATION
$PM_{ijg}$, $MM_{ijg}$: intensities of the perfect-match and mismatch probes in cell $j$ for gene $g$ on chip $i$.
$i = 1, \dots, n$: chips (from one to hundreds)
$j = 1, \dots, J$: probe pairs
$g = 1, \dots, G$: probe sets
Goal: compute the SIGNAL (expression measure) for each probe set on each chip.

10 DATA AND NOTATION
[Figure: two probe sets (PROBE SET 1, PROBE SET 2), with a probe cell and a probe pair labelled.]

11 Expression Value Calculation (Signal)
The signal represents the amount of transcript in solution. In brief, it is calculated as follows:
- Cell intensities are preprocessed for global background.
- An idealized mismatch value is calculated and subtracted to adjust each PM intensity.
- The adjusted PM intensities are log-transformed to stabilize the variance.
- Tukey's biweight estimator is used to provide a robust mean of the log signal.
- The signal is output as the antilog of this mean.
- Finally, the signal is scaled to produce normalized data.
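The slide does not say how the final scaling is done. The sketch below assumes a global scaling in which every signal on a chip is multiplied by one factor chosen so that the chip's 2% trimmed mean equals a target value (500 here). The helper name `scale_signals` and both defaults are assumptions for illustration.

```r
# Hypothetical global scaling: multiply all signals on a chip so that their
# trimmed mean (trimming 2% from each end) equals the target value.
scale_signals <- function(signal, target = 500, trim = 0.02) {
  sf <- target / mean(signal, trim = trim)  # scaling factor for this chip
  signal * sf
}

raw <- c(120, 80, 950, 40, 300)
scaled <- scale_signals(raw)
mean(scaled, trim = 0.02)  # equals the target, 500 (up to rounding)
```

Because the scaling is linear, it changes the level of every signal on the chip by the same factor and leaves within-chip ratios unchanged.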

12 DETECTION ALGORITHM BACKGROUND: the average of the lowest 2% of the cell intensities is subtracted.
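A minimal sketch of this background step as stated on the slide, treating the whole chip as a single region. The helper name `subtract_background` is hypothetical, and averaging at least one value when the chip is very small is an assumption.

```r
# Background correction as described on the slide: subtract the average of
# the lowest 2% of intensities on the chip from every intensity.
subtract_background <- function(x, frac = 0.02) {
  k  <- max(1, floor(frac * length(x)))   # number of lowest values to average
  bg <- mean(sort(x)[seq_len(k)])
  x - bg
}

intensities <- c(35, 12, 400, 18, 2500, 760, 22, 15)
subtract_background(intensities)  # c(23, 0, 388, 6, 2488, 748, 10, 3)
```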

13 Algorithm First calculate, for each probe pair, the discrimination score R, which measures the ability to detect the intended target:
$R = \frac{PM - MM}{PM + MM}$
R near 1 means PM >> MM; R near or below 0 means PM <= MM.
Define a threshold τ (default 0.015) that R must exceed for a probe pair to count towards a "present" call.
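The discrimination score is computed for all probe pairs at once in the short sketch below; `discrimination` is a hypothetical helper name, and the PM/MM values are made up for illustration.

```r
# Discrimination score for each probe pair: R = (PM - MM) / (PM + MM)
discrimination <- function(pm, mm) (pm - mm) / (pm + mm)

pm <- c(1200, 300, 150)
mm <- c(100, 280, 400)
R <- discrimination(pm, mm)
R - 0.015   # R - tau with the default tau; positive values favour "present"
```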

14 Algorithm contd… Calculate R - τ for each probe pair. Rank the probe pairs by |R - τ|. Apply the one-sided Wilcoxon signed-rank test (non-parametric) to the values R - τ to generate the detection p-value.

15 Wilcoxon Signed Rank Test
To test whether the median θ of a distribution is >, <, or ≠ θ0. This is the non-parametric equivalent of the one-sample mean problem. Model: $y_i = \theta + \varepsilon_i$.
Procedure for the "greater than" alternative:
- Subtract θ0 from each observation: $z_i = y_i - \theta_0$.
- Take absolute values $|z_i|$ and define $\psi_i = 1$ if $z_i$ is positive and 0 otherwise.
- Rank the absolute values; call the ranks $R_i$.
- Test statistic, the sum of positive ranks: $S = \sum_i \psi_i R_i$.
- Find the corresponding p-value $P(S \ge s)$, where $s$ is the observed value, and reject if the p-value is small.
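The procedure can be sketched in a few lines of R. `signed_rank_S` is a hypothetical helper; as a cross-check, base R's `wilcox.test` reports the same sum of positive ranks as its statistic `V`. One assumption to note: this sketch ranks zero differences instead of dropping them, whereas `wilcox.test` drops them.

```r
# Wilcoxon signed-rank statistic for H1: median > theta0.
signed_rank_S <- function(y, theta0 = 0) {
  z   <- y - theta0            # z_i = y_i - theta0
  r   <- rank(abs(z))          # ranks R_i of |z_i| (ties get average ranks)
  psi <- as.numeric(z > 0)     # psi_i = 1 if z_i > 0, else 0
  sum(psi * r)                 # S = sum of psi_i * R_i
}

y <- c(1.2, -0.5, 3.1, 0.7, 2.2)
signed_rank_S(y)                                             # 14
unname(wilcox.test(y, alternative = "greater")$statistic)    # also 14 (R calls it V)
```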

16 Calculating p-values Logic: let's find the null distribution of the test statistic. With n observations there are $2^n$ possible sign configurations for the ranks, all equally likely under the null hypothesis. For n = 8 there are 256 possible outcomes:
- All positive (1 option): S = 1 + 2 + … + 8 = 36.
- One negative (8 options): S = 35, …, 28.
- Two negative (28 options): S = 33, …, 21.
- And so on.
All we need are the extreme outcomes, to see how extreme our observed test statistic is.
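The enumeration argument can be checked by brute force: list all $2^n$ sign patterns, compute S for each, and take the exact p-value as the fraction of patterns with S at least as large as the observed value. The helper names `null_S` and `exact_p` are hypothetical.

```r
# Null distribution of S: under H0 each of the 2^n sign patterns is equally likely.
null_S <- function(ranks) {
  n <- length(ranks)
  psi <- as.matrix(expand.grid(rep(list(0:1), n)))  # 2^n rows of 0/1 indicators
  as.vector(psi %*% ranks)                          # S for each pattern
}

# Exact one-sided p-value P(S >= s)
exact_p <- function(s, ranks) mean(null_S(ranks) >= s)

S8 <- null_S(1:8)
length(S8)          # 256 outcomes
table(S8)[["36"]]   # 1 outcome with all positive
exact_p(35, 1:8)    # 2/256 = 0.0078125
```

Enumeration costs $2^n$ evaluations, which is manageable for the 11 to 20 probe pairs of a probe set but grows quickly beyond that.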

17 Discrimination Score [R]
[Figure: discrimination score R plotted against MM intensity for each probe pair, with the threshold τ marked.] Increasing τ reduces false positives but also reduces the number of present calls.

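18 Detection Call The detection call is based on p-value cut-offs: α1 and α2 provide the boundaries for the P (present), M (marginal) and A (absent) calls. Defaults: α1 = 0.04, α2 = 0.06.
- p < α1: P
- p > α2: A
- intermediate: M
[Figure: p-value axis from 0.00 to 1.00, divided at α1 = 0.04 and α2 = 0.06 into P, M and A regions.]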
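The call rule is a two-threshold lookup on the detection p-value. A sketch using the slide's default cut-offs; `detection_call` is a hypothetical helper name.

```r
# P / M / A call from a detection p-value, using the slide's default cut-offs.
detection_call <- function(p, alpha1 = 0.04, alpha2 = 0.06) {
  ifelse(p < alpha1, "P", ifelse(p > alpha2, "A", "M"))
}

detection_call(c(0.0078, 0.05, 0.30))   # "P" "M" "A"
```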

19 Example (this example uses τ = 0.15, so Z = R - 0.15)
Intensity   R       Z       |Z|     Rank   Positive?
283.3       0.992   0.842   0.842   5.5    1
            -0.02   -0.170  0.170   1      0
239.0       0.992   0.842   0.842   5.5    1
286.0       0.991   0.841   0.841   4      1
190.8       0.994   0.844   0.844   8      1
6314.0      0.792   0.642   0.642   2      1
265.0       0.990   0.840   0.840   3      1
218.0       0.993   0.843   0.843   7      1
s = 35, so P(S ≥ 35) = 2/256 ≈ 0.008 < α1 = 0.04: PRESENT
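The example's numbers can be checked in R from the eight R values on the slide (the third 0.992 is inferred from its Z = 0.842), using τ = 0.15 as the example does. The p-value is computed by enumerating all 256 sign patterns of the tied (average) ranks.

```r
# Reproduce the worked example: R values for 8 probe pairs, tau = 0.15.
R   <- c(0.992, -0.02, 0.992, 0.991, 0.994, 0.792, 0.990, 0.993)
tau <- 0.15
z   <- R - tau
rk  <- rank(abs(z))                 # the two |z| = 0.842 values share rank 5.5
s   <- sum(rk[z > 0])               # observed statistic: 35

# exact p-value by enumerating all 2^8 sign patterns of the (tied) ranks
psi <- as.matrix(expand.grid(rep(list(0:1), length(rk))))
p   <- mean(psi %*% rk >= s)        # P(S >= 35) = 2/256
c(s = s, p = p)
```

Only two patterns reach S ≥ 35 (all positive, or only rank 1 negative), so p = 2/256 ≈ 0.0078, well below α1 = 0.04.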

20 SIGNAL Calculated using a one-step Tukey's biweight estimate: a robust weighted mean that is insensitive to outliers.
One-step Tukey's biweight algorithm: let the data be $x_i$ and let $m$ be their median. Calculate the median absolute deviation $\text{MAD} = \text{median}_i |x_i - m|$ and
$u_i = (x_i - m)/(c \cdot \text{MAD} + \varepsilon)$,
where $c$ is the tuning constant (set to 5) and $\varepsilon = 0.0001$ (so that we never divide by zero).

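21 Tukey Biweight The weights are calculated as
$w_i = (1 - u_i^2)^2$ if $|u_i| \le 1$, and $w_i = 0$ otherwise.
The Tukey biweight estimate is
$T_{bi}(x) = \sum_i w_i x_i / \sum_i w_i$.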

22 Comments on Biweight The biweight is generally iterated several times, but here we use just one iteration. It is considered very robust as an estimator.
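The one-step biweight with the slide's constants (c = 5, ε = 0.0001) fits in a short function; `tukey_biweight` is a hypothetical name. The MAD here is the raw median absolute deviation, as on the slide, not R's `mad()`, which rescales by 1.4826 by default.

```r
# One-step Tukey biweight: a robust weighted mean of x.
tukey_biweight <- function(x, c_tune = 5, eps = 1e-4) {
  m     <- median(x)
  s_mad <- median(abs(x - m))                  # raw MAD, no 1.4826 rescaling
  u     <- (x - m) / (c_tune * s_mad + eps)
  w     <- ifelse(abs(u) <= 1, (1 - u^2)^2, 0) # zero weight for |u| > 1
  sum(w * x) / sum(w)
}

x <- c(1, 2, 3, 4, 5, 200, -100)
tukey_biweight(x)   # 3: the two outliers receive zero weight
mean(x)             # 115/7, about 16.4
```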

23 Signal Calculation $\log_2(\text{Signal}) = T_{bi}\{\log_2(PM_j - IM_j)\}$
IM = idealized mismatch, which is never greater than PM. If MM < PM, then IM = MM; if MM ≥ PM, a substitute IM is calculated from the specific background SB:
$SB = T_{bi}\{\log_2(PM_j) - \log_2(MM_j)\}$

24 Signal Calculation What is the Idealized Mismatch (IM)?
According to Affymetrix, the MM probe is included to provide a value that captures most of the cross-hybridization and stray signal affecting the PM probe. However, it also contains a portion of the true signal. If MM is less than PM, it can be used directly; if not, we calculate the IM. To do so, first calculate the specific background (SB) of the probe set from all of its probe pairs:

25 Specific Background Calculations
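Calculate $y_j = \log_2(PM_j / MM_j)$ for each probe pair $j$.
Find $m = \text{median}(y)$.
Calculate $\text{MAD} = \text{median}_j |y_j - m|$.
Define $u_j = (y_j - m)/(c \cdot \text{MAD} + \varepsilon)$ with $c = 5$ and $\varepsilon = 0.0001$.
Define $w_j = (1 - u_j^2)^2$ if $|u_j| \le 1$, and 0 otherwise.
$SB = T_{bi}(y) = \sum_j w_j y_j / \sum_j w_j$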

26 Example: Specific Background calculations

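27 Signal Calculation First we need to define the idealized mismatch for each probe pair $j$:
$IM_j = MM_j$ if $MM_j < PM_j$;
$IM_j = PM_j / 2^{SB}$ if $MM_j \ge PM_j$ and $SB > \tau$;
$IM_j = PM_j / 2^{\tau / (1 + (\tau - SB)/v)}$ if $MM_j \ge PM_j$ and $SB \le \tau$.
By default τ = 0.03 and v = 10. (This τ is a contrast threshold, distinct from the detection threshold of 0.015.)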
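A minimal sketch of the idealized-mismatch step, assuming the three-case rule from Affymetrix's MAS5 statistical algorithm: IM = MM when MM < PM; otherwise IM = PM / 2^SB when SB > τ, and IM = PM / 2^(τ / (1 + (τ - SB)/v)) when SB ≤ τ, with τ = 0.03 and v = 10 as on the slide. The helper name `idealized_mismatch` is hypothetical.

```r
# Hypothetical helper for the idealized mismatch of one probe set.
# pm, mm: probe-pair intensities; sb: the probe set's specific background.
idealized_mismatch <- function(pm, mm, sb, tau = 0.03, v = 10) {
  im_alt <- if (sb > tau) {
    pm / 2^sb                               # SB clearly positive: use it
  } else {
    pm / 2^(tau / (1 + (tau - sb) / v))     # exponent lies in (0, tau], so IM < PM
  }
  ifelse(mm < pm, mm, im_alt)               # keep MM wherever MM < PM
}

pm <- c(500, 300, 800)
mm <- c(120, 350, 900)
idealized_mismatch(pm, mm, sb = 1.5)    # MM kept for pair 1, PM / 2^1.5 for pairs 2 and 3
idealized_mismatch(pm, mm, sb = -0.2)   # pairs 2 and 3 use the small-SB branch
```

Either branch keeps IM strictly below PM, so PM - IM stays positive before the log is taken.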

28 Signal Calculation Contd
$V_{ij} = \max(PM_{ij} - IM_{ij}, d)$, where $d$ is a small positive constant.
$\log_2(\text{Signal}_i) = T_{bi}\{\log_2(V_{ij})\}$, and the reported signal is its antilog. Keep in mind that here we are SUBTRACTING IM from PM, not taking a ratio as we did for SB.

29 Example for Signal Calculation

30 Program for Signal Calculation for PM and MM data.
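Have the data in a csv file called signal1.csv, with columns PM and MM and one row per probe pair of a single probe set.
setwd("/myRfolder")
hwdata <- read.table("signal1.csv", header = TRUE, sep = ",", na.strings = " ")
# SB calculation: one-step Tukey biweight of log2(PM/MM)
p1 <- hwdata$PM
m1 <- hwdata$MM
y <- log2(p1 / m1)
md <- median(y)
mz <- median(abs(y - md))                  # MAD
u <- (y - md) / (5 * mz + 0.0001)
w <- ifelse(abs(u) <= 1, (1 - u^2)^2, 0)   # biweight weights
sb <- sum(y * w) / sum(w)
sb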

31 #signal calculation begins
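# idealized mismatch: MM where MM < PM, otherwise derived from SB
# (contrast tau = 0.03, scale v = 10)
im_alt <- if (sb > 0.03) p1 / 2^sb else p1 / 2^(0.03 / (1 + (0.03 - sb) / 10))
IM <- ifelse(m1 < p1, m1, im_alt)
v <- pmax(p1 - IM, 2^-20)                  # V = max(PM - IM, d); d = 2^-20 assumed
ys <- log2(v)
ms <- median(ys)
ss <- median(abs(ys - ms))                 # MAD of log2(V)
us <- (ys - ms) / (5 * ss + 0.0001)
ws <- ifelse(abs(us) <= 1, (1 - us^2)^2, 0)
tbs <- sum(ws * ys) / sum(ws)              # log2 signal (one-step biweight)
sgnl <- 2^tbs                              # signal = antilog
sgnl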

32 AFFY: SINGLE CHIP EXPRESSION

33 AFFY: TREATMENT COMPARISON: TWO CHIPS

