Finding associated genes in large collections of microarrays.

1 Finding associated genes in large collections of microarrays

2 Producing hypotheses of functional relations between genes
Positive correlation: co-regulated genes or a positive modulator.
Negative correlation: co-regulated genes or an inhibitor.
The associations are used to derive networks of gene interactions.

3 Four simple ways of finding associations
– Pearson correlation coefficient.
– Spearman's rank correlation coefficient.
– Probabilistic approach (Present/Absent calls).
– Mutual information (Present/Absent calls).

4 Pearson correlation coefficient
Varies between -1 and 1:
– 1 is perfect positive correlation; as a rule of thumb, values between 0.6 and 1 indicate strong positive correlation.
– -1 is perfect negative correlation; values between -0.6 and -1 indicate strong negative correlation.
Assumes a linear relation between the variables.

5 Pearson correlation coefficient
Step 1: Prepare data.
Step 2: Compute the Pearson coefficient between pairs of probes of interest.
Step 3: Assess significance.
Step 4: Multiple testing correction.

6 Pearson correlation coefficient
Step 1: Prepare data:
– Normalize the chips with MAS 5.0 or another procedure.
– Scale the probe values in each chip by dividing by the chip mean.
– Center and standardize each probe's distribution across chips (z-scores).
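A minimal sketch of Step 1, assuming the values are already MAS 5.0-normalized and stored as a list of chips (rows) by probes (columns). The function names are hypothetical, and the z-scores use the sample standard deviation.

```python
import statistics

def scale_chip(values):
    """Divide every probe value on one chip by that chip's mean."""
    mean = statistics.fmean(values)
    return [v / mean for v in values]

def zscores(values):
    """Center and standardize one probe across chips (sample SD).
    A probe with constant values would cause a division by zero."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def prepare(matrix):
    """matrix[c][p]: normalized value of probe p on chip c.
    Returns z[p][c]: z-score of probe p on chip c."""
    scaled = [scale_chip(chip) for chip in matrix]
    n_probes = len(scaled[0])
    return [zscores([chip[p] for chip in scaled]) for p in range(n_probes)]
```

Each returned row has mean 0 and sample standard deviation 1, which is what the z-score form of the Pearson coefficient in Step 2 relies on.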

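7 Pearson correlation coefficient
Step 2: Compute the Pearson coefficient between pairs of probes. When z-scores are pre-computed:
$$\rho_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} z_{x,i}\, z_{y,i}$$
n: number of chips; z_{x,i}, z_{y,i}: z-scores of probes x and y on chip i (computed with the sample standard deviation; with the population standard deviation the factor is 1/n).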
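With z-scores in hand, the coefficient is a scaled dot product. This sketch (hypothetical function names) divides by n - 1 because the z-scores come from `statistics.stdev`, the sample standard deviation.

```python
import statistics

def zscores(values):
    """Center and standardize one probe across chips (sample SD)."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def pearson_from_z(zx, zy):
    """Pearson coefficient from pre-computed sample z-scores of two probes."""
    n = len(zx)
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

zx = zscores([1.0, 2.0, 3.0, 4.0, 5.0])
zy = zscores([2.0, 4.0, 6.0, 8.0, 10.0])
rho = pearson_from_z(zx, zy)  # perfectly linear pair, so rho is 1 (up to rounding)
```

Pre-computing the z-scores once per probe makes each pairwise coefficient a single pass over the chips, which matters when many probe pairs are compared.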

8 Pearson correlation coefficient
Step 3: Assess significance:
– Use randomization (permutation) if possible; suitable when there are fewer than 20 chips, or
– Use Student's t-distribution with n-2 degrees of freedom:
$$t = \rho\sqrt{\frac{n-2}{1-\rho^{2}}}$$
ρ: correlation coefficient; n: number of chips.
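The t-test option can be sketched with the standard library alone. For an integer number of degrees of freedom, Student's t distribution has a closed-form finite series for P(|T| < t), so no special-function library is needed. The function names are hypothetical, and the final subtraction loses relative precision for extremely small p-values.

```python
import math

def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with integer df >= 1."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        # Odd df: A = (2/pi) * (theta + sin * (cos + (2/3)cos^3 + ...))
        series, term = 0.0, c
        for j in range((df - 1) // 2):
            series += term
            term *= c * c * (2 * j + 2) / (2 * j + 3)
        a = 2.0 / math.pi * (theta + s * series)
    else:
        # Even df: A = sin * (1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...)
        series, term = 0.0, 1.0
        for j in range(df // 2):
            series += term
            term *= c * c * (2 * j + 1) / (2 * j + 2)
        a = s * series
    # a = P(|T| < |t|); the two-sided p-value is its complement.
    return max(0.0, 1.0 - a)

def pearson_t_test(rho, n):
    """t statistic and two-sided p-value; needs n >= 3 and |rho| < 1."""
    df = n - 2
    t = rho * math.sqrt(df / (1.0 - rho * rho))
    return t, t_two_sided_p(t, df)
```

If SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` gives the same two-sided p-value.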

9 Pearson correlation coefficient
Step 4: Multiple testing correction.
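The slides do not say which correction to use. As an illustration only, this sketch shows two common choices: Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate and is less conservative when thousands of probe pairs are tested. Function names are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pair is then reported when its adjusted p-value falls below the chosen level (for example 0.05).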

10 Spearman's rank correlation coefficient
Non-parametric method:
– Less power, but more robust (e.g. to outliers).
– Does not assume a normal distribution.
Also varies between -1 and 1.

11 Spearman's rank correlation coefficient
Step 1: Prepare data.
Step 2: Compute Spearman's rank correlation coefficient between the probe of interest and the rest.
Step 3: Assess significance.
Step 4: Multiple testing correction.

12 Spearman's rank correlation coefficient
Step 1: Prepare data:
– Same as for Pearson.
– For each probe, order its values across chips by increasing hybridization value.
– Construct the rank vectors.
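Building a rank vector can be sketched as follows. The slides do not say how ties are handled; this sketch assumes the common convention of giving tied values their average rank. The function name is hypothetical.

```python
def ranks(values):
    """1-based ranks in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of equal values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

print(ranks([5, 1, 5, 3]))  # [3.5, 1.0, 3.5, 2.0]
```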

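13 Spearman's rank correlation coefficient
Step 2: Compute the coefficient between the probe sets of interest:
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)}$$
d_i: difference between the ranks of the two probes on chip i; n: number of chips. (This form assumes no tied ranks.)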
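The rank-difference formula translates directly. This sketch (hypothetical function names) assumes all values within a probe are distinct, which is the case the formula is exact for; with ties, computing the Pearson coefficient on average ranks is the usual alternative.

```python
def ranks(values):
    """1-based ranks in increasing order; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman_rho([1, 2, 3], [3, 2, 1]))  # -1.0
```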

14 Spearman's rank correlation coefficient
Step 3: Assess significance (same as Pearson):
– Use randomization (permutation) if possible; suitable when there are fewer than 20 chips, or
– Use Student's t-distribution with n-2 degrees of freedom:
$$t = \rho\sqrt{\frac{n-2}{1-\rho^{2}}}$$
ρ: correlation coefficient; n: number of chips.
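The randomization option can be sketched as a permutation test: shuffle one probe's values across chips, recompute the coefficient, and count how often |ρ| is at least as large as observed. The function names, the default of 10000 permutations, the fixed seed, and the add-one correction (which keeps the estimate above zero) are all assumptions of this sketch; ranks assume no ties.

```python
import random

def ranks(values):
    """1-based ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman_rho(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for Spearman's rho."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any chip-wise pairing between x and y
        if abs(spearman_rho(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The same function works for the Pearson coefficient by swapping in a Pearson routine; the resolution of the p-value is limited to about 1 / n_perm.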

15 Spearman's rank correlation coefficient
Step 4: Multiple testing correction.

16 Binary probabilistic approach based on Present/Absent calls
Approach adapted from: Claverie JM. Computational methods for the identification of differential and coordinated gene expression. Hum Mol Genet. 1999;8(10):1821-32.
Uses the MAS 5.0 Present/Marginal/Absent calls for each probe.
Good for heterogeneous microarray collections.

17 Binary approach based on Present/Absent
Step 1: Prepare data.
Step 2: Compute the p-value of the number of observed matches.
Step 3: Multiple testing correction.

18 Binary approach based on Present/Absent
Step 1: Obtain the P/M/A calls for the probes:
– Each call is associated with a p-value; a filter can be applied.
– Encode the P/M/A calls as binary vectors: P as 1, M/A as 0.
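A minimal sketch of the encoding, assuming the calls arrive as the strings "P", "M" and "A". The function name and the `marginal_as_present` flag are hypothetical; the flag also covers the alternative encoding (P/M as 1) used by the mutual information approach. The optional p-value filter is not modeled here.

```python
def encode_calls(calls, marginal_as_present=False):
    """Map MAS 5.0 detection calls to a 0/1 vector.
    Default: P -> 1, M/A -> 0. With marginal_as_present=True: P/M -> 1, A -> 0."""
    present = {"P", "M"} if marginal_as_present else {"P"}
    out = []
    for call in calls:
        if call not in ("P", "M", "A"):
            raise ValueError(f"unexpected call: {call!r}")
        out.append(1 if call in present else 0)
    return out

print(encode_calls(["P", "M", "A", "P"]))  # [1, 0, 0, 1]
```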

19 Binary approach based on Present/Absent
Step 2: Compute the p-value of the number of matches.
probe x: 1 1 0 0 0 1 1 0 1 0 0 0
probe y: 1 1 0 0 0 0 1 0 1 0 0 0
probe z: 0 0 1 1 1 1 0 0 0 1 1 1
Find an improbably high number of matches (or mismatches):
probe x & y: 11 out of 12 matches.
probe x & z: 10 out of 12 mismatches.
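Counting agreements between two binary vectors is a one-liner; this sketch (hypothetical function name) applies it to the three example vectors.

```python
def count_matches(a, b):
    """Number of chips on which two binary probe vectors agree."""
    if len(a) != len(b):
        raise ValueError("vectors must cover the same chips")
    return sum(1 for u, v in zip(a, b) if u == v)

x = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
y = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
z = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
n = len(x)

print("x & y matches:", count_matches(x, y))         # 11
print("x & z mismatches:", n - count_matches(x, z))  # 10
print("y & z mismatches:", n - count_matches(y, z))  # 11
```

For these vectors, x and z disagree on 10 of the 12 chips, and y and z on 11 of 12.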

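20 Binary approach based on Present/Absent
Step 2: Compute the probability of observing by chance k or more matches (k: observed number of matches) from the binomial distribution B(n, p).
First, the probability of a match, assuming the two probes are independent:
$$p = p_x\,p_y + (1-p_x)(1-p_y)$$
p_x: fraction of 1s (Present) for probe x; p_y: fraction of 1s (Present) for probe y.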
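Under independence, a match occurs when both probes are 1 or both are 0. A minimal sketch (hypothetical function name), applied to the example probes x and y:

```python
def match_probability(a, b):
    """P(match) = px*py + (1 - px)*(1 - py), with px, py the fractions of 1s."""
    n = len(a)
    px = sum(a) / n
    py = sum(b) / n
    return px * py + (1 - px) * (1 - py)

x = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]  # px = 5/12
y = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]  # py = 4/12
p = match_probability(x, y)  # (5*4 + 7*8) / 144 = 76/144, about 0.528
```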

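21 Binary approach based on Present/Absent
Step 2: Compute the probability of observing by chance k or more matches (k: observed number of matches) from the binomial distribution:
$$P(K \ge k) = \sum_{i=k}^{n} \binom{n}{i} p^{i} (1-p)^{n-i}$$
For large n one can use the normal approximation:
$$z = \frac{k - np}{\sqrt{np(1-p)}}$$
n: number of chips.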
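Both tail probabilities can be sketched with the standard library (`math.comb` needs Python 3.8 or later). Function names are hypothetical; the exact sum can overflow a float for very large n, which is where the normal approximation is useful.

```python
import math

def binomial_upper_tail(k, n, p):
    """Exact P(K >= k) for K ~ B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def normal_upper_tail(k, n, p):
    """Normal approximation to P(K >= k), using z = (k - np) / sqrt(np(1 - p))."""
    z = (k - n * p) / math.sqrt(n * p * (1 - p))
    return 0.5 * math.erfc(z / math.sqrt(2))

# Probes x and y from the example: 11 matches out of 12, p = 76/144.
p_value = binomial_upper_tail(11, 12, 76 / 144)
```

For an improbably high number of mismatches, the same functions apply to the mismatch count with probability 1 - p. Using k - 0.5 in place of k (a continuity correction) usually brings the normal approximation closer to the exact tail.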

22 Binary approach based on Present/Absent
Step 3: Multiple testing correction.

23 Mutual information based on Present/Absent
Step 1: Prepare data.
Step 2: Compute the MI value for pairs of probes.
Step 3: Apply a threshold to the MI.

24 Mutual information based on Present/Absent
Step 1: Obtain the P/M/A calls for the probes:
– Each call is associated with a p-value; a filter can be applied.
– Encode the P/M/A calls as binary vectors: either P/M as 1 and A as 0, or P as 1 and M/A as 0.

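25 Mutual information based on Present/Absent
Step 2: Compute the MI value for probes X and Y:
$$\mathrm{MI}(X,Y) = \sum_{x\in\{0,1\}}\sum_{y\in\{0,1\}} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$
p(x), p(y): observed frequencies of Present (1) and Absent (0) calls for each probe; p(x,y): frequencies of the joint distribution (terms with p(x,y) = 0 contribute 0).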
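A minimal sketch of the MI computation from observed frequencies. The slides do not fix the logarithm base; this sketch assumes the natural logarithm, and the threshold must then use the same base. The function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """MI (natural log) between two binary vectors, from observed frequencies.
    Only joint cells that actually occur are summed (0 * log 0 = 0)."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (u, v), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((pa[u] / n) * (pb[v] / n)))
    return mi

print(mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0 (independent)
```

For two binary vectors the MI is at most log 2, reached when one vector determines the other and both are balanced.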

26 Mutual information based on Present/Absent
Step 3: Use a threshold: probes X and Y are considered associated if
$$\mathrm{MI}(X,Y) > \frac{1}{n}\log\frac{1}{P}$$
n: number of chips; P = 1/p^2, with p the number of probes.
Reference: Andrecut M, Kauffman SA. A simple method for reverse engineering causal networks. J Phys A: Math Gen. 39(46).
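The threshold is a one-line computation. This sketch assumes the natural logarithm (the MI must be computed with the same base); the function names are hypothetical.

```python
import math

def mi_threshold(n_chips, n_probes):
    """Threshold (1/n) * log(1/P) with P = 1/p^2, p = number of probes."""
    P = 1.0 / n_probes ** 2
    return math.log(1.0 / P) / n_chips

def is_associated(mi, n_chips, n_probes):
    return mi > mi_threshold(n_chips, n_probes)
```

Since the MI of two binary vectors cannot exceed log 2 (about 0.693 in natural log), small collections may make the threshold unreachable: 12 chips and 1000 probes give ln(10^6)/12, about 1.15, whereas 100 chips give about 0.138.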

27 Try the Pearson method in Stembase! (Implemented by Reatha Sandie.)