Finding associated genes in large collections of microarrays


1 Finding associated genes in large collections of microarrays

2 Produce hypothesis of functional relations between genes
Positive correlation: co-regulated genes or a positive modulator. Negative correlation: co-regulated genes or an inhibitor. Used to derive networks of gene interactions.

3 Four simple ways of finding association
Pearson correlation coefficient. Spearman’s rank correlation coefficient. Probabilistic approach (Present/Absent). Mutual information (Present/Absent).

4 Pearson correlation coefficient
Varies between -1 and 1. Between 0.6 and 1: strong positive correlation. Between -0.6 and -1: strong negative correlation. -1 is perfect negative correlation; 1 is perfect positive correlation. Assumes a linear relation between variables.

5 Pearson correlation coefficient
Step 1: Prepare data. Step 2: Compute Pearson coefficient between pairs of probes of interest. Step 3: Assess significance. Step 4: Multiple testing correction.

6 Pearson correlation coefficient
Step 1: Prepare data: Chips are normalized with MAS 5.0 or another procedure. Scale the probes in each chip by dividing by the chip mean. Center and standardize each probe's distribution across chips: z-scores.
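The preparation step can be sketched in Python as follows. This is only an illustration under stated assumptions: the chips are taken to be already normalized (for example by MAS 5.0), "mean" is read as the mean of all probe values on a chip, and the function names and toy values are hypothetical.

```python
from statistics import mean, stdev

def scale_chip(chip):
    """Divide every probe value on one chip by that chip's mean."""
    m = mean(chip)
    return [v / m for v in chip]

def z_scores(values):
    """Center and standardize one probe's values across chips."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]

# Hypothetical toy data: chips[c][p] is the signal of probe p on chip c.
chips = [
    [100.0, 200.0, 300.0],
    [50.0, 150.0, 100.0],
    [80.0, 120.0, 400.0],
]

scaled = [scale_chip(c) for c in chips]
# Transpose so that each row holds one probe across all chips.
probes = [list(col) for col in zip(*scaled)]
z = [z_scores(p) for p in probes]
```

After this step every chip has mean 1 and every probe has mean 0 and standard deviation 1 across chips.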

7 Pearson correlation coefficient
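Step 2: Compute Pearson coefficient between pairs of probes. When z-scores are pre-computed (using the sample standard deviation), the coefficient is the sum of the products of the two probes' z-scores divided by n-1: $r_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} z_{x,i} z_{y,i}$. n: number of chips.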
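The computation from pre-computed z-scores can be sketched as below. It assumes the z-scores use the sample standard deviation, so the sum of products is divided by n - 1; the function names and the toy probes are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize one probe across chips (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_from_z(zx, zy):
    """Pearson coefficient from two pre-computed z-score vectors."""
    n = len(zx)
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Hypothetical probes measured on n = 5 chips.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises linearly with x
w = [5.0, 4.0, 3.0, 2.0, 1.0]    # falls linearly with x

r_xy = pearson_from_z(z_scores(x), z_scores(y))  # close to +1
r_xw = pearson_from_z(z_scores(x), z_scores(w))  # close to -1
```

Pre-computing the z-scores once per probe makes each pairwise coefficient a single dot product, which matters when many probe pairs are compared.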

8 Pearson correlation coefficient
Step 3: Assess significance: Randomize if possible (good for fewer than 20 chips), or use Student's t distribution with n-2 degrees of freedom: $t = \rho \sqrt{\frac{n-2}{1-\rho^2}}$. ρ: correlation coefficient. n: number of chips.
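Both ways of assessing significance can be sketched as follows. The randomization is implemented here as a two-sided permutation test that shuffles one probe across chips, which is one plausible reading of "randomize"; the number of permutations, the seed, the function names, and the data are all assumptions made for illustration.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value with n - 2 degrees of freedom; undefined for |r| = 1."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value obtained by shuffling y across chips."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical probes on 8 chips.
x = [1.2, 2.3, 2.9, 4.1, 5.0, 6.2, 6.8, 8.1]
y = [1.0, 2.7, 2.5, 4.4, 4.8, 6.5, 7.1, 7.7]

r = pearson(x, y)
t = t_statistic(r, len(x))
p_perm = permutation_p_value(x, y)
# A p-value for t would come from a t distribution with n - 2 degrees
# of freedom, e.g. scipy.stats.t.sf if SciPy is available.
```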

9 Pearson correlation coefficient
Step 4: Multiple testing correction
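The slides do not say which correction is used. As a sketch only, two common choices are shown here: Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate. The function names and p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from five probe pairs.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj_bonf = bonferroni(raw)
adj_bh = benjamini_hochberg(raw)
```

With thousands of probes the number of pairs grows quadratically, so the less conservative false discovery rate control is often the more practical of the two.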

10 Spearman’s rank correlation coefficient
Non-parametric method: less power but more robust. Does not assume a normal distribution. Also varies between -1 and 1.

11 Spearman’s rank correlation coefficient
Step 1: Prepare data. Step 2: Compute Spearman’s rank correlation coefficient between probe of interest and the rest. Step 3: Assess significance. Step 4: Multiple test correction.

12 Spearman’s rank correlation coefficient
Step 1: Prepare data: Same as Pearson. Order the values of the probes by increasing hybridization values. Construct the rank vectors.

13 Spearman’s rank correlation coefficient
Step 2: Compute coefficient between probe sets of interest: $\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2-1)}$. d: differences between the ranks of the two probes. n: number of chips.
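Ranking and the rank-difference formula can be sketched together as below. The sketch assumes there are no tied values, which is the case where the formula based on squared rank differences is exact; the function names and data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman coefficient from squared rank differences."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical probes on 5 chips.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [1.0, 8.0, 27.0, 64.0, 125.0]   # monotonic but not linear in x
v = [5.0, 3.0, 4.0, 1.0, 2.0]

rho_xy = spearman(x, y)  # 1.0: the ranks agree exactly
rho_xv = spearman(x, v)  # -0.8
```

The pair x, y shows why the rank-based coefficient is more robust: the relation is not linear, yet the coefficient is still 1 because only the ordering matters.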

14 Spearman’s rank correlation coefficient
Step 3: Assess significance: Same as Pearson. Randomize if possible (fewer than 20 chips), or use Student's t distribution with n-2 degrees of freedom: $t = \rho \sqrt{\frac{n-2}{1-\rho^2}}$. ρ: correlation coefficient. n: number of chips.

15 Spearman’s rank correlation coefficient
Step 4: Multiple testing correction.

16 Binary probabilistic approach based on Present/Absent
Approach adapted from: “Computational methods for the identification of differential and coordinated gene expression.” Claverie JM Hum Mol Genet. 1999;8(10): Use MAS 5.0 calls of Present-Marginal-Absent for each probe. Good for heterogeneous microarray collections.

17 Binary approach based on Present/Absent
Step 1: Prepare data. Step 2: Compute p-value of # of observed matches. Step 3: Multiple test correction.

18 Binary approach based on Present/Absent
Step 1: Obtain P/M/A calls for probes: Each call is associated with a p-value. A filter can be applied. Encode P/M/A calls as binary vectors: P as 1 and M/A as 0.

19 Binary approach based on Present/Absent
Step 2: Compute p-value of # of matches: Compare the binary vectors of probe x, probe y and probe z. Find an improbably high number of matches (or mismatches). probe x & y: 11 out of 12 matches. probe x & z: 11 out of 12 mismatches.

20 Binary approach based on Present/Absent
Step 2: Compute the probability of observing by chance x matches or more from the binomial distribution B(n,p). First, the probability of a match: $p = p_x p_y + (1 - p_x)(1 - p_y)$. $p_x$: fraction of 1s (Present) in probe x. $p_y$: fraction of 1s (Present) in probe y.

21 Binary approach based on Present/Absent
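Step 2: Compute the probability of observing by chance x matches or more from the binomial distribution: $P(X \ge x) = \sum_{k=x}^{n} \binom{n}{k} p^k (1-p)^{n-k}$. For large n one can use the normal distribution: $z = \frac{x - np}{\sqrt{np(1-p)}}$. n: number of chips.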
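The match-counting step can be sketched as follows. The binary vectors are hypothetical stand-ins chosen to give 11 matches out of 12, since the vectors on the original slide are not part of this transcript; the function names are also illustrative, and the normal tail is written without a continuity correction.

```python
import math

def match_probability(x, y):
    """Chance that two independent probes agree on one chip."""
    n = len(x)
    px = sum(x) / n  # fraction of 1s (Present) in probe x
    py = sum(y) / n  # fraction of 1s (Present) in probe y
    return px * py + (1 - px) * (1 - py)

def binomial_tail(k, n, p):
    """P(at least k matches) under the binomial distribution B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def normal_tail(k, n, p):
    """Normal approximation of the same tail, intended for large n."""
    z = (k - n * p) / math.sqrt(n * p * (1 - p))
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical binary calls on 12 chips; only the last chip differs.
x = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
y = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0]

matches = sum(a == b for a, b in zip(x, y))      # 11
p_match = match_probability(x, y)                # 0.5
p_value = binomial_tail(matches, len(x), p_match)
```

An improbably high number of mismatches can be tested the same way by counting mismatches and using 1 - p as the per-chip probability.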

22 Binary approach based on Present/Absent
Step 3: Multiple test correction.

23 Mutual information based on Present/Absent
Step 1: Prepare data. Step 2: Compute MI value for pairs of probes. Step 3: Use of a threshold for MI

24 Mutual information based on Present/Absent
Step 1: Obtain P/M/A calls for probes: Each call is associated with a p-value. A filter can be applied. Encode P/M/A calls as binary vectors: either P/M as 1 and A as 0, or P as 1 and M/A as 0.
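The two encodings can be sketched with a single helper. The function name, its flag, and the example calls are hypothetical; any filtering on the detection p-value is left out.

```python
def encode_calls(calls, marginal_as_present=False):
    """Turn P/M/A detection calls into a binary vector.

    By default only P maps to 1; with marginal_as_present=True
    both P and M map to 1.
    """
    present = {"P", "M"} if marginal_as_present else {"P"}
    return [1 if call in present else 0 for call in calls]

# Hypothetical calls for one probe on six chips.
calls = ["P", "M", "A", "P", "A", "M"]

strict = encode_calls(calls)                             # P -> 1, M/A -> 0
lenient = encode_calls(calls, marginal_as_present=True)  # P/M -> 1, A -> 0
```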

25 Mutual information based on Present/Absent
Step 2: Compute MI value for probes X and Y: $MI(X, Y) = \sum_{x} \sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$. p(.): frequencies of observed Ps and As. p(x,y): frequencies of the joint distribution.
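Mutual information between two binary probes can be sketched from observed frequencies as below. The natural logarithm is an assumption (the slides do not state the base), and the function name and vectors are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in nats) of two equally long binary vectors."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    # Only observed (x, y) combinations contribute; unobserved ones
    # have p(x, y) = 0 and add nothing to the sum.
    for (a, b), c in count_xy.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi

# Hypothetical Present/Absent vectors on 8 chips.
x = [1, 0, 1, 0, 1, 0, 1, 0]
y = [1, 1, 0, 0, 1, 1, 0, 0]

mi_same = mutual_information(x, x)   # log(2): x fully determines itself
mi_indep = mutual_information(x, y)  # 0: every combination equally frequent
```

Unlike a correlation coefficient, the value carries no sign: a probe and its exact complement also reach the maximum of log 2.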

26 Mutual information based on Present/Absent
Step 3: Use a threshold: probes X and Y are correlated if $MI(X, Y) > \frac{1}{n} \log \frac{1}{P}$. n: number of chips. P: $1/p^2$ (with p the number of probes). “A simple method for reverse engineering causal networks.” M. Andrecut and S. A. Kauffman, J. Phys. A: Math. Gen. 39 No 46.
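The threshold as written on the slide can be sketched directly. The natural logarithm is an assumption, chosen so that the threshold is in the same units as an MI value computed in nats; the function name and the example sizes are hypothetical.

```python
import math

def mi_threshold(n_chips, n_probes):
    """Threshold (1/n) * log(1/P) with P = 1 / p^2, p = number of probes."""
    big_p = 1.0 / n_probes ** 2
    return math.log(1.0 / big_p) / n_chips

# Hypothetical collection: 100 chips, 1000 probes.
threshold = mi_threshold(100, 1000)   # about 0.138 nats

def is_associated(mi_value, n_chips, n_probes):
    """Apply the rule MI(X, Y) > threshold."""
    return mi_value > mi_threshold(n_chips, n_probes)
```

Because the MI of two binary vectors cannot exceed log 2 (about 0.693 nats), the rule can only flag pairs when the collection has enough chips relative to the number of probes.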

27 Try Pearson method in Stembase!
Implemented by Reatha Sandie

