
Published by Melinda Curtis. Modified over 4 years ago.

1
Genomic Profiles of Brain Tissue in Humans and Chimpanzees II
Naomi Altman
Oct 06

2
SAM
Significance Analysis of Microarrays (SAM) is a popular method for differential expression analysis, freely available from www-stat.stanford.edu/~tibs. It uses permutation-based tests and supports some common models, including paired and unpaired t-tests, one-way ANOVA, and some simple block designs. It also offers some other analyses. The data must be normalized in advance. The analysis itself allows no missing data. However, SAM includes a method to "fill in" (impute) missing values beforehand, assuming they are missing at random and sparse.
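The slide does not say how SAM imputes missing values. The sketch below assumes a k-nearest-neighbour scheme, which is a common approach for expression data. A missing entry is replaced by the average of that column over the k genes whose observed values are closest. The function name `knn_impute`, the distance measure, and the default k are hypothetical choices for illustration, and SAM's own imputer may differ.

```python
import math

def knn_impute(rows, k=2):
    """Hypothetical k-nearest-neighbour imputer (not SAM's code).

    rows[g] is the list of values for gene g; None marks a missing value.
    The distance between two genes is the root-mean-square difference over
    the columns observed in both. Each missing entry becomes the average of
    that column over the k nearest genes that have it observed.
    """
    filled = []
    for i, row in enumerate(rows):
        if None not in row:
            filled.append(list(row))
            continue
        neighbours = []
        for m, other in enumerate(rows):
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if m == i or not shared:
                continue
            dist = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            neighbours.append((dist, other))
        neighbours.sort(key=lambda pair: pair[0])  # nearest genes first
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                donors = [other[j] for _, other in neighbours if other[j] is not None][:k]
                new_row[j] = sum(donors) / len(donors) if donors else None
        filled.append(new_row)
    return filled

rows = [[1.0, 2.0, None], [1.1, 2.1, 3.0], [0.9, 1.9, 5.0], [10.0, 20.0, 30.0]]
out = knn_impute(rows, k=2)
```

In this toy example, the two genes closest to the first gene have 3.0 and 5.0 in the missing column, so the gap is filled with 4.0. The distant fourth gene is ignored.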

3
SAM
SAM can be run from Excel through an interface that sends data to and from R. The R package that does the computations is samr. I will demonstrate the Excel interface, which is the more popular way to run SAM.

4
SAM
Like Limma, SAM starts by computing a test statistic for each gene. SAM uses a regularized denominator. The test statistic is based on a paired or two-sample t-test, or an ANOVA F-test, but a small constant s0, computed from all the data, is added to each gene's within-treatment standard error. The variance of each gene is assumed to be the same for all treatments.
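For the two-sample case, the difference between the usual t-statistic and the moderated statistic d = (mean1 - mean2) / (s_i + s0) can be seen in a small sketch. Here s_i is the pooled standard error of gene i. SAM chooses s0 with its own procedure, which is not reproduced here. Purely as an assumption for illustration, s0 is set to the median of the s_i. The function names and the two toy genes are hypothetical.

```python
import math
from statistics import mean, median

def standard_error(x, y):
    """Pooled (equal-variance) standard error of mean(x) - mean(y)."""
    n1, n2 = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    return math.sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))

def usual_and_moderated(group1, group2):
    """group1[g] and group2[g] hold gene g's values under the two treatments.

    Returns (t, d): ordinary t-statistics and moderated d = diff / (s_i + s0).
    """
    diffs = [mean(x) - mean(y) for x, y in zip(group1, group2)]
    s = [standard_error(x, y) for x, y in zip(group1, group2)]
    s0 = median(s)  # ASSUMPTION: illustrative choice, not SAM's own s0 procedure
    t = [dif / si for dif, si in zip(diffs, s)]
    d = [dif / (si + s0) for dif, si in zip(diffs, s)]
    return t, d

# Gene A: tiny difference with tiny variance. Gene B: large difference, ordinary variance.
group1 = [[5.00, 5.01, 4.99], [7.0, 8.0, 9.0]]
group2 = [[4.90, 4.91, 4.89], [5.0, 6.0, 7.0]]
t, d = usual_and_moderated(group1, group2)
```

The two statistics rank the genes in opposite orders:

- The usual t ranks gene A first (about 12.2 versus 2.4), because its variance is tiny.
- With s0 added to the denominator, d ranks gene B first (about 0.24 versus 1.63).

Keeping tiny variances from dominating the gene list is the purpose of the regularized denominator.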

5
Figure: usual vs. moderated test statistics for the 2-sample, paired, and ANOVA designs.

6
s0
s0 is computed from the values of s_i for all the genes, using an ad hoc procedure based on simulations.

7
Selecting the Significant Genes
SAM uses a quantile-quantile plot of the observed test statistics versus the expected quantiles of the null distribution. Observations far enough off the identity line are considered detections. The FDR is estimated from the proportion of values from the permuted (randomized) data sets that would also have been "detected".


9
Example for Random Normals
We sort the data into y(1) < y(2) < ... < y(n). Then y(i) has a sampling distribution with mean nz(i), the ith normal score. We plot y(i) versus nz(i). If the data are standard normal, N(0,1), the points should lie close to the line y = x. (Note that for N(μ, σ²) data we often plot against the normal scores for N(0,1). The points should then lie close to the line y = μ + σx.)
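A small sketch of this normal Q-Q construction follows. It assumes Blom's approximation Φ⁻¹((i − 3/8)/(n + 1/4)) for the normal scores nz(i); the slide does not say how the scores are computed, and the exact expected order statistics differ slightly. It simulates N(μ, σ²) data with μ = 10 and σ = 2, then fits a least-squares line through the points (nz(i), y(i)). The intercept should come out near μ and the slope near σ.

```python
import random
from statistics import NormalDist, mean

def normal_scores(n):
    """Approximate expected N(0,1) order statistics via Blom's formula."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def qq_line(data):
    """Least-squares line through the Q-Q points (nz(i), y(i))."""
    y = sorted(data)
    nz = normal_scores(len(y))
    mx, my = mean(nz), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(nz, y))
             / sum((a - mx) ** 2 for a in nz))
    intercept = my - slope * mx
    return intercept, slope

rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(2000)]  # simulated N(10, 2^2) data
intercept, slope = qq_line(sample)  # expect roughly (10, 2), i.e. the line y = 10 + 2x
```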


11
Selecting the Significant Genes
SAM computes a test statistic D_i for the ith gene and sorts these into D(1) < D(2) < ... < D(G). Then the sample labels are permuted. For each permutation, the statistics are recomputed and their sorted values D*(1) < D*(2) < ... < D*(G) are saved. These sorted values are averaged over the permutations to obtain the x-axis of the plot (call these the DN scores). For each permutation, the distances dist(i) = |D*(i) - DN(i)| are also recorded. The median, over permutations, of the number of values with dist(i) > K is the estimate of the expected number of false discoveries at distance K.
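The permutation steps above can be sketched as follows. The sketch makes three assumptions:

- It uses a two-sample moderated statistic with a fixed, hypothetical s0 = 0.1 rather than SAM's own s0.
- It uses random relabellings rather than a complete enumeration.
- It draws the data from pure noise.

The function names are illustrative and are not samr's API.

```python
import random
from statistics import mean, median

def gene_stats(data, labels, s0=0.1):
    """Moderated two-sample statistic for every gene (row) of `data`.

    labels[j] is 1 or 2, the treatment of column j. s0 is a hypothetical fixed value.
    """
    stats = []
    for row in data:
        x = [v for v, lab in zip(row, labels) if lab == 1]
        y = [v for v, lab in zip(row, labels) if lab == 2]
        mx, my = mean(x), mean(y)
        ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
        se = (ss / (len(x) + len(y) - 2) * (1 / len(x) + 1 / len(y))) ** 0.5
        stats.append((mx - my) / (se + s0))
    return stats

def permutation_null(data, labels, n_perm=100, seed=0):
    """Sorted statistics D*(1..G) for each relabelling, and their average (DN scores)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # the same relabelling is applied to every gene
        perms.append(sorted(gene_stats(data, shuffled)))
    dn = [mean(col) for col in zip(*perms)]  # average of the i-th sorted value
    return perms, dn

def expected_false_discoveries(perms, dn, k):
    """Median over permutations of #{i : |D*(i) - DN(i)| > K}."""
    counts = [sum(abs(d - e) > k for d, e in zip(p, dn)) for p in perms]
    return median(counts)

rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]  # 200 null genes
labels = [1, 1, 1, 2, 2, 2]
perms, dn = permutation_null(data, labels, n_perm=50)
v = expected_false_discoveries(perms, dn, k=1.0)
```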

12
Selecting the Significant Genes
SAM computes a test statistic D_i for the ith gene. The user selects a distance K. SAM then computes R, the number of genes detected at that distance, and estimates V, the expected number of false discoveries at that distance. The FDR estimate is V/R.
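Once V has been estimated, the FDR at a chosen distance K is a simple ratio. In the sketch below, R counts the observed sorted statistics lying more than K from their DN scores. The numbers are toy values, not real data. Returning 0 when nothing is detected is a convention assumed here. The function name is hypothetical.

```python
def estimate_fdr(observed_sorted, dn, k, v):
    """Return (R, estimated FDR = V / R) at distance K.

    observed_sorted: sorted observed statistics D(1) <= ... <= D(G)
    dn: DN scores (average sorted statistics over the permutations)
    v: estimated expected number of false discoveries at distance K
    """
    r = sum(abs(d - e) > k for d, e in zip(observed_sorted, dn))
    fdr = v / r if r > 0 else 0.0  # ASSUMED convention when no genes are detected
    return r, fdr

# Toy values: the first and last genes are more than K = 1 from the line.
r, fdr = estimate_fdr([-3.0, -0.5, 0.0, 0.4, 2.5], [-1.2, -0.5, 0.0, 0.5, 1.2], k=1.0, v=0.5)
# r == 2, fdr == 0.25
```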

13
Example for Random Normals
If this is the plot for the data, the points indicated are the discoveries. For each permuted data set we also compute the number of discoveries, and these counts give the estimate of V.

14
Running SAM
1. Write the normalized data to a file that Excel can read (tab- or comma-delimited).
2. Start Excel. The first 2 columns should be gene IDs. The first row holds the numbers 1, ..., T giving the treatments.
3. Select the rows and columns of the spreadsheet that you want to analyze.
4. Click SAM in the GUI. Select the type of analysis, the random seed, and the number of permutations.
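Step 1 can be scripted so that the file already has the layout described in step 2:

- two gene-ID columns;
- a first row of treatment numbers over the data columns.

In the sketch below, the blank cells above the two ID columns are an assumption, since the slide does not say what goes there. The function name and the gene IDs are hypothetical.

```python
import csv
import io

def write_sam_input(handle, gene_ids, gene_names, treatments, values):
    """Write tab-delimited data: 2 ID columns, first row = treatment numbers.

    ASSUMPTION: the first two cells of the header row are left blank.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["", ""] + list(treatments))
    for gid, name, row in zip(gene_ids, gene_names, values):
        writer.writerow([gid, name] + list(row))

buf = io.StringIO()  # in practice, open("normalized.txt", "w", newline="")
write_sam_input(buf, ["g1", "g2"], ["ACTB", "GAPDH"], [1, 1, 2, 2],
                [[5.1, 5.3, 6.0, 6.2], [7.4, 7.2, 7.3, 7.5]])
lines = buf.getvalue().splitlines()
```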

15
Running SAM
5. The SAM Q-Q plot comes up. Select a distance, or use the slider, to assess the FDR.
6. Print the gene list. The contrasts are:

16
Limma vs SAM
Limma: model-based; can handle small numbers of replicates; handles ANOVA-type problems, including 1 random effect; handles missing data; produces a gene list and CIs; can determine the significance of any linear contrast; hard to use.
SAM: nonparametric; cannot handle small numbers of replicates; handles limited ANOVA-type problems and survival; "imputes" missing data; produces only a gene list; only determines the significance of deviation from the mean; easy to use.
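The following is a gloss, not stated on the slide. One reason a permutation method struggles with few replicates is that there are very few distinct relabellings to build the null distribution from. For a two-group design with n1 and n2 arrays there are C(n1 + n2, n1) ways to assign the labels. For a paired design, permuting within pairs amounts to flipping the sign of each difference, giving 2^n possibilities. The function names below are hypothetical.

```python
from math import comb

def n_relabellings(n1, n2):
    """Number of ways to assign the two treatment labels (original labelling included)."""
    return comb(n1 + n2, n1)

def n_sign_flips(n_pairs):
    """Number of within-pair relabellings for a paired design."""
    return 2 ** n_pairs

for n in (2, 3, 4, 5):
    print(n, n_relabellings(n, n), n_sign_flips(n))
```

With 3 arrays per group there are only 20 relabellings (the observed one among them). The permutation null is therefore very coarse compared with the thousands of genes being tested.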
