New Proposals for Multiple Test Procedures, Applied to Gene Expression Array Data. Siegfried Kropf, Otto von Guericke University Magdeburg.

Presentation transcript:

New Proposals for Multiple Test Procedures, Applied to Gene Expression Array Data
Siegfried Kropf, Otto von Guericke University Magdeburg
in cooperation with Jürgen Läuter (Magdeburg), Peter H. Westfall (Lubbock, USA), and Markus Eszlinger and Knut Krohn (Leipzig)
IZBI-Workshop, November 11, 2002, Leipzig


Example data – gene expression arrays
Example 1: Atlas Human Cancer 1.2 Array with 6 quadrants of 98 genes each (double spotted), i.e. 588 genes plus housekeeping genes.

The array was applied to 6 patients with nodules in the thyroid gland (3 hot, 3 cold, not distinguished here), each with nodule and surrounding tissue.
Preprocessing: logarithmic transformation, double spots averaged, correction with the housekeeping gene at position i5a.
→ The distribution can hardly be checked with n = 6; the standard deviations of the genes are not too different.

Example 2
30 patients with nodules in the thyroid: 15 hot nodules, 15 cold nodules.
Tissue samples of nodules and surrounding tissue were analysed with Affymetrix® Gene Chips.
Signal log ratio of nodule vs. surrounding from each patient for each of the genes.
Outliers are caught by an additional logistic transformation.
The data are approximately multivariate normal, with "similar" variances for all genes and expectation 0 for unaffected genes.

Why familywise error rate?
Discussion: non-statistical assessment – unadjusted statistical assessment – false discovery rate – familywise error rate (FWE).
The familywise error rate is a rather demanding criterion, and it becomes more demanding as the dimension of the array grows (in contrast to the false discovery rate).
Where it can be met, however, it gives the highest degree of security for the positive results of this one trial.
Trials mostly have small or moderate sample sizes, which are not enough to rule out effects in case of non-significance; therefore at least the positive results should be as sure as possible.
Results for the FWE could at least be given in addition to other versions.

Procedure with data-driven ordering of hypotheses
Starting point: two well-known multiple comparison procedures (MCPs) controlling the FWE:
– testing with a-priori ordered hypotheses (without α-adjustment),
– Bonferroni-Holm (data-dependent order, with α-adjustment).
Neither is applicable or optimal in the analysis of high-dimensional gene expression arrays.
→ We are looking for a method with data-dependent ordering of hypotheses but without α-adjustment.
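The two starting-point procedures can be sketched as follows. This is a minimal illustration in Python, assuming the unadjusted P-values are already available as a list; the function names `fixed_sequence` and `holm` are chosen here for illustration and do not come from the slides.

```python
def fixed_sequence(p_values, alpha=0.05):
    """Test hypotheses in their given a-priori order, each at the full level alpha.

    Returns the indices of the rejected hypotheses; testing stops at the
    first P-value that exceeds alpha.
    """
    rejected = []
    for i, p in enumerate(p_values):
        if p > alpha:
            break
        rejected.append(i)
    return rejected


def holm(p_values, alpha=0.05):
    """Bonferroni-Holm step-down procedure; returns the rejected indices."""
    m = len(p_values)
    # Data-dependent order: smallest P-value first.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for step, i in enumerate(order):
        # The (step + 1)-th smallest P-value is compared with alpha / (m - step).
        if p_values[i] > alpha / (m - step):
            break
        rejected.append(i)
    return rejected
```

With thousands of genes, the first procedure needs an a-priori order that is rarely available, and the second divides α by a very large number of hypotheses.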

New proposal (Kropf, 2000; Kropf and Läuter, 2002)
Consider the one-sample situation first: an n × p data matrix X = (x_ki) from n iid p-dimensional normal data vectors with expectation vector μ = (μ_1, …, μ_p).
Aim: test the local hypotheses H_i: μ_i = 0 (i = 1, …, p) with strong control of the FWE at level α.
Procedure I: sort the variables by decreasing values of the sums of squares w_ii = Σ_k x_ki² (k = 1, …, n) and, in that order, carry out the unadjusted one-sample t tests for the variables as long as significance is attained.
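A minimal sketch of Procedure I in plain Python, under the following assumptions: the data are given as a list of n rows with p values each, no variable is constant, and the caller supplies the two-sided critical value of the t distribution with n − 1 degrees of freedom (the name `t_crit` and the function names are illustrative, not taken from the slides).

```python
import math


def one_sample_t(column):
    """One-sample t statistic for H: mean = 0 (assumes a non-constant column)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / (n - 1)
    return mean / math.sqrt(var / n)


def procedure_one(data, t_crit):
    """Data-driven ordering by sums of squares, then unadjusted t tests.

    data   : list of n rows, each a list of p values
    t_crit : two-sided critical value of t with n - 1 degrees of freedom
    Returns the indices of rejected variables in the order they were tested.
    """
    p = len(data[0])
    columns = [[row[i] for row in data] for i in range(p)]
    # Sums of squares about zero (not about the mean).
    sums_of_squares = [sum(x * x for x in col) for col in columns]
    order = sorted(range(p), key=lambda i: sums_of_squares[i], reverse=True)
    rejected = []
    for i in order:
        # Stop at the first variable that is not significant.
        if abs(one_sample_t(columns[i])) <= t_crit:
            break
        rejected.append(i)
    return rejected
```

For n = 6 and α = 0.05, the two-sided critical value is about 2.571.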

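Remark: For the order of variables to be efficient, the variances of the variables should be approximately equal, because with the sample mean x̄_i and the sample variance s_i² of variable i we have w_ii = (n − 1) s_i² + n x̄_i². With similar variances, large sums of squares therefore point to large deviations of the mean from zero. Thus, approximately equal variances are important for a high power of the procedure.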
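The link between the sum of squares and the t statistic behind this remark can be written out explicitly. The notation is an assumption for illustration: w_ii is the sum of squares about zero of variable i, x̄_i and s_i² are its sample mean and sample variance, and t_i is the usual one-sample t statistic.

```latex
\begin{align*}
w_{ii} &= \sum_{k=1}^{n} x_{ki}^2
        = (n-1)\,s_i^2 + n\,\bar{x}_i^2 \\
       &= (n-1)\,s_i^2 \left(1 + \frac{t_i^2}{n-1}\right),
\qquad t_i = \frac{\sqrt{n}\,\bar{x}_i}{s_i}.
\end{align*}
```

If the sample variances s_i² were the same for all variables, sorting by w_ii would be the same as sorting by |t_i|, so the most promising variables would be tested first. Strongly unequal variances push high-variance variables to the front regardless of their effect and can stop the procedure early.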

That alone would, however, only ensure the multiple error rate under the global hypothesis (regardless of variances). In addition, the special ordering criterion and the special tests ensure that the non-null variables do not disturb the behaviour of the null variables; their influence is only conservative.
Proof that the procedure keeps the FWE (draft): The univariate t tests of the single variables are considered as special cases of the stabilized multivariate tests with scores z_j = d′x_(j). The weight vectors d are unit vectors, each selecting a single variable in the order given by the sums of squares.

Example I: Comparison of nodules vs. surrounding (3 hot and 3 cold nodules together → one-sample test vs. 0)
Quadrant A only (98 genes, 2 spots averaged, corrected with housekeeping genes).
Number of locally significant genes: 33
Number of significant genes, Westfall-Young: 0
Number of significant genes, Holm's procedure: 0
Number of significant genes, Procedure I: 10
[Table: gene no., sum of squares, unadjusted P-value]

Example II: 15 cold nodules vs. surrounding (one-sample problem)
[Table: gene no., P value]
The present procedure stops already after the 2nd gene. The basic trend for the sums of squares is present, but the procedure is sensitive to disturbances. It should be smoothed (see below, hybridisation with Bonferroni/Holm).
For comparison, number of significant genes: without any adjustment: 1064; Bonferroni/Holm: 1 (gene 8104); Westfall/Young: 0.

Simulation experiments
Guided by Example I: n = 6, ..., 33 cases; p = 98 variables, normally distributed, variance 1, pairwise correlation 0.5; expectation 0 for 88 variables and a non-zero expectation for the other 10 variables.
[Figure: average number of significant genes in the Monte Carlo replications, plotted against the sample size n]
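A simulation of this kind could be set up roughly as below. This is a sketch under stated assumptions, not the authors' code: the expectation of the 10 non-null variables (`shift`), the number of replications, and the seed are hypothetical parameters, and the critical value `t_crit` for n − 1 degrees of freedom must be supplied by the caller.

```python
import math
import random


def count_rejections(rows, t_crit):
    """Procedure I on one data set; returns the number of rejected hypotheses."""
    n, p = len(rows), len(rows[0])
    columns = [[row[i] for row in rows] for i in range(p)]
    # Order variables by decreasing sum of squares about zero.
    order = sorted(range(p), key=lambda i: sum(x * x for x in columns[i]),
                   reverse=True)
    count = 0
    for i in order:
        col = columns[i]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        if abs(mean) / math.sqrt(var / n) <= t_crit:
            break  # first non-significant variable stops the procedure
        count += 1
    return count


def simulate(n, t_crit, shift, replications=200, p=98, n_shifted=10,
             rho=0.5, seed=1):
    """Average number of rejections over Monte Carlo replications.

    Each row is equicorrelated normal with variance 1: a shared factor
    scaled by sqrt(rho) plus independent noise scaled by sqrt(1 - rho).
    The first n_shifted variables get expectation `shift` (hypothetical).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(replications):
        rows = []
        for _ in range(n):
            common = rng.gauss(0.0, 1.0)
            row = [math.sqrt(rho) * common
                   + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                   + (shift if i < n_shifted else 0.0)
                   for i in range(p)]
            rows.append(row)
        total += count_rejections(rows, t_crit)
    return total / replications
```

The returned average counts all rejections, true and false, which matches the "number of significant genes" reported in the slides.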

Extensions:
Other test problems, particularly the comparison of two or more independent samples: ordering by sums of squares related to the variablewise total mean of all samples, then two-sample t tests or one-way ANOVA.
Other subsets of variables (e.g., pairs of variables) → Kropf, Läuter (2002).
A "distribution-free" version is possible.
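The two-sample extension might look like the sketch below. It assumes a pooled-variance two-sample t test and a caller-supplied two-sided critical value for n_a + n_b − 2 degrees of freedom; the function name and the argument layout are illustrative assumptions.

```python
import math


def two_sample_procedure(group_a, group_b, t_crit):
    """Order by total sums of squares, then unadjusted two-sample t tests.

    group_a, group_b : lists of rows (one per subject), each with p values
    t_crit           : two-sided critical value of t with n_a + n_b - 2 df
    Returns the indices of rejected variables in the order they were tested.
    """
    p = len(group_a[0])
    n_a, n_b = len(group_a), len(group_b)
    stats = []
    for i in range(p):
        a = [row[i] for row in group_a]
        b = [row[i] for row in group_b]
        # Sum of squares about the total mean of both samples together.
        total_mean = (sum(a) + sum(b)) / (n_a + n_b)
        total_ss = sum((x - total_mean) ** 2 for x in a + b)
        mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
        within_ss = (sum((x - mean_a) ** 2 for x in a)
                     + sum((x - mean_b) ** 2 for x in b))
        pooled_var = within_ss / (n_a + n_b - 2)
        t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
        stats.append((total_ss, t))
    order = sorted(range(p), key=lambda i: stats[i][0], reverse=True)
    rejected = []
    for i in order:
        if abs(stats[i][1]) <= t_crit:
            break  # stop at the first non-significant variable
        rejected.append(i)
    return rejected
```

The total sum of squares does not use the group labels, which is what allows the ordering to be data-driven while the subsequent tests stay unadjusted.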

Weighted procedure (Procedure II) (Westfall, Kropf, Finos, 2002)
In the notation of the one-sample problem:
Calculate the P-values p_i (i = 1, …, p) of the usual unadjusted one-sample t test for each of the p variables.
For each variable, determine the sum of squares w_ii = Σ_k x_ki² and the weight g_i = w_ii^η for fixed η ≥ 0.
Calculate the weighted P-values q_i = p_i / g_i and order the variables by increasing values of them.
Then the hypothesis H_(j) for the jth ordered variable is rejected iff q_(i) ≤ α / Σ_{k ∈ S_i} g_k for all i = 1, …, j, where S_i is the set containing the ith ordered variable and all subsequent ones.
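The weighted step-down rule can be sketched as follows. The sketch assumes that the unadjusted P-values and the (strictly positive) sums of squares have already been computed, that the weights are the sums of squares raised to a fixed exponent (called `eta` here), and that the threshold at each step is α divided by the sum of the weights of the hypotheses not yet rejected; the function name is illustrative.

```python
def weighted_procedure(p_values, sums_of_squares, eta, alpha=0.05):
    """Weighted Bonferroni-Holm with data-driven weights.

    p_values        : unadjusted P-values, one per variable
    sums_of_squares : sums of squares about zero (assumed > 0)
    eta             : fixed exponent >= 0, chosen in advance
    Returns the indices of rejected variables in the order of rejection.
    """
    weights = [w ** eta for w in sums_of_squares]
    weighted_p = [p / g for p, g in zip(p_values, weights)]
    order = sorted(range(len(p_values)), key=lambda i: weighted_p[i])
    remaining = sum(weights)  # total weight of the hypotheses still in play
    rejected = []
    for i in order:
        if weighted_p[i] > alpha / remaining:
            break
        rejected.append(i)
        remaining -= weights[i]
    return rejected
```

With `eta = 0` all weights equal 1 and the function reduces to the ordinary Bonferroni-Holm procedure; larger values of `eta` let the sums of squares dominate the ordering.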

How does this procedure fit to the others above?
Procedure II utilises ideas from Bonferroni/Holm (fixed weights) as well as from Procedure I (data-driven through w_ii).
η = 0: g_i = w_ii^0 = 1, and the procedure is identical to the usual unweighted Bonferroni/Holm.
η → ∞: according to Westfall and Krishen (2001), the influence of the weights completely overrides that of the P-values from Bonferroni-Holm; the critical function converges to that of Procedure I.
Intermediate values of η: both parts are present; the "power assumption" of equal variances is important only for the Procedure I part.
In an application, η has to be fixed in advance!

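Example 2 again
Cold nodules vs. surrounding: unadjusted 1064 significant genes.
[Figure: Westfall/Young result and the results of the weighted procedure, on an axis running from Bonferroni/Holm (B/H) to Procedure I (Pr. I)]
→ Is the choice of genes stable?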

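Example 2, cont.
Hot nodules vs. surrounding: unadjusted 2597, Westfall/Young 93.
Hot vs. cold nodules: unadjusted 1290.
[Figures: results of the weighted procedure for both comparisons, each on an axis running from Bonferroni/Holm (B/H) to Procedure I (Pr. I)]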

Simulation experiments with the weighted procedure
Guided by Example II:
p variables; n = 4, 6, 8, 12, 16, 20, 30, 50, 100;
number of significant genes: 10, 100, 1000;
pairwise correlation coefficient: 0 or 0.5;
heterogeneity of variances in 5 levels.
Results:
The influence of the pairwise correlation on the optimal choice of η is small, and the number of significant genes is not so important either.
The sample size is influential (and known in practice).
The heterogeneity of variances is important, too, but not known in practice; estimating it through the w_ii ensures only weak FWE control.

Summary
A new technique for multiple testing with data-dependent ordering of hypotheses is proposed.
It keeps the FWE in the strong sense for arbitrary multivariate normal data.
To provide high power, the variables should have approximately equal variances.
The proposal is advantageous in very small samples of high-dimensional data.
The method is sensitive to disturbances.
Westfall's proposal of the weighted procedure establishes a link between the above procedure and the Bonferroni-Holm method and smooths out these disturbances.
The weighted procedure is a real alternative to existing analysis techniques for microarray data; the suitable choice of η remains a problem.

References
Fang, K.-T. and Zhang, Y.-T., 1990: General Multivariate Analysis. Science Press Beijing and Springer-Verlag Berlin Heidelberg.
Kropf, S., 2000: Hochdimensionale multivariate Verfahren in der medizinischen Statistik. Shaker Verlag, Aachen.
Kropf, S. and Läuter, J., 2002: Multiple Tests for Different Sets of Variables Using a Data-Driven Ordering of Hypotheses, with an Application to Gene Expression Data. Biometrical Journal 44, no. 7.
Läuter, J., 1996: Exact t and F Tests for Analysing Studies with Multiple Endpoints. Biometrics 52.
Läuter, J., Glimm, E., and Kropf, S., 1998: Multivariate Tests Based on Left-Spherically Distributed Linear Scores. Annals of Statistics 26. Erratum: Annals of Statistics 27.
Westfall, P.H., Kropf, S., and Finos, L., 2002: Weighted FWE-controlling methods in high-dimensional situations. Submitted for IMS Philadelphia companion volume.
Westfall, P.H. and Krishen, A., 2001: Optimally weighted, fixed sequence, and gatekeeping multiple testing procedures. Journal of Statistical Planning and Inference 99.
Westfall, P.H. and Young, S.S., 1993: Resampling-Based Multiple Testing. John Wiley & Sons, New York.