# Copyright © 2011 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 11 Tests of Comparison.

## Chapter Overview

- Statistical procedures used to test hypotheses and investigate differences between groups, or within groups across time.
- Types of data and the differences between parametric and nonparametric statistics.
- Concepts of Type I and Type II error, statistical power, and interaction between independent variables.
- Introduction to planned and post-hoc comparisons and analysis of covariance.
- The use of t-tests and the link between t-tests and ANOVA.
- An introduction to statistical significance, clinical meaningfulness, confidence interval analysis, and effect size, which provides a context for the critical appraisal of clinical research.
- An overview of nonparametric tests of comparison and a worked example of a Mann–Whitney U test.

All practitioners who claim to practice from an evidence base must also understand the principles of statistics.

## Selecting Statistics and Types of Data

Data are categorized as:

- Nominal
- Ordinal
- Interval
- Ratio

Nominal simply means “to name.” The assignment of numeric values for analysis of nominal data is arbitrary. Ordinal data are ordered in a particular and meaningful manner (e.g., numeric pain scales). Nonparametric statistical methods of comparison are used to analyze nominal and ordinal data. Parametric statistics are appropriate for analyzing interval and ratio data under most circumstances.
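Because the numbers assigned to nominal categories are arbitrary, arithmetic on them has no meaning. The sketch below uses hypothetical referral-source data (an assumption for illustration only) and two equally valid codings: the “mean code” changes with the coding, while the mode names the same category either way.

```python
from statistics import mean, mode

# Hypothetical nominal data: referral source for seven patients (illustrative only).
referrals = ["physician", "self", "physician", "employer", "physician", "self", "employer"]

# Two equally arbitrary numeric codings of the same three categories.
coding_a = {"physician": 1, "self": 2, "employer": 3}
coding_b = {"physician": 3, "self": 1, "employer": 2}

codes_a = [coding_a[r] for r in referrals]
codes_b = [coding_b[r] for r in referrals]

# The "mean code" depends only on which coding was chosen, so it has no meaning.
mean_a, mean_b = mean(codes_a), mean(codes_b)

# The mode identifies the same category regardless of the coding used.
decode_a = {code: name for name, code in coding_a.items()}
decode_b = {code: name for name, code in coding_b.items()}
modal_a = decode_a[mode(codes_a)]
modal_b = decode_b[mode(codes_b)]

print(f"mean code: {mean_a:.2f} vs {mean_b:.2f}; modal category: {modal_a} / {modal_b}")
```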

## Interval and Ratio Data

Interval and ratio data permit the calculation and useful interpretation of a mean (average) value and a standard deviation. By contrast, the mode is the most useful measure of central tendency for nominal data, and for some ordinal scales the appropriate measure of central tendency is the median, with the range provided as a measure of dispersion.

- Interval data: “interval” implies that the differences between points of measure are consistent and meaningful. However, the absence of an absolute 0 precludes the calculation of meaningful ratios.
- Ratio data: similar to interval data, but with an absolute 0, so ratios of values are meaningful.

In all other respects, interval and ratio data are similar, and both are analyzed with the same statistical procedures.
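A small sketch of the interval/ratio distinction, using temperature as an assumed, familiar example: Celsius is an interval scale with an arbitrary zero, while Kelvin is a ratio scale that starts at absolute zero. The pain ratings at the end are hypothetical ordinal values, summarized with the median and range as the text recommends.

```python
from statistics import median

# Interval scale: Celsius has an arbitrary zero point.
celsius = (10.0, 20.0)
# Ratio scale: Kelvin starts at absolute zero.
kelvin = tuple(c + 273.15 for c in celsius)

# A 10-degree difference means the same thing on both scales...
diff_c = celsius[1] - celsius[0]
diff_k = kelvin[1] - kelvin[0]

# ...but only the ratio-scale ratio is meaningful: 20 °C is not "twice as hot" as 10 °C.
ratio_c = celsius[1] / celsius[0]   # 2.0, an artifact of where zero was placed
ratio_k = kelvin[1] / kelvin[0]     # roughly 1.035

# Hypothetical ordinal pain ratings (0-10): report the median and range, not a mean.
pain = [3, 5, 5, 6, 8, 2, 5]
pain_median = median(pain)
pain_range = (min(pain), max(pain))
```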

## Differences Between Nonparametric and Parametric Procedures

Variance and standard deviation are measures of dispersion for interval and ratio data; the median and range are reported for ordinal data, and the mode for nominal data. Parametric statistics analyze the distribution of variance, hence the term “analysis of variance” (ANOVA). Variance is built from the difference between each score and the mean: it is the sum of the squared differences divided by the number of scores minus 1 (for a sample). Standard deviation is the square root of variance. Variance from the mean can be used to compare sets of data.
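The definition of variance can be checked step by step. This sketch (hypothetical scores) squares each deviation from the mean, because raw deviations always sum to zero, divides by n − 1, and compares the result with Python's `statistics.variance` and `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    m = sum(scores) / n
    deviations = [x - m for x in scores]   # these always sum to zero, hence the squaring
    return sum(d * d for d in deviations) / (n - 1)

scores = [4, 8, 6, 5, 7]   # hypothetical interval-level scores
var = sample_variance(scores)
sd = sqrt(var)             # standard deviation is the square root of variance
```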

## Steps to Complete ANOVA

Steps in preparing and completing an analysis of variance:

1. Formulate an answerable question from a research idea, including identifying the independent and dependent variables.
2. Write the research question in null form (e.g., for a study of NES, the null can be abbreviated “NES = no NES”).
3. Collect data.
4. Organize the data, and perform the analysis of variance.
5. Interpret the results and report the findings.
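Step 4 begins with organizing the data by level of the independent variable. A minimal sketch, assuming the data arrive as hypothetical long-format records (subject, group, score); the group labels follow the “NES = no NES” null from step 2, and the scores are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (subject_id, group, outcome score).
records = [
    (1, "NES", 42.0), (2, "NES", 47.0), (3, "NES", 45.0),
    (4, "no NES", 38.0), (5, "no NES", 41.0), (6, "no NES", 39.0),
]

# Step 2, written out: the null states that the population means of "NES" and "no NES" are equal.

def organize(rows):
    """Group outcome scores by level of the independent variable."""
    grouped = defaultdict(list)
    for _subject, group, score in rows:
        grouped[group].append(score)
    return dict(grouped)

groups = organize(records)
# Per-group sample size and mean, the usual first look before running the ANOVA.
summary = {g: (len(xs), mean(xs)) for g, xs in groups.items()}
```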

## Analysis of Variance

The purpose of most research involving comparisons is to infer the results to the population. The analysis estimates the probability that the differences found reflect real population differences; that is, it asks whether the data collected provide sufficient evidence to conclude that a difference exists in the entire population. Statistical tests of comparison are used to test, and potentially reject, a null hypothesis that two sets of data are drawn from the same population. Rejecting the null means concluding that the groups are different. A null cannot be accepted, because two groups will never be truly equal. If we fail to reject a null, then we must suspend judgment as to whether the groups differ.
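Even when the null is true, two samples rarely have identical means. This simulation sketch draws both groups from one hypothetical normal population (mean 50, SD 10, an arbitrary assumption) with a fixed seed, so any difference between the sample means is pure chance; a test of comparison has to judge whether an observed difference exceeds what such chance would produce.

```python
import random
from statistics import mean

rng = random.Random(2011)   # fixed seed so the sketch is reproducible

# One hypothetical population (normal, mean 50, SD 10). Both groups are drawn from it,
# so the null hypothesis is true by construction.
group_a = [rng.gauss(50, 10) for _ in range(15)]
group_b = [rng.gauss(50, 10) for _ in range(15)]

# The sample means still differ; a test of comparison asks whether a difference this
# large is plausible as a chance occurrence when both samples share one population.
observed_difference = mean(group_a) - mean(group_b)
print(f"difference between sample means: {observed_difference:.2f}")
```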

## F-Value and Unexplained Variance

The result of ANOVA is an F value, a point on an F distribution that permits estimates of probability. The formula for F is a ratio of variance estimates, hence the term “analysis of variance”:

$$F = \frac{MS_{\text{explained}}}{MS_{\text{unexplained}}}$$

where the mean square unexplained is also sometimes referred to as “MS error.” A mean square is essentially a sum of squared differences between scores and a mean, divided by its degrees of freedom (for a single set of scores, the number of scores minus 1). Unexplained variance is variation from the mean that is attributed to factors beyond the scope of the research design.
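The F ratio for a one-way, between-subjects design can be computed directly from sums of squares. In this sketch each sum of squares is divided by its degrees of freedom to give a mean square; the three groups of scores are hypothetical, and only the F value is produced (its probability still comes from an F table or software).

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Explained (between-group) variation: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Unexplained (within-group, "error") variation: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between   # mean square explained
    ms_within = ss_within / df_within      # mean square unexplained (MS error)
    return ms_between / ms_within, df_between, df_within

# Hypothetical scores for three groups.
F, df_between, df_within = one_way_anova_f([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
```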

## Interpreting F

F is a point on a distribution. There are an infinite number of F distributions, each defined by the degrees of freedom in the numerator and the denominator. The larger the F (the ratio of explained to unexplained variance), the less likely it is that the differences observed were chance occurrences. By convention, researchers are generally willing to accept less than a 5% risk that an F value obtained is a chance occurrence. When F is large enough that this probability falls below the chosen level, we reject the null hypothesis and conclude that the observed differences are unlikely to be due to chance and may reflect the effect of our intervention. The alpha value specifies the accepted level of risk of incorrectly concluding that observed differences reflect true differences in the population (e.g., α = 0.05 is a risk of 5 in 100).
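The idea that a large F is unlikely to be a chance occurrence can be made concrete without an F table. This sketch swaps in a permutation test (a different technique from the table lookup the text describes): it repeatedly shuffles the hypothetical scores among the groups and counts how often a random relabelling yields an F at least as large as the observed one, then compares that proportion with alpha.

```python
import random
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F = MS explained / MS unexplained (assumes some within-group variation)."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(scores) - len(groups)))

def permutation_p_value(groups, n_shuffles=2000, seed=0):
    """Estimate how often random relabelling of scores gives an F at least as large."""
    rng = random.Random(seed)
    observed = f_ratio(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_ratio(shuffled) >= observed:
            hits += 1
    # Counting the observed arrangement itself keeps the estimate above zero.
    return (hits + 1) / (n_shuffles + 1)

ALPHA = 0.05
groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]   # hypothetical scores, F = 27
p = permutation_p_value(groups)
reject_null = p < ALPHA
```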

## Alpha Values and Types of Error

A Type I error occurs when a null is rejected when, in fact, population differences do not exist; the alpha value is really the level of risk of Type I error. A Type II error occurs when a null is not rejected, yet a study of the population would reveal differences between groups. Researchers guard against Type I error by selecting the alpha level, whereas adequate statistical power is required to decrease the risk of Type II error. Power is influenced by three factors:

- The mean difference between groups
- The variance within groups
- Sample size

In reality, the only factor investigators can control is sample size.
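The three influences on power appear explicitly in the common normal-approximation formula for a two-sided, two-group comparison of means. This sketch is an approximation (it uses z in place of t and ignores the negligible opposite tail), and the planning values are hypothetical; it shows power rising with the mean difference and sample size and falling with variance, and searches for the smallest n per group that reaches 80% power.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def approx_power(mean_difference, sd, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-group comparison of means."""
    se = sd * sqrt(2 / n_per_group)       # standard error of the difference in means
    z_crit = Z.inv_cdf(1 - alpha / 2)     # two-sided critical value
    return Z.cdf(abs(mean_difference) / se - z_crit)

def n_per_group_for_power(mean_difference, sd, power=0.80, alpha=0.05):
    """Smallest n per group whose approximate power reaches the target."""
    n = 2
    while approx_power(mean_difference, sd, n, alpha) < power:
        n += 1
    return n

# Hypothetical planning values: a 5-point mean difference, SD of 10, alpha = 0.05.
n_needed = n_per_group_for_power(5, 10)
```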

## Complexity and Interaction

When a between-subjects variable and a within-subjects variable both exist in a research design, the design may be referred to as a “mixed model” (very common in health care research). It is possible to have multiple between-subjects and within-subjects variables within a research design, but greater complexity in research designs and data analysis is not necessarily an indicator of better research. For a significant interaction, “significant” suggests that the finding is a reflection of a population phenomenon. To better understand how variables interact, turn to tables that include “cell” means and standard deviations; to interpret the meaning of interactions between variables, use a graphic representation.
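A cell-means table makes an interaction visible. This sketch uses a hypothetical 2 × 2 mixed design (group as the between-subjects variable, time as the within-subjects variable) and computes the interaction contrast, the difference between the two groups' changes over time. A non-zero contrast corresponds to non-parallel lines on a group-by-time plot; whether it is significant still requires the ANOVA itself.

```python
from statistics import mean

# Hypothetical 2 x 2 mixed design: group (between subjects) x time (within subjects).
scores = {
    ("treatment", "pre"): [30, 32, 28],
    ("treatment", "post"): [45, 47, 43],
    ("control", "pre"): [31, 29, 30],
    ("control", "post"): [34, 33, 35],
}

cell_means = {cell: mean(values) for cell, values in scores.items()}

# Change over time within each group.
change = {g: cell_means[(g, "post")] - cell_means[(g, "pre")] for g in ("treatment", "control")}

# Interaction contrast: a non-zero value means the effect of time depends on group.
interaction_contrast = change["treatment"] - change["control"]

for g in ("treatment", "control"):
    print(f"{g:>9}: pre {cell_means[(g, 'pre')]:.1f}  post {cell_means[(g, 'post')]:.1f}")
```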

## Levels of Variables, Planned Comparison, and Post-Hoc Analysis

There may be multiple levels within a variable (e.g., time as a within-subject variable). Adding levels of independent variables can maximize efficiency and yield greater insight into the interactions between the variables of interest. Comparisons between pairs of means that are specified in advance are pre-planned pairwise comparisons; post-hoc tests include the Tukey, Scheffé, and Bonferroni procedures. When a report refers to post-hoc testing, the investigator is conveying that additional analyses were performed to isolate the sources of significant differences between sets of scores. A risk of Type I error exists with each analysis performed.
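Because each analysis carries its own Type I risk, the risk that at least one of several tests is a false positive grows with the number of tests. This sketch assumes the tests are independent (real pairwise comparisons are correlated, so the figure is approximate) and shows the Bonferroni adjustment, which divides alpha by the number of comparisons.

```python
def familywise_error(alpha, k):
    """Probability of at least one Type I error across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-comparison alpha that keeps the familywise rate at or below alpha."""
    return alpha / k

# All pairwise comparisons among 4 group means: 4 * 3 / 2 = 6 tests.
k = 6
print(f"{familywise_error(0.05, k):.3f}")   # 0.265
print(f"{bonferroni_alpha(0.05, k):.4f}")   # 0.0083
```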

## Analysis of Covariance

ANCOVA signifies analysis of covariance. ANCOVA is a special case of ANOVA in which a variable (the covariate) is introduced for the purpose of accounting for unexplained variance; as a result, ANCOVA can increase statistical power. MANOVA refers to multivariate analysis of variance, in which more than one dependent measure is analyzed simultaneously. MANOVA is best applied when the investigator is interested in the effect of the independent variable(s) on the collection of dependent variables.
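The way a covariate absorbs unexplained variance can be seen in the ANCOVA error term. This is a partial sketch, not a full ANCOVA: it fits a pooled within-group regression of the outcome on a hypothetical baseline covariate and compares the within-group (error) sum of squares before and after adjustment. A complete ANCOVA would also adjust the group means and give up one error degree of freedom for the covariate.

```python
from statistics import mean

def within_sums(groups):
    """Pooled within-group sums of squares and cross-products: (Sxx, Sxy, Syy)."""
    sxx = sxy = syy = 0.0
    for g in groups:
        mx = mean([x for x, _ in g])
        my = mean([y for _, y in g])
        for x, y in g:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    return sxx, sxy, syy

# Hypothetical data: (baseline score = covariate, outcome score) for two groups.
groups = [
    [(10, 22), (12, 25), (14, 27), (16, 30)],
    [(11, 18), (13, 20), (15, 23), (17, 25)],
]

sxx, sxy, syy = within_sums(groups)
pooled_slope = sxy / sxx                  # common within-group regression slope
ss_error_anova = syy                      # unexplained SS, ignoring the covariate
ss_error_ancova = syy - sxy ** 2 / sxx    # unexplained SS after adjusting for baseline
```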

## T-Tests

T-tests are a special case of ANOVA in which there are only two sets of data in the comparison, so that

$$t^2 = F$$

t values are points on a curve, and there are an infinite number of t distributions. Each t distribution corresponds to the degrees of freedom (df) associated with unexplained variance; the df associated with explained variance is always 1. For two groups A and B:

$$t = \frac{\bar{X}_A - \bar{X}_B}{s_{\text{pool}}\sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}}$$

Standard deviation (SD) is the square root of variance; thus, the pooled SD is the link between the formula for t and ANOVA. t values may be positive or negative (F values are always positive). With t there is a choice of null hypothesis: A = B (two-tailed), or A ≥ B or A ≤ B (one-tailed).
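The relationship t² = F can be checked numerically. This sketch computes an independent-samples t with a pooled SD and a two-group one-way ANOVA F on the same hypothetical scores; the squared t matches F, with the numerator df of F equal to 1 and the denominator df equal to the df of t.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Independent-samples t with a pooled standard deviation; returns (t, df)."""
    na, nb = len(a), len(b)
    sp = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    t = (mean(a) - mean(b)) / (sp * sqrt(1 / na + 1 / nb))
    return t, na + nb - 2

def f_two_groups(a, b):
    """One-way ANOVA F for exactly two groups (numerator df = 1)."""
    grand = mean(a + b)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in (a, b))
    ss_within = sum((x - mean(g)) ** 2 for g in (a, b) for x in g)
    return (ss_between / 1) / (ss_within / (len(a) + len(b) - 2))

# Hypothetical scores for two independent groups.
a = [12, 15, 11, 14, 13]
b = [9, 10, 12, 8, 11]
t, df = pooled_t(a, b)
F = f_two_groups(a, b)
```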

## Significance and Confidence Intervals

ANOVA, t-tests, and the nonparametric procedures address the probability that observed differences reflect population differences, but they do not address the magnitude of those differences. It is possible to reject a null hypothesis and conclude that groups differ when the magnitude of the difference is of little clinical consequence or, conversely, to fail to reject a null when clinically meaningful differences may exist. The solution to this problem is the reporting of confidence intervals. The chapter focuses on the interpretation of confidence intervals and provides only one example of the calculation process. A statistically significant difference may not reflect a clinically meaningful difference; thus, additional information may be needed before deciding on a plan of care.
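A confidence interval shows both significance and magnitude at once. This sketch computes a 95% interval for a difference between two independent means with a pooled SD; the critical t of 2.306 is the standard table value for df = 8, and the scores and the “smallest clinically meaningful difference” are hypothetical. Here the interval excludes zero (statistically significant), yet its lower end falls below the clinical threshold, so a meaningful benefit is not assured.

```python
from math import sqrt
from statistics import mean, variance

def mean_difference_ci(a, b, t_crit):
    """Two-sided CI for mean(a) - mean(b), using a pooled SD.

    t_crit is read from a t table for df = len(a) + len(b) - 2.
    """
    na, nb = len(a), len(b)
    sp = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    margin = t_crit * sp * sqrt(1 / na + 1 / nb)
    diff = mean(a) - mean(b)
    return diff - margin, diff + margin

# Hypothetical outcome scores for two independent groups (5 per group, df = 8).
a = [12, 15, 11, 14, 13]
b = [9, 10, 12, 8, 11]
low, high = mean_difference_ci(a, b, t_crit=2.306)   # 95% two-sided critical t, df = 8

meaningful_difference = 2.0                           # hypothetical clinical threshold
statistically_significant = low > 0 or high < 0       # interval excludes zero
clearly_meaningful = low >= meaningful_difference     # whole interval exceeds the threshold
```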

## Effect Size

Effect size is a calculation that shows the typical response to an intervention. It is a useful approach to understanding what the observed differences between groups mean in terms of magnitude of effect. Effect size calculations place the magnitude of differences between groups in the context of group variance. Cohen’s d (Jacob Cohen, 1988) is one of the most commonly referenced methods of calculating effect size:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s}$$

where s is the pooled standard deviation estimate:

$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2}}$$
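A direct transcription of the formula for Cohen’s d as written above (divisor n₁ + n₂), applied to hypothetical scores. Since `statistics.variance` uses n − 1, each (n − 1)s² term recovers that group's sum of squared deviations.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_1, group_2):
    """Standardized mean difference with s pooled over n1 + n2, as in the formula above."""
    n1, n2 = len(group_1), len(group_2)
    s = sqrt(((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2))
    return (mean(group_1) - mean(group_2)) / s

# Hypothetical scores: treatment group vs comparison group.
d = cohens_d([12, 15, 11, 14, 13], [9, 10, 12, 8, 11])
```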

## Effect Size

Hedges (1981) uses an equation similar to Cohen’s, but with a denominator based on the degrees of freedom:

$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Because n₁ + n₂ − 2 is smaller than n₁ + n₂, this s is slightly larger, so for the same data the Hedges estimate is slightly smaller than Cohen’s. Conventional benchmarks for interpreting effect size are:

- 0.2 represents a small effect.
- 0.5 represents a moderate effect.
- 0.8 or greater represents a large effect.

These values are based in social science rather than in biomedical research.
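This sketch computes the standardized mean difference with either denominator (n₁ + n₂ as in the Cohen formula, or n₁ + n₂ − 2 as in the Hedges formula) and maps a value onto the conventional benchmarks. The scores are hypothetical, and the label “below small” for values under 0.2 is this sketch's own wording, not a benchmark from the text.

```python
from math import sqrt
from statistics import mean, variance

def effect_size(a, b, denominator_offset=0):
    """Standardized mean difference; offset 0 -> divisor n1 + n2, offset 2 -> n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    pooled_ss = (n1 - 1) * variance(a) + (n2 - 1) * variance(b)
    s = sqrt(pooled_ss / (n1 + n2 - denominator_offset))
    return (mean(a) - mean(b)) / s

def benchmark(es):
    """Conventional (social-science) labels for the magnitude of an effect size."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical scores for two independent groups.
a = [12, 15, 11, 14, 13]
b = [9, 10, 12, 8, 11]
d_cohen = effect_size(a, b, denominator_offset=0)
g_hedges = effect_size(a, b, denominator_offset=2)
```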

## Nonparametric Statistics

The terms nonparametric and distribution-free can be used interchangeably. When parametric analyses are completed, it is assumed that the data are based on observations of a normally distributed population with similar variance, and that samples are drawn at random. If these assumptions are not met, nonparametric procedures may be the appropriate analytical methods. Nonparametric statistics test hypotheses about medians or about the distribution of nominal data. However, violation of the assumptions is often unlikely to have a substantial impact on the statistical outcome, because procedures such as ANOVA are robust. Three of the most common nonparametric procedures are:

- Mann–Whitney U
- Kruskal–Wallis One-Way Analysis of Variance by Ranks
- Friedman Two-Way Analysis of Variance by Ranks
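The three procedures listed above all work on ranks rather than raw scores. This sketch shows the usual ranking step, in which tied values share the average of the ranks they occupy; the ratings are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

pain = [3, 7, 5, 7, 2]   # hypothetical ordinal ratings
print(average_ranks(pain))
```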

## Mann–Whitney U Test

The Mann–Whitney U test is analogous to the independent-samples (unpaired) t-test. The analysis tests the null hypothesis that the median score in one group (A) is less than or equal to the median score of a second group (B) (A ≤ B). If the analysis reveals that the median of A is greater than that of B, then one might reject the null hypothesis. The Mann–Whitney U result is designated as a T. As with parametric tests, the null hypothesis (A ≤ B) is rejected only if the probability of obtaining a T value at least this extreme is sufficiently small (e.g., less than 5%).
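A sketch of the one-sided test described above, reporting the statistic as U (the text designates the result T). Group A's rank sum is converted to U, and a large-sample normal approximation without a tie correction gives the one-sided probability. The scores are hypothetical; with only five per group, a table of exact critical values would normally be used instead of the approximation.

```python
from statistics import NormalDist

def mann_whitney(a, b):
    """Return (U_A, z) for independent samples a and b.

    Ties receive average ranks; z is the large-sample normal approximation
    without a tie correction.
    """
    pooled = sorted(a + b)
    # Average 1-based rank for each distinct value.
    rank_of = {v: pooled.index(v) + 1 + (pooled.count(v) - 1) / 2 for v in set(pooled)}
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(rank_of[x] for x in a)
    # Number of (A, B) pairs in which A ranks higher, ties counting one half.
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return u_a, (u_a - mean_u) / sd_u

# Hypothetical scores for two independent groups.
group_a = [12, 15, 11, 14, 13]
group_b = [9, 10, 12, 8, 11]

u_a, z = mann_whitney(group_a, group_b)
# Null: median(A) <= median(B); a large U_A (A ranked high) is evidence against it.
p_one_sided = 1 - NormalDist().cdf(z)
reject_null = p_one_sided < 0.05
```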

## Kruskal–Wallis One-Way and Friedman Two-Way

A Kruskal–Wallis One-Way Analysis of Variance by Ranks is appropriate when there are more than two groups. The result is an H value, and the probability of H can be found by consulting a table specific to this analysis. A Friedman Two-Way Analysis of Variance by Ranks is appropriate for analyses in which there are repeated measures within one group. None of these nonparametric tests allows for the analysis of repeated measures from multiple groups, known as a mixed model design. This represents one of the major limitations of these statistical tests in clinical research.
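Both statistics are computed from rank sums. This sketch uses the standard formulas without a tie correction: Kruskal–Wallis ranks all scores together across independent groups, while Friedman ranks each subject's repeated measures separately. The data are hypothetical, and the resulting values would still be referred to the appropriate table.

```python
def average_rank_map(values):
    """Map each distinct value to its average rank (1-based) within `values`."""
    ordered = sorted(values)
    return {v: ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in set(values)}

def kruskal_wallis_h(groups):
    """H statistic for k independent groups (no tie correction)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    rank = average_rank_map(pooled)              # rank all scores together
    total = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 * total / (n * (n + 1)) - 3 * (n + 1)

def friedman_statistic(rows):
    """Friedman statistic for n subjects (rows), each measured under k conditions (columns)."""
    n, k = len(rows), len(rows[0])
    col_rank_sums = [0.0] * k
    for row in rows:
        rank = average_rank_map(row)             # rank within each subject
        for j, value in enumerate(row):
            col_rank_sums[j] += rank[value]
    return 12 * sum(r * r for r in col_rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)

# Hypothetical data: three independent groups.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Hypothetical repeated measures: 4 subjects, each measured at 3 times.
chi_r = friedman_statistic([[1, 2, 3], [2, 3, 5], [3, 4, 6], [1, 5, 7]])
```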

## Chapter Summary and Key Points

Statistics do not prove anything. Do not read a research report expecting to accept its conclusions as an absolute or final answer. Numbers can lie, and the misinterpretation of data and statistical analyses can mislead. Most students preparing for careers in health care are not fond of statistics, but careful consideration and critical appraisal inform quality clinical practice; thus, it is necessary to understand the principles of statistics.