 # Statistical Issues in Research Planning and Evaluation

Chapter 7, Statistical Issues in Research Planning and Evaluation, from Research Methods in Physical Activity

To plan your own study or evaluate a study by someone else, you need to understand these concepts and their interrelationships: probability, alpha, power, sample size, and effect size.

Probability: The odds that a certain event will occur.

One concept of probability used in statistics is equally likely events.

Equally likely events: A concept of probability in which the chances of one event occurring are the same as the chances of another event occurring. For example, if you roll a die, each of the numbers from 1 to 6 is equally likely to occur.

Another pertinent approach to probability involves relative frequency.

Relative frequency: A concept of probability concerning the comparative likelihood of two or more events occurring. For example, suppose that you toss a coin 100 times. You would expect heads 50 times and tails 50 times; the probability of either result is one-half, or .50. When you toss, however, you may get heads 48 times, a proportion of .48. This is the relative frequency. You might perform 100 tosses 10 times and never get exactly .50, but the relative frequencies would be distributed closely around .50, and you would still assume the probability to be .50.
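
The coin-toss example can be tried out with a short simulation. This is only a sketch: the seed value and the function name `relative_frequency_of_heads` are arbitrary choices made for illustration, and a different seed gives different individual results.

```python
import random

def relative_frequency_of_heads(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the proportion of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

rng = random.Random(7)  # fixed, arbitrary seed so the run is repeatable

# Ten sets of 100 tosses, as in the example above.
frequencies = [relative_frequency_of_heads(100, rng) for _ in range(10)]
for i, freq in enumerate(frequencies, start=1):
    print(f"set {i:2d}: relative frequency of heads = {freq:.2f}")

average = sum(frequencies) / len(frequencies)
print(f"average across sets = {average:.3f}")
```

The individual sets scatter around .50, and few if any land on it exactly, which is the distinction between the relative frequency observed in a sample and the assumed probability.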

alpha (level of significance)
Probability in statistical tests: In a statistical test, you sample from a population of participants and events. You use probability statements to describe the confidence that you place in the statistical findings. Frequently, you encounter a statistical test followed by a probability statement such as p < .05. The interpretation is that a difference or relationship of this size would be expected less than 5 times in 100 as a result of chance.

Alpha (level of significance): In research, the test statistic is compared with a probability table for that statistic, which tells you what the chance occurrence is. In behavioral research, alpha (the probability of chance occurrence) is frequently set at .05 or .01 (the odds that findings of this size arise by chance alone are either 5 in 100 or 1 in 100). These values are used to control for a type I error.

Error Types (Type I and Type II Errors)
In a study, the experimenter may make two types of error. A type I error is to reject the null hypothesis when the null hypothesis is true. For example, a researcher concludes that there is a difference between two methods of training, but there really is not. A type II error is not to reject the null hypothesis when the null hypothesis is false. For example, a researcher concludes that there is no difference between the two training methods, but there really is a difference. Figure 7.1 (p. 116) is called a truth table, which displays type I and type II errors.

You control for type I errors by setting alpha. For example, if alpha is set at .05 and 100 experiments are conducted in which the null hypothesis of no difference or no relationship is true, the null hypothesis would be rejected on only about 5 occasions.

To some extent the issue is this: If you had to make an error, which type of error would you be willing to make? The level of alpha reflects the type of error that you are willing to make.
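
The "5 occasions in 100" statement can be checked by simulation. The sketch below makes several simplifying assumptions that are not in the slides: the group size, the number of simulated studies, and the seed are hypothetical, and a two-tailed z test with a known population standard deviation stands in for whatever test a real study would use.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05
N_PER_GROUP = 30      # hypothetical group size
N_EXPERIMENTS = 2000  # hypothetical number of simulated studies
SIGMA = 1.0           # population SD, assumed known for the z test

rng = random.Random(42)  # fixed, arbitrary seed
z_critical = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed cutoff
standard_error = SIGMA * (2 / N_PER_GROUP) ** 0.5

def null_is_rejected():
    """Simulate one study in which both groups come from the same population."""
    group_a = [rng.gauss(0.0, SIGMA) for _ in range(N_PER_GROUP)]
    group_b = [rng.gauss(0.0, SIGMA) for _ in range(N_PER_GROUP)]
    z = (mean(group_a) - mean(group_b)) / standard_error
    return abs(z) > z_critical

# Every rejection here is a type I error, because the null hypothesis is true.
type_i_errors = sum(null_is_rejected() for _ in range(N_EXPERIMENTS))
type_i_rate = type_i_errors / N_EXPERIMENTS
print(f"type I error rate = {type_i_rate:.3f} (alpha = {ALPHA})")
```

The observed rate should sit close to alpha, although it will rarely match it exactly in any single run.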

Acceptable variations in reporting alpha in research; beta
Even when experimenters set alpha at a specific level (e.g., .05) before the research, they often report the probability of a chance occurrence for the specific effects of the study at the level it occurred (e.g., p = .012). This procedure is appropriate (and recommended), because the researchers are only demonstrating how far the obtained probability fell below the specified level.

Beta: Beta (β) is the magnitude of the type II error. Although the magnitude of type I error is specified by alpha, you may also make a type II error, the magnitude of which is determined by beta. See Figure 7.2 (p. 118), which shows the overlap of the score distributions on the dependent variable for x (the sampling distribution if the null hypothesis is true) and y (the sampling distribution if the null hypothesis is false).

Beta
By specifying alpha, you indicate that the mean of y (given a certain distribution) must be at a specified distance from the mean of x before the null hypothesis is rejected. But if the mean of y falls anywhere between the mean of x and the specified y, you could be making a type II error (beta); that is, you do not reject the null hypothesis when, in fact, there is a true difference. There is a relationship between alpha and beta; for example, as alpha is set increasingly smaller, beta becomes larger.
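
The trade-off between alpha and beta can be illustrated numerically. The following is a rough sketch, not the textbook's procedure: it assumes a two-tailed, two-group z test with equal group sizes, and the effect size (0.5) and group size (30) are hypothetical values chosen for illustration. The function name `beta_two_group_z` is likewise made up here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def beta_two_group_z(effect_size, n_per_group, alpha):
    """Approximate type II error rate for a two-tailed, two-group z test."""
    z_critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the null hypothesis is false.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that z still lands inside the non-rejection region.
    return STD_NORMAL.cdf(z_critical - shift) - STD_NORMAL.cdf(-z_critical - shift)

betas = {}
for alpha in (0.10, 0.05, 0.01):
    betas[alpha] = beta_two_group_z(effect_size=0.5, n_per_group=30, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> beta = {betas[alpha]:.2f}")
```

With the effect size and sample size held fixed, each step toward a smaller alpha produces a larger beta.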

Meaningfulness (Effect Size)
Meaningfulness is the importance or practical significance of an effect or relationship. The meaningfulness of a difference between two means can be estimated by effect size (also called delta).

Effect size (ES): The standardized value that is the difference between the means divided by the standard deviation. (ES vs. sample size is listed in Tables 7.3 and 7.4, p. 119.)

The formula for effect size is

$$ES = \frac{M_1 - M_2}{s}$$

also written as Cohen's d:

$$d = \frac{M_1 - M_2}{s_{pooled}}, \quad \text{where } s_{pooled} = \sqrt{\frac{s_1^2 + s_2^2}{2}}$$

This formula subtracts the mean of the second group (M2) from the mean of the first group (M1) and divides the difference by the standard deviation. That places the difference between the means in the common metric called standard deviation units. Pay attention to the formula: the greater the difference between the group means and the smaller the variance within the groups, the greater the effect size. An ES of 0.2 or less is small, about 0.5 is moderate, and 0.8 or more is large.
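
As a worked illustration of the effect size formula, here is a minimal sketch. The scores are invented for the example, the function name `cohens_d` is an arbitrary choice, and the simple pooled standard deviation used here is most appropriate when the two groups are the same size.

```python
from statistics import mean, stdev

def cohens_d(group_1, group_2):
    """Effect size: (M1 - M2) / s_pooled, with s_pooled = sqrt((s1^2 + s2^2) / 2)."""
    s_pooled = ((stdev(group_1) ** 2 + stdev(group_2) ** 2) / 2) ** 0.5
    return (mean(group_1) - mean(group_2)) / s_pooled

# Hypothetical scores for two training methods (not data from the source).
method_a = [12, 15, 14, 16, 13]
method_b = [10, 12, 11, 13, 9]

d = cohens_d(method_a, method_b)
print(f"M1 = {mean(method_a)}, M2 = {mean(method_b)}, ES = {d:.2f}")
```

The means differ by 3 points and each group has a standard deviation of about 1.58, so the difference is close to 1.9 standard deviation units, a large ES by the benchmarks above.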

Power
Power is the probability of rejecting the null hypothesis when the null hypothesis is false (e.g., detecting a real difference), or the probability of making a correct decision. Power ranges from 0 to 1. The greater the power is, the more likely you are to detect a real difference or relationship.

Rejecting the null hypothesis (power) is possible by increasing the number of subjects, but are your results meaningful? There are important questions to answer: How large a difference is important in theory or practice? How many participants are needed to declare an important difference as significant? Understanding the concept of power can answer these questions.

If a researcher can identify the size of an important effect through previous research or even simply estimate an effect size (e.g., 0.5 is a moderate ES) and establish how much power is acceptable (e.g., a common estimate in the behavioral sciences is 0.8), then the size of the sample needed for a study can be estimated.

Power (refer to figures 7.3 and 7.4)
Power is simply calculated as 1 - beta. Beta should be kept at 4 × alpha (reflecting the relative seriousness of type I vs. type II errors). Thus, if alpha is .05, then beta is .20, and power is 0.8.

You can review the literature to determine the ES. You now have the power (ability to find differences) and the effect size (meaningfulness), with a preselected alpha level. Based on this information, you can determine how many subjects you need to recruit per group to achieve the desired outcome (to find differences).

Note how this works: if you have a small ES, more subjects are required to find differences (obtain power), and the effect of the independent variable may not be meaningful. Also, if you make the alpha level more stringent and the ES is unchanged, you will have to recruit more subjects to find significant results. Sample size is extremely influential on power (see Table 7.1).
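
The step from alpha, power, and ES to a sample size can be sketched with a common normal-approximation formula, n per group ≈ 2((z for alpha + z for power) / ES)². This is an assumption brought in for illustration and is not taken from the slides or the textbook tables; exact t-based calculations give slightly larger numbers. The function name `n_per_group` is also invented here.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-tailed, two-group test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff set by alpha
    z_power = NormalDist().inv_cdf(power)          # cutoff set by 1 - beta
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Small, moderate, and large ES at alpha = .05 and power = .80.
for es in (0.2, 0.5, 0.8):
    print(f"ES = {es}: about {n_per_group(es)} participants per group")

# A more stringent alpha with the ES unchanged.
print(f"ES = 0.5, alpha = .01: about {n_per_group(0.5, alpha=0.01)} per group")
```

Under this approximation a small ES needs several hundred participants per group while a large ES needs only a few dozen, and tightening alpha from .05 to .01 raises the requirement for the same ES.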

Power
Keep in mind the relationships of alpha, sample size, and ES in planning a study. If you have access to only a small number of participants, then you need to have a really large ES or use a larger alpha, or both. Do not just blindly specify the .05 alpha if detecting a real difference is the main issue. Use a higher one, such as .20 or even higher. This approach is extremely pertinent in pilot studies.
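
To see why a small sample calls for a larger ES or a larger alpha, the sketch below approximates power for a hypothetical study with 10 participants per group. It assumes a two-tailed, two-group z test; the effect sizes, the group size, and the function name `approx_power` are illustrative choices, not values from the source.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha):
    """Approximate power (1 - beta) of a two-tailed, two-group z test."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the null hypothesis is false.
    shift = effect_size * (n_per_group / 2) ** 0.5
    beta = std_normal.cdf(z_critical - shift) - std_normal.cdf(-z_critical - shift)
    return 1 - beta

N_SMALL = 10  # hypothetical small sample, per group
scenarios = [
    ("moderate ES, alpha .05", 0.5, 0.05),
    ("moderate ES, alpha .20", 0.5, 0.20),
    ("large ES, alpha .05", 1.2, 0.05),
]
for label, es, alpha in scenarios:
    print(f"{label}: power = {approx_power(es, N_SMALL, alpha):.2f}")
```

With a moderate ES and alpha at .05, power in this scenario is far below the conventional 0.8; relaxing alpha to .20 or having a much larger ES both raise it.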

Using Information in the Context of the Study
Context: The interrelationships found in the real-world setting.

Are effect sizes for significant findings large enough to be meaningful when interpreted within the context of the study, or for the application of findings to other related samples, or for planning a related study?

Remember that effect sizes are based on the difference between the means (divided by the standard deviation). The larger the effect size, the less the overlap between the distributions of scores in the two groups (control and experimental).

In very small samples a single unusual value can substantially influence the results. Moreover, within- and between-participant variation (the error variance) tends to be large, which causes the error term in tests of significance to be large, resulting in few significant findings. At the other extreme of sample size, tests of significance have little value for very large samples because nearly any difference or relationship is significant.
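
The point about very large samples can be illustrated with a short calculation. As a sketch under stated assumptions, it treats the observed ES as fixed at a tiny hypothetical value (0.05) and uses a two-tailed, two-group z test; the sample sizes and the function name `two_tailed_p` are arbitrary illustrative choices.

```python
from statistics import NormalDist

def two_tailed_p(effect_size, n_per_group):
    """p value of a two-group z test when the observed ES equals effect_size."""
    z = effect_size * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

TINY_ES = 0.05  # hypothetical, far below the "small" benchmark of 0.2

for n in (50, 500, 5000, 50000):
    print(f"n per group = {n:6d}: p = {two_tailed_p(TINY_ES, n):.4f}")
```

The same tiny difference is nowhere near significant with 50 participants per group but falls below .05 once the groups reach several thousand, even though its practical meaning has not changed.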

Context
Context is what matters with regard to meaningfulness. You must ask yourself, “Within the context of what I do, does an effect of this size matter?” The answer nearly always depends on who you are and what you are doing (and practically never on whether p = .05 or .01). Thus, having a significant (reliable) effect is a necessary, but not sufficient, condition in statistics. To meet the criteria of being both necessary and sufficient, the effect must be significant and meaningful within the context of its use. Said another way,
♦ estimates of significance are driven by sample size,
♦ estimates of meaningfulness are driven by the size of the difference, and
♦ context is driven by how the findings will be used.