Effect Size and Meta-Analysis

1 Effect Size and Meta-Analysis
Effect size helps evaluate the size of a difference, such as the difference between two means. Meta-analysis is used to combine results across diverse studies on a given topic.

2 Topic 58: Introduction to Effect Size (d)
Suppose that Experimenter A administered a new treatment for depression (Treatment X) to an experimental group, while the control group received a standard treatment. Furthermore, suppose that Experimenter A used a 20-item true-false depression scale (with possible raw scores from 0 to 20) and obtained the posttest results shown here. Note that the difference between the two means is 5 raw-score points.

3 Topic 58: Introduction to Effect Size (d)
Suppose that Experimenter B administered Treatment Y to an experimental group while treating the control group with the standard treatment. Furthermore, suppose Experimenter B used a 30-item scale with choices from “Strongly agree” to “Strongly disagree” (with possible scores from 0 to 120) and obtained the results shown here, which show a difference of 10 raw score points in favor of the experimental group.

4 Topic 58: Introduction to Effect Size (d)
Which treatment is superior? Treatment X, which resulted in a 5-point raw score difference between the two means, or Treatment Y, which resulted in a 10-point raw score difference between the two means? Of course, the answer is not clear because the two experimenters used different measurement scales (0 to 20 versus 0 to 120).

5 Topic 58: Introduction to Effect Size (d)
In Experiment A, one standard-deviation unit equals 4.00 raw-score points. Dividing the difference between the means (5.00) by the size of the standard-deviation unit for Experiment A (4.00 points) yields an answer of 1.25. This value is known as d and is obtained by applying the formula d = (me - mc) / sd, in which me stands for the mean of the experimental group, mc stands for the mean of the control group, and sd stands for the standard deviation.

6 Topic 58: Introduction to Effect Size (d)
Using the same formula for Experiment B, the difference between the means is divided by the standard deviation (10.00/14.00), yielding d = 0.71, which is almost three-quarters of one standard-deviation unit above 0.00 on the three-point scale. The following is what is now known about the differences in the two experiments when both are expressed on a common (i.e., standardized) scale called d.
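The arithmetic for the two experiments can be sketched in a few lines of Python. This is only an illustration: the function name `cohens_d` is an assumption, and because the slides give only the mean differences and standard deviations, the control mean is set to 0.0 as a placeholder.

```python
def cohens_d(mean_experimental, mean_control, sd):
    """Standardized mean difference: (me - mc) / sd."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean_experimental - mean_control) / sd


# Only the mean differences (5.00 and 10.00) and the standard deviations
# (4.00 and 14.00) appear in the text, so the control mean is a
# placeholder of 0.0 (an assumption made for illustration).
d_a = cohens_d(5.00, 0.0, 4.00)    # Experiment A: 5.00 / 4.00
d_b = cohens_d(10.00, 0.0, 14.00)  # Experiment B: 10.00 / 14.00

print(f"Experiment A: d = {d_a:.2f}")
print(f"Experiment B: d = {d_b:.2f}")
```

Although Experiment B has the larger raw difference (10 points versus 5), Experiment A has the larger standardized difference.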

7 Topic 58: Introduction to Effect Size (d)
Remember that the two raw score differences are not directly comparable because different measurement scales were used (0 to 20 points versus 0 to 120 points). By examining the standardized values of d, which range from 0.00 to 3.00, a meaningful comparison of the results of the two experiments can be made.

8 Topic 58: Introduction to Effect Size (d)
Important definition: Effect size refers to the magnitude (i.e., size) of a difference when it is expressed on a standardized scale. The statistic d is one of the most popular statistics for describing the effect size of the difference between two means. In the next topic, the interpretation of d is discussed in more detail. In Topic 60, an alternative statistic for expressing effect size is described.

9 Topic 59: Interpretation of Effect Size (d)
In the previous topic, effect size expressed as d was introduced. The two examples in that topic had values of d of 0.71 and 1.25. Obviously, the experiment with a value of 1.25 had a larger effect than the one with a value of 0.71. While there are no universally accepted standards for describing values of d in words, many researchers use Cohen’s suggestions: (1) a value of d of about 0.20 (one-fifth of a standard deviation) is “small,” (2) a value of 0.50 (one-half of a standard deviation) is “medium,” and (3) a value of 0.80 (eight-tenths of a standard deviation) is “large.” Keep in mind that in terms of values of d, an experimental group can rarely exceed a control group by more than 3.00 because the effective range of standard-deviation units is only three on each side of the mean. Thus, for most practical purposes, 3.00 (in either direction) is the maximum value of d.
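As a rough illustration of Cohen’s suggestions, the sketch below finds which of the three benchmarks (0.20, 0.50, 0.80) a value of d is closest to. The function name and the nearest-benchmark rule are assumptions made for this example, not part of Cohen’s guidelines, and any value well above 0.80 simply maps to “large” because only those three benchmarks are encoded.

```python
# Cohen's three suggested benchmarks, as listed in the text.
COHEN_BENCHMARKS = [(0.20, "small"), (0.50, "medium"), (0.80, "large")]


def nearest_cohen_label(d):
    """Return the label of the benchmark closest to the magnitude of d."""
    magnitude = abs(d)  # the sign only indicates which group scored higher
    _, label = min(COHEN_BENCHMARKS, key=lambda pair: abs(pair[0] - magnitude))
    return label


for value in (0.25, 0.45, 0.71):
    print(f"d = {value:.2f} is closest to '{nearest_cohen_label(value)}'")
```

Labels of this kind are only a starting point; they should be read together with the context in which the value of d was obtained.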

10 Topic 59: Interpretation of Effect Size (d)
Using the labels in Table 1, the value of d of 0.71 in the previous topic would be described as being closer to “large” than “medium,” while the value of 1.25 would be described as being between “very large” and “extremely large.”

11 Topic 59: Interpretation of Effect Size (d)
The labels being discussed should not be used arbitrarily without consideration of the full context in which the values of d were obtained and the possible implications of the results. This leads to two principles: (1) a small effect size might represent an important result, and (2) a large effect size might represent an unimportant result. Consider the first principle (a small effect size might represent an important result). Suppose that researchers have been frustrated by consistently finding values of d well below 0.20 when trying various treatments for solving an important problem (such as treatments for a new and deadly disease). If a subsequent researcher finds a treatment that results in a value of about 0.20, this might be considered a very important finding. At this low level (0.20), the effect of the treatment is small, but it might be of immense importance to ill individuals helped by the treatment, however small the effect size. In addition, the results might point the scientific community in a fruitful direction for additional research on treatments for the problem in question. The second principle is that a large value of d, even a very large one, might be of limited importance. This is most likely when the results lack practical significance in terms of cost, public and political acceptability, and ethical and legal concerns.

12 Topic 60: Effect Size and Correlation (r)
Cohen’s d is so widely used as a measure of effect size that some researchers use the terms “effect size” and “d” interchangeably, as though they were synonyms. However, effect size refers to any statistic that describes the size of a difference on a standardized metric.

13 Topic 60: Effect Size and Correlation (r)
In addition to d, a number of other measures of effect size have been proposed. One that is very widely reported is “effect-size r,” which is simply the Pearson correlation coefficient (r), which was described in Topic 53. As outlined in that topic, r indicates the direction and strength of a relationship between two variables expressed on a scale that ranges from -1.00 to 1.00, where 0.00 indicates no relationship. Values of r are interpreted by first squaring them (r²). For example, when r = 0.50, r² = 0.25 (0.50 × 0.50 = 0.25). Then, the value of r² should be multiplied by 100%. Thus, 0.25 × 100% = 25%. This indicates that the value of r of 0.50 is 25% greater than 0.00 on a scale that extends up to a maximum possible value of 1.00.
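The two-step interpretation described above (square r, then multiply by 100%) can be sketched as follows. The function name `r_squared_percent` is an assumption chosen for this example.

```python
def r_squared_percent(r):
    """Square a correlation coefficient and express the result as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and 1.00")
    return (r ** 2) * 100.0


print(f"r = 0.50  -> {r_squared_percent(0.50):.0f}%")
print(f"r = -0.50 -> {r_squared_percent(-0.50):.0f}%")
```

Because squaring removes the sign, a positive and a negative correlation of the same size give the same percentage; the direction of the relationship has to be read from r itself.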

14 Topic 60: Effect Size and Correlation (r)
In basic studies, the choice between reporting values of d (which can range from 0.00 to 3.00) and reporting correlation coefficients and the associated values of r² (which can range from 0.00 to 1.00) is usually quite straightforward. If a researcher wants to determine which of two groups is superior on average, a comparison of means using d is usually the preferred method of analysis. On the other hand, if there is one group of participants with two scores per participant and if the goal is to determine the degree of relationship between the two sets of scores, then r and r² should be used. For instance, if a vocabulary knowledge test and a reading comprehension test were administered to a group of students, it would not be surprising to obtain a correlation coefficient as high as 0.70, which indicates a substantial degree of relationship between the two variables (i.e., there is a strong tendency for students who score high on vocabulary knowledge to score high on reading comprehension). As described in Topic 53, for interpretive purposes, 0.70 squared equals 0.49, which is equivalent to 49%. Knowing this allows a researcher to say that the relationship between the two variables is 49% higher than a relationship of 0.00.

15 Topic 60: Effect Size and Correlation (r)
When reviewing a body of literature on a given topic, some studies present means and values of d while other studies on the same topic present values of r, depending on the specific research purposes and research designs. When interpreting such a set of studies, it can be useful to think in terms of the equivalence of d and r. Table 1 shows the equivalents for selected values.
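Table 1 is not reproduced in this transcript. As a hedged illustration of what such an equivalence can look like, the sketch below uses a commonly cited conversion that assumes two groups of equal size: r = d / sqrt(d² + 4) and, in the other direction, d = 2r / sqrt(1 - r²). Whether the table in the slides was built with exactly these formulas is an assumption, and the function names are hypothetical.

```python
import math


def d_to_r(d):
    """Convert d to r, assuming two groups of equal size."""
    return d / math.sqrt(d ** 2 + 4)


def r_to_d(r):
    """Convert r to d (inverse of d_to_r); requires -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2 * r / math.sqrt(1 - r ** 2)


for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}  ->  r = {d_to_r(d):.2f}")
```

Under this conversion, a "medium" d of 0.50 corresponds to an r of about 0.24, which shows why d and r values should not be compared directly without first putting them on the same footing.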

16 Topic 61: Intro to Meta-Analysis
Meta-analysis is a set of statistical methods for combining the results of previous studies, which makes it possible to synthesize multiple studies on a given topic. The results of each study included in a meta-analysis are subject to many types of errors, such as (1) random sampling errors, (2) random errors of measurement, (3) systematic errors known to one or more of the researchers, and (4) systematic errors of which the researchers are unaware. For this reason, the results of any one experiment should be interpreted with caution. The main focus of a meta-analysis is a mathematical synthesis of the statistical results of the studies included in the analysis. In the example that follows, the synthesis is obtained by averaging the four mean differences.

17 Example: Results of Meta-Analysis of Two Experiments
Researcher    Experimental Group    Control Group    Mean Difference
W             m =                   m =              mdiff = 3.00
X             m =                   m =              mdiff = 2.00
Y             m =                   m =              mdiff = 6.00
Z             m =                   m =              mdiff =
The best estimate of the effectiveness of the program is 2.50 points, based on a sample of 400 students.
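The averaging step can be sketched as below. The mean difference for Researcher Z did not survive in this transcript, so the code works backward from the reported average of 2.50 across four studies; the resulting value for Z is an inference from that arithmetic (assuming a simple unweighted average), not a figure taken from the slide.

```python
# Mean differences that are legible in the example.
known_differences = {"W": 3.00, "X": 2.00, "Y": 6.00}

reported_average = 2.50  # "best estimate" stated in the example
n_studies = 4            # Researchers W, X, Y, and Z

# Inferred value for Z, assuming an unweighted average of four studies.
implied_z = reported_average * n_studies - sum(known_differences.values())

all_differences = list(known_differences.values()) + [implied_z]
average = sum(all_differences) / len(all_differences)

print(f"Implied mean difference for Z: {implied_z:.2f}")
print(f"Average of the four mean differences: {average:.2f}")
```

A negative mean difference for one study would mean that the control group outscored the experimental group in that study, which is exactly the kind of single result that averaging is meant to put in perspective.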

18 Two Important Characteristics of Meta-Analysis
Statistics based on larger samples yield more reliable results. It is important to remember that more reliable results do not necessarily mean more valid results. A systematic bias that skews the results will yield invalid outcomes no matter how big the sample size is. Meta-analysis typically synthesizes the results of studies conducted by independent researchers. Since the researchers are not working together, if one researcher makes an error, the effects of his or her erroneous results will be moderated when they are averaged with the other results.

19 Topic 62: Meta-Analysis and Effect Size
In a meta-analysis, it is difficult to find even one perfectly strict replication of a study, because researchers frequently use different measures of the same variable. For example, Experimenter A used a test with one range of possible score values, while Experimenter B used a test with possible score values from 0 to 50.
            Experimental Group    Control Group    Mean Difference
Exp. A      m =                   m =              mdifference = 100.00
(N = 50)    sd =                  sd =
Exp. B      m =                   m =              mdifference = 2.00
(N = 50)    sd =                  sd =
d = mdifference divided by the standard deviation (sd)
Exp. A: d = 100.00/200.00 = 0.50
Exp. B: d = 2.00/3.00 = 0.67 (a larger effect than Exp. A)
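The comparison of Experiments A and B can be sketched as follows, using only the mean differences and standard deviations that appear in the calculation lines above. The dictionary layout and variable names are assumptions made for this example.

```python
# Values taken from the d calculations in the example.
experiments = {
    "A": {"mean_difference": 100.00, "sd": 200.00},
    "B": {"mean_difference": 2.00, "sd": 3.00},
}

# d = mean difference divided by the standard deviation.
d_values = {
    name: values["mean_difference"] / values["sd"]
    for name, values in experiments.items()
}

for name, d in d_values.items():
    raw = experiments[name]["mean_difference"]
    print(f"Exp. {name}: raw difference = {raw:.2f}, d = {d:.2f}")
```

The raw differences suggest that Experiment A had by far the larger effect (100 points versus 2), while the standardized values show the opposite ordering.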

20 Topic 62: Meta-Analysis and Effect Size
In the previous example, the average of the mean differences lacks meaning because the results are expressed on different scales. The answer to this problem is to use a measure of effect size such as Cohen’s d, which is expressed on a standardized scale in standard-deviation units. Calculating d for all studies and then averaging the values of d yields a meaningful result. The strength of the averaged effect can then be gauged by comparing it with Table 1 of Topic 59. The statistic r is also expressed on a standardized scale, from -1.00 to +1.00. Values of r can also be averaged, with the average weighted to take varying sample sizes into account. Consumers of research should look to see whether a meta-analysis is based on weighted averages, which is always desirable.
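A minimal sketch of a sample-size-weighted average of effect sizes is shown below. The three studies are hypothetical values invented for illustration, and weighting directly by N is only the simplest scheme consistent with the description here; published meta-analyses often use other weights, such as inverse-variance weights.

```python
def weighted_mean(values, weights):
    """Average of values in which each value counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical studies: (d, sample size). Not taken from the slides.
studies = [(0.50, 50), (0.67, 50), (0.20, 400)]
d_values = [d for d, _ in studies]
sample_sizes = [n for _, n in studies]

unweighted = sum(d_values) / len(d_values)
weighted = weighted_mean(d_values, sample_sizes)

print(f"Unweighted average d: {unweighted:.3f}")
print(f"Weighted average d:   {weighted:.3f}")
```

In this made-up case the large study with a small effect pulls the weighted average well below the unweighted one, which is why it matters whether a meta-analysis reports weighted averages.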

21 Topic 63: Meta-Analysis: Strengths and Weaknesses
Meta-analyses produce results based on large combined samples, and such large samples yield very reliable results (although the results may lack validity if the meta-analysis contains serious methodological flaws). Meta-analysis can be used to synthesize the results of studies conducted by independent researchers. Meta-analysis results in objective conclusions because the results are obtained mathematically. It demonstrates what can be obtained “objectively,” which can be compared and contrasted with more subjective qualitative literature reviews on the same research topic.

22 Topic 63: Meta-Analysis: Strengths and Weaknesses
Researchers may not be careful in selecting the studies to include in a meta-analysis, which leads to results that are difficult to interpret or even meaningless. A moderator variable is a variable on which the studies are divided into subgroups, with separate analyses conducted for the various subgroups; it moderates the results so that the results for the subgroups differ from the grand combined result. “Publication bias” means that the body of published research available on a topic for a meta-analysis might be biased toward studies that have statistically significant results.
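The idea of a moderator variable can be illustrated with a small subgroup analysis. Everything in this sketch is hypothetical: the studies, the moderator ("children" versus "adults"), and the use of a simple sample-size-weighted average are assumptions chosen only to show how subgroup results can differ from the grand combined result.

```python
from collections import defaultdict

# Hypothetical studies; none of these values come from the slides.
studies = [
    {"d": 0.80, "n": 50, "moderator": "children"},
    {"d": 0.70, "n": 50, "moderator": "children"},
    {"d": 0.20, "n": 50, "moderator": "adults"},
    {"d": 0.10, "n": 50, "moderator": "adults"},
]


def weighted_average_d(group):
    """Sample-size-weighted average of d for a list of studies."""
    total_n = sum(study["n"] for study in group)
    return sum(study["d"] * study["n"] for study in group) / total_n


# Divide the studies into subgroups on the moderator variable.
subgroups = defaultdict(list)
for study in studies:
    subgroups[study["moderator"]].append(study)

grand_d = weighted_average_d(studies)
subgroup_d = {name: weighted_average_d(group) for name, group in subgroups.items()}

print(f"Grand combined d: {grand_d:.2f}")
for name, value in subgroup_d.items():
    print(f"  {name}: d = {value:.2f}")
```

Here the grand combined result would hide the fact that the hypothetical treatment works far better in one subgroup than in the other.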

