
1 Test item analysis: When are statistics a good thing? Andrew Martin Purdue Pesticide Programs

2 What we're going to do
- Review some simple descriptive statistics.
- Discuss the concept of random error.
- Identify important item characteristics.
- Conduct an item analysis using real data and actual test items.

3 Measures of central tendency
- Mean = the sum of the test scores ÷ the number of test scores (i.e., an average)
- Median = the middle score (or, with an even number of scores, the average of the two middle scores)
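A minimal Python sketch of both statistics (the score list is invented for illustration):

    # Hypothetical test scores, for illustration only
    scores = [72, 85, 90, 66, 78, 85, 92]

    mean = sum(scores) / len(scores)        # ~81.14

    ordered = sorted(scores)
    mid = len(ordered) // 2
    # Middle score for an odd count; average of the two middle scores otherwise
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    print(mean, median)                     # 81.14..., 85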

4 Individual differences
- Range = highest score - lowest score
- Standard deviation (roughly!) = the range ÷ 5, or (more precisely) the square root of the average squared deviation score¹
- Variance = the standard deviation squared

¹ A deviation score is an individual's score minus the group mean score.
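The same made-up scores illustrate the rule-of-thumb estimate alongside the exact computation (with only seven scores, the rule of thumb is rough indeed, as the output shows):

    import statistics

    scores = [72, 85, 90, 66, 78, 85, 92]   # same invented scores as above

    score_range = max(scores) - min(scores) # 92 - 66 = 26
    rough_sd = score_range / 5               # rule of thumb: 5.2

    mean = statistics.fmean(scores)
    # Square root of the average squared deviation score (population SD)
    sd = (sum((x - mean) ** 2 for x in scores) / len(scores)) ** 0.5   # ~8.85
    variance = sd ** 2                       # ~78.41
    print(score_range, rough_sd, round(sd, 2), round(variance, 2))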

5 Score distribution
[Figure: chart of a test score distribution; image not included in the transcript.]

6 Score distribution (cont.)
- Licensure test scores are generally NOT normally distributed, as shown in the preceding slide.
- They are often left-skewed (i.e., scores are concentrated on the right-hand side of the distribution).
- For the purposes of item analysis, this doesn't matter much: we are going to treat score distributions AS IF they were normal.

7 Individual differences (variance) and random error
- Under classical measurement theory, individual score differences are the result of: (1) true differences in achievement and (2) random error.
- We are interested in the former and want to minimize the latter.
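In conventional classical test theory notation (the symbols are standard, not taken from the slides), this decomposition is:

    X = T + E                        observed score = true score + random error
    Var(X) = Var(T) + Var(E)         errors are assumed uncorrelated with true scores
    reliability = Var(T) / Var(X)    the share of observed variance that is "true"

Reliability coefficients such as KR-20 (next slides) estimate that last ratio from a single test administration.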

8 Estimating the influence of random error on score results
- A RELIABLE test generates scores that are reasonably free from the influence of random error (i.e., the test has a high degree of precision).
- A reliability coefficient indicates a test's precision of measurement.
- A widely used index of reliability for right/wrong-scored tests is KR-20 (Kuder-Richardson formula 20).

9 KR-20
- KR-20 can range in value from 0 (perfectly unreliable) to 1 (perfectly reliable).
- KR-20 values for licensure exams should be above .90.
- KR-20 values are affected by the number of items on the test and by how strongly the items relate to (or correlate with) one another. Shorter tests are generally LESS reliable than longer ones, and anything that restricts test score variance will also reduce the value of KR-20.
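KR-20 is computed as (k / (k - 1)) × (1 - Σpq / σ²), where k is the number of items, p and q are each item's proportions of correct and incorrect answers, and σ² is the variance of the total scores. A minimal Python sketch, with an invented 0/1 response matrix:

    # Rows = test takers, columns = items (1 = correct, 0 = incorrect); data invented
    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
    ]
    n, k = len(responses), len(responses[0])

    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n   # population variance

    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n   # item difficulty (p-value)
        pq_sum += p * (1 - p)

    kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
    print(round(kr20, 3))   # ~0.410

The low value is consistent with the slide's point: a four-item test is far too short to be reliable.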

10 Standard error of measurement (SEM)
- SEM offers another means of examining the influence of random error.
- It estimates the standard deviation of the scores a single person would earn over repeated administrations of similar [parallel] test forms.
- With qualifications, SEM can be used to place confidence intervals around a person's observed score.
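SEM can be derived from the test's standard deviation and its reliability: SEM = SD × √(1 - reliability). A short sketch (the SD and KR-20 values below are assumed for illustration, not from the slides):

    # Assumed values, not from the slides
    sd = 8.0      # standard deviation of total scores
    kr20 = 0.91   # reliability estimate

    sem = sd * (1 - kr20) ** 0.5            # 8.0 * 0.3 = 2.4

    score = 70
    # Approximate 95% confidence interval around an observed score of 70
    low, high = score - 1.96 * sem, score + 1.96 * sem
    print(round(sem, 2), round(low, 1), round(high, 1))   # 2.4 65.3 74.7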

11 Item difficulty
- Item difficulty is estimated by the p-value: the proportion of test takers who correctly answer the item.
- Item p-values near .50 offer the greatest contribution to test reliability.
- p-values are potentially biased by the sample of test takers from which they were calculated.
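Computing p-values from the same invented 0/1 matrix used in the KR-20 sketch:

    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
    ]
    n = len(responses)
    # Proportion of test takers answering each item correctly
    p_values = [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]
    print(p_values)   # [0.8, 0.6, 0.2, 0.8]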

12 Item discrimination
- Item discrimination describes an item's ability to differentiate persons who are knowledgeable about the item content from those who are not.
- Item discrimination is typically estimated by rpb (the point-biserial correlation).
- rpb indicates the strength of the relationship (correlation) between how individuals answer an item and their total test score.

13 Item discrimination (cont.)
- High achievers are expected to answer an item correctly more frequently than low achievers; consequently, an rpb should be positive.
- rpb values above .30 are highly discriminating (and offer the greatest contribution to test reliability).
- rpb values, like p-values, are potentially biased by the sample from which they were calculated.
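A sketch of rpb as a plain Pearson correlation between one 0/1 item column and the total scores (same invented matrix as before):

    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
    ]

    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    totals = [sum(row) for row in responses]
    item = 1   # second item
    rpb = pearson([row[item] for row in responses], totals)
    print(round(rpb, 3))   # ~0.721: a highly discriminating item

Note that correlating an item with a total that includes the item itself inflates rpb somewhat; item-analysis software typically also reports a corrected value computed with the item removed from the total.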

14 Item omits
- Omits indicate the number of persons who failed to respond to an item.
- Numerous omits (assuming no correction for guessing) may indicate a problem with the amount of time allotted for the test.
- Extensive non-response is a threat to valid score interpretation and use.
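A trivial sketch for tallying omits per item; None marks a skipped response in an invented raw (unscored) answer matrix:

    raw = [
        ['B', 'C', None, 'A'],
        ['B', None, None, 'A'],
        ['B', 'C', 'D', 'A'],
    ]
    # Number of test takers who left each item blank
    omits = [sum(row[j] is None for row in raw) for j in range(len(raw[0]))]
    print(omits)   # [0, 1, 2, 0]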

15 Resources for additional help
- Haladyna, T. (2004). Developing and Validating Multiple-Choice Test Items (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
- Osterlind, S. (1998). Constructing Test Items: Multiple-Choice, Constructed-Response, Performance, and Other Formats (2nd ed.). Norwell, MA: Kluwer Academic Publishers.
- Or contact your state land-grant university's college of education, department of psychology, or testing service about performing and interpreting item analysis reports.
- Or try: http://www.eflclub.com/elvin/publications/2003/itemanalysis.html

