The Design of Statistical Specifications for a Test
Mark D. Reckase, Michigan State University

Procedures for Test Design
Test design has traditionally been considered a subjective, artistic endeavor, but with the development of item response theory it has become more scientific. Lord suggested that tests be constructed to match a target information function, and very sophisticated methods have since been developed to select items that match target information functions. Little work has been done, however, on the design of the target information functions themselves.

Purposes for this Paper
Present a methodology for designing target information functions or item difficulty distributions for a test. Demonstrate that methodology for several common testing situations: measuring all examinees from a normal trait distribution to a desired level of precision, and measuring a range of the trait to a desired level of precision.

Basic Concepts
If an examinee's θ is known, the optimal test should contain a set of items that provide the required information at that θ. Because the information from an item covers a range of θ, items that are optimal for one person also supply some information for other persons. The general approach is to randomly select a person from the target population and then select the optimal items for that person. For each additional person, select only the additional items needed to reach the information target at that person's θ.
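
For reference, the Rasch-model quantities behind this approach can be written as follows. The notation is ours, and the scaling constant D is an assumption (D = 1.7 is consistent with the item counts reported in the example). Each item's information peaks at θ = b_i but stays positive on either side, which is why items chosen for one examinee also add information for nearby examinees.

```latex
P_i(\theta) = \frac{1}{1 + \exp[-D(\theta - b_i)]}

I_i(\theta) = D^2 \, P_i(\theta) \, [1 - P_i(\theta)], \qquad
\max_\theta I_i(\theta) = I_i(b_i) = \frac{D^2}{4}

I(\theta) = \sum_{i=1}^{n} I_i(\theta)
```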

Example
Suppose the target examinee population is N(0,1). Randomly select an examinee. Information equivalent to a reliability of .90 is 10. Select items until an information of 10 is reached at that examinee's θ, assuming the Rasch model with b = θ for each selected item. Then randomly select additional examinees and select items for them until a test length of 50 is reached.
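
The numbers in this example can be checked with a short calculation. This assumes reliability is approximated as 1 - 1/(I σ²_θ) with σ²_θ = 1, and assumes the Rasch model uses the scaling constant D = 1.7, so an item at b = θ contributes D²/4 at that θ:

```latex
\rho \approx 1 - \frac{1}{I(\theta)\,\sigma_\theta^2} = 1 - \frac{1}{10 \cdot 1} = .90

I_i(b_i) = \frac{D^2}{4} = \frac{1.7^2}{4} = 0.7225, \qquad
n_1 = \left\lceil \frac{10}{0.7225} \right\rceil = \lceil 13.84 \rceil = 14
```

Under these assumptions the first examinee needs 14 items, which agrees with the result reported for the first examinee; with D = 1 the same rule would require 40 items.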

Results – Comments
Results are from one sample of 6 randomly selected examinees. Fourteen items were needed for the first examinee. Other examinees need fewer additional items because the information functions of the items already selected overlap their θ values. The effects of sampling variation need to be considered.

Information from One Item

Results – Selected Items

Results – Information Function

The Complete Process
Create the ideal set of items for a sample. Replicate the process many times (500 replications seem to work well). Average the information functions across the samples. Average the number of items in .2-unit bins to determine the spread of difficulties. Check the specifications against the target.

Conditions for Rasch-based Design
N(0,1) trait distribution; 50-item test; Rasch model; 500 replications; minimum information of 10.

Average Test Information

Item Difficulty Distribution

Match of Test to Target

Comments
The minimum information requirement is met from -2.3 to 2.3. Information accumulates to higher values in the middle of the distribution. The difficulty distribution is essentially rectangular. Test information exceeds the target because the number of items in many bins is rounded upward.

Process Can Help Select Test Length
Run the process for different test lengths. Forcing the first examinee to be at 0.0 can also be considered. What test length allows the criteria to be met?

Effect of Test Length

Results – Test Length
As test length increases, the information function widens and increases in height. A test length of 15 is too short to meet the requirements unless it is focused at 0.0. Forcing the first examinee to 0.0 makes the information function narrower and more peaked. Seventy-five items is the maximum number that makes sense for the criteria specified here.

Test Designed to Measure with Precision over a Range
Brian Junker suggested the following procedure: select the range; pick items at the extremes of the range; fill in with items between the extremes to yield a flat information function; continue until the information criterion is reached over the entire range.
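
One way to sketch this procedure is a greedy loop that always adds the next item where the current information over the range is lowest. This is an interpretation, not necessarily Junker's exact rule: with no items the lowest point is one end of the range, the next is the other end, and later items fill in between, which follows the order described above. The Rasch model with D = 1.7, the grid spacing of 0.05, and the names `fill_range` and `item_info` are assumptions for illustration.

```python
import math

# Assumptions for this sketch: Rasch model with D = 1.7, the range [-2, 2]
# checked on a grid with spacing 0.05, and items placed exactly at grid points.
D = 1.7
TARGET = 10.0


def item_info(theta, b):
    """Rasch item information D^2 * P * (1 - P) at theta for difficulty b."""
    p = 1.0 / (1.0 + math.exp(-D * (theta - b)))
    return D * D * p * (1.0 - p)


def fill_range(lo=-2.0, hi=2.0, step=0.05, target=TARGET, max_items=500):
    """Add items one at a time at the grid point with the least information
    until every grid point in [lo, hi] reaches the target."""
    n = int(round((hi - lo) / step))
    grid = [lo + step * k for k in range(n + 1)]
    info = [0.0] * len(grid)
    items = []
    while min(info) < target and len(items) < max_items:
        # Ties go to the lowest index, so the first item lands at lo; the
        # second then lands at hi, and later items fill in between.
        k = min(range(len(grid)), key=info.__getitem__)
        items.append(grid[k])
        info = [i + item_info(t, grid[k]) for i, t in zip(info, grid)]
    return items, grid, info


if __name__ == "__main__":
    items, grid, info = fill_range()
    print(len(items), "items; minimum information", round(min(info), 2))
    print(sorted(round(b, 2) for b in items))
```

Counting the resulting difficulties in .2-unit bins gives a difficulty distribution that can be compared with the target for the range.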

Increment of Information with Each Added Item

Target Information Function for Range from -2 to 2

Items that Match Target

Specifications Counter to Traditional Specifications
Most tests have normal distributions of item difficulties, so these results seem very odd compared with traditional results and need further investigation. What is the distribution of scores? What is the distribution of p-values?

Number-Correct Score Distribution

P-value Distribution

Odd Results
The distribution of scores is near normal. The distribution of p-values mirrors the b-parameter distribution. The extreme item difficulties (p-values) are .08 and .92. It is surprising that these items yield a normal distribution of scores, so look at the test characteristic curve.
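
Why the p-values mirror the b-parameters can be checked by computing each item's expected proportion correct over an N(0,1) population. The sketch below is an illustration under assumptions (Rasch model with D = 1.7, numerical integration on an evenly spaced θ grid; `expected_p` is a hypothetical name). The expected p-value decreases monotonically as b increases, so the p-value distribution is a reversed, monotone transformation of the b distribution, and difficulties at the ends of a -2 to 2 range map to p-values near the extremes.

```python
import math

# Assumptions for this sketch: Rasch model with D = 1.7 and theta ~ N(0, 1);
# the integral over theta is approximated on an evenly spaced grid.
D = 1.7


def expected_p(b, n=2401, width=8.0):
    """Expected proportion correct (classical p-value) for a Rasch item of
    difficulty b in an N(0, 1) population."""
    step = 2.0 * width / (n - 1)
    num = den = 0.0
    for k in range(n):
        theta = -width + step * k
        w = math.exp(-0.5 * theta * theta)  # normal density up to a constant
        num += w / (1.0 + math.exp(-D * (theta - b)))
        den += w
    return num / den


if __name__ == "__main__":
    for b in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"b = {b:4.1f}  p = {expected_p(b):.3f}")
```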

Test Characteristic Curve

Test Characteristic Curve
The test characteristic curve is virtually linear from -2 to 2. When the curve is linear, the form of the θ distribution is carried over to the estimated true-score scale. In this case, since the θ distribution was normal, so is the number-correct score distribution.
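
The linearity has a simple explanation under the Rasch model (the scaling constant D is an assumption, as elsewhere): the slope of the test characteristic curve T(θ) is proportional to the test information, so a flat information function over a range gives a straight TCC over that range.

```latex
T(\theta) = \sum_{i=1}^{n} P_i(\theta), \qquad
\frac{dT}{d\theta} = \sum_{i=1}^{n} D \, P_i(\theta) \, [1 - P_i(\theta)] = \frac{I(\theta)}{D}

I(\theta) \approx I_0 \ \text{on } [-2, 2]
\;\Rightarrow\;
T(\theta) \approx T(0) + \frac{I_0}{D}\,\theta \quad \text{on } [-2, 2]
```

A linear function of a normally distributed θ is itself normally distributed, which accounts for the near-normal score distribution over that range. The argument relies on equal item discriminations, so it does not carry over exactly to a model with c = .16.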

Test Information Function for Test with c = .16

Items that Match Target

Conclusions
A process has been developed for designing target information functions and item difficulty distributions for tests. The process suggests that either a rectangular or a U-shaped item difficulty distribution is appropriate if the goal is to measure with equal precision over a range. The number of items needed is related to the width of the range of the scale that must be measured. The U-shaped item difficulty distribution works best if the goal is to recover the underlying θ distribution. These results are quite different from those of traditional test development procedures.