# Implications and Extensions of Rasch Measurement


## New Rules of Measurement

The Rasch model has introduced several new rules of measurement, which are in stark contrast to the old rules.

## Rule 1: Standard Errors

Old Rule:
- The standard error of measurement applies to all scores in a population.
- "if the score distribution approaches normality, and if obtained scores do not extend over the entire possible range, the standard error of measurement is probably uniform at all score levels" (Guilford, 1965, p. 445).

New Rule:
- The standard error of measurement varies across persons with different abilities/trait levels.

## Standard Error Across the Measurement Range

## Implications of Rule 1

- In classical test theory, standard errors of raw scores can lead one to believe that zero and perfect scores are perfectly estimated!
- The opposite is the case in Rasch measurement.
- In Rasch, each examinee measure has its own standard error, irrespective of who, if anyone, takes the same test.
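A minimal sketch of why the standard error differs from person to person, assuming the dichotomous Rasch model, where the information a test provides about a person is the sum of p(1 - p) over the items and the standard error is 1 / sqrt(information). The function names and the 11 item difficulties are hypothetical, chosen only for illustration.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response for a person at theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta, difficulties):
    """Model standard error of a person measure: 1 / sqrt(test information at theta)."""
    info = sum(rasch_prob(theta, b) * (1.0 - rasch_prob(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(info)

# Hypothetical test: 11 items with difficulties from -2.5 to +2.5 logits.
difficulties = [-2.5 + 0.5 * i for i in range(11)]

for theta in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"theta = {theta:+.1f}  SE = {standard_error(theta, difficulties):.2f}")
```

The standard error is smallest for persons near the center of the item difficulties and grows toward the extremes, which is the opposite of what raw-score standard errors suggest for very low and very high scores.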

## Rule 2: Test Length and Reliability

Old Rule:
- Longer tests are more reliable.

New Rule:
- Shorter tests can be more reliable than longer tests.
- While a longer test with the same sort of items is more reliable, this does not preclude the possibility that a shorter test with different items could be equally or more reliable.
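As a rough illustration of Rule 2, the sketch below compares the Rasch standard error for one person on a short, well-targeted test and on a longer, poorly targeted one. It uses precision of a single measure as a stand-in for reliability, and all difficulty values are made up for the example.

```python
import math

def standard_error(theta, difficulties):
    """Rasch model standard error at theta: 1 / sqrt(sum of p * (1 - p))."""
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)

theta = 0.0  # person measure in logits

# Hypothetical short test: 10 items within half a logit of the person.
short_test = [-0.45 + 0.1 * i for i in range(10)]

# Hypothetical long test: 20 items that are all far too difficult for the person.
long_test = [2.5 + 0.05 * i for i in range(20)]

se_short = standard_error(theta, short_test)
se_long = standard_error(theta, long_test)
print(f"10 targeted items:   SE = {se_short:.2f}")
print(f"20 off-target items: SE = {se_long:.2f}")
```

Twice as many items do not help when they carry little information at the person's level, which is the situation a CAT is designed to avoid.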

## Rule 3: Interchangeable Test Forms

Old Rule:
- Comparing scores from different forms of an instrument requires test parallelism.
- Test forms must be comparable in item difficulty.

New Rule:
- Equating test forms that vary in item difficulty is not only possible, but it results in better estimation of trait levels.
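The slides do not say which equating method is meant. One common approach under the Rasch model is common-item linking with a mean shift, sketched below under that assumption. The item names, difficulty estimates, and the `linking_shift` helper are hypothetical.

```python
from statistics import mean

# Hypothetical difficulty estimates (logits) for three items that appear on both
# forms, taken from separate calibrations of Form A and Form B.
common_on_a = {"item1": -0.8, "item2": 0.1, "item3": 1.0}
common_on_b = {"item1": -0.3, "item2": 0.6, "item3": 1.5}

def linking_shift(anchor_a, anchor_b):
    """Constant that moves Form B values onto the Form A scale (mean of common items)."""
    shared = sorted(set(anchor_a) & set(anchor_b))
    return mean(anchor_a[i] for i in shared) - mean(anchor_b[i] for i in shared)

shift = linking_shift(common_on_a, common_on_b)

# A person measured at 1.2 logits on the Form B scale, expressed on the Form A scale.
measure_on_b = 1.2
measure_on_a_scale = measure_on_b + shift
print(f"shift = {shift:+.2f}, measure on Form A scale = {measure_on_a_scale:.2f}")
```

Because Rasch measures are on an interval logit scale, a single additive constant is enough in this sketch; the forms themselves can differ in overall difficulty.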

## Rule 4: Item Properties

Old Rule:
- Unbiased assessment of item properties (i.e., difficulty) requires representative samples from the target population.

New Rule:
- Unbiased estimates of item properties may be obtained from unrepresentative samples.

## Rule 4

- Bias: incorrect decisions due to poor test-to-sample targeting.
- Representative: the sample trait distribution matches the distribution of the population.
- In Rasch measurement, unbiased estimates of item difficulty parameters can be obtained regardless of the way in which person measures are distributed.
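The property behind this claim can be checked numerically. Assuming the dichotomous Rasch model and independent responses, the probability that a person gets item i right, given that exactly one of items i and j is right, depends only on the two item difficulties. This is an illustration of the property, not an estimation routine; the difficulties and function names are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def prob_i_given_one_correct(theta, b_i, b_j):
    """P(item i correct | exactly one of items i and j is correct)."""
    p_i = rasch_prob(theta, b_i)
    p_j = rasch_prob(theta, b_j)
    only_i = p_i * (1.0 - p_j)
    only_j = (1.0 - p_i) * p_j
    return only_i / (only_i + only_j)

b_i, b_j = -0.5, 1.0  # hypothetical item difficulties in logits

# The person measure cancels: the result depends only on b_j - b_i.
expected = 1.0 / (1.0 + math.exp(-(b_j - b_i)))

for theta in (-3.0, 0.0, 3.0):
    value = prob_i_given_one_correct(theta, b_i, b_j)
    print(f"theta = {theta:+.1f}  conditional probability = {value:.4f}")
print(f"value implied by the difficulties alone = {expected:.4f}")
```

Since the comparison between items does not involve the person measure, it does not matter whether the sample is mostly low, mostly high, or representative.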

## Rule 5: Meaningful Measures

Old Rule:
- Meaningful interpretations of scores are obtained by comparing scores relative to a distribution (standardization sample).
- Conversion of scores into t scores, percentiles.

New Rule:
- Meaningful interpretations of measures are obtained by comparing the distance of measures to various items.
- Item and person maps.
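A small sketch of what "distance of measures to various items" means in practice, assuming the dichotomous Rasch model: the distance in logits between a person and an item translates directly into a probability of success. The item labels, difficulties, and person measure are hypothetical.

```python
import math

def success_probabilities(person_measure, item_difficulties):
    """Map each item to (distance in logits, probability of success) for one person."""
    result = {}
    for name, b in item_difficulties.items():
        distance = person_measure - b
        result[name] = (distance, 1.0 / (1.0 + math.exp(-distance)))
    return result

# Hypothetical item difficulties in logits.
items = {"Item A": -1.5, "Item B": 0.5, "Item C": 2.0}
person = 0.5

for name, (distance, p) in success_probabilities(person, items).items():
    print(f"{name}: distance {distance:+.1f} logits, P(success) = {p:.2f}")
```

An item located at the person's own measure is a 50/50 proposition; items below the person are likely to be passed and items above are likely to be failed, with no reference to a norm group.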

## Rule 6: Interval Measurement

Old Rule:
- Interval measurement is achieved to the extent that items produce normally distributed scale scores.

New Rule:
- Interval measurement is achieved to the extent that the data fit the Rasch model.

## Summary

The Rasch model with its new rules of measurement makes it possible to:
- Achieve measurement that is free of the distributional properties of samples of persons and items.
- More easily equate different instrument forms.
- Analyze an item's characteristics irrespective of other items or sample characteristics.
- Create better and shorter instruments, including:
  - Computerized adaptive testing

## Short vs. Long Instruments

Figure: instrument length on a continuum from short to long, with the drawbacks that arise along it:
- Floor and ceiling effects
- Limited content validity
- Lack precision
- Burden on respondent
- Redundant information
- May lack specificity
- Difficult to crosswalk without common items

## Computer Adaptive Testing

A CAT works much like a trained clinical interviewer:
- Selects questions based on the client's previous responses.
- Can cover a broad range of potential problems/diagnoses quickly.
- Continues to ask questions until sufficient information for a diagnosis has been obtained.

## Benefits of CAT & Item Banking

Figure: CAT and the item bank, set against the following issues:
- Respondent burden
- Tailoring/specificity
- Coverage of content domains
- Floor and ceiling effects

## Item Banking

Figure: items for Instrument A, Instrument B, and Instrument C feed an item pool; the item pool is calibrated with Rasch/IRT to form the item bank.

Calibrate items based on data collected from a representative sample.

## CAT and the Rasch Model

The Rasch model is ideal as the underlying measurement model for CAT:
- Standard errors can be estimated for each respondent independent of other respondents (Rule 1).
- Shorter tests can be as reliable as longer tests (Rule 2).
- CAT-based measures can be equated regardless of the items administered in each CAT session (Rule 3).

## Benefits of CAT

- CAT provides a way to obtain precise measures while minimizing respondent burden.
- Measures obtained with CAT can be directly compared even though respondents receive different sets of items.
- Instruments measuring the same construct can be combined to form a larger item bank.

## Benefits of CAT

CAT of course shares the benefits of computer-based testing:
- Standardized scoring procedures
- Automated data entry
- Immediate feedback
- Automatic report generation
- Greater privacy

## How Does CAT Work?

## CAT Process

Figure: typical pattern of responses, moving between items of decreased, middle, and increased difficulty after correct and incorrect responses. The score is calculated and the next best item is selected based on item difficulty (+/- 1 standard error).

## Logical Components of CAT

- Start Rule
- Item Selection
- Measure Estimation
- Stop Rule(s)

## The Start Rule

- Used to select the first item.
- What measure is assigned to the respondent prior to selecting the first item?
- Can be an arbitrary value (0 on the logit scale) or can be based on previously gathered information.

## Item Selection

- Several methods are available.
- A common approach is to select the item providing maximum information relative to the current measure.
- Can be modified to include other criteria:
  - Content domains
  - Items needed for diagnosis
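The maximum-information approach can be sketched as follows, assuming the dichotomous Rasch model, where an item's information at the current measure is p(1 - p). The bank layout, item ids, domain labels, and the `select_item` function are hypothetical; the optional domain filter stands in for the "other criteria" mentioned above.

```python
import math

def item_information(theta, b):
    """Rasch item information at theta: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_item(theta, bank, administered, allowed_domains=None):
    """Return the id of the unused item with maximum information at theta, or None."""
    candidates = [
        (item_id, item)
        for item_id, item in bank.items()
        if item_id not in administered
        and (allowed_domains is None or item["domain"] in allowed_domains)
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda pair: item_information(theta, pair[1]["difficulty"]))
    return best[0]

# Hypothetical four-item bank with difficulties in logits.
bank = {
    "q1": {"difficulty": -1.5, "domain": "domain_a"},
    "q2": {"difficulty": -0.2, "domain": "domain_a"},
    "q3": {"difficulty": 0.4, "domain": "domain_b"},
    "q4": {"difficulty": 1.8, "domain": "domain_b"},
}

print(select_item(0.3, bank, administered=set()))                      # q3
print(select_item(0.3, bank, administered={"q3"}))                     # q2
print(select_item(0.3, bank, set(), allowed_domains={"domain_a"}))     # q2
```

Under the Rasch model the most informative item is simply the unused item whose difficulty is closest to the current measure, so the information function could be replaced by an absolute distance without changing the selection.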

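## Item Information

Figure: item information curve for an item with difficulty = 0.5. Information is at its maximum where trait level = 0.5 and drops off where the item is too easy or too difficult for the respondent.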
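The slide does not give the formula. Assuming the dichotomous Rasch model, with trait level θ and item difficulty b, the curve can be worked out as:

$$
P(\theta) = \frac{e^{\theta - b}}{1 + e^{\theta - b}}, \qquad I(\theta) = P(\theta)\,\bigl[1 - P(\theta)\bigr]
$$

For b = 0.5, a respondent at θ = 0.5 has P = 0.5, so I = 0.5 × 0.5 = 0.25, the largest value the function can take. Two logits away, at θ = 2.5 (item too easy) or θ = -1.5 (item too difficult), P is about 0.88 or 0.12, and the information falls to about 0.105.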

## Item Selection

Figure: selecting among Item 1, Item 2, and Item 3.

## Estimating the Measure

- Once an item is selected and a response to the item is obtained, the CAT system will re-estimate the respondent's measure and the standard error of measurement.
- As with all Rasch measures, the measure estimated by CAT is on a logit scale ranging from negative to positive infinity.
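A minimal sketch of the re-estimation step, assuming maximum likelihood estimation by Newton-Raphson for the dichotomous Rasch model with known item difficulties. The function names, the step cap of 1 logit, and the example data are illustrative choices, not part of the presentation.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_measure(responses, difficulties, start=0.0, max_iter=50, tol=1e-6):
    """Maximum likelihood person measure and its standard error, both in logits."""
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("no finite maximum likelihood estimate for zero or perfect scores")
    theta = start
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in probs)
        # Newton-Raphson step: (observed score - expected score) / information.
        step = (raw - sum(probs)) / info
        step = max(-1.0, min(1.0, step))  # cap the step to keep the iteration stable
        theta += step
        if abs(step) < tol:
            break
    info = sum(rasch_prob(theta, b) * (1.0 - rasch_prob(theta, b)) for b in difficulties)
    return theta, 1.0 / math.sqrt(info)

# Hypothetical session so far: three items answered, two of them correctly.
difficulties = [-1.0, 0.0, 1.0]
theta, se = estimate_measure([1, 1, 0], difficulties)
print(f"measure = {theta:.2f} logits, SE = {se:.2f}")
```

In a CAT this function would be called after every response, with the growing lists of responses and administered item difficulties.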

## Estimation Methods

Maximum Likelihood:
- No distributional assumptions.
- Cannot estimate measures with 0 or perfect scores.

Bayesian:
- Assumes the latent trait has a given distribution, e.g., normal distribution.
- Easier to program.
- Provides estimates of persons with extreme (0 or perfect) scores.
- Measures at the extremes are biased.
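The slides do not name a specific Bayesian estimator. The sketch below assumes one common choice, the expected a posteriori (EAP) estimate with a normal prior evaluated on a grid, to show how a perfect score still yields a finite measure. The grid size, prior settings, and example items are hypothetical.

```python
import math

def eap_estimate(responses, difficulties, prior_mean=0.0, prior_sd=1.0,
                 n_points=81, half_width=4.0):
    """Posterior mean and SD of the person measure under a normal prior (grid approximation)."""
    grid = [prior_mean + half_width * prior_sd * (2.0 * k / (n_points - 1) - 1.0)
            for k in range(n_points)]
    weights = []
    for theta in grid:
        # Log of (unnormalized normal prior) x (Rasch likelihood of the responses).
        log_w = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        for x, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            log_w += math.log(p if x == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

difficulties = [-1.0, 0.0, 1.0]  # hypothetical item difficulties in logits

perfect, perfect_sd = eap_estimate([1, 1, 1], difficulties)
zero, zero_sd = eap_estimate([0, 0, 0], difficulties)
print(f"perfect score: measure = {perfect:.2f}, SD = {perfect_sd:.2f}")
print(f"zero score:    measure = {zero:.2f}, SD = {zero_sd:.2f}")
```

The extreme-score estimates are finite only because the prior pulls them toward its mean, which is the bias at the extremes noted on the slide.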

## Stop Rules

Determine when sufficient information has been collected.

Types of Stop Rules:
- Measurement precision
- Number of items administered
- Test-taking time
- Some combination of the above
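The four logical components (start rule, item selection, measure estimation, stop rules) can be put together in one loop. The sketch below is an illustration under stated assumptions, not the presenter's implementation: it uses a start value of 0 logits, closest-difficulty item selection, a grid-based Bayesian estimate with a standard normal prior, and two stop rules (precision and number of items). The test-taking time rule is left out. The bank, the simulated respondent, and all names are hypothetical.

```python
import math
import random

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap(responses, difficulties, n_points=81):
    """Grid-based Bayesian estimate with a standard normal prior: (measure, SE)."""
    grid = [-4.0 + 8.0 * k / (n_points - 1) for k in range(n_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)
        for x, b in zip(responses, difficulties):
            p = rasch_prob(theta, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, start=0.0, se_target=0.5, max_items=20):
    """Administer items from bank (a list of difficulties); answer(b) returns 0 or 1."""
    theta, se = start, float("inf")          # start rule: arbitrary value on the logit scale
    used, responses = [], []
    while True:
        # Stop rules: precision reached, item limit reached, or bank exhausted.
        if se <= se_target or len(used) >= max_items or len(used) == len(bank):
            break
        # Item selection: the closest unused difficulty gives maximum Rasch information.
        idx = min((i for i in range(len(bank)) if i not in used),
                  key=lambda i: abs(bank[i] - theta))
        used.append(idx)
        responses.append(answer(bank[idx]))
        # Measure estimation: re-estimate after every response.
        theta, se = eap(responses, [bank[i] for i in used])
    return theta, se, len(used)

# Hypothetical bank of 61 items and a simulated respondent at +1.0 logits.
rng = random.Random(0)
bank = [-3.0 + 0.1 * i for i in range(61)]
true_theta = 1.0

def simulated_answer(b):
    return 1 if rng.random() < rasch_prob(true_theta, b) else 0

theta, se, n_items = run_cat(bank, simulated_answer)
print(f"measure = {theta:.2f}, SE = {se:.2f}, items administered = {n_items}")
```

Tightening `se_target` or lowering `max_items` shows the trade-off between precision and respondent burden that the stop rules control.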

## Are CAT and Paper-and-Pencil Tests Equivalent?

Numerous studies have documented the equivalence of paper-and-pencil and CAT administration, including:
- Equal ability estimates (Bergstrom, 1992)
- Equal variances
- High correlations (>.90)
- CATs provide comparable and in some cases improved construct and predictive validity

## How Many Items?

- Short answer: the more, the better.
- It is not uncommon to have hundreds of items in an item bank.
- The number of items will depend on:
  - Stop rule used
  - Number of constructs or domains being assessed
  - Measurement range
  - Purpose of the CAT: to estimate a measure or classify persons into groups

## Comments

- Even large item banks fail to provide adequate precision over the entire measurement range (though they can come close).
- Bank size matters, but so do item quality and the targeting of items to the intended population.
- An important question is:
  - Where along the measurement continuum is precision most critical?

## Potential of CAT in Clinical Practice

- Reduce respondent burden
- Reduce staff resources
- Reduce data fragmentation
- Streamline complex assessment procedures

## Limitations of CAT

- Expensive to develop and maintain.
- Reviewing/changing answers to previous items is usually not allowed, and when allowed can complicate CAT procedures.

## Recommended Readings

- Wainer, H., Dorans, N.J., Flaugher, R., Green, B.F., Mislevy, R.J., Steinberg, L., & Thissen, D. (2000). Computerized Adaptive Testing: A Primer. New York: Lawrence Erlbaum.
- van der Linden, W. & Glas, C.A.W. (2000). Computerized Adaptive Testing: Theory and Practice.
- Parshall, C.G., Spray, J.A., Kalohn, J.C., & Davey, T. (2002). Practical Considerations in Computer-Based Testing. New York: Springer Verlag.