The Design of Statistical Specifications for a Test
Mark D. Reckase, Michigan State University

Procedures for Test Design
Test design has long been considered a subjective, artistic endeavor, but with the development of item response theory it has become more scientific. Lord suggested that tests be constructed to match a target information function. Very sophisticated methods have been developed to select items that match a target information function, yet little work has been done on how the test information function itself should be designed.

Purposes for this Paper
Present a methodology for designing target information functions or item difficulty distributions for a test. Demonstrate that methodology for common testing situations: measuring all examinees from a normal distribution of the trait to a desired level of precision, and measuring a range of a trait to a desired level of precision.

Basic Concepts
If an examinee's θ is known, the optimal test should contain a set of items that provide the required information at that θ. The information from an item covers a range of θ, so items that are optimal for one person supply some information for other persons. The general approach is to randomly select persons from the target population and then select the optimal items for each person. For each additional person, select only the additional items needed to reach the information target.
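The point that one item supplies information over a range of θ can be illustrated with the Rasch item information function. The following is a minimal sketch, not the author's code; the scaling constant D = 1.7 and the function names are assumptions that do not appear in the slides.

```python
import math

D = 1.7  # assumed logistic scaling constant; not stated in the slides


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def item_information(theta, b):
    """Fisher information of one Rasch item at trait level theta."""
    p = rasch_prob(theta, b)
    return D * D * p * (1.0 - p)


# Information peaks where b equals theta and decays on either side,
# so an item chosen for one examinee still helps nearby examinees.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(item_information(theta, b=0.0), 3))
```

Under these assumptions an item is most informative for an examinee whose θ equals its difficulty b, which is why the optimal item for a known θ has b = θ.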

Example
Suppose the target examinee population is N(0,1). Randomly select an examinee. The information equivalent to a reliability of .90 is 10. Select items until an information of 10 is reached, assuming the Rasch model (b = θ). Randomly select additional examinees. Select items for those examinees until a test length of 50 is reached.

Results -- Comments
Results are from one sample of 6 randomly selected examinees. The first examinee needed 14 items. Other examinees needed fewer additional items because their information functions overlap. The effects of sampling variation need to be considered.
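The information target of 10 and the 14 items for the first examinee can be checked by hand. The arithmetic below is a sketch under two assumptions that the slides do not state: reliability is taken as one minus the error variance on a trait scale with unit variance, and the logistic model includes the scaling constant D = 1.7. With those assumptions the numbers reproduce the reported values.

```latex
\begin{align*}
\rho &= 1 - \frac{1}{I(\theta)}
  \quad\Rightarrow\quad I(\theta) = \frac{1}{1 - 0.90} = 10 \\
I_{\text{item}}(\theta) &= D^{2}\, P(\theta)\,\bigl[1 - P(\theta)\bigr],
  \qquad I_{\text{item}}(\theta = b) = \frac{D^{2}}{4} = \frac{1.7^{2}}{4} = 0.7225 \\
n &= \left\lceil \frac{10}{0.7225} \right\rceil = \lceil 13.84 \rceil = 14
\end{align*}
```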

Information from One Item

Results – Selected Items

Results – Information Function

The Complete Process
Create the ideal set of items for a sample. Replicate the process many times (500 seems to work well). Average the information functions from the samples. Average the number of items in 0.2-unit bins to determine the difficulty spread. Check the specifications against the target.

Conditions for Rasch-based Design
N(0,1) trait distribution; 50-item test; Rasch model; 500 replications; minimum information of 10.

Average Test Information

Item Difficulty Distribution

Match of Test to Target

Comments
Minimum information requirement met from to 2.3. Information accumulates to higher values in the middle of the distribution. The difficulty distribution is essentially rectangular. Test information exceeds the target because item numbers are rounded upward in many cases.

Process Can Help Select Test Length
Run the process for different test lengths. One can also consider forcing the selection of the first examinee at 0.0. What test length allows the criteria to be met?

Effect of Test Length

Results – Test Length
With increased test length, the information function widens and increases in height. A test length of 15 is too short to meet the requirements unless the test is focused at 0.0. Forcing the first examinee at 0.0 makes the information function narrower and more peaked. 75 items is the maximum number of items that makes sense for the criteria specified here.

Test Designed to Measure with Precision over a Range
Brian Junker suggested the following procedure. Select a range. Pick items at the extremes of the range. Fill in with items between the extremes to yield a flat information function. Continue until the information criterion is reached over the entire range.
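One way to turn this procedure into code is a greedy fill, sketched below. This is a hypothetical interpretation, not Junker's or the author's algorithm: the rule "add the next item where information is currently lowest", the 0.1 grid spacing, the scaling constant D = 1.7, and the target of 10 are all assumptions made for illustration.

```python
import math

D = 1.7            # assumed scaling constant (not given in the slides)
TARGET = 10.0      # assumed information criterion
LOW, HIGH = -2.0, 2.0
STEP = 0.1         # assumed grid spacing for checking flatness


def item_information(theta, b):
    p = 1.0 / (1.0 + math.exp(-D * (theta - b)))
    return D * D * p * (1.0 - p)


def design_for_range():
    """Greedy fill: start at the extremes, then patch the weakest point."""
    n_points = int(round((HIGH - LOW) / STEP)) + 1
    grid = [LOW + STEP * i for i in range(n_points)]
    info = [0.0] * len(grid)
    items = []

    def add(b):
        # Record the item and update test information on the whole grid.
        items.append(b)
        for i, theta in enumerate(grid):
            info[i] += item_information(theta, b)

    add(LOW)
    add(HIGH)
    while min(info) < TARGET:
        weakest = info.index(min(info))
        add(grid[weakest])
    return sorted(items), grid, info


items, grid, info = design_for_range()
print(len(items), round(min(info), 2), round(max(info), 2))
```

Always adding an item at the weakest grid point keeps the information function as level as this simple rule allows while it rises toward the criterion.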

Increment of Information with Each Added Item

Target Information Function for Range from -2 to 2

Items that Match Target

Specifications Counter to Traditional Specifications
Most tests have normal distributions of difficulties, so these results seem very odd compared to traditional results and need further investigation. What is the distribution of scores? What is the distribution of p-values?

Number-Correct Score Distribution

P-value Distribution

Odd Results
The distribution of scores is near normal. The distribution of p-values mirrors the b-parameter distribution. The extreme item difficulties are .08 and .92. It is surprising that these items yield a normal distribution of scores. Look at the test characteristic curve.

Test Characteristic Curve

Test Characteristic Curve
The test characteristic curve is virtually linear from -2 to 2. When the curve is linear, the form of the distribution of θ is mapped to the estimated true score scale. In this case, since the θ distribution was normal, so is the number-correct score distribution.
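The test characteristic curve is the sum of the item response probabilities, that is, the expected number-correct score at each θ. The sketch below computes it for a hypothetical item set; the 41 evenly spaced difficulties and the scaling constant D = 1.7 are assumptions for illustration and are not the items from the presentation.

```python
import math

D = 1.7  # assumed scaling constant (not given in the slides)


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def tcc(theta, difficulties):
    """Expected number-correct (true) score at theta."""
    return sum(rasch_prob(theta, b) for b in difficulties)


# Hypothetical item set: 41 difficulties evenly spaced from -2 to 2.
difficulties = [-2.0 + 0.1 * i for i in range(41)]

for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(tcc(theta, difficulties), 2))
```

Where the printed values rise in roughly equal steps, the curve is close to linear, and a normal θ distribution maps to a near-normal true score distribution over that stretch.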

Test Information Function for Test with c = .16

Items that Match Target

Conclusions
A process has been developed for designing target information functions and item difficulty distributions for tests. The process suggests that either a rectangular or a U-shaped distribution is appropriate if it is desired to measure with equal precision over a range. The number of items needed is related to the range of the scale that needs to be measured. The U-shaped item difficulty distribution works best if it is desired to recover the underlying θ distribution. The resulting specifications are quite different from those of traditional test development procedures.