Multidimensional Adaptive Testing with Optimal Design Criteria for Item Selection Joris Mulder & Wim J. Van Der Linden 1
The choice of criterion (D-optimality, A-optimality, …) should consider the goal of testing. For different testing goals, a different optimal design criterion for item selection in MAT seems more appropriate. 2
Motivation: (1) To match the different cases of MCAT with the performance of optimal design criteria. (2) To investigate the preference of the optimality criteria for items in the pool with specific patterns of parameter values: Will the criterion for selection in an MCAT program with nuisance abilities select only items that are informative about the intentional abilities? Are there circumstances in which it also selects items that are mainly sensitive to the nuisance abilities? (3) To report some features of the Fisher information matrix and its use in adaptive testing that have hardly been noticed. 3
Response Model 4
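A sketch of the response model, assuming the multidimensional three-parameter logistic (3PL) model that the a, b and c item parameters of the simulation study point to. The symbols P_i, a_i, b_i, c_i and theta are notation assumed here, not taken from the slide:

```latex
P_i(\boldsymbol{\theta}) = c_i + (1 - c_i)\,
  \frac{\exp(\mathbf{a}_i^{\top}\boldsymbol{\theta} - b_i)}
       {1 + \exp(\mathbf{a}_i^{\top}\boldsymbol{\theta} - b_i)}
```

Here a_i holds the discrimination parameters (one per ability), b_i is the difficulty and c_i the guessing parameter of item i.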
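Fisher Information: Each element of the item information matrix shares a common factor. When the kth item is selected, the information of the candidate item is added to that of the k-1 items already administered, and the different optimality criteria are applied to this sum. 5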
Item Information Matrix in MIRT: The information matrix consists of two parts: (1) the function g and (2) the matrix aa'. The function g can be reparameterized into a one-dimensional function by substituting 6
where ||a_i|| is the Euclidean norm of a_i. The ability value is determined by solving: The results: 7
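The split of the item information matrix into a scalar function g and the matrix aa' can be sketched as follows. This is a minimal illustration that assumes the multidimensional 3PL model; the function names `m3pl_prob` and `item_information` and the example item are hypothetical.

```python
import numpy as np

def m3pl_prob(theta, a, b, c):
    """Probability of a correct response under the ASSUMED multidimensional 3PL model."""
    z = np.dot(a, theta) - b
    return c + (1.0 - c) / (1.0 + np.exp(-z))

def item_information(theta, a, b, c):
    """Item information matrix written as g * a a' (assumed 3PL form)."""
    a = np.asarray(a, dtype=float)
    p = m3pl_prob(theta, a, b, c)
    q = 1.0 - p
    # Scalar factor shared by every element of the matrix.
    g = q * (p - c) ** 2 / (p * (1.0 - c) ** 2)
    return g * np.outer(a, a)

# Hypothetical item that loads equally on both abilities.
info = item_information(theta=[0.0, 0.0], a=[0.64, 0.64], b=0.0, c=0.0)
rank = np.linalg.matrix_rank(info)
print(info)
print(rank)
```

Because aa' is an outer product, a single item's information matrix has rank one: one item alone is informative about only one direction in the ability space.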
Item selection criteria for MAT: Three cases of multidimensional testing: (1) All abilities are intentional. (2) Some abilities are intentional and others are nuisance. (3) All abilities are intentional, but the interest is only in a specific linear combination of them. 8
All Abilities Intentional: D-Optimality (Segall, 1996), which can be expressed as The criterion tends to select items with a large discrimination parameter for the ability whose current estimator has a relatively large (asymptotic) variance (minimax mechanism). 9
Items with large discrimination parameters for more than one ability are generally not informative. Consequently, the criterion of D-optimality tends to prefer items that are sensitive to a single ability over items sensitive to multiple abilities (trade-off effect). Segall (1996) proposed a Bayesian version of D-optimality for MCAT. 10
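The minimax and trade-off behavior of D-optimality can be illustrated with a small sketch. It assumes the standard definition of D-optimality (maximize the determinant of the accumulated information matrix); the helper `select_d_optimal`, the current information matrix and the candidate items are hypothetical values chosen for illustration.

```python
import numpy as np

def select_d_optimal(current_info, candidate_infos):
    """Index of the candidate that maximizes det(current_info + candidate)."""
    dets = [np.linalg.det(current_info + cand) for cand in candidate_infos]
    return int(np.argmax(dets))

# Hypothetical state: theta1 is already measured better than theta2.
current_info = np.diag([4.0, 1.0])
g = 0.25  # ASSUMED common scalar factor for every candidate
loadings = ([1.0, 0.0], [0.0, 1.0], [0.7, 0.7])
candidates = [g * np.outer(a, a) for a in loadings]

best = select_d_optimal(current_info, candidates)
print(best)
```

In this setup the selected item is the one that loads only on theta2, the ability with the larger current variance, rather than the item that loads on both abilities.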
A-Optimality: minimize the trace of the inverse of the information matrix. The resulting criterion contains the determinant of the information matrix as an important factor, so its behavior will be similar to that of D-optimality. It can easily be extended to a Bayesian version. 11
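A minimal sketch of A-optimal selection, assuming the same hypothetical setup as a two-ability test with three candidate items. For a 2 x 2 matrix M, trace(M^-1) = trace(M) / det(M), which is one way to see why the determinant appears as a factor; `select_a_optimal` and all numeric values are illustrative assumptions.

```python
import numpy as np

def select_a_optimal(current_info, candidate_infos):
    """Index of the candidate that minimizes trace((current_info + candidate)^-1)."""
    traces = [np.trace(np.linalg.inv(current_info + cand)) for cand in candidate_infos]
    return int(np.argmin(traces))

# Hypothetical state and candidates (not taken from the slides).
current_info = np.diag([4.0, 1.0])
g = 0.25  # ASSUMED common scalar factor
candidates = [g * np.outer(a, a) for a in ([1.0, 0.0], [0.0, 1.0], [0.7, 0.7])]

best = select_a_optimal(current_info, candidates)

# 2 x 2 identity: trace of the inverse equals trace divided by determinant.
m = current_info + candidates[2]
lhs = np.trace(np.linalg.inv(m))
rhs = np.trace(m) / np.linalg.det(m)
print(best, lhs, rhs)
```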
E-Optimality: maximize the smallest eigenvalue of the information matrix. The criterion may behave unfavorably because the contribution of an item with equal discrimination parameters to the test information vanishes when the sampling variances of the ability estimators have become equal to each other. This contradicts the fundamental rule that the average sampling variance of the ability estimators should always decrease after a new observation. Using E-optimality for item selection in MCAT may occasionally result in bad item selection, and its use is not recommended. 12
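The vanishing contribution can be checked numerically. The sketch below assumes a hypothetical state in which both abilities have equal information and a candidate item with equal discrimination parameters; `smallest_eigenvalue` is an illustrative helper.

```python
import numpy as np

def smallest_eigenvalue(info):
    """Smallest eigenvalue of a symmetric information matrix."""
    return float(np.linalg.eigvalsh(info)[0])  # eigvalsh returns ascending order

# Hypothetical state: equal sampling variances for both abilities.
current_info = np.diag([2.0, 2.0])
# Candidate with equal discrimination parameters; 0.25 is an ASSUMED scalar factor.
item_info = 0.25 * np.outer([1.0, 1.0], [1.0, 1.0])

before = smallest_eigenvalue(current_info)
after = smallest_eigenvalue(current_info + item_info)

# Sum of the sampling variances (trace of the inverse) before and after the item.
var_before = np.trace(np.linalg.inv(current_info))
var_after = np.trace(np.linalg.inv(current_info + item_info))
print(before, after, var_before, var_after)
```

The smallest eigenvalue does not change, so the E-criterion sees no value in the item, even though the summed sampling variance does decrease.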
Graphical Example 13
[Figure: Item 1, a = (0.5, 0), shown in color; Item 2, a = (0.64, 0.64), shown in black and white] 14
Nuisance Abilities 15
Both Ds-optimality and As-optimality generally select items that highly discriminate with respect to the intentional ability. However, when the amount of information about the nuisance abilities is relatively low (that is, the determinant of the information matrix for the nuisance abilities is small), an item that highly discriminates with respect to the nuisance abilities is often preferred. 16
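A sketch of both behaviors, assuming Ds-optimality is implemented as minimizing the determinant of the block of the inverse information matrix that belongs to the intentional abilities. The helper `select_ds_optimal` and both information states are hypothetical.

```python
import numpy as np

def select_ds_optimal(current_info, candidate_infos, intentional):
    """Index of the candidate minimizing det of the intentional block of the inverse."""
    idx = np.ix_(intentional, intentional)
    values = [
        np.linalg.det(np.linalg.inv(current_info + cand)[idx])
        for cand in candidate_infos
    ]
    return int(np.argmin(values))

g = 0.25  # ASSUMED common scalar factor
# Candidate 0 measures only theta1 (intentional), candidate 1 only theta2 (nuisance).
candidates = [g * np.outer(a, a) for a in ([1.0, 0.0], [0.0, 1.0])]

# Hypothetical state 1: no confounding between the two abilities.
well_separated = np.diag([1.0, 4.0])
# Hypothetical state 2: little information about theta2 and strong confounding.
confounded = np.array([[4.0, 1.9], [1.9, 1.0]])

pick_separated = select_ds_optimal(well_separated, candidates, intentional=[0])
pick_confounded = select_ds_optimal(confounded, candidates, intentional=[0])
print(pick_separated, pick_confounded)
```

In the first state the item measuring the intentional ability is chosen; in the second, the item measuring the nuisance ability reduces the variance of the theta1 estimator more and is chosen instead.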
Composite ability: C-optimality prefers items with discrimination parameters that reflect the weights of importance in the composite ability. Thus, items whose discrimination parameters are proportional to the weights are generally more informative. 17
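The preference for items whose discrimination parameters mirror the weights can be sketched as below. It assumes c-optimality is implemented as minimizing the asymptotic variance lambda' I^-1 lambda of the composite; `select_c_optimal`, the current information matrix and the unit-length candidate items are hypothetical.

```python
import numpy as np

def select_c_optimal(current_info, candidate_infos, weights):
    """Index of the candidate minimizing lambda' (current_info + candidate)^-1 lambda."""
    lam = np.asarray(weights, dtype=float)
    variances = [
        lam @ np.linalg.inv(current_info + cand) @ lam for cand in candidate_infos
    ]
    return int(np.argmin(variances))

# Hypothetical state with equal information on both abilities.
current_info = np.diag([2.0, 2.0])
g = 0.25  # ASSUMED common scalar factor
# Four candidates with the same Euclidean norm but different directions.
loadings = [
    np.array([1.0, 0.0]),
    np.array([0.0, 1.0]),
    np.array([1.0, 1.0]) / np.sqrt(2.0),
    np.array([3.0, 1.0]) / np.sqrt(10.0),
]
candidates = [g * np.outer(a, a) for a in loadings]

pick_equal = select_c_optimal(current_info, candidates, weights=[0.5, 0.5])
pick_unequal = select_c_optimal(current_info, candidates, weights=[0.75, 0.25])
print(pick_equal, pick_unequal)
```

With weights (1/2, 1/2) the item with equal loadings is selected; with weights (3/4, 1/4) the item with loadings in the ratio 3:1 is selected.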
Simulation Study: Two-dimensional MCAT. Item pool: 200 items generated from a1~N(1,0.3), a2~N(1,0.3), b~N(0,3) and 10c~Bin(3,0.5). Stopping rule: 30 items. For each combination of theta1 = -1, 0, 1 and theta2 = -1, 0, 1, a total of 100 adaptive test administrations were simulated. Bias and MSE were compared between the different optimality criteria. Random selection served as the baseline. 19
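A sketch of how an item pool with these distributions could be drawn. The slide does not say whether the second argument of N(., .) is a variance or a standard deviation; the code below assumes a variance, and the seed and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
n_items = 200

# ASSUMPTION: the second argument of N(., .) is read as a variance, hence the sqrt.
a = rng.normal(loc=1.0, scale=np.sqrt(0.3), size=(n_items, 2))  # a1 and a2
b = rng.normal(loc=0.0, scale=np.sqrt(3.0), size=n_items)       # difficulty
c = rng.binomial(n=3, p=0.5, size=n_items) / 10.0               # 10c ~ Bin(3, 0.5)

# The nine true ability points at which administrations are simulated.
theta_grid = [(t1, t2) for t1 in (-1, 0, 1) for t2 in (-1, 0, 1)]
n_replications = 100  # administrations per ability point
test_length = 30      # stopping rule

print(a.shape, b.shape, sorted(set(c.tolist())), len(theta_grid))
```

Under this reading the guessing parameters take values in {0, 0.1, 0.2, 0.3}.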
Theta1 and theta2 intentional: A-optimality and D-optimality resulted in more accurate ability estimation than E-optimality (which is even worse than random selection, R). 21
Theta1 intentional and theta2 a nuisance: Ds-optimality selects items that minimize the asymptotic variance of the estimator of the intentional theta1 (the MSE of theta1 is smaller than when theta1 and theta2 are both intentional). However, the MSE for theta2 is much larger. 22
Composite ability: With equal weights, C-optimality with weights (1/2, 1/2) yielded the highest accuracy for the composite ability but a larger MSE for the separate abilities. 23
With unequal weights (3/4, 1/4), Ds-optimality, which corresponds to c(1,0), performed similarly to c(3/4, 1/4). 24
Average Values of Optimality Criteria: Except for E-optimality, each criterion produced the smallest average value for the specific quantity that it optimizes. 25
Conclusions: When all abilities are intentional, both A-optimality and D-optimality result in the most accurate estimation of the separate abilities. The most informative items measure mainly one ability. Both criteria tend to "minimax". When one of the abilities is intentional and the others are nuisance, item selection based on Ds-optimality (or As-optimality) results in the most accurate estimates of the intentional ability. Items that measure only the intentional ability are generally most informative. When the current inaccuracy of the estimator of a nuisance ability becomes too large relative to that of the intentional abilities, an item that is sensitive to the nuisance ability will occasionally be preferred. 26
For composite abilities, c-optimality with weights lambda proportional to the coefficients in the composite ability results in the most accurate estimation of the composite ability. The criterion prefers items whose discrimination parameters are in proportion to the weights in the combination. Content control and exposure rates should be considered. CAT for ipsative tests. 27