Item Parameter Estimation: Does WinBUGS Do Better Than BILOG-MG?

Similar presentations
Introduction to Monte Carlo Markov chain (MCMC) methods

MCMC estimation in MlwiN
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 January 23, 2012.
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.
Hierarchical Linear Modeling for Detecting Cheating and Aberrance Statistical Detection of Potential Test Fraud May, 2012 Lawrence, KS William Skorupski.
Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review By Mary Kathryn Cowles and Bradley P. Carlin Presented by Yuting Qi 12/01/2006.
Bayesian Estimation in MARK
IRT Equating Kolen & Brennan, IRT If data used fit the assumptions of the IRT model and good parameter estimates are obtained, we can estimate person.
Exploring the Full-Information Bifactor Model in Vertical Scaling With Construct Shift Ying Li and Robert W. Lissitz.
The DIF-Free-Then-DIF Strategy for the Assessment of Differential Item Functioning 1.
Markov-Chain Monte Carlo
CHAPTER 16 MARKOV CHAIN MONTE CARLO
Computing the Posterior Probability The posterior probability distribution contains the complete information concerning the parameters, but need often.
Making rating curves - the Bayesian approach. Rating curves – what is wanted? A best estimate of the relationship between stage and discharge at a given.
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
The AutoSimOA Project Katy Hoad, Stewart Robinson, Ruth Davies Warwick Business School OR49 Sept 07 A 3 year, EPSRC funded project in collaboration with.
1 Bayesian inference of genome structure and application to base composition variation Nick Smith and Paul Fearnhead, University of Lancaster.
Using ranking and DCE data to value health states on the QALY scale using conventional and Bayesian methods Theresa Cain.
Equivalence margins to assess parallelism between 4PL curves
Department of Geography, Florida State University
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Identification of Misfit Item Using IRT Models Dr Muhammad Naveed Khalid.
Bayesian parameter estimation in cosmology with Population Monte Carlo By Darell Moodley (UKZN) Supervisor: Prof. K Moodley (UKZN) SKA Postgraduate conference,
Introduction to MCMC and BUGS. Computational problems More parameters -> even more parameter combinations Exact computation and grid approximation become.
Rasch trees: A new method for detecting differential item functioning in the Rasch model Carolin Strobl Julia Kopf Achim Zeileis.
1 Statistical Distribution Fitting Dr. Jason Merrick.
Finding Scientific topics August , Topic Modeling 1.A document as a probabilistic mixture of topics. 2.A topic as a probability distribution.
Investigating Faking Using a Multilevel Logistic Regression Approach to Measuring Person Fit.
IRT Model Misspecification and Metric Consequences Sora Lee Sien Deng Daniel Bolt Dept of Educational Psychology University of Wisconsin, Madison.
Center for Radiative Shock Hydrodynamics Fall 2011 Review Assessment of predictive capability Derek Bingham 1.
Module 7: Comparing Datasets and Comparing a Dataset with a Standard How different is enough?
6.1 Inference for a Single Proportion  Statistical confidence  Confidence intervals  How confidence intervals behave.
ICCS 2009 IDB Workshop, 18th February 2010, Madrid 1 Training Workshop on the ICCS 2009 database Weighting and Variance Estimation.
The ABC’s of Pattern Scoring
Assessing Estimability of Latent Class Models Using a Bayesian Estimation Approach Elizabeth S. Garrett Scott L. Zeger Johns Hopkins University Departments.
Expectation-Maximization (EM) Algorithm & Monte Carlo Sampling for Inference and Approximation.
Sample Size Determination
Summary of Bayesian Estimation in the Rasch Model H. Swaminathan and J. Gifford Journal of Educational Statistics (1982)
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Heriot Watt University 12th February 2003.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
1 Probability and Statistics Confidence Intervals.
Bayesian Modelling Harry R. Erwin, PhD School of Computing and Technology University of Sunderland.
Chapter 2: Bayesian hierarchical models in geographical genetics Manda Sayler.
Introduction: Metropolis-Hasting Sampler Purpose--To draw samples from a probability distribution There are three steps 1Propose a move from x to y 2Accept.
1 Getting started with WinBUGS Mei LU Graduate Research Assistant Dept. of Epidemiology, MD Anderson Cancer Center Some material was taken from James and.
Two Approaches to Estimation of Classification Accuracy Rate Under Item Response Theory Quinn N. Lathrop and Ying Cheng Assistant Professor Ph.D., University.
Kevin Stevenson AST 4762/5765. What is MCMC?  Random sampling algorithm  Estimates model parameters and their uncertainty  Only samples regions of.
A framework for multiple imputation & clustering -Mainly basic idea for imputation- Tokei Benkyokai 2013/10/28 T. Kawaguchi 1.
Chapter 9 Sampling Distributions 9.1 Sampling Distributions.
Chapter 4 Variability PowerPoint Lecture Slides Essentials of Statistics for the Behavioral Sciences Seventh Edition by Frederick J Gravetter and Larry.
JAGS. Learning Objectives Be able to represent ecological systems as a network of known and unknowns linked by deterministic and stochastic relationships.
IRT Equating Kolen & Brennan, 2004 & 2014 EPSY
How many iterations in the Gibbs sampler? Adrian E. Raftery and Steven Lewis (September, 1991) Duke University Machine Learning Group Presented by Iulian.
Bursts modelling Using WinBUGS Tim Watson May 2012 :diagnostics/ :transformation/ :investment planning/ :portfolio optimisation/ :investment economics/
Hierarchical Models. Conceptual: What are we talking about? – What makes a statistical model hierarchical? – How does that fit into population analysis?
Generalization Performance of Exchange Monte Carlo Method for Normal Mixture Models Kenji Nagata, Sumio Watanabe Tokyo Institute of Technology.
Markov Chain Monte Carlo in R
MCMC Stopping and Variance Estimation: Idea here is to first use multiple Chains from different initial conditions to determine a burn-in period so the.
Advanced Statistical Computing Fall 2016
Let’s continue to do a Bayesian analysis
Let’s do a Bayesian analysis
Statistical Methods For Engineers
CHAPTER 26: Inference for Regression
Chapter 12 Review Inference for Regression
Chapter 13 - Confidence Intervals - The Basics
Chapter 14 - Confidence Intervals: The Basics
Bayesian Networks in Educational Assessment
Item Analysis: Classical and Beyond
Bayesian Estimation of Toluene and Trichloroethylene Biodegradation Kinetic Parameters Feng Yu and Breda Munoz RTI International.
Presentation transcript:

Item Parameter Estimation: Does WinBUGS Do Better Than BILOG-MG? Bayesian Statistics, Fall 2009 Chunyan Liu & James Gambrell

Introduction The 3-parameter (3PL) IRT model assigns each item a logistic function with a variable lower asymptote.
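In symbols, the probability that examinee i answers item j correctly takes the standard 3PL form (consistent with the WinBUGS model code in the appendix), with discrimination a_j, difficulty b_j, lower asymptote c_j, and ability θ_i:

```latex
P(X_{ij} = 1 \mid \theta_i)
  = c_j + (1 - c_j)\,
    \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}
```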

Purpose
- Compare BILOG-MG and WinBUGS estimation of item parameters under the 3-parameter logistic (3PL) IRT model
- Investigate the effect of sample size on the estimation of item parameters

BILOG-MG (Mislevy & Bock, 1985)
- Proprietary, "black box" software; very fast estimation
- Uses undocumented estimation shortcuts and sometimes gives poor results
- Provides only point estimates and standard errors for model parameters
- Estimation method: marginal maximum likelihood via the Expectation-Maximization algorithm (Bock and Aitkin, 1981)
- Its "black box" nature makes it hard to assess what is going on when things go wrong, and hard to justify suspicious or unexpected results

WinBUGS
- Closer to open source (related to OpenBUGS) and more widely studied
- Might give more robust results; much more flexible
- Provides full posterior densities for model parameters
- More output for evaluating convergence
- Very slow estimation!

Literature Review
Most researchers have used custom-built MCMC samplers based on the Metropolis-Hastings-within-Gibbs algorithm, as recommended by Cowles (1996)!
Patz and Junker (1999a, b)
- Wrote an MCMC sampler in S-Plus
- Found that their sampler produced estimates identical to BILOG for the 2PL model, but had some trouble with 3PL models
- Found MCMC was superior at handling missing data

Literature Review
Jones and Nediak (2000)
- Developed a "commercial grade" sampler in C++, improving the Patz and Junker algorithm
- Compared MCMC results to BILOG using both real and simulated data
- Found that item parameters varied substantially, but the implied ICCs were close according to the Hellinger deviance criterion
- MCMC and BILOG were similar for real data; MCMC was superior for simulated data
- Noted that MCMC provides much more diagnostic output to assess convergence problems

Literature Review
Proctor, Teo, Hou, and Hsieh (2005 project for this class!)
- Compared BILOG to WinBUGS
- Fit a 2PL model
- Simulated only a single replication
- Did not use deviance or RMSE to assess error

Data
- Test: 36-item multiple choice
- Item parameters (a, b, and c) come from Chapter 6 of Equating, Scaling and Linking (Kolen & Brennan) and are treated as the true item parameters (see Appendix)
- Item responses were simulated using the 3PL model, where a is the slope, b the difficulty, c the guessing parameter, and θ the examinee ability

Methods
1. For each sample size N (N = 200, 500, 1000, 2000), N θ values were generated from the N(0, 1) distribution.
2. N item responses were simulated from the 3PL model using the θ's generated in step 1 and the true item parameters.
3. Item parameters (a, b, c for the 36 items) were estimated with BILOG-MG from the N item responses.
4. Item parameters were estimated with WinBUGS from the same N item responses, using the same priors as specified by BILOG-MG.
5. Steps 2 through 4 were repeated 100 times, so for each item we have 100 estimated parameter sets from both programs.
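Steps 1 and 2 can be sketched in Python with NumPy (an illustrative sketch, not the authors' code; the a, b, c values are the first three items from the appendix table):

```python
import numpy as np

rng = np.random.default_rng(7)

# True parameters for the first three appendix items
a = np.array([0.5496, 0.7891, 0.4551])   # slopes
b = np.array([-1.796, -0.4796, -0.7101]) # difficulties
c = np.array([0.1751, 0.1165, 0.2087])   # guessing parameters

N = 200                                  # examinees
theta = rng.standard_normal(N)           # step 1: theta ~ N(0, 1)

# Step 2: 3PL response probabilities, P = c + (1 - c) * logistic(a * (theta - b))
logit = a * (theta[:, None] - b)         # shape (N, n_items) via broadcasting
p = c + (1 - c) / (1 + np.exp(-logit))
resp = rng.binomial(1, p)                # 0/1 item responses

print(resp.shape)                        # (200, 3)
```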

Priors
- a[i] ~ dlnorm(0, 4)
- b[i] ~ dnorm(0, 0.25)
- c[i] ~ dbeta(5, 17)
Same priors used in BILOG and WinBUGS.
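One caveat when reading these priors: BUGS parameterizes dnorm and dlnorm by precision (1/variance), not variance. A quick Python check of what the priors mean on the familiar scale (illustrative only; precision_to_sd is a hypothetical helper, not part of any BUGS interface):

```python
import math

def precision_to_sd(tau):
    """Convert a BUGS precision parameter to a standard deviation."""
    return 1.0 / math.sqrt(tau)

# b[i] ~ dnorm(0, 0.25): normal with mean 0 and sd 2
sd_b = precision_to_sd(0.25)

# a[i] ~ dlnorm(0, 4): log(a) is normal with mean 0 and sd 0.5
sd_log_a = precision_to_sd(4)

# c[i] ~ dbeta(5, 17): prior mean 5 / (5 + 17), about 0.227, near typical guessing rates
mean_c = 5 / (5 + 17)

print(sd_b, sd_log_a, round(mean_c, 3))  # 2.0 0.5 0.227
```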

Criterion: Root Mean Square Error (RMSE)
For each item, we computed the RMSE for a, b, and c using the same formula:
RMSE(x) = sqrt( (1/100) * Σ_{r=1}^{100} ( x̂_r − x )² )
where x̂_r is the estimate from replication r, x is the true value, and x can be the parameter a, b, or c.
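The RMSE computation is simple enough to sketch directly (illustrative Python, assuming NumPy; `estimates` would hold the 100 replicate estimates of one parameter for one item):

```python
import numpy as np

def rmse(estimates, true_value):
    """RMSE of replicate estimates around the true parameter value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_value) ** 2)))

print(rmse([1.1, 0.9, 1.0, 1.2], 1.0))  # sqrt(0.015), about 0.1225
```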

Results 1. Deciding the number of burn-in iterations: history plots

Results-cont. 1. Deciding the number of burn-in iterations: autocorrelation and BGR plots
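BGR refers to the Brooks-Gelman-Rubin convergence diagnostic. WinBUGS plots an interval-width version of it, but the underlying idea, a potential scale reduction factor comparing between-chain to within-chain variance, can be sketched as follows (illustrative only, not the WinBUGS computation):

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for one parameter.

    chains: array-like of shape (m chains, n draws).
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled variance estimate
    return float(np.sqrt(var_plus / W))

# Two chains stuck in different regions give R-hat well above 1
print(gelman_rubin([[1, 2, 3, 4], [11, 12, 13, 14]]))
```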

Results-cont. 1. Deciding the number of burn-in iterations: statistics

node  mean      sd       MC error  2.5%     median    97.5%    start  sample
a[1]  0.899     0.1011   0.004938  0.7117   0.8949    1.107    2501   3500
a[2]  1.339     0.1159   0.004132  1.125    1.333     1.58     2501   3500
a[3]  0.7308    0.111    0.005769  0.551    0.717     0.9893   2501   3500
a[4]  2.012     0.2712   0.009897  1.531    1.996     2.59     2501   3500
a[5]  1.766     0.2202   0.009585  1.394    1.745     2.243    2501   3500
b[1]  -1.706    0.2944   0.01793   -2.253   -1.717    -1.1     2501   3500
b[2]  -0.4277   0.1167   0.005916  -0.6571  -0.428    -0.1857  2501   3500
b[3]  -0.7499   0.3967   0.01586   -1.409   -0.7994   0.1348   2501   3500
b[4]  0.4324    0.09295  0.004443  0.2363   0.4384    0.6008   2501   3500
b[5]  -0.05619  0.122    0.006737  -0.3127  -0.05246  0.1657   2501   3500
c[1]  0.2458    0.088    0.004718  0.09253  0.2415    0.4362   2501   3500
c[2]  0.1403    0.04745  0.002158  0.05368  0.139     0.2361   2501   3500
c[3]  0.2538    0.09285  0.005864  0.09991  0.243     0.4557   2501   3500
c[4]  0.2669    0.035    0.001491  0.1911   0.2693    0.3282   2501   3500
c[5]  0.2588    0.05029  0.002589  0.1526   0.261     0.35     2501   3500
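A common rule of thumb for these statistics (also noted on the limitations slide) is that the MC error should be under 1/20 of the posterior standard deviation. Checking two rows of the table (illustrative Python; `mc_error_ok` is a hypothetical helper):

```python
def mc_error_ok(mc_error, sd):
    """Rule of thumb: the MC error should be less than sd / 20."""
    return mc_error < sd / 20

print(mc_error_ok(0.004938, 0.1011))  # a[1]: True, just under 0.1011 / 20
print(mc_error_ok(0.01793, 0.2944))   # b[1]: False, 0.01793 > 0.01472
```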

Results-cont. 1. Running conditions for WinBUGS
- Adaptive phase: 1000 iterations
- Burn-in: 1500 iterations
- Iterations used for computing the statistics: 3500
- One chain
- WinBUGS was run from R via the bugs() function, which requires the BRugs and R2WinBUGS packages

Results-cont. 2. Effect of Sample Size

Results-cont. BILOG-MG vs. WinBUGS – a parameter

Results-cont. BILOG-MG vs. WinBUGS - b parameter

Results-cont. BILOG-MG vs. WinBUGS - c parameter

Discussion & Conclusions
- Larger sample sizes decreased RMSE for all parameters under both programs.
- For N = 200, BILOG-MG had a significant convergence problem; WinBUGS had no such problem.

Discussion & Conclusions-cont.
Slope parameter "a"
- WinBUGS was superior to BILOG when N = 500 or less
- Both programs estimated a more accurately for items without extreme a or b parameters
Difficulty parameter "b"
- BILOG was superior to WinBUGS when N = 500 or less
- Both programs had larger error for items that were either too difficult or too easy
Guessing parameter "c"
- WinBUGS was superior to BILOG at all sample sizes, especially at N = 1,000 or less
- Both programs estimated c more accurately for difficult items
- Both programs had larger error for items with shallow slopes

Limitations
- Only one chain was used in the simulation study.
- Some of the MC errors were not less than 1/20 of the standard deviation; more iterations could be used in the MCMC sampler.
- Simulated data conform to the 3PL model much more closely than real data would: no missing responses, no omit problems, and fewer low scores.

WinBUGS code for running the 3PL model

# 3PL
model {
  for (i in 1:N) {
    for (j in 1:n) {
      e[i, j] <- exp(a[j] * (theta[i] - b[j]))
      p[i, j] <- c[j] + (1 - c[j]) * (e[i, j] / (1 + e[i, j]))
      resp[i, j] ~ dbern(p[i, j])
    }
    theta[i] ~ dnorm(0, 1)
  }
  for (j in 1:n) {
    a[j] ~ dlnorm(0, 4)
    b[j] ~ dnorm(0, 0.25)
    c[j] ~ dbeta(5, 17)
  }
}

True Item Parameters

item  a       b        c
1     0.5496  -1.796   0.1751
2     0.7891  -0.4796  0.1165
3     0.4551  -0.7101  0.2087
4     1.4443  0.4833   0.2826
5     0.974   -0.168   0.2625
6     0.5839  -0.8567  0.2038
7     0.8604  0.4546   0.3224
8     1.1445  -0.1301  0.2209
9     0.7544  0.0212   0.16
10    0.917   1.0139   0.3648
11    0.9592  0.7218   0.2399
12    0.6633  0.0506   0.124
13    1.2324  0.4167   0.2535
14    1.0492  0.7882   0.1569
15    1.069   0.961    0.2986
16    0.9193  0.6099   0.2521
17    0.8935  0.5128   0.2273
18    0.9672  0.195    0.0535
19    0.6562  0.3853   0.1201
20    1.0556  0.9481   0.2036
21    0.3479  2.2768   0.1489
22    0.8432  1.0601   0.2332
23    1.1142  0.5826   0.0644
24    1.4579  1.0241   0.2453
25    0.5137  1.379    0.1427
26    0.9194  1.0782   0.0879
27    1.8811  1.4062   0.1992
28    1.5045  1.5093   0.1642
29    0.9664  1.5443   0.1431
30    0.702   2.2401   0.0853
31    1.2651  1.8759   0.2443
32    0.8567  1.714    0.0865
33    1.408   1.5556   0.0789
34    0.5808  3.4728   0.1399
35    0.9257  3.1202   0.109
36    1.2993  2.1589   0.1075

Acknowledgement Professor Katie Cowles

Questions?