Introduction to plausible values National Research Coordinators Meeting Madrid, February 2010.

Presentation transcript:


Content of presentation
- Rationale for scaling
- Rasch model and possible ability estimates
- Shortcomings of point estimates
- Drawing plausible values
- Computation of measurement error

Rationale for IRT scaling of data
- Summarises data instead of dealing with many single items
- Raw scores or percent correct are sample-dependent
- Makes equating possible and can deal with rotated test forms

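The 'Rasch model'
Models the probability of responding correctly to an item as
$$P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$
Likewise, the probability of NOT responding correctly is modelled as
$$P(X_{ni} = 0) = \frac{1}{1 + \exp(\theta_n - \delta_i)}$$
where $\theta_n$ is the proficiency of person $n$ and $\delta_i$ is the difficulty of item $i$.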
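
As a minimal sketch of the Rasch model on this slide, the two probabilities can be computed directly. The function and argument names (`theta` for proficiency, `delta` for item difficulty) are illustrative choices, not taken from the presentation.

```python
import math


def rasch_p_correct(theta, delta):
    """Probability of a correct response under the Rasch model."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))


def rasch_p_incorrect(theta, delta):
    """Probability of an incorrect response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(theta - delta))


# When proficiency equals item difficulty, both outcomes are equally likely.
p_equal = rasch_p_correct(0.5, 0.5)
```

The two probabilities always sum to one, and the probability of success depends only on the difference between proficiency and difficulty.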

IRT curves

How might we impute a reasonable proficiency value?
- Choose the proficiency that makes the score most likely
  - Maximum Likelihood Estimate
  - Weighted Likelihood Estimate
- Choose the most likely proficiency for the score
  - Empirical Bayes
- Choose a selection of likely proficiencies for the score
  - Multiple imputations (plausible values)

Maximum Likelihood vs. Raw Score

The Resulting Proficiency Distribution
(Figure: proficiency estimates for raw scores 0 to 6; axis: Proficiency on Logit Scale)

Characteristics of Maximum Likelihood Estimates (MLE)
- Unbiased at the individual level with sufficient information, BUT biased towards the ends of the ability scale
- Arbitrary treatment of perfect and zero scores required
- Discrete scale and measurement error lead to bias in population parameter estimates
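
To illustrate why maximum likelihood estimates form a discrete scale and need special treatment for perfect and zero scores, here is a minimal sketch under the Rasch model. It relies on the standard result that the MLE is the proficiency at which the expected score equals the raw score. The six item difficulties are hypothetical values chosen for illustration.

```python
import math


def expected_score(theta, deltas):
    """Expected raw score at proficiency theta under the Rasch model."""
    return sum(1.0 / (1.0 + math.exp(-(theta - d))) for d in deltas)


def mle_theta(raw_score, deltas, lo=-10.0, hi=10.0, tol=1e-8):
    """Solve expected_score(theta) == raw_score by bisection.

    Returns None for zero and perfect scores, where no finite MLE exists.
    """
    if raw_score <= 0 or raw_score >= len(deltas):
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, deltas) < raw_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical difficulties for a six-item test (logits).
item_difficulties = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
estimates = [mle_theta(r, item_difficulties) for r in range(7)]
```

Every student with the same raw score receives the same estimate, so a six-item test yields at most five distinct finite values; scores of 0 and 6 have no finite estimate.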

Characteristics of Weighted Likelihood Estimates
- Less biased than MLE
- Provide estimates for perfect and zero scores
- BUT discrete scale and measurement error lead to bias in population parameter estimates

Plausible Values
- What are plausible values?
- Why do we use them?
- How to analyse plausible values?

Purpose of educational tests
- Measure particular students (minimise measurement error of individual estimates)
- Assess populations (minimise error when generalising to the population)

Posterior distributions for test scores on 6 dichotomous items

Empirical Bayes: Expected A-Posteriori estimates (EAP)

Characteristics of EAPs
- Biased at the individual level, but population means are unbiased (NOT variances)
- Discrete scale, bias and measurement error lead to bias in population parameter estimates
- Require assumptions about the distribution of proficiency in the population
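
A minimal sketch of an EAP estimate, taken here as the mean of the posterior distribution of proficiency given a raw score. It assumes a Rasch likelihood, a standard normal population distribution, and a simple grid approximation; the item difficulties and grid are hypothetical, and operational scaling software works differently in detail.

```python
import math


def posterior_on_grid(raw_score, deltas, grid, prior_mean=0.0, prior_sd=1.0):
    """Normalised posterior weights on a grid of proficiency values."""
    weights = []
    for theta in grid:
        # Rasch log-likelihood, up to a constant that does not depend on theta.
        log_lik = raw_score * theta - sum(
            math.log1p(math.exp(theta - d)) for d in deltas
        )
        # Normal population distribution (assumed), also up to a constant.
        log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        weights.append(math.exp(log_lik + log_prior))
    total = sum(weights)
    return [w / total for w in weights]


def eap(raw_score, deltas, grid):
    """Posterior mean of proficiency for a given raw score."""
    post = posterior_on_grid(raw_score, deltas, grid)
    return sum(t * p for t, p in zip(grid, post))


item_difficulties = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]  # hypothetical, in logits
theta_grid = [-6.0 + 0.05 * k for k in range(241)]      # -6 to 6 logits
eaps = [eap(r, item_difficulties, theta_grid) for r in range(7)]
```

Unlike the MLE, the EAP is finite for zero and perfect scores, but it is still one value per raw score and is pulled towards the assumed population mean.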

Plausible Values
(Figure: plausible values for raw scores 0 to 6; axis: Proficiency on Logit Scale)

Characteristics of Plausible Values
- Not fair at the student level
- Produce unbiased population parameter estimates
  - if the assumptions of scaling are reasonable
- Require assumptions about the distribution of proficiency

Estimating percentages below benchmark with Plausible Values
(Figure label: Level One Cutpoint)
- The proportion of plausible values below the cut-point will be a superior estimator to one based on EAP, MLE or WLE values
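
A minimal sketch of estimating the share of students below a cut-point with plausible values: compute the share separately for each plausible value, then average the results. The scores and the cut-point of 400 are invented for illustration, and sampling weights are left out to keep the example short.

```python
def share_below(values, cutpoint):
    """Unweighted share of values strictly below the cut-point."""
    return sum(1 for v in values if v < cutpoint) / len(values)


def share_below_from_pvs(pv_sets, cutpoint):
    """Average the share below the cut-point across all plausible values."""
    shares = [share_below(pv, cutpoint) for pv in pv_sets]
    return sum(shares) / len(shares)


# Hypothetical data: five plausible values (rows) for four students (columns).
pv_sets = [
    [380, 420, 510, 450],
    [399, 390, 530, 470],
    [370, 410, 500, 440],
    [410, 430, 520, 395],
    [390, 405, 515, 460],
]
estimated_share = share_below_from_pvs(pv_sets, cutpoint=400)
```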

Methodology of PVs
- Mathematically computing posterior distributions around test scores
- Drawing 5 random values for each assessed individual from the posterior distribution for that individual
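
The two steps on this slide can be sketched as follows, assuming a Rasch likelihood, a standard normal population distribution without conditioning variables, and a grid approximation of the posterior. The item difficulties, the grid and the function names are hypothetical; this is an illustration of the idea, not the procedure used to produce the ICCS data files.

```python
import math
import random


def posterior_on_grid(raw_score, deltas, grid, prior_mean=0.0, prior_sd=1.0):
    """Step 1: posterior weights for a raw score on a proficiency grid."""
    weights = []
    for theta in grid:
        log_lik = raw_score * theta - sum(
            math.log1p(math.exp(theta - d)) for d in deltas
        )
        log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        weights.append(math.exp(log_lik + log_prior))
    total = sum(weights)
    return [w / total for w in weights]


def draw_plausible_values(raw_score, deltas, grid, n_draws=5, rng=None):
    """Step 2: draw random values from the individual's posterior."""
    rng = rng or random.Random()
    post = posterior_on_grid(raw_score, deltas, grid)
    return rng.choices(grid, weights=post, k=n_draws)


item_difficulties = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]  # hypothetical, in logits
theta_grid = [-6.0 + 0.05 * k for k in range(241)]
pvs = draw_plausible_values(3, item_difficulties, theta_grid,
                            rng=random.Random(2010))
```

Two students with the same raw score share a posterior but will usually receive different draws, which is why plausible values are not suitable as individual scores.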

What is conditioning?
- Assuming normal posterior distribution:
- Model sub-populations: X=0 for boy, X=1 for girl
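
The formulas from this slide are not preserved in the transcript. One common way to write a conditioning model of this kind, offered here as an assumption rather than as the slide's own notation, is a normal population distribution whose mean depends on the background variable X:

```latex
% Without conditioning: one normal distribution for the whole population
\theta_n \sim N(\mu, \sigma^2)

% With conditioning on X (X = 0 for a boy, X = 1 for a girl)
\theta_n = \mu + \beta X_n + \varepsilon_n, \qquad \varepsilon_n \sim N(0, \sigma^2)
```

Under this sketch boys have mean $\mu$ and girls have mean $\mu + \beta$, so the difference between the two sub-populations is part of the model from which plausible values are drawn.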

Conditioning Variables
- Plausible values should only be analysed with data that were included in the conditioning (otherwise, results may be biased)
- Aim: maximise the information included in the conditioning, that is, use as many variables as possible
- To reduce the number of conditioning variables, factor scores from principal component analysis were used in ICCS
- Use of classroom dummies takes between-school variation into account (no inclusion of school or teacher questionnaire data needed)

Plausible values
- A model with conditioning variables will improve the precision of the prediction of ability (population estimates ONLY)
- Conditioning provides unbiased estimates for modelled parameters
- Simulation studies comparing PVs, EAPs and WLEs show that
  - Population means are similar
  - WLEs (or MLEs) tend to overestimate variances
  - EAPs tend to underestimate variances

Calculation of measurement error
- As in TIMSS or PIRLS data files, there are five plausible values for cognitive test scales in ICCS
- Using five plausible values enables researchers to obtain estimates of the measurement error

How to analyse PVs - 1
- Estimated mean is the AVERAGE of the mean for each PV
- Sampling variance is the AVERAGE of the sampling variance for each PV

How to analyse PVs - 2
- Measurement variance computed as:
- Total standard error computed from measurement and sampling variance as:
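
The formulas on this slide did not survive in the transcript. The sketch below uses the combination rules commonly applied to plausible values, which is an assumption about what the slide showed: the measurement variance is the variance of the M per-PV estimates (divisor M - 1), and the total error variance is the sampling variance plus (1 + 1/M) times the measurement variance. The input numbers are invented.

```python
import math


def combine_pv_results(estimates, sampling_variances):
    """Combine per-PV estimates and sampling variances into one result.

    Returns (final estimate, sampling variance, measurement variance,
    total standard error).
    """
    m = len(estimates)
    final_estimate = sum(estimates) / m          # average of the M estimates
    sampling_var = sum(sampling_variances) / m   # average sampling variance
    # Variance of the per-PV estimates around their average.
    measurement_var = sum((e - final_estimate) ** 2 for e in estimates) / (m - 1)
    # Assumed combination rule: sampling + (1 + 1/M) * measurement variance.
    total_var = sampling_var + (1.0 + 1.0 / m) * measurement_var
    return final_estimate, sampling_var, measurement_var, math.sqrt(total_var)


# Hypothetical means and sampling variances, one per plausible value.
pv_means = [500.0, 502.0, 498.0, 501.0, 499.0]
pv_sampling_variances = [9.0, 9.5, 8.5, 9.2, 8.8]
mean, samp_var, meas_var, std_error = combine_pv_results(
    pv_means, pv_sampling_variances
)
```

With these inputs the measurement variance is 2.5 and the sampling variance is 9.0, so the total error variance is 9.0 + 1.2 × 2.5 = 12.0.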

How to analyse PVs - 3
The mean can be replaced by any other statistic, for instance:
- SD
- Percentile
- Correlation coefficient
- Regression coefficient
- R-square
- etc.

Steps for estimating both sampling and measurement error
- Compute the statistic for each PV for the fully weighted sample
- Compute the statistic for each PV for the 75 replicate samples
- Compute the sampling error (based on the previous steps)
- Compute the measurement error
- Combine the error variances to calculate the standard error
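
The steps above can be sketched as follows, starting from statistics that have already been computed with the full weight and with each replicate weight. Two assumptions are not stated on the slide: the sampling variance for one PV is taken as the sum of squared differences between each replicate estimate and the full-sample estimate (the jackknife repeated replication form), and the error variances are combined as sampling variance plus (1 + 1/M) times the measurement variance. The toy data use 3 replicates instead of 75 to stay readable.

```python
import math


def jackknife_sampling_variance(full_estimate, replicate_estimates):
    """Assumed JRR form: sum of squared deviations from the full estimate."""
    return sum((r - full_estimate) ** 2 for r in replicate_estimates)


def standard_error_with_pvs(full_estimates, replicate_estimates_by_pv):
    """Return (estimate, standard error) from per-PV full and replicate results."""
    m = len(full_estimates)
    # Sampling variance per PV from its replicate estimates, then averaged.
    sampling_vars = [
        jackknife_sampling_variance(full, reps)
        for full, reps in zip(full_estimates, replicate_estimates_by_pv)
    ]
    sampling_var = sum(sampling_vars) / m
    # Final estimate and measurement variance across the PVs.
    estimate = sum(full_estimates) / m
    measurement_var = sum((f - estimate) ** 2 for f in full_estimates) / (m - 1)
    # Combine both error variances into one standard error.
    total_var = sampling_var + (1.0 + 1.0 / m) * measurement_var
    return estimate, math.sqrt(total_var)


# Hypothetical results: one full-sample mean and three replicate means per PV.
full = [500.0, 502.0, 498.0, 501.0, 499.0]
replicates = [[f + 1.0, f - 1.0, f + 2.0] for f in full]
estimate, std_error = standard_error_with_pvs(full, replicates)
```

In this toy case each PV has a sampling variance of 6.0 and the measurement variance is 2.5, giving a total error variance of 9.0 and a standard error of 3.0.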

Questions or comments?