A comparison of K-fold and leave-one-out cross-validation of empirical keys
Alan D. Mead, IIT (mead@iit.edu)
What is "Keying"?
- Many selection tests do not have demonstrably correct answers: biodata, SJTs, some simulations, etc.
- Keying is the construction of a valid scoring key.
- What the "best" people answered is probably "correct".
- Most approaches use a correlation or something similar.
Correlation approach
- Create a 1-0 indicator variable for each response option.
- Correlate each indicator with a criterion (e.g., job performance).
- If r > .01, key weight = 1; if r < -.01, key weight = -1; else key weight = 0.
- Little validity is lost by using a 1/0/-1 key.
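The correlation approach above can be sketched in Python. This is a minimal illustration, not the original code; the function names are invented, and the ±.01 threshold default follows the slide.

```python
import numpy as np

def empirical_key(responses, criterion, thr=0.01):
    """Build a 1/0/-1 empirical key: one weight per (item, option) pair.

    responses: (n_people, n_items) integer array of chosen options
    criterion: (n_people,) array, e.g. job performance ratings
    """
    key = {}
    n_people, n_items = responses.shape
    for item in range(n_items):
        for opt in np.unique(responses[:, item]):
            # 1-0 indicator: did this person choose this option?
            ind = (responses[:, item] == opt).astype(float)
            r = 0.0 if ind.std() == 0 else np.corrcoef(ind, criterion)[0, 1]
            key[(item, int(opt))] = 1 if r > thr else (-1 if r < -thr else 0)
    return key

def apply_key(responses, key):
    """Score each person: sum the keyed weights of their chosen options."""
    return np.array([sum(key.get((i, int(row[i])), 0) for i in range(row.size))
                     for row in responses])
```

Note that scoring the calibration sample with its own key is exactly the "charge ahead" trap the following slides warn about.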
How valid is my key?
- Now that I have a key, I want to compute a validity...
- But I based my key on the responses of my "best" test-takers.
- Can/should I compute a validity in this same sample? No! Cureton (1967) showed that very high validities result even for invalid keys.
- What shall I do?
Validation approaches
- Charge ahead! "Sure, .60 is an over-estimate; there will be shrinkage. But even half that would still be substantial."
- Split the sample into "calibration" and "cross-validation" samples. Fine if you have a large N...
- Resample.
LOOCV procedure
- Leave-one-out cross-validation (LOOCV) resembles Tukey's jackknife resampling procedure.
- Hold out person 1, compute a key on the remaining N-1 people, and score the held-out person.
- Repeat with person 2, 3, 4, ...
- This produces N scores that do not capitalize on chance.
- Correlate the N scores with the criterion (but use the total-sample key for operational scoring).
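The LOOCV loop can be sketched as below. The sketch is self-contained, so it repeats compact keying helpers; all names are illustrative, and the ±.01 threshold follows the earlier slide.

```python
import numpy as np

def make_key(responses, criterion, thr=0.01):
    # 1/0/-1 key from indicator-criterion correlations (compact sketch)
    key = {}
    for item in range(responses.shape[1]):
        for opt in np.unique(responses[:, item]):
            ind = (responses[:, item] == opt).astype(float)
            r = 0.0 if ind.std() == 0 else np.corrcoef(ind, criterion)[0, 1]
            key[(item, int(opt))] = 1 if r > thr else (-1 if r < -thr else 0)
    return key

def apply_key(responses, key):
    return np.array([sum(key.get((i, int(row[i])), 0) for i in range(row.size))
                     for row in responses])

def loocv_validity(responses, criterion):
    """Key N-1 people, score the held-out person, repeat for all N.

    The N held-out scores never see their own data, so correlating
    them with the criterion does not capitalize on chance.
    """
    n = responses.shape[0]
    held = np.empty(n)
    for p in range(n):
        mask = np.arange(n) != p
        key = make_key(responses[mask], criterion[mask])
        held[p] = apply_key(responses[p:p + 1], key)[0]
    if held.std() == 0:
        return 0.0  # degenerate case: all held-out scores identical
    return np.corrcoef(held, criterion)[0, 1]
```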
Mead & Drasgow (2003)
- Simulated test responses and a criterion.
- Three approaches: charge ahead, LOOCV, and true cross-validation.
- Varying sample sizes: N = 50, 100, 200, 500, 1000.
LOOCV Results
LOOCV conclusions
- LOOCV was much better than simply "charging ahead".
- But it was consistently slightly worse than actual cross-validation.
- LOOCV has a large standard error.
- An elbow appeared at N = 200.
Subsequent work
- Empirical keying worked much better than rational keying; specifically, rational keys had to be very good to beat or aid empirical keying.
- Samples of N = 500+ would be ideal: split into calibration and cross-validation samples.
- Otherwise, LOOCV is a good choice.
K-fold keying
- LOOCV is like using cross-validation samples of N = 1.
- Instead, break the sample into k groups, e.g., N = 200 and k = 10.
- Compute the key 10 times: each calibration sample has N = 190, each cross-validation sample has N = 10.
- Held-out scores do not capitalize on chance.
- Potentially much more stable results than LOOCV.
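The k-fold variant generalizes the LOOCV loop: each fold is scored with a key built from the other k-1 folds, and setting k = N recovers LOOCV. Again a self-contained sketch with illustrative names (helpers repeated so it runs standalone).

```python
import numpy as np

def make_key(responses, criterion, thr=0.01):
    # 1/0/-1 key from indicator-criterion correlations (compact sketch)
    key = {}
    for item in range(responses.shape[1]):
        for opt in np.unique(responses[:, item]):
            ind = (responses[:, item] == opt).astype(float)
            r = 0.0 if ind.std() == 0 else np.corrcoef(ind, criterion)[0, 1]
            key[(item, int(opt))] = 1 if r > thr else (-1 if r < -thr else 0)
    return key

def apply_key(responses, key):
    return np.array([sum(key.get((i, int(row[i])), 0) for i in range(row.size))
                     for row in responses])

def kfold_validity(responses, criterion, k, seed=0):
    """Score each fold with a key built from the other k-1 folds,
    then correlate all held-out scores with the criterion."""
    n = responses.shape[0]
    order = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(order, k)
    held = np.empty(n)
    for fold in folds:
        mask = np.ones(n, dtype=bool)
        mask[fold] = False  # calibration sample = everyone outside the fold
        key = make_key(responses[mask], criterion[mask])
        held[fold] = apply_key(responses[fold], key)
    if held.std() == 0:
        return 0.0  # degenerate case: all held-out scores identical
    return np.corrcoef(held, criterion)[0, 1]
```

With k = 10 each key is built 10 times on 90% of the sample, so the keys (and hence the validity estimate) should be more stable than LOOCV's N single-person hold-outs.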
Present study
- Simulation study.
- Four levels of sample size: N = 50, 100, 200, 500.
- Several levels of k: k = 2, 5, 10, 20, 50, 100, 200, 500 (k = 2 is double cross-validation; only k ≤ N applies).
- True validity = .40.
- 35-item test with four response options per item.
Main Effect of Sample Size

  N      Validity   True Validity
  50     .27 (.19)  .40 (.11)
  100    .34 (.13)  .40 (.08)
  200    .36 (.06)  .40 (.06)
  500    .36 (.04)  .40 (.04)
  Total  .34 (.12)  .40 (.07)

Note: Mean (standard error)
Effect of k, N = 50

  k      Validity   True Validity
  2      .21 (.20)  .36 (.13)
  5      .29 (.23)  .41 (.13)
  10     .25 (.19)  .40 (.10)
  20     .32 (.17)  .46 (.09)
  50     .26 (.17)  .39 (.10)
  Total  .27 (.19)  .40 (.11)
Effect of k, N = 100

  k      Validity   True Validity
  2      .31 (.15)  .40 (.07)
  5      .32 (.15)  .40 (.08)
  10     .34 (.12)  .38 (.08)
  20     .38 (.09)  .41 (.08)
  50     .37 (.15)  .42 (.10)
  100    .30 (.12)  .39 (.08)
  Total  .34 (.13)  .40 (.08)
Effect of k, N = 200

  k      Validity   True Validity
  2      .32 (.07)  .40 (.06)
  5      .38 (.07)  .41 (.07)
  10     .37 (.06)  .40 (.05)
  20     .34 (.06)  .38 (.05)
  50     .39 (.04)  .42 (.05)
  100    .37 (.04)  .43 (.06)
  200    .37 (.06)  .42 (.05)
  Total  .36 (.06)  .41 (.06)
Effect of k, N = 500

  k      Validity   True Validity
  2      .34 (.06)  .38 (.05)
  5      .35 (.05)  .38 (.04)
  10     .36 (.03)  .40 (.02)
  20     .38 (.03)  .40 (.03)
  50     .37 (.04)  .41 (.03)
  100    .37 (.03)  .40 (.04)
  200    .36 (.01)  .40 (.03)
  500    .37 (.04)  .39 (.04)
  Total  .36 (.04)  .40 (.04)
Summary
- N = 50 is really too small a sample for empirical keying.
- A k that produces hold-out samples of 4-5 people seemed best: N = 100 with k = 20; N = 200 with k = 50; N = 500 with k = 100.
- Traditional double cross-validation (k = 2) was almost as good for N > 100.