Chapter 6 Norm-Referenced Reliability and Validity
Topics for Discussion: Reliability (consistency, repeatability); Validity (truthfulness); Objectivity (inter-rater reliability)
Observed, Error, and True Scores: Observed Score = True Score + Error Score, i.e., $X = T + E$
Reliability: Reliability is the proportion of observed-score variance that is true-score variance, $r_{xx'} = s_T^2 / s_X^2$.
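A minimal numeric sketch of observed = true + error and of reliability as a variance ratio. The scores are hypothetical, not the Table 6.1 data, which did not survive extraction. The error scores were chosen to be uncorrelated with the true scores, so observed variance equals true variance plus error variance exactly.

```python
from statistics import variance

# Hypothetical scores (NOT Table 6.1); errors chosen so cov(true, error) = 0
true_scores = [120, 125, 130, 135, 140]
error_scores = [1, -2, 0, 2, -1]

# Classical test theory: Observed = True + Error
observed_scores = [t + e for t, e in zip(true_scores, error_scores)]

var_true = variance(true_scores)          # sample variance (n - 1)
var_error = variance(error_scores)
var_observed = variance(observed_scores)

# Reliability = proportion of observed variance that is true-score variance
reliability = var_true / var_observed

print(var_observed, var_true + var_error)  # equal because errors are uncorrelated with true scores
print(round(reliability, 3))               # 0.962
```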
Table 6.1 Systolic Blood Pressure Recordings for 10 Subjects. Columns: Subject | Observed BP = True BP + Error BP. Summary rows: Sum (Σ), Mean (M), Std. Dev. (s), Variance (s²). Variance row: observed variance 133.6 = true variance + error variance.
Interclass Reliability: Pearson product-moment correlation. Test-retest: Trial 1 vs. Trial 2. Equivalence: Form A vs. Form B. Split halves: Odd vs. Even.
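A minimal sketch of the Pearson product-moment correlation used for interclass reliability. The same function serves test-retest (Trial 1 vs. Trial 2), equivalence (Form A vs. Form B), and split-halves (Odd vs. Even) designs. The function name `pearson_r` and the scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # sum of squares, x
    syy = sum((b - my) ** 2 for b in y)                   # sum of squares, y
    return sxy / sqrt(sxx * syy)

# Hypothetical test-retest scores for 5 people (NOT Table 6.2)
trial_1 = [30, 32, 34, 36, 38]
trial_2 = [31, 33, 34, 33, 34]
r_xx = pearson_r(trial_1, trial_2)
print(round(r_xx, 3))
```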
Table 6.2 Sit-up Performance for 10 Subjects. Columns: Subject, Trial 1, Trial 2. Summary rows: Sum (Σ), Mean (M), Std. Dev. (s), Variance (s²). $r_{xx'} = .927$
Spearman-Brown Prophecy Formula: $r_{kk} = \frac{k\,r_{11}}{1 + (k - 1)\,r_{11}}$, where $r_{11}$ is the reliability you HAVE and k = (number of items you WANT to estimate the reliability for) ÷ (number of items you HAVE the reliability for)
Table 6.3 Odd-Even Scores for 10 Subjects. Columns: Subject, Odd, Even. Summary rows: Sum (Σ): Odd = 92, Even = 86; Mean (M); Std. Dev. (s); Variance (s²). $r_{xx'} = .639$
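Applying the Spearman-Brown formula, r_kk = k·r / (1 + (k − 1)·r), to the odd-even correlation of .639 from Table 6.3: each half is one-half the length of the full test, so k = 2. The helper name `spearman_brown` is hypothetical.

```python
def spearman_brown(r, k):
    """Estimated reliability r_kk when test length is multiplied by k."""
    return (k * r) / (1 + (k - 1) * r)

# Odd-even (half-test) reliability from Table 6.3
r_half = 0.639
# The full test is twice as long as either half, so k = 2
r_full = spearman_brown(r_half, 2)
print(round(r_full, 3))  # 0.78
```

The stepped-up value (about .78) estimates the reliability of the full-length test, which is why a split-half correlation is normally corrected this way before it is reported.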
Table 6.4 Values of $r_{kk}$ From the Spearman-Brown Prophecy Formula (rows: original reliability r; columns: k, the change in test length)
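A sketch of how a table like Table 6.4 can be generated. The grid of r and k values below is hypothetical and is not a reproduction of the printed table; values of k below 1 show the effect of shortening a test.

```python
def spearman_brown(r, k):
    """Estimated reliability when test length is multiplied by k."""
    return (k * r) / (1 + (k - 1) * r)

# Hypothetical grid, NOT the printed Table 6.4
reliabilities = [0.50, 0.60, 0.70, 0.80, 0.90]
lengths = [0.5, 2, 3, 4]  # k < 1 means a shorter test

# table[r] holds r_kk for each k in `lengths`
table = {r: [spearman_brown(r, k) for k in lengths] for r in reliabilities}

print("r     " + "".join(f"k={k:<7}" for k in lengths))
for r, row in table.items():
    print(f"{r:.2f}  " + "".join(f"{value:<9.2f}" for value in row))
```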
Table 6.5 Effect of a Constant Change in Measures. Columns: Subject, Trial 1, Trial 2. Summary rows: Sum (Σ), Mean (M), Std. Dev. (s), Variance (s²). $r_{xx'} = 1.00$
Intraclass Reliability: ANOVA model; Cronbach's alpha coefficient
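A sketch of one common ANOVA formulation of intraclass reliability, Hoyt's subjects × trials analysis: R = (MS_subjects − MS_residual) / MS_subjects. Whether the chapter's ANOVA model uses this residual term or a one-way within-subjects term is an assumption here; the function name and data are hypothetical.

```python
def anova_intraclass(scores):
    """Intraclass R from a two-way subjects x trials ANOVA (Hoyt's method).
    `scores`: one row per subject, one column per trial."""
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of trials
    grand = sum(sum(row) for row in scores) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    trial_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)
    ss_trials = n * sum((m - grand) ** 2 for m in trial_means)
    ss_residual = ss_total - ss_subjects - ss_trials

    ms_subjects = ss_subjects / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_subjects - ms_residual) / ms_subjects

# Hypothetical scores: 4 subjects x 3 trials
scores = [
    [10, 11, 12],
    [8, 9, 8],
    [12, 12, 13],
    [6, 7, 7],
]
print(round(anova_intraclass(scores), 3))
```

With the residual term as the error, this R equals Cronbach's alpha for the same data (about .984 here).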
Intraclass (ANOVA) Reliabilities. Common terms you will encounter: alpha reliability; Kuder-Richardson Formula 20 (KR-20); Kuder-Richardson Formula 21 (KR-21); ANOVA reliabilities
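KR-20 and KR-21 are special cases of alpha for items scored right (1) or wrong (0). The sketch below assumes population variances throughout (p·q is the population variance of a 0/1 item); some texts use the sample variance for the total score, which changes results slightly. Helper names and data are hypothetical.

```python
from statistics import mean, pvariance

def kr20(items):
    """KR-20 for 0/1 items; `items` has one row per examinee."""
    n = len(items)
    k = len(items[0])
    totals = [sum(row) for row in items]
    # p = proportion correct on item j, q = 1 - p
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in items) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

def kr21(items):
    """KR-21: assumes equal item difficulty; needs only k, the mean, and the variance."""
    k = len(items[0])
    totals = [sum(row) for row in items]
    m = mean(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * pvariance(totals)))

# Hypothetical responses: 5 examinees x 4 items
items = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(kr20(items), 3), round(kr21(items), 3))
```

KR-21 comes out lower than KR-20 here because these items differ in difficulty, which violates KR-21's equal-difficulty assumption.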
Table 6.6 Calculating the Alpha Coefficient. Columns: Subject, Trial 1, Trial 2, Trial 3, Total. Summary rows: ΣX, ΣX², s².
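Calculating the Alpha Coefficient: $\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum s_j^2}{s_X^2}\right)$, where k is the number of trials, $s_j^2$ is the variance of trial j, and $s_X^2$ is the variance of the total scores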
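A minimal sketch of Cronbach's alpha, α = k/(k − 1) × (1 − Σ trial variances / total-score variance). It uses sample variances; because the n − 1 cancels in the ratio, population variances give the same alpha. The data are hypothetical, not the Table 6.6 scores.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` has one row per subject, one column per trial."""
    k = len(scores[0])
    trial_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(trial_vars) / total_var)

# Hypothetical scores: 4 subjects x 3 trials
scores = [
    [10, 11, 12],
    [8, 9, 8],
    [12, 12, 13],
    [6, 7, 7],
]
print(round(cronbach_alpha(scores), 3))
```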
Index of Reliability: The theoretical correlation between observed scores and true scores, $r_{xt} = \sqrt{r_{xx'}}$
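The index of reliability is the square root of the reliability coefficient. A minimal sketch applying it to the two coefficients reported earlier in the chapter (.927 for sit-ups, .639 odd-even); the helper name is hypothetical.

```python
from math import sqrt

def index_of_reliability(r_xx):
    """Theoretical correlation between observed and true scores."""
    return sqrt(r_xx)

# Reliability coefficients from Tables 6.2 and 6.3
for label, r_xx in [("Sit-up test-retest", 0.927), ("Odd-even", 0.639)]:
    print(label, round(index_of_reliability(r_xx), 3))
```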
Standard Error of Measurement: Reflects the degree to which a person's observed score fluctuates as a result of errors of measurement; $s_e = s\sqrt{1 - r_{xx'}}$
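The SEM is commonly computed as s_e = s·√(1 − r_xx'), where s is the standard deviation of observed scores. A sketch with hypothetical values; the ±1.96 SEM band around an observed score is an approximate 95% band that assumes normally distributed measurement error.

```python
from math import sqrt

def sem(sd, r_xx):
    """Standard error of measurement."""
    return sd * sqrt(1 - r_xx)

# Hypothetical values: test SD of 10 points, reliability of .91
s_e = sem(10, 0.91)
observed = 75  # hypothetical observed score

# Approximate 95% band, assuming normally distributed errors
low, high = observed - 1.96 * s_e, observed + 1.96 * s_e
print(round(s_e, 2), round(low, 1), round(high, 1))
```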
Factors Affecting Test Reliability
1) Fatigue
2) Practice
3) Subject variability
4) Time between testing
5) Circumstances surrounding the testing periods
6) Appropriate difficulty for testing subjects
7) Precision of measurement
8) Environmental conditions
Figure: Decline in Reliability for the Harvard Alumni Activity Survey as the Time Between Testing Periods Increases (x-axis: months between test and retest)
Validity Types: content-related validity; criterion-related validity (statistical or correlational), either concurrent or predictive; construct-related validity
Standard Error of Estimate (also called the standard error or the standard error of prediction): $s_{y \cdot x} = s_y\sqrt{1 - r_{xy}^2}$
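The standard error of estimate is commonly computed as s_y·√(1 − r_xy²), where s_y is the standard deviation of the criterion and r_xy is the validity coefficient. A sketch with hypothetical values; the helper name `see` is hypothetical.

```python
from math import sqrt

def see(sd_y, r_xy):
    """Standard error of estimate for predicting criterion y from test x."""
    return sd_y * sqrt(1 - r_xy ** 2)

# Hypothetical: criterion SD of 5.0, validity coefficient of .80
print(round(see(5.0, 0.80), 2))
```

At r_xy = 0 the SEE equals the criterion SD (the test predicts nothing); at r_xy = 1 it drops to zero.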
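Standard Errors: the SE of measurement is tied to reliability (how much observed scores fluctuate around true scores); the SE of estimate is tied to criterion-related validity (how accurately the test predicts the criterion)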
Methods of Obtaining a Criterion Measure: actual participation (e.g., golf, archery); perform the criterion, using a known valid criterion (e.g., treadmill performance); expert judges (panel of judges); tournament participation (round robin); known valid test
Table 6.7 Correlation Matrix for Development of a Golf Skills Test (From Green et al., 1987). Variables: playing golf, long putt, chip shot, pitch shot, middle distance shot, drive shot (diagonal entries = 1.00). What are the correlations of each skills test with playing golf? Concurrent validity coefficients.
Table 6.8 Concurrent Validity Coefficients for Golf Test
2-item battery (middle distance shot, pitch shot): .72
3-item battery (middle distance shot, pitch shot, long putt): .76
4-item battery (middle distance shot, pitch shot, long putt, chip shot): .77
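Squaring a validity coefficient gives the proportion of criterion variance the battery explains; the remainder is the error (unexplained) variance. A sketch using the Table 6.8 coefficients, assuming each is the multiple correlation (R) between the battery and golf performance.

```python
# Concurrent validity coefficients from Table 6.8 (assumed to be multiple R values)
batteries = {
    "2-item": 0.72,
    "3-item": 0.76,
    "4-item": 0.77,
}

# Proportion of criterion variance explained by each battery
explained = {name: r ** 2 for name, r in batteries.items()}

for name, r in batteries.items():
    unexplained = 1 - explained[name]  # error / unexplained variance
    print(f"{name}: R = {r:.2f}, explained = {explained[name]:.3f}, "
          f"unexplained = {unexplained:.3f}")
```

Going from three to four items raises explained variance by only about 1.5 percentage points, one reason a shorter battery is often preferred.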
Figure 6.1 Diagram of Validity and Reliability Terms
Interpreting the “r” You Obtain
Various Correlations. Variables: Actual Golf Score (criterion); Putting Test Version A (Trial 1, Trial 2); Driving Test Version A (Trial 1, Trial 2); Swing Form Test Version A (Rating 1, Rating 2). Every diagonal entry is 1.00. Correlations labeled in the matrix, by row:
Putting Test Version A (Trial 1) with the criterion: validity coefficient ($r_{XY}$)
Putting Test Version A (Trial 2) with Trial 1: reliability coefficient ($r_{XX'}$)
Driving Test Version A (Trial 1) row: Pearson product-moment correlation coefficients (r)
Driving Test Version A (Trial 2) with Trial 1: reliability coefficient ($r_{XX'}$)
Swing Form Test Version A (Rating 1) row: Pearson product-moment correlation coefficients ($r_{XY}$)
Swing Form Test Version A (Rating 2) with Rating 1: objectivity coefficient ($r_{XX'}$)
Interpret These Correlations (variables: actual golf score, Putting Trials 1 and 2, Driving Trials 1 and 2, Observers 1 and 2). What are these (each test's correlation with actual golf score, the criterion)? Concurrent validity coefficients
Interpret These Correlations (same variables). What are these (Trial 1 vs. Trial 2 for putting and for driving)? Reliability coefficients
Interpret These Correlations (same variables). What is this (Observer 1 vs. Observer 2)? Objectivity coefficient
Scatterplot: Two Trials of Leg Press, showing the prediction line and the line of identity
Correlation: Two Trials of Leg Press
Concurrent Validity This square represents variance in performance in a skill (e.g., golf)
Concurrent Validity The different colors and patterns represent different parts of a skills test battery to measure the criterion (e.g., golf)
Concurrent Validity: The orange color represents ERROR, or unexplained variance in the criterion (e.g., golf)
Concurrent Validity: Consider the concurrent validity of the four possible skills test batteries shown (A, B, C, D)
Concurrent Validity: Which test battery would you be LEAST likely to use? Why? D: it has the MOST error and requires 4 tests to be administered
Concurrent Validity: Which test battery would you be MOST likely to use? Why? C: it has the LEAST error, but it requires 3 tests to be administered
Concurrent Validity: Which test battery would you use if you are limited in time? A or B: they require only 1 or 2 tests to be administered, but you lose some validity
PASW Examples