Presentation on theme: "Introduction to Measurement Theory" — Presentation transcript:
1. Introduction to Measurement Theory. Liu Xiaoling, The Department of Psychology, ECNU
2. Chapter 5 Reliability. §1 Theory of Reliability. Interpretation of Reliability: Reliability refers to the degree of consistency or reproducibility of measurements (or test scores).
3. Qualified Reliability Coefficients for Types of Tests
Ability or aptitude tests, achievement tests: .90 and above
Personality, interest, value, attitude tests: .80 and above
4. EXAMPLES
Stanford-Binet Fifth Edition: full-scale IQ (23 age ranges), ; test-retest reliability coefficients for verbal and nonverbal subtests range from the high .70s to the low .90s.
WISC-IV: split-half reliability for full-scale IQ, .97.
WAIS-III: average split-half reliability is .98 for full-scale IQ, .97 for verbal IQ, .94 for performance IQ.
Thurstone's Attitude Scale.
Rosenberg's Self-Esteem Scale (1965): α ( ); test-retest, .85.
5. Errors — Inconsistent and Inaccurate Effects
Error refers to the inconsistent and inaccurate effects caused by variable factors that are unrelated to the objective of measurement.
Three types: random, systematic, sampling.
6. Random Error
An error due to chance alone, randomly distributed around the objective value.
Random errors reduce both the consistency and the accuracy of the test scores.
7. Systematic Error
An error in data that is regular and repeatable, due to improper collection or statistical treatment of the data.
Systematic errors do not result in inconsistent measurement, but they cause inaccuracy.
8. Sampling Error
Deviations of the summary values yielded by samples from the values yielded by the entire population.
9. Classical True Score Theory
Founders: Charles Spearman (1904, 1907, 1913); J. P. Guilford (1936)
One formula: X = T + E
X, an individual's observed score
T, the individual's true score
E, random error score (error of measurement)
10. CONCEPTION: True Score
CTT assumes that each person has a true score that would be obtained if there were no errors in measurement.
Interpretation: the average of all the observed scores obtained over an infinite number of repeated testings with the same test.
11. TABLE 5.1 One Measure of Data
Examinee   Observed Score (X)   True Score (T)   Error (E)
1          12                   10               2
2          19                   20               -1
3          27                   30               -3
4          41                   40               1
5          51                   50               1
Sum        150                  150              0
Variance   203.2                200              3.2
(Standard error of measurement: √3.2 ≈ 1.8)
12. Three Principles
1. The mean of the error scores for a population of examinees is zero.
2. The correlation between true and error scores for a population of examinees is zero.
3. The correlation between error scores from two independent testings is zero.
13. Reliability Coefficient
The reliability coefficient can be defined as the correlation between scores on parallel test forms.
Mathematical definition: the reliability coefficient is the ratio of true-score variance to observed-score variance:
(5.1) r_tt = S_T² / S_X²
14. Since X = T + E and true and error scores are uncorrelated, S_X² = S_T² + S_E², and
(5.2) r_tt = 1 − S_E² / S_X²
As S_E increases, r_tt decreases if S_T does not vary.
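The variance decomposition behind formulas (5.1) and (5.2) can be checked numerically against the Table 5.1 data; a minimal Python sketch:

```python
# Classical true score theory on the Table 5.1 data: X = T + E, and
# reliability as a ratio of variances, computed both ways, (5.1) and (5.2).
observed = [12, 19, 27, 41, 51]
true_sc  = [10, 20, 30, 40, 50]
error    = [x - t for x, t in zip(observed, true_sc)]   # E = X - T

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)      # population variance

s2_x, s2_t, s2_e = variance(observed), variance(true_sc), variance(error)

print(error)              # [2, -1, -3, 1, 1]; the mean error is 0 (principle 1)
print(s2_x)               # 203.2
print(s2_t, s2_e)         # 200.0 and 3.2; note 203.2 = 200.0 + 3.2
print(s2_t / s2_x)        # reliability by (5.1)
print(1 - s2_e / s2_x)    # the same value by (5.2)
```

Both ratios give r_tt ≈ .98, and the error variance 3.2 yields the standard error of measurement √3.2 ≈ 1.8 shown in the table.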
15. §2 Sources of Random Errors
Sources from tests
Sources from test administration and scoring
Sources from examinees
16. Sources from Tests
Item sampling lacks representativeness.
Item format is improper.
Item difficulty is too high or too low.
The meaning of a sentence is unclear.
The test time limit is too short.
17. Sources from Test Administration and Scoring
Test conditions are unfavorable.
The examiner affects examinees' performance.
Unexpected disturbances occur.
Scoring is not objective; counting is inaccurate.
18. Sources from Examinees
Test motivation
Negative emotions (e.g., anxiety)
Health
Learning, development, and education
Test experience
20. Test-Retest Reliability
Also called the coefficient of stability: the correlation between test scores obtained by administering the same form of a test on two separate occasions to the same examinee group.
TEST → interval → RETEST (the same examinees)
21. REVIEW: CORRELATION
Figure 5.1 Scatter Plots for Two Variates
22. Formula for estimating reliability (Pearson product-moment correlation coefficient):
(5.3) r_tt = [N ΣXY − (ΣX)(ΣY)] / √{[N ΣX² − (ΣX)²][N ΣY² − (ΣY)²]}
X, test score; Y, retest score; N, sample size
23. Application Example
A subjective well-being scale was administered to 10 high school students, and half a year later they were tested with the same scale again. Estimate the reliability of the scale.
Table 5.2 Test scores by examinee
25. Transform of formula 5.3:
(5.4) r_tt = (ΣXY / N − X̄·Ȳ) / (S_X S_Y)
X̄, mean of first test scores; Ȳ, mean of retest scores
S_X, standard deviation of first test scores; S_Y, SD of retest scores
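Formulas (5.3) and (5.4) are algebraically equivalent, which can be verified in code. The scores below are hypothetical (the actual Table 5.2 data are not reproduced in this transcript):

```python
from math import sqrt

# Test-retest reliability via the Pearson correlation, computed two ways:
# the raw-score form (5.3) and the mean/SD form (5.4). Hypothetical data.
x = [35, 42, 38, 50, 29, 44, 47, 33, 40, 36]   # first administration
y = [37, 40, 39, 48, 31, 45, 44, 35, 42, 34]   # retest, six months later
n = len(x)

# (5.3) raw-score formula
num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2) *
           (n * sum(b * b for b in y) - sum(y) ** 2))
r_raw = num / den

# (5.4) deviation form: r = (sum(XY)/N - mean_x * mean_y) / (Sx * Sy)
mx, my = sum(x) / n, sum(y) / n
sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = sqrt(sum((b - my) ** 2 for b in y) / n)
r_dev = (sum(a * b for a, b in zip(x, y)) / n - mx * my) / (sx * sy)

print(round(r_raw, 4), round(r_dev, 4))   # both forms agree
```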
26. Quality of test-retest reliability: it estimates the consistency of tests across a time interval.
Sources of error:
Stability of the trait measured
Individual differences in development, education, learning, training, memory, etc.
Unexpected disturbances during test administration
27. Alternate-Forms Reliability
Also called equivalent- or parallel-forms reliability: the correlation between the test scores obtained by separately administering alternate or equivalent forms of the test to the same examinees on one occasion.
FORM Ⅰ → immediately → FORM Ⅱ (the same examinees)
28. Application Example
Two alternate forms of a creative-ability test were administered to ten seventh-grade students in one morning. Table 5.3 shows the test results. Estimate the reliability of this test.
Table 5.3 Test forms by examinee
30. Exercise 1
Use formulas 5.3 and 5.4 independently to estimate the reliability coefficient for the data in the following table (tests A and B by examinee).
31. How to eliminate the effect of the order in which the forms are administered?
Method:
First, divide one group of examinees into two parallel groups.
Second, group one receives form Ⅰ of the test, and group two receives form Ⅱ.
Third, after a short interval, group one receives form Ⅱ, and group two receives form Ⅰ.
Finally, compute the correlation between all the examinees' scores on the two forms of the test.
32. Sources of Error
Whether the two forms of the test are parallel or equivalent: consistency of content sampling, item format, item quantity, item difficulty, and the SDs and means of the two forms.
Fluctuations in the individual examinee's state, including emotions, test motivation, health, etc.
Other unexpected disturbances.
33. Coefficient of Stability and Equivalence
The correlation between the two sets of observed scores when the two alternate test forms are administered on two separate occasions to the same examinees.
FORM Ⅰ → interval → FORM Ⅱ (the same examinees)
34. Coefficients of Internal Consistency
When examinees perform consistently across items within a test, the test is said to have item homogeneity.
An internal-consistency coefficient is an index of both item-content homogeneity and item quality.
35. Quality: requires only one administration of a single form of the test.
Error sources:
Content sampling
Fluctuations in the individual examinee's state, including emotions, test motivation, health, etc.
36. Split-Half Reliability
Procedure: to get the split-half reliability, the test developer
administers the test to a group of examinees;
then divides the items into two subtests, each half the length of the original test;
and computes the correlation between the two halves of the test.
37. Methods to divide the test into two parallel halves:
1. Assign all odd-numbered items to half 1 and all even-numbered items to half 2.
2. Rank-order the items by difficulty level based on the examinees' responses; then apply method 1.
3. Randomly assign items to the two halves.
4. Assign items to the half-test forms so that the forms are "matched" in content.
38. Table 5.4 Illustrative Data for Split-Half Reliability Estimation
(Columns: examinees 1–10; odd-half score X_o; even-half score X_e; total score X_t; item values p_i and p_i·q_i. Total-score variance S_t² = 6.0.)
39. Employ formula 5.3 to compute r_hh.
Attention: this r_hh actually gives the reliability of only a half-test. That is, it underestimates the reliability coefficient for the full-length test.
40. Employ the Spearman-Brown formula to correct r_hh:
(5.5) r_tt = 2 r_hh / (1 + r_hh)
42. Spearman-Brown general formula:
(5.6) r_nn = n·r / [1 + (n − 1) r]
r_nn is the estimated coefficient; r is the obtained coefficient; n is the number of times the test is lengthened or shortened.
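The whole split-half procedure — odd/even split, half-test correlation, Spearman-Brown correction (5.5), and its agreement with the general formula (5.6) at n = 2 — can be sketched on hypothetical 0/1 item responses (8 examinees, 6 items; not the Table 5.4 data):

```python
from math import sqrt

# Split-half reliability with the Spearman-Brown correction. Hypothetical
# dichotomous (0/1) responses: one row per examinee, one column per item.
items = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Method 1 from slide 37: odd-numbered items vs even-numbered items.
odd  = [sum(row[0::2]) for row in items]
even = [sum(row[1::2]) for row in items]

r_hh = pearson(odd, even)            # half-test reliability
r_tt = 2 * r_hh / (1 + r_hh)         # (5.5) full-length correction
r_66 = 2 * r_hh / (1 + (2 - 1) * r_hh)   # (5.6) with n = 2: same value
print(round(r_hh, 3), round(r_tt, 3))
```

Note that the corrected r_tt is always larger than r_hh, reflecting the underestimation warned about on slide 39.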
43. Kuder-Richardson Reliability
Kuder-Richardson formula 20 (KR20), for dichotomously scored items:
(5.7) r_KR20 = [K / (K − 1)] · (1 − Σ p_i q_i / S_X²)
K, the number of items; S_X², the total test variance
p_i, the proportion of examinees who pass each item
q_i, the proportion of examinees who do not pass each item
45. Coefficient Alpha (α) (Cronbach, 1951):
(5.8) α = [K / (K − 1)] · (1 − Σ S_i² / S_X²)
S_X², the total test variance; S_i², the variance of item i
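For dichotomous items, p_i·q_i is exactly the item variance, so KR20 (5.7) is the special case of coefficient alpha (5.8). A sketch on the same kind of hypothetical 0/1 data used above:

```python
# KR20 (5.7) and coefficient alpha (5.8) on hypothetical dichotomous data:
# one row per examinee, one column per item. For 0/1 items the two formulas
# coincide, because p_i * q_i equals the item variance.
rows = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
]
k = len(rows[0])                    # number of items, K
n = len(rows)                       # number of examinees

totals = [sum(r) for r in rows]
mean_t = sum(totals) / n
s2_total = sum((t - mean_t) ** 2 for t in totals) / n   # total test variance

# (5.7) KR20: item term from pass proportions, p_i * q_i
p = [sum(r[i] for r in rows) / n for i in range(k)]
kr20 = (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / s2_total)

# (5.8) alpha: general item variances (identical to p*q for 0/1 items)
item_var = [sum((r[i] - p[i]) ** 2 for r in rows) / n for i in range(k)]
alpha = (k / (k - 1)) * (1 - sum(item_var) / s2_total)

print(round(kr20, 3), round(alpha, 3))   # identical for dichotomous items
```

Alpha is the more general form: it also applies to polytomous items such as the essay items of Exercise 2, where S_i² is computed directly from the item scores.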
46. Exercise 2
Suppose that examinees have been tested on four essay items on which possible scores range from 0 to 10 points, and the item variances are , , , . If the total score variance is 100, estimate the reliability of the test.
47. Scorer Reliability (Inter-Rater Consistency)
When a sample of test items is independently scored by two or more scorers or raters, each examinee has several test scores, so there is a need to measure the consistency of the scores across different scorers.
48. Methods
1. The correlation between the two sets of scores from two scorers (Pearson correlation; Spearman rank correlation)
2. The Kendall coefficient of concordance
49. Kendall coefficient of concordance:
(5.9) W = [Σ R_i² − (Σ R_i)² / N] / [K²(N³ − N) / 12]
K, the number of scorers; N, the number of examinees
R_i, the sum of ranks for each examinee over all scorers
50. Table 5.5 Scores of 6 Essays for 6 Examinees, by Rater
Employ formula 5.9 to compute the scorer correlation. Key: .95
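Formula (5.9) can be sketched on hypothetical ranks (this does not reproduce the Table 5.5 data, whose key is .95). With identical rankings from every rater, W comes out as exactly 1, the maximum agreement:

```python
# Kendall's coefficient of concordance W (5.9), for K raters each ranking
# N examinees. Hypothetical ranks: every rater agrees perfectly, so W = 1.
ranks = [                 # one row per rater, one column per examinee
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6],
]
K = len(ranks)            # number of raters
N = len(ranks[0])         # number of examinees

R = [sum(r[i] for r in ranks) for i in range(N)]   # rank sums per examinee
S = sum(ri ** 2 for ri in R) - sum(R) ** 2 / N     # numerator of (5.9)
W = S / ((K ** 2) * (N ** 3 - N) / 12)
print(W)   # 1.0
```

Any disagreement among raters spreads the rank sums R_i more evenly, shrinking S and pulling W below 1 (W = 0 means no agreement beyond chance).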
51. Summing Up
Table 5.6 Sources of Error Variance in Relation to Reliability Coefficients
Type of Reliability Coefficient — Error Variance
Test-retest — time sampling
Alternate-form — content sampling
Stability and equivalence coefficient — time and content sampling
Split-half, KR20, and α coefficient — content sampling and content heterogeneity
Scorer — inter-scorer differences
52. §4 Factors That Affect Reliability Coefficients
Group homogeneity
Test length
Test difficulty
53. Group Homogeneity
The magnitude of the reliability coefficient depends on variation among individuals in both their true scores and their error scores.
If the score range is restricted, the true-score variance is restricted, and consequently the reliability coefficient is low.
54. Figure 5.2 Scatter Plots for Two Variates
Thus, the homogeneity of the examinee group is an important consideration in test development and test selection.
55. Predicting how reliability is altered when sample variance is altered:
(5.10) r_new = 1 − S_old² (1 − r_old) / S_new²
r_new, the predicted reliability estimate for the new sample; S_new², the variance of the new sample
S_old², the variance of the original sample; r_old, the reliability estimate for the original sample
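Formula (5.10) assumes the error variance S_old²(1 − r_old) is unchanged in the new group, so only the observed-score variance changes. A sketch with hypothetical values (Exercise 3's reliability value is elided in this transcript, so .90 below is an assumed illustration):

```python
# Predicting reliability under a change in sample variance (5.10):
#   r_new = 1 - S_old^2 * (1 - r_old) / S_new^2
# Assumes the error variance S_old^2 * (1 - r_old) is the same in the new
# group. The values below are hypothetical.
def predicted_reliability(r_old, s_old, s_new):
    return 1 - (s_old ** 2) * (1 - r_old) / (s_new ** 2)

# A restricted group (SD 10 instead of 20), original reliability .90:
print(predicted_reliability(0.90, 20, 10))   # ~0.6: restriction lowers r
```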
56. Exercise 3
Suppose one memory test has been administered to all the middle school students in one city, the standard deviation of the test scores is 20, and the reliability coefficient is . If we also obtain 10, the standard deviation of the test scores for the students in grade two, predict the reliability coefficient for the students in grade two.
57. Test Length: which test seems more reliable?
Test A: 1+1=, 3+2=, 2+8=
Test B: the same three items, each presented four times
Conclusion: reliability is higher for the test with more items (all based on the same content).
58. Using the Transform of the Spearman-Brown General Formula to Determine the Length of the Test:
(5.11) n = r_nn (1 − r) / [r (1 − r_nn)]
where r is the obtained reliability and r_nn is the desired reliability.
59. Exercise 4
One language test has 10 items, and its reliability coefficient is . To raise its reliability to .80, how many items should the test developer add to the test?
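Exercise 4 is solved by formula (5.11). Its starting reliability is elided in this transcript, so r_now = .60 below is an assumed value for illustration only:

```python
from math import ceil

# Lengthening a test to reach a target reliability, via the transform of
# the Spearman-Brown general formula (5.11):
#   n = r_target * (1 - r_now) / (r_now * (1 - r_target))
# r_now = .60 is an assumed value; the exercise's actual figure is elided.
def lengthening_factor(r_now, r_target):
    return r_target * (1 - r_now) / (r_now * (1 - r_target))

n = lengthening_factor(0.60, 0.80)   # test must be n times its length
items_now = 10
items_needed = ceil(items_now * n)   # round up to a whole number of items
print(n, items_needed, items_needed - items_now)
```

With these assumed numbers, n ≈ 2.67, so the 10-item test must grow to 27 items, i.e. 17 items must be added (assuming the new items match the old in content and quality).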
60. Test Difficulty
When a test is too hard or too easy for a group of examinees, the score range is restricted, and a low reliability coefficient is likely to result.
61. §5 Standard Error of Measurement
Interpretation: theoretically, each examinee's personal distribution of possible observed scores around the examinee's true score has a standard deviation. When these individual error standard deviations are averaged for the group, the result is the standard error of measurement, denoted S_E.
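S_E can be computed from quantities already defined in this chapter: rearranging (5.2), r_tt = 1 − S_E²/S_X², gives S_E = S_X √(1 − r_tt). A sketch with hypothetical values:

```python
from math import sqrt

# Standard error of measurement, obtained by solving (5.2) for S_E:
#   r_tt = 1 - Se^2 / Sx^2   =>   Se = Sx * sqrt(1 - r_tt)
# The test SD and reliability below are hypothetical.
def sem(s_x, r_tt):
    return s_x * sqrt(1 - r_tt)

print(round(sem(15, 0.91), 2))   # SD 15, reliability .91 -> Se = 4.5
```

The higher the reliability, the smaller S_E, and the more tightly each examinee's observed scores cluster around the true score, as Figures 5.3 and 5.4 illustrate.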
62. Figure 5.3 Approximately Normal Distribution of Observed Scores for Repeated Testing of One Examinee
(From Introduction to Measurement Theory, M. J. Allen & W. M. Yen, p. 89, 2002)
63. Figure 5.4 Hypothetical Illustration of Different Examinees' Distributions of Observed Scores Around Their True Scores