
1 MEASUREMENT CHARACTERISTICS
Error & Confidence
Reliability, Validity, & Usability

2 ERROR & CONFIDENCE
Reducing error:
- All assessment scores contain some error
- We want to minimize error so that scores are accurate
- Use protocols & periodic staff training/retraining
Increasing confidence:
- Results should lead to correct placement
- Use assessments that produce valid, reliable, and usable results

3 ASSESSMENT RESULTS
Norm-referenced:
- The individual's score is compared to others in their peer/norm group (e.g., school tests reported as percentiles, such as a score at the 95th percentile); see the sketch below
- The norm group must be representative of the test takers the test was designed for
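A minimal Python sketch of norm-referenced scoring, converting a raw score to a percentile rank within a norm group; the norm-group scores and the raw score are hypothetical.

```python
# Norm-referenced scoring: convert a raw score to a percentile rank
# within a norm group. All numbers here are hypothetical.
norm_group = [72, 75, 78, 80, 82, 85, 85, 88, 90, 95]
raw_score = 88

# Percentage of the norm group scoring at or below the individual's score
percentile = 100 * sum(s <= raw_score for s in norm_group) / len(norm_group)
print(f"Score {raw_score} is at the {percentile:.0f}th percentile of this norm group")
```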

4 ASSESSMENT RESULTS
Criterion-referenced:
- The individual's score is compared to a preset standard or criterion
- The standard doesn't change based on the individual or group (e.g., A = 250-295 points); see the sketch below
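Criterion-referenced scoring can be sketched the same way. Only the A = 250-295 band comes from the slide; the remaining bands are hypothetical, invented to complete the example.

```python
# Criterion-referenced scoring: the score is judged against preset cutoffs
# that do not shift with the group. Only the A band comes from the slides;
# the other bands are hypothetical.
def letter_grade(points: float) -> str:
    if 250 <= points <= 295:
        return "A"
    if 200 <= points < 250:   # hypothetical band
        return "B"
    if 150 <= points < 200:   # hypothetical band
        return "C"
    return "Below criterion"

print(letter_grade(262))  # -> "A", regardless of how anyone else scored
```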

5 VALIDITY
- Describes how well the assessment results match their intended purpose
- Are you measuring what you think you are measuring?
- Reflects the relationship between program content & assessment content
- An assessment is not valid for all purposes, populations, or times

6 VALIDITY
- Depends on different types of evidence
- Is a matter of degree (no tool is perfect)
- Is a unitary concept, a change from past practice: the former "types" of validity are now treated as kinds of evidence (e.g., content validity is now content-related evidence)

7 FACE VALIDITY
- Not listed in the text
- Do the items seem, on their face, to fit what is being measured?

8 CONTENT VALIDITY (Content-related evidence)
- How well does the assessment measure its subject or content?
- Items should be representative and complete (covering all major areas)
- Nonstatistical: established through a review of the literature or expert opinion, often with a blueprint of the major components
- Per Austin (1991), the minimum requirement for any assessment

9 CRITERION-RELATED VALIDITY (Criterion-related evidence)
- Comparison of the assessment's results with an external criterion measure
- Statistical: reported as a validity or correlation coefficient
- Ranges from -1 to +1 (+/-1 is a perfect relationship; 0 = no relationship)
- r = .73 is better than r = .52
- r between +/-.40 and +/-.70 is the acceptable range

10 CRITERION-RELATED VALIDITY (Criterion-related evidence)
- Coefficients of .30 to .40 may be used if statistically significant
- When "validity" is reported, it is generally criterion-related validity
- 2 types: predictive & concurrent (a computation sketch follows this slide)
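A minimal sketch of how a criterion-related validity coefficient is computed, using Pearson's r from SciPy; both score lists are hypothetical.

```python
# Criterion-related validity: correlate assessment scores with an external
# criterion measure taken on the same people. All scores are hypothetical.
from scipy.stats import pearsonr

assessment_scores = [12, 15, 11, 18, 20, 14, 17, 19, 13, 16]
criterion_scores  = [55, 60, 50, 68, 75, 58, 66, 72, 52, 64]

r, p = pearsonr(assessment_scores, criterion_scores)
print(f"validity coefficient r = {r:.2f} (p = {p:.3f})")
# Per the slides: |r| of .40-.70 is the acceptable range, and .30-.40
# may be used if statistically significant.
```

The same computation covers both types named above: for predictive validity the criterion is measured at a later time, while for concurrent validity the two measures are taken at about the same time.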

11 PREDICTIVE VALIDITY
- The ability of an assessment to predict future behaviors or outcomes
- The two measures are taken at different times
- e.g., ACT or SAT scores & later success in college; Leisure Satisfaction scores predicting discharge

12 CONCURRENT VALIDITY
- More than one instrument measures the same content
- The goal is to predict one set of scores from another set taken at the same (or nearly the same) time, both measuring the same variable

13 CONSTRUCT VALIDITY (Construct-related evidence)
- Theoretical/conceptual in nature
- Content-related & criterion-related evidence both contribute to construct validity
- Research on the conceptual framework on which the assessment is based also contributes
- Cannot be demonstrated in a single project or statistical measure
- Few TR assessments have it: their focus is behavior, not a construct

14 CONSTRUCT VALIDITY (Construct-related evidence)
- Factor analysis (see the sketch after this slide)
- Convergent validity (evidence of what it measures)
- Divergent validity (evidence of what it doesn't measure)
- Expert panels are used here too
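For the factor-analysis bullet, a minimal sketch using scikit-learn's FactorAnalysis; the item-response matrix is randomly generated stand-in data, so the loadings are meaningless beyond illustrating the mechanics.

```python
# One piece of construct-related evidence: exploratory factor analysis of
# item responses (rows = respondents, columns = items). The data here are
# random stand-ins; a real analysis would use actual item responses.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(seed=0)
items = rng.integers(1, 6, size=(100, 8)).astype(float)  # hypothetical 5-point items

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(items)
print(fa.components_.round(2))  # loadings: do items cluster onto the expected factors?
```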

15 THREATS TO VALIDITY
- The assessment should be valid for its intended use (e.g., research instruments)
- Unclear directions
- Unclear or ambiguous terms
- Items at an inappropriate level for the subjects
- Items not related to the construct being measured

16 THREATS TO VALIDITY
- Too few items
- Too many items
- Items with an identifiable pattern of response
- Method of administration
- Testing conditions
- Subjects' health, reluctance, or attitudes
See Stumbo, 2002, pp. 41-42

17 VALIDITY
- You can't get valid results without reliable results, but you can get reliable results without valid results
- Reliability is a necessary but not sufficient condition for validity
See Stumbo, 2002, p. 54

18 RELIABILITY
- Accuracy or consistency of a measurement; reproducible results
- Statistical in nature: r ranges between 0 & 1 (with 1 being perfect)
- Should not be lower than .80
- Tells what proportion of score variance is non-error variance
- Increases with the length of the test & the spread of scores (see the sketch below)
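The claim that reliability increases with test length can be made concrete with the Spearman-Brown prophecy formula; a minimal sketch follows, with a hypothetical starting reliability of r = .70.

```python
# Spearman-Brown prophecy formula: predicted reliability when a test is
# lengthened by a factor k (k = 2 means doubling the number of items).
def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

# Doubling a test with reliability .70 (a hypothetical value):
print(f"{spearman_brown(0.70, 2):.2f}")  # -> 0.82
```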

19 STABILITY (Test-retest)
- How stable is the assessment? Results should not be overly influenced by the passage of time
- The same group is assessed twice with the same instrument & the results of the 2 testings are correlated
- Are the 2 sets of scores alike?
- Watch for time effects (intervals that are too long or too short)

20 EQUIVALENCY (Equivalent forms)
- Also known as parallel-form or alternative-form reliability
- How closely correlated are 2 or more forms of the same assessment?
- The 2 forms have been developed and demonstrated to measure the same construct; they have similar but not identical items (e.g., the NCTRC exam)
- Short & long forms are not equivalent

21 INTERNAL CONSISTENCY
- How closely are the items on the assessment related?
- Split-half methods: 1st half vs. 2nd half; odd/even items; matched random subsets
- If the test can't be divided: Cronbach's alpha; Kuder-Richardson; Spearman-Brown's formula
- A Cronbach's alpha sketch follows this slide
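A minimal sketch of Cronbach's alpha computed directly from its formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the item-score matrix is hypothetical.

```python
# Cronbach's alpha from an item-score matrix (rows = respondents,
# columns = items). The scores below are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

scores = np.array([[4, 5, 4, 5],
                   [2, 3, 2, 2],
                   [5, 5, 4, 4],
                   [3, 3, 3, 4],
                   [1, 2, 2, 1]], dtype=float)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```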

22 INTERRATER RELIABILITY
- Percentage of agreements relative to the number of observations
- Note the difference between agreement & accuracy
- Raters are compared to each other
- 80% agreement is the benchmark

23 INTERRATER RELIABILITY
- Simple agreement: based on the number of agreements & disagreements
- Point-to-point agreement: takes each data point into consideration
- Percentage of agreement for the occurrence of the target behavior
- Kappa index (see the sketch after this slide)
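A minimal sketch of two of the measures named above: simple percent agreement computed by hand, and the kappa index via scikit-learn's cohen_kappa_score; the two raters' codes are hypothetical.

```python
# Interrater reliability for two raters coding the same 10 observations
# (1 = target behavior occurred, 0 = did not). Codes are hypothetical.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"simple agreement = {agreement:.0%}")                 # 80%, the benchmark cited above
print(f"kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")  # kappa corrects for chance agreement
```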

24 INTRARATER RELIABILITY
- Not in the text
- The rater is compared with him- or herself (same rater, different occasions)

25 RELIABILITY
- Manuals often give this information
- High reliability doesn't indicate validity
- Generally a longer test has higher reliability: length lessens the influence of chance or guessing

26 FAIRNESS
- Reduction or elimination of undue bias (language, ethnic or racial background, gender)
- Free of stereotypes & biases
- Beginning to be a concern for TR

27 USABILITY & PRACTICALITY
- Nonstatistical
- Is this tool better than any other tool on the market, or one I could design myself?
- Consider time, cost, staff qualifications, ease of administration, scoring, etc.

