
1 MEASUREMENT CHARACTERISTICS
Error & Confidence
Reliability, Validity, & Usability

2 ERROR & CONFIDENCE
Reducing error:
- All assessment scores contain some error
- We want to minimize error so that scores are accurate
- Use protocols & periodic staff training/retraining
Increasing confidence:
- Results should lead to correct placement
- Use assessments that produce valid, reliable, and usable results

3 ASSESSMENT RESULTS
Norm-referenced:
- The individual's score is compared to others in their peer/norm group (e.g., school tests reported as percentiles, such as a score at the 95th percentile); see the sketch below
- The norm group must be representative of the test takers the test was designed for
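A minimal Python sketch of norm-referenced scoring, converting a raw score to a percentile rank within a norm group; the norm-group scores and the raw score are hypothetical.

```python
# Norm-referenced scoring: convert a raw score to a percentile rank
# within a norm group. All numbers here are hypothetical.
norm_group = [72, 75, 78, 80, 82, 85, 85, 88, 90, 95]
raw_score = 88

# Percentage of the norm group scoring at or below the individual's score
percentile = 100 * sum(s <= raw_score for s in norm_group) / len(norm_group)
print(f"Score {raw_score} is at the {percentile:.0f}th percentile of this norm group")
```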

4 ASSESSMENT RESULTS
Criterion-referenced:
- The individual's score is compared to a preset standard or criterion
- The standard doesn't change based on the individual or group (e.g., A = 250-295 points); see the sketch below
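Criterion-referenced scoring can be sketched the same way. Only the A = 250-295 band comes from the slide; the remaining bands are hypothetical, invented to complete the example.

```python
# Criterion-referenced scoring: the score is judged against preset cutoffs
# that do not shift with the group. Only the A band comes from the slides;
# the other bands are hypothetical.
def letter_grade(points: float) -> str:
    if 250 <= points <= 295:
        return "A"
    if 200 <= points < 250:   # hypothetical band
        return "B"
    if 150 <= points < 200:   # hypothetical band
        return "C"
    return "Below criterion"

print(letter_grade(262))  # -> "A", regardless of how anyone else scored
```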

5 VALIDITY
- Describes how well the assessment results match their intended purpose
- Are you measuring what you think you are measuring?
- Reflects the relationship between program content & assessment content
- An assessment is not valid for all purposes, populations, or times

6 VALIDITY
- Depends on different types of evidence
- Is a matter of degree (no tool is perfect)
- Is a unitary concept, a change from past practice: the former "types" of validity are now treated as kinds of evidence (e.g., content validity is now content-related evidence)

7 FACE VALIDITY
- Not listed in the text
- Do the items seem, on their face, to fit what is being measured?

8 CONTENT VALIDITY (Content-related evidence)
- How well does the assessment measure its subject or content?
- Items should be representative and complete (covering all major areas)
- Nonstatistical: established through a review of the literature or expert opinion, often with a blueprint of the major components
- Per Austin (1991), the minimum requirement for any assessment

9 CRITERION-RELATED VALIDITY (Criterion-related evidence)
- Comparison of the assessment's results with an external criterion measure
- Statistical: reported as a validity or correlation coefficient
- Ranges from -1 to +1 (+/-1 is a perfect relationship; 0 = no relationship)
- r = .73 is better than r = .52
- r between +/-.40 and +/-.70 is the acceptable range

10 CRITERION-RELATED VALIDITY (Criterion-related evidence)
- Coefficients of .30 to .40 may be used if statistically significant
- When "validity" is reported, it is generally criterion-related validity
- 2 types: predictive & concurrent (a computation sketch follows this slide)
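A minimal sketch of how a criterion-related validity coefficient is computed, using Pearson's r from SciPy; both score lists are hypothetical.

```python
# Criterion-related validity: correlate assessment scores with an external
# criterion measure taken on the same people. All scores are hypothetical.
from scipy.stats import pearsonr

assessment_scores = [12, 15, 11, 18, 20, 14, 17, 19, 13, 16]
criterion_scores  = [55, 60, 50, 68, 75, 58, 66, 72, 52, 64]

r, p = pearsonr(assessment_scores, criterion_scores)
print(f"validity coefficient r = {r:.2f} (p = {p:.3f})")
# Per the slides: |r| of .40-.70 is the acceptable range, and .30-.40
# may be used if statistically significant.
```

The same computation covers both types named above: for predictive validity the criterion is measured at a later time, while for concurrent validity the two measures are taken at about the same time.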

11 PREDICTIVE VALIDITY
- The ability of an assessment to predict future behaviors or outcomes
- The two measures are taken at different times
- e.g., ACT or SAT scores & later success in college; Leisure Satisfaction scores predicting discharge

12 CONCURRENT VALIDITY
- More than one instrument measures the same content
- The goal is to predict one set of scores from another set taken at the same (or nearly the same) time, both measuring the same variable

13 CONSTRUCT VALIDITY (Construct-related evidence)
- Theoretical/conceptual in nature
- Content-related & criterion-related evidence both contribute to construct validity
- Research on the conceptual framework on which the assessment is based also contributes
- Cannot be demonstrated in a single project or statistical measure
- Few TR assessments have it: their focus is behavior, not a construct

14 CONSTRUCT VALIDITY (Construct-related evidence)
- Factor analysis (see the sketch after this slide)
- Convergent validity (evidence of what it measures)
- Divergent validity (evidence of what it doesn't measure)
- Expert panels are used here too
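For the factor-analysis bullet, a minimal sketch using scikit-learn's FactorAnalysis; the item-response matrix is randomly generated stand-in data, so the loadings are meaningless beyond illustrating the mechanics.

```python
# One piece of construct-related evidence: exploratory factor analysis of
# item responses (rows = respondents, columns = items). The data here are
# random stand-ins; a real analysis would use actual item responses.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(seed=0)
items = rng.integers(1, 6, size=(100, 8)).astype(float)  # hypothetical 5-point items

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(items)
print(fa.components_.round(2))  # loadings: do items cluster onto the expected factors?
```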

15 THREATS TO VALIDITY
- The assessment should be valid for its intended use (e.g., research instruments)
- Unclear directions
- Unclear or ambiguous terms
- Items at an inappropriate level for the subjects
- Items not related to the construct being measured

16 THREATS TO VALIDITY
- Too few items
- Too many items
- Items with an identifiable pattern of response
- Method of administration
- Testing conditions
- Subjects' health, reluctance, or attitudes
See Stumbo, 2002, pp. 41-42

17 VALIDITY
- You can't get valid results without reliable results, but you can get reliable results without valid results
- Reliability is a necessary but not sufficient condition for validity
See Stumbo, 2002, p. 54

18 RELIABILITY
- Accuracy or consistency of a measurement; reproducible results
- Statistical in nature: r ranges between 0 & 1 (with 1 being perfect)
- Should not be lower than .80
- Tells what proportion of score variance is non-error variance
- Increases with the length of the test & the spread of scores (see the sketch below)
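The claim that reliability increases with test length can be made concrete with the Spearman-Brown prophecy formula; a minimal sketch follows, with a hypothetical starting reliability of r = .70.

```python
# Spearman-Brown prophecy formula: predicted reliability when a test is
# lengthened by a factor k (k = 2 means doubling the number of items).
def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

# Doubling a test with reliability .70 (a hypothetical value):
print(f"{spearman_brown(0.70, 2):.2f}")  # -> 0.82
```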

19 STABILITY (Test-retest)
- How stable is the assessment? Results should not be overly influenced by the passage of time
- The same group is assessed twice with the same instrument & the results of the 2 testings are correlated
- Are the 2 sets of scores alike?
- Watch for time effects (intervals that are too long or too short)

20 EQUIVALENCY (Equivalent forms)
- Also known as parallel-form or alternative-form reliability
- How closely correlated are 2 or more forms of the same assessment?
- The 2 forms have been developed and demonstrated to measure the same construct; they have similar but not identical items (e.g., the NCTRC exam)
- Short & long forms are not equivalent

21 INTERNAL CONSISTENCY
- How closely are the items on the assessment related?
- Split-half methods: 1st half vs. 2nd half; odd/even items; matched random subsets
- If the test can't be divided: Cronbach's alpha; Kuder-Richardson; Spearman-Brown's formula
- A Cronbach's alpha sketch follows this slide
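A minimal sketch of Cronbach's alpha computed directly from its formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the item-score matrix is hypothetical.

```python
# Cronbach's alpha from an item-score matrix (rows = respondents,
# columns = items). The scores below are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

scores = np.array([[4, 5, 4, 5],
                   [2, 3, 2, 2],
                   [5, 5, 4, 4],
                   [3, 3, 3, 4],
                   [1, 2, 2, 1]], dtype=float)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```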

22 INTERRATER RELIABILITY
- Percentage of agreements relative to the number of observations
- Note the difference between agreement & accuracy
- Raters are compared to each other
- 80% agreement is the benchmark

23 INTERRATER RELIABILITY
- Simple agreement: based on the number of agreements & disagreements
- Point-to-point agreement: takes each data point into consideration
- Percentage of agreement for the occurrence of the target behavior
- Kappa index (see the sketch after this slide)
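A minimal sketch of two of the measures named above: simple percent agreement computed by hand, and the kappa index via scikit-learn's cohen_kappa_score; the two raters' codes are hypothetical.

```python
# Interrater reliability for two raters coding the same 10 observations
# (1 = target behavior occurred, 0 = did not). Codes are hypothetical.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"simple agreement = {agreement:.0%}")                 # 80%, the benchmark cited above
print(f"kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")  # kappa corrects for chance agreement
```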

24 INTRARATER RELIABILITY
- Not in the text
- The rater is compared with him- or herself (same rater, different occasions)

25 RELIABILITY
- Manuals often give this information
- High reliability doesn't indicate validity
- Generally a longer test has higher reliability: length lessens the influence of chance or guessing

26 FAIRNESS
- Reduction or elimination of undue bias (language, ethnic or racial background, gender)
- Free of stereotypes & biases
- Beginning to be a concern for TR

27 USABILITY & PRACTICALITY
- Nonstatistical
- Is this tool better than any other tool on the market, or one I could design myself?
- Consider time, cost, staff qualifications, ease of administration, scoring, etc.

