Assessing the Assessment Reliability. Am I measuring something? Validity. Am I measuring what I think I am measuring? Test-retest Interobserver agreement.


1 Assessing the Assessment Reliability: Am I measuring something? Test-retest; interobserver agreement; parallel forms; split-half (internal consistency). Validity: Am I measuring what I think I am measuring? Content; criterion; construct. Reliability is a necessary prerequisite for validity.

2 Reliability Reliability refers to the consistency of a measure across time, versions, raters, and so on. A reliable test has little measurement error. Observed Score = True Score + Error
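The decomposition on this slide is usually formalized in classical test theory roughly as follows. This is a sketch under the standard assumption that error is uncorrelated with the true score; the symbols X (observed score), T (true score), E (error), and the reliability coefficient \rho_{XX'} are notation added here and do not appear in the slides.

```latex
X = T + E, \qquad \operatorname{Cov}(T, E) = 0
\;\Longrightarrow\;
\sigma_X^2 = \sigma_T^2 + \sigma_E^2,
\qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Read this way, "little measurement error" means that error variance is a small share of observed-score variance, so the reliability coefficient is close to 1.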

3 Reliability  True score – true or perfectly accurate  E.g. the time  Often a fictional mark in psychology  Based on multiple measurements  Aggregation = averaging a number of imprecise measurements to increase reliability
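One common way to quantify the aggregation point is the Spearman-Brown prophecy formula, which the slides do not name; it is used here as an assumption. The sketch below assumes the k measurements are parallel (equally reliable, with independent errors), and the function name `aggregated_reliability` is hypothetical.

```python
def aggregated_reliability(r_single: float, k: int) -> float:
    """Spearman-Brown estimate of the reliability of the mean of k
    parallel measurements, each with reliability r_single."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1 + (k - 1) * r_single)


# Illustrative only: a single measurement with reliability 0.5,
# averaged over more and more repeated measurements.
for k in (1, 2, 4, 8):
    print(k, round(aggregated_reliability(0.5, k), 3))
```

Under these assumptions, averaging four measurements that each have reliability 0.5 gives an aggregate reliability of 0.8.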

4 Reliability Test-retest: administer same measure at two points in time. Interobserver agreement: multiple observers/judges/raters/scorers rate same target. Parallel forms: compare alternate forms of same test. Split-half reliability: split test into two halves and compare scores across halves. Coefficient alpha: average of all possible split-half reliabilities.
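Coefficient alpha is rarely computed by enumerating every split. A minimal sketch using the usual variance-based formula is shown below; the function name `cronbach_alpha` and the item scores are hypothetical and made up purely for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha from item scores.

    items: one list of scores per item, with respondents in the
    same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up scores for 3 items answered by 5 respondents.
example_items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 1],
    [2, 5, 4, 4, 2],
]
alpha = cronbach_alpha(example_items)
print(round(alpha, 3))
```

The formula needs the total scores to vary across respondents; if every respondent has the same total, alpha is undefined.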

5 Validity Is the test measuring what I think it is? This requires empirical demonstration. There are three types of validity: content validity, criterion validity, and construct validity.

6 Validity Content Validity: a test has content validity if it adequately covers the area of content it is supposed to cover. Difficult to examine statistically. Content validity typically must be built in at the beginning. Course exams are the best examples.

7 Validity Criterion Validity: tests are evaluated against some criterion. Often called predictive validity. Most at issue for tests employed to make decisions: selection of students, parole decisions, jobs.

8 Criterion Validity - Concurrent  Concurrent validity: does my measure correlate highly with an established measure?  Can my measurement instrument predict a criterion that occurs at the same point in time?  Can my measure (i.e. my operationalization) distinguish between two groups that it should be able to distinguish between?

9 Criterion Validity - Predictive  Can my measure predict future behavior? –If yes, has predictive validity (a type of criterion validity)
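In practice, predictive validity is often summarized as the correlation between scores on the measure and a criterion observed later. The sketch below assumes a simple Pearson correlation; the helper `pearson_r` and all of the numbers are hypothetical and are not taken from any study cited in these slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Made-up data: test scores at admission and a criterion measured later.
test_scores = [150, 155, 160, 165, 170]
later_criterion = [3.0, 3.2, 3.1, 3.6, 3.8]

validity_coefficient = pearson_r(test_scores, later_criterion)
print(round(validity_coefficient, 2))
```

For concurrent validity the same calculation applies, with the criterion measured at the same point in time rather than later.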

10 Predictive Validity of the GRE (Graduate Record Examination) Kuncel, N.R., Hezlett, S.A., & Ones, D.S. (2001). A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate school student selection and performance. Psychological Bulletin, 127. Originally designed to measure “basic developed abilities relevant to performance in graduate studies”. Used often and heavily in decisions about admissions. Verbal measure: analogy, antonym, sentence completion, reading comprehension. Quantitative measure: quantitative, quantitative comparison, data interpretation. Analytic measure: analytical and logical reasoning. Subject test: acquired knowledge in a particular area.

11 Predictive validity of GRE  Want to establish predictive validity of GRE  What will my criterion of graduate school performance be?  Use several indicators of “performance”: –Graduate GPA –1st-year graduate GPA –Comprehensive exam scores –Publication citation counts –Faculty ratings –(these are the criteria)

12 Predictive Validity of the GRE


15 Summary  All areas of the GRE were found to be valid predictors of graduate GPA (GGPA), 1st-year GGPA, faculty ratings, and comprehensive exam scores.  GRE subject tests were consistently better predictors of the criteria than the quantitative or verbal tests, and also better than undergraduate GPA (UGPA).

16 Construct Validity  Most important type of validity  “If this were a measure of …, what would it look like?”  Depends heavily on theory:  How is this construct related to other constructs?  Requires broad thinking  In validating my construct, I am validating my theory

17 Steps to establish construct validity 1. Need to establish convergent correlations  measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other (that is, you should be able to show a correspondence or convergence between similar constructs) 2. Need to establish divergent correlations  measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to each other (that is, you should be able to discriminate between dissimilar constructs) 3. Build nomological net
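Steps 1 and 2 above can be sketched as a check on a set of correlations. The cut-offs (`conv_min=0.5`, `div_max=0.3`), the function names, and the scores below are all assumptions chosen for illustration; the slides do not specify thresholds, and real validation relies on theory rather than fixed cut-offs.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def check_pattern(scores, convergent, divergent, conv_min=0.5, div_max=0.3):
    """Return {pair: (r, matches_expectation)} for each listed pair.

    Convergent pairs are expected to correlate at least conv_min;
    divergent pairs are expected to have |r| no larger than div_max.
    """
    results = {}
    for a, b in convergent:
        r = pearson_r(scores[a], scores[b])
        results[(a, b)] = (r, r >= conv_min)
    for a, b in divergent:
        r = pearson_r(scores[a], scores[b])
        results[(a, b)] = (r, abs(r) <= div_max)
    return results


# Made-up scores for six respondents on three hypothetical measures.
scores = {
    "self_esteem_a": [1, 2, 3, 4, 5, 6],
    "self_esteem_b": [2, 1, 4, 3, 6, 5],
    "locus_of_control": [4, 3, 1, 6, 2, 5],
}
results = check_pattern(
    scores,
    convergent=[("self_esteem_a", "self_esteem_b")],
    divergent=[
        ("self_esteem_a", "locus_of_control"),
        ("self_esteem_b", "locus_of_control"),
    ],
)
for pair, (r, ok) in results.items():
    print(pair, round(r, 2), "as expected" if ok else "unexpected")
```

A pattern like this only shows that the measures converge and diverge where theory says they should; it does not by itself establish what construct they are measuring.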

18 Convergent validity  Measures that should be related are related  These 4 items are converging on the same thing (we don’t know for sure that it is “self-esteem” yet)

19 Divergent Validity  Self-esteem measures do not correlate with locus of control measures  These measures seem to be tapping different things

20 Establishing convergent and divergent validity

21 Nomological Network  Must develop a “lawful network” for your measure in order to establish construct validity.  Includes –Theoretical framework –Empirical framework –Observables

22 Childhood Psychopathy Scale Lynam, D.R. (1997). Pursuing the psychopath: Capturing the fledgling psychopath in a nomological net. Journal of Abnormal Psychology, 106. “The construct of psychopathy and attendant personality information might profitably be used at the childhood level to identify a more homogeneous group of antisocial children.”

23 Psychopathy  The [psychopath] is unfamiliar with the primary facts or data of what might be called personal values and is altogether incapable of understanding such matters.  It is impossible for him to take even a slight interest in the tragedy or joy or the striving of humanity as presented in serious literature or art. He is also indifferent to all these matters in life itself. Beauty and ugliness, except in a very superficial sense, goodness, evil, love, horror, and humour have no actual meaning, no power to move him.  He is, furthermore, lacking in the ability to see that others are moved. It is as though he were colour-blind, despite his sharp intelligence, to this aspect of human existence. It cannot be explained to him because there is nothing in his orbit of awareness that can bridge the gap with comparison. He can repeat the words and say glibly that he understands, and there is no way for him to realize that he does not understand (Cleckley, 1941, p. 90 quoted in Hare, 1993, pp ).

24 Developed Child Psychopathy Scale Principles of rational scale construction  Working from the Psychopathy Checklist (PCL-R), identified mother-reported items that assessed PCL-R constructs  Operationalized 13 of the 20 PCL-R constructs as 3- to 4-item scales: glibness, untruthfulness, manipulation, lack of guilt, poverty of affect, callousness, parasitic lifestyle, behavioral dyscontrol, lack of planning, impulsiveness, unreliability, failure to accept responsibility, criminal versatility

25 Items on the CPS

26 Construct Validity of the CPS If the CPS is truly assessing psychopathy, scores on the CPS should be positively related to serious delinquency

27 Construct Validity of the CPS If the CPS is truly assessing psychopathy, scores on the CPS should be positively related to stable delinquency

28 Construct Validity of the CPS If the CPS is truly assessing psychopathy, scores on the CPS should be positively related to impulsivity

29 Construct Validity of the CPS If the CPS is assessing psychopathy, scores on the CPS should be positively related to externalizing problems and negatively related to internalizing problems

30 Construct Validity of the CPS If the CPS is assessing psychopathy, scores on the CPS should predict delinquency above and beyond other well known predictors
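"Above and beyond" is commonly tested as incremental validity: the gain in explained variance (R²) when the new measure is added to a model that already contains a known predictor. The sketch below uses the standard two-predictor R² formula computed from correlations; the function names and the correlation values are hypothetical and are not results from Lynam (1997).

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 for a criterion regressed on two predictors, computed from
    the three pairwise correlations."""
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_r_squared(r_y_known: float, r_y_new: float,
                          r_known_new: float) -> float:
    """Gain in R^2 from adding the new measure to a model that already
    contains the known predictor."""
    full = r_squared_two_predictors(r_y_known, r_y_new, r_known_new)
    return full - r_y_known ** 2


# Made-up correlations: criterion with known predictor (0.4),
# criterion with new measure (0.5), known predictor with new measure (0.3).
gain = incremental_r_squared(0.4, 0.5, 0.3)
print(round(gain, 3))
```

A gain clearly above zero would suggest the new measure carries information that the known predictor does not; with real data the gain would also need a significance test.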

