Advising on test validity Denny Borsboom University of Amsterdam.

1 Advising on test validity Denny Borsboom University of Amsterdam

2 or

3 Things that keep me awake at night

4 Overview I. Rocks and hard places II. The psychometric orthodoxy III. The validity problem IV. What I think of validity V. What I advise on validity VI. Even more miscellaneous issues


6 [Figure labels: flying, litter, ...; environmentalism; attitude, relevance, others]

7 tell the researcher to do a PCA and be done with it! do what you can to further real scientific progress!

8 The Psychometric Orthodoxy 1. Make up a number of items you think are related to a “construct” 2. Compute Cronbach’s α 3. Run a principal components analysis 4. If the scree plot drops steeply, and α > .75, use sumscore for research 5. Plug sumscore into experimental designs, ANOVAs, behavior genetic analyses, fMRI studies, etc. 6. Publish results 7. Worry about validity
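A minimal sketch of steps 2 and 4 of the recipe, to make concrete how little goes into it. The function name, the `use_sumscore` helper, and the toy data are illustrative assumptions and not part of the presentation; only the .75 cutoff comes from the slide.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(items[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    # Variance of the sumscore across respondents.
    total_var = pvariance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def use_sumscore(items, cutoff=0.75):
    """Hypothetical decision rule from step 4 (ignoring the scree plot)."""
    return cronbach_alpha(items) > cutoff


# Toy data: three respondents answering three items identically.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(toy), use_sumscore(toy))
```

Note that nothing in the computation refers to what the items are about: any set of sufficiently intercorrelated items passes the cutoff, whether or not they measure the intended attribute.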

9 Disclaimer: The psychometric orthodoxy works perfectly for mundane goals, like: –getting publishable results –predicting all sorts of things –building careers in psychology That is not what I am concerned about


11 validity: does the test really measure environmentalism?

12 The construct validity doctrine To study validity, one should: - compute correlations with similar variables - compute correlations with dissimilar variables - examine group differences - etc. Results will typically be inconclusive

13 The question of validity What does it mean ‘to really measure’ something? Does it mean more than ‘to just measure something’? And: who is taking care of the measurement problem in the first place?

14 substantive psychology ville methodology mountain validity? why don’t we ask the methodologist?! we assume tests are valid and take it from there


16 Four questions what do our models assume? do these assumptions make sense in psychology? what are we really doing? should this keep me awake at night?

17 Four questions what do our models assume? <- common causes do these assumptions make sense in psychology? <- no what are we really doing? <- something else should this keep me awake at night? <-?

18 Measurement models [Figure: path diagram of a measurement model with observed indicators X1, X2, X3]

19 Number of firemen

20 Number of firemen Number of paramedics

21 Number of firemen Number of paramedics Number of spectators

22 Number of firemen Number of paramedics Number of spectators Correlation

23 Number of firemen Number of paramedics Number of spectators Size of fire Correlation

24 Number of firemen Number of paramedics Number of spectators Size of fire No correlation

25 Number of firemen Number of paramedics Number of spectators Size of fire Local Independence
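The firemen example can be sketched as a small simulation: two indicators that depend on a common cause correlate, and the correlation vanishes once the common cause is held constant (local independence). The loadings, the normal distributions, and the treatment of head counts as continuous scores are simplifying assumptions made for illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after partialling out z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n=5000, seed=1):
    """Return (marginal, partial) correlation of firemen and paramedics."""
    rng = random.Random(seed)
    fire = [rng.gauss(0, 1) for _ in range(n)]            # size of fire (common cause)
    firemen = [2 * f + rng.gauss(0, 1) for f in fire]     # depends on fire size only
    paramedics = [2 * f + rng.gauss(0, 1) for f in fire]  # depends on fire size only
    return pearson(firemen, paramedics), partial_corr(firemen, paramedics, fire)


marginal, partial = simulate()
print(f"marginal r = {marginal:.2f}, r given size of fire = {partial:.2f}")
```

A reflective measurement model makes the same claim about test items: whatever they share is due to the attribute, so conditioning on the attribute should remove their correlation.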

26 I feel comfortable around people I make friends easily Correlation I am the life of the party

27 I feel comfortable around people I make friends easily I am the life of the party Extraversion Reflective measurement model

28 Reflective measurement models Are an instantiation of a common cause structure So: what causal process links ‘environmentalism’ to my decision to fly or not to fly? And: what element of that process is the same one that causes me to throw litter in the trashcan?

29 Reflective measurement Temperature

30 Reflective measurement with one item What makes one thermometer a valid measurement instrument for temperature? Its outcomes causally depend on temperature The specification of this causal link is the most important problem in assessing validity

31 Essence attribute test score causal process

32 How plausible is this... ...for environmentalism and flying? ...for intelligence and IQ-scores? ...for personality and the Big Five? ...for depression and DSM-diagnoses? ...

33 The Psychometric Orthodoxy 1. Make up a number of items you think are related to a “construct” 2. Compute Cronbach’s α 3. Run a principal components analysis 4. If the scree plot drops steeply, and α > .75, use sumscore for research 5. Plug sumscore into experimental designs, ANOVAs, behavior genetic analyses, fMRI studies, etc. 6. Publish results 7. Worry about validity

34 So what are we really doing?

35 educational level job performance genetic differences flying annual income KLM attitude numerical ability SES physique Sex litter length annual income significant others self-efficacy

36 educational level job performance genetic differences flying annual income KLM attitude numerical ability SES shower sex litter length annual income significant others self-efficacy

37 educational level job performance genetic differences flying annual income KLM attitude numerical ability SES shower sex litter length annual income significant others self-efficacy environmentalism

38 We are constructing variables out of other variables, and labeling them as ‘constructs’

39 Advice implications? So: I think that psychology’s measurement story is implausible in many cases I do not believe that it is true for environmentalism and flying Should this play a role in my methodological advice?

40 NO

41 Reasons: I do not represent a majority position I do not know for sure that I’m right I am uncertain what the alternative should be This is not the researcher’s problem until the scientific community makes it his or her problem

42 Catharsis So what I do instead is: try to solve the researcher’s problem (not mine) Try to push the scientific and methodological literature in the direction I think should be labelled ‘forward’ Wait for alternative ideas to catch on, and the consensus to change

43 Message When you are advising, you are a window between the methodological literature and your client If the methodological literature thinks that constructs are o.k., and your client agrees, then you are not in a position to advertise your hangups Researchers should not suffer from your problems

44 But...


46 Example 1 A researcher wants to do an ANOVA to see whether people score higher on ‘optimism’ than they do on ‘extraversion’ Two different scales, used to measure two different attributes, thrown into an RM ANOVA This is nonsense and will always be nonsense


48 Example 2 An organization wants to estimate the proportion of alternative healers that are involved in malpractice They have a very small, very biased sample This is not a responsible course of action


50 Example 3 An fMRI researcher wants to interpret correlations in very small subgroups (n=8) She wants to satisfy a reviewer and conclude that the correlation is higher in group A than in group B Pragmatically, I understand; scientifically, I think it’s nonsense
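A rough sketch of why such a comparison has little to stand on. It uses the standard Fisher z test for two independent correlations, which the slide itself does not mention; the correlations .70 and .20 are hypothetical values chosen for illustration, and only n=8 comes from the example.

```python
import math


def compare_correlations(r_a, n_a, r_b, n_b):
    """Fisher z test for the difference between two independent correlations.

    Returns the z statistic and the two-sided p-value.
    """
    z_a, z_b = math.atanh(r_a), math.atanh(r_b)    # Fisher transformation
    se = math.sqrt(1 / (n_a - 3) + 1 / (n_b - 3))  # standard error of z_a - z_b
    z = (z_a - z_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal p-value
    return z, p


# Hypothetical correlations in two subgroups of eight participants each.
z, p = compare_correlations(0.70, 8, 0.20, 8)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With eight participants per group the standard error of the difference is sqrt(1/5 + 1/5), about 0.63 on the Fisher z scale, so even a difference as large as .70 versus .20 falls well short of conventional significance.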


52 Conclusion In my experience, most methodologists do have conflicts now and then I think that’s part of methodological life We should not burden clients with our personal hangups However, neither do we have the responsibility to always ‘satisfy’ our clients

