Why is validity important?
- Validity is a (if not the) hallmark of quality for educational assessment
- A declaration of validity provides a ‘green light’ to use an assessment procedure for the purpose at hand... a declaration of invalidity presents a ‘red light’

Q1: Can you date this quote?
“Two of the most important types of problems in measurement are those connected with the determination of what a test measures, and of how consistently it measures. The first should be called the problem of validity, the second, the problem of reliability.”

Q2: Can you identify either series?
Series 1: 1951, 1971, 1989, 2006
Series 2: 1954, 1966, 1974, 1985, 1999

Theoretical revolution
- From fragmented conception of validity (Trinitarian) to an holistic one (Unitarian)
- Championed by Samuel Messick
- Between mid-1970s and late-1980s

The 1920s definition
“By validity is meant the degree to which a test or examination measures what it purports to measure.” (Ruch, 1924, p.13)

Validity is conditional
- Upon having observed procedural guidelines
  e.g. a well-developed test that had been administered incorrectly would not necessarily produce accurate results
- Upon the context of administration
  e.g. a well-developed test designed in one decade would not necessarily produce accurate results two decades later
- Upon characteristics of the group assessed
  e.g. a well-developed test of reading comprehension designed for 16-year-olds would not necessarily produce accurate results for 11-year-olds
- Upon the use(s) to which results are to be put
  e.g. a well-developed test designed for selection would not necessarily produce accurate results for placement

Contra 1920s definition
For any test, validity may differ
- if procedural guidelines are not followed
- for different groups
- within different contexts
- when different interpretations (using different constructs) are made, for different uses
So ‘the test’ cannot be valid or invalid, only ‘the interpretation’ of results.
Each important interpretation needs to be validated in its own right.

The s ‘conception’
Different kinds of validity, requiring different kinds of validation, apply to different kinds of testing.
For curriculum-based testing:
- content validity needs to be demonstrated
- content validation is the appropriate method
If there is satisfactory alignment between the content of the test and the content of the curriculum then the test is valid.

Mono-validation insufficient
Even for testing educational attainment, content validation (to check adequate sampling of content) is insufficient
- content validation can only help to ‘validate’ inferences concerning students who score maximum marks
- the way that questions present content may prevent them from eliciting the intended KSU evidence
- the way that questions are marked may prevent evidence of KSU from being rewarded appropriately
- different students will use different kinds of KSU to answer the same question
- inferences are drawn in terms of constructs (e.g. X is better at ‘scientific reasoning’ than Y) and even these construct labels need validating

It’s all about construct validity
“[...] the profession is coming around to the view that all validation is construct validation.” (Cronbach, 1984, p.126)
“[...] construct validity may ultimately be taken as the whole of validity in the final analysis.” (Messick, 1989, p.21)

Double whammy!
- Rejection of 1920s definition: for any given test, multiple interpretations will need to be validated (particularly when the same test is used for multiple purposes)
- Rejection of s ‘conception’: for any given interpretation, multiple validation activities will be required to establish its (construct) validity

The last word on validity?
Despite talk of a general consensus over the central tenets of modern validity theory:
- substantial ambiguity over detail of the theory
- ongoing resistance to putting it into practice
- growing debate over its plausibility

1a. Ambiguity – meaning
Miller et al (2009), on a single page (p.104), refer to:
- “the validity of an assessment”
- “the validity of the assessment for that use or interpretation”
- “the validity of interpretations of tests and assessments”
- “the validity of test and assessment results”
- “the validity of the uses and interpretations”

1a. Ambiguity – meaning
Which is the ‘proper’ referent of validity?
- the interpretation of the score (i.e. the claim) is valid
- the use of results (i.e. the decision) is valid
- the inferential process (assessment procedure) is valid
- the intended, or actual, inferences from results are valid
- the argument for interpreting and using results is valid
- the inferential links within the argument chain are valid
- the validation research conclusions are valid
- the hypothesis is valid
- the explanation is valid

1b. Ambiguity – evidence
- Relevance
  is every kind of evidence/analysis relevant to every validation?
- Necessity
  is every kind of evidence/analysis required for every validation?

Relevance and necessity
“Therefore, the profession is coming around to the view that all validation is construct validation. [...] Content- and criterion-based arguments develop parts of the story. With almost any test it makes sense to join all three kinds of inquiry in building an explanation. The three distinct terms do no more than spotlight aspects of the inquiry.” (Cronbach, 1984, p.126)

Relevance and necessity
“[...] test validity cannot rely on any one of the supplementary forms of evidence just discussed. But neither does validity require any one form, as long as there is defensible convergent and discriminant evidence supporting test meaning. To the extent that some form of evidence cannot be developed [...] heightened emphasis can be placed on other evidence [...] What is required is a compelling argument that the available evidence justifies the test interpretation and use, even though some pertinent evidence had to be forgone.” (Messick, 1998, pp.70-71)

Relevance and necessity
“[...] if the proposed interpretation of test results relies on predictions of future performance, these predictions should be empirically evaluated as part of the validation of the proposed interpretation; if no such predictions are made, no evidence for predictive accuracy is called for.” (Kane, 2008, p.79)

2. Resistance – validation
Well-established disjunction between modern validity theory and contemporary validation practice
- Jonson & Plake (1998)
- Hogan & Agnello (2004)
- Cizek et al (2008)
- Wolming & Wikstrom (2010)

2. Resistance – validation
Validation has become very demanding...
- multiple validation ‘foci’
- multiple validation constructs
- multiple uses of results

Multiple validation ‘foci’
- If we’re not just checking test content against curriculum content, what else do we need to do for each test?
  OCR > A level > physics > version A > 2011 certification
- How much additional validation is required for distinct subgroups of the population?
  ethnicity, class, gender, region, school, etc.

3a. Debate – good/bad impacts
- Can bad impacts, from otherwise good tests, really undermine validity?
  when test used for intended purpose?
  when test used for unintended purposes?
- Must developers provide evidence that their tests have had (or will have) good impacts?
  is it really their responsibility?
  how could evidence be collected in advance?
  who ought to judge what counts as a good or bad impact?

3b. Debate – unitary concept
- Borsboom et al (2004)
  the test is valid (after all)
- Lissitz & Samuelsen (2007)
  attainment tests don’t require much more than content validation
- Murphy (2009)
  aptitude tests don’t require much more than criterion validation

My modern validity theory
- Technical standard
  validity of (each) use of results – depends on strength of argument for interpreting those results in terms of the validation construct (includes reliability)
- Ethical standard
  defensibility (includes social policy, i.e. good/bad impacts)
- Legal standard
  legality
- Economic standard
  feasibility
- Political standard
  acceptability (includes ‘face validity’)