1 Lecture 4 CONSTRUCT VALIDITY
2 Validity A test is said to be VALID if it measures what it is supposed to measure.
3 Summary … There have been many different interpretations of validity. There are FOUR main approaches: 1. FACE VALIDITY; 2. CONTENT VALIDITY; 3. PREDICTIVE VALIDITY; 4. CONSTRUCT VALIDITY.
4 Tests in action Psychometric tests are now widely used in job selection. There, the emphasis is upon PREDICTIVE validity. I have 100 applications for three places on a course in electronics. Which applicant shall I choose? I know very little about any of the applicants. I have an hour or so to make a decision.
5 A valid test Fortunately, I have a test which enables me to predict success on the course. The test is highly reliable; moreover, there is a large body of data showing that those who do best on the test tend to perform best on the electronics course itself. My test is not only RELIABLE but also VALID.
6 Theory What exactly is the test measuring? Perhaps it doesn't really matter. It is simply an instrument I use to help select the right candidate. There is practical justification for saying, "This test measures whatever ability (or abilities) the course requires!"
7 Practice or theory? The usefulness of a test, that is, its PREDICTIVE VALIDITY, is improved by continuously modifying its items so that it meets STATISTICAL criteria. But the items that perform best may not seem theoretically to be the best measures of what the test was originally supposed to be measuring. Thus there can be a TENSION between considerations of psychometric PERFORMANCE and the building of sound THEORY.
8 History The mental testing movement received an enormous boost from the two world wars. New recruits had to be assigned at short notice to activities they could perform. Not everyone can be a navigator in a bomber crew, for example. In such circumstances, theoretical considerations about what exactly the tests were measuring seemed largely irrelevant, as long as they helped to assign the right person to the right job.
9 Methodology Cognitive psychology makes the greatest use of the EXPERIMENTAL METHOD, because that approach enables the researcher to identify the key variables. Psychometrics is an essentially CORRELATIONAL enterprise, and it is very difficult to identify crucial variables from correlational data. It is therefore difficult to map the results of psychometric research on to those of cognitive psychology.
10 4. Construct validity The extent to which a test can be shown to measure a hypothetical construct is known as its CONSTRUCT VALIDITY. Here the emphasis switches from PREDICTION to THEORY. Of the various kinds of validity, construct validity is by far the most difficult to demonstrate.
11 Demonstration of construct validity 1. Your test must CORRELATE substantially with SOME other variables (CONVERGENCE). 2. But your measure must also show DISSOCIATION from other variables (DIVERGENCE). 3. Where expected, your measure should also show AGE DIFFERENTIATION. Cognitive ability, for example, increases with age through childhood, and any supposed test of cognitive ability should reflect this developmental trend.
12 Field Dependence-Independence Witkin held that people vary on a hypothetical psychological dimension he called FIELD-DEPENDENCE-INDEPENDENCE. The field-independent person is supposed to be able to analyse the total field of experience into its component parts and manipulate the parts independently of the overall organisation in order to solve a variety of problems. This analytic capacity is claimed to be wide-ranging and to pervade most aspects of a person's mental life.
13 Witkin's tests I described three of Witkin's tests: 1. the Rod-and-Frame Test (RFT); 2. the Embedded Figures Test (EFT); 3. the Body Adjustment Test (BAT).
14 Convergence? The person who can adjust the rod to the true vertical (in the RFT) should be able to see the embedded figure (in the EFT). Such people should also be able to adjust their chairs to the upright position (BAT), despite the tilt of the walls of the artificial room.
15 Convergence … Since they are supposed to be measuring the same hypothetical construct (field-dependence-independence), Witkin's tests should certainly correlate highly with one another. Since they are cognitive tests, however, they could also be expected to correlate positively with at least SOME of the abilities that are required for performance on an intelligence test.
16 Witkin's evidence Witkin (and many others) have shown that there are indeed substantial positive CORRELATIONS among the EFT, BAT and RFT. The person who adjusts the rod to the true vertical can also make the chair upright and quickly spot the embedded figures. The person who cannot spot the embedded figure insists that the rod is vertical when it is actually aligned with the long axis of the frame, and claims that a chair is truly upright when it is actually aligned with the tilted room.
17 Convergent validation Each of the three measures correlates significantly and substantially with the other two. The correlations in the table below are typical of those found in many studies by many different teams of researchers. The criterion of CONVERGENCE is met by Witkin's tests.
18 Just intelligence? Witkin's measures correlate positively with Full Scale WAIS IQ. For example, one study (Witkin, 1965) showed that EFT scores and WAIS IQ correlated significantly: r(72) = .36; p < .01. Is Witkin's hypothetical construct simply INTELLIGENCE? Is there really a separate dimension of Field-Dependence-Independence? To make his case, Witkin must also show theoretically meaningful DISSOCIATION, or DIVERGENCE, of his measures from other cognitive activities.
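One way to check that a correlation such as r(72) = .36 is significant at p < .01 is Fisher's z-transformation, which turns r into an approximately normal statistic. The Python sketch below is an approximation (an exact test would use the t distribution), and the function name is hypothetical. It assumes the number in brackets is the degrees of freedom, so the number of pairs is n = df + 2; if the bracketed number is the sample size, pass it directly as n.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for a Pearson correlation r from n pairs.

    Fisher's z-transformation: atanh(r) * sqrt(n - 3) is roughly standard
    normal when the true (population) correlation is zero.
    """
    if n <= 3:
        raise ValueError("need more than three pairs of scores")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-tailed p for a standard normal statistic: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# EFT and WAIS IQ, r(72) = .36, assuming 72 is the degrees of freedom (n = 74).
p = correlation_p_value(0.36, 72 + 2)
print(f"p = {p:.4f}")
```

If the prediction is directional (a positive correlation is expected), a one-tailed p is half the two-tailed value.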
20 The analytical subgroup Consider three WAIS subtests: Block Design, Picture Arrangement and Object Assembly. According to Witkin, these three tests all require the participant to analyse the field into its component parts and reassemble them to solve the problem. This is not true of other subtests, such as Vocabulary, Comprehension or Digit Span. Witkin therefore predicted that the EFT should correlate highly with the tests in the analytical subgroup, but not significantly with the other WAIS subtests.
21 Divergence The EFT does indeed correlate highly with the Kohs blocks, from the analytical subgroup. But its correlation with non-analytic items such as Vocabulary is small and non-significant. Witkin has demonstrated the DIVERGENCE he needs to establish the CONSTRUCT VALIDITY of his tests as measures of a distinct dimension of cognition.
22 Construct validity of Witkin's tests Witkin has made a cogent case for the construct validity of his tests of field-dependence-independence. There is CONVERGENCE: the tests correlate substantially among themselves, and they also correlate significantly with IQ, as they should. But there is also DIVERGENCE: the tests correlate strongly with the analytical subgroup of WAIS tests, but they do not correlate with non-analytic items such as Vocabulary and Arithmetic.
23 Nonverbal working memory In the first lecture, I described two measures of non-verbal working memory: 1. the Corsi Blocks Test; 2. the Visual Patterns Test.
24 The Corsi and Visual spans The Corsi Span is the length of the longest sequence of tapped blocks that the participant can correctly reproduce. The Visual Span is the size of the largest pattern that the participant can correctly reproduce.
25 The Visual Patterns Test: Does it have construct validity? It is claimed that the Visual Patterns Test measures visual storage in a purer form than the Corsi Blocks Test, which measures visual plus spatial working memory. But could both tests be measuring the same functions?
26 Convergence The VP and CB tests should correlate positively and significantly. But, since the CB taps more than visual memory, the correlation should be far from perfect. This is, in fact, the case. There is a significant correlation between the VP and CB tests: r(74) = .27; p < .01. This value of r is similar to the correlation between Field-Dependence-Independence and IQ: although significant, it is suitably small. This correlation accounts for less than 10% of the variance (coefficient of determination, CD = r² = .27² ≈ .07).
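The coefficient of determination is simply the square of r: the proportion of variance that two measures share. A minimal Python sketch of the arithmetic for the two correlations quoted in this lecture (the function name is illustrative):

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination: proportion of variance shared by two measures."""
    return r ** 2


for label, r in [("Visual Patterns vs Corsi", 0.27), ("EFT vs WAIS IQ", 0.36)]:
    share = variance_explained(r)
    print(f"{label}: r = {r:.2f}, r^2 = {share:.4f} ({share:.1%} of the variance)")
```

Because r is squared, a modest-looking correlation shares much less variance than its size suggests: doubling r quadruples r².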
27 Divergence The claim is that the Corsi and Patterns tests are not measuring the same functions. If we can manipulate a theoretically relevant variable and demonstrate differential effects upon the Corsi and Pattern spans, we shall have produced evidence to confirm this claim.
28 An experiment Della Sala, S., Gray, C., Baddeley, A., Allamano, N., & Wilson, L. (1999) Pattern span: A tool for unwelding visuo-spatial memory. Neuropsychologia, 37,
29 The experiment First, we obtained the Corsi and Visual Patterns spans. Next, the participants performed an interference task. Finally, the Corsi and Visual Patterns spans were redetermined. As expected, the new spans were shorter, as a result of the interference.
30 Interference tasks There were two kinds of interference task: 1. visual; 2. spatial. We should find that visual interference has a greater effect upon the Visual Patterns span, but spatial interference should have more effect upon the Corsi span.
31 A graph showing the differential effects of interference [Graph: Visual Patterns and Corsi Blocks spans under visual and spatial interference]
32 The dissociation pattern Visual interference has a much greater shortening effect upon the Pattern span than upon the Corsi span. Spatial interference has a much greater shortening effect upon the Corsi span than upon the Pattern span. Such DIVERGENCE supports the claim that the Patterns and Corsi tests measure different kinds of nonverbal working memory.
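The logic of this double dissociation can be written down as a check on span decrements. In the Python sketch below, the helper names and all span values are hypothetical, made up for illustration; they are not the data of Della Sala et al. (1999). It compares raw decrements (baseline span minus span after interference).

```python
from typing import Dict

# Spans keyed by test: {"patterns": ..., "corsi": ...}
Spans = Dict[str, float]


def interference_cost(baseline: Spans, after: Spans) -> Spans:
    """Shortening of each span caused by an interference task (baseline minus post-interference)."""
    return {test: baseline[test] - after[test] for test in baseline}


def is_double_dissociation(visual_cost: Spans, spatial_cost: Spans) -> bool:
    """True if visual interference shortens Patterns more than Corsi,
    and spatial interference shortens Corsi more than Patterns."""
    return (visual_cost["patterns"] > visual_cost["corsi"]
            and spatial_cost["corsi"] > spatial_cost["patterns"])


# Made-up spans for illustration only (NOT the published data).
baseline = {"patterns": 9.0, "corsi": 6.0}
after_visual = {"patterns": 7.0, "corsi": 5.5}
after_spatial = {"patterns": 8.5, "corsi": 4.5}

visual_cost = interference_cost(baseline, after_visual)
spatial_cost = interference_cost(baseline, after_spatial)
print(is_double_dissociation(visual_cost, spatial_cost))
```

Since the two spans are measured on different scales, a real analysis might compare proportional losses (decrement divided by baseline) rather than raw decrements; the lecture does not say how the effects were scaled.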
33 Age differentiation If a test is supposed to measure a cognitive function, performance on the test should show a typical age trajectory. The Visual Patterns test does indeed show the expected decline from early adulthood: r(345) = -.55; p < .01. The Corsi Blocks test also shows a similarly substantial negative correlation with age.
34 The Colours Test Psychological tests are widely used in industry. The test I am about to describe is used in the oil industry to help assign an employee to the team role for which he is best suited. The attributes supposedly measured by the test are letter- and colour-coded, and the management take note of the colour codes when assigning employees to team projects.
35 Four team functions A (RED). Directing and leading. B (YELLOW). Sociability. C (BLUE). Troubleshooting. D (GREEN). Thinking and planning.
36 The Test Instrument The response sheet has 28 boxes to be completed. In each box, circle the response that you are most like and the response that you are least like. (Your instinctive response is probably the most accurate. First thoughts are best, here. So try to answer the questions quickly.)
38 Analysis Transfer your difference scores to this sheet. Draw a line through the scores. The highest values on the page are your dominant colours. This person's dominant colours are A and D (Red and Green): this person is a leader and a planner.
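Reading off the dominant colours amounts to ranking the four difference scores. The Python sketch below takes the difference scores as given (how they are computed from the response sheet is not shown in this transcript). The function name, the scores and the choice of two dominant colours (taken from the example on this slide) are assumptions, not documented rules of the test.

```python
from typing import Dict, List

COLOUR_NAMES = {"A": "Red", "B": "Yellow", "C": "Blue", "D": "Green"}


def dominant_colours(difference_scores: Dict[str, int], top_k: int = 2) -> List[str]:
    """Return the letters with the highest difference scores, highest first.

    top_k = 2 mirrors the slide's example (two dominant colours); it is an
    assumption, not a documented rule of the Colours Test.
    """
    ranked = sorted(difference_scores, key=lambda letter: difference_scores[letter], reverse=True)
    return ranked[:top_k]


# Hypothetical difference scores for one respondent.
scores = {"A": 9, "B": -4, "C": 1, "D": 6}
print([f"{letter} ({COLOUR_NAMES[letter]})" for letter in dominant_colours(scores)])
```

`sorted` is stable, so tied scores keep their A-to-D order; a practitioner might prefer to report ties explicitly.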
40 A reliability study I have carried out an informal investigation of the test-retest reliability of the Colours Test. I gave the Colours Test twice to this class, leaving a week between the two sessions. I obtained sixty-one pairs of responses.
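Test-retest reliability is the Pearson correlation between each person's score at the first session and their score at the second. A minimal Python sketch of that calculation; the function name and the five pairs of scores are hypothetical, not data from this class:

```python
import math
from typing import Sequence


def pearson_r(first: Sequence[float], second: Sequence[float]) -> float:
    """Pearson correlation between paired scores (e.g. session 1 vs session 2)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long sequences of at least two scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    # Sum of cross-products and sums of squares about each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    sxx = sum((x - mean_x) ** 2 for x in first)
    syy = sum((y - mean_y) ** 2 for y in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores with zero variance have no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Directing (A) difference scores for five students, one week apart.
week_1 = [8, 3, -2, 5, 0]
week_2 = [7, 4, -1, 5, 1]
print(f"test-retest r = {pearson_r(week_1, week_2):.2f}")
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient.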
41 Preliminary analysis The profiles are based on the four difference scores. Here is the test-retest reliability for each of these four measures.
42 Directing (A; Red) The scatterplot is a narrow ellipse. There should be a very high correlation. Indeed there is: r(61) = .90; p < .01. This level of reliability is very acceptable.
43 Thinking (D; Green) The scatterplot is a narrow ellipse. The correlation should be high. It is high: r(61) = .85; p < .01. This level of reliability is also very acceptable.
44 Relating (C; Blue) The scatterplot is a narrow ellipse. The correlation should be high. It is: r(61) = .83; p < .01. This level of reliability is very acceptable.
45 Sociability (B; Yellow) This time the scatterplot is messier: there are some outliers. We cannot expect the value of r to be so high. Indeed, it is not: r(61) = .76; p < .01. This level of reliability is just acceptable.
46 Appraisal The Colours Test would appear to be reliable, at least when used with Level 2 students at this university. What is needed is another (larger) study with oil workers. THE NORMS FOR A TEST SHOULD ALWAYS BE GATHERED FROM THE POPULATION IN WHICH THE TEST IS TO BE USED.
47 Appraisal … On the basis of the evidence we have, the test appears to be reliable. But is it also VALID? Do the PROFILES match up with the employees' ACTUAL PERFORMANCE in the team roles to which they have been assigned? Managers think they do, but the validity of the Colours Test has yet to be confirmed statistically.
48 Summary A test is VALID if it measures what it is supposed to measure. This simple definition, however, is open to a variety of interpretations. Today, I have considered CONSTRUCT VALIDITY, the kind of validity that is the most problematic of all. To demonstrate the construct validity of a test, the researcher must show, not only that the test correlates with the right variables, but also that it dissociates from the wrong ones. These two essential properties are known as CONVERGENCE and DIVERGENCE.
49 Summary … Witkin's tests of Field-Dependence-Independence show convergence with other analytical cognitive tests and dissociation from non-analytical tests. The Visual Patterns and Corsi tests of nonverbal working memory correlate to some extent (convergence), but the Corsi and Pattern spans are affected differentially by visual and spatial interference (divergence): visual interference shortens the Pattern span more, and spatial interference shortens the Corsi span more.
50 Practice question What is meant by the reliability and the validity of a psychological test? What is the relationship between these two properties? Describe one approach to the determination of validity.