
1 Measurement Concepts & Interpretation

2 Scores on tests can be interpreted:
By comparing a client to peers in the norm group to determine how different the client is from the norm group (inter-individual)
–Scores are provided in norm tables
–The score in the norm table usually indicates how the client compares with peers in the same age group or grade
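One common way to express "how different" a client is from the norm group is a standard (z) score. The sketch below assumes the norm table supplies a mean and a standard deviation for the client's age or grade group. The function name and the numbers are invented for illustration and do not come from any real test.

```python
def z_score(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Distance of a raw score from the norm-group mean, in standard deviation units."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw_score - norm_mean) / norm_sd


# Hypothetical norm group: mean 50, SD 10 (made-up values).
client_z = z_score(65, norm_mean=50, norm_sd=10)
print(client_z)  # 1.5, i.e. one and a half SDs above the norm-group mean
```

A positive value means the client scored above the norm-group mean, a negative value below it.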

3 Interpretation, cont.
Comparing a client with his or her own performance (intra-individual)

4 Define:
Mean
Median
Mode
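As a quick refresher, the three terms can be computed with Python's standard `statistics` module. The scores are made up for illustration.

```python
from statistics import mean, median, mode

scores = [85, 90, 90, 100, 110]  # made-up test scores

print(mean(scores))    # arithmetic average: 95
print(median(scores))  # middle value of the sorted scores: 90
print(mode(scores))    # most frequent value: 90
```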

5 So you wanna use psychological tests…
Um, CAREFULLY review the test manual
Consider these aspects:
–Theoretical Orientation of test/instrument
–Practical Considerations
–Standardization
–Reliability
–Validity
Gary Groth-Marnat, 2003

6 Theoretical Orientation
Do you adequately understand the theoretical construct the test is supposed to be measuring?
–If not, do some research.
Do the test items correspond to the theoretical description of the construct?
–Usually manuals provide individual analyses of the items…are the items relevant?

7 Practical Considerations
If reading is required by the examinee, does his or her ability match the level required by the test?
–Tests vary in terms of the level of education they require
How appropriate is the length of the test?
–Some are too damn long and who likes that?
You can always get additional training for some tests so you become Über good at it.

8 Standardization (adequacy of norms)
Is the population to be tested similar to the population the test was standardized on?
Was the size of the standardization sample adequate?
Have specialized subgroup norms been established?
How adequately do the instructions permit standardized administration?

9 Norms!

10 Reliability
The reliability of a test refers to its degree of stability, consistency, predictability, and accuracy
Are reliability estimates sufficiently high? (correlations generally around .90 for clinical decision making and around .70 for research purposes)
What implications do the relative stability of the trait, the method of estimating reliability, and the test format have on reliability?

11 You tell me…
Test-Retest Reliability
–The reliability coefficient is calculated by correlating the scores obtained by the same person on two different administrations.
Alternate Forms
–Trait is measured several times on the same individual by using parallel/alternate forms of the test – the different measurements should produce similar results
Split-Half Reliability
–Test only given once (items split in half…and the two halves are correlated)
Interscorer Reliability
–When scoring is based partially on the judgment of the examiner (e.g., Rorschach), one client's responses are scored by two people and their scores are compared
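Test-retest and split-half reliability both come down to a correlation coefficient. A minimal sketch follows, assuming Pearson's r as the coefficient and an odd/even item split for the two halves (the slide only says "split in half"). The helper name `pearson_r` and all data are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(var_x * var_y)


# Test-retest: the same five people on two administrations (made-up scores).
first_admin = [10, 12, 15, 18, 20]
second_admin = [11, 12, 16, 17, 21]
test_retest_r = pearson_r(first_admin, second_admin)

# Split-half: one administration; each row holds one person's item scores.
item_scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
]
odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6
split_half_r = pearson_r(odd_half, even_half)

print(round(test_retest_r, 2), round(split_half_r, 2))
```

Real samples would of course be far larger than five people; the tiny lists here only show the mechanics.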

12 All tests have a degree of error
The inevitable, natural variation in human performance
–Measures of ability usually have less variability than measures of personality…why?
Psychological testing methods are necessarily imprecise
–Constructs in psychology are measured indirectly

13 Standard Error of Measurement
Test scores consist of both truth and error
SEM provides a range to indicate how extensive that error is likely to be
–The higher the reliability, the narrower the range of error
The SEM is a standard deviation score.
–A SEM of 3 on an IQ test would indicate that an individual's score has a 68% chance of being within +/-3 IQ points of the estimated true score – refer back to the normal distribution curve
–The SEM is a statistical index of how a person's repeated scores on a specific test would fall around a normal distribution (the range it defines around a score is also referred to as a confidence interval)
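The slide does not give a formula, so the sketch below assumes the usual classical test theory one: SEM = SD × √(1 − reliability). The function names and the SD of 15 with a reliability of .96 are illustrative choices, picked because they reproduce the SEM of 3 from the IQ example.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM under the classical formula SD * sqrt(1 - reliability) (an assumption here)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def score_band(score: float, sem: float, z: float = 1.0):
    """Range of +/- z SEMs around a score; z=1 gives the roughly 68% band."""
    return score - z * sem, score + z * sem


sem = standard_error_of_measurement(sd=15, reliability=0.96)
low, high = score_band(100, sem)
print(round(sem, 2), round(low, 1), round(high, 1))  # 3.0 97.0 103.0
```

Dropping the reliability to .90 with the same SD gives a larger SEM, which is the "higher reliability, narrower range" point in numbers.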

14 Validity
Whereas reliability addresses issues of consistency, validity assesses what the test is supposed to be accurate about.
What criteria and procedures were used to validate the test?
Will the test produce accurate measurements in the context and for the purpose for which you would like to use it?
–A psychological test is not valid in any abstract or absolute sense. It must be valid in a particular CONTEXT and for a specific group of people.

15 Face validity
Face validity is present if the test looks good to the persons taking it, the policymakers who decide to include it in their programs, and to other untrained personnel.

16 Criterion validity
Concurrent validity
–Measurements taken at the same, or approximately the same, time as the test
–Concurrent validation is preferable if an assessment of the client's current status is required
Predictive validity
–Outside measurements that were taken some time after the test scores were derived. For example, the predictive validity may be evaluated by correlating test scores with other scores from similar measures a year after the initial testing

17 Construct Validity
The extent to which the test measures a theoretical construct or trait
–First, the trait must be carefully analyzed
–Consider the ways in which the trait should relate to other variables
–Test the hypothesized relationships
Does the test converge with variables that are theoretically similar to it?
Does it discriminate from variables that are dissimilar to it?

18 Incremental validity
For a test to be considered useful and efficient, it must be able to produce accurate results above and beyond the results that could be obtained with greater ease and less expense
Hey, self-assessments are pretty handy!

19 Beck Depression Inventory II (BDI-II)
Add up the score for each of the twenty-one questions and obtain the total. The highest score on each of the twenty-one questions is three, the highest possible total for the whole test is sixty-three. The lowest possible score for the whole test is zero. Only add one score per question (the highest rated if more than one is circled).
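The scoring rule above can be sketched as a small function. The name `bdi_total` and the input layout (one list of circled ratings per question) are assumptions made for illustration. The slide does not say how to treat an unanswered question, so this sketch simply rejects it.

```python
def bdi_total(responses):
    """Sum the highest circled rating (0-3) for each of the 21 questions."""
    if len(responses) != 21:
        raise ValueError("expected ratings for 21 questions")
    total = 0
    for circled in responses:
        if not circled:
            # Assumption: unanswered questions are not handled by the slide's rule.
            raise ValueError("each question needs at least one circled rating")
        if any(rating not in (0, 1, 2, 3) for rating in circled):
            raise ValueError("ratings must be 0, 1, 2 or 3")
        total += max(circled)  # only the highest rating counts per question
    return total


# Twenty questions rated 1, plus one question with both 1 and 2 circled.
responses = [[1]] * 20 + [[1, 2]]
print(bdi_total(responses))  # 22
```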

20 “So what does my BDI-II score mean?”
Below 4 = possible denial of depression, faking good
5-9 = these ups and downs are considered normal (i.e., suck it up)
10-18 = mild to moderate depression
19-29 = moderate to severe depression
30-63 = severe depression
Over 44 = pretty damn high even for severely depressed persons; possible exaggeration of symptoms
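The cutoffs above translate into a simple lookup. The function name `interpret_bdi` is hypothetical and the band labels are shortened from the slide. Note that the slide assigns no band to a total of exactly 4, so the sketch returns `None` for it rather than guessing.

```python
def interpret_bdi(total: int):
    """Return (band, possible_exaggeration) for a BDI-II total, using the slide's cutoffs."""
    if not 0 <= total <= 63:
        raise ValueError("BDI-II totals range from 0 to 63")
    if total < 4:
        band = "possible denial of depression, faking good"
    elif 5 <= total <= 9:
        band = "ups and downs considered normal"
    elif 10 <= total <= 18:
        band = "mild to moderate depression"
    elif 19 <= total <= 29:
        band = "moderate to severe depression"
    elif total >= 30:
        band = "severe depression"
    else:
        band = None  # a total of exactly 4 is not assigned a band on the slide
    # Totals over 44 are additionally flagged as possible exaggeration of symptoms.
    return band, total > 44


print(interpret_bdi(22))  # ('moderate to severe depression', False)
```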

21 Same for BAI
0-21 = low anxiety
22-35 = moderate anxiety
36 and over = high anxiety, may be severe

