Scoring & Decision Making DeShon - 2005

Scoring Overview
- Once you have administered the test and cleaned the data…
- What number is used to represent the person on the latent variable of interest?
- What’s the right answer?
  - Empirical vs. rational keying
- Summing responses
  - Number correct, number endorsed, number checked
  - Weighted summing, non-unique summing
- Corrections for artifacts (lie scales)
- Forming composites of subscales
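The two summing approaches named above can be sketched as follows. This is a minimal illustration, not a scoring routine from the slides: the function names, the answer key, and the weights are all hypothetical.

```python
from typing import Sequence


def number_correct(responses: Sequence[str], key: Sequence[str]) -> int:
    """Unit-weighted score: count the responses that match the key."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(r == k for r, k in zip(responses, key))


def weighted_sum(item_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted score: each item score is multiplied by its (assumed) weight."""
    if len(item_scores) != len(weights):
        raise ValueError("item_scores and weights must have the same length")
    return sum(s * w for s, w in zip(item_scores, weights))


# Hypothetical four-item multiple-choice test
print(number_correct(["a", "c", "b", "d"], ["a", "b", "b", "d"]))
# Hypothetical item scores (1 = correct) with illustrative weights
print(weighted_sum([1, 0, 1], [2.0, 1.0, 0.5]))
```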

Correct/Incorrect Measures
- How do you determine correct?
- Rational Keying
  - Experts agree on the right answer
  - Find right answers in authoritative texts on the topic
- Empirical Keying
  - Compare correlation of item response alternatives to a criterion of interest
  - Compare existing groups and find items that discriminate between the groups (discriminant-groups validity model)
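The first empirical keying approach, correlating a response alternative with a criterion, might look like the sketch below. It assumes each alternative is coded 1 (chosen) or 0 (not chosen), so the Pearson correlation is a point-biserial correlation. The data and function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    if ssx == 0 or ssy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cov / math.sqrt(ssx * ssy)


# Hypothetical data: whether four people chose a given alternative,
# and their scores on the criterion of interest.
chose_alternative = [1, 1, 0, 0]
criterion = [4, 3, 2, 1]
r = pearson(chose_alternative, criterion)
print(round(r, 3))
```

An alternative with a strong positive correlation would be a candidate for keying. How strong is strong enough is a judgment the slides do not specify.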

Correct/Incorrect Measures
- Scoring algorithms for scoring items and constructing scales from item responses are often not disclosed
- Why?
- Item scores matter
- Scale construction routines are largely irrelevant unless you must base your interpretation on existing norms

Interpretation and Decision Making
- Once you have scores, how do you interpret test scores and use them for decision making?
- Ranking/Top-Down decision making
- Banding
- Cut scores
- Norms
  - Z-scores, T-scores, percentiles

Interpretation and Decision Making
- Top-Down/Ranking is very common
- Decisions based on relative standing in the distribution of test scores
- Higher scores mean more of the trait
- Hard to demonstrate that higher scores mean higher standing on the latent trait if there is much error in the scores

Banding
- Set up ranges of the test scores that are distinguishable based on the standard error of the difference
- Then select candidates at random or using some other criterion (seniority) within the band
- Fixed bands
- Sliding bands
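A rough sketch of fixed banding follows. The slides do not give the formulas, so this assumes a common formulation: the standard error of measurement is SD * sqrt(1 - reliability), the standard error of the difference is SEM * sqrt(2), and the band width is 1.96 times that value. The scores, SD, and reliability are hypothetical.

```python
import math
from typing import List, Sequence


def band_width(sd: float, reliability: float, z: float = 1.96) -> float:
    """Band width = z * SED, with SED = SEM * sqrt(2) (assumed formulation)."""
    sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
    sed = sem * math.sqrt(2)               # standard error of the difference
    return z * sed


def fixed_bands(scores: Sequence[float], width: float) -> List[List[float]]:
    """Group scores into fixed bands, starting from the top score.

    Scores closer than `width` to the top of a band are treated as
    indistinguishable from it; the next band starts at the highest
    remaining score.
    """
    remaining = sorted(scores, reverse=True)
    bands = []
    while remaining:
        top = remaining[0]
        band = [s for s in remaining if s > top - width]
        bands.append(band)
        remaining = remaining[len(band):]
    return bands


width = band_width(sd=10, reliability=0.9)
print(round(width, 2))
print(fixed_bands([95, 90, 88, 85, 80, 70], width))
```

A sliding band would instead recompute the top of the band each time the current top scorer is selected. That variant is not shown here.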

Criterion-Referenced Measures
- Develop a cutoff and the meaning of scores is based on standing relative to the cut score
- Pass/fail
- Usually used for knowledge and achievement tests
- Many methods available for computing cut scores
  - Ebel
  - Nedelsky
  - Angoff

Angoff Method
- Subject Matter Experts (SMEs) evaluate all items and estimate the probability that a minimally qualified person would get the item right
- The average of the item scores is the cut score for the exam.
- A bit more complex than this….
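The basic Angoff calculation can be sketched as below, under the simplifying assumption that every SME rates every item and that ratings are averaged without any adjustment. The ratings are hypothetical, and, as the slide notes, operational procedures are more involved.

```python
from statistics import mean
from typing import Sequence, Tuple


def angoff_cut_score(ratings: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (cut score as a proportion, cut score in raw-score units).

    ratings[s][i] is SME s's estimated probability that a minimally
    qualified person answers item i correctly.
    """
    n_items = len(ratings[0])
    if any(len(sme) != n_items for sme in ratings):
        raise ValueError("every SME must rate every item")
    # Average across SMEs for each item, then average across items.
    item_means = [mean(sme[i] for sme in ratings) for i in range(n_items)]
    proportion = mean(item_means)
    return proportion, proportion * n_items


# Two hypothetical SMEs rating a four-item exam
ratings = [
    [0.8, 0.6, 0.5, 0.9],
    [0.7, 0.5, 0.6, 0.8],
]
proportion, raw = angoff_cut_score(ratings)
print(round(proportion, 3), round(raw, 2))
```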

Norms
- Raw scores of psychological tests usually have little inherent meaning
- For normative tests, meaning is derived by comparing scores to other individuals (e.g., other members of a sample or a normative sample)
  - Percentiles
  - Z scores
  - T scores
- Representativeness of the norming sample is crucial!

Norms - Percentiles
- Percentile: relative position in the sample or reference group
- Percentile rank: percentage of people that earned a raw score lower than the given score
- Percentage of persons, not items
- Example: GRE scores
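The percentile rank definition above translates directly into code. This sketch uses the strict "lower than" definition from the slide; other definitions that count half of the tied scores also exist. The reference scores are hypothetical.

```python
from typing import Sequence


def percentile_rank(score: float, reference: Sequence[float]) -> float:
    """Percentage of people in the reference group scoring below `score`."""
    if not reference:
        raise ValueError("reference group must not be empty")
    below = sum(x < score for x in reference)
    return 100.0 * below / len(reference)


# Hypothetical reference group of five raw scores
reference = [10, 20, 20, 30, 40]
print(percentile_rank(30, reference))
```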

Norms – Z scores
- Expresses distance of score from the mean in SD units
- Advantages of standard scores
  - Includes information about the person’s standing in the distribution (i.e., percentile rank)
  - Allows comparisons across tests that have different raw metrics

Norms: T scores
- T scores are linear transformations of Z scores
- T score = (Z score * 10) + 50
- Mean = 50, SD = 10
- If scores are normally distributed, nearly all T scores will fall between 20 and 80
- Why? Easier for lay audiences to interpret
- For Z scores, half the scores are negative.
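As a small worked example, the sketch below converts raw scores to Z scores and then to T scores using the formula above. It assumes the mean and SD come from the same sample and uses the population SD; a norming study might use the sample SD or published norms instead. The raw scores are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_scores(raw: Sequence[float]) -> List[float]:
    """Distance of each raw score from the mean, in SD units."""
    m = mean(raw)
    sd = pstdev(raw)  # population SD (an assumption for this sketch)
    if sd == 0:
        raise ValueError("Z scores are undefined when all scores are equal")
    return [(x - m) / sd for x in raw]


def t_score(z: float) -> float:
    """T score = (Z score * 10) + 50."""
    return z * 10 + 50


raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores: mean 5, SD 2
for x, z in zip(raw, z_scores(raw)):
    print(x, z, t_score(z))
```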

Comparison of Norms

Z score   T score   Percentile rank
   3        80         99.9
   2        70         97.5
   1        60         84
   0        50         50
  -1        40         16
  -2        30         2.5
  -3        20         0.1
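The percentile ranks in this comparison assume a normal distribution, and they can be approximated from Z scores with the normal cumulative distribution function. The sketch below uses `math.erf` from the standard library. Exact values for Z = 2 and Z = -2 are closer to 97.7 and 2.3; the 97.5 and 2.5 shown in the comparison are the usual rounded figures.

```python
import math


def percentile_from_z(z: float) -> float:
    """Percentile rank implied by a Z score under a normal distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def t_from_z(z: float) -> float:
    """T score = (Z score * 10) + 50."""
    return z * 10 + 50


for z in (3, 2, 1, 0, -1, -2, -3):
    print(z, t_from_z(z), round(percentile_from_z(z), 1))
```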

Example: MMPI-2
- Designed for routine diagnostic assessments
- Most frequently used personality test in the US for adults and adolescents
- Empirical keying approach
- 567 true/false items
- 10 clinical scales plus validity scales
- Original Norms
  - 724 Minnesota “normals” and 221 psychiatric patients
- Revised Norms
  - 2600 U.S. residents aged 18-90 (census derived)

Example: MMPI-2
- Empirical/Criterion keying
  - Identify a criterion group (e.g., people diagnosed with schizophrenia)
  - Identify a comparison group (e.g., persons with no mental illness)
  - Administer many, many test items to both groups
  - Identify a group of items that discriminates the two groups, i.e., items endorsed more frequently by the criterion group
  - This group of items becomes the schizophrenia scale
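The criterion keying steps above can be sketched as a comparison of endorsement rates. This is an illustration only: the minimum rate difference of 0.25 is a hypothetical threshold, the data are invented, and actual scale development relied on statistical tests and cross-validation rather than a simple cutoff.

```python
from typing import List, Sequence


def endorsement_rates(group: Sequence[Sequence[int]]) -> List[float]:
    """Proportion of a group endorsing each item (1 = endorsed, 0 = not)."""
    n_items = len(group[0])
    return [sum(person[i] for person in group) / len(group) for i in range(n_items)]


def select_items(criterion: Sequence[Sequence[int]],
                 comparison: Sequence[Sequence[int]],
                 min_diff: float = 0.25) -> List[int]:
    """Indices of items endorsed more often by the criterion group.

    min_diff is a hypothetical threshold on the difference in rates.
    """
    crit = endorsement_rates(criterion)
    comp = endorsement_rates(comparison)
    return [i for i, (a, b) in enumerate(zip(crit, comp)) if a - b >= min_diff]


def scale_score(person: Sequence[int], items: Sequence[int]) -> int:
    """Number of keyed items the person endorsed."""
    return sum(person[i] for i in items)


# Hypothetical responses: four people per group, three items
criterion_group = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [1, 0, 1]]
comparison_group = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
keyed = select_items(criterion_group, comparison_group)
print(keyed, scale_score([1, 1, 0], keyed))
```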

Example: MMPI-2
- Resulting scales are a “mixed bag” of items with generally undesirable measurement properties
  - Scales have heterogeneous item content
  - Often multi-dimensional
  - Item overlap across scales
  - Adds to complexity of interpretation
- But still appears to have practical use

Example: MMPI-2
- Administered individually or in groups
- Administration time is approximately 1 to 1.5 hours
- Scored by hand or computer
- Separate scoring keys by gender

Example: MMPI-2
- Validity Scales
- ? Scale (Cannot Say)
  - Number of items left unanswered
  - If 30 or more items are left unanswered the protocol is invalid
- F scale (Infrequency)
  - 66 items
  - Atypical or deviant response style
  - Endorsed by less than 10% of the population
  - General indicator of pathology or “faking bad.”
  - Extreme elevations indicate invalid profile (100 or higher)
  - No exact cutoff for suspecting an invalid profile
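The Cannot Say rule is simple enough to express directly. In this sketch an unanswered item is represented as `None`, which is an assumption about the data coding and not something the slides specify.

```python
from typing import Optional, Sequence

CANNOT_SAY_LIMIT = 30  # from the slide: 30 or more unanswered items invalidates the protocol


def cannot_say_count(responses: Sequence[Optional[bool]]) -> int:
    """Number of items left unanswered (coded as None in this sketch)."""
    return sum(r is None for r in responses)


def protocol_is_invalid(responses: Sequence[Optional[bool]]) -> bool:
    """Apply the Cannot Say rule to a full set of responses."""
    return cannot_say_count(responses) >= CANNOT_SAY_LIMIT


# Hypothetical protocol: 567 items, 30 of them left blank
responses = [True] * 537 + [None] * 30
print(cannot_say_count(responses), protocol_is_invalid(responses))
```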

Example: MMPI-2
- Example Items for F-scale
  - My father is a good man. (F)
  - My teachers have it in for me. (T)
  - I am troubled by attacks of nausea and vomiting. (T)
  - Evil spirits possess me at times. (T)
  - My parents do not really love me. (T)
  - I am liked by most people who know me. (F)
  - There is something wrong with my mind. (T)
  - I think school is a waste of time. (T)
  - I get anxious and upset when I have to make a short trip away from home. (T)
  - I have gotten many beatings. (T)

Example: MMPI-2
- Validity Scales
- Lie (L) Scale (15 items)
  - Extent to which client is “faking good” or describing self in an overly positive manner
  - Uneducated, lower-SES respondents will score higher
  - Average number of endorsed items is 3
  - T scores of 65 or above are suspect and indicate profile should not be interpreted
  - High scores may lead to lower scores on clinical scales

Example: MMPI-2
- Example Items for L scale
  - Once in a while I think about things too bad to talk about.
  - At times I feel like swearing.
  - I do not always tell the truth.
  - I do not read every editorial in the newspaper every day.
  - Once in a while I put off until tomorrow what I ought to do today.
  - My table manners are not quite as good at home as when I am out in company.

Example: MMPI-2
- Validity Scales
- K scale (30 items)
  - More subtle and sophisticated index of “faking good” or “faking bad”
  - T scores above 65 or 70 are higher than expected
  - Higher scores indicative of ego defensiveness and guardedness
  - K correction is added to five of the clinical scales
- And many more…
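A sketch of how a K correction could be applied to raw clinical scale scores follows. The slides do not list the weights; the fractions below (0.5K for Hs, 0.4K for Pd, 1K for Pt, 1K for Sc, 0.2K for Ma) are the commonly published values and should be treated as an assumption here. In practice the fractional values are rounded using a lookup table on the profile sheet, which this sketch omits. The raw scores are hypothetical.

```python
from typing import Dict

# Assumed K-correction weights (not stated in the slides)
K_WEIGHTS: Dict[str, float] = {"Hs": 0.5, "Pd": 0.4, "Pt": 1.0, "Sc": 1.0, "Ma": 0.2}


def apply_k_correction(raw_scores: Dict[str, float], k_raw: float) -> Dict[str, float]:
    """Add the weighted K raw score to the corrected scales; leave others as is."""
    return {
        scale: score + K_WEIGHTS.get(scale, 0.0) * k_raw
        for scale, score in raw_scores.items()
    }


# Hypothetical raw scores on six scales, with a K raw score of 10
raw = {"Hs": 10, "D": 20, "Pd": 15, "Pt": 12, "Sc": 14, "Ma": 18}
print(apply_k_correction(raw, k_raw=10))
```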

Example: MMPI-2
- Example Items for the K scale
  - At times I feel like smashing things. (F)
  - I think a great many people exaggerate their misfortunes in order to gain sympathy and help of others. (F)
  - It takes a lot of argument to convince most people of the truth. (F)
  - I have very few quarrels with members of my family. (T)
  - Most people will use somewhat unfair means to get what they want. (F)
  - At times my thoughts have raced ahead faster than I could speak them. (F)
  - I get mad easily then get over it soon. (F)

Interpretation
- Yields individual’s clinical profile compared with the normative sample
- Interpretation is configural in nature and not dependent on any one scale
- T-score of 65 or higher is considered a clinically significant elevation for all clinical scales
- Clinical scales do not measure the low end; don’t interpret low scores except for Mf & Si
- Interpreted by qualified professionals
- Welsh Coding
  - Record the 10 numbers of the clinical scales in order of T scores, from the highest on the left to the lowest on the right
  - When adjacent scores are within one T score point, they are underlined. When they have the same T score, they are placed in the ordinal sequence found on the profile sheet and underlined
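The two Welsh coding rules described above can be sketched as follows. This is a simplified illustration: it covers only the ordering and underlining rules from the slide and omits the elevation symbols used in a full Welsh code. Scales are identified by their conventional numbers (1 to 9, then 0), which is taken here as the profile-sheet order. Underlined runs are returned as groups because plain text cannot underline. The T scores are hypothetical.

```python
from typing import Dict, List

# Assumed ordinal sequence of scale numbers on the profile sheet
PROFILE_ORDER = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]


def welsh_order(t_scores: Dict[str, int]) -> List[str]:
    """Scale numbers from highest to lowest T score; ties keep profile order."""
    return sorted(t_scores, key=lambda s: (-t_scores[s], PROFILE_ORDER.index(s)))


def underlined_runs(t_scores: Dict[str, int]) -> List[List[str]]:
    """Group adjacent scales whose T scores are within one point.

    Groups with more than one scale are the ones that would be underlined.
    """
    ordered = welsh_order(t_scores)
    if not ordered:
        return []
    runs = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if t_scores[prev] - t_scores[cur] <= 1:
            runs[-1].append(cur)
        else:
            runs.append([cur])
    return runs


# Hypothetical T scores keyed by scale number
t = {"1": 55, "2": 78, "3": 60, "4": 65, "5": 50,
     "6": 60, "7": 77, "8": 70, "9": 45, "0": 52}
print("".join(welsh_order(t)))
print(underlined_runs(t))
```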

Example: MMPI-2
