Testing in Clinical Psychology

Dr. Kline FSU-PC

What is a test? What do you think?

I. Tests: definitions
“A test is a systematic procedure for observing and describing a person’s behavior in a standard situation” (Cronbach, 1970; in Nietzel et al., 2003). In theory, tests should provide clinicians with an “accurate” measure of an individual’s ability, skill, talent, or knowledge base. In Clinical Psychology, tests are extremely useful in the assessment process, largely because they are more systematic and objective than clinical interviews.

II. Ways in which tests are distinct from other assessment methods:
1. A test can be administered in a non-social setting (while interviews are always conducted socially). Note: Some individuals with psychopathology can “fake” sanity during a clinical interview, but are “detected” by instruments like the MMPI & SCID, which leave little room for “savvy” replies to bias the results.
2. Standardized tests produce results that are compared with “normed” populations. This helps keep examiner bias from influencing testing or the results.
3. Tests can be administered individually or in groups (e.g., the GRE), so large numbers of people can be tested simultaneously.

III. What do tests measure?
Tests can be grouped into 4 distinct categories:
1. Intellectual functioning**
2. Personality characteristics**
3. Attitudes, interests, preferences, & values
4. Ability
**Tests most commonly used by Clinicians (for assessment, treatment, & research purposes). Because general level of intellectual functioning & personality are often influenced by psychopathology (e.g., schizophrenia, bipolar disorder), tests assessing these constructs are of significant interest to Clinical Psychologists.

IV. How do we construct tests?
There are three basic approaches to test construction. Which method is appropriate depends on a variety of factors.
1. Analytical Approach: Clinicians using this approach determine the content they think reflects the construct they want to measure & derive test items based on all facets of this construct.

Analytical contd.
What are the qualities I want to measure? How do I define these qualities? What kind of test items would make sense to use to measure these qualities?
E.g., using this method, how could we measure “motivation”?
1. We’d have to operationally define “motivation.” How do we measure it? Actions, verbal responses, etc.
2. We’d need to generate test items that reflect what we know or believe makes up motivational tendencies: “True or false: I’m a self-starter,” “True or false: I like to lead, not follow,” etc.
Problem with this method: The test items strongly reflect the tester's view of what concepts should be examined. This view may be inaccurate.

2. Empirical approach:
Instead of deciding in advance which content is suitable to assess a given construct, this method lets the data determine which items reflect the construct. For instance, instead of defining “motivation,” we could measure what people who have already been identified as “highly motivated” or “not motivated” do, feel, think, and so forth to see which items reflect motivation in people. This way, the researcher’s own bias about the concept of motivation does not determine how the construct is measured.
Problems with this method:
A. It is costly in terms of time & manpower.
B. It requires sampling significantly more people to identify “groups” of people who demonstrate high or low levels of the trait of interest.
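As a rough illustration of the empirical approach, the sketch below keeps only those candidate items whose endorsement rates differ between two criterion groups. The item labels, the answers, and the 0.4 cutoff are all hypothetical, made up for this example; real test construction uses far larger samples and formal statistical tests.

```python
def endorsement_rate(responses):
    """Proportion of people in a group who answered True to an item."""
    return sum(responses) / len(responses)


def select_items(high_group, low_group, min_difference=0.4):
    """Keep items whose endorsement rate differs between criterion groups.

    high_group and low_group map an item label to a list of True/False answers.
    min_difference is a hypothetical cutoff, not a published standard.
    """
    selected = []
    for item in high_group:
        gap = endorsement_rate(high_group[item]) - endorsement_rate(low_group[item])
        if abs(gap) >= min_difference:
            selected.append(item)
    return selected


# Hypothetical answers from people already identified as high or low in motivation.
high_motivation = {
    "self_starter": [True, True, True, False],
    "likes_coffee": [True, False, True, False],
}
low_motivation = {
    "self_starter": [False, False, True, False],
    "likes_coffee": [True, False, False, True],
}

kept = select_items(high_motivation, low_motivation)
print(kept)
```

Only the item that separates the two groups survives; an item both groups endorse equally often says nothing about the trait, however plausible it looked in advance.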

3. Sequential System Approach:
Combines both analytical & empirical approaches. Test items may be chosen with the analytical method, and the results then statistically analyzed to see which items are or are not correlated with one another, which are too difficult or too easy, & so forth. Or items may be generated empirically, with the final choice of which items to test made analytically.
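The statistical step can be sketched as a small item analysis. The code below computes each item's difficulty (proportion correct) and its correlation with the total of the remaining items. The score matrix is hypothetical, and the function names are chosen for illustration only.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Proportion of test takers who got the item right (1 = correct, 0 = wrong)."""
    return mean(item_scores)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    covariance = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return covariance / (pstdev(x) * pstdev(y))


def item_rest_correlation(scores, item_index):
    """Correlate one item with the summed score on all other items."""
    item = [row[item_index] for row in scores]
    rest = [sum(row) - row[item_index] for row in scores]
    return pearson_r(item, rest)


# Hypothetical data: each row is one person, each column is one item.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]

difficulties = [item_difficulty(column) for column in zip(*scores)]
correlations = [item_rest_correlation(scores, i) for i in range(3)]
print(difficulties)
print(correlations)
```

An item nearly everyone passes or fails, or one that barely correlates with the rest of the test, is a candidate for revision or removal. Note that `pearson_r` divides by zero if an item has no variance, so such items would need to be screened out first.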

V. Tests of Intellectual Functioning:
While intelligence testing has a long & dubious history, there is still no clearly adopted definition of what “constitutes” intelligence. “Mental testing” of intelligence, or the psychometric approach, describes intelligence as a general characteristic (called g), as a set of up to 150 specific intellectual functions (called s’s), or as some hierarchical combination of both. Clinicians use a variety of intelligence tests to measure specific aspects of intellectual functioning and compare the results with normed data. These tests are standardized in an attempt to rule out systematic biases based on gender, age, race, culture, & so forth. Note: Biases may still influence results.

A. Binet Scales
Alfred Binet devised an intelligence test for children in 1905 that consisted of 30 items & tasks. The total score was the number of items correct. Some of the tasks on Binet’s test required children to: unwrap a piece of candy; track a moving light with their eyes; compare objects of different weights; repeat #s & words from memory.
This primitive test was improved in 1908, when the tasks were graded by the age of the participants. This meant younger children were expected to get the easier items correct, while older children were expected to pass the more difficult items. Binet & Simon examined over 200 children and determined that at certain ages, children with normal functioning should be able to do certain things. For instance, 3-year-olds should be able to identify their body parts (eyes, nose, mouth) and repeat 2-digit #s, 6-syllable sentences, and their own name. Older children, like 7-year-olds, should be able to identify missing parts of a drawing, copy simple geometric shapes/figures, and identify coin denominations (Nietzel et al., 2003).

Additional revisions to the 1908 Binet scale:
A Stanford psychologist, Lewis Terman, revised Binet’s test so that both the mental & chronological age of the participant would be examined. Stanford-Binet results produced an intelligence quotient (IQ) calculated by the following formula:
IQ = (mental age (MA) / chronological age (CA)) × 100
Therefore, if a 5-year-old produces an MA score of 7, the IQ for this child would be 140. Terman also designated certain labels based on IQ ranges to reflect different types of general intellectual functioning. These labels today are listed as: “very superior,” “superior,” “high average,” “average,” “low average,” “borderline,” & “mentally retarded.”
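The ratio IQ formula can be checked with a few lines of code. This is a minimal sketch of the arithmetic only; the function name is chosen for illustration.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


# The 5-year-old with a mental age of 7 from the example above.
print(ratio_iq(7, 5))  # 140.0
# A child whose mental age equals his or her chronological age scores 100.
print(ratio_iq(6, 6))  # 100.0
```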

The Stanford-Binet
The most popular intelligence test in the US, it was revised several more times (in 1937, 1960, 1973, 1986). From 1960 on, IQ was no longer computed with the formula but read from tables in which the formula’s results were corrected for differences in variance across ages. Norms have been established for standardization. The 1986 edition groups test items into 15 subtests. Within each subtest, the items are arranged in ascending order of difficulty, and the subtests are organized into four areas of functioning: verbal reasoning, abstract/visual reasoning, quantitative reasoning, & short-term memory (STM).

Scoring for Stanford-Binet:
Standard age scores (SAS) are obtained for each subtest by using tables that convert raw scores to normalized standard scores with a mean of 50 & a standard deviation of 8. Therefore, an SAS of 58 means a child scored 1 standard deviation above the mean and did better than about 84% of his/her age peers on that subtest.
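The 84% figure follows from the normal curve. The sketch below converts a standard score to a percentile, assuming the scores are normally distributed (as normalized standard scores are meant to be). The function name is illustrative and not part of any test manual.

```python
import math


def percentile_from_standard_score(score, mean, sd):
    """Percent of a normal distribution falling below the given score."""
    z = (score - mean) / sd
    return 50 * (1 + math.erf(z / math.sqrt(2)))


# A subtest score of 58 on a scale with mean 50 and standard deviation 8 (z = +1).
print(round(percentile_from_standard_score(58, 50, 8), 1))  # 84.1
```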

How suitable is the Stanford-Binet for assessing children’s intelligence?
The Stanford-Binet appears to be very reliable for assessing children’s intelligence. It has high test-retest reliability (above .90) and high internal consistency. The test is also highly correlated with other measures of intelligence & appears to distinguish samples of gifted, retarded, & learning-disabled children (Nietzel et al., 2003).

B. Wechsler Scales
David Wechsler, a psychologist at Bellevue Psychiatric Hospital in New York (still famous today), developed an intelligence test for adults in 1939. This test, called the Wechsler-Bellevue (W-B) Intelligence Scale, differed from the Stanford-Binet in several ways:
1. It was for adults aged 17+.
2. It used a point scale, in which credit is given for each correct answer. Hence, IQ did not reflect the relationship between mental & chronological age, but a comparison of the points earned by the individual tested with those earned by many individuals of the same age.

Wechsler Adult Intelligence Scale: WAIS
Wechsler revised the W-B in 1955 & restandardized it to better reflect the ethnic makeup of the population. The WAIS comprised 6 verbal and 5 performance subtests, so one could calculate a Verbal IQ, a Performance IQ, and a Full-Scale IQ (a combination of verbal & performance). The test was revised & restandardized in 1981 and again in 1997 (the WAIS-III). This was done to make the test more reliable given the ethnic diversity of our population. In addition, because data were obtained on a sample of 2,450 people ranging in age from 16 to 89, the test can be administered to elderly individuals as well.

Types of test items on the WAIS-III:
Information (verbal): What is the capital of France?
Comprehension: Why do foreign cars cost more than domestic cars?
Arithmetic: If you have 4 apples & give 2 away, how many do you have left?
Similarities: Identify similar aspects of pairs like: hammer-screwdriver, dog-flower, portrait-short story.
Digit Symbol/Coding: Copy designs that are associated with different #s as quickly as possible.
Digit Span: Repeat 2- to 9-digit numbers in forward & reverse order.
Vocabulary: Define: chair, dime, lunch, valley, asylum, sanctuary.
Picture Completion: Find missing objects in increasingly complex pictures.
Block Design: Arrange blocks to match increasingly complex standard patterns.
Picture Arrangement: Place increasing #s of pictures together to make increasingly complex stories.
Symbol Search: Visually scan & recognize a series of symbols.

Scoring for the WAIS-III
To get Full Scale, Verbal, & Performance IQ scores for subjects, the individual’s total points correct for each subtest are converted to standardized IQ scores with a mean of 100 & a standard deviation of 15.
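To show how a mean-100, SD-15 scale relates to other standard-score scales, the sketch below moves a score from one scale to another through its z-score. This only illustrates the arithmetic behind standard scores; actual WAIS-III scoring relies on published norm tables, and the function name is hypothetical.

```python
def rescale(score, from_mean, from_sd, to_mean, to_sd):
    """Express a standard score on a different scale via its z-score."""
    z = (score - from_mean) / from_sd
    return to_mean + z * to_sd


# One SD above the mean on a 50/8 scale is one SD above the mean on a 100/15 scale.
print(rescale(58, 50, 8, 100, 15))  # 115.0
# One SD below the mean on the 100/15 scale, expressed on the 50/8 scale.
print(rescale(85, 100, 15, 50, 8))  # 42.0
```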

Premorbid IQ & suitability for measuring intelligence:
Clinicians with time constraints can get a quick estimate of the Full-Scale IQ by combining the individual’s scores on the Vocabulary and Block Design subtests. This estimate is referred to here as the premorbid IQ. The scaling is the same: 100 is the mean score, with a standard deviation of 15. The WAIS-III is reliable and valid for measuring intelligence in adults. Reliability for all subtests combined is .93 and above across all age ranges. It also correlates highly with other measures of intelligence.

Wechsler Intelligence Scale for Children (WISC-III):
The WISC, originally developed in 1949 to assess intelligence in children, has been revised several times and is now the WISC-III (1991). It is based on norms from 2,200 children aged 6-16 in the US in 1988. It comprises 13 subtests, which examine verbal comprehension, perceptual organization, freedom from distractibility (memory & attention), and processing speed. Reliability and validity of this test are high, and it correlates highly with the Stanford-Binet intelligence test.

VI. Personality Tests:
One of the most influential & widely used personality tests is the Minnesota Multiphasic Personality Inventory (MMPI). Developed in the 1930s by Hathaway & McKinley at the University of Minnesota, this test is used to screen large groups of people for psychological disorders. This inventory is very useful in situations when a clinical interview may not be conducted.

Construction of the MMPI:
Over 1,000 items from older personality tests & other sources were converted into statements that individuals could respond to with “true,” “false,” or “cannot say.” A significant number of these items were then presented to thousands of normal individuals as well as individuals with diagnosed mental disorders. When the two groups were compared, several response patterns emerged for individuals diagnosed with psychological disorders. In all, 10 clinical scales were developed.

MMPI scales
Validity scales:
L (Lie scale): 15 items of overtly good self-report, e.g., “I smile at everyone I meet.”
F (frequency or infrequency)
K (correction): 30 items reflecting defensiveness in admitting problems, e.g., “I feel bad when others criticize me.”
Clinical scales:
1 or Hs (Hypochondriasis): 32 items reflecting patients’ abnormal preoccupation with their health, e.g., “I have chest pain several times a week” (true).
2 or D (Depression): 57 items examining depressive symptoms.
3 or Hy (Conversion Hysteria)
4 or Pd (Psychopathic Deviate): 50 items examining patients’ disregard for social and conventional customs.
5 or Mf (Masculinity-Femininity): 56 items showing homoeroticism & items differentiating between men & women.
6 or Pa (Paranoia): 40 items from patients showing abnormal suspiciousness.
7 or Pt (Psychasthenia): 48 items based on neurotic patients showing obsessions, compulsions, phobias, guilt, & indecisiveness.
8 or Sc (Schizophrenia): 78 items from patients showing bizarre or unusual thoughts or behavior.
9 or Ma (Hypomania): 46 items from patients characterized by emotional excitement, overactivity, and flight of ideas.
0 or Si (Social Introversion): 69 items from people showing shyness, little interest in others, and insecurity.

MMPI Current version has over 500 items.
Validity scales are crucial to detecting test-taking attitudes and response biases. The Lie Scale is very important, as it is designed to catch individuals trying to “fake good” on the inventory. These items, if answered truthfully, would hint at mildly negative information about the person (e.g., a “false” answer in response to “I always read the editorial every morning.”). If a person gives the overly favorable answer to a significant # of Lie Scale questions, it indicates “impression management,” which calls the rest of their responses on the inventory into question.
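The logic of a lie scale can be sketched as a simple count of overly favorable answers. This is an illustration only, not the MMPI's actual scoring key: the two items are the examples quoted in this transcript, and the scoring direction and the cutoff of 2 are hypothetical.

```python
# Items mapped to the answer that presents the person in an overly favorable light.
# Hypothetical key for illustration; not the published MMPI scoring key.
LIE_ITEMS = {
    "I smile at everyone I meet.": True,
    "I always read the editorial every morning.": True,
}


def lie_scale_score(answers, lie_items=LIE_ITEMS):
    """Count how many lie-scale items were answered in the favorable direction."""
    return sum(
        1 for item, favorable in lie_items.items() if answers.get(item) == favorable
    )


def suggests_impression_management(answers, cutoff=2):
    """Flag a response set whose lie-scale score reaches a hypothetical cutoff."""
    return lie_scale_score(answers) >= cutoff


candid = {
    "I smile at everyone I meet.": False,
    "I always read the editorial every morning.": False,
}
faking_good = {
    "I smile at everyone I meet.": True,
    "I always read the editorial every morning.": True,
}

print(lie_scale_score(candid), suggests_impression_management(candid))
print(lie_scale_score(faking_good), suggests_impression_management(faking_good))
```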

MMPI-2
The MMPI was restandardized, revised, & made available as the MMPI-2. It can be administered by paper and pencil or by computer. It compares individuals’ patterns of responding to determine whether psychopathology is present and in what form. It is widely used in making diagnostic assessments of individuals and remains the most important test used by Clinicians.

California Psychological Inventory: CPI
The CPI was designed to assess personality in the “normal” population. Half of its items are taken from the MMPI, and the other half are newly generated items. Because the CPI was normed on over 13,000 males & females from all socioeconomic statuses & parts of the US, it provides a very strong basis for personality assessment. It has been shown to be very reliable for predicting delinquency, parole outcome, academic performance, and likelihood of dropping out of school (Nietzel et al., 2003). It is also computerized, making administration easy to conduct.

VII. Projective Personality Tests
Standard stimuli (inkblots or drawings) are presented to the patient. The stimuli are ambiguous enough to allow for variation in responses. The assumption is that the patient’s responses are based primarily on unconscious processes & will reveal his/her true feelings, thoughts, & motives.

Types of Projective tests:
A. Rorschach Inkblot Test: The patient is presented with 10 inkblots. Half of the inkblots are in black, white, & shades of gray; two have red splotches; and 3 are in pastel colors. Patients report what they “see.” The test was developed by Swiss psychiatrist Hermann Rorschach in the early 1900s. Beck, an American psychology student, later published a standardized procedure for measuring responses on the Rorschach. Following this, other reports came out.

Rorschach scoring: The client reports what he/she sees in each inkblot. The Clinician writes down the individual’s answers verbatim. When all the cards have been presented, the Clinician goes through the set of cards again and conducts an inquiry about which characteristics of each inkblot led to each answer. Answers are then coded. Scoring involves the location, determinants, content, & popularity of the responses.

Scoring Dimensions:
1. Location: the area of the blot that led the client to respond (e.g., whole blot, an unusual detail, white space, a combination of these aspects).
2. Determinants: the characteristics of the blot that influenced a response; these include form, color, shading, & movement.
3. Content: the subject of the response, that is, what is perceived (e.g., animals, figures, objects, sexual symbols, blood, etc.).
4. Popularity: the frequency with which specific kinds of responses are made by many individuals.

B. Thematic Apperception Test (TAT)
The patient is shown a series of black-and-white pictures one by one and asked to tell a story related to each. The Clinician then asks: what is the symbolic meaning underlying the story the patient provides?
