Introduction to Psychometrics


1 Introduction to Psychometrics
Some important language
Properties of a “good measure”
  Standardization
  Reliability
  Validity
Common item types
Reverse keying
Construction & validation process

2 Psychometrics (Psychological measurement)
The process of assigning values to represent the amounts and kinds of specified attributes, usually to describe persons.
We do not “measure people”; we measure specific attributes or characteristics of a person.
Psychometrics is the “centerpiece” of empirical psychological research and practice: all data result from some form of “measurement.” This is what we’ve meant by “measurement validity” all along.
The better the measurement, the better the data, and the better the conclusions of the psychological research or application.

3 Most of what we try to measure in Psychology are constructs
They’re called “constructs” because most of what we care about as psychologists is not physical measurement, such as height, weight, pressure, & velocity…
…rather, the “stuff of psychology” (learning, motivation, anxiety, social skills, depression, wellness, etc.) consists of things that “don’t really exist” and have “been constructed” to help us describe and understand behavior.
They are attributes and characteristics that we’ve constructed to give organization and structure to behavior.
Essentially all of the things we psychologists research, both as causes and effects, are attributive hypotheses with different levels of support and acceptance!

4 Measurement of constructs is more difficult than of physical properties!
We can’t just walk up to someone with a scale, ruler, graduated cylinder, or velocimeter and measure how depressed they are.
We have to figure out some way to turn their behavior, self-reports, or traces of their behavior into variables that give values for the constructs we want to measure.
So measurement, much like the rest of research we’ve learned about so far, is all about representation!
Measurement validity is the extent to which the variable values (data) we have represent the behaviors we want to study.

5 What are the different types of constructs we measure ???
The most commonly discussed types are...
Achievement -- “performance” broadly defined (judgments)
  e.g., scholastic skills, job-related skills, research DVs, etc.
Attitude/Opinion -- “how things should be” (sentiments)
  e.g., polls, product evaluations, etc.
Personality -- “characterological attributes” (keyed sentiments)
  e.g., anxiety, psychoses, assertiveness, etc.
There are other types of measures that are often used…
Social skills -- achievement or personality?
Aptitude -- “how well someone will perform after they are trained and experienced,” but measured before the training & experience
  some combination of achievement, personality, and “likes”
IQ -- is it achievement (things learned), or is it “aptitude for academics, career, and life”?

6 Some language…
Mostly we will talk about measurement using self-report; behavioral observation, instrumentation, & trace indices are all part of measurement, but…
Each question is called an item.
Kinds of items: objective items vs. subjective items
  “Objective” does not mean “true” or “real”; objective means “no judgment or evaluation is required” -- there is one “correct answer” and everything else is wrong.
    e.g., multiple choice, true/false, fill-in-the-blank
  Subjective means that “someone has to judge what is correct.”
    e.g., short answer, essay

7 Some more language…
A collection of items is called many things, e.g., survey, questionnaire, instrument, measure, test, or scale.
Three “kinds” of item collections you should know (a short scoring sketch follows below):
Scale (Test) -- all items are “put together” to get a single score
Subscale (Subtest) -- item sets are “put together” to get multiple separate scores
Survey -- each item gives a specific piece of information
Most “questionnaires” or “surveys” are a combination of all three, giving data like you used for your research project:
  single demographic & history “survey” items
  some instruments that gave a single “scale” score
  some instruments that gave multiple “subscale” scores
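To make the scale / subscale / survey distinction concrete, here is a minimal scoring sketch in Python. The DataFrame and every column name (age, anx1, dep1, …) are hypothetical, invented purely for illustration.

```python
import pandas as pd

# Hypothetical responses: one row per respondent, one column per item.
df = pd.DataFrame({
    "age":  [19, 22, 20],                   # survey item: used on its own
    "anx1": [4, 2, 5], "anx2": [3, 1, 4],   # items of an "anxiety" scale
    "dep1": [2, 5, 1], "dep2": [1, 4, 2],   # "mood" subscale of a larger test
    "dep3": [2, 5, 1], "dep4": [1, 3, 2],   # "somatic" subscale of that test
})

# Scale: all items are put together to get a single score.
df["anxiety"] = df[["anx1", "anx2"]].sum(axis=1)

# Subscales: item sets are put together to get multiple separate scores.
df["dep_mood"] = df[["dep1", "dep2"]].sum(axis=1)
df["dep_somatic"] = df[["dep3", "dep4"]].sum(axis=1)
```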

8 Some more language…
Psychometric sampling & inference process…
Research sampling is about how well a sample of participants represents the target population:
  we collect data from the sample and infer that the statistical results from the sample tell us about the entire population.
Measurement sampling is about how well a scale (a set of items) represents a behavior (domain):
  we collect data using the scale and infer that the score we get reflects the score for the behavior (the entire domain).
Psychometric sampling is both…
  collecting a set of items sampled from a domain, from a set of participants sampled from a population, and using statistics calculated from the scale scores to represent the behavior of that population.

9 Desirable Properties of Psychological Measures
Interpretability of individual and group scores
Population norms (typical scores)
Validity (consistent accuracy)
Reliability (consistency)
Standardization (administration & scoring)

10 Standardization
Administration -- the test is “given” the same way every time
  who administers the instrument
  specific instructions, order of items, timing, etc.
  Varies greatly: a multiple-choice classroom test (hand it out) vs. the WAIS, with its lengthy administration manual.
Scoring -- the test is “scored” the same way every time
  who scores the instrument
  correct, “partial,” and incorrect answers, points awarded, etc.
  Varies greatly: a multiple-choice test (fill in the sheet) vs. the WAIS, with its 200+ page scoring manual.

11 Reliability (Consistency or Agreement)
Inter-rater (inter-observer) reliability -- do multiple observers/coders score an item the same way?
  important whenever using subjective items
Internal reliability -- do the items measure a central “thing”?
  Cronbach’s alpha: α ranges from .00 to 1.00; higher values mean stronger internal consistency/reliability (a computational sketch follows below).
External reliability -- consistency of scale/test scores
  test-retest reliability -- correlate scores from the same test given weeks apart
  alternate-forms reliability -- correlate scores from two “versions” of the test
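As a concrete illustration of internal reliability, here is a minimal sketch of Cronbach’s alpha in its standard variance form, α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name and the input matrix are hypothetical.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of scored responses.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# External reliability (test-retest or alternate forms) is just the Pearson
# correlation between two sets of total scores, e.g.:
#   r = np.corrcoef(scores_time1, scores_time2)[0, 1]
```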

12 Validity (Consistent Accuracy)
Face validity -- do the items appear to come from the “domain of interest”?
  non-statistical -- decision of the “target population”
Content validity -- do the items come from the “domain of interest”?
  non-statistical -- decision of “experts in the field”
Criterion-related validity -- does the test correlate with a “criterion”?
  statistical -- requires a criterion that you “believe in”
  predictive, concurrent, & postdictive validity
Construct validity -- does the test relate to the other measures it should?
  statistical -- discriminant validity:
    convergent validity -- correlates with selected tests
    divergent validity -- doesn’t correlate with others
(A correlation sketch follows below.)
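Here is a minimal sketch of how convergent and divergent validity are checked statistically, using simulated data. All variable names and the simulated relationships are hypothetical, chosen only to make the expected pattern visible.

```python
import numpy as np

rng = np.random.default_rng(0)

related = rng.normal(size=100)                         # established test of the same construct
new_scale = related + rng.normal(scale=0.5, size=100)  # our measure: built to track `related`
unrelated = rng.normal(size=100)                       # test of a different construct

convergent_r = np.corrcoef(new_scale, related)[0, 1]    # expect a high correlation
divergent_r = np.corrcoef(new_scale, unrelated)[0, 1]   # expect a near-zero correlation
print(f"convergent r = {convergent_r:.2f}, divergent r = {divergent_r:.2f}")
```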

13 “Is the test valid?”
Jum Nunnally (one of the founders of modern psychometrics) claimed this was a “silly question”! The point wasn’t that tests shouldn’t be “valid,” but that a test’s validity must be assessed relative to…
  the construct it is intended to measure
  the population for which it is intended (e.g., age, level)
  the application for which it is intended (e.g., for classifying folks into categories vs. assigning them quantitative values)
So the real question is, “Is this test a valid measure of this construct, for this population, in this application?” That question can be answered!

14 Personality, Attitude, Opinion (“Psychology”) Items
Most common types of items???
1. How do you feel today?  Unhappy … Happy
2. How interested are you in campus politics?  Interested … Uninterested
These are called Likert or Likert-type items:
  a statement plus a response along a continuum with “verbal anchors”
  9-point response scales are common

15 Personality, Attitude, Opinion (“Psychology”) Items
Most common types of items???
1. Which of these best describes you?
   a. I am mostly interested in the “social side” of college.
   b. I am mostly interested in the “intellectual side” of college.
2. Would you rather spend time with a friend...
   a. at your favorite restaurant
   b. watching a sporting event
These are called forced-choice items.

16 “Test” Items
Most common types of items???
1. Which of these is one of the 7 dwarves?
   a. Grungy  b. Sleazy  c. Kinky  d. Doc  e. Dorky
2. What should you do if the traffic light turns yellow as you approach an intersection?
   a. Stop  b. Speed up  c. Check for police and then choose “a” vs. “b”
These (as you know) are called multiple-choice items. Their difference from Likert items is that, for these, the response options are qualitatively different.

17 Reverse Keying
We want respondents to carefully read and separately respond to each item of our scale/test. One thing we do is write the items so that some of them are “backwards” or “reversed”…
Consider these items from a depression measure…
1. It is tough to get out of bed some mornings.   disagree … agree
2. I’m generally happy about my life.
3. I sometimes just want to sit and cry.
4. Most of the time I have a smile on my face.
If the person is “depressed,” we would expect them to give a fairly high rating for questions 1 & 3, but a low rating on 2 & 4.
Before aggregating these items into a composite scale or test score, we would “reverse key” items 2 & 4 (1=5, 2=4, 4=2, 5=1), as in the sketch below.
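A minimal sketch of reverse keying for the four 1-5 items above. The response matrix is made up; the mapping (1=5, 2=4, 3=3, 4=2, 5=1) is just reversed = (min + max) − response.

```python
import numpy as np

SCALE_MIN, SCALE_MAX = 1, 5
REVERSED = [1, 3]   # zero-based columns for items 2 & 4 above

# Hypothetical responses: rows = respondents, columns = the four items.
responses = np.array([
    [5, 1, 4, 2],   # "depressed" pattern: high on items 1 & 3, low on 2 & 4
    [2, 5, 1, 4],   # "non-depressed" pattern
])

keyed = responses.copy()
keyed[:, REVERSED] = (SCALE_MIN + SCALE_MAX) - keyed[:, REVERSED]  # 1<->5, 2<->4

totals = keyed.sum(axis=1)   # composite: higher now means "more depressed" on every item
print(totals)                # [18  6]
```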

18 Scale Construction & Validation Process
Determine what kind of scale you are trying to make
  construct, population, & application
Focus groups
  work with “subject matter experts” (SMEs) to define the domain
Write items
  content validity is the focus
  item types, reverse keying, & face validity
  back to the SMEs to evaluate content validity
Pilot the scale
  walk-through with members of the target population for readability, word choices, reverse-keying issues, etc.
  assess face validity
Establish standards
  administration
  scoring

19 Scale Construction & Validation Process, cont.
Collect data from a first sample
  evaluate internal reliability
  evaluate alternate-forms reliability (if applicable)
  evaluate inter-rater reliability (if applicable)
Collect data again from the same sample (3-18 weeks later)
  evaluate test-retest reliability
Collect data from a second sample (including other measures)
  repeat internal and external reliability analyses
  evaluate criterion-related validity (if applicable)
  evaluate discriminant validity (if applicable)
Collect data from a third sample (including other measures)
  repeat validity evaluation(s) -- called “cross-validation”
Combining all data -- establish population standards (a sketch follows below)
  population norms (e.g., mean & std)
  cutoff scores (diagnosis, selection, etc.)
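A minimal sketch of that final step: establishing norms and a cutoff from the pooled data. The scores are simulated, and the 90th-percentile cutoff is an arbitrary example, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(1)
all_scores = rng.normal(loc=50, scale=10, size=500)  # pooled scale scores (simulated)

norm_mean = all_scores.mean()              # population norm: typical score
norm_std = all_scores.std(ddof=1)          # population norm: spread
cutoff = np.percentile(all_scores, 90)     # e.g., flag the top 10% for follow-up

# An individual's standing relative to the norms (z-score):
z = (62 - norm_mean) / norm_std
print(f"mean={norm_mean:.1f}, sd={norm_std:.1f}, cutoff={cutoff:.1f}, z={z:.2f}")
```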

