Introduction to Psychometrics

Introduction to Psychometrics
- Some important language
- Properties of a “good measure”: standardization, reliability, validity
- Common item types
- Reverse keying
- Construction & validation process

Psychometrics (psychological measurement)
The process of assigning values to represent the amounts and kinds of specified attributes, usually to describe persons.
- We do not “measure people”; we measure specific attributes or characteristics of a person.
- Psychometrics is the “centerpiece” of empirical psychological research and practice: all data result from some form of measurement.
- This is what we’ve meant by “measurement validity” all along: the better the measurement, the better the data, and the better the conclusions of the psychological research or application.

Most of what we try to measure in psychology are constructs. They’re called “constructs” because most of what we care about as psychologists are not physical measurements, such as height, weight, pressure & velocity. Rather, the “stuff of psychology” (learning, motivation, anxiety, social skills, depression, wellness, etc.) consists of things that “don’t really exist”: they have “been constructed” to help us describe and understand behavior. They are attributes and characteristics that we’ve constructed to give organization and structure to behavior. Essentially all of the things we psychologists research, both as causes and effects, are attributive hypotheses with different levels of support and acceptance!

Measurement of constructs is more difficult than measurement of physical properties! We can’t just walk up to someone with a scale, ruler, graduated cylinder, or velocimeter and measure how depressed they are. We have to figure out some way to turn their behavior, self-reports, or traces of their behavior into variables that give values for the constructs we want to measure. So measurement, much like the rest of the research process we’ve learned about so far, is all about representation! Measurement validity is the extent to which the variable values (data) we have represent the behaviors we want to study.

What are the different types of constructs we measure? The most commonly discussed types are:
- Achievement – “performance” broadly defined (judgments), e.g., scholastic skills, job-related skills, research DVs, etc.
- Attitude/Opinion – “how things should be” (sentiments), e.g., polls, product evaluations, etc.
- Personality – “characterological attributes” (keyed sentiments), e.g., anxiety, psychoses, assertiveness, etc.
There are other types of measures that are often used:
- Social skills – achievement or personality?
- Aptitude – “how well someone will perform after they are trained and experienced,” but measured before the training & experience; some combination of achievement, personality, and “likes.”
- IQ – is it achievement (things learned), or is it “aptitude for academics, career, and life”?

Some language… Mostly we will talk about measurement using self-report; behavioral observation, instrumentation & trace indices are all part of measurement too. Each question is called an item. There are two kinds of items: objective vs. subjective.
- “Objective” does not mean “true” or “real”; it means “no judgment or evaluation is required”: there is one correct answer and everything else is wrong (e.g., multiple choice, true/false, fill-in-the-blank). Scoring reduces to matching a key, as sketched below.
- “Subjective” means that “someone has to judge what is correct” (e.g., short answer, essay).
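To make “objective” concrete, here is a minimal sketch in Python of key-based scoring; the answer key, item names, and responses are hypothetical, but the point is that no judgment is involved.

```python
# Hypothetical answer key for three objective (one-correct-answer) items.
answer_key = {"item1": "d", "item2": "a", "item3": "c"}

def score_objective(responses, key=answer_key):
    """Count the responses that exactly match the key; no judgment required."""
    return sum(1 for item, correct in key.items() if responses.get(item) == correct)

print(score_objective({"item1": "d", "item2": "b", "item3": "c"}))  # -> 2
```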

Some more language… A collection of items is called many things, e.g., survey, questionnaire, instrument, measure, test, or scale. Three “kinds” of item collections you should know:
- Scale (Test) – all items are “put together” to get a single score
- Subscale (Subtest) – item sets are “put together” to get multiple separate scores
- Survey – each item gives a specific piece of information
Most “questionnaires” or “surveys” are a combination of all three, giving data like you used for your research project: single demographic & history “survey” items, some instruments that gave a single “scale” score, and some instruments that gave multiple “subscale” scores (see the sketch below).
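A sketch of the scale/subscale distinction using made-up item data (the item and subscale names are assumptions for illustration): subscale scores sum separate item sets, while the scale score puts all items together into a single number.

```python
import pandas as pd

# Made-up Likert responses (1-5) from three respondents.
df = pd.DataFrame({
    "q1": [4, 2, 5], "q2": [3, 1, 4], "q3": [5, 2, 4],  # hypothetical "anxiety" items
    "q4": [2, 4, 1], "q5": [1, 5, 2],                   # hypothetical "mood" items
})

# Subscale (subtest) scores: separate item sets, separate totals.
df["anxiety"] = df[["q1", "q2", "q3"]].sum(axis=1)
df["mood"] = df[["q4", "q5"]].sum(axis=1)

# Scale (test) score: all items "put together" for a single score.
df["total"] = df[["q1", "q2", "q3", "q4", "q5"]].sum(axis=1)
print(df)
```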

Some more language… the psychometric sampling & inference process:
- Research sampling is about how well a sample of participants represents the target population: we collect data from the sample and infer that the statistical results from the sample tell us about the entire population.
- Measurement sampling is about how well a scale (a set of items) represents a behavior (domain): we collect data using the scale and infer that the score we get reflects the score for the behavior (the entire domain).
- Psychometric sampling is both: collecting a set of items sampled from a domain, from a set of participants sampled from a population, and using statistics calculated from the scale scores to represent the behavior of that population.

Desirable properties of psychological measures:
- Interpretability of individual and group scores
- Population norms (typical scores)
- Validity (consistent accuracy)
- Reliability (consistency)
- Standardization (administration & scoring)

Standardization
- Administration – the test is “given” the same way every time: who administers the instrument; specific instructions, order of items, timing, etc. This varies greatly, from a multiple-choice classroom test (hand it out) to the WAIS (a 100+ page administration manual).
- Scoring – the test is “scored” the same way every time: who scores the instrument; correct, “partial,” and incorrect answers, points awarded, etc. This also varies greatly, from a multiple-choice test (fill in the sheet) to the WAIS (a 200+ page scoring manual).

Reliability (consistency or agreement)
- Inter-rater (inter-observer) reliability – do multiple observers/coders score an item the same way? Important whenever using subjective items.
- Internal reliability – do the items measure a central “thing”? Measured by Cronbach’s alpha (α = .00 – 1.00); higher values mean stronger internal consistency/reliability (sketched below).
- External reliability – consistency of scale/test scores: test-retest reliability correlates scores from the same test given 3-18 weeks apart; alternate-forms reliability correlates scores from two “versions” of the test.
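A sketch of the two statistical checks above. Cronbach’s alpha follows the standard formula α = (k / (k − 1)) × (1 − Σ item variances / total-score variance); all data values here are invented for illustration.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) score array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented data: 5 respondents x 3 items on a 1-5 scale.
data = np.array([[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1], [3, 3, 4]])
print(f"alpha = {cronbach_alpha(data):.2f}")

# External reliability: correlate total scores across two administrations
# (test-retest) or two versions (alternate forms). Retest scores invented.
time1 = data.sum(axis=1)
time2 = np.array([13, 8, 14, 5, 10])
print(f"test-retest r = {np.corrcoef(time1, time2)[0, 1]:.2f}")
```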

Validity (consistent accuracy)
- Face validity – do the items come from the “domain of interest”? Non-statistical: a judgment by the target population.
- Content validity – do the items come from the “domain of interest”? Non-statistical: a judgment by experts in the field.
- Criterion-related validity – does the test correlate with a “criterion”? Statistical: requires a criterion that you “believe in”; includes predictive, concurrent, and postdictive validity.
- Construct validity – does the test relate to other measures the way it should? Statistical: convergent validity (the test correlates with selected tests) and divergent, or discriminant, validity (it doesn’t correlate with others). See the sketch below.
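One way the construct-validity checks are run in practice: correlate the new scale with a measure it should converge with and one it should diverge from. A sketch only; the measure names and scores are invented.

```python
import pandas as pd

# Invented scores: a new anxiety scale plus two comparison measures.
scores = pd.DataFrame({
    "new_anxiety":   [12, 18, 9, 22, 15, 11],
    "known_anxiety": [14, 20, 8, 21, 16, 12],   # convergent: should correlate highly
    "reading_skill": [50, 52, 48, 51, 44, 58],  # divergent: should correlate near zero
})

# Convergent r should be high; divergent r should be near zero.
print(scores.corr().round(2)["new_anxiety"])
```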

“Is the test valid?” Jum Nunnally (one of the founders of modern psychometrics) claimed this was a “silly question”! The point wasn’t that tests shouldn’t be “valid,” but that a test’s validity must be assessed relative to:
- the construct it is intended to measure
- the population for which it is intended (e.g., age, level)
- the application for which it is intended (e.g., for classifying folks into categories vs. assigning them quantitative values)
So the real question is, “Is this test a valid measure of this construct, for this population, in this application?” That question can be answered!

Most common types of items? Personality, attitude & opinion (“psychology”) items:
1. How do you feel today?  Unhappy 1 2 3 4 5 Happy
2. How interested are you in campus politics?  Interested 1 2 3 4 5 6 7 Uninterested
These are called Likert or Likert-type items: a statement with a response along a continuum with “verbal anchors.” 5-, 7-, & 9-point response scales are common.

Most common types of items, cont. Personality, attitude & opinion (“psychology”) items:
1. Which of these best describes you?
   a. I am mostly interested in the “social side” of college.
   b. I am mostly interested in the “intellectual side” of college.
2. Would you rather spend time with a friend…
   a. at your favorite restaurant
   b. watching a sporting event
These are called forced-choice items.

Most common types of items, cont. “Test” items:
1. Which of these is one of the 7 dwarves?
   a. Grungy  b. Sleazy  c. Kinky  d. Doc  e. Dorky
2. What should you do if the traffic light turns yellow as you approach an intersection?
   a. Stop  b. Speed up  c. Check for police and then choose “a” vs. “b”
These (as you know) are called multiple-choice items. Their difference from Likert items is that, for these, the response options are qualitatively different.

Reverse keying
We want respondents to carefully read and separately respond to each item of our scale/test. One thing we do is write the items so that some of them are “backwards” or “reversed.” Consider these items from a depression measure:
1. It is tough to get out of bed some mornings.  disagree 1 2 3 4 5 agree
2. I’m generally happy about my life.  1 2 3 4 5
3. I sometimes just want to sit and cry.  1 2 3 4 5
4. Most of the time I have a smile on my face.  1 2 3 4 5
If the person is “depressed,” we would expect them to give a fairly high rating on items 1 & 3, but a low rating on 2 & 4. Before aggregating these items into a composite scale or test score, we would “reverse key” items 2 & 4 (1=5, 2=4, 4=2, 5=1), as sketched below.
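A minimal sketch of reverse keying on a 1-5 response scale: the reversal is just (minimum + maximum) − response, which maps 1↔5 and 2↔4 while leaving 3 unchanged. The respondent’s ratings below are invented.

```python
# Reverse keying on a 1-5 scale: (lo + hi) - response maps 1<->5 and 2<->4.
def reverse_key(response, lo=1, hi=5):
    return (lo + hi) - response

# Invented ratings on the four depression items above (a "depressed" pattern).
responses = {"item1": 4, "item2": 2, "item3": 5, "item4": 1}
reversed_items = {"item2", "item4"}  # the positively worded items

depression_score = sum(
    reverse_key(v) if k in reversed_items else v
    for k, v in responses.items()
)
print(depression_score)  # 4 + 4 + 5 + 5 = 18: consistently high after reversal
```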

Scale construction & validation process
- Determine what kind of scale you are trying to make: its construct, population & application.
- Use focus groups and work with “subject matter experts” (SMEs) to define the domain.
- Write items: content validity is the focus; attend to item types, reverse keying & face validity. Then go back to the SMEs to evaluate content validity.
- Pilot the scale: walk through it with members of the target population to check readability, word choices, reverse-keying issues, etc., and assess face validity.
- Establish standards for administration and scoring.

Scale construction & validation process, cont.
- Collect data from a first sample: evaluate internal reliability, alternate-forms reliability (if applicable), and inter-rater reliability (if applicable).
- Collect data again from the same sample 3-18 weeks later: evaluate test-retest reliability.
- Collect data from a second sample (including other measures): repeat the internal and external reliability analyses; evaluate criterion-related validity (if applicable) and discriminant validity (if applicable).
- Collect data from a third sample (including other measures): repeat the validity evaluation(s) – this is called “cross-validation.”
- Combining all the data, establish population standards: population norms (e.g., mean & standard deviation) and cutoff scores (for diagnosis, selection, etc.), as sketched below.
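A sketch of that final step: turning combined-sample scores into norms and a cutoff. The scores and the “1.5 standard deviations above the mean” cutoff rule are assumptions for illustration, not a rule from this process.

```python
import numpy as np

# Invented combined-sample scale scores.
scores = np.array([22, 31, 27, 19, 35, 28, 24, 30, 26, 33])

mean, sd = scores.mean(), scores.std(ddof=1)  # population norms
print(f"norms: M = {mean:.1f}, SD = {sd:.1f}")

# Hypothetical convention: flag scores more than 1.5 SD above the mean.
cutoff = mean + 1.5 * sd
print(f"cutoff = {cutoff:.1f}; flagged scores: {scores[scores > cutoff]}")
```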