Practical Psychometrics: Preliminary Decisions, Components of an Item, # Items & Response, Approach to the Validation Process

Similar presentations
Test Development.

Agenda Levels of measurement Measurement reliability Measurement validity Some examples Need for Cognition Horn-honking.
Standardized Scales.
Chapter 8 Flashcards.
Developing a Questionnaire
Survey Methodology Reliability and Validity EPID 626 Lecture 12.
Reliability and Validity
1 Reliability in Scales Reliability is a question of consistency do we get the same numbers on repeated measurements? Low reliability: reaction time High.
Research Methodology Lecture No : 11 (Goodness Of Measures)
What is a Good Test Validity: Does test measure what it is supposed to measure? Reliability: Are the results consistent? Objectivity: Can two or more.
MGT-491 QUANTITATIVE ANALYSIS AND RESEARCH FOR MANAGEMENT
Measurement Reliability and Validity
Validity In our last class, we began to discuss some of the ways in which we can assess the quality of our measurements. We discussed the concept of reliability.
Face, Content & Construct Validity
CH. 9 MEASUREMENT: SCALING, RELIABILITY, VALIDITY
Copyright © Allyn & Bacon (2007) Data and the Nature of Measurement Graziano and Raulin Research Methods: Chapter 4 This multimedia product and its contents.
Reliability and Validity of Research Instruments
RESEARCH METHODS Lecture 18
Chapter 4 Validity.
VALIDITY.
Validity, Sampling & Experimental Control Psych 231: Research Methods in Psychology.
Introduction to Psychometrics Psychometrics & Measurement Validity Some important language Properties of a “good measure” –Standardization –Reliability.
Reliability & Validity
Introduction to Psychometrics
1 Measurement PROCESS AND PRODUCT. 2 MEASUREMENT The assignment of numerals to phenomena according to rules.
Measurement: Reliability and Validity For a measure to be useful, it must be both reliable and valid Reliable = consistent in producing the same results.
Index and Scale Similarities: Both are ordinal measures of variables. Both rank order units of analysis in terms of specific variables. Both are measurements.
1 Measurement Measurement Rules. 2 Measurement Components CONCEPTUALIZATION CONCEPTUALIZATION NOMINAL DEFINITION NOMINAL DEFINITION OPERATIONAL DEFINITION.
Today Concepts underlying inferential statistics
Chapter 9 Flashcards. measurement method that uses uniform procedures to collect, score, interpret, and report numerical results; usually has norms and.
Chapter 7 Correlational Research Gay, Mills, and Airasian
Chapter 7 Evaluating What a Test Really Measures
Classroom Assessment A Practical Guide for Educators by Craig A
Test Validity S-005. Validity of measurement Reliability refers to consistency –Are we getting something stable over time? –Internally consistent? Validity.
Measurement and Data Quality
Principal Components An Introduction
Reliability, Validity, & Scaling
MEASUREMENT OF VARIABLES: OPERATIONAL DEFINITION AND SCALES
Measurement in Exercise and Sport Psychology Research EPHE 348.
Measurement, Scales and Attitudes. Nominal Ordinal?
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
MEASUREMENT CHARACTERISTICS Error & Confidence Reliability, Validity, & Usability.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 14 Measurement and Data Quality.
LECTURE 06B BEGINS HERE THIS IS WHERE MATERIAL FOR EXAM 3 BEGINS.
ScWk 240 Week 6 Measurement Error Introduction to Survey Development “England and America are two countries divided by a common language.” George Bernard.
Learning Objective Chapter 9 The Concept of Measurement and Attitude Scales Copyright © 2000 South-Western College Publishing Co. CHAPTER nine The Concept.
Psychometrics & Validation Psychometrics & Measurement Validity Properties of a “good measure” –Standardization –Reliability –Validity A Taxonomy of Item.
Chapter 4-Measurement in Marketing Research
Measurement and Questionnaire Design. Operationalizing From concepts to constructs to variables to measurable variables A measurable variable has been.
Evaluating Survey Items and Scales Bonnie L. Halpern-Felsher, Ph.D. Professor University of California, San Francisco.
Research Methodology and Methods of Social Inquiry Nov 8, 2011 Assessing Measurement Reliability & Validity.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Assessing Measurement Quality in Quantitative Studies.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
Measurement and Scaling
Measurement Theory in Marketing Research. Measurement What is measurement?  Assignment of numerals to objects to represent quantities of attributes Don’t.
Measurement Reliability Objective & Subjective tests Standardization & Inter-rater reliability Properties of a “good item” Item Analysis Internal Reliability.
RESEARCH METHODS IN INDUSTRIAL PSYCHOLOGY & ORGANIZATION. Session. Course: D Sociology and Industrial Psychology. Year: Sep-2009.
Chapter 6 - Standardized Measurement and Assessment
1 Announcement Movie topics up a couple of days –Discuss Chapter 4 on Feb. 4 th –[ch.3 is on central tendency: mean, median, mode]
Chapter 6 Indexes, Scales, And Typologies. Chapter Outline Indexes versus Scales Index Construction Scale Construction.
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 11 Measurement and Data Quality.
RELIABILITY AND VALIDITY Dr. Rehab F. Gwada. Control of Measurement Reliabilityvalidity.
Measurement and Scaling Concepts
1 Measurement Error All systematic effects acting to bias recorded results: -- Unclear Questions -- Ambiguous Questions -- Unclear Instructions -- Socially-acceptable.
Survey Methodology Reliability and Validity
Indexes, Scales, and Typologies
Reliability and Validity
Test Validity.
Chapter 6 Indexes, Scales, And Typologies
RESEARCH METHODS Lecture 18
Presentation transcript:

Practical Psychometrics: Preliminary Decisions, Components of an Item, # Items & Response, Approach to the Validation Process

The items you need and the validation processes you will choose all depend upon what kind of scale you are writing. You have to decide:
- measuring, predicting, or measuring to predict?
- construct or content?
- quantifying, classifying, or multiple classifications?
- target population?
- single scale or multiple subscales?
- do you want face validity?
- relative/rank or value reliability?
- alternate forms?

Components of a single item:
Item = target construct + systematic error + random error
Systematic error sources:
- other constructs
- social desirability / impression management
- asymmetric response scales (e.g., average, good, great, awesome)
Random error sources:
- non-singular (double-barreled) items
- response patterns (e.g., answering all 5s)
- inattention / disinterest
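As a minimal sketch of this decomposition (the weights and variable names are invented for illustration, not from the slides), you can simulate an item as the sum of the target construct, a systematic nuisance source, and random noise, then check how contaminated the item is:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

construct = rng.normal(0, 1, n)  # the target construct (true score)
nuisance = rng.normal(0, 1, n)   # a systematic error source, e.g., social desirability
noise = rng.normal(0, 1, n)      # random error

# Illustrative weights: mostly target construct, some contamination, some noise
item = 0.7 * construct + 0.3 * nuisance + 0.6 * noise

print(f"r(item, construct) = {np.corrcoef(item, construct)[0, 1]:.2f}")
print(f"r(item, nuisance)  = {np.corrcoef(item, nuisance)[0, 1]:.2f}")
```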

Item-writing: What kind of item?
- Judgment vs. Sentiment – what they know or what they think?
- Absolute vs. Comparative – what do you want them thinking about?
- Preference vs. Similarity – do you want ranking/selection or values?
Things to consider:
- don't mix item types too much – it confuses respondents
- consider what you are trying to measure
- consider what you will do with the response values
- are there "correct answers" or "indicative responses"? (e.g., ratings are easier to work with than ranks)
- consider the cognitive abilities of the target population

Item Writing, cont. – focusing on sentiments.
How many response options?
- binary – useful if respondents have "limited cognitive capacity"
- 5 – the Likert standard (perhaps a historical accident?)
- 7 +/- 2 – based on working memory capacity
Important Issue #1 – middle item?
- some don't like allowing respondents to "hug the middle"
- research tells us that, if given an even number of responses, they will "hug the middle 2" and increase error variance
Important Issue #2 – verbal anchoring?
- some like to have a verbal anchor for every response option, others anchor only the ends (semantic differential), and some also anchor the middle
- some research has shown that labels are "less interval" than numbers – i.e., the anchors "hurt" getting interval-like data

Important Issue #3 – Item Sensitivity
Item sensitivity relates to how much precision we get from a single item.
- Consider a binary item with the responses "good" & "bad" – big difference between a "1" and a "2".
- Consider a 3-response item with "dislike," "neutral," "like" – huge "steps" among 1, 2 & 3 – can lead to "position hugging".
- Consider a 5-response item with "strongly disagree," "disagree," "neutral," "agree," "strongly agree" – smaller "steps" among 1 through 5 – should get less "hugging" and more sensitivity.
Greater numbers of response options can increase item sensitivity – but beware of overdoing it (see next page); a small simulation sketch follows below.
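One way to see the coarsening cost is to cut a continuous attribute into k ordered response options and watch how well the coded item tracks the underlying attribute as k grows (a simulation sketch; all parameters are assumed, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(0, 1, 100_000)                # the attribute the item targets
perceived = latent + rng.normal(0, 0.5, 100_000)  # respondent's pre-rounding impression

for k in (2, 3, 5, 7, 11):
    # Cut the perceived value into k ordered response options
    edges = np.quantile(perceived, np.linspace(0, 1, k + 1)[1:-1])
    coded = np.digitize(perceived, edges) + 1     # responses coded 1..k
    r = np.corrcoef(coded, latent)[0, 1]
    print(f"{k:2d} options: r(item, latent) = {r:.3f}")
```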

Important Issue #4 – Scale Sensitivity
Scale sensitivity is the "functional range" of the scale, which is tied to the variability of data values.
Consider a 5-item true-false test – the available scores are 0%, 20%, 40%, 60%, 80% & 100% – not much sensitivity & lots of ties.
How to increase scale sensitivity?
- increase the number of responses per item (item sensitivity) – can only push this so far
- increase the number of items – known to help with internal consistency
- both – seems to be the best approach
Working the #items vs. #responses trade-off, consider:
- 1 item with 100 response options (98 +/- 2???) (VAS?)
- 100 binary items (not getting much from each of many items)
- 50 items with 3 options (score range 50-150)
- 20 items with 6 options (20-120)
- 12 items with 9 options (12-108)
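The score ranges above follow directly from coding each item 1..k; a quick sketch of the arithmetic for the configurations listed on the slide:

```python
# (items, options per item) configurations from the slide
configs = [(1, 100), (100, 2), (50, 3), (20, 6), (12, 9)]

for n_items, k in configs:
    lo, hi = n_items * 1, n_items * k   # each item coded 1..k
    distinct = hi - lo + 1              # number of possible total scores
    print(f"{n_items:3d} items x {k:3d} options: range {lo}-{hi}, {distinct} distinct scores")
```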

Important Issue #5 – Item & Scale Difficulty / Response Probability
"What" you are trying to measure, and from whom, will impact "how hard" the items should be:
- obvious for judgment items, less obvious for sentiment items
- consider measuring "depression" from college students vs. psychiatric inpatients – you are measuring very different "levels" of depression
"Where" you are measuring, "why" you are measuring & from whom will also impact "how hard" the items should be:
- an equidiscriminating math test for 3rd graders vs. college math majors
- a math test for identifying "remedial students" among 3rd graders vs. college math majors
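In classical terms, an item's difficulty/response probability is just the proportion of a population that passes or endorses it, so the "right" difficulty depends on who is answering. A hypothetical sketch (populations, threshold, and effect sizes are all invented):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical latent depression levels for two populations
students = rng.normal(0.0, 1.0, 5_000)
inpatients = rng.normal(2.0, 1.0, 5_000)

threshold = 1.5  # a "hard" (severe-symptom) item, endorsed only at high levels

for name, pop in [("students", students), ("inpatients", inpatients)]:
    p = np.mean(pop > threshold)  # item difficulty / endorsement probability
    print(f"{name}: endorsement probability = {p:.2f}")
```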

Validation Process
Over the years several very different suggestions have been made about how to validate a scale – both in terms of the kinds of evidence that should be offered and the order in which it should be sought. A couple of things to notice:
- Many of the different suggestions aren't "competing" – they were made by folks working in different content areas with different measurement goals. Know how scales are constructed and validated in your research area!
- Urgency must be carefully balanced with process – if you are trying to gather all the forms of evidence you've decided you need in a single study (or a couple of studies), you can be badly delayed if one or more don't pan out.

Desirable Properties of Psychological Measures
- Interpretability of individual's and group's scores
- Population norms (typical scores)
- Validity (consistent accuracy)
- Reliability (consistency)
- Standardization (administration & scoring)

Process & Evidence Approaches
Let's start with two that are very different.
"Cattell/Likert Approach"
- focuses on criterion-related validity
- requires a "gold standard" criterion – not great for "1st measures"
- emphasizes the predictive nature of scales & validation
- a valid measure is a combination of valid items: a scale is constructed of items that are each related to the criterion
- the criterion-related validity coefficient is the major evidence; construct validation sometimes follows
- such scales tend to have "limited" internal consistency and "complex" factor structures – but those are not selection or validation goals under this approach
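A minimal sketch of criterion keying in this spirit (the data, loadings, and the 0.15 cutoff are invented for illustration; real work would cross-validate the keyed items):

```python
import numpy as np

rng = np.random.default_rng(3)
n_people, n_items = 500, 40

criterion = rng.normal(0, 1, n_people)             # hypothetical criterion scores
loadings = rng.uniform(0.0, 0.5, n_items)          # how criterion-related each item is
items = criterion[:, None] * loadings + rng.normal(0, 1, (n_people, n_items))

# Criterion keying: keep items whose item-criterion correlation beats a cutoff
r = np.array([np.corrcoef(items[:, j], criterion)[0, 1] for j in range(n_items)])
keyed = np.flatnonzero(r > 0.15)                   # arbitrary illustrative cutoff

scale = items[:, keyed].sum(axis=1)                # criterion-keyed scale score
print(f"{keyed.size} items keyed; scale validity r = "
      f"{np.corrcoef(scale, criterion)[0, 1]:.3f}")
```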

Process & Evidence Approaches
"Nunnally Approach" – very different from that one:
- focuses on content validity
- does not require a "gold standard" criterion
- is the most common approach for "1st measures"
- emphasizes the measurement nature of scales & validation
- a valid measure is made up of items from the target content domain
- internal consistency, test-retest reliability & content validity (using content experts) are the major evidence; construct validation usually follows
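Internal consistency is typically summarized with coefficient alpha; the formula below is the standard one, while the data are made up for the sketch:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for a people-by-items score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(4)
trait = rng.normal(0, 1, 300)
# Ten hypothetical items, each = trait + independent noise
scores = trait[:, None] + rng.normal(0, 1, (300, 10))
print(f"alpha = {cronbach_alpha(scores):.2f}")
```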

Obviously there are all sorts of permutations & combinations of these two.
One-shot approach: "a good scale is made of good items," so select items that:
- relate to the criterion construct (convergent validity)
- don't relate to non-criterion constructs (divergent validity)
- interrelate with each other
- show good temporal stability
- represent the required range of difficulty/response probability
You might imagine that individual items meeting all these criteria are infrequent. Remember – the reason we build scales is that individual items don't have these attributes, while composites are more likely to.
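A sketch of a one-shot screen applying several of these filters at once (data, loadings, and every cutoff are invented; real work would justify each threshold):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 400, 30

target = rng.normal(0, 1, n)                       # criterion construct
nuisance = rng.normal(0, 1, n)                     # a non-criterion construct
items_t1 = (target[:, None] * rng.uniform(0, .6, k)
            + nuisance[:, None] * rng.uniform(0, .3, k)
            + rng.normal(0, 1, (n, k)))
items_t2 = items_t1 + rng.normal(0, .5, (n, k))    # hypothetical retest

def col_r(x, v):
    """Correlation of each column of x with vector v."""
    return np.array([np.corrcoef(x[:, j], v)[0, 1] for j in range(x.shape[1])])

stability = np.array([np.corrcoef(items_t1[:, j], items_t2[:, j])[0, 1]
                      for j in range(k)])

keep = ((col_r(items_t1, target) > 0.20)           # convergent validity
        & (np.abs(col_r(items_t1, nuisance)) < 0.15)  # divergent validity
        & (stability > 0.60))                      # temporal stability
print(f"{keep.sum()} of {k} items survive all screens")
```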

Interesting variation – content validation of predictive validity
Range restriction is a real problem when substituting concurrent validation on incumbent populations for predictive validation on applicant populations; observed validities often shrink substantially.
An alternative approach is to base predictive validation on "content experts" (SMEs – incumbents & supervisors), in three steps:
1. Get frequency & importance data from SMEs supporting that the content of the performance criterion reflects the job performance domain for that population (O*NET).
2. Get data from SMEs that the content of the predictive scale reflects the predictor domain for that population.
3. Get data from SMEs that the predictor domain and the criterion domain are related.
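For context on the shrinkage problem, the standard correction for direct range restriction (Thorndike's Case II) estimates what a restricted incumbent validity would look like in the applicant pool; a sketch with made-up numbers:

```python
import math

def correct_range_restriction(r: float, sd_restricted: float,
                              sd_unrestricted: float) -> float:
    """Thorndike Case II correction for direct range restriction."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1 + r**2 * (u**2 - 1))

# Hypothetical: validity of .25 among incumbents whose predictor SD
# is only 60% of the applicant pool's SD
print(round(correct_range_restriction(0.25, 0.6, 1.0), 3))  # ~0.40
```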

Interesting variation – content validation of predictive validity: an example
We want to use "job focus" to select among applicants, so we have to show that it is predictive of job performance.
- The job performance criterion is validated by having SMEs rate whether each of several job performance specifics (e.g., "notices whether or not all the bolts are tightened") is frequent and/or important to performing the job successfully.
- The job performance predictor is validated by having SMEs rate whether each of the items is related to "job focus" (e.g., "Does your mind sometimes wander so that you miss details?").
- Predictive validity is established by showing that each of the job performance predictor items is related to at least one of the job performance specifics.
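That last step reduces to a coverage check over SME linkage ratings; a sketch with an invented ratings matrix (rows = predictor items, columns = performance specifics, entries = mean SME relatedness ratings on an assumed 1-5 scale):

```python
import numpy as np

# Hypothetical mean SME ratings (1-5) linking 4 predictor items
# to 3 job performance specifics
linkage = np.array([[4.2, 1.8, 2.0],
                    [1.5, 4.6, 3.9],
                    [2.1, 1.9, 1.4],   # this item links to nothing strongly
                    [3.8, 2.2, 4.1]])

RELATED = 3.5  # arbitrary "related" cutoff on the rating scale
covered = (linkage >= RELATED).any(axis=1)
for i, ok in enumerate(covered):
    print(f"predictor item {i + 1}: {'supported' if ok else 'NOT supported'}")
```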