
1 Applied Psychometric Strategies Lab, Applied Quantitative and Psychometric Series
Abbey Love, MS, & Dani Rosenkrantz, MS, EdS. Guiding Steps for the Evaluation or Creation of a Scale: A Starter Kit. March 20, 2018. DANI Abstract: This application-focused talk covers (1) how to determine whether an existing scale should be used, and (2) how to develop a scale when you want to measure a psychological construct. The first half of the discussion advises on where to look for measures and what a reliable and valid scale looks like. The second half focuses on the steps needed to develop a measure, should you find this necessary. We discuss the measurement issues that arise from poor tools and suggest best practices in scale development.

2 What is a “bad” scale? I once saw a newly published scale used in someone’s dissertation… It depends! DANI

3 But, will bad measurement really hurt my study?
DANI

4 Yes! That’s why we are here. 
Big Picture: Poor measurement is an ethical concern. If the measurement is problematic, the reliability of our findings is compromised. In other words, our degree of trust in our results is in question, which means our statistical conclusion validity is in question!

5 Sample-specific measurement challenges that may occur
- Low reliability, or large measurement error around scores
- Uncertainty about using a total score or subscale scores for each person
- Not all response options being used, or low item-response variability
- Poor factor-structure solutions due to cross-loading on multiple factors, low loadings on factors, and/or influences such as item phrasing
DANI
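The first challenge above, low reliability, can be screened for directly in your own data. A minimal pure-Python sketch of Cronbach's alpha (coefficient alpha); the 3-item scale and responses below are hypothetical:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list per item, each holding that item's scores
    across the same respondents (equal lengths).
    """
    k = len(items)                  # number of items
    n = len(items[0])               # number of respondents

    def variance(xs):
        # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(variance(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Hypothetical responses: 3 items, 6 respondents, 1-5 Likert scale.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [3, 5, 2, 4, 1, 4],
]
alpha = cronbach_alpha(items)
```

In practice you would use an established routine rather than hand-rolled code, but the formula itself is this simple: alpha rises as items covary strongly relative to their individual variances.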

6 Examples From Applied Work
- Cognitive Flexibility Inventory, used for a dissertation: only the original paper explored the factor structure; there were cross-loading issues in the original; the factor structure was poorly recovered in my sample; I had to reduce items for better fit, a controversial decision.
- Internal structure assessment of the Objectified Body Consciousness Scale: poor fit with trans women, indicating it is inappropriate to use without further study.
DANI

7 Why does good measurement matter?
DANI

8 If you use a well-established measure, you will likely find the following
- High reliability, which results in low measurement error, accurate effect-size estimates (d, R²), better capture of effects of interest (betas, path coefficients), and improved inferential techniques (more accurate SEs and, ultimately, statistical decisions)
- Confidence in how to score the scale (total and/or subscales)
- All response options being used
- Strong recovery of the factor-structure solution, with near-zero cross-loadings across factors, high loadings on intended factors, and/or minimal influence due to method factors
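The link between reliability and effect-size estimates above can be made concrete with the classical correction-for-attenuation formula: the observed correlation between two measures is the true correlation shrunk by the square root of the product of their score reliabilities. A small sketch; the reliability values are illustrative:

```python
import math

def attenuated_r(r_true, rel_x, rel_y):
    """Expected observed correlation, given the true correlation and
    the reliabilities of the two sets of scores (classical test theory)."""
    return r_true * math.sqrt(rel_x * rel_y)

def disattenuated_r(r_obs, rel_x, rel_y):
    """Correction for attenuation: estimate of the true correlation
    from an observed correlation and the two reliabilities."""
    return r_obs / math.sqrt(rel_x * rel_y)

# A true correlation of .50, measured with reliabilities of .70 and .60,
# is expected to show up as roughly .32 in the observed data.
r_obs = attenuated_r(0.50, 0.70, 0.60)
```

This is why low reliability attenuates d, R², and path coefficients: the noise in the scores is absorbed into the denominator of every standardized effect.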

9 Looking for a Scale? Guiding Steps for the Evaluation or Creation of a Scale
1.) Evaluation of psychological scales: Should I use a scale I found? 2.) Scale development: I can’t find a scale; what steps do I need to develop a scale to measure a psychological construct? DANI We will give you a global picture of both, not the specifics.

10 How do I know if my scale is “good?”
DANI

11 Good scales… have ongoing and multiple sources of evidence that can be used to evaluate the validity of the interpretation of the scale for a particular use. ABBEY

12 Sources of Validity in Instrument Development
Evidence based on...
- Test content: Am I measuring what I planned to measure?
- Response processes: Are my participants understanding the items on my scale in an expected way?
- Relations to other variables: Do the items I have chosen to represent my construct relate to other variables in an expected way? This can include convergent and discriminant evidence.
- Internal structure: What is the degree to which the items on my scale conform to the construct and to how I intend to interpret the scale?
ABBEY As recommended by AERA, APA, and NCME (2014), we followed multiple steps to establish evidence for the validity of the newly created scale. Each source of evidence was used to increase the degree to which these student-specific teaching self-efficacy scores can be accepted as reliable and valid.

17 Sources of Validity in Instrument Development
Evidence based on...
- Test content: literature review, content specification, expert judges
- Response processes: cognitive interviews
- Relations to other variables: analysis of the relationship of the scale scores to variables external to the scale (correlational evidence)
- Internal structure: factor analysis, measurement invariance
ABBEY **See the Standards for Educational and Psychological Testing
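For the relations-to-other-variables row, the correlational evidence can be as simple as checking that the new scale correlates strongly with a measure it should converge with and weakly with one it should not. A pure-Python sketch; all of the scores below are made-up illustrations:

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical totals on the new scale, a theoretically related measure
# (convergent evidence), and an unrelated measure (discriminant evidence).
new_scale = [10, 14, 9, 16, 12, 18]
related   = [11, 15, 10, 17, 11, 19]
unrelated = [3, 7, 5, 4, 6, 5]

convergent = pearson_r(new_scale, related)      # expect this to be high
discriminant = pearson_r(new_scale, unrelated)  # expect this to be near zero
```

With real data you would also report confidence intervals and, ideally, disattenuated correlations, but the convergent/discriminant pattern itself is just this pair of comparisons.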

18 Determining If You Should Use An Instrument
- Length and content: Does the scale represent the breadth of the construct?
- Reliability: Is the score reliability from your scale reasonable, using similar samples?*
*Depends on the seriousness/specificity of the reliability issue…
DANI

19 Determining If You Should Use An Instrument
- Previous samples: Has the scale been used with samples similar to your sample of interest?
- Intended performance: Has the scale previously performed as intended, based on a review of past psychometric analyses (EFA, CFA, correlational, SEM)?
DANI

20 Determining If You Should Use An Instrument
- Scoring: How has the scale been scored in the past? Was sufficient testing done to evaluate the appropriateness of using a total score, if needed?
DANI Speaker notes: 10 studies, 4 factors, cross-loadings — have the analyses been done well, or do they just run an EFA/CFA? Consistency across papers is not enough; papers that simply use a total score are not enough.
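The total-versus-subscale question above is ultimately about how scores get computed and interpreted. A minimal scoring sketch; the two-subscale structure and item names are hypothetical, and summing into a grand total is only defensible when the psychometric evidence supports it:

```python
# Hypothetical 6-item scale with two 3-item subscales.
# The item-to-subscale mapping is illustrative only.
subscales = {
    "anxiety": ["item1", "item2", "item3"],
    "avoidance": ["item4", "item5", "item6"],
}

def score(responses, subscales):
    """Return subscale sums plus a grand total for one respondent."""
    out = {name: sum(responses[item] for item in items)
           for name, items in subscales.items()}
    out["total"] = sum(out.values())  # only meaningful if a total score is justified
    return out

responses = {"item1": 4, "item2": 3, "item3": 5,
             "item4": 2, "item5": 1, "item6": 2}
scores = score(responses, subscales)
```

The code is trivial on purpose: the hard part is not computing the sums but establishing (e.g., via factor analysis) that the sums mean what you intend.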

21 Is one EFA okay? When is it enough?
Consider whether there is psychometric evidence for your specific sample abbey Some will argue against this slide

22 Where can I find good scales?
ABBEY

23 Where To Find Instruments: Literature
Review the literature on your construct and the scales that measure it, paying close attention to:
- Definitions of the construct
- Reliability
- Factor structure: subscales vs. total scores, exploratory factor analysis, confirmatory factor analysis, validation sample, measurement invariance
ABBEY

24 Where To Find Instruments: Reviews
Mental Measurements Yearbook (MMY): a tool for locating information about commercial tests and measures. It provides factual information on published tests, along with critical test reviews written by professionals and psychometricians in education, psychology, speech/language/hearing, law, health care, and other related fields.
ABBEY The MMY includes timely, consumer-oriented test reviews, providing evaluative information to promote and encourage informed test selection. The latest version was July 2017.

25 Developing an Instrument If Needed
ABBEY


31 Best Practices In Instrument Development
Recognize instrument development as an ongoing process, not a one-time event. ABBEY

32 Helpful References for Scale Construction
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
DeVellis, R. F. (2012). Scale development: Theory and applications. Los Angeles, CA: Sage.
Kline, P. (1986). Making tests reliable II: Personality inventories. In P. Kline (Ed.), A handbook of test construction: Introduction to psychometric design (pp ). London, United Kingdom: Methuen.
Thorndike, R. M., & Thorndike-Christ, T. (2010). Measurement and evaluation in psychology and education. Boston, MA: Prentice Hall.
Willis, G. B., & Artino, A. R. (2013). What do our respondents think we’re asking? Using cognitive interviewing to improve medical education surveys. Journal of Graduate Medical Education, 5, doi: /JGME-D
ABBEY

33 What did we want to get into, but not have time for?
- Best practices in CFA and EFA
- Bifactor analyses
- SEM
- Using IRT
- Measurement invariance or DIF
- Cognitive diagnostic models
- Multilevel SEM and IRT
DANI

34 What did I learn?
- Think twice before using a scale to measure a construct of interest
- Good measurement matters
- Instrument development is an ongoing process
- Consider gathering psychometric evidence to support the intended use of the scale within your study
DANI


