1 What’s New in the I/O Testing and Assessment Literature That’s Important for Practitioners?
Paul R. Sackett
2 New Developments in the Assessment of Personality
3 Topic 1: A faking-resistant approach to personality measurement
- Tailored Adaptive Personality Assessment System (TAPAS)
- Developed for the Army Research Institute by Drasgow Consulting Group
- Multidimensional pairwise preference format combined with an applicable item response theory (IRT) model
- Items are created by pairing statements from different dimensions that are similar in desirability and trait “location”
- Example item: “Which is more like you?”
  __1a) People come to me when they want fresh ideas.
  __1b) Most people would say that I’m a “good listener”.
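The item-construction rule above (pair statements from different dimensions that are matched on desirability) can be sketched as a simple greedy matcher. This is a hypothetical illustration, not the TAPAS procedure: the statement pool, the desirability ratings, and the `max_gap` tolerance are assumptions, and the operational system also matches on trait location and uses its IRT model to choose items adaptively.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Statement:
    text: str
    dimension: str        # personality dimension the statement indicates
    desirability: float   # hypothetical mean social-desirability rating (1-5 scale)


def pair_statements(pool: list[Statement], max_gap: float = 0.3) -> list[tuple[Statement, Statement]]:
    """Greedily pair statements from different dimensions whose desirability
    differs by at most max_gap; closest matches are taken first and each
    statement is used at most once."""
    candidates = [
        (abs(a.desirability - b.desirability), a, b)
        for a, b in combinations(pool, 2)
        if a.dimension != b.dimension and abs(a.desirability - b.desirability) <= max_gap
    ]
    candidates.sort(key=lambda c: c[0])
    used: set[Statement] = set()
    pairs = []
    for _, a, b in candidates:
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    return pairs


# Hypothetical pool with assumed desirability ratings.
pool = [
    Statement("People come to me when they want fresh ideas.", "openness", 4.1),
    Statement("Most people would say that I'm a 'good listener'.", "agreeableness", 4.2),
    Statement("I keep my workspace tidy.", "conscientiousness", 3.6),
    Statement("I enjoy being the center of attention.", "extraversion", 3.0),
]
pairs = pair_statements(pool)
for a, b in pairs:
    print(f"1a) {a.text}\n1b) {b.text}")
```

Matching on desirability is what makes the forced choice hard to fake: if both statements look equally good, a respondent cannot simply pick the "better" one.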
4 A faking-resistant approach to personality measurement (continued)
- Extensive work shows it is faking-resistant
- A non-operational field study in the Army showed useful prediction of attrition, disciplinary incidents, completion of basic training, and adjustment to Army life, among other criteria
- Now in operational use on a trial basis
Drasgow, F., Stark, S., Chernyshenko, O. S., Nye, C. D., & Hulin, C. L. (2012). Development of the Tailored Adaptive Personality Assessment System (TAPAS) to Support Army Selection and Classification Decisions (Technical Report 1311). Army Research Institute.
5 Topic 2: The Value of Contextualized Personality Items
- A new meta-analysis documents the higher predictive power obtained by “contextualizing” items (e.g., asking about behavior at work rather than behavior in general)
- Mean r with supervisory ratings, work context vs. general:
  Conscientiousness: .30 vs. .22
  Emotional Stability: .17 vs. .12
  Extraversion: .25 vs. .08
  Agreeableness: .24 vs. .10
  Openness: .19 vs. .02
Shaffer, J. A., & Postlethwaite, B. E. (2012). A matter of context: A meta-analytic investigation of the relative validity of contextualized and noncontextualized personality measures. Personnel Psychology, 65,
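The work-context vs. general-context means listed above can be compared directly. This short sketch, which reads the Emotional Stability general-context value as .12, computes each trait's absolute gain in r and the ratio of contextualized to general validity. It is plain arithmetic on the slide's figures, not a re-analysis of the meta-analysis.

```python
# Mean observed r with supervisory ratings: (work context, general context), from the slide.
validities = {
    "Conscientiousness": (0.30, 0.22),
    "Emotional Stability": (0.17, 0.12),
    "Extraversion": (0.25, 0.08),
    "Agreeableness": (0.24, 0.10),
    "Openness": (0.19, 0.02),
}


def contextualization_gain(work: float, general: float) -> tuple[float, float]:
    """Return (absolute gain in r, ratio of work-context r to general-context r)."""
    return round(work - general, 2), round(work / general, 1)


gains = {trait: contextualization_gain(w, g) for trait, (w, g) in validities.items()}

# List traits from largest to smallest absolute gain.
for trait, (diff, ratio) in sorted(gains.items(), key=lambda kv: -kv[1][0]):
    print(f"{trait:<20} +{diff:.2f}  ({ratio}x)")
```

The absolute gains are modest for Conscientiousness, but the ratios show that contextualizing matters most for traits (such as Openness) whose general-context validity is near zero.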
6 Topic 3: Moving from the Big 5 to Narrower Dimensions
DeYoung, Quilty, and Peterson (2007) suggested the following:
- Neuroticism:
  - Volatility - irritability, anger, and difficulty controlling emotional impulses
  - Withdrawal - susceptibility to anxiety, worry, depression, and sadness
- Agreeableness:
  - Compassion - empathetic emotional affiliation
  - Politeness - consideration and respect for others’ needs and desires
- Conscientiousness:
  - Industriousness - working hard and avoiding distraction
  - Orderliness - organization and methodicalness
- Extraversion:
  - Enthusiasm - positive emotion and sociability
  - Assertiveness - drive and dominance
- Openness to Experience:
  - Intellect - ingenuity, quickness, and intellectual engagement
  - Openness - imagination, fantasy, and artistic and aesthetic interests
DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93,
7 Moving from the Big 5 to Narrower Dimensions (continued)
- Dudley et al. (2006) show the value of this perspective
- Four conscientiousness facets: achievement, dependability, order, and cautiousness
- Validity was driven largely by the achievement and/or dependability facets, with relatively little contribution from cautiousness and order
- Achievement receives the dominant weight in predicting task performance, while dependability receives the dominant weight in predicting counterproductive work behavior
Dudley, N. M., Orvis, K. A., Lebiecki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of conscientiousness in the prediction of job performance: Examining the intercorrelations and the incremental validity of narrow traits. Journal of Applied Psychology, 91, 40–57.
8 Topic 4: The Use of Faking Warnings
- Landers et al. (2011) administered a warning, one-third of the way through the items, to managerial candidates who exhibited what they called “blatant extreme responding”
- The rate of extreme responding was halved after the warning
Landers, R. N., Sackett, P. R., & Tuzinski, K. A. (2011). Retesting after initial failure, coaching rumors, and warnings against faking in online personality measures for selection. Journal of Applied Psychology, 96(1), 202.
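The slide does not define “blatant extreme responding”, so the following is only a hypothetical operationalization for illustration: at a checkpoint one-third of the way through the test, flag a candidate whose share of endpoint responses so far meets a threshold. The 1-5 response scale, the 0.8 threshold, and the function names are assumptions, not the rule Landers et al. used.

```python
import math


def extreme_response_rate(responses: list[int], low: int = 1, high: int = 5) -> float:
    """Share of responses at either endpoint of an assumed 1-5 Likert scale."""
    if not responses:
        return 0.0
    return sum(r in (low, high) for r in responses) / len(responses)


def should_warn(responses: list[int], total_items: int, threshold: float = 0.8) -> bool:
    """Hypothetical rule: once one-third of the items are answered, warn if the
    endpoint-response rate over those items meets the threshold."""
    checkpoint = math.ceil(total_items / 3)
    if len(responses) < checkpoint:
        return False  # checkpoint not reached yet
    return extreme_response_rate(responses[:checkpoint]) >= threshold


# Hypothetical first 10 answers on a 30-item test.
blatant = [5, 5, 1, 5, 5, 5, 1, 5, 5, 5]
typical = [4, 3, 5, 4, 2, 4, 3, 4, 5, 3]
print(should_warn(blatant, total_items=30), should_warn(typical, total_items=30))
```

Checking at a fixed point mirrors the design described above: the warning is delivered mid-test, so the remaining two-thirds of the items show whether responding changes afterward.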
9 More on the Use of Faking Warnings
Nathan Kuncel suggests three potentially relevant goals when individuals take a personality test:
- be impressive
- be credible
- be true to oneself
10 More on the Use of Faking Warnings
- Jenson and Sackett (2013) suggested that “priming” concern for being credible could reduce faking
- Test-takers who scheduled a follow-up interview just before taking the personality test obtained lower scores than those who did not
Jenson, C. E., & Sackett, P. R. (2013). Examining ability to fake and test-taker goals in personality assessments. SIOP presentation.
11 New Developments in the Assessment of Cognitive Ability
12 A cognitive test with reduced adverse impact
- In 2011, SIOP awarded its M. Scott Myers Award for applied research to Yusko, Goldstein, Scherbaum, and Hanges for the development of the Siena Reasoning Test
- This is a nonverbal reasoning test using unfamiliar item content, such as made-up words (if a GATH is larger than a SHET…) and figures
- The concept is that adverse impact will be reduced by eliminating content with which groups have differential familiarity
13 Validity and subgroup d for Siena Test
- Black-White d commonly in the range
- A sizable number of validity studies, with validities in the range commonly seen for cognitive tests
- In one independent study, HumRRO researchers included Siena along with another cognitive test: corrected validity was .45 for the other test (d = 1.) and .35 for Siena (d = .38) (SIOP 2010: Paullin, Putka, and Tsacoumis)
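The subgroup d values above are standardized mean differences. A minimal sketch, assuming normal score distributions with equal variances in both groups, shows how d is computed from raw scores (pooled-SD version) and how a given d translates into the ratio of selection rates at a common cutoff. The cutoff is illustrative; .38 is the Siena value reported above, and 1.0 is an assumed comparison value.

```python
import math
from statistics import mean, variance


def cohens_d(group_1: list[float], group_2: list[float]) -> float:
    """Standardized mean difference (group_1 minus group_2) using the pooled SD."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / math.sqrt(pooled_var)


def normal_sf(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def impact_ratio(d: float, cutoff_z: float) -> float:
    """Selection-rate ratio (lower-scoring group / higher-scoring group) at a cutoff,
    assuming the higher-scoring group is N(0, 1) and the other group is N(-d, 1)."""
    return normal_sf(cutoff_z + d) / normal_sf(cutoff_z)


for d in (1.0, 0.38):
    print(f"d = {d:.2f}: impact ratio with cutoff at the higher group's mean = {impact_ratio(d, 0.0):.2f}")
```

Under these assumptions, a cutoff at the mean gives a ratio of about .32 for d = 1.0 versus about .70 for d = .38; ratios below .80 are the usual four-fifths rule-of-thumb flag for adverse impact in U.S. practice.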
14 Why the reduced d?
- Somewhat of a puzzle. There is a history of using nonverbal reasoning tests:
  - Raven’s Progressive Matrices
  - Large-sample military studies in Project A
- But these do not show the reduced d that is seen with the Siena Test
- Things to look into: does d vary with item difficulty, and how does Siena compare with other tests?
(Note: Nothing published to date that I am aware of. Some PowerPoint decks from SIOP presentations can be found online: search for “Siena Reasoning Test”.)
15 New Developments in Situational Judgment Testing
16 Sample SJT item
You find yourself in an argument with several co-workers about who should do a very disagreeable, but routine task. Which of the following would be the most effective way to resolve this situation?
(a) Have your supervisor decide, because this would avoid any personal bias.
(b) Arrange for a rotating schedule so everyone shares the chore.
(c) Let the workers who show up earliest choose on a first-come, first-served basis.
(d) Randomly assign a person to do the task and don't change it.
17 Key findings
- Extensive validity evidence
- Can measure different constructs (problem solving, communication skills, integrity, etc.)
- Incremental validity over ability and personality
- Small subgroup differences, except for cognitively oriented SJTs
- Items can be presented in written form or by video; recent move to animation rather than recording live actors
18 Lievens, Sackett, and Buyse (2009): comparing response instructions
- Ongoing debate regarding “would do” vs. “should do” instructions
- Lievens et al. randomly assigned Belgian medical school applicants to “would do” or “should do” instructions in an operational interpersonal skills SJT; they did the same with a student sample
19 Lievens, Sackett, and Buyse (2009): comparing response instructions
- In the operational setting, applicants gave “should do” responses regardless of which instructions they received
- So: we’d like to know “would do”, but in effect, we can only get “should do”
20 Arthur et al. (2014): comparing response formats
- Compared 3 options:
  - Rate the effectiveness of each response
  - Rank the responses
  - Choose the best and worst response
- 20-item integrity-oriented SJT
- Administered to over 30,000 retail/hospitality job applicants
- Online administration; each format used for one week
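The three formats need different scoring rules. The sketch below shows one common way to score each format against an expert key. The key values, the 1-7 effectiveness scale, and the specific rules (negative absolute distance for ratings, negative total rank displacement for rankings, +1/-1 for best/worst picks) are assumptions for illustration, not the scoring Arthur et al. used.

```python
# Hypothetical expert (SME) key for one four-option SJT item.
EXPERT_RATINGS = {"a": 3.0, "b": 6.5, "c": 2.0, "d": 1.5}  # mean effectiveness on a 1-7 scale
EXPERT_RANKS = {"a": 2, "b": 1, "c": 3, "d": 4}            # 1 = most effective
BEST, WORST = "b", "d"


def score_rate(ratings: dict[str, float]) -> float:
    """Rate format: closer agreement with SME ratings gives a higher (less negative) score."""
    return -sum(abs(ratings[o] - EXPERT_RATINGS[o]) for o in EXPERT_RATINGS)


def score_rank(ranks: dict[str, int]) -> int:
    """Rank format: penalize total displacement from the SME ranking."""
    return -sum(abs(ranks[o] - EXPERT_RANKS[o]) for o in EXPERT_RANKS)


def score_best_worst(best: str, worst: str) -> int:
    """Best/worst format: +1 per correct pick, -1 per pick that reverses the key."""
    score = 1 if best == BEST else (-1 if best == WORST else 0)
    score += 1 if worst == WORST else (-1 if worst == BEST else 0)
    return score


print(score_rate({"a": 4, "b": 7, "c": 2, "d": 1}))
print(score_rank({"a": 1, "b": 2, "c": 3, "d": 4}))
print(score_best_worst("b", "d"))
```

The rate format uses information from every option on every item, while best/worst uses only two picks per item, which is one plausible reason the rate format can yield higher reliability.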
21 “Rate each response” emerges as superior
- Higher reliability
- Lower correlation with cognitive ability
- Smaller gender mean difference
- Higher correlation with conceptually relevant personality dimensions (conscientiousness, agreeableness, emotional stability)
- Follow-up study with a student sample:
  - Higher retest reliability
  - More favorable reactions
22 Question: how “situational” is situational judgment?
Krumm et al. (in press)
- Some suggest SJTs really just measure general knowledge about appropriate social behavior
- So Krumm et al. conducted a clever experiment: they “decapitated” SJT items
- Removed the stem and presented just the responses
23 559 airline pilots completed 10 items each from:
- Airline pilot knowledge SJT
- Integrity SJT
- Teamwork SJT
- Overall, mean scores are 1 SD higher with the stem
- But for more than half the items, there is no difference with and without the stem
- So the stem matters overall, but is irrelevant for many SJT items
- Depends on the specificity of stem content
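The item-level finding (the stem matters overall but not for many items) implies comparing each item's results with and without its stem. A minimal sketch, assuming dichotomously scored items and two independent groups (one seeing full items, one seeing decapitated items), uses a two-proportion z-test per item. The counts are invented for illustration, and the choice of test is an assumption, not the analysis Krumm et al. report.

```python
import math


def two_proportion_z(correct_1: int, n_1: int, correct_2: int, n_2: int) -> float:
    """z statistic for the difference between two independent proportions (pooled SE)."""
    p1, p2 = correct_1 / n_1, correct_2 / n_2
    pooled = (correct_1 + correct_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p1 - p2) / se


def stem_matters(items: dict[str, tuple[int, int, int, int]], z_crit: float = 1.96) -> dict[str, bool]:
    """For each item given as (correct_with_stem, n_with, correct_without_stem, n_without),
    flag whether the with-stem advantage has a z above z_crit."""
    return {name: two_proportion_z(*counts) > z_crit for name, counts in items.items()}


# Hypothetical counts for three items.
items = {
    "item_1": (200, 280, 110, 279),  # large with-stem advantage
    "item_2": (150, 280, 148, 279),  # essentially no difference
    "item_3": (240, 280, 236, 279),  # easy with or without the stem
}
result = stem_matters(items)
print(result)
```

An item like item_3 illustrates the point on the slide: if the response options alone reveal the keyed answer, removing the stem changes little.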
24 “You are flying an “angel flight” with a nurse and noncritical child patient, to meet an ambulance at a downtown regional airport. You filed visual flight rule: it is 11:00 p.m. on a clear night, when, at 60 nm out, you notice the ammeter indicating a battery discharge and correctly deduce the alternator has failed. Your best guess is that you have from 15 to 30 min of battery power remaining. You decide to:
(a) Declare an emergency, turn off all electrical systems, except for 1 NAVCOM and transponder, and continue to the regional airport as planned.
(b) Declare an emergency and divert to the Planter’s County Airport, which is clearly visible at 2 o’clock, at 7 nm.
(c) Declare an emergency, turn off all electrical systems, except for 1 NAVCOM, instrument panel lights, intercom, and transponder, and divert to the Southside Business Airport, which is 40 nm straight ahead.
(d) Declare an emergency, turn off all electrical systems, except for 1 NAVCOM, instrument panel lights, intercom, and transponder, and divert to Draper Air Force Base, which is at 10 o’clock, at 32 nm.”
25 Arthur, W., Jr., Glaze, R. M., Jarrett, S. M., White, C. D., Schurig, I., & Taylor, J. E. (2014). Comparative evaluation of three situational judgment test response formats in terms of construct-related validity, subgroup differences, and susceptibility to response distortion. Journal of Applied Psychology, 99(3),
Krumm, S., Lievens, F., Huffmeier, J., Lipnevich, A., Bendels, H., & Hertel, G. (in press). How “situational” is judgment in situational judgment tests? Journal of Applied Psychology.
Lievens, F., Sackett, P. R., & Buyse, T. (2009). The effects of response instructions on situational judgment test performance and validity in a high-stakes context. Journal of Applied Psychology, 94,
27 Two meta-analyses with differing findings
- Ones, Viswesvaran, and Schmidt (1993) is the “classic” analysis of integrity test validity
  - Found 662 studies, including many where only raw data were provided (i.e., no write-up)
  - Information sharing from many publishers
- In 2012, Van Iddekinge et al. conducted an updated meta-analysis
  - Applied strict rules as to which studies to include (e.g., reporting of study detail)
  - 104 studies (including 132 samples) met the inclusion criteria
  - 30 publishers contacted; only 2 shared information
- Both based bottom-line conclusions on studies using a predictive design and a non-self-report criterion
28 Predicting Counterproductive Behavior
Study                                    K    N    Mean Validity
Ones et al. – overt tests
Ones et al. – personality-based tests
Van Iddekinge et al.
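In meta-analyses of this kind, the mean validity reported alongside K and N is typically a sample-size-weighted average across studies. A minimal bare-bones sketch in the Hunter-Schmidt style (no artifact corrections) computes K, total N, the N-weighted mean r, the N-weighted observed variance of r, and the variance expected from sampling error alone. The study correlations are hypothetical, not values from either meta-analysis.

```python
from dataclasses import dataclass


@dataclass
class Study:
    r: float  # observed validity coefficient
    n: int    # sample size


def bare_bones(studies: list[Study]) -> dict[str, float]:
    """K, total N, N-weighted mean r, observed variance of r, and expected sampling-error variance."""
    k = len(studies)
    total_n = sum(s.n for s in studies)
    mean_r = sum(s.n * s.r for s in studies) / total_n
    var_r = sum(s.n * (s.r - mean_r) ** 2 for s in studies) / total_n
    # Sampling-error variance approximation using the average sample size.
    avg_n = total_n / k
    var_e = (1 - mean_r ** 2) ** 2 / (avg_n - 1)
    return {"K": k, "N": total_n, "mean_r": mean_r, "var_r": var_r, "var_e": var_e}


studies = [Study(0.20, 150), Study(0.10, 400), Study(0.35, 50)]  # hypothetical
result = bare_bones(studies)
print(result)
```

Because large studies dominate the weighted mean, which studies pass the inclusion rules (and how large they are) can move the bottom line substantially.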
29 Why the difference?
Not clear. A number of factors do not seem to be the cause:
- Differences in types of studies examined (e.g., both excluded studies with polygraph as criteria)
- Differences in corrections (e.g., for unreliability)
Several factors may contribute, though this is speculation:
- Some counterproductive behaviors may be more predictable than others, but all are lumped together in these analyses
Given the reliance in both on studies not readily available to public scrutiny, this won’t be resolved until further work is done
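Differences in corrections for unreliability appear above among the factors that do not seem to explain the gap. For reference, the standard correction for criterion unreliability divides the observed validity by the square root of the criterion reliability, so analyses that assume different reliability values report different corrected validities from the same observed r. The observed r and the reliability values in this sketch are assumptions chosen for illustration.

```python
import math


def correct_for_criterion_unreliability(r_observed: float, r_yy: float) -> float:
    """Disattenuate an observed validity for unreliability in the criterion only."""
    if not 0 < r_yy <= 1:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_observed / math.sqrt(r_yy)


observed = 0.20              # hypothetical observed validity
for r_yy in (0.52, 0.80):    # two assumed criterion reliabilities
    corrected = correct_for_criterion_unreliability(observed, r_yy)
    print(f"r_yy = {r_yy:.2f}: corrected r = {corrected:.3f}")
```

Only the criterion is corrected here, which is the usual choice when estimating operational validity: the predictor is used as-is in selection, so its unreliability is part of its real-world performance.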
30 Broader questions
This raises broader issues about data openness policies:
- Publisher obligations?
- Researcher obligations?
- Journal publication standards?
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology, 78, 679–703.
Van Iddekinge, C. H., Roth, P. L., Raymark, P. H., & Odle-Dusseau, H. N. (2012). The criterion-related validity of integrity tests: An updated meta-analysis. Journal of Applied Psychology, 97, 499–530.
31 New Developments in Using Vocational Interest Measures
32 Since Hunter and Hunter (1984), interest in using interest measures for selection has diminished greatly
- They report a meta-analytic estimate of .10 for the validity of interests in predicting performance
- BUT: how many studies in this meta-analysis?
- 3!!!
33 New meta-analysis by Van Iddekinge et al. (2011)
- Lots of studies (80)
- Mean validity for a single interest dimension: .11
- Mean validity for a single interest dimension relevant to the job in question: .23
- Other studies suggest incremental validity over ability and personality
34 The “catch”: studies use data collected for research purposes
- Concern that candidates can “fake” a job-relevant interest profile
- I expect interest to turn to developing faking-resistant interest measures
35 Van Iddekinge, C. H., Roth, P. L., Putka, D. J., & Lanivich, S. E. (2011). Are you interested? A meta-analysis of relations between vocational interests and employee performance and turnover. Journal of Applied Psychology, 96(6), 1167.
Nye, C. D., Su, R., Rounds, J., & Drasgow, F. (2012). Vocational interests and performance: A quantitative summary of over 60 years of research. Perspectives on Psychological Science, 7(4),
37 Van Iddekinge et al. (in press)
- Students about to graduate made their Facebook information available
- Recruiters rated each profile on 10 dimensions
- Supervisors rated performance a year later
- Facebook ratings did not predict performance
- Higher ratings for women than men
- Lower ratings for Blacks and Hispanics than Whites
Van Iddekinge, C. H., Lanivich, S. E., Roth, P. L., & Junco, E. (in press). Social media for selection? Validity and adverse impact potential of a Facebook-based assessment. Journal of Management.
39 Is performance normally distributed?
- We’ve implicitly assumed this for years:
  - Data analysis strategies assume normality
  - Evaluations of selection system utility assume normality
- O’Boyle and Aguinis (2012) offer hundreds of data sets, all consistently showing that a “power law” distribution fits better
- This is a distribution with the largest number of observations at the very bottom, with the number of observations then dropping rapidly
41 The O’Boyle and Aguinis data
- They argue against looking at ratings data, as ratings may be “forced” to fit a normal distribution
- Thus they focus on objective data:
  - Tallies of publications in journals
  - Sports performance (e.g., golf tournaments won, points scored in the NBA)
  - Awards in arts and letters (e.g., number of Academy Award nominations)
  - Political elections (number of terms to which one has been elected)
42 An alternate view
“Job performance is defined as the total expected value of the discrete behavioral episodes an individual carries out over a standard period of time” (Motowidlo and Kell, 2013)
43 Aggregating individual behaviors affects the distribution
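One way to see why the definition of performance matters: if overall performance is the sum of many discrete behavioral episodes, the aggregate can be far less skewed than any single episode. This simulation is a hypothetical illustration, not an analysis from either paper. Episode values are drawn from an exponential distribution (an assumption standing in for a highly skewed episode-level distribution), and the skewness of single episodes is compared with that of 50-episode totals.

```python
import random
from statistics import fmean, pstdev


def skewness(xs: list[float]) -> float:
    """Population (moment-based) skewness."""
    m, s = fmean(xs), pstdev(xs)
    return fmean([((x - m) / s) ** 3 for x in xs])


def simulate(n_people: int = 5000, episodes: int = 50, seed: int = 1) -> tuple[float, float]:
    """Return (skewness of single-episode values, skewness of per-person episode totals)."""
    rng = random.Random(seed)
    single = [rng.expovariate(1.0) for _ in range(n_people)]
    totals = [sum(rng.expovariate(1.0) for _ in range(episodes)) for _ in range(n_people)]
    return skewness(single), skewness(totals)


single_skew, total_skew = simulate()
# Theoretical skewness: 2.0 for one exponential episode, 2 / sqrt(50) (about 0.28) for a 50-episode sum.
print(f"single episode skewness:   {single_skew:.2f}")
print(f"50-episode total skewness: {total_skew:.2f}")
```

The shrinking skewness follows from summing independent episodes; if episodes were strongly correlated within a person, or if outcomes were counts of rare wins (as in several O'Boyle and Aguinis data sets), the aggregate could stay heavily skewed.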
46 References
O’Boyle, E., Jr., & Aguinis, H. (2012). The best and the rest: Revisiting the norm of normality of individual performance. Personnel Psychology, 65(1), 79.
Beck, J., Beatty, A. S., & Sackett, P. R. (2014). On the distribution of performance: A reply to O’Boyle and Aguinis. Personnel Psychology, 67,