The response category labeling effect: How the wording of labels affects response distributions in Likert data
Bert Weijters, Maggie Geuens, Hans Baumgartner
Research questions
□ Do the labels attached to scale categories influence response behavior?
□ What mechanism(s) can account for this response category labeling effect?
□ Are there moderators of this effect?
□ What are the implications of the response category labeling effect for cross-cultural research?
The importance of category labels
Example item: “I try to avoid foods that are high in cholesterol.”
□ strongly disagree | disagree | neither agree nor disagree | agree | strongly agree
versus
□ completely disagree | disagree | neither agree nor disagree | agree | completely agree
The intensity hypothesis
□ Label intensity refers to the perceived degree of (dis)agreement implied by the label.
□ More intense labels represent more extreme positions, which are endorsed less often (e.g., agree vs. strongly agree; superior vs. very good).
□ Even subtler adverbial modifiers (e.g., strongly vs. completely agree) may influence response behavior.
□ There is prior evidence that different adverbs are associated with different intensities (e.g., Cliff 1959; Smith et al. 2009), but little evidence that different adverbs lead to differential category endorsement.
The fluency hypothesis
□ Research on processing fluency shows that the metacognitive experience of ease of processing affects judgment and decision making, for example:
– perceptions of the truth value of statements (e.g., Unkelbach 2007);
– liking for objects and events (e.g., Reber, Schwarz, and Winkielman 2004).
□ Repeated statements are more likely to be rated as true (Unkelbach 2007), and repetition increases liking, as suggested by the mere exposure effect (e.g., Bornstein 1989), in part because repetition makes stimuli more familiar and contributes to greater processing fluency.
□ Words vary in how often they are encountered, and high-frequency words are processed more fluently.
□ If scale labels are more commonly used in everyday language and are thus easier to process, the corresponding response options on the rating scale may be selected more often.
Two alternative hypotheses to explain the effect of response category labels
□ Intensity hypothesis (H1): Response categories are endorsed less frequently if their labels are more intense.
□ Fluency hypothesis (H2): Response categories are endorsed more frequently if their labels are more fluent.
Verbal ability as a moderator of the fluency effect
□ When people process more carefully or are highly experienced, their actual thoughts, not the ease of generating them, play a more decisive role.
□ Verbal ability (as a form of language expertise) may therefore moderate the fluency effect.
□ We posit that for respondents who tend to use words precisely and make fine-grained distinctions about the exact meaning and implications of words, fluency will be a less important cue in selecting a response.
Study 1: Scaling intensity and fluency
□ Do different methods for scaling the intensity and fluency of response category labels lead to similar results? If the intensity or fluency of scale labels is to have a reliable effect on questionnaire responses, consistent differences in the perceived intensity and fluency of category labels should emerge across respondents.
□ Can we identify endpoint labels that differ significantly in intensity and fluency for use in subsequent studies? We need two labels for which the intensity and fluency hypotheses predict contradictory responses.
Study 1 (cont’d)
Label intensity
– Direct ratings of intensity (0 = neutral; 10 = 100% agreement)
– Pairwise comparisons of intensity (“Which expression indicates the stronger sense of agreement?”)
Label fluency
– Direct ratings of fluency (0 = we never use this term in day-to-day speech; 10 = we use this term very often in day-to-day speech)
– Pairwise comparisons of fluency (“Which expression is more commonly used in day-to-day speech?”)
– Lexical decision task (respondents press a button labeled ‘end category label’ or ‘not an end category label’ for six endpoint labels and five non-endpoint labels)
– Word frequency counts in text corpora (Google hits, available for specific word combinations in particular countries and languages)
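A minimal sketch of how pairwise-comparison judgments could be turned into label scores, assuming simple win counting: each label's score is the average number of comparisons it wins per respondent. The scoring rule, the function name `pairwise_win_scores`, and the example judgments are hypothetical illustrations; the study's own scaling procedure may differ (e.g., Thurstone or Bradley-Terry scaling).

```python
from collections import defaultdict

def pairwise_win_scores(judgments):
    """Average number of pairwise comparisons won by each label per respondent.

    judgments maps a respondent id to a list of (winner, loser) tuples,
    one tuple per label pair the respondent judged (e.g., "Which expression
    indicates the stronger sense of agreement?").
    Win counting is an illustrative assumption, not necessarily the study's method.
    """
    wins = defaultdict(int)
    labels = set()
    for pairs in judgments.values():
        for winner, loser in pairs:
            wins[winner] += 1
            labels.update((winner, loser))
    n_respondents = len(judgments)
    # Labels that never win get a score of 0 via the defaultdict
    return {label: wins[label] / n_respondents for label in sorted(labels)}

# Hypothetical judgments from two respondents comparing three labels
judgments = {
    "r1": [("completely agree", "strongly agree"),
           ("completely agree", "fully agree"),
           ("fully agree", "strongly agree")],
    "r2": [("completely agree", "strongly agree"),
           ("fully agree", "completely agree"),
           ("strongly agree", "fully agree")],
}
scores = pairwise_win_scores(judgments)
print(scores)
```

With three labels, each respondent makes three comparisons, so a label's score ranges from 0 to 2 and the scores sum to 3.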
Study 1: Method
□ Sample 1: 83 undergraduates; pairwise comparisons of the intensity and fluency of six endpoint labels
□ Sample 2: 112 respondents (mean age 32.03, 66% female) from an online panel; direct ratings of intensity and fluency on 11-point scales
□ Sample 3: 125 undergraduates (57% female); lexical decision task
Study 1: Results (cont’d)
□ For intensity, the correlation of the means obtained from the paired comparison and direct rating tasks is .92.
□ The correlations of the means derived from the four fluency methods range from .66 to .97, with an average of r = .84.
□ Thus, respondents’ judgments of the perceived intensity and fluency of different category labels are highly consistent.
□ The label ‘sterk eens’ (strongly agree) consistently emerged as one of the least intense and least fluent labels, while ‘volledig eens’ (completely agree) surfaced as one of the most intense and most fluent labels.
Study 2
□ Direct test of the intensity and fluency hypotheses: the endorsement rate for a high-intensity, high-fluency label should be relatively low if the intensity hypothesis is true and relatively high if the fluency hypothesis is true.
□ Preliminary test of whether the intensity/fluency of labels affects predictive validity.
Measuring response distributions
□ A major challenge is to measure differences in response distributions that are not item-specific and are independent of substantive content.
□ To do this, we need to observe patterns of responses across heterogeneous items (i.e., items that do not share common content but have the same response format):
– deliberately designed scales consisting of heterogeneous items (Greenleaf 1992);
– random samples of items from scale inventories (Weijters, Geuens & Schillewaert 2010).
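A minimal sketch of how response distributions could be summarized across heterogeneous items, assuming 5-point ratings coded 1 to 5: for each respondent, count the items rated in the extreme positive category and compute the proportion of extreme responses (1 or 5), then average the counts within each label condition. The function names, the coding, and the ratings are hypothetical illustrations.

```python
from statistics import mean

def response_style(ratings, scale_max=5):
    """Return (count of top-category responses, proportion of extreme responses).

    Assumes ratings are coded 1..scale_max; 1 and scale_max count as extreme.
    """
    top_count = sum(1 for r in ratings if r == scale_max)
    extreme = sum(1 for r in ratings if r in (1, scale_max))
    return top_count, extreme / len(ratings)

def mean_top_count_by_condition(data, scale_max=5):
    """Average top-category count per respondent within each label condition."""
    return {
        condition: mean(response_style(ratings, scale_max)[0] for ratings in respondents)
        for condition, respondents in data.items()
    }

# Hypothetical ratings on 10 heterogeneous items, two respondents per condition
data = {
    "strongly agree": [[5, 4, 3, 4, 2, 4, 4, 3, 1, 4],
                       [4, 5, 5, 3, 3, 4, 2, 4, 4, 4]],
    "completely agree": [[5, 5, 3, 4, 2, 5, 4, 4, 1, 5],
                         [5, 4, 5, 3, 3, 5, 2, 4, 5, 4]],
}
print(mean_top_count_by_condition(data))
```

Because the items share no content, a systematic difference in these counts between label conditions points to the labels rather than to any particular item.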
Study 2: Method
161 Dutch-speaking respondents (mean age 31.27, 67% female) from a university panel were randomly assigned to one of two versions of an online questionnaire:
□ Endpoint labels of ‘completely (dis)agree’
□ Endpoint labels of ‘strongly (dis)agree’
Four sections:
□ 6 attitudinal items, one of which was “I love to go out for dinner”;
□ 10 heterogeneous items from unrelated scales (e.g., “I am a sensitive person”, “Financial security is important to me”), rated on 5-point scales;
□ Direct ratings of the intensity and fluency of six endpoint labels (100-point scale for intensity, 11-point scale for fluency);
□ Behavioral measure: choice between five different vouchers worth 15 EUR (cinema, book, restaurant, theatre, gym).
Study 2: Results
The findings support the fluency hypothesis:
Label | Intensity | Fluency | Mean endorsement of the extreme positive category
Strongly agree | 75.47 | 3.20 | 2.47
Completely agree | 93.63 | 8.22 | 3.61
(Difference in endorsement: p < .001 based on a Poisson regression)
□ A logistic regression of the choice of the restaurant voucher on the label, attitude toward going out for dinner, and their interaction indicates a significant interaction: predictive validity is better for ‘strongly agree’ than for ‘completely agree’.
Study 3
□ Replication of the fluency effect with a sample drawn from the general population
□ Literacy as a potential moderator
Study 3: Method
369 Dutch-speaking panel members (mean age 45.8, 50% female) of an online market research agency in Belgium were randomly assigned to one of two versions of an online questionnaire:
□ Endpoint labels of ‘completely (dis)agree’
□ Endpoint labels of ‘strongly (dis)agree’
Questionnaire:
□ 16 heterogeneous items based on Greenleaf (1992), rated on 5-point scales;
□ Pairwise comparisons of four endpoint labels (strongly, completely, fully, and absolutely) in terms of intensity and fluency;
□ Literacy measure: “I do a lot of reading” and “I prefer activities that don’t require a lot of reading” (strongly associated with having a higher education).
Study 3: Results
The findings support the fluency hypothesis:
Label | Intensity | Fluency | Mean endorsement of the extreme categories
Strongly agree | .74 | .65 | 2.66
Completely agree | 2.02 | 2.18 | 3.05
□ The fluency effect occurs primarily for respondents with lower literacy (p < .05 based on a Poisson regression).
[Figure: The moderating effect of literacy on the fluency effect]
Study 4: Method
271 Dutch-speaking panel members (mean age 39.2, 51% female) of an online market research agency in Belgium were randomly assigned to one of two versions of an online questionnaire:
□ Endpoint labels of ‘completely (dis)agree’
□ Endpoint labels of ‘strongly (dis)agree’
Questionnaire:
□ 10 heterogeneous items, rated on 5-point scales;
□ Pairwise comparisons of six response category labels in terms of intensity and fluency;
□ Antonym test (4 items) as a measure of verbal ability; scores on the antonym test were strongly associated with having a higher education.
Study 4: Results
Manipulation checks:
Label | Intensity | Fluency
Strongly agree | 1.93 | 1.67
Completely agree | 3.44 | 4.26
□ The fluency effect occurs primarily for respondents low in verbal ability (significant interaction, with a significant simple main effect for respondents low in verbal ability).
[Figure: The moderating effect of verbal ability on the fluency effect]
Implications of the category labeling effect for cross-cultural research
□ Response category labels can affect findings in a single-language context (e.g., estimation of population parameters, meta-analytic comparisons), but they are particularly important in cross-cultural research, where labels have to be translated.
□ Two types of translation: literal and idiomatic.
□ Some authors have emphasized the need to choose scale anchors that are equal in intensity (e.g., Harzing 2006), and prior research has demonstrated that supposedly similar terms may differ in intensity across languages (e.g., definitely vs. bestimmt; see Smith et al. 2009).
□ Translated adverbial modifiers may also differ in fluency.
[Figure: Schematic representation of the translation process (based on Bassetti and Cook 2011)]
Study 5: Method
□ Approx. 200 English- or French-speaking respondents in each of five regions (nationality/language combinations) of North America and Europe;
□ Six endpoint labels in each language;
□ 16 heterogeneous items from Greenleaf (1992), rated on 5-point scales;
□ Pairwise comparisons of the six labels plus “agree” or “d’accord” in terms of intensity and fluency.
Study 5: Method (cont’d)
Sample size by language and country:
Language | France | USA | Canada | UK | Total
French | 227 | 0 | 203 | 0 | 430
English | 0 | 185 | 196 | 187 | 568
Total | 227 | 185 | 399 | 187 | 998
Endpoint label versions:
Version | English | French
1 | Strongly agree | Fortement d'accord
2 | Completely agree | Complètement d'accord
3 | Extremely agree | Extrêmement d'accord
4 | Definitely agree | Définitivement d'accord
5 | Fully agree | Entièrement d'accord
6 | Very much agree | Tout à fait d'accord
Study 5: Results
[Figure: Intensity and fluency ratings by region]
Note: The correlation between the fluency ratings and the natural logarithm of the number of Google hits was at least .88.
Study 5: Results (multilevel model estimates)
Dependent variable | Independent variable | B | SE | t | p
Individual extreme responding (level 1) | Fluency | -.015 | .029 | -.512 | .609
 | Intensity | .020 | .017 | 1.213 | .225
Group-level extreme responding (level 2) | Fluency | .164 | .065 | 2.532 | .011
 | Intensity | -.130 | .132 | -.979 | .327
 | Language = French | .054 | .087 | .621 | .535
 | Country = USA | .114 | .100 | 1.139 | .255
 | Country = France | .002 | .076 | .026 | .979
 | Country = UK | -.004 | .096 | -.040 | .968
 | Intercept term | .995 | .060 | 16.486 | .000
Study 6
□ Demonstration that fluency is a viable determinant of differences in extreme responding between regions in an international survey
□ Illustration of how to construct and use relative measures of fluency and extreme responding based on secondary data only
Study 6: Method
□ 13,520 respondents from 17 European regions;
□ 16 heterogeneous items based on Greenleaf (1992);
□ Fully labeled 7-point response scales;
□ Fluency: relative measure of fluency, computed as the natural logarithm of the ratio of the number of Google hits for the 7th category label (strongly agree) to the number of Google hits for the 6th category label (agree);
□ Endorsement: relative endorsement of the 7th vs. the 6th response category, expressed as a natural logarithm.
Sample descriptive statistics: pan-European study (Study 6)
Region, language | N | % female | M age | SD age
Belgium, Dutch | 644 | 51% | 41.0 | 11.1
Belgium, French | 371 | 51% | 40.5 | 11.7
UK, English | 908 | 56% | 41.8 | 11.3
Germany, German | 993 | 50% | 39.3 | 11.0
Hungary, Hungarian | 1003 | 51% | 38.3 | 11.8
Slovakia, Slovak | 1063 | 50% | 38.2 | 12.1
Poland, Polish | 802 | 37% | 32.2 | 11.0
Netherlands, Dutch | 1046 | 50% | 40.8 | 11.4
France, French | 1000 | 51% | 39.4 | 11.9
Spain, Spanish | 934 | 50% | 37.8 | 10.5
Romania, Romanian | 970 | 50% | 37.9 | 11.5
Turkey, Turkish | 914 | 43% | 32.5 | 9.4
Italy, Italian | 939 | 50% | 39.0 | 10.6
Switzerland, French | 303 | 51% | 42.5 | 9.7
Switzerland, German | 606 | 48% | 43.5 | 9.4
Switzerland, Italian | 50 | 56% | 32.9 | 8.7
Sweden, Swedish | 974 | 49% | 39.9 | 11.3
Total | 13,520 | 49% | 38.7 | 11.4
Study 6: Results
Note: Standardized regression slope of .67 (p < .01, R² = .45)
Discussion: Summary of findings
□ Response category labels that are more commonly used (i.e., that are more fluent) lead to higher endorsement of their associated response categories.
□ Respondents do not simply scale response categories along an intensity dimension and then map their latent response to the best-matching category; they are also influenced by the fluency of the labels.
□ The effect of fluency is more pronounced for respondents who are lower in literacy and verbal ability.
□ The problem may be particularly serious in cross-cultural research when different languages are used.
Implications for multilingual survey research
□ Translations usually imply a trade-off between the attempt to be literal and the attempt to be idiomatic.
□ Optimize equivalence: use response category labels that are equally fluent in different languages (rather than literal translations or words with equal intensity). For example, ‘Strongly agree’ is most commonly used in scales but may not have valid equivalents in some other languages; ‘Completely agree’ seems to be a viable alternative:
Label | Fluency | ERS%
Completely agree | 1.24 | 18.8%
Tout à fait d’accord | 1.22 | 19.2%