1 Measurement Issues in Health Disparities Research
Anita L. Stewart, Ph.D.
University of California, San Francisco
Health Disparities Research Methods, EPI 222, Spring
April 19, 2012

2 Overview of Class 4
• Background: culture-specific versus generic measures
• Conceptual and psychometric adequacy and equivalence
  – Adequacy in one group
  – Equivalence across groups
• Modifying measures

3 Measurement Implications of Research in Diverse Groups
• Most self-reported measures were developed and tested in mainstream, well-educated groups
• Little information is available on appropriateness, reliability, validity, and responsiveness in diverse groups
  – Although this is changing rapidly

4 “Disparity Populations”
• Health disparities research focuses on differences in health and its determinants between …
  – Minority vs. non-minority
  – Lower income vs. others
  – Lower education vs. others
  – Limited English proficiency vs. others
  – … and many others

5 Why Not Use Culture-Specific Measures?
• The measurement goal is to identify measures that can be used across all groups in one study, yet remain sensitive to diversity and have minimal bias
• Most health disparities studies compare mean scores across diverse groups

6 Universal vs. Group-Specific
• Concepts are unlikely to be defined in exactly the same way across diverse ethnic groups
• Generic/universal
  – Features of a concept that are appropriate and relevant across groups
• Group-specific
  – Idiosyncratic or culture-specific portions of a concept

7 Universal vs. Group-Specific (cont.)
• Goal in health disparities research with more than one group:
  – Identify the generic/universal portion of a concept that is applicable across all groups
• For within-group studies:
  – The culture-specific portion is also relevant

9 Measurement Adequacy vs. Measurement Equivalence
• Adequacy: within a “diverse” group
  – Concepts appropriate and relevant
  – Psychometric properties meet minimal criteria
    » Good variability
    » Evidence of reliability and validity
    » Sensitive to change over time
• Equivalence: between “diverse” groups
  – Conceptual and psychometric properties are comparable

10 Conceptual and Psychometric Adequacy and Equivalence
[Figure: 2 × 2 framework]
               Adequacy in 1 Group                              Equivalence Across Groups
Conceptual     Concept meaningful                               Concept equivalent
Psychometric   Psychometric properties meet minimal standards   Psychometric properties invariant (equivalent)

11 Left Side of Figure: Adequacy in a Single Group
[Figure: the 2 × 2 framework from slide 10, repeated]

12 Right Side of Figure: Equivalence in More Than One Group
[Figure: the 2 × 2 framework from slide 10, repeated]

13 Conceptual Adequacy in One Group
[Figure: the 2 × 2 framework from slide 10, repeated]

14 Approaches to Explore Conceptual Adequacy in a Diverse Group
• Literature reviews of concepts and measures
• In-depth interviews and focus groups
  – Discuss concepts, obtain their views
• Expert review (from diverse group)
  – Review concept definitions
  – Rate relevance of items

15 Example: Review Measures of Dietary Intake in Minority Populations
• Reviewed food frequency questionnaires for use in minority populations
• Group differences that could affect scores:
  – Portion sizes differ
  – Traditional foods differ (ethnic foods)
• Could underestimate total intake and nutrients
RJ Coates et al. Am J Clin Nutr, 1997;65(suppl):1108S-15S.

16 A Structured Quantitative Method to Examine Conceptual Relevance
• Compiled set of 33 typical HRQL items
• Administered to older African Americans
• After each question, asked “how relevant is this question to the way you think about your health?”
  – 0-10 scale with 0 = not at all relevant, 10 = extremely relevant
Cunningham WE et al., Qual Life Res, 1999;8:749-768.

17 HRQL Relevance Results
• Most relevant items:
  – Spirituality, weight-related health, hopefulness
• Least relevant items:
  – Physical functioning, role limitations due to emotional problems

18 A Qualitative Method to Establish Relevance
• Bilingual/bicultural expert panel reviewed Spanish Functional Assessment of Cancer Therapy – General (FACT-G) for conceptual relevance to Hispanics
  – One item had low relevance (“I worry about dying”)
  – One domain missing: spirituality
    » Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts
D Cella et al. Med Care, 1998;36:1407

19 Example of Inadequate Concept
• Patient satisfaction typically conceptualized in terms of, e.g.,
  – Access, technical care, communication, continuity, coordination, interpersonal style
• In minority and low-income groups, additional relevant domains:
  – Discrimination by health professionals
  – Sensitivity to language barriers
MN Fongwa et al., Ethnicity Dis, 2006;16(3):948-955.

20 Psychometric Adequacy in One Group
[Figure: the 2 × 2 framework from slide 10, repeated]

21 Psychometric Adequacy of a Measure in Any Group
• Minimal standards met:
  – Sufficient variability
  – Minimal missing data
  – Adequate reliability/reproducibility
  – Evidence of construct validity
  – Evidence of sensitivity to change

22 Example: Adequacy of Reliability of Spanish SF-36 in Argentinean Sample
SF-36 scale                     Coefficient alpha
Physical functioning            .85
Role limitations - physical     .84
Bodily pain                     .80
General health perceptions      .69
Vitality                        .82
Social functioning              .76
Role limitations - emotional    .75
Mental health                   .84
F Augustovski et al., J Clin Epid, 2008;61:1279-84.
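
To make the coefficient alpha column concrete, here is a minimal sketch of the standard Cronbach's alpha formula in plain Python. The function name `cronbach_alpha`, the toy responses, and the 0.70 cutoff are assumptions for illustration only; the slides do not say which minimal criterion was applied. The `reported` values are copied from the table above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items in the scale
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 respondents x 3 items that move together perfectly
toy = [[1, 2, 1], [2, 3, 2], [3, 4, 3], [4, 5, 4]]
toy_alpha = cronbach_alpha(toy)

# Alphas reported on the slide, compared with an ASSUMED 0.70 cutoff
reported = {
    "Physical functioning": 0.85,
    "Role limitations - physical": 0.84,
    "Bodily pain": 0.80,
    "General health perceptions": 0.69,
    "Vitality": 0.82,
    "Social functioning": 0.76,
    "Role limitations - emotional": 0.75,
    "Mental health": 0.84,
}
ASSUMED_CUTOFF = 0.70
below_cutoff = [scale for scale, a in reported.items() if a < ASSUMED_CUTOFF]
```

Under that assumed cutoff, only one scale falls marginally short; a different criterion would change the list.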

23 Conceptual Equivalence Across Groups
[Figure: the 2 × 2 framework from slide 10, repeated]

24 Conceptual Equivalence
• Is the concept relevant, familiar, and acceptable to all diverse groups being studied?
• Is the concept defined the same way in all groups?
  – All relevant “domains” included (none missing)
  – Interpreted similarly
• The concept needs to be similar across all groups in the sample

25 Quality of Care Survey for Spanish- and English-speaking Inpatients
• Wanted to include Spanish-speaking patients in hospital quality of care surveys
• Administered Hospital Quality of Care Survey (H-CAHPS®) to Spanish-speaking patients; asked 2 questions to detect experiences missed by survey
    » What they liked most about care
    » What aspects of care they would change
• Analyzed responses in relation to survey
MP Hurtado et al. Health Serv Res, 2005;40(6, Part II):2140-2161

26 Psychometric Equivalence
[Figure: the 2 × 2 framework from slide 10, repeated]

27 Psychometric or Measurement Equivalence
• When comparing groups (as in health disparities research):
  – Measures should have similar or equivalent measurement properties in all diverse groups of interest in your study

28 Psychometric Equivalence Across Groups
• Psychometric characteristics should be “equivalent” across all groups:
  – Factor structure
  – Variability
  – Reliability/reproducibility
    » Item-scale correlations
  – Construct validity
  – Sensitivity to change

29 Systematic Error - A Special Concern in Comparing Scores Across Groups
• Observed mean differences in a measure can be due to:
  – Culturally- or group-mediated differences in true score (true differences)
  -- OR --
  – Systematic differences between observed scores not attributable to true scores

30 Random versus Systematic Error
Observed score = true score + error
  – Random error: relevant to reliability
  – Systematic error (“bias”): relevant to validity

31 Systematic Error (Bias)
• Systematic measurement error may make group comparisons invalid
• Systematic differences in scores can be due to group differences in:
  – The meaning of concepts or items
  – The extent to which measures represent a concept
  – Cognitive processes of responding
  – Use of response scales
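
The decomposition "observed score = true score + random error + systematic error" can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_observed`, the score distribution, and the 3-point systematic shift are all hypothetical. Both groups are given identical true scores, so any gap in observed means comes from measurement error alone.

```python
import random
from statistics import mean


def simulate_observed(true_scores, systematic, random_sd, rng):
    """Observed score = true score + systematic error + random error."""
    return [t + systematic + rng.gauss(0.0, random_sd) for t in true_scores]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical true scores, shared by both groups (no true difference)
true_scores = [rng.gauss(50.0, 10.0) for _ in range(5000)]

# Group A: random error only. Group B: same random error plus a constant
# 3-point systematic shift (for example, different use of the response scale).
group_a = simulate_observed(true_scores, systematic=0.0, random_sd=1.0, rng=rng)
group_b = simulate_observed(true_scores, systematic=3.0, random_sd=1.0, rng=rng)

# Random error averages out across many respondents; systematic error does not.
observed_gap = mean(group_b) - mean(group_a)
print(f"Observed mean gap with no true difference: {observed_gap:.2f}")
```

The observed gap lands close to the injected shift, which is how a mean comparison can suggest a disparity that is not present in the true scores.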

32 “Bias” or “Systematic Difference”?
• Bias = “deviation from true score”
• Cannot speak of a “bias” in one group compared to another without knowing the true score
• Preferred term: differential “item” functioning (DIF)
  – Item (or measure) that has a different meaning in one group than another

33 Methods for Identifying Differential Item Functioning (DIF)
• Item Response Theory (IRT)
• Examines each item in relation to underlying latent trait
• Tests if responses to an item predict the underlying latent “score” similarly in two groups
  – If not, items have DIF
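
The slide names IRT as the method. Fitting an IRT model is beyond a short example, so the sketch below swaps in a different, simpler screening technique: the Mantel-Haenszel common odds ratio, which compares item endorsement between groups after matching respondents on a proxy for the latent trait (for instance, strata of the total score). The function name and all counts are hypothetical. A value near 1 suggests no DIF on the item.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference group endorsing the item, b = reference group not endorsing,
      c = focal group endorsing the item,     d = focal group not endorsing.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return numerator / denominator


# Hypothetical counts in two score strata (low scorers, high scorers)
no_dif = [(10, 10, 10, 10), (15, 5, 15, 5)]    # same endorsement at each level
with_dif = [(15, 5, 5, 15), (18, 2, 10, 10)]   # reference group endorses more

or_no_dif = mantel_haenszel_or(no_dif)
or_with_dif = mantel_haenszel_or(with_dif)
```

In practice the odds ratio would be accompanied by a significance test and an effect-size classification; those are omitted here.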

34 Example of Effect of DIF
• 5 CES-D items administered to Black and White men
  – 1 item subject to differential item functioning (DIF)
• 5-item scale including that item suggested that Black men had more somatic symptoms than White men (p < .01)
• 4-item scale excluding the biased item showed no differences
S Gregorich, Med Care, 2006;44:S78-S94.

35 Equivalence of Reliability?? No!
• Difficult to compare reliability because it depends on the distribution of the construct in a sample
  – Lower reliability in one group may simply reflect less variability
• More important is the adequacy of the reliability in both groups
  – Reliability meets minimal criteria within each group
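
Why reliability depends on the sample can be seen from the classical test theory definition, reliability = true-score variance / (true-score variance + error variance). The short sketch below holds error variance fixed and changes only the spread of true scores; the function name and the variance values are hypothetical.

```python
def reliability(true_var, error_var):
    """Classical test theory reliability: share of observed variance that is true-score variance."""
    if true_var < 0 or error_var < 0 or true_var + error_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return true_var / (true_var + error_var)


# Same measurement error in both groups; only the spread of true scores differs
heterogeneous_group = reliability(true_var=4.0, error_var=1.0)
homogeneous_group = reliability(true_var=1.0, error_var=1.0)
```

The measure is equally precise in both hypothetical groups, yet the more homogeneous group shows the lower coefficient, which is why the slide recommends checking adequacy within each group rather than equality across groups.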

36 Equivalence of Criterion Validity
• Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g.,
  – A measure predicts utilization in both groups
  – A cutpoint on a screening measure has the same specificity and sensitivity in identifying a condition in both groups
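
The cutpoint check can be sketched as follows: compute sensitivity and specificity at the same cutpoint separately in each group and compare. The function name, the cutpoint of 16, and the tiny data sets are hypothetical and far too small for a real evaluation.

```python
def sensitivity_specificity(cases, cutpoint):
    """cases: list of (score, has_condition). A score >= cutpoint screens positive."""
    tp = sum(1 for score, has in cases if has and score >= cutpoint)
    fn = sum(1 for score, has in cases if has and score < cutpoint)
    tn = sum(1 for score, has in cases if not has and score < cutpoint)
    fp = sum(1 for score, has in cases if not has and score >= cutpoint)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the condition")
    return tp / (tp + fn), tn / (tn + fp)


CUTPOINT = 16  # hypothetical screening cutpoint

# Hypothetical (score, has_condition) pairs for two groups
group_a = [(20, True), (18, True), (10, False), (5, False)]
group_b = [(20, True), (12, True), (17, False), (5, False)]

sens_a, spec_a = sensitivity_specificity(group_a, CUTPOINT)
sens_b, spec_b = sensitivity_specificity(group_b, CUTPOINT)
```

In this toy example the same cutpoint classifies group A perfectly but misses half the cases and half the non-cases in group B, the kind of pattern that would argue against equivalent criterion validity.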

37 Equivalence of Construct Validity
• Are hypothesized patterns of associations confirmed in both groups?
  – Example: Scores on the Spanish version of the FACT-G had similar relationships with other health measures as scores on the English version
• Two ways of testing
  – Subjectively examining pattern of correlations
  – Confirmatory factor analysis (CFA)

38 Equivalence of Construct Validity of Spanish SF-36 in Argentinean Sample
• Compared to English SF-36 results in the U.S.
• Tested (and confirmed) several hypotheses:
  – PCS (Physical Component Score) decreases with age and number of diseases
  – Relationship of PCS and MCS (Mental Component Score) with utilization
  – Known-groups validity (scores lower for those with various diseases)
F Augustovski et al., J Clin Epid, 2008;61:1279-84.

39 Equivalence of Factor Structure
• Factor structure in new group similar to original study
• Factor structure similar across diverse groups in one study
  – Subjective: visually compare factor loadings across group-specific exploratory factor analyses
  – Empirical: confirmatory factor analysis comparing groups
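
For the subjective route, a descriptive index can supplement eyeballing two columns of loadings. The sketch below uses Tucker's congruence coefficient, a technique not mentioned in the slides and not a substitute for the confirmatory factor analysis they recommend. The function name and loadings are hypothetical.

```python
from math import sqrt


def tucker_congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two groups' loadings on one factor."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("both groups need loadings for the same items")
    cross = sum(x * y for x, y in zip(loadings_a, loadings_b))
    norm = sqrt(sum(x * x for x in loadings_a) * sum(y * y for y in loadings_b))
    if norm == 0:
        raise ValueError("loadings are all zero in at least one group")
    return cross / norm


# Hypothetical loadings of three items on one factor, from separate EFAs
group_1 = [0.7, 0.6, 0.8]
group_2_similar = [0.7, 0.6, 0.8]
group_2_different = [0.2, 0.6, 0.8]  # first item loads weakly in this group

phi_similar = tucker_congruence(group_1, group_2_similar)
phi_different = tucker_congruence(group_1, group_2_different)
```

One limitation worth keeping in mind: the coefficient is unchanged when one group's loadings are all multiplied by a constant, so it describes similarity of pattern, not equality of loadings.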

40 Factor Structure of CES-D (Center for Epidemiological Studies Depression Scale)
• Original study found 4 factors (Radloff, 1977)
  – Somatic symptoms
  – Depressive affect
  – Interpersonal behavior
  – Positive affect
• In a new population group: do you find 4 factors?
LS Radloff, Applied Psychol Measurement, 1977;1:385-401.

41 Psychometric Invariance: Equivalence of Factor Structure
• Important properties of theoretically-based factor structure (measurement model) do not vary across groups (are invariant)
  – Measurement model is the same across groups
• Empirical comparison across groups using confirmatory factor analysis
  – Not simply by examination

42 Hierarchical Tests of Psychometric Equivalence Across 2 or More Groups (Sequential)
• Same number of factors or dimensions
• Same items on same factors
• Same factor loadings
• No bias on any item across groups
• Same residuals on items
• No item or scale bias AND same residuals

43 Criteria for Evaluating Invariance Across Groups: Technical Terms
• Dimensional invariance: same number of factors
• Configural invariance: same items load on same factors
• Metric or factor pattern invariance: items have same loadings on same factors
• Scalar or strong factorial invariance: observed scores are unbiased
• Residual invariance: observed item and factor variances are unbiased
• Strict factorial invariance: both scalar and residual criteria are met
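
The sequential logic of the hierarchy (each level is tested only if the previous one holds) can be sketched as a simple lookup. This is bookkeeping only: the pass/fail inputs would come from confirmatory factor analysis model comparisons, which are not shown. The function name and the example results are hypothetical.

```python
# Levels in the order they are tested; strict invariance is derived, not tested
TESTED_LEVELS = ["dimensional", "configural", "metric", "scalar", "residual"]


def invariance_achieved(results):
    """Return the levels supported, stopping at the first level that fails.

    results maps a level name to True/False; a missing level counts as not met.
    """
    achieved = []
    for level in TESTED_LEVELS:
        if not results.get(level, False):
            break  # later levels assume this one, so stop here
        achieved.append(level)
    # Strict factorial invariance requires both scalar and residual invariance
    if "scalar" in achieved and "residual" in achieved:
        achieved.append("strict")
    return achieved


# Hypothetical outcome: loadings are equal, but item-level bias is detected
example = {"dimensional": True, "configural": True, "metric": True,
           "scalar": False, "residual": True}
supported = invariance_achieved(example)
```

Note that in the hypothetical example residual invariance is not credited even though it was marked as met, because the scalar step before it failed.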

44 Test for Dimensional Invariance of CES-D: Same Number of Factors
• Two studies of Latinos:
  – 2 factors in both studies
    » Depression and well-being
• American Indian adolescents
  – 3 factors
    » Depressed affect
    » Somatic symptoms and reduced activity
    » Positive affect
TQ Miller et al., J Gerontol: Soc Sci 1997;520:S259
SM Manson et al., Psychol Assessment 1990;2:231-237

45 Configural Invariance
• Dimensional invariance: same number of factors
• Configural invariance: same items load on same factors
• Metric or factor pattern invariance: items have same loadings on same factors
• Strong factorial or scalar invariance: observed scores are unbiased
• Residual invariance: observed item and factor variances can be compared across groups
• Strict factorial invariance: both scalar invariance and residual invariance criteria are met

46 Configural Invariance
• Assumes: dimensional invariance is found (same number of factors)
• Definition: item-factor patterns are the same; the same items load on the same factors in both groups
• CES-D example
  – 4 factors found in Anglos, Blacks, and Chicanos
  – Same items loaded on each factor in all groups
RE Roberts et al., Psychiatry Research, 1980;2:125-134

47 Metric Invariance
[Invariance criteria list from slide 45, repeated]

48 Metric Invariance or Factor Pattern Invariance
• Assumes: dimensional and configural invariance are found
• Definition: item loadings are the same across groups
  – i.e., the correlation of each item with its factor is the same in all groups

49 Metric Invariance Example from Interpersonal Processes of Care
• Out of 91 items, the factor structure of 29 items met criteria of invariance across 4 groups
  – Spanish-speaking Latinos, English-speaking Latinos, African Americans, Whites
• Dimensional
  – Similar factor structure across all 4 groups
• Configural
  – Same items loaded on each factor in all 4 groups
• Metric
  – Same item loadings in all 4 groups
Stewart et al., Health Services Research, 2007;42(3, Part I):1235-56.

50 Seven “Metric Invariant” Scales: Same Item Loadings Across Groups
I. COMMUNICATION
  • Hurried communication
  • Elicited concerns, responded
  • Explained results, medications
II. DECISION MAKING
  • Patient-centered decision-making
III. INTERPERSONAL STYLE
  • Compassionate, respectful
  • Discriminated
  • Disrespectful office staff

51 Strong Factorial Invariance
[Invariance criteria list from slide 45, repeated]

52 Strong Factorial Invariance or Scalar Invariance
• Assumes: dimensional, configural, and metric invariance are found
• Definition: observed scores are unbiased, i.e., means can be compared across groups
• Requires test of equivalence of mean scores across groups using confirmatory factor analysis

53 Seven “Scalar Invariant” (Unbiased) IPC Scales (18 items)
I. COMMUNICATION
  • Hurried communication – lack of clarity
  • Elicited concerns, responded
  • Explained results, medications – explained results
II. DECISION MAKING
  • Patient-centered decision-making – decided together
III. INTERPERSONAL STYLE
  • Compassionate, respectful – (subset) compassionate, respectful
  • Discriminated – discriminated due to race/ethnicity
  • Disrespectful office staff

55 What if Measures Need Modifying or Adapting?
• Why would we modify a measure?
• What information is used to modify?
• What are the types of modifications?
• How should we test modified measures?

56 When Problems are Found Through Pretesting…
Investigators face a choice:
• Use the existing measure “as is” to preserve integrity of measure
OR
• Try to modify the measure to address problems in diverse group

57 Argument in Favor of Using Measure “As Is”
• Known psychometric properties
• Allows comparison of findings to other research using the measure

58 Argument Against Using Measure “As Is” (if problems are found)
• Reliability and validity may be poor
• Results pertaining to the measure could be erroneous
  – Limited internal validity
  – May not observe true associations

59 Why Would You Consider Modifying an Existing Measure?
• In health disparities research
  – Sample/population differs from that in which the original measure was developed
    » Anticipate problems in new population
• More generally
  – Measure developed a while ago
  – Poor format/presentation
  – Study context issues

60 Key Reason: Population Group Differences from Original
• Research in a “disparity population”
  – Different culture, race/ethnic group
  – Lower level of socioeconomic status (SES)
  – Limited English proficiency, lower literacy
• Other reasons
  – Different disease, health problem, patient group, age group

61 Why Might a Measure Not be Suitable for New Population Group?
• Concept or dimension is missing
• Meaning of concepts differs from mainstream
• New group may not interpret items as intended
• Process of answering questions may differ

62 What Information is Used to Decide How to Modify a Measure?
• Same data identifying conceptual differences in diverse population…
  – Often includes information for making revisions

63 Published Review - Physical Activity (PA) Measures for Minority Women
• Convened experts to identify issues in measuring PA in minority and older women
• Some conclusions:
  – Assess culturally sensitive activities (e.g., walking for transportation and errands)
  – Measure intermittent activities
  – Phrases “leisure time, free time, spare time” not understood
• Review can help select appropriate measures and adapt as needed
LC Masse et al., J Women’s Health, 1998;7:57-67.

64 Types of Modifications
• Format or presentation
• Content
  – Dimensions
  – Item stems
  – Response options

65 Format/Presentation Modifications
• Goal: reduce respondent burden
• Improve appearance or way of responding
  – Simplify instructions
  – Modify format for responding
  – Create more space, reduce crowded items
  – Improve contrast, increase font size

66 Poor Format/Presentation = High Respondent Burden
• Instructions unnecessarily wordy, unclear
• Way of responding is complicated
• Difficult to navigate the questionnaire
  – Crowded on the page
  – Hard to track across the page
• Hard to read
  – Poor contrast, small font

67 Example: Complex Instructions
Instructions: There are 12 statements on this form. They are statements about families. You are to decide which of these statements are true of your family and which are false. If you think the statement is TRUE or MOSTLY TRUE of your family, please mark the box in the T (TRUE) column. If you think the statement is FALSE or MOSTLY FALSE of your family, please mark the box in the F (FALSE) column.
You may feel that some of the statements are true for some family members and false for others. Mark the box in the T column if the statement is TRUE for most members. Mark the box in the F column if the statement is FALSE for most members. If the members are evenly divided, decide what is the stronger overall impression and answer accordingly.
Remember, we would like to know what your family seems like to you. So do not try to figure out how other members see your family, but do give us your general impression of your family for each statement. Do not skip any item. Please begin with the first item.

68 Example: Burdensome Way of Responding
For each question, choose from the following alternatives:
0 = Never   1 = Almost Never   2 = Sometimes   3 = Fairly Often   4 = Very Often
1. In the last month, how often have you felt nervous and “stressed”? ........ 0 1 2 3 4
2. In the last month, how often have you felt that things were going your way? ........ 0 1 2 3 4
S Cohen et al. J Health Soc Beh, 1983;24:385-96.

69 Example: Can Modify to Matrix Format

70 Types of Modifications
• Format or presentation
• Content (add, drop, replace, or modify)
  – Dimensions
  – Item stems
  – Response options

71 Content Modification Example: Add Dimension
• Study of older Korean/Chinese immigrants
• Added language support to existing social support measure
• Based on focus group data:
  – Help with translation at medical appointments
  – Help to ask questions in English when on the phone
  – Help to learn English
S Wong et al. Int J Health Human Dev, 2005;61:105-121.

72 Content Modification Example: Add Dimension (cont.)
• New items were embedded in existing social support measure using same format

73 Content Modification Example: Too Few Response Choices
How much is each person (or group of persons) supportive for you at this time in your life?
– Your wife, husband, or significant other person:
  [ ] None   [ ] Some   [ ] A lot
G Parkerson et al. Fam Med, 1991;23:357-60.

74 Content Modification Example: Replace Response Choices
How much is each person (or group of persons) supportive for you at this time in your life?
– Your wife, husband, or significant other person:
  Original choices:     [ ] None   [ ] Some   [ ] A lot
  Replacement choices:  [ ] Not at all   [ ] A little   [ ] Moderately   [ ] Quite a bit   [ ] Extremely

75 Minor to Major Modifications?
• Each type of modification can hypothetically be rated on a continuum from having minor to major impact on reliability and validity of original measure
  – Minor: slight changes in format/presentation
  – ……
  – Major: numerous changes in dimensions, items, and response choices

76 Need to Test Psychometric Properties of Modified Measures
• All modifications, no matter how small, can affect reliability and validity of original measure
• Burden is on investigator to test modified measure

77 Recommendations for Testing Modified Measures
• Pretest modified measure extensively before fielding in new study
• Build in ability to do psychometric testing when measure is fielded
  – Add validity variables (e.g., similar to original measure to test comparability)
  – Add follow-up to assess test-retest reliability

78 Analyze Psychometric Adequacy of Modified Measure in New Study
• Modified measure should meet minimal criteria
  – Item-scale correlations
  – Internal-consistency reliability
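
One common way to compute item-scale correlations is the corrected version, in which each item is correlated with the sum of the remaining items so that it does not correlate with itself. The sketch below assumes that variant; the slides do not specify one. Function names and responses are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when a variable has zero variance")
    return sxy / sqrt(sxx * syy)


def corrected_item_scale_correlations(rows):
    """Correlate each item with the sum of the other items in the scale."""
    k = len(rows[0])
    correlations = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]  # scale score without item i
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical responses: 4 respondents x 3 items; the third item runs
# against the other two and would be a candidate for review
responses = [[1, 2, 3], [2, 3, 3], [3, 4, 1], [4, 5, 2]]
item_scale = corrected_item_scale_correlations(responses)
```

A low or negative corrected correlation flags an item that may not belong with the rest of the scale in the new population.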

79 Analyzing Modified Measure: Comparability to Original Measure
• Compare measurement results of modified measure to original measure
  – Reliability (sample dependent)
  – Factor structure
  – Construct validity
  – Sensitivity to change

80 Overall Conclusions
• Measurement in health disparities research is a relatively new field
• We encourage reporting on adequacy and equivalence of measures tested in any diverse population
• As evidence grows, it becomes easier to find measures that work better across diverse groups

81 Resource: Reviews Measures for Diverse Populations (2002)
Reviews measures that have been used cross-culturally of:
• Acculturation
• Socioeconomic status
• Social support
• Cognition
• Health
• Depression
• Religiosity

82 Resource: Special Journal Issue
• Measurement in a Multi-ethnic Society
  – Med Care, Vol 44, November 2006
Qualitative and quantitative methods in addressing measurement in diverse populations

83 Guidelines for Translating Measures
• Handout: annotated bibliography of articles in which optimal methods of translation are used
• Compiled by CADC Measurement and Methods Core

84 New Publication: Modifying Measures
• Stewart AL, Thrasher AD, Goldberg J, Shea J. A framework for understanding modifications to measures for diverse populations. J Aging Health, 2012, in press.

85 Homework for Class 4
• Complete rows 16-21 in matrix
  – Use form posted on the website
• Include your name in the filename
  – Smith_HW_epi222_class3
• Email by Monday April 23 to anita.stewart@ucsf.edu
