1 Class 7 Measurement Issues in Research with Diverse Populations Including Health Disparities Research November 5, 2009 Anita L. Stewart Institute for Health & Aging University of California, San Francisco

2 Overview of Class 7 u Background: culture-specific versus generic measures u Conceptual and psychometric adequacy and equivalence –Adequacy in one group –Equivalence across groups

3 Background u U.S. population becoming more diverse u Minority groups are being included in research due to: –NIH mandate (1993 – women and minorities) –Health disparities initiatives

4 Types of Diverse Groups u Health disparities research focuses on differences in health between … –Minority vs. non-minority –Lower income vs. others –Lower education vs. others –Limited English Proficiency (LEP) vs. others –…. and many others

5 Measurement Implications of Research in Diverse Groups u Most self-reported measures were developed and tested in mainstream, well-educated groups u Little information is available on appropriateness, reliability, validity, and responsiveness in diverse groups –Although this is changing rapidly

6 Measurement Adequacy vs. Measurement Equivalence u Adequacy - within a “diverse” group –concepts are appropriate and relevant –psychometric properties meet minimal criteria »Good variability »Reliable and valid »Sensitive to change over time u Equivalence - between “diverse” groups –conceptual and psychometric properties are comparable

7 Why Not Use Culture-Specific Measures? u Measurement goal is to identify measures that can be used across all groups in one study, yet maintain sensitivity to diversity and have minimal bias u Most health disparities studies compare mean scores across diverse groups

8 Generic/Universal vs Group-Specific (Etic versus Emic) u Concepts unlikely to be defined exactly the same way across diverse ethnic groups u Generic/universal (etic) –features of a concept that are appropriate across groups u Group-Specific (emic) –idiosyncratic or culture-specific portions of a concept

9 Etic versus Emic (cont.) u Goal in health disparities research with more than one group: –identify the generic/universal portion of a concept that is applicable across all groups u For within-group studies: –the culture-specific portion is also relevant

11 Conceptual and Psychometric Adequacy and Equivalence u Adequacy in 1 Group –Conceptual: Concept meaningful within one group –Psychometric: Psychometric properties meet minimal standards within one group u Equivalence Across Groups –Conceptual: Concept equivalent across groups –Psychometric: Psychometric properties invariant (equivalent) across groups

12 Left Side of Matrix: Adequacy in a Single Group u Adequacy in 1 Group –Conceptual: Concept meaningful within one group –Psychometric: Psychometric properties meet minimal standards within one group u Equivalence Across Groups –Conceptual: Concept equivalent across groups –Psychometric: Psychometric properties invariant (equivalent) across groups

13 Right Side of Matrix: Equivalence in More Than One Group u Adequacy in 1 Group –Conceptual: Concept meaningful within one group –Psychometric: Psychometric properties meet minimal standards within one group u Equivalence Across Groups –Conceptual: Concept equivalent across groups –Psychometric: Psychometric properties invariant (equivalent) across groups

15 Approaches to Explore Conceptual Adequacy in Diverse Groups u Literature reviews of concepts and measures u In-depth interviews and focus groups –discuss concepts, obtain their views u Expert consultation from diverse groups –review concept definitions –rate relevance of items

16 Basis: Published Review - Physical Activity Measures for Minority Women u WHI convened experts to identify issues in measuring PA in minority and older women u Some conclusions: –Add culturally sensitive activities (e.g., walking for transportation and errands) –Measure intermittent activities –Phrases “leisure time, free time, spare time” (used to denote non-occupational activities) not understood u Review can help select appropriate measures and adapt as needed LC Masse et al., J Women’s Health, 1998;7:57-67.

17 Basis: Published Review - Measures of Dietary Intake in Minority Populations u Reviewed food frequency questionnaires for appropriateness for minority populations –method of development, minority-group specific features, reliability, validity, and systematic bias u Group differences that could affect scores: –Portion sizes differ –Missing common foods of minority groups/cultures u Would underestimate total intake and nutrients RJ Coates et al. Am J Clin Nutr; 1997;65(suppl):1108S-15S.

18 A Structured Method for Examining Conceptual Relevance u Compiled set of 33 HRQL items u Assessed relevance to older African Americans u After each question, asked “how relevant is this question to the way you think about your health?” –Response scale: 0-10 scale with endpoints labeled –0=not at all relevant, 10=extremely relevant Cunningham WE et al., Qual Life Res, 1999;8:749-768.

19 HRQL Relevance Results u Most relevant items: –Spirituality, weight-related health, hopefulness –Spirituality items u Least relevant items: –Physical functioning, role limitations due to emotional problems

20 Qualitative Research: Expert Panel Reviewed Spanish FACT-G u Functional Assessment of Cancer Therapy – General (FACT-G) u Bilingual/bicultural panel reviewed items for conceptual relevance to Hispanics –One item had low relevance ("I worry about dying") »Added new item "I worry my condition will get worse" –One domain missing: spirituality »Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts D Cella et al. Med Care 1998;36:1407

21 Example of Inadequate Concept u Patient satisfaction typically conceptualized in mainstream populations in terms of, e.g., –access, technical care, communication, continuity, interpersonal style u In minority and low income groups, additional relevant domains include, e.g., –discrimination by health professionals –sensitivity to language barriers MN Fongwa et al., Ethnicity Dis, 2006;16(3):948-955.

22 Measuring Park/Recreation Environments in Low-Income Communities u New policy focus on how environments promote physical activity –Many good new measures u None considered concerns or environments of lower-income minority communities MF Floyd et al. Am J Prev Med, 2009;36:S156-S160.

23 Measuring Park/Recreation Environments in Low-Income Communities (cont) u Recommendations: In low-income communities of color: –Identify and address most salient environmental needs –Incorporate research on preferred recreational activities –Ensure representation of perceptions of residents MF Floyd et al. Am J Prev Med, 2009;36:S156-S160.

24 Psychometric Adequacy in any Group u Minimal standards: – Sufficient variability – Minimal missing data – Adequate reliability/reproducibility – Evidence of construct validity – Evidence of sensitivity to change
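
The first two minimal standards above (variability and missing data) can be screened with simple descriptive statistics before any modeling. The sketch below is illustrative only: the function name `adequacy_screen`, the 0-100 score range, and the sample scores are all assumptions, and the floor/ceiling percentages are a commonly used variability check rather than something the slides specify.

```python
from statistics import mean, pstdev

def adequacy_screen(scores, scale_min, scale_max):
    """Summarize missing data and variability for one group's scale scores.

    `scores` is a list of numbers, with None marking a missing response.
    """
    observed = [s for s in scores if s is not None]
    n_total = len(scores)
    return {
        "pct_missing": 100.0 * (n_total - len(observed)) / n_total,
        "mean": mean(observed),
        "sd": pstdev(observed),
        # Floor/ceiling: share of respondents at the lowest/highest possible score.
        "pct_floor": 100.0 * sum(s == scale_min for s in observed) / len(observed),
        "pct_ceiling": 100.0 * sum(s == scale_max for s in observed) / len(observed),
    }

# Hypothetical scores on a 0-100 scale for one group (illustrative only).
group_scores = [100, 100, 85, None, 100, 70, 100, 95, None, 100]
summary = adequacy_screen(group_scores, scale_min=0, scale_max=100)
print(summary)
```

Running the same screen separately in each group makes it easy to see whether, for example, one group piles up at the ceiling while another does not.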

26 Conceptual Equivalence Across Groups u Adequacy in 1 Group –Conceptual: Concept meaningful within one group –Psychometric: Psychometric properties meet minimal standards within one group u Equivalence Across Groups –Conceptual: Concept equivalent across groups –Psychometric: Psychometric properties invariant (equivalent) across groups

27 Conceptual Equivalence u Is the concept relevant, familiar, acceptable to all diverse groups being studied? u Is the concept defined the same way in all groups? –all relevant “domains” included (none missing) –interpreted similarly

28 Obtain Perspective of All Diverse Groups on Concept u Steps in developing a measure: develop concept, create item pool, pretest/revise, field survey, psychometric analyses, final measures u Obtain perspectives of diverse groups

29 Example: Developing Concept of Interpersonal Processes of Care u IPC II conceptual framework –IPC Version I framework in Milbank Quarterly –19 new focus groups: African American, Latino, and White adults –Literature review of quality of care in diverse groups

30 IPC-II Conceptual Framework: 91 items u I. COMMUNICATION –General clarity –Elicitation/responsiveness –Explanations of processes, condition, self-care, meds –Empowerment u II. DECISION MAKING –Responsive to patient preferences –Consider ability to comply u III. INTERPERSONAL STYLE –Respectfulness –Courteousness –Perceived discrimination –Emotional support –Cultural sensitivity

31 IPC-II Conceptual Framework (cont) u IV. OFFICE STAFF –Respectfulness –Discrimination u V. FOR LIMITED ENGLISH PROFICIENCY PATIENTS –MD’s and office staff’s sensitivity to language

32 Psychometric Equivalence u Adequacy in 1 Group –Conceptual: Concept meaningful within one group –Psychometric: Psychometric properties meet minimal standards within one group u Equivalence Across Groups –Conceptual: Concept equivalent across groups –Psychometric: Psychometric properties invariant (equivalent) across groups

33 Psychometric or Measurement Equivalence u When comparing groups (as in health disparities research): –Measures should have similar or equivalent measurement properties in all diverse groups of interest in your study »e.g., English and Spanish, African Americans and Caucasians

34 Psychometric Equivalence Across Groups u Psychometric characteristics should be “equivalent” across all groups: – Sufficient variability – Minimal missing data – Reliability/reproducibility – Construct validity – Sensitivity to change

35 Bias (Systematic Error) - A Special Concern u Observed group mean differences in a measure can be due to: –Culturally- or group-mediated differences in true score (true differences) -- OR -- –Bias - systematic differences between observed scores not attributable to true scores

36 Random versus Systematic Error u Observed item score = true score + error u Error can be random or systematic –Random error: relevant to reliability –Systematic error: relevant to validity

37 Bias (Systematic Error) - A Special Concern (cont) u Measurement bias may make group comparisons invalid u Bias can be due to group differences in: –the meaning of concepts or items –the extent to which measures represent a concept –cognitive processes of responding –use of response scales –appropriateness of data collection methods

38 Bias or “Systematic Difference”? u Bias refers to “deviation from true score” u Cannot speak of a measure being “biased” in one group compared to another without knowing the true score u Preferred term: differential “item” functioning (DIF) –Item (or measure) that has a different meaning in one group than another

39 Item Equivalence u Differential Item Functioning (DIF) –Items are non-equivalent if they are differentially related to the underlying trait u Meaning of response categories is similar across groups u Distance between response categories is similar across groups

40 Methods for Identifying Differential Item Functioning (DIF) u Item Response Theory (IRT) u Examines each item in relation to underlying latent trait u Tests if responses to one item predict the underlying latent “score” similarly in two groups –if not, items have “differential item functioning”
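
The slide above names Item Response Theory as the method. Fitting an IRT model is beyond a short example, so the sketch below swaps in a simpler, non-IRT screening technique: the Mantel-Haenszel common odds ratio. It uses an observed matching score as a stand-in for the latent trait and asks the same question: among respondents at the same level, do the two groups endorse the item at the same rate? The function name, the group labels "ref" and "focal", and all counts are hypothetical.

```python
from collections import defaultdict

def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for endorsing the studied item, reference vs. focal group,
    among respondents matched on a total (or rest) score.

    `records` is an iterable of (group, matching_score, endorsed) tuples, where
    group is "ref" or "focal" and endorsed is 0 or 1.
    """
    # One 2x2 table per matching-score stratum: [ref_yes, ref_no, focal_yes, focal_no]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, endorsed in records:
        offset = 0 if group == "ref" else 2
        strata[score][offset + (0 if endorsed else 1)] += 1

    numerator = denominator = 0.0
    for ref_yes, ref_no, focal_yes, focal_no in strata.values():
        n = ref_yes + ref_no + focal_yes + focal_no
        numerator += ref_yes * focal_no / n
        denominator += ref_no * focal_yes / n
    return numerator / denominator

def expand(group, score, n_yes, n_no):
    """Turn cell counts into individual (group, score, endorsed) records."""
    return [(group, score, 1)] * n_yes + [(group, score, 0)] * n_no

# Hypothetical counts at two matching-score levels (illustrative only).
records = (
    expand("ref", 1, 10, 30) + expand("focal", 1, 20, 20)
    + expand("ref", 2, 20, 20) + expand("focal", 2, 30, 10)
)
odds_ratio = mantel_haenszel_odds_ratio(records)
print(round(odds_ratio, 3))
```

An odds ratio near 1 is consistent with no DIF on this item. A value well away from 1, as in these made-up counts, suggests the item behaves differently in the two groups at the same matched level and deserves closer review.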

41 Example of Effect of Biased Items u 5 CES-D items administered to Black and White men –1 item subject to differential item functioning (bias) u 5-item scale including item suggested that Black men had more somatic symptoms than White men (p <.01) u 4-item scale excluding biased item showed no differences S Gregorich, Med Care, 2006;44:S78-S94.
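
A toy calculation shows how a single item with DIF can create an apparent group difference. This is not the data from the cited study: the responses are invented, both groups are given identical answers on four items, and group B is shifted up by one point on the fifth item to mimic a biased item.

```python
from statistics import mean

def scale_score(item_responses, exclude=()):
    """Mean of item responses, skipping any item indexes listed in `exclude`."""
    kept = [r for i, r in enumerate(item_responses) if i not in exclude]
    return mean(kept)

def group_mean(group, exclude=()):
    """Mean scale score across all respondents in a group."""
    return mean(scale_score(person, exclude) for person in group)

# Hypothetical 5-item responses (0-3 scale), one list per respondent.
# Items 0-3 are identical across groups; item 4 is shifted up by 1 in group B.
group_a = [[1, 0, 2, 1, 1], [2, 1, 1, 2, 0], [0, 1, 1, 0, 1]]
group_b = [[1, 0, 2, 1, 2], [2, 1, 1, 2, 1], [0, 1, 1, 0, 2]]

diff_5_items = group_mean(group_b) - group_mean(group_a)
diff_4_items = group_mean(group_b, exclude={4}) - group_mean(group_a, exclude={4})
print(diff_5_items, diff_4_items)
```

With the shifted item included, group B appears to score higher. With it excluded, the difference disappears, which is the pattern the slide describes.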

42 Equivalence of Response Choices: Spanish and English Self-rated Health u English: Excellent, Very good, Good, Fair, Poor u Spanish: Excelente, Muy buena, Buena, Regular, Mala u “Regular” in Spanish may be closer to “good” in English, and thus is not comparable in meaning to “fair”

43 Equivalence of Response Choices: Spanish and English Self-rated Health u English: Excellent, Very good, Good, Fair, Poor u Spanish: Excelente, Muy buena, Buena, Regular (pasable?), Mala u “Regular” in Spanish may be closer to “good” in English, and thus is not comparable in meaning to “fair”

44 Equivalence of Reliability?? No! u Difficult to compare reliability because it depends on the distribution of the construct in a sample –Thus lower reliability in one group may simply reflect poorer variability u More important is the adequacy of the reliability in both groups –Reliability meets minimal criteria within each group

45 Example: Adequacy of Reliability of Spanish SF-36 in Argentinean Sample
SF-36 scale                    Coefficient alpha
Physical functioning           .85
Role limitations - physical    .84
Bodily pain                    .80
General health perceptions     .69
Vitality                       .82
Social functioning             .76
Role limitations - emotional   .75
Mental health                  .84
F Augustovski et al., J Clin Epid, 2008, in press.
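
Coefficient alpha, as reported in the table above, can be computed directly from item-level responses. The sketch below is a minimal version that assumes complete data (no missing items). The 0.70 cutoff is an assumed conventional minimum for group-level comparisons, not a value stated in the slides, and the response data are invented.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Coefficient alpha for complete item-response rows (one list per respondent)."""
    k = len(rows[0])  # number of items
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

ALPHA_MINIMUM = 0.70  # assumed criterion; substitute the standard your study uses

# Hypothetical 3-item responses for two groups (illustrative only).
groups = {
    "group_1": [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 5], [5, 5, 4]],
    "group_2": [[1, 1, 2], [2, 2, 2], [3, 3, 3], [4, 4, 5], [5, 5, 5]],
}
alphas = {name: cronbach_alpha(rows) for name, rows in groups.items()}
adequate = {name: alpha >= ALPHA_MINIMUM for name, alpha in alphas.items()}
print(alphas, adequate)
```

Checking each group against the same minimum, rather than testing whether the two alphas are equal, follows the point made on the preceding slide: adequacy within each group matters more than equality across groups.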

46 Equivalence of Criterion Validity u Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g. –a measure predicts utilization in both groups –a cutpoint on a screening measure has the same specificity and sensitivity in identifying a condition in both groups
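
The cutpoint check described above can be sketched by computing sensitivity and specificity separately in each group and comparing them. The scores, diagnoses, and the cutpoint of 16 below are all hypothetical, and the function name is an illustration rather than part of any library.

```python
def sensitivity_specificity(scores, has_condition, cutpoint):
    """Sensitivity and specificity when `score >= cutpoint` counts as a positive screen."""
    pairs = list(zip(scores, has_condition))
    true_pos = sum(s >= cutpoint and c for s, c in pairs)
    false_neg = sum(s < cutpoint and c for s, c in pairs)
    true_neg = sum(s < cutpoint and not c for s, c in pairs)
    false_pos = sum(s >= cutpoint and not c for s, c in pairs)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)

# Hypothetical screening scores and criterion diagnoses (illustrative only).
condition = [True, True, False, True, False, True, False, False]
scores_group_a = [18, 20, 9, 16, 5, 22, 12, 7]
scores_group_b = [18, 13, 9, 14, 5, 22, 17, 7]

CUTPOINT = 16  # assumed cutpoint for this example
result_a = sensitivity_specificity(scores_group_a, condition, CUTPOINT)
result_b = sensitivity_specificity(scores_group_b, condition, CUTPOINT)
print(result_a, result_b)
```

In these invented data the same cutpoint identifies every case in group A but misses half the cases in group B, which would argue against using one cutpoint for both groups.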

47 Equivalence of Construct Validity u Are hypothesized patterns of associations confirmed in both groups? –Example: Scores on the Spanish version of the FACT-G had similar relationships with other health measures as scores on the English version u Primarily tested through subjectively examining pattern of correlations u Can also test using confirmatory factor analysis (CFA)

48 Equivalence of Construct Validity of Spanish SF-36 in Argentinean Sample u Compared Spanish SF-36 construct validity test results to U.S. English SF-36 results u Tested several previously tested hypotheses (which were confirmed): –PCS decreases with age and # of diseases –Relationship of PCS and MCS with utilization –Known groups validity (scores lower for those with various diseases)

49 Equivalence of Factor Structure u Factor structure is similar in new group to structure in original groups in which measure was tested –measurement model is the same across groups u Methods –Specify the number of factors you are looking for –Determine if the hypothesized model fits the data

50 How Evidence for Equivalence of Factor Structure is Obtained u Subjectively –visually compare factor pattern matrixes across “group-specific” exploratory factor analysis solutions u Empirically –confirmatory factor analysis of data that includes multiple groups –studies of psychometric invariance
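
Between purely visual comparison and full multi-group CFA, one simple numeric aid is Tucker's congruence coefficient, which summarizes how similar two loading patterns are for the same factor. It is not mentioned in the slides and does not replace the invariance tests; the loadings below are invented for illustration.

```python
from math import sqrt

def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two factors' loading vectors."""
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return cross / norm

# Hypothetical loadings of the same four items on one factor in two groups.
loadings_group_1 = [0.70, 0.65, 0.80, 0.60]
loadings_group_2 = [0.68, 0.70, 0.75, 0.20]

phi = congruence(loadings_group_1, loadings_group_2)
print(round(phi, 3))
```

Values close to 1 indicate similar loading patterns. Here the fourth item loads much lower in the second group, which pulls the coefficient below 1 and flags that item for a closer look.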

51 Empirical Examination of Equivalence of Factor Structure u Psychometric invariance (equivalence) u Important properties of theoretically-based factor structure (measurement model) do not vary across groups (are invariant) –measurement model is the same across groups u Empirical comparison across groups using confirmatory factor analysis –Not simply by examination

52 Confirmatory Factor Analysis Hierarchical Tests of Equivalence Across all groups – a sequential process: u Same number of factors or dimensions u Same items on same factors u Same factor loadings u No bias on any item across groups u Same residuals on items u No item or scale bias AND same residuals
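
Because the tests are sequential, each level is only meaningful if every earlier level held. The sketch below captures just that decision logic. It assumes the pass/fail result of each step has already been obtained from multi-group CFA model comparisons done elsewhere; the function and step names are hypothetical labels based on the technical terms used in this lecture.

```python
# Ordered from weakest to strongest requirement.
INVARIANCE_STEPS = ["dimensional", "configural", "metric", "scalar", "residual"]

def highest_invariance_level(step_results):
    """Return the strongest level supported, stopping at the first failed step.

    `step_results` maps a step name to True (held) or False (failed).
    Steps missing from the mapping are treated as not yet established.
    """
    achieved = None
    for step in INVARIANCE_STEPS:
        if not step_results.get(step, False):
            break
        achieved = step
    # Strict factorial invariance means scalar and residual invariance both hold.
    if achieved == "residual":
        return "strict"
    return achieved

# Hypothetical outcome: loadings were equal across groups, item bias was found.
example = {"dimensional": True, "configural": True, "metric": True, "scalar": False}
print(highest_invariance_level(example))
```

In the example, metric invariance is the highest level supported, so comparing mean scores across groups would not yet be justified.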

53 Measurement or Psychometric Invariance Gregorich, S.E. Do self-report instruments allow meaningful comparisons across population groups? Testing measurement invariance using the confirmatory factor analysis framework. Med Care, 2006;44 (11, supp 3):S78-S94.

54 Criteria for Evaluating Invariance Across Groups: Technical Terms u Dimensional Invariance: Same number of factors u Configural Invariance: Same items load on same factors u Metric or Factor Pattern Invariance: Items have same loadings on same factors u Scalar or Strong Factorial Invariance: Observed scores are unbiased u Residual Invariance: Observed item and factor variances are unbiased u Strict Factorial Invariance: Both scalar and residual criteria are met
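
These criteria can be read as equality constraints on the parameters of a multi-group factor model. The notation below is a standard single-factor sketch, not taken from the slides: $x_{ijg}$ is the response of person $i$ in group $g$ to item $j$, $\tau$ is an item intercept, $\lambda$ is a factor loading, $\xi$ is the latent factor score, and $\theta$ is a residual variance.

```latex
\begin{align*}
x_{ijg} &= \tau_{jg} + \lambda_{jg}\,\xi_{ig} + \varepsilon_{ijg},
  \qquad \operatorname{Var}(\varepsilon_{ijg}) = \theta_{jg} \\[4pt]
\text{Metric invariance:}   &\quad \lambda_{jg} = \lambda_{j} \ \text{for all groups } g \\
\text{Scalar invariance:}   &\quad \lambda_{jg} = \lambda_{j} \ \text{and} \ \tau_{jg} = \tau_{j} \\
\text{Residual invariance:} &\quad \lambda_{jg} = \lambda_{j} \ \text{and} \ \theta_{jg} = \theta_{j} \\
\text{Strict invariance:}   &\quad \lambda_{jg} = \lambda_{j},\ \tau_{jg} = \tau_{j},\ \text{and} \ \theta_{jg} = \theta_{j}
\end{align*}
```

Under this reading, equal intercepts are what allow observed group means to be compared, because a group difference in $\tau_{jg}$ would shift item scores without any difference in the latent factor.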

55 Dimensional Invariance of CES-D u Definition: same number of factors observed in all groups u Original 4 CES-D factors –Somatic symptoms –Depressive affect –Interpersonal behavior –Positive affect LS Radloff, The CES-D scale: A self-report depression scale for research in the general population, Applied Psychol Measurement, 1977;1:385-401.

56 No Evidence of Dimensional Invariance u Hispanic EPESE (n=2,536) and a study of older Mexican Americans (n=330) u 2 factors in both studies –Depression (somatic symptoms, depressive affect, and interpersonal behavior) –Well-being u American Indian adolescents (n=179) u 3 factors –Depressed affect –Somatic symptoms and reduced activity –Positive affect TQ Miller et al., J Gerontol: Soc Sci 1997;52B:S259; SM Manson et al., Psychol Assessment 1990;2:231-237

57 Configural Invariance u Assumes: dimensional invariance is found (same number of factors) u Definition: Item-factor patterns are the same, i.e., the same items load on the same factors in both groups u CES-D example –4 factors found in Anglos, Blacks, and Chicanos –Same items loaded on each factor in all groups RE Roberts et al., Psychiatry Research, 1980;2:125-134

58 Configural Invariance u Dimensional Invariance: Same number of factors u Configural Invariance: Same items load on same factors u Metric or Factor Pattern Invariance: Items have same loadings on same factors u Strong Factorial or Scalar Invariance: Observed scores are unbiased u Residual Invariance: Observed item and factor variances can be compared across groups u Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

59 Metric Invariance or Factor Pattern Invariance u Assumes: dimensional and configural invariance are found u Definition: Item loadings are the same across groups –i.e., the correlation of each item with its factor is the same in all groups

60 Metric Invariance u Dimensional Invariance: Same number of factors u Configural Invariance: Same items load on same factors u Metric or Factor Pattern Invariance: Items have same loadings on same factors u Strong Factorial or Scalar Invariance: Observed scores are unbiased u Residual Invariance: Observed item and factor variances can be compared across groups u Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

61 Metric Invariance Example from Interpersonal Processes of Care u Out of 91 items – factor structure of 29 items met criteria of dimensional, configural, and metric invariance across 4 groups –Spanish-speaking Latinos, English speaking Latinos, African Americans, Whites u Dimensional –Similar factor structure across all 4 groups u Configural –Same items loaded on each factor in all 4 groups u Metric –Same item loadings in all 4 groups

62 Seven “Metric Invariant” Scales: Same Item Loadings Across Groups u I. COMMUNICATION –Hurried communication –Elicited concerns, responded –Explained results, medications u II. DECISION MAKING –Patient-centered decision-making u III. INTERPERSONAL STYLE –Compassionate, respectful –Discriminated –Disrespectful office staff

63 Strong Factorial Invariance or Scalar Invariance u Assumes: dimensional, configural, and metric invariance are found u Definition: Observed scores are unbiased, i.e., means can be compared across groups u Requires test of equivalence of mean scores across groups using confirmatory factor analysis

64 Strong Factorial Invariance u Dimensional Invariance: Same number of factors u Configural Invariance: Same items load on same factors u Metric or Factor Pattern Invariance: Items have same loadings on same factors u Strong Factorial or Scalar Invariance: Observed scores are unbiased u Residual Invariance: Observed item and factor variances can be compared across groups u Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

65 Seven “Scalar Invariant” (Unbiased) IPC Scales (18 items) u I. COMMUNICATION –Hurried communication: lack of clarity –Elicited concerns, responded –Explained results, medications: explained results u II. DECISION MAKING –Patient-centered decision-making: decided together u III. INTERPERSONAL STYLE –Compassionate, respectful: (subset) compassionate, respectful –Discriminated: discriminated due to race/ethnicity –Disrespectful office staff

66 Equivalence of Spanish and English Hospital Quality of Care Survey (H-CAHPS®) u Tested 7 subscales –Nurse communication, MD communication, communication about meds, nursing services, discharge information, pain control, and physical environment u Reported on translation/adaptation, pretesting, item-scale correlations, internal consistency reliability, and construct validity u CFA methods compared factor structure between Spanish and English groups MP Hurtado et al. Health Serv Res, 2005;40(6, Part II):2140-2161

67 Psychometric or Measurement Equivalence: Second Meaning u Measurement properties of a measure in your diverse group are similar to original (mainstream) groups on which the measures were developed u Subjective comparison and evaluation

68 Mixed Methods for Assessing Equivalence u Use qualitative and quantitative methods in tandem to address issues of cultural equivalence

69 Mixed Methods: Developing IPC Measure of “Cultural Sensitivity” u Initial concept and items from qualitative work u 1st survey: In psychometric analyses, did not meet minimal criteria u Second version of concept and items –new qualitative work, results of first study u 2nd survey: In psychometric analyses, measure again did not meet minimal criteria u Analyzed focus group data in more depth –cultural sensitivity is multidimensional u 3rd survey: testing multidimensional measures of cultural sensitivity

70 Conclusions u Measurement in health disparities and minority health research is a relatively new field u Encourage testing and reporting on adequacy and equivalence of measures tested in any diverse population u As evidence grows, concepts and measures that work better across diverse groups will be identified

71 Resource: Reviews of Measures for Diverse Populations u Multicultural measurement in older populations, JH Skinner et al (eds), Springer Publishing Co: NY, 2002 –ALSO published as: Measurement in older ethnically diverse populations, J Mental Health Aging, Vol 7, Spring 2001 Reviews measures that have been used cross-culturally in: acculturation, socio-economic status, social supports, cognition, health and functional capacity, depression, health locus of control, health-related quality of life, and religiosity

72 Resource: Special Journal Issue u Measurement in a multi-ethnic society –Med Care, Vol 44, November 2006 –Qualitative and quantitative methods in addressing measurement in diverse populations

73 Resource: Clinical Research with Diverse Communities u Epi 222, Spring u Course Director: Eliseo Pérez-Stable, MD u Thursdays 2:45-4:15 –China Basin u Summary and syllabus for 2008: http://www.epibiostat.ucsf.edu/courses/schedule/diverse_pops.html

74 Epi 222 Provides Overview Of…. u Meaning of race, ethnicity, social class and culture u Multi-level factors that are mechanisms of health disparities u Methodological and measurement considerations in research in ethnically diverse populations u Qualitative methods in developing and pre-testing instruments u Strategies for recruiting ethnically diverse populations and for expanding the role of communities

75 Homework for Next Week u For those interested in studying any diverse population group: –Finish matrix: complete rows 27-34 »Translations, equivalence across diverse groups, acceptability for your population u For everyone: –Complete row 34: can measure be modified

76 Next Week (Class 8) u Pretesting measures and creating a questionnaire
