1 Task Force on the Development of a Common Instrument to Measure Health States: Conceptual and Logistic Issues in Item Construction
Cameron N. McIntosh; Julie Bernier; Jean-Marie Berthelot; Sarah Connor Gorber; Michael C. Wolfson
Statistics Canada, Ottawa, Ontario, Canada
Working Paper No. 3, 22 November 2005
STATISTICAL COMMISSION and UN ECONOMIC COMMISSION FOR EUROPE, CONFERENCE OF EUROPEAN STATISTICIANS
STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)
WORLD HEALTH ORGANIZATION (WHO)
Joint UNECE/WHO/Eurostat Meeting on the Measurement of Health Status (Budapest, Hungary, 14-16 November 2005)
Session 3 - Invited paper

2 Selected Domains
For inclusion on the common instrument, the task force selected the following 10 domains, for which specific items were to be constructed:
1. Physical Functioning: Mobility
2. Physical Functioning: Dexterity
3. Vitality/Fatigue
4. Affect (happiness, depression)
5. Anxiety (worry, fear, nervousness)
6. Vision (visual acuity)
7. Hearing (auditory acuity)
8. Pain and Discomfort
9. Social Relationships (including aspects of communication)
10. Cognition
    (a) memory and concentration
    (b) problem solving and thinking

3 Developing Questions for the Health Domains
A number of conceptual and logistic issues needed to be considered in the item construction process for all domains; these can be grouped under the following five major headings:
(1) Number of Questions per Domain
(2) Questions Should be Uni-dimensional
(3) Duration of the Recall Period for the Questions
(4) Dealing with Technical and Medicinal Prosthetics
(5) Item Wording and Response Categories

4 (1) Number of Questions per Domain
- There is a trade-off between adequate domain coverage and operational feasibility of the survey instrument; ideally, each domain should be assessed using only one or two items.
- Multi-faceted domains (e.g., Cognition) may necessitate multiple items to enhance measurement precision.
- Filter questions should be considered for screening out respondents with “no limitations” on a given domain.
    Advantage: might conserve interview time, since not all response categories would need to be read in all cases.
    Disadvantage: might result in a bias toward “no” responses, since answering “no” relieves the respondent of the mental effort needed to generate an estimate of functioning.
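The filter-question trade-off on this slide can be illustrated with a minimal sketch of skip logic. Everything named here (the `administer_domain` function, the `CATEGORIES` wording, the prompt count as a stand-in for interview time) is a hypothetical assumption for illustration, not part of the task force's instrument.

```python
from typing import Callable, List, Tuple

# Hypothetical response categories; the slides do not fix any wording.
CATEGORIES: List[str] = ["mild", "moderate", "severe", "unable to do"]


def administer_domain(
    has_any_limitation: bool,
    choose_category: Callable[[List[str]], str],
) -> Tuple[str, int]:
    """Return (recorded response, number of prompts read aloud).

    The prompt count is a crude, assumed proxy for interview time.
    """
    prompts_read = 1  # the filter question itself
    if not has_any_limitation:
        # Screened out: the detailed categories are never read.
        return "no limitations", prompts_read
    # Filter passed: every response category must be read.
    prompts_read += len(CATEGORIES)
    return choose_category(CATEGORIES), prompts_read


# A respondent answering "no" to the filter costs one prompt.
print(administer_domain(False, lambda cats: cats[0]))
# A respondent answering "yes" hears the filter plus all categories.
print(administer_domain(True, lambda cats: cats[1]))
```

The sketch shows only the time saving. The bias toward “no” responses noted on the slide is a respondent behaviour and would have to be assessed empirically, for example through cognitive testing.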

5 (2) Questions Should be Uni-Dimensional
- To maximize measurement precision, each item should assess only one domain (or domain aspect); “double-barreled” response categories should be avoided, for example (EQ-5D):
    1. I am not anxious or depressed
    2. I am moderately anxious or depressed
    3. I am extremely anxious or depressed
- Responses to items mixing different concepts are difficult to interpret, because it is unclear which part of the question was being answered.
- Multiple concepts within a single question might also confuse respondents and result in natural questions for interviewers, for example:
    “If I am not anxious but am moderately depressed, should I pick 1 or 2?”

6 (3) Duration of the Recall Period for the Questions
- Respondents need to base their functional status reports on some time period.
- Just asking about “general” or “usual” functioning might provide the least biased estimates, as it helps to avoid picking up the impact of time-limited health conditions (e.g., flu).
- A problem is that “usual” or “general” are vague terms and might not have consistent meaning across countries and cultures; they may also pose translation difficulties.
- A specific recall period (e.g., the previous 30 days) would help standardize measurement, as well as facilitate translation.

7 (3) Duration of the Recall Period for the Questions
- The choice of a specific recall period must take several factors into account.
- The shorter the recall period, the greater the tendency to consider only frequent, highly patterned events of lower intensity.
- The longer the recall period, the greater the tendency to consider infrequent, more intense events (e.g., intense episodes of anger).
- The optimum recall period would lead to a balanced consideration of domain-related events (i.e., events of varying intensity).

8 (3) Duration of the Recall Period for the Questions
- Telescoping: events are improperly included in or excluded from the recall period.
    Forward telescoping: an event that is better represented in memory (highly vivid and intense) is included incorrectly in the recall period.
    Backward telescoping: an event that is more poorly represented in memory (less vivid and intense) is excluded incorrectly from the recall period.
- Questions may need to reinforce that the focus is on respondents’ lives during the specified recall period only.
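The two telescoping errors defined on this slide can be made concrete with a small sketch that compares when an event actually happened with when the respondent remembers it happening. The function name `classify_recall`, the 30-day default window, and the example dates are assumptions chosen for illustration; in a real survey the true date is usually unknown, so this shows the definitions rather than a detection method.

```python
from datetime import date, timedelta


def classify_recall(true_date: date, recalled_date: date,
                    interview_date: date, window_days: int = 30) -> str:
    """Classify one event against a recall window ending at the interview."""
    window_start = interview_date - timedelta(days=window_days)
    truly_inside = window_start <= true_date <= interview_date
    recalled_inside = window_start <= recalled_date <= interview_date

    if recalled_inside and not truly_inside:
        # Older event pulled into the window: incorrectly included.
        return "forward telescoping"
    if truly_inside and not recalled_inside:
        # In-window event pushed out of the window: incorrectly excluded.
        return "backward telescoping"
    return "correctly placed"


interview = date(2005, 11, 15)  # the 30-day window starts on 2005-10-16
print(classify_recall(date(2005, 10, 1), date(2005, 10, 20), interview))
print(classify_recall(date(2005, 11, 1), date(2005, 10, 1), interview))
print(classify_recall(date(2005, 11, 1), date(2005, 11, 3), interview))
```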

9 (4) Dealing With Technical and Medicinal Prosthetics
- To accurately measure capacity and feelings, the questions may need to incorporate information on the use of aids (e.g., walking equipment, glasses and contact lenses, hearing aids, medication for controlling pain and regulating mood).
- If certain items do not specify the use of aids, respondents who use aids might pose natural questions to interviewers, for example:
    “Do you mean how much difficulty I have getting around the neighbourhood with or without my walker/wheelchair?”
    “Are you referring to the intensity of my pain when I am on or off my medication?”
- Questions for domains where aids are most relevant (e.g., mobility, vision, hearing, pain and discomfort) should probably mention the use of aids in the preamble and/or the response categories.

10 (5) Item Wording and Response Categories
- Terminology will have to be chosen carefully in order to facilitate translation and international comparability of concepts.
- Language that is either overly colloquial or overly scientific should be avoided.
- It might be best to assess capacity in terms of “difficulty in doing __”; questions directly using the terms “capacity” (or “ability”) might be ambiguous for respondents.
- It needs to be determined whether problems in functioning will be assessed in terms of frequency (how often), intensity (how bad), or both.

11 (5) Item Wording and Response Categories
- Response category cut-point shift problem: the same underlying level of capacity or feeling may not receive the same rating across countries, cultures, or individuals (e.g., limitations seen as “mild” in one culture may be seen as “severe” in another; the frequency of a given problem might be rated as “some of the time” in one culture and “all of the time” in another).
- An alternative to full sets of quantifiers and qualifiers would be to use scales with qualifiers or quantifiers on the endpoints only (e.g., a Visual Analogue Scale or a ladder).
- However, measurement precision is lessened when descriptors are not attached to all scale values; also, it may be optimal to define every domain level for future preference measurement.
- Both types of items (i.e., a fully defined system of levels versus endpoint labels only) should be subjected to cognitive testing.
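The cut-point shift problem can be sketched numerically: if two groups place their category boundaries at different points on the same underlying scale, an identical level of limitation receives different labels. The 0-100 latent scale, the label set, and both sets of cut-points below are invented for illustration and do not come from any survey data.

```python
from bisect import bisect_right
from typing import List

# Hypothetical ordered response labels.
LABELS: List[str] = ["none", "mild", "moderate", "severe"]


def rate(latent_level: float, cut_points: List[float]) -> str:
    """Map a latent limitation level to a label using group-specific cut-points.

    cut_points must be sorted and contain len(LABELS) - 1 thresholds.
    """
    return LABELS[bisect_right(cut_points, latent_level)]


# Assumed thresholds: group B labels limitations as severe much earlier.
group_a_cuts = [20, 50, 80]
group_b_cuts = [10, 30, 45]

same_latent_level = 48
print(rate(same_latent_level, group_a_cuts))  # mild
print(rate(same_latent_level, group_b_cuts))  # severe
```

With the same latent level of 48, group A reports “mild” and group B reports “severe”, so a naive comparison of category frequencies would confuse differences in cut-points with differences in health.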

12 Issues Requiring Input
- What should be the upper limit on questions for each domain?
- How do we arrive at an optimal balance between precision in measurement (i.e., maintaining item uni-dimensionality) and operational feasibility (i.e., having a reasonably brief survey module)?
- What is the best recall period for the items?
- What is the best way to build information on technical and medicinal prosthetics into the items?
- Should there be response category labels for every level of a domain, or should there be scales with labels on the endpoints only? How do we derive a set of internationally comparable descriptors?

