Scaling Session. Measurement implies “assigning numbers to objects or events…” Two levels can be distinguished: we can assign numbers to the response levels for a single question (mild, moderate or severe pain), and we can also assign different numerical weights to each question. Thus, saying ‘No’ to “Can you get out of bed?” might get a higher score than ‘No’ to “Can you run a mile?” The purpose of scaling is to select appropriate numbers at both levels to represent amounts of health. Where do these numbers come from?
Where Do the Weights Come From?
– You can assign the same weight to each question: e.g., one point for each affirmative response.
– You could assign arbitrary values to response levels (mild pain = 1, moderate = 2, severe = 3).
– You might base these numbers on some type of conceptual model of the phenomenon.
– Or infer weights from administrative, legal, or social decisions (how much compensation is paid for this type of disability?).
– Or you can calculate weights through a scaling task.
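The first two options, and the idea of giving each question its own weight, can be sketched in a few lines of Python. This is only an illustration: the item wording, the answers, and the item weights are hypothetical and are not taken from any published instrument.

```python
# Hypothetical answers: True means the person reports being UNABLE to do the activity.
answers = {
    "get out of bed": True,
    "walk one block": True,
    "run a mile": False,
}

# Option 1: the same weight for every question (one point per affirmative response).
unit_score = sum(1 for unable in answers.values() if unable)

# Option 2: arbitrary numbers for the response levels of a single question.
PAIN_LEVELS = {"mild": 1, "moderate": 2, "severe": 3}
pain_score = PAIN_LEVELS["moderate"]

# Item weights (made up for illustration): being unable to get out of bed
# counts for more than being unable to run a mile.
ITEM_WEIGHTS = {"get out of bed": 3.0, "walk one block": 2.0, "run a mile": 1.0}


def weighted_score(answers, weights):
    """Sum the weights of the items the person cannot do."""
    return sum(weights[item] for item, unable in answers.items() if unable)


print(unit_score, pain_score, weighted_score(answers, ITEM_WEIGHTS))
```

The remaining options on the slide differ only in where the numbers in `ITEM_WEIGHTS` come from.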
Scaling tasks. These produce weights through an empirical procedure. Scaling is undertaken by people who are asked to provide their personal judgment; this measures their ‘preferences’ (or aversion) for specified health states. Two categories of preferences can be distinguished: ‘values’ and ‘utilities’. These correspond to two contrasting historical traditions that have influenced the way we assign numbers in health measurement: psychometrics and econometrics.
Psychometrics & Econometrics. Psychometrics deals with feelings, opinions and perceptions, and is appropriate in judging single items; it measures “values”. The econometric tradition derives from studies of consumption and choices between goods; it focuses on making decisions under conditions of uncertainty (as with investing). It measures “utilities”: choice given risk. “Utilities are the numbers that represent the strength of a person’s preferences for particular outcomes when faced with uncertainty” (George Torrance).
Psychometrics & Econometrics (cont’d). Hence the econometric approach is suitable for weighting health states for clinical decision analysis and the patient’s choice of therapy, where there is uncertainty. It is used in planning care and in anything to do with future health. The psychometric approach is good for valuing current health states. In general, utility scores are higher than value judgments, although the difference may not be great.
The main methods of calculating weights
Psychometric
– Paired comparisons method
– Equal-appearing interval scaling
– Likert scaling
– Magnitude estimation methods
Utility Methods
– Standard gamble
– Time tradeoff
– Willingness to pay
Psychometric Rating Tasks. There are many variants. For example, Thurstone ‘equal-appearing interval scaling’: cards with descriptions of health states (the items) written on each; raters place these on a scale representing intensity of the relevant concept (e.g., disability). The scale typically has 15 spaces. The item weights come from the average of individual judgments. A high standard deviation (SD) across raters suggests an ambiguous item. Magnitude estimation: raters compare the health states with a standard state and are asked to provide a number or ratio indicating how much worse or better each is than the standard.
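As a rough sketch of the equal-appearing interval calculation: each item's weight is the mean of the positions the raters gave it, and the SD across raters flags items they could not agree on. The items, the placements, and the cutoff `SD_CUTOFF` below are all invented for illustration; the slide does not say how high an SD counts as "high".

```python
from statistics import mean, stdev

# Hypothetical placements on a 15-space scale (1 = least disabling, 15 = most),
# one number per rater.
placements = {
    "I cannot run a mile": [2, 3, 2, 4, 3],
    "I cannot walk one block": [8, 9, 7, 8, 9],
    "I feel unwell": [3, 12, 7, 14, 5],
}

# Arbitrary threshold chosen for this example only.
SD_CUTOFF = 3.0


def thurstone_weights(placements):
    """Return {item: (weight, sd)} where weight is the mean placement."""
    return {item: (mean(r), stdev(r)) for item, r in placements.items()}


for item, (weight, sd) in thurstone_weights(placements).items():
    note = " (raters disagree: possibly ambiguous)" if sd > SD_CUTOFF else ""
    print(f"{item}: weight={weight:.1f}, SD={sd:.2f}{note}")
```

With these made-up numbers, the first two items have a small SD and the vaguely worded third item has a large one, which is the pattern the slide describes.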
Econometric Rating Tasks. ‘Standard gamble’: the respondent chooses between a certain outcome (e.g., living in the restricted health state for 10 years and then dying) and a gamble (e.g., 90% chance of immediate cure, but with a 10% chance of immediate death). The more severe they judge the current state to be, the higher the risk of death they will accept (12%, 15%, etc.) to avoid it. Time trade-off: the respondent is asked to imagine being in the health state being rated and is then asked how many years of life he or she would give up to be cured of it.
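Both tasks are usually converted to a utility on a scale where death = 0 and full health = 1. That anchoring, and the function names below, are assumptions added here; the slide does not give the formulas. Under that convention, the standard gamble utility is the chance of cure at the point where the respondent is indifferent between the gamble and the certain outcome, and the time trade-off utility is the fraction of the time horizon the respondent insists on keeping.

```python
def standard_gamble_utility(p_death_at_indifference):
    """Utility = probability of cure at which the respondent is indifferent."""
    if not 0.0 <= p_death_at_indifference <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1.0 - p_death_at_indifference


def time_tradeoff_utility(years_in_state, years_given_up):
    """Utility = healthy years accepted in exchange / years in the health state."""
    if years_in_state <= 0 or not 0 <= years_given_up <= years_in_state:
        raise ValueError("need 0 <= years_given_up <= years_in_state, horizon > 0")
    return (years_in_state - years_given_up) / years_in_state


# A respondent who accepts at most a 10% risk of death values the state more
# highly than one who would accept 15%.
print(round(standard_gamble_utility(0.10), 2), round(standard_gamble_utility(0.15), 2))

# Giving up 2 of 10 years to be cured.
print(time_tradeoff_utility(10, 2))
```

A more severe state shows up as a higher accepted risk or more years given up, and therefore as a lower utility.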
The Procedure
1. Choose people to make the judgments. Think carefully about the sample!
2. Choose the health states to be rated (often a brief description).
3. Select a scaling method (psychometric or econometric).
4. Collect the preference judgments.
5. Analyze the data and calculate weights for each health state.
Alternatives. All of that seems quite a bother to do! An alternative is to derive weights from the pattern of responses from a representative sample of people on the health measure. This fits within the classical test theory approach to measurement. It is ‘norm-referenced’: scores are interpreted relative to a distribution. One example is Likert scaling, shown on the next slide.
Example of Calculating Likert Scale Scores (this was a satisfaction question: “My doctor gives excellent care”)

                              | Strongly Disagree | Disagree | Unsure | Agree | Strongly Agree
p choosing this option        | .13               | .43      | .21    | .13   | .10
Cumulative p                  | .13               | .56      | .77    | .90   | 1.0
Mid-point of interval *       | .065              | .345     | .665   | .835  | .950
Z (std deviate for mid-point) | -1.514            | -0.399   | 0.426  | 0.974 | 1.645
Add 1.514 to re-calibrate     | 0.0               | 1.115    | 1.940  | 2.488 | 3.159
Z rounded                     | 0                 | 1        | 2      | 2.5   | 3

* Half of p for that category plus the cumulative p for the categories below it.
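The arithmetic on the slide above can be reproduced with the standard library. This sketch assumes response proportions of .13, .43, .21, .13 and .10, which are the values implied by the slide's mid-point row; the function and variable names are illustrative.

```python
from itertools import accumulate
from statistics import NormalDist

LABELS = ["Strongly Disagree", "Disagree", "Unsure", "Agree", "Strongly Agree"]
P = [0.13, 0.43, 0.21, 0.13, 0.10]  # proportion choosing each option


def likert_scores(p):
    """Return (midpoints, z, recalibrated) for a list of response proportions."""
    cumulative = list(accumulate(p))
    below = [0.0] + cumulative[:-1]  # cumulative p of the categories below
    midpoints = [b + pi / 2 for b, pi in zip(below, p)]
    # Standard normal deviate corresponding to each mid-point.
    z = [NormalDist().inv_cdf(m) for m in midpoints]
    # Shift so that the lowest category scores zero.
    recalibrated = [zi - z[0] for zi in z]
    return midpoints, z, recalibrated


midpoints, z, recalibrated = likert_scores(P)
for label, m, zi, r in zip(LABELS, midpoints, z, recalibrated):
    # The final column rounds to the nearest half point, as on the slide.
    print(f"{label:18s} mid={m:.3f} z={zi:+.3f} score={r:.3f} rounded={round(r * 2) / 2}")
```

The unequal gaps between the resulting scores are the point of the exercise: the distance from "Strongly Disagree" to "Disagree" is about twice the distance from "Agree" to "Strongly Agree".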
Guttman Scaling

                                        | People
Item                                    | A   | B   | C   | D   | E   | F
I can use the toilet without assistance | yes | yes | yes | yes | yes | no
I can rise from an armchair             | yes | yes | yes | yes | no  | no
I can walk one block                    | yes | yes | yes | no  | no  | no
I can do the grocery shopping           | yes | yes | no  | no  | no  | no
I can run a mile                        | yes | no  | no  | no  | no  | no
Score                                   | 5   | 4   | 3   | 2   | 1   | 0

This generates an ordinal scale in which all items fall on one dimension.
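A short sketch of how the pattern in this slide can be checked. With items ordered from easiest to hardest, a perfect Guttman pattern is a run of "yes" followed by a run of "no", so the score alone tells you which items were endorsed. The error count below compares each person with the ideal pattern for their score, which is one common convention among several; the coefficient of reproducibility is 1 minus the proportion of such errors. The non-scale pattern used at the end is hypothetical.

```python
# Items ordered from easiest to hardest, as on the slide.
ITEMS = [
    "use the toilet without assistance",
    "rise from an armchair",
    "walk one block",
    "do the grocery shopping",
    "run a mile",
]

# 1 = yes, 0 = no; one entry per person, columns follow ITEMS.
RESPONSES = {
    "A": [1, 1, 1, 1, 1],
    "B": [1, 1, 1, 1, 0],
    "C": [1, 1, 1, 0, 0],
    "D": [1, 1, 0, 0, 0],
    "E": [1, 0, 0, 0, 0],
    "F": [0, 0, 0, 0, 0],
}


def scale_errors(pattern):
    """Count cells that differ from the ideal pattern with the same score."""
    score = sum(pattern)
    ideal = [1] * score + [0] * (len(pattern) - score)
    return sum(a != b for a, b in zip(pattern, ideal))


def reproducibility(responses):
    """Coefficient of reproducibility: 1 - errors / total responses."""
    total = sum(len(p) for p in responses.values())
    errors = sum(scale_errors(p) for p in responses.values())
    return 1 - errors / total


scores = {person: sum(p) for person, p in RESPONSES.items()}
print(scores)
print(reproducibility(RESPONSES))

# Hypothetical person who can walk a block but cannot rise from an armchair.
print(scale_errors([1, 0, 1, 0, 0]))
```

Real data rarely fit perfectly, so in practice the coefficient is used to judge how close a set of items comes to a single ordered dimension.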
Some points to recognize & ponder. Is scaling worth the effort? The weighted and unweighted versions of many health measures often correlate above 0.9. Where a scale has different sections, the overall score is implicitly weighted by the number of items in each section. Think about unidimensionality. Is a notion such as “independence” really a single dimension? Do overall scores make sense? Should we add incontinence to mobility? Is Hi + Lo equivalent to Med + Med?
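Two of these points are easy to see with a toy example. The scale below is hypothetical: a mobility section with 8 yes/no items and a continence section with 2. Adding up all items gives mobility four times the influence of continence, and two people with quite different profiles can end up with the same overall score.

```python
# Hypothetical scale: number of one-point items in each section.
SECTION_ITEMS = {"mobility": 8, "continence": 2}


def implicit_weights(section_items):
    """Share of the maximum total score contributed by each section."""
    total = sum(section_items.values())
    return {name: n / total for name, n in section_items.items()}


def overall_score(section_scores):
    """Unweighted total: simply add the section scores."""
    return sum(section_scores.values())


print(implicit_weights(SECTION_ITEMS))

# Two invented profiles: full mobility but no continence items passed,
# versus reduced mobility with both continence items passed.
person_1 = {"mobility": 8, "continence": 0}
person_2 = {"mobility": 6, "continence": 2}
print(overall_score(person_1), overall_score(person_2))
```

Whether those two totals should be treated as equivalent is the question the slide is asking; the code only shows that simple summation treats them as equal.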
Further thoughts…
Note that numerical ratings can represent many different aspects of a health state:
– frequency of occurrence of the symptom
– probability it will occur
– unpleasantness of the symptom
– utility (or undesirability, given its probability)
Do interval scales necessarily represent conceptually equal intervals? (Is age an interval scale when you are using it to represent maturity?)