Measures of Disease Association

1 Measures of Disease Association
Measuring the occurrence of new outcome events can be an aim in itself, but usually we want to look at the relationship between an exposure (risk factor, predictor) and the outcome. The type of measure showing an association between an exposure and an outcome event is linked to the study design. We have reviewed basic study design and examined the different measures of disease occurrence. In the next two lectures we will look at the measures that are used to show association between an exposure or treatment and disease. These measures are called measures of association, and the type of measure available is determined by the study design. So we will come full circle in linking specific measures of disease association with study design.

2 Main points to be covered
Measures of association compare measures of disease between levels of a predictor variable
Cross-sectional study: introducing the 2 x 2 table, prevalence ratio, odds ratio
Cohort study: risk ratio (cumulative incidence), rate ratio (incidence rate)
Absolute risk versus relative risk

3 Cross-Sectional Study Design: A Prevalent Sample
We return once more to the text’s schematic for a cross-sectional study. When it is viewed in the setting of some hypothetical cohort whose experience of the disease outcome over time is being sampled at a single point in time, it is clear that only prevalent cases of the disease are being captured, not incidence. Likewise, the non-diseased are also a prevalent sample from one point in time. Thus, when we turn to the measure of association between a predictor/exposure and the disease outcome in a cross-sectional sample, we need to keep in mind that it is a measure based on prevalence. As pointed out earlier, examining the association of exposures/predictors with prevalent cases of disease is to look at both factors associated with disease occurrence and factors associated with disease duration (survival). These factors can be quite different, and interpretation of the measure of association using prevalent disease is complicated by this fact.

4 Measures of Association in a Cross-Sectional Study
Simplest case is to have a dichotomous outcome and dichotomous exposure variable Everyone in the sample is classified as diseased or not and having the exposure or not, making a 2 x 2 table The proportions with disease are compared among those with and without the exposure NB: Exposure=risk factor=predictor The only kind of outcome we have been discussing is an event, which is by definition a dichotomous variable as it either has or has not occurred. There are, of course, other outcome measures that are important in clinical research, and they can be measured in various ways, such as multiple ordered categories (none, mild, severe congestive heart failure), a scale (such as a series of questions which form a quality of life scale), or a continuous measure (like blood pressure). The basic measures commonly used in clinical research that we will be discussing in the next two lectures are used for an association between an exposure and a dichotomous outcome. To keep it simple we will also assume a dichotomous exposure. The same principles will apply if the exposure has more than two categories; for example, separate 2x2 tables can be formed for the combinations of the outcome and the levels of the exposure if the exposure has more than two levels. We are using the word “exposure” for the variable whose association with the outcome the study seeks to investigate. Other terms are frequently used to describe the exposure variable, the most common being “risk factor” or “predictor” or “predictor variable.” In a clinical trial the exposure is usually a treatment, procedure, or intervention whereas in an observational study the exposure might be an immutable characteristic such as a person’s gender or race. In a regression analysis the outcome is called the dependent variable and the exposure or predictor variables are called independent variables. 
Although independent and dependent variables are widely used terms in statistics texts, they seem less intuitive to us than outcome and exposure.

5 2 x 2 table for association of disease and exposure
                    Disease
                 Yes      No       Total
Exposure  Yes    a        b        a + b
          No     c        d        c + d
          Total  a + c    b + d    N = a+b+c+d
Note: data may not always come to you arranged as above. STATA puts exposure across the top, disease on the side.
This is a conventional schematic of what is called in epidemiology a 2 x 2 table. We will illustrate basic measures of association using this table. Of course, exposures might have more than two levels and thus more than just yes/no categories, but the same principles would apply to the measures of association. This 2 x 2 schematic is widely used, but exposure and disease can switch position, as can the yes and no columns and rows, so it is important to notice how the table you are looking at has been arranged. Our arrangement is the same as in Szklo and Nieto and in many other basic epidemiology textbooks. STATA puts the exposure variable across the top and the outcome variable on the side. So the notation we will be using for a 2x2 table would read differently for data arranged in STATA, but the measures remain the same.

6 Prevalence ratio of disease in exposed and unexposed
PR = [a/(a+b)] / [c/(c+d)]
                    Disease
                 Yes      No       Total
Exposure  Yes    a        b        a + b
          No     c        d        c + d
The proportion of persons with disease is calculated in the group with the exposure and separately in the group without the exposure. So a/(a+b) is the prevalence of disease among the exposed persons and c/(c+d) is the prevalence of disease among the non-exposed. Since this is a cross-sectional study, both are prevalences. To compare these two prevalences, we can form a ratio with one over the other to get a prevalence ratio of disease in the exposed and unexposed. If the prevalence is the same, the ratio will equal 1.0. If disease prevalence is higher in those with the exposure (placed on top in the ratio), the ratio will be greater than 1.0, and if the prevalence is lower in those with the exposure, the ratio will be less than 1.0. How much greater or less than 1.0 is a measure of the strength of the association between the exposure and the disease. This kind of measure of association is known as a ratio measure. A different kind of measure could have been formed by subtracting one prevalence from the other instead of forming a ratio. We will discuss that type of measure later.

7 Prevalence Ratio Text refers to Point Prevalence Rate Ratio in setting of cross-sectional studies We like to keep the concepts of rate and prevalence separate, and so prefer to use prevalence ratio This is one other place where the text language differs a bit from the language we suggest you use. We have already discussed this, incorrect, use of rate for prevalence data. The textbook is yielding to some older usage which we think is just confusing, so we will simply call this measure a prevalence ratio.

8 Sample data with prevalence ratio calculated
            Cases   Noncases   Total   Prevalence
Exposed       14       17        31      0.45
Unexposed    388      248       636      0.61
Total                           667
Prevalence ratio = 0.45/0.61 = 0.74
Here is the data shown with outcome on top and exposure on the side. We prefer this schematic because it emphasizes the cohort design: starting with exposure status and experiencing the outcome. However, the key point is to be aware of how a 2x2 table is set up.
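To make the arithmetic on this slide easy to check, here is a minimal Python sketch of the prevalence ratio calculation. The function name `prevalence_ratio` and the argument order (a, b, c, d, matching the cells of our 2 x 2 schematic) are our own choices for illustration, not part of any package.

```python
def prevalence_ratio(a, b, c, d):
    """Prevalence ratio from a 2 x 2 table (hypothetical helper).

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    prev_exposed = a / (a + b)      # prevalence among the exposed
    prev_unexposed = c / (c + d)    # prevalence among the unexposed
    return prev_exposed / prev_unexposed


# Sample data from the slide: 14 and 17 in the exposed row, 388 and 248 in the unexposed row
pr = prevalence_ratio(14, 17, 388, 248)
print(round(14 / 31, 2), round(388 / 636, 2), round(pr, 2))
```

Run on the sample data, this should reproduce the two prevalences and the prevalence ratio of about 0.74 shown on the slide.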

9 Describing a PR < 1
Prevalence ratio = 0.45/0.61 = 0.74
In words: Those who are exposed are 0.74 times as likely to have the disease as those who are not exposed. OR The prevalence of disease among the exposed is 0.74 times the prevalence among the unexposed.

10 Describing a PR > 1 For example, a prevalence ratio = 1.5 In words: Those who are exposed are 1.5 times more likely to have the disease compared with those who are not exposed OR There is a 1.5 fold higher prevalence of disease among exposed compared to unexposed.

11 Example of 2 x 2 Table Layout in STATA
           | Exposed   Unexposed |   Total
  Cases    |      14         388 |     402
  Noncases |      17         248 |     265
  Total    |      31         636 |     667
STATA puts exposure across the top, disease on the side.

12 Prevalence ratio (STATA output)
[STATA output: 2 x 2 table of cases and noncases by exposure status, with the risk in each group, the risk ratio point estimate with its 95% confidence interval, and the chi2(1) test]
This illustrates output for a 2 x 2 table in STATA, which puts exposure across the top and disease on the side. This results in the cells with 17 and 388 being interchanged in the table, but you can verify for yourself that it doesn’t affect the calculation of the prevalence ratio. It does not matter which way you arrange the table. STATA calls this ratio a “risk ratio” (as the program has no way of knowing whether it is being given prevalence or incidence data), but we think it is important to be reminded that these are prevalent data from a cross-sectional sample. If these data were from a cohort with the same amount of follow-up on everyone (such as our example of the investigation of an outbreak of gastroenteritis following a party), risk ratio would be correct as a description of the ratio of the two proportions. STATA uses “risk” and “risk ratio” by default.

13 Prevalence ratio of disease in exposed and unexposed
PR = [a/(a+b)] / [c/(c+d)]
                    Disease
                 Yes      No       Total
Exposure  Yes    a        b        a + b
          No     c        d        c + d
a/(a+b) and c/(c+d) = probabilities of disease, and PR is the ratio of two probabilities
The proportion with prevalent disease among those exposed is the probability of prevalent disease among the exposed, and similarly for the unexposed. We are making this point to distinguish a ratio based on probabilities from a ratio based on odds, to be explained shortly.

14 Study Reporting Prevalence Ratios
Prevalence of hip osteoarthritis among Chinese elderly in Beijing, China, compared with whites in the United States Abstract: The crude prevalence of radiographic hip OA in Chinese ages 60–89 years was 0.9% in women and 1.1% in men; it did not increase with age. Chinese women had a lower age-standardized prevalence of radiographic hip OA compared with white women in the SOF (age-standardized prevalence ratio 0.07) and the NHANES-I (prevalence ratio 0.22). Chinese men had a lower prevalence of radiographic hip OA compared with white men of the same age in the NHANES-I (prevalence ratio 0.19). Nevitt et al, 2002 Arthritis & Rheumatism Just to prove that we are not alone in suggesting use of the term prevalence ratio, we cite a publication from a study comparing prevalence of OA in Beijing and the US. This study is reporting a ratio formed from the prevalences. However, you may also see “prevalence ratio” used to refer to the “prevalence odds ratio.” We’ll talk about odds ratios next.

15 Probability and Odds
Odds: another way to express the probability of an event
Odds = # events / # non-events
Probability = # events / (# events + # non-events) = # events / # subjects
Odds is an alternative way to express the probability of an event. Odds are most simply calculated as the number of events divided by the number of non-events. Probability is the number of events divided by the total of the number of events plus the number of non-events, which is the same as the number of events divided by the number of subjects. It may appear that odds are just an unnecessary complication of the familiar probability as a proportion of total subjects, but odds have some mathematical properties, which we will explain shortly, that make them very useful in clinical research.

16 Probability and Odds
Probability = # events / # subjects
Odds = (# events / # subjects) / (# non-events / # subjects) = probability / (1 - probability)
Odds = p / (1 - p) [ratio of two probabilities: unlike probability, can be greater than 1]
The formal way to describe the odds is as the probability of the event divided by the probability of the non-event. So odds are the ratio of two fractions: the number of events divided by the number of subjects (the probability of the event), divided by the number of non-events divided by the number of subjects (the probability of the non-event). So the formula for odds is p / (1 - p). Since both fractions have the number of subjects in the denominator, they reduce to our first presentation of odds as the number of events divided by the number of non-events.

17 Probability and Odds
If event occurs 1 of 5 times, probability = 1/5 = 0.2. Out of the 5 times, 1 time will be the event and 4 times will be the non-event, odds = 1/4 = 0.25
To calculate probability given the odds: probability = odds / (1 + odds)
To apply some numbers to the formulas, if the probability is 1 out of 5 = 0.2, then the odds are (1/5) / (4/5) = 1/4 = 0.25. Calculating the odds without the number of subjects, as the ratio of the number of events (1) to the number of non-events (4), odds = 1/4 = 0.25. To go in the other direction, from odds to probability, divide the odds by 1 + odds. In this example, (1/4) / (1 + 1/4) = (1/4) / (5/4) = 1/5, the probability. For those interested in why odds are used in gambling: odds give the ratio of the event to the non-event, and this ratio tells you what the payout has to be for a fair bet. For example, if you wager 1 dollar that the toss of one die will be the number 3, the probability is 1/6 since the die has 6 sides with 1 through 6 on the sides. Since the probability is 1/6, the odds are (1/6) / (5/6) = 1/5. The expectation is that 6 rolls of the die will result in a 3 one time and other numbers 5 times (on average!). So for the payout and the amount bet to be “fair”, i.e., balanced, you would need to receive 5 dollars when the number 3 comes up one time out of six to equal the 5 dollars you will lose one at a time on the other 5 tosses. So the odds are 1/5, written as 1:5, or, quoted as odds against the event, 5:1 and spoken as “five to one.” Of course, in gambling, odds calculations are usually used to ensure that the payout is not according to fair odds but according to odds that favor the casino.
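The two conversions on this slide can be sketched in a few lines of Python. The function names below are hypothetical, chosen only for this illustration.

```python
def odds_from_probability(p):
    """Odds = p / (1 - p); undefined when p = 1."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)


# Event occurs 1 of 5 times: probability 0.2, odds 1/4
print(odds_from_probability(0.2))

# Going back from the odds of 1/4 recovers the probability of 1/5
print(probability_from_odds(0.25))

# One face of a fair die: probability 1/6, odds 1/5
print(odds_from_probability(1 / 6))
```

Because of floating-point rounding the printed values may differ from 0.25, 0.2, and 0.2 in the last decimal places, which is why we do not quote exact output here.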

18 Understanding Odds To express odds in words, think of it as the frequency of the event compared to the frequency of the non-event “For every time the event occurs, there will be 3 times when the event does not occur” In words: “Odds are 1 to 3” Written as 1:3 or 1/3 or 0.33

19 Odds Less intuitive than probability (probably wouldn’t say “my odds of dying are 1 to 4”) No less legitimate mathematically, just not so easily understood

20 Odds (continued)
Used in epidemiology because the measure of association available in case-control design is the odds ratio (more on this next week). Also important because the coefficient of a predictor in a logistic regression is the log of the odds ratio for the outcome. Can use models to obtain multivariable adjustment in cross-sectional design. Odds ratios are very useful in epidemiology because, as we will see, they are the only measure of association available in a case-control study. A second reason they are common in epidemiology is that in multivariate analysis of a dichotomous outcome logistic regression is used, and the coefficients, the “betas”, from a logistic regression give the odds ratio by raising e to the beta (exponentiating it, since the coefficient is on the natural logarithm scale).

21 Odds ratio As odds are just an alternative way of expressing the occurrence of an outcome, odds ratio (OR) is an alternative to the ratio of two probabilities (prevalence ratio in cross-sectional studies or risk ratio in cohort study) Odds ratio = ratio of two odds To repeat, the odds ratio is less intuitive than a prevalence ratio or a risk ratio, but it is no less valid and expresses the occurrence of an outcome in a mathematically rigorous way. We discourage the use of OR for a cohort study since disease associations can be expressed with a more easily interpreted risk ratio or rate ratio. In the past, the use of the OR for a cohort study was attractive because the logistic regression model allows for multivariable adjustment. The Framingham cohort study was an impetus for development of the logistic regression models. However, with advances in biostatistics and in the computing power to run other models, there are now other good models (e.g. proportional hazards) available for analysis of cohort studies, making OR a poor choice.

22 Probability and odds in a 2 x 2 table
                    Disease
                 Yes      No       Total
Exposure  Yes    2        3        5
          No     1        4        5
          Total  3        7        10
What is p of disease in exposed? What are odds of disease in exposed? And the same for the unexposed?
Odds are already the ratio of two probabilities, so the odds ratio involves four probabilities: p(event) / (1 - p(event)) in the exposed, and p(event) / (1 - p(event)) in the unexposed, that yield a single ratio. So in this example the p of the event in the exposed is 2/5, and 1 - p of the event in the exposed is 3/5. Similarly, p of the event in the unexposed is 1/5 and 1 - p(event) in the unexposed is 4/5. So the odds of the event in the exposed is 2/3 and the odds of the event in the unexposed is 1/4. The ratio of those two odds is (2/3) / (1/4) = 2.67.

23 Probability and odds ratios in a 2 x 2 table
                    Disease
                 Yes      No       Total
Exposure  Yes    2        3        5
          No     1        4        5
          Total  3        7        10
PR = (2/5) / (1/5) = 2
OR = (2/3) / (1/4) = 2.67
Notice that the odds ratio is larger than the prevalence ratio. Unless both ratios = 1.0, the odds ratio will always be farther from 1.0 than the prevalence ratio (greater if > 1.0 and smaller if < 1.0). The prevalence ratio is the ratio of two probabilities, and the odds ratio is the ratio of the same two probabilities after each has been divided by (1 - probability), a quantity that is always < 1 unless the probability is 0. The larger probability is divided by the smaller quantity, so it is inflated more, and this pushes the ratio farther from 1.0.

24 Odds ratio of disease in exposed and unexposed
                    Disease
                 Yes      No       Total
Exposure  Yes    a        b        a + b
          No     c        d        c + d
OR = { [a/(a+b)] / [1 - a/(a+b)] } / { [c/(c+d)] / [1 - c/(c+d)] }
Formula of [p / (1 - p) in exposed] / [p / (1 - p) in unexposed]
We switch from numbers and return to our algebraic schematic of the 2x2 table to illustrate some important characteristics of the odds ratio. This just expresses probability and (1 - probability) for the exposed and for the unexposed groups using the a, b, c, and d of the schematic.

25 Odds ratio of disease in exposed and unexposed
OR = { [a/(a+b)] / [1 - a/(a+b)] } / { [c/(c+d)] / [1 - c/(c+d)] }
   = { [a/(a+b)] / [b/(a+b)] } / { [c/(c+d)] / [d/(c+d)] }
   = (a/b) / (c/d)
   = ad / bc
OR is the cross-product. However, calculating as odds of disease in exposed / odds of disease in unexposed helps to keep track of what you are comparing.
Doing a little algebra shows how these four fractions reduce down to what is called the cross-product of the 2x2 table, the ratio of the products of the two diagonals. Notice that this result would not be affected by whether we arrange disease across the top of the table and exposure down the side or the other way around (as in STATA). Nor is it affected by switching the order of yes and no in the columns and rows (unless they were switched only for disease or only for exposure).
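The algebra on this slide can be checked numerically. The sketch below, with function names that are ours and purely illustrative, computes the odds ratio both ways (as a ratio of two odds and as the cross-product) using the 2, 3, 1, 4 table from the earlier slide, and also confirms that transposing the table as STATA does leaves the cross-product unchanged.

```python
def odds_ratio_from_odds(a, b, c, d):
    """OR as (odds of disease in exposed) / (odds of disease in unexposed)."""
    p_exposed = a / (a + b)
    p_unexposed = c / (c + d)
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


def odds_ratio_cross_product(a, b, c, d):
    """OR as the cross-product ad / bc."""
    return (a * d) / (b * c)


# Table from the numeric example: exposed 2 and 3, unexposed 1 and 4
print(round(odds_ratio_from_odds(2, 3, 1, 4), 2))
print(round(odds_ratio_cross_product(2, 3, 1, 4), 2))

# Transposed layout (exposure across the top): cells b and c trade places
print(round(odds_ratio_cross_product(2, 1, 3, 4), 2))
```

All three lines should agree at about 2.67, the value obtained by hand for this table.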

26 Odds Ratio in Cross-Sectional Study
The study design affects not just the measure of disease occurrence but also the measure of disease association. Cross-sectional design uses prevalent cases of disease, so the Odds Ratio in a cross-sectional study is a Prevalence Odds Ratio. Many authors do not use this term, but we encourage it: being as accurate as possible about measures promotes clarity of thought and presentation. It is useful to keep in mind always how the study design determines the interpretation of the measures of the association, not just the measure of disease occurrence. Prevalence is the measure of disease occurrence in cross-sectional study designs. The measure of association, therefore, should reflect this, and labeling the odds ratio a prevalence odds ratio accomplishes that goal. This is not a convention that every author follows, as many will just call it an odds ratio without acknowledging the role measuring prevalent disease plays, but we think it aids clarity of thought and presentation to label it for what it is, a prevalence odds ratio. Later on, we will show how the odds ratios from different case-control sampling designs have different interpretations, so it is good to begin recognizing that not all odds ratios are created equal.

27 OR compared to Prevalence Ratio
If Prevalence Ratio = 1.0, OR = 1.0; otherwise OR farther from 1.0
[Figure: number line centered on 1. On each side, in the direction of stronger effect, the OR lies farther from 1 than the prevalence ratio.]
This graphic illustrates the point that, unless both the prevalence ratio and the odds ratio are 1.0 (no difference), the odds ratio is always farther from 1.0 than the prevalence ratio, larger if the prevalence ratio is greater than 1.0 and smaller if the prevalence ratio is less than 1.0.

28 Prevalence ratio and Odds ratio
If Prevalence Ratio > 1, then OR farther from 1 than Prevalence Ratio:
PR = 0.4/0.2 = 2
OR = (0.4/0.6) / (0.2/0.8) = 2.7
When the ratio of two probabilities, a prevalence ratio, is > 1.0, the OR will be larger than the PR. The OR divides each probability by (1 - probability), a quantity that is < 1.0 (unless the probability = 0), so each probability increases. The larger probability is divided by the smaller quantity, so it increases proportionally more and the ratio between them also increases. The only exception occurs when the prevalence ratio is exactly 1.0. In that case the OR will also be 1.0. This can easily be seen by modifying the example above to PR = 0.4 / 0.4 = 1.0. The OR would then be (0.4/0.6) / (0.4/0.6) = 1.0.

29 Prevalence ratio and Odds ratio
If Prevalence Ratio < 1, then OR farther from 1 than PR:
PR = 0.2/0.3 = 0.67
OR = (0.2/0.8) / (0.3/0.7) = 0.58
The same phenomenon occurs when the ratio of two probabilities, a prevalence ratio, is less than 1.0. The OR again divides each probability by (1 - probability), a quantity that is < 1.0 (unless the probability = 0), so each probability increases. Here the larger probability is in the denominator of the ratio and increases proportionally more, so the ratio decreases, which in this case moves the value farther away from 1.0. Values for both prevalence ratios and odds ratios less than 1 are bounded by 0. In that respect they differ from ratios with values greater than 1, which can be infinitely large.

30 Odds ratio (STATA output)
[STATA output: 2 x 2 table of cases and noncases by exposure status, with the risk in each group, the risk ratio and odds ratio point estimates with their 95% confidence intervals, and the chi2(1) test]
The same STATA output as before but with the addition of the odds ratio (OR) to the “risk ratio”, really the prevalence ratio (PR) in our cross-sectional data. Notice that the OR is smaller than the PR. If the PR is less than 1.0, the OR will always be farther from 1.0 (= closer to 0). Similarly, if the PR is greater than 1.0, the OR will be farther from 1.0, i.e., larger. If the PR = 1.0, the OR also = 1.0.

31 Important property of odds ratio #1
OR approximates Prevalence Ratio only if disease prevalence is low in both the exposed and the unexposed group OR is different from PR (except if they are equal to 1) but this difference is small if the prevalence of disease is low in the exposed and unexposed groups. Often described as “OR is a close approximation to the PR” if prevalence of disease is low.

32 Prevalence ratio and Odds ratio
If risk of disease is low in both exposed and unexposed, PR and OR approximately equal. Text example: prevalence of MI in high bp group is 0.018 and in low bp group is 0.003:
Prev Ratio = 0.018/0.003 = 6.0
OR = (0.018/0.982) / (0.003/0.997) = 6.09
When incidence is very low in both groups, the risk ratio and the odds ratio are very close. This is due to the fact that the odds, the probability of the event divided by the probability of the non-event, is dividing the probability of the event by a value close to 1.0. In this example the probability of the non-event in the group with the highest incidence (the exposed group) is (1 - 0.018). The probability of the non-event in the unexposed is even closer to 1.0: (1 - 0.003). So dividing the two probabilities by values close to 1.0 is going to have very little effect on their ratio. If either probability of the non-event had been far from 1.0, then there would have been a substantial effect on the ratio.

33 Prevalence ratio and Odds ratio
If prevalence of disease is high in either or both exposed and unexposed, Prevalence Ratio and OR differ. Example, if prevalence in exposed is 0.6 and 0.1 in unexposed: PR = 0.6/0.1 = 6.0 OR = 0.6/0.4 / 0.1/0.9 = 13.5 OR approximates Prevalence Ratio only if prevalence is low in both exposed and unexposed group It is important to understand this feature of the odds ratio because you will see it referred to as an approximation of the prevalence ratio or, in a case-control study, the risk ratio. As we have been stressing, it is no less valid as a mathematical expression than the prevalence ratio, but its less intuitive nature leads many investigators to prefer to talk about the prevalence ratio, a more intuitive concept. This results in measures which are in fact OR’s being presented in some papers as prevalence ratios. In many instances this is not appropriate even as an approximation because the prevalence is not low in both the exposed and unexposed group and the OR and the prevalence ratio are quite different numerically.
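A short Python sketch makes the contrast between the low-prevalence and high-prevalence examples concrete. The helper name `pr_and_or` is hypothetical; it simply takes the two prevalences and returns both ratio measures.

```python
def pr_and_or(p_exposed, p_unexposed):
    """Return (prevalence ratio, odds ratio) for two prevalences."""
    pr = p_exposed / p_unexposed
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return pr, odds_exposed / odds_unexposed


# Low prevalence in both groups (0.018 and 0.003): OR stays close to PR
pr_low, or_low = pr_and_or(0.018, 0.003)
print(round(pr_low, 2), round(or_low, 2))

# High prevalence in the exposed (0.6 versus 0.1): OR is far from PR
pr_high, or_high = pr_and_or(0.6, 0.1)
print(round(pr_high, 2), round(or_high, 2))
```

Both pairs have a prevalence ratio of 6.0, but the odds ratio is about 6.09 in the first case and 13.5 in the second, which is the point of the two examples.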

34 “Bias” in OR as estimate of PR
Text refers to “bias” in OR as estimate of Prevalence Ratio (or Risk Ratio in a cohort study) Not “bias” in usual sense because both OR and PR are mathematically valid and use the same numbers Simply that OR cannot be thought of as a surrogate (“close approximation”) for the PR unless incidence is low It is preferable to reserve the word “bias” to situations where invalid or inaccurate results are being obtained because of some factor or factors that are affecting the study, the sources of bias. Since OR and PR are mathematically valid measures using the same numbers, it is a bit misleading to talk about the bias of the OR. But the point that the OR and the prevalence ratio are not the same thing is important to note.

35 Table 2—Prevalence and odds of disability according to diabetes status (NHANES) – 60+ years old
                            Diabetes     No Diabetes   Adjusted OR
Fell in previous year (%)   36.3         24.9          1.58
Prevalence                  36.3/100     24.9/100      PR = 36.3/24.9 = 1.46
Odds                        36.3/63.7    24.9/75.1     OR = (36.3/63.7) / (24.9/75.1) = 1.72
Gregg et al. Diabetes Care (2000) 23: 1272
Here’s an example of a report on the prevalence of falls comparing women with and without diabetes. The data are from NHANES III, the national survey that we mentioned in our first lecture on study design. The outcome is falls, although we put it on the side here. This is an example of period prevalence. The question asked at the interview was “have you fallen in the past year?” The outcome (falls) is relatively common: prevalence of 25% in one group and 36% in the other. When we calculate the OR and prevalence ratio, the PR is closer to 1.0. This would be reported for the prevalence ratio as “Among older women, those with diabetes were 1.46 times more likely to have fallen in the past year compared with those without diabetes.” For the odds ratio: “older women with diabetes had 1.72 times the odds of having fallen in the past year…” Why did the authors use the OR as their measure of association? Probably because they wanted to use a logistic regression model to adjust for potential confounding factors.

36 Important property of odds ratio #2
Unlike Prevalence Ratio, OR is symmetrical: OR of event = 1 / OR of non-event

37 Symmetry of odds ratio versus non-symmetry of prevalence ratio
OR of non-event = 1/OR of event
PR of non-event ≠ 1/PR of event

38 Example: Prevalence ratio not symmetrical
            Cases   Noncases   Total   Prevalence
Exposed       14       17        31      0.45
Unexposed    388      248       636      0.61
Total                           667
Prevalence ratio (of event) = 0.45/0.61 = 0.74
PR of non-event = (17/31)/(248/636) = 1.41
1/PR = 1/0.74 = 1.35, NOT EQUAL to PR of non-event
Here is the data shown on the previous slide re-arranged with outcome on top and exposure on the side. We prefer this schematic because it emphasizes the cohort design: starting with exposure status and experiencing the outcome. However, the key point is to be aware of how a 2x2 table is set up.

39 Example: OR is symmetrical
            Cases   Noncases   Total   Prevalence
Exposed       14       17        31      0.45
Unexposed    388      248       636      0.61
Total                           667
Odds ratio (of event) = (14/17)/(388/248) = 0.53
OR of non-event = (17/14)/(248/388) = 1.9
1/OR = 1/0.53 = 1.9, EQUAL to OR of non-event
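The symmetry of the OR and the non-symmetry of the PR can be verified with the same sample data. In this sketch (function name ours, for illustration only) the "non-event" versions are obtained simply by swapping the cases and noncases columns.

```python
def ratios(a, b, c, d):
    """Return (PR, OR) where a, c count the outcome of interest
    and b, d count its complement, in exposed and unexposed."""
    pr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a / b) / (c / d)
    return pr, odds_ratio


# Outcome of interest = event (being a case)
pr_event, or_event = ratios(14, 17, 388, 248)

# Outcome of interest = non-event (being a noncase): swap the columns
pr_nonevent, or_nonevent = ratios(17, 14, 248, 388)

print(round(or_nonevent, 2), round(1 / or_event, 2))   # these agree
print(round(pr_nonevent, 2), round(1 / pr_event, 2))   # these do not
```

The reciprocal of the OR of the event matches the OR of the non-event (about 1.9), whereas the reciprocal of the PR of the event (about 1.35) does not match the PR of the non-event (about 1.41).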

40 Important property of odds ratio #3
Coefficient of a predictor variable in logistic regression is the log of the odds ratio for the outcome: e^coefficient = OR. Logistic regression is the method of multivariable analysis used most often in cross-sectional and case-control studies. In logistic regression the log odds of the outcome is modeled with the linear equation y = a + b1x1 + b2x2, etc. Log odds (called the logit, hence logistic regression) are used because they extend from minus to plus infinity and therefore do not constrain the prediction of the linear model, unlike directly modeling the probability, which has a minimum of 0 and a maximum of 1, or the odds, which has a minimum of 0. Prediction in a linear model isn’t constrained to a 0 minimum or a 1 maximum, so using log odds of the outcome (y) solves this problem. Each coefficient b is the change in the log odds for one unit of change in the variable x, so exponentiating b returns an odds ratio for one unit of change in x. In the case of a dichotomous variable x, the b is comparing the odds for the variable = 1 versus the variable = 0. In other words, exponentiating b gives the odds ratio for the two values of x. In the case of a continuous variable x, exponentiating b gives the odds ratio for a one unit difference in the value of x. While logistic regression is not a topic for this course, it will be covered in the winter Biostatistics II course. We point it out here to emphasize that the odds ratio has an important role in clinical research.
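For the simplest case, one dichotomous exposure and no other predictors, the link between the logistic coefficient and the odds ratio can be illustrated without fitting software. The sketch below rests on a stated assumption: with a single dichotomous predictor the model logit(p) = a + bx has one parameter per exposure group, so its fitted odds equal the observed odds in each group. The function name is hypothetical, and this is an illustration of the relationship, not a general logistic regression routine.

```python
import math


def logistic_coefficients_2x2(a, b, c, d):
    """Intercept and slope of logit(p) = intercept + slope * x
    for one dichotomous exposure x (1 = exposed, 0 = unexposed).

    Assumption: the model is saturated, so fitted odds equal observed odds.
    a, b = exposed with / without disease; c, d = unexposed with / without.
    """
    intercept = math.log(c / d)            # log odds of disease in the unexposed
    slope = math.log(a / b) - intercept    # difference in log odds = log(OR)
    return intercept, slope


intercept, slope = logistic_coefficients_2x2(14, 17, 388, 248)

# Exponentiating the coefficient recovers the cross-product odds ratio
print(round(math.exp(slope), 2), round((14 * 248) / (17 * 388), 2))
```

With the sample data used earlier, exponentiating the slope gives an odds ratio of about 0.53, the same value obtained from the 2 x 2 table directly.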

41 3 Useful Properties of Odds Ratios
Odds ratio of non-event is the reciprocal of the odds ratio of the event (symmetrical)
Regression coefficient in logistic regression equals the log of the odds ratio
Odds ratio of disease equals odds ratio of exposure: important in case-control studies (discussed next week)

42 Measures of Association in a Cohort Study
With cross-sectional data we can calculate a ratio of the probability or of the odds of prevalent disease in two groups, but we cannot measure incidence A cohort study allows us to calculate the incidence of disease in two or more groups

43 Measuring Association in a Cohort
Following two groups by exposure status within a cohort: Equivalent to following two cohorts defined by exposure One way to conceptualize measuring incidence in a cohort in two groups defined by a dichotomous exposure variable is to think of it as following two different cohorts which had an enrollment criterion of exposure, for one cohort, and no exposure, for the other cohort. Thinking of it this way requires that exposure is defined at baseline and that it is not changing over time.

44 Analysis of Disease Incidence in a Cohort
Measure occurrence of new disease separately in a sub-cohort of exposed and a sub-cohort of unexposed individuals Compare incidence in each sub-cohort How? We have already seen that there are two ways to measure incidence in a cohort: cumulative incidence and incidence rate. So it follows that a comparison of incidence in two sub-cohorts defined by their exposure to a risk factor could be compared by comparing either cumulative incidence or incidence rates.

45 Two Measures Recall from previous lectures the 2 measures of incidence: cumulative incidence and incidence rate Corresponding measures of disease association are risk ratio for comparing cumulative incidences and rate ratio for comparing incidence rates

46 Risk Ratio and Rate Ratio
Risk is based on proportion of persons with disease = cumulative incidence Risk ratio = ratio of 2 cumulative incidence estimates = also called relative risk Rate is based on events per person-time = incidence rate Rate ratio = ratio of 2 incidence rates = also called relative rate We prefer risk ratio, rate ratio in cohort studies (just as we prefer prevalence ratio and odds ratio in cross-sectional study) Since all of the measures are ratios, either of probabilities or of odds, it is clearer and simpler to use the word ratio in describing each type. We have already discussed the concept of “risk” as reflecting the proportion of persons experiencing the event, so it follows that comparing two cumulative incidences is called a risk ratio. We have limited the use of the word “rate” to measures of incidence based on person-time rates, so a ratio of two such measures is called a rate ratio. And finally we have just seen how the odds ratio is the ratio of two odds.

47 A Note on RR or “Relative Risk”
Relative risk or RR is very common in the literature, but may represent a risk ratio, a rate ratio, a prevalence ratio, or even an odds ratio We will try to be explicit about the measure and distinguish the different types of ratios There can be substantial difference in the association of a risk factor with prevalent versus incident disease As you have just seen, even our textbook, which is generally very good on these matters, uses the word “rate” for a prevalence measure. In the general medical literature, the inconsistency is much greater and is compounded by the fact that an abbreviation for both a risk ratio and a rate ratio is RR. Often RR is used to mean relative risk, which is taken loosely to include several different ratio measures. We will define each of these measures and be explicit about their differences.

48 What is that “Relative Risk”?
Determine if prevalent or incident disease was measured If incident disease, determine if cumulative incidence (at what time?) or a person-time incidence rate was used to calculate ratio Hazard is an instantaneous person-time rate Any measure of association labeled “relative” should be a ratio, not a difference In a paper reporting a “relative risk” you need to look at the statistical methods to determine what measure of disease association is being reported.

49 Risk Ratio (No Censoring)
Diarrheal Disease (w/in 3 days)
                           Yes   No   Total
Ate potato salad            54   16      70
Did not eat potato salad     2   26      28
Total                       56   42      98
Probability of disease, ate salad = 54/70 = 0.77
Probability of disease, no salad = 2/28 = 0.07
Risk ratio = 0.77/0.07 = 11
Illustrates risk ratio in cohort with complete follow-up
Here is an example of a risk ratio calculated from a cohort with equal follow-up on everyone. It is the example of the outbreak of gastrointestinal illness we looked at earlier. Because the follow-up is short (3 days) and identical for everyone, the risk ratio is just the ratio of the proportion with disease in the exposed group (those who ate the potato salad) to the proportion in the unexposed group (those who didn’t eat the salad). Eleven is a large value for a risk ratio, but that might be expected in a study such as this looking for a single likely food source for the outbreak.
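As a minimal sketch of the arithmetic on this slide, the risk ratio can be computed directly from the four cell counts. The function name `risk_ratio` is our own choice for illustration and does not come from the lecture.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of two cumulative incidences (proportion with disease in each group)."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Counts from the potato salad table: 54 of 70 exposed and 2 of 28 unexposed became ill.
rr = risk_ratio(54, 70, 2, 28)
print(round(rr, 1))  # 10.8, which the slide rounds to 11
```

Working from the unrounded proportions gives 10.8; the slide's value of 11 comes from dividing the rounded proportions 0.77 and 0.07.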

50 Risk Ratio in a Cohort with Censoring
As we have pointed out frequently, however, most cohorts that are not very short-term outbreak investigations have differing amounts of follow-up time on the subjects, and the risk of the event has to be estimated in the exposed and unexposed groups using a method like the Kaplan-Meier or the life table. So in forming a risk ratio from a Kaplan-Meier analysis of the survival in two groups, we have to choose a point in time. As you can see from inspecting the curve, the risk ratio will be different for different points in time. If one point in time is selected, then the risk ratio becomes the ratio of the two proportions failing at that point in time. But, once again, in reporting Kaplan-Meier results you must always specify the amount of follow-up time. This applies to the risk ratio as well. Choose a time point for comparing two cumulative incidences: At 6 years, proportion dead in low CD4 group = 0.70 and in high CD4 group = 0.26. Risk ratio at 6 years = 0.70/0.26 = 2.69
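To make the "choose a time point" step concrete, here is a small sketch of a Kaplan-Meier estimate followed by a risk ratio at a chosen time. The follow-up data are hypothetical values invented for illustration (they are not the CD4 data from the slide), and the function name `km_survival` is our own.

```python
def km_survival(times, events, t):
    """Kaplan-Meier survival probability at time t.

    times  -- follow-up time for each subject
    events -- 1 if the subject had the event at that time, 0 if censored
    """
    surv = 1.0
    event_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for et in event_times:
        at_risk = sum(1 for ti in times if ti >= et)  # still under follow-up at et
        deaths = sum(1 for ti, ei in zip(times, events) if ei == 1 and ti == et)
        surv *= 1 - deaths / at_risk
    return surv


# Hypothetical follow-up (years) and event indicators for two exposure groups.
low_times, low_events = [1, 2, 2, 3, 4, 5, 6, 6], [1, 1, 0, 1, 0, 1, 1, 0]
high_times, high_events = [2, 3, 4, 5, 6, 6, 6, 6], [0, 1, 0, 0, 1, 0, 0, 0]

t = 6  # the risk ratio must always be reported with its time point
risk_low = 1 - km_survival(low_times, low_events, t)
risk_high = 1 - km_survival(high_times, high_events, t)
print(f"Risk ratio at {t} years = {risk_low / risk_high:.2f}")
```

Changing `t` changes the two cumulative incidences and therefore the risk ratio, which is why the time point has to be stated.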

51 Risk Ratio: 1yr= 0.95 2yr=0.86 3yr=0.80 5yr=0.78
Clinical trial originally designed for 3 years, extended by DSMB to 5 years To illustrate how risk ratios can be calculated at different points in time in a cohort study, we have taken the example of this clinical trial of treatment with an ACE inhibitor to prevent cardiovascular events and death. The trial was originally planned for 3.5 years, but the DSMB recommended before that time point was reached (without knowing how the drug was performing) that the trial be extended to 5 years because it was clear that the event rate for the composite endpoint of MI, stroke, or death from CVD causes was lower than the estimate used for planning the trial. They report here only the risk ratio at about 5 years (RR=0.78), but they could have reported risk ratios for other time points. We have estimated them for 1, 2, and 3 years of follow-up. The risk ratio does not change greatly over time but, as is clear from the figure, the gap between the treatment and placebo groups steadily widens with increasing follow-up time. The difference reaches statistical significance at 5 yrs but would not have been significant at the earlier time points.

52 NB: If displayed as survival curves, take 1-survival probability to get risk: RR = 0.3/0.5=0.6

53 Comparison of two person-time rates with a ratio
Ratio of two person-time rates NB: denominators of two person-time rates must be in the same units Rate NSAID use = 12.02 per 1000 person-years Rate for non use = 11.86 per 1000 person-years Rate ratio = 12.02/11.86 = 1.01 Described as “rate ratio” in the article “Incidence rate ratio” (IRR) also acceptable This is the example we looked at when we were discussing forming incidence rates in a cohort with a time-varying exposure, in this case use of non-steroidal anti-inflammatory medications (NSAIDs). The basic measure of association reported was the rate ratio, the event rate during the person-time of NSAID use divided by the rate during the person-time of no use. A rate ratio is simply the ratio of two rates and thus is the analogue of the risk ratio for person-time measures of incidence. It should be clear that an incidence rate should not be compared to cumulative incidence: a ratio using one of each type of incidence measure would not be interpretable. In comparing two incidence rates, the important thing to remember is that the time units of the two measures forming the ratio must be the same. In the example above, if the first rate were per 100 person-years and the second remained per 1,000 person-years, the rate ratio would jump up to 10.1, and it would appear there was a strong association when there was no association. The authors correctly reported their measure of association as a rate ratio. They could have also reported a rate difference if they had chosen. It was very small, 0.16 per 1,000 person-years or 1.6 per 10,000 person-years. Ray, Lancet, 2002
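The point about matching units can be sketched in a few lines. Each rate is first converted to events per single person-year so the two denominators are guaranteed to agree; the helper name `per_person_year` is hypothetical and used only for this illustration.

```python
def per_person_year(events, person_years):
    """Convert 'events per N person-years' to events per one person-year."""
    return events / person_years


# Rates from the slide: 12.02 and 11.86, both per 1,000 person-years.
rate_nsaid = per_person_year(12.02, 1000)
rate_non_use = per_person_year(11.86, 1000)

rate_ratio = rate_nsaid / rate_non_use
rate_difference_per_1000 = (rate_nsaid - rate_non_use) * 1000
print(round(rate_ratio, 2))                # 1.01
print(round(rate_difference_per_1000, 2))  # 0.16 per 1,000 person-years

# The mistake described in the notes: reading the first rate as per 100
# person-years while the second stays per 1,000 person-years.
mismatched_ratio = per_person_year(12.02, 100) / rate_non_use
print(round(mismatched_ratio, 1))          # 10.1
```

Converting both rates to a common unit before dividing is a design choice that makes the unit error impossible to commit silently.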

54 Proportional hazards model
Proportional hazards model compares hazards in the exposed and unexposed Result is a type of rate ratio and is often reported as a “hazard ratio”

55 Example: Mortality after pediatric kidney transplant, stratified by donor type
Survival curves Hazard functions Here is an example from a study of mortality after pediatric kidney transplant, comparing survival in children who received a kidney from a living versus a recently deceased donor. As we saw in the lecture on disease occurrence, we can generate a smoothed hazard function from a dataset with survival data (or “time to event” data). This shows the survival curves and smoothed hazard functions for the two exposure levels in this study (living versus cadaveric donor). From the graphs we can see that the survival is better in the group that received a kidney from a living donor. We have 2 broad ways to measure disease occurrence in this cohort study: cumulative incidence and incidence rate. With cumulative incidence we could calculate a risk ratio for living vs cadaveric donor. But, how would we calculate a rate ratio? We could use the average incidence rate for the time period of the study, but we can see from the smoothed hazard function that the rate changes over time. So, better to compare the hazard (an instantaneous incidence rate). This can be done with the proportional hazards model. This model is currently the “workhorse” for analysis of longitudinal studies that have individual “time to event” data. We do not go into any detail about this (or other) statistical models in this course. You will learn more about proportional hazards in your Biostatistics courses. Our goal here is for you to be aware that this model provides a type of rate ratio for analysis of disease associations in cohort studies. Vittinghoff et al. Regression Methods in Biostatistics 2005

56 From proportional hazards model: Hazard ratio = 2.06
This table gives the values of the mortality rates from the smoothed hazard functions in the previous slide. These are estimates of the underlying hazard and their ratio is an estimate of the hazard ratio. In this example, the hazard ratio does not vary over time, i.e. the hazards are proportional over time. Thus, the hazard ratio (or rate ratio) in this study can be adequately reflected by one number. When calculated using the proportional hazards model, the hazard ratio (or rate ratio) is 2.06. “Proportional” hazards is an assumption of the basic proportional hazards model but, if it is violated, all is not lost. The violations can be handled using an extension of the basic model with time-dependent covariates.

57 Rate Ratio vs. Relative Risk
Example: What was reported comparing death in two BMI groups: “the relative risk of death was 1.52” (Calle, NEJM, April 2003) What was calculated (from Methods): “Relative risks (the age-adjusted death rates in specific body mass index category divided by the corresponding rate in the reference category) were calculated.” The ratio of two person-time rates was calculated but reported as a relative risk. Although we think everyone should use the term “rate ratio” (“relative rate” is a defensible alternative since the term “relative” implies a ratio), rate ratios are probably reported as “relative risks” more often than they are called “rate ratios” or “relative rates” in the literature. In the recent NEJM article cited above, the abstract uses relative risk to describe rate ratios even though the Methods section is very clear that they calculated and used person-time rates. They also adjusted the rates in a proportional hazards regression (remember that a hazard is an instantaneous rate) and report the ratio of two hazards as a relative risk.

58 Rate ratio vs. Risk ratio
Risk must be between 0 and 1. Thus, in comparing 2 groups, a high risk in the unexposed group limits how large the ratio can be. E.g., risk in unexposed group = 0.7 means maximum risk ratio = 1.0/0.7 = 1.43. Rates are not restricted between 0 and 1. If exposed rate = 10/100 person-years and unexposed rate = 5/100 person-years, risk (cumulative incidence) in 2 groups after 20 years = 0.88 and 0.64. Risk ratio would be 0.88/0.64 = 1.38 but rate ratio = 10/5 = 2.0. An important difference between cumulative incidence (=risk) and incidence rates is that the constraint placed on risk of being a proportion, and therefore bounded by 0 and 1, does not apply to rates, which can be greater than 1. When the ratio of two proportions is formed (risk ratio), the constraint on proportions also constrains the possible values of the ratio. As the proportion rises in the group with the smaller proportion (say, the unexposed group), a maximum risk ratio value is imposed. For example, if the cumulative incidence is 0.33 in the unexposed group, the maximum risk ratio is 3 (1.0/0.33); if it is 0.50, the maximum risk ratio is 2 (1.0/0.5); and if it is greater than 0.50, as in the example above, the risk ratio must be < 2. The rate ratio is not limited in the same way.
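A short sketch of both calculations on this slide follows. The slide does not say how the 20-year risks were derived; the values 0.88 and 0.64 are reproduced if each rate is applied as a yearly probability compounded over 20 years, so that is the assumption made here. The function names are ours.

```python
def max_risk_ratio(risk_unexposed):
    """Largest possible risk ratio, reached when risk in the exposed group is 1.0."""
    return 1.0 / risk_unexposed


def cumulative_incidence(annual_risk, years):
    """Risk after `years`, ASSUMING the rate acts as a yearly probability (our assumption)."""
    return 1 - (1 - annual_risk) ** years


print(round(max_risk_ratio(0.7), 2))  # 1.43

risk_exposed = cumulative_incidence(0.10, 20)    # about 0.88
risk_unexposed = cumulative_incidence(0.05, 20)  # about 0.64
print(round(risk_exposed, 2), round(risk_unexposed, 2))

# Unrounded, the risk ratio is about 1.37; the slide's 1.38 comes from
# dividing the rounded risks 0.88 and 0.64. The rate ratio stays at 2.0.
print(round(risk_exposed / risk_unexposed, 2))
print(0.10 / 0.05)
```

Either way, the risk ratio sits well below the rate ratio of 2.0 because both risks are approaching the ceiling of 1.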

59 Risk Ratio and Rate Ratio with constant incidence rate
Exp = 50 per 100 pers-yr; Unexp = 25 per 100 pers-yr To illustrate the difference between risk ratio and rate ratio, this is an example with constant incidence rates in the exposed and unexposed groups, shown in the upper left graph. The upper right-hand graph shows the cumulative survival curves in the two exposure groups for these constant incidence rates. These are like the graphs that we showed you in our previous lecture on disease occurrence, showing that cumulative incidence has an exponential curve when a constant incidence rate is “applied” to a closed cohort. We can also show the cumulative failure curve (lower left-hand graph) with the proportion who have experienced the event. The last graph (lower right-hand) shows the rate ratio and risk ratio. The rate ratio, calculated from the two constant incidence rates, is a constant 2.0 (0.5/0.25) throughout the 10-year study. The risk ratio is calculated from the two cumulative incidences of the event, shown in the failure curve. The risk ratio decreases throughout the study and is less than the rate ratio.

60 Risk Ratio and Rate Ratio with lower incidence rate
Exp = 5.0 per 100 pers-yr; Unexp = 2.5 per 100 pers-yr This illustrates that the difference between the rate ratio and risk ratio is smaller for an outcome with a lower incidence. These are the same plots as the previous slide, also with constant incidence rates, but now the rates are much lower (5 per 100 person-years in the exposed compared with 50 per 100 person years in the previous slide). The rate ratio is the same: 2.0 (0.05/0.025). And the risk ratio is still less than the rate ratio throughout the follow-up, but the difference is much less.
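The pattern in the last two slides can be reproduced numerically. This sketch assumes the exponential relationship between a constant incidence rate and cumulative incidence in a closed cohort, risk = 1 - exp(-rate x time); the function names are our own.

```python
import math


def cumulative_incidence(rate, years):
    """Cumulative incidence in a closed cohort under a constant incidence rate."""
    return 1 - math.exp(-rate * years)


def risk_ratio_at(rate_exposed, rate_unexposed, years):
    """Risk ratio at a given follow-up time for two constant incidence rates."""
    return cumulative_incidence(rate_exposed, years) / cumulative_incidence(rate_unexposed, years)


scenarios = {
    "higher incidence (50 vs 25 per 100 pers-yr)": (0.50, 0.25),
    "lower incidence (5.0 vs 2.5 per 100 pers-yr)": (0.05, 0.025),
}
for label, (exposed, unexposed) in scenarios.items():
    print(label, "rate ratio =", exposed / unexposed)
    for year in (1, 5, 10):
        print(f"  year {year}: risk ratio = {risk_ratio_at(exposed, unexposed, year):.2f}")
```

In both scenarios the rate ratio is 2.0 at every time point, while the risk ratio starts below 2.0 and falls with follow-up; the fall is much smaller when the rates are low.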

61 Risk Ratio vs. Rate Ratio
In preceding example of risk ratio = 1.38 and rate ratio = 2.0, which would you report? Are the two ratios telling you something different? It would be perfectly legitimate to report either ratio, or even both, but your choice would depend on your research question and what you were trying to emphasize. So they are telling you something different even though they use the same data.

62 Risk Ratio vs. Rate Ratio
Use depends on data available and desired emphasis Risk ratio How long-term probability of disease differs by exposure Rate ratio Exposure as a risk factor for the disease. Preserves the relative “force” of exposure on disease outcome. More fundamental measure of disease occurrence. You might prefer the risk ratio if your emphasis is on how different the long-term probability of the disease is for an exposed and an unexposed population. If your emphasis is on whether the exposure is causal for the disease, the rate ratio would be preferable because it preserves the larger relative force on disease occurrence of being exposed compared to being unexposed. So this comparison illustrates why we refer to the rate as the more fundamental measure of disease occurrence. When we form ratio (relative) measures with it, it is not subject to the artifact of the ceiling effect that we see here with ratio measures of two proportions.

63 Preferred ratio measures of association by study design
Cross-sectional study Prevalence ratio (Prevalence) odds ratio Cohort Study Risk ratio Rate ratio Case-control study (next week) Odds ratio (only possible) The type of study limits the kind of ratio measure that can be calculated. The cohort study allows calculation of all three measures, but as we have discussed, the risk and rate ratio measures are more easily understood than the odds ratio and are therefore the measures preferred in cohort studies rather than the odds ratio. The prevalence ratio looks like a risk ratio, but one should always keep in mind that it is risk based on existing – or prevalent – disease rather than incident disease. The odds ratio from a cross-sectional study is not usually described as a prevalence odds ratio, hence the parentheses above, but of course it is also based on a prevalent sample. Prevalence odds ratio is a more precise way to characterize what is being reported. Finally, the case-control study only allows calculation of the odds ratio, a major reason for the importance of the odds ratio in clinical research.

64 Difference vs. Ratio Measures
Two basic ways to compare measures: difference: subtract one from the other ratio: form a ratio of one over the other Can take the difference of either an incidence or a prevalence measure (but rarely done with prevalence) Example using incidence: cumulative incidence 26% in exposed and 15% in unexposed, risk difference = 26% - 15% = 11% risk ratio = 0.26 / 0.15 = 1.7 Although comparing two measures of disease occurrence by a ratio or by a difference uses the same numbers, they convey different information and are used for different purposes. Risk difference is often called an “absolute measure,” as it conveys how much the one group differs from the other in the scale of 0 to 100%. A ratio measure is known as a “relative measure” since it is telling you how large the incidence is in one group relative to the other and tells you nothing about the absolute difference between them. For example, a risk ratio of 2.0 could be obtained by a cumulative incidence of 4% versus 2%, 20% versus 10%, 80% versus 40% and many other combinations.
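A brief sketch of the two comparisons, using the numbers on this slide. It also shows the point made in the notes: the same risk ratio of 2.0 can sit alongside very different risk differences. The function names are ours.

```python
def risk_difference(risk_exposed, risk_unexposed):
    """Absolute measure: how far apart the two risks are on the 0 to 1 scale."""
    return risk_exposed - risk_unexposed


def risk_ratio(risk_exposed, risk_unexposed):
    """Relative measure: how large one risk is relative to the other."""
    return risk_exposed / risk_unexposed


# Slide example: cumulative incidence 26% in exposed, 15% in unexposed.
print(round(risk_difference(0.26, 0.15), 2))  # 0.11
print(round(risk_ratio(0.26, 0.15), 1))       # 1.7

# Same ratio, very different absolute differences.
for exposed, unexposed in [(0.04, 0.02), (0.20, 0.10), (0.80, 0.40)]:
    print(round(risk_ratio(exposed, unexposed), 2),
          round(risk_difference(exposed, unexposed), 2))
```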

65 Example: Long-Term Use of Statins and Risk of Colorectal Cancer (Manitoba)
Variable               Person years of follow-up   CRC cases   Incidence Rate (per 1,000 person-years)   IRR    95% CI
No statin use          3,250,266                   6,235       2.16                                      1.0    Reference
Regular statin users   134,734                     402         2.29                                      1.03
An example of incidence rates used to calculate a rate ratio in a recent study of statin use and colorectal cancer, using data for residents of Manitoba. An excerpt from the Methods section is below, describing the data sources for the study. In words, the rate ratio shows that “regular statin users have a 1.03 times higher rate of colorectal cancer compared with those without any statin use.” The rate difference shows that “the rate of colorectal cancer is higher in regular statin users by 0.13 per 1,000 person-years compared with non-users.”
Data sources Manitoba has a relatively stable population of about 1.2 million inhabitants. Since 1984, every resident of Manitoba has been assigned a unique personal health identification number (PHIN) by Manitoba Health and Healthy Living (MHHL), the province's single health insurance provider. With the PHIN as a key identifier, longitudinal health service use and outcomes in the province can be determined by deterministic linkage of health utilization files and other databases such as the Manitoba Cancer Registry (MCR). Vital status of the subjects in this study was obtained from MHHL Population Registry, a database of all permanent residents in the province. Using the MHHL Population Registry, the source population for study cohort was defined as residents of Manitoba between 1 April 1995 and 31 December 2005 who never had a diagnosis of CRC before 1 April 1995 and were ≥ 40 years, as of 1 April. We used the electronic database of the Manitoba's Drug Program Information Network (DPIN) to construct exposure histories to statins, nonsteroidal anti-inflammatory drugs (NSAIDs) and hormone replacement therapy (HRT) between 1 April 1995 and 31 December 2005.
Rate difference = 2.29 - 2.16 = 0.13 per 1,000 person-yrs Singh et al, Amer Jour of Gastroenterology 2009

66 Summary of Measures of Association
                  Ratio              Difference
Cross-sectional   prevalence ratio   (prevalence difference)
                  odds ratio         (odds difference)
Cohort            risk ratio         risk difference
                  rate ratio         rate difference
                  (odds ratio)
(rarely used)
This schematic shows all the possible ratio and difference measures that can be calculated from cross-sectional and from cohort data. All of the ratios are commonly seen, but the difference measures are rare except for the risk difference and possibly the rate difference. Odds differences are almost never seen as most find them hard to interpret.

67 Why use difference vs. ratio?
Risk difference gives an absolute measure of the association between exposure and disease occurrence public health implication is clearer with absolute measure: how much disease might eliminating the exposure prevent? Risk ratio gives a relative measure relative measure gives better sense of strength of an association between exposure and disease for inferences about causes of disease Difference or absolute measures are important because they indicate how much disease eliminating the exposure (or providing the treatment) would prevent. Shortly, we will look at an application of this feature known as the number needed to treat. Because a ratio measure gives the incidence in one group relative to another, the magnitude of the ratio reflects the strength of the association between the exposure and the disease. The strength of association is one of the criteria considered in assessing causality in the relationship between an exposure and a disease. So ratio measures are more useful in making inferences about the causes of disease. Ratio measures are given by some of the most commonly used multivariate analyses such as logistic regression and proportional hazards regression.

68 Relative Measures and Strength of Association with a Risk Factor
In practice many risk factors have a relative measure (prevalence, risk, rate, or odds ratio) in the range of 2 to 5 Some very strong risk factors may have a relative measure in the range of 10 or more Asbestos and lung cancer Relative measures < 2.0 may still be valid but are more likely to be the result of bias Second-hand smoke risk ratio < 1.5 These ranges for ratio measures can be used as a kind of rough rule of thumb. There is certainly nothing absolute about them, but they suggest that it is more difficult to demonstrate an etiologic role for exposures that have associated ratios less than 2.0. This is one reason it took a very long time to convince the medical and public health community that there were risks associated with second-hand smoking. Even though the risk ratio is low, an exposure like second-hand smoke that is highly prevalent in the population can have very significant implications for public health. This topic, known as population attributable risk, is not covered in our lectures, but you can read about it in the Szklo and Nieto textbook.

69 Example of Absolute vs. Relative Measure of Risk
                   TB recurrence over 1 yr   No TB recurrence over 1 yr   Total
Treated: > 6 mos   14                        986                          1000
         < 3 mos   40                        960                          1000
Risk ratio = 0.040/0.014 = 2.9
Risk difference = 0.040 - 0.014 = 0.026 = 2.6%
So a ratio of about 3 suggests that treating more than 3 months really does make a difference, but because TB recurrence is a relatively rare event in treated patients, the absolute difference of 2.6% is not so impressive. The absolute measure is important when cost effectiveness is being evaluated. If incidence is very low, relative measure can be large but difference measure small

70 Reciprocal of Absolute Difference ( 1/difference)
Depending on scenario: Number needed to treat to prevent one case of disease Number needed to treat to harm one person Number needed to protect from exposure to prevent one case of disease TB rifampin example: 1/0.026 = 38.5, means that you have to treat 38.5 persons for 6 mos vs. 3 mos. to prevent one case of TB recurrence This is one of the most useful applications of the absolute difference. Depending on the scenario, you can calculate the number needed to treat to prevent one case of disease, number needed to treat to harm one person, or number needed to protect from an exposure to prevent one case of disease. From this relationship, one can see how the absolute difference can be used in cost effectiveness analysis. In this example, we might ask what the cost implications are of treating 38.5 persons an extra 3 months to prevent one case of TB. The answer will depend on the relative cost of treating the 38.5 persons with rifampin versus the cost of managing and treating one case of TB. Those costs can be compiled and compared. Of course, the two costs may not be the only consideration. In this example TB is a contagious disease and allowing one case of recurrence by under-treating could lead to additional infections and other cases of TB.
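The reciprocal relationship is simple enough to sketch directly, using the TB recurrence counts from the preceding slide (14 of 1000 with longer treatment, 40 of 1000 with shorter treatment). The function name `number_needed_to_treat` is our own.

```python
def number_needed_to_treat(risk_comparison, risk_treated):
    """Reciprocal of the absolute risk difference between two groups."""
    difference = abs(risk_comparison - risk_treated)
    if difference == 0:
        raise ValueError("No risk difference: number needed to treat is undefined")
    return 1 / difference


risk_short_course = 40 / 1000  # TB recurrence with < 3 mos of treatment
risk_long_course = 14 / 1000   # TB recurrence with > 6 mos of treatment

nnt = number_needed_to_treat(risk_short_course, risk_long_course)
print(round(nnt, 1))  # 38.5
```

Because people are treated in whole numbers, the result is often rounded up when it is reported, which would give 39 here.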

71 Example of study reporting risk difference
Table 2. Return of spontaneous circulation according to intervention
Intervention                     Return of Spontaneous Circulation   Risk Difference (95% CI)   p-value
Rapid Defibrillation (N=1391)    12.9%                               --
Advanced Life Support (N=4247)   18.0%                               5.1% ( )                   <0.001
Here is an example of a study in the New England Journal of Medicine that reported risk differences rather than risk ratios. This shows one of their findings, the absolute increase in the percent of patients suffering cardiac arrest who had a return of spontaneous circulation. Because this was an attempt to evaluate a public health intervention of making advanced life support available to the emergency responders for all such cases, the absolute percent of cases, rather than a ratio measure, was of primary interest. (If a risk ratio had been supplied, it would have been around 1.4.) The absolute difference is low, but the percentages were low in both phases. Taking 1 over this difference gives a number needed to treat of 20. So every 20 patients receiving advanced life support results in one additional person returning to spontaneous circulation. Risk difference = 0.051; number needed to treat = 1/0.051 = 20 Stiell et al., NEJM, 2004

72 Summary points
Cross-sectional study: Prevalence ratio, Odds ratio
Cohort study: Risk ratio, Rate ratio, Risk/rate difference
Ratio measures of association: Strength of association; For etiologic research
Difference measures of association: Public health/clinical importance

