Quantifying Disease Outcomes. Readings: Jewell, Chapters 2 & 3; Rothman and Greenland, Chapter 3.


2 Types of disease outcomes
Random variables representing disease outcomes in epidemiological studies may be:
– Continuous: IQ in the methylmercury study; lung function in the "Home Allergen Study"
– A count: number of wheeze episodes in the past month in the Home Allergen Study
– Categorical: whether the patient died in hospital, was transferred to a nursing home, or was sent home
– Binary: whether or not a study subject develops cancer; whether or not the child has asthma
Binary outcomes are probably the most commonly encountered in epidemiology, and much of our course will focus on this case.

3 Quantifying the rates of disease
This is more complicated than one might think, and good study design requires careful consideration. For example:
– What proportion of women will be diagnosed with breast cancer in the next year?
– What proportion of women will be diagnosed with breast cancer in their lifetime?
– What proportion of women have breast cancer now?
Age affects all of these definitions; we will come back to this issue shortly.

4 Prevalence vs incidence (Jewell p. 10)
– Point prevalence (prevalence): proportion of a defined population who have the disease at a specified point in time.
– Interval prevalence: proportion who have the disease at any point in an interval of time.
– Incidence proportion: proportion of those at risk for the disease at the beginning of an interval who have developed it by the end of the interval. Also called the cumulative incidence proportion.

5 Prevalence vs incidence (cont'd)
[Figure adapted from Jewell Fig 2.1; lines represent duration of disease.]
In a hypothetical starting population of 100 individuals, 6 have the disease at some point between times t0 and t1. Point prevalence at t is 4/100 if case 4 is still considered at risk, and 4/99 otherwise. The incidence proportion over [t0, t1] is 4/98, because cases 1 and 4 were not at risk: they already had the disease at t0.

6 Interpreting disease prevalence
Caution is needed. Prevalence depends on both how often a disease occurs and how long it lasts, so a rare chronic disease can have the same prevalence as a common disease that kills quickly.
Prevalence can be very useful for non-fatal chronic conditions. For example, the prevalence of obesity (BMI > 30 kg/m²) is 22% in Australia, 34% in the USA, 3% in Japan and 5% in China.

Ten-year coronary heart disease (CHD) data from the Framingham Heart Study (Jewell Table 2.1); percentages are column percentages:

                  Incidence                  Prevalence
Cholesterol   CHD         No CHD         CHD         No CHD
High          85 (75%)    462 (47%)      38 (54%)    371 (52%)
Low           28 (25%)    516 (53%)      33 (46%)    347 (48%)
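The percentages in the Framingham table are column percentages. One way to confirm this is to recompute them from the counts. The sketch below is in R, the language used later in these slides. It assumes the counts exactly as printed above; the object names (`incidence`, `prevalence`, `inc_pct`, `prev_pct`) are illustrative only.

```r
# Counts from the Framingham table: rows = cholesterol level, columns = CHD status
incidence <- matrix(c(85, 28, 462, 516), nrow = 2,
                    dimnames = list(Cholesterol = c("High", "Low"),
                                    CHD = c("CHD", "No CHD")))
prevalence <- matrix(c(38, 33, 371, 347), nrow = 2,
                     dimnames = dimnames(incidence))

# Column percentages: share of high/low cholesterol within each CHD group
inc_pct  <- round(100 * prop.table(incidence, margin = 2))
prev_pct <- round(100 * prop.table(prevalence, margin = 2))
inc_pct
prev_pct
```

High cholesterol accounts for 75% of incident CHD cases but only 54% of prevalent cases. Sampling incident versus prevalent cases can therefore give quite different pictures of the same association.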

7 Disease rates
Incidence proportions can be difficult to interpret over long intervals of time:
– Risk may vary within the interval.
– They do not reflect variation in time to disease onset.
– They do not account for loss to follow-up.
A disease incidence rate in the time period from t0 to t1 can be defined as

$$\text{IR}(t_0, t_1) = \frac{\text{number of new cases occurring in } [t_0, t_1]}{\text{total person-time at risk in } [t_0, t_1]}$$

8 Illustration
[Figure adapted from Jewell Fig 2.2; X denotes disease onset, O denotes death.]
– Point prevalence at t = 0 is 0/5, and at t = 5 is 1/2.
– Incidence proportion from 0 to 5 is 3/5.
– Incidence rate in [0, 5]: 3/(5 + 1 + 4 + 3 + 1) = 3/14 cases per unit of person-time.
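The arithmetic on this slide can be written out directly. The incidence proportion divides the cases by the number of people at risk at the start. The incidence rate divides the same cases by the total person-time at risk. This minimal R sketch uses only the person-time values from the slide's calculation and does not encode which subject contributed which time.

```r
# Person-time contributed in [0, 5] by each of the five subjects (values from the slide)
time_at_risk <- c(5, 1, 4, 3, 1)
n_cases      <- 3                   # new cases observed in [0, 5]

incidence_proportion <- n_cases / length(time_at_risk)   # 3/5
incidence_rate       <- n_cases / sum(time_at_risk)      # 3/14 per unit person-time
c(proportion = incidence_proportion, rate = incidence_rate)
```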

9 Incidence rates and survival analysis
A disease incidence rate is closely connected with the hazard function of survival analysis. Consider the incidence rate over a short interval [t, t + Δt] and let the length of the interval go to zero. Let I(t) be the incidence proportion over the time period [0, t]. Then, in the absence of censoring,

$$h(t) = \lim_{\Delta t \to 0} \frac{I(t + \Delta t) - I(t)}{\Delta t \, \{1 - I(t)\}} = \frac{I'(t)}{1 - I(t)}.$$

This is the standard definition of a hazard function, with I(t) playing the role of a cumulative distribution function (cdf).
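The limit in the hazard definition can be checked numerically. The sketch below assumes a hypothetical constant hazard h = 0.1, so that I(t) = 1 - exp(-h t) (an exponential cdf). It evaluates the difference quotient with a small Δt. The result should be close to h at every t, which is what "constant hazard" means.

```r
h <- 0.1                                   # hypothetical constant hazard

# Incidence proportion over [0, t] when time to disease is exponential with rate h
I <- function(t) 1 - exp(-h * t)

# Finite-difference version of the limit defining the hazard
hazard_fd <- function(t, dt = 1e-6) (I(t + dt) - I(t)) / (dt * (1 - I(t)))

hazard_fd(c(1, 5, 20))                     # each value is close to h = 0.1
```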

10 Incidence rates (cont'd)
Standard survival analysis tools can be used to estimate hazard and incidence functions:
– Kaplan-Meier: the incidence function is estimated by 1 - S(t), with S(t) the Kaplan-Meier estimate of the survival function.
– Nelson-Aalen estimate of the cumulative hazard function.
– Life-table methods (very common).
– Exponential models.
[Figure: mortality hazard for California males in 1980, adapted from Jewell Fig 2.3.]
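To show what the first two tools compute, here is a hand-coded sketch of the Kaplan-Meier and Nelson-Aalen estimators on a small hypothetical data set (`fu_time`, `event`). It follows the usual convention that subjects censored at an event time are still counted at risk at that time. In practice one would typically use `survfit()` from R's survival package rather than this helper.

```r
# Hypothetical follow-up times and event indicators (1 = event, 0 = censored)
fu_time <- c(2, 3, 3, 5, 7, 8, 10, 12)
event   <- c(1, 1, 0, 1, 0, 1, 1, 0)

km_na <- function(fu_time, event) {
  t_ev   <- sort(unique(fu_time[event == 1]))                          # distinct event times
  n_risk <- sapply(t_ev, function(t) sum(fu_time >= t))                # at risk just before t
  d      <- sapply(t_ev, function(t) sum(fu_time == t & event == 1))   # events at t
  S      <- cumprod(1 - d / n_risk)      # Kaplan-Meier survival estimate
  H      <- cumsum(d / n_risk)           # Nelson-Aalen cumulative hazard
  data.frame(time = t_ev, n_risk = n_risk, events = d,
             S = S, incidence = 1 - S, cumhaz = H)
}

res <- km_na(fu_time, event)
res
```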

11 Constant hazards – closed population
Assume the time to the event follows an exponential distribution with rate h, and let T_i be the age at which the event occurs for subject i, i = 1, …, n. The log-likelihood can be written as

$$\ell(h) = \sum_{i=1}^{n} \left( \log h - h T_i \right) = n \log h - h \sum_{i=1}^{n} T_i,$$

and simple algebra (setting $\ell'(h) = n/h - \sum_i T_i = 0$) shows that

$$\hat{h} = \frac{n}{\sum_{i=1}^{n} T_i}.$$

These calculations assume a closed population: a group of n individuals begins observation at time 0, no new individuals can enter, and individuals leave the population only by experiencing the event.
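As a sanity check on the closed-form estimate, the sketch below simulates a hypothetical closed population with true hazard 0.05. It then maximizes the exponential log-likelihood numerically with `optimize()` and compares the result with n / sum(T_i). The two agree up to the optimizer's tolerance.

```r
set.seed(1)                            # reproducible simulated data
h_true <- 0.05                         # hypothetical true hazard
T_i <- rexp(200, rate = h_true)        # simulated event ages in a closed population

# Exponential log-likelihood: n log h - h * sum(T_i)
loglik <- function(h) length(T_i) * log(h) - h * sum(T_i)

h_closed  <- length(T_i) / sum(T_i)                                     # closed-form MLE
h_numeric <- optimize(loglik, interval = c(1e-6, 1), maximum = TRUE)$maximum
c(closed_form = h_closed, numeric = h_numeric)
```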

12 Constant hazards – open population
An open population allows new entry, not necessarily at age 0 (left truncation). It also allows subjects to leave for reasons unrelated to the occurrence of the event of interest (censoring). Assume an exponential distribution with rate h, and let
– T_i be the age at which subject i leaves the population
– d_i be an event indicator (1 if subject i experienced the event, 0 if censored)
– E_i be the age at which subject i entered the population.
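Under these definitions, the standard open-population log-likelihood and its maximizer take the following form. This is a sketch that assumes censoring is independent of the event and that subject i contributes only the time between E_i and T_i. Each subject's contribution is conditional on being event-free at entry.

```latex
\ell(h) = \sum_{i=1}^{n} \bigl[ d_i \log h - h \, (T_i - E_i) \bigr],
\qquad
\hat{h} = \frac{\sum_{i=1}^{n} d_i}{\sum_{i=1}^{n} (T_i - E_i)}
```

The estimate is the number of events divided by the total person-time at risk, i.e. the incidence rate defined earlier.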

13 Piecewise constant hazards
Let h(t) be the hazard at time t and, as before, let
– T_i be the age at which subject i leaves the population
– E_i be the age at which subject i entered the population.
Now allow the hazard to differ across age groups, with h_k the (constant) hazard in age group k. Let δ_ik indicate whether subject i experienced the event in age group k, and let r_ik be subject i's time at risk in age group k.
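With these indicators and times at risk, the log-likelihood splits into one exponential term per age group. Each h_k is therefore estimated separately. The following is a sketch of the standard piecewise-exponential result, under the same assumptions as the open-population case.

```latex
\ell(h_1, \dots, h_K) = \sum_{k=1}^{K} \Bigl[ \Bigl( \sum_{i} \delta_{ik} \Bigr) \log h_k - h_k \sum_{i} r_{ik} \Bigr],
\qquad
\hat{h}_k = \frac{\sum_{i} \delta_{ik}}{\sum_{i} r_{ik}}
```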

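14 Hypothetical example
Data for estimating constant hazards in age groups [60,70) (k = 1) and [70,80) (k = 2). The event of interest is disease onset:

Age at entry   Age at onset   Age at death   (δ_i1, δ_i2)   (r_i1, r_i2)
71             75             79             (0,1)          (0,4)
65             -              72             (0,0)          (5,2)
60             72             -              (0,1)          (10,2)
61             79             80             (0,1)          (9,9)
69             72             75             (0,1)          (1,2)
62             67             68             (1,0)          (5,0)
64             -              77             (0,0)          (6,7)

Estimated hazards:
– age group [60,70): 1/(0+5+10+9+1+5+6) = 1/36 ≈ 0.028 per person-year
– age group [70,80): 4/(4+2+2+9+2+0+7) = 4/26 ≈ 0.154 per person-year
Note: in practice one might make more precise actuarial adjustments (e.g., half-year contributions to time at risk).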
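The δ_ik and r_ik columns can be derived mechanically from the entry, onset and death ages. Each subject's follow-up interval is split at the age-group boundaries. The sketch below assumes follow-up for onset ends at onset if it occurs, and otherwise at death. The variable names are illustrative.

```r
# Hypothetical example from the table: event of interest = disease onset
entry <- c(71, 65, 60, 61, 69, 62, 64)
onset <- c(75, NA, 72, 79, 72, 67, NA)
death <- c(79, 72, NA, 80, 75, 68, 77)

exit  <- ifelse(is.na(onset), death, onset)  # follow-up ends at onset, otherwise at death
event <- as.integer(!is.na(onset))           # 1 if onset was observed

breaks <- c(60, 70, 80)                      # age groups [60,70) and [70,80)
K <- length(breaks) - 1
r <- matrix(0, nrow = length(entry), ncol = K)  # r_ik: time at risk in group k
d <- matrix(0, nrow = length(entry), ncol = K)  # delta_ik: event in group k
for (k in seq_len(K)) {
  lo <- breaks[k]; hi <- breaks[k + 1]
  r[, k] <- pmax(0, pmin(exit, hi) - pmax(entry, lo))   # overlap of [entry, exit] with [lo, hi)
  d[, k] <- event * (exit >= lo & exit < hi)            # event falls inside this group
}

h_hat <- colSums(d) / colSums(r)   # events / person-years: 1/36 and 4/26
h_hat
```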

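15 Analysis via Poisson regression
– Create a line of data for each individual in each interval in which they were at risk.
– Include age group as a binary covariate.
– Include log(PYR), the log of person-years at risk, as an offset.
– Poisson regression models rates on the log scale, so the coefficients must be exponentiated to get estimated rates in each interval.

```r
# Poisson regression for the hypothetical example (one row per subject per age group at risk)
y   <- c(1,0,0, 0,1,0,1,0,1,1,0,0)   # event indicator
age <- c(1,0,1, 0,1,0,1,0,1,0,0,1)   # 0 = [60,70), 1 = [70,80)
pyr <- c(4,5,2,10,2,9,9,1,2,5,6,7)   # person-years at risk
fit <- glm(y ~ age, offset = log(pyr), family = "poisson")
summary(fit)
exp(coef(fit))   # rate in [60,70), and rate ratio for [70,80) vs [60,70)
```

Coefficients from the summary output:
              Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)     -3.583       1.000   -3.584  0.000339
age              1.712       1.118    1.531  0.125768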
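Because age group is the only covariate, the Poisson fit reproduces the direct estimates from the previous slide. The exponentiated intercept gives the [60,70) rate, and exponentiating intercept plus age coefficient gives the [70,80) rate. The sketch below checks this agreement against events/PYR computed with `tapply()` on the same 12 records.

```r
# Same 12 person-interval records as in the glm() call on this slide
y   <- c(1,0,0, 0,1,0,1,0,1,1,0,0)
age <- c(1,0,1, 0,1,0,1,0,1,0,0,1)
pyr <- c(4,5,2,10,2,9,9,1,2,5,6,7)
fit <- glm(y ~ age, offset = log(pyr), family = poisson)

b <- coef(fit)
# Rates on the original scale for [60,70) and [70,80)
glm_rates    <- exp(c(b[["(Intercept)"]], b[["(Intercept)"]] + b[["age"]]))
# Direct estimates: total events / total person-years in each age group
direct_rates <- as.vector(tapply(y, age, sum) / tapply(pyr, age, sum))
rate_ratio   <- exp(b[["age"]])   # [70,80) rate relative to [60,70)
rbind(glm = glm_rates, direct = direct_rates)
```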

16 Real example – arsenic in drinking water
Person-years at risk (PYR) and numbers of cancers by 5-year age group (the table shows age-group midpoints):

Age group   SW Taiwan population     High-arsenic village
midpoint    PYR         Events       PYR      Events
22.5        2595529     7            1861     0
27.5        1846189     19           987      0
32.5        1402764     17           928      0
37.5        1215899     41           759      0
42.5        1191615     75           758      0
47.5        1111810     112          815      0
52.5        957985      160          798      1
57.5        774836      200          544      4
62.5        634758      258          401      3
67.5        492203      230          236      1
72.5        342767      190          126      1
77.5        199630      108          70       0
82.5        96293       45           59       0

Data were extracted from public population and mortality records and the cancer registry. They were then reported as person-years at risk (PYR) in 5-year age groups, together with the number of cancers in each age group.

17 Standardized rates
Consider comparing female lung cancer incidence rates between the US and Taiwan. This is hard because
– age distributions differ between the two countries, and
– incidence rates vary substantially with age.
We'll discuss three ways to address this issue:
– Direct standardization: apply each population's age-specific rates to an appropriate external (standard) population, recalculating the number of cases as if each population had the standard age structure.
– Indirect standardization: compute the ratio of observed to expected cases, with the expected cases computed with appropriate adjustment for the age, gender and ethnicity mix of the population of interest.
– Regression-based adjustments.

18 Direct standardization
Suppose h_k, k = 1, …, K, are the age-specific incidence rates for a population of interest. The directly standardized rate

$$\text{DSR} = \frac{\sum_{k=1}^{K} r_k h_k}{\sum_{k=1}^{K} r_k}$$

represents the average incidence rate that would be seen in a standard population with age-specific person-years at risk r_1, …, r_K.
– PYR can be replaced by population size in each age group.
– Sometimes called external standardization.
– Often reported per 100,000 population.

19 Example: stomach cancer in Danish males, 1988-92
– d_i = number of new stomach cancer cases (1988-1992) in the ith age group
– y_i = person-years at risk (number of males in the age group times 5)
– w_i = number of persons in the age group per 100,000 standard world population
Since the w_i sum to 100,000, the age-standardized incidence rate per 100,000 world standard population is Σ w_i d_i / y_i. For stomach cancer among Danish males this is 9.03.

Age group   d_i   y_i       w_i     w_i*d_i/y_i
(lower
limit)
0           0     749800    12000   0.00
5           0     695500    10000   0.00
10          0     808900    9000    0.00
15          1     931100    9000    0.01
20          2     1017500   8000    0.02
25          6     1032700   8000    0.05
30          4     955800    6000    0.02
35          16    946500    6000    0.10
40          34    1025500   6000    0.20
45          76    926900    6000    0.49
50          97    718900    5000    0.68
55          150   626800    4000    0.96
60          187   590800    4000    1.27
65          302   553100    3000    1.64
70          315   449900    2000    1.40
75          309   337200    1000    0.92
80          247   196200    500     0.63
85          152   115700    500     0.66
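The Danish figure can be reproduced from the table's d_i, y_i and w_i columns. The sketch below uses a small helper, `direct_std()`, that is hypothetical and not from any package. It implements the direct-standardization formula from the previous slide with the age-specific rates h_k = d_k / y_k and the world-standard weights r_k = w_k.

```r
# Directly standardized rate: weighted average of age-specific rates h,
# with weights r from a standard population (hypothetical helper)
direct_std <- function(h, r) sum(r * h) / sum(r)

d <- c(0, 0, 0, 1, 2, 6, 4, 16, 34, 76, 97, 150, 187, 302, 315, 309, 247, 152)
y <- c(749800, 695500, 808900, 931100, 1017500, 1032700, 955800, 946500, 1025500,
       926900, 718900, 626800, 590800, 553100, 449900, 337200, 196200, 115700)
w <- c(12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000, 6000, 6000,
       5000, 4000, 4000, 3000, 2000, 1000, 500, 500)

dsr_per_100k <- direct_std(d / y, w) * 1e5
round(dsr_per_100k, 2)   # 9.03
```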

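20 Back to the arsenic example
– Average (crude) incidence rate in the whole population: total events / total PYR.
– Average (crude) rate in the high-arsenic village: total village events / total village PYR.
– Village rate standardized to the whole population: the village's age-specific rates, weighted by the whole population's PYR in each age group.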
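These three quantities can be computed from the arsenic table. The sketch below expresses each as a rate per 100,000 person-years and reuses the direct-standardization formula with the whole SW Taiwan population's PYR as the standard. It assumes the table values exactly as printed; the vector names are illustrative.

```r
# Arsenic example data (age-group midpoints 22.5, 27.5, ..., 82.5)
taiwan_pyr  <- c(2595529, 1846189, 1402764, 1215899, 1191615, 1111810, 957985,
                 774836, 634758, 492203, 342767, 199630, 96293)
taiwan_ev   <- c(7, 19, 17, 41, 75, 112, 160, 200, 258, 230, 190, 108, 45)
village_pyr <- c(1861, 987, 928, 759, 758, 815, 798, 544, 401, 236, 126, 70, 59)
village_ev  <- c(0, 0, 0, 0, 0, 0, 1, 4, 3, 1, 1, 0, 0)

per_100k <- 1e5
crude_taiwan  <- sum(taiwan_ev) / sum(taiwan_pyr) * per_100k
crude_village <- sum(village_ev) / sum(village_pyr) * per_100k
# Direct standardization: village age-specific rates weighted by whole-population PYR
village_std <- sum(taiwan_pyr * village_ev / village_pyr) / sum(taiwan_pyr) * per_100k

round(c(taiwan = crude_taiwan, village = crude_village, village_std = village_std), 1)
```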

21 Role of standardized rates of disease
For a stand-alone epidemiological study, standardization is less of an issue; the biggest decision is how to model the data. For vital statistics record keeping, however, it is very important. The International Agency for Research on Cancer (IARC) has a website that defines many of the relevant terms and explains why they are important: http://www-dep.iarc.fr/glossary.htm

22 Indirect standardization
Example: Cape Cod, MA

Age group   Whole state          Cape Cod
            Pop       Cases      Pop      Expected
5-24        819538    20         12717    0
25-34       552659    768        8881     12
35-44       465950    3619       8601     67
45-54       306719    6014       5430     106
55-64       272295    7357       5809     157
65-74       262749    9723       6189     229
75-84       173447    6919       3604     144
85+         68434     2013       1386     41

Each expected count is the Cape Cod population in the age group times the statewide rate (e.g., 8881 × 768/552659 ≈ 12). Cape Cod had a total of 864 observed cases versus 756 expected, so SIR = 100 × 864/756 = 114.
The standardized incidence ratio (SIR) is the ratio of observed new cases to the number expected if the population of interest experienced disease at the same rates as a comparison population. The standardized mortality ratio (SMR) is the same calculation applied to deaths.
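The expected counts and the SIR follow mechanically from the table. The sketch below applies the statewide age-specific rates to the Cape Cod population. It assumes the total of 864 observed Cape Cod cases given on the slide, since age-specific observed counts are not shown.

```r
state_pop   <- c(819538, 552659, 465950, 306719, 272295, 262749, 173447, 68434)
state_cases <- c(20, 768, 3619, 6014, 7357, 9723, 6919, 2013)
cape_pop    <- c(12717, 8881, 8601, 5430, 5809, 6189, 3604, 1386)
observed    <- 864     # total observed cases on Cape Cod, as given on the slide

expected <- cape_pop * state_cases / state_pop   # statewide rates applied to Cape Cod
sir <- 100 * observed / sum(expected)            # SIR on the "per 100" scale

round(expected)
round(c(expected = sum(expected), SIR = sir), 1)
```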

23 An SIR for the arsenic example
Expected village cancers in each age group = village PYR × (Taiwan cancers / Taiwan PYR).

Age group   Taiwan     Taiwan    Village   Village   Expected
midpoint    PYR        cancers   PYR       cancers
22.5        2595529    7         1861      0         0.0050
27.5        1846189    19        987       0         0.0102
32.5        1402764    17        928       0         0.0112
37.5        1215899    41        759       0         0.0256
42.5        1191615    75        758       0         0.0477
47.5        1111810    112       815       0         0.0821
52.5        957985     160       798       1         0.1333
57.5        774836     200       544       4         0.1404
62.5        634758     258       401       3         0.1630
67.5        492203     230       236       1         0.1103
72.5        342767     190       126       1         0.0698
77.5        199630     108       70        0         0.0379
82.5        96293      45        59        0         0.0276
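Summing the expected column and comparing it with the observed village total gives the SIR for the arsenic example. The sketch below recomputes the expected counts from the Taiwan rates and reports the SIR as a plain ratio. Multiply by 100 for the percentage scale used in the Cape Cod example.

```r
taiwan_pyr  <- c(2595529, 1846189, 1402764, 1215899, 1191615, 1111810, 957985,
                 774836, 634758, 492203, 342767, 199630, 96293)
taiwan_ev   <- c(7, 19, 17, 41, 75, 112, 160, 200, 258, 230, 190, 108, 45)
village_pyr <- c(1861, 987, 928, 759, 758, 815, 798, 544, 401, 236, 126, 70, 59)
village_ev  <- c(0, 0, 0, 0, 0, 0, 1, 4, 3, 1, 1, 0, 0)

expected <- village_pyr * taiwan_ev / taiwan_pyr   # Taiwan rates applied to village PYR
observed <- sum(village_ev)
sir <- observed / sum(expected)                    # observed / expected

round(expected, 4)
c(observed = observed, expected = round(sum(expected), 3), SIR = round(sir, 2))
```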
