LECTURE 3 – June 9 2006 Cohort Studies, Selection Bias Survival analysis Dr. Dick Menzies
Cohort Studies – General Prospective study: Incidence of new disease in persons who start without disease. –Follow-up period – weeks, months, years –One or more diseases can be measured Measure exposures – at start or ongoing. –Can measure multiple exposures Compare incidence in exposed vs unexposed groups within population – per unit of time
Advantages of cohort over case-control or cross-sectional designs KEY – exposure measurement is made before disease occurs –Exposure more accurate – prospective, and often repeated –Eliminates bias in measurement of exposures: Recall bias of patients, or observer bias in exposure assessment - with knowledge of disease status.
Experimental vs cohort studies Expt studies are a form of cohort study –Same - Persons are free of disease at outset –But - Exposure is RANDOMLY ASSIGNED to some/not others –Same - Measure outcomes after exposure Cohort study – exposures NOT assigned, but occur naturally, or are chosen purposely by subjects, or by their MD’s, etc
Advantages of cohort studies over experimental Ideal to study natural history, course of disease, prognostic factors. Etiologic research for exposures that can not be given experimentally, for ethical reasons –Smoking, asbestos, air pollution Interventions not feasible for randomization –Diagnostic tests, complex care management Some outcomes not well measured in trials: –Compliance by patients and MD’s,
Advantages of cohort studies over experimental Total population studied. –Children, elderly, pregnant women, mentally incompetent Full spectrum of illness –From patients in ICU to minimal forms of disease –Often excluded in RCT – esp Pharma trials Findings more likely to be applicable in real world –Adverse events often more accurately measured Population based estimates of exposure effects BUT you MUST include as full a spectrum of patients as possible (No exclusions in observational studies)
Disadvantages Selection bias – Persons who get exposed not same as unexposed –Surgery – who is ‘operable’ vs ‘inoperable’ Exposures that seem same, are not –Potential bias in measuring Drop-outs – reduce power, may bias (a lot) Outcome assessment can be biased
Cohort Designs Prospective: Subjects without disease followed to determine incidence of diseases –Exposures measured at baseline, and/or concurrently. –Disease – measured during follow-up Retrospective: Subjects first identified based on past Exposures (Hiroshima survivors, work-force) –Outcomes may then be ascertained directly during follow-up, or may already have occurred –Key – exposure well defined, AND occurred well before disease (useful for diseases like cancer)
Cohort Populations General populations – no special exposures –Framingham study – a true general population All persons in the community invited Proxy general Pop’n - Nurses, Military, Company –Exposures studied are those of general pop’n. –Diet, exercise, smoking, alcohol Exposure defined cohort –Work-force to study occupational exposures –Group of patients who received certain therapy
Cohorts of patients Clinical cohorts – patients with a given condition –Case series can be form of cohort study –But – must have differences in ‘exposure’ Different types, severity, causes Potential problems in cohort studies with patients: –Referral bias – only sickest, rarest, –Lead-time bias – better facilities = earlier Dx –Multi-serial cohorts – Cohort starts with all diabetics in 2004 New, and old = very different patients
Open versus Closed Cohorts An open cohort – or dynamic cohort - is one where people can enter or leave –Examples: A workforce study that is ongoing –A city or other geographic location A closed cohort is where all persons in the cohort are defined at entry. No one enters, members can only exit. –Eg. McGill medical school class of 2004
Selection Bias Definition – selection bias occurs when there is a distortion in the estimate of effect (association) because the study or sample population is not truly representative of the underlying population in terms of the distribution of exposures and/or outcomes. Other terms: referral bias, volunteer bias, healthy worker effect, susceptibility bias, drop-out bias How/where in a study can this occur?
Figure 15-2. Diagram showing successive transfers from the intended population to the group admitted to a study of therapy
GROUPS: INTENDED POPULATION → AVAILABLE GROUP → CANDIDATE GROUP → ELIGIBLE GROUP → QUALIFIED GROUP → ADMITTED GROUP
LOSSES – REASONS FOR LOSSES:
–NOT AVAILABLE – Treated at other hospitals or by other doctors
–NOT CANDIDATES – Not identified or accessible
–NOT ELIGIBLE – Did not fulfill diagnostic criteria
–EXCLUDED – Superimposed condition of severity, co-morbidity, co-medication, or non-compliance
–NON-RECEPTIVE – Refused participation or acceptance of assigned maneuver
Obtaining a representative sample In a representative sample we hope for a sample that shows us the true underlying distribution of exposure and disease.
Truth – distribution of exposure and disease in source population:
                Exposed    Not Exposed
Diseased        A          B
Not Diseased    C          D
$\text{Odds Ratio} = \frac{A/B}{C/D} = \frac{A \times D}{B \times C}$
Un-biased Sampling
                Exposed    Not Exposed
Diseased        $P_1 A$    $P_2 B$
Not Diseased    $P_3 C$    $P_4 D$
$\text{Odds Ratio} = \frac{P_1 \times P_4}{P_2 \times P_3} \times \frac{A \times D}{B \times C}$
IF $\frac{P_1 \times P_4}{P_2 \times P_3} = 1$ THEN $\text{OR} = \frac{A \times D}{B \times C} \times 1$ = Truth!
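Biased Sampling
                Exposed    Not Exposed
Diseased        $P_1 A$    $P_2 B$
Not Diseased    $P_3 C$    $P_4 D$
If sample all of A ($P_1 = 1$) but only half of B ($P_2 = 0.5$) And 1/3 of C and D ($P_3 = 0.33$, $P_4 = 0.33$)
$\text{Odds Ratio} = \frac{P_1 \times P_4}{P_2 \times P_3} \times \frac{A \times D}{B \times C} = \frac{1 \times 0.33}{0.5 \times 0.33} \times \frac{A \times D}{B \times C} = 2 \times \frac{A \times D}{B \times C}$
IF $\frac{P_1 \times P_4}{P_2 \times P_3} = 2$ THEN OR estimated = 2 x OR True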
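The sampling-fraction argument on these slides can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the source-population counts A, B, C, D are hypothetical, and the function names are illustrative. The sampling fractions are the ones given on the biased-sampling slide.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = diseased (exposed, not exposed); c, d = not diseased (exposed, not exposed).
    """
    return (a * d) / (b * c)


def sampled_odds_ratio(a, b, c, d, p1, p2, p3, p4):
    """Odds ratio seen in a sample where each cell is drawn with its own fraction."""
    return odds_ratio(p1 * a, p2 * b, p3 * c, p4 * d)


# Hypothetical source population (illustrative counts only, not from the lecture).
A, B, C, D = 200, 100, 400, 300
true_or = odds_ratio(A, B, C, D)

# Same fraction from every cell: the bias factor (P1*P4)/(P2*P3) is 1.
unbiased = sampled_odds_ratio(A, B, C, D, 0.1, 0.1, 0.1, 0.1)

# Fractions from the biased-sampling slide: the bias factor is 2.
biased = sampled_odds_ratio(A, B, C, D, 1.0, 0.5, 0.33, 0.33)

print(f"true OR     = {true_or:.2f}")
print(f"unbiased OR = {unbiased:.2f}")
print(f"biased OR   = {biased:.2f}")
```

Whatever counts are chosen for A, B, C and D, the biased estimate comes out at twice the true odds ratio, because the distortion depends only on the four sampling fractions.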
Example – Biased sampling We are planning a case control study of spicy foods and peptic ulcer disease –Cases = endoscopy proven peptic ulcer disease –Controls = elective inguinal hernia repair at the same hospital The truth: no relationship i.e. the odds ratio = 1 The problem – physicians at this hospital strongly believe spicy food is an important risk factor for peptic ulcer disease. –Therefore they tend to refer patients for endoscopy more often if they had a diet of spicy foods
Example: biased sampling So, 100% of patients with peptic ulcer disease AND history of spicy foods have endoscopy But only half of those with peptic ulcer, but WITHOUT history of spicy food are in fact diagnosed – (they do not have endoscopy, so they are missed) Estimated association will be twice what is correct.
To achieve Un-biased Sampling To achieve un-biased sampling the easiest is: $P_1 = P_2 = P_3 = P_4$ This means the proportion sampled from each group is the same, e.g., 10% are sampled from each of the groups However if $P_1$ is higher than $P_2$ this can be okay as long as $P_3$ is higher than $P_4$ by the same ratio, so that $(P_1 \times P_4) / (P_2 \times P_3)$ still equals 1
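The condition for unbiased sampling with unequal fractions follows directly from the bias factor on the earlier slides. A short derivation, using the same symbols $P_1 \ldots P_4$ and $A \ldots D$:

```latex
\[
\mathrm{OR}_{\text{sample}}
  = \frac{(P_1 A)(P_4 D)}{(P_2 B)(P_3 C)}
  = \frac{P_1 P_4}{P_2 P_3} \cdot \frac{A D}{B C}
\]
\[
\mathrm{OR}_{\text{sample}} = \mathrm{OR}_{\text{true}}
  \iff \frac{P_1 P_4}{P_2 P_3} = 1
  \iff \frac{P_1}{P_2} = \frac{P_3}{P_4}
\]
```

For instance (illustrative values), $P_1 = 0.2$, $P_2 = 0.1$, $P_3 = 0.1$, $P_4 = 0.05$ gives a bias factor of $(0.2 \times 0.05)/(0.1 \times 0.1) = 1$: exposed persons are over-sampled twofold among both the diseased and the non-diseased, so the odds ratio is preserved.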
Volunteer Bias Participants in a study are different from refuseniks –Mortality of non-participants in the Framingham study Subjects with exposure and the outcome are more (or less) likely to participate –Eg HIV infection and homosexuality – in Africa –Disease and occupational exposures, particularly for self-reported exposures, and compensable illnesses.
Susceptibility bias Persons allocated to one form of treatment, or who self-select to certain exposures, are more, or less, susceptible to develop health outcomes of interest. –Eg Cancer patients who have surgery vs medical or radiotherapy only. Surgical patients often appear to do better.
Healthy worker effect An important bias – found in work-force studies –Reflects medical screening (military, mining) –Or, physical requirements of job Results in better health status initially than general population, or certain control pop’n –Strongly affects results in cross-sectional studies –Reduces risk or delays occurrence of health outcomes of interest. Also occurs in smokers “healthy smoker effect” –Lung function in adolescent smokers > non-smokers
Example of healthy smoker effect
Selection Bias in Cohort Studies – Dropouts Losses to follow up occur in all cohort studies Reduce power, and dilute results Problematic if more drop-outs in one exposure group REALLY important if drop-out is due to development of disease
Selection Bias in Cohort Studies – Dropouts Example: –study of incidence of diabetes in obese persons. –Truth: IRR = 3.0 –Losses – 33% in diabetes/obesity group (death/other), 5% losses in all other groups –$\frac{P_1 \times P_4}{P_2 \times P_3}$ does not = 1
Selection Bias from Dropouts - Example
             At onset   Dropped out:   Dropped out:   Detected at end
                        No DM          Diabetes       with diabetes
Obese        227        10             9              18
Not Obese    773        35             3              30
Incidence (biased): In obese – 18/208 = 8.7% In non-obese – 30/735 = 4.1%
Biased incidence rate ratio – 8.7%/4.1% = 2.1
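The arithmetic of this example can be reproduced in a few lines. This is a sketch under one assumption: the flattened table is read as four columns (at onset, dropped out without diabetes, dropped out with diabetes, detected at end), which matches the 18/208 and 30/735 figures on the slide. The dictionary keys and function name are illustrative.

```python
# Counts from the dropout example, under the column reading described above.
cohort = {
    "obese": {"at_onset": 227, "dropped_no_dm": 10, "dropped_dm": 9, "dm_detected": 18},
    "not_obese": {"at_onset": 773, "dropped_no_dm": 35, "dropped_dm": 3, "dm_detected": 30},
}


def incidence(group, include_dropouts):
    """Cumulative incidence of diabetes, with or without the people who dropped out."""
    cases = group["dm_detected"]
    denominator = group["at_onset"]
    if include_dropouts:
        # Count the cases that were never observed because the person left the study.
        cases += group["dropped_dm"]
    else:
        # Analyse only those who completed follow-up.
        denominator -= group["dropped_no_dm"] + group["dropped_dm"]
    return cases / denominator


biased_irr = incidence(cohort["obese"], False) / incidence(cohort["not_obese"], False)
true_irr = incidence(cohort["obese"], True) / incidence(cohort["not_obese"], True)

print(f"biased ratio (completers only) = {biased_irr:.2f}")
print(f"ratio with dropouts restored   = {true_irr:.2f}")
```

With the dropouts restored the ratio is about 2.8 (27/227 against 33/773), which the lecture rounds to 3.0; among completers it falls to about 2.1 because a third of the obese cases were lost.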
Drop-outs from a work-force - impact An occupational exposure causes health effects quickly in a susceptible sub-group. –They leave the work-force (quit) quickly. –Examples: Allergy to lab animals in researchers Asthma in Grain workers Cross-sectional studies – no susceptibles left Cohorts – Can miss when setting up cohort. –Outcomes occur in small number of new workers (power problem)
Controlling Selection Bias Control in design - Most important is prevention –Recruitment – high % in all groups –Same %recruitment in exposed/not exposed –Close follow-up to prevent dropouts Assess in analysis –Compare participants to non-participants Sub-groups of non-participant –Compare dropouts with those who remained –Sensitivity analysis – best case/ worst case to assess impact of selection biases
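A best case/worst case sensitivity analysis can be sketched as follows. This is one simple way to do it, not necessarily the method used in the lecture: every dropout is assumed either to have developed the outcome or not, in whichever combination pushes the risk ratio furthest in each direction. The counts are taken from the obesity and diabetes dropout example (18 cases among 208 obese completers with 19 dropouts; 30 cases among 735 non-obese completers with 38 dropouts); the function names are illustrative.

```python
def risk(cases, total):
    """Proportion of a group that developed the outcome."""
    return cases / total


def sensitivity_bounds(exp_cases, exp_followed, exp_dropouts,
                       unexp_cases, unexp_followed, unexp_dropouts):
    """Return (lowest, observed, highest) risk ratio under extreme dropout assumptions."""
    observed = risk(exp_cases, exp_followed) / risk(unexp_cases, unexp_followed)
    exp_total = exp_followed + exp_dropouts
    unexp_total = unexp_followed + unexp_dropouts
    # Lowest ratio: no exposed dropout developed disease, every unexposed dropout did.
    lowest = risk(exp_cases, exp_total) / risk(unexp_cases + unexp_dropouts, unexp_total)
    # Highest ratio: every exposed dropout developed disease, no unexposed dropout did.
    highest = risk(exp_cases + exp_dropouts, exp_total) / risk(unexp_cases, unexp_total)
    return lowest, observed, highest


lowest, observed, highest = sensitivity_bounds(18, 208, 19, 30, 735, 38)
print(f"risk ratio: observed {observed:.2f}, range {lowest:.2f} to {highest:.2f}")
```

Here the observed ratio of about 2.1 sits inside a range of roughly 0.9 to 4.2, so with this many dropouts the data alone cannot rule out either no effect or a much stronger one.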
Cohort Studies – Exposure Assessments Prospective - Measure one or more exposures at start –Specific: cholesterol, obesity, smoking, blood pressure. –Proxies: occupation, housing –Measure once, or repeatedly to account for changes in exposure over time (obesity, smoking, BP). Retrospective –Exposure based upon past events –These are rarely quantified Proxies used (job description, distance from blast) Sometimes records (transfusions, dust levels)
Pitfalls in exposure assessments Observer bias – disease ascertained at same time –Blind observers to study hypothesis –Standardized protocols Are all exposures the same? –Complications of pleural tap at MGH/RVH >> MCI Did you forget something? –Hard to go back to the start of cohort –Measure everything, freeze the rest –Add measures as new things reported
Cohort Studies – Outcome Assessments Baseline – ensure cohort members free of disease. –Easy if prospective, harder if retrospective Outcomes measured periodically –Through questionnaire, exam, labs (direct) –Through health service utilization (databases) –Through vital statistics (databases) Case definition key for outcome assessments –Diagnosis of milder disease common problem
Pitfalls in outcome assessments Ascertainment bias – if patients with Factor X are more likely to have testing to detect outcome. –Standardized protocols, blinding to exposures Observer bias – patients with Factor X more likely to be diagnosed with outcome of interest –Common with more subjective tests – eg CXR –Solution – independent reviewers, blinded to exposure status (Factor X) Lead time bias – earlier diagnosis makes survival look better
Lead-time bias - example
Cohort Studies – Measures of Incidence
Incidence rate (simplest) = $\frac{\text{number developing disease}}{\text{total number who entered cohort}}$ per unit of time
Cumulative incidence = $\frac{\text{number developing disease}}{\text{total number who entered cohort}}$ over total follow-up period
Measuring Incidence in Cohort Studies How to handle drop-outs etc.? Drop-outs from loss to follow-up, death from other causes, or withdrawal of consent are common –Up to 50% in long term cohorts Include or exclude from analysis? Simple incidence measures exclude them Need to allow variable length of follow up –And count people who enter after the first year
Incidence Density (ID)
Patient   Exposed   Enter in year   Stop in year   Years of FU   Disease occurrence
1         YES       1               3              2             NO
2         YES       3               12             10            YES
3         NO        1               8              8             NO
4         NO        2               11             10            YES
Counts person-time (person-years/months) Starts count when person enters cohort Each year of follow-up added up
ID in Exposed = 1 event in 12 person years
ID in Unexposed = 1 event in 18 person years
Cohort studies – Measure of Association: Risk Ratios, or Incidence rate ratios Summary measure of association in Cohort Studies
Formula for Incidence rate ratio (IRR):
$\text{IRR} = \frac{\text{Incidence of disease in persons with exposure}}{\text{Incidence of disease in persons without exposure}}$
or
$\text{IRR} = \frac{N_{\text{disease}} / N_{\text{exposed}} \text{ per unit time}}{N_{\text{disease}} / N_{\text{unexposed}} \text{ per unit time}}$
* Note – in IRR there is no unit of time. This assumes the amount of time was similar for those with and without disease and those exposed and unexposed
Calculation of Risk Ratio - example Cohort at inception: 1,000 people without diabetes –Prevalence of obesity at inception = 22.7% Outcome: Incidence of diabetes in a population Exposure - obesity at inception of cohort Follow-up - six years Overall incidence of diabetes = 1% per year –Cumulative Incidence = 6% –Risk = cumulative incidence
Risk Ratio Calculation - Example
             Number with exposure   Developed Diabetes   Cumulative Incidence
Obese        227                    27                   27/227
Non Obese    773                    33                   33/773
Total        1,000                  60
Ratio of Incidence = risk ratio = $\frac{27/227}{33/773} \approx \frac{12\%}{4\%} \approx 3.0$ (2.8 before rounding)
Incidence Density Ratio
Patient   Exposed   Follow up Years   Disease
1         YES       2                 NO
2         YES       10                YES
3         NO        8                 NO
4         NO        10                YES
Incidence rate ratio (ignoring follow-up time) = $\frac{1/2}{1/2} = 1$
Density method = $\frac{(0 + 1) \text{ events} / (2 + 10) \text{ years}}{(0 + 1) \text{ events} / (8 + 10) \text{ years}}$
Incidence density ratio = $\frac{1/12}{1/18} = 1.5$
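The contrast between counting people and counting person-years can be made concrete with the four patients on this slide. A minimal Python sketch follows; the `Patient` class and function names are illustrative, and the blank cells in the flattened table are assumed to mean "NO" (patient 3 had no disease, patient 4 was unexposed), which is the reading that reproduces the slide's 1/12 and 1/18.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    exposed: bool
    years: float    # years of follow-up contributed
    diseased: bool


patients = [
    Patient(exposed=True, years=2, diseased=False),
    Patient(exposed=True, years=10, diseased=True),
    Patient(exposed=False, years=8, diseased=False),   # blank disease cell assumed NO
    Patient(exposed=False, years=10, diseased=True),   # blank exposure cell assumed NO
]


def crude_risk(group):
    """Events divided by number of people, ignoring how long each was followed."""
    return sum(p.diseased for p in group) / len(group)


def incidence_density(group):
    """Events divided by total person-years of follow-up."""
    return sum(p.diseased for p in group) / sum(p.years for p in group)


exposed = [p for p in patients if p.exposed]
unexposed = [p for p in patients if not p.exposed]

crude_ratio = crude_risk(exposed) / crude_risk(unexposed)
density_ratio = incidence_density(exposed) / incidence_density(unexposed)

print(f"crude ratio   = {crude_ratio:.2f}")
print(f"density ratio = {density_ratio:.2f}")
```

The crude ratio is 1 because each group has one event among two people; the density ratio is 1.5 because the exposed group accumulated only 12 person-years against 18 in the unexposed group.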
Incidence Rate Difference A patient asks “How much will my risk of heart attack go down if I take this new drug (B), instead of old one (A)?”
Answer using incidence rate difference: Incidence with Drug A - Incidence with Drug B = 0.5%/year – 0.3%/year = 0.2%/year, or, a 40% reduction
Same answer using Incidence rate ratio: $\frac{\text{Incidence with Drug B}}{\text{Incidence with Drug A}} = \frac{0.3\%}{0.5\%} = 0.6$, or, a 40% reduction
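The two ways of answering the patient's question can be laid side by side. A small Python sketch using the rates on this slide (variable names are illustrative):

```python
rate_a = 0.005  # Drug A: 0.5% per year
rate_b = 0.003  # Drug B: 0.3% per year

# Absolute measure: how many fewer events per person per year on Drug B.
rate_difference = rate_a - rate_b

# Relative measure: the rate on Drug B as a fraction of the rate on Drug A.
rate_ratio = rate_b / rate_a

# The relative reduction can be reached from either measure.
reduction_from_ratio = 1 - rate_ratio
reduction_from_difference = rate_difference / rate_a

print(f"rate difference    = {rate_difference:.3%} per year")
print(f"rate ratio         = {rate_ratio:.2f}")
print(f"relative reduction = {reduction_from_ratio:.0%}")
```

Both routes give the same 40% relative reduction; only the rate difference tells the patient that this amounts to 0.2 percentage points per year in absolute terms.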
Attributable risk “How many lung cancers are due to air pollution in Montreal?” Same as “What is attributable risk?” Attributable risk depends on IRR and prevalence of exposure (roughly, IRR x Prevalence of exposure) –Increases with higher IRR –Or if exposure more common Diabetes vs Silicosis and TB –Diabetes: IRR = 3.5, Prevalence = 3% –Silicosis: IRR = 12, Prevalence = 0.1% –Attrib risk for Diabetes >> that for Silicosis
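The diabetes versus silicosis comparison can be worked through numerically. The sketch below uses Levin's formula for the population attributable fraction, which is a standard formula but is not stated on the slide; treating it as the intended calculation is an assumption. The IRR and prevalence values are the ones on the slide.

```python
def population_attributable_fraction(irr, prevalence):
    """Levin's formula: share of all cases in the population attributable to the exposure.

    Assumption: this standard formula stands in for the slide's rule of thumb.
    """
    excess = prevalence * (irr - 1)
    return excess / (1 + excess)


paf_diabetes = population_attributable_fraction(irr=3.5, prevalence=0.03)
paf_silicosis = population_attributable_fraction(irr=12, prevalence=0.001)

print(f"diabetes:  {paf_diabetes:.1%} of TB cases")
print(f"silicosis: {paf_silicosis:.1%} of TB cases")
```

Under this formula diabetes accounts for about 7% of cases and silicosis for about 1%, even though silicosis carries the much higher IRR, because diabetes is 30 times more common.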
Cohort Studies – Survival Analysis Analysis of time to event Accounts for variable length of follow up. Advantage if time to event affected by exposure. Can find important differences in treatments even overall survival same: –Cancer treatment A increases survival at two years –But five year mortality is same as treatment B. –Treatment A - preferred by most patients!
Important differences found using Survival analysis
Types of Survival Analysis
Simplest – Direct
Kaplan-Meier – still pretty simple. Calculates cumulative proportion free of outcome (survived) at each point in time when that outcome occurs. People who drop out or die of other causes are ‘censored’. At each point numerator is all who developed the outcome at that time, while denominator is all still without the outcome just before it
Cox regression analysis – multivariate analysis with same basic principles
Kaplan Meier survival analysis - example
Time        Number     Drop-outs    Deaths       Surviving   Proportion   Proportion
            at start   (during      (during      at end      surviving    surviving
                       interval)    interval)                (interval)   (cumulative)
0           100        0            0            100         1.0          1.0
3 months    100        10           0            90          1.0          1.0
6 months    90         10           10           70          0.88         0.88
10 months   70         0            10           60          0.86         0.75
12 months   60         10           10           40          0.8          0.6
18 months   40         10           0            30          1.0          0.6
Notes: Intervals are variable – defined by when subjects die
Proportion surviving interval – excludes drop-outs during the interval (censored)
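The interval and cumulative columns of this table can be recomputed from the counts. The sketch below follows the convention stated in the slide's notes, in which drop-outs during an interval are removed from the denominator before the deaths in that interval are counted; the variable names are illustrative, and library implementations may handle tied censoring differently.

```python
# (time in months, number at start, drop-outs, deaths) for each interval in the table.
intervals = [
    (3, 100, 10, 0),
    (6, 90, 10, 10),
    (10, 70, 0, 10),
    (12, 60, 10, 10),
    (18, 40, 10, 0),
]

cumulative = 1.0
table = []
for month, at_start, dropouts, deaths in intervals:
    # Censored subjects leave the denominator for this interval.
    at_risk = at_start - dropouts
    interval_survival = (at_risk - deaths) / at_risk
    # Cumulative survival is the running product of the interval proportions.
    cumulative *= interval_survival
    table.append((month, round(interval_survival, 2), round(cumulative, 2)))

for month, interval_survival, cumulative_survival in table:
    print(f"{month:>2} months: interval {interval_survival}, cumulative {cumulative_survival}")
```

The running product ends at 0.6, matching the last row of the table, even though only 30 of the original 100 subjects are still under observation at 18 months.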
Kaplan Meier survival analysis - example
Example of Kaplan-Meier analysis: General Hospital Ventilation and time to TST conversion
Selection Bias – Berkson’s This is described in case control studies in hospitalized patients First described on mathematical basis. –Probability Hospitalization if Factor Z = 0.1 –Probability Hospitalization if Factor Y = 0.05 –Probability Hospitalization if both = higher –These two independent conditions will appear to be associated – but may not be. In practice it is common that patients with 2 or more conditions ARE more likely hospitalized (eg CHF and pneumonia) so in hospital based Case-control study they appear to be strongly associated. Fundamental problem is the same. $P_1$ does not equal $P_2$ does not equal $P_3$ does not equal $P_4$
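Berkson's argument can be illustrated with a small calculation. The sketch below is built on stated assumptions, none of which come from the slide except the two hospitalization probabilities (0.1 for Factor Z, 0.05 for Factor Y): the community size, the prevalences of Z and Y, the 0.02 chance of admission for unrelated reasons, and the idea that each reason for admission acts independently are all hypothetical.

```python
def p_hospitalized(has_z, has_y, p_z=0.10, p_y=0.05, p_other=0.02):
    """Chance of admission when each possible reason acts independently.

    p_other (admission for unrelated reasons) is a hypothetical value.
    """
    p_not_admitted = 1 - p_other
    if has_z:
        p_not_admitted *= 1 - p_z
    if has_y:
        p_not_admitted *= 1 - p_y
    return 1 - p_not_admitted


# Hypothetical community where Z (20%) and Y (10%) occur independently of each other.
N, prev_z, prev_y = 100_000, 0.20, 0.10
community = {
    (True, True): N * prev_z * prev_y,
    (True, False): N * prev_z * (1 - prev_y),
    (False, True): N * (1 - prev_z) * prev_y,
    (False, False): N * (1 - prev_z) * (1 - prev_y),
}

# Each cell is sampled into hospital with its own probability (P1..P4 on the earlier slides).
hospital = {cell: n * p_hospitalized(*cell) for cell, n in community.items()}


def odds_ratio(cells):
    """Odds ratio between Z and Y from the four (has_z, has_y) cell counts."""
    return (cells[(True, True)] * cells[(False, False)]) / (
        cells[(True, False)] * cells[(False, True)]
    )


or_community = odds_ratio(community)
or_hospital = odds_ratio(hospital)

print(f"OR in community = {or_community:.2f}")
print(f"OR in hospital  = {or_hospital:.2f}")
```

In the community the odds ratio is 1, as it must be for independent conditions; among hospitalized patients it is about 0.4 under these assumptions. The direction and size of the distortion depend entirely on the four admission probabilities, and if having both conditions raised admission more sharply than assumed here, the two conditions would instead appear positively associated, as the slide describes for CHF and pneumonia.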