SJS SDI_141 Design of Statistical Investigations Stephen Senn 14 Case Control Studies
SJS SDI_142 Case-Control Study Definition: The observational epidemiologic study of persons with the disease (or other outcome variable) of interest and a suitable control (comparison, reference) group of persons without the disease. The relationship of an attribute to the disease is examined by comparing the diseased and nondiseased groups with regard to how frequently the attribute is present or, if quantitative, the levels of the attribute in each group. In short, the past history of exposure to a suspected risk factor is compared between cases and controls, persons who resemble the cases in such respects as age and sex but do not have the disease or condition of interest. (Last, J.M., A Dictionary of Epidemiology)
SJS SDI_143 Schematic Representation of Cohort Study. Each point represents a member of the cohort of 10,000 persons.
SJS SDI_144 200 cases and 200 controls are sampled from diseased and healthy persons respectively
SJS SDI_145 The numbers of cases and controls are fixed by design. Exposure becomes the random variable and is studied as a function of status. Note that the axes have been exchanged to reflect this.
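The sampling scheme described in these slides can be mimicked with a small simulation. The sketch below is an illustration only: the exposure prevalence, the disease risks, the seed, and all object names are assumptions chosen for the example, not values from the slides. It builds a cohort of 10,000 persons, samples 200 cases and 200 controls, and tabulates exposure by status.

```r
set.seed(14)                       # arbitrary seed, for reproducibility only
n.cohort <- 10000
# Assumed values: 30% exposed; disease risk 10% if exposed, 4% if unexposed
exposed  <- rbinom(n.cohort, 1, 0.30)
diseased <- rbinom(n.cohort, 1, ifelse(exposed == 1, 0.10, 0.04))

# Sample a fixed number of cases and controls, ignoring exposure
cases    <- sample(which(diseased == 1), 200)
controls <- sample(which(diseased == 0), 200)

# Status is fixed by design; exposure is the random variable
status   <- factor(rep(c("case", "control"), each = 200))
exposure <- factor(exposed[c(cases, controls)], levels = c(1, 0),
                   labels = c("exposed", "unexposed"))
tab <- table(status, exposure)
tab

# Odds ratio from the case-control sample and from the whole cohort
or.sample <- tab["case", "exposed"] * tab["control", "unexposed"] /
  (tab["case", "unexposed"] * tab["control", "exposed"])
full <- table(diseased, exposed)
or.cohort <- full["1", "1"] * full["0", "0"] /
  (full["1", "0"] * full["0", "1"])
c(sample = or.sample, cohort = or.cohort)
```

The row totals of `tab` are 200 and 200 whatever the data, which is the sense in which the numbers of cases and controls carry no information; only the split by exposure within each row varies from sample to sample.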
SJS SDI_146 Smoking and Lung Cancer Obs_7. Famous study of Hill and Doll. Sampled 1357 cases of lung cancer from four hospitals in the United Kingdom. Sampled 1357 hospital-based controls. Compared the two groups as regards smoking history.
SJS SDI_1411 Notes: Thus the odds ratio can be estimated even though n_E, n_U, and the sampling fractions for cases and controls are unknown. However, although the assumption that the two sampling fractions are equal is not needed, an assumption that they do not vary with exposure is needed.
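One way to see why the sampling fractions drop out is the following sketch. The symbols are assumptions introduced here, not the slides' own notation: D_E and D_U are the numbers of diseased persons among the exposed and unexposed in the population, H_E and H_U the corresponding numbers of healthy persons, and f_1 and f_0 the fractions of diseased and healthy persons sampled as cases and controls.

```latex
% Expected sample counts when sampling fractions do not depend on exposure
\[
\text{cases: } f_1 D_E,\; f_1 D_U
\qquad
\text{controls: } f_0 H_E,\; f_0 H_U
\]
\[
\mathrm{OR}_{\text{sample}}
  = \frac{(f_1 D_E)(f_0 H_U)}{(f_1 D_U)(f_0 H_E)}
  = \frac{D_E H_U}{D_U H_E}
  = \mathrm{OR}_{\text{population}}
\]
% If the fractions differed by exposure, a bias factor would remain
\[
\frac{(f_{1E} D_E)(f_{0U} H_U)}{(f_{1U} D_U)(f_{0E} H_E)}
  = \mathrm{OR}_{\text{population}} \times \frac{f_{1E}\, f_{0U}}{f_{1U}\, f_{0E}}
\]
```

The product f_1 f_0 cancels whether or not f_1 equals f_0, so equality is not required. A factor survives only when the fractions depend on exposure.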
SJS SDI_1412 Sources for Controls (Rothman): Population, using a population register. Neighbourhood, for example one or two controls from the neighbourhood of the case (not suitable for environmental exposure). Random digit dialing. Hospitals or clinics.
SJS SDI_1413 Cohort and Case Control Studies (Rothman p 91)
Cohort: Complete population. Can calculate incidence rates. Usually expensive. Convenient for studying many diseases. Can be prospective or retrospective.
Case control: Sampled population. Can calculate ratios only. Usually less expensive. Convenient for studying many exposures. Can be prospective or retrospective.
SJS SDI_1416 Variance of the log-odds ratio: The log-odds ratio is the difference between two logits. Since these are independent, the variance of their difference is the sum of their variances. Thus, in terms of our previous table, the estimated variance of the log-odds ratio is the sum of the reciprocals of the four cell frequencies. Note the implications of the variance formula: the variance cannot be reduced below the reciprocal of the entry in a given cell by increasing the frequencies of the other cells.
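The floor on the variance can be checked numerically. The sketch below uses the four counts from the Doll and Hill table analysed in these slides; the helper `var.log.or` and the inflation factor of 100 are assumptions introduced purely for illustration.

```r
# Variance of the log-odds ratio: sum of reciprocal cell frequencies
var.log.or <- function(freq) sum(1 / freq)

# Counts from the slides, in the order:
# case smoker, case non-smoker, control smoker, control non-smoker
freq <- c(1350, 7, 1296, 61)
v <- var.log.or(freq)

# Hypothetical: multiply every cell except the smallest by 100
freq.big <- freq * c(100, 1, 100, 100)
v.big <- var.log.or(freq.big)

# The variance falls, but never below the reciprocal of the smallest cell
c(v = v, v.big = v.big, floor = 1 / min(freq))
```

Here the seven non-smoking cases contribute 1/7 on their own, so almost all of the variance comes from the smallest cell and enlarging the other cells gains very little.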
SJS SDI_1417 S-Plus Analysis Obs_7
#Doll and Hill
options(contrasts=c("contr.treatment", "contr.poly")) #set contrast options
#To analyse the famous case-control study
Outcome<-factor(c("case","case","control","control"))
Exposure<-factor(rep(c("smoker","non-smoker"),2))
Freq<-c(1350,7,1296,61)
Doll.Hill<-data.frame(Outcome, Exposure, Freq)
Doll.Hill
#Odds ratio: cross-product ratio of the four cells
OR<-Freq[1]*Freq[4]/(Freq[2]*Freq[3])
l.OR<-log(OR)
#Variance of the log-odds ratio: sum of reciprocal cell frequencies
var<-(1/Freq[1]+1/Freq[2]+1/Freq[3]+1/Freq[4])
SE<-sqrt(var)
t<-l.OR/SE
#Approximate 95% confidence limits, back-transformed to the odds-ratio scale
LCL<-exp(l.OR-1.96*SE)
UCL<-exp(l.OR+1.96*SE)
results.1<-data.frame(l.OR,var,SE,t,LCL,OR,UCL)
results.1
SJS SDI_1418
#Fit results using a log-linear model
fit1<-glm(Freq~Exposure*Outcome,family=poisson)
summary(fit1,cor=F)
#Prepare data to perform logistic regression
#Row 1 = cases, row 2 = controls; Y counts the smokers in each group
Y<-c(Freq[1],Freq[3])
N<-c(Freq[1]+Freq[2],Freq[3]+Freq[4])
#Note: as coded, the Exposure2 labels mark the case row (Smoker) and the control row (Non-smoker)
Exposure2<-factor(c("Smoker","Non-smoker"))
P<-Y/N
DollHill.2<-data.frame(Y,N,P,Exposure2)
DollHill.2
#Logistic regression
fit2<-glm(P~Exposure2,family=binomial,weights=N)
summary(fit2,cor=F)
SJS SDI_1419
> Doll.Hill
  Outcome   Exposure Freq
1    case     smoker 1350
2    case non-smoker    7
3 control     smoker 1296
4 control non-smoker   61
> results.1
      l.OR       var        SE        t      LCL       OR      UCL
1 2.205786 0.1607629 0.4009525 5.501364 4.136784 9.077381 19.91857
Call: glm(formula = Freq ~ Exposure * Outcome, family = poisson)
Coefficients:
                     Value Std. Error   t value
(Intercept)       1.945910  0.3779645  5.148394
Exposure          5.261950  0.3789431 13.885857
Outcome           2.164964  0.3990621  5.425129
Exposure:Outcome -2.205786  0.4009525 -5.501364
SJS SDI_1420
> DollHill.2
     Y    N         P  Exposure2
1 1350 1357 0.9948416     Smoker
2 1296 1357 0.9550479 Non-smoker
Call: glm(formula = P ~ Exposure2, family = binomial, weights = N)
Coefficients:
               Value Std. Error   t value
(Intercept) 3.056164  0.1310154 23.326746
Exposure2   2.205786  0.4009483  5.501422
(Dispersion Parameter for Binomial family taken to be 1 )
SJS SDI_1421 Questions: Why did Hill and Doll choose a case-control study rather than a cohort study? We now believe that the choice of controls used in the Hill and Doll study led to an underestimate of the odds ratio for lung cancer and smoking. Why? Consider the recent controversy over breast implants and connective tissue disease. What difficulty does press coverage cause for any case-control study in this field? Why do epidemiologists rarely use more than three controls per case?