# Basic epidemiologic analysis with Stata: Biostatistics 212, Lecture 5


Housekeeping
- Pick up your Lab 1s.
- Questions about Lab 3 or Lab 4?
  - Boolean statements and missing values
  - Protect/unprotect demo
- No answer keys will be posted.
- Forum: is it useful? Who has “viewed” a response and gotten an answer?
- Extra credit puzzle
- Final project: by the last session you should
  - have your dataset imported into Stata,
  - have cleaned up the variables you will use,
  - have sketched out (paper and pencil) a table and a figure, and
  - be ready to write analysis do-files.

Today...
- Stata as a tool for learning concepts
- Interaction and confounding with 2 x 2 tables
- Stata’s “epitab” commands
- Adjusting for many things at once
- Logistic regression
- Testing for trends

Stata: a tool for understanding theory
- Hands-on vs. theoretical teaching
  - Use Stata to get your hands on the data: see the dataset, write the command, see the output.
- Lab 1: exposure to basic stats
- Today: exposure to basic epi concepts
  - Confounding and interaction

Confounding and interaction: practical questions
- What is confounding?
- What does it mean to “adjust” for something?
- When should you adjust, and what should you adjust for?
- When should you stratify?
- What do the adjusted estimates mean?

An Example: Does binge drinking cause atherosclerosis?
- RQ: Is there an association between self-reported binge drinking and presence of coronary calcium among young adults?
  - CARDIA Year 15 examination

An Example
Yes! RR = 1.9 (95% CI 1.6–2.4), p < .001
- Coronary calcium is present 1.9 times as often in persons who report binge drinking.
- But does binge drinking CAUSE atherosclerosis?

An Example: possible explanations\*
- Chance: very unlikely
- Bias: possible (not the focus here!)
- Effect-cause: unlikely?
- Confounding: YES!
- Cause-effect: ?

\* Hulley et al., *Designing Clinical Research*

An Example: definition of a confounder
- Associated with the predictor and a cause of the outcome (and NOT a mediator)

An Example: male gender

[Diagram: male gender, binge drinking, and coronary calcium, with the link between binge drinking and coronary calcium marked “?”]

Now what do we do??

2 x 2 Tables: practical tools
- “Contingency tables” are a traditional analytic tool of the epidemiologist.

| Exposure | Outcome + | Outcome − |
|---|---|---|
| + | a | b |
| − | c | d |

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}, \qquad RR = \frac{a/(a+b)}{c/(c+d)}$$

2 x 2 Tables: example

| Binge drinking | Coronary calcium + | Coronary calcium − | Total |
|---|---|---|---|
| + | 106 | 585 | 691 |
| − | 186 | 2165 | 2351 |
| Total | 292 | 2750 | 3042 |

OR = 2.1 (1.6–2.7); RR = 1.9 (1.6–2.4)

Can we say that binge drinking CAUSES atherosclerosis?
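As a check on the arithmetic, here is a minimal Python sketch. Python is used only to show the calculation; the lecture does this in Stata. It computes both measures from the four cells using the row-wise a, b, c, d layout of the contingency table (a = exposed with outcome, b = exposed without, c = unexposed with, d = unexposed without). The confidence intervals are log-scale Wald intervals, which is Woolf's method for the OR. With these counts the RR interval matches the 1.94 (1.55–2.42) that `cs` reports. Stata's `cc` reports an exact OR interval by default, so its limits may differ slightly in the later digits, but the OR here rounds to the slide's 2.1 (1.6–2.7).

```python
import math

Z = 1.959964  # 97.5th percentile of the standard normal (two-sided 95% CI)


def ratio_ci(log_est, se):
    """Return (estimate, lower, upper) from a log-scale estimate and its SE."""
    return (math.exp(log_est),
            math.exp(log_est - Z * se),
            math.exp(log_est + Z * se))


def odds_ratio(a, b, c, d):
    """OR = ad/bc; rows = exposure (+/-), columns = outcome (+/-).

    Woolf (log-scale Wald) confidence interval.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio_ci(log_or, se)


def risk_ratio(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)] with a log-scale Wald interval."""
    n1, n0 = a + b, c + d          # exposed and unexposed totals
    log_rr = math.log((a / n1) / (c / n0))
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return ratio_ci(log_rr, se)


# Binge drinking (rows) by coronary calcium (columns)
or_est = odds_ratio(106, 585, 186, 2165)
rr_est = risk_ratio(106, 585, 186, 2165)
print("OR = %.2f (%.2f-%.2f)" % or_est)
print("RR = %.2f (%.2f-%.2f)" % rr_est)
```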

2 x 2 Tables: Is male gender a confounder?

[Diagram: male gender, binge drinking, and coronary calcium, with the link between binge drinking and coronary calcium marked “?”]

2 x 2 Tables
- Men are more likely to binge: 34% of men vs. 14% of women.
- Men more often have coronary calcium: 15% of men vs. 5% of women.

2 x 2 Tables: Is it a mediator (an intermediate step along the causal pathway)?
- No.

[Diagram: male gender, binge drinking, and coronary calcium, with the link between binge drinking and coronary calcium marked “?”]

2 x 2 Tables
But what does confounding look like in contingency tables? And how do you adjust for it?
- Stratify.
- Examine the stratum-specific estimates (for interaction).
- Combine the estimates if appropriate (if there is no interaction), as a weighted average of the stratum-specific estimates.

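2 x 2 Tables: first, stratify by gender… (CAC = coronary calcium)

Crude (all participants): RR = 1.94 (1.55–2.42)

In men (34% binge; 15% have CAC):

| Binge | CAC + | CAC − |
|---|---|---|
| + | 89 | 374 |
| − | 118 | 801 |

RR = 1.50 (1.16–1.93)

In women (14% binge; 5% have CAC):

| Binge | CAC + | CAC − |
|---|---|---|
| + | 17 | 211 |
| − | 68 | 1364 |

RR = 1.57 (0.94–2.62)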

2 x 2 Tables: …compare the stratum-specific estimates (they’re about the same): RR = 1.50 (1.16–1.93) in men and RR = 1.57 (0.94–2.62) in women…

2 x 2 Tables: …and then “combine” the estimates. The stratum-specific RRs of 1.50 (1.16–1.93) in men and 1.57 (0.94–2.62) in women combine to RRadj = 1.51 (1.21–1.89).

2 x 2 Tables: crude, stratum-specific, and adjusted estimates

| Estimate | RR (95% CI) |
|---|---|
| Crude | 1.94 (1.55–2.42) |
| Men (34% binge, 15% CAC) | 1.50 (1.16–1.93) |
| Women (14% binge, 5% CAC) | 1.57 (0.94–2.62) |
| Adjusted for gender (RRadj) | 1.51 (1.21–1.89) |
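To make “weighted average of stratum-specific estimates” concrete, here is a minimal Python sketch of the Mantel-Haenszel combined risk ratio. Python is used only to expose the arithmetic; in the lecture, `cs ..., by(male)` does this. The cell counts come from the men’s and women’s 2 x 2 tables and use the row-wise layout: a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases. Each stratum gets the weight c × (exposed total) / (stratum size), and the pooled RR is the weighted average of the stratum RRs. With these counts the sketch reproduces the crude RR of 1.94, the stratum RRs of 1.50 and 1.57, the adjusted RR of 1.51, and the M-H weights of about 39.53 (men) and 9.34 (women). The sketch also prints the binge and CAC prevalence in each sex.

```python
def risk_ratio(a, b, c, d):
    """RR for one 2x2 table; rows = binge (+/-), columns = CAC (+/-)."""
    return (a / (a + b)) / (c / (c + d))


def mh_risk_ratio(strata):
    """Mantel-Haenszel pooled RR as a weighted average of stratum RRs.

    Each stratum is (a, b, c, d): a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    Returns (pooled RR, list of M-H weights).
    """
    weights, rrs = [], []
    for a, b, c, d in strata:
        n_exposed, n_total = a + b, a + b + c + d
        weights.append(c * n_exposed / n_total)   # M-H weight for this stratum
        rrs.append(risk_ratio(a, b, c, d))
    pooled = sum(w * rr for w, rr in zip(weights, rrs)) / sum(weights)
    return pooled, weights


men = (89, 374, 118, 801)
women = (17, 211, 68, 1364)
crude = tuple(m + w for m, w in zip(men, women))   # collapse over gender

rr_crude = risk_ratio(*crude)
rr_men = risk_ratio(*men)
rr_women = risk_ratio(*women)
rr_mh, weights = mh_risk_ratio([men, women])

for label, (a, b, c, d) in [("men", men), ("women", women)]:
    n = a + b + c + d
    print(f"{label}: binge {(a + b) / n:.1%}, CAC {(a + c) / n:.1%}")
print(f"crude RR {rr_crude:.2f}; men {rr_men:.2f}; women {rr_women:.2f}")
print(f"M-H RR {rr_mh:.2f}; weights {weights[0]:.2f} (men), {weights[1]:.2f} (women)")
```

The weighted-average form and the usual M-H ratio give the same number. Each weight times its stratum RR simplifies to a × (unexposed total) / (stratum size), which is the term in the standard M-H numerator.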

2 x 2 Tables: How do we do this with Stata?
- `tabulate`: the output is not exactly what we want.
- The “epitab” commands: Stata’s answer to stratified analyses
  - `cs`, `cc`
  - `csi`, `cci`
  - `tabodds`, `mhodds`

2 x 2 Tables: example (demo using Stata)

```stata
cs cac binge
cs cac binge, by(male)
cs cac modalc
cs cac modalc, by(racegender)
cc cac binge
```

2 x 2 Tables: example of a crude (unadjusted) association

```
. cs cac binge

                 | Binge pattern [>5 drinks|
                 | on occasion]            |
                 |   Exposed   Unexposed   |      Total
-----------------+-------------------------+------------
           Cases |       106         186   |        292
        Noncases |       585        2165   |       2750
-----------------+-------------------------+------------
           Total |       691        2351   |       3042
                 |                         |
            Risk |  .1534009    .0791153   |   .0959895
                 |                         |
                 |      Point estimate     |    [95% Conf. Interval]
                 |-------------------------+------------------------
 Risk difference |         .0742856        |    .0452852     .103286
      Risk ratio |         1.938954        |    1.551487    2.423187
 Attr. frac. ex. |          .484258        |     .355457    .5873203
 Attr. frac. pop |         .1757923        |
                 +--------------------------------------------------
                               chi2(1) =    33.96  Pr>chi2 = 0.0000
```

2 x 2 Tables: example of confounding

```
. cs cac binge, by(male)

            male |       RR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
               0 |   1.570175      .9402789   2.622042     9.339759
               1 |   1.497071      1.164201   1.925117     39.53256
-----------------+-------------------------------------------------
           Crude |   1.938954      1.551487   2.423187
    M-H combined |   1.511042      1.205656    1.89378
-------------------------------------------------------------------
Test of homogeneity (M-H)      chi2(1) =     0.027  Pr>chi2 = 0.8700
```

2 x 2 Tables: example of effect modification

```
. cs cac modalc, by(racegender)

      racegender |       RR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
     Black women |     .75888      .3595892   1.601547     8.043758
     White women |   .8960739      .4971477    1.61511     11.07552
       Black men |   1.945668      1.114927     3.3954     8.304878
       White men |   .9279831        .66551   1.293974     29.45557
-----------------+-------------------------------------------------
           Crude |    1.30072      1.023022   1.653798
    M-H combined |   1.046446      .8225915   1.331218
-------------------------------------------------------------------
Test of homogeneity (M-H)      chi2(3) =     6.245  Pr>chi2 = 0.1003
```

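2 x 2 Tables: immediate commands
- `csi`, `cci`
- No dataset required, just the 2 x 2 cell frequencies.
- Note the argument order: exposed cases, unexposed cases, exposed noncases, unexposed noncases. This is the cell order of the `cs` output, not the row-by-row a, b, c, d of the contingency table shown earlier.

```stata
* csi #a #b #c #d
* a = exposed cases, b = unexposed cases, c = exposed noncases, d = unexposed noncases
* same table as: cs cac binge
csi 106 186 585 2165
```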
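Because `csi` needs only the four cell counts, its summary measures are easy to reproduce by hand. The Python sketch below is for illustration only; it is not Stata’s code. It takes the cells in `csi` order and computes the risks, risk difference, risk ratio, attributable fractions, and the uncorrected Pearson chi-square. Confidence intervals are omitted. With 106, 186, 585, 2165 it matches the point estimates in the `cs cac binge` output, for example RR 1.938954, attributable fraction among the exposed .484258, and chi2 33.96.

```python
def csi_summary(a, b, c, d):
    """Summary measures for a cohort 2x2 table, cells in csi order.

    a = exposed cases,    b = unexposed cases,
    c = exposed noncases, d = unexposed noncases.
    """
    n1, n0 = a + c, b + d              # exposed, unexposed totals
    m1, m0 = a + b, c + d              # cases, noncases totals
    n = n1 + n0
    risk_exp, risk_unexp, risk_tot = a / n1, b / n0, m1 / n
    rr = risk_exp / risk_unexp
    # Pearson chi-square for the 2x2 table (no continuity correction)
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n0 * m1 * m0)
    return {
        "risk_exposed": risk_exp,
        "risk_unexposed": risk_unexp,
        "risk_total": risk_tot,
        "risk_difference": risk_exp - risk_unexp,
        "risk_ratio": rr,
        "attr_frac_exposed": (rr - 1) / rr,
        "attr_frac_pop": (risk_tot - risk_unexp) / risk_tot,
        "chi2": chi2,
    }


for name, value in csi_summary(106, 186, 585, 2165).items():
    print(f"{name:>18}: {value:.7f}")
```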

Multivariable adjustment
- Binge drinking appears to be associated with coronary calcium.
  - The association is partially due to confounding by gender.
- What about race? Age? SES? Smoking?

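Multivariable adjustment: manual stratification

| Analysis | # of 2 x 2 tables |
|---|---|
| Crude association | 1 |
| Adjust for gender | 2 |
| Adjust for gender, race | 4 |
| Adjust for gender, race, age | 68 |
| Adjust for gender, race, age + income, education | 816 |
| Adjust for gender, race, age, income, education + smoking | 2448 |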
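The table grows multiplicatively because each new adjustment variable splits every existing stratum by its number of categories. The short Python sketch below illustrates this and the resulting thinning of the data. Its level counts are an assumption: 17 age groups (68 / 4) and 12 income-by-education combinations (816 / 68) are inferred from the running totals in the table, not read from the dataset. The sample size of 3042 is the binge drinking by coronary calcium table. By the last step there are only about 1.2 people per 2 x 2 table on average.

```python
def cumulative_strata(level_counts):
    """Running product of category counts = number of 2 x 2 tables needed."""
    totals, running = [], 1
    for levels in level_counts:
        running *= levels
        totals.append(running)
    return totals


# Levels added at each step. ASSUMPTION: 17 age groups and 12 income-by-
# education combinations are inferred from the running totals (68 / 4 and
# 816 / 68); they are not taken from the dataset.
steps = ["Crude", "+ gender", "+ race", "+ age", "+ income, education", "+ smoking"]
levels = [1, 2, 2, 17, 12, 3]
n_participants = 3042   # size of the binge drinking / CAC table

for step, n_tables in zip(steps, cumulative_strata(levels)):
    print(f"{step:<22} {n_tables:>5} tables, "
          f"{n_participants / n_tables:7.1f} people per table on average")
```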

Multivariable adjustment: `cs` command
- `cs` does manual stratification for you:
  - lists results for every stratum,
  - tests for overall homogeneity,
  - gives adjusted and crude results.
- Demo: `cs cac binge, by(male black age)`
- Can’t interpret interactions!

Multivariable adjustment: `mhodds` command
- `mhodds` allows you to look at specific interactions, adjusted for multiple covariates:
  - does the same stratification for you,
  - gives adjusted results for each level of the interaction (`by()`) variable,
  - gives a p-value for the specific interaction (test of homogeneity),
  - gives a summary adjusted result.
- Demo: `mhodds cac binge age, by(racegender)`
- But strata get thin!

Multivariable adjustment: `logistic` command
- Assumes a logit model (await your biostatistics class for details!)
- Coefficients are estimated; no actual stratification
- Continuous variables are used as they are

Multivariable adjustment: `logistic` command

Basic syntax:

```stata
logistic outcomevar [predictorvar1 predictorvar2 predictorvar3 ...]
```

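Multivariable adjustment: `logistic` command

If you are using any categorical predictors, add the `i.` prefix:

```stata
logistic outcomevar [i.catvar var2 ...]
```

- This creates “dummy variables” on the fly.
- If you forget, Stata won’t know the variable is categorical. It will treat the codes as a continuous variable with a single linear effect, and you’ll get the wrong answer!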
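The Python sketch below shows what “dummy variables on the fly” means. It illustrates the idea only and is not Stata’s internal code. It creates one 0/1 indicator per non-reference level, and by default the reference is the lowest level, which is also Stata’s default base level for `i.`. The smoking codes 0, 1, 2 are hypothetical example values. Without `i.`, a three-level variable enters the model as a single slope, which forces the log odds ratio for level 2 vs. 0 to be exactly twice that for level 1 vs. 0.

```python
def dummy_code(values, base=None):
    """Build 0/1 indicator variables for a categorical variable.

    One indicator per level other than the base (reference) level; the
    base defaults to the lowest level.
    """
    levels = sorted(set(values))
    if base is None:
        base = levels[0]
    return {level: [1 if v == level else 0 for v in values]
            for level in levels if level != base}


smoke = [0, 2, 1, 0, 2]          # hypothetical smoking codes for 5 people
dummies = dummy_code(smoke)
for level, column in dummies.items():
    print(f"smoke=={level}:", column)
```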

Multivariable adjustment: `logistic` command demo

```stata
logistic cac binge
logistic cac binge male
logistic cac binge male black
logistic cac binge male black age
logistic cac binge male black age i.smoke
logistic cac binge##i.racegender age i.smoke
logistic cac modalc##racegender
```

Multivariable adjustment: `logistic` command demo

```
. logistic cac binge male black age i.smoke

Logistic regression                               Number of obs   =       3036
                                                  LR chi2(6)      =     211.95
                                                  Prob > chi2     =     0.0000
Log likelihood = -852.99988                       Pseudo R2       =     0.1105

------------------------------------------------------------------------------
         cac | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       binge |   1.387573   .1985356     2.29   0.022      1.04825    1.836736
        male |   3.253031   .4608842     8.33   0.000     2.464287    4.294227
       black |   .7282563   .0994953    -2.32   0.020     .5571755    .9518675
         age |    1.19833    .025771     8.41   0.000     1.148869     1.24992
             |
       smoke |
          1  |   1.357694   .2308652     1.80   0.072     .9728859    1.894707
          2  |   2.120925   .3302699     4.83   0.000     1.563063     2.87789
------------------------------------------------------------------------------
```
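Stata reports odds ratios, but the model is fit on the log-odds scale. The Python sketch below checks how the columns relate, as illustrative arithmetic only. It assumes the Std. Err. column is the delta-method standard error of the OR, so that SE(log OR) = SE(OR) / OR. From the OR and its standard error it recovers z and the 95% CI for the binge and age rows of the output. The age OR of 1.20 is per one unit of age as coded, since continuous variables enter the model as they are.

```python
import math

Z = 1.959964   # standard normal quantile for a two-sided 95% interval


def or_inference(odds_ratio, se_or):
    """Recover the coefficient's z statistic and the OR's 95% CI.

    Assumes the reported Std. Err. is the delta-method SE of the odds ratio,
    SE(OR) = OR * SE(b), so SE(b) = SE(OR) / OR; the interval is built on the
    log-odds scale and then exponentiated.
    """
    b = math.log(odds_ratio)          # logistic regression coefficient
    se_b = se_or / odds_ratio         # standard error of the coefficient
    z = b / se_b
    return z, math.exp(b - Z * se_b), math.exp(b + Z * se_b)


# Rows copied from the output: (OR, Std. Err.)
for name, (or_, se) in {"binge": (1.387573, 0.1985356),
                        "age": (1.19833, 0.025771)}.items():
    z, lower, upper = or_inference(or_, se)
    print(f"{name}: z = {z:.2f}, 95% CI {lower:.4f} to {upper:.4f}")
```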

`logistic` command: interaction demo

```
. logistic cac modalc##racegender age i.smoke

Logistic regression                               Number of obs   =       2795
                                                  LR chi2(10)     =     186.28
                                                  Prob > chi2     =     0.0000
Log likelihood = -739.54359                       Pseudo R2       =     0.1119

------------------------------------------------------------------------------
         cac | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.modalc |   .6024889   .2430813    -1.26   0.209     .2732258    1.328546
             |
  racegender |
          2  |   1.018361   .3137632     0.06   0.953     .5567262    1.862783
          3  |   1.601149    .519393     1.45   0.147     .8478374    3.023786
          4  |   4.119486   1.100853     5.30   0.000     2.439922    6.955209
             |
     modalc# |
  racegender |
        1 2  |   1.422897   .7314808     0.69   0.493     .5195041    3.897247
        1 3  |   2.867897   1.473405     2.05   0.040     1.047736    7.850102
        1 4  |   1.546468   .7057105     0.96   0.339     .6322751    3.782472
             |
         age |   1.184036   .0271845     7.36   0.000     1.131937    1.238534
             |
       smoke |
          1  |   1.438413   .2623889     1.99   0.046      1.00603    2.056629
          2  |   2.464978   .4157232     5.35   0.000     1.771154    3.430597
------------------------------------------------------------------------------
```
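Because the model includes `modalc##racegender`, the `1.modalc` OR applies only to the base racegender group. The modalc OR in any other group is the base OR multiplied by that group’s interaction OR, because the coefficients add on the log scale. The Python arithmetic below does this with the point estimates from the interaction output. The group labels are an assumption: the row order of the `cs cac modalc, by(racegender)` output suggests 1 = Black women, 2 = White women, 3 = Black men, 4 = White men, but the coding is not shown in the output. In Stata itself, `lincom` with its `or` option (for example `lincom _b[1.modalc] + _b[1.modalc#3.racegender], or`) should give the same combination together with a confidence interval.

```python
# Odds ratios copied from the interaction model output
or_base = 0.6024889                    # 1.modalc: modalc OR in racegender group 1
or_interaction = {2: 1.422897,         # 1.modalc#2.racegender
                  3: 2.867897,         # 1.modalc#3.racegender
                  4: 1.546468}         # 1.modalc#4.racegender

# ASSUMED labels, inferred from the row order of the cs ..., by(racegender) output
labels = {1: "Black women", 2: "White women", 3: "Black men", 4: "White men"}

# OR within group k = exp(b_modalc + b_interaction_k) = OR_base * OR_interaction_k
stratum_or = {1: or_base}
for group, or_int in or_interaction.items():
    stratum_or[group] = or_base * or_int

for group, value in stratum_or.items():
    print(f"{labels[group]:<12} modalc OR = {value:.2f}")
```

Under that labeling, only group 3 has a modalc OR above 1 (about 1.73), which echoes the stratified `cs` results.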

Multivariable adjustment: `logistic` command

Pros
- Provides all ORs in the model
- Accepted approach (`mhodds` is rarely used by statisticians)
- Can deal with continuous variables (like age)
- Better estimation for large models?

Cons
- Interaction testing is more cumbersome, less automatic
- More assumptions
- Harder to test for trends

Multivariable adjustment

The syntax for linear regression and other types of regression is the same as for logistic regression, except for the initial command:

```stata
regress outcomevar [predictorvar1 predictorvar2 predictorvar3 ...]
ologit outcomevar [predictorvar1 predictorvar2 predictorvar3 ...]
```

The same pattern applies to other regression commands.

Testing for trends: `tabodds` command

To test for a trend in a dichotomous outcome across increasing categories of an ordinal categorical variable:

```
. tabodds cac alccat

--------------------------------------------------------------------------
     alccat |   cases   controls      odds      [95% Conf. Interval]
------------+-------------------------------------------------------------
          0 |    110      1325      0.08302     0.06835    0.10084
         <1 |     90       933      0.09646     0.07770    0.11976
      1-1.9 |     46       295      0.15593     0.11429    0.21275
         2+ |     45       193      0.23316     0.16856    0.32252
--------------------------------------------------------------------------
Test of homogeneity (equal odds): chi2(3)  =    36.70
                                  Pr>chi2  =   0.0000

Score test for trend of odds:     chi2(1)  =    32.20
                                  Pr>chi2  =   0.0000
```
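The score test for trend can be checked by hand. The Python sketch below computes the odds in each alccat category. It then computes one standard form of the 1-df score test for a linear trend in the odds (Mantel’s extension, with an N − 1 variance). It assumes equally spaced scores 0, 1, 2, 3 for the four categories; any equally spaced scores give the same statistic. With the counts in the `tabodds` table it reproduces the chi2(1) of 32.20. Whether Stata uses exactly this variance formula internally is an assumption.

```python
def odds_by_category(cases, controls):
    """Odds of being a case in each category."""
    return [a / c for a, c in zip(cases, controls)]


def score_trend_test(cases, controls, scores):
    """1-df score chi-square for a linear trend in odds across ordered groups."""
    totals = [a + c for a, c in zip(cases, controls)]
    n = sum(totals)
    m1, m0 = sum(cases), sum(controls)            # total cases, total controls
    sum_st = sum(s * t for s, t in zip(scores, totals))
    sum_s2t = sum(s * s * t for s, t in zip(scores, totals))
    # Observed minus expected score-weighted number of cases
    u = sum(s * a for s, a in zip(scores, cases)) - m1 * sum_st / n
    # Variance of u under the null hypothesis of no trend
    var = m1 * m0 * (n * sum_s2t - sum_st ** 2) / (n ** 2 * (n - 1))
    return u ** 2 / var


cases = [110, 90, 46, 45]          # alccat: 0, <1, 1-1.9, 2+
controls = [1325, 933, 295, 193]
scores = [0, 1, 2, 3]              # assumed equally spaced scores

print([round(o, 5) for o in odds_by_category(cases, controls)])
print(f"score test for trend: chi2(1) = {score_trend_test(cases, controls, scores):.2f}")
```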

Testing for trends: `tabodds` command

Adjustment for multiple variables is possible:

```
. tabodds cac alccat, adjust(age male black)

Mantel-Haenszel odds ratios adjusted for age, male and black
---------------------------------------------------------------------------
     alccat | Odds Ratio      chi2     P>chi2     [95% Conf. Interval]
------------+--------------------------------------------------------------
          0 |   1.000000        .          .             .          .
         <1 |   0.927877      0.22     0.6387     0.678791    1.268367
      1-1.9 |   1.323939      1.98     0.1594     0.894483    1.959584
         2+ |   2.051599     11.81     0.0006     1.349690    3.118536
---------------------------------------------------------------------------
Score test for trend of odds: chi2(1)  =    10.99
                              Pr>chi2  =   0.0009
```

Approaching your analysis
- The number of potential models/analyses is daunting.
  - Where do you start? How do you finish?
- My suggestion:
  - Explore.
  - Plan the definitive analysis; make dummy tables/figures.
  - Do the analysis (do/log files); fill in the tables/figures.
  - Show to collaborators; iterate as needed.
  - Write the paper.

Summary
- Make sure you understand confounding and interaction with 2 x 2 tables in Stata.
- The epitab commands are a great way to explore your data.
  - Emphasis on interaction
- Logistic regression is a more general, ubiquitous approach, but testing for interactions and trends is more difficult.

In lab today…
- Lab 5
  - Epi analysis of the coronary calcium dataset
  - Walks you through evaluation of confounding and interaction
- Judgment calls: often there is no right answer. Focus on reasoning!
- Reminder: put your answers as comments in the do-file, for example:

```stata
* 15c - 15%, p<.001
```

Next week: Tables and figures