Sec 9C – Logistic Regression and Propensity scores


1 Sec 9C – Logistic Regression and Propensity scores

2 Propensity
In a randomized trial, we know the probability (“propensity”) that a person with a given set of covariates or risk factors (age, gender, …) will be in group A or B. In a randomized trial with 50:50 allocation, that probability is 50% for A and for B, whatever the covariate values. Everyone has the SAME propensity and, on average, the covariates are the same in A and B. These two facts are linked.

3 Controlling for confounding
In comparing group A to B in a non-randomized study, one may have confounding, as the risk factors are not necessarily balanced between the two groups. One option to control for confounding is to include all the potential covariates in a multivariable model. If there are only a few covariates, another option is to make strata. Within a stratum, there would be no association between treatment (A or B) and covariates. For example, if gender and smoking were the only risk factors, one could compare A to B separately in male smokers, female smokers, male non-smokers and female non-smokers.
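To make the stratification idea concrete, the sketch below enumerates the strata formed by crossing categorical covariates. Gender and smoking are the two covariates from the example above; the third covariate (coffee) and the helper name `make_strata` are hypothetical, added only to show how quickly the number of strata grows.

```python
from itertools import product

def make_strata(covariates):
    """Return one dict per stratum: every combination of covariate levels."""
    names = list(covariates)
    return [dict(zip(names, levels)) for levels in product(*covariates.values())]

# The two risk factors from the example: 2 x 2 = 4 strata.
two = make_strata({"gender": ["male", "female"], "smoker": [True, False]})

# Adding one more binary covariate (hypothetical) doubles the count.
three = make_strata({
    "gender": ["male", "female"],
    "smoker": [True, False],
    "coffee": [True, False],
})

for stratum in two:
    print(stratum)
print(len(two), len(three))
```

With k binary covariates there are 2**k strata, which is why direct stratification is only practical when there are few covariates.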

4 However, if we knew the probability of each person being assigned to treatment A (= 1 - probability of assignment to B), one can show that if one stratifies or matches on this probability (this propensity), the average values of the covariates within each stratum (or each match) are (at least) roughly the same between the two treatments! That is, it is not necessary to use all the covariates directly to make (too many) strata or matches. Among those with the same propensity, the covariate distributions are the same (or very similar) in the two treatment groups.

5 Logistic for estimating propensity
While we do not know the probability of assignment to A (and B), we can model it using logistic regression. Here the outcome is treatment group (A or B) and the potential confounders are the predictors. We can then use the logit score to “summarize” all the covariates in a single score. We can then make strata using this score or use it as a single continuous covariate. We do not have to be concerned about whether this model for A or B is “correct” or has any meaning, as long as the strata made in this way produce balance.
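As a rough illustration of modeling treatment assignment with logistic regression, the sketch below fits a two-covariate model by plain gradient ascent using only the standard library. The data are synthetic (not the data in these slides), and `fit_logistic`, `inv_logit`, the learning rate and the iteration count are all assumptions made for the example; in practice a statistics package would do the fitting.

```python
import math
import random

def inv_logit(z):
    """Logistic transform: maps a logit score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Fit intercept and slopes by gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)                      # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            score = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = yi - inv_logit(score)       # observed minus fitted
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Synthetic data: coffee and tea drinkers are made more likely to be in
# the NEW group (y = 1). These true coefficients are invented.
random.seed(0)
X, y = [], []
for _ in range(1000):
    coffee, tea = random.randint(0, 1), random.randint(0, 1)
    p_new = inv_logit(-1.5 + 1.5 * coffee + 1.5 * tea)
    X.append([coffee, tea])
    y.append(1 if random.random() < p_new else 0)

beta = fit_logistic(X, y)
# Logit score and estimated propensity for every subject.
scores = [beta[0] + beta[1] * c + beta[2] * t for c, t in X]
propensity = [inv_logit(s) for s in scores]
print([round(b, 2) for b in beta])
```

The outcome here is the treatment group, not the clinical result; the fitted `propensity` values are what would then be stratified, matched on, or used as a covariate.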

6 Example: tooth whitening
Is a new treatment for "whiter teeth" better than the standard treatment?
Sample of n=350 people. This is an observational study, not a randomized trial.
t test comparing unadjusted mean gray scale scores (high is bad):

Group   n     mean    SD     SEM
STD     208   39.45   24.1   1.67
NEW     142   42.51   20.8   1.75

Mean difference 3.06 (SE 2.49), t=-1.23, p=0.219
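The unadjusted comparison can be checked from the summary statistics alone. The sketch below assumes an equal-variance (pooled) two-sample t test; the slide does not say which variant was used, but the pooled version reproduces the reported t. The function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, se_diff, df

# STD group first, NEW group second, values from the slide.
t, se_diff, df = pooled_t(208, 39.45, 24.1, 142, 42.51, 20.8)
print(round(t, 2), round(se_diff, 2), df)
```

Because the inputs are rounded, the standard error of the difference comes out slightly below the 2.49 shown on the slide, while t still rounds to -1.23.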

7 Covariate comparison: not the same

                STD, n=208             NEW, n=142
variable        mean    SD     SEM     mean    SD     SEM     p value
age             22.36   6.47   0.45    24.4    6.33   0.53    0.004
sugar use       6.10    3.08   0.21    5.84    3.06   0.26    0.435

                STD, n=208     NEW, n=142
variable        pct     SE     pct     SE      p value
male            28.4%   3.1%   47.2%   4.2%    0.0003
floss           28.9%   ?      35.9%   4.0%    0.1629
yearly clean    31.7%   3.2%   32.4%   3.9%    0.8960
drink coffee    42.3%   3.4%   74.7%   3.7%    <0.0001
drink tea       30.8%   ?      62.7%   4.1%    ?
use mouthwash   22.1%   2.9%   25.4%   ?       0.4827

? = value missing from the slide transcript

8 Logistic model for “new tx”-propensity
variable       Log OR    SE       p value
Intercept      -1.798    0.5417   0.0009
Age            0.0214    0.0196   0.2744
Male           0.3898    0.2559   0.1277
Floss          0.3280    0.2601   0.2073
Yearly clean   ?         0.2556   0.8319
Sugar use      ?         0.0393   0.3078
Coffee         0.9042    0.2767   0.0011
Tea            0.8681    0.2570   0.0007
Mouthwash      ?         0.2844   0.7228

? = value missing from the slide transcript (the score formula shows these three coefficients are negative)

Score = -1.798 + 0.0214 Age + 0.3898 Male + 0.3280 Floss - ? Yearly clean - ? Sugar use + 0.9042 Coffee + 0.8681 Tea - ? Mouthwash

Propensity (“new tx”) = exp(score) / [1 + exp(score)]
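Two small conversions help in reading this model: exponentiating a log OR gives the odds ratio, and the logistic transform turns a score into a propensity. The sketch below uses the coffee and tea coefficients from the table; the score values in the loop are hypothetical, since not every coefficient is legible in the transcript, and the function names are made up for the example.

```python
import math

def propensity(score):
    """exp(score) / [1 + exp(score)], the formula on the slide."""
    return math.exp(score) / (1.0 + math.exp(score))

def odds_ratio(log_or):
    """Convert a logistic regression coefficient (log OR) to an odds ratio."""
    return math.exp(log_or)

# Coffee and tea coefficients from the table.
or_coffee = odds_ratio(0.9042)
or_tea = odds_ratio(0.8681)
print(round(or_coffee, 2), round(or_tea, 2))

# Hypothetical scores, only to show the shape of the transform.
for s in (-2.0, -1.0, 0.0, 1.0):
    print(s, round(propensity(s), 3))
```

A score of 0 corresponds to a propensity of exactly 0.5, and the odds of choosing the new treatment are roughly 2.4 to 2.5 times higher for coffee or tea drinkers, holding the other covariates fixed.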

9 Covariate compare by propensity strata
mean age
tx        stratum 1   stratum 2   stratum 3   stratum 4
STD       18.0        24.8        25.5        25.6
NEW       25.2        23.5        23.7        25.8
p value   0.0668      0.2648      0.1696      0.8743

mean sugar use
tx        stratum 1   stratum 2   stratum 3   stratum 4
STD       6.55        5.63        6.05        5.76
NEW       7.62        6.66        5.55        5.33
p value   0.4616      0.1587      0.3865      0.5455

pct male
tx        stratum 1   stratum 2   stratum 3   stratum 4
STD       3.6%        24.5%       44.7%       71.1%
NEW       0.0%        30.8%       46.0%       65.3%
p value   0.078       0.514       0.906       0.566

pct who floss
tx        stratum 1   stratum 2   stratum 3   stratum 4
STD       20.5%       34.7%       26.3%       42.1%
NEW       25.0%       23.1%       30.0%       53.1%
p value   0.838       0.225       0.702       0.307

10 Covariate compare by propensity strata
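pct yearly tooth clean
tx        stratum 1   stratum 2   stratum 3   stratum 4
STD       26.5%       40.8%       34.2%       28.9%
NEW       75.0%       25.6%       32.0%       34.7%
p value   0.070       0.126       0.827       0.566

For the remaining three covariates some cells are missing from the slide transcript, so the surviving values are listed in slide order rather than assigned to cells.

pct drink coffee
percentages: 0.0%, 86.8%, 100.0%, 46.2%, 78.0%
p values: 1.000, 0.274, 0.271

pct drink tea
percentages: 8.2%, 57.9%, 60.0%
p values: 0.040, 0.842

pct use mouthwash
percentages: 19.3%, 14.3%, 31.6%, 50.0%, 16.0%, 32.7%
p values (strata 1 to 4): 0.226, 0.186, 0.150, 0.915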

11 Gray scale means by propensity strata (quartiles)
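          STD           NEW
stratum   n     mean    n     mean    total n   difference   p value   score
1         83    21.3    4     27.5    87        6.2          0.5304    0-0.2
2         49    43.9    39    36.9    88        -7.0         0.0915    ?
3         38    53.9    50    40.6    88        -13.3        0.0014    ?
4         38    58.9    49    50.2    87        -8.7         0.0358    0.6+
total n   208           142           350

                         STD     NEW     difference   p value
adjusted mean            44.5    38.8    -5.7         0.06
unadjusted mean          39.4    42.5    3.1          0.21
adj mean, strata 2,3,4   52.2    42.5    -9.7

? = value missing from the slide transcript. The stratum 4 counts, the stratum 3 total n and the NEW mean for strata 2,3,4 are not printed in the transcript; they are implied by the column totals and the differences shown.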
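The adjusted means appear to be averages of the stratum means weighted by each stratum's total n. That weighting is an inference that reproduces the reported 44.5 and 38.8, not something the slide states. In the sketch below, the total n for stratum 3 (38 + 50) and for stratum 4 (350 minus the other three) are derived from the counts shown, and `weighted_mean` is a hypothetical helper.

```python
# (total n, STD mean, NEW mean) for each propensity stratum.
strata = [
    (87, 21.3, 27.5),
    (88, 43.9, 36.9),
    (88, 53.9, 40.6),   # total n derived as 38 + 50
    (87, 58.9, 50.2),   # total n derived as 350 - 87 - 88 - 88
]

def weighted_mean(rows, col):
    """Average of stratum means in column `col`, weighted by total n."""
    total_n = sum(r[0] for r in rows)
    return sum(r[0] * r[col] for r in rows) / total_n

adj_std = weighted_mean(strata, 1)
adj_new = weighted_mean(strata, 2)
print(round(adj_std, 1), round(adj_new, 1), round(adj_new - adj_std, 1))

# Same calculation restricted to strata 2, 3 and 4.
sub_std = weighted_mean(strata[1:], 1)
sub_new = weighted_mean(strata[1:], 2)
print(round(sub_std, 1), round(sub_new, 1), round(sub_new - sub_std, 1))
```

Using the same weights for both treatments is what removes the imbalance: stratum 1 holds 83 STD subjects but only 4 NEW subjects, which is why the unadjusted comparison points the other way.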

12 Propensity score as continuous covariate: regression on gray scale

variable         Regression coefficient   SE     p value
Intercept        52.57                    1.69   < ?
New tx           -9.77                    2.31   ?
score            17.56                    1.43   ?
New tx * score   -7.94                    2.76   0.0042

? = value missing from the slide transcript

R square = 0.328, SDe = 18.8

Q: If the propensity score is a good proxy for the 8 covariates, what should happen if any or all of the 8 covariates are added to the above model?

13 Propensity score as continuous covariate
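As the propensity to choose the NEW treatment increases, the mean difference between the two treatments increases in magnitude. The New tx * score coefficient in the regression is negative (-7.94), so the NEW group's mean gray scale falls further below the STD group's mean at higher scores.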
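The interaction model from the regression slide can be evaluated directly to see how the treatment difference changes with the score. The coefficients below are those reported in the regression table; the score values and the function names are hypothetical and chosen only for illustration.

```python
# Coefficients from the regression on gray scale.
B0, B_TX, B_SCORE, B_INTER = 52.57, -9.77, 17.56, -7.94

def predicted_gray(new_tx, score):
    """Predicted mean gray scale; new_tx is 1 for NEW and 0 for STD."""
    return B0 + B_TX * new_tx + B_SCORE * score + B_INTER * new_tx * score

def tx_difference(score):
    """NEW minus STD at a given score: B_TX + B_INTER * score."""
    return predicted_gray(1, score) - predicted_gray(0, score)

for s in (-1.0, 0.0, 1.0):
    print(s, round(tx_difference(s), 2))
```

The difference is B_TX + B_INTER * score, so each one-unit increase in the score moves it a further 7.94 gray scale points in favor of the NEW treatment.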

14 Advantages of propensity score
1. Reduces all the covariates to one dimension.
2. Easy to check whether the two groups being compared overlap on the score (i.e. on the covariates).
3. Does not extrapolate beyond the range of the data (unlike linear regression).
4. Robust: it does not matter if the model for the propensity score is incorrectly specified, as long as the covariates are balanced within the strata or matches made by the score.

Disadvantages
1. Can only have two groups (can be modified).
2. Does not directly assess the effects of covariates on the outcome.

15 Can check propensity score overlap between the two groups
Lack of overlap indicates that some subjects in one group have covariate values that are completely absent in the other group.
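One simple way to check overlap is to compare the range of propensity scores in the two groups and flag subjects outside the shared range. The sketch below uses made-up scores, and `common_support` and `outside_support` are hypothetical helper names; a histogram or boxplot of the score by group is the usual visual equivalent.

```python
def common_support(scores_a, scores_b):
    """Range of scores that occurs in both groups."""
    lo = max(min(scores_a), min(scores_b))
    hi = min(max(scores_a), max(scores_b))
    return lo, hi

def outside_support(scores, lo, hi):
    """Scores with no counterpart range in the other group."""
    return [s for s in scores if s < lo or s > hi]

# Hypothetical propensity scores, not taken from the slides.
std = [0.05, 0.10, 0.15, 0.30, 0.45, 0.60]
new = [0.25, 0.40, 0.55, 0.70, 0.85]

lo, hi = common_support(std, new)
print(lo, hi)
print(outside_support(std, lo, hi))   # STD subjects with no NEW counterpart
print(outside_support(new, lo, hi))   # NEW subjects with no STD counterpart
```

Subjects outside the shared range cannot be compared without extrapolation, so they are candidates for exclusion or at least separate reporting.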

