BIOST 536 Lecture 9: Prediction and Association Example

Low birth weight dataset
Consider a prediction model for low birth weight (< 2500 grams) given the collection of variables available.
Do not particularly care which variables are included.
Want to maximize our prediction of the outcome.
Need to validate our prediction on data not used to generate the model.

Outcome variable
Look at the distribution of birthweights.
Were low birthweight babies oversampled, or was there bias in recording?

Simple descriptives
The number of prior pre-term labors (ptl) may be too sparse to model; consider dichotomizing as none versus 1 or more (everptl).

Simple descriptives (continued)
The number of first trimester physician visits may also have to be grouped for analysis (ptvgrp, with levels 0, 1, 2+).
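
The two recodings described above can be sketched in a few lines. This is only an illustration: `everptl` and `ptvgrp` are the names used in the slides, while `ftv` (the raw visit count) and the sample values are assumptions made for the example, not the lecture data.

```python
def dichotomize_ptl(ptl):
    """Return 1 if there was at least one prior pre-term labor, else 0."""
    return 1 if ptl >= 1 else 0


def group_visits(ftv):
    """Collapse the visit count into three levels: 0, 1, and 2 (meaning 2 or more)."""
    return min(ftv, 2)


# Hypothetical raw values for six mothers (illustration only).
ptl_values = [0, 0, 1, 3, 0, 2]
ftv_values = [0, 1, 2, 4, 6, 0]

everptl = [dichotomize_ptl(p) for p in ptl_values]
ptvgrp = [group_visits(v) for v in ftv_values]

print("everptl:", everptl)
print("ptvgrp: ", ptvgrp)
```

Collapsing sparse upper categories this way trades some detail for more stable estimates when only a handful of observations fall in each of the higher counts.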

Relationship of LBW with continuous variables: age of mother and weight of mother
There is a possible univariate relationship of LBW with either age or weight.
Need to consider setting aside an internal validation sample.
There is not much data, so use 75% for training and 25% for validation.

Generate training and validation samples
Generate a random number U, uniform on the interval (0,1).
Assign an observation to the training sample if U < 0.75 and to the validation sample if U ≥ 0.75.
This will not guarantee that exactly 75% of the observations or cases are in the training sample.
If you want exactly 75% of the observations, sort by the random number and assign the first 0.75*n to the training sample.
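
Both assignment rules can be sketched with the standard library. The function names, the seed, and the sample size of 200 are assumptions chosen for illustration; they are not taken from the lecture.

```python
import random


def random_split(n, train_fraction=0.75, seed=536):
    """Flag an observation as training (True) when its uniform draw U < train_fraction."""
    rng = random.Random(seed)
    return [rng.random() < train_fraction for _ in range(n)]


def exact_split(n, train_fraction=0.75, seed=536):
    """Sort observations by U and put the first int(train_fraction * n) in training."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    order = sorted(range(n), key=lambda i: u[i])
    n_train = int(train_fraction * n)
    in_train = [False] * n
    for i in order[:n_train]:
        in_train[i] = True
    return in_train


n_obs = 200  # hypothetical sample size
approx_flags = random_split(n_obs)
exact_flags = exact_split(n_obs)

print("U < 0.75 rule: ", sum(approx_flags), "of", n_obs, "in training")
print("sort-by-U rule:", sum(exact_flags), "of", n_obs, "in training")
```

The first rule gives a training fraction that is only close to 75%, while the second fixes the count of observations. Neither rule fixes the share of cases in the training sample.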

Generate training and validation samples (continued)
This has achieved greater balance in the observations, but there are still not 75% of the cases in the training set.
Just use the original training classification in the analysis below.
Still need to consider how to model the continuous variables age and weight without being too complex:
Could just use linear age and weight terms.
Could categorize into age groups and weight groups.
Could use a simple polynomial (e.g. age and age squared).
Could use a smoother or a spline.
Could use fractional polynomials.

Fractional polynomials
First consider a degree-2 polynomial model for age.
The degree-2 model is not significantly better than age as a linear term, so just use age as a linear variable.

Fractional polynomials
Plot of the degree-2 model for age.

Fractional polynomials
Now consider a degree-2 polynomial model for weight.
The degree-2 model is not significantly better than weight as a linear term, so just use weight as a linear variable.

Fractional polynomials
Plot of the degree-2 model for weight.

Model exploration (stepwise)
Try to screen for possible predictors (forward stepwise).
Ptl is included as a linear term; may want to dichotomize.
Other race may be excluded due to sample size rather than magnitude.
Create indicator covariates.

Model exploration (stepwise)
Refit the model.
Now other race is significant, but smoke is also added to the model.
Backwards stepwise gives the same result (not shown).
Still may want to dichotomize ptl and put that in the model.

Model exploration
Refit the model, replacing ptl with everptl.
Same number of parameters; the 2nd model is "better" and simpler.
Look at some goodness-of-fit tests.
The test is not reliable since the number of observations is too close to the number of covariate combinations.

Model diagnostics
Use the Hosmer-Lemeshow goodness-of-fit test and calculate the c-statistic (lroc).
The model predicts pretty well for the training sample; also need to consider the validation sample.
Compute estimated probabilities and logits for both samples and compute predictive power.
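
As a rough sketch of what the c-statistic measures, the function below computes the proportion of case/non-case pairs in which the case has the higher estimated probability, counting ties as one half. The function name and the six observations are hypothetical; the same function could be applied separately to the training and validation samples.

```python
def c_statistic(outcomes, probabilities):
    """Proportion of (case, non-case) pairs ranked correctly; ties count as 0.5."""
    cases = [p for y, p in zip(outcomes, probabilities) if y == 1]
    controls = [p for y, p in zip(outcomes, probabilities) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical outcomes (1 = LBW) and estimated probabilities.
y = [1, 0, 1, 0, 0, 1]
p_hat = [0.80, 0.30, 0.45, 0.50, 0.10, 0.45]

c_value = c_statistic(y, p_hat)
print("c-statistic:", round(c_value, 3))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from non-cases.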

Model diagnostics
Estimation in the validation sample is still good, but certainly inferior to that in the training sample.

Model prediction
Look at classification statistics in the training and validation samples.
Are there any risk factors so strong that an LBW baby is probable?
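
Classification statistics of this kind can be sketched as follows, assuming an observation is classified as LBW when its estimated probability is at least 0.5. The cutoff, the function name, and the eight observations are assumptions for illustration only.

```python
def classification_stats(outcomes, probabilities, cutoff=0.5):
    """Cross-classify observed outcome against predicted class at the given cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(outcomes, probabilities):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1 and predicted == 0:
            fn += 1
        elif y == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical outcomes (1 = LBW) and estimated probabilities.
y = [1, 0, 1, 0, 0, 1, 0, 0]
p_hat = [0.80, 0.30, 0.45, 0.55, 0.10, 0.62, 0.20, 0.40]

stats = classification_stats(y, p_hat)
for name, value in stats.items():
    print(name, round(value, 3))
```

Running the same function on the training and validation samples gives a direct comparison of how much the classification performance degrades on data not used to fit the model.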

Model prediction
Consider smoking as a risk factor.
35% of smokers are predicted as LBW (but they have many other diverse risk factors).
Get the estimated probability of LBW by weight and smoking, with no other elevated risk factors.

Model prediction
Risk is higher for smokers with low weight, but they would still need another risk factor to have a low birthweight baby with probability greater than 50%.
Consider hypertension as well.

Model prediction
Would have to have several risk factors to be at high risk.
We did not consider interactions, but they may help only slightly in improving overall model prediction.

Model prediction
Do the same factors also predict actual birth weight in grams in a linear regression model?

Association example
Same dataset.
Now interested in whether smoking is related to low birthweight.
Assume this has not been tested before, but some animal models suggest a causal relationship.
May want to control for factors believed to be related to LBW and/or smoking.
Other variables are not of particular interest, but smoking may be a modifiable risk factor.
This specific hypothesis is proposed prior to data collection, so use all the available data.
The unadjusted odds ratio for smoking suggests a possible association.
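
An unadjusted odds ratio with a Wald interval can be computed directly from a 2x2 table, as in the sketch below. The cell counts are an assumption: they come from the widely distributed Hosmer and Lemeshow low birth weight data and are not shown in the slides, although they do reproduce the unadjusted OR of 2.02 with 95% CI (1.08, 3.78) reported later in the lecture.

```python
import math


def odds_ratio_wald(a, b, c, d, z_crit=1.959964):
    """Odds ratio, Wald 95% CI, and two-sided Wald p-value from a 2x2 table.

    a: exposed cases      b: exposed non-cases
    c: unexposed cases    d: unexposed non-cases
    """
    or_hat = (a * d) / (b * c)
    log_or = math.log(or_hat)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    lower = math.exp(log_or - z_crit * se)
    upper = math.exp(log_or + z_crit * se)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return or_hat, lower, upper, p_value


# Assumed counts (not shown in the slides): smokers 30 LBW / 44 not,
# non-smokers 29 LBW / 86 not.
or_hat, ci_lower, ci_upper, p_value = odds_ratio_wald(30, 44, 29, 86)
print("OR = %.2f, 95%% CI (%.2f, %.2f), Wald p = %.3f"
      % (or_hat, ci_lower, ci_upper, p_value))
```

The interval is built on the log scale and then exponentiated, which is why it is not symmetric around the odds ratio itself.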

Potential confounders or predictors of the outcome
May want to consider other variables that are potential confounders for the relationship of smoking with LBW.
May also want to consider other variables that predict the outcome even if they are not confounders; precision for the smoking variable may be improved.
What is the relationship of smoking to some of the other variables in the data?
Whites are much heavier smokers, so race could be a potential confounder.
Consider some of the other covariates.

Potential confounders or predictors of the outcome
Explore continuous covariates as well.

Control for age and race

Control for other factors
Age is neither a predictor nor a confounder, but leave it in the model anyway.
Race is both a predictor and a confounder of the association of smoking and low birthweight.
Now consider some other potential confounders/predictors.
Weight may also be a confounder and a predictor.
Add hypertension and uterine irritability to the model.

Control for other factors
The added variables are predictors, but not confounders.
The only other potentially modifiable risk factor and/or confounder might be the number of first trimester prenatal visits.
Conduct an unplanned exploratory analysis of this variable and the outcome.

Control for other factors
Not a strong predictor as a grouped linear variable.
Consider as a categorical variable.
This runs into numerical issues due to sparseness of the data.

Control for other factors
Collapse into three levels.
Not a strong predictor; return to the model for smoking, but add this as a potential confounder.

Control for other factors
Small change in the OR for smoking; leave in anyway.
Model shows no obvious lack of fit.

Unadjusted and Adjusted Odds Ratios

Model                                       OR for smoking   95% CI          Wald p-value
Unadjusted                                  2.02             (1.08, 3.78)    .028
Adjusted for age and race                   3.01             (1.45, 6.23)    .003
Adjusted for age and race & other factors   2.71             (1.23, 5.99)    .014

Suggests an association of smoking and low birthweight that remains after adjustment for age, race, weight of the mother, hypertension, uterine irritability, and number of physician visits.
LR test for smoking in the final model.
History of premature labor was accidentally omitted from this analysis but should have been included.

Effect Modification
Significant effect modification would render the entire previous analysis null and void.
Some analysts prefer to start with interactions to rule out effect modification before looking at confounding.
Add interactions between smoking and each covariate and test each in an LR test.
If significant, then the interpretation of the association of smoking and low birthweight depends on that effect modifier.
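
The likelihood ratio test for an interaction compares the log-likelihoods of the models fitted with and without the interaction terms. A minimal sketch follows, restricted to 1 or 2 degrees of freedom because those chi-square tail areas have closed forms available in the standard library. The function name and the log-likelihood values are hypothetical.

```python
import math


def lr_test(loglik_reduced, loglik_full, df=1):
    """LR statistic 2*(llf - llr) and its chi-square p-value for df = 1 or 2."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("full model cannot have a lower log-likelihood")
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi-square(1) > stat)
    elif df == 2:
        p_value = math.exp(-stat / 2.0)             # P(chi-square(2) > stat)
    else:
        raise ValueError("this sketch only handles df = 1 or 2")
    return stat, p_value


# Hypothetical log-likelihoods: main-effects model vs. model with one
# added smoking-by-covariate interaction term (1 df).
lr_stat, lr_p = lr_test(loglik_reduced=-100.0, loglik_full=-99.5, df=1)
print("LR chi-square = %.2f, p = %.4f" % (lr_stat, lr_p))
```

A single interaction with a binary or linear covariate uses 1 degree of freedom. An interaction with a three-level factor such as race, coded with two indicators, uses 2.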

Effect Modification with Age?
No apparent effect modification by age.

Effect Modification with Race?
No apparent effect modification by race.
No significant effect modification by any variable.

Final model
OR for a smoker with hypertension compared to a non-smoker without hypertension, all else being equal: OR = 2.71 × 6.68 ≈ 18.1.
OR for a smoker aged 30 compared to a nonsmoker aged 20, all else being equal.
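
Both comparisons follow from adding coefficients on the log-odds scale. The sketch below assumes a final model with no interaction terms and with age entered linearly in years. The symbols $\beta_{\text{smoke}}$, $\beta_{\text{ht}}$, and $\beta_{\text{age}}$ are notation introduced here, and the per-year odds ratio for age is left symbolic because it is not reproduced in the slides.

```latex
\begin{align*}
\mathrm{OR}_{\text{smoker with HT vs. non-smoker without HT}}
  &= \exp(\beta_{\text{smoke}} + \beta_{\text{ht}})
   = \mathrm{OR}_{\text{smoke}} \times \mathrm{OR}_{\text{ht}}
   = 2.71 \times 6.68 \approx 18.1, \\
\mathrm{OR}_{\text{smoker aged 30 vs. non-smoker aged 20}}
  &= \exp(\beta_{\text{smoke}} + 10\,\beta_{\text{age}})
   = 2.71 \times \mathrm{OR}_{\text{age}}^{10}.
\end{align*}
```

Here $\mathrm{OR}_{\text{age}} = \exp(\beta_{\text{age}})$ is the odds ratio per one-year increase in age from the final model.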