Stratified Covariate Balancing Using R


1 Stratified Covariate Balancing Using R
Farrokh Alemi, Ph.D. This presentation focuses on how to use the stratified covariate balancing package within R.

2 Download Stratified Balancing
The first step is to install the package. Note the capital letters in the package name; R is case-sensitive.

3 Load Package into Library
You need to load the plyr package as well.

4 Prepare Your Data: Remove Impossible Values
Before doing the analysis, make sure that your data fit the assumptions. Remove impossible values such as zero blood pressure, visits dated after the patient is reported to have died, pregnant males, and other anomalies in the data.
Examples: visit after death; negative ID; zero blood pressure; pregnant males.
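The checks listed on this slide can be sketched in base R. This is only an illustration: the data frame and every column name in it (id, male, pregnant, systolic_bp, visit_day, death_day) are assumptions made up for the example, not part of the package or of the presentation's data.

```r
# Hypothetical patient records; all column names are assumptions.
records <- data.frame(
  id          = c(1, 2, -3, 4, 5),
  male        = c(1, 0, 0, 1, 0),
  pregnant    = c(0, 1, 0, 1, 0),
  systolic_bp = c(120, 0, 135, 110, 128),
  visit_day   = c(10, 20, 30, 40, 50),
  death_day   = c(NA, NA, NA, NA, 45)  # NA: patient not reported to have died
)

# Flag rows that contain at least one impossible value.
impossible <- with(records,
  id < 0 |                                      # negative ID
  systolic_bp <= 0 |                            # zero blood pressure
  (male == 1 & pregnant == 1) |                 # pregnant male
  (!is.na(death_day) & visit_day > death_day)   # visit after death
)

# Keep only the rows that pass every check.
clean <- records[!impossible, ]
clean
```

Flagging first and filtering second makes it easy to inspect the rejected rows before they are dropped.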

5 Prepare Your Data: Impute Missing Values
Impute values that are missing at random. Typically, the mode or the average can be used for missing values. It is also helpful to impute missing values from the levels of other variables. Be careful: if data are not missing at random, you need to create a new dummy variable that is 1 when the variable is missing and 0 otherwise.
Options: predict from other variables; use mode; use average; no report, no diagnosis.
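A minimal base R sketch of these imputation options follows. The data frame, its column names, and the helper mode_value are hypothetical and introduced only for illustration; treating a missing diagnosis as 0 is one reading of "no report, no diagnosis" and should be checked against your own data.

```r
# Hypothetical data; column names are assumptions.
patients <- data.frame(
  age      = c(50, 60, NA, 70),
  diabetes = c(1, NA, 0, 0),
  smoker   = c("no", "yes", NA, "no"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing value of a vector.
mode_value <- function(x) {
  counts <- table(x)                 # table() ignores NA by default
  names(counts)[which.max(counts)]
}

# Not missing at random: record the missingness in a dummy variable,
# then apply "no report, no diagnosis".
patients$diabetes_missing <- as.integer(is.na(patients$diabetes))
patients$diabetes[is.na(patients$diabetes)] <- 0

# Missing at random: use the average for a continuous variable
# and the mode for a categorical one.
patients$age[is.na(patients$age)] <- mean(patients$age, na.rm = TRUE)
patients$smoker[is.na(patients$smoker)] <- mode_value(patients$smoker)

patients
```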

6 Prepare Your Data: Binary Indicators
Initial Analysis: Binary Indicators
The package transforms your continuous data into binary variables. It uses values above and below the average to define each binary variable. This allows a coarse matching of cases and controls. If you prefer a different matching, you can revise your data prior to reading it into R.
Options: more cases match to controls; use R discretization software; above or below average; worst category vs. all others; no report, then no diagnosis.

7 Prepare Your Data: Binary Indicators
Initial Analysis: Binary Indicators
Keep in mind that binary data lead to coarse matching, and more refined matching can be obtained through additional categories in the data. We find that in most data, breaking continuous variables into quintiles gives a more refined matching of cases and controls.
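The two levels of discretization described above can be sketched in base R, done on your own data before it is handed to the package. The age vector is made-up example data; cut and quantile are standard base R functions, not functions of the package.

```r
# Hypothetical continuous covariate.
age <- c(23, 35, 41, 47, 52, 58, 63, 69, 74, 88)

# Coarse matching: 1 if above the average, 0 otherwise.
age_above_avg <- as.integer(age > mean(age))

# More refined matching: assign each value to a quintile (1 to 5).
breaks <- quantile(age, probs = seq(0, 1, by = 0.2))
age_quintile <- cut(age, breaks = breaks, include.lowest = TRUE, labels = FALSE)

data.frame(age, age_above_avg, age_quintile)
```

More categories give a finer match but leave fewer observations per stratum, so fewer cases may find a matching control.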

8 Read Data Read into directory
Read your data. Here we are reading a csv file called “simulated bundled data.” Data came from

9 Look at Data Using fix(data)
Use the fix function in R to examine the data. The network shown here is the one that was used to simulate the data. If you know the structure of your data, check that the data reflect it.

10 Select Right Subset of Data
Data should include treatment, outcome, and covariates. Please note that the file you read into R must have a treatment variable, an outcome variable, and one or more covariates. There should not be any other variables in the data. For example, the data should not include a row number or ID.
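As a sketch of this step, the following base R code reads a CSV file and keeps only covariates, treatment, and outcome. The file is a tiny stand-in written to a temporary directory so the example is self-contained; the file name and all column names are assumptions, not the presentation's actual data.

```r
# Write a small stand-in file (hypothetical name and columns).
path <- file.path(tempdir(), "example_data.csv")
write.csv(
  data.frame(
    id        = 1:4,
    age_high  = c(1, 0, 1, 0),
    female    = c(0, 0, 1, 1),
    diabetes  = c(0, 1, 0, 1),
    treatment = c(1, 0, 1, 0),
    outcome   = c(0, 0, 1, 1)
  ),
  path, row.names = FALSE
)

# Read the data.
data <- read.csv(path)

# Keep covariates, treatment and outcome only; the ID column is dropped.
subset <- data[, c("age_high", "female", "diabetes", "treatment", "outcome")]

# Column positions of treatment and outcome in the reduced data set.
which(names(subset) == "treatment")
which(names(subset) == "outcome")
```

With this column order the treatment sits in column 4 and the outcome in column 5, which is the layout the later calls in this presentation assume.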

11 Don’t Stratify Variables on Causal Path
The data file should not contain any variables that are on the causal path from treatment to outcome. So treatment complications should not be included in the data file; otherwise the package will stratify them and distort the relationship between treatment and outcome.
Steps: examine the sequence of events; avoid complications of treatment; don’t stratify mediators; conduct collider tests.

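12 Balance Data
> balanced=stratadisc(4,5,subset)
This command shows how the package is called and its minimum output. Here 4 is the column number of the treatment, 5 is the column number of the outcome, and subset is the data set.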

13 Check Significance
In this output, the confidence interval for the common odds ratio across the strata does not include 1, so the effect is statistically significant at an alpha level of 5%.
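The rule "the interval does not include 1" can be illustrated without the package. The sketch below uses base R's mantelhaen.test, which reports a Mantel-Haenszel common odds ratio and its confidence interval for stratified 2x2 tables. This is a stand-in chosen for illustration; it is not claimed to be the calculation the package performs, and the counts are invented.

```r
# Hypothetical counts: treatment by outcome within two strata.
counts <- array(
  c(40, 20, 10, 30,    # stratum s1 (filled column by column)
    35, 25, 15, 25),   # stratum s2
  dim = c(2, 2, 2),
  dimnames = list(
    treatment = c("treated", "untreated"),
    outcome   = c("yes", "no"),
    stratum   = c("s1", "s2")
  )
)

# Common odds ratio across strata, with a 95% confidence interval.
result <- mantelhaen.test(counts)
common_or <- unname(result$estimate)
ci <- result$conf.int

# Significant at the 5% level if the interval excludes 1.
excludes_one <- ci[1] > 1 || ci[2] < 1

common_or
ci
excludes_one
```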

14 Check Clinical Significance
In massive data, effect size should be large. Keep in mind that in large data sets almost everything is statistically significant, so you also need to make sure that the effect size is large enough to be clinically meaningful.

15 Check Overlap
At least 60% of cases should match to controls. Also check the overlap between cases and controls. If the overlap is too low, the results cannot be generalized.
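One way to picture the 60% rule is to count the share of cases that fall in strata containing at least one control. The sketch below does this in base R; the strata table is invented, and this definition of overlap is an assumption made for illustration rather than the package's documented calculation.

```r
# Hypothetical counts of cases and controls per stratum.
strata <- data.frame(
  stratum  = c("A", "B", "C", "D"),
  cases    = c(30, 20, 10, 40),
  controls = c(50, 0, 5, 25)
)

# Cases can be matched only in strata that contain at least one control.
matched_cases <- sum(strata$cases[strata$controls > 0])
overlap <- matched_cases / sum(strata$cases)

overlap          # share of cases matched to controls
overlap >= 0.60  # rule of thumb from this slide
```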

16 Check the Strata
# Check the strata you have created
fix(balanced)
Look at the various strata that the package has created. Cases always get a weight of 1, and controls are weighted so that the weighted number of controls equals the number of cases. Check that the weights for controls in some strata are not radically different from the weights for controls in other strata.
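The weighting described here can be sketched by hand in base R. The strata table is invented, and the formula (control weight equals cases divided by controls within the stratum) is a simple reading of the sentence above, offered as an assumption rather than as the package's internal code.

```r
# Hypothetical strata in which both cases and controls are present.
strata <- data.frame(
  stratum  = c("A", "C", "D"),
  cases    = c(30, 10, 40),
  controls = c(50, 5, 25)
)

# Each case has weight 1; each control in a stratum gets cases / controls,
# so the weighted controls in that stratum add up to the number of cases.
strata$control_weight    <- strata$cases / strata$controls
strata$weighted_controls <- strata$control_weight * strata$controls

# A large ratio between the biggest and smallest weight deserves a closer look.
weight_ratio <- max(strata$control_weight) / min(strata$control_weight)

strata
weight_ratio
```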

17 Are Covariates Balanced?
Check that the weighting procedure balances the data. Look at the odds before and after balancing. The odds of observing a covariate in the treated group relative to the untreated group should be 1 to 1. Check that the weighting procedure has accomplished this and that all covariates are balanced. Stratified covariate balancing is guaranteed to balance all main effects and interactions among the covariates, so you will always see that the covariates are balanced. Still, check and use these charts, as many people do not believe that the data are balanced until they see it with their own eyes.
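The before-and-after comparison can be worked through on a toy example. The sketch below uses a single binary covariate x that also defines the strata; the counts and the weighting formula (cases divided by controls per stratum) are assumptions for illustration, not output of the package.

```r
# Hypothetical counts by level of one binary covariate x.
strata <- data.frame(
  x        = c(1, 0),
  cases    = c(30, 10),
  controls = c(10, 30)
)

# Odds of x = 1 among treated cases (each case has weight 1).
odds_treated <- strata$cases[strata$x == 1] / strata$cases[strata$x == 0]

# Odds of x = 1 among controls before weighting.
odds_controls_before <-
  strata$controls[strata$x == 1] / strata$controls[strata$x == 0]

# Weight controls so that weighted controls equal cases in each stratum.
strata$control_weight <- strata$cases / strata$controls
weighted_controls <- strata$controls * strata$control_weight
odds_controls_after <-
  weighted_controls[strata$x == 1] / weighted_controls[strata$x == 0]

# Ratio of odds, treated vs. untreated: far from 1 before, 1 after.
ratio_before <- odds_treated / odds_controls_before
ratio_after  <- odds_treated / odds_controls_after

ratio_before
ratio_after
```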

18 Conduct Sensitivity Analysis
# Sensitivity analysis of treatment in column 4
# Outcome in column 5
# In data set called subset
revised=sensdisc(4,5,subset)
revised
Use the function sensdisc to conduct a sensitivity analysis of the conclusions.

19 Most of the work in using stratified covariate balancing is in preparing the data before starting the analysis.

