Fall 2013 Biostat 511 (Biostatistics 511) Discussion Section, Week 4
Sandrine Moutou
Medical Biometry I
Discussion Outline
Perform a more in-depth analysis of a dataset using STATA, revisiting the FEV1 clinical trial data (first seen in Week 2):
- Reading data into STATA (Excel worksheet, text file)
- Creating variable labels and value labels
- Analyzing data with commonly used commands for statistical and graphical summaries
- Flagging outlying or possibly influential values
- Creating and executing STATA batch (.do) files
KEY: Interpreting the results in terms of the study objectives
Background
Cystic fibrosis (CF) affects 30,000 individuals in the U.S. The condition is complicated by recurrent pulmonary infection.
A study was conducted to determine whether the aerosolized antibiotic tobramycin was effective in treating a recurrent bacterial infection in CF patients.
520 CF patients aged 10 to 60 years were randomized to receive tobramycin or placebo in a double-blind controlled trial.
The primary endpoint was the pulmonary function test forced expiratory volume in one second (FEV1). Measurements were collected at baseline and again at the end of the 24-week study.
The FEV1 dataset contains three variables: baseline FEV1 (Y0), follow-up FEV1 at week 24 (Y1), and treatment assignment (T).
Preliminaries: Set up your Stata session
Start a .log file:
    log using "D:/smoutou/My Documents/week4disc.log"
Replace "smoutou" above with your username.
Load the FEV1 data from the text file FEV1ClinTrial.dat:
    infile Y1 Y0 T using "http://courses.washington.edu/b511/Data/FEV1ClinTrial.dat"
Also try importing an Excel dataset (using copy and paste).
Compute the simple difference between 24-week FEV1 and baseline FEV1:
    generate diff = Y1 - Y0
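For readers who want to check the data step outside Stata, here is a minimal Python sketch of the `infile` and `generate diff` commands. It assumes the .dat file is free-format whitespace-separated text read in the order Y1, Y0, T (the order named in the `infile` command); the helper names `read_fev1` and `add_diff` are hypothetical, and the sample values are made up.

```python
# Python analogue of:
#   infile Y1 Y0 T using ".../FEV1ClinTrial.dat"
#   generate diff = Y1 - Y0
# Assumption: free-format text, values read in sequence as Y1, Y0, T.

def read_fev1(text):
    """Parse whitespace-separated values into records with keys Y1, Y0, T."""
    values = [float(tok) for tok in text.split()]
    if len(values) % 3 != 0:
        raise ValueError("expected a multiple of 3 values (Y1 Y0 T)")
    rows = []
    for i in range(0, len(values), 3):
        y1, y0, t = values[i:i + 3]
        rows.append({"Y1": y1, "Y0": y0, "T": int(t)})
    return rows

def add_diff(rows):
    """Equivalent of generate diff = Y1 - Y0 (24-week minus baseline)."""
    for r in rows:
        r["diff"] = r["Y1"] - r["Y0"]
    return rows

# Made-up values, not the study data:
sample = "2.10 2.00 1\n1.50 1.60 0\n"
rows = add_diff(read_fev1(sample))
```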
Data Analysis: Labels and variable names
We can attach variable labels to the variables as follows:
    label variable Y1 "FEV1 at week 24"
    label variable Y0 "FEV1 at baseline"
    label variable T "Treatment assignment"
    label variable diff "24 WK FEV1 - BL FEV1"
Value labels (for the levels of a variable):
    label define trtlabel 0 "Placebo" 1 "Treatment"
    label values T trtlabel
Univariate summaries
First characterize the distributions of Y0, Y1, T, and diff with univariate summaries.
For continuous variables, report measures of central tendency (e.g., mean, median) and of variation (e.g., SD, range, IQR).
Look for outliers and possible anomalies in the data.
    tabstat Y0 Y1 T diff, stats(n mean sd p50 min max) col(stat)
    tabulate T
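A rough Python analogue of the two commands above, as a sketch for checking results by hand. It assumes the SD uses the n - 1 denominator and that the median averages the two middle values when n is even; `tabstat` and `tabulate` here are hypothetical helper names that only echo the Stata commands, and the input values are made up.

```python
import statistics
from collections import Counter

def tabstat(values):
    """Rough analogue of: tabstat var, stats(n mean sd p50 min max)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample SD, n - 1 denominator
        "p50": statistics.median(values),    # averages middle pair if n is even
        "min": min(values),
        "max": max(values),
    }

def tabulate(values):
    """Rough analogue of: tabulate T (frequency and percent per level)."""
    counts = Counter(values)
    total = len(values)
    return {level: (c, 100.0 * c / total) for level, c in sorted(counts.items())}

# Made-up values, not the study data:
y0 = [2.0, 1.6, 2.4, 1.8, 3.2]
t = [1, 0, 1, 0, 0]
summary = tabstat(y0)
freq = tabulate(t)
```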
More univariate summaries
Graphical summaries:
    graph box Y0
    hist Y0, kdens
    graph box Y1
    hist Y1, kdens
    graph box diff
    hist diff, kdens
Comments? (Skewness or symmetry, resemblance to the normal density, etc.)
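To see what a box plot is summarizing, the sketch below computes the quartiles, the adjacent values (whisker ends), and the values plotted individually beyond the whiskers, using the usual 1.5 x IQR rule. It assumes the box plot uses the same percentile definition that `summarize, detail` describes; `stata_pctile` and `box_summary` are hypothetical helper names, and the data are made up.

```python
def stata_pctile(values, p):
    """p-th percentile under an assumed summarize-style definition.

    With sorted x(1..n) and P = n*p/100: if P is a whole number i, return
    (x(i) + x(i+1)) / 2; otherwise return x(floor(P) + 1).
    """
    xs = sorted(values)
    n = len(xs)
    pos = n * p / 100
    i = int(pos)
    if pos == i:
        if i == 0:
            return xs[0]
        if i >= n:
            return xs[-1]
        return (xs[i - 1] + xs[i]) / 2
    return xs[i]  # 0-based index i is x(floor(P) + 1)

def box_summary(values):
    """Quartiles, adjacent values, and outside values shown by a box plot."""
    q1, med, q3 = (stata_pctile(values, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "lower_adjacent": min(inside),   # smallest value not below the lower fence
        "upper_adjacent": max(inside),   # largest value not above the upper fence
        "outside": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }

# Made-up values, not the study data:
box = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

Values listed under "outside" are natural candidates for the outlier checks later in this section.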
Questions
Do the variables appear to be normally distributed?
    hist Y0, norm
    hist Y1, norm
    hist diff, norm
Also try:
    hist Y0, kdens norm
Are there baseline FEV1 differences between the two treatment groups? Why would this matter? (Hint: randomization)
    tabstat Y0, stats(count mean p50 sd iqr) by(T)
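The curve that `hist ..., norm` overlays is a normal density with the same mean and SD as the data, so the comparison is "histogram vs. N(mean, SD)". The Python sketch below computes that curve and a moment coefficient of skewness (m3 / m2^1.5, moments divided by n), which should be near 0 for roughly symmetric data. The helper names are hypothetical; treating this skewness formula as the one `summarize, detail` reports is my understanding, not something the slides state.

```python
import statistics

def normal_overlay(values, grid):
    """Normal density with the sample mean and SD, evaluated at each grid point."""
    dist = statistics.NormalDist(statistics.mean(values), statistics.stdev(values))
    return [dist.pdf(x) for x in grid]

def moment_skewness(values):
    """Moment coefficient of skewness m3 / m2**1.5 (moments divide by n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up values, not the study data:
curve = normal_overlay([1.0, 2.0, 3.0], [2.0])   # N(2, 1) density at x = 2
skew = moment_skewness([1.0, 1.0, 1.0, 5.0])     # long right tail, so positive
```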
Questions
What happens if you stratify by treatment group? Is the variation in baseline FEV1 equal across groups? Is it equal in the follow-up FEV1 measurements?
    graph box Y0, by(T)
    hist Y0, kdens by(T)
    graph box Y1, by(T)
    hist Y1, kdens by(T)
    graph box diff, by(T)
    hist diff, kdens by(T)
Bivariate summaries
Examine the association between baseline FEV1 and 24-week FEV1. Do they appear to be associated?
    corr Y1 Y0
    spearman Y1 Y0
How about graphically?
    scatter Y1 Y0
    lowess Y1 Y0
Do the two FEV1 values appear to be linearly associated?
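The contrast between `corr` and `spearman` can be seen directly: Pearson's r measures linear association, while Spearman's rho is Pearson's r computed on ranks (tied values get their average rank), so it measures monotone association. A Python sketch with hypothetical helper names and made-up data:

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: perfectly monotone but curved
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
r_pearson = pearson(x, y)    # below 1, because the relation is not a straight line
r_spearman = spearman(x, y)  # exactly 1, because the relation is monotone
```

A large gap between the two coefficients is one hint that the lowess curve will bend away from a straight line.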
KEY Questions
In this study, the primary scientific aim was to compare the relative (or percent) change in FEV1 between the treatment groups. (Quick Q: why is relative change preferred to absolute change?)
Generate a relative change variable for this analysis:
    generate rc = 100*(Y1-Y0)/Y0
Create a label for the new variable:
    label variable rc "Relative Change in FEV1"
Q: Are there differences in relative change in FEV1 between the treatment groups?
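The `generate rc` formula, written as a small Python function for hand checks. The function name is hypothetical and the inputs are made up. One design difference: Stata returns missing when dividing by zero, whereas this sketch raises an error for a zero baseline.

```python
def relative_change(y1, y0):
    """Percent change from baseline, as in: generate rc = 100*(Y1-Y0)/Y0."""
    if y0 == 0:
        # Stata would set rc to missing here; this sketch raises instead.
        raise ValueError("baseline FEV1 (Y0) must be nonzero")
    return 100.0 * (y1 - y0) / y0

# Made-up FEV1 values, not the study data:
rc_up = relative_change(2.2, 2.0)     # about +10%
rc_down = relative_change(1.5, 2.0)   # -25%
```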
Graphical comparisons and descriptive analyses
Quick and crude graphical comparisons of relative change in FEV1 between the treatment groups:
    graph box rc, by(T)
    hist rc, by(T)
Are there concerns with outlying values?
    lowess rc Y0
How might one investigate how sensitive the results are to one or more outlying values?
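One possible answer to the sensitivity question (a technique chosen here for illustration, not one the slides prescribe) is a leave-one-out check: drop each observation in turn, recompute the difference in mean relative change (treatment minus placebo), and see which single observation moves the estimate most. A Python sketch with hypothetical helper names and made-up data; it assumes each group has at least two observations so no group becomes empty.

```python
import statistics

def mean_difference(rc, t):
    """Mean rc in T == 1 (treatment) minus mean rc in T == 0 (placebo)."""
    treated = [r for r, g in zip(rc, t) if g == 1]
    placebo = [r for r, g in zip(rc, t) if g == 0]
    return statistics.mean(treated) - statistics.mean(placebo)

def leave_one_out(rc, t):
    """(index, shift in estimate) pairs, most influential observation first."""
    full = mean_difference(rc, t)
    shifts = []
    for i in range(len(rc)):
        rc_i = rc[:i] + rc[i + 1:]
        t_i = t[:i] + t[i + 1:]
        shifts.append((i, mean_difference(rc_i, t_i) - full))
    return sorted(shifts, key=lambda s: abs(s[1]), reverse=True)

# Made-up relative changes (%), with one extreme value in the treatment group:
rc = [5.0, 7.0, 6.0, 60.0, -1.0, 0.0, -2.0]
t = [1, 1, 1, 1, 0, 0, 0]
influence = leave_one_out(rc, t)
```

Here dropping the value 60 shrinks the estimated difference from 20.5 to 7, a sign that the result leans heavily on that single observation.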
Quantitative comparisons
For numerical comparisons of relative change in FEV1 between the two groups, use:
    tabstat rc, stats(count mean var sd) by(T)
Do you believe there is evidence of a treatment effect?
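A Python sketch of the by-group `tabstat` above, plus the treatment-minus-placebo difference in means that the next slide interprets. It assumes var and sd use the n - 1 denominator; `tabstat_by` is a hypothetical name and the values are made up.

```python
import statistics
from collections import defaultdict

def tabstat_by(values, groups):
    """Rough analogue of: tabstat rc, stats(count mean var sd) by(T)."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    return {
        g: {
            "count": len(vs),
            "mean": statistics.mean(vs),
            "var": statistics.variance(vs),   # n - 1 denominator
            "sd": statistics.stdev(vs),
        }
        for g, vs in sorted(buckets.items())
    }

# Made-up relative changes (%), not the study data; T: 0 = Placebo, 1 = Treatment
rc = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
t = [0, 0, 0, 1, 1, 1]
table = tabstat_by(rc, t)
mean_diff = table[1]["mean"] - table[0]["mean"]   # treatment minus placebo
```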
Summaries
Statistical summaries:
- The mean relative change for the treatment group is 7.03%.
- The mean relative change for the placebo group is -1.09%.
- The difference in mean relative change between the two groups is 8.12 percentage points.
What is the interpretation of:
- The mean relative change for the placebo group? Are these patients doing better, worse, or the same at the end of the study?
- The mean relative change for the treatment group?
- The difference in mean relative change? What does the value of 8.12 percentage points mean?
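The 8.12 figure is the treatment group's mean relative change minus the placebo group's. Since both means are already percentages, their difference is in percentage points:

```latex
\overline{rc}_{\text{treatment}} - \overline{rc}_{\text{placebo}}
  = 7.03 - (-1.09)
  = 8.12 \ \text{percentage points}
```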
Summary
- Working with data (importing your own data, cleaning up your variables and values)
- More formal analysis of a dataset, without statistical inference (more soon)
- Strategies for investigating outlying (influential) observations
- Investigating assumptions (Pearson's "linear" correlation coefficient plus scatter plots with a smooth lowess curve)
- Interpreting results
Next: STATA batch (.do) files