Data Workshop H397. Data Cleaning: Inputting data, Missing Values, Converting String Variables, Creating Scales, Creating Dummy Variables.

1 Data Workshop H397

2 Data Cleaning
• Inputting data
• Missing Values
• Converting String Variables
• Creating Scales
• Creating Dummy Variables

3 Inputting and Merging Data
• Inputting
  • Stata: insheet using "/Users/daphnepenn/Dropbox/CleaningPractice.csv"
  • SPSS: dropdown menu (easy)
• Merging
  • Stata: merge m:1 sch_no using "C:\Users\dmp869\Desktop\bpsschools.dta"
  • SPSS: dropdown menu (easy)
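
The many-to-one merge on this slide can be sketched with Stata's built-in auto dataset, so it runs without the workshop files. The lookup table and the variable origin_label are hypothetical, made up for illustration. Run it as a do-file, since it relies on a temporary file stored in a local macro.

```stata
* Illustrative only: the workshop files are replaced by Stata's built-in auto data.
* In Stata 13 and later, the newer equivalent of insheet is:
*   import delimited using "CleaningPractice.csv", clear

sysuse auto, clear

* Build a small lookup table with one row per value of the key (foreign).
preserve
keep foreign
duplicates drop
gen str10 origin_label = cond(foreign == 1, "imported", "domestic")
tempfile lookup
save `lookup'
restore

* m:1 = many rows in the master data match one row in the using data.
merge m:1 foreign using `lookup'

* _merge records which rows matched; inspect it before dropping it.
tab _merge
drop _merge
```

The key variable (sch_no on the slide, foreign here) must uniquely identify rows in the using file, otherwise merge m:1 stops with an error.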

4 Strategies for Missing Data
• Figure out why!
• Analyze only the available data (i.e., ignore the missing data)
• Impute the missing data with replacement values, and treat these as if they were observed
• Impute the missing data and account for the fact that these values were imputed with uncertainty
• Use statistical models that allow for missing data, making assumptions about their relationships with the available data
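
A rough Stata sketch of the first few strategies, assuming the built-in auto dataset (where rep78 has missing values) stands in for the workshop data. The variables rep78_mean and rep78_imp, the choice of predictors, and the number of imputations are arbitrary illustrations, not recommendations.

```stata
sysuse auto, clear

* Step 1: see how much is missing and where.
misstable summarize
count if missing(rep78)

* Option A: complete-case analysis. Estimation commands drop rows
* with missing values automatically.
regress price mpg rep78

* Option B: single (mean) imputation, treated as if observed.
* This understates uncertainty and is shown for contrast only.
egen rep78_mean = mean(rep78)
gen rep78_imp = cond(missing(rep78), rep78_mean, rep78)
summarize rep78 rep78_imp
drop rep78_mean rep78_imp

* Option C: multiple imputation, which carries the imputation
* uncertainty into the standard errors.
mi set mlong
mi register imputed rep78
mi impute regress rep78 mpg weight, add(5) rseed(12345)
mi estimate: regress price mpg rep78
```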

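5 Converting String Variables
• Summarizing string variables… You can't!
• Convert them into numeric variables
  • describe (shows which variables are stored as strings)
  • destring, replace (for the entire dataset)
  • destring var, replace (for a particular variable)
  • destring schoolethnicityw2, replace
  • encode schoolethnicityw2, generate(schoolethnicityw22)
  • encode lowincomestatus, generate(lowincomestatus2)
• destring only converts strings that contain numbers; use encode for text categories, and give the new variable a name that is not already in use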
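
The difference between destring and encode can be tried on the built-in auto dataset. The variables mpg_str, origin_str, and origin_num are hypothetical names created here only for practice.

```stata
sysuse auto, clear

* Make two string variables to practice on:
* mpg_str holds digits stored as text, origin_str holds category labels.
tostring mpg, generate(mpg_str)
decode foreign, generate(origin_str)
describe mpg_str origin_str

* destring works when the text is numeric.
destring mpg_str, replace
summarize mpg_str

* encode is for text categories: it assigns integer codes 1, 2, ...
* in alphabetical order and keeps the text as value labels.
encode origin_str, generate(origin_num)

* Same variable shown with labels, then with the underlying codes.
tab origin_num
tab origin_num, nolabel
```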

6 Creating Scales
• Stata
  • Average: egen avg = rowmean(v1 v2 v3 v4)
  • Sum: egen total = rowtotal(v1 v2 v3 v4)
• SPSS
  • Average: COMPUTE MPW2=MEAN(MP1W2,MP2W2,MP3W2,MP4W2,MP5W2,MP6W2,MP7W2,MP8W2,MP9W2R).
  • Sum: COMPUTE AGW2=AG1W2+AG2W2+AG3W2+AG4W2+AG5W2+AG6W2+AG7W2.
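
The two egen functions treat missing items differently, which matters for scale scores. A small sketch with made-up values (the three rows below are invented for illustration):

```stata
clear
input v1 v2 v3 v4
1 2 3 4
2 . 4 4
. . . .
end

* rowmean() averages the non-missing items; it is missing only
* when every item is missing.
egen avg = rowmean(v1 v2 v3 v4)

* rowtotal() treats missing items as 0, so an all-missing row sums to 0.
egen total = rowtotal(v1 v2 v3 v4)

* rowmiss() counts missing items, which helps decide whether a
* respondent answered enough items for the scale to be meaningful.
egen nmiss = rowmiss(v1 v2 v3 v4)

list
```

A sum computed with the + operator, as in the SPSS example, is instead missing whenever any single item is missing.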

7 Creating Dummy Variables
• Stata
  • gen newvar = oldvar == __
  • gen male = 0
  • replace male = 1 if schoolgenderw2 == "M"
• SPSS
  • Dropdown menu
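
One thing to watch with the two-step approach is that rows with a missing gender end up coded 0. A sketch with made-up data (the variable names here are hypothetical) shows the difference and a one-step alternative:

```stata
clear
input str1 gender
"M"
"F"
""
"M"
end

* The two-step version from the slide codes the blank row as 0.
gen male_slide = 0
replace male_slide = 1 if gender == "M"

* Adding an if-condition leaves the dummy missing when gender is missing.
gen male = (gender == "M") if !missing(gender)

* tabulate, generate() builds one dummy per category in a single step.
tabulate gender, generate(gender_)

list
```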

8 Summarizing Data and Choosing Tests
• tabstat ytdgpaw2, stat(mean min median max)
• tab schoolgenderw2 schoolethnicityw2
• tab schoolethnicityw22 lowincomestatus2
• tabstat ytdgpaw2, stat(mean median sd count) by(schoolethnicityw22)
• http://www.som.soton.ac.uk/learn/resmethods/statistical notes/which_test.htm
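
The same commands can be tried on the built-in auto dataset, with price standing in for GPA and foreign and rep78 standing in for the grouping variables (these substitutions are assumptions made only so the sketch runs):

```stata
sysuse auto, clear

* Overall summary of a continuous variable.
tabstat price, stat(mean min median max)

* Two-way table of two categorical variables, with row percentages.
tabulate foreign rep78, row

* The same summary statistics broken out by group.
tabstat price, stat(mean median sd count) by(foreign)
```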

9 Using appropriate statistics and graphs
Which statistics and graphs to report depends on the types of the variables of interest:
• For continuous (Normally distributed) variables
  • N, mean, standard deviation, minimum, maximum
  • histograms, dot plots, box plots, scatter plots
• For continuous (skewed) variables
  • N, median, lower quartile, upper quartile, minimum, maximum, geometric mean
  • histograms, dot plots, box plots, scatter plots
• For categorical variables
  • frequency counts, percentages
  • one-way tables, two-way tables
  • bar charts
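
In Stata, the statistics and graphs listed above could be produced roughly as follows. The sketch assumes the built-in auto dataset, treating mpg as roughly symmetric, price as skewed, and foreign and rep78 as categorical.

```stata
sysuse auto, clear

* Continuous, roughly symmetric: N, mean, SD, minimum, maximum.
tabstat mpg, stat(n mean sd min max)

* Continuous, skewed: N, median, quartiles, minimum, maximum.
tabstat price, stat(n median p25 p75 min max)

* ameans reports the arithmetic, geometric, and harmonic means.
ameans price

* Categorical: frequency counts and percentages.
tabulate foreign
tabulate foreign rep78, row

* One graph per variable type; each call replaces the previous graph.
histogram price
graph box price, over(foreign)
scatter price mpg
graph bar (count) price, over(foreign)
```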

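10 Using appropriate statistics and graphs…
[Table of recommended graphs by variable type. Row labels: X = Cat., X = Cont., X = Time. Column labels: Y = Cat. and Y = Cont., repeated under a third variable Z = Cat. The graph images in the cells were not preserved; the surviving cell text is "Use 3-Way Table" and "N/A".]
All these graphs are available in Chart Builder, from the "Choose from:" list.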

11 Flow chart of commonly used descriptive statistics and graphical illustrations
Exploring data
• Descriptive statistics
  • Categorical data
    • Frequency
    • Percentage (Row, Column or Total)
  • Continuous data: Measure of location
    • Mean
    • Median
  • Continuous data: Measure of variation
    • Standard deviation
    • Range (Min, Max)
    • Inter-quartile range (LQ, UQ)
• Graphical illustrations
  • Categorical data
    • Bar chart
    • Clustered bar charts (two categorical variables)
    • Bar charts with error bars
  • Continuous data
    • Histogram (can be plotted against a categorical variable)
    • Box & Whisker plot (can be plotted against a categorical variable)
    • Dot plot (can be plotted against a categorical variable)
    • Scatter plot (two continuous variables)

12 Choosing appropriate statistical test
• Having a well-defined hypothesis helps to distinguish the outcome variable from the exposure variable
• Answer the following questions to decide which statistical test is appropriate for analyzing your data
  • What is the variable type for the outcome variable?
    • Continuous (Normal, Skew) / Binary
    • If there is more than one outcome, are they paired or related?
  • What is the variable type for the main exposure variable?
    • Categorical (1 group, 2 groups, >2 groups) / Continuous
    • For 2 or >2 groups: Independent (Unrelated) / Paired (Related)
  • Are there any other covariates or confounding factors?

13 Flow chart of commonly used statistical tests
Outcome variable: Continuous (Normal)
• Exposure 1 group: One-sample t test
• Exposure 2 groups: Two-sample t test
• Exposure paired: Paired t test
• Exposure >2 groups: One-way ANOVA test
• Exposure continuous: Pearson Corr / Linear Reg
Outcome variable: Continuous (Skew)
• Exposure 1 group: Sign test / Signed rank test
• Exposure 2 groups: Mann-Whitney U test
• Exposure paired: Wilcoxon signed rank test
• Exposure >2 groups: Kruskal Wallis test
• Exposure continuous: Spearman Corr / Linear Reg
Outcome variable: Categorical
• Exposure 1 group: Chi-square test / Exact test
• Exposure 2 groups: Chi-square test / Fisher's exact test / Logistic regression
• Exposure paired: McNemar's test / Kappa statistic
• Exposure >2 groups: Chi-square test / Fisher's exact test / Logistic regression
• Exposure continuous: Logistic regression / Sensitivity & specificity / ROC
Outcome variable: Survival
• Exposure 2 groups or >2 groups: KM plot with Log-rank test
• Exposure continuous: Cox regression
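
Several of the tests in this flow chart map onto single Stata commands. The sketch below assumes the built-in auto dataset and picks variables only to make each command run; it illustrates syntax and is not a recommendation for these data. Run it as a do-file, since the // comments are not accepted at the interactive prompt.

```stata
sysuse auto, clear

* Continuous (Normal) outcome
ttest mpg == 20                  // 1 group: one-sample t test (20 is an arbitrary null value)
ttest mpg, by(foreign)           // 2 groups: two-sample t test
oneway mpg rep78                 // >2 groups: one-way ANOVA
pwcorr mpg weight, sig           // continuous exposure: Pearson correlation

* Continuous (skewed) outcome
ranksum price, by(foreign)       // 2 groups: Mann-Whitney U (Wilcoxon rank-sum)
kwallis price, by(rep78)         // >2 groups: Kruskal-Wallis
spearman price weight            // continuous exposure: Spearman correlation

* Categorical outcome
tabulate foreign rep78, chi2 exact   // chi-square and Fisher's exact test
logistic foreign mpg                 // continuous exposure: logistic regression
```

Paired tests are left out because the auto dataset has no paired measurements.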

14 Other Issues
• Organizing Quantitative Data
• Choosing the right tests
• Sampling

15 Favorite Stats Resources
• YouTube
• http://www.ats.ucla.edu/stat/stata/

