Session I How to use STATA & Basic Data Management Commands.

1 Session I How to use STATA & Basic Data Management Commands

2 What will be covered?
- Introduction to STATA software
- General guidelines in data entry
- Data management in STATA

3 Introduction to STATA

6 Open & Close the Output (Log) File
- To open the log file: log using "directory\path\filename.log"
  e.g. log using d:\trials\zinc.log
- To close the log file: log close
Dataset: zinc.dta

7 To Open Log (Output) File

8 To Close the Log File

9 Append & Replace the Existing Log File
- To append to the existing log file: log using d:\trials\zinc.log, append
- To replace the existing log file: log using d:\trials\zinc.log, replace
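A minimal do-file sketch of the log commands above. It writes to a hypothetical file, zinc_demo.log, in the current working directory instead of d:\trials so that it runs on any machine; the opening capture log close is a common safeguard (not from the slides) against a log left open by an earlier run.

```stata
* Sketch of the log workflow; zinc_demo.log is a hypothetical file name
* in the current working directory (the slides use d:\trials\zinc.log).
capture log close                   // close any log left open earlier; no error if none is open
log using zinc_demo.log, replace    // start a fresh text log, overwriting an old copy
display "first part of the session"
log close                           // stop recording
log using zinc_demo.log, append     // reopen the same file and add to its end
display "second part of the session"
log close
```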

10 Open the Data File
- To open the data file: use "directory\path\filename.dta"
  e.g. use d:\trials\zinc.dta
- To save the data file: save zinc.dta (save zinc.dta, replace overwrites an existing copy)
Dataset: zinc.dta

11 To Make A New Directory

12 To Change the Directory
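A sketch of the typed commands for making and changing a directory, assuming a hypothetical folder name, trials_demo, created inside the current working directory.

```stata
* Sketch: create a project folder and work inside it.
* trials_demo is a hypothetical folder name (the slides use d:\trials).
capture mkdir trials_demo    // capture suppresses the error if the folder already exists
cd trials_demo               // make it the working directory
pwd                          // display the current working directory
cd ..                        // return to the parent folder
```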

13 General Guidelines in Data Entry
- Each row in the datasheet should contain one individual's information (a record).
- Each column should contain the values of a single characteristic for all individuals (a variable).
- Keep variable names short: older STATA versions allowed at most eight characters; current versions allow up to 32.
- Variables can be either numeric or string (alphanumeric).
- A numeric variable must contain only numbers.
- Every datasheet must include an identification number.

14 DATA DESCRIPTION

15 Data Management using STATA

16 Data Management using STATA
- Inputting data
- Editing data
- Creating and changing variables
- Saving and reusing data
- Data reorganization
- Merging and appending datasets

17 Inputting Data
- Enter data from the keyboard: input varlist (type one record per line, then end)
  e.g. input str25 name age str1 sex
- Best way: copy the data from Excel and paste them directly into the STATA Data Editor
- Transfer data from other programs
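A sketch of keyboard entry with input, following the variable list on the slide plus an identification number, as the data-entry guidelines require. The three records are hypothetical.

```stata
* Sketch: typing hypothetical records directly with input.
* str25 and str1 declare string variables holding up to 25 and 1 characters.
clear
input idno str25 name age str1 sex
1 "Patient A" 24 "M"
2 "Patient B" 18 "F"
3 "Patient C" 30 "F"
end
isid idno    // stops with an error if idno is repeated or missing
list
```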

18 Arithmetic Operators + (Addition) - (Subtraction) * (Multiplication) / (Division) ^ (Raise to power)

19 Relational Operators > (greater than) < (less than) >= (greater than or equal) <= (less than or equal) == (equal) != (not equal)
Note: == tests equality; a single = assigns a value, as in generate.

20 Logical Operators & (and) | (or) ! (not)

21 Expressions
- if: restricts a command to observations that satisfy a condition, e.g. if centre==3
- in: restricts a command to a range of observations, e.g. in 1/10
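A sketch combining the arithmetic, relational and logical operators with if and in, on four hypothetical records that reuse the variable names centre and age from the slides.

```stata
* Sketch on hypothetical data: operators inside if and in qualifiers.
clear
input pcode centre age
1 3 20
2 3 30
3 4 28
4 3 40
end
generate age2 = age^2                      // arithmetic: raise to a power
list pcode age if centre == 3 & age > 25   // relational (==, >) joined by & (and)
list pcode age if centre != 3 | age >= 40  // != (not equal) joined by | (or)
list pcode age in 1/2                      // in: first two observations only
count if !(centre == 3)                    // ! negates a condition
```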

22 Editing Data
- Edit using the Data Editor: edit [varlist] [if] [in]
  e.g. edit treatment centre age
  e.g. edit treatment age if centre==3 & age>25

23 Browsing Data
- View the data in the Data Editor: browse [varlist] [if] [in]
  e.g. browse treatment centre age
  e.g. browse treatment age if centre==3 & age>25

24 Do this Exercise…
- Edit pcode, treatment and cough for centre 4 only
- Browse the same variables and note the difference
Dataset: zinc.dta

25 Creating & Changing Variables
- Create a new variable: generate newvar = exp [if] [in]
  e.g. gen totstl24 = s1_tstool_wt + s2_tstool_wt + s3_tstool_wt
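A sketch of generate on two hypothetical records. One design point worth knowing: with +, the total is missing whenever any component is missing, whereas egen's rowtotal() treats missing components as zero, which can understate the total. Which behaviour is appropriate depends on the study's rules for incomplete collections.

```stata
* Sketch: totals from three hypothetical stool-weight variables.
clear
input pcode s1_tstool_wt s2_tstool_wt s3_tstool_wt
1 100 150 120
2  80   .  90
end
gen totstl24 = s1_tstool_wt + s2_tstool_wt + s3_tstool_wt          // missing if any part is missing
egen totstl24_rt = rowtotal(s1_tstool_wt s2_tstool_wt s3_tstool_wt) // missing parts count as 0
list
```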

26 Do this Exercise… Generate total stool output from 0-48 hours.
Dataset: zinc.dta

27 Creating & Changing Variables …contd
- Change the contents of an existing variable
  - To replace: replace oldvar = exp [if] [in]
    e.g. replace sodium1 = . if sodium1 == 0
  - To recode: recode varlist (erule) [(erule) ...] [if] [in]
    e.g. recode age (min/6=1) (7/11=2) (12/max=3), gen(agecat)
    (when more than one rule is given, each rule goes in its own parentheses)

Rule           | Example        | Meaning
# = #          | 3 = 1          | 3 recoded to 1
# # = #        | 2 4 = 5        | 2 and 4 recoded to 5
#/# = #        | 4/8 = 3        | 4 through 8 recoded to 3
nonmissing = # | nonmissing = 2 | all other nonmissing values recoded to 2
missing = #    | missing = 9    | all other missing values recoded to 9
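A sketch of replace and recode on hypothetical values, using the variable names sodium1 and age from the slide. Treating 0 as a missing measurement is the slide's convention for this variable, not a general rule.

```stata
* Sketch on hypothetical values: zeros become missing, then age is grouped.
clear
input pcode sodium1 age
1 135  4
2   0  9
3 140 20
end
replace sodium1 = . if sodium1 == 0                               // 0 means "not measured" here
recode age (min/6 = 1) (7/11 = 2) (12/max = 3), generate(agecat)  // new variable; age is unchanged
list
```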

28 Do this Exercise……
Ex 1: Replace all zeros in serum potassium with missing.
Ex 2: Recode pre-admission diarrhea duration into 0-24h, 25-72h and >72h.
Dataset: zinc.dta

29 Creating & Changing Variables …contd
- Rename an existing variable: rename oldvarname newvarname
  e.g. ren tlc_t2 tlc2
  e.g. ren tlc_t3 tlc3
- Eliminate existing variables
  - To drop: drop varlist
    e.g. drop name address
  - To keep: keep varlist
    e.g. keep idno age sodium albumin-tlc
Dataset: zinc.dta
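A sketch of rename, drop and keep on a small hypothetical dataset. Because albumin and sodium are not in it, the variable range in keep is tlc2-tlc3 rather than albumin-tlc.

```stata
* Sketch on hypothetical data: rename, then drop or keep.
clear
input idno str10 name str10 address age tlc_t2 tlc_t3
1 "A" "X" 10 9.5 8.1
2 "B" "Y" 20 7.2 6.4
end
rename tlc_t2 tlc2
rename tlc_t3 tlc3
drop name address          // remove the listed variables
keep idno age tlc2-tlc3    // a-b means every variable from a to b in dataset order
describe
```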

30 Saving & Reusing Data in Stata Format
- To save data: save filename.dta
  e.g. save zinc, replace
  e.g. clear (removes the data from memory)
- To reuse data: use filename
  e.g. use zinc
Dataset: zinc.dta
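A sketch of the save, clear and use cycle; zinc_demo.dta is a hypothetical file written to the current working directory.

```stata
* Sketch: save hypothetical data, empty memory, then load the file again.
clear
input pcode age
1 10
2 20
end
save zinc_demo, replace   // .dta is added automatically; replace overwrites an old copy
clear                     // memory is now empty
use zinc_demo             // reload the saved dataset
describe
```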

31 Data Reorganization
Sorting observations and changing variable order
- To sort: sort varlist [in] (ascending order)
  e.g. sort pcode
- Move specified variables to the front of the dataset: order varlist
- Move one variable to a specified position: move varname1 varname2 (in Stata 11 and later: order varname1, before(varname2))
- Alphabetize specified variables and move them to the front of the dataset: aorder [varlist]
Dataset: zinc.dta
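A sketch of sorting and reordering on three hypothetical records. gsort (for descending order) and the before() and alphabetic options of order (Stata 11 and later) are additions not shown on the slide.

```stata
* Sketch on hypothetical data: sorting rows and reordering columns.
clear
input pcode age sodium
3 10 135
1 20 140
2 15 138
end
sort pcode                   // ascending order of pcode
gsort -age                   // a minus sign makes gsort sort in descending order
order sodium                 // move sodium to the front
order age, before(sodium)    // Stata 11+ counterpart of: move age sodium
order _all, alphabetic       // alphabetize all variables, as aorder does
list
```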

32 Data Reorganization …contd
- Convert data from wide to long: reshape long stubnames, i(varlist) j(varname)
  e.g. reshape long albumin, i(pcode) j(time)
[Figure: wide-shape data and the resulting long-shape data]

33 Data Reorganization …contd
- Convert data from long to wide: reshape wide stubnames, i(varlist) j(varname)
  e.g. reshape wide albumin, i(pcode) j(time)
[Figure: long-shape data and the resulting wide-shape data]
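A sketch of the two reshape commands on hypothetical albumin values recorded at three time points, named albumin1 to albumin3 so that albumin is the stub and the numeric suffix becomes time.

```stata
* Sketch: hypothetical albumin values for two patients at three time points.
clear
input pcode albumin1 albumin2 albumin3
1 3.1 3.4 3.6
2 2.8 3.0 3.3
end
reshape long albumin, i(pcode) j(time)   // wide -> long: one row per pcode-time pair (6 rows)
list
reshape wide albumin, i(pcode) j(time)   // long -> wide: back to one row per pcode
list
```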

34 Do this Exercise… Convert serum zinc from wide to long shape using zinclab.dta.

35 Answer!!! zinclab.dta

36 Merging & Appending Datasets
- To append datasets: append using filename
  use zinc1.dta
  append using zinc2.dta
- To merge datasets: merge [varlist] using filename
  use zinclab
  sort pcode
  save zinclab, replace
  use zincprognostic
  sort pcode
  merge pcode using zinclab
  (This is the older merge syntax. In Stata 11 and later, merge 1:1 pcode using zinclab does the same without sorting the files first.)
Dataset: zinclab.dta
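A self-contained sketch of append and merge. Temporary files stand in for zinc1.dta, zinc2.dta and zinclab.dta, the values are hypothetical, and the merge uses the Stata 11+ syntax. Run it as one do-file, since tempfile names exist only while the do-file runs.

```stata
* Sketch with hypothetical data; tempfiles replace the real dataset files.
clear
input pcode age
1 10
2 20
end
tempfile part1 lab
save `part1'                    // plays the role of zinc1.dta

clear
input pcode age
3 15
end
append using `part1'            // stack rows: 3 observations now
sort pcode

preserve                        // keep the appended data aside
clear
input pcode zinc
1 11.2
2  9.8
3 12.5
end
save `lab'                      // plays the role of zinclab.dta
restore

merge 1:1 pcode using `lab'     // add columns, matching on pcode
tabulate _merge                 // 3 = found in both files
```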

37 Do this Exercise… Merge file 1 (zinclab.dta) with file 2 (zincprognosis.dta).

38 Session II Data Cleaning & Preparing Data for Analysis

39 Preparing Data for Analysis
Inclusion criterion: children aged ≤35 months

40 Preparing Data for Analysis …contd

41 Do this Exercise…
The inclusion criterion for the study was pre-admission diarrhea duration <7 days.
Ex 1: Convert pre-admission diarrhea duration from hours to days using zincclean.dta.
Ex 2: Find values beyond the expected range.
Dataset: zinc.dta
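A sketch of the hours-to-days conversion and range check on hypothetical values; dur_h and dur_d are placeholder names, not the variables in zincclean.dta. The !missing() condition matters because STATA treats missing values as larger than any number, so dur_d >= 7 alone would also flag missing durations.

```stata
* Sketch on hypothetical values; dur_h and dur_d are placeholder names.
clear
input pcode dur_h
1  48
2 120
3 200
4   .
end
generate dur_d = dur_h/24                                   // hours -> days
list pcode dur_h dur_d if dur_d >= 7 & !missing(dur_d)      // outside the < 7 days criterion
count if dur_d >= 7 & !missing(dur_d)
```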

42 Answer!!!

43 Preparing Data for Analysis …contd

46 Do this Exercise… Do a similar exercise for hemoglobin.
Dataset: zinc.dta

47 Answer!!!

48 Preparing Data for Analysis …contd
What do the codes 1 & 2 mean?
Dataset: zinc.dta

49 Preparing Data for Analysis …contd Label name
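A sketch of value labels, which attach a meaning to codes such as 1 and 2. The label name trtlbl and the mapping 1 = Treatment A, 2 = Treatment B are assumptions for illustration.

```stata
* Sketch: hypothetical codes 1 and 2 given readable labels.
clear
input pcode treatment
1 1
2 2
end
label define trtlbl 1 "Treatment A" 2 "Treatment B"   // hypothetical mapping
label values treatment trtlbl       // codes stay 1/2; output shows the text
label variable treatment "Randomized treatment group"
tabulate treatment
```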

50 Preparing Data for Analysis …contd
What is wrong, and how can it be corrected?
Dataset: zinc.dta

51 Preparing Data for Analysis …contd

53 Do this Exercise… Generate total stool output for the first 48 hours.
Dataset: zinclean.dta

54 Preparing Data for Analysis …contd

55 Do this Exercise… Draw a boxplot and identify extreme values, if any, for s2_tstool_wt using zincclean.dta.
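A sketch on hypothetical values. graph box draws points lying more than 1.5 times the interquartile range beyond the box as separate markers; the lines after it list such values numerically from summarize's quartiles, using that 1.5 × IQR rule as an assumed definition of "extreme".

```stata
* Sketch on hypothetical values of s2_tstool_wt.
clear
input pcode s2_tstool_wt
1 100
2 120
3 110
4 130
5 900
end
graph box s2_tstool_wt
quietly summarize s2_tstool_wt, detail
local lo = r(p25) - 1.5*(r(p75) - r(p25))   // lower fence (assumed 1.5 x IQR rule)
local hi = r(p75) + 1.5*(r(p75) - r(p25))   // upper fence
list pcode s2_tstool_wt if (s2_tstool_wt < `lo' | s2_tstool_wt > `hi') & !missing(s2_tstool_wt)
```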

56 Session III Introduction to Basic Data Analysis

57 What will be Covered?
- Descriptive statistics
- Parametric tests
- Non-parametric tests

58 Analyses
- Univariate (one variable at a time)
- Bivariate (two variables at a time)
- Multivariate (more than two variables at a time)

59 Descriptive Statistics

60 Univariate Analysis
- Quantitative: mean, median, range/interquartile range, SD
- Categorical: frequency, percentage

61 Descriptive Statistics-Categorical Variable Can we label the variables???

62 Contingency Table

63 Contingency Table …contd

66 Immediate commands
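Immediate commands take the numbers directly instead of variables in memory. In the sketch below the 2 × 2 cell counts for tabi are hypothetical, while the prtesti line uses the figures from the two-proportion exercise (91% of 248 versus 95% of 252).

```stata
* Sketch of immediate commands; no dataset is needed.
tabi 20 30 \ 25 25, chi2 exact   // hypothetical 2 x 2 table; rows separated by a backslash
* two-sample test of proportions: n1 p1 n2 p2
prtesti 248 0.91 252 0.95
```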

67 Do this Exercise…
Ex 1: Draw a crosstab between treatment and withdrawn.
Ex 2: Draw crosstabs between treatment and diarr24, and between treatment and diarr48.
Dataset: zinc.dta

68 Descriptive Statistics-Quantitative Variable

69 Summary in Detail

70 Do this Exercise… Calculate summary statistics for the following variables:
1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in the 24h before admission
4. Serum zinc at admission
Dataset: zinc.dta

71 Summary Statistics by Group
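A sketch on six hypothetical records, with tstool48 as a placeholder name for total stool output 0-48h. summarize, detail gives the detailed summary; bysort repeats a summary within each group, and tabstat collects chosen statistics in one table.

```stata
* Sketch on hypothetical data; tstool48 is a placeholder variable name.
clear
input str1 treatment tstool48
A 300
A 420
A 380
B 250
B 310
B 290
end
summarize tstool48, detail                  // percentiles, skewness, kurtosis
bysort treatment: summarize tstool48        // the same summary within each group
tabstat tstool48, by(treatment) statistics(n mean sd median iqr)
```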

72 Do this Exercise… Calculate summary statistics by "treatment" for the following variables:
1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in the 24h before admission
4. Serum zinc at admission
Dataset: zinc.dta

73 Percentile Values
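A sketch on 100 synthetic values (ors24 = 10, 20, ..., 1000, a placeholder variable, split into two hypothetical treatment groups). centile reports the requested percentiles with confidence intervals and, prefixed by bysort, repeats them for each group.

```stata
* Sketch on synthetic data: 3rd and 97th percentiles, overall and by group.
clear
set obs 100
generate treatment = cond(_n <= 50, 1, 2)   // hypothetical groups of 50
generate ors24 = _n * 10                    // synthetic values 10, 20, ..., 1000
centile ors24, centile(3 97)
bysort treatment: centile ors24, centile(3 97)
```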

74 Do this Exercise… Calculate the 3rd and 97th percentile values by "treatment" for the following variables:
1. Total stool output 0-48h
2. Total ORS intake 0-24h
Dataset: zinc.dta

75 Session IV (A) Bi-variate Analyses

76 Analysis of Clinical Trial Data

77 Analysis of Clinical Trial Data
1. Compare patient characteristics at the time of randomization and baseline measurements between the groups.
2. Assess the difference in outcome variable(s) between the groups (adjusting for any imbalance in patient characteristics or baseline outcome variables).

78 Bi-variate Analyses
1. Categorical vs categorical
2. Categorical vs quantitative

79 1. Categorical vs Categorical
X: group variable; Y: outcome variable

Categories   | Unrelated groups                     | Related groups
X = 2, Y = 2 | Chi-square test; Fisher's exact test | McNemar test
X > 2, Y > 2 | Chi-square test; Fisher's exact test | -

80 Chi-square test

81 Do this Exercise… Is there a difference between the proportions of patients requiring IV fluids in the two treatment groups?
Dataset: zinc.dta

82 Chi-square Test/Fisher’s exact Test by Group
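A sketch of the chi-square and Fisher's exact tests on hypothetical counts. Each cell is entered once with its count n and expanded through a frequency weight; with individual-level data such as zinc.dta the [fweight = n] part is simply dropped. The variable name ivfluid is a placeholder.

```stata
* Sketch on hypothetical cell counts; ivfluid is a placeholder name.
clear
input treatment ivfluid n
1 0 110
1 1  15
2 0 118
2 1   7
end
tabulate treatment ivfluid [fweight = n], row chi2 exact   // row %, Pearson chi2, Fisher's exact
```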

83 Comparison of two proportions

84 Do this Exercise…
1. Among rotavirus-negative patients, is there a difference in the proportion recovered between the two treatment groups?
2. 91% of patients recovered with treatment A (n=248) and 95% with treatment B (n=252). Test these proportions and find the p-value.
Dataset: zinc.dta

85 McNemar’s Chi-square Test

86 McNemar’s Chi-square Test …contd
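A sketch on six hypothetical children with a 0/1 zinc-deficiency indicator at baseline and after treatment (def_base and def_after are placeholder names). mcc was written for matched case-control pairs, but it accepts any two paired 0/1 variables and reports McNemar's chi-square, which uses only the discordant pairs.

```stata
* Sketch on hypothetical paired data: 1 = deficient, 0 = not deficient.
clear
input pcode def_base def_after
1 1 0
2 1 0
3 1 1
4 0 0
5 1 0
6 0 1
end
mcc def_base def_after   // output includes McNemar's chi2 for the paired table
```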

87 Do this Exercise… Is there a shift in zinc deficiency from baseline after giving treatment B?
Dataset: zinc.dta

88 2. Categorical vs Quantitative

X     | Y          | Unrelated           | Related                 | Type
X = 2 | Normal     | Student's t test    | Paired t test           | Parametric
X = 2 | Non-normal | Wilcoxon rank-sum   | Wilcoxon signed-rank    | Non-parametric
X > 2 | Normal     | One-way ANOVA       | Repeated-measures ANOVA | Parametric
X > 2 | Non-normal | Kruskal-Wallis test | Friedman's test         | Non-parametric

89 Student’s ‘t’ Test for Independent Groups

90 Student’s ‘t’ Test for Independent Groups …contd

91 What is the Difference in the Total ORS Intake in the First 24h between the Two Groups?
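A sketch of the independent-samples t test on eight hypothetical values (ors24 is a placeholder for total ORS intake 0-24h). The unequal option gives the version that does not assume equal variances.

```stata
* Sketch on hypothetical data; ors24 is a placeholder variable name.
clear
input treatment ors24
1 800
1 950
1 700
1 880
2 760
2 690
2 720
2 810
end
ttest ors24, by(treatment)            // assumes equal variances
ttest ors24, by(treatment) unequal    // does not assume equal variances
```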

92 Transformations

93 Transformations …contd
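A sketch of a log transformation on hypothetical right-skewed values, followed by a t test on the log scale. ln() returns missing for zero or negative values, so a variable containing zeros needs a decision (for example, ln(x + 1)) before this step.

```stata
* Sketch on hypothetical skewed data; tstool48 is a placeholder name.
clear
input treatment tstool48
1  120
1  300
1  900
1 2500
2  100
2  200
2  600
2 1500
end
generate ln_tstool48 = ln(tstool48)   // natural log; missing if tstool48 <= 0
ttest ln_tstool48, by(treatment)      // compare groups on the log scale
```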

94 Do this Exercise…
Ex 1: What is the difference in total stool output 0-48 hours between the two groups?
Ex 2: Is there a difference in total duration of diarrhea (in hours) (varname: tot_du_dia_h) between the two treatment groups?
Dataset: zinc.dta

95 Geometric Mean if Log Transformation is Used
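A sketch of two ways to obtain the geometric mean, on three hypothetical values: ameans prints it directly, and exponentiating the mean of the logged values gives the same number. For the same reason, a mean difference on the log scale back-transforms to a ratio of geometric means.

```stata
* Sketch on hypothetical values: geometric mean of 100, 400 and 1600.
clear
input tstool48
100
400
1600
end
ameans tstool48                       // arithmetic, geometric and harmonic means
generate ln_t = ln(tstool48)
quietly summarize ln_t
display "geometric mean = " exp(r(mean))   // back-transform the mean of the logs
```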

96 Do this Exercise… Calculate the geometric mean for stool output 0-48 hours.
Dataset: zinc.dta

97 Paired t-Test
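A sketch of the paired t test on four hypothetical children with zinc measured at baseline (zinc0) and later (zinc1); both names are placeholders. The == form tells ttest to analyse the within-child differences.

```stata
* Sketch on hypothetical paired measurements.
clear
input pcode zinc0 zinc1
1  9.0 11.5
2 10.2 12.0
3  8.7 10.1
4 11.1 11.9
end
ttest zinc1 == zinc0    // paired t test on zinc1 - zinc0
```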

98 Do this Exercise… Is there a change in zinc value from baseline after giving treatment B?
Dataset: zinc.dta

99 Is there a Change in the Serum Zinc from Baseline to Recovery between Two Treatment Groups? Discuss………..

100 One-way ANOVA (Analysis of Variance)

101 Multiple Comparisons
Difference in mean zinc values between the age groups ≤6 and >12, with its p-value
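A sketch of one-way ANOVA with multiple comparisons on nine hypothetical zinc values in three age groups (agegrp coded 1 to 3, a placeholder). The bonferroni option adds pairwise mean differences with adjusted p-values; scheffe and sidak are alternatives.

```stata
* Sketch on hypothetical data: zinc in three age groups.
clear
input agegrp zinc
1 10.1
1  9.8
1 10.5
2 11.0
2 11.4
2 10.9
3 12.2
3 12.0
3 12.6
end
oneway zinc agegrp, tabulate bonferroni   // group means, F test, pairwise comparisons
```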

102 Non-Parametric Methods

103 Is there a difference in total stool output in the first 24h between the two treatment groups? Answer: Wilcoxon Ranksum test

104 Is there a difference in total stool output in the first 24h between the two treatment groups? …contd Answer: Wilcoxon Ranksum test
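A sketch of the Wilcoxon rank-sum test on eight hypothetical stool outputs; tstool24 is a placeholder name.

```stata
* Sketch on hypothetical data; tstool24 is a placeholder variable name.
clear
input treatment tstool24
1 150
1 320
1 210
1 980
2 120
2 140
2 260
2 175
end
ranksum tstool24, by(treatment)   // compares the two groups using ranks
```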

105 Do this Exercise… Is there a difference in total diarrhea duration between the two groups?
Dataset: zinc.dta

106 Is there a Change in zinc from baseline after giving treatment A? Answer: Wilcoxon signed-rank test

107 Is There a Change in zinc from baseline after giving treatment A?
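A sketch of the Wilcoxon signed-rank test on five hypothetical paired zinc values (zinc0 at baseline, zinc1 after treatment; placeholder names).

```stata
* Sketch on hypothetical paired data.
clear
input pcode zinc0 zinc1
1  9.0 11.5
2 10.2 12.0
3  8.7  8.1
4 11.1 11.9
5  9.5 10.4
end
signrank zinc1 = zinc0   // ranks the within-child differences zinc1 - zinc0
```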

108 Do this Exercise… Is there any difference in zinc from baseline after giving treatment B?
Dataset: zinc.dta

109 Is there a difference in total stool output across age groups? …… Contd Answer: Kruskal-Wallis Test

110 …… Contd Is there a difference in total stool output across age groups? Answer: Kruskal-Wallis Test

111 … Contd Is there a difference in total stool output across age groups? Answer: Kruskal-Wallis Test

112 …… Contd Is there a difference in total stool output across age groups? Answer: Kruskal-Wallis Test
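A sketch of the Kruskal-Wallis test across three hypothetical age groups; tstool48 and agegrp are placeholder names.

```stata
* Sketch on hypothetical data: stool output in three age groups.
clear
input agegrp tstool48
1 300
1 450
1 380
2 520
2 610
2 480
3 700
3 900
3 650
end
kwallis tstool48, by(agegrp)   // rank-based test across more than two groups
```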

113 Do this Exercise… Is there any difference in serum zinc (at admission) across the age groups?
Dataset: zinc.dta

