Presentation is loading. Please wait.

Presentation is loading. Please wait.

Random (but useful) STATA commands Jen Cocohoba, Pharm.D., MAS Health Sciences Associate Clinical Professor UCSF School of Pharmacy.

Similar presentations


Presentation on theme: "Random (but useful) STATA commands Jen Cocohoba, Pharm.D., MAS Health Sciences Associate Clinical Professor UCSF School of Pharmacy."— Presentation transcript:

1 Random (but useful) STATA commands Jen Cocohoba, Pharm.D., MAS Health Sciences Associate Clinical Professor UCSF School of Pharmacy

2 Housekeeping Evolution of this lecture – “How do I … for my final project/research project” Assortment of topics discussed today – Data in other formats – Programming a loop – Date data in STATA – Basics for merging datasets – Introduction to reshaping data Follow along – No lab exercises – work on final project

3 STATA.dta files and Excel Spreadsheets STATA datasets – use “c:\biostat212\dataset.dta” Recap (lab 3)- text (ASCII) or Excel data – insheet using “c:\biostat212\dataset.xls” – insheet using “c:\biostat212\dataset.csv”

4 Starters for SAS files Step 1: check your SAS dataset type – STATA can read SAS xport files (*.xpt, *.stx) import sasxport “dataset.xpt” If you have both SAS and STATA on your computer – Method 1: use SAS to turn it into a STATA dataset Open dataset in SAS Menu  File  Export  choose STATA dataset as type Save the new dataset Can do this at UCSF library if you don’t own SAS – Method 2: download usesas package Module requires both programs on computer to “read” sas datasets ssc install usesas, replace usesas using “filename.sasb7dat”

5 SPSS datasets Method 1: see previous slide – Open dataset in SPSS and save as STATA dataset Method 2: usespss – Plugin “reader” installed into STATA which does not require you to have SPSS installed – Type findit usespss to download – Reads *.sav files originating from WINDOWS SPSS usespss using “dataset.sav” For more information: http://adeptanalytics.org/radyakin/stata/usespss/radyakin_usespss.pdf

6 Programming loops Example – Determine whether age, number of side effects, and scaled severity of side effects differ by gender Start programming… ttest age, by(sex) ttest numsidefx, by(sex) ttest severity, by(sex ) MenWomenP-value Age, mean (SD) # Reported adverse effects, mean (SD) Adverse effect severity index, mean(SD) Table 1

7 Simple Loop Syntax foreach var in variable1 variable2 { firstcommand `var’ secondcommand `var’ } List of variables Loop begin Perform these commands, replacing the `var’ with the variables in the list. NOTE the special apostrophe marks (the first one lies below the ~ on the keyboard, the other is a normal apostrophe) Loop end * NOTES 1.Open brace must appear on the same line as the foreach command. 2.Nothing may follow the open brace (except for comments) 3.The first command must be on a separate line 4.The close brace must be on its own line

8 Simple Loop Syntax foreach var in age numsidefx severity { ttest `var’, by(gender) } Loop begin Perform this command, replacing the generic placeholder `var’ with variables I specified in my list. NOTE the special apostrophe marks (the first one lies below the ~ on the keyboard, the other is a normal apostrophe) Loop end Variables * NOTES 1.Open brace must appear on the same line as the foreach command. 2.Nothing may follow the open brace (except for comments) 3.The first command must be on a separate line 4.The close brace must be on its own line

9 Loops in do files - examples ******** Frequencies for Table 1 – baseline characteristics ******** /* Get proportions of categorical variables and estimates of missing */ foreach var in agecat afamer highschool income employed insurtype { tab `var' sex, missing col chi2 } /* Get means, standard deviations, and test for differences of continuous variables */ foreach var in ageatvisit cd4 log10vl { bysort sex: sum `var', detail ttest `var', by(sex) } ******** Table 2 Odds Ratios ******** /* Get all of the univariate odds ratios for important factors with guideline not recommended regimens */ foreach var in highschool income employed drugcover depressed { xi: logistic guideline i.`var' }

10 Date Data in STATA Dates in your research STATA can help you manipulate dates – Add, subtract, calculate time between dates – Comparing dates (e.g. before 1999, after 1999) – Extract components of dates (year, day of week)

11 How STATA thinks about dates “Counts” date as the # of days from a specific reference – January 1, 1960 = 0 – January 2, 1960 = 1 – January 3, 1960 = 2 – December 31, 1960 = 364 This makes it “easy” for STATA to manipulate mathematically We will come back to this when formatting dates 012364 1/1/19601/3/196012/31/1960

12 Cleaning strings to STATA dates STATA reads dates as string Do the “usual” – Open my Excel spreadsheet – Copy data – Paste into STATA data editor – Note color of variable

13 Cleaning STATA dates STATA sees dates as string (text) even if you type them into the editor Need to convert to date recognized by STATA – Generate a new date variable using date function – Show it which old variable contains the date you want to convert – Give it a format (most common is month, day, year) – Compare old and new results

14 generate dob = date(birthdate, “MDY”, topyear) New variable name Date function Old variable name How the date is arranged *NOTE: your original date variable can be “date-like” (e.g. 8/10/1970) or can be in a true string format (August 10, 1970) --- STATA can figure it out. For 2-digit years, the “top year” that should be interpreted

15 Number nonsense Emerges as the date in STATA speak Can mask the numerical date so that it is easier for you to understand Command: format dob %td dob -2372 -4366 -3839 150 -4862 -3626 -2788 -3562 -1868 -5946 -5984 -1962 -4694 -6018 -4407 0 * NOTE: Other formats aside from %td – in STATA help

16 Dates: series of commands Date conversions usually 2-commands plus checking generate dob = date(birthdate, “MDY”) format dob %td browse dob birthdate drop birthdate NOTE: STATA issues with 2-digit years (8/10/76) – Will get “missing values” generated – Two ways to fix this Format dates to 4 digit years in Excel, then copy to STATA Add “topyear” cutoff to the STATA command. Anything beyond topyear = previous century generate dob = date(birthdate, “MDY”, 2011) 9/10/10 = September 10, 2010 9/10/11 = September 10, 2011  top year 9/10/12 = September 10, 1912 9/10/13 = September 10, 1913

17 Recap: dates Generate vdate = date(visitdate, “MDY”, 2011) New variable name Date function Old variable name How the date is arranged Top year = if the year is “11” this is interpreted as 2011. If beyond the topyear (“12”) then it is interpreted as 1912

18 Other ways dates are stored Date components in separate variables birth_mbirth_dbirth_y 741953 1181948 6281949 5301960 STATA can concatenate these for you using an mdy command

19 Generate olddate = mdy(birth_m, birth_d, birth_y) New variable name mdy date function Name of month, day, and year variables Don’t forget to format the new date variable using format olddate %td

20 Date is now formatted – what can you do with it? Extract components of the date into new variables (columns) – gen nameofdayvariable = day(datevariable) – gen weekdayvariable = dow(datevariable) Lists as 0(Sunday) - 6(Saturday) – gen monthvariable = month(datevariable) – gen yearvariable = year(datevariable)

21 What else can you do with dates Find time elapsed between dates Suppose you wanted to find participants’ age at the date of their study visit. – Generate new variable called ageatvisit gen ageatvisit = vdate - dob – Note this gives you their age in number of DAYS – Can do this more efficiently by gen ageatvisit =(vdate – dob)/365.25 gen agevisityears = int(ageatvisit)

22 Comparing dates Suppose you wanted to categorize patients by their visit dates – Those who had a visit before 12/31/07 = earlyvisit Using a literal dates – Formatted as day month year (01jan1960) – Must be denoted by parenthesis – Must use pseudocommand td – Example: td(01jan1960) Example – gen earlyvisit = 0 – replace earlyvisit = 1 if vdate <= td(31dec2007) – replace earlyvisit =. if vdate==.

23 A little on merging datasets Merge versus append – Merge = add new variables from 2 nd dataset to existing obs (across) – Append = add new obs to existing variables (under) Merging requires datasets to have a common variable (ID) Nomenclature for the datasets – One dataset is defined as the “master” (in memory) dataset – The other dataset is called the “using” dataset Many merge types – need to specify for STATA

24 One to One/One to many merge 1:1 merge 1:m master using master using

25 Many to one, Many to Many merge m:1 merge m:m using master using

26 Merging datasets Need to make sure data are sorted by the common variable AND saved Steps – Load the master dataset into memory – Sort (just to be safe) & save – Merge command – Check to make sure it makes sense sort idvariable merge type commonvariable using “name of 2 nd dataset.dta” See appearance of a “merge” variable which tells you where the observations came from (dataset 1, dataset 2, etc.) Example: with two datasets called “wihsdrugs” (master) and “socdem” (using) use “socdem.dta” sort wihsid save “socdem.dta”, replace clear use “wihsdrugs.dta” sort wihsid merge 1:1 wihsid using “socdem.dta” browse

27 Shape-shifting Conceptually difficult – Example: chart with patients and average # cigarettes smoked per day over time idnumber198119821983 125108 2143017 32184 May want data to look different to manipulate idnumberyearcigs 1198125 1198210 119838 2198114 2198230 2198317 319812 3198218 319834 WIDE LONG

28 Reshaping wide to long Wide to long: make dataset with multiple records per patient Group variables need common “stub” – In our example 1982, 1983, 1984 – STATA doesn’t know to group these unless named similarly rename 1982 year1982 rename 1983 year1983 rename 1984 year1984 Nomenclature – i = primary index variable (the patient identification number) – j = secondary index variable (often generated from a “stub”) reshape long cigs, i(idnumber) j(year)

29 Reshaping long to wide Long to wide: one record per patient (our example) reshape wide datavariable, i(indexvariable) j(2 nd -indexvar) reshape wide cigs, i(idnumber) j(year) indexvariable2ndindex12ndindex22ndindex3 1cigs@subindex1cigs@subindex2cigs@subindex3 2cigs@subindex1cigs@subindex2cigs@subindex3

30 Thoroughly confused?

31 The wonders of STATA on the Web Brushing the surface, but many possibilities To figure out how… – STATA help (http://www.stata.com/support) – UCLA has a helpful STATA site (http://www.ats.ucla.edu/stat/stata)http://www.ats.ucla.edu/stat/stata – Good old Google & other web discussion strings Good luck with your final projects!


Download ppt "Random (but useful) STATA commands Jen Cocohoba, Pharm.D., MAS Health Sciences Associate Clinical Professor UCSF School of Pharmacy."

Similar presentations


Ads by Google