Random (but hopefully useful) STATA commands Jen Cocohoba, Pharm.D., MAS Health Sciences Associate Clinical Professor UCSF School of Pharmacy.

Slides:



Advertisements
Similar presentations
Stata as a Data Entry Management Tool
Advertisements

Mostly Dates… and a few other useful STATA commands Jen Cocohoba, Pharm.D., MAS Associate Clinical Professor UCSF School of Pharmacy.
Using Excel Biostatistics 212 Lecture 4. Housekeeping Questions about Lab 3? –replace vs. recode Final Project Dataset! –“Housekeeping” commands vs. data.
Stata and logit recap. Topics Introduction to Stata – Files / directories – Stata syntax – Useful commands / functions Logistic regression analysis with.
Generating new variables and manipulating data with STATA Biostatistics 212 Lecture 3.
Random (but useful) STATA commands Jen Cocohoba, Pharm.D., MAS Health Sciences Associate Clinical Professor UCSF School of Pharmacy.
Introduction to SPSS Allen Risley Academic Technology Services, CSUSM
1 Creating and Tweaking Data HRP223 – 2010 October 24, 2011 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This.
Generating new variables and manipulating data with STATA Biostatistics 212 Lecture 3.
1 An Introduction to IBM SPSS PSY450 Experimental Psychology Dr. Dwight Hennessy.
Introduction to Statistical Computing in Clinical Research Biostatistics 212 Course director: Mark Pletcher Teaching Assistant: Lee Zane.
Generating new variables and manipulating data with STATA Biostatistics 212 Session 2.
Good Data Management Practices Patty Glynn 10/31/05
Getting Started with your data
SPSS 1: An Introduction to the Statistical Package SPSS Suzie Cro MRC Clinical Trials Unit.
SPSS Statistical Package for the Social Sciences is a statistical analysis and data management software package. SPSS can take data from almost any type.
Pet Fish and High Cholesterol in the WHI OS: An Analysis Example Joe Larson 5 / 6 / 09.
Introduction to SPSS (For SPSS Version 16.0)
How to Analyze Data? Aravinda Guntupalli. SPSS windows process Data window Variable view window Output window Chart editor window.
Stata 12 Merging Guide Nathan Favero Texas A&M University October 19, 2012.
SAS Workshop Lecture 1 Lecturer: Annie N. Simpson, MSc.
Stata Workshop #1 Chiu-Hsieh (Paul) Hsu Associate Professor College of Public Health
ALEXANDER C. LOPILATO R: Because the names of other stat programs don’t make sense so why should this one?
Session I How to use STATA & Basic Data Management Commands.
Introduction to SAS BIO 226 – Spring Outline Windows and common rules Getting the data –The PRINT and CONTENT Procedures Manipulating the data.
4/22/2017 5:36 PM EViews Training Creating Workfiles.
LINDSEY BREWER CSSCR (CENTER FOR SOCIAL SCIENCE COMPUTATION AND RESEARCH) UNIVERSITY OF WASHINGTON September 17, 2009 Introduction to SPSS (Version 16)
AP/H SCIENCE SKILLS: EXCEL & SIG FIG Suggested summer work for incoming students.
API-208: Stata Review Session Daniel Yew Mao Lim Harvard University Spring 2013.
Key Data Management Tasks in Stata
Tricks in Stata Anke Huss Generating „automatic“ tables in a do-file.
STATA Mini Course Fall 2015 Jane Leber Herr Littauer 113 1Stata Mini Course – Spring 2015.
Organizing a project, making a table Biostatistics 212 Lecture 7.
Organizing a project, making a table Biostatistics 212 Session 5.
Organizing a project, making a table Biostatistics 212 Lecture 7.
Using Excel Biostatistics 212 Lecture 4. Housekeeping Questions about Lab 3? –replace vs. recode –Cross-checking/recoding missing values –Analysis of.
Introduction to Statistical Computing in Clinical Research Biostatistics 212.
1 Data Manipulation (with SQL) HRP223 – 2010 October 13, 2010 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This.
11/25/2015Slide 1 Scripts are short programs that repeat sequences of SPSS commands. SPSS includes a computer language called Sax Basic for the creation.
Chapter 6 Creating, Sorting, and Querying a Table
SPSS- Tutorial The following power-point slides show you how to use some of the features in SPSS. A survey of 20 randomly selected companies asked them.
Basics of Biostatistics for Health Research Session 1 – February 7 th, 2013 Dr. Scott Patten, Professor of Epidemiology Department of Community Health.
Comparison of different output options from Stata
PSC 47410: Data Analysis Workshop  What’s the purpose of this exercise?  The workshop’s research questions:  Who supports war in America?  How consistent.
AP/H SCIENCE SKILLS: SPREADSHEETS & SIG FIG Suggested summer work for incoming students.
Chapter 10: Working with Large Data Spreadsheet-Based Decision Support Systems Prof. Name Position (123) University Name.
1 PEER Session 02/04/15. 2  Multiple good data management software options exist – quantitative (e.g., SPSS), qualitative (e.g, atlas.ti), mixed (e.g.,
Topics Introduction to Stata – Files / directories – Stata syntax – Useful commands / functions Logistic regression analysis with Stata – Estimation –
1 Data Manipulation (with SQL) HRP223 – 2009 October 12, 2009 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This.
Chapter 6: Modifying and Combining Data Sets  The SET statement is a powerful statement in the DATA step DATA newdatasetname; SET olddatasetname;.. run;
SAS Programming Training Instructor:Greg Grandits TA: Textbooks:The Little SAS Book, 5th Edition Applied Statistics and the SAS Programming Language, 5.
Ec 2390: Section 1 Useful STATA commands Jack Willis September 14th, 2015.
Analyzing Data. Learning Objectives You will learn to: – Import from excel – Add, move, recode, label, and compute variables – Perform descriptive analyses.
Before the class starts: 1) login to a computer 2) start Stata 13.
1 Agenda  Unit 7: Introduction to Programming Using JavaScript T. Jumana Abu Shmais – AOU - Riyadh.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 5 & 6 By Ravi Mandal.
Topics Introduction to Stata – Files / directories – Stata syntax – Useful commands / functions Logistic regression analysis with Stata – Estimation –
TDA Direct Certification
Introduction to SPSS.
DEPARTMENT OF COMPUTER SCIENCE
LINDSEY BREWER CSSCR (CENTER FOR SOCIAL SCIENCE COMPUTATION AND RESEARCH) UNIVERSITY OF WASHINGTON September 17, 2009 Introduction to SPSS (Version 16)
ECONOMETRICS ii – spring 2018
Introduction Introduction to Stata 2016.
Introduction to Stata Spring 2017.
Stata Basic Course Lab 4.
Lab 2 HRP223 – 2010 October 18, 2010 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected.
Presentation, data and programs at:
Evaluation of Public Policy
Presentation transcript:

Random (but hopefully useful) STATA commands Jen Cocohoba, Pharm.D., MAS Health Sciences Associate Clinical Professor UCSF School of Pharmacy

Housekeeping Evolution of this lecture – “How do I … for my final project/research project” Assortment of topics – Data in other formats – Programming a loop – Managing duplicate observations – Date data in STATA – Basic merging for datasets – Introduction to reshaping data Follow along – No lab exercises – work on final project

SAS files Step 1: check your SAS dataset type – STATA can read SAS xport files (*.xpt, *.stx) import sasxport “dataset.xpt” If you have both SAS and STATA on your computer – Method 1: use SAS to turn it into a STATA dataset Open dataset in SAS Menu  File  Export  choose STATA dataset as type Save the new dataset Can do this at UCSF library if you don’t own SAS – Method 2: download usesas package Module requires both programs on computer to “read” sas datasets ssc install usesas, replace usesas using “filename.sasb7dat”

Method 1

SPSS datasets Method 1: similar to previous – Open dataset in SPSS and save as STATA dataset Method 2: usespss – Plugin “reader” installed into STATA which does not require you to have SPSS installed – Type findit usespss to download – Reads *.sav files originating from WINDOWS SPSS usespss using “dataset.sav” For more information:

Programming loops Example – Determine whether age, number of side effects, and scaled severity of side effects differ by gender Start programming… ttest age, by(sex) ttest numsidefx, by(sex) ttest severity, by(sex ) MenWomenP-value Age, mean (SD) # Reported adverse effects, mean (SD) Adverse effect severity index, mean(SD) Table 1

Simple Loop Syntax foreach var in variable1 variable2 { firstcommand `var’ secondcommand `var’ } List of variables Loop begin Perform these commands, replacing the `var’ with the variables in the list. NOTE the special apostrophe marks (the first one lies below the ~ on the keyboard, the other is a normal apostrophe) Loop end * NOTES 1.Open brace must appear on the same line as the foreach command. 2.Nothing may follow the open brace (except for comments) 3.The first command must be on a separate line 4.The close brace must be on its own line

Simple Loop Syntax foreach var in age numsidefx severity { ttest `var’, by(gender) } Loop begin Perform this command, replacing the generic placeholder `var’ with variables I specified in my list. NOTE the special apostrophe marks (the first one lies below the ~ on the keyboard, the other is a normal apostrophe) Loop end Variables * NOTES 1.Open brace must appear on the same line as the foreach command. 2.Nothing may follow the open brace (except for comments) 3.The first command must be on a separate line 4.The close brace must be on its own line

Loops in do files - examples ******** Frequencies for Table 1 – baseline characteristics ******** /* Get proportions of categorical variables and estimates of missing */ foreach var in agecat afamer highschool income employed insurtype { tab `var' sex, missing col chi2 } /* Get means, standard deviations, and test for differences of continuous variables */ foreach var in ageatvisit cd4 log10vl { bysort sex: sum `var', detail ttest `var', by(sex) } ******** Table 2 Odds Ratios ******** /* Get all of the univariate odds ratios for important factors with guideline not recommended regimens */ foreach var in highschool income employed drugcover depressed { xi: logistic guideline i.`var' }

Handling Duplicate Observations Just want to find the duplicates? duplicates list variable1 variable2 Or… duplicates report variable1 variable2 or… duplicates tag variable1 variable2, gen(newvar)

Handling Duplicate Observations Method 1: 2 step process of tagging then dropping duplicates tag variable1 variable2, gen(newvar) duplicates drop if newvar==1 Method 2: just dropping duplicates drop variable1 variable2, force Usual goal is to either find the duplicates or get rid of them … or both

STATA dates Dates common in research STATA reads dates as string Do the “usual” – Open Excel spreadsheet, copy, paste into editor – (OR import the data) – Note color of variable

How STATA thinks about dates “Counts” date as the # of days from a specific reference – January 1, 1960 = 0 – January 2, 1960 = 1 – January 3, 1960 = 2 – December 31, 1960 = 364 This makes it “easy” for STATA to manipulate mathematically We will come back to this when formatting dates /1/19601/3/196012/31/1960

Cleaning STATA dates Need to convert to STATA-recognizable date to perform analysis 1)Generate a new date variable using date function 2)Identify the “old” string variable which contains the date 3)Tell STATA what format it was in (e.g. month, day, year) 4)Compare old and new results

generate dob = date(birthdate, “MDY”, topyear) New variable name Date function Old variable name How the date is arranged *NOTE: your original date variable can be “date-like” (e.g. 8/10/1970) or can be in a true string format (August 10, 1970) --- STATA can figure it out. For 2-digit years, the “top year” that should be interpreted

Number nonsense Emerges as the date in STATA speak Can mask the numerical date so that it is easier for you to understand Command: format dob %td dob * NOTE: Other formats aside from %td – in STATA help

Dates: series of commands Date conversions usually 2-commands plus checking generate dob = date(birthdate, “MDY”) format dob %td browse dob birthdate drop birthdate NOTE: STATA issues with 2-digit years (8/10/76) – Will get “missing values” generated – Two ways to fix this Format dates to 4 digit years in Excel, then copy to STATA Add “topyear” cutoff to the STATA command. Anything beyond topyear = previous century generate dob = date(birthdate, “MDY”, 2012) 9/10/11 = September 10, /10/12 = September 10, 2012  top year 9/10/13 = September 10, /10/14 = September 10, 1914

MDY command Date components housed in separate variables birth_mbirth_dbirth_y STATA can concatenate these for you New variable name mdy date function Name of month, day, and year variables

Date is now formatted – what can you do with it? Extract components of the date into new variables (columns) – gen nameofdayvariable = day(datevariable) – gen weekdayvariable = dow(datevariable) Lists as 0(Sunday) - 6(Saturday) – gen monthvariable = month(datevariable) – gen yearvariable = year(datevariable)

What else can you do with dates Find time elapsed between dates Suppose you wanted to find participants’ age at the date of their study visit (or today) – Generate new variable called ageatvisit gen ageatvisit = vdate - dob – Note this gives you their age in number of DAYS – Can do this more efficiently by gen ageatvisit =(vdate – dob)/ gen agevisityears = int(ageatvisit)

Comparing dates Suppose you wanted to categorize patients by a date – Patients starting ARV < 1996 = pre-HAART Using literal dates – Formatted as day month year (01jan1960) – Must be denoted by parenthesis – Must use pseudocommand td – Example: td(01jan1960) Example – gen prehaart = 0 – replace prehaart = 1 if artstart<= td(01jan2006) – replace prehaart =. if artstart ==.

A little on merging datasets Merge versus append – Merge = add new variables from 2 nd dataset to existing obs (across) – Append = add new obs to existing variables (under) Merging requires datasets to have a common variable (ID) Nomenclature for the datasets – One dataset is defined as the “master” (in memory) dataset – The other dataset is called the “using” dataset Many merge types – need to specify for STATA

One to One/One to many merge 1:1 merge 1:m master using master using

Many to one, Many to Many merge m:1 merge m:m using master using

Merging datasets Need to make sure data are sorted by the common variable AND saved Steps – Load the master dataset into memory – Sort (just to be safe) & save – Merge command – Check to make sure it makes sense sort idvariable merge type commonvariable using “name of 2 nd dataset.dta” See appearance of a “merge” variable which tells you where the observations came from (dataset 1, dataset 2, etc.) Example: with two datasets called “wihsdrugs” (master) and “socdem” (using) use “socdem.dta” sort wihsid save “socdem.dta”, replace clear use “wihsdrugs.dta” sort wihsid merge 1:1 wihsid using “socdem.dta” browse

Shape-shifting Conceptually difficult – Example: chart with patients and average # cigarettes smoked per day over time idnumber May want data to look different to manipulate idnumberyearcigs WIDE LONG

Reshaping wide to long Wide to long: make dataset with multiple records per patient Group variables need common “stub” – In our example 1982, 1983, 1984 – STATA doesn’t know to group these unless named similarly rename 1982 cigs1982 rename 1983 cigs1983 rename 1984 cigs1984 Nomenclature – i = primary index variable (the patient identification number) – j = secondary index variable (often generated from a “stub”) reshape long cigs, i(idnumber) j(year)* Stub New variable

Reshaping long to wide Long to wide: one record per patient (our example) reshape wide datavariable, i(indexvariable) j(2 nd -indexvar) reshape wide cigs, i(idnumber) j(year) indexvariable2ndindex12ndindex22ndindex3 Existing variable (going to be dropped) Stub: to be created

Demonstration: Tiny datasets

Need to know more??

STATA help – your new best friend

Can try various terms

The wonders of STATA What statistical test do I run… – Google and Statisticians How do I run a particular test/command … – STATA within-program help feature – STATA help ( – UCLA STATA site ( – Google & other web discussion strings Good luck with your final projects!