Stata: Getting Started and Being Productive with VA Data

Presentation transcript:

Stata: Getting Started and Being Productive with VA Data
"Give me six hours to chop down a tree and I will spend the first four sharpening the axe." --Abraham Lincoln
Todd Wagner, June 2007

Outline
– Getting data into Stata
– Editing in Stata
– How Stata handles data
– Stata notation and help
– Using Stata and basic Stata commands

Transferring Data
– Stat/Transfer or DBMS/Copy both work
– Stat/Transfer often tries to optimize the Stata dataset by default
  – If transferring data with SCRSSN, force Stat/Transfer to transfer SCRSSN as double precision; a float keeps only about seven significant digits, so a longer identifier would be rounded
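The reason for forcing double precision can be seen in a small sketch. The nine-digit value below is a made-up identifier, not a real SCRSSN, and the variable names are illustrative only.

```stata
* Sketch: a 9-digit identifier stored as float versus double.
* 123456789 is a hypothetical value used only for illustration.
clear
set obs 1
generate float  id_float  = 123456789
generate double id_double = 123456789
format id_float id_double %12.0f
* The float copy no longer equals the value that was assigned;
* the double copy does.
list id_float id_double
```

If two patients' identifiers round to the same float value, they can no longer be told apart, which is why the storage type matters before any merging or sorting.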

Stat/Transfer: click on DOUBLE

Editing in Stata
– Any ASCII text editor will work
– Stata has a built-in text editor, but it is limited
– I recommend using another text editor: http://fmwww.bc.edu/repec/bocode/t/textEditors.html

Handling Data
– SAS processes one record at a time
  – Loops are commonly used in SAS
– Stata processes all the records at the same time
  – Loops are very rarely used in Stata

Loading Data into Memory
– Stata reads the data into memory
  – set mem 100m (before you load the data; needed through Stata 11, since Stata 12 and later manage memory automatically)
– You must have enough memory for your dataset
– With large datasets:
  – drop unnecessary variables
  – use the compress command (but don't compress SCRSSN)
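A minimal sketch of trimming a dataset before analysis, using the auto dataset that ships with Stata as a stand-in for a large VA file. The choice of variables to keep is arbitrary.

```stata
* Sketch: reduce the memory footprint of a dataset.
* auto.dta stands in for a large file; the kept variables are arbitrary.
sysuse auto, clear
keep make price mpg foreign
* compress stores each variable in the smallest type that holds its values
compress
describe, short
```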

Stata Abbreviations
– Many Stata commands can be abbreviated, often to the first three letters (each command's help file underlines its minimum abbreviation)
  – regress income education female could be written
  – reg income education female
– Variable names can also be abbreviated if the abbreviation is unique
  – reg inc educ fem
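As an illustration, the following sketch runs the same regression twice on Stata's bundled auto dataset, once spelled out and once abbreviated. The variables are from that dataset, not from the slides, and variable abbreviation is assumed to be at its default setting (on).

```stata
* Sketch: full versus abbreviated command and variable names.
sysuse auto, clear
regress price mpg weight
scalar r2_full = e(r2)
* pri and wei are unique prefixes of price and weight in this dataset
reg pri mpg wei
scalar r2_abbrev = e(r2)
display "Same R-squared? " (r2_full == r2_abbrev)
```

Abbreviations are convenient interactively, but spelling names out in do-files keeps the code from breaking when a new variable makes an abbreviation ambiguous.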

Stata Help
– Stata's built-in help is great
  – help
– Stata manuals are great because they review theory

Stata and the Web
– Stata is "web aware"
– Check for updates periodically
  – update all
– You can search for user-written programs
  – findit output
  – findit outreg (click to install)

Stata in Windows
– Page Up scrolls through the previous commands
– There is a graphical user interface (menus) if you forget a command
– We have Stata on rocky and tasha: no graphical capabilities, no menus, and loss of some shortcuts

Using Stata
– Create batch files called ".do" files
– I work interactively
  – Run Stata and create the do-file as I go
  – I can then use the do-file as needed
– Debugging code and exploratory data analysis are very fast in Stata

Sysdir, ls and cd
– Stata recognizes some Unix commands, such as ls and cd
– sysdir lists Stata's system directories

sysdir
   STATA:  C:\Program Files\Stata9\
 UPDATES:  C:\Program Files\Stata9\ado\updates\
    BASE:  C:\Program Files\Stata9\ado\base\
    SITE:  C:\Program Files\Stata9\ado\site\
    PLUS:  c:\ado\stbplus\
PERSONAL:  c:\ado\personal\
OLDPLACE:  c:\ado\

Delimiters
– SAS recognizes ";" as a delimiter
– Stata recognizes the carriage return
  – Always add a carriage return after your last command
– You can change the delimiter to ;
  – #delimit ;
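A short sketch of switching delimiters, meant to be run from a do-file (#delimit is not available when typing commands interactively). The regression on the bundled auto dataset is only a placeholder for a long command.

```stata
* Sketch: spread one command over several lines with ; as the delimiter.
sysuse auto, clear
#delimit ;
regress price
    mpg
    weight
    foreign ;
#delimit cr
* Back to carriage-return delimiting from here on
display e(N)
```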

Missing Data
– Stata and SAS both use "." as missing
– Stata implicitly treats a missing value as a very large number (so a condition such as x>10 is true when x is missing)
– SAS implicitly treats a missing value as a very small number
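The practical consequence shows up in "greater than" conditions. The sketch below uses three made-up values, one of them missing, and the variable name x is illustrative.

```stata
* Sketch: missing values sort above every number in Stata.
clear
input x
5
20
.
end
* Counts the 20 and the missing value
count if x > 10
* Excluding missing values explicitly counts only the 20
count if x > 10 & !missing(x)
```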

Generating and Recoding Variables
– In SAS you type
    quality=0;
    if VA=1 then quality=1;
– In Stata you type
    gen quality=0
    recode quality 0=1 if VA==1
  or
    gen quality=0
    replace quality=1 if VA==1
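A runnable sketch of the gen/replace pattern on four made-up records, plus a one-line alternative that relies on a logical expression evaluating to 1 or 0. The variable quality2 is introduced here only for comparison.

```stata
* Sketch: build a 0/1 indicator two ways. The VA values are made up.
clear
input VA
1
0
1
.
end
generate quality = 0
replace quality = 1 if VA == 1
* Alternative: (VA == 1) is 1 when true and 0 otherwise
generate quality2 = (VA == 1)
list
```

Both versions assign 0 when VA is missing; if missing should stay missing, add a condition such as if !missing(VA).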

Boolean Logic
– Stata is picky about Boolean logic
  – gen y=x if a==b (must use two equals signs, ==)
  – gen y=x if a>b & b>10 (must use &)
  – gen y=x if a<=b (the < must come before the =)

Creating Dummy Variables
– Goal: create a dummy variable for each DRG
    gen drgnum1=drg==1
  or
    tab drg, gen(drgnum)
– The second command automatically creates a dummy variable for every DRG value
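A small sketch of the tabulate approach on four made-up DRG values. Note that the numeric suffix follows the sorted order of the distinct values, not the DRG code itself, so it is worth checking the variable labels when the codes are not 1, 2, 3.

```stata
* Sketch: one indicator per distinct value of drg. Values are made up.
clear
input drg
1
2
2
3
end
* Creates drgnum1, drgnum2, drgnum3 (one per distinct value, in sorted order)
tabulate drg, generate(drgnum)
list
```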

Drop
– drop X (drops the variable X)
– drop if X==1 (drops cases where X is 1)

egen Commands
– You want to generate total costs for a medical center
– In SAS this is done with proc summary
– In Stata, you can type
    collapse (sum) cost, by(sta3n)
  or
    sort sta3n
    by sta3n: egen sumcost=total(cost)
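The two approaches differ in what they leave in memory, which the sketch below tries to show. The station numbers and costs are made up for illustration.

```stata
* Sketch: station-level cost totals two ways. All values are made up.
clear
input sta3n cost
402 100
402 250
523 75
end
* egen keeps every record and adds the station total to each one
bysort sta3n: egen sumcost = total(cost)
list
* collapse replaces the data with one record per station
collapse (sum) cost, by(sta3n)
list
```

Use egen when the total is needed alongside the record-level data, and collapse when only the station-level summary is wanted.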

ICD-9 Codes
– Stata has capabilities to handle ICD-9 diagnosis and procedure codes
– You can
  – check whether codes are valid
  – generate identifiers based on codes or ranges of codes

Dates
– Same date functions as SAS

Combining Data
– Merge
  – this automatically creates a variable called _merge
  – _merge==1 obs. from master data
  – _merge==2 obs. from only one using dataset
  – _merge==3 obs. from at least two datasets, master or using
    merge scrssn admitday disday using data_y
  – Stata 11 and later prefer the form that states the merge type (1:1, m:1 or 1:m) before the key variables
– Append (stacking data)
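A minimal sketch of a merge and the resulting _merge variable, written with the merge 1:1 syntax of Stata 11 and later rather than the older syntax shown on the slide. The two tiny datasets, the key id, and the tempfile name are all made up.

```stata
* Sketch: merge two made-up datasets on id and inspect _merge.
clear
input id cost
1 10
2 20
3 30
end
tempfile using_data
save `using_data'

clear
input id age
2 50
3 60
4 70
end
* Master has ids 2-4; using has ids 1-3
merge 1:1 id using `using_data'
tabulate _merge
```

Tabulating _merge right after every merge is a cheap way to catch unexpected non-matches before they propagate into the analysis.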

Explicit Subscripting
– Identify the most recent encounter in an encounter database
    gsort id -date        (ascending sort by ID and reverse by date)
    by id : gen n=_n      (record counter from 1 to N per person)
    by id : gen N=_N      (total number of records per person)
    gen select=n==1       (flags the most recent record)
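The same steps on four made-up encounter records, as a sketch. Dates are plain numbers here to keep the example short; real data would use Stata date values.

```stata
* Sketch: flag each person's most recent encounter. Values are made up.
clear
input id date
1 100
1 300
1 200
2 150
end
gsort id -date
by id: generate n = _n
by id: generate N = _N
generate select = (n == 1)
list, sepby(id)
```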

Using Stata

Stata Interface in Windows

Set, Clear and More
– set: sets system parameters
  – Need to set memory size to open a database: set mem 100m
– clear erases data from memory
– When output is longer than one page, you are asked to continue (set more off turns this off)

Summarizing Data

. sum gender age educ
    Variable |   Obs   Mean   Std. Dev.   Min   Max
      gender |
         age |
        educ |

– sum, d provides more detail on each variable
– tabstat provides summary information, including totals

Tabulating Data

. tab gender
     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      2,058
          2 |      2,027
------------+-----------------------------------
      Total |      4,085

. table gender
     gender |      Freq.
------------+-----------
          1 |      2,058
          2 |      2,027

Tabulating Data

. tab gender age
too many values
r(134);

The column variable can take only a limited number of distinct values, so put the variable with many values (age) in the rows:

. tab age gender
           |        gender
       age |      1       2 |   Total
-----------+----------------+--------
       ... |
        94 |      1       0 |       1
-----------+----------------+--------
     Total |  2,058   2,027 |   4,085

Tabstat

. tabstat age, by(gender)
[output: mean of age for each gender and for the total]

. table gender, c(mean age)
[output: mean(age) for each gender]

Graphing
– Diagnostic graphics
– Presenting results

Basic Analytical Functions
– OLS (reg)
– Logistic, probit, count and censored data (e.g., CLAD)
– Multinomials
– GLM/HLM
– Duration models
– Semi- and non-parametric models

Output
[regression output: linear regression with robust standard errors, Number of obs = 1306, F(21, 1284); dependent variable wtp; columns Coef., Std. Err., t, P>t, 95% Conf. Interval; rows include the ethn indicators, english, lifeus, age, income, incmis and _cons]

Outreg
– Outputs data to a delimited file
– The delimited file can be read into Excel
– Very flexible
– Creates publishable tables