1 CCPR Computing Services Workshop: Introduction to Stata June, 2006.

2 Outline
- Stata command syntax
- Basic commands
- Abbreviations
- Missing values
- Combining data
- Using do-files
- Basic programming
- Special topics
- Getting help
- Updating Stata

3 Stata Syntax
Basic command syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]
- Brackets mark optional portions.
- by, if, and in are typed as shown; the other words are placeholders for user-specified content.

4 Stata Syntax, cont.
Complete syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]
Example 1 (webuse union)
- Stata command:
    . bysort black: summarize age if year >= 80, detail
- Result: summarizes age separately for each value of black, including only observations for which year >= 80, with extra detail.

5 Stata Syntax, cont.
Complete syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]
Example 2 (webuse union)
- Stata commands:
    . generate agelt30 = age
    . replace agelt30 = 1 if age < 30
    . replace agelt30 = 0 if age >= 30 & age < .
- Result: variable agelt30 is set equal to 1, 0, or missing.
- [= exp] is generally used with the commands generate and replace.
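A relational expression such as rep78 >= 4 evaluates to 1 when true and 0 when false. The same kind of indicator can therefore be built in a single generate statement. An if qualifier keeps it missing wherever the input is missing.

This is a minimal sketch on the auto data bundled with Stata (sysuse auto), using rep78 because it has missing values. The variable name goodrep is hypothetical.

```stata
* Load the example dataset bundled with Stata
sysuse auto, clear

* 1 if rep78 >= 4, 0 if rep78 < 4. The if qualifier leaves goodrep
* missing where rep78 is missing (otherwise . >= 4 would yield 1)
generate goodrep = (rep78 >= 4) if rep78 < .

* Cross-check against the original variable, including missing values
tabulate goodrep rep78, missing
```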

6 Basic Commands: Load "auto" data and look at some variables
- Load data from Stata's website:
    webuse auto.dta
- Look at the dataset:
    describe
- Summarize some variables:
    codebook make headroom, header
    inspect weight length

7 Basic Commands: Load "auto" data and look at some variables
- Look at the first and last observations:
    list make price mpg rep78 if _n==1
    list make price mpg rep78 if _n==_N
- Summarize a variable in a table:
    table foreign
    table foreign, c(mean mpg sd mpg)
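The in range qualifier is an alternative to the _n/_N conditions above. A negative number in the range counts from the end of the data.

This short sketch uses the bundled auto data (sysuse rather than webuse, so it runs offline). It also shows explicit subscripts, which return the same values inside expressions.

```stata
sysuse auto, clear

* in 1 selects the first observation, in -1 the last one
list make price mpg rep78 in 1
list make price mpg rep78 in -1

* varname[#] subscripts: _N is the total number of observations
display make[1]
display make[_N]
```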

8 Keep/Save a Subset of the Data
- "Keep" a subset of the variables in memory:
    keep make headroom trunk weight length
- List variables in the current dataset:
    ds
- List string variables in the current dataset:
    ds, has(type string)
- Save the current dataset:
    save tempdata/myauto
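Here is a minimal sketch of the same steps on the bundled auto data. It captures the names that ds leaves in r(varlist) in a local macro. It also saves to a temporary file rather than tempdata/myauto, so no folder needs to exist. The names strvars and myauto are illustrative.

```stata
sysuse auto, clear

* Keep only the five variables of interest
keep make headroom trunk weight length

* ds stores the matching variable names in r(varlist)
ds, has(type string)
local strvars `r(varlist)'
display "String variables: `strvars'"

* tempfile gives a scratch file name that Stata deletes when the do-file ends
tempfile myauto
save `myauto'
```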

9 Generating New Variables
- Create a new variable equal to headroom squared:
    generate headroom2 = headroom^2
- Generate a numeric variable from a string variable:
    encode make, generate(makeNum)
    list make makeNum in 1/5
- The listing does not reveal that makeNum is numeric, because list displays its value labels. The "storage type" column of describe does:
    describe make makeNum
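To check the result programmatically rather than by eye, confirm tests a variable's type and decode reverses encode. This sketch runs on the bundled auto data. The names lbl and make2 are hypothetical.

```stata
sysuse auto, clear

* encode creates a numeric variable labeled with the original strings
encode make, generate(makeNum)

* confirm exits with an error if the variable is not of the stated type
confirm numeric variable makeNum
confirm string variable make

* By default, encode names the value label after the new variable
local lbl : value label makeNum
display "makeNum uses value label `lbl'"

* decode turns the numeric codes back into strings via the value label
decode makeNum, generate(make2)
```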

10 Generating New Variables, cont.
- Create a categorical variable from a continuous variable.
- "price" is integer-valued, with minimum 3291 and maximum 15906.
- Generate a categorical version, Method 1:
    generate priceCat = 0
    replace priceCat = 1 if price < 5000
    replace priceCat = 2 if price >= 5000 & price < 10000
    replace priceCat = 3 if price >= 10000 & price < .

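11 Generating New Variables, cont.
- Generate a categorical version of a numerical variable, Method 2:
    generate priceCat2 = price
    recode priceCat2 (min/5000 = 1) (5000/10000 = 2) (10000/max = 3)
- Compare price, priceCat, and priceCat2:
    table price priceCat
    table priceCat priceCat2
- When recode rules overlap, the first matching rule is applied. A price of exactly 5000 is therefore coded 1 here but 2 by Method 1, and a price of exactly 10000 is coded 2 here but 3 by Method 1.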
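A third approach, not from the original slides, adds up logical expressions. Each true condition contributes 1, so the sum counts how many cutoffs a price has passed. It uses the same cutoffs as Method 1, so a price of exactly 5000 lands in category 2.

This sketch on the bundled auto data recreates Method 1 for comparison. The variable priceCat3 is hypothetical.

```stata
sysuse auto, clear

* Method 1 from the slides
generate priceCat = 0
replace priceCat = 1 if price < 5000
replace priceCat = 2 if price >= 5000 & price < 10000
replace priceCat = 3 if price >= 10000 & price < .

* Method 3: each (condition) is 0 or 1; missing prices stay missing
generate priceCat3 = 1 + (price >= 5000) + (price >= 10000) if price < .

* All counts should fall on the diagonal
tabulate priceCat priceCat3
```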

12 Variable Labels and Value Labels
- Create a description for a variable:
    label variable priceCat "Categorical price"
- Create labels to represent variable values:
    label define priceCatLabels 1 "cheap" 2 "mid-range" 3 "expensive"
    label values priceCat priceCatLabels
- View the results:
    describe
    list price priceCat in 1/10
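Labels can also be read back inside a do-file with extended macro functions, which is handy for checking or reusing them. This sketch on the bundled auto data builds priceCat with the same cutoffs as Method 1. The macro names varlab and lab2 are illustrative.

```stata
sysuse auto, clear
generate priceCat = 1 + (price >= 5000) + (price >= 10000)

label variable priceCat "Categorical price"
label define priceCatLabels 1 "cheap" 2 "mid-range" 3 "expensive"
label values priceCat priceCatLabels

* Read the variable label and the label attached to code 2
local varlab : variable label priceCat
local lab2 : label priceCatLabels 2
display "`varlab'; code 2 = `lab2'"

* tabulate shows the labels; nolabel shows the underlying codes
tabulate priceCat
tabulate priceCat, nolabel
```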

13 Reshape
- Wide -> Long:
    reshape long uniqueschool author, i(year session order) j(count)
- Long -> Wide:
    reshape wide author, i(year session order) j(count)

Wide format:
    year  session  order  author1    author2
    2006  P01      3      Biddlecom  Bankole
    2006  P01      4      Anyara     Hinde
    2006  P01      5      Amouzou    Becker

Long format:
    year  session  order  author     count
    2006  P01      3      Biddlecom  1
    2006  P01      3      Bankole    2
    2006  P01      4      Anyara     1
    2006  P01      4      Hinde      2
    2006  P01      5      Amouzou    1
    2006  P01      5      Becker     2
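Here is a runnable version of this example. It types in the wide table with input and reshapes it in both directions. The uniqueschool stub is left out because its values are not shown in the tables.

```stata
clear
* Wide-format data from the table above
input int year str3 session byte order str10 author1 str10 author2
2006 "P01" 3 "Biddlecom" "Bankole"
2006 "P01" 4 "Anyara"    "Hinde"
2006 "P01" 5 "Amouzou"   "Becker"
end

* Wide -> long: one observation per author, with count = 1 or 2
reshape long author, i(year session order) j(count)
list, sepby(order)

* Long -> wide: author1 and author2 are rebuilt from author and count
reshape wide author, i(year session order) j(count)
list
```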

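14 A Few Other Commands
- compress: stores each variable in the smallest storage type that holds its values, so the data take less memory
- sort / gsort: sort the observations (gsort also allows descending order)
- order: change the order of the variables in the dataset
- rename: rename a variable
- more: display a --more-- prompt and wait (see also set more off)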
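Several of these commands are combined below in a minimal sketch on the bundled auto data. The new variable name model is illustrative.

```stata
sysuse auto, clear

* Demote storage types where the values allow it
compress

* A minus sign in gsort sorts that variable in descending order
gsort -price
list make price in 1/3

* Move price and mpg to the front, then rename make
order price mpg
rename make model
describe, short
```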

15 Abbreviations in Stata
- Command, option, and variable names can be abbreviated: the shortest uniquely identifying name is sufficient.
- Example, assuming three variables are in use (make, price, mpg):
    "Unabbreviated" Stata command:
    . summarize make price
    Abbreviated Stata command:
    . su ma p
- Exceptions:
  - describe (d), list (l), and some others have official abbreviations shorter than the unique prefix.
  - Commands that change or delete data (e.g., replace) must be spelled out.
  - Commands implemented as ado-files generally cannot be abbreviated.
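Variable-name abbreviation is convenient interactively, but in a do-file it can silently match a different variable once new variables are added. The setting set varabbrev, not covered on the slide, controls this behavior.

A sketch on the bundled auto data (check help set varabbrev for your release):

```stata
sysuse auto, clear

* With varabbrev on (the default), "pr" uniquely identifies price
summarize pr

* With varabbrev off, full variable names are required
set varabbrev off
capture summarize pr
local rc = _rc
display "return code: `rc'"    // 111: variable not found
set varabbrev on
```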

16 Missing Values in Stata 8 and 9
- Stata 8 and later versions have 27 representations of numerical "missing": ., .a, .b, ..., .z
- Relational comparisons: largest nonmissing number < . < .a < .b < ... < .z
- Mathematical functions: missing + nonmissing = missing
- String missing = the empty string ""
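These rules can be checked on a three-observation toy dataset holding a number, the system missing value, and an extended missing value. The variable names x and y are hypothetical.

```stata
clear
set obs 3
generate x = 5 in 1       // an ordinary number
replace x = . in 2        // system missing
replace x = .a in 3       // extended missing

* Missing values are larger than any number, so both count here
count if x > 100
count if missing(x)

* Arithmetic with a missing operand yields missing
generate y = x + 1
list
```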

17 Missing Values in Stata: Pitfalls
- Pitfall #1: missing-value tests changed after Stata 7:
    Stata 7          Stata 8 and later
    varname != .     varname < .
    varname == .     varname >= .
- Pitfall #2: because missing values are larger than any number, weight >= 200 is also true when weight is missing.
    Do NOT:   . replace weightlt200 = 0 if weight >= 200
    INSTEAD:  . replace weightlt200 = 0 if weight >= 200 & weight < .
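Pitfall #2 can be reproduced on the bundled auto data, where rep78 (not weight) has missing values. The variables replt4 and wrong are hypothetical.

```stata
sysuse auto, clear      // rep78 has 5 missing values

* Indicator for a repair record below 4
generate replt4 = 1 if rep78 < 4

* WRONG: missing rep78 also satisfies rep78 >= 4 and is coded 0
generate wrong = replt4
replace wrong = 0 if rep78 >= 4

* RIGHT: exclude missing values explicitly
replace replt4 = 0 if rep78 >= 4 & rep78 < .

count if missing(wrong)     // missing information was lost
count if missing(replt4)    // missing information was kept
```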

18 Combining Data
- Append vs. merge:
  - Append: two datasets with the same variables, different observations
  - Merge: two datasets with the same or related observations, different variables
- Appending data in Stata, example: append.do
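The workshop's append.do is not included here. As a stand-in, this sketch splits the bundled auto data into domestic and foreign cars and stacks the pieces back together.

```stata
* First dataset: domestic cars only
sysuse auto, clear
keep if foreign == 0
tempfile domestic
save `domestic'

* Second dataset in memory: foreign cars only
sysuse auto, clear
keep if foreign == 1

* append adds the using file's observations below those in memory
append using `domestic'
tabulate foreign
```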

19 Combining Data: merge and joinby
Demonstrated with two sample datasets, the Neighborhood and County samples:
- One-to-one merge: onetoone.do
- One-to-many merge (use match merge): onetomany.do
- Many-to-many merge (use joinby): manytomany.do
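The neighborhood and county files are not reproduced here. This one-to-many sketch uses two tiny hypothetical datasets.

It uses the merge m:1 syntax introduced in Stata 11. Under the Stata 8/9 syntax of this workshop (merge county using filename), both files first had to be sorted by the key.

```stata
* County-level data: one observation per county (hypothetical values)
clear
input byte county str8 countyname
1 "Alpha"
2 "Beta"
end
tempfile counties
save `counties'

* Neighborhood-level data: several neighborhoods per county
clear
input byte nbhd byte county
1 1
2 1
3 2
end

* Many neighborhoods match one county record
merge m:1 county using `counties'
list, sepby(county)
```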

20 Combining Data, cont.
Variable _merge (generated by merge and joinby):
    _merge   Observation in master data   Observation in "using" data
    1        Yes                          No
    2        No                           Yes
    3        Yes                          Yes
Pitfalls:
- pitfall_merge1.do: merging unsorted data
- pitfall_merge2.do: many-to-many merge using merge instead of joinby
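merge does not form all pairwise combinations of observations that share a key, but joinby does. That is why joinby is the tool for many-to-many matches.

This sketch uses hypothetical family data. By default, joinby keeps only observations found in both files.

```stata
* Using data: two parents in family 1, one in family 2
clear
input byte family str6 parent
1 "Ann"
1 "Bob"
2 "Cy"
end
tempfile parents
save `parents'

* Master data: two children in family 1, one in family 2
clear
input byte family str6 child
1 "Dee"
1 "Eve"
2 "Fay"
end

* Every pairwise combination within a family:
* family 1 gives 2 x 2 = 4 rows, family 2 gives 1 x 1 = 1 row
joinby family using `parents'
sort family child parent
list, sepby(family)
```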

21 Do-files
What is a do-file?
- Stata commands can be executed interactively or via a do-file.
- A do-file is a text file containing commands that can be read by Stata.
- Running a do-file within Stata:
    . do dofilename.do

22 Do-files
Why use a do-file?
- Documentation
- Communication
- Can you reproduce an interactive session?
Interactive vs. do-files: record EVERYTHING needed to recreate your results in your do-file!

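23 Do-files > Header, Version Control
Header
- Include in do-files: name, project, project location, date, purpose, inputs, outputs, special instructions
Version control
- Include the version command at the top of the do-file.
- Why? Commands can behave differently across Stata releases, and version makes Stata interpret the do-file as the stated release would.
- Example: under version 7, . == .a == .b == ... == .z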
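A do-file opening might follow the header checklist above. This is one possible layout, not a required format. The file name and bracketed entries are placeholders to fill in.

```stata
* ----------------------------------------------------------------------
* Name:      analysis1.do          (hypothetical file name)
* Project:   [project name]
* Location:  [project directory]
* Date:      [date written]
* Purpose:   [what this do-file does]
* Inputs:    [datasets read]
* Outputs:   [datasets, logs, and graphs written]
* Notes:     [special instructions]
* ----------------------------------------------------------------------

* Interpret the commands below as Stata 9 would, even in a later release
version 9
```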

24 Do-files > Comments
Comments
- Lines beginning with * are ignored.
- Words between // and the end of the line are ignored.
Spanning commands over two lines
- Words between /* and */ are ignored, including the end-of-line character.
- Words between /// and the beginning of the next line are ignored.

25 Do-file > End-of-Line Character
Commands requiring multiple lines:
- #delimit ; tells Stata to read semicolons as the end-of-line character instead of the carriage return.
- Comment out the carriage return with /* at the end of a line and */ at the beginning of the next.
- Comment out the carriage return with ///.

26 Do-files > Examples
    webuse auto, clear
    *this is a comment
    #delimit ;
    summarize price mpg rep78
        headroom trunk weight;
    #delimit cr
    summarize price mpg rep78 headroom trunk weight //this is a comment
    summarize price mpg rep78 ///
        headroom trunk weight
    summarize price mpg rep78 /*
        */ headroom trunk weight

27 Saving Output
Work in do-files and log your sessions!
    log using filename [, replace append]
    log close
Output choices:
- *.log file: ASCII (plain text) file
- *.smcl file: nicer format for viewing and printing in Stata
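The sketch below opens a plain-text log (the text option), runs a command, and closes the log. It assumes a release whose log command supports name(), which lets this log coexist with any unnamed log that is already open. The file name mysession.log is illustrative.

```stata
* Open a plain-text log called "demo", overwriting any earlier copy
log using mysession.log, text replace name(demo)

sysuse auto, clear
summarize price mpg

* Stop logging; later output is not recorded
log close demo
```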

28 Saving Output, cont.
Graphs are not saved in log files.
- Use the saving() option of graph commands to store the graph in Stata's .gph format:
    saving(graph.gph)
- Export the current graph:
    graph export graph.ext
    Ex: graph export graph.eps
- Supported formats: .ps, .eps, .wmf, .emf, .pict
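This sketch combines both approaches on the bundled auto data: saving() keeps an editable .gph copy, and graph export writes the format named by the file extension. The file names are illustrative.

```stata
sysuse auto, clear

* saving() writes the graph to mpgweight.gph
scatter mpg weight, saving(mpgweight, replace)

* Export the graph currently in memory as Encapsulated PostScript
graph export mpgweight.eps, replace
```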

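29 Example Using a Local Macro
    . local mypath "C:\Documents and Settings\MyStata"
    . display `mypath'
    C:\Documents invalid name
    r(198);
    . display C:\Documents and Settings\MyStata
    C:\Documents invalid name
    r(198);
    . display "`mypath'"
    C:\Documents and Settings\MyStata
Without quotes, display treats the expanded path as an expression and fails. Enclose the macro in double quotes to display it as a string.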

30 Example: foreach, return, display
    *see samplePrograms.do, runLoop
    foreach var of varlist tenure-ln_wage {
        quietly summarize `var'
        local varmean = r(mean)
        display "Variable `var' has mean `varmean'"
    }

31 Example Using forvalues, display
    *see samplePrograms.do, runCount
    forvalues counter = 1/10 {
        display `counter'
    }
    forvalues counter = 0(2)10 {
        display `counter'
    }

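32 Example: forvalues, Generating Random Variables
    *see samplePrograms.do, runRandomGen
    forvalues j = 1/3 {
        generate x`j' = uniform()
        generate y`j' = invnormal(uniform())
    }
    foreach x of varlist x1 x2 x3 y1 y2 y3 {
        summarize `x'
    }
The variables are created in the order x1 y1 x2 y2 x3 y3, so the ranges x1-x3 and y1-y3 would each pick up variables from the other group. List the names explicitly instead.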
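In later releases (10.1 onward), uniform() and invnormal(uniform()) were superseded by runiform() and rnormal(). The slide's loop also assumes observations already exist.

This self-contained sketch creates 100 observations and sets a seed so the draws are reproducible. The seed value is arbitrary.

```stata
clear
set obs 100        // random-number functions fill existing observations
set seed 12345     // makes the draws reproducible

forvalues j = 1/3 {
    generate x`j' = runiform()    // uniform on [0, 1)
    generate y`j' = rnormal()     // standard normal
}
summarize x1 x2 x3 y1 y2 y3
```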

33 Example: if/else
    *see samplePrograms.do, runIfElse
    foreach var of varlist tenure-ln_wage {
        quietly summarize `var'
        local varmean = r(mean)
        if `varmean' > 10 {
            display "`var' has mean greater than 10"
        }
        else {
            display "`var' has mean less than or equal to 10"
        }
    }

34 Special Topic: Regular Expressions
    webuse auto
- List all values of make starting with a capital letter and containing an additional capital letter:
    list make if regexm(make, "^[A-Z].+[A-Z].+")
- The same, but also ending in a number:
    list make if regexm(make, "^[A-Z].+[A-Z].+[0-9]+$")
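The patterns can be tested on a few illustrative strings rather than the auto data. In "^[A-Z].+[A-Z].+", each .+ needs at least one character, so the second capital cannot immediately follow the first. This is why "VW Golf 2" matches through the G rather than the W.

The variable names twocaps and twocapsnum are hypothetical.

```stata
clear
* Illustrative strings (not taken from the auto data)
input str20 make
"AMC Concord"
"Datsun 200"
"VW Golf 2"
end

* regexm() returns 1 if the string matches the pattern, 0 otherwise
generate twocaps    = regexm(make, "^[A-Z].+[A-Z].+")
generate twocapsnum = regexm(make, "^[A-Z].+[A-Z].+[0-9]+$")
list
```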

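35 Special Topic: Accessing Data in Another Database
- List available ODBC data sources:
    odbc list
- List the tables in data source testStata:
    odbc query testStata
- Describe a table:
    odbc desc "Summary2006$"
- Load selected columns of a table into memory:
    odbc load year type session order author1 author2, table("Summary2006$") dsn("testStata")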

36 Special Topic: Exporting Results Using outreg
- outreg is a user-written program; from within Stata, type findit outreg to locate it.
- Very simple! Basically, add one line of code after each regression to export the results.
- For an example of code, see

37 Getting Help in Stata
- help command_name: abbreviated version of the manual
- search:
    search keywords, local
    search keywords, net
    search keywords, all
- findit keywords: same as search keywords, all
- Search the Stata Listserver and the Stata FAQ

38 Stata Resources > Resources and Support
- Search the Stata Listserver
- Search Stata (FAQ)
- Stata Journal (SJ): articles for subscribers, programs free
- Stata Technical Bulletin (STB): replaced by the Stata Journal; articles available for purchase, programs free
- Courses (for a fee)

39 Updating Stata
    help update
    update all

40 Questions/feedback