1 CCPR Computing Services Introduction to Stata Courtney Engel October 26, 2007.


2 Outline
Stata
- Command Syntax
- Basic Commands
- Abbreviations
- Missing Values
- Combining Data
- Using do-files
- Basic programming
- Special Topics
- Getting Help
- Updating Stata

3 Stata Syntax
Basic command syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]
Brackets = optional portions
Italics = user specified
http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/stataslides10.07.log

4 Stata Syntax, cont.
Complete syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]
Example 1 (webuse union)
- Stata command:
    . bysort black: summarize age if year >= 80, detail
- Result: summarizes age separately for each value of black, including only observations for which year >= 80, and reports extra detail.
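The optional pieces of the syntax diagram that Example 1 does not use can be tried the same way. The following is a minimal sketch that assumes the auto dataset (loaded later in these slides) and an illustrative choice of weight variable; it is not taken from the original session log.

```stata
webuse auto, clear

* [in range]: restrict the command to observations 1 through 10
summarize mpg in 1/10

* [weighttype=weight]: analytic weights, using weight purely for illustration
summarize mpg [aweight = weight]

* [by varlist:], [if exp] and [, options] combined
bysort foreign: summarize mpg if price < 10000, detail
```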

5 Stata Syntax, cont.
Complete syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]
Example 2 (webuse union)
- Stata commands:
    . generate agelt30 = age
    . replace agelt30 = 1 if age < 30
    . replace agelt30 = 0 if age >= 30 & age < .
- Result: variable agelt30 set equal to 1, 0, or missing
- Generally, [= exp] is used with the commands generate and replace

Obs #   age   agelt30
1       10    1
2       15    1
3       .     .
4       30    0
5       73    0
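The three commands in Example 2 can also be collapsed into one. The sketch below rebuilds the five-row table by hand (the data are typed in with input, not loaded from union) and uses the fact that a logical expression evaluates to 1 or 0; the variable name agelt30b is an assumption made for this illustration.

```stata
clear
input age
10
15
.
30
73
end

* three-step version from the slide
generate agelt30 = age
replace agelt30 = 1 if age < 30
replace agelt30 = 0 if age >= 30 & age < .

* one-step version: (age < 30) is 1 or 0; the if qualifier leaves missing ages missing
generate agelt30b = age < 30 if !missing(age)

list
```

Both variables should agree on every row, including the missing one.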

6 Basic Commands: Load "auto" data and look at some vars
Load data from Stata's website:
    webuse auto.dta
Look at dataset:
    describe
Summarize some variables:
    codebook make headroom, header
    inspect weight length

7 Basic Commands: Load "auto" data and look at some vars
Look at first and last observation:
    list make price mpg rep78 if _n==1
    list make price mpg rep78 if _n==_N
Summarize a variable in a table:
    table foreign
    table foreign, c(mean mpg sd mpg)
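The same first/last listing can be written with an in range instead of _n, and the table of means can be produced with tabstat. This is a sketch of equivalent commands, not part of the original slides; tabstat is used here because its syntax has been stable across Stata releases.

```stata
webuse auto, clear

* f and l (or F and L) stand for the first and last observation in an in range
list make price mpg rep78 in F
list make price mpg rep78 in L

* mean and standard deviation of mpg for each value of foreign
tabstat mpg, by(foreign) statistics(mean sd)
```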

8 Keep/Save a Subset of the Data
"Keep" a subset of the variables in memory:
    keep make headroom trunk weight length price
List variables in current dataset:
    ds
List string variables in current dataset:
    ds, has(type string)
Save current dataset:
    save autokeep, replace

9 Generating New Variables
Create new variable = headroom squared:
    generate headroom2 = headroom^2

Obs #   headroom   headroom2
1       10         100
2       9          81
3       4          16

Generate numeric from string variable:
    encode make, generate(makeNum)
    list make makeNum in 1/5
- Can't tell it's numeric from the listing, but look at "storage type" in describe:
    describe make makeNum
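Besides describe, the nolabel option of list shows the numeric codes that encode assigned, and decode reverses the conversion. A short sketch on the auto data; makeStr is a name chosen for this illustration.

```stata
webuse auto, clear
encode make, generate(makeNum)

* with nolabel, makeNum shows its underlying integer codes rather than the value labels
list make makeNum in 1/5, nolabel

* decode turns the labeled numeric variable back into a string variable
decode makeNum, generate(makeStr)
```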

10 Generating New Variables, cont.
Create categorical variable from continuous variable
- "price" is integer-valued with minimum 3291 and maximum 15906
Generate categorical version, Method 1:
    generate priceCat = 0
    replace priceCat = 1 if price < 5000
    replace priceCat = 2 if price >= 5000 & price < 10000
    replace priceCat = 3 if price >= 10000 & price < .

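11 Generating New Variables, cont.
Generate categorical version of numerical variable, Method 2:
    generate priceCat2 = price
    recode priceCat2 (min/5000 = 1) (5000/10000=2) (10000/max=3)
- recode applies the first rule that matches, so a price of exactly 5000 is coded 1 and exactly 10000 is coded 2 here, whereas Method 1 codes them 2 and 3.
Compare price, priceCat, and priceCat2:
    table price priceCat
    table priceCat priceCat2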
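A third route, not on the original slides, is the irecode() function, which returns the index of the interval a value falls into and returns missing for a missing argument. The sketch below assumes the auto data and relies on price being integer-valued, so the cut points 4999 and 9999 reproduce the Method 1 boundaries; priceCat3 is a hypothetical name.

```stata
webuse auto, clear

* Method 1 from the slides, repeated here so the block is self-contained
generate priceCat = 0
replace priceCat = 1 if price < 5000
replace priceCat = 2 if price >= 5000 & price < 10000
replace priceCat = 3 if price >= 10000 & price < .

* irecode() returns 0 if price <= 4999, 1 if 4999 < price <= 9999, 2 if price > 9999
generate priceCat3 = 1 + irecode(price, 4999, 9999)

tabulate priceCat priceCat3
```

One difference to keep in mind: a missing price would stay missing in priceCat3, while Method 1 leaves it at 0.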

12 Variable Labels and Value Labels
Create a description for a variable:
    label variable priceCat "Categorical price"
Create labels to represent variable values (label names are case-sensitive, so the name must match in both commands):
    label define priceCatLabels 1 "cheap" 2 "mid-range" 3 "expensive"
    label values priceCat priceCatLabels
View results:
    describe
    list price priceCat in 1/10

13 Reshape > Wide to Long
Wide -> Long:
    reshape long author, i(year session order) j(count)
- long: reshape from wide to long
- author: stem of the variables going from wide to long
- i(year session order): uniquely identifies an observation in wide form
- j(count): variable that will be created to contain the suffix of author, i.e. (1 2)

Wide format:
year   session   order   author1     author2
2006   P01       3       Biddlecom   Bankole
2006   P01       4       Anyara      Hinde
2006   P01       5       Amouzou     Becker

14 Reshape > Long to Wide
Long -> Wide:
    reshape wide author, i(year session order) j(count)
- wide: reshape from long to wide
- author: variable to be converted from long to wide
- i(year session order): variables that uniquely identify observations in wide form
- j(count): variable that gives the suffix of author, i.e. (1 2)

Long format:
year   session   order   author      count
2006   P01       3       Biddlecom   1
2006   P01       3       Bankole     2
2006   P01       4       Anyara      1
2006   P01       4       Hinde       2
2006   P01       5       Amouzou     1
2006   P01       5       Becker      2
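Both reshape directions can be tried on the three-row example from the slides. The sketch below types the wide table in with input; the lowercase variable names and the string widths (str3, str12) are assumptions made so that the stem author matches author1 and author2.

```stata
clear
input year str3 session order str12 author1 str12 author2
2006 "P01" 3 "Biddlecom" "Bankole"
2006 "P01" 4 "Anyara"    "Hinde"
2006 "P01" 5 "Amouzou"   "Becker"
end

* wide -> long: 3 rows become 6, with count = 1 or 2
reshape long author, i(year session order) j(count)
list, sepby(order)

* long -> wide: back to 3 rows with author1 and author2
reshape wide author, i(year session order) j(count)
list
```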

15 A few other commands
- compress: stores data more efficiently
- sort / gsort: sort observations (sort is ascending only; gsort allows ascending or descending)
- order: change variable order
- rename: rename variables
- set more on/off: pause output one screen at a time, or not
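A quick way to see what each of these commands does is to run them in sequence on the auto data. This is an illustrative sketch; the new variable name milesPerGallon is made up for the example.

```stata
webuse auto, clear
set more off                   // do not pause output between screens

compress                       // demote variables to smaller storage types where possible
sort price                     // ascending by price
gsort -price                   // descending by price (minus sign in front of the variable)
order make price mpg           // move these three variables to the front of the dataset
rename mpg milesPerGallon      // rename a variable

list make price milesPerGallon in 1/3
```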

16 Abbreviations in Stata
Abbreviating command, option, and variable names
- The shortest uniquely identifying name is sufficient
Example:
- Assume three variables are in use: make, price, mpg
- "UN-abbreviated" Stata command:
    . summarize make price
- Abbreviated Stata command:
    . su ma p
Exceptions
- describe (d), list (l), and some others
- Commands that change/delete
- Functions implemented by ado-files

17 Missing Values in Stata 8-10
Stata 8 and later versions
- 27 representations of numerical "missing": ., .a, .b, ..., .z
Relational comparisons
- Biggest number < . < .a < .b < ... < .z
Mathematical functions
- missing + nonmissing = missing
String missing
- Empty quotes: ""

18 Missing Values in Stata: Pitfalls
Pitfall #1
- Missing values changed after Stata 7:

Stata 7          Stata 8 and later
varname != .     varname < .
varname == .     varname >= .

Pitfall #2
- Do NOT:
    . replace weightlt200 = 0 if weight >= 200
- INSTEAD:
    . replace weightlt200 = 0 if weight >= 200 & weight < .
- Missing values compare as larger than any number, so the first form also sets observations with missing weight to 0.
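Pitfall #2 is easy to reproduce on a toy dataset. The sketch below uses three hand-entered weights, one of them missing; the variable names naive and safe are hypothetical.

```stata
clear
input weight
150
250
.
end

* naive version: the missing weight satisfies weight >= 200 and is wrongly set to 0
generate byte naive = 1 if weight < 200
replace naive = 0 if weight >= 200

* safe version: the extra condition weight < . excludes all 27 missing codes
generate byte safe = 1 if weight < 200
replace safe = 0 if weight >= 200 & weight < .

list
```

The third observation ends up with naive equal to 0 and safe still missing. The condition !missing(weight) can be used in place of weight < . with the same effect.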

19 Combining Data
Append vs. Merge
- Append: two datasets with same variables, different observations
- Merge: two datasets with same or related observations, different variables
Appending data in Stata
- Example: append.do
  http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/append10.07.log
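The append.do file itself is not reproduced in this transcript, so the following is only a stand-in sketch: it splits the auto data into foreign and domestic cars and stacks them back together. The temporary file name foreigncars is an assumption for the illustration.

```stata
webuse auto, clear

* put the foreign cars into a temporary file
preserve
keep if foreign == 1
tempfile foreigncars
save `foreigncars'
restore

* keep the domestic cars in memory, then add the foreign cars underneath
keep if foreign == 0
append using `foreigncars'

count
```

Because both pieces have the same variables, the result has the original number of observations and no new columns.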

20 Combining Data: merge and joinby
Demonstrate with two sample datasets:
- Neighborhood and County samples
One-to-one merge
- onetoone.do
  http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/onetoone10.07.log
One-to-many merge: use match merge
- onetomany.do
  http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/onetomany10.07.log
Many-to-many merge: use joinby
- manytomany.do
  http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/manytomany10.07.log

21 Combining Data
Variable _merge (generated by merge and joinby):

_merge   Observation in master data   Observation in "using" data
1        Yes                          No
2        No                           Yes
3        Yes                          Yes

Pitfalls
- Merging unsorted data
- Many-to-many using merge instead of joinby
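The neighborhood and county files used in the slides are not included here, so the sketch below invents two tiny datasets to show how _merge takes the values 1, 2, and 3. It assumes Stata 11 or later, where merge is written with a 1:1, m:1, or 1:m prefix and sorts the data itself; the slides predate that syntax, which is why they warn about unsorted data. All variable names and values are hypothetical.

```stata
* hypothetical county-level file (the "using" data)
clear
input county medinc
1 40000
2 55000
3 62000
end
tempfile counties
save `counties'

* hypothetical neighborhood-level file (the master data)
clear
input nbhd county
101 1
102 1
201 2
401 4
end

* many neighborhoods per county, one row per county in the using file
merge m:1 county using `counties'
tabulate _merge
list, sepby(county)
```

Neighborhood 401 has no county record (_merge is 1), county 3 has no neighborhoods (_merge is 2), and the other three rows match (_merge is 3).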

22 Do-files
What is a do-file?
- Stata commands can be executed interactively or via a do-file
- A do-file is a text file containing commands that can be read by Stata
- Running a do-file within Stata:
    . do dofilename.do

23 Do-files
Why use a do-file?
- Documentation
- Communication
- Reproduce interactive session?
Interactive vs. do-files
Record EVERYTHING to recreate results in your do-file!

24 Do-files > Documentation
Header:
    * Josie Bruin (jbruin@ucla.edu)
    * HRS project
    * /u/socio/jbruin/HRS/
    * October 5, 2007
    * Stata version 8
    * Purpose: Create and merge two datasets in Stata,
    *          then convert data to SAS
    * Input programs:
    *   HRS/staprog/H2002.do,
    *   HRS/staprog/x2002.do,
    *   HRS/staprog/mergeFiles.do
    * Output:
    *   HRS/stalog/H2002.log,
    *   HRS/stalog/x2002.log,
    *   HRS/stalog/mergeFiles.log
    *   HRS/stadata/Hx2002.dta
    *   HRS/sasdata/Hx2002.sas
    * Special instructions: Check log files for errors
    *   check for duplicates upon new data release
File header includes:
- Name (email)
- Project
- Project location
- Date
- Software version
- Purpose of program
- Inputs
- Outputs
- Special instructions

25 Do-files > Comments
Comments
- Lines beginning with * will be ignored
- Words between // and end of line will be ignored
- Spanning commands over two lines:
  - Words between /* and */ will be ignored, including the end-of-line character
  - Words between /// and the beginning of the next line will be ignored

26 Do-files > End of Line Character
Commands requiring multiple lines
- #delimit ;
  This command tells Stata to read semicolons as the end-of-line character instead of the carriage return (#delimit cr switches back)
- Comment out the carriage return with /* at the end of the line and */ at the beginning of the next
- Comment out the carriage return with ///

27 Do-files > Examples
    webuse auto, clear
    *this is a comment
    #delimit ;
    summarize price mpg rep78
        headroom trunk weight;
    #delimit cr
    summarize price mpg rep78 headroom trunk weight //this is a comment
    summarize price mpg rep78 ///
        headroom trunk weight
    summarize price mpg rep78 /*
        */ headroom trunk weight

28 Saving Output
Work in do-files and log your sessions!
- log using filename
  - with option replace or append
- log close
Output choices:
- *.log file: ASCII file (text)
- *.smcl file: nicer format for viewing and printing in Stata
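A typical logging pattern at the top and bottom of a do-file might look like the sketch below. The file name intro_session.log is made up, and the leading capture log close is a common defensive habit (it closes any log left open by an earlier run) rather than something the slides prescribe.

```stata
capture log close                          // ignore the error if no log is open
log using intro_session.log, text replace  // text gives a plain ASCII log; omit it for .smcl

webuse auto, clear
summarize price mpg

log close
```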

29 Saving Output, cont.
Graphs are not saved in log files
Export current graph:
- graph export graph.ext
- Ex: graph export graph.eps
Supported formats:
- .ps, .eps, .wmf, .emf, .pict
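For instance, a histogram can be drawn and then written to disk in one of the listed formats. This is a minimal sketch; the output name mpg_hist.eps is hypothetical.

```stata
webuse auto, clear
histogram mpg

* export whatever graph is currently displayed; the extension selects the format
graph export mpg_hist.eps, replace
```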

30 Example using local macro
    . local mypath "C:\Documents and Settings\MyStata"
    . display `mypath'
    C:\Documents invalid name
    r(198);
    . display C:\Documents and Settings\MyStata
    C:\Documents invalid name
    r(198);
    . display "`mypath'"
    C:\Documents and Settings\MyStata

31 Example: foreach, return, display
    foreach var of varlist tenure-ln_wage {
        quietly summarize `var'
        local varmean = r(mean)
        display "Variable `var' has mean `varmean' "
    }

First five observations of the variables used:
         tenure     hours   wks_work    ln_wage
    1.   .0833333   20      27          1.451214
    2.   .1666667   15      27          2.09457
    3.   .25        40      27          1.790204
    4.   .0833333   44      10          1.02862
    5.   .0833333   20      10          .7409375

http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/constructs10.07.log
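The same loop can be run on the auto data, with a display format applied to the stored result so the means print with two decimals. This is a sketch on a different dataset from the slide's; the choice of variables is arbitrary.

```stata
webuse auto, clear

foreach var of varlist price mpg weight {
    quietly summarize `var'
    * r(mean) is left behind by summarize; %9.2f controls how it is printed
    display "Variable `var' has mean " %9.2f r(mean)
}
```

Typing return list after any summarize shows the other stored results, such as r(N), r(sd), r(min), and r(max).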

32 Example using forvalues, display
    forvalues counter = 1/10 {
        display `counter'
    }
    forvalues counter = 0(2)10 {
        display `counter'
    }

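33 Example: forvalues, generating random variables
    forvalues j = 1/3 {
        generate x`j' = uniform()
        generate y`j' = invnormal(uniform())
    }
    foreach x of varlist x1-x3 y1-y3 {
        summarize `x'
    }
Note: a range such as x1-x3 follows dataset order (x1 y1 x2 y2 x3 here), so this loop summarizes several variables twice; the varlist x? y? would list each variable once.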
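A self-contained variant of this example needs a number of observations and, for reproducibility, a seed. The sketch below also swaps in runiform() and rnormal(), the newer random-number functions that replaced uniform() in later releases; the observation count and seed value are arbitrary choices.

```stata
clear
set obs 100        // generate needs observations to fill
set seed 12345     // makes the random draws reproducible

forvalues j = 1/3 {
    generate x`j' = runiform()   // uniform on (0,1)
    generate y`j' = rnormal()    // standard normal
}

* wildcards pick up x1 x2 x3 and y1 y2 y3 exactly once each
summarize x? y?
```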

34 Example: if/else
    foreach var of varlist tenure-ln_wage {
        quietly summarize `var'
        local varmean = r(mean)
        if `varmean' > 10 {
            display "`var' has mean greater than 10"
        }
        else {
            display "`var' has mean of 10 or less"
        }
    }

35 Special Topic: regular expressions
    webuse auto
List all values of make starting with a capital and containing an additional capital:
    list make if regexm(make, "^[A-Z].+[A-Z].+")
AND ending in a number:
    list make if regexm(make, "^[A-Z].+[A-Z].+[0-9]$")

    +---------------+
    | make          |
    |---------------|
    | Merc. XR-7    |
    | Olds Delta 88 |
    +---------------+
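Beyond filtering, a matched piece of the string can be pulled out with regexs(), which returns a parenthesized subexpression from the most recent regexm() match. A small sketch on the auto data; the variable name lastnum is hypothetical.

```stata
webuse auto, clear

* capture the run of digits at the very end of make, if there is one
generate lastnum = regexs(1) if regexm(make, "([0-9]+)$")

list make lastnum if lastnum != ""
```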

36 Special Topic: Exporting results using outreg
- User-written program called outreg
- From within Stata, type: findit outreg
- Very simple!! Basically add one line of code after each regression to export results
- For an example of code, see http://www.ats.ucla.edu/stat/stata/faq/outreg.htm

37 Getting Help in Stata
help command_name
- abbreviated version of manual
search
- search keywords, local
- search keywords, net
- search keywords, all
findit keywords
- same as search keywords, all
Search Stata Listserver and Stata FAQ

38 Stata Resources
www.stata.com > Resources and Support
- Search Stata Listserver
- Search Stata (FAQ)
- Stata Journal (SJ): articles for subscribers, programs free
- Stata Technical Bulletin (STB): replaced with the Stata Journal; articles available for purchase, programs free
- Courses (for fee)

39 Updating Stata
    help update
    update all

40 CCPR's Cluster and helping your research
Software and Data
- STATA, SAS, R, compilers, text editors, etc.
- HRS, CPS (Unicon version), AddHealth, IFLS, etc.
Efficiency
- Your PC is available for other work when you submit a job to the cluster
- Faster processors
- More RAM
- Easy to share data, programs, etc. with colleagues via the cluster
Obtain access by requesting an account
- http://lexis.ccpr.ucla.edu/account/request/

41 Questions/Feedback
Please email me if you need help in the future:
- cengel@ccpr.ucla.edu

