ECONOMETRICS ii – spring 2018

Stata Basics

https://www.stata.com/manuals13/gsw2.pdf

“Do” file
Type your commands here (as opposed to in the “command” box) so that you can preserve the history. Save the “do” file so that you can access it later. It is common practice to write a “do” file so that you can run the entire file at once.
https://www.stata.com/manuals13/u16.pdf

“Do” file: Recommended Practices
ALWAYS use a “do” file. Generally, begin the “do” file by clearing memory with the “clear” command. Each “do” file should have a purpose; not everything needs to go in the same “do” file. Add comments to yourself in the “do” file explaining what you are doing and why.
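
A minimal sketch of how a “do” file following these practices might be laid out. The task described in the header comment is hypothetical, and auto.dta is the example dataset installed with Stata, used here only so the file runs anywhere.

```stata
* Purpose: summarize car prices and mileage (hypothetical task)
* Data:    auto.dta, an example dataset installed with Stata

* Start from an empty dataset in memory
clear
sysuse auto

* Keep only the variables this task needs
keep make price mpg foreign

* Check the result before moving on
describe
summarize price mpg
```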

Importing Your Data
You can identify a Stata file by its “.dta” extension. It’s fairly easy to pull in Excel (.xlsx) or comma-separated (.csv) files; pulling in other file types is more complicated, and in that case Stat/Transfer is recommended. You must tell Stata where the data is stored on your computer.
https://www.stata.com/manuals13/u21.pdf
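
The import commands can be sketched as follows. So that the example runs without any files of your own, it first writes copies of the auto.dta example dataset and then reads them back in. The file names auto_copy.csv and auto_copy.xlsx are hypothetical, and the files are written to Stata's current working directory (use “cd” to change it).

```stata
* Show which folder Stata is currently reading from and writing to
pwd

* Create small example files to import (hypothetical file names)
sysuse auto, clear
export delimited using "auto_copy.csv", replace
export excel using "auto_copy.xlsx", firstrow(variables) replace

* Import a comma-separated file
import delimited using "auto_copy.csv", clear
describe

* Import an Excel file, treating the first row as variable names
import excel using "auto_copy.xlsx", firstrow clear
describe
```

For a file that is already in Stata format, “use” followed by the file name (and “, clear”) does the same job.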

Saving Your Data
You can save or export your data in a variety of formats. Save edited data as a new file; don’t overwrite existing files. Again, you must tell Stata where you want the data to be saved on your computer.
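
A short sketch of saving edited data under a new name. The file name auto_edited.dta and the variable price_thousands are hypothetical; the “replace” option here only overwrites this new copy when the “do” file is rerun, not the original data.

```stata
sysuse auto, clear

* Make an edit
generate price_thousands = price / 1000

* Save the edited data as a NEW file in the current working directory
save "auto_edited.dta", replace

* Export the same data to a comma-separated file
export delimited using "auto_edited.csv", replace
```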

Reviewing Your Data
Use the “browse” (“bro”) command to open up the browser window.

Reviewing Your Data
You can add criteria to the “browse” command so that you are only reviewing certain observations.
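
A few examples of adding criteria, using the auto.dta example dataset as a stand-in for your own data. Note that “browse” opens a window, so it needs the graphical version of Stata; “list” takes the same criteria and prints to the Results window instead.

```stata
sysuse auto, clear

* Browse only the foreign cars
browse if foreign == 1

* Browse selected variables for expensive cars only
browse make price mpg if price > 10000 & !missing(price)

* The same criteria work with list
list make price mpg if price > 10000 & !missing(price)
```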

Creating new variables
Use the “generate” command to create new variables. Note: you must specify a value for the new variable. The “egen” command is a more complicated version. You can only generate a variable that doesn’t already exist; once a variable exists, you have to “replace” it, not “generate” it.
https://www.stata.com/manuals13/gsm11.pdf
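
A minimal sketch of the three commands, assuming the auto.dta example dataset; the new variable names are hypothetical.

```stata
sysuse auto, clear

* generate: create a new variable and give it a value
generate price_thousands = price / 1000

* Running generate again for the same name would fail because
* price_thousands already exists, so use replace to change it
replace price_thousands = round(price_thousands, 0.1)

* egen: functions that work across observations, such as an overall mean
egen mean_price = mean(price)

* egen can also compute the statistic within groups
egen mean_price_origin = mean(price), by(foreign)
```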

Editing variables
Make one edit at a time, and focus on simple, logical statements. References to learn more about editing variables:
https://www.stata.com/manuals13/drecode.pdf
https://stats.idre.ucla.edu/stata/modules/creating-and-recoding-variables/
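
One way to make one edit at a time is a series of simple “replace ... if” statements, each handling a single case. The sketch below assumes the auto.dta example dataset; the cutoffs and the variable names mpg_group and rep78_group are hypothetical.

```stata
sysuse auto, clear

* Start with an empty variable, then fill it one condition at a time
generate mpg_group = .
replace mpg_group = 1 if mpg < 20
replace mpg_group = 2 if mpg >= 20 & mpg < 30
replace mpg_group = 3 if mpg >= 30 & !missing(mpg)

* Check the result after editing
tabulate mpg_group, missing

* recode does a similar job in one statement and can write to a new variable
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep78_group)
tabulate rep78 rep78_group, missing
```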

String variables
String variables are non-numeric variables (also called “text” or “character” variables; categorical data are often stored this way). They show up in red when you look at them in the data editor. The rules for editing string variables are different from the rules for editing numeric variables.
https://www.stata.com/manuals13/u23.pdf

String vs. numeric variables
You can convert a string variable to a numeric variable (or vice versa). Editing rules are different for numeric and string variables.
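
A sketch of the usual conversion commands, assuming the auto.dta example dataset; the new variable names are hypothetical. The last two lines illustrate the main difference in editing rules: string values go in double quotes, numeric values do not.

```stata
sysuse auto, clear

* Numeric to string
tostring mpg, generate(mpg_str)

* String (containing only numbers) back to numeric
destring mpg_str, generate(mpg_num)

* String categories to a numeric variable with value labels
encode make, generate(make_code)

* String values need quotes; numeric values do not
list make price if make == "Volvo 260"
list make price if mpg == 17
```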

Missing values
Stata identifies numeric missing values with “.” and treats them as very large numbers. You need to do something with the missing values or you will get very weird results. You have some options, depending on the situation:
(1) Drop the missing values
(2) Top (or bottom) code the missing values (usually the best option)
(3) Recode the missing values as a different value
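
The sketch below shows why missing values cause weird results, using the auto.dta example dataset, in which the variable rep78 has some missing values. Because a missing value counts as larger than any number, a “greater than” condition picks up the missing observations unless they are excluded. The variable name rep78_recode and the replacement value 0 are hypothetical.

```stata
sysuse auto, clear

* How many observations are missing rep78?
count if missing(rep78)

* This count INCLUDES the missing values, because "." is treated as very large
count if rep78 > 4

* This count excludes them
count if rep78 > 4 & !missing(rep78)

* Option (3): recode missing values to a different value, in a new variable
generate rep78_recode = rep78
replace rep78_recode = 0 if missing(rep78)

* Option (1): drop the observations with missing values
drop if missing(rep78)
```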

Appending vs. merging data
There are two basic ways to join datasets together: (1) appending and (2) merging. They are not interchangeable: the two commands join your data together differently, and you need to choose the command that is appropriate for your situation. Helpful resources that clearly explain the difference between appending and merging:
https://www.princeton.edu/~otorres/Merge101.pdf
https://www.stata.com/manuals13/dappend.pdf

Appending data
Scenario: You have one dataset with data for males and one dataset with data for females. You want to create a dataset that includes both males and females.

Appending data: Things to keep in mind
Make sure your datasets have the same variable names before you append them! It is worth spending time to make sure your variable names are exactly the same in both datasets. If two datasets have the same variable name, those columns are stacked on top of each other when you append the two datasets together. Otherwise, you’ll end up with a dataset that makes no sense.
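
The males and females scenario can be sketched with two tiny made-up datasets. All values, variable names, and the file name males.dta are hypothetical. Because both datasets use the same variable names (id, age, female), the rows stack cleanly.

```stata
* Build and save a small dataset of males
clear
input id age
1 34
2 51
end
generate female = 0
save "males.dta", replace

* Build a small dataset of females with the SAME variable names
clear
input id age
3 29
4 45
end
generate female = 1

* Stack the males underneath the females
append using "males.dta"
list
```

If the second dataset had called the variable “years” instead of “age”, the appended data would contain both columns, each missing for half of the rows.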

Merging data
Scenario: You have one dataset with data on students’ test scores and one dataset with data on students’ family characteristics. You want to create one dataset with all of this information.
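
A minimal sketch of this scenario with two made-up datasets. The variable names student_id, score, and parent_college, the values, and the file name family.dta are all hypothetical. The data in memory is the “master” dataset and the file named after “using” is the “using” dataset.

```stata
* Family characteristics: one row per student
clear
input student_id parent_college
1 1
2 0
3 1
end
save "family.dta", replace

* Test scores: one row per student
clear
input student_id score
1 88
2 92
3 75
end

* Link each student's scores to that student's family data
merge 1:1 student_id using "family.dta"
list
```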

Merging data
One-to-one merge (1:1) – link one unique row in dataset A to one unique row in dataset B (this is what we did in the previous example).
One-to-many merge (1:m, or m:1) – link one unique row in dataset A to multiple rows in dataset B (or vice versa). This is often used when you are linking datasets together at different levels; for example, you want to link state data to county data.
Many-to-many merge (m:m)
https://www.stata.com/manuals13/dmerge.pdf
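
The state and county example might look like the following sketch. The datasets, values, and names (state_id, county_id, state_income, county_pop, states.dta) are hypothetical. The county data is the master and has many rows per state, while the state data has one row per state, so the merge is m:1.

```stata
* State-level data: one row per state
clear
input state_id state_income
1 50000
2 62000
end
save "states.dta", replace

* County-level data: several counties per state
clear
input county_id state_id county_pop
101 1 12000
102 1 45000
201 2 8000
end

* Attach each state's values to every county in that state
merge m:1 state_id using "states.dta"
list
```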

Merging data: Things to keep in mind
The key is selecting the correct variable to merge on. Usually this is a unit ID variable, a unique ID, a respondent name, etc. This variable needs to be an identifier in both datasets. Make sure you know which dataset is the “master” and which dataset is the “using.” Review your datasets for duplicates before merging. Always review the output of the merge to make sure it makes sense. Did all of the records merge successfully? If not, why? Is this intentional? Should the records that didn’t merge be dropped? Go through the records that didn’t merge to figure out why they are left over. It’s good practice to add comments about this in your “do” file.
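
These checks can be sketched as follows, using made-up data in which one student appears only in the master dataset and one only in the using dataset. All names, values, and the file name family_partial.dta are hypothetical. After a merge, Stata creates a variable called _merge: 1 means the row came only from the master, 2 only from the using, and 3 means it matched in both.

```stata
* Using dataset: student 3 is absent, student 4 is extra
clear
input student_id parent_college
1 1
2 0
4 1
end
save "family_partial.dta", replace

* Master dataset
clear
input student_id score
1 88
2 92
3 75
end

* Check for duplicate IDs before merging
duplicates report student_id
* isid stops with an error if student_id does not uniquely identify rows
isid student_id

merge 1:1 student_id using "family_partial.dta"

* Review what merged and what did not
tabulate _merge
list if _merge != 3

* After deciding the leftovers should go, keep the matches and note why:
* keep if _merge == 3
* drop _merge
```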

Collapsing data
The “collapse” command can be used to reduce your dataset to summary statistics. It is generally used in situations where you need to pull a few statistics that summarize the entire dataset (perhaps to answer a quick query) but then return to the full dataset. Use the “preserve” and “restore” commands so that you can return to the full dataset. Note: you must run the code one line at a time. The summarized data will go away when you return to the full dataset.
https://www.stata.com/manuals13/dcollapse.pdf
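
A minimal sketch of the preserve, collapse, restore pattern, assuming the auto.dta example dataset; the names mean_price and median_mpg are hypothetical. One point worth knowing: if “preserve” is run inside a “do” file and that file finishes without a “restore”, Stata restores the full dataset automatically at that point.

```stata
sysuse auto, clear

* Take a snapshot of the full dataset
preserve

* Reduce the data to one row per group of summary statistics
collapse (mean) mean_price = price (median) median_mpg = mpg, by(foreign)
list

* Return to the full dataset; the summarized data is discarded
restore
count
```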

Syntax for different statistics
https://www.stata.com/manuals13/rtabstat.pdf
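
As an illustration of the syntax in the linked manual entry, “tabstat” takes a list of variables and a list of statistic names. The sketch assumes the auto.dta example dataset.

```stata
sysuse auto, clear

* Several statistics for several variables
tabstat price mpg weight, statistics(mean sd min max n)

* The same idea, split by group
tabstat price, by(foreign) statistics(mean median n)
```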

Programming: Recommended Practices
Add LOTS of comments in your “do” files. Use * to add a comment to one line; you can start and end a block of comment text with /* and */. Consistency in variable naming will save you time. For example, if you generate three year variables, call them “year_birth,” “year_grad,” and “year_migrate,” not “birthyear,” “year_grad,” and “yearofmigration.” Focus on doing one thing in each step; group similar steps together in your “do” file. Any other recommendations from personal experience?
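
A short sketch of both comment styles and of one practical payoff of consistent naming: variables that share a prefix can be referred to together with a wildcard. The data values are made up for illustration.

```stata
* A single-line comment starts with an asterisk

/* A block comment can run
   over several lines */

clear
input id year_birth year_grad year_migrate
1 1990 2012 2015
2 1985 2007 2010
end

* Consistent names let one wildcard pick up all three year variables
describe year_*
summarize year_*
```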

Suggestions for future topics?
1/24 – Stata overview
1/31 – xi regression command vs. creating all FE variables
2/7 –
2/14 –
2/21 –
2/28 –
3/7 – export tables/regressions/charts into Excel
3/28 –
4/4 –
4/11 –
4/18 –

Reference Tools for Learning Stata
The way to learn Stata is by trial and error.
Type “help” + command of interest into Stata
Me! (mkdiliberti@gwu.edu)
GW Library Event (https://library.gwu.edu/news-events/events/introduction-stata)
Links at the bottom of these slides
Online resources:
https://stats.idre.ucla.edu/stata/modules/
https://www.stata.com/support/faqs/data-management/true-and-false/
https://www.youtube.com/user/statacorp/videos
https://www.stata.com/links/resources-for-learning-stata/
Thank you to Anne Kruse; I adapted this PowerPoint from one she created last year!