An Introduction into Stata I Prof. Dr. Herbert Brücker University of Bamberg Seminar “Migration and the Labour Market” Session 3, June 9, 2011.


Contents
1 Introduction into the workplan
2 Introduction into the dataset
3 Introduction into STATA I
- Overview on working with STATA
- Menus and editors: general editor, data editor, do-file editor
- The grammar of STATA commands: loading data, describing data, graphs
- Working with do-files

1 Workplan
Form four teams of 4-5 students each. Each team writes a paper with the following parts:
- Introduction and outline of the research question
- Review of the literature on the labour market effects of migration (3-5 pages)
- Description of the dataset: data sources and caveats, descriptive statistics and graphs
- Presentation of the empirical model
- Presentation and discussion of the regression results
- Conclusions
Finally, each team presents its paper in class.

2 The dataset: general information
The IAB employment sample (IABS):
- 2% random sample of all employees obliged to pay social security contributions and of recipients of unemployment benefits (e.g. under SGB II and III)
- Precise information on wages and unemployment spells
- Information on education and work experience
- Period: 1974-2004 (now available until 2008). Here we use 1980-2004, since the information at the beginning of the sample period is less reliable.
- Focus on Western Germany, excluding (West) Berlin because of unification

2 The dataset: Caveats I
- Foreigners are identified by nationality. We use the nationality of the first spell to control for naturalisations.
- The immigration of ethnic Germans (Spätaussiedler) is hard to identify. We try to identify them via programme participation.
- No civil servants ("Beamte") and no self-employed are included. There is nothing we can do about this.
- Wages are censored at the contribution ceiling of the statutory pension insurance (66,000 Euros). We impute wages above this threshold.

2 The dataset: Caveats II
- Education information is missing for 17% of the observations overall, and for about 35 per cent of foreigners. We impute the missing education information.
- We only have daily wages, not hourly wages. We therefore exclude all part-time workers.
- See Brücker/Jahn (2011), data section, and the description of the dataset by the FDZ (Research Data Centre) at the IAB.

2 The dataset: Organisation
- We distinguish 25 years (1980-2004).
- We distinguish 64 labour market cells by education (4 groups), work experience (8 groups) and nationality (2 groups): 4 x 8 x 2 = 64.
- We use the following indexes: h = native (German), f = foreigner, q = education, k = work experience, t = time.
- Note that the dataset also contains aggregates (e.g. wt, wqt, wqkt), not only the cell values whqkt and wfqkt.
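To see how the 4 x 8 x 2 structure and the 25 years add up, the following do-file sketch rebuilds an empty cell grid from scratch. It is a minimal illustration, assuming integer codes q (education, 1-4), k (experience, 1-8), a hypothetical nationality code nat (1 = native h, 2 = foreigner f) and t for the years 1980-2004; these names follow the indexes above but are not necessarily the variable names in the seminar file. Note that clear discards any data in memory.

```stata
* Hypothetical sketch: enumerate the education x experience x nationality cells
clear
set obs 4
* education groups q = 1..4
generate q = _n
expand 8
* experience groups k = 1..8 within each education group
bysort q: generate k = _n
expand 2
* nationality: 1 = native (h), 2 = foreigner (f)
bysort q k: generate nat = _n
* 4 x 8 x 2 = 64 cells
count
expand 25
* years t = 1980..2004 within each cell
bysort q k nat: generate t = 1979 + _n
* 64 cells x 25 years = 1600 rows
count
* each cell-year must appear exactly once
isid q k nat t
```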

General overview of STATA
The STATA desktop is divided into four parts:
1. Review: shows the commands you have executed
2. Results: shows the results of your commands
3. Variables: the current list of variables in the dataset
4. Command: here you type your commands

Review window: Lists your previous commands

Result window: Shows outcome of your current command

Variable window: Shows variables of your dataset

Command window: Here you can type your commands

STATA has the following menus/editors you can work with:
1. The desktop menu: you can run all commands here.
2. The data editor: here you can edit the data you have loaded.
3. The data browser: here you can browse the data you have loaded, but not edit them.
4. The do-file editor: a do-file is a text file in which you can write and execute all types of commands. This is very useful for replicating and keeping a record of what you have done. We come back to this below.

The Data Editor: you can change each cell by hand. The Data Browser looks similar, but you cannot edit the data.

The Do-File Editor: you can type and execute your commands there. Lines beginning with a star (*) are comments and are not executed as commands, e.g. * Note that …

The Grammar of STATA
General structure of a STATA command:
[prefix :] command [varlist] [if] [in] [weight] [, options]
First element: the command. What do you want to do?

[prefix :] command [varlist] [if] [in] [weight] [, options]
First step: how to load data
> use "Filename", clear
Practice:
> use "C:\EigeneDateien\Stata\data1.dta", clear
Alternatively, load the data via the menu: File -> Open -> choose your data file.
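Stata only accepts straight double quotes (") around file names, not typographic ones, and the clear option discards the data currently in memory before the file is read. The self-contained sketch below demonstrates the round trip with a small made-up dataset saved to a temporary file; in practice you would pass your own path to use, as in the practice line above.

```stata
* Hypothetical round trip: save a toy dataset, then load it with -use-
clear
set obs 3
generate x = _n
* a temporary file name stands in for your own path
tempfile demo
save "`demo'"
* the clear option discards the data in memory before the file is read
use "`demo'", clear
describe, short
```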

General structure of STATA
There are two types of variables (data):
- numerical variables, e.g. 0, 1, 501, 0.5, -12
- string variables, e.g. no voc train, male, female
How to deal with the data types:
- Numerical variables: you can apply all mathematical operations, e.g. var1 + var2, var1/var2, var1*var2.
- String variables: string values must be put in (straight) double quotes, e.g. replace var1 = 1 if sex == "female"
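The difference between the two types can be tried out on a toy dataset. The sketch below types in two made-up observations with the input command (the variables sex and wage are hypothetical), builds a 0/1 variable from the string variable, divides the numerical one, and uses confirm to check the storage types.

```stata
* Hypothetical toy data: one string and one numerical variable
clear
input str6 sex wage
"female" 80
"male"   95
end
* string values are compared using straight double quotes
generate female = 0
replace female = 1 if sex == "female"
* numerical variables allow arithmetic
generate wage_k = wage / 1000
* confirm stops with an error if a variable has the wrong type
confirm string variable sex
confirm numeric variable wage female wage_k
```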

The black variables are numerical variables. The red variables are string variables.

[prefix :] command [varlist] [if] [in] [weight] [, options]
Now that you have loaded the data, how do you get an overview of it?
> describe
describe gives general information about the data, such as the number of observations, the number of variables, and the names and labels of the variables.

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to get an overview of your data?
> list
list shows the values of every single observation (e.g. persons, groups, classes) in the dataset. Attention: your dataset might be very large! "-more-" indicates that more output is available: press any key to continue or "q" to quit.
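Because list prints every observation, it is usually combined with an in range. A minimal sketch on a hypothetical four-row dataset that mimics the seminar variables (ed, ex, year, whqkt, wfqkt; the wage values are made up):

```stata
* Hypothetical toy data standing in for the seminar file
clear
input str12 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 55
"no voc train" 1 1981 62 56
"voc train"    1 1980 75 70
"voc train"    1 1981 77 71
end
set more off
* number of observations, number of variables, names, types, labels
describe
* list only the first two observations instead of the whole file
list in 1/2
```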

General structure of STATA
[prefix :] command [varlist] [if] [in] [weight] [, options]
Next element: [varlist]. Which variables are concerned?

[prefix :] command [varlist] [if] [in] [weight] [, options]
[varlist] stands for a list of variables, or a single variable, to which the command applies. [varlist] is set in brackets because it is optional; if no [varlist] is specified, STATA executes the command for all variables.
Practice: to get information only about education and wages in the dataset:
> list ed whqkt
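A varlist can also be written with wildcards, which saves typing when variable names share a prefix. A small sketch with made-up values; w* matches every variable whose name starts with w, here whqkt and wfqkt:

```stata
* Hypothetical toy data
clear
input str12 ed whqkt wfqkt
"no voc train" 60 55
"voc train"    75 70
end
* two named variables
list ed whqkt
* wildcard: all variables whose names start with "w"
list w*, noobs
```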

[prefix :] command [varlist] [if] [in] [weight] [, options]
Further commands to describe the dataset I:
> tabstat gives a table with the mean of the variable(s) (by default)
> codebook describes how a variable is coded, with information on the data type, range, units, unique values, missing values, mean, standard deviation and percentiles
In practice:
> tabstat whqkt wfqkt
> codebook whqkt
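tabstat shows only the mean by default; the statistics() option requests further statistics. A sketch on four made-up wage values (the numbers are illustrative only):

```stata
* Hypothetical toy data
clear
input whqkt wfqkt
60 55
62 56
75 70
77 71
end
* default statistic: the mean
tabstat whqkt wfqkt
* further statistics requested explicitly
tabstat whqkt wfqkt, statistics(n mean sd min max)
codebook whqkt
```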

[prefix :] command [varlist] [if] [in] [weight] [, options]
Further commands to describe the dataset II:
> summarize gives the number of observations, the mean, the standard deviation, the minimum and the maximum of a variable
> tabulate gives a table with the absolute and relative frequencies of the values of a (categorical) variable
In practice:
> sum whqkt wfqkt
> tab ed
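summarize also stores its results in r(), so they can be reused in later commands, and tabulate is best suited to categorical variables such as ed. A sketch with hypothetical values:

```stata
* Hypothetical toy data
clear
input str12 ed whqkt
"no voc train" 60
"no voc train" 62
"voc train"    75
"voc train"    77
end
summarize whqkt
* reuse a stored result
display "mean daily wage: " r(mean)
* frequencies of a categorical variable
tabulate ed
```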

[prefix :] command [varlist] [if] [in] [weight] [, options]
Practice: find out
- how many observations there are
- mean earnings and the mean unemployment rate
- the standard deviation of earnings and of the unemployment rate
- the range of the observations (minimum and maximum wage and unemployment rate)
Note that descriptive statistics already provide interesting information about the data. They help to detect outliers and measurement errors, and to interpret regression results (most results refer to the sample mean).
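The practice questions map directly onto the stored results of summarize: r(N), r(mean), r(sd), r(min) and r(max); the detail option adds percentiles. A sketch on made-up daily wages (for the unemployment rate, use the corresponding variable in your file, whose name is not given here):

```stata
* Hypothetical toy data
clear
input whqkt
60
62
75
77
end
summarize whqkt, detail
display "N = " r(N) ", mean = " r(mean) ", sd = " r(sd)
display "range: " r(min) " to " r(max)
```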

General structure of STATA
[prefix :] command [varlist] [if] [in] [weight] [, options]
Next element: [if]. Under which condition?

[prefix :] command [varlist] [if] [in] [weight] [, options]
With [if] you can set a condition, i.e. restrict the command to a subset of the observations. For example, to get only the average wage of migrants with the lowest education (no vocational training):
> summarize wfqkt if ed == "no voc train"
"no voc train" is a value of the string variable ed (hence the quotation marks) and indicates that a person has no vocational training.
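The condition uses == (test for equality), not =, and the string value must be in straight double quotes. A sketch with made-up foreign wages:

```stata
* Hypothetical toy data
clear
input str12 ed wfqkt
"no voc train" 55
"no voc train" 57
"voc train"    70
end
* == tests equality; a single = would be rejected here
summarize wfqkt if ed == "no voc train"
```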

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to create dummies? A dummy variable takes the value 0 or 1. With STATA you can create new variables from the data, using the commands generate and replace:
> gen ed1 = 0
> replace ed1 = 1 if ed == "no voc train"
Another example:
> gen ex1 = 0
> replace ex1 = 1 if ex == 1
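Because a logical expression in Stata evaluates to 1 (true) or 0 (false), the two-step dummy can also be written in one line, and tabulate with the generate() option creates one dummy per category at once. A sketch on hypothetical data; the names ed1b and exd are illustrative:

```stata
* Hypothetical toy data
clear
input str12 ed ex
"no voc train" 1
"no voc train" 2
"voc train"    1
"voc train"    3
end
* two-step version, as on the slide
generate ed1 = 0
replace ed1 = 1 if ed == "no voc train"
* equivalent one-liner: the comparison evaluates to 1 or 0
generate ed1b = (ed == "no voc train")
* one dummy per experience group: exd1, exd2, exd3
tabulate ex, generate(exd)
```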

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to calculate and transform numerical variables:
> generate newvar = var1 - var2
STATA supports the usual arithmetic operators (+, -, *, /) and mathematical functions such as ln().
Practice: create the log wage:
> generate ln_whqkt = ln(whqkt)
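ln() returns missing (.) for zero, negative or missing arguments, which matters if some cells have no wage observation. A sketch with made-up values:

```stata
* Hypothetical toy data: a positive, a zero and a missing wage
clear
input whqkt
60
0
.
end
generate ln_whqkt = ln(whqkt)
* ln() of zero or of a missing wage gives missing
count if missing(ln_whqkt)
```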

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to modify variables/dummies:
> replace var = (var1 - var2)/2
Practice: replace the wage by the log wage for the low-skilled only:
> replace whqkt = ln(whqkt) if ed == "no voc train"
Note that this overwrites the original wage values of this group.
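replace only works on a variable that already exists (new variables need generate), and it only changes the observations that meet the if condition. To keep the original wage, one option is to work on a copy; in this sketch the name lw is hypothetical and the data are made up:

```stata
* Hypothetical toy data
clear
input str12 ed whqkt
"no voc train" 60
"voc train"    75
end
* copy first, so the original wage is preserved
generate lw = whqkt
* replace changes only the rows that meet the condition
replace lw = ln(whqkt) if ed == "no voc train"
```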

[prefix :] command [varlist] [if] [in] [weight] [, options]
How to create graphs:
> graph twoway line var1 year [if] [in]
With the graph twoway command STATA produces two-dimensional graphs: lines, bars, dots, scatter plots etc. The type of graph is specified after twoway, e.g. line.
Practice: graph the development of native and foreign wages over the years in our sample for a given education and experience group:
> graph twoway line whqkt wfqkt year if ed == "no voc train" & ex == 1
> graph twoway scatter whqkt wfqkt if ed == "no voc train" & ex == 1
The second command plots native wages (vertical axis) against foreign wages (horizontal axis).
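line connects the points in the order in which the observations are stored, so unsorted years produce zigzag lines; the sort option avoids this. The sketch below also labels the two lines with the legend() option; the data are made up and the graph name wages is an arbitrary choice:

```stata
* Hypothetical toy data, deliberately not sorted by year
clear
input str12 ed ex year whqkt wfqkt
"no voc train" 1 1982 64 58
"no voc train" 1 1980 60 55
"no voc train" 1 1981 62 56
"voc train"    1 1980 75 70
end
* sort: connect the points in year order
graph twoway line whqkt wfqkt year if ed == "no voc train" & ex == 1, sort ///
    legend(order(1 "natives" 2 "foreigners")) name(wages, replace)
```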

The do-file
STATA also provides a do-file editor (a text editor) in which commands can be written.
- The do-file editor can be opened with the command doedit, by pressing Ctrl + 8, or by clicking the do-file editor button.
How to execute commands in a do-file?
- Write the commands into the editor, then select them and press Ctrl + D.
- If no text is selected, the whole do-file is executed. This can cause trouble if your list of commands contains a mistake, which is common.

The do-file
Reasons to use a do-file:
- your work is documented and reproducible!
- you can include comments by putting a * at the very beginning of a line (comments are automatically shown in green), e.g.
> *load data
> use "C:\User\...data1.dta", clear
> *get an overview
> describe
- you can save your do-file: File -> Save
- and you can open do-files: File -> Open
- do-files have the extension .do
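Besides *, do-files accept two further comment styles: // starts a comment that runs to the end of the line (also after a command), and /* ... */ encloses a comment that may span several lines. A short do-file sketch with made-up data:

```stata
* a line starting with a star is a comment
// so is everything after two slashes
/* a block comment
   may span several lines */
clear
set obs 5   // an inline comment after a command
generate x = _n * 2
summarize x
```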

This is an example of a do-file. First, I "set more off" and load the data. Second, I use a command for panel regressions. Third, I generate some variables. The comments after the stars explain what I am doing.
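A do-file along these lines might look as follows. It is a sketch only: instead of loading the seminar file it builds a tiny made-up panel, and xtset with xtreg, fe is an assumed choice for "a command for panel regressions", since the slide does not name one. The names cell and lnw are hypothetical.

```stata
* Hypothetical do-file sketch
version 11
* step 1: settings and data (a made-up panel instead of -use-)
set more off
clear
set obs 10
* two hypothetical cells, each observed for five years
generate cell = ceil(_n/5)
bysort cell: generate year = 1979 + _n
generate lnw = 4 + 0.1*cell + 0.02*(year - 1980) + 0.01*mod(_n, 3)
* step 2: panel regression with cell fixed effects
xtset cell year
xtreg lnw year, fe
* step 3: generate further variables
generate lnw_sq = lnw^2
```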

Now I select the lines containing the commands I want to execute. Then I press the execute button.

Next Meeting: June 30, Room RZ 1.03!
