Published by Marc Conquest, modified over 2 years ago

1
Exercise session #1: Random data generation
Jan Matuska
November 2006
Labor Economics

2
Overview:
- Graphing
- Generating random variables
- Generating random dummy variables from a sample
- Drawing from multivariate distributions
- Throwing seeds
- Loops and the distribution of estimated coefficients

3
Graphing
a) Histograms
hist z2, den - histogram of variable z2 (density scale)
hist z2, freq - histogram of variable z2 (frequency scale)
dotplot z2 z3 - dot plots of the distributions of both variables, side by side
kdensity z2 - computes a kernel density estimate of z2 and graphs the result
b) Sample cdfs of variables
cumul z3, gen(cz3) - generates variable cz3 containing the sample cdf values of z3
line cz3 z3, sort - graphs the sample cdf (alternatively: scatter cz3 z3, sort)
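A self-contained sketch of these graphing commands, assuming Stata 10.1 or later (needed for rnormal()). The variable z2 is simulated here only so there is something to plot, and the seed value is arbitrary. The normal option of histogram and kdensity overlays a normal density for comparison.

```stata
* Hypothetical data: 500 standard normal draws, used only to have something to graph
clear
set obs 500
set seed 12345                       // arbitrary seed, for reproducibility
gen z2 = rnormal()                   // rnormal() needs Stata 10.1 or later
histogram z2, density normal         // density histogram with a normal curve overlaid
kdensity z2, normal                  // kernel density estimate, normal density overlaid
cumul z2, gen(cz2)                   // cz2 = share of observations with z2 <= this value
line cz2 z2, sort                    // graph the sample cdf
```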

4
Generating random variables 1
500 draws from the uniform distribution on [0,1]:
set obs 500
gen x1 = uniform()
500 draws from the standard normal distribution N(0,1) (mean 0, variance 1):
gen x2 = invnorm(uniform())
500 draws from the normal distribution with mean 1 and standard deviation 4, i.e. N(1,16) in mean-variance notation:
gen x3 = 1 + 4*invnorm(uniform())
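In Stata 10.1 and later the same draws can be written with runiform() and rnormal(m, s), where the second argument of rnormal() is the standard deviation, not the variance. The sketch below (seed arbitrary, variable x3v2 hypothetical) also shows how a variable with mean 1 and variance 2 would be drawn, and uses summarize to compare the sample moments with their targets.

```stata
clear
set obs 500
set seed 2006                        // arbitrary seed
gen x1 = runiform()                  // uniform on the unit interval
gen x2 = rnormal()                   // N(0,1)
gen x3 = rnormal(1, 4)               // mean 1, sd 4: same distribution as 1 + 4*invnorm(uniform())
gen x3v2 = rnormal(1, sqrt(2))       // mean 1, variance 2, i.e. N(1,2) in mean-variance notation
summarize x1 x2 x3 x3v2              // sample means and sds should be near the targets
```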

5
Generating random variables 2
500 draws from the uniform distribution between 3 and 12:
set obs 500
gen x4 = 3 + 9*uniform()
Compute 500 values of z = 4 - 3*x4 + 8*x2:
gen z = 4 - 3*x4 + 8*x2
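The theoretical moments of z follow from x4 ~ U[3,12] and x2 ~ N(0,1) being independent: E[x4] = (3 + 12)/2 = 7.5 and Var(x4) = 9^2/12 = 6.75, so E[z] = 4 - 3*7.5 = -18.5 and Var(z) = 3^2*6.75 + 8^2*1 = 124.75. A minimal check against a simulated sample, assuming Stata 10.1 or later and an arbitrary seed:

```stata
clear
set obs 500
set seed 2006                        // arbitrary seed
gen x2 = rnormal()                   // N(0,1), as on the previous slide
gen x4 = 3 + 9*runiform()            // U[3,12]: mean 7.5, variance 81/12 = 6.75
gen z  = 4 - 3*x4 + 8*x2
* Theory: E[z] = 4 - 3*7.5 = -18.5
*         Var(z) = 9*6.75 + 64*1 = 124.75 (x4 and x2 are independent)
summarize z
display "sample variance of z: " r(Var)
```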

6
Generating random dummy variables from sample
set obs 1000 - create data for 1000 individuals
gen smoke = uniform()>.7 - smoke = 1 if the expression is true (uniform() > 0.7) and smoke = 0 if it is false (uniform() <= 0.7)
Because uniform() > 0.7 holds with probability 0.3, this gives each individual a 30% chance of smoking (at time 1); for a 70% chance, use gen smoke = uniform()<.7
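A short sketch that checks the share of smokers produced by each expression, assuming Stata 10.1 or later; the variable names smoke30 and smoke70 are hypothetical and the seed is arbitrary. The mean of a 0/1 dummy is the sample share of 1s, so summarize reports the simulated smoking rate directly.

```stata
clear
set obs 1000                         // 1000 individuals
set seed 2006                        // arbitrary seed
gen smoke30 = runiform() > .7        // 1 with probability 0.3 (the slide's expression)
gen smoke70 = runiform() < .7        // 1 with probability 0.7
tabulate smoke70                     // share of 1s should be close to 70%
summarize smoke30 smoke70            // the mean of a 0/1 dummy is the share of 1s
```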

7
Drawing from multivariate distributions
clear
mat m = (12,20,0) - vector of means of the RHS variables y2, y3 and the error e
mat c = (5,-.6,0 \ -.6,119,0 \ 0,0,.1) - covariance matrix of the same variables
drawnorm y2 y3 e, n(1000) means(m) cov(c) - draws a sample of 1000 observations from a multivariate normal distribution with the specified means and covariances
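One way to confirm that drawnorm delivered what was requested is to compare sample moments with m and c. This sketch repeats the slide's commands with an arbitrary seed and adds summarize and correlate with the covariance option; with 1000 observations the sample values should be close to, but not exactly equal to, the targets.

```stata
clear
matrix m = (12, 20, 0)                           // means of y2, y3, e
matrix c = (5, -.6, 0 \ -.6, 119, 0 \ 0, 0, .1)  // covariance matrix
set seed 2006                                    // arbitrary seed
drawnorm y2 y3 e, n(1000) means(m) cov(c)
summarize y2 y3 e                                // means near 12, 20, 0
correlate y2 y3 e, covariance                    // sample covariances near c
```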

8
Throwing seeds
Setting the seed allows you to regenerate exactly the same sample at any later time.
clear
set obs 50
set seed 2 - the seed can be any integer from 0 to 2,147,483,647 (2^31 - 1); Stata's default seed is 123456789
gen z1 = invnorm(uniform())
set seed 2
gen z2 = invnorm(uniform())
set seed 4567803
gen z3 = invnorm(uniform())
dotplot z1 z2 z3 - shows that z1 and z2 are identical and differ from z3
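Instead of comparing the dot plots by eye, the equality can be checked with assert, which stops with an error if the condition fails for any observation. This sketch uses rnormal() (Stata 10.1 or later) in place of invnorm(uniform()); the same reasoning about seeds applies to both.

```stata
clear
set obs 50
set seed 2
gen z1 = rnormal()
set seed 2                           // same seed again: the sequence restarts
gen z2 = rnormal()
set seed 4567803                     // different seed: different draws
gen z3 = rnormal()
assert z1 == z2                      // stops with an error if any pair differs
count if z1 == z3                    // expected to be 0
```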

9
Loops and distribution of estimated coefficients
Loop (i is the counter and must be initialized before the loop):
local i = 1
while `i' <= 500 {
    "commands"
    local i = `i' + 1
}
reg z x1 x2 - regress fits a linear regression (OLS) of the dependent variable z on the other listed variables
Drawing a new sample and running the regression inside the loop yields many estimated coefficients b1, each differing from the true coefficient because each is based on a different random sample. If the estimator is unbiased, the mean of all these estimates should be a close approximation of the true coefficient.
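A minimal Monte Carlo sketch of this idea, offered as an illustration under stated assumptions rather than the session's own do-file. It assumes a true model y = 4 - 3*x4 + 8*x2 + e with e ~ N(0,1); the variable y and the error term are hypothetical additions, needed because z on the earlier slide has no error term, so OLS would recover its coefficients exactly. It also assumes samples of 100 observations and 500 replications. It uses forvalues, Stata's compact counter loop, instead of while, and postfile/post to store one coefficient pair per replication; mc_results is a hypothetical file name.

```stata
clear all
set seed 2006                                  // arbitrary seed
tempname sim
postfile `sim' b_x4 b_x2 using mc_results, replace
forvalues i = 1/500 {
    quietly {
        clear
        set obs 100                            // hypothetical sample size
        gen x2 = rnormal()
        gen x4 = 3 + 9*runiform()
        gen y  = 4 - 3*x4 + 8*x2 + rnormal()   // hypothetical true model with N(0,1) error
        regress y x4 x2
    }
    post `sim' (_b[x4]) (_b[x2])               // store this replication's estimates
}
postclose `sim'
use mc_results, clear
summarize b_x4 b_x2                            // means should be close to -3 and 8
```

Each row of mc_results holds one replication, so histogram b_x4 shows the sampling distribution of the estimated coefficient.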

10
Thank you for your attention
