Issues in Estimation Data Generating Process:


1 Issues in Estimation Data Generating Process:
What behavior and what sampling process generated the data that you have collected?

2 Estimation Are you gathering a random sample of all possible participants (e.g. telephone or mail survey of population)? Or, are you sampling on site?

3 1. Censored Samples
If you sample a population of potential participants, you will find that some took trips to the site of interest and some (many?) took no trips.
[Figure: plot of trip cost against number of trips for all observations in a hypothetical sample; non-participants lie at 0 trips.]

4 Here’s a hypothetical data set and actual least squares regression lines
[Figure: least squares line including zeros and least squares line excluding zeros]
Which, if either, is right? Answer: Neither

5 Censored Samples – empirical models to analyze them:
Tobit model – Assumes an underlying latent variable that could be negative
Count models – Recognize that trips are non-negative integers
Sample selection models – Model the participation decision differently from the trips decision

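6 Tobit Model
Underlying model, with latent variable $z_i^*$: $z_i^* = \beta_{0i} + \beta_1 c_i + \varepsilon_i$, where $c_i$ is the cost of access. But we observe only $z_i$:
$z_i = z_i^*$, if $z_i^* > 0$
$z_i = 0$, if $z_i^* \le 0$
(To cut down on notation, $\beta_{0i}$ stands for the intercept and all other covariates that might be in the model, so it varies over individuals.)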
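The censoring rule can be illustrated with a small simulation. This is a minimal sketch, assuming normal errors and made-up parameter values; the function name and all numbers are hypothetical, not taken from the slides.

```python
import random

def simulate_censored_trips(costs, beta0, beta1, sigma, seed=0):
    """Draw latent demand z* = beta0 + beta1*cost + eps and censor it at zero."""
    rng = random.Random(seed)
    observed = []
    for cost in costs:
        latent = beta0 + beta1 * cost + rng.gauss(0.0, sigma)  # z*, may be negative
        observed.append(max(latent, 0.0))                      # z = z* if z* > 0, else 0
    return observed

# Hypothetical access costs and parameters (assumptions for illustration only)
trips = simulate_censored_trips(costs=[5, 10, 20, 40, 60],
                                beta0=10.0, beta1=-0.3, sigma=3.0)
```

High-cost individuals tend to have negative latent demand, so they show up as zeros in the observed data.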

7 Estimation by Maximum Likelihood
Every observation makes a contribution to the likelihood function. Contribution by non-trip takers: $\Pr(z_i^* \le 0) = \Pr(\varepsilon_i \le -x_i\beta) = F(-x_i\beta)$, where F is the cumulative distribution function for $\varepsilon$; the x's are the explanatory variables in the model, including cost of access.

8 Contribution by trip takers:
$f(z_i - x_i\beta)$, where f is the density function of $\varepsilon$. Note: this is the same expression as for ordinary least squares.

9 Tobit – maximize the following likelihood function
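Likelihood function equals: $L = \prod_{i \in T} f(z_i - x_i\beta) \prod_{i \in N} F(-x_i\beta)$, where f and F are the density and cumulative distribution functions of $\varepsilon$, T is the set of trip takers and N is the set of non-trip takers.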
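A minimal sketch of this likelihood in log form, assuming normally distributed errors and a single regressor (access cost). The function names and the data in the example call are hypothetical; a real application would pass this to a numerical optimizer or use a packaged Tobit routine.

```python
import math

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def tobit_loglik(trips, costs, beta0, beta1, sigma):
    """Log-likelihood of a Tobit censored at zero, assuming normal errors."""
    total = 0.0
    for z, c in zip(trips, costs):
        mean = beta0 + beta1 * c
        if z > 0:
            # trip takers: density of the error evaluated at z - mean
            total += math.log(normal_pdf((z - mean) / sigma) / sigma)
        else:
            # non-trip takers: Pr(z* <= 0) = Phi(-mean / sigma)
            total += math.log(normal_cdf(-mean / sigma))
    return total

# Hypothetical data: two non-participants and two trip takers
loglik = tobit_loglik(trips=[0, 0, 3, 5], costs=[40, 30, 15, 5],
                      beta0=13.81, beta1=-0.72, sigma=3.0)
```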

10 For our simple example:
Ordinary Least Squares estimates: $\beta_0 = 8.89$, $\beta_1 = -0.28$, $\beta_2 = 2.4$
Tobit estimates: $\beta_0 = 13.81$, $\beta_1 = -0.72$, $\beta_2 = 2.3$
[Figure: fitted OLS and Tobit lines]

11 How do we get welfare measures in the Tobit?
The Tobit is usually estimated in linear form. The area behind a linear demand function is given by: $CS_i = -\dfrac{z_i^2}{2\beta_1}$
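A short derivation sketch of this area, assuming the linear demand $z = \beta_{0i} + \beta_1 c$ with $\beta_1 < 0$. The symbols are reconstructed from the Tobit slide, and the choke price $\bar{c}$ is notation introduced here for illustration.

```latex
\begin{align*}
\bar{c} &= -\frac{\beta_{0i}}{\beta_1}
  && \text{(choke price, where demand falls to zero)} \\
\bar{c} - c_i &= -\frac{\beta_{0i} + \beta_1 c_i}{\beta_1} = -\frac{z_i}{\beta_1} \\
CS_i &= \tfrac{1}{2}\, z_i\, (\bar{c} - c_i) = -\frac{z_i^2}{2\beta_1}
  && \text{(area of the triangle above } c_i \text{)}
\end{align*}
```

Because $\beta_1$ is negative, the surplus is positive and rises with the square of trips.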

12 But how do you evaluate this expression?
Use $\hat{\beta}_1$ as the estimate for $\beta_1$. But what do you use for $z_i$? Do you use the individual's actual number of trips? Or do you use the predicted number of trips using the model?
[Figure: estimated function, with $c_i$ and $z_i$ marked]

13 If you want to use the predicted number of trips...
You must calculate the expected value of trips in the Tobit framework, which is a somewhat complicated expression. Fortunately, LIMDEP* will do this for you in a simple command. You should know that expected trips will always be positive in the Tobit. *LIMDEP is a software package by William Greene, New York University
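For reference, the expected value has a closed form when the errors are normal: $E[z] = \Phi(\mu/\sigma)\,\mu + \sigma\,\phi(\mu/\sigma)$, where $\mu$ is the latent mean. The sketch below assumes normality and uses a hypothetical function name; it is not the LIMDEP command.

```python
import math

def tobit_expected_trips(mean, sigma):
    """E[z] for z = max(z*, 0) with z* ~ N(mean, sigma^2)."""
    u = mean / sigma
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # phi(mean/sigma)
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))          # Phi(mean/sigma)
    return cdf * mean + sigma * pdf

# Even with a negative latent mean, expected trips stay positive
expected = tobit_expected_trips(mean=-2.0, sigma=1.0)
```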

14 The answers can be quite different…
but the choice is not obvious. In our simple example, the difference isn't great.
Ave. trips: [missing] (using actual z), [missing] (using predicted z)
Ave. consumer surplus: $[missing] (using actual z), $13.93 (using predicted z)
Total CS for sample: $[missing] (using actual z), $417.90 (using predicted z)
Difference in average consumer surplus is due to nonlinearity of consumer surplus in trips.
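The effect of that nonlinearity can be seen in a small sketch. It assumes the linear-demand surplus $-z^2/(2\beta_1)$ and uses the Tobit slope of -.72 quoted in the example. The trip counts are made up so that actual and predicted trips share the same mean.

```python
def linear_cs(trips, beta1):
    """Consumer surplus behind a linear demand curve: -z^2 / (2*beta1), beta1 < 0."""
    return -trips ** 2 / (2.0 * beta1)

beta1 = -0.72                  # Tobit slope estimate quoted in the example
actual = [0, 0, 2, 4, 9]       # hypothetical actual trips (mean 3)
predicted = [1, 2, 3, 4, 5]    # hypothetical predicted trips (mean 3)

avg_cs_actual = sum(linear_cs(z, beta1) for z in actual) / len(actual)
avg_cs_predicted = sum(linear_cs(z, beta1) for z in predicted) / len(predicted)
```

Although both series average three trips, the more dispersed actual counts give a larger average surplus, because surplus is convex in trips.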

15 Reasons for using one rather than another…
Use the expected value of trips, if you think the dominant source of “error” is from measurement. Use the actual number of trips, if you think the dominant source of “error” is from specification. (Note: in the Tobit, the predicted number of trips is never zero.)

16 Getting an estimate for the population
If your sample is a random sample of the population: average CS * population

17 Count Models The Tobit assumes an underlying latent variable that can take on negative values. Count models explicitly account for the fact that the dependent variable, trips, can only be an integer and can only be non-negative.

18 Count Models.. …specify that the quantity demanded of trips is a non-negative random variable whose mean is a function of the exogenous regressors in the model.

19 The Poisson Distribution is a common choice
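$\Pr(Z_i = z_i) = \dfrac{e^{-\lambda_i}\lambda_i^{z_i}}{z_i!}$, for $z_i = 0, 1, 2, \ldots$
where the mean is $\lambda_i$, and it is usually modeled as $\lambda_i = \exp(\beta_{0i} + \beta_1 c_i)$, with $c_i$ the cost of access and $\beta_{0i}$ collecting the intercept and other covariates.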

20 Intuition? The Poisson model implies that the number of trips a person decides to take is a random variable drawn from a distribution that only allows non-negative integers. The distribution can be centered around different non-negative numbers, however, depending on the exogenous variables the individual faces. E.g. A person with a relatively low access cost will face a distribution with a higher mean number of trips.

21 An individual's contribution to the likelihood function in the Poisson is this very complicated looking expression: $\dfrac{\exp(-\lambda_i)\,\lambda_i^{z_i}}{z_i!}$ (Note: 0! is defined mathematically as 1.) Fortunately, LIMDEP will estimate this for you without any hard work on your part.
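In log form the expression is less daunting. A minimal sketch, assuming the mean is $\lambda_i = \exp(\beta_0 + \beta_1 c_i)$ with access cost as the only regressor; the function name is hypothetical.

```python
import math

def poisson_loglik(trips, costs, beta0, beta1):
    """Poisson log-likelihood with log(lambda_i) = beta0 + beta1 * cost_i."""
    total = 0.0
    for z, c in zip(trips, costs):
        lam = math.exp(beta0 + beta1 * c)   # mean trips for this individual
        # log of exp(-lam) * lam**z / z!, using lgamma(z + 1) = log(z!)
        total += -lam + z * math.log(lam) - math.lgamma(z + 1)
    return total
```

An observation with zero trips contributes simply $-\lambda_i$, since $0! = 1$ and $\lambda_i^0 = 1$.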

22 Getting Welfare Measures in the Poisson
The expected number of trips for an individual is the mean of the Poisson distribution for that individual. The mean is $\lambda_i$ in the above expression and is usually specified as a semi-log function of the explanatory variables: $\ln \lambda_i = \beta_{0i} + \beta_1 c_i$

23 We saw earlier that…
the area under a semi-log demand function is given by: $CS_i = -\dfrac{z_i}{\beta_1}$
Because CS is linear in trips for a semi-log function, it does not matter whether you use actual or expected trips. The answer is the same.
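A sketch of where this area comes from, assuming the semi-log demand $\lambda_i(c) = \exp(\beta_{0i} + \beta_1 c)$ with $\beta_1 < 0$ (notation reconstructed from the Poisson slides):

```latex
\begin{align*}
CS_i &= \int_{c_i}^{\infty} \exp(\beta_{0i} + \beta_1 c)\, dc
      = \left[ \frac{\exp(\beta_{0i} + \beta_1 c)}{\beta_1} \right]_{c_i}^{\infty} \\
     &= 0 - \frac{\exp(\beta_{0i} + \beta_1 c_i)}{\beta_1}
      = -\frac{\lambda_i}{\beta_1}
\end{align*}
```

Surplus is proportional to trips, so averaging over individuals gives the same result whether actual or expected trips are used, provided their means agree.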

24 Welfare measures in our simple hypothetical case
Ave. trips: [missing] (using actual z), [missing] (using predicted z)
Ave. consumer surplus: $[missing] (using actual z), $14.90 (using predicted z)
Total CS for sample: $[missing] (using actual z), $447.00 (using predicted z)
The Poisson has the property that the mean of expected trips = mean of actual trips. The formula for consumer surplus in a semi-log function is linear in trips. THEREFORE, it does not matter in this model whether you use expected or actual trips.

25 Another Popular Count Model
The negative binomial distribution is also used often. It is a more general distribution than the Poisson, in that it does not constrain the mean and the variance to be equal. See LIMDEP if you wish to estimate this model.
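In one common parameterization (often labeled NB2), the extra flexibility shows up as a dispersion parameter. The symbol $\alpha$ is introduced here for illustration and is not named in the slides.

```latex
\operatorname{E}[z_i] = \lambda_i, \qquad
\operatorname{Var}[z_i] = \lambda_i + \alpha \lambda_i^2, \qquad \alpha \ge 0
```

As $\alpha \to 0$ the variance collapses to the mean and the model reduces to the Poisson.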

26 Participation vs Demand for Trips
In the above models, the same model determines both whether or not a person is a user and how many trips a user takes. Suppose instead that different factors affected
whether he used the site, and
how many times he used the site, if he did use the site.
Two types of models (see LIMDEP):
Combination of probit and truncated models (e.g. Cragg)
Selection models (e.g. Heckman)

27 2. Truncated Samples Now suppose you have only collected data from people who actually visit the site. There will be no zeros in this dataset. Do you still need to make econometric adjustments?

28 The answer is “YES” Ordinary least squares assumes that every observation is drawn from a normal distribution with a given variance.

29 Let’s look at data again…
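Remember the model is: $z_i = \beta_{0i} + \beta_1 c_i + \varepsilon_i$. OLS assumes that $\varepsilon_i \sim N(0, \sigma^2)$.
[Figure: plot of trip cost against number of trips, starting at 0 trips. Labels: "Result of running OLS regression", "Relationship you want", "Distribution is truncated for obs near access".]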

30 OLS applied to truncated data
produces biased slope estimates if truncation is “relevant”. The bias will generate a larger negative estimate for the slope of the line in the graph, which is really a smaller (in absolute value) negative estimate for $\beta_1$. Since $-\beta_1$ is in the denominator of the consumer surplus formula, the result will be an over-estimate of consumer surplus.

31 Contribution to the Likelihood Function in the Truncated Model
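$\Pr(\text{trips} = z_i \mid \text{trips} > 0) = \dfrac{f(z_i - x_i\beta)}{1 - F(-x_i\beta)}$, where f and F are the density and cumulative distribution functions of $\varepsilon$.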
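A minimal sketch of the corresponding log-likelihood, assuming normal errors and a single regressor (access cost). Each observation's density is divided by the probability that latent trips are positive. The function name is hypothetical.

```python
import math

def truncated_loglik(trips, costs, beta0, beta1, sigma):
    """Log-likelihood for a normal regression truncated at zero (all trips > 0)."""
    total = 0.0
    for z, c in zip(trips, costs):
        mean = beta0 + beta1 * c
        u = (z - mean) / sigma
        # log of the normal density of z around its mean
        log_density = -0.5 * u * u - math.log(sigma * math.sqrt(2.0 * math.pi))
        # Pr(z* > 0) = Phi(mean / sigma), the truncation adjustment
        prob_positive = 0.5 * (1.0 + math.erf(mean / (sigma * math.sqrt(2.0))))
        total += log_density - math.log(prob_positive)
    return total
```

Dropping the `math.log(prob_positive)` term gives back the ordinary least squares likelihood, which is why OLS is only appropriate when truncation is negligible.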

32 The difference between the OLS and Truncated estimated relationship for our simple hypothetical data
[Figure: OLS regression line and truncated regression line fitted to the hypothetical data]

33 Oh no, another problem! The reason you have only non-zero observations for trips is probably because you sampled on site. On-site sampling is often the only practical way to get enough information on users of a site.

34 But this, too, causes problems!
If you randomly sample on-site, you are actually randomly sampling trips instead of trip-takers. This is not a random sample of users of the site. The problem is called “endogenous stratification”.

35 A simple example.. Suppose there are only two types of users:
25 users take 1 trip to site
75 users take 2 trips to site
Total number of trips taken = 175. Average number of trips taken = 1.75.
Now, suppose you randomly sample trips (not users).
Prob. of encountering a 1-trip user = 25/175 = .14 (rather than .25)
Prob. of encountering a 2-trip user = 150/175 = .86 (rather than .75)
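The arithmetic can be checked with a few lines. Note that the 75 two-trip users account for 150 of the 175 trips, which is what drives their on-site share. Variable names are illustrative.

```python
users_by_trips = {1: 25, 2: 75}   # trips per user -> number of users

total_users = sum(users_by_trips.values())
total_trips = sum(t * n for t, n in users_by_trips.items())

# Share of each user type in the population of users
population_share = {t: n / total_users for t, n in users_by_trips.items()}

# Share of each user type among randomly sampled trips (on-site sample)
onsite_share = {t: t * n / total_trips for t, n in users_by_trips.items()}
```

Frequent visitors are over-represented on site in proportion to the number of trips they take.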

36 Parameter estimates for our little sample:
A solution to endogenous stratification is to weight each observation by 1/trips.
[Table: parameter estimates]
*Note: for many problems the truncated model does not converge in estimation.

37 A Better and Easier Alternative
Poisson Count Model:
Easy to estimate with truncation.
Easy to estimate with truncation and endogenous stratification.
“It turns out that”… you can solve both the truncation and the endogenous stratification problem by estimating the regular Poisson with the value $z_i - 1$ substituted for $z_i$ in estimation.
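One way to see why the substitution works: on site, a person with z trips is sampled with probability proportional to z, and $z \cdot e^{-\lambda}\lambda^{z}/(z!\,\lambda) = e^{-\lambda}\lambda^{z-1}/(z-1)!$, which is the ordinary Poisson probability of $z - 1$. The sketch below fits a Poisson by Newton-Raphson to $z - 1$. The fitting routine, the data, and all names are hypothetical illustrations, not the LIMDEP procedure.

```python
import math

def fit_poisson(counts, costs, iterations=50):
    """Fit log(lambda_i) = b0 + b1 * cost_i by Newton-Raphson; returns (b0, b1)."""
    mean_count = sum(counts) / len(counts)
    b0, b1 = math.log(mean_count), 0.0      # start at the intercept-only fit
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, c in zip(counts, costs):
            lam = math.exp(b0 + b1 * c)
            g0 += y - lam                   # score with respect to b0
            g1 += (y - lam) * c             # score with respect to b1
            h00 += lam                      # entries of the negative Hessian
            h01 += lam * c
            h11 += lam * c * c
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: inverse Hessian times score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

onsite_trips = [6, 5, 4, 4, 3, 2, 2, 1, 1, 1]         # hypothetical on-site sample
trip_costs = [5, 8, 10, 12, 15, 20, 22, 30, 35, 40]   # hypothetical access costs

# Correction for truncation and endogenous stratification: use z - 1 in place of z
b0, b1 = fit_poisson([z - 1 for z in onsite_trips], trip_costs)
```

The fitted $\exp(b_0 + b_1 c)$ is then read as mean trips for a user in the population, not for a person encountered on site.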

38 Poisson Endogenous Stratification Results and Welfare Estimates
*Note: Remember that this is basically a semi-log demand function so the parameters are not directly comparable to the parameters in the previous models.

39 Welfare Calculation Average WTP estimate for elimination of site
Note: must also be adjusted for endogenous stratification.
Mean number of trips = $N \Big/ \sum_{n=1}^{N} \dfrac{1}{z_n}$
N = number of individuals sampled
$z_n$ = number of trips taken by individual n
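A minimal sketch of this adjustment, assuming each on-site observation is weighted by 1/trips, so that the weighted mean reduces to N divided by the sum of $1/z_n$. The sample reuses the earlier two-type example, where on-site shares are 1/7 and 6/7. The function name is hypothetical.

```python
def stratification_adjusted_mean(trips):
    """Mean trips per user from an on-site sample, weighting each observation by 1/trips."""
    weights = [1.0 / z for z in trips]
    return sum(w * z for w, z in zip(weights, trips)) / sum(weights)

onsite_sample = [1] * 1 + [2] * 6      # one 1-trip user for every six 2-trip users

naive_mean = sum(onsite_sample) / len(onsite_sample)           # overstates trips per user
adjusted_mean = stratification_adjusted_mean(onsite_sample)    # recovers 1.75
```

The adjusted figure matches the population average of 1.75 trips, while the unweighted on-site mean is higher.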

