Lecture 15-2 Censored regression

1 Lecture 15-2 Censored regression
Research Method Lecture 15-2 Censored regression

2 Censored regression Sometimes the dependent variable (y-variable) is right-censored (censored from above) or left-censored (censored from below). Example 1: In some household surveys, when the wealth of a family exceeds $500,000, it is recorded as $500,000 even though the actual wealth may be much higher. This is called top coding. Top coding is done to protect the identity of the survey participants. In this case, wealth is right-censored.

3 Top-coding example
[Figure: wealth plotted against the education of the head of the family, with wealth capped at $500,000.]
When wealth exceeds the threshold, say $500,000, the data record it as $500,000

4 Example 2: Duration data
Example 2: Duration data. Suppose that a survey is conducted to measure how long unemployed workers take to find a job. If the survey runs for 12 months, some participants may not have found a job by the time it ends. For those workers, the researcher only knows that the duration is at least 12 months. Thus, the duration is right-censored.

5 When a variable is censored from above, you only know that the variable is at least equal to the threshold value.

6 The censored regression model
Here, I will explain the censored regression model for the case where the dependent variable is right-censored. Let yi be the actual value of the dependent variable for the ith person. When yi exceeds a certain threshold, ci, the data record it as ci. In such a case, the observation is said to be right-censored.

7 Let wi be the observed value of the dependent variable, which may be censored. Then the model is written as:
yi = β0 + β1xi + ui,  ui ~ N(0, σ²)
wi = yi  if yi < ci
wi = ci  if yi ≥ ci
This can also be written as wi = min(yi, ci). In the top-coding example, the threshold value is the same for everyone, at $500,000. But the threshold value can differ across individuals, which is why the threshold carries an i-subscript.
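The censoring rule can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `censor_from_above` and the wealth figures are made up for illustration and do not come from the lecture.

```python
def censor_from_above(y, c):
    """Return the recorded value w = min(y, c) and a right-censoring flag."""
    w = min(y, c)
    censored = y >= c  # same rule as the model: w = c whenever y >= c
    return w, censored


# Hypothetical actual wealth values, top-coded at $500,000 as in Example 1.
threshold = 500_000
actual_wealth = [120_000, 500_000, 730_000]
recorded = [censor_from_above(y, threshold) for y in actual_wealth]
print(recorded)
```

The second and third families are both recorded at the threshold, so the data alone cannot tell them apart.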

8 It should be emphasized that, in the censored regression model, the thresholds ci are known values. For example, in the top-coding example, the threshold is $500,000 for all observations.

9 When the person is not right censored, we have wi=yi
When the person is not right-censored, we have wi=yi, so ui=wi-(β0+β1xi). The likelihood contribution is therefore the height of the density function, which is given by:
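The formula itself is not in the transcript. Assuming the normal error term of the model, ui ~ N(0, σ²), and writing φ for the standard normal density (a symbol introduced here), the contribution of an uncensored observation would be:

```latex
L_i = \frac{1}{\sigma}\,
      \phi\!\left(\frac{w_i - \beta_0 - \beta_1 x_i}{\sigma}\right)
```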

10 If the person is right-censored, we only know that yi≥ci
If the person is right-censored, we only know that yi≥ci. Thus, the likelihood contribution of this person is given by:
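The formula itself is not in the transcript. Assuming ui ~ N(0, σ²) as in the model, and writing Φ for the standard normal distribution function (a symbol introduced here), the probability of being at or above the threshold would be:

```latex
L_i = P(y_i \ge c_i)
    = P\!\left(u_i \ge c_i - \beta_0 - \beta_1 x_i\right)
    = 1 - \Phi\!\left(\frac{c_i - \beta_0 - \beta_1 x_i}{\sigma}\right)
```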

11 To summarize:
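The summary formula is not in the transcript. Under the model's assumption ui ~ N(0, σ²), with φ and Φ denoting the standard normal density and distribution function (symbols introduced here), the two cases can be sketched as:

```latex
L_i =
\begin{cases}
\dfrac{1}{\sigma}\,
\phi\!\left(\dfrac{w_i - \beta_0 - \beta_1 x_i}{\sigma}\right)
  & \text{if the $i$th person is not censored } (w_i = y_i) \\[2ex]
1 - \Phi\!\left(\dfrac{c_i - \beta_0 - \beta_1 x_i}{\sigma}\right)
  & \text{if the $i$th person is right-censored } (w_i = c_i)
\end{cases}
```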

12 Now, let Di be the dummy variable that takes the value 1 if the ith person is right-censored and 0 otherwise. Then the likelihood contribution for the ith person is conveniently written as:
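The formula itself is not in the transcript. Assuming ui ~ N(0, σ²), with φ and Φ the standard normal density and distribution function (symbols introduced here), the single-expression form and the resulting log-likelihood over n observations would be:

```latex
L_i = \left[\frac{1}{\sigma}\,
      \phi\!\left(\frac{w_i - \beta_0 - \beta_1 x_i}{\sigma}\right)\right]^{1 - D_i}
      \left[1 - \Phi\!\left(\frac{c_i - \beta_0 - \beta_1 x_i}{\sigma}\right)\right]^{D_i}

\log L(\beta_0, \beta_1, \sigma) = \sum_{i=1}^{n} \log L_i
```

The exponents simply switch each factor on or off: when Di=0 only the density term remains, and when Di=1 only the tail probability remains.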

13 Note that, computationally, the Tobit model is the same as the censored regression model in which the actual dependent variable is left-censored and ci=0 for all observations.

14 The partial effect In the censored regression model, our interest is in estimating the effect of the x-variable on y, not on w. Since β1 is the partial effect of x on y, you can use β1 directly as the partial effect. No difficult computation is necessary: you can interpret the coefficients as if they were OLS coefficients. This is very different from the Tobit regression model. In the Tobit model, our interest is in estimating the effect of x on w, not on y. That is why the partial effect formula was so complicated in the case of Tobit.

15 Exercise: Duration analysis of recidivism
Recidivism is the act of a person repeating an undesirable behavior. The data set RECID.dta contains the duration (in months) until an inmate who has been released from prison is imprisoned again. A total of 1,445 released inmates were followed for a certain period of time.

16 Among them, 893 had not been arrested again
Among them, 893 had not been arrested again. Thus, the durations for those inmates are right-censored: we only know that the duration until they would return to prison is at least as long as the recorded duration. Now, we want to estimate the determinants of the duration of prisoner recidivism. Although modern duration analysis is mostly conducted using “hazard function analysis” or “survival function analysis”, censored regression is also a valid model for duration analysis.

17 Using RECID.dta, estimate the censored regression model of the duration of recidivism. The explanatory variables are workprog, priors, tserved, felon, alcohol, drugs, black, married, educ, and age. Use the log of duration as the dependent variable.

18 Put the censoring indicator
Specify the censoring indicator. This indicator must be 1 if the observation is right-censored, -1 if left-censored, and 0 if uncensored.
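How such an indicator could drive the likelihood calculation is sketched below in Python, using only the standard library. It is not the software routine used in the lecture: the function name `log_lik_contribution` and its argument names are hypothetical, and the left-censored branch is an assumption made by symmetry with the right-censored case derived earlier.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def log_lik_contribution(w, x, c, d, beta0, beta1, sigma):
    """Log-likelihood contribution of one observation.

    d is the censoring indicator: 1 = right-censored, -1 = left-censored,
    0 = uncensored. c is the known threshold (ignored when d == 0).
    """
    mean = beta0 + beta1 * x
    if d == 0:
        # Uncensored: log of the normal density at the observed value.
        z = (w - mean) / sigma
        return math.log(_STD_NORMAL.pdf(z)) - math.log(sigma)
    z = (c - mean) / sigma
    if d == 1:
        # Right-censored: log P(y >= c) = log(1 - Phi(z)) = log Phi(-z).
        return math.log(_STD_NORMAL.cdf(-z))
    if d == -1:
        # Left-censored (assumed by symmetry): log P(y <= c) = log Phi(z).
        return math.log(_STD_NORMAL.cdf(z))
    raise ValueError("d must be 1, -1, or 0")
```

Summing this function over all observations gives the log-likelihood to be maximized over beta0, beta1, and sigma. For thresholds very far in the tail the probability can underflow to zero, so a production routine would need a more careful log-probability calculation.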

