Econ 616 – Spring 2006: Qualitative Response Regression Models. Presented by Yan Hu, 04/19/2006.



2 Outline
- Qualitative Response Regression Model
- Binary Response Regression Models
  1. The Linear Probability Model (LPM)
  2. The Logit Model
  3. The Probit Model

3 What Is a Qualitative Response Regression Model?
- The dependent variable is qualitative (a dummy variable) in nature.
--- Binary, or dichotomous, response variable: Y=1 if the person is in the labor force and Y=0 if he or she is not.
--- Trichotomous response variable.
--- Polychotomous (multiple-category) response variable.

4 Binary Response Regression Models
- E(Y) is related to the X's through a link function g: \( g(E(Y)) = X\beta \).
- In binary regression, the link function specifies the relationship between E(Y) (the probability that Y=1, which is also the expected value of Y) and a linear composite score of the X's.

5 Three Binary Response Regression Models
- The Linear Probability Model (LPM)
- The Logit Model
- The Probit Model

6 What Is the Linear Probability Model?
- Y follows the Bernoulli probability distribution:

  Y_i      Probability
  0        1 - P
  1        P
  Total    1

- Link function (identity): \( E(Y) = 0(1-P) + 1(P) = P \)
- Expression for the LPM: \( P = X\beta \)

7 Problems of LPM (1)
1. Non-normality of the disturbances:
- Like Y_i, the disturbance \( u_i = Y_i - X_i\beta \) takes only two values, so it follows a Bernoulli-type distribution rather than a normal one:

  Case       u_i                  Probability
  Y_i = 1    \( 1 - X_i\beta \)   P_i
  Y_i = 0    \( -X_i\beta \)      1 - P_i

- The problem may not be critical. If the objective is point estimation, the normality assumption is not necessary and the OLS estimators remain unbiased. As the sample size increases indefinitely, the OLS estimators tend to be normally distributed.

8 Problems of LPM (2)
2. Heteroscedastic variances of the disturbances:
- \( \mathrm{Var}(u_i) = P_i(1-P_i) \), so the variance is a function of the mean P_i.
- One way to remove the heteroscedasticity is to transform the model by dividing both sides by \( \sqrt{w_i} \), where \( w_i = P_i(1-P_i) \), and then estimate the transformed equation by OLS. Since the true P_i are unknown, the weights have to be estimated in practice.

9 Problems of LPM (3)
3. Nonfulfillment of \( 0 \le E(Y_i \mid X_i) \le 1 \)
- Two ways of dealing with estimated \( \hat{Y}_i \) that may not lie between 0 and 1:
  1. Estimate the LPM by the usual OLS method. If some \( \hat{Y}_i \) are less than 0, they are assumed to be 0 for those cases; if they are greater than 1, they are assumed to be 1.
  2. Devise an estimating technique that guarantees that the estimated conditional probabilities lie between 0 and 1, such as the logit and probit models.
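To make the out-of-range problem concrete, here is a minimal Python sketch (not part of the original slides) that fits the LPM by simple OLS to the nine-family home-ownership data listed on slide 14 and then applies option 1, clamping the fitted values into [0, 1]. The variable names are illustrative assumptions.

```python
# Home-ownership data from slide 14: X = family income, Y = 1 if the family owns a house.
income = [8, 16, 18, 11, 12, 19, 20, 13, 9]
owns = [0, 1, 1, 0, 0, 1, 1, 0, 0]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(owns) / n

# Simple OLS: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
s_xx = sum((x - x_bar) ** 2 for x in income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, owns))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted "probabilities" from the LPM; nothing forces them into [0, 1].
fitted = [intercept + slope * x for x in income]

# Option 1 on the slide: set values below 0 to 0 and values above 1 to 1.
clamped = [min(1.0, max(0.0, p)) for p in fitted]

for x, p, c in zip(income, fitted, clamped):
    print(f"X={x:2d}  fitted={p:7.4f}  clamped={c:6.4f}")
```

With these data the lowest incomes get negative fitted values and the highest get values above 1, which is the behavior that motivates the logit and probit models.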

10 Problems of LPM (4)
4. Questionable value of R² as a measure of goodness of fit.
- For a given X, Y is either 0 or 1. All the Y values therefore lie either along the X-axis or along the line corresponding to 1, and generally no LPM can be expected to fit such a scatter well. As a result, the conventionally computed R² is likely to be much lower than 1 for such models.
- Aldrich and Nelson contend that “use of the coefficient of determination as a summary statistic should be avoided in models with qualitative dependent variable.”

11 What Is the Logit Model?
- The cumulative logistic distribution: \( P = E(Y=1 \mid X) = \dfrac{1}{1+e^{-\beta X}} \)
[Figure: logistic CDF; P on the vertical axis, running from 0 to 1, plotted against X]

12 What Is the Logit Model?
- From the logistic distribution:
  \( 1-P = \dfrac{e^{-\beta X}}{1+e^{-\beta X}} \)
  \( \dfrac{P}{1-P} = e^{\beta X} \), the odds ratio (the odds in favor of Y=1)
  \( \log\left[\dfrac{P}{1-P}\right] = \beta X \)
- Link function: \( g = \log[p/(1-p)] \), where p is the probability of either Y=1 or Y=0, depending on the software.
- Generally, \( \log[p/(1-p)] = X\beta \).
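The relationships on this slide can be checked numerically. The following is a small Python sketch, assuming a scalar score z that stands in for βX; the function names are illustrative and not from the slides. It shows that the odds equal e^z and that the logit undoes the logistic CDF.

```python
import math

def logistic(z):
    """Logistic CDF: P = 1 / (1 + exp(-z)), with z standing in for the score beta*X."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log of the odds, log[p / (1 - p)]; the inverse of logistic()."""
    return math.log(p / (1.0 - p))

for z in (-2.0, 0.0, 0.5, 3.0):
    p = logistic(z)
    odds = p / (1.0 - p)  # should equal exp(z)
    print(f"z={z:5.2f}  P={p:.4f}  odds={odds:.4f}  exp(z)={math.exp(z):.4f}  logit(P)={logit(p):.4f}")
```

Whatever the value of z, P stays strictly between 0 and 1, which is the property the LPM lacks.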

13 Two Types of Data
- To estimate the logit \( \log[p/(1-p)] = X\beta \), we have to distinguish two types of data:
--- Data at the individual, or micro, level
--- Grouped or replicated data

14 Data at the Individual Level
- X: family income; Y=1 if the family owns a house and Y=0 if it does not. The following table gives data on individual families.

  FAMILY   Y    X
  1        0    8
  2        1    16
  3        1    18
  4        0    11
  5        0    12
  6        1    19
  7        1    20
  8        0    13
  9        0    9

15 Grouped or Replicated Data
- The following table shows data on several families grouped according to income level, and the number of families owning a house at each income level. Corresponding to each income level X_i there are N_i families, n_i of whom are home owners.

  Income (X_i)   N_i    n_i
  6              40     8
  8              50     12
  10             60     18
  13             80     28
  15             100    45
  20             70     36
  25             65     39
  30             50     33
  35             40     30
  40             25     20

16 Steps in Estimating the Logit Regression (Grouped Data)
- For each income level X_i, compute the estimated probability of owning a house: \( \hat{P}_i = n_i / N_i \).
- For each X_i, obtain the logit: \( \hat{L}_i = \log[\hat{P}_i / (1-\hat{P}_i)] \).
- To resolve the problem of heteroscedasticity, weight the model with \( w_i = N_i \hat{P}_i (1-\hat{P}_i) \):
  \( \sqrt{w_i}\,\hat{L}_i = \beta_1 \sqrt{w_i} + \beta_2 \sqrt{w_i}\,X_i + \sqrt{w_i}\,u_i \)
  or
  \( L_i^* = \beta_1 \sqrt{w_i} + \beta_2 X_i^* + v_i \),
  where \( L_i^* = \sqrt{w_i}\,\hat{L}_i \), \( X_i^* = \sqrt{w_i}\,X_i \) and \( v_i = \sqrt{w_i}\,u_i \).
- Estimate the transformed equation by OLS on the transformed data (regression through the origin).
- Establish confidence intervals and/or test hypotheses in the usual OLS framework.
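The steps above can be sketched in plain Python using the grouped data from slide 15. This is an illustrative sketch rather than the presenter's program: it assumes that OLS through the origin on the transformed variables is equivalent to weighted least squares of the logit on a constant and X with weights w_i, and it solves the two normal equations directly. All variable names are assumptions.

```python
import math

# Grouped data from slide 15: income level X_i, families N_i, home owners n_i.
income = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
families = [40, 50, 60, 80, 100, 70, 65, 50, 40, 25]
owners = [8, 12, 18, 28, 45, 36, 39, 33, 30, 20]

# Steps 1-3: estimated probabilities, logits, and weights w_i = N_i * P_i * (1 - P_i).
p_hat = [n / N for n, N in zip(owners, families)]
logit_hat = [math.log(p / (1 - p)) for p in p_hat]
weight = [N * p * (1 - p) for N, p in zip(families, p_hat)]

# Step 4: OLS of sqrt(w)*L on sqrt(w) and sqrt(w)*X without an intercept is the same
# as weighted least squares of L on (1, X) with weights w; build its normal equations.
s_w = sum(weight)
s_wx = sum(w * x for w, x in zip(weight, income))
s_wxx = sum(w * x * x for w, x in zip(weight, income))
s_wl = sum(w * l for w, l in zip(weight, logit_hat))
s_wxl = sum(w * x * l for w, x, l in zip(weight, income, logit_hat))

beta2 = (s_w * s_wxl - s_wx * s_wl) / (s_w * s_wxx - s_wx ** 2)
beta1 = (s_wl - beta2 * s_wx) / s_w

print(f"beta1 (coefficient on sqrt(w)) = {beta1:.4f}")
print(f"beta2 (coefficient on X*)      = {beta2:.4f}")
```

The two estimates should land close to the SAS coefficients reported on slide 18; small differences can arise because the SAS program rounds the transformed variables to four decimals.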

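17 SAS Program

    Proc Import Out= Work.incomes
        Datafile= "c:\yan\econ616\DG-15.4.xls";
    Run;

    data incomes1;
        set incomes;
        phat = n1/n;                           /* estimated probability n_i / N_i  */
        lhat = log(phat/(1-phat));             /* estimated logit                  */
        w = n*phat*(1-phat);                   /* weight w_i                       */
        wsquar = sqrt(w);                      /* square root of the weight        */
        lstar = round(lhat*wsquar, 0.0001);    /* transformed logit L*             */
        xstar = round(income*wsquar, 0.0001);  /* transformed income X*            */
    run;

    proc reg data=incomes1;
        model lstar = wsquar xstar / NOINT;    /* regression through the origin    */
    run;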

18 SAS Output

  Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
  wsquar     1    -1.59324             0.11150          -14.29    <.0001
  xstar      1     0.07867             0.00545           14.44    <.0001

The estimated slope coefficient suggests that for a unit (1,000 dollars) increase in weighted income, the weighted log of the odds in favor of owning a house goes up by about 0.08 units.

19 Odds Interpretation
- The odds ratio: \( \dfrac{P_i}{1-P_i} = e^{\beta_1 + \beta_2 X_i} \)
- For a unit increase in weighted income, the (weighted) odds in favor of owning a house are multiplied by \( e^{0.07867} \approx 1.082 \), that is, they increase by about 8.2%.

20 An Example of Individual Data
- In the following table, Y=1 if a student’s final grade in an intermediate microeconomics course was A, and Y=0 if the final grade was B or C. GPA, TUCE, and Personalized System of Instruction (PSI) are the grade predictors.

  OBS   GPA    TUCE   PSI   GRADE   LETTER
  1     2.66   20     0     0       C
  2     2.89   22     0     0       B
  3     3.28   24     0     0       B
  4     2.92   12     0     0       B
  5     4      21     0     1       A
  6     2.86   17     0     0       B
  7     2.76   17     0     0       B
  8     2.87   21     0     0       B

21 SAS Program

    Proc Import Out= Work.gpagrade
        Datafile= "c:\yan\econ616\DG-15.7.xls";
    Run;

    proc print data=gpagrade;
    run;

    Proc Logistic data=gpagrade;
        Model grade (event='1') = gpa tuce psi;
    run;

    /* or */

    proc probit data=gpagrade;
        class grade;
        model grade = gpa tuce psi / d=logistic itprint;
    run;

22 Output

  Parameter   DF   Estimate    Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept   1    -13.0204    4.9310           6.9723            0.0083
  GPA         1      2.8259    1.2629           5.0072            0.0252
  TUCE        1      0.0951    0.1415           0.4518            0.5015
  PSI         1      2.3785    1.0645           4.9925            0.0255

  Testing Global Null Hypothesis: BETA=0

  Test               Chi-Square   DF   Pr > ChiSq
  Likelihood Ratio   15.4042      3    0.0015
  Score              13.3088      3    0.0040
  Wald                8.3762      3    0.0388

23 Interpretation
- Each slope coefficient is a partial slope: it measures the change in the estimated logit for a unit change in the value of the given regressor, holding the other regressors constant.
- Odds interpretation: for example, for students who are exposed to the new method of teaching, the odds of getting an A are about 10.79 (\( e^{2.3785} \approx 10.7887 \)) times the odds for students who are not exposed to it, other things remaining the same.
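As a quick check of the odds interpretation, the Python sketch below recomputes the odds multiplier for PSI from the slide 22 estimates and, as an added illustration, the predicted probability of an A for observation 5 in the slide 20 table. The dictionary and function names are assumptions made for this example.

```python
import math

# Logit estimates copied from the output on slide 22.
coef = {"intercept": -13.0204, "gpa": 2.8259, "tuce": 0.0951, "psi": 2.3785}

# Odds multiplier for PSI = 1 versus PSI = 0, other regressors held fixed.
psi_odds_multiplier = math.exp(coef["psi"])

def prob_of_a(gpa, tuce, psi):
    """Predicted P(grade = A) from the fitted logit."""
    z = coef["intercept"] + coef["gpa"] * gpa + coef["tuce"] * tuce + coef["psi"] * psi
    return 1.0 / (1.0 + math.exp(-z))

# Observation 5 from slide 20: GPA = 4, TUCE = 21, PSI = 0.
p_obs5 = prob_of_a(4.0, 21, 0)

print(f"odds multiplier for PSI: {psi_odds_multiplier:.4f}")
print(f"predicted P(A) for observation 5: {p_obs5:.4f}")
```

Note that the multiplier applies to the odds, not to the probability itself; the change in probability depends on where the student starts on the logistic curve.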

24 What Is the Probit Model?
- Probit link: \( p = \Phi(\eta) \), where \( \Phi \) is the cumulative distribution function of a standard normal variate and \( \eta \) is the linear score.
- \( P_i = P(Y=1 \mid X) = P(I_i^* \le I_i) = P(Z_i \le \beta_1 + \beta_2 X_i) = \Phi(\beta_1 + \beta_2 X_i) \), where \( P(Y=1 \mid X) \) is the probability that the event occurs given the value of X, and \( Z_i \sim N(0, 1) \).
- \( \beta_1 + \beta_2 X_i = \Phi^{-1}(P_i) \), where \( \Phi^{-1} \) is the inverse of the normal CDF.

25 Use of the Probit Model
- The probit model is used when Y is considered the “manifestation” of some unobservable, Gaussian-distributed latent variable in the data.
- For example, the decision of a family to own a house or not depends on an unobservable index I (the latent variable) that is determined by one or more explanatory variables, say income X, in such a way that the larger the value of the index I, the greater the probability of the family owning a house.

26 Probit Estimation with Grouped Data
- Method 1:
  1. Calculate \( \hat{P}_i = n_i / N_i \) (N1/N in the program's variable names).
  2. Estimate \( I_i = \Phi^{-1}(\hat{P}_i) \), where \( \Phi \) is the standard normal CDF.
  3. Estimate \( \beta_1 \) and \( \beta_2 \) from the \( I_i \), i.e., from \( \beta_1 + \beta_2 X_i = I_i \).
- Method 2: Use a SAS or R program directly.
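Method 1 can be sketched with the Python standard library alone. The sketch below is illustrative: it uses `statistics.NormalDist().inv_cdf` for the inverse normal CDF, the grouped data from slide 15, and plain unweighted OLS for step 3. The slide does not say whether step 3 should be weighted, so the unweighted fit is an assumption.

```python
from statistics import NormalDist

# Grouped data from slide 15: income level X_i, families N_i, home owners n_i.
income = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
families = [40, 50, 60, 80, 100, 70, 65, 50, 40, 25]
owners = [8, 12, 18, 28, 45, 36, 39, 33, 30, 20]

std_normal = NormalDist()  # mean 0, standard deviation 1

# Step 1: estimated probabilities. Step 2: index values I_i = inverse normal CDF of P_i.
p_hat = [n / N for n, N in zip(owners, families)]
index = [std_normal.inv_cdf(p) for p in p_hat]

# Step 3 (assumed unweighted): simple OLS of I_i on X_i.
n_obs = len(income)
x_bar = sum(income) / n_obs
i_bar = sum(index) / n_obs
s_xx = sum((x - x_bar) ** 2 for x in income)
s_xi = sum((x - x_bar) * (i - i_bar) for x, i in zip(income, index))
beta2 = s_xi / s_xx
beta1 = i_bar - beta2 * x_bar

print(f"beta1 = {beta1:.4f}, beta2 = {beta2:.4f}")
```

These two-step estimates will generally differ somewhat from the maximum likelihood estimates that Method 2 produces, because the two methods weight the income groups differently.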

27 Program

SAS:

    Proc Import Out= Work.incomes
        Datafile= "c:\yan\econ616\DG-15.4.xls";
    Run;

    proc genmod data=incomes;
        model n1/n = income / dist = bin Link = probit lrci;
    run;

R:

    # Grouped data: income level, number of families N, number of home owners N1
    incomes <- as.data.frame(matrix(scan(text = "
       6  40   8
       8  50  12
      10  60  18
      13  80  28
      15 100  45
      20  70  36
      25  65  39
      30  50  33
      35  40  30
      40  25  20"), ncol = 3, byrow = TRUE))
    names(incomes) <- c("income", "N", "N1")
    incomes$N0 <- incomes$N - incomes$N1   # families that do not own a house
    glmA <- glm(cbind(N1, N0) ~ income, data = incomes,
                family = binomial(link = "probit"))
    summary(glmA)

28 Output

    Coefficients:
                 Estimate   Std. Error  z value  Pr(>|z|)
    (Intercept) -0.988138   0.122144    -8.090   5.97e-16 ***
    income       0.048587   0.005995     8.105   5.28e-16 ***
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance:     72.7581 on 9 degrees of freedom
    Residual deviance:  2.3456 on 8 degrees of freedom
    AIC: 49.002

    Number of Fisher Scoring iterations: 3

29 Interpretation
- We want to find out the effect of a unit change in X (income) on the probability that Y=1, that is, that a family purchases a house.
  1. The rate of change of the probability with respect to income: \( \dfrac{dP_i}{dX_i} = f(\beta_1 + \beta_2 X_i)\,\beta_2 \), where f is the standard normal density function.
  2. If X=6 (thousand dollars), the normal density is f[-0.988138 + 0.048587(6)] = f(-0.6966) = 0.313.
  3. 0.313 × 0.048587 = 0.0152. Starting from an income level of 6,000 dollars, if income goes up by 1,000 dollars, the probability of a family purchasing a house goes up by about 0.0152, that is, about 1.52 percentage points.
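The arithmetic on this slide can be reproduced in a few lines of Python. The sketch below takes the two probit estimates from the slide 28 output and uses `statistics.NormalDist().pdf` for the standard normal density; the variable names are illustrative.

```python
from statistics import NormalDist

# Probit estimates from the R output on slide 28.
beta1, beta2 = -0.988138, 0.048587

x = 6  # income in thousands of dollars
index = beta1 + beta2 * x            # beta1 + beta2 * X
density = NormalDist().pdf(index)    # standard normal density f(.) at the index
marginal_effect = density * beta2    # dP/dX = f(beta1 + beta2 * X) * beta2

print(f"index           = {index:.4f}")
print(f"density         = {density:.4f}")
print(f"marginal effect = {marginal_effect:.4f}")
```

Changing `x` shows that the marginal effect is not constant: it is largest where the index is near zero and shrinks in both tails.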

30 Probit Model for Individual Data
- SAS program:

    Proc Import Out= Work.gpagrade
        Datafile= "c:\yan\econ616\DG-15.7.xls";
    Run;

    proc probit data=gpagrade;
        class grade;
        model grade = gpa tuce psi;
    run;

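31 Output

  Analysis of Parameter Estimates

  Parameter   DF   Estimate   Standard Error   95% Confidence Limits   Chi-Square   Pr > ChiSq
  Intercept   1     7.4523    2.5425            2.4692    12.4355      8.59         0.0034
  GPA         1    -1.6258    0.6939           -2.9858    -0.2658      5.49         0.0191
  TUCE        1    -0.0517    0.0839           -0.2162     0.1127      0.38         0.5375
  PSI         1    -1.4263    0.5950           -2.5926    -0.2601      5.75         0.0165

Note: the signs are opposite to those in the logit output on slide 22, which suggests that this run models P(GRADE=0) rather than P(GRADE=1); see the remark on slide 12 that p may be the probability of either Y=1 or Y=0, depending on the software.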

32 Marginal Effect of Change in Regressor
- Holding the effect of all other variables constant:
  1. LPM: the slope coefficient directly measures the change in the probability of an event occurring as a result of a unit change in the value of a regressor.
  2. Logit model: the slope coefficient of a variable gives the change in the log of the odds associated with a unit change in that variable. The rate of change in the probability of the event happening is given by \( \beta_j P_i (1-P_i) \).
  3. Probit model: the rate of change in the probability is given by \( \beta_j f(X\beta) \), where f is the density function of the standard normal variable.
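For the logit case, the formula β_j P_i (1 - P_i) can be illustrated with the grouped-data estimates from slide 18. The Python sketch below evaluates it at X=6 and compares it with a numerical derivative of the logistic curve; the evaluation point and the variable names are choices made for this example, not part of the slides.

```python
import math

# Grouped-data logit estimates from the SAS output on slide 18.
beta1, beta2 = -1.59324, 0.07867

def prob(x):
    """Logistic probability P = 1 / (1 + exp(-(beta1 + beta2 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta1 + beta2 * x)))

x = 6  # income in thousands of dollars
p = prob(x)

# Analytic marginal effect from the slide: beta_j * P * (1 - P).
logit_marginal_effect = beta2 * p * (1.0 - p)

# Numerical check: central finite difference of P with respect to X.
h = 1e-5
finite_difference = (prob(x + h) - prob(x - h)) / (2 * h)

print(f"P(X=6)                 = {p:.4f}")
print(f"beta * P * (1 - P)     = {logit_marginal_effect:.5f}")
print(f"finite-difference dP/dX = {finite_difference:.5f}")
```

Unlike the LPM slope, this effect depends on P_i, so it has to be reported at a chosen value of X.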

33 Logit or Probit?
- In most applications the two models give quite similar results; the main difference is that the logistic distribution has slightly fatter tails.
- There is no compelling reason to choose one over the other.
- In practice, many researchers choose the logit model because of its comparative mathematical simplicity.
[Figure: logit and probit cumulative distribution curves; P on the vertical axis, running from 0 to 1]

34 Reading
- Damodar N. Gujarati, Basic Econometrics, pp. 580-615

