Chapter 4 – Nonlinear Models and Transformations of Variables.

Chapter 4 – Nonlinear Models and Transformations of Variables

© Christopher Dougherty 1999–2006 Linear in variables and parameters: LINEARITY AND NONLINEARITY To introduce the topic of fitting nonlinear regression models, we need a definition of linearity. The model shown above is linear in two senses. The right side is linear in variables because the variables are included exactly as defined, rather than as functions. It is also linear in parameters since a different parameter appears as a multiplicative factor in each term.

© Christopher Dougherty 1999–2006 Linear in variables and parameters: Linear in parameters, nonlinear in variables: The second model above is linear in parameters, but nonlinear in variables. LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006 Linear in variables and parameters: Linear in parameters, nonlinear in variables: Such models present no problem at all. Define new variables as shown. With these cosmetic transformations, we have made the model linear in both variables and parameters. LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006 Linear in variables and parameters: Linear in parameters, nonlinear in variables: Nonlinear in parameters: This model is nonlinear in parameters since the coefficient of X 4 is the product of the coefficients of X 2 and X 3. As we will see, some models which are nonlinear in parameters can be linearized by appropriate transformations, but this is not one of those. LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006 bananas income (lbs) ($10,000) household Y X 11.711 26.882 38.253 49.524 59.815 611.436 711.097 810.878 912.159 1010.9410 We will begin with an example of a model that can be linearized by a cosmetic transformation. Suppose that you have data on annual consumption of bananas and annual income for a sample of 10 households. LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006. reg Y X Source | SS df MS Number of obs = 10 ---------+------------------------------ F( 1, 8) = 17.44 Model | 58.8774834 1 58.8774834 Prob > F = 0.0031 Residual | 27.003764 8 3.3754705 R-squared = 0.6856 ---------+------------------------------ Adj R-squared = 0.6463 Total | 85.8812475 9 9.54236083 Root MSE = 1.8372 ------------------------------------------------------------------------------ Y | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------+-------------------------------------------------------------------- X |.8447878.2022741 4.176 0.003.378343 1.311233 _cons | 4.618667 1.255078 3.680 0.006 1.724453 7.512881 ------------------------------------------------------------------------------ Here is the output from a linear regression. As you would expect from seeing the scatter diagram, the coefficient of X is highly significant, and the fit, as measured by R 2, is quite good. LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006 Here is the scatter diagram again with the regression line, the fitted values and residuals plotted. The residuals indicate that the model must be misspecified in some way. If the model is correctly specified, the residuals should be random. Here, we have one negative residual followed by 6 positive ones and 3 negative ones. X Y LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006 Revised model: Relating Y to 1/X would be more sensible. Y still increases with X if  2 < 0, but the rate of increase falls and there is an upper limit,  1. After all, you can eat only so many bananas. LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006 bananas income (lbs) ($10,000) household Y X Z 11.7111.00 26.8820.50 38.2530.33 49.5240.25 59.8150.20 611.4360.17 711.0970.14 810.8780.13 912.1590.11 1010.94100.10 16 The next step is to calculate the data for Z from the data for X. All serious regression applications allow you to create new variables from existing ones. LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006. g Z=1/X. reg Y Z Source | SS df MS Number of obs = 10 ---------+------------------------------ F( 1, 8) = 286.10 Model | 83.5451508 1 83.5451508 Prob > F = 0.0000 Residual | 2.33609666 8.292012083 R-squared = 0.9728 ---------+------------------------------ Adj R-squared = 0.9694 Total | 85.8812475 9 9.54236083 Root MSE =.54038 ------------------------------------------------------------------------------ Y | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------+-------------------------------------------------------------------- Z | -10.98865.6496573 -16.915 0.000 -12.48677 -9.490543 _cons | 12.48354.2557512 48.811 0.000 11.89378 13.07331 ------------------------------------------------------------------------------ Here is the regression output. The first command creates the new variable Z. ("g" is short for "generate".) LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006. g Z=1/X. reg Y Z Source | SS df MS Number of obs = 10 ---------+------------------------------ F( 1, 8) = 286.10 Model | 83.5451508 1 83.5451508 Prob > F = 0.0000 Residual | 2.33609666 8.292012083 R-squared = 0.9728 ---------+------------------------------ Adj R-squared = 0.9694 Total | 85.8812475 9 9.54236083 Root MSE =.54038 ------------------------------------------------------------------------------ Y | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------+-------------------------------------------------------------------- Z | -10.98865.6496573 -16.915 0.000 -12.48677 -9.490543 _cons | 12.48354.2557512 48.811 0.000 11.89378 13.07331 ------------------------------------------------------------------------------ The regression equation is as shown. LINEARITY AND NONLINEARITY

© Christopher Dougherty 1999–2006 Elasticity of Y with respect to X is the proportional change in Y per proportional change in X: ELASTICITIES AND LOGARITHMIC MODELS Y X A O This sequence defines elasticities and shows how one may fit nonlinear models with constant elasticities. First, the general definition of an elasticity.

© Christopher Dougherty 1999–2006 Elasticity of Y with respect to X is the proportional change in Y per proportional change in X: Y X A Re-arranging the expression for the elasticity, we can obtain a graphical interpretation. The elasticity at any point on the curve is the ratio of the slope of the tangent at that point to the slope of the line joining the point to the origin. O ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 Elasticity of Y with respect to X is the proportional change in Y per proportional change in X: In this case it is clear that the tangent at A is flatter than the line OA and so the elasticity must be less than 1. Y X A O ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 In this case the tangent at A is steeper than OA and the elasticity is greater than 1. A O Elasticity of Y with respect to X is the proportional change in Y per proportional change in X: Y X ELASTICITIES AND LOGARITHMIC MODELS

In general the elasticity will be different at different points on the function relating Y to X. In the example above, Y is a linear function of X. The tangent at any point is coincidental with the line itself, so in this case its slope is always  2. The elasticity depends on the slope of the line joining the point to the origin. xO A Y X ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 OB is flatter than OA, so the elasticity is greater at B than at A. (This ties in with the mathematical expression: (  1  / X) +  2 is smaller at B than at A, assuming that  1 is positive.) x A O B Y X ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 Note that the curvature can be quite gentle over wide ranges of X. This means that even if the true model is of the constant elasticity form, a linear model may be a good approximation over a limited range. Y X ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 It is easy to fit a constant elasticity function using a sample of observations. You can linearize the model by taking the logarithms of both sides. ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 You thus obtain a linear relationship between Y' and X', as defined. All serious regression applications allow you to generate logarithmic variables from existing ones. The coefficient of X' will be a direct estimate of the elasticity,  2. ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 The constant term will be an estimate of log  1. To obtain an estimate of  1, you calculate exp(b 1 '), where b 1 ' is the estimate of  1 '. (This assumes that you have used natural logarithms, that is, logarithms to base e, to transform the model.) ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 Here is a scatter diagram showing annual household expenditure on FDHO, food eaten at home, and EXP, total annual household expenditure, both measured in dollars, for 1995 for a sample of 869 households in the United States (Consumer Expenditure Survey data). FDHO EXP ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006. reg FDHO EXP Source | SS df MS Number of obs = 869 ---------+------------------------------ F( 1, 867) = 381.47 Model | 915843574 1 915843574 Prob > F = 0.0000 Residual | 2.0815e+09 867 2400831.16 R-squared = 0.3055 ---------+------------------------------ Adj R-squared = 0.3047 Total | 2.9974e+09 868 3453184.55 Root MSE = 1549.5 ------------------------------------------------------------------------------ FDHO | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------+-------------------------------------------------------------------- EXP |.0528427.0027055 19.531 0.000.0475325.0581529 _cons | 1916.143 96.54591 19.847 0.000 1726.652 2105.634 ------------------------------------------------------------------------------ Here is a regression of FDHO on EXP. It is usual to relate types of consumer expenditure to total expenditure, rather than income, when using household data. Household income data tend to be relatively erratic. The regression implies that, at the margin, 5 cents out of each dollar of expenditure is spent on food at home. Does this seem plausible? Probably, though possibly a little low. ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006. reg FDHO EXP Source | SS df MS Number of obs = 869 ---------+------------------------------ F( 1, 867) = 381.47 Model | 915843574 1 915843574 Prob > F = 0.0000 Residual | 2.0815e+09 867 2400831.16 R-squared = 0.3055 ---------+------------------------------ Adj R-squared = 0.3047 Total | 2.9974e+09 868 3453184.55 Root MSE = 1549.5 ------------------------------------------------------------------------------ FDHO | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------+-------------------------------------------------------------------- EXP |.0528427.0027055 19.531 0.000.0475325.0581529 _cons | 1916.143 96.54591 19.847 0.000 1726.652 2105.634 ------------------------------------------------------------------------------ It also suggests that $1,916 would be spent on food at home if total expenditure were zero. Obviously this is impossible. It might be possible to interpret it somehow as baseline expenditure, but we would need to take into account family size and composition. ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 We will now fit a constant elasticity function using the same data. The scatter diagram shows the logarithm of FDHO plotted against the logarithm of EXP. LGFDHO LGEXP ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006. g LGFDHO = ln(FDHO). g LGEXP = ln(EXP). reg LGFDHO LGEXP Source | SS df MS Number of obs = 868 ---------+------------------------------ F( 1, 866) = 396.06 Model | 84.4161692 1 84.4161692 Prob > F = 0.0000 Residual | 184.579612 866.213140429 R-squared = 0.3138 ---------+------------------------------ Adj R-squared = 0.3130 Total | 268.995781 867.310260416 Root MSE =.46167 ------------------------------------------------------------------------------ LGFDHO | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------+-------------------------------------------------------------------- LGEXP |.4800417.0241212 19.901 0.000.4326988.5273846 _cons | 3.166271.244297 12.961 0.000 2.686787 3.645754 ------------------------------------------------------------------------------ Here is the result of regressing LGFDHO on LGEXP. The first two commands generate the logarithmic variables. ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006. g LGFDHO = ln(FDHO). g LGEXP = ln(EXP). reg LGFDHO LGEXP Source | SS df MS Number of obs = 868 ---------+------------------------------ F( 1, 866) = 396.06 Model | 84.4161692 1 84.4161692 Prob > F = 0.0000 Residual | 184.579612 866.213140429 R-squared = 0.3138 ---------+------------------------------ Adj R-squared = 0.3130 Total | 268.995781 867.310260416 Root MSE =.46167 ------------------------------------------------------------------------------ LGFDHO | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------+-------------------------------------------------------------------- LGEXP |.4800417.0241212 19.901 0.000.4326988.5273846 _cons | 3.166271.244297 12.961 0.000 2.686787 3.645754 ------------------------------------------------------------------------------ The estimate of the elasticity is 0.48. Does this seem plausible? Yes, definitely. Food is a normal good, so its elasticity should be positive, but it is a basic necessity. Expenditure on it should grow less rapidly than expenditure generally, so its elasticity should be < 1. ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006. g LGFDHO = ln(FDHO). g LGEXP = ln(EXP). reg LGFDHO LGEXP Source | SS df MS Number of obs = 868 ---------+------------------------------ F( 1, 866) = 396.06 Model | 84.4161692 1 84.4161692 Prob > F = 0.0000 Residual | 184.579612 866.213140429 R-squared = 0.3138 ---------+------------------------------ Adj R-squared = 0.3130 Total | 268.995781 867.310260416 Root MSE =.46167 ------------------------------------------------------------------------------ LGFDHO | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------+-------------------------------------------------------------------- LGEXP |.4800417.0241212 19.901 0.000.4326988.5273846 _cons | 3.166271.244297 12.961 0.000 2.686787 3.645754 ------------------------------------------------------------------------------ The intercept has no substantive meaning. To obtain an estimate of  1, we calculate e 3.16, which is 23.8. ELASTICITIES AND LOGARITHMIC MODELS

Here, we compare the regression line from the logarithmic regression (plotted in the original scatter diagram), to the linear regression line. You can see that the logarithmic regression line gives a somewhat better fit, especially at low levels of expenditure. EXP FDHO ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 However, the difference in the fit is not dramatic. The main reason for preferring the constant elasticity model is that it makes more sense theoretically. It also has a technical advantage that we will come to later on (when discussing heteroscedasticity). EXP FDHO ELASTICITIES AND LOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 SEMILOGARITHMIC MODELS This sequence introduces the semilogarithmic model and shows how it may be applied to an earnings function. The dependent variable is linear but the explanatory variables, multiplied by their coefficients, are exponents of e.

© Christopher Dougherty 1999–2006 Hence the proportional change in Y per unit change in X is equal to  2. It is therefore independent of the value of X. Strictly speaking, this interpretation is valid only for small changes in X. When changes are not small, the interpretation may be a little more complex. SEMILOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 In this example, we will relate earnings to years of schooling, measured as highest grade completed. This is a discrete variable, changing in whole units. We will consider the effect of increasing S by one year to S'. SEMILOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 The final line replaces e   with its mathematical definition. If  2 is small,  2 will be very small and can be neglected. In that case, the right side of the equation simplifies to EARNINGS (1 +  2 ) and the original marginal interpretation of  2 still applies. 2 2 SEMILOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 To fit a function of this type, you take logarithms of both sides. The right side of the equation becomes a linear function of X (note that the logarithm of e, to base e, is 1). Hence we can fit the model with a linear regression, regressing log Y on X. SEMILOGARITHMIC MODELS

© Christopher Dougherty 1999–2006. reg LGEARN S Source | SS df MS Number of obs = 540 -------------+------------------------------ F( 1, 538) = 140.05 Model | 38.5643833 1 38.5643833 Prob > F = 0.0000 Residual | 148.14326 538.275359219 R-squared = 0.2065 -------------+------------------------------ Adj R-squared = 0.2051 Total | 186.707643 539.34639637 Root MSE =.52475 ------------------------------------------------------------------------------ LGEARN | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- S |.1096934.0092691 11.83 0.000.0914853.1279014 _cons | 1.292241.1287252 10.04 0.000 1.039376 1.545107 ------------------------------------------------------------------------------ Here is the regression output from a regression using Data Set 21. The estimate of  2 is 0.110. As an approximation, this implies that an extra year of schooling increases earnings by a proportion 0.110, that is, 11.0%. SEMILOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 If we take account of the fact that a year of schooling is not a marginal change, and work out the effect exactly, the increase is 11.6%. When  2 is less than 0.1, there is little to be gained by working out the effect exactly. SEMILOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 The intercept in the regression is an estimate of log  1. From it, we obtain an estimate of  1 equal to e 1.29, which is 3.64. Literally this implies that a person with no schooling would earn $3.64 per hour. Note: It’s dangerous to extrapolate so far from the range for which we have data.. reg LGEARN S Source | SS df MS Number of obs = 540 -------------+------------------------------ F( 1, 538) = 140.05 Model | 38.5643833 1 38.5643833 Prob > F = 0.0000 Residual | 148.14326 538.275359219 R-squared = 0.2065 -------------+------------------------------ Adj R-squared = 0.2051 Total | 186.707643 539.34639637 Root MSE =.52475 ------------------------------------------------------------------------------ LGEARN | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- S |.1096934.0092691 11.83 0.000.0914853.1279014 _cons | 1.292241.1287252 10.04 0.000 1.039376 1.545107 ------------------------------------------------------------------------------ SEMILOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 Here is the semilogarithmic regression line plotted in a scatter diagram with the untransformed data, with the linear regression shown for comparison. SEMILOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 There is not much difference in the fit of the regression lines, but the semilogarithmic regression is more satisfactory in two respects. The linear specification predicts that earnings will increase by about $2 per hour with each additional year of schooling, which is implausible for high levels of education. The semi-logarithmic specification allows the increment to increase with level of education. Second, the linear specification predicts negative earnings for an individual with no schooling. The semilogarithmic specification predicts hourly earnings of $3.64, which at least is not obvious nonsense. SEMILOGARITHMIC MODELS

© Christopher Dougherty 1999–2006 THE DISTURBANCE TERM IN NONLINEAR MODELS Thus far, nothing has been said about the disturbance term in nonlinear regression models. For the regression results in a linearized model to have the desired properties, the disturbance term in the transformed model should be additive and it should satisfy the regression model conditions. To be able to perform the usual tests, it should be normally distributed in the transformed model.

© Christopher Dougherty 1999–2006 In the case of the first example of a nonlinear model, there was no problem. If the disturbance term had the required properties in the original model, it would have them in the regression model. It has not been affected by the transformation. THE DISTURBANCE TERM IN NONLINEAR MODELS

© Christopher Dougherty 1999–2006 We will denote this multiplicative term v. When u is equal to 0, not modifying the value of log Y, v is equal to 1, likewise not modifying the value of Y. Positive values of u correspond to values of v greater than 1, the random factor having a positive effect on Y and log Y. Likewise negative values of u correspond to values of v between 0 and 1, the random factor having a negative effect on Y and log Y. Besides satisfying the regression model conditions, we need u to be normally distributed if we are to perform t tests and F tests. THE DISTURBANCE TERM IN NONLINEAR MODELS

© Christopher Dougherty 1999–2006 v f(v)f(v) This will be the case if v has a lognormal distribution, shown above. The median of the distribution is located at v = 1, where u = 0. THE DISTURBANCE TERM IN NONLINEAR MODELS

© Christopher Dougherty 1999–2006 The same multiplicative disturbance term is needed in the semilog model. Note that, with this asymmetric distribution, one should expect a small proportion of observations to be subject to large positive random effects. v f(v)f(v) THE DISTURBANCE TERM IN NONLINEAR MODELS

© Christopher Dougherty 1999–2006 Here is the scatter diagram for earnings and schooling using Data Set 21. You can see that there are several outliers, with the four most extreme highlighted. THE DISTURBANCE TERM IN NONLINEAR MODELS

© Christopher Dougherty 1999–2006 Here is the scatter diagram for the semilogarithmic model, with its regression line. The same four observations remain outliers, but they do not appear to be so extreme. THE DISTURBANCE TERM IN NONLINEAR MODELS

© Christopher Dougherty 1999–2006 The histogram above compares the distributions of the residuals from the linear and semi-logarithmic regressions. The distributions have been standardized, that is, scaled so that they have standard deviation equal to 1, to make them comparable. 0–1–2–31 2 3 THE DISTURBANCE TERM IN NONLINEAR MODELS

© Christopher Dougherty 1999–2006 It can be shown that if the disturbance term in a regression model has a normal distribution, so will the residuals. It is obvious that the residuals from the semilogarithmic regression are approximately normal, but those from the linear regression are not. This is evidence that the semi-logarithmic model is the better specification. THE DISTURBANCE TERM IN NONLINEAR MODELS

© Christopher Dougherty 1999–2006 What would happen if the disturbance term in the logarithmic or semilogarithmic model were additive, rather than multiplicative? THE DISTURBANCE TERM IN NONLINEAR MODELS

© Christopher Dougherty 1999–2006 If this were the case, we would not be able to linearize the model by taking logarithms. There is no way of simplifying log(  1 X  + u). We should have to use some nonlinear regression technique. See textbook for example, Stata command for nonlinear regression: nl 2 THE DISTURBANCE TERM IN NONLINEAR MODELS

Chapter 4 – Nonlinear Models and Transformations of Variables.

Similar presentations

Presentation on theme: "Chapter 4 – Nonlinear Models and Transformations of Variables."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Chapter 4 – Nonlinear Models and Transformations of Variables.

Similar presentations

Presentation on theme: "Chapter 4 – Nonlinear Models and Transformations of Variables."— Presentation transcript:

Similar presentations

About project

Feedback