Multiple Regression Model
Multiple regression enables us to determine the simultaneous effect of several independent variables on a dependent variable using the least squares principle.
Multiple Regression Objectives
Multiple regression provides two important results:
1. A linear equation that predicts the dependent variable, Y, as a function of K independent variables, x_ji, j = 1, …, K.
2. The marginal change in the dependent variable, Y, associated with a change in each independent variable, measured by the partial coefficients, the b_j's. In multiple regression these partial coefficients depend on what other variables are included in the model. The coefficient b_j indicates the change in Y given a unit change in x_j while controlling for the simultaneous effect of the other independent variables.
In some problems both results are equally important; usually, however, one will predominate.
Multiple Regression Model
POPULATION MULTIPLE REGRESSION MODEL
The population multiple regression model defines the relationship between a dependent (endogenous) variable, Y, and a set of independent (exogenous) variables, x_j, j = 1, …, K. The x_ji's are assumed to be fixed numbers and Y is a random variable, defined for each observation i, where i = 1, …, n and n is the number of observations. The model is defined as

y_i = β0 + β1·x_1i + β2·x_2i + … + βK·x_Ki + ε_i

where the β_j's are constant coefficients and the ε_i's are random variables with mean 0 and variance σ².
Standard Multiple Regression Assumptions
The population multiple regression model is

y_i = β0 + β1·x_1i + … + βK·x_Ki + ε_i

and we assume that n sets of observations are available. The following standard assumptions are made for the model:
1. The x_ji's are fixed numbers, or they are realizations of random variables X_ji that are independent of the error terms, the ε_i's. In the latter case, inference is carried out conditionally on the observed values of the x_ji's.
2. The error terms are random variables with mean 0 and the same variance, σ². The latter property is called homoscedasticity, or uniform variance:

E[ε_i] = 0 and E[ε_i²] = σ², for i = 1, …, n
Standard Multiple Regression Assumptions (continued)
3. The random error terms, ε_i, are not correlated with one another, so that

E[ε_i ε_j] = 0, for all i ≠ j

4. It is not possible to find a set of numbers, c_0, c_1, …, c_K, not all zero, such that

c_0 + c_1·x_1i + c_2·x_2i + … + c_K·x_Ki = 0 for every observation i

This is the property of no linear relation among the X_j's.
Least Squares Estimation and the Sample Multiple Regression
We begin with a sample of n observations, (x_1i, x_2i, …, x_Ki, y_i), i = 1, …, n, measured for a process whose population multiple regression model is

y_i = β0 + β1·x_1i + … + βK·x_Ki + ε_i

The least-squares procedure obtains estimates of the coefficients β0, β1, …, βK as the values b0, b1, …, bK for which the sum of squared deviations

SSE = Σ_{i=1}^{n} (y_i − b0 − b1·x_1i − … − bK·x_Ki)²

is a minimum. The resulting equation

ŷ_i = b0 + b1·x_1i + b2·x_2i + … + bK·x_Ki

is the sample multiple regression of Y on X1, X2, …, XK.
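The least-squares computation above can be sketched in pure Python by forming and solving the normal equations (X′X)b = X′y. The data and helper names below are illustrative (not Example 11.1): the sample is generated from y = 1 + 2x1 + 3x2 with no noise, so the estimates recover those coefficients exactly.

```python
def solve(A, b):
    # Solve the linear system A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    # Least-squares estimates b0, b1, ..., bK from the normal equations (X'X)b = X'y.
    Xd = [[1.0] + list(row) for row in X]  # prepend a column of 1s for the intercept
    p = len(Xd[0])
    XtX = [[sum(row[i] * row[j] for row in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(Xd[i][j] * y[i] for i in range(len(y))) for j in range(p)]
    return solve(XtX, Xty)

# Illustrative sample generated from y = 1 + 2*x1 + 3*x2 (an exact fit).
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
b = ols(X, y)  # approximately [1.0, 2.0, 3.0]
```

Statistical packages perform the same computation (usually via a more numerically stable QR factorization); the normal-equations form simply makes the least-squares principle explicit.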
Multiple Regression Analysis for Profit Margin Analysis (Using Example 11.1)
[Computer output for the fitted model: Profit Margin (Y) regressed on Revenue (X1) and Office Space (X2); the output displays the estimated coefficients b0, b1, and b2.]
Sum of Squares Decomposition and the Coefficient of Determination
Given the multiple regression model fitted by least squares,

y_i = b0 + b1·x_1i + b2·x_2i + … + bK·x_Ki + e_i

where the b_j's are the least squares estimates of the coefficients of the population regression model and the e_i's are the residuals from the estimated regression model, the model variability can be partitioned into the components

SST = SSR + SSE

where the total sum of squares is

SST = Σ (y_i − ȳ)²
Sum of Squares Decomposition and the Coefficient of Determination (continued)
Error sum of squares:

SSE = Σ e_i²

Regression sum of squares:

SSR = Σ (ŷ_i − ȳ)²

This decomposition can be interpreted as: total sample variability = explained variability + unexplained variability.
Sum of Squares Decomposition and the Coefficient of Determination (continued)
The coefficient of determination, R², of the fitted regression is defined as the proportion of the total sample variability explained by the regression,

R² = SSR / SST = 1 − SSE / SST

and it follows that 0 ≤ R² ≤ 1.
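The decomposition can be checked numerically. The sketch below uses hypothetical data (the `ols` helper refits the model by solving the normal equations with Gaussian elimination) and verifies that SST = SSR + SSE and R² = SSR/SST for a two-predictor sample:

```python
def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    # Least-squares estimates via the normal equations (X'X)b = X'y.
    Xd = [[1.0] + list(row) for row in X]
    p = len(Xd[0])
    XtX = [[sum(row[i] * row[j] for row in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(Xd[i][j] * y[i] for i in range(len(y))) for j in range(p)]
    return solve(XtX, Xty)

# Hypothetical noisy sample with two predictors.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1.2, 2.9, 4.1, 6.0, 7.8, 9.3]
b = ols(X, y)
yhat = [b[0] + b[1] * x1 + b[2] * x2 for x1, x2 in X]
ybar = sum(y) / len(y)
SST = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error sum of squares
SSR = sum((yh - ybar) ** 2 for yh in yhat)            # regression sum of squares
R2 = SSR / SST
# SST - (SSR + SSE) is zero up to rounding error, because the residuals of an
# OLS fit with an intercept sum to zero and are orthogonal to the fitted values.
```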
Estimation of Error Variance
Given the population regression model

y_i = β0 + β1·x_1i + … + βK·x_Ki + ε_i

and the standard regression assumptions, let σ² denote the common variance of the error terms ε_i. Then an unbiased estimate of that variance is

s_e² = SSE / (n − K − 1) = Σ e_i² / (n − K − 1)

The square root of this variance, s_e, is also called the standard error of the estimate.
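With hypothetical values SSE = 48, n = 20, and K = 3, the estimate works out as follows:

```python
import math

SSE, n, K = 48.0, 20, 3
s2 = SSE / (n - K - 1)  # unbiased estimate of the error variance: 48/16 = 3.0
se = math.sqrt(s2)      # standard error of the estimate
```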
Multiple Regression Analysis for Profit Margin Analysis (Using Example 11.1, continued)
[Computer output for the fitted model of Profit Margin (Y) on Revenue (X1) and Office Space (X2); the output displays R², s_e, SSR, SSE, and the estimated coefficients b0, b1, and b2.]
Adjusted Coefficient of Determination
The adjusted coefficient of determination, R̄², is defined as

R̄² = 1 − [SSE / (n − K − 1)] / [SST / (n − 1)]

We use this measure to correct for the fact that non-relevant independent variables will still produce some small reduction in the error sum of squares. The adjusted R² therefore provides a better comparison between multiple regression models with different numbers of independent variables.
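For example, with hypothetical values R² = 0.85, n = 20, and K = 3:

```python
R2, n, K = 0.85, 20, 3
# The adjustment penalizes the fit for the number of estimated coefficients;
# this form is algebraically equivalent to 1 - [SSE/(n-K-1)] / [SST/(n-1)].
adj_R2 = 1 - (1 - R2) * (n - 1) / (n - K - 1)  # = 1 - 0.15 * 19/16 = 0.821875
```

Note that the adjusted value is never larger than R² itself, since (n − 1)/(n − K − 1) ≥ 1.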
Coefficient of Multiple Correlation
The coefficient of multiple correlation, R, is the correlation between the predicted values and the observed values of the dependent variable,

R = r(ŷ, y) = √R²

and is equal to the square root of the coefficient of determination. We use R as another measure of the strength of the linear relationship between the dependent variable and the independent variables. It is comparable to the correlation between Y and X in simple regression.
Basis for Inference About the Population Regression Parameters
Let the population regression model be

y_i = β0 + β1·x_1i + … + βK·x_Ki + ε_i

Let b0, b1, …, bK be the least squares estimates of the population parameters and s_b0, s_b1, …, s_bK be the estimated standard deviations of the least squares estimators. Then, if the standard regression assumptions hold and the error terms ε_i are normally distributed, the random variables

t_bj = (b_j − β_j) / s_bj, j = 0, 1, …, K

are distributed as Student's t with (n − K − 1) degrees of freedom.
Confidence Intervals for Partial Regression Coefficients
If the regression errors ε_i are normally distributed and the standard regression assumptions hold, the 100(1 − α)% confidence intervals for the partial regression coefficients β_j are given by

b_j − t_(n−K−1, α/2) · s_bj < β_j < b_j + t_(n−K−1, α/2) · s_bj

where t_(n−K−1, α/2) is the number for which

P(t_(n−K−1) > t_(n−K−1, α/2)) = α/2

and the random variable t_(n−K−1) follows a Student's t distribution with (n − K − 1) degrees of freedom.
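A quick numeric sketch with hypothetical values (the coefficient estimate, its standard error, and the sample size are made up; the critical value 2.365 is Student's t with 7 degrees of freedom at α/2 = 0.025, taken from a t table):

```python
b1, s_b1 = 2.5, 0.4   # hypothetical estimate and its estimated standard error
t_crit = 2.365        # t(n - K - 1, alpha/2) for n = 10, K = 2, alpha = 0.05
half_width = t_crit * s_b1
lo, hi = b1 - half_width, b1 + half_width  # 95% CI: (1.554, 3.446)
```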
Multiple Regression Analysis for Profit Margin Analysis (Using Example 11.1, continued)
[Computer output for the fitted model of Profit Margin (Y) on Revenue (X1) and Office Space (X2), highlighting the estimated coefficients b1 and b2 and their t statistics, t_b1 and t_b2.]
Tests of Hypotheses for the Partial Regression Coefficients
If the regression errors ε_i are normally distributed and the standard least squares assumptions hold, the following tests have significance level α:
1. To test either null hypothesis

H0: β_j = 0 or H0: β_j ≤ 0

against the alternative

H1: β_j > 0

the decision rule is: reject H0 if t = b_j / s_bj > t_(n−K−1, α)
Tests of Hypotheses for the Partial Regression Coefficients (continued)
2. To test either null hypothesis

H0: β_j = 0 or H0: β_j ≥ 0

against the alternative

H1: β_j < 0

the decision rule is: reject H0 if t = b_j / s_bj < −t_(n−K−1, α)
Tests of Hypotheses for the Partial Regression Coefficients (continued)
3. To test the null hypothesis

H0: β_j = 0

against the two-sided alternative

H1: β_j ≠ 0

the decision rule is: reject H0 if |t| = |b_j / s_bj| > t_(n−K−1, α/2)
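All three rules reduce to comparing t = b_j / s_bj with a t critical value. A minimal sketch of the two-sided case, with hypothetical numbers (the critical value 2.365 is t with 7 degrees of freedom at α/2 = 0.025, from a t table):

```python
b_j, s_bj = 2.5, 0.4              # hypothetical estimate and standard error
t_crit = 2.365                    # t(n - K - 1, alpha/2) for 7 degrees of freedom
t_stat = b_j / s_bj               # 6.25
reject_H0 = abs(t_stat) > t_crit  # True: b_j differs significantly from 0
```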
Test on All the Parameters of a Regression Model
Consider the multiple regression model

y_i = β0 + β1·x_1i + … + βK·x_Ki + ε_i

To test the null hypothesis

H0: β1 = β2 = … = βK = 0

against the alternative hypothesis

H1: at least one β_j ≠ 0

at a significance level α, we can use the decision rule: reject H0 if

F = (SSR / K) / (SSE / (n − K − 1)) > F_(K, n−K−1, α)

where F_(K, n−K−1, α) is the critical value of F from Table 7 in the appendix for which

P(F_(K, n−K−1) > F_(K, n−K−1, α)) = α

The computed F statistic follows an F distribution with numerator degrees of freedom K and denominator degrees of freedom (n − K − 1).
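Since SSR/SST = R², the F statistic can also be computed directly from R². A hypothetical sketch (the 5% critical value for F with 3 and 16 degrees of freedom, approximately 3.24, is taken from an F table):

```python
R2, n, K = 0.85, 20, 3
# F = (SSR/K) / (SSE/(n-K-1)) = (R2/K) / ((1-R2)/(n-K-1))
F = (R2 / K) / ((1 - R2) / (n - K - 1))  # about 30.22
F_crit = 3.24                            # F(K, n-K-1) at alpha = 0.05, from a table
reject_H0 = F > F_crit                   # True: the regression is significant overall
```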
Test on a Subset of the Regression Parameters
Consider the multiple regression model

y_i = β0 + β1·x_1i + … + βK·x_Ki + α1·z_1i + … + α_r·z_ri + ε_i

To test the null hypothesis

H0: α1 = α2 = … = α_r = 0

that a subset of r regression parameters are simultaneously equal to 0, against the alternative hypothesis

H1: at least one α_j ≠ 0
Test on a Subset of the Regression Parameters (continued)
We compare the error sum of squares for the complete model with the error sum of squares for the restricted model. First run a regression for the complete model, which includes all the independent variables, and obtain SSE. Next run a restricted regression that excludes the z variables, whose coefficients are the α's; the number of variables excluded is r. From this regression obtain the restricted error sum of squares, SSE(r). Then compute the F statistic

F = [(SSE(r) − SSE) / r] / [SSE / (n − K − r − 1)]

and apply the decision rule for a significance level α: reject H0 if F > F_(r, n−K−r−1, α)
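A numeric sketch with hypothetical sums of squares (assuming n = 32 observations, K = 5 retained x variables, and r = 2 excluded z variables, so the complete model has n − K − r − 1 = 24 error degrees of freedom):

```python
SSE_full = 100.0        # error sum of squares, complete model (x's and z's)
SSE_restricted = 130.0  # error sum of squares after dropping the r z variables
r = 2                   # number of excluded variables
df_error = 24           # n - K - r - 1 for the complete model
F = ((SSE_restricted - SSE_full) / r) / (SSE_full / df_error)  # = 15 / (100/24) = 3.6
# Compare F with the F(r, df_error) critical value at the chosen significance level.
```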
Predictions from the Multiple Regression Models
Suppose that the population regression model

y_i = β0 + β1·x_1i + … + βK·x_Ki + ε_i

holds and that the standard regression assumptions are valid. Let b0, b1, …, bK be the least squares estimates of the model coefficients β_j, j = 1, 2, …, K, based on the data points (x_1i, x_2i, …, x_Ki, y_i), i = 1, 2, …, n. Then, given a new data point x_1,n+1, x_2,n+1, …, x_K,n+1, the best linear unbiased forecast of y_n+1 is

ŷ_n+1 = b0 + b1·x_1,n+1 + b2·x_2,n+1 + … + bK·x_K,n+1

It is very risky to obtain forecasts based on x values outside the range of the data used to estimate the model coefficients, because we have no data evidence to support the linear model at those points.
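Given estimated coefficients, the forecast is a direct plug-in. A minimal sketch with hypothetical estimates b = (1, 2, 3):

```python
b = [1.0, 2.0, 3.0]  # hypothetical least-squares estimates b0, b1, b2
x_new = (1.5, 2.0)   # new observation of (x1, x2); should lie inside the
                     # range of the data used to fit the model
y_forecast = b[0] + b[1] * x_new[0] + b[2] * x_new[1]  # 1 + 3 + 6 = 10.0
```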
Quadratic Model Transformations
The quadratic function

Y = β0 + β1·X + β2·X² + ε

can be transformed into a linear multiple regression model by defining new variables

z1 = x,  z2 = x²

and then specifying the model as

y_i = β0 + β1·z_1i + β2·z_2i + ε_i

which is linear in the transformed variables. Transformed quadratic variables can be combined with other variables in a multiple regression model; thus we could fit a multiple quadratic regression using transformed variables.
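The transformation amounts to adding a squared column to the design matrix. The sketch below uses illustrative data generated from y = 2 + x² (the `ols` helper solves the normal equations) and recovers the quadratic coefficients with an ordinary multiple regression:

```python
def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    # Least-squares estimates via the normal equations (X'X)b = X'y.
    Xd = [[1.0] + list(row) for row in X]
    p = len(Xd[0])
    XtX = [[sum(row[i] * row[j] for row in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(Xd[i][j] * y[i] for i in range(len(y))) for j in range(p)]
    return solve(XtX, Xty)

xs = [-1.0, 0.0, 1.0, 2.0]
y = [2 + x ** 2 for x in xs]   # generated from y = 2 + 0*x + 1*x^2
Z = [(x, x ** 2) for x in xs]  # transformed variables z1 = x, z2 = x^2
b = ols(Z, y)                  # approximately [2.0, 0.0, 1.0]
```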
Exponential Model Transformations
Coefficients for exponential models of the form

Y = β0 · X1^β1 · X2^β2 · ε

can be estimated by first taking the logarithm of both sides to obtain an equation that is linear in the logarithms of the variables:

log Y = log β0 + β1·log X1 + β2·log X2 + log ε

Using this form we can regress the logarithm of Y on the logarithms of the two X variables and obtain estimates of the coefficients β1 and β2 directly from the regression analysis. Note that this estimation procedure requires the random errors to be multiplicative in the original exponential model. Thus the error term, ε, is expressed as a percentage increase or decrease, instead of the addition or subtraction of a random error as in linear regression models.
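A sketch of the log-log estimation, with illustrative noise-free data generated from Y = 3·X1²·X2^0.5, so the regression on logarithms recovers β1 = 2, β2 = 0.5, and β0 = 3 exactly (the `ols` helper solves the normal equations):

```python
import math

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    # Least-squares estimates via the normal equations (X'X)b = X'y.
    Xd = [[1.0] + list(row) for row in X]
    p = len(Xd[0])
    XtX = [[sum(row[i] * row[j] for row in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(Xd[i][j] * y[i] for i in range(len(y))) for j in range(p)]
    return solve(XtX, Xty)

pts = [(1, 1), (2, 1), (1, 2), (2, 2), (4, 2), (2, 4)]
y = [3 * x1 ** 2 * x2 ** 0.5 for x1, x2 in pts]  # multiplicative model, no noise
X_log = [(math.log(x1), math.log(x2)) for x1, x2 in pts]
y_log = [math.log(v) for v in y]
b = ols(X_log, y_log)   # approximately [log(3), 2.0, 0.5]
beta0 = math.exp(b[0])  # back-transform the intercept: approximately 3.0
```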
Dummy Variable Regression Analysis
The relationship between Y and X1 can shift in response to a changed condition. The shift effect can be estimated by using a dummy variable, which takes the value 0 (condition not present) or 1 (condition present). All of the observations from one set of data have dummy variable x2 = 1, and the observations from the other set have x2 = 0. In these cases the relationship between Y and X1 is specified by the regression model

y_i = β0 + β1·x_1i + β2·x_2i + ε_i
Dummy Variable Regression Analysis (continued)
The functions for each set of points are

ŷ = b0 + b1·x1 (when x2 = 0)

and

ŷ = (b0 + b2) + b1·x1 (when x2 = 1)

In the first function the constant is b0, while in the second the constant is b0 + b2. Dummy variables are also called indicator variables.
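A sketch with illustrative data generated from y = 5 + 2x1 + 4x2, where x2 is the 0/1 dummy (the `ols` helper solves the normal equations); the fit recovers a common slope of 2 and intercepts 5 and 9 for the two groups:

```python
def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    # Least-squares estimates via the normal equations (X'X)b = X'y.
    Xd = [[1.0] + list(row) for row in X]
    p = len(Xd[0])
    XtX = [[sum(row[i] * row[j] for row in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(Xd[i][j] * y[i] for i in range(len(y))) for j in range(p)]
    return solve(XtX, Xty)

X = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]  # x2 is the 0/1 dummy
y = [5 + 2 * x1 + 4 * x2 for x1, x2 in X]
b = ols(X, y)              # approximately [5.0, 2.0, 4.0]
intercept_0 = b[0]         # group with x2 = 0: b0
intercept_1 = b[0] + b[2]  # group with x2 = 1: b0 + b2
```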
Dummy Variable Regression for Differences in Slope
To determine whether there are significant differences in slope between two discrete conditions, we expand the regression model to the more complex form

y_i = β0 + (β1 + β3·x_2i)·x_1i + β2·x_2i + ε_i

Now the slope coefficient of x1 contains two components, b1 and b3·x2. When x2 equals 0, the slope estimate is the usual b1; when x2 equals 1, the slope is equal to the algebraic sum b1 + b3. To estimate the model we multiply the variables to create a new transformed variable, so that the model is linear in its variables. The model actually used for the estimation is therefore

y_i = β0 + β1·x_1i + β2·x_2i + β3·(x_1i·x_2i) + ε_i
Dummy Variable Regression for Differences in Slope (continued)
The resulting regression model is linear with three variables. The new variable x1x2 is often called an interaction variable. Note that when the dummy variable x2 = 0 this variable has the value 0, but when x2 = 1 it has the value x1. The coefficient b3 is an estimate of the difference in the coefficient of x1 when x2 = 1 compared with x2 = 0. The t statistic for b3 can therefore be used to test the hypothesis

H0: β3 = 0 versus H1: β3 ≠ 0

If we reject the null hypothesis, we conclude that there is a difference in the slope coefficient for the two subgroups. In many cases we will be interested in both the difference in the constant and the difference in the slope, and will test both of the hypotheses presented in this section.
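A sketch with illustrative data generated from y = 1 + 2x1 + 3x2 + 1.5·x1x2 (the `ols` helper solves the normal equations); the interaction column is simply the product x1·x2, and the slope for the x2 = 1 group is b1 + b3:

```python
def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    # Least-squares estimates via the normal equations (X'X)b = X'y.
    Xd = [[1.0] + list(row) for row in X]
    p = len(Xd[0])
    XtX = [[sum(row[i] * row[j] for row in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(Xd[i][j] * y[i] for i in range(len(y))) for j in range(p)]
    return solve(XtX, Xty)

base = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
X = [(x1, x2, x1 * x2) for x1, x2 in base]  # append the interaction variable
y = [1 + 2 * x1 + 3 * x2 + 1.5 * x1 * x2 for x1, x2 in base]
b = ols(X, y)                      # approximately [1.0, 2.0, 3.0, 1.5]
slope_when_dummy_1 = b[1] + b[3]   # 3.5: slope of x1 when x2 = 1
```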
Key Words
Adjusted Coefficient of Determination; Basis for Inference About the Population Regression Parameters; Coefficient of Multiple Determination; Confidence Intervals for Partial Regression Coefficients; Dummy Variable Regression Analysis; Dummy Variable Regression for Differences in Slope; Estimation of Error Variance; Least Squares Estimation and the Sample Multiple Regression; Prediction from Multiple Regression Models; Quadratic Model Transformations
Key Words (continued)
Regression Objectives; Standard Error of the Estimate; Standard Multiple Regression Assumptions; Sum of Squares Decomposition and the Coefficient of Determination; Test on a Subset of the Regression Parameters; Test on All the Parameters of a Regression Model; Tests of Hypotheses for the Partial Regression Coefficients; The Population Multiple Regression Model