Regression Line: Line of Best Fit
The regression line minimizes the sum of the squared vertical deviations ($e_t$) of each point from the regression line. This is the Ordinary Least Squares (OLS) method.
Regression Analysis
Given the following demand function:
$Y = A + B_1 X + B_2 P + B_3 I + B_4 P_r$
where $X$ = selling expenses (advertising) and $P_r$ = price of substitutes.
What we want are estimates of the values of $A$, $B_1$, $B_2$, $B_3$, and $B_4$.
Regression analysis describes the way in which one variable is related to another. It derives an equation that can be used to estimate the unknown value of one variable on the basis of the known value of the other variable(s).
The simple regression model takes the following form:
$Y_i = A + B X_i + e_i$
Regression analysis assumes that the mean value of Y, given the value of X, is a linear function of X. In other words, the mean value of the dependent variable is assumed to be a linear function of the independent variable.
$Y_i$ is the ith observed value of the dependent variable and $X_i$ is the ith observed value of the independent variable. Essentially, $e_i$ is an error term, that is, a random amount that is added to $A + B X_i$ (or subtracted if $e_i$ is negative).
Because of the presence of the error term, the observed values of $Y_i$ fall around the population regression line ($A + B X_i$), not on it.
Regression analysis assumes that the values of $e_i$ are independent and that their mean value equals zero.
Sample Regression Line (based on a sample)
The sample regression line (estimated regression line) describes the average relationship between the dependent variable and the independent variable. The general expression of the sample regression line is:
$\hat{Y}_i = a + b X_i$
where
$\hat{Y}_i$ = the value of the dependent variable predicted by the regression line,
a and b = estimators of A and B,
a = the intercept of the regression line,
b = the slope of the line, which measures the change in the predicted value of Y associated with a one-unit increase in X.
Method of Least Squares
The method of least squares is used to determine the values of a and b. Since the deviation of the ith observed value of Y from the regression line equals $Y_i - \hat{Y}_i = Y_i - a - b X_i$, the sum of the squared deviations equals:
$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - b X_i)^2$
where n is the sample size. We can find the values of a and b that minimize this expression by differentiating it with respect to a and b and setting the partial derivatives equal to zero.
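As a rough illustration of what "minimizing the sum of squared deviations" means, the sketch below evaluates that sum for a few candidate lines. The data set and the candidate values of a and b are hypothetical and are not taken from the slides.

```python
def sum_squared_deviations(a, b, xs, ys):
    """Sum over i of (Y_i - a - b*X_i)^2 for a candidate line Y = a + b*X."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample (illustration only, not the company data in the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Hypothetical candidate (a, b) pairs; least squares picks the pair with
# the smallest sum of squared deviations among all possible lines.
candidates = [(2.0, 0.5), (2.2, 0.6), (2.5, 0.7)]
for a, b in candidates:
    print(a, b, round(sum_squared_deviations(a, b, xs, ys), 4))
```

Among these three candidates, the pair (2.2, 0.6) gives the smallest sum for this sample.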
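$\frac{\partial}{\partial a} \sum_{i=1}^{n} (Y_i - a - b X_i)^2 = -2 \sum_{i=1}^{n} (Y_i - a - b X_i) = 0$   (1)
$\frac{\partial}{\partial b} \sum_{i=1}^{n} (Y_i - a - b X_i)^2 = -2 \sum_{i=1}^{n} X_i (Y_i - a - b X_i) = 0$   (2)
Solving equations (1) and (2) simultaneously, and letting $\bar{X}$ equal the mean value of X in the sample and $\bar{Y}$ equal the mean value of Y, we find that
$b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
and
$a = \bar{Y} - b \bar{X}$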
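The solution of the two first-order conditions can be sketched in a few lines of Python. This is a minimal illustration using the standard closed-form least squares expressions; the function name `ols_fit` and the sample data are assumptions made for the example, not part of the slides.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least squares line Y_hat = a + b*X.

    b = sum((X - X_bar)*(Y - Y_bar)) / sum((X - X_bar)^2)
    a = Y_bar - b*X_bar
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical sample (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = ols_fit(xs, ys)
print(round(a, 4), round(b, 4))
```

At the fitted values the residuals sum to zero, which is exactly what condition (1) requires.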
Example: given the following data for company X and the regression line estimated from the table:
$\hat{Y} = a + b X$
where Y = the observed value of sales and $\hat{Y}$ = the computed (estimated) value of sales based on the regression line.
From the table, Y = 4 when X = 1. But using the regression line, $\hat{Y} = a + b(1) = 4.037$.
Note that there is a difference between the observed sales (4) and the estimated sales (4.037).
If X = 0, then $\hat{Y} = a + b(0) = a$ (the intercept: the value of Y at which the regression line intersects the vertical axis).
Interpretation: if the firm's selling expenses are zero, estimated sales would be $a$ million units, and estimated sales go up by $b$ million units when selling expenses increase by 1 million.
Ordinary Least Squares Estimation Using Excel
The model: $\hat{Y}_i = a + b X_i$
Objective: determine the slope and intercept that minimize the sum of the squared errors.
Tests of Significance: Standard Error of the Slope Estimate
The standard error of estimate ($s_e$) is a measure of the amount of scatter of individual observations about the regression line. It is useful in constructing prediction intervals, that is, intervals within which there is a specified probability that the dependent variable will lie.
If the probability is set at 0.95, a very approximate prediction interval is:
$\hat{Y} \pm 2 s_e$
Since $s_e = 0.37$, if the predicted value of Y is 11, there is a 0.95 probability that the firm's sales will be between 10.26 (11 - (2 × 0.37)) and 11.74 (11 + (2 × 0.37)).
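The interval arithmetic above can be reproduced with a short sketch. The helper names are hypothetical, and the formula used for the standard error of estimate, the square root of the sum of squared residuals divided by n - 2, is the usual textbook form for simple regression, assumed here because the slide's own formula did not survive.

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """s_e = sqrt(sum((Y_i - a - b*X_i)^2) / (n - 2)) for a simple regression."""
    n = len(xs)
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


def approx_prediction_interval(y_hat, s_e, multiplier=2.0):
    """Very approximate interval: Y_hat plus or minus (multiplier * s_e)."""
    half_width = multiplier * s_e
    return y_hat - half_width, y_hat + half_width


# Values quoted in the text: predicted Y = 11 and s_e = 0.37.
low, high = approx_prediction_interval(11, 0.37)
print(round(low, 2), round(high, 2))  # 10.26 11.74
```

The multiplier of 2 is the rough rule used in the text for a 0.95 probability; a more careful interval would use the t-distribution and widen as X moves away from its sample mean.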
The t-statistic (significance of individual variables)
Managers need to know whether a particular independent variable influences the dependent variable. The least squares estimates of the B's may be positive by chance even if their true values are zero. For example, the estimate of $B_1$ is 1.76, which suggests that selling expenses have an effect on sales (probability value of the t-statistic = 0.0001).
To test whether the true value of $B_1$ is zero, we must look at the t-statistic of $B_1$. The t-statistic has a distribution called the t-distribution.
All else equal, the bigger the value of the t-statistic (in absolute terms), the smaller the probability that the true value of the regression coefficient in question is zero.
In our case, there is only a 1 in 10,000 probability that chance alone would have resulted in so large a t-statistic.
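Calculation of the t Statistic
$t = \frac{b}{s_b}$, where $s_b$ is the standard error of the slope estimate
Degrees of freedom = (n - k) = (10 - 2) = 8
Critical value at the 5% level = 2.306 ($b$ is significant, since its t statistic exceeds the critical value)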
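A minimal sketch of the t statistic calculation for the slope of a simple regression follows. It assumes the standard formulas (slope standard error equal to the standard error of estimate divided by the square root of the sum of squared deviations of X, and t equal to the slope divided by its standard error). The function names and data are hypothetical, and the critical value must be looked up in a t table for the relevant degrees of freedom.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for the slope of Y_hat = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    df = n - 2  # n - k with k = 2 estimated coefficients (a and b)
    s_b = math.sqrt(sse / df / s_xx)  # standard error of the slope estimate
    return b / s_b, df


def is_significant(t, critical_value):
    """Two-tailed check: significant if |t| exceeds the table critical value."""
    return abs(t) > critical_value


# Hypothetical sample with n = 5, so df = 3 (not the n = 10 case in the text).
t, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
# 3.182 is the two-tailed 5% critical value for 3 degrees of freedom.
print(round(t, 4), df, is_significant(t, 3.182))
```

For the case in the text (n = 10, k = 2), the same check would be made against the critical value 2.306 for 8 degrees of freedom.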
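Decomposition of Sum of Squares
Total Variation = Explained Variation + Unexplained Variation
$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$
Coefficient of Determination
$R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$
Coefficient of Correlation
$r = \pm\sqrt{R^2}$, with the sign of $b$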
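The decomposition can be checked numerically. The sketch below fits a simple regression to a hypothetical sample and computes the three sums of squares, the coefficient of determination (explained variation divided by total variation), and the coefficient of correlation (the square root of the coefficient of determination, carrying the sign of the slope). The function name and data are assumptions made for illustration.

```python
import math


def variation_decomposition(xs, ys):
    """Fit Y_hat = a + b*X and split total variation into explained + unexplained."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((f - y_bar) ** 2 for f in fitted)
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    r_squared = explained / total
    r = math.copysign(math.sqrt(r_squared), b)  # r takes the sign of the slope
    return {
        "total": total,
        "explained": explained,
        "unexplained": unexplained,
        "r_squared": r_squared,
        "r": r,
    }


# Hypothetical sample (illustration only).
result = variation_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({key: round(value, 4) for key, value in result.items()})
```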
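Multiple Regression Analysis
Model: $Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_{k-1} X_{k-1}$
Adjusted Coefficient of Determination
$\bar{R}^2 = 1 - (1 - R^2) \frac{n - 1}{n - k}$
Analysis of Variance and F Statistic
$F = \frac{R^2 / (k - 1)}{(1 - R^2) / (n - k)}$
where n is the number of observations and k is the number of estimated coefficients, including the intercept.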
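The two summary statistics named on this slide can be computed directly from the coefficient of determination. The sketch below uses the common textbook forms and assumes that k counts all estimated coefficients including the intercept, consistent with the degrees-of-freedom calculation n - k used for the t statistic. The function names and input values are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k)


def f_statistic(r_squared, n, k):
    """F = [R^2 / (k - 1)] / [(1 - R^2) / (n - k)], with (k - 1, n - k) df."""
    return (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))


# Hypothetical values: R^2 = 0.6 from a regression with n = 5 and k = 2.
print(round(adjusted_r_squared(0.6, 5, 2), 4))
print(round(f_statistic(0.6, 5, 2), 4))
```

With a single independent variable (k = 2), the F statistic equals the square of the slope's t statistic, which gives a quick consistency check between the two tests.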
Problems in Regression Analysis
Multicollinearity
A situation in which two or more independent variables are very highly correlated. Under perfect linear correlation it is impossible to estimate the regression coefficients.
Perfect linear correlation, e.g.:
$Y = A + B_1 X_{1i} + B_2 X_{2i}$, where $X_{1i} = 3X_{2i} - 1$, or $X_{1i} = 6 + X_{2i}$, or $X_{1i} = 2 + 4X_{2i}$
Imperfect linear correlation, e.g.:
$Y = A + B_1 X_{1i} + B_2 X_{2i}$, where $X_1$ = price and $X_2$ = nominal income ($P \cdot Q$)
If two independent variables move together in a rigid fashion, there is no way to tell how much effect each has separately; all we can observe is the effect of both combined.
Consequences of multicollinearity
- High $R^2$ with no significant t-scores
- High simple correlation coefficients (cross-correlation matrix)
How to deal with multicollinearity
- Drop one or more of the multicollinear variables
- Transform the multicollinear variables (e.g., first differences)
- Increase the sample size
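One simple screen for multicollinearity is the simple correlation coefficient between pairs of independent variables. The sketch below computes it for a perfectly collinear pair built from the relation X1 = 2 + 4 X2 and for a slightly perturbed pair. The function name and data are hypothetical, and a pairwise check is only a partial diagnostic, since it cannot detect collinearity that involves three or more variables jointly.

```python
import math


def simple_correlation(us, vs):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    s_vv = sum((v - v_bar) ** 2 for v in vs)
    return s_uv / math.sqrt(s_uu * s_vv)


# Hypothetical data. x1_perfect follows X1 = 2 + 4*X2 exactly.
x2 = [1, 2, 3, 4, 5]
x1_perfect = [2 + 4 * v for v in x2]
x1_imperfect = [6, 10, 15, 18, 22]  # one observation nudged off the line

print(round(simple_correlation(x1_perfect, x2), 4))
print(round(simple_correlation(x1_imperfect, x2), 4))
```

A correlation of exactly 1 (or -1) between two regressors means their separate coefficients cannot be estimated; values close to 1 signal that the estimates are likely to be imprecise.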
Serial Correlation (or Autocorrelation)
The error terms are not independent: if this year's error term is positive, next year's tends to be positive as well, and if this year's error term is negative, next year's tends to be negative (positive serial correlation).
This is a violation of the assumptions underlying regression analysis, which require $E(e_i e_j) = 0$ for $i \neq j$. If this condition does not hold, the simple correlation between two observations of the error term is not equal to zero.
Consequences of serial correlation
- Increases the variances of the sampling distributions of the estimated coefficients
- Leads to underestimation of the standard errors of the coefficients
Detecting Serial Correlation: Durbin-Watson Test
Compare the computed Durbin-Watson statistic d with the values $d_L$ and $d_U$ from the DW tables to see whether d is so high or so low that the hypothesis that there is no serial correlation should be rejected.
If the alternative hypothesis is that there is positive serial correlation:
- if $d < d_L$, reject the hypothesis of no serial correlation
- if $d > d_U$, accept the hypothesis of no serial correlation
- if $d_L \le d \le d_U$, the test is inconclusive
If the alternative hypothesis is that there is negative serial correlation:
- if $d > 4 - d_L$, reject the hypothesis of no serial correlation
- if $d < 4 - d_U$, accept the hypothesis of no serial correlation
- if $4 - d_U \le d \le 4 - d_L$, the test is inconclusive
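The test can be sketched as follows. The statistic uses the standard Durbin-Watson definition, the sum of squared changes in successive residuals divided by the sum of squared residuals. For the negative case the sketch rejects the hypothesis of no serial correlation when d is greater than 4 - dL, the mirror image of the rule for the positive case. The function names, residuals, and the bounds 1.08 and 1.36 are hypothetical placeholders; real bounds come from the DW tables for the sample size and number of regressors at hand.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2 for t = 2..n) / sum(e_t^2 for t = 1..n)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_decision(d, d_l, d_u, alternative="positive"):
    """Decision about the hypothesis of no serial correlation."""
    if alternative == "positive":
        if d < d_l:
            return "reject"
        if d > d_u:
            return "accept"
        return "inconclusive"
    if alternative == "negative":
        if d > 4 - d_l:
            return "reject"
        if d < 4 - d_u:
            return "accept"
        return "inconclusive"
    raise ValueError("alternative must be 'positive' or 'negative'")


# Hypothetical residuals and placeholder bounds (look up real ones in DW tables).
d = durbin_watson([-0.8, 0.6, 1.0, -0.6, -0.2])
print(round(d, 4), dw_decision(d, d_l=1.08, d_u=1.36))
```

Values of d near 2 are consistent with no serial correlation, values near 0 point to positive serial correlation, and values near 4 point to negative serial correlation.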
How to deal with serial correlation
- Take first differences of the variables
- Use generalized least squares
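Taking differences, the first remedy listed, can be sketched as below: each observation is replaced by its change from the previous period, and the regression is then re-estimated on the differenced series. The function name and the series are hypothetical, and whether differencing is appropriate depends on the form of the serial correlation.

```python
def first_differences(series):
    """Return [x_2 - x_1, x_3 - x_2, ...]; the result has one fewer observation."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Hypothetical time series for the dependent and independent variables.
sales = [10, 12, 15, 19]
selling_expenses = [1, 2, 4, 7]

d_sales = first_differences(sales)
d_expenses = first_differences(selling_expenses)
print(d_sales, d_expenses)  # [2, 3, 4] [1, 2, 3]
```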
Steps in Demand Estimation
1. Model Specification: Identify Variables
- Identify the independent variables (in reality an empirical issue)
2. Specify Functional Form
- Specify the mathematical form of the equation relating the mean value of the dependent variable to those of the independent variables, e.g., $Y = f(X, P)$. This can take the following forms:
$Y_i = A + B_1 X_i + B_2 P_i + e_i$, with $B_1 > 0$ and $B_2 < 0$
or:
$\log Y_i = \log A + B_1 \log X_i + B_2 \log P_i + \log e_i$
3. Collect Your Data
Data can be:
- time series
- cross section
- cross section/time series (panel)
4. Estimate the Function
5. Test the Results
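The log-linear form in step 2 can be estimated by ordinary least squares after taking logarithms, and the slope coefficients are then elasticities. The sketch below is a simplified illustration with a single independent variable (an assumption; the specification above has two) and hypothetical data generated without an error term from Y = 3 X^0.5, so that the fit recovers the parameters used to generate it.

```python
import math


def fit_log_linear(xs, ys):
    """Fit log Y = log A + B*log X by least squares; return (A, B)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    lx_bar = sum(log_x) / n
    ly_bar = sum(log_y) / n
    slope = (
        sum((lx - lx_bar) * (ly - ly_bar) for lx, ly in zip(log_x, log_y))
        / sum((lx - lx_bar) ** 2 for lx in log_x)
    )
    intercept = ly_bar - slope * lx_bar
    return math.exp(intercept), slope  # A = exp(log A); B is the elasticity


# Hypothetical data generated from Y = 3 * X^0.5 (no error term).
xs = [1, 2, 4, 8]
ys = [3 * x ** 0.5 for x in xs]
A, B = fit_log_linear(xs, ys)
print(round(A, 4), round(B, 4))
```

All values of X and Y must be strictly positive for the logarithms to exist, which is one practical constraint of this functional form.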