
1 Lecture 3. Today: Statistical Review cont'd: unbiasedness and efficiency; sample equivalents of variance, covariance and correlation; probability limits and consistency (quick); the Simple Regression Model.

SAMPLING AND ESTIMATORS (slides © Christopher Dougherty 1999–2006)

2 We will next demonstrate that the variance of the distribution of the sample mean X̄ is smaller than the variance of X, as depicted in the diagram. [Figure: probability density functions of X and of X̄, both centred on μ_X; the distribution of X̄ is the narrower one.]

3 We start by replacing X̄ by its definition and then using variance rule 2 to take 1/n out of the expression as a common factor.

4 Next we use variance rule 1 to replace the variance of a sum with a sum of variances. In principle there are many covariance terms as well, but they are all zero if we assume that the sample values are generated independently.

5 Now we come to the bit that requires thought. Start with X₁. While we are still at the planning stage, we do not know what the value of X₁ will be.

6 All we know is that it will be generated randomly from the distribution of X. The variance of X₁, as a beforehand concept, will therefore be σ_X². The same is true for all the other sample components, thinking about them beforehand. Hence we write this line.

7 Thus we have demonstrated that the variance of the sample mean is equal to the variance of X divided by n, a result with which you will be familiar from your statistics course.
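
The equations these steps refer to are not reproduced in the transcript; the argument presumably runs along these lines:

```latex
\begin{aligned}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\Bigl(\tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_i\Bigr)
   = \frac{1}{n^{2}}\,\operatorname{Var}\!\Bigl(\textstyle\sum_{i=1}^{n} X_i\Bigr)
   && \text{(variance rule 2: the constant } 1/n \text{ comes out squared)} \\
  &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   && \text{(variance rule 1; covariances vanish by independence)} \\
  &= \frac{1}{n^{2}}\, n\,\sigma_X^{2} = \frac{\sigma_X^{2}}{n}.
   && \text{(each } \operatorname{Var}(X_i) = \sigma_X^{2}\text{)}
\end{aligned}
```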

UNBIASEDNESS AND EFFICIENCY

8 However, the sample mean is not the only unbiased estimator of the population mean. We will demonstrate this supposing that we have a sample of two observations (to keep it simple). Consider the generalized estimator Z = λ₁X₁ + λ₂X₂. Z is an unbiased estimator of μ_X if the sum of the weights is equal to one. An infinite number of combinations of λ₁ and λ₂ satisfy this condition, not just the sample mean (for which λᵢ = 1/n).
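
The unbiasedness condition presumably follows from the expected-value rules like this:

```latex
E(Z) = E(\lambda_1 X_1 + \lambda_2 X_2)
     = \lambda_1 E(X_1) + \lambda_2 E(X_2)
     = (\lambda_1 + \lambda_2)\,\mu_X
     = \mu_X
\qquad\text{provided } \lambda_1 + \lambda_2 = 1 .
```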

9 [Figure: probability density functions of two unbiased estimators, A and B, both centred on μ_X.] The generalized estimator Z = λ₁X₁ + λ₂X₂ is an unbiased estimator of μ_X if the sum of the weights is equal to one. An infinite number of combinations of the λᵢ satisfy this condition, not just the sample mean. How do we choose among them? The answer is to use the most efficient estimator, the one with the smallest population variance, because it will tend to be the most accurate.

10 In the diagram, A and B are both unbiased estimators, but B is superior because it is more efficient. [Figure: estimator B has a more concentrated probability density function than estimator A.]

11 We will analyze the variance of the generalized estimator Z = λ₁X₁ + λ₂X₂ and find out what condition the weights must satisfy in order to minimize it.

12 The first variance rule is used to decompose the variance.

13 Note that we are assuming that X₁ and X₂ are independent observations, so their covariance is zero. The second variance rule is used to bring λ₁ and λ₂ out of the variance expressions.

14 The variance of X₁, at the planning stage, is σ_X². The same goes for the variance of X₂. At this step you can use the following result: if λ₁ + λ₂ = 1, then λ₁² + λ₂² ≥ ½. This shows that the sample mean is the most efficient choice, because its variance, ½σ_X², is the lowest attainable.

15 Alternatively, you can use calculus, as follows. We take account of the condition for unbiasedness and rewrite the variance of Z, substituting 1 − λ₁ for λ₂.

16 The quadratic is expanded. To minimize the variance of Z, we must choose λ₁ so as to minimize the final expression.

17 We differentiate with respect to λ₁ to obtain the first-order condition.

18 The expression is minimized for λ₁ = 0.5. It follows that λ₂ = 0.5 as well. So we have demonstrated that the sample mean is the most efficient unbiased estimator, at least in this example. (Note that the second derivative is positive, confirming that we have a minimum.)
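
Putting slides 11–18 together, the calculus argument presumably runs as follows (with λ₂ = 1 − λ₁ imposed by unbiasedness):

```latex
\begin{aligned}
\operatorname{Var}(Z)
  &= \lambda_1^{2}\operatorname{Var}(X_1) + \lambda_2^{2}\operatorname{Var}(X_2)
   = (\lambda_1^{2} + \lambda_2^{2})\,\sigma_X^{2}
   && \text{(independence, then variance rule 2)} \\
  &= \bigl(\lambda_1^{2} + (1-\lambda_1)^{2}\bigr)\sigma_X^{2}
   = \bigl(2\lambda_1^{2} - 2\lambda_1 + 1\bigr)\sigma_X^{2}
   && (\lambda_2 = 1 - \lambda_1 \text{ by unbiasedness}) \\
\frac{d\operatorname{Var}(Z)}{d\lambda_1}
  &= (4\lambda_1 - 2)\,\sigma_X^{2} = 0
   \;\Longrightarrow\; \lambda_1 = \lambda_2 = \tfrac{1}{2},
   \qquad \operatorname{Var}(Z)_{\min} = \tfrac{1}{2}\sigma_X^{2} = \operatorname{Var}(\bar{X}) .
\end{aligned}
```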

CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

19 Suppose that you have alternative estimators of a population characteristic θ, one unbiased, the other biased but with a smaller variance. How do you choose between them? [Figure: estimator A is centred on θ but dispersed; estimator B is more concentrated but its centre lies away from θ.]

20 A widely used loss function is the mean square error of the estimator, defined as the expected value of the square of the deviation of the estimator about the true value of the population characteristic.

21 The mean square error involves a trade-off between the variance of the estimator and its bias. Suppose you have a biased estimator like estimator B above, with expected value μ_Z. [Figure: the bias is the distance between μ_Z and θ.]

22 The mean square error can be shown to be equal to the sum of the variance of the estimator and the square of the bias.

23 To demonstrate this, we start by subtracting and adding μ_Z.

24 We expand the quadratic using the rule (a + b)² = a² + b² + 2ab, where a = Z − μ_Z and b = μ_Z − θ.

25 We use the first expected value rule to break up the expectation into its three components.

26 The first term in the expression is by definition the variance of Z.

27 (μ_Z − θ) is a constant, so the second term is a constant.

28 In the third term, (μ_Z − θ) may be brought out of the expectation, again because it is a constant, using the second expected value rule.

29 Now E(Z) is μ_Z, and E(−μ_Z) is −μ_Z.

30 Hence the third term is zero, and the mean square error of Z is shown to be the sum of the variance of Z and the bias squared.
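
Written out, the decomposition described on slides 23–30 is presumably:

```latex
\begin{aligned}
\operatorname{MSE}(Z) = E\bigl[(Z-\theta)^{2}\bigr]
  &= E\Bigl[\bigl((Z-\mu_Z) + (\mu_Z-\theta)\bigr)^{2}\Bigr] \\
  &= E\bigl[(Z-\mu_Z)^{2}\bigr] + (\mu_Z-\theta)^{2} + 2(\mu_Z-\theta)\,E\bigl[Z-\mu_Z\bigr] \\
  &= \operatorname{Var}(Z) + (\text{bias})^{2} + 0 ,
\end{aligned}
```

since E[Z − μ_Z] = μ_Z − μ_Z = 0.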

31 In the case of the estimators shown, estimator B is probably a little better than estimator A according to the MSE criterion. [Figure: probability density functions of estimators A and B around θ.]
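
As an illustration of this trade-off (not part of the original slides), the following Python sketch compares two estimators of a population variance by MSE: the unbiased n − 1 estimator and the biased n estimator, which has a smaller variance. The sample size, population variance, and number of replications are arbitrary choices.

```python
import numpy as np

# Illustrative simulation: compare two variance estimators by mean square error.
# Dividing by n-1 gives an unbiased estimator; dividing by n gives a biased one
# with a smaller variance, and (for this setup) a smaller MSE.
rng = np.random.default_rng(0)
n, true_var, reps = 10, 4.0, 100_000

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n-1
s2_biased = samples.var(axis=1, ddof=0)     # divide by n

for name, est in [("divide by n-1", s2_unbiased), ("divide by n", s2_biased)]:
    bias = est.mean() - true_var
    mse = ((est - true_var) ** 2).mean()
    print(f"{name:>14}: bias = {bias:+.3f}, variance = {est.var():.3f}, MSE = {mse:.3f}")
```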

ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

32 Variance estimator. Given a sample of n observations, the usual estimator of the variance is the sum of the squared deviations around the sample mean divided by n − 1, typically denoted s_X². Since the variance is the expected value of the squared deviation of X about its mean, it makes intuitive sense to use the average of the sample squared deviations as an estimator. But why divide by n − 1 rather than by n? The reason is that the sample mean is by definition in the middle of the sample, while the unknown population mean is not, except by coincidence. As a consequence, the sum of the squared deviations from the sample mean tends to be slightly smaller than the sum of the squared deviations from the population mean. Hence a simple average of the squared sample deviations is a downwards-biased estimator of the variance. However, the bias can be shown to be a factor of (n − 1)/n, so one can allow for it by dividing the sum of the squared deviations by n − 1 instead of n. The proof is in the appendix of the review chapter.

33 Covariance estimator. A similar adjustment has to be made when estimating a covariance. For two random variables X and Y, an unbiased estimator of the covariance σ_XY is given by the sum of the products of the deviations around the sample means divided by n − 1.

34 Correlation estimator. The population correlation coefficient ρ_XY for two variables X and Y is defined to be their covariance divided by the square root of the product of their variances. The sample correlation coefficient, r_XY, is obtained from this by replacing the covariance and variances by their estimators.

35 The 1/(n − 1) terms in the numerator and the denominator cancel, and one is left with a straightforward expression.
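
A short Python sketch of these estimators, using made-up data, with a check against the NumPy equivalents:

```python
import numpy as np

# Sketch (assumed example data): n-1 estimators of variance and covariance,
# and the sample correlation coefficient, in which the 1/(n-1) factors cancel.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.0, 8.0, 9.0])
n = len(x)

s2_x = ((x - x.mean()) ** 2).sum() / (n - 1)               # unbiased variance estimator
s2_y = ((y - y.mean()) ** 2).sum() / (n - 1)
s_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)   # unbiased covariance estimator
r_xy = s_xy / np.sqrt(s2_x * s2_y)                         # the n-1 factors cancel here

print(s2_x, s_xy, r_xy)
# Library check: same numbers from NumPy's built-in routines.
print(np.var(x, ddof=1), np.cov(x, y, ddof=1)[0, 1], np.corrcoef(x, y)[0, 1])
```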

36 Probability Limits and Consistency

ASYMPTOTIC PROPERTIES OF ESTIMATORS: PLIMS AND CONSISTENCY

37 If n is equal to 1, the sample consists of a single observation. X̄ is the same as X, and its standard deviation is σ_X. [Figure: probability density function of X̄ for n = 1.]

38 We will see how the shape of the distribution changes as the sample size is increased. [Figure: probability density function of X̄ for a larger n.]

39 The distribution becomes more concentrated about the population mean.

40 To see what happens for n greater than 100, we will have to change the vertical scale.

41 We have increased the vertical scale. [Figure: probability density function of X̄ for a still larger n, on the new scale.]

42 The distribution continues to contract about the population mean.

43 In the limit, the variance of the distribution tends to zero. The distribution collapses to a spike at the true value. The plim of the sample mean is therefore the population mean.

44 Consistency. An estimator of a population characteristic is said to be consistent if it satisfies two conditions: (1) it possesses a probability limit, and so its distribution collapses to a spike as the sample size becomes large, and (2) the spike is located at the true value of the population characteristic. Hence we can say plim X̄ = μ_X.

45 The sample mean in our example satisfies both conditions, and so it is a consistent estimator of μ_X. Most standard estimators in simple applications satisfy the first condition, because their variances tend to zero as the sample size becomes large.
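
A Python sketch of this behaviour, with an assumed population mean and standard deviation (not taken from the slides): the standard deviation of X̄ falls roughly like σ_X/√n, so its distribution collapses towards μ_X as n grows.

```python
import numpy as np

# Sketch (assumed population parameters): simulate the distribution of the
# sample mean for increasing n and watch its standard deviation shrink.
rng = np.random.default_rng(1)
mu_x, sigma_x = 100.0, 50.0

for n in (1, 4, 25, 100, 1000):
    means = rng.normal(mu_x, sigma_x, size=(5_000, n)).mean(axis=1)
    print(f"n = {n:>5}: mean of X-bar = {means.mean():7.2f}, s.d. of X-bar = {means.std():6.2f}")
```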

46 The only issue then is whether the distribution collapses to a spike at the true value of the population characteristic. A sufficient condition for consistency is that the estimator should be unbiased and that its variance should tend to zero as n becomes large. It is easy to see why this is a sufficient condition. If the estimator is unbiased for a finite sample, it must stay unbiased as the sample size becomes large. Meanwhile, if the variance of its distribution is decreasing, its distribution must collapse to a spike. Since the estimator remains unbiased, this spike must be located at the true value. The sample mean is an example of an estimator that satisfies this sufficient condition.

47 Consistency. Why are we interested in consistency, when in practice we have finite samples? As a first approximation, the answer is that if we can show that an estimator is consistent, then we may be optimistic about its finite-sample properties, whereas if the estimator is inconsistent, we know that for finite samples it will definitely be biased.

49 However, there are reasons for being cautious about preferring consistent estimators to inconsistent ones. First, a consistent estimator may be biased for finite samples. Second, we are usually also interested in variances. If a consistent estimator has a larger variance than an inconsistent one, the latter might be preferable if judged by the mean square error or a similar criterion that allows a trade-off between bias and variance. How can you resolve these issues? Mathematically they are intractable; otherwise we would not have resorted to large-sample analysis in the first place.

51 The Simple Regression Model

SIMPLE REGRESSION MODEL

52 Suppose that a variable Y is a linear function of another variable X, with unknown parameters β₁ and β₂ that we wish to estimate: Y = β₁ + β₂X. Suppose that we have a sample of four observations with X values X₁, X₂, X₃, X₄ as shown. [Figure: the true line, with intercept β₁, and the four X values marked on the horizontal axis.]

53 If the relationship were an exact one, the observations would lie on a straight line and we would have no trouble obtaining accurate estimates of β₁ and β₂. [Figure: points Q₁–Q₄ lying exactly on the true line.]

54 In practice, most economic relationships are not exact, and the actual values of Y are different from those corresponding to the straight line. [Figure: the observed points P₁–P₄ lie off the line through Q₁–Q₄.]

55 To allow for such divergences, we will write the model as Y = β₁ + β₂X + u, where u is a disturbance term.

56 Each value of Y thus has a non-random component, β₁ + β₂X, and a random component, u. The first observation has been decomposed into these two components. [Figure: the vertical distance between P₁ and Q₁ is the disturbance u₁.]

57 In practice we can see only the P points.

58 Obviously, we can use the P points to draw a line which is an approximation to the line Y = β₁ + β₂X. If we write this line Ŷ = b₁ + b₂X, b₁ is an estimate of β₁ and b₂ is an estimate of β₂.

59 The line is called the fitted model, and the values of Y predicted by it are called the fitted values of Y. They are given by the heights of the R points. [Figure: fitted line with intercept b₁; R₁–R₄ mark the fitted values, P₁–P₄ the actual values.]

60 The discrepancies between the actual and fitted values of Y are known as the residuals, e₁–e₄. [Figure: the residuals are the vertical distances between the P points and the R points.]

61 Note that the values of the residuals are not the same as the values of the disturbance term. The diagram now shows the true, unknown relationship as well as the fitted line.

62 The disturbance term in each observation is responsible for the divergence between the non-random component of the true relationship and the actual observation. [Figure: the disturbances are the vertical distances between the P points and the Q points on the true line.]

63 The residuals are the discrepancies between the actual and the fitted values. If the fit is a good one, the residuals and the values of the disturbance term will be similar, but they must be kept apart conceptually.

64 Both of these lines will be used in our analysis. Each permits a decomposition of the value of Y. The decompositions will be illustrated with the fourth observation. [Figure: for the fourth observation, the disturbance u₄ is measured from the true line and the residual e₄ from the fitted line.]

65 Using the theoretical relationship, Y can be decomposed into its non-stochastic component β₁ + β₂X and its random component u: Y = β₁ + β₂X + u. This is a theoretical decomposition because we do not know the values of β₁ or β₂, or the values of the disturbance term. We shall use it in our analysis of the properties of the regression coefficients. The other decomposition is with reference to the fitted line: in each observation, the actual value of Y is equal to the fitted value plus the residual, Y = b₁ + b₂X + e = Ŷ + e. This is an operational decomposition which we will use for practical purposes.
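
To make the distinction concrete, here is a Python sketch (with assumed parameter values, not taken from the slides) that generates data from a known true line, fits a least-squares line, and compares the unobservable disturbances with the observable residuals:

```python
import numpy as np

# Sketch with assumed true parameters: disturbances u belong to the true
# relationship, residuals e belong to the fitted line; similar but not identical.
rng = np.random.default_rng(2)
beta1, beta2, n = 2.0, 0.5, 20          # assumed true parameters
x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)            # disturbance term (unobservable in practice)
y = beta1 + beta2 * x + u

b2 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b1 = y.mean() - b2 * x.mean()
e = y - (b1 + b2 * x)                   # residuals from the fitted line

print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
print("first disturbances:", np.round(u[:3], 3))
print("first residuals:   ", np.round(e[:3], 3))
```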

66 Least squares criterion: minimize RSS, the residual sum of squares, where RSS = e₁² + e₂² + … + eₙ². To begin with, we will draw the fitted line so as to minimize the sum of the squares of the residuals. This is described as the least squares criterion.

67 Why the squares of the residuals? Why not just minimize the sum of the residuals, e₁ + e₂ + … + eₙ?

68 The answer is that you would get an apparently perfect fit by drawing a horizontal line through the mean value of Y: the sum of the residuals would then be zero. You must prevent negative residuals from cancelling positive ones, and one way to do this is to use the squares of the residuals. Of course there are other ways of dealing with the problem. The least squares criterion has the attraction that the estimators derived with it have desirable properties, provided that certain conditions are satisfied. [Figure: a horizontal line through Ȳ makes the residuals sum to zero even though the fit is poor.]

DERIVING LINEAR REGRESSION COEFFICIENTS

69 Next, we'll see how the regression coefficients for a simple regression model are derived, using the least squares criterion (OLS, for ordinary least squares). We will start with a numerical example with just three observations: (1, 3), (2, 5), and (3, 6).

70 Writing the fitted regression as Ŷ = b₁ + b₂X, we will determine the values of b₁ and b₂ that minimize RSS, the sum of the squares of the residuals.

71 Given our choice of b₁ and b₂, the residuals are as shown. [Figure: the three observations, a candidate fitted line with intercept b₁ and slope b₂, and the residuals.]

72 The sum of the squares of the residuals is thus as shown.

73 The quadratics have been expanded.

74 Like terms have been added together.

75 For a minimum, the partial derivatives of RSS with respect to b₁ and b₂ should be zero. (We should also check a second-order condition.)

76 The first-order conditions give us two equations in two unknowns.

77 Solving them, we find that RSS is minimized when b₁ and b₂ are equal to 1.67 and 1.50, respectively.
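
The equations for this example are not reproduced in the transcript; reconstructing them from the three observations (1, 3), (2, 5) and (3, 6):

```latex
\begin{aligned}
RSS &= (3 - b_1 - b_2)^{2} + (5 - b_1 - 2b_2)^{2} + (6 - b_1 - 3b_2)^{2} \\
\frac{\partial RSS}{\partial b_1}
    &= -2\bigl[(3 - b_1 - b_2) + (5 - b_1 - 2b_2) + (6 - b_1 - 3b_2)\bigr] = 0
       \;\Longrightarrow\; 3b_1 + 6b_2 = 14 \\
\frac{\partial RSS}{\partial b_2}
    &= -2\bigl[(3 - b_1 - b_2) + 2(5 - b_1 - 2b_2) + 3(6 - b_1 - 3b_2)\bigr] = 0
       \;\Longrightarrow\; 6b_1 + 14b_2 = 31
\end{aligned}
```

Solving the pair gives b₂ = 1.50 and b₁ = 5/3 ≈ 1.67, the values quoted above.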

78 Here is the scatter diagram again. [Figure: the three observations and the candidate line with intercept b₁ and slope b₂.]

79 The fitted line and the fitted values of Y are as shown. [Figure: the fitted line Ŷ = 1.67 + 1.50X, with the three fitted values marked.]

80 Now we will do the same thing for the general case with n observations. [Figure: a scatter of n observations with X values X₁, …, Xₙ.]

81 Given our choice of b₁ and b₂, we will obtain a fitted line as shown.

82 The residual for the first observation is defined: e₁ = Y₁ − b₁ − b₂X₁.

83 Similarly we define the residuals for the remaining observations. That for the last one, eₙ = Yₙ − b₁ − b₂Xₙ, is marked in the figure.

84 RSS, the sum of the squares of the residuals, is defined for the general case: RSS = Σᵢ (Yᵢ − b₁ − b₂Xᵢ)². The data for the numerical example are shown for comparison.

85 The quadratics are expanded.

86 Like terms are added together.

87 Note that in this equation the observations on X and Y are just data that determine the coefficients in the expression for RSS. The choice variables in the expression are b₁ and b₂. This may seem a bit strange because in elementary calculus courses b₁ and b₂ are usually constants and X and Y are variables. However, if you have any doubts, compare what we are doing in the general case with what we did in the numerical example.

88 The first derivative with respect to b₁ is taken and set equal to zero.

89 With some simple manipulation we obtain a tidy expression for b₁.

90 The first derivative with respect to b₂ is taken and set equal to zero.

91 Divide through by 2.

92 We now substitute for b₁ using the expression obtained for it, and we thus obtain an equation that contains b₂ only.

93 The definition of the sample mean has been used.

94 The last two terms have been disentangled.

95 Terms not involving b₂ have been transferred to the right side.

96 Hence we obtain an expression for b₂.
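
The expressions themselves are not reproduced in the transcript; the standard result obtained from these first-order conditions (after substituting the expression for b₁ into the second one) is:

```latex
\frac{\partial RSS}{\partial b_1} = -2\sum_{i=1}^{n}\bigl(Y_i - b_1 - b_2 X_i\bigr) = 0
  \;\Longrightarrow\; b_1 = \bar{Y} - b_2 \bar{X},
\qquad
\frac{\partial RSS}{\partial b_2} = -2\sum_{i=1}^{n} X_i\bigl(Y_i - b_1 - b_2 X_i\bigr) = 0
  \;\Longrightarrow\; b_2 = \frac{\sum_i X_i Y_i - n\bar{X}\bar{Y}}{\sum_i X_i^{2} - n\bar{X}^{2}} .
```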

97 In practice, we shall use an alternative expression, b₂ = Σᵢ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ(Xᵢ − X̄)². We will demonstrate that it is equivalent.

98 Expanding the numerator, we obtain the terms shown.

99 In the second term the mean value of Y is a common factor. In the third, the mean value of X is a common factor. The last term is the same for all i.

100 We use the definitions of the sample means to simplify the expression.

101 Hence we have shown that the numerators of the two expressions are the same.

102 The denominator is mathematically a special case of the numerator, replacing Y by X. Hence the expressions are equivalent.

103 The scatter diagram is shown again. We will summarize what we have done. We hypothesized that the true model is Y = β₁ + β₂X + u, we obtained some data, and we fitted the line Ŷ = b₁ + b₂X.

104 We chose the parameters of the fitted line so as to minimize the sum of the squares of the residuals. As a result, we derived the expressions for b₁ and b₂.
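
A minimal Python implementation of these formulas, using the deviation ("alternative") form of b₂ and checked against the three-observation example above (the helper name is hypothetical):

```python
import numpy as np

# Sketch: simple-regression coefficients via b2 = sum of deviation products /
# sum of squared X deviations, and b1 = Ybar - b2 * Xbar.
def ols_simple(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Return (b1, b2) minimizing the residual sum of squares for y = b1 + b2*x."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    b2 = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
    b1 = y.mean() - b2 * x.mean()
    return b1, b2

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 6.0])
b1, b2 = ols_simple(x, y)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}")   # roughly 1.67 and 1.50, as in the example
```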

