Simple Linear Regression and Correlation (continued). Reference: Chapter 17 of Statistics for Management and Economics, 7th Edition, Gerald Keller.
17.4 Error Variable: Required Conditions
The error variable ε is a critical part of the regression model. Four requirements involving the distribution of ε must be satisfied.
–The probability distribution of ε is normal.
–The mean of ε is zero: E(ε) = 0.
–The standard deviation of ε is a constant, σ_ε, for all values of x.
–The errors associated with different values of y are all independent.
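As a quick sketch of how these conditions can be checked in practice, the residuals from a fitted line stand in for the unobservable errors ε. The data below are simulated purely for illustration (they are not the textbook's food company example), and the Shapiro–Wilk test is one common normality check:

```python
import numpy as np
from scipy import stats

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 30)
y = 2 + 3 * x + rng.normal(0, 1.5, size=x.size)

# Least squares fit: slope b1 and intercept b0.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

# Rough checks of the required conditions:
mean_near_zero = abs(residuals.mean()) < 1e-10  # least squares forces this
_, p_normal = stats.shapiro(residuals)          # normality of the errors
```

A residual plot against x would additionally reveal non-constant spread (violating the constant σ_ε condition) or patterns suggesting dependence.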
Observational and Experimental Data
Observational: Y and X are both random variables. Example: Y = return, X = inflation. Models: regression; correlation (bivariate normal).
Experimental: Y is a random variable, X is controlled. Example: Y = blood pressure, X = medicine dose. Model: regression.
17.5 Assessing the Model
For our assumed model, the least squares method produces a regression line whether or not there is a linear relationship between x and y.
Consequently, since we have assumed a linear model, it is important to assess how well it fits the data. Several methods are used to assess the model; all are based on the sum of squares for errors, SSE.
Sum of Squares for Errors
–This is the sum of squared differences between the observed points and the points on the regression line.
–It can serve as a measure of how well the line fits the data. SSE is defined by
SSE = Σ (y_i − ŷ_i)², summed over i = 1, …, n.
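The definition of SSE can be sketched in a few lines of Python. The data here are made up for illustration (they are not the food company numbers):

```python
import numpy as np

# Hypothetical sample data (not the textbook's food company example).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares estimates of intercept b0 and slope b1.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Fitted values and the sum of squares for errors.
y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)  # approximately 0.107 for this data
```

A small SSE relative to the scale of y indicates the line passes close to the observed points.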
Standard Error of Estimate, s_ε
–The standard deviation σ_ε of the error variable measures the dispersion around the true line for a given x.
–If σ_ε is large, there is large dispersion around the true line. If σ_ε is small, the observations tend to be close to the line; then the model fits the data well.
–Therefore, we can use σ_ε as a measure of the suitability of a linear model.
Since σ_ε is not known, we use an estimate of it. An estimate of σ_ε is given by s_ε, the standard error of estimate:
s_ε = √( SSE / (n − 2) )
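Continuing the same sketch on hypothetical data (again, not the textbook example), the standard error of estimate divides SSE by n − 2 because two parameters, b0 and b1, were estimated:

```python
import numpy as np

# Hypothetical data (assumed for illustration, not the food company numbers).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Least squares fit and SSE.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
sse = np.sum((y - (b0 + b1 * x)) ** 2)

# Standard error of estimate: divide SSE by n - 2 degrees of freedom.
s_eps = np.sqrt(sse / (n - 2))
```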
The example with the food company:
Earlier: Inference about μ (when σ² is unknown)
Model: X ~ N(μ, σ²)
Hypotheses: H0: μ = 50, H1: μ ≠ 50
Test statistic: t = (x̄ − 50) / (s/√n), which has a t distribution with n − 1 degrees of freedom if H0 is true
Level of significance: α
Rejection region: reject H0 if t_obs > t_crit or t_obs < −t_crit
Observation: compute t_obs
Conclusion: t_obs > t_crit or t_obs < −t_crit → reject H0; −t_crit ≤ t_obs ≤ t_crit → do not reject H0
Interpretation: if H0 is rejected, we have empirical support for the alternative hypothesis
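These steps can be traced numerically. The sample below is invented to illustrate the procedure for testing H0: μ = 50 at α = 0.05:

```python
import numpy as np
from scipy import stats

# Hypothetical sample; test H0: mu = 50 against H1: mu != 50.
sample = np.array([48.2, 51.5, 49.8, 52.3, 47.9, 50.6, 49.1, 51.0])
n = len(sample)

# Test statistic: t = (xbar - 50) / (s / sqrt(n)).
t_obs = (sample.mean() - 50) / (sample.std(ddof=1) / np.sqrt(n))

# Two-sided critical value at alpha = 0.05 with n - 1 degrees of freedom.
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)

# Conclusion: reject H0 only if |t_obs| exceeds t_crit.
reject = abs(t_obs) > t_crit
```

For this sample the mean is very close to 50, so H0 is not rejected.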
Testing the Slope
We test whether there is a slope. Formally:
H0: β1 = 0, H1: β1 ≠ 0
Test statistic: t = (b1 − β1) / s_b1, which under H0 has a t distribution with n − 2 degrees of freedom, where
s_b1 = s_ε / √( (n − 1) s_x² )
Confidence interval: b1 ± t_{α/2, n−2} · s_b1
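A sketch of the slope test on hypothetical data (not the food company numbers):

```python
import numpy as np
from scipy import stats

# Hypothetical data (assumed for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Least squares fit and standard error of estimate.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
s_eps = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

# Standard error of the slope: s_b1 = s_eps / sqrt((n - 1) * s_x^2).
s_x2 = np.var(x, ddof=1)
s_b1 = s_eps / np.sqrt((n - 1) * s_x2)

# Test H0: beta1 = 0 and build a 95% confidence interval for beta1.
t_obs = b1 / s_b1
t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
```

Here t_obs far exceeds t_crit, so H0: β1 = 0 would be rejected, and the confidence interval excludes zero.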
Coefficient of determination
To measure the strength of the linear relationship we use the coefficient of determination, R². It measures how much of the variation in Y is explained by the variation in X (i.e., what percentage of the variation in Y the model explains).
Food company example: call X = ADVER and Y = SALES. (Table of observations i, x_i, y_i, with their totals and means.)
Standard error of the estimate, s_ε, computed for the food company example.
Coefficient of determination
The regression model splits the overall variability in y into a part that is explained and a part that remains unexplained (the error):
variation in the dependent variable Y = variation explained by the independent variable + unexplained variation,
SST = SSR + SSE
The greater the explained variation, the better the model. The coefficient of determination,
R² = SSR / SST = 1 − SSE / SST,
is a measure of the explanatory power of the model.
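The decomposition SST = SSR + SSE and the resulting R² can be verified numerically on hypothetical data (again not the food company numbers):

```python
import numpy as np

# Hypothetical data (assumed for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares fit and fitted values.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation in y
ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the model
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

r2 = ssr / sst  # equivalently: 1 - sse / sst
```

For this near-linear data, R² is close to 1, meaning almost all of the variation in y is explained by x.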
Cause and Effect
Note that conclusions about cause and effect, X → Y, must be based on knowledge of the subject. Experimental studies can establish causation; with observational studies it is hard to say. Example: is smoking → lung cancer true? The regression model only shows the linear relationship; we would make the same inferences even if we switched the variables!
Do the analysis in the right order: first the theory of the subject, then the statistical model!