 Chapter 10 Simple Regression.

Null Hypothesis
The analysis of business and economic processes makes extensive use of relationships between variables.

Correlation Analysis
The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables.

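Correlation Analysis
The sample correlation coefficient: $r = \frac{s_{xy}}{s_x s_y}$, where: $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ is the sample covariance, and $s_x$ and $s_y$ are the sample standard deviations of x and y.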

Correlation Analysis
The null hypothesis of no linear association is $H_0: \rho = 0$, where the random variable $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ follows a Student’s t distribution with (n - 2) degrees of freedom.

Tests for Zero Population Correlation
Let r be the sample correlation coefficient, calculated from a random sample of n pairs of observations from a joint normal distribution. The following tests of the null hypothesis $H_0: \rho = 0$ have a significance level $\alpha$:
1. To test $H_0$ against the alternative $H_1: \rho > 0$, the decision rule is: reject $H_0$ if $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} > t_{n-2,\alpha}$.

Tests for Zero Population Correlation (continued)
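2. To test $H_0: \rho = 0$ against the alternative $H_1: \rho < 0$, the decision rule is: reject $H_0$ if $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} < -t_{n-2,\alpha}$.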

Tests for Zero Population Correlation (continued)
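3. To test $H_0: \rho = 0$ against the two-sided alternative $H_1: \rho \neq 0$, the decision rule is: reject $H_0$ if $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} < -t_{n-2,\alpha/2}$ or $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} > t_{n-2,\alpha/2}$. Here, $t_{n-2,\alpha}$ is the number for which $P(t_{n-2} > t_{n-2,\alpha}) = \alpha$, where the random variable $t_{n-2}$ follows a Student’s t distribution with (n - 2) degrees of freedom.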
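As a minimal sketch of the two-sided test, the Python below computes r and the statistic $r\sqrt{n-2}/\sqrt{1-r^2}$ for a small data set. The data are hypothetical (not from the slides), the function name is an illustrative choice, and the critical value is an assumed entry from a standard t table.

```python
import math

def correlation_t_statistic(x, y):
    """Return (r, t): the sample correlation and t = r*sqrt(n-2)/sqrt(1-r^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and sample standard deviations (divisor n - 1).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    r = s_xy / (s_x * s_y)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, t = correlation_t_statistic(x, y)

# Assumed t-table value: t_{3, 0.025} = 3.182 (n - 2 = 3, alpha = 0.05, two-sided).
t_crit = 3.182
reject_h0 = abs(t) > t_crit
print(f"r = {r:.4f}, t = {t:.4f}, reject H0: {reject_h0}")
```

With only five observations, a sample correlation of roughly 0.77 is not enough to reject the hypothesis of zero population correlation at the 5% level.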

Linear Regression Model (Example 10.2)

Linear Regression Model (Figure 10.1)

Linear Regression Model
LINEAR REGRESSION POPULATION EQUATION MODEL: $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\beta_0$ and $\beta_1$ are the population model coefficients and $\varepsilon_i$ is a random error term.

Linear Regression Outcomes
Linear regression provides two important results:
1. Predicted values of the dependent or endogenous variable as a function of an independent or exogenous variable.
2. Estimated marginal change in the endogenous variable that results from a one-unit change in the independent or exogenous variable.

Least Squares Procedure
The least-squares procedure obtains estimates of the linear equation coefficients b0 and b1 in the model $\hat{y}_i = b_0 + b_1 x_i$ by minimizing the sum of the squared residuals $e_i$: $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. This results in a procedure stated as: choose b0 and b1 so that the quantity $SSE = \sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2$ is minimized. We use differential calculus to obtain the coefficient estimators that minimize SSE.
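The calculus step can be sketched as follows. This is the standard textbook derivation, written out here as an illustration and not reproduced from the slides: setting both partial derivatives of SSE to zero gives the two normal equations, which are then solved for b0 and b1.

```latex
\begin{align*}
SSE(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial SSE}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \;\Rightarrow\; n b_0 + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \\
\frac{\partial SSE}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
  \;\Rightarrow\; b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \\
b_0 &= \bar{y} - b_1 \bar{x}, \qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{align*}
```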

Least-Squares Derived Coefficient Estimators
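The slope coefficient estimator is $b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r \frac{s_y}{s_x}$, and the constant or intercept estimator is $b_0 = \bar{y} - b_1 \bar{x}$. We also note that the regression line always goes through the point of means $(\bar{x}, \bar{y})$.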
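A minimal Python sketch of these estimators, assuming a small hypothetical data set (not from the slides) and an illustrative function name, might look like this:

```python
def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope estimator
    b0 = y_bar - b1 * x_bar    # intercept estimator
    return b0, b1

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)

# The fitted line evaluated at the mean of x should return the mean of y.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
y_at_mean = b0 + b1 * x_bar
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, fitted value at x_bar = {y_at_mean:.4f}")
```

The final check illustrates the remark that the regression line passes through the point of means.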

Standard Assumptions for the Linear Regression Model
The following assumptions are used to make inferences about the population linear model by using the estimated coefficients:
1. The x’s are fixed numbers, or they are realizations of a random variable X that are independent of the error terms $\varepsilon_i$. In the latter case, inference is carried out conditionally on the observed values of the x’s.
2. The error terms are random variables with mean 0 and the same variance $\sigma^2$: $E[\varepsilon_i] = 0$ and $E[\varepsilon_i^2] = \sigma^2$ for $i = 1, \dots, n$. The latter is called homoscedasticity, or uniform variance.
3. The random error terms $\varepsilon_i$ are not correlated with one another, so that $E[\varepsilon_i \varepsilon_j] = 0$ for all $i \neq j$.

Regression Analysis for Retail Sales Analysis (Figure 10.5)
The regression equation has the form Y Retail Sales = b0 + b1 X Income; the figure labels the estimates b0 and b1 in the regression output.

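Analysis of Variance
The total variability in a regression analysis, SST, can be partitioned into a component explained by the regression, SSR, and a component due to unexplained error, SSE: $SST = SSR + SSE$, with the components defined as:
Total sum of squares: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Error sum of squares: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$
Regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$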

Regression Analysis for Retail Sales Analysis (Figure 10.7)
The regression equation has the form Y Retail Sales = b0 + b1 X Income.

Coefficient of Determination, $R^2$
The Coefficient of Determination for a regression equation is defined as $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$. This quantity varies from 0 to 1, and higher values indicate a better regression. Caution should be used in making general interpretations of $R^2$ because a high value can result from either a small SSE or a large SST or both.

Correlation and $R^2$
The multiple coefficient of determination, $R^2$, for a simple regression is equal to the simple correlation squared: $R^2 = r^2$.
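The decomposition and the identity between the coefficient of determination and the squared correlation can be checked numerically. The sketch below uses hypothetical data (not from the slides) and illustrative variable names.

```python
import math

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # error
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # regression

r_squared = ssr / sst
r = s_xy / math.sqrt(s_xx * sst)   # sample correlation (the n - 1 divisors cancel)

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

Both printed values in the last line should agree up to floating-point rounding, and SSR plus SSE should reproduce SST.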

Estimation of Model Error Variance
The quantity SSE is a measure of the total squared deviation about the estimated regression line, and $e_i$ is the residual. An estimator for the variance of the population model error is $s_e^2 = \frac{SSE}{n - 2} = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}$. Division by n - 2 instead of n - 1 results because the simple regression model uses two estimated parameters, b0 and b1, instead of one.

Sampling Distribution of the Least Squares Coefficient Estimator
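If the standard least squares assumptions hold, then b1 is an unbiased estimator of $\beta_1$ and has a population variance $\sigma_{b_1}^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{(n - 1) s_x^2}$ and an unbiased sample variance estimator $s_{b_1}^2 = \frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{s_e^2}{(n - 1) s_x^2}$.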
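As a rough illustration, the error variance estimate and the estimated variance of the slope can be computed as below. The data are hypothetical (not from the slides), and the variable names are illustrative choices.

```python
import math

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals and error sum of squares.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

s_e_sq = sse / (n - 2)       # estimated model error variance
s_b1_sq = s_e_sq / s_xx      # estimated variance of the slope estimator
s_b1 = math.sqrt(s_b1_sq)    # standard error of the slope

print(f"s_e^2 = {s_e_sq:.4f}, s_b1^2 = {s_b1_sq:.4f}, s_b1 = {s_b1:.4f}")
```

Note that the divisor is n - 2, matching the two estimated parameters b0 and b1.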

Basis for Inference About the Population Regression Slope
Let $\beta_1$ be a population regression slope and b1 its least squares estimate based on n pairs of sample observations. Then, if the standard regression assumptions hold and it can also be assumed that the errors $\varepsilon_i$ are normally distributed, the random variable $t = \frac{b_1 - \beta_1}{s_{b_1}}$ is distributed as Student’s t with (n - 2) degrees of freedom. In addition, the central limit theorem enables us to conclude that this result is approximately valid for a wide range of non-normal distributions and large sample sizes, n.

Excel Output for Retail Sales Model (Figure 10.9)
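The regression equation has the form Y Retail Sales = b0 + b1 X Income. The figure labels the following quantities in the Excel output: $s_e$, SSR, SSE, SST, MSR, MSE, $b_0$, $b_1$, $s_{b_1}$, and $t_{b_1}$.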

Tests of the Population Regression Slope
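If the regression errors $\varepsilon_i$ are normally distributed and the standard least squares assumptions hold (or if the distribution of b1 is approximately normal), the following tests have significance level $\alpha$:
1. To test either null hypothesis $H_0: \beta_1 = \beta_1^*$ or $H_0: \beta_1 \leq \beta_1^*$ against the alternative $H_1: \beta_1 > \beta_1^*$, the decision rule is: reject $H_0$ if $\frac{b_1 - \beta_1^*}{s_{b_1}} \geq t_{n-2,\alpha}$.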

Tests of the Population Regression Slope (continued)
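2. To test either null hypothesis $H_0: \beta_1 = \beta_1^*$ or $H_0: \beta_1 \geq \beta_1^*$ against the alternative $H_1: \beta_1 < \beta_1^*$, the decision rule is: reject $H_0$ if $\frac{b_1 - \beta_1^*}{s_{b_1}} \leq -t_{n-2,\alpha}$.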

Tests of the Population Regression Slope (continued)
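3. To test the null hypothesis $H_0: \beta_1 = \beta_1^*$ against the two-sided alternative $H_1: \beta_1 \neq \beta_1^*$, the decision rule is: reject $H_0$ if $\frac{b_1 - \beta_1^*}{s_{b_1}} \geq t_{n-2,\alpha/2}$ or $\frac{b_1 - \beta_1^*}{s_{b_1}} \leq -t_{n-2,\alpha/2}$.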
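The three decision rules can be sketched in Python as follows. The data are hypothetical (not from the slides), the hypothesized slope `beta1_star = 0` is an illustrative choice, and the critical values are assumed entries from a standard t table with 3 degrees of freedom.

```python
import math

def slope_t_statistic(x, y, beta1_star=0.0):
    """Return (b1, s_b1, t) with t = (b1 - beta1_star) / s_b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2) / s_xx)
    return b1, s_b1, (b1 - beta1_star) / s_b1

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, s_b1, t = slope_t_statistic(x, y, beta1_star=0.0)

# Assumed t-table values for n - 2 = 3 degrees of freedom, alpha = 0.05.
t_alpha = 2.353        # t_{3, 0.05}
t_half_alpha = 3.182   # t_{3, 0.025}

reject_upper = t >= t_alpha                 # H1: beta1 > beta1*
reject_lower = t <= -t_alpha                # H1: beta1 < beta1*
reject_two_sided = abs(t) >= t_half_alpha   # H1: beta1 != beta1*
print(f"t = {t:.4f}", reject_upper, reject_lower, reject_two_sided)
```

With so few observations none of the three tests rejects, even though the estimated slope is clearly positive.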

Confidence Intervals for the Population Regression Slope $\beta_1$
If the regression errors $\varepsilon_i$ are normally distributed and the standard regression assumptions hold, a 100(1 - $\alpha$)% confidence interval for the population regression slope $\beta_1$ is given by $b_1 - t_{n-2,\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2}\, s_{b_1}$, where $t_{n-2,\alpha/2}$ is the number for which $P(t_{n-2} > t_{n-2,\alpha/2}) = \alpha/2$ and the random variable $t_{n-2}$ follows a Student’s t distribution with (n - 2) degrees of freedom.
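A minimal sketch of a 95% confidence interval for the slope follows. The data are hypothetical (not from the slides), the function name is illustrative, and the critical value is passed in as an assumed t-table entry because the Python standard library has no Student's t quantile function.

```python
import math

def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) = b1 -/+ t_crit * s_b1 for the regression slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2) / s_xx)
    half_width = t_crit * s_b1
    return b1 - half_width, b1 + half_width

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Assumed t-table value: t_{3, 0.025} = 3.182 for a 95% interval with n - 2 = 3.
lower, upper = slope_confidence_interval(x, y, t_crit=3.182)
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")
```

The interval contains zero, which agrees with a two-sided test at the 5% level that fails to reject a zero slope.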

F test for Simple Regression Coefficient
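We can test the hypothesis $H_0: \beta_1 = 0$ against the alternative $H_1: \beta_1 \neq 0$ by using the F statistic $F = \frac{MSR}{MSE} = \frac{SSR}{s_e^2}$. The decision rule is: reject $H_0$ if $F \geq F_{1,n-2,\alpha}$. We can also show that the F statistic is $F = t_{b_1}^2$ for any simple regression analysis, where $t_{b_1} = b_1 / s_{b_1}$ is the t statistic for testing a zero slope.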
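The relationship between the F statistic and the squared t statistic can be verified numerically. The sketch below uses hypothetical data (not from the slides), and the critical value is an assumed entry from a standard F table.

```python
import math

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / 1          # one regression degree of freedom
mse = sse / (n - 2)    # same quantity as s_e^2
f_stat = msr / mse

# t statistic for H0: beta1 = 0.
t_b1 = b1 / math.sqrt(mse / s_xx)

# Assumed F-table value: F_{1, 3, 0.05} = 10.13.
f_crit = 10.13
reject_h0 = f_stat >= f_crit
print(f"F = {f_stat:.4f}, t^2 = {t_b1 ** 2:.4f}, reject H0: {reject_h0}")
```

Because F equals the squared t statistic, the F test and the two-sided t test of a zero slope lead to the same decision in simple regression.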

Key Words
Analysis of Variance
Assumptions for the Least Squares Coefficient Estimators
Basis for Inference About the Population Regression Slope
Coefficient of Determination, $R^2$
Confidence Intervals for Predictions
Confidence Intervals for the Population Regression Slope $\beta_1$
Correlation and $R^2$
Estimation of Model Error Variance
F test for Simple Regression Coefficient
Least-Squares Procedure
Linear Regression Outcomes

Key Words (continued)
Linear Regression Population Equation Model
Population Model
Sampling Distribution of the Least Squares Coefficient Estimator
Tests for Zero Population Correlation
Tests of the Population Regression Slope