Simple Linear Regression and Correlation


Presentation on theme: "Simple Linear Regression and Correlation"— Presentation transcript:

1 Simple Linear Regression and Correlation

2 Introduction Regression refers to the statistical technique of modeling the relationship between variables. In simple linear regression, we model the relationship between two variables. One of the variables, denoted by Y, is called the dependent variable and the other, denoted by X, is called the independent variable. The model we will use to depict the relationship between X and Y will be a straight-line relationship. A graphical sketch of the pairs (X, Y) is called a scatter plot.

3 Using Statistics This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that: Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y)]
The scatter of points tends to be distributed around a positively sloped straight line. The pairs of values of advertising expenditures and sales are not located exactly on a straight line. The scatter plot reveals a more or less strong tendency rather than a precise linear relationship. The line represents the nature of the relationship on average.

4 Examples of Other Scatterplots
[Figure: six scatterplot panels, each plotting Y against X]

5 Simple Linear Regression Model
The equation that describes how y is related to x and an error term is called the regression model. The simple linear regression model is: y = a + bx + e, where a and b are the parameters of the model (a is the intercept and b is the slope) and e is a random variable called the error term.

6 Assumptions of the Simple Linear Regression Model
The relationship between X and Y is a straight-line relationship. The errors ε_i are normally distributed with mean 0 and variance σ². The errors are uncorrelated (not related) in successive observations. That is: ε ~ N(0, σ²).
[Figure: identical normal distributions of errors, all centered on the regression line E[Y] = β0 + β1 X]
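
The assumptions above can be made concrete by simulating data from the model. The sketch below is only an illustration: the parameter values, the x values, and the function name `simulate` are hypothetical and are not taken from the slides. Here `a` and `b` play the roles of the intercept and slope, and `sigma` is the standard deviation of the errors.

```python
import random

def simulate(xs, a, b, sigma, seed=0):
    """Draw y = a + b*x + e with independent errors e ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [a + b * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical values, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
a, b, sigma = -0.1, 0.7, 0.6
ys = simulate(xs, a, b, sigma)

# E[Y] at each x lies exactly on the line; each observed y scatters around it.
for x, y in zip(xs, ys):
    print(f"x={x}  E[Y]={a + b * x:.2f}  observed y={y:.2f}")
```

Setting `sigma` to 0 removes the scatter, so every simulated point falls exactly on the line.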

7 Errors in Regression Y . { X Xi

8 Estimating Using the Regression Line
First, let's look at the equation of a straight line: Y = a + bX, where X is the independent variable, Y is the dependent variable, b is the slope of the line, and a is the Y-intercept.

9 The Method of Least Squares
To estimate the straight line we use the method of least squares. This method minimizes the sum of the squared errors between the estimated points on the line and the actual observed points.
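
To see what "minimizes the sum of squared errors" means in practice, the sketch below compares the sum of squared errors of several candidate lines. The data values are an assumption made for illustration (the slides' own data table is not available here), and the candidate lines are arbitrary.

```python
def sse(xs, ys, a, b):
    """Sum of squared vertical distances between observed y and a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Assumed illustrative data (not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

# The first candidate is the least-squares line for this data;
# the others are arbitrary nearby lines.
candidates = [(-0.1, 0.7), (0.0, 0.7), (-0.1, 0.8), (0.5, 0.5)]
for a, b in candidates:
    print(f"a={a:5.2f}  b={b:4.2f}  SSE={sse(xs, ys, a, b):.3f}")
```

For this data, every alternative line gives a larger sum of squared errors than the first one.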

10 Slope of the best-fitting Regression Line
The estimating line: Ŷ = a + bX. Slope of the best-fitting regression line: b = (ΣXY - n·X̄·Ȳ) / (ΣX² - n·X̄²). Y-intercept of the best-fitting regression line: a = Ȳ - b·X̄.

11 SIMPLE REGRESSION - EXAMPLE
Suppose an appliance store conducts a five-month experiment to determine the effect of advertising on sales revenue. The results are shown below (file PPT_Regr_example.sav).
Table columns: Month, Advertising Exp. ($100s), Sales Rev. ($1000s)

12 SIMPLE REGRESSION - EXAMPLE
Table columns: X, Y, X², XY

13 SIMPLE REGRESSION - EXAMPLE
b = 0.7
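
A minimal sketch of the slope and intercept calculation follows. The five data points are an assumption: they were picked because they give the b = 0.7 shown on this slide and the intercept of -0.1 reported in the Coefficients output, but they are not confirmed to be the store's actual figures. The function name `fit_line` is also just illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b

# Assumed data: advertising in $100s (X) and sales revenue in $1000s (Y).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

a, b = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```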

14 Standard Error of Estimate
The standard error of estimate is used to measure the reliability of the estimating equation. It measures the variability or scatter of the observed values around the regression line.
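
The sketch below computes the standard error of estimate in two ways: from the residuals directly, and with a commonly used short-cut form, sqrt((ΣY² - aΣY - bΣXY) / (n - 2)). Both the data and the choice of short-cut formula are assumptions, since the slide's own table and formula are not available here. The short-cut form is valid only when a and b are the least-squares estimates.

```python
import math

def se_from_residuals(xs, ys, a, b):
    """sqrt( sum of squared residuals / (n - 2) )."""
    n = len(xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

def se_shortcut(xs, ys, a, b):
    """Short-cut form; requires a and b to be the least-squares estimates."""
    n = len(xs)
    sum_y2 = sum(y * y for y in ys)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Assumed illustrative data, with its least-squares line a = -0.1, b = 0.7.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
a, b = -0.1, 0.7

se_def = se_from_residuals(xs, ys, a, b)
se_short = se_shortcut(xs, ys, a, b)
print(f"definition: {se_def:.3f}  short-cut: {se_short:.3f}")
```

With this assumed data both forms agree with the .606 reported as "Std. Error of the Estimate" in the Model Summary output.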

15 Standard Error of Estimate
Short-cut

16 Standard Error of Estimate
Table column Y²: 1 4 16

17 Correlation Analysis Correlation analysis is used to describe
the degree to which one variable is linearly related to another. There are two measures for describing correlation: The Coefficient of Correlation The Coefficient of Determination

18 Correlation The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by ρ, can take on any value from -1 to 1.
ρ = -1 indicates a perfect negative linear relationship
-1 < ρ < 0 indicates a negative linear relationship
ρ = 0 indicates no linear relationship
0 < ρ < 1 indicates a positive linear relationship
ρ = 1 indicates a perfect positive linear relationship
The absolute value of ρ indicates the strength or exactness of the relationship.
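
As an illustration of these ranges, the sketch below computes the sample correlation r, the usual estimate of ρ, for a few small data sets. All of the data values are made up for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points exactly on a rising line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # points exactly on a falling line
r_noisy = pearson_r(xs, [1, 1, 2, 2, 4])          # positive but imperfect association

print(f"{r_up:.3f}  {r_down:.3f}  {r_noisy:.3f}")
```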

19 Illustrations of Correlation
[Figure: six scatterplot panels of Y against X, illustrating ρ = 1, ρ = -1, ρ = 0, ρ = -.8, ρ = 0, and ρ = .8]

20 The coefficient of correlation:
Sample Coefficient of Determination Alternate Formula

21 Interpretation: Sample Coefficient of Determination
Percentage of total variation explained by the regression. We can conclude that 81.7% (R Square = .817 in the Model Summary output) of the variation in the sales revenues is explained by the variation in advertising expenditure.

22 The Coefficient of Correlation or
Karl Pearson’s Coefficient of Correlation The coefficient of correlation is the square root of the coefficient of determination. The sign of r indicates the direction of the relationship between the two variables X and Y. The sign of r will be the same as the sign of the coefficient “b” in the regression equation Y = a + b X

23 If the slope of the estimating line is positive
If the slope of the estimating line is positive, r is the positive square root and the relationship between the two variables is direct. If the slope of the estimating line is negative, r is the negative square root and the relationship is inverse.
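
The sign rule can be written as a one-line helper. This is a small sketch: `correlation_from_r2` is a hypothetical name, and the inputs are the R Square (.817) and slope (.700) reported in the SPSS output of this example.

```python
import math

def correlation_from_r2(r_squared, slope):
    """Square root of the coefficient of determination, signed like the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

r = correlation_from_r2(0.817, 0.700)           # positive slope: direct relationship
r_if_negative = correlation_from_r2(0.817, -0.700)  # same r squared, negative slope

print(f"{r:.3f}  {r_if_negative:.3f}")
```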

24 Hypothesis Tests for the Correlation Coefficient
H0: ρ = 0 (No linear relationship) H1: ρ ≠ 0 (Some linear relationship) Test statistic: t = r / sqrt((1 - r²) / (n - 2)), with n - 2 degrees of freedom.
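
A short sketch of this test, using the r = .904 and N = 5 reported in the Correlations output of this example. The critical value 3.182 is the two-tailed 5% point of the t distribution with 3 degrees of freedom from standard tables. The choice of a 5% level is an assumption that matches the footnote in that output.

```python
import math

def t_for_correlation(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.904, 5
t = t_for_correlation(r, n)

# Two-tailed 5% critical value for 3 degrees of freedom (standard t table).
t_crit = 3.182
reject_h0 = abs(t) > t_crit

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

Because r is entered here rounded to three decimals, the result differs slightly in the third decimal from the t value printed by SPSS.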

25 Analysis-of-Variance Table and an F Test of the Regression Model
H0 : The regression model is not significant H1 : The regression model is significant

26 Testing for the existence of linear relationship
We pose the question: Is the independent variable linearly related to the dependent variable? To answer the question we test the hypothesis H0: b = 0 against H1: b is not equal to zero. If b is not equal to zero, the model has some validity. Test statistic, with n-2 degrees of freedom: t = b / s_b, where s_b is the standard error of the slope.

27 Advertising expenses ($00)
Correlations (row variable: Advertising expenses ($00))
                      Advertising expenses ($00)   Sales revenue ($000)
Pearson Correlation   1                            .904*
Sig. (2-tailed)                                    .035
N                     5
*. Correlation is significant at the 0.05 level (2-tailed).

28 Std. Error of the Estimate
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .904a   .817       .756                .606
a. Predictors: (Constant), Advertising expenses ($00)

ANOVAb
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   4.900            1    4.900         13.364   .035a
  Residual     1.100            3    .367
  Total        6.000            4
a. Predictors: (Constant), Advertising expenses ($00)
b. Dependent Variable: Sales revenue ($000)

Alternately, R² = 1 - [SS(Residual) / SS(Total)] = 1 - (1.1/6.0) = 0.817
When adjusted for degrees of freedom, Adjusted R² = 1 - [SS(Residual)/(n-k-1)] / [SS(Total)/(n-1)] = 1 - [1.1/3]/[6/4] = 0.756
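
The summary figures can be rechecked from the sums of squares alone. The sketch below uses only the values printed in the ANOVA table (4.900, 1.100, n = 5, one predictor). The variable names are illustrative.

```python
import math

ss_regression = 4.9
ss_residual = 1.1
n, k = 5, 1  # observations, predictors

ss_total = ss_regression + ss_residual
r_squared = 1 - ss_residual / ss_total
adj_r_squared = 1 - (ss_residual / (n - k - 1)) / (ss_total / (n - 1))

ms_regression = ss_regression / k          # mean square, regression
ms_residual = ss_residual / (n - k - 1)    # mean square, residual
f_stat = ms_regression / ms_residual       # F with (k, n - k - 1) degrees of freedom
std_error = math.sqrt(ms_residual)         # standard error of the estimate

print(f"R2={r_squared:.3f}  adj R2={adj_r_squared:.3f}  "
      f"F={f_stat:.3f}  s_e={std_error:.3f}")
```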

29 Unstandardized Coefficients Standardized Coefficients
Coefficientsa
                               Unstandardized Coefficients   Standardized Coefficients
Model                          B       Std. Error            Beta                        t       Sig.
1 (Constant)                   -.100   .635                                              -.157   .885
  Advertising expenses ($00)   .700    .191                  .904                        3.656   .035
a. Dependent Variable: Sales revenue ($000)
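
The t value for the slope is the coefficient divided by its standard error. The sketch below recomputes it from the rounded B and Std. Error printed in the table and compares it with the two-tailed 5% critical value for 3 degrees of freedom (3.182, from standard t tables). The 5% level is an assumption.

```python
b = 0.700        # slope for advertising expenses, from the table
se_b = 0.191     # its standard error, from the table
n = 5

t = b / se_b     # test statistic with n - 2 degrees of freedom
df = n - 2

# Two-tailed 5% critical value for 3 degrees of freedom (standard t table).
t_crit = 3.182
reject_h0 = abs(t) > t_crit

print(f"t = {t:.3f} on {df} df, reject H0: {reject_h0}")
```

The result differs slightly from the printed 3.656 because the table's B and Std. Error are rounded to three decimals.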

30 The p-value is 0.035 Test Statistic Value of the test statistic:
Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis that b is not equal to zero. Thus, the independent variable is linearly related to the dependent variable, and this linear regression model is valid.

31 Test statistic, with n-2 degrees of freedom:
Rejection region. Value of the test statistic: 3.66. Conclusion: The calculated test statistic, 3.66, is outside the acceptance region. Alternately, the actual significance is 0.035. Therefore we reject the null hypothesis. Advertising expenses are a significant explanatory variable.

