
1 Chapter 13 Linear Regression and Correlation

2 Our Objectives  Draw a scatter diagram.  Understand and interpret the terms dependent and independent variable.  Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.  Conduct a test of hypothesis to determine whether the coefficient of correlation in the population is zero.  Calculate the least squares regression line.  Construct and interpret confidence and prediction intervals for the dependent variable.

3 Correlation Analysis  A group of techniques to measure the association between two variables.  The study of the relationship between variables.  Example: Is there a relationship between the number of sales calls made in a month and the number of items sold that month?  Usually, the first step in analyzing the correlation of two variables is plotting a scatter diagram.  We also need to identify which variable is dependent and which is independent.

4 Dependent & Independent Variables  Dependent variable:  The variable that is being predicted or estimated.  Independent variable:  A variable that provides the basis for estimation. It is the predictor variable.

5 Example  The sales manager of Copiers Sales wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month. She selected a random sample of 10 representatives and determined the number of calls made and the number of copiers sold last month by each. What conclusions can you draw about the relationship between the number of calls and the number sold?

6 Example (cont’d)  Our dependent variable is the number of copiers sold.  Our independent variable is the number of sales calls made.  There seems to be some relationship between the two variables; however, the relationship is not exact or “perfect”.  The scatter diagram suggests a positive relationship between the two variables.

7 Coefficient of Correlation  A measure of the strength of the linear relationship between two variables.  Designated as r, often referred to as Pearson’s r, or the Pearson product-moment correlation coefficient.  It can assume any value from -1.00 to +1.00 inclusive.  A correlation of -1.00 or +1.00 indicates a “perfect” correlation.

8 Correlation Coefficient  Chart 13-2 (pg. 432)

9 Correlation Coefficient  If there is absolutely no correlation between the two variables, r = 0.  A coefficient close to zero (e.g., r = -.08) shows a weak linear relationship between the variables.  A coefficient close to 1 (e.g., r = .91) indicates a strong linear relationship between the variables.  If the correlation is weak, there is considerable scatter about a line drawn through the center of the data.  If the correlation is strong, there is little scatter about the line drawn through the center of the data.

10 How to Calculate r?

11

12 How to Calculate r?  By examining the previous chart, we notice that:  When the number of calls is above the mean, the number sold is also above the mean (data in the upper-right quadrant).  When the number of calls is below the mean, the number sold is also below the mean (data in the lower-left quadrant).  We can conclude that the variables are positively related.  In general:  If the data values fall in the upper-right and lower-left quadrants, the variables are positively related.  If the data values fall in the upper-left and lower-right quadrants, the variables are negatively related.  If the data values are spread across all four quadrants, the variables are not related.

13 How to Calculate r?  Next, we calculate the deviations of the values from their respective means and the product of each pair of deviations.  In both the upper-right and lower-left quadrants that product is positive (both factors have the same sign); in the other two quadrants the product is negative, and the variables are negatively related.  If the sum of the products is zero, the variables are not related.

14

15  Pearson developed the following formula for r: r = Σ(X − X̄)(Y − Ȳ) / [(n − 1)·s_X·s_Y].  In our example, s_Y = 14.337 and s_X = 9.189, which gives r = 0.759.
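
The ten sample (calls, copiers sold) pairs are not printed on these slides, so the Python sketch below uses assumed values chosen to reproduce the summary statistics quoted in this deck (s_X = 9.189, s_Y = 14.337, r = 0.759); treat it as an illustration of Pearson's formula rather than the chapter's official worksheet.

import statistics

calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]   # independent variable X (assumed data)
sold  = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]   # dependent variable Y (assumed data)

n = len(calls)
mean_x, mean_y = statistics.mean(calls), statistics.mean(sold)
s_x, s_y = statistics.stdev(calls), statistics.stdev(sold)       # sample standard deviations

# Pearson's r: sum of the products of the deviations, divided by (n - 1)*s_x*s_y
sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(calls, sold))
r = sum_products / ((n - 1) * s_x * s_y)

print(round(s_x, 3), round(s_y, 3))   # 9.189  14.337, matching the slide
print(round(r, 3))                    # 0.759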

16 How to Interpret r?  r = 0.759:  It is positive, so there is a direct relationship between the number of calls and the number of copiers sold.  It is fairly close to 1, so the association is strong.  An increase in calls will likely lead to more sales.

17 Coefficient of Determination  The proportion of the total variation in the dependent variable Y that is explained, or accounted for, by the variation in the independent variable X.  It is computed by squaring the coefficient of correlation: r².  In our example: r² = (0.759)² = 0.576.  We can say that 57.6 percent of the variation in the number of copiers sold is explained by the variation in the number of sales calls.  Correlation does not mean causation.

18 Testing Significance of r  r = 0.759; sample size of 10.  What if the value of r was due to chance? Could it be that r for the population is zero? Did the computed r come from a population of paired observations with zero correlation?  We use hypothesis testing procedures to answer these questions.  The test statistic follows the t distribution with n − 2 degrees of freedom.

19 Testing Significance of r  H0: ρ = 0  H1: ρ ≠ 0, where ρ (rho) is the correlation in the population.  The test is two-tailed.  Use the .05 significance level.  Use the t test statistic.  Since n = 10 in our example, df = n − 2 = 8. Then the critical t value is 2.306 (from the t table in the Appendix).  Decision rule: If the computed t falls between ±2.306, we do not reject H0.

20 Testing Significance of r  The formula for computing t is t = r·√(n − 2) / √(1 − r²), with n − 2 degrees of freedom.  In our example, t = 0.759·√8 / √(1 − 0.576) ≈ 3.297. The computed t falls in the rejection region, so H0 is rejected at the .05 significance level. There is a correlation between the two variables in the population.
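
A minimal Python sketch of this test, using only the quantities given on the slides (r = 0.759, n = 10, α = .05); scipy is assumed to be available for the critical value.

import math
from scipy import stats

r, n, alpha = 0.759, 10, 0.05
df = n - 2

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt(df) / math.sqrt(1 - r**2)     # about 3.297
t_crit = stats.t.ppf(1 - alpha / 2, df)              # about 2.306

print(round(t_stat, 3), round(t_crit, 3))
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")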

21 Testing Significance of r

22 Regression Analysis  Used when we wish to develop an equation to describe the linear relationship between two variables.  Also used when we want to estimate the value of the dependent variable Y based on a selected value of the independent variable X.  The technique used to develop the equation and provide the estimates is called regression analysis.

23 Regression Equation  An equation that expresses the linear relationship between two variables.  We need to draw a line (in our scatter diagram) that best fits the data. How?  We use a statistical tool: the least squares principle.

24 Least Squares Line (Best-Fitting Line)  Determined by minimizing the sum of the squares of the vertical distances between the actual values and the predicted values of Y.

25 Lines drawn with a straightedge (by hand) deviate more from the data points than the least squares regression line does.  Chart 13-9 (pg. 442)

26 Simple Linear Regression Equation: The Prediction Line  Estimate of Y: Ŷ = a + bX, where Ŷ is the estimated value of Y for a selected value of X, a is the Y-intercept, and b is the slope of the line.

27 Linear Regression Equation  The slope of the regression line is computed using the formula b = r·(s_Y / s_X), where r is the correlation coefficient, s_Y is the standard deviation of Y, and s_X is the standard deviation of X.

28 Linear Regression Equation  The Y-intercept can be computed using the formula a = Ȳ − b·X̄, where Ȳ is the mean of Y and X̄ is the mean of X.

29 Example  The sales manager of Copiers Sales of America would like to offer specific information about the relationship between the number of calls and the number of copiers sold. Use the least squares method to determine a linear equation between the two variables. What is the expected number of copiers sold by a representative who made 20 calls?

30 Example (cont’d)  Using the previously calculated values (r = 0.759, s_X = 9.189, s_Y = 14.337, X̄ = 22, Ȳ = 45), we compute: b = 0.759·(14.337 / 9.189) = 1.1842 and a = 45 − (1.1842)(22) = 18.9476.  Therefore, the regression equation is Ŷ = 18.9476 + 1.1842X.  So the expected number of copiers sold by a representative who made 20 calls is 18.9476 + (1.1842)(20) = 42.6316 copiers.
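
A minimal Python sketch of these two computations, using the slide's formulas b = r·(s_Y / s_X) and a = Ȳ − b·X̄ with the summary values quoted above (X̄ = 22 and Ȳ = 45 are assumed, back-calculated from the stated intercept).

# summary values quoted on the slides; the means are assumed values
r, s_x, s_y = 0.759, 9.189, 14.337
x_bar, y_bar = 22, 45

b = r * (s_y / s_x)          # slope, roughly 1.1842
a = y_bar - b * x_bar        # Y-intercept, roughly 18.95

predicted = a + b * 20       # expected copiers sold by a rep who made 20 calls
print(round(b, 4), round(a, 4), round(predicted, 2))   # roughly 1.1842, 18.95, 42.63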

31 Drawing the Regression Line

32

33 Drawing the Regression Line  Features of the regression line:  There is no other line through the data for which the sum of the squared deviations is smaller.  The line passes through the point determined by the mean of the X values and the mean of the Y values (X̄, Ȳ).

34 Standard Error of Estimate  If all the data points fell on the regression line:  Y could be predicted with 100% accuracy.  There would be no error in predicting Y based on X.  Perfect predictions are not realistic in business.  We need a measure that describes how precise the prediction of Y is based on X, or, inversely, how inaccurate the estimate might be.  Definition:  A measure of the scatter, or dispersion, of the observed values around the line of regression.  It is computed using the following formula: s_y·x = √[ Σ(Y − Ŷ)² / (n − 2) ].

35 Standard Error of Estimate  The standard error of estimate is based on the squared deviations between each Y and its predicted value Ŷ.  If s_y·x is small, the data points are relatively close to the regression line and the regression equation can be used to predict Y with little error.  If it is large, the data points are widely scattered around the regression line and the regression equation will not provide a precise estimate of Y.
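
A minimal Python sketch of the standard error of estimate, reusing the assumed sample values from the earlier sketch and the fitted line Ŷ = 18.9476 + 1.1842X quoted on the slides.

import math

calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]   # assumed data, as before
sold  = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]
a, b, n = 18.9476, 1.1842, len(calls)

# sum of squared residuals (Y minus its predicted value), the unexplained variation
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(calls, sold))
s_yx = math.sqrt(sse / (n - 2))

print(round(s_yx, 3))   # roughly 9.9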

36 Coefficient of Determination  We will use an example to further examine the concept of the coefficient of determination.  Example: suppose there is interest in the relationship between years on the job, X, and weekly production, Y. Sample data reveal the following:

37 Measures of Variation in Regression  SST = SSR + SSE  Total variation: SST = Σ(Y − Ȳ)²  Regression (explained) variation: SSR = Σ(Ŷ − Ȳ)²  Error (unexplained) variation: SSE = Σ(Y − Ŷ)²

38 Variation = Sum of Squares (SS)  Here, the total variation can be split into:  1. Variation explained by the regression (i.e., by the one independent variable, so the df for this category is 1).  2. Error, or unexplained, variation.

39 ANOVA Table to find Coefficient of Determination  The coefficient of determination can be obtained directly from the ANOVA table: r² = SSR / SST = 1 − SSE / SST.  It represents the percentage of variation in the dependent variable Y that is explained by the independent variable X.  Effect of SSE on the coefficient of determination: note that as SSE decreases, r² increases; and as SSE increases, r² decreases.
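
A minimal Python sketch of the SST = SSR + SSE breakdown and of r² = SSR / SST. Because the weekly-production raw data are not listed on these slides, the sketch reuses the assumed copier-sales values from the earlier sketches.

calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]   # assumed data, as before
sold  = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]
a, b = 18.9476, 1.1842
y_bar = sum(sold) / len(sold)

fitted = [a + b * x for x in calls]
sst = sum((y - y_bar) ** 2 for y in sold)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)             # variation explained by the regression
sse = sum((y - f) ** 2 for y, f in zip(sold, fitted))   # error (unexplained) variation

print(round(sst, 1), round(ssr, 1), round(sse, 1))      # SST is close to SSR + SSE (rounding in a, b)
print(round(ssr / sst, 3))                              # roughly 0.576, matching r squared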

40 Unexplained Variation (Error)  Plot the data in a scatter diagram.  Draw the least squares line.  The equation is Ŷ = 2 + 0.4X.  Note that there are some errors (differences) between the predicted values and the actual values.  To measure the overall error in our prediction, every deviation from the line is squared and the squares are summed.  Logically, this is the unexplained variation.

41 Unexplained Variation (Error)  Chart 13-15 (pg. 455)

42 Unexplained Variation (Error)  Table 13-6 (pg. 455)

43 Total Variation  Now suppose only the Y values (i.e., the observed values of weekly production) are known and we want to predict production for every employee.  To make these predictions, we would assign the mean weekly production (30/5 = 6) to each employee.  The calculations are presented in the following table. The value of 20 in the table is referred to as the total variation.

44 Total Variation  Table 13-7 (pg. 456)

45 Total Variation  How we arrived at the total variation in Y is shown in the following chart:  Chart 13-16 (pg. 456)

46 Explained Variation  Explained Variation = Total Variation − Unexplained Variation.  r² = Explained Variation / Total Variation = (Total − Unexplained) / Total

47 Example (cont’d)  In our example: r² = (20 − 4) / 20 = 16 / 20 = .80.  Notice that 16 represents the explained variation.  Also, .80 is a proportion. We say that 80% of the variation in weekly production Y is explained by its linear relationship with years on the job, X.

48 Therefore, in summary:  Coefficient of correlation (r): measures the strength and direction of the linear relationship between two variables.  Coefficient of determination (r²): the square of the coefficient of correlation; measures the percentage of the variation in Y that is explained by the variation in X.  Standard error of estimate (s_y·x): measures the scatter of the observed values around the regression line.
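
As a cross-check of the hand computations summarized above, a single library call reproduces the slope, intercept, r, r², and the p-value for the significance test. The sketch below is an illustration only: it reuses the assumed sample values and relies on scipy.stats.linregress being available.

from scipy import stats

calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]   # assumed data, as before
sold  = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

result = stats.linregress(calls, sold)
print(round(result.slope, 4))        # roughly 1.1842
print(round(result.intercept, 4))    # roughly 18.95
print(round(result.rvalue, 3))       # roughly 0.759
print(round(result.rvalue ** 2, 3))  # roughly 0.576
print(round(result.pvalue, 4))       # roughly 0.01: H0 of zero correlation is rejected at the .05 level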

49 Confidence Interval & Prediction Interval  Confidence Interval: Reports an interval estimate for the mean value of Y for all observations with a given X.  Prediction Interval: Reports an interval estimate for a particular value of Y for a given X.

50 Confidence Interval  We compute the confidence interval for the mean of Y, given X, using the following formula: Ŷ ± t·s_y·x·√[ 1/n + (X − X̄)² / Σ(X − X̄)² ], where Ŷ is the predicted value for the selected value of X; X is the selected value of X; X̄ is the mean of the Xs; n is the number of observations; s_y·x is the standard error of the estimate; and t is the value of t from Appendix B.2 with n − 2 degrees of freedom (df).

51 Prediction Interval  We compute the prediction interval for a particular value of Y for a given X using the following formula: Ŷ ± t·s_y·x·√[ 1 + 1/n + (X − X̄)² / Σ(X − X̄)² ].  The extra 1 under the radical (compared with the confidence interval) reflects the additional variability of an individual observation around the mean.

52 Example  Question: Determine a 95% confidence interval for all sales representatives who made 25 calls (a confidence interval) and for Sheila Baker, a West Coast sales representative who made 25 calls (a prediction interval).  Answer (steps):  Find Ŷ corresponding to X = 25: Ŷ = 18.9476 + 1.1842(25) = 48.55.  Find the t value (because n = 10, df = 8); from the table, t = 2.306.  Find the mean X̄.  Use the formulas for the confidence interval and the prediction interval.  We obtain:  Confidence interval = 48.55 ± 7.63 = [40.9, 56.2]. So for all sales representatives who make 25 calls, the mean number of copiers sold will likely range from 40.9 to 56.2.  Prediction interval = 48.55 ± 24.07 = [24.47, 72.62]. So Sheila Baker will likely sell between 24.47 and 72.62 copiers.  Which interval is larger? The prediction interval, because predicting an individual value involves more uncertainty than estimating a mean.
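
A minimal Python sketch of both intervals for X = 25, using the formulas on the previous two slides, the assumed sample values from the earlier sketches, and t = 2.306 for n − 2 = 8 degrees of freedom.

import math

calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]   # assumed data, as before
sold  = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]
a, b, s_yx, t = 18.9476, 1.1842, 9.901, 2.306

n = len(calls)
x_bar = sum(calls) / n
ss_x = sum((x - x_bar) ** 2 for x in calls)        # sum of squared X deviations

x_new = 25
y_hat = a + b * x_new                              # roughly 48.55

# half-widths of the confidence and prediction intervals
ci_half = t * s_yx * math.sqrt(1 / n + (x_new - x_bar) ** 2 / ss_x)       # roughly 7.6
pi_half = t * s_yx * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / ss_x)   # roughly 24.1

print(round(y_hat - ci_half, 1), round(y_hat + ci_half, 1))   # confidence interval for the mean
print(round(y_hat - pi_half, 1), round(y_hat + pi_half, 1))   # prediction interval for one rep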

