Presentation on theme: "Correlation and Linear Regression"— Presentation transcript:

1 Correlation and Linear Regression
Chapter 13

2 Learning Objectives
LO13-1 Explain the purpose of correlation analysis.
LO13-2 Calculate a correlation coefficient to test and interpret the relationship between two variables.
LO13-3 Apply regression analysis to estimate the linear relationship between two variables.
LO13-4 Evaluate the significance of the slope of the regression equation.
LO13-5 Evaluate a regression equation's ability to predict using the standard error of estimate and the coefficient of determination.
LO13-6 Calculate and interpret confidence and prediction intervals.
LO13-7 Use a log function to transform a nonlinear relationship.

3 LO13-1 Explain the purpose of correlation analysis.
Correlation Analysis – Measuring the Relationship Between Two Variables Analyzing relationships between two quantitative variables. The basic question of correlation analysis: do the data indicate that there is a relationship between two quantitative variables? For the Applewood Auto sales data, the two variables are displayed in a scatter graph. Are profit per vehicle and age correlated?

4 Correlation Analysis – Measuring
LO13-1 Correlation Analysis – Measuring the Relationship Between Two Variables The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables. The sample correlation coefficient is identified by the lowercase letter r. It shows the direction and strength of the linear relationship between two interval- or ratio-scale variables. It ranges from -1 up to and including +1. A value near 0 indicates there is little linear relationship between the variables. A value near +1 indicates a direct or positive linear relationship between the variables. A value near -1 indicates an inverse or negative linear relationship between the variables.

5 LO13-1 Correlation Analysis – Measuring the Relationship Between Two Variables

6 Computing the Correlation Coefficient:
LO13-2 Calculate a correlation coefficient to test and interpret the relationship between two variables. Correlation Analysis – Measuring the Relationship Between Two Variables Computing the Correlation Coefficient:
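The slide's formula image did not carry over to this transcript; in one standard textbook form, the sample correlation coefficient is

r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n-1)\, s_x\, s_y}

where s_x and s_y are the sample standard deviations of X and Y, and X̄ and Ȳ are the sample means.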

7 Correlation Analysis – Example
LO13-2 Correlation Analysis – Example The sales manager of Copier Sales of America has a large sales force throughout the United States and Canada and wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month. The manager selects a random sample of 15 representatives and determines the number of sales calls each representative made last month and the number of copiers sold. Determine if the number of sales calls and copiers sold are correlated.

8 Correlation Analysis – Example
LO13-2 Correlation Analysis – Example
Step 1: State the null and alternate hypotheses.
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Step 2: Select a level of significance. We select a .05 level of significance.
Step 3: Identify the test statistic. To test a hypothesis about a correlation we use the t-statistic. For this analysis, there will be n-2 degrees of freedom.
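A standard form of this test statistic (assumed here, since the slide's formula did not transfer):

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n-2

where r is the sample correlation coefficient and n is the sample size.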

9 Correlation Analysis – Example
LO13-2 Correlation Analysis – Example
Step 4: Formulate a decision rule. Reject H0 if:
t > tα/2, n-2 or t < -tα/2, n-2
t > t0.025,13 or t < -t0.025,13
t > 2.160 or t < -2.160

10 Correlation Coefficient – Example
LO13-2 Correlation Coefficient – Example
Step 5: Take a sample, calculate the statistics, arrive at a decision. From the sample of 15 representatives, the correlation coefficient is r = .865; substituting r and n into the t formula gives t = 6.216.

11 Correlation Coefficient – Example
LO13-2 Correlation Coefficient – Example
Step 5 (continued): Take a sample, calculate the statistics, arrive at a decision. The t-test statistic, 6.216, is greater than the critical value of 2.160. Therefore, reject the null hypothesis that the correlation coefficient is zero.
Step 6: Interpret the result. The data indicate that there is a significant correlation between the number of sales calls and copiers sold. We can also observe that the correlation coefficient is .865, which indicates a strong, positive relationship. In other words, more sales calls are strongly related to more copier sales. Please note that this statistical analysis does not provide any evidence of a causal relationship. Another type of study is needed to test that hypothesis.
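As a minimal sketch, the same correlation test can be run in Python with scipy; the sales-call and copier figures below are illustrative placeholders, not the textbook's sample.

import numpy as np
from scipy import stats

# Illustrative placeholder data (NOT the textbook's sample of 15 representatives).
calls = np.array([20, 40, 20, 30, 10, 10, 20, 20, 20, 30, 40, 30, 20, 50, 30])
copiers = np.array([30, 60, 40, 60, 30, 40, 40, 50, 30, 70, 60, 40, 40, 80, 70])

n = len(calls)
r, p_value = stats.pearsonr(calls, copiers)        # sample correlation and two-sided p-value
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)    # the t statistic written out by hand
t_crit = stats.t.ppf(0.975, df=n - 2)              # critical value for alpha = .05, two-tailed

print(f"r = {r:.3f}, t = {t_stat:.3f}, critical value = {t_crit:.3f}, p = {p_value:.4f}")
# Reject H0 (rho = 0) when |t| exceeds the critical value, or equivalently when p < .05.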

12 LO13-3 Apply regression analysis to estimate the linear relationship between two variables.
Correlation Analysis tests for the strength and direction of the relationship between two quantitative variables. Regression Analysis evaluates and "measures" the relationship between two quantitative variables with a linear equation: Y = a + b X. This equation has the same elements as any equation of a line, that is, a slope and an intercept. The relationship between X and Y is defined by the values of the intercept, a, and the slope, b. In regression analysis, we use data (observed values of X and Y) to estimate the values of a and b.

13 Regression Analysis LO13-3 EXAMPLES
Assuming a linear relationship between the size of a home, measured in square feet, and the cost to heat the home in January, how does the cost vary relative to the size of the home? In a study of automobile fuel efficiency, assuming a linear relationship between miles per gallon and the weight of a car, how does the fuel efficiency vary relative to the weight of a car?

14 Regression Analysis: Variables
LO13-3 Regression Analysis: Variables
Y = a + b X
Y is the Dependent Variable. It is the variable being predicted or estimated.
X is the Independent Variable. For a regression equation, it is the variable used to estimate the dependent variable, Y. X is the predictor variable.
Examples of dependent and independent variables:
How does the size of a home, measured in number of square feet, relate to the cost to heat the home in January? We would use home size as the independent variable, X, to predict the heating cost, the dependent variable, Y. Regression equation: Heating cost = a + b (home size)
How does the weight of a car relate to the car's fuel efficiency? We would use car weight as the independent variable, X, to predict the car's fuel efficiency, the dependent variable, Y. Regression equation: Miles per gallon = a + b (car weight)

15 Regression Analysis – Example
LO13-3 Regression Analysis – Example
Regression analysis estimates a and b by fitting a line to the observed data. Each line (Y = a + bX) is defined by values of a and b. The line of "best fit" to the data is found with the LEAST SQUARES PRINCIPLE: determining a regression equation by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.
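A common computational form of the least squares estimates (assumed here; the slide's formulas did not carry over to this transcript) is

b = r\,\frac{s_y}{s_x}, \qquad a = \bar{Y} - b\,\bar{X}

where r is the correlation coefficient, s_x and s_y are the sample standard deviations, and X̄ and Ȳ are the sample means.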

16 Regression Analysis – Example
LO13-3 Regression Analysis – Example Recall the example involving Copier Sales of America. The sales manager gathered information on the number of sales calls made and the number of copiers sold for a random sample of 15 sales representatives. Use the least squares method to determine a linear equation to express the relationship between the two variables. In this example, the number of sales calls is the independent variable, X, and the number of copiers sold is the dependent variable, Y. What is the expected number of copiers sold by a representative who made 20 calls? Number of Copiers Sold = a + b ( Number of Sales Calls)

17 Regression Analysis – Example
LO13-3 Regression Analysis – Example
Descriptive statistics: the sample means and standard deviations of the number of sales calls and the number of copiers sold.
Correlation coefficient: r = .865.

18 Regression Analysis - Example
LO13-3 Regression Analysis - Example
Step 1: Find the slope (b) of the line: b = 0.2606.
Step 2: Find the y-intercept (a): a = Ȳ - b X̄.
Step 3: Create the regression equation: Number of Copiers Sold = a + 0.2606 (Number of Sales Calls).
Step 4: What is the predicted number of copiers sold if someone makes 20 sales calls? Number of Copiers Sold = a + 0.2606 (20).
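A minimal Python sketch of these steps, again with illustrative placeholder data rather than the textbook sample:

import numpy as np
from scipy import stats

# Illustrative placeholder data (NOT the textbook's sample).
calls = np.array([20, 40, 20, 30, 10, 10, 20, 20, 20, 30, 40, 30, 20, 50, 30])
copiers = np.array([30, 60, 40, 60, 30, 40, 40, 50, 30, 70, 60, 40, 40, 80, 70])

fit = stats.linregress(calls, copiers)     # least squares slope, intercept, r, p-value, std err
a, b = fit.intercept, fit.slope

predicted = a + b * 20                     # predicted copiers sold for a rep who makes 20 calls
print(f"copiers sold = {a:.2f} + {b:.4f} * calls; prediction at 20 calls = {predicted:.1f}")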

19 Regression Analysis ANOVA (Excel) – Example
LO13-3 Regression Analysis ANOVA (Excel) – Example
The Excel regression output reports the intercept, a, and the slope, b, which give the regression equation: Number of Copiers Sold = a + b (Number of Sales Calls).
* Note that the small differences between the Excel values of a and b and the hand-calculated values are due to rounding.

20 Regression Analysis: Testing the Significance of the Slope – Example
LO13-4 Evaluate the significance of the slope of the regression equation. Regression Analysis: Testing the Significance of the Slope – Example
Step 1: State the null and alternate hypotheses.
H0: β = 0 (the slope of the regression equation is 0)
H1: β ≠ 0 (the slope of the regression equation is not 0)
Step 2: Select a level of significance. We select a .05 level of significance.
Step 3: Identify the test statistic. To test a hypothesis about the slope of a regression equation, we use the t-statistic. For this analysis, there will be n-2 degrees of freedom.
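A standard form of this test statistic (assumed here, since the slide's formula did not transfer):

t = \frac{b - 0}{s_b}, \qquad df = n - 2

where b is the estimated slope and s_b is its standard error, as reported in the Excel regression output.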

21 Regression Analysis: Testing the Significance of the Slope – Example
Step 4: Formulate a decision rule. Reject H0 if:
t > tα/2, n-2 or t < -tα/2, n-2
t > t0.025,13 or t < -t0.025,13
t > 2.160 or t < -2.160

22 Regression Analysis: Testing the Significance of the Slope – Example
Step 5: Take a sample, calculate the ANOVA (Excel), arrive at a decision. The Excel output reports the t statistic and p-value for the slope; the t statistic exceeds the critical value of 2.160.
Decision: Reject the null hypothesis that the slope of the regression equation is equal to zero.
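The same slope test can be reproduced in Python with statsmodels; a minimal sketch with placeholder data (not the textbook sample) follows.

import numpy as np
import statsmodels.api as sm

# Illustrative placeholder data (NOT the textbook's sample).
calls = np.array([20, 40, 20, 30, 10, 10, 20, 20, 20, 30, 40, 30, 20, 50, 30])
copiers = np.array([30, 60, 40, 60, 30, 40, 40, 50, 30, 70, 60, 40, 40, 80, 70])

X = sm.add_constant(calls)            # adds the intercept column to the design matrix
model = sm.OLS(copiers, X).fit()      # ordinary least squares fit

# The summary table reports the intercept (a), the slope (b), and the t statistic
# and p-value for the hypothesis that each coefficient equals zero.
print(model.summary())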

23 Regression Analysis: Testing the Significance of the Slope – Example
Step 6: Interpret the result. For the regression equation that predicts the number of copier sales based on the number of sales calls, the data indicate that the slope, (0.2606), is not equal to zero. Therefore, the slope can be interpreted and used to relate the dependent variable (number of copier sales) to the independent variable (number of sales calls). In fact, the value of the slope indicates that for an increase of 1 sales call, the number of copiers sold will increase by 0.2606. If a salesperson increases their number of sales calls by 10, the value of the slope indicates that the number of copiers sold is predicted to increase by 2.606, or about 2.6 copiers. As in correlation analysis, please note that this statistical analysis does not provide any evidence of a causal relationship. Another type of study is needed to test that hypothesis.

24 Regression Analysis: The Standard Error of Estimate
LO13-5 Evaluate a regression equation's ability to predict using the standard error of estimate and the coefficient of determination. Regression Analysis: The Standard Error of Estimate
The standard error of estimate measures the scatter, or dispersion, of the observed values around the line of regression for a given value of X. The standard error of estimate is important in the calculation of confidence and prediction intervals.
Formula used to compute the standard error:
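In one standard textbook form (assumed here; the slide's formula image did not transfer):

s_{y \cdot x} = \sqrt{\frac{\sum (y - \hat{y})^{2}}{n-2}} = \sqrt{\frac{SSE}{n-2}}

where ŷ is the value predicted by the regression equation and SSE is the error (residual) sum of squares.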

25 Regression Analysis ANOVA: The Standard Error of Estimate – Example
LO13-5 Regression Analysis ANOVA: The Standard Error of Estimate – Example Recall the example involving Copier Sales of America. The sales manager determined the least squares regression equation. Determine the standard error of estimate as a measure of how well the values fit the regression line.

26 Regression Analysis: Coefficient of Determination
LO13-5 Regression Analysis: Coefficient of Determination The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X). It is the square of the coefficient of correlation. It ranges from 0 to 1. It does not provide any information on the direction of the relationship between the variables.

27 Regression Analysis: Coefficient of Determination – Example
LO13-5 Regression Analysis: Coefficient of Determination – Example The coefficient of determination, r², is 0.748. It can be computed as the correlation coefficient squared: (0.865)² = 0.748. The coefficient of determination is expressed as a proportion or percent; we say that 74.8 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls.

28 Regression Analysis ANOVA: Coefficient of Determination – Example
LO13-5 Regression Analysis ANOVA: Coefficient of Determination – Example The Coefficient of Determination can also be computed based on its definition: the Regression Sum of Squares (the variation in the dependent variable explained by the regression equation) divided by the Total Sum of Squares (the total variation in the dependent variable).
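Written as a formula (a standard form, assumed here since the slide's ANOVA figure did not transfer):

r^{2} = \frac{SSR}{SS\,\text{Total}} = 1 - \frac{SSE}{SS\,\text{Total}}

where SSR is the regression sum of squares, SSE is the error sum of squares, and SS Total = SSR + SSE.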

29 Regression Analysis: Computing Interval Estimates for Y
LO13-6 Calculate and interpret confidence and prediction intervals. Regression Analysis: Computing Interval Estimates for Y
A regression equation is used to predict or estimate the population value of the dependent variable, Y, for a given X. In general, estimates of population parameters are subject to sampling error. Recall that confidence intervals account for sampling error by providing an interval estimate of a population parameter. In regression analysis, interval estimates complement the point estimate of Y for a given X by accounting for sampling error. There are two types of intervals:
A confidence interval reports the interval estimate for the mean value of Y for a given X.
A prediction interval reports the interval estimate for an individual value of Y for a particular value of X.

30 Assumptions underlying linear regression:
LO13-6 Regression Analysis: Computing Interval Estimates for Y
Assumptions underlying linear regression:
For each value of X, the Y values are normally distributed.
The means of these normal distributions of Y values all lie on the regression line.
The standard deviations of these normal distributions are equal.
The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.

31 Regression Analysis: Computing Interval Estimates for Y – Example
LO13-6 Regression Analysis: Computing Interval Estimates for Y – Example
We return to the Copier Sales of America illustration. Determine a 95 percent confidence interval for all sales representatives who make 50 sales calls, that is, for the population mean number of copiers sold.
*Note the values of "a" and "b" differ from the EXCEL values due to rounding.
Thus, after rounding, the 95% confidence interval for all sales representatives who make 50 calls runs from 27 up to 39 copiers. To interpret: for all sales representatives who make 50 calls, the predicted mean number of copiers sold is 33, and the mean sales will range from 27 to 39 copiers.

32 Regression Analysis: Computing Interval Estimates for Y – Example
LO13-6 Regression Analysis: Computing Interval Estimates for Y – Example
Comments on calculation: The t-statistic is based on a two-tailed test with n – 2 = 15 – 2 = 13 degrees of freedom. The only new value is Σ(x − x̄)², the sum of the squared deviations of X from its mean. Note that the width of the interval, or the margin of error when predicting the dependent variable, is related to the standard error of the estimate.
*Note the values of "a" and "b" differ from the EXCEL values due to rounding.
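A standard form of the confidence interval for the mean of Y at a given X (assumed here, since the slide's formula did not transfer):

\hat{y} \pm t_{\alpha/2,\,n-2}\; s_{y \cdot x} \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^{2}}{\sum (x - \bar{x})^{2}}}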

33 Regression Analysis: Computing Interval Estimates for Y – Example
LO13-6 Regression Analysis: Computing Interval Estimates for Y – Example
We return to the Copier Sales of America illustration. Determine a 95 percent prediction interval for an individual sales representative, such as Sheila Baker, who makes 50 sales calls.
*Note the values of "a" and "b" differ from the EXCEL values due to rounding.
Thus, after rounding, the prediction interval for the number of copiers sold by an individual salesperson, such as Sheila Baker, who makes 50 sales calls runs from 17 up to 49 copiers. This interval is quite large. It is much larger than the confidence interval for all sales representatives who made 50 calls. It is logical, however, that there should be more variation in the sales estimate for an individual than for the mean of a group.

34 Regression Analysis: Computing Interval Estimates for Y – Example
LO13-6 Regression Analysis: Computing Interval Estimates for Y – Example
Comments on calculation: The t-statistic is based on a two-tailed test with n – 2 = 15 – 2 = 13 degrees of freedom. Note that the width of the interval, or the margin of error when predicting the dependent variable, is related to the standard error of the estimate. Also note that the prediction interval is wider than the confidence interval because 1 is added to the sum under the square root sign.
*Note the values of "a" and "b" differ from the EXCEL values due to rounding.
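A standard form of the prediction interval for an individual Y at a given X (assumed here, since the slide's formula did not transfer); the extra 1 under the radical is what makes it wider than the confidence interval:

\hat{y} \pm t_{\alpha/2,\,n-2}\; s_{y \cdot x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^{2}}{\sum (x - \bar{x})^{2}}}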

35 LO13-6 Regression Analysis: Computing Interval Estimates for Y – Minitab Illustration: Confidence Intervals and Prediction Intervals

36 Regression Analysis: Transforming Non-linear Relationships
LO13-7 Use a log function to transform a nonlinear relationship. Regression Analysis: Transforming Non-linear Relationships One of the assumptions of regression analysis is that the relationship between the dependent and independent variables is LINEAR. Sometimes, two variables have a NON-LINEAR relationship. When this occurs, the data can be transformed to create a linear relationship. Regression analysis is then applied to the transformed variables.
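A minimal Python sketch of this transformation; the price and sales numbers below are illustrative placeholders with a curved relationship, not data from the text.

import numpy as np
from scipy import stats

# Illustrative placeholder data with a nonlinear (curved) price-sales relationship.
price = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
sales = np.array([900.0, 560.0, 420.0, 300.0, 210.0, 150.0, 110.0, 80.0, 55.0])

log_sales = np.log10(sales)                 # transform the dependent variable
fit = stats.linregress(price, log_sales)    # fit a straight line to (price, log sales)

# Predictions come back on the log scale; undo the transform to report sales units.
predicted_sales_at_3 = 10 ** (fit.intercept + fit.slope * 3.0)
print(f"log10(sales) = {fit.intercept:.3f} + {fit.slope:.3f} * price; "
      f"predicted sales at a price of 3.0 is about {predicted_sales_at_3:.0f}")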

37 Regression Analysis: Transforming Non-Linear Relationships
LO13-7 Regression Analysis: Transforming Non-Linear Relationships In this case, the dependent variable, sales, is transformed to log(sales). The graph shows that the relationship between log(sales) and price is linear. Now regression analysis can be used to create the regression equation between log(sales) and price.

