© 2008 Pearson Addison-Wesley. All rights reserved

Chapter 13: Statistics

13.1 Visual Displays of Data
13.2 Measures of Central Tendency
13.3 Measures of Dispersion
13.4 Measures of Position
13.5 The Normal Distribution
13.6 Regression and Correlation

Regression

One important branch of inferential statistics, called regression analysis, is used to compare quantities or variables, to discover relationships that exist between them, and to formulate those relationships in useful ways.

Regression

Suppose that we wish to get an idea of how the number of hours spent preparing for a final exam relates to the score on the exam. The data collected are shown below.

| Hours | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|-------|---|---|---|---|---|---|---|---|---|----|
| Score | 50 | 62 | [missing] | 74 | 70 | 86 | 78 | 90 | 96 | 94 |

Linear Regression

The first step in analyzing these data is to graph the results as shown in the scatter diagram on the next slide.

Linear Regression

Once a scatter diagram has been produced, we can draw a curve that best fits the pattern exhibited by the sample points. The best-fitting curve for the sample points is called an estimated regression curve. If the points in the scatter diagram seem to lie approximately along a straight line, the relationship is assumed to be linear, and the line that best fits the data points is called the estimated regression line.

Linear Regression

If we let x denote hours studying and y denote exam score in the data of the previous slide and assume that the best-fitting curve is a line, then the equation of that line will take the form y = ax + b, where a is the slope of the line and b is the y-coordinate of the y-intercept. To identify the estimated regression line, we must find the values of the "regression coefficients" a and b.

Linear Regression

For each x-value in the data set, the corresponding y-value usually differs from the value it would have if the data point were exactly on the line. These differences are shown in the figure by vertical line segments. The most common procedure is to choose the line where the sum of the squares of all these differences is minimized. This is called the method of least squares, and the resulting line is called the least squares line.
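The quantity being minimized can be made concrete with a short Python sketch. The function name `sum_squared_differences` and the four data points are hypothetical and do not come from the slides. The sketch only compares two candidate lines; it does not search for the minimum.

```python
def sum_squared_differences(xs, ys, a, b):
    """Sum of squared vertical differences between each point and y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# For this data the least squares line works out to y = 1.9x + 0.
best = sum_squared_differences(xs, ys, 1.9, 0.0)

# The alternative y = 2x passes exactly through three of the points,
# but its miss on the point (3, 5) gives a larger total.
other = sum_squared_differences(xs, ys, 2.0, 0.0)

print(best, other)
```

Passing through more points exactly does not make a line a better fit in the least squares sense. Only the total of the squared differences matters.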

Regression Coefficient Formulas

The least squares line $y' = ax + b$ that provides the best fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ has

$$a = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b = \frac{\sum y - a\sum x}{n}.$$
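The coefficient computation can be sketched in Python using the standard sum-based least squares formulas, which are written out in the comments. The function name `least_squares_line` and the sample points are hypothetical. The points are chosen to lie exactly on y = 2x + 1 so the result is easy to check by eye.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y' = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # a = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-values are equal, so the slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denominator

    # b = (Sum(y) - a*Sum(x)) / n
    b = (sum_y - a * sum_x) / n
    return a, b


# Hypothetical points lying exactly on y = 2x + 1.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the points are exactly collinear, the recovered slope and intercept match the line they were drawn from.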

Example: Computing a Least Squares Line

Find the equation of the least squares line for the hours and exam score data.

| Hours | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|-------|---|---|---|---|---|---|---|---|---|----|
| Score | 50 | 62 | [missing] | 74 | 70 | 86 | 78 | 90 | 96 | 94 |

Example: Predicting from a Least Squares Line

Use the result from the previous example to predict the exam score for a student who studied 6.5 hours.

Solution

Use the equation of the least squares line and replace x with 6.5. Based on the given data, the student should score about 81%.

Correlation

One common measure of the strength of the linear relationship in the sample is called the sample correlation coefficient, denoted r. It is calculated from the sample data according to the formula on the next slide.

Sample Correlation Coefficient Formula

In linear regression, the strength of the linear relationship is measured by the correlation coefficient

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}.$$

The value of r is always between –1 and 1, or perhaps equal to –1 or 1.
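A minimal Python sketch of the sample correlation coefficient, using the standard sum-based formula written out in the comments, might look like this. The function name `correlation_coefficient` and both data sets are hypothetical. The data sets are exactly linear so that the extreme values of r are visible.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # r = (n*Sum(xy) - Sum(x)*Sum(y))
    #     / (sqrt(n*Sum(x^2) - (Sum(x))^2) * sqrt(n*Sum(y^2) - (Sum(y))^2))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    # The denominator is zero when all x-values or all y-values are equal;
    # this sketch does not handle that case.
    return numerator / denominator


# Hypothetical data: points exactly on a rising line, then on a falling line.
r_up = correlation_coefficient([0, 1, 2, 3], [1, 3, 5, 7])
r_down = correlation_coefficient([0, 1, 2, 3], [7, 5, 3, 1])
print(r_up, r_down)
```

The rising data give r equal to 1 and the falling data give r equal to –1, up to floating-point rounding.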

Correlation Coefficient

Values of exactly 1 or –1 indicate that the least squares line goes exactly through all the data points. If r is close to 1 or –1, but not exactly equal, then the line comes "close," and the linear correlation between x and y is "strong." If r is equal, or nearly equal, to 0, there is no linear correlation or the correlation is weak. If r is neither close to 0 nor close to 1 or –1, we might describe the linear correlation as "moderate."

Correlation Coefficient

A positive value of r indicates that the linear relationship between x and y is direct; as x increases, y also increases. A negative value of r indicates that there is an inverse relationship between x and y; as x increases, y decreases.