Data Analysis: Linear Regression. Ernesto A. Diaz, Department of Mathematics, Redwood High School.


1 Data Analysis: Linear Regression. Ernesto A. Diaz, Department of Mathematics, Redwood High School

2 Let us pause for a few moments… What are we working on in this chapter?

3 Problem Statement: If we have a scatter plot that seems "linear", can we find an equation that generates similar data? How accurate will it be?

4 Regression: One important branch of inferential statistics, called regression analysis, is used to compare quantities or variables, to discover relationships that exist between them, and to formulate those relationships in useful ways.

5 Linear Regression: Once a scatter diagram has been produced, we can draw a curve that best fits the pattern exhibited by the sample points. The best-fitting curve for the sample points is called an estimated regression curve. If the points in the scatter diagram seem to lie approximately along a straight line, the relationship is assumed to be linear, and the line that best fits the data points is called the estimated regression line.

6 Linear Regression: Linear regression is the process of determining the linear relationship between two variables. If we assume that the best-fitting curve is a line, then the equation of that line will take the form y = mx + b, where m is the slope of the line and b is the y-coordinate of the y-intercept. To identify the estimated regression line, we must find the values of the "regression coefficients" m and b.

7 Regression, 1st Approach

8 2nd Approach: Med-Med Line

9 How do we evaluate accuracy? Root Mean Square Error (RMS); Sum of Squares of Residuals (SS_res)
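Both accuracy measures can be computed directly from the residuals, the vertical differences between each observed y and the line's prediction. The sketch below is illustrative only: the data and the candidate line are hypothetical, the function names are chosen here, and the root mean square error is assumed to divide by n (some texts divide by n - 2 instead).

```python
import math

def residuals(xs, ys, m, b):
    """Observed y minus the value predicted by the line y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

def ss_res(xs, ys, m, b):
    """Sum of the squares of the residuals."""
    return sum(e * e for e in residuals(xs, ys, m, b))

def rms_error(xs, ys, m, b):
    """Root mean square error, assuming division by n."""
    return math.sqrt(ss_res(xs, ys, m, b) / len(xs))

# Hypothetical data and a candidate line y = 2x + 1 (not from the slides)
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]
m, b = 2.0, 1.0

print(ss_res(xs, ys, m, b))     # only the third point misses the line, by 1
print(rms_error(xs, ys, m, b))
```

A smaller value of either measure means the candidate line sits closer to the data, which makes them a natural way to compare two lines fitted to the same points.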

10 3rd Approach: Least Squares. For each x-value in the data set, the corresponding y-value usually differs from the value it would have if the data point were exactly on the line. These differences are shown in the figure by vertical line segments. The most common procedure is to choose the line where the sum of the squares of all these differences is minimized. This is called the method of least squares, and the resulting line is called the least squares line.
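Written out, and assuming the line has the form y = mx + b (the name S for the quantity being minimized is a notational choice made here), the least squares criterion is:

```latex
S(m, b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
```

Each term is the square of one vertical difference, so the least squares line is the choice of m and b that makes S as small as possible.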

11 Linear Regression: Linear regression is the process of determining the linear relationship between two variables. The line of best fit (regression line or the least squares line) is the line such that the sum of the squares of the vertical distances from the line to the data points (on a scatter diagram) is a minimum.

12 Linear Regression Formulas: The least squares line (regression line) that provides the best fit to the data points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) has the equation
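A standard way to write the least squares coefficients, assuming the slope-intercept form y = mx + b, is given below. This is the usual textbook form and may differ in notation from the slide's own formula; all sums run over the n data points.

```latex
y = mx + b, \qquad
m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left( \sum x \right)^2}, \qquad
b = \frac{\sum y - m \sum x}{n}
```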

13 Med-Med vs. Least Squares: The Median-Median Line is sometimes called the resistant line because it is influenced very little by one or two "bad" data points. The Least Squares Line uses every point in its calculation, so it is affected by outliers.
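The difference in resistance can be seen on a small example. The sketch below is an illustration under stated assumptions: it uses one common three-group formulation of the median-median line (the slides do not spell out their exact procedure), the standard least squares formulas, and hypothetical data that lie exactly on y = 2x + 1 except for one deliberately "bad" last point.

```python
from statistics import median

def least_squares(xs, ys):
    """Slope and intercept of the least squares line y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

def med_med(xs, ys):
    """Median-median line (assumed three-group version)."""
    pts = sorted(zip(xs, ys))
    base, extra = divmod(len(pts), 3)
    # One leftover point goes to the middle group; two go to the outer groups.
    left_size = base + (1 if extra == 2 else 0)
    mid_size = base + (1 if extra == 1 else 0)
    groups = [pts[:left_size],
              pts[left_size:left_size + mid_size],
              pts[left_size + mid_size:]]
    # Summary point of each group: (median of x values, median of y values)
    (xl, yl), (xm, ym), (xr, yr) = [
        (median([x for x, _ in g]), median([y for _, y in g])) for g in groups
    ]
    m = (yr - yl) / (xr - xl)                        # slope from outer groups
    b = (yl + ym + yr - m * (xl + xm + xr)) / 3      # average of three intercepts
    return m, b

# Hypothetical data: y = 2x + 1, with the last point replaced by an outlier
xs = list(range(1, 10))
ys = [3, 5, 7, 9, 11, 13, 15, 17, 40]

print(med_med(xs, ys))        # stays on the underlying line y = 2x + 1
print(least_squares(xs, ys))  # slope is pulled upward by the outlier
```

Because each group is summarized by medians, the single extreme point never becomes a summary value, so the median-median line is unchanged, while the least squares slope rises from 2 to about 3.4.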

14 Example 1: Regression. Suppose that we wish to get an idea of how the number of hours preparing for a final exam relates to the score on the exam. Data is collected and shown below.

Hours   1   2   3   4   5   6   7   8   9  10
Score  50  62      74  70  86  78  90  96  94

15 Linear Regression: The first step in analyzing these data is to graph the results as shown in the scatter diagram on the next slide.

16 Scatter Diagram

17 Linear Regression: If we let x denote hours studying and y denote exam score in the data of the previous slide and assume that the best-fitting curve is a line, then the equation of that line will take the form y = mx + b, where m is the slope of the line and b is the y-coordinate of the y-intercept. To identify the estimated regression line, we must find the values of the "regression coefficients" m and b.

18 Example 1: Computing a Least Squares Line. Solution: The equation is

19 Estimated Regression Line

20 Example: Med-Med vs. Best Fit. Using Dobbie, find the estimated regression line using both methods.

Hours   1   2   3   4   5   6   7   8   9  10
Score  50  62      74  70  86  78  90  96  94

21 Example 2: Predicting from a Regression Line. Use the result from the previous example to predict the exam score for a student who studied 6.5 hours.
I) Med-Med: Use the equation and replace x with 6.5. Based on the given data, the student should score about 82%.
II) Best Fit: Use the equation and replace x with 6.5. Based on the given data, the student should score about 81%.

22 13.8 Linear Correlation and Regression. Copyright © 2005 Pearson Education, Inc.

23 Linear Correlation: Linear correlation is used to determine whether there is a linear relationship between two quantities and, if so, how strong the relationship is. The linear correlation coefficient, r, is a unitless measure that describes the strength of the linear relationship between two variables. If the value is positive, as one variable increases, the other increases. If the value is negative, as one variable increases, the other decreases. The value of r is always between -1 and 1, inclusive.

24 Scatter Diagrams: A visual aid used with correlation is the scatter diagram, a plot of points (bivariate data). The independent variable, x, generally is a quantity that can be controlled. The dependent variable, y, is the other variable. The value of r is a measure of how far a set of points varies from a straight line. The greater the spread, the weaker the correlation and the closer the r value is to 0.

25 Correlation

26 Correlation

27 Linear Correlation Coefficient: The formula to calculate the correlation coefficient (r) is as follows:
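In the sum-based form that matches the quantities tabulated in the worked example (Σx, Σy, Σx², Σy², Σxy, and n), the linear correlation coefficient is commonly written as below. This is the standard computational formula and is assumed, not confirmed, to be the one shown on the slide.

```latex
r = \frac{n \sum xy - \sum x \sum y}
         {\sqrt{n \sum x^2 - \left( \sum x \right)^2} \; \sqrt{n \sum y^2 - \left( \sum y \right)^2}}
```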

28 Example: Words Per Minute versus Mistakes. There are five applicants applying for a job as a medical transcriptionist. The following shows the results of the applicants when asked to type a chart. Determine the correlation coefficient between the words per minute typed and the number of mistakes.

Applicant   Words per Minute   Mistakes
Ellen       24                 8
George      67                 11
Phillip     53                 12
Kendra      41                 10
Nancy       34                 9

29 Solution: We will call the words typed per minute x and the mistakes y. List the values of x and y and calculate the necessary sums.

x (WPM)   y (Mistakes)   x^2      y^2   xy
24        8              576      64    192
67        11             4489     121   737
53        12             2809     144   636
41        10             1681     100   410
34        9              1156     81    306

Σx = 219, Σy = 50, Σx^2 = 10,711, Σy^2 = 510, Σxy = 2,281

30 Solution continued: The n in the formula represents the number of data pairs. Here n = 5.

31 Solution continued: Since 0.86 is fairly close to 1, there is a fairly strong positive correlation. This result suggests that the more words typed per minute, the more mistakes made.
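The value 0.86 can be checked numerically. The sketch below applies the standard sum-based formula for r to the five (words per minute, mistakes) pairs from the example; the function name correlation is an illustrative choice, not something taken from the slides.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient r from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from the typing example: Ellen, George, Phillip, Kendra, Nancy
wpm = [24, 67, 53, 41, 34]
mistakes = [8, 11, 12, 10, 9]

r = correlation(wpm, mistakes)
print(round(r, 2))
```

With these sums the numerator is 5(2,281) - (219)(50) = 455, and the result rounds to 0.86, in agreement with the slide.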

32 Linear Regression: Linear regression is the process of determining the linear relationship between two variables. The line of best fit (line of regression or the least squares line) is the line such that the sum of the squares of the vertical distances from the line to the data points is a minimum.

33 The Line of Best Fit Equation:

34 Example: Use the data in the previous example to find the equation of the line that relates the number of words per minute and the number of mistakes made while typing a chart. Graph the equation of the line of best fit on a scatter diagram that illustrates the set of bivariate points.

35 Solution: The sums from the previous example give the slope, m. We then find the y-intercept, b. Therefore, the line of best fit is y = 0.081x + 6.452
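As a cross-check of this slope and intercept, here is a minimal sketch that applies the standard least squares formulas to the five (words per minute, mistakes) pairs. The function name fit_line is an illustrative choice, and the last step assumes the slides rounded the slope to three decimals before computing the intercept.

```python
def fit_line(xs, ys):
    """Least squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Data from the typing example
wpm = [24, 67, 53, 41, 34]
mistakes = [8, 11, 12, 10, 9]

m, b = fit_line(wpm, mistakes)

# Intercept recomputed with the slope rounded to three decimals first
# (assumed to be how the slide's 6.452 was obtained)
b_from_rounded_m = (sum(mistakes) - round(m, 3) * sum(wpm)) / len(wpm)

print(round(m, 3), round(b, 3), round(b_from_rounded_m, 3))
```

Carrying full precision gives an intercept of about 6.437, while rounding the slope to 0.081 first gives 6.452, which matches the equation above. The two lines differ only slightly over the range of the data.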

36 Solution continued: To graph y = 0.081x + 6.452, plot at least two points and draw the graph.

x    y
10   7.262
20   8.072
30   8.882

37 Solution continued

