
3 What You Should Learn: construct scatter plots and interpret correlation; use scatter plots and a graphing utility to find linear models for data. NOTE: You really need to read all of these slides so that you have an idea of what we are doing next class. Only record vocabulary in your notes.

4 Scatter Plots and Correlation

5 Many real-life situations involve finding relationships between two variables, such as the year and the number of employees in the cellular telecommunications industry. In a typical situation, data are collected and written as a set of ordered pairs. The graph of such a set is called a scatter plot.

6 Example 1 – Constructing a Scatter Plot The data in the table show the numbers E (in thousands) of employees in the cellular telecommunications industry in the United States from 2002 through 2007. Construct a scatter plot of the data.

7 Example 1 – Solution Begin by representing the data with a set of ordered pairs. Let t represent the year, with t = 2 corresponding to 2002. The ordered pairs are (2, 192), (3, 206), (4, 226), (5, 233), (6, 254), (7, 267). Then plot each point in a coordinate plane, as shown in Figure 1.66.
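The first step of the solution can be sketched in Python (an assumption for illustration; the slides themselves use a table and a graphing utility). The names `years`, `employees`, and `pairs` are hypothetical, and the values are the ones listed in the solution.

```python
# Employee counts E (in thousands) for 2002 through 2007, from Example 1.
years = [2002, 2003, 2004, 2005, 2006, 2007]
employees = [192, 206, 226, 233, 254, 267]

# Shift the year so that t = 2 corresponds to 2002.
pairs = [(year - 2000, e) for year, e in zip(years, employees)]

# Each pair (t, E) is one point of the scatter plot.
for t, e in pairs:
    print(f"t = {t}, E = {e}")
```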

8 Scatter Plots and Correlation A mathematical equation that approximates the relationship between t and E, such as the one suggested by Figure 1.66, is called a mathematical model. When developing a mathematical model to describe a set of data, you strive for two (often conflicting) goals: accuracy and simplicity. For the data above, a linear model of the form E = at + b (where a and b are constants) appears to be best. It is simple and relatively accurate.

9 Scatter Plots and Correlation Consider a collection of ordered pairs of the form (x, y). If y tends to increase as x increases, then the collection is said to have a positive correlation. If y tends to decrease as x increases, then the collection is said to have a negative correlation.

10 Scatter Plots and Correlation Figure 1.67 shows three examples: one with a positive correlation, one with a negative correlation, and one with no (discernible) correlation. [Figure 1.67: three scatter plots labeled Positive correlation, Negative correlation, and No correlation]

11 Example 2 – Interpreting Correlation On a Friday, 22 students in a class were asked to record the numbers of hours they spent studying for a test on Monday and the numbers of hours they spent watching television. The results are shown below. (The first coordinate is the number of hours and the second coordinate is the score obtained on the test.) Study Hours: (0, 40), (1, 41), (2, 51), (3, 58), (3, 49), (4, 48), (4, 64), (5, 55), (5, 69), (5, 58), (5, 75), (6, 68), (6, 63), (6, 93), (7, 84), (7, 67), (8, 90), (8, 76), (9, 95), (9, 72), (9, 85), (10, 98)

12 Example 2 – Interpreting Correlation (cont’d) TV Hours: (0, 98), (1, 85), (2, 72), (2, 90), (3, 67), (3, 93), (3, 95), (4, 68), (4, 84), (5, 76), (7, 75), (7, 58), (9, 63), (9, 69), (11, 55), (12, 58), (14, 64), (16, 48), (17, 51), (18, 41), (19, 49), (20, 40) a. Construct a scatter plot for each set of data. b. Determine whether the points are positively correlated, are negatively correlated, or have no discernible correlation. What can you conclude?

13 Example 2(a) – Solution Scatter plots for the two sets of data are shown in Figure 1.68.

14 Example 2(b) – Solution The scatter plot relating study hours and test scores has a positive correlation. This means that the more a student studied, the higher his or her score tended to be. The scatter plot relating television hours and test scores has a negative correlation. This means that the more time a student spent watching television, the lower his or her score tended to be.

15 Fitting a Line to Data

16 Fitting a Line to Data Finding a linear model to represent the relationship described by a scatter plot is called fitting a line to data. The resulting line is often called the “line of best fit.”

17 Example 3 – Fitting a Line to Data Find a linear model that relates the year to the number of employees in the cellular telecommunications industry in the United States.

18 Example 3 – Solution Let t represent the year, with t = 2 corresponding to 2002. After plotting the data in the table, draw the line that you think best represents the data, as shown in Figure 1.69. Two points that lie on this line are (3, 206) and (6, 254). [Figure 1.69: Cellular Telecommunications Industry]

19 Example 3 – Solution (cont’d) The slope of the line through (3, 206) and (6, 254) is (254 – 206)/(6 – 3) = 16. Using the point-slope form, you can find the equation of the line to be E = 16(t – 3) + 206 = 16t + 158 (linear model).
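As a check on the point-slope computation, here is a minimal Python sketch. The helper `line_through` is a hypothetical name introduced for illustration, and it assumes the two points do not lie on a vertical line.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the non-vertical line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    # Point-slope form y = slope * (x - x1) + y1, expanded to slope * x + intercept.
    intercept = y1 - slope * x1
    return slope, intercept


# The two points read off the hand-drawn line in Example 3.
slope, intercept = line_through((3, 206), (6, 254))
print(f"E = {slope:g}t + {intercept:g}")  # prints "E = 16t + 158"
```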

20 Fitting a Line to Data Once you have found a model, you can measure how well the model fits the data by comparing the actual values with the values given by the model, as shown in the following table. The sum of the squares of the differences between the actual values and the model values is called the sum of the squared differences.

21 Fitting a Line to Data The model that has the least sum is called the least squares regression line for the data. For the model in Example 3, the sum of the squared differences is 54. The least squares regression line for the data is E = 15.0t + 162 (best-fitting linear model). Its sum of squared differences is 37. Another way to find a linear model to represent the relationship described by a scatter plot is to enter the data points into a graphing utility and use the linear regression feature. This method is demonstrated in Example 4.
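The two sums quoted above can be reproduced with a short Python sketch that uses the ordered pairs from Example 1. The function names are hypothetical, and the least squares formulas used are the standard ones (slope = Sxy/Sxx), which may not match how a particular graphing utility computes its result internally.

```python
# Ordered pairs (t, E) from Example 1.
data = [(2, 192), (3, 206), (4, 226), (5, 233), (6, 254), (7, 267)]


def sum_squared_differences(points, slope, intercept):
    """Sum of squared differences between actual values and model values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def least_squares_line(points):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


hand_sse = sum_squared_differences(data, 16, 158)       # model from Example 3
slope, intercept = least_squares_line(data)             # unrounded coefficients
rounded_sse = sum_squared_differences(data, 15.0, 162)  # rounded regression line

print(hand_sse, rounded_sse)
print(f"E = {slope:.1f}t + {intercept:.0f}")
```

The unrounded coefficients round to the slide's E = 15.0t + 162, and the sum of 37 corresponds to the rounded coefficients.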

22 Example 4 – A Mathematical Model The data in the table show the estimated numbers V (in thousands) of alternative-fueled vehicles in use in the United States from 2001 through 2007. (Source: Energy Information Administration)

23 Example 4 – A Mathematical Model (cont’d) a. Use the regression feature of a graphing utility to find a linear model for the data. Let t represent the year, with t = 1 corresponding to 2001. b. How closely does the model represent the data?

24 Example 4(a) – Solution Using the linear regression feature of a graphing utility, you can find that a linear model for the data is V = 42.8t + 388.

25 Example 4(b) – Solution You can see how well the model fits the data by comparing the actual values of V with the values of V given by the model, which are labeled V* in the table below. From the table, you can see that the model appears to be a good fit for the actual data.

26 Fitting a Line to Data When you use the regression feature of a graphing calculator or computer program to find a linear model for data, you will notice that the program may also output an “r-value.” For instance, the r-value from Example 4 was r ≈ 0.994. This r-value is the correlation coefficient of the data and gives a measure of how well the model fits the data. The correlation coefficient r varies between –1 and 1. The closer |r| is to 1, the better the points can be described by a line.
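The table for Example 4 is not reproduced in these slides, so its r-value cannot be recomputed here. As an illustration only, the sketch below applies the standard Pearson formula to the employee data from Example 1 and the TV-hours data from Example 2. The function name `correlation` is hypothetical.

```python
import math


def correlation(points):
    """Pearson correlation coefficient r of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


# (t, E) pairs from Example 1: E rises with t, so r should be close to 1.
employees = [(2, 192), (3, 206), (4, 226), (5, 233), (6, 254), (7, 267)]

# (TV hours, test score) pairs from Example 2: scores fall as TV hours rise.
tv_hours = [(0, 98), (1, 85), (2, 72), (2, 90), (3, 67), (3, 93), (3, 95),
            (4, 68), (4, 84), (5, 76), (7, 75), (7, 58), (9, 63), (9, 69),
            (11, 55), (12, 58), (14, 64), (16, 48), (17, 51), (18, 41),
            (19, 49), (20, 40)]

print(f"employees: r = {correlation(employees):.3f}")
print(f"TV hours:  r = {correlation(tv_hours):.3f}")
```

The sign of r matches the direction of the correlation: positive for the employee data and negative for the TV-hours data.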

27 Fitting a Line to Data Three examples are shown in Figure 1.72. [Figure 1.72: three scatter plots with r = 0.972, r = –0.856, and r = 0.190]

29 Classifying Scatter Plots In real life, many relationships between two variables are parabolic. A scatter plot can be used to give you an idea of which type of model will best fit a set of data.

30 Example 1 – Classifying Scatter Plots Decide whether each set of data could be better modeled by a linear model, $y = ax + b$, a quadratic model, $y = ax^2 + bx + c$, or neither.

31 Example 1 – Classifying Scatter Plots (cont’d) a. (0.9, 1.7), (1.2, 2.0), (1.3, 1.9), (1.4, 2.1), (1.6, 2.5), (1.8, 2.8), (2.1, 3.0), (2.5, 3.4), (2.9, 3.7), (3.2, 3.9), (3.3, 4.1), (3.6, 4.4), (4.0, 4.7), (4.2, 4.8), (4.3, 5.0) b. (0.9, 3.2), (1.2, 4.0), (1.3, 4.1), (1.4, 4.4), (1.6, 5.1), (1.8, 6.0), (2.1, 7.6), (2.5, 9.8), (2.9, 2.4), (3.2, 14.3), (3.3, 15.2), (3.6, 18.1), (4.0, 22.7), (4.2, 24.9), (4.3, 27.2) c. (0.9, 1.2), (1.2, 6.5), (1.3, 9.3), (1.4, 11.6), (1.6, 15.2), (1.8, 16.9), (2.1, 14.7), (2.5, 8.1), (2.9, 3.7), (3.2, 5.8), (3.3, 7.1), (3.6, 11.5), (4.0, 20.2), (4.2, 23.7), (4.3, 26.9)

32 Example 1(a) – Solution Begin by entering the data into a graphing utility. Then display the scatter plot, as shown in Figure 2.56. From the scatter plot, it appears the data follow a linear pattern. So, the data can be better modeled by a linear function.

33 Example 1(c) – Solution Enter the data into a graphing utility and then display the scatter plot (see Figure 2.58). From the scatter plot, it appears the data do not follow either a linear or a parabolic pattern. So, the data cannot be modeled by either a linear function or a quadratic function.

34 Fitting a Quadratic Model to Data

35 Example 2 – Fitting a Quadratic Model to Data A study was done to compare the speed x (in miles per hour) with the mileage y (in miles per gallon) of an automobile. The results are shown in the table. a. Use a graphing utility to create a scatter plot of the data. b. Use the regression feature of the graphing utility to find a model that best fits the data. c. Approximate the speed at which the mileage is the greatest.

36 Example 2(a) – Solution Begin by entering the data into a graphing utility and displaying the scatter plot, as shown in Figure 2.59. From the scatter plot, you can see that the data appear to follow a parabolic pattern.

37 Example 2(b) – Solution Using the regression feature of the graphing utility, you can find the quadratic model, as shown in Figure 2.60. So, the quadratic equation that best fits the data is given by $y = -0.0082x^2 + 0.75x + 13.5$ (quadratic model).

38 Example 2(c) – Solution Graph the data and the model in the same viewing window, as shown in Figure 2.61. Use the maximum feature or the zoom and trace features of the graphing utility to approximate the speed at which the mileage is greatest. You should obtain a maximum of approximately (46, 31). So, the speed at which the mileage is greatest is about 46 miles per hour.
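The maximum can also be checked without a graphing utility: a parabola that opens downward peaks at its vertex, x = –b/(2a). This is a small Python sketch using the coefficients of the quadratic model from part (b). The names `mileage`, `best_speed`, and `best_mileage` are hypothetical.

```python
# Coefficients of the quadratic model y = a*x^2 + b*x + c from Example 2(b).
a, b, c = -0.0082, 0.75, 13.5


def mileage(speed):
    """Mileage (miles per gallon) predicted by the quadratic model."""
    return a * speed ** 2 + b * speed + c


# Because a < 0, the parabola opens downward and the vertex is a maximum.
best_speed = -b / (2 * a)
best_mileage = mileage(best_speed)

print(f"maximum near ({best_speed:.1f}, {best_mileage:.1f})")
```

Rounded to whole numbers, this agrees with the point (46, 31) read from the graph.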

40 Classifying Scatter Plots A scatter plot can be used to give you an idea of which type of model will best fit a set of data.

41 Example 1 – Classifying Scatter Plots Decide whether each set of data could best be modeled by a linear model, $y = ax + b$, an exponential model, $y = ab^x$, or a logarithmic model, $y = a + b \ln x$. a. (2, 1), (2.5, 1.2), (3, 1.3), (3.5, 1.5), (4, 1.8), (4.5, 2), (5, 2.4), (5.5, 2.5), (6, 3.1), (6.5, 3.8), (7, 4.5), (7.5, 5), (8, 6.5), (8.5, 7.8), (9, 9), (9.5, 10) b. (2, 2), (2.5, 3.1), (3, 3.8), (3.5, 4.3), (4, 4.6), (4.5, 5.3), (5, 5.6), (5.5, 5.9), (6, 6.2), (6.5, 6.4), (7, 6.9), (7.5, 7.2), (8, 7.6), (8.5, 7.9), (9, 8), (9.5, 8.2)

42 Example 1 – Classifying Scatter Plots c. (2, 1.9), (2.5, 2.5), (3, 3.2), (3.5, 3.6), (4, 4.3), (4.5, 4.7), (5, 5.2), (5.5, 5.7), (6, 6.4), (6.5, 6.8), (7, 7.2), (7.5, 7.9), (8, 8.6), (8.5, 8.9), (9, 9.5), (9.5, 9.9) Solution: a. From Figure 3.41, it appears that the data can best be modeled by an exponential function.

43 Example 1 – Solution (cont’d) b. From Figure 3.42, it appears that the data can best be modeled by a logarithmic function. c. From Figure 3.43, it appears that the data can best be modeled by a linear function.

44 Fitting Nonlinear Models to Data

45 Example 2 – Fitting a Model to Data Fit the following data from Example 1(a) to an exponential model and a power model. Identify the coefficient of determination and determine which model fits the data better. (2, 1), (2.5, 1.2), (3, 1.3), (3.5, 1.5), (4, 1.8), (4.5, 2), (5, 2.4), (5.5, 2.5), (6, 3.1), (6.5, 3.8), (7, 4.5), (7.5, 5), (8, 6.5), (8.5, 7.8), (9, 9), (9.5, 10)

46 Example 2 – Solution Begin by entering the data into a graphing utility. Then use the regression feature of the graphing utility to find exponential and power models for the data, as shown in Figure 3.44. [Figure 3.44: Power Model, Exponential Model]

47 Example 2 – Solution (cont’d) So, an exponential model for the data is $y = 0.507(1.368)^x$, and a power model for the data is $y = 0.249x^{1.518}$. Plot the data and each model in the same viewing window, as shown in Figure 3.45. [Figure 3.45: Power Model, Exponential Model]

48 Example 2 – Solution (cont’d) To determine which model fits the data better, compare the coefficients of determination for each model. The model whose $r^2$-value is closest to 1 is the model that better fits the data. In this case, the better-fitting model is the exponential model.
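The slides do not say how the graphing utility computes these models. One common approach, assumed here, is to take logarithms so that each model becomes a line and then apply ordinary least squares: ln y = ln a + (ln b)x for the exponential model and ln y = ln a + b ln x for the power model. The sketch below follows that assumption; `fit_line` is a hypothetical helper, and the r² values it reports are those of the linearized fits.

```python
import math

# Data from Example 1(a).
data = [(2, 1), (2.5, 1.2), (3, 1.3), (3.5, 1.5), (4, 1.8), (4.5, 2),
        (5, 2.4), (5.5, 2.5), (6, 3.1), (6.5, 3.8), (7, 4.5), (7.5, 5),
        (8, 6.5), (8.5, 7.8), (9, 9), (9.5, 10)]


def fit_line(xs, ys):
    """Least squares fit ys ~ slope * xs + intercept; returns (slope, intercept, r^2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


xs = [x for x, _ in data]
log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for _, y in data]

# Exponential model y = a * b**x: fit ln y against x.
slope, intercept, r2_exp = fit_line(xs, log_y)
a_exp, b_exp = math.exp(intercept), math.exp(slope)

# Power model y = a * x**b: fit ln y against ln x.
b_pow, intercept, r2_pow = fit_line(log_x, log_y)
a_pow = math.exp(intercept)

print(f"exponential: y = {a_exp:.3f}({b_exp:.3f})^x, r^2 = {r2_exp:.3f}")
print(f"power:       y = {a_pow:.3f}x^{b_pow:.3f}, r^2 = {r2_pow:.3f}")
```

Under this assumption the coefficients round to the models quoted on the slides, and the exponential fit has the r² value closer to 1.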

49 Modeling with Exponential and Logistic Functions

50 Example 4 – Fitting an Exponential Model to Data The table below shows the amounts of revenue R (in billions of dollars) collected by the Internal Revenue Service (IRS) for selected years from 1963 through 2008. Use a graphing utility to find a model for the data. Then use the model to estimate the revenue collected in 2013. (Source: IRS Data Book)

51 Example 4 – Solution Let x represent the year, with x = 3 corresponding to 1963. Begin by entering the data into a graphing utility and displaying the scatter plot, as shown in Figure 3.48. Figure 3.48

52 Example 4 – Solution (cont’d) From the scatter plot, it appears that an exponential model is a good fit. Use the regression feature of the graphing utility to find the exponential model, as shown in Figure 3.49. Change the model to a natural exponential model, as follows. $R = 96.56(1.076)^x$ (write original model) $= 96.56e^{(\ln 1.076)x}$ (use $b = e^{\ln b}$) $\approx 96.56e^{0.073x}$ (simplify).

53 Example 4 – Solution (cont’d) Graph the data and the natural exponential model $R = 96.56e^{0.073x}$ in the same viewing window, as shown in Figure 3.50. From the model, you can see that the revenue collected by the IRS from 1963 through 2008 had an average annual increase of about 7%. From this model, you can estimate the 2013 revenue by substituting 53 for x in the natural exponential model.

54 Example 4 – Solution (cont’d) $R = 96.56e^{0.073x} = 96.56e^{0.073(53)} \approx 4624.7$, or about \$4624.7 billion, which is more than twice the amount collected in 2003. You can also use the value feature of the graphing utility to approximate the revenue in 2013 to be \$4624.7 billion, as shown in Figure 3.50.
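The conversion to a natural exponential model and the 2013 estimate can be checked with a few lines of Python. The names `k`, `revenue`, and `estimate_2013` are hypothetical, and the default rate 0.073 is the rounded exponent used on the slides.

```python
import math

# Exponential model from the regression feature: R = a * b**x, with x = 3 for 1963.
a, b = 96.56, 1.076

# Rewrite b**x as e**(k*x) using b = e**(ln b).
k = math.log(b)
print(f"ln(1.076) = {k:.4f}")  # rounds to the 0.073 used on the slides


def revenue(x, rate=0.073):
    """Revenue (in billions of dollars) from the natural exponential model."""
    return a * math.exp(rate * x)


# The year 2013 corresponds to x = 53.
estimate_2013 = revenue(53)
print(f"2013 estimate: {estimate_2013:.1f} billion dollars")
```

Calling `revenue(53, rate=k)` with the unrounded exponent gives a noticeably larger estimate, so the \$4624.7 billion figure depends on rounding the exponent to 0.073.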