
1 Functions and Applications
Copyright © Cengage Learning. All rights reserved.

1.4 Linear Regression
Copyright © Cengage Learning. All rights reserved.

Linear Regression
To find a linear model from two data points, we find the equation of the line that passes through them. However, we often have more than two data points, and they will rarely all lie on a single straight line, though they may come close to doing so. The problem is to find the line that comes closest to passing through all of the points.

Linear Regression
Suppose, for example, that we are conducting research for a company interested in expanding into Mexico. Of interest to us would be current and projected growth in that country's economy. The following table shows past and projected per capita gross domestic product (GDP) of Mexico for 2000–2014.

Linear Regression
A plot of these data (Figure 27(a)) suggests a roughly linear relationship between t and y, although the points clearly do not all lie on a single straight line.

Linear Regression
Figure 27(b) shows the points together with several lines, some fitting better than others. Can we precisely measure which lines fit better than others? For instance, which of the two lines labeled as "good" fits in Figure 27(b) models the data more accurately?

Linear Regression
We begin by considering, for each value of t, the difference between the actual GDP (the observed value) and the GDP predicted by a linear equation (the predicted value). The difference between the observed value and the predicted value is called the residual:

Residual = Observed Value – Predicted Value

Linear Regression
On the graph, the residuals measure the vertical distances between the (observed) data points and the line (Figure 28); they tell us how far the linear model is from predicting the actual GDP.

Linear Regression
The more accurate our model, the smaller the residuals should be. We can combine all the residuals into a single measure of accuracy by adding their squares. (We square the residuals in part to make them all positive.) The sum of the squares of the residuals is called the sum-of-squares error, SSE. Smaller values of SSE indicate more accurate models.

Linear Regression
Observed and Predicted Values
Suppose we are given a collection of data points (x₁, y₁), …, (xₙ, yₙ). The n quantities y₁, y₂, …, yₙ are called the observed y-values. If we model these data with a linear equation ŷ = mx + b, then the y-values we get by substituting the given x-values into the equation are called the predicted y-values:

ŷ₁ = mx₁ + b    (substitute x₁ for x)
ŷ₂ = mx₂ + b    (substitute x₂ for x)
…
ŷₙ = mxₙ + b.   (substitute xₙ for x)

ŷ stands for "estimated y" or "predicted y."

Linear Regression
Quick Example
Consider the three data points (0, 2), (2, 5), and (3, 6). The observed y-values are y₁ = 2, y₂ = 5, and y₃ = 6. If we model these data with the equation ŷ = x + 2.5, then the predicted values are:

ŷ₁ = x₁ + 2.5 = 0 + 2.5 = 2.5
ŷ₂ = x₂ + 2.5 = 2 + 2.5 = 4.5
ŷ₃ = x₃ + 2.5 = 3 + 2.5 = 5.5.

Linear Regression
Residuals and Sum-of-Squares Error (SSE)
If we model a collection of data (x₁, y₁), …, (xₙ, yₙ) with a linear equation ŷ = mx + b, then the residuals are the n quantities (Observed Value – Predicted Value):

(y₁ – ŷ₁), (y₂ – ŷ₂), …, (yₙ – ŷₙ).

The sum-of-squares error (SSE) is the sum of the squares of the residuals:

SSE = (y₁ – ŷ₁)² + (y₂ – ŷ₂)² + … + (yₙ – ŷₙ)².

Linear Regression
Quick Example
For the data and linear approximation given above, the residuals are:

y₁ – ŷ₁ = 2 – 2.5 = –0.5
y₂ – ŷ₂ = 5 – 4.5 = 0.5
y₃ – ŷ₃ = 6 – 5.5 = 0.5,

and so SSE = (–0.5)² + (0.5)² + (0.5)² = 0.75.
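As a quick check of this computation, here is a minimal Python sketch (plain Python, no libraries; the names are ours, not from the text) that reproduces the residuals and SSE above:

```python
# Residuals and SSE for the Quick Example data, under the model ŷ = x + 2.5.

points = [(0, 2), (2, 5), (3, 6)]

def predict(x):
    """Predicted y-value under the model ŷ = x + 2.5."""
    return x + 2.5

# Residual = observed value - predicted value, one per data point.
residuals = [y - predict(x) for x, y in points]
print(residuals)   # [-0.5, 0.5, 0.5]

# SSE is the sum of the squared residuals.
sse = sum(r ** 2 for r in residuals)
print(sse)         # 0.75, matching the text
```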

Example 1 – Computing SSE
Using the data above on the GDP in Mexico, compute SSE for the linear models y = 0.5t + 8 and y = 0.25t + 9. Which model is the better fit?

Solution:
We begin by creating a table showing the values of t, the observed (given) values of y, and the values predicted by the first model.

Example 1 – Solution (cont’d)
We now add two new columns for the residuals and their squares. SSE, the sum of the squares of the residuals, is then the sum of the entries in the last column: SSE = 8.

Example 1 – Solution (cont’d)
Repeating the process using the second model, y = 0.25t + 9, yields the following table. This time SSE = 2, so the second model is the better fit.

Example 1 – Solution (cont’d)
Figure 29 shows the data points and the two linear models in question.
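Since the GDP table itself is not reproduced here, the comparison procedure of Example 1 can be illustrated on the three-point data set from the Quick Example. A hedged Python sketch, where both candidate models are made up purely for illustration:

```python
# Comparing two candidate linear models by SSE, as in Example 1.
# The data are the Quick Example points; both candidate models below
# are hypothetical, chosen only to illustrate the comparison.

points = [(0, 2), (2, 5), (3, 6)]

def sse(points, m, b):
    """Sum-of-squares error of the line ŷ = m*x + b on the data."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)

sse_a = sse(points, 1.0, 2.5)   # ŷ = x + 2.5: SSE = 0.75, as computed above
sse_b = sse(points, 1.5, 2.0)   # ŷ = 1.5x + 2 (hypothetical): SSE = 0.25
print(sse_a, sse_b)             # the model with the smaller SSE fits better
```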

Linear Regression
Among all possible lines, there ought to be one with the least possible value of SSE, that is, the greatest possible accuracy as a model. The line (and there is only one such line) that minimizes the sum of the squares of the residuals is called the regression line, the least-squares line, or the best-fit line. To find the regression line, we need a way to find values of m and b that give the smallest possible value of SSE.

Linear Regression
Regression Line
The regression line (least-squares line, best-fit line) associated with the points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) is the line that gives the minimum value of SSE.

Linear Regression
The regression line is y = mx + b, where m and b are computed as follows:

m = (n(Σxy) – (Σx)(Σy)) / (n(Σx²) – (Σx)²)

b = (Σy – m(Σx)) / n

n = number of data points.

The quantities m and b are called the regression coefficients.

Linear Regression
Here, "Σ" means "the sum of." Thus, for example,

Σx = sum of the x-values = x₁ + x₂ + … + xₙ
Σxy = sum of the products = x₁y₁ + x₂y₂ + … + xₙyₙ
Σx² = sum of the squares of the x-values = x₁² + x₂² + … + xₙ².

On the other hand,

(Σx)² = square of Σx = square of the sum of the x-values.
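To make the formulas concrete, here is a minimal Python sketch that computes m and b from these sums. The function name regression_line is ours, not from the text; the data are the three points from the Quick Example.

```python
# A sketch of the regression-coefficient formulas above, in plain Python.
# The function name and structure are illustrative, not from the text.

def regression_line(points):
    """Return (m, b) for the least-squares line y = mx + b."""
    n = len(points)
    sum_x  = sum(x for x, _ in points)        # Σx
    sum_y  = sum(y for _, y in points)        # Σy
    sum_xy = sum(x * y for x, y in points)    # Σxy
    sum_x2 = sum(x ** 2 for x, _ in points)   # Σx²
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

m, b = regression_line([(0, 2), (2, 5), (3, 6)])
print(m, b)   # m ≈ 1.357, b ≈ 2.071
```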

Coefficient of Correlation

Coefficient of Correlation
If the data points do not all lie on one straight line, we would like to be able to measure how closely they can be approximated by a straight line. We know that SSE measures the sum of the squares of the deviations from the regression line; therefore it constitutes a measurement of what is called "goodness of fit." (For instance, if SSE = 0, then all the points lie on a straight line.) However, SSE depends on the units we use to measure y, and also on the number of data points (the more data points we use, the larger SSE tends to be).

Coefficient of Correlation
Thus, while we can (and do) use SSE to compare the goodness of fit of two lines to the same data, we cannot use it to compare the goodness of fit of one line to one set of data with that of another line to a different set of data. To remove this dependency, statisticians have found a related quantity that can be used to compare the goodness of fit of lines to different sets of data. This quantity, called the coefficient of correlation or correlation coefficient and usually denoted r, is always between –1 and 1. The closer r is to –1 or 1, the better the fit.

Coefficient of Correlation
For an exact fit, we would have r = –1 (for a line with negative slope) or r = 1 (for a line with positive slope). For a bad fit, we would have r close to 0. Figure 31 shows several collections of data points with their least-squares lines and the corresponding values of r.

Coefficient of Correlation
Correlation Coefficient
The coefficient of correlation of the n data points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) is

r = (n(Σxy) – (Σx)(Σy)) / [√(n(Σx²) – (Σx)²) · √(n(Σy²) – (Σy)²)].

It measures how closely the data points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) fit the regression line. (The value r² is sometimes called the coefficient of determination.)

Coefficient of Correlation
Interpretation
If r is positive, the regression line has positive slope; if r is negative, the regression line has negative slope. If r = 1 or –1, then all the data points lie exactly on the regression line; if r is close to ±1, then all the data points are close to the regression line. On the other hand, if r is not close to ±1, then the data points are not close to the regression line, and the fit is not a good one. As a general rule of thumb, a value of |r| less than around 0.8 indicates a poor fit of the data to the regression line.
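The interpretation above can be checked numerically. Here is a similar Python sketch of the correlation formula, again using the three points from the Quick Example (the helper name correlation is ours):

```python
# A sketch of the correlation-coefficient formula above, in plain Python.
from math import sqrt

def correlation(points):
    """Return r for the given data points, per the formula above."""
    n = len(points)
    sum_x  = sum(x for x, _ in points)        # Σx
    sum_y  = sum(y for _, y in points)        # Σy
    sum_xy = sum(x * y for x, y in points)    # Σxy
    sum_x2 = sum(x ** 2 for x, _ in points)   # Σx²
    sum_y2 = sum(y ** 2 for _, y in points)   # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

print(correlation([(0, 2), (2, 5), (3, 6)]))   # ≈ 0.996: close to 1, a good fit
```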

Example 3 – Computing the Coefficient of Correlation
Use the following table, which shows past and projected per capita gross domestic product (GDP) of Mexico for 2000–2014, to find the correlation coefficient. Is the regression line a good fit?

Example 3 – Solution
The formula for r requires Σx, Σx², Σxy, Σy, and Σy². Let's organize our work in the form of a table, where the original data are entered in the first two columns and the bottom row contains the column sums.

Example 3 – Solution (cont’d)
Substituting these values into the formula, we get a value of r close to 1. The fit is therefore a fairly good one; that is, the original points lie nearly along a straight line.
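In practice these computations are usually delegated to a library. As a cross-check of the hand computations in the sketches above, and assuming NumPy is installed, np.polyfit with degree 1 returns the least-squares slope and intercept, and np.corrcoef returns a matrix whose off-diagonal entries are r:

```python
# Cross-checking the formulas with NumPy (assumes NumPy is installed;
# the data are again the three Quick Example points).
import numpy as np

x = np.array([0.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 6.0])

m, b = np.polyfit(x, y, 1)     # degree-1 least-squares fit: slope, intercept
r = np.corrcoef(x, y)[0, 1]    # correlation coefficient

print(m, b)   # ≈ 1.357 and ≈ 2.071, matching the regression-line formulas
print(r)      # ≈ 0.996, matching the correlation formula
```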