Modeling a Linear Relationship Lecture 47 Secs. 13.1 – 13.3.1 Tue, Apr 25, 2006.



Bivariate Data
Data is called bivariate if each observation consists of a pair of values (x, y). x is the explanatory variable and y is the response variable. x is also called the independent variable; y is also called the dependent variable.

Scatterplots
Scatterplot – a display in which each observation (x, y) is plotted as a point in the xy-plane.

Example
Draw a scatterplot of the percent on-time arrivals vs. percent on-time departures for the 22 airports listed in Exercise 4.29, p. 252, and also in Exercise 13.5, p. 822 (OnTimeArrivals.xls). Does there appear to be a relationship? How can we tell? How would we describe that relationship?

Linear Association
Draw (or imagine) an oval around the data set. If the oval is tilted, then there is some linear association. If the oval is tilted upward from left to right, there is positive association. If the oval is tilted downward from left to right, there is negative association. If the oval is not tilted at all, there is no association.

Positive Linear Association
[scatterplot: y vs. x, oval tilted upward]

Negative Linear Association
[scatterplot: y vs. x, oval tilted downward]

No Linear Association
[scatterplot: y vs. x, oval not tilted]

Strong vs. Weak Association
The association is strong if the oval is narrow. The association is weak if the oval is wide.

Strong Positive Linear Association
[scatterplot: narrow, upward-tilted oval]

Weak Positive Linear Association
[scatterplot: wide, upward-tilted oval]

TI-83 - Scatterplots
To set up a scatterplot:
Enter the x values in L1.
Enter the y values in L2.
Press 2nd STAT PLOT.
Select Plot1 and press ENTER.

TI-83 - Scatterplots
The Stat Plot display appears.
Select On and press ENTER.
Under Type, select the first icon (a small image of a scatterplot) and press ENTER.
For XList, enter L1.
For YList, enter L2.
For Mark, select the one you want and press ENTER.

TI-83 - Scatterplots
To draw the scatterplot:
Press ZOOM. The Zoom menu appears.
Select ZoomStat (#9) and press ENTER. The scatterplot appears.
Press TRACE and use the arrow keys to inspect the individual points.

Example
Use the TI-83 to draw a scatterplot of the following (x, y) data.

Simple Linear Regression
To quantify the linear relationship between x and y, we wish to find the equation of the line that “best” fits the data. Typically, there will be many lines that all look pretty good. How do we measure how well a line fits the data?

Measuring the Goodness of Fit
Start with the scatterplot. Draw any line through the scatterplot. Measure the vertical distance from every point to the line. Each of these distances represents a deviation from the line, called a residual e.

Residuals
The i-th residual – the difference between the observed value of yᵢ and the predicted value of yᵢ. Use ŷᵢ for the predicted yᵢ. The formula for the i-th residual is eᵢ = yᵢ − ŷᵢ. Notice that the residual is positive if the data point is above the line and negative if the data point is below the line.

Measuring the Goodness of Fit
Find the sum of the squared residuals. The smaller the sum of squared residuals, the better the fit.
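The fit-measuring idea above can be sketched in a few lines of Python. This is only an illustration of the technique, not the deck's own code, and the data points and candidate coefficients below are made up (the slide's actual values did not survive the transcript).

```python
# Sum of squared residuals for a candidate line y-hat = a + b*x.
# Hypothetical data for illustration only.

def sum_squared_residuals(a, b, xs, ys):
    """Return the sum of (y - y_hat)^2 for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [1.2, 1.9, 3.2, 3.8]

# A line with a smaller sum of squared residuals fits the data better.
print(sum_squared_residuals(0.0, 1.0, xs, ys))
print(sum_squared_residuals(0.5, 0.8, xs, ys))
```

Comparing two candidate lines is then just a matter of comparing the two sums, exactly as in the worked example that follows.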

Example
Consider the following set of (x, y) data points.

Example
[scatterplot of the data points]

Least Squares Line
Let’s see how good the fit is for a first candidate line ŷ = a + bx, where ŷ represents the predicted value of y, not the observed value.

Sum of Squared Residuals
Begin with the data set. Compute the predicted values ŷ for the first candidate line. Compute the residuals, y − ŷ. Compute the squared residuals, (y − ŷ)². Then compute the sum of the squared residuals: Σ(y − ŷ)² = 2.00.

Sum of Squared Residuals
Now let’s see how good the fit is for a second candidate line. Beginning with the same data set, compute the predicted values ŷ, the residuals y − ŷ, and the squared residuals (y − ŷ)². This time the sum of the squared residuals is Σ(y − ŷ)² = 1.70.

Sum of Squared Residuals
We conclude that the second line (sum of squared residuals 1.70) is a better fit than the first (2.00).

Sum of Squared Residuals
[scatterplots showing each candidate line drawn through the data]

Least Squares Line
Least squares line – the line for which the sum of the squares of the residuals is as small as possible. The least squares line is also called the line of best fit or the regression line.
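Rather than trying candidate lines one at a time, the least squares line can be computed directly from the standard closed-form formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. A minimal sketch, using hypothetical data:

```python
# Least squares slope and intercept from the standard formulas.
# The data points here are hypothetical, for illustration only.

def least_squares(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
a, b = least_squares(xs, ys)
print(a, b)  # fitted intercept and slope
```

By construction, no other line through this data set has a smaller sum of squared residuals than the line these formulas produce.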

Example
For all the lines that one could draw through this data set, it turns out that 1.70 is the smallest possible value for the sum of the squares of the residuals.

Example
Therefore, the second candidate line is the regression line for this data set.

Regression Line
We will write the regression line as ŷ = a + bx, where a is the y-intercept and b is the slope. This is the usual slope-intercept form y = mx + b with the two terms rearranged and relabeled.

TI-83 – Computing Residuals
It is not hard to compute the residuals and the sum of their squares on the TI-83. (Later, we will see a faster method.)
Enter the x-values in list L1 and the y-values in list L2.
Compute a + b*L1 and store it in list L3 (the ŷ values).
Compute (L2 – L3)². This is a list of the squared residuals.
Compute sum(Ans). This is the sum of the squared residuals.
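For readers without a TI-83, the same step-by-step recipe can be mirrored in Python. The comments map each step to the calculator lists; the data and the coefficients a and b are placeholders, not values from the original slides.

```python
# Python mirror of the TI-83 residual recipe.
# Hypothetical data and candidate coefficients.

xs = [1, 2, 3, 4]   # list L1
ys = [2, 3, 5, 6]   # list L2
a, b = 0.5, 1.4     # candidate intercept and slope

y_hat = [a + b * x for x in xs]                        # a + b*L1 -> L3
sq_res = [(y - yh) ** 2 for y, yh in zip(ys, y_hat)]  # (L2 - L3)^2
sse = sum(sq_res)                                     # sum(Ans)
print(sse)
```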

TI-83 – Computing Residuals
Enter the example data set and use the candidate equation to compute the sum of squared residuals.

Prediction
Use the regression line to predict y when x = 4, x = 7, and x = 20.
Interpolation – using an x value within the observed extremes of the x values to predict y.
Extrapolation – using an x value beyond the observed extremes of the x values to predict y.
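Prediction from the fitted line is just plugging x into ŷ = a + bx. The sketch below also flags whether each prediction is interpolation or extrapolation; the coefficients and the observed x-range are hypothetical, chosen so that x = 4 and x = 7 fall inside the data while x = 20 falls outside it.

```python
# Predicting y from a regression line y_hat = a + b*x,
# flagging interpolation vs. extrapolation.
# a, b, and the observed x-range are hypothetical.

def predict(a, b, x):
    return a + b * x

a, b = 0.5, 1.4
x_min, x_max = 1, 10  # observed extremes of the x values

for x in (4, 7, 20):
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    print(x, predict(a, b, x), kind)
```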

Interpolation vs. Extrapolation
Interpolated values are more reliable than extrapolated values. The farther out the values are extrapolated, the less reliable they are.