Copyright © 2008 Pearson Education, Inc. Chapter 4 Descriptive Methods in Regression and Correlation



Definition 4.1

Key Fact 4.1 Figure 4.6

Table 4.2 displays data on age and price for a sample of cars of a particular make and model. We refer to the car as the Orion, but the data, obtained from the Asian Import edition of the Auto Trader magazine, are for a real car. Ages are in years; prices are in hundreds of dollars, rounded to the nearest hundred dollars.

Plotting the data in a scatterplot helps us visualize any apparent relationship between age and price. Generally speaking, a scatterplot (or scatter diagram) is a graph of data from two quantitative variables of a population. To construct a scatterplot, we use a horizontal axis for the observations of one variable and a vertical axis for the observations of the other. Each pair of observations is then plotted as a point.

Figure 4.7 shows a scatterplot for the age-price data in Table 4.2. Note that we use a horizontal axis for ages and a vertical axis for prices. Each age-price observation is plotted as a point. For instance, the second car in Table 4.2 is 4 years old and has a price of 103 ($10,300). We plot this age-price observation as the point (4, 103), shown in magenta in Fig. 4.7.

Figure 4.7

Although the age-price data points do not fall exactly on a line, they appear to cluster about a line. We want to fit a line to the data points and use that line to predict the price of an Orion based on its age. Because we could draw many different lines through the cluster of data points, we need a method to choose the "best" line. The method, called the least-squares criterion, is based on an analysis of the errors made in using a line to fit the data points. To introduce the least-squares criterion, we use a very simple data set in Example 4.3. We return to the Orion data shortly.

Example 4.3 (Table 4.3) Consider the problem of fitting a line to the four data points in Table 4.3. Many (in fact, infinitely many) lines can "fit" those four data points. Two possibilities are shown in Figs. 4.9(a) and 4.9(b).

Example 4.3 Figure 4.9

Example 4.3 To avoid confusion, we use ŷ to denote the y-value predicted by a line for a value of x. To measure quantitatively how well a line fits the data, we first consider the errors, e, made in using the line to predict the y-values of the data points. For instance, Line A predicts a y-value of ŷ = 3 when x = 2. The actual y-value for x = 2 is y = 2 (see Table 4.3). So, the error made in using Line A to predict the y-value of the data point (2, 2) is e = y − ŷ = 2 − 3 = −1, as seen in Fig. 4.9(a).

Example 4.3 (Table 4.4) In general, an error, e, is the signed vertical distance from the line to a data point. The fourth column of Table 4.4(a) shows the errors made by Line A for all four data points; the fourth column of Table 4.4(b) shows the same for Line B.
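The error bookkeeping just described can be sketched in code. This is a minimal illustration of the least-squares criterion only: the four data points and the two candidate lines below are hypothetical stand-ins, not the actual values of Table 4.3 or Lines A and B.

```python
# Hypothetical four data points standing in for Table 4.3.
points = [(1, 1), (2, 2), (3, 2), (4, 6)]

def sse(points, slope, intercept):
    """Sum of squared errors e = y - y_hat, where y_hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

# Two hypothetical candidate lines, echoing the role of Lines A and B.
line_a = sse(points, slope=1.0, intercept=1.0)   # y_hat = 1 + x
line_b = sse(points, slope=1.5, intercept=-0.5)  # y_hat = -0.5 + 1.5x

# The least-squares criterion prefers the line with the smaller SSE.
best = "A" if line_a < line_b else "B"
```

Here the second line wins because its squared errors total less; this is exactly the column-by-column comparison that Table 4.4 performs.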

Key Fact 4.2 Definition 4.2

Definition 4.3

Formula 4.1

Example 4.4 (Table 4.5) In the first two columns of Table 4.5, we repeat our data on age and price for a sample of 11 Orions.

Example 4.4
a. Determine the regression equation for the data.
b. Graph the regression equation and the data points.
c. Describe the apparent relationship between age and price of Orions.
d. Interpret the slope of the regression line in terms of prices for Orions.
e. Use the regression equation to predict the price of a 3-year-old Orion and a 4-year-old Orion.

Example 4.4, Solution
a. We first need to compute b1 and b0 by using Formula 4.1. We did so by constructing a table of values for x (age), y (price), xy, x², and their sums in Table 4.5. The slope of the regression line therefore is b1 = −20.26.

Example 4.4, Solution
a. The y-intercept is b0 = 195.47. So the regression equation is ŷ = 195.47 − 20.26x.
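The computation in part (a) uses the shortcut form of the least-squares formulas: the slope b1 is built from the column sums of x, y, xy, and x², and the intercept b0 from the two means. A sketch of that computation, using hypothetical age/price-style data rather than the actual Orion sample (the slide shows Formula 4.1 only as an image, so take the algebraic form here as the standard one, not a verbatim transcription):

```python
# Hypothetical (x, y) data; NOT the Orion sample from Table 4.5.
xs = [2, 3, 4, 5, 6, 7]
ys = [160, 140, 115, 95, 75, 55]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Slope from the column sums: b1 = (Sum(xy) - Sum(x)Sum(y)/n) / (Sum(x^2) - Sum(x)^2/n)
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# Intercept from the means: b0 = y-bar - b1 * x-bar
b0 = sum_y / n - b1 * (sum_x / n)
```

For this hypothetical table the slope comes out negative, as it does for the Orion data, since the y-values fall as x grows.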

Example 4.4, Solution
b. To graph the regression equation, we need to substitute two different x-values in the regression equation to obtain two distinct points. Let's use the x-values 2 and 8. The corresponding ŷ-values are 154.95 and 33.39. Therefore, the regression line goes through the two points (2, 154.95) and (8, 33.39). In Fig. 4.10, we plotted these two points with hollow dots. Drawing a line through the two hollow dots yields the regression line, the graph of the regression equation. Figure 4.10 also shows the data points from the first two columns of Table 4.5.

Figure 4.10

Example 4.4, Solution
c. Because the slope of the regression line is negative, price tends to decrease as age increases, which is no particular surprise.
d. Because x represents age, in years, and y represents price, in hundreds of dollars, the slope of −20.26 indicates that Orions depreciate an estimated $2026 per year, at least in the 2- to 7-year-old range.
e. For a 3-year-old Orion, x = 3, and the regression equation yields the predicted price ŷ = 195.47 − 20.26·3 = 134.69. Similarly, the predicted price for a 4-year-old Orion is ŷ = 195.47 − 20.26·4 = 114.43.
Interpretation: The estimated price of a 3-year-old Orion is $13,469, and the estimated price of a 4-year-old Orion is $11,443.

Extrapolation Suppose that a scatterplot indicates a linear relationship between two variables. Then, within the range of the observed values of the predictor variable, we can reasonably use the regression equation to make predictions for the response variable. However, to do so outside that range, which is called extrapolation, may not be reasonable because the linear relationship between the predictor and response variables may not hold there. Grossly incorrect predictions can result from extrapolation. The Orion example is a case in point. Its observed ages (values of the predictor variable) range from 2 to 7 years old. But suppose that we extrapolate to predict the price of an 11-year-old Orion. Using the regression equation, the predicted price is ŷ = 195.47 − 20.26·11 = −27.39,

Extrapolation (continued) or −$2739. Clearly, this result is ridiculous: no one is going to pay us $2739 to take away their 11-year-old Orion. Consequently, although the relationship between age and price of Orions appears to be linear in the range from 2 to 7 years old, it is definitely not so in the range from 2 to 11 years old. Figure 4.11 summarizes the discussion on extrapolation as it applies to age and price of Orions.
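The extrapolation warning is easy to demonstrate numerically. The sketch below uses the regression equation implied by the slides (the slope −20.26 is stated; the intercept 195.47 is inferred from the predicted points quoted earlier, so treat it as a reconstruction):

```python
def predicted_price(age):
    """Predicted Orion price in hundreds of dollars, per the slides' regression line."""
    return 195.47 - 20.26 * age

within_range = predicted_price(4)   # age 4 lies inside the observed 2-7 range
extrapolated = predicted_price(11)  # age 11 lies far outside that range
# A negative "price" signals that the linear fit has broken down out here.
```

The in-range prediction comes out near 114.43 ($11,443), matching the example, while the 11-year-old prediction is negative, which is precisely the absurd result the slide describes.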

Figure 4.11

Outliers and Influential Observations An outlier is an observation that lies outside the overall pattern of the data. In the context of regression, an outlier is a data point that lies far from the regression line, relative to the other data points. An outlier can sometimes have a significant effect on a regression analysis. We must also watch for influential observations. In regression analysis, an influential observation is a data point whose removal causes the regression equation (and line) to change considerably. A data point separated in the x-direction from the other data points is often an influential observation because the regression line is "pulled" toward such a data point without counteraction by other data points.

Outliers and Influential Observations For the Orion data, the data point (2, 169) might be an influential observation because the age of 2 years appears separated from the other observed ages. Removing that data point and recalculating yields a regression equation with slope −14.24. Figure 4.12 reveals that this equation differs markedly from the regression equation based on the full data set. The data point (2, 169) is indeed an influential observation. The influential observation (2, 169) is not a recording error; it is a legitimate data point. Nonetheless, we may need either to remove it, thus limiting the analysis to Orions between 4 and 7 years old, or to obtain additional data on 2- and 3-year-old Orions so that the regression analysis is not so dependent on one data point.
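The with/without comparison described above can be sketched directly: fit the least-squares line twice and compare slopes. The data below are hypothetical, constructed only to echo the situation of one x-separated point like (2, 169); they are not the Orion sample.

```python
def fit(points):
    """Least-squares slope and intercept via the shortcut formulas."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sx2 = sum(x * x for x, _ in points)
    b1 = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)
    return b1, sy / n - b1 * sx / n

# Hypothetical data with one point well separated in the x-direction.
data = [(2, 169), (5, 85), (5, 89), (6, 70), (6, 66), (7, 70), (7, 48)]
slope_full, _ = fit(data)
slope_without, _ = fit([p for p in data if p != (2, 169)])
# A marked change in slope flags (2, 169) as an influential observation.
```

Because the x-separated point has no nearby neighbors to counteract it, removing it changes the slope substantially; that is the "pull" the slide describes.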

Figure 4.12

Key Fact 4.3

Definition 4.4

Definition 4.5

Example 4.7 The scatterplot and regression line for the age and price data of 11 Orions are repeated in Fig. 4.15 on the next slide. The scatterplot reveals that the prices of the 11 Orions vary widely, ranging from a low of 48 ($4800) to a high of 169 ($16,900). But Fig. 4.15 also shows that much of the price variation is "explained" by the regression (or age); that is, the regression line, with age as the predictor variable, predicts a sizeable portion of the type of variation found in the prices. Make this qualitative statement precise by finding and interpreting the coefficient of determination for the Orion data.

Figure 4.15

Example 4.7, Solution (Table 4.6) To compute the total sum of squares, SST, we must first find the mean of the observed prices. Referring to the second column of Table 4.6, we get ȳ = 88.64.

Example 4.7, Solution (Table 4.6) After constructing the third column of Table 4.6, we calculate the entries for the fourth column and then find the total sum of squares, SST = Σ(y − ȳ)², which is the total variation in the observed prices.

Example 4.7, Solution (Table 4.7) To compute the regression sum of squares, SSR, we need the predicted prices and the mean of the observed prices. Each predicted price is obtained by substituting the age of the Orion in question for x in the regression equation. The third column of Table 4.7 shows the predicted prices for all 11 Orions.

Example 4.7, Solution Recalling that ȳ = 88.64, we construct the fourth column of Table 4.7. We then calculate the entries for the fifth column and obtain the regression sum of squares, SSR = Σ(ŷ − ȳ)², which is the variation in the observed prices explained by the regression. From SST and SSR, we compute the coefficient of determination, the percentage of variation in the observed prices explained by the regression (i.e., by the linear relationship between age and price for the sampled Orions): r² = SSR/SST = 0.853.
Interpretation: Evidently, age is quite useful for predicting price because 85.3% of the variation in the observed prices is explained by the regression of price on age.

Error Sum of Squares (Table 4.8) To compute SSE, we need the observed prices and the predicted prices. Both quantities are displayed in Table 4.7 and are repeated in the second and third columns of Table 4.8. From the final column, we get the error sum of squares, SSE = Σ(y − ŷ)²,

Error Sum of Squares (continued) which is the variation in the observed prices not explained by the regression. Because the regression line is the line that best fits the data according to the least-squares criterion, SSE is also the smallest possible sum of squared errors among all lines.
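Putting the three sums of squares together: SST measures total variation, SSR the part the fitted line explains, and SSE the leftover, with SST = SSR + SSE and r² = SSR/SST. A self-contained sketch on hypothetical data (not the Orion prices):

```python
# Hypothetical data; NOT the Orion sample.
xs = [2, 3, 4, 5, 6, 7]
ys = [160, 140, 115, 95, 75, 55]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares fit (deviation form of the slope formula).
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
y_hats = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hats)          # explained by the line
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # left unexplained
r_squared = ssr / sst  # coefficient of determination
```

For any least-squares fit the identity SST = SSR + SSE holds (up to floating-point rounding), which makes a useful sanity check when tabulating columns by hand as in Tables 4.6-4.8.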

Definition 4.6

Understanding the Linear Correlation Coefficient We now discuss some other important properties of the linear correlation coefficient, r. Keep in mind that r measures the strength of the linear relationship between two variables and that the following properties of r are meaningful only when the data points are scattered about a line.
- r reflects the slope of the scatterplot.
- The magnitude of r indicates the strength of the linear relationship.
- The sign of r suggests the type of linear relationship.
- The sign of r and the sign of the slope of the regression line are identical.
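The sign-matches-slope property above can be checked directly in code. The slides define r only via a placeholder image, so the sketch below uses the standard deviation-form definition r = Sxy / sqrt(Sxx · Syy) on hypothetical data:

```python
def pearson_r(xs, ys):
    """Linear correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical downward-trending data: r should be strongly negative,
# matching the negative slope of the corresponding regression line.
r = pearson_r([2, 3, 4, 5, 6, 7], [160, 140, 115, 95, 75, 55])
```

Because Sxy is also the numerator of the least-squares slope, r and the slope always share a sign; and for a simple linear fit, squaring r reproduces the coefficient of determination from the sums-of-squares decomposition.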

Understanding the Linear Correlation Coefficient (Figure 4.17) To graphically portray the meaning of the linear correlation coefficient, we present various degrees of linear correlation in Fig. 4.17.