Objective: Understanding and using linear regression. Answer the following questions: (c) If one house is larger in size than another, do you think it affects the price? (d) Take a guess: on average, how much would each additional 1000 square feet increase the price? (e) Would you guess that distance explains about 50% of the variability in airfares, about 65% of this variability, or about 85% of this variability?

What is the line that best “fits” or models this data? In other words, what constitutes a “good” line through a scatterplot?

We can model the relationship with a line and give its equation. No line can go through all the points, but it can still be a useful "model." The best line might not even pass through any of the points. We want to find the line that comes closer to all the points than any other line.

Residual Definition: the difference between the observed value and its associated predicted value. The residual tells us how far off the model's prediction is at that point. We always subtract the predicted value from the observed one: residual = observed − predicted.

A negative residual means the model made an overestimate; a positive residual means the model made an underestimate.

When we draw a line through the scatterplot, some residuals are positive and some are negative. If we add up all the residuals, what happens? The positives and negatives cancel, so the sum tells us nothing about how well the line fits. We faced the same issue when we calculated a standard deviation to measure spread. How do we deal with it? We square the residuals! Since squaring makes them all positive, we can now sum them; squaring also emphasizes the larger residuals. When we add up all the squared residuals, that sum indicates how well the line we drew fits the data. Do we want a small or large sum?
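A quick numerical sketch (with made-up protein/fat numbers, not the actual Burger King data) shows why raw residuals are useless as a measure of fit: for the least-squares line they sum to essentially zero, while the squared residuals give a positive total that we can try to make small.

```python
import numpy as np

# Hypothetical data: protein (x) and total fat (y) for a few menu items
x = np.array([12.0, 31.0, 19.0, 25.0, 6.0, 39.0])
y = np.array([9.0, 26.0, 20.0, 23.0, 3.0, 38.0])

# Fit the least-squares line y-hat = b0 + b1*x
b1, b0 = np.polyfit(x, y, deg=1)   # polyfit returns highest degree first
y_hat = b0 + b1 * x

residuals = y - y_hat            # observed minus predicted
raw_sum = residuals.sum()        # positives and negatives cancel: ~0
sse = (residuals ** 2).sum()     # squared residuals: all positive

print(round(raw_sum, 10), round(sse, 3))
```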

LINE OF BEST FIT: the line for which the sum of squared residuals is the smallest. Our line has the property that the variation of the data from the model is the smallest it can be for any straight-line model of the data. We say that this line "minimizes the sum of squared residuals," so the best-fit line is called the "least squares" line.
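To see "least squares" in action, a sketch (hypothetical data) compares the sum of squared residuals for the fitted line against several nearby candidate lines; no candidate beats the least-squares line.

```python
import numpy as np

def sse(x, y, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1*x."""
    return ((y - (b0 + b1 * x)) ** 2).sum()

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

b1, b0 = np.polyfit(x, y, deg=1)   # least-squares slope and intercept
best = sse(x, y, b0, b1)

# Nudge the intercept and slope: every alternative line does worse
others = [sse(x, y, b0 + db0, b1 + db1)
          for db0 in (-0.5, 0.0, 0.5)
          for db1 in (-0.2, 0.0, 0.2)
          if (db0, db1) != (0.0, 0.0)]

print(best <= min(others))  # the LSRL minimizes the squared residuals
```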

Correlation & the Line What we know about correlation can lead us to the equation of the linear model… Let’s look at scatterplots of standardized variables again: What line would you choose to model the relationship of standardized variables?

The line must go through the origin: in z-scores both means are 0, so the line must pass through the point (0, 0). The equation of a line that passes through the origin can be written as y = mx; in z-scores, ẑ_y = m·z_x. Here m is the slope, so moving over one unit in z_x corresponds to moving up m units in the predicted z-score of y. Many different slopes pass through the origin. Which one fits our data best? In other words, which slope minimizes the sum of the squared residuals? It turns out that the best choice for m is the correlation coefficient itself, r! So now we can write: ẑ_y = r·z_x.

What does it tell us? In moving one standard deviation away from the mean in x, we can expect to move about r standard deviations away from the mean in y. In general, moving any number of standard deviations in x moves r times that number of standard deviations in y.
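This claim is easy to check numerically: standardize both variables, fit a least-squares line to the z-scores, and the slope comes out equal to the correlation r while the intercept is 0 (hypothetical data below).

```python
import numpy as np

# Hypothetical paired data
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 12.0])
y = np.array([1.0, 3.0, 4.5, 6.0, 8.5, 10.0])

zx = (x - x.mean()) / x.std(ddof=1)   # standardize to z-scores
zy = (y - y.mean()) / y.std(ddof=1)

r = np.corrcoef(x, y)[0, 1]           # correlation coefficient
m, c = np.polyfit(zx, zy, deg=1)      # least-squares line through the z-scores

# The slope equals r, and the line passes through the origin (0, 0)
print(round(m - r, 10), round(c, 10))
```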

A scatterplot of house prices (in thousands of dollars) vs. house size (in thousands of square feet) shows a relationship that is straight, with only moderate scatter and no outliers. The correlation is strong and positive. If a house is one SD above the mean in size (making it about 2170 sq ft), how many SDs above the mean would you predict its sale price to be? What would you predict about the sale price of a house that's 2 SDs below average in size?

The regression line in real units: if we want predictions in real units, we don't want to convert to z-scores, apply the correlation, and convert back to the original units every time. Why not write an equation for the line in the data's own units? In algebra, you learned that an equation for a line was:

y = mx + b. Statisticians use slightly different notation: ŷ = b₀ + b₁x, where b₁ is the slope and b₀ is the y-intercept. The slope is b₁ = r·(s_y / s_x), and the intercept is b₀ = ȳ − b₁·x̄, so the regression line always passes through the point (x̄, ȳ).

y – price (in thousands of dollars); x – house size (in thousands of sq. ft). What does the slope mean? What are its units? How much can the homeowner expect the value of the house to increase if he builds an additional 2000 sq ft? How much would you expect to pay for a house of 3000 sq ft?

Calculating a Regression Equation step-by-step. Estimate the costs per person associated with traffic delays. Annual cost per person: Mean = , SD = . Peak-period freeway speed: Mean = mph, SD = mph. r = . Find the equation of the regression line and write a sentence interpreting your equation.
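The recipe needs only the two means, the two SDs, and r. The slide's actual numbers did not survive in this transcript, so the sketch below uses made-up values (clearly hypothetical) just to show the arithmetic of b₁ = r·(s_y/s_x) and b₀ = ȳ − b₁·x̄.

```python
# Hypothetical summary statistics (the slide's real numbers are missing):
# y: annual congestion cost per person ($), x: peak-period freeway speed (mph)
y_mean, y_sd = 410.0, 175.0
x_mean, x_sd = 48.0, 9.0
r = -0.90           # slower peak speeds go with higher costs

b1 = r * y_sd / x_sd           # slope: cost change per 1 mph of speed
b0 = y_mean - b1 * x_mean      # intercept: forces the line through (x-bar, y-bar)

print(f"cost-hat = {b0:.2f} + ({b1:.2f}) * speed")
```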

Summary of Residuals. A common theme in statistical modeling is to think of each data point as being composed of two parts: the part that is explained by the model (often called the fit) and the "leftover" part (often called the residual). In the context of least-squares regression, the fitted value for an observation is simply the y value that the regression line predicts for the x-value of that observation. The residual is the difference between the actual y value and the fitted value ŷ: residual = actual − fitted.

Data = Model + Residual, or Residual = Data − Model. In symbols: e = y − ŷ. We can make a "residual plot" in the hopes of finding "nothing."
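The decomposition can be verified directly: every observed value is exactly its fitted value plus its residual (hypothetical data below).

```python
import numpy as np

# Hypothetical data
x = np.array([3.0, 5.0, 8.0, 10.0, 14.0])
y = np.array([7.0, 12.0, 16.0, 22.0, 30.0])

b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x       # the part explained by the model
residual = y - fitted      # the leftover part, e = y - y-hat

# Data = Model + Residual, exactly, at every point
print(np.allclose(y, fitted + residual))
```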

The residual plot shown offers a good example of what a problem-free plot should look like: there is no fan shape or curved trend, the average of the residuals is zero, and the points are scattered evenly above and below the x-axis. Each residual represents the difference between the observed response Y and the value predicted by the regression line.

Accounting for Variation. The variation in the residuals is the key to assessing how well the model fits. All regression models fall between the two extremes of zero correlation and perfect correlation. Can we gauge where our model falls? Compare regression models with correlations 0.5 and −0.5 in terms of the strength of the linear relationship.

Since they differ only in direction, we can square the correlation coefficient to get r². r² gives us the fraction of the data's variation accounted for by the model, and 1 − r² is the fraction of the original variation left in the residuals. What does r² = 0 mean? What does r² = 69% mean?
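Numerically, r² really is the explained fraction: computing 1 − (residual variation)/(total variation) gives the same number as squaring the correlation (hypothetical data below).

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 3.0, 4.0, 6.0, 8.0, 9.0, 11.0])
y = np.array([2.0, 5.0, 4.0, 8.0, 9.0, 11.0, 12.0])

r = np.corrcoef(x, y)[0, 1]
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

ss_resid = (residuals ** 2).sum()        # variation left in the residuals
ss_total = ((y - y.mean()) ** 2).sum()   # total variation in y

r_squared = 1.0 - ss_resid / ss_total    # fraction accounted for by the model
print(round(r_squared - r ** 2, 10))     # matches r squared
```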

Back to our regression of house price on size, price-hat = b₀ + b₁ · size. The R² value is reported as 71.4%. What does this R² value mean about the relationship of price and size? Is the correlation positive or negative? How do you know? If we measured the size in thousands of square meters instead of thousands of square feet, would the r² value change? What about the slope?
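The unit question can be checked by rescaling: converting size from thousands of square feet to thousands of square meters (multiply by about 0.0929) changes the slope by the conversion factor but leaves r² untouched. The price/size numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical house data
size_kft2 = np.array([1.2, 1.8, 2.1, 2.6, 3.0, 3.4])          # thousands of sq ft
price = np.array([150.0, 210.0, 230.0, 290.0, 310.0, 360.0])  # thousands of $

size_km2 = size_kft2 * 0.0929    # same sizes in thousands of sq meters

r2_ft = np.corrcoef(size_kft2, price)[0, 1] ** 2
r2_m = np.corrcoef(size_km2, price)[0, 1] ** 2

slope_ft, _ = np.polyfit(size_kft2, price, deg=1)
slope_m, _ = np.polyfit(size_km2, price, deg=1)

# r-squared is unit-free; the slope rescales by the conversion factor
print(np.isclose(r2_ft, r2_m), np.isclose(slope_m, slope_ft / 0.0929))
```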