Least Squares Regression: Fitting a Line to Bivariate Data



Linear Relationships
Avg. occupants per car
- 1980: 6/car
- 1990: 3/car
- 2000: 1.5/car
- By the year 2010 every fourth car will have nobody in it!
Food for Thought
- Kind of mathematical relationship between year and avg. no. of occupants per car?
- Why might the relationship break down by 2010?

Basic Terminology
- Scatterplots, correlation: interested in the association between 2 variables (assign x and y arbitrarily)
- Least squares regression: does one quantitative variable explain or cause changes in another variable?

Basic Terminology (cont.)
- Explanatory variable: explains or causes changes in the other variable; the x-variable (independent variable).
- Response variable: the y-variable; it responds to changes in the x-variable (dependent variable).

Examples
- Fertilizer (x) → corn yield (y)
- Advertising $ (x) → store income (y)
- Drug dose (x) → blood pressure (y)
- Daily temperature (x) → natural gas demand (y)
- Change in minimum wage (x) → unemployment rate (y)

Simplest Relationship
- Simplest equation that describes the dependence of variable y on variable x: $y = b_0 + b_1 x$
- Linear equation
- Graph is a line with slope $b_1$ and y-intercept $b_0$

Graph: the line $y = b_0 + b_1 x$, with y-intercept $b_0$ and slope $b_1$ = rise/run.

Notation
- Data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
- Draw the line $y = b_0 + b_1 x$ through the scatterplot; the point on the line corresponding to $x_i$ is $(x_i, \hat{y}_i)$, where $\hat{y}_i = b_0 + b_1 x_i$

Observed y, Predicted y: the predicted y when x = 2.7 is $\hat{y} = a + bx = a + b(2.7)$, where $a$ is the intercept and $b$ is the slope.

Scatterplot: Fuel Consumption vs Car Weight “Best” line?

Scatterplot with least squares prediction line

How do we draw the line? Residuals

Residuals: graphically

Criterion for choosing what line to draw: method of least squares
- The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible
- This line has the slope $b_1$ and intercept $b_0$ that minimize $\sum_{i=1}^{n} \left(y_i - (b_0 + b_1 x_i)\right)^2$
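The criterion can be illustrated numerically. The sketch below uses the ten car weight and fuel consumption pairs listed later in this transcript, fits the line with the standard closed-form solution (an assumption about the method, since the slide's own formula is not preserved), and checks that nudging either coefficient makes the sum of squared residuals larger. The function names are illustrative only.

```python
# Car weight (x) and fuel consumption (y) pairs from the car example in this transcript.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sse(b0, b1):
    """Sum of squared residuals for the candidate line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Closed-form least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = least_squares(xs, ys)
best = sse(b0, b1)

# Every perturbed line should have a strictly larger sum of squared residuals.
perturbations = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
worse = [sse(b0 + d0, b1 + d1) for d0, d1 in perturbations]
print(f"least squares SSE = {best:.4f}")
print("perturbed SSEs    =", [round(v, 4) for v in worse])
```

Because the sum of squares is a bowl-shaped function of the two coefficients, any move away from the least squares values raises it.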

Least Squares Line $y = b_0 + b_1 x$: Slope $b_1$ and Intercept $b_0$
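For reference, the standard closed-form solution for the slope and intercept, written in the $b_0$, $b_1$ notation used here (the slide's own layout is not recoverable, so this is the textbook form rather than a reproduction), is:

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

Here $\bar{x}$ and $\bar{y}$ are the sample means of the x and y values. The second equation is what forces the line through the point $(\bar{x}, \bar{y})$.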

Example: Income vs Consumption Expenditure

Questions
- Construct a scatterplot; determine if a linear model is appropriate. If so …
- … find the least squares prediction line
- Estimate consumption expenditure in a household with an income of (i) $6,000 (ii) $25,000. Comfortable with the estimates?
- Compute the residuals

Scatterplot

Solution

Calculations

least squares prediction line

Least Squares Prediction Line

Consumption Expenditure Prediction When x = $6,000

Consumption Expenditure Prediction When x = $25,000
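A minimal sketch of the two predictions, using the intercept 6.2 and slope 0.2 that appear in the SSE calculation later in this transcript. It assumes income and expenditure are both recorded in thousands of dollars (consistent with the mean point (9, 8) quoted in the example); the function name is hypothetical.

```python
# Coefficients taken from the SSE calculation in this transcript.
B0 = 6.2  # intercept
B1 = 0.2  # slope


def predict_expenditure(income_thousands):
    """Predicted consumption expenditure (assumed in $1000s) for a given income (in $1000s)."""
    return B0 + B1 * income_thousands


for income in (6, 25):
    print(f"income ${income},000 -> predicted expenditure {predict_expenditure(income):.1f} (thousands)")
```

Under these assumptions the predictions come out to about 7.4 and 11.2. Whether the second estimate is trustworthy depends on whether an income of $25,000 lies within the range of the observed incomes, which is the point of the "Comfortable with estimates?" question.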

The least squares line always goes through the point with coordinates $(\bar{x}, \bar{y})$; here $(\bar{x}, \bar{y}) = (9, 8)$.

C. Compute the Residuals

Residuals

Income Residual Plot

$\sum$ residuals, $\sum (\text{residuals})^2$
- Note that $\sum \text{residuals} = 0$ and $\sum (\text{residuals})^2 = 3.6$*
- *From the formula in the box on p. 7: $\text{SSE} = \sum y_i^2 - b_0 \sum y_i - b_1 \sum x_i y_i = 330 - 6.2(40) - 0.2(392) = 330 - 248 - 78.4 = 3.6$
- Any other line drawn through the scatterplot will have $\sum (\text{residuals})^2 > 3.6$
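The shortcut formula for SSE follows from two standard facts about the least squares residuals $e_i = y_i - b_0 - b_1 x_i$: they sum to zero, and $\sum x_i e_i = 0$. The second fact is not stated on the slides; it is one of the two normal equations that define the least squares line. A sketch of the derivation:

```latex
\begin{aligned}
\text{SSE} &= \sum e_i^2 = \sum e_i \,(y_i - b_0 - b_1 x_i) \\
           &= \sum e_i y_i - b_0 \sum e_i - b_1 \sum x_i e_i \\
           &= \sum e_i y_i \\
           &= \sum (y_i - b_0 - b_1 x_i)\, y_i \\
           &= \sum y_i^2 - b_0 \sum y_i - b_1 \sum x_i y_i
\end{aligned}
```

The middle step drops the last two sums because both are zero for the least squares line, so the shortcut is valid only when $b_0$ and $b_1$ are the least squares coefficients.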

Car Weight, Fuel Consumption Example, cont.
$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

Table: Wt (x) and Fuel (y), with column sums

Calculations

Scatterplot with least squares prediction line

The Least Squares Line Always Goes Through $(\bar{x}, \bar{y})$: here $(\bar{x}, \bar{y}) = (2.9, 4.39)$
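The calculations for this example can be reproduced from the ten data pairs given earlier. The sketch below assumes the standard closed-form formulas for slope and intercept (the slide's own worked table is not preserved) and then checks that the fitted line passes through the point of means. `fit_line` is an illustrative name.

```python
# Car weight (x, in 1000s of lb) and fuel consumption (y) from the example.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(weights, fuel)
x_bar = sum(weights) / len(weights)
y_bar = sum(fuel) / len(fuel)

print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
print(f"(x_bar, y_bar) = ({x_bar:.2f}, {y_bar:.2f})")
print(f"line evaluated at x_bar: {b0 + b1 * x_bar:.2f}")
```

The last printed value matches the mean of y, which is the property stated on the slide.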

Using the least squares line for prediction. Fuel consumption of 3,000 lb car? (x=3)

Be Careful! Fuel consumption of a 500 lb car? (x = 0.5) x = 0.5 is outside the range of the x-data that we used to determine the least squares line.
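One way to make this warning concrete is to have the prediction function refuse x values outside the observed range. This is a sketch of that design choice, not something the slides prescribe; `predict_fuel` and its error message are hypothetical, and the line is refit from the ten data pairs using the standard closed-form formulas.

```python
# Car weight (x, in 1000s of lb) and fuel consumption (y) from the example.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(fuel) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, fuel)) / sum(
    (x - x_bar) ** 2 for x in weights
)
b0 = y_bar - b1 * x_bar


def predict_fuel(x):
    """Predict fuel consumption, rejecting weights outside the fitted x-range."""
    lo, hi = min(weights), max(weights)
    if not lo <= x <= hi:
        raise ValueError(f"x={x} is outside the observed range [{lo}, {hi}]; extrapolation is unreliable")
    return b0 + b1 * x


print(f"3,000 lb car: {predict_fuel(3.0):.2f}")
try:
    predict_fuel(0.5)
except ValueError as err:
    print("500 lb car:", err)
```

The 3,000 lb car falls inside the observed weights (1.9 to 4.1), so it gets a prediction; the 500 lb car does not.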

Avoid GIGO! Evaluating the least squares line
1. Create scatterplot. Approximately linear?
2. Calculate $r^2$, the square of the correlation coefficient
3. Examine residual plot

$r^2$: The Variation Accounted For
- The square of the correlation coefficient r gives important information about the usefulness of the least squares line

$r^2$: important information for evaluating the usefulness of the least squares line
- The square of the correlation coefficient, $r^2$, is the fraction of the variation in y that is explained by the least squares regression of y on x.
- $-1 \le r \le 1$ implies $0 \le r^2 \le 1$
- Put another way, $r^2$ is the fraction of the variation in y that is explained by the variation in x.

Example: car weight, fuel consumption
- x = car weight, y = fuel consumption
- $r^2 = (0.9766)^2 \approx 0.95$
- About 95% of the variation in fuel consumption (y) is explained by the linear relationship between car weight (x) and fuel consumption (y).
- What else affects fuel consumption? Driver, size of engine, tires, road, etc.
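The value of r quoted here can be recomputed from the ten data pairs. The sketch below uses the usual formula for the correlation coefficient and also computes $r^2$ a second way, as one minus the ratio of residual variation to total variation, to show that the "fraction of variation explained" reading agrees. Variable names are illustrative.

```python
import math

# Car weight (x) and fuel consumption (y) from the example.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(fuel) / n
sxx = sum((x - x_bar) ** 2 for x in weights)
syy = sum((y - y_bar) ** 2 for y in fuel)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, fuel))

# Correlation coefficient and its square.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Same quantity as 1 - (unexplained variation) / (total variation).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(weights, fuel))
r_squared_from_sse = 1 - sse / syy

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, 1 - SSE/SST = {r_squared_from_sse:.4f}")
```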

Example: SAT scores

SAT scores: calculations

SAT scores: result $r^2 = (-0.868)^2 = 0.7534$. If 57% of NC seniors take the SAT, the predicted mean score is

Avoid GIGO! Evaluating the least squares line
1. Create scatterplot. Approximately linear?
2. Calculate $r^2$, the square of the correlation coefficient
3. Examine residual plot

Residuals
- residual = observed y − predicted y = $y - \hat{y}$
- Properties of residuals:
  1. The residuals always sum to 0 (therefore the mean of the residuals is 0)
  2. The least squares line always goes through the point $(\bar{x}, \bar{y})$

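Graphically: residual $= y - \hat{y}$; at $x_i$, the residual is $e_i = y_i - \hat{y}_i$, the gap between the observed $y_i$ and the point $\hat{y}_i$ on the line.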

Residual Plot
- Residuals help us determine if fitting a least squares line to the data makes sense
- When a least squares line is appropriate, it should model the underlying relationship; nothing interesting should be left behind
- We make a scatterplot of the residuals in the hope of finding… NOTHING!

Car Wt/Fuel Consump: Residuals (table columns: car weight, fuel consumption, predicted fuel consumption, residual)
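The columns of this table can be recomputed from the ten data pairs given in the car example. The sketch below refits the line with the standard closed-form formulas and prints weight, observed fuel consumption, predicted fuel consumption, and residual for each car; the values are recomputed, so their rounding may differ from the original slide. It also prints the residual sum, which should be zero up to floating-point error.

```python
# Car weight (x) and fuel consumption (y) from the example.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(fuel) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, fuel)) / sum(
    (x - x_bar) ** 2 for x in weights
)
b0 = y_bar - b1 * x_bar

# One row per car: (weight, observed, predicted, residual = observed - predicted).
rows = []
for x, y in zip(weights, fuel):
    predicted = b0 + b1 * x
    rows.append((x, y, predicted, y - predicted))

print(f"{'CAR WT.':>8} {'FUEL':>6} {'PRED FUEL':>10} {'RESIDUAL':>9}")
for x, y, predicted, residual in rows:
    print(f"{x:>8.1f} {y:>6.1f} {predicted:>10.3f} {residual:>9.3f}")

residual_sum = sum(row[3] for row in rows)
print(f"sum of residuals = {residual_sum:.10f}")
```

Plotting the residual column against car weight gives the residual plot discussed on the surrounding slides.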

Example: Car wt/fuel consump. residual plot page 13

SAT Residuals

Linear Relationship?

Garbage In Garbage Out

Residual Plot – Clue to GIGO