The Coefficient of Determination: r², Section 3.3.2

Starter
Write a description of what it means to say that there is a negative association between two variables. (Don't tell me about the graph!)
If there is a strong negative association:
– What value would you expect r to take on?
– What would you expect the graph to look like?

Objective
Use the calculator's LinReg command to find the equation of an LSRL.
Interpret the meaning of r² in the context of the data.

To find r and the LSRL on the calculator:
Enter the explanatory variable data (FEMUR) into L1.
Enter the response variable data (HUMER) into L2.
Tap ZOOM 9 (ZoomStat) to see the scatterplot as usual.
In the STAT:CALC menu, choose 8:LinReg(a+bx).
– Follow the command with L1,L2,Y1 (find Y1 under VARS:Y-VARS:Function…).
Tap ENTER to run the command.
– The screen will show a, b, r and r².
– If you don't see r and r², enter DiagnosticOn (found in the CATALOG).
Tap GRAPH to see the LSRL on the scatterplot.
To predict y when x = 47:
– Trace to x = 47 on the Y1 graph, or enter Y1(47).
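
If you want to see what LinReg(a+bx) is computing, here is a minimal Python sketch of the standard least-squares formulas for a, b, r and r². It is not the calculator's own code; the function names linreg_a_bx and predict and the small data lists are hypothetical, chosen only for illustration (they are not the FEMUR/HUMER data).

```python
from math import sqrt

def linreg_a_bx(xs, ys):
    """Return (a, b, r, r_squared) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and cross-products about the means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept: the LSRL passes through (x-bar, y-bar)
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r, r * r

def predict(a, b, x):
    """Evaluate the LSRL at x, like entering Y1(x) on the calculator."""
    return a + b * x

# Hypothetical example lists (stand-ins for L1 and L2)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r, r2 = linreg_a_bx(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}, r^2 = {r2:.3f}")
print(f"predicted y at x = 6: {predict(a, b, 6):.3f}")
```

With the real lists in place of xs and ys, predict(a, b, 47) plays the same role as Y1(47).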

The Sanchez Data Again
Put the Sanchez lists (GASDA and GASFT) into L1 and L2.
Run LinReg again with these data.
– Sketch the scatterplot and regression line.
– Write the value of r and r².
– Write the LSRL equation in context.
Predict the amount of gas used in a month with 19 degree-days.

Answers
You should find the intercept a and the slope b = .189.
– So: predicted gas used = a + .189 × coldness (in degree-days)
– Note that the equation is written in the context of the problem.
You should find r = .995.
– This is a very strong positive linear association.
You should also find r² = .991.
– Save this for later use.
You should find that 19 degree-days predicts about 468 cu ft of gas (4.68 hundred cubic feet).

Variability in Linear Associations
Consider the response variable "humerus length" in the archaeopteryx data.
– Were all specimens the same length?
– They are not, so why?
There are two possibilities:
– Larger or smaller animals should still have the same proportions, so the association described by the LSRL leads to larger or smaller y values, OR…
– Random variation; in other words, chance!
So which is it? Actually, it's both.
– So how can we quantify the two causes?

Quantifying Random Variation
All the y values vary about the y-mean, ȳ.
– Some are greater than ȳ, some are less.
– The sum of all these deviations is zero.
– But the sum of their squares is not zero.
See the example on page 146.
So do the y values also vary about the LSRL, or do they lie exactly on the line?
– If all the points lie on the line, then all of the y-variability must have come from the linear association.
– If points randomly miss the line, then there are non-zero deviations from the line, and the sum of their squares is not zero.
See the example on page 147.
So the ratio of these two sums of squares (the total areas of the squares drawn on the graphs) can be a measure of random variation.
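
Both claims above can be checked numerically. The hedged Python sketch below uses made-up numbers and hypothetical helper names: the deviations about ȳ cancel to zero while their squares do not, and points lying exactly on a line have no deviations from it at all.

```python
def deviations_about_mean(ys):
    """Return each y minus the mean of the ys (y - y-bar)."""
    y_bar = sum(ys) / len(ys)
    return [y - y_bar for y in ys]

def sum_of_squares(values):
    """Add up the squares of the values (the total 'area' of the squares)."""
    return sum(v * v for v in values)

# Hypothetical responses, for illustration only
ys = [2, 4, 5, 4, 5]
devs = deviations_about_mean(ys)
print(sum(devs))             # the deviations cancel out: 0.0
print(sum_of_squares(devs))  # but their squares do not: 6.0

# Hypothetical points lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4]
on_line = [1 + 2 * x for x in xs]
residuals = [y - (1 + 2 * x) for x, y in zip(xs, on_line)]
print(sum_of_squares(residuals))  # no deviations from the line: 0
```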

Finding the Ratio of Areas
The first area (the sum of squared deviations about ȳ) is called SSM: Sum of Squares about the Mean.
The second area (the sum of squared deviations about ŷ, "y-hat") is called SSE: Sum of Squares for Error.
Then the ratio SSE/SSM tells us what fraction of the y-variability is due to random chance (left unexplained by the line).
So 1 − SSE/SSM tells us what fraction of the y-variation is due to the linear association.
– The author expresses this as (SSM − SSE)/SSM on page 147.
– Note that if all points are on the LSRL, then SSE = 0, so 100% of the y-variability is due to the linear association.
Here's the punch line:
– First find r, the correlation coefficient, by linear regression.
– Then it turns out that r² is equal to the fraction (SSM − SSE)/SSM.
– r² is called the coefficient of determination.
– The author chooses to skip the proof; so do I!
– Note that when you run LinReg to find r, you also get r² at the same time.
So r² expresses the proportion of y-variation that is due to the linear association, and 1 − r² is the proportion that is due to random chance.
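
The punch line can be checked numerically. In symbols, SSM = Σ(y − ȳ)² and SSE = Σ(y − ŷ)², where ŷ = a + bx is the LSRL. The Python sketch below (hypothetical data and function name) fits the line, computes both sums, and compares (SSM − SSE)/SSM with the square of r computed directly. It is a numerical illustration only, not the proof the author skips.

```python
from math import sqrt

def r_squared_two_ways(xs, ys):
    """Return (SSM, SSE, (SSM - SSE)/SSM, r**2) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope of the LSRL
    a = y_bar - b * x_bar               # intercept of the LSRL
    y_hat = [a + b * x for x in xs]     # predicted values on the line

    ssm = sum((y - y_bar) ** 2 for y in ys)               # squares about the mean
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # squares about the LSRL
    r = sxy / sqrt(sxx * ssm)           # correlation coefficient (ssm doubles as S_yy)
    return ssm, sse, (ssm - sse) / ssm, r ** 2

# Hypothetical data for illustration only
ssm, sse, ratio, r2 = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"SSM = {ssm:.2f}, SSE = {sse:.2f}")
print(f"(SSM - SSE)/SSM = {ratio:.4f}, r^2 = {r2:.4f}")
```

For these numbers SSM = 6 and SSE = 2.4, so both quantities come out to 0.6: the line accounts for 60% of the y-variation in this made-up example.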

The Sanchez Data
We previously put the Sanchez lists (GASDA and GASFT) into L1 and L2 and ran LinReg to find r and r².
Write a sentence that answers this question: What proportion of the variability in the gas usage data is attributable to the linear association with coldness of weather (as measured in degree-days)?

Since r² = .991, we conclude that about 99% of the variability in gas usage can be accounted for by the least-squares regression line (the linear association with degree-days), and about 1% is due to random chance.

Objective
Use the calculator's LinReg command to find the equation of an LSRL.
Interpret the meaning of r² in the context of the data.

Homework
Read pages 144–150.
Do problems 36, 37, 38.