Correlation tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition, we would like to have a numerical description of how both variables vary together. For instance, is one variable increasing faster than the other one? And we would like to make predictions based on that numerical description. But which line best describes our data?

Recall: explanatory and response variables. A response variable measures or records an outcome of a study - the endpoint of interest in the study. An explanatory variable explains changes in the response variable. Typically, the explanatory or independent variable is plotted on the x axis, and the response or dependent variable is plotted on the y axis. Example: explanatory (independent) variable: number of beers (x); response (dependent) variable: blood alcohol content (y).

Some plots don’t have clear explanatory and response variables. Do calories explain sodium amounts? Does percent return on Treasury bills explain percent return on common stocks?

The regression line  A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes - the description is made through the slope and intercept of the line.  We often use a regression line to predict the value of y for a given value of x; the line is also called the prediction line, or the least-squares line, or the best fit line.  In regression, the distinction between explanatory and response variables is important - if you interchange the roles of x and y, the regression line is different. y-hat is the predicted value of y b 1 is the slope b 0 is the y-in tercept

The regression line (continued). The least-squares regression line is the unique line such that the sum of the squares of the vertical (y) distances between the data points and the line is as small as possible ("least squares"). The distances between the points and the line are squared so that all are positive values; this is done so that the distances can be properly added…
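In symbols (our notation; the slide describes this criterion only in words), the least-squares line chooses the intercept $b_0$ and slope $b_1$ that minimize the sum of squared vertical distances:

    \min_{b_0,\, b_1} \; \sum_{i=1}^{n} \bigl( y_i - (b_0 + b_1 x_i) \bigr)^2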

First we calculate the slope of the line, $b_1 = r \frac{s_y}{s_x}$, where r is the correlation, $s_y$ is the standard deviation of the response variable y, and $s_x$ is the standard deviation of the explanatory variable x. Once we know the slope $b_1$, we can calculate the y-intercept, $b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{x}$ and $\bar{y}$ are the sample means of the x and y variables. How to: typically, we use the TI-83 or JMP (Analyze -> Fit Y by X -> under the red triangle, Fit Line) to compute the b's, rarely using the above formulas…
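If you want to check those formulas by hand outside the TI-83 or JMP, a few lines of Python will do it. This is only a sketch with made-up beer/BAC numbers for illustration, not the data from the slides:

    import numpy as np

    # Made-up data: number of beers (x) and blood alcohol content (y)
    x = np.array([5, 2, 9, 8, 3, 7, 3, 5, 3, 5])
    y = np.array([0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06, 0.02, 0.05])

    r = np.corrcoef(x, y)[0, 1]              # correlation coefficient
    b1 = r * y.std(ddof=1) / x.std(ddof=1)   # slope: b1 = r * (s_y / s_x)
    b0 = y.mean() - b1 * x.mean()            # intercept: b0 = y-bar - b1 * x-bar
    print(b1, b0)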

JMP output for the regression line: the intercept estimate is $b_0$, and the slope estimate $b_1$ is the coefficient (Estimate) of the NEA term.

The equation completely describes the regression line. To plot the regression line you only need to plug two x values into the equation, get the two y's, and draw the line that goes through those two points. Hint: the regression line always passes through the mean of x and the mean of y. Usually we let JMP draw the regression line, as in the plot here, which is labeled with the fitted equation for BAC as a function of No. Beers.
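A minimal sketch of that "plug in two x values and draw the line through the resulting points" step. The b0 and b1 values below are placeholders, not the fitted BAC coefficients:

    import numpy as np
    import matplotlib.pyplot as plt

    b0, b1 = 0.01, 0.02            # placeholder intercept and slope
    x_line = np.array([1.0, 9.0])  # any two x values within the range of the data
    y_line = b0 + b1 * x_line      # the corresponding predicted y values
    plt.plot(x_line, y_line)       # the straight line through those two points
    plt.xlabel("Number of beers")
    plt.ylabel("Predicted BAC")
    plt.show()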

Unlike in correlation, the distinction between explanatory and response variables is crucial in regression. If you exchange y for x when calculating the regression line, you will get a different line, because regression examines the distances of the points from the line in the y direction only. Example: Hubble telescope data about galaxies moving away from Earth. The two lines in the plot are the regression lines calculated either correctly (x = distance, y = velocity, solid line) or incorrectly (x = velocity, y = distance, dotted line).
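You can see this asymmetry directly by fitting y on x and then x on y and algebraically inverting the second fit; the two slopes differ unless r = ±1. A quick sketch with simulated data (not the Hubble data):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 0.5 * x + rng.normal(scale=0.5, size=100)   # noisy linear relationship

    b1_yx, b0_yx = np.polyfit(x, y, 1)   # regression of y on x
    b1_xy, b0_xy = np.polyfit(y, x, 1)   # regression of x on y

    # Invert the second fit to express y in terms of x: x = b0_xy + b1_xy*y
    print("y on x: slope =", b1_yx)
    print("x on y, inverted: slope =", 1 / b1_xy)   # a different slope unless r = +/-1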

Making predictions. The equation of the least-squares regression line allows you to predict y for any x within the range studied - be careful about extrapolation! Confirm the regression line's equation below using JMP. Nobody in the study drank 6.5 beers, but by plugging x = 6.5 into the regression line we can still predict the expected blood alcohol content (in mg/ml).
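One way to keep the extrapolation warning in view when predicting is to check the requested x against the range of x values used to fit the line. A sketch, with placeholder coefficients and a placeholder data range (not the fitted values from the slides):

    import warnings

    def predict(x, b0, b1, x_min, x_max):
        """Predict y from the fitted line, warning if x is outside the fitted range."""
        if x < x_min or x > x_max:
            warnings.warn(f"x = {x} is outside the fitted range [{x_min}, {x_max}]: extrapolation")
        return b0 + b1 * x

    # Placeholder coefficients; suppose beers were observed from 1 to 9
    print(predict(6.5, b0=0.01, b1=0.02, x_min=1, x_max=9))   # inside the range: fine
    print(predict(20,  b0=0.01, b1=0.02, x_min=1, x_max=9))   # outside: triggers a warning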

There is a positive linear relationship between the number of powerboats registered (in 1000s) and the number of manatee deaths. The least-squares regression line has the equation $\hat{y} = 0.125x - 41.4$. Thus, if we were to limit the number of powerboat registrations to 500,000, what could we expect for the number of manatee deaths? Roughly 21 manatees, as the computation below shows.
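Written out, with x in thousands of registrations (so 500,000 registrations means x = 500):

    \hat{y} = 0.125(500) - 41.4 = 62.5 - 41.4 = 21.1 \approx 21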

Extrapolation. Extrapolation is the use of a regression line for predictions outside the range of x values used to obtain the line. This can be a very bad idea, as seen in the plot here (height in inches).

The y-intercept. Sometimes the y-intercept is not realistically possible. Here we have a negative blood alcohol content, which makes no sense… But the negative value is correct for the equation of the regression line: there is a lot of scatter in the data, and the slope and intercept of the line are both estimates. We'll see later that the y-intercept is probably not "significantly different from zero." (The plot marks the y-intercept at a negative blood alcohol value.)

Coefficient of determination, r². r² represents the percentage of the variation in y (vertical scatter from the regression line) that can be explained by the regression of y on x. r² is the ratio of the variance of the predicted values of y (the y-hats - as if there were no scatter around the line) to the variance of the actual values of y. r², the coefficient of determination, is the square of the correlation coefficient r. In the plot shown, r = 0.87 and r² = 0.76.
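Both descriptions of r² can be checked numerically; the two computations agree. A sketch with simulated data (not data from the slides):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    y = 2.0 * x + rng.normal(size=200)

    r = np.corrcoef(x, y)[0, 1]
    b1, b0 = np.polyfit(x, y, 1)
    y_hat = b0 + b1 * x

    r2_from_r = r ** 2                        # square of the correlation coefficient
    r2_from_var = np.var(y_hat) / np.var(y)   # variance of predictions / variance of y
    print(r2_from_r, r2_from_var)             # the two values agree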

r = -1, r² = 1: changes in x explain 100% of the variation in y; y can be entirely predicted for any given value of x. r = 0, r² = 0: y is entirely independent of what value x takes on. r = 0.87, r² = 0.76: here the regression of y on x explains only 76% of the variation in y; the rest of the variation in y (the vertical scatter, shown as red arrows) must be explained by something other than the line.

Here are two plots of height (response) against age (explanatory) of some children, one with r = 0.994 (r-square = 0.988) and one with r = 0.921 (r-square = 0.848). Notice how r² relates to the variation in heights...

Body weight and brain weight in 96 mammal species: r = 0.86, but this is misleading. The elephant is an influential point; most mammals are very small in comparison. Without this point, r is only 0.50. Now we plot the log of brain weight against the log of body weight. The pattern is linear and the vertical scatter is homogeneous → good for predictions of brain weight from body weight (on the log scale).
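The "fit on the log scale, then predict back on the original scale" idea looks like this in code. The body and brain weights below are made up for illustration, not the 96-species mammal data:

    import numpy as np

    # Made-up body weights (kg) and brain weights (g)
    body = np.array([0.02, 0.5, 3.0, 60.0, 2500.0])
    brain = np.array([0.4, 4.0, 25.0, 1300.0, 4600.0])

    log_body, log_brain = np.log10(body), np.log10(brain)
    b1, b0 = np.polyfit(log_body, log_brain, 1)   # regression on the log-log scale

    # Predict brain weight (back on the original scale) for a 100 kg mammal
    log_pred = b0 + b1 * np.log10(100.0)
    print(10 ** log_pred)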

 Homework:  You should have read through section 2.3, carefully reviewing what you know about lines from your algebra course…Go over Examples in some detail so you understand them!  Do #2.66, , 2.78, 2.82, 2.83 Use JMP to make the scatterplots, describe the association you see between the two variables, and then let JMP calculate the intercept and slope and r-square for the regression line. Explain what the slope and intercept mean in the context of the specific problem. Also, practice doing the above computations with the TI-83 - I'll help you get started with the calculator in class…