The correlation coefficient, r, tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition,

Slides:



Advertisements
Similar presentations
Chapter 3 Examining Relationships Lindsey Van Cleave AP Statistics September 24, 2006.
Advertisements

AP Statistics Section 3.2 C Coefficient of Determination
2.2 Correlation Correlation measures the direction and strength of the linear relationship between two quantitative variables.
CHAPTER 3 Describing Relationships
Haroon Alam, Mitchell Sanders, Chuck McAllister- Ashley, and Arjun Patel.
Regression, Residuals, and Coefficient of Determination Section 3.2.
C HAPTER 3: E XAMINING R ELATIONSHIPS. S ECTION 3.3: L EAST -S QUARES R EGRESSION Correlation measures the strength and direction of the linear relationship.
Linear Regression Analysis
Objectives (BPS chapter 5)
Relationship of two variables
Chapter 3: Examining relationships between Data
Ch 3 – Examining Relationships YMS – 3.1
Relationships Regression BPS chapter 5 © 2006 W.H. Freeman and Company.
Relationships Regression BPS chapter 5 © 2006 W.H. Freeman and Company.
AP STATISTICS LESSON 3 – 3 LEAST – SQUARES REGRESSION.
Section 5.2: Linear Regression: Fitting a Line to Bivariate Data.
Correlation tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition, we would like to.
Scatterplot and trendline. Scatterplot Scatterplot explores the relationship between two quantitative variables. Example:
Relationships If we are doing a study which involves more than one variable, how can we tell if there is a relationship between two (or more) of the.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 3 Describing Relationships 3.2 Least-Squares.
Examining Bivariate Data Unit 3 – Statistics. Some Vocabulary Response aka Dependent Variable –Measures an outcome of a study Explanatory aka Independent.
CHAPTER 5 Regression BPS - 5TH ED.CHAPTER 5 1. PREDICTION VIA REGRESSION LINE NUMBER OF NEW BIRDS AND PERCENT RETURNING BPS - 5TH ED.CHAPTER 5 2.
Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict.
Chapter 4 – Correlation and Regression before: examined relationship among 1 variable (test grades, metabolism, trip time to work, etc.) now: will examine.
Correlation tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition, we would like to.
Chapter 3-Examining Relationships Scatterplots and Correlation Least-squares Regression.
The correlation coefficient, r, tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition,
Lecture 5 Chapter 4. Relationships: Regression Student version.
Chapter 2 Examining Relationships.  Response variable measures outcome of a study (dependent variable)  Explanatory variable explains or influences.
Correlation/Regression - part 2 Consider Example 2.12 in section 2.3. Look at the scatterplot… Example 2.13 shows that the prediction line is given by.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 3: Describing Relationships Section 3.2 Least-Squares Regression.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 3 Association: Contingency, Correlation, and Regression Section 3.3 Predicting the Outcome.
Residuals Recall that the vertical distances from the points to the least-squares regression line are as small as possible.  Because those vertical distances.
^ y = a + bx Stats Chapter 5 - Least Squares Regression
CHAPTER 3 Describing Relationships
Simple Linear Regression The Coefficients of Correlation and Determination Two Quantitative Variables x variable – independent variable or explanatory.
The correlation coefficient, r, tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition,
Chapters 8 Linear Regression. Correlation and Regression Correlation = linear relationship between two variables. Summarize relationship with line. Called.
AP STATISTICS LESSON 3 – 3 (DAY 2) The role of r 2 in regression.
CHAPTER 5: Regression ESSENTIAL STATISTICS Second Edition David S. Moore, William I. Notz, and Michael A. Fligner Lecture Presentation.
Describing Relationships. Least-Squares Regression  A method for finding a line that summarizes the relationship between two variables Only in a specific.
Lecture 7 Simple Linear Regression. Least squares regression. Review of the basics: Sections The regression line Making predictions Coefficient.
1. Analyzing patterns in scatterplots 2. Correlation and linearity 3. Least-squares regression line 4. Residual plots, outliers, and influential points.
Lecture 9 Sections 3.3 Objectives:
4. Relationships: Regression
4. Relationships: Regression
CHAPTER 3 Describing Relationships
Statistics 101 Chapter 3 Section 3.
Linear Regression Special Topics.
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
AP Stats: 3.3 Least-Squares Regression Line
Least-Squares Regression
CHAPTER 3 Describing Relationships
Least-Squares Regression
CHAPTER 3 Describing Relationships
Objectives (IPS Chapter 2.3)
Least-Squares Regression
Least-Squares Regression
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Algebra Review The equation of a straight line y = mx + b
Correlation/Regression - part 2
Correlation We can often see the strength of the relationship between two quantitative variables in a scatterplot, but be careful. The two figures here.
CHAPTER 3 Describing Relationships
Presentation transcript:

The correlation coefficient, r, tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition, we would like to have a numerical description ( model ) of how both variables vary together. For instance, is one variable increasing faster than the other one? And we would like to make predictions based on that numerical description. The relationship above looks linear... But which line best describes our data?

The regression line The least-squares regression line is the unique line such that the sum of the squares of the vertical distances of the data points to the line is the smallest possible. ˆ y  0.125x  41.4

And these equations are available in R through the function lm(y~x) ("lm" means "linear model"). Try lm on the manatee data… (manatee.csv)

The equation completely describes the regression line. To plot the regression line you only need to choose two x values, put them into the prediction equation, calculate y, and draw the line that goes through those two points... or let R do it for you with the abline function (abline(lm(y~x))) Hint: The regression line always passes through the mean of x and y. The points you use for drawing the regression line are computed from the equation..125* = * = 46.1 So plot the points (450,14.85) & (700,46.1)  ˆ y  0.125x  41.4 X X

The distinction between explanatory and response variables is crucial in regression. If you exchange y for x in calculating the regression line, you will get a different line. Regression examines the distance of all points from the line in the y direction only. Hubble telescope data about galaxies moving away from earth: These two lines are the two regression lines calculated either correctly (x = distance, y = velocity, solid line) or incorrectly (x = velocity, y = distance, dotted line).

There is a positive linear relationship between the number of powerboats registered and the number of manatee deaths. (in 1000’s) The least squares regression line has the equation: Roughly 21 manatees - do this with R using the predict function (see help(predict)) Thus if we were to limit the number of powerboat registrations to 500,000, what could we expect for the number of manatee deaths?  ˆ y  0.125x  41.4  ˆ y  0.125x  41.4

The least-squares regression line of y on x is the line that minimizes the sum of the squares of the vertical distances of the data points to the line. The equation of the l-s line is usually represented as = b 0 + b 1 x where = the predicted value of y b 0 = the intercept (predicted value of y when x=0) b 1 = the slope of the prediction line The correlation coefficient, r, is related to the l-s regression line as follows: the square of r (r 2 ) is equal to the fraction of the variation in the values of the response variable y that is explained by the least squares regression of y on x. (See next slide)

r=0.994, r-square=0.988r=0.921, r-square=0.848 Here are two plots of height (response) against age (explanatory) of some children. Notice how r 2 relates to the variation in heights...

Homework: –Read pages 8-10 in the Reading & Problems 2.1 on Linear Regression –note the R functions used here: model1=lm(y~x) plot(x,y) ; abline(model1) plot(model1) coef(model1) ; resid(model1) ; fitted(model1) plot(fitted(model1),resid(model1)) –Read at least one of the online sources for simple linear regression ( I like the second one…)

Homework(cont.) –FPG (mg/ml) - fasting plasma glucose (measured at home) HbA (% - measured in doctor's office). Can you predict FPG by HbA? Plot, compute the correlation coefficient, compute and plot the regression line and get a residual plot. Are there any unusual cases? Influential Points? Outliers?