Describing the Relation Between Two Variables

Presentation transcript:

Chapter 4: Describing the Relation Between Two Variables

Overview
Data for a single variable is univariate data.
Many or most real-world models have more than one variable: multivariate data.
In this chapter we will study the relations between two variables: bivariate data.

Chapter 4 – Describing the Relation Between Two Variables
Only Sections 1 and 2 are covered:
  Section 1: Scatter Diagrams and Correlation
  Section 2: Least-Squares Regression

Chapter 4 – Section 1: Scatter Diagrams and Correlation

Chapter 4 – Section 1
In many studies, we measure more than one variable for each individual.
Some examples are:
  Rainfall amounts and plant growth
  Exercise and cholesterol levels for a group of people
  Height and weight for a group of people
In these cases, we are interested in whether the two variables have some kind of relationship.

Chapter 4 – Section 1
When we have two variables, they could be related in one of several different ways:
  They could be unrelated.
  One variable (the explanatory or predictor variable) could be used to explain the other (the response or dependent variable).
  One variable could be thought of as causing the other variable to change.
In this chapter, we examine the second case: explanatory and response variables.

Chapter 4 – Section 1
Sometimes it is not clear which variable is the explanatory variable and which is the response variable.
Sometimes the two variables are related without either one being an explanatory variable.
Sometimes the two variables are both affected by a third variable, a lurking variable, that was not included in the study.

Chapter 4 – Section 1
An example of a lurking variable:
A researcher studies a group of elementary school children.
  Y = the student’s height
  X = the student’s shoe size
It is not reasonable to claim that shoe size causes height to change.
The lurking variable of age affects both of these variables.

Chapter 4 – Section 1
Some other examples:
Rainfall amounts and plant growth
  Explanatory variable: rainfall
  Response variable: plant growth
  Possible lurking variable: amount of sunlight
Exercise and cholesterol levels
  Explanatory variable: amount of exercise
  Response variable: cholesterol level
  Possible lurking variable: diet

Chapter 4 – Section 1
The most useful graph to show the relationship between two quantitative variables is the scatter diagram.
  Each individual is represented by a point in the diagram.
  The explanatory (X) variable is plotted on the horizontal scale.
  The response (Y) variable is plotted on the vertical scale.

Chapter 4 – Section 1
An example of a scatter diagram (figure not included in this transcript).
Note the truncated vertical scale!

Chapter 4 – Section 1
There are several different types of relations between two variables:
  A relationship is linear when, plotted on a scatter diagram, the points follow the general pattern of a line.
  A relationship is nonlinear when, plotted on a scatter diagram, the points follow a general pattern, but it is not a line.
  A relationship has no correlation when, plotted on a scatter diagram, the points do not show any pattern.

Chapter 4 – Section 1
Linear relations have points that cluster around a line.
Linear relations can be either positive (the points slant upwards to the right) or negative (the points slant downwards to the right).

Chapter 4 – Section 1
For positive (linear) associations:
  Above-average values of one variable are associated with above-average values of the other (above/above, the points trend right and upwards).
  Below-average values of one variable are associated with below-average values of the other (below/below, the points trend left and downwards).
Examples:
  “Age” and “Height” for children
  “Temperature” and “Sales of ice cream”

Chapter 4 – Section 1
For negative (linear) associations:
  Above-average values of one variable are associated with below-average values of the other (above/below, the points trend right and downwards).
  Below-average values of one variable are associated with above-average values of the other (below/above, the points trend left and upwards).
Examples:
  “Age” and “Time required to run 50 meters” for children
  “Temperature” and “Sales of hot chocolate”

Chapter 4 – Section 1
Nonlinear relations have points that follow a trend, but not around a line.
The trend has some bend in it.

Chapter 4 – Section 1
When two variables are not related:
  There is no linear trend.
  There is no nonlinear trend.
  Changes in values for one variable do not seem to have any relation with changes in the other.

Chapter 4 – Section 1
Nonlinear relations and no relation are very different.
  Nonlinear relations are definitely patterns, just not patterns that look like lines.
  No relation means that no pattern appears at all.

Chapter 4 – Section 1
Examples of nonlinear relations:
  “Age” and “Height” for people (including both children and adults)
  “Temperature” and “Comfort level” for people
Examples of no relation:
  “Temperature” and “Closing price of the Dow Jones Industrials Index” (probably)
  “Age” and “Last digit of telephone number” for adults

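Chapter 4 – Section 1
The linear correlation coefficient is a measure of the strength of the linear relation between two quantitative variables.
The sample correlation coefficient “r” is
  r = \frac{\sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)}{n - 1}
where \bar{x} and \bar{y} are the sample means, s_x and s_y are the sample standard deviations, and n is the number of data pairs.
This should be computed with software (and not by hand) whenever possible.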
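A minimal sketch of how software might compute r, assuming the standard definition of the sample correlation coefficient (the sum of the products of the z-scores of x and y, divided by n – 1). The function name `correlation` and the exercise/cholesterol numbers are made up for illustration and do not come from the slides.

```python
import math

def correlation(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum of the products of the z-scores, divided by n - 1.
    z_products = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return z_products / (n - 1)

# Hypothetical data: hours of exercise per week and cholesterol level.
exercise = [1, 2, 3, 4, 5]
cholesterol = [220, 210, 205, 190, 185]

r = correlation(exercise, cholesterol)
print(round(r, 3))  # negative: more exercise goes with lower cholesterol here
```

With these invented numbers r comes out close to –1, which matches the negative trend in the data.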

Chapter 4 – Section 1
Some properties of the linear correlation coefficient:
  r is a unitless measure (so r would be the same for a data set whether x and y are measured in feet, inches, meters, or fathoms).
  r is always between –1 and +1.
  Positive values of r correspond to positive relations.
  Negative values of r correspond to negative relations.
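The unitless property can be checked numerically. The sketch below, using invented height and weight values, computes r once with heights in feet and once with the same heights in inches; the helper `correlation` is an illustrative implementation written in an algebraically equivalent form, not code from the slides.

```python
import math

def correlation(xs, ys):
    """Sample linear correlation coefficient, written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: heights in feet and weights in pounds.
heights_ft = [4.0, 4.5, 5.0, 5.5, 6.0]
weights_lb = [70, 95, 110, 140, 170]

# The same heights expressed in inches.
heights_in = [12 * h for h in heights_ft]

r_feet = correlation(heights_ft, weights_lb)
r_inches = correlation(heights_in, weights_lb)
print(math.isclose(r_feet, r_inches))  # the change of units does not change r
```

Rescaling x multiplies both the numerator and the denominator by the same factor, so the ratio is unchanged.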

Chapter 4 – Section 1
Some more properties of the linear correlation coefficient:
  The closer r is to +1, the stronger the positive relation; when r = +1, there is a perfect positive relation.
  The closer r is to –1, the stronger the negative relation; when r = –1, there is a perfect negative relation.
  The closer r is to 0, the less of a linear relation there is (either positive or negative).

Chapter 4 – Section 1
Examples of positive correlation (figure panels):
  Strong positive: r = 0.8
  Moderate positive: r = 0.5
  Very weak: r = 0.1
In general, if the correlation is visible to the eye, then it is likely to be strong.

Chapter 4 – Section 1
Examples of negative correlation (figure panels):
  Strong negative: r = –0.8
  Moderate negative: r = –0.5
  Very weak: r = –0.1
In general, if the correlation is visible to the eye, then it is likely to be strong.

Chapter 4 – Section 1
Nonlinear correlation and no correlation (figure panels: Nonlinear Relation, No Relation)
Both sets of variables have r = 0.1, but the difference is that the nonlinear relation shows a clear pattern.
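A small illustration of why a value of r near 0 does not rule out a relation. The data below are invented: y is exactly x squared on a symmetric range of x values, so the points follow a clear curved pattern, yet the linear correlation coefficient is 0. The helper `correlation` is an illustrative implementation, not code from the slides.

```python
import math

def correlation(xs, ys):
    """Sample linear correlation coefficient, written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# A perfectly predictable but nonlinear relation: y = x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)  # essentially zero, even though y is fully determined by x
```

This is why the scatter diagram should be examined alongside r: the coefficient only measures the linear part of a relation.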

Chapter 4 – Section 1
Correlation is not causation!
Just because two variables are correlated does not mean that one causes the other to change.
There is a strong correlation between shoe sizes and vocabulary sizes for grade school children.
  Clearly larger shoe sizes do not cause larger vocabularies.
  Clearly larger vocabularies do not cause larger shoe sizes.
Often lurking variables result in confounding.

Summary: Chapter 4 – Section 1
Correlation between two variables can be described with both visual (graphic) and numeric methods.
  Visual methods: scatter diagrams
  Numeric methods: linear correlation coefficient

Chapter 4 – Section 2: Least-Squares Regression

Chapter 4 – Section 2
If we have two variables X and Y, we often would like to model the relation as a line.
Draw a line through the scatter diagram.
We want to find the line that “best” describes the linear relationship: the regression line.

Chapter 4 – Section 2
We want to use a linear model.
Linear models can be written in several different (equivalent) ways:
  y = m x + b
  y – y1 = m (x – x1)
  y = b1 x + b0
Because the slope and the intercept are important to analyze, we will use y = b1 x + b0.

Chapter 4 – Section 2
The difference between the observed value and the predicted value is called an error or residual.
The formula for the residual is always:
  Residual = Observed – Predicted

Chapter 4 – Section 2
For example, say that we want to predict a value of y for a specific value of x.
Assume that we are using y = 10 x + 25 as our model.
To predict the value of y when x = 3, the model gives us y = 10 × 3 + 25 = 55, or a predicted value of 55.
Assume the actual value of y for x = 3 is equal to 50.
The actual value is 50 and the predicted value is 55, so the residual (or error) is 50 – 55 = –5.
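The arithmetic in this example can be written out as a short sketch. The function names `predict` and `residual` are chosen here for illustration; the model y = 10 x + 25 and the observed value of 50 come from the example.

```python
def predict(x, slope=10, intercept=25):
    """Predicted y from the example model y = 10 x + 25."""
    return slope * x + intercept

def residual(observed, predicted):
    """Residual = Observed - Predicted."""
    return observed - predicted

predicted_y = predict(3)            # 10 * 3 + 25
error = residual(50, predicted_y)   # observed value minus predicted value

print(predicted_y)  # 55
print(error)        # -5
```

A negative residual means the model predicted too high for this point; a positive residual would mean it predicted too low.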

Chapter 4 – Section 2
What the residual is on the scatter diagram (figure labels):
  The model line
  The observed value y
  The predicted value y
  The x value of interest

Chapter 4 – Section 2
We want to minimize the residuals, but we need to define what this means.
We use the method of least squares:
  We consider a possible linear model.
  We calculate the residual for each point.
  We add up the squares of the residuals.
The line that has the smallest sum of squared residuals is called the least-squares regression line.
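The steps of the method can be sketched for a handful of candidate lines. The data points and the three candidate lines below are invented for illustration. Real software does not try candidates one at a time; it finds the minimizing line directly. The sketch only shows what is being compared.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up the squared residuals (observed - predicted) for one candidate line."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = slope * x + intercept
        total += (y - predicted) ** 2
    return total

# Hypothetical data pairs.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]

# Candidate lines as (slope, intercept) pairs.
candidates = [(1.0, 3.0), (2.0, 1.0), (1.9, 1.0)]

scores = {line: sum_squared_residuals(xs, ys, *line) for line in candidates}
best = min(scores, key=scores.get)

for (slope, intercept), score in scores.items():
    print(f"y = {slope} x + {intercept}: sum of squared residuals = {score:.2f}")
print("smallest among these candidates:", best)
```

Squaring keeps positive and negative residuals from cancelling each other and penalizes large misses more heavily than small ones.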

Chapter 4 – Section 2
The equation for the least-squares regression line is given by y = b1 x + b0.
  b1 is the slope of the least-squares regression line.
  b0 is the y-intercept of the least-squares regression line.
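A minimal sketch of what regression software does to obtain b1 and b0, assuming the standard closed-form solution: b1 is the sum of the products of the deviations of x and y from their means, divided by the sum of squared deviations of x, and b0 = (mean of y) – b1 × (mean of x). The function name `least_squares_line` and the data are illustrative, not taken from the slides.

```python
def least_squares_line(xs, ys):
    """Return (b1, b0) for the least-squares line y = b1 x + b0."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b1, b0

# Hypothetical data pairs.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]

b1, b0 = least_squares_line(xs, ys)
print(f"y = {b1:.2f} x + {b0:.2f}")
```

The fitted line always passes through the point (mean of x, mean of y), which is what the formula for b0 expresses.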

Chapter 4 – Section 2
Finding the values of b1 and b0 by hand is a very tedious process.
You should use software for this.
Finding the coefficients b1 and b0 is only the first step of a regression analysis.
  We need to interpret the slope b1.
  We need to interpret the y-intercept b0.

Chapter 4 – Section 2
Interpreting the slope b1:
  The slope is sometimes referred to as rise over run (rise / run).
  The slope is also sometimes referred to as the change in y over the change in x.
  The slope relates changes in y to changes in x.

Chapter 4 – Section 2
For example, if b1 = 4:
  If x increases by 1, then y will increase by 4.
  If x decreases by 1, then y will decrease by 4.
  This is a positive linear relationship.
For example, if b1 = –7:
  If x increases by 1, then y will decrease by 7.
  If x decreases by 1, then y will increase by 7.
  This is a negative linear relationship.

Chapter 4 – Section 2
For example, say that a researcher studies the population in a town (the y or response variable) in each year (the x or predictor variable).
To simplify the calculations, years are measured from 1900 (i.e., x = 55 is the year 1955).
The model used is y = 300 x + 12,000.
A slope of 300 means that the model predicts that, on average, the population increases by 300 per year.

Chapter 4 – Section 2
Interpreting the y-intercept b0:
Sometimes b0 has an interpretation, and sometimes not.
  If 0 is a reasonable value for x, then b0 can be interpreted as the value of y when x is 0.
  If 0 is not a reasonable value for x, then b0 does not have an interpretation.
In general, we should not use the model for values of x that are much larger or much smaller than the observed values.

Chapter 4 – Section 2
For example, say that a researcher studies the population in a town (the y or response variable) in each year (the x or predictor variable).
To simplify the calculations, years are measured from 1900 (i.e., x = 55 is the year 1955).
The model used is y = 300 x + 12,000.
An intercept of 12,000 means that the model predicts that the town had a population of 12,000 in the year 1900 (i.e., when x = 0).
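The interpretation of the slope and intercept in the town population model y = 300 x + 12,000 can be checked with a short sketch. The function name `predicted_population` is chosen here for illustration; the coefficients and the convention of measuring years from 1900 come from the example.

```python
def predicted_population(year, slope=300, intercept=12_000):
    """Predicted town population from the example model y = 300 x + 12,000."""
    x = year - 1900  # years are measured from 1900
    return slope * x + intercept

# Intercept: the prediction when x = 0, i.e. the year 1900.
print(predicted_population(1900))  # 12000

# A prediction inside the example's range: x = 55 is the year 1955.
print(predicted_population(1955))  # 28500

# Slope: the predicted change from one year to the next.
print(predicted_population(1956) - predicted_population(1955))  # 300
```

The function will return a number for any year, but the model should not be trusted for years far outside the range of the data used to fit it.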

Chapter 4 – Section 2
After finding the slope b1 and the intercept b0, it is very useful to compute the residuals, particularly the sum of the squared residuals.
Again, this is a tedious computation.
Least-squares regression software will compute this quantity.

Summary: Chapter 4 – Section 2
We can find the least-squares regression line that is the “best” linear model for a set of data.
  The slope can be interpreted as the change in y for every change of 1 in x.
  The intercept can be interpreted as the value of y when x is 0, as long as a value of 0 for x is reasonable.