# Chapter 5 Regression. Chapter 51 u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We.

## Presentation on theme: "Chapter 5 Regression. Chapter 51 u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We."— Presentation transcript:

Chapter 5 Regression

Chapter 51 u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict the average response for all subjects with a given value of the explanatory variable. Linear Regression

Regression Line A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x Chapter 52

Example – Staying Thin STATE: Some people don’t gain weight even when they overeat. Maybe differences in “nonexercise activity” account for this. Researchers deliberately fed 16 healthy young adults for eight weeks and measured fat gain (kg) and changes in energy use (cal) from nonexercise activities. Chapter 53

Example – Staying Thin Question: Do people with large increases in nonexercise activity (NEA) gain less weight? Chapter 54

Example – Staying Thin PLAN: Make a scatterplot of the data and examine the pattern. If it’s linear, use correlation to measure the strength and direction. Draw a regression line on the plot to predict fat gain from change in NEA. Chapter 55

6

Example – Staying Thin SOLVE: The scatterplot shows a moderately strong negative correlation: r = -0.78. The line on the plot is the regression line that predicts weight gain from changes in NEA Chapter 57

8 Least Squares u Used to determine the “best” line u We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict) u Least Squares: use the line that minimizes the sum of the squares of the vertical distances of the data points from the line

Straight Lines A straight line relating response variable y to explanatory variable x is given by: b is the slope, the amount y changes when x increases by one unit a is the intercept, the value of y when x = 0 Chapter 59

10 Least Squares Regression Line u Regression equation: y = a + bx ^ –x is the value of the explanatory variable –“y-hat” is the average value of the response variable (predicted response for a value of x) –note that a and b are just the intercept and slope of a straight line –note that r and b are not the same thing, but their signs will agree

Example – Using Regression Line Any line describing the weight gain-NEA data has the form: The line in the plot has specific values for a and b to produce this regression line: Chapter 511

Chapter 512

Example – Using Regression Line u The slope b = -0.00344 says that for each added calorie of NEA, fat gain goes down by 0.00344 kg. This is the rate of change of fat gain with changes in NEA. u The intercept a = 3.505 says that our estimate of the amount of weight gained for someone whose NEA doesn’t change at all (NEA change = 0) is 3.505 kg. Chapter 513

Regression Line Calculation Chapter 514 ^ u Regression equation: y = a + bx where s x and s y are the standard deviations of the two variables, and r is their correlation

Chapter 515 Regression Calculation Case Study Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe

Chapter 516 CountryPer Capita GDP (x)Life Expectancy (y) Austria21.477.48 Belgium23.277.53 Finland20.077.32 France22.778.63 Germany20.877.17 Ireland18.676.39 Italy21.578.51 Netherlands22.078.15 Switzerland23.878.99 United Kingdom21.277.37 Regression Calculation Case Study

Chapter 517 Linear regression equation:

Chapter 518 Coefficient of Determination (R 2 ) u Measures usefulness of regression prediction u R 2 (or r 2, the square of the correlation): measures what fraction of the variation in the values of the response variable (y) is explained by the regression line v r=1: R 2 =1:regression line explains all (100%) of the variation in y v r=.7: R 2 =.49:regression line explains almost half (50%) of the variation in y

Example – Using R 2 For the NEA data (predicting weight gain with changes in NEA): That is, about 61% of the variation in weight gained is accounted for by the linear relationship with changes in NEA. The other 39% or so is not explained by this relationship. Chapter 519

Chapter 520 Residuals u A residual is the difference between an observed value of the response variable and the value predicted by the regression line: residual = y  y ^

Chapter 521 Residuals u A residual plot is a scatterplot of the regression residuals against the explanatory variable – used to assess the fit of a regression line – look for a “random” scatter around zero

Example - Empathy u “Empathy” is understanding how others feel u To investigate empathy, researchers measured brain activity in women who watched their partner receive a harmless but painful electric shock and compared that to their score on an ‘empathy test’ u Question: do women who score high on empathy also respond more to watching their partner being zapped? Chapter 522

Example 5.5 - Empathy u The next slide displays a scatterplot of the data collected by the researchers u There’s a positive association – as we suspected u The overall pattern is moderately linear with u The line on the plot is the least squares line of brain activity vs. empathy score Chapter 523

Chapter 524

Example 5.5 - Empathy u The least squares regression line for brain activity vs. empathy score is given by: u For subject 1 whose values are empathy score = 38 and brain activity value = -0.120, we predict: Chapter 525

Example 5.5 - Empathy u So the residual for subject 1 the residual is: u The residual is negative because the regression line is above the observed value Chapter 526

Chapter 527

Example 5.5 - Empathy u We can plot the residuals themselves against the explanatory variable to create a residual plot u The residual plot helps to show how well the regression line describes the data u The horizontal red “residual = 0” line corresponds to the regression line in the scatterplot Chapter 528

Chapter 529

Chapter 530 Outliers and Influential Points u An outlier is an observation that lies far away from the other observations – outliers in the y direction have large residuals – outliers in the x direction are often influential for the least-squares regression line, meaning that the removal of such points would markedly change the equation of the line

Example – Influential Observation u Subject 16 in the empathy data seems to be an outlier: – Removing subject 16 from the calculation of r reduces r from 0.515 to 0.331 – yes, it is influential in this sense – Calculating r = 0.515 isn’t useful because it depends so much on subject 16 u Question: does subject 16 also have a big influence on the least squares regression line? Chapter 531

Chapter 532

Example – Influential Observation u As you can see in plot on the previous slide, removing the point for subject 16 from the calculation of the least squares regression line has almost no effect – so it is not influential where it is …. u However, if we move it around, it does become influential Chapter 533

Chapter 534

Example – Influential Observation u When we move the point for subject 16 straight down, it pulls the regression line with it  when an outlier does not lie close to the regression line determined by the rest of the points in the dataset, it is influential  in the regression setting, not all outliers are influential Chapter 535

Chapter 536 Cautions about Correlation and Regression u only describe linear relationships u are both affected by outliers u always plot the data before interpreting u beware of extrapolation – predicting outside of the range of x u beware of lurking variables – have important effect on the relationship among the variables in a study, but are not included in the study u association does not imply causation

Chapter 537 Evidence of Causation u A properly conducted experiment establishes the connection u Other considerations: – The association is strong – The association is consistent v The connection happens in repeated trials v The connection happens under varying conditions – Higher doses are associated with stronger responses – Alleged cause precedes the effect in time – Alleged cause is plausible (reasonable explanation)

Download ppt "Chapter 5 Regression. Chapter 51 u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We."

Similar presentations