Linear Regression
Slide #2: Simple Linear Regression
Using one variable to …
1) explain the variability of another variable
2) predict the value of another variable
Both are accomplished with the line that best fits a scatterplot.
Slide #3: Recall -- Definitions
Response (dependent) variable
– its variability is being explained, or its values are being predicted
– plotted on the y-axis
Explanatory (independent, predictor) variable
– used to explain variability or make predictions
– plotted on the x-axis
Slide #4: Review -- Line Characteristics
1. What is the most common equation of a line?
2. What does the slope tell us?
3. What does the intercept tell us?
Slide #5: Finding the Best-Fit Line
[Figure: scatterplot of Y against X with several candidate lines]
We need an objective criterion for choosing among the candidate lines.
Slide #6: Finding the Best-Fit Line
Definition -- Predicted Y
– the y-coordinate of the point on the line that corresponds to the observed x value
– plug the observed value of x into the line equation to get the predicted Y
Slide #7: Finding the Best-Fit Line
Definition -- Residual
Residual = Observed Y - Predicted Y
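A minimal sketch of the predicted-Y and residual definitions. The candidate line (intercept 1.0, slope 0.5) and the three observations are made up for illustration and do not come from the slides.

```python
# Hypothetical candidate line y = intercept + slope * x (illustrative values only).
intercept, slope = 1.0, 0.5

# Made-up observations as (x, observed y) pairs.
points = [(1.0, 1.0), (2.0, 2.5), (4.0, 3.0)]

def predict(x):
    """Predicted Y: plug the observed x into the line equation."""
    return intercept + slope * x

# Residual = Observed Y - Predicted Y.
# Positive: the point lies above the line; negative: below; zero: on the line.
residuals = [y - predict(x) for x, y in points]

for (x, y), e in zip(points, residuals):
    print(f"x = {x}, observed = {y}, predicted = {predict(x)}, residual = {e}")
```

The three points land below, above, and exactly on the candidate line, so the residuals come out negative, positive, and zero in that order.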
Slide #8: Finding the Best-Fit Line
Minimize the sum of the residuals?
– positive and negative residuals cancel, so a poorly fitting line can still have a sum of zero
Slide #9: Finding the Best-Fit Line
Minimize the sum of squared residuals?
– RSS = sum of squared residuals
– Best-fit line = the line, out of all possible lines, that minimizes the RSS
– Should the RSS be computed for all lines?
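A sketch comparing the two criteria, using made-up toy data (not from the slides). It assumes the standard closed-form least-squares formulas, which give the minimizing line directly, so the RSS never has to be computed for every possible line.

```python
# Made-up toy data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def residuals(intercept, slope):
    """Observed Y minus predicted Y for every point, for the given line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def rss(intercept, slope):
    """Residual sum of squares for the given line."""
    return sum(e ** 2 for e in residuals(intercept, slope))

# Closed-form least-squares estimates: slope = Sxy / Sxx,
# intercept chosen so the line passes through (x_bar, y_bar).
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
best_slope = s_xy / s_xx
best_intercept = y_bar - best_slope * x_bar

# A flat line at the mean of y: its residuals sum to zero, yet it ignores the trend.
flat_sum = sum(residuals(y_bar, 0.0))
flat_rss = rss(y_bar, 0.0)
best_rss = rss(best_intercept, best_slope)

print(f"flat line: sum of residuals = {flat_sum:.2f}, RSS = {flat_rss:.2f}")
print(f"least-squares line: y = {best_intercept:.2f} + {best_slope:.2f}x, RSS = {best_rss:.2f}")
```

The flat line has a residual sum of zero but a larger RSS than the least-squares line, which is why the sum of residuals alone is not a usable criterion.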
Slide #10: So …
It is important to understand
– where the equation of the line comes from
– how to interpret the line
It is not important to compute the best-fit line “by hand”.
Slide #11: Example -- Rabbit Metabolic Rate
Katzner et al. (1997; J. Wildl. Man. 78: ) examined the metabolic rate of pygmy rabbits (Brachylagus idahoensis) in the laboratory. In particular, they wanted to determine if the variability in resting metabolic rate (ml O₂ g⁻¹ h⁻¹) at 20 °C could be adequately explained by body mass (g).
1. What is the response variable?
– Resting metabolic rate
2. What is the explanatory variable?
– Body mass
Slide #12: Example -- Rabbit Metabolic Rate
[Figure: fitted line plot of Metabolic Rate against Mass, R-Sq = 55.4%]
3. In terms of the variables of the problem, what is the equation of the best-fit line?
– MetRate = intercept + slope × Mass, using the two coefficients printed on the fitted line plot
Slide #13: Example -- Rabbit Metabolic Rate
4. In terms of the variables of the problem, interpret the value of the slope.
– For each additional gram of mass, the metabolic rate decreases by the magnitude of the slope (in ml O₂ g⁻¹ h⁻¹), on average.
Slide #14: Example -- Rabbit Metabolic Rate
5. In terms of the variables of the problem, interpret the value of the y-intercept.
– Rabbits with no mass have a metabolic rate of 1.41 ml O₂ g⁻¹ h⁻¹, on average.
Slide #15: Example -- Rabbit Metabolic Rate
6. What is the predicted metabolic rate for a mass of 450 g?
– point on the line: (450, 0.85)
7. What is the predicted metabolic rate for a mass of 600 g?
8. What is the residual for a mass of 425 g and a metabolic rate of 0.82 ml O₂ g⁻¹ h⁻¹?
– observed point: (425, 0.82); point on the line: (425, 0.88)
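A worked sketch of question 8, using only the rounded values that appear on the slides (intercept 1.41, points (450, 0.85), (425, 0.82), and (425, 0.88)). The slope below is an approximation implied by those rounded values, not the coefficient reported by the authors.

```python
# Values read from the slides (all rounded to two decimals there).
intercept = 1.41        # y-intercept of the best-fit line
pred_at_450 = 0.85      # predicted metabolic rate at 450 g
observed_at_425 = 0.82  # observed metabolic rate at 425 g
pred_at_425 = 0.88      # predicted metabolic rate at 425 g

# Residual = Observed Y - Predicted Y.
residual = observed_at_425 - pred_at_425

# Approximate slope implied by two points on the line, (0, 1.41) and (450, 0.85).
# ASSUMPTION: reconstructed from rounded values, so only the first digits are reliable.
slope_approx = (pred_at_450 - intercept) / (450 - 0)

# Consistency check: the implied line should reproduce the 425 g prediction.
check_at_425 = intercept + slope_approx * 425

print(f"residual at 425 g = {residual:.2f}")
print(f"implied slope = {slope_approx:.5f}")
print(f"implied prediction at 425 g = {check_at_425:.2f}")
```

The residual is negative because this rabbit's observed rate sits below the line, and the implied line reproduces the 0.88 prediction at 425 g to two decimals.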
Slide #16: One More Regression Statistic
r² = coefficient of determination
= proportion of the total variability in the response variable that is explained by knowing the value of the explanatory variable
Slide #17: Visualizing r²
[Figure: Height and Weight scatterplot marking the total variability in Y, the variability explained, and the variability remaining]
r² = Variability Explained / Total Variability in Y
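A sketch of the r² ratio on made-up toy data (not the height and weight data in the figure). It assumes a least-squares line with an intercept, for which the explained variability equals the total variability minus the variability remaining.

```python
# Made-up toy data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line via the closed-form formulas.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * x for x in xs]

# Total variability in Y: squared deviations of the observations from their mean.
total = sum((y - y_bar) ** 2 for y in ys)
# Variability remaining: squared residuals around the line (the RSS).
remaining = sum((y - p) ** 2 for y, p in zip(ys, predicted))
# Variability explained by knowing x.
explained = total - remaining

r_squared = explained / total
print(f"total = {total:.2f}, explained = {explained:.2f}, "
      f"remaining = {remaining:.2f}, r^2 = {r_squared:.2f}")
```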
Slide #18: Characteristics of r²
– What range of values can r² take?
  0 ≤ r² ≤ 1
– Which relationship is stronger -- r² = 0.5 or 0.9?
– Which relationship gives “better” predictions -- r² = 0.5 or 0.9?
Slide #19: Example -- Rabbit Metabolic Rate
What proportion of the variability in metabolic rate is explained by knowing mass?
– r² = 0.554 (the R-Sq = 55.4% shown on the fitted line plot)
What is the correlation between metabolic rate and mass?
– r = -√0.554 ≈ -0.744 (negative because metabolic rate decreases as mass increases)
Slide #20: Simple Linear Regression in R
Examine handout
– lm()
– rSquared()
– fitPlot()
– predict()
Slide #21: Regression is the Most Used and Most Abused Statistical Technique
Assumptions:
– A line adequately models the data
– Homoscedasticity: the same scatter of points along the entire line
– Residuals at any given value of the explanatory variable are normally distributed
– Residuals at any given value of the explanatory variable are independent
[Slide labels: Intro, Advanced]
Slide #22: A Line Models the Data
Slide #23: Homoscedasticity
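A rough sketch of what a violation of homoscedasticity looks like numerically. The data are made up so that the scatter grows with x, and comparing residual spread between the lower and upper halves of x is an informal check assumed here for illustration, not a formal test and not a method from the slides. A residual plot is the usual tool.

```python
import math

# Made-up toy data, sorted by x, with scatter that widens as x increases.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 1.9, 3.2, 3.8, 6.0, 5.0, 9.0, 6.0]

# Least-squares line via the closed-form formulas.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def rms(values):
    """Root-mean-square size of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Compare residual spread for small x against large x.
half = len(xs) // 2
spread_low = rms(residuals[:half])
spread_high = rms(residuals[half:])

print(f"residual spread, lower half of x: {spread_low:.2f}")
print(f"residual spread, upper half of x: {spread_high:.2f}")
```

Under homoscedasticity the two spreads would be of similar size. Here the upper half is several times larger, which signals that the scatter is not the same along the entire line.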
Slide #24: r² doesn’t depend on x because of homoscedasticity
[Figure: Height and Weight scatterplot marking the total variability in Y, the variability explained, and the variability remaining]
Slide #25: Other Problems
Outliers
– a problem because the model does not fit that point
– may or may not be removed
Influential Points
– a point that would markedly change the line if it were removed
– typically an outlier in the x direction
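A sketch of an influential point, using made-up toy data: five points that fall exactly on a line, plus one point far out in the x direction. The helper name fit_line is hypothetical. Refitting without the suspect point shows how much it alone changes the line.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line (closed-form formulas)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Made-up data: the last point (15, 5) is an outlier in the x direction.
xs = [1, 2, 3, 4, 5, 15]
ys = [1, 2, 3, 4, 5, 5]

intercept_all, slope_all = fit_line(xs, ys)
intercept_without, slope_without = fit_line(xs[:-1], ys[:-1])

print(f"with the point:    y = {intercept_all:.2f} + {slope_all:.2f}x")
print(f"without the point: y = {intercept_without:.2f} + {slope_without:.2f}x")
```

Removing the single far-out point changes the slope from roughly 0.23 back to 1, which is the "markedly change the line" behavior that defines an influential point.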