1
Regression Analysis Week 4
2
Business Scenarios
Does an increasing crime rate decrease house prices?
How does the size of a home impact its price?
What GPA at graduation from Temple would you predict for a student with a 1200 SAT score?
How does advertising impact sales of a product?
Does online advertising have a higher impact on sales than TV advertising?
3
Objective
Define the relation between two variables:
Visual
Strength of relation
Prediction
4
Scatter Plots and Correlation
Visual: a scatter plot (or scatter diagram) is used to show the relationship between two variables.
5
Scatter Plot Examples
[Figure: scatter plots of y against x showing linear relationships and curvilinear relationships]
6
Scatter Plot Examples (continued)
[Figure: scatter plots of y against x showing no relationship]
7
In-class Exercise
Draw a scatter plot in Excel with the following data. What relation can you infer?
Tree Height (y)   Trunk Diameter (x)
35                8
49                9
27                7
33                6
60                13
21                (missing)
45                11
51                12
8
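Measure the strength
Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
It is only concerned with the strength of the relationship.
No causal effect is implied.
The correlation coefficient r lies between -1 and +1: values near -1 or +1 indicate a strong linear relationship, and values near 0 indicate a weak one.
Mathematically,
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$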
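As a rough sketch of how the correlation coefficient could be computed outside Excel, the snippet below applies the standard Pearson formula to the tree exercise. It uses only the seven complete pairs; the observation with height 21 is left out because its trunk diameter is missing in the data as given. The function name `pearson_r` is just an illustrative choice.

```python
from math import sqrt

# Seven complete pairs from the tree exercise (x = trunk diameter, y = tree height).
# The observation with height 21 is omitted: its diameter is missing in the source.
diameter = [8, 9, 7, 6, 13, 11, 12]
height = [35, 49, 27, 33, 60, 45, 51]


def pearson_r(x, y):
    """Pearson correlation coefficient; always lies between -1 and +1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(diameter, height)
print(f"r = {r:.3f}")
```

A value of r close to +1 here would point to a strong positive linear association between diameter and height; it says nothing about cause and effect.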
9
Regression Analysis
Regression is the attempt to explain the variation in a dependent variable (y) using the variation in independent variables (x). If the independent variable(s) sufficiently explain the variation in the dependent variable, the model can be used for prediction.
10
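Linear Regression
The regression model:
$y = \beta_0 + \beta_1 x + \varepsilon$
where y is the dependent variable, x is the independent variable, $\beta_0$ is the population y intercept, $\beta_1$ is the population slope coefficient, and $\varepsilon$ is the random error term, or residual. $\beta_0 + \beta_1 x$ is the linear component and $\varepsilon$ is the random error component.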
11
Linear Regression (continued)
[Figure: the regression line on axes x and y, with intercept = β0 and slope = β1; at xi, the observed value of y differs from the predicted value of y by the random error εi for this x value]
12
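Linear Regression
Predicted value:
$\hat{y} = b_0 + b_1 x$
where $\hat{y}$ is the predicted value, $b_0$ is the estimate of the intercept, $b_1$ is the estimate of the slope coefficient, and x is the independent variable.
As the sample size becomes large, the sample estimates approximate the population parameters more closely.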
13
Least Squares Criterion
b0 and b1 are the estimates of β0 and β1 that minimize the sum of the squared residuals:
$\sum e^2 = \sum (y - \hat{y})^2 = \sum (y - (b_0 + b_1 x))^2$
14
The Least Squares Equation
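The formulas for b1 and b0 are:
$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
Algebraic equivalent:
$b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$
and
$b_0 = \bar{y} - b_1 \bar{x}$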
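A minimal sketch of the least squares formulas, assuming the seven complete pairs from the tree exercise (the observation with height 21 is dropped because its diameter is missing). It computes the slope both ways, in deviation form and in the algebraically equivalent computational form, so the two can be compared. Variable names such as `b1_dev` and `b1_alg` are illustrative only.

```python
# Seven complete pairs from the tree exercise (x = trunk diameter, y = tree height).
# The observation with height 21 is omitted: its diameter is missing in the source.
x = [8, 9, 7, 6, 13, 11, 12]
y = [35, 49, 27, 33, 60, 45, 51]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviation form: b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b1_dev = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))

# Computational form: b1 = (sum(xy) - sum(x)sum(y)/n) / (sum(x^2) - (sum(x))^2/n)
b1_alg = ((sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n)
          / (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n))

# Intercept: b0 = y_bar - b1 * x_bar
b0 = y_bar - b1_dev * x_bar

print(f"b1 (deviation form)     = {b1_dev:.4f}")
print(f"b1 (computational form) = {b1_alg:.4f}")
print(f"b0                      = {b0:.4f}")
```

Both forms should agree up to rounding. The intercept formula also implies that the fitted line passes through the point of means.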
15
Regression Formulas
The Total Sum of Squares (SST) is equal to SSR + SSE. Mathematically,
$SSR = \sum (\hat{y} - \bar{y})^2$ (measure of explained variation)
$SSE = \sum (y - \hat{y})^2$ (measure of unexplained variation)
$SST = SSR + SSE = \sum (y - \bar{y})^2$ (measure of total variation in y)
The proportion of total variation (SST) that is explained by the regression (SSR) is known as the Coefficient of Determination, and is often referred to as $R^2$: $R^2 = SSR / SST$.
16
Interpretation of the Slope and the Intercept
b0 is the estimated average value of y when the value of x is zero.
b1 is the estimated change in the average value of y as a result of a one-unit change in x.
17
Sample Data for House Price Model
House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         (missing)
18
Regression Using Excel
Tools / Data Analysis / Regression
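For readers without the Excel add-in, the same quantities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the nine complete (square feet, price) pairs from the sample data, dropping the price of 255 because its square-feet value is missing in the table as given, and the 2000-square-foot query is a hypothetical input chosen inside the observed range.

```python
# Nine complete pairs from the house price sample (price in $1000s).
# The tenth price (255) is omitted: its square-feet value is missing in the source.
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Least squares estimates of the slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

# Sums of squares and the coefficient of determination.
fitted = [b0 + b1 * x for x in sqft]
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained variation
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))  # unexplained variation
sst = sum((y - y_bar) ** 2 for y in price)              # total variation
r_squared = ssr / sst

# Hypothetical query: predicted price for a 2000-square-foot house.
predicted_2000 = b0 + b1 * 2000

print(f"price = {b0:.2f} + {b1:.4f} * sqft")
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}, R^2 = {r_squared:.3f}")
print(f"predicted price at 2000 sq ft: {predicted_2000:.1f} ($1000s)")
```

Because one observation is dropped, the numbers will differ somewhat from an Excel run on the full ten-row data set.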
19
Residual Analysis
Purposes
Examine for linearity assumption
Examine for constant variance for all levels of x
Evaluate normal distribution assumption
Graphical Analysis of Residuals
Can plot residuals vs. x
Can create histogram of residuals to check for normality
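The residuals that feed these plots can be computed directly. The sketch below is an assumption-laden illustration: it refits the house price model on the nine complete pairs (the price of 255 is dropped because its square-feet value is missing) and lists the residuals ordered by x, which is the raw material for a residuals-vs-x plot. With so few points, any visual pattern is only suggestive.

```python
# Nine complete pairs from the house price sample (price in $1000s).
# The tenth price (255) is omitted: its square-feet value is missing in the source.
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y.
residuals = [y - (b0 + b1 * x) for x, y in zip(sqft, price)]

# List residuals in order of x; a curve or a funnel shape in this sequence
# would hint at nonlinearity or non-constant variance respectively.
for x, e in sorted(zip(sqft, residuals)):
    print(f"{x:5d}  {e:8.2f}")

# With an intercept in the model, least squares residuals sum to zero
# and are uncorrelated with x by construction.
residual_sum = sum(residuals)
residual_dot_x = sum(e * x for e, x in zip(residuals, sqft))
```

The two sums at the end are sanity checks on the fit itself, not tests of the model assumptions.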
20
Residual Analysis for Linearity
[Figure: two panels of residuals plotted against x, one labeled Not Linear and one labeled Linear]
21
Residual Analysis for Constant Variance
[Figure: two panels of residuals plotted against x, one labeled Constant variance and one labeled Non-constant variance]
22
Data Transformation Why transform data?
Linear least squares regression assumes that the relationship between two variables is linear.
We can “straighten” a nonlinear relationship by transforming one or both of the variables.
Often transformations will ‘fix’ problem distributions so that we can use least-squares regression.
23
Interpreting coefficients after log transformation
IV      DV      Interpretation of IV coefficient, b1
x       y       1-unit change in x leads to b1 units change in y
Ln(x)   y       1% change in x leads to approximately b1/100 units change in y
x       Ln(y)   1-unit change in x leads to approximately 100·b1 % change in y
Ln(x)   Ln(y)   1% change in x leads to approximately b1 % change in y, also known as elasticity
Examples of elasticities: demand elasticity, cross-price elasticity, income elasticity, advertising elasticity
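To make the log-log row concrete, here is a small sketch on hypothetical data. The data are generated from an assumed power law, y = 3·x^0.5, purely for illustration; regressing Ln(y) on Ln(x) should then recover a slope of 0.5, and a 1% increase in x should raise y by roughly 0.5%. The helper `fit_line` is an illustrative name, not part of any library.

```python
from math import log


def fit_line(x, y):
    """Simple least squares fit; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical data generated from y = 3 * x**0.5 (assumed for illustration).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 0.5 for xi in x]

# Log-log regression: Ln(y) on Ln(x).
b0, b1 = fit_line([log(xi) for xi in x], [log(yi) for yi in y])

# Exact percentage change in y when x increases by 1%.
pct_change_y = (1.01 ** b1 - 1) * 100

print(f"elasticity b1 = {b1:.4f}")
print(f"1% increase in x -> {pct_change_y:.3f}% change in y")
```

The small gap between b1 and the exact percentage change is why the interpretations in the table are approximations that work best for small changes.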
24
Multiple Regression
A regression model specifies a relation between a dependent variable Y and certain independent variables X1, …, XK. (Here “independence” is not in the sense of random variables; rather, it means that the value of Y depends on, or is determined by, the Xi variables.)
A linear model sets Y = b0 + b1X1 + … + bKXK + e, where e is the error term.
To use such a model, we need to have data on values of Y corresponding to values of the Xi's, for example:
selling prices for various house features
past growth values for various economic conditions
beer sales corresponding to various marketing strategies
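One way to estimate such a model by hand is to solve the normal equations (X'X)b = X'y. The sketch below does this in plain Python for two predictors. Everything in it is an assumption for illustration: the data are hypothetical and noiseless, generated from Y = 1 + 2·X1 + 3·X2, so the fit should recover those coefficients; the helper names `solve` and `fit_multiple` are made up for this example.

```python
def solve(a, b):
    """Solve the linear system a.z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit_multiple(rows, y):
    """rows: list of [x1, ..., xK]; returns [b0, b1, ..., bK] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical noiseless data from Y = 1 + 2*X1 + 3*X2 (assumed for illustration).
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coeffs = fit_multiple(rows, y)
print([round(c, 6) for c in coeffs])
```

In practice a statistics package would be used instead, since it also reports standard errors and handles ill-conditioned data more robustly than this bare-bones solver.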
25
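Multicollinearity
Multicollinearity arises when two variables that measure the same thing or similar things (e.g., weight and BMI) are both included in a multiple regression model. Because the model cannot separate their effects, the coefficient estimates become unstable and their standard errors inflate; the two variables can, in effect, cancel each other out and leave the individual coefficients uninterpretable.
Model building and diagnostics are tricky business!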
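A quick screen for this problem is the variance inflation factor (VIF). With exactly two predictors it reduces to 1 / (1 - r²), where r is the correlation between them. The sketch below uses made-up weight and BMI values, so the numbers are purely hypothetical; the threshold of 10 mentioned in the comment is a common rule of thumb, not a hard cutoff.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical predictor values (invented for illustration only).
weight_kg = [60, 72, 80, 95, 110]
bmi = [21.0, 24.5, 26.0, 31.5, 35.0]

r = pearson_r(weight_kg, bmi)

# Two-predictor case: VIF = 1 / (1 - r^2).
# A VIF above roughly 10 is often read as a sign of serious multicollinearity.
vif = 1 / (1 - r ** 2)

print(f"r = {r:.4f}, VIF = {vif:.1f}")
```

With more than two predictors, each VIF instead comes from regressing one predictor on all the others, which this sketch does not cover.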
26
Overfitting
In multivariate modeling, you can get highly significant but meaningless results if you put too many predictors in the model. The model is fit perfectly to the quirks of your particular sample, but has no predictive ability in a new sample.
Input selection is a key step.
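The effect can be illustrated with a deliberately extreme sketch. The six data points below are hypothetical: they lie on the line y = x plus a fixed alternating error of ±0.5. A model with as many parameters as observations (here a degree-5 polynomial, evaluated through Lagrange interpolation) reproduces the sample exactly, yet predicts a new point far worse than a plain straight line. Polynomial terms stand in for "too many predictors"; all names are illustrative.

```python
def fit_line(x, y):
    """Simple least squares fit; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


def interpolate(x, y, x_new):
    """Value at x_new of the unique polynomial through all (x, y) points (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        weight = 1.0
        for j, xj in enumerate(x):
            if j != i:
                weight *= (x_new - xj) / (xi - xj)
        total += yi * weight
    return total


# Hypothetical sample: true relation y = x, plus fixed alternating errors.
x = [0, 1, 2, 3, 4, 5]
errors = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
y = [xi + e for xi, e in zip(x, errors)]

b0, b1 = fit_line(x, y)

# In-sample: the six-parameter polynomial reproduces every observation.
poly_train_error = max(abs(interpolate(x, y, xi) - yi) for xi, yi in zip(x, y))

# New observation on the true line, just outside the sample.
x_new, y_new = 6, 6.0
line_error = abs(b0 + b1 * x_new - y_new)
poly_error = abs(interpolate(x, y, x_new) - y_new)

print(f"polynomial max in-sample error: {poly_train_error:.2e}")
print(f"line error at x = 6:       {line_error:.2f}")
print(f"polynomial error at x = 6: {poly_error:.2f}")
```

The polynomial has fit the alternating errors rather than the underlying relation, which is the sense in which an overfit model is tuned to the quirks of one sample.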
27
Nonlinear Regression Nonlinear functions can also be fit as regressions. Common choices include Power, Logarithmic, Exponential, and Logistic, but any continuous function can be used.
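As one hedged illustration, an exponential curve y = a·e^(b·x) can be fit by taking logs, since Ln(y) = Ln(a) + b·x is linear in x. Note that this linearization is a shortcut swapped in for simplicity, not general nonlinear least squares: it minimizes squared errors on the log scale. The data are hypothetical, generated without noise from a = 2 and b = 0.3, so the fit should recover those values.

```python
from math import exp, log


def fit_line(x, y):
    """Simple least squares fit; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical noiseless data from y = 2 * exp(0.3 * x) (assumed for illustration).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * exp(0.3 * xi) for xi in x]

# Regress Ln(y) on x, then transform the intercept back to the original scale.
log_a, b = fit_line(x, [log(yi) for yi in y])
a = exp(log_a)

print(f"fitted curve: y = {a:.4f} * exp({b:.4f} * x)")
```

Forms such as the logistic curve generally cannot be linearized this simply and call for an iterative fitting routine.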
28
In-class exercise
You’re a veterinary epidemiologist for the county cooperative. You gather the following data:
Food (lb.)   Milk yield (lb.)
What is the relationship between cows’ food intake and milk yield?
29
Scattergram Milk Yield vs. Food intake*
[Figure: scatter plot of milk yield (lb.) against food intake (lb.)]
30
Parameter Estimation Solution Table*
EPI 809/Spring 2008
31
Parameter Estimation Solution*
32
Coefficient Interpretation Solution*
1. Slope (b1): milk yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in food intake (X).
2. Y-intercept (b0): average milk yield (Y) is expected to be 0.8 lb. when food intake (X) is 0.
33
Questions