# Learning Objectives Describe the linear regression model


Learning Objectives
As a result of this class, you will be able to:
- Describe the linear regression model
- State the regression modeling steps
- Explain least squares
- Compute regression coefficients
- Describe residual analysis
- Predict the response variable
- Understand correlational analysis

Probabilistic Models
- Hypothesize 2 components
  - Deterministic
  - Random error
- Example: Sales volume is 10 times advertising spending plus random error
  - Y = 10X + ε
  - Random error may be due to factors other than advertising
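
The two components can be sketched in Python. This is only an illustration: the slide does not say how the error is distributed, so the normal errors, the `noise_sd` value, and the names `simulate_sales` and `seed` are all assumptions.

```python
import random

def simulate_sales(ad_spending, noise_sd=5.0, seed=0):
    """Return simulated sales: deterministic part 10*X plus a random error."""
    rng = random.Random(seed)
    sales = []
    for x in ad_spending:
        deterministic = 10 * x              # deterministic component
        error = rng.gauss(0.0, noise_sd)    # random error (assumed normal)
        sales.append(deterministic + error)
    return sales

ad = [1, 2, 3, 4, 5]  # hypothetical advertising spending
sales = simulate_sales(ad)
for x, y in zip(ad, sales):
    print(f"X={x}  10X={10 * x}  Y={y:.2f}  error={y - 10 * x:.2f}")
```

With `noise_sd=0.0` the random component vanishes and the model collapses to the purely deterministic Y = 10X.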

Types of Probabilistic Models

Regression Models
- Answer ‘What is the relationship between the variables?’
- Equation used
  - 1 numerical dependent (response) variable: what is to be predicted
  - 1 or more numerical or categorical independent (explanatory) variables
- Used mainly for prediction

Regression Modeling Steps
1. Define problem or question
2. Specify model
3. Collect data
4. Do descriptive data analysis
5. Estimate unknown parameters
6. Evaluate model
7. Use model for prediction

Problem Definition
- Most critical step: don’t want the right answer to the wrong question
- What are the model objectives?
- Who will use the model?
- What will be the benefits?
- Are resources available (data etc.)?
- How will the results be implemented?

Specifying the Model
- Define variables
  - Conceptual (e.g., advertising, price)
  - Empirical (e.g., list price, regular price)
  - Measurement (e.g., \$, units)
- Hypothesize nature of relationship
  - Expected effects (i.e., coefficients’ signs)
  - Functional form (linear or non-linear)
  - Interactions

Model Specification Is Based on Theory
- Economic & business theory
- Mathematical theory
- Previous research
- ‘Common sense’

Types of Regression Models
This typology is based on the number of explanatory variables & the nature of the relationship between X & Y.

Linear Equations

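Linear Regression Model
Relationship between variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

- $Y_i$: dependent (response) variable
- $\beta_0$: population Y-intercept
- $\beta_1$: population slope
- $X_i$: independent (explanatory) variable
- $\varepsilon_i$: random error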

Sample Linear Regression Model

$$Y_i = b_0 + b_1 X_i + e_i$$

Figure labels: $e_i$ = random error; unsampled observation; observed value.

Scatter Diagram
- Plot of all (Xi, Yi) pairs
- Suggests how well model will fit

Thinking Challenge
- How would you draw a line through the points?
- How do you determine which line ‘fits best’?

Least Squares
- ‘Best fit’ means the differences between actual Y values & predicted Y values are a minimum
- But positive differences offset negative ones
- LS minimizes the sum of the squared differences (or errors)
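
The offsetting problem can be seen numerically. The sketch below uses a small hypothetical data set (not taken from the slides) and compares a flat line through the mean of Y with the least squares line for that data; the function names are illustrative.

```python
def sum_errors(xs, ys, b0, b1):
    """Sum of raw differences between actual and predicted Y."""
    return sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))

def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared differences (the least squares criterion)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

flat_line = (2.0, 0.0)   # horizontal line at the mean of Y
ls_line = (-0.1, 0.7)    # least squares line for this data

for name, (b0, b1) in [("flat", flat_line), ("least squares", ls_line)]:
    raw = sum_errors(xs, ys, b0, b1)
    squared = sum_squared_errors(xs, ys, b0, b1)
    print(f"{name}: sum of errors = {raw:.2f}, sum of squared errors = {squared:.2f}")
```

Both lines have raw errors that cancel to zero, so the raw sum cannot tell them apart; the squared criterion (6.00 versus 1.10) can.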

Least Squares Graphically

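Coefficient Equations
Sample regression equation:

$$\hat{Y}_i = b_0 + b_1 X_i$$

Sample slope, where $n$ is the number of $(X_i, Y_i)$ pairs and $\bar{X}^2$ means average the $X_i$’s, then square:

$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$$

Sample Y-intercept:

$$b_0 = \bar{Y} - b_1 \bar{X}$$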
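
A minimal Python sketch of the slope and intercept computation, using the sums-of-products form of the least squares formulas. The data values and the name `least_squares` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the sample regression line Y-hat = b0 + b1*X."""
    n = len(xs)                      # number of (Xi, Yi) pairs
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # sample slope
    b0 = y_bar - b1 * x_bar                                        # sample Y-intercept
    return b0, b1

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = -0.10, b1 = 0.70
```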

Computation Table

Interpretation of Coefficients
- Slope (b1): estimated Y changes by b1 for each 1 unit increase in X
  - Example: If b1 = 2, then Sales (Y) is expected to increase by 2 for each 1 unit increase in Advertising (X)
- Y-Intercept (b0): average value of Y when X = 0
  - Example: If b0 = 4, then average Sales (Y) is expected to be 4 when Advertising (X) is 0

Parameter Estimation Example
You’re a marketing analyst for Hasbro Toys. You gather data on Ad \$ and Sales (Units). What is the relationship between sales & advertising?


Parameter Estimation Solution Table

Coefficient Interpretation Solution
- Slope (b1): Sales Volume (Y) is expected to increase by .7 units for each \$1 increase in Advertising (X)
- Y-Intercept (b0): average value of Sales Volume (Y) is b0 units when Advertising (X) is 0
  - Difficult to explain to Marketing Manager
  - Expect some sales without advertising

Parameter Estimation Excel Output
Excel output; the coefficient column (bP) lists b0 and b1.

Evaluating the Model
- How well does the model describe the relationship between the variables?
- Closeness of ‘best fit’: the closer the points to the line, the better
- Assumptions met
- Significance of parameter estimates

Evaluating Model Steps
- Examine variation measures
- Do residual analysis
- Test coefficients for significance

Random Error Variation
- Variation of actual Y from predicted Y
- Measured by standard error of estimate
  - Sample standard deviation of e
  - Denoted $S_{YX}$
- Affects several factors
  - Parameter significance
  - Prediction accuracy

Standard Error of Estimate
The mean error is 0.
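
In simple regression the standard error of estimate is usually computed as $S_{YX} = \sqrt{SSE / (n - 2)}$, where SSE is the sum of squared residuals. The sketch below assumes that form and uses hypothetical data with its least squares line; the function name is illustrative.

```python
import math

def standard_error_of_estimate(xs, ys, b0, b1):
    """Return (S_YX, residuals) for the fitted line Y-hat = b0 + b1*X."""
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)       # unexplained sum of squares
    s_yx = math.sqrt(sse / (len(xs) - 2))     # n - 2 degrees of freedom
    return s_yx, residuals

# Hypothetical data and its least squares line, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
s_yx, residuals = standard_error_of_estimate(xs, ys, b0=-0.1, b1=0.7)

print(f"S_YX = {s_yx:.4f}")
print(f"mean error = {sum(residuals) / len(residuals):.4f}")
```

The mean error prints as zero (up to rounding) because the line used is the least squares line for this data.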

Measures of Variation in Regression
- Total sum of squares (SST): measures variation of observed Yi around the mean Ȳ
- Explained variation (SSR): variation due to relationship between X & Y
- Unexplained variation (SSE): variation due to other factors

Variation Measures
- Total sum of squares: $\sum (Y_i - \bar{Y})^2$
- Explained sum of squares: $\sum (\hat{Y}_i - \bar{Y})^2$
- Unexplained sum of squares: $\sum (Y_i - \hat{Y}_i)^2$

Coefficient of Determination
- Proportion of variation ‘explained’ by relationship between X & Y
- 0 ≤ r² ≤ 1
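
The variation measures and the coefficient of determination can be checked numerically. The sketch assumes the usual definition r² = SSR / SST and uses hypothetical data with its least squares line; `variation_measures` is an illustrative name.

```python
def variation_measures(xs, ys, b0, b1):
    """Return (SST, SSR, SSE) for the fitted line Y-hat = b0 + b1*X."""
    y_bar = sum(ys) / len(ys)
    y_hat = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained
    return sst, ssr, sse

# Hypothetical data and its least squares line, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
sst, ssr, sse = variation_measures(xs, ys, b0=-0.1, b1=0.7)
r2 = ssr / sst

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"r^2 = {r2:.4f}")
```

The identity SST = SSR + SSE holds here because the line is the least squares line; for an arbitrary line it generally does not.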

r² Examples
Four scatter plots, with r² = 1, r² = 1, r² = .8, and r² = 0.

Proportion of variation ‘explained’ by relationship between X & Y
- Reflects sample size
- Reflects number of independent variables

Coef. of Determination Excel Output
Excel output annotated with r² adjusted for number of explanatory variables & sample size, and with $S_{YX}$.

Residual Analysis
- Graphical analysis of residuals
  - Plot residuals vs. Xi values
  - Residuals are also called errors: the difference between actual Yi & predicted Yi
- Purposes
  - Examine functional form (linear vs. non-linear model)
  - Evaluate violations of assumptions
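
As a rough illustration of checking functional form, the sketch below fits a straight line to hypothetical data that actually follows Y = X² and lists the residuals against X. The data and function names are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for a simple linear model."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(xs, ys, b0, b1):
    """Difference between actual Yi and predicted Yi."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical curved data (Y = X squared), for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)

for x, e in zip(xs, res):
    print(f"X={x}  residual={e:+.1f}")
```

The residuals run positive, negative, negative, negative, positive: a U-shaped pattern against X, which is the kind of plot that suggests a non-linear term is missing.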

Linear Regression Assumptions
- Normality
  - Y values are normally distributed for each X
  - Probability distribution of error is normal
- Homoscedasticity (constant variance)
- Independence of errors
- Linearity

Residual Plot for Functional Form
Two residual plots: one whose pattern calls for adding an X² term, and one showing a correct specification.

Residual Plot for Homoscedasticity
Two residual plots: heteroscedasticity (fan-shaped) and correct specification. Standardized residuals are typically used.

Residual Plot for Independence
Two residual plots: not independent and correct specification. Plots reflect the sequence in which data were collected.

Residual Analysis Excel Output
The plot shows standardized (student) residuals for each observation. For observation 5, the standardized residual is large. You can save the residuals & do descriptive analysis on them, including a normal probability plot. There are not enough observations here to make further analysis meaningful.

Residual Plot Excel Output

Test of Slope Coefficient
- Tests if there is a linear relationship between X & Y
- Involves population slope β1
- Hypotheses
  - H0: β1 = 0 (No linear relationship)
  - H1: β1 ≠ 0 (Linear relationship)
- Theoretical basis is sampling distribution of slopes

Test of Slope Parameter Solution
- H0: β1 = 0
- H1: β1 ≠ 0
- α = .05
- df = n − 2 = 3
- Decision: Reject at α = .05
- Conclusion: There is evidence of a relationship
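
A sketch of how such a test can be carried out, assuming the usual statistic t = b1 / S_b1 with S_b1 = S_YX / sqrt(Σ(Xi − X̄)²) and n − 2 degrees of freedom. The data are hypothetical (five pairs, so df = 3), and the critical value 3.1824 is the two-tailed t-table value for α = .05 and df = 3.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: population slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))       # standard error of estimate
    s_b1 = s_yx / math.sqrt(sxx)          # standard error of the slope
    return b1 / s_b1, n - 2

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

T_CRITICAL_DF3 = 3.1824  # two-tailed, alpha = .05, df = 3 (from a t table)
t_stat, df = slope_t_statistic(xs, ys)
reject_h0 = abs(t_stat) > T_CRITICAL_DF3
print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_h0}")
```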

Test Statistic Solution

Test of Slope Parameter Excel Output
‘Standard Error’ is the estimated standard deviation of the sampling distribution, $S_{b_P}$. The output columns give $b_P$, $S_{b_P}$, $t = b_P / S_{b_P}$, and the P-value.

Prediction With Regression Models
- Types of predictions
  - Point estimates
  - Interval estimates
- What is predicted
  - Population mean response ($\mu_{YX}$) for given X: a point on the population regression line
  - Individual response ($Y_i$) for given X

What Is Predicted

Factors Affecting Interval Width
- Level of confidence (1 − α): width increases as confidence increases
- Data dispersion ($S_{YX}$): width increases as variation increases
- Sample size: width decreases as sample size increases
- Distance of $X_{given}$ from mean $\bar{X}$: width increases as distance increases
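
These effects follow from the usual interval formulas, assumed here: the half-width for the mean response is t · S_YX · sqrt(1/n + (X_given − X̄)² / Σ(Xi − X̄)²), and the half-width for an individual response adds 1 under the square root. The sketch uses hypothetical data and the t-table value 3.1824 (95% confidence, df = 3).

```python
import math

def interval_half_widths(xs, ys, x_given, t_crit):
    """Return (point estimate, mean-response half-width, individual half-width)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))
    core = 1 / n + (x_given - x_bar) ** 2 / sxx   # grows with distance from X-bar
    mean_hw = t_crit * s_yx * math.sqrt(core)
    indiv_hw = t_crit * s_yx * math.sqrt(1 + core)
    return b0 + b1 * x_given, mean_hw, indiv_hw

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
T_CRIT = 3.1824  # 95% confidence, df = 3 (from a t table)

at_mean = interval_half_widths(xs, ys, 3, T_CRIT)   # X_given equals X-bar
at_edge = interval_half_widths(xs, ys, 5, T_CRIT)   # X_given far from X-bar
for label, (y_hat, mean_hw, indiv_hw) in [("X=3", at_mean), ("X=5", at_edge)]:
    print(f"{label}: Y-hat={y_hat:.2f}  mean +/-{mean_hw:.3f}  individual +/-{indiv_hw:.3f}")
```

The interval is narrowest at X̄ and widens toward the edge of the data, and the individual-response interval is always wider than the mean-response interval.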

Regression Cautions
- Violated assumptions
- Relevancy of historical data: even if interpolating, conditions may have changed
- Level of significance: r² may be high, but at what level?
- Extrapolation: prediction outside the range of X values used to develop the equation
  - Interpolation: prediction within the range of X values used to develop the equation, based on the smallest & largest X values
- Cause & effect: the # of teachers is highly correlated with liquor consumption due to population size!

Extrapolation
- Extrapolation: prediction outside the range of X values used to develop the equation
- Interpolation: prediction within the range of X values used to develop the equation, based on the smallest & largest X values

Cause & Effect
Plot of liquor consumption against # of teachers. The # of teachers is highly correlated with liquor consumption due to population size!

Types of Probabilistic Models

Correlation Models
- Answer ‘How strong is the linear relationship between 2 variables?’
- Coefficient of correlation used
  - Population correlation coefficient denoted ρ (rho)
  - Values range from −1 to +1
  - Measures degree of association
- Used mainly for understanding

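Sample Coefficient of Correlation
Pearson Product-Moment Coefficient of Correlation:

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$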
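
A minimal Python sketch of the Pearson product-moment coefficient, computed from deviations about the means. The data are hypothetical and `pearson_r` is an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """Sample coefficient of correlation between two numerical variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

In simple regression, squaring r gives the coefficient of determination, and r carries the same sign as the sample slope.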

Correlation & Regression Line

Test of Correlation Coefficient
- Shows if there is a linear relationship between 2 numerical variables
- Same conclusion as testing population slope β1
- Hypotheses
  - H0: ρ = 0 (No correlation)
  - H1: ρ ≠ 0 (Correlation)
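
The slides do not show the test statistic; the form usually used for this test, stated here as an assumption, is a t statistic with n − 2 degrees of freedom:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

In simple regression this equals $b_1 / S_{b_1}$ algebraically, which is why the test gives the same conclusion as testing the population slope.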

Conclusion
- Described the linear regression model
- Stated the regression modeling steps
- Explained least squares
- Computed regression coefficients
- Described residual analysis
- Predicted the response variable

Learning Objectives
As a result of this class, you will be able to:
- Explain the linear multiple regression model
- Interpret linear multiple regression computer output
- Explain multicollinearity

Multiple Regression Models

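Linear Multiple Regression Model
Relationship between 1 dependent & 2 or more independent variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_P X_{Pi} + \varepsilon_i$$

- $Y_i$: dependent (response) variable
- $\beta_0$: population Y-intercept
- $\beta_1, \ldots, \beta_P$: population slopes
- $X_{1i}, \ldots, X_{Pi}$: independent (explanatory) variables
- $\varepsilon_i$: random error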

Population Multiple Regression Model
Bivariate model

Sample Multiple Regression Model
Bivariate model

Regression Modeling Steps
1. Define problem or question
2. Specify model
3. Collect data
4. Do descriptive data analysis
5. Estimate unknown parameters
6. Evaluate model
7. Use model for prediction

Parameter Estimation: Linear Multiple Regression Model

Multiple Linear Regression Equations
Too complicated by hand! Ouch!
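
For the two-variable case the equations are still short enough to code directly. The sketch below solves the least squares normal equations in deviation form; the function name and the data are hypothetical, and the data were constructed so that Y = 1 + 2·X1 + 3·X2 exactly, which makes the expected coefficients known in advance.

```python
def fit_two_predictors(x1, x2, y):
    """Return (b0, b1, b2) for Y-hat = b0 + b1*X1 + b2*X2 by least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products about the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2        # zero when X1 and X2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical data built from Y = 1 + 2*X1 + 3*X2, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [9, 8, 19, 18, 29, 28]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

With more than two explanatory variables the same idea is normally expressed in matrix form and left to software.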

Interpretation of Estimated Coefficients
- Slope (bP): estimated Y changes by bP for each 1 unit increase in XP, holding all other variables constant
  - Example: If b1 = 2, then Sales (Y) is expected to increase by 2 for each 1 unit increase in Advertising (X1), given the Number of Sales Reps (X2)
- Y-Intercept (b0): average value of Y when XP = 0

Parameter Estimation Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00). You’ve collected data on three variables: Resp, Size, and Circ.
- Is this model specified correctly?
- What other variables could be used (color, photos etc.)?

Parameter Estimation Excel Output
Excel output; the coefficient column (bP) lists b0, b1, and b2.

Interpretation of Coefficients Solution
- Slope (b1): # Responses to Ad is expected to increase by (20.49) for each 1 sq. in. increase in Ad Size, holding Circulation constant
- Slope (b2): # Responses to Ad is expected to increase by (28.05) for each 1 unit (1,000) increase in Circulation, holding Ad Size constant
- Y-intercept is difficult to interpret. How can you have any responses with no circulation?

Evaluating the Model


Evaluating Multiple Regression Model Steps
- Examine variation measures
- Do residual analysis
- Test parameter significance
  - Overall model
  - Portions of model
  - Individual coefficients
- Test for multicollinearity

Coef. of Determination Excel Output
Excel output annotated with $r^2_{Y.12}$, $r^2_{adj}$, and $S_{YX}$. The value of $r^2_{adj}$ means 95.61% of variation in Y is due to Ad Size & Circulation.

Coefficient of Partial Determination
- Proportion of variation in Y ‘explained’ by variable XP holding all others constant
- Must estimate separate models
- Denoted $r^2_{Y1.2}$ in two X variables case: coefficient of partial determination of X1 with Y holding X2 constant
- Useful in selecting X variables
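
The slides do not give the formula. One common textbook form for the two-variable case, offered here as an assumption, uses the regression sums of squares from the full model and from the model with $X_2$ alone:

$$SSR(X_1 \mid X_2) = SSR(X_1, X_2) - SSR(X_2)$$

$$r^2_{Y1.2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$

This is why separate models must be estimated: $SSR(X_2)$ comes from a regression of Y on $X_2$ only.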

r²Y1.2 Excel Output

| ANOVA | df | SS |
|---|---|---|
| Regression | 2 | 9.2497 |
| Residual | 3 | 0.2503 |
| Total | 5 | 9.5000 |
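
The ANOVA figures above are enough to recompute the overall variation measures. The sketch assumes the usual definitions: r² = SSR / SST, adjusted r² = 1 − (SSE / df residual) / (SST / df total), and S_YX = sqrt(SSE / df residual). Variable names are illustrative.

```python
import math

# Degrees of freedom and sums of squares from the ANOVA output.
df_regression, ss_regression = 2, 9.2497
df_residual, ss_residual = 3, 0.2503
df_total, ss_total = 5, 9.5000

r2 = ss_regression / ss_total
r2_adj = 1 - (ss_residual / df_residual) / (ss_total / df_total)
s_yx = math.sqrt(ss_residual / df_residual)   # standard error of estimate

print(f"r^2 = {r2:.4f}")
print(f"adjusted r^2 = {r2_adj:.4f}")
print(f"S_YX = {s_yx:.4f}")
```

The adjusted value works out to about 95.61%, matching the figure quoted for Ad Size & Circulation.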

Testing Parameters


Testing Overall Significance
- Shows if there is a linear relationship between all X variables together & Y
- Uses F test statistic
- Hypotheses
  - H0: β1 = β2 = ... = βP = 0 (No linear relationship)
  - H1: At least one coefficient is not 0 (At least one X variable affects Y)
- Less chance of error than separate t-tests on each coefficient: doing a series of t-tests leads to a higher overall Type I error than α

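Overall Significance Excel Output
Excel ANOVA output annotated with the degrees of freedom (n − P − 1 for residual, n − 1 for total), the F statistic MSR / MSE, and the P-value.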
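
As a sketch of the F computation, the block below uses the ANOVA figures reported in this transcript for the ad-response model (regression SS 9.2497 with 2 df, residual SS 0.2503 with 3 df). The critical value 9.55 is the F-table value for α = .05 with 2 and 3 degrees of freedom; variable names are illustrative.

```python
# ANOVA figures for the two-variable ad-response model.
P = 2                      # number of explanatory variables
n = 6                      # observations, since total df = n - 1 = 5
ss_regression = 9.2497
ss_residual = 0.2503

msr = ss_regression / P              # mean square regression
mse = ss_residual / (n - P - 1)      # mean square error
f_stat = msr / mse

F_CRITICAL = 9.55  # alpha = .05, df = (2, 3), from an F table
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```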

Testing Model Portions
- Examines the contribution of a set of X variables to the relationship with Y
- Null hypothesis: variables in set do not significantly improve the model when all other variables are included
- Must estimate separate models
- Used in selecting X variables

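Testing Model Portions Test Statistic
Test H0: β1 = 0 in a 2 variable model:

$$F = \frac{SSR(X_1, X_2) - SSR(X_2)}{MSE(X_1, X_2)}$$

- $SSR(X_1, X_2)$ and $MSE(X_1, X_2)$: from ANOVA section of regression for the model with both $X_1$ and $X_2$
- $SSR(X_2)$: from ANOVA section of regression for the model with $X_2$ only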

Multicollinearity


Multicollinearity
- High correlation between X variables
- Coefficients measure combined effect
- Leads to unstable coefficients depending on X variables in model
- Always exists; matter of degree
- Example: Using both Sales & Profit as explanatory variables in same model

Detecting Multicollinearity
- Examine correlation matrix: correlations between pairs of X variables are higher than with the Y variable
- Examine variance inflation factor (VIF): if VIFj > 5, multicollinearity exists
- Few remedies
  - Obtain new sample data
  - Eliminate one correlated X variable
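
A sketch of the VIF check for the two-variable case, assuming the standard definition VIFj = 1 / (1 − Rj²), where Rj² comes from regressing Xj on the other X variables. With only two X variables, Rj² is the squared correlation between X1 and X2. The data and function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation between two variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either X in a two-variable model: 1 / (1 - r12 squared)."""
    r12 = pearson_r(x1, x2)
    return 1 / (1 - r12 ** 2)

# Hypothetical explanatory variables, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
vif = vif_two_predictors(x1, x2)
print(f"r12 = {pearson_r(x1, x2):.4f}, VIF = {vif:.4f}, exceeds 5: {vif > 5}")
```

Here the X variables are clearly correlated (r12 is about .83), yet the VIF stays below the threshold of 5, which illustrates that multicollinearity is a matter of degree.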

Correlation Matrix Excel Output
Excel correlation matrix annotated with $r_{Y1}$, $r_{Y2}$, and $r_{12}$.

VIF Excel Output
Regress X1 on X2.

This Class...
Please take a moment to answer the following questions in writing:
- What was the most important thing you learned in class today?
- What do you still have questions about?
- How can today’s class be improved?