Regression Analysis and Multiple Regression


1 Regression Analysis and Multiple Regression
Session 7

2 Simple Linear Regression Model
Using Statistics
The Simple Linear Regression Model
Estimation: The Method of Least Squares
Error Variance and the Standard Errors of Regression Estimators
Correlation
Hypothesis Tests about the Regression Relationship
How Good is the Regression?
Analysis of Variance Table and an F Test of the Regression Model
Residual Analysis and Checking for Model Inadequacies
Use of the Regression Model for Prediction
Using the Computer
Summary and Review of Terms

3 7-1 Using Statistics
[Scatterplot of Advertising Expenditures (X) vs. Sales (Y)]
This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:
Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
The scatter of points tends to be distributed around a positively sloped straight line.
The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
The scatter plot reveals a more or less strong tendency rather than a precise linear relationship. The line represents the nature of the relationship on average.

4 Examples of Other Scatterplots

5 Model Building
The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship. A statistical model separates the systematic component of a relationship from the random component.
Data → Statistical model: Systematic component + Random errors
In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE). In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

6 7-2 The Simple Linear Regression Model
The population simple linear regression model: Y = β0 + β1X + ε, where β0 + β1X is the nonrandom (systematic) component and ε is the random component.
Y is the dependent variable, the variable we wish to explain or predict; X is the independent variable, also called the predictor variable; and ε is the error term, the only random component in the model, and thus the only source of randomness in Y.
β0 is the intercept of the systematic component of the regression relationship.
β1 is the slope of the systematic component.
The conditional mean of Y: E[Y|X] = β0 + β1X.

7 Picturing the Simple Linear Regression Model
X Y E[Y]=0 + 1 X Xi } 1 = Slope 1 0 = Intercept Yi { Error: i Regression Plot The simple linear regression model posits an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable: E[Yi]=0 + 1 Xi Actual observed values of Y differ from the expected value by an unexplained or random error: Yi = E[Yi] + i = 0 + 1 Xi + i

8 Assumptions of the Simple Linear Regression Model
X Y E[Y]=0 + 1 X Assumptions of the Simple Linear Regression Model Identical normal distributions of errors, all centered on the regression line. The relationship between X and Y is a straight-line relationship. The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term i. The errors i are normally distributed with mean 0 and variance 2. The errors are uncorrelated (not related) in successive observations. That is: ~ N(0,2)

9 7-3 Estimation: The Method of Least Squares
Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line. The estimated regression equation: Y = b0 + b1X + e, where b0 estimates the intercept of the population regression line, β0; b1 estimates the slope of the population regression line, β1; and e stands for the observed errors - the residuals from fitting the estimated regression line b0 + b1X to a set of n points.

10 Fitting a Regression Line
[Three panels: the data; three errors from a fitted line; three errors from the least squares regression line, where the errors from the least squares regression line are minimized]

11 Errors in Regression
[Figure: the vertical distance between an observed point and the regression line is the error for that observation]

12 Least Squares Regression
[Figure: the surface of SSE as a function of the candidate intercept b0 and slope b1; the least squares b0 and least squares b1 are the values that minimize SSE]

13 Sums of Squares, Cross Products, and Least Squares Estimators
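These estimators can be illustrated with a short calculation. Below is a minimal sketch (Python with NumPy, made-up data rather than the chapter's example) that forms the sums of squares and cross products and then computes b1 = SS_XY / SS_X and b0 = ȳ − b1·x̄:

import numpy as np

# Hypothetical data: advertising expenditures (x) and sales (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 3.1, 4.2, 5.9, 7.1])

ss_x = np.sum((x - x.mean()) ** 2)                 # SS_X
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))    # SS_XY

b1 = ss_xy / ss_x                # least-squares slope
b0 = y.mean() - b1 * x.mean()    # least-squares intercept
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")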

14 Example 7-1
[Data table with columns: Miles, Dollars, Miles², Miles × Dollars]

15 Example 7-1: Using the Computer
MTB > Regress 'Dollars' 1 'Miles';
SUBC> Constant.
Regression Analysis
The regression equation is Dollars = Miles
Predictor    Coef    Stdev    t-ratio    p
Constant
Miles
s =    R-sq = 96.5%    R-sq(adj) = 96.4%
Analysis of Variance
SOURCE    DF    SS    MS    F    p
Regression
Error
Total
[Fitted-line plot of Dollars vs. Miles with the regression equation and R-Sq]

16 Example 7-1: Using Computer-Excel
The results shown are the output created by selecting the REGRESSION option from the DATA ANALYSIS toolkit in Excel.

17 Example 7-1: Using Computer-Excel
Residual Analysis. The plot shows the absence of a relationship between the residuals and the X-values (miles).
[Plot: Residuals vs. Miles]

18 Total Variance and Error Variance
[Two views of the data: what you see when looking at the total variation of Y, and what you see when looking along the regression line at the error variance of Y]

19 7-4 Error Variance and the Standard Errors of Regression Estimators
Square and sum all regression errors to find SSE.

20 Standard Errors of Estimates in Regression
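For reference, the standard expressions for the error variance and the standard errors of the regression estimators (general textbook formulas, not specific to this example) are:

$$ s^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n-2}, \qquad s(b_1) = \frac{s}{\sqrt{SS_X}}, \qquad s(b_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_X}} $$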

21 Confidence Intervals for the Regression Parameters
[Figure: the slope is the height of the line over a run of length 1]
Least-squares point estimate: b1 =
Upper 95% bound on slope:
Lower 95% bound:
(not a possible value of the regression slope at 95%)
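As a small illustration of how such an interval is formed (a sketch with hypothetical values, not the numbers from this example), the 95% confidence interval for the slope is b1 ± t(0.025, n−2) · s(b1):

import numpy as np
from scipy import stats

b1 = 1.25      # hypothetical estimated slope
se_b1 = 0.04   # hypothetical standard error of the slope
n = 25         # hypothetical sample size

t_crit = stats.t.ppf(0.975, df=n - 2)           # two-sided 95% critical value
lower, upper = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")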

22 7-5 Correlation
The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by ρ, can take on any value from -1 to 1.
ρ = -1 indicates a perfect negative linear relationship
-1 < ρ < 0 indicates a negative linear relationship
ρ = 0 indicates no linear relationship
0 < ρ < 1 indicates a positive linear relationship
ρ = 1 indicates a perfect positive linear relationship
The absolute value of ρ indicates the strength or exactness of the relationship.

23 Illustrations of Correlation
Y X =-1 Y X =0 Y X =1 Y X =-.8 Y X =0 Y X =.8

24 Covariance and Correlation
Example 10-1: r = SS_XY / √(SS_X · SS_Y)
*Note: If ρ < 0, then b1 < 0; if ρ = 0, then b1 = 0; if ρ > 0, then b1 > 0.

25 Example 7-2: Using Computer-Excel

26 Example 7-2: Regression Plot
[Regression plot: International vs. United States, with the fitted line Y = b0 + b1X and R-Sq shown]

27 Hypothesis Tests for the Correlation Coefficient
H0: =0 (No linear relationship) H1: 0 (Some linear relationship) Test Statistic:

28 Hypothesis Tests about the Regression Relationship
[Three panels where no linear relationship exists: Y constant in X; unsystematic variation; a nonlinear relationship]
A hypothesis test for the existence of a linear relationship between X and Y:
H0: β1 = 0
H1: β1 ≠ 0
Test statistic for the existence of a linear relationship between X and Y: t(n-2) = b1 / s(b1),
where b1 is the least squares estimate of the regression slope and s(b1) is the standard error of b1. When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.

29 Hypothesis Tests for the Regression Slope
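A compact way to carry out the slope test in practice is to fit the regression and read off the t-ratio and p-value for the slope. A minimal sketch (Python with statsmodels, synthetic data rather than the textbook example):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=40)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=40)   # synthetic linear data

X = sm.add_constant(x)            # adds the intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)                 # b0, b1
print(fit.tvalues, fit.pvalues)   # t-ratios and two-sided p-values for b0 and b1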

30 7-7 How Good is the Regression?
The coefficient of determination, r², is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.
[Figure: for a single point, Total Deviation = Explained Deviation + Unexplained Deviation]
r² is the percentage of total variation explained by the regression: r² = SSR/SST = 1 - SSE/SST.

31 The Coefficient of Determination
[Three panels: r² = 0 (SSE equals SST), r² = 0.50 (SSR and SSE split SST), and r² = 0.90 (SSR dominates); plus the fitted-line plot of Dollars vs. Miles]

32 7-8 Analysis of Variance and an F Test of the Regression Model
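For reference, the standard decomposition and test statistic behind this F test in simple regression are:

$$ \mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}, \qquad F_{(1,\,n-2)} = \frac{\mathrm{MSR}}{\mathrm{MSE}} = \frac{\mathrm{SSR}/1}{\mathrm{SSE}/(n-2)} $$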

33 7-9 Residual Analysis and Checking for Model Inadequacies
[Four residual-plot patterns:]
Homoscedasticity: residuals appear completely random; no indication of model inadequacy.
Heteroscedasticity: the variance of the residuals changes as x changes.
A curved pattern in the residuals, resulting from an underlying nonlinear relationship.
Residuals exhibit a linear trend with time.
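A minimal sketch of producing the basic residual check described above (Python; synthetic data, residuals plotted against fitted values):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 3 + 2 * x + rng.normal(scale=1.5, size=50)   # synthetic linear data

b1, b0 = np.polyfit(x, y, deg=1)   # simple least-squares fit (slope first, then intercept)
fitted = b0 + b1 * x
resid = y - fitted

plt.scatter(fitted, resid)         # should look patternless if the model is adequate
plt.axhline(0, color="gray")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()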

34 7-10 Use of the Regression Model for Prediction
Point Prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X into the estimated regression equation.
Prediction Interval:
For a value of Y given a value of X - reflects both the variation in the regression line estimate and the variation of points around the regression line.
For an average value of Y given a value of X - reflects the variation in the regression line estimate.
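A minimal sketch of producing both kinds of interval in practice (Python with statsmodels, synthetic Miles/Dollars-style data; the confidence-interval columns correspond to E[Y|X] and the observation-interval columns to an individual Y):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
miles = rng.uniform(1000, 6000, 25)
dollars = 250 + 1.2 * miles + rng.normal(scale=300, size=25)   # synthetic data

fit = sm.OLS(dollars, sm.add_constant(miles)).fit()

x_new = np.array([[1.0, 4000.0]])                 # [constant, Miles = 4000]
frame = fit.get_prediction(x_new).summary_frame(alpha=0.05)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper",   # 95% C.I. for E[Y|X]
             "obs_ci_lower", "obs_ci_upper"]])            # 95% P.I. for an individual Y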

35 Errors in Predicting E[Y|X]
Two sources of error in predicting E[Y|X]:
1) Uncertainty about the slope of the regression line (upper and lower limits on the slope).
2) Uncertainty about the intercept of the regression line (upper and lower limits on the intercept).

36 Prediction Interval for E[Y|X]
The prediction band for E[Y|X] is narrowest at the mean value of X. The prediction band widens as the distance from the mean of X increases. Predictions become very unreliable when we extrapolate beyond the range of the sample itself.
[Figure: prediction band for E[Y|X] around the regression line]

37 Additional Error in Predicting Individual Value of Y
3) Variation around the regression line.
[Figure: the prediction band for an individual value of Y is wider than the prediction band for E[Y|X]; both are centered on the regression line]

38 Prediction Interval for a Value of Y

39 Prediction Interval for the Average Value of Y
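For reference, the standard formulas for these two intervals at a given value x, with ŷ = b0 + b1x the point prediction (general textbook results), are:

$$ \text{Individual } Y: \quad \hat{y} \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x-\bar{x})^2}{SS_X}} $$
$$ \text{Average } Y \; (E[Y|X]): \quad \hat{y} \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x-\bar{x})^2}{SS_X}} $$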

40 Using the Computer
MTB > regress 'Dollars' 1 'Miles' tres in C3 fits in C4;
SUBC> predict 4000;
SUBC> residuals in C5.
Regression Analysis
The regression equation is Dollars = Miles
Predictor    Coef    Stdev    t-ratio    p
Constant
Miles
s =    R-sq = 96.5%    R-sq(adj) = 96.4%
Analysis of Variance
SOURCE    DF    SS    MS    F    p
Regression
Error
Total
Fit    Stdev.Fit    % C.I.    % P.I.
( , )    ( , )

41 Plotting on the Computer (1)
MTB > PLOT 'Resids' * 'Fits'
MTB > PLOT 'Resids' * 'Miles'
[Plots of the residuals against the fitted values and against Miles]

42 Plotting on the Computer (2)
MTB > HISTOGRAM 'StRes'
MTB > PLOT 'Dollars' * 'Miles'
[Histogram of the standardized residuals and a plot of Dollars against Miles]

43 Multiple Regression (1)
Using Statistics.
The k-Variable Multiple Regression Model.
The F Test of a Multiple Regression Model.
How Good is the Regression.
Tests of the Significance of Individual Regression Parameters.
Testing the Validity of the Regression Model.
Using the Multiple Regression Model for Prediction.

44 Multiple Regression (1)
Qualitative Independent Variables.
Polynomial Regression.
Nonlinear Models and Transformations.
Multicollinearity.
Residual Autocorrelation and the Durbin-Watson Test.
Partial F Tests and Variable Selection Methods.
Using the Computer.
The Matrix Approach to Multiple Regression Analysis.
Summary and Review of Terms.

45 7-11 Using Statistics
Lines: any two points (A and B), or an intercept and slope (β0 and β1), define a line on a two-dimensional surface. Slope: β1. Intercept: β0.
Planes: any three points (A, B, and C), or an intercept and coefficients of x1 and x2 (β0, β1, and β2), define a plane in three-dimensional space.

46 7-12 The k-Variable Multiple Regression Model
The population regression model of a dependent variable, Y, on a set of k independent variables, X1, X2, . . . , Xk is given by:
Y = β0 + β1X1 + β2X2 + . . . + βkXk + ε
where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, ..., k, is the slope of the regression surface - sometimes called the response surface - with respect to Xi.
Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xi are uncorrelated with the error term.

47 Simple and Multiple Least-Squares Regression
In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line. In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane.

48 The Estimated Regression Relationship
The estimated regression relationship: ŷ = b0 + b1x1 + b2x2 + . . . + bkxk
where ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms b0, b1, ..., bk are the least-squares estimates of the population regression parameters βi. The actual, observed value of Y is the predicted value plus an error: y = b0 + b1x1 + b2x2 + . . . + bkxk + e

49 Least-Squares Estimation: The 2-Variable Normal Equations
Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1, and b2 yields the following normal equations:
Σy = nb0 + b1Σx1 + b2Σx2
Σx1y = b0Σx1 + b1Σx1² + b2Σx1x2
Σx2y = b0Σx2 + b1Σx1x2 + b2Σx2²

50 Example 7-3 Normal Equations: 743 = 10b0+123b1+65b2
[Data table with columns: Y, X1, X2, X1X2, X1², X2², X1Y, X2Y]
Normal Equations:
743 = 10b0 + 123b1 + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0 + 869b1 + 509b2
b0 =     b1 =     b2 =
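These three normal equations form a linear system that can be solved directly. A minimal sketch (Python with NumPy) of solving the system shown on this slide:

import numpy as np

# Normal equations from Example 7-3, written as A @ b = c
A = np.array([[10.0,  123.0,  65.0],
              [123.0, 1615.0, 869.0],
              [65.0,  869.0,  509.0]])
c = np.array([743.0, 9382.0, 5040.0])

b0, b1, b2 = np.linalg.solve(A, c)   # approximately b0 ≈ 47.2, b1 ≈ 1.60, b2 ≈ 1.15
print(b0, b1, b2)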

51 Example 7-3: Using the Computer
Excel Output

52 Decomposition of the Total Deviation in a Multiple Regression Model
[Figure: a point above the estimated regression plane]
Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE

53 7-13 The F Test of a Multiple Regression Model
A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:
H0: β1 = β2 = ... = βk = 0
H1: Not all the βi (i = 1, 2, ..., k) are 0
Test statistic: F(k, n-(k+1)) = MSR/MSE.

54 Using the Computer: Analysis of Variance Table (Example 7-3)
SOURCE    DF    SS    MS    F    p
Regression
Error
Total
[F distribution with 2 and 7 degrees of freedom; α = 0.01, critical point F0.01 = 9.55, test statistic F = 86.34]
The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we conclude that the dependent variable is related to one or more of the independent variables.

55 7-14 How Good is the Regression
[Figure: y plotted against x1 and x2 with the estimated regression plane]

56 Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination
[Decomposition: SST = SSR + SSE]
Example 11-1: s =     R-sq = 96.1%     R-sq(adj) = 95.0%
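For reference, the standard definitions of the coefficient of determination and the adjusted coefficient of determination (which penalizes for the number of predictors k) are:

$$ R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad \bar{R}^2 = 1 - \frac{\mathrm{SSE}/(n-(k+1))}{\mathrm{SST}/(n-1)} $$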

57 Measures of Performance in Multiple Regression and the ANOVA Table

58 7-15 Tests of the Significance of Individual Regression Parameters
Hypothesis tests about individual regression slope parameters:
(1) H0: β1 = 0    H1: β1 ≠ 0
(2) H0: β2 = 0    H1: β2 ≠ 0
...
(k) H0: βk = 0    H1: βk ≠ 0
Test statistic for each: t(n-(k+1)) = bi / s(bi).

59 Regression Results for Individual Parameters

60 Example 7-3: Using the Computer
MTB > regress 'Y' on 2 predictors 'X1' 'X2' Regression Analysis The regression equation is Y = X X2 Predictor Coef Stdev t-ratio p Constant X X s = R-sq = 96.1% R-sq(adj) = 95.0% Analysis of Variance SOURCE DF SS MS F p Regression Error Total SOURCE DF SEQ SS X X

61 Using the Computer: Example 7-4
MTB > READ ‘a:\data\c11_t6.dat’ C1-C5 MTB > NAME c1 'EXPORTS' c2 'M1' c3 'LEND' c4 'PRICE' C5 'EXCHANGE' MTB > REGRESS 'EXPORTS' on 4 predictors 'M1' 'LEND' 'PRICE' 'EXCHANGE' Regression Analysis The regression equation is EXPORTS = M LEND PRICE EXCHANGE Predictor Coef Stdev t-ratio p Constant M LEND PRICE EXCHANGE s = R-sq = 82.5% R-sq(adj) = 81.4% Analysis of Variance SOURCE DF SS MS F p Regression Error Total

62 Example 7-5: Three Predictors
MTB > REGRESS 'EXPORTS' on 3 predictors 'LEND' 'PRICE' 'EXCHANGE' Regression Analysis The regression equation is EXPORTS = LEND PRICE EXCHANGE Predictor Coef Stdev t-ratio p Constant LEND PRICE EXCHANGE s = R-sq = 73.1% R-sq(adj) = 71.8% Analysis of Variance SOURCE DF SS MS F p Regression Error Total

63 Example 7-5: Two Predictors
MTB > REGRESS 'EXPORTS' on 2 predictors 'M1' 'PRICE' Regression Analysis The regression equation is EXPORTS = M PRICE Predictor Coef Stdev t-ratio p Constant M PRICE s = R-sq = 82.5% R-sq(adj) = 81.9% Analysis of Variance SOURCE DF SS MS F p Regression Error Total

64 7-16 Investigating the Validity of the Regression Model: Residual Plots
[Residuals plotted against M1 (apparently random); residuals plotted against PRICE (apparent heteroscedasticity)]

65 Investigating the Validity of the Regression: Residual Plots (2)
[Residuals plotted against time (apparently random); residuals plotted against the fitted values Y-HAT (apparent heteroscedasticity)]

66 Histogram of Standardized Residuals: Example 7-6
MTB > Histogram 'SRES1'.
Histogram of SRES1    N = 67
Midpoint    Count
*
*
***
*
*****
*************
*******************
************
******
***
**
*

67 Investigating the Validity of the Regression: Outliers and Influential Observations
Outliers: [Figure: the regression line with the outlier included versus the regression line without the outlier]
Influential Observations: [Figure: a point with a large value of xi pulls the regression line toward it even though there is no relationship in the main cluster of data]

68 Outliers and Influential Observations: Example 7-6
Unusual Observations
Obs    M1    EXPORTS    Fit    Stdev.Fit    Residual    St.Resid
[Flagged observations: several marked X, several marked R]
R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

69 7-17 Using the Multiple Regression Model for Prediction
[Figure: estimated regression plane for Example 11-1, with Sales plotted against Advertising and Promotions]

70 Prediction in Multiple Regression
MTB > regress 'EXPORTS' 2 'M1' 'PRICE'; SUBC> predict 6 160; SUBC> predict 5 150; SUBC> predict Fit Stdev.Fit % C.I % P.I. ( , ) ( , ) ( , ) ( , ) ( , ) ( , )

71 7-18 Qualitative (or Categorical) Independent Variables (in Regression)
[Data table with columns: MOVIE, EARN, COST, PROM, BOOK]
MTB > regress 'EARN' 'COST' 'PROM' 'BOOK'
Regression Analysis
The regression equation is EARN = COST PROM BOOK
Predictor    Coef    Stdev    t-ratio    p
Constant
COST
PROM
BOOK
s =    R-sq = 96.7%    R-sq(adj) = 96.0%
Analysis of Variance
SOURCE    DF    SS    MS    F    p
Regression
Error
Total

72 Picturing Qualitative Variables in Regression
A regression with one quantitative variable (X1) and one qualitative variable (X2): [Figure: two parallel lines - the line for X2 = 0 with intercept b0 and the line for X2 = 1 with intercept b0 + b2]
A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3): [Figure: two parallel regression planes separated by b3]

73 Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables
A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3): [Figure: three parallel lines - the line for X2 = 0 and X3 = 0 with intercept b0, the line for X2 = 1 and X3 = 0 with intercept b0 + b2, and the line for X2 = 0 and X3 = 1 with intercept b0 + b3]
A qualitative variable with r levels or categories is represented with (r-1) 0/1 (dummy) variables, as in the sketch below.
Category    X2    X3
Adventure
Drama
Romance
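A minimal sketch of building such dummy variables (Python with pandas; the column names and data are hypothetical, not the text's movie example):

import pandas as pd

# Hypothetical data: one quantitative predictor and a 3-level qualitative predictor
df = pd.DataFrame({
    "cost": [4.2, 6.0, 5.5, 7.1, 3.9, 6.4],
    "genre": ["Adventure", "Drama", "Romance", "Drama", "Adventure", "Romance"],
})

# r = 3 categories -> r - 1 = 2 dummy variables; drop_first uses one category as the baseline
dummies = pd.get_dummies(df["genre"], prefix="genre", drop_first=True)
X = pd.concat([df[["cost"]], dummies], axis=1)
print(X)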

74 Using Qualitative Variables in Regression: Example 7-6
Salary = b0 + b1·Education + b2·Experience + b3·Gender
(SE)    (32.6)    (45.1)    (78.5)    (212.4)
(t)     (262.2)   (21.0)    (16.0)    (-15.3)
On average, female salaries are $3,256 below male salaries.

75 Interactions between Quantitative and Qualitative Variables: Shifting Slopes
A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2): [Figure: the line for X2 = 0 has intercept b0 and slope b1; the line for X2 = 1 has intercept b0 + b2 and slope b1 + b3]

76 7-19 Polynomial Regression
One-variable polynomial regression model:
Y = β0 + β1X + β2X² + β3X³ + . . . + βmX^m + ε
where m is the degree of the polynomial - the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.
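A minimal sketch of fitting a second-order (quadratic) polynomial model (Python with statsmodels, synthetic data rather than the chapter's sales/advertising example):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 60)
y = 5 + 2.0 * x - 0.15 * x**2 + rng.normal(scale=1.0, size=60)   # synthetic curved data

X = sm.add_constant(np.column_stack([x, x**2]))   # columns: constant, X, X^2
fit = sm.OLS(y, X).fit()
print(fit.params)   # b0, b1, b2 for the quadratic model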

77 Polynomial Regression: Example 7-7
MTB > regress 'sales' 'advert' 'advsqr'
Regression Analysis
The regression equation is SALES = ADVERT ADVSQR
Predictor    Coef    Stdev    t-ratio    p
Constant
ADVERT
ADVSQR
s =    R-sq = 95.9%    R-sq(adj) = 95.4%
Analysis of Variance
SOURCE    DF    SS    MS    F    p
Regression
Error
Total
[Scatterplot of SALES against ADVERT with the fitted quadratic curve]

78 Polynomial Regression: Other Variables and Cross-Product Terms
Variable Estimate Standard Error T-statistic X X X X X1X

79 7-20 Nonlinear Models and Transformations: Multiplicative Model
MTB > loge c1 c3 MTB > loge c2 c4 MTB > name c3 'LOGSALE' c4 'LOGADV' MTB > regress 'logsale' 'logadv' Regression Analysis The regression equation is LOGSALE = LOGADV Predictor Coef Stdev t-ratio p Constant LOGADV s = R-sq = 94.7% R-sq(adj) = 94.4% Analysis of Variance SOURCE DF SS MS F p Regression Error Total

80 Transformations: Exponential Model
MTB > regress 'sales' 1 'logadv' Regression Analysis The regression equation is SALES = LOGADV Predictor Coef Stdev t-ratio p Constant LOGADV s = R-sq = 97.8% R-sq(adj) = 97.6% Analysis of Variance SOURCE DF SS MS F p Regression Error Total

81 Plots of Transformed Variables
[Plots of SALES against ADVERT and against LOG(ADVERT), with the fitted simple regressions and R-sq values]

82 Variance Stabilizing Transformations
Square root transformation: Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y. Logarithmic transformation: Useful when the variance of regression errors is approximately proportional to the square of the conditional mean of Y. Reciprocal transformation: Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.

83 Regression with Dependent Indicator Variables
[Figure: the S-shaped logistic function, bounded between 0 and 1]
The logistic function: E[Y|X] = e^(β0 + β1X) / (1 + e^(β0 + β1X))
Transformation to linearize the logistic function (the logit): ln[p / (1 - p)] = β0 + β1X, where p = E[Y|X].

84 7-21 Multicollinearity
Orthogonal X variables provide information from independent sources. No multicollinearity.
Perfectly collinear X variables provide identical information content. No regression.
Some degree of collinearity. Problems with regression depend on the degree of collinearity.
A high degree of negative collinearity also causes problems with regression.

85 Effects of Multicollinearity
Variances of regression coefficients are inflated.
Magnitudes of regression coefficients may be different from what is expected.
Signs of regression coefficients may not be as expected.
Adding or removing variables produces large changes in coefficients.
Removing a data point may cause large changes in coefficient estimates or signs.
In some cases, the F ratio may be significant while the t ratios are not.

86 Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors MTB > CORRELATION 'm1' 'lend’ 'price’ 'exchange' Correlations (Pearson) M LEND PRICE LEND PRICE EXCHANGE MTB > regress 'exports' on 4 predictors 'm1’ 'lend’ 'price’ 'exchange'; SUBC> vif. Regression Analysis The regression equation is EXPORTS = M LEND PRICE EXCHANGE Predictor Coef Stdev t-ratio p VIF Constant M LEND PRICE EXCHANGE s = R-sq = 82.5% R-sq(adj) = 81.4%

87 Variance Inflation Factor
The variance inflation factor for Xh: VIF(Xh) = 1 / (1 - Rh²), where Rh² is the R² of the regression of Xh on the other independent variables.
[Figure: VIF rises sharply toward infinity as Rh² approaches 1]
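A minimal sketch of computing VIFs in practice (Python with statsmodels; synthetic data with two deliberately collinear predictors):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=100)   # highly collinear with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["x1", "x2", "x3"], start=1):   # column 0 is the constant
    print(name, variance_inflation_factor(X, i))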

88 Solutions to the Multicollinearity Problem
Drop a collinear variable from the regression.
Change the sampling plan to include elements outside the multicollinearity range.
Transform the variables.
Use ridge regression.

89 7-22 Residual Autocorrelation and the Durbin-Watson Test
An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.
[Table: the residuals and the residuals lagged one, two, three, and four periods back]
The Durbin-Watson test (first-order autocorrelation):
H0: ρ1 = 0
H1: ρ1 ≠ 0
The Durbin-Watson test statistic: d = Σi=2..n (ei - ei-1)² / Σi=1..n ei²
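A minimal sketch of computing the statistic on a residual series (Python with statsmodels; synthetic residuals constructed with positive first-order autocorrelation):

import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
e = rng.normal(size=80)
resid = e + 0.6 * np.concatenate(([0.0], e[:-1]))   # residuals with positive autocorrelation

d = durbin_watson(resid)   # near 2: no autocorrelation; near 0: positive; near 4: negative
print(f"Durbin-Watson d = {d:.2f}")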

90 Critical Points of the Durbin-Watson Statistic
α = 0.05; n = sample size, k = number of independent variables
[Table: for each n, the lower and upper critical points dL and dU are given for k = 1, 2, 3, 4, and 5]

91 Using the Durbin-Watson Statistic
MTB > regress 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> dw.
Durbin-Watson statistic = 2.58
[Decision regions: Positive Autocorrelation (0 to dL) | Test is Inconclusive (dL to dU) | No Autocorrelation (dU to 4-dU) | Test is Inconclusive (4-dU to 4-dL) | Negative Autocorrelation (4-dL to 4)]
For n = 67, k = 4: 4 - dU = 2.27 and 4 - dL = 2.53 < 2.58, so H0 is rejected, and we conclude there is negative first-order autocorrelation.

92 7-23 Partial F Tests and Variable Selection Methods
Full model: Y = 0 + 1 X1 + 2 X2 + 3 X3 + 4 X4 +  Reduced model: Y = 0 + 1 X1 + 2 X2 +  Partial F test: H0: 3 = 4 = 0 H1: 3 and 4 not both 0 Partial F statistic: where SSER is the sum of squared errors of the reduced model, SSEF is the sum of squared errors of the full model; MSEF is the mean square error of the full model [MSEF = SSEF/(n-(k+1))]; r is the number of variables dropped from the full model.

93 Variable Selection Methods
All possible regressions: run regressions with all possible combinations of independent variables and select the best model.
Stepwise procedures:
Forward selection - add one variable at a time to the model, on the basis of its F statistic.
Backward elimination - remove one variable at a time, on the basis of its F statistic.
Stepwise regression - adds variables to the model and subtracts variables from the model, on the basis of the F statistic.

94 Stepwise Regression
[Flowchart:]
1. Compute the F statistic for each variable not in the model.
2. Is there at least one variable with p-value < Pin? If not, stop.
3. Enter the most significant (smallest p-value) variable into the model.
4. Calculate the partial F for all variables in the model.
5. Is there a variable with p-value > Pout? If yes, remove that variable.
6. Return to step 1.

95 Stepwise Regression: Using the Computer
MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1’ 'LEND' 'PRICE’ 'EXCHANGE' Stepwise Regression F-to-Enter: F-to-Remove: Response is EXPORTS on 4 predictors, with N = 67 Step Constant M T-Ratio PRICE T-Ratio S R-Sq

96 Using the Computer: MINITAB
MTB > REGRESS 'EXPORTS’ 'M1’ 'LEND’ 'PRICE' 'EXCHANGE'; SUBC> vif; SUBC> dw. Regression Analysis The regression equation is EXPORTS = M LEND PRICE EXCHANGE Predictor Coef Stdev t-ratio p VIF Constant M LEND PRICE EXCHANGE s = R-sq = 82.5% R-sq(adj) = 81.4% Analysis of Variance SOURCE DF SS MS F p Regression Error Total Durbin-Watson statistic = 2.58

97 Using the Computer: SAS
data exports;
  infile 'c:\aczel\data\c11_t6.dat';
  input exports m1 lend price exchange;
proc reg data = exports;
  model exports = m1 lend price exchange / dw vif;
run;
Model: MODEL1
Dependent Variable: EXPORTS
Analysis of Variance
Source    DF    Sum of Squares    Mean Square    F Value    Prob>F
Model
Error
C Total
Root MSE        R-square
Dep Mean        Adj R-sq
C.V.

98 Using the Computer: SAS (continued)
Parameter Estimates Parameter Standard T for H0: Variable DF Estimate Error Parameter=0 Prob > |T| INTERCEP M LEND PRICE EXCHANGE Variance Variable DF Inflation INTERCEP M LEND PRICE EXCHANGE Durbin-Watson D (For Number of Obs.) 1st Order Autocorrelation

99 The Matrix Approach to Regression Analysis (1)
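In matrix notation, the standard statement of the multiple regression model and its least-squares solution (general textbook result) is:

$$ \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \mathbf{b} = (X'X)^{-1}X'\mathbf{y} $$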

100 The Matrix Approach to Regression Analysis (2)

