
1 Regression II
Dr. Rahim Mahmoudvand, Department of Statistics, Bu-Ali Sina University, Fall 2014

2 Chapter 4: Model Adequacy Checking

3 Ch4: Partial Regression Plot: Definition and Usage
Is a curvature effect for the regressor needed in the model? A partial regression plot is a variation of the plot of residuals versus the regressor. It evaluates whether we have specified the relationship between the response and the regressor variables correctly, and it studies the marginal relationship of a regressor given the other variables that are in the model. The partial regression plot is also called the added-variable plot or the adjusted-variable plot.

4 Ch4: Partial Regression Plot: How it works, with an example
In this plot, the response variable y and the regressor xj are each regressed against the other regressors in the model and the residuals are obtained for each regression. Plotting these two sets of residuals against each other provides information about the nature of the marginal relationship for the regressor xj under consideration. Example with two regressors: y is regressed on x2, and x1 is regressed on x2; the residuals from the first regression are then plotted against the residuals from the second.
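A minimal numpy sketch of this added-variable construction (the data and variable names are illustrative, not from the slides):

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals from regressing y on X (a column of ones is appended for the intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
x2 = rng.normal(size=50)
x1 = 0.5 * x2 + rng.normal(size=50)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=50)

# Added-variable (partial regression) plot for x1 given x2:
# residuals of y on x2 versus residuals of x1 on x2.
e_y = ols_residuals(y, x2)     # y adjusted for x2
e_x1 = ols_residuals(x1, x2)   # x1 adjusted for x2

# The slope of e_y on e_x1 equals the multiple-regression coefficient of x1.
slope = np.polyfit(e_x1, e_y, 1)[0]
print(slope)
```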

5 Ch4: Partial Regression Plot: Interpretation of the plot
If regressor x1 enters the model linearly, the plot is a straight line through the origin whose slope equals the regression coefficient of x1 in the full model. Curvature in the plot suggests that a higher-order term in x1, such as x1^2, or a transformation such as replacing x1 with 1/x1, is required. A horizontal band of points indicates that there is no additional useful information in x1 for predicting y.

6 Ch4: Partial Regression Plot: Relationship among residuals
Consider the model y = Xβ + ε and write X = [X(j), xj], where X(j) is the X matrix with the jth column removed. Let e[y|X(j)] denote the residuals when y is regressed on X(j), and e[xj|X(j)] the residuals when xj is regressed on X(j). These two sets of residuals satisfy e[y|X(j)] = βj e[xj|X(j)] + ε*, so the slope of the partial regression plot is the least-squares estimate of βj.

7 Ch4: Partial Regression Plot: Shortcomings
These plots may not give information about the proper form of the relationship if several variables already in the model are incorrectly specified. Partial regression plots will not, in general, detect interaction effects among the regressors. The presence of strong multicollinearity can cause partial regression plots to give incorrect information about the relationship between the response and the regressor variables.

8 Ch4: Partial Residual Plots: Definition and usage
The partial residual plot is closely related to the partial regression plot. It is a variation of the plot of residuals versus the predictor, designed to show the relationship between the response variable and the regressors.

9 Ch4: Partial Residual Plot: Computation of partial residuals
Consider the model y = Xβ + ε with ordinary residuals ei. The partial residual for regressor xj is defined and calculated by e*i(y|xj) = ei + β̂j xij, that is, the ordinary residual with the estimated contribution of xj added back in. The partial residual plot is the plot of e*i(y|xj) against xij.
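A minimal sketch of computing partial residuals from a fitted multiple regression (numpy; the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                      # ordinary residuals

# Partial residuals for x1: e + beta1_hat * x1, to be plotted against x1.
partial_resid_x1 = e + beta[1] * x1
```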

10 Ch4: Partial Residual Plot: Interpretation of the plot
If regressor xj enters the model linearly, the plot is a straight line through the origin whose slope equals β̂j. Curvature suggests that a higher-order term in xij, such as xij^2, or a transformation such as replacing xij with 1/xij, is required. A horizontal band of points indicates that there is no additional useful information in xj for predicting y.

11 Ch4: Other Plots: Regressor versus regressor
A scatterplot of regressor xi against regressor xj is useful in studying the relationship between regressor variables and in detecting multicollinearity. It can also reveal unusual observations: a point may be unusual with respect to xj alone, or with respect to both regressors.

12 Ch4: Other Plots: Response versus regressor
A scatterplot of the response y against regressor xj is useful in distinguishing the types of unusual points: outliers in the x space (leverage points), outliers in the y direction, and points that are outlying in both directions, which are potentially influential. For a point that is remote in x space the prediction variance is large while the residual variance is small.

13 Ch4: PRESS Statistic: Computation and usage
PRESS is generally regarded as a measure of how well a regression model will perform in predicting new data; a model with a small value of PRESS is desired. The PRESS residuals are e(i) = yi − ŷ(i) = ei / (1 − hii), and accordingly the PRESS statistic is defined as PRESS = Σ e(i)^2 = Σ [ei / (1 − hii)]^2. The R^2 for prediction based on PRESS is R^2_prediction = 1 − PRESS / SST.
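A minimal numpy sketch of PRESS and the prediction R^2 using the hat-diagonal shortcut above (assumes a design matrix X that already includes the intercept column):

```python
import numpy as np

def press_statistics(X, y):
    """Return PRESS and R^2_prediction for an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # hat-matrix diagonal
    press = np.sum((e / (1.0 - h)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return press, 1.0 - press / sst
```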

14 Ch4: PRESS Statistic: Interpretation with an example

15 Ch4: PRESS Statistic: Interpretation with an example
Comparing the PRESS statistic when y is regressed on x1 alone with the PRESS statistic when y is regressed on both x1 and x2, the model including both x1 and x2 is better than the model in which only x1 is included.

16 Ch4: Detection and Treatment of Outliers: Tools and methods
Recall that an outlier is an extreme observation: one that is considerably different from the majority of the data.
Detection tools: residuals, scaled residuals, and formal statistical tests.
Outliers can be categorized as: bad values, occurring as a result of unusual but explainable events, such as faulty measurement or analysis, incorrect recording of data, or failure of a measurement instrument; and normally observed values, such as leverage and influential observations.
Treatment: remove bad values; follow up the analysis of outliers, which may help us improve the process or yield new knowledge concerning factors whose effect on the response was previously unknown. The effect of outliers may be checked easily by dropping these points and refitting the regression equation.

17 Ch4: Detection and Treatment of Outliers: Example 1 (Rocket data)
Quantity     Obs. 5 & 6 in   Obs. 5 & 6 out
Intercept
Slope        -37.15          -37.69
R2           0.9018          0.9578
MSRes

18 Ch4: Detection and Treatment of Outliers: Example 2
Country        Cigarette   Deaths
Australia      480         180
Canada         500         150
Denmark        380         170
Finland        1100        350
UK                         460
Iceland        230         60
Netherlands    490         240
Norway         250         90
Sweden         300         110
Switzerland    510
USA            1300        200
Regression with all the data: R-Sq = 54.4%. Regression without the USA: R-Sq = 88.9%.

19 Ch4: Lack of fit of the regression model: What is meant?
All models are wrong; some models are useful (George Box). In simple linear regression, if we have n distinct data points we can always fit a polynomial of order up to n − 1, and in the process what we claim to be random error is actually a systematic departure resulting from not fitting enough terms. A perfect linear fit is always possible when we have two distinct points, but not, in general, when we have three or more distinct points.

20 Ch4: Lack of fit of the regression model: A formal test
This test assumes that the normality, independence, and constant-variance requirements are met and that only the first-order (straight-line) character of the relationship is in doubt. To carry out the test we must have replicate observations on the response y for at least one level of x; these replicates provide a model-independent estimate of σ^2. If the test rejects, the straight-line fit is not satisfactory.

21 Ch4: Lack of fit of the regression model: A formal test
Let yij denote the jth observation on the response at xi, j = 1, ..., ni, i = 1, ..., m, so that there are n = Σ ni observations in total. For a fitted linear regression with fitted values ŷi, each residual can be partitioned as yij − ŷi = (yij − ȳi) + (ȳi − ŷi), where ȳi is the average of the ni observations at xi.

22 Ch4: Lack of fit of the regression model: A formal test
Squaring both sides of this partition and summing over i and j gives SSRes = SSPE + SSLOF, where the pure-error sum of squares is SSPE = Σi Σj (yij − ȳi)^2 and the lack-of-fit sum of squares is SSLOF = Σi ni (ȳi − ŷi)^2.

23 Ch4: Lack of fit of the regression model: A formal test
If the fitted values ŷi are close to the corresponding average responses ȳi, there is a strong indication that the regression function is linear. Note that if the assumption of constant variance is satisfied, SSPE is a model-independent measure of pure error, with Σi (ni − 1) = n − m degrees of freedom. Note also that SSPE = Σi (ni − 1) Si^2, where Si^2 is the sample variance of the responses at level xi.

24 Ch4: Lack of fit of the regression model: A formal test
It is well known that E(Si^2) = σ^2, and so E[SSPE / (n − m)] = σ^2. For the lack-of-fit sum of squares, however, E[SSLOF / (m − 2)] = σ^2 + Σi ni [E(yi) − β0 − β1 xi]^2 / (m − 2), which exceeds σ^2 unless the regression function is truly linear.

25 Ch4: Lack of fit of the regression model: A formal test
An unbiased estimate of σ^2 is MSPE = SSPE / (n − m). Moreover, if the regression function is linear, MSLOF = SSLOF / (m − 2) is also an unbiased estimate of σ^2. So the ratio F0 = MSLOF / MSPE can be used as a statistic for testing the linearity assumption in the linear regression model. Under the null hypothesis of linearity, F0 follows an F distribution with m − 2 and n − m degrees of freedom, and therefore we conclude that the regression function is not linear if F0 > F(m−2, n−m, 1−α).
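A minimal numpy/scipy sketch of the pure-error and lack-of-fit decomposition for simple linear regression with replicated x levels (the data are made up for illustration):

```python
import numpy as np
from scipy import stats

x = np.array([1., 1., 2., 3., 3., 3., 4., 5., 5., 6.])
y = np.array([2.1, 2.3, 2.9, 3.8, 4.1, 4.0, 4.6, 5.9, 6.1, 6.8])

# Fit the straight line.
b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
ss_res = np.sum((y - fitted) ** 2)

# Pure error: pooled within-level variation at the replicated x values.
levels = np.unique(x)
ss_pe = sum(np.sum((y[x == xi] - y[x == xi].mean()) ** 2) for xi in levels)
m, n = len(levels), len(x)

ss_lof = ss_res - ss_pe
F0 = (ss_lof / (m - 2)) / (ss_pe / (n - m))
p_value = stats.f.sf(F0, m - 2, n - m)
print(F0, p_value)
```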

26 Ch4: Lack of fit of the regression model: Limitations and solutions
Limitations: ideally we find that the F ratio for lack of fit is not significant and that the hypothesis of significance of regression is rejected. Unfortunately, this does not guarantee that the model will be satisfactory as a prediction equation; the model may have been fitted to error only.
Solutions: the regression model is likely to be useful as a predictor when the F ratio for significance of regression is at least four or five times the critical value from the F table, and when the range of the fitted values ŷi is large compared with their average standard error. For the latter comparison we can use the average standard error sqrt(p σ̂*^2 / n), where σ̂*^2 is a model-independent estimate of the error variance.

27 Ch4: Lack of fit of the regression model: Multiple-regression version
Repeat observations do not often occur in multiple regression. One solution is to search for points in x space that are near neighbors, that is, sets of observations that have been taken with nearly identical levels of x1, x2, ..., xk. As a measure of the distance between any two points xi and xi', we use the weighted sum of squared distance (WSSD), D^2_ii' = Σj [β̂j (xij − xi'j)]^2 / MSRes. Pairs of points that have small D^2_ii' are near neighbors, and the residuals at two such points can be used to obtain an estimate of pure error.

28 Ch4: Lack of fit of the regression model: Multiple-regression version
There is a relationship between the range of a sample from a normal population and the population standard deviation; for samples of size 2, the expected range is 1.128σ (exercise). An algorithm based on this fact for samples larger than 2 is:
First arrange the data points xi1, xi2, ..., xik in order of increasing fitted value ŷi.
Compute the values of D^2_ii' for all n − 1 pairs of points with adjacent values of ŷ, then repeat this calculation for the pairs of points separated by one, two, and three intermediate values; this produces 4n − 10 values of D^2_ii'.
Arrange these values in ascending order. Let Eu, u = 1, ..., 4n − 10, denote the range of the residuals at each pair of points, and estimate the standard deviation of pure error by σ̂ = (1 / (1.128 m)) Σ Eu, where E1, E2, ..., Em are the ranges associated with the m smallest values of D^2_ii'.
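A rough numpy sketch of this near-neighbor calculation (simulated data; the 1.128 constant is the expected range, in standard-deviation units, of a normal sample of size 2):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 2
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
fitted = Xd @ beta
resid = y - fitted
ms_res = np.sum(resid ** 2) / (n - k - 1)

# Order observations by fitted value, then compute D^2 for pairs that are
# adjacent or separated by one, two, or three intermediate points.
order = np.argsort(fitted)
pairs, d2 = [], []
for gap in range(1, 5):
    for idx in range(n - gap):
        i, j = order[idx], order[idx + gap]
        pairs.append((i, j))
        d2.append(np.sum((beta[1:] * (X[i] - X[j])) ** 2) / ms_res)
d2 = np.array(d2)

# Ranges of residuals for the m pairs with the smallest D^2 values.
m = 10
nearest = np.argsort(d2)[:m]
ranges = np.array([abs(resid[i] - resid[j]) for i, j in np.array(pairs)[nearest]])
sigma_pe = ranges.mean() / 1.128
print(sigma_pe)
```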

29 Chapter 5: Methods to Correct Model Inadequacy

30 Ch 5: Transformation and Weighting
The main assumptions in the model Y = Xβ + ε are: E(ε) = 0; Var(ε) = σ^2 I; ε ~ N(0, σ^2 I); and the form of the model, Xβ, is correct. We use residual analysis to detect violations of these basic assumptions. In this chapter we focus on methods and procedures for building regression models when some of the above assumptions are violated.

31 Ch5: Transformation and Weighting: Problems?
Problems: the error variance is not constant, or the relationship between y and the regressors is not linear. Solutions: transformation (fit the model to transformed data to stabilize the variance or linearize the relationship) or weighting (use weighted least squares). When the error variance is not constant, the ordinary least-squares estimators remain unbiased, but they are no longer BLUE.

32 Ch5: Transformation: Stabilizing variance
Let Var(Y) = c^2 [E(Y)]^h. In this case the variance-stabilizing transformation is approximately Y' = Y^(1 − h/2), with Y' = log(Y) when h = 2. Example 1: Poisson data, Var(Yi) = E(Yi), so h = 1 and Y' = Y^(1/2). Example 2: inverse-Gaussian data, Var(Yi) = [E(Yi)]^3, so h = 3 and Y' = Y^(−1/2).

33 Ch5: Transformation: Stabilizing variance
Relationship of σ^2 to E(Y)      Transformation
σ^2 ∝ constant                   Y' = Y (no transformation)
σ^2 ∝ E(Y)                       Y' = Y^(1/2) (square root; Poisson data)
σ^2 ∝ E(Y)[1 − E(Y)]             Y' = arcsine(Y^(1/2)) (binomial data)
σ^2 ∝ [E(Y)]^2                   Y' = log(Y) (gamma distribution)
σ^2 ∝ [E(Y)]^3                   Y' = Y^(−1/2) (inverse-Gaussian)
σ^2 ∝ [E(Y)]^4                   Y' = Y^(−1)

34 Ch5: Transformation for stabilizing variance: Limitations
Note that the predicted values are in the transformed scale, so: applying the inverse transformation directly to the predicted values gives an estimate of the median of the distribution of the response instead of the mean; and confidence or prediction intervals may be directly converted from one metric to another, but there is no assurance that the resulting intervals in the original units are the shortest possible intervals.

35 Ch5: Transformation: Linearizing the model
The linearity assumption is the usual starting point in regression analysis. Occasionally we find that this assumption is inappropriate. Nonlinearity may be detected via the lack-of-fit test, from scatter diagrams, the matrix of scatterplots, residual plots such as the partial regression plot, or from prior experience or theoretical considerations. In some cases a nonlinear function can be linearized by using a suitable transformation. Such nonlinear models are called intrinsically or transformably linear.

36 Ch5: Transformation: Linearizing the model
Example 1: the exponential function Y = β0 exp(β1 x) ε is intrinsically linear, since a logarithmic transformation gives the straight line ln(Y) = ln(β0) + β1 x + ln(ε). Example 2: a function such as Y = β0 + β1 (1/x) + ε can be linearized by using the reciprocal transformation x' = 1/x, giving Y = β0 + β1 x' + ε.

37 Ch5: Transformation: Linearizing the model
Linearizable function      Transformation
Y = β0 exp(β1 x)           Y' = log(Y)
Y = β0 x^β1                Y' = log(Y) and x' = log(x)
Y = β0 + β1 log(x)         x' = log(x)
Y = x / (β0 x − β1)        x' = 1/x and Y' = 1/Y
When transformations such as those described above are employed, the least-squares estimator has least-squares properties with respect to the transformed data, not the original data.

38 Ch5: Transformation: Analytical method (Box-Cox)
For a power transformation of Y, the Box-Cox method uses y^(λ) = (y^λ − 1) / (λ ẏ^(λ−1)) for λ ≠ 0 and y^(λ) = ẏ ln(y) for λ = 0, where ẏ is the geometric mean of the observations. Then fit the model Y^(λ) = Xβ + ε. The maximum-likelihood estimate of λ corresponds to the value of λ for which the residual sum of squares from the fitted model, SSRes(λ), is a minimum.

39 Ch5: Transformation: Analytical method
A suitable value of λ is easily obtained by plotting SSRes(λ) against λ for a grid of candidate values, usually in the range (−3, +3), and choosing the λ that minimizes SSRes(λ).
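A minimal sketch of profiling SSRes(λ) over a grid, following the geometric-mean scaling described above (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x = rng.uniform(1, 10, size=n)
y = np.exp(0.3 + 0.2 * x + rng.normal(scale=0.1, size=n))  # positive response

X = np.column_stack([np.ones(n), x])
gm = np.exp(np.mean(np.log(y)))  # geometric mean of y

def ss_res(lam):
    """Residual sum of squares after the scaled Box-Cox transformation."""
    if abs(lam) < 1e-8:
        y_lam = gm * np.log(y)
    else:
        y_lam = (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
    beta, *_ = np.linalg.lstsq(X, y_lam, rcond=None)
    return np.sum((y_lam - X @ beta) ** 2)

grid = np.linspace(-2, 2, 81)
best = grid[np.argmin([ss_res(l) for l in grid])]
print(best)  # should be near 0 (log transformation) for this generated data
```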

40 Ch5: Transformation: Analytical method with regressors (Box-Tidwell)
Suppose that the relationship between y and one or more of the regressor variables is nonlinear, but that the usual assumptions of normally and independently distributed responses with constant variance are at least approximately satisfied. Assume E(Y) = β0 + β1 z, where z = x^α. Assuming α ≠ 0, we expand about an initial guess α0 = 1 in a Taylor series and ignore terms of higher than first order:
E(Y) ≈ β0 + β1 x^α0 + (α − α0) β1 x^α0 log(x) = β0* + β1* x1 + β2* x2,
where β0* = β0, β1* = β1, β2* = (α − α0) β1, x1 = x^α0, and x2 = x^α0 log(x).

41 Ch5: Transformation: Analytical method with regressors
Now use the following algorithm (Box and Tidwell, 1962):
Fit the model E(Y) = β0 + β1 x and find the least-squares estimates of β0 and β1.
Fit the model E(Y) = β0* + β1* x + β2* x log(x) and find the least-squares estimates of β0*, β1*, and β2*.
Using the equality β2* = (α − α0) β1, obtain an updated estimate α1 = β̂2*/β̂1 + α0.
Set x' = x^α1 and repeat steps 1-3 with x'. Apply the algorithm until the difference between successive estimates αi and αi−1 is small (the index i counts the iterations, with α0 = 1).
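A rough sketch of the Box-Tidwell iteration, assuming a single positive regressor (simulated data, so the true power is known; this variant updates α directly in terms of the original x rather than re-transforming the regressor, which is one reading of the procedure above):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(0.5, 5.0, size=n)
y = 3.0 + 2.0 * np.sqrt(x) + rng.normal(scale=0.1, size=n)  # true power alpha = 0.5

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

alpha = 1.0
for _ in range(10):
    w = x ** alpha
    b0, b1 = ols(np.column_stack([np.ones(n), w]), y)
    g0, g1, g2 = ols(np.column_stack([np.ones(n), w, w * np.log(x)]), y)
    alpha_new = g2 / b1 + alpha          # update from beta2* = (alpha - alpha0) * beta1
    if abs(alpha_new - alpha) < 1e-4:
        alpha = alpha_new
        break
    alpha = alpha_new

print(alpha)  # should approach 0.5
```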

42 Ch5: Transformation: Analytical method with regressors
Example: fitting ŷ = β̂0 + β̂1 x and then ŷ = β̂0 + γ̂1 x + γ̂2 x log(x), the update α1 = γ̂2/β̂1 + 1 = −0.462/β̂1 + 1 evaluates to −0.92. Setting x' = x^(−0.92) and refitting ŷ = β̂0 + β̂1 x' and ŷ = β̂0 + γ̂1 x' + γ̂2 x' log(x'), the second update with γ̂2 = 0.5994 gives α2 ≈ −1.01, so a reciprocal-type transformation of x is indicated.

43 Ch5: Transformation: Analytical method with regressors
Example: Model 1 (blue solid line in the graph) is the straight-line fit ŷ = β̂0 + β̂1 x, with R^2 = 0.87. Model 2 (red dotted line in the graph) uses the reciprocal transformation, ŷ = β̂0 + β̂1 / x, with R^2 = 0.980, a clearly better fit.

44 Ch5: Generalized least squares: Covariance matrix is nonsingular
Consider the model Y = Xβ + ε with the assumptions E(ε) = 0 and Var(ε) = σ^2 V, where V is a nonsingular square matrix. We approach this problem by transforming the model to a new set of observations that satisfy the standard least-squares assumptions, and then use ordinary least squares on the transformed data.

45 Ch5: Generalized least squares: Covariance matrix is nonsingular
Since V is nonsingular and positive definite, we can write V = K'K = KK, where K is a nonsingular symmetric square matrix. Define the new variables Z = K^(−1)Y, B = K^(−1)X, and g = K^(−1)ε. Multiplying both sides of the original regression model by K^(−1) gives Z = Bβ + g. This transformed model has the following properties: E(g) = K^(−1)E(ε) = 0 and Var(g) = E[(g − E(g))(g − E(g))'] = E(gg') = K^(−1)E(εε')K^(−1) = σ^2 K^(−1)VK^(−1) = σ^2 K^(−1)KKK^(−1) = σ^2 I.

46 Ch5: Generalized least squares: Covariance matrix is nonsingular
So, in the transformed model the error term g has zero mean, constant variance, and uncorrelated components. Applying ordinary least squares to the transformed data gives β̂ = (B'B)^(−1)B'Z = (X'V^(−1)X)^(−1)X'V^(−1)Y, which is called the generalized least-squares estimator of β. We easily have E(β̂) = β and Var(β̂) = σ^2 (X'V^(−1)X)^(−1).
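A minimal numpy sketch of the GLS estimator via the whitening transformation above (V is a made-up AR(1)-style correlation matrix; the Cholesky factor is used in place of the symmetric square root, which whitens the model in the same way):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])

# Correlated errors with covariance sigma^2 * V (AR(1)-style V, rho = 0.6).
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(V)          # V = L L'
y = X @ np.array([1.0, 2.0]) + L @ rng.normal(scale=0.5, size=n)

# GLS: beta = (X' V^-1 X)^-1 X' V^-1 y, computed by whitening with K = L.
Z = np.linalg.solve(L, y)          # K^-1 y
B = np.linalg.solve(L, X)          # K^-1 X
beta_gls, *_ = np.linalg.lstsq(B, Z, rcond=None)
print(beta_gls)
```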

47 Ch5: Generalized least squares: Covariance matrix is diagonal
When the errors ε are uncorrelated but have unequal variances, so that the covariance matrix of ε is σ^2 V with V = diag(v1, v2, ..., vn), the estimation procedure is usually called weighted least squares. Let W = V^(−1), a diagonal matrix of weights wi = 1/vi. Then β̂ = (X'WX)^(−1)X'Wy, which is called the weighted least-squares estimator. Note that observations with large variances will have smaller weights than observations with small variances.
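A short sketch of weighted least squares with the explicit formula above, assuming the error standard deviation grows with x (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
# Error variance proportional to x^2, so the weights are w_i = 1 / x_i^2.
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.3 * x)

W = np.diag(1.0 / x ** 2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_wls)
```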

48 Ch5: Generalized least squares: Covariance matrix is diagonal
For the case of simple linear regression, the weighted least-squares criterion is S(β0, β1) = Σ wi (yi − β0 − β1 xi)^2. Differentiating with respect to β0 and β1 and setting the derivatives to zero gives the normal equations
β̂0 Σ wi + β̂1 Σ wi xi = Σ wi yi
β̂0 Σ wi xi + β̂1 Σ wi xi^2 = Σ wi xi yi.
Exercise: show that the solution of this system coincides with the general formula stated on the previous slide.

49 Chapter 6: Diagnostics for Leverage and Influence

50 Ch6: Diagnostics for Leverage and influence
In this chapter we present several diagnostics for leverage and influence. Unusual observations can behave quite differently: one point may not affect the estimates of the regression coefficients yet have a dramatic effect on the model summary statistics such as R^2 and the standard errors of the regression coefficients, while another point may have a noticeable impact on the model coefficients themselves.

51 Ch6: Diagnostics for Leverage and influence: Importance
A regression coefficient may have a sign that does not make engineering or scientific sense; a regressor known to be important may be statistically insignificant; or a model that fits the data well and is logical from an application-environment perspective may produce poor predictions. These situations may be the result of one, or perhaps a few, influential observations. Finding these observations can shed considerable light on the problems with the model.

52 Ch6: Diagnostics for Leverage and influence: Leverage
The usual measure of leverage is the hat matrix diagonal hii, which is a standardized measure of the distance of the ith observation from the center (or centroid) of the x space. Thus, large hat diagonals reveal observations that are leverage points because they are remote in x space from the rest of the sample. Rule of thumb: if hii > 2h̄ = 2(k+1)/n, the ith observation is a leverage point. Two caveats apply to this rule: if 2(k+1) > n the cutoff does not apply, and leverage points are only potentially influential.
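A short sketch of flagging leverage points with the 2(k+1)/n cutoff (illustrative data; k is the number of regressors excluding the intercept):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
X[0, 1] = 6.0   # plant one remote point in x space

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
cutoff = 2 * (k + 1) / n
print(np.where(h > cutoff)[0])   # indices of leverage points (should include 0)
```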

53 Ch6: Diagnostics for Leverage and influence: Measures of influence
Measure    Formula                                              Rule
Cook's D   Di = (ri^2/(k+1)) hii/(1 − hii)                      Di is not an F statistic, but in practice it can be compared with the F(k+1, n−k−1) distribution; we consider points with Di > 1 to be influential.
DFBETAS    DFBETASj,i = (β̂j − β̂j(i)) / sqrt(S(i)^2 Cjj)        If |DFBETASj,i| > 2/sqrt(n), the ith observation warrants examination.
DFFITS     DFFITSi = (ŷi − ŷ(i)) / sqrt(S(i)^2 hii)             If |DFFITSi| > 2 sqrt((k+1)/n), the ith observation warrants attention.
Here ri is the studentized residual, S(i)^2 is the residual mean square with observation i deleted, and Cjj is the jth diagonal element of (X'X)^(−1).
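A short sketch of computing these measures with statsmodels' influence diagnostics, assuming statsmodels is available (the attribute names are those of its OLSInfluence results object):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[3] += 5.0                      # make one observation unusual in y

res = sm.OLS(y, sm.add_constant(x)).fit()
infl = res.get_influence()

cooks_d = infl.cooks_distance[0]          # Cook's D for each observation
dfbetas = infl.dfbetas                    # n x p matrix of DFBETAS
dffits = infl.dffits[0]                   # DFFITS for each observation

k = 1                                     # one regressor besides the intercept
flagged = np.where(np.abs(dffits) > 2 * np.sqrt((k + 1) / n))[0]
print(flagged)
```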

54 Ch6: Diagnostics for Leverage and influence: Cook's D
There are several equivalent formulas (exercise), for example Di = (ri^2/p) · hii/(1 − hii) = (ŷ(i) − ŷ)'(ŷ(i) − ŷ) / (p MSRes), with p = k + 1. The first factor, ri^2, is large if case i is unusual in the y direction; the second factor, hii/(1 − hii), is large if case i is unusual in the x direction. The last expression shows that Di is, up to scaling, the squared Euclidean distance that the vector of fitted values moves when the ith observation is deleted.

55 Ch6: Diagnostics for Leverage and influence: DFBETAS
There is an interesting computational formula for DFBETAS. Let R = (X'X)^(−1)X' and let r'j = [rj,1, rj,2, ..., rj,n] denote the jth row of R. Then we can write (exercise) DFBETASj,i = [rj,i / sqrt(r'j rj)] · [ti / sqrt(1 − hii)], which is a measure of the impact of the ith observation on β̂j; this quantity is large if case i is unusual in both the x and y directions.

56 Ch6: Diagnostics for Leverage and influence: DFFITS
DFFITSi is the number of standard deviations that the fitted value ŷi changes if observation i is removed. Computationally we may find (exercise) DFFITSi = ti sqrt(hii / (1 − hii)), where hii is the leverage of the ith observation and ti (R-student) is large if case i is an outlier. Note that DFFITS is essentially R-student scaled by the leverage: if hii ≈ 0 the effect of a large R-student is moderated, and similarly a near-zero R-student combined with a high-leverage point can produce a small value of DFFITS.

57 Ch6: Diagnostics for Leverage and influence: A measure of model performance
The diagnostics Di, DFBETASj,i, and DFFITSi provide insight about the effect of observations on the estimated coefficients β̂j and fitted values ŷi, but they do not provide any information about the overall precision of estimation. Since it is fairly common practice to use the determinant of the covariance matrix as a convenient scalar measure of precision, called the generalized variance, we define the generalized variance of β̂ as GV(β̂) = |Var(β̂)| = |σ^2 (X'X)^(−1)|.

58 Ch6: Diagnostics for Leverage and influence: A measure of model performance
To express the role of the ith observation on the precision of estimation, we define COVRATIOi = |(X'(i)X(i))^(−1) S(i)^2| / |(X'X)^(−1) MSRes|. Clearly, if COVRATIOi > 1 the ith observation improves the precision of estimation, while if COVRATIOi < 1 inclusion of the ith point degrades precision. Computationally (exercise), COVRATIOi = [S(i)^2 / MSRes]^p / (1 − hii). An exact cutoff value for COVRATIO is not easy to give, but researchers suggest that if COVRATIOi > 1 + 3(k+1)/n or COVRATIOi < 1 − 3(k+1)/n, then the ith point should be considered influential.
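A brief numpy sketch of COVRATIO from its hat-diagonal form above (function and variable names are illustrative):

```python
import numpy as np

def cov_ratio(X, y):
    """COVRATIO_i = (S_(i)^2 / MS_Res)^p / (1 - h_ii) for each observation."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    ms_res = np.sum(e ** 2) / (n - p)
    # Leave-one-out residual variance S_(i)^2.
    s2_i = ((n - p) * ms_res - e ** 2 / (1.0 - h)) / (n - p - 1)
    return (s2_i / ms_res) ** p / (1.0 - h)
```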

59 Chapter 7: Polynomial Regression Models

60 Ch 7: Polynomial regression models
Polynomial regression is a subclass of multiple regression. Example 1: the second-order polynomial in one variable, Y = β0 + β1 x + β2 x^2 + ε. Example 2: the second-order polynomial in two variables, Y = β0 + β1 x1 + β2 x2 + β11 x1^2 + β22 x2^2 + β12 x1 x2 + ε. Polynomials are widely used in situations where the response is curvilinear, and complex nonlinear relationships can be adequately modeled by polynomials over reasonably small ranges of the x's. This chapter surveys several problems and issues associated with fitting polynomials.

61 Ch 7: Polynomial regression models: in one variable
In general, the kth-order polynomial model in one variable is Y = β0 + β1 x + β2 x^2 + ... + βk x^k + ε. If we set xj = x^j, j = 1, 2, ..., k, the model becomes a multiple linear regression model in the k regressors x1, x2, ..., xk, so a polynomial model of order k may be fitted using the techniques studied previously. Let E(Y|X = x) = g(x) be an unknown function; by a Taylor series expansion, g(x) ≈ g(0) + g'(0)x + g''(0)x^2/2! + ..., so polynomial models are also useful as approximating functions to unknown and possibly very complex nonlinear relationships.
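A minimal sketch of fitting a kth-order polynomial as a multiple linear regression (numpy; the data are generated from the quadratic used on the next slide):

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(0, 4, 50)
y = 5 - 2 * x - 0.25 * x ** 2 + rng.normal(scale=0.2, size=x.size)

k = 2
X = np.column_stack([x ** j for j in range(k + 1)])   # columns 1, x, x^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # approximately [5, -2, -0.25]
```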

62 Ch 7: Polynomial regression models: in one variable
Example (second-order or quadratic model): Y = β0 + β1 x + β2 x^2 + ε. We often call β1 the linear effect parameter and β2 the quadratic effect parameter. The parameter β0 is the mean of y when x = 0 if the range of the data includes x = 0; otherwise β0 has no physical interpretation. Numerical example: E(Y) = 5 − 2x − 0.25x^2 (plotted on the slide).

63 Ch 7: Polynomial regression models: Important considerations in fitting these models
Order of the model: keep the order of the model as low as possible. Model-building strategy: use forward selection or backward elimination. Extrapolation: extrapolation with polynomial models can be extremely hazardous. Example: for E(Y) = 5 + 2x − 0.25x^2 (plotted on the slide), extrapolating beyond the range of the original data makes the predicted response turn downward, which may be at odds with the true behavior of the system. In general, a polynomial model may turn in unanticipated and inappropriate directions, both in interpolation and in extrapolation.

64 Ch 7: Polynomial regression models: Important considerations in fitting these models
Ill-conditioning I: as the order of the polynomial increases, the X'X matrix becomes ill-conditioned, which means that the matrix inversion calculations will be inaccurate and considerable error may be introduced into the parameter estimates. Nonessential ill-conditioning caused by the arbitrary choice of origin can be removed by first centering the regressor variables. Ill-conditioning II: if the values of x are limited to a narrow range, there can be significant ill-conditioning or multicollinearity in the columns of the X matrix. For example, if x varies between 1 and 2, x^2 varies between 1 and 4, which could create strong multicollinearity between x and x^2.

65 Ch 7: Polynomial regression models: Important considerations in fitting these models
Example: hardwood concentration in pulp and tensile strength of kraft paper. The analysis proceeds through fitting the polynomial model, testing its significance, and diagnostic checking via residual analysis (details shown on the slide).

66 Ch 7: Polynomial regression models: in two or more variables
In general, these models are straightforward extensions of the model in one variable. An example of a second-order model in two variables is Y = β0 + β1x1 + β2x2 + β11x1^2 + β22x2^2 + β12x1x2 + ε, where β1 and β2 are linear effect parameters, β11 and β22 are quadratic effect parameters, and β12 is an interaction effect parameter. This model has received considerable attention, both from researchers and from practitioners. Its regression function is called a response surface. Response surface methodology (RSM) is widely applied in industry for modeling the output response(s) of a process in terms of the important controllable variables and then finding the operating conditions that optimize the response.

67 Ch 7: Polynomial regression models: in two or more variables
Example: a two-factor experiment with temperature (T) and concentration (C) as regressors and conversion (y) as the response, run in twelve observations: factorial points with T = 200, 250 and C = 15, 25; axial points at T = 189.65, 260.35 (with C = 20) and C = 12.93, 27.07 (with T = 225); and four replicated center runs at T = 225, C = 20. In coded units (x1 for temperature, x2 for concentration) the observed conversions at the factorial and axial points are 43, 78, 69, 73 and 48, 76, 65, 74, and the center runs include 79, 83, and 81 (the full data table appears on the slide).

68 Ch 7: Polynomial regression models: in two or more variables
Example: the central composite design is widely used for fitting response surface models. In coded units the runs are at the corners of the square, (x1, x2) = (−1, −1), (−1, 1), (1, −1), (1, 1); at the center of the square, (x1, x2) = (0, 0), replicated four times; and at the axial points, (x1, x2) = (0, −1.414), (0, 1.414), (−1.414, 0), (1.414, 0). (The slide shows the design plotted in the temperature-concentration plane.)

69 Ch 7: Polynomial regression models: in two or more variables
We fit the second-order model Y = β0 + β1x1 + β2x2 + β11x1^2 + β22x2^2 + β12x1x2 + ε by least squares, using the coded design matrix shown on the slide.

70 Ch 7: Polynomial regression models: in two or more variables
The fitted second-order model in the coded variables is ŷ = 79.75 + 9.83x1 + 4.22x2 − 8.88x1^2 − 5.13x2^2 − 7.75x1x2, and the model can also be written in terms of the original variables. We use the coded data for the computation of the sums of squares:
Source of variation   SS       D.F.   MS       F       P-value
Regression            1733.6   5      346.72   58.87   <0.0001
Residual              35.34    6      5.89
Total                 1768.9   11

71 Ch 7: Polynomial regression models: in two or more variables
If instead we fit only the linear model in the coded variables, we have:
Source of variation   SS        D.F.   MS       F      P-value
Regression            914.41    2      457.21   4.82   0.0377
Residual              854.51    9      94.95
Total                 1768.92   11

72 Ch 7: Polynomial regression models: in two or more variables
Since the last four rows of the X matrix (the replicated center runs) are identical, we can divide SSRes into two components and perform a lack-of-fit test:
Source of variation                 SS       D.F.   MS       F        P-value
Regression (SSR)                    1733.6   5      346.72   58.87    <0.0001
  SSR(β1, β2 | β0)                  914.4    2      457.2
  SSR(β11, β22, β12 | β1, β2, β0)   819.2    3      273.1
Residual                            35.34    6      5.89
  Lack of fit                       8.5      3      2.83     0.3176   0.8120
  Pure error                        26.8     3      8.92
Total                               1768.9   11

73 Ch 7: Polynomial regression models: in two or more variables
Since the quadratic model is significant for these data, we can test the individual terms to drop unimportant ones, if any, using t0 = β̂j / sqrt(σ̂^2 Cjj), where Cjj is the jth diagonal element of (X'X)^(−1):
Variable    Estimated coefficient   Standard error   t        P-value
Intercept   79.75                   1.21             65.72
x1          9.83                    0.86             11.45    0.0001
x2          4.22                                     4.913    0.0027
x1^2        -8.88                   0.96             -9.25
x2^2        -5.13                                    -5.341   0.0018
x1x2        -7.75                                    -6.386   0.0007

74 Ch 7: Polynomial regression models: in two or more variables
Generally we prefer to fit the full quadratic model whenever possible, unless there are large differences between the full and the reduced model in terms of PRESS and adjusted R^2. The fitted full model is ŷ = 79.75 + 9.83x1 + 4.22x2 − 8.88x1^2 − 5.13x2^2 − 7.75x1x2. Using the PRESS residuals e(i) = ei/(1 − hii) we obtain the table below. Note that runs 1 to 8 all have the same hii = 0.625, as these points are equidistant from the center of the design; the last four (center) runs all have hii = 0.25.
x1       x2       y     ŷ       ei      hii     ti      e(i)
-1       -1       43    43.96   -0.96   0.625   -0.67   -2.55
1        -1       78    79.11   -1.11   0.625   -0.74   -2.95
-1       1        69    67.89   1.11    0.625   0.75    2.96
1        1        73    72.04   0.96    0.625   0.65    2.56
-1.414   0        48    48.11   -0.11   0.625   -0.07   -0.29
1.414    0        76    75.90   0.10    0.625   0.07    0.28
0        -1.414   65    63.54   1.46    0.625   0.98    3.89
0        1.414    74    75.46   -1.46   0.625   -0.99   -3.90
0        0        76    79.75   -3.75   0.250   -1.78   -5.00
0        0        79    79.75   -0.75   0.250   -0.36   -1.00
0        0        83    79.75   3.25    0.250   1.55    4.33
0        0        81    79.75   1.25    0.250   0.59    1.67
R^2 ≈ 0.98, R^2_Adj ≈ 0.96, R^2_Predicted = 0.94

75 Ch 7: Polynomial regression models: in two or more variables
The residual plots (shown on the slide) indicate that normality holds, the variance is stable, and independence holds.

76 Ch 7: Polynomial regression models: Orthogonal polynomials
Consider the kth-order polynomial model in one variable, Y = β0 + β1x + β2x^2 + ... + βkx^k + ε. Generally the columns of the X matrix will not be orthogonal. One approach to deal with this problem is orthogonal polynomials: we fit the model Y = α0P0(x) + α1P1(x) + α2P2(x) + ... + αkPk(x) + ε, where Pj(x) is a jth-order orthogonal polynomial defined so that Σi Pr(xi)Ps(xi) = 0 for r ≠ s, with P0(x) = 1.

77 Ch 7: Polynomial regression models: Orthogonal polynomials
With this model the X'X matrix is diagonal, so the least-squares estimates are simply α̂j = Σi Pj(xi) yi / Σi Pj(xi)^2. The Pj(x) can be determined by the Gram-Schmidt process, and in the case where the levels of x are equally spaced they are available in tabulated form (the first few are given on the slide).

78 Ch 7: Polynomial regression models: Orthogonal polynomials
Gram-Schmidt process: consider an arbitrary set S = {U1, ..., Uk} and denote by 〈Ui, Uj〉 the inner product of Ui and Uj. The set S' = {V1, ..., Vk} is orthogonal when computed as V1 = U1 and Vj = Uj − Σ_{i<j} (〈Uj, Vi〉 / 〈Vi, Vi〉) Vi for j = 2, ..., k. Normalizing, ej = Vj / sqrt(〈Vj, Vj〉) gives an orthonormal set.

79 Ch 7: Polynomial regression models: Orthogonal polynomials
In polynomial regression with one variable, take Ui = x^(i−1). Applying the Gram-Schmidt process gives P0(x) = 1 and P1(x) = x − x̄, and, when the levels of x are equally spaced with spacing d, P1(x) = λ1 (x − x̄)/d for an arbitrary scaling constant λ1. Exercise: derive the other Pj(x) in the earlier table by the same method. Note that an arbitrary constant λj can be inserted in each Pj(x).

80 Ch 7: Polynomial regression models: Orthogonal polynomials
Example:
i    x     P0(xi)   P1(xi)   y
1    50    1        -9       335
2    75    1        -7       326
3    100   1        -5       316
4    125   1        -3       313
5    150   1        -1       311
6    175   1        1        314
7    200   1        3        318
8    225   1        5        328
9    250   1        7        337
10   275   1        9        345
Since xi − xi−1 = 25 for all i, the levels of x are equally spaced, and the first-order orthogonal polynomial takes the values shown in the P1(xi) column.
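A small numpy sketch of fitting with orthogonalized polynomial columns for this example (QR factorization plays the role of Gram-Schmidt here):

```python
import numpy as np

x = np.arange(50., 300., 25.)               # the equally spaced levels above
y = np.array([335., 326., 316., 313., 311., 314., 318., 328., 337., 345.])

k = 2
U = np.column_stack([x ** j for j in range(k + 1)])   # 1, x, x^2
Q, R = np.linalg.qr(U)                                # columns of Q are orthonormal

# Because Q'Q = I, each coefficient is estimated independently.
alpha = Q.T @ y
fitted = Q @ alpha

# Same fitted values as the ordinary polynomial fit.
beta, *_ = np.linalg.lstsq(U, y, rcond=None)
print(np.allclose(fitted, U @ beta))   # True
```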

81 Ch 7: Polynomial regression models: Orthogonal polynomials
The fitted model (coefficients given on the slide) leads to the following analysis of variance:
Source of variation   SS       D.F.   MS       F        P-value
Regression (SSR)      1213.4   2      606.72   159.24   <0.0001
  Linear              181.89   1      181.89   47.74
  Quadratic           1031.5   1      1031.5   270.75   <0.0002
Residual              26.67    7      3.81
Total                 1240.1   9

82 Chapter 8: Indicator Variables

83 Ch 8: Indicator Variables
The variables employed in regression analysis are often quantitative variables, for example temperature, distance, and income; these variables have a well-defined scale of measurement. In some situations it is necessary to use qualitative or categorical variables as predictors, for example sex, operators, or employment status; in general, these variables have no natural scale of measurement. Question: how can we account for the effect that these variables may have on the response? This is done through the use of indicator variables, sometimes called dummy variables.

84 Ch 8: Indicator Variables: Example 1
Let y = life of a cutting tool, x1 = lathe speed (revolutions per minute), and x2 = type of cutting tool, a qualitative variable with two levels (tool types A and B). Let x2 = 0 if the observation is from tool type A and x2 = 1 if it is from tool type B. Assuming that a first-order model is appropriate, we have Y = β0 + β1x1 + β2x2 + ε, which gives Y = β0 + β1x1 + ε for tool type A and Y = (β0 + β2) + β1x1 + ε for tool type B.

85 Ch 8: Indicator Variables
The slide plots tool life y (hours) against lathe speed x1 (RPM): E(Y|x2 = 0) = β0 + β1x1 for tool type A and E(Y|x2 = 1) = (β0 + β2) + β1x1 for tool type B. The two regression lines are parallel; β2 is a measure of the difference in mean tool life resulting from changing from tool type A to tool type B; and the variance of the error is assumed to be the same for both tool types A and B.

86 Ch 8: Indicator Variables: Example 2
Consider Example 1 again, but now assume that x2 = type of cutting tool is qualitative with three levels (tool types A, B, and C). Define two indicator variables: (x2, x3) = (0, 0) if the observation is from tool type A, (1, 0) if from tool type B, and (0, 1) if from tool type C. Assuming a first-order model, Y = β0 + β1x1 + β2x2 + β3x3 + ε, which gives Y = β0 + β1x1 + ε for type A, Y = (β0 + β2) + β1x1 + ε for type B, and Y = (β0 + β3) + β1x1 + ε for type C. In general, a qualitative variable with l levels is represented by l − 1 indicator variables, each taking the values 0 and 1.
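A brief sketch of building the indicator columns by hand with numpy (the tool-type labels and speeds are hypothetical, chosen only to illustrate the coding):

```python
import numpy as np

tool = np.array(["A", "B", "C", "A", "B", "C", "A", "B"])   # qualitative regressor
speed = np.array([600., 950., 720., 840., 980., 530., 680., 540.])

# l = 3 levels -> l - 1 = 2 indicator variables (type A is the baseline).
x2 = (tool == "B").astype(float)
x3 = (tool == "C").astype(float)
X = np.column_stack([np.ones(len(tool)), speed, x2, x3])
print(X[:3])
```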

87 Ch 8: Indicator Variables: Numerical example
yi      xi1    xi2
18.73   610    A
14.52   950    A
17.43   720    A
14.54   840    A
13.44   980    A
24.39   530    A
13.34   680    A
22.71   540    A
12.68   890    A
19.32   730    A
30.16   670    B
27.09   770    B
25.40   880    B
26.05   1000   B
33.49   760    B
35.62   590    B
26.07   910    B
36.78   650    B
34.95   810    B
43.67   500    B
We fit the model Y = β0 + β1x1 + β2x2 + ε, with x2 = 0 for tool type A and x2 = 1 for tool type B.

88 Ch 8: Indicator Variables: Numerical example
The fitted model gives:
Source of variation   SS        D.F.   MS       F       P-value
Regression            1418.03   2      709.02   79.75   <0.0001
Residual              157.06    17     9.24
Total                 1575.09   19

Variable    Estimated coefficient   Standard error   t       P-value
Intercept   36.99
x1          -0.03                   0.005            -5.89   <0.0001
x2          15                      1.360            11.04

89 Ch 8: Indicator Variables: Comparing regression models
Consider the case of simple linear regression where the n observations can be formed into M groups, with the mth group having nm observations. The most general model consists of M separate equations, Y = β0m + β1m x + ε, m = 1, 2, ..., M. It is often of interest to compare this general model to a more restrictive one, and indicator variables are helpful in this regard. Using indicator variables we can write Y = (β01 + β11x)D1 + (β02 + β12x)D2 + ... + (β0M + β1Mx)DM + ε, where Dm is 1 when group m is selected and 0 otherwise. We call this the full model (FM); it has 2M parameters, so the degrees of freedom for SSRes(FM) is n − 2M. Exercise: let SSRes(FMm) denote the residual sum of squares of the model Y = β0m + β1mx + ε fitted to group m alone, and show that SSRes(FM) = SSRes(FM1) + SSRes(FM2) + ... + SSRes(FMM). We consider three cases: parallel lines, β11 = β12 = ... = β1M; concurrent lines, β01 = β02 = ... = β0M; and coincident lines, β11 = β12 = ... = β1M and β01 = β02 = ... = β0M.

90 Ch8: Indicator Variables: Parallel lines
For parallel lines all M slopes are identical but the intercepts may differ, so we test H0: β11 = β12 = ... = β1M = β1. Recall that this procedure involves fitting a full model (FM) and a reduced model (RM) restricted by the null hypothesis, and computing the F statistic F = [(SSRes(RM) − SSRes(FM)) / (dfRM − dfFM)] / [SSRes(FM) / dfFM]. Under H0 the full model reduces to Y = β0 + β1x + β2D2 + ... + βMDM + ε, for which dfRM = n − (M + 1). Therefore, using the above F statistic we can test the hypothesis H0. This setup is the analysis of covariance.

91 Ch 8: Indicator Variables: Concurrent and coincident lines
For concurrent lines all M intercepts are identical but the slopes may differ: H0: β01 = β02 = ... = β0M = β0. Under H0 the full model reduces to Y = β0 + β1x + β2xD2 + ... + βMxDM + ε, for which dfRM = n − (M + 1); as with parallel lines, we test H0 with the F statistic above. For coincident lines we test H0: β01 = β02 = ... = β0M = β0 and β11 = β12 = ... = β1M = β1; under H0 the full model reduces to the simple model Y = β0 + β1x + ε, for which dfRM = n − 2, and again the same F statistic is used.
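A compact numpy/scipy sketch of the full-versus-reduced F test for parallel lines with M = 2 groups (simulated data; the same pattern applies to the concurrent and coincident cases):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n1 = n2 = 25
x = np.concatenate([rng.uniform(0, 10, n1), rng.uniform(0, 10, n2)])
d2 = np.concatenate([np.zeros(n1), np.ones(n2)])             # indicator of group 2
y = 1.0 + 2.0 * x + 3.0 * d2 + rng.normal(scale=0.5, size=n1 + n2)  # parallel truth

def ss_res(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones_like(x)
full = np.column_stack([ones, x, d2, x * d2])     # separate intercepts and slopes
reduced = np.column_stack([ones, x, d2])          # common slope (parallel lines)

df_full = len(y) - full.shape[1]
df_red = len(y) - reduced.shape[1]
F = ((ss_res(reduced, y) - ss_res(full, y)) / (df_red - df_full)) / (ss_res(full, y) / df_full)
p = stats.f.sf(F, df_red - df_full, df_full)
print(F, p)   # large p-value: no evidence against parallel lines
```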

92 Ch 8: Indicator Variables: Regression approach to analysis of variance
Consider the one-way model yij = μ + τi + εij = μi + εij, i = 1, ..., k; j = 1, 2, ..., n. In the fixed-effects case we test H0: τ1 = τ2 = ... = τk = 0 against H1: τi ≠ 0 for at least one i.
Source of variation   SS         Df         MS                  F
Treatment             SS_T       k − 1      SS_T/(k − 1)        MS_T/MS_Res
Error                 SS_Res     k(n − 1)   SS_Res/[k(n − 1)]
Total                 SS_Total   kn − 1

93 Ch 8: Indicator Variables: Regression approach to analysis of variance
The equivalent regression model for the one-way model yij = μi + εij, i = 1, ..., k; j = 1, 2, ..., n, is Yij = β0 + β1x1j + β2x2j + ... + βk−1 xk−1,j + εij, where xij = 1 if observation j is from treatment i and xij = 0 otherwise. The relationship between the two models is β0 = μk and βi = μi − μk for i = 1, ..., k − 1. Exercise: find the relationship among the sums of squares in the regression and one-way ANOVA formulations.
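A short sketch showing that the one-way ANOVA F statistic can be reproduced by regressing on k − 1 indicators (simulated data, k = 3):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
k, n = 3, 8
groups = np.repeat(np.arange(k), n)
y = np.array([5.0, 6.0, 8.0])[groups] + rng.normal(scale=1.0, size=k * n)

# Regression with k - 1 indicators (treatment k is the baseline).
X = np.column_stack([np.ones(k * n)] + [(groups == i).astype(float) for i in range(k - 1)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_res = np.sum((y - X @ beta) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)

F_reg = ((ss_total - ss_res) / (k - 1)) / (ss_res / (k * (n - 1)))
F_anova = stats.f_oneway(*[y[groups == i] for i in range(k)]).statistic
print(np.isclose(F_reg, F_anova))   # True
```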

94 Ch 8: Indicator Variables: Regression approach to analysis of variance
For the case k = 3, the ANOVA hypotheses H0: τ1 = τ2 = τ3 = 0 versus H1: τi ≠ 0 for at least one i are equivalent to the regression hypotheses H0: β1 = β2 = 0 (so that β0 = μ) versus H1: β1 ≠ 0 or β2 ≠ 0 or both.

