Learning Objectives
At the end of this session, you will be able to:
- describe the assumptions underlying a regression analysis;
- conduct analyses that allow the model assumptions to be checked;
- discuss the consequences of failure of assumptions;
- consider remedial action when assumptions fail.
Checking assumptions
Fitting a line simply to describe the relationship carries no assumptions. However, inferences concerning the slope of the line, e.g. by use of t-tests or F-tests, are subject to certain assumptions. Checking these assumptions is important to avoid drawing invalid conclusions.
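The kind of inference these assumptions protect is the t-test of the slope. The sketch below assumes Python with NumPy and SciPy, and the data values are made up for illustration (they are not the paddy data). It fits a line with `scipy.stats.linregress` and recomputes the two-sided p-value for H0: slope = 0 from t = slope / SE(slope) with n - 2 degrees of freedom. That p-value is only trustworthy if the assumptions below hold.

```python
import numpy as np
from scipy import stats

# Hypothetical illustrative data (made up for this sketch; NOT the paddy data).
x = np.array([0, 10, 20, 30, 40, 50, 60, 70], dtype=float)
y = np.array([3.1, 3.6, 4.2, 4.4, 5.1, 5.3, 6.0, 6.2])

# linregress fits y = intercept + slope * x by least squares.
res = stats.linregress(x, y)

# t statistic for H0: slope = 0, referred to a t distribution with n - 2 df.
n = len(x)
t_stat = res.slope / res.stderr
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"slope = {res.slope:.4f}, t = {t_stat:.2f}, p = {p_two_sided:.2g}")
```

In simple linear regression, the F-test for the slope is equivalent to this t-test (F = t²). So both rest on the same assumptions.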
Assumptions
The simple linear regression model is:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
In addition to assuming a linear form for the model, the errors $\varepsilon_i$ are assumed to:
- be independent,
- have zero mean and constant variance $\sigma^2$, and
- be normally distributed.
Note: Model predictions, often called fitted values, are denoted $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
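To make the assumptions concrete, the sketch below simulates data that satisfy all of them. The errors are drawn independently from a normal distribution with mean 0 and a single variance σ² for every observation. It assumes Python with NumPy. The parameter values (β0 = 2, β1 = 0.5, σ = 1), the x grid and the random seed are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical parameter values chosen for illustration only.
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 200
x = np.linspace(0, 10, n)

# Errors meeting the assumptions: independent draws, mean 0,
# the same variance sigma**2 for every i, normally distributed.
eps = rng.normal(loc=0.0, scale=sigma, size=n)

# Responses generated from the simple linear regression model.
y = beta0 + beta1 * x + eps

print(f"error mean = {eps.mean():.3f}, error SD = {eps.std(ddof=1):.3f}")
```

Residual plots from data simulated this way show what "no departure from the assumptions" looks like, which is useful for calibrating the eye.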
How to check assumptions?
The usual approach is to conduct a residual analysis. Residuals are deviations of observed values from the model's fitted values, $e_i = y_i - \hat{y}_i$.
[Figure: paddy data relating yield to fertiliser, with a residual indicated]
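A minimal sketch of computing fitted values and residuals, assuming Python with NumPy. The x and y values are hypothetical, not the paddy data. `np.polyfit(x, y, deg=1)` returns the least-squares coefficients highest power first, i.e. `[slope, intercept]`.

```python
import numpy as np

# Hypothetical data for illustration only (not the paddy data).
x = np.array([0, 10, 20, 30, 40, 50, 60, 70], dtype=float)
y = np.array([3.1, 3.6, 4.2, 4.4, 5.1, 5.3, 6.0, 6.2])

# np.polyfit with deg=1 returns [slope, intercept].
b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x        # model fitted values, y-hat
residuals = y - fitted      # e_i = y_i - y-hat_i

for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    print(f"x={xi:5.1f}  y={yi:4.2f}  fitted={fi:5.3f}  residual={ei:+.3f}")
```

Because the model includes an intercept, least-squares residuals always sum to zero. A residual mean of zero is therefore guaranteed by the fit and cannot itself be used to check the zero-mean assumption. The residual plots on the following slides are needed for that.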
Residual Plots
Plotting residuals in various ways allows failure of assumptions to be detected. For example, to check the normality assumption:
- plot a histogram of the residuals (provided there are enough observations); or
- do a normal probability plot of the residuals. A straight-line plot indicates that the normality assumption is reasonable.
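Both normality checks can be computed numerically before plotting. The sketch below assumes Python with NumPy and SciPy and uses simulated stand-in residuals (hypothetical). `np.histogram` gives the histogram counts. `scipy.stats.probplot` returns the theoretical normal quantiles paired with the ordered residuals, plus the slope, intercept and correlation `r` of a straight line through them. An `r` close to 1 corresponds to a nearly straight normal probability plot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
# Hypothetical stand-in residuals; in practice use the residuals of your fitted model.
residuals = rng.normal(size=100)

# Histogram counts (only informative when there are enough observations).
counts, edges = np.histogram(residuals, bins=10)

# Normal probability plot data: theoretical normal quantiles vs ordered residuals,
# plus the least-squares line through those points and its correlation r.
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(residuals, dist="norm")

print("histogram counts:", counts.tolist())
print(f"probability-plot correlation r = {r:.3f}")
```

To draw the plot itself, any plotting library can be used, e.g. a scatter of `theoretical_q` against `ordered_resid`.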
Residual Plots - continued
Most useful is a plot of residuals against fitted values ($\hat{y}_i$). It helps to detect failure of the variance homogeneity assumption, and also helps to identify potential outliers. For example, if standardised residuals are used, i.e. residuals divided by their standard error, then about 95% of observations would be expected to lie between -2 and +2.
A random scatter with no obvious pattern is good! Some examples follow.
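A minimal sketch of standardised residuals as defined on this slide: each residual is divided by the residual standard error s = sqrt(SSE / (n - 2)). It assumes Python with NumPy, and the simulated data are hypothetical. Some packages additionally adjust each residual for its leverage; that refinement is omitted here. The returned `fitted` and `std_resid` arrays are what would be plotted against each other.

```python
import numpy as np

def standardised_residuals(x, y):
    """Fit y = b0 + b1*x by least squares; return fitted values and
    residuals divided by the residual standard error sqrt(SSE / (n - 2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, deg=1)
    fitted = b0 + b1 * x
    resid = y - fitted
    s = np.sqrt(np.sum(resid ** 2) / (len(y) - 2))
    return fitted, resid / s

# Hypothetical simulated data that satisfy the model assumptions.
rng = np.random.default_rng(seed=3)
x = rng.uniform(0, 10, size=500)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=500)

fitted, std_resid = standardised_residuals(x, y)
share_within = np.mean(np.abs(std_resid) <= 2)
print(f"{share_within:.1%} of standardised residuals lie in [-2, 2]")
```

For a normal distribution, about 95.4% of values lie within ±2 standard deviations. This is the source of the "about 95%" rule of thumb.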
Some Residual Plots
[Figure: residual plot showing a random scatter]
A random scatter like this is good. It shows no obvious departure from the variance homogeneity assumption.
Some Residual Plots
[Figure: residual plot in which the spread of residuals widens as x increases]
Variance increases with increasing x. Could try a $\log_e(y)$ transformation.
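The sketch below illustrates why a log transformation can help. It assumes Python with NumPy and simulated data with hypothetical multiplicative errors, so that the spread of y grows with its mean. The spread of residuals is summarised as the SD for the upper half of x divided by the SD for the lower half (a simple numerical stand-in for eyeballing the plot). The ratio is compared for a straight-line fit to y and to log_e(y).

```python
import numpy as np

def linear_residuals(x, y):
    """Residuals from a least-squares straight-line fit."""
    b1, b0 = np.polyfit(x, y, deg=1)
    return y - (b0 + b1 * x)

def spread_ratio(x, resid):
    """SD of residuals for the upper half of x divided by SD for the lower half."""
    upper = x > np.median(x)
    return resid[upper].std(ddof=1) / resid[~upper].std(ddof=1)

rng = np.random.default_rng(seed=4)
x = np.linspace(0, 10, 400)
# Hypothetical multiplicative errors: the SD of y grows with its mean.
y = np.exp(0.5 + 0.2 * x + rng.normal(scale=0.3, size=x.size))

raw_ratio = spread_ratio(x, linear_residuals(x, y))
log_ratio = spread_ratio(x, linear_residuals(x, np.log(y)))   # np.log is log_e
print(f"spread ratio, raw y: {raw_ratio:.2f}; log(y): {log_ratio:.2f}")
```

On the log scale, the simulated errors are additive with constant variance, so the ratio there should be near 1.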
Some Residual Plots
[Figure: residual plot]
Indication that the response is a binomial proportion. Use a logistic regression model.
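A logistic regression for grouped binomial data (successes out of trials at each x) can be sketched in plain NumPy using iteratively reweighted least squares. This is the standard Fisher-scoring algorithm for a binomial generalised linear model with a logit link. The dose levels, true coefficients and seed below are hypothetical. In practice a ready-made GLM routine (for example statsmodels' `GLM` with a `Binomial` family) would normally be used instead.

```python
import numpy as np

def fit_logistic_grouped(x, successes, trials, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x to grouped binomial data by iteratively
    reweighted least squares (Fisher scoring, binomial GLM, logit link)."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(successes, dtype=float)
    m = np.asarray(trials, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                      # linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))      # fitted proportions
        w = m * p * (1.0 - p)               # binomial variances = IRLS weights
        z = eta + (s - m * p) / w           # working response
        XtW = X.T * w                       # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

rng = np.random.default_rng(seed=5)
x = np.arange(11, dtype=float)               # hypothetical dose levels
trials = np.full(x.size, 200)
p_true = 1.0 / (1.0 + np.exp(-(-3.0 + 0.6 * x)))
successes = rng.binomial(trials, p_true)

b0, b1 = fit_logistic_grouped(x, successes, trials)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The weights `w` make the binomial variance np(1 - p) explicit. This variance depends on the mean, which is why an ordinary least-squares fit to proportions violates the constant-variance assumption.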
Some Residual Plots
[Figure: residual plot showing a systematic pattern]
Lack of linearity. The pattern indicates an incorrect model, probably due to a missing squared term.
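The sketch below shows how a missing squared term leaves a pattern in the residuals, and how adding it removes the pattern. It assumes Python with NumPy and hypothetical data with genuine curvature. The leftover curvature is measured as the correlation between the residuals and the centred squared term (x - x̄)².

```python
import numpy as np

rng = np.random.default_rng(seed=6)
x = np.linspace(0, 10, 100)
# Hypothetical data with genuine curvature (a squared term).
y = 1.0 + 0.5 * x - 0.08 * x ** 2 + rng.normal(scale=0.3, size=x.size)

def residuals_for_degree(x, y, deg):
    """Residuals from a least-squares polynomial fit of the given degree."""
    coefs = np.polyfit(x, y, deg=deg)
    return y - np.polyval(coefs, x)

res_lin = residuals_for_degree(x, y, 1)    # straight line: squared term missing
res_quad = residuals_for_degree(x, y, 2)   # includes the squared term

c2 = (x - x.mean()) ** 2
corr_lin = np.corrcoef(res_lin, c2)[0, 1]
corr_quad = np.corrcoef(res_quad, c2)[0, 1]
print(f"corr with squared term: linear fit {corr_lin:.2f}, quadratic fit {corr_quad:.2e}")
```

After the quadratic fit, the residuals are orthogonal to 1, x and x² by construction, so their correlation with (x - x̄)² is zero up to rounding.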
Some Residual Plots
[Figure: residual plot with one point lying far from the rest]
Presence of an outlier. Investigate whether there is a reason for this odd point.
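Potential outliers can be listed by applying the ±2 guideline for standardised residuals. The sketch assumes Python with NumPy and hypothetical data with one deliberately planted outlier (observation 12). Flagging a point only marks it for investigation. It is not a reason to delete it.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
x = np.arange(30, dtype=float)
y = 5.0 + 0.4 * x + rng.normal(scale=1.0, size=x.size)
y[12] += 10.0  # hypothetical planted outlier

b1, b0 = np.polyfit(x, y, deg=1)
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (x.size - 2))   # residual standard error
std_resid = resid / s

# Observations whose standardised residual lies outside [-2, 2].
flagged = np.flatnonzero(np.abs(std_resid) > 2)
print("observations to investigate:", flagged.tolist())
```

Even with no outliers, about 1 observation in 20 is expected to fall outside ±2. With larger samples, a few flagged points are therefore not alarming on their own.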
Consequences of assumption failure
Studies on the consequences of assumption failure have demonstrated that:
- tests and confidence intervals for means are relatively robust to small departures from normality;
- the effects of non-homogeneous variance can be large, but are less serious if sample sizes in the different sub-groups are equal;
- dependence between observations can badly affect F-tests.
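The second point can be checked with a small simulation, assuming Python with NumPy and SciPy. Both groups have true mean 0, so every rejection by the pooled-variance t-test (`equal_var=True`, which assumes homogeneous variance) is a false positive. The standard deviations, sample sizes, number of replicates and seed are hypothetical choices. With equal group sizes, the false-positive rate should stay near the nominal 5%. When the smaller group has the larger variance, it should be inflated well above 5%.

```python
import numpy as np
from scipy import stats

def type1_error_rate(n1, n2, sd1, sd2, reps=4000, alpha=0.05, seed=8):
    """Share of pooled-variance t-tests rejecting H0 when H0 is true
    (both groups have mean 0) but the group variances differ."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sd1, size=(reps, n1))
    b = rng.normal(0.0, sd2, size=(reps, n2))
    # One t-test per row; equal_var=True is the classical pooled t-test.
    p = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
    return np.mean(p < alpha)

equal_n = type1_error_rate(25, 25, sd1=3.0, sd2=1.0)
unequal_n = type1_error_rate(10, 40, sd1=3.0, sd2=1.0)
print(f"nominal 5%: equal sizes {equal_n:.3f}, unequal sizes {unequal_n:.3f}")
```

Setting `equal_var=False` gives Welch's t-test, which does not assume equal variances.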
Dealing with assumption failure
One approach is to find a transformation that will stabilise the variance. Some typical transformations are:
- taking logs (useful when there is skewness);
- square root transformation;
- reciprocal transformation.
Sometimes theoretical grounds will determine the transformation to use, e.g. when data are Poisson or binomial. However, in such cases, exact methods of analysis will be preferable.
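As an example of a transformation chosen on theoretical grounds, the variance of Poisson counts equals their mean. Taking square roots makes the variance roughly constant (close to 1/4 once the mean is not too small). The sketch assumes Python with NumPy and two hypothetical groups of simulated counts with means 4 and 100.

```python
import numpy as np

rng = np.random.default_rng(seed=9)
# Hypothetical Poisson counts at two mean levels; for Poisson data the
# variance equals the mean, so the raw spreads differ greatly.
low = rng.poisson(lam=4, size=5000)
high = rng.poisson(lam=100, size=5000)

raw_ratio = high.var(ddof=1) / low.var(ddof=1)                      # roughly 100 / 4 = 25
sqrt_ratio = np.sqrt(high).var(ddof=1) / np.sqrt(low).var(ddof=1)    # much closer to 1
print(f"variance ratio: raw {raw_ratio:.1f}, square root {sqrt_ratio:.2f}")
```

The stabilisation is only approximate for small means. This is one reason the slide prefers exact methods (e.g. a Poisson model) when the distribution is known.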
Dealing with non-independence
The assumption of independence is quite critical, and some attention to it is needed at the data collection stage. If observations are collected in time or space, plotting residuals in time (or space) order may reveal that successive observations are correlated. In that case, techniques similar to those used in time series analysis or in the analysis of repeated measurements data may be more appropriate.
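Alongside plotting residuals in time order, a common numerical summary (not mentioned on the slide, so named here as a swapped-in technique) is the Durbin-Watson statistic. It is computed as the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no lag-1 correlation; values well below 2 suggest positive correlation. The sketch assumes Python with NumPy. It compares hypothetical independent errors with hypothetical AR(1) errors (φ = 0.8), for which the statistic should be near 2(1 - 0.8) = 0.4.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic for residuals listed in time (or space) order."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def ols_residuals(t, y):
    """Residuals from a least-squares straight-line fit of y on t."""
    b1, b0 = np.polyfit(t, y, deg=1)
    return y - (b0 + b1 * t)

rng = np.random.default_rng(seed=10)
n = 300
t = np.arange(n, dtype=float)        # observation order (time)

indep = rng.normal(size=n)           # independent errors
ar1 = np.empty(n)                    # hypothetical AR(1) errors, phi = 0.8
ar1[0] = rng.normal()
for i in range(1, n):
    ar1[i] = 0.8 * ar1[i - 1] + rng.normal()

dw_indep = durbin_watson(ols_residuals(t, 1.0 + 0.05 * t + indep))
dw_ar1 = durbin_watson(ols_residuals(t, 1.0 + 0.05 * t + ar1))
print(f"DW independent errors: {dw_indep:.2f}; AR(1) errors: {dw_ar1:.2f}")
```

statsmodels provides the same statistic as `statsmodels.stats.stattools.durbin_watson`.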
An illustration – using paddy data
Histogram of standardised residuals after fitting a linear regression of yield on fertiliser. This is a check on the normality assumption.
A normal probability plot…
Another check on the normality assumption. Do you think the points follow a straight line?
Std. residuals versus fitted values
Checking the assumption of variance homogeneity, and identification of outliers: Do you judge this to be a random scatter? Are there any outliers?
Conclusion:
The residual plots showed no evidence of departures from the model assumptions. Because the assumptions appear reasonable, the significance test of the slope can be relied upon, and we may conclude that fertiliser does contribute significantly to explaining the variability in paddy yields.
Note: Always conduct a residual analysis after fitting a regression model. The same concepts carry over to more complex models that may be fitted.
Practical work follows to help ensure the learning objectives are achieved…