Transformations Why transform the data? Data transformation is commonly used to linearise the relationship between two numerical variables. If the relationship is non-linear, as revealed by a curved scatterplot, data transformation can be used to make the relationship linear.
Transformations For example, consider the following data which gives the marks achieved by a group of students plotted against the number of hours they spent studying.
Transformations How do data transformations work? Data transformation works by stretching out or compressing the scale of measurement. As a result, a non-linear relationship can be made linear.
Transformations We will consider the following commonly used transformations:
log X: compresses the X-scale
X²: expands the X-scale
1/X: compresses the X-scale
log Y: compresses the Y-scale
Y²: expands the Y-scale
1/Y: compresses the Y-scale
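The compressing and expanding effects listed above can be seen by looking at the gaps between evenly spaced values before and after transforming them. The following is a minimal sketch, not part of the original slides: the helper name `transform`, the sample values, and the choice of a base-10 logarithm are all assumptions made for illustration.

```python
import numpy as np

def transform(values, kind):
    """Apply one transformation to an array of positive values (hypothetical helper)."""
    v = np.asarray(values, dtype=float)
    if kind == "log":
        return np.log10(v)   # assumption: "log" means the base-10 logarithm
    if kind == "square":
        return v ** 2
    if kind == "reciprocal":
        return 1.0 / v
    raise ValueError(f"unknown transformation: {kind}")

# Hypothetical, evenly spaced values (every gap is 1 on the original scale).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Gaps between neighbouring values after each transformation.
log_gaps = np.diff(transform(x, "log"))
square_gaps = np.diff(transform(x, "square"))
# 1/X reverses the order of the values, so compare the sizes of the gaps.
reciprocal_gaps = np.abs(np.diff(transform(x, "reciprocal")))

print("log gaps:       ", log_gaps)         # gaps shrink: larger values are squeezed together
print("square gaps:    ", square_gaps)      # gaps grow: larger values are spread apart
print("reciprocal gaps:", reciprocal_gaps)  # gaps shrink: larger values are squeezed together
```

The same helper applies unchanged to Y-values, which gives the log Y, Y² and 1/Y transformations.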
Transformations How do we know which transformation to use? In practice we look at the scatterplot and decide whether a ‘stretching’ or ‘compressing’ transformation is needed. When we do this, we will see that (in theory at least) more than one transformation will do the job.
Transformations The relationship between mark and study hours can be linearised by either stretching the Y-axis or compressing the X-axis. Appropriate transformations are thus Y² (stretches the Y-axis), and log X and 1/X (both compress the X-axis).
Transformations Which is the best transformation in this case? We should try each of these suitable transformations. We should look at both the scatterplot and the residual plot for each, to evaluate the effect of the transformation. If the residual plot shows a clear pattern then the transformation has not been successful. If more than one residual plot is acceptable, then we should choose the transformation which results in the highest value of the coefficient of determination (r²).
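The procedure just described can be sketched in code. This is an illustration under stated assumptions, not the slides' own calculation: the `hours` and `mark` arrays are made-up values (generated from an exact reciprocal relationship so that the outcome is predictable), `fit_and_score` is a hypothetical helper, and the logarithm is assumed to be base 10. The residuals are returned so they can be plotted and inspected for a pattern; that visual check still has to be done by eye.

```python
import numpy as np

def fit_and_score(x, y):
    """Fit a least squares line y = intercept + slope * x.

    Returns the slope, intercept, residuals and r^2 (hypothetical helper).
    """
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 fit: [slope, intercept]
    residuals = y - (intercept + slope * x)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation
    return slope, intercept, residuals, 1.0 - ss_res / ss_tot

# Made-up data, NOT the data from the slides: marks level off as hours increase.
hours = np.arange(1.0, 11.0)
mark = 90.0 - 60.0 / hours

# The three candidate transformations, each as a (transformed X, transformed Y) pair.
candidates = {
    "Y^2": (hours, mark ** 2),
    "log X": (np.log10(hours), mark),   # assumption: base-10 logarithm
    "1/X": (1.0 / hours, mark),
}

fits = {name: fit_and_score(x, y) for name, (x, y) in candidates.items()}
r_squared = {name: fit[3] for name, fit in fits.items()}

for name, value in r_squared.items():
    print(f"{name}: r^2 = {value:.3f}")

# Among transformations whose residual plots look acceptable, prefer the highest r^2.
best = max(r_squared, key=r_squared.get)
print("highest r^2:", best)
```

Because the made-up data follow an exact reciprocal curve, the 1/X fit leaves essentially zero residuals here; real data will be noisier, and the residual plot for each candidate should be examined before comparing r² values.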
Transformations Try Y²: The Y²-transformation has improved the situation, but the residual plot shows a clear pattern. Here, r² = 0.799. [Figure: scatterplot and residual plot]
Transformations Try log X: The log X-transformation has proved more effective than the Y²-transformation, but the residual plot still shows some structure. Here, r² = 0.887. [Figure: scatterplot and residual plot]
Transformations Try 1/X: The 1/X-transformation appears to have linearised the relationship, and this is confirmed by the residual plot, which shows no apparent pattern. Here, r² = 0.936. [Figure: scatterplot and residual plot]
Transformations Which is the best transformation in this case? Here it is clear that the 1/X transformation is the most appropriate of the three. We can now fit a least squares line to the data, giving us the relationship: