Xuhua Xia: Data Transformation
Objectives:
– Understand why we often need to transform our data
– The three commonly used data transformation techniques
– Additive effects and multiplicative effects
– Application of data transformation in ANOVA and regression
Why Data Transformation? The assumptions of most parametric methods:
– Homoscedasticity
– Normality
– Additivity
– Linearity
Data transformation is used to make your data conform to the assumptions of the statistical methods. Illustrative examples follow.
Homoscedasticity and Normality: The data deviate from both homoscedasticity and normality.
Homoscedasticity and Normality: Wouldn't it be nice if we could make the data look this way?
Types of Data Transformation: the logarithmic transformation, the square-root transformation, and the arcsine transformation. Data transformation can be done conveniently in Excel. Alternatives: ranks and nonparametric methods.
Homoscedasticity: The two groups of data seem to differ greatly in means, but a t-test shows that the means do not differ significantly from each other, a surprising result. The two groups differ greatly in variance, and both deviate significantly from normality. These results invalidate the t-test. We calculate two ratios for each group: the Var/mean ratio and the Std/mean ratio (i.e., the coefficient of variation, C.V.).
            Group 1    Group 2
Var/mean     56.420    416.891
C.V.          1.230      1.230
The C.V. is the same in the two groups while the Var/mean ratio is not, so the standard deviation is proportional to the mean. This calls for the log-transformation.
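The two ratios can be computed in a few lines. The following is a minimal Python sketch (Python is used here instead of the Excel or SAS mentioned in these slides). The data are hypothetical, since the slide's observations are not reproduced in this transcript, and the function name `dispersion_ratios` is made up for illustration. Group 2 is simply Group 1 multiplied by 5, which keeps the C.V. unchanged while inflating the Var/mean ratio.

```python
import statistics

def dispersion_ratios(values):
    """Return (variance/mean ratio, coefficient of variation) for one group."""
    mean = statistics.mean(values)
    var = statistics.variance(values)   # sample variance (n - 1 denominator)
    sd = statistics.stdev(values)       # sample standard deviation
    return var / mean, sd / mean

# Hypothetical right-skewed data, NOT the data from the slide.
group1 = [1, 2, 2, 3, 5, 8, 13, 30]
group2 = [5 * x for x in group1]        # same shape, five times the scale

vm1, cv1 = dispersion_ratios(group1)
vm2, cv2 = dispersion_ratios(group2)

print(f"Group 1: Var/mean = {vm1:.3f}, C.V. = {cv1:.3f}")
print(f"Group 2: Var/mean = {vm2:.3f}, C.V. = {cv2:.3f}")
```

When the C.V. is similar across groups but the Var/mean ratio is not, the log-transformation is the usual candidate.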
Log-Transformed Data: NewX = ln(X+1). Values marked for the two groups: 1.31 and 2.13. The transformation is successful because:
– The variances are now similar
– The deviation from normality is now nonsignificant
– The t-test reveals a highly significant difference in means between the two groups
Log-Transformed Data: NewX = ln(X+1). Transform back: the mean on the original scale is exp(mean of NewX) − 1. Compare this mean with the original mean. Which one is preferable? Calculate the standard error, the degrees of freedom, and the 95% C.L. (t 0.025,16 = 2.120).
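The back-transformation and the confidence limits can be sketched as follows. This is a Python illustration under stated assumptions: the 17 observations are hypothetical (chosen so that the degrees of freedom equal 16), the function name `log_scale_summary` is made up, and the critical value 2.120 is the standard two-sided 95% t value for 16 degrees of freedom. The limits are computed on the log scale and only then converted back.

```python
import math
import statistics

def log_scale_summary(values, t_crit):
    """Mean and confidence limits computed on ln(X+1), then back-transformed."""
    logs = [math.log(x + 1) for x in values]
    n = len(logs)
    mean_log = statistics.mean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(n)   # standard error on the log scale
    lower_log = mean_log - t_crit * se_log
    upper_log = mean_log + t_crit * se_log

    def back(y):
        # Inverse of NewX = ln(X + 1)
        return math.exp(y) - 1

    return back(mean_log), back(lower_log), back(upper_log)

# Hypothetical right-skewed counts (n = 17, so df = 16); not the slide's data.
counts = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 14, 20, 31, 55, 120]
t_crit = 2.120  # two-sided 95% critical value for 16 degrees of freedom

bt_mean, bt_lower, bt_upper = log_scale_summary(counts, t_crit)
arith_mean = statistics.mean(counts)

print(f"Arithmetic mean:       {arith_mean:.2f}")
print(f"Back-transformed mean: {bt_mean:.2f}")
print(f"95% C.L.:              {bt_lower:.2f} to {bt_upper:.2f}")
```

The back-transformed mean is smaller than the arithmetic mean for skewed data, and the back-transformed limits are asymmetric around it.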
Normal but Heteroscedastic: Any transformation that you use to equalize the variances is likely to make the data deviate from normality. Fortunately, the t-test and ANOVA are quite robust for this kind of data. Of course, you can also use nonparametric tests.
Normal but Heteroscedastic: The two variances are significantly different. The t-test, however, detects a significant difference in means. You can analyse the data with nonparametric methods for comparison, and you are likely to find the t-test to be more powerful.
Additivity: What experimental design is this? Compare the group means. Is there an interaction effect? Additivity means that the difference between levels of one factor is consistent across the levels of another factor.
Multiplicative Effects: Compare the group means. Is there an interaction effect? Does this data set meet the assumption of additivity? When the assumption of additivity is not met, we have difficulty in interpreting main effects. Now calculate the ratios of the group means. What did you find?
Multiplicative Effects: For Factor A, Level 2 has a mean about 2.88 times as large as that for Level 1. For Factor B, Level 2 has a mean about 2.18 times as large as that for Level 1. If you know the value for Level 1 of Factor A, you can obtain the value for Level 2 of Factor A by multiplying the known value by 2.88. You can do the same for Factor B, multiplying by 2.18. We say that the effects of Factors A and B are multiplicative, not additive.
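Log-transformation: Now log-transform the data. Compare the means. Is the assumption of additivity met now?
Original data
                               Factor A, Level 1   Factor A, Level 2
Factor B, Level 1   Mean                  37.262             108.458
                    Variance            2102.351           17878.648
Factor B, Level 2   Mean                  82.403             234.508
                    Variance           12400.091           80241.944
Transformed data
                               Factor A, Level 1   Factor A, Level 2
Factor B, Level 1   Mean                   3.084               4.127
                    Variance               1.302               1.268
Factor B, Level 2   Mean                   3.778               4.803
                    Variance               1.235               1.385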
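The comparison asked for on this slide can be done directly from the cell means listed above. A minimal Python sketch follows. The dictionary layout and the labels "A1", "A2", "B1", "B2" are assumptions made for illustration; the assignment of values to factor levels is inferred from the ratios of about 2.88 and 2.18 quoted for Factors A and B.

```python
# Cell means from the slide; keys are (Factor A level, Factor B level).
original_mean = {
    ("A1", "B1"): 37.262, ("A2", "B1"): 108.458,
    ("A1", "B2"): 82.403, ("A2", "B2"): 234.508,
}
transformed_mean = {
    ("A1", "B1"): 3.084, ("A2", "B1"): 4.127,
    ("A1", "B2"): 3.778, ("A2", "B2"): 4.803,
}

def a_effect_as_difference(means, b_level):
    """Difference between Factor A levels at one level of Factor B."""
    return means[("A2", b_level)] - means[("A1", b_level)]

def a_effect_as_ratio(means, b_level):
    """Ratio between Factor A levels at one level of Factor B."""
    return means[("A2", b_level)] / means[("A1", b_level)]

raw_diff = [a_effect_as_difference(original_mean, b) for b in ("B1", "B2")]
raw_ratio = [a_effect_as_ratio(original_mean, b) for b in ("B1", "B2")]
log_diff = [a_effect_as_difference(transformed_mean, b) for b in ("B1", "B2")]

print("Original scale, A2 - A1 at B1 and B2:", [round(d, 3) for d in raw_diff])
print("Original scale, A2 / A1 at B1 and B2:", [round(r, 3) for r in raw_ratio])
print("Log scale,      A2 - A1 at B1 and B2:", [round(d, 3) for d in log_diff])
```

On the original scale the differences disagree strongly between the two levels of Factor B while the ratios agree; on the log scale the differences themselves agree, which is what additivity requires.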
Why can log-transformation change multiplicative effects into additive effects?
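One way to see the answer is to write the multiplicative model down and take logarithms. In the sketch below, the symbols are assumptions introduced for illustration and do not come from the slides: \mu_{ij} is the expected value in the cell for level i of Factor A and level j of Factor B, \mu is a baseline, and a_i and b_j are the multipliers for the two factors.

```latex
\begin{align}
\mu_{ij} &= \mu \, a_i \, b_j
  && \text{(multiplicative effects)} \\
\ln \mu_{ij} &= \ln \mu + \ln a_i + \ln b_j
  && \text{(additive on the log scale)} \\
\ln \mu_{2j} - \ln \mu_{1j} &= \ln a_2 - \ln a_1 = \ln \frac{a_2}{a_1}
  && \text{(the same for every level } j \text{)}
\end{align}
```

Because the logarithm of a product is the sum of the logarithms, a constant ratio between levels becomes a constant difference. A ratio of 2.88, for example, corresponds to a difference of ln 2.88, about 1.06, on the log scale.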
Square-Root Transformation: The two groups of data differ much in variance. Calculate two ratios: the Var/mean ratio and the Std/mean ratio (i.e., the coefficient of variation). Does your calculation suggest log-transformation? When is log-transformation appropriate? Use the square-root transformation when different groups have similar Var/mean ratios. Notice the means, which do not coincide with the most frequent observations.
Square-Root Transformation: Values marked for the two groups: 1.17 and 2.09. After the square-root transformation, the variance is almost identical between the two groups. Transform the means back to the original scale and compare them with the original means.
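The effect of the square-root transformation on the variances, and the back-transformation of the mean, can be sketched in Python as follows. The counts are hypothetical and were chosen so that both groups have a Var/mean ratio near 1. The form sqrt(X + 3/8) is an assumption borrowed from the SAS slide at the end of this deck; this slide does not state which form was used.

```python
import math
import statistics

def sqrt_transform(x):
    """Square-root transformation with the 3/8 correction, sqrt(X + 3/8)."""
    return math.sqrt(x + 3 / 8)

def sqrt_back_transform(y):
    """Inverse of sqrt(X + 3/8)."""
    return y ** 2 - 3 / 8

# Hypothetical count data with similar Var/mean ratios; not the slide's data.
group1 = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
group2 = [6, 8, 9, 10, 10, 11, 12, 13, 15, 17]

raw_var = [statistics.variance(g) for g in (group1, group2)]
new_groups = [[sqrt_transform(x) for x in g] for g in (group1, group2)]
new_var = [statistics.variance(g) for g in new_groups]

# Means are taken on the transformed scale and then converted back.
back_means = [sqrt_back_transform(statistics.mean(g)) for g in new_groups]
raw_means = [statistics.mean(g) for g in (group1, group2)]

print("Variance before:", [round(v, 3) for v in raw_var])
print("Variance after: ", [round(v, 3) for v in new_var])
print("Original means:        ", [round(m, 3) for m in raw_means])
print("Back-transformed means:", [round(m, 3) for m in back_means])
```

The back-transformed means are somewhat smaller than the original means, which is the comparison the slide asks you to make.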
Quiz on Data Transformation: The data set is right-skewed for each group. Calculate the Var/mean ratio and the C.V. for each group, and decide which transformation you should use. Do the transformation and convert the means back to the original scale.
With Multiple Groups: When you have multiple groups, a "Variance vs Mean" or a "Std vs Mean" plot can help you to decide which data transformation to use. The graph on the left shows that the Var/mean ratio is almost constant. What transformation should you use?
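The same decision can be approximated numerically instead of by eye. The Python sketch below is a rough heuristic, not a formal test: it checks which of the two ratios varies less across groups. The function names and the data are hypothetical.

```python
import statistics

def relative_spread(values):
    """Standard deviation of the values divided by their mean."""
    return statistics.stdev(values) / statistics.mean(values)

def suggest_transformation(groups):
    """Suggest 'square-root' if Var/mean is the more constant ratio, else 'log'."""
    means = [statistics.mean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    var_mean = [v / m for v, m in zip(variances, means)]
    cv = [v ** 0.5 / m for v, m in zip(variances, means)]
    if relative_spread(var_mean) < relative_spread(cv):
        return "square-root"
    return "log"

# Hypothetical counts whose variance grows roughly in proportion to the mean.
count_groups = [
    [0, 1, 1, 2, 2, 2, 3, 3, 4, 5],
    [2, 3, 4, 4, 5, 5, 6, 6, 7, 9],
    [6, 8, 9, 10, 10, 11, 12, 13, 15, 17],
]

# Hypothetical groups that differ only by a scale factor (constant C.V.).
base = [1, 2, 2, 3, 5, 8, 13, 30]
scaled_groups = [base, [5 * x for x in base], [20 * x for x in base]]

print(suggest_transformation(count_groups))
print(suggest_transformation(scaled_groups))
```

A plot remains the better tool when the groups are few or the ratios are noisy; the sketch only mirrors the rule that a constant Var/mean ratio points to the square-root transformation and a constant C.V. points to the log-transformation.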
Confidence Limits: Before transformation vs. after transformation. With the skewness in our data, do the confidence limits on the right make more sense? Why?
Arcsine Transformation: Used for proportions. Compare the variances before and after transformation. Do you know how to transform the means and C.L. back to the original scale?
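As a hint for the last question, here is a minimal Python sketch of the arcsine transformation and its inverse. The proportions are hypothetical and the function names are made up for illustration. The transformation is arcsin(sqrt(p)) in radians, and its inverse is sin(y) squared, which applies to confidence limits as well as to the mean as long as they stay between 0 and pi/2 on the transformed scale.

```python
import math
import statistics

def arcsine_transform(p):
    """Arcsine (angular) transformation of a proportion p in [0, 1], in radians."""
    return math.asin(math.sqrt(p))

def arcsine_back_transform(y):
    """Inverse of arcsine_transform, valid for y in [0, pi/2]."""
    return math.sin(y) ** 2

# Hypothetical proportions; not data from the slides.
proportions = [0.02, 0.05, 0.10, 0.12, 0.20, 0.35]

transformed = [arcsine_transform(p) for p in proportions]
mean_transformed = statistics.mean(transformed)
back_mean = arcsine_back_transform(mean_transformed)

print(f"Variance before: {statistics.variance(proportions):.4f}")
print(f"Variance after:  {statistics.variance(transformed):.4f}")
print(f"Mean of raw proportions: {statistics.mean(proportions):.4f}")
print(f"Back-transformed mean:   {back_mean:.4f}")
```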
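Data Transformation Using SAS:
    data Mydata;
      input x;
      /* The three assignments are alternatives: keep only the one that suits your data */
      newx = log(x);            /* natural logarithm transformation */
      newx = sqrt(x + 3/8);     /* square-root transformation */
      newx = arsin(sqrt(x));    /* arcsine transformation, for proportions */
    cards;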