# WINSORIZING: What is it and why could it be inappropriate?

Kyle Allen & Matthew Whitledge, May 7, 2013


## WHAT IS WINSORIZING?

- What it isn’t:
  - Trimming
  - Truncating
  - Any other method that completely removes observations from the data
- Term first used in 1960
  - John W. Tukey; W. J. Dixon
- “Numerical value of a wild observation is untrustworthy”
  - However, its direction of deviation is important
- Decreasing the magnitude of the deviation, retaining its direction
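The difference between removing and replacing extreme observations can be sketched in a few lines. Python is used here rather than SAS so the example is self-contained; the sample values and the cutoffs `low` and `high` are hypothetical and chosen only for illustration.

```python
def trim(values, low, high):
    """Drop every observation that falls outside [low, high]."""
    return [v for v in values if low <= v <= high]


def winsorize(values, low, high):
    """Pull every observation outside [low, high] back to the nearest cutoff."""
    return [min(max(v, low), high) for v in values]


# Hypothetical sample with one wild value on each side.
sample = [1, 12, 14, 15, 17, 90]
trimmed = trim(sample, low=12, high=17)
winsorized = winsorize(sample, low=12, high=17)

print(trimmed)     # the two wild observations are removed
print(winsorized)  # same sample size; wild values moved to the cutoffs
```

The winsorized list keeps all six observations, and the formerly wild values remain the smallest and largest in the sample, so their direction of deviation is retained while its magnitude shrinks.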

## WINSORIZING: AN EXAMPLE

- Order the observations by value
  - $X_{i1}, X_{i2}, \dots, X_{i100}$, where $i$ denotes the $i$-th regressor
- If winsorizing at 1% and 99%, then
  - The value for $X_{i1}$ will be replaced by the value for $X_{i2}$
  - The value for $X_{i100}$ will be replaced by the value for $X_{i99}$

Another example:

- $X_{i1}, X_{i2}, \dots, X_{i100}$
- Winsorize at 10% (5% from the bottom and 5% from the top)
- Beginning sample:
  - $X_{i1}, X_{i2}, X_{i3}, X_{i4}, X_{i5}, X_{i6}, \dots, X_{i95}, X_{i96}, X_{i97}, X_{i98}, X_{i99}, X_{i100}$
- Winsorized sample:
  - $X_{i5}, X_{i5}, X_{i5}, X_{i5}, X_{i5}, X_{i6}, \dots, X_{i95}, X_{i96}, X_{i96}, X_{i96}, X_{i96}, X_{i96}$

Winsorized at 5% and 95%:

| Obs. | Original | Winsorized |
|---|---|---|
| $X_{i1}$ | 0.2 | 6.3 |
| $X_{i2}$ | 0.9 | 6.3 |
| $X_{i3}$ | 3.5 | 6.3 |
| $X_{i4}$ | 4.8 | 6.3 |
| $X_{i5}$ | 6.3 | 6.3 |
| $X_{i6}$ | 7 | 7 |
| $X_{i7}$ | 7.1 | 7.1 |
| $X_{i8}$ | 7.2 | 7.2 |
| $X_{i9}$ to $X_{i92}$ | … | … |
| $X_{i93}$ | 82 | 82 |
| $X_{i94}$ | 83.2 | 83.2 |
| $X_{i95}$ | 83.5 | 83.5 |
| $X_{i96}$ | 98 | 98 |
| $X_{i97}$ | 112 | 98 |
| $X_{i98}$ | 114 | 98 |
| $X_{i99}$ | 3150 | 98 |
| $X_{i100}$ | 6572 | 98 |
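The replacement rule used in the table (everything below the 5th ordered value becomes the 5th value, everything above the 96th becomes the 96th) can be sketched in Python. This is a minimal illustration, not the presenters' code: the function name `winsorize_by_rank` and its parameter `k` are assumptions, and because the middle of the sample is elided in the slides, only the ten tail values shown in the table are used.

```python
def winsorize_by_rank(values, k):
    """Clamp values to the k-th smallest and k-th largest observations."""
    if k < 1 or 2 * k > len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values) / 2")
    ordered = sorted(values)
    low, high = ordered[k - 1], ordered[-k]  # k-th smallest, k-th largest
    return [min(max(v, low), high) for v in values]


# The five lowest and five highest originals from the table above.
tails = [0.2, 0.9, 3.5, 4.8, 6.3, 98, 112, 114, 3150, 6572]
print(winsorize_by_rank(tails, k=5))
```

With `k=5` the lower tail collapses to 6.3 and the upper tail to 98, matching the Winsorized column. Other percentile conventions exist, so library routines may pick slightly different cutoffs for the same nominal percentages.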

## WINSORIZING: ALTERNATIVES

- Are the observations really outliers?
  - Look at Cook’s D measure
- Transform the variables
  - Take the log or square root of the variable
  - This shouldn’t be done only to increase significance
- Median-based estimations
  - Quantile regression
  - Median absolute deviation
- Nonparametric methods
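As one illustration of the median-based alternatives, the median absolute deviation (MAD) can be computed with Python's standard library. This is a sketch under stated assumptions: the sample is hypothetical, and the helper name `median_absolute_deviation` is introduced here only for the example.

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the sample median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


# Hypothetical sample: seven ordinary values and one extreme value.
sample = [2, 3, 3, 4, 5, 5, 6, 250]

print("mean:  ", statistics.mean(sample))
print("stdev: ", statistics.stdev(sample))
print("median:", statistics.median(sample))
print("MAD:   ", median_absolute_deviation(sample))
```

The mean and standard deviation are dominated by the single extreme value, while the median and MAD barely move, which is why these estimators can be used without altering any observation.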

## WINSORIZING: A SAS EXAMPLE

Lift Index Data

- Workers perform lifting tasks
- Each lift has an amount of stress associated with it
- Measuring the number of days an employee missed based on the lift they were performing
- 206 observations

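## WINSORIZING: SAS CODE

```sas
/* Plot the original (dayslost) and winsorized (dayslost1) outcomes against alr. */
/* The step that builds isqsdata.lilesmerge and dayslost1 is not shown in the slides. */
proc sgplot data=isqsdata.lilesmerge;
  scatter y=dayslost x=alr;
  scatter y=dayslost1 x=alr;
run;

/* Replace the days lost for subjects 6 and 35 with 27. */
data isqsdata.lileswin;
  set isqsdata.lileswin;
  if subject = 6 then dayslost = 27;
  if subject = 35 then dayslost = 27;
run;

/* Model censored below at 0, fitted to the original data. */
proc qlim data=isqsdata.liles;
  model dayslost = alr;
  endogenous dayslost ~ censored(lb=0);
run;

/* Same model fitted to the winsorized outcome. */
proc qlim data=isqsdata.lileswin;
  model dayslost1 = alr;
  endogenous dayslost1 ~ censored(lb=0);
run;
```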

## PROC QLIM (NON-WINSORIZED)

[SAS output not included in the transcript]

## PROC QLIM (WINSORIZED)

[SAS output not included in the transcript]

## WINSORIZING: IMPLICATIONS

- May impact significance
  - The standard errors will decrease
  - Depending on how symmetric the data are, the mean may increase or decrease
    - For example, if there is an extreme positive outlier, winsorizing it will decrease the mean
  - Whether significance rises or falls is determined by the proportionate change in the estimated coefficient relative to the change in the standard error
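These effects can be seen on a small sample. The sketch below is in Python rather than SAS and uses a sample mean in place of a regression coefficient to keep it short; the data are hypothetical, and the winsorized list was produced by hand by replacing the lowest and highest values with their nearest neighbors.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical sample with one extreme positive value.
original = [3, 4, 5, 6, 7, 8, 9, 100]
# Same sample with the lowest and highest values replaced by their neighbors.
winsorized = [4, 4, 5, 6, 7, 8, 9, 9]

for label, data in (("original", original), ("winsorized", winsorized)):
    mean = statistics.mean(data)
    se = standard_error(data)
    print(f"{label:>10}: mean={mean:.2f}  se={se:.2f}  mean/se={mean / se:.2f}")
```

Here the mean falls because the outlier was positive, but the standard error falls proportionately more, so the ratio of estimate to standard error rises. With a different sample the balance could go the other way.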

## WINSORIZING: WHY COULD IT BE INAPPROPRIATE?

- May be appropriate for
  - Ratios
    - Book to Market
    - Other measures in which the denominator can be extremely small
- Never winsorize valid observations
  - Investment Returns
  - R&D expenditures
  - Truly exceptional observations
    - Large number of biological elements
    - Extremely low stress tolerances for mechanical implements
- Model should produce data we could actually see
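A short arithmetic sketch shows why ratios with very small denominators are singled out. The firms and their book and market values below are hypothetical, in Python, and serve only to illustrate the point.

```python
# Hypothetical firms: (book value, market value) in the same currency units.
firms = {
    "A": (50.0, 100.0),
    "B": (40.0, 80.0),
    "C": (60.0, 120.0),
    "D": (55.0, 0.5),  # near-zero denominator
}

book_to_market = {name: book / market for name, (book, market) in firms.items()}

for name, ratio in book_to_market.items():
    print(f"firm {name}: book-to-market = {ratio:.1f}")
```

Firm D's book value is in line with the others, yet its ratio is more than two hundred times larger purely because of the tiny denominator. An extreme value of that kind is an artifact of how the measure is constructed, unlike a valid extreme investment return or R&D expenditure.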

## WINSORIZING: BIBLIOGRAPHY

- Brillinger, David R. “John W. Tukey: His Life and Professional Contributions.” The Annals of Statistics 30 (2002): 1535-75.
- Dixon, W. J. “Simplified Estimation from Censored Normal Samples.” The Annals of Mathematical Statistics 31 (1960): 385-91.
- Kafadar, Karen. “John Tukey and Robustness.” Proceedings of the Annual Meeting of the American Statistical Association, 2001.
- Kruskal, William, Thomas Ferguson, John W. Tukey, E. J. Gumbel, and F. J. Anscombe. “Discussion of the Papers of Messrs. Anscombe and Daniel.” Technometrics 2 (1960): 157-66.
- Tukey, John W., and Donald H. McLaughlin. “Less Vulnerable Confidence and Significance Procedures for Location Based on a Single Sample: Trimming/Winsorization 1.” Sankhyā: The Indian Journal of Statistics, Series A 25 (1963): 331-52.
- Westfall, Peter H., and Kevin S. S. Henning. Understanding Advanced Statistical Methods. Boca Raton, FL: CRC Press, 2013.
