Today: March 7 Data Transformations Rank Tests for Non-Normal data Solutions for Assignment 4.


1 Today: March 7
–Data Transformations
–Rank Tests for Non-Normal Data
–Solutions for Assignment 4

2 Transformations
ANOVA and regression share common assumptions: normality, constant variance, and a linear relationship.
What if these aren't true?
–One method: transform your data to help meet the necessary assumptions (choose a different scale of measurement)

3 Transformations
Common transformations: log e, log 10, square root, inverse
Steps:
–Choose your transformation
–Re-check assumptions (residual plot)
–Perform inference on transformed data
Example of an inverse transformation: miles per hour to hours per mile
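The inverse transformation mentioned on this slide can be sketched in a few lines. The sketch is in Python rather than SAS, and the speeds are hypothetical values chosen for illustration; they do not come from the TOMHS data.

```python
# Hypothetical speeds in miles per hour (illustrative only, not from the slides).
speeds_mph = [20.0, 40.0, 60.0]

# Inverse transformation: miles per hour -> hours per mile.
hours_per_mile = [1.0 / s for s in speeds_mph]

# Same quantity expressed in minutes per mile, which is easier to read.
minutes_per_mile = [60.0 * h for h in hours_per_mile]

# Equal 20 mph steps on the original scale are unequal steps on the inverse scale.
steps = [a - b for a, b in zip(minutes_per_mile, minutes_per_mile[1:])]

print(minutes_per_mile)
print(steps)
```

The point is only that the inverse is a change of measurement scale: the same observations are spaced differently after the transformation.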

4 PROC CONTENTS OUTPUT
The CONTENTS Procedure
    Data Set Name:  TOMHS.BPSTUDY                         Observations:          902
    Member Type:    DATA                                  Variables:             16
    Engine:         V8                                    Indexes:               0
    Created:        9:07 Saturday, February 26, 2005      Observation Length:    128
    Last Modified:  9:07 Saturday, February 26, 2005      Deleted Observations:  0
    -----Alphabetic List of Variables and Attributes-----
     #    Variable    Type    Len    Pos
    ------------------------------------------
     3    AGE         Num       8     16
     6    CHOL12      Num       8     40
     2    GROUP       Num       8      8
     8    HDL12       Num       8     56
     9    PULSE12     Num       8     64
    10    PULSEBL     Num       8     72
     4    SBP12       Num       8     24
     5    SBPBL       Num       8     32
     1    SEX         Num       8      0
     7    TRIG12      Num       8     48
    11    WT12        Num       8     80
    12    WTBL        Num       8     88
    13    cholbl      Num       8     96
    14    hdlbl       Num       8    104
    16    id          Char      6    120
    15    trigbl      Num       8    112
Triglyceride distributions are typically skewed

5 The UNIVARIATE Procedure
Variable: TRIG12
[Histogram and boxplot of TRIG12; vertical axis from 30 to 530, with the boxplot flagging outlying values in the upper tail]

6 The UNIVARIATE Procedure
Variable: TRIG12
[Normal probability plot of TRIG12; vertical axis from 30 to 530, horizontal axis normal quantiles from -2 to +2]

7 The UNIVARIATE Procedure
Variable: TRIG12
    Moments
    N                      472           Sum Weights            472
    Mean                   110.283898    Sum Observations       52054
    Std Deviation          64.9410309    Variance               4217.33749
    Skewness               2.2308045     Kurtosis               8.06124788
    Uncorrected SS         7727084       Corrected SS           1986365.96
    Coeff Variation        58.885324     Std Error Mean         2.98915323
    Basic Statistical Measures
    Location                      Variability
    Mean      110.2839            Std Deviation          64.94103
    Median     94.0000            Variance               4217
    Mode       83.0000            Range                  511.00000
                                  Interquartile Range    68.5000
    Tests for Normality
    Test                    --Statistic---       -----p Value------
    Shapiro-Wilk            W     0.823599       Pr < W     <0.0001
    Kolmogorov-Smirnov      D     0.13479        Pr > D     <0.0100
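As a quick check of how the derived rows in the Moments table relate to N, the mean, and the standard deviation, the following sketch recomputes them. It is written in Python rather than SAS, and the inputs are copied from the output above; the formulas are the standard definitions, not something stated on the slide.

```python
import math

# Values copied from the PROC UNIVARIATE output for TRIG12.
n = 472
mean = 110.283898
sd = 64.9410309

# Standard definitions of the derived quantities.
variance = sd ** 2                      # reported as 4217.33749
std_error_mean = sd / math.sqrt(n)      # reported as 2.98915323
coeff_variation = 100 * sd / mean       # reported as 58.885324
sum_observations = n * mean             # reported as 52054

print(variance, std_error_mean, coeff_variation, sum_observations)
```

The mean (110.3) sitting well above the median (94.0) is consistent with the large positive skewness in the table.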

8 Taking LOG Transformation – Base 10
        X      log 10 X
       10         1
      100         2
     1000         3
    10000         4
Takes small values of X and spreads them out, and takes large values of X and brings them closer together.
    DATA temp;
      SET tomhs.bpstudy;
      logtrig12  = log10(trig12);
      logtrig12x = log(trig12);   * Natural log;
    RUN;
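The table on this slide can be reproduced with a short sketch, in Python rather than SAS. It also shows the constant factor linking the two logs computed in the DATA step; that relationship is a standard identity and is not stated on the slide.

```python
import math

xs = [10, 100, 1000, 10000]

# Base-10 logs, as in the slide's table.
log10_xs = [math.log10(x) for x in xs]

# Gaps between neighbouring values on each scale.
gaps_raw = [b - a for a, b in zip(xs, xs[1:])]
gaps_log = [b - a for a, b in zip(log10_xs, log10_xs[1:])]

# Natural log is base-10 log times a constant, ln(10).
ln_over_log10 = [math.log(x) / math.log10(x) for x in xs]

print(log10_xs)
print(gaps_raw, gaps_log)
print(ln_over_log10, math.log(10))
```

Gaps of 90, 900, and 9000 on the raw scale all become gaps of 1 on the log scale, which is what pulls in the long upper tail.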

9 The UNIVARIATE Procedure
Variable: logtrig12
[Histogram and boxplot of logtrig12; vertical axis from 1.35 to 2.75; * may represent up to 2 counts]

10 The UNIVARIATE Procedure
Variable: logtrig12
[Normal probability plot of logtrig12; vertical axis from 1.35 to 2.75, horizontal axis normal quantiles from -2 to +2]

11 The UNIVARIATE Procedure
Variable: logtrig12
    Moments
    N                      472           Sum Weights            472
    Mean                   1.98272777    Sum Observations       935.847505
    Std Deviation          0.22379335    Variance               0.05008346
    Skewness               0.20941868    Kurtosis               0.17927124
    Uncorrected SS         1879.12014    Corrected SS           23.5893114
    Coeff Variation        11.2871446    Std Error Mean         0.01030092
    Basic Statistical Measures
    Location                      Variability
    Mean      1.982728            Std Deviation          0.22379
    Median    1.973128            Variance               0.05008
    Mode      1.919078            Range                  1.34814
                                  Interquartile Range    0.30586
    Tests for Normality
    Test                    --Statistic---       -----p Value------
    Shapiro-Wilk            W     0.996164       Pr < W      0.3132
    Kolmogorov-Smirnov      D     0.034423       Pr > D     >0.1500
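One way to read the log-scale mean is to back-transform it to the original units. The sketch below does this in Python rather than SAS, using values copied from the two UNIVARIATE outputs. Calling the back-transformed value the geometric mean is a standard interpretation added here, not something stated on the slides.

```python
# Values copied from the PROC UNIVARIATE outputs.
mean_log10 = 1.98272777   # mean of logtrig12
median_log10 = 1.973128   # median of logtrig12
mean_raw = 110.283898     # mean of TRIG12
median_raw = 94.0         # median of TRIG12

# Back-transforming the mean of the logs gives the geometric mean.
geometric_mean = 10 ** mean_log10

# Back-transforming the median of the logs recovers the raw median,
# because log10 preserves the ordering of the observations.
back_median = 10 ** median_log10

print(geometric_mean, back_median)
```

The geometric mean (about 96) lies much closer to the median than the arithmetic mean of 110.3 does, which is another way of seeing that the log scale is less affected by the long upper tail.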

12 TOMHS Study
6 Treatment groups (Variable GROUP)
–Beta-blocker
–Calcium channel blocker
–Diuretic
–Alpha-blocker
–ACE inhibitor
–Placebo
All treatment groups were given a lifestyle intervention to lower BP

13 TOMHS Triglyceride Analyses
3 Treatment groups (Variable GROUP)
–Beta-blocker
–Diuretic
–Placebo
Beta-blockers may increase triglycerides

14
    LIBNAME tomhs 'C:\my documents\ph5415\';
    DATA temp;
      SET tomhs.bpstudy;
      logtrig12  = log10(trig12);
      logtrig12x = log(trig12);
      if group in(1,3,6);   * Select only groups 1, 3, and 6;
    RUN;

15
    PROC GLM;
      CLASS group;
      MODEL trig12 logtrig12 logtrig12x = group;
    Dependent Variable: TRIG12
                                 Sum of
    Source             DF        Squares     Mean Square    F Value    Pr > F
    Model               2      31373.955       15686.978       3.76    0.0239
    Error             469    1954992.003        4168.426
    Corrected Total   471    1986365.958
    Dependent Variable: logtrig12 (Analyses Using LOG Scale – Base 10)
                                 Sum of
    Source             DF        Squares     Mean Square    F Value    Pr > F
    Model               2     0.35612380      0.17806190       3.59    0.0282
    Error             469    23.23318762      0.04953771
    Corrected Total   471    23.58931142
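The columns of each ANOVA table are tied together by simple arithmetic, which the sketch below re-checks in Python rather than SAS. All inputs are copied from the PROC GLM output; the helper `anova_f` is a hypothetical name introduced for this illustration.

```python
def anova_f(ss_model, df_model, ss_error, df_error):
    """Return (mean square model, mean square error, F) for a one-way ANOVA table."""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return ms_model, ms_error, ms_model / ms_error

# Raw scale (TRIG12).
ms_model_raw, ms_error_raw, f_raw = anova_f(31373.955, 2, 1954992.003, 469)

# Log base 10 scale (logtrig12).
ms_model_log, ms_error_log, f_log = anova_f(0.35612380, 2, 23.23318762, 469)

# Model and error sums of squares add up to the corrected total.
total_raw = 31373.955 + 1954992.003
total_log = 0.35612380 + 23.23318762

print(round(f_raw, 2), round(f_log, 2))
print(total_raw, total_log)
```

Both scales lead to the same conclusion at the 5% level here, with the F statistic slightly smaller on the log scale.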

16
    PROC GLM;
      CLASS group;
      MODEL trig12 logtrig12 logtrig12x = group;
    Dependent Variable: logtrig12 (Analyses Using LOG Scale – Base 10)
                                 Sum of
    Source             DF        Squares     Mean Square    F Value    Pr > F
    Model               2     0.35612380      0.17806190       3.59    0.0282
    Error             469    23.23318762      0.04953771
    Corrected Total   471    23.58931142
    Dependent Variable: logtrig12x (Analyses Using LOG Scale – Base e)
                                 Sum of
    Source             DF        Squares     Mean Square    F Value    Pr > F
    Model               2      1.8881321       0.9440660       3.59    0.0282
    Error             469    123.1799936       0.2626439
    Corrected Total   471    125.068125
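The two log-scale tables give the same F value and p-value because the natural log is the base-10 log multiplied by the constant ln(10), so every sum of squares is multiplied by ln(10) squared and the ratio is unchanged. The slide shows the matching results but does not spell out this reason; the sketch below, in Python rather than SAS, checks it against the printed numbers.

```python
import math

# Sums of squares copied from the base-10 table.
ss_model_log10 = 0.35612380
ss_error_log10 = 23.23318762

# ln(x) = ln(10) * log10(x), so sums of squares scale by ln(10) squared.
scale = math.log(10) ** 2

ss_model_ln = ss_model_log10 * scale   # reported as 1.8881321
ss_error_ln = ss_error_log10 * scale   # reported as 123.1799936

# The F ratio is unaffected by the common scale factor.
f_log10 = (ss_model_log10 / 2) / (ss_error_log10 / 469)
f_ln = (ss_model_ln / 2) / (ss_error_ln / 469)

print(ss_model_ln, ss_error_ln)
print(f_log10, f_ln)
```

In practice this means the choice of log base affects only the units of the estimates, not the tests.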

17
    PROC GLM;
      CLASS group;
      MODEL trig12 logtrig12 logtrig12x = group;
      MEANS group;
      ESTIMATE 'BB vs Diur' group 1 -1 0;
      ESTIMATE 'BB vs Plac' group 1 0 -1;
    The GLM Procedure
    Level of         ----------TRIG12---------    --------logtrig12--------    --------logtrig12x-------
    GROUP      N     Mean          Std Dev        Mean          Std Dev        Mean          Std Dev
    1        125     121.800000    73.6913791     2.02229444    0.23006400     4.65650504    0.52974193
    3        125     112.856000    72.2005165     1.98992986    0.22306638     4.58198284    0.51362933
    6        222     102.351351    53.6124266     1.95639400    0.21796954     4.50476365    0.50189340
Note: SDs are much closer between groups on the log scale
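The note about the standard deviations can be made concrete by comparing the largest and smallest group SD on each scale. This sketch is in Python rather than SAS and uses the values from the MEANS output; using the max-to-min ratio as the summary is a choice made for this illustration.

```python
# Group standard deviations copied from the MEANS output (groups 1, 3, 6).
sd_raw = {1: 73.6913791, 3: 72.2005165, 6: 53.6124266}
sd_log10 = {1: 0.23006400, 3: 0.22306638, 6: 0.21796954}

def max_min_ratio(sds):
    """Ratio of the largest to the smallest group standard deviation."""
    return max(sds.values()) / min(sds.values())

ratio_raw = max_min_ratio(sd_raw)
ratio_log = max_min_ratio(sd_log10)

print(round(ratio_raw, 2), round(ratio_log, 2))
```

On the raw scale the largest SD is about 37% bigger than the smallest; on the log scale the difference is under 6%, which is closer to the constant-variance assumption of ANOVA.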

18
    PROC GLM;
      CLASS group;
      MODEL trig12 logtrig12 logtrig12x = group;
      MEANS group;
      ESTIMATE 'BB vs Diur' group 1 -1 0;
      ESTIMATE 'BB vs Plac' group 1 0 -1;
    Dependent Variable: TRIG12
                                     Standard
    Parameter        Estimate           Error    t Value    Pr > |t|
    BB vs Diur      8.9440000      8.16668985       1.10      0.2740
    BB vs Plac     19.4486486      7.21970271       2.69      0.0073
    Dependent Variable: logtrig12
                                     Standard
    Parameter        Estimate           Error    t Value    Pr > |t|
    BB vs Diur     0.03236458      0.02815321       1.15      0.2509
    BB vs Plac     0.06590045      0.02488864       2.65      0.0084
    Dependent Variable: logtrig12x
                                     Standard
    Parameter        Estimate           Error    t Value    Pr > |t|
    BB vs Diur     0.07452220      0.06482517       1.15      0.2509
    BB vs Plac     0.15174139      0.05730822       2.65      0.0084
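The ESTIMATE rows for TRIG12 can be recovered from the group means, the group sizes, and the error mean square of the one-way ANOVA (4168.426 on 469 degrees of freedom). The sketch below does this in Python rather than SAS. The standard-error formula is the usual one for a difference of two group means under a pooled variance; the slide does not show it, so treat it as an assumption that is checked here against the printed output. The group coding (1 = beta-blocker, 3 = diuretic, 6 = placebo) is inferred from the ESTIMATE labels.

```python
import math

# Group means and sizes for TRIG12, copied from the MEANS output.
means = {1: 121.800000, 3: 112.856000, 6: 102.351351}
sizes = {1: 125, 3: 125, 6: 222}

# Error mean square for TRIG12 from the ANOVA table.
mse = 4168.426

def contrast(a, b):
    """Difference in means between groups a and b, its standard error, and t."""
    estimate = means[a] - means[b]
    std_error = math.sqrt(mse * (1 / sizes[a] + 1 / sizes[b]))
    return estimate, std_error, estimate / std_error

bb_vs_diur = contrast(1, 3)   # reported: 8.9440000, 8.16668985, 1.10
bb_vs_plac = contrast(1, 6)   # reported: 19.4486486, 7.21970271, 2.69

print(bb_vs_diur)
print(bb_vs_plac)
```

The placebo comparison has the smaller standard error because the placebo group is larger (222 versus 125).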

19 Interpretation of Differences Using Natural Log Scale (Base e)
                                     Standard
    Parameter        Estimate           Error    t Value    Pr > |t|
    BB vs Diur     0.07452220      0.06482517       1.15      0.2509
    BB vs Plac     0.15174139      0.05730822       2.65      0.0084
–0.074 indicates that BB increases triglycerides by approximately 7.45% compared to diuretic
  More precise estimate is 100*(exp(0.074) – 1) = 7.7%
–0.152 indicates that BB increases triglycerides by approximately 15.2% compared to placebo
  More precise estimate is 100*(exp(0.152) – 1) = 16.4%
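The percent-change arithmetic on this slide can be reproduced directly. The sketch is in Python rather than SAS and uses the two natural-log estimates from the table; the helper name `percent_change` is hypothetical.

```python
import math

# Differences in mean natural-log triglycerides, copied from the ESTIMATE output.
estimates = {"BB vs Diur": 0.07452220, "BB vs Plac": 0.15174139}

def percent_change(diff_ln):
    """Convert a difference on the natural-log scale to a percent change."""
    return 100 * (math.exp(diff_ln) - 1)

for label, diff in estimates.items():
    approx = 100 * diff              # quick reading: the estimate times 100
    exact = percent_change(diff)     # back-transformed value
    print(label, round(approx, 2), round(exact, 1))
```

The quick reading works because exp(d) – 1 is close to d when d is small; the gap widens as the estimate grows, which is why 15.2% becomes 16.4% while 7.45% only moves to 7.7%.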

20 USING WILCOXON RANK TEST
Each observation is given a score (rank) from 1 to n. Analysis is done on these ranked values.
    PROC NPAR1WAY WILCOXON;
      CLASS group;
      VAR trig12;
    RUN;
    The NPAR1WAY Procedure
    Wilcoxon Scores (Rank Sums) for Variable TRIG12
    Classified by Variable GROUP
                        Sum of      Expected       Std Dev          Mean
    GROUP      N        Scores      Under H0      Under H0         Score
    ---------------------------------------------------------------------
    3        125      29614.50      29562.50    1307.50251    236.916000
    6        222      49734.00      52503.00    1479.00376    224.027027
    1        125      32279.50      29562.50    1307.50251    258.236000
    Average scores were used for ties.
    Kruskal-Wallis Test
    Chi-Square           5.0323
    DF                        2
    Pr > Chi-Square      0.0808
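The rank-sum table and the Kruskal-Wallis statistic can be re-derived from the printed rank sums. The sketch below is in Python rather than SAS and uses the textbook Kruskal-Wallis formula without a tie correction, so it should land very slightly below the SAS value, which is adjusted for ties. The formulas are standard but are not shown on the slide.

```python
import math

# Group sizes and rank sums copied from the NPAR1WAY output.
sizes = {3: 125, 6: 222, 1: 125}
rank_sums = {3: 29614.50, 6: 49734.00, 1: 32279.50}

n_total = sum(sizes.values())   # 472 observations in the three groups

# Under H0 each group's expected rank sum is n_i * (N + 1) / 2.
expected = {g: n * (n_total + 1) / 2 for g, n in sizes.items()}

# Kruskal-Wallis H, textbook form without the tie correction.
h_stat = (12 / (n_total * (n_total + 1))
          * sum(rank_sums[g] ** 2 / sizes[g] for g in sizes)
          - 3 * (n_total + 1))

# With 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-5.0323 / 2)   # uses the tie-corrected value reported by SAS

print(expected)
print(round(h_stat, 3), round(p_value, 4))
```

The beta-blocker group (group 1) has a rank sum well above its expected value and the placebo group (group 6) well below, which matches the direction seen in the ANOVA, although the rank test does not reach the 5% level here.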

