Today: Feb 28 Reading Data from existing SAS dataset One-way ANOVA

1 Today: Feb 28
Reading data from an existing SAS dataset
One-way ANOVA
Reading: Le 7:5; C&S 7:A-H

2 Reading SAS Datasets
Sometimes your “raw” data is already a SAS dataset.

    LIBNAME tomhs 'c:/my documents/ph5415/';
    PROC CONTENTS DATA=tomhs.bpstudy;
    PROC PRINT DATA=tomhs.bpstudy (obs=10);
    RUN;

The LIBNAME statement tells SAS which directory (folder) the dataset is in.
DATA=tomhs.bpstudy tells SAS to look for a SAS dataset called bpstudy in the directory referenced by tomhs.

3 PROC CONTENTS OUTPUT
The CONTENTS Procedure

Data Set Name:  TOMHS.BPSTUDY                  Observations:
Member Type:    DATA                           Variables:
Engine:         V                              Indexes:
Created:        :07 Saturday, February 26,     Observation Length:    128
Last Modified:  9:07 Saturday, February 26,    Deleted Observations:  0

-----Alphabetic List of Variables and Attributes-----

 #   Variable   Type   Len   Pos
 3   AGE        Num
 6   CHOL       Num
 2   GROUP      Num
 8   HDL        Num
 9   PULSE      Num
10   PULSEBL    Num
 4   SBP        Num
 5   SBPBL      Num
 1   SEX        Num
 7   TRIG       Num
11   WT         Num
12   WTBL       Num
13   cholbl     Num
14   hdlbl      Num
16   id         Char
15   trigbl     Num

4 PROC PRINT – 10 Observations
[Listing of the first 10 observations of tomhs.bpstudy. Columns: OBS, SEX, GROUP, AGE, SBP, SBPBL, CHOL, TRIG, HDL, PULSE, PULSEBL, WT, WTBL, cholbl, hdlbl, trigbl, id. The id values are A00001, A00010, A00021, A00023, A00056, A00075, A00083, A00105, A00133, A00143.]

5 Reading a SAS Dataset

    DATA temp;
      SET tomhs.bpstudy;
      sbpdif = sbp - sbpbl;
    RUN;
    PROC MEANS DATA=temp;
    RUN;

The SET statement reads in an observation. It replaces the INFILE and INPUT statements used when reading in text data.

The MEANS Procedure

Variable   N   Mean   Std Dev   Minimum   Maximum
SEX
GROUP
AGE
SBP
SBPBL
CHOL
TRIG
HDL
PULSE
PULSEBL
WT
WTBL
cholbl
hdlbl
trigbl
sbpdif

6 One-Way Analysis of Variance
Two-sample t-test: compares the means of two groups. Are the means different?
What if we have more than two groups?
Examples:
  compare three different behavioral interventions
  compare 5 different BP drugs

7 Analysis of Variance
Could compare all pairs of means with t-tests:
  three groups: A-B, B-C, A-C
  five groups: A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E
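The pair counts above follow the pattern k(k-1)/2. A minimal sketch that enumerates the pairs, written in Python rather than SAS purely to show the counting (the function name `pairwise_comparisons` is made up for this illustration):

```python
from itertools import combinations


def pairwise_comparisons(labels):
    """Return every unordered pair of group labels."""
    return list(combinations(labels, 2))


three = pairwise_comparisons("ABC")
five = pairwise_comparisons("ABCDE")

# Print the number of pairwise t-tests and the pairs themselves.
print(len(three), ["-".join(pair) for pair in three])
print(len(five), ["-".join(pair) for pair in five])
```

With the six TOMHS treatment groups the same function would list 15 pairs.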

8 Analysis of Variance
Problem: multiple comparisons!
When performing many tests, you may reject a true null hypothesis by chance (Type I error).
With α = 0.05, you allow for the possibility of rejecting 1 out of 20 true null hypotheses by chance.
Even if all group means are equal, there is a fairly large chance that at least one pair will test as different.
Think of the difference between the largest and smallest sample means. It tends to get bigger as the number of groups gets larger.
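To get a feel for how quickly the chance of a false rejection grows, the sketch below computes 1 - (1 - α)^m for m tests. This formula assumes the m tests are independent, which pairwise comparisons that share groups are not, so treat the numbers as a rough guide rather than exact error rates. It is written in Python only to show the arithmetic, and `familywise_error_rate` is a hypothetical helper name.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false rejection in m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


for k in (2, 3, 5, 6):
    m = k * (k - 1) // 2  # number of pairwise comparisons among k groups
    rate = familywise_error_rate(0.05, m)
    print(f"k={k} groups, m={m} tests, approximate chance of a false rejection: {rate:.3f}")
```

Under this approximation, ten comparisons among five groups already carry roughly a 40% chance of at least one false rejection.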

9 Analysis of Variance
ANOVA simultaneously tests for a difference in k means.
Y: continuous
k samples from k normal distributions
  each of size $n_i$, not necessarily equal
  each with a possibly different mean
  each with constant variance $\sigma^2$

10 Constant variance
ANOVA is robust to violations of constant variance (and normality).
Rule of thumb: if the largest standard deviation is less than twice the smallest standard deviation, you’re OK.
Can sometimes transform the data to achieve equal variance or normality.
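The rule of thumb can be checked mechanically from the group standard deviations. A minimal sketch in Python (used here only for illustration, not SAS); the data values and the names `sd_ratio` and `variance_rule_ok` are invented for the example:

```python
from statistics import stdev


def sd_ratio(groups):
    """Ratio of the largest to the smallest sample standard deviation."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)


def variance_rule_ok(groups):
    """True when the largest SD is less than twice the smallest SD."""
    return sd_ratio(groups) < 2.0


# Hypothetical toy data, not taken from the TOMHS study.
similar_spread = [[1.0, 2.0, 3.0], [1.0, 2.5, 4.0]]
different_spread = [[1.0, 2.0, 3.0], [3.0, 6.0, 9.0]]

print(sd_ratio(similar_spread), variance_rule_ok(similar_spread))
print(sd_ratio(different_spread), variance_rule_ok(different_spread))
```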

11 Analysis of Variance
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$
$H_a$: not all $\mu_i$ equal
For each group i:
  $n_i$ = number of observations
  $\bar{y}_i$ = sample mean
  $s_i^2$ = sample variance
$\bar{y}$ = overall mean
Two-sample t-test is a special case: k = 2
Sometimes referred to as a global or omnibus test

12 Two-sample T-test
Compares means for two groups.
This compares variation between groups with variation within groups:

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Numerator: variation between groups. Denominator: variation within groups.

13 ANOVA F-test
Compares means for all groups.
This compares variation between groups with variation within groups:

$$F = \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2}{s_p^2}$$

Numerator: variation between groups, compared to the grand mean. Denominator: variation within groups.

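14 Analysis of Variance
Variation for all observations, where $y_{ij}$ is observation j in group i:

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$$

Called the “(corrected) total sum of squares” or SST.
Can be divided into two parts:
  deviation of individual observation from its sample mean, $y_{ij} - \bar{y}_i$
  deviation of sample means from overall mean, $\bar{y}_i - \bar{y}$
Similar to regression.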

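15 Analysis of Variance
$y_{ij} - \bar{y}_i$ (observation j in group i minus its sample mean) measures variation within samples.
$\bar{y}_i - \bar{y}$ (sample mean minus overall mean) measures variation between samples.
Each has a corresponding “sum of squares”:
  Sum of squares within: $SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$
  Sum of squares between: $SSB = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$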

16 Analysis of Variance
Each has a corresponding degrees of freedom (DF):
  SST: n-1 df
  SSB: k-1 df
  SSW: (n-1) - (k-1) = n-k df
The ratio of each sum of squares to its degrees of freedom gives the mean squares:
  MSW = SSW / (n-k) = average variation within the k samples
  MSB = SSB / (k-1) = average variation between the k samples
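The sums of squares, degrees of freedom, and mean squares can be computed by hand on a small dataset to see that SST = SSB + SSW. The sketch below uses Python only to show the arithmetic (it is not what PROC GLM runs internally), and both the data and the function name `anova_pieces` are hypothetical.

```python
def anova_pieces(groups):
    """Return the sums of squares, degrees of freedom, and mean squares for a one-way layout."""
    n = sum(len(g) for g in groups)   # total number of observations
    k = len(groups)                   # number of groups
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    return {
        "SST": sst, "SSB": ssb, "SSW": ssw,
        "df_total": n - 1, "df_between": k - 1, "df_within": n - k,
        "MSB": ssb / (k - 1), "MSW": ssw / (n - k),
    }


# Hypothetical toy data: three groups of unequal size.
toy = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0], [1.0, 2.0, 3.0]]
pieces = anova_pieces(toy)
for name, value in pieces.items():
    print(name, value)
```

For this toy data the between and within pieces add up to the total sum of squares, and the degrees of freedom add up the same way.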

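17 Analysis of Variance
MSW is an estimate of the common within-group variance, $\sigma^2$:

$$MSW = \frac{SSW}{n-k}, \qquad SSW = \sum_{i=1}^{k} (n_i - 1) s_i^2$$

$s_i^2$ = sample variance for the ith group
$MSW = s_p^2$ = pooled variance for the k groups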
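A quick numerical check that pooling the group variances gives the same number as SSW/(n-k). This is a Python sketch for illustration only; the data and the helper names `pooled_variance` and `msw_direct` are assumptions made for the example.

```python
from statistics import mean, variance


def pooled_variance(groups):
    """Pool the group sample variances, weighting each by its n_i - 1."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    return sum((len(g) - 1) * variance(g) for g in groups) / (n - k)


def msw_direct(groups):
    """MSW from the definition: squared deviations from each group mean, over n - k."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return ssw / (n - k)


# Hypothetical toy data.
toy_groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0], [1.0, 2.0, 3.0]]
print(pooled_variance(toy_groups), msw_direct(toy_groups))
```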

18 Analysis of Variance
The null hypothesis is tested by looking at the F ratio: F = MSB/MSW, compared to an F distribution with k-1 and n-k df.
If variation between groups is much greater than variation within groups, F >> 1: reject the null hypothesis.
If F ≈ 1: fail to reject the null hypothesis.
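Since the two-sample t-test is the k = 2 special case, the F ratio for two groups should equal the square of the pooled t statistic. The Python sketch below checks this on made-up data; it only illustrates the arithmetic, and `pooled_t` and `one_way_f` are hypothetical helper names.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance."""
    sp2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))


def one_way_f(groups):
    """F = MSB / MSW for a one-way layout."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((len(g) - 1) * variance(g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Hypothetical toy data for two groups.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0, 8.0]

t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f([group_a, group_b])
print(t_stat ** 2, f_stat)
```

Turning F into a p-value requires the F distribution with k-1 and n-k df, which SAS reports as Pr > F.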

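19 Analysis of Variance
Results are often presented in an ANOVA table:

Source    DF    Sum of Squares   Mean Square   F
Between   k-1   SSB              MSB           MSB/MSW
Within    n-k   SSW              MSW
Total     n-1   SST

SAS uses “Model” for “Between” and “Error” for “Within”.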

20 ANOVA in SAS: two ways

    PROC ANOVA DATA = LIPID;
      CLASS diet;
      MODEL lipid = diet;
    RUN;

    PROC GLM DATA = LIPID;
      CLASS diet;
      MODEL lipid = diet;
    RUN;

Both test for a difference in mean lipid reduction for the two diets.

21 PROC ANOVA and GLM Almost exactly the same for this case
GLM is a more general procedure

22 TOMHS Study
6 treatment groups (variable GROUP):
  Beta-blocker
  Calcium channel blocker
  Diuretic
  Alpha-blocker
  ACE inhibitor
  Placebo
All treatment groups were given a lifestyle intervention to lower BP.

23 ANOVA – TOMHS Study

    PROC GLM DATA=temp;
      CLASS group;
      MODEL sbpdif = group;
      MEANS group;
    RUN;

The CLASS statement creates 5 dummy variables for you.

OUTPUT
The GLM Procedure
Class Level Information
Class   Levels   Values
GROUP
Number of observations
NOTE: Due to missing values, only 848 observations can be used in this analysis.

24 GLM – OUTPUT
The GLM Procedure
Dependent Variable: sbpdif

ANOVA TABLE
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total

R-Square   Coeff Var   Root MSE   sbpdif Mean

If H0 is true then F should be near 1.
F = /194.52
Root MSE is the pooled (over 6 groups) standard deviation. It estimates σ.

25 GLM – OUTPUT
Source   DF   Type I SS     Mean Square   F Value   Pr > F
GROUP                                               <.0001

Source   DF   Type III SS   Mean Square   F Value   Pr > F

Because the model includes only GROUP (no covariates), this portion of the output is the same as the ANOVA table.

The GLM Procedure
Level of GROUP   N   sbpdif Mean   sbpdif Std Dev

26 Contrasts

    PROC GLM DATA=temp;
      CLASS group;
      MODEL sbpdif = group;
      MEANS group;
      ESTIMATE 'BB vs Placebo' group ;
      ESTIMATE 'CCB vs Placebo' group ;
      ESTIMATE 'Diur vs Placebo' group ;
      ESTIMATE 'AB vs Placebo' group ;
      ESTIMATE 'ACE vs Placebo' group ;
    RUN;

OUTPUT
The GLM Procedure
Dependent Variable: sbpdif

Parameter          Estimate   Standard Error   t Value   Pr > |t|
BB vs Placebo                                            <.0001
CCB vs Placebo                                           <.0001
Diur vs Placebo                                          <.0001
AB vs Placebo
ACE vs Placebo                                           <.0001
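For a contrast that compares one treatment group with placebo in a one-way model, the estimate is the difference of the two group means, and its standard error uses the pooled standard deviation (Root MSE) from all groups, not just the two being compared. The sketch below works through that arithmetic in Python on invented data. The group labels, values, and the function name `contrast_vs_reference` are all hypothetical, and this shows the calculation by hand rather than reproducing PROC GLM itself.

```python
from math import sqrt
from statistics import mean


def contrast_vs_reference(groups, treatment, reference):
    """Estimate, standard error, and t value for mean(treatment) - mean(reference)."""
    n = sum(len(values) for values in groups.values())
    k = len(groups)
    # Pooled within-group variance (MSW) uses every group in the model.
    ssw = sum((y - mean(values)) ** 2 for values in groups.values() for y in values)
    root_mse = sqrt(ssw / (n - k))

    estimate = mean(groups[treatment]) - mean(groups[reference])
    std_error = root_mse * sqrt(1 / len(groups[treatment]) + 1 / len(groups[reference]))
    return estimate, std_error, estimate / std_error


# Hypothetical toy data, not the TOMHS values.
toy_data = {
    "drug_a": [4.0, 5.0, 6.0],
    "drug_b": [7.0, 8.0, 9.0, 10.0],
    "placebo": [1.0, 2.0, 3.0],
}
estimate, std_error, t_value = contrast_vs_reference(toy_data, "drug_a", "placebo")
print(estimate, std_error, t_value)
```

The t value is referred to a t distribution with n-k degrees of freedom, the error df of the full model.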

27 Compare all Groups

    PROC GLM DATA=temp;
      CLASS group;
      MODEL sbpdif = group;
      LSMEANS group / PDIF;
    RUN;

28 GLM – OUTPUT
The GLM Procedure
Least Squares Means

GROUP   sbpdif LSMEAN   LSMEAN Number

Least Squares Means for effect GROUP
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: sbpdif

[Matrix of pairwise p-values indexed by i/j; several entries are <.0001. One entry is annotated as the p-value for Group 1 v Group 2.]

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.

