Be humble in our attribute, be loving and varying in our attitude, that is the way to live in heaven.

Applied Statistics Using SPSS
Topic: One-Way ANOVA
By Prof. Kelly Fan, Cal State Univ, East Bay

Statistical Tools vs. Variable Types
Response (output) \ Predictor (input) | Numerical | Categorical | Mixed
Numerical | Simple and Multiple Regression | Analysis of Variance (ANOVA) | Analysis of Covariance (ANCOVA)
Categorical | Categorical data analysis

Example: Battery Lifetime
Eight brands of batteries are studied. We would like to find out whether or not the brand of a battery affects its lifetime and, if so, which brands' batteries last longer than the others. Data collection: for each brand, 3 batteries are tested for their lifetime. What is the Y variable? The X variable?

Data: Y = LIFETIME (hours), X = BRAND, with 3 replications per level.

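Statistical Model
$Y_{ij}$ = the i-th observation at the j-th “LEVEL” of brand (brand is, of course, represented as “categorical”).
Data layout (columns = levels of brand, rows = replications):
Level:  1       2       …   C
        Y_11    Y_12    …   Y_1C
        Y_21    Y_22    …   Y_2C
        ⋮
        Y_n1    Y_n2    …   Y_nC
$$Y_{ij} = \mu_j + \varepsilon_{ij}, \quad i = 1, \dots, n; \; j = 1, \dots, C$$
where $\mu_j$ is the mean response at level j and the errors $\varepsilon_{ij}$ are assumed to be (1) normally distributed, (2) with constant variance $\sigma^2$, and (3) independent.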

Hypotheses Setup
H0: The level of X has no impact on Y.
H1: The level of X does have an impact on Y.
In terms of the model: $H_0: \mu_1 = \mu_2 = \cdots = \mu_8$; $H_1$: not all $\mu_j$ are equal.

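ONE-WAY ANOVA: Analysis of Variance for life
Source | DF | SS | MS | F | P
brand | 7 | | | |
Error | 16 | | |
Total | 23 | |
MS(Error) = $s^2$, the estimate of the common variance $\sigma^2$.
S = $\sqrt{\text{MS(Error)}}$   R-Sq = 59.67%   R-Sq(adj) = 42.02%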
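The entries of this table can be computed directly from the raw data. Below is a minimal sketch using only the Python standard library; the function names `one_way_anova` and `adjusted_r_sq` are hypothetical, and the battery data themselves are not reproduced here. The second helper checks the reported R-Sq(adj) against R-Sq, assuming N = 24 (8 brands × 3 batteries) and C = 8.

```python
from statistics import fmean


def one_way_anova(groups):
    """One-way ANOVA quantities for a list of groups (each a list of Y values)."""
    all_y = [y for g in groups for y in g]
    n_total, c = len(all_y), len(groups)
    grand_mean = fmean(all_y)
    # Between-levels SS: sum over levels of n_j * (level mean - grand mean)^2
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Error SS: squared deviations of each Y_ij from its own level mean
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    ss_total = ss_between + ss_error
    df_between, df_error, df_total = c - 1, n_total - c, n_total - 1
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error  # s^2, the estimate of the common variance
    return {
        "DF": (df_between, df_error, df_total),
        "SS": (ss_between, ss_error, ss_total),
        "MS": (ms_between, ms_error),
        "F": ms_between / ms_error,
        "S": ms_error ** 0.5,
        "R-Sq": ss_between / ss_total,
        "R-Sq(adj)": 1 - ms_error / (ss_total / df_total),
    }


def adjusted_r_sq(r_sq, n_total, c):
    """R-Sq(adj) = 1 - (1 - R-Sq) * (N - 1) / (N - C)."""
    return 1 - (1 - r_sq) * (n_total - 1) / (n_total - c)


# Battery study: N = 24 observations, C = 8 levels
check = adjusted_r_sq(0.5967, 24, 8)
```

With R-Sq = 0.5967, `check` comes out at about 0.4203, which agrees with the reported 42.02% up to the rounding of R-Sq itself.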

Review
Fitted value = predicted value; in one-way ANOVA, the fitted value for $Y_{ij}$ is its level mean $\bar{Y}_j$.
Residual = observed value − fitted value
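As a small illustration, assuming the fitted value of each observation is its level (group) mean, here is a sketch that pairs every observation with its fitted value and residual. The helper name and the data are hypothetical.

```python
from statistics import fmean


def fitted_and_residuals(groups):
    """For each Y_ij, return (fitted, residual) with fitted = its level mean."""
    out = []
    for g in groups:
        fitted = fmean(g)  # least-squares estimate of mu_j in Y_ij = mu_j + e_ij
        out.append([(fitted, y - fitted) for y in g])
    return out


groups = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]  # hypothetical data
pairs = fitted_and_residuals(groups)
```

Within each level, the residuals sum to zero. The two diagnostic plots that follow use exactly these pairs: the fitted values and the residuals.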

Diagnosis: Normality
Normality plot: normal scores vs. residuals
The points on the normality plot must more or less follow a straight line to claim that the residuals are “normally distributed”. There are statistical tests to verify this formally. The ANOVA method we learn here is not sensitive to the normality assumption; that is, a mild departure from the normal distribution will not change our conclusions much.

From the battery lifetime data: [Figure: normality plot of the residuals (normal scores vs. residuals)]
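To show where the points on a normality plot come from, here is a sketch that pairs each sorted residual with a standard normal score. One assumption to flag: it uses Blom's plotting positions (i − 3/8)/(n + 1/4). SPSS and other packages may use a slightly different formula, so their scores can differ a little. The function name and residuals are hypothetical.

```python
from statistics import NormalDist


def normal_scores(residuals):
    """(residual, normal score) pairs for a normal probability plot.

    Blom's plotting positions (i - 3/8) / (n + 1/4) are an assumption here;
    statistical packages differ slightly in the formula they use.
    """
    n = len(residuals)
    z = NormalDist()  # standard normal: mean 0, sd 1
    return [(r, z.inv_cdf((i - 0.375) / (n + 0.25)))
            for i, r in enumerate(sorted(residuals), start=1)]


# hypothetical residuals, not the battery data
for r, score in normal_scores([-1.0, 0.4, 0.0, 1.1, -0.5]):
    print(f"{r:5.1f}  {score:6.3f}")
```

Plotting the normal score against the residual should give roughly a straight line when the normality assumption is reasonable.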

Diagnosis: Equal Variances
Residual plot: fitted values vs. residuals
The points on the residual plot must lie more or less within a horizontal band to claim “constant variances”. There are statistical tests to verify this formally. The ANOVA method we learn here is not sensitive to the constant-variance assumption; that is, slightly different variances within groups will not change our conclusions much.

From the battery lifetime data: [Figure: residual plot (fitted values vs. residuals)]
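One common formal test of equal variances is Levene's test. The slides do not name a specific test, so this is our choice. Levene's statistic is simply the one-way ANOVA F statistic computed on the absolute deviations |Y_ij − Ȳ_j|. Below is a minimal sketch of the statistic only. The function name is hypothetical, and the p-value would come from comparing W with an F(C − 1, N − C) table.

```python
from statistics import fmean


def levene_statistic(groups):
    """Levene's W: the one-way ANOVA F statistic on |Y_ij - mean_j|."""
    z = [[abs(y - fmean(g)) for y in g] for g in groups]  # absolute deviations
    all_z = [v for g in z for v in g]
    n_total, c = len(all_z), len(z)
    z_bar = fmean(all_z)
    ss_between = sum(len(g) * (fmean(g) - z_bar) ** 2 for g in z)
    ss_within = sum((v - fmean(g)) ** 2 for g in z for v in g)
    return (ss_between / (c - 1)) / (ss_within / (n_total - c))


# hypothetical data: the second group is twice as spread out as the first
w = levene_statistic([[1, 2, 3], [4, 6, 8]])
```

A large W (relative to the F critical value) suggests that the spreads differ across levels. This matches what a widening or narrowing band on the residual plot shows.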

Multiple Comparison Procedures
Once we reject $H_0: \mu_1 = \mu_2 = \cdots = \mu_C$ in favor of $H_1$: not all μ’s are equal, we don’t yet know the way in which they’re not all equal, only that they’re not all the same. If there are 4 columns, are all 4 μ’s different? Are 3 the same and one different? If so, which one? And so on.

These “more detailed” inquiries into the process are called MULTIPLE COMPARISON PROCEDURES.
Errors (Type I): we set α as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at α = .05; each test has a Type I error rate (rejecting H0 when it is true) of α = .05. However,
$$P(\text{at least one Type I error in the 3 tests}) = 1 - P(\text{accept all 3} \mid \text{all } H_0 \text{ true}) = 1 - (.95)^3 \approx .14$$

In other words, the probability is .14 that at least one Type I error is made. For 5 tests, the probability is .23.
Question: should we choose α = .05 and suffer (for 5 tests) a .23 OVERALL error rate (called “a” or $\alpha_{\text{experimentwise}}$)? Or should we choose/control the overall error rate, “a”, to be .05, and find the individual test α from $1 - (1-\alpha)^5 = .05$ (which gives α ≈ .0102)?
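The arithmetic above can be checked directly. This sketch assumes the tests are independent, which is the same assumption the formula relies on. The helper names are hypothetical.

```python
def familywise_error(alpha, k):
    """P(at least one Type I error) in k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


def per_test_alpha(overall, k):
    """Solve 1 - (1 - alpha)^k = overall for alpha (independent tests only)."""
    return 1 - (1 - overall) ** (1 / k)


print(round(familywise_error(0.05, 3), 3))  # 0.143
print(round(familywise_error(0.05, 5), 3))  # 0.226
print(round(per_test_alpha(0.05, 5), 4))    # 0.0102
```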

The formula $1 - (1-\alpha)^5 = .05$ would be valid only if the tests are independent; often they’re not. [E.g., when testing μ1 = μ2, μ2 = μ3, and μ1 = μ3: if μ1 = μ2 is accepted and μ2 = μ3 is rejected, isn’t it more likely that μ1 = μ3 is rejected?]

When the tests are not independent, it’s usually very difficult to arrive at the correct α for an individual test so that a specified value results for the overall error rate.

Categories of multiple comparison tests
- “Planned”/“a priori” comparisons (stated in advance; usually a linear combination of the column means set equal to zero)
- “Post hoc”/“a posteriori” comparisons (decided after a look at the data: which comparisons “look interesting”)
- “Post hoc” multiple comparisons (every column mean compared with every other column mean)

There are many multiple comparison procedures. We’ll cover only a few.
Post hoc multiple comparisons
- Pairwise comparisons: do a series of pairwise tests; Duncan and SNK (Student-Newman-Keuls) tests
- (Optional) Comparisons to a control: Dunnett’s test

Example: Broker Study
A financial firm would like to determine whether the brokers it uses to execute trades differ in their ability to obtain stock purchases for the firm at a low price per share. To measure cost, an index Y is used:
$$Y = \frac{1000(A - P)}{A}$$
where P = per-share price paid for the stock and A = average of the day’s high and low price per share. “The higher Y is, the better the trade is.”

Five brokers were in the study, and six trades were randomly assigned to each broker (R = 6 trades per broker).
[Data table: Y for brokers 1 to 5 (columns); column means 6, 12, 5, 14, 17]

SPSS Output Analyze>>General Linear Model>>Univariate…

Homogeneous Subsets

Conclusion: {3, 1}, {2, 4, 5}
Conclusion: 3, ???

Conclusion: Brokers 1 and 3 are not significantly different from each other, but both are significantly different from the other 3 brokers. Brokers 2 and 4 are not significantly different, and brokers 4 and 5 are not significantly different, but broker 2 is significantly different from (smaller than) broker 5.
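To show how homogeneous subsets such as these arise, here is a sketch of the step-down logic shared by the SNK and Duncan multiple range tests. The brokers are sorted by mean. A range of p adjacent means is declared significant only if its spread exceeds a critical difference w_p and no wider range containing it was already declared non-significant. For SNK, w_p = q_α(p, N − C)·√(MSE/n) from the studentized range table; Duncan uses its own table of ranges. The broker means come from the example. The two `crit` dictionaries are made-up numbers, not the values behind the SPSS output, chosen only to show how the two groupings above can come out. The function name is hypothetical.

```python
def homogeneous_subsets(means, crit):
    """Step-down multiple range grouping (SNK/Duncan style), equal n assumed.

    means: dict label -> level mean
    crit:  dict p -> critical difference w_p for a range spanning p ordered means
    """
    order = sorted(means, key=means.get)
    k = len(order)
    not_sig = []  # (i, j) index ranges declared non-significant
    for p in range(k, 1, -1):  # widest ranges first
        for i in range(k - p + 1):
            j = i + p - 1
            inside = any(a <= i and j <= b for a, b in not_sig)
            if inside or means[order[j]] - means[order[i]] <= crit[p]:
                not_sig.append((i, j))
    # keep only ranges not contained in another non-significant range
    maximal = [(i, j) for i, j in not_sig
               if not any((a, b) != (i, j) and a <= i and j <= b for a, b in not_sig)]
    covered = {x for i, j in maximal for x in range(i, j + 1)}
    maximal += [(x, x) for x in range(k) if x not in covered]  # singleton groups
    return [[order[x] for x in range(i, j + 1)] for i, j in sorted(maximal)]


broker_means = {"1": 6, "2": 12, "3": 5, "4": 14, "5": 17}
wider = {2: 5.5, 3: 6.6, 4: 7.3, 5: 7.8}     # hypothetical critical differences
narrower = {2: 4.8, 3: 4.95, 4: 5.1, 5: 5.2}  # hypothetical critical differences
print(homogeneous_subsets(broker_means, wider))     # [['3', '1'], ['2', '4', '5']]
print(homogeneous_subsets(broker_means, narrower))  # [['3', '1'], ['2', '4'], ['4', '5']]
```

Smaller critical differences split the brokers into more, overlapping subsets. That is how broker 4 can belong to two groups at once while brokers 2 and 5 differ.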

Comparisons to Control
Dunnett’s test: designed specifically for (and incorporating the interdependencies of) comparing several “treatments” to a “control.”
Example: the broker data (R = 6), with column 1 as the CONTROL.

Column: 1 (CONTROL) | 2 | 3 | 4 | 5
Mean: 6 | 12 | 5 | 14 | 17
In our example:
- Columns 4 and 5 differ from the control [column 1].
- Columns 2 and 3 are not significantly different from the control.
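The Dunnett decision rule compares each treatment mean with the control mean. A treatment is flagged when |ȳ_j − ȳ_control| exceeds d·√(2·MSE/n), where d is Dunnett’s critical value from a table. Below is a sketch with equal group sizes. The column means and n = 6 come from the example. The `mse` and `d_crit` values are placeholders, not numbers from the SPSS output, and the function name is hypothetical.

```python
from math import sqrt


def compare_to_control(means, control, mse, n, d_crit):
    """Flag treatments whose mean differs from the control's by more than
    d_crit * sqrt(2 * MSE / n); equal group sizes n assumed."""
    margin = d_crit * sqrt(2 * mse / n)
    return {label: abs(m - means[control]) > margin
            for label, m in means.items() if label != control}


means = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}  # column means from the example
# mse and d_crit are placeholders; take d_crit from Dunnett's table
print(compare_to_control(means, control=1, mse=21.2, n=6, d_crit=2.61))
# {2: False, 3: False, 4: True, 5: True}
```

Because d already accounts for the fact that all comparisons share the same control, no further adjustment to α is needed.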

Exercise: Sales Data

Exercise: Find the ANOVA table.
Perform SNK tests at α = 5% to group the treatments. Perform Duncan tests at α = 5% to group the treatments. Which treatment would you use?

Post Hoc and A Priori Comparisons
- F test for a linear combination of column means (a contrast)
- Scheffé test: tests all linear combinations at once. Very conservative; not to be used for only a few comparisons.
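For a contrast L = Σ c_j μ_j with Σ c_j = 0, the F statistic (1 numerator df) is $F = \hat{L}^2 / (\text{MSE}\sum_j c_j^2/n_j)$. Scheffé’s test compares that F with (C − 1)·F_α(C − 1, N − C). Here is a sketch with hypothetical function names. The contrast (brokers 1, 2, 3 vs. brokers 4, 5) is only an example. `mse` and `f_crit` are placeholders, not values from the output.

```python
def contrast_f(means, sizes, coeffs, mse):
    """F statistic (1 numerator df) for H0: sum_j c_j * mu_j = 0."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    ss_contrast = estimate ** 2 / sum(c * c / n for c, n in zip(coeffs, sizes))
    return ss_contrast / mse


def scheffe_significant(f_contrast, c_levels, f_crit):
    """Scheffe: significant if F > (C - 1) * F_alpha(C - 1, N - C)."""
    return f_contrast > (c_levels - 1) * f_crit


means = [6, 12, 5, 14, 17]  # broker column means from the example
f = contrast_f(means, [6] * 5, [2, 2, 2, -3, -3], mse=21.2)  # mse: placeholder
print(scheffe_significant(f, c_levels=5, f_crit=2.76))  # f_crit: placeholder
```

The factor (C − 1) in front of the table value is what makes Scheffé conservative. It protects every possible contrast at once, which costs power when only a few contrasts are of interest.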

Random Effect
This assumes a “fixed model”: there is inherent interest in the specific levels of the factor under study, and no direct interest in extrapolating to other levels; inference is limited to the levels that appear in the experiment. The experimenter selects the levels.
If instead we have a “random model”, the levels in the experiment are randomly selected from a population of such levels, and inference is to be made about the entire population of levels. Then, besides assumptions 1 to 3, there is another assumption:
4) a) the $\mu_j$ are independent random variables, normally distributed with constant variance; b) the $\mu_j$ and $\varepsilon_{ij}$ are independent.

SPSS: Analyze >> General Linear Model >> Univariate…, with broker entered under Random Factor(s)
Tests of Between-Subjects Effects (Dependent Variable: sales)
Source | | Type III SS | df | Mean Square | F | Sig.
Intercept | Hypothesis | | | | |
 | Error | | | (a) | |
broker | Hypothesis | | | | |
 | Error | | | (b) | |
a. MS(broker)
b. MS(Error)

KRUSKAL-WALLIS TEST (Lesson 44)
(Nonparametric alternative to 1-Way ANOVA)
H0: The probability distributions are identical for each level of the factor.
H1: Not all the distributions are the same.

BATTERY LIFETIME (hours) for brands A, B, and C (each column rank-ordered, for simplicity)
Column means: (here, irrelevant!!)

H0: no difference in distribution among the three brands with respect to battery lifetime
H1: at least one of the 3 brands differs in distribution from the others with respect to lifetime

Ranks (lifetime, with its rank in the combined sample in parentheses)
Brand A | Brand B | Brand C
32 (29) | 32 (29) | 28 (24)
30 (26.5) | 32 (29) | 21 (18)
30 (26.5) | 26 (22) | (10.5)
29 (25) | 26 (22) | (10.5)
26 (22) | (19) | 14 (7)
23 (20) | 20 (16.5) | 14 (7)
20 (16.5) | 19 (14.5) | 14 (7)
19 (14.5) | (12) | (3)
18 (13) | 14 (7) | (2)
12 (4) | 14 (7) | (1)
T1 = 197 | T2 = 178 | T3 = 90
n1 = n2 = n3 = 10

TEST STATISTIC:
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{K} \frac{T_j^2}{n_j} - 3(N+1)$$
where $n_j$ = number of data values in column j, $N = \sum_{j=1}^{K} n_j$, K = number of columns (levels), and $T_j$ = sum of the ranks of the data in column j when all data are combined. (There is a slight adjustment to the formula as a function of the number of ties in rank.)
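Here is a sketch of the two steps. First, rank the pooled data, giving tied values the average of their positions. Second, plug the rank sums into the formula for H. The tie adjustment mentioned above is not applied. Summing the ranks listed for the battery data gives T1 = 197 and T2 = 178, with T3 = 90 as stated. The function names are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..N of values; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # positions pos..end hold ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N(N+1)) * sum(T_j^2 / n_j) - 3(N+1), without tie adjustment."""
    n_total = sum(sizes)
    s = sum(t * t / n for t, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)


def h_from_groups(groups):
    """Rank the combined data, sum ranks per column, and compute H."""
    ranks = average_ranks([y for g in groups for y in g])
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return kruskal_wallis_h(sums, [len(g) for g in groups])


print(round(kruskal_wallis_h([197, 178, 90], [10, 10, 10]), 2))  # 8.41
```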

$$H = \frac{12}{30(31)}\left[\frac{197^2}{10} + \frac{178^2}{10} + \frac{90^2}{10}\right] - 3(31) = 8.41$$
(with adjustment for ties, we get 8.46)

What do we do with H? We can show that, under H0, H is well approximated by a χ² distribution with df = K − 1. Here df = 2, and at α = .05 the critical value is 5.99 < H. Reject H0; conclude that the lifetime distribution is NOT the same for all 3 brands.
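For even degrees of freedom, the χ² upper-tail probability has a closed form: $P(\chi^2_{df} > x) = e^{-x/2}\sum_{k=0}^{df/2-1} (x/2)^k/k!$. That formula covers df = K − 1 = 2 here without any statistics library. This is a sketch restricted to even df; the function name is hypothetical.

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


print(round(chi2_sf_even_df(8.41, df=2), 4))  # 0.0149, the p-value for H = 8.41
print(round(chi2_sf_even_df(5.99, df=2), 3))  # 0.05, confirming the critical value
```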

SPSS: Analyze >> Nonparametric Tests >> Independent Samples (specify the variables on the Fields tab)
Double-click the output table:
