IE341: Introduction to Design of Experiments (presentation transcript)

1 IE341: Introduction to Design of Experiments

2 Last term we talked about testing the difference between two independent means. For means from a normal population, the test statistic is
t = (ȳ1 − ȳ2) / sp√(1/n1 + 1/n2)
where the denominator is the estimated standard deviation of the difference between two independent means (sp² is the pooled variance estimate). This denominator represents the random variation to be expected with two different samples. Only if the difference between the sample means is much greater than the expected random variation do we declare the means different.
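For readers following along in software, here is a minimal sketch of this two-sample t test in Python using SciPy; the sample values are made up for illustration.

```python
from scipy import stats

# Hypothetical samples; substitute your own measurements.
y1 = [22.1, 24.3, 23.8, 25.0, 22.9]
y2 = [20.4, 21.7, 22.2, 20.9, 21.5]

# equal_var=True uses the pooled variance estimate, matching the
# test statistic described above.
t, p = stats.ttest_ind(y1, y2, equal_var=True)
print(f"t = {t:.3f}, p = {p:.4f}")
```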

3 We also covered the case where the two means are not independent, and what we must do to account for the fact that they are dependent.

4 And finally, we talked about the difference between two variances, where we used the F ratio. The F distribution is the ratio of two independent chi-square variables, each divided by its degrees of freedom. So if s1² and s2² are independent sample variances with v1 and v2 df, respectively, then
F = s1² / s2²
has the F distribution with v1 and v2 df.
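A corresponding sketch of the variance-ratio test, again with hypothetical data. SciPy has no single call for the two-variance F test, so the ratio and its p-value are computed directly from the F distribution.

```python
import numpy as np
from scipy import stats

y1 = np.array([22.1, 24.3, 23.8, 25.0, 22.9])  # hypothetical sample 1
y2 = np.array([20.4, 21.7, 22.2, 20.9, 21.5])  # hypothetical sample 2

s1_sq = y1.var(ddof=1)            # sample variance with v1 = n1 - 1 df
s2_sq = y2.var(ddof=1)            # sample variance with v2 = n2 - 1 df
F = s1_sq / s2_sq

v1, v2 = len(y1) - 1, len(y2) - 1
# Two-sided p-value: double the smaller tail area.
p = 2 * min(stats.f.cdf(F, v1, v2), stats.f.sf(F, v1, v2))
print(f"F = {F:.3f}, p = {p:.4f}")
```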

5 All of this is valuable if we are testing only two means. But what if we want to test to see if there is a difference among three means, or four, or ten? What if we want to know whether fertilizer A or fertilizer B or fertilizer C is best? In this case, fertilizer is called a factor: the condition under test. A, B, C, the three types of fertilizer under test, are called levels of the factor fertilizer. Or what if we want to know if treatment A or treatment B or treatment C or treatment D is best? In this case, treatment is called a factor, and A, B, C, D, the four types of treatment under test, are called levels of the factor treatment. It should be noted that the factor may be quantitative or qualitative.

6 Enter the analysis of variance!
ANOVA, as it is usually called, is a way to test the differences between means in such situations. Previously, we tested single-factor experiments with only two treatment levels. These experiments are called single-factor because there is only one factor under test. Single-factor experiments are more commonly called one-way experiments. Now we move to single-factor experiments with more than two treatment levels.

7 Let's start with some notation.
Yij = the ith observation in the jth level
N = total number of experimental observations
ȳ.. = the grand mean of all N experimental observations
ȳ.j = the mean of the observations in the jth level
nj = number of observations in the jth level; the nj are called replicates.
Replication of the design refers to using more than one experimental unit for each level. If there is the same number n of replicates for each treatment, the design is said to be balanced.

8 Designs are more powerful if they are balanced, but balance is not always possible.
Suppose you are doing an experiment and the equipment breaks down on one of the tests. Now, not by design but by circumstance, you have unequal numbers of replicates for the levels. In all the formulas, we used nj as the number of replicates in treatment j, not n, so there is no problem.

9 Notation continued
τj = the effect of the jth level
L = number of treatment levels
eij = the "error" associated with the ith observation in the jth level, assumed to be independent, normally distributed random variables with mean = 0 and variance = σ², which are constant for all levels of the factor.

10 For all experiments, randomization is critical. So to draw any conclusions from the experiment, we must require that the treatments be applied in random order. We must also assign the experimental units to the treatments randomly. If all this randomization occurs, the design is called a completely randomized design.
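As a sketch, randomizing the run order of a completely randomized design takes one line with NumPy; the factor levels and replicate count below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the order is reproducible

treatments = ["A", "B", "C"]      # hypothetical factor levels
n_reps = 5
runs = treatments * n_reps        # 15 runs, 5 replicates per level

# Shuffle the order in which the treatments are applied to the
# experimental units.
run_order = rng.permutation(runs)
print(run_order)
```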

11 ANOVA begins with a linear statistical model:
Yij = μ + τj + eij
where μ is the grand mean, τj is the effect of the jth treatment level, and eij is the random error.

12 This model is for a one-way or single-factor ANOVA. The goal of the model is to test hypotheses about the treatment effects and to estimate them. If the treatments have been selected by the experimenter, the model is called a fixed-effects model. In this case, the conclusions will apply only to the treatments under consideration.

13 Another type of model is the random effects model or components of variance model.
In this situation, the treatments used are a random sample from a large population of treatments. Here the τj are random variables, and we are interested in their variability, not in the differences among the means being tested.

14 First, we will talk about fixed effects, completely randomized, balanced models.
In the model we showed earlier, the τj are defined as deviations from the grand mean, so
Σj τj = 0
It follows that the mean of the jth treatment is
μj = μ + τj

15 Now the hypothesis under test is:
Ho: μ1 = μ2 = μ3 = … = μL
Ha: μj ≠ μk for at least one j,k pair
The test procedure is ANOVA, which is a decomposition of the total sum of squares into its component parts according to the model.

16 The total SS is
SStotal = ΣjΣi (yij − ȳ..)²
and ANOVA is about dividing it into its component parts:
SStreatments = variability of the differences among the L levels
SSε = pooled variability of the random error within levels

17 This is easy to see because
ΣjΣi (yij − ȳ..)² = ΣjΣi [(ȳ.j − ȳ..) + (yij − ȳ.j)]²
= Σj nj (ȳ.j − ȳ..)² + ΣjΣi (yij − ȳ.j)² + a cross-product term.
But the cross-product term vanishes because
Σi (yij − ȳ.j) = 0 within every level j.

18 So SStotal = SStreatments + SSerror. Most of the time, this is called
SStotal = SSbetween + SSwithin
Each of these terms becomes an MS (mean square) term when divided by the appropriate df.
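A small NumPy sketch, with made-up data, confirms the decomposition numerically.

```python
import numpy as np

# Hypothetical one-way layout: 3 levels, 3 replicates each.
groups = [np.array([7.0, 9.0, 8.0]),
          np.array([12.0, 11.0, 13.0]),
          np.array([9.0, 10.0, 8.0])]

all_y = np.concatenate(groups)
grand_mean = all_y.mean()

ss_total = ((all_y - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# SS_total = SS_between + SS_within holds exactly.
assert np.isclose(ss_total, ss_between + ss_within)
print(ss_total, ss_between, ss_within)
```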

19 The df for SSerror = N − L because each of the L levels contributes nj − 1 df, and Σj (nj − 1) = N − L. And the df for SSbetween = L − 1 because there are L levels.

20 Now the expected values of each of these terms are
E(MSerror) = σ²
E(MStreatments) = σ² + (n Σj τj²) / (L − 1)

21 Now if there are no differences among the treatment means, then τj = 0 for all j. So we can test for differences with our old friend F:
F = MStreatments / MSerror
with L − 1 and N − L df. Under Ho, both numerator and denominator are estimates of σ², so the result will not be significant. Under Ha, the result should be significant because the numerator is estimating the treatment effects as well as σ².

22 The results of an ANOVA are presented in an ANOVA table. For this one-way, fixed-effects, balanced model:

Source   SS          df     MS          p
Model    SSbetween   L−1    MSbetween   p
Error    SSwithin    N−L    MSwithin
Total    SStotal     N−1

23 Let’s look at a simple example.
A product engineer is investigating the tensile strength of a synthetic fiber used to make men's shirts. He knows from prior experience that the strength is affected by the weight percent of cotton in the material. He also knows that the percent should range between 10% and 40% so that the shirts can receive a permanent-press treatment.

24 The engineer decides to test 5 levels: 15%, 20%, 25%, 30%, 35%
and to have 5 replicates in this design. His data are

Weight %   Observations            Mean
15          7   7  15  11   9       9.8
20         12  17  12  18  18      15.4
25         14  18  18  19  19      17.6
30         19  25  22  19  23      21.6
35          7  10  11  15  11      10.8
                      Grand mean   15.04

25 In this tensile strength example, the ANOVA table is
Source   SS       df    MS       F       p
Model    475.76    4    118.94   14.76   <0.01
Error    161.20   20      8.06
Total    636.96   24

In this case, we would reject Ho and declare that there is an effect of the cotton weight percent.
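This F test can be reproduced from the data table on slide 24 with a few lines of Python; scipy.stats.f_oneway performs exactly this one-way fixed-effects ANOVA.

```python
from scipy import stats

# Tensile strength observations by cotton weight percent (slide 24).
data = {15: [7, 7, 15, 11, 9],
        20: [12, 17, 12, 18, 18],
        25: [14, 18, 18, 19, 19],
        30: [19, 25, 22, 19, 23],
        35: [7, 10, 11, 15, 11]}

F, p = stats.f_oneway(*data.values())
print(f"F = {F:.2f}, p = {p:.6f}")   # F ≈ 14.76, p well below 0.01
```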

26 We can estimate the treatment parameters by subtracting the grand mean from the treatment means. In this example,
τ1 = 9.8 − 15.04 = −5.24
τ2 = 15.4 − 15.04 = +0.36
τ3 = 17.6 − 15.04 = +2.56
τ4 = 21.6 − 15.04 = +6.56
τ5 = 10.8 − 15.04 = −4.24
Clearly, treatment 4 is the best because it provides the greatest tensile strength.

27 Now you could have computed these values from the raw data yourself instead of doing the ANOVA. You would get the same results, but you wouldn't know whether treatment 4 was significantly better. Still, if you did a scatter diagram of the original data, you would see that treatment 4 was best, with no analysis whatsoever. In fact, you should always look at the original data to see whether the results make sense. A scatter diagram of the raw data usually tells as much as any analysis can.

28 [Scatter diagram of the raw tensile strength data by cotton weight percent.]

29 How do you test the adequacy of the model?
The model rests on certain assumptions that must hold for the ANOVA to be useful. Most importantly, the errors must be normally and independently distributed. The error for each observation, sometimes called the residual, is
eij = yij − ȳ.j

30 A residual check is very important for testing for nonconstant variance. The residuals should be structureless, that is, they should have no pattern whatsoever; in this case, they show none.

31 These residuals show no extreme differences in variation because they all have about the same spread. They also do not show the presence of any outlier. An outlier is a residual value that is very much larger than any of the others. The presence of an outlier can seriously jeopardize the ANOVA, so if one is found, its cause should be carefully investigated.

32 A histogram of residuals shows the distribution is slightly skewed. Small departures from symmetry are of less concern than heavy tails.

33 Another check is for normality. If we do a normal probability plot of the residuals, we can see whether normality holds.

34 A normal probability plot is made with the ascending ordered residuals on the x-axis and their cumulative probability points, 100(k − .5)/n, on the y-axis, where k is the order of the residual and n = number of residuals. There is no evidence of an outlier here. The previous slide is not exactly a normal probability plot because the y-axis is not scaled properly, but it does give a pretty good suggestion of linearity.
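For reference, SciPy's probplot produces a properly scaled normal probability plot; the sketch below reuses the tensile-strength residuals from the earlier example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Residuals e_ij = y_ij - (mean of level j), from the slide 24 data.
data = {15: [7, 7, 15, 11, 9], 20: [12, 17, 12, 18, 18],
        25: [14, 18, 18, 19, 19], 30: [19, 25, 22, 19, 23],
        35: [7, 10, 11, 15, 11]}
residuals = np.concatenate([np.array(y) - np.mean(y)
                            for y in data.values()])

# probplot orders the residuals and plots them against normal
# quantiles; rough linearity supports the normality assumption.
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```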

35 A plot of residuals vs. run order is useful for detecting correlation between the residuals, a violation of the independence assumption. Runs of positive or of negative residuals indicate correlation. None is observed here.

36 One of the goals of the analysis is to estimate the level means. If the results of the ANOVA show that the factor is significant, we know that at least one of the means stands out from the rest. But which one or ones? The procedures for making these mean comparisons are called multiple comparison methods. These methods use linear combinations called contrasts.

37 A contrast is a particular linear combination of level means, such as
C = ȳ.4 − ȳ.5
to test the difference between level 4 and level 5. Or if one wished to test the average of levels 1 and 3 vs. levels 4 and 5, he would use
C = ȳ.1 + ȳ.3 − ȳ.4 − ȳ.5
In general,
C = Σj cj ȳ.j, where Σj cj = 0

38 An important case of contrasts is called orthogonal contrasts. Two contrasts in a balanced design with coefficients cj and dj are orthogonal if
Σj cj dj = 0

39 There are many ways to choose the orthogonal contrast coefficients for a set of levels. For example, if level 1 is a control and levels 2 and 3 are two real treatments, a logical choice is to compare the average of the two treatments with the control:
C1 = 2ȳ.1 − ȳ.2 − ȳ.3
and then the two treatments against one another:
C2 = ȳ.2 − ȳ.3
These two contrasts are orthogonal because
Σj cj dj = (2)(0) + (−1)(1) + (−1)(−1) = 0

40 Only L − 1 orthogonal contrasts may be chosen because the L levels have only L − 1 df. So for only three levels, the two contrasts chosen exhaust those available for this experiment. Contrasts must be chosen before seeing the data so that experimenters aren't tempted to contrast the levels with the greatest differences.

41 For the tensile strength experiment with 5 levels and thus 4 df, the 4 contrasts are:
C1 = 0(5)(9.8) + 0(5)(15.4) + 0(5)(17.6) − 1(5)(21.6) + 1(5)(10.8) = −54
C2 = +1(5)(9.8) + 0(5)(15.4) + 1(5)(17.6) − 1(5)(21.6) − 1(5)(10.8) = −25
C3 = +1(5)(9.8) + 0(5)(15.4) − 1(5)(17.6) + 0(5)(21.6) + 0(5)(10.8) = −39
C4 = −1(5)(9.8) + 4(5)(15.4) − 1(5)(17.6) − 1(5)(21.6) − 1(5)(10.8) = 9
These 4 contrasts completely partition the SStreatments. Then the SS for each contrast is formed:
SSC = C² / (n Σj cj²)

42 So for the 4 contrasts we have:
SSC1 = (−54)² / (5 × 2) = 291.60
SSC2 = (−25)² / (5 × 4) = 31.25
SSC3 = (−39)² / (5 × 2) = 152.10
SSC4 = (9)² / (5 × 20) = 0.81
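These contrast values and their sums of squares can be checked with a short NumPy sketch; the level means and coefficients come from the slides above.

```python
import numpy as np

means = np.array([9.8, 15.4, 17.6, 21.6, 10.8])  # level means
n = 5                                            # replicates per level
contrasts = {"C1": [0, 0, 0, -1, 1],
             "C2": [1, 0, 1, -1, -1],
             "C3": [1, 0, -1, 0, 0],
             "C4": [-1, 4, -1, -1, -1]}

for name, c in contrasts.items():
    c = np.array(c)
    C = n * (c * means).sum()          # contrast on treatment totals
    ss = C**2 / (n * (c**2).sum())     # SS_C = C^2 / (n * sum of c_j^2)
    print(f"{name}: C = {C:.0f}, SS = {ss:.2f}")
```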

43 Now the revised ANOVA table is

Source     SS       df    MS       p
Weight %   475.76    4    118.94   <0.001
C1         291.60    1    291.60   <0.001
C2          31.25    1     31.25    0.06
C3         152.10    1    152.10   <0.001
C4           0.81    1      0.81    0.76
Error      161.20   20      8.06
Total      636.96   24

44 So contrast 1 (level 5 – level 4) and contrast 3 (level 1 – level 3) are significant.
Although the orthogonal contrast approach is widely used, the experimenter may not know in advance which levels to test, or may be interested in more than L − 1 comparisons. A number of other methods are available for such testing.

45 These methods include:
Scheffé's Method
Least Significant Difference Method
Duncan's Multiple Range Test
Newman-Keuls Test
There is some disagreement about which is the best method, but it is best if all are applied only after there is significance in the overall F test.

46 Now let’s look at the random effects model.
Suppose there is a factor of interest with an extremely large number of levels. If the experimenter selects L of these levels at random, we have a random effects model or a components of variance model.

47 The linear statistical model is
Yij = μ + τj + eij
as before, except that both τj and eij are random variables, instead of only eij. Because τj and eij are independent, the variance of any observation is
V(Yij) = στ² + σ²
These two variances are called variance components, hence the name of the model.

48 The requirements of this model are that the eij are NID(0, σ²), as before, that the τj are NID(0, στ²), and that the τj and eij are independent. The normality assumption is not required to estimate the variance components in the random effects model. As before,
SStotal = SStreatments + SSerror
and E(MSerror) = σ². But now
E(MStreatments) = σ² + n στ²
So the estimate of στ² is
(MStreatments − MSerror) / n

49 The computations and the ANOVA table are the same as before, but the conclusions are quite different. Let's look at an example. A textile company uses a large number of looms. The process engineer suspects that the looms produce fabric of different strength, and selects 4 looms at random to investigate this.

50 The results of the experiment are shown in the table below.

Loom   Observations       Mean
1      98  97  99  96     97.5
2      91  90  93  92     91.5
3      96  95  97  95     95.75
4      95  96  99  98     97.0
           Grand mean     95.44

The ANOVA table is

Source   SS       df    MS      p
Looms     89.19    3    29.73   <0.001
Error     22.75   12     1.90
Total    111.94   15

51 In this case, the estimates of the variances are:
estimated σ² = MSerror = 1.90
estimated στ² = (29.73 − 1.90) / 4 = 6.96
Thus most of the variability in the observations is due to variability in loom strength. If you can isolate the causes of this variability and eliminate them, you can reduce the variability of the output and increase its quality.
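The variance-component arithmetic is worth seeing in code, using the mean squares from the ANOVA table above.

```python
ms_looms, ms_error, n = 29.73, 1.90, 4   # from the loom ANOVA table

sigma2_hat = ms_error                          # estimate of sigma^2
sigma2_tau_hat = (ms_looms - ms_error) / n     # (MS_treat - MS_error) / n
print(sigma2_hat, sigma2_tau_hat)              # about 1.90 and 6.96

# Share of the total variance due to loom-to-loom differences:
print(sigma2_tau_hat / (sigma2_tau_hat + sigma2_hat))  # about 0.79
```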

52 When we studied the differences between two treatment means, we considered repeated measures on the same individual experimental unit. With three or more treatments, we can still do this. The result is a repeated measures design.

53 Consider a repeated measures ANOVA partitioning the SSTotal.
This is the same as
SStotal = SSbetween subjects + SSwithin subjects
The within-subjects SS may be further partitioned into SStreatments + SSerror.

54 In this partition of the within-subjects SS, the first term on the RHS captures the differences between treatment effects and the second term on the RHS is the random error.

55 Now the ANOVA table looks like this.

Source             SS   df           MS   p
Between subjects        n−1
Within subjects         n(L−1)
  Treatments            L−1
  Error                 (L−1)(n−1)
Total                   Ln−1

56 The test for treatment effects is the usual
F = MStreatments / MSerror
but now it is done entirely within subjects. This design is really a randomized complete block design with subjects considered to be the blocks.

57 Now what is a randomized complete block design?
Blocking is a way to eliminate the effect of a nuisance factor on the comparisons of interest. Blocking can be used only if the nuisance factor is known and controllable.

58 Let's use an illustration. Suppose we want to test the effect of four different tips on the readings from a hardness testing machine. The tip is pressed into a metal test coupon, and from the depth of the depression, the hardness of the coupon can be measured.

59 The only factor is tip type, and it has four levels. If 4 replications are desired for each tip, a completely randomized design would seem to be appropriate. This would require assigning each of the 4×4 = 16 runs randomly to 16 different coupons. The only problem is that the coupons need to be all of the same hardness, and if they are not, the differences in coupon hardness will contribute to the variability observed. Blocking is the way to deal with this problem.

60 In the block design, only 4 coupons are used and each tip is tested on each of the 4 coupons. So the blocking factor is the coupon, with 4 levels. In this setup, the block forms a homogeneous unit on which to test the tips. This strategy improves the accuracy of the tip comparison by eliminating variability due to coupons.

61 Because all 4 tips are tested on each coupon, the design is a complete block design. The data from this design are shown below.

              Test coupon
Tip type     1     2     3     4
1          9.3   9.4   9.6  10.0
2          9.4   9.3   9.8   9.9
3          9.2   9.4   9.5   9.7
4          9.7   9.6  10.0  10.2

62 Now we analyze these data the same way we did for the repeated measures design. The model is
Yjk = μ + τj + βk + ejk
where βk is the effect of the kth block and the rest of the terms are those we already know.

63 Since the block effects are deviations from the grand mean,
Σk βk = 0
just as
Σj τj = 0

64 We can express the total SS as
ΣjΣk (yjk − ȳ..)² = B Σj (ȳj. − ȳ..)² + L Σk (ȳ.k − ȳ..)² + ΣjΣk (yjk − ȳj. − ȳ.k + ȳ..)²
which is equivalent to
SStotal = SStreatments + SSblocks + SSerror
with df
N − 1 = (L − 1) + (B − 1) + (L − 1)(B − 1)

65 The test for equality of treatment means is
F = MStreatments / MSerror
and the ANOVA table is

Source       SS             df           MS
Treatments   SStreatments   L−1          MStreatments
Blocks       SSblocks       B−1          MSblocks
Error        SSerror        (L−1)(B−1)   MSerror
Total        SStotal        N−1

66 For the hardness experiment, the ANOVA table is

Source     SS      df    MS       F       p
Tip type   0.385    3    0.1283   14.44   <0.001
Coupons    0.825    3    0.2750
Error      0.080    9    0.0089
Total      1.290   15

As is obvious, this is the same analysis as the repeated measures design.
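This randomized complete block analysis can be reproduced by hand with NumPy, using the data from slide 61.

```python
import numpy as np

# Hardness data: rows = tip type, columns = coupon (block).
y = np.array([[9.3, 9.4, 9.6, 10.0],
              [9.4, 9.3, 9.8,  9.9],
              [9.2, 9.4, 9.5,  9.7],
              [9.7, 9.6, 10.0, 10.2]])
L, B = y.shape
grand = y.mean()

ss_total = ((y - grand) ** 2).sum()
ss_treat = B * ((y.mean(axis=1) - grand) ** 2).sum()  # tip effects
ss_block = L * ((y.mean(axis=0) - grand) ** 2).sum()  # coupon effects
ss_error = ss_total - ss_treat - ss_block

ms_treat = ss_treat / (L - 1)
ms_error = ss_error / ((L - 1) * (B - 1))
print(f"F = {ms_treat / ms_error:.2f}")               # about 14.44
```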

67 Now let's consider the Latin Square design. We'll introduce it with an example. The object of study is the effect of 5 different formulations of a rocket propellant on the burning rate of aircraft escape systems. Each formulation comes from a batch of raw material large enough for only 5 formulations. Moreover, the formulations are prepared by 5 different operators, who differ in skill and experience.

68 The way to test in this situation is with a 5×5 Latin Square, which allows for double blocking and therefore the removal of two nuisance factors. A Latin Square for this example (rows = batches of raw material, columns = operators, letters = formulations) is

Batch    Operator 1   2   3   4   5
1                 A   B   C   D   E
2                 B   C   D   E   A
3                 C   D   E   A   B
4                 D   E   A   B   C
5                 E   A   B   C   D
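A cyclic square like the one above can be generated in a couple of lines; this is just one construction, and any arrangement with the row/column property works.

```python
def cyclic_latin_square(p):
    """Build a p x p Latin square by shifting the letters one position
    per row, so every letter appears once in each row and column."""
    letters = [chr(ord("A") + i) for i in range(p)]
    return [[letters[(i + j) % p] for j in range(p)] for i in range(p)]

for row in cyclic_latin_square(5):
    print(" ".join(row))
```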

69 Note that each row and each column has all 5 letters, and each letter occurs exactly once in each row and column. The statistical model for a Latin Square is
Yjkl = μ + τj + ρk + γl + ejkl
where Yjkl is the jth treatment observation in the kth row and the lth column, ρk is the kth row (batch) effect, and γl is the lth column (operator) effect.

70 Again we have
SStotal = SSrows + SScolumns + SStreatments + SSerror
with df
N − 1 = (R − 1) + (C − 1) + (L − 1) + (R − 2)(C − 1)
The ANOVA table for the propellant data is

Source             SS     df    MS      p
Formulations       330     4    82.50   <0.01
Material batches    68     4    17.00
Operators          150     4    37.50   <0.05
Error              128    12    10.67
Total              676    24

71 So both the formulations and the operators were significantly different. The batches of raw material were not, but it still is a good idea to block on them because they often are different. This design was not replicated, and Latin Squares often are not, but it is possible to put n replicates in each cell.

72 Now if you superimpose one Latin Square on a second, orthogonal Latin Square of the same size, you get a Graeco-Latin Square. In one square, the treatments are designated by Latin letters; in the other, by Greek letters. Hence the name Graeco-Latin Square.

73 A 5×5 Graeco-Latin Square (rows = batches of raw material, columns = operators) is

Batch    Operator 1    2    3    4    5
1                 Aα   Bγ   Cε   Dβ   Eδ
2                 Bβ   Cδ   Dα   Eγ   Aε
3                 Cγ   Dε   Eβ   Aδ   Bα
4                 Dδ   Eα   Aγ   Bε   Cβ
5                 Eε   Aβ   Bδ   Cα   Dγ

Note that the five Greek treatments appear exactly once in each row and column, just as the Latin treatments do, and each Greek letter pairs exactly once with each Latin letter.

74 If Test Assemblies had been added as an additional (Greek-letter) factor to the original propellant experiment, the ANOVA table would be

Source             SS     df    MS      p
Formulations       330     4    82.50   <0.01
Material batches    68     4    17.00
Operators          150     4    37.50   <0.05
Test assemblies     62     4    15.50
Error               66     8     8.25
Total              676    24

The test assemblies turned out to be nonsignificant.

75 Note that the ANOVA tables for the Latin Square and the Graeco-Latin Square designs are identical, except for the error term. The SSerror for the Latin Square design was decomposed into Test Assemblies plus error in the Graeco-Latin Square. This is a good example of how the error term is really a residual: whatever isn't controlled falls into error.

76 Before we leave one-way designs, we should look at the regression approach to ANOVA. The model is
Yij = μ + τj + eij
Using the method of least squares, we rewrite this as the criterion to be minimized:
L = ΣjΣi (yij − μ − τj)²

77 Now, to find the LS estimates of μ and τj, we differentiate L with respect to μ and τj and equate the derivatives to 0. This gives
∂L/∂μ = −2 ΣjΣi (yij − μ̂ − τ̂j) = 0
∂L/∂τj = −2 Σi (yij − μ̂ − τ̂j) = 0 for all j

78 After simplification, these reduce to
Nμ̂ + nτ̂1 + nτ̂2 + … + nτ̂L = y..
nμ̂ + nτ̂j = y.j   for j = 1, …, L
In these equations, y.. is the grand total of all N observations and y.j is the total of the observations in the jth level.

79 These L + 1 equations are called the least squares normal equations. If we add the constraint
Σj τ̂j = 0
we get a unique solution to these normal equations: μ̂ = ȳ.. and τ̂j = ȳ.j − ȳ..

80 It is important to see that ANOVA designs are simply regression models. If we have a one-way design with 3 levels, the regression model is
Yi = β0 + β1Xi1 + β2Xi2 + ei
where Xi1 = 1 if the observation is from level 1, = 0 otherwise, and Xi2 = 1 if the observation is from level 2, = 0 otherwise. Although the treatment levels may be qualitative, they are handled by these "dummy" variables.

81 For observations from level 1, Xi1 = 1 and Xi2 = 0, so
Yi = β0 + β1 + ei, and thus μ1 = β0 + β1
Similarly, if the observations are from level 2,
Yi = β0 + β2 + ei, and thus μ2 = β0 + β2

82 Finally, consider observations from level 3, for which Xi1 = Xi2 = 0. Then the regression model becomes
Yi = β0 + ei
so μ3 = β0. Thus in the regression model formulation of the one-way ANOVA, the regression coefficients describe comparisons of the first two level means with the third.

83 So
β1 = μ1 − μ3
β2 = μ2 − μ3
Thus, testing β1 = β2 = 0 provides a test of the equality of the three means. In general, for L levels, the regression model will have L − 1 dummy variables, and βj = μj − μL.
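A small NumPy sketch confirms this correspondence; the observations are made up, two per level for three levels.

```python
import numpy as np

# One-way ANOVA with 3 levels written as a regression on two dummies.
y = np.array([7.0, 9.0, 12.0, 11.0, 9.0, 10.0])
level = np.array([1, 1, 2, 2, 3, 3])

X = np.column_stack([np.ones(len(y)),             # intercept, beta0
                     (level == 1).astype(float),  # dummy X1
                     (level == 2).astype(float)]) # dummy X2

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta
# b0 is the level-3 mean; b1 and b2 are the level-1 and level-2
# means minus the level-3 mean.
print(b0, b1, b2)   # 9.5, -1.5, 2.0
```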

84 Now what if you have two factors under test? Or three?
Here the answer is the factorial design. A factorial design crosses all factors. Let’s take a two-way design. If there are L levels of factor A and M levels of factor B, then all LM treatment combinations appear in the experiment. Most commonly, L = M = 2.

85 In a two-way design, with two levels of each factor, we have
We can have as many replicates as we want in this design. With n replicates, there are n observations in each cell of the design.

Factor A          Factor B          Response
-1 (low level)    -1 (low level)    20
+1 (high level)   -1 (low level)    50
-1 (low level)    +1 (high level)   40
+1 (high level)   +1 (high level)   12

86 SStotal = SSA + SSB + SSAB + SSerror
This decomposition should be familiar by now except for SSAB. What is this term? Its official name is interaction. This is the magic of factorial designs. We find out about not only the effect of factor A and the effect of factor B, but the effect of the two factors in combination.
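Here is a sketch of the effect computations for the 2x2 table on slide 85: each main effect is a difference of averages across the levels of that factor, and the interaction is half the difference of the diagonal sums.

```python
import numpy as np

# Cell responses from slide 85, indexed [A][B] with 0 = low, 1 = high.
y = np.array([[20.0, 40.0],    # A low:  B low, B high
              [50.0, 12.0]])   # A high: B low, B high

A_effect = y[1].mean() - y[0].mean()               # main effect of A
B_effect = y[:, 1].mean() - y[:, 0].mean()         # main effect of B
AB = (y[1, 1] - y[1, 0] - y[0, 1] + y[0, 0]) / 2   # interaction AB
print(A_effect, B_effect, AB)   # 1.0, -9.0, -29.0
```

Note how the interaction (−29) dwarfs both main effects, which is exactly why picking levels from the main effects alone is misleading here.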

87 Now let’s look at the main effects of the factors graphically.

88 Now let's look at the interaction effect. This is the effect of factors A and B in combination, and it is often the most important effect.

89 Interaction of factors is the key to the East, as we say in the West.
Suppose you wanted the factor levels that give the lowest possible response. If you picked by main effects, you would pick A low and B high. But look at the interaction plot and it will tell you to pick A high and B high.

90 This is why, if the interaction term is significant, you never interpret main effects. They are meaningless in the presence of interaction. And it is because factorial designs provide interactions that they are so popular and so successful.

91 Now what if the interaction term is not significant? What if the results instead were

and the interaction plot is as shown. The clearest indication of no interaction is the parallel lines.

93 So this time, if you wanted the lowest response, you would pick A low and B low and that would be correct.

