Planning Experiments – the general consideration and Comparative Study for more than two groups


1 Planning Experiments – the general consideration and Comparative Study for more than two groups
Module Ten: Planning Experiments – the general consideration and Comparative Study for more than two groups. Any quantitative investigation usually involves a variety of steps before the data are ready for analysis, and it is crucial that the data used for the final analysis are valid and reliable. Therefore, a scientific process for producing valid and reliable data is extremely important. Data are usually collected from either surveys or experiments, and studies can be loosely classified as observational or experimental. In the first section of this module, we discuss some general considerations for conducting an appropriate experimental study.

2 DO THE RIGHT THING! DO THE THINGS RIGHT!
DO THE RIGHT THING! (plan and design an appropriate experiment) DO THE THINGS RIGHT! (collect appropriate data and conduct an appropriate analysis) To plan an experimental study, here is a list of considerations that should be taken into account:
1. Determine the specific objective of the experiment.
2. Determine the response variables and the ways of measuring them.
3. Identify factors that are potentially influential on the response measurements.
4. Determine which factors to vary and control in the experiment, and which to hold constant or whose influence should be minimized.
5. Determine the specific design and procedure for conducting the experiment.
6. Determine the number of replications of the basic experiment to conduct.
7. Identify and secure the resources, materials, and facilities needed.

3 A few terms used in design of experiments
Treatments: A set of circumstances created for the experiment in response to the purpose of the study. The term factor is also used, and there may be more than one factor; each factor may have two or more levels. For example, suppose we have two factors, temperature and type of material, with three levels of temperature and four types of material planned for the experiment. We then have a two-factor factorial design, and the total number of treatment combinations is 3 x 4 = 12.
Experimental unit: the physical entity or subject that is exposed to the treatment.
Experimental error: describes the variation among identically and independently treated experimental units. The potential sources of this variability include:
- the natural variation among experimental units,
- variability in the measurement of the response,
- inability to reproduce the treatment conditions exactly from one unit to another,
- interaction of treatments with experimental units,
- any other external factors that may influence the measured characteristic.
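To make the counting of treatment combinations concrete, here is a minimal sketch that enumerates the 3 x 4 factorial for the temperature/material example; the particular level values are made up for illustration.

```python
from itertools import product

# Hypothetical factor levels for the temperature x material example.
temperatures = [150, 200, 250]        # 3 levels (illustrative values)
materials = ["A", "B", "C", "D"]      # 4 types of material

# Each (temperature, material) pair is one treatment combination.
treatments = list(product(temperatures, materials))
print(len(treatments))                # 12 = 3 x 4 treatment combinations
for trt in treatments:
    print(trt)
```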

4 The experimental design techniques are implemented to serve the purpose of the study, which usually involves investigating some hypothesis of interest as validly and reliably as possible. That is, in any experiment we should try to minimize systematic bias and reduce the experimental error. For example, here are some possible designs for the 3x4 (temperature, material) factorial experiment:
(1) Prepare 12 experimental units and randomly assign one unit to each treatment combination. In this assignment there is no replication for any combination; this is a 3x4 factorial design without replication, set up as a completely randomized design.
(2) Another possibility is to have two replications for each treatment combination. To achieve this we need a total of 24 experimental units, and two units are randomly assigned to each treatment combination. This is a 3x4 factorial design with 2 replications (see the randomization sketch below).
(3) Another possible design: factorial with blocking. In many experiments there are environmental limitations such as location or time. Suppose two labs will be conducting this experiment. Design (2) does not take this external factor into account, and it is possible that each lab has a different systematic error; the randomization in design (2) may end up assigning mostly material A and B, but little or no material C and D, to one of the labs. This will cause a problem we call:
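A minimal sketch of the randomization step for design (2), a completely randomized assignment of 24 units to the 12 treatment combinations with two replications; the unit labels and the random seed are arbitrary.

```python
import random
from itertools import product

random.seed(1)  # arbitrary seed, only so the illustration is reproducible

treatments = list(product([150, 200, 250], ["A", "B", "C", "D"]))  # 12 combinations
units = list(range(1, 25))                                         # 24 experimental units
random.shuffle(units)                                              # randomize the unit order

# Two randomly chosen units are assigned to each treatment combination.
assignment = {trt: [units.pop(), units.pop()] for trt in treatments}
for trt, pair in assignment.items():
    print(trt, "-> units", pair)
```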

5 Confounding: the effect of materials (A, B) versus (C, D) is mixed with the lab difference. We have no way to separate these two effects unless we can assume there is no systematic error between the two labs. One way to deal with this problem is to introduce a blocking factor: take each lab as a block, then apply a completely randomized design within each block. In the 3x4 factorial experiment, we randomly select 12 experimental units and randomly assign one to each treatment combination, and we repeat the same procedure for Lab 2. Through this 'blocking' control we are able to separate out the lab effect.
4. One can also conduct a 3x4 factorial with 2 replications within each lab. With this design we can estimate within-lab variability as well as between-lab variability, along with all treatment effects. However, the total number of experimental units becomes 3x4x2x2 = 48.

6 5. There is yet another important consideration for the 3x4 factorial experiment. Within the same lab, it is possible that there is a day-to-day systematic error. If such a systematic error is large, the results obtained on Monday may be significantly different from those obtained on Tuesday, and the consequence is that any uncertainty measurement is time-dependent, which is not appropriate. Therefore, one can conduct an experiment to study whether day-to-day variability is a concern. To plan such an experiment, we need to repeat the same 12 treatment combinations on some randomly selected days, say three, within the time period of the experiment. This design involves factorial experiments conducted in two labs, with the randomly selected three days nested within each lab.
6. There are other types of designs developed for specific purposes. For example, a fractional factorial design is usually applied in situations where we have a huge number of treatment combinations and it takes either too much money or too much time, or the physical environment cannot accommodate a complete experiment. As a consequence, we may run half or even a quarter of the complete factorial design. By doing so, we will not be able to estimate all possible effects. Fortunately, there are statistical and practical principles that help us choose designs that sacrifice only the effects that are unlikely to be significant or important. For example, if we have 8 factors with two levels each, we have 2^8 = 256 treatment combinations, so a complete factorial experiment without replication needs 256 experimental units. If we run a quarter of the experiment (64 runs), we still have 64 observations with which to investigate and identify important factors and some interaction effects, provided we plan and implement an appropriate fractional design.
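To make the counting concrete, here is a sketch of how a quarter fraction of a 2^8 design can be constructed as a regular fractional factorial: build the full 2^6 design in six base factors and define the remaining two factors from generators (the generators chosen here are illustrative, not necessarily the best-resolution choice).

```python
from itertools import product

# Full 2^6 design in base factors A..F, coded -1/+1.
base = list(product([-1, 1], repeat=6))      # 64 runs

runs = []
for a, b, c, d, e, f in base:
    g = a * b * c * d                        # generator G = ABCD (illustrative)
    h = a * b * e * f                        # generator H = ABEF (illustrative)
    runs.append((a, b, c, d, e, f, g, h))

print(len(runs))                             # 64 runs: a quarter of the 2^8 = 256 full factorial
```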

7 For the first part of the discussion, we focus on important analysis tools that can be applied to every type of design. The completely randomized design with one factor will be used as the basis for this discussion. In the previous module we discussed how to analyze a factor with two levels, introducing the paired-sample and independent-sample setups. In this section we extend the two-group independent-sample comparison to more than two groups; this is what is usually called the one-way ANOVA technique. The paired sample is extended using the concept of 'blocking' in the general setup.

8 Local Control of Experimental Errors
In conducting an experiment, we would like to be able to make precise and accurate comparisons among treatments over a proper range of conditions. Some local control is possible to reduce or control experimental error, increase the accuracy of the observations, and support valid inference for the study. The experimenter can control:
1. Techniques – including tasks such as preparation of material, lab facilities, calibration of instruments, and design techniques that meet the purpose of the investigation.
2. Selection of uniform experimental units – the selection of experimental units should take into account the regular conditions encountered in the field, not just try to make the units as uniform as possible. For example, it is not a good idea to take consecutive units from the process for the experiment, since they are dependent.

9 3. Blocking to reduce experimental error variation.
4. Recording possible covariates. In many experiments the response variable may be affected by other variables measured on the same experimental unit. These variables also change during the experiment and may have a direct effect on the response. For example, when studying how different amounts of a chemical component affect the brightness of paper, it is important to measure not only the brightness of each sheet but also its roughness, since roughness also has a direct effect on brightness. The variable roughness is a covariate.
5. Replications: If a treatment is applied to only one unit, we will not be able to estimate the experimental error, nor can we know whether the results can be reproduced. Replications allow us to (a) demonstrate whether the results can be reproduced, (b) estimate the experimental error, and (c) increase the precision of the estimates. The size of the replication depends on (a) the overall variance, (b) the size of the anticipated difference between two means, (c) the Type I error rate, and (d) the power of the test = 1 - Type II error rate. Minitab has a set of procedures for this, which will not be discussed here; a rough sketch of such a calculation is shown below.
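A rough sketch of a replication-size calculation for a one-way design, assuming the statsmodels package is available; the effect size, error rate, and power used here are made-up inputs.

```python
from statsmodels.stats.power import FTestAnovaPower

effect_size = 0.4   # Cohen's f: anticipated differences relative to the overall s.d. (assumed)
alpha = 0.05        # Type I error rate
power = 0.80        # 1 - Type II error rate
k_groups = 5        # number of treatment levels

# Solve for the total number of observations needed.
n_total = FTestAnovaPower().solve_power(effect_size=effect_size, alpha=alpha,
                                        power=power, k_groups=k_groups)
print("total observations:", round(n_total), "->", round(n_total / k_groups), "per group")
```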

10 Why Randomization? In the choice of experimental units and the assignment of the units to treatments, we emphasize the use of 'randomization'. The importance of randomization includes:
1. Random selection of experimental units makes the statistical assumption of independence valid. When two units are chosen at random, whatever we measure on the second unit does not depend on the choice of the first unit. This is easy to understand in any survey study.
2. Random assignment of units to treatments balances out possible bias due to the units, to prevent potential confounding effects.

11 In laboratory studies, it is common for more than two groups to be compared; this is an extension of the two-sample comparative study. A typical technique for comparing more than two groups is the Analysis of Variance (ANOVA). For most comparative studies with more than two groups, the ANOVA is the first step of the analysis: it tells us whether there is an overall difference among group means. An important question after the ANOVA is which group is different from which group, or whether there is any identifiable pattern or trend when comparing these group means. For the rest of this module, we discuss the concept behind ANOVA, how to conduct ANOVA, how to perform post-hoc analysis, how to set up contrasts, and how to identify patterns and test the significance of existing patterns or trends.

12 Consider a laboratory study to compare the compressive strength of hydraulic cement mortars.
A study is conducted to compare the compressive strength (psi) of concrete made with five different sizes of sand, obtained by using five screens of diameters 200 mm, 400 mm, 600 mm, 800 mm, and 1000 mm. Purpose: To compare the compressive strength of concrete made from the five sizes of sand, and to identify the sand size that results in the maximum compressive strength. Design: Five sizes of sand are prepared based on the above five screen sizes. For each size of sand, 12 samples are randomly chosen for producing specimens to be tested, and eight 'good' specimens are selected for strength testing. Experiment: A commonly used process and formula is applied for the experiment. The compressive strength test is conducted four weeks after the specimens are formed, and the same testing procedure is applied to every specimen. Measurement: The standard psi readings are recorded, along with any unusual incidents during the experiment or testing process.

13 The strength data are given in the following table (in 100 psi)
Size (max)   Specimens 1-8                                     Mean    StDev
200 mm       34.5  25.6  31.3  24.9  28.0  29.4  32.8  30.4    29.61   3.35
400 mm       48.6  52.6  63.9  62.5  58.3  53.8  54.0  61.5    56.90   5.46
600 mm       57.5  59.6  48.4  56.2  52.2  47.3  45.2   --     52.53   5.18
800 mm       50.3  45.8  42.7  44.8  41.7  38.4  39.2   --     43.59   3.91
1000 mm      22.0  25.2  27.4  23.8  21.7  19.6  21.5  19.4    22.58   2.75
(One reading in each of the 600 mm and 800 mm rows is not preserved in this transcript.)
Questions to ask: Are the mean strengths significantly different? What sand size will produce the maximum strength? If there is a significant difference among the five sand sizes, which one is different from which one? What kind of relationship between strength and sand size can be identified? In conducting the analysis we also need to diagnose the assumptions of normality and constant variance. What should we do if an assumption is seriously violated?
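A small sketch that recomputes the group means and standard deviations from the values as listed above (because of the two readings missing from the transcript, the 600 mm and 800 mm rows will not reproduce the tabled summaries exactly).

```python
from statistics import mean, stdev

# Compressive strength (in 100 psi), as transcribed above.
strength = {
    200:  [34.5, 25.6, 31.3, 24.9, 28.0, 29.4, 32.8, 30.4],
    400:  [48.6, 52.6, 63.9, 62.5, 58.3, 53.8, 54.0, 61.5],
    600:  [57.5, 59.6, 48.4, 56.2, 52.2, 47.3, 45.2],        # one reading missing in the transcript
    800:  [50.3, 45.8, 42.7, 44.8, 41.7, 38.4, 39.2],        # one reading missing in the transcript
    1000: [22.0, 25.2, 27.4, 23.8, 21.7, 19.6, 21.5, 19.4],
}

for size, y in strength.items():
    print(f"{size:>4} mm  n={len(y)}  mean={mean(y):6.2f}  sd={stdev(y):5.2f}")
```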

14 Statistical Model for One-Way ANOVA
An appropriate analysis and interpretation is based on an underlying statistical model describing the response variable, y. For the concrete strength data, y is the compressive strength. Each observation is labeled by the sand size and the specimen within the sand size using the notation yij, representing the jth specimen within the ith sand size. For each sand size there is a true unknown mean, denoted by μi. The deviation yij − μi is the random error, denoted by eij, which is assumed to follow a normal distribution with mean 0 and s.d. σ. Therefore, the statistical model describing this one-factor experiment is yij = μi + eij, i = 1, 2, ..., 5; j = 1, 2, ..., 8, with eij ~ N(0, σ).

15 A quick check of the treatment means and within-treatment variation: the distribution for each treatment level is illustrated in the figure. [Figure: five normal curves centered at the treatment means μ1, ..., μ5, with different spreads.] Note that a quick eye-check indicates that the population distribution for each treatment level may have a different mean and different variation, based on the sample data. The ordering of the unknown treatment means is estimated by the sample means, and the shape is assumed normal with different variations, based on the sample s.d. Since a typical ANOVA requires normality and equal variances, the process of the analysis should also include diagnosis of these assumptions. If an assumption is violated, some action should be taken before producing the ANOVA table and further analysis. Common approaches for dealing with violations of the assumptions include data transformation, or using techniques that are more robust to these assumptions, such as nonparametric techniques.

16 An alternative model that provides a better interpretation of ANOVA is the Treatment Effects Model, which has the form: yij = μ + τi + eij, i = 1, 2, ..., t; j = 1, 2, ..., r. The components of the model: μ is the grand mean; τi is the ith treatment effect = μi − μ; eij is the random error, which is independently distributed with mean 0 and s.d. σ. Each of these components is estimated from the corresponding sample data: the grand mean by the overall sample mean, each treatment effect by the group sample mean minus the overall sample mean, and each error by the residual yij minus the group sample mean.

17 Treatment   Specimen   Observation    Cell Means Model   Effects Model
200         1          y11 = 34.5     μ1 + e11           μ + τ1 + e11
200         2          y12 = 25.6     μ1 + e12           μ + τ1 + e12
...         ...        ...            ...                ...
200         8          y18 = 30.4     μ1 + e18           μ + τ1 + e18
400         1          y21 = 48.6     μ2 + e21           μ + τ2 + e21
400         2          y22 = 52.6     μ2 + e22           μ + τ2 + e22
...         ...        ...            ...                ...
1000        1          y51 = 22.0     μ5 + e51           μ + τ5 + e51
1000        2          y52 = 25.2     μ5 + e52           μ + τ5 + e52
1000        8          y58 = 19.4     μ5 + e58           μ + τ5 + e58
(The raw strength data are as shown in the table on slide 13.)

18 The Sum of Square Decomposition provides the basis of the ANOVA Table:
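The decomposition referred to here is the standard one-way identity (reconstructed below, since the slide's equation image is not in the transcript):

```latex
\sum_{i=1}^{t}\sum_{j=1}^{r}\left(y_{ij}-\bar{y}_{..}\right)^{2}
  = \underbrace{\sum_{i=1}^{t} r\left(\bar{y}_{i.}-\bar{y}_{..}\right)^{2}}_{\text{SSTR (between groups)}}
  + \underbrace{\sum_{i=1}^{t}\sum_{j=1}^{r}\left(y_{ij}-\bar{y}_{i.}\right)^{2}}_{\text{SSE (within groups)}},
\qquad\text{that is,}\quad \mathrm{SSTO}=\mathrm{SSTR}+\mathrm{SSE}.
```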

19 The following model is a full model, in the sense that it covers the situation where the group means are different: yij = μ + τi + eij, i = 1, 2, ..., t; j = 1, 2, ..., r, where μ is the grand mean, τi is the ith treatment effect = μi − μ, and eij is the random error, independently distributed with mean 0 and s.d. σ. One of our purposes is to test whether the groups indeed have different means. That is, we are interested in testing the hypothesis H0: μ1 = μ2 = ... = μt (equivalently, all τi = 0) against Ha: not all group means are equal. The full model above represents the situation when Ha holds. When H0 holds, the model reduces to yij = μ + eij, which is usually called the reduced model.

20 To be completed as a hands-on activity
The observed values, estimates, and deviations using the full and reduced models are illustrated in the following table:
Treatment   Specimen   Observation   Reduced model: Estimate, Difference   Full model: Estimate, Difference
200         1          34.5          41.04, -6.54                          29.61, 4.89
200         2          25.6          41.04, -15.44                         29.61, -4.01
...         ...        ...           ...                                   ...
200         8          30.4
400         1          48.6          41.04, 7.56                           56.90, -8.30
400         2          52.6          41.04, 11.56                          56.90, -4.30
...         ...        ...           ...                                   ...
1000        1          22.0
1000        2          25.2
...         ...        ...
1000        8          19.4
To be completed as a hands-on activity

21 A graphical presentation of the least square estimates of components in the model

22 ANOVA Table for One-Way Analysis
Source of Variation          Degrees of Freedom   Sum of Squares   Mean Square        F-value    P-value
Treatment (between groups)   t - 1                SSTR             MSt = SSTR/(t-1)   MSt/MSE    P(F > F-value)
Error (within groups)        N - t                SSE              MSE = SSE/(N-t)
Total                        N - 1                SSTO
NOTE: (1) SSTO = SSTR + SSE; (2) DF Total = DF Treatment + DF Error. The F-test tests the hypothesis H0: μ1 = μ2 = ... = μt against Ha: not all group means are equal. The decision rule based on the p-value: if p-value < α, then we conclude Ha: not all group means are equal.
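Because the table is built entirely from sums of squares, the one-way ANOVA table for this study can be reproduced from the group means and standard deviations alone (r = 8 per group, as designed); a sketch:

```python
from scipy import stats

means = {200: 29.61, 400: 56.90, 600: 52.53, 800: 43.59, 1000: 22.58}
sds   = {200: 3.35,  400: 5.46,  600: 5.18,  800: 3.91,  1000: 2.75}
r, t = 8, len(means)                       # replications per group, number of groups
N = r * t

grand = sum(means.values()) / t            # grand mean (balanced design)
sstr  = sum(r * (m - grand) ** 2 for m in means.values())
sse   = sum((r - 1) * s ** 2 for s in sds.values())

mst, mse = sstr / (t - 1), sse / (N - t)   # MSE comes out near 18.1, the value quoted on later slides
F = mst / mse
p = stats.f.sf(F, t - 1, N - t)
print(f"SSTR={sstr:.1f}  SSE={sse:.1f}  MSt={mst:.1f}  MSE={mse:.1f}  F={F:.1f}  p={p:.2g}")
```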

23 Why the F-test can be used to test the hypothesis?
One way to understand why the F-test does the job we ask of it is to find the expected values of MSE and MSt and look at the ratio MSt/MSE in terms of those expected values. This can be done by hand, and modern statistical software can also provide the expected values of MSE and MSt; all we need to understand is the ratio and its relationship to the F-test. For a one-factor design with r replications per treatment, the needed expected values are E(MSE) = σ² and E(MSt) = σ² + r Σ τi²/(t − 1). When H0 is true (all τi = 0), both mean squares estimate σ² and the ratio MSt/MSE should be near 1; large values of the ratio therefore indicate real treatment differences.

24 Procedure of the analysis:
Analysis of the Compressive Strength of concrete made with five sand sizes. Procedure of the analysis:
1. Prepare the data: data cleaning to make sure there are no known errors due to sampling or other special causes.
2. Perform descriptive analysis using graphical and numerical tools.
3. Check for outliers.
4. Conduct the appropriate analysis of variance, and conduct residual analysis to check whether the assumptions of normality and constant variance are appropriate (a sketch of steps 3-4 in code follows this list). If yes, go to step 5; otherwise, take proper steps to adjust the data and repeat this step.
5. If the F-test is significant, there exist significant differences among treatment means; go to step 6. Otherwise, go to step 7.
6. Conduct appropriate multiple comparison procedures to reveal which treatment is different from which, or conduct trend analysis to investigate trends/patterns between the response and the treatments.
7. Summarize and report the results in the context of the problem.
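A sketch of the diagnostic checks and the one-way F-test using scipy, with the strength values as transcribed on slide 13 (two groups show only seven readings in this transcript, so the numbers will differ slightly from the Minitab output on the following slides).

```python
from scipy import stats

strength = {
    200:  [34.5, 25.6, 31.3, 24.9, 28.0, 29.4, 32.8, 30.4],
    400:  [48.6, 52.6, 63.9, 62.5, 58.3, 53.8, 54.0, 61.5],
    600:  [57.5, 59.6, 48.4, 56.2, 52.2, 47.3, 45.2],
    800:  [50.3, 45.8, 42.7, 44.8, 41.7, 38.4, 39.2],
    1000: [22.0, 25.2, 27.4, 23.8, 21.7, 19.6, 21.5, 19.4],
}
groups = list(strength.values())

# Homogeneity of variances (Levene) and normality of the residuals (Shapiro-Wilk).
print("Levene :", stats.levene(*groups))
residuals = [y - sum(g) / len(g) for g in groups for y in g]
print("Shapiro:", stats.shapiro(residuals))

# One-way ANOVA F-test.
print("ANOVA  :", stats.f_oneway(*groups))
```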

25 Descriptive statistics for Strength by Sand Size. [This slide repeats the raw-data table from slide 13 and shows Minitab descriptive output for each sand size: N, Mean, Median, TrMean, StDev, SE Mean, Minimum, Maximum, Q1, Q3; the numerical values are not preserved in this transcript.]

26 A quick check indicates:
Group means are very different, with size = 400 giving the maximum. There do not appear to be any outliers. Constant variance may not hold. Normality looks okay.

27 Data transformation techniques could be applied to these data.
Although the test for equal variances is not significant, there is a clear pattern: the larger the strength, the larger the variation. Data transformation techniques could be applied to these data. [Bonferroni confidence intervals for the group standard deviations (Lower, Sigma, Upper, N for each size) are shown on the slide; the numerical values are not preserved in this transcript.]

28 Data Transformation for adjusting the violation of assumptions
Two important assumptions are required in ANOVA: the populations from which we observe the data should be approximately normal, and the within-group variances should be approximately equal. We apply outlier detection techniques, a normality test, and a homogeneity-of-variance test to conduct the diagnosis. For the Strength data, the diagnosis shows no serious violation of these assumptions; therefore the ANOVA results are appropriate, and we can conduct multiple comparisons or trend analysis. However, the plot of residuals versus estimated response values does show a clear pattern: when the strength is larger, the variance is larger. That is, there is a positive, roughly linear relation between the group variances and the group means:

29 Data Transformation – Continued:
How do we estimate b so that d can be determined by d = 1 − b? A quick and easy approach is to fit a simple linear regression line of the log of the group standard deviations on the log of the group means. Some commonly applied power transformations are:
d = 1 - b    y^d         Name                     Explanation
2            y^2         Square                   Also transforms a skewed-to-the-left distribution to close to normal.
1            y           No transformation
1/2          sqrt(y)     Square root              When y follows a Poisson distribution, this transforms y to close to normal.
0            ln(y)       Log transformation       Also transforms a skewed-to-the-right distribution to close to normal. (If y = 0, add .25 or .5 to every observation.)
-1/2         1/sqrt(y)   Reciprocal square root
-1           1/y         Reciprocal               Re-expresses time as rate.

30 For the Compressive Strength data, we have
Sand size (mm)   Mean    StDev
200              29.61   3.35
400              56.90   5.46
600              52.53   5.18
800              43.59   3.91
1000             22.58   2.75
The fitted simple regression line of ln(StDev) on ln(Mean) has slope b ≈ 0.72. Therefore, an appropriate transformation is d = 1 − b = 1 − 0.72 = 0.28, which suggests we may apply a log transformation or a square root transformation. In the following we use the Ln transformation to re-analyze the data and compare whether the results are similar or not.
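A sketch of the regression of ln(s) on ln(mean) used to estimate b; with the five (mean, s.d.) pairs above, the slope comes out near 0.72, giving d = 1 − b ≈ 0.28 as quoted.

```python
import numpy as np

means = [29.61, 56.90, 52.53, 43.59, 22.58]
sds   = [3.35, 5.46, 5.18, 3.91, 2.75]

x = np.log(means)
y = np.log(sds)
b, intercept = np.polyfit(x, y, 1)     # fit ln(s) = intercept + b * ln(mean)

print(f"b = {b:.2f}, suggested power d = 1 - b = {1 - b:.2f}")
```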

31 General Linear Model: Strength versus Sand Size
Factor: Sand Size, Type: fixed, 5 levels. [Minitab output: Analysis of Variance for Strength, using Adjusted SS for Tests, with columns Source, DF, Seq SS, Adj SS, Adj MS, F, P for Sand Size, Error, and Total, and a list of Unusual Observations (Obs, Strength, Fit, SE Fit, Residual, St Resid); the numerical values are not preserved in this transcript. R denotes an observation with a large standardized residual.] The ANOVA table indicates that the treatment means are significantly different (p-value of the F-test = .000). One unusual observation is flagged with a large standardized residual. This is, however, not a serious problem, and there are no special causes for this observation, so it should be kept for the entire analysis.

32 Expected Mean Squares, using Adjusted SS
Expected mean squares for each term: (1) Sand Size: (2) + Q[1]; (2) Error: (2). [The error DF, error MS, and estimated variance component values from the Minitab output are not fully preserved in this transcript.] Here σ², the variance of the error term, is labeled as source (2), and Q[1] is the quadratic fixed quantity due to Sand Size, source (1). This tells us how we should perform the F-test: source (1), Sand Size, is tested against source (2). The DF of source (2) is 35, and the mean square for source (2) is 18.1. This determines all F-tests that are meaningful and appropriate for the analysis. In this one-way model, the only component that is purely about variation is the random error variance, and it is estimated by the MS Error.

33 SE Mean measures the uncertainty of the estimated group mean.
Least Squares Means for Strength: the estimated mean and SE Mean for each Sand Size [numerical values not preserved in this transcript]. The final part of the results is the least squares estimate of the group mean for each Sand Size and the corresponding SE of each group mean. SE Mean measures the uncertainty of the estimated group mean. The least squares mean is the same as the sample mean of each Sand Size when the replications are all equal, which is the case for this study (r = 8). The SE Mean is the estimated population s.d. divided by the square root of the number of observations used to compute the group mean. The estimated common population variance is MSE, and each least squares mean is the mean of r = 8 strengths; therefore SE Mean = sqrt(MSE/r) = sqrt(18.1/8) ≈ 1.50.

34 The normality test of the residuals shows no violation of the normality assumption.
The main effect plot shows there is a nonlinear relationship between Strength and Sand Size.

35 Analysis of the Ln-Transformed Strength Data
General Linear Model: LnStrength versus Sand size. Factor: Sand size, Type: fixed, 5 levels. [Minitab output: Analysis of Variance for LnStrength, using Adjusted SS for Tests (Source, DF, Seq SS, Adj SS, Adj MS, F, P for Sand size, Error, Total), and Unusual Observations for LnStrength; the numerical values are not preserved in this transcript. R denotes an observation with a large standardized residual.] NOTE: The ANOVA results are similar to those for the raw data. Therefore, it is recommended to focus on the raw data, since the values are meaningful and easier to interpret.

36 The normal probability plot is almost the same as the result using the raw data.
The plot of residuals versus fitted values shows that the within-group variances are approximately the same; it is somewhat better than for the raw data. The analysis results differ little. Therefore, given the difficulty of interpreting results on the transformed scale, one should focus on the analysis of the raw data for this case.

37 Post Hoc analysis after the ANOVA
According to the ANOVA results, we conclude that the strengths for different sand sizes are significantly different. But we would like to take one more step and find out which one is different from which one; this is a post-hoc comparison. Several post-hoc comparisons have been developed, each with its own purpose. In this section, we introduce the following types of comparisons:
- specific comparisons of interest – contrasts;
- trend analysis when the treatment levels are on an ordinal (numeric) scale;
- simultaneous multiple comparisons: Bonferroni simultaneous confidence intervals;
- comparisons with a control – Dunnett's method;
- pairwise comparisons – Tukey's method.

38 Planning comparisons among Treatments - Contrasts
A contrast allows us to make any treatment comparison of interest. For the concrete strength case, suppose we suspect that very small and very large sand sizes may result in very different strength from the middle sand size. We can then set up a specific comparison for this purpose by comparing the average of the means for sand sizes 200 and 1000 with the mean for sand size 600. In terms of the notation used for population means, this is equivalent to testing the hypothesis H0: (μ1 + μ5)/2 = μ3 against Ha: (μ1 + μ5)/2 ≠ μ3. The coefficients associated with the population means are:
Parameter                  μ1    μ2   μ3   μ4   μ5
Coefficient                1/2   0    -1   0    1/2
Notation for coefficient   c1    c2   c3   c4   c5

39 We call the following a contrast among the treatment means:
Note: the sum of ci’s = 0: We call the following a contrast among the treatment means: Examples: In many situations, we are interested in more than two contrasts. It is important to learn the relationship between two contrasts. When two contrasts have the following relationship, we say two contrasts are orthogonal Hands-on activity: Define two contrasts of interest for the concrete example. Are these two contrasts orthogonal?

40 How to conduct a hypothesis test:
How to construct a confidence interval for a contrast: the purpose of forming a contrast is either to test a hypothesis or to construct a confidence interval to estimate the expanded uncertainty. A common and general technique for testing a hypothesis or constructing a confidence interval for any contrast is:
1. Obtain the sample estimate of the contrast: Ĉ = c1ȳ1 + c2ȳ2 + ... + ctȳt.
2. Determine the measurement uncertainty of Ĉ; for this case it is the SE of Ĉ, which is SE(Ĉ) = sqrt(MSE Σ ci²/ri).
3. For a hypothesis test, compute the t-value: t = Ĉ/SE(Ĉ), with df equal to the error degrees of freedom.
4. The 100(1-α)% C.I. for C is Ĉ ± t(α/2, df) SE(Ĉ).

41 Construct a 95% confidence interval for the contrast
And test the hypothesis H0: (μ1 + μ5)/2 = μ3 against Ha: (μ1 + μ5)/2 ≠ μ3.
Size (max)   Replication   Mean
200 mm       8             29.61
400 mm       8             56.90
600 mm       8             52.53
800 mm       8             43.59
1000 mm      8             22.58
From the ANOVA table, MSE = 18.1 and DF = 35. The least squares estimate of the contrast is Ĉ = (29.61 + 22.58)/2 − 52.53 = −26.44, and SE(Ĉ) = sqrt(MSE(0.5² + 1² + 0.5²)/8) = sqrt(18.1 × 1.5/8) = 1.84. The 95% CI is −26.44 ± 2.03(1.84), i.e., from −30.18 to −22.70 (in 100 psi). We are 95% confident that the true difference between the mean strength for sand size 600 and the average for sand sizes (200, 1000) is between 2270 psi and 3018 psi.
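A sketch reproducing the computation above from the group means, with MSE = 18.1 and df = 35 taken from the ANOVA table.

```python
import math
from scipy import stats

means = {200: 29.61, 400: 56.90, 600: 52.53, 800: 43.59, 1000: 22.58}
coef  = {200: 0.5,   400: 0.0,   600: -1.0,  800: 0.0,   1000: 0.5}
mse, df, r = 18.1, 35, 8

C  = sum(coef[k] * means[k] for k in means)                  # estimate of the contrast
se = math.sqrt(mse * sum(c * c for c in coef.values()) / r)  # SE of the contrast
t_crit = stats.t.ppf(0.975, df)

print(f"C = {C:.2f}, SE = {se:.2f}")
print(f"95% CI: ({C - t_crit * se:.2f}, {C + t_crit * se:.2f})   (in 100 psi)")
print(f"t = {C / se:.2f}, two-sided p = {2 * stats.t.sf(abs(C / se), df):.2e}")
```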

42 To test the hypothesis:
Using the t-test: t = Ĉ/SE(Ĉ) = −26.44/1.84 = −14.35, and the p-value is 2P(t > |tobs|) = 2P(t > 14.35) ≈ .000. We conclude that the mean strength for sand sizes (200, 1000) is significantly different from the mean strength for size 600; indeed, size 600 results in significantly higher strength. Hands-on Activity: Obtain a 90% confidence interval for the contrast C you set up, and interpret your confidence interval. Test the hypothesis H0: C = 0 vs. Ha: C ≠ 0, and interpret your result.

43 A Sum of Squares Decomposition for a Contrast
A contrast carries one degree of freedom of information; in sum-of-squares terminology, it is part of the treatment sum of squares. The sum of squares due to the contrast C = Σ ci μi can be computed from the least squares means as SSC = r Ĉ²/Σ ci². For the contrast of the concrete strength example, the sum of squares of this one-d.f. contrast is SSC = 8(−26.44)²/(0.5² + 1² + 0.5²) = 8(−26.44)²/1.5 ≈ 3728. Hands-on Activity: Compute the SSC for the two contrasts that you set up for this concrete strength example.

44 ANOVA Table with Sum of Squares Decomposition
[Minitab output: Analysis of Variance for Strength, using Adjusted SS for Tests, with the contrast '(1,5) vs 3' listed as a separate one-d.f. source alongside Sand Size, Error, and Total; the numerical values are not preserved in this transcript.] NOTE: We can test a contrast using a t-test, as presented before; we can also test a contrast using an F-test and include it in the ANOVA table. The F-value for testing the contrast '(1,5) vs 3' must equal the squared t-value we computed in the t-test; that is, F = t² = (−14.35)² ≈ 206. If we partition the four d.f. of the treatment into four orthogonal contrasts, the total of the sums of squares of these four orthogonal contrasts must equal SSt. However, if the four contrasts are not orthogonal, the sum of their SS's will generally not equal SSt.

45 Hands-on Activity: Define four meaningful contrasts, compute the corresponding sums of squares, and check whether the sum of these four sums of squares equals SSt. The concept and technique of contrasts is one of the most important and useful tools in ANOVA.

46 Setting contrasts to test the trend between response and treatments
Contrasts are a powerful tool for conducting many types of tests of interest. For this strength example, we may be interested in the relationship between strength and sand size, or in looking for the sand size that will produce the maximum strength; this can be addressed using contrasts. NOTE: It is important to remember that this type of contrast is meaningful only when the treatment levels are on an ordinal (numeric) scale, that is, the levels are meaningful numerically. In this strength study the level is sand size, which takes the ordered values 200, 400, 600, 800, 1000, and we are usually interested in how the strength changes as the sand size increases. Contrasts can be applied to answer this question.

47 How to set up contrasts for testing trends?
For the concrete example, the levels are 200, 400, 600, 800, and 1000. The following questions may be of interest: Is there a linear trend in strength as sand size increases (it could be positive or negative)? Is there a quadratic trend (a possible maximum or minimum strength)? Is there a trend that is more complicated than quadratic, such as cubic or higher? This is a problem of fitting a polynomial regression, y = β0 + β1x + β2x² + β3x³ + β4x⁴ + e, which can be simplified into an orthogonal polynomial regression, y = α0 + α1P1(x) + α2P2(x) + α3P3(x) + α4P4(x) + e. There is a relationship between x and the Pi, and fortunately there is a table that we can use to construct the orthogonal polynomial coefficients for testing these trends.

48 Orthogonal polynomial coefficients (Pci) for the case of five sand sizes
Sand size                                200    400    600    800    1000
Treatment means                          29.61  56.90  52.53  43.59  22.58
Replication                              8      8      8      8      8
Coefficients for linear trend contrast   -2     -1     0      1      2
Coefficients for quadratic trend         2      -1     -2     -1     2
Coefficients for cubic trend             -1     2      0      -2     1
Coefficients for quartic trend           1      -4     6      -4     1
To test whether there is a linear trend or a quadratic trend, we construct the corresponding contrast Ĉ = Σ ci ȳi. The t-test for a trend contrast is t = Ĉ/sqrt(MSE Σ ci²/r), the 100(1-α)% confidence interval is Ĉ ± t(α/2, df) sqrt(MSE Σ ci²/r), and the sum of squares due to a trend is r Ĉ²/Σ ci².
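A sketch computing the four trend contrasts, their t-statistics, and the one-d.f. sums of squares from the treatment means (MSE = 18.1, df = 35, r = 8); the coefficients are the standard orthogonal-polynomial values for five equally spaced levels.

```python
import math
from scipy import stats

means = [29.61, 56.90, 52.53, 43.59, 22.58]
mse, df, r = 18.1, 35, 8

trend_coeffs = {
    "linear":    [-2, -1,  0,  1, 2],
    "quadratic": [ 2, -1, -2, -1, 2],
    "cubic":     [-1,  2,  0, -2, 1],
    "quartic":   [ 1, -4,  6, -4, 1],
}

for name, c in trend_coeffs.items():
    C  = sum(ci * m for ci, m in zip(c, means))       # trend contrast estimate
    se = math.sqrt(mse * sum(ci * ci for ci in c) / r)
    ss = r * C ** 2 / sum(ci * ci for ci in c)        # one-d.f. sum of squares
    t  = C / se
    p  = 2 * stats.t.sf(abs(t), df)
    print(f"{name:>9}: C = {C:8.2f}   t = {t:7.2f}   p = {p:.2g}   SS = {ss:7.1f}")
```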

49 Hands-on Activity: Use the concrete strength data to (1) test the linear trend contrast, (2) construct a 95% confidence interval for the linear trend, (3) obtain the one-d.f. sum of squares for the linear trend, and (4) repeat steps 1, 2, and 3 for the quadratic trend.

50 Multiple comparisons for more than one contrast simultaneously
What is a multiple comparison, and why conduct one? We know how to make a single contrast comparison. However, in many experiments we are interested in testing a set of several contrasts together. We could conduct each comparison individually and set the error rate at, say, 5% for each one, but when we take all of these comparisons together the error rate is no longer α: the probability of committing at least one Type I error is 1 − (1 − α)^k for k comparisons. For example, if we use 5% as the error rate for each individual comparison when comparing 4 orthogonal contrasts, the probability of committing one or more Type I errors is 1 − (1 − .05)^4 = .185, an 18.5% chance, which is much higher than the individual error rate of 5%. In order to maintain an error rate of 5% for the entire set of comparisons, we need to adjust the individual error rate used for each single comparison. If we call αI the individual error rate and αE the error rate for the entire set of k orthogonal comparisons, we can fix αE and compute αI from the relation αI = 1 − (1 − αE)^(1/k).

51 When contrasts are orthogonal, we can apply
αI = 1 − (1 − αE)^(1/k). In many situations the contrasts of interest may not be orthogonal, and the above approach may not work. A simple alternative is to use the most conservative error rate for each individual contrast: when comparing k contrasts, each with individual error rate αI, the maximum error rate for the entire set of comparisons simultaneously is k·αI, so we take αI = αE/k. This is known as Bonferroni's multiple comparison. Hands-on activity (see the sketch below): Complete the following table of individual error rates when the combined error rate is given:
# of comparisons, k                2     3     4     5     6
Combined error rate, αE            5%    5%    5%    5%    5%
Individual rate – orthogonal, αI
Individual rate – Bonferroni, αI
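A sketch that fills in the hands-on table for a combined rate of αE = 5%: the per-comparison rates under the orthogonal (Šidák-type) relation and under the Bonferroni rule.

```python
alpha_E = 0.05

print(" k   orthogonal   Bonferroni")
for k in range(2, 7):
    orthogonal = 1 - (1 - alpha_E) ** (1 / k)   # alpha_I from alpha_E = 1 - (1 - alpha_I)^k
    bonferroni = alpha_E / k                    # alpha_I such that k * alpha_I = alpha_E
    print(f"{k:>2}   {orthogonal:.4f}       {bonferroni:.4f}")
```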

52 Bonferroni’s 100(1-a) simultaneous confidence interval
for the contrast C: Hands-on activity Construct 95% Bonferroni’s confidence interval for the confidence interval for the following contrasts simultaneously for the Concrete Strength data:

53 Multiple Comparison of all treatment with a Control
This type of comparison occurs often, especially when we have a standard or a reference to be compared with. Dunnett's method for comparing k treatments with a control gives 100(1-α)% confidence intervals for μi − μc, i = 1, 2, ..., k:
Two-sided CI: (ȳi − ȳc) ± d(α, k, df) sqrt(MSE(1/ri + 1/rc)). If the interval does not include zero, the ith treatment is different from the control; otherwise, it is NOT different from the control.
One-sided lower bound, if 'better' means greater than the control: (ȳi − ȳc) − d(α, k, df) sqrt(MSE(1/ri + 1/rc)). If the bound > 0, then the ith treatment is better; otherwise, it is not.
One-sided upper bound, if 'better' means less than the control: (ȳi − ȳc) + d(α, k, df) sqrt(MSE(1/ri + 1/rc)). If the bound < 0, then the ith treatment is better (in the 'less' sense) than the control.
The critical value d(α, k, df) is given in a table, which will be provided in class.

54 Hypothesis tests about μi − μc based on Dunnett's method
Hands-on Activity: Using the Concrete Strength data, suppose sand size 1000 has been the common approach, since it is less expensive. Conduct multiple comparisons of all treatments with the control, sand size = 1000.

55 Pairwise Comparison of all treatments
When the F-test from the ANOVA is significant, a common question is 'so which one is different from which one?' This is a pairwise comparison: the purpose is to compare every possible pair of treatments simultaneously and to identify the pairs that are significantly different. A variety of approaches have been proposed; in this section we discuss the one that has been shown to be among the best and is most commonly used.
Tukey's Method for Pairwise Comparisons. Tukey's method is based on the studentized range statistic, q = (ȳmax − ȳmin)/sqrt(MSE/r).

56 Procedure of Tukey’s Method of Pairwise Comparison
From the q-statistic, we obtain the procedure of Tukey's method of pairwise comparison: the 100(1-α)% simultaneous confidence interval for μi − μj, for all i < j, is (ȳi − ȳj) ± q(α; t, df) sqrt(MSE/r) when the replications are equal.
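A sketch of Tukey's pairwise comparisons using statsmodels (assumed available), with the strength values as transcribed on slide 13; the two groups that have a reading missing in this transcript will give slightly different intervals than the Minitab output on the following slides.

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

strength = {
    200:  [34.5, 25.6, 31.3, 24.9, 28.0, 29.4, 32.8, 30.4],
    400:  [48.6, 52.6, 63.9, 62.5, 58.3, 53.8, 54.0, 61.5],
    600:  [57.5, 59.6, 48.4, 56.2, 52.2, 47.3, 45.2],
    800:  [50.3, 45.8, 42.7, 44.8, 41.7, 38.4, 39.2],
    1000: [22.0, 25.2, 27.4, 23.8, 21.7, 19.6, 21.5, 19.4],
}
values = [y for ys in strength.values() for y in ys]
groups = [size for size, ys in strength.items() for _ in ys]

result = pairwise_tukeyhsd(endog=values, groups=groups, alpha=0.05)
print(result)   # simultaneous 95% intervals for every pairwise difference of means
```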

57 Hands-on Activity Conduct Pairwise Comparisons for the Concrete Strength data using Tukey’s Method

58 Using Minitab for testing contrasts, conducting trend analysis, simultaneous comparisons, and pairwise comparisons. [Minitab output: General Linear Model for Strength versus Sand size (fixed, 5 levels), the Analysis of Variance table (Source, DF, Seq SS, Adj SS, Adj MS, F, P for Sand size, Error, Total), and the table of group means with SE Mean; the numerical values are not preserved in this transcript.] The F-test indicates a significant difference among treatments, so further analysis should be conducted.

59 Tukey 95.0% Simultaneous Confidence Intervals
Tukey 95.0% simultaneous confidence intervals, response variable Strength, all pairwise comparisons among levels of Sand size. [The slide lists, for Sand size = 200 subtracted from 400, 600, 800, 1000, and for Sand size = 400 subtracted from 600, 800, 1000, the Lower, Center, and Upper limits with interval plots; the numerical values are not preserved in this transcript.] If an interval covers zero, it indicates no significant difference; otherwise, the difference is significant.

60 Sand size = 600 subtracted from:
[Continuation of the Tukey output: intervals (Lower, Center, Upper) for Sand size = 600 subtracted from 800 and 1000, and for Sand size = 800 subtracted from 1000; the numerical values are not preserved in this transcript.]

61 Tukey Simultaneous Tests: Response Variable Strength
All pairwise comparisons among levels of Sand size. [The slide lists, for Sand size = 200 and 400 subtracted from the higher levels, the Difference of Means, SE of Difference, T-Value, and Adjusted P-Value; the numerical values are not preserved in this transcript.]

62 Sand size = 600 subtracted from:
[Continuation of the Tukey simultaneous tests: Difference of Means, SE of Difference, T-Value, and Adjusted P-Value for Sand size = 600 and 800 subtracted from the higher levels; the numerical values are not preserved in this transcript.]

63 Dunnett 95.0% Simultaneous Confidence Intervals
Dunnett 95.0% simultaneous confidence intervals, response variable Strength, comparisons with the control level Sand size = 1000. [The slide lists, for Sand size = 1000 subtracted from 200, 400, 600, and 800, the Lower, Center, and Upper limits, and the Dunnett simultaneous tests with Difference of Means, SE of Difference, T-Value, and Adjusted P-Value; the numerical values are not preserved in this transcript.] Sand size = 1000 is the control; each treatment is compared with sand size = 1000. All other sizes show significantly higher strength.

64 Trend Analysis – Sum of Square Decomposition using Minitab
The regression equation is Strength = f(Linear, Quadratic, Cubic, Quartic). [Minitab output: Predictor, Coef, SE Coef, T, P for Constant, Linear, Quadratic, Cubic, Quartic; S, R-Sq = 91.6%, R-Sq(adj) = 90.6%; the Analysis of Variance table for the regression; and the Seq SS for Linear, Quadratic, Cubic, Quartic; the numerical values are not fully preserved in this transcript.] This analysis shows that all four trends are significant, with the quadratic term by far the most significant. The regression has 4 df, which correspond exactly to the treatment effect, SSt; it can be partitioned into four one-df trend components.

65 Variable: Lin is transformed from Sand-Size to simplify the model
Use of Lin and Linsq makes prediction easy:
Size   Lin   Linsq
200    -2    4
400    -1    1
600    0     0
800    1     1
1000   2     4
The regression equation is Strength = f(Lin, Linsq). [Minitab output: Predictor, Coef, SE Coef, T, P for Constant, Lin, Linsq; S, R-Sq = 85.7%, R-Sq(adj) = 84.9%; the Analysis of Variance table with Regression, Residual Error, Lack of Fit, Pure Error, and Total; and the Seq SS for Lin and Linsq; the numerical values are not fully preserved in this transcript.] The Lack of Fit is the SS due to the cubic and quartic terms, and the Pure Error corresponds to the within-group error (MSE) in the ANOVA.

66 [Minitab output: Unusual Observations (Obs, Lin, Strength, Fit, SE Fit, Residual, St Resid) and Predicted Values for New Observations (New Obs, Fit, SE Fit, 95% CI, 95% PI) with the Values of Predictors for New Observations (Lin, Linsq); the numerical values are not preserved in this transcript.] We predict strength using this quadratic model for coded sand sizes −1.5 and 0.5; using the coding relation, −1.5 corresponds to Sand Size = 300 and 0.5 to Sand Size = 700. The formulas for the confidence interval and the prediction interval can be written in matrix form; these computations are rarely done by hand nowadays.

67 A summary of Multiple Regression Modeling using Matrix
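Since the slide's matrix summary is not reproduced in the transcript, here are the standard multiple-regression formulas it presumably collects, in the usual notation (y is the response vector, X the n x p design matrix, x0 a new predictor row, and s² the residual mean square):

```latex
\hat{\boldsymbol\beta} = (X^{\top}X)^{-1}X^{\top}\mathbf{y}, \qquad
\hat{\mathbf{y}} = X\hat{\boldsymbol\beta}, \qquad
s^{2} = \frac{(\mathbf{y}-\hat{\mathbf{y}})^{\top}(\mathbf{y}-\hat{\mathbf{y}})}{n-p}
\\[6pt]
\hat{y}_{0} \pm t_{\alpha/2,\,n-p}\, s\sqrt{\mathbf{x}_{0}^{\top}(X^{\top}X)^{-1}\mathbf{x}_{0}}
\;\;\text{(CI for the mean response at } \mathbf{x}_{0}\text{)}, \qquad
\hat{y}_{0} \pm t_{\alpha/2,\,n-p}\, s\sqrt{1+\mathbf{x}_{0}^{\top}(X^{\top}X)^{-1}\mathbf{x}_{0}}
\;\;\text{(prediction interval)}
```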

68 The ANOVA and prediction interval computations can be done much more effectively using the above matrix approach, especially when no statistical software is available to perform these computations and one needs to do them by hand.

69 Project Activity: The lifetime of a certain type of heater depends on the temperature at which the heater is set. A lab test is conducted to determine how the different levels of temperature affect the lifetime of the heater. The heater has four temperature levels: 600 °F, 900 °F, 1200 °F, and 1600 °F. A total of 24 heaters are tested; six heaters are randomly assigned to be tested at each temperature, and the lifetime of each heater is recorded.
Test Temperature   Lifetime in Hours
600                2146  2865  3854  5732  5843
900                1254  1489  1732  2355  2724
1200               675   696   889   1124  1367
1600               489   552   584   674   712
(One of the six lifetimes in each row is not preserved in this transcript.)
Conduct an appropriate analysis, including assumption diagnosis, outlier detection, possible transformation, ANOVA, post-hoc comparison, and trend analysis, wherever appropriate.

70 Create orthogonal polynomial coefficients :
How to use Minitab to conduct one-way ANOVA and the related post-hoc comparisons, sum-of-squares decomposition, and trend analysis. Create orthogonal polynomial coefficients: before carrying out the trend analysis, we need to create the orthogonal polynomial coefficients for each type of trend contrast. The linear trend for five levels of treatment has the contrast coefficients -2, -1, 0, 1, 2 corresponding to the treatment levels 200, 400, 600, 800, and 1000. Therefore, we need to create a column in the Minitab worksheet with value -2 for Sand-size = 200, value -1 for Sand-size = 400, and so on. Steps for this are:
1. Go to Calc, choose Patterned Data, then select 'Arbitrary Set of Numbers'. In the dialog box, enter the column for 'Store patterned data', and enter -2 -1 0 1 2 into 'Arbitrary set of numbers'.
2. Enter 'List each value' 8 times (this is the number of replications) and 'List the whole sequence' 1 time. This creates the sequence of coded values corresponding to the Sand-size column.
3. Repeat the same procedure in (2) for the quadratic trend coefficients {2, -1, -2, -1, 2}, and repeat for the cubic and quartic trends.

71 Steps for Analysis with post-hoc comparisons:
1. Go to Stat, choose ANOVA, select General Linear Model. (One can also use the One-Way procedure; however, it offers fewer selections in the analysis.)
2. In the dialog box, enter the response variable, Strength, and enter the Model: Sand-size. If there is more than one factor, this is the box in which to enter the model. E.g., for a two-way model with factors defined in C3 and C4, we can enter C3 C4 C3*C4 in this Model box; this means we have a model with two factors and an interaction term. (We will discuss two-way models later.) The Random Factors box is for identifying whether a factor is a so-called fixed-effect factor or a random-effect factor.
3. There are seven selections in the ANOVA dialog box. Covariates identifies terms in the model that are not treatment factors but are independent variables; for example, when measuring brightness of paper as the response variable, roughness of the paper is a covariate. In statistics this is called Analysis of Covariance, and it occurs often in medical studies and in surveys. Graphs is for residual analysis to check the model assumptions. Factor Plots allows us to present graphical views of main effects and interactions. Options is for different types of analysis; the default is what we typically choose.

72 Minitab Continued: Results allows us to display more or less detailed results. One very useful choice in this Results selection is 'Display expected mean squares and variance components', which tells us why and how to conduct the appropriate F-tests. 'Display least squares means' is also a very useful choice, especially when replications are not equal. Comparisons provides a variety of post-hoc comparisons, including Tukey's pairwise comparison, Dunnett's comparison with a control, and Bonferroni confidence intervals. Storage allows us to store a variety of results in the worksheet for other analyses.
Steps for conducting trend analysis (this can be done using the Regression procedure):
1. Go to Stat, choose Regression, and select the Regression procedure.
2. In the Regression dialog box, enter Response: Strength and Predictors: the column numbers or variable names for the Linear, Quadratic, Cubic, and Quartic trends.
3. There are four selections: Graphs for residual analysis; Results for displaying results; Options for a variety of optional analyses (one can enter a column containing predictor values for predicting the mean response or new observations, which is very useful when doing calibration analysis); and Storage for storing results in the worksheet.

73 A quick explanation of the difference between a fixed factor and a random factor:
If the levels of a factor chosen for the experiment represent all of the possible levels of that factor, it is a fixed factor. For example, if a heater has only three levels of temperature control and we study the lifetime of the heater based on the level of temperature control, then the factor is a FIXED factor. Statistically, this is reflected in the model setup, and the sum of squares obtained from the ANOVA measures the squared differences among factor levels. On the other hand, when planning a lab testing study, if we are interested in the effect of day-to-day changes, then since there is an unlimited number of days we simply choose days at random for the experiment. The levels of days chosen represent a random sample of all possible days; therefore Day is a RANDOM factor, and we are interested in studying the variability instead of the differences among factor levels. Statistically speaking, the sum of squares is an estimate of some function of the variability due to the different days. (We will discuss the difference in detail later.)


