1 Basic Experimentation Notes developed by Ken Lulay for ESCO Mechanical Engineering University of Portland July 2008.

2 What to bring? Attendees should bring: Scientific calculator Ideas for experiments they would like to do at ESCO The following should be provided: Two brands of paperclips Popcorn, scale, microwave

3 Objectives By the completion of this class, you will be able to: Understand basic experiment “vocabulary” Design and analyze a single variable experiment (2 level factor) Design and analyze a multi-variable experiment (2 level factors)

4 Basic Experimentation - Overview Experiment Basics (30 minutes) Single variable experiments – design and analysis (2 hours) Multi-variable experiments – design and analysis (4 hours)

5 Overview of Experiment Basics Differences between testing and experimenting Experimental variables Errors: systematic and random

6 Experimenting and testing… …both require obtaining data (taking measurements), but… …what are they and how are they different?

7 Testing Testing may involve investigating only one set of conditions. Usually evaluating performance Example: determine strength of a material May be a standardized test (ASTM, ISO) Often has pass/fail criteria Does it meet specifications or not? We will not be discussing “testing”

8 Experimentation Performed to increase knowledge of how things perform under differing conditions Vary the input to determine the response Requires more than one set of conditions (design points) Evaluate “better/worse” (not pass/fail)

9 Experimentation & Testing A BIG difference: Tests are often routine: same tests done daily! Analogous to daily commuting to work. Experiments are “unique”: usually done only once! Therefore, they require more careful planning! Analogous to a vacation trip.

10 Variables Variables are physical quantities that may or may not affect the results of an experiment or test. Several types of variables are associated with any test and experiment: Controlled or Extraneous Controlled variables are held constant or intentionally manipulated (changed) during an experiment. Extraneous variables are not controlled. They are generally assumed to have no effect on the response (ex: ambient room temperature)

11 Variables Dependent or Independent The magnitudes (values) of dependent variables depend upon other variables, whereas the magnitudes of independent variables do not. Ex: in an experiment to determine the effect of temperature change on the toughness of AISI 1045 steel, temperature would be an independent variable and toughness would be a dependent variable Continuous or Discrete (a.k.a. Categorical) Discrete variables cannot take on a continuous range of values. Ex: Red/Green; Company A/Company B. Continuous variables can take on a continuous range. Ex: temperature, toughness, force

12 Terminology Factor - an independent variable in an experiment - factor levels are intentionally varied in an experiment to see what the effect is on the response. Factor Level - the target value of the factor. Example: pressure may be set to two levels: 0.5 Atm, and 1.0 Atm Response - the thing to be measured. Example, if you want to determine the yield strength at different temperatures, the yield strength is the response.

13 Variables and Levels Proper selection of appropriate variables and their levels is not trivial but is critical Selecting proper factors and levels is worth the effort. Don’t rush this step. Differences in factor levels: Factor levels must be “well separated” Far enough apart to be “different” (produce-ably and measurably) Not too far apart to be “unreasonable” (non-linear responses can be an issue – may miss the optimum)

14 Purpose of experiments? The sole purpose of our experiments will be to answer the following questions: Does changing one or more factors have a statistically significant effect on the response(s)? And if so, which factors appear to have the most significant effect?

15 Practice A materials engineer wants to study the effect of molybdenum content in a particular high alloy steel on the yield strength at various temperatures. For this experiment: Define the factors and their levels Define the response Identify “all” variables and classify them controlled/extraneous discrete/continuous dependent/independent

16 Practice (“answers”) Factors (controlled): Molybdenum content; levels: 5.1% and 5.2%? Test temperature; levels: -50F, 1000F? Also control: Test bar geometry, chemistry (other than Mo), strain rate, measurement methods and systems, test methods and systems,… Extraneous: humidity, … Dependent: yield strength (response)

17 Errors Errors (measurement variation) are due to a number of factors: Measurement error error = measured value - true value Changes in test specimen Ex: one specimen has slightly larger diameter Changes in environment Ex: ambient temperature increase Et cetera

18 Errors The “true” value is the value one would obtain with a perfect measurement. The true value is never known in an experiment. Therefore, error can never be known exactly; it can only be estimated using statistical analysis. Errors are inherent in measuring devices and caused by uncontrollable variations within the experiment.

19 Systematic and Random Errors In any experiment, two types of error can exist: Systematic Random

20 Systematic Errors Caused by underlying factors which affect the results in a “consistent/reproducible” and sometimes “knowable” way Sometimes referred to as “bias” Not random DANGER: can lead to false conclusions! Discuss this now, but example to follow later Can be managed (reduced effects) by properly designed experiments (randomizing the test conditions).

21 Causes of Systematic Errors Unknown changes during the experiment temperature, procedures, equipment, etc. Different batches of material or samples Et cetera

22 Random Errors Show no reproducible pattern – they are random. Sometimes referred to as “noise.” Typically have a normal (bell-shaped) distribution. Averaging several readings can reduce random errors.

23 Practice Consider the previous example ( experiment to determine effect of varying molybdenum content and temperature on yield strength ) Make a list of possible systematic errors and random errors for the design on next slide…

24 Practice
The Experiment:
– Two batches of steel: 2 wt% Mo and 10 wt% Mo
– Test bars are machined by an outside company
– Two test temperatures: 50F, 150F
Run  Moly  Temp.
1    2%    50F
2    2%    50F
3    2%    50F
4    10%   50F
5    10%   50F
6    10%   50F
7    2%    150F
8    2%    150F
9    2%    150F
10   10%   150F
11   10%   150F
12   10%   150F
PRACTICE: Make a list of possible systematic errors and random errors

25 Practice (“answers”) Possible systematic errors: Batches of steel (chemistry variation of other elements) How could this effect be mitigated? Machining of specimens (did moly content affect machining quality? Were specimens machined in batches with different diameters?) Temperature drift during testing (maybe from 52F towards 48F, and from 152F towards 148F)? Variation between beginning and end of test (measurement systems, operator, test equipment, test procedures…) How could systematic errors have been reduced?

26 Practice (“answers”) Possible random errors: Measurement errors Diameter of bars (maybe random) Load cell variation Others?

27 Review of Terminology Do Exercise 1 (definitions) in the back of the booklet to review terminology.

28 BREAK TIME! Single Variable Experiments to follow

29 Overview of Single Variable Experiments Basic Design of Experiments (DOE) Example of “how not to” Statistics and t-testing Hypothesis testing Confidence Intervals

30 Design of Experiments (DOE) By careful design, errors can be mitigated Systematic errors are mitigated by randomizing the test conditions (randomized run order) Random errors are mitigated by increasing the number of data points Design is a compromise of competing criteria: Cost, time, availability of equipment, etc. Control over variables Importance of results and conclusion CAREFUL PLANNING is REQUIRED! Let’s look at a basic example…

31 Example: Single Variable Experiment Wacky Engineer, a new employee at ESCO, believes that the color of paint applied to a tensile bar can affect the strength. Let’s take a look at this experiment…

32 Single Variable Experiment Determine if paint color affects strength of tensile bars Factor 1: paint color Levels: Red, Green Other controlled variables: test specimen geometry and material (constant) Response: yield strength of bar Results: Red = 81.9ksi, Green = 80.2ksi Did color of paint have an effect? Not a well thought out experiment We need more and better data…

33 Single Variable Experiment New Experiment with more data: paint five bars red and five green Red paint is available, green paint is on backorder. Your boss really wants data soon! Test facility is available, so show progress: Paint and test red bars! Green paint arrives, complete the testing!

34 Single Variable Experiment The results: R: 80.3, 81.2, 82.1, 83.1, 82.2; Ave=81.9 G: 78.2, 82.1, 80.8, 81.6, 81.1; Ave=80.2 The red bars were stronger on average. Same operator did all testing. Red bars were the first tensile bars he’s ever tested. Did color of paint have an effect? This is another poorly thought out experiment. What are some problems with this experiment?

35 Another, Better Example Re-do the prior experiment, but randomize. Randomize by using the following run order: R, G, G, R, G, G, R, R, G, R. Why randomize? Why would the following run order not be “OK”? R, G, R, G, R, G, R, G, R, G
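
A randomized run order like the one on this slide can be generated rather than written by hand. The sketch below is only an illustration: the function name `randomized_run_order` and the seed value are assumptions, not part of the original notes, and drawing slips of paper from a hat serves the same purpose.

```python
import random

def randomized_run_order(levels, replicates, seed=None):
    """Return a shuffled list containing each level `replicates` times."""
    runs = [level for level in levels for _ in range(replicates)]
    rng = random.Random(seed)  # the seed is optional; fixed here only so the order can be reproduced
    rng.shuffle(runs)
    return runs

# Five red and five green bars, as in the paint example.
order = randomized_run_order(["R", "G"], replicates=5, seed=2008)
print(order)
```

Shuffling the whole list, instead of alternating R and G, avoids any regular pattern that could line up with a drift in the operator, the equipment, or the environment.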

36 Better Example The randomized run order results: R: 80.3, 81.2, 82.1, 83.1, 82.2; Ave=81.9 G: 78.2, 82.1, 79.8, 79.6, 81.1; Ave=80.2 R & G averages are different but did color of paint really have an effect? Averages are only part of the answer “Statistically significant” difference depends upon both the averages and the variation.

37 Plot the Data [Plot: red and green results on a strength axis marked 79, 81, 83] Looks like Red paint increased the strength! Will your boss believe this? How certain are you that the effect is real? How likely is this to be a “fluke”?

38 Need some statistical stuff…

39 Probability Distribution Assume distribution is “normal”!!! Measurements are a sample of the total We can never be 100% certain about experimental results (variation, error). Can only estimate “likelihood” or “probability” [Figure: bell-shaped curve f(x) with the interval from a to b marked]

40 t-test Comparing the averages is NOT sufficient! The best way to answer “are they different” is with the t-test. The t-test incorporates both the deviation of the data as well as the means.

41 t-test – what does it do? Consider two sets of sampled data Are their true means likely different? What about these two sets? t-test will help us decide Both sets have same averages

42 Basic Statistics $\mu$ = true mean; $X$ = estimated mean based on finite sample size; $\sigma$ = true standard deviation; $S$ = estimated standard deviation based on the finite sample size; $n$ = number of samples; $x_i$ is the value of the $i$th sample. $X = \frac{1}{n}\sum_{i=1}^{n} x_i$ (Equation 1) $S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - X)^2}{n-1}}$ (Equation 2)
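
As a quick check of these definitions, the sketch below computes the estimated mean and the estimated standard deviation (with the usual n - 1 divisor) for the green-bar data from the randomized paint example. The function names are illustrative assumptions.

```python
import math

def sample_mean(values):
    """Estimated mean X of a finite sample."""
    return sum(values) / len(values)

def sample_std(values):
    """Estimated standard deviation S, using the n - 1 divisor."""
    m = sample_mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

# Green bars from the randomized paint experiment.
green = [78.2, 82.1, 79.8, 79.6, 81.1]
mean_g = sample_mean(green)
var_g = sample_std(green) ** 2
print(round(mean_g, 2), round(var_g, 2))  # 80.16 2.23
```

The variance of about 2.23 agrees with the value of S for the green bars used later in the pooled-variance slide.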

43 Basic Statistics, Continued Note: $X$ is an estimate of the actual mean ($\mu$). It becomes closer to $\mu$ with increasing sample size, $n$. $X$ itself is a random estimate of the true mean, $\mu$. For normally distributed data: 68.3% of all data will be within +/- 1$\sigma$; 95.4% of all data will be within +/- 2$\sigma$; 99.7% of all data will be within +/- 3$\sigma$
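
The three percentages for normally distributed data can be reproduced from the error function, since the fraction of a normal population within k standard deviations of the mean is erf(k/√2). This is a small sketch; the function name is an assumption.

```python
import math

def fraction_within(k):
    """Fraction of a normal population within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
```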

44 Hypothesis Testing We want to determine if color of paint had an effect on strength (Red vs. Green, prior example). Hypothesize there is no effect due to paint color (this is the so-called “null hypothesis” or $H_0 = 0$). In other words, we claim that: $\mu_{Red} = \mu_{Green}$. We have sample means ($X_R = 81.9$, $X_G = 80.2$), which are estimates of the true means ($\mu_{Red}$, $\mu_{Green}$), but we can never know the true means exactly.

45 Statistics Assume the deviations are the same ($\sigma_R = \sigma_G$). “Pool” the deviations: $S_p^2 = \frac{(n_R - 1) S_R^2 + (n_G - 1) S_G^2}{(n_R - 1) + (n_G - 1)}$ (Equation 3) For our Paint Color experiment: $n_R = n_G = 5$, $S_R^2 = 1.33$, $S_G^2 = 2.23$; $S_p^2 = \{(5-1)(1.33) + (5-1)(2.23)\} / \{(5-1) + (5-1)\} = 1.78$
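
The pooled variance is a weighted average of the two sample variances, weighted by their degrees of freedom. A minimal sketch of that arithmetic, with an assumed function name:

```python
def pooled_variance(n_a, var_a, n_b, var_b):
    """Pool two sample variances, weighting each by its degrees of freedom (n - 1)."""
    return ((n_a - 1) * var_a + (n_b - 1) * var_b) / ((n_a - 1) + (n_b - 1))

# Paint example: 5 red bars (variance 1.33) and 5 green bars (variance 2.23).
sp2 = pooled_variance(5, 1.33, 5, 2.23)
print(round(sp2, 2))  # 1.78
```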

46 t-test We now define $t_0$, which is from the t-distribution (don’t worry about what that means): $t_0 = \frac{|X_R - X_G|}{\sqrt{S_p^2 (1/n_R + 1/n_G)}}$ (Equation 4) For our example: $t_0 = |81.9 - 80.2| / \{1.78 (1/5 + 1/5)\}^{1/2}$; $t_0 = 2.06$
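
The calculation of t0 can be sketched as below, using the rounded summary values from the slides; the function name is an assumption. Note that the rounded inputs give about 2.01, slightly below the 2.06 quoted on the slide, presumably because the slide value was computed from unrounded means and variances. Either value leads to the same conclusion later on.

```python
import math

def t_statistic(mean_a, mean_b, pooled_var, n_a, n_b):
    """Effect (difference in means) divided by noise (pooled standard error)."""
    return abs(mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Rounded summary values from the paint example.
t0 = t_statistic(81.9, 80.2, 1.78, 5, 5)
print(round(t0, 2))
```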

47 t-test So what is this “$t_0$” number? Notice the “effect” (difference between the two samples) is in the numerator; the variation (“noise,” “error,” or “variance”) is in the denominator. The larger $t_0$ is, the greater the probability that the effect (difference) is real. How large is large?

48 t-test To determine the t-distribution value we need to know the degrees of freedom and select a confidence level. Determine the degrees of freedom in our experiment: DOF = ($n_R$ - 1) + ($n_G$ - 1) = (5 - 1) + (5 - 1) = 8. We need to compare the calculated $t_0$ with tabulated values from the t-distribution with the corresponding degrees of freedom (8) at some level of confidence. Confidence level is our choice, typically 95% or 99%.

49 t-distribution Table We select 95% confidence as our criterion. For 95% confidence, $\alpha = 0.05$. There are 8 degrees of freedom in this experiment. From the t-distribution: $t_{\alpha/2, DOF} = t_{0.05/2, 8} = 2.31$. t-distribution values are obtained from tables in most statistics/experimentation books. Note: $t_{\alpha/2}$ means we are using a 2-sided or 2-tailed test, which is appropriate for the hypothesis $\mu_R = \mu_G$. If we were to ask the question “is $\mu_R > \mu_G$?”, then we would use the single-sided t-table ($t_{\alpha/1, DOF}$).

50 t-test In our paint example $t_0 < t_{\alpha/2, DOF}$ (2.06 < 2.31). $t_0$ is too small to reject the null hypothesis at 95% confidence. Therefore, we fail to reject the null hypothesis ($\mu_{Red} = \mu_{Green}$). This does not mean we are 95% confident that the bars painted red were equal to the green. It means we cannot say with confidence that they are different. Next slide…
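
The decision rule itself is a single comparison. The sketch below takes the tabulated value 2.31 directly from the slides instead of computing it; the function name and the second, larger t0 value are hypothetical and only show the other outcome.

```python
def reject_null(t0, t_critical):
    """True when t0 exceeds the tabulated value, i.e. the difference is statistically significant."""
    return t0 > t_critical

T_CRITICAL_95_DOF8 = 2.31  # two-tailed, alpha = 0.05, 8 degrees of freedom (tabulated value from the slides)

paint_result = reject_null(2.06, T_CRITICAL_95_DOF8)
hypothetical_result = reject_null(2.50, T_CRITICAL_95_DOF8)  # 2.50 is a made-up value for contrast
print(paint_result, hypothetical_result)  # False True
```

A result of False means only that the data do not show a difference at the chosen confidence level, not that the two groups are equal.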

51 95% Confident? Failing to reject the null hypothesis does not imply we are confident the two sets are equal. A “well mixed” box of 100 apples: any of the apples can be either red or green. Null hypothesis: 50 are red, 50 are green Pull 30 apples out: 14 red, 16 are green – would you reject the null hypothesis? New box, pull 30 out: 1 is red, 29 are green – would you reject the null hypothesis?
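
The apple example can be made quantitative. The sketch below is one possible reading, not a calculation from the notes: it assumes the null hypothesis (50 red, 50 green) is true and that the 30 apples are drawn without replacement, so the number of red apples follows a hypergeometric distribution. The function name is an assumption.

```python
from math import comb

def prob_at_most_red(k_max, draws=30, red=50, green=50):
    """P(number of red apples drawn <= k_max), drawing without replacement."""
    total = comb(red + green, draws)
    favorable = sum(comb(red, k) * comb(green, draws - k) for k in range(k_max + 1))
    return favorable / total

p_14_or_fewer = prob_at_most_red(14)  # first box: 14 red out of 30
p_1_or_fewer = prob_at_most_red(1)    # second box: 1 red out of 30
print(p_14_or_fewer, p_1_or_fewer)
```

Drawing 14 or fewer red apples is an ordinary outcome under the null hypothesis, so it gives no reason to reject it, yet it does not prove the box holds exactly 50 of each. Drawing 1 or fewer is so improbable that the null hypothesis can be rejected.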

52 Exercise – t-test Task: conduct an experiment to determine if there is a difference in the fatigue life between two brands of paperclips. Fatigue life is defined in this experiment as number of times the clip can be bent back and forth 90 degrees. One 90 degree bend is one fatigue cycle. If there are 10 or more people in the class, split the class in half (effectively conducting two identical experiments with about 5 data points per paperclip brand in each experiment). Each person in the class should break one of each brand and record their results. Use the worksheets in back of this book.

53 Pairing – a special condition t-test Determining relative magnitudes of the “effect” and “noise” is foundational for statistical analysis of experimental data. “Noise” comes from many sources: differing batches of specimens, differing test or measurement apparatus, operator differences, etc. If we can “filter out” noise, we would be able to perform a more effective analysis.

54 Pairing – a special condition t-test If there is a single source of noise that we can identify and control, we may be able to “pair” the data. For example, if we want to test the wear life of a new alloy, we may conduct an experiment to compare the life of the new alloy with a traditional alloy. Put several of each part on buckets in the field. Due to differing loading conditions, one would expect there to be large variability in wear from one bucket to another. Therefore, the bucket variation will introduce a large amount of noise.

55 Pairing – a special condition t-test Pairing: put one sample of each alloy on each bucket (alternating location (left-right) of the two alloys from bucket to bucket). Determine the difference in life on each bucket between the two alloys. Since each alloy on a given bucket presumably will experience similar loading, pairing will effectively “filter out” the noise contributed by bucket-to-bucket variation. The null hypothesis now becomes $\mu_D = 0$, where $\mu_D$ is the difference in means between the two groups (alloys, in this example).

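56 Pairing – a special condition t-test For the paired t-test: $t_0 = \frac{|X_D|}{S_D / \sqrt{n_D}}$, where $S_D = \sqrt{\frac{\sum_{i=1}^{n_D} (d_i - X_D)^2}{n_D - 1}}$ (Equation 5) $X_D$ is the average of the differences, $n_D$ is the number of pairs of data, $d_i$ is the individual data (difference).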
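
A paired analysis can be sketched as follows, assuming Equation 5 is the usual paired statistic t0 = |X_D| / (S_D / √n_D), with S_D the standard deviation of the differences and n_D - 1 degrees of freedom. The wear-life numbers are hypothetical, invented only to show the arithmetic, and the function name is an assumption.

```python
import math

def paired_t_statistic(group_a, group_b):
    """Return (t0, degrees of freedom) for paired data."""
    diffs = [a - b for a, b in zip(group_a, group_b)]
    n_d = len(diffs)
    x_d = sum(diffs) / n_d  # average of the differences
    s_d = math.sqrt(sum((d - x_d) ** 2 for d in diffs) / (n_d - 1))
    return abs(x_d) / (s_d / math.sqrt(n_d)), n_d - 1

# Hypothetical wear lives (arbitrary units): one part of each alloy on each of five buckets.
new_alloy = [12, 15, 11, 18, 14]
old_alloy = [10, 12, 10, 14, 12]
t0, dof = paired_t_statistic(new_alloy, old_alloy)
print(round(t0, 2), dof)  # 4.71 4
```

Because only the within-bucket differences enter the calculation, the large bucket-to-bucket spread in the raw values does not inflate the noise term.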

57 Exercise - Pairing What about our paperclip experiment? A potentially large source of error was operator-to-operator variability. Each operator tested one of each paperclip, therefore, we can analyze using pairing! How lucky! Using the worksheet in the back of this booklet, re-analyze the paperclip fatigue data using pairing (Exercise 3).

58 Pairing – conclusion Pairing is not always an option (need to be able to identify a single source of noise and then introduce one of each group to the precise same noise). If it is an option – do it! There is no cost other than planning for it. It will increase the power of the conclusion Not uncommon to fail to reject the null hypothesis using the t-test alone, but rejecting it using pairing analysis because of its increased power.

59 Can the t-test mislead us? Yes! We can only make statements about probability! Also, for the t-test to be valid, the data must have normal distribution. There are two types of errors that can be made with hypothesis tests (next slide, please)…

60 Hypothesis Errors Type I: the probability of erroneously rejecting the null hypothesis. Also known as the level of significance (equals $\alpha$). Type II: the probability of erroneously accepting the null hypothesis. The power of the experiment increases with increased number of data points (less likely to make a type II error). These are independent, not complementary (think about the previous “apple” example).

61 Confidence Intervals Rather than asking the question “are they different” we may want to ask the question “how different are they” Confidence Intervals help us answer that question

62 Confidence Intervals What is the likely range of differences between the means of two groups ($\mu_A - \mu_B$)? The interval or range is: $(X_A - X_B) \pm t_{\alpha/2, DOF} \sqrt{S_p^2 (1/n_A + 1/n_B)}$ (Eq’n 6) where $t_{\alpha/2, DOF}$ is based on the level of confidence, and DOF = $n_A + n_B - 2$

63 Confidence Intervals For our example, for 95% confidence intervals, we have: DOF = 5 + 5 – 2 = 8 ($n_G = n_R = 5$); $t_{0.05/2, 8} = 2.31$ (from t-distribution tables); $X_R = 81.9$, $X_G = 80.2$, $S_p^2 = 1.78$ (from previous). Interval = (81.9 - 80.2) +/- (2.31)$\{1.78 (1/5 + 1/5)\}^{1/2}$ = 1.7 +/- 1.95

64 Results The 95% confidence interval is: 1.7 - 1.95 < $\mu_R - \mu_G$ < 1.7 + 1.95, which is: -0.25 < $\mu_R - \mu_G$ < 3.65. We are 95% confident that the true difference in means of these two groups lies somewhere within this interval (-0.25 to 3.65). Since the interval contains zero, we failed to reject the null hypothesis. Would the range increase or decrease for higher levels of confidence?
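
The interval can be reproduced from the summary values with a few lines of Python. This is a sketch with an assumed function name; the tabulated t value (2.31) is taken from the slides.

```python
import math

def confidence_interval(mean_a, mean_b, pooled_var, n_a, n_b, t_critical):
    """Return (low, high) bounds for the difference in true means."""
    half_width = t_critical * math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean_a - mean_b
    return diff - half_width, diff + half_width

low, high = confidence_interval(81.9, 80.2, 1.78, 5, 5, t_critical=2.31)
print(round(low, 2), round(high, 2))  # -0.25 3.65
```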

65 So what? The confidence interval was determined to be: -0.25 < $\mu_R - \mu_G$ < 3.65 (95% level) What if we consider a difference of 2ksi or greater to have engineering significance, what next? What if we consider a difference of 5ksi or greater to have engineering significance, what next?

66 Exercise – confidence interval Using the worksheet in the back (Exercise 4) and the data already obtained, determine the confidence interval for difference in fatigue life (number of bends until fracture) of two paperclip brands (A and B).

67 Summary for Single Variable Experiment Randomize run order (remember, systematic errors = bad). The t-test is used to evaluate “is there an effect?” Pairing can be more powerful; use it if possible. The Confidence Interval determines the likely range of the difference ($\mu_R - \mu_G$)

68 What’s next? We’ve looked at a single variable experiment. What about more complicated conditions: experiments with two or more variables…

69 BREAK TIME! Multi-Variable Experiments to follow

70 Overview of Multi-Variable Experiments “One variable at a time” approach Interactions (what are they?) Terminology Balanced design (what’s this?) Factorial Experiments Practice (optimize a “manufacturing” process)

71 Purpose of experiments? Remember, the purpose of experiments is to answer the question “does changing one or more factors have an effect on the response?” Our job is to answer that question as well as possible using the limited resources available.

72 Multi-Variables Using “One Variable at a Time” Approach “One variable at a time” Very basic experiment Seems intuitive and simple Reality: Difficult to draw meaningful conclusions Poor use of resources Avoid these types of experiments! Example to follow illustrates why

73 One Variable at a Time Example: Determine optimal conditions for the following machining process: Factors: Tool condition (levels: dull or sharp) Cutting depth (levels: 0.005” or 0.010”) Cutting speed (levels: 500rpm and 1000rpm) Response: surface finish

74 One Variable at a Time, Design Test conditions: Run 1, “baseline” or “control” sharp, 0.005”, 500rpm Run 2, vary the tool condition dull, 0.005”, 500rpm Run 3: vary the depth sharp, 0.010”, 500rpm Run 4: vary the speed sharp, 0.005”, 1000rpm

75 One Variable at a Time, Results
Run  Tool   Depth  Speed  Results (surface finish)
1    Sharp  Low    Slow   140rms
2    Dull   Low    Slow   190rms
3    Sharp  Deep   Slow   120rms
4    Sharp  Low    Fast   90rms

76 One Variable at a Time, Conclusion? Results: Run 1 (base) 140; Run 2 (dull) 190; Run 3 (deep) 120; Run 4 (fast) 90. Run 4 produced the best result, but… How much random error was present? Did systematic error influence the results? Best set of variables maybe: Sharp tool? Deep cut? Fast?

77 One Variable at a Time, Conclusion We have only one data point for conditions of “dull”, “deep”, “fast”, but three data points for “sharp”, “low”, “slow” There is no way to estimate errors (most conditions were tested only once) Without estimating the errors, it is difficult to draw valid conclusions. Did not test the “best conditions” together This is “okay” if there are no interactions Could conduct another experiment to validate – but wouldn’t it have been better to do a complete job the first time?

78 Interactions? We’ve identified that there are problems with the “one variable at a time” approach. Before considering better alternatives, we need to understand “interactions.”

79 What are Interactions? An interaction is when changing one factor influences how a different factor will affect the response. Clear? Example: Conduct an experiment to determine which is more effective at keeping your shirt dry in the rain: an umbrella or a raincoat. Experiment: 2 factors: Factor 1, “weather”: rain with wind, rain with no wind Factor 2, “tool”: umbrella, raincoat Response: “wetness”

80 Interaction Example No wind: raincoat and umbrella were effective. Wind: the umbrella was not effective but coat was. There is an “interaction” between “weather” and “tool”
Run  Weather  Tool  Wetness
1    Wind     Umbr  80%
2    Wind     Coat  20%
3    No wind  Umbr  30%
4    No wind  Coat  20%

81 What would no interaction “look like?” Contrast the previous with the following “no interaction” results…

82 No interaction (“tool” had an effect: you’ll get wet if you use an umbrella)
Run  Weather  Tool  Wetness
1    Wind     Umbr  90%
2    Wind     Coat  30%
3    No wind  Umbr  80%
4    No wind  Coat  20%
No interaction (“weather” had an effect: you’ll get wet if it’s windy)
Run  Weather  Tool  Wetness
1    Wind     Umbr  90%
2    Wind     Coat  80%
3    No wind  Umbr  30%
4    No wind  Coat  20%

83 Interactions are easier to see by plotting results. Our three different scenarios: [Three plots of Response (wetness) vs. weather (No Wind, Wind), each with one line for Coat and one for Umbrella. Interaction: response lines not parallel. No interaction: response lines parallel.]

84 Alternative Interaction Plot
Run  Weather  Tool  Interaction  Wetness
1    Wind     Umbr  +            90%
2    Wind     Coat  -            80%
3    No wind  Umbr  -            30%
4    No wind  Coat  +            20%
Plot all 4 interaction values and fit a trend line. If trend line is flat, there is no interaction. See “exercise” in back for more complete discussion. [Plots: Response (wetness) vs. weather (No Wind, Wind) for Coat and Umbrella; Response (wetness) vs. Interaction level (-, +)]
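
The “flat trend line” check amounts to comparing the average response at the (+) interaction level with the average at the (-) level. The sketch below applies this to the wetness tables from the preceding slides. The coding (wind = +1, no wind = -1, umbrella = +1, coat = -1) is an assumption chosen to match the signs in the table above, and the function name is illustrative.

```python
def interaction_effect(runs):
    """runs: list of (level_1, level_2, response), with levels coded +1 / -1.

    Returns mean response at interaction (+) minus mean response at interaction (-).
    """
    plus = [r for a, b, r in runs if a * b > 0]
    minus = [r for a, b, r in runs if a * b < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

# (weather, tool, wetness %) with wind = +1, no wind = -1, umbrella = +1, coat = -1
with_interaction = [(+1, +1, 80), (+1, -1, 20), (-1, +1, 30), (-1, -1, 20)]
weather_only = [(+1, +1, 90), (+1, -1, 80), (-1, +1, 30), (-1, -1, 20)]

print(interaction_effect(with_interaction))  # 25.0
print(interaction_effect(weather_only))      # 0.0
```

A value of zero corresponds to a flat trend line (no interaction); a value far from zero, judged against the noise, suggests an interaction.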

85 Exercise - interactions Determine if interactions exist in the results shown in the exercise in the back of the booklet (Exercise 5).

86 End of Story for “One Variable at a Time” The “machining” example above did not evaluate interactions… …We cannot determine what the best set of conditions is. And not only are we not confident in the results, we have no idea of how “not confident” we are!

87 What’s next? We’re almost ready for some really fun stuff… …but first, some terminology… …and then, explain what “balanced” designs mean. Then we can have fun with designing experiments “the right way!”

88 Terminology Repetition - measuring the same response more than once (or taking another data point) without resetting up the experimental conditions. Decreases measurement errors to a limited degree. Replication - requires completely redoing the experimental conditions. In other words, setting up the conditions as identically as possible to produce another measurement. Very important to estimate the experimental error. It shows the effects of set-up, and other unknown extraneous variables. Replication is NOT the same as repetition, although they sound similar.

89 Terminology, cont. Run - a set of experimental test conditions. All factors are set to specific levels. If I want to measure the boiling point at three pressure levels, I need at least three runs - one with the pressure at each of the 3 levels. Treatment (design point) - a set of experimental conditions. One treatment is conducted each run, but treatments may be replicated in an experiment (may occur more than once).

90 Repetition or Replication? Consider the experiment shown below. How would this experiment be conducted differently if it were to have 2 replicates compared to 2 repetitions?
Run  Tool Sharpness
1    Sharp
2
3    Dull
4

91 Designed Experiments (Design of Experiments, DOE’s) Statistically based methodology of conducting and analyzing experiments Interactions can be evaluated Systematic error can be mitigated by randomization Random error (noise) is mitigated by "balanced" designs since each variable is tested at different levels multiple times. Let’s explain “balanced” design…

92 Balanced Design - What it really means Each factor is tested an equal number of times at each level For each factor setting, all of the other factors are set to each of their levels an equal number of times. The variation of all the other factors does not bias the results. Balanced designs do not necessarily test all possible conditions. Need an example to understand “balanced”…

93 Prior Machining Example The “One Variable at a Time” example was not a balanced design. One level of each variable was tested 3 times, the other level was tested only once.
Run  Tool   Depth  Speed
1    Sharp  Low    Slow
2    Dull   Low    Slow
3    Sharp  Deep   Slow
4    Sharp  Low    Fast

94 Example of Balanced Designs Consider the “machining experiment”: 3 factors, 2 levels each Tool: sharp, dull Depth: deep, low Speed: fast, slow To run every possible combination would require $2^f$ runs, where f is the number of factors (f = 3, $2^3$ = 8). …but we don’t need all 8 conditions for a balanced design… Balanced means…well, we need an example…

95 Example of Balanced Designs This is a balanced design:
Run  Tool   Depth  Speed
1    Dull   Low    Slow
2    Dull   Deep   Fast
3    Sharp  Deep   Slow
4    Sharp  Low    Fast
For each level of one factor, the other factors are tested an equal number of times at each level. Ex: For dull tool, depth is low once and deep once, speed is slow once and fast once, et cetera.

96 Contrast with not balanced
Run  Tool   Depth  Speed
1    Dull   Deep   Slow
2    Dull   Deep   Fast
3    Sharp  Low    Slow
4    Sharp  Low    Fast
NOT BALANCED! All levels tested the same number of times as previous example (twice), but…if tool is dull, then depth is always deep

97 Balanced:
Run  Tool   Depth  Speed
1    Dull   Low    Slow
2    Dull   Deep   Fast
3    Sharp  Deep   Slow
4    Sharp  Low    Fast
Not balanced:
Run  Tool   Depth  Speed
1    Dull   Deep   Slow
2    Dull   Deep   Fast
3    Sharp  Low    Slow
4    Sharp  Low    Fast
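
The balance condition can be checked mechanically. The sketch below is one possible implementation, not a method from the notes: it tests that every factor is run equally often at each level and that, for every pair of factors, each combination of levels occurs equally often. The function name and the tuple representation of a run are assumptions.

```python
from collections import Counter
from itertools import combinations, product

def is_balanced(design):
    """design: list of runs, each a tuple of factor levels (one entry per factor)."""
    n_factors = len(design[0])
    levels = [sorted({run[f] for run in design}) for f in range(n_factors)]
    # Each factor must be tested an equal number of times at each level.
    for f in range(n_factors):
        counts = Counter(run[f] for run in design)
        if len(set(counts.values())) != 1:
            return False
    # For each pair of factors, every combination of levels must occur equally often.
    for a, b in combinations(range(n_factors), 2):
        counts = Counter((run[a], run[b]) for run in design)
        occurrences = {counts[(la, lb)] for la, lb in product(levels[a], levels[b])}
        if len(occurrences) != 1:
            return False
    return True

balanced = [("Dull", "Low", "Slow"), ("Dull", "Deep", "Fast"),
            ("Sharp", "Deep", "Slow"), ("Sharp", "Low", "Fast")]
not_balanced = [("Dull", "Deep", "Slow"), ("Dull", "Deep", "Fast"),
                ("Sharp", "Low", "Slow"), ("Sharp", "Low", "Fast")]
one_at_a_time = [("Sharp", "Low", "Slow"), ("Dull", "Low", "Slow"),
                 ("Sharp", "Deep", "Slow"), ("Sharp", "Low", "Fast")]

print(is_balanced(balanced), is_balanced(not_balanced), is_balanced(one_at_a_time))  # True False False
```

The second design fails the pairwise check because a dull tool never appears with a low depth; the one-variable-at-a-time design fails the first check because each factor is run three times at one level and once at the other.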

98 Exercise – balanced experiments Complete Exercise 6 in the back of this booklet: Create a balanced experiment with two factors at two levels each. Assume 8 runs ($2^2$ = 4 conditions) Do not randomize (for this practice) Factor A: levels: + and – Factor B: levels: + and – Notice “+” and “-” are often used in DOE’s to signify a “high” and “low” level. These are called “coded” levels

99 Review We’ve studied Single Variable experiments (t-test, pairing, Confidence Intervals) We have an understanding of interactions We have an understanding of “balanced” design We are ready to study experiments with multiple factors (factorial experiments)

100 Factorial Experiments – “The Right Way” We will consider only full factorial experiments (experiments where all possible combinations are tested). We will limit our discussion to experiments with two levels per factor. Non-linear results will not be detected. The total number of possible combinations for experiments with multiple factors, all with two levels, is $2^f$, where f is the total number of factors (test variables).
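
Enumerating a full factorial design is a Cartesian product of the factor levels. The sketch below lists all combinations for the three two-level machining factors from the earlier example; the function name and the dictionary layout are assumptions.

```python
from itertools import product

def full_factorial(factors):
    """factors: dict mapping factor name -> tuple of levels. Returns one dict per design point."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]

machining = {
    "tool": ("sharp", "dull"),
    "depth": ("low", "deep"),
    "speed": ("slow", "fast"),
}
design = full_factorial(machining)
print(len(design))  # 8
for point in design:
    print(point)
```

With f two-level factors the list has 2 to the power f entries, so the run count doubles with every added factor.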

101 2-Level Factorial Design Matrix [Figure: design matrices for the $2^1$, $2^2$, $2^3$, and $2^4$ factorial designs]

102 Effects? Remember, experiments answer the question “is there an effect caused by changing factor levels?” The t-test answers this by comparing the difference (effect) to the error (noise), as in $t_0$ (Equation 4). We can do something similar with multi-variable experiments. Example follows…

103 Example, 2 Factors We will use an example to develop our understanding of design and analysis Design an experiment with: Factor 1: Paint color; Levels: Red, Green Factor 2: Operator; Levels: Chris, Terry Response: Yield strength Use: Full factorial (all combinations tested) 3 replicates (each condition tested 3 times)

104 Design To estimate error we need at least 2 replicates (each condition is tested twice) More replicates = better estimate We decide to have 3 replicates (each condition (design point) tested 3 times) We need a balanced design

105 Design Matrix
              Coded levels (+/-)      Non-coded levels
Design Point  Factor 1  Factor 2      Factor 1  Factor 2
1             +         +             Red       Chris
2             -         +             Green     Chris
3             +         -             Red       Terry
4             -         -             Green     Terry
Factor 1, color: (+) = Red; (-) = Green Factor 2, operator: (+) = Chris; (-) = Terry

106 Interactions With this DOE we will be able to analyze the effects of interactions. Interactions are treated as independent factors in the analysis. Two-way interactions (review): The effect of one factor depends upon the level of another. Ex: you will stay dry with an umbrella only if there is no wind, but you will stay dry with a raincoat regardless of wind.

107 Design Matrix
Design Point  1  2  1X2
1             +  +  +
2             -  +  -
3             +  -  -
4             -  -  +
The level of interaction between Factors 1 and 2 (1X2) is the “product” of the coded levels of Factors 1 and 2 { i.e. (+)*(+)=(+); (+)*(-)=(-); (-)*(-)=(+) }
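
With the coded levels stored as +1 and -1, the interaction column is literally the product of the two factor columns. The sketch below rebuilds the four-row design matrix that way; the function name is an assumption.

```python
def design_matrix_2x2():
    """Return rows of (design_point, factor_1, factor_2, interaction) using coded levels +1 / -1."""
    rows = []
    point = 1
    for f2 in (+1, -1):        # Factor 2 changes every two design points
        for f1 in (+1, -1):    # Factor 1 alternates every design point
            rows.append((point, f1, f2, f1 * f2))
            point += 1
    return rows

for row in design_matrix_2x2():
    print(row)
```

The printed rows follow the same order as the table above: (+, +, +), (-, +, -), (+, -, -), (-, -, +).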

108 The randomized run sheet is on next slide Includes 3 replicates (each design point, or set of conditions, is tested 3 times)

109 Randomized Run Order
Run  Design Point    Run  Design Point
1    4               7    4
2    2               8    2
3    3               9    3
4    1               10   2
5    1               11   4
6    3               12   1
The design point defines the test conditions for the run (see previous slides)

110 DOE Results The experiment was conducted following the prescribed randomized run order. The next slide shows the re-organized results and calculates the means We will plot the results Then we will step through the analysis…

111 Results [Table: response for the 3 replicates at each design point, coded levels of the main factors (1, 2) and the interaction (1X2), and averages for the 3 replicates]

112 We are concerned with the averages, not the individual data points (they vary due to noise) Let’s plot the data…graphs are a good way to visualize results…

113 Plot Results of Factor 1 Plot the response values (averages) against the factor level (“-” “+”). The graph shows that the average response when Factor 1 was “-” compared to “+” is not much different. Changing Factor 1 had little effect. [Plot: Response (axis 80 to 90) vs. Factor 1 (Color), levels - and +]

114 Results for Factor 2 The slope of the trend line between the (-) and (+) levels shows that Factor 2 had a large effect on the response. [Plot: Response (axis 80 to 90) vs. Factor 2 (Operator), levels - and +]

115 Results for Interaction (1X2) [Plot: average response (vertical axis, 80 to 90) versus level of Factor 1X2 (interaction between Factors 1 and 2), (-) and (+)] Again, a nearly level trend line indicates little effect due to this factor (1X2). In other words, there is little interaction between Factors 1 and 2.

116 More Rigor (Statistics!) The graphs are useful in terms of giving us a qualitative sense of effects. But as we’ve seen, “averages” are not sufficient. We need a method to quantify our confidence in the effect. Where are we going? t-test is where!

117 Remember the “t-test”? The t-test is used to answer the critical question: is there a statistically significant effect or is the change caused by random noise? In order to answer that question in any experiment, we must compare the “effect” with the “noise.” We must determine both “effect” and “noise” We’ll start with “noise” (error).

118 Nomenclature Let $x_{ij}$ be the response of the $j$th replicate of treatment “i”. Let $X_i$ be the average of all responses within treatment “i”. Let $X_T$ be the average of all responses, total. Let $k$ be the total number of test conditions (design points); “i” goes from 1 to $k$. Let $n_i$ be the number of replicates for treatment “i”. Let $N$ be the total number of tests.

119 Sum of Squares (SS) $SS_{\text{total}} = SS_{\text{within}} + SS_{\text{between}}$. $SS_{\text{within}}$ is due to random noise. $SS_{\text{between}}$ is variation attributed to changing the factor levels. If $SS_{\text{between}}$ is large compared to $SS_{\text{within}}$, then the treatment had an effect. “Sum of Squares” is a measure of variance (Equation 7).
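Written out in the nomenclature of the previous slide, the decomposition takes the standard one-way analysis-of-variance form. The notes' own Equation 7 is not reproduced in this transcript, so the expressions below are the textbook definitions and are only assumed to match it.

```latex
\begin{align*}
SS_{\text{total}}   &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(x_{ij} - X_T\right)^2 \\
SS_{\text{within}}  &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(x_{ij} - X_i\right)^2 \\
SS_{\text{between}} &= \sum_{i=1}^{k} n_i \left(X_i - X_T\right)^2
\end{align*}
```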

120 Example showing how to determine $SS_{\text{within-1}}$ for design point 1: $SS_{\text{within-1}} = (82-83)^2 + (84-83)^2 + (83-83)^2 = 2.0$. Calculate the sum of squares within each treatment (design point) and include it in the table. This is the first step to determine the “noise.”
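The same calculation can be written as a short function and applied to each design point in turn. This is a minimal sketch; `sum_of_squares_within` is a hypothetical name, and only the design point 1 responses quoted on the slide are used.

```python
def sum_of_squares_within(replicates):
    """Sum of squared deviations of each replicate from the design-point mean."""
    mean = sum(replicates) / len(replicates)
    return sum((x - mean) ** 2 for x in replicates)


# Design point 1 from the slide: responses 82, 84, 83 (mean 83).
ss_within_1 = sum_of_squares_within([82, 84, 83])
print(ss_within_1)  # 2.0
```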

121 Sum of Squares “i” goes from 1 to 4 (design points), so $k = 4$, and $n_i = 3$ (number of replicates for the $i$th design point). Determine the sum of squares for each design point (following the example on the previous slide). Sum over all design points: $SS_{\text{within}} = 21.3$.

122 Experimental “noise” $SS_{\text{within}}$ (calculated above) is related to the experimental “noise”, but it is not what we use in the t-test. What do we use? Next slide please…

123 “Noise” (error) for the experiment The mean square error is given as: $\sigma_{mse}^2 = SS_{\text{within}}/(N - 2^f)$, where $N$ = total number of data points (12) and $f$ = number of factors (2). With $SS_{\text{within}} = 21.3$: $\sigma_{mse}^2 = 21.3/(12 - 2^2) = 2.7$. The “standard error” is $\sqrt{4\sigma_{mse}^2/N}$. For our example, standard error $= \sqrt{4(2.7)/12} = 0.9$.
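The two noise quantities can be checked numerically. The sketch below uses the values from the example and the standard-error expression quoted later in the notes, (4σ²/N)^(1/2); the function names are hypothetical.

```python
import math


def mean_square_error(ss_within, n_total, n_factors):
    """sigma_mse^2 = SS_within / (N - 2^f), as defined on the slide."""
    return ss_within / (n_total - 2 ** n_factors)


def standard_error(mse, n_total):
    """Standard error of an effect in a two-level factorial: sqrt(4 * sigma_mse^2 / N)."""
    return math.sqrt(4 * mse / n_total)


mse = mean_square_error(ss_within=21.3, n_total=12, n_factors=2)
se = standard_error(mse, n_total=12)
print(round(mse, 1), round(se, 1))  # 2.7 0.9
```

Carrying the unrounded values forward (about 2.66 and 0.94) avoids compounding rounding error in later steps.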

124 “Noise” (error) for the experiment The standard error (just calculated) is the same for all factors in the experiment: it is the “experimental noise.” Remember, $t_0$ is the ratio between “effect” and “noise”: $t_0$ = effect / standard error.

125 Effect We’ve determined the standard error But what was the effect of various factor levels? Determine the average response for each factor at each level: Determine the average response when factor 1 was (+) and also when it was (-), then do this for factor 2, etc. This procedure requires a balanced design

126 Determine the Effect, Step 1 Factor 1 was (+) for design points 1 and 3: 83.0 + 89.0 = 172.0. Factor 1 was (-) for design points 2 and 4: 83.3 + 90.3 = 173.6. Note the slight rounding differences between the table and the hand calculations.

127 Determining effect: Step 2 and Step 3 The “effect” is the difference in averaged responses for (+) and (-) levels. For Factor 1: $\text{Effect} = |\text{sum}(+) - \text{sum}(-)|/n_+ = |172.0 - 173.6|/2 = 0.8$, where $n_+$ = number of (+) data points (2 in this example). Remember, the average does not tell the whole story! t-test to the rescue!
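The same procedure applies to every column of the design matrix, including the interaction. The sketch below is an illustration: it uses the four design-point averages quoted in Step 1 (83.0, 83.3, 89.0, 90.3), and the dictionary layout and the `effect` helper are assumptions made for the example.

```python
# Average response at each design point, as used in the hand calculations.
averages = {1: 83.0, 2: 83.3, 3: 89.0, 4: 90.3}

# Coded levels per design point: (Factor 1, Factor 2); the interaction is their product.
levels = {1: (+1, +1), 2: (-1, +1), 3: (+1, -1), 4: (-1, -1)}


def effect(column):
    """|sum(+) - sum(-)| / n+ for one column of coded levels keyed by design point."""
    plus = [averages[dp] for dp, sign in column.items() if sign > 0]
    minus = [averages[dp] for dp, sign in column.items() if sign < 0]
    return abs(sum(plus) - sum(minus)) / len(plus)


columns = {
    "Factor 1": {dp: f1 for dp, (f1, f2) in levels.items()},
    "Factor 2": {dp: f2 for dp, (f1, f2) in levels.items()},
    "1X2": {dp: f1 * f2 for dp, (f1, f2) in levels.items()},
}
effects = {name: round(effect(col), 2) for name, col in columns.items()}
print(effects)
```

Dividing by the number of (+) points works here only because the design is balanced, with equal numbers of (+) and (-) points in every column.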

128 t-test Review $t_0$ is the ratio of the “effect” to the “noise”. The larger $t_0$ is, the greater the probability that the factor had a real effect. In the previous table we calculated the effect of all three factors (1, 2, 1X2). The “effect” is the difference in averaged responses for (+) and (-) levels. The noise has already been determined for our example (“standard error”).

129 t-test $t_0$ is equal to Effect / Standard error: $t_0 = \text{Effect} / \sqrt{4\sigma_{mse}^2/N}$. Calculate $t_0$ for each factor (including interactions).

130 Example calculations… For this experiment we’ve calculated the standard error, $\sqrt{4\sigma_{mse}^2/N}$, to be 0.9. For Factor 1, the effect is $|\text{sum}(+) - \text{sum}(-)|/n_+ = |172.0 - 173.6|/2 = 0.8$. Also for Factor 1, $t_0 = 0.8/0.9 = 0.9$. Determine $t_0$ for all factors and interactions.

131 t-distribution We also need to determine the value from the t-distribution table. Degrees of freedom: $\text{DOF} = N - 2^f$, where $N$ = total number of observations (12) and $f$ = number of factors (2), so $\text{DOF} = 12 - 2^2 = 8$. For 95% confidence, from a t-distribution table: $t_{\alpha/2,\,\text{DOF}} = t_{0.05/2,\,8} = 2.31$. This is the same for all factors. Enter in the table…

132 Results

133 Experiment Conclusion The key points from the table:
                                      Factor 1   Factor 2   Factor 1X2
t_0                                   0.9        6.9        0.5
t_{0.05/2, 8}                         2.31       2.31       2.31
Effect? (is t_0 > t_{0.05/2, 8}?)     no         yes        no
Only Factor 2 had a statistically significant effect.
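The comparison in this table can be reproduced in a few lines. The following is a sketch under stated assumptions: the critical value 2.31 is the tabulated figure from the notes, the function name is hypothetical, and the Factor 2 and interaction effects (6.5 and 0.5) are obtained from the design-point averages by the same procedure the notes work through for Factor 1. Because the code carries the unrounded standard error (about 0.94 rather than 0.9), its t0 for Factor 1 comes out near 0.85 instead of the 0.9 shown on the slide; the conclusions do not change.

```python
import math

T_CRITICAL = 2.31  # t_{0.05/2, 8} read from a t-distribution table, as in the notes


def t_statistic(effect, ss_within=21.3, n_total=12, n_factors=2):
    """t0 = effect / standard error, with standard error = sqrt(4 * mse / N)."""
    mse = ss_within / (n_total - 2 ** n_factors)
    return effect / math.sqrt(4 * mse / n_total)


# Factor 1 is worked in the notes; the other two effects follow from the same
# design-point averages (83.0, 83.3, 89.0, 90.3).
effects = {"Factor 1": 0.8, "Factor 2": 6.5, "Factor 1X2": 0.5}

significant = {}
for name, eff in effects.items():
    t0 = t_statistic(eff)
    significant[name] = t0 > T_CRITICAL
    print(f"{name}: t0 = {t0:.2f}, significant? {'yes' if significant[name] else 'no'}")
```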

134 Factorial Experiment, Conclusion The above example shows the basics of analyzing a factorial experiment. DOE software will perform the analysis for you. Usually, an F-test is performed rather than a t-test, but the concept is the same (they are equivalent). We will do an experiment for practice. First, let’s talk about other aspects of an experiment…

135 Planning an Experiment Okay, we now have some idea about DOE “Design” is only a small part of the picture To conduct an experiment properly, much more is required. This typically includes most of the following…

136 Experiment Process Define the problem (write down a problem statement), define the objective (purpose). Determine available resources Determine factors, levels, and response(s) Create the design (number of runs, run order, etc.) Obtain resources $$$, measurement and test equipment, test specimens (have spares), personnel, etc. Create a plan Determine schedule for personnel, equipment, etc. Create a run-sheet. Save all used specimens, identify them. You may need to take another closer look at them later.

137 Practice Experiment Problem: we like “good” popcorn – and we currently can’t make good popcorn. Create an experiment to help solve this problem. As a class, complete the next slide

138 Practice As a class: Write a problem statement Write clear objective of experiment Determine Factors – only two – think carefully about what you select. Determine factor levels (do not worry that some combination of factor levels will produce bad popcorn – this is to be expected) Determine response (may be more than 1). Next slide please…

139 Practice Factors you may have considered: Time Power setting Placement of bag within the microwave Orientation of the bag Brands of popcorn Different microwaves Are “time” and “power setting” independent? Could they be combined and called “energy input”? Resources are limited – we want the most useful information possible.

140 Exercise – full factorial experiment Break into smaller groups (about 5 per group) and design and conduct the experiment and analyze the results. Use the worksheet in the back of this booklet (Exercise 7). After completion, discuss as a class.

141 One last thing “Outliers” may be an issue in an experiment. Unfortunately, if only 1 or 2 data points are observed for a given set of experimental conditions, it is not possible to determine if an outlier exists. Even more detrimental, with few data points a single outlier can dramatically change the sample mean! What to do? Always do a “reality check”: do the results seem reasonable? If not, it may be due to an outlier, OR it may not be an error in any form (your judgment may be off; there is no shame in being surprised by results). Be careful about dismissing what you think is an outlier: it may not be!

142 Limitations to what we’ve done We considered only experiments with: Assumed normal distributions All factors at 2 levels each These factors can be discrete or continuous Response must be continuous, not discrete At least 2 replicates We did not look at “censored” data (such as fatigue data that is terminated after so many cycles even if there was no failure) All Design Points were replicated an equal number of times Full factorial (all possible combinations were tested) Next slide please…

143 Advanced Stuff – But Not Here, Not Now There are more advanced concepts (and surprisingly, these are not necessarily much more complicated to design or analyze.)

144 Life beyond this course Experiments do not actually require having a second replicate to estimate errors (talk to a statistician). Fractionated experiments: experiments in which not all possible combinations are tested. These are very beneficial if there is a large number of factors ($2^f$ gets big fast!). The “cost” is lost knowledge regarding interactions. Experiments can model non-linearity if more than 2 levels per factor are included.
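To see how quickly a two-level full factorial grows, the number of runs can be tabulated against the number of factors. This is a small illustrative sketch; `full_factorial_runs` is a hypothetical helper, and the factor counts in the loop are arbitrary.

```python
def full_factorial_runs(n_factors, n_replicates=1):
    """Number of runs in a two-level full factorial: 2^f design points times replicates."""
    return (2 ** n_factors) * n_replicates


# Columns: factors, runs without replication, runs with 3 replicates.
for f in (2, 3, 5, 8, 10):
    print(f, full_factorial_runs(f), full_factorial_runs(f, n_replicates=3))
```

With 2 factors and 3 replicates this gives the 12 runs used in the worked example; at 10 factors a single replicate already needs 1024 runs.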

145 CONCLUSIONS Experiments require planning! Randomize to mitigate systematic errors No pain, no gain Select factors and their levels carefully May want to “try out” levels (pre-experiment) before beginning a DOE t-test helps answer “is there an effect” Pairing is a good thing – if possible Full factorial designs are effective and efficient for multi-variable experiments

146 GO HOME! Happy Experimenting!
