Basics of Designing Experiments


1 Basics of Designing Experiments
Thursday, October 24, 2013 5:00pm - 7:00pm GLC Room G

2 About Me Graduate student in Virginia Tech Department of Statistics
Enrolled in Master’s program Expected Graduation Date: December 2013 Future: Job in industry Lead Collaborator in LISA On-campus consulting group, led by Dr. Eric Vance and Dr. Chris Franck, with administrative specialist Tonya Pruitt

3 About LISA
What? Laboratory for Interdisciplinary Statistical Analysis
Why? Mission: to provide statistical advice, analysis, and education to Virginia Tech researchers
How? Collaboration requests, Walk-in Consulting, Short Courses
Where? Walk-in Consulting in GLC and various other locations; collaboration meetings typically held in Sandy 312
Who? Graduate students and faculty members in the VT statistics department

4 Requesting a LISA Meeting
Go to Click link for “Collaboration Request Form” Sign into the website using VT PID and password Enter your information ( , college, etc.) Describe your project (project title, research goals, specific research questions, if you have already collected data, special requests, etc.) Contact assigned LISA collaborators as soon as possible to schedule a meeting

5 Agenda
Introduction to Designing Experiments
3 Main Principles: Randomization, Replication, Blocking (Local Control of Error)
EX: Nozzle design and water jet performance
EX: Treatment and leukemia cell gene expression
Factorial experiments

6 INTRODUCTION to Experimental Design

7 Why is Experimental Design important?
MAXIMIZE… Probability of having a successful experiment Information gain from results of an experiment MINIMIZE… Unwanted effects from other sources of variation Cost of experiment if resources are limited

8 Experiment vs. Observational
OBSERVATIONAL STUDY Researcher observes the response of interest under natural conditions EX: Surveys, weather patterns EXPERIMENT Researcher controls variables that have a potential effect on the response of interest Which one helps establish cause-and-effect relationships better?

9 Correlation ≠ Causation

10 EXAMPLE: Impact of Exercise Intensity on Resting Heart Rate
Researcher surveys a sample of individuals to glean information about their intensity of exercise each week and their resting heart rate What type of study is this? Reported Intensity of Exercise each week Resting Heart Rate Ptp 1 Ptp 2 Ptp 3

11 EXAMPLE: Impact of Exercise Intensity on Resting Heart Rate
Researcher finds a sample of individuals, enrolls groups in exercise programs of different intensity levels, and then measures before/after heart rates Treatment Baseline RHR Post Program RHR Ptp 1 Ptp 2 Ptp 3

12 EXAMPLE: Impact of Exercise Intensity on Resting Heart Rate
What are some factors the experimental study can account for that the observational study cannot?

13 Sources of variation Sources of variation are anything that could cause an observation to be different from another observation What are some reasons that measurements of resting heart rate could differ from person to person?

14 Sources of variation There are two main types:
Gender and age are what are known as nuisance factors: we are not interested in their effects on RHR, but they are hard to control What we are interested in is the effect of the intensity of exercise: this source is known as a treatment factor

15 Sources of variation Good rule of thumb: list major and minor sources of variation before collecting data We want our design to minimize the impact of minor sources of variation, and to be able to separate effects of nuisance factors from treatment factors We want the majority of the variability of the data to be explained by the treatment factors

16 Designing the experiment: The Bare Minimum
Response: Resting heart rate (beats per minute) Treatment: Exercise Program Low intensity Moderate Intensity High Intensity

17 Designing the experiment: The Bare Minimum
Some assumptions We will be monitoring the participants’ diet and exercise throughout the study (not relying on self-reporting) We will only enroll participants with high (i.e. unhealthy) resting heart rates so that there is ample room for improvement Participants’ resting heart rate is all measured in the same manner, at the same time (upon waking up)

18 Designing the experiment: The Bare Minimum
Basic Design 36 participants: 24 males, 12 females Every person is assigned to one of the three 8-week exercise programs Resting heart rate is measured at the beginning and end of the 8 weeks What other considerations should we make in designing the experiment?


20 Randomization
What? Random assignment of experimental treatments and order of runs
Why? Often we assume an independent, random distribution of observations and errors; randomization validates this assumption. It averages out the effects of extraneous/lurking variables, and it reduces bias and accusations of bias
How? Depends on the type of experiment

21 Exercise Example 36 participants are randomly assigned to one of the three programs 12 in low intensity, 12 in moderate intensity, 12 in high intensity Like drawing names from a hat to fall into each group Oftentimes computer programs can randomize participants for an experiment
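
The "names from a hat" step can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the course: the participant IDs 1 to 36, the function name `assign_treatments`, and the seed are assumptions made for the example.

```python
import random

def assign_treatments(participants, treatments, seed=None):
    """Randomly split participants into equally sized treatment groups."""
    if len(participants) % len(treatments) != 0:
        raise ValueError("participants must divide evenly among treatments")
    rng = random.Random(seed)      # a seed is used only to make the draw reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)          # the "names in a hat" step
    size = len(shuffled) // len(treatments)
    return {
        trt: sorted(shuffled[i * size:(i + 1) * size])
        for i, trt in enumerate(treatments)
    }

# Hypothetical participant IDs 1..36 and an arbitrary seed.
participants = list(range(1, 37))
groups = assign_treatments(participants, ["low", "moderate", "high"], seed=2013)
for trt, members in groups.items():
    print(trt, members)
```

Each participant lands in exactly one program, and each program receives 12 participants, which matches the balanced design described on this slide.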

22 Exercise Example What if we did not randomize?
Suppose there is some reason behind who comes to volunteer for the study first versus later If we assigned first third to one intensity, second third to another, and so forth, it would be hard to separate the effects of the “early volunteers” and their assigned intensity level Run 1 2 3 4 5 6 7 8 EX1 EX2

23 Completely Randomized Design (CRD)
What we just came up with is called a completely randomized design. Note that in our case, treatments were assigned randomly, but in experiments where a sequence of runs is performed, the order of runs needs to be randomized as well

24 Summary Randomizing the assignment of treatments and/or order of runs accounts for known and unknown differences between subjects. It does not matter if the result does not “look random” (i.e. appears to have some pattern), as long as the order was generated using a proper randomization device


26 Replication
What? Independent repeat runs of each treatment
Why? Improves precision of effect estimation; allows for estimation of error variation and background noise; checks against aberrant results that could lead to misleading conclusions
EX: One person for each treatment. What could go wrong? Give an example of aberrant results

27 Experimental Units (EUs)
We now introduce the term “Experimental Unit” (EU) EU is the “material” to which treatment factors are assigned In our case, each person is an EU This is different from an “Observational Unit” (OU) OU is part of an EU that is measured Multiple OUs within an EU here would be if we took each person’s pulse at his/her neck, at the wrist, etc. and reported these observations

28 Replication Extension to EU
Thus, a treatment is only replicated if it is assigned to a new EU. Taking multiple observations on one EU (i.e. creating more OUs) does not count as replication; this is known as subsampling. Note that treating subsampling as replication increases the chance of incorrect conclusions (pseudoreplication). Variability in multiple measurements on the same EU is measurement error, rather than experimental error
PTP: 1 2 3 4 5 6 7 8
Wrist RHR: 80 69 93 88 77 89 74 79
Neck RHR: 84 65 92 86 81

29 Consequences of Pseudoreplication
Is it bad to take multiple OUs on each EU then? No, often the solution here is to average the measurements from the OUs and treat the average as one observation. What if we don’t do this? We severely underestimate error, and we potentially exaggerate the true treatment differences. What if measurement error is high? Try to improve the measurement process. Revisit the experiment and assess the homogeneity of the EUs, thinking of potential covariates

30 Exercise Example
Use formula: # Reps = # EUs / # Treatments
36 participants, 3 treatments → 36/3 = 12 replications per treatment in the balanced case
The balanced case is preferred because the power of the test to detect a significant effect of our treatment on the response is maximized with equal sample sizes

31 Exercise Example Unbalanced consequences? Suppose the following:
This would lead to better estimation of the high intensity treatment over the other two Thus if you have equal interest in estimating the treatments, try to equally replicate the number of treatment assignments Treatment Low Moderate High # Participants 9 reps 18 reps

32 Summary The number of replications is the number of experimental units to which a treatment is assigned Replicating in an experiment helps us decrease variance and increase precision in estimating treatment effects

33 THREE BASIC PRINCIPLES OF DOE: Blocking (or Local Control of Error)

34 Local Control of Error
What? Any means of improving accuracy of measuring treatment effects in design
Why? Removes sources of nuisance experimental variability; improves precision with which comparisons among factors are made
How? Often through use of blocking (or ANCOVA)

35 Blocking What? A block is a set of relatively homogeneous experimental conditions EX: block on time, proximity of experimental units, or characteristics of experimental units How? Separate randomizations for each block Account for differences in blocks and then compare the treatments

36 Exercise Example Block on gender?
This assumes that males and females have different responses to exercise intensity. We would have the following (balanced) design:
BLOCK 1 (24 males): 8 low, 8 moderate, 8 high
BLOCK 2 (12 females): 4 low, 4 moderate, 4 high
Here, after the participants are blocked into male/female groups, they are then randomly assigned, within each block, to one of three treatment conditions
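
A minimal sketch of "separate randomizations for each block", assuming the gender blocks above. The labels `M1`..`M24` and `F1`..`F12`, the function name `randomize_within_blocks`, and the seed are hypothetical and appear only for illustration.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Run an independent random assignment inside every block."""
    rng = random.Random(seed)
    design = {}
    for block_name, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block_name!r} cannot be split evenly")
        shuffled = list(units)
        rng.shuffle(shuffled)                  # shuffle only within this block
        size = len(shuffled) // len(treatments)
        design[block_name] = {
            trt: sorted(shuffled[i * size:(i + 1) * size])
            for i, trt in enumerate(treatments)
        }
    return design

# Hypothetical participant labels: 24 males and 12 females.
blocks = {
    "male": [f"M{i}" for i in range(1, 25)],
    "female": [f"F{i}" for i in range(1, 13)],
}
design = randomize_within_blocks(blocks, ["low", "moderate", "high"], seed=2013)
for block_name, groups in design.items():
    print(block_name, {trt: len(members) for trt, members in groups.items()})
```

Because the shuffle never mixes units across blocks, every treatment appears 8 times among the males and 4 times among the females, as in the balanced design above.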

37 Exercise Example Block on age?
This assumes that age may influence the effect exercise intensity has on resting heart rate. We would have the following (balanced) design:
BLOCK 1, 18-24 years (24 ptps): 8 low, 8 moderate, 8 high
BLOCK 2, 24-35 years (6 ptps): 2 low, 2 moderate, 2 high
BLOCK 3, 35-50 years (6 ptps): 2 low, 2 moderate, 2 high
Here, after the participants are blocked into their respective age groups, they are then randomly assigned, within each block, to one of three treatment conditions

38 Randomized Complete Block Design (RCBD)
This design is called Generalized RCBD Generalized merely means there are replications involved Here, each treatment appears in each block an equal number of times Benefits of RCBD We can compare the performance of the three treatments (exercise programs) We can account for the variability in gender that might otherwise obscure the treatment effects

39 Summary Blocking is separating EUs into groups with similar characteristics. It allows us to remove a source of nuisance variability and increase our ability to detect treatment differences. Randomization is conducted within each block. Note that we cannot make causal inferences about blocks, only about treatment effects!

40 EXAMPLE: Gene Expression in Leukemia Cells

41 Leukemia Cells Background
Suppose we are interested in how different treatment groups affect gene expression in human leukemia cells. There are three treatment groups: MP (mercaptopurine) only, MP with low dose MTX, and MP with high dose MTX. Each treatment group has 10 obs. What type of design is this?

42 CRD Assumptions and Background
The simplest design assumes that all the EUs are similar and the only major source of variation is the treatments Recall: A CRD randomizes all treatment-EU assignments for the specified number of treatment replications Recall: We want to aim to have a balanced experiment, i.e. equal replications of each treatment

43 Leukemia Cells
As before, we want to randomize which subjects receive which of the three treatments. The data look as follows:
Treatment | Observations
MP ONLY | 334.5 31.6 701 41.2 61.2 69.6 67.5 66.6 120.7 881.9
MP + HDMTX | 919.4 404.2 1024 54.1 62.8 671.6 882.1 354.2 321.9 91.1
MP + LDMTX | 108.4 26.1 240.8 191.1 69.7 242.8 62.7 396.9 23.6 290.4

44 Leukemia Cells – Pre randomization
These EUs should be similar MP only MP + LDMTX MP + HDMTX

45 Leukemia Cells – Post randomization

46 Leukemia Cells in JMP We want to enter the data such that each response has its own row, with the corresponding treatment type. We then choose Analyze → Fit Y by X

47 Leukemia Cells in JMP Choose “GeneExp” for Y, Response
Choose “Treatment” for X, factor

48 Leukemia Cells Visual Analysis
What do you see from this graph (to the left) here? General comments Treatment 3 has a smaller spread of data than the other two Treatment 2 has the highest average “gene expression”, followed by Treatment 1, then Treatment 3 Are the differences substantial?

49 Leukemia Cells Summary of Fit
R-square is a measure of fit. If it is close to 1, a good model is indicated. If it is close to 0, a poor model is indicated In more technical terms, it is the percent of variation in response (gene expression) that can be explained by our predictor (treatment group). Based on this first glance at the summary of fit, what would you conclude?

50 Leukemia Cells ANOVA
Null hypothesis: The treatments have the same means
Test: Is there at least one treatment mean that differs from the rest?
SSTotal = SSTrt + SSError
SSTotal: variation of all observations from the mean of all the data
SSTrt: variation of treatment means from the overall mean
SSError: variation of observations from their respective treatment means

51 Leukemia Cells ANOVA Each of these groups has its own mean. SSTrt compares these means to the overall mean SSError compares each observation to the treatment means SSTotal is the variance visualized from this plot

52 Leukemia Cells: ANOVA If the treatments have a similar effect, then SSTrt will be small (since treatment means are close to overall mean) If the treatments are different, then SSTrt will be large (since more of SSTotal comes from SSTrt, i.e. treatment differences are explaining the variance) ANOVA Table calculates these values and gives us a test statistic (F Ratio) to test for treatment effects

53 Leukemia Cells: ANOVA Under our null hypothesis, F = MSTrt/MSError follows an F-distribution; from this we obtain our p-value. Here Prob > F = 0.0544, which is just over the typical α = 0.05 cutoff
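
The one-way ANOVA quantities discussed on these slides can be reproduced by hand from the leukemia data listed earlier. The sketch below is not the JMP procedure; it recomputes the sums of squares, R-square, and the F ratio directly. The p-value line relies on a closed-form tail probability that holds only when the numerator degrees of freedom equal 2 (three treatments), which is an assumption specific to this example; with more treatments a statistics library would be needed.

```python
from statistics import mean

# Gene expression observations by treatment group (from the data slide).
data = {
    "MP only":    [334.5, 31.6, 701, 41.2, 61.2, 69.6, 67.5, 66.6, 120.7, 881.9],
    "MP + HDMTX": [919.4, 404.2, 1024, 54.1, 62.8, 671.6, 882.1, 354.2, 321.9, 91.1],
    "MP + LDMTX": [108.4, 26.1, 240.8, 191.1, 69.7, 242.8, 62.7, 396.9, 23.6, 290.4],
}

all_obs = [y for ys in data.values() for y in ys]
grand_mean = mean(all_obs)
group_means = {trt: mean(ys) for trt, ys in data.items()}

# Sums of squares: treatment means vs. overall mean, observations vs. own mean.
ss_trt = sum(len(ys) * (group_means[trt] - grand_mean) ** 2 for trt, ys in data.items())
ss_error = sum((y - group_means[trt]) ** 2 for trt, ys in data.items() for y in ys)
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

df_trt = len(data) - 1                 # 3 treatments -> 2
df_error = len(all_obs) - len(data)    # 30 observations -> 27
f_ratio = (ss_trt / df_trt) / (ss_error / df_error)
r_square = ss_trt / ss_total

# Closed-form P(F > f) for an F distribution with 2 numerator df ONLY.
assert df_trt == 2
p_value = (1 + df_trt * f_ratio / df_error) ** (-df_error / 2)

print(group_means)
print(f"R-square = {r_square:.3f}, F = {f_ratio:.3f}, p = {p_value:.4f}")
```

The identity SSTotal = SSTrt + SSError can be checked numerically from the three sums, and a small R-square here reflects how much of the variation stays inside the treatment groups.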

54 Summary of Leukemia Example
Our ANOVA test failed to reject the null hypothesis that the treatment means are the same (p-value = 0.0544). Although the treatment means appeared to be very different (237.58, 478.54, and 165.25 for treatments 1, 2, and 3 respectively), the variation of observations from their respective treatment means was so large that not enough of the variation in SSTotal could be attributed to treatment differences

55 Take a 10 minute break!

56 EXAMPLE: Nozzle Designs and Shape Factor

57 Nozzles & Shapes Background
Suppose we are interested in how nozzle design (5 types here) affects the shape factor in the performance of turbulent water jets However, the jet efflux velocity has been known to influence the shape factor in a way that is hard to control. What is this called? What can we do to account for this source of variation?

58 Nozzles & Shapes Runs
Suppose we only have five nozzles total, one of each type of design. Here is a case where we would randomize run order (rather than treatment) Jet Efflux Velocity (m/s) Block 1 Block 2 Block 3 Block 4 Block 5 Block 6 Nozzle Design Run Order 2 3 1 5 4

59 Nozzles & Shapes Data
The data look as follows: How many replicates do we have per treatment? What type of design is this? Nozzle Design Jet Efflux Velocity (m/s) 11.73 14.37 16.59 20.43 23.46 28.74 1 0.78 0.80 0.81 0.75 0.77 2 0.85 0.92 0.86 0.83 3 0.93 0.95 0.89 4 1.14 0.97 0.98 0.88 5 0.76

60 RCBD Background
Given t treatments and b blocks, a RCBD has one observation per treatment in each block. If we have multiple observations per treatment in each block (replicates), this is a generalized RCBD. Is our nozzle example a RCBD or GRCBD? In a (G)RCBD, blocks represent a restriction on randomization. We want to randomize treatment order within each block. We are essentially running separate CRDs for each block!
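
Randomizing treatment order within each block can be sketched as follows, assuming the five nozzle designs and the six jet efflux velocities from the data slide as blocks. The function name `rcbd_run_orders` and the seed are illustrative assumptions, not part of the original analysis.

```python
import random

def rcbd_run_orders(treatments, blocks, seed=None):
    """Return an independently shuffled run order of all treatments per block."""
    rng = random.Random(seed)
    orders = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)        # a fresh shuffle for each block
        orders[block] = order
    return orders

nozzle_designs = [1, 2, 3, 4, 5]
velocities = [11.73, 14.37, 16.59, 20.43, 23.46, 28.74]   # m/s, one block each

orders = rcbd_run_orders(nozzle_designs, velocities, seed=1981)
for velocity, order in orders.items():
    print(f"{velocity:>5} m/s: run order {order}")
```

Every block contains each nozzle design exactly once, so the result is a plain RCBD layout; only the order of runs inside a block is random.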

61 Nozzles & Shapes in JMP To analyze in JMP, we want to enter the data such that each response is lined up in a different row, with its associated characteristics in the same row. Note: make sure Nozzle Design and Jet Efflux are listed as nominal variables! What will it do if it’s continuous? Again, choose Analyze → Fit Y by X

62 Nozzles & Shapes in JMP Choose “Shape Factor” for Y, Response
Choose “Nozzle Design” for X, factor Choose “Jet Efflux” for Block

63 Nozzles & Shapes Visual Analysis
Always look for a visual pattern first. What do you see in this graph of shape factor against nozzle design? It appears that nozzle design 4 has the highest shape factor, followed by design 3, design 2, design 5, then design 1.

64 Nozzles & Shapes Analysis
From our earlier discussion on R-square values and ANOVA tests, what is your first intuition here?

65 Nozzles & Shapes ANOVA The p-value of Nozzle Design is significant. What does that mean? We have an additional Sum of Squares here. What is it? Are we interested in its effects? The p-value of Jet Efflux is significant. What does that mean?

66 Nozzles & Shapes ANOVA The p-value of Jet Efflux indicates how much we reduced experimental error → this means that blocking was a good idea! Can we do a CRD analysis if we find out blocking was a bad idea (i.e. the p-value of Jet Efflux is high)? No. Because we did not design the experiment using CRD protocol, we cannot conduct the analysis this way. What do you think our next steps should be?

67 Contrasts
Given v treatments and the treatment means τ1…τv: Note: Here, we have 5 treatments, so we would just have our five treatment means τ1, τ2, τ3, τ4 and τ5. A contrast is a specific linear combination of these means whose coefficients sum to zero. For example, if we were comparing treatments 1 and 2, we would have the contrast (1) τ1 + (-1) τ2 + (0) τ3 + (0) τ4 + (0) τ5 = τ1 - τ2

68 Contrasts The most important contrasts include:
Pairwise treatment comparisons Group average comparisons

69 Nozzles & Shapes Means Comparison
ANOVA only tells us if there is at least one pair of nozzle means that differ → conduct pairwise comparisons: τ1 – τ2, τ1 – τ3, τ1 – τ4, τ1 – τ5, τ2 – τ3, τ2 – τ4, τ2 – τ5, τ3 – τ4, τ3 – τ5, τ4 – τ5
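
Each pairwise comparison is a contrast with coefficient +1 on one treatment, -1 on another, and 0 elsewhere. The sketch below enumerates those coefficient vectors for five treatments; the helper names `pairwise_contrasts` and `is_contrast` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations

def pairwise_contrasts(n_treatments):
    """Map each treatment pair (i, j) to its contrast coefficients."""
    contrasts = {}
    for i, j in combinations(range(n_treatments), 2):
        coeffs = [0] * n_treatments
        coeffs[i], coeffs[j] = 1, -1       # tau_i - tau_j
        contrasts[(i + 1, j + 1)] = coeffs # 1-based labels, as on the slides
    return contrasts

def is_contrast(coeffs):
    """A contrast's coefficients must sum to zero."""
    return sum(coeffs) == 0

contrasts = pairwise_contrasts(5)
for pair, coeffs in contrasts.items():
    print(f"tau{pair[0]} - tau{pair[1]}: {coeffs}")
```

With five nozzle designs this yields the ten comparisons listed above, which is also why a multiple-comparison procedure is used rather than ten separate unadjusted tests.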

70 Nozzles & Shapes Means Comparison
We find Nozzle 4 has a higher shape factor than Nozzles 5 and 1 Nozzle 3 has a higher shape factor than only Nozzle 1

71 Summary of Nozzles & Shapes
In this case, blocking was key in reducing experimental error, allowing us to better distinguish whether at least one of the nozzle designs differed from another (ANOVA test). This means differences in jet efflux velocity were causing significant variation in shape factor responses. Tukey’s pairwise comparisons test allowed us to see which specific nozzle designs differed. We found that:
Shape(nozzle 4) > Shape(nozzle 1), Shape(nozzle 5)
Shape(nozzle 3) > Shape(nozzle 1)

72 Introduction to: Factorial Designs

73 CRD Extension: More than one factor
Suppose we have two or more factors, each with 2+ levels/settings, that we want to investigate to see how they affect the response What are some ways we can conduct an experiment? “Best guess” experiments: researchers have practical and theoretical knowledge they use to “set levels” OFAT experiments: vary each factor individually while holding other factors constant at baseline levels Factorial experiments: Factors are varied together, and response of interest is observed at each combination of levels What are the PROs and CONs of each method?

74 CRD Extension: More than one factor
Best Guess Experiments
PRO: experimenters have a good idea of what might work
CON: Can lead to guessing for a long time without guarantee of success
OFAT Experiments
PRO: easy to interpret, and used extensively in practice
CON: Can be inefficient, may not reach optimum solution, and fails to consider interactions (will discuss later)
Factorial Experiment
PRO: Efficient, can detect interactions

75 CRD Extension: Factorial Experiments
In a factorial experiment, treatments are a combination of multiple factors with different levels (i.e. settings) There can be as few as two (common) and as many as desired (though this severely complicates the design) EX: In the Leukemia example, we could alter the experiment to low and high doses of MP and MTX so that there are now four “treatments” MTX level MP level Low High

76 Leukemia Cells – Factorial Design
MP Low MTX Low MP High MTX Low MP Low MTX High MP High MTX High Remember to still randomize these treatments across participants!

77 Leukemia Cells – Factorial Design
Data are collected as follows:
Treatment Combo | Factor A (MP) | Factor B (MTX) | Rep I | Rep II | Rep III
A low, B low | - | - | 88.6 | 122 | 91.2
A high, B low | + | - | 145.2 | 171.8 | 178.9
A low, B high | - | + | 163 | 200.2 | 169.3
A high, B high | + | + | 460.4 | 492.3 | 483.1
How do we analyze a Factorial Experiment?

Given two factors, A and B, with varying number of levels, what do we want to examine to see how A and B affect the response?
Overall mean (of all the data)
Cell means (mean for each treatment combo)
Factor A and B level means
We use the same ANOVA approach, but further decompose SSTrt into pieces for different factors: SSTrt = SSA + SSB + SSAB

79 Factorial ANOVA
Visualize in contingency table:
MP level | MTX Low | MTX High | Factor mean
Low | Cell mean | Cell mean | MP Low Factor Mean
High | Cell mean | Cell mean | MP High Factor Mean
Factor mean | MTX Low Factor Mean | MTX High Factor Mean | Overall Mean

80 Factorial ANOVA
Visualize in contingency table:
MP level | MTX Low | MTX High | Factor mean
Low | 100.6 | 177.5 | 139.05
High | 165.3 | 478.6 | 321.95
Factor mean | 132.95 | 328.05 | 230.5
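
The table entries can be recomputed from the three replicates per treatment combination. This is a small sketch using the factorial data from the earlier slide; the dictionary layout keyed by (MP level, MTX level) is an assumption made for readability.

```python
from statistics import mean

# Replicates keyed by (MP level, MTX level), from the factorial data slide.
reps = {
    ("low", "low"):   [88.6, 122, 91.2],
    ("high", "low"):  [145.2, 171.8, 178.9],
    ("low", "high"):  [163, 200.2, 169.3],
    ("high", "high"): [460.4, 492.3, 483.1],
}
levels = ["low", "high"]

# Cell means: one mean per treatment combination.
cell_means = {combo: mean(ys) for combo, ys in reps.items()}

# Factor level means: average the cell means over the other factor.
mp_means = {mp: mean(cell_means[(mp, mtx)] for mtx in levels) for mp in levels}
mtx_means = {mtx: mean(cell_means[(mp, mtx)] for mp in levels) for mtx in levels}

# Overall mean (the design is balanced, so the mean of cell means suffices).
overall_mean = mean(cell_means.values())

for mp in levels:
    row = [round(cell_means[(mp, mtx)], 2) for mtx in levels]
    print(f"MP {mp}: {row}  factor mean {mp_means[mp]:.2f}")
print({mtx: round(m, 2) for mtx, m in mtx_means.items()}, round(overall_mean, 2))
```

Averaging cell means gives the factor means only because every cell has the same number of replicates; an unbalanced design would need weighted means.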

81 Factorial ANOVA Let’s break down SSTrt into its respective pieces:
SSTrt = SSA + SSB + SSAB SSTrt: Compares cell means to overall mean SSA: Compares A level means to overall mean SSB: Compares B level means to overall mean SSA and SSB test for main effects of factors A and B Main effect: average effect of changing from one level of the factor to another, averaging over all levels of the other factors

82 Factorial ANOVA
Let’s break down SSTrt into its respective pieces: SSTrt = SSA + SSB + SSAB
SSAB: Tests for interaction between A and B
Interaction: when the effect of factor A on the response depends on the level of factor B
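
The decomposition SSTrt = SSA + SSB + SSAB can be verified numerically on the factorial data from the earlier slide. The sketch assumes the balanced 2×2 layout with three replicates per cell and uses the standard balanced-design formulas; variable names such as `ss_a` are illustrative.

```python
from statistics import mean

# Replicates keyed by (A = MP level, B = MTX level).
reps = {
    ("low", "low"):   [88.6, 122, 91.2],
    ("high", "low"):  [145.2, 171.8, 178.9],
    ("low", "high"):  [163, 200.2, 169.3],
    ("high", "high"): [460.4, 492.3, 483.1],
}
levels = ["low", "high"]
n = 3                                   # replicates per cell (balanced)

cell = {combo: mean(ys) for combo, ys in reps.items()}
a_mean = {a: mean(cell[(a, b)] for b in levels) for a in levels}
b_mean = {b: mean(cell[(a, b)] for a in levels) for b in levels}
overall = mean(cell.values())

# SSTrt: cell means vs. overall mean.
ss_trt = n * sum((cell[(a, b)] - overall) ** 2 for a in levels for b in levels)
# SSA / SSB: factor level means vs. overall mean (each level holds 2 * n obs).
ss_a = len(levels) * n * sum((a_mean[a] - overall) ** 2 for a in levels)
ss_b = len(levels) * n * sum((b_mean[b] - overall) ** 2 for b in levels)
# SSAB: what the cell means show beyond the two additive main effects.
ss_ab = n * sum(
    (cell[(a, b)] - a_mean[a] - b_mean[b] + overall) ** 2
    for a in levels for b in levels
)

print(f"SSA = {ss_a:.2f}, SSB = {ss_b:.2f}, SSAB = {ss_ab:.2f}")
print(f"SSTrt = {ss_trt:.2f}, SSA + SSB + SSAB = {ss_a + ss_b + ss_ab:.2f}")
```

A nonzero SSAB means the cell means are not explained by the two main effects alone, which is exactly the interaction described above.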

83 Interaction How to determine an interaction?
Look at behavior of the means as the levels vary

84 Main Effects & Interaction
Main effects and interactions are specific types of important contrasts. Recall from our discussion of contrasts that group average contrasts are common. Let’s suppose the treatment means are as such:
MP level | MTX Low | MTX High
Low | τ1 | τ2
High | τ3 | τ4

85 Main Effects & Interaction
MP level | MTX Low | MTX High
Low | τ1 | τ2
High | τ3 | τ4
Main Effects
MP: ½ (τ3 + τ4 – τ1 – τ2)
MTX: ½ (τ2 + τ4 – τ1 – τ3)
Interaction
MP*MTX: ½ (τ4 + τ1 – τ2 – τ3)
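
As a worked illustration, the three contrasts can be evaluated by plugging in the cell means from the factorial table (100.6, 177.5, 165.3, 478.6) as estimates of τ1 to τ4. Treating the cell means as the τ values is an assumption of this sketch, and the helper `contrast` is hypothetical.

```python
# Cell means used as estimates of the treatment means:
# tau1 = MP low/MTX low, tau2 = MP low/MTX high,
# tau3 = MP high/MTX low, tau4 = MP high/MTX high.
tau = {1: 100.6, 2: 177.5, 3: 165.3, 4: 478.6}

def contrast(coeffs):
    """Evaluate a contrast given {treatment index: coefficient}."""
    if sum(coeffs.values()) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * tau[i] for i, c in coeffs.items())

# Main effects: average response at the high level minus the low level.
mp_effect = 0.5 * contrast({1: -1, 2: -1, 3: 1, 4: 1})
mtx_effect = 0.5 * contrast({1: -1, 2: 1, 3: -1, 4: 1})
# Interaction: half the difference between the two diagonals of the table.
interaction = 0.5 * contrast({1: 1, 2: -1, 3: -1, 4: 1})

print(f"MP main effect:  {mp_effect:.1f}")
print(f"MTX main effect: {mtx_effect:.1f}")
print(f"MP*MTX:          {interaction:.1f}")
```

The MP main effect equals the difference between the MP High and MP Low factor means in the table, and likewise for MTX, which ties the contrast formulas back to the contingency table.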

86 Factorial Design: Summary
One treatment is combination of multiple factors Efficient way to test effect of multiple treatment factors We may extend to more than two factors, but the number of EUs necessarily grows rapidly! Use an interaction plot to help visualize effects Main effects and interactions can be represented through group average contrasts

87 Wrap-Up: Conclusions & Questions

88 Summary of the Short Course
Remember to randomize! Randomize run order and treatments
Remember to replicate! Use multiple EUs for each treatment; it will help you be more accurate in estimating your effects
Remember to block! In the case where you suspect some inherent quality of your experimental units may be causing variation in your response, arrange your experimental units into groups based on similarity in that quality
Remember to contact LISA! For short questions, attend our Walk-in Consulting hours. For research, come before you collect your data for design help

89 The End!

90 References Cheok, M. H., Yang, W., Pui, C. H., Downing, J. R., Cheng, C., Naeve, C. W., Evans, W. E. (2003). Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells. Nature Genetics, 34(1). Theobald, C. (1981). The effect of Nozzle design on the stability and performance of turbulent water jets. Fire Safety Journal, 4(1).
