Presentation on theme: "Basics of Designing Experiments"— Presentation transcript:
1 Basics of Designing Experiments. Thursday, October 24, 2013, 5:00pm - 7:00pm, GLC Room G
2 About Me: Graduate student in the Virginia Tech Department of Statistics. Enrolled in the Master’s program. Expected graduation date: December 2013. Future: job in industry. Lead Collaborator in LISA, the on-campus consulting group led by Dr. Eric Vance and Dr. Chris Franck, with administrative specialist Tonya Pruitt.
3 About LISA. What? Laboratory for Interdisciplinary Statistical Analysis. Why? Mission: to provide statistical advice, analysis, and education to Virginia Tech researchers. How? Collaboration requests, Walk-in Consulting, Short Courses. Where? Walk-in Consulting in GLC and various other locations; collaboration meetings are typically held in Sandy 312. Who? Graduate students and faculty members in the VT statistics department.
4 Requesting a LISA Meeting: Go to the LISA website. Click the link for “Collaboration Request Form”. Sign into the website using your VT PID and password. Enter your information (college, etc.). Describe your project (project title, research goals, specific research questions, whether you have already collected data, special requests, etc.). Contact your assigned LISA collaborators as soon as possible to schedule a meeting.
5 Agenda: Introduction to Designing Experiments. 3 Main Principles: Randomization, Replication, Blocking (Local Control of Error). EX: Nozzle design and water jet performance. EX: Treatment and leukemia cell gene expression. Factorial experiments.
7 Why is Experimental Design important? MAXIMIZE: the probability of having a successful experiment; the information gained from the results of an experiment. MINIMIZE: unwanted effects from other sources of variation; the cost of the experiment if resources are limited.
8 Experiment vs. Observational Study. OBSERVATIONAL STUDY: the researcher observes the response of interest under natural conditions (EX: surveys, weather patterns). EXPERIMENT: the researcher controls variables that have a potential effect on the response of interest. Which one helps establish cause-and-effect relationships better?
10 EXAMPLE: Impact of Exercise Intensity on Resting Heart Rate. A researcher surveys a sample of individuals to glean information about their intensity of exercise each week and their resting heart rate. What type of study is this? Data layout: one row per participant (Ptp 1, Ptp 2, Ptp 3, …) with columns Reported Intensity of Exercise each week and Resting Heart Rate.
11 EXAMPLE: Impact of Exercise Intensity on Resting Heart Rate. A researcher finds a sample of individuals, enrolls groups in exercise programs of different intensity levels, and then measures before/after heart rates. Data layout: one row per participant (Ptp 1, Ptp 2, Ptp 3, …) with columns Treatment, Baseline RHR, and Post Program RHR.
12 EXAMPLE: Impact of Exercise Intensity on Resting Heart Rate What are some factors the experimental study can account for that the observational study cannot?
13 Sources of variation: A source of variation is anything that could cause one observation to differ from another. What are some reasons that measurements of resting heart rate could differ from person to person?
14 Sources of variation: There are two main types. Gender and age are known as nuisance factors: we are not interested in their effects on RHR, but they are hard to control. What we are interested in is the effect of the intensity of exercise: this source is known as a treatment factor.
15 Sources of variation: Good rule of thumb: list major and minor sources of variation before collecting data. We want our design to minimize the impact of minor sources of variation, and to be able to separate the effects of nuisance factors from those of treatment factors. We want the majority of the variability in the data to be explained by the treatment factors.
16 Designing the experiment: The Bare Minimum. Response: resting heart rate (beats per minute). Treatment: exercise program (low intensity, moderate intensity, high intensity).
17 Designing the experiment: The Bare Minimum. Some assumptions: We will be monitoring the participants’ diet and exercise throughout the study (not relying on self-reporting). We will only enroll participants with high (i.e. unhealthy) resting heart rates so that there is ample room for improvement. Every participant’s resting heart rate is measured in the same manner, at the same time (upon waking up).
18 Designing the experiment: The Bare Minimum. Basic design: 36 participants (24 males, 12 females). Every person is assigned to one of the three 8-week exercise programs. Resting heart rate is measured at the beginning and end of the 8 weeks. What other considerations should we make in designing the experiment?
20 Randomization. What? Random assignment of experimental treatments and order of runs. Why? Often we assume an independent, random distribution of observations and errors, and randomization validates this assumption. It averages out the effects of extraneous/lurking variables. It reduces bias and accusations of bias. How? Depends on the type of experiment.
21 Exercise Example: 36 participants are randomly assigned to one of the three programs: 12 in low intensity, 12 in moderate intensity, 12 in high intensity. This is like drawing names from a hat to fall into each group. Oftentimes computer programs can randomize participants for an experiment.
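The "names from a hat" assignment can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the course: the function name `assign_completely_at_random`, the integer participant IDs, and the seed are all assumptions made for the example.

```python
import random

def assign_completely_at_random(participants, treatments, seed=None):
    """Shuffle the participants, then deal them into equal-sized treatment groups."""
    if len(participants) % len(treatments) != 0:
        raise ValueError("a balanced design needs a participant count divisible by the treatment count")
    rng = random.Random(seed)          # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)              # the 'hat': every ordering is equally likely
    per_group = len(shuffled) // len(treatments)
    return {
        trt: sorted(shuffled[i * per_group:(i + 1) * per_group])
        for i, trt in enumerate(treatments)
    }

# Hypothetical participant IDs 1..36 and an arbitrary seed.
participants = list(range(1, 37))
groups = assign_completely_at_random(participants, ["low", "moderate", "high"], seed=2013)
for trt, members in groups.items():
    print(trt, len(members), members)
```

Recording the seed alongside the assignment lets someone else regenerate the same allocation later, which helps when answering questions about bias.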
22 Exercise Example: What if we did not randomize? Suppose there is some reason behind who comes to volunteer for the study first versus later. If we assigned the first third to one intensity, the second third to another, and so forth, it would be hard to separate the effect of being an “early volunteer” from the effect of the assigned intensity level. Table: runs 1 through 8, … under two example assignment schemes (EX1, EX2).
23 Completely Randomized Design (CRD): What we just came up with is called a completely randomized design. Note that in our case treatments were assigned randomly, but in experiments where a sequence of runs is performed, the order of the runs needs to be randomized as well.
24 Summary: Randomizing the assignment of treatments and/or the order of runs accounts for known and unknown differences between subjects. It does not matter if the result does not “look random” (i.e. appears to have some pattern), as long as the order was generated using a proper randomization device.
26 Replication. What? Independent repeat runs of each treatment. Why? Improves the precision of effect estimation. Allows for estimation of error variation and background noise. Provides a check against aberrant results that could lead to misleading conclusions. EX: One person for each treatment. What could go wrong? Give an example of an aberrant result.
27 Experimental Units (EUs): We now introduce the term “Experimental Unit” (EU). An EU is the “material” to which treatment factors are assigned. In our case, each person is an EU. This is different from an “Observational Unit” (OU). An OU is the part of an EU that is measured. Multiple OUs within an EU here would arise if we took each person’s pulse at the neck, at the wrist, etc. and reported all of these observations.
28 Replication Extension to EU: Thus, a treatment is only replicated if it is assigned to a new EU. Taking multiple observations on one EU (i.e. creating more OUs) does not count as replication; this is known as subsampling. Note that treating subsampling as replication increases the chance of incorrect conclusions (pseudoreplication). Variability among multiple measurements on one EU is measurement error, rather than experimental error.
PTP: 1, 2, 3, 4, 5, 6, 7, 8, …
Wrist RHR: 80, 69, 93, 88, 77, 89, 74, 79
Neck RHR: 84, 65, 92, 86, 81
29 Consequences of Pseudoreplication: Is it bad to take multiple OUs on each EU then? No, often the solution here is to average the measurements from the OUs and treat the average as one observation. What if we don’t do this? We severely underestimate error, and we potentially exaggerate the true treatment differences. What if measurement error is high? Try to improve the measurement process. Revisit the experiment and assess the homogeneity of the EUs, thinking of potential covariates.
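The averaging step can be sketched as follows. The wrist and neck readings are the first four values of each row on the subsampling slide; pairing them by position (first neck value with participant 1, and so on) is an assumption, and the variable names are made up for the example.

```python
from statistics import mean

# Wrist and neck readings (bpm) per participant. Pairing the two rows of the
# slide by position is an assumption made for this sketch.
readings = {
    1: [80, 84],
    2: [69, 65],
    3: [93, 92],
    4: [88, 86],
}

# Collapse the observational units (OUs) into one value per experimental unit (EU).
eu_values = {ptp: mean(obs) for ptp, obs in readings.items()}

n_observational_units = sum(len(obs) for obs in readings.values())
n_experimental_units = len(eu_values)

print(eu_values)
print("OUs:", n_observational_units, "EUs used as replicates:", n_experimental_units)
```

The analysis then proceeds with four observations, not eight: the sample size that drives the error estimate is the number of EUs.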
30 Exercise Example: Use the formula # Reps = # EUs / # Treatments. With 36 participants and 3 treatments, 36/3 = 12 replications per treatment in the balanced case. The balanced case is preferred because the power of the test to detect a significant effect of our treatment on the response is maximized with equal sample sizes.
31 Exercise Example: Unbalanced consequences? Suppose the following: Low, 9 reps; Moderate, 9 reps; High, 18 reps. This would lead to better estimation of the high intensity treatment than of the other two. Thus, if you have equal interest in estimating each of the treatments, try to replicate the treatment assignments equally.
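A rough way to see the effect of unequal replication is to compare the standard error of each treatment mean, sigma / sqrt(n). The sketch below assumes independent EUs and a common error standard deviation, set to 1 purely for illustration; `se_of_mean` and the dictionary names are hypothetical.

```python
from math import sqrt

def se_of_mean(n_reps, sigma=1.0):
    """Standard error of one treatment mean from n_reps independent EUs."""
    return sigma / sqrt(n_reps)

# Replications per treatment: the balanced design and the unbalanced one above.
designs = {
    "balanced": {"low": 12, "moderate": 12, "high": 12},
    "unbalanced": {"low": 9, "moderate": 9, "high": 18},
}

se_table = {
    name: {trt: se_of_mean(n) for trt, n in reps.items()}
    for name, reps in designs.items()
}

for name, ses in se_table.items():
    print(name, {trt: round(se, 3) for trt, se in ses.items()})
```

In the unbalanced design the high intensity mean is estimated more precisely than in the balanced design, while the low and moderate means are estimated less precisely.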
32 Summary: The number of replications is the number of experimental units to which a treatment is assigned. Replicating in an experiment helps us decrease variance and increase precision in estimating treatment effects.
33 THREE BASIC PRINCIPLES OF DOE: Blocking (or Local Control of Error)
34 Local Control of Error. What? Any means of improving the accuracy of measuring treatment effects in a design. Why? Removes sources of nuisance experimental variability. Improves the precision with which comparisons among factors are made. How? Often through the use of blocking (or ANCOVA).
35 Blocking. What? A block is a set of relatively homogeneous experimental conditions. EX: block on time, proximity of experimental units, or characteristics of experimental units. How? Perform a separate randomization for each block. Account for differences between blocks and then compare the treatments.
36 Exercise Example: Block on gender? This assumes that males and females have different responses to exercise intensity. We would have the following (balanced) design. Here, after the participants are blocked into male/female groups, they are randomly assigned to one of the three treatment conditions.
BLOCK 1 (24 males): 8 low, 8 moderate, 8 high
BLOCK 2 (12 females): 4 low, 4 moderate, 4 high
37 Exercise Example: Block on age? This assumes that age may influence the effect exercise intensity has on resting heart rate. We would have the following (balanced) design. Here, after the participants are blocked into their respective age groups, they are randomly assigned to one of the three treatment conditions.
BLOCK 1 (18-24 years, 24 ptps): 8 low, 8 moderate, 8 high
BLOCK 2 (24-35 years, 6 ptps): 2 low, 2 moderate, 2 high
BLOCK 3 (35-50 years, 6 ptps): 2 low, 2 moderate, 2 high
38 Randomized Complete Block Design (RCBD): This design is called a Generalized RCBD. “Generalized” merely means there are replications involved. Here, each treatment appears in each block an equal number of times. Benefits of RCBD: We can compare the performance of the three treatments (exercise programs). We can account for the variability in gender that might otherwise obscure the treatment effects.
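Randomizing separately within each block can be sketched like this, using the gender-blocked layout (24 males, 12 females). The function `randomize_within_blocks`, the unit labels such as `M1` and `F1`, and the seed are assumptions for illustration only.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Run one independent, balanced randomization inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block_name!r} cannot be split evenly across treatments")
        reps = len(units) // len(treatments)
        # One label per unit, each treatment repeated `reps` times, then shuffled.
        labels = [trt for trt in treatments for _ in range(reps)]
        rng.shuffle(labels)
        for unit, label in zip(units, labels):
            assignment[unit] = (block_name, label)
    return assignment

# Hypothetical unit labels for the two gender blocks.
blocks = {
    "male": [f"M{i}" for i in range(1, 25)],
    "female": [f"F{i}" for i in range(1, 13)],
}
assignment = randomize_within_blocks(blocks, ["low", "moderate", "high"], seed=2013)

# Tally how many units received each treatment within each block.
counts = {}
for block_name, label in assignment.values():
    counts[(block_name, label)] = counts.get((block_name, label), 0) + 1
print(counts)
```

Because the shuffle happens inside each block, every treatment appears 8 times among the males and 4 times among the females, which is the generalized RCBD layout described above.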
39 Summary: Blocking is separating EUs into groups with similar characteristics. It allows us to remove a source of nuisance variability and increase our ability to detect treatment differences. Randomization is conducted within each block. Note that we cannot make causal inferences about blocks, only about treatment effects!
41 Leukemia Cells Background: Suppose we are interested in how different treatment groups affect gene expression in human leukemia cells. There are three treatment groups: MP only, MP with low dose MTX, and MP with high dose MTX (MP = mercaptopurine). Each treatment group has 10 observations. What type of design is this?
42 CRD Assumptions and Background: The simplest design assumes that all the EUs are similar and the only major source of variation is the treatments. Recall: A CRD randomizes all treatment-EU assignments for the specified number of treatment replications. Recall: We want to aim for a balanced experiment, i.e. equal replications of each treatment.
43 Leukemia Cells: As before, we want to randomize which subjects receive which of the three treatments. The data look as follows (10 observations per treatment):
MP ONLY: 334.5, 31.6, 701, 41.2, 61.2, 69.6, 67.5, 66.6, 120.7, 881.9
MP + HDMTX: 919.4, 404.2, 1024, 54.1, 62.8, 671.6, 882.1, 354.2, 321.9, 91.1
MP + LDMTX: 108.4, 26.1, 240.8, 191.1, 69.7, 242.8, 62.7, 396.9, 23.6, 290.4
44 Leukemia Cells, Pre-randomization: These EUs should be similar. Figure: EUs to be assigned to MP only, MP + LDMTX, and MP + HDMTX.
46 Leukemia Cells in JMP: We want to enter the data such that each response has its own row, with the corresponding treatment type. We then choose Analyze > Fit Y by X.
47 Leukemia Cells in JMP: Choose “GeneExp” for Y, Response. Choose “Treatment” for X, Factor.
48 Leukemia Cells Visual Analysis: What do you see in this graph? General comments: Treatment 3 has a smaller spread of data than the other two. Treatment 2 has the highest average “gene expression”, followed by Treatment 1, then Treatment 3. Are the differences substantial?
49 Leukemia Cells Summary of Fit: R-square is a measure of fit. If it is close to 1, a good model is indicated. If it is close to 0, a poor model is indicated. In more technical terms, it is the percent of variation in the response (gene expression) that can be explained by our predictor (treatment group). Based on this first glance at the summary of fit, what would you conclude?
50 Leukemia Cells ANOVA: Null hypothesis: The treatments have the same means. Test: Is there at least one treatment effect that is different from the rest? SSTotal = SSTrt + SSError, where SSTotal is the variation of all observations about the mean of all the data, SSTrt is the variation of the treatment means about the overall mean, and SSError is the variation of the observations about their respective treatment means.
51 Leukemia Cells ANOVA: Each of these groups has its own mean. SSTrt compares these means to the overall mean. SSError compares each observation to its treatment mean. SSTotal is the total variation visualized in this plot.
52 Leukemia Cells: ANOVA: If the treatments have a similar effect, then SSTrt will be small (since the treatment means are close to the overall mean). If the treatments are different, then SSTrt will be large (since more of SSTotal comes from SSTrt, i.e. treatment differences are explaining the variation). The ANOVA table calculates these values and gives us a test statistic (F Ratio) to test for treatment effects.
53 Leukemia Cells: ANOVA: Under our null hypothesis, F = MSTrt/MSError follows an F-distribution; from this we obtain our p-value. Here Prob > F = 0.0544, which is just over the typical α = 0.05 cutoff.
54 Summary of Leukemia Example: Our ANOVA test failed to reject the null hypothesis that the treatment means are the same (p-value = 0.0544). Although the treatment means appeared to be very different (237.58, 478.54, and 165.25 for treatments 1, 2, and 3 respectively), the variation of the observations about their respective treatment means was so large that not enough of the variation in SSTotal could be attributed to treatment differences.
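The one-way ANOVA quantities can be reproduced by hand in Python as a check on the JMP output. This is a sketch, not the JMP computation itself: it reads the data slide as ten values per group, and it obtains the p-value from a closed form for the F survival function that holds only when the numerator has 2 degrees of freedom (three treatments).

```python
from statistics import mean

# Gene expression values, read from the data slide as ten observations per group.
data = {
    "MP only":    [334.5, 31.6, 701.0, 41.2, 61.2, 69.6, 67.5, 66.6, 120.7, 881.9],
    "MP + HDMTX": [919.4, 404.2, 1024.0, 54.1, 62.8, 671.6, 882.1, 354.2, 321.9, 91.1],
    "MP + LDMTX": [108.4, 26.1, 240.8, 191.1, 69.7, 242.8, 62.7, 396.9, 23.6, 290.4],
}

all_obs = [y for ys in data.values() for y in ys]
grand_mean = mean(all_obs)
group_means = {g: mean(ys) for g, ys in data.items()}

# SSTotal = SSTrt + SSError
ss_trt = sum(len(ys) * (group_means[g] - grand_mean) ** 2 for g, ys in data.items())
ss_error = sum((y - group_means[g]) ** 2 for g, ys in data.items() for y in ys)
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

df_trt = len(data) - 1               # 2
df_error = len(all_obs) - len(data)  # 27
f_ratio = (ss_trt / df_trt) / (ss_error / df_error)
r_square = ss_trt / ss_total

# P(F > f) for an F(2, d) distribution is (1 + 2 f / d) ** (-d / 2).
# This shortcut is valid only because df_trt == 2.
p_value = (1 + 2 * f_ratio / df_error) ** (-df_error / 2)

print({g: round(m, 2) for g, m in group_means.items()})
print("F =", round(f_ratio, 3), "R-square =", round(r_square, 3), "p =", round(p_value, 4))
```

With more than three treatments the closed form no longer applies, and a statistics library or an F table would be needed for the p-value.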
57 Nozzles & Shapes Background: Suppose we are interested in how nozzle design (5 types here) affects the shape factor in the performance of turbulent water jets. However, the jet efflux velocity is known to influence the shape factor in a way that is hard to control. What is this called? What can we do to account for this source of variation?
58 Nozzles & Shapes Runs: Suppose we only have five nozzles total, one of each type of design. Here is a case where we would randomize run order (rather than treatment assignment). Table: six jet efflux velocity (m/s) blocks, Block 1 to Block 6; example nozzle design run order: 2, 3, 1, 5, 4.
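59 Nozzles & Shapes Data: The data look as follows. How many replicates do we have per treatment? What type of design is this?
Jet efflux velocity (m/s), one column per block: 11.73, 14.37, 16.59, 20.43, 23.46, 28.74
Shape factor values recovered for each nozzle design (some cells of the table are missing):
Design 1: 0.78, 0.80, 0.81, 0.75, 0.77
Design 2: 0.85, 0.92, 0.86, 0.83
Design 3: 0.93, 0.95, 0.89
Design 4: 1.14, 0.97, 0.98, 0.88
Design 5: 0.76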
60 RCBD Background: Given t treatments and b blocks, an RCBD has one observation per treatment in each block. If we have multiple observations per treatment in each block (replicates), this is a generalized RCBD. Is our nozzle example an RCBD or a GRCBD? In a (G)RCBD, blocks represent a restriction on randomization. We want to randomize treatment order within each block. We are essentially running separate CRDs for each block!
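For the nozzle experiment, "separate CRDs for each block" amounts to drawing a fresh run order of the five nozzle designs at each jet efflux velocity. A minimal sketch, with an arbitrary seed and made-up variable names:

```python
import random

velocities = [11.73, 14.37, 16.59, 20.43, 23.46, 28.74]  # m/s, one block each
nozzle_designs = [1, 2, 3, 4, 5]

rng = random.Random(1981)  # arbitrary seed, used only for reproducibility
run_orders = {}
for velocity in velocities:
    order = nozzle_designs[:]   # copy so each block starts from the same list
    rng.shuffle(order)          # independent shuffle for this block
    run_orders[velocity] = order

for velocity, order in run_orders.items():
    print(f"{velocity:>6} m/s: run nozzles in order {order}")
```

Each block contains every nozzle design exactly once, so this corresponds to an RCBD with one observation per treatment per block.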
61 Nozzles & Shapes in JMP: To analyze in JMP, we want to enter the data such that each response is in a different row, with its associated characteristics in the same row. Note: make sure Nozzle Design and Jet Efflux are listed as nominal variables! What will JMP do if they are continuous? Again, choose Analyze > Fit Y by X.
62 Nozzles & Shapes in JMP: Choose “Shape Factor” for Y, Response. Choose “Nozzle Design” for X, Factor. Choose “Jet Efflux” for Block.
63 Nozzles & Shapes Visual Analysis Always look for a visual pattern first. What do you see in this graph of shape factor against nozzle design?It appears that nozzle design 4 has the highest shape factor, followed by design 3, design 2, design 5, then design 1.
64 Nozzles & Shapes Analysis From our earlier discussion on R-square values and ANOVA tests, what is your first intuition here?
65 Nozzles & Shapes ANOVA: The p-value of Nozzle Design is significant. What does that mean? We have an additional Sum of Squares here. What is it? Are we interested in its effects? The p-value of Jet Efflux is significant. What does that mean?
66 Nozzles & Shapes ANOVA: The p-value of Jet Efflux indicates whether blocking reduced experimental error. Here it is significant, which means that blocking was a good idea! Can we do a CRD analysis if we find out blocking was a bad idea (i.e. the p-value of Jet Efflux is high)? No. Because we did not design the experiment using CRD protocol, we cannot conduct the analysis this way. What do you think our next steps should be?
67 Contrasts: Given v treatments and the treatment means τ1, …, τv, a contrast is a specific linear combination of these means whose coefficients sum to zero. Note: Here, we have 5 treatments, so we have five treatment means τ1, τ2, τ3, τ4, and τ5. For example, if we were comparing treatments 1 and 2, we would have the contrast (1)τ1 + (−1)τ2 + (0)τ3 + (0)τ4 + (0)τ5 = τ1 − τ2.
68 Contrasts: The most important contrasts include pairwise treatment comparisons and group average comparisons.
69 Nozzles & Shapes Means Comparison: ANOVA only tells us whether at least one pair of nozzle means differs, so we conduct pairwise comparisons: τ1 − τ2, τ1 − τ3, τ1 − τ4, τ1 − τ5, τ2 − τ3, τ2 − τ4, τ2 − τ5, τ3 − τ4, τ3 − τ5, τ4 − τ5.
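The ten pairwise comparisons can be written out as contrast coefficient vectors. The helper name `pairwise_contrasts` is hypothetical; the sketch only enumerates the coefficients and does not perform any test.

```python
from itertools import combinations

def pairwise_contrasts(n_treatments):
    """Coefficient vector for tau_i - tau_j, for every pair i < j (1-based labels)."""
    contrasts = {}
    for i, j in combinations(range(n_treatments), 2):
        coeffs = [0] * n_treatments
        coeffs[i], coeffs[j] = 1, -1
        contrasts[(i + 1, j + 1)] = coeffs
    return contrasts

contrasts = pairwise_contrasts(5)
for (i, j), coeffs in contrasts.items():
    print(f"tau{i} - tau{j}: {coeffs}")
```

Every vector sums to zero, which is what makes it a contrast, and with five treatments there are 5 choose 2 = 10 of them.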
70 Nozzles & Shapes Means Comparison: We find that Nozzle 4 has a higher shape factor than Nozzles 5 and 1. Nozzle 3 has a higher shape factor than only Nozzle 1.
71 Summary of Nozzles & Shapes: In this case, blocking was key in reducing experimental error, allowing us to better distinguish whether at least one of the nozzle designs differed from another (ANOVA test). This means differences in jet efflux velocity were causing significant variation in shape factor responses. Tukey’s pairwise comparisons test allowed us to see which specific nozzle designs differed. We found that: Shape(nozzle 4) > Shape(nozzle 1), Shape(nozzle 5); Shape(nozzle 3) > Shape(nozzle 1).
73 CRD Extension: More than one factor. Suppose we have two or more factors, each with 2+ levels/settings, and we want to see how they affect the response. What are some ways we can conduct an experiment? “Best guess” experiments: researchers use their practical and theoretical knowledge to “set levels”. OFAT (one-factor-at-a-time) experiments: vary each factor individually while holding the other factors constant at baseline levels. Factorial experiments: factors are varied together, and the response of interest is observed at each combination of levels. What are the PROs and CONs of each method?
74 CRD Extension: More than one factor. Best Guess Experiments: PRO: experimenters have a good idea of what might work. CON: can lead to guessing for a long time without guarantee of success. OFAT Experiments: PRO: easy to interpret, and used extensively in practice. CON: can be inefficient, may not reach the optimum solution, and fails to consider interactions (discussed later). Factorial Experiments: PRO: efficient, can detect interactions.
75 CRD Extension: Factorial Experiments. In a factorial experiment, treatments are combinations of multiple factors with different levels (i.e. settings). There can be as few as two factors (common) and as many as desired (though this severely complicates the design). EX: In the Leukemia example, we could alter the experiment to use low and high doses of MP and MTX so that there are now four “treatments”, arranged as a 2 × 2 layout of MP level (Low, High) by MTX level (Low, High).
76 Leukemia Cells, Factorial Design: The four treatments are MP Low with MTX Low, MP High with MTX Low, MP Low with MTX High, and MP High with MTX High. Remember to still randomize these treatments across participants!
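77 Leukemia Cells, Factorial Design: Data are collected as follows, with Factor A = MP and Factor B = MTX, each at a low (−) and high (+) level. How do we analyze a Factorial Experiment?
Treatment combo: Rep I, Rep II, Rep III
A low, B low (A −, B −): 88.6, 122, 91.2
A high, B low (A +, B −): 145.2, 171.8, 178.9
A low, B high (A −, B +): 163, 200.2, 169.3
A high, B high (A +, B +): 460.4, 492.3, 483.1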
78 Factorial ANOVA: Given two factors, A and B, with varying numbers of levels, what do we want to examine to see how A and B affect the response? The overall mean (of all the data), the cell means (the mean for each treatment combo), and the factor A and B level means. We use the same ANOVA approach, but further decompose SSTrt into pieces for the different factors: SSTrt = SSA + SSB + SSAB.
79 Factorial ANOVA: Visualize in a contingency table with MP level (Low, High) as rows and MTX level (Low, High) as columns. Each interior cell holds a cell mean, the row margins hold the MP Low and MP High factor means, the column margins hold the MTX Low and MTX High factor means, and the corner holds the overall mean.
81 Factorial ANOVA: Let’s break down SSTrt into its respective pieces: SSTrt = SSA + SSB + SSAB. SSTrt compares the cell means to the overall mean. SSA compares the A level means to the overall mean. SSB compares the B level means to the overall mean. SSA and SSB test for the main effects of factors A and B. Main effect: the average effect of changing from one level of the factor to another, averaging over all levels of the other factors.
82 Factorial ANOVA: Continuing the breakdown SSTrt = SSA + SSB + SSAB, SSAB tests for interaction between A and B. Interaction: the way factor A affects the response depends on the level of factor B.
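The decomposition SSTrt = SSA + SSB + SSAB can be checked numerically on the 2 × 2 leukemia factorial data (three replicates per cell). This sketch uses the usual balanced-design formulas; the variable names are assumptions, and the cells are keyed as (MP level, MTX level).

```python
from statistics import mean

# (MP level, MTX level) -> three replicates, as read from the factorial data slide.
cells = {
    ("low", "low"):   [88.6, 122.0, 91.2],
    ("high", "low"):  [145.2, 171.8, 178.9],
    ("low", "high"):  [163.0, 200.2, 169.3],
    ("high", "high"): [460.4, 492.3, 483.1],
}
levels = ["low", "high"]
n = 3  # replicates per cell (balanced design)

grand = mean([y for ys in cells.values() for y in ys])
cell_mean = {k: mean(ys) for k, ys in cells.items()}
a_mean = {a: mean([y for (ai, _), ys in cells.items() if ai == a for y in ys]) for a in levels}
b_mean = {b: mean([y for (_, bi), ys in cells.items() if bi == b for y in ys]) for b in levels}

# Balanced two-factor sums of squares.
ss_trt = n * sum((cell_mean[k] - grand) ** 2 for k in cells)
ss_a = n * len(levels) * sum((a_mean[a] - grand) ** 2 for a in levels)
ss_b = n * len(levels) * sum((b_mean[b] - grand) ** 2 for b in levels)
ss_ab = n * sum(
    (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
    for a in levels for b in levels
)

print("SSTrt:", round(ss_trt, 1))
print("SSA + SSB + SSAB:", round(ss_a + ss_b + ss_ab, 1))
```

The identity holds exactly only for balanced data; with unequal cell sizes the pieces generally do not add up this way.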
83 Interaction How to determine an interaction? Look at behavior of the means as the levels vary
84 Main Effects & Interaction: Main effects and interactions are specific types of important contrasts. Recall from our discussion of contrasts that group average contrasts are common. Let’s suppose the treatment means are laid out as follows:
MP Low: τ1 (MTX Low), τ2 (MTX High)
MP High: τ3 (MTX Low), τ4 (MTX High)
85 Main Effects & Interaction: With the treatment means τ1, τ2, τ3, τ4 laid out as above, the main effects are MP: ½ (τ3 + τ4 − τ1 − τ2) and MTX: ½ (τ2 + τ4 − τ1 − τ3). The interaction is MP*MTX: ½ (τ4 + τ1 − τ2 − τ3).
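As an illustration, the three contrasts can be evaluated with the cell means of the factorial leukemia data standing in for τ1 to τ4. The mapping of cells to τ labels follows the layout implied by the formulas (τ1 = MP low, MTX low; τ2 = MP low, MTX high; τ3 = MP high, MTX low; τ4 = MP high, MTX high); the variable names are hypothetical.

```python
from statistics import mean

# Sample cell means used as estimates of the treatment means tau1..tau4.
tau1 = mean([88.6, 122.0, 91.2])     # MP low,  MTX low
tau2 = mean([163.0, 200.2, 169.3])   # MP low,  MTX high
tau3 = mean([145.2, 171.8, 178.9])   # MP high, MTX low
tau4 = mean([460.4, 492.3, 483.1])   # MP high, MTX high

# Group average contrasts.
mp_effect = 0.5 * (tau3 + tau4 - tau1 - tau2)     # high MP minus low MP
mtx_effect = 0.5 * (tau2 + tau4 - tau1 - tau3)    # high MTX minus low MTX
interaction = 0.5 * (tau4 + tau1 - tau2 - tau3)   # MP*MTX

print("MP main effect:", round(mp_effect, 1))
print("MTX main effect:", round(mtx_effect, 1))
print("MP*MTX interaction:", round(interaction, 1))
```

A nonzero interaction estimate says the effect of raising MP differs between the low and high MTX levels; whether it is statistically meaningful is a question for the ANOVA, not for this arithmetic.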
86 Factorial Design: Summary. One treatment is a combination of multiple factors. It is an efficient way to test the effect of multiple treatment factors. We may extend to more than two factors, but the number of EUs necessarily grows rapidly! Use an interaction plot to help visualize effects. Main effects and interactions can be represented through group average contrasts.
88 Summary of the Short Course. Remember to randomize! Randomize run order and treatments. Remember to replicate! Use multiple EUs for each treatment: it will help you be more accurate in estimating your effects. Remember to block! If you suspect some inherent quality of your experimental units may be causing variation in your response, arrange your experimental units into groups based on similarity in that quality. Remember to contact LISA! For short questions, attend our Walk-in Consulting hours. For research, come before you collect your data for design help.
90 References
Cheok, M. H., Yang, W., Pui, C. H., Downing, J. R., Cheng, C., Naeve, C. W., Evans, W. E. (2003). Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells. Nature Genetics, 34(1).
Theobald, C. (1981). The effect of nozzle design on the stability and performance of turbulent water jets. Fire Safety Journal, 4(1).