
1
Luc Wouters October 2013 Statistical Thinking and Smart Experimental Design


3
Statistical Thinking and Reasoning in Experimental Research
9h00 – 9h30: Welcome – Coffee
11h00 – 11h15: Comments & Questions – Coffee break
12h30 – 13h15: Lunch
15h00 – 15h15: Comments & Questions – Coffee break
16h45 – 17h15: Summary – Concluding Remarks – Comments & Questions

4
Statistical Thinking and Reasoning in Experimental Research
Introduction
Smart Research Design by Statistical Thinking
Planning the Experiment
Principles of Experimental Design
Common Designs in Biosciences
The Required Number of Replicates - Sample Size
The Statistical Analysis
The Study Protocol
Interpretation & Reporting
Concluding Remarks

5
Statistical Thinking and Smart Experimental Design I. Introduction 5

6
There is a problem with research in the biosciences
Scholl et al. (Cell, 2009): Synthetic lethal interaction between oncogenic KRAS dependency and STK33 suppression in human cancer cells. Conclusion: cancer tumors could be destroyed by targeting the STK33 protein.
Amgen Inc.: 24 researchers, 6 months of lab work, unable to replicate the results of Scholl et al.

7
There is a problem with research in the biosciences
Séralini et al. (Food Chem Toxicol, 2012): Long term toxicity of a Roundup herbicide and a Roundup-tolerant genetically modified maize. Conclusion: GM maize and low concentrations of Roundup herbicide cause toxic effects (tumors) in rats.
Severe impact on the general public and on the interests of industry: referendum on labeling of GM food in California, bans on importation of GMOs in Russia and Kenya.

8
Séralini et al. The critiques:
European Food Safety Authority (2012) review: inadequate design, analysis and reporting; wrong strain of rats; number of animals too small.

9
The problem of doing good science Some critical reviews: Ioannidis, 2005; Kilkenny, 2009; Loscalzo, 2012

10
Things get even worse Retraction rates: 21% of retractions were due to error; 67% to misconduct, including fraud or suspected fraud

11
The integrity of our profession is put into question 11

12
What's wrong with research in the biosciences?

13
Course objectives
Increase the quality of research reports by:
Statistical Thinking: providing the elements of a generic methodology to design insightful experiments
Statistical Reasoning: providing guidance for interpreting and reporting results

14
This course prepares you for a dialogue 14

15
Statistical thinking leads to highly productive research
Set of statistical precepts (rules) accepted by his scientists
Kept research proceeding in an orderly and planned fashion
Open mind for the unexpected
World record: 77 approved medicines over a period of 40 years; 200 research papers/year

16
Statistical Thinking and Smart Experimental Design II. Smart Research Design by Statistical Thinking 16

17
The architecture of experimental research The two types of studies 17

18
The architecture of experimental research Research is a phased process 18

19
The architecture of experimental research Research is an iterative process 19

20
Modulating between the concrete and the abstract 20

21
Research styles may vary… The novelist, the data salvager, the lab freak, the smart researcher
Definition – Design – Data Collection – Analysis – Reporting

22
The smart researcher
Time spent planning and designing an experiment at the outset saves time and money in the long run
Optimizes the lab work and reduces the time spent in the lab
The design of the experiment is most important and governs how the data are analyzed
Minimizes time spent at the data analysis stage
Can look ahead to the reporting phase with peace of mind, since the early phases of the study are carefully planned and formalized (proposal, protocol)

23
The design of insightful experiments is a process informed by statistical thinking 23

24
The Seven Principles of Statistical Thinking
1. Time spent thinking on the conceptualization and design of an experiment is time wisely spent
2. The design of an experiment reflects the contributions from different sources of variability
3. The design of an experiment balances its internal validity and external validity
4. Good experimental practice provides the clue to bias minimization
5. Good experimental design is the clue to the control of variability
6. Experimental design integrates various disciplines
7. A priori consideration of statistical power is an indispensable pillar of an effective experiment

25
Statistical Thinking and Smart Experimental Design III. Planning the Experiment 25

26
The planning process 26

27
The planning process 27

28
The planning process 28

29
Limit the number of research objectives
The number of research objectives should be limited (at most 3 objectives)
Example, Séralini study: 10 treatment groups, females and males = 20 groups; only 10 animals/group, whereas 50 animals/group is the gold standard in toxicology

30
Importance of auxiliary hypotheses
Reliance on auxiliary hypotheses is the rule rather than the exception in testing scientific hypotheses (Hempel, 1966)
Example, Séralini study: use of the Sprague-Dawley strain of rats as a model for human cancer; known to have higher mortality in 2-year studies and prone to develop spontaneous tumors

31
Types of experiments 31

32
Abuse of exploratory experiments
Most published research is exploratory, but too often published as if it were confirmatory
Exploratory experiments do not unequivocally answer the research question
Using the same data that generated the research hypotheses to prove these hypotheses involves circular reasoning
Inferential statistics should be interpreted and published as exploratory only
The Séralini study was actually conceived as exploratory

33
Role of the pilot experiment 33

34
Types of experiments objective - usage 34

35
Statistical Thinking and Smart Experimental Design IV. Principles of Statistical Design 35

36
Some terminology
Factor: the condition that we manipulate in the experiment (e.g. concentration of a drug, temperature)
Factor level: the value of that condition (e.g. a given molar concentration, 40°C)
Treatment: a combination of factor levels
Response or dependent variable: the characteristic that is measured
Experimental unit: the smallest physical entity to which a treatment is independently applied
Observational unit: the physical entity on which the response is measured
See Appendix A, Glossary of Statistical Terms

37
The structure of the response: Additive model
Assumptions: treatment effects are constant; treatment effects in one unit do not affect other units
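The additive model referred to on this slide is conventionally written as follows (a standard formulation; the symbols are mine, not taken from the slides):

```latex
% Response of experimental unit j under treatment i
y_{ij} = \mu + \tau_i + \varepsilon_{ij}
% \mu: general mean
% \tau_i: constant effect of treatment i (additivity assumption)
% \varepsilon_{ij}: random error, independent across units
```

The two stated assumptions correspond to \(\tau_i\) being the same constant for every unit, and the errors \(\varepsilon_{ij}\) being independent between units.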

38
Defining the experimental unit 38

39
The choice of the experimental unit
Temme et al. (2001): comparison of two genetic strains of mice, wild-type and connexin32-deficient; diameters of bile canaliculi in livers; means ± SEM from 3 livers
Statistically significant difference between the two strains (P < 0.005 = 1/200)
Experimental unit = cell. Is this correct?

40
The choice of the experimental unit
With experimental unit = animal, the results are not conclusive at all
Wrong choice made out of ignorance, or out of convenience?
This was published in a peer-reviewed journal!

41
The choice of the experimental unit Rivenson study on N-nitrosamines
Rats housed 3/cage; treatment supplied in drinking water, so it is impossible to treat any 2 rats in a cage differently
Rats within a cage are not independent: the rat is not the experimental unit; experimental unit = cage
Same remarks for the Séralini study: rats housed 2/cage; rats in the same cage are not independent of one another; the number of experimental units is not 10/treatment, but 5/treatment

42
The choice of the experimental unit The reverse case
Skin irritation test in volunteers, 2 compounds
Backs of volunteers divided into 8 test areas; each compound randomly assigned to 4 different test areas
Experimental unit = test area, not volunteer

43
Why engineers don't care very much about statistics

44
The basic dilemma: balancing internal and external validity 44

45
Failure to minimize bias ruins an experiment
Failure to minimize bias results in lost experiments
Failure to control variability can sometimes be remediated after the fact
Allocation to treatment is the most important source of bias

46
Basic strategies for maximizing Signal/Noise ratio 46

47
47 A control is a standard treatment condition against which all others may be compared

48
Single blinding: the condition under which investigators are uninformed as to the treatment condition of experimental units; neutralizes investigator bias
Double blinding: the condition under which both experimenter and observer are uninformed as to the treatment condition of experimental units; neutralizes investigator bias and bias in the observer's scoring system
In toxicological histopathology, both blinded and unblinded evaluation are recommended

49
Group blinding: entire treatment groups are coded A, B, C
Individual blinding: each experimental unit is coded individually; codes are kept in a randomization list; involves an independent person who maintains the list and prepares the treatments

50
A detailed protocol is imperative to minimize potential bias
The protocol describes practical actions: guidelines for lab technicians, manipulation of experimental units (animals, etc.), materials used, logistics, definitions of measurements and scoring methods, data collection & processing, personal responsibilities

51
Requirements for a good experiment 51

52
Calibration is an operation that compares the output of a measurement device to standards of known value, leading to correction of the values indicated by the measurement device
Neutralizes bias in the investigator's measurement system

53
Randomization ensures that the effect of uncontrolled sources of variability has equal probability in all treatment groups
Randomization provides a rigorous method to assess the uncertainty (standard error) in treatment comparisons
Formal randomization involves a chance-dependent mechanism, e.g. the flip of a coin or a computer (Appendix B); formal randomization is not haphazard allocation
The moment of randomization should be chosen so that the randomization covers all substantial sources of variation, i.e. immediately before treatment administration
Additional randomization may be required at different stages

54
Randomized versus Systematic Allocation
Systematic: first unit gets treatment A, second treatment B, etc., yielding the sequence AB, AB, AB; other sequences such as AB BA, BA AB, etc. are possible
Any systematic arrangement might coincide with a specific pattern in the variability, giving a biased estimate of the treatment effect
Randomization is a necessary condition for statistical analysis; without randomization the analysis is not justified
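As an illustration (my own sketch, not from the course materials), formal randomization of units to two treatments replaces the fixed AB, AB, … sequence with a chance mechanism such as a software random number generator:

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly allocate an equal number of units to each treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("group sizes would be unequal")
    rng = random.Random(seed)      # seeded for a reproducible randomization list
    shuffled = list(units)
    rng.shuffle(shuffled)          # chance mechanism, not a systematic pattern
    k = len(shuffled) // len(treatments)
    return {t: shuffled[i * k:(i + 1) * k] for i, t in enumerate(treatments)}

# 12 units randomly split between treatments A and B
allocation = randomize(list(range(1, 13)), ["A", "B"], seed=42)
```

Recording the seed makes the allocation reproducible, which is what a written randomization list achieves in the lab.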

55
Improvements of Randomization Schemes
Balancing the means causes imbalance in variability and invalidates the subsequent statistical analysis

56
Haphazard Allocation versus Formal Randomization
Haphazard treatment allocation is NOT the same as formal randomization; haphazard allocation is subjective and can introduce bias; proper randomization requires a physical randomizing device
Example: effect of 2 diets on body weight of rats; 12 animals arrive in 1 cage; the technician takes the animals out of the cage; the first 6 animals get diet A, the remaining 6 diet B. Isn't this random?
No: heavy animals react more slowly and are easier to catch than smaller animals, so the first 6 animals weigh more than the remaining 6

57
Moment of Randomization
Example: rat brain cells harvested and seeded on Petri dishes (1 animal = 1 dish); dishes randomly divided into two groups; the two sets of Petri dishes placed in an incubator; after 72 hours group 1 is treated with drug A and group 2 with drug B
The effect of treatment is confounded with systematic errors due to incubation
Randomization should be done as late as possible, just before treatment application, and the randomization sequence should be maintained throughout the experiment

58
Basic strategies for maximizing Signal/Noise ratio 58

59
Simple random sample: a selection of units from a defined target population such that each unit has an equal chance of being selected
Neutralizes sampling bias; provides the foundation for the population model of inference; particularly important in genetic research (Nahon & Shoemaker)
A random sample is a stringent requirement that is very difficult to fulfill in biomedical research
Alternative: switch to the randomization model of statistical inference, in which inference is restricted to the actual experiment only

60
Reducing the intrinsic variability and bias by: use of genetically uniform animals, use of phenotypically uniform animals, environmental control, nutritional control, acclimatization, the measurement system
Beware of external validity

61
Basic strategies for maximizing Signal/Noise ratio 61

62
Replication can be an effective strategy to control variability (precision), but it is an expensive one
The precision of an experiment is quantified by the standard error of the difference between two treatment means
Replication is also needed to estimate the SD
Replication is only effective at the level of the true experimental unit
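The precision measure mentioned on this slide, for two independent groups with common standard deviation σ, is the standard textbook formula (not shown on the slide itself):

```latex
SE(\bar{y}_1 - \bar{y}_2)
  = \sigma \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}
  = \sigma \sqrt{\tfrac{2}{n}} \qquad (n_1 = n_2 = n)
```

Because the standard error shrinks only with \(\sqrt{n}\), halving it requires four times the replicates, which is why replication is called an expensive strategy.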

63
Replication is in most cases only effective at the level of the true experimental unit
Replication at the subsample level (observational unit) only makes sense when the variability at this level is substantial compared to the variability between the experimental units

64
A blocking factor is a known and unavoidable source of variability that adds or subtracts an offset to every experimental unit in the block
Typical blocking factors: age group, microtiter plate, run of assay, batch of material, subject (animal), date, operator

65
A covariate is an uncontrollable but measurable attribute of the experimental unit or its environment that is unaffected by the treatments but may have an influence on the measured response
Covariates filter out one particular source of variation; rather than blocking it out, the covariate represents a quantifiable (by a coefficient β) attribute of the system studied
Typical covariates: baseline measurement, weight, age, ambient temperature
Covariance adjustment requires the slopes to be parallel

66
Requirements for a good experiment 66

67
Simplicity of design
With a complex design: will the protocol be adhered to? Is the statistical analysis feasible?
Simple designs require a simple analysis

68
The calculation of uncertainty (error)
"It is possible, and indeed it is all too frequent, for an experiment to be so conducted that no valid estimate of error is available" (Fisher, 1935)
Validity of error estimation requires that experimental units respond independently to treatment and differ only in a random way from the experimental units in other treatment groups
Without a valid estimate of error, a valid statistical analysis is impossible

69
Statistical Thinking and Smart Experimental Design V. Common Designs in Biosciences 69

70
The jungle of experimental designs: Split Plot Design, Completely Randomized Design, Randomized Complete Block Design, Latin Square Design, Greco-Latin Square Design, Youden Square Design, Factorial Design, Plackett-Burman Design

71
Experimental design is an integrative concept 71 An experimental design is a synthetic approach to minimize bias and control variability

72
Three aspects of experimental design determine complexity and resources 72

73
Completely Randomized Design
Studies the effect of a single independent variable (treatment)
One-way treatment design; completely randomized error-control design
Most common design (default design), easy to implement
Lack of precision in comparisons is its major drawback

74
Completely Randomized Design Example 1
Effect of treatment on proliferation of gastric epithelial cells in rats
2 drugs & 2 controls (vehicle) = 4 experimental groups; 10 animals/group, individual housing in cages
Cages numbered and treatments assigned randomly to cage numbers; cages sequentially put in a rack
Blinding of laboratory staff and of the histological evaluation; treatment code = cage number (individual blinding)
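The allocation in this example (4 treatments assigned randomly to 40 numbered cages) can be sketched as follows; this is illustrative Python of my own, with hypothetical group labels, not part of the course:

```python
import random

def assign_treatments(n_cages, treatments, per_group, seed=None):
    """Completely randomized design: map numbered cages to treatments."""
    if n_cages != len(treatments) * per_group:
        raise ValueError("cage count must equal groups x replicates")
    rng = random.Random(seed)
    labels = [t for t in treatments for _ in range(per_group)]
    rng.shuffle(labels)                              # randomize the whole list
    return dict(zip(range(1, n_cages + 1), labels))  # cage number -> treatment

# 2 drugs & 2 vehicle controls, 10 cages per group (labels are hypothetical)
plan = assign_treatments(40, ["drug1", "drug2", "vehicle1", "vehicle2"], 10, seed=1)
```

Using the cage number as the treatment code then gives the individual blinding described on the slide.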

75
Completely Randomized Design Example 2
Bias due to plate location effects in microtiter plates (Burrows, 1984) causes parabolic patterns; the underlying causes are unknown
Solution by Faessel (1999): random allocation of treatments to the wells, so the bias of plate location is introduced as random variation
Randomization can be used to transform bias into noise

76
A convenient way to randomize microtiter plates
Dilutions made in tube rack 1; tubes are numbered
MS Excel used to make a randomization map; the map is taped on top of rack 2; tubes from rack 1 are pushed to the corresponding number in rack 2
A multipipette is used to apply the drugs to the 96-well plate; results are entered in MS Excel and sorted
Alternative methods: design (Burrows et al. 1984, Schlain et al. 2001); statistical model (Straetemans et al. 2005)
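A randomization map like the Excel one described above can equally be generated in code. The sketch below is mine and assumes a standard 8 x 12 plate filled with 8 dilutions x 12 replicates (labels "D1".."D8" are hypothetical):

```python
import random

def plate_map(treatment_labels, seed=None):
    """Randomly place 96 treatment labels on an 8x12 microtiter plate."""
    if len(treatment_labels) != 96:
        raise ValueError("expected 96 labels for a 96-well plate")
    rng = random.Random(seed)
    labels = list(treatment_labels)
    rng.shuffle(labels)
    rows = "ABCDEFGH"
    # well name "A1".."H12" -> randomized treatment label
    return {f"{r}{c}": labels[i * 12 + (c - 1)]
            for i, r in enumerate(rows) for c in range(1, 13)}

mapping = plate_map([f"D{d}" for d in range(1, 9) for _ in range(12)], seed=7)
```

Sorting the results back by label afterwards mirrors the "entered in MS Excel and sorted" step on the slide.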

77
Factorial Design (full factorial)
Studies the joint effect of two or more independent variables (factors); considers main effects and interactions
Factorial treatment design; completely randomized error-control design

78
Factorial Design Example 78 No interaction, additive effects Interaction present, superadditive effects

79
Factorial Designs Interaction and Drug Synergy 79 A hypothetical factorial experiment of a drug combined with itself at 0 and 0.2 mg/l Synergy of a drug with itself? The pharmacological concept of drug synergy requires more than just statistical interaction

80
Factorial Design Further examples
Effects of 2 different culture media and 2 different incubation times on the growth of a viral culture
Microarray experiments: interaction between mutants and time of observation; for cDNA microarrays specialized procedures are required (see Glonek and Solomon, 2004; Banerjee and Mukerjee, 2007)
The need for a factorial experiment should be recognized at the design phase, and the analysis should make use of the proper methods

81
Factorial Design Higher dimensional designs
3 x 3 x 3 design = 27 treatments; with 2 replicates = 54 experimental units. Is this manageable? How should higher (3-way) interactions be interpreted?
Remedies: neglect higher-order interactions; do not test ALL combinations; Fractional Factorial Design

82
Factorial Design Efficiency of factorial design
No important interaction from a statistical AND scientific point of view
Effect of factor A: [(13 – 18) + (10 – 15)]/2 = –5, d.f. = 4 x (n – 1)
Effect of factor B: [(15 – 18) + (10 – 13)]/2 = –3, d.f. = 4 x (n – 1)
Interaction: (18 + 10) – (15 + 13) = 0
Increases external validity; the only valid method to study interaction between factors
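The main-effect and interaction arithmetic on this slide can be checked in a few lines; this is my own sketch, assuming the four cell means implied by the slide's calculations (18, 15, 13, 10):

```python
def factorial_effects(m11, m12, m21, m22):
    """Main effects and interaction for a 2x2 factorial design.
    m11 = mean at (A1, B1), m12 = (A1, B2), m21 = (A2, B1), m22 = (A2, B2)."""
    effect_a = ((m21 - m11) + (m22 - m12)) / 2   # effect of A, averaged over B
    effect_b = ((m12 - m11) + (m22 - m21)) / 2   # effect of B, averaged over A
    interaction = (m11 + m22) - (m12 + m21)      # zero means purely additive
    return effect_a, effect_b, interaction
```

With the slide's means this returns the effects –5 and –3 and an interaction of 0: each main effect is estimated from all the data, which is where the efficiency of the factorial design comes from.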

83
Randomized Complete Block Design
Studies the effect of a single independent variable (treatment) in the presence of a single isolated extraneous source of variability (blocks)
One-way treatment design; one-way block error-control design
Benefits: increase of precision; eliminates possible imbalance in the blocking factor
Caveats: assumes the treatment effect is the same in all blocks; the blocking characteristic must be related to the response, or else there is a loss of efficiency due to the loss of df

84
Randomized Complete Block Design Example
Effect of treatment on proliferation of gastric epithelial cells in rats (as before): 2 drugs & 2 controls (vehicle) = 4 experimental groups; 10 animals/group, individual housing in cages; cages numbered and treatments assigned randomly to cage numbers; blinding of laboratory staff
Shelf height will probably influence the results: 1 shelf = 1 block; 8 animals (cages)/shelf, 5 shelves; randomize treatments per shelf, 2 animals/shelf/treatment
Not restricted to shelf height; other characteristics can also be included (e.g. body weight)

85
Randomized Complete Block Design Paired Design
Special case of the RCB design with only 2 treatments and 2 EUs/block
Example: cardiomyocyte experiment

86
Randomized Complete Block Design Paired Design – Example
Left panel ignores the pairing: the groups overlap; standard error of drug – vehicle = 7.83
Right panel: pairs are connected by lines; consistent increase in the drug-treated dishes; standard error of drug – vehicle = 2.51
The CRD requires 8 times more EUs than the paired design
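The gain from pairing can be demonstrated numerically. The sketch below is mine, with hypothetical numbers (not the cardiomyocyte data): blocks differ a lot from each other, while the treatment effect within each block is consistent.

```python
from math import sqrt
from statistics import stdev

def unpaired_se(x, y):
    """SE of the difference in means when the pairing is ignored."""
    return sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))

def paired_se(x, y):
    """SE of the mean within-pair difference (paired analysis)."""
    d = [b - a for a, b in zip(x, y)]
    return stdev(d) / sqrt(len(d))

# hypothetical paired data: large block-to-block differences,
# consistent treatment effect within each block
vehicle = [10.0, 20.0, 30.0, 40.0]
drug    = [13.0, 24.0, 32.0, 44.0]
```

Here the paired standard error is far smaller than the unpaired one, because the block-to-block variability cancels in the within-pair differences.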

87
Latin Square Design
Studies the effect of a single independent variable (treatment) in the presence of two isolated, extraneous sources of variability (blocks)
One-way treatment design; two-way block error-control design
In one k x k Latin square there are only k EUs/treatment; more EUs are obtained by stacking Latin squares upon or next to each other; randomizing the rows (columns) of the stacked LS design can be advantageous
Latin squares can be used to eliminate row-column effects in 96-well microtiter plates: 1 plate = 2 x (5 x 5 LS)
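One simple way to construct a randomized k x k Latin square is the standard cyclic construction followed by row and column permutation (illustrative code of my own, not from the course):

```python
import random

def latin_square(k, seed=None):
    """A randomized k x k Latin square: every treatment appears exactly
    once in each row and once in each column."""
    rng = random.Random(seed)
    # cyclic base square: cell (i, j) holds treatment (i + j) mod k
    base = [[(i + j) % k for j in range(k)] for i in range(k)]
    rng.shuffle(base)                         # permute rows (property preserved)
    cols = list(range(k))
    rng.shuffle(cols)                         # permute columns likewise
    return [[row[c] for c in cols] for row in base]

square = latin_square(5, seed=3)              # e.g. a 5 x 5 square for one half-plate
```

Permuting rows and columns of a Latin square never breaks the once-per-row, once-per-column property, so the result is still a valid (randomized) square.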

88
Split Plot Design
A split-plot design recognizes two types of experimental units: plots and subplots
Two-way crossed (factorial) treatment design; split-plot error-control design
Plots are randomly assigned to the primary factor (color); subplots are randomly assigned to the secondary factor (symbol)

89
Split Plot Design Example
Effects of 2 diets and a vitamin supplement on weight gain in rats
Rats housed 2/cage (independence?); cages randomly allocated to diet A or diet B
Individual rats, color-marked, randomly selected to receive the vitamin or not
Main plot factor = diet; subplot factor = vitamin

90
Repeated Measures Design 90
Two independent variables: time and treatment
Each is associated with different experimental units
A special case of the split-plot design

91
Statistical Thinking and Smart Experimental Design VI. The Required Number of Replicates – Sample Size 91

92
Determining sample size is a risk-cost assessment 92
More replicates mean higher cost; fewer replicates mean more uncertainty and less confidence
Choose a sample size just big enough to give confidence that any biologically meaningful effect can be detected

93
The context of biomedical experiments 93
Statistical context: point estimation, interval estimation, or hypothesis testing
Sample size depends on: assumptions, study specifications, study objectives, and study design

94
The hypothesis testing context 94
Population model: null hypothesis vs. alternative hypothesis
Specify the allowable false positive and false negative rates: level of significance 0.01, 0.05, or 0.10; power 80% or 90%
Alternative hypothesis, effect size: size of the difference and its variability
Design: number of treatments, number of blocks

95
Major determinants of sample size 95

96
Approaches for determining sample size 96
Formal power analysis requires as input: the significance level, the power, and the alternative hypothesis (i.e. the effect size Δ: smallest difference and variability (SD)); see the R package pwr (Champely, 2009)
Lehr's equation: n = 16/Δ² per group, for a single comparison (2 groups) at significance level 0.05 and power 80%; other powers are possible
Rule of thumb: Mead's Resource Requirement Equation E = N − T − B
E = df for error; aim for 12 ≤ E ≤ 20 (E ≥ 6 for blocked experiments)
N = df for total sample size (no. of EUs − 1)
T = df for treatment combinations
B = df for blocks and covariates
df = number of occurrences − 1
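Lehr's n = 16/Δ² is the α = 0.05, power = 0.80 special case of the normal-approximation formula n = 2(z₁₋α⁄₂ + z_power)²/Δ². A standard-library sketch in Python (the slides use R's pwr package; the function names are ours):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, alpha=0.05, power=0.80):
    """Two-sample n per group via the normal approximation:
    n = 2 * (z_{1 - alpha/2} + z_{power})^2 / delta^2."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2)

def lehr(delta):
    """Lehr's rule of thumb (alpha = 0.05, power = 0.80)."""
    return ceil(16 / delta ** 2)

print(n_per_group(0.8), lehr(0.8))  # both give 25 per group
```

The exact t-based calculation (R's pwr.t.test) gives a slightly larger n because it accounts for estimating the SD.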

97
Approaches for determining sample size – Examples 97
Effect size Δ = difference in means / standard deviation (the SD of either group, or the pooled SD)
Δ = 0.2 small; Δ = 0.5 medium; Δ = 0.8 large (Cohen)
Cardiomyocytes, completely randomized design: Treated − Control = 10; SD = 12.5; Δ = 10/12.5 = 0.8
> require(pwr)
> pwr.t.test(d=0.8,power=0.8,sig.level=0.05,type="two.sample",
+   alternative="two.sided")
     Two-sample t test power calculation
              n = 25.52
              d = 0.8
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
NOTE: n is number in *each* group
Lehr's equation: 16/0.64 = 25

98
Approaches for determining sample size – Examples (cont.) 98
Rule of thumb: Mead's Resource Requirement Equation E = N − T − B
E = df for error, with 12 ≤ E ≤ 20 (E ≥ 6 for blocked experiments); N = df for total sample size (no. of EUs − 1); T = df for treatment combinations; B = df for blocks and covariates
Cardiomyocytes, completely randomized: N = 9, T = 1, B = 0, so E = 8: too small (min. = 12)
Cardiomyocytes, paired experiment: N = 9, T = 1, B = 4, so E = 4: too small (min. = 6)
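Mead's equation is simple enough to encode directly; a sketch (the function name is ours) reproducing the two cardiomyocyte examples under the stated df conventions:

```python
def mead_error_df(n_units, n_treatments, n_blocks=1):
    """Mead's resource equation E = N - T - B, each term a df
    (number of occurrences minus one)."""
    return (n_units - 1) - (n_treatments - 1) - (n_blocks - 1)

# Cardiomyocyte example: 10 dishes, 2 treatments.
print(mead_error_df(10, 2))              # completely randomized -> E = 8
print(mead_error_df(10, 2, n_blocks=5))  # 5 pairs as blocks     -> E = 4
```

Both values fall below the respective minima (12 unblocked, 6 blocked), flagging the experiment as too small.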

99
Multiplicity and sample size 99

100
Multiplicity and sample size 100
2 tests: 20% increase in required sample size (RSS); 3 tests: 30% increase; 4 tests: 40% increase; 10 tests: 70% increase
For a single comparison, large sample sizes are required for medium (Δ = 0.5) to large (Δ = 0.8) effects
Effects in early research are usually very large
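The quoted increases follow from replacing α by the Bonferroni-corrected α/m in the normal-approximation sample-size formula. A sketch of that derivation (ours, not the slide's computation; the slide rounds to the nearest 10%):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf

def n_factor(m, alpha=0.05, power=0.80):
    """Relative sample size when the two-sided alpha is Bonferroni-split
    over m tests, via the normal-approximation formula."""
    base = (z(1 - alpha / 2) + z(power)) ** 2
    adj = (z(1 - alpha / (2 * m)) + z(power)) ** 2
    return adj / base

for m in (2, 3, 4, 10):
    print(m, round(100 * (n_factor(m) - 1)))  # 21, 33, 42, 70 percent
```

These match the slide's rounded figures of 20%, 30%, 40%, and 70%.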

101
Statistical Thinking and Smart Experimental Design VII. The Statistical Analysis 101

102
The Statistical Triangle Design, Model, and Analysis are Interdependent 102

103
Every experimental design is underpinned by a statistical model 103
Statistical analysis: fit the model to the data; compare the treatment effect(s) with the error

104
Measurement theory defines allowable operations and statements 104
Nominal: naming; e.g. color, gender, etc.; frequency tables
Ordinal: ranking and ordering; e.g. scoring as none, minor, moderate, …; differences between scores do not reflect the same distance; frequency tables, median values, IQR
Interval: adding and subtracting (no ratios); arbitrary zero, 30 °C ≠ 2 × 15 °C; e.g. temperature in degrees Fahrenheit or Celsius; means, standard deviations, etc.
Ratio: differences and ratios make sense; true zero point, negative numbers impossible; e.g. concentrations, durations, temperature in kelvin, age, counts, …; geometric means, coefficient of variation
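For ratio-scale data the geometric mean and the coefficient of variation are legitimate summaries, and Python's statistics module computes them directly (the concentrations below are invented for illustration):

```python
from statistics import mean, stdev, geometric_mean

# Hypothetical concentrations: a ratio scale with a true zero,
# so ratios, geometric means, and the CV are all meaningful.
conc = [2.0, 4.0, 8.0, 16.0]

gm = geometric_mean(conc)       # (2 * 4 * 8 * 16) ** (1/4) = 5.657
cv = stdev(conc) / mean(conc)   # coefficient of variation, unitless
print(round(gm, 3), round(cv, 3))  # 5.657 0.826
```

The same operations would be meaningless for interval-scale data such as degrees Celsius, where the zero point is arbitrary.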

105
Significance tests 105
Randomization model; null hypothesis (H0): no difference in response between the treatment groups
The concept of the p-value is often misunderstood
From the experimental data and the statistical model, compute the observed test statistic t
p-value = the probability of obtaining a value of the test statistic at least as extreme as t, assuming H0 is true
Decision rule: consider H0 as true; if p ≤ α, reject H0; if not, do not reject H0

106
Significance tests – Example: Cardiomyocytes 106
Mean difference = 7% viable cardiomyocytes; standard error = 2.51
Test statistic t = 7/2.51 = 2.79
Assume H0 is true: the distribution of possible values of t has mean 0
P(t ≥ 2.79) = 0.024: the probability of obtaining an increase of 2.79 or more, provided H0 is true, is 0.024
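Under the randomization model, the p-value of a paired experiment can be obtained exactly by enumerating every equally likely sign assignment of the within-pair differences. A sketch with invented differences (the slide reports only the mean difference and its SE, not the raw data):

```python
from itertools import product
from statistics import mean

# Hypothetical within-pair differences (drug - vehicle) for 5 dishes;
# invented for illustration, not the slide's raw data.
diffs = [5, 8, 9, 6, 7]
observed = mean(diffs)

# Under H0 each difference is equally likely to be + or -, so enumerate
# all 2^5 = 32 sign patterns and count means at least as extreme (one-sided).
count = total = 0
for signs in product([1, -1], repeat=len(diffs)):
    total += 1
    if mean(s * d for s, d in zip(signs, diffs)) >= observed:
        count += 1
p_one_sided = count / total
print(p_one_sided)  # 1/32 = 0.03125: only the all-positive pattern is as extreme
```

This randomization p-value plays the same role as the t-test's 0.024, without any distributional assumption on the error term.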

107
Significance tests – Two-sided tests 107
Mean difference = 7% viable cardiomyocytes; standard error = 2.51
Test statistic t = 7/2.51 = 2.79
Assume H0 is true: the distribution of possible values of t has mean 0
P(|t| ≥ 2.79) = 0.048: the probability of obtaining a difference of ±2.79 or more, provided H0 is true, is 0.048
Results in the opposite direction are also of interest: two-sided tests are the rule!

108
Statistics always makes assumptions 108
Parametric methods: distribution of the error term, randomization, independence, equality of variance
Nonparametric methods: randomization, independence
Always verify the assumptions: at planning time (historical data) and at analysis time, before the actual analysis

109
Multiplicity 109
Sources of multiplicity: multiple comparisons, multiple variables, multiple measurements (in time)
Example: 20 doses of a drug are tested against its control, each at a significance level of 0.05; assume all null hypotheses are true, i.e. no dose differs from control
P(correct decision of no difference) = 1 − 0.05 = 0.95
P(all 20 decisions correct) = 0.95^20 = 0.36
P(at least 1 mistake) = P(not all 20 correct) = 1 − 0.36 = 0.64
The curse of multiplicity is of particular importance in microarray data (30,000 genes → 1,500 false positives at α = 0.05)
Multiplicity, and how to deal with it, must be recognized at the planning stage
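The arithmetic on the slide is easy to reproduce, together with the Bonferroni correction that keeps the familywise rate near α:

```python
# Familywise error rate (FWER) for m independent tests at level alpha.
alpha, m = 0.05, 20
fwer = 1 - (1 - alpha) ** m
print(round(fwer, 2))  # 0.64: nearly a 2-in-3 chance of at least one false positive

# Bonferroni: testing each hypothesis at alpha/m keeps the FWER below alpha.
print(alpha / m)       # 0.0025 per-test significance level
```

The same calculation with m = 30,000 genes shows why microarray studies expect about 1,500 false positives at an unadjusted α = 0.05.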

110
Statistical Thinking and Smart Experimental Design VIII. The Study Protocol 110

111
The writing of the study protocol finalizes the research design phase 111
Research protocol: rehearses the research logic and forms the basis for reporting; contents: general hypothesis, working hypothesis, experimental design rationale (treatment, error-control, and sampling design), measures to minimize bias (randomization, etc.), measurement methods, statistical analysis
Technical protocol: describes the practical actions; guidelines for lab technicians; contents: manipulation of animals, materials, logistics, data collection & processing, personal responsibilities
Writing the study protocol is already working on the study report

112
Statistical Thinking and Smart Experimental Design IX. Interpretation & Reporting 112

113
Points to consider in the Methods Section 113
Experimental design, weaknesses & strengths: size and number of the experimental groups; how were experimental units assigned to treatments, and was proper randomization used? was it a blinded study, and how was blinding introduced? was blocking used, and how? how were outcomes assessed? in case of ambiguity, which was the experimental unit, which the observational unit, and why?
Statistical methods: report and justify the statistical methods in enough detail; supply the level of significance and, when applicable, the direction of the test (one-sided, two-sided); recognize multiplicity and justify the strategy for dealing with it; give a reference to the software and its version

114
Points to consider in the Results Section – Summarizing the Data 114
Always report the number of experimental units used in the analysis
For, and only for, normally distributed data, the mean and standard deviation (SD) are sufficient statistics
The SD is a descriptive statistic about spread, i.e. variability; report as mean (SD), NOT as mean ± SD
The SEM is a measure of the precision of the mean (of little descriptive value)
Not normally distributed data: median values and interquartile range
Extremely small datasets (n < 6): report the raw data
Mean − 2 × SD can lead to ridiculous values if the data are not normally distributed (concentrations, durations, counts, etc.)
Spurious precision detracts from a paper's readability and credibility
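These reporting conventions condense into two small helpers (the function names are ours; the data are invented):

```python
from statistics import mean, median, stdev, quantiles

def describe_normal(x):
    """Report as 'mean (SD)', the recommended format for normal data."""
    return f"{mean(x):.1f} ({stdev(x):.1f})"

def describe_skewed(x):
    """Median and interquartile range for non-normal data."""
    q1, q2, q3 = quantiles(x, n=4)  # default 'exclusive' method
    return f"{q2:.1f} [{q1:.1f}-{q3:.1f}]"

data = [38, 43, 46, 50, 55, 58, 63]
print(describe_normal(data), describe_skewed(data))  # 50.4 (8.8) 50.0 [43.0-58.0]
```

One decimal here is deliberate: reporting more digits than the measurements support is exactly the spurious precision the slide warns against.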

115
Points to consider in the Results Section – Graphical Displays 115
Graphs complement tabular presentations and are better suited for identifying patterns: a picture says more than a thousand words
Whenever possible, display the individual data (e.g. using the beeswarm package in R)
Show individual data with median values and 95% confidence intervals
Longitudinal data (measurements over time): show individual subject profiles

116
Points to consider in the Results Section – Significance Tests 116
Specify the method of analysis; do not leave ambiguity: e.g. "statistical methods included ANOVA, regression analysis as well as tests of significance" in the methods section is not specific enough
Tests of significance should be two-sided
Two-group tests allow a direction for the alternative hypothesis, e.g. treated > control; this is a one-sided alternative
The use of one-sided tests must be justified, and the direction of the test must be stated beforehand (in the protocol)
For tests that allow a one-sided alternative, state whether the two-sided or the one-sided version was used
Report exact p-values rather than "p < 0.05" or "NS": this allows readers to choose their own critical value
Avoid reporting p = 0.000; report p < 0.001 instead
Report p-values to the third decimal
For one-sided tests, if the result is in the wrong direction, the p-value to report is 1 − p
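The p-value reporting rules reduce to a tiny formatter (the function name is ours):

```python
def format_p(p):
    """Report exact p-values to three decimals; never print 'p = 0.000'."""
    if p < 0.001:
        return "p < 0.001"
    return f"p = {p:.3f}"

print(format_p(0.0234), format_p(0.00004))  # p = 0.023 p < 0.001
```

A one-sided result in the wrong direction would first be converted with p = 1 − p before formatting.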

117
Points to consider in the Results Section – Significance Tests – Confidence Intervals 117
Provide confidence intervals to interpret the size of the effect
The null hypothesis is never proved: lack of evidence is no evidence for lack of effect
Confidence intervals provide a region of plausible values for the treatment effect
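For the paired cardiomyocyte example (difference 7, SE 2.51, two-sided p = 0.048), the 95% confidence interval follows directly, assuming 5 pairs and hence 4 df so that the critical value is t(0.975, 4) = 2.776:

```python
# 95% CI for the mean drug - vehicle difference in the paired example.
# Assumes 4 df (5 pairs); t critical value hard-coded from tables.
diff, se, t_crit = 7.0, 2.51, 2.776
lower, upper = diff - t_crit * se, diff + t_crit * se
print(round(lower, 2), round(upper, 2))  # 0.03 13.97
```

The interval barely excludes 0, consistent with the two-sided p just under 0.05, and shows how imprecisely the effect size is estimated.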


119
"X is significant, while Y is not": false claims of interaction 119
The difference between significant and not significant is not itself significant
Example claim: "The percentage of neurons showing cue-related activity increased with training in the mutant mice (P < 0.05), but not in the control mice (P > 0.05)": invalid reasoning from a nonsignificant result (P > 0.05)
Only a factorial experiment with a proper method of analysis (ANOVA) and a test for interaction, i.e. a test for equality of the effect of training in the 2 groups, can support such a claim

120
Concluding Remarks 120
Statistical thinking permeates the research process and is the key factor for successful experiments
Role of the statistician? A professional skilled in solving research problems; a team member and collaborator/partner in the research project, often granted co-authorship
The statistician should be consulted whenever there is doubt about the design, the sample size, or the statistical analysis; it can be wise to include a statistician from the very beginning of the project
Seven principles of statistical thinking

121
Fisher on the Role of the Statistician 121
"To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of." (Presidential Address to the First Indian Statistical Congress, 1938; Fisher, 1938)

