Experimental design, basic statistics, and sample size determination


Experimental design, basic statistics, and sample size determination Karl W Broman Department of Biostatistics Johns Hopkins Bloomberg School of Public Health http://www.biostat.jhsph.edu/~kbroman 1

Experimental design 2

Basic principles Formulate question/goal in advance Comparison/control Replication Randomization Stratification (aka blocking) Factorial experiments 3

Example Question: Does salted drinking water affect blood pressure (BP) in mice? Experiment: Provide a mouse with water containing 1% NaCl. Wait 14 days. Measure BP. 4

Comparison/control Good experiments are comparative. Compare BP in mice fed salt water to BP in mice fed plain water. Compare BP in strain A mice fed salt water to BP in strain B mice fed salt water. Ideally, the experimental group is compared to concurrent controls (rather than to historical controls). 5

Replication 6

Why replicate? Reduce the effect of uncontrolled variation (i.e., increase precision). Quantify uncertainty. A related point: An estimate is of no value without some statement of the uncertainty in the estimate. 7

Randomization Experimental subjects (“units”) should be assigned to treatment groups at random. At random does not mean haphazardly. One needs to explicitly randomize, using a computer, or coins, dice, or cards. 8
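
As an illustration, explicit randomization with a computer might look like the following minimal Python sketch; the mouse labels and the choice of 6 per group are made up for illustration.

    import random

    mice = ["mouse_%d" % i for i in range(1, 13)]   # 12 hypothetical subjects
    random.seed(1)                                  # fixed seed so the allocation can be reproduced
    random.shuffle(mice)                            # random permutation of the subjects
    treated, control = mice[:6], mice[6:]           # first 6 to salt water, last 6 to plain water
    print("treated:", treated)
    print("control:", control)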

Why randomize? Avoid bias. For example: the first six mice you grab may have intrinsically higher BP. Control the role of chance. Randomization allows the later use of probability theory, and so gives a solid foundation for statistical analysis. 9

Stratification Suppose that some BP measurements will be made in the morning and some in the afternoon. If you anticipate a difference between morning and afternoon measurements: Ensure that within each period, there are equal numbers of subjects in each treatment group. Take account of the difference between periods in your analysis. This is sometimes called “blocking”. 10

Example 20 male mice and 20 female mice. Half to be treated; the other half left untreated. Can only work with 4 mice per day. Question: How to assign individuals to treatment groups and to days? 11

An extremely bad design 12

Randomized 13

A stratified design 14
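
One way to generate a stratified assignment for the earlier example (20 males, 20 females, 4 mice per day) is sketched below in Python. It assumes, for illustration, that each day receives 2 males and 2 females, with one mouse of each sex treated and one untreated; the animal labels are hypothetical.

    import random

    random.seed(2)
    males = ["M%d" % i for i in range(1, 21)]
    females = ["F%d" % i for i in range(1, 21)]
    random.shuffle(males)
    random.shuffle(females)

    schedule = []
    for day in range(10):                        # 10 days x 4 mice = 40 mice
        m_pair = males[2 * day: 2 * day + 2]     # the 2 males for this day
        f_pair = females[2 * day: 2 * day + 2]   # the 2 females for this day
        random.shuffle(m_pair)
        random.shuffle(f_pair)
        schedule.append({"day": day + 1,
                         "treated": [m_pair[0], f_pair[0]],
                         "control": [m_pair[1], f_pair[1]]})
    for d in schedule:
        print(d)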

Randomization and stratification If you can (and want to), fix a variable. e.g., use only 8 week old male mice from a single strain. If you don’t fix a variable, stratify it. e.g., use both 8 week and 12 week old male mice, and stratify with respect to age. If you can neither fix nor stratify a variable, randomize it. 15

Factorial experiments Suppose we are interested in the effect of both salt water and a high-fat diet on blood pressure. Ideally: look at all 4 treatments in one experiment: plain water/normal diet, plain water/high-fat diet, salt water/normal diet, salt water/high-fat diet. Why? We can learn more. More efficient than doing all single-factor experiments. 16
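
For concreteness, the four treatment combinations of a 2 × 2 factorial design can be enumerated and assigned at random; a small Python sketch (the number of mice per combination is an arbitrary choice for illustration):

    import itertools
    import random

    water = ["plain water", "salt water"]
    diet = ["normal diet", "high-fat diet"]
    treatments = list(itertools.product(water, diet))   # all 4 factor combinations

    mice = ["mouse_%d" % i for i in range(1, 21)]        # 20 hypothetical mice, 5 per combination
    random.seed(3)
    random.shuffle(mice)
    assignment = {combo: mice[5 * k: 5 * k + 5] for k, combo in enumerate(treatments)}
    for combo, group in assignment.items():
        print(combo, group)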

Interactions 17

Other points Blinding Measurements made by people can be influenced by unconscious biases. Ideally, dissections and measurements should be made without knowledge of the treatment applied. Internal controls It can be useful to use the subjects themselves as their own controls (e.g., consider the response after vs. before treatment). Why? Increased precision. 18

Other points Representativeness Are the subjects/tissues you are studying really representative of the population you want to study? Ideally, your study material is a random sample from the population of interest. 19

Summary Characteristics of good experiments: Unbiased (randomization, blinding). High precision (uniform material, replication, stratification). Simple (protects against mistakes). Wide range of applicability (deliberate variation, factorial designs). Able to estimate uncertainty (replication, randomization). 20

Basic statistics 21

What is statistics? “We may at once admit that any inference from the particular to the general must be attended with some degree of uncertainty, but this is not the same as to admit that such inference cannot be absolutely rigorous, for the nature and degree of the uncertainty may itself be capable of rigorous expression.” — Sir R. A. Fisher 22

What is statistics? Data exploration and analysis Inductive inference with probability Quantification of uncertainty 23

Example We have data on the blood pressure (BP) of 6 mice. We are not interested in these particular 6 mice. Rather, we want to make inferences about the BP of all possible such mice. 24

Sampling 25

Several samples 26

Distribution of sample average 27

Confidence intervals We observe the BP of 6 mice, with average = 103.6 and standard deviation (SD) = 9.7. We assume that BP in the underlying population follows a normal (aka Gaussian) distribution. On the basis of these data, we calculate a 95% confidence interval (CI) for the underlying average BP: 103.6 ± 10.2 = (93.4 to 113.8) 28
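
The interval quoted above follows from the usual t-based formula, mean ± t(0.975, n−1) × SD/√n. A short Python check using only the numbers on the slide (it relies on scipy for the t quantile):

    from math import sqrt
    from scipy import stats

    n, mean, sd = 6, 103.6, 9.7
    t_crit = stats.t.ppf(0.975, df=n - 1)    # ≈ 2.571 with 5 degrees of freedom
    margin = t_crit * sd / sqrt(n)           # ≈ 10.2
    print(f"95% CI: {mean - margin:.1f} to {mean + margin:.1f}")   # ≈ 93.4 to 113.8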

What is a CI? The plausible values for the underlying population average BP, given the data on the six mice. In advance, there is a 95% chance of obtaining an interval that contains the population average. 29

100 CIs 30
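
The “100 CIs” picture can be recreated by simulation; a sketch assuming a normal population with a known true mean (the population values below are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_mean, true_sd, n = 100, 10, 6
    covered = 0
    for _ in range(100):
        sample = rng.normal(true_mean, true_sd, size=n)
        m, s = sample.mean(), sample.std(ddof=1)
        half = stats.t.ppf(0.975, df=n - 1) * s / np.sqrt(n)
        covered += (m - half) <= true_mean <= (m + half)
    print(covered, "of 100 intervals cover the true mean")   # typically around 95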

CI for difference 95% CI for treatment effect = 12.6 ± 11.5 31

Significance tests Confidence interval: The plausible values for the effect of salt water on BP. Test of statistical significance: Answer the question, “Does salt water have an effect?” Null hypothesis (H0): Salt water has no effect on BP. Alt. hypothesis (Ha): Salt water does have an effect. 32

Two possible errors Type I error (“false positive”) Conclude that salt water has an effect on BP when, in fact, it does not have an effect. Type II error (“false negative”) Fail to demonstrate the effect of salt water when salt water really does have an effect on BP. 33

Type I and II errors
                       The truth
Conclusion             No effect       Has an effect
Reject H0              Type I error    correct
Fail to reject H0      correct         Type II error 34

Conducting the test Calculate a test statistic using the data. (For example, we could look at the difference between the average BP in the treated and control groups; let’s call this D.) If this statistic, D, is large, the treatment appears to have some effect. How large is large? We compare the observed statistic to its distribution if the treatment had no effect. 35

Significance level We seek to control the rate of type I errors. Significance level (usually denoted α) = chance you reject H0, if H0 is true; usually we take α = 5%. We reject H0 when |D| > C, for some C. C is chosen so that, if H0 is true, the chance that |D| > C is α. 36
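
In practice, the test described in the preceding slides is usually carried out as a two-sample t test, which gives both D (the difference in group averages) and a corresponding P-value. A Python sketch with made-up BP values:

    from scipy import stats

    salt  = [112, 108, 119, 104, 111, 117]    # hypothetical BP of mice fed salt water
    plain = [ 98, 103,  99, 107, 101,  96]    # hypothetical BP of mice fed plain water

    D = sum(salt) / len(salt) - sum(plain) / len(plain)
    result = stats.ttest_ind(salt, plain)     # two-sided test of H0: no difference
    print(f"D = {D:.1f}, t = {result.statistic:.2f}, P = {result.pvalue:.4f}")
    # reject H0 at the 5% significance level if the P-value is below 0.05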

If salt has no effect 37

If salt has an effect 38

P-values A P-value is the probability of obtaining data as extreme as was observed, if the null hypothesis were true (i.e., if the treatment has no effect). If your P-value is smaller than your chosen significance level (α), you reject the null hypothesis. We seek to reject the null hypothesis (we seek to show that there is a treatment effect), and so small P-values are good. 39

Summary Confidence interval Plausible values for the true population average or treatment effect, given the observed data. Test of statistical significance Use the observed data to answer a yes/no question, such as “Does the treatment have an effect?” P-value Summarizes the result of the significance test. Small P-value → conclude that there is an effect. Never cite a P-value without a confidence interval. 40

Data presentation Good plot Bad plot 41

Data presentation
Good table:
  Treatment   Mean (SEM)
  A           11.3 (0.6)
  B           13.5 (0.8)
  C           14.7 (0.6)
Bad table:
  Treatment   Mean (SEM)
  A           11.2965 (0.63)
  B           13.49 (0.7913)
  C           14.727 (0.6108) 42

Sample size determination 43

Fundamental formula 44

Listen to the IACUC Too few animals → a total waste. Too many animals → a partial waste. 45

Significance test Compare the BP of 6 mice fed salt water to 6 mice fed plain water. Δ = true difference in average BP (the treatment effect). H0: Δ = 0 (i.e., no effect). Test statistic, D. If |D| > C, reject H0. C chosen so that the chance you reject H0, if H0 is true, is 5%. Distribution of D when Δ = 0. 46

Statistical power Power = The chance that you reject H0 when H0 is false (i.e., you [correctly] conclude that there is a treatment effect when there really is a treatment effect). 47

Power depends on… The structure of the experiment The method for analyzing the data The size of the true underlying effect The variability in the measurements The chosen significance level (α) The sample size Note: We usually try to determine the sample size to give a particular power (often 80%). 48
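
Power for a design like this can be estimated by simulation: generate many data sets under an assumed true effect and count how often H0 is rejected. A minimal Python sketch (the effect size, SD, and sample size are arbitrary placeholders, not the values behind the next two slides):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_per_group, delta, sd, alpha = 6, 10.0, 10.0, 0.05   # placeholder design values
    n_sims = 10_000
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, sd, size=n_per_group)
        treated = rng.normal(delta, sd, size=n_per_group)
        rejections += stats.ttest_ind(treated, control).pvalue < alpha
    print("estimated power:", rejections / n_sims)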

Effect of sample size 6 per group: Power = 70% 12 per group: 49

Effect of the effect Δ = 8.5: Power = 70%. Δ = 12.5: Power = 96%. 50

A formula Censored 51

Various effects Desired power ↑ → sample size ↑. Stringency of statistical test ↑ → sample size ↑. Measurement variability ↑ → sample size ↑. Treatment effect ↑ → sample size ↓. 52

Determining sample size The things you need to know: Structure of the experiment Method for analysis Chosen significance level, α (usually 5%) Desired power (usually 80%) Variability in the measurements (if necessary, perform a pilot study, or use data from prior publications) The smallest meaningful effect 53
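
The formula on the earlier slide is censored, so no claim is made about the exact expression used there; a standard normal-approximation for comparing two group means gives roughly n per group ≈ 2 × ((z(1−α/2) + z(power)) × SD / Δ)², where Δ is the smallest meaningful effect. A Python sketch with placeholder values:

    from math import ceil
    from scipy import stats

    alpha, power = 0.05, 0.80
    sd, delta = 10.0, 8.0                      # placeholder SD and smallest meaningful effect
    z_alpha = stats.norm.ppf(1 - alpha / 2)    # ≈ 1.96
    z_power = stats.norm.ppf(power)            # ≈ 0.84
    n_per_group = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    print("approximately", ceil(n_per_group), "mice per group")   # ≈ 25 for these values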

Reducing sample size Reduce the number of treatment groups being compared. Find a more precise measurement (e.g., average time to effect rather than proportion sick). Decrease the variability in the measurements. Make subjects more homogeneous. Use stratification. Control for other variables (e.g., weight). Average multiple measurements on each subject. 54

Final conclusions Experiments should be designed. Good design and good analysis can lead to reduced sample sizes. Consult an expert on both the analysis and the design of your experiment. 55

Resources ML Samuels, JA Witmer (2003) Statistics for the Life Sciences, 3rd edition. Prentice Hall. An excellent introductory text. GW Oehlert (2000) A First Course in Design and Analysis of Experiments. WH Freeman & Co. Includes a more advanced treatment of experimental design. Course: Statistics for Laboratory Scientists (Biostatistics 140.615-616, Johns Hopkins Bloomberg Sch. Pub. Health) Introductory statistics course, intended for experimental scientists. Greatly expands upon the topics presented here. 56