Estimation of Sample Size


Estimation of Sample Size By Dr.Shaikh Shaffi Ahamed Ph.d., Associate Professor Dept.of Family & Community Medicine College of Medicine, KSU

INTRODUCTION ---A COMMON STATISTICAL PROBLEM ---SAMPLE SIZE REQUIRED TO ANSWER THE RESEARCH QUESTION OF INTEREST ---IT IS UNETHICAL TO CONDUCT STUDIES WHICH HAVE INAPPROPRIATE NUMBERS OF STUDY SUBJECTS .

Am I going to reach my objective? I have 4 months to finish my research project, of which only one week is for data collection. I think I can get data on 50 subjects in a week. Is 50 a sufficient number of subjects to test my hypothesis with the significance level I want?

Why calculate sample size? To show that, under certain conditions, the hypothesis test has a good chance of showing a desired difference (if it exists). To show that the study has a reasonable chance of obtaining a conclusive result. To show that the necessary resources (human, monetary, time) will be minimized and well utilized.

Sample Size. Too big: requires too many resources. Too small: won’t do the job.

What do I need to know to calculate sample size? Most important: sample size calculation is an educated guess. It is more appropriate for studies involving hypothesis testing. There is no magic involved; only statistical and mathematical logic and some algebra. Researchers need to know something about what they are measuring and how it varies in the population of interest.

Sample Size Calculations Formulate a PRIMARY question or hypothesis to test (or determine what you are estimating). Write down H0 and HA. Determine the endpoint. Choose an outcome measure. How do we “measure” or “quantify” the responses?

Factors related to the sample size. There are many factors that are intertwined in the calculation of the sample size or power of a study. Some factors depend on the design of the study, others on the investigator’s choices, and others on the data themselves.
The first consideration is the type of response variable. The study design and the response variable will determine the type of statistical method used in the data analysis. For example, if the data are continuous and two groups are being compared for their means, a t-test may be the appropriate statistical method of analysis. The t-test then defines the formula for the sample size or power calculation.
The second set of factors depends on the investigator’s choices. He/she needs to define the acceptable levels of significance and power of the study. In practice, the sample size may be more a matter of availability and/or resources than of what is necessary to achieve a certain power.
The third factor is the variation of the data in the population of interest. Intuitively, the higher the variation of the data (this may include measures of variance and correlation among observations), the larger the sample size will have to be to give us enough confidence that we have a good estimate of the mean, for example.
The last five items in the slide above are related to each other in the formula defined by the specific statistical test used in the study. If one knows four of those values, the fifth is determined. Therefore, some values will have to come from previous studies and/or knowledge, or will have to be assumed.
Slide items: Variance of outcome measure (cannot be controlled by researcher); Characteristics of the study design; Quantities related to the research question (defined by the researcher).

Where do we get this knowledge? Previous published studies. Pilot studies. If information is lacking, there is no good way to calculate the sample size!

Study Design Type of response variable or outcome Number of groups to be compared Specific study design Type of statistical analysis In conjunction with the research question, the type of outcome and study design will determine the statistical method of analysis

Errors in sample Systematic error (or bias) Inaccurate response (information bias) Selection bias Sampling error (random error)

Type 1 error: The probability of finding a difference between our sample and the population (or between two groups) when in reality there is no difference. Known as α (or “type 1 error”). Usually set at 5% (or 0.05).

Type 2 error: The probability of not finding a difference that actually exists between two groups (or between sample and population). Known as β (or “type 2 error”). Power is (1 - β) and is usually set at 80%.

Diagnosis and statistical reasoning

Diagnosis:
                    Disease present           Disease absent
Test result +ve     True +ve (sensitivity)    False +ve
Test result -ve     False -ve                 True -ve (specificity)

Significance:
                    Difference present        Difference absent
                    (H0 not true)             (H0 is true)
Reject H0           No error (1-β)            Type I error (α)
Accept H0           Type II error (β)         No error (1-α)

α: significance level; 1-β: power

Estimation of sample size in three ways, by using: (1) formulae (manual calculations); (2) sample size tables or a nomogram; (3) software.

All studies fall under one of two scenarios. Scenario 1 (Precision): descriptive studies, sample surveys. Scenario 2 (Power): hypothesis testing, either simple (2 groups) or complex studies.

SAMPLE SIZE FOR ADEQUATE PRECISION: In a descriptive study, summary statistics (mean, proportion) are reported, and their reliability (or precision) is expressed by giving a “confidence interval”. The wider the C.I., the less reliable the sample statistic, and the less likely it is to give an accurate estimate of the true value of the population parameter.

Sample size formulae. For a single mean: $n = Z_{\alpha}^2 S^2 / d^2$, where S = sd (σ). For a single proportion: $n = Z_{\alpha}^2 P(1-P) / d^2$, where $Z_{\alpha} = 1.96$ for the 95% confidence level and $Z_{\alpha} = 2.58$ for the 99% confidence level.

Sample size for estimating a population mean. How close to the true mean? Confidence around the sample mean. Type I error. $n = (Z_{\alpha/2})^2 \sigma^2 / d^2$, where σ is the standard deviation, d is the accuracy of the estimate (how close to the true mean), and $Z_{\alpha/2}$ is a Normal deviate reflecting the type I error. Example: we want to estimate the average weight in a population, and we want the error of estimation to be less than 2 kg of the true mean, with a probability of 95% (i.e., an error rate of 5%): $n = (1.96)^2 \sigma^2 / 2^2$.

Effect of standard deviation (d = 2, 95% confidence)

Std Dev (σ)    Sample size
10             96
12             138
14             188
16             246
18             311
20             384
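The table can be reproduced from the single-mean formula with a few lines of Python. This is only a sketch: the function name `sample_size_mean` is made up for illustration, and d = 2 with Z = 1.96 is taken from the weight example. The table values match rounding to the nearest integer; rounding up is the more conservative choice, so both are printed.

```python
import math

def sample_size_mean(sd, d, z=1.96):
    """Unrounded n = z^2 * sd^2 / d^2 for estimating a single mean.

    sd: assumed standard deviation, d: acceptable error,
    z: Normal deviate for the chosen confidence level (1.96 for 95%).
    """
    return (z ** 2) * (sd ** 2) / (d ** 2)

# Reproduce the table: d = 2 (kg), 95% confidence
for sd in (10, 12, 14, 16, 18, 20):
    n = sample_size_mean(sd, d=2)
    # nearest integer (as in the table) and rounded up (conservative)
    print(sd, round(n), math.ceil(n))
```

Because n grows with the square of the standard deviation, doubling the SD from 10 to 20 quadruples the required sample size.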

Problem 2: A study is to be performed to determine a certain parameter in a community. From a previous study, an sd of 46 was obtained. If a sample error of up to 4 is to be accepted, how many subjects should be included in this study at the 99% level of confidence?

Answer: $n = (Z_{\alpha/2})^2 \sigma^2 / d^2$, where σ is the standard deviation = 46; d is the accuracy of the estimate (how close to the true mean) = the given sample error = 4; and $Z_{\alpha/2}$ is the Normal deviate reflecting the type I error, with critical value 2.58 for 99%. Hence $n = (2.58)^2 (46)^2 / 4^2 \approx 880.3$, i.e. about 881 subjects.

Sample size for estimating a population proportion. How close to the true proportion? Confidence around the sample proportion. Type I error. $n = (Z_{\alpha/2})^2 p(1-p) / d^2$, where p is the proportion to be estimated, d is the accuracy of the estimate (how close to the true proportion), and $Z_{\alpha/2}$ is a Normal deviate reflecting the type I error. Example: The proportion of preference for a male child is around 80%. We want to estimate the preference p in a community within 5% with a 95% confidence interval: $n = (1.96)^2 (0.8)(0.2) / 0.05^2 \approx 246$ married women.
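The proportion formula can be checked the same way. A minimal sketch, assuming a hypothetical helper name `sample_size_proportion` and the Z value of 1.96 used on the slide:

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """Unrounded n = z^2 * p * (1 - p) / d^2 for estimating a single proportion.

    p: anticipated proportion, d: acceptable error (same scale as p),
    z: Normal deviate for the chosen confidence level.
    """
    return (z ** 2) * p * (1 - p) / (d ** 2)

# Example from the slide: p = 0.8, within 5%, 95% confidence
n = sample_size_proportion(p=0.8, d=0.05)
print(round(n, 2), math.ceil(n))  # unrounded value, then rounded up
```

The product p(1-p) is largest at p = 0.5, so using 0.5 when no prior estimate is available gives the most conservative sample size.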

Problem 2: It was desired to estimate the proportion of anemic children in a certain preparatory school. In a similar study at another school, a proportion of 30% was detected. Compute the minimal sample size required at a confidence level of 95%, accepting a difference of up to 4% from the true population proportion.

Answer: $n = (Z_{\alpha/2})^2 p(1-p) / d^2$, where p is the proportion to be estimated = 30% (0.30); d is the accuracy of the estimate (how close to the true proportion) = 4% (0.04); and $Z_{\alpha/2}$ is the Normal deviate reflecting the type I error, with critical value 1.96 for 95%. Hence $n = (1.96)^2 (0.30)(0.70) / 0.04^2 \approx 504.2$, i.e. about 505 children.

Scenario 2: Three bits of information required to determine the sample size: Type I & II errors; Variation; Clinical effect.

Sample size formulae. For two means: $n = 2 S^2 (Z_{\alpha} + Z_{\beta})^2 / d^2$, where S = sd. For two proportions: $n = (Z_{\alpha} + Z_{\beta})^2 [p_1 q_1 + p_2 q_2] / (p_1 - p_2)^2$, where $q = 1 - p$ (the form applied in the two-proportions example). $Z_{\alpha} = 1.96$ for the 95% confidence level and $Z_{\alpha} = 2.58$ for the 99% confidence level; $Z_{\beta} = 0.842$ for 80% power and $Z_{\beta} = 1.282$ for 90% power.
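The Z values quoted on this slide are quantiles of the standard Normal distribution, so they can be looked up rather than memorized. A small sketch using Python's standard library; the helper names `z_two_sided` and `z_power` are made up for illustration:

```python
from statistics import NormalDist

def z_two_sided(confidence):
    """Normal deviate for a two-sided confidence level, e.g. 0.95 -> about 1.96."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_power(power):
    """Normal deviate for a target power, e.g. 0.80 -> about 0.842."""
    return NormalDist().inv_cdf(power)

for conf in (0.95, 0.99):
    print(conf, round(z_two_sided(conf), 3))
for pw in (0.80, 0.90):
    print(pw, round(z_power(pw), 3))
```

The 99% value comes out slightly below the rounded 2.58 used on the slides; the difference has little practical effect on the resulting sample size.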

Quantities related to the research question (defined by the researcher): α = probability of rejecting H0 when H0 is true; α is called the significance level of the test. β = probability of not rejecting H0 when H0 is false; 1-β is called the statistical power of the test.

Quantities related to the research question (defined by the researcher): Size of the measure of interest to be detected: difference between two or more means; difference between two or more proportions; odds ratio, relative risk, etc. The magnitude of these values depends on the research question and the objective of the study (for example, clinical relevance).

Comparison of two means. Objective: To observe whether feeding milk to 5-year-old children enhances growth. Groups: extra milk diet; normal milk diet. Outcome: height (in cm).

Assumptions or specifications: Type I error (α) = 0.05; Type II error (β) = 0.20, i.e., power (1-β) = 0.80; clinically significant difference (∆) = 0.5 cm; measure of variation (SD) = 2.0 cm (from literature or a “guesstimate”).

Using the appropriate formula: $n = 2 S^2 (Z_{\alpha} + Z_{\beta})^2 / d^2 = 2(2)^2 (1.96 + 0.842)^2 / (0.5)^2 \approx 251.2$, i.e. about 252 in each group (252.8 if $(Z_{\alpha} + Z_{\beta})^2$ is rounded to 7.9).
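The milk-diet calculation can be verified with a short sketch. The function name `sample_size_two_means` is hypothetical; the inputs are the slide's own specifications. With Z values of 1.96 and 0.842 the formula evaluates to roughly 251.2; slightly larger figures arise when the squared sum of the Z values is rounded to 7.9 before multiplying.

```python
import math

def sample_size_two_means(sd, d, z_alpha=1.96, z_beta=0.842):
    """Unrounded per-group n = 2 * sd^2 * (z_alpha + z_beta)^2 / d^2.

    sd: common standard deviation, d: difference to detect,
    z_alpha: deviate for the significance level, z_beta: deviate for the power.
    """
    return 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / d ** 2

# Slide specifications: SD = 2.0 cm, difference = 0.5 cm, alpha = 0.05, power = 0.80
n = sample_size_two_means(sd=2.0, d=0.5)
print(round(n, 1), math.ceil(n))  # unrounded value, then per-group n rounded up
```

Only the ratio d/sd enters the formula, which is why the nomogram approach works from the standardized difference alone.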

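Simple method: nomogram. Standardized difference = ∆/SD = 0.5/2.0 = 0.25

[Nomogram figure: a standardized difference of 0.25 at 80% power corresponds to a total sample size of about 500.]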
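A nomogram can also be read in the other direction: given a feasible sample size, what power results? The sketch below rearranges the two-means formula to $Z_{\beta} = (d/S)\sqrt{n/2} - Z_{\alpha}$ and converts it to a probability. It is a Normal approximation that ignores the negligible opposite tail, and the function name `approx_power` is made up for illustration.

```python
import math
from statistics import NormalDist

def approx_power(n_per_group, sd, d, z_alpha=1.96):
    """Approximate power of a two-group comparison of means.

    Solves n = 2 * sd^2 * (z_alpha + z_beta)^2 / d^2 for z_beta,
    then returns the standard Normal CDF at z_beta.
    """
    z_beta = (d / sd) * math.sqrt(n_per_group / 2) - z_alpha
    return NormalDist().cdf(z_beta)

# About 250 per group (500 in total) gives roughly 80% power at d/sd = 0.25
print(round(approx_power(250, sd=2.0, d=0.5), 2))
# 50 subjects in total (25 per group) would give far less
print(round(approx_power(25, sd=2.0, d=0.5), 2))
```

This is the kind of check that answers the opening question of whether 50 subjects are enough: for a standardized difference of 0.25, they are not.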

Problem 2: A study is to be done to determine the effect of 2 drugs (A and B) on blood glucose level (BGL). From previous studies using those drugs, SDs of BGL of 8 and 12 g/dl were obtained, respectively. A confidence level of 95% (significance level of 5%) and a power of 90% are required to detect a mean difference between the two groups of 3 g/dl. How many subjects should be included in each group?

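Answer: $n = \frac{SD_1^2 + SD_2^2}{\Delta^2} \times f(\alpha, \beta)$, where $f(\alpha, \beta) = (Z_{\alpha} + Z_{\beta})^2$. When both groups share the same SD, this reduces to the two-means formula $n = 2 S^2 (Z_{\alpha} + Z_{\beta})^2 / d^2$.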
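The drug problem can be worked through numerically. This sketch assumes that the numerator is the sum of the two variances, SD1² + SD2², which is the form that reduces to 2S² when both SDs equal S, and that f(α, β) = (Zα + Zβ)². The function name is hypothetical.

```python
import math

def sample_size_two_means_unequal(sd1, sd2, d, z_alpha=1.96, z_beta=1.282):
    """Unrounded per-group n = (sd1^2 + sd2^2) * (z_alpha + z_beta)^2 / d^2.

    Assumption: variances of the two groups are summed; with sd1 == sd2 == S
    this equals 2 * S^2 * (z_alpha + z_beta)^2 / d^2.
    """
    f = (z_alpha + z_beta) ** 2  # f(alpha, beta)
    return (sd1 ** 2 + sd2 ** 2) * f / d ** 2

# Problem inputs: SDs of 8 and 12, difference of 3, 95% confidence, 90% power
n = sample_size_two_means_unequal(8, 12, 3)
print(round(n, 1), math.ceil(n))  # unrounded value, then per-group n rounded up
```

Under these assumptions the problem needs about 243 subjects in each group.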

Sample size for two proportions: example. The efficacy of ‘treatment A’ is expected to be 70%, and that of ‘treatment B’ to be 60%. A study is planned to show the difference at a significance level of 1% and a power of 90%. The sample size can be calculated as follows: $p_1 = 0.6$; $q_1 = 1 - 0.6 = 0.4$; $p_2 = 0.7$; $q_2 = 1 - 0.7 = 0.3$; $Z_{0.01} = 2.58$; $Z_{1-0.9} = 1.28$. The sample size required for each group should be: $n = (2.58 + 1.28)^2 [(0.6)(0.4) + (0.7)(0.3)] / (0.6 - 0.7)^2 = 670.5$. Total sample size = 1342 (allow for dropouts and loss to follow-up).
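The two-proportions example can be reproduced as follows. The function names are made up for illustration, and the dropout adjustment (dividing by one minus the expected dropout rate, here an assumed 10%) is a common convention that the slide only hints at, not a figure from the source.

```python
import math

def sample_size_two_proportions(p1, p2, z_alpha, z_beta):
    """Unrounded per-group n = (z_alpha + z_beta)^2 * (p1*q1 + p2*q2) / (p1 - p2)^2."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2

def inflate_for_dropout(n, dropout_rate):
    """Hypothetical adjustment: enrol n / (1 - dropout_rate), rounded up."""
    return math.ceil(n / (1 - dropout_rate))

# Slide inputs: 60% vs 70% efficacy, 1% significance, 90% power
n = sample_size_two_proportions(0.6, 0.7, z_alpha=2.58, z_beta=1.28)
per_group = math.ceil(n)
print(round(n, 1), per_group, 2 * per_group)  # unrounded, per group, total
print(inflate_for_dropout(per_group, 0.10))   # per group with an assumed 10% dropout
```

The formula is symmetric in p1 and p2, so it does not matter which treatment is labelled first.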

Important to remember: Pilot studies do not need sample size calculation! Sample size is an educated guess, and it works only if: the study sample comes from the same or a similar population to the pilot study population; the population of interest is not changing over time; the difference or association being studied exists.

Summary: Define the research question well. Consider study design, type of response variable, and type of data analysis. Decide on the type of difference or change you want to detect (make sure it answers your research question). Choose α and β. Use the appropriate equation for sample size calculation, or sample size tables/nomogram, or software.

Thanks