Sampling Methods, Sample Size, and Study Power
Objectives Describe common sampling methods Review basic types of samples Discuss the basic information required to calculate a sample size Review the concept of study power
Definition of Terms Population = a collection of items Census = a complete canvass of a population Sample = units selected as representative of a population and from which inferences about the population will be made Unit = an individual item in the population
Relationship Between Samples and Validity Sampling has the largest effect on which of the following? a. random assignment b. internal validity c. construct validity d. external validity e. both a and d
Basic Questions of Sampling Sample or Census? If sample, what procedure? If sample, what size?
Census or Sample? Depends on... Size of population Cost Need for precision Destruction of sampling units Need for in-depth study Constancy of characteristics being studied
Sample if the ... Population is large Cost of census is large Need for absolute accuracy is not critical There is a need for in-depth study Observed units will be destroyed Characteristic being studied is relatively constant
Sampling Procedures Non-probability samples: Unit selection is influenced by personal judgement Objectivity and randomness are absent Probability samples: Objective rules for selection Randomness allows for inferences about a population based on measurement of sampling error
Non-probability Samples Convenience sample Select units on the basis of convenience e.g., a mall intercept Judgement sample Units selected based on the assumption that the researcher can identify subjects that serve the research purpose e.g., an expert panel
Non-probability Samples Quota sample Select units to make the sample contain the same proportion of an important characteristic as found in the population e.g., public and private hospitals, bed size Snowball sample Obtain a few units and use those units to obtain referrals Used when the study topic is a rare event, accidental, controversial, or confidential
Probability Sampling Based on two principles: Unbiased selection procedure All units in a population have a known and non-zero probability of selection
The value of probability sampling Standard error = $\sigma / \sqrt{n}$ A measure of precision For any given standard deviation, a sample of 100 is as precise for a population of 200,000 as it is for a population of 20,000
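The claim that precision depends on n rather than on population size can be checked numerically. The sketch below is illustrative only: the standard deviation of 15 is a made-up value, and the finite population correction factor sqrt((N - n) / (N - 1)) is the usual textbook adjustment, not something stated on the slide.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def standard_error_fpc(sd: float, n: int, population_size: int) -> float:
    """Standard error with the finite population correction applied."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return standard_error(sd, n) * fpc


sd = 15.0  # hypothetical standard deviation, for illustration only
n = 100
for population_size in (20_000, 200_000):
    print(
        population_size,
        round(standard_error(sd, n), 4),
        round(standard_error_fpc(sd, n, population_size), 4),
    )
```

Even with the correction applied, the two standard errors differ only in the third decimal place, which is why the population size drops out of the formula in practice.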
Probability Samples Simple random sample Each population unit has a known and equal probability of selection Each possible sample of n units from a population of N units has an equal probability of selection Assumes replacement of units to population Sampling from finite populations requires an adjustment Use of a sampling frame limits generalizations to the sampling frame
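A simple random sample from a sampling frame can be sketched as follows. The frame of 1,000 unit labels is hypothetical, and this version draws without replacement (the usual practice with a finite frame), which is the case where the finite population adjustment applies.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement.

    Every possible subset of size n is equally likely to be selected.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: 1,000 labelled units.
frame = [f"unit-{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(frame, 50, seed=42)
print(len(sample), sample[:3])
```

Fixing the seed makes the draw reproducible, which helps when the selection has to be documented or audited.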
Probability Sampling Stratified Sampling Divide the population into mutually exclusive and exhaustive categories (strata) Take a simple random sample from each stratum
Stratified Sample Use when a simple random sample will not capture enough units of a stratum Units may be taken in proportion to the size of the stratum or to its variance The size-based procedure is more common Want units to be similar within strata and different between strata (i.e., maximize intra-stratum homogeneity and inter-stratum heterogeneity)
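Proportionate (size-based) allocation can be sketched as below. The two strata and their sizes (600 public and 400 private hospitals) are invented for illustration, and the simple rounding used here may leave the allocations summing to slightly more or less than n when the proportions are not whole numbers.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Allocate n units across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


def stratified_sample(units_by_stratum, n, seed=None):
    """Take a simple random sample within each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in units_by_stratum.items()}
    allocation = proportional_allocation(sizes, n)
    return {
        name: rng.sample(units, allocation[name])
        for name, units in units_by_stratum.items()
    }


# Hypothetical population: 600 public and 400 private hospitals.
units_by_stratum = {
    "public": [f"public-{i}" for i in range(600)],
    "private": [f"private-{i}" for i in range(400)],
}
sample = stratified_sample(units_by_stratum, n=50, seed=7)
print({name: len(units) for name, units in sample.items()})
```

Allocation in proportion to variance would replace the stratum sizes with size-times-standard-deviation weights, which requires variance estimates that are often unavailable before the study.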
Probability Sampling Cluster sampling Divide the population into mutually exclusive and exhaustive categories Take a simple random sample of clusters
Cluster Sample Can do multi-stage cluster sample Main advantage is that you do not need a sampling frame Census map example Two common approaches within clusters Area sampling Systematic sampling
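A two-stage version of this idea, with a random selection of clusters followed by systematic sampling inside each chosen cluster, might look like the sketch below. The city blocks and houses are hypothetical, and the sampling interval k = 10 is an arbitrary choice for the example.

```python
import random


def systematic_sample(units, k, rng):
    """Select every k-th unit after a random start in [0, k)."""
    start = rng.randrange(k)
    return units[start::k]


def cluster_sample(clusters, n_clusters, k, seed=None):
    """Stage 1: simple random sample of clusters.
    Stage 2: systematic sample of units within each selected cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: systematic_sample(clusters[name], k, rng) for name in chosen}


# Hypothetical map: 20 blocks with 40 houses each.
clusters = {
    f"block-{b:02d}": [f"block-{b:02d}/house-{h:02d}" for h in range(1, 41)]
    for b in range(1, 21)
}
sample = cluster_sample(clusters, n_clusters=4, k=10, seed=1)
for block, houses in sample.items():
    print(block, houses)
```

Only the selected blocks need to be listed house by house, which is the practical sense in which a full sampling frame of individual units is not required.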
Steps for Drawing a Sample Define the population Identify the sampling frame Select a sampling procedure Determine the sample size Select the sample elements
Importance of Sample Size n must be large enough to detect differences worth detecting If n is too large, it is wasteful, potentially costly, and risky for sampling units If n is too small, effects will be missed
Basic Information Needed to Calculate Sample Size Purpose Point estimate (e.g., population mean) Hypothesis testing Precision desired (acceptable error) Variability (dispersion in the population) Confidence
Sample size for a point estimate Set level of acceptable error (e) Set the confidence level (expressed as Z) Calculate (estimate) the standard deviation Calculate sample size: $n = (Z \cdot SD / e)^2$
Sample size for a point estimate Example: ER visits/year for asthma patients $n = (Z \cdot SD / e)^2$ Z = 1.96 (alpha = 0.05) SD = 100 e = +/- 4 $n = (1.96 \cdot 100 / 4)^2 = 2{,}401$ How can we reduce sample size required?
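The worked example can be reproduced with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the result is rounded up because a fraction of a subject cannot be sampled. Changing the inputs also gives a way to explore the closing question about reducing the required sample size.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided Z value for a confidence level, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_point_estimate(sd, e, z=1.96):
    """n = (Z * SD / e)^2, rounded up to a whole unit."""
    raw = (z * sd / e) ** 2
    # Round first to guard against floating-point noise just above an integer.
    return math.ceil(round(raw, 6))


print(sample_size_point_estimate(sd=100, e=4))  # slide example
print(sample_size_point_estimate(sd=100, e=8))  # tolerate twice the error
print(sample_size_point_estimate(sd=100, e=4, z=z_for_confidence(0.90)))
```

Because e appears squared in the denominator, doubling the acceptable error cuts the required sample to roughly a quarter. A lower confidence level or a less variable population reduces it as well.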
Sample size for a hypothesis test Set values for Type I ($\alpha$) and Type II ($\beta$) errors and express them as Z values Calculate (estimate) the standard deviation Calculate sample size: $n = \left[ (Z_\alpha + Z_\beta) \cdot SD / (m_1 - m_2) \right]^2$ where n = number of subjects per group
Sample Size Example for Paired t-test Quality adjusted life years Drug A - 9.8 QALYs Drug B - 11.2 QALYs Standard deviation: estimated from the range 1.3 to 18.5 (range / 4 = 4.3) $\alpha$ = 0.05 (z = 1.64, one tailed) $\beta$ = 0.20 (z = 0.84); power = 1 - $\beta$
Sample size for a paired t-test $n = \left[ (Z_\alpha + Z_\beta) \cdot SD / (m_1 - m_2) \right]^2$ $n = \left[ (1.64 + 0.84) \cdot 4.3 / (11.2 - 9.8) \right]^2$ n = ??? (per group)
Independent Sample t-test Multiply the results of the paired test formula by 2 This is the number of subjects per group Multiply by 2 again for the total number of subjects
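The paired and independent-sample calculations can be sketched together using the QALY example values (Z values 1.64 and 0.84, SD 4.3, means 11.2 and 9.8). The function name is hypothetical. Rounding up only at the end, rather than before doubling, is a choice made here and not something the slides specify.

```python
import math


def paired_sample_size(z_alpha, z_beta, sd, m1, m2):
    """Unrounded n = [(Z_alpha + Z_beta) * SD / (m1 - m2)]^2."""
    return ((z_alpha + z_beta) * sd / (m1 - m2)) ** 2


raw_paired = paired_sample_size(z_alpha=1.64, z_beta=0.84, sd=4.3, m1=11.2, m2=9.8)

n_paired = math.ceil(raw_paired)               # paired t-test
n_per_group = math.ceil(2 * raw_paired)        # independent samples, per group
n_total = 2 * n_per_group                      # independent samples, both groups

print(round(raw_paired, 2), n_paired, n_per_group, n_total)
```

The smaller the difference worth detecting relative to the standard deviation, the faster the required n grows, since that ratio enters the formula squared.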
Sample Size Caveats Response rates Refusal to participate Attrition Quality of sample is as important as size Sampling error is controlled by increasing sample size
Power of a statistical test Power = probability of rejecting the null when it is false Power is a function of: Sample size Effect size Alpha level (i.e., the probability of rejecting the null when it is true)
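How power responds to each of these inputs can be seen with a normal-approximation sketch. This is a simplification, assuming a one-sided test and a known standard deviation (a t-based calculation would give slightly lower power). The effect of 1.4 and SD of 4.3 are borrowed from the QALY example, while n = 59 is an illustrative sample size.

```python
from statistics import NormalDist


def approximate_power(effect, sd, n, alpha=0.05):
    """Approximate power of a one-sided z-test on n paired differences."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Expected test statistic: the effect measured in standard-error units.
    shift = abs(effect) / (sd / n ** 0.5)
    return NormalDist().cdf(shift - z_alpha)


print(round(approximate_power(effect=1.4, sd=4.3, n=59), 3))
print(round(approximate_power(effect=1.4, sd=4.3, n=30), 3))   # smaller sample
print(round(approximate_power(effect=0.7, sd=4.3, n=59), 3))   # smaller effect
print(round(approximate_power(effect=1.4, sd=4.3, n=59, alpha=0.01), 3))  # stricter alpha
```

Shrinking the sample, shrinking the effect, or tightening alpha each lowers the printed power relative to the first line.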
Power of a statistical test Usually try to estimate sample size needed to achieve a desired power of statistical test A priori power calculation Cannot know power of the test until the study is complete because effect size is an estimate Post hoc power calculation
Power of a statistical test Quick calculation for: alpha = 0.05 beta = 0.20 power = 0.80 $n \ge 15.68\,(C/R)^2$ where C = coefficient of variation (std dev as a % of mean) R = relative precision level (as a % of the mean) Sometimes rounded to $n \ge 20\,(C/R)^2$
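The quick rule is easy to apply in code. The constant 15.68 equals 2 × (1.96 + 0.84)², which suggests (as an inference, not something the slide states) a two-sided alpha of 0.05 with two independent groups. The coefficient of variation of 30% and relative precision of 10% below are made-up inputs.

```python
import math


def quick_sample_size(cv_percent, r_percent, constant=15.68):
    """n >= constant * (C / R)^2, rounded up to a whole subject."""
    raw = constant * (cv_percent / r_percent) ** 2
    # Round first to guard against floating-point noise just above an integer.
    return math.ceil(round(raw, 6))


# Where the constant appears to come from (assumed two-sided alpha = 0.05).
derived_constant = 2 * (1.96 + 0.84) ** 2

print(round(derived_constant, 2))
print(quick_sample_size(cv_percent=30, r_percent=10))               # 15.68 rule
print(quick_sample_size(cv_percent=30, r_percent=10, constant=20))  # rounded rule
```

Using the rounded constant of 20 builds in a margin, at the cost of a larger sample than the 15.68 version asks for.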
Summary We always prefer probability to non-probability samples Studies with small sample sizes often have low power and therefore have difficulty detecting effects Sampling is efficient