Appropriate Sampling Ann Abbott Rocky Mountain Research Station Moscow Forestry Sciences Laboratory
Outline What is Appropriate Sampling How do we do it Questions to Ask Sampling Designs Sample Size Northern Region Protocol
What is appropriate sampling? Meets the objectives of the research question Representative of the population Feasible Cost effective
Appropriate Sampling is the RESULT Of answering a series of questions The answers to the appropriate questions lead naturally to the appropriate Sampling Design and Data Analysis/Interpretation
Questions to Ask Objectives of the Research Population for inferences Sampling Units Translation of the objectives Preliminary Information Choice of Sampling Design
Questions to Ask Determination of Sample Size Auxiliary Variables Randomization Recording Results Analysis
Stating the Objectives Have the objectives of the investigation been clearly and explicitly stated, along with the reasons for undertaking it? Have the objectives been translated into precise questions that sample determinations can be expected to answer?
Defining the Population Has the population about which inferences are to be made been carefully defined? What constraints are to be placed on the population? Are the units to be measured or counted representative of the population? If not, what changes must be made to ensure representativeness?
Defining the Population Is there a logical framework for the choice of sample units from the defined population? If not, what steps can be taken to impose a logical sampling frame?
Sampling Units A successful sampling scheme involves the selection of an appropriate sampling unit Quadrat Leaves of a plant Individual organism Belt transect Point
Sampling Units Are the sampling units naturally defined? If not, how will they be defined? Is the number of sampling units finite? If it is finite, is the total number of units in the population large enough to ignore finite population considerations? Is the definition of the sampling units appropriate to the objectives?
Choice of Sampling Unit Must be the unit upon which you wish to make inferences and estimates Defined to be “nonoverlapping collections of elements from the population that cover the entire population” Sampled without replacement
Choice of Sampling Unit Point versus Area Point samples allow inferences based on the number of observations in the sample Inferences are made on means or percentages from the sample observations Area samples are generally measured with densities or percent of the area covered Inferences are made by extrapolating the sample density to the entire area
Choice of Sampling Unit: Point vs Area Point samples are quicker, can potentially give a more cost effective coverage of the area Area samples can yield more detailed information but may be more time consuming Area sampling assumes that counts are made without error
Translating the Objectives What exactly is to be estimated or tested? Are the required estimates proportions, totals, means, totals or means over sub-populations, or something else? Have blank data sheets been constructed? What is the smallest subset of data from which estimates are to be made? What precision is required of the estimates for the various subsets?
Preliminary Information Is information about the population available that may be helpful in designing the sampling scheme? Are estimates of the likely variability available? Is a pilot study feasible or desirable? Are there any known factors that help stratify the population?
Variability The variation that is inherent in soils data must be accounted for during the design phase of a soil sampling plan, including Sampling design Data collection procedures Analytical procedures Data Analysis
“One of the key characteristics of the soil system is its extreme variability.” (Mason 1992) Researchers have long been cautioned about failing to consider the variability in soil sampling when dealing with any study of the soils system (e.g. Cline 1944).
Accounting for Variability Ensuring that the sample adequately covers the entire population Reporting variability estimates along with central tendency estimates Reporting interval estimates
Use an interactive approach to balance the data quality needs and resources with designs that will either control variation, stratify to reduce variation, or reduce the influence of variation on the decision process
Precision, Bias and Accuracy Precision is a measure of the reproducibility of measurements of a particular soil condition or constituent The statistical techniques seen in soil sampling are designed to measure precision and not accuracy Bias is a systematic error that contributes to the difference between the mean of a large number of test results and an accepted reference value.
Precision, Bias and Accuracy Accuracy is the correctness of the measurement and cannot be directly measured: it reflects the combination of precision and bias Red dots are precise but biased Blue dots are unbiased but imprecise Yellow dots are biased and imprecise Green dots are unbiased, precise and therefore accurate
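One common way to formalize how precision and bias combine is the mean squared error of an estimator. This is offered as an illustration, not as a formula taken from the slides; the symbols \hat{\theta} (the estimate) and \theta (the true value) are assumptions introduced here. The variance term corresponds to imprecision and the second term to squared bias.

```latex
\mathrm{MSE}(\hat{\theta})
  = E\!\left[(\hat{\theta}-\theta)^2\right]
  = \underbrace{\mathrm{Var}(\hat{\theta})}_{\text{imprecision}}
  + \underbrace{\left(E[\hat{\theta}]-\theta\right)^2}_{\text{bias}^2}
```

An estimate is accurate in this sense only when both terms are small, which matches the "unbiased, precise and therefore accurate" case.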
Sampling Designs Simple Random Sampling Stratified Random Sampling Systematic Random Sampling Cluster Sampling Other Combinations
Sampling Designs Can the population as defined be broken into naturally occurring groups, where the grouping variable affects the measured variable(s)? If it cannot, Simple Random Sampling or Systematic Sampling can be effective If it can, Stratified Random Sampling or Cluster Sampling
Simple Random vs Systematic Simple Random Sampling: If there is a “list” (sampling frame) of all sampling units in the population Randomly selects from units on the list Systematic Random Sampling: If there is no sampling frame available but there is an estimate of the total number of sampling units Randomly selects starting point
Simple Random Sampling Used when there is inadequate information for developing a conceptual model for a site or for stratifying a site A probability sample in which every sampling unit has the same known probability of selection Sampling units are chosen by a method that uses chance to determine selection
Simple random sampling is the basis for all probability sampling techniques and is the point of reference from which modifications to increase sampling efficiency may be made Alone, simple random sampling may not give the desired precision
Simple Random Sampling Advantages Prior information about population is not necessary Easy to perform, easy to analyze Disadvantages May not give desired precision Need a sampling frame
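A minimal sketch of drawing a simple random sample from a sampling frame, assuming the frame is available as a list of unit identifiers. The function name `simple_random_sample` and the plot labels are hypothetical and used only for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit is equally likely."""
    units = list(frame)
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the size of the frame")
    rng = random.Random(seed)  # recording the seed documents the randomization
    return rng.sample(units, n)


# Hypothetical frame of 200 plots
frame = [f"plot_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 20, seed=42)
print(sample)
```

Passing an explicit seed makes the selection reproducible, which helps when the randomization procedure has to be documented.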
Computation Simple Random Sample-continuous variable Mean Variance Confidence Interval Sample Size
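The formulas for this slide did not survive in the text. The standard textbook estimators for a simple random sample of a continuous variable are sketched below as a stand-in, not as the original slide content. The notation is an assumption: y_i is the i-th observation, n the sample size, N the population size, E the desired half-width of the confidence interval, and t a Student t quantile.

```latex
\begin{align*}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} y_i
  && \text{(mean)} \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (y_i-\bar{y})^2,
  \qquad
  \widehat{\mathrm{Var}}(\bar{y}) = \frac{s^2}{n}\left(1-\frac{n}{N}\right)
  && \text{(variance)} \\
\bar{y} &\pm t_{\alpha/2,\,n-1}\sqrt{\widehat{\mathrm{Var}}(\bar{y})}
  && \text{(confidence interval)} \\
n_0 &= \frac{t_{\alpha/2}^{2}\, s^2}{E^2},
  \qquad
  n = \frac{n_0}{1+n_0/N}
  && \text{(sample size)}
\end{align*}
```

When N is very large relative to n, the factor (1 - n/N) is close to 1 and n is close to n_0.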
Computation Simple Random Sample-Binomial variable Proportion Variance Confidence Interval Sample Size
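The formulas for this slide are also missing from the text. The usual estimators for a 0/1 (binomial) variable under simple random sampling are given below as a stand-in, with assumed notation: y_i is 1 if unit i has the characteristic and 0 otherwise, N is the population size, E is the desired half-width, and z is a standard normal quantile.

```latex
\begin{align*}
\hat{p} &= \frac{1}{n}\sum_{i=1}^{n} y_i
  && \text{(proportion)} \\
\widehat{\mathrm{Var}}(\hat{p}) &= \frac{\hat{p}(1-\hat{p})}{n-1}\left(1-\frac{n}{N}\right)
  && \text{(variance)} \\
\hat{p} &\pm z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat{p})}
  && \text{(confidence interval)} \\
n &= \frac{z_{\alpha/2}^{2}\,\hat{p}(1-\hat{p})}{E^2}
  && \text{(sample size, large } N\text{)}
\end{align*}
```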
Systematic Random Sampling Attempt to provide better coverage of the study area or population than that provided by a simple random sample or a stratified random sample Is a simple random sample based on spatial distribution over the site Does not require a complete list of sampling units Can give better coverage than a simple random sample
Systematic Random Sampling Requires some estimate of the total number of sampling units in the population Required sample size must be calculated Determine sampling interval between units Randomly select starting point Transect sampling is a version of Systematic Random Sampling
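The steps on this slide can be sketched as follows, assuming the sampling units can be indexed 0 to N-1 (for example, positions along a transect). The function name `systematic_sample_indices` is hypothetical.

```python
import random


def systematic_sample_indices(population_size, sample_size, seed=None):
    """Return indices of a systematic sample with a random starting point."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    # Sampling interval between consecutive selected units
    interval = population_size // sample_size
    rng = random.Random(seed)
    # Random start within the first interval
    start = rng.randrange(interval)
    return [start + i * interval for i in range(sample_size)]


# Hypothetical example: 10 units out of an estimated 100
print(systematic_sample_indices(100, 10, seed=1))
```

Only the starting point is random; every later unit is fixed by the interval, which is why periodic patterns in the population that match the interval can bias the result.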
Systematic Random Sampling Collects samples in a regular pattern over the area in the investigation Grid Line Transect Orientation of grid or transect starting point should be randomly selected
Systematic Random Sampling Considerations Sample size and population size estimates Some knowledge of the population to avoid sampling along periodicities
Stratified vs Cluster Sampling Used when the population can be broken into naturally occurring groups or segments Stratified Random Sampling: when there is more variability among groups than within groups Cluster Sampling: when there is more variability within groups than among
Stratified Random Sampling Prior knowledge of the sampling area and information obtained from background data may be used to reduce the number of observations necessary to attain specified precision Goal is to increase precision and control sources of variability in the data
Stratified Random Sampling Variability between strata must be larger than variability within strata for any benefit to be seen Sampling within each stratum is done with a Simple Random Sample
Stratified Random Sampling Advantages Gives estimates for subgroups Can be more precise than Simple Random Sampling Can be more convenient to implement Disadvantages Requires prior information about the population More complicated computation
Computation Stratified Random Sample-continuous variable Mean Variance Confidence Interval
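The formulas for this slide are missing from the text. The standard stratified estimators are sketched here as a stand-in, with assumed notation: L strata, stratum h has N_h units in the population and n_h in the sample, W_h = N_h/N is the stratum weight, and \bar{y}_h and s_h^2 are the sample mean and variance within stratum h.

```latex
\begin{align*}
\bar{y}_{st} &= \sum_{h=1}^{L} W_h\,\bar{y}_h,
  \qquad W_h = \frac{N_h}{N}
  && \text{(mean)} \\
\widehat{\mathrm{Var}}(\bar{y}_{st}) &=
  \sum_{h=1}^{L} W_h^{2}\left(1-\frac{n_h}{N_h}\right)\frac{s_h^{2}}{n_h}
  && \text{(variance)} \\
\bar{y}_{st} &\pm t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y}_{st})}
  && \text{(confidence interval)}
\end{align*}
```

The degrees of freedom for t are an approximation in the stratified case; n - L is one commonly used choice.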
Stratified Random Sample Sample Size Calculation Requires information about the relationship between the individuals among strata Can be calculated by weighting strata Can allocate sampling based on minimizing the variance for a fixed cost Other ways to allocate sampling among strata (optimal, Neyman)
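A minimal sketch of two of the allocation rules mentioned here, proportional and Neyman allocation, assuming stratum sizes and standard deviations are known or estimated. The function names and the example strata are hypothetical; results are left unrounded, so a real plan would still need to round them and enforce a minimum per stratum.

```python
def proportional_allocation(total_n, stratum_sizes):
    """Allocate the sample in proportion to stratum size N_h."""
    population = sum(stratum_sizes.values())
    return {h: total_n * size / population for h, size in stratum_sizes.items()}


def neyman_allocation(total_n, stratum_sizes, stratum_sds):
    """Allocate in proportion to N_h * S_h (size times standard deviation)."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total_weight = sum(weights.values())
    return {h: total_n * w / total_weight for h, w in weights.items()}


# Hypothetical strata: B is larger, A is more variable
sizes = {"A": 100, "B": 300}
sds = {"A": 6.0, "B": 2.0}
print(proportional_allocation(100, sizes))
print(neyman_allocation(100, sizes, sds))
```

Neyman allocation shifts effort toward strata that are large, variable, or both, which is what minimizes the variance of the stratified mean for a fixed total sample size when costs are equal across strata.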
Post Stratification Can be used when stratification is appropriate for some key variable, but cannot be done until after the sample is selected Often appropriate when a simple random sample is not properly balanced according to major groupings
Post Stratification Mean Variance
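The formulas for this slide are missing from the text. A commonly cited form of the post-stratified mean and its approximate variance is given below as a stand-in, with assumed notation: W_h = N_h/N is the known population weight of stratum h, \bar{y}_h and s_h^2 are the mean and variance of the sample units that fall in stratum h, n is the total sample size, and f = n/N.

```latex
\begin{align*}
\bar{y}_{pst} &= \sum_{h=1}^{L} W_h\,\bar{y}_h
  && \text{(mean)} \\
\widehat{\mathrm{Var}}(\bar{y}_{pst}) &\approx
  \frac{1-f}{n}\sum_{h=1}^{L} W_h\, s_h^{2}
  + \frac{1}{n^{2}}\sum_{h=1}^{L} (1-W_h)\, s_h^{2}
  && \text{(variance)}
\end{align*}
```

The first term is the variance under proportional stratified sampling; the second term reflects the extra variability because the stratum sample sizes were not fixed in advance.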
Cluster Sampling Used when there is more variability within groups than among Groups are randomly sampled Units within groups are sampled Can sample every element within the group Can take a second random sample within the group
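The selection steps on this slide can be sketched as follows, assuming each cluster is available as a list of its units. The function name `cluster_sample` and the stand labels are hypothetical. Leaving `units_per_cluster` unset samples every element in each chosen cluster; setting it takes a second random sample within each chosen cluster.

```python
import random


def cluster_sample(clusters, n_clusters, units_per_cluster=None, seed=None):
    """Randomly select clusters, then all or some units within each one."""
    rng = random.Random(seed)
    # Stage 1: random sample of clusters
    chosen = rng.sample(sorted(clusters), n_clusters)
    selected = {}
    for name in chosen:
        units = list(clusters[name])
        if units_per_cluster is None or units_per_cluster >= len(units):
            # One-stage: take every unit in the cluster
            selected[name] = units
        else:
            # Two-stage: random subsample within the cluster
            selected[name] = rng.sample(units, units_per_cluster)
    return selected


# Hypothetical clusters (e.g. stands) and their units (e.g. plots)
clusters = {"stand_1": [1, 2, 3, 4], "stand_2": [5, 6], "stand_3": [7, 8, 9]}
print(cluster_sample(clusters, 2, seed=0))
print(cluster_sample(clusters, 2, units_per_cluster=2, seed=0))
```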
Questions to Ask in Choosing a Sampling Design If there is no information on population groupings, will simple random or systematic random sampling better meet the objectives? Is Simple Random Sampling likely to be effective? If not, have the reasons for not using simple random sampling been clearly stated?
Questions to Ask in Choosing a Sampling Design If Systematic Random Sampling is chosen, what interval will separate units? Is there a likelihood that the interval will coincide with periodicity in the data? If so, what steps will be taken to avoid the resulting bias in the estimates?
Questions to Ask in Choosing a Sampling Design If there is a grouping in the population, will stratification improve the precision of the estimates? Has the efficiency of the stratification been calculated? What is the basis of the stratification? How will the sampling units be allocated?
Questions to Ask in Choosing a Sampling Design If there is a grouping in the population, is there an advantage to cluster sampling? Has the efficiency been calculated?
Sample Size Calculated based on variability (standard deviation) within the population, the desired precision of the estimate (margin of error) and the chosen confidence level Applies to Simple Random Sample and Systematic Random Sample Stratified Random Sample is more complicated but still needs a variance estimate
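A minimal sketch of this calculation for estimating a mean, assuming a normal approximation (z rather than t) and an optional finite population adjustment. The function name `sample_size_for_mean` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(std_dev, margin, confidence=0.95, population_size=None):
    """Sample size so the confidence interval half-width is about `margin`."""
    # Two-sided standard normal quantile for the chosen confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = (z * std_dev / margin) ** 2
    if population_size is not None:
        # Adjust downward when the population is finite
        n0 = n0 / (1 + n0 / population_size)
    return math.ceil(n0)


# Hypothetical values: standard deviation 10, margin of error 2
print(sample_size_for_mean(10, 2))
print(sample_size_for_mean(10, 2, population_size=500))
```

Because the margin enters as a square, halving the margin of error roughly quadruples the required sample size.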
Sample Size Specific sampling design considerations Systematic: is the sample size required to uniformly cover the population consistent with the expected precision? Stratified: has the efficiency of the stratification been tested in reducing the sample size or in obtaining the largest number of observations from the part of the population of greatest interest?
Sample Size Sample design considerations, continued Multistage: has the efficiency of various combinations of sample units at different stages been tested? Cluster: has the efficiency of various size clusters been tested?
Sample Size Cost considerations Must the number of observations be modified to account for variation in cost in different parts of the sampling procedure? If so, can the design be improved for better cost efficiency?
Randomization Have the sampling units been selected by an explicit randomization procedure? Has the randomization procedure been documented? Were any constraints correctly applied?
Sample Design Example Northern Region Soil Monitoring Protocol Goal: Develop an easy, cost effective and statistically defensible monitoring protocol for disturbance Stating the objectives: Characterize the activity area in terms of management related disturbance
Northern Region Protocol Defining the population: All possible ‘points’ within the Activity Area Sampling units defined as ‘points’ Infinite number of possible ‘points’ in the population so finite population correction factors do not need to be used
Northern Region Protocol Sample Design Stratification may be desirable but variability information is unavailable Simple Random Sampling may not give the appropriate coverage Systematic Random Sampling (Transect) was chosen to give the best coverage of the area
Northern Region Protocol Translating the objectives What exactly will be measured or tested: Forest floor depth Forest floor missing Topsoil displacement Mixed topsoil/subsoil Erosion Rutting (3 depths) Burning (light, moderate, severe) Compaction (3 depths) Platy/massive structure (3 depths) 5 forest floor variables
Northern Region Protocol Translating the objectives: Blank data tables
Northern Region Protocol Translating the objectives: what exactly is to be estimated or tested? What proportion of points in the sample have the characteristic of the indicator variable? What is the variability associated with the proportion?
Northern Region Protocol Translating the objectives: What is the required precision of the estimates? Confidence intervals within ± 5% of the estimate Confidence levels are determined by the line officer, allow choice from 70% to 95%
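To show how the choice of confidence level drives the sample size, the sketch below computes the number of points needed for a proportion. It assumes that "± 5%" means an absolute margin of 0.05 on the proportion scale and uses a planning proportion of 0.5; both are assumptions, and the function name `sample_size_for_proportion` is hypothetical rather than part of the protocol.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence, planning_p=0.5):
    """Points needed so the interval half-width is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * planning_p * (1 - planning_p) / margin ** 2)


for level in (0.70, 0.80, 0.90, 0.95):
    n = sample_size_for_proportion(0.05, level)
    print(f"{level:.0%} confidence: n = {n}")
```

Under these assumptions the required number of points rises from roughly a hundred at 70% confidence to several hundred at 95%, which is why letting the line officer choose the level has a direct cost in field effort.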
Northern Region Protocol What preliminary information is available about the activity area? Approximate size and shape Harvest history Variability estimates generally unknown A pilot would be best Stratification potential exists
Northern Region Protocol Problem: Variability estimates are unavailable Pilot studies are not feasible due to time and cost constraints Statistically valid sample sizes are required
Sequential Sampling An alternative approach to sampling in which the sample size is not fixed in advance Observations are collected individually or in small batches After each observation or batch, the data are examined to determine whether or not a decision may be made from the accumulated data
Sequential Sampling Combines data collection and data analysis into a single process or sampling plan Can considerably reduce the sample size requirements and data processing overheads
Sequential Sampling Best used in situations where classification of a population is useful and where the emphasis is on decision making In the simplest and most frequently used form, it is used to make binary classifications but can be extended into other applications
Northern Region Protocol Use a combination of sequential and systematic random sampling to obtain variability information for sample size calculation at the same sampling visit as the full data collection trip First 30 observations are used to calculate initial sample size, then sample size is continually updated as sampling continues
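The updating idea can be sketched as below. This is an illustration of the general approach, not the protocol's actual stopping rule: it assumes a single 0/1 indicator, a normal-approximation sample size formula, and an absolute margin of 0.05. The names `required_n` and `sequential_sample` are hypothetical.

```python
import math
from statistics import NormalDist


def required_n(p_hat, margin, confidence):
    """Sample size implied by the current estimate of the proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p_hat * (1 - p_hat) / margin ** 2)


def sequential_sample(observations, margin=0.05, confidence=0.90, initial_n=30):
    """Consume 0/1 observations in field order until the target is met.

    Returns (points used, estimated proportion, last target sample size).
    If the observations run out first, points used is below the target.
    """
    count = 0
    hits = 0
    target = None
    for value in observations:
        count += 1
        hits += value
        if count >= initial_n:
            # Re-estimate the target from everything collected so far
            target = max(initial_n, required_n(hits / count, margin, confidence))
            if count >= target:
                break
    p_hat = hits / count if count else float("nan")
    return count, p_hat, target


# Hypothetical transect on which the disturbance indicator is never observed
print(sequential_sample([0] * 100))
```

When the indicator is rare or nearly universal, the estimated variance is small and sampling can stop soon after the first 30 points; proportions near 0.5 push the target toward its maximum.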
Northern Region Protocol Indicator variables are binomial (0,1) The sampling distribution of a binomial proportion is approximately normal for large samples (n ≥ 30 is the rule of thumb used here) Attractive for sampling since the maximum variability can be computed
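The reason the maximum variability can be computed is that the variance of a 0/1 variable depends only on the proportion p and is bounded. A short derivation, with E denoting the desired half-width and z the standard normal quantile (both assumed notation):

```latex
\begin{align*}
\mathrm{Var}(y) &= p(1-p) = \tfrac{1}{4} - \left(p - \tfrac{1}{2}\right)^{2} \le \tfrac{1}{4}, \\
n &= \frac{z_{\alpha/2}^{2}\, p(1-p)}{E^{2}} \;\le\; \frac{z_{\alpha/2}^{2}}{4E^{2}} .
\end{align*}
```

The bound is reached at p = 1/2, so a sample size computed with p = 1/2 is sufficient for any true proportion.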
Northern Region Protocol When sampling is complete for the activity area, the estimates and confidence intervals are computed Protocol allows field crews to sample an activity area with a statistically valid sample size in one visit