2 Learning Objectives
Understand . . .
The two premises on which sampling theory is based.
The accuracy and precision for measuring sample validity.
The five questions that must be answered to develop a sampling plan.
3 Learning Objectives
Understand . . .
The two categories of sampling techniques and the variety of sampling techniques within each category.
The various sampling techniques and when each is used.
4 Small Samples Can Enlighten
“The proof of the pudding is in the eating. By a small sample we may judge of the whole piece.”
Miguel de Cervantes Saavedra, author
5 PulsePoint: Research Revelation
80: the average number of text messages sent per day by American teens.
See the text Instructor's Manual (downloadable from the text website) for ideas for using this research-generated statistic.
6 The Nature of Sampling
Population
Population element
Census
Sample
Sampling frame
The basic idea of sampling is that by selecting some of the elements in a population, we may draw conclusions about the entire population.
A population element is the individual participant or object on which the measurement is taken. It is the unit of study. It may be a person, but it could also be any object of interest.
A population is the total collection of elements about which we wish to make some inferences.
A census is a count of all the elements in a population.
A sampling frame is the listing of all population elements from which the sample will be drawn.
7 Why Sample?
Sampling provides:
Availability of elements
Lower cost
Greater speed
Greater accuracy
This slide lists the reasons researchers use a sample rather than a census.
8 What Is a Sufficiently Large Sample?
“In recent Gallup ‘Poll on polls,’ When asked about the scientific sampling foundation on which polls are based most said that a survey of 1,500 – 2,000 respondents, a larger than average sample size for national polls, cannot represent the views of all Americans.”
Frank Newport, The Gallup Poll editor in chief, The Gallup Organization
9 When Is a Census Appropriate?
Feasible
Necessary
The advantages of sampling over census studies are less compelling when the population is small and the variability within the population is high. Two conditions are appropriate for a census study: a census is feasible when the population is small, and necessary when the elements are quite different from each other.
10 What Is a Valid Sample?
Accurate
Precise
The ultimate test of a sample design is how well it represents the characteristics of the population it purports to represent. In measurement terms, the sample must be valid. Validity of a sample depends on two considerations: accuracy and precision.
Here a sample is being taken of water, using a can suspended on a fishing line.
Accuracy is the degree to which bias is absent from the sample. When the sample is drawn properly, the measure of behavior, attitudes, or knowledge of some sample elements will be less than the measure of those same variables drawn from the population. The measure of other sample elements will be more than the population values. Variations in these sample values offset each other, resulting in a sample value that is close to the population value. For these offsetting effects to occur, there must be enough elements in the sample, and they must be drawn in a way that favors neither overestimation nor underestimation. Increasing the sample size can reduce systematic variance as a cause of error, although a larger sample will not help if the list from which the elements are drawn is itself biased. Systematic variance is a variation that causes measurements to skew in one direction or another.
Precision of estimate is the second criterion of a good sample design. The numerical descriptors that describe samples may be expected to differ from those that describe populations because of random fluctuations inherent in the sampling process. This is called sampling error and reflects the influence of chance in drawing the sample members. Sampling error is what is left after all known sources of systematic variance have been accounted for.
Precision is measured by the standard error of estimate, a type of standard deviation measurement. The smaller the standard error of the estimate, the higher the precision of the sample.
11 Sampling Design within the Research Process Exhibit 14-1 represents the several decisions the researcher makes when designing a sample. The sampling decisions flow from two decisions made in the formation of the management-research question hierarchy: the nature of the management question and the specific investigative questions that evolve from the research question.
12 Types of Sampling Designs
Exhibit 14-2
Element selection    Probability                                              Nonprobability
Unrestricted         Simple random                                            Convenience
Restricted           Complex random (systematic, cluster, stratified, double) Purposive (judgment, quota, snowball)
The members of a sample are selected using probability or nonprobability procedures.
Nonprobability sampling is an arbitrary and subjective sampling procedure where each population element does not have a known, nonzero chance of being included.
Probability sampling is a controlled, randomized procedure that assures that each population element is given a known, nonzero chance of selection.
13 Steps in Sampling Design
What is the target population?
What are the parameters of interest?
What is the sampling frame?
What is the appropriate sampling method?
What size sample is needed?
This slide addresses the steps in sampling design.
14 When to Use Larger Sample?
Desired precision
Number of subgroups
Confidence level
Population variance
Small error range
The greater the dispersion or variance within the population, the larger the sample must be to provide estimation precision.
The greater the desired precision of the estimate, the larger the sample must be.
The narrower or smaller the error range, the larger the sample must be.
The higher the confidence level in the estimate, the larger the sample must be.
The greater the number of subgroups of interest within a sample, the greater the sample size must be, as each subgroup must meet minimum sample size requirements.
Cost considerations influence decisions about the size and type of sample and the data collection methods.
A cheese factory is pictured here. Ask students if taking a sample would require a large or small sample of the output and what would influence their answer.
15 Simple Random
Advantages: Easy to implement with random dialing.
Disadvantages: Requires list of population elements; time consuming; larger sample needed; produces larger errors.
Cost: High.
In drawing a sample with simple random sampling, each population element has an equal chance of being selected into the sample. The sample is drawn using a random number table or generator. This slide shows the advantages and disadvantages of using this method.
The probability of selection is equal to the sample size divided by the population size.
Exhibit 14-6 covers how to choose a random sample. The steps are as follows: 1) Assign each element within the sampling frame a unique number. 2) Identify a random start from the random number table. 3) Determine how the digits in the random number table will be assigned to the sampling frame. 4) Select the sample elements from the sampling frame.
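The selection procedure on this slide can be sketched with a random number generator in place of a random number table. This is a minimal illustration, not the procedure from Exhibit 14-6 itself; the frame of 200 numbered elements, the sample size of 20, and the function name are all hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; every element has chance n / N."""
    rng = random.Random(seed)          # seeded generator stands in for the random number table
    return rng.sample(frame, n)

# Hypothetical sampling frame: each element already carries a unique number.
frame = list(range(1, 201))
sample = simple_random_sample(frame, 20, seed=42)

# Probability of selection = sample size / population size.
selection_probability = len(sample) / len(frame)
print(sorted(sample), selection_probability)
```

Passing a seed only makes the draw repeatable for classroom use; a real study would normally leave it unset.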
16 Systematic
Advantages: Simple to design; easier than simple random; easy to determine sampling distribution of mean or proportion.
Disadvantages: Periodicity within population may skew sample and results; trends in list may bias results.
Cost: Moderate.
In drawing a sample with systematic sampling, an element of the population is selected at the beginning with a random start, and then every kth element is selected until the appropriate sample size is reached.
The value k is the skip interval, the interval between sample elements drawn from a sampling frame in systematic sampling. It is determined by dividing the population size by the sample size.
To draw a systematic sample, the steps are as follows: 1) Identify, list, and number the elements in the population. 2) Identify the skip interval. 3) Identify the random start. 4) Draw a sample by choosing every kth entry.
To protect against subtle biases, the researcher can randomize the population before sampling, change the random start several times in the process, and replicate a selection of different samples.
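The four steps above can be sketched as follows. This is an illustrative sketch under simple assumptions: the frame is an ordered Python list, the skip interval is rounded down to a whole number, and the function name and the 200-element frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every kth element from the ordered frame."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // n                # skip interval = population size / sample size
    rng = random.Random(seed)
    start = rng.randrange(k)           # random start inside the first interval (0-based)
    return frame[start::k][:n]         # every kth entry, trimmed to n elements

frame = list(range(1, 201))            # hypothetical numbered list of 200 elements
sample = systematic_sample(frame, 20, seed=7)
print(sample)
```

Because the frame order drives the result, shuffling the frame first (for example with `random.shuffle`) is one way to apply the "randomize the population before sampling" safeguard.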
17 Stratified
Advantages: Control of sample size in strata; increased statistical efficiency; provides data to represent and analyze subgroups; enables use of different methods in strata.
Disadvantages: Increased error if subgroups are selected at different rates; especially expensive if strata on the population must be created.
Cost: High.
In drawing a sample with stratified sampling, the population is divided into subpopulations, or strata, and a simple random sample is drawn from each stratum. Results may be weighted or combined. The cost is high.
Stratified sampling may be proportionate or disproportionate. In proportionate stratified sampling, the sample size drawn from each stratum is proportionate to the stratum’s share of the population.
Any stratification that departs from the proportionate relationship is disproportionate.
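Proportionate allocation can be illustrated with a short sketch: each stratum receives a share of the sample equal to its share of the population, and a simple random sample is then drawn inside each stratum. The strata names, sizes, and total sample of 50 are hypothetical, and rounding each share independently is a simplification (the rounded shares may not always sum exactly to n).

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Allocate n across strata by population share, then sample within each."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)   # stratum's proportionate share of n
        sample[name] = rng.sample(members, share) # simple random sample within stratum
    return sample

# Hypothetical population of 1,000 element ids split into three strata.
strata = {
    "undergraduate": list(range(0, 600)),
    "graduate": list(range(600, 900)),
    "faculty": list(range(900, 1000)),
}
sample = proportionate_stratified_sample(strata, 50, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

A disproportionate design would replace the `share` calculation with stratum sizes chosen by the researcher.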
18 Cluster
Advantages: Provides an unbiased estimate of population parameters if properly done; economically more efficient than simple random; lowest cost per sample; easy to do without list.
Disadvantages: Often lower statistical efficiency due to subgroups being homogeneous rather than heterogeneous.
Cost: Moderate.
In drawing a sample with cluster sampling, the population is divided into internally heterogeneous subgroups. Some are randomly selected for further study.
Two conditions foster the use of cluster sampling: 1) the need for more economic efficiency than can be provided by simple random sampling, and 2) the frequent unavailability of a practical sampling frame for individual elements.
Exhibit 14-7 provides a comparison of stratified and cluster sampling and is highlighted on the next slide.
Several questions must be answered when designing cluster samples.
How homogeneous are the resulting clusters?
Shall we seek equal-sized or unequal-sized clusters?
How large a cluster shall we take?
Shall we use a single-stage or multistage cluster?
How large a sample is needed?
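A single-stage cluster sample can be sketched as a random choice of whole subgroups, with every element of a chosen subgroup kept for study. The clusters below (ten blocks of five households each) and the function name are hypothetical; a multistage design would add a further sampling step inside each chosen cluster.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; all elements of a chosen cluster are studied."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names, not elements
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: only a list of clusters is needed, not a list of all elements.
clusters = {
    f"block_{i}": [f"block_{i}_household_{j}" for j in range(5)]
    for i in range(10)
}
chosen = single_stage_cluster_sample(clusters, 3, seed=3)
print(list(chosen))
```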
19 Stratified and Cluster Sampling
Exhibit 14-7
Stratified: Population divided into few subgroups; homogeneity within subgroups; heterogeneity between subgroups; choice of elements from within each subgroup.
Cluster: Population divided into many subgroups; heterogeneity within subgroups; homogeneity between subgroups; random choice of subgroups.
20 Area Sampling
Area sampling is a cluster sampling technique applied to a population with well-defined political or geographic boundaries. It is a low-cost and frequently used method.
21 Double Sampling
Advantages: May reduce costs if first stage results in enough data to stratify or cluster the population.
Disadvantages: Increased costs if indiscriminately used.
In drawing a sample with double (sequential or multiphase) sampling, data are collected using a previously defined technique. Based on the information found, a subsample is selected for further study.
22 Nonprobability Samples
No need to generalize
Feasibility
Limited objectives
Time
Cost
With a subjective approach like nonprobability sampling, the probability of selecting population elements is unknown. There is a greater opportunity for bias to enter the sample and distort findings. We cannot estimate any range within which to expect the population parameter. Despite these disadvantages, there are practical reasons to use nonprobability samples.
When the research does not require generalization to a population parameter, there is no need to ensure that the sample fully reflects the population. The researcher may have limited objectives, such as those in exploratory research. It is less expensive to use nonprobability sampling. It also requires less time. Finally, a list may not be available.
23 Nonprobability Sampling Methods
Convenience
Judgment
Quota
Snowball
Convenience samples are nonprobability samples where the element selection is based on ease of accessibility. They are the least reliable but cheapest and easiest to conduct. Examples include informal pools of friends and neighbors, people responding to an advertised invitation, and “on the street” interviews.
Judgment sampling is purposive sampling where the researcher arbitrarily selects sample units to conform to some criterion. This is appropriate for the early stages of an exploratory study.
Quota sampling is also a type of purposive sampling. In this type, relevant characteristics are used to stratify the sample, which should improve its representativeness. The logic behind quota sampling is that certain relevant characteristics describe the dimensions of the population. In most quota samples, researchers specify more than one control dimension. Each dimension should have a distribution in the population that can be estimated and should be pertinent to the topic studied.
Snowball sampling means that subsequent participants are referred by the current sample elements. This is useful when respondents are difficult to identify and best located through referral networks. It is also used frequently in qualitative studies.
24 Key Terms
Area sampling; Census; Cluster sampling; Convenience sampling; Disproportionate stratified sampling; Double sampling; Judgment sampling; Multiphase sampling; Nonprobability sampling; Population; Population element; Population parameters; Population proportion of incidence; Probability sampling
27 Random Samples
Exhibit 14a-1 shows the Metro U dining club study population (N = 20,000) consisting of five subgroups based on their preferred lunch times. The values 1 through 5 represent preferred lunch times, each a 30-minute interval, starting at 11:00 a.m.
Next we sample 10 elements from this population without knowledge of the population’s characteristics. We draw four samples of 10 elements each. The means for each sample are provided in the slide. Each mean is a point estimate, the best predictor of the unknown population mean. None of the samples shown is a perfect duplication because no sample perfectly replicates its population.
We cannot judge which estimate is the true mean of the population, but we can estimate the interval in which the true mean will fall by using any of the samples. This is accomplished by using a formula that computes the standard error of the mean.
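The slide does not reproduce the formula, so the following is a sketch using the usual textbook form, in which the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The ten lunch-time codes below are made up for illustration and are not the samples shown in Exhibit 14a-1.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Sample standard deviation (n - 1 denominator) divided by sqrt(n)."""
    s = statistics.stdev(values)
    return s / math.sqrt(len(values))

# Hypothetical sample of 10 preferred-lunch-time codes (1 through 5).
sample = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
point_estimate = statistics.fmean(sample)      # the sample mean
std_error = standard_error_of_mean(sample)
print(point_estimate, round(std_error, 3))
```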
28 Increasing Precision
Exhibit 14a-2
The standard error creates an interval estimate that brackets the point estimate. The interval estimate is an interval or range of values within which the true population parameter is expected to fall. In this example, mu is predicted to be 3.0 (12:00 noon) plus or minus .36. Thus we would expect to find the true population parameter to be between 11:49 a.m. and 12:11 p.m.
We have 68% confidence in this estimate because one standard error encompasses plus or minus 1 Z. This is illustrated in Exhibit 14a-3 on the next slide.
29 Confidence Levels & the Normal Curve
Exhibit 14a-3
The area under the curve represents the confidence estimates that one makes about the results. The combination of the interval range and the degree of confidence creates the confidence interval.
With 95% confidence, the interval in which we would find the true mean widens to between 11:39 a.m. and 12:21 p.m. We find this by multiplying the standard error by plus or minus 1.96 Z, which covers 95% of the area under the curve.
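The clock times quoted for the 68% and 95% intervals can be checked with a short sketch. It assumes, as the slides state, that code 3.0 corresponds to 12:00 noon and that each unit is a 30-minute interval; the function name and the rounding to whole minutes are choices made here for illustration.

```python
from datetime import datetime, timedelta

def lunch_time_interval(mean_code, std_error, z):
    """Convert mean_code +/- z * std_error into clock times (code 3.0 = 12:00 noon)."""
    noon = datetime(2000, 1, 1, 12, 0)         # arbitrary date; only the time matters

    def to_clock(code):
        minutes = round((code - 3.0) * 30)     # each code unit is a 30-minute interval
        return (noon + timedelta(minutes=minutes)).strftime("%H:%M")

    half_width = z * std_error
    return to_clock(mean_code - half_width), to_clock(mean_code + half_width)

print(lunch_time_interval(3.0, 0.36, 1.00))    # one standard error, about 68% confidence
print(lunch_time_interval(3.0, 0.36, 1.96))    # 1.96 standard errors, 95% confidence
```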
30 Standard Errors
Exhibit 14a-4
Standard Error (Z score)   % of Area   Approximate Degree of Confidence
1.00                       68.27       68%
1.65                       90.10       90%
1.96                       95.00       95%
3.00                       99.73       99%
These are the Z scores associated with various degrees of confidence. To increase the degree of confidence that the true population parameter falls within a given range, the standard error is multiplied by the appropriate Z score.
31 Central Limit Theorem
Exhibit 14a-5, Part B
According to the central limit theorem, for sufficiently large samples (n ≥ 30), the sample means will be distributed around the population mean approximately in a normal distribution. If researchers draw repeated samples, as we did in the Metro U dining club study, the means for each sample could be plotted and would approximate a normal distribution.
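A small simulation can make the theorem concrete. The population below is hypothetical: it has N = 20,000 as in the Metro U example, but the five subgroups are assumed to be equal in size, which the slides do not state. Repeated samples of 30 are drawn and the spread of their means is compared with the population mean.

```python
import random
import statistics

def sample_means(population, n, draws, seed=0):
    """Draw repeated samples of size n and return the mean of each."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]

# Hypothetical population: lunch-time codes 1-5, each held by 4,000 members.
population = [1, 2, 3, 4, 5] * 4000
means = sample_means(population, n=30, draws=2000)

# The sample means cluster around the population mean (3.0 here), and their
# standard deviation approximates the standard error of the mean.
print(statistics.fmean(means), statistics.stdev(means))
```

Plotting a histogram of `means` would show the roughly bell-shaped pattern the slide describes.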
32 Estimates of Dining Visits
Exhibit 14a-6
Confidence   Z score   % of Area   Interval Range (visits per month)
68%          1.00      68.27
90%          1.65      90.10
95%          1.96      95.00
99%          3.00      99.73
In this example, we want to know how many visits the dining club users make to the dining club each month. Using the formula for standard error of the mean with the standard deviation of the sample (because the value for the standard deviation of the population is unknown), we find that the standard error of the mean is .51 visits. At 95% confidence, 1.96 standard errors are equal to 1 visit. The researcher can estimate with 95% confidence that the population mean of expected number of visits is within 10 (the sample mean) plus or minus 1 visit, or between 9 and 11 visits per month.
The confidence level is a percentage that reflects the probability that the results will be correct. We might want a higher degree of confidence than the 95% level used. The table illustrates the interval ranges at various levels of confidence.
If we want an estimate that will hold for a much smaller range, for example, 10.0 plus or minus .2 visits, we must either accept a lower level of confidence or take a sample large enough to provide this smaller interval with the highest desired confidence level.
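The interval range at each confidence level follows from the sample mean of 10, the standard error of .51, and the Z scores in the table. The sketch below computes them directly; the function name is hypothetical, and the printed figures may differ slightly from those in Exhibit 14a-6 because of rounding.

```python
def interval_range(sample_mean, std_error, z):
    """Return (low, high) = sample_mean -/+ z * std_error, rounded to 2 decimals."""
    half_width = z * std_error
    return round(sample_mean - half_width, 2), round(sample_mean + half_width, 2)

# Confidence levels and Z scores as listed on the slide.
levels = [("68%", 1.00), ("90%", 1.65), ("95%", 1.96), ("99%", 3.00)]
ranges = {label: interval_range(10, 0.51, z) for label, z in levels}
for label, (low, high) in ranges.items():
    print(label, low, high)
```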
33 Calculating Sample Size for Questions involving Means
Precision
Confidence level
Size of interval estimate
Population dispersion
Need for FPA
To compute the desired sample size for questions involving means, we need certain information: 1) The precision desired and how to quantify it: the confidence level we want with our estimate, and the size of the interval estimate. 2) The expected dispersion in the population for the investigative question used. 3) Whether a finite population adjustment (FPA) is needed.
When the size of the calculated sample exceeds 5% of the population, the finite limits of the population constrain the sample size needed. A correction factor formula is available in that event. In most sample calculations, population size does not have a major effect on sample size.
34 Metro U Sample Size for Means
Exhibit 14a-7
Steps                                     Information
Desired confidence level                  95% (z = 1.96)
Size of the interval estimate             ± .5 meals per month
Expected range in population              0 to 30 meals
Sample mean                               10
Standard deviation                        4.1
Need for finite population adjustment     No
Standard error of the mean                .5/1.96 = .255
Sample size                               (4.1)^2 / (.255)^2 = 259
In this example, the researcher wants to know what size sample is necessary to estimate the number of meals per month consumed by dining club members. The questions mentioned on the previous slide must be addressed.
The desired confidence level is 95%, which means we will use a Z score of 1.96. The interval estimate that the researcher is willing to accept is plus or minus .5 meals per month. These two items represent the desired precision.
The sample mean is 10 and the standard deviation is 4.1. These figures were derived from a pretest. If a pretest had not provided the standard deviation, then the population dispersion could have been used to get a standard deviation. This is discussed further on the following slide.
To calculate the standard error of the mean, the interval estimate is divided by the Z score. This figure is then used in the sample size calculation. The standard deviation squared divided by the standard error of the mean squared is equal to the calculated sample size.
Note that the more precise the desired results, the larger the sample size must be.
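The calculation in Exhibit 14a-7 can be reproduced in a few lines. This is a sketch of the arithmetic described on the slide; the function name is hypothetical, and rounding up to the next whole respondent is an assumption that matches the slide's result.

```python
import math

def sample_size_for_mean(std_dev, interval, z):
    """n = s^2 / SE^2, where SE = interval / z; rounded up to a whole element."""
    std_error = interval / z                   # .5 / 1.96 = about .255
    return math.ceil(std_dev ** 2 / std_error ** 2)

n = sample_size_for_mean(std_dev=4.1, interval=0.5, z=1.96)
print(n)

# Halving the acceptable interval roughly quadruples the required sample.
n_tighter = sample_size_for_mean(std_dev=4.1, interval=0.25, z=1.96)
print(n_tighter)
```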
35 Proxies of the Population Dispersion
Previous research on the topic
Pilot test or pretest
Rule-of-thumb calculation: 1/6 of the range
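As an illustration of the rule of thumb, the sketch below applies it to the expected range of 0 to 30 meals from the Metro U example and then reuses the sample size arithmetic from the previous slide. Combining the two in this way is an assumption made here, not a calculation shown in the deck, and the function name is hypothetical.

```python
import math

def std_dev_from_range(low, high):
    """Rule-of-thumb proxy: standard deviation is about 1/6 of the range."""
    return (high - low) / 6

sigma_proxy = std_dev_from_range(0, 30)        # expected range of 0 to 30 meals
std_error = 0.5 / 1.96                         # interval estimate / Z score, as before
n_with_proxy = math.ceil(sigma_proxy ** 2 / std_error ** 2)
print(sigma_proxy, n_with_proxy)
```

The proxy is larger than the pretest standard deviation of 4.1, so it implies a larger sample than the 259 computed from the pretest.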
36 Metro U Sample Size for Proportions
Exhibit 14a-7
Steps                                     Information
Desired confidence level                  95% (z = 1.96)
Size of the interval estimate             ± .10 (10%)
Expected range in population              0 to 100%
Sample proportion with given attribute    30%
Sample dispersion                         pq = .30(1 - .30) = .21
Finite population adjustment              No
Standard error of the proportion          .10/1.96 = .051
Sample size                               .21 / (.051)^2 = 81
In this example, the researcher wants to know what size sample is necessary to estimate what percentage of the population says it would join the dining club, based on the projected rates and services. A pretest told us that 30% of those in the pretest sample were interested in joining. In this case, dispersion is measured in terms of p * q, in which q is the proportion of the population not having the attribute and q = (1 - p). The measure of dispersion of the sample statistic also changes from the standard error of the mean to the standard error of the proportion.
As before, the desired confidence level is 95%, which means we will use a Z score of 1.96. The interval estimate that the researcher is willing to accept is plus or minus .10 or 10% (this is a subjective decision). These two items represent the desired precision.
To calculate the standard error of the proportion, the interval estimate is divided by the Z score. This figure is then used in the sample size calculation. The dispersion divided by the standard error of the proportion squared is equal to the calculated sample size.
In this case, the sample size is smaller than the one in the previous example. If both questions were relevant to the research, the larger sample size would be used.
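The proportion calculation follows the same pattern as the one for means, with p * q as the measure of dispersion. The sketch below reproduces the arithmetic described on this slide; the function name is hypothetical, and rounding up to a whole respondent is an assumption that matches the slide's result.

```python
import math

def sample_size_for_proportion(p, interval, z):
    """n = p * q / SE^2, where q = 1 - p and SE = interval / z; rounded up."""
    dispersion = p * (1 - p)                   # .30 * .70 = .21
    std_error = interval / z                   # .10 / 1.96 = about .051
    return math.ceil(dispersion / std_error ** 2)

n = sample_size_for_proportion(p=0.30, interval=0.10, z=1.96)
print(n)

# Dispersion p * q peaks at p = .50, so that value gives the most conservative n.
n_conservative = sample_size_for_proportion(p=0.50, interval=0.10, z=1.96)
print(n_conservative)
```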