# Chapter 14 Sampling McGraw-Hill/Irwin


Learning Objectives
Understand . . .
The two premises on which sampling theory is based.
The accuracy and precision for measuring sample validity.
The five questions that must be answered to develop a sampling plan.
The two categories of sampling techniques and the variety of sampling techniques within each category.
The various sampling techniques and when each is used.

Small Samples Can Enlighten
“The proof of the pudding is in the eating. By a small sample we may judge of the whole piece.” Miguel de Cervantes Saavedra author

PulsePoint: Research Revelation
80: the average number of text messages sent per day by American teens. See the text Instructor's Manual (downloadable from the text website) for ideas for using this research-generated statistic.

The Nature of Sampling
Key terms: population, population element, census, sample, sampling frame.
The basic idea of sampling is that by selecting some of the elements in a population, we may draw conclusions about the entire population. A population element is the individual participant or object on which the measurement is taken. It is the unit of study. It may be a person, but it could also be any object of interest. A population is the total collection of elements about which we wish to make some inferences. A census is a count of all the elements in a population. A sampling frame is the listing of all population elements from which the sample will be drawn.

Why Sample?
Sampling provides lower cost, greater accuracy, greater speed, and availability of elements. This slide lists the reasons researchers use a sample rather than a census.

What Is a Sufficiently Large Sample?
“In recent Gallup ‘Poll on polls,’ when asked about the scientific sampling foundation on which polls are based, most said that a survey of 1,500–2,000 respondents, a larger than average sample size for national polls, cannot represent the views of all Americans.” Frank Newport, The Gallup Poll editor in chief, The Gallup Organization

When Is a Census Appropriate?
Conditions: feasible, necessary.
The advantages of sampling over census studies are less compelling when the population is small and the variability within the population is high. Two conditions are appropriate for a census study. A census is feasible when the population is small and necessary when the elements are quite different from each other.

What Is a Valid Sample?
A valid sample is accurate and precise.
The ultimate test of a sample design is how well it represents the characteristics of the population it purports to represent. In measurement terms, the sample must be valid. Validity of a sample depends on two considerations: accuracy and precision. Here a sample is being taken of water, using a can suspended on a fishing line.
Accuracy is the degree to which bias is absent from the sample. When the sample is drawn properly, the measure of behavior, attitudes, or knowledge of some sample elements will be less than the measure of those same variables drawn from the population. The measure of other sample elements will be more than the population values. Variations in these sample values offset each other, resulting in a sample value that is close to the population value. For these offsetting effects to occur, there must be enough elements in the sample and they must be drawn in a way that favors neither overestimation nor underestimation. Increasing the sample size can reduce systematic variance as a cause of error, although a larger sample does not by itself remove bias that comes from a flawed sampling frame or selection procedure. Systematic variance is variation that causes measurements to skew in one direction or another.
Precision of estimate is the second criterion of a good sample design. The numerical descriptors that describe samples may be expected to differ from those that describe populations because of random fluctuations inherent in the sampling process. This is called sampling error and reflects the influence of chance in drawing the sample members. Sampling error is what is left after all known sources of systematic variance have been accounted for. Precision is measured by the standard error of estimate, a type of standard deviation measurement. The smaller the standard error of the estimate, the higher the precision of the sample.

Sampling Design within the Research Process
Exhibit 14-1 represents the several decisions the researcher makes when designing a sample. The sampling decisions flow from two decisions made in the formation of the management-research question hierarchy: the nature of the management question and the specific investigative questions that evolve from the research question.

Types of Sampling Designs
Exhibit 14-2

| Element Selection | Probability | Nonprobability |
|---|---|---|
| Unrestricted | Simple random | Convenience |
| Restricted | Complex random: systematic, cluster, stratified, double | Purposive: judgment, quota, snowball |

The members of a sample are selected using probability or nonprobability procedures. Nonprobability sampling is an arbitrary and subjective sampling procedure where each population element does not have a known, nonzero chance of being included. Probability sampling is a controlled, randomized procedure that assures that each population element is given a known, nonzero chance of selection.

Steps in Sampling Design
What is the target population? What are the parameters of interest? What is the sampling frame? What is the appropriate sampling method? What size sample is needed? This slide addresses the steps in sampling design.

When to Use Larger Sample?
Factors: desired precision, number of subgroups, confidence level, population variance, small error range.
The greater the dispersion or variance within the population, the larger the sample must be to provide estimation precision. The greater the desired precision of the estimate, the larger the sample must be. The narrower or smaller the error range, the larger the sample must be. The higher the confidence level in the estimate, the larger the sample must be. The greater the number of subgroups of interest within a sample, the greater the sample size must be, as each subgroup must meet minimum sample size requirements. Cost considerations influence decisions about the size and type of sample and the data collection methods. A cheese factory is pictured here. Ask students whether sampling the output would require a large or a small sample and what would influence their answer.

Simple Random
Advantages: easy to implement with random dialing.
Disadvantages: requires list of population elements; time consuming; larger sample needed; produces larger errors.
Cost: high.
In drawing a sample with simple random sampling, each population element has an equal chance of being selected into the sample. The sample is drawn using a random number table or generator. This slide shows the advantages and disadvantages of using this method. The probability of selection is equal to the sample size divided by the population size. Exhibit 14-6 covers how to choose a random sample. The steps are as follows: (1) Assign each element within the sampling frame a unique number. (2) Identify a random start from the random number table. (3) Determine how the digits in the random number table will be assigned to the sampling frame. (4) Select the sample elements from the sampling frame.
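The selection rule described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame of 20,000 numbered members and the function name `simple_random_sample` are hypothetical, and a pseudorandom generator stands in for the random number table.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; each element has chance n / N."""
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    return rng.sample(frame, n)

# Hypothetical sampling frame: every element already carries a unique number.
frame = list(range(1, 20001))
sample = simple_random_sample(frame, 10, seed=42)

# Probability of selection = sample size / population size.
selection_probability = len(sample) / len(frame)
print(sample)
print(selection_probability)
```

With n = 10 and N = 20,000, each element's chance of selection is 10/20,000, or .0005.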

Systematic
Advantages: easier than simple random; easy to determine sampling distribution of mean or proportion.
Disadvantages: periodicity within population may skew sample and results; trends in list may bias results.
Cost: moderate.
In drawing a sample with systematic sampling, an element of the population is selected at the beginning with a random start and then every kth element is selected until the appropriate size is reached. Here k is the skip interval, the interval between sample elements drawn from a sampling frame in systematic sampling. It is determined by dividing the population size by the sample size. To draw a systematic sample, the steps are as follows: (1) Identify, list, and number the elements in the population. (2) Identify the skip interval. (3) Identify the random start. (4) Draw a sample by choosing every kth entry. To protect against subtle biases, the researcher can randomize the population before sampling, change the random start several times in the process, and replicate a selection of different samples.
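The four steps above might look like this in Python. It is a minimal sketch, not the textbook's procedure verbatim: the frame of 100 numbered elements and the name `systematic_sample` are assumptions made for illustration.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first interval, then take every kth element."""
    k = len(frame) // n                 # skip interval = population size / sample size
    rng = random.Random(seed)
    start = rng.randrange(k)            # random start within the first k elements
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame: elements numbered 1..100, so k = 100 // 10 = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

If the list has a periodic pattern with the same period as k, every draw lands on the same phase of the pattern, which is the periodicity risk noted on the slide.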

Stratified
Advantages: control of sample size in strata; increased statistical efficiency; provides data to represent and analyze subgroups; enables use of different methods in strata.
Disadvantages: increased error if subgroups are selected at different rates; especially expensive if strata on population must be created.
Cost: high.
In drawing a sample with stratified sampling, the population is divided into subpopulations or strata, and simple random sampling is used within each stratum. Results may be weighted or combined. Stratified sampling may be proportionate or disproportionate. In proportionate stratified sampling, each stratum’s sample size is proportionate to the stratum’s share of the population. Any stratification that departs from the proportionate relationship is disproportionate.
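Proportionate allocation can be illustrated with a short Python sketch. The three strata and their sizes are invented for the example, and the function name `proportionate_stratified_sample` is hypothetical.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Simple random sample within each stratum, sized by its population share."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the parts sum to slightly more or less than n.
        n_h = round(n * len(members) / total)
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical population of 1,000 split 50% / 30% / 20%.
strata = {
    "freshmen": [f"F{i}" for i in range(500)],
    "seniors": [f"S{i}" for i in range(300)],
    "graduate": [f"G{i}" for i in range(200)],
}
sample = proportionate_stratified_sample(strata, 100, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

A disproportionate design would replace the `n_h` line with stratum sizes chosen on some other basis, for example oversampling a small but important subgroup.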

Cluster
Advantages: provides an unbiased estimate of population parameters if properly done; economically more efficient than simple random; lowest cost per sample; easy to do without list.
Disadvantages: often lower statistical efficiency due to subgroups being homogeneous rather than heterogeneous.
Cost: moderate.
In drawing a sample with cluster sampling, the population is divided into internally heterogeneous subgroups. Some are randomly selected for further study. Two conditions foster the use of cluster sampling: 1) the need for more economic efficiency than can be provided by simple random sampling, and 2) the frequent unavailability of a practical sampling frame for individual elements. Exhibit 14-7 provides a comparison of stratified and cluster sampling and is highlighted on the next slide. Several questions must be answered when designing cluster samples. How homogeneous are the resulting clusters? Shall we seek equal-sized or unequal-sized clusters? How large a cluster shall we take? Shall we use a single-stage or multistage cluster? How large a sample is needed?
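As a rough illustration of the single-stage case, the sketch below selects whole clusters at random and keeps every element inside them. The city blocks, household labels, and the name `single_stage_cluster_sample` are all hypothetical.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose clusters, then study every element in the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 20 city blocks with 5 households each.
clusters = {
    f"block_{b}": [f"household_{b}_{h}" for h in range(5)]
    for b in range(20)
}
selected = single_stage_cluster_sample(clusters, 4, seed=7)
print(sorted(selected))
```

Only a list of clusters is needed, not a list of individual elements, which is why the method suits populations without a practical element-level frame. A multistage design would draw a further sample inside each selected cluster.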

Stratified and Cluster Sampling
| Stratified | Cluster |
|---|---|
| Population divided into few subgroups | Population divided into many subgroups |
| Homogeneity within subgroups | Heterogeneity within subgroups |
| Heterogeneity between subgroups | Homogeneity between subgroups |
| Choice of elements from within each subgroup | Random choice of subgroups |

Exhibit 14-7

Area Sampling
Area sampling is a cluster sampling technique applied to a population with well-defined political or geographic boundaries. It is a low-cost and frequently used method.

Double
Advantages: may reduce costs if first stage results in enough data to stratify or cluster the population.
Disadvantages: increased costs if indiscriminately used.
In drawing a sample with double (sequential or multiphase) sampling, data are collected using a previously defined technique. Based on the information found, a subsample is selected for further study.

Nonprobability Samples
Reasons to use: no need to generalize, feasibility, limited objectives, time, cost.
With a subjective approach like nonprobability sampling, the probability of selecting population elements is unknown. There is a greater opportunity for bias to enter the sample and distort findings. We cannot estimate any range within which to expect the population parameter. Despite these disadvantages, there are practical reasons to use nonprobability samples. When the research does not require generalization to a population parameter, there is no need to ensure that the sample fully reflects the population. The researcher may have limited objectives such as those in exploratory research. It is less expensive to use nonprobability sampling. It also requires less time. Finally, a list may not be available.

Nonprobability Sampling Methods
Methods: convenience, judgment, quota, snowball.
Convenience samples are nonprobability samples where the element selection is based on ease of accessibility. They are the least reliable but cheapest and easiest to conduct. Examples include informal pools of friends and neighbors, people responding to an advertised invitation, and “on the street” interviews.
Judgment sampling is purposive sampling where the researcher arbitrarily selects sample units to conform to some criterion. This is appropriate for the early stages of an exploratory study.
Quota sampling is also a type of purposive sampling. In this type, relevant characteristics are used to stratify the sample, which should improve its representativeness. The logic behind quota sampling is that certain relevant characteristics describe the dimensions of the population. In most quota samples, researchers specify more than one control dimension. Each dimension should have a distribution in the population that can be estimated and be pertinent to the topic studied.
Snowball sampling means that subsequent participants are referred by the current sample elements. This is useful when respondents are difficult to identify and best located through referral networks. It is also used frequently in qualitative studies.

Key Terms
Area sampling, Census, Cluster sampling, Convenience sampling, Disproportionate stratified sampling, Double sampling, Judgment sampling, Multiphase sampling, Nonprobability sampling, Population, Population element, Population parameters, Population proportion of incidence, Probability sampling, Proportionate stratified sampling, Quota sampling, Sample statistics, Sampling, Sampling error, Sampling frame, Sequential sampling, Simple random sample, Skip interval, Snowball sampling, Stratified random sampling, Systematic sampling, Systematic variance

Determining Sample Size

Random Samples
Exhibit 14a-1 shows the Metro U dining club study population (N = 20,000) consisting of five subgroups based on their preferred lunch times. The values 1 through 5 represent preferred lunch times, each a 30-minute interval, starting at 11:00 a.m. Next we sample 10 elements from this population without knowledge of the population’s characteristics. We draw four samples of 10 elements each. The means for each sample are provided in the slide. Each mean is a point estimate, the best predictor of the unknown population mean. None of the samples shown is a perfect duplication because no sample perfectly replicates its population. We cannot judge which estimate is the true mean of the population, but we can estimate the interval in which the true mean will fall by using any of the samples. This is accomplished by using a formula that computes the standard error of the mean.

Increasing Precision Exhibit 14a-2
The standard error creates an interval estimate that brackets the point estimate. The interval estimate is an interval or range of values within which the true population parameter is expected to fall. In this example, mu is predicted to be 3.0 (12:00 noon) plus or minus .36. Because each unit represents 30 minutes, .36 is about 11 minutes, so we would expect to find the true population parameter between 11:49 a.m. and 12:11 p.m. We have 68% confidence in this estimate because one standard error encompasses plus or minus 1 Z. This is illustrated in Exhibit 14a-3 on the next slide.

Confidence Levels & the Normal Curve
Exhibit 14a-3
The area under the curve represents the confidence estimates that one makes about the results. The combination of the interval range and the degree of confidence creates the confidence interval. With 95% confidence, the interval in which we would find the true mean widens to 11:39 a.m. through 12:21 p.m. We find this by multiplying the standard error by plus or minus 1.96 Z, which covers 95% of the area under the curve.
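The clock times quoted for the lunch-time example can be checked with a small Python sketch. It assumes the coding given for Exhibit 14a-1 (code 1 is 11:00 a.m. and each unit is 30 minutes); the helper names `code_to_clock` and `interval_as_clock` are hypothetical.

```python
def code_to_clock(code):
    """Convert a lunch-time code to a clock time (code 1 = 11:00, 30 min per unit)."""
    minutes = round(11 * 60 + (code - 1) * 30)   # minutes after midnight
    hour, minute = divmod(minutes, 60)
    return f"{hour}:{minute:02d}"

def interval_as_clock(mean_code, std_error, z):
    """Interval estimate: mean plus or minus z standard errors, as clock times."""
    half_width = z * std_error
    return (code_to_clock(mean_code - half_width),
            code_to_clock(mean_code + half_width))

print(interval_as_clock(3.0, 0.36, 1.00))   # about 68% confidence
print(interval_as_clock(3.0, 0.36, 1.96))   # about 95% confidence
```

With a mean of 3.0 and a standard error of .36, one standard error gives 11:49 to 12:11 and 1.96 standard errors give 11:39 to 12:21, matching the figures in the slides.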

Standard Errors

| Standard Error (Z score) | % of Area | Approximate Degree of Confidence |
|---|---|---|
| 1.00 | 68.27 | 68% |
| 1.65 | 90.10 | 90% |
| 1.96 | 95.00 | 95% |
| 3.00 | 99.73 | 99% |

Exhibit 14a-4
These are the Z scores associated with various degrees of confidence. To increase the degree of confidence that the true population parameter falls within a given range, the standard error is multiplied by the appropriate Z score.
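The areas in the exhibit can be approximated with the standard normal distribution in Python's standard library. This is a sketch for checking the table, and the helper name `central_area` is hypothetical; computed values may differ from the printed table in the last digit because the table was built from rounded lookup values.

```python
from statistics import NormalDist

def central_area(z):
    """Share of a normal curve lying within plus or minus z standard errors."""
    standard_normal = NormalDist()           # mean 0, standard deviation 1
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

for z in (1.00, 1.65, 1.96, 3.00):
    print(f"z = {z:.2f}: {100 * central_area(z):.2f}% of area")
```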

Central Limit Theorem Exhibit 14a-5, Part B
According to the central limit theorem, for sufficiently large samples (n ≥ 30), the sample means will be distributed around the population mean approximately in a normal distribution. If researchers draw repeated samples, as we did in the Metro U dining club study, the means for each sample could be plotted and will form an approximately normal distribution.
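A small simulation can make the theorem concrete. The sketch below is an assumption-laden illustration, not the Metro U data: it builds a hypothetical population of 20,000 lunch-time codes drawn uniformly from 1 to 5, then draws 2,000 samples of 30 and compares the spread of the sample means with the theoretical standard error.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

# Hypothetical, clearly non-normal population of codes 1..5.
population = [rng.choice([1, 2, 3, 4, 5]) for _ in range(20000)]

# Repeated samples of n = 30, keeping only each sample's mean.
n = 30
means = [statistics.mean(rng.sample(population, n)) for _ in range(2000)]

pop_mean = statistics.mean(population)
mean_of_means = statistics.mean(means)
theoretical_se = statistics.pstdev(population) / n ** 0.5
observed_se = statistics.stdev(means)

print(pop_mean, mean_of_means)
print(theoretical_se, observed_se)
```

The sample means cluster around the population mean, and their standard deviation is close to the population standard deviation divided by the square root of n, even though the population itself is flat rather than bell shaped.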

Estimates of Dining Visits
| Confidence | Z score | % of Area | Interval Range (visits per month) |
|---|---|---|---|
| 68% | 1.00 | 68.27 | |
| 90% | 1.65 | 90.10 | |
| 95% | 1.96 | 95.00 | |
| 99% | 3.00 | 99.73 | |

Exhibit 14a-6
In this example, we want to know how many visits the dining club users make to the dining club each month. Using the formula for standard error of the mean with the standard deviation of the sample (because the value for the standard deviation of the population is unknown), we find that the standard error of the mean is .51 visits. At the 95% level, 1.96 standard errors are equal to 1 visit. The researcher can estimate with 95% confidence that the population mean of expected number of visits is within 10 (the sample mean) plus or minus 1 visit, or between 9 and 11 visits per month. The confidence level is a percentage that reflects the probability that the results will be correct. We might want a higher degree of confidence than the 95% level used. The table illustrates the interval ranges at various levels of confidence. If we want an estimate that will hold for a much smaller range, for example, 10.0 plus or minus .2 visits, we must either accept a lower level of confidence or take a sample large enough to provide this smaller interval with the highest desired confidence level.
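The interval ranges follow directly from the sample mean, the standard error, and each Z score. The sketch below recomputes them from the figures stated in the notes (mean of 10 visits, standard error of .51); the function name `interval_range` is hypothetical, and the results may differ slightly from the printed exhibit if that exhibit used unrounded inputs.

```python
def interval_range(mean, std_error, z):
    """Interval estimate: mean plus or minus z standard errors."""
    half_width = z * std_error
    return (mean - half_width, mean + half_width)

sample_mean = 10      # visits per month, from the slide notes
std_error = 0.51      # standard error of the mean, from the slide notes

z_scores = {"68%": 1.00, "90%": 1.65, "95%": 1.96, "99%": 3.00}
ranges = {level: interval_range(sample_mean, std_error, z)
          for level, z in z_scores.items()}

for level, (low, high) in ranges.items():
    print(f"{level}: {low:.2f} to {high:.2f} visits per month")
```

Higher confidence widens the range: the 95% row works out to roughly 9 to 11 visits, as stated in the notes.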

Calculating Sample Size for Questions involving Means
Inputs: precision (confidence level, size of interval estimate), population dispersion, need for FPA.
To compute the desired sample size for questions involving means, we need certain information: the precision and how to quantify it, which consists of the confidence level we want with our estimate and the size of the interval estimate; the expected dispersion in the population for the investigative question used; and whether a finite population adjustment (FPA) is needed. When the size of the calculated sample exceeds 5% of the population, the finite limits of the population constrain the sample size needed. A correction factor formula is available in that event. In most sample calculations, population size does not have a major effect on sample size.

Metro U Sample Size for Means
| Steps | Information |
|---|---|
| Desired confidence level | 95% (z = 1.96) |
| Size of the interval estimate | ± .5 meals per month |
| Expected range in population | 0 to 30 meals |
| Sample mean | 10 |
| Standard deviation | 4.1 |
| Need for finite population adjustment | No |
| Standard error of the mean | .5/1.96 = .255 |
| Sample size | (4.1)²/(.255)² = 259 |

Exhibit 14a-7
In this example, the researcher wants to know what size sample is necessary to estimate the number of meals per month consumed by dining club members. The questions mentioned on the previous slide must be addressed. The desired confidence level is 95%, which means we will use a Z score of 1.96. The interval estimate that the researcher is willing to accept is plus or minus .5 meals per month. These two items represent the desired precision. The sample mean is 10 and the standard deviation is 4.1. These figures were derived from a pretest. If a pretest had not provided the standard deviation, then the population dispersion could have been used to get a standard deviation. This is discussed further on the following slide. To calculate the standard error of the mean, the interval estimate is divided by the Z score. This figure is then used in the sample size calculation. The standard deviation squared divided by the standard error of the mean squared is equal to the calculated sample size. Note that the more precise the desired results, the larger the sample size must be.
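The calculation in the exhibit can be written as a short Python function. This is a sketch of the arithmetic described in the notes, with no finite population adjustment; the function name `sample_size_for_mean` is hypothetical, and rounding up to a whole participant is an assumption.

```python
import math

def sample_size_for_mean(std_dev, interval, z):
    """n = s^2 / SE^2, where SE = interval / z (no finite population adjustment)."""
    std_error = interval / z            # standard error implied by the precision
    return math.ceil(std_dev ** 2 / std_error ** 2)

# Figures from the Metro U example: s = 4.1, interval = .5 meals, z = 1.96.
n = sample_size_for_mean(std_dev=4.1, interval=0.5, z=1.96)
print(n)   # 259
```

Halving the interval to .25 meals roughly quadruples the required sample, which shows why tighter precision is expensive.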

Proxies of the Population Dispersion
Previous research on the topic; pilot test or pretest; rule-of-thumb calculation (1/6 of the range).

Metro U Sample Size for Proportions
| Steps | Information |
|---|---|
| Desired confidence level | 95% (z = 1.96) |
| Size of the interval estimate | ± .10 (10%) |
| Expected range in population | 0 to 100% |
| Sample proportion with given attribute | 30% |
| Sample dispersion | pq = .30(1 − .30) = .21 |
| Finite population adjustment | No |
| Standard error of the proportion | .10/1.96 = .051 |
| Sample size | .21/(.051)² = 81 |

Exhibit 14a-7
In this example, the researcher wants to know what size sample is necessary to estimate what percentage of the population says it would join the dining club, based on the projected rates and services. A pretest told us that 30% of those in the pretest sample were interested in joining. In this case, dispersion is measured in terms of p × q, in which q is the proportion of the population not having the attribute and q = 1 − p. The measure of dispersion of the sample statistic also changes from the standard error of the mean to the standard error of the proportion. As before, the desired confidence level is 95%, which means we will use a Z score of 1.96. The interval estimate that the researcher is willing to accept is plus or minus .10 or 10% (this is a subjective decision). These two items represent the desired precision. To calculate the standard error of the proportion, the interval estimate is divided by the Z score. This figure is then used in the sample size calculation. The dispersion divided by the standard error of the proportion squared is equal to the calculated sample size. In this case, the sample size is smaller than the one in the previous example. If both questions were relevant to the research, the larger sample size would be used.
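The same arithmetic for proportions can be sketched in Python. The function name `sample_size_for_proportion` is hypothetical, no finite population adjustment is applied, and rounding up to a whole participant is an assumption.

```python
import math

def sample_size_for_proportion(p, interval, z):
    """n = pq / SE^2, where q = 1 - p and SE = interval / z."""
    q = 1 - p
    std_error = interval / z            # standard error of the proportion
    return math.ceil(p * q / std_error ** 2)

# Figures from the Metro U example: p = .30, interval = .10, z = 1.96.
n = sample_size_for_proportion(p=0.30, interval=0.10, z=1.96)
print(n)   # 81
```

Dispersion pq is largest at p = .5, so when no pretest is available, using p = .5 gives the most conservative (largest) sample size for a given precision.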

Appendix 14a: Key Terms
Central limit theorem, Confidence interval, Confidence level, Interval estimate, Point estimate, Proportion

Keynote Experiment Use the Keynote experiment in this chapter’s CloseUp to discuss participant assignment to groups.


Random Samples Exhibit 14a Random Samples of Preferred Lunch Times.

Confidence Levels Exhibit 14a-3 Confidence Levels and the Normal Curve

Metro U. Dining Club Study
Exhibit 14a-5 Metro U Dining Club Study