Chapter 15 Sampling.

Learning Objectives Understand . . .
the two premises on which sampling theory is based
accuracy and precision for measuring sample validity
the five questions that must be answered to develop a sampling plan

Learning Objectives Understand . . .
the two categories of sampling techniques and the variety of sampling techniques within each category
the various sampling techniques and when each is used

The Nature of Sampling
Sampling; Population element; Population; Census; Sampling frame
The basic idea of sampling is that by selecting some of the elements in a population, we may draw conclusions about the entire population. A population element is the individual participant or object on which the measurement is taken. It is the unit of study. It may be a person, but it could also be any object of interest. A population is the total collection of elements about which we wish to make some inferences. A census is a count of all the elements in a population. A sampling frame is the listing of all population elements from which the sample will be drawn.

Why Sample?
Sampling provides: Availability of elements; Lower cost; Greater speed; Greater accuracy
This slide lists the reasons researchers use a sample rather than a census.

When Is A Census Appropriate?
Feasible; Necessary
The advantages of sampling over census studies are less compelling when the population is small and the variability within the population is high. Two conditions are appropriate for a census study: a census is feasible when the population is small, and necessary when the elements are quite different from each other.

What Is A Good Sample?
Accurate; Precise
The ultimate test of a sample design is how well it represents the characteristics of the population it purports to represent. In measurement terms, the sample must be valid. Validity of a sample depends on two considerations: accuracy and precision.
Accuracy is the degree to which bias is absent from the sample. When the sample is drawn properly, the measures of behavior, attitudes, or knowledge for some sample elements will be less than the corresponding population values, and the measures for other sample elements will be more. These variations offset each other, resulting in a sample value that is close to the population value. For these offsetting effects to occur, there must be enough elements in the sample, and they must be drawn in a way that favors neither overestimation nor underestimation. Systematic variance is variation that causes measurements to skew in one direction or another. Increasing the sample size can reduce systematic variance as a cause of error, but a larger sample will not remove bias that is built into the way elements are listed or selected.
Precision of estimate is the second criterion of a good sample design. The numerical descriptors that describe samples may be expected to differ from those that describe populations because of random fluctuations inherent in the sampling process. This is called sampling error and reflects the influence of chance in drawing the sample members. Sampling error is what is left after all known sources of systematic variance have been accounted for. Precision is measured by the standard error of estimate, a type of standard deviation measurement. The smaller the standard error of the estimate, the higher the precision of the sample.

Exhibit 15-1 Sampling Design within the Research Process
Exhibit 15-1 represents the several decisions the researcher makes when designing a sample. The sampling decisions flow from two decisions made in the formation of the management-research question hierarchy: the nature of the management question and the specific investigative questions that evolve from the research question.

Exhibit 15-2 Types of Sampling Designs
Element Selection | Probability | Nonprobability
Unrestricted | Simple random | Convenience
Restricted | Complex random: Systematic, Cluster, Stratified, Double | Purposive: Judgment, Quota, Snowball
The members of a sample are selected using probability or nonprobability procedures. Nonprobability sampling is an arbitrary and subjective sampling procedure where each population element does not have a known, nonzero chance of being included. Probability sampling is a controlled, randomized procedure that assures that each population element is given a known, nonzero chance of selection.

Steps in Sampling Design
What is the target population?
What are the parameters of interest?
What is the sampling frame?
What is the appropriate sampling method?
What size sample is needed?
This slide addresses the steps in sampling design.

Larger Sample Sizes When
Population variance; Desired precision; Small error range; Confidence level; Number of subgroups
The greater the dispersion or variance within the population, the larger the sample must be to provide estimation precision. The greater the desired precision of the estimate, the larger the sample must be. The narrower or smaller the error range, the larger the sample must be. The higher the confidence level in the estimate, the larger the sample must be. The greater the number of subgroups of interest within a sample, the greater the sample size must be, as each subgroup must meet minimum sample size requirements. Cost considerations influence decisions about the size and type of sample and the data collection methods.

Simple Random
Advantages: Easy to implement with random dialing
Disadvantages: Requires list of population elements; Time consuming; Uses larger sample sizes; Produces larger errors
Cost: High
In drawing a sample with simple random sampling, each population element has an equal chance of being selected into the sample. The sample is drawn using a random number table or generator. This slide shows the advantages and disadvantages of using this method. The probability of selection is equal to the sample size divided by the population size. Exhibit 15-4 covers how to choose a random sample. The steps are as follows:
1) Assign each element within the sampling frame a unique number.
2) Identify a random start from the random number table.
3) Determine how the digits in the random number table will be assigned to the sampling frame.
4) Select the sample elements from the sampling frame.
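The drawing procedure can be sketched in a few lines of Python. This is only an illustration, not the procedure from Exhibit 15-4: the frame of 20 numbered elements, the function name `simple_random_sample`, and the fixed seed are assumptions, and Python's `random.sample` stands in for the random number table.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; each element has chance n / N."""
    rng = random.Random(seed)      # a generator replaces the random number table
    return rng.sample(frame, n)

# Hypothetical sampling frame: every element already carries a unique number.
frame = list(range(1, 21))         # N = 20
sample = simple_random_sample(frame, n=5, seed=42)

# Probability of selection = sample size / population size
selection_probability = 5 / len(frame)
```

Fixing the seed only makes the draw repeatable for checking; a real study would not reuse a published seed.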

Systematic
Advantages: Simple to design; Easier than simple random; Easy to determine sampling distribution of mean or proportion
Disadvantages: Periodicity within population may skew sample and results; Trends in list may bias results
Cost: Moderate
In drawing a sample with systematic sampling, an element of the population is selected at the beginning with a random start, and then every kth element is selected until the desired sample size is reached. Here k is the skip interval, the interval between sample elements drawn from a sampling frame in systematic sampling. It is determined by dividing the population size by the sample size. To draw a systematic sample, the steps are as follows:
1) Identify, list, and number the elements in the population.
2) Identify the skip interval.
3) Identify the random start.
4) Draw a sample by choosing every kth entry.
To protect against subtle biases, the researcher can randomize the population before sampling, change the random start several times in the process, and replicate a selection of different samples.
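A minimal Python sketch of the skip-interval procedure follows. The frame of 100 numbered elements and the function name `systematic_sample` are hypothetical, and the sketch assumes the population size divides evenly by the sample size.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Choose a random start within the first k elements, then every kth element."""
    k = len(frame) // n            # skip interval = population size / sample size
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start, as a 0-based index in [0, k)
    return [frame[start + i * k] for i in range(n)]

# Hypothetical numbered list of population elements, N = 100
frame = list(range(1, 101))
sample = systematic_sample(frame, n=10, seed=1)
```

If the list has a periodic pattern with the same period as k, every draw lands on the same phase of that pattern, which is the periodicity risk noted on the slide.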

Stratified
Advantages: Control of sample size in strata; Increased statistical efficiency; Provides data to represent and analyze subgroups; Enables use of different methods in strata
Disadvantages: Increased error will result if subgroups are selected at different rates; Especially expensive if strata on the population must be created
Cost: High
In drawing a sample with stratified sampling, the population is divided into subpopulations, or strata, and simple random sampling is used within each stratum. Results may be weighted or combined. The cost is high. Stratified sampling may be proportionate or disproportionate. In proportionate stratified sampling, the sample drawn from each stratum is proportionate to that stratum's share of the population. Any stratification that departs from the proportionate relationship is disproportionate.
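Proportionate allocation can be illustrated with a short Python sketch. The three strata, their sizes, and the function name `proportionate_stratified_sample` are invented for the example. Simple rounding is used for the per-stratum sizes, so with less convenient numbers the total could differ from n by an element or two.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to the list of its elements."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Each stratum gets the same share of the sample as it has of the population.
        n_h = round(n * len(members) / total)
        sample[name] = rng.sample(members, n_h)   # simple random within the stratum
    return sample

# Hypothetical population of 1,000 split into three strata (50%, 30%, 20%)
strata = {
    "stratum_a": list(range(0, 500)),
    "stratum_b": list(range(500, 800)),
    "stratum_c": list(range(800, 1000)),
}
sample = proportionate_stratified_sample(strata, n=100, seed=7)
sizes = {name: len(chosen) for name, chosen in sample.items()}
```

A disproportionate design would replace the `n_h` line with stratum sizes chosen on some other basis, such as sampling a small but important stratum more heavily.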

Cluster
Advantages: Provides an unbiased estimate of population parameters if properly done; Economically more efficient than simple random; Lowest cost per sample; Easy to do without list
Disadvantages: Often lower statistical efficiency due to subgroups being homogeneous rather than heterogeneous
Cost: Moderate
In drawing a sample with cluster sampling, the population is divided into internally heterogeneous subgroups. Some are randomly selected for further study. Two conditions foster the use of cluster sampling: 1) the need for more economic efficiency than can be provided by simple random sampling, and 2) the frequent unavailability of a practical sampling frame for individual elements. Exhibit 15-5 provides a comparison of stratified and cluster sampling and is highlighted on the next slide. Several questions must be answered when designing cluster samples:
How homogeneous are the resulting clusters?
Shall we seek equal-sized or unequal-sized clusters?
How large a cluster shall we take?
Shall we use a single-stage or multistage cluster?
How large a sample is needed?
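A single-stage cluster sample can be sketched in Python as below. The ten clusters of four elements each and the function name `single_stage_cluster_sample` are assumptions made for illustration. A multistage design would sample again within the chosen clusters.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # random choice of subgroups
    elements = [element for name in chosen for element in clusters[name]]
    return chosen, elements

# Hypothetical population: 10 clusters ("blocks") of 4 elements ("houses") each
clusters = {
    f"block_{i}": [f"block_{i}_house_{j}" for j in range(4)]
    for i in range(10)
}
chosen, elements = single_stage_cluster_sample(clusters, n_clusters=3, seed=3)
```

Only a list of clusters is needed to draw the sample, which is why the method works without a frame of individual elements.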

Exhibit 15-5 Stratified and Cluster Sampling
Stratified | Cluster
Population divided into few subgroups | Population divided into many subgroups
Homogeneity within subgroups | Heterogeneity within subgroups
Heterogeneity between subgroups | Homogeneity between subgroups
Choice of elements from within each subgroup | Random choice of subgroups

Area Sampling
Area sampling is a cluster sampling technique applied to a population with well-defined political or geographic boundaries. It is a low-cost and frequently used method.

Double
Advantages: May reduce costs if first stage results in enough data to stratify or cluster the population
Disadvantages: Increased costs if indiscriminately used
In drawing a sample with double (sequential or multiphase) sampling, data are collected using a previously defined technique. Based on the information found, a subsample is selected for further study.

Nonprobability Samples
Issues: No need to generalize; Limited objectives; Feasibility; Time; Cost
With a subjective approach like nonprobability sampling, the probability of selecting population elements is unknown. There is a greater opportunity for bias to enter the sample and distort findings. We cannot estimate any range within which to expect the population parameter. Despite these disadvantages, there are practical reasons to use nonprobability samples. When the research does not require generalization to a population parameter, there is no need to ensure that the sample fully reflects the population. The researcher may have limited objectives, such as those in exploratory research. It is less expensive to use nonprobability sampling. It also requires less time. Finally, a list may not be available.

Nonprobability Sampling Methods
Convenience; Judgment; Quota; Snowball
Convenience samples are nonprobability samples where the element selection is based on ease of accessibility. They are the least reliable but cheapest and easiest to conduct. Examples include informal pools of friends and neighbors, people responding to an advertised invitation, and "on the street" interviews.
Judgment sampling is purposive sampling where the researcher arbitrarily selects sample units to conform to some criterion. This is appropriate for the early stages of an exploratory study.
Quota sampling is also a type of purposive sampling. In this type, relevant characteristics are used to stratify the sample, which should improve its representativeness. The logic behind quota sampling is that certain relevant characteristics describe the dimensions of the population. In most quota samples, researchers specify more than one control dimension. Each dimension should have a distribution in the population that can be estimated and should be pertinent to the topic studied.
Snowball sampling means that subsequent participants are referred by the current sample elements. This is useful when respondents are difficult to identify and best located through referral networks. It is also used frequently in qualitative studies.

Key Terms Area sampling Census Cluster sampling Convenience sampling
Disproportionate stratified sampling Double sampling Judgment sampling Multiphase sampling Nonprobability sampling Population Population element Population parameters Population proportion of incidence Probability sampling

Key Terms Proportionate stratified sampling Quota sampling
Sample statistics Sampling Sampling error Sampling frame Sequential sampling Simple random sample Skip interval Snowball sampling Stratified random sampling Systematic sampling Systematic variance

Determining Sample Size
Appendix 15a Determining Sample Size

Exhibit 15a-1 Random Samples
Exhibit 15a-1 shows the Metro U dining club study population (N = 20,000) consisting of five subgroups based on their preferred lunch times. The values 1 through 5 represent preferred lunch times, each a 30-minute interval, starting at 11:00 a.m. Next we sample 10 elements from this population without knowledge of the population’s characteristics. We draw four samples of 10 elements each. The means for each sample are provided in the slide. Each mean is a point estimate, the best predictor of the unknown population mean. None of the samples shown is a perfect duplication because no sample perfectly replicates its population. We cannot judge which estimate is the true mean of the population but we can estimate the interval in which the true mean will fall by using any of the samples. This is accomplished by using a formula that computes the standard error of the mean.

Exhibit 15a-2 Increasing Precision
Panels: Reducing the Standard Deviation by 50%; Quadrupling the Sample
The standard error creates an interval estimate that brackets the point estimate. The interval estimate is an interval or range of values within which the true population parameter is expected to fall. In this example, mu is predicted to be 3.0, or 12:00 noon, plus or minus .36. Thus we would expect to find the true population parameter to be between 11:49 a.m. and 12:11 p.m. We have 68% confidence in this estimate because one standard error encompasses plus or minus 1 Z. This is illustrated in Exhibit 15a-2 on the next slide.

Exhibit 15a-3 Confidence Levels & the Normal Curve
The area under the curve represents the confidence estimates that one makes about the results. The combination of the interval range and the degree of confidence creates the confidence interval. With 95% confidence, the interval in which we would find the true mean widens to the range from 11:39 a.m. to 12:21 p.m. We find this by multiplying the standard error by plus or minus 1.96 Z, which covers 95% of the area under the curve.
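The clock times quoted for the 68% and 95% intervals can be reproduced with a short Python sketch. The function names are hypothetical. The mapping from scale values to clock time (1 = 11:00 a.m., 30 minutes per unit) is inferred from the statement that 3.0 corresponds to 12:00 noon.

```python
def to_clock(value):
    """Convert a scale value to clock time, assuming 1 = 11:00 and 30 minutes per unit."""
    minutes = round(11 * 60 + (value - 1) * 30)
    return f"{minutes // 60}:{minutes % 60:02d}"

def interval(mean, std_error, z):
    """Interval estimate: mean plus or minus z standard errors."""
    half_width = z * std_error
    return mean - half_width, mean + half_width

low68, high68 = interval(3.0, 0.36, 1.00)
low95, high95 = interval(3.0, 0.36, 1.96)

print(to_clock(low68), to_clock(high68))   # 11:49 12:11
print(to_clock(low95), to_clock(high95))   # 11:39 12:21
```

Raising the confidence level from 68% to 95% nearly doubles the width of the interval, because the standard error is multiplied by 1.96 instead of 1.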

Exhibit 15a-4 Standard Errors
Standard Error (Z score) | % of Area | Approximate Degree of Confidence
1.00 | 68.27 | 68%
1.65 | 90.10 | 90%
1.96 | 95.00 | 95%
3.00 | 99.73 | 99%
These are the Z scores associated with various degrees of confidence. To increase the degree of confidence that the true population parameter falls within a given range, the standard error is multiplied by the appropriate Z score.

Central Limit Theorem
According to the central limit theorem, for sufficiently large samples (n ≥ 30), the sample means will be distributed around the population mean approximately in a normal distribution. If researchers draw repeated samples, as we did in the Metro U dining club study, the means for each sample could be plotted and would approximate a normal distribution.
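A small simulation can make the theorem concrete. The sketch below uses a made-up, deliberately skewed population of lunch-time codes, not the Metro U data. The weights, seed, and number of repeated samples are all assumptions.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical skewed population of 20,000 codes from 1 to 5 (not the Metro U data)
population = rng.choices([1, 2, 3, 4, 5], weights=[5, 4, 3, 2, 1], k=20_000)

# Draw 1,000 repeated samples of n = 30 and record each sample mean
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(1_000)]

population_mean = statistics.mean(population)
mean_of_means = statistics.mean(sample_means)

# The spread of the sample means should be close to sigma / sqrt(n)
observed_se = statistics.stdev(sample_means)
expected_se = statistics.pstdev(population) / 30 ** 0.5
```

Even though the population itself is skewed, the sample means cluster around the population mean, and their spread is close to the standard error of the mean.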

Exhibit 15a-6 Estimates of Dining Visits
Confidence | Z score | % of Area | Interval Range (visits per month)
68% | 1.00 | 68.27 |
90% | 1.65 | 90.10 |
95% | 1.96 | 95.00 |
99% | 3.00 | 99.73 |
In this example, we want to know how many visits the dining club users make to the dining club each month. Using the formula for standard error of the mean with the standard deviation of the sample (because the value for the standard deviation of the population is unknown), we find that the standard error of the mean is .51 visits. At the 95% level, 1.96 standard errors are equal to 1 visit. The researcher can estimate with 95% confidence that the population mean of expected number of visits is within 10 (the sample mean) plus or minus 1 visit, or between 9 and 11 visits per month. The confidence level is a percentage that reflects the probability that the results will be correct. We might want a higher degree of confidence than the 95% level used. The table illustrates the interval ranges at various levels of confidence. If we want an estimate that will hold for a much smaller range, for example, 10.0 plus or minus .2 visits, we must either accept a lower level of confidence or take a sample large enough to provide this smaller interval with the highest desired confidence level.
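The interval range at each confidence level follows from the sample mean of 10 and the standard error of .51. The Python sketch below, with a hypothetical function name, computes them from those rounded figures, so the results may differ in the second decimal from a table built on unrounded values.

```python
def interval_range(sample_mean, std_error, z):
    """Return (low, high) for the mean plus or minus z standard errors, to 2 decimals."""
    half_width = z * std_error
    return round(sample_mean - half_width, 2), round(sample_mean + half_width, 2)

z_scores = {"68%": 1.00, "90%": 1.65, "95%": 1.96, "99%": 3.00}
ranges = {level: interval_range(10, 0.51, z) for level, z in z_scores.items()}

for level, (low, high) in ranges.items():
    print(level, low, high)
```

At the 95% level this gives roughly 9 to 11 visits per month, matching the estimate in the notes.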

Calculating Sample Size for Questions involving Means
Precision; Confidence level; Size of interval estimate; Population dispersion; Need for FPA
To compute the desired sample size for questions involving means, we need certain information:
The precision desired and how to quantify it: the confidence level we want with our estimate, and the size of the interval estimate.
The expected dispersion in the population for the investigative question used.
Whether a finite population adjustment (FPA) is needed.
When the size of the calculated sample exceeds 5% of the population, the finite limits of the population constrain the sample size needed. A correction factor formula is available in that event. In most sample calculations, population size does not have a major effect on sample size.

Exhibit 15a-7 Metro U Sample Size for Means
Steps | Information
Desired confidence level | 95% (z = 1.96)
Size of the interval estimate | ± .5 meals per month
Expected range in population | 0 to 30 meals
Sample mean | 10
Standard deviation | 4.1
Need for finite population adjustment | No
Standard error of the mean | .5/1.96 = .255
Sample size | (4.1)^2 / (.255)^2 = 259
In this example, the researcher wants to know what size sample is necessary to estimate the number of meals per month consumed by dining club members. The questions mentioned on the previous slide must be addressed. The desired confidence level is 95%, which means we will use a Z score of 1.96. The interval estimate that the researcher is willing to accept is plus or minus .5 meals per month. These two items represent the desired precision. The sample mean is 10 and the standard deviation is 4.1. These figures were derived from a pretest. If a pretest had not provided the standard deviation, then the population dispersion could have been used to get a standard deviation. This is discussed further on the following slide. To calculate the standard error of the mean, the interval estimate is divided by the Z score. This figure is then used in the sample size calculation. The standard deviation squared divided by the standard error of the mean squared is equal to the calculated sample size. Note that the more precise the desired results, the larger the sample size must be.
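The calculation in the exhibit can be checked with a few lines of Python. The function name `sample_size_for_mean` is hypothetical. The sketch rounds up to the next whole element and ignores any finite population adjustment, as the exhibit does.

```python
import math

def sample_size_for_mean(std_dev, interval, z):
    """n = s^2 / (standard error)^2, where standard error = interval / z."""
    std_error = interval / z               # .5 / 1.96, about .255
    return math.ceil(std_dev ** 2 / std_error ** 2)

n = sample_size_for_mean(std_dev=4.1, interval=0.5, z=1.96)
print(n)  # 259
```

Halving the interval to plus or minus .25 meals would roughly quadruple the required sample, since the interval enters the formula squared.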

Proxies of the Population Dispersion
Previous research on the topic
Pilot test or pretest
Rule-of-thumb calculation: 1/6 of the range

Exhibit 15a-7 Metro U Sample Size for Proportions
Steps | Information
Desired confidence level | 95% (z = 1.96)
Size of the interval estimate | ± .10 (10%)
Expected range in population | 0 to 100%
Sample proportion with given attribute | 30%
Sample dispersion | pq = .30(1 - .30) = .21
Finite population adjustment | No
Standard error of the proportion | .10/1.96 = .051
Sample size | .21 / (.051)^2 = 81
In this example, the researcher wants to know what size sample is necessary to estimate what percentage of the population says it would join the dining club, based on the projected rates and services. A pretest told us that 30% of those in the pretest sample were interested in joining. In this case, dispersion is measured in terms of p * q, in which p is the proportion of the population having the attribute, q is the proportion not having it, and q = 1 - p. The measure of dispersion of the sample statistic also changes from the standard error of the mean to the standard error of the proportion. As before, the desired confidence level is 95%, which means we will use a Z score of 1.96. The interval estimate that the researcher is willing to accept is plus or minus .10, or 10% (this is a subjective decision). These two items represent the desired precision. To calculate the standard error of the proportion, the interval estimate is divided by the Z score. This figure is then used in the sample size calculation. The dispersion divided by the standard error of the proportion squared is equal to the calculated sample size. In this case, the sample size is smaller than the one in the previous example. If both questions were relevant to the research, the larger sample size would be used.
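The same check for proportions, again as a sketch with a hypothetical function name and no finite population adjustment:

```python
import math

def sample_size_for_proportion(p, interval, z):
    """n = pq / (standard error)^2, where q = 1 - p and standard error = interval / z."""
    pq = p * (1 - p)                       # dispersion, .30 * .70 = .21
    std_error = interval / z               # .10 / 1.96, about .051
    return math.ceil(pq / std_error ** 2)

n = sample_size_for_proportion(p=0.30, interval=0.10, z=1.96)
print(n)  # 81
```

Because pq is largest at p = .50, using .50 when no pretest proportion is available gives the most conservative (largest) sample size for a given interval and confidence level.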

Appendix 15a: Key Terms Central limit theorem Confidence interval
Confidence level Interval estimate Point estimate Proportion
