Basic Sampling & Review of Statistics
Basic Sampling
What is a sample? The selection of a subset of elements from a larger group of objects.
Why use a sample?
  Saves time
  Saves money
  Accuracy -- lessens non-sampling error
Basic Sampling
Major definitions
  Population -- the entire group of people from whom the researcher needs to obtain information
  Sample element -- the unit from which information is sought (e.g., consumers)
  Sampling unit -- the elements available for selection during the sampling process (e.g., consumers who are in the US at the time of the study)
  Sampling frame -- the list of all sampling units available for selection to the sample (e.g., a list of all consumers who are in the US at the time of the study)
  Sampling error -- the difference between the population response and the sample response
  Non-sampling error -- all other errors that emerge during data collection
Basic Sampling
Procedure for selecting a sample
  1. Define the population -- who (or what) we want data from
  2. Identify the sampling frame -- those we are able to get data from
  3. Select a sampling procedure -- how we are going to obtain the sample
  4. Determine the sample size (n)
  5. Draw the sample
  6. Collect the data
Basic Sampling
General types of samples
  Non-probability -- the researcher's judgment determines which elements are included in the final sample
  Probability -- each element of the population has a known, nonzero chance of being selected; elements are selected on the basis of probability
Characteristics of probability samples
  Sampling error can be calculated ($\pm z\,\sigma_{\bar{x}}$)
  Inferences can be made to the population as a whole
Non-Probability Samples
  Convenience -- the sample is defined on the basis of the researcher's convenience
  Judgment -- a hand-picked sample whose elements are thought to be able to provide special insight into the problem at hand
  Snowball -- respondents are selected on the basis of referrals from other sample elements; often used in more qualitative or ethnographic studies
  Quota -- the sample is chosen so that the proportion of sample elements possessing certain characteristics is approximately the same as the proportion of elements with those characteristics in the universe
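Probability Samples
  Simple random sample (SRS) -- assign a number to each sampling unit, then use a random number table to choose which units enter the sample
  Systematic sample -- an easy alternative to an SRS: pick a random starting point, then take every k-th unit on the list (k = N/n)
  Stratified sample -- divide the population into mutually exclusive strata, then take an SRS from each stratum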
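The three designs above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation. It assumes the sampling frame is a Python list of unit IDs and that a seeded `random.Random` generator stands in for the random number table. The function names and the 100-unit frame are hypothetical.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every subset of n units is equally likely to be drawn."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Every k-th unit after a random start, with k = N // n (assumes n <= N)."""
    k = len(frame) // n
    start = rng.randrange(k)  # random start in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """A separate SRS of the requested size from each mutually exclusive stratum."""
    return {name: rng.sample(units, n_per_stratum[name]) for name, units in strata.items()}

# Usage with a hypothetical frame of 100 numbered units
rng = random.Random(42)
frame = list(range(1, 101))
print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, 5, rng))
strata = {"A": list(range(1, 61)), "B": list(range(61, 101))}
print(stratified_sample(strata, {"A": 3, "B": 2}, rng))
```

The systematic version only needs one random number, the start. This is why it is an easy alternative to an SRS when the list has no periodic pattern.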
Probability Samples
  Cluster sample -- divide the population into mutually exclusive clusters, then select an SRS of clusters
    One-stage -- measure all members of each selected cluster
    Two-stage -- measure an SRS of members within each selected cluster
  Area sample (a cluster sample whose clusters are geographic blocks)
    One-stage -- choose an SRS of blocks in an area; sample everyone on each selected block
    Two-stage -- choose an SRS of blocks in an area; select an SRS of houses on each selected block
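A minimal sketch of one- and two-stage cluster sampling follows. It assumes the clusters are held in a dict that maps a cluster name to its list of members. The `cluster_sample` helper and the block/house labels are hypothetical. With blocks as clusters and houses as members, the same function covers the area sample.

```python
import random

def cluster_sample(clusters, n_clusters, rng, n_within=None):
    """Select an SRS of clusters.

    One-stage (n_within is None): keep every member of each selected cluster.
    Two-stage: draw an SRS of n_within members inside each selected cluster.
    """
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    sample = {}
    for name in chosen:
        members = clusters[name]
        sample[name] = list(members) if n_within is None else rng.sample(members, n_within)
    return sample

# Hypothetical area: 20 blocks with 10 houses each
blocks = {f"block{b}": [f"block{b}-house{h}" for h in range(1, 11)] for b in range(1, 21)}
rng = random.Random(3)
print(cluster_sample(blocks, 3, rng))              # one-stage: 3 blocks x 10 houses
print(cluster_sample(blocks, 3, rng, n_within=4))  # two-stage: 3 blocks x 4 houses
```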
[Figure: Random Number Table]
Hypothetical Sample Population
Columns: Respondent Number | Income ($ thousands) | Education (years) | Yogurt Consumption (cartons/year) | Satisfaction Level (1-7) | City
City, by respondent in listed order: Madison, Milwaukee, Milwaukee, Milwaukee, Madison, Milwaukee, Madison, Madison, Madison, Milwaukee, Other, Madison, Milwaukee, Milwaukee, Madison, Other, Milwaukee, Milwaukee, Madison, Madison
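The City column of the hypothetical population can be used to show how stratum weights, or quota targets, follow from a population's mix. This is a sketch only. The `quota_targets` helper and the target sample size of 100 are assumptions for illustration.

```python
from collections import Counter

# City of each of the 20 hypothetical respondents, in the order listed
cities = ["Madison", "Milwaukee", "Milwaukee", "Milwaukee", "Madison",
          "Milwaukee", "Madison", "Madison", "Madison", "Milwaukee",
          "Other", "Madison", "Milwaukee", "Milwaukee", "Madison",
          "Other", "Milwaukee", "Milwaukee", "Madison", "Madison"]

counts = Counter(cities)
N = len(cities)
shares = {city: c / N for city, c in counts.items()}  # share of the population in each city

def quota_targets(counts, N, n):
    """Interviews per city so a sample of n mirrors the population mix.

    Integer floor division: exact when n * count is divisible by N,
    otherwise the targets can sum to slightly less than n.
    """
    return {city: n * c // N for city, c in counts.items()}

print(shares)                          # {'Madison': 0.45, 'Milwaukee': 0.45, 'Other': 0.1}
print(quota_targets(counts, N, 100))   # {'Madison': 45, 'Milwaukee': 45, 'Other': 10}
```

The same shares serve as proportional-allocation weights for a stratified sample by city. In a quota sample, the researcher's judgment still decides who fills each quota.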
Review of Statistics
Probability samples -- statistical error can be computed when they are used, so we need to know some statistics
Descriptive statistics -- estimates that describe a population
Statistical terms used in sampling
  Mean ($\mu$ or $\bar{x}$): $\bar{x} = \sum x_i / n$
  Variance ($\sigma^2$ or $s^2$): $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$
  Standard deviation ($\sigma$ or $s$): $s = \sqrt{s^2}$
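The three formulas can be checked by computing them by hand and comparing the results against Python's `statistics` module. The data values below are hypothetical.

```python
import math
import statistics

def describe(xs):
    """Sample mean, variance (n - 1 denominator), and standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    sd = math.sqrt(variance)
    return mean, variance, sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean, var, sd = describe(data)
print(mean, var, sd)  # mean is 5.0 and variance is 32/7
print(statistics.variance(data), statistics.stdev(data))  # library versions agree
```

Dividing by n - 1 rather than n gives the usual sample estimate $s^2$. The population variance $\sigma^2$ divides by N when every element is measured.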
Review of Statistics
Inferential statistics terms
  Parameter -- a population value, e.g., $\mu$
  Statistic -- a sample value, e.g., $\bar{x}$
Sample statistics are the best estimate of the population parameter. Why? The Central Limit Theorem.
Review of Statistics
Central Limit Theorem
  Based on the distribution of the means of numerous samples (the sampling distribution of means)
  The theorem states: as the sample size (n) gets large, the sampling distribution of means becomes approximately normal, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$
  This allows the calculation of sampling error (the standard error, estimated by $s/\sqrt{n}$)
  Thus a confidence interval can be calculated
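A small simulation makes the theorem concrete. The sketch draws many samples from a strongly skewed population and looks at the spread of their means. The exponential population, n = 50, and 5,000 repetitions are assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)   # seeded so the run is repeatable
lam = 1.0                # exponential population: mean = sd = 1 / lam = 1.0 (right-skewed)
n = 50                   # size of each sample
reps = 5000              # number of samples drawn

# Sampling distribution of means: the mean of each of `reps` samples of size n
means = [statistics.fmean(rng.expovariate(lam) for _ in range(n)) for _ in range(reps)]

print(statistics.fmean(means))   # close to the population mean, 1.0
print(statistics.stdev(means))   # close to sigma / sqrt(n)
print(1.0 / math.sqrt(n))        # sigma / sqrt(n), about 0.141
```

A histogram of `means` would look roughly bell-shaped even though individual exponential draws are far from normal.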
Review of Statistics
Confidence interval -- tells us, given n and the sampling procedure, how close the sample mean ($\bar{x}$) is likely to be to the population mean ($\mu$)
Formula: $\bar{x} - z\,\sigma_{\bar{x}} < \mu < \bar{x} + z\,\sigma_{\bar{x}}$
z-values: 90% = 1.645; 95% = 1.96; 99% = 2.576
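The sketch below derives the z-values from the standard normal distribution and applies the interval formula with $s/\sqrt{n}$ as the estimated standard error. The helper names and the example inputs ($\bar{x}$ = 50, s = 10, n = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided z for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_interval(xbar, s, n, confidence=0.95):
    """xbar +/- z * s / sqrt(n)."""
    se = s / math.sqrt(n)  # estimated standard error of the mean
    z = z_value(confidence)
    return xbar - z * se, xbar + z * se

for level in (0.90, 0.95, 0.99):
    print(level, round(z_value(level), 3))  # 1.645, 1.96, 2.576

print(confidence_interval(xbar=50.0, s=10.0, n=100))  # about (48.04, 51.96)
```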
Review of Statistics
Confidence interval -- interpretation: if the same sampling procedure were repeated many times, about 95 out of 100 of the calculated 95% confidence intervals would include the true mean ($\mu$)
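One way to see the repeated-sampling interpretation is to simulate it. The sketch draws many samples, builds a 95% interval from each, and counts how often the interval covers the true mean. It assumes a normal population with known $\sigma$, so the z-based interval applies exactly. The population values are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(7)
mu, sigma = 100.0, 15.0   # hypothetical population, assumed normal with known sigma
n, reps, z = 40, 2000, 1.96

hits = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    se = sigma / math.sqrt(n)             # standard error of the mean
    if xbar - z * se < mu < xbar + z * se:
        hits += 1                         # this interval contains the true mean

print(hits / reps)  # close to 0.95
```

Any single interval either contains $\mu$ or it does not. The 95% describes the procedure over many repetitions.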
Sample Size
Sample size and total error
  A larger n increases the probability of non-sampling error
  A larger n reduces sampling error ($\sigma/\sqrt{n}$)
  Effect of n on total error? The two effects pull in opposite directions
The level of sampling error can be pre-determined by setting n
The choice of n depends mainly on the method of analysis
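The $\sigma/\sqrt{n}$ term shows why sampling error falls slowly as n grows. This sketch assumes a standard deviation of 10 for illustration. It covers only sampling error; non-sampling error is not modeled.

```python
import math

def standard_error(s, n):
    """Sampling error of the mean, s / sqrt(n)."""
    return s / math.sqrt(n)

s = 10.0  # hypothetical standard deviation
for n in (100, 400, 1600):
    print(n, standard_error(s, n))  # 1.0, 0.5, 0.25: each 4x increase in n halves the error
```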
Sample Size
Sample size when the research objective is to estimate a population parameter
  $CI = \bar{x} \pm z\,s_{\bar{x}}$
  $CI = \bar{x} \pm 1.96\,(s/\sqrt{n})$
  Set the margin $h = z\,s/\sqrt{n}$ and solve for n:
  $n = z^2 s^2 / h^2$
  $n = (1.96)^2 s^2 / h^2$
  $n = 3.84\,s^2 / h^2$
  s = expected standard deviation
  h = absolute precision of the estimate (the half-width of the desired confidence interval, i.e., the ± margin)
Sample Size (Sample Exercise)
  $n = (1.96)^2 s^2 / h^2$
  s = 7.5; h = 0.50
  $n = (3.84)(56.25) / 0.25$
  $n = 216 / 0.25$
  $n = 864$
What if s = 10 and h = 1?
  $n = (3.84)(100) / 1$
  $n = 384$
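The exercise can be reproduced with a small helper implementing $n = z^2 s^2 / h^2$. The helper is hypothetical. It assumes that n is rounded up to a whole respondent and that the unrounded $1.96^2 = 3.8416$ is used. Under those assumptions it returns 865 and 385, while the slide's rounded multiplier 3.84 gives 864 and 384.

```python
import math

def sample_size(s, h, z=1.96):
    """n = z^2 * s^2 / h^2, rounded up to the next whole respondent.

    s: expected standard deviation
    h: absolute precision (half-width of the confidence interval)
    """
    return math.ceil((z * s / h) ** 2)

print(sample_size(7.5, 0.5))  # 865 (864.36 before rounding up)
print(sample_size(10, 1))     # 385 (384.16 before rounding up)
```

Halving h quadruples n, which is why the precision requirement dominates the calculation.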
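Sample Size (Conclusion)
Sample size is unaffected by the size of the universe (as long as the sample is a small fraction of it)
Sample size is affected by
  the desired precision (h)
  the chosen confidence level (z)
  the estimate of the standard deviation (s)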
Sample Size
Sample size estimation with cross-tabulation based research
  Objective: a minimum of 25 subjects per cell
  Must estimate the relationship up front: what is the smallest cell?

              Female    Male    Total
  <30           .30              .55
  30+           .10*             .45
  Total         .40
  * smallest cell
Sample Size
We know the smallest cell should hold 25 subjects
Calculate the total sample size: the smallest cell (25 subjects) is 10% of the sample
  25 = .10 n
  n = 25 / .10
  n = 250

              Female    Male    Total
  <30
  30+           25
  Total
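The same calculation as a small helper works for any minimum cell count and smallest-cell share. The function name is hypothetical. The `round` call is an assumption added to keep floating-point noise from pushing the result up by one.

```python
import math

def crosstab_sample_size(min_per_cell, smallest_cell_share):
    """Total n so the smallest expected cell still gets min_per_cell subjects."""
    # round() guards against float noise (e.g. a result a hair above 250) before rounding up
    return math.ceil(round(min_per_cell / smallest_cell_share, 9))

n = crosstab_sample_size(25, 0.10)
print(n)  # 250
```

If the smallest cell were expected to hold 20% of the sample instead, the same rule would give 125.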