 # Why sample? Diversity in populations Practicality and cost.


Why sample?
- Diversity in populations
- Practicality and cost

Terms
- Population = large group about which conclusions are drawn. Real, but unknown.
- Sample = small group that represents the population. Real, known.

[Figure: one population from which many different samples can be drawn]

- Element = individual member of a population.
- Sampling unit = element or group of elements selected in a sample.
- Unit of analysis = element or group of elements compared in the analysis.

The above units can be the same or different.

Element, Sampling Unit, Unit of Analysis: Examples
Opinion survey of UMD students
- Element = individual student
- Sampling unit = individual student
- Unit of analysis = individual student (student opinions measured)

Survey of family incomes
- Element = adult household member
- Sampling unit = household or address
- Unit of analysis = family (total family income measured)

Element, Sampling Unit, Unit of Analysis: More Examples
Voter polls
- Element = individual voter
- Sampling unit = telephone number
- Unit of analysis = individual voter (voter opinions measured)

U.S. Census of housing
- Element = household or address
- Sampling unit = household or address
- Unit of analysis = household or address (# of rooms measured)

Sampling frame = list of all the sampling units in the population. Needed for probability sampling.

Probability sample = sample in which the researcher knows and controls the probability of selection. Main advantage: only probability samples permit accurate estimation of sampling error.

Simple Random Sample
Every element in the population has an equal and constant chance of selection.
1. Physical sampling with replacement
2. Table of random numbers
3. Random selection by computer

Probability of selection = sample size / population size

Requires a list (frame) of all elements in the population.
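As a rough illustration of "random selection by computer", the sketch below draws a simple random sample from a frame with Python's standard `random` module. The function names, the seed, and the frame of 5000 numbered elements are assumptions made for this example. Note that `random.Random.sample` draws without replacement, which is the usual practice in surveys, whereas the slide's first method describes sampling with replacement.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each equally likely."""
    rng = random.Random(seed)      # seeded generator so the draw can be repeated
    return rng.sample(frame, n)    # selection without replacement


def selection_probability(n, population_size):
    """Probability of selection = sample size / population size."""
    return n / population_size


# Hypothetical frame: 5000 elements identified by number.
frame = list(range(1, 5001))
sample = simple_random_sample(frame, 100, seed=1)
print(len(sample), selection_probability(100, len(frame)))
```

The frame here is just a list of ID numbers; in practice it would be whatever list of population elements is available.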

Systematic Random Sample
Every “kth” element is drawn from a list (e.g. every 50th name).
1. k = sampling interval = population size / sample size (e.g. 5000/100 = 50).
2. Random starting point between 1 and k (e.g. between 1 and 50).
3. Statistically equivalent to a simple random sample, provided the condition in item 4 holds.
4. List must be randomly ordered, or at least free of any periodic pattern that lines up with k.
5. Convenient, since lists are available for many populations.
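The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the population size is at least the sample size; the function name and the numbered frame are made up for the example.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Take every k-th element of the frame after a random start."""
    rng = random.Random(seed)
    k = len(frame) // n            # sampling interval, e.g. 5000 // 100 = 50
    start = rng.randint(1, k)      # random starting position between 1 and k
    # Positions are 1-based on the slide, so subtract 1 for Python indexing.
    picks = [frame[i] for i in range(start - 1, len(frame), k)]
    return picks[:n]               # trim extras when the sizes do not divide evenly


# Hypothetical frame of 5000 names, represented by their list positions.
frame = list(range(1, 5001))
sample = systematic_sample(frame, 100, seed=7)
print(sample[:3])
```

If the frame is sorted in a way that repeats every k entries, every draw lands on the same kind of element, which is why the ordering condition matters.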

Stratified Random Sample
1. Population is first divided into groups (strata).
2. A simple random sample is taken from within each stratum.
3. The separate random samples are combined into a single total sample.
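A minimal sketch of this procedure follows. It assumes the same sampling fraction in every stratum (proportional allocation), which the slides do not specify; the function name, the class-year sizes, and the student labels are hypothetical.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample within each stratum, then combine them.

    strata maps a stratum name to the list of its members.
    fraction is the share of each stratum to sample (an assumption of this sketch).
    """
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        n = max(1, round(len(members) * fraction))   # at least one per stratum
        combined.extend(rng.sample(members, n))      # SRS inside the stratum
    return combined


# Hypothetical strata: students grouped by year in school.
strata = {
    "Freshmen": [f"Fr-{i}" for i in range(400)],
    "Sophomores": [f"So-{i}" for i in range(300)],
    "Juniors": [f"Jr-{i}" for i in range(200)],
    "Seniors": [f"Sr-{i}" for i in range(100)],
}
sample = stratified_sample(strata, 0.10, seed=3)
print(len(sample))
```

Because every stratum contributes to the sample, no class year can be missed by chance, which a simple random sample cannot guarantee.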

Example of stratified sample
[Figure: the UMD population is divided into four strata (Freshmen, Sophomores, Juniors, Seniors); one sample is drawn from each stratum (Samples 1 to 4), and the four samples are combined into a single stratified sample]

Considerations in Stratified Sampling
- Requires knowledge of the stratifying variable.
- Best used when there is much variation between strata in the variable being measured. (Example: stratify by year in school if measuring opinions of advising.)
- Lowest sampling error.
- Most costly.

Sampling error = estimated difference between sample value and actual population value (e.g. ± 3%)

Cluster Sample
- Elements in the population are naturally grouped together (“clusters”).
- A simple random sample of clusters is taken.
- Every element in the selected clusters is studied.

[Figure: population divided into clusters, with the selected clusters making up the sample]

Considerations in Cluster Sampling
- Best when there is little variation between clusters in the variable being measured.
- Does not require a list of individual elements (only clusters).
- May be used to cover a large geographic area (smaller areas = clusters).
- May be less expensive.
- Highest sampling error.
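The cluster design can be sketched as follows. Only a list of clusters is sampled, and every element inside a chosen cluster is kept. The function name and the city-block data are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every element inside them.

    clusters maps a cluster name to the list of its elements.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # SRS of cluster names
    return [element for name in chosen for element in clusters[name]]


# Hypothetical clusters: 10 city blocks with 5 households each.
clusters = {
    f"block{b}": [f"block{b}-house{h}" for h in range(5)] for b in range(10)
}
sample = cluster_sample(clusters, 3, seed=11)
print(len(sample))
```

Note that the frame here is the list of block names only; no list of individual households is needed before the blocks are chosen.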

Multistage Designs
Combines two or more sampling designs.

Example: sampling voters in MN
- Stage 1: Stratify by geographic area (e.g. county).
- Stage 2: Sample census tracts (clusters) in selected counties.
- Stage 3: Take a simple random sample (SRS) of households in each tract.

Commonly used in large, diverse populations. Design is best left to experts!
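The three stages can be mimicked with a toy sketch. It is only an illustration of how the designs nest, not a usable survey design: it assumes every county is kept as a stratum, ignores weighting, and uses a made-up function name and made-up county, tract, and household labels.

```python
import random


def multistage_sample(counties, tracts_per_county, households_per_tract, seed=None):
    """Toy three-stage design: strata -> clusters -> SRS within clusters.

    counties maps county name -> {tract name -> list of households}.
    """
    rng = random.Random(seed)
    sample = []
    for county in sorted(counties):                    # Stage 1: each county is a stratum
        tracts = counties[county]
        n_tracts = min(tracts_per_county, len(tracts))
        for tract in rng.sample(sorted(tracts), n_tracts):   # Stage 2: sample tracts
            households = tracts[tract]
            n_hh = min(households_per_tract, len(households))
            sample.extend(rng.sample(households, n_hh))      # Stage 3: SRS of households
    return sample


# Hypothetical data: 3 counties, 6 tracts each, 20 households per tract.
counties = {
    c: {f"{c}-tract{t}": [f"{c}-tract{t}-hh{h}" for h in range(20)] for t in range(6)}
    for c in ("A", "B", "C")
}
print(len(multistage_sample(counties, 2, 5, seed=5)))
```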

Sampling
- Why use sampling?
- Terms and definitions
- Probability sampling designs
  - Simple random
  - Systematic
  - Stratified
  - Cluster
  - Multistage designs
- Estimation from samples

Estimation from Samples
Find a likely range of values for a population parameter (e.g. average, %).
- Parameter = characteristic of a population
- Statistic = characteristic of a sample
- Statistical inference = drawing conclusions about a population based on sample data. Usually connected with a probability of error.

Sampling Distribution
- Distribution of results of all possible samples of size N taken from the same population.
- Theoretical, not actually done in practice.
- Properties of sampling distributions are known to statisticians.
- Used as the basis for inferring from samples to populations.

Example: estimating proportion of homes with internet access
- Suppose the population proportion = .62.
- Take one sample of 200 homes and record how many have internet access. Sample p = .60.
- Can we conclude that the population proportion is .60?
- A different sample might produce a different answer.

What if we took all possible samples?
- Most sample proportions would be close to the population value.
- A few would be much higher or lower.
- The average of the sample proportions would be the true population proportion.
- The distribution would be a bell-shaped curve.

[Figure: bell-shaped curve; x-axis: all possible sample proportions; y-axis: % of samples]
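Taking all possible samples is not feasible, but a simulation of many random samples gives a feel for the same pattern. The sketch below assumes the true proportion of .62 and the sample size of 200 from the example; the number of simulated samples (5000), the seed, and the function name are arbitrary choices made for illustration.

```python
import random
import statistics


def simulate_sample_proportions(true_p, n, num_samples, seed=None):
    """Draw many samples of size n and return each sample's proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each home has internet access with probability true_p.
        hits = sum(rng.random() < true_p for _ in range(n))
        proportions.append(hits / n)
    return proportions


props = simulate_sample_proportions(0.62, 200, 5000, seed=42)
mean_p = statistics.mean(props)
print(f"average of sample proportions: {mean_p:.3f}")
print(f"lowest: {min(props):.3f}, highest: {max(props):.3f}")
```

A histogram of `props` would show the bell shape described above, centered near the true value.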

What we know from sampling distribution:
- We DON’T know the true population proportion.
- We DO know what share of sample proportions fall within a given distance of the true proportion.
- Sampling error = estimated difference between sample value and actual population value (example: 95% of sample proportions fall within ± 3% of the true proportion).
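The slides do not say how the size of the sampling error is obtained. One common approach, shown here as an assumption rather than as the presenter's method, is the normal-approximation formula for a proportion: margin = z × sqrt(p(1 − p)/n), with z = 1.96 for 95%. The function name is made up for this sketch.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    p is the sample proportion, n the sample size, z the critical value
    (1.96 corresponds to 95% of a bell-shaped sampling distribution).
    """
    return z * math.sqrt(p * (1 - p) / n)


print(f"n = 200,  p = .60: +/- {margin_of_error(0.60, 200):.3f}")
print(f"n = 1000, p = .50: +/- {margin_of_error(0.50, 1000):.3f}")
```

Under this formula, a ± 3% margin corresponds to a sample of roughly 1,000, while a sample of 200 gives a margin closer to ± 7%, so the ± 3% figure in the slides is best read as an illustrative value.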

How we make an estimate
1. Find the sample proportion (p).
2. Add the sampling error (margin of error) on either side.
3. The true proportion probably falls within this interval.

[Figure: sampling distribution curves (x-axis: all possible sample proportions; y-axis: % of samples) with intervals drawn around several sample proportions p]

Examples of estimates
If 95% of sample proportions (p) fall within ± 3% of the true proportion, then 95% of all intervals p ± 3% will contain the true population proportion.
- If p = .60, we estimate the true proportion is .57 to .63.
- If p = .62, we estimate the true proportion is .59 to .65.
- If p = .57, we estimate the true proportion is .54 to .60.
- If p = .70, we estimate the true proportion is .67 to .73.

95% of the time this procedure yields a correct estimate.
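The arithmetic behind these estimates is simply p minus and plus the margin of error. A small sketch, using the ± 3% margin from the slide as a fixed default (the function name is hypothetical):

```python
def interval_estimate(p, margin=0.03):
    """Return (low, high) = p - margin, p + margin, rounded to two decimals."""
    return (round(p - margin, 2), round(p + margin, 2))


for p in (0.60, 0.62, 0.57, 0.70):
    low, high = interval_estimate(p)
    print(f"p = {p:.2f}: estimate {low:.2f} to {high:.2f}")
```

The rounding only guards against floating-point noise in the subtraction and addition; it does not change the estimates.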
