# Sampling.

## Presentation on theme: "Sampling."— Presentation transcript:

Sampling

We want to know something about a population
We select a sample from the population; change x, measure y

Sampling terms
- Population – the universe of units from which the sample is to be selected. For some populations, not all units will be accessible
- Sampling frame – the list of all the units in the population from which the sample will be selected
- Sample – the segment of the population that is selected for investigation
- Representative sample – a sample that reflects the population accurately, so that it is a microcosm of the population

Probability sampling
- A sample that has been selected using random selection, so that each unit in the population has a known chance of being selected
- Is assumed to be a representative sample
- You can generalise: make valid inferences from the sample to the population

Non-probability sampling
A sample that has not been selected using a random selection method. This implies that some units in the population are more likely to be selected than others. Inferences from the sample to the population (generalisations) are less valid.

Non-response
- A source of non-sampling error that is particularly likely to happen when individuals are being sampled
- It occurs whenever some members of the sample refuse to cooperate, cannot be contacted or for some reason do not supply the required data

Types of probability sample
- Random – each member of the population has an equal probability of inclusion; the best form of sampling
- Stratified random – you may want a sample to exhibit proportional representation from certain categories (eg gender, age), so randomly sample from within those categories
- Multi-stage cluster – good for ‘national’ research. Rather than sample students from any uni in the UK, you group unis by standard region and randomly sample, say, 2 regions. Then take, say, 5 unis from each region and 500 students from each uni, so you ask 500 students from each of 5 unis in each of 2 regions: 5,000 students in all
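The three designs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is invented, the function names are hypothetical, and the cluster example takes 10 students per uni rather than 500 to keep the toy data small.

```python
import random
from collections import defaultdict

def simple_random_sample(units, n, rng):
    """Every unit has the same chance of inclusion (drawn without replacement)."""
    return rng.sample(units, n)

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw the same fraction at random from each stratum (proportional representation)."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return chosen

def multistage_cluster_sample(regions, n_regions, n_unis, n_students, rng):
    """regions maps region -> uni -> list of students.
    Randomly pick regions, then unis within each region, then students within each uni."""
    chosen = []
    for region in rng.sample(sorted(regions), n_regions):
        unis = regions[region]
        for uni in rng.sample(sorted(unis), n_unis):
            chosen.extend(rng.sample(unis[uni], n_students))
    return chosen

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 4 regions x 6 unis x 50 students (1,200 students)
regions = {
    f"region{r}": {
        f"uni{r}_{u}": [f"student{r}_{u}_{s}" for s in range(50)]
        for u in range(6)
    }
    for r in range(4)
}
all_students = [s for unis in regions.values() for studs in unis.values() for s in studs]

srs = simple_random_sample(all_students, 100, rng)
# Stratify by region, which is encoded in the first part of each student id
strat = stratified_sample(all_students, lambda s: s.split("_")[0], 0.1, rng)
cluster = multistage_cluster_sample(regions, n_regions=2, n_unis=5, n_students=10, rng=rng)
```

Note that the cluster sample only ever touches two regions, which is what makes it cheap for national research, while the stratified sample is guaranteed to include every region in proportion.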

Qualities of probability samples
- Possible to generalise findings: sample data represents population data. If you want to know about children, then sample from all the world’s children
- Absolute size of sample is more important than relative size
- Kind of analysis will affect best sample size
- Non-response will affect results
- Heterogeneity of population will affect results
- Time-consuming and potentially expensive

Types of non-probability samples
- Convenience sampling – a sample that is available to the researcher by virtue of its accessibility, time, cost
- Snowball sampling – a type of convenience sample: make initial contact with a small group of relevant people, then use them to establish contact with others
- Quota sampling – stratify by categories of interest, eg gender, ethnicity, age. Then select people who fit those categories, usually by convenience
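Quota sampling can be sketched as follows, to show where it differs from stratified random sampling. The function name and the stream of passers-by are hypothetical; the point is that people are accepted in the order they happen to turn up, with no random selection at all.

```python
from collections import Counter

def quota_sample(arrivals, category_of, quotas):
    """Accept people in arrival order (by convenience) until every quota is full."""
    filled = Counter()
    chosen = []
    for person in arrivals:
        category = category_of(person)
        if filled[category] < quotas.get(category, 0):
            chosen.append(person)
            filled[category] += 1
        if all(filled[c] >= q for c, q in quotas.items()):
            break  # every quota met, stop recruiting
    return chosen

# Hypothetical passers-by as (name, gender) pairs, in order of arrival
arrivals = [("p1", "f"), ("p2", "f"), ("p3", "m"), ("p4", "f"), ("p5", "m"), ("p6", "m")]
sample = quota_sample(arrivals, lambda person: person[1], {"f": 2, "m": 2})
```

Here p4 is turned away because the quota for that category is already full, and p6 is never approached. Who ends up in the sample depends on who arrives first, which is why some units are more likely to be selected than others.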

Qualities of non-prob samples
- Limits to generalisability, so samples may not be representative of the population
- Sometimes may be the only way to get participants (may not know the full extent of the population, eg studies on hackers)
- Legitimate way of carrying out preliminary analysis before doing a larger [probabilistic] study
- More often used in qualitative research than quantitative research

Sample size depends on
- Sampling and non-sampling error
- How precise we want final estimates to be
- Precision can be specified by confidence level or significance level, eg 95% confidence is 5% significance; 99% confidence is 1% significance
- We later calculate the achieved significance probability = p-value
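As a rough illustration of how precision drives sample size, the sketch below uses the common textbook formula n = z² p(1 − p) / e² for estimating a proportion. This formula is not given in the slides; it assumes a simple random sample from a large population, and takes p = 0.5 as the most cautious guess. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n giving the requested margin of error for a proportion.

    Assumes simple random sampling from a large population (no finite
    population correction); p=0.5 gives the largest, most cautious n.
    """
    significance = 1 - confidence                     # eg 95% confidence -> 5% significance
    z = NormalDist().inv_cdf(1 - significance / 2)    # two-sided critical value
    return math.ceil(z**2 * p * (1 - p) / margin**2)

n_95 = required_sample_size(margin=0.05, confidence=0.95)  # 385
n_99 = required_sample_size(margin=0.05, confidence=0.99)  # 664
```

Asking for 99% rather than 95% confidence at the same +/- 5% margin raises the required sample from 385 to 664, and the population size does not enter the calculation at all, which is one way to see why absolute sample size matters more than relative size.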

Sampling error
- Is the difference between observing a sample and observing the whole population from which the sample is selected
- Measured as margin of error, expressed as +/- x%
- Reducing sampling error:
  - Increase sample size n towards population size N
  - Improve sampling method; random sampling best
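A small simulation can make the effect of sample size concrete. The population of scores below is invented, and the helper name is hypothetical; the sketch repeatedly draws random samples of size n and averages how far the sample mean lands from the population mean.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable
N = 1000
population = [rng.randint(0, 100) for _ in range(N)]  # hypothetical integer scores
true_mean = sum(population) / N

def mean_abs_sampling_error(n, repeats=200):
    """Average |sample mean - population mean| over repeated random samples of size n."""
    total = 0.0
    for _ in range(repeats):
        sample = rng.sample(population, n)  # random sampling without replacement
        total += abs(sum(sample) / n - true_mean)
    return total / repeats

errors = {n: mean_abs_sampling_error(n) for n in (10, 100, 500, N)}
for n, err in errors.items():
    print(f"n = {n:4d}  average sampling error = {err:.2f}")
```

The error shrinks as n grows and is exactly zero when n = N, because the sample is then the whole population. What the simulation cannot show is non-sampling error, which does not shrink with sample size.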

Non-sampling error
- Is due to non-response or poor design, eg choice of sampling frame, data collection [question wording, interviewing style]
- We aim for probabilistic sampling, but obtaining access to the entire population can be impossible, eg all criminals, everyone who eats X
- Busy people are likely to be non-responders, but these might provide the best information

Overview
- Probability sampling: random, stratified random, multi-stage cluster
- Non-probability sampling: convenience, snowball, quota
- Sample size