SJS SDI_161 Design of Statistical Investigations Stephen Senn Random Sampling I
SJS SDI_162 Simple Random Sample: Definition. If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sampling procedure is called simple random sampling. The sample thus obtained is called a simple random sample. (Scheaffer, Mendenhall and Ott, Elementary Survey Sampling, Fourth Edition)
SJS SDI_163 Typical Use of Sample. The typical use of a sample is to say something about a population mean or proportion. Point estimate –mean, proportion. Confidence interval –95%, 99% etc. Occasionally we are interested in estimating totals –total weight, total value etc.
SJS SDI_164 95% confidence intervals for the mean for 100 samples of size 10 from N(50, 4)
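A figure of this kind can be approximated by simulation. The sketch below is not the code behind the slide. It assumes that the 4 in N(50, 4) is a variance (so the standard deviation is 2) and that the intervals are ordinary t-intervals with 9 degrees of freedom; the function name `simulate_coverage` and the seed are illustrative choices.

```python
import random
import statistics

SAMPLE_SIZE = 10
T_CRIT_DF9 = 2.262  # two-sided 95% Student-t critical value for 9 degrees of freedom

def simulate_coverage(n_samples=100, mu=50.0, variance=4.0, seed=1):
    """Return how many of n_samples 95% t-intervals contain the true mean mu."""
    rng = random.Random(seed)
    sigma = variance ** 0.5  # assumption: the second parameter of N(50, 4) is a variance
    covered = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]
        centre = statistics.mean(sample)
        std_err = statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
        if centre - T_CRIT_DF9 * std_err <= mu <= centre + T_CRIT_DF9 * std_err:
            covered += 1
    return covered

print(simulate_coverage())  # number of the 100 intervals that cover 50
```

In the long run about 95 of every 100 such intervals should cover the true mean, although any single run of 100 will vary around that figure.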
SJS SDI_165 With or without replacement? The above definition is slightly more general than the one we encountered previously: it allows for sampling without replacement. Our previous definition stressed independence. Strictly speaking, for any finite population, draws are not quite independent if they occur without replacement. Why is this? Consider a sample of size two drawn from a population of size N without replacement. There are $\binom{N}{2} = N(N-1)/2$ ways of choosing the sample. Hence, the probability of a given sample being chosen is $2/[N(N-1)]$.
SJS SDI_166 But an independence argument would produce a different answer. The probability of a given item being chosen is 1/N. Hence the probability of two given items being independently chosen in any order is $2 \times (1/N) \times (1/N) = 2/N^2$. Note, however, that provided N is large compared to n, the distinction between sampling with and without replacement is unimportant. This is fortunate, since correcting for sampling without replacement from finite populations involves a lot of tedious but elementary algebra!
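The two probabilities can be checked by brute-force enumeration. This is a minimal sketch, assuming population units labelled 1 to N; the function names are illustrative and not part of the original slides.

```python
from fractions import Fraction
from itertools import combinations, product

def prob_without_replacement(N):
    """Probability of one particular unordered pair when 2 units are drawn without replacement."""
    samples = list(combinations(range(1, N + 1), 2))  # all N(N-1)/2 equally likely samples
    return Fraction(1, len(samples))

def prob_with_replacement(N, pair=(1, 2)):
    """Probability that two independent draws give the two units of `pair`, in either order."""
    draws = list(product(range(1, N + 1), repeat=2))  # all N*N equally likely ordered draws
    hits = sum(1 for d in draws if set(d) == set(pair))
    return Fraction(hits, len(draws))

for N in (5, 50, 500):
    print(N, prob_without_replacement(N), prob_with_replacement(N))
```

As N grows, the ratio of the two probabilities, N/(N-1), approaches 1, which is why the distinction stops mattering for large populations.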
SJS SDI_167 How Not to Draw a Simple Random Sample. Do not use your own judgement. –This is haphazard sampling. –It is subject to psychological bias. –Human beings are not good randomisers. Do not use systematic sampling. –There may be cyclic patterns or other trends in the population.
SJS SDI_168 The Swiss Lottery. Draw 6 from 45. –$\binom{45}{6}$ = 8,145,060 combinations. Professor Hans Riedwyl's study of a given draw: –16,862,596 tickets sold –approximately two tickets per combination –There were over 5000 combinations that were chosen more than 50 times!
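To see how far this is from random choice, one can ask how many combinations would be picked more than 50 times if every ticket were filled in uniformly at random. The sketch below assumes exactly that null model and approximates the count for each combination by a Poisson distribution with mean tickets/combinations; the helper `poisson_upper_tail` is an illustrative name, not something from the slides.

```python
import math

COMBINATIONS = math.comb(45, 6)  # ways to choose 6 numbers from 45
TICKETS = 16_862_596             # tickets sold in the draw studied

def poisson_upper_tail(mean, k, terms=200):
    """Approximate P(X > k) for X ~ Poisson(mean) by summing the upper tail directly."""
    return sum(
        math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))
        for j in range(k + 1, k + 1 + terms)
    )

mean_per_combination = TICKETS / COMBINATIONS
# Expected number of combinations chosen more than 50 times under random choice
expected_popular = COMBINATIONS * poisson_upper_tail(mean_per_combination, 50)
print(COMBINATIONS, round(mean_per_combination, 2), expected_popular)
```

Under this model the expected number of such combinations is vanishingly small, so the thousands actually observed reflect how people choose their numbers rather than chance.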
SJS SDI_169 The UK Lottery. This is a 6/49 lottery. In the first 282 draws: –average jackpot £2 million –maximum £22.6 million. Draw 9, January 1995: –133 people bought the winning combination 7, 17, 23, 32, 38, 42 –£122,510 each. Source: John Haigh, Taking Chances, Oxford
SJS SDI_1610 Random Pattern? Random, from the point of view of the lottery machine but evidently not to the punter!
SJS SDI_1611 How to Choose a Random Sample. Start from a sampling frame of N population units, with each unit identified by a unique number from 1 to N. Generate a random number between 0 and 1 –using a computer, random number table or randomising device. Multiply by N and round up. Select the population member indicated. Repeat n times –for sampling without replacement, draw again if a number is chosen twice.
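The steps above can be sketched in a few lines. This is an illustration only, assuming Python's standard `random` module as the source of random numbers; `draw_sample` is a hypothetical name. One practical detail: a generator that can return exactly 0 would round up to unit 0, so the sketch guards against that case.

```python
import math
import random

def draw_sample(N, n, replace=False, rng=random):
    """Draw n unit numbers from 1..N by multiplying a uniform random number by N and rounding up."""
    if not replace and n > N:
        raise ValueError("cannot draw more than N units without replacement")
    chosen = []
    while len(chosen) < n:
        u = rng.random()                 # uniform on [0, 1)
        unit = max(1, math.ceil(u * N))  # round up; guard the rare u == 0.0 case
        if not replace and unit in chosen:
            continue                     # already selected: draw again
        chosen.append(unit)
    return sorted(chosen)

print(draw_sample(20, 10, replace=True))
print(draw_sample(20, 10, replace=False))
```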
SJS SDI_1612 The S-PLUS Approach
# To illustrate different approaches to sampling
N <- 20              # Size of population
n <- 10              # Size of sample
Identify <- c(1:N)   # Population identifiers
Identify
# Sample with replacement
sort(sample(Identify, size = n, replace = T))
# Sample without replacement
sort(sample(Identify, size = n, replace = F))
SJS SDI_1613 Finite Population Correction Factors. In practice we very rarely carry out sampling with replacement. Exception: the bootstrap, a re-sampling investigation of the properties of statistics. However, populations are often large compared to samples. Hence we can behave as if draws were independent, and the theory of sampling with replacement applies. In the next few slides we consider what happens when the population is not large. Let $y_1, y_2, \ldots, y_n$ be a simple random sample from a population of values $u_1, u_2, \ldots, u_N$.
SJS SDI_1614 This section is based closely on Scheaffer, Mendenhall and Ott
SJS SDI_1615 We can use this fact to find the variance of the sample mean $\bar{y}$.
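For reference, the standard finite-population argument found in survey sampling texts can be sketched as follows. The symbols $\mu$ and $\sigma^2$ (population mean and variance, with divisor N) are notation assumed here, and the derivation is the textbook one rather than a reproduction of the slides.

```latex
\begin{align}
  \mu &= \frac{1}{N}\sum_{i=1}^{N} u_i, &
  \sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (u_i - \mu)^2, \\
  \operatorname{Var}(y_i) &= \sigma^2, &
  \operatorname{Cov}(y_i, y_j) &= -\frac{\sigma^2}{N-1} \quad (i \neq j), \\
  \operatorname{Var}(\bar{y})
    &= \frac{1}{n^2}\left[ n\sigma^2 - n(n-1)\frac{\sigma^2}{N-1} \right]
     = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}.
\end{align}
```

The factor $(N-n)/(N-1)$ is the finite population correction: it equals 1 when n = 1, shrinks to 0 when n = N, and is close to 1 whenever N is large relative to n.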
SJS SDI_1617 Variance estimation
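A small numerical check can make the finite population correction concrete. The sketch below assumes the textbook results Var(ȳ) = (σ²/n)(N−n)/(N−1) and the estimator (s²/n)(N−n)/N, where σ² uses divisor N and s² uses divisor n−1; it enumerates every possible sample from a tiny made-up population, and `check_fpc` is an illustrative name.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

def check_fpc(population, n):
    """Compare the exact variance of the sample mean with the finite-population formula.

    Requires n >= 2 so that the sample variance s^2 is defined.
    """
    N = len(population)
    samples = list(combinations(population, n))   # every simple random sample without replacement
    sample_means = [mean(s) for s in samples]
    exact_var = pvariance(sample_means)            # variance over all equally likely samples
    formula_var = pvariance(population) / n * (N - n) / (N - 1)
    # Average of the estimator s^2/n * (N-n)/N over all possible samples
    avg_estimate = mean(variance(s) / n * (N - n) / N for s in samples)
    return exact_var, formula_var, avg_estimate

print(check_fpc([3, 7, 8, 12, 15, 20], 3))
```

All three returned values agree up to floating-point rounding, which illustrates that the estimator is unbiased for the true variance of the sample mean.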
SJS SDI_1618 When Can We Ignore FPCFs? When the population is large relative to the sample –N/n is large. When the sample does not form part of the population for which we are issuing the prediction –destructive sampling of manufacturing output. From now on we shall ignore FPCFs.
SJS SDI_1620 Error Bounds and Sample Size. It is traditional to use error bounds of two standard errors. This is a way of giving an impression of the precision of the survey. The desired error bound, $B = 2\sigma/\sqrt{n}$, can be used to fix the size of the sample: $n = 4\sigma^2/B^2$. This is the appropriate formula for the sample size given a desired bound on the error of the mean.
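As a worked illustration, the sample size for a two-standard-error bound can be computed directly. The sketch assumes the bound is written B = 2σ/√n with the finite population correction ignored, so that n = 4σ²/B²; the symbol B, the function name and the example values are assumptions made for illustration. In practice σ is unknown and has to be replaced by a prior guess or a pilot estimate.

```python
import math

def sample_size_for_mean(sigma, bound):
    """Smallest n with 2*sigma/sqrt(n) <= bound, ignoring the finite population correction."""
    if sigma <= 0 or bound <= 0:
        raise ValueError("sigma and bound must be positive")
    return math.ceil(4 * sigma ** 2 / bound ** 2)

# Example: standard deviation 2, desired bound 0.5 on the error of the mean
print(sample_size_for_mean(sigma=2.0, bound=0.5))  # → 64
```

Halving the desired bound quadruples the required sample size, since n is proportional to 1/B².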
SJS SDI_1621 Questions. How many people do you have to have in a room before the probability that at least two share the same birthday is at least 1/2? Suppose we want to estimate the total in a population rather than the mean: what is the error bound on the total? What is the error bound for a population proportion?