# Estimating from Samples © Christine Crisp “Teach A Level Maths” Statistics 2.


Suppose we want to know the average height of 17-year-olds. The heights of 17-year-olds form a population. But I’ve been vague about my population. Do I mean the 17-year-olds in my school or college? Or do I mean all the 17-year-olds in England? Or all those in the world? Am I putting the girls and boys together? Or do I want to consider 2 populations: boys and girls separately? In Statistics, a population is the property of interest (here, height) measured across the entire group that we are interested in. We always need to state clearly what our population is.

Even fairly small populations can be costly or impossible to investigate. If you wanted to know the mean height of the 17-year-olds in your school or college, would you have time to measure them all? What would you do about those who are absent when you measure? If you wanted to know the mean height of the 17-year-olds in England, it would be impossible to collect the data on all of them. For these reasons we take samples. From a sample we can make a prediction about the population. We are going to see how a quantity calculated from a sample, called a statistic, can be used to estimate a parameter of a population.

A population doesn’t have to consist of people. It might, for example, be the times taken to run the 100m in the last Olympics, or the lengths of the runner beans in my garden this year. Suppose we are interested in the size of eggs laid by a flock of hens. For example, we may already have an estimate of the mean weight before a change of diet and we want to know whether the new diet has increased the weight of the eggs. We can’t weigh all the eggs so we choose to take a simple random sample and weigh those. Let’s assume that our population consists of the weights of all the eggs laid in a week and that there are 1000 of them.

Suppose the sample consists of 5 eggs with the following masses in grams:

58·6, 61·0, 63·0, 64·0, 66·8

The mean, $\bar{x}_1$ = 62·68. (The subscript 1 indicates the 1st sample.)

Suppose on the second day we take another sample and this time the sample is

52·8, 57·1, 58·8, 59·9, 62·5

The mean, $\bar{x}_2$ = 58·22.

It’s very likely that the population mean is neither of these values, so we need to decide whether we are justified in using either of them and, if so, how accurate they will be.
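The two sample means can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names `sample_1`, `sample_2`, `x_bar_1` and `x_bar_2` are chosen here for illustration and are not part of the original slides.

```python
from statistics import mean

# Masses in grams of the eggs in the two samples quoted above
sample_1 = [58.6, 61.0, 63.0, 64.0, 66.8]
sample_2 = [52.8, 57.1, 58.8, 59.9, 62.5]

# A sample mean is the sum of the observations divided by the sample size
x_bar_1 = mean(sample_1)
x_bar_2 = mean(sample_2)

print(f"1st sample mean: {x_bar_1:.2f} g")
print(f"2nd sample mean: {x_bar_2:.2f} g")
```

The two results differ by several grams even though both samples come from the same flock, which is why we need some theory before trusting either value.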

Before we can make a decision about the sample means we need to look at some theory. To do this I’m going to pretend that I know the weights of all the eggs in the population. We can then compare the samples with the population. I’ll draw a frequency diagram of the weights of all the eggs in the population.

[Figure: frequency diagram of the population.] Now we’ll superimpose the 1st sample of 5 eggs.

[Figure: population and 1st sample.] The weights in the sample are indicated by the arrows.

[Figure: population and 1st sample, with the 1st sample mean marked.] Now for the 2nd sample.

[Figure: population and 2nd sample, with the 2nd sample mean marked.] Now we’ll take 10 samples, each of size 5. Instead of showing the individual weights, we’ll just show the means of the samples.

[Figure: population and 10 sample means.] Now for 100 samples.

[Figure: population and 100 sample means.] Finally, 1000 samples.

[Figure: population and 1000 sample means.] We have a distribution of the means of 1000 samples, each of size 5. We now want to see what happens if we increase the size of each sample.

[Figures: population and 1000 sample means, first with each sample of size 5 (n = 5), then with each sample of size 20 (n = 20).] With n = 20 the means are less spread out.

[Figures: population and 1000 sample means, for n = 5 and n = 20.]

We can notice 4 things about the distribution of the sample means:

- The distribution is approximately Normal.
- The spread is less than that of the population.
- The mean of the sample means is approximately the same as the population mean.
- As the sample size increases, the spread decreases.

N.B. The distribution is correctly called the “distribution of the sample means”.
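These observations can be reproduced with a small simulation. The slides do not give the actual egg weights, so the sketch below invents a population of 1000 weights drawn from a Normal distribution with mean 60 g and standard deviation 4 g; those two numbers, the seed, and the helper name `sample_means` are all assumptions made purely for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

# ASSUMPTION: a made-up population of 1000 egg weights in grams
population = [random.gauss(60, 4) for _ in range(1000)]

def sample_means(population, n, num_samples=1000):
    """Means of num_samples simple random samples, each of size n."""
    return [mean(random.sample(population, n)) for _ in range(num_samples)]

means_5 = sample_means(population, 5)
means_20 = sample_means(population, 20)

print("population:  mean", round(mean(population), 2),
      " spread", round(pstdev(population), 2))
print("n = 5 means: mean", round(mean(means_5), 2),
      " spread", round(pstdev(means_5), 2))
print("n = 20 means: mean", round(mean(means_20), 2),
      " spread", round(pstdev(means_20), 2))
```

Here "spread" is the standard deviation. The run should show the three means close together, with the spread shrinking from the population to the n = 5 means and again to the n = 20 means.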

It can be shown that if we could take all samples of a given size, n, that could be constructed (which in practice is not possible) then the mean of the sample means equals the population mean.

So, we are justified in using a sample mean to estimate the population mean even though on some occasions it will be a poor estimate. On average a sample mean will give a good estimate of the population mean. Poor estimates occur rarely, and even less often as the sample size increases.

[Figure: population and 1000 sample means, n = 5, with one outlying sample mean labelled as a poor estimate of $\mu$.]
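For a very small population it is possible to list every sample, so the claim can be checked directly. The sketch below uses a hypothetical population of just 6 weights (invented for illustration, not taken from the slides) and every possible sample of size 3, with exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import combinations

# ASSUMPTION: a tiny made-up population, small enough to list every sample
population = [52, 57, 60, 63, 68, 71]
n = 3

population_mean = Fraction(sum(population), len(population))

# Every possible sample of size n (without replacement) and its mean
all_sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
mean_of_sample_means = sum(all_sample_means) / len(all_sample_means)

print("number of samples:   ", len(all_sample_means))
print("population mean:     ", population_mean)
print("mean of sample means:", mean_of_sample_means)
```

Individual sample means in the list range well above and below the population mean, yet their average matches it exactly.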

In the example using hens’ eggs, the population was approximately Normal. We’ll now look at an example where the population is not Normal.

[Figures: a population that isn’t Normal, first with 100 sample means for n = 5, then with the sample size increased to 30 and the number of samples increased to 1000.] We notice the same things as before.
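The same experiment can be sketched for a skewed population. The slides do not say which non-Normal population was used, so the code below assumes an exponential one with mean 10; that choice, the seed, and the helper names `skewness` and `sample_means` are illustrative assumptions. Skewness is used as a rough measure of how far a distribution is from the symmetric Normal shape (a Normal distribution has skewness 0).

```python
import random
from statistics import mean, pstdev

random.seed(2)  # fixed seed so the run is repeatable

# ASSUMPTION: a made-up, strongly right-skewed population of 1000 values
population = [random.expovariate(1 / 10) for _ in range(1000)]

def skewness(values):
    """Average cubed standardised deviation; 0 for a symmetric distribution."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

def sample_means(population, n, num_samples=1000):
    """Means of num_samples simple random samples, each of size n."""
    return [mean(random.sample(population, n)) for _ in range(num_samples)]

means_5 = sample_means(population, 5)
means_30 = sample_means(population, 30)

for label, data in [("population", population),
                    ("n = 5 means", means_5),
                    ("n = 30 means", means_30)]:
    print(label, "mean", round(mean(data), 2),
          "spread", round(pstdev(data), 2),
          "skewness", round(skewness(data), 2))
```

The sample means should centre on the population mean, become less spread out as n grows, and look noticeably more symmetric at n = 30 than at n = 5.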

SUMMARY

Using a sample to make estimates:

- A statistic is a quantity calculated from a sample.
- A population has parameters, such as the mean, which can be estimated from a sample using a statistic.
- The mean of a sample can be used to estimate the mean of a population.
- The larger the sample size, the more likely it is that the estimate of the population mean will be accurate.
- The population need not have a Normal distribution, but in that case the sample size should be at least 30. (The less Normal the population is, the greater the sample size should be.)

Exercise

1. The data below give a random sample of the number of goals scored in 10 Premier League football matches taken from the first 3 weeks of the 2005/06 season.

0, 3, 1, 0, 6, 4, 1, 1, 3, 2

(a) Use the sample to estimate the mean number of goals likely to be scored throughout the season.

(b) Make at least 2 comments on the accuracy of your estimate.

Solution:

(a) $\bar{x} = \frac{21}{10}$ = 2·1. This is the estimate of $\mu$.

(b) As the population is unlikely to be Normal, the sample size is too small to give an accurate result. The weather in the first 3 weeks will not be typical of the whole season, so the results may be unrepresentative.
