S2 Chapter 6: Populations and Samples


1 S2 Chapter 6: Populations and Samples
Dr J Frost Last modified: 2nd November 2015

2 Populations and samples
A population is the full collection of people or things. A sample is some subset of the population intended to represent the population. Data obtained from all members of the population is known as a census.
Advantages of sampling: it is cheaper and quicker than taking a census, and it is useful when testing items results in their destruction (e.g. the lifetime of a light bulb).
Disadvantages of sampling: there is potential for bias, and there is natural variation between any two samples because the data themselves vary.

3 Sampling key terms
Each individual thing in the population that can be sampled is known as a sampling unit. The list of all those within the population that can be sampled is known as the sampling frame.

4 Random sampling
Suppose that the heights of people in a population are represented using a random variable $X$, where $X$ is (as you might expect) normally distributed, e.g. $X \sim N(1.5, 0.3)$.
[Figure: the density curve $f(x)$ plotted against height $x$.]
Bro Helping Hand: This might seem conceptually confusing, as a population is a list of things. The population can be represented as a distribution whose outcomes are the possible items we could sample. For example, if the population is all possible lottery tickets, then the distribution representing it is a uniform distribution whose outcomes are all the possible tickets.
We want a sample with $n$ things in it. How could we represent the possible choice of the 1st member of our sample?
A random variable $X_1$, where $X_1 \sim N(1.5, 0.3)$.
Bro Helping Hand: Notice we're representing the possible choice of the item in the sample, not the item itself. $X_1$ must have the same distribution as $X$, because our sample item is drawn from the population.
How could we represent the possible choice of the $n$th member of our sample?
A random variable $X_n$, where $X_n \sim N(1.5, 0.3)$.
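To make the idea of "$X_1, \dots, X_n$ each with the population's distribution" concrete, here is a minimal Python sketch. It assumes, following the usual Edexcel convention $N(\mu, \sigma^2)$, that 0.3 is the variance, so the standard deviation passed to Python's `random.gauss` is $\sqrt{0.3}$. The function name `draw_sample` is purely illustrative.

```python
import math
import random

# Population model from the slide: X ~ N(1.5, 0.3).
# Assumption: 0.3 is the VARIANCE (Edexcel writes N(mu, sigma^2)),
# so the standard deviation is sqrt(0.3).
MU = 1.5
SIGMA = math.sqrt(0.3)

def draw_sample(n, seed=None):
    """Return one realisation (x_1, ..., x_n) of X_1, ..., X_n.

    Each X_i is generated independently from the same N(MU, SIGMA^2)
    model as X, i.e. every sample member has the population's distribution.
    """
    rng = random.Random(seed)
    return [rng.gauss(MU, SIGMA) for _ in range(n)]

sample = draw_sample(5, seed=1)
print([round(x, 2) for x in sample])
```

Re-running with a different seed gives a different sample, which is exactly the "natural variation between samples" mentioned earlier.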

5 Random sampling
A simple random sample of size $n$ is one taken so that every possible sample of size $n$ has an equal chance of being selected. It consists of the observations $X_1, X_2, \dots, X_n$ from a population, where the $X_i$ are independent random variables and each has the same distribution as the population.
This means, for example, that if the first person chosen for our sample is Indian, that doesn't make it any more or less likely that our second choice will be Indian, i.e. our second choice is independent of the first. This will all become a lot clearer once we do an example…
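The "every possible sample of size $n$ has an equal chance" part of the definition can be checked exactly for a small finite population. This Python sketch assumes units are drawn one at a time without replacement, each remaining unit equally likely; the five labels are hypothetical. It adds up the probabilities of all orderings that give the same unordered sample and confirms each sample has probability $1/\binom{N}{n}$.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def subset_probabilities(population, n):
    """Exact probability of each unordered sample when n units are drawn one
    at a time, without replacement, each remaining unit equally likely."""
    N = len(population)
    probs = {}
    for ordered in permutations(population, n):
        # Probability of this ordered sequence of draws: 1/N * 1/(N-1) * ...
        p = Fraction(1)
        for k in range(n):
            p *= Fraction(1, N - k)
        key = frozenset(ordered)  # ignore order: same sample of units
        probs[key] = probs.get(key, Fraction(0)) + p
    return probs

families = ["A", "B", "C", "D", "E"]  # hypothetical sampling units
N, n = len(families), 2
probs = subset_probabilities(families, n)
print(len(probs), comb(N, n))  # 10 10
print(set(probs.values()))     # {Fraction(1, 10)}
```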

6 Random sampling
We might wish to calculate some numerical property of a population or a sample, e.g. mean, variance, mode, range.
A population parameter is a quantity calculated from the population.
A statistic is a quantity calculated (solely) from the observations in a sample. For example:
$\frac{X_2 + X_5 + X_8}{3}$ is a statistic (the average of the 2nd, 5th and 8th items in the sample).
$\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2$ is a statistic.
But $\left(\frac{\sum X}{n} - \mu\right)^2$ is not, as it involves the population mean $\mu$, which is not known purely from the sample.
The idea of a statistic is that we hope it resembles the equivalent population parameter. For example, if we're trying to find the mean age in England, we might take a sample, calculate the sample mean age $\bar{X}$, and hope this represents the 'true' unknown population mean age $\mu$. (Recall that sample mean $= \bar{X}$ and population mean $= \mu$.)
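One way to see the distinction is that a statistic can be written as a function whose only input is the sample. In this Python sketch (function names and the sample values are hypothetical), the first two functions need nothing but the data; a quantity such as $\left(\frac{\sum X}{n} - \mu\right)^2$ needs $\mu$ as an extra input, which is why it is not a statistic.

```python
def avg_258(xs):
    """(X_2 + X_5 + X_8) / 3: uses only sample values, so it is a statistic.
    Python lists are 0-indexed, so X_2, X_5, X_8 are xs[1], xs[4], xs[7]."""
    return (xs[1] + xs[4] + xs[7]) / 3

def mean_sq_minus_sq_mean(xs):
    """sum(X^2)/n - (sum(X)/n)^2: also computed from the sample alone."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

def squared_error_of_mean(xs, mu):
    """(sum(X)/n - mu)^2 needs the population mean mu as an extra input,
    so it is NOT a statistic: mu is not known from the sample."""
    n = len(xs)
    return (sum(xs) / n - mu) ** 2

sample = [2, 4, 3, 5, 1, 6, 2, 3]  # hypothetical sample of size 8
print(avg_258(sample))               # (4 + 1 + 3) / 3
print(mean_sq_minus_sq_mean(sample)) # 2.4375
```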

7 Sampling Distribution of a Statistic
The sampling distribution of a statistic gives all the values of the statistic and the probability that each would happen by chance alone.
Suppose we had 10 families which form the population of an island (the Isle of Bob), for which we know the number of children in each family. [Figure: the families of the Isle of Bob.] If $X$ is the number of children in a randomly chosen family, the working below uses $P(X=0) = 0.5$, $P(X=1) = 0.4$ and $P(X=2) = 0.1$.
Suppose we took a (very small!) sample of 2 families. Statistics for this sample could be the mode number of children, the median, the maximum, the mean, …
Possible samples $(X_1, X_2)$, with their probabilities and two statistics:
(0, 0): probability 0.5 × 0.5 = 0.25, median 0, max 0
(0, 1): probability 0.5 × 0.4 = 0.2, median 0.5, max 1
(0, 2): probability 0.5 × 0.1 = 0.05, median 1, max 2
(1, 0): probability 0.4 × 0.5 = 0.2, median 0.5, max 1
(1, 1): probability 0.4 × 0.4 = 0.16, median 1, max 1
(1, 2): probability 0.4 × 0.1 = 0.04, median 1.5, max 2
(2, 0): probability 0.1 × 0.5 = 0.05, median 1, max 2
(2, 1): probability 0.1 × 0.4 = 0.04, median 1.5, max 2
(2, 2): probability 0.1 × 0.1 = 0.01, median 2, max 2
Sampling distribution for the sample maximum $M$:
$m$: 0, 1, 2
$P(M = m)$: 0.25, 0.56, 0.19
Let's reflect on what we did.
#1: We considered all possible samples, and the probability of each sample occurring.
#2: We're interested in some statistic for each sample (let's say the sample maximum).
#3: Thus we now have a distribution over the possible values of the statistic across all possible samples we could have had, i.e. the 'sampling distribution'.
Note: Because each thing in the sample is independently drawn from the population, we technically have sampling with replacement, and hence the same item could be in the sample twice. In practice, however (and in exams), you won't have to worry about this, as the population in exams is assumed to be infinitely large.
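The procedure for the Isle of Bob (list every possible sample with its probability, then group the samples by the value of the statistic) can be sketched in Python with exact fractions. This assumes the probabilities 0.5, 0.4 and 0.1 for 0, 1 and 2 children, and independent draws; `sampling_distribution` is a hypothetical helper name, and any function of the sample can be passed as the statistic.

```python
from fractions import Fraction
from itertools import product
from statistics import median

# Number of children in a randomly chosen family (assumed: 0.5, 0.4, 0.1).
POPULATION = {0: Fraction(1, 2), 1: Fraction(2, 5), 2: Fraction(1, 10)}

def sampling_distribution(pmf, n, statistic):
    """List every ordered sample (x_1, ..., x_n) with its probability
    (independent draws, so probabilities multiply), then add up the
    probabilities of samples giving the same value of the statistic."""
    dist = {}
    for sample in product(pmf, repeat=n):
        p = Fraction(1)
        for x in sample:
            p *= pmf[x]
        value = statistic(sample)
        dist[value] = dist.get(value, Fraction(0)) + p
    return dict(sorted(dist.items()))

print(sampling_distribution(POPULATION, 2, max))
# {0: Fraction(1, 4), 1: Fraction(14, 25), 2: Fraction(19, 100)}  i.e. 0.25, 0.56, 0.19
print(sampling_distribution(POPULATION, 2, median))
```

Using `Fraction` keeps the arithmetic exact, so the probabilities can be compared directly with the hand-worked table.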

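8 Exam Example
Edexcel S2 May 2013 Q1
In the working below, a coin drawn at random is a 1p, 2p or 5p coin with probabilities 0.5, 0.2 and 0.3 respectively, a random sample of 3 coins is taken, and $M$ is the median value of the sample.
Key Points:
a) Ensure you don't forget possibilities arising from other possible orderings, etc.
b) If we know all the possible values of the statistic and the probabilities of all but one of them, we can find the probability of the last by just subtracting from 1 (as it's a probability distribution!).
(a) As per the tip, ordering matters! The samples with median 5p are:
(1p, 5p, 5p), (5p, 1p, 5p), (5p, 5p, 1p)
(2p, 5p, 5p), (5p, 2p, 5p), (5p, 5p, 2p)
(5p, 5p, 5p)
(b) For the first three possibilities, the probability is $3 \times 0.5 \times 0.3^2 = 0.135$.
For the next three: $3 \times 0.2 \times 0.3^2 = 0.054$.
Last: $0.3^3 = 0.027$.
So $P(M = 5) = 0.135 + 0.054 + 0.027 = 0.216$.
(c) The possible values of the statistic $M$ (the median) are 1p, 2p and 5p.
$P(M = 1) = 3 \times 0.5^2 \times 0.5 + 0.5^3 = 0.5$
$P(M = 2) = 1 - 0.5 - 0.216 = 0.284$ (since this is the only other possibility)
$m$: 1, 2, 5
$P(M = m)$: 0.5, 0.284, 0.216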
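As a check on this working, the following Python sketch enumerates all $3^3 = 27$ ordered samples, which also guards against the "forgotten orderings" trap from the key points. It assumes the coin probabilities 0.5, 0.2 and 0.3 used above and independent draws; the median of three values is taken as the middle element after sorting.

```python
from fractions import Fraction
from itertools import product

# Coin value (pence) -> probability, as used in the working above.
COINS = {1: Fraction(1, 2), 2: Fraction(1, 5), 5: Fraction(3, 10)}

dist = {}
for sample in product(COINS, repeat=3):  # all 27 ordered samples
    p = COINS[sample[0]] * COINS[sample[1]] * COINS[sample[2]]
    m = sorted(sample)[1]                 # median of 3 values
    dist[m] = dist.get(m, Fraction(0)) + p

for m in sorted(dist):
    print(m, float(dist[m]))
# 1 0.5
# 2 0.284
# 5 0.216
```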

9 Test Your Understanding
Edexcel S2 June 2007 Q4
Step 1: List the possible samples (and the statistic for each, if possible).
Step 2: Use this to work out the probability of obtaining each value of the statistic.
Answer: a table of $P(M = m)$ for $m = 5$ and $m = 10$.

10 Sampling Distribution by Inspection
Sometimes it is not practical to list all the possible samples, but we can tell what the sampling distribution is by thinking about what the statistic represents.
Q: A school wishes to introduce a school uniform and is seeking to find out the support this idea has among the students at the school. The random variable $X$ is defined as: $X = 1$ if a student would support the idea, $X = 0$ otherwise.
(a) Suggest a suitable population and the parameter of interest.
A random sample of 15 students is asked if they would support the idea. The random sample is represented by $X_1, X_2, \dots, X_{15}$.
(b) Write down the sampling distribution of the statistic $Y = \sum_{i=1}^{15} X_i$.
(a) The population is the responses of the students, represented as 0s and 1s (note: it is not the students themselves!). The population parameter of interest (based on the original question posed by the school) is the proportion $p$ of students who support the idea.
(b) Think about what $Y$ actually represents: $Y$ = the number of students in the sample who support the idea. Since the sample is random, the observations are independent, each with a constant probability $p$ of "success". These are the conditions for a binomial distribution, so $Y \sim B(15, p)$.
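The "by inspection" argument can be checked numerically: building the distribution of $Y$ one independent 0/1 observation at a time (a convolution) reproduces the $B(15, p)$ probabilities exactly. In this Python sketch the value $p = 0.3$ is a hypothetical stand-in for the unknown proportion, and both helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def pmf_of_sum_of_bernoullis(n, p):
    """Distribution of Y = X_1 + ... + X_n for independent X_i with
    P(X_i = 1) = p, built one observation at a time (convolution)."""
    dist = {0: Fraction(1)}  # the sum of zero observations is 0
    for _ in range(n):
        new = {}
        for y, prob in dist.items():
            new[y] = new.get(y, Fraction(0)) + prob * (1 - p)   # X_i = 0
            new[y + 1] = new.get(y + 1, Fraction(0)) + prob * p  # X_i = 1
        dist = new
    return dist

def binomial_pmf(n, p, y):
    """P(Y = y) for Y ~ B(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

p = Fraction(3, 10)  # hypothetical value of the unknown proportion p
dist = pmf_of_sum_of_bernoullis(15, p)
print(all(dist[y] == binomial_pmf(15, p, y) for y in range(16)))  # True
```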

11 More Wordy Questions
Bob has a cupcake factory. He is trying to establish the proportion of cupcakes that are poisonous; he assumes 15%. He has ID numbers for all the cupcakes. He takes a sample of 20 cupcakes.
Identify the sampling frame.
The list of ID numbers of all the cupcakes.
Identify the sampling distribution of the number of poisonous cupcakes in the sample.
If $C$ is the number of poisonous cupcakes in the sample, then $C \sim B(20, 0.15)$.
Bro Note: The mark schemes like the idea of a 'list', and the idea that things in the sampling frame can be clearly identified.
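A sampling frame is naturally a list, so a simple random sample can be drawn directly from it. This Python sketch assumes a hypothetical factory with cupcake IDs 1 to 1000 and uses `random.Random.sample`, which picks distinct IDs with every subset of size 20 equally likely. It then evaluates the $B(20, 0.15)$ model for $C$; treating the draws as independent relies on the population being large, as assumed in exams.

```python
import random
from math import comb

# Hypothetical sampling frame: cupcake ID numbers 1..1000.
sampling_frame = list(range(1, 1001))

rng = random.Random(42)
sample_ids = rng.sample(sampling_frame, 20)  # simple random sample, no repeats

def p_c(c, n=20, p=0.15):
    """P(C = c) for C ~ B(20, 0.15), the number of poisonous cupcakes."""
    return comb(n, c) * p**c * (1 - p) ** (n - c)

print(sorted(sample_ids))
print(round(p_c(0), 4), round(sum(c * p_c(c) for c in range(21)), 6))
```

The second printed value is the mean of $C$, which should agree with $np = 20 \times 0.15 = 3$.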

12 Exercise 6B
Q1. A forester wants to estimate the height of the trees in a forest. He measures the heights of 50 randomly selected trees and works out the mean height. Is this a statistic?
Answer: Yes, as it is based only on the sample.
Q5. A bag contains a large number of coins: 50% are 50p coins, 25% are 20p coins and 25% are 10p coins.
(a) Find the mean $\mu$ and variance $\sigma^2$ for the value of this population of coins.
Answer: $\mu = 32.5$, $\sigma^2 = 318.75$.
(b) A random sample of 2 coins is chosen from the bag. List all the possible samples that can be chosen.
Answer: (50, 50), (50, 20), (20, 50), (50, 10), (10, 50), (20, 20), (20, 10), (10, 20), (10, 10).
(c) Find the sampling distribution for the mean $\bar{X} = \frac{X_1 + X_2}{2}$.
Answer:
$m$: 50, 35, 30, 20, 15, 10
$P(\bar{X} = m)$: 0.25, 0.25, 0.25, 0.0625, 0.125, 0.0625
Q7. A supermarket sells a large number of 3-litre and 2-litre cartons of milk. They are sold in the ratio 3:2. Find the mean and variance of the milk content of this population of cartons. A random sample of 3 cartons is taken from the shelves ($X_1, X_2, X_3$). List all of the possible samples. Find the sampling distribution of the mean $\bar{X}$. Find the sampling distribution of the mode $M$. Find the sampling distribution of the median $N$ of these samples.
Continue on to Exercise 6C if you're done.
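The Q5 answers can be reproduced with a short Python sketch: exact fractions for the population mean and variance (using $\sigma^2 = E(X^2) - \mu^2$), then enumeration of the ordered samples of 2 for the sampling distribution of $\bar{X}$. Independent draws are assumed, as the bag contains a large number of coins.

```python
from fractions import Fraction
from itertools import product

# Q5 population: coin value (pence) -> proportion of coins in the bag
COINS = {50: Fraction(1, 2), 20: Fraction(1, 4), 10: Fraction(1, 4)}

mu = sum(v * p for v, p in COINS.items())
var = sum(v * v * p for v, p in COINS.items()) - mu**2  # E(X^2) - mu^2
print(float(mu), float(var))  # 32.5 318.75

# Sampling distribution of the mean of 2 coins (ordered samples, independent draws)
dist = {}
for x1, x2 in product(COINS, repeat=2):
    xbar = Fraction(x1 + x2, 2)
    dist[xbar] = dist.get(xbar, Fraction(0)) + COINS[x1] * COINS[x2]

for m in sorted(dist, reverse=True):
    print(float(m), float(dist[m]))
# 50.0 0.25
# 35.0 0.25
# 30.0 0.25
# 20.0 0.0625
# 15.0 0.125
# 10.0 0.0625
```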

