Sampling Distribution Models

Week 9 Chapter 15. Sampling Distribution Models

Probability of the Possible Outcome
Suppose there are two candidates in an electoral campaign. Let Y denote the outcome of whether a voter selects candidate #1: y = 0 (no) or y = 1 (yes). Suppose each candidate has the same probability of being selected (0.50). Let P(y) denote the probability of outcome y.

| Random Outcome | Possible Outcome | Probability of the Possible Outcome |
|---|---|---|
| Y | 0 | ½ = 0.5 |
| Y | 1 | ½ = 0.5 |

Let $\mu$ denote the population mean of Y: $\mu = (0 \times \tfrac{1}{2}) + (1 \times \tfrac{1}{2}) = \tfrac{1}{2} = 0.50$. This means that in the long run, on average, 0.50 of the population would vote for candidate #1.

Probability of the Possible Outcome
The population mean, which here is the population proportion of votes for candidate #1, is: $\mu = (0 \times \tfrac{1}{2}) + (1 \times \tfrac{1}{2}) = \tfrac{1}{2} = 0.50$. The population variance is the probability-weighted sum of squared deviations from the mean (each squared deviation between an outcome and the mean is multiplied by the probability of that outcome): $\sigma^2 = (0 - 0.5)^2 (0.5) + (1 - 0.5)^2 (0.5) = 0.25$. The population standard deviation is the positive square root of the variance: $\sigma = \sqrt{0.25} = 0.50$.
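The arithmetic above can be checked with a short Python sketch. The dictionary `dist` is just one illustrative way to store the two outcomes and their probabilities; it is not part of the original slides.

```python
import math

# Probability distribution of Y: outcome -> probability (values from the slide)
dist = {0: 0.5, 1: 0.5}

# Population mean: sum of outcome * probability
mu = sum(y * p for y, p in dist.items())

# Population variance: probability-weighted squared deviations from the mean
var = sum((y - mu) ** 2 * p for y, p in dist.items())

# Population standard deviation: positive square root of the variance
sigma = math.sqrt(var)

print(mu, var, sigma)  # 0.5 0.25 0.5
```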

Number of times that 1 occurs in the possible sample
For three randomly selected eligible voters, the sampling distribution of the sample proportion when the population proportion is 0.50 is as follows:

| Possible Sample | Number of times that 1 occurs in the possible sample | Sample Proportion | Probability of each Possible Sample |
|---|---|---|---|
| (1, 1, 1) | 3 | 3/3 = 1.000 | 0.5 x 0.5 x 0.5 = 0.125 |
| (1, 1, 0) | 2 | 2/3 = 0.667 | 0.5 x 0.5 x 0.5 = 0.125 |
| (1, 0, 1) | 2 | 2/3 = 0.667 | 0.5 x 0.5 x 0.5 = 0.125 |
| (0, 1, 1) | 2 | 2/3 = 0.667 | 0.5 x 0.5 x 0.5 = 0.125 |
| (1, 0, 0) | 1 | 1/3 = 0.333 | 0.5 x 0.5 x 0.5 = 0.125 |
| (0, 1, 0) | 1 | 1/3 = 0.333 | 0.5 x 0.5 x 0.5 = 0.125 |
| (0, 0, 1) | 1 | 1/3 = 0.333 | 0.5 x 0.5 x 0.5 = 0.125 |
| (0, 0, 0) | 0 | 0/3 = 0.000 | 0.5 x 0.5 x 0.5 = 0.125 |

Note: we multiply the probabilities because the outcomes are independent. Whether one eligible voter votes for candidate #1 does not change the probability for another eligible voter.

This table is the organized version of the previous table.
For three randomly selected eligible voters, the sampling distribution of the sample proportion when the population proportion is 0.50, grouped by sample proportion:

| Sample Proportion | Probability (number of possible samples x probability of each possible sample) |
|---|---|
| 0.000 | 1 x 0.125 = 0.125 |
| 0.333 | 3 x 0.125 = 0.375 |
| 0.667 | 3 x 0.125 = 0.375 |
| 1.000 | 1 x 0.125 = 0.125 |
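As a cross-check, the sampling distribution for three voters can be built by enumerating every possible sample. This is a minimal sketch: the names `p`, `n`, and `sampling_dist` are illustrative, and exact fractions are used only to avoid rounding noise.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)   # population proportion voting for candidate #1
n = 3                # sample size (three eligible voters)

sampling_dist = Counter()
for sample in product([0, 1], repeat=n):        # all 2**3 = 8 possible samples
    ones = sum(sample)                          # number of times 1 occurs
    prob = p ** ones * (1 - p) ** (n - ones)    # independence: multiply probabilities
    sampling_dist[Fraction(ones, n)] += prob    # group by sample proportion

for prop in sorted(sampling_dist):
    print(f"{float(prop):.3f}  {float(sampling_dist[prop]):.3f}")
# 0.000  0.125
# 0.333  0.375
# 0.667  0.375
# 1.000  0.125
```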

Let’s explore this concept with a simulation in StatCrunch.
The idea is the same as tossing a fair coin (with equal probability, 0.50, of obtaining a head or a tail). Let’s toss a fixed number of fair coins many times.

Suppose we toss 1 fair coin 1000 times.
This is the same as taking a random sample of 1 eligible voter, repeated 1000 times. (Histogram of the 1000 outcomes.)

Toss 2 fair coins 1000 times: this is the same as taking a random sample of 2 eligible voters, repeated 1000 times.

I saved the data for each experiment in a worksheet:
Toss 1 fair coin (n* = 1) 1000 times
Toss 2 fair coins (n* = 2) 1000 times
Toss 5 fair coins (n* = 5) 1000 times
Toss 20 fair coins (n* = 20) 1000 times
Toss 50 fair coins (n* = 50) 1000 times
Toss 100 fair coins (n* = 100) 1000 times
Toss 1000 fair coins (n* = 1000) 1000 times
Worksheet columns: n* = 1, n* = 2, n* = 5, n* = 20, n* = 50, n* = 100, n* = 1000

Group dot plots: What do you see?
What happens as we increase the size of our sample (n*) while the experiment is repeated a fixed number of times (n = 1000)? Dot plot panels: n* = 1000, 100, 50, 20, 5, 2, 1.

Group dot plots. What do you see?
Note that the spread gets narrower as the sample size (n*) increases.
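The slides run this experiment in StatCrunch. A rough Python equivalent, given here only as a sketch, tosses n* fair coins 1000 times for each n* and records the mean and standard deviation of the resulting sample proportions. The function name `simulate_proportions` and the seed are illustrative choices, and the exact numbers will differ from the StatCrunch output.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, used only so the run is repeatable

def simulate_proportions(n_star, repetitions=1000, p=0.5):
    """Return `repetitions` sample proportions, each from n_star coin tosses."""
    return [
        sum(random.random() < p for _ in range(n_star)) / n_star
        for _ in range(repetitions)
    ]

results = {}
for n_star in [1, 2, 5, 20, 50, 100, 1000]:
    props = simulate_proportions(n_star)
    results[n_star] = (statistics.mean(props), statistics.stdev(props))
    mean, sd = results[n_star]
    print(f"n* = {n_star:4d}  mean = {mean:.3f}  sd = {sd:.3f}")
```

The printed standard deviations should shrink as n* grows, which is the narrowing seen in the dot plots.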

What do you see in the descriptive statistics table?
What do you see in the descriptive statistics table (e.g., the sample mean (sample proportion) for each sample size)? Note that the sample mean varies from sample to sample, and as the sample size (n*) increases the sample proportion gets closer to the actual population proportion (0.5).

Sampling Distribution of Sample Mean, $\bar{x}$
The sample mean is a variable because it varies from sample to sample. For random samples, the sample mean fluctuates around the population mean $\mu$. The standard deviation of the sample mean $\bar{x}$ is called the standard error of the sample mean and is denoted by $\sigma_{\bar{x}}$. In practice we do not do repeated sampling; we use the theory behind the idea of repeated sampling. Hence, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ is a fraction of the spread of the population: individual observations tend to vary much more than sample means vary from sample to sample. As the sample size increases, the standard error decreases, and the sampling distribution gets narrower (what we saw in our previous experiment with tossing a fair coin many times) and concentrates around the actual population parameter (e.g., mean, proportion).
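To see how quickly the standard error shrinks, the formula can be evaluated for the coin-toss population (standard deviation 0.5) at the sample sizes used in the earlier experiment. This is a small illustrative sketch; `standard_error` is a helper name introduced here, not something from the slides.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 0.5  # population standard deviation for the fair-coin (0/1) outcome
for n in [1, 2, 5, 20, 50, 100, 1000]:
    print(f"n = {n:4d}  standard error = {standard_error(sigma, n):.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.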

Central Limit Theorem (CLT)
Regardless of the shape of the population distribution, for a large random sample of size n, the sampling distribution of $\bar{x}$ is approximately normal: $\bar{x} \sim N(\mu, \sigma_{\bar{x}} = \sigma/\sqrt{n})$. We can apply the empirical rule: the sample mean almost certainly (with probability close to 1) falls within 3 standard errors of the population mean.
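One way to see the theorem at work is to draw repeated samples from a clearly skewed population and check how often the sample mean lands within 3 standard errors of the population mean. The sketch below assumes an exponential population with mean 1 and standard deviation 1; that population, the sample size of 100, and the 2000 repetitions are all illustrative choices, not taken from the slides.

```python
import math
import random

random.seed(2)  # arbitrary seed, used only so the run is repeatable

mu, sigma = 1.0, 1.0          # exponential(rate=1): mean 1, sd 1, right skewed
n, repetitions = 100, 2000    # sample size and number of repeated samples
se = sigma / math.sqrt(n)     # standard error of the sample mean

sample_means = []
for _ in range(repetitions):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Fraction of sample means within 3 standard errors of the population mean
within_3se = sum(abs(m - mu) <= 3 * se for m in sample_means) / repetitions
mean_of_means = sum(sample_means) / repetitions

print(f"mean of sample means = {mean_of_means:.3f}")
print(f"fraction within 3 SE = {within_3se:.3f}")
```

Even though the individual observations are strongly right skewed, nearly all of the sample means should fall inside the 3 standard error band.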

Example: The distribution of household electricity usage is right skewed with mean 673 kWh and standard deviation 556 kWh. Suppose a researcher takes a random sample of 900 households. For the sampling distribution of his sample mean,
a. specify its mean and its standard deviation (standard error). The sample mean $\bar{x}$ has mean $\mu = 673$ and standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 556/\sqrt{900} = 18.53$.
b. specify its shape: Approximately normal.
c. specify the theorem that you used to answer part a: Central Limit Theorem.
d. sketch the sampling distribution of the sample mean for n = 900.

Example: The distribution of household electricity usage is right skewed with mean 673 kWh and standard deviation 556 kWh. Suppose a researcher takes a random sample of 900 households. What is the probability that his sample mean is more than 720?
By the CLT: $\bar{x} \sim N(\mu = 673, \sigma_{\bar{x}} = \sigma/\sqrt{n} = 556/\sqrt{900} = 18.53)$.
Formula: $Z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, so $Z = \dfrac{720 - 673}{556/\sqrt{900}} = \dfrac{47}{18.53} = 2.54$.
The area above Z = 2.54, based on our table, is 1 – (area below Z = 2.54). So, 1 – 0.9945 = 0.0055. Or you can think of it as follows: the area above Z = 2.54 is equivalent to the area below Z = –2.54 in the Z-table, which is 0.0055. Thus, it is very unlikely that his sample mean would be more than 720.
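The same calculation can be reproduced without a Z-table using `statistics.NormalDist` from the Python standard library. This is a sketch under the CLT approximation used above; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 673, 556, 900      # values from the example
se = sigma / math.sqrt(n)         # standard error: 556 / 30
z = (720 - mu) / se               # z-score of a sample mean of 720

# Upper-tail area of the standard normal distribution above z
p_above = 1 - NormalDist().cdf(z)

print(f"SE = {se:.2f}, z = {z:.2f}, P(sample mean > 720) = {p_above:.4f}")
```

Because the code uses the unrounded z rather than the table entry for 2.54, the probability may differ from the table-based answer in the last decimal place.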

Watch the following video about the idea of the Central Limit Theorem.
