Distribution of the sample mean and the central limit theorem


1 Distribution of the sample mean and the central limit theorem

2 Means of different random variables
The mean, X̄, of 2 rolls of a die takes on various values: it is a random variable. The mean waiting time between arrivals of customers at a restaurant during a certain morning, X̄, may take various values on different days: it is a random variable. The mean number of boys per family, X̄, for a certain 6 families is a random variable. In this lecture we will examine how the mean of a sample, X̄, behaves: its probability distribution, its mean and its variance.

3 Example The mean of all possible rolls of a die is μ=3.5, and the standard deviation is σ=1.7. The shape of the probability distribution is:
x     1    2    3    4    5    6
P(x)  1/6  1/6  1/6  1/6  1/6  1/6

4 Roll a die twice
The mean outcome, X̄, of two rolls of a die is a random variable. Sometimes this mean will be less than 3.5, sometimes higher. The sampling distribution of the mean will be centered around 3.5. It can be any number between ___ and ___

5 Example Suppose a teacher asked each student in a class of 50 students to roll a die twice and to compute the average of the two rolls. Today, instead of actually rolling a die twice we will use Minitab to generate possible results of the outcomes of the roll of two dice for 50 “hypothetical students”.

6 Histogram of the mean of 2 rolls of a die by 50 simulated students:
SD=1.1285

7 Suppose the 50 students had to roll the die 10 times:
The mean outcome, X̄, of ten rolls of a die is a random variable. Sometimes this mean will be less than 3.5, sometimes higher. The sampling distribution of the mean will be centered around 3.5. Compared to the two rolls, we expect the range of results to be (i) more concentrated around the mean (ii) less concentrated around the mean

8 We, again, use Minitab to help us simulate the results of 10 rolls of a die for 50 hypothetical students:

9 Histogram of the mean of 10 rolls of a die by 50 simulated students:
SD=

10 Now, suppose the 50 students had to roll the die 100 times:
The mean outcome, X̄, of 100 rolls of a die is a random variable. Sometimes this mean will be less than 3.5, sometimes higher. The sampling distribution of the mean will be centered around 3.5. Compared to the ten rolls, we expect the range of results to be (i) more concentrated around the mean (ii) less concentrated around the mean

11 Histogram of the mean of 100 rolls of a die by 50 simulated students:
SD=

12 As the sample size, n, increases, we expect the distribution of the sample mean to be more concentrated around the mean. That is, as the sample size, n, increases, the distribution of the mean has (i) more spread (ii) less spread. In fact, it can be shown that the standard deviation of X̄ is σ/√n, where σ is the standard deviation of the population. Thus, increasing from 2 to 10 rolls of a die means that the spread of X̄ decreases from 1.7/√2 ≈ 1.20 to 1.7/√10 ≈ 0.54. Further increasing to 100 rolls, the spread of X̄ decreases to 1.7/√100 = 0.17.

13 Decrease in standard deviation of X̄ as the sample size n increases:
Mean roll SD of the mean roll
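
The simulation described in the slides above used Minitab. As a rough stand-in, the sketch below uses Python's standard library to simulate 50 hypothetical students for n = 2, 10 and 100 rolls and compares the observed SD of the mean with σ/√n. The function name `simulate_mean_rolls` and the seed are illustrative assumptions, not part of the original lecture, and the simulated SDs will not match the Minitab output exactly.

```python
import random
import statistics

def simulate_mean_rolls(n_rolls, n_students=50, seed=0):
    """Return one mean-of-n_rolls value per simulated student (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.randint(1, 6) for _ in range(n_rolls)])
        for _ in range(n_students)
    ]

# Population SD of a single fair die roll (about 1.71; the slides round to 1.7).
SIGMA = statistics.pstdev(range(1, 7))

for n in (2, 10, 100):
    means = simulate_mean_rolls(n)
    observed_sd = statistics.stdev(means)   # SD of the 50 simulated means
    theoretical_sd = SIGMA / n ** 0.5       # sigma / sqrt(n)
    print(n, round(observed_sd, 3), round(theoretical_sd, 3))
```

With only 50 students per run the observed SD fluctuates around σ/√n; increasing `n_students` brings the two columns closer together.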

14 The central limit theorem
Take a large (30 or more) random sample of size n from any population with mean μ and standard deviation σ. The sample mean, X̄, is approximately normal with mean μ and standard deviation σ/√n: X̄ is approximately N(μ, σ/√n). Note: if the original population is exactly normal, then X̄ is exactly normal for any sample size n.

15 Example A population has standard deviation σ=10. What is the standard deviation of X̄ for a random sample of size: (a) 25: 10/√25=2 (b) 100: 10/√100=1 (c) 400: 10/√400=0.5

16 Example A normal population has μ=20 and σ=5. For a random sample of size n=6, determine the (a) Mean of X̄: 20 (b) Standard deviation of X̄: 5/√6=2.04 (c) Distribution of X̄: normal

17 Example A random sample of size 30 is taken from a population having a mean of 20 and a standard deviation of 5. The shape of the distribution is unknown. What can you say about the sampling distribution of the sample mean X̄? It is approximately N(μ=20, σ=5/√30), i.e. approximately N(μ=20, σ=0.91). Find the probability that X̄ will exceed

18 Example In a certain population, Math SAT scores are approximately N(500,100). 1. Pick 1 student at random. What is the probability that her score X is between 490 and 510?

19 Example - continued 2. Pick 25 students at random. What is the probability that their sample mean score X̄ is between 490 and 510? X has mean 500 and standard deviation 100, therefore X̄ is approximately N(500, 100/√25 = 20).

20 Example - continued 3. Pick 400 students at random. What is the probability that their sample mean score X̄ is between 490 and 510? X has mean 500 and standard deviation 100, therefore X̄ is approximately N(500, 100/√400 = 5).
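
The three SAT questions above differ only in the sample size, so they can be checked with one small function. This is a sketch using `statistics.NormalDist` from the Python standard library; the helper name `prob_mean_between` is an assumption made for illustration, and the scores are treated as exactly N(500, 100) rather than approximately normal.

```python
from statistics import NormalDist

MU, SIGMA = 500, 100  # population mean and SD of Math SAT scores (from the example)

def prob_mean_between(lo, hi, n):
    """P(lo < sample mean < hi) for a sample of size n (hypothetical helper)."""
    sampling = NormalDist(MU, SIGMA / n ** 0.5)  # distribution of the sample mean
    return sampling.cdf(hi) - sampling.cdf(lo)

for n in (1, 25, 400):
    print(n, round(prob_mean_between(490, 510, n), 4))
```

The probability rises from about 0.08 for a single student to about 0.38 for 25 students and about 0.95 for 400 students, because the standard deviation of the mean shrinks from 100 to 20 to 5.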

21 Example A bottling company uses a filling machine to fill plastic bottles with a popular cola. The bottles are supposed to contain 300 milliliters (ml). In fact, the contents vary according to a normal distribution with mean μ=298 ml and standard deviation σ=3 ml. (a) What is the probability that an individual bottle contains less than 295 ml? X = content of a bottle, X~N(298, 3²). (b) What is the probability that the mean contents of the bottles in a six-pack is less than 295 ml? Before you answer, do you expect this probability to be (i) the same (ii) higher (iii) lower than the probability in (a)?
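
As a check on parts (a) and (b), the following sketch evaluates both probabilities with `statistics.NormalDist`. The helper name `prob_below` is an assumption for illustration; the parameters μ=298 and σ=3 come from the example.

```python
from statistics import NormalDist

MU, SIGMA = 298, 3  # ml, from the example

def prob_below(threshold, n=1):
    """P(mean of n bottles < threshold), assuming normally distributed contents."""
    return NormalDist(MU, SIGMA / n ** 0.5).cdf(threshold)

p_single = prob_below(295)        # (a) one bottle: SD = 3
p_sixpack = prob_below(295, n=6)  # (b) mean of six bottles: SD = 3/sqrt(6)
print(round(p_single, 4), round(p_sixpack, 4))
```

The single-bottle probability is about 0.16, while the six-pack probability is below 0.01: averaging six bottles makes a mean as low as 295 ml much less likely.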

22 Example The level of nitrogen oxide (NOX) in the exhaust of a particular car model varies with mean 0.9 grams per mile (g/mi) and standard deviation 0.15 g/mi. A company has 125 cars of this model in its fleet. (a) What is the approximate distribution of the mean NOX emission level for these cars? Approximately normal, mean=0.9, SD=0.15/√125=0.0134. (b) If the NOX level exceeds 1.0, the car is not authorized to be on the road. What is the probability that a car would not be permitted to go on the road? The distribution of NOX is not specified, so the probability cannot be computed.

23 (c) The quality control test requires that the mean NOX of an examined sample does not exceed What sample size should the company suggest so that the probability that the mean of this sample would not exceed is .99? A sample of 87.

24 Example Household size X in the U.S. has mean μ=2.6 and standard deviation σ=1.4. (a) Does this imply that the population distribution is normal? Not necessarily. It is more reasonable to assume that most households will have about 1, 2 or 3 people, and a few households will be unusually large. (b) Take a random sample of 10 households. Find the probability that the mean household size exceeds 2.7. Can't be done: a sample size of 10 is too small to expect the central limit theorem to guarantee an approximately normal distribution of X̄, so we cannot find the probabilities from normal tables. (c) Take a random sample of 100 households. Find the probability that the mean household size exceeds 2.7. Now X̄ is approximately N(2.6, 1.4/√100 = 0.14), and so P(X̄ > 2.7) = P(Z > (2.7 - 2.6)/0.14) = P(Z > 0.71) ≈ 0.24.

25 Inference – Confidence intervals for the mean
Population mean: μ. Sample mean: X̄

26 Point estimate for μ: X̄
Example: Unknown: mean height of female students, μ. Estimate: We take a random sample of 225 female students and measure the mean height, X̄, of the females in the sample. X̄ is a random variable; it may vary from sample to sample. Suppose that in a certain sample we obtained X̄ = 68 inches.

27 Limitations of point estimator
What value do we expect to get in another sample? How reliable is this estimate? An estimate without an indication of its variability is of little value! We would like to know precisely how far X̄ tends to be from the parameter of interest μ.

28 Interval estimate for μ:
Specify an interval in which you think μ lies. We want to say something such as: We are 95% confident that μ is between 60 and 70 inches

29 Moreover, X̄ is approximately normal for large n (Central Limit Theorem):
Therefore, according to the empirical rule, a 95% interval around the mean would be within ___ standard errors around the mean (answer: 2). Standard error of X̄: σ/√n.

30 If the standard deviation of female heights is σ=3.0
If the standard deviation of female heights is σ=3.0, and we take a sample of 225, then a 95% interval around the mean is: X̄ ± 2(σ/√n) = X̄ ± 2(3/√225) = X̄ ± 0.4. For a sample mean of 68 inches: 68 ± 0.4 = [67.6, 68.4]

31 Thus, we are 95% confident that μ lies within the interval [67.6,68.4]
In other words, 95% of intervals constructed in this manner (X̄ ± 2σ/√n) will cover the true value of μ.

32 Question Take 100 samples and for each sample compute a 95% confidence interval for the mean. How many of the confidence intervals are expected to cover the real value of the mean? Take 1000 samples and for each sample compute a 95% confidence interval for the mean. How many of the confidence intervals are expected to cover the real value of the mean?
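
The question above can be explored by simulation. The sketch below repeatedly draws samples of 225 heights and counts how many of the intervals X̄ ± 2σ/√n cover μ. The true mean μ=68, the normal population, the function name `coverage` and the seed are all assumptions chosen for illustration; only σ=3 and n=225 come from the slides.

```python
import random
import statistics

def coverage(n_samples, mu=68.0, sigma=3.0, n=225, seed=1):
    """Count how many of n_samples intervals xbar ± 2*sigma/sqrt(n) contain mu.

    mu is a hypothetical 'true' mean; in practice it is unknown.
    """
    rng = random.Random(seed)
    half_width = 2 * sigma / n ** 0.5  # 2 standard errors = 0.4 here
    covered = 0
    for _ in range(n_samples):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(xbar - mu) <= half_width:
            covered += 1
    return covered

print(coverage(100), "of 100 intervals cover mu")
print(coverage(1000), "of 1000 intervals cover mu")
```

On average about 95 of 100 and about 950 of 1000 intervals are expected to cover μ, although any single run will vary a little around those counts.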

33 95% is within 2 standard errors around the mean.
So a confidence interval for μ was X̄ ± 2σ/√n. If we wanted to be 99.7% confident about our estimation of μ, then we would take a 99.7% interval around μ. 99.7% is within ___ standard errors around the mean (answer: 3). So a confidence interval for μ would be: X̄ ± 3σ/√n

34 What about a confidence level of 90% ?
The middle 90% leaves 5% in each tail. Find how many standard errors a and b are from X̄. For a: Z0.05 = -1.645. For b: Z0.95 = 1.645.

35 An interval for estimating μ with 90% confidence would be within 1.645 standard errors from the mean: X̄ ± 1.645σ/√n.

36 For σ=3.0 and a sample of 225: 1.645(σ/√n) = 1.645(3/√225) = 1.645(0.2) = 0.329. If the mean height in a sample is 68 inches, then an interval estimate for μ is: 68 ± 0.329 = [67.671, 68.329]

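37 In general: a (1-α)100% confidence level
A confidence interval for μ, with (1-α)100% confidence level, is X̄ ± Z1-α/2(σ/√n). The middle (1-α)100% of the distribution of X̄ lies between Zα/2 and Z1-α/2 standard errors from the mean, with (α/2)100% in each tail.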

38 Confidence interval for μ
For a normal population, or for a large sample (n ≥ 30), a confidence interval for μ, with (1-α)100% confidence level, is X̄ ± Z1-α/2(σ/√n)
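
The general interval X̄ ± Z1-α/2(σ/√n) with known σ can be sketched as a small function. The name `z_interval` and its signature are assumptions for illustration; the quantile Z1-α/2 is obtained from `statistics.NormalDist().inv_cdf`.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for mu with known sigma (hypothetical helper).

    confidence is given as a fraction, e.g. 0.90 for a 90% level.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1-alpha/2}, e.g. 1.645 for 90%
    m = z * sigma / n ** 0.5                 # half-width of the interval
    return xbar - m, xbar + m

# Heights example: sigma = 3.0, n = 225, sample mean 68 inches, 90% level.
low, high = z_interval(68, 3.0, 225, 0.90)
print(round(low, 3), round(high, 3))
```

For the heights example this reproduces the interval 68 ± 0.329. Raising `confidence` widens the interval, and raising `n` narrows it.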

39 Example A test for the level of potassium in the blood is not perfectly precise. Moreover, the actual level of potassium in a person's blood varies slightly from day to day. Suppose that repeated measurements for the same person on different days vary normally with σ=0.2. Julie's potassium level is measured three times and the mean result is x̄ = 3.4. Give a 90% confidence interval for Julie's mean blood potassium level.

40 (a) Julie's potassium level is measured three times and the mean result is x̄ = 3.4. Give a 90% confidence interval for Julie's mean blood potassium level. 90% confidence interval for μ: x̄ ± Z0.95(σ/√n) = 3.4 ± 1.645(0.2/√3) = 3.4 ± 0.19 = [3.21,3.59]

41 (b) A confidence interval of 95% level would be:
(i) wider than a confidence interval of 90% level (ii) narrower than a confidence interval of 90% level (c) Give a 95% confidence interval for Julie's mean blood potassium level. 95% confidence interval for μ: x̄ ± Z0.975(σ/√n) = 3.4 ± 1.96(0.2/√3) = 3.4 ± 0.226 = [3.174,3.626]

42 (d) Julie wants a confidence interval of [3.3,3.5]
(d) Julie wants a confidence interval of [3.3,3.5] and she wants a confidence level of 90%. What sample size should she take to achieve this (how many times should she measure her blood potassium level)?
[3.3,3.5] = 3.4 ± Z0.95(0.2/√n)
3.4 - Z0.95(0.2/√n) = 3.3
3.4 + Z0.95(0.2/√n) = 3.5
Solve for n: subtract the first equation from the second ⇒ 2Z0.95(0.2/√n) = 0.2
Z0.95(0.2/√n) = 0.1
1.645(0.2/√n) = 0.1
0.2/√n = 0.0608
√n = 3.29 ⇒ n = 10.82
She needs at least 11 blood tests to achieve a CI of [3.3,3.5] with a 90% confidence level.

43 Finding n for a specified confidence interval
Suppose we want a specific interval with a confidence level 1-α. What sample size should be taken to obtain this CI? Define m = the distance from the mean to the upper/lower limit of the CI. Then m = Z1-α/2(σ/√n), so n = (Z1-α/2·σ/m)², rounded up to the next whole number. For the blood potassium example: m = 3.5 - 3.4 = 0.1 (|3.3 - 3.4| = 0.1)
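
Solving m = Z1-α/2(σ/√n) for n gives n = (Z1-α/2·σ/m)², rounded up. The sketch below wraps this in a function; the name `sample_size` is an assumption for illustration, and the exact normal quantile is used instead of the table value 1.645.

```python
import math
from statistics import NormalDist

def sample_size(sigma, m, confidence):
    """Smallest n whose interval half-width is at most m (hypothetical helper).

    confidence is a fraction, e.g. 0.90 for a 90% level.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{1-alpha/2}
    return math.ceil((z * sigma / m) ** 2)              # round up to a whole sample

# Blood potassium example: sigma = 0.2, m = 0.1, 90% confidence.
print(sample_size(0.2, 0.1, 0.90))
```

For the potassium example this gives 11 measurements. A higher confidence level or a smaller m both increase the required n.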

44 Example A test for the level of potassium in the blood is not perfectly precise. Moreover, the actual level of potassium in a person’s blood varies slightly from day to day. Suppose that repeated measurements for the same person on different days vary normally with σ=0.2.

47 (d) Julie wants a 99% confidence interval of [3.3,3.5]
(d) Julie wants a 99% confidence interval of [3.3,3.5]. What sample size should she take to achieve this (how many times should she measure her blood potassium level)?
[3.3,3.5] = 3.4 ± Z0.995(0.2/√n)
3.4 - Z0.995(0.2/√n) = 3.3
3.4 + Z0.995(0.2/√n) = 3.5
Solve for n: subtract the first equation from the second ⇒ 2Z0.995(0.2/√n) = 0.2
Z0.995(0.2/√n) = 0.1
2.575(0.2/√n) = 0.1
0.2/√n = 0.0388
√n = 5.15 ⇒ n = 26.52
She needs 27 blood tests to achieve a 99% CI of [3.3,3.5].

49 Example An agricultural researcher plants 25 plots with a new variety of corn. The average yield for these plots is x̄ = 150 bushels per acre. Assume that the yield per acre for the new variety of corn follows a normal distribution with unknown μ and standard deviation σ=10 bushels per acre. A 90% confidence interval for μ is: (a) 150±2.00 (b) 150±3.29 (c) 150±3.92 (d) 150±32.9

50 Example - continued
An agricultural researcher plants 25 plots with a new variety of corn. The average yield for these plots is x̄ = 150 bushels per acre. Assume that the yield per acre for the new variety of corn follows a normal distribution with unknown μ and standard deviation σ=10 bushels per acre. Which of the following will produce a narrower confidence interval than the 90% confidence interval that you computed above? (a) Plant only 5 plots rather than 25 (b) Plant 100 plots rather than 25 (c) Compute a 99% confidence interval rather than a 90% confidence interval (d) None of the above

51 Example
You measure the weight of a random sample of 25 male runners. The sample mean is x̄ = 60 kilograms (kg). Suppose that the weights of male runners follow a normal distribution with unknown mean μ and standard deviation σ=5 kg. A 95% confidence interval for μ is: (a) [59.61,60.39] (b) [59,61] (c) [58.04,61.96] (d) [50.02,69.8]

52 Example - continued
You measure the weight of a random sample of 25 male runners. The sample mean is x̄ = 60 kilograms (kg). Suppose that the weights of male runners follow a normal distribution with unknown mean μ and standard deviation σ=5 kg. Suppose I had measured the weights of a random sample of 100 runners rather than 25 runners. Which of the following statements is true? (a) The length of the confidence interval would increase (b) The length of the confidence interval would decrease (c) The length of the confidence interval would stay the same (d) σ would decrease

53 m = half of the CI length (= margin of error)
m = 4/2 = 2, α = 0.1

54 Example You plan to construct a confidence interval for the mean μ of a normal population with known standard deviation σ. Which of the following will reduce the size of the confidence interval? (a) Use a lower level of confidence (b) Increase the sample size (c) Reduce σ (d) All of the above

55 Example A 95% confidence interval for the mean μ of a population is computed from a random sample and found to be 9±3. We may conclude that: (a) There is a 95% probability that μ is between 6 and 12 (b) There is a 95% probability that the true mean is 9 and there is a 95% probability that the true mean is 3 (c) If we took many additional random samples and from each computed a 95% confidence interval for μ, approximately 95% of these intervals would contain μ. (d) All of the above

56 Example The heights of young American women, in inches, are normally distributed with mean μ and standard deviation σ=2.4. I select a simple random sample of four young American women and measure their heights. The four heights, in inches, are: Based on these data, a 99% confidence interval for μ, in inches, is: (a) 65±1.55 (b) 65±2.35 (c) 65±3.09 (d) 65±4.07

57 Mean of four heights: x̄ = 65. CI = x̄ ± Z0.995(σ/√n) = 65 ± 2.575(2.4/√4) = 65 ± 2.575(1.2) = 65 ± 3.09, so CI = [61.91,68.09]

58 The heights of young American women, in inches, are normally distributed with mean μ and standard deviation σ=2.4. I select a simple random sample of four young American women and measure their heights. The four heights, in inches, are: If I wanted the 99% confidence interval to be ± 1 inch from the mean, I should select a simple random sample of size: (a) 2 (b) 7 (c) 16 (d) 39

59 m = half of the CI length (= margin of error)
m = 1, α = 0.01

