Sampling Distributions and Estimation

Presentation on theme: "Sampling Distributions and Estimation"— Presentation transcript:

1

2 Sampling Distributions and Estimation
Chapter 8: Sampling Distributions and Estimation (Part 1)
Sampling Variation
Estimators and Sampling Distributions
Sample Mean and the Central Limit Theorem
Confidence Interval for a Mean (μ) with Known σ
Confidence Interval for a Mean (μ) with Unknown σ
Confidence Interval for a Proportion (π)
McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

3 Sampling Variation Sample statistic – a random variable whose value depends on which population items happen to be included in the random sample. Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population. This sampling variation can easily be illustrated.

4 Sampling Variation Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants. The sample means (x̄ᵢ) tend to be close to the population mean μ.

5 Sampling Variation Dot plot of eight sample means
Dot plot of eight samples of size n = 5
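
The sampling variation described above can be reproduced with a small simulation. This is only a sketch: the population mean and standard deviation below are hypothetical placeholders, not the GMAT parameters from the slides, and the population is assumed to be normal.

```python
import random
import statistics

# Hypothetical population parameters (NOT the values from the slides).
POP_MEAN, POP_SD = 500.0, 100.0

rng = random.Random(1)  # fixed seed so the run is repeatable

# Draw eight random samples of size n = 5 and record each sample mean.
sample_means = []
for _ in range(8):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(5)]
    sample_means.append(statistics.mean(sample))

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.1f}")
```

Each sample mean differs from the others and from the population mean, which is the sampling variation the dot plots illustrate.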

6 Estimators and Sampling Distributions
Some Terminology
Estimator – a statistic derived from a sample to infer the value of a population parameter.
Estimate – the value of the estimator in a particular sample.
Population parameters are represented by Greek letters and the corresponding statistics by Roman letters.

7 Estimators and Sampling Distributions
Examples of Estimators

8 Estimators and Sampling Distributions
The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken. An estimator is a random variable since samples vary. Sampling error = θ̂ – θ

9 Estimators and Sampling Distributions
Bias Bias is the difference between the expected value of the estimator and the true parameter: Bias = E(θ̂) – θ. An estimator is unbiased if E(θ̂) = θ. On average, an unbiased estimator neither overstates nor understates the true parameter.

10 Estimators and Sampling Distributions
Bias Sampling error is random whereas bias is systematic. An unbiased estimator avoids systematic error.
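
As a hedged illustration of bias, the sketch below compares two estimators of the population variance by enumerating every possible sample of size 2 (with replacement) from the small population {0, 1, 2, 3}. The choice of variance estimators as the example, and the helper name `var_hat`, are assumptions made here for illustration; the slides do not specify this example.

```python
from itertools import product
from statistics import mean

population = [0, 1, 2, 3]
mu = mean(population)                                # 1.5
sigma2 = mean((x - mu) ** 2 for x in population)     # population variance, 1.25

def var_hat(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = mean(sample)
    return sum((x - xbar) ** 2 for x in sample) / divisor

# All 16 equally likely samples of size n = 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Expected value of each estimator = average over all possible samples.
exp_n_minus_1 = mean(var_hat(s, 1) for s in samples)  # divisor n - 1 = 1
exp_n = mean(var_hat(s, 2) for s in samples)          # divisor n = 2

bias_n_minus_1 = exp_n_minus_1 - sigma2
bias_n = exp_n - sigma2
print(bias_n_minus_1, bias_n)
```

Dividing by n – 1 gives zero bias, while dividing by n understates the variance on average. That shortfall is systematic, not random.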

11 Estimators and Sampling Distributions

12 Estimators and Sampling Distributions
Efficiency Efficiency refers to the variance of the estimator’s sampling distribution. A more efficient estimator has smaller variance.

13 Estimators and Sampling Distributions
Consistency A consistent estimator converges toward the parameter being estimated as the sample size increases.

14 Sample Mean and the Central Limit Theorem
The sample mean is an unbiased estimator of μ; therefore, E(X̄) = E(X) = μ. The standard error of the mean is the standard deviation of the sampling error of x̄: σ_x̄ = σ/√n

15 Sample Mean and the Central Limit Theorem
If the population is exactly normal, then the sample mean follows a normal distribution.

16 Sample Mean and the Central Limit Theorem
For example, the average price, μ, of a 5 GB MP3 player is $80.00 with a standard deviation, σ, equal to $10.00. What will be the mean and standard error for a sample of 20 players? E(X̄) = E(X) = μ = $80.00 and σ_x̄ = σ/√n = 10/√20 = $2.236. If the distribution of prices for these players is normal, then the sampling distribution of x̄ is N(80.00, 2.236), where the second parameter is the standard error.
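
The standard error arithmetic for this example can be checked with a short sketch. The function name `standard_error` is an illustrative choice, and σ = 10 is taken from the slide's own calculation (10/√20).

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# MP3 player example: sigma = $10, n = 20 players.
se = standard_error(10.0, 20)
print(round(se, 3))
```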

17 Sample Mean and the Central Limit Theorem
Central Limit Theorem (CLT) for a Mean If a random sample of size n is drawn from a population with mean μ and standard deviation σ, the distribution of the sample mean x̄ approaches a normal distribution with mean μ and standard deviation σ_x̄ = σ/√n as the sample size increases. If the population is normal, the distribution of the sample mean is normal regardless of sample size.

18 Sample Mean and the Central Limit Theorem

19 Sample Mean and the Central Limit Theorem
Symmetric Population: Uniform Distribution Rule of thumb: n > 30 is usually enough for the sample mean to be approximately normally distributed. A much smaller n will suffice if the population is symmetric. For example, consider a uniform population U(500, 1500), which has mean 1000 and standard deviation 1000/√12 = 288.7.

20 Sample Mean and the Central Limit Theorem
Symmetric Population: Uniform Distribution The central limit theorem predicts that sample means drawn from this population will have a mean of 1000 and a standard error of σ_x̄ = σ/√n:
n = 1: 288.7/√1 = 288.7
n = 2: 288.7/√2 = 204.1
n = 4: 288.7/√4 = 144.3
n = 16: 288.7/√16 = 72.2
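
A simulation can compare these predicted standard errors with the observed spread of sample means. The sketch below assumes the population is uniform on [500, 1500], which is the range implied by a mean of 1000 and σ = 288.7. The number of replications and the seed are arbitrary choices.

```python
import math
import random
import statistics

LOW, HIGH = 500.0, 1500.0                  # assumed uniform range
sigma = (HIGH - LOW) / math.sqrt(12)       # about 288.7

rng = random.Random(42)

def simulated_se(n, reps=20000):
    """Standard deviation of `reps` simulated sample means of size n."""
    means = [
        statistics.fmean(rng.uniform(LOW, HIGH) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.pstdev(means)

# Map each sample size to (predicted S.E., simulated S.E.).
results = {}
for n in (1, 2, 4, 16):
    results[n] = (sigma / math.sqrt(n), simulated_se(n))
    print(f"n = {n:2d}: predicted {results[n][0]:6.1f}, simulated {results[n][1]:6.1f}")
```

The simulated values should land close to the predicted ones, with small differences from run to run due to sampling variation.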

21 Sample Mean and the Central Limit Theorem
Histograms of Sample Means from Uniform Population

22 Sample Mean and the Central Limit Theorem
Histograms of Sample Means from Uniform Population

23 Sample Mean and the Central Limit Theorem
Skewed Population: Waiting Time Consider a strongly skewed population of waiting times at airport security screening with mean μ and standard deviation σ = 2.451.

24 Sample Mean and the Central Limit Theorem
Skewed Population: Waiting Time The CLT predicts that sample means drawn from this population will have a mean equal to the population mean μ (in minutes) and a standard error of σ_x̄ = σ/√n:
n = 1: 2.451/√1 = 2.451
n = 2: 2.451/√2 = 1.733
n = 4: 2.451/√4 = 1.226
n = 16: 2.451/√16 = 0.613

25 Sample Mean and the Central Limit Theorem
Histograms of Sample Means from Skewed Population

26 Sample Mean and the Central Limit Theorem
Histograms of Sample Means from Skewed Population

27 Sample Mean and the Central Limit Theorem
Range of Sample Means The CLT lets us predict a range or interval within which the sample means are expected to fall: μ ± z σ/√n, where z is from the standard normal table. If we know μ and σ, the sample means for samples of size n are predicted to fall within:
90% interval: μ ± 1.645 σ/√n
95% interval: μ ± 1.960 σ/√n
99% interval: μ ± 2.576 σ/√n
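
The range formula can be sketched as a small function that looks up z from the standard normal distribution instead of a printed table. The function name `sample_mean_range` is hypothetical, and the inputs reuse the MP3 player figures (μ = 80, σ = 10, n = 20) purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_range(mu, sigma, n, level):
    """Interval mu ± z*sigma/sqrt(n) expected to contain `level` of sample means."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided z for this level
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width

for level in (0.90, 0.95, 0.99):
    lo, hi = sample_mean_range(80.0, 10.0, 20, level)
    print(f"{level:.0%}: {lo:.3f} to {hi:.3f}")
```

A higher level gives a larger z and therefore a wider range for the same μ, σ, and n.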

28 Sample Mean and the Central Limit Theorem
Illustration: GMAT Scores For samples of size n = 5 applicants, within what range would GMAT means be expected to fall? Given the population parameters μ and σ, the predicted range for 95% of the sample means is μ ± 1.960 σ/√5.

29 Sample Mean and the Central Limit Theorem
Sample Size and Standard Error The standard error declines as n increases, but at a decreasing rate. The interval μ ± z σ/√n can be made narrower by increasing n. The distribution of sample means collapses at the true population mean μ as n increases.

30 Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}. The population parameters are: μ = 1.5, σ = 1.118

32 Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population All possible samples of size n = 2, with replacement, are given below along with their means.

33 Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population The population is uniform, yet the distribution of all possible sample means has a peaked triangular shape.

34 Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population The CLT’s predictions for the mean and standard error are μ_x̄ = μ = 1.5 and σ_x̄ = σ/√n = 1.118/√2 = 0.7906

35 Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population The mean of the 16 sample means is [1(0.0) + 2(0.5) + 3(1.0) + 4(1.5) + 3(2.0) + 2(2.5) + 1(3.0)]/16 = 24/16 = 1.5. The standard deviation of the means is √(10/16) = 0.7906, which equals σ/√n = 1.118/√2.
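
Because the population {0, 1, 2, 3} is so small, every possible sample of size n = 2 (with replacement) can be enumerated directly. The following sketch does that and computes the mean and standard deviation of the 16 sample means. The variable names are illustrative.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [0, 1, 2, 3]

# All 4 x 4 = 16 ordered samples of size 2, drawn with replacement.
sample_means = [(a + b) / 2 for a, b in product(population, repeat=2)]

# Frequency of each distinct sample mean (the triangular distribution).
freq = Counter(sample_means)
for value in sorted(freq):
    print(f"mean {value:.1f}: {freq[value]} sample(s)")

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)           # population-style SD over all 16
predicted_se = pstdev(population) / sqrt(2)  # sigma / sqrt(n)
print(mean_of_means, round(sd_of_means, 4), round(predicted_se, 4))
```

The enumeration is exact, not simulated, so the mean of means and its standard deviation match the CLT's predictions without any sampling noise.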

36 Confidence Interval for a Mean (μ) with Known σ
What is a Confidence Interval? A sample mean x̄ is a point estimate of the population mean μ. A confidence interval for the mean is a range μ_lower < μ < μ_upper. The confidence level is the probability that the procedure used to construct the interval yields an interval containing the true population mean. The confidence level (usually expressed as a %) is the area under the curve of the sampling distribution.

37 Confidence Interval for a Mean (μ) with Known σ
What is a Confidence Interval? The confidence interval for μ with known σ is: x̄ ± z σ/√n, where z is the standard normal value for the chosen confidence level.

38 Confidence Interval for a Mean (μ) with Known σ
Choosing a Confidence Level A higher confidence level leads to a wider confidence interval. Greater confidence implies loss of precision. 95% confidence is most often used.

39 Confidence Interval for a Mean (μ) with Known σ
Interpretation A confidence interval either does or does not contain μ. The confidence level quantifies the risk. Out of 100 confidence intervals constructed at the 95% level, approximately 95 would contain μ, while approximately 5 would not.
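
This interpretation can be checked by simulation: build many 95% intervals from independent samples and count how often they contain the true mean. The sketch below assumes a normal population with known σ. The values of μ, σ, n, the seed, and the number of trials are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU, SIGMA, N = 80.0, 10.0, 20          # hypothetical population and sample size
z = NormalDist().inv_cdf(0.975)        # about 1.960 for 95% confidence
rng = random.Random(7)

def interval_covers_mu():
    """Draw one sample, build x̄ ± z*sigma/sqrt(n), report whether it contains MU."""
    xbar = fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    half_width = z * SIGMA / sqrt(N)
    return xbar - half_width < MU < xbar + half_width

TRIALS = 5000
coverage = sum(interval_covers_mu() for _ in range(TRIALS)) / TRIALS
print(f"share of intervals containing mu: {coverage:.3f}")
```

Any single interval either contains μ or does not; the 95% describes the long-run share of intervals that do.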

40 Confidence Interval for a Mean (μ) with Known σ
Is σ Ever Known? Yes, but not very often. In quality control applications with ongoing manufacturing processes, assume σ stays the same over time. In this case, confidence intervals are used to construct control charts to track the mean of a process over time.

41 Confidence Interval for a Mean (μ) with Unknown σ
Student’s t Distribution Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small. The confidence interval for μ (unknown σ) is x̄ ± t s/√n, that is, x̄ – t s/√n < μ < x̄ + t s/√n, where s is the sample standard deviation.

42 Confidence Interval for a Mean (μ) with Unknown σ
Student’s t Distribution

43 Confidence Interval for a Mean (μ) with Unknown σ
Student’s t Distribution t distributions are symmetric and shaped like the standard normal distribution. The t distribution depends on the size of the sample.

44 Confidence Interval for a Mean (μ) with Unknown σ
Degrees of Freedom Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic. Degrees of freedom tell how many observations are used to calculate s, less the number of intermediate estimates used in the calculation: ν = n – 1

45 Confidence Interval for a Mean (μ) with Unknown σ
Degrees of Freedom As n increases, the t distribution approaches the shape of the normal distribution. For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.

46 Confidence Interval for a Mean (μ) with Unknown σ
Comparison of z and t For very small samples, t-values differ substantially from the normal. As degrees of freedom increase, the t-values approach the normal z-values. For example, for n = 31, the degrees of freedom are ν = 31 – 1 = 30. What would the t-value be for a 90% confidence interval?

47 Confidence Interval for a Mean (μ) with Unknown σ
Comparison of z and t For ν = 30 and 90% confidence, the t-value is 1.697; the corresponding z-value is 1.645.

48 Confidence Interval for a Mean (μ) with Unknown σ
Example GMAT Scores Again Here are the GMAT scores from 20 applicants to an MBA program:

49 Confidence Interval for a Mean (μ) with Unknown σ
Example GMAT Scores Again Construct a 90% confidence interval for the mean GMAT score of all MBA applicants. The sample gives a mean x̄ and a standard deviation s = 73.77. Since σ is unknown, use the Student’s t for the confidence interval with ν = 20 – 1 = 19 d.f. First find the t-value for 90% confidence from Appendix D.

50 Confidence Interval for a Mean (μ) with Unknown σ

51 Confidence Interval for a Mean (μ) with Unknown σ
Example GMAT Scores Again The 90% confidence interval is x̄ – t s/√n < μ < x̄ + t s/√n: 513 – 1.729(73.77/√20) < μ < 513 + 1.729(73.77/√20), or 513 – 28.52 < μ < 513 + 28.52. We are 90% confident that the true mean GMAT score is within the interval 484.48 < μ < 541.52.
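
The interval arithmetic can be sketched as follows. The t-value 1.729, s = 73.77, and n = 20 come from the slides; the sample mean of 513 is taken as printed on this slide. The function name `t_interval` is hypothetical, and the t-value is passed in by hand because Python's standard library has no t-distribution lookup.

```python
from math import sqrt

def t_interval(xbar, s, n, t):
    """Return (lower, upper, half_width) for x̄ ± t * s / sqrt(n)."""
    half_width = t * s / sqrt(n)
    return xbar - half_width, xbar + half_width, half_width

# t = 1.729 is the table value for 90% confidence with 19 d.f.
lower, upper, half_width = t_interval(xbar=513, s=73.77, n=20, t=1.729)
print(f"half-width = {half_width:.2f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```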

52 Confidence Interval for a Mean (μ) with Unknown σ
Confidence Interval Width Confidence interval width reflects the sample size, the confidence level, and the standard deviation. To obtain a narrower interval and more precision, increase the sample size or lower the confidence level (e.g., from 90% to 80% confidence).

53 Confidence Interval for a Mean (μ) with Unknown σ
A “Good” Sample Here are five different samples of 25 births from a population of N = 4,409 births and their 95% CIs.

54 Confidence Interval for a Mean (μ) with Unknown σ
A “Good” Sample An examination of the samples shows that sample 5 has an outlier. The outlier is a warning that the resulting confidence interval may not be trustworthy. In this case, a larger sample size is needed.

55 Confidence Interval for a Mean (μ) with Unknown σ
Using Appendix D Beyond ν = 50, Appendix D shows ν in steps of 5 or 10. If the table does not give the exact degrees of freedom, use the t-value for the next lower ν. This is a conservative procedure since it causes the interval to be slightly wider. For d.f. above 150, use the z-value.

56 Confidence Interval for a Mean (μ) with Unknown σ
Using Excel Use Excel’s function =TINV(probability, d.f.) to obtain a two-tailed value of t. Here, “probability” is 1 minus the confidence level.

57 Confidence Interval for a Mean (μ) with Unknown σ
Using MegaStat MegaStat gives you a choice of z or t and does all calculations for you.

58 Confidence Interval for a Mean (μ) with Unknown σ
Using MINITAB MINITAB also gives confidence intervals for the median and standard deviation.

59 Confidence Interval for a Proportion (π)
A proportion is a mean of data whose only values are 0 or 1. The Central Limit Theorem (CLT) states that the distribution of a sample proportion p = x/n approaches a normal distribution with mean π and standard deviation σ_p = √(π(1–π)/n). The sample proportion p = x/n is a consistent estimator of π.

60 Confidence Interval for a Proportion (π)
Illustration: Internet Hotel Reservations Management of the Pan-Asian Hotel System tracks the percent of hotel reservations made over the Internet. The binary data are: 1 = reservation is made over the Internet, 0 = reservation is not made over the Internet. After the data were collected, it was determined that the proportion of Internet reservations is π = .20.

61 Confidence Interval for a Proportion (π)
Illustration: Internet Hotel Reservations Here are five random samples of n = 20. Each p is a point estimate of π. Notice the sampling variation in the value of p.

62 Confidence Interval for a Proportion (π)
Applying the CLT The distribution of a sample proportion p = x/n is symmetric if π = .50 and, regardless of π, approaches symmetry as n increases.

63 Confidence Interval for a Proportion (π)
Applying the CLT As n increases, the statistic p = x/n more closely resembles a continuous random variable. As n increases, the distribution becomes more symmetric and bell shaped. As n increases, the range of the sample proportion p = x/n narrows. The sampling variation can be reduced by increasing the sample size n.

64 Confidence Interval for a Proportion (π)
When is it Safe to Assume Normality? Rule of Thumb: The sample proportion p = x/n may be assumed to be normal if both nπ > 10 and n(1–π) > 10. Sample size to assume normality:

65 Confidence Interval for a Proportion (π)
Standard Error of the Proportion The standard error of the proportion σ_p depends on π, as well as n. It is largest when π is near .50 and smaller when π is near 0 or 1.

66 Confidence Interval for a Proportion (π)
Standard Error of the Proportion The formula for the standard error is symmetric about π = .50.

67 Confidence Interval for a Proportion (π)
Standard Error of the Proportion Enlarging n reduces the standard error σ_p but at a diminishing rate.

68 Confidence Interval for a Proportion (π)
Confidence Interval for π The confidence interval for π is p ± z √(π(1–π)/n), where z is based on the desired confidence. Since π is unknown, the confidence interval (assuming a large sample) uses the sample proportion p = x/n in its place: p ± z √(p(1–p)/n)

69 Confidence Interval for a Proportion (π)
Confidence Interval for π z can be chosen for any confidence level. For example,

70 Confidence Interval for a Proportion (π)
Example Auditing A sample of 75 retail in-store purchases showed that 24 were paid in cash. What is p? p = x/n = 24/75 = .32. Is p normally distributed? np = (75)(.32) = 24 and n(1–p) = (75)(.68) = 51. Both are > 10, so we may assume normality.

71 Confidence Interval for a Proportion (π)
Example Auditing The 95% confidence interval for the proportion of retail in-store purchases that are paid in cash is: p ± z √(p(1–p)/n) = .32 ± 1.96 √(.32(1–.32)/75) = .32 ± .106, or .214 < π < .426. We are 95% confident that this interval contains the true population proportion.
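
The auditing example can be reproduced with a short sketch. The function name `proportion_ci` is illustrative, and the large-sample normal approximation is assumed, as in the slides.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, level=0.95):
    """Large-sample interval p ± z*sqrt(p(1-p)/n) for a population proportion."""
    p = x / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# 24 cash purchases out of 75 sampled.
p, lower, upper = proportion_ci(24, 75)

# Normality rule of thumb: both counts should exceed 10.
successes, failures = 75 * p, 75 * (1 - p)
print(f"p = {p:.2f}, np = {successes:.0f}, n(1-p) = {failures:.0f}")
print(f"{lower:.3f} < pi < {upper:.3f}")
```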

72 Confidence Interval for a Proportion (π)
Narrowing the Interval The width of the confidence interval for π depends on the sample size, the confidence level, and the sample proportion p. To obtain a narrower interval (i.e., more precision), either increase the sample size or reduce the confidence level.

73 Confidence Interval for a Proportion (π)
Using Excel and MegaStat To find a confidence interval for a proportion in Excel, use (for example) the following formulas for the lower and upper limits:
=0.15-NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
=0.15+NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
Here p = .15 and n = 200. Because NORMSINV(.95) returns z = 1.645, these formulas give a 90% confidence interval.

74 Confidence Interval for a Proportion (π)
Using Excel and MegaStat In MegaStat, enter p and n to obtain the confidence interval for a proportion. MegaStat always assumes normality.

75 Confidence Interval for a Proportion (π)
Using Excel and MegaStat If the sample is small, the distribution of p may not be well approximated by the normal. Confidence limits around p can be constructed by using the binomial distribution.

76 Confidence Interval for a Proportion (π)
Polls and Margin of Error In polls and surveys, the half-width of the confidence interval when p = .5 is called the margin of error. Below are some margins of error for a 95% confidence interval assuming p = .50. Each reduction in the margin of error requires a disproportionately larger sample size.

77 Confidence Interval for a Proportion (π)
Rule of Three If in n independent trials no events occur, the upper 95% confidence bound is approximately 3/n. Very Quick Rule A Very Quick Rule (VQR) for a 95% confidence interval when p is near .50 is p ± 1/√n
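
Both shortcuts can be compared with the calculations they approximate. In the sketch below, the function names and the sample sizes are illustrative assumptions. The "exact" upper bound is the largest proportion for which seeing zero events in n independent trials still has probability of at least 5%.

```python
from math import sqrt
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)   # about 1.960

def margin_of_error(n, p=0.5):
    """95% margin of error z*sqrt(p(1-p)/n); largest when p = .50."""
    return z95 * sqrt(p * (1 - p) / n)

def very_quick_rule(n):
    """VQR approximation to the 95% margin of error near p = .50."""
    return 1 / sqrt(n)

def rule_of_three(n):
    """Approximate upper 95% bound when no events occur in n trials."""
    return 3 / n

def exact_upper_bound(n, alpha=0.05):
    """Solve (1 - pi)^n = alpha for pi."""
    return 1 - alpha ** (1 / n)

for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 4), round(very_quick_rule(n), 4),
          round(rule_of_three(n), 4), round(exact_upper_bound(n), 4))
```

Quadrupling n only halves the margin of error, which is why each reduction requires a disproportionately larger sample.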

78 Applied Statistics in Business and Economics
End of Part 1 of Chapter 8

