 # Estimation of Means and Proportions


Welcome to the interesting part of the course

Concepts
- Estimator: a rule that tells us how to estimate a value for a population parameter using sample data.
- Estimate: a specific value of an estimator for particular sample data.
- Point estimator: a rule that tells us how to calculate a single number from sample data to estimate a population parameter.
- Interval estimator: a rule that tells us how to calculate two numbers from sample data, forming a confidence interval within which the parameter is expected to lie.

Properties of a Good Estimator
- Unbiasedness: the mean of the sampling distribution of the estimator equals the true value of the parameter.
- Efficiency: among a group of unbiased estimators, the most efficient is the one with the smallest variance.

Estimation of a Population Mean
The CLT suggests that the sample mean $\bar{x}$ may be a good estimator for the population mean $\mu$. From the sampling distribution of $\bar{x}$ we know that:
- The sampling distribution of the sample mean is approximately normal regardless of the distribution of the sampled population, provided n is large.
- The sample mean is an unbiased estimator of $\mu$.
- The standard error of the sample mean is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

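Estimation of a Population Mean
A point estimator of the population mean is the sample mean $\bar{x}$. An interval estimator of the population mean is a $(1-\alpha)100\%$ confidence interval,

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},$$

meaning that in repeated sampling, $(1-\alpha)100\%$ of the intervals constructed this way contain the true population parameter. Here $z_{\alpha/2}$ is the z value corresponding to an area $\alpha/2$ in the upper tail of a standard normal distribution.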
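As a rough illustration of the interval estimator for a population mean, the sketch below computes $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ using only the Python standard library. The function name `z_confidence_interval` and the example numbers are assumptions made for illustration; they do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided (1 - alpha)100% large-sample confidence interval for a mean.

    xbar  : sample mean (the point estimate)
    sigma : population standard deviation (or s when n >= 30)
    n     : sample size
    """
    # z_{alpha/2}: the value with area alpha/2 in the upper tail of N(0, 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    return xbar - z * standard_error, xbar + z * standard_error


# Hypothetical data: n = 100 observations, mean 50, standard deviation 10
low, high = z_confidence_interval(xbar=50.0, sigma=10.0, n=100, alpha=0.05)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Lowering `alpha` (say to 0.01) widens the interval, since a higher confidence level requires a larger $z_{\alpha/2}$.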

Estimation of a Population Mean
Usually σ (the population standard deviation) is unknown. If n is large enough (n ≥ 30) then we can approximate it with the sample standard deviation s.

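One-Sided Confidence Intervals
In some cases we may only need a bound that the population parameter is expected to fall above or below, rather than a two-sided interval. Because all of $\alpha$ is placed in one tail, these bounds use $z_{\alpha}$ rather than $z_{\alpha/2}$.
- Lower one-sided confidence bound (LCL): LCL = (point estimate) − $z_{\alpha}$ × (standard error of the estimator)
- Upper one-sided confidence bound (UCL): UCL = (point estimate) + $z_{\alpha}$ × (standard error of the estimator)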
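The one-sided bounds can be sketched in the same way as the two-sided interval. The only change is that the whole of $\alpha$ sits in one tail, so the multiplier is $z_{\alpha}$. The function name `one_sided_bounds` and the numbers are hypothetical, and the example assumes a large sample so that s can stand in for σ.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_bounds(xbar, s, n, alpha=0.05):
    """Return (LCL, UCL) for a population mean, each at (1 - alpha)100% confidence.

    Each bound is a separate one-sided statement; they are not meant to be
    combined into a single two-sided interval at the same confidence level.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # z_alpha: area alpha in the upper tail
    standard_error = s / sqrt(n)
    lcl = xbar - z * standard_error
    ucl = xbar + z * standard_error
    return lcl, ucl


# Hypothetical data: n = 100, sample mean 50, sample standard deviation 10
lcl, ucl = one_sided_bounds(xbar=50.0, s=10.0, n=100, alpha=0.05)
print(f"95% lower bound: {lcl:.3f}; 95% upper bound: {ucl:.3f}")
```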

Small Sample Estimation of a Population Mean
- If n is large, we can use the sample standard deviation s as a reliable estimator of the population standard deviation σ.
- If n is large, the sampling distribution of the sample mean is approximately normal no matter what distribution the population has.
- As the sample size n decreases, the sample standard deviation s becomes a less reliable estimator of the population standard deviation (because we are using less information from the underlying distribution to compute s).

How do we deal with this issue?

t Distribution
Assume (1) the underlying population is normally distributed and (2) the sample is small and σ is unknown. Using the sample standard deviation s to replace σ, the t statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

follows the t distribution.

Properties of the t Distribution
- mound-shaped
- perfectly symmetric about t = 0
- more variable than z (the standard normal distribution)
- affected by the sample size n (as n increases, s becomes a better approximation for σ)
- n − 1 is the degrees of freedom (d.f.) associated with the t statistic

More on the t Distribution
Remember the t-distribution is based on the assumption that the sampled population possesses a normal probability distribution. This is a very restrictive assumption. Fortunately, it can be shown that for non-normal but mound-shaped distributions, the distribution of the t statistic is nearly the same shape as the theoretical t-distribution for a normal distribution. Therefore the t distribution is still useful for small sample estimation of a population mean even if the underlying distribution of x is not known to be normal

How to use the t-distribution table
The t-distribution table is in the book (Appendix II, Table 4, p. 611). $t_{\alpha}$ is the value of t such that an area α lies to its right. To use the table:
- Determine the degrees of freedom.
- Determine the appropriate value of α (for a two-sided confidence interval, look up $t_{\alpha/2}$).
- Look up the value of $t_{\alpha}$.
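The table lookup and the resulting small-sample interval $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ can be sketched as follows. The dictionary `T_TABLE` is a hypothetical excerpt holding the standard tabulated critical values for 1 to 10 degrees of freedom (rounded to three decimals); it is not copied from the book's Table 4. The function names and the sample data are likewise made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Excerpt of a t table: upper-tail area -> critical values for d.f. = 1..10
T_TABLE = {
    0.05: [6.314, 2.920, 2.353, 2.132, 2.015, 1.943, 1.895, 1.860, 1.833, 1.812],
    0.025: [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228],
}


def t_alpha(tail_area, df):
    """Look up the t value with `tail_area` to its right for `df` degrees of freedom."""
    return T_TABLE[tail_area][df - 1]


def t_confidence_interval(sample, tail_area=0.025):
    """Small-sample interval for a mean; tail_area is alpha/2 (0.025 gives 95%)."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)               # sample standard deviation (divides by n - 1)
    t = t_alpha(tail_area, n - 1)   # step 1: d.f. = n - 1; steps 2-3: look up t
    half_width = t * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical small sample, assumed to come from a roughly normal population
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 9.6]
low, high = t_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

A hard-coded excerpt keeps the example free of third-party dependencies; a statistics library that provides t quantiles would remove the need for the table.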

Table: t Distribution

The Difference Between Two Means
Suppose independent samples of $n_1$ and $n_2$ observations have been selected from populations with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$. The sampling distribution of the difference in sample means $(\bar{x}_1 - \bar{x}_2)$ will have the following properties.

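The Difference Between Two Means
- The mean and standard deviation of $(\bar{x}_1 - \bar{x}_2)$ are

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2, \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

- If the sampled populations are normally distributed, the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is exactly normally distributed regardless of $n_1$ and $n_2$.
- If the sampled populations are not normally distributed, the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is approximately normally distributed when $n_1$ and $n_2$ are large.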

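Point Estimation of the Difference Between Two Means
Point estimator: $(\bar{x}_1 - \bar{x}_2)$

A $(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$ is

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

As with a single mean, when $\sigma_1^2$ and $\sigma_2^2$ are unknown and both samples are large (n ≥ 30), the sample variances $s_1^2$ and $s_2^2$ can be used in their place.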
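A minimal sketch of the large-sample interval for a difference between two means is shown below. It assumes both samples are large enough for the sample standard deviations to replace the unknown population values. The function name `diff_means_interval` and the summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_interval(xbar1, s1, n1, xbar2, s2, n2, alpha=0.05):
    """Large-sample (1 - alpha)100% confidence interval for mu1 - mu2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)   # SE of (xbar1 - xbar2)
    point_estimate = xbar1 - xbar2
    return point_estimate - z * standard_error, point_estimate + z * standard_error


# Hypothetical summary statistics for two independent samples
low, high = diff_means_interval(xbar1=75.0, s1=8.0, n1=64,
                                xbar2=72.0, s2=6.0, n2=36)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

If the resulting interval excludes 0, the data suggest the two population means differ at the chosen confidence level.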

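Difference Between Two Means (small sample)
If $n_1$ and $n_2$ are small, then the t statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

is distributed according to the t distribution with $n_1 + n_2 - 2$ degrees of freedom, provided the following assumptions are satisfied:
1. Both samples are drawn from populations with a normal distribution.
2. Both populations have equal variances.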

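Difference Between Two Means (small sample)
In practice, the t statistic is still appropriate even if the underlying distributions are not exactly normally distributed. To compute s, we can pool the information from both samples:

$$s^2 = \frac{\sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2 + \sum_{i=1}^{n_2}(x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2}$$

or

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$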

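Difference Between Two Means (small sample)
Point estimate: $(\bar{x}_1 - \bar{x}_2)$

Interval estimate: a $(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where s is computed using the pooled estimate described earlier and $t_{\alpha/2}$ is based on $n_1 + n_2 - 2$ degrees of freedom.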
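The pooled estimate and the small-sample interval can be sketched together. In this illustration the critical value $t_{\alpha/2}$ is passed in as an argument, on the assumption that it has been read from a t table for $n_1 + n_2 - 2$ degrees of freedom. The function names and the numbers are hypothetical.

```python
from math import sqrt


def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate s^2 of the common variance from two sample standard deviations."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_value):
    """Small-sample confidence interval for mu1 - mu2, assuming equal variances.

    t_value is t_{alpha/2} for n1 + n2 - 2 degrees of freedom, taken from a table.
    """
    s = sqrt(pooled_variance(s1, n1, s2, n2))
    half_width = t_value * s * sqrt(1 / n1 + 1 / n2)
    point_estimate = xbar1 - xbar2
    return point_estimate - half_width, point_estimate + half_width


# Hypothetical samples of size 6 each: d.f. = 10, so t_{0.025} = 2.228
low, high = pooled_t_interval(xbar1=20.0, s1=2.0, n1=6,
                              xbar2=17.0, s2=2.0, n2=6,
                              t_value=2.228)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

Weighting each sample variance by its degrees of freedom means the larger sample contributes more to the pooled estimate.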

Sampling Distribution of Sample Proportions
Recall from Chapter 6: if a random sample of n objects is selected from the population and x of these possess a characteristic of interest, the sample proportion is $\hat{p} = x/n$. The sampling distribution of $\hat{p}$ will have mean $p$ and standard deviation $\sqrt{pq/n}$, where $q = 1 - p$.

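Estimators for p
Assuming n is sufficiently large and the interval $\hat{p} \pm 2\sqrt{\hat{p}\hat{q}/n}$ lies in the interval from 0 to 1, where $\hat{q} = 1 - \hat{p}$:

Point estimator for p: $\hat{p} = x/n$

Interval estimator for p: a $(1-\alpha)100\%$ confidence interval for p is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$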
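A short sketch of the large-sample interval for a proportion, $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, follows. The function name `proportion_interval` and the counts are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, alpha=0.05):
    """Large-sample (1 - alpha)100% confidence interval for a binomial proportion p."""
    p_hat = x / n                       # point estimate of p
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(p_hat * q_hat / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


# Hypothetical survey: 40 of 100 sampled objects have the characteristic
low, high = proportion_interval(x=40, n=100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```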

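Estimating the Difference Between Two Binomial Proportions
Point estimate: $(\hat{p}_1 - \hat{p}_2)$

A $(1-\alpha)100\%$ confidence interval for the difference $(p_1 - p_2)$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$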
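The interval for a difference between two binomial proportions can be sketched along the same lines, assuming two independent large samples. The function name `diff_proportions_interval` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, alpha=0.05):
    """Large-sample (1 - alpha)100% confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of (p1_hat - p2_hat) for independent samples
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    point_estimate = p1 - p2
    return point_estimate - z * standard_error, point_estimate + z * standard_error


# Hypothetical counts: 60 of 100 in sample 1, 50 of 100 in sample 2
low, high = diff_proportions_interval(x1=60, n1=100, x2=50, n2=100)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

With these made-up counts the interval contains 0, so the samples would not rule out equal proportions at the 95% level.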

Choosing Sample Size
How many measurements should be included in the sample? Increasing n increases the precision of the estimate, but increasing n is costly. The answer depends on:
- What level of confidence do you want to have (i.e., the value of $(1-\alpha)100\%$)?
- What is the maximum difference (B) you want to permit between the estimate of the population parameter and the true population parameter?

Choosing Sample Size
Once you have chosen B and α, you can solve the following equation for sample size n:

$$z_{\alpha/2} \times (\text{standard error of the estimator}) = B$$

If the resulting value of n is less than 30 and an estimate
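To make the sample size calculation concrete: for a population mean the standard error is $\sigma/\sqrt{n}$, so setting $z_{\alpha/2}\,\sigma/\sqrt{n} = B$ gives $n = (z_{\alpha/2}\,\sigma/B)^2$, and for a proportion it gives $n = z_{\alpha/2}^2\,pq/B^2$. The sketch below rounds up to the next whole observation. The function names, the planning value for σ, and the use of p = 0.5 as a conservative guess are assumptions for illustration, not values from the slides.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, alpha=0.05):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound (B)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / bound) ** 2)


def sample_size_for_proportion(bound, p=0.5, alpha=0.05):
    """Smallest n with z_{alpha/2} * sqrt(p * q / n) <= bound (B).

    p = 0.5 maximises p * q, so it gives the most conservative (largest) n.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z**2 * p * (1 - p) / bound**2)


# Hypothetical planning values
n_mean = sample_size_for_mean(sigma=10.0, bound=2.0)   # estimate a mean to within 2
n_prop = sample_size_for_proportion(bound=0.05)        # estimate p to within 0.05
print(n_mean, n_prop)
```

Halving B roughly quadruples the required n, which is the cost of precision mentioned above.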