Estimation Bias, Standard Error and Sampling Distribution. Topic 9.


1 Estimation Bias, Standard Error and Sampling Distribution. Topic 9

2 From sample to population. Inductive (inferential) statistical methods make inferences about a population based on information from a sample drawn from that population. (Diagram: population, sample, inductive statistical methods.)

3 Statistical Concepts of Sampling. Suppose we want to estimate the mean birthweight of Malay male live births in Singapore, 1992. Due to logistical constraints, we decide to take a random sample of 50 live births from the records of all Malay male live births for that year.

4 Sampling from Target Population. Target population: all Malay male live births in Singapore, 1992. Sample: a random sample of 50 Malay male live births in Singapore, 1992. Suppose sample mean = 3.55 kg and sample SD (S) = 0.92 kg. What can we say about the population mean?

5 Statistical Modeling. Assume the population values follow a normal or some other appropriate distribution. This means a relative frequency histogram of the population values will look like a normal or that appropriate distribution. Assume we have a random sample, i.e., we sample n (= 50 in the example) values independently from the population.

6 Notation. Sample data: $X_1, \dots, X_n$. Assume $X_1, \dots, X_n$ are independent and each is distributed according to, say, a normal distribution. Population parameters: population mean $\mu$ = mean of the normal population; population variance $\sigma^2$ = variance of the normal population; population standard deviation $\sigma$.

7 Statistical Inference. Two general areas: (a) Statistical estimation, i.e. estimating population parameters based on sample statistics. (b) Hypothesis testing, i.e. testing certain assumptions about the population; also called test of statistical significance.

8 Statistical Estimation. There are two ways to estimate a population parameter from a sample: (1) point estimate, (2) interval estimate.

9 Point Estimate. Estimate the population parameter by a single value: sample mean → population mean; sample median → population median; sample variance → population variance; sample SD → population SD; sample proportion → population proportion.

10 Point Estimate. If the average birthweight for a random sample of Malay male births was 3.55 kg and we use it to estimate $\mu$, the mean birthweight of all Malay male births in the population, we would be making a point estimate of $\mu$. It is poor practice to report just the point estimate, because readers cannot judge how good the estimate is; the accuracy of the estimate should also be reported. Remember that the quality of an estimator is judged by its performance over REPEATED SAMPLING, although we have just one sample in hand. Inference for a population parameter should make allowance for sampling error.

11 Accuracy of statistical estimation. Two types of error: (a) Sampling error or fluctuation: “random” error that is due entirely to chance in the process of sampling. Minimizing the sampling error maximizes the precision of a statistical estimate. (b) Systematic error or bias: non-random error which is either a property of the estimator itself or due to bias in the sampling or measurement process. Minimizing the systematic error maximizes the validity of a statistical estimate. Systematic errors can be minimized by making efforts to reduce sampling and measurement bias (e.g. non-random sampling, non-response and non-coverage, untruthful answers, unreliable calibration, errors in data recording and coding).

12 Unbiased estimation of the mean. $E(\bar{X}) = \mu$, i.e., the sample mean equals the population mean when averaged over repeated samples.

13 Unbiasedness means the sample mean equals the population mean when averaged over repeated samples. However, there is fluctuation from sample to sample: Variance = ? (Figure: hypothetical results of repeated sampling.)
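
The "hypothetical results of repeated sampling" can be imitated on a computer. The following is a minimal sketch, not part of the original slides: the population mean and SD (MU, SIGMA) are made-up values chosen only for illustration, and the population is assumed to be normal.

```python
import random
import statistics

# Hypothetical population values (assumptions, not from the slides).
MU = 3.5      # assumed population mean birthweight, kg
SIGMA = 0.9   # assumed population SD, kg
N = 50        # sample size, as in the birthweight example
REPS = 5000   # number of repeated samples to simulate

rng = random.Random(0)  # fixed seed so the run is reproducible


def sample_mean(n):
    """Draw one random sample of size n and return its mean."""
    return statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))


means = [sample_mean(N) for _ in range(REPS)]

# Average of the sample means: close to MU if the estimator is unbiased.
print("average of sample means:", round(statistics.fmean(means), 3))
# SD of the sample means: the sample-to-sample fluctuation.
print("SD of sample means:", round(statistics.stdev(means), 3))
```

Any single sample mean misses MU by some amount, but the average over many samples should sit very near it, while the SD of the sample means quantifies the fluctuation that the slide asks about.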

14

15

16 Standard Error (SE) of an estimator. The SE of an estimator (e.g., the sample mean) is just the standard deviation (SD) of the estimator. It measures the variability of the estimator under “repeated” sampling. SE is just a special case of SD. The standard deviation of an estimator is called the standard error because it measures the magnitude of the estimation error due to sampling fluctuation.

17 Standard Deviation vs Standard Error. The population standard deviation (SD) measures the amount of variation among the individual measurements that make up the population and can be estimated from a sample using the sample standard deviation. The standard error (e.g. of the sample mean), on the other hand, measures how much the value of the estimator changes from sample to sample under repeated sampling. As we take only one sample rather than repeated samples in practice, it seems impossible at first to estimate the standard error, which is defined with reference to repeated sampling. Fortunately, the standard error of the sample mean is a function of the population SD. As the latter is estimable from a single sample, so is the standard error.

18 Estimated standard error of the sample mean. Let $\sigma$ denote the population SD. It was shown earlier that SE = SD(sample mean) = $\sigma/\sqrt{n}$, where n is the sample size. Since $\sigma$ can be estimated by the sample standard deviation S, we can estimate the standard error by SE = $S/\sqrt{n}$. Note that SE decreases with n at the rate $1/\sqrt{n}$, i.e., the precision of the sample mean improves as sample size increases.
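
As a small worked example of the estimated standard error, the sketch below plugs in S = 0.92 kg and n = 50 from the birthweight example. The helper name estimated_se and the second sample size of 200 are hypothetical, added only to show the square-root rate.

```python
import math


def estimated_se(sample_sd, n):
    """Estimated standard error of the sample mean: S / sqrt(n)."""
    return sample_sd / math.sqrt(n)


S = 0.92  # sample SD from the birthweight example, kg

se_50 = estimated_se(S, 50)    # the sample size used in the slides
se_200 = estimated_se(S, 200)  # hypothetical larger sample with the same S

print(f"n = 50:  SE = {se_50:.3f} kg")
print(f"n = 200: SE = {se_200:.3f} kg")
```

Because SE falls like one over the square root of n, quadrupling the sample size only halves the standard error.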

19 Knowing the mean and standard error of an estimator still doesn’t tell us the whole story. The whole story is told by the sampling distribution, since it lets us calculate probabilities about the estimator.

20 Sampling distribution of the sample mean. The distribution of the sample mean under “repeated” sampling from the population, i.e. the distribution of the sample mean rather than of individual measurements. In practice, we take only one sample, not repeated samples, so the sampling distribution is unobserved, but fortunately it can often be derived theoretically. Demo: http://www.ruf.rice.edu/~lane/stat_sim/index.html

21 Exact result when sampling from a normal population. If the population is normal with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{X}$ based on a random sample of size n is also normal, with mean $\mu$ and variance $\sigma^2/n$. Note how we can derive theoretically the distribution of the sample mean under repeated sampling without actually drawing repeated samples. This is important because we usually have only one sample at our disposal in practice.
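
To illustrate how the sampling distribution "helps in calculating the probabilities", here is a minimal sketch that uses the exact normal result directly, with no simulation. The population values MU and SIGMA and the 0.25 kg tolerance are assumptions chosen for illustration; they do not come from the slides.

```python
import math
from statistics import NormalDist

MU = 3.5     # hypothetical population mean, kg (assumption)
SIGMA = 0.9  # hypothetical population SD, kg (assumption)
N = 50       # sample size, as in the birthweight example

# Sampling distribution of the sample mean: normal, mean MU, SD SIGMA / sqrt(N).
sampling_dist = NormalDist(mu=MU, sigma=SIGMA / math.sqrt(N))

# Probability that the sample mean lands within 0.25 kg of the population mean.
p_within = sampling_dist.cdf(MU + 0.25) - sampling_dist.cdf(MU - 0.25)

# The same probability for one individual birthweight, for contrast.
individual = NormalDist(mu=MU, sigma=SIGMA)
p_single = individual.cdf(MU + 0.25) - individual.cdf(MU - 0.25)

print(f"P(sample mean within 0.25 kg of MU) = {p_within:.3f}")
print(f"P(single value within 0.25 kg of MU) = {p_single:.3f}")
```

The contrast between the two probabilities reflects the difference between the spread of the sample mean (the standard error) and the spread of individual measurements (the SD).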

22 Topic 10: Interval Estimate. An interval estimate provides an estimate of the population parameter by defining an interval or range of plausible values within which the population parameter could be found with a given confidence. This interval is called a confidence interval. The sampling distribution is used in constructing confidence intervals.

23 Confidence interval for the mean of a normal population. Fact: with probability 0.95, a normally distributed variable is within 1.96 standard deviations of its mean. Now $\bar{X} \sim N(\mu, \sigma^2/n)$. It follows that the sample mean must be within 1.96 standard errors of the population mean with probability 0.95. Equivalently, the population mean is within 1.96 standard errors of the sample mean.
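
The step from "the sample mean is within 1.96 standard errors of the population mean" to "the population mean is within 1.96 standard errors of the sample mean" is a rearrangement of one inequality. A sketch of that rearrangement, writing $\bar{X}$ for the sample mean and $\mu$, $\sigma$ for the population mean and SD (the symbols are assumed here, since the slide's own formulas did not survive in the transcript):

```latex
\begin{align*}
0.95 &= P\left( -1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96 \right) \\
     &= P\left( \mu - 1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96\,\frac{\sigma}{\sqrt{n}} \right) \\
     &= P\left( \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \right)
\end{align*}
```

In the last line the random quantities are the endpoints, not $\mu$: the probability statement is about the interval built from $\bar{X}$.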

24 We call $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ a 95% confidence interval for the population mean. If $\sigma$ is unknown, replace it by the sample SD S and replace 1.96 by the upper 2.5-percentile of a t-distribution with n-1 degrees of freedom, $t_{n-1,\,0.025}$, to yield

25 $\bar{X} \pm t_{n-1,\,0.025}\, S/\sqrt{n}$ as a 95% confidence interval for the population mean.

26 The t densities. The t densities are symmetric and similar in appearance to the N(0,1) density but with heavier tails. Tables for t distributions are widely available. As the d.f. increases, the t distribution converges to the standard normal distribution. Demo: http://www.isds.duke.edu/sites/java.html

27 95% confidence interval for the population mean: birthweight data revisited. n = 50, sample mean = 3.55 kg, S = 0.92 kg. SE = 0.92/sqrt(50) = 0.13 kg. d.f. = 49, upper 2.5-percentile of t = 2.01. The 95% C.I. for the mean Malay male birthweight is 3.55 +/- 2.01 (0.13) = (3.29 kg, 3.81 kg).
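
The arithmetic on this slide can be checked with a few lines of code. This is a minimal sketch: the function name t_confidence_interval is hypothetical, the sample size is taken as 50 (consistent with sqrt(50) and d.f. = 49), and the critical value 2.01 is copied from the slide rather than computed.

```python
import math


def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    se = sd / math.sqrt(n)   # estimated standard error of the sample mean
    margin = t_crit * se     # half-width of the interval
    return mean - margin, mean + margin


# Birthweight example: sample mean 3.55 kg, S = 0.92 kg, n = 50, t = 2.01.
lower, upper = t_confidence_interval(3.55, 0.92, 50, 2.01)
print(f"95% CI: ({lower:.2f} kg, {upper:.2f} kg)")
```

With these inputs the interval rounds to (3.29 kg, 3.81 kg), matching the slide.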

28 The meaning of confidence interval. Under repeated sampling, the interval $\bar{X} \pm t_{n-1,\,0.025}\, S/\sqrt{n}$ will contain the true mean 95% of the time.
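
The repeated-sampling interpretation can be checked by simulation. The sketch below is an illustration under stated assumptions, not the deck's own demo: the true mean and SD (MU, SIGMA) are made-up values, the population is assumed normal, and the t critical value 2.01 for 49 degrees of freedom is taken from the birthweight slide.

```python
import math
import random
import statistics

MU, SIGMA = 3.5, 0.9   # hypothetical true mean and SD, kg (assumptions)
N, T_CRIT = 50, 2.01   # sample size and t critical value from the slides
REPS = 4000            # number of repeated samples to simulate

rng = random.Random(1)  # fixed seed so the run is reproducible
hits = 0
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)  # estimated SE, S / sqrt(n)
    # Count the samples whose interval captures the true mean.
    if xbar - T_CRIT * se <= MU <= xbar + T_CRIT * se:
        hits += 1

coverage = hits / REPS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

Each simulated sample produces a different interval; the 95% refers to the long-run fraction of such intervals that contain the fixed true mean, not to any single interval.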

29 Demo: http://www.isds.duke.edu/sites/java.html

