Presentation on theme: "Statistical Estimation and Sampling Distributions"— Presentation transcript:
1 Topic 7: Statistical Estimation and Sampling Distributions
2 Statistical Inference A statistical method that uses sample statistics to investigate (estimate) properties of unknown population parameters.
3 Point Estimates The parameter, denoted by θ, is an unknown property of a population, for example the mean, variance, proportion, or a particular quantile of the probability distribution. A statistic is a property of a sample, for example the sample mean, sample variance, sample proportion, or a particular sample quantile. Estimation is a procedure by which the information contained in a sample is used to investigate properties of the population from which the sample is drawn.
4 Point Estimates of Parameters A point estimate of an unknown parameter θ is a statistic $\hat{\theta}$ that represents a “best guess” at the value of θ. There may be more than one sensible point estimate of a parameter. For example, population parameters are estimated with sample statistics as follows: the mean µ with $\bar{X}$, the standard deviation σ with S, and the proportion p with $\hat{p}$.
5 Unbiased and Biased Point Estimates A point estimate $\hat{\theta}$ for a parameter θ is said to be unbiased if $E(\hat{\theta}) = \theta$. Unbiasedness is a very good property for a point estimate to possess. If a point estimate is not unbiased, then its bias is defined to be $\text{bias} = E(\hat{\theta}) - \theta$.
6 Point Estimate of a Success Probability The obvious point estimate of p is $\hat{p} = X/n$. Notice that the number of successes X has a binomial distribution, X ~ B(n, p). Therefore $E(X) = np$, so that $E(\hat{p}) = E(X)/n = np/n = p$, and $\hat{p}$ is indeed an unbiased point estimate of p.
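The unbiasedness of the sample proportion can be checked exactly for a particular case by summing over the binomial probability mass function. The sketch below is only an illustration: the function name `expected_p_hat` and the values n = 12, p = 3/10 are assumptions, not taken from the slides.

```python
from fractions import Fraction
from math import comb


def expected_p_hat(n: int, p: Fraction) -> Fraction:
    """Exact E(X/n) for X ~ B(n, p), summed over the binomial pmf."""
    return sum(
        Fraction(x, n) * comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(n + 1)
    )


# Illustrative values (assumed, not from the slides).
p = Fraction(3, 10)
print(expected_p_hat(12, p) == p)  # exact rational arithmetic, so no rounding
```

Exact fractions are used so that the comparison with p is an equality rather than an approximation.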
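7 Point Estimate of a Population Mean The obvious point estimate of μ is the sample mean $\bar{X}$. Clearly it is unbiased since $E(X_i) = \mu$ for each i [Remember, fair coin (n = 2, μ = p = ½) and fair dice (n = 6, μ = p = 1/6)], so that $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu}{n} = \mu$. Then $\bar{X}$ is indeed an unbiased point estimate of μ.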
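8 Point Estimate of a Population Variance We know that the sample variance is $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Then $E(S^2) = \frac{1}{n-1} E\left(\sum_{i=1}^{n}\big((X_i - \mu) - (\bar{X} - \mu)\big)^2\right) = \frac{1}{n-1} E\left(\sum_{i=1}^{n}(X_i - \mu)^2 - 2(\bar{X} - \mu)\sum_{i=1}^{n}(X_i - \mu) + n(\bar{X} - \mu)^2\right)$.
9 Point Estimate of a Population Variance Since $\sum_{i=1}^{n}(X_i - \mu) = n(\bar{X} - \mu)$, then $E(S^2) = \frac{1}{n-1}\left(\sum_{i=1}^{n} E\big((X_i - \mu)^2\big) - n E\big((\bar{X} - \mu)^2\big)\right)$.
10 Point Estimate of a Population Variance We notice that $E\big((X_i - \mu)^2\big) = \mathrm{Var}(X_i) = \sigma^2$. Remember, $E\big((\bar{X} - \mu)^2\big) = \mathrm{Var}(\bar{X}) = \sigma^2/n$.
11 Point Estimate of a Population Variance Putting this all together gives $E(S^2) = \frac{1}{n-1}\left(n\sigma^2 - n \cdot \frac{\sigma^2}{n}\right) = \sigma^2$, so that $S^2$ is an unbiased point estimate of σ².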
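The role of the divisor n – 1 can be illustrated by exact enumeration. The sketch below assumes a fair six-sided die as the population and samples of size 3 (both chosen only for illustration), and averages the sum of squares divided by n – 1 and by n over all equally likely samples.

```python
from fractions import Fraction
from itertools import product


def expected_variance_estimates(values, n):
    """Average both variance estimates over every equally likely sample of size n."""
    total_unbiased = Fraction(0)
    total_biased = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):
        mean = Fraction(sum(sample), n)
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        total_unbiased += ss / (n - 1)             # divisor n - 1 (S^2)
        total_biased += ss / n                     # divisor n
        count += 1
    return total_unbiased / count, total_biased / count


faces = [1, 2, 3, 4, 5, 6]  # assumed population: a fair die
mu = Fraction(sum(faces), len(faces))
sigma2 = sum((x - mu) ** 2 for x in faces) / len(faces)  # population variance, 35/12

e_s2, e_div_n = expected_variance_estimates(faces, n=3)
print(e_s2 == sigma2)                       # divisor n - 1 is unbiased
print(e_div_n == sigma2 * Fraction(2, 3))   # divisor n gives (n - 1)/n * sigma^2
```

With the divisor n the expectation falls short of σ² by the factor (n – 1)/n, which is the bias that the n – 1 divisor removes.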
12 Minimum Variance Estimates The best situation is to construct a point estimate that is unbiased and that also has the smallest possible variance. An unbiased point estimate that has a smaller variance than any other unbiased point estimate is called a minimum variance unbiased estimate (MVUE). The efficiency of an MVUE is shown by its relative efficiency. The relative efficiency of an unbiased point estimate $\hat{\theta}_1$ to an unbiased point estimate $\hat{\theta}_2$ is given by $\mathrm{Var}(\hat{\theta}_2)/\mathrm{Var}(\hat{\theta}_1)$.
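13 Mean Square Errors In the case that two point estimates have different expectations and different variances, we prefer the point estimate that minimizes the mean square error (MSE), which is defined to be $\mathrm{MSE}(\hat{\theta}) = E\big((\hat{\theta} - \theta)^2\big) = \mathrm{Var}(\hat{\theta}) + \text{bias}^2$.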
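As a rough sketch of how bias, variance, and mean square error combine, the function below handles estimates of the form w1·X1 + w2·X2 + c. It assumes that the observations are independent and share the mean μ; the weights and constants shown are hypothetical and are not the estimates from the exercise slides.

```python
def linear_estimate_summary(weights, variances, constant, mu):
    """Bias, variance and MSE of sum(w_i * X_i) + constant, where E(X_i) = mu.

    Assumes the X_i are independent, so the variances of the terms add.
    """
    expectation = sum(weights) * mu + constant
    bias = expectation - mu
    variance = sum(w * w * v for w, v in zip(weights, variances))
    return bias, variance, variance + bias**2  # MSE = variance + bias^2


# Hypothetical estimates built from X1 (variance 10) and X2 (variance 15).
variances = [10, 15]
candidates = [((0.5, 0.5), 0.0), ((0.25, 0.75), 0.0), ((0.5, 0.25), 1.0)]
for weights, constant in candidates:
    bias, var, mse = linear_estimate_summary(weights, variances, constant, mu=8)
    print(weights, constant, "bias:", bias, "variance:", var, "MSE:", mse)
```

In this made-up comparison the third estimate is biased yet has the smallest mean square error, which is the situation the MSE criterion is meant to handle.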
14 Exercises Suppose that E(X1) = μ, Var(X1) = 10, E(X2) = μ, and Var(X2) = 15, and consider the point estimates … a. Calculate the bias of each point estimate. Is any one of them unbiased? b. Calculate the variance of each point estimate. Which one has the smallest variance? c. Calculate the mean square error of each point estimate. Which point estimate has the smallest mean square error when μ = 8? d. What is the relative efficiency of … to the point estimate …?
15 Exercise Solution
16 Exercise Solution b.
17 Exercise Solution c. d.
18 Sampling Distributions Since the summary measures of one sample differ from those of another sample, we need to consider the probability distributions, or sampling distributions, of the sample mean $\bar{X}$, the sample variance S², and the sample proportion $\hat{p}$.
19 Sampling Means If X1, … , Xn are observations from a population with a mean μ and a variance σ², then the central limit theorem indicates that the sample mean has the approximate distribution $\bar{X} \sim N(\mu, \sigma^2/n)$. The standard deviation of the sample mean, $\sigma/\sqrt{n}$, is referred to as the standard error (SE). Since the standard deviation σ is usually unknown, it can be replaced by S, giving $S/\sqrt{n}$.
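A small simulation can make the central limit theorem statement concrete. This sketch assumes an exponential population with mean 1 and standard deviation 1, a sample size of 50, and a fixed random seed; all of these are illustrative choices rather than values from the slides.

```python
import random
import statistics
from math import sqrt


def simulate_sample_means(n, replications, seed=0):
    """Sample means of n exponential(1) draws (population mean 1, sd 1)."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replications)
    ]


n = 50
means = simulate_sample_means(n, replications=4000)
print("mean of sample means:", statistics.mean(means))   # should be close to mu = 1
print("sd of sample means:  ", statistics.stdev(means))  # should be close to sigma / sqrt(n)
print("sigma / sqrt(n):     ", 1 / sqrt(n))
```

Even though the exponential population is strongly skewed, the spread of the simulated sample means should sit close to the standard error σ/√n.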
20 Sample Variances If X1, … , Xn are normally distributed with a mean μ and a variance σ², then the sample variance S² has the distribution $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$, where $\chi^2_{n-1}$ is a chi-square distribution with n – 1 degrees of freedom. In the case that the variance is unknown, if X1, … , Xn are normally distributed with a mean μ, then $\sqrt{n}(\bar{X} - \mu)/S \sim t_{n-1}$, where $t_{n-1}$ is Student’s t distribution with n – 1 degrees of freedom.
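21 Sample Proportions If X ~ B(n, p), then the sample proportion $\hat{p} = X/n$ has the approximate distribution $\hat{p} \sim N\big(p, p(1-p)/n\big)$. The standard error of the sample proportion is $\sqrt{p(1-p)/n}$, which is estimated by replacing p with $\hat{p}$.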
22 Exercises 1. The capacitances of certain electronic components have a normal distribution with a mean μ = 174 and a standard deviation σ = 2.8. If an engineer randomly selects a sample of n = 30 components and measures their capacitances, what is the probability that the engineer’s point estimate of the mean μ will be within the interval (173, 175)? 2. A scientist reports that the proportion of defective items from a process is 12.6%. If the scientist’s estimate is based on the examination of a random sample of 360 items from the process, what is the standard error of the scientist’s estimate? 3. The pH levels of food items prepared in a certain way are normally distributed with a standard deviation of σ = … An experimenter estimates the mean pH level by averaging the pH levels of a random sample of n items. What sample size n is needed to ensure that there is a probability of at least 99% that the experimenter’s estimate is within 0.5 of the true mean value?
24 Exercise Solutions Recall that the estimate is within 0.5 of the true mean value when $|\bar{X} - \mu| \le 0.5$.
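The three exercise calculations can be sketched with the standard library's `statistics.NormalDist`. The function names are illustrative. Because the value of σ for the pH exercise is missing from this transcript, the last call uses σ = 1.0 purely as a placeholder and does not answer that exercise.

```python
from math import ceil, sqrt
from statistics import NormalDist


def prob_mean_in_interval(mu, sigma, n, low, high):
    """P(low < sample mean < high) when the sample mean is N(mu, sigma^2 / n)."""
    dist = NormalDist(mu, sigma / sqrt(n))
    return dist.cdf(high) - dist.cdf(low)


def proportion_standard_error(p_hat, n):
    """Estimated standard error of a sample proportion."""
    return sqrt(p_hat * (1 - p_hat) / n)


def sample_size_for_margin(sigma, margin, probability):
    """Smallest n with P(|sample mean - mu| <= margin) >= probability (normal data)."""
    z = NormalDist().inv_cdf((1 + probability) / 2)
    return ceil((z * sigma / margin) ** 2)


print(prob_mean_in_interval(174, 2.8, 30, 173, 175))  # capacitance exercise
print(proportion_standard_error(0.126, 360))          # defective-items exercise
# sigma = 1.0 is a placeholder: the value in the pH exercise is not in the transcript.
print(sample_size_for_margin(sigma=1.0, margin=0.5, probability=0.99))
```

The first two results come out at roughly 0.95 and 0.0175. The sample-size rule rearranges z·σ/√n ≤ 0.5 into n ≥ (z·σ/0.5)², with z the 99.5th percentile of the standard normal distribution.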
25 Maximum Likelihood Estimates We have considered the obvious point estimates for a success probability, a population mean, and a population variance. However, it is often of interest to estimate parameters that require less obvious point estimates. For example, how should the parameters of the Poisson, exponential, beta, or gamma distributions be estimated? Maximum likelihood estimation is one of the more general and technical methods of obtaining point estimates.
26 Maximum Likelihood Estimate for One Parameter If a data set consists of observations x1, x2, …, xn from a probability distribution f(x, θ) depending upon one unknown parameter θ, the maximum likelihood estimate $\hat{\theta}$ of the parameter is found by maximizing the likelihood function $L(x_1, \ldots, x_n, \theta) = f(x_1, \theta) \times \cdots \times f(x_n, \theta)$. In practice, the maximization of the likelihood function is usually performed by taking the derivative of the natural log of the likelihood function.
27 Example Suppose again that x1, x2, …, xn are a set of Bernoulli observations, with each taking the value 1 (success) with probability p and the value 0 (no success) with probability 1 – p. We can write this as $f(x_i, p) = p^{x_i}(1-p)^{1-x_i}$. The likelihood function is therefore $L(x_1, \ldots, x_n, p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{x}(1-p)^{n-x}$, where x = x1 + x2 + … + xn, and the maximum likelihood estimate $\hat{p}$ is the value that maximizes this.
28 Example The log-likelihood is $\ln(L) = x \ln(p) + (n - x)\ln(1 - p)$ and $\frac{d \ln(L)}{dp} = \frac{x}{p} - \frac{n - x}{1 - p}$. Setting this expression equal to 0 and solving for p produces $\hat{p} = x/n$.
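A numerical check of the Bernoulli result: the sketch below evaluates the log-likelihood x ln(p) + (n – x) ln(1 – p) on a grid of p values and picks the maximizer. The counts x = 7 and n = 20 and the grid resolution are assumptions made for illustration.

```python
from math import log


def bernoulli_log_likelihood(p, x, n):
    """Log-likelihood of x successes in n Bernoulli trials with success probability p."""
    return x * log(p) + (n - x) * log(1 - p)


def grid_mle(x, n, steps=1000):
    """Value of p on the grid 1/steps, ..., (steps-1)/steps that maximizes the log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, x, n))


x, n = 7, 20  # illustrative counts, not from the slides
print("grid maximizer:", grid_mle(x, n))
print("x / n:         ", x / n)
```

A grid search is used here only to keep the check independent of the calculus; the closed form x/n is what one would use in practice.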
29 Maximum Likelihood Estimate for Two Parameters If a data set consists of observations x1, x2, …, xn from a probability distribution f(x, θ1, θ2) depending upon two unknown parameters, the maximum likelihood estimates $\hat{\theta}_1$ and $\hat{\theta}_2$ are the values of the parameters that jointly maximize the likelihood function $L(x_1, \ldots, x_n, \theta_1, \theta_2) = f(x_1, \theta_1, \theta_2) \times \cdots \times f(x_n, \theta_1, \theta_2)$. Again, the best way to perform the joint maximization is usually to take derivatives of the log-likelihood with respect to θ1 and θ2 and to set the two resulting expressions equal to 0.
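30 Example The normal distribution is an example of a distribution with two parameters, with a probability density function $f(x, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$. The likelihood of a set of normal observations is therefore $L(x_1, \ldots, x_n, \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}\right)$.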
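31 Example So that the log-likelihood is $\ln(L) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}$. Taking derivatives with respect to the parameter values μ and σ² gives $\frac{d \ln(L)}{d\mu} = \frac{\sum_{i=1}^{n}(x_i - \mu)}{\sigma^2}$ and $\frac{d \ln(L)}{d\sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^4}$.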
32 Example Setting d ln(L)/dμ = 0 gives $\hat{\mu} = \bar{x}$, and setting d ln(L)/dσ² = 0 then gives $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Did you see any difference from the variance estimate that we have discussed before?
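To compare the maximum likelihood variance estimate with the sample variance S² on actual numbers, the sketch below computes both for a small made-up data set. The data values and the function name `normal_mle` are assumptions for illustration only.

```python
import statistics


def normal_mle(data):
    """Maximum likelihood estimates of mu and sigma^2 for normal data (divisor n)."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat


data = [4.1, 5.3, 3.8, 4.9, 5.6, 4.4]  # made-up observations
mu_hat, sigma2_hat = normal_mle(data)
s2 = statistics.variance(data)          # sample variance S^2 (divisor n - 1)

print("mu_hat:             ", mu_hat)
print("MLE of sigma^2:     ", sigma2_hat)
print("sample variance S^2:", s2)
print("ratio MLE / S^2:    ", sigma2_hat / s2, "vs (n - 1)/n =", (len(data) - 1) / len(data))
```

The two estimates differ only by the factor (n – 1)/n, so the gap shrinks as the sample size grows.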
33 Exercises Suppose that the quality inspector at the glass manufacturing company inspects 30 randomly selected sheets of glass and records the number of flaws found in each sheet. The data values are as follows: 0, 1, 1, 1, 0, 0, 0, 2, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 3, 1, 2, 0, 0, 1, 0, 0. If the number of flaws per sheet is taken to have a Poisson distribution, how should the parameter λ of the Poisson distribution be estimated? Find its value.
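34 Exercise Solutions We estimate the parameter λ by maximum likelihood. The probability mass function of each data value is $f(x_i, \lambda) = \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}$, so that the likelihood is $L(x_1, \ldots, x_n, \lambda) = \frac{e^{-n\lambda}\lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}$. The log-likelihood is therefore $\ln(L) = -n\lambda + \left(\sum_{i=1}^{n} x_i\right)\ln(\lambda) - \ln\left(\prod_{i=1}^{n} x_i!\right)$. Taking its derivative with respect to λ and setting it to zero, we get $-n + \frac{\sum_{i=1}^{n} x_i}{\lambda} = 0$.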
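35 Exercise Solutions Therefore $\hat{\lambda} = \bar{x} = 17/30 \approx 0.567$. Since the variance of each data value is λ, then $\mathrm{Var}(\hat{\lambda}) = \mathrm{Var}(\bar{X}) = \lambda/n$. The standard error of the estimate of a Poisson parameter is $\text{s.e.}(\hat{\lambda}) = \sqrt{\hat{\lambda}/n} = \sqrt{0.567/30} \approx 0.137$.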
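The Poisson estimate for the glass-sheet data can be reproduced in a few lines. The data are the 30 flaw counts listed in the exercise; the function name `poisson_mle` is an illustrative choice.

```python
from math import sqrt

# Flaw counts for the 30 inspected glass sheets, as listed in the exercise.
flaws = [0, 1, 1, 1, 0, 0, 0, 2, 0, 1, 0, 1, 0, 0, 0,
         0, 0, 1, 0, 2, 0, 0, 3, 1, 2, 0, 0, 1, 0, 0]


def poisson_mle(counts):
    """Maximum likelihood estimate of lambda (the sample mean) and its standard error."""
    n = len(counts)
    lam_hat = sum(counts) / n
    return lam_hat, sqrt(lam_hat / n)


lam_hat, se = poisson_mle(flaws)
print("lambda_hat:    ", round(lam_hat, 3))
print("standard error:", round(se, 3))
```

The standard error here plugs the estimate of λ into √(λ/n), since the true λ is unknown.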