# Inference about a Mean Part II

Lecture 7

Today’s Plan
- Confidence intervals
- Hypothesis testing
- Small samples
- Large samples
- Types of errors
- Quick review of what we’ve learned so far

What we’ve seen so far
- We’ve worked with univariate populations.
- Recall that we have the standardized normal variable $Z \sim N(0,1)$.
- We can ask questions such as: what is $E(Y)$? What is the probability that someone selected at random will have earnings of \$300?

What we’ve seen so far
- Before, when we were considering the distribution around $\mu_Y$, we were considering the distribution of $Y$.
- Now we are considering $\bar{Y}$ as a point estimator for $\mu_Y$.
- The difference is that the distribution of $\bar{Y}$ has a variance of $\sigma^2/n$, whereas $Y$ has a variance of $\sigma^2$.
- Having obtained an estimate of a parameter ($\mu_Y$) and considered the properties of the estimator (BLUE), we need to find out how ‘good’ the estimate is.
- Estimation is the first side of statistical inference. The other side is hypothesis testing.
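
The claim that the sample mean has variance $\sigma^2/n$ can be checked with a small simulation. The sketch below is only an illustration: the population (normal, mean 300, standard deviation 50), the sample size of 25, and the function name `sample_mean_variance` are assumptions chosen for the example, not values from the lecture.

```python
import random
import statistics


def sample_mean_variance(pop_mean, pop_sd, n, n_samples=5000, seed=0):
    """Draw many samples of size n and return the variance of their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.pvariance(means)


# Hypothetical population: mean 300, standard deviation 50, samples of size 25.
simulated = sample_mean_variance(300, 50, 25)
theoretical = 50 ** 2 / 25  # sigma^2 / n

print(f"simulated variance of the sample mean: {simulated:.1f}")
print(f"theoretical sigma^2 / n:               {theoretical:.1f}")
```

The two printed numbers should be close, and the simulated value moves toward $\sigma^2/n$ as `n_samples` grows.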

Confidence Intervals
- Recall our picture showing the distributions of $Y$ and $\bar{Y}$.
- You repeatedly take samples from the population and get different estimates of $\mu_Y$.
- The sampling distribution is the probability distribution of the values that $\bar{Y}$ takes on in the different samples.

Confidence Intervals (2)
- How do we assign probability bounds on our estimate?
- We don’t know what $\mu_Y$ is, but we know the sample size and the sample estimates $\bar{Y}$ and $s^2$.
- We can estimate $\mu_Y$ give or take some amount of error.
- We know that $\bar{Y}$ is distributed $N(\mu_Y, \sigma^2/n)$.
- We use $s^2$ as an estimate of $\sigma^2$.
- Our distribution of $\bar{Y}$: $\bar{Y} \sim N(\mu_Y, s^2/n)$

Confidence Intervals (3)
- Remember:
  - Large samples: use the Z distribution
  - Small samples: use the t distribution
- We’ll use the Z distribution for this example.
- Our expression for the Z statistic is $Z = \dfrac{\bar{Y} - \mu_Y}{s/\sqrt{n}}$

Confidence Intervals (4)
- We have the standard normal distribution around $\mu_Y$.
- We want to describe how much area is between $-Z$ and $+Z$.
- We can create a 95% confidence interval around $Z$.
- [Figure: standard normal curve with $-Z$ and $+Z$ marked on the horizontal axis]

Confidence Intervals (5)
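- We can write the confidence interval as $P(-1.96 \le Z \le +1.96) = 0.95$
- Where did we get the values $-1.96$ and $+1.96$? Look at the standard normal table.
- We see that 47.5% of the area under the curve can be found between 0 and 1.96, or 95% between $\pm 1.96$.
- [Figure: standard normal curve with $-1.96$ and $+1.96$ marked; 47.5% of the area lies between 0 and $+1.96$]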
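
The table lookup can be reproduced numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the standard normal curve between 0 and 1.96 (the table entry).
area_0_to_196 = std_normal.cdf(1.96) - std_normal.cdf(0.0)

# Area between -1.96 and +1.96.
area_between = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Going the other way: the Z value that leaves 2.5% in the upper tail.
z_for_95 = std_normal.inv_cdf(0.975)

print(f"area between 0 and 1.96:  {area_0_to_196:.4f}")
print(f"area between +/- 1.96:    {area_between:.4f}")
print(f"Z with 2.5% in each tail: {z_for_95:.3f}")
```

The first two values come out at about 0.475 and 0.95, matching the 47.5% and 95% read off the table.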

Confidence Intervals (6)
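- So we can rewrite $P\left(\bar{Y} - 1.96\,\dfrac{s}{\sqrt{n}} \le \mu_Y \le \bar{Y} + 1.96\,\dfrac{s}{\sqrt{n}}\right) = 0.95$
- This is the confidence interval estimate for $\mu_Y$ at a 95% level of confidence.
- You can choose other levels.
- As you increase the confidence level, you widen the interval, which increases the number of possible values $\mu_Y$ can take.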
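
As a rough sketch of the large-sample interval, the function below computes the sample mean plus or minus $Z \cdot s/\sqrt{n}$. The function name `z_confidence_interval` and the sample values (mean 300, standard deviation 50, n = 100) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample confidence interval for the population mean."""
    alpha = 1.0 - confidence
    # Critical Z value with alpha/2 in each tail (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample: mean 300, standard deviation 50, n = 100.
low95, high95 = z_confidence_interval(300, 50, 100, confidence=0.95)
low99, high99 = z_confidence_interval(300, 50, 100, confidence=0.99)

print(f"95% interval: ({low95:.1f}, {high95:.1f})")
print(f"99% interval: ({low99:.1f}, {high99:.1f})")
```

With these assumed numbers the 95% interval is roughly (290.2, 309.8), and the 99% interval is wider, which is the trade-off described above.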

Using the t distribution
- If we have a small sample, we should use the t distribution.
- Our t statistic looks like $t = \dfrac{\bar{Y} - \mu_Y}{s/\sqrt{n}}$
- What will our confidence interval look like? We substitute t for Z.
- We don’t know the underlying population distribution.
- But we can use the central limit theorem to assume that the sampling distribution is approximately normal.
- We can use the t distribution to approximate the sampling distribution.

Using the t distribution (2)
- We have to choose the confidence level $(1 - \alpha)$, which requires a choice of $\alpha$.
- The area between the two t values, $-t$ and $+t$, is the confidence level.
- The usual accepted confidence level is 95% ($\alpha = 0.05$).
- [Figure: t distribution with the confidence interval marked between $-t$ and $+t$]

Using the t distribution (3)
- If $(1 - \alpha)$ is the area between the two t values, then $\alpha$ is the sum of the areas under the two tails.
- If $(1 - \alpha) = 0.95$, then $\alpha = 0.05$ and $0.05/2 = 0.025$.
- So for a 95% confidence level, 2.5% of the area under the curve is found in each tail of the distribution.

The t table
- In the first row, there is an upper number and a lower number.
- The upper number gives the area in one tail; the lower number gives the area in both tails combined (for a two-tailed test).
- At an infinite number of observations, 2.5% of the area under the curve is found in each of the tails when our t statistic is $\pm 1.96$: it approximates the normal.
- If our sample size is 10 (9 degrees of freedom), 95% of the area under the t distribution is between $-2.262$ and $+2.262$.
- Note: the t has fatter tails than the standard normal.
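
The entries of a t table can be approximated without the table itself. The sketch below is one way to do it using only the standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names (`t_pdf`, `t_cdf`, `t_critical`) are hypothetical, and a statistics package would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry around 0."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, confidence=0.95):
    """Positive t value leaving (1 - confidence)/2 in each tail."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_n10 = t_critical(df=9)         # sample size 10, so 9 degrees of freedom
t_large = t_critical(df=10**6)   # stands in for "infinite" observations

print(f"95% critical t, df = 9:     {t_n10:.3f}")
print(f"95% critical t, very large: {t_large:.3f}")
```

The small-sample value is larger than the large-sample one, which reflects the fatter tails of the t distribution.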

The t Table
- For a small sample size, the t values corresponding to a 95% confidence interval are larger in absolute value than the Z values for the same interval.
- Depending on factors such as the following, we get a very different approximation of the confidence interval:
  - Sample size
  - Whether or not we use an estimate for the population $\sigma$
- These determine the type of distribution we use.

Hypothesis Testing
- We want to ask: what is the probability that $\mu_Y$ is equal to some value?
- Using hypothesis testing, we can determine whether or not it’s plausible that $\mu_Y$ equals a certain value.
- We have two types of samples:
  - Large: $n > 30$
  - Small: $n \le 30$

Large Samples
- It doesn’t matter if the population distribution is skewed or normal.
- It doesn’t matter if the population variance is known or unknown.
- Use the Z table.

Small Samples
- If the population is normally distributed and the population variance is known, use either the Z or t table.
- If the population is normally distributed but the population variance is unknown, use the t distribution with $n - 1$ degrees of freedom (calculate the sample variance as an estimate of the population variance).
- If the population is non-normally distributed, use neither the t nor the Z (I will never give you a case like this).
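
The large-sample and small-sample rules can be collected into a single decision function. This is only a sketch of the rules as stated in these slides; the function name `choose_distribution` and its return strings are made up for illustration.

```python
def choose_distribution(n, population_normal=True, variance_known=False):
    """Pick the reference distribution using the lecture's rules of thumb."""
    if n > 30:
        # Large sample: shape of the population and knowledge of the
        # variance do not matter.
        return "Z"
    if not population_normal:
        # Small sample from a non-normal population: neither table applies.
        return "neither"
    if variance_known:
        return "Z or t"
    return f"t with {n - 1} degrees of freedom"


print(choose_distribution(100, population_normal=False))
print(choose_distribution(10, population_normal=True, variance_known=False))
print(choose_distribution(10, population_normal=False))
```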

Setting Up Hypotheses
- In hypothesis testing you set up a null hypothesis $H_0$.
- Under the null hypothesis, $\mu_Y$ takes a particular value.
- Example: we can create a null such that $H_0: \mu_Y = 300$.
- Once we have a null hypothesis, we can set up an alternative hypothesis $H_1$.

One- and Two-Tailed Tests
- We can represent this graphically:
  - One-tailed tests:
    - We calculate the area in the right-hand tail if $H_1: \mu_Y > 300$.
    - We calculate the area in the left-hand tail if $H_1: \mu_Y < 300$.
  - Two-tailed test: find the area under both tails if $H_1: \mu_Y \ne 300$.

Intervals and Regions
- We also need to assign a significance level (or, equivalently, a confidence level).
- For a two-tailed test, we are looking to see if a value of 300 lies within the confidence interval.
- With hypothesis tests, we are creating an acceptance region bounded by critical values.
- Critical values are taken off the Z and t tables.
- The regions in the tails are the critical regions.

Intervals and Regions (2)
- [Figure: distribution with a central acceptance region of area $1 - \alpha$, bounded by the critical values, and a critical region of area $\alpha/2$ in each tail]
- $\alpha$ is the significance level.
- If you fail to reject the null, the Z or t statistic must fall in the acceptance region.
- If you reject the null, the Z or t statistic must fall in one of the critical regions.

Types of Errors
- Type I errors: rejecting a hypothesis when it is in fact true.
  - Example: in the confidence interval example we constructed the confidence interval $(254 \le \mu_Y \le 380)$. If the true population mean is 400 and we set $H_0: \mu_Y = 400$, we’d falsely reject the null hypothesis!
- Type II errors: accepting a false hypothesis.
  - Example: if the true mean is 400 but we accept $H_0: \mu_Y = 300$, we would be accepting a false hypothesis.
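
The two examples can be traced with a few lines of code. The sketch assumes the interval (254, 380) from the slide and a true mean of 400; the function name `classify_decision` is hypothetical, and in practice the true mean is of course unknown.

```python
def classify_decision(true_mean, mu_0, interval):
    """Label the outcome of testing H0: mu = mu_0 against a confidence interval."""
    low, high = interval
    null_is_true = (true_mean == mu_0)
    rejected = not (low <= mu_0 <= high)  # reject if mu_0 is outside the interval
    if rejected and null_is_true:
        return "Type I error"       # rejected a true null
    if not rejected and not null_is_true:
        return "Type II error"      # failed to reject a false null
    return "correct decision"


interval = (254, 380)

# True mean 400, H0: mu = 400. The interval excludes 400, so a true null is rejected.
print(classify_decision(400, 400, interval))

# True mean 400, H0: mu = 300. The interval contains 300, so a false null is accepted.
print(classify_decision(400, 300, interval))
```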

Types of Errors (2)
- Statisticians worry about Type I errors.
- They choose a significance level $\alpha$ that minimizes Type I errors.
- To minimize Type I errors, choose a small $\alpha$, where $\alpha$ is the total area in both tails.
- Thus the area in each tail is $\alpha/2$.

Types of Errors (3)
- As $\alpha$ decreases, the likelihood of rejecting a true null hypothesis also decreases.
- Most of the time $\alpha = 5\%$ is used, and $\alpha/2 = 2.5\%$.
- We can say that we accept or reject the null, but we can’t say that we accept the alternative!
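
The link between $\alpha$ and the Type I error rate can be illustrated by simulation: when the null hypothesis is true, a two-tailed test at level $\alpha$ should reject in roughly a fraction $\alpha$ of repeated samples. The population values (mean 300, standard deviation 50), the sample size of 100, and the function name `type_one_error_rate` are assumptions made for this sketch.

```python
import math
import random
from statistics import NormalDist


def type_one_error_rate(alpha, mu=300.0, sigma=50.0, n=100, trials=4000, seed=1):
    """Fraction of samples in which a true H0: mu_Y = mu is rejected."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical Z
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        y_bar = sum(sample) / n
        s = math.sqrt(sum((y - y_bar) ** 2 for y in sample) / (n - 1))
        z = (y_bar - mu) / (s / math.sqrt(n))
        if abs(z) > critical:
            rejections += 1
    return rejections / trials


rate_05 = type_one_error_rate(0.05)
rate_01 = type_one_error_rate(0.01)

print(f"alpha = 0.05: rejected a true null in {rate_05:.1%} of samples")
print(f"alpha = 0.01: rejected a true null in {rate_01:.1%} of samples")
```

The rejection rate should sit near the chosen $\alpha$ and fall when $\alpha$ is made smaller.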

Hypothesis Testing in General
- Null ($H_0$):
- If you are using the t instead, replace the Z’s with t’s.
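
One way to sketch the general procedure is a single function that handles the right-tailed, left-tailed, and two-tailed alternatives described in the earlier slides. This is an assumed reconstruction of the usual large-sample rules, not the table from the original slide; the function name `z_test`, the labels for the alternatives, and the sample values are hypothetical.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, sample_sd, n, mu_0, alternative="two-sided", alpha=0.05):
    """Large-sample test of H0: mu_Y = mu_0; returns (z, critical value, reject?)."""
    z = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":        # H1: mu_Y > mu_0, right-hand tail
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif alternative == "less":         # H1: mu_Y < mu_0, left-hand tail
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z < -critical
    elif alternative == "two-sided":    # H1: mu_Y != mu_0, both tails
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, critical, reject


# Hypothetical sample: mean 317, standard deviation 200, n = 100.
z, critical, reject = z_test(317, 200, 100, mu_0=300)
print(f"H0: mu = 300 -> z = {z:.2f}, critical = {critical:.2f}, reject = {reject}")

z2, critical2, reject2 = z_test(317, 200, 100, mu_0=400)
print(f"H0: mu = 400 -> z = {z2:.2f}, critical = {critical2:.2f}, reject = {reject2}")
```

For a small sample, the same structure applies with the critical value taken from the t table with $n - 1$ degrees of freedom in place of the normal quantile.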

Where are we now?
- So far we have learned about inference and testing hypotheses using assumptions about distributions.
- Distributions
  - We had samples and populations and used weights to make inferences about the population using sample statistics.
  - We assumed distributional forms such as the Z or t distributions.
- Sampling distribution of the mean
  - You should know the difference between $E(Y)$ and $\bar{Y}$.

Where are we now? (2)
- BLUE: we’ll return to this in the next few lectures.
- Estimation and hypothesis testing
- We now return to the regression line and consider the estimators for $a$ and $b$ in the regression equation.
- We have to consider the properties of the OLS estimator (BLUE), and how we construct hypothesis tests on the estimates of the parameters $a$ and $b$.
