
1 Hypothesis Testing. In this section I want to review a few things and then introduce hypothesis testing.


2 Normal distribution. As a start, we can think about the normal distribution. Along the horizontal axis we measure the variable we think has a normal distribution; the variable might be age, income, or something else. Note that the mean value is at the center of the distribution.

3 Normal distribution. The curve above the axis tells us the probability that the variable falls in a given range of values: that probability is the area under the curve over the range. For example, the probability of a value above the mean is 50%, because 50% of the area under the curve lies to the right of the mean. The z table helps us find the probabilities of other ranges of values.
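A minimal sketch of how areas under the curve become probabilities, using Python's standard-library `statistics.NormalDist` in place of a printed z table. The population N(30, 3) for age is borrowed from the example on slide 9 and is only an assumption here; any mean and standard deviation would work the same way.

```python
from statistics import NormalDist

# Assumed example population: age ~ N(30, 3), as on slide 9.
age = NormalDist(mu=30, sigma=3)

# P(age > mean) is the area to the right of the mean: 1 - CDF(mean).
p_above_mean = 1 - age.cdf(30)

# P(27 < age < 33) is the area between two values.
p_27_to_33 = age.cdf(33) - age.cdf(27)

# The same range done "by z table": convert to z scores on the standard normal.
z_low, z_high = (27 - 30) / 3, (33 - 30) / 3
p_from_z = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)

print(f"P(age > 30)      = {p_above_mean:.4f}")
print(f"P(27 < age < 33) = {p_27_to_33:.4f}")
print(f"same via z scores = {p_from_z:.4f}")
```

Converting to z scores and using the standard normal gives the same area, which is exactly why one table of the standard normal is enough for every normal distribution.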

4 Example. We could imagine that the people in a typical classroom represent a population: the people who meet in the class on a regular basis. As we think about this population, we might want to know about its characteristics, such as age, income, or educational attainment. The population mean and standard deviation of a variable (say, age) are called parameters of the population.

5 Example. To find the population mean for the class, we could ask everyone to give their age and then calculate the mean. But in many statistical studies we do not collect information from everyone; we take only a sample. The sample also has a mean and a standard deviation. Since a sample does not include everyone in the population, the sample mean (and sample standard deviation) will have a value that depends on which people made it into the sample.

6 Example. Let's take a sample of 5 people in the class and determine the average age. We have ____ for an average of ____. If we took a different sample of 5, we would have ____ for an average of ____. So in principle we could look at every possible sample of size 5 and calculate the mean for each one. The collection of these sample means could then be looked at as a distribution.
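The idea of looking at every possible sample of size 5 can be sketched directly. The class ages below are made up for illustration (a hypothetical class of 8 people), since the slide leaves the actual ages blank.

```python
from itertools import combinations
from statistics import mean

# Hypothetical ages for a small class of 8 people (invented for illustration).
class_ages = [19, 21, 22, 24, 25, 28, 33, 40]
population_mean = mean(class_ages)

# Every possible sample of size 5, and the mean of each one.
sample_means = [mean(s) for s in combinations(class_ages, 5)]

print("population mean:", population_mean)
print("number of samples of size 5:", len(sample_means))
print("smallest and largest sample mean:", min(sample_means), max(sample_means))
# Averaging all the sample means gives back the population mean.
print("mean of the sample means:", mean(sample_means))
```

The sample means spread out around the population mean, and that spread is the sampling distribution the next slides describe.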

7 Sampling distribution. When we think about repeated sampling, a statistic such as the sample mean can be thought of as having a sampling distribution: the distribution of its values across all possible samples. Thanks to the central limit theorem, we know a great deal about the sampling distribution of the sample mean. The nice thing about the central limit theorem is that it holds whatever the shape of the population distribution, as long as the sample size is large enough, so we do not need to know what the population looks like.

8 Central limit theorem. The basic idea of the central limit theorem is that if you consider samples of size n from a population with mean μ and standard deviation σ, the sampling distribution of the sample means 1) is approximately normal (exactly normal if the population itself is normal, and closer to normal as n grows), 2) has a mean equal to the population mean μ, and 3) has a standard deviation, called in this context the standard error, equal to the population standard deviation divided by the square root of the sample size, σ/√n. The standard error is just the standard deviation of the sampling distribution; it is simply given this special name.
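A small simulation can illustrate the three claims. This is a sketch under stated assumptions: the population is a made-up, clearly skewed (exponential) one, samples are drawn with replacement, and the sample size and number of repetitions are arbitrary choices.

```python
import random
from statistics import mean, pstdev

random.seed(1)

# Assumed skewed population: 10,000 exponential draws with mean about 10.
population = [random.expovariate(1 / 10) for _ in range(10_000)]
mu, sigma = mean(population), pstdev(population)

n = 25                # sample size
repetitions = 5_000   # number of repeated samples

# Repeated sampling (with replacement): record each sample mean.
sample_means = [mean(random.choices(population, k=n)) for _ in range(repetitions)]
standard_error = sigma / n ** 0.5

# Share of sample means within 1.96 standard errors of mu (about .95 if near normal).
within = sum(abs(m - mu) <= 1.96 * standard_error for m in sample_means) / repetitions

print(f"population mean       {mu:.3f}")
print(f"mean of sample means  {mean(sample_means):.3f}")
print(f"sigma / sqrt(n)       {standard_error:.3f}")
print(f"SD of sample means    {pstdev(sample_means):.3f}")
print(f"share within 1.96 SE  {within:.3f}")
```

Even though the population is far from normal, the sample means center on μ, their spread matches σ/√n, and they behave very nearly like a normal distribution.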

9 Central limit theorem. So the variable can have a normal distribution in the population, and the sample mean can have a normal distribution too. Example: if in the population age ~ N(30, 3) (the ~ means "is distributed", and N(30, 3) means normally, with mean 30 and standard deviation 3), then the means of samples of size 9 have x̄ ~ N(30, 1). (The x with a line over it, x̄, is called x bar and refers to the sample mean.) How did I get this? Do you get it?
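The answer to "how did I get this?" is the standard error formula from slide 8: 3/√9 = 1. A minimal sketch of that arithmetic with `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

population = NormalDist(mu=30, sigma=3)   # age ~ N(30, 3)
n = 9                                     # sample size

# Central limit theorem: mean of x-bar = population mean,
# standard error of x-bar = population SD / sqrt(n).
standard_error = population.stdev / sqrt(n)
sampling_dist = NormalDist(mu=population.mean, sigma=standard_error)

print(sampling_dist.mean, sampling_dist.stdev)
```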

10 68-95-99.7 rule. For a normal distribution it is known that 1) approximately 68% of the values are within 1 standard deviation of the mean, 2) approximately 95% are within 2 standard deviations of the mean, and 3) approximately 99.7% are within 3 standard deviations of the mean. So in the age example, about 68% of the people in the population are between 27 and 33 (30 ± 3), but about 68% of the sample means fall between 29 and 31 (30 ± 1).
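The three percentages can be checked against the standard normal, and the two ranges in the age example follow from the two standard deviations (3 for people, 1 for sample means of size 9). A short sketch:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Share of values within k standard deviations of the mean.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")

# Age example: people ~ N(30, 3), sample means (n = 9) ~ N(30, 1).
for label, sd in (("people", 3), ("sample means", 1)):
    print(f"middle 68% of {label}: {30 - sd} to {30 + sd}")
```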

11 Rule in a graph. [Figure: distributions of population age and of mean age (the sample mean), with 27, 30, and 33 marked on the axis.]

12 Statistical inference. Up to this point we have operated as if we knew the population mean. (What we have done will act as a model for what we are about to do.) But most of the time we don't, and that is why we need statistics. We take a sample and try to infer the population mean from it. The two methods of inference are 1) confidence intervals and 2) hypothesis tests.

13 Confidence interval. When we take a sample and calculate its mean, we could use this sample mean as our estimate of the population mean. But remember that the sample mean varies from sample to sample. To account for this sampling variability, instead of a single point estimate we use an interval: a range of values where the population mean might be.

14 Confidence interval. [Figure: the sampling distribution of the sample means, centered at the true mean, which we just don't know.] The lines I put here mark where 95% of the sample means should fall. Their distance from the center is 1.96σ/√n, where σ is the population standard deviation (which we will assume is known) and n is the sample size.
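To make the lines concrete, the sketch below borrows the age example (μ = 30, σ = 3, n = 9). Treating μ as known is only for illustration; in practice it is exactly the thing we do not know.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 30, 3, 9   # age example; mu treated as known only for illustration

margin = 1.96 * sigma / sqrt(n)
low, high = mu - margin, mu + margin
print(f"95% of sample means fall between {low:.2f} and {high:.2f}")

# Check: area of the sampling distribution between the two lines.
sampling = NormalDist(mu, sigma / sqrt(n))
middle_area = sampling.cdf(high) - sampling.cdf(low)
print(f"area between the lines: {middle_area:.4f}")
```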

15 Confidence interval. [Figure: the same sampling distribution, with an observed sample mean x̄ marked.] Now, when we get the sample mean, we use the same distance, 1.96σ/√n, on either side of the sample mean. We are then 95% confident that our interval contains the true unknown mean, in the sense that 95% of the intervals built this way would contain it.
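A sketch of the interval and of what "95% confident" means in repeated sampling. The true mean 30, σ = 3, n = 9, and the observed sample mean 31 are all assumptions for illustration, and `confidence_interval` is a hypothetical helper, not a library function.

```python
import random
from math import sqrt

random.seed(2)

def confidence_interval(sample_mean, sigma, n, z=1.96):
    """Hypothetical helper: 95% interval for the population mean, sigma known."""
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# One interval from an assumed observed sample mean of 31.
print(confidence_interval(31.0, 3, 9))

# Repeated sampling: how often does the interval contain the true mean?
true_mean, sigma, n = 30, 3, 9   # assumed values (the age example)
trials = 10_000
hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    low, high = confidence_interval(sum(sample) / n, sigma, n)
    hits += low <= true_mean <= high

print(f"share of intervals containing the true mean: {hits / trials:.3f}")
```

Any single interval either contains the true mean or it doesn't; the 95% describes the method, which succeeds in about 95% of samples.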

16 1.96. Where did I get the 1.96 on the previous slide? Earlier we said approximately 95% of the sample means are within 2 standard deviations of the mean. To be more precise, 95% of the sample means are within 1.96 standard deviations. If you look at the standard normal table in the book, you see that Z = 1.96 is associated with the value .975. So .025 is in the upper tail and, by symmetry, .025 is in the lower tail of a normal distribution, leaving .95 in the middle. So to be precise, we use 1.96 in the formulas when we refer to the middle 95%.
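The table lookup can be reproduced in both directions with `statistics.NormalDist`: `cdf` plays the role of reading the table forward, and `inv_cdf` reads it backward.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

table_value = z.cdf(1.96)      # area to the left of Z = 1.96
upper_tail = 1 - table_value   # area above 1.96
z_975 = z.inv_cdf(0.975)       # the Z that leaves .025 in the upper tail

print(round(table_value, 4), round(upper_tail, 4), round(z_975, 4))
```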

17 A story about hypothesis tests. This is not really statistics, but it is an idea to consider. Say I have two decks of cards. One is a regular deck: spades, hearts, diamonds, and clubs. The other is special: 4 sets of hearts. I take out one of the decks, but you do not know which one. In the language of statistics, the null hypothesis is that I took out the regular deck. You will stick with the null hypothesis unless an event occurs that has a really low probability under it. If a really low-probability event occurs, you will reject the null hypothesis and go with the alternative hypothesis. So I take out a deck and deal you five cards: a royal flush in hearts! You would reject the null hypothesis of a regular deck and go with the alternative that I pulled out the special deck, because a royal flush in hearts has a very low probability with a regular deck.
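How low is "really low"? A sketch of the arithmetic, assuming every 5-card hand from a 52-card deck is equally likely and treating the 4 copies of each heart in the special deck as distinct cards.

```python
from math import comb

# Number of different 5-card hands from a 52-card deck.
hands = comb(52, 5)

# Regular deck (the null hypothesis): exactly one A, K, Q, J, 10 of hearts.
p_regular = 1 / hands

# Special deck: 4 copies of each heart, so 4 choices for each of the 5 cards.
p_special = 4 ** 5 / hands

print(f"hands: {hands:,}")
print(f"P(royal flush in hearts | regular deck) = {p_regular:.2e}")
print(f"P(royal flush in hearts | special deck) = {p_special:.2e}")
```

Under the null hypothesis the hand has a probability of about 1 in 2.6 million, far below any usual cutoff such as .05, which is why you would reject it.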

18 Hypothesis test. In a hypothesis test we don't know the population mean, but we have a value in mind (the hypothesized value), say from other research. We then treat the hypothesized value as if it were the true mean and ask how likely our sample mean would be if it came from a population centered at the hypothesized value. A low probability of occurrence (usually defined as 5%, or .05, or less) leads us to reject the hypothesized value as the true mean.

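19 Hypothesis test. [Figure: sampling distribution of the sample means centered at the hypothesized value, with the observed sample mean x̄ marked and the tail area beyond it shaded and labeled p-value.] With the hypothesized value as the center, we look at the probability of getting our sample mean value or a more extreme value. This shaded area is the p-value. If the shaded area is .05 or less (for a one-tailed test), we reject the hypothesized value as the true value.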
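A minimal sketch of the one-tailed test described above, assuming σ is known and the alternative is that the true mean is larger than the hypothesized one. The numbers (hypothesized mean 30, σ = 3, n = 9, observed x̄ = 32) are invented, and `one_tailed_p_value` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def one_tailed_p_value(sample_mean, hypothesized_mean, sigma, n):
    """Hypothetical helper: P(x-bar >= observed) if the hypothesized mean is true."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    return 1 - NormalDist().cdf(z)   # shaded upper-tail area

alpha = 0.05
p = one_tailed_p_value(32, 30, 3, 9)   # assumed numbers; z = 2
print(f"p-value = {p:.4f}")
print("reject the hypothesized value" if p <= alpha else "do not reject the hypothesized value")
```

Here x̄ = 32 sits 2 standard errors above the hypothesized center, leaving an upper-tail area of about .023, so the hypothesized value 30 would be rejected at the .05 level.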

20 Hypothesis test. [Figure: the same distribution with the observed sample mean x̄ marked.] When this shaded area is .05 or less, we are saying that, with the hypothesized value as the center, the probability of getting a sample mean as extreme as the one we obtained is so small that we reject the hypothesized value and conclude that the center must be something else.

