Null Hypothesis Significance Testing (NHST)

NHST is a way by which psychologists and other scientists attempt to draw inferences about population parameters. The way NHST is taught and practiced today represents an amalgam of ideas developed by Fisher, Neyman, and Pearson in the first half of the 20th century. This history partly explains why NHST is such a convoluted way of thinking about research and why so many psychologists cannot accurately define the p-value from NHST. Suppose you hypothesize that OSU students will, on average, have higher ACT scores than the national average. You test your hypothesis by randomly sampling 15 students who are currently enrolled at OSU. Their ACT scores are as follows: 24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23. You obtain normative data for the ACT from the U.S. government website. There you find that the average ACT score for 2009 (the last year reported) was 21.1 with a standard deviation of 5.1. You treat these values as population parameters to which you will compare your estimated population mean for OSU students. Do OSU students have a higher mean than the norm?

Alternative or Research Hypothesis
The point of NHST is to make an inference about population parameters. Pop1, the general population (the norm): µNorm = 21.1, σNorm = 5.1. Pop2, the population of OSU students: µOSU = ?, with σOSU assumed equal to σNorm. The hypotheses take a 3-valued logic form with three properties: they are stated in terms of population parameters, they are mutually exclusive, and together they cover all possible outcomes. Null hypothesis, Ho: µOSU = µNorm. Alternative or research hypothesis, HA: µOSU < µNorm or µOSU > µNorm* (*predicted).

Ho: µOSU = µNorm; HA: µOSU < µNorm or µOSU > µNorm
OSU sample: 24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23. We enter these numbers into our calculators and compute the sample mean. For these 15 students it equals 24.07 (n = 15, x̄OSU = 24.07). Is this value higher than 21.1? Yes! But we do not yet conclude that µOSU > µNorm. Why not? Because we only have the sample mean, not the population mean. It could very well be the case that the population mean for OSU students is 21.1, and thus equal to the norm. What do we do then, since we wish to choose one of the hypotheses? Again, notice how the hypotheses are written in terms of population parameters, and the µOSU value has not been observed. This is why we refer to this process as “inferential statistics.” We are making an inference to an unobserved value.
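The calculator step can be sketched in a few lines of Python. This is only an illustration using the scores and the norm quoted on the slides; the variable names are ours, not part of the original presentation.

```python
from statistics import mean

# ACT scores for the 15 sampled OSU students (taken from the slide)
osu_sample = [24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23]
mu_norm = 21.1  # national mean, treated as a population parameter

n = len(osu_sample)
sample_mean = mean(osu_sample)

print(f"n = {n}, sample mean = {sample_mean:.2f}")
# A higher sample mean alone does not establish that the population mean is higher.
print("Sample mean above the norm?", sample_mean > mu_norm)
```

The comparison on the last line describes the sample only; the inference about µOSU still requires the steps that follow.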

So, we are going to put on our NHST hats and think in terms of probabilities. Specifically, if the population OSU mean were in fact 21.1, what is the probability of obtaining a sample mean of at least 24.07 from 15 randomly selected students? If obtaining a mean of at least 24.07 is really strange (improbable), then perhaps we can at least rule out 21.1; that is, we can reject the null hypothesis, which states that µOSU = µNorm = 21.1. After rejecting the null hypothesis we can then choose between the other two hypotheses. A bit convoluted? Yes! But this is the wacky world of NHST.

With NHST, we will make several assumptions and set up some ground rules. (1) Assume the null hypothesis is true: µOSU = µNorm. (2) Assume random sampling and a few other things (we’ll discuss these later in class). (3) Set up an agreed-upon “critical probability” (i.e., critical p-value) before examining the data. (4) Use our knowledge of how means behave in a sampling scenario to evaluate the data and draw a conclusion about the population parameters.

Using the Central Limit Theorem, which applies to means computed from randomly drawn samples, let’s assume a perfectly normal distribution of mean values. If we assume the null hypothesis to be true (µOSU = µNorm), then the mean of this distribution of means will equal 21.1. µx-bar = 21.1
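A small simulation can make this distribution of means concrete. The sketch below is not from the slides: it assumes the population of ACT scores is itself normal with µ = 21.1 and σ = 5.1 (real ACT scores are bounded whole numbers), and the seed and the number of simulated samples are arbitrary choices. It also compares the spread of the simulated means with σ/√n, the standard error that appears in the z-test formula later in the deck.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # arbitrary seed, used only so the run is reproducible

mu_norm, sigma_norm, n = 21.1, 5.1, 15
n_samples = 10_000  # number of simulated samples (arbitrary choice)

# Draw many samples of size n from a population in which the null
# hypothesis is true, and record the mean of each sample.
sample_means = []
for _ in range(n_samples):
    sample = [random.gauss(mu_norm, sigma_norm) for _ in range(n)]
    sample_means.append(mean(sample))

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_se = sigma_norm / sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f} (null value {mu_norm})")
print(f"SD of sample means:   {sd_of_means:.2f} (sigma / sqrt(n) = {theoretical_se:.2f})")
```

The simulated means should center close to 21.1, with a spread close to 5.1/√15.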

Next, let’s convert the means in the distribution to z-scores. We now have the standard normal curve, which has a mean equal to 0 and a standard deviation equal to 1.

Next, we set up what is known as the “Rejection Region” in the distribution. Following the conventional practice of psychologists for the past 70+ years, we set up this region to correspond to 5% of the distribution. This is the infamous value underlying the statement “p ≤ .05” in journal articles (sometimes you might see p ≤ .01). Again, it is a value that is completely arbitrary. We use it solely on the basis of convention.

We’ve cut the 5% evenly so that 2.5% is in the lower tail of the distribution and 2.5% is in the upper tail. This is what we call a two-tailed test. Notice how the alternative hypothesis has two parts: one in which the OSU mean is higher than the norm, and one in which the OSU mean is lower. These two directions correspond to the two tails of the distribution. Ho: µOSU = µNorm; HA: µOSU < µNorm or µOSU > µNorm.

If the observed mean, converted to a z-score, is NOT in the rejection region, then we’ll “fail to reject the null hypothesis” and declare the result “nonsignificant, p > .05.”

If the observed mean, converted to a z-score, is in the rejection region, then we’ll “reject the null hypothesis” and declare the finding “statistically significant, p ≤ .05.”

Next, let’s refer to the .05 (5%) as our “alpha level” (α = .05). We’ll also refer to this as our p-critical, or pcrit, value. Using alpha (pcrit) we determine the critical z-values for our statistical test. As we’ll see shortly, the test we will run on these ACT data is a z-test for means. The critical values are those z-scores that cut off the rejection region. Looking up these values in the z-table, we find zcrit = ±1.96. Again, these are the z-values that demarcate 2.5% in each tail of the distribution.
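The z-table lookup can also be done numerically. The following is a minimal sketch using `NormalDist` from Python’s standard `statistics` module (Python 3.8 or later); the variable names are illustrative and not part of the slides.

```python
from statistics import NormalDist

alpha = 0.05                      # conventional alpha level (pcrit)
standard_normal = NormalDist()    # mean 0, standard deviation 1

# For a two-tailed test, alpha is split evenly: 2.5% in each tail.
z_crit = standard_normal.inv_cdf(1 - alpha / 2)

print(f"critical z-values: -{z_crit:.2f} and +{z_crit:.2f}")
```

Changing `alpha` to 0.01 gives the cutoffs for the stricter “p ≤ .01” convention mentioned earlier.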

Finally, we are ready to see if the OSU students’ data are “statistically significant.” OSU sample: 24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23 (n = 15, x̄OSU = 24.07). We next convert the mean to a z-score using what is known as the “z-test for means” formula:
$$z_{observed} = z_{obs} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{24.07 - 21.1}{5.1 / \sqrt{15}} = \frac{2.97}{1.32} = 2.25$$

We can see that zobs = 2.25 falls in the rejection region, because it lies beyond the upper critical value of 1.96. We therefore “reject the null hypothesis,” which is equivalent to saying that the result is “statistically significant, p ≤ .05.”
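The whole z-test for means, from raw scores to decision, can be sketched as follows. The numbers come from the slides; the variable names and the use of 1.96 as the two-tailed cutoff for α = .05 are written out here for illustration.

```python
from math import sqrt
from statistics import mean

osu_sample = [24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23]
mu_norm, sigma_norm = 21.1, 5.1   # norm values treated as population parameters
z_crit = 1.96                     # two-tailed critical value for alpha = .05

n = len(osu_sample)
standard_error = sigma_norm / sqrt(n)               # sigma / sqrt(n)
z_obs = (mean(osu_sample) - mu_norm) / standard_error

# Two-tailed rule: reject when z_obs falls in either tail.
reject_null = abs(z_obs) >= z_crit

print(f"standard error = {standard_error:.2f}")
print(f"z_obs = {z_obs:.2f}")
print("Reject the null hypothesis?", reject_null)
```

Taking the absolute value of `z_obs` is what makes the rule two-tailed: a mean equally far below the norm would be rejected as well.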

Now that we’ve rejected the null hypothesis, which alternative hypothesis do we accept? We see that the sample mean for the OSU students is 24.07, which is higher than the norm (21.1). If we regard 24.07 as an estimate of the OSU population mean (µOSU), then we conclude that the OSU population mean is higher than the norm, µOSU > µNorm. That is the specific inference we have drawn here.

How improbable is the sample mean? x̄OSU = 24.07
This mean was converted to a z-score of 2.25. Since we are running a two-tailed test, we always look in both tails of the distribution. We thus look up the probability for 2.25 (or higher) in our z-table and double it to get the total region demarcated by the ±2.25 values in the distribution. From the z-table, we find .0122. Doubling this value gives us .0244. This is our pobserved (or pobs) value, which is less than pcrit (.05). This is another way of reaching the same conclusion: reject the null hypothesis; the result is “statistically significant, p = .0244.”
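The table lookup and doubling can be reproduced numerically. This sketch uses the rounded z-score of 2.25, as one would with a printed z-table, and `NormalDist` from Python’s standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

z_obs = 2.25        # observed z-score, rounded as for a z-table lookup
p_crit = 0.05       # alpha level

# Area under the standard normal curve at or above z_obs (one tail).
upper_tail = 1 - NormalDist().cdf(z_obs)

# Two-tailed test: count the matching region below -z_obs as well.
p_obs = 2 * upper_tail

print(f"one-tail probability = {upper_tail:.4f}")
print(f"p_obs (two-tailed)   = {p_obs:.4f}")
print("p_obs below p_crit?", p_obs < p_crit)
```

Using the unrounded z-score instead of 2.25 changes the result only slightly.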

So what is this pobs value psychologists consider so important?
Simply stated, it is a “weirdness statistic.” Assuming the null hypothesis is true, along with other assumptions (we’ll cover these later in class), and given a random sample, obtaining a sample mean of at least 24.07 (or one equally far below 21.1) is an unusual (i.e., weird) result. One could see values this extreme as a normal part of sampling variability, but they would be unusual (p = .0244). All in all, then, we aren’t saying a whole lot when we declare our results to be “statistically significant!”
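One way to see the “weirdness” reading of pobs is to simulate it. The sketch below is an illustration rather than part of the original slides: it assumes a normally distributed population in which the null hypothesis is true, and the seed and number of simulations are arbitrary. It counts how often a random sample of 15 yields a mean at least as far from 21.1 as the observed 24.07.

```python
import random

random.seed(2024)  # arbitrary seed, used only so the run is reproducible

mu_norm, sigma_norm, n = 21.1, 5.1, 15
observed_gap = 24.07 - mu_norm   # how far the observed mean sits from the norm
n_sims = 20_000                  # number of simulated samples (arbitrary choice)

extreme = 0
for _ in range(n_sims):
    # One random sample of n scores from a world where the null is true.
    simulated = [random.gauss(mu_norm, sigma_norm) for _ in range(n)]
    simulated_mean = sum(simulated) / n
    # Two-tailed: count means at least this far above OR below the norm.
    if abs(simulated_mean - mu_norm) >= observed_gap:
        extreme += 1

proportion_extreme = extreme / n_sims
print(f"proportion of samples at least this extreme: {proportion_extreme:.4f}")
```

The proportion should land near the tabled p-value of about .024: such samples do occur when the null hypothesis is true, just not often.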
