Building on the logic of hypothesis testing: T-tests

Building on the logic of hypothesis testing: T-tests

One-Sample T-test: Outline
Introduce the t-test and explain when it should be used Define Directional Hypotheses (one-tailed t-tests) and contrast them with ‘Non-Directional Hypotheses’ (two-tailed t-tests) that were described earlier. Learn how to find tcrit for directional hypotheses Highlight a second method for making decisions regarding the null hypothesis: the p-value method

One-Sample T-test: Outline
Learn to calculate a p-value and review important points about using this method Demonstrate one measure for estimating whether an experimental effect is large or small (Cohen’s D) Advanced Topic: Calculating  and the Power associated with a hypothesis test Outline the steps for conducting Hypothesis Tests using SPSS

What is a one-sample t-test?
When we have a single sample mean and we are trying to decide if it is drawn from a specific population. …this is what we’ve been doing so far in hypothesis testing.

The Animal Cracker Packer
Abby is the manager of an animal cracker factory. She is concerned that her aging cracker packer might need to be replaced. Each bag of crackers is supposed to weigh 454 grams. Abby would like to conduct a hypothesis test but she faces one big obstacle: she does not know the population variability (σ), which means she cannot use the formula for zobs. Q: Have we ever faced a similar problem? A: Yes; when we wanted to calculate confidence intervals and we did not know σ.

The Animal Cracker Packer
Q: What did we do then? A: Substituted s for σ and substituted t for z Q: Is that what Abby should do now? A: You betcha! Why do you think this section is called t-tests?

Limitations of hypothesis testing using z-scores
Our sample must be large (> 30) in order for the Central Limit Theorem to ‘kick in’. σ must be known. This is a more serious limitation because it is almost impossible to know σ in the real world.

What do we do if σ is unknown?
Just like we did with confidence intervals: Use s as an estimate of σ Use t as our test statistic instead of z The basic steps for conducting t-tests Determine the value for tcrit Calculate tobs Compare tobs with tcrit

Visualizing the t-distribution
Generally not normal-flattened and stretched out too much Shape determined by df Approximates a normal distribution at larger sample sizes Approximates z when sample size is large

Step 1: Determining tcrit
tcrit depends on the degrees of freedom; df = n-1 If α = .05, and n = 10: t(=.05; 9) = 2.262 *If specific df is not listed; look up t values for df in between and then use the larger t value

Step 2: Calculating tobs
The only difference is that s replaces σ NOTE: is an estimate of SE Step 3: compare tobs with tcrit Exactly the same as before.

Steps for completing a one sample t-test
Specify the NULL hypothesis (HO) Specify the ALTERNATIVE hypothesis (HA) Designate the rejection region by selecting . Determine the critical value of your test statistic Use appropriate degrees of freedom (n-1) Use sample statistics to calculate test statistic. tobs = Compare observed value with critical value: If test statistic falls in RR, we reject the null. Otherwise, we fail to reject the null. Interpret your decision regarding the null What do your data imply regarding the question that motivated your experiment?

Abby’s Animal Cracker Packer
Abby samples the next 25 cracker bags packed by the machine to determine whether it is putting 454g of crackers in each bag. The sample statistics are as follows: M = g; s = 16 g. Is the machine properly filling cracker packages? Step 1: Ho:  = 454 g Step 2: Ha:   454 g Step 3:  = .05

Then we can make a judgment about how likely our sample mean is.
We must figure our what the sampling distribution would look like if the null is true. Then we can make a judgment about how likely our sample mean is. µ0 SE

Abby’s Animal Cracker Packer
N=25, M = g; s = 16 g. Step 1: Ho:  = 454 g Step 2: Ha:   454 g Step 3:  = .05 Step 4: tcrit( = .05, df = 24) = ± Step 5: tobs Step 6: Because the tobs falls in the rejection region, we would reject the null.

Comparing tobs and tcrit: The Cracker Packer
Step 7: Interpret results Is Abby’s machine properly filling bags?

Proper Statistical Notation
“Old school style” t (df) = t-observed, p < alpha OR p > alpha t (24) = 2.625, p < .05 Code for: “With degrees of freedom of 24, our t-observed value was and the probability of getting this value or one more extreme if the null were true is less than a 5% chance.” “New APA style” t (24) = 2.625, p = .025 (exact p can only be calculated using a computer)

The Crocodile Hunter Before his untimely death, the Crocodile hunter, Steve Irwin was studying the length of a new group of crocs in Perth. He wanted to know how the length of an average adult in this new group compared to other adult crocodiles. Owing to his many years of experience with crocs, Steve knows that  = 21 feet. Because catching and measuring crocs is dangerous – even for the Crocodile Hunter – Steve is only able to capture and measure five crocs (M = 24, s = 1.58). Are the Perth Crocs really different from typical crocs? Assume  = .05 Complete steps 1-7 for a hypothesis test

Are Perth Crocs as long as regular Crocs?
Step 1: Ho: Step 2: Ha: Step 3: α = Step 4: tcrit( = .05, df = 4) = Step 5: tobs = Step 6: Step 7:

Two-tailed vs. One-tailed tests
Two-tailed: observing a sample mean in either the upper or lower tail of the sampling distribution would be theoretically meaningful Ho:  = some value Ha:   some value Rejection region split between two tails: α/2 in upper tail, α/2 in lower tail. One-tailed: observing a sample mean in only one of the two tails of the sampling distribution would be theoretically meaningful Ho:   some value OR  ≤ some value Ha:  < some value OR  > some value Rejection region located entirely in one tail; EITHER α in upper tail OR α in lower tail.

Did not change the number of errors
Directional (one-tailed) Hypothesis Tests: What happens at the drive-thru? Let’s pretend that Abby leaves the cracker packer factory for the fast-paced, high-pay, take-no-prisoners life of a fast food restaurant manager. Abby is considering whether or not to buy a new intercom system for the drive-thru, so she rents one to test it out. What if Abby wanted the new intercom to reduce errors? If the new intercom… …Abby would … Increased errors keep the old intercom Did not change the number of errors Decreased errors buy the new intercom

What if Abby rented the new intercom to reduce errors?
Decreasing errors: the only meaningful result would be if the sample mean fell at the extreme low end of the sampling distribution. In this case: Ho:   some value Ha:  < some value And the rejection region would look like this: If her sample mean was out here (big increase in errors), we wouldn’t care, we’d still fail to reject Ho

Directional (one-tailed) Hypothesis Tests: What happens at the drive-thru?
Let’s pretend that Abby leaves the cracker packer factory for the fast-paced, high-pay, take-no-prisoners life of a fast food restaurant manager. Abby is considering whether or not to buy a new intercom system for the drive-thru, so she rents one to test it out. What if Abby rented the new intercom to increase sales? If the new intercom… …Abby would … Increased sales buy the new intercom Did not change sales keep the old intercom Decreased sales

What if Abby rented the new intercom to increase sales?
Increasing sales: the only meaningful result would be if the sample mean falls at the extreme high end of the sampling distribution. In this case: Ho:  ≤ some value Ha:  > some value And the rejection region would look like this: If her sample mean was out here (big decrease in sales), we wouldn’t care, we’d still fail to reject Ho

Finding the critical value for one-tailed test
One tailed: Entire amount of alpha in one tail;  = .05 Two tailed: Divide alpha in half;  = .05, α/2 = .025 Are you more likely to reject the null for a one-tailed test or a two-tailed test? How do you decide which one to do?

My honest opinion on one-tailed tests….
ONE-TAILED TESTS ARE INAPPROPRIATE UNLESS YOU HAVE AN EXTREMELY GOOD REASON FOR USING THEM BEFORE YOU RUN YOUR EXPERIMENT. I HAVE NEVER SEEN A STRONG ARGUMENT FOR WHY ONE WAS APPROPRIATE. EVER!!!!

Steps for conducting t-tests: including directional tests
Decide whether you are conducting a one- or a two-tailed test. Specify the NULL hypothesis (HO) 2-tailed: µ = some value; 1-tailed: µ ≤ or ≥ some value Specify the ALTERNATIVE hypothesis (HA) 2-tailed: µ ≠ some value 1-tailed: u > or < some value Designate the rejection region by selecting . 2-tailed: /2 in the tail 1-tailed: α in the tail Determine the critical value of your test statistic (remember to use appropriate df) Use sample statistics to calculate test statistic. tobs = Compare observed value with critical value If test statistic falls in RR, we reject the null. Otherwise, we fail to reject the null. Interpret your decision regarding the null What do your data imply regarding the question that motivated your experiment?

Comparing the results of one- and two-tailed t-tests
You lost a lot of money at the track and were forced to become the personal statistician of notorious underworld crime boss “Big Lou”. Big Lou wants to know if his son “Moderately-Sized Lou” is stealing from his gambling operation. Before Lou Jr. took over the operation, it used to gross $3500 per night (µ). Big Lou tells you, “I don’t care if he is grossing more than $3500, I only care if he’s grossing less. Got it?!”

At this point, you could give Big Lou a lecture regarding the theoretical considerations that guide the choice between one- and two-tailed tests, but I would not be so bold... You sample the gross earnings of the casino over the next 25 nights. The average of the sample is $3338; s = Is “Moderately-Sized Lou” in trouble?  = .05?

$3,500 Lou Jr. gets whacked Lou Jr. is ok!

Gross $3500 per night (µ) Sample 25 nights, M =$3338; s = 450. Step 1: Big Lou has asked us to conduct a one-tailed test with the entire rejection region in the lower tail. Thus, our null and alternative hypotheses will be as follows: Step 2: Ho:   3500 Step 3: Ha:  < 3500 Step 4:  = .05 Step 5: tcrit (α=.05, df = 24; 1-tailed) = Step 6: tobs Negative value!

Gross $3500 per night (µ) Sample 25 nights, M =$3338; s = 450. Step 7: Our observed t falls in the rejection region. Therefore, we would reject the null: t (24) = -1.8, p <.05 Step 8: Interpret the results

Big Bad Lou as a Two-Tailed Test
Although it would be unwise for you to challenge Big Lou’s decision to run a one-tailed test, the same is not true for Mrs. Lou. She loves her baby boy and wisely asks you to conduct a two-tailed test, just to see what would happen. After all, wouldn’t Lou Jr. deserve a big raise if receipts from the gambling operation increased rather than decreased? Bear in mind that, just like with selecting a value for α, the time to make a decision regarding whether to run a one- or two-sampled test is BEFORE you have seen the data.

Following mama Lou’s request
Step 1: Because we have decided to conduct a two-tailed test, our statistical hypotheses would be as follows: Step 2: Ho:  = 3500 Step 3: Ha:  ≠ 3500 Step 4:  = .05 Step 5: tcrit(α=.05, df = 24, 2-tailed) = ± Step 6: The observed value of our test statistic does not change: tobs = -1.8

Following mama Lou’s request
Step 7: Our observed t DOES NOT fall in the rejection region. Therefore, we would fail to reject the null: t (24) = -1.8, p > .05 Ethics: Be judicious when making a choice. Choose before you see your data! Step 8: Interpret results

Vertical-Horizontal Illusion
Although the two lines are exactly the same length, the vertical line appears longer. To examine the strength of this illusion a researcher prepares a version of the illusion where each line is exactly 10 inches. They tell 25 participants that the horizontal line is 10 inches and ask them to estimate the length of the vertical line. The average length estimated is 12.2 inches, s = Conduct a one-tailed hypothesis test (α = .01)

Vertical-Horizontal Illusion
Mean = 12.2; s = 1.00; n = 25 Step 1: One-tailed test Step 2: Ho:  ≤ 10 Step 3: Ha:  > 10 Step 4:  = .01 Step 5: Step 6: Step 7: Step 8:

The critical value method (what we’ve been using so far)
1.) We set alpha 2.) We find the correspond t or z critical value 3.) We see if our t or z observed is in the rejection region (beyond the critical value)

Introducing the p-value method (APA’s new recommendation)
1.) We set alpha 2.) We find the exact probability of obtaining our sample mean or one more extreme if the null is true: called the p-value 3.) We see if the p-value is less than alpha

p-value = the probability (assuming Ho is true) of observing a sample mean that is at least as extreme as the observed sample mean.

Using the p-value method
If you know the population SD and can use z Finding the p-value is easy! It’s just the proportion in the tail that corresponds to zobs For a one-tailed: p = area in the tail For a two-tailed: p= 2 x (area in the tail) Either way If p < α, we REJECT THE NULL. If p > α, we FAIL TO REJECT THE NULL.

EXAMPLE: Lou’s question using a two-tailed (for illustration only: t-test is more appropriate) zobs = -1.80 Proportion in the tail: .0359 Two-tailed: x 2 = .0718 Alpha = .05, .0718 > .05, so we fail to reject the null .0359 .0359 -1.80 1.80

If you don’t know the population SD and are doing a t-test We can’t find an exact p-value in the table BUT we can find a p-value if we use RC (RC is “smart” and knows all the exact values for t at any possible df and the corresponding p value) RC is also smart and already calculated p-value X 2 So you don’t have to do that step!

Using RC to get the p-value
National average pieces of fruit eaten per day is 2.5 pieces How do AC students compare to the national average? Ho:  = 2.5 Ha:  ≠ 2.5 α = .05 p-value method: We want to see if the probability of getting our mean or one more extreme in either tail is less than alpha

One-Sample T-test: RC Output
One Sample t-test data: Fruit t = , df = 29, p-value = alternative hypothesis: true mean is not equal to percent confidence interval: sample estimates: mean of x 3.2 mean sd n

Reporting the results of a t-test
Amherst College students consumed an average of 3.2 pieces of fruit per day (s = 2.57). These data did not provide enough evidence to conclude that Amherst College students differed from the national average in terms of fruit consumption: t (29) = 1.49, p = .15. Mean SD Df tobs p-value

Using RC to get the p-value
National average pieces of fruit eaten per day is 2.0 pieces How do AC students compare to the national average? Ho:  = 2.0 Ha:  ≠ 2.0 α = .05 p-value method: We want to see if the probability of getting our mean or one more extreme in either tail is less than alpha

One-Sample T-test: RC Output
One Sample t-test data: Fruit t = , df = 29, p-value = alternative hypothesis: true mean is not equal to percent confidence interval: sample estimates: mean of x 3.2 mean sd n

Reporting the results of a t-test
Amherst College students consumed an average of 3.2 pieces of fruit per day (s = 2.57). These data indicate that Amherst College students are significantly above the national average in terms of fruit consumption: t (29) = 1.49, p = .15. Mean SD Df tobs p-value

Important points about the p-value method
The p-value method and the critical value method will always produce the same decision regarding the null. The p-value method gives us more information than the critical value method, in that it tells us where our sample mean fell in the sampling distribution What if p = .06? (a marginally significant result) A lower p-value does not imply a ‘bigger effect’ even though – all things being equal – a lower p-value implies a larger difference between M and µ0. What is the easiest way to obtain a smaller p-value in an experiment?

Effect size- Cohen’s D Effect size – a statistical procedure for determining the magnitude of an experimental manipulation. dm = the difference between the sample mean and the expected value of µ Cohen’s d Evaluation 0 < d < .2 Small .2 < d < .8 Medium d > .8 Large Effect size for the ‘Big Lou’ question: mean difference: 3338 vs = 162

Statistical significance isn’t everything
A large school district decides to implement a new drug awareness program for high school students. They select 1,000 students from the district to participate. Prior data suggested that the average high school student in this district has 4.2 alcoholic drinks per month, σ =2. The district administers the new program and finds that afterwards the average is now 4 drinks per month. Should they adopt the new intervention program?

Mean = 4 drink per month; n = 1000
Ho: = 4.2 drinks Ha:  4.2 drinks = .05 Zobs .0008 x 2 = < alpha (.05): therefore reject the null. Decision: the intervention significantly reduced drinking behavior. Problem???

Mean = 4 drink per month; n = 1000
Ho: = 4.2 drinks Ha:  4.2 drinks = .05 Zobs Zobs = x 2 = = p-value .0008 x 2 = < alpha (.05): therefore reject the null. Decision: the intervention significantly reduced drinking behavior. Problem???

.0016 < .05, so we reject the null
Mean = 4 drink per month; n = 1000 Ho: = 4.2 drinks Ha:  4.2 drinks = .05 Zobs Zobs = x 2 = = p-value .0008 x 2 = < alpha (.05): therefore reject the null. Decision: the intervention significantly reduced drinking behavior. Problem??? p-value vs. alpha = vs. 05 .0016 < .05, so we reject the null The intervention significantly reduced drinking behavior. Interpretation: The intervention significantly reduced drinking behavior. Problems?

Putting it all together
You and your pals decide you’ve had enough of AC and head out on an epic road trip! The first few hours are fun, but then boredom starts to set in. You get to thinking about this class and me and you can’t help but wonder: WWMD? So you send me a SNAPCHAT!!!! And I tell you that when I’m on a road trip, I like to break up the monotony by collecting and analyzing data. “Of course!,” you say. “Why didn’t we think of that?”

Putting it all together
Your big question: How does the number of cows in the fields of W. Mass compare to the national average? You count the number of cows in the next 36 fields that you pass. Your sample yields a mean of 15 cows per field. You’re too drunk (except for the driver) to calculate the SD, but a quick search on the Google (the Google can do anything these days) shows that: µ = 10, σ = 11 Set alpha at .05 and run a two-tailed test: Use the p-value method to decide if our cow population differs from the national average. Compute an effect size.

Building on the logic of hypothesis testing: T-tests

Similar presentations

Presentation on theme: "Building on the logic of hypothesis testing: T-tests"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Building on the logic of hypothesis testing: T-tests

Similar presentations

Presentation on theme: "Building on the logic of hypothesis testing: T-tests"— Presentation transcript:

Similar presentations

About project

Feedback