# 11.1 – Significance Tests: The basics


Inference: to assess the evidence provided by the sample to claim information about the population.

Hypothesis: A claim made about a population parameter. Sample data are gathered to assess whether the evidence supports or contradicts the claim.

Null Hypothesis: The statement being tested. We presume it is true until the data provide evidence against it. NOTATION: $H_0: \mu = \mu_0$

Alternate Hypothesis: The statement we hope or suspect is true instead of the null hypothesis. NOTATION:

$H_a: \mu > \mu_0$ or $H_a: \mu < \mu_0$ (One-Tailed)

$H_a: \mu \neq \mu_0$ (Two-Tailed)

Test Statistic: A value computed from the sample data that helps us make a statistical decision: do we have enough evidence to reject the null hypothesis or not? For a test about a mean when $\sigma$ is known, test statistic $= Z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
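As a minimal sketch, the z statistic for a mean with a known population standard deviation can be computed as below. The function name `z_statistic` and the numbers in the usage lines are made up for illustration.

```python
from math import sqrt

def z_statistic(x_bar, mu0, sigma, n):
    """Standardized distance of the sample mean from the hypothesized mean."""
    standard_error = sigma / sqrt(n)  # sd of the sampling distribution of x-bar
    return (x_bar - mu0) / standard_error

# Hypothetical numbers: sample mean 52 from n = 25, testing mu0 = 50 with sigma = 10
z = z_statistic(52, 50, 10, 25)
print(z)  # 1.0
```

A z of 1.0 says the sample mean sits one standard error above the hypothesized mean.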

p-value: The probability, computed assuming the null hypothesis is true, of getting a sample result at least as extreme as the one observed. It measures how much evidence you have against the null hypothesis. A small p-value means the observed outcome would be unlikely if the null hypothesis were true, so it provides strong evidence against the null hypothesis.
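One way to turn a z statistic into a p-value is sketched below using the standard Normal distribution from Python's standard library. The function name `p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are illustrative choices, not notation from the slides.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value of a z statistic, computed as if the null hypothesis were true."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":    # Ha: mu > mu0, upper tail
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # Ha: mu < mu0, lower tail
        return std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: mu != mu0, both tails
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Hypothetical z of 2.0 under each alternative
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(2.0, alt), 4))
```

The same z gives very different p-values depending on the direction of the alternative, which is why the alternative must be chosen before looking at the data.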

Statistically Significant: A result that is unlikely to occur by chance alone. If your p-value is smaller than the significance level, the result is statistically significant.

Significance Level: The decisive p-value cutoff we fix in advance. It is called alpha, $\alpha$. It states when the null hypothesis should be rejected, and the p-value is compared to it. Common levels of rejection are $\alpha = 0.10$, $\alpha = 0.05$, and $\alpha = 0.01$. If $p < \alpha$, then reject the null. If $p \geq \alpha$, then fail to reject the null (this is not the same as proving the null true).

Conditions: SRS, Normality, Independence

Example #1 – State the notation for the null and alternative hypotheses
a. Suppose we work in the quality control department of Ruffles Potato Chips. The quality control manager wants us to verify that the filling machine is calibrated properly. We wish to determine if the mean amount of chips in a bag is different from the advertised 12.5 ounces. The company is concerned if there are too many or too few chips in the bag.

b. According to the US Department of Agriculture, the mean farm rent in Indiana was \$89 per acre in A researcher for the USDA claims that the mean rent has decreased since then. He randomly selected 50 farms from Indiana and determined the mean farm rent to be \$67.

c. Researchers claim to have found a brain protein that blocks the craving for fatty food and therefore increases the loss of body fat. To test this theory, 100 people are treated with the protein and the reduction in body fat is measured.
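One reasonable way to write the hypotheses for the three scenarios is sketched below. The parameter definitions are assumptions, since the scenarios do not name them: $\mu$ is taken to be the mean amount of chips per bag in part a, the mean farm rent per acre in part b, and the mean reduction in body fat in part c (with "no effect" read as a mean reduction of 0).

```latex
\begin{aligned}
\text{a.}\quad & H_0: \mu = 12.5 \text{ ounces}, & H_a: \mu \neq 12.5 \text{ ounces} & \quad \text{(two-tailed)} \\
\text{b.}\quad & H_0: \mu = 89 \text{ dollars per acre}, & H_a: \mu < 89 \text{ dollars per acre} & \quad \text{(one-tailed)} \\
\text{c.}\quad & H_0: \mu = 0, & H_a: \mu > 0 & \quad \text{(one-tailed)}
\end{aligned}
```

In each case the null hypothesis is a statement of equality, and the direction of the alternative comes from the wording of the claim ("different from", "decreased", "increases").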

Example #2 At the bakery where you work, loaves of bread are supposed to weigh 1 pound. From experience, the weights of loaves produced at the bakery follow a Normal distribution with standard deviation $\sigma = 0.13$ pounds. You believe that new personnel are producing loaves that are heavier than 1 pound. As supervisor of Quality Control, you want to test your claim at the 5% significance level. You weigh 20 loaves and obtain a mean weight of 1.05 pounds.

a. Identify the parameter of interest. State your null and alternative hypotheses.

$\mu$ = True mean weight of loaves of bread

$H_0: \mu = 1$, $H_a: \mu > 1$

b. Verify the conditions are met.

SRS (must assume); Normality (yes, the population is approximately Normal, therefore so is the sampling distribution of $\bar{x}$); Independence (safe to assume there are more than 10(20) = 200 loaves of bread)

c. Calculate the test statistic and the P-value. Illustrate using the graph provided.

$z = \dfrac{1.05 - 1}{0.13/\sqrt{20}} \approx 1.72$

$P(Z > 1.72) = 1 - P(Z < 1.72) = 1 - 0.9573 = 0.0427$

d. State your conclusions clearly in complete sentences.

Because the P-value of 0.0427 is less than 0.05, I would reject the null hypothesis at the 0.05 level. There is enough evidence to conclude that the workers are making loaves that are heavier than 1 pound on average.
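The arithmetic for the bakery example can be checked with a short script. This is a sketch using only the numbers stated in the problem; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 1.0, 0.13   # hypothesized mean and known population sd (pounds)
n, x_bar = 20, 1.05      # sample size and observed sample mean
alpha = 0.05             # significance level

z = (x_bar - mu0) / (sigma / sqrt(n))
p = 1 - NormalDist().cdf(z)  # upper tail, because Ha is mu > 1

print(f"z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The script should agree with the table lookup: z is about 1.72 and the p-value is about 0.0427, which falls below 0.05.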

11.2 – Carrying Out Significance Tests

Steps to Hypothesis Testing: PHANTOMS

P: Parameter of interest

H: Hypothesis

A: Assumptions

N: Name of Test

T: Test Statistic

O: Obtain P-Value

M: Make a Statistical Decision

S: Summary in context of problem

One-Sample Z-Test: Testing the mean when $\sigma$ is known.

Calculator Tip: STAT – TESTS – Z-Test

Example #1 An energy official claims that the oil output per well in the US has declined from the 1998 level of 11.1 barrels per day. He randomly samples 50 wells throughout the US and determines the mean output to be 10.7 barrels per day. Assume $\sigma = 1.3$ barrels. Test the official's claim at the $\alpha = 0.05$ level.

P: $\mu$ = Mean oil output per well in the US

H: $H_0: \mu = 11.1$, $H_a: \mu < 11.1$

A: SRS (says so); Normality ($n \geq 30$, so by the CLT, approx normal); Independence (Safe to assume more than 500 wells in the US)

N: ZTest

T: $z = \dfrac{10.7 - 11.1}{1.3/\sqrt{50}} \approx -2.18$

O: $P(Z < -2.18) = 0.0146$

M: $p = 0.0146 < \alpha = 0.05$, so reject the null.

S: There is enough evidence to conclude that the average oil output per well in the US has declined from 11.1 barrels per day.
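The T, O, and M steps of the oil example can be reproduced as follows. This is a sketch with illustrative variable names. It reports the p-value both from the unrounded z and from z rounded to two decimals, since a table lookup uses the rounded value.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 11.1, 1.3   # 1998 mean and assumed population sd (barrels per day)
n, x_bar = 50, 10.7      # sample size and observed sample mean
alpha = 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))
p_exact = NormalDist().cdf(z)            # lower tail, because Ha is mu < 11.1
p_table = NormalDist().cdf(round(z, 2))  # same lookup with z rounded as in a table

print(f"z = {z:.2f}")
print(f"p from unrounded z = {p_exact:.4f}, p from rounded z = {p_table:.4f}")
print("reject H0" if p_exact < alpha else "fail to reject H0")
```

The two p-values differ slightly in the fourth decimal place, but both are well below 0.05, so the decision does not depend on the rounding.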

Example #2 The average daily volume of Dell computer stock in 2000 was 31.8 million shares with a standard deviation of 14.8 million shares according to Yahoo! A stock analyst claims that the stock volume in 2001 is different from the 2000 level. Based on a random sample of 35 trading days in 2001, he finds the sample mean to be 37.2 million shares. Test the analyst's claim at the $\alpha = 0.01$ level.

P: $\mu$ = Mean volume of Dell computer stock

H: $H_0: \mu = 31.8$, $H_a: \mu \neq 31.8$

A: SRS (says so); Normality ($n \geq 30$, so by the CLT, approx normal); Independence (Safe to assume more than 350 trading days)

N: ZTest

T: $z = \dfrac{37.2 - 31.8}{14.8/\sqrt{35}} \approx 2.16$

O: $2[P(Z < -2.16)] = 2[0.0154] = 0.0308$

M: $p = 0.0308 > \alpha = 0.01$, so fail to reject the null.

S: There is not enough evidence to claim that the average daily volume of Dell stock is different from 31.8 million shares.
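For the two-tailed Dell example, the p-value doubles one tail area. The sketch below uses illustrative variable names and rounds z to two decimals so the result lines up with a table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 31.8, 14.8  # 2000 mean and sd (millions of shares), sd treated as known
n, x_bar = 35, 37.2      # sample size and observed sample mean
alpha = 0.01

z = round((x_bar - mu0) / (sigma / sqrt(n)), 2)
p = 2 * NormalDist().cdf(-abs(z))  # two tails, because Ha is mu != 31.8

print(f"z = {z}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

At the 0.01 level the test fails to reject, although the same data would be significant at the 0.05 level. The conclusion depends on the level fixed in advance.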

Duality of Confidence Intervals and Hypothesis Testing
If the $(1 - \alpha)$ confidence interval does not contain $\mu_0$, we have evidence that supports the alternative hypothesis, thus we reject the null hypothesis at the $\alpha$ level. Note: The Confidence Interval matches the two-tailed test only!

Example #3 The average daily volume of Dell computer stock in 2000 was 31.8 million shares with a standard deviation of 14.8 million shares according to Yahoo! A stock analyst claims that the stock volume in 2001 is different from the 2000 level. Based on a random sample of 35 trading days in 2001, he finds the sample mean to be 37.2 million shares. Test the analyst's claim at the $\alpha = 0.01$ level.

a. What was your conclusion from this hypothesis test in Example #2?

We did not reject the null hypothesis of 31.8 million shares per day.

b. Construct a 99% confidence interval for the true average daily volume of Dell Computer stock in 2001.

Note: We already did P and A

N: Z-Interval

I: $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}} = 37.2 \pm 2.576 \cdot \dfrac{14.8}{\sqrt{35}} \approx 37.2 \pm 6.44$, or (30.76, 43.64)

I am 99% confident the true mean daily Dell stock volume is between 30.76 and 43.64 million shares.

c. Does this interval reaffirm your statistical decision from the hypothesis test? Explain.

Yes, 31.8 is in the interval, so we can't conclude the volume is different.
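The link between the 99% interval and the two-tailed test at the 0.01 level can be checked numerically. This is a sketch with illustrative variable names. It computes the interval from the Dell data, then confirms that "31.8 is inside the interval" and "p-value is at least 0.01" agree.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 31.8, 14.8  # 2000 mean and sd (millions of shares), sd treated as known
n, x_bar = 35, 37.2
alpha = 0.01             # a 99% interval pairs with a two-tailed test at 0.01

std_normal = NormalDist()
z_star = std_normal.inv_cdf(1 - alpha / 2)  # critical value, about 2.576
margin = z_star * sigma / sqrt(n)
low, high = x_bar - margin, x_bar + margin

z = (x_bar - mu0) / (sigma / sqrt(n))
p = 2 * std_normal.cdf(-abs(z))             # two-tailed p-value

print(f"99% CI: ({low:.2f}, {high:.2f})")
print("mu0 inside interval:", low <= mu0 <= high)
print("fail to reject H0:  ", p >= alpha)
```

Both checks give the same answer because the interval contains exactly those values of the mean that the two-tailed test would not reject.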

Example #4 Does marijuana use affect anger expression? Assume for all non-users, the mean score on an anger expression scale is 41.5 with a known standard deviation $\sigma$. For a random sample of 47 frequent marijuana users, the mean score was 44. Test the claim that marijuana affects the expression of anger at the $\alpha = 0.05$ level.

P: $\mu$ = True mean anger expression for marijuana users

H: $H_0: \mu = 41.5$, $H_a: \mu \neq 41.5$

A: SRS (says so); Normality ($n \geq 30$, so by the CLT, approx normal); Independence (Safe to assume more than 470 marijuana users)

N: ZTest

T: $z = \dfrac{44 - 41.5}{\sigma/\sqrt{47}} \approx 2.83$

O: $2[P(Z < -2.83)] = 2[0.0023] = 0.0046$

M: $p = 0.0046 < \alpha = 0.05$, so reject the null.

S: There is enough evidence to conclude that the average anger expression for marijuana users is different from 41.5. Does marijuana use affect anger expression? Yes, anger expression is different for marijuana users.

b. Calculate a 95% confidence interval for the mean anger expression of frequent marijuana users. Does this interval reaffirm your statistical decision in part a?

N: Z-Interval

I: $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}} = 44 \pm 1.96 \cdot \dfrac{\sigma}{\sqrt{47}}$

I am 95% confident the true mean anger expression for marijuana users lies in this interval.

Does this interval reaffirm your statistical decision in part a? Yes, 41.5 is not in the interval, so we can't assume the mean is the same as for non-users.

11.3 – Use and Abuse of Tests 11.4 – Using Inference to Make Decisions

What $\alpha$ level to use?

How plausible is $H_0$? If it represents an assumption that the people you must convince have believed for years, strong evidence (small $\alpha$) will be needed.

What are the consequences of rejecting $H_0$? Rejecting it might mean making major changes in order to accept $H_a$.

Consider the sample: do you need to increase the sample size or look for outliers? Is the sample a true representation of the population?

Remember that a certain percent of the time you will reject a null hypothesis that is actually true (e.g. 5% of the time when $\alpha = 0.05$). Repeating the study on new data helps to check this.

Typically it is ok to use $\alpha = 0.05$. Beware of the p-values of 0.049 and 0.051! They carry nearly the same evidence even though they fall on opposite sides of 0.05.
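A small simulation can illustrate why a fixed share of tests reject even when the null hypothesis is true. The sketch below draws many samples from a population where the null hypothesis holds exactly and counts how often a two-tailed z test at the 0.05 level rejects. The population values reuse the bakery numbers purely as hypothetical inputs, and the trial count and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable
mu0, sigma, n, alpha = 1.0, 0.13, 20, 0.05
trials = 10_000
std_normal = NormalDist()

rejections = 0
for _ in range(trials):
    # Sample from a population where H0 (mu = mu0) is exactly true
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    p = 2 * std_normal.cdf(-abs(z))  # two-tailed p-value
    if p < alpha:
        rejections += 1              # a false rejection of a true null

rate = rejections / trials
print(f"share of true nulls rejected: {rate:.3f}")
```

The printed share should land close to 0.05, which is why a single significant result is weaker evidence than the same result replicated on fresh data.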

Errors in Hypothesis Testing: Because a statistician must make inferences (or conclusions) based on random data that is subject to sampling error, we can make mistakes in hypothesis testing. In fact, there are two types of errors that can be made.

| | $H_0$ True | $H_0$ False |
|---|---|---|
| Reject $H_0$ | Type I Error, $p = \alpha$ | Power of the test, $p = 1 - \beta$ |
| Do not Reject $H_0$ | Correct decision | Type II Error, $p = \beta$ |

Note: You will never have to calculate $\beta$.

To reduce type II error and increase the power of the test:

Increase the sample size.

Increase the significance level alpha. (Be careful: if we choose an alpha so small that it almost guarantees we never make a type I error, then there is a large type II error, because it would be hard to reject the null under any circumstance.)
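The effect of sample size and alpha on power can be illustrated with a short calculation for a one-sided z test. This is a sketch under stated assumptions: it reuses the bakery setup ($\mu_0 = 1$, $\sigma = 0.13$) and supposes, hypothetically, that the true mean is 1.05 pounds. The function name `power_upper_tail` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu0, mu_true, sigma, n, alpha):
    """Power of the z test of H0: mu = mu0 vs Ha: mu > mu0 when the mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)       # reject H0 when z exceeds this
    shift = (mu_true - mu0) / (sigma / sqrt(n))  # center of z when mu = mu_true
    return 1 - std_normal.cdf(z_crit - shift)    # chance z lands in the rejection region

# Hypothetical true mean of 1.05 pounds, varying n and alpha
for n, alpha in [(20, 0.05), (40, 0.05), (20, 0.10), (20, 0.01)]:
    pw = power_upper_tail(1.0, 1.05, 0.13, n, alpha)
    print(f"n = {n}, alpha = {alpha}: power = {pw:.2f}, P(Type II error) = {1 - pw:.2f}")
```

Doubling the sample size or raising alpha increases the power, while dropping alpha to 0.01 makes a Type II error much more likely, which matches the caution above.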

Example #1: In a criminal trial, the defendant is held to be innocent until shown to be guilty beyond a reasonable doubt. If we consider the hypotheses $H_0$: defendant is innocent and $H_a$: defendant is guilty, we can reject $H_0$ only if the evidence strongly favors $H_a$.

1. Make a diagram that shows the truth about the defendant and the possible verdicts, and that identifies the two types of error. Which type of error is more serious?

| | $H_0$ True | $H_0$ False |
|---|---|---|
| Reject $H_0$ | I: Innocent and found guilty | Guilty and found guilty (power) |
| Do not Reject $H_0$ | Innocent and found innocent | II: Guilty and found innocent |

Type I error is more serious.

2. Is this goal better served by a test with $\alpha = 0.20$ or a test with $\alpha = 0.01$? Explain your answer.

$\alpha = 0.01$, because the probability of a Type I error would be smaller.

3. Explain what is meant by the power of the test in this setting.

The ability to find a person guilty who is in fact guilty.

Example #2: For each of the following situations, state the null and alternative hypotheses. Identify when a Type I and a Type II Error would occur.

a. A company specializing in parachute assembly claims that its competitor's main parachute failure rate is more than 1%. You perform a hypothesis test to determine whether the company's claim is true. Which error is more serious?

$H_0$: The main parachute failure rate is 1%

$H_a$: The main parachute failure rate is more than 1%

| | $H_0$ True | $H_0$ False |
|---|---|---|
| Reject $H_0$ | I: Failure rate is 1% and we think it's more than 1% | Failure rate is more than 1% and we think it's more than 1% |
| Do not Reject $H_0$ | Failure rate is 1% and we think it is 1% | II: Failure rate is more than 1% and we think it is 1% |

Type II error is more serious, because a failure rate above 1% would go undetected.

b. A company that produces snack foods uses a machine to package 454 gram bags of pretzels. If it is working properly, the bags will be exactly 454 grams. You perform a hypothesis test to determine whether the company is packaging the right amount of grams per bag.

$H_0$: There are 454 grams of pretzels in the bag

$H_a$: There are not 454 grams of pretzels in the bag

| | $H_0$ True | $H_0$ False |
|---|---|---|
| Reject $H_0$ | I: 454 grams in bag and don't think 454 g | Not 454 g in bag and don't think 454 g |
| Do not Reject $H_0$ | 454 grams in bag and think 454 g | II: Not 454 g in bag and think 454 g |