Hypothesis test flow chart


Hypothesis test flow chart (START HERE)

Measurement scale:
- Frequency data → number of variables:
  - 1 variable: basic χ² test (19.5), Table I
  - 2 variables: χ² test for independence (19.9), Table I
- Correlations → number of correlations:
  - 1 correlation (r): test H0: ρ = 0 (17.2), Table G
  - 2 correlations: test H0: ρ1 = ρ2 (17.4), Tables H and A
- Means → number of means:
  - 1 mean → Do you know σ?
    - Yes: z-test (13.1), Table A
    - No: t-test (13.14), Table D
  - 2 means → independent samples?
    - Yes: test H0: μ1 = μ2 (15.6), Table D
    - No: test H0: μD = 0 (16.4), Table D
  - More than 2 means → number of factors:
    - 1 factor: 1-way ANOVA, Ch 20, Table E
    - 2 factors: 2-way ANOVA, Ch 21, Table E

Chapter 13: Interpreting the Results of Hypothesis Testing

'Statistically significant' does not mean 'important'.

IQs of UW undergraduates: Suppose we measured the IQs of 10,000 UW undergraduates and found a mean IQ of 100.3, and we then conducted a one-tailed z-test to determine whether this mean is greater than that of the US population, which has a mean of 100 and a standard deviation of 15. The standard error of the mean is 15/√10,000 = 0.15, so z = (100.3 − 100)/0.15 = 2, which falls in the rejection region for α = .05. We'd find that we could reject H0 with α = .05. But is a difference of 0.3 IQ points important?
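
As a quick check of the arithmetic on this slide, here is a minimal Python sketch of the one-tailed z-test (using scipy and the numbers above; the variable names are mine, not from the slides):

```python
import math
from scipy.stats import norm

mu_hyp, sigma, n = 100, 15, 10_000   # null-hypothesis mean, population sd, sample size
x_bar = 100.3                        # observed sample mean

sem = sigma / math.sqrt(n)           # standard error of the mean = 0.15
z = (x_bar - mu_hyp) / sem           # z = 2.0
p = 1 - norm.cdf(z)                  # one-tailed p-value, about 0.023

# p < .05, so we reject H0 -- yet the raw difference is only 0.3 IQ points
print(f"z = {z:.2f}, p = {p:.4f}")
```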

If you want to read a lot about statistically significant results that may or may not be important…

Some journals require authors to report the 'effect size' along with the outcomes of statistical tests, so the reader can judge whether the effect is 'big' enough to be important. Remember, to calculate t we divide by the standard error of the mean: t = (X̄ − μhyp)/(sX/√n). But the standard error of the mean shrinks with increasing n, so any nonzero difference becomes 'significant' if n is large enough. We need a measure of the size of the difference between our observation and the null hypothesis that doesn't depend on experimental parameters like n.

Effect size: the difference between our observation and the null hypothesis in terms of standard deviations. Formally, effect size is "an estimate of the degree to which the treatment effect is present in the population, expressed as a number free of the original measurement unit".

One example of effect size is Cohen's d: d = (X̄ − μhyp)/σ, where μhyp is the mean for the null hypothesis. This is just like converting the sample mean to a z score.

A more common example is Hedges' g, which is used when we don't know the standard deviation of the population. It's our best estimate of Cohen's d: g = (X̄ − μhyp)/sX. This is just like calculating a value for the t distribution, except we divide by the sample standard deviation sX instead of the standard error of the mean.
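
A small Python sketch of these two formulas (the function names and signatures are my own, assuming the one-sample definitions above):

```python
import statistics

def cohens_d(sample_mean, mu_hyp, sigma):
    """Cohen's d: difference from the null mean in population-sd units."""
    return (sample_mean - mu_hyp) / sigma

def hedges_g(sample, mu_hyp):
    """Hedges' g: same idea, but using the sample sd as the estimate of sigma."""
    return (statistics.mean(sample) - mu_hyp) / statistics.stdev(sample)

# For the IQ example: d = (100.3 - 100) / 15 = 0.02
print(cohens_d(100.3, 100, 15))
```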

Back to our made-up IQ example, where we had a sample mean of 100.3 and a population standard deviation of 15. The effect size is: d = (100.3 − 100)/15 = 0.02. The study found that UW IQs are only 0.02 standard deviations above 100. This is a small effect size, even though it is statistically significant.

Reporting effect size has the advantage that, since it doesn't depend on n, the value is more easily compared across studies. A conventional interpretation of effect size (in absolute value) is: 0.8 is large, 0.5 is medium, 0.2 is small.

Example: What is the effect size for the 'Freshman 15' example (for women)?

                   Male freshmen   Female freshmen
Mean weight gain   3.1 pounds      3.5 pounds
sd                 10.1            10.3
n                  2536            2151

This would be considered a large effect size.

Example: What about comparing this result to the null hypothesis that there was no weight gain in the first year (H0: μ = 0)?

                   Male freshmen   Female freshmen
Mean weight gain   3.1 pounds      3.5 pounds
sd                 10.1            10.3
n                  2536            2151

This would be considered a small to medium effect size, even though it is highly statistically significant (t = 16.76; you can do the t-test yourself as an exercise).
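
The slide's formulas are missing from this transcript, so here is a hedged Python reconstruction. It assumes the first comparison is against the 'Freshman 15' claim of a 15-pound gain and the second against no gain (H0: μ = 0); the helper name is mine.

```python
import math

mean_f, sd_f, n_f = 3.5, 10.3, 2151   # female freshmen: mean gain (pounds), sd, n

def effect_size(sample_mean, mu_hyp, sd):
    """Hedges'-g-style effect size: (mean - hypothesized mean) / sample sd."""
    return (sample_mean - mu_hyp) / sd

# Versus the 'Freshman 15' claim (assumed H0: mu = 15 pounds): |d| is large, about 1.1
print(effect_size(mean_f, 15, sd_f))

# Versus no weight gain (H0: mu = 0): d is about 0.34, small to medium...
print(effect_size(mean_f, 0, sd_f))

# ...yet the t statistic is huge (on the order of 16) because n is so large
t = (mean_f - 0) / (sd_f / math.sqrt(n_f))
print(t)
```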

There are two unavoidable types of errors in hypothesis testing: Type I and Type II errors.

Decision based on your sample vs. the true state of the world:

                     H0 is true                           H0 is false
Fail to reject H0    Correctly fail to reject H0 (1−α)    Type II error (β)
Reject H0            Type I error (α)                     Correctly reject H0 (1−β = power)

A Type I error is when we reject H0 when it is actually true: Pr(Type I error) = α.
A Type II error is when we fail to reject H0 even though it is false: Pr(Type II error) = β.
More commonly, we talk about the probability of correctly rejecting H0. The probability of this happening is called power: Power = Pr(correctly rejecting H0) = 1 − β.

Type I errors (α)

A Type I error occurs when our statistic (z or t) falls within the rejection region even though the null hypothesis is true. For example, for a one-tailed z-test using α = .05, the rejection region is the upper 5% tail of the distribution of z scores. Pr(Type I error) = α: alpha is therefore the probability that a Type I error will occur.

Type II errors

A Type II error happens when the null hypothesis is false but you fail to reject it anyway. To calculate the probability of a Type II error, we need to know the true distribution of the population. This is weird, because the true distribution of the population is the thing we're trying to figure out in the first place.

Type II errors: beta (β) and power (1−β)

Type II errors happen only if the null hypothesis is false. For example, suppose we're conducting a one-tailed z-test with α = .05, and the true population mean corresponds to a z score of 1 (μtrue = 1). We still use the same critical value (zcrit = 1.645) that we did under the null hypothesis (μhyp = 0), but now the distribution of z values is centered around z = 1.

[Figure: two normal curves, one centered at μhyp = 0 and one at μtrue = 1. The area of the μtrue curve above zcrit = 1.645 is shaded blue (1−β = power); the area of the μhyp curve above zcrit is shaded red (α = Pr(Type I error)).]

The blue shaded region is the probability of correctly rejecting the null hypothesis. Type II errors happen when z falls outside the rejection region, so the probability of making a Type II error is 1 minus the blue shaded area.

Type II errors: beta (β)

Calculating power, the probability of correctly rejecting H0:

1) Find the rejection region under the null hypothesis. With α = .05, zcrit = 1.645 (Table A, column C), so the rejection region is z > 1.645.
2) Relative to the true distribution, the rejection region is shifted down by μtrue − μhyp = 1: 1.645 − 1 = 0.645, so the new rejection region is z > 0.645.
3) Find the area in the new rejection region. The area above z = 0.645 is .2611 (Table A, column C), so power = .2611.
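
A minimal Python sketch of these three steps (using scipy's normal distribution; the exact value differs slightly from the table lookup above because of rounding):

```python
from scipy.stats import norm

alpha   = 0.05
mu_true = 1.0                      # true mean, in z-score units
z_crit  = norm.ppf(1 - alpha)      # step 1: rejection region is z > 1.645
shifted = z_crit - mu_true         # step 2: shift by mu_true - mu_hyp = 1, giving 0.645
power   = 1 - norm.cdf(shifted)    # step 3: area above 0.645, about 0.26
beta    = 1 - power                # Pr(Type II error)

print(f"power = {power:.4f}, beta = {beta:.4f}")
```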

Power is the probability of correctly rejecting the null hypothesis, which is the area of the true distribution that falls in the rejection region. Power in this example is: Pr(z > 0.645) = 1 − β = .2611.

More power is good: power is the probability of correctly finding an effect in your experiment. A 'desirable' level of power is .8.

Example: Between 1930 and 1980, the year-to-year average temperature in the Northern Hemisphere varied with a standard deviation of 0.2482 degrees. Consider this to be the population standard deviation for temperatures. Suppose we wanted to calculate the mean of the temperatures over a random sample of 12 years. What is the standard error of this mean? σx̄ = 0.2482/√12 = .0716 degrees.

(These values were taken from the NOAA website.) What 12-year average temperature, measured relative to the 1930-1980 mean, exceeds 99% of all such 12-year averages? From Table A, Pr(z > 2.33) = .01, so x = (2.33)(.0716) = .1668 degrees above the mean.

If we were to calculate the temperature averaged over the 12 years from 2000 to 2011, there would be a 1% chance of finding an increase of .1668 degrees or more over the 1930-1980 average if the null hypothesis were true. In other words, if the null hypothesis is true (the temperature has not increased since the period 1930-1980), we'd make a Type I error if we found an average temperature increase of .1668 degrees or more. The probability of this Type I error is 1%.

Now suppose that you actually measured that the average temperature over the years 2000 to 2011 has increased by 0.3 degrees above the average from 1930 to 1980. If we use an alpha value of .01, what is the probability of a Type II error? In other words, given a normal distribution with a standard error of .0716 and a mean of 0.3, what is the probability that a sample mean will fall below the old critical value of .1668?

Pr(x < .1668) = Pr(z < (.1668 − .3)/.0716) = Pr(z < −1.86) = .0314

If the probability of a Type II error is .0314 (that is, if the real temperature increase is 0.3 degrees, we would fail to detect it 3.14% of the time), then what is the probability that we will correctly detect a temperature change of 0.3 degrees or higher?

Pr(correct rejection of H0) = 1 − Pr(Type II error) = 1 − .0314 = .9686

The probability of a correct rejection of H0 is the power of the test. The power here is .9686: we will correctly reject H0 96.86% of the time.
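
The whole temperature example can be reproduced with a short Python sketch (scipy's normal distribution; the variable names are mine):

```python
import math
from scipy.stats import norm

sigma, n = 0.2482, 12              # population sd of yearly temperatures, years averaged
sem = sigma / math.sqrt(n)         # standard error of a 12-year mean, about 0.0716

alpha  = 0.01
x_crit = norm.ppf(1 - alpha) * sem # critical increase, about 0.167 degrees

mu_true = 0.3                      # supposed actual increase, in degrees
beta  = norm.cdf((x_crit - mu_true) / sem)  # Pr(Type II error), about 0.03
power = 1 - beta                            # about 0.97

print(f"SEM = {sem:.4f}, critical increase = {x_crit:.4f}")
print(f"beta = {beta:.4f}, power = {power:.4f}")
```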

We just showed that we have a 96.86% chance of correctly detecting a 0.3-degree average temperature increase since 2000. The actual measured temperature increase has been 1.67 degrees since 1930-1970. The probability of this happening under the null hypothesis is Pr(z < −21). Not quite impossible, but effectively so.

[Figure: Northern Hemisphere relative temperature (degrees) by year, 1930-2010.]

Example: IQs are normally distributed with a mean of 100 and a standard deviation of 15. Suppose you sampled 100 students, calculated the sample mean, and are about to test for a significant increase in IQ using a one-tailed z-test with α = .05. What is the power of this test under the assumption that the true population mean for the group that we're sampling is 103?

Answer: First, we'll convert everything to z scores. This makes μhyp = 0 (always), and μtrue = (103 − 100)/(15/√100) = 3/1.5 = 2.

To calculate power:

1) Find the critical value of z under the null hypothesis. With α = .05, zcrit = 1.64 (Table A, column C), so the rejection region is z > 1.64.
2) Relative to the true distribution, the rejection region is shifted over by μtrue − μhyp = 2 − 0 = 2: z > 1.64 − 2, which is z > −.36.
3) Find 1 − β, the area in the new rejection region: Pr(z > −.36) = .1406 + .5 = .6406.

A power of .6406 means that there is a 64.06% chance of correctly rejecting the null hypothesis (that is, of not making a Type II error): power = 1 − β = .6406.
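
A minimal sketch of this calculation in Python (scipy; the tiny difference from the slide comes from using zcrit = 1.645 instead of the table's rounded 1.64):

```python
import math
from scipy.stats import norm

mu_hyp, mu_true, sigma, n, alpha = 100, 103, 15, 100, 0.05

sem       = sigma / math.sqrt(n)             # 1.5
mu_true_z = (mu_true - mu_hyp) / sem         # true mean in z units = 2.0
z_crit    = norm.ppf(1 - alpha)              # about 1.645
power     = 1 - norm.cdf(z_crit - mu_true_z) # Pr(z > -0.355), about 0.64

print(f"power = {power:.4f}")
```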

Things that affect power: variability of the measure

Power increases as the standard error of the mean decreases. Ways to decrease the standard error of the mean: increase the sample size, or make more accurate measurements.

Things that affect power: level of significance (α)

Power decreases as alpha (α) decreases. For example, with μtrue = 1.0 (one-tailed z-test): α = .05 gives power = 0.2595; α = .025 gives power = 0.1685; α = .01 gives power = 0.0924; α = .001 gives power = 0.0183.

This is a classic tradeoff: the less willing we are to make a Type I error, the more likely we are to make a Type II error.

Things that affect power: difference between μtrue and μhyp

Power increases with effect size, i.e., as the difference between the mean for the true population and the mean under the null hypothesis increases. For example, with a one-tailed z-test at α = .05: μtrue = 0.25 gives power = 0.0815; μtrue = 0.5 gives power = 0.1261; μtrue = 1.0 gives power = 0.2595; μtrue = 4.0 gives power = 0.9907.

We don't have control over this: μtrue is the one thing we don't know (but want to estimate).
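
The power values quoted on the last two slides can be reproduced with a short Python loop (scipy, one-tailed z-test):

```python
from scipy.stats import norm

def power_one_tailed_z(mu_true, alpha):
    """Power of a one-tailed z-test when the true mean is mu_true (in z units)."""
    return 1 - norm.cdf(norm.ppf(1 - alpha) - mu_true)

# Power shrinks as alpha shrinks (mu_true fixed at 1.0):
for a in (0.05, 0.025, 0.01, 0.001):
    print(f"alpha = {a}: power = {power_one_tailed_z(1.0, a):.4f}")

# Power grows with the distance between mu_true and mu_hyp (alpha fixed at .05):
for m in (0.25, 0.5, 1.0, 4.0):
    print(f"mu_true = {m}: power = {power_one_tailed_z(m, 0.05):.4f}")
```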

Power curve: shows how power increases with effect size.

[Figure: power vs. effect size (d) for a two-tailed test with α = .05 and sample size = 50.]

[Power curve figures: each panel plots power against effect size d (0.1 to 1.4) for sample sizes n = 8, 10, 12, 15, 20, 25, 30, 40, 50, 75, 100, 150, 250, 500, and 1000. There is one family of curves for each combination of α = 0.01 or 0.05, 1-tail or 2-tails, and 1 mean or 2 means.]

Example: Suppose we’re conducting a two-tailed t-test with one mean with a = .05 with a sample size of n=50. How much of an effect size do we need to obtain a power value of 0.8? Answer: Looking at the appropriate family of power curves, the curve with n=50 passes through a power value of 0.8 when the effect size is 0.4. Example: Suppose we’re conducting a one-tailed t-test with one mean with a = .01 and we have an effect size of 0.6. How large of a sample size do we need to get a power value of 0.8? Answer: Looking at the appropriate family of power curves, looking at a power value of 0.4, the curve with n=30 passes through a power value of 0.8.

Example: You decide to sample the test scores of 63 dazzling cats from a population, and obtain a mean test score of 25.6 and a standard deviation of 2.77. Using an alpha value of α = 0.01, is this observed mean significantly different from an expected test score of 25? What is the effect size? What is the power?

Answer (two-tailed t-test for one mean): t(62) = (25.6 − 25)/(2.77/√63) = 1.72 and tcrit = ±2.6575, so we fail to reject H0. The test scores of dazzling cats are not significantly different from 25.

Effect size: g = (25.6 − 25)/2.77 = 0.2166. Power = 0.1759.
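
A hedged Python reconstruction of this answer (power here is computed from the noncentral t distribution; the slide's value was presumably read from a power curve or table, so small differences are expected):

```python
import math
from scipy.stats import t, nct

mu_hyp, x_bar, s, n, alpha = 25, 25.6, 2.77, 63, 0.01
df = n - 1

t_obs  = (x_bar - mu_hyp) / (s / math.sqrt(n))   # about 1.72
t_crit = t.ppf(1 - alpha / 2, df)                # about 2.66 (two-tailed)
g      = (x_bar - mu_hyp) / s                    # effect size, about 0.22

# Treat the observed effect size as the true one and ask how often
# a two-tailed test at alpha = .01 would reject:
ncp   = g * math.sqrt(n)                                            # noncentrality parameter
power = (1 - nct.cdf(t_crit, df, ncp)) + nct.cdf(-t_crit, df, ncp)  # roughly 0.17-0.18

print(f"t({df}) = {t_obs:.2f}, t_crit = {t_crit:.4f}, g = {g:.4f}, power = {power:.4f}")
```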

Power curves for independent samples t-test (2 means): Remember, the power of a test is the probability of correctly rejecting H0 when it is false. Power depends on α, nX, nY, and the effect size.

Example (again): The heights of the 45 students in Psych 315 with fathers above 70 inches have a mean of 66.8 inches and a standard deviation of 4.14 inches. The heights of the remaining 51 students have a mean of 64.9 inches and a standard deviation of 3.62 inches. What is the effect size (use α = .05, two-tailed)? Pooling the standard deviations gives about 3.87, so d = (66.8 − 64.9)/3.87 ≈ 0.49. This is a medium effect size. Remember, it was a 'significant' t-test. Assuming that our observed difference is the true population difference, what is the power of this test? How large a sample would we need to obtain a power of 0.8?

[Power curves: α = 0.05, 2-tails, 2 means.] For an effect size of 0.49 and sample sizes of about 50 per group, the power is about .69. To reach a power of 0.8, we'd need between 50 and 75 subjects per group.
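
A sketch of the two-sample version in Python, again using the normal approximation rather than the power curves (so treat the numbers as approximate, and the variable names as mine):

```python
import math
from scipy.stats import norm

m1, s1, n1 = 66.8, 4.14, 45   # students with fathers above 70 inches
m2, s2, n2 = 64.9, 3.62, 51   # remaining students

# Pooled standard deviation and two-sample effect size
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp            # about 0.49, a medium effect size

# Approximate power of a two-tailed test at alpha = .05 with these group sizes
z_crit = norm.ppf(0.975)
ncp = d * math.sqrt(n1 * n2 / (n1 + n2))                       # noncentrality (normal approx.)
power = 1 - norm.cdf(z_crit - ncp) + norm.cdf(-z_crit - ncp)   # about 0.67; the curves give ~.69

# Per-group n needed for power = 0.8 at this effect size
n_per_group = math.ceil(2 * ((z_crit + norm.ppf(0.8)) / d) ** 2)   # about 66, i.e. between 50 and 75

print(f"d = {d:.2f}, power = {power:.2f}, n per group for 80% power = {n_per_group}")
```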

Example: Suppose we’re conducting a two-tailed t-test with a = Example: Suppose we’re conducting a two-tailed t-test with a = .05 with a sample size of n=50. How much of an effect size do we need to obtain a power value of 0.8? Answer: Looking at the appropriate family of power curves, the curve with n=50 passes through a power value of 0.8 when the effect size is 0.4. Example: Suppose we’re conducting a one-tailed t-test with a = .01 and we have an effect size of 0.6. How large of a sample size do we need to get a power value of 0.8? Answer: Looking at the appropriate family of power curves, looking at a power value of 0.4, the curve with n=30 passes through a power value of 0.8.

Example: You decide to sample the test scores of 63 dazzling cats from a population and obtain a mean test scores of 25.6 and a standard deviation of 2.77. Using an alpha value of α = 0.01, is this observed mean significantly different than an expected test scores of 25? What is the effect size? What is the power?

Example) You decide to sample the test scores of 63 dazzling cats from a population and obtain a mean test scores of 25.6 and a standard deviation of 2.77. Using an alpha value of α = 0.01, is this observed mean significantly different than an expected test scores of 25? What is the effect size? What is the power? Answer) (Two tailed t-test for one mean) We fail to reject H0 (t(62) = 1.72, tcrit = ±2.6575). The test scores of dazzling cats is not significantly different than 25. Effect size: 0.2166 Power = 0.1759