Chapter 12: Testing hypotheses about single means (z and t)


Example: Suppose you hypothesize that UW undergrads have a higher mean IQ than the US population. You know that the IQs of the whole population are normally distributed with a mean of 100 and a standard deviation of 15. How would you test your hypothesis? The solution is to obtain a random sample of IQs from the UW population, calculate the mean, and compare it to 100. Let’s say we measure the IQs of 25 students and obtain a mean of 106 points.

Is a mean of 106 points really different from 100? We need to compare the mean of our sample to the US population mean of 100 and ask: what if the population of UW students really has a mean IQ of 100, like the US population? How unlikely would it be for us to make this observation by chance? More specifically, how likely would it be for us to draw a sample mean that exceeds 100 by 6 points or more by chance? If it’s sufficiently unlikely, we’d consider this evidence in favor of our hypothesis that UW students have higher IQs.

More formally, we call the thing we’re trying to prove wrong the null hypothesis (H0), and the thing we’re trying to show to be true the alternative hypothesis (HA). In our example, the null hypothesis is that there is no difference between the mean IQ scores of UW students and the US population. The alternative hypothesis is that UW students have a higher mean IQ than the US population. We compute a statistic from a sample and determine how probable a statistic at least as extreme as the one we observed would be if the null hypothesis were true. If this probability is sufficiently low, we reject the null hypothesis. Our criterion for the probability for rejection is called the ‘alpha (α) value’. Choosing a value of alpha is both complicated and somewhat arbitrary (more on this later), but typical values are α = 0.05 or α = 0.01. An alpha value of 0.05 means that we reject the null hypothesis only when there is less than a .05 probability (1 in 20) of observing a sample statistic as extreme as ours if the null hypothesis were true.
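To make the meaning of alpha concrete, here is a minimal simulation sketch. It assumes the null hypothesis is true (samples of 25 IQs drawn from a normal population with mean 100 and standard deviation 15) and counts how often a one-tailed test with α = .05 would reject anyway. The seed, the number of trials, and the variable names are illustrative choices, not part of the original slides.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(12)  # fixed seed (arbitrary choice) so the run is repeatable

MU0, SIGMA, N, ALPHA = 100, 15, 25, 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA)  # one-tailed critical z for alpha
standard_error = SIGMA / sqrt(N)          # standard error of the mean

trials = 10_000
rejections = 0
for _ in range(trials):
    # Draw a sample from a population where H0 really is true.
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (sum(sample) / N - MU0) / standard_error
    if z > z_crit:
        rejections += 1

false_alarm_rate = rejections / trials
print(f"rejected a true H0 in {false_alarm_rate:.1%} of samples")
```

The proportion of false rejections should land close to the chosen alpha, which is what "a criterion of .05" means in practice.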

Here is a step-by-step recipe for hypothesis testing using our UW IQ example.
Step 1: Define the target population. This is the population that we want to make an inference about. In this case, we want to make an inference about UW undergrad IQs.
Step 2: Specify the null hypothesis (H0). This is the hypothesis we hope to reject. In our example, our null hypothesis is that UW students have a mean IQ of 100. We write this as: H0: μ = 100
Step 3: Specify the alternative hypothesis (HA). We must choose between a directional (‘one-tailed’) or non-directional (‘two-tailed’) test here. In our example we are expecting (hoping for) an IQ that is greater than that of the population, so this is a directional, or one-tailed, test. HA: μ > 100

Step 4: Specify the ‘level of significance’ (α) to be used as a criterion for the decision. This is the probability with which we are willing to reject the null hypothesis when it is actually true. We’ll choose α = .05 for this example.
Step 5: Decide on a sample size (n) and draw a random sample from our target population. In our example, our sample had 25 students.
Step 6: Calculate your statistic on your sample (the mean in our example). In our example, we obtained a mean IQ of 106 points.
Step 7: Convert your statistic into standard units with respect to your null hypothesis. In our example, we’ll calculate the z-score using the standard error of the mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 15 / \sqrt{25} = 3$, so $z = (\bar{x} - \mu) / \sigma_{\bar{x}} = (106 - 100) / 3 = 2$.

Step 8: Reject H0 if our observed mean is located in the ‘region of rejection’. For the standard normal (z) distribution, the region of rejection is the upper tail containing a proportion of area equal to α = .05. Looking this up in Table A (Column C), this corresponds to a value of about z = 1.645. Our observed mean corresponds to z = 2, which is within the region of rejection. This means that our observation would be unlikely if the null hypothesis were true. We therefore reject the null hypothesis. We say that “our study shows that UW students have statistically significantly higher IQs than the US population using a criterion value of α = .05.”

What if we had chosen a criterion of α = .01 instead of .05? This corresponds to a rejection region for values of z greater than about 2.33. Our observation of z = 2 does not fall into this region, so in this case we would fail to reject H0. If our choice of criterion (α) seems arbitrary, that’s because it is. To give the reader more information, we can report the probability of our observation under the null hypothesis. In our example, this is the area under the curve above z = 2, or Pr(z > 2) = 0.0228. This value is often called the p-value, and we write p = 0.0228. Note that this p-value falls between our two α values of 0.05 and 0.01.
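The UW IQ example can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist` in place of a printed z table; the function name `one_sample_z` is an illustrative choice, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sigma, n):
    """z-score of a sample mean under H0: mu = mu0, with known sigma."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error

std_normal = NormalDist()  # standard normal: mean 0, sd 1

z = one_sample_z(sample_mean=106, mu0=100, sigma=15, n=25)
p_one_tailed = 1 - std_normal.cdf(z)  # area under the curve above z

for alpha in (0.05, 0.01):
    z_crit = std_normal.inv_cdf(1 - alpha)  # start of the upper-tail rejection region
    decision = "reject H0" if z > z_crit else "fail to reject H0"
    print(f"alpha={alpha}: critical z={z_crit:.3f}, observed z={z:.2f}, {decision}")

print(f"one-tailed p-value = {p_one_tailed:.4f}")
```

Reporting the p-value alongside the decision lets a reader apply whichever alpha they prefer.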

Another example: Suppose we have a drug that we think can influence IQ values. How would we test if this drug has an effect?
Step 1: Define the target population. We’ll be randomly sampling from the US population this time.
Step 2: Specify the null hypothesis (H0). Like before, our null hypothesis is H0: μ = 100
Step 3: Specify the alternative hypothesis (HA). By ‘influence’ we’re not specifically predicting an increase (or decrease) in IQ. So we’ll use a two-tailed test and write: HA: μ ≠ 100.
Step 4: Specify the ‘level of significance’ (α) to be used as a criterion for the decision. We’ll choose α = .05 again.

Step 5: Decide on a sample size (n). Let’s run our experiment on 100 subjects.
Step 6: Calculate your statistic on your sample (the mean in our example). Suppose we obtained a mean IQ of 96 from our 100 subjects.
Step 7: Convert your statistic into standard units with respect to your null hypothesis. In our example, we’ll calculate the z-score using the standard error of the mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 15 / \sqrt{100} = 1.5$, so $z = (\bar{x} - \mu) / \sigma_{\bar{x}} = (96 - 100) / 1.5 = -2.67$.

Step 8: Reject H0 if our observed mean is located in the ‘region of rejection’. We want to find the values of z that have an area of α/2 = .025 in each tail. This corresponds to values of z = ±1.96. Our observed mean corresponds to z = -2.67, which is within the region of rejection. We therefore reject the null hypothesis. We conclude that our drug has a significant influence on IQ values at a criterion level of α = .05.

p-values: Calculating a p-value for a two-tailed test corresponds to calculating the area under the standard normal curve in both the positive and negative directions away from the absolute value of our observed z. The area below z = -2.67 is .0038, which is the same as the area above z = +2.67. So our p-value is p = .0038 x 2 = .0076. The p-value is the probability, if H0 is true, of observing a statistic at least as extreme as the one we obtained.

Note that if we had decided ahead of time to use a one-tailed test, with an alternative hypothesis of HA: μ > 100, our region of rejection for α = .05 would include values of z greater than about 1.645. In this case, with z = -2.67, we would have failed to reject H0 and would conclude that our drug did not significantly increase IQs at a criterion level of α = .05.
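As a check on the drug example, this sketch computes the two-tailed decision and p-value, then repeats the decision for the one-tailed alternative HA: μ > 100. It relies on the standard-library `statistics.NormalDist` rather than Table A; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
mu0, sigma, n, sample_mean, alpha = 100, 15, 100, 96, 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))

# Two-tailed test: alpha is split evenly between the two tails.
z_crit_two = std_normal.inv_cdf(1 - alpha / 2)
reject_two_tailed = abs(z) > z_crit_two
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

# One-tailed test (HA: mu > 100): all of alpha sits in the upper tail.
z_crit_one = std_normal.inv_cdf(1 - alpha)
reject_one_tailed = z > z_crit_one

print(f"z = {z:.2f}")
print(f"two-tailed: critical z = +/-{z_crit_two:.2f}, reject = {reject_two_tailed}, p = {p_two_tailed:.4f}")
print(f"one-tailed (upper): critical z = {z_crit_one:.3f}, reject = {reject_one_tailed}")
```

The same data lead to opposite decisions under the two alternatives, which is why the direction of the test has to be chosen before looking at the sample.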

Example: Suppose the speed of safe exams has a population that is normally distributed with a standard deviation of 6. Without anything better to do, you sample 19 safe exams from this population and obtain a mean speed of 67.2, along with a sample standard deviation. Using an alpha value of α = 0.01, is this observed mean significantly different from an expected speed of 69?
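One way to work this exercise, sketched under the assumption that the known population standard deviation of 6 is the one to use (so a z-test applies and the sample standard deviation is not needed). The slides do not state the answer, so treat the result below as a worked attempt rather than the author's solution.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
mu0, sigma, n, sample_mean, alpha = 69, 6, 19, 67.2, 0.01

standard_error = sigma / sqrt(n)
z = (sample_mean - mu0) / standard_error

# "Different from" implies a two-tailed test, so alpha/2 goes in each tail.
z_crit = std_normal.inv_cdf(1 - alpha / 2)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical z = +/-{z_crit:.2f}, p = {p_two_tailed:.3f}")
print("reject H0" if reject else "fail to reject H0")
```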

The t-distribution: when we don’t know σ
What if we don’t know the standard deviation of the population from which we obtained our sample? This is a much more common situation. How do we estimate this value? Common sense says that we’d use the standard deviation of our sample as an estimate of the population’s standard deviation (and therefore use the standard error of the mean of our sample as an estimate of the population’s standard error of the mean). This is generally correct, but we have to make two changes: 1) We need to change our formula for the standard deviation to use n-1 instead of n: $s = \sqrt{\sum (x - \bar{x})^2 / (n - 1)}$. To get our estimate of the population’s standard error of the mean, we still divide by the square root of our sample size: $s_{\bar{x}} = s / \sqrt{n}$.
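The difference between dividing by n and by n-1 is easy to see on a small data set. The scores below are made up for illustration (they are not from the slides); the sketch also confirms that the standard library's `statistics.stdev` uses the n-1 form while `statistics.pstdev` uses n.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

scores = [98, 104, 110, 95, 101, 112, 99, 105]  # hypothetical sample of IQ scores

n = len(scores)
x_bar = mean(scores)
sum_of_squares = sum((x - x_bar) ** 2 for x in scores)

sd_divide_by_n = sqrt(sum_of_squares / n)               # population formula
sd_divide_by_n_minus_1 = sqrt(sum_of_squares / (n - 1))  # sample estimate of sigma
sem_estimate = sd_divide_by_n_minus_1 / sqrt(n)          # estimated standard error of the mean

print(f"divide by n:   {sd_divide_by_n:.3f}  (statistics.pstdev: {pstdev(scores):.3f})")
print(f"divide by n-1: {sd_divide_by_n_minus_1:.3f}  (statistics.stdev:  {stdev(scores):.3f})")
print(f"estimated standard error of the mean: {sem_estimate:.3f}")
```

The n-1 version is always slightly larger, and the gap shrinks as the sample size grows.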

2) Our standardized measure, $t = (\bar{x} - \mu) / s_{\bar{x}}$, no longer comes from a normal distribution. Instead, it comes from what is called a ‘t-distribution’. What happened to our normal distribution? Note that now the mean and the standard error of the mean both vary for different samples. This increases the probability of very high and very low values, which fattens the tails of the distribution compared to the normal.
[Figure: t-distributions for small sample sizes (including n = 4 and n = 12) compared with the normal distribution (z), n = ∞.]

Unlike our standard normal distribution, our t-distributions are a ‘family’ of curves, one for each sample size (n). We label each family member not by sample size but by ‘degrees of freedom (df)’, which is equal to n-1 for the examples we’re doing here (comparing a single mean to an expected population mean).
[Figure: t-distributions labeled by degrees of freedom (including df = 3 and df = 11) compared with the normal distribution (z), n = ∞.]
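To see how the tails fatten for small df, the sketch below compares the area above t = 2 for a few members of the family with the same area under the normal curve. The standard library has no t-distribution, so the tail area is approximated here by integrating the t density with Simpson's rule; the helper names `t_pdf` and `t_upper_tail` are illustrative, and a statistics package would normally be used instead.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + t * t / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies above 0 by symmetry

tails = {df: t_upper_tail(2.0, df) for df in (3, 11, 71)}
normal_tail = 1 - NormalDist().cdf(2.0)

for df, area in tails.items():
    print(f"df = {df:>2}: area above 2 = {area:.4f}")
print(f"normal : area above 2 = {normal_tail:.4f}")
```

As df grows, the tail area shrinks toward the normal value, which is why the normal curve can be thought of as the member of the family with n = ∞.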

Example: The mean height of the 72 women in our class is 64.5 inches with a standard deviation of 3.28 inches. Is this significantly taller than 64 inches, which is the average height of a woman in the US?

Step 1: Define the target population. We are interested in the heights of the women in our class.
Step 2: Specify the null hypothesis (H0). Our null hypothesis is H0: μ = 64
Step 3: Specify the alternative hypothesis (HA). We’ll use a one-tailed test, since we’re asking if our mean is taller. HA: μ > 64.
Step 4: Specify the ‘level of significance’ (α) to be used as a criterion for the decision. We’ll choose α = .05 again.

Step 5: Decide on a sample size (n). We have a sample of 72 women.
Step 6: Calculate your statistics on your sample. Our sample mean is 64.5 inches and our sample standard deviation is 3.28 inches.
Step 7: Convert your statistic into standard units with respect to your null hypothesis. Since we don’t know the population standard deviation, we’ll use our sample standard deviation and the t-distribution with 72-1 = 71 degrees of freedom: $s_{\bar{x}} = s / \sqrt{n} = 3.28 / \sqrt{72} = 0.387$, so $t = (\bar{x} - \mu) / s_{\bar{x}} = (64.5 - 64) / 0.387 = 1.29$.

Step 8: Reject H0 if our observed mean is located in the ‘region of rejection’. We will use Table D, which contains rejection regions for the t-distribution. We need the critical value for an area of α = .05 in one tail with df = 71; the nearest df in the table is 70. Our region of rejection is for values of t greater than about 1.67. Our observed t = 1.29 is not within the region of rejection. We therefore fail to reject the null hypothesis. We conclude that the average height of women in our class is not significantly greater than that of the US population at a criterion of α = .05.

Calculating p-values using Table D is pretty crude since we only have a limited set of alpha values to choose from. It turns out that our value of t = 1.29 with df = 71 is very close to the critical value of t for α = .10. For this example, our p-value is therefore close to 0.1. (We can always get a more exact value from our t-statistic calculator in the Excel spreadsheet.) This means that if we drew a random sample of 72 heights from women in the US population, there is about a 10% chance that we’d observe a mean as high as or higher than the mean of our class. Our mean is therefore above average, but not exceptionally so.
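The women's height example can be reproduced without Table D. This sketch computes the t statistic and approximates the one-tailed p-value by numerically integrating the t density (Simpson's rule), since Python's standard library has no built-in t-distribution. The helper names are illustrative, and this stands in for the Excel calculator mentioned in the slides rather than reproducing it.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + t * t / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n, sample_mean, sample_sd, mu0, alpha = 72, 64.5, 3.28, 64, 0.05

df = n - 1
sem_estimate = sample_sd / sqrt(n)         # estimated standard error of the mean
t_obs = (sample_mean - mu0) / sem_estimate
p_one_tailed = t_upper_tail(t_obs, df)     # HA: mu > 64, so upper tail only
reject = p_one_tailed < alpha

print(f"t({df}) = {t_obs:.2f}, one-tailed p = {p_one_tailed:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

Comparing the p-value with alpha gives the same decision as comparing t with the critical value from the table.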

Example: The 21 men in our class have a mean height of 70.3 inches with a standard deviation of 2.61 inches. Is this significantly different from 69.5 inches, the average height of a man in the US?
Step 1: Define the target population. We are interested in the heights of the men in our class.
Step 2: Specify the null hypothesis (H0). Our null hypothesis is H0: μ = 69.5
Step 3: Specify the alternative hypothesis (HA). We’ll use a two-tailed test, since we’re asking if our mean is different. HA: μ ≠ 69.5
Step 4: Specify the ‘level of significance’ (α) to be used as a criterion for the decision. We’ll choose α = .05 again.

Step 5: Decide on a sample size (n). We have a sample of 21 men.
Step 6: Calculate your statistics on your sample. Our sample mean is 70.3 inches and our sample standard deviation is 2.61 inches.
Step 7: Convert your statistic into standard units with respect to your null hypothesis. Since we don’t know the population standard deviation, we’ll use our sample standard deviation and the t-distribution with 21-1 = 20 degrees of freedom: $s_{\bar{x}} = s / \sqrt{n} = 2.61 / \sqrt{21} = 0.570$, so $t = (\bar{x} - \mu) / s_{\bar{x}} = (70.3 - 69.5) / 0.570 = 1.40$.

Step 8: Reject H0 if our observed mean is located in the ‘region of rejection’. We will use Table D, which contains rejection regions for the t-distribution. We need the area covering two tails for α = .05 and df = 20. The critical t-value for two tails with α = .05 is the same as the critical t-value for one tail with α = .05/2 = .025, because for two tails our total area of .05 needs to be split into two halves. Our region of rejection is for values of t greater than 2.09 or less than -2.09. Our observed t = 1.40 is not within the region of rejection. We therefore fail to reject the null hypothesis. We conclude that the average height of men in our class is not significantly different from that of the US population at a criterion of α = .05.

Looking at Table D, our observed value of t = 1.40 for df = 20 falls outside the rejection region for an alpha value of 0.05 (two-tailed). This means that our p-value is greater than 0.05. The two-tailed p-value from a t-test calculator is about 0.18.
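The men's height example can be checked the same way. This sketch approximates both the two-tailed p-value and the critical t value (found by bisection on the tail area) using a Simpson's-rule integral of the t density, because the standard library has no t-distribution. The helper names, including `t_critical`, are illustrative assumptions and not a real library API.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + t * t / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(tail_area, df):
    """Find t with P(T > t) = tail_area by bisection (tail area falls as t grows)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, sample_mean, sample_sd, mu0, alpha = 21, 70.3, 2.61, 69.5, 0.05

df = n - 1
t_obs = (sample_mean - mu0) / (sample_sd / sqrt(n))
t_crit = t_critical(alpha / 2, df)               # alpha split across two tails
p_two_tailed = 2 * t_upper_tail(abs(t_obs), df)
reject = abs(t_obs) > t_crit

print(f"t({df}) = {t_obs:.2f}, critical t = +/-{t_crit:.2f}, two-tailed p = {p_two_tailed:.3f}")
print("reject H0" if reject else "fail to reject H0")
```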

Example: in the news. "Freshman 15" weight gain is a myth, new study finds
Reuters - The idea that college freshmen gain an average of 15 pounds in their first year of school is a myth -- the average is really between 2.4 pounds for women and 3.4 pounds for men, the co-author of a new study said Tuesday. "Not only is there not a 'Freshman 15,' there doesn't appear to be even a 'college 15' for most students," said Jay Zagorsky, research scientist at Ohio State University's Center for Human Resource Research and co-author of a study on college weight gain.
Here’s a table of weight gain (in pounds) from the actual publication (Zagorsky & Smith, Social Science Quarterly, 2011):
        Male Freshman    Female Freshman
Mean    3.1 pounds       3.5
sd
n

Let’s look at the women. We don’t know the population standard deviation, so we’ll use a t-test. H0: μ = 15, HA: μ ≠ 15. If we use α = .01, then with n-1 = 2150 degrees of freedom, our critical value of t for a nondirectional (two-tailed) test is about ±2.58. Our observed value of t falls far into the rejection region, so we conclude that college freshmen do not gain 15 lbs. “Our results indicate that the “Freshman 15” is a media myth. While freshmen do gain weight, the observed average increase of 2.5 to 3.5 pounds falls far short of the ominous 15 pounds.”