THE DISTRIBUTION OF SAMPLE MEANS How samples can tell us about populations.


Review
All your experience so far has focused on single scores within a distribution
  We learned how to convert raw scores to z-scores
  We computed the probability of obtaining a particular range of scores
  We computed the range of scores associated with particular probabilities
Now we want to apply the same type of logic in thinking about samples within a population
How are things different now that we are dealing with a set of scores rather than just a single score?

Random sampling
Recall: difference between populations and samples
Population: the entire set of individuals of interest
Sample: a smaller subset of the set of individuals of interest
Examples (population: sample)
  All US voters: all AL voters
  Lawyers: all who pass the bar in 2012
  BSC students: all in this classroom

Random sampling
If samples are chosen at random from populations, then they will be representative of those populations
Population: all U.S. voters. Candidate samples of voters:
  Random sample of all eligible voters nationwide
  Random sample of all eligible voters in 5 randomly selected states
  Random sample of Fox News viewers
  Random sample of Fox News viewers and MSNBC viewers

Sampling distributions
If samples are chosen at random from populations, then they will be representative of those populations
  Should have similar means, standard deviations
Because samples do not contain all members of the population, they may be slightly different from the population due purely to chance
  This difference is sampling error

Sampling distributions
Imagine we are interested in US presidents
Population: all 44 US presidents
  Average age: 54.6 years
  Average height: in
  Died in office: 18%
Samples
  Last 5: average age 56.0 years, average height 73.1 in, died in office 0%
  20th century: average age 51.6 years, average height 70.3 in, died in office 40%
  First 5: average age 58 years, average height 70.2 in, died in office 0%
No sample will perfectly match the population

Sampling distributions
If samples are chosen at random from populations, then they will be representative of those populations
  Should have similar means, standard deviations
Because samples do not contain all members of the population, they may be slightly different from the population due purely to chance
  This difference is sampling error
To account for sampling error, sample repeatedly from the same population and record the results of each sample
  Produces a sampling distribution

Sampling distributions
Imagine we are interested in US presidents
Repeatedly sample sets of 5 presidents and record average age
  Recorded averages: 54.6 years, 52 years, 49.4 years, 57.2 years, 59.8 years
Example samples
  Last 5 presidents: average age 56.0 years, average height 73.1 in, died in office 0%
  20th century presidents: average age 51.6 years, average height 70.3 in, died in office 40%
  First 5 presidents: average age 58 years, average height 70.2 in, died in office 0%

Sampling distributions
Imagine we are interested in US presidents
Repeatedly sample sets of 5 presidents and record average age
  Recorded averages: 54.6 years, 52 years, 49.4 years, 57.2 years, 59.8 years
After 20 samples

Sampling distributions
Imagine we are interested in US presidents
Repeatedly sample sets of 5 presidents and record average age
  Recorded averages: 54.6 years, 52 years, 49.4 years, 57.2 years, 59.8 years
After 60 samples

Sampling distributions
Imagine we are interested in US presidents
Repeatedly sample sets of 5 presidents and record average age
  Recorded averages: 54.6 years, 52 years, 49.4 years, 57.2 years, 59.8 years
After hundreds of samples

Sampling distributions
Imagine we are interested in US presidents
Repeatedly sample sets of 5 presidents and record average age
  Recorded averages: 54.6 years, 52 years, 49.4 years, 57.2 years, 59.8 years
After all possible samples
  Every combination of 5 presidents: 1,086,008
  The result is the sampling distribution
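The count of 1,086,008 is the number of ways to choose 5 presidents out of 44, and the repeated-sampling idea can be sketched in a few lines. This is only an illustration: the 44 ages are randomly generated stand-ins (an assumption, not the real presidential data), and the names `population`, `sample_mean`, and `sample_means` are made up for this example.

```python
import math
import random

# Number of distinct samples of 5 that can be drawn from 44 presidents.
n_samples = math.comb(44, 5)

# Hypothetical population: 44 made-up ages (NOT the real presidential ages).
rng = random.Random(0)
population = [rng.randint(42, 70) for _ in range(44)]
population_mean = sum(population) / len(population)

def sample_mean(k=5):
    """Draw k members at random without replacement and return their mean age."""
    sample = rng.sample(population, k)
    return sum(sample) / k

# Approximate the sampling distribution with many repeated samples.
sample_means = [sample_mean() for _ in range(10_000)]
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(n_samples)
print(round(population_mean, 2), round(mean_of_sample_means, 2))
```

Individual sample means scatter around the population mean (sampling error), but their average settles very close to it as the number of samples grows.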

ERROR AND POWER The tradeoffs of different types of mistakes

α levels in hypothesis testing
When testing hypotheses, we arbitrarily set the α level at .05
  Customary value for psychological studies
What does this mean?

α levels in hypothesis testing
When testing hypotheses, we arbitrarily set the α level at .05
  Customary value for psychological studies
  Requires that a sample mean this extreme have less than a 5% chance of occurring if it came from the default population
But α levels can be selected to be any value from 0 to 1.0
What happens to critical regions/decision rules as α is adjusted?
  α = .01 (vs. α = .05): smaller critical region, less likely to reject H0, more conservative
  α = .10 (vs. α = .05): larger critical region, more likely to reject H0, more liberal
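To see how the critical region moves with α, here is a small sketch that computes the z cutoff for each α level. It assumes a two-tailed z-test (the slide does not say which kind of test is pictured), and `two_tailed_critical_z` is a name invented for this example.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Return the z cutoff that leaves alpha/2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Smaller alpha pushes the cutoff farther out, so the critical region shrinks.
critical = {alpha: two_tailed_critical_z(alpha) for alpha in (0.01, 0.05, 0.10)}

for alpha, z in critical.items():
    print(f"alpha = {alpha:.2f}: reject H0 if |z| > {z:.3f}")
```

The cutoff is largest for α = .01 and smallest for α = .10, which matches the conservative versus liberal labels above.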

α levels in hypothesis testing
When testing hypotheses, we arbitrarily set the α level at .05
  Customary value for psychological studies
  Requires that a sample mean this extreme have less than a 5% chance of occurring if it came from the default population
But α levels can be selected to be any value from 0 to 1.0
What happens to critical regions/decision rules as α is adjusted?
So why not set α to be very high and find lots of interesting things?

α levels in hypothesis testing
When testing hypotheses, we arbitrarily set the α level at .05
  Customary value for psychological studies
  Requires that a sample mean this extreme have less than a 5% chance of occurring if it came from the default population
But α levels can be selected to be any value from 0 to 1.0
What happens to critical regions/decision rules as α is adjusted?
So why not set α to be very high and find lots of interesting things?
  Because you’ll make a lot of mistakes!

Errors
Mistakes in hypothesis testing are more formally called errors
Two different types of errors can be made
Type I errors: reject H0 when you should not
  Conclude that the sample mean likely comes from a different population, when it comes from the default population
  You detect differences that don’t really exist
  Also called false alarms, false positives
Type II errors: fail to reject H0 when you should reject it
  Conclude that the sample mean likely comes from the default population, when it comes from a different population
  You fail to detect differences that really do exist
  Also called misses, false negatives

Errors
Mistakes in hypothesis testing are more formally called errors
Two different types of errors can be made
But decisions can also be correct
Two different ways to be correct
Power: reject H0 when you should
  Conclude that the sample mean likely comes from a different population, when it does come from a different population
  You detect differences that actually exist
  Ability to detect differences
Confidence: fail to reject H0 when you should
  Conclude that the sample mean likely comes from the default population, when it does come from the default population
  You don’t detect differences that don’t exist
  Certainty there are no differences

Errors
Mistakes in hypothesis testing are more formally called errors
Two different types of errors can be made
But decisions can also be correct
Two different ways to be correct
Correct and incorrect decisions are necessarily related
  Cannot be simultaneously correct and incorrect
  All decisions must be either correct or incorrect

Errors
Imagine you want to know whether women in sororities get different grades than the average collegiate woman. You collect a sample of women in sororities and record their grades, then compare that to the grades of the broader population of women.
What are the two possible states of the world? What are the two possible realities?
  H0: sorority sisters = all other women
  H1: sorority sisters ≠ all other women
What are the two possible decisions you can make? What conclusions can you draw from your hypothesis testing?
  Fail to reject H0: sorority sisters = all other women
  Reject H0: sorority sisters ≠ all other women

Errors
H0: sorority sisters = all other women; H1: sorority sisters ≠ all other women
                               Reality: H0 true            Reality: H1 true
Decision: fail to reject H0    correct (confidence)        incorrect (type II error)
Decision: reject H0            incorrect (type I error)    correct (power)

Errors
H0: sorority sisters = all other women; H1: sorority sisters ≠ all other women
                               Reality: H0 true            Reality: H1 true
Decision: fail to reject H0    correct (confidence)        incorrect (type II error)
Decision: reject H0            incorrect (type I error)    correct (power)
Note: you make one decision or the other, and are either correct or incorrect
Probabilities of being correct and incorrect must therefore add up to 1.0 for each possible reality
  When H0 is true: confidence + type I error = 1.0
  When H1 is true: type II error + power = 1.0

Errors
H0: sorority sisters = all other women; H1: sorority sisters ≠ all other women
                               Reality: H0 true                 Reality: H1 true
Decision: fail to reject H0    correct (confidence), 1 - α      incorrect (type II error), β
Decision: reject H0            incorrect (type I error), α      correct (power), 1 - β
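The four cells of the table can be estimated by simulation. The sketch below assumes a two-tailed z-test with a known population standard deviation, and every number in it (`MU0`, `MU1`, `SIGMA`, `N`) is a hypothetical value chosen for illustration, not something taken from the slides.

```python
import random
from statistics import NormalDist

rng = random.Random(1)

# Hypothetical values: mean grade under H0, mean under H1, SD, and sample size.
MU0, MU1, SIGMA, N, ALPHA = 3.0, 3.2, 0.5, 25, 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed cutoff

def rejects_h0(true_mean):
    """Draw one sample from the given reality and apply the decision rule."""
    sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
    z = (sum(sample) / N - MU0) / (SIGMA / N ** 0.5)
    return abs(z) > Z_CRIT

TRIALS = 20_000

# Reality: H0 true. Every rejection is a type I error.
type_i_rate = sum(rejects_h0(MU0) for _ in range(TRIALS)) / TRIALS
confidence = 1 - type_i_rate

# Reality: H1 true. Every rejection is a correct detection (power).
power = sum(rejects_h0(MU1) for _ in range(TRIALS)) / TRIALS
type_ii_rate = 1 - power

print(f"H0 true: type I = {type_i_rate:.3f}, confidence = {confidence:.3f}")
print(f"H1 true: type II = {type_ii_rate:.3f}, power = {power:.3f}")
```

The simulated type I rate should land near the chosen α, while the type II rate depends on how far apart the two hypothetical means are.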

Errors
When we choose α for a hypothesis test, we are setting the acceptable level of type I error
  So α = .05 means that, when the null hypothesis is true, we expect no more than 5% false positives: claiming there is a difference between the sample and population when there is not one
By extension, we are also setting the confidence in our evaluation of the null hypothesis, as it is 1 - α
  So an α = .05 means we have 95% confidence in concluding that the sample mean came from the default population
As a thought experiment, how do you think α levels should affect type II error and power?

Errors
If α levels are very low (decision rules are very conservative)
  Few type I errors
  Increased confidence in absence of difference between sample and population
  Lots of type II errors
  Reduced power to detect real differences between sample and population
If α levels are very high (decision rules are very liberal)
  Lots of type I errors
  Reduced confidence in stating no difference between sample and population
  Few type II errors
  Increased power to detect real differences between sample and population
Inverse relationship between α and β
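The inverse relationship can be checked numerically. This sketch assumes a one-tailed z-test in which β equals Φ(z_crit - d·√n), where d is the true difference in standard deviation units; the effect size of 0.5 and sample size of 16 are hypothetical, and `beta_one_tailed` is a name invented here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def beta_one_tailed(alpha, effect_size, n):
    """Type II error rate of a one-tailed z-test.

    effect_size is the true mean difference in SD units (an assumed value).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_crit - effect_size * n ** 0.5)

# Hold effect size and sample size fixed; vary only alpha.
betas = {a: beta_one_tailed(a, effect_size=0.5, n=16) for a in (0.01, 0.05, 0.10)}

for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With everything else held fixed, lowering α raises β and raising α lowers it.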

Errors
Choice of α should reflect costs of type I vs. type II error
When type I errors are worse than type II errors, set α low; β will be high
  Example: evaluating new medications to treat advanced forms of cancer
  Benefits of such medications likely to be low, side effects severe
  Do not want to approve drugs that produce negative side effects if they are not significantly improving patients’ lives
  Might be better for the FDA to reject some medications that work rather than allow some that do not work to be approved

Errors
Choice of α should reflect costs of type I vs. type II error
When type I errors are worse than type II errors, set α low; β will be high
When type II errors are worse than type I errors, set α high; β will be low
  Example: deciding whether or not a snake might be dangerous
  If you fail to avoid a dangerous snake, serious injury/death can result
  Not a huge cost to avoid even non-dangerous species of snake
  Might be better for your brain to be wired to assume that most snakes are dangerous, even if many of those decisions will be wrong

Errors
Choice of α should reflect costs of type I vs. type II error
When type I errors are worse than type II errors, set α low; β will be high
When type II errors are worse than type I errors, set α high; β will be low
Other examples of when type I errors are worse? Type II errors?

Errors and power
Knowing that α and β are inversely related does not really address exactly how they are related
Power and type II error (β) are determined by three primary factors
  α level/location (one- vs. two-tailed)
  Sample size
  Effect size
In order to understand how these factors influence power, need to visualize the underlying distributions
  How are the sampling distributions of the default population and the alternative population related?

Errors and power
Do women in sororities get higher grades?
[Figure: two overlapping distributions, each centered on its own μ]
  Distribution of grades for general population: sample will come from here if H0 is true
  Distribution of grades for sorority sisters: sample will come from here if H1 is true
Now imagine choosing an α of .05

Errors and power
Do women in sororities get higher grades?
[Figure: distribution of grades for general population and distribution of grades for sorority sisters, each centered on its own μ]
If H0 true: 5% chance of type I error, 95% confidence

Errors and power
Do women in sororities get higher grades?
[Figure: distribution of grades for general population and distribution of grades for sorority sisters, each centered on its own μ]
If H1 true: 30% chance of type II error, 70% power

Errors and power
Do women in sororities get higher grades?
[Figure: distribution of grades for general population and distribution of grades for sorority sisters, each centered on its own μ, with regions labeled confidence, power, α, and β]

Errors and power
Do women in sororities get higher grades?
[Figure: distribution of grades for general population and distribution of grades for sorority sisters, each centered on its own μ, with regions labeled confidence, power, α, and β]
What happens if we increase α?

Errors and power
Do women in sororities get higher grades?
[Figure: distribution of grades for general population and distribution of grades for sorority sisters, each centered on its own μ]
What happens if we decrease α?

Errors and power
How does sample size influence β/power?
Remember, these are distributions of sample means
As sample size increases, the sample means cluster more tightly around the population mean
[Figure: distributions of sample means for n = 10, n = 25, and n = 100]
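The tightening with sample size follows from the standard error of the mean, σ/√n. A quick sketch for the three sample sizes on the slide, using a hypothetical population standard deviation of 0.5 grade points (`SIGMA` is an assumed value, not from the slides):

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the distribution of sample means for samples of size n."""
    return sigma / math.sqrt(n)

SIGMA = 0.5  # hypothetical population SD of grades

standard_errors = {n: standard_error(SIGMA, n) for n in (10, 25, 100)}

for n, se in standard_errors.items():
    print(f"n = {n:3d}: standard error = {se:.3f}")
```

Quadrupling the sample size from 25 to 100 halves the spread of the sample means, while the center of the distribution stays where it was.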

Errors and power
How does sample size influence β/power?
Remember, these are distributions of sample means
Larger samples will produce less overlap between distributions, holding all else equal

Errors and power
Do women in sororities get higher grades?
[Figure: distribution of grades for general population and distribution of grades for sorority sisters, each centered on its own μ]
n = 30
  Notice large region of overlap

Errors and power
Do women in sororities get higher grades?
[Figure: distribution of grades for general population and distribution of grades for sorority sisters, each centered on its own μ]
n = 100
  Overlap much smaller
  Notice the means do not change

Errors and power
Do women in sororities get higher grades?
[Figure: distribution of grades for general population and distribution of grades for sorority sisters, each centered on its own μ]
n = 100, α = .05, β = .025

Errors and power
How does sample size influence β/power?
Remember, these are distributions of sample means
Larger samples will produce less overlap between distributions, holding all else equal
  Produces smaller β
  Increases power of hypothesis test
  No change in confidence for a given α level
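As a rough check on the sample-size effect, the sketch below assumes a one-tailed z-test and backs out the effect size that would give β = .025 at n = 100 with α = .05, then asks what power the same effect would have at n = 30. The slides do not state the effect size or the type of test, so both are assumptions, and `power_one_tailed` is a name invented for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05

def power_one_tailed(effect_size, n, alpha=ALPHA):
    """Power of a one-tailed z-test for a true difference of effect_size SDs."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - effect_size * n ** 0.5)

# Assumed effect size (in SD units) chosen so that beta = .025 when n = 100.
effect_size = (STD_NORMAL.inv_cdf(1 - ALPHA) - STD_NORMAL.inv_cdf(0.025)) / 100 ** 0.5

power_by_n = {n: power_one_tailed(effect_size, n) for n in (30, 100)}

for n, power in power_by_n.items():
    print(f"n = {n:3d}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Under these assumptions the same true difference is detected far less often with 30 participants than with 100, even though α, and therefore confidence, is identical in both cases.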