# THE DISTRIBUTION OF SAMPLE MEANS: How samples can tell us about populations

## Presentation transcript

### Review

- All your experience so far has focused on single scores within a distribution:
  - We learned how to convert raw scores to z-scores.
  - We computed the probability of obtaining a particular range of scores.
  - We computed the range of scores associated with particular probabilities.
- Now we want to apply the same type of logic in thinking about samples within a population.
- How are things different now that we are dealing with a set of scores rather than just a single score?
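
As a refresher on the single-score logic reviewed above, here is a minimal sketch using Python's standard library. The population mean of 100 and standard deviation of 15 are illustrative assumptions, not values from the slides.

```python
from statistics import NormalDist

# Hypothetical population parameters (assumed for illustration only).
MU, SIGMA = 100.0, 15.0

def z_score(x, mu=MU, sigma=SIGMA):
    """Convert a raw score to a z-score."""
    return (x - mu) / sigma

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of obtaining a score between 85 and 115 (z between -1 and +1).
p_range = std_normal.cdf(z_score(115)) - std_normal.cdf(z_score(85))

# Raw score that cuts off the top 5% of the distribution.
cutoff = MU + std_normal.inv_cdf(0.95) * SIGMA

print(f"P(85 < X < 115) = {p_range:.4f}")
print(f"Top 5% begins at X = {cutoff:.2f}")
```

The rest of the deck applies these same steps to sample means instead of single scores.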

### Random sampling

Recall the difference between populations and samples:

- Population: the entire set of individuals of interest
- Sample: a smaller subset of the individuals of interest

| Population | Sample |
| --- | --- |
| All US voters | All AL voters |
| Lawyers | All who pass the bar in 2012 |
| BSC students | All in this classroom |

### Random sampling

If samples are chosen at random from populations, then they will be representative of those populations.

Population: all U.S. voters. Candidate samples of voters:

- Random sample of all eligible voters nationwide
- Random sample of all eligible voters in 5 randomly selected states
- Random sample of Fox News viewers
- Random sample of Fox News viewers and MSNBC viewers

### Sampling distributions

- If samples are chosen at random from populations, then they will be representative of those populations.
  - They should have similar means and standard deviations.
- Because samples do not contain all members of the population, they may differ slightly from the population purely by chance.
  - This difference is sampling error.

### Sampling distributions

Imagine we are interested in US presidents.

| Group | Average age (years) | Average height (in) | Died in office |
| --- | --- | --- | --- |
| Population: all 44 US presidents | 54.6 | 70.86 | 18% |
| Sample: last 5 presidents | 56.0 | 73.1 | 0% |
| Sample: 20th century presidents | 51.6 | 70.3 | 40% |
| Sample: first 5 presidents | 58 | 70.2 | 0% |

No sample will perfectly match the population.

### Sampling distributions

- To combat sampling error, sample repeatedly from the same population and record the results of each sample.
  - This produces a sampling distribution.
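
To make sampling error concrete, the sketch below draws several random samples from one population and reports how far each sample mean lands from the population mean. The population itself (1,000 simulated scores) and the sample size of 25 are assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 scores (illustrative, not from the slides).
population = [random.gauss(100, 15) for _ in range(1000)]

def sampling_errors(pop, n, n_samples):
    """Draw n_samples random samples of size n (without replacement) and
    return each sample mean minus the population mean."""
    pop_mean = mean(pop)
    return [mean(random.sample(pop, n)) - pop_mean for _ in range(n_samples)]

errors = sampling_errors(population, n=25, n_samples=5)
for e in errors:
    print(f"sample mean differs from population mean by {e:+.2f}")
```

Each printed difference is one sample's sampling error. Recording many such sample means is what builds up a sampling distribution.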

### Sampling distributions

Imagine we are interested in US presidents. Repeatedly sample sets of 5 presidents and record the average age of each sample: 54.6 years, 52 years, 49.4 years, 57.2 years, 59.8 years, and so on.

[Figures: distribution of the recorded sample means after 20 samples, after 60 samples, after hundreds of samples, and after all possible samples]

After all possible samples (every combination of 5 presidents: 1,086,008), the result is the sampling distribution.
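
The count of 1,086,008 is the number of ways to choose 5 presidents from 44, which can be checked directly. The second half of this sketch builds a complete sampling distribution for a much smaller, made-up population: the eight ages listed are hypothetical stand-ins, not real presidential data.

```python
import math
from itertools import combinations
from statistics import mean

# Number of distinct samples of 5 that can be drawn from 44 presidents.
n_possible = math.comb(44, 5)
print(n_possible)  # 1086008

# Hypothetical ages for a tiny population of 8 (illustrative only).
ages = [42, 46, 51, 54, 57, 61, 64, 69]

# The full sampling distribution: the mean of every possible sample of 3.
sample_means = [mean(s) for s in combinations(ages, 3)]
print(len(sample_means))  # 56

# The sample means center on the population mean.
print(f"population mean:      {mean(ages):.2f}")
print(f"mean of sample means: {mean(sample_means):.2f}")
print(f"range of sample means: {min(sample_means):.2f} to {max(sample_means):.2f}")
```

Individual sample means vary, but across all possible samples they average out to the population mean.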

# ERROR AND POWER: The tradeoffs of different types of mistakes

### α levels in hypothesis testing

- When testing hypotheses, we arbitrarily set the α level at .05.
  - This is the customary value for psychological studies.
  - It requires that a sample mean this extreme have less than a 5% chance of occurring if the sample came from the default population.
- But α can be set to any value from 0 to 1.0. What happens to critical regions/decision rules as α is adjusted?
  - α = .01 (compared with α = .05): smaller critical region, less likely to reject H₀, more conservative
  - α = .10 (compared with α = .05): larger critical region, more likely to reject H₀, more liberal
- So why not set α to be very high and find lots of interesting things?
  - Because you'll make a lot of mistakes!
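
A quick way to see how the decision rule moves with α is to compute the critical z-value for each level. This sketch assumes a two-tailed z-test; the slides do not specify the number of tails at this point.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """z-value beyond which (in either tail) H0 is rejected at level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.01, 0.05, 0.10):
    z = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 if |z| > {z:.3f}")
```

A smaller α pushes the cutoff farther into the tails (a smaller critical region), and a larger α pulls it toward the center.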

### Errors

Mistakes in hypothesis testing are more formally called errors. Two different types of errors can be made:

| | Type I error | Type II error |
| --- | --- | --- |
| Decision | Reject H₀ when you should not | Fail to reject ("accept") H₀ when you should not |
| What you conclude | The sample mean likely comes from a different population, when it comes from the default population | The sample mean likely comes from the default population, when it comes from a different population |
| Consequence | You detect differences that don't really exist | You fail to detect differences that really do exist |
| Also called | False alarms, false positives | Misses, false negatives |

### Errors

But decisions can also be correct, and there are two different ways to be correct:

| | Power | Confidence |
| --- | --- | --- |
| Decision | Reject H₀ when you should | Fail to reject H₀ when you should |
| What you conclude | The sample mean likely comes from a different population, when it does come from a different population | The sample mean likely comes from the default population, when it does come from the default population |
| Consequence | You detect differences that actually exist | You don't detect differences that don't exist |
| Meaning | Ability to detect differences | Certainty there are no differences |

### Errors

Correct and incorrect decisions are necessarily related:

- A decision cannot be simultaneously correct and incorrect.
- All decisions must be either correct or incorrect.

### Errors

Imagine you want to know whether women in sororities get different grades than the average collegiate woman. You collect a sample of women in sororities and record their grades, then compare that to the grades of the broader population of women.

- What are the two possible states of the world (the two possible realities)?
  - H₀: sorority sisters = all other women
  - H₁: sorority sisters ≠ all other women
- What are the two possible decisions you can make? What conclusions can you draw from your hypothesis testing?
  - Fail to reject H₀: sorority sisters = all other women
  - Reject H₀: sorority sisters ≠ all other women

### Errors

| Decision | Reality: H₀ true (sorority sisters = all other women) | Reality: H₁ true (sorority sisters ≠ all other women) |
| --- | --- | --- |
| Fail to reject H₀ (sorority sisters = all other women) | Correct: confidence (1 − α) | Incorrect: type II error (β) |
| Reject H₀ (sorority sisters ≠ all other women) | Incorrect: type I error (α) | Correct: power (1 − β) |

Note: you make one decision or the other, and are either correct or incorrect. Within each possible reality, the probabilities of being correct and incorrect must therefore add up to 1.0:

- confidence + type I error = 1.0, that is, (1 − α) + α = 1.0
- type II error + power = 1.0, that is, β + (1 − β) = 1.0
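
The four cells of the decision table can be estimated by simulation. The sketch below runs a two-tailed z-test many times under each reality. Every number in it (a default mean GPA of 3.0, an alternative mean of 3.2, σ = 0.5, n = 25) is a hypothetical value chosen for illustration.

```python
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical values (assumed, not from the slides).
MU0, SIGMA, N, ALPHA = 3.0, 0.5, 25, 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed cutoff
SE = SIGMA / N ** 0.5                         # standard error of the mean

def rejection_rate(true_mean, trials=2000):
    """Fraction of simulated samples for which H0 (mean == MU0) is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
        z = (sum(sample) / N - MU0) / SE
        if abs(z) > Z_CRIT:
            rejections += 1
    return rejections / trials

type_i = rejection_rate(true_mean=MU0)  # H0 true: every rejection is a type I error
power = rejection_rate(true_mean=3.2)   # H1 true: every rejection is correct
confidence = 1 - type_i                 # H0 true and not rejected
type_ii = 1 - power                     # H1 true and not rejected

print(f"H0 true: confidence = {confidence:.3f}, type I error = {type_i:.3f}")
print(f"H1 true: type II error = {type_ii:.3f}, power = {power:.3f}")
```

The type I error rate should land near the chosen α, and each pair of probabilities sums to 1.0 within its own reality.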

### Errors

- When we choose α for a hypothesis test, we are setting the acceptable level of type I error.
  - So α = .05 means that we expect to make false positives (claiming there is a difference between the sample and the population when there is not one) less than 5% of the time when H₀ is true.
- By extension, we are also setting the confidence in our evaluation of the null hypothesis, as it is 1 − α.
  - So α = .05 means that, when the sample mean really did come from the default population, we will correctly conclude so 95% of the time.
- As a thought experiment, how do you think α levels should affect type II error and power?

### Errors

- If α levels are very low (decision rules are very conservative):
  - Few type I errors
  - Increased confidence in the absence of a difference between sample and population
  - Lots of type II errors
  - Reduced power to detect real differences between sample and population
- If α levels are very high (decision rules are very liberal):
  - Lots of type I errors
  - Reduced confidence in stating no difference between sample and population
  - Few type II errors
  - Increased power to detect real differences between sample and population
- Inverse relationship between α and β
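
The inverse relationship can be computed directly for a z-test. This sketch assumes an upper-tailed test and a hypothetical distance of 2 standard errors between the default and alternative population means. Neither assumption comes from the slides.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def beta_one_tailed(alpha, shift_in_se):
    """Type II error rate for an upper-tailed z-test.

    shift_in_se: distance between the H0 and H1 means, in standard errors.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # cutoff under H0
    # Under H1 the test statistic is centered on shift_in_se, so beta is
    # the area of the H1 distribution that falls below the cutoff.
    return STD_NORMAL.cdf(z_crit - shift_in_se)

SHIFT = 2.0  # hypothetical separation between the two distributions
for alpha in (0.01, 0.05, 0.10):
    beta = beta_one_tailed(alpha, SHIFT)
    print(f"alpha = {alpha:.2f}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With everything else held fixed, raising α lowers β and raises power, and lowering α does the opposite.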

### Errors

Choice of α should reflect the costs of type I vs. type II error.

- When type I errors are worse than type II errors, set α low; β will be high.
  - Example: evaluating new medications to treat advanced forms of cancer
  - Benefits of such medications are likely to be low, and side effects severe.
  - We do not want to approve drugs that produce negative side effects if they are not significantly improving patients' lives.
  - It might be better for the FDA to reject some medications that work rather than approve some that do not work.

### Errors

- When type II errors are worse than type I errors, set α high; β will be low.
  - Example: deciding whether or not a snake might be dangerous
  - If you fail to avoid a dangerous snake, serious injury or death can result.
  - There is not a huge cost to avoiding even non-dangerous species of snake.
  - It might be better for your brain to be wired to assume that most snakes are dangerous, even if many of those decisions will be wrong.

### Errors

- Other examples of when type I errors are worse? When type II errors are worse?

### Errors and power

- This does not really address exactly how α and β are related.
- Power and type II error (β) are determined by three primary factors:
  - α level and location (one- vs. two-tailed)
  - Sample size
  - Effect size
- To understand how these factors influence power, we need to visualize the underlying distributions.
  - How are the sampling distributions of the default population and the alternative population related?

### Errors and power

Do women in sororities get higher grades?

[Figure: two overlapping distributions, each centered on its own mean μ. One is the distribution of grades for the general population (the sample will come from here if H₀ is true); the other is the distribution of grades for sorority sisters (the sample will come from here if H₁ is true).]

Now imagine choosing an α of .05:

- If H₀ is true: 5% chance of a type I error (α), 95% confidence.
- If H₁ is true: 30% chance of a type II error (β), 70% power.

[Figure: the same two distributions with the regions labeled confidence, α, β, and power, then redrawn as α changes]

- What happens if we increase α?
- What happens if we decrease α?

### Errors and power

How does sample size influence β/power?

- Remember, these are distributions of sample means.
- As sample size increases, the sample means cluster more tightly around the population mean.
- Larger samples will produce less overlap between the distributions, holding all else equal.

[Figure: distributions of sample means for n = 10, n = 25, and n = 100]
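
The narrowing of these distributions is captured by the standard error of the mean, σ/√n. A small sketch for the three sample sizes on the slide, assuming a hypothetical population standard deviation of 0.5:

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the distribution of sample means."""
    return sigma / math.sqrt(n)

SIGMA = 0.5  # hypothetical population standard deviation
for n in (10, 25, 100):
    print(f"n = {n:>3}: standard error = {standard_error(SIGMA, n):.3f}")
```

Because the spread shrinks while the means stay put, two distributions with different means overlap less as n grows.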

### Errors and power

Do women in sororities get higher grades?

[Figures: distribution of grades for the general population and for sorority sisters, each centered on its own mean μ, drawn for two sample sizes]

- n = 30: notice the large region of overlap.
- n = 100: the overlap is much smaller. Notice that the means do not change.
- With n = 100 and α = .05, β = .025.

### Errors and power

For a given α level, the reduced overlap from larger samples:

- Produces a smaller β
- Increases the power of the hypothesis test
- Produces no change in confidence
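
To see the effect of sample size on power in numbers, the sketch below computes power for an upper-tailed z-test at n = 30 and n = 100 with α held at .05. The means (3.0 and 3.15) and the standard deviation (0.5) are hypothetical, so the resulting values will not match the β shown on the slide.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test of H0: mean == mu0 when the true mean is mu1."""
    se = sigma / n ** 0.5
    # Sample means above this cutoff lead to rejecting H0.
    cutoff = mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se
    # Power is the area of the H1 sampling distribution above the cutoff.
    return 1 - NormalDist(mu1, se).cdf(cutoff)

# Hypothetical values (assumed, not from the slides).
MU0, MU1, SIGMA = 3.0, 3.15, 0.5

for n in (30, 100):
    p = power_one_tailed(MU0, MU1, SIGMA, n)
    print(f"n = {n:>3}: power = {p:.3f}, beta = {1 - p:.3f}, confidence = {1 - 0.05:.2f}")
```

Confidence stays at 1 − α for both sample sizes because α is fixed. Only β and power change.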