

Making Inferences

Sample Size, Sampling Error, and 95% Confidence Intervals
Samples are usually necessary (with some exceptions), and they don’t need to be huge to be accurately representative of the entire population you want to study. A huge sample does not guarantee accuracy either: in the 1936 election between Alf Landon and FDR, Literary Digest predicted a sweeping victory for Landon based on a sample of 2 million people, yet FDR won in a landslide.

Sample Size, Sampling Error, and 95% Confidence Intervals
Sampling Error (referred to in these slides as Standard Error): the difference between the estimate obtained from the sample and the true population value, e.g., a president’s approval rating of 52% (±4%). Its size is determined by the sample’s size and standard deviation.
Confidence Level (the range it implies is the Confidence Interval): a 95 percent confidence level means that about 95 out of 100 samples that might be selected would generate an estimate of presidential approval within the range of 48-56%.

The Mayor and Your Job as Lead Pollster

The Normal Distribution & Sampling
Example: For his upcoming reelection campaign, Michael Bloomberg wants to know how many Independents there are in New York City, whose population has grown rapidly over the last several years. Although the N.Y. Bureau of Elections reports that 25% of registered voters claim “Independent” status, he wants to test the validity of this figure. Consequently, Bloomberg asks you to conduct a poll to estimate the proportion of NYC citizens, 18 years or older, who are Independents rather than Democrats or Republicans. You interview 10 randomly chosen individuals and find that 2 of them are registered to vote as Independents. Based on this finding you might think that the proportion is closer to 20%, which is a little below the last reported proportion. This difference, 5 percentage points, is called the sampling error. What you need is some way to measure the uncertainty in your estimate, so that you can tell Mr. Bloomberg what the margin of error is.

The Normal Distribution & Sampling, cont’d
Let’s say you repeat the interview procedure 4 more times and get estimates of 20%, 30%, 40% and 20% (that is, 2, 3, 4 and 2 Independents, each out of 10 respondents). Their average is (20% + 30% + 40% + 20%) ÷ 4 = 27.5%, which is not too far from the originally reported value of 25%. What would happen if you repeated the process over and over, say, for 1,000 independent samples of 10 interviewees, and calculated the proportion of Independents in each one? After a while you would have a substantial list of sample proportions (see the first figure on handout #1). In a simulated example, you end up with a mean proportion of .248 (24.8%) and a standard deviation of .141 (14.1%). Now think of that standard deviation of the sample proportions, the standard error, as an indicator of how much uncertainty there is in your estimate. We see in the first figure, for example, that about two-thirds of the estimates are in the range of .248 ± .141, or between .107 and .389 (10.7% and 38.9%). Consider this the “68% confidence interval.” The other third of the samples are below .107 (10.7%) or above .389 (38.9%).
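The repeated-sampling experiment described here is easy to imitate on a computer. The following is a minimal sketch, not the simulation behind the handout: it assumes the true proportion of Independents is exactly 25%, uses Python's standard `random` module with an arbitrary fixed seed, and the function name `simulate_sample_proportions` is made up for illustration. Its output will be close to, but not identical to, the handout's figures.

```python
import random
import statistics


def simulate_sample_proportions(p=0.25, n=10, num_samples=1000, seed=0):
    """Draw num_samples polls of n voters each; return each poll's proportion.

    p is the ASSUMED true share of Independents (not known in real polling).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each respondent is an Independent with probability p.
        independents = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(independents / n)
    return proportions


props = simulate_sample_proportions()
mean_p = statistics.mean(props)   # average of the 1,000 sample proportions
sd_p = statistics.pstdev(props)   # their spread, i.e. the standard error

print(f"mean proportion = {mean_p:.3f}")
print(f"standard deviation = {sd_p:.3f}")
```

Changing `n` to 50 or 500 in the call shows how the spread of the sample proportions shrinks as each poll gets larger.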

The Normal Distribution & Sampling, cont’d
In other words, after 1,000 samples of 10 randomly chosen NYC voters, you could tell Michael Bloomberg that the proportion of Independents in New York City is probably between 10% and 39%. When he asks, “What do you mean by ‘probably’?” a technical answer on your part would be, “I’m 68% sure.” Not surprisingly, he threatens to fire you by the end of the month if you can’t do better than that. He wants you to narrow the range of uncertainty. What do you do? You ask for another $1 million, so that you can take larger samples (e.g., 50 randomly chosen people instead of 10). Repeating the same process of 1,000 samples, but this time with 50 NYC voters in each sample, you get the results in the 2nd figure on handout #1: a mean proportion of .251 (25.1%) and a standard deviation of .064 (6.4%). Now you can tell Mr. Bloomberg that the proportion of Independents in NYC is probably, with 68% confidence, between 19% and 31%. “You’re getting better,” he says, “but I still want some more certainty.” “O.k.,” you say, “show me some more $$ and I’ll get you some more certainty.”
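Why a larger sample narrows the range can be seen from the textbook formula for the standard error of a sample proportion, sqrt(p(1 - p) / n). That formula is not stated in the slides; the sketch below brings it in as an assumption, together with a true proportion of p = 0.25, to show roughly what spread to expect at each sample size used in the story.

```python
import math


def proportion_standard_error(p, n):
    """Textbook standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Sample sizes used in the Bloomberg example; p = 0.25 is an assumption.
expected_se = {n: proportion_standard_error(0.25, n) for n in (10, 50, 500)}

for n, se in expected_se.items():
    print(f"n = {n:>3}: standard error = {se:.3f}")
```

These theoretical values (about .137, .061 and .019) sit close to the simulated standard deviations quoted in the slides (.141, .064 and .019). Because n sits under a square root, cutting the uncertainty in half takes four times as many respondents.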

The Normal Distribution & Sampling, cont’d
By the time you’ve conducted 1,000 samples of interviews with 500 randomly selected NYC voters each (see figure 4 on handout #2), your mean is still .250 (25%), but the standard deviation is now only .019 (essentially 2%). Hence, you can now tell Mr. Bloomberg that the proportion of Independents in NYC is probably, with 95% confidence, between 21% and 29%. (That range is 2 standard deviations either way; 1 standard deviation either way would give you 68% confidence.)
But let’s say that earlier in the process, Mayor Bloomberg randomly surveyed 10 individuals (e.g., friends, butler, misc. staff) about their voter status and found a rate of 35% registered Independents. He asks you to demonstrate why the upcoming campaign shouldn’t work with his figure instead; after all, he got it himself, personally. You have to show him, based on your original study of 1,000 samples of 10 random New York City voters, how likely a finding of 35% registered Independents is. The first step is to compute the amount of random sampling error, or standard error (Pollack, p. 106):
Sampling or standard error = standard deviation ÷ square root of the sample size (n = 10)
Sampling or standard error = 14.1% ÷ 3.16
Sampling or standard error = 4.5%

Inference Using the Normal Distribution & Z Scores
The central limit theorem (Pollack, p. 108) tells us that there is a 68% chance that the true population proportion of NYC voters registered as “Independent” lies within plus or minus 1 standard error of the sample mean (25% in your study), and a 95% chance that it lies within plus or minus 1.96 standard errors of the sample mean (again, 25%). Conversely, there is only a 5% probability that the true population proportion lies more than 1.96 standard errors above or below the sample mean:
“Low” end of 95% confidence interval = sample mean - 1.96 standard errors = 25% - 1.96 (4.5%) = 16.2%
“High” end of 95% confidence interval = sample mean + 1.96 standard errors = 25% + 1.96 (4.5%) = 33.8%
Conclusion: 95% of all possible random samples of 10 NYC voters will, then, produce sample means of between 16.2% and 33.8% “Independent” registered voters.
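The interval arithmetic can be reproduced in a few lines. This is a sketch that simply follows the slides' own formula and rounded inputs (a standard deviation of 14.1%, n = 10, a sample mean of 25%, and a multiplier of 1.96); the function names are illustrative, not taken from any library.

```python
import math


def standard_error(sd, n):
    """Standard error as the slides compute it: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def confidence_interval(mean, se, z=1.96):
    """Return (low, high) = mean -/+ z standard errors."""
    margin = z * se
    return mean - margin, mean + margin


# All values are in percentage points, as in the slides.
se = round(standard_error(14.1, 10), 1)   # 14.1 / 3.16..., rounded to one decimal
low, high = confidence_interval(25.0, se)

print(f"standard error = {se}%")
print(f"95% interval = {low:.1f}% to {high:.1f}%")
```

Passing `z=1.0` instead of the default gives the narrower 68% interval.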

Inference Using the Normal Distribution & Z Scores
Given these results, how “random” is Mayor Bloomberg’s finding of 35% registered Independents among NYC’s voting population? First, standardize his 35% finding into a Z score:
Z = (Bloomberg’s mean - mean of the 1,000 samples of 10 voters) ÷ standard error
Z = (35% - 25%) ÷ 4.5%
Z = 10% ÷ 4.5%
Z = 2.22
Based on the table of Z scores (Pollack, p. 110), how likely is it that a truly random sample of registered NYC voters would find 35% or more to be Independents? .0132, or about 1.3%. Basically, you could say, “Mr. Mayor, the odds of finding that 35% of registered voters in NYC are ‘Independent’ are about 1 out of 100 and, congratulations Sir, you got that one. Now stop wasting my time and let me do the polling in this campaign.”
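The Z-score lookup can also be checked numerically instead of with a printed table. A minimal sketch, assuming the rounded inputs from the slide (35%, 25%, and a 4.5% standard error) and using `math.erfc` from Python's standard library to get the upper-tail probability of the standard normal distribution; the function names are illustrative.

```python
import math


def z_score(value, mean, standard_error):
    """Number of standard errors that value lies above the mean."""
    return (value - mean) / standard_error


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Inputs in percentage points, as given in the slides.
z = z_score(35.0, 25.0, 4.5)
p_upper = upper_tail_probability(z)

print(f"Z = {z:.2f}")
print(f"P(sample proportion at least this high) = {p_upper:.4f}")
```

This is a one-sided probability: it only counts samples that come out as high as the Mayor's figure or higher, which matches the question being asked of him.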