Sampling and Estimating Population Percentages and Averages Math 1680.


Overview
• Introduction
• Picking a Sample
• Chance Errors in Sampling
• Estimating the Population Percentage
• Estimating the Population Average
• Summary

Introduction
In real life, we often want to know about a population of individuals, such as:
• Voters for an election
• Families who watch TV
• Students at a university
Most of the time the population is far too big to question every member directly.

Introduction
Instead, we take a sample of the population, compute statistics from the sample, and then use these to infer the parameters of the population at large.
• Characteristics of the sample take the form of statistics:
   ◦ Average height
   ◦ Median exam score
   ◦ Proportion of voters in favor of a candidate
• These statistics can be adjusted to apply to the population.
Which would better represent the population (all other things being equal), a large sample or a small sample?

Picking a Sample
Survey tools include:
• Interview
• Mail
• Phone
• Internet
Sample types include:
• Convenience
• Quota
• Simple random

Picking a Sample
Generally, the best bet is to make the sample random:
• It gives the surveyor no choice in determining who will be interviewed.
• It provides the best defense against bias.
Simple random sampling is one way to do this:
• Draw people at random without replacement, as if drawing tickets from a box.
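A simple random sample can be sketched with the standard library's `random.sample`. It draws without replacement, so every group of n individuals is equally likely to be chosen. The population of 10,000 numbered students below is a hypothetical stand-in for "the box", and the helper name `simple_random_sample` is illustrative only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals at random without replacement, as if drawing
    tickets from a well-shuffled box (hypothetical helper)."""
    rng = random.Random(seed)       # seeded generator for reproducibility
    return rng.sample(population, n)

# Hypothetical population: 10,000 student ID numbers.
students = list(range(10_000))
sample = simple_random_sample(students, 100, seed=1)
print(len(sample), len(set(sample)))  # 100 people, no one drawn twice
```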

Picking a Sample
Sources of bias:
• People who respond to voluntary surveys tend to have different characteristics than people who don't respond (volunteer bias).
• People tend to exaggerate or round off things such as their weight or age when asked, depending on the context of the question.
• The wording of the questions or the nature of the interviewer can suggest certain types of answers (response bias).
• Phone surveys in particular are susceptible to volunteer bias and also tend to over-represent the middle class.
   ◦ In general, poor people and rich people are less likely to have listed numbers.

Picking a Sample
Give at least two types of bias (with examples) that one encounters when trying to conduct a phone survey by looking up numbers randomly in a phone book.
Answer: Many people have unlisted numbers, so they can never be selected (selection bias). Many people (especially those with Caller ID) will hang up or choose not to answer, creating a nonresponse bias.

Picking a Sample
In order to obtain a representative sample of 10,000 American citizens, a polling organization spends one morning going door-to-door in the capital city of each state until it has surveyed that state's share of the 10,000 (in proportion to the state's population).
• Find at least two problems with this sampling procedure and justify why they are problems.
Answer: The capital city will not represent the entire state, especially in rural states like Wyoming. Going in the morning will miss anyone who is at work that morning.

Picking a Sample
(Hypothetical) The teacher of a 400-person class wants to determine whether or not to curve his first exam. His exams are graded by four different teaching assistants (TAs). Since there are so many students, the teacher does not want to look through every exam himself. Instead, he pulls out the first 5 tests from each TA's section and takes the average grade. He then makes a curve based on this average.
• Give at least two possible confounding factors in this process.
Answer: The TAs may not have graded to the same standard, and some TAs may have sorted their tests, either alphabetically or by grade. Especially in the latter case, the average of the first five tests will not be representative of all the students.

Chance Errors in Sampling
Warning: The formulae and methods used in the rest of the lecture apply only to simple random samples.
• If the sample in question is not obtained by this method, using these formulae will give meaningless results!

Chance Errors in Sampling
Suppose we are dealing with a population of 10,000 students, of whom 5,300 are women and 4,700 are men.
• If we were to sample 100 people at random, how many would we expect to be women?
• What percentage would we expect to be women?
Answer: 53 women, or 53%.

Chance Errors in Sampling
Suppose we are dealing with a population of 10,000 students, of whom 5,300 are women and 4,700 are men.
• If we were to sample 100 people at random, how far off from our expected value (53 women) are we likely to be?
• By how much is the sample percentage likely to be off?
Answer: ≈ 5 women, or ≈ 5%.

Chance Errors in Sampling
The expected value for a percentage (%EV) in a sample of size n is the probability of drawing a "1" from the corresponding 0-1 box, expressed as a percentage; that is, the percentage of 1s in the box.

Chance Errors in Sampling
The standard error for a percentage (%SE_n) in a sample of size n, if we draw with replacement, is the standard deviation of the box divided by the square root of the number of draws, expressed as a percentage.
• Note that %SE_n depends on n, while %EV does not!
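For a 0-1 box in which a fraction p of the tickets read "1" (the symbol p is introduced here for illustration), the two definitions above can be written as formulas. The SD of such a box is √(p(1 - p)). With p = 0.53 and n = 100, the formulas reproduce the ≈ 5% chance error from the women example. Multiplying the box SD by √100 instead gives the ≈ 5-woman error for the count.

```latex
\begin{aligned}
\%EV &= p \times 100\% \\
\%SE_n &= \frac{\sqrt{p(1-p)}}{\sqrt{n}} \times 100\% \\
\%SE_{100} &= \frac{\sqrt{0.53 \times 0.47}}{\sqrt{100}} \times 100\% \approx \frac{0.499}{10} \times 100\% \approx 5\%
\end{aligned}
```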

Chance Errors in Sampling
As the sample size increases, %SE_n goes to 0%.
Also, dividing the SD by the square root of the sample size is exact only when the draws are made with replacement.
• Since in sampling we usually do not re-interview the people we sample, this formula is only an approximation.
• If the number of people we sample is small relative to the population size, then this approximation is good enough.
• If not, we need a correction factor: multiply by sqrt((N - n)/(N - 1)), where N is the population size.

Chance Errors in Sampling
Approximate %SE_n and exact %SE_n for the percentage of men in a sample of size n, for different sample sizes
• Population size N = 10,000, of whom 5,000 are men

  n        %SE_n (exact)    %SE_n (approximate)
  …        …                5%
  …        …                1.7%
  1,000    1.5%             1.6%
  6,…      …                0.63%
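The table's two columns can be sketched in Python. This assumes the "exact" column comes from the standard correction factor sqrt((N - n)/(N - 1)) for drawing without replacement. The helper `pct_se` is hypothetical, and the sample sizes 100 and 5,000 are illustrative choices, not the table's lost rows. For n = 1,000 it gives about 1.50% (exact) versus 1.58% (approximate), matching the surviving row.

```python
import math

def pct_se(p, n, N=None):
    """%SE_n for the percentage of 1s in a sample of size n (hypothetical helper).

    p: fraction of 1s in the box; n: sample size;
    N: population size. If given, apply the correction factor for
    drawing without replacement.
    """
    se = math.sqrt(p * (1 - p)) / math.sqrt(n) * 100  # with-replacement formula
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))             # finite-population correction
    return se

N, p_men = 10_000, 0.5  # 5,000 of 10,000 are men
for n in (100, 1_000, 5_000):  # illustrative sample sizes
    print(f"n={n:>5}  exact={pct_se(p_men, n, N):.2f}%  approx={pct_se(p_men, n):.2f}%")
```

The gap between the columns grows as n becomes a larger fraction of N. At n = 5,000 (half the population), the correction shrinks the SE by about 30%.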

Chance Errors in Sampling
Suppose we are dealing with a population of 10,000 students, of whom 5,300 are women and 4,700 are men.
• If we were to sample 100 people at random, what is the probability that between 45% and 50% of the sample are men?
Answer: ≈ 38%

Chance Errors in Sampling
To verify that simple random sampling is accurate, we need to see that there is a high probability of being "close" to the %EV.
We can use the normal curve to answer this kind of question:
• Find the expected value and standard error for the percentage.
• Standardize the range in question using the %EV and %SE_n.
• Find the corresponding area under the curve by using the normal table.
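The three steps above can be sketched with Python's `statistics.NormalDist` (Python 3.8+) standing in for the normal table. The helper name `prob_pct_between` is hypothetical. It uses the with-replacement %SE_n, with no correction factor and no continuity correction. Applied to the men example (47% of 10,000 students, sample of 100, range 45% to 50%), it gives about 0.38, in line with the ≈ 38% answer.

```python
import math
from statistics import NormalDist

def prob_pct_between(p, n, lo, hi):
    """Normal-curve approximation to P(lo% <= sample % <= hi%) for a sample
    of size n from a 0-1 box with a fraction p of 1s (hypothetical helper)."""
    ev = p * 100                                        # step 1: %EV
    se = math.sqrt(p * (1 - p)) / math.sqrt(n) * 100    # step 1: %SE_n
    z_lo, z_hi = (lo - ev) / se, (hi - ev) / se         # step 2: standardize
    std = NormalDist()                                  # standard normal curve
    return std.cdf(z_hi) - std.cdf(z_lo)                # step 3: area between

# Men: 4,700 of 10,000 students, sample of 100, range 45% to 50%
print(round(prob_pct_between(0.47, 100, 45, 50), 3))
```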

Chance Errors in Sampling
According to the 2000 US Census, of Americans 25 years or older, 80.4% are high school graduates and 24.4% have at least a Bachelor's degree. There were about 281 million Americans in 2000.
• If one were to take a simple random sample of 900 Americans age 25 or older, what is the probability that the sample would contain less than 79% high school graduates?
Answer: ≈ 14.5%

Chance Errors in Sampling
According to the 2000 US Census, of Americans 25 years or older, 80.4% are high school graduates and 24.4% have at least a Bachelor's degree. There were about 281 million Americans in 2000.
• What is the probability that the same sample of 900 would contain between 23% and 26% college graduates?
Answer: ≈ 70%

Chance Errors in Sampling
(Hypothetical) A polling organization wants to find out whether a simple random sample really achieves the correct demographic proportions. They decide to conduct surveys in Dallas, where they know beforehand from the US Census that 50.4% of Dallas residents are male and that 38.8% of families in Dallas are headed by a married couple. The population of Dallas at that time was 1,188,580.
• If they take a simple random sample of 5,000 people, estimate the probability that their sample would contain less than 49% males.
Answer: ≈ 2.4%

Chance Errors in Sampling
(Hypothetical) A polling organization wants to find out whether a simple random sample really achieves the correct demographic proportions. They decide to conduct surveys in Dallas, where they know beforehand from the US Census that 50.4% of Dallas residents are male and that 38.8% of families in Dallas are headed by a married couple. The population of Dallas at that time was 1,188,580.
• If they take a simple random sample of 500,000 people, estimate the probability that their sample would contain less than 49% males.
Answer: ≈ 0%

Chance Errors in Sampling
(Hypothetical) A polling organization wants to find out whether a simple random sample really achieves the correct demographic proportions. They decide to conduct surveys in Dallas, where they know beforehand from the US Census that 50.4% of Dallas residents are male and that 38.8% of families in Dallas are headed by a married couple. The population of Dallas at that time was 1,188,580.
• If they take a simple random sample of 10,000 people, estimate the probability that between 38.5% and 39.5% of the families in their sample are headed by married couples.
Answer: ≈ 66%
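The three Dallas questions can be checked with a short sketch that combines the normal approximation with the correction factor sqrt((N - n)/(N - 1)). The helper names are hypothetical. At n = 5,000 the correction barely matters. At n = 500,000 (about 42% of the city) it shrinks the SE by roughly a quarter, but the probability of falling below 49% is essentially 0 either way. The results should come out near 0.024, 0, and 0.66.

```python
import math
from statistics import NormalDist

N = 1_188_580  # population of Dallas, as given above

def pct_se_fpc(p, n, pop):
    """%SE_n for a simple random sample of n from pop, including the
    correction factor sqrt((pop - n) / (pop - 1))."""
    return math.sqrt(p * (1 - p)) / math.sqrt(n) * math.sqrt((pop - n) / (pop - 1)) * 100

def prob_below(p, n, cutoff):
    """Normal approximation to P(sample % < cutoff)."""
    return NormalDist(p * 100, pct_se_fpc(p, n, N)).cdf(cutoff)

def prob_between(p, n, lo, hi):
    """Normal approximation to P(lo <= sample % <= hi)."""
    curve = NormalDist(p * 100, pct_se_fpc(p, n, N))
    return curve.cdf(hi) - curve.cdf(lo)

print(round(prob_below(0.504, 5_000, 49), 3))               # males, n = 5,000
print(prob_below(0.504, 500_000, 49))                       # males, n = 500,000
print(round(prob_between(0.388, 10_000, 38.5, 39.5), 3))    # married-couple families
```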

Estimating the Population Percentage
Often, we are interested in estimating the percentage of a population with some characteristic.
• Such characteristics can be seen as answers to a yes/no question:
   ◦ Do you favor this candidate?
   ◦ Do you smoke?
To estimate the percentage, we start by taking a simple random sample and calculating its average and SD.

Estimating the Population Percentage
A simple random sample of 1,000 people is taken, of whom 543 are Democrats.
• What percentage of the population do we estimate to be Democrats?
• How far off do we expect to be?
Our best estimate of the population percentage is just the percentage in the sample: 54.3%.

Estimating the Population Percentage
Ideally, we would set up our box model for the entire population and calculate its SD, then divide by the square root of the sample size.
However, we don't know the population SD.
• Instead, we use the sample SD in its place, assuming that it reflects the population SD.
When estimating %SE_n for the estimated population percentage, use the sample's SD in your calculations.
• So the %SE in this case is ≈ 1.58%.
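The plug-in step can be written as a small Python function (the name `estimate_pct_and_se` is hypothetical). The sample fraction replaces the unknown population fraction inside the 0-1 box SD formula √(p(1 - p)). For 543 Democrats out of 1,000, it reproduces the 54.3% estimate and the ≈ 1.58% standard error.

```python
import math

def estimate_pct_and_se(successes, n):
    """Return (sample %, estimated %SE_n) for a simple random sample.

    The unknown population SD is replaced by the SD of the sample's
    0-1 box, sqrt(p_hat * (1 - p_hat)).
    """
    p_hat = successes / n
    sample_sd = math.sqrt(p_hat * (1 - p_hat))
    return p_hat * 100, sample_sd / math.sqrt(n) * 100

pct, se = estimate_pct_and_se(543, 1_000)  # 543 Democrats out of 1,000
print(f"{pct:.1f}% with %SE ~ {se:.2f}%")
```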

Estimating the Population Percentage
In the previous example, we estimated that the population is 54.3% Democrats, and we expected to be off by about 1.6%.
The sample percentage is really a random variable.
• If the sample is large enough, we can assume the sample percentage is approximately normal and say that the interval 54.3% ± 1.6% is a 68% confidence interval for the percentage of Democrats in the population.
• What would the 95% confidence interval be?
• What would the 99.7% confidence interval be?
Answer: 54.3% ± 3.2% (2 SEs) and 54.3% ± 4.8% (3 SEs)

Estimating the Population Percentage
Warning: If the sample percentage is very close to 0% or 100%, a very large sample is needed to use the normal approximation!
• One way to check this is to calculate the sample SD.
   ◦ If the SD is close to 50%, a relatively small sample will allow for a normal approximation.
   ◦ If the SD is well below 50%, very large samples are needed to use the normal approximation.

Estimating the Population Percentage
At this point, we are tempted to say that there is a 95% chance that the true percentage of Democrats falls between 51.1% and 57.5%.
• This is not so.
   ◦ The true percentage of Democrats is a fixed number determined by the entire population; it is not subject to chance.
   ◦ Sample percentages are random numbers determined by the people we happen to sample.
• It is correct to say that if we were to take 100 samples and calculate 100 different 95% confidence intervals, then about 95 of them should contain the true percentage of Democrats.
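A simulation can illustrate this interpretation. The sketch below assumes, hypothetically, that the true percentage of Democrats is 54.3%. It draws with replacement, which is close to simple random sampling when the population is far larger than the sample of 1,000. Each interval is the sample percentage ± 2 estimated SEs. The true value never moves; only the intervals change from sample to sample. Roughly 95% of them should contain it (slightly more, since 2 SEs covers about 95.4% of the normal curve).

```python
import math
import random

def coverage(p_true, n, reps=2_000, k=2, seed=0):
    """Fraction of k-SE confidence intervals that contain p_true.

    Each repetition draws n tickets from a 0-1 box with a fraction p_true of 1s,
    builds p_hat +/- k * estimated SE, and checks whether p_true is inside.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ones = sum(rng.random() < p_true for _ in range(n))  # one sample
        p_hat = ones / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)              # plug-in SE
        if p_hat - k * se <= p_true <= p_hat + k * se:
            hits += 1
    return hits / reps

rate = coverage(0.543, 1_000)  # hypothetical true fraction 0.543
print(rate)
```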

Estimating the Population Percentage
Homer Simpson decides to run against Mayor Quimby for the leadership of Springfield.
• After a grueling campaign, Election Day finally arrives, and the early exit polls show that Homer has the votes of 58 of the 100 people polled.
   ◦ Find the 95% confidence interval for the percentage of votes for Homer.
   ◦ Can Homer break out the Duff champagne?
Answer: 58% ± 9.88%, so no: the interval (48.1%, 67.9%) includes values below 50%.

Estimating the Population Percentage
Homer Simpson decides to run against Mayor Quimby for the leadership of Springfield.
• A few hours later, the exit polls show a turn for the worse for Homer, with only 485 of the 1,000 sampled voting for him.
   ◦ Calculate the 95% confidence interval for the percentage of votes for Homer.
   ◦ Is he out of the running?
Answer: 48.5% ± 3.16%, so no: the interval (45.3%, 51.7%) still includes values above 50%.

Estimating the Population Percentage
Homer Simpson decides to run against Mayor Quimby for the leadership of Springfield.
• By midnight, 10,000 votes have been counted, and Homer has 5,204 of them.
   ◦ One last time, calculate the 95% confidence interval for the percentage of votes for Homer.
   ◦ Has Quimby's regime finally been toppled?
Answer: 52.04% ± 1.0%; the interval (51.0%, 53.0%) lies entirely above 50%, so it appears so.
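The three Homer intervals can be reproduced with a short sketch. Like the slides, it treats each count as a simple random sample and uses the sample percentage ± 2 estimated SEs. The function `ci95` and the verdict labels are hypothetical. The first half-width comes out as 9.87% rather than 9.88% because the slide appears to round the SE to 4.94% first.

```python
import math

def ci95(votes, n):
    """Approximate 95% CI in percent: sample % +/- 2 estimated %SE."""
    p_hat = votes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n) * 100
    return p_hat * 100 - 2 * se, p_hat * 100 + 2 * se

for votes, n in [(58, 100), (485, 1_000), (5_204, 10_000)]:
    lo, hi = ci95(votes, n)
    # A majority is indicated only if the whole interval lies above 50%.
    verdict = "majority" if lo > 50 else ("losing" if hi < 50 else "too close to call")
    print(f"{votes}/{n}: ({lo:.2f}%, {hi:.2f}%) -> {verdict}")
```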

Estimating the Population Average
Often, we are interested in estimating a population's average for some numerical variable:
• Height
• IQ
• Income
As before, to estimate the average, we start by taking a simple random sample and calculating its average and SD.

Estimating the Population Average
The expected value for the average (mEV) in a sample of size n is the average of the corresponding box.

Estimating the Population Average
The standard error for the average (mSE_n) in a sample of size n, if we draw with replacement, is the standard deviation of the box divided by the square root of the number of draws.
• Note that mSE_n depends on n, while mEV does not!
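In symbols, the two definitions above can be written as follows. The second line also shows the box SD replaced by the sample SD when the population SD is unknown. The numerical line uses the employee example in this section (SD 1.3 years, n = 500).

```latex
\begin{aligned}
mEV &= \text{average of the box} \\
mSE_n &= \frac{SD_{\text{box}}}{\sqrt{n}} \approx \frac{SD_{\text{sample}}}{\sqrt{n}} \\
\frac{1.3}{\sqrt{500}} &\approx 0.058 \text{ years}
\end{aligned}
```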

Estimating the Population Average
Just as with percentages, we estimate the standard error for the sample average by using the sample's SD in place of the population SD.
If our sample is large enough, we can assume the distribution of the sample average is approximately normal.
• This allows us to obtain confidence intervals for the population average.

Estimating the Population Average
(Hypothetical) A large company takes a simple random sample of 500 employees and asks them how long they have worked there.
• The employees averaged 4.2 years with an SD of 1.3 years.
• Give a 95% confidence interval for the average length of employment at the company.
Answer: (4.08 years, 4.32 years)

Estimating the Population Average
(Hypothetical) In a county containing a college, a simple random sample of 1,000 people is taken.
• From the sample, the average level of education (years of school completed, not counting kindergarten) is 14 years, with an SD of 2 years.
• Give an estimate and standard error for the average educational level of people in the county.
• Have 68% of the county's residents completed 14 ± 0.063 years of schooling?
Answer: 14 ± 0.063 years. No: 0.063 years is the standard error of the sample average, not the spread of the population. The population SD is estimated to be about 2 years, not 0.063 years. On top of this, we can't determine whether the population's education level is normally distributed.

Estimating the Population Average
(Hypothetical) As a way of measuring the quality of English education in one high school, the senior class is required to take the ACT.
• The English department wants to see an English ACT average of 25.
• Of the class, 125 tests are checked.
• The average English ACT score was 25.4, with an SD of 2.2.
Can you say with 99.7% confidence that the English department will get the average it desires?
Answer: No. The 99.7% confidence interval is (24.81, 25.99), which includes values below the goal of 25.
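Both confidence intervals for averages in this section can be sketched with one small function (the name `mean_ci` is hypothetical). It estimates mSE_n as the sample SD divided by √n and takes k standard errors on each side: k = 2 for roughly 95% and k = 3 for roughly 99.7%, under the normal approximation. It should reproduce (4.08, 4.32) for the employee example and (24.81, 25.99) for the ACT example.

```python
import math

def mean_ci(avg, sd, n, k):
    """Confidence interval avg +/- k * mSE_n, with mSE_n estimated as sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return avg - k * se, avg + k * se

a, b = mean_ci(4.2, 1.3, 500, 2)      # length of employment, ~95%
print(round(a, 2), round(b, 2))

lo, hi = mean_ci(25.4, 2.2, 125, 3)   # English ACT, ~99.7%
print(round(lo, 2), round(hi, 2), lo > 25)  # goal met only if the whole interval is above 25
```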

Summary
When trying to determine characteristics of a large population, researchers use smaller samples to infer the population characteristics.
• Ideal samples are randomly selected.
• Simple random samples can be modeled with a box and ticket model.
A simple random sample will give an accurate representation of the population, provided that it is large enough.

Summary
We use a sample percentage/average to estimate the population percentage/average.
• We find the standard error by using the sample SD in place of the population SD.
If the sample is large enough, we can assume the sample percentage/average, as a random variable, is approximately normal.
• We can calculate confidence intervals around our sample percentage/average to narrow down the location of the population percentage/average.