The accuracy of averages

Presentation transcript:

The accuracy of averages We learned how to make inferences from the sample to the population when counting percentages. Here we begin to learn how to make inferences about the average, from the sample to the population.

Introduction We want to estimate the accuracy of an average computed from a simple random sample. Again, we deal with the situation where the parameter (the average, or expected value, of the population) is unknown. First of all, we need to figure out the likely size of the chance error in the average. This is measured by the SE for the average.

Example Let’s look at the box with tickets: 1, 2, 3, 4, 5, 6, 7. In one computer simulation of 25 draws at random with replacement, the sum was 105, so the average was 105/25 = 4.2. Another simulation came out differently: the sum was 95, so the average was 95/25 = 3.8. The sum is subject to chance variability, and therefore so is the average.
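The simulation can be sketched in a few lines of Python. This is only an illustration using the standard `random` module; the function name `draw_average` and the seeds are made up here, and the sums it produces will generally differ from the 105 and 95 quoted above.

```python
import random

BOX = [1, 2, 3, 4, 5, 6, 7]  # tickets in the box


def draw_average(n_draws, seed):
    """Draw n_draws tickets at random with replacement; return (sum, average)."""
    rng = random.Random(seed)  # seeded so that each run is repeatable
    draws = [rng.choice(BOX) for _ in range(n_draws)]
    total = sum(draws)
    return total, total / n_draws


# Two simulated runs of 25 draws; the seeds are arbitrary.
for seed in (1, 2):
    total, average = draw_average(25, seed)
    print(f"seed={seed}: sum={total}, average={average:.2f}")
```

Each run gives a different sum, and the average moves with it, which is the chance variability the example describes.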

Example

As an application of the change of scale, if we convert the histograms into standard units, the two histograms (for the sum and for the average) are exactly the same. So the probability histogram for the average of the draws can be approximated by the normal curve, if the number of draws is large enough.

Example When drawing at random from a box, the probability histogram for the average of the draws follows the normal curve, even if the contents of the box do not. The histogram must be put into standard units, and the number of draws must be reasonably large.

Increase the draws

Comments Like the SE for the number and the SE for the percentage, the SE for the sum and the SE for the average behave quite differently as the number of draws increases. As the number of draws goes up, the SE for the sum gets bigger, but the SE for the average gets smaller. Since the SE for the average corresponds to the SE for the percentage, when drawing without replacement the exact SE for the average can be found using the correction factor: SE without replacement = correction factor × SE with replacement. When the number of draws is a small part of the total number of tickets, the correction factor is close to 1 and can be ignored.
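The following sketch puts numbers on these comments for the box 1, 2, ..., 7. The helper names are illustrative, and the correction factor uses the standard finite-population formula sqrt((N − n)/(N − 1)), which is assumed here rather than quoted from the slides; the 25,000 tickets and 1,000 draws in the last line are just an example.

```python
import math
import statistics

BOX = [1, 2, 3, 4, 5, 6, 7]
SD_OF_BOX = statistics.pstdev(BOX)  # population SD of the tickets


def se_for_sum(sd, n_draws):
    """SE for the sum of n_draws draws made at random with replacement."""
    return math.sqrt(n_draws) * sd


def se_for_average(sd, n_draws):
    """SE for the average = SE for the sum divided by the number of draws."""
    return se_for_sum(sd, n_draws) / n_draws


def correction_factor(n_tickets, n_draws):
    """Standard finite-population correction (assumed formula, not from the slides)."""
    return math.sqrt((n_tickets - n_draws) / (n_tickets - 1))


# As the number of draws grows, the SE for the sum rises and the SE for the average falls.
for n in (25, 100, 400):
    print(n, se_for_sum(SD_OF_BOX, n), se_for_average(SD_OF_BOX, n))

# With 1,000 draws from 25,000 tickets the factor is close to 1.
print(correction_factor(25_000, 1_000))
```

Quadrupling the number of draws doubles the SE for the sum and halves the SE for the average.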

Sample average Now we come to the main issue in this chapter: making inferences about the average from the sample to the population. Two main questions need attention: What is the difference between the SD of the sample and the SE for the sample average? Why is it OK to use the normal curve in figuring confidence levels? With these questions in mind, we look at two examples.

Example 1 A city manager wants to know the average income of the 25,000 families living in his town. He hires a survey organization to take a simple random sample of 1,000 families. The total income of the sample families turns out to be $62,396,714. So the average is about $62,400. Then the average income for all 25,000 families is estimated as $62,400. This estimate is off by a chance error. So we have to figure out the SE for the average.

Example 1 We first set up a box model. The problem is about an average, not about counting, so we no longer use the 0-1 box. Since the population size is 25,000, there are 25,000 tickets in the box, each showing one family's income. Because the incomes vary from family to family, we need to summarize the data of the population. Remember that the average and the SD are two good summary statistics for data. The average of the box is already estimated as $62,400. All we need to do is figure out the SD of the box. But the data for the whole population are unknown, so we have to use the bootstrap method to estimate the SD of the box. That is, substitute the SD of the sample for the SD of the box.

Example 1
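A minimal sketch of the SE calculation for this example. The sample SD is not given in the text, so the value of $53,000 below is an assumption, chosen only because it leads to an SE of roughly $1,700, the figure used in the confidence interval that follows. The variable names are illustrative.

```python
import math

N_FAMILIES_SAMPLED = 1_000
SAMPLE_TOTAL = 62_396_714   # total income of the sample families (from the example)
ASSUMED_SAMPLE_SD = 53_000  # ASSUMPTION: illustrative value, not stated in the text

sample_average = SAMPLE_TOTAL / N_FAMILIES_SAMPLED

# Bootstrap step: the sample SD stands in for the unknown SD of the box.
se_sum = math.sqrt(N_FAMILIES_SAMPLED) * ASSUMED_SAMPLE_SD
se_average = se_sum / N_FAMILIES_SAMPLED

print(f"sample average: ${sample_average:,.0f}")
print(f"SE for the sum: ${se_sum:,.0f}")
print(f"SE for the average: ${se_average:,.0f}")
```

The correction factor is ignored here because 1,000 draws are a small part of 25,000 tickets.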

We could also use confidence intervals to state the accuracy. For example, a 95%-confidence interval for the average of the incomes is obtained by going 2 SEs either way from the sample average: $62,400 ± 2 × $1,700 = $59,000 to $65,800. Once again, “$59,000 to $65,800” is just one of the confidence intervals “sample average ± 2 SEs”. The 95% means that about 95% of such intervals, computed from repeated samples, cover the true value (the average income of the population, which is the parameter).
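The "about 95% of the intervals cover" interpretation can be checked by simulation. The sketch below builds a hypothetical, deliberately skewed population (it is not the town's data), takes repeated simple random samples, and counts how often "sample average ± 2 SEs" covers the population average. The function name `coverage`, the exponential shape and the seeds are all assumptions made for illustration.

```python
import math
import random


def coverage(population, sample_size, repetitions, seed=0):
    """Fraction of 'sample average +/- 2 SE' intervals covering the population average."""
    rng = random.Random(seed)
    true_average = sum(population) / len(population)
    covered = 0
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)  # simple random sample
        avg = sum(sample) / sample_size
        # Bootstrap step: the SD of the sample estimates the SD of the box.
        sd = math.sqrt(sum((x - avg) ** 2 for x in sample) / sample_size)
        se = sd / math.sqrt(sample_size)  # SE for the average
        if avg - 2 * se <= true_average <= avg + 2 * se:
            covered += 1
    return covered / repetitions


# Hypothetical skewed "income" population of 25,000 tickets.
rng = random.Random(42)
population = [rng.expovariate(1 / 60_000) for _ in range(25_000)]
print(coverage(population, sample_size=1_000, repetitions=300))
```

The printed fraction should land near 0.95, even though the population itself is far from normal.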

Remarks Since we don’t know the average of the box, we don’t know the expected value for the sum of the draws. The total income of the sample families, $62,396,714, is just an observed value, and so is the average, $62,400. The SE for the average (or for the sum) measures the likely size of the chance error in the equation: Observed value = expected value + chance error. So the SE says how far sample averages are from the population average, for typical samples (comparing averages). The SD, by contrast, says how far family incomes are from the average, for typical families (comparing incomes).

Example 2 As part of an opinion survey, a simple random sample of 400 persons age 25 and over is taken in a certain town in Appalachia. The total years of schooling completed by the sample persons is 4,635. So their average educational level is 4,635/400≈ 11.6 years. The SD of the sample is 4.1 years. Find a 95%-confidence interval for the average educational level of all persons age 25 and over in this town.

Solution We have very little information about the population, which is the usual situation. For the box model, there should be one ticket for each person, showing the number of years of schooling completed by that person. It does not matter that we don’t know the number of tickets in the box; we assume it is very large relative to the sample size. According to the sample, 400 draws are made at random. The average and SD of the box can be estimated from the draws. This completes the box model.

Solution
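The arithmetic for this solution can be sketched as follows, using only the numbers given in Example 2 (400 persons, 4,635 total years, sample SD of 4.1 years). The function name `confidence_interval` is illustrative, and the multiplier 2 follows the "2 SEs either way" rule used for the 95% level in Example 1.

```python
import math


def confidence_interval(total, n_draws, sample_sd, n_se=2):
    """Return (average, SE for the average, low, high) for 'average +/- n_se SEs'."""
    average = total / n_draws
    se_sum = math.sqrt(n_draws) * sample_sd  # sample SD used in place of the box SD
    se_average = se_sum / n_draws
    return average, se_average, average - n_se * se_average, average + n_se * se_average


average, se_average, low, high = confidence_interval(4_635, 400, 4.1)
print(f"average = {average:.1f} years, SE = {se_average:.1f} years")
print(f"95%-confidence interval: {low:.1f} to {high:.1f} years")
```

The SE for the average comes to about 0.2 years, so the interval runs from roughly 11.2 to 12.0 years.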

Remark Why is it OK to use the normal curve to calculate the confidence level of 95%? After all, the histogram for educational levels looks nothing like the normal curve:

Remark Here is a computer simulation. In reality, we don’t know the contents of the box, but the mathematical theory still applies. In the sample, although there are a few too many people with 8-9 years of education, the histogram is very similar to the population one. This indicates that the sample SD is a good estimate for the population SD: the two histograms show about the same amount of spread.

Remark The reason we can use the normal curve is similar to the case of the sample percentage. Remember that in that case we used the 0-1 box. The sample percentage is just a change of scale of the sum of the draws, since we are counting the 1’s. By the central limit theorem (CLT), the sum of the draws follows the normal curve, even if the composition of the box is far from normal (e.g. 10% 1’s, 90% 0’s). Here, the sample average is again a change of scale of the sum of the draws, where the numbers on the tickets come from the data (years of education). So even if the data do not follow the normal curve, the sum of the draws still follows the curve, provided the number of draws is large enough. With a small sample, the normal curve should not be used.

Remark Here is the probability histogram for the average of draws. It does not represent data. Instead, it represents chances for the sample average (a change of scale of the sum of draws). Clearly, the normal curve is a good approximation to the histogram.

A summary for SE
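For reference, the standard SE relations for n draws made at random with replacement from a box can be written as below. The symbol n for the number of draws is notation introduced here, and the count and percentage lines refer to a 0-1 box.

```latex
\begin{align*}
\text{SE for the sum} &= \sqrt{n} \times \text{SD of the box},\\
\text{SE for the average} &= \frac{\text{SE for the sum}}{n}
  = \frac{\text{SD of the box}}{\sqrt{n}},\\
\text{SE for the count} &= \text{SE for the sum, from a 0-1 box},\\
\text{SE for the percentage} &= \frac{\text{SE for the count}}{n} \times 100\%.
\end{align*}
```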

Summary The SE for the average equals the SE for the sum divided by the number of draws. When drawing at random or with a simple random sample, the average of the draws can be used to estimate the average of the box. The SD of the sample can be used to estimate the SD of the box. Multiplying the number of draws by some factor divides the SE for their average by the square root of that factor. The probability histogram for the average will follow the normal curve, even if the contents of the box do not. The histogram must be put into standard units, and the number of draws must be large.

Summary A confidence interval for the average can be found by going the right number of SEs either way from the average of the draws. The confidence level is read off the normal curve. This method should only be used with large samples. Once again, the formulas for simple random samples should not be applied to other kinds of samples. If a sample is not chosen by a probability method, it is called a sample of convenience. In such a case, the SE makes no sense.