Segment 4 Sampling Distributions - or - Statistics is really just a guessing game George Howard.

Statistics as organized guessing
One of the two major tasks in statistics is “estimation” (the technical term for guessing).
Suppose that there is some huge group of people (or whatever we are studying).
The huge group is called the universe.
This population arises from some distribution:
– We have talked about arising from either the binomial or the normal.
– Then this large population can be described by parameters: p for the binomial; μ and σ for the normal.
– Our task is to estimate (guess) the parameters.

How do we estimate the parameters?
Approach 1: measure everyone
– Advantages: you will get the correct answer.
– Disadvantages: expensive and impractical.
Approach 2: estimation
– Take a sample of the big group and try to guess.
– That is, we guess at the parameters in the universe by using estimates from a sample.

Characteristics of Estimates
Expectation:
– We take a sample and produce estimates.
– We take another sample and produce estimates again.
– We will get different answers.
Consider the simplest example: estimating the mean of a normal distribution (μ).

Suppose that we draw a sample of 20 individuals from a N(80,5).
In this sample we use the formulas from previous lectures to get:
– Estimated mean = 77.5
– Estimated SD = 4.7
Hence, we are “pretty close” to guessing the correct mean and standard deviation.
But what happens if we draw another sample?

Estimated mean and SD of 10 samples, each with 20 observations from a N(80,5)
(mean, standard deviation) of each sample:
(77.5, 4.7) (82.4, 5.7) (81.3, 4.8) (80.1, 6.1) (78.6, 5.3)
(79.3, 3.8) (80.6, 4.5) (80.2, 5.4) (79.5, 6.3) (79.1, 5.4)

Summary of 10 samples of 20 individuals from N(80,5)
For each sample:
– The mean was “close” to 80.
– The standard deviation was “close” to 5.
But remember that we are interested in estimating the mean of the “universe.”
What about the distribution of the sample means?
– The means we observed were: 77.5, 82.4, 81.3, 80.1, 78.6, 79.3, 80.6, 80.2, 79.5, and 79.1.
– What does the distribution of these look like?

Mean and Standard Deviation of the Means Estimated from the 10 Samples
The mean of the means = 79.9
The standard deviation of the means = 1.4
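The two summary numbers on this slide can be checked directly from the ten sample means listed earlier. The sketch below uses Python's standard library; the variable names are illustrative and are not part of the original slides.

```python
import statistics

# Sample means of the 10 samples of 20 observations listed in the slides
sample_means = [77.5, 82.4, 81.3, 80.1, 78.6, 79.3, 80.6, 80.2, 79.5, 79.1]

mean_of_means = statistics.mean(sample_means)
# statistics.stdev uses the n - 1 denominator (sample standard deviation)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of the means = {mean_of_means:.1f}")
print(f"SD of the means   = {sd_of_means:.1f}")
```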

Considering the means of the 10 samples of 20 patients drawn from N(80,5)
The means of the 10 samples:
– Have a mean very close to 80.
– Have a standard deviation much smaller than 5.
This follows common sense if the data are coming from a normal distribution:
– The mean of repeated samples will be the mean of the universe.
– There will be less variation between the means than there is in the data.
What determines the SD of the means?

But what happens if the sample size or standard deviation changes?
200 replicate samples of size n taken from N(80,SD): mean and SD of the sample means
                n=10    n=100   n=1000
SD=5    Mean    79.9    80.0    80.0
        SD       1.6     0.5     0.1
SD=10   Mean    80.2    80.0    80.0
        SD       3.3     0.9     0.3
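A small simulation along the lines of this slide might look like the following. It is a sketch, not the code used to produce the slide: the function name, the seed, and the choice to print sigma/sqrt(n) for comparison are all assumptions, and the simulated values will differ slightly from the slide's because the random draws differ.

```python
import math
import random
import statistics

def sd_of_sample_means(n, sigma, mu=80.0, replicates=200, seed=0):
    """Draw `replicates` samples of size n from N(mu, sigma) and
    return (mean, SD) of the resulting sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replicates)
    ]
    return statistics.fmean(means), statistics.stdev(means)

for sigma in (5, 10):
    for n in (10, 100, 1000):
        m, sd = sd_of_sample_means(n, sigma)
        print(f"SD={sigma:2d} n={n:4d}  mean={m:.1f}  "
              f"SD of means={sd:.2f}  sigma/sqrt(n)={sigma / math.sqrt(n):.2f}")
```

The last column is included only to show how closely the simulated SD of the means tracks the population SD divided by the square root of the sample size.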

The Estimation of Parameters from a N(80,5)
The mean of the estimated means across samples will be the same as the mean of the universe.
– If an estimate of a parameter is correct on average, then we call it an unbiased estimator.
The standard deviation of the estimated means is smaller than the standard deviation of the population.
– It increases with the standard deviation of the universe.
– It decreases with the sample size.

The Standard Deviation of the Estimated Mean
A “good” estimate of the mean should be unbiased and stable (that is, correct on average and would not change much if the experiment is repeated).
ANY estimate has variation between repeated experiments, and “good” estimates will have small standard deviations across repeated experiments.
Estimates with low variability are called reliable (and the estimates with the smallest variation are sometimes called minimum variance estimators).
In general we do not repeat experiments, so how can I know what the standard deviation of the estimate would be if I did repeat the experiment?

The Standard Deviation of the Estimated Mean
The estimated standard deviation of the mean (if the experiment were repeated) is called the Standard Error (of the Mean).
Every estimate has a standard error.
The formula for the standard error of the mean is:
SE = s / sqrt(n)
where s is the estimated standard deviation and n is the sample size.

The Standard Error
From the very first sample we drew, x̄ = 77.5 and s = 4.7.
Then the estimated standard error from this individual sample is SE = 4.7 / sqrt(20) = 1.1.
The standard deviation of the estimated mean from the 10 samples was 1.4.
These are estimating the same parameter, and are pretty close together.
But using the formula allows estimating the standard error without repeating the experiment.
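The standard error arithmetic for the first sample can be reproduced in a couple of lines. This is only a check of the slide's numbers; the variable names are illustrative.

```python
import math

s = 4.7   # estimated standard deviation from the first sample
n = 20    # sample size

standard_error = s / math.sqrt(n)
print(f"SE = {standard_error:.2f}")  # rounds to 1.1 at one decimal place
```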

Confidence Limits on the Mean
Remember from the previous lecture that 95% of observations fall within approximately 2 SD of the mean.
I lied, but you can use the Normal Table (handout) to see that 95% is between -1.96 and 1.96.
So if we know μ and σ we can calculate a range that will include 95% of the estimated means.

Confidence Limits on the Mean
In the case of our British soldiers N(80,5), if we are taking samples of 20 soldiers and calculating the mean, 95% of the estimated means should be between μ - 1.96 σ / sqrt(n) and μ + 1.96 σ / sqrt(n).
Or between 80 - 1.96 (5 / sqrt(20)) = 77.8 and 80 + 1.96 (5 / sqrt(20)) = 82.2.
So if we repeat the experiment a large number of times, 95% of the means will be between 77.8 and 82.2.
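The 77.8 to 82.2 range can be reproduced from the known μ and σ. The sketch below takes the 1.96 multiplier from the standard normal distribution via Python's `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 80.0, 5.0, 20

# Normal value leaving 0.025 in each tail (about 1.96)
z = NormalDist().inv_cdf(0.975)
se = sigma / math.sqrt(n)

lower = mu - z * se
upper = mu + z * se
print(f"95% of estimated means fall between {lower:.1f} and {upper:.1f}")
```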

Confidence Limits on the Mean
Well, that is interesting, but it is hard even to think of a case where we have μ and σ.
What happens if we substitute x̄ and s for μ and σ?
First, we have to pay a small penalty for the “extra” uncertainty introduced by using estimates instead of parameters (the t-distribution).
The table at the right is the t with 0.025 in each tail (just the same as we used from the normal table) and is a table in the book.
We need to think about the interpretation.
[Table: df (n-1) versus t; the final row is df = ∞, t = 1.96]

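Confidence Limits on the Mean
From the first sample:
– Estimated mean = 77.5
– Estimated standard deviation = 4.7
– Sample size = 20
95% confidence limits on the estimated mean: x̄ ± t × s / sqrt(n) = 77.5 ± t × 4.7 / sqrt(20), where t is the table value with 0.025 in each tail and n - 1 = 19 degrees of freedom.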
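A worked version of this calculation for the first sample is sketched below. The t multiplier of 2.093 is an assumption taken from a standard t-table (0.025 in each tail, 19 degrees of freedom); it does not appear in the surviving slide text, and the variable names are illustrative.

```python
import math

x_bar = 77.5   # estimated mean from the first sample
s = 4.7        # estimated standard deviation
n = 20         # sample size

# Assumed value from a standard t-table: 0.025 in each tail, df = n - 1 = 19
T_CRIT_19 = 2.093

se = s / math.sqrt(n)
lower = x_bar - T_CRIT_19 * se
upper = x_bar + T_CRIT_19 * se
print(f"95% confidence limits: {lower:.1f} to {upper:.1f}")
```

Note that the interval is slightly wider than one built with 1.96, which is the "small penalty" for estimating σ with s.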

Interpretation of the Confidence Limits on the Estimated Mean
The 95% confidence limits are no longer centered on the mean of the universe, but on the estimated mean from the sample.
– We should not expect 95% of the means to fall in this range (but rather in the range centered on the true mean).
– Common (and slightly incorrect) interpretation: “I am 95% sure that the true mean is in this range.”
– The technically correct interpretation of 95% confidence limits is: “If I were to repeat the experiment a large number of times, and calculate confidence limits like this from each sample, 95% of the time they would include the true mean.”
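The repeated-experiment interpretation can be illustrated by simulation: draw many samples of 20 from N(80,5), compute confidence limits from each, and count how often the limits include 80. This is a sketch under stated assumptions: the t multiplier 2.093 comes from a standard t-table (19 degrees of freedom), and the function name, seed, and number of replicates are arbitrary choices.

```python
import math
import random
import statistics

# Assumed value from a standard t-table: 0.025 in each tail, df = 19
T_CRIT_19 = 2.093

def coverage(mu=80.0, sigma=5.0, n=20, replicates=2000, seed=1):
    """Fraction of replicate samples whose 95% confidence limits
    include the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(replicates):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if x_bar - T_CRIT_19 * se <= mu <= x_bar + T_CRIT_19 * se:
            hits += 1
    return hits / replicates

print(f"fraction of intervals containing the true mean: {coverage():.3f}")
```

The printed fraction should land near 0.95, though any single run will be a little above or below it.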

Printout Examples Simple description (PROC MEANS) of systolic blood pressure and c-reactive protein in the REGARDS Study

Printout Examples Detailed description (PROC UNIVARIATE) of systolic blood pressure and c-reactive protein in the REGARDS Study (printout pages 1 to 6 of 6)

General Confidence Limit Thoughts
The estimate of any parameter from any distribution has a standard error.
95% confidence limits can be calculated on the estimate of any parameter.
General form: estimate - (dist area)(SE) < x < estimate + (dist area)(SE)
This is really, really important … you will see this many, many times in this course.
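The general form can be written as one small helper that takes an estimate, its standard error, and the distribution multiplier (the "dist area" value, such as 1.96). The function name and the example numbers below are made up for illustration and do not come from the slides.

```python
def confidence_limits(estimate, standard_error, multiplier=1.96):
    """Return (lower, upper) as estimate -/+ multiplier * standard_error."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width

# Made-up example: an estimate of 10.0 with a standard error of 2.0
low, high = confidence_limits(10.0, 2.0)
print(f"{low:.2f} to {high:.2f}")
```

Only the estimate, the standard error, and the multiplier change from one setting to the next; the form of the calculation stays the same.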

Can We Use this Approach in the Binomial Distribution?
For example, suppose we have data coming from the binomial distribution with n = 200.
We take a sample and observe 40 “events.”
We want to estimate the parameter p.
Not surprisingly, the estimate of p is the observed proportion: estimated p = (number of events) / n.
Then the estimated p = 40/200 = 0.20.

Can We Use this Approach in the Binomial Distribution?
But as noted above, every estimate must have a standard error.
If the sample size (n) is “big,” then in the case of the estimated proportion from a binomial, the standard error is:
SE = sqrt(p (1 - p) / n), with the estimated p used in place of p

So What Does the Standard Error of a Binomial Look Like?
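The figure for this slide did not survive, so the following is only a guess at what it illustrated: how the standard error of an estimated proportion changes with p for a fixed n. It assumes the usual large-sample formula sqrt(p(1 - p)/n); the function name and the grid of p values are illustrative.

```python
import math

def se_proportion(p, n):
    """Large-sample standard error of an estimated proportion."""
    return math.sqrt(p * (1 - p) / n)

n = 200  # sample size used in the binomial example
for p in (0.05, 0.10, 0.20, 0.30, 0.50, 0.70, 0.90):
    print(f"p = {p:.2f}  SE = {se_proportion(p, n):.4f}")
```

Under this formula the standard error is largest at p = 0.5 and shrinks as p moves toward 0 or 1.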

Can we calculate 95% confidence limits on the estimated proportion?
Use exactly the same approach: estimate - (dist area)(SE) < x < estimate + (dist area)(SE)
But what probability should we use?
– If n is large, then there is no real difference between z(α/2) and t(α/2, n), so just use z(0.05/2) = 1.96.

Can we calculate 95% confidence limits on the estimated proportion?
So most folks would say that we are 95% sure that the true proportion is between about 0.145 and 0.255 (that is, 0.20 ± 1.96 × 0.028).
This is (slightly) wrong.
Really, if we repeated the experiment a large number of times, and calculated confidence limits on the estimated proportion this way each time, then these confidence limits would include the true proportion 95% of the time.
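The confidence limits for the binomial example (40 events out of 200) can be worked through as follows. This is a sketch that assumes the usual large-sample standard error sqrt(p(1 - p)/n) with the estimated p plugged in, and the 1.96 multiplier from the slides; the variable names are illustrative.

```python
import math

events = 40
n = 200

p_hat = events / n                       # estimated proportion, 0.20
se = math.sqrt(p_hat * (1 - p_hat) / n)  # large-sample standard error
z = 1.96                                 # normal multiplier for 95% limits

lower = p_hat - z * se
upper = p_hat + z * se
print(f"estimated p = {p_hat:.2f}, SE = {se:.3f}")
print(f"95% confidence limits: {lower:.3f} to {upper:.3f}")
```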

Important Points in Closing
Half of what statistics is useful for is estimation.
– Given a distribution (the universe) with parameters
– We take a sample and make estimates (of the parameters)
– Some estimates are good, some are bad. Good estimates are unbiased (correct on average) and reliable (measured by the standard error of the estimates).
– 95% confidence limits on estimated parameters can be made using the general approach: estimate - (dist area)(SE) < x < estimate + (dist area)(SE)
– We did this for the estimated mean from a normal and the estimated proportion from a binomial.

Where Have we Been Working in the “Big Picture” 1 Estimate proportion (and confidence limits) 8 Estimate mean (and confidence limits)