Statistics and Quantitative Analysis U4320 Segment 5: Sampling and inference Prof. Sharyn O’Halloran.



Sampling A. Basics 1. Ways to Describe Data: histograms, frequency tables, etc. 2. Ways to Characterize Data: central tendency (mode, median, mean) and dispersion (variance, standard deviation).

Sampling (cont.) 3. Probability of Events: If discrete, rely on relative frequency. If continuous, rely on the distribution of events (example: the standard normal distribution). 4. Samples: We can take a sample of the population and make inferences about the population. 5. Central Question: How well does the sample represent the underlying population?

Sampling (cont.) B. Random Sampling 1. Problems with Sample Bias: The way we collect our data may bias our results. That is, the average response in our sample may not represent the average response in the whole population. Examples: Literary Digest Phone Book Poll; primaries; the relation between economic growth and education, looking only at OECD countries. 2. Solution: random sampling.

Sampling (cont.) C. Moments of the Sample 1. Characteristics of Sample Mean

Sampling (cont.) Example Draw a single observation

Sampling (cont.) Draw two observations

Sampling (cont.) Draw 4 Observations

Sampling (cont.) 2. Generalization Every sample has an expected mean of $\mu$. But as our sample size increases, we are more confident of our results. That is, the standard deviation (or standard error, as we will call it) of our results is decreasing. So as $n$ increases, the standard error $\sigma/\sqrt{n}$ decreases.

Sampling (cont.) 3. Hat Experiment Mean $\mu = 10.5$. Standard deviation $\sigma = 5.77$. Now let's take a sample of size 1 (with replacement). Now one of size 2. Now one of size 6.
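
The hat experiment can be tried in a few lines of Python. This is only a sketch: the slide does not say what is in the hat, so the code assumes one slip for each integer from 1 to 20, which is a population consistent with the stated mean of 10.5 and standard deviation of 5.77. The names `hat` and `sample_mean` are illustrative.

```python
import random
import statistics

# Assumption: the hat holds one slip for each integer 1..20.
hat = list(range(1, 21))

pop_mean = statistics.fmean(hat)   # population mean
pop_sd = statistics.pstdev(hat)    # population standard deviation

def sample_mean(n, rng):
    """Mean of n draws from the hat, with replacement."""
    return statistics.fmean(rng.choices(hat, k=n))

rng = random.Random(0)  # fixed seed so a run can be repeated
for n in (1, 2, 6):
    # Repeat the draw many times and look at the spread of the sample means.
    means = [sample_mean(n, rng) for _ in range(20000)]
    print(n, round(statistics.fmean(means), 2), round(statistics.pstdev(means), 2))
```

For each sample size the average of the sample means should sit near 10.5, while their spread should shrink as n grows.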

Sampling (cont.) 4. Equations For a sample of size $n$ from a population of mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{X}$ has: $E(\bar{X}) = \mu$ and $SE(\bar{X}) = \sigma/\sqrt{n}$. $SE(\bar{X})$ is called the standard error of the sampling process.

Inference We make inferences about a population from a given sample. A. Population and Sampling Parameters We have a population with parameters $\mu$ and $\sigma$. We then take a sample with statistics $\bar{X}$ and $s$. We want to know how well the sample mean approximates the population mean.

Inference (cont.) On average the sample mean equals the population mean.

Inference (cont.) B. Referring Back to the Hat Experiment 1. Standard error decreases as n increases. For instance, before we drew samples of sizes 1, 2, and 6 from the hat. The first sample of size 1 had standard error $5.77/\sqrt{1} = 5.77$. The second sample of size 2 had standard error $5.77/\sqrt{2} = 4.08$. The third sample of size 6 had standard error $5.77/\sqrt{6} = 2.36$.
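
As a quick check of the arithmetic for the hat experiment, the standard error formula $\sigma/\sqrt{n}$ can be evaluated directly. The helper name `standard_error` is an illustrative choice, not something from the slides.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (1, 2, 6):
    print(n, round(standard_error(5.77, n), 2))
# standard errors for n = 1, 2, 6: 5.77, 4.08, 2.36
```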

Inference (cont.) C. Shape of the Sampling Distribution If you take a sample and find its mean, then take another sample and find its mean, and repeat this process a large number of times, then $\bar{X}$ is a random variable with its own mean and standard error.

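Inference (cont.) 1. Central Limit Theorem For a sufficiently large sample size $n$, the sample mean $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard error $\sigma/\sqrt{n}$, whatever the shape of the underlying population.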

Inference (cont.) 2. Example: 3 different distributions Example 1: A population of men on a small, Eastern campus has a mean height $\mu = 69$ inches and a standard deviation $\sigma = 3.22$ inches. If a random sample of n = 10 men is drawn, what is the chance that the sample mean will be within 2 inches of the population mean?

Inference (cont.) Answer: From the Central Limit Theorem, we know that $\bar{X}$ is approximately normally distributed, with mean 69 and standard error $\sigma/\sqrt{n} = 3.22/\sqrt{10} = 1.02$.

Inference (cont.) Answer (cont.) Find the z-score: $z = (71 - 69)/(3.22/\sqrt{10}) = 2/1.02 = 1.96$. $P(Z > 1.96) = .025$. Since there are two tails, the area in the middle is $1 - 2(.025) = .95$. So there's a 95% probability that the sample mean falls between 67 and 71.
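
The table lookup in Example 1 can be reproduced with Python's standard library. This is a sketch that uses `statistics.NormalDist` in place of the printed Z-table and treats the sampling distribution of the mean as exactly normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 69, 3.22, 10
se = sigma / sqrt(n)            # standard error of the sample mean, about 1.02
xbar = NormalDist(mu, se)       # sampling distribution of the mean

# Probability that the sample mean lands within 2 inches of mu.
p_within = xbar.cdf(mu + 2) - xbar.cdf(mu - 2)
print(round(se, 2), round(p_within, 2))   # 1.02 0.95
```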

Inference (cont.) Example 2: Suppose a large class in statistics has marks normally distributed around $\mu = 72$ with $\sigma = 9$. Find the probability that a) An individual student drawn at random will have a mark over 80.

Inference (cont.) Answer: The Z-score is (80 - 72)/9 = .89. Looking this up in the table gives P(Z > .89) = .187, or about 19%. b) Now, what's the probability that a sample of size 10 has an average of over 80?

Inference (cont.) Answer: The standard error is $\sigma/\sqrt{n} = 9/\sqrt{10} = 2.85$. So the Z-score becomes (80 - 72)/2.85 = 2.81. P(Z > 2.81) = .002.
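
Parts (a) and (b) of Example 2 differ only in the standard deviation used: $\sigma$ for a single student, $\sigma/\sqrt{n}$ for the average of n students. A short sketch, again using `statistics.NormalDist` as a stand-in for the Z-table:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 72, 9

# (a) One student: spread is the population standard deviation.
p_one = 1 - NormalDist(mu, sigma).cdf(80)             # about 0.187

# (b) Average of 10 students: spread is the standard error sigma / sqrt(10).
p_avg = 1 - NormalDist(mu, sigma / sqrt(10)).cdf(80)  # about 0.002

print(round(p_one, 3), round(p_avg, 3))
```

An individual mark over 80 is fairly common, but an average over 80 across ten students is rare, because averaging shrinks the spread.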

Inference (cont.) Example 3: If the number of miles per gallon achieved by all cars of a particular model has $\mu = 25$ and $\sigma = 2$, what is the probability that for a random sample of 20 such cars, average miles per gallon will be less than 24? (Assume that the population is normally distributed.) Step 1: Standardize $\bar{X}$: $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n}) = (24 - 25)/(2/\sqrt{20}) = -1/0.447 = -2.24$.

Inference (cont.) Step 2: Then find the probability for that Z-score from the standard normal tables: $P(\bar{X} < 24) = P(Z < -2.24) = .013$. So there is about a 1.3 percent chance that from a sample of 20 the average will be less than 24.
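
The two steps of Example 3 (standardize, then look up the lower tail) can be checked as follows. The function name `lower_tail_of_mean` is hypothetical, and `statistics.NormalDist` replaces the printed table.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_of_mean(x, mu, sigma, n):
    """P(sample mean < x) for a normal population, via the Z-score."""
    z = (x - mu) / (sigma / sqrt(n))   # Step 1: standardize
    return z, NormalDist().cdf(z)      # Step 2: lower-tail area

z, p = lower_tail_of_mean(24, mu=25, sigma=2, n=20)
print(round(z, 2), round(p, 3))        # -2.24 0.013
```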

Inference (cont.) D. Proportions 1. Proportions as Means A proportion (P) is just the mean of a dichotomous variable. Example: Ask 50 people what they think of Clinton, coding 0 if they think he's doing a poor job and 1 if they think he is doing a good job. Suppose 30 of the 50 respondents say he's doing a good job. Then the sample mean P is 30/50 = .60. This is just another way of saying that 60% of those surveyed approved of his job performance.

Inference (cont.) 2. Formula for Standard Error For a large enough sample of size n, P (the proportion) will be approximately normally distributed with mean $\pi$ and standard deviation $\sqrt{\pi(1-\pi)/n}$. Population mean $\mu$ = population proportion $\pi$. Sample mean $\bar{X}$ = sample proportion $P$. Population SD $\sigma = \sqrt{\pi(1-\pi)}$.

Inference (cont.) 3. Example: Polling Suppose that the true approval rating for Clinton is .50. That is, 50 percent of the population believe he is doing a good job: $\pi = .5$. If we sample 50 people, what is the probability that we will observe an approval rating as high as 60 percent or above?

Inference (cont.) We know that the true population mean is $\pi = .5$. The standard error is $\sqrt{\pi(1-\pi)/n} = \sqrt{(.5)(.5)/50} = .0707$. Then the Z-score is (.6 - .5)/.0707 = 1.414. Looking this up in the Z-table, P(Z > 1.414) = .079, or about 8%.
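
The polling calculation follows the same pattern as the examples for means, with $\sqrt{\pi(1-\pi)/n}$ as the standard error. A minimal sketch, where `proportion_se` is an illustrative helper name:

```python
from math import sqrt
from statistics import NormalDist

def proportion_se(pi, n):
    """Standard error of a sample proportion: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)

pi, n = 0.5, 50
se = proportion_se(pi, n)            # about 0.0707
z = (0.6 - pi) / se                  # about 1.414
p_upper = 1 - NormalDist().cdf(z)    # upper-tail area, about 0.079

print(round(se, 4), round(z, 3), round(p_upper, 3))
```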

Inference (cont.) 4. Example Of your first 15 grandchildren, what is the chance that there will be more than 10 boys?

Inference (cont.) Answer: We want the probability that the proportion of boys is at least 10/15 = 2/3. We know that the population mean is $\pi = 1/2$. The standard error is $\sqrt{(.5)(.5)/15} = .129$. Then the Z-score is (2/3 - 1/2)/.129 = 1.29. Looking this up in the table, P(Z > 1.29) = .099, or about 10%.
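
With only 15 trials the normal approximation is rough, so it can be instructive to set it beside the exact binomial probabilities. The comparison below is a side check that is not part of the original slides; it assumes each grandchild is a boy with probability 1/2, independently of the others.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi = 15, 0.5

# Normal approximation used on the slide.
se = sqrt(pi * (1 - pi) / n)              # about 0.129
z = (10 / n - pi) / se                    # about 1.29
p_normal = 1 - NormalDist().cdf(z)        # about 0.098 (the table value for z = 1.29 is .099)

def binom_upper(k_min):
    """Exact P(X >= k_min) for X ~ Binomial(n, pi)."""
    return sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(k_min, n + 1))

p_at_least_10 = binom_upper(10)           # 10 or more boys, about 0.151
p_more_than_10 = binom_upper(11)          # strictly more than 10 boys, about 0.059

print(round(p_normal, 3), round(p_at_least_10, 3), round(p_more_than_10, 3))
```

The approximate answer falls between the two exact values, which shows how much the wording ("more than 10" versus "at least 10") matters at this sample size.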

Point Estimation: Properties A. Unbiased Estimators When the expected value of an estimator equals the true value of the parameter, we say that it is unbiased. (An estimator that converges to the correct value as n grows is called consistent.)

Point Est. Properties (cont.) B. Efficient Estimators Def of Efficient: One estimator is more efficient than another if its standard error is lower.

Point Est. Properties (cont.) C. N-1 Problem 1. Known $\mu$ When we take a sample of size n, if we had the real $\mu$ from the population, we could calculate $\hat{\sigma}^2 = \frac{1}{n}\sum (X_i - \mu)^2$. Then there wouldn't be a problem; $\hat{\sigma}^2$ would be a consistent estimator of $\sigma^2$, if we knew $\mu$.

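Point Est. Properties (cont.) 2. Unknown $\mu$ But we usually don't have $\mu$, so we have to use the sample mean $\bar{X}$ instead. What's the difference? Why don't we just say that $s^2 = \frac{1}{n}\sum (X_i - \bar{X})^2$? It turns out that we can show that $\bar{X}$ minimizes the expression $\sum (X_i - a)^2$ over all choices of $a$.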

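Point Est. Properties (cont.) 2. Unknown $\mu$ (cont.) So if we used $\mu$ instead of $\bar{X}$, then the expression $\sum (X_i - \mu)^2$ would be bigger than $\sum (X_i - \bar{X})^2$. The right way to correct for this is to multiply by $n/(n-1)$, so $s^2 = \frac{n}{n-1} \cdot \frac{1}{n}\sum (X_i - \bar{X})^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$. The bottom line is that we use n-1 to make a consistent, unbiased estimate of the population variance.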
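
A small simulation can make the n-1 correction concrete. The sketch below assumes a standard normal population (true variance 1) and samples of size 5; both choices are illustrative and not from the slides. It averages the two candidate variance estimates over many samples.

```python
import random

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

rng = random.Random(0)   # fixed seed so a run can be repeated
n, trials = 5, 20000

total = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true variance is 1
    total += sum_sq_dev(sample)

avg_div_n = total / trials / n                 # divides by n: too small on average
avg_div_n_minus_1 = total / trials / (n - 1)   # divides by n-1: close to 1 on average

print(round(avg_div_n, 2), round(avg_div_n_minus_1, 2))
```

Dividing by n should average out near (n-1)/n = 0.8 of the true variance, while dividing by n-1 should average out near the true value of 1.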

IV. Review Homework