FOUNDATIONS OF STATISTICAL INFERENCE

DEFINITIONS

Statistical inference is the process of reaching conclusions about characteristics of an entire population using data from a subset, or sample, of that population. Simple random sampling is a sampling method that ensures every combination of n members of the population has an equal chance of being selected.
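A minimal Python sketch of simple random sampling, assuming the population fits in a list. `random.sample` draws n distinct members so that every n-member combination is equally likely. The frequency check uses a hypothetical population of four people and samples of size 2, purely as an illustration of the "equal chance" property.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng=random):
    """Draw n distinct members; every n-member combination is equally likely."""
    return rng.sample(population, n)

# Hypothetical tiny population, used only to check the equal-chance property.
population = ["Ann", "Bob", "Cy", "Dee"]
rng = random.Random(42)

# Count how often each unordered pair is drawn over many repetitions.
counts = Counter(
    frozenset(simple_random_sample(population, 2, rng)) for _ in range(60_000)
)
for pair, count in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), count)  # each of the 6 pairs appears roughly 10,000 times
```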

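STATISTICAL INFERENCE

Statistical inference is the process of making guesses about the true value of a population parameter from a sample statistic.

[Figure: the population (the truth, not observable) is described by population parameters; the sample (what we observe) is described by sample statistics, which we use to make guesses about the whole population.]

Hat notation (^) is often used to indicate an “estimate”; for example, $\hat{\mu}$ denotes an estimate of the population mean $\mu$.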

SAMPLING DISTRIBUTIONS

A sampling distribution is the distribution of a sample statistic computed on the set of all possible random samples of size n that could be drawn from a population. Most experiments are one-shot deals. So how do we know whether an observed effect from a single experiment is real or just an artifact of sampling variability (chance variation)? Probability distributions are important here because they form the basis for describing the distribution of a sample statistic.
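To make "the set of all possible random samples of size n" concrete, the Python sketch below enumerates every sample of size 2 from a hypothetical four-member population and computes the mean of each. The resulting list of means is the exact sampling distribution of the sample mean for this toy case (sampling without replacement), and its average equals the population mean.

```python
from itertools import combinations
from statistics import mean, pstdev

population = [2, 4, 6, 8]   # hypothetical toy population
n = 2                       # sample size

# Every possible sample of size n (without replacement) and its mean.
samples = list(combinations(population, n))
means = [mean(s) for s in samples]

print(samples)                        # the 6 possible samples
print(sorted(means))                  # the sampling distribution of the mean
print(mean(means), mean(population))  # both equal 5
print(round(pstdev(means), 3))        # spread of the sampling distribution
```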

STATISTICAL INFERENCE IS BASED ON SAMPLING VARIABILITY

- Sample statistic: we summarize a sample into one number; e.g., it could be a mean, a difference in means or proportions, an odds ratio, or a correlation or regression coefficient.
  - E.g.: average support for gun control among women and men.
  - E.g.: proportion of women and men who supported the war in Iraq.
- Sampling variability: if we could repeat an experiment many, many times on different samples with the same number of subjects, the resulting sample statistic would not always be the same (because of chance!).
- Standard error: a measure of the sampling variability. It is the standard deviation of the sampling distribution.
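The standard error can be seen directly by repeating a "survey" many times. This Python sketch assumes a hypothetical population in which a proportion p = 0.6 supports some policy, draws 5,000 samples of n = 400 respondents, and takes the standard deviation of the resulting sample proportions. For comparison it also prints the standard textbook formula for the standard error of a proportion, $\sqrt{p(1-p)/n}$, which is not stated in the slides.

```python
import math
import random
from statistics import stdev

p, n, reps = 0.6, 400, 5_000   # hypothetical population proportion and design
rng = random.Random(7)

def sample_proportion():
    """Proportion of 'supporters' in one random sample of n respondents."""
    return sum(rng.random() < p for _ in range(n)) / n

# Sampling variability: the statistic changes from sample to sample.
props = [sample_proportion() for _ in range(reps)]

se_simulated = stdev(props)               # SD of the sampling distribution
se_formula = math.sqrt(p * (1 - p) / n)   # standard formula for a proportion
print(round(se_simulated, 4), round(se_formula, 4))
```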

For large enough sample sizes, the shape of the sampling distribution of the mean will be approximately normal. The sampling distribution is centered on $\mu$, the mean of the population. Its standard deviation can be computed as the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

EXAMPLES OF SAMPLE STATISTICS

- Single population mean $\mu$ (known population standard deviation $\sigma$)
- Single population mean $\mu$ (unknown population standard deviation $\sigma$)
- Single population proportion $p$
- Difference in means $\mu_1 - \mu_2$ (t-test)
- Difference in proportions $p_1 - p_2$ (Z-test)
- Odds ratio/risk ratio
- Correlation coefficient
- Regression coefficient
- …

THE CENTRAL LIMIT THEOREM

If all possible random samples, each of size n, are taken from any population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample means (averages) will:

1. Have mean $\mu_{\bar{x}} = \mu$.
2. Have standard deviation (also called the standard error of the mean) $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
3. Be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n).
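A small Python simulation of the three claims above, under assumed settings: the parent population is exponential with mean 1 and standard deviation 1 (strongly right-skewed, with skewness 2), and we draw 5,000 samples of size n = 40. If the theorem holds, the sample means should center near 1, have standard deviation near $1/\sqrt{40} \approx 0.158$, and be far less skewed than the parent population.

```python
import math
import random
from statistics import fmean, pstdev

def skewness(xs):
    """Moment skewness: mean cubed deviation divided by SD cubed."""
    m, s = fmean(xs), pstdev(xs)
    return fmean([(x - m) ** 3 for x in xs]) / s ** 3

rng = random.Random(123)
n, reps = 40, 5_000
mu = sigma = 1.0   # exponential with rate 1: mean 1, SD 1, skewness 2

# One sample mean per repetition: the (simulated) sampling distribution.
sample_means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(fmean(sample_means), 3))                               # close to mu
print(round(pstdev(sample_means), 3), round(sigma / math.sqrt(n), 3))
print(round(skewness(sample_means), 2))                            # much closer to 0 than 2
```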

SYMBOL CHECK

- $\mu_{\bar{x}}$: the mean of the sample means.
- $\sigma_{\bar{x}}$: the standard deviation of the sample means, also called “the standard error of the mean.”

INTUITIVE TREATMENT OF SAMPLING DISTRIBUTION

Suppose we have a population of size 100. We draw a sample of 100 people from this population of 100 and compute the mean. How confident could we be about the computed sample statistic? How much sampling error would there be?

Suppose instead that we draw every sample of size 99 from this population and compute the mean of each. How many different samples could we draw? $\binom{100}{99} = 100$. How much sampling error would there be in the computed means?

Now suppose we draw samples of 50 people from the population of 100 and compute the mean of each sample. How many different samples could we draw? $\binom{100}{50} \approx 1.009 \times 10^{29}$. How much sampling error would there be in the computed means?

The principle is that the larger the sample size relative to the population we are drawing from, the lower the sampling error; the smaller the sample size relative to that population, the larger the sampling error.
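The counts in this thought experiment can be checked with `math.comb`. The Python sketch below also assumes a hypothetical population consisting of the numbers 1 to 100 and, for each sample size, draws 2,000 random samples without replacement and measures how much the sample means spread out. The spread shrinks as the sample size approaches the population size and vanishes when the sample is the whole population.

```python
import math
import random
from statistics import fmean, pstdev

N = 100
population = list(range(1, N + 1))   # hypothetical population values
rng = random.Random(0)

print(math.comb(N, 100))   # 1 possible sample: the whole population
print(math.comb(N, 99))    # 100 possible samples
print(math.comb(N, 50))    # roughly 1.009e29 possible samples

def spread_of_means(n, reps=2_000):
    """SD of the sample means over repeated samples of size n (no replacement)."""
    means = [fmean(rng.sample(population, n)) for _ in range(reps)]
    return pstdev(means)

for n in (50, 99, 100):
    print(n, round(spread_of_means(n), 3))
```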

[Figure: the sampling distribution divided into a null region and an alternative region]

HYPOTHESIS TESTING USING THE NORMAL (Z) DISTRIBUTION

1. Calculate the estimated statistic (e.g., the sample mean $\bar{x}$) from the sample.
2. Record the sample standard deviation $s$ and $N$.
3. From these, calculate the standard error of the sampling distribution: $\sigma_e = s/\sqrt{N}$.
4. Calculate $Z = (\bar{x} - \mu_0)/\sigma_e$, where $\mu_0$ is the value under the null hypothesis.
5. Compare the calculated value of Z to the table of Z statistics.
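The steps above translate directly into code. This is a minimal Python sketch, assuming the raw observations are available as a list and using a two-sided test at significance level `alpha`. `statistics.NormalDist` supplies the critical value that the slides read from a Z table. The function name `z_test` and its parameters are illustrative, not from the original.

```python
import math
from statistics import NormalDist, fmean, stdev

def z_test(data, mu0, alpha=0.05):
    """Two-sided Z test of H0: population mean == mu0, following the steps above."""
    n = len(data)
    xbar = fmean(data)                              # step 1: estimated statistic
    s = stdev(data)                                 # step 2: sample SD (n - 1 divisor)
    se = s / math.sqrt(n)                           # step 3: standard error
    z = (xbar - mu0) / se                           # step 4: Z statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # step 5: "table" value, about 1.96
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided, abs(z) > critical        # True means reject H0
```

For example, `z_test(observations, mu0=10)` returns the Z statistic, a two-sided p-value, and whether the null hypothesis is rejected at the 5 percent level. With small samples, the t-distribution discussed later is the more appropriate reference.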

EXAMPLE: Suppose we draw a sample with mean, variance, and N as follows: $\bar{x} = 12.5$, $s^2 = 36$, $N = 50$. How confident could we be that the mean was not actually 10 (the null hypothesis)? We might ask how many standard deviations (Z units) away 12.5 is from 10:

$$Z = \frac{12.5 - 10}{\sqrt{36}/\sqrt{50}} = \frac{2.5}{0.849} \approx 2.95$$

We can then calculate a p-value from the Z-statistic. Using the preceding table, there is only a .0016 chance that, with a sample of size 50 and variance 36, we could have drawn a sample with mean 12.5 (or higher) when the actual population mean was 10.
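The arithmetic of this example can be reproduced with Python's standard library, with `NormalDist().cdf` standing in for the Z table. The .0016 figure corresponds to the one-sided (upper-tail) probability; the two-sided value is printed as well for comparison.

```python
import math
from statistics import NormalDist

xbar, variance, n = 12.5, 36, 50   # sample summary from the example
mu0 = 10                           # null-hypothesis value

se = math.sqrt(variance) / math.sqrt(n)   # 6 / sqrt(50), about 0.849
z = (xbar - mu0) / se                     # about 2.95 standard errors
p_one_sided = 1 - NormalDist().cdf(z)     # P(mean >= 12.5 | true mean is 10)
p_two_sided = 2 * p_one_sided

print(round(z, 2), round(p_one_sided, 4), round(p_two_sided, 4))
```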

EXAMPLE: With the NES92, we draw a sample of 1500 respondents. On the variable "liking for Clinton," we find a mean of 4.1 with a variance of 1.6. What is the probability that the real liking for Clinton in the population is only 3, rather than the calculated 4.1?

$$Z = \frac{4.1 - 3.0}{\sqrt{1.6}/\sqrt{1500}} = \frac{1.1}{0.0327} \approx 33.7$$

Using the earlier table, the probability that the real liking for Clinton is 3.0 is smaller than any value in the table. What factors determine this probability?

1) The magnitude of the hypothesized difference (the numerator)
2) The variance of the sample (1.6)
3) The N of the sample (1500)

Note that we can also think of these three quantities as distances in standard deviation units on the sampling distribution. See slide 13 again.
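The same calculation for the NES92 example, with each of the three factors varied one at a time so that its effect on Z is visible. The alternative inputs (a smaller hypothesized difference, a larger variance, a smaller N) are hypothetical. Because Z is so large, the tail probability is computed with `math.erfc`, which stays accurate far into the tail where `1 - NormalDist().cdf(z)` would round to 0.

```python
import math

def z_score(estimate, null_value, variance, n):
    """Distance of the estimate from the null value in standard-error units."""
    return (estimate - null_value) / math.sqrt(variance / n)

def upper_tail(z):
    """P(Z >= z) for a standard normal, accurate even for very large z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = z_score(4.1, 3.0, 1.6, 1500)
print(round(z, 1), upper_tail(z))   # Z is about 33.7; the probability is tiny

# Vary one factor at a time (hypothetical values):
print(round(z_score(4.1, 4.0, 1.6, 1500), 2))    # 1) smaller difference
print(round(z_score(4.1, 3.0, 16.0, 1500), 2))   # 2) larger variance
print(round(z_score(4.1, 3.0, 1.6, 15), 2))      # 3) smaller N
```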

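THE CONFIDENCE INTERVAL APPROACH

Let UCL and LCL refer respectively to the upper and lower confidence limits. Let $\hat{\mu}$ be the estimated parameter (the point estimate). Let Z be the critical Z value associated with the desired confidence level (equivalently, the desired significance level), e.g. 2.576 for 99 percent. Let $\sigma_e$ be the standard error. Then calculate the confidence limits as follows:

$$\text{UCL} = \hat{\mu} + Z\,\sigma_e, \qquad \text{LCL} = \hat{\mu} - Z\,\sigma_e$$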
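A minimal Python sketch of these two formulas, assuming the point estimate and its standard error are already known. The critical Z value comes from `statistics.NormalDist` instead of a printed table; the function name `confidence_interval` is illustrative. The loop shows that, for a fixed standard error, a higher confidence level produces a wider interval.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, confidence=0.95):
    """Return (LCL, UCL) = estimate -/+ Z * standard error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # e.g. about 1.96 for 95%
    return estimate - z * std_error, estimate + z * std_error

for level in (0.90, 0.95, 0.99):
    lcl, ucl = confidence_interval(0.0, 1.0, level)   # hypothetical inputs
    print(level, round(lcl, 3), round(ucl, 3))
```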

EXAMPLE: Construct a 99 percent confidence interval around the point estimate 12.5 from the preceding example, using the given information:

$$12.5 \pm 2.576 \times \frac{6}{\sqrt{50}} = 12.5 \pm 2.19, \quad \text{i.e. } (10.31,\ 14.69)$$

The interval does not contain zero. Therefore, we can be at least 99 percent confident that the true mean is not zero. It also does not contain 10, so we can be at least 99 percent confident that the true mean is not 10.
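Checking this interval numerically in Python (standard library only), with the 99 percent critical value taken from `NormalDist` rather than a table:

```python
import math
from statistics import NormalDist

estimate = 12.5
std_error = 6 / math.sqrt(50)            # from variance 36 and N = 50
z99 = NormalDist().inv_cdf(0.995)        # about 2.576

lcl, ucl = estimate - z99 * std_error, estimate + z99 * std_error
print(round(lcl, 2), round(ucl, 2))      # about 10.31 and 14.69
print("contains 0:", lcl <= 0 <= ucl)
print("contains 10:", lcl <= 10 <= ucl)
```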

USING THE T-DISTRIBUTION

In practice, we seldom know the population variance or standard deviation. In these circumstances we use the t distribution, rather than the Z (normal) distribution, for our tests of significance. Unlike the Z distribution, of which there is only one, there are many t distributions: one for each possible number of degrees of freedom for the test. (Degrees of freedom refer to N minus the number of parameters estimated.) As N becomes large, the t distribution approaches the Z distribution.

The t-distribution is used in precisely the same way as the Z in conducting the preceding tests: simply substitute the values from the t-distribution where you had the values from the Z distribution. The t-distribution takes into account that we do not have full information about the population variability. With small N, the t-distribution is somewhat more conservative than the Z. It gives essentially the same answer when N is larger than about 1,000, and it is quite close when N is larger than about 100. See the next table.
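Python's standard library has no t distribution, so this sketch computes two-sided 95 percent critical values by integrating the t density numerically (Simpson's rule) and inverting the resulting CDF by bisection. It is for illustration only; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used. Degrees of freedom are taken as N - 1, assuming one estimated parameter (the mean). The printed t values shrink toward the Z value of about 1.96 as N grows.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so the gamma functions do not overflow for large df.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2_000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral over [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value c with P(-c <= T <= c) = confidence."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):                  # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)     # about 1.960
for n in (10, 30, 100, 1_000):
    print(n, round(t_critical(0.95, n - 1), 3), round(z_crit, 3))
```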

THE P-VALUE

The p-value is the probability that we would have observed our sample statistic (or one even more extreme) just by chance if the null hypothesis (null value) were true. For example, as above, we might estimate 12.5 but posit a null value of 10. Small p-values mean that data like ours would be unlikely if the null value were true, which casts doubt on the null value.
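The definition can be checked by brute force: pretend the null hypothesis is true, draw many samples, and count how often the sample mean is at least as large as the one observed. This Python sketch assumes a normal population with mean 10 and standard deviation 6 (variance 36) and samples of size 50, matching the running example. The simulated one-sided p-value should land near the .0016 from the Z calculation.

```python
import random
from statistics import NormalDist, fmean

null_mean, sigma, n = 10, 6, 50    # assumed null population and sample size
observed = 12.5                    # sample mean from the example
reps = 20_000
rng = random.Random(2024)

# Simulate the sampling distribution of the mean *under the null hypothesis*
# and count samples whose mean is at least as extreme as the observed one.
hits = sum(
    fmean(rng.gauss(null_mean, sigma) for _ in range(n)) >= observed
    for _ in range(reps)
)
p_simulated = hits / reps
p_theory = 1 - NormalDist(null_mean, sigma / n ** 0.5).cdf(observed)
print(p_simulated, round(p_theory, 4))
```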