Intro to Inference & The Central Limit Theorem. Learning Objectives By the end of this lecture, you should be able to: – Describe what is meant by the.


Intro to Inference & The Central Limit Theorem

Learning Objectives By the end of this lecture, you should be able to: – Describe what is meant by the term ‘inference’. – Name two potential pitfalls that can cause us to come up with false values for our population when doing inference calculations. – Explain what is meant by the term ‘distribution of sample means’. – Describe what distribution is found when a large series of sample means is obtained. – Define the Central Limit Theorem. – Calculate the mean and SD for a distribution of sample means (as opposed to the mean and SD of an individual sample).

Inference Inference is the process of taking information from a sample and using it to draw conclusions about the population. Example: In a sample of 100 undergraduate DePaul students, 38% identify themselves as Republicans. Our goal, of course, is to use this 38% to infer the percentage of ALL DePaul undergraduates who identify themselves as Republican. The process of taking this value obtained from the sample and turning it into a prediction about the population is called inference. You will see that the answer is NOT simply to report 38%. Over the next couple of lectures, we will begin our discussion of how to do inference and the proper way to report the results.

Two important issues to bear in mind about inference: 1. Your sample is only ONE estimate. That is, if you randomly sampled again, you would get a different result. Calculate the average height of 20 people. Then do it again with 20 different people: you will almost certainly get a somewhat different result. The importance of this fact will become clearer over time. 2. Your estimate of the population is only as good as your sampling design, i.e. do all you can to eliminate biases. Trying to determine the relationship between number of beers and BAC by sampling a group of NFL athletes will not properly generalize to the population of all Americans.

Sampling variability Recall that your one sample is only one estimate of the true value. IMPORTANT: By ‘true’ value, we mean the value of the actual population. Recall that the population value is the piece of information that we are interested in. Every time we take a random sample, we are going to get a different set of individuals and, therefore, will obtain a different value. This concept is called sampling variability. Recognizing that no ONE PARTICULAR sample reliably gives you the “true” (i.e. population) value, since all samples will likely be different, is a key concept to keep in mind when doing inference calculations.
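To make sampling variability concrete, here is a minimal Python sketch. The population of heights is entirely made up (10,000 values with an assumed mean of 170 cm and SD of 10 cm), and the function name is just for illustration. The point is only that each sample of 20 people gives a somewhat different mean.

```python
import random
import statistics

def sample_mean_height(population, n, rng):
    """Draw one simple random sample of size n and return its mean."""
    return statistics.fmean(rng.sample(population, n))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 adult heights in cm (assumed mean 170, SD 10).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Take five separate samples of 20 people and record each sample's mean.
means = [sample_mean_height(population, 20, rng) for _ in range(5)]

for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean height = {m:.1f} cm")
print(f"population mean = {statistics.fmean(population):.1f} cm")
```

Each printed sample mean should land near the population mean, but no two of them are likely to agree exactly.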

Distribution of MANY (i.e. repeated) samples: There is an interesting and very important property of sampling variability. If you’ve already forgotten what sampling variability is, that’s okay. However, go back and review it! Important Property to Keep in Mind: Suppose we were to take MANY random samples (of the same size) from a given population and record the mean each time. If we plotted all of those means on a histogram and drew a density curve, we would encounter a very familiar distribution! Can you guess which distribution??? Hint: It rhymes with ‘Schnormal’. Much of statistical inference is based on this fact: The ‘distribution of sample means’ is approximately Normal, provided the samples are reasonably large.

If we took repeated samples and calculated the mean of each, the distribution of all of those means would be approximately Normal. The distribution seen here is an example of a sampling distribution of sample means. Note that the mean of this distribution turns out to be the true (i.e. population) mean.

The what of the who??? Be sure to understand what is meant by the ‘Distribution of Sample Means’… Restated: The sampling distribution of sample means refers to the distribution we would find if we were to take many, many samples of the same size and calculate the mean of each sample.

Central Limit Theorem As we have just said (repeatedly): If you were to plot the distribution of sample means and draw a density curve, you would quickly find that the distribution of all of those means is approximately Normal. This leads us to one of the most important concepts in an introductory statistics course: When the samples are reasonably large, the distribution of sample means is approximately Normal, even if the original dataset was NOT Normal! This (very) important property is known as the Central Limit Theorem.

Example: 1. If you sampled the incomes of 100 people on the street, you would have one result for the mean. Of course, this is only one sample. 2. If we repeat this sampling a second time, we will almost certainly obtain a different mean, so we will have 2 different means. If we repeat it 100 times, we will have 100 means. As we have discussed, the distribution of these means is called a sampling distribution of sample means. 3. If we plot these 100 means on a histogram, we would find that the distribution of all of these values is approximately Normal. 4. IMPORTANT: Note that income is NOT Normally distributed (it is skewed). This is one of the “powerful” aspects of the Central Limit Theorem: The distribution of means is approximately Normal even if the original dataset was not! I am well aware that this isn’t exactly blowing your socks off. Statistically, however, it turns out to have some very important ramifications.
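The income example can be tried out with a short simulation. What follows is only a sketch under stated assumptions: incomes are modeled as lognormal (a common stand-in for right-skewed data, not something the lecture specifies), the parameters 10.5 and 0.5 are arbitrary, and 1,000 repeated samples are used instead of 100 so the result is more stable. Skewness is measured as the average cubed z-score, which is close to 0 for a symmetric distribution.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: about 0 if symmetric, positive if right-skewed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def simulate_income_means(n_samples, sample_size, rng):
    """Return (raw incomes, list of sample means) from a skewed income model."""
    def draw_income():
        # Assumption: lognormal incomes with arbitrary illustrative parameters.
        return rng.lognormvariate(10.5, 0.5)

    raw = [draw_income() for _ in range(20_000)]
    means = [
        statistics.fmean([draw_income() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]
    return raw, means

raw, means = simulate_income_means(1000, 100, random.Random(7))
print(f"skewness of individual incomes: {skewness(raw):.2f}")
print(f"skewness of sample means:       {skewness(means):.2f}")
```

The individual incomes should show a clearly positive skewness, while the sample means should come out much closer to 0, which is what the Central Limit Theorem leads us to expect.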

Who cares? Are you impressed yet? Okay, I agree that the fact that the sampling distribution of sample means is approximately Normal may not seem like an earth-shattering revelation. However, there are some aspects to it that end up allowing us to use very powerful statistical tools down the road. The key one to bear in mind for now is the idea that the Central Limit Theorem applies even when the original dataset is NOT Normal. For example, suppose we looked at the distribution of 100 incomes. This distribution would be right-skewed. Now suppose we took the mean of those 100 incomes. Then we took another sample of 100 incomes and calculated that mean. Then we repeated this process a few hundred times. If we plotted all of those means on a histogram, the distribution would be approximately Normal. In other words, the distribution of sample means is approximately Normal even when the original dataset itself was not Normal.

Example of the Central Limit Theorem [Figure: left panel, distribution of EVERY call; right panel, distribution of the means of 500 samples (80 calls in each sample).] 1. The lengths of phone calls at a call center are right-skewed. 2. The graph on the left is a record of thousands and thousands of phone calls. 3. We take a sample of 80 phone calls and calculate the mean length. Then we repeat with another sample of 80 phone calls and calculate the mean length. We repeat for 500 samples. 4. Note how, when we look at the distribution of these 500 sample means, the distribution is approximately Normal.
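The call center example can be imitated in code. This sketch assumes call lengths follow an exponential distribution with a mean of 5 minutes (both choices are made up; the lecture only says the lengths are right-skewed). As a rough Normality check, it counts how many of the 500 sample means fall within 1 and 2 standard deviations of their center, which for a Normal distribution should be about 68% and 95%.

```python
import random
import statistics

def call_length_sample_means(n_samples=500, calls_per_sample=80,
                             mean_minutes=5.0, seed=1):
    """Mean call length for each of n_samples samples of simulated calls."""
    rng = random.Random(seed)
    rate = 1.0 / mean_minutes  # expovariate takes the rate, i.e. 1 / mean
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(calls_per_sample)])
        for _ in range(n_samples)
    ]

def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of their mean."""
    center = statistics.fmean(values)
    spread = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - center) <= k * spread)
    return inside / len(values)

means = call_length_sample_means()
print(f"within 1 SD: {fraction_within(means, 1):.0%}")
print(f"within 2 SD: {fraction_within(means, 2):.0%}")
```

If the two percentages come out near 68% and 95%, the sample means are behaving roughly like a Normal distribution even though the individual call lengths are strongly skewed.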

Calculation of Mean and SD of sampling distribution: Sampling distribution of $\bar{x}$. For any population with mean $\mu$ and standard deviation $\sigma$: The mean, or center, of the sampling distribution is equal to the population mean: $\mu_{\bar{x}} = \mu$. The standard deviation of the sampling distribution is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.

Mean of a sampling distribution: There is no tendency for a sample mean to fall systematically above or below $\mu$, even if the distribution of the raw data is skewed. Thus, the sample mean is an unbiased estimate of the population mean $\mu$; it will be “correct on average” over many samples. Key point: Mean of a Sampling Distribution = Mean of the population. Standard deviation of a sampling distribution: The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample. This SD is smaller than the standard deviation of the population by a factor of $\sqrt{n}$. Key Point: SD of a Sampling Distribution = SD of the population / square root of n. But isn’t this backward??? Don’t we typically start with a sample and from there try to infer about the population? In a word: Yes! At the moment, this is backwards. For the time being, if I were to ask you to tell me the mean or SD of a sampling distribution on quizzes/exams, I would have to give you the population mean/SD. I realize that this may seem ridiculous, since the whole point of statistical sampling is to DISCOVER the population values that we don’t know yet!! At the moment, however, we are doing things this way to help us understand the theory of how things work. If/when you progress with stats, you will learn how to get around this seemingly backwards way of doing things.
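Both key points can be checked by simulation. The sketch below assumes a population that is uniform between 0 and 100 (chosen only because its mean, 50, and SD, 100/√12, are known exactly), a sample size of 25, and 4,000 repeated samples. None of these numbers come from the lecture.

```python
import math
import random
import statistics

def simulate_sample_means(n, n_samples, rng):
    """Means of n_samples samples, each of size n, from a uniform(0, 100) population."""
    return [
        statistics.fmean([rng.uniform(0, 100) for _ in range(n)])
        for _ in range(n_samples)
    ]

POP_MEAN = 50.0                 # mean of uniform(0, 100)
POP_SD = 100 / math.sqrt(12)    # SD of uniform(0, 100), about 28.87

n = 25
means = simulate_sample_means(n, 4000, random.Random(42))

print(f"mean of sample means: {statistics.fmean(means):.2f}  (population mean {POP_MEAN})")
print(f"SD of sample means:   {statistics.stdev(means):.2f}  (sigma / sqrt(n) = {POP_SD / math.sqrt(n):.2f})")
```

The simulated mean should sit close to 50 and the simulated SD close to 28.87 / 5, or about 5.77, in line with the two key points.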

Restated: [Figure: population curve and sampling distribution curve.] If the population is $N(\mu, \sigma)$, then the distribution of sample means is $N(\mu, \sigma/\sqrt{n})$.

Example: In a large population of adults, the mean IQ is 112 with standard deviation 20. Suppose 200 adults are randomly selected for a market research campaign. What is the distribution of the sample means? A) Normal, mean 112, standard deviation 20 B) Normal, mean 112, standard deviation 20 C) Normal, mean 112, standard deviation 1.414 D) Unable to determine. Answer: C) Approximately Normal, mean 112, standard deviation 1.414. Population distribution: $N(\mu = 112;\ \sigma = 20)$. Sampling distribution for n = 200: $N(\mu = 112;\ \sigma/\sqrt{n} = 20/\sqrt{200} \approx 1.414)$. KEY POINT: Note that the question asks for the distribution of the sample means, not the distribution of individual IQ scores.
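The arithmetic in the IQ example is easy to reproduce. This small sketch uses `NormalDist` from Python's standard `statistics` module to represent the sampling distribution. The helper name `sampling_distribution` is made up for illustration.

```python
import math
from statistics import NormalDist

def sampling_distribution(pop_mean, pop_sd, n):
    """Approximate distribution of sample means: Normal(mu, sigma / sqrt(n))."""
    return NormalDist(mu=pop_mean, sigma=pop_sd / math.sqrt(n))

# IQ example: population mean 112, population SD 20, samples of 200 adults.
iq_means = sampling_distribution(112, 20, 200)
print(f"SD of the sample means = {iq_means.stdev:.3f}")  # 20 / sqrt(200) = 1.414
```

Note that the population SD (20) and the SD of the sample means (about 1.414) are very different numbers, which is why it matters which distribution the question is asking about.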

Example Using our example from earlier: In a sample of 100 DePaul students, 38 of them identify themselves as Republicans. What is the distribution of the sample means? A) Normal, mean 38, standard deviation 3 B) Normal, mean 38, standard deviation 3 C) Normal, mean 38, standard deviation 0.3 D) Unable to determine. Answer: D) Unable to determine. Population distribution: $N(\mu = ?;\ \sigma = ?)$. Sampling distribution: $N(\mu = \text{population mean};\ \sigma/\sqrt{n} = \text{population SD}/\sqrt{100})$. We are missing two key pieces of information (the population mean and SD), so we can’t answer! Again, I recognize that being provided with the population mean and SD seems backward. In subsequent stats study, you will learn how to get around the need for these two pieces of information.