Statistics for Social and Behavioral Sciences

Name: Statistics for Social and Behavioral Sciences
Uploaded: 2017-10-20T18:34:55+00:00
Duration: PTM11S37
Channel: Geoffrey Nichols
Description: Statistics for Social and Behavioral Sciences

Statistics for Social and Behavioral Sciences
Session #13: Central Limit Theorem, Estimation (Agresti and Finlay, Chapter 4) Prof. Amine Ouazad

Statistics Course Outline
Part I. Introduction and Research Design (Week 1)
Part II. Describing data (Weeks 2-4)
Part III. Drawing conclusions from data: Inferential Statistics (Weeks 5-9)
Part IV. Correlation and Causation: Regression Analysis (Weeks 10-14)
Key concepts: data, observations. Statistics consists of a body of methods to obtain and analyze data: design, description, inference.
Firenze or Lebanese Express is coming up next session.
This is where we talk about Zmapp and Ebola!

Last Session
A statistic is a random variable. The distribution of a statistic is called its sampling distribution. In particular, the mean of a variable in a sample is a statistic. The expected value of the sample mean is equal to the true mean. The standard deviation of the sample mean is called the standard error. Central Limit Theorem: with a large sample size, the sampling distribution of the mean of X is approximately normal, and the empirical rule applies. The standard error is σ_X / √N.

Outline
Standard Error: Exercise
  Margin of Error with a proportion
  Margin of Error with a continuous variable
  Sample Size needed to achieve a given MoE
Estimation of a Parameter
  Sampling Distribution
  Point Estimate
  Confidence Interval
Next time: Probability Distributions (continued), Chapter 4 of A&F

Exercise: Compute the Margin of Error (Proportion)
The Rasmussen Poll interviewed 966 individuals. Assuming that the true fraction of individuals who will vote for Gardner is 50%, what is the margin of error? The Margin of Error is here two standard errors of the distribution. Is it close to the result reported by the website?
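The exercise can be checked with a few lines of code. This is a minimal sketch, assuming the slide's convention that the margin of error is two standard errors and that the standard deviation of a 0/1 variable with true fraction p is √( p (1-p) ); the function name is illustrative, not something from the course.

```python
import math

def margin_of_error_proportion(p, n, z=2.0):
    """Margin of error = z standard errors, where SE = sqrt(p * (1 - p) / n)."""
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return z * standard_error

# Rasmussen poll from the exercise: N = 966, assumed true fraction p = 0.5.
moe = margin_of_error_proportion(p=0.5, n=966)
print(f"Margin of error: {moe:.4f} ({100 * moe:.1f} percentage points)")
```

With these inputs the margin of error comes out at roughly 3.2 percentage points, which is the figure to compare with the one reported by the polling website.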

Central Limit Theorem
[Figure: sampling distribution of Mean(VoteGardner). Horizontal axis: reported fraction m. Vertical axis: Probability(Mean(VoteGardner)=m). The curve is centered on the true fraction of voters for Gardner. A marked point shows that the reported fraction could be, e.g., 30%; the height of the curve there is the probability that the reported fraction is equal to 30%.]
With probability 95%, the estimated fraction of voters for Gardner will be within 2 standard deviations of the sampling distribution around the true fraction. With some (low) probability, the polling company will give a number ‘far’ above or ‘far’ below the true fraction of voters for Gardner.
Central Limit Theorem: with a large sample size, the sampling distribution of Mean(VoteGardner) is approximately normal, and the empirical rule applies.

Important Point: N
The previous graph is the sampling distribution for a given sample size N. Each point on the horizontal axis is the mean for a sample of given size N. In the Colorado Rasmussen example, N is around 1000. What is infinite, in our thought experiment, is the number of polls conducted. But we typically observe only one poll with a given methodology.
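The thought experiment of many polls with the same N can be imitated on a computer. The sketch below is illustrative only: it assumes a true fraction of 0.5, simple random sampling, and 2,000 simulated polls of N = 1000 each. The function name and the seed are arbitrary choices.

```python
import random
import statistics

def simulate_polls(p, n, n_polls, seed=0):
    """Return the sample proportion from each of n_polls simulated polls of size n."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(n_polls)]

p_true, n = 0.5, 1000  # assumed true fraction; N roughly as in the Colorado example
poll_means = simulate_polls(p_true, n, n_polls=2000)

# Theoretical standard error for a proportion: sqrt(p * (1 - p) / N).
standard_error = (p_true * (1 - p_true) / n) ** 0.5
within_2se = sum(abs(m - p_true) <= 2 * standard_error for m in poll_means) / len(poll_means)

print(f"mean of poll results:    {statistics.mean(poll_means):.4f}")
print(f"sd of poll results:      {statistics.pstdev(poll_means):.4f}")
print(f"theoretical std. error:  {standard_error:.4f}")
print(f"share within 2 std. err: {within_2se:.3f}")
```

Each entry of `poll_means` is one point on the horizontal axis of the sampling distribution. In practice only one such number is observed; the simulation shows what the other polls could have reported.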

Central Limit Theorem
The last remaining element is the standard deviation of the sampling distribution, also called the standard error. It is a measure of sampling error. Writing σ_X for the standard deviation of X, the sampling distribution of the mean of X has standard deviation σ_X / √N. Finally, what is σ_X? For a proportion, σ_X = √( p (1-p) ), where p is the true value ⇒ see slide at the end.

Good news
There is some probability that the reported mean will be far above or far below the true mean. But with a large sample size, the probability that the sample mean is further than 2 standard errors from the true mean is about 5%. The most likely outcome is the true mean: the mode of the sampling distribution is the true mean. The expected value of the reported mean is the true mean. The larger the sample size, the smaller the standard error.

Bad news
We measure the sample mean and we know the sample size, but we don’t know the true mean p. Without the true mean we cannot know what the sampling distribution is: we are missing both the mean (p) and the standard deviation σ_X / √N (aka the standard error) of that statistic. If we knew p, the true mean, there would be no need for a poll.

Margin of Error with a continuous X
In the sample of N=8,464 individuals, the sample mean of height was cm, and the sample standard deviation of height was . Assuming that the true standard deviation of height σ_X is equal to the sample standard deviation of height s_X, what is the standard error of the sampling distribution?

Distribution of X (height)
The distribution of X is not the sampling distribution. We observe 8,464 individuals, but only one sample mean. The sampling distribution is typically not visible in Stata (there is only one mean). Even if the distribution of X is not bell-shaped, the sampling distribution of the mean will be approximately bell-shaped when the sample is large.
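The contrast between the distribution of X and the sampling distribution of its mean can be illustrated by simulation. This sketch does not use the height data; it assumes a deliberately skewed X (exponential), samples of size 200, and measures skewness, which is 0 for a symmetric, bell-shaped distribution. All names and settings are illustrative.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: near 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(1)
n, n_samples = 200, 2000

# Distribution of X: exponential draws, strongly right-skewed.
raw_draws = [rng.expovariate(1.0) for _ in range(20000)]

# Sampling distribution: one mean per simulated sample of size n.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

print(f"skewness of X:            {skewness(raw_draws):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

The individual draws are far from symmetric, while the means of the simulated samples are close to symmetric, which is what the Central Limit Theorem leads one to expect.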

Central Limit Theorem
With a large sample size, the distribution of the sample mean is approximately normal, with mean equal to the true mean and with standard deviation (= standard error) equal to σ_X / √N.
Discrete X: approximate σ_X = √( p (1-p) ), where p is the true value, using the sample proportion for p.
Continuous X: approximate the true standard deviation σ_X using the sample standard deviation s_X.
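The two approximations can be written as two small functions. This is a sketch, not the course's code. The sample proportion of 0.48 and the sample standard deviation of 10 cm are hypothetical placeholders (the slide's own height figures are not preserved in this transcript); only N = 966 and N = 8,464 come from the examples.

```python
import math

def standard_error_proportion(sample_proportion, n):
    """Discrete (0/1) X: plug the sample proportion into sqrt(p * (1 - p) / N)."""
    return math.sqrt(sample_proportion * (1 - sample_proportion) / n)

def standard_error_mean(sample_sd, n):
    """Continuous X: use the sample standard deviation in place of the true one."""
    return sample_sd / math.sqrt(n)

# Hypothetical poll result: 48% in a sample of 966.
se_poll = standard_error_proportion(sample_proportion=0.48, n=966)

# N = 8,464 as in the height example; the 10 cm standard deviation is a placeholder.
se_height = standard_error_mean(sample_sd=10.0, n=8464)

print(f"standard error, proportion: {se_poll:.4f}")
print(f"standard error, height:     {se_height:.4f} cm")
```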

Sample Size Needed to Achieve a Given Margin of Error
The Margin of Error is a multiple of the standard error. The standard error is σ_X / √N, so MoE = z · σ_X / √N. In the poll example and in the height example we used z = 2, as it gives a 95% probability that the sample mean is within ± z · σ_X / √N of the true mean. To achieve a given MoE, find the sample size N such that MoE = z · σ_X / √N, which gives N = (z · σ_X / MoE)².
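The sample-size formula translates directly into code. The sketch below assumes z = 2 as on the slide and rounds up, since N must be a whole number of respondents. The function name and the 3-point target are illustrative.

```python
import math

def required_sample_size(sigma_x, moe, z=2.0):
    """Smallest whole N such that z * sigma_x / sqrt(N) <= moe."""
    exact = (z * sigma_x / moe) ** 2
    # Subtract a tiny tolerance so floating-point noise does not push N up by one.
    return math.ceil(exact - 1e-9)

# Proportion with p = 0.5, so the standard deviation of X is sqrt(0.5 * 0.5) = 0.5.
# Illustrative target: a margin of error of 3 percentage points.
n_needed = required_sample_size(sigma_x=0.5, moe=0.03)
achieved_moe = 2.0 * 0.5 / math.sqrt(n_needed)

print(f"sample size needed: {n_needed}")
print(f"achieved margin of error: {achieved_moe:.4f}")
```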

Exercise: Achieving a MoE of 1 percentage point
Assuming that the true proportion of voters for Cory Gardner is 0.5, what sample size is needed to achieve a Margin of Error of 1 percentage point (i.e. MoE=0.01)? What do we learn from this?
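One way to approach the exercise is to tabulate the required sample size for several target margins of error. This is a sketch assuming z = 2 and a true proportion of 0.5; the 4-point and 2-point targets are added only for comparison and are not part of the exercise.

```python
import math

def required_sample_size(p, moe, z=2.0):
    """Smallest whole N such that z * sqrt(p * (1 - p) / N) <= moe."""
    sigma_x = math.sqrt(p * (1 - p))
    # The small tolerance guards against floating-point noise before rounding up.
    return math.ceil((z * sigma_x / moe) ** 2 - 1e-9)

targets = (0.04, 0.02, 0.01)
sizes = {moe: required_sample_size(p=0.5, moe=moe) for moe in targets}

for moe in targets:
    print(f"MoE = {100 * moe:.0f} percentage point(s): N = {sizes[moe]}")
```

Because N depends on 1 / MoE², halving the margin of error requires four times as many respondents, so a 1-point margin needs a much larger poll than the 966 interviews in the Rasmussen example.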

For next Time: Back to Zomato
What statistical issue would preclude us from using the Central Limit Theorem? Assuming we can use the CLT, what is the Margin of Error on Cafe Firenze and Lebanese Express’s ratings? Think !!

Outline
Standard Error: Exercise
  Margin of Error with a proportion
  Margin of Error with a continuous variable
  Sample Size needed to achieve a given MoE
Estimation of a Parameter
  Estimators
  Biased and Unbiased Estimator
  Efficient Estimator
Next time: Probability Distributions (continued), Chapter 4 of A&F

Thinking like a statistician
Ask an empirical question: What is average height in the US population? In other words, what is the true mean of height in the US population?
Design the study: Choose N and choose between simple random sampling, cluster sampling, and stratified sampling. The Central Limit Theorem applies to Simple Random Sampling (SRS).
Describe the data: What is the sample mean? What is the sample standard deviation?
Make inferences: What is an estimate of the true mean height (i.e. the population mean height)? What is a confidence interval for the true mean height?
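The describe and infer steps can be sketched in a few lines. The data here are hypothetical: 500 simulated heights stand in for a simple random sample, and the interval uses the rule of thumb of 2 standard errors from the earlier slides. None of the numbers come from the course data.

```python
import math
import random
import statistics

# Hypothetical data: 500 simulated heights in cm, standing in for a simple random sample.
rng = random.Random(42)
heights = [rng.gauss(170.0, 10.0) for _ in range(500)]

# Describe the data.
n = len(heights)
sample_mean = statistics.mean(heights)   # point estimate of the true mean
sample_sd = statistics.stdev(heights)    # sample standard deviation

# Make inferences: approximate 95% confidence interval, mean +/- 2 standard errors.
standard_error = sample_sd / math.sqrt(n)
ci_low = sample_mean - 2 * standard_error
ci_high = sample_mean + 2 * standard_error

print(f"sample mean: {sample_mean:.2f} cm, sample sd: {sample_sd:.2f} cm")
print(f"approximate 95% confidence interval: [{ci_low:.2f}, {ci_high:.2f}] cm")
```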

Parameters and their estimates
Parameter (« true » value)           Estimator
Population mean μ                    Sample mean m
Population median                    Sample median
Population standard deviation σ_X    Sample standard deviation s_X
Population variance σ_X²             Sample variance s_X²
Population p-th percentile           Sample p-th percentile
A given parameter may have multiple estimators. So far we have seen one estimator for each parameter.

Biased vs. Unbiased Estimator
An estimator is unbiased when its sampling distribution has mean equal to the true value, the parameter. The sample mean is unbiased, as E(Mean(VoteGardner)) = p. On average, the mean of an infinite number of polls taken the same day with the same methodology (but different samples) is the true fraction p of voters for Gardner.

Biased vs Unbiased Estimator
We have seen that to get the standard error of the sample mean, we need to have an estimate of σ_X. So far we have used: And the textbook has given: These are two different estimators of the same quantity σ_X. The textbook’s estimator of σ_X is unbiased.

Efficient vs Inefficient Estimator
Among all possible estimators, an estimator is efficient if it has the smallest standard error. The standard error of Is smaller than the standard error of The slides’ version is efficient, while the textbook’s version is unbiased. There is a conundrum.
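The trade-off between bias and spread can be illustrated by simulation. The two formulas are not preserved in this transcript, so the sketch below rests on an assumption: that the two estimators differ in dividing the sum of squared deviations by N or by N − 1. It compares the corresponding variance estimators on small samples (N = 5) from a normal distribution with true variance 1.

```python
import random
import statistics

rng = random.Random(7)
n, n_samples = 5, 20000
true_variance = 1.0  # samples are drawn from a normal distribution with sd 1

divide_by_n = []          # assumed first version: sum of squares / N
divide_by_n_minus_1 = []  # assumed second version: sum of squares / (N - 1)
for _ in range(n_samples):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))
    divide_by_n_minus_1.append(statistics.variance(sample))

for label, estimates in (("/ N", divide_by_n), ("/ (N - 1)", divide_by_n_minus_1)):
    print(f"{label:>9}: average = {statistics.mean(estimates):.3f}, "
          f"spread (sd) = {statistics.pstdev(estimates):.3f}")
```

In this setup the N − 1 version averages close to the true variance, while the N version averages lower but varies less from sample to sample, which is one way to see the conundrum.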

Wrap up
Central Limit Theorem: with a large sample size, the sampling distribution of the sample mean of X is approximately normal, and the empirical rule applies. The standard error is the standard deviation of the sampling distribution, σ_X / √N.
For a proportion: σ_X = √( p (1-p) ). As we typically do not observe the true proportion p but the sample proportion, we use the sample proportion in place of p.
For other variables: as we do not observe the true standard deviation σ_X but rather the sample standard deviation s_X, we approximate the standard error by s_X / √N.
We are interested in estimating parameters, but we only observe statistics. Can we use statistics as estimators? Estimators can be unbiased, and efficient.

Coming up:
Readings: Chapter 5 entirely (estimation, confidence intervals).
Online quiz tonight.
Recitation: exercises, with solutions to follow.
Deadlines are sharp and attendance is tracked.
For help:
Amine Ouazad, Office 1135, Social Science building. Office hour: Tuesday from 5 to 6.30pm.
GAF: Irene Paneda. Sunday recitations. At the Academic Resource Center, Monday from 2 to 4pm.
