Statistics for Social and Behavioral Sciences Session #14: Estimation, Confidence Interval (Agresti and Finlay, Chapter 5) Prof. Amine Ouazad.

1 Statistics for Social and Behavioral Sciences Session #14: Estimation, Confidence Interval (Agresti and Finlay, Chapter 5) Prof. Amine Ouazad

2 Statistics Course Outline
Part I. Introduction and Research Design (Week 1)
Part II. Describing Data (Weeks 2-4)
Part III. Drawing Conclusions from Data: Inferential Statistics (Weeks 5-9)
Part IV. Correlation and Causation: Regression Analysis (Weeks 10-14)
This is where we talk about ZMapp and Ebola! Café Firenze or Lebanese Express now.

3 Last 2 Sessions: A statistic is a random variable. The distribution of a statistic is called its sampling distribution. In particular, the mean of a variable in a sample is a statistic. The expected value of the sample mean is equal to the true mean. The standard deviation of the sample mean is called the standard error. Central Limit Theorem: with a large sample size, the sampling distribution of the mean of X is normal, and the empirical rule applies. The standard error is $\sigma_X/\sqrt{N}$.
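
A quick way to see the Central Limit Theorem and the $\sigma_X/\sqrt{N}$ formula at work is to simulate them. The sketch below is a minimal Python illustration under a stated assumption: the population is a made-up distribution of whole-number ratings from 1 to 5 (the weights are hypothetical, not Zomato data). It draws many samples of size N and compares the spread of the sample means with $\sigma_X/\sqrt{N}$, then checks the empirical rule.

```python
import random
import statistics

# Hypothetical population of ratings (1 to 5) with made-up weights;
# these numbers are assumptions for illustration, not real data.
RATINGS = [1, 2, 3, 4, 5]
WEIGHTS = [0.10, 0.15, 0.30, 0.25, 0.20]

def population_mean_and_sd(values, weights):
    """True mean mu and true standard deviation sigma_X of the population."""
    mu = sum(v * w for v, w in zip(values, weights))
    variance = sum(w * (v - mu) ** 2 for v, w in zip(values, weights))
    return mu, variance ** 0.5

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` random samples of size n; return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(RATINGS, weights=WEIGHTS, k=n))
            for _ in range(reps)]

mu, sigma = population_mean_and_sd(RATINGS, WEIGHTS)
n = 100
means = simulate_sample_means(n, reps=5000)
se = sigma / n ** 0.5  # the formula sigma_X / sqrt(N)

# Empirical rule check: share of sample means within 1.96 standard errors of mu.
within = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)
print(f"true mean {mu:.3f}, average sample mean {statistics.fmean(means):.3f}")
print(f"sigma_X/sqrt(N) {se:.4f}, SD of the sample means {statistics.pstdev(means):.4f}")
print(f"share within 1.96 standard errors: {within:.3f}")
```

The average of the sample means should land close to the true mean, their standard deviation close to $\sigma_X/\sqrt{N}$, and roughly 95% of them within 1.96 standard errors of the true mean.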

4 Last 2 Sessions: For a proportion (X is 0 or 1): $\sigma_X = \sqrt{\pi(1-\pi)}$. Since we typically do not observe the true proportion $\pi$ but only the sample proportion $p$, we approximate $\sigma_X$ by $\sqrt{p(1-p)}$. For other variables (X is not 0 or 1): since we do not observe the true standard deviation $\sigma_X$ but rather the sample standard deviation $s_X$, we approximate $\sigma_X$ by $s_X$ and thus approximate the standard error by $s_X/\sqrt{N}$. We are interested in estimating parameters, but we only observe statistics. Can we use statistics as estimators?
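
The two plug-in rules above can be written as two small functions. This is a minimal sketch; the function names and the example inputs are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def se_proportion(p, n):
    """Estimated standard error of a sample proportion.

    The true sigma_X = sqrt(pi * (1 - pi)) is unknown, so the sample
    proportion p is plugged in for pi, then divided by sqrt(N).
    """
    return math.sqrt(p * (1 - p)) / math.sqrt(n)

def se_mean(s_x, n):
    """Estimated standard error of a sample mean: s_X / sqrt(N)."""
    return s_x / math.sqrt(n)

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
print(se_proportion(0.5, 100))  # sqrt(0.25) / 10 = 0.05
print(se_mean(2.0, 400))        # 2 / 20 = 0.1
```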

5 Outline
1. Back to Zomato: just applying the formulas we know
2. Estimators:
– Point Estimator
– Biased vs Unbiased Estimators
– Efficient vs Inefficient Estimators
– Interval Estimator
Next time: Estimation, Confidence Intervals (continued), Chapter 5 of A&F

6 Back to Zomato
1. What statistical issue would preclude us from using the Central Limit Theorem?
2. Assuming we can use the CLT, what is the margin of error on Café Firenze's and Lebanese Express's ratings?
Think!!

7 Questions:
1. When rating a restaurant, what are the possible choices for the user?
2. What does the 3.4 in this rating represent?
3. What are we trying to estimate?
4. What is the formula for the standard error of ratings? Is a rating X a 0,1 variable?
5. What is the standard deviation $s_X$ of ratings?
6. Finally, what is the standard error of the rating 3.4?
7. And what is the margin of error for the rating 3.4? (MoE = twice the standard error, rounding the more precise 1.96 × standard error)

8 Recap: Central Limit Theorem
Central Limit Theorem: with a large sample size, the distribution of the sample mean is normal, with mean equal to the true mean and with standard deviation (= standard error) equal to $\sigma_X/\sqrt{N}$.
– X is not 0,1: approximate the true standard deviation $\sigma_X$ using the sample standard deviation $s_X$.
– X is 0,1: $\sigma_X = \sqrt{\pi(1-\pi)}$, where $\pi$ is the true proportion; approximate it using the sample proportion $p$ in place of $\pi$.
Café Firenze's case

9 Back to Zomato
If we had all the ratings of individual users:
– John: 3, “Hated it, service is poor”
– Abdullah: 4, “Great venue”
– Anthony: 5, “Perfect, loved the al dente pasta”
– Claire: 3, “Ok for a downtown lunch”
– Al Bloom: 3, “The italian restaurant of the world”
– John Sexton: 3, “Can achieve more”
– Ayesha: 3, “There are alternatives”
The average is 3.4, and we would find $s_X$ = …………….
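
With the individual ratings listed above, $s_X$ can be computed directly. The sketch below uses Python's standard `statistics` module and prints both the denominator-N version (`pstdev`) and the denominator-(N−1) version (`stdev`), since the estimator slides later in this session compare the two.

```python
import statistics

# The seven individual ratings listed on the slide.
ratings = [3, 4, 5, 3, 3, 3, 3]

m = statistics.fmean(ratings)            # sample mean
s_n = statistics.pstdev(ratings)         # standard deviation, denominator N
s_n_minus_1 = statistics.stdev(ratings)  # standard deviation, denominator N - 1
print(f"m = {m:.2f}, s_X (N) = {s_n:.2f}, s_X (N-1) = {s_n_minus_1:.2f}")
```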

10 Zomato Problemo: The website only reports the sample mean of ratings… We thus have to figure out a conservative estimate of $s_X$ (the largest possible value). What is the highest possible $s_X$?
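
One way to answer this question, sketched under an assumption: ratings lie between 1 and 5 (check this against question 1 on the Zomato scale). For a fixed mean, the spread is largest when every rating sits at one of the two extremes, which gives a maximum standard deviation of $\sqrt{(m-1)(5-m)}$ with the denominator-N formula. The code computes that bound, confirms it by brute-force enumeration for five whole-number ratings averaging 3.4, and turns it into a conservative margin of error. The number of ratings is not given on the slide, so `n=50` below is purely hypothetical, as are the function names.

```python
import math
import statistics
from itertools import combinations_with_replacement

def max_sd(mean, low=1.0, high=5.0):
    """Largest possible standard deviation (denominator N) of ratings in
    [low, high] with the given mean: put a share q = (mean - low) / (high - low)
    of the ratings at `high` and the rest at `low`."""
    q = (mean - low) / (high - low)
    return (high - low) * math.sqrt(q * (1 - q))

def brute_force_max_sd(total, n, scale=range(1, 6)):
    """Check by enumeration: largest SD among n whole-number ratings with this sum."""
    return max(statistics.pstdev(combo)
               for combo in combinations_with_replacement(scale, n)
               if sum(combo) == total)

def conservative_moe(mean, n, z=2.0, low=1.0, high=5.0):
    """Margin of error z * s_X / sqrt(N) using the largest possible s_X.
    z = 2 follows the 'twice the standard error' rule (1.96 is more precise)."""
    return z * max_sd(mean, low, high) / math.sqrt(n)

print(max_sd(3.4))                  # sqrt(2.4 * 1.6): 60% of ratings at 5, 40% at 1
print(brute_force_max_sd(17, 5))    # five ratings averaging 3.4, e.g. 1, 1, 5, 5, 5
print(conservative_moe(3.4, n=50))  # n = 50 is a hypothetical number of ratings
```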

11 Outline
1. Back to Zomato: just applying the formulas we know
2. Estimators:
– Point Estimate
– Biased vs Unbiased Estimators
– Efficient vs Inefficient Estimators
– Interval Estimate
Next time: Estimation, Confidence Intervals (continued), Chapter 5 of A&F

12 Parameters and their point estimates
Parameter (“true” value) → Point estimate:
– Population mean $\mu$ (example: population mean rating of Café Firenze) → Sample mean $m$ (sample mean rating of Café Firenze)
– Population median → Sample median
– Population standard deviation $\sigma_X$ (example: population standard deviation of ratings of Café Firenze) → Sample standard deviation $s_X$ (sample standard deviation of ratings of Café Firenze)
– Population variance $\sigma_X^2$ → Sample variance $s_X^2$
– Population p-th percentile → Sample p-th percentile
This is called a “point estimate” because we give a single number (a “point” on the axis).
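
Each row of the table has a direct counterpart in Python's `statistics` module. The sketch below computes every point estimate from the seven individual ratings listed earlier in the session, used here only as an illustration. It assumes the denominator-N formulas for $s_X$ and $s_X^2$, and the percentile uses the `"inclusive"` method, which is one of several percentile conventions.

```python
import statistics

# Illustration: the seven ratings listed earlier in the session.
ratings = [3, 4, 5, 3, 3, 3, 3]

point_estimates = {
    "sample mean m": statistics.fmean(ratings),
    "sample median": statistics.median(ratings),
    "sample standard deviation s_X (denominator N)": statistics.pstdev(ratings),
    "sample variance s_X^2 (denominator N)": statistics.pvariance(ratings),
    # quantiles(n=100) returns the 99 percentile cut points; index 89 is the
    # 90th percentile. Several percentile conventions exist; "inclusive" is one.
    "sample 90th percentile": statistics.quantiles(ratings, n=100, method="inclusive")[89],
}
for name, value in point_estimates.items():
    print(f"{name}: {value:.2f}")
```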

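13 Biased vs Unbiased Estimator
We have seen that to get the standard error of the sample mean, we need an estimate of $\sigma_X$. So far we have used:
$$s_X = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - m)^2}$$
And the textbook has given:
$$s_X = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - m)^2}$$
These are two different estimators of the same quantity $\sigma_X$. The textbook's estimator of $\sigma_X$ is unbiased (strictly speaking, its square is an unbiased estimator of the variance $\sigma_X^2$). These two formulas are “point estimates”.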

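14 Efficient vs Inefficient Estimator
Among all possible estimators, an estimator is efficient if it has the smallest standard error. The standard error of
$$\sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - m)^2}$$
is smaller than the standard error of
$$\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - m)^2}$$
because the first always equals the second multiplied by $\sqrt{(N-1)/N} < 1$. The slides' version is efficient, while the textbook's version is unbiased. There is a conundrum: neither formula has both properties. These two formulas are “point estimates”.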

15 What do you actually need to remember?
“Good” estimators are unbiased and efficient.
– The sample mean is an unbiased and efficient estimator of the population mean.
“Less good” estimators have only one of the two properties: they are either unbiased or efficient.
– The sample standard deviation with denominator N-1 is unbiased (strictly speaking, for the variance) but inefficient.
– The sample standard deviation with denominator N is biased but efficient.
– We keep using the formula we learnt…
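
Both properties can be checked by simulation. The sketch below assumes a normal population with true standard deviation 1 and small samples of size N = 5 (both assumptions chosen for illustration). For each sample it computes the denominator-N and denominator-(N−1) estimates, then reports the average variance estimate (to see the bias) and the standard deviation of the estimates across samples (their standard error).

```python
import random
import statistics

def simulate_sd_estimators(n=5, reps=20_000, sigma=1.0, seed=1):
    """Draw `reps` samples of size n from a normal population with SD sigma
    (assumed for illustration) and return both SD estimates for each sample."""
    rng = random.Random(seed)
    sd_n, sd_n1 = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
        sd_n.append((ss / n) ** 0.5)            # slides' version: denominator N
        sd_n1.append((ss / (n - 1)) ** 0.5)     # textbook's version: denominator N-1
    return sd_n, sd_n1

sd_n, sd_n1 = simulate_sd_estimators()
for label, draws in [("denominator N", sd_n), ("denominator N-1", sd_n1)]:
    avg_var = statistics.fmean(d ** 2 for d in draws)
    print(f"{label}: average variance estimate {avg_var:.3f} (true value 1), "
          f"standard error of the SD estimate {statistics.pstdev(draws):.3f}")
```

With N = 5, the denominator-(N−1) variance estimates should average close to 1, while the denominator-N ones should average close to 0.8, that is $(N-1)/N$ of the true value. Because each denominator-N estimate is the denominator-(N−1) estimate scaled by $\sqrt{(N-1)/N}$, its spread across samples is always smaller.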

16 Parameters and Interval Estimate
An interval estimate is an interval of numbers around the point estimate that includes the parameter with a chosen probability, typically 90%, 95%, or 99%. Example: “the interval estimate [156.2 cm – 0.49 cm; 156.2 cm + 0.49 cm] includes the population average height with probability 95%.”

17 Parameters and Interval Estimate
An interval estimate that includes the parameter with probability 95% is called a 95% confidence interval. The expression “95% confidence interval” is widely used. Example: “[156.2 cm – 0.49 cm; 156.2 cm + 0.49 cm] is a 95% confidence interval for the population average height.”

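18 How do we build a 95% confidence interval?
Goal: estimate the population average $\mu$.
From the previous session: $[\mu - \text{MoE};\ \mu + \text{MoE}]$ includes the sample mean $m$ with probability 95%.
The distance between $m$ and $\mu$ is at most MoE exactly when $\mu$ lies in $[m - \text{MoE};\ m + \text{MoE}]$, so this event also has probability 95%.
We conclude: the interval $[m - \text{MoE};\ m + \text{MoE}]$ includes the population mean with probability 95%. $[m - \text{MoE};\ m + \text{MoE}]$ is a 95% confidence interval for $\mu$.
MoE = 1.96 × Standard Error, with Standard Error = $s_X/\sqrt{N}$.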
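
The recipe on this slide translates into a few lines of code. This is a minimal sketch; the function name is hypothetical, and it uses the denominator-N $s_X$ that the slides keep using (swapping `statistics.pstdev` for `statistics.stdev` gives the textbook's version). The seven ratings from the Zomato example serve only to show the arithmetic: with N = 7, the Central Limit Theorem is not a safe approximation.

```python
import math
import statistics

def confidence_interval_95(values):
    """95% confidence interval [m - MoE, m + MoE] for the population mean,
    with MoE = 1.96 x standard error and standard error = s_X / sqrt(N).
    s_X uses denominator N, the formula the slides keep using."""
    n = len(values)
    m = statistics.fmean(values)
    se = statistics.pstdev(values) / math.sqrt(n)
    moe = 1.96 * se
    return m - moe, m + moe

# Illustration only: the seven ratings listed earlier. N = 7 is far too small
# for the Central Limit Theorem to be a safe approximation.
low, high = confidence_interval_95([3, 4, 5, 3, 3, 3, 3])
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```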

20 Wrap up
Central Limit Theorem: with a large sample size, the sampling distribution of the sample mean of X is normal, and the empirical rule applies. The standard error is the standard deviation of the sampling distribution, $\sigma_X/\sqrt{N}$.
For a proportion: $\sigma_X = \sqrt{\pi(1-\pi)}$. Since we typically do not observe the true proportion $\pi$ but only the sample proportion $p$, we use $p$ in place of $\pi$.
For other variables: since we do not observe the true standard deviation $\sigma_X$ but rather the sample standard deviation $s_X$, we approximate the standard error by $s_X/\sqrt{N}$.
We are interested in estimating parameters, but we only observe statistics. Can we use statistics as estimators? Estimators can be unbiased and/or efficient.

21 Coming up:
Readings for this week and next week:
– Chapter 5 entirely: estimation, confidence intervals.
– Understand the confidence interval and the point estimate.
Online quiz on Thursday. Deadlines are sharp and attendance is recorded.
Tonight is the midterm election!! Watch: http://www.msnbc.com/jose-diaz-balart/watch/is-2014-the-margin-of-error-midterms-- 349919811638
For help:
– Amine Ouazad, Office 1135, Social Science building, amine.ouazad@nyu.edu. Office hour: Tuesday from 5 to 6.30pm.
– GAF: Irene Paneda, Irene.paneda@nyu.edu. Sunday recitations.
– At the Academic Resource Center, Monday from 2 to 4pm.

