Statistics for Social and Behavioral Sciences Session #18: Literary Analysis using Tests (Agresti and Finlay, from Chapter 5 to Chapter 6) Prof. Amine Ouazad

Outline
1. True or False?
2. The p-value
3. One-sided t tests
Next time: one-sided t test of mean and proportion. Chapter 6 of A&F.

The Federalist Papers
End of the 18th century: end of British colonial rule in the 13 colonies. How should the 13 colonies be ruled?
Anti-Federalist papers were written by unknown authors under the names Brutus and Cato.
The Federalist papers were published between October 1787 and August 1788 under the pseudonym “Publius”. They promoted the ratification of the United States Constitution and described the virtues of checks and balances, an executive, and the judicial review of laws by federal courts.
The United States Constitution was ratified in 1788.
Source: Mosteller and Wallace, “Inference in an Authorship Problem: A Comparative Study of Discrimination Methods Applied to the Authorship of the Disputed Federalist Papers”, Journal of the American Statistical Association, Volume 58, Issue 302, 1963.

Exercise 6.20: Literary Analysis
The authorship of an old document is in doubt. A historian hypothesizes that the author was a journalist named Jacalyn Levine. Upon a thorough investigation of Levine’s known works, it is observed that one unusual feature of her writing was that she consistently began 6% of her sentences with the word “whereas”. To test the historian’s hypothesis, it is decided to count the number of sentences in the disputed document that begin with “whereas”. Out of the 300 sentences, none do. Let π denote the probability that any one sentence written by the unknown author of the document begins with “whereas”.
Test H0: π = 0.06 against Ha: π ≠ 0.06. What assumptions are needed for your conclusion to be valid?
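One way to work through this exercise numerically is sketched below. It assumes a large-sample z test for a proportion with the standard error computed under H0, and it assumes sentences are independent draws. The helper name `proportion_z_test` is an illustrative choice, not something taken from the slides or the textbook.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, pi0):
    """Two-sided large-sample z test of H0: pi = pi0 (illustrative helper)."""
    p_hat = successes / n                      # sample proportion
    se0 = math.sqrt(pi0 * (1 - pi0) / n)       # standard error under H0
    z = (p_hat - pi0) / se0                    # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value

# Data from the exercise: 0 of 300 sentences begin with "whereas", pi0 = 0.06.
z, p_value = proportion_z_test(successes=0, n=300, pi0=0.06)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.6f}")
```

Under these assumptions the statistic is roughly -4.4, far outside [-1.96, 1.96], so H0 would be rejected at the usual levels.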

To know:
Hypothesis testing is the foundation of (social) sciences.
Three typical types of hypothesis:
– A parameter (e.g. μ) is equal to v.
– A parameter (e.g. μ) is greater than v.
– A parameter (e.g. μ) is lower than v.
Null hypothesis (to be rejected), and alternative hypothesis.
We provide evidence to reject a null hypothesis.
– We might not have evidence to reject the null hypothesis.
For a test on the population mean μ: H0: μ = v. Ha: μ ≠ v.
Confidence interval method with CI = [m - t*SE, m + t*SE], where t is the critical value (e.g. t_0.025 for a 95% interval):
– Reject H0 at significance level 5% if the 95% confidence interval around the sample mean m does not include v.
– Reject H0 at significance level 10% if the 90% confidence interval around the sample mean m does not include v.
t-test method with t = (m - v)/SE:
– Reject the null at 1% if t is outside [-t_0.005, t_0.005].
– Reject the null at 5% if t is outside [-t_0.025, t_0.025].
– Reject the null at 10% if t is outside [-t_0.05, t_0.05].
– With degrees of freedom df = N - 1.

t Table

From the confidence interval method… to the t-test
Null hypothesis: μ = v.
We do not reject the null hypothesis H0 at the 95% confidence level (5% significance level) if the 95% confidence interval around the sample mean m includes v.
Do not reject H0 at 95% if: m - t_0.025 * SE < v < m + t_0.025 * SE
Notice that this is equivalent to: do not reject H0 if: -t_0.025 < (m - v)/SE < t_0.025
t_0.025 is the critical value of the t statistic for the 95% confidence level. (m - v)/SE is the t statistic.
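The equivalence between the two methods can be checked numerically. The sketch below assumes a large sample, so the critical value is taken from the standard normal distribution (about 1.96 at 95%) rather than from a t table. The function names and the numbers m = 10, SE = 0.5 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def reject_by_ci(m, se, v, alpha=0.05):
    """Confidence interval method: reject H0 if v lies outside the interval."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # large-sample critical value
    lower, upper = m - crit * se, m + crit * se
    return not (lower < v < upper)

def reject_by_t(m, se, v, alpha=0.05):
    """t statistic method: reject H0 if (m - v)/SE lies outside [-crit, crit]."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    t = (m - v) / se
    return not (-crit < t < crit)

# Hypothetical sample mean and standard error, tested against several values v.
m, se = 10.0, 0.5
for v in [8.5, 9.2, 10.0, 10.9, 11.5]:
    assert reject_by_ci(m, se, v) == reject_by_t(m, se, v)
    print(f"v = {v}: reject H0 at 5%? {reject_by_t(m, se, v)}")
```

Both functions return the same decision for every v, which is the algebraic rearrangement of the inequality in code form.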

Graphically…
Under the null hypothesis (μ = v):
(m - v)/SE approximately follows a standard normal distribution if the sample size is large.
(m - v)/SE follows a t distribution with df = N - 1 if (i) the sample size is small and (ii) X is normally distributed.
[Figure: sampling distribution of the t statistic, df = N - 1]
On this graph, indicate for which values of t we should reject the null hypothesis: with 95% confidence, with 90% confidence, and also with 99% confidence.
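For the large-sample case, the boundaries of the rejection regions can be computed rather than read from a table. This is a sketch that assumes the standard normal approximation (df effectively infinite); with a small sample the values from the t table with df = N - 1 would be larger. The helper name `two_sided_critical_value` is illustrative.

```python
from statistics import NormalDist

def two_sided_critical_value(confidence):
    """Large-sample critical value c such that H0 is rejected when |t| > c."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    c = two_sided_critical_value(confidence)
    print(f"{confidence:.0%} confidence: reject H0 if |t| > {c:.3f}")
```

The higher the confidence level, the further out in the tails the rejection region starts.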

The p-value is the probability, under the null hypothesis (in other potential samples, unobserved), that the absolute value of the t statistic is greater than the absolute value of the t statistic observed in our sample.
A low p-value means…
– that, if the null hypothesis were true, absolute values of the t statistic as large as the one observed would be unlikely.
In practice: reject the null hypothesis H0 at 95% if the p-value is lower than 0.05. Similarly for 90% and 99% (0.10 and 0.01 respectively).
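A two-sided p-value can be sketched in a few lines. The code assumes a large sample, so the t statistic is treated as standard normal under H0; for small samples the t distribution with df = N - 1 would be needed instead. The observed statistics in the loop are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(t_obs):
    """P(|T| > |t_obs|) under H0, using the large-sample normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(t_obs)))

# Hypothetical observed t statistics.
for t_obs in (1.0, 1.96, 2.58):
    p = two_sided_p_value(t_obs)
    decision = "reject" if p < 0.05 else "do not reject"
    print(f"t = {t_obs}: p-value = {p:.4f} -> {decision} H0 at 95%")
```

Comparing the p-value with 0.05 gives the same decision as comparing |t| with 1.96, which is why the two rules can be used interchangeably.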

Source: Mosteller and Wallace, “Inference in an Authorship Problem: A Comparative Study of Discrimination Methods Applied to the Authorship of the Disputed Federalist Papers”, Journal of the American Statistical Association, Volume 58, Issue 302, 1963.
Frederick Mosteller, Harvard University

t statistics
What can we infer? At 95%? At 90%? At 99%?
The authors build a measure y of the style of a paper, based on the use of words such as upon, whilst, there, on, etc.
When Hamilton writes, this measure is on average μ_H.
Null hypothesis for paper k: H0: style = μ_H. And similarly for Madison.

Publication bias
Papers get published if the t statistic is above 1.96 (df = infinite, N is large).
– This is a convention in science and social sciences.
This is a bad thing! Non-rejected H0 should be reported too!
Distribution of t statistics across papers: Brodeur, Le, Sangnier, Zylberberg, “Star Wars: The Empirics Strike Back”, March 2013.
[Figure: number of papers by t statistic, with 1.96 marked]

Outline
1. Literary analysis using hypothesis testing
2. One-sided hypothesis and t tests
Next time: one-sided t test of mean and proportion. Chapter 6 of A&F.

Hypothesis testing
Hypothesis: an empirical statement about a population parameter. Usually of the form:
– “The parameter is equal to a given value” (last session)
– “The parameter is greater than a given value” (this session)
– “The parameter is lower than a given value” (we haven’t seen that yet!)
Almost all scientific/sociological/economic statements can be reduced to one of these three types.
– “The population proportion of voters for Cory Gardner is greater than 50%.” (second type of hypothesis)
– “The impact of ZMapp on Ebola patients’ condition is zero.” (first type of hypothesis)

One-sided test of H0: μ = v
When testing H0: μ = v (or μ ≤ v) against Ha: μ > v, use the t statistic method.
– Avoid the confidence interval method.
Assume that μ = v and build t = (m - v)/SE.
Reject the null hypothesis at 95% if the t statistic is greater than t_0.05. Similarly for 90% and 99% (t_0.10 and t_0.01).
For the opposite alternative Ha: μ < v, reject if the t statistic is lower than -t_0.05.
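The one-sided rule can be sketched as follows, covering both directions: with alternative μ > v the null is rejected for large positive t, and with alternative μ < v it is rejected for large negative t. The code assumes a large sample (normal critical values in place of the t table), and the function name, the `alternative` argument, and the example numbers are all hypothetical.

```python
from statistics import NormalDist

def one_sided_test(m, se, v, alpha=0.05, alternative="greater"):
    """One-sided test of H0: mu = v. Returns (t statistic, reject H0?)."""
    t = (m - v) / se
    crit = NormalDist().inv_cdf(1 - alpha)   # e.g. about 1.645 for alpha = 0.05
    if alternative == "greater":             # Ha: mu > v
        return t, t > crit
    if alternative == "less":                # Ha: mu < v
        return t, t < -crit
    raise ValueError("alternative must be 'greater' or 'less'")

# Hypothetical example: sample mean 52, SE 1, null value 50, Ha: mu > 50.
for alpha in (0.10, 0.05, 0.01):
    t, reject = one_sided_test(m=52.0, se=1.0, v=50.0, alpha=alpha)
    print(f"alpha = {alpha}: t = {t:.2f}, reject H0? {reject}")
```

Note that the whole probability alpha sits in one tail, so the 5% critical value is t_0.05 rather than the t_0.025 used in the two-sided test.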

Back to Cory Gardner
With 1802 respondents, the polling company PPP found that 48% of respondents would vote for Gardner.
Pollsters think that Gardner wins if the vote share is greater than 46%.
Can you test the null hypothesis that the true vote share is 46%?
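A possible way to set up this test is sketched below. It assumes a simple random sample, a one-sided alternative (true vote share greater than 46%), and a large-sample z statistic with the standard error computed under the null. These modelling choices are assumptions made for illustration, not a solution given on the slide.

```python
import math
from statistics import NormalDist

n = 1802       # respondents
p_hat = 0.48   # sample share for Gardner
pi0 = 0.46     # vote share under the null hypothesis

se0 = math.sqrt(pi0 * (1 - pi0) / n)     # standard error under H0
z = (p_hat - pi0) / se0                  # test statistic
p_one_sided = 1 - NormalDist().cdf(z)    # P(Z > z) for Ha: pi > 0.46

print(f"SE = {se0:.4f}, z = {z:.2f}, one-sided p-value = {p_one_sided:.3f}")
for alpha in (0.10, 0.05, 0.01):
    crit = NormalDist().inv_cdf(1 - alpha)
    print(f"alpha = {alpha}: critical value {crit:.3f}, reject H0? {z > crit}")
```

With these inputs the statistic is about 1.70, which exceeds the one-sided 5% critical value (about 1.645) but not the two-sided one (1.96), so the conclusion depends on which alternative is specified.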

Wrap up
Confidence interval method for the test of H0: μ = v. Ha: μ ≠ v.
– Reject H0 at significance level 1% if the 99% confidence interval around the sample mean m does not include v.
– Reject H0 at significance level 5% if the 95% confidence interval around the sample mean m does not include v.
– Reject H0 at significance level 10% if the 90% confidence interval around the sample mean m does not include v.
t test method for the test of H0: μ = v. Ha: μ ≠ v.
– Build the t statistic (m - v)/SE.
– Reject H0 at significance level 1% if the t statistic is outside the range [-t_0.005, t_0.005].
– Reject H0 at significance level 5% if the t statistic is outside the range [-t_0.025, t_0.025].
– Reject H0 at significance level 10% if the t statistic is outside the range [-t_0.05, t_0.05].
I will not ask for the one-sided t-test, but it is good to know for your future life.

Coming up:
Readings:
Midterm on Tuesday, November 25.
– Coverage: up to Chapter 6 inclusive.
No online quiz this week.
Make sure you come to sessions and recitations.
For help:
Amine Ouazad, Office 1135, Social Science building, amine.ouazad@nyu.edu. Office hour: Tuesday from 5 to 6.30pm.
GAF: Irene Paneda, Irene.paneda@nyu.edu. Sunday recitations.
At the Academic Resource Center, Monday from 2 to 4pm.
