Distinguishing authorship


1 Distinguishing authorship
Shu Min, Yan Ling, Yi Mou

2 Reading Mosteller, F. (2010). The pleasures of statistics: The autobiography of Frederick Mosteller. Edited by Stephen E. Fienberg, David C. Hoaglin and Judith M. Tanur. Springer. (Chapter 4: Who wrote the disputed Federalist Papers, Hamilton or Madison?)

3 Contents
Author of the book: Frederick Mosteller
Federalist Papers
Mosteller’s attempts to distinguish authorship
Conclusion

4 Author of the book: Frederick Mosteller
One of the most eminent statisticians of the 20th century
Founding chairman of Harvard’s statistics department
Made major contributions to statistics
Studied the historical problem of who wrote each of the disputed Federalist papers, Madison or Hamilton

5 Federalist Papers
Published anonymously by Hamilton, Madison and Jay
Written to persuade the citizens of New York to ratify the Constitution
Remains an important work in political philosophy today

6 Disputed federalist papers
85 essays and articles in total
General agreement on the authorship of 70 papers: 5 by Jay, 14 by Madison, and 51 by Hamilton
Of the remaining 15, 12 are in dispute between Hamilton and Madison, and 3 are joint works to a disputed extent

7 Problems with distinguishing authorship
Writings of Hamilton and Madison are difficult to tell apart because both authors were masters of the popular Spectator style of writing—complicated and oratorical. “Had no important step been taken by the leaders of the Revolution for which a precedent could not be discovered, no government established of which an exact model did not present itself, the people of the United States might, at this moment, have been numbered among the melancholy victims of misguided councils, must at best have been laboring under the weight of some of those forms which have crushed the liberties of the rest of mankind.”

8 Mosteller attempt 1: Worked with Fred Williams
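Bought duplicate copies of the Federalist papers
Counted the words in each sentence of the papers of known authorship
Hamilton: average sentence length 34.55 words, s.d. 19
Madison: average sentence length 34.59 words, s.d. 20
DISASTER! The two authors are practically indistinguishable on this measure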
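The sentence-length comparison on this slide can be sketched in a few lines of Python. This is only an illustration: the sentence-splitting rule, the function names and the sample text are assumptions, not the counting procedure Mosteller and Williams actually used.

```python
import re
import statistics

def sentence_lengths(text):
    """Return the number of words in each sentence of `text`."""
    # Naive rule (an assumption): a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]

def summarize(text):
    """Mean and sample standard deviation of sentence length."""
    lengths = sentence_lengths(text)
    return statistics.mean(lengths), statistics.stdev(lengths)

# Made-up sample text, for illustration only.
sample = "The people decide. They must weigh each plan with care. Liberty depends on it!"
mean_len, sd_len = summarize(sample)
```

Running `summarize` on each author's known papers would give one mean and one s.d. per author. When both pairs are nearly equal, as on this slide, the statistic cannot separate the authors.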

9 Mosteller attempt 2: Read some stylistic work by psychologists
The psychologists’ work suggested looking at the noun-adjective ratio
Used dictionaries, grammars and special rules
Found modest differences between the two authors, but not enough to be compelling

10 Mosteller attempt 3: Rate of use of variables that were easily detected and counted
Examples: one- and two-letter words, the number of the’s
Applied Fisher’s discriminant function, a method for distinguishing between two categories, to the unknown papers
The discriminant obtained was too weak to settle the authorship of each of the 15 papers with reasonable confidence
Parted ways with Fred Williams because of WWII
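A minimal sketch of Fisher's linear discriminant for two features may help make this slide concrete. The feature values below are invented for illustration (they are not Mosteller's data), and the function names are hypothetical. The direction is computed as w = Sw^-1 (mA - mB), with the cut-off placed midway between the projected class means.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 scatter matrix of `rows` around the mean `m`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_discriminant(a_rows, b_rows):
    """Return (w, threshold) for two classes with two features each."""
    ma, mb = mean_vector(a_rows), mean_vector(b_rows)
    sa, sb = scatter(a_rows, ma), scatter(b_rows, mb)
    # Pooled within-class scatter matrix Sw.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(ma[k] + mb[k]) / 2 for k in range(2)]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def classify(x, w, threshold):
    return "A" if w[0] * x[0] + w[1] * x[1] > threshold else "B"

# Invented rates per 1000 words: (short words, occurrences of "the").
author_a = [(410, 90), (420, 95), (405, 88)]
author_b = [(390, 100), (385, 104), (395, 98)]
w, threshold = fisher_discriminant(author_a, author_b)
```

A paper of unknown authorship would be passed to `classify` with its own two rates. When the two classes overlap heavily along `w`, as happened with these variables, the resulting assignments carry little confidence.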

11 Mosteller attempt 4: Worked with David Wallace
Used paired marker words: while (Hamilton) and whilst (Madison)
Problems:
1. The marker words appear in fewer than half of the papers
2. Words are imperfect indicators (an author may sometimes use the other form)
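The marker-word idea can be sketched as a simple count, shown here in Python. The tokenising rule and the function names are assumptions made for this example. The `None` result corresponds to the first problem on the slide: many papers contain neither word.

```python
import re
from collections import Counter

def marker_counts(text, markers=("while", "whilst")):
    """Count how often each marker word occurs in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {m: counts[m] for m in markers}

def marker_vote(text):
    """Return the author the markers point to, or None if they are absent or tied."""
    c = marker_counts(text)
    if c["while"] > c["whilst"]:
        return "Hamilton"
    if c["whilst"] > c["while"]:
        return "Madison"
    return None

# Made-up sentences, for illustration only.
vote_1 = marker_vote("Whilst the states deliberate, the union waits.")
vote_2 = marker_vote("The union waits for the states.")
```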

12 Mosteller attempt 5: Non-contextual words to discriminate author (writing style/preference)
Analysed the rate of usage of each word
For the word by: lower rates point to Hamilton, higher rates to Madison
Discriminating power: by > to > from
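Rates of this kind are usually quoted per 1000 words. A small Python sketch of that calculation follows; the tokeniser, the function name and the sample sentence are assumptions made for illustration.

```python
import re

def rate_per_thousand(text, word):
    """Occurrences of `word` per 1000 words of `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 1000.0 * words.count(word) / len(words)

# Made-up eight-word sample in which "by" occurs twice.
sample = "measures adopted by the states and by congress"
rates = {w: rate_per_thousand(sample, w) for w in ("by", "to", "from")}
```

On a real paper of a few thousand words, the rate for by would then be compared with each author's typical rate in the papers of known authorship.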

13 Modelling
Goal: apply the theory of statistical inference to the evidence
A probability model represents the variability in word rate from paper to paper
Example: Madison’s usage of the word by is 12 per 1000 words
Imagine an urn filled with many thousands of red and black balls
Red balls occur in the proportion 12 per 1000; black balls correspond to all other words (988 per 1000)
Extending the model to the simultaneous study of two or more words needs balls of three or more colours
This is the simplest model and the most common in classical probability
Caveat: fine structure within a sentence is determined in large measure by non-random elements of grammar, meaning, and style
If a large block of text is analysed, the detailed structure of phrases and sentences ought not to be very important
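Under the urn model, the number of times by appears in a paper of n words is binomial with the author's rate as the success probability. The Python sketch below compares the two authors on that basis. Madison's rate of 12 per 1000 comes from the slide; the Hamilton rate of 7 per 1000 is an assumed value chosen for illustration, and the function names are hypothetical.

```python
import math

def binomial_log_pmf(k, n, p):
    """Log-probability of drawing k red balls in n draws when a fraction p is red."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_odds_madison(k, n, p_madison=0.012, p_hamilton=0.007):
    """Log of P(count | Madison) / P(count | Hamilton); p_hamilton is an assumption."""
    return binomial_log_pmf(k, n, p_madison) - binomial_log_pmf(k, n, p_hamilton)

# A 2000-word paper with 24 occurrences of "by" leans towards Madison,
# one with only 10 occurrences leans towards Hamilton.
high_count = log_odds_madison(24, 2000)
low_count = log_odds_madison(10, 2000)
```

A positive value favours Madison and a negative value favours Hamilton. If several words are treated as independent, their log odds simply add, which is one way to read the "balls of three or more colours" extension.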

14 Testing of model
Tested the model by comparing its predictions with actual counts of word frequencies in the papers
The random variation of the urn scheme accounted for most of the variation in counts from one essay to another, but in some essays the authors change their basic rates a bit
Another model used: the negative binomial distribution
The negative binomial gave odds of 100 to 1 for Madison, while the simple urn model gave 10,000 to 1!
Choosing a model that does not fit the data may therefore give a highly misleading result
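The gap between the two sets of odds can be reproduced in miniature. In the Python sketch below, a Poisson distribution stands in for the simple urn model (the usual approximation for rare words in long texts) and is compared with a negative binomial that has the same mean but extra paper-to-paper variation. All numbers are assumptions chosen for illustration: a 2000-word paper, expected counts of 24 for Madison and 14 for Hamilton, and a dispersion parameter r = 10.

```python
import math

def poisson_log_pmf(k, mu):
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def negbin_log_pmf(k, mu, r):
    """Negative binomial with mean mu and variance mu + mu**2 / r."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))

k = 24        # observed count of the word in the paper (assumed)
mu_m = 24.0   # expected count under Madison's rate (assumed)
mu_h = 14.0   # expected count under Hamilton's rate (assumed)
r = 10.0      # dispersion; smaller r means more variation between papers

poisson_log_odds = poisson_log_pmf(k, mu_m) - poisson_log_pmf(k, mu_h)
negbin_log_odds = negbin_log_pmf(k, mu_m, r) - negbin_log_pmf(k, mu_h, r)
```

Both log odds favour Madison, but the negative binomial value is noticeably smaller. Allowing authors to shift their basic rates between essays makes each observed count less decisive, which is the effect behind the 100 to 1 versus 10,000 to 1 contrast on this slide.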

15 Conclusion
Overwhelming evidence for Madison’s authorship of the disputed papers
For some papers the odds were only 80 to 1 for Madison: strong, but not overwhelming
It took many attempts to come up with a way to distinguish authorship

