Authorship Verification as a One-Class Classification Problem Moshe Koppel Jonathan Schler.

1 Authorship Verification as a One-Class Classification Problem Moshe Koppel Jonathan Schler

2 Introduction
• Goal
  – Given examples of the writing of a single author, determine whether a given text was written by that author
• Authorship attribution
  – Given examples of the writing of several authors, determine which of them wrote a given anonymous text

3 Challenge
• Negative samples are neither exhaustive nor representative
• A single author may consciously vary his or her style from text to text

4 Authorship Verification
• Naïve approach
  – Given examples of the writing of author A
  – Concoct a mishmash of works by other authors
  – Learn a model for A vs. not-A
  – Learn a model for A vs. X (a mystery work)
  – If it is easy to distinguish between A and X: different author
  – Otherwise: same author

5 Authorship Verification
• Unmasking: basic idea
  – A small number of features do most of the work in distinguishing between books
  – Iteratively remove the most useful features
  – Gauge the speed with which cross-validation accuracy degrades

6 Unmasking: The House of the Seven Gables against Hawthorne (actual author), Melville, and Cooper

7 Experiment

8
• Use a one-class SVM as the baseline
  – 6 of 20 same-author pairs are correctly classified
  – 143 of 189 different-author pairs are correctly classified

9 Experiment
• Using the unmasking approach
  – Choose a feature set of the 250 words with the highest average frequency in A_X and X
  – Build the degradation curve:
      Use 10-fold cross-validation for A against X; for each fold:
        Do 10 iterations {
          Build a model for A against X
          Evaluate accuracy results
          Add the accuracy number to the degradation curve
          Remove the 6 top contributing features from the data
        }
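The loop on this slide can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' implementation: the slide does not name the learner used inside the loop, so a simple difference-of-centroids linear classifier is swapped in here. "Top contributing" is taken to mean largest absolute weight, and "average frequency" is taken to mean the mean of the two relative frequencies. All function names (`top_words`, `train_centroid`, `predict`, `degradation_curve`) are hypothetical.

```python
import random
from collections import Counter


def top_words(tokens_a, tokens_x, n=250):
    """Pick the n words with the highest average relative frequency in A_X and X."""
    freq_a, freq_x = Counter(tokens_a), Counter(tokens_x)
    avg = {
        w: (freq_a[w] / len(tokens_a) + freq_x[w] / len(tokens_x)) / 2
        for w in set(freq_a) | set(freq_x)
    }
    # Sort by descending average frequency; break ties alphabetically.
    return sorted(avg, key=lambda w: (-avg[w], w))[:n]


def train_centroid(rows, labels):
    """Assumed stand-in learner: weights are the difference of the class means."""
    pos = [r for r, lab in zip(rows, labels) if lab == 1]
    neg = [r for r, lab in zip(rows, labels) if lab == 0]
    mu_pos = [sum(col) / len(pos) for col in zip(*pos)]
    mu_neg = [sum(col) / len(neg) for col in zip(*neg)]
    weights = [p - q for p, q in zip(mu_pos, mu_neg)]
    # Decision threshold halfway between the two class means.
    bias = -sum(w * (p + q) / 2 for w, p, q in zip(weights, mu_pos, mu_neg))
    return weights, bias


def predict(model, row):
    weights, bias = model
    return 1 if sum(w * v for w, v in zip(weights, row)) + bias > 0 else 0


def degradation_curve(rows, labels, n_folds=10, n_iter=10, k_remove=6, seed=0):
    """Mean held-out accuracy per elimination round (index 0 = all features)."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    per_round = [[] for _ in range(n_iter)]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        active = list(range(len(rows[0])))  # feature indices still in play

        for rnd in range(n_iter):
            def project(i):
                return [rows[i][j] for j in active]

            model = train_centroid([project(i) for i in train],
                                   [labels[i] for i in train])
            hits = sum(predict(model, project(i)) == labels[i] for i in fold)
            per_round[rnd].append(hits / len(fold))
            # Drop the k features with the largest absolute weight.
            ranked = sorted(range(len(active)), key=lambda p: -abs(model[0][p]))
            dropped = set(ranked[:k_remove])
            active = [j for p, j in enumerate(active) if p not in dropped]
    return [sum(a) / len(a) for a in per_round]
```

Here `rows` holds one frequency vector per chunk of text, with label 1 for chunks of A and 0 for chunks of X. The returned list has one accuracy per iteration, so a curve that collapses after the first few rounds suggests that only a handful of features separated A from X.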

10 Experiment Unmasking An Ideal Husband against each of the ten authors

11 Experiment
• Distinguish same-author curves from different-author curves
  – Represent each degradation curve as a feature vector
  – Feature vector: a numerical vector describing the curve in terms of its essential features
    • Accuracy after 6 elimination rounds < 89%
    • Second-highest accuracy drop over two iterations > 16%
  – Test the degradation curve
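The two curve features named on this slide can be computed as in the sketch below. Several things are assumptions rather than facts from the slide: that index 0 of the curve is the accuracy before any elimination (so index 6 is "after 6 elimination rounds"), that a "drop over two iterations" is `curve[i] - curve[i + 2]`, and that meeting both conditions together indicates a same-author pair. The function names are hypothetical.

```python
def curve_features(curve):
    """Return (accuracy after 6 rounds, second-highest two-iteration drop).

    Assumes curve[0] is the accuracy with all features present and that
    accuracies are fractions in [0, 1].
    """
    acc_after_6 = curve[6]
    two_step_drops = sorted(
        (curve[i] - curve[i + 2] for i in range(len(curve) - 2)),
        reverse=True,
    )
    return acc_after_6, two_step_drops[1]


def looks_like_same_author(curve, acc_threshold=0.89, drop_threshold=0.16):
    """Assumed reading of the slide: both conditions must hold."""
    acc_after_6, second_drop = curve_features(curve)
    return acc_after_6 < acc_threshold and second_drop > drop_threshold
```

A curve that falls quickly, such as `[1.0, 0.95, 0.85, 0.7, 0.65, 0.6, 0.55, 0.55, 0.5, 0.5]`, satisfies both conditions, while a flat curve that stays near 1.0 satisfies neither.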

12 Experiment Result
• 19 of 20 same-author pairs are correctly classified
• 181 of 189 different-author pairs are correctly classified
• Accuracy: 95.7%
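The 95.7% figure follows directly from the two counts on this slide, and the same arithmetic applied to the one-class SVM baseline counts (6 of 20 and 143 of 189) gives a point of comparison. The helper name `accuracy` is only illustrative.

```python
def accuracy(correct_same, total_same, correct_diff, total_diff):
    """Overall fraction of correctly classified pairs."""
    return (correct_same + correct_diff) / (total_same + total_diff)


unmasking = accuracy(19, 20, 181, 189)  # 200 of 209 pairs
baseline = accuracy(6, 20, 143, 189)    # 149 of 209 pairs

print(f"unmasking: {unmasking:.1%}")
print(f"one-class SVM baseline: {baseline:.1%}")
```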

13 Extension
• Use negative examples to eliminate some false positives from the unmasking phase
• In our case, the elimination method improved accuracy
  – 189 of 189 different-author pairs are correctly classified
  – A single new misclassification was introduced

14 Extension
• Elimination
    If alternative authors {A_1, …, A_n} exist then {
      Build model M for classifying A vs. all alternative authors
      Test each chunk of X with built model M
      For each alternative author A_i {
        Build model M_i for classifying A_i vs. {A and all other alternative authors}
        Test each chunk of X with built model M_i
      }
      If, for some A_i, the number of chunks assigned to A_i > the number of chunks assigned to A then
        return different-author
    }
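A minimal sketch of the elimination step follows. It assumes each chunk is already a numeric feature vector, and it uses a difference-of-centroids classifier as a stand-in because the slide does not specify the learner. The names `train_binary` and `eliminate`, and the choice to return `None` when no alternative author wins, are hypothetical.

```python
from statistics import mean


def train_binary(pos, neg):
    """Assumed stand-in learner: returns a function that is True for 'pos'-like rows."""
    mu_p = [mean(col) for col in zip(*pos)]
    mu_n = [mean(col) for col in zip(*neg)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, mu_p, mu_n))
    return lambda row: sum(wi * v for wi, v in zip(w, row)) + b > 0


def eliminate(chunks_a, chunks_x, alternatives):
    """alternatives maps an author name to that author's list of chunk vectors.

    Returns "different-author" if some alternative author claims more chunks
    of X than A does, otherwise None (no decision from this step).
    """
    if not alternatives:
        return None
    all_alt = [c for chunks in alternatives.values() for c in chunks]
    model_a = train_binary(chunks_a, all_alt)
    n_a = sum(model_a(x) for x in chunks_x)  # chunks of X assigned to A
    for name, chunks_i in alternatives.items():
        # Everything that is not A_i: A plus all other alternative authors.
        rest = list(chunks_a) + [
            c for other, cs in alternatives.items() if other != name for c in cs
        ]
        model_i = train_binary(chunks_i, rest)
        n_i = sum(model_i(x) for x in chunks_x)  # chunks of X assigned to A_i
        if n_i > n_a:
            return "different-author"
    return None
```

Pairs for which this returns `None` would still be decided by the unmasking phase. The step only removes candidates that an alternative author explains better than A does.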

15 Actual Literary Mystery
• Two 19th-century collections of Hebrew-Aramaic documents
  – RP includes 509 documents (by Ben Ish Chai)
  – TL includes 524 documents (which Ben Ish Chai claims to have found in an archive)

16 Actual Literary Mystery Unmasking TL against Ben Ish Chai and four impostors

17 Conclusion
• Unmasking: completely ignores negative examples
  – High accuracy
• Unmasking + elimination (a little negative data)
  – Better accuracy
• More experiments are needed to confirm that these methods also work for other languages

