WikiTrust: Turning Wikipedia Quantity into Quality
B. Thomas Adler, Luca de Alfaro, and Ian Pye


1 WikiTrust: Turning Wikipedia Quantity into Quality. B. Thomas Adler, Luca de Alfaro, and Ian Pye

2 Wikipedia: 3,000,000+ articles, 1,000,000,000+ revisions. Our goal: crowd-sourcing community consensus.

3 Vandalism: Prevents Wikipedia from being taken fully seriously. Makes it harder to use Wikipedia in schools. Makes it harder to make static selections.

4 Vandalism Detection: Given a new revision, classify it as vandalism or regular. Zero-delay: use only those features that are available at the time the revision is created (no lookahead). Historical: use the full set of WikiTrust features, including how the revision is treated by subsequent authors (lookahead).
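The two settings differ only in which revisions a feature is allowed to read. A minimal Python sketch, assuming a hypothetical `Revision` record and made-up feature names (`interval`, `length_delta`, `next_comment_length`): zero-delay features slice the history at revision i, so appending later revisions cannot change them, while historical features may also read revision i+1.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Revision:
    # Hypothetical minimal record of one revision.
    author: str
    timestamp: float  # seconds since the epoch
    text: str
    comment: str


def zero_delay_features(history: List[Revision], i: int) -> Dict[str, float]:
    """Features of revision i computed from revisions 0..i only (no lookahead)."""
    past = history[: i + 1]  # later revisions are sliced away and never read
    cur = past[-1]
    prev = past[-2] if len(past) > 1 else None
    return {
        "interval": cur.timestamp - prev.timestamp if prev is not None else float("inf"),
        "length_delta": len(cur.text) - (len(prev.text) if prev is not None else 0),
    }


def historical_features(history: List[Revision], i: int) -> Dict[str, float]:
    """Zero-delay features plus a feature of how revision i+1 treats revision i."""
    feats = zero_delay_features(history, i)
    nxt = history[i + 1] if i + 1 < len(history) else None
    feats["next_comment_length"] = len(nxt.comment) if nxt is not None else 0
    return feats
```

Enforcing the cutoff by slicing makes the no-lookahead property easy to test: the zero-delay features of a revision must be identical before and after later revisions are appended.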

5 Revision Selection: Given an article, select the “best” revision to show to a user. Wikipedia 1.0 Project: aims to extract a static snapshot of Wikipedia for use in schools, in developing countries, and in the OLPC (One Laptop per Child) project.

6 Core Concepts: A Wikipedia article has many revisions, with one author per revision. Authors have reputation; revisions have trust. Binary classifier: either A or B.

7 Zero Day Features: Author is anonymous (turns out we don’t care). Time interval since the previous edit (useful, but only as the predicate time > 12 seconds). Time of day of the edit (not used).
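Using the interval only as a predicate means the classifier sees a boolean instead of the raw number of seconds. A minimal sketch; the function name and the handling of a page's first revision are hypothetical, and only the 12-second threshold comes from the slide.

```python
from typing import Optional

# Threshold from the slide; only the boolean predicate is used as a feature.
MIN_INTERVAL_SECONDS = 12.0


def slow_edit_predicate(prev_ts: Optional[float], cur_ts: float) -> bool:
    """True if more than 12 seconds passed since the previous edit."""
    if prev_ts is None:
        # Hypothetical choice: a page's first revision has no previous edit.
        return True
    return (cur_ts - prev_ts) > MIN_INTERVAL_SECONDS
```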

8 Zero Day Features: Difference from previous revisions (not really useful). Comment length (not useful).

9 Zero Day Features (we care about these): Previous text trust histogram. Current text trust histogram. Histogram difference.
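One way to turn word-level text trust into these histogram features, as a sketch: it assumes trust values lie in [0, 10) and uses 10 equal-width bins of raw word counts. Neither the range nor the binning is stated in the slides, and the function names are hypothetical. The difference is taken bin by bin (current minus previous).

```python
from typing import List, Sequence

# Assumptions, not from the slides: trust range and number of bins.
NUM_BINS = 10
MAX_TRUST = 10.0


def trust_histogram(word_trusts: Sequence[float]) -> List[int]:
    """Count how many words fall into each trust bin."""
    hist = [0] * NUM_BINS
    for t in word_trusts:
        b = int(t / MAX_TRUST * NUM_BINS)
        b = min(max(b, 0), NUM_BINS - 1)  # clamp out-of-range values into the edge bins
        hist[b] += 1
    return hist


def histogram_features(prev_trusts: Sequence[float], cur_trusts: Sequence[float]) -> List[int]:
    """Previous histogram, current histogram, and their bin-wise difference."""
    prev_h = trust_histogram(prev_trusts)
    cur_h = trust_histogram(cur_trusts)
    diff = [c - p for c, p in zip(cur_h, prev_h)]
    return prev_h + cur_h + diff  # one flat vector of length 3 * NUM_BINS
```

With this layout, a revision that adds many low-trust words shows up as positive counts in the low bins of the difference.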

10 Text Trust: New text starts with a trust value proportional to its author’s reputation. Text can gain trust through later revisions. Cut-and-paste and deletions cause local trust loss. We remember deleted text and its trust.

11 A Sequence of Differences: For revisions $v_1, v_2, v_3, \ldots$ of a wiki article, word trust is computed from the difference between $v_i$ and $v_{i-1}$. How did we arrive at the current version of an article?
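A sketch of computing the sequence of word-level differences, using Python's standard `difflib.SequenceMatcher` as a stand-in for WikiTrust's own diff (the helper names are hypothetical). Unlike a diff that tracks moved blocks, `difflib` reports cut-and-paste as a deletion plus an insertion.

```python
import difflib
from typing import List, Tuple

Op = Tuple[str, List[str], List[str]]  # (tag, old words, new words)


def word_diff(prev: str, cur: str) -> List[Op]:
    """Word-level opcodes turning revision prev into revision cur."""
    a, b = prev.split(), cur.split()
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in sm.get_opcodes()]


def diff_sequence(revisions: List[str]) -> List[List[Op]]:
    """Diff each revision v_i against its predecessor v_{i-1}."""
    return [word_diff(revisions[i - 1], revisions[i]) for i in range(1, len(revisions))]
```

For example, `word_diff("the cat sat", "the black cat sat")` yields an `equal` block for "the", an `insert` of "black", and an `equal` block for "cat sat".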

12 Text Trust: The Algorithm Illustrated. 1) Trust of new text.

13 Text Trust: The Algorithm Illustrated. 1) Trust of new text. 2) New block borders have the same trust as new text.

14 Text Trust: The Algorithm Illustrated. 1) Trust of new text. 2) New block borders have the same trust as new text. 3) The revision effect increases the trust of existing text.

15 Text Trust: The Algorithm Illustrated. 1) Trust of new text. 2) New block borders have the same trust as new text. 3) The revision effect increases the trust of existing text. 4) Note: this is not a new border.
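The illustrated steps can be sketched as a single trust update over a word-level diff. This is a simplified illustration, not WikiTrust's implementation: the constants `NEW_TEXT_FACTOR` and `REVISION_GAIN` are hypothetical, `difflib` stands in for WikiTrust's diff, border words are capped at (rather than set to) the new-text trust, and deleted text is not remembered.

```python
import difflib
from typing import List, Tuple

# Hypothetical constants; the slides do not give WikiTrust's actual values.
NEW_TEXT_FACTOR = 0.5  # new text trust = factor * author reputation
REVISION_GAIN = 0.25   # fraction of the gap to the author's reputation closed per revision


def update_trust(prev: List[Tuple[str, float]], cur_words: List[str],
                 author_rep: float) -> List[float]:
    """Return one trust value per word of the new revision."""
    prev_words = [w for w, _ in prev]
    prev_trust = [t for _, t in prev]
    new_trust = NEW_TEXT_FACTOR * author_rep   # 1) trust of new text
    out = [new_trust] * len(cur_words)         # inserted/replaced words keep this value
    sm = difflib.SequenceMatcher(a=prev_words, b=cur_words, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            continue
        for k in range(j2 - j1):
            t = prev_trust[i1 + k]
            if author_rep > t:                 # 3) revision effect raises surviving text
                t += REVISION_GAIN * (author_rep - t)
            out[j1 + k] = t
        # 2) A block edge is a new border if text was inserted or deleted next to it;
        # edges at the start or end of both versions are not new borders.
        if i1 > 0 or j1 > 0:
            out[j1] = min(out[j1], new_trust)
        if i2 < len(prev_words) or j2 < len(cur_words):
            out[j2 - 1] = min(out[j2 - 1], new_trust)
    return out
```

For instance, inserting "black" into "the cat sat" with author reputation 5.0 gives the inserted word and both words bordering it trust 2.5, while "sat", which sits at the end of both versions, keeps its old trust.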

16 Zero Day Features (we care about these): Previous text trust histogram. Current text trust histogram. Histogram difference.

17 Historical Features: Comment length of the next revision (length > 110 chars). Next revision’s comment contains the word “revert” (too noisy).
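Both next-comment features reduce to simple predicates on the following revision's edit summary. A minimal sketch with hypothetical names; the 110-character threshold is from the slide, while matching "revert" case-insensitively at the start of a word (so "Reverted" also counts) is an assumption.

```python
import re
from typing import Dict

COMMENT_LENGTH_THRESHOLD = 110  # from the slide: length > 110 chars


def next_comment_features(next_comment: str) -> Dict[str, bool]:
    """Boolean features of the next revision's comment."""
    return {
        "next_comment_long": len(next_comment) > COMMENT_LENGTH_THRESHOLD,
        # Assumption: any word starting with "revert", in any case, counts.
        "next_comment_revert": re.search(r"\brevert", next_comment, re.IGNORECASE) is not None,
    }
```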

18 Historical Features: Author reputation (how do other users judge this user’s edits?).
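As a rough illustration of reputation built from other users' judgments, here is a hypothetical update rule: each judgment nudges the author's reputation by the revision quality (in [-1, 1]), weighted by the judge's own reputation. The `SCALE` constant, the logarithmic weight, the zero floor, and the self-judgment guard are all assumptions, not WikiTrust's published rule.

```python
import math
from collections import defaultdict

# Hypothetical constant, not from WikiTrust.
SCALE = 0.1


class ReputationTable:
    def __init__(self) -> None:
        self.rep = defaultdict(float)  # every author starts with reputation 0.0

    def judge(self, author: str, judge: str, quality: float) -> None:
        """Record that judge, through a later revision, rated author's edit at quality."""
        if author == judge:
            return  # assumption: users cannot raise their own reputation
        weight = 1.0 + math.log1p(self.rep[judge])  # more reputable judges count more
        self.rep[author] = max(0.0, self.rep[author] + SCALE * quality * weight)
```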

19 Historical Features: Minimum revision quality. Average revision quality. Maximum dissent.
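The slides do not define these aggregates precisely. A sketch under stated assumptions: each revision has a list of (quality, judge reputation) judgments with quality in [-1, 1], and "maximum dissent" is taken, hypothetically, as the strongest negative judgment weighted by the judge's reputation. The zero defaults for an unjudged revision are also an assumption.

```python
from statistics import mean
from typing import Dict, List, Tuple


def quality_features(judgments: List[Tuple[float, float]]) -> Dict[str, float]:
    """judgments: (quality in [-1, 1], judge reputation) pairs for one revision."""
    if not judgments:
        # Hypothetical default for a revision nobody has judged yet.
        return {"min_quality": 0.0, "avg_quality": 0.0, "max_dissent": 0.0}
    qualities = [q for q, _ in judgments]
    return {
        "min_quality": min(qualities),
        "avg_quality": mean(qualities),
        # Hypothetical definition: strongest negative judgment, weighted by reputation.
        "max_dissent": max(rep * max(0.0, -q) for q, rep in judgments),
    }
```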

20 Historical Features: Total weight of judges (not useful at all).

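21 ROC AUC Scoring: > 0.90 = excellent; 0.80 to 0.90 = good; < 0.80 = poor; 0.5 = the expected result from flipping a coin. The AUC is the probability that the classifier scores a randomly chosen positive (vandalism) revision above a randomly chosen negative (regular) one.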
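ROC AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one (the Mann-Whitney statistic), which gives a direct way to compute it: compare every positive/negative pair, counting ties as one half. This O(n·m) sketch is for illustration only; labels are assumed to be 1 for vandalism and 0 for regular.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A classifier that assigns every revision the same score gets exactly 0.5 here, which matches the coin-flip baseline.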

22 Results (PAN 2010): ROC AUC of 0.937

23 Results (PAN 2010): ROC AUC of 0.937 X, ROC AUC of 0.914 ?

24 Results (PAN 2010): ROC AUC of 0.937 X, ROC AUC of 0.904 ?

25 Other Directions: Wikipedia 1.0. Vandalism API. Newsgroup reputation. IP address reputation.

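26 Revision Quality: The fraction of the change that goes in the same direction as the future. Given a past revision $v_i$, the judged revision $v_j$, and a future revision $v_k$, the “work done” is $d(v_i, v_j)$ and the “progress” is $d(v_i, v_k) - d(v_j, v_k)$, so

$$\mathrm{Qual}(v_j) = \frac{d(v_i, v_k) - d(v_j, v_k)}{d(v_i, v_j)}, \qquad -1 \le \mathrm{Qual}(v_j) \le 1.$$

Qual = 1: $v_j$ is a totally good edit. Qual = -1: $v_j$ is reverted.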
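A sketch of the revision quality computation, quality = (d(v_i, v_k) - d(v_j, v_k)) / d(v_i, v_j), assuming a plain word-level Levenshtein distance for d; WikiTrust uses its own edit distance, so actual values will differ. Because Levenshtein distance obeys the triangle inequality, |d(v_i, v_k) - d(v_j, v_k)| ≤ d(v_i, v_j), which keeps the result in [-1, 1].

```python
from typing import List


def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance (a stand-in for WikiTrust's own distance)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, wb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,              # delete wa
                         cur[j - 1] + 1,           # insert wb
                         prev[j - 1] + (wa != wb))  # keep or substitute
        prev = cur
    return prev[-1]


def revision_quality(v_i: str, v_j: str, v_k: str) -> float:
    """Progress toward the future v_k, divided by the work done from v_i to v_j."""
    a, b, c = v_i.split(), v_j.split(), v_k.split()
    work = edit_distance(a, b)
    if work == 0:
        raise ValueError("v_j makes no change to v_i; quality is undefined")
    progress = edit_distance(a, c) - edit_distance(b, c)
    return progress / work
```

If the future keeps the edit (v_k equals v_j) the quality is 1.0; if the future restores the past (v_k equals v_i, a revert) it is -1.0.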
