Characterizing and supporting the Wikipedia Knowledge Community Adam Wierzbicki PJIIT.


1 Characterizing and supporting the Wikipedia Knowledge Community Adam Wierzbicki PJIIT

2 Reconcile: Credibility Evaluation of Web Content
Goal: create new mechanisms that support users in evaluating the credibility of Web content
– Improving how users evaluate credibility
– Improving the credibility of content through better evaluation
– Making the new mechanisms available in Web browsers
– Applications: almost any kind of decision making (in politics, economics, climate change, nuclear energy)
Three-year project, started in November 2011
Two partners: PJIIT and EPFL
– PJIIT team: 6 full-time researchers, 1 developer, several collaborators; collaboration with a social opinion research company and with the Wikimedia Foundation
– EPFL team: 3 full-time researchers; collaboration with CERN

3 Web Credibility Corpus (WCC)
Constructed using experiments with online evaluation of Web page credibility
Pages specially selected using thematic Web searches
– Searches supplemented with additional query terms that control some variables, such as the recency of a page
– Pages selected manually, then archived on the experiment's backend server
A special plugin for running the experiment has been developed
– Capable of evaluating Web pages
The corpus is divided into modules
– Each module can have a different subject and different kinds of content, and can be used for various experiments

4 First credibility evaluation experiment
First WCC module
– Subject: health, 9 topics
– Language: Polish
– 81 pages selected
First experiment
– Participants: about 100 students from PJIIT
– About 800 evaluations (at least 9 per page)
– Three treatments: topic search (answering a question); topic browse (pages from only one topic); free browse (pages on a variety of topics)

5 Credibility game
A game-theoretic model of using credibility to select Web content (or information)
Requirements for a good model of credibility:
– Use asymmetric roles for users who produce or consume information
– Explicitly model the quality of the information produced and consumed by users
– Model the preferences of information producers and consumers
– Take into account the use of credibility evaluation methods, so that their impact on the model can be studied
– Model the economic motivations of information producers
– Model diverse strategies of information producers and consumers
– Be simple enough to analyze and simulate ;]

6 Questions?

7 Research on the Wikipedia community
Wikipedia is not just a free encyclopedia…
– It is a knowledge community (Pierre Levy), a collaborative innovation network (Peter Gloor), a community of prosumers (Don Tapscott)…
– A model for the knowledge economy (Wikinomics)
What do we know about the Wikipedia community? Our knowledge is incomplete. We know:
– The distribution of editor activity (a power law: activity is strongly concentrated in a group of active editors, with a long tail of weakly active ones)
– The hyperlink structure of Wikipedia (largely similar to that of the Web)
– Conflict phenomena (edit wars)
– Mechanisms of coordination and collaboration
– Management mechanisms

8 How to study the Wikipedia community?
Edit history: a source of behavioral data
– Contains every version of every page, including special pages
– Very large datasets (Polish-language Wikipedia: over 250 GB; English-language Wikipedia: over 18 TB)
Constructing social networks from the edit history
– We have used multidimensional social networks constructed from the entire Polish-language community: acquaintance, trust, criticism, interests/knowledge
Validating operationalizations
– We have used surveys to validate our operationalizations
– We are now using a data-mining approach to find the best operationalization

9 What ails Wikipedia?
On the English-language Wikipedia, we see:
– A decline in the number of edits
– A decline in the number of editors
Why do editors leave Wikipedia?
– "Wikipedia is too confusing"
– "I felt that I was often working alone, without support"
– "I found the atmosphere unpleasant"
– "Some editors made Wikipedia a difficult place to work"
Management problems:
– An admin society that is closing itself off
– An increasing amount of indirect work (such as maintenance)
– Increasingly complex rules
What can be done to help?

10 Social Wiki
Wiki technology is over 10 years old
It was designed well to make editing and collaborative work easier…
…but it is not a social-centric platform. Examples:
– Simplistic social roles (admin vs. non-admin)
– No representation of the social environment (hence the feeling of "being alone" and not knowing whom to trust)
– No representation of social norms (they have to be enforced manually by admins)
– Poorly designed management procedures (such as admin elections)
– No explicit motivation mechanisms
Time for a Social Wiki-NG!

11 The n-d Social Network
A multidimensional social network of Wikipedia editors
– Anonymous authors and bots are not counted
– Edges between authors (nodes) may belong to different network dimensions
– Every edge has a strength (defined for its dimension)
The network is created from the edit history
– It can be created for any moment in time within the history
– The edge strength depends on the whole history up to that moment
We currently use 4 dimensions:
– Dimension 1 (co-edits)
– Dimension 2 (reverts)
– Dimension 3 (discussion on talk pages)
– Dimension 4 (edits in categories)
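
A minimal Python sketch of how such a network might be held in memory and rebuilt for a chosen moment in time. The `Interaction` event record, the `NDSocialNetwork` class and the `is_anonymous_or_bot` filter are hypothetical illustrations, not the project's code; in particular, treating any name that ends in "bot" as a bot is only a crude stand-in for whatever bot list was actually used.

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass, field

# Dimension labels matching the four dimensions listed on the slide.
COEDITS, REVERTS, DISCUSSION, CATEGORIES = 1, 2, 3, 4


@dataclass(frozen=True)
class Interaction:
    """One hypothetical interaction event extracted from the edit history."""
    timestamp: int  # assumed: seconds since the epoch
    dimension: int
    source: str     # an author
    target: str     # an author, or a category name in dimension 4
    amount: float = 1.0


@dataclass
class NDSocialNetwork:
    # edges[dimension][(source, target)] -> accumulated strength
    edges: dict = field(default_factory=lambda: defaultdict(dict))

    def add(self, dim, source, target, amount=1.0):
        layer = self.edges[dim]
        layer[(source, target)] = layer.get((source, target), 0.0) + amount

    def strength(self, dim, source, target):
        # A missing edge reads as zero strength.
        return self.edges.get(dim, {}).get((source, target), 0.0)


def is_anonymous_or_bot(name):
    """Hypothetical filter: IP-address names are anonymous; names ending in 'bot' are bots."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return name.lower().endswith("bot")


def build_snapshot(interactions, until, is_excluded=is_anonymous_or_bot):
    """Build the network as it stood at time `until`, using the whole history up to then."""
    net = NDSocialNetwork()
    for ev in interactions:
        if ev.timestamp > until:
            continue
        if is_excluded(ev.source):
            continue
        # In dimension 4 the target is a category, not an author, so it is not filtered.
        if ev.dimension != CATEGORIES and is_excluded(ev.target):
            continue
        net.add(ev.dimension, ev.source, ev.target, ev.amount)
    return net
```

Because a snapshot simply replays the events up to `until`, the same event list yields the network at any moment within the history.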

12 Network dimensions
Dimension 1
– Edge strength is a decreasing function of the distance between the words of author B and the words of author A
– Can be interpreted as trust
Dimension 2
– Edge strength is the number of author B's revisions that have been reverted by author A
– Can be interpreted as distrust or criticism
Dimension 3
– Edge strength is the number of words that author A wrote near words of author B on talk pages
– Can be interpreted as acquaintance
Dimension 4
– A bipartite graph connecting authors with Wikipedia subject categories
– Edge strength is the number of an author's edits in a category
– Can be interpreted as interest or knowledge
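
One way to compute the four edge strengths is sketched below in Python. The slide does not specify the decreasing function for dimension 1, what counts as "near" in dimension 3, or how edges are directed, so the 1/d decay, the window sizes and the symmetric dimension-1 edges are all assumptions; the input formats (one author name per word, revert pairs, per-edit category lists) are hypothetical as well.

```python
from collections import Counter, defaultdict


def coedit_strength(tokens, window=10):
    """Dimension 1 sketch. `tokens` lists the author of each word of an article,
    in reading order. Two words by different authors at distance d <= window
    add 1/d to the edge in both directions (assumed decay and symmetry)."""
    strength = defaultdict(float)
    for i, a in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            b = tokens[j]
            if a != b:
                w = 1.0 / (j - i)
                strength[(a, b)] += w
                strength[(b, a)] += w
    return dict(strength)


def revert_strength(reverts):
    """Dimension 2: `reverts` yields (reverting_author, reverted_author) pairs."""
    return Counter((a, b) for a, b in reverts if a != b)


def discussion_strength(talk_tokens, window=20):
    """Dimension 3 sketch: count the words of A that have a word of B within
    `window` positions on a talk page (the window size is an assumption)."""
    strength = Counter()
    for i, a in enumerate(talk_tokens):
        nearby = set(talk_tokens[max(0, i - window): i + window + 1])
        for b in nearby - {a}:
            strength[(a, b)] += 1
    return strength


def category_strength(edits):
    """Dimension 4: `edits` yields (author, categories_of_edited_article) pairs."""
    strength = Counter()
    for author, categories in edits:
        for category in categories:
            strength[(author, category)] += 1
    return strength
```

The windowed loops keep the cost linear in text length for a fixed window, which matters given the size of the full edit history.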

13 WikiTeams
A WikiTeam is a subset of the n-D social network
From the edit history, we construct teams as groups of editors who have co-authored an article
– provided their edits are still present in the newest version
Assumption: team quality can be evaluated based on the quality of the team's product (the article)
We use Wikipedia's own article quality evaluation
– The evaluation is done by the authors
– Articles can be "featured" or "good"
– The remaining articles (not counting "stubs") are "normal"
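
A sketch of team construction, assuming the author of every word in the newest revision is already known (computing that attribution from the edit history is a separate step not shown here). `build_team` and `WikiTeam` are hypothetical names; the quality labels follow the slide: featured or good articles give "good" teams, stubs are skipped, and everything else is "normal".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikiTeam:
    article: str
    members: frozenset
    quality: str  # "good" (featured or good article) or "normal"


def build_team(article, newest_tokens, status, is_excluded=lambda name: False):
    """`newest_tokens` holds the author of each word in the newest revision
    (assumed input); only authors whose words survive become team members."""
    if status == "stub":
        return None  # stubs are not counted
    members = frozenset(a for a in newest_tokens if not is_excluded(a))
    quality = "good" if status in ("featured", "good") else "normal"
    return WikiTeam(article, members, quality)
```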

14 WikiTeam evaluation criteria
Based on the n-D social network
Sums or averages of edge strengths between team members
– Averages can count missing links as zero strength
For dimension 4 (edits in categories):
– Average strength between team members and the categories that describe the article
– Minmax of strength between team members and the categories that describe the article
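
The criteria might be computed as follows, one dimension at a time. This is a sketch under assumptions: strengths are stored in a dictionary keyed by ordered author pairs, pairs are treated as directed, and "minmax" is read as "for each article category take the strongest team member, then report the weakest category", which is only one plausible interpretation of the slide.

```python
from itertools import permutations


def team_sum(members, strength):
    """Sum of directed edge strengths between team members in one dimension.
    `strength` maps (author_a, author_b) -> strength; missing pairs count as 0."""
    return sum(strength.get(pair, 0.0) for pair in permutations(members, 2))


def team_average(members, strength, missing_as_zero=True):
    """Average edge strength between members, optionally counting missing links as zero."""
    pairs = list(permutations(members, 2))
    if missing_as_zero:
        return team_sum(members, strength) / len(pairs) if pairs else 0.0
    present = [strength[p] for p in pairs if p in strength]
    return sum(present) / len(present) if present else 0.0


def category_average(members, categories, cat_strength):
    """Dimension 4: average member-to-category strength over the article's categories."""
    pairs = [(m, c) for m in members for c in categories]
    if not pairs:
        return 0.0
    return sum(cat_strength.get(p, 0.0) for p in pairs) / len(pairs)


def category_minmax(members, categories, cat_strength):
    """Dimension 4, one assumed reading of "minmax": for each category take the
    strongest member, then report the weakest such category."""
    if not members or not categories:
        return 0.0
    return min(max(cat_strength.get((m, c), 0.0) for m in members)
               for c in categories)
```

Counting missing links as zero penalizes teams whose members never interacted, while the alternative average only describes the links that exist.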

15 WikiTeam dataset
About 300 "featured" or "good" teams
About 200,000 normal teams
Data about a WikiTeam:
– Size
– Membership
– Criteria values
Data about the article:
– Number of edits
– Time of first and last edit
Data about authors:
– In how many "featured" or "good" teams did the author participate?
– Author position in each dimension of the social network

16 Comparing "good" and "normal" WikiTeams

17 Social capital of editors
Let's look at the number of featured articles co-authored by an editor. Is this number related to the editor's social capital?
Yes. The degree in dimensions 3 (acquaintance) and 1 (trust) is positively and significantly related to the number of co-authored featured articles.
It is good to have many neighbors in dimensions 3 and 1, but these neighbors should not be too connected to each other.
Variable | R square | Linear model coeff.
Degree in dimension 3 | 0.272 | 0.16
Degree in dimension 1 | 0.43 | 1.34
CC in dimension 3 | 0.053 | -2.93
CC in dimension 1 | 0.014 | -1.18
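
Degree and CC can be computed per dimension as in the following Python sketch. It assumes CC stands for the local clustering coefficient and that a dimension is treated as an undirected graph in which any positive strength counts as a link; the slide defines neither the threshold nor the directionality, so both are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def undirected_adjacency(strength, threshold=0.0):
    """Turn one dimension's {(a, b): strength} map into an undirected neighbor map.
    Pairs with strength above `threshold` count as links (assumed rule)."""
    adj = defaultdict(set)
    for (a, b), w in strength.items():
        if a != b and w > threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree(adj, node):
    return len(adj.get(node, ()))


def clustering_coefficient(adj, node):
    """Local clustering coefficient: the fraction of pairs of the node's
    neighbors that are themselves linked."""
    neighbors = adj.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))
```

A high degree combined with a low clustering coefficient matches the pattern on the slide: many neighbors who are not tightly connected to one another.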

18 Study of Polish WikiAdmins
Polish Wikipedia has over 170 administrators
Since 2005, there have been about 300 Requests for Adminship (RfA)
– The RfA procedure was introduced in 2005
The dataset contains all votes

19 Wiki admins: a closing society?
[Figures: percentage of accepted Requests for Adminship; mean time in days from account registration for accepted candidates]
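
The two charted quantities could be derived from the RfA dataset roughly as follows. The `RfA` record and its fields are hypothetical, and the sketch groups requests by the year of the request and measures time up to the request date; both are assumptions about how the charts were built.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RfA:
    """Hypothetical record of one Request for Adminship."""
    candidate: str
    request_date: date
    registered: date  # candidate's account registration date
    accepted: bool


def acceptance_by_year(requests):
    """Percentage of accepted requests, per year of the request."""
    totals, accepted = defaultdict(int), defaultdict(int)
    for r in requests:
        totals[r.request_date.year] += 1
        accepted[r.request_date.year] += int(r.accepted)
    return {y: 100.0 * accepted[y] / totals[y] for y in sorted(totals)}


def mean_days_to_adminship(requests):
    """Mean days from account registration to the request, accepted candidates only."""
    days = defaultdict(list)
    for r in requests:
        if r.accepted:
            days[r.request_date.year].append((r.request_date - r.registered).days)
    return {y: sum(d) / len(d) for y, d in sorted(days.items())}
```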

20 Impact of the n-dSN on RfA votes
Confirmed hypotheses:
– Mean "trust" is higher for accepting votes than for rejecting votes
– Mean "criticism" is higher for rejecting votes than for accepting votes
– Mean "acquaintance" is higher for accepting votes than for rejecting votes

21 RfA votes and the n-dSN: summary
Measure | Votes "for" | Votes "against"
Trust median | 57.3 | 41.1
Trust mean | 442.2 | 287.7
Criticism mean | 2.3 | 4.2
Criticism 3rd quartile | 2.0 | 3.0
Acquaintance median | 492 | 405
Acquaintance mean | 1,035 | 751
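
Per-group summaries like those in the table can be computed with Python's standard `statistics` module, as sketched here. The `(decision, value)` input format is a hypothetical simplification in which `value` is the voter-to-candidate strength in one dimension; note that `statistics.quantiles` uses the "exclusive" method by default, which may not match how the original third quartile was computed.

```python
from statistics import mean, median, quantiles


def summarize(values):
    """Median, mean and third quartile of one group's strengths."""
    _, _, q3 = quantiles(values, n=4)  # default method="exclusive"
    return {"median": median(values), "mean": mean(values), "q3": q3}


def compare_votes(votes):
    """`votes` yields (decision, value) pairs; decision is "for" or "against"."""
    groups = {"for": [], "against": []}
    for decision, value in votes:
        groups[decision].append(value)
    # quantiles() needs at least two data points on older Python versions.
    return {d: summarize(v) for d, v in groups.items() if len(v) >= 2}
```

Reporting medians next to means is useful here because strength distributions are heavily skewed, as the large gap between trust median and trust mean in the table shows.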

22 Interpreting the n-DSN
How do we know that our interpretations of the n-DSN are correct?
– We have indirect evidence: if "trust" increases for good WikiTeams, then "it works" ;]
– If we want more, we have to ask the editors!
– Survey on Polish Wikipedia
Editors responded to "personalized" questions based on their edit history
Editors were asked to recall the nicks of other editors and then to recognize them from a list
Then they were asked about trust, antagonism, interests and knowledge
Responses are anonymized

23 Survey questions
1. Please give the nicks of all Wikipedia editors that you remember.
2. We will show you a list of nicks of editors that you have worked with. Please select the nicks that you recognize.
3. Please select the nicks of editors that you have had contact with (using talk pages, instant messengers, e-mail, in person or in any other way).
4. Please select the nicks of editors whose edits are, in your opinion, of good quality.
5. Please select the nicks of editors with whom you have at any time disagreed or argued.
6. How did you contact other editors? (multiple choice)
7. Please look at the following list of Wikipedia categories. Please select the categories in which you have expert knowledge.
8. Now select the categories that you are interested in.
9. How much time (monthly) do you spend working on Wikipedia?
10. Age, social status, education

24 Preliminary survey results – 1
[Figure: percentage of recognized nicks as a function of edge strength in d-3 (discussion)]
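
A curve like the one in this figure can be produced by binning the d-3 strength of each nick shown to a respondent and computing the share recognized in each bin. The bin width and the `(strength, recognized)` input format are assumptions for illustration only.

```python
from collections import defaultdict


def recognition_by_strength(pairs, bin_width=100.0):
    """`pairs` yields (strength_in_d3, recognized) for each nick shown to a
    respondent. Returns {bin_start: percentage of nicks recognized}."""
    shown, recognized = defaultdict(int), defaultdict(int)
    for strength, was_recognized in pairs:
        b = int(strength // bin_width) * bin_width  # lower edge of the bin
        shown[b] += 1
        recognized[b] += int(bool(was_recognized))
    return {b: 100.0 * recognized[b] / shown[b] for b in sorted(shown)}
```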

25 Questions?

