1 WINLAB Game Theoretic Modeling of Online Knowledge Creation in Wikipedia
Narayan B. Mandayam
Collaborators: S. Anand*, O. Arazy†, and O. Nov*
*NYU Poly †University of Haifa & Alberta School of Business
Acknowledgment: NAKFI

2 WINLAB DNA of Silicon Brains?
• Peer Production
  - Large number of individuals co-create knowledge
  - Examples: citizen science projects, citizen journalism, open source software (Linux kernel development), Wikipedia
• From Individual Informed Brains to a Society-Scale Informed Brain
  - Uncovering the DNA of Social Knowledge Creation
• Inspired by similar efforts in the life sciences, such as the Human Genome Project, we seek to explore
  - basic patterns or “building blocks” of the process through which individual human brains co-create society-scale silicon brains
  - the relationship between sequential patterns of these building blocks and attributes of the resulting silicon brains

3 WINLAB Wikipedia as a Silicon Brain
• Among the most popular information repositories on the web
  - 18B page views (500M unique visitors/month) and counting
  - 6th most popular after Google, Microsoft, Facebook and Yahoo
• The largest “collaborative effort” in human history
  - Content: ~34M articles (~5M in English)
  - Editors: ~23M editors/contributors (130K active in last month, 30K with at least 5 edits)
  - Administrators: ~1400
• Wikipedia is a silicon-based “brain”: a large-scale knowledge repository created collaboratively
  - people contribute their knowledge, expertise and energy
  - a common pool (information good) accessible to everybody

4 WINLAB DNA of Wikipedia
• Wikipedia is built on Wiki technology
  - A web-based collaborative authoring tool
• A contributor can add new content, or add to or delete existing content
  - Similar to a Google doc or multi-user editing with ‘Track Changes’ in MS-Word
• Each “edit” made by a user creates a new version of the wiki page
  - All versions are tracked in the “History” page (adopting version control principles from software development)
[Figure label: WikiDNA]
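The "each edit creates a new version" behavior described on this slide can be pictured as an append-only history. The sketch below is illustrative only: the `Revision` and `WikiPage` names are hypothetical and are not MediaWiki's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Revision:
    """One stored version of a page (hypothetical structure)."""
    editor: str
    text: str


@dataclass
class WikiPage:
    """A page whose history keeps every version ever saved."""
    title: str
    history: List[Revision] = field(default_factory=list)

    def edit(self, editor: str, new_text: str) -> None:
        # Each edit appends a full new version; nothing is overwritten.
        self.history.append(Revision(editor, new_text))

    @property
    def current(self) -> str:
        # The latest revision is what a reader sees; older ones stay in history.
        return self.history[-1].text if self.history else ""


page = WikiPage("Khopesh")
page.edit("alice", "A khopesh is a sword.")
page.edit("bob", "A khopesh is an Egyptian sickle-sword.")
```

Because earlier revisions are never discarded, any pair of consecutive versions can later be compared to attribute changes to the contributor who made them.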

5 WINLAB DNA of Wikipedia

6 WINLAB Characteristics of Collaboration in Wikipedia
• Contributors’ Motivations
  - Reputation enhancement, ego, expressing one’s opinions
  - Contributors’ goal is to increase the content that they “own”
  - “Competitive” in nature, “refactoring” of others’ contributions
• Comprehensive Array of Governance Mechanisms
  - Automated tools
  - Social norms and procedures
  - Quality management, conflict management, status management
  - Act to ensure information credibility and to guard against unintentional biases, intentional biases (“Wiki lobbying”), and vandalism
• Product Quality
  - Information quality, accuracy, completeness, objectivity, representation
  - “5 pillars”

7 WINLAB Modeling Contributor Activity and Governance Mechanisms in Wikipedia
• Noncooperative Game for Modeling Contributor Interactions
• Stackelberg Game for Modeling Governance Mechanism Interactions with Contributor Actions
• Derive insights from model
• Validate model with data from Wikipedia

8 WINLAB Non-cooperative game amongst contributors

9 WINLAB Nash equilibrium of the noncooperative game

10 WINLAB Solution and Implications of Nash equilibrium

11 WINLAB How to measure quantities in model from data?

12 WINLAB How to model governance?

13 WINLAB Sample Data and Quality Assessment
• Representative sample of 89 Wikipedia articles used in (Arazy et al., 2011, 2013)
  - Stratified sampling by topic (e.g., culture, geography)
  - For each article, details of every edit made (and the contributor making it) from the article’s inception to January 2007
  - The average article: # of edits = 91, # of unique contributors = 49
• Set includes measures of information credibility (7-point Likert scale)
  - 5-6 students independently analyze each page and produce detailed reports by comparing to external resources
  - Senior university librarians independently analyze articles (using the students’ reports + other sources) and rate information quality, accuracy, completeness, objectivity, and representation
  - Senior librarians sit together, argue differences, and arrive at a consensus

14 WINLAB Wiki pages for Quality Assessment (1/2)
• 1956 Trans-Canada Air Lines accident; Abreu Camp; Alcohol 120%; Alpha Iota Omicron; Ancient DNA; Anime Vegas; Antonio Inoki vs Renzo Gracie; Arms sales to Iraq; Art Finley; Ashton, Illinois; Australian contribution to the 1991 Gulf War; Battle of Magdhaba; Belle Glade, Florida; Bess of Hardwick; Biphenyl; Blue and white (porcelain); BMW 3 Series; Briccriu; Cameron Bright; Canadian federal election, 1930; Chandrashekarendra Saraswati; Chikkamagaluru district; Commonwealth Scientific and Industrial Research Organisation; Construction; Core mantle boundary; Dhol; Dianthus
• Dragoon Sniper Rifle; Dzhezkazgan; Edouard Pingret; Electronic lock; Eleonore Duplay; Empire Theatres; Fawsley; Ferdinand III, Grand Duke of Tuscany; Fiat 1300/1500; Flying car; Frunzik Mkrtchyan; Gneisenau class battlecruiser; Graz; Great Dismal Swamp; Greenup, Illinois; Hero of the Soviet Union; High pressure area; High-end audio cables; Ier arrondissement; In a Fix; Irving Kanarek; Jacobi identity; Jay Fiedler; JE; Khopesh

15 WINLAB Wiki pages for Quality Assessment (2/2)
• K-pop; Ludwigsfelde; Magnetohydrodynamic drive; Medieval churches of York; Meow Wars; Merrimac Ferry; Mr. Potato Head; Multitrack recording; Myles Brand; Newtonian telescope; Nueces massacre; Operation Osprey; Orange Revolution; Otis, Colorado; Perfect game (bowling); Peter Jackson; Philippe, comte de Paris; Pine Lawn, Missouri; Pledge of Allegiance; Poisson algebra; Politics of Alberta
• R. Lee Ermey; RBC Center; Sandkings (novelette); Self-service password reset; Shoulder strap; Smelly Cat; Spoiler effect; Stephen Goldsmith; Student unionism in Australia; System Requirements Specification; The Bad Plus; The Real World: Austin; Thrombosis; Timesplitters; Tippmann A-5; Treaty of Mutual Cooperation and Security between the United States and Japan; Uncle Vanya; Unidentified submerged object; Urbanization in Africa; WDC 65C02; Well-Tempered Clavier; William Holborne; Winged bean

16 WINLAB Data: Governance, Quality & Unbiasedness
• Average quality is correlated with a smaller difference between the maximum and minimum fractional ownership (bias)
• “5 Pillars” (ensuring no one group takes over ownership of content)
[Figure: Quality vs. difference between maximum and minimum fractional ownership; vertical axis: Average Quality]
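The bias measure named on this slide, the difference between the maximum and minimum fractional ownership, can be sketched as follows. This is a minimal illustration under one assumption: that ownership is counted as the number of characters of the current article attributed to each contributor. The slides do not state how ownership is attributed, and the function names and example counts are hypothetical.

```python
from typing import Dict


def fractional_ownership(owned_chars: Dict[str, int]) -> Dict[str, float]:
    """Convert per-contributor character counts into shares that sum to 1."""
    total = sum(owned_chars.values())
    if total == 0:
        return {name: 0.0 for name in owned_chars}
    return {name: count / total for name, count in owned_chars.items()}


def ownership_spread(owned_chars: Dict[str, int]) -> float:
    """Difference between the largest and smallest fractional ownership."""
    shares = fractional_ownership(owned_chars)
    if not shares:
        return 0.0
    return max(shares.values()) - min(shares.values())


# Hypothetical article: one dominant contributor and two smaller ones.
example = {"alice": 600, "bob": 300, "carol": 100}
spread = ownership_spread(example)  # shares are 0.6, 0.3, 0.1

# A perfectly balanced article has zero spread.
balanced = ownership_spread({"alice": 100, "bob": 100, "carol": 100})
```

A spread near 0 means content is evenly shared, while a spread near 1 means one contributor owns almost everything.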

17 WINLAB A Stackelberg Model for Wikipedia
[Diagram blocks: Governance Model; Contributor Model]

18 WINLAB Wikipedia Articles for Empirical Data Validation
• 1000 pages from January 2012 dump of English Wikipedia
• 219,811 distinct contributors
• Lifespan: 129-4078 days, avg. duration: 2681 days (~7.35 years)
• 25 Topical Categories: Agriculture, Arts, Business, Chronology, Concepts, Culture, Education, Environment, Geography, Health, History, Humanities, Humans, Language, Law, Life, Mathematics, Medicine, Nature, People, Politics, Science, Society, Sports, Technology
• 4 Maturity Strata (Number of Revisions): 1-10, , ,
• 40 articles in each topical category, 250 articles in each maturity stratum
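Drawing a fixed number of articles from each topical category is a form of stratified sampling; 25 categories with 40 articles each gives the 1000 pages on this slide. The sketch below shows one-dimensional stratification only. The slide does not say how category and maturity strata were combined, and the function name, seed, and placeholder article titles are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_sample(
    articles: Sequence[Tuple[str, str]], per_stratum: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Pick `per_stratum` titles uniformly at random from each stratum.

    `articles` holds (title, stratum) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_stratum: Dict[str, List[str]] = defaultdict(list)
    for title, stratum in articles:
        by_stratum[stratum].append(title)

    sample: Dict[str, List[str]] = {}
    for stratum, titles in by_stratum.items():
        if len(titles) < per_stratum:
            raise ValueError(f"stratum {stratum!r} has only {len(titles)} articles")
        sample[stratum] = rng.sample(titles, per_stratum)  # without replacement
    return sample


# Toy population: 25 placeholder categories with 100 placeholder titles each.
population = [
    (f"cat{c:02d}-article{i}", f"cat{c:02d}") for c in range(25) for i in range(100)
]
sample = stratified_sample(population, per_stratum=40)
total = sum(len(titles) for titles in sample.values())  # 25 * 40 articles
```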

19 WINLAB Validating Analysis against Data-1
[Figure panels: Abreu Camp; Paris]

20 WINLAB Validating Analysis against Data-2

21 WINLAB Validating Model: Estimation Error & Significance
Error 11-15%, Significance Test p = 0.03

22 WINLAB Conclusions and Future Directions-1
• Mathematical Models for Collaborative Knowledge Creation
  - Wikipedia as an example of a silicon brain
• Developed a Game Theoretic Model for Knowledge Creation in Wikipedia
  - Noncooperative game for contributor interactions
    - Users noncooperatively maximize their content ownership
    - Effort measured as function of Levenshtein distance of edits
  - Stackelberg model for influence of governance mechanisms on editors
    - Governance factor is the implicit outcome of emphasis on “5 pillars” and “product quality”
    - Reducing difference between maximum and minimum fractional ownership subject to objectivity/quality constraints
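Since effort is measured as a function of the Levenshtein distance of edits, a short reference implementation of that distance may help. The dynamic-programming routine below is the standard algorithm; using the raw distance between consecutive versions as the effort value is an assumption for illustration, because the slides do not give the exact function applied to it.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string `a` into string `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def edit_effort(old_version: str, new_version: str) -> int:
    """Assumed effort proxy: raw edit distance between consecutive versions."""
    return levenshtein(old_version, new_version)


effort = edit_effort("A khopesh is a sword.", "A khopesh is an Egyptian sword.")
```

Here the edit inserts the ten characters "n Egyptian" into the sentence, so the distance counts ten insertions.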

23 WINLAB Conclusions and Future Directions-2
• Nash Equilibrium Implications
  - Only users who expend less than “average effort” have non-zero content ownership
  - Seems counterintuitive but a consequence of governance
• Unintended consequences of governance: Wiki-bureaucracy, difficult-to-navigate rules, driving away contributors
  - Model can offer guidelines on how to moderate governance
• Dynamic models that track sequential actions and interactions
  - Dynamic games
  - Evolutionary game theory
• Model improvements and validation with larger data sets
  - Improved modeling/analysis of governance; other user metrics
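The slides do not reproduce the utility function behind the Nash equilibrium result, so the sketch below is not the authors' model. It uses a textbook stand-in, a Tullock-style contest, purely to illustrate how a threshold of this kind can arise: each contributor i chooses effort x_i, receives ownership share x_i / sum(x), and pays a hypothetical per-unit effort cost c_i. In that stand-in game, only contributors whose cost lies below a threshold tied to the average cost of the active contributors hold a non-zero share.

```python
from typing import List


def contest_equilibrium_shares(costs: List[float]) -> List[float]:
    """Equilibrium ownership shares in an assumed Tullock-style contest.

    Player i picks effort x_i >= 0 and receives x_i / sum(x) - costs[i] * x_i.
    """
    if len(costs) < 2 or any(c <= 0 for c in costs):
        raise ValueError("need at least two players with positive costs")

    order = sorted(range(len(costs)), key=lambda i: costs[i])

    # Grow the active set from the cheapest player upward.
    active = 2
    cost_sum = costs[order[0]] + costs[order[1]]
    while active < len(costs):
        candidate = costs[order[active]]
        # With `active` players, total effort is (active - 1) / cost_sum.
        # One more player enters only if its cost is below 1 / (total effort).
        if candidate * (active - 1) >= cost_sum:
            break
        cost_sum += candidate
        active += 1

    shares = [0.0] * len(costs)
    for rank in range(active):
        i = order[rank]
        # First-order condition gives share_i = 1 - c_i * (total effort).
        shares[i] = 1.0 - costs[i] * (active - 1) / cost_sum
    return shares


# The third contributor's cost is far above the threshold, so it owns nothing.
shares_with_dropout = contest_equilibrium_shares([1.0, 1.0, 10.0])
# Here all three costs are below the threshold, so everyone holds a share.
shares_all_active = contest_equilibrium_shares([1.0, 2.0, 2.5])
```

In this stand-in, the entry threshold equals the sum of active costs divided by one less than the number of active players, which is slightly above their average cost.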

