Amplifying Community Content Creation with Mixed-Initiative Information Extraction Raphael Hoffmann, Saleema Amershi, Kayur Patel, Fei Wu, James Fogarty, Daniel S. Weld

“What Russian-born writers publish in the U.S.?”

Advanced Interfaces Leverage Structure of Content (Huynh et al., UIST’06; Hoffmann et al., UIST’07; Toomim et al., CHI’09; Dontcheva et al., UIST’06, UIST’07)

How can we obtain the necessary structure at Web scale? Two approaches: Community Content Creation and Information Extraction.

Community Content Creation

Requires critical mass and incentives.

Information Extraction

Training data is expensive; extraction is error-prone.

Our Goal: Synergistic Pairing

More user contributions

More precise extractors

What this work is about: a synergistic method for amplifying Community Content Creation and Information Extraction, and the use of search advertising for evaluation.

Outline: Motivation; Case Study: Intelligence in Wikipedia; Designing for the Wikipedia Community; Search Advertising Deployment Study; Conclusion

Case Study: Intelligence in Wikipedia. Example search query: “What Russian-born writers publish in the U.S.?”

Some Structured Content in Wikipedia

Lack of Structured Content in Wikipedia

Previous Work: Learning from Existing Infoboxes [Wu et al., CIKM’07]. Example sentence: “Ben is living in Paris.” Extractor precision: ~60–90%.
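
The slide names the idea of learning from existing infoboxes but does not show how. One simple way to do it is to turn an infobox into training data by labeling each article sentence that contains an infobox value as an example for that attribute. The sketch below assumes exact, case-insensitive string matching and at most one attribute per sentence. `split_sentences`, `label_sentences`, and the toy article are hypothetical. The heuristics in [Wu et al., CIKM’07] are more involved.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentences(article_text, infobox):
    """Return (sentence, attribute) pairs. The attribute is None when a sentence
    contains no infobox value (a candidate negative example)."""
    examples = []
    for sentence in split_sentences(article_text):
        lowered = sentence.lower()
        matched = None
        for attribute, value in infobox.items():
            if value.lower() in lowered:
                matched = attribute
                break  # assumption: at most one attribute per sentence
        examples.append((sentence, matched))
    return examples


article = "Ben was born in 1970. Ben is living in Paris. He likes tea."
infobox = {"residence": "Paris", "birth_year": "1970"}
for sentence, attribute in label_sentences(article, infobox):
    print(attribute, "|", sentence)
```

Labeled sentences like these could then train a per-attribute sentence classifier and value extractor. Exact matching misses paraphrases and can mislabel coincidental matches, which is one reason such extractors stay in the 60–90% precision range.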

Community-based Validation of Extractions: “We think Ayn Rand’s birthplace is Saint Petersburg. Is this correct?”

Method. Design: interviews with Wikipedians; design of 3 interfaces; talk-aloud studies with 9 participants. Evaluation: search advertising study with 2,473 visitors.

Incentivizing Contribution. Audience: target experienced Wikipedians (contributions follow a power law), or target newcomers. Motivation: coercion (unacceptable to Wikipedia), or using information extraction to make the ability to contribute visible and easy.

Contribution as a Non-Primary Task. We want to solicit contributions from people pursuing some other task (the information need that brought them to this article). By using information extraction to ease contribution, we explore a tradeoff between intrusiveness and contribution rate (Popup, Highlight, and Icon designs).

Designed Three Interfaces: Popup (immediate interruption strategy), Highlight (negotiated interruption strategy), and Icon (negotiated interruption strategy).

Popup Interface

Highlight Interface (screenshots showing the hover interaction)

Icon Interface (screenshots showing the hover interaction)

How do you evaluate this? Contribution is a non-primary task: can a lab study show whether the interfaces increase spontaneous contributions?

Search Advertising Study: deployed the interfaces on a Wikipedia proxy; 2,000 articles; one ad per article (e.g., for “ray bradbury”).

Search Advertising Study: select the interface round-robin; track session ID, time, and all interactions; a questionnaire pops up 60 sec after the page loads. (Diagram: the proxy serves the baseline, popup, highlight, and icon interfaces and writes logs.)
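
The deployment bookkeeping on this slide can be sketched in a few lines. The sketch assigns each new session the next interface in round-robin order, keeps that interface for returning sessions, logs every interaction with session ID, time, and condition, and checks whether the 60-second questionnaire delay has elapsed. `StudyProxy` and its methods are hypothetical, and the order of conditions in the cycle is an assumption. This illustrates the described procedure and is not the study's proxy code.

```python
import itertools
import time
from dataclasses import dataclass, field

# Conditions named on the slide; their order in the cycle is an assumption.
CONDITIONS = ("baseline", "popup", "highlight", "icon")
QUESTIONNAIRE_DELAY_SEC = 60  # slide: questionnaire pops up 60 s after page load


@dataclass
class StudyProxy:
    """Hypothetical proxy-side bookkeeping for the deployment study."""
    assignments: dict = field(default_factory=dict)  # session ID -> condition
    log: list = field(default_factory=list)          # (session ID, time, condition, event)
    _next_condition: itertools.cycle = field(
        default_factory=lambda: itertools.cycle(CONDITIONS))

    def condition_for(self, session_id: str) -> str:
        """Round-robin assignment; a returning session keeps its interface."""
        if session_id not in self.assignments:
            self.assignments[session_id] = next(self._next_condition)
        return self.assignments[session_id]

    def record(self, session_id: str, event: str, timestamp=None) -> None:
        """Log one interaction with its session ID, time, and assigned condition."""
        ts = time.time() if timestamp is None else timestamp
        self.log.append((session_id, ts, self.condition_for(session_id), event))

    @staticmethod
    def questionnaire_due(page_loaded_at: float, now: float) -> bool:
        """True once the questionnaire delay has passed since page load."""
        return now - page_loaded_at >= QUESTIONNAIRE_DELAY_SEC


proxy = StudyProxy()
print([proxy.condition_for(s) for s in ["a", "b", "c", "d", "e", "a"]])
# ['baseline', 'popup', 'highlight', 'icon', 'baseline', 'baseline']
```

Keying the assignment on the session, rather than on each page view, means a visitor who reloads the article sees a consistent interface. This matters when contribution rates are compared per visitor.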

Baseline Interface

Search Advertising Study: used Yahoo and Google; 2,473 visitors; deployment for ~7 days; ~1M impressions; estimated cost: $1,500 (generous support from Yahoo).
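
The slide's figures give a rough sense of what this style of evaluation costs. The arithmetic below treats "~1M impressions" as exactly 1,000,000 and assumes each visitor arrived through one ad click, so both ratios are approximate.

```python
# Figures from the slide; "~1M impressions" is approximate, so these are rough estimates.
cost_usd = 1500
visitors = 2473
impressions = 1_000_000

cost_per_visitor = cost_usd / visitors
visit_rate = visitors / impressions  # visitors per ad impression (assumes one click per visitor)

print(f"cost per visitor: ${cost_per_visitor:.2f}")  # cost per visitor: $0.61
print(f"visitors per impression: {visit_rate:.2%}")  # visitors per impression: 0.25%
```

At roughly 60 cents per visitor, the study recruited real information seekers in a natural setting, which is hard to reproduce in a lab.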

An Early Observation. Prompt: “We think Ray Bradbury’s nationality is American. Is this correct?” Visitor responses: “Please check with the Britannica!” and “If I knew would I really need to look”. Revised prompt: “We think the summary should say Ray Bradbury’s nationality is American. Is this what the article says?”
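
The two prompts differ in what they ask the visitor to verify. The first asks whether a fact is true, which a visitor who came to look it up may not know. The revised prompt asks only whether the article on the page supports the value. Below is a minimal sketch of generating both from an extracted (subject, attribute, value) triple. The function names are hypothetical, and the templates simply mirror the wording on the slide.

```python
def original_prompt(subject, attribute, value):
    # First wording: asks the visitor to vouch for the fact itself.
    return f"We think {subject}'s {attribute} is {value}. Is this correct?"


def revised_prompt(subject, attribute, value):
    # Revised wording: asks only whether the article supports the value.
    return (f"We think the summary should say {subject}'s {attribute} is {value}. "
            "Is this what the article says?")


print(revised_prompt("Ray Bradbury", "nationality", "American"))
```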

Results by interface: Baseline / Icon / Highlight / Popup
Contribution likelihood: 0% / 3.0% / 7.5% / 7.8%
Survey responses, “Saw I could help improve”: 11/33 (33%) / 30/73 (41%) / 23/58 (40%) / 24/52 (46%)
Also measured: visitors, distinct contributors, number of contributions, contributions per visit, and intrusiveness (1 = not intrusive, 5 = very intrusive).

More user contributions

More precise extractors

Users are conservative: of the extractions that visitors marked as correct, 90.4% were indeed valid; of the extractions that visitors marked as incorrect, 57.9% were indeed incorrect.
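
"Conservative" follows from the two percentages. When visitors confirm an extraction they are almost always right. When they reject one, 42.1% (100% − 57.9%) of those rejections were actually valid, so visitors tend to say "no" when unsure. The sketch below computes both figures from (visitor label, gold label) pairs. `judgment_precision` is a hypothetical helper, and the toy judgments are invented for illustration; they are not the study's data.

```python
from collections import Counter


def judgment_precision(pairs):
    """pairs: (visitor_says_correct, actually_correct) booleans per judged extraction.
    For each visitor answer, return the fraction of judgments the gold label confirms."""
    totals, agreements = Counter(), Counter()
    for visitor_label, gold_label in pairs:
        totals[visitor_label] += 1
        if visitor_label == gold_label:
            agreements[visitor_label] += 1
    return {label: agreements[label] / totals[label] for label in totals}


# Hypothetical toy judgments, not the study's data.
pairs = [(True, True)] * 9 + [(True, False)] + [(False, False)] * 2 + [(False, True)] * 2
print(judgment_precision(pairs))  # {True: 0.9, False: 0.5}
```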

Area under Precision/Recall curve with only existing infoboxes (chart: area under the P/R curve for birth_date, birth_place, death_date, nationality, and occupation, using 5 existing infoboxes per attribute)

Area under Precision/Recall curve after adding user contributions (chart: area under the P/R curve for birth_date, birth_place, death_date, nationality, and occupation, using 5 existing infoboxes per attribute)
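
The two charts compare one number per attribute: the area under the precision/recall curve of the extractor. Below is a minimal sketch of computing such an area from confidence-ranked extractions with gold labels. It assumes recall is measured against the correct extractions in the list, and it uses a step-wise sum (each gain in recall weighted by the precision at which it was reached, the average-precision convention). The paper's exact interpolation is not stated on the slides, and `pr_points` and `area_under_pr` are hypothetical names.

```python
def pr_points(scored):
    """scored: (confidence, is_correct) pairs for one attribute's extractions.
    Rank by confidence (highest first) and return a (recall, precision) point
    after each extraction."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_correct = sum(1 for _, ok in ranked if ok)
    if total_correct == 0:
        return []
    points, true_pos = [], 0
    for k, (_, ok) in enumerate(ranked, start=1):
        true_pos += int(ok)
        points.append((true_pos / total_correct, true_pos / k))
    return points


def area_under_pr(points):
    """Step-wise area: each increase in recall is weighted by the precision
    at the point where it was reached."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


extractions = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
print(round(area_under_pr(pr_points(extractions)), 3))  # 0.833
```

Under this metric, user contributions help in two ways. Confirmed extractions add positive training examples, and rejected ones add negatives. Either way the extractor can rank correct values higher, which raises the area.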

Improvements and Number of Existing Infoboxes. Improvements are larger when there are few existing infoboxes: significant improvements for 5, 10, 25, 50, and 100 existing infoboxes. Most infobox classes have few instances: 72% of classes have 100 or fewer instances, and 40% of classes have 10 or fewer.

Synergy

Going Beyond Wikipedia. Research on contribution to communities shows parallels between Wikipedia and other communities. Wikipedians may not be typical, but our contributions were solicited from people using search to complete their everyday tasks. Goal: hooks into platforms such as MediaWiki.

Conclusions. A synergistic method for amplifying Community Content Creation and Information Extraction: significantly increased likelihood of contribution, and significantly improved quality of extraction. Demonstrated the use of search advertising to evaluate interfaces for contribution as a non-primary task.

Raphael Hoffmann, Saleema Amershi, Kayur Patel, Fei Wu, James Fogarty, Daniel S. Weld, University of Washington. This work was supported by Office of Naval Research grant N , CALO grant , NSF grant IIS , the WRF / TJ Cable Professorship, a UW CSE Microsoft Endowed Fellowship, an NDSEG Fellowship, a Web-advertising donation by Yahoo, and an equipment donation from Intel’s Higher Education Program. Thank you!

Related Work
Snow, O’Connor, Jurafsky, Ng. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. EMNLP’08.
DeRose, Chai, Gao, Shen, Doan, Bohannon, Zhu. Building Community Wikipedias: A Human-Machine Approach. ICDE’08.
von Ahn, Dabbish. Labeling Images with a Computer Game. CHI’04.
Mankoff, Hudson, Abowd. Interaction Techniques for Ambiguity Resolution in Recognition-Based Interfaces. UIST’00.
Culotta, Kristjansson, McCallum, Viola. Corrective Feedback and Persistent Learning for Information Extraction. Artificial Intelligence 170(14).
Cosley, Frankowski, Terveen, Riedl. SuggestBot: Using Intelligent Task Routing to Help People Find Work in Wikipedia. IUI’07.