Building a Persistent Workforce on Mechanical Turk for Multilingual Data Collection David L. Chen The University of Texas at Austin William B. Dolan Microsoft.


Building a Persistent Workforce on Mechanical Turk for Multilingual Data Collection
David L. Chen, The University of Texas at Austin
William B. Dolan, Microsoft Research
The 3rd Human Computation Workshop (HCOMP), August 8, 2011

Introduction
Collect translation and paraphrase data
Statistical machine translation systems require large amounts of parallel corpora
Professional translators are expensive
– E.g. $0.36/word to create a Tamil-English corpus [Germann, ACL 2001 Workshop]
No similar resource for paraphrases

Variety of Natural Language Processing tasks done on Mechanical Turk
Collecting paraphrase data
– Buzek et al., NAACL 2010 AMT workshop
– Denkowski et al., NAACL 2010 AMT workshop
Collecting translation data
– Ambati and Vogel, NAACL 2010 AMT workshop
– Omar Zaidan and Chris Callison-Burch, ACL 2011
Evaluating machine translation quality
– Callison-Burch, EMNLP 2009
– Denkowski and Lavie, NAACL 2010 AMT workshop

Issues of using Mechanical Turk
Catching people using machine translations
– Collect and check against online translations
– Use an image instead of text
Improving poor translations
– Ask other workers to edit the translations
– Ask other workers to rank the translations
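
The "collect and check against online translations" idea can be illustrated with a small string-similarity check. This is only a minimal sketch: it assumes the machine-translation outputs have already been gathered by some other means, and the function names and the 0.9 threshold are hypothetical. The slides do not say how the comparison was actually implemented.

```python
import difflib
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def looks_machine_translated(worker_text, mt_outputs, threshold=0.9):
    """Return True if the submission is nearly identical to any MT output.

    `mt_outputs` is a list of strings collected beforehand (assumption).
    `threshold` is an illustrative cutoff on difflib's similarity ratio.
    """
    worker = normalize(worker_text)
    return any(
        difflib.SequenceMatcher(None, worker, normalize(mt)).ratio() >= threshold
        for mt in mt_outputs
    )
```

A character-level ratio is a crude signal. A real check would probably need per-language tuning and some manual review of flagged submissions.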

Issues of using Mechanical Turk
Attracting workers
– Higher pay doesn’t always mean higher quality
– More likely to cheat if pay is high
Making tasks simple and quick enough
– Difficult to collect data where input is required rather than selecting buttons
– More difficult to translate whole sentences than short phrases

Use video to collect language data
Collect descriptions of videos (Chen and Dolan, ACL 2011)
Parallel descriptions in different languages → translation data
Parallel descriptions in the same language → paraphrase data
Also useful as video annotation data
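
To make the pairing idea concrete, the sketch below groups descriptions by video and splits every pair of descriptions into same-language (paraphrase) and cross-language (translation) candidates. The `(video_id, language, sentence)` record layout and the function name are assumptions made for illustration. Descriptions of the same clip are only loosely parallel, so the resulting pairs should be treated as candidates rather than exact translations.

```python
from itertools import combinations


def build_pairs(descriptions):
    """Split descriptions of the same video into candidate pairs.

    `descriptions` is a list of (video_id, language, sentence) tuples;
    this record layout is hypothetical.
    """
    by_video = {}
    for video_id, language, sentence in descriptions:
        by_video.setdefault(video_id, []).append((language, sentence))

    paraphrases, translations = [], []
    for items in by_video.values():
        # Every unordered pair of descriptions of one video is a candidate.
        for (lang_a, sent_a), (lang_b, sent_b) in combinations(items, 2):
            if lang_a == lang_b:
                paraphrases.append((sent_a, sent_b))
            else:
                translations.append((sent_a, sent_b))
    return paraphrases, translations
```

With n descriptions of a video this yields n(n-1)/2 pairs, which is why many short descriptions of the same clip produce a large amount of pair data.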

Video description task
Show a short YouTube video to the user
Ask them to write a single-sentence description in any language
– Removes incentive to cheat
Only require monolingual speakers
– Similar to work by Hu et al. [CHI 2011]

Annotation Task
Describe video in a single sentence

Example Descriptions
Someone is coating a pork chop in a glass bowl of flour.
A person breads a pork chop.
Someone is breading a piece of meat with a white powdery substance.
A chef seasons a slice of meat.
Someone is putting flour on a piece of meat.
A woman is adding flour to meat.
A woman is coating a piece of pork with breadcrumbs.
A man dredges meat in bread crumbs.
A person breads a piece of meat.
A woman is breading some meat.
A woman coats a meat cutlet in a dish.

Example Descriptions
Ženska panira zrezek. (Slovene)
правење шницла (Macedonian)
Jemand wendet ein Stück Fleisch in Paniermehl (German)
Une femme trempe une cotelette de porc dans de la chapelure. (French)
cineva da o felie de carne prin pesmet (Romanian)
गोश में कुछ ओत्ता मिला रहा हे (Hindi)
ქალი ხორცს ავლებს რაღაცაში (Georgian)
Dame haalt lap vlees door bloem. (Dutch)
Šnicla mesa se valja u smesu začina (Serbian)
Unas manos embadurnan un trozo de carne en un recipiente (Spanish)

Quality Control
Tier 1: $0.01 per description
Tier 2: $0.05 per description
Initially everyone only has access to Tier-1 tasks
Good workers are promoted to Tier 2 based on number of descriptions, English fluency, and quality of descriptions
The two tiers have identical tasks but different pay rates
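
The two-tier scheme can be sketched as a simple promotion rule plus a pay table. Only the pay rates ($0.01 and $0.05 per description) and the three promotion criteria come from the slides. The `WorkerRecord` fields, the 0-to-1 quality score, the thresholds, and the way the criteria are combined are all hypothetical, since the slides do not describe how promotion decisions were actually made.

```python
from dataclasses import dataclass

# Pay per description in cents, as stated on the slides.
PAY_CENTS = {1: 1, 2: 5}


@dataclass
class WorkerRecord:
    num_descriptions: int   # descriptions submitted so far
    fluent_english: bool    # outcome of some fluency judgment (assumption)
    mean_quality: float     # hypothetical 0..1 quality score
    tier: int = 1           # everyone starts in Tier 1


def maybe_promote(worker, min_descriptions=50, min_quality=0.8):
    """Promote a Tier-1 worker who meets all three illustrative criteria."""
    if (
        worker.tier == 1
        and worker.num_descriptions >= min_descriptions
        and worker.fluent_english
        and worker.mean_quality >= min_quality
    ):
        worker.tier = 2
    return worker.tier


def earnings_cents(tier, num_descriptions):
    """Total pay in cents for a number of descriptions at a given tier."""
    return PAY_CENTS[tier] * num_descriptions
```

Amounts are kept in integer cents to avoid floating-point rounding. At the stated rates, 100 descriptions earn $1.00 in Tier 1 and $5.00 in Tier 2.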

Video collection task
Ask workers to submit video clips from YouTube
– Single, unambiguous action/event
– Short (4-10 seconds)
– Generally accessible
– No dialogues
– No words (subtitles, overlaid text, titles)
Also uses multi-tiered payment system

Daily number of descriptions collected

Distribution of languages collected

Language     Count            Language     Count
English      85550 (33855)    Spanish      1883
Hindi        6245             Gujarati     1437
Romanian     3998             Russian      1243
Slovene      3584             French       1226
Serbian      3420             Italian      953
Tamil        2789             Georgian     907
Dutch        2735             Polish       544
German       2326             Chinese      494
Macedonian   1915             Malayalam    394

Other: Tagalog, Portuguese, Norwegian, Filipino, Estonian, Turkish, Arabic, Urdu, Hungarian, Indonesian, Malay, Bulgarian, Danish, Bosnian, Marathi, Swedish, Albanian

Statistics of data collected
Total money spent: $5000
Total number of workers: 835

Number of descriptions submitted Sorted histogram of the top 255 workers

Persistent workforce
Returning workers who continually work on our task
Several workers annotated all available videos, in both the Tier-1 and Tier-2 tasks
Easier to model and evaluate workers when they submit many annotations
Once identified, good workers require little or no supervision

Sample worker responses
“The speed of approval gave me confidence that you would pay me for future work.”
“Posters on Turker Nation recommended it as both high-paying and interesting. Both ended up being true.”
“I consider this task pretty enjoyable. some videos are funny, others interesting.”
“Fast, easy, really fun to do it.”

Lessons learned
Design the tasks well
– Short, simple, easy, accessible, and fun
Learn from worker responses
Compensate the workers fairly
Communicate
– Quick task approval
– Explain rejections
– Respond to comments and emails

Conclusion
Introduced a novel way to collect translation and paraphrase data
Conducted a successful pilot data collection on Mechanical Turk
– Tiered payment system
– High-quality persistent workforce
All data (including most of the videos) available for download at