Microsoft Translator William Lewis Kites Symposium October 31, 2013 - Helsinki, Finland.

1 Microsoft Translator William Lewis wilewis@microsoft.com Kites Symposium October 31, 2013 - Helsinki, Finland

2 Overview
- Introduction to Microsoft Translator, Tools, Products, etc.
- Extent of Localization - Methods of Applying MT
- Collaborative MT
- Assessing Quality
- Application in Knowledge Base
- Building your own MT
- Collaboration with Language Communities

3 Why MT? The purpose
The Crude
- Extent of localization
- Data Mining & Business Intelligence
- Globalized NLP
- Triage for human translation
Research
- Machine Learning
- Statistical Linguistics
- Same-language translation
The Good
- Breaking down language barriers
- Text, Speech, Images & Video
- Language Preservation
NOT:
- Spend less money
- Take the job of human translators
- Perform miracles

4 Microsoft Translator – Quick Facts
- Linguistically informed statistical MT system
- 41 languages – from any language to any other language
- Runs in Microsoft Datacenter
- Simple web service API: SOAP, REST, AJAX, OData, web site widget
- 2 million characters/month free
- Available in the Enterprise Agreement, as a monthly subscription
- For extreme confidentiality situations, available on-premise
- Highly customizable:
  - Collaborative Translations: involve community, coworkers and customers
  - Hub: custom engine training via an easy-to-use UI
- Web Scale
  - Powers translations in Bing, Microsoft Office, Microsoft SharePoint, Internet Explorer, Yammer
  - Powers translations in Facebook, Twitter, eBay, and many other government and enterprise sites

5 Microsoft Translator at a Glance
- World-class Statistical Machine Translation: Built on over a decade of work at Microsoft Research
- Big Data Powered: Trained with billions of “parallel” sentences (Bing index & licensed)
- General Purpose System: Powers Bing Translator, supports 40+ languages, any-to-any
- Powerful Cloud API: Rich, secure API enabling integrations, 99.9% availability

6 Enabling Translation in Many Products
Fully integrated across the stack, Translator extends the value of the Microsoft platform and of the solutions you build on it for your customers. This includes consumer-facing applications such as Bing Translator, Bing Toolbar, Bing Dictionary, and the Windows Phone app.
A few of our customers and partners... +80,000 more.

7 Powerful Tools and Customization
Our machine learning & big-data based translation technology brings the power of instant translations to break down language barriers for users, developers, webmasters, translators and businesses. Robust, industry leading tools such as the HUB and CTF allow for unprecedented customization of the translation experience.
Powerful API
- Instant translation and language services in web, desktop and mobile applications.
- Highly scalable and robust cloud-based machine-translation service from Microsoft.
- Supports SOAP, REST, AJAX, OData, and the Translator web site translation widget.
- Extensibility for development on SharePoint, Office, Windows Phone, and more.
Widget
- Instant translations of web pages without the need to write any code.
- Use the AJAX API to roll your own widget.
- Use the integrated “Collaborative Translations” (CTF) functionality to tap into your community.
Hub
- Custom translation portal to build, train, and deploy customized automatic language translation systems.
- Combine your data with Bing big data to tune the translation output to best fit your content.
- Free with any level of Translator subscription (including the free tier).
CTF
- Override, modify or vote for the translated output to best fit the content.
- Provide the end-user alternative translations.
- Import the edits back into Hub for further training.

8 Integrates with your TM tool
Top translation tools support Microsoft Translator

9 Give these a try! (Demo)

10 Price
Competitively priced
- Monthly subscription
- Free for up to 2 million characters per month
- Base price: $10 per million characters
- Discounted for higher volumes
- Paid by credit card or via Microsoft Enterprise agreement
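
The pricing figures on this slide can be turned into a rough monthly estimate. The sketch below is illustrative only: the function and constant names are made up for this example, and it assumes that any volume above the free tier is billed at the flat base price for all characters. The slide mentions volume discounts but gives no schedule, so they are ignored here and the result should be read as an upper bound.

```python
# Figures taken from the slide; names are hypothetical.
FREE_TIER_CHARS = 2_000_000        # free for up to 2 million characters per month
BASE_PRICE_PER_MILLION = 10.0      # base price: $10 per million characters


def estimate_monthly_cost(chars_per_month: int) -> float:
    """Return an upper-bound monthly cost in USD.

    Assumption (not stated on the slide): above the free tier, every
    character is billed at the base price and no discount is applied.
    """
    if chars_per_month < 0:
        raise ValueError("chars_per_month must be non-negative")
    if chars_per_month <= FREE_TIER_CHARS:
        return 0.0
    return chars_per_month / 1_000_000 * BASE_PRICE_PER_MILLION


if __name__ == "__main__":
    for volume in (1_500_000, 2_000_000, 50_000_000):
        print(volume, estimate_monthly_cost(volume))
```

Actual invoices would depend on the discount tiers and on the agreement type, neither of which is specified in the deck.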

11 Extent of localization: Methods of applying MT

12 Extent of localization: Methods of applying MT

13 The Triangle
You can have only two. Not anymore!
[Diagram: triangle with corners Price, Speed, Quality, and P3 in the middle]
P3: Post-Publishing Post-Edit

14 The cost/quality curve
Optimize for the knee
[Figure: user satisfaction plotted against cost ($), with a threshold labeled "Good enough for the intended purpose". Content types range from low pageview supporting content to highly visible marketing content.]
- No cost: No translation
- Low cost: MT + TM + Community
- High cost: Fully qualified HT
- Very high cost: Expert reviewed translation/transcreation

15

16

17

18
- Always there
- Always current
- Always retaining human translations
- Always ready to take feedback and corrections
----------
Midori Tatsumi, Takako Aikawa, Kentaro Yamamoto, and Hitoshi Isahara. Proceedings of Association for Machine Translation in the Americas (AMTA), November 2012

19 Collaboration: MT + Your community
What makes this possible – fully integrated 100% matching TM
Collaborative TM entries:
- Rating 1 to 4: Unapproved
- Rating 5 to 10: Approved
- Rating -10 to -1: Rejected
1 to many is possible
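
The rating bands listed on this slide map directly onto a small lookup. The following is a minimal sketch, not the service's own logic: the function name and status strings are invented for illustration, and because the slide does not say what a rating of 0 means, the sketch treats 0 (and anything outside -10..10) as an error.

```python
def classify_rating(rating: int) -> str:
    """Map a collaborative TM rating to its status band.

    Bands follow the slide: 5..10 approved, 1..4 unapproved,
    -10..-1 rejected. Rating 0 is not defined on the slide, so it is
    rejected as invalid here (an assumption of this sketch).
    """
    if 5 <= rating <= 10:
        return "approved"
    if 1 <= rating <= 4:
        return "unapproved"
    if -10 <= rating <= -1:
        return "rejected"
    raise ValueError(f"rating {rating} is outside the bands given on the slide")


if __name__ == "__main__":
    for r in (7, 3, -2):
        print(r, classify_rating(r))
```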

20 Making it easier for the approver – Pending edits highlight

21 Making it easier for the approver – Managing authorized users

22 Making it easier for the approver – Bulk approvals

23 What is Important?
In this order:
1. Quality
2. Access
3. Coverage

24 Measuring Quality: Human Evaluations
Knowledge powered by people
- Absolute
- 3 to 5 independent human evaluators are asked to rate translation quality for 200 sentences on a scale of 1 to 4
  - Comparing to human translated sentence
  - No source language knowledge required
Also: Relative evals, against a competitor, or a previous version of ourselves
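
One simple way to summarize an absolute evaluation like this is to average the 1-to-4 ratings per sentence and then across sentences. The slide does not say how the scores are aggregated, so the sketch below is an assumption for illustration; the function names and the sample ratings are made up.

```python
from statistics import mean


def sentence_scores(ratings_by_evaluator):
    """Average the ratings each sentence received across evaluators.

    ratings_by_evaluator: one list per evaluator, each holding one
    rating (1-4) per sentence, in the same sentence order.
    """
    if not ratings_by_evaluator:
        raise ValueError("need at least one evaluator")
    n_sentences = len(ratings_by_evaluator[0])
    for ratings in ratings_by_evaluator:
        if len(ratings) != n_sentences:
            raise ValueError("every evaluator must rate every sentence")
        if any(r < 1 or r > 4 for r in ratings):
            raise ValueError("ratings must be on the 1-4 scale")
    return [
        mean([ratings[i] for ratings in ratings_by_evaluator])
        for i in range(n_sentences)
    ]


def system_score(ratings_by_evaluator):
    """Overall score: mean of the per-sentence averages."""
    return mean(sentence_scores(ratings_by_evaluator))


if __name__ == "__main__":
    # Three evaluators, three sentences (a real run would use 200 sentences).
    sample = [[4, 3, 2], [3, 3, 1], [4, 2, 2]]
    print(sentence_scores(sample))
    print(system_score(sample))
```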

25 Measuring Quality: BLEU*
Cheap and effective – but be aware of the limits
- A fully automated MT evaluation metric
  - Modified N-gram precision, comparing a test sentence to reference sentences
- Standard in the MT community
  - Immediate, simple to administer
  - Correlates with human judgments
- Automatic and cheap: runs daily and for every change
- Not suitable for cross-engine or cross-language evaluations
* BLEU: BiLingual Evaluation Understudy
Results are always relative to the test set.
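
The "modified N-gram precision" at the core of BLEU can be sketched in a few lines. This is only that one component, written for illustration with whitespace-tokenized input and hypothetical function names; full BLEU additionally combines several n-gram orders with a geometric mean and applies a brevity penalty, and this sketch is not the evaluation code used by the Translator team.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference translations.

    Each candidate n-gram count is clipped to the maximum number of times
    that n-gram occurs in any single reference, so repeating a correct
    word cannot inflate the score.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(
        min(count, max_ref_counts[gram]) for gram, count in cand_counts.items()
    )
    return clipped / sum(cand_counts.values())


if __name__ == "__main__":
    candidate = "the the the the the the the".split()
    references = [
        "the cat is on the mat".split(),
        "there is a cat on the mat".split(),
    ]
    # "the" appears at most twice in any one reference, so 2 of 7 count.
    print(modified_precision(candidate, references, 1))
```

Because the score depends entirely on the chosen references, it is only meaningful when compared across runs on the same test set, which is the limit the slide points out.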

26 Measuring Quality In Context
Real-world data
- Instrumentation to observe user behavior
- A/B testing
- Polling
In-Context gives you the most useful results

27 Knowledge Base (since 2003)

28

29

30 Knowledge base feedback

31 Knowledge Base Resolve Rate
[Chart: resolve rate, Human Translation vs. Machine Translation]
Microsoft is using a customized version of Microsoft Translator
Source: Martine Smets, Microsoft Customer Support

32 Statistical MT - The Simple View

33 Collaboration: MT + Your community Remember the collaborative TM? There is more.

34 Collaboration: You, your community, and Microsoft You, your community and Microsoft working together to create the optimal MT system for your terminology and style

35

36

37

38

39

40 Community-driven MT
- Multiple community models
  - Necessity: driven by crisis
  - Love of language: driven by strong language/cultural identification
  - Preservation: desire to preserve language
- Haitian Creole
- White Hmong

41 Haitian Creole
- One of two official languages in Haiti
- A creole that evolved from French, Spanish, and several African languages (large % French-like)
- Spoken natively by most of Haiti’s 8M people
- Recent as a written language (first literature dates to late 18th century), growing literature base
- Semi-literate population, with a preference for French (until recently)
- Somewhat inconsistent orthography
- Limited (but growing) Web presence

42
- The earthquake of January 12th, 2010: a significant humanitarian crisis.
- Aid agencies, foreign governments, a variety of NGOs, all responded en masse
Tranbleman tè nan Pòtoprens, kapital Ayiti!
Moun ap fouye pami debri yon bilding ki kraze nan tranblemann' tè 12 Janvye a.
Pòtoprens te catastrophically afekte 12 janvye 2010 tranbleman tè a.
Need for translated materials critical, especially those related to medicine and the relief effort.
Mission 4636 text messages from the field (up to 5K/day at peak) require rapid translation

43 The E-mail
- At 10:30 a.m. on Tuesday, January 19th, 2010, our team received an e-mail from a Microsoft employee in the field:
  - Do we have a translator for Haitian Creole?
  - If not, could we make one?
- A little soul searching:
  - No one on our team knew anything about Creole
    - No native speakers
    - No linguistic background on the language
    - No idea about grammatical structure
  - No idea about encoding or orthography
  - No knowledge about registers or the degree of literacy
  - No parallel or monolingual training data of any kind (nor readily available documents we could start with)
  - In effect, we were starting at Zero
- So what else could we do but say “YES!”

44 Mission 4636
- Emergency SMS infrastructure
- Set up immediately in the wake of the Jan. 12, 2010 quake
Mission 4636: Received SMSs > Translated > Categorized > Triaged > Routed to aid agencies

45 Mission 4636 Messages
- Fanmi mwen nan Kafou, 24 Cote Plage, 41A bezwen manje ak dlo
  My family in Carrefour, 24 Cote Plage, 41A needs food and water
- Moun kwense nan Sakre Kè nan Pòtoprens
  People trapped in Sacred Heart Church, PauP
- Ti ekipman Lopital General genyen yo paka minm fè 24 è
  General Hospital has less than 24 hrs. supplies
- Fanm gen tranche pou fè yon pitit nan Delmas 31
  Undergoing children delivery Delmas 31
Over 80,000 messages received, up to 5,000+/day

46 Crisis Infrastructure: Message Pipeline
[Diagram components: SMS, Tweets, Media; Message Portal; Crowd (Translate); MT; Triage; Geolocate]
Lewis et al, 2011

47 White Hmong
- White Hmong: not a crisis scenario like Creole
- But, a language in crisis
- Some background:
  - The Hmong Languages
  - The Hmong Diaspora
  - Decline of White Hmong and its usage in younger Hmong

48 Community Engagement
- Involves two critical groups:
  - Community of native speakers
  - Community leader(s)
- Wide spectrum of users across the Hmong community:
  - College students
  - High school students
  - School teachers
  - School administrators, deans, professors
  - Business professionals
  - Elders

49 Building MT: Community Contributions
- Locating and vetting data
  - Locate data
  - Review documents that contain Hmong data
  - Review parallelism of Hmong-English documents
- Actively correcting errors from the engine
- Contributing translation “repairs” on web sites that translate to Hmong

50 Tools Available for Haitian Creole and Hmong
- Home page (Web page viewer, cut-and-paste translator)
- Haitian Creole and Hmong are among the languages available through our API (Application Programming Interface)
  - Multiple interfaces: AJAX, SOAP, HTTP
  - Can integrate translation directly into a variety of apps
- Widget
  - Integrate translation into Web pages
  - Traffic kept client side

51 Tools Available for Haitian Creole and Hmong
- Widget/Collaborative Translation Framework (CTF)
  - Community can contribute translations
  - These can be published to Web pages
  - Mixes MT with “trusted” human translations

52 Just visit http://hub.microsofttranslator.com to do it yourself

53 Contacts
Web site: www.microsoft.com/translator
Licensing & Pricing Questions: mtlic@microsoft.com
General & Customer Questions: translator@microsoft.com

54

