Published by Randolf Floyd. Modified over 10 years ago.
1
Microsoft Translator
William Lewis, wilewis@microsoft.com
Kites Symposium, October 31, 2013, Helsinki, Finland
2
Overview
Introduction to Microsoft Translator, Tools, Products, etc.
Extent of Localization: Methods of Applying MT
Collaborative MT
Assessing Quality
Application in Knowledge Base
Building your own MT
Collaboration with Language Communities
3
Why MT?
The purpose
The Crude
– Extent of localization
– Data Mining & Business Intelligence
– Globalized NLP
– Triage for human translation
Research
– Machine Learning
– Statistical Linguistics
– Same-language translation
The Good
– Breaking down language barriers
– Text, Speech, Images & Video
– Language Preservation
NOT:
– Spend less money
– Take the job of human translators
– Perform miracles
4
Microsoft Translator – Quick Facts
Linguistically informed statistical MT system
41 languages, from any language to any other language
Runs in Microsoft Datacenter
Simple web service API: SOAP, REST, AJAX, OData, web site widget
2 million characters/month free
Available in the Enterprise Agreement, as a monthly subscription
For extreme confidentiality situations, available on-premise
Highly customizable:
– Collaborative Translations: involve community, coworkers and customers
– Hub: custom engine training via an easy-to-use UI
Web Scale
– Powers translations in Bing, Microsoft Office, Microsoft SharePoint, Internet Explorer, Yammer
– Powers translations in Facebook, Twitter, eBay, and many other government and enterprise sites
5
Microsoft Translator at a Glance
World-class Statistical Machine Translation: built on over a decade of work at Microsoft Research
Big Data Powered: trained with billions of “parallel” sentences (Bing index & licensed)
General Purpose System: powers Bing Translator, supports 40+ languages, any-to-any
Powerful Cloud API: rich, secure API enabling integrations, 99.9% availability
6
Enabling Translation in Many Products
Fully integrated across the stack, Translator extends the value of the Microsoft platform and of the solutions you build on it, including consumer-facing applications such as Bing Translator, Bing Toolbar, Bing Dictionary, and the Windows Phone app.
A few of our customers and partners, plus 80,000 more.
7
Powerful Tools and Customization
Our machine learning & big-data based translation technology brings the power of instant translations to break down language barriers for users, developers, webmasters, translators and businesses. Robust, industry-leading tools such as the Hub and CTF allow for unprecedented customization of the translation experience.
Powerful API
– Instant translation and language services in web, desktop and mobile applications.
– Highly scalable and robust cloud-based machine translation service from Microsoft.
– Supports SOAP, REST, AJAX, OData, and the Translator web site translation widget.
– Extensibility for development on SharePoint, Office, Windows Phone, and more.
Widget
– Instant translations of web pages without the need to write any code.
– Use the AJAX API to roll your own widget.
– Use the integrated “Collaborative Translations” (CTF) functionality to tap into your community.
Hub
– Custom translation portal to build, train, and deploy customized automatic language translation systems.
– Combine your data with Bing big data to tune the translation output to best fit your content.
– Free with any level of Translator subscription (including the free tier).
CTF
– Override, modify or vote for the translated output to best fit the content.
– Provide the end user with alternative translations.
– Import the edits back into Hub for further training.
8
Integrates with your TM tool
Top translation tools support Microsoft Translator
9
Give these a try! (Demo)
10
Price
Competitively priced
Monthly subscription
Free for up to 2 million characters per month
Base price: $10 per million characters
Discounted for higher volumes
Paid by credit card or via Microsoft Enterprise Agreement
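The pricing above can be turned into a rough monthly estimate. The sketch below is only an illustration: the function name is hypothetical, volume discounts are ignored because the slide gives no figures for them, and it assumes that a month exceeding the free allowance is billed at the base rate for its whole volume, which the slide does not state.

```python
FREE_CHARS_PER_MONTH = 2_000_000      # free allowance stated on the slide
BASE_PRICE_PER_MILLION_USD = 10.0     # base price stated on the slide


def estimate_monthly_cost(chars: int) -> float:
    """Rough monthly cost in USD at the base rate.

    Assumptions (not from the slide): no volume discount is applied, and
    once the free allowance is exceeded the full volume is billed.
    """
    if chars < 0:
        raise ValueError("character count cannot be negative")
    if chars <= FREE_CHARS_PER_MONTH:
        return 0.0
    return chars / 1_000_000 * BASE_PRICE_PER_MILLION_USD


if __name__ == "__main__":
    for volume in (1_500_000, 5_000_000, 40_000_000):
        print(f"{volume:>12,} chars -> ${estimate_monthly_cost(volume):,.2f}")
```

Actual invoices would depend on the subscription tier and any negotiated discount, so treat the result as an upper-bound estimate.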
11
Extent of localization
Methods of applying MT
13
The Triangle
You can have only two. Not anymore!
Price, Speed, Quality
P3: Post-Publishing Post-Edit
14
The cost/quality curve
Optimize for the knee
[Chart: user satisfaction against cost ($); content ranges from low-pageview supporting content to highly visible marketing content]
– No cost: no translation
– Low cost: MT + TM + community
– High cost: fully qualified HT
– Very high cost: expert-reviewed translation/transcreation
Good enough for the intended purpose
18
Always there
Always current
Always retaining human translations
Always ready to take feedback and corrections
Midori Tatsumi, Takako Aikawa, Kentaro Yamamoto, and Hitoshi Isahara. Proceedings of the Association for Machine Translation in the Americas (AMTA), November 2012
19
Collaboration: MT + Your community
What makes this possible: a fully integrated, 100% matching TM
Collaborative TM entries:
– Rating 1 to 4: Unapproved
– Rating 5 to 10: Approved
– Rating -10 to -1: Rejected
1-to-many is possible
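The rating bands above map naturally onto a small lookup. The following sketch is an illustration, not the CTF implementation: both function names are hypothetical, and the policy of serving the highest-rated approved entry before falling back to raw MT output is an assumption about how a 100% matching TM might be consulted.

```python
from typing import List, Tuple


def classify_rating(rating: int) -> str:
    """Map a collaborative TM rating to the status named on the slide."""
    if 5 <= rating <= 10:
        return "approved"
    if 1 <= rating <= 4:
        return "unapproved"
    if -10 <= rating <= -1:
        return "rejected"
    # The slide does not define 0 or values outside -10..10.
    raise ValueError(f"rating {rating} is outside the documented ranges")


def choose_translation(entries: List[Tuple[str, int]], mt_output: str) -> str:
    """Return the highest-rated approved entry, else the MT output.

    `entries` holds (translation, rating) pairs for one source sentence,
    reflecting the 1-to-many case. The selection policy is assumed.
    """
    approved = [entry for entry in entries
                if classify_rating(entry[1]) == "approved"]
    if approved:
        return max(approved, key=lambda entry: entry[1])[0]
    return mt_output


if __name__ == "__main__":
    candidates = [("Hallo Welt", 6), ("Hallo, Welt!", 9), ("Hi Welt", -3)]
    print(choose_translation(candidates, "Hallo Welt (MT)"))
```

Unapproved and rejected entries never reach the reader in this sketch; they stay in the store until an approver changes their rating.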
20
Making it easier for the approver – Pending edits highlight
21
Making it easier for the approver – Managing authorized users
22
Making it easier for the approver – Bulk approvals
23
What is Important?
In this order:
1. Quality
2. Access
3. Coverage
24
Measuring Quality: Human Evaluations
Knowledge powered by people
Absolute: 3 to 5 independent human evaluators rate translation quality for 200 sentences on a scale of 1 to 4
– Each translation is compared to a human-translated sentence
– No source language knowledge required
Also: relative evaluations, against a competitor or a previous version of our own system
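One way to turn such absolute ratings into a single figure is to average per sentence and then across sentences. The slide does not say how the scores are aggregated, so the averaging scheme and the function name below are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence


def absolute_score(ratings: Sequence[Sequence[int]]) -> float:
    """Average human rating over a test set.

    `ratings[i]` holds the scores that 3 to 5 evaluators gave sentence i,
    each on the 1-to-4 scale from the slide. Averaging per sentence and
    then over sentences is an assumed aggregation, not a documented one.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    per_sentence = []
    for scores in ratings:
        if not 3 <= len(scores) <= 5:
            raise ValueError("each sentence needs 3 to 5 evaluators")
        if any(score not in (1, 2, 3, 4) for score in scores):
            raise ValueError("scores must be integers from 1 to 4")
        per_sentence.append(mean(scores))
    return float(mean(per_sentence))


if __name__ == "__main__":
    # Two sentences instead of 200, to keep the example short.
    print(absolute_score([[4, 3, 2], [1, 2, 3, 4]]))
```

Averaging per sentence first keeps a sentence rated by five evaluators from outweighing one rated by three.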
25
Measuring Quality: BLEU*
Cheap and effective, but be aware of the limits
A fully automated MT evaluation metric
– Modified n-gram precision, comparing a test sentence to reference sentences
Standard in the MT community
– Immediate, simple to administer
– Correlates with human judgments
Automatic and cheap: runs daily and for every change
Not suitable for cross-engine or cross-language evaluations
Results are always relative to the test set.
* BLEU: BiLingual Evaluation Understudy
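The core of BLEU, modified n-gram precision, can be sketched in a few lines. This is a minimal illustration with whitespace tokenization and hypothetical function names; it computes only the clipped precision for one n, whereas the full metric combines several n-gram orders and adds a brevity penalty.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate: Sequence[str],
                       references: Sequence[Sequence[str]],
                       n: int) -> float:
    """Clipped n-gram precision of a candidate against its references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the most times it appears in any single reference.
    max_ref_counts: Counter = Counter()
    for reference in references:
        for gram, count in Counter(ngrams(reference, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # A candidate n-gram is credited at most as often as a reference uses it.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


if __name__ == "__main__":
    candidate = "the the the the the the the".split()
    references = ["the cat is on the mat".split(),
                  "there is a cat on the mat".split()]
    print(modified_precision(candidate, references, 1))  # 2 of 7 unigrams count
```

Clipping is what keeps a degenerate output such as a repeated "the" from scoring well: only two of the seven occurrences are credited, because no reference contains the word more than twice.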
26
Measuring Quality In Context
Real-world data
Instrumentation to observe user behavior
A/B testing
Polling
In-context gives you the most useful results
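For the A/B testing item, a common way to compare two variants is a two-proportion z-test on a success metric such as "user found the page helpful". The talk does not say which statistic is used, so the test, the function name, and the numbers below are all illustrative assumptions.

```python
import math


def two_proportion_z(success_a: int, total_a: int,
                     success_b: int, total_b: int) -> float:
    """z statistic for the difference in success rate between B and A.

    Uses the pooled-proportion standard error. Values above roughly 1.96
    in magnitude are conventionally read as significant at the 5% level.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both groups need at least one observation")
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0  # all successes or all failures in both groups
    return (rate_b - rate_a) / std_err


if __name__ == "__main__":
    # Hypothetical counts: engine A vs. engine B, 1000 page views each.
    print(round(two_proportion_z(400, 1000, 460, 1000), 2))
```

A positive value favors variant B; whether the difference matters in practice still depends on the metric chosen and on how users were assigned to each variant.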
27
Knowledge Base (since 2003)
30
Knowledge base feedback
31
Knowledge Base Resolve Rate
[Chart: resolve rate, Human Translation vs. Machine Translation]
Microsoft is using a customized version of Microsoft Translator
Source: Martine Smets, Microsoft Customer Support
32
Statistical MT: The Simple View
33
Collaboration: MT + Your community
Remember the collaborative TM? There is more.
34
Collaboration: You, your community, and Microsoft
You, your community and Microsoft working together to create the optimal MT system for your terminology and style
40
Community-driven MT
Multiple community models
– Necessity: driven by crisis
– Love of language: driven by strong language/cultural identification
– Preservation: desire to preserve language
Haitian Creole, White Hmong
41
Haitian Creole
One of two official languages in Haiti
A creole that evolved from French, Spanish, and several African languages (large % French-like)
Spoken natively by most of Haiti’s 8M people
Recent as a written language (first literature dates to the late 18th century), growing literature base
Semi-literate population, with a preference for French (until recently)
Somewhat inconsistent orthography
Limited (but growing) Web presence
42
The earthquake of January 12th, 2010 caused a significant humanitarian crisis.
Aid agencies, foreign governments, and a variety of NGOs all responded en masse
Tranbleman tè nan Pòtoprens, kapital Ayiti!
Moun ap fouye pami debri yon bilding ki kraze nan tranblemann' tè 12 Janvye a.
Pòtoprens te catastrophically afekte 12 janvye 2010 tranbleman tè a.
Need for translated materials critical, especially those related to medicine and the relief effort.
Mission 4636 text messages from the field (up to 5K/day at peak) require rapid translation
43
The E-mail
At 10:30 a.m. on Tuesday, January 19th, 2010, our team received an e-mail from a Microsoft employee in the field:
– Do we have a translator for Haitian Creole?
– If not, could we make one?
A little soul searching:
– No one on our team knew anything about Creole
    No native speakers
    No linguistic background on the language
    No idea about grammatical structure
– No idea about encoding or orthography
– No knowledge about registers or the degree of literacy
– No parallel or monolingual training data of any kind (nor readily available documents we could start with)
– In effect, we were starting at zero
So what else could we do but say “YES!”
44
Mission 4636
Emergency SMS infrastructure
Set up immediately in the wake of the Jan. 12, 2010 quake
Mission 4636:
– Received SMSs
– Translated
– Categorized
– Triaged
– Routed to aid agencies
45
Mission 4636 Messages
Fanmi mwen nan Kafou, 24 Cote Plage, 41A bezwen manje ak dlo
– My family in Carrefour, 24 Cote Plage, 41A needs food and water
Moun kwense nan Sakre Kè nan Pòtoprens
– People trapped in Sacred Heart Church, PauP
Ti ekipman Lopital General genyen yo paka minm fè 24 è
– General Hospital has less than 24 hrs. supplies
Fanm gen tranche pou fè yon pitit nan Delmas 31
– Undergoing children delivery Delmas 31
Over 80,000 messages received, up to 5,000+/day
46
Crisis Infrastructure: Message Pipeline
[Diagram components: SMS, Tweets, Media; Message Portal; Crowd (Translate); MT; Triage; Geolocate]
Lewis et al., 2011
47
White Hmong
White Hmong: not a crisis scenario like Creole, but a language in crisis
Some background:
– The Hmong Languages
– The Hmong Diaspora
– Decline of White Hmong and its usage among younger Hmong
48
Community Engagement
Involves two critical groups:
– Community of native speakers
– Community leader(s)
Wide spectrum of users across the Hmong community:
– College students
– High school students
– School teachers
– School administrators, deans, professors
– Business professionals
– Elders
49
Building MT: Community Contributions
Locating and vetting data
– Locate data
– Review documents that contain Hmong data
– Review parallelism of Hmong-English documents
Actively correcting errors from the engine
Contributing translation “repairs” on web sites that translate to Hmong
50
Tools Available for Haitian Creole and Hmong
Home page (Web page viewer, cut-and-paste translator)
Haitian Creole and Hmong are among the languages available through our API (Application Programming Interface)
– Multiple interfaces: AJAX, SOAP, HTTP
– Can integrate translation directly into a variety of apps
Widget
– Integrate translation into Web pages
– Traffic kept client side
51
Tools Available for Haitian Creole and Hmong
Widget/Collaborative Translation Framework (CTF)
– Community can contribute translations
– These can be published to Web pages
– Mixes MT with “trusted” human translations
52
Just visit http://hub.microsofttranslator.com to do it yourself
53
Contacts
Web site: www.microsoft.com/translator
Licensing & Pricing Questions: mtlic@microsoft.com
General & Customer Questions: translator@microsoft.com