Statistical Machine Translation with Moses Hieu Hoang Localization World 2013 0.6227.


1 Statistical Machine Translation with Moses Hieu Hoang Localization World

2 Agenda
What is Statistical Machine Translation?
What is Moses?
– Common misconceptions
Coming up
What can we do for you?
Moses by Hieu Hoang, University of Edinburgh

4 What is Statistical Machine Translation?
It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the Chinese code. If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?
– Warren Weaver, 1949

5 What is Statistical Machine Translation?
NLP Application
– search engines, text mining etc.
Big-data
– bi-text from the Internet, e.g. multilingual websites, documents
– large monolingual data
Learn to translate
– from previous translations
– models of language

6 What is Statistical Machine Translation?
Training: Training Data (bi-text, monolingual data, dictionary) and Linguistic Tools → SMT System (translation model, language model, lots of numbers…)
Using: Source Text → SMT System (translation model, language model, lots of numbers…)

7 What is a model?
Translation Model
Language Model
– (of the target language)
(thanks to Precision Translation Tools)
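One common textbook way to combine these two models is the noisy-channel formulation, sketched below. The slides do not state which formulation Moses uses, and the symbols f (source sentence) and e (target sentence) are introduced here purely for illustration.

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}} \;
          \underbrace{P(e)}_{\text{language model}}
```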

8 What is a model?
Translation model
– source
– translation
– probability

source          target          probability
den Vorschlag   the proposal
                's proposal
                a proposal
                the idea
                this proposal
                proposal
                …
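As a rough illustration of how such a table can be learned from previous translations, the sketch below estimates translation probabilities by relative frequency. The phrase-pair counts are invented for illustration and are not taken from the slides; `pair_counts` and `estimate_translation_model` are hypothetical names, and this is not a description of how Moses itself builds its phrase table.

```python
from collections import Counter, defaultdict

# Hypothetical phrase-pair counts, invented for illustration only.
# In a real system they would be extracted from word-aligned bi-text.
pair_counts = Counter({
    ("den Vorschlag", "the proposal"): 6,
    ("den Vorschlag", "a proposal"): 3,
    ("den Vorschlag", "the idea"): 1,
})


def estimate_translation_model(counts):
    """Relative frequency: p(target | source) = count(source, target) / count(source)."""
    source_totals = Counter()
    for (source, _target), n in counts.items():
        source_totals[source] += n

    model = defaultdict(dict)
    for (source, target), n in counts.items():
        model[source][target] = n / source_totals[source]
    return dict(model)


translation_model = estimate_translation_model(pair_counts)

# List the candidate translations from most to least probable.
candidates = sorted(
    translation_model["den Vorschlag"].items(), key=lambda kv: -kv[1]
)
for target, prob in candidates:
    print(f"den Vorschlag -> {target}: {prob:.2f}")
```

The probabilities for one source phrase sum to 1, so the table can be read as a distribution over candidate translations.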

9 What is a model?
Language model
– Likelihood of sentence
– in target language

text                      probability
I would like              0.489
would like to             0.905
like to commend           0.002
to commend the            0.472
commend the rapporteur    …
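A minimal sketch of how trigram values like these can be combined into a score for a longer word sequence. It assumes each listed value is the probability of the third word given the two preceding words, which the slide does not state explicitly. It also ignores sentence-start context and smoothing, so an unseen trigram raises an error rather than receiving a back-off score.

```python
import math

# Trigram values as listed on the slide. Assumption: each value is the
# probability of the third word given the two preceding words.
trigram_prob = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
}


def sentence_log_prob(words, model):
    """Sum the log-probabilities of every trigram in `words`.

    Raises KeyError for a trigram missing from `model` (no smoothing).
    """
    total = 0.0
    for i in range(len(words) - 2):
        total += math.log(model[tuple(words[i:i + 3])])
    return total


words = "I would like to commend the".split()
log_p = sentence_log_prob(words, trigram_prob)
print(f"log-probability: {log_p:.3f}")
print(f"probability:     {math.exp(log_p):.6f}")
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.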

10 Agenda
What is Statistical Machine Translation?
What is Moses?
– Common misconceptions
Coming up
What can we do for you?

11 What is Moses?
Replacement for Pharaoh
– Academic software
– Closed-source
Open source
Re-written, clean code
– More features
Large developer community
– Initiated by Hieu Hoang
– Developed at NLP Workshop

12 Agenda
What is Statistical Machine Translation?
What is Moses?
– Timeline
– Common misconceptions
Coming up
What can we do for you?

13 What is Moses? Common Misconceptions
Only for Linux
Difficult to use
Unreliable
Only phrase-based
Developed by one person
Slow

14 Only works on Linux
Tested on
– Windows 7 (32-bit) with Cygwin 6.1
– Mac OS X 10.7 with MacPorts
– Ubuntu 12.10, 32 and 64-bit
– Debian 6.0, 32 and 64-bit
– Fedora 17, 32 and 64-bit
– openSUSE 12.2, 32 and 64-bit
Project files for
– Visual Studio
– Eclipse on Linux and Mac OS X

15 Difficult to use
Easier compile and install
– Boost bjam
– No installation required
Binaries available for
– Linux
– Mac
– Windows/Cygwin
– Moses + Friends: IRSTLM, GIZA++ and MGIZA
Ready-made models trained on Europarl

16 Unreliable
Monitor check-ins
Unit tests
More regression tests
Nightly tests
– Run end-to-end training
– Tested on all major OSes
Train Europarl models
– Phrase-based, hierarchical, factored
– 8 language-pairs

17 Only phrase-based model
– replacement for Pharaoh
– extension of Pharaoh
From the beginning
– Factored models
– Lattice and confusion network input
– Multiple LMs, multiple phrase-tables
Since 2009
– Hierarchical model
– Syntactic models

18 Developed by one person
ANYONE can contribute
– 50 contributors
[Figure: git blame of Moses repository]

19 Slow: Decoding
(thanks to Ken!!)

20 Slow: Training
Multithreaded
Reduced disk IO
– compress intermediate files
Reduce disk space requirement

Time (mins)    1-core   2-cores    4-cores     8-cores     Size (MB)
Phrase-based   60       47 (79%)   37 (63%)    33 (56%)    893
Hierarchical            (65%)      473 (45%)   375 (36%)   8300
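To make the scaling in the phrase-based row easier to read, the sketch below converts the training times into speedup and per-core efficiency. It assumes the row reads 60, 47, 37 and 33 minutes for 1, 2, 4 and 8 cores. Because those minutes appear to be rounded, the recomputed fractions can differ from the slide's percentages by a point or so. The names `phrase_based_minutes` and `scaling_report` are illustrative only.

```python
# Phrase-based training times in minutes per core count, as read from the slide.
phrase_based_minutes = {1: 60, 2: 47, 4: 37, 8: 33}


def scaling_report(minutes_by_cores):
    """Return speedup and per-core efficiency relative to the 1-core time."""
    baseline = minutes_by_cores[1]
    report = {}
    for cores, minutes in sorted(minutes_by_cores.items()):
        speedup = baseline / minutes
        report[cores] = {
            "fraction_of_1_core_time": minutes / baseline,
            "speedup": speedup,
            "efficiency": speedup / cores,
        }
    return report


report = scaling_report(phrase_based_minutes)
for cores, row in report.items():
    print(
        f"{cores} cores: {row['fraction_of_1_core_time']:.0%} of 1-core time, "
        f"{row['speedup']:.2f}x speedup, {row['efficiency']:.0%} efficiency"
    )
```

Efficiency falls as cores are added, which is consistent with only part of the training pipeline running in parallel.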

21 What is Moses? Common Misconceptions
Only for Linux
Difficult to use
Unreliable
Only phrase-based
Developed by one person
Slow

22 What is Moses? Common Misconceptions
Only for Linux: Windows, Linux, Mac
Difficult to use: Easier compile and install
Unreliable: Multi-stage testing
Only phrase-based: Hierarchical, syntax model
Developed by one person: everyone
Slow: Fastest decoder, multithreaded training, less IO

23 Agenda
What is Statistical Machine Translation?
What is Moses?
– Common misconceptions
Coming up
What can we do for you?

24 Coming up…
Code cleanup
Incremental Training
Better translation
– smaller model
– bigger data
– faster training and decoding
Applications
– CAT tools
– Speech translation

25 Applications
Computer-Aided Translation
EU Project
– CASMACAT
– MATECAT

26 Agenda
What is Statistical Machine Translation?
What is Moses?
– Common misconceptions
Coming up
What can we do for you?

27 What can we do for you?
– simpler Moses
– graphical interface
– Windows compatibility
– terminology and glossary
– incremental training
What can you do for us?
– code
– data
– funding

