1 Machine Translation
MT – Research Landscape
Stephan Vogel
Spring Semester 2011

2 Overview (11-711 Machine Translation)
- Some influential projects
- Open source toolkits
- Conferences
- MT evaluations
- Literature and general resources
- Disclaimer: this is all incomplete, subjective, biased!

3 MT Projects
- Verbmobil
  - Large speech translation project in Germany
  - Different translation paradigms
  - Success story for SMT
- TIDES
  - DARPA funded US MT project
  - SMT widely used, small and large data track evaluations
  - Chinese-English and Arabic-English
- GALE
  - DARPA funded
  - Follow-up to TIDES
- TransTac
  - DARPA funded
  - Speech-to-Speech Translation
  - Targeted towards force protection

4 MT Projects
- TC-Star
  - European project with partners from different universities
  - Technology and Corpora for Speech-to-Speech Translation
  - http://tcstar.org/
- EuroMatrix
  - 2006-2009, EuroMatrixPlus 2009-2012
  - Translate all European languages
  - Offshoots: WMT evaluations, MT Marathon
  - euromatrix.net
- Quaero
  - French-German project
  - Kind of a TC-Star follow-up
  - http://www.quaero.org/modules/movie/scenes/home/index.php?FUSEBOX_LANG=2

5 Open Source Toolkits: Word Alignment
- Game changer
  - Lower barrier to enter the field
  - Transparency
- Word alignment
  - GIZA++
    - Started out at JHU workshop, subsequently extended by Franz Josef Och (at RWTH and ISI)
    - Most widely used alignment toolkit
  - mGIZA++
    - Multi-threaded/multi-core extension of GIZA++
    - By Qin Gao: http://geek.kyloo.net/software/doku.php/mgiza:overview
  - Berkeley Aligner
    - Word alignment via quadratic assignment
    - http://code.google.com/p/berkeleyaligner/
  - PostCAT (Posterior Constrained Alignment Toolkit)
    - http://www.seas.upenn.edu/~strctlrn/CAT/CAT.html

6 Open Source Toolkits: WA cont.
- Word alignment tools
  - Alignment Set
    - Set of tools to manipulate and display alignments
    - From TALP research group
    - http://www.talp.upc.edu/talp/index.php/en/resources/tools/alingment-set

7 Open Source Toolkits: Decoders
- Decoders
  - Moses (Edinburgh): phrase-based and recently also hierarchical
  - Joshua (JHU): hiero reimplementation
    - sourceforge.net/projects/joshua
  - Jane (RWTH Aachen): hierarchical
    - http://www-i6.informatik.rwth-aachen.de/web/Software/index.html
  - cdec (UMD -> CMU): hierarchical and phrase-based
  - Marie (TALP): n-gram-based (kinda phrase-based)
    - www.talp.upc.edu/talp/index.php/en/resources/tools/marie
  - Apertium (University of Alicante): rule-based
  - Phrasal (Stanford): phrase-based
    - http://www-nlp.stanford.edu/wiki/Software/Phrasal

8 Open Source Toolkits: LMs
- SRILM
  - Most widely known and used LM toolkit
- SALM
  - Written by Joy Ying Zhang (while at LTI)
  - http://projectile.sv.cmu.edu/research/public/tools/salm/salm.htm
- IRST-LM
  - http://sourceforge.net/projects/irstlm/
- Ken-LM
  - Smaller footprint than SRILM
  - Written by Kenneth Heafield (LTI PhD student)
  - http://kheafield.com/code/kenlm/

9 Conferences
- General CL conferences
  - ACL
  - HLT
  - EMNLP
  - Coling
  - IJCNLP
    - Int. Joint Conf on NLP
  - LREC
    - Language Resources and Evaluation
  - RANLP
    - Recent Advances in NLP
  - SALTMIL
    - Speech and Language Technology for Minority Languages
- Specific MT conferences
  - MT Summit (every 2 years)
  - AMTA (US)
  - EAMT (Europe)
  - TMI
  - Translating and the Computer (organised by Aslib)
  - IWSLT (organized by C-Star consortium)
  - …
- MT workshops
  - WMT
    - Workshop on Machine Translation
  - SSST
    - Syntax, Semantics, and Structure in SMT
  - …

10 Evaluations
- It all started with TIDES
  - Comparative evaluations
  - Defined training and test data
  - Automatic evaluation metrics (NIST mteval, Bleu)
  - Organized by NIST
- NIST Open MT Evaluations
  - Continuation and expansion of TIDES MT evaluations
  - Chinese-English, Arabic-English, Urdu-English
  - Restricted and unrestricted track
  - Originally every year, now going to 2 year cycle
  - http://www.itl.nist.gov/iad/mig/tests/mt/2009/

11 Evaluations (cont.)
- WMT Evaluations
  - Organized in connection with EuroMatrix
  - Based on Europarl corpora
  - Many languages
  - Automatic and manual evaluation
  - http://www.statmt.org/wmt11/translation-task.html
- IWSLT Evaluations
  - Spoken language
  - Languages vary: Chinese, Japanese, Arabic, Italian, …
  - Speech 1-best and lattices provided
  - Based on (small) BTEC corpus (Basic Travel Expression Corpus)
  - Last time also lecture translations
  - http://iwslt2010.fbk.eu/node/15

12 Evaluations (cont.)
- Specific projects have evaluations
  - GALE
    - Arabic-English and Chinese-English
    - Broadcast news and broadcast conversations, newswire and blogs
    - Human evaluation (HTER)
    - Go/No-Go
  - Quaero
    - European languages, also Arabic-French
    - This year the WMT evaluation was used as the Quaero evaluation

13 Journals
- Machine Translation
  - Springer Science, formerly Kluwer Academic Publishers, vol. 4-, 1989-
  - Articles available online (abstracts free, full texts on payment of fee) from Springer
  - Chief editor: Andy Way
  - http://www.springer.com/computer/ai/journal/10590
- Computational Linguistics
  - MIT Press
  - Now open access
  - http://www.mitpressjournals.org/loi/coli
- ACM TSLP
  - Online publication
  - Started in 2005
  - http://tslp.acm.org/

14 Journals (cont.)
- IEEE Transactions on Audio, Speech, and Language Processing
  - http://www.signalprocessingsociety.org/publications/periodicals/taslp/
- The Prague Bulletin of Mathematical Linguistics
  - Has papers from recent MT Marathons, especially descriptions of open source packages
  - http://ufal.mff.cuni.cz/pbml.html

15 Literature
- MT-Archive: http://www.mt-archive.info/
  - Compiled by John Hutchins for the EAMT
  - One stop shop!
  - Also links to books, journals, conferences
  - Papers listed by author, language, organization
- ACL Anthology: http://www.aclweb.org/anthology/

