METIS II: Statistical Machine Translation using Monolingual Corpora (FP6-IST )

Profile

METIS II, the continuation of the successful assessment project METIS I, is an IST Programme project with a 3-year duration (01/10/2004 – 30/09/2007). The METIS II consortium comprises the following partners:

- Institute for Language & Speech Processing [ILSP] (co-ordinator)
- Katholieke Universiteit Leuven [KUL]
- Gesellschaft zur Förderung der Angewandten Informationsforschung [GFAI]
- Universitat Pompeu Fabra [UPF]

The METIS Approach

METIS II is a hybrid system, combining several approaches to machine translation (rule-based, statistical and pattern-matching techniques). It makes use of readily available resources, such as bilingual dictionaries and basic NLP tools, and it can easily be customised to handle different source-language (SL) and target-language (TL) tags.

Most importantly, METIS II is innovative because it does not need bilingual corpora for the translation process; it relies exclusively on monolingual TL corpora.

METIS II handles sequences at both sentence and sub-sentential level, thereby exploiting the recursive property of natural language.

METIS II employs a series of weights, i.e. system parameters, in various phases of the translation process. Weights are associated with system resources and are used by the pattern-matching algorithm; they can be adjusted automatically to customise system performance.

Four language pairs have been developed so far: Greek, Dutch, German and Spanish → English.

METIS II Architecture

[Architecture diagram labels: Database Server; Lexicon; BNC Clauses; BNC Chunks; Token Generation Rules; Weights; Final Translation]

Web Interface: The end user selects the preferred SL and enters the text to be translated.

NLP: NLP tools process the SL input, yielding an SL sequence annotated with grammatical and syntactic information.

Lexicon Lookup: The SL sequence is enriched with translation equivalents and PoS information, so that it resembles a TL pattern.

Core Engine: The core engine of the METIS II system is fed with a sequence of TL-like patterns, which are handled by the pattern-matching algorithm. It proceeds in 2 stages, involving wider and narrower contexts, and generates a TL sequence.

Token Generation: The token generation module receives as input a sequence of translated lemmas and their respective tags; it is responsible for producing tokens from the lemmas.

Evaluation

Evaluation Setup

The system was evaluated on an experimental corpus extracted from real texts, mainly newspapers. It consisted of 200 sentences, 50 per language pair. The test sentences were of moderate complexity, containing one to two clauses each, and covered various syntactic phenomena such as word-order variation, NP structure, negation and modification.

The number of reference translations was restricted to 3, all produced by humans. The BLEU and NIST metrics were used for the evaluation.

Evaluation Results

Result panels on the poster: Greek Results, Dutch Results, German Results, Spanish Results.

Fig. 1: Comparative analysis of the score ranges obtained for METIS II and SYSTRAN using the BLEU metric
Fig. 2: Comparative analysis of the score ranges obtained for METIS II and SYSTRAN using the NIST metric
Fig. 3: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the BLEU metric
Fig. 4: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the NIST metric
Fig. 5: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the BLEU metric
Fig. 6: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the NIST metric
Fig. 7: Comparative analysis of the scores obtained for different settings of METIS II and SYSTRAN using the BLEU metric
Fig. 8: Comparative analysis of the scores obtained for different settings of METIS II and SYSTRAN using the NIST metric

Future Work

Future work involves further investigation of the METIS II system architecture. More specifically, work towards system optimisation includes the following:

- Further system testing with a large number of test suites that have more elaborate structures and cover a wider range of phenomena
- Algorithm optimisation in terms of accuracy
- Automatic fine-tuning of weights
- Implementation of a post-editor module
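Illustration: scoring with BLEU against several references

The evaluation setup described above scores system output against three human reference translations using BLEU. The poster does not say which BLEU implementation, n-gram order or tokenisation was used, so the following is only a minimal sketch of the standard formulation: clipped n-gram precision up to 4-grams, a brevity penalty based on the closest reference length, no smoothing, and whitespace tokenisation. The function names `ngrams` and `corpus_bleu` are hypothetical and are not part of METIS II.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with uniform n-gram weights and no smoothing.

    hypotheses: list of token lists, one per test sentence.
    references: list of lists of token lists (several references per sentence).
    Assumption: inputs are already tokenised; the poster gives no details.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams present in the hypotheses, per order
    hyp_len = 0
    ref_len = 0

    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to
        # the hypothesis (ties resolved in favour of the shorter one).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]

        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Highest count of each n-gram in any single reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            # Clip each hypothesis count by that reference maximum.
            matched[n - 1] += sum(
                min(count, max_ref[gram]) for gram, count in hyp_counts.items()
            )
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any n-gram order with zero matches gives BLEU = 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Usage: one test sentence with three references, as in the setup above.
hyp = ["the cat sat on the mat".split()]
refs = [[
    "the cat sat on the mat".split(),
    "a cat was sitting on the mat".split(),
    "there is a cat on the mat".split(),
]]
print(corpus_bleu(hyp, refs))  # 1.0, since the output equals one reference
```

Counts are accumulated over the whole test set before the precisions are computed, which is how BLEU is normally reported; averaging per-sentence scores would give different numbers. NIST scoring uses information-weighted n-grams and a different length penalty, and is not covered by this sketch.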