LingWear: Language Technology for the Information Warrior
Alex Waibel, Lori Levin, Alon Lavie, Robert Frederking
Carnegie Mellon University



LingWear Scenario
Military personnel in the field:
  – Civil Affairs units deployed in a foreign language environment: need to communicate with the local population
  – Allied command post: communication and exchange of documents among multilingual allied forces
Non-military humanitarian mission forces

LingWear - Main Goals
Language Technology for the Information Warrior
Assimilation of foreign language information in a variety of forms:
  – written documents
  – transcribed speech
  – signs
Bi-directional translation in support of conversational communication
Emphasis on portability and rapid deployment, consistent with the DARPA Quick MT vision

LingWear - Main Tasks
For Assimilation: uni-directional translation from the source language to English
Summarization in the source language prior to translation:
  – focus translation effort only on interesting material
  – minimize translation error
For Conversational Interaction: bi-directional translation, but in limited task-oriented domains

Scientific Approach
Pursue a Multi-Engine Approach:
  – several different translation engines with different strengths and weaknesses
  – combining them can leverage the strengths of all available engines
  – overall robustness and flexibility: reduces the dependence on availability of specific resources
General theme of using Machine Learning for fast development of various engines and increased portability
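A minimal sketch of the multi-engine idea, assuming each engine returns a translation together with a confidence score and that the combiner simply keeps the best-scoring hypothesis for the whole sentence. The slides do not say how engine outputs are combined, so the selection rule, the engine names, the toy Spanish data, and the scores below are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Hypothesis = Tuple[str, float]            # (translation, confidence in [0, 1])
Engine = Callable[[str], Optional[Hypothesis]]

# Toy resources, invented for illustration only.
GLOSSARY = {"buenas noches": "good night"}
DICTIONARY = {"casa": "house", "roja": "red", "agua": "water"}


def glossary_engine(sentence: str) -> Optional[Hypothesis]:
    """High-precision engine: answers only when the whole phrase is known."""
    hit = GLOSSARY.get(sentence.lower())
    return (hit, 0.9) if hit else None


def word_by_word_engine(sentence: str) -> Optional[Hypothesis]:
    """Low-precision backup: translates word by word, keeps unknown words."""
    words = sentence.lower().split()
    if not words:
        return None
    output = [DICTIONARY.get(w, w) for w in words]
    covered = sum(w in DICTIONARY for w in words)
    # Assumed scoring: never more confident than 0.5, scaled by coverage.
    return (" ".join(output), 0.5 * covered / len(words))


def combine(sentence: str, engines: List[Engine]) -> Optional[Hypothesis]:
    """Run every engine and keep the highest-scoring hypothesis, if any."""
    hypotheses = [h for h in (engine(sentence) for engine in engines) if h]
    if not hypotheses:
        return None
    return max(hypotheses, key=lambda h: h[1])


ENGINES = [glossary_engine, word_by_word_engine]
```

With these toy engines, `combine("Buenas noches", ENGINES)` takes the glossary answer, while `combine("casa roja", ENGINES)` falls back to the word-by-word engine because the glossary has no entry. That fallback is the robustness argument on the slide: no single resource has to cover every input.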

Uni-directional Translation of Text
Main engine: Generalized Example-based MT
  – acquired bilingual corpus of example translations
  – semantic and syntactic generalizations allow translation of sentences and phrases similar to those in the corpus
Backup and support engines: bilingual glossaries and dictionaries
Builds on previous rapid-deployment MT work on DIPLOMAT: Serbo-Croatian, Haitian Creole
Required NLP tools in support of translation engines: POS tagger, morphological analyzer, shallow chunk parser

Generalized EBMT
Input sentence matched to foreign side of corpus
Sub-sentential alignment done using bilingual dictionary
Quality scores currently assigned heuristically
Unchosen edges remain in lattice, available through GUI
Dictionary can be statistically derived from corpus
“Generalized” EBMT allows complex equivalence classes
[Diagram labels: Input Sentence S; Foreign Side of Corpus; English Side of Corpus; Sentence Pairs containing matches to subsets of S; Lattice of Quality-Scored Translation Hypotheses; English Trigram Language Model; Top Best Path of Hypotheses]
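The matching and lattice steps on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the system's actual algorithm: the alignment rule (smallest target span covering dictionary translations), the squared-length score, and the toy Spanish-English data are all hypothetical, and the trigram language model is left out, so the best path is chosen on heuristic scores alone.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int, str, float]   # (start, end, translation, score)


def align_span(src_span: List[str], tgt_words: List[str],
               dictionary: Dict[str, Set[str]]) -> Optional[str]:
    """Smallest contiguous target span covering dictionary translations."""
    positions = [k for k, t in enumerate(tgt_words)
                 if any(t in dictionary.get(s, set()) for s in src_span)]
    if not positions:
        return None
    return " ".join(tgt_words[min(positions):max(positions) + 1])


def build_lattice(sentence: str, corpus: List[Tuple[str, str]],
                  dictionary: Dict[str, Set[str]]) -> List[Edge]:
    """One edge per input span that also occurs in a corpus source sentence."""
    words = sentence.split()
    edges: List[Edge] = []
    for src, tgt in corpus:
        src_words, tgt_words = src.split(), tgt.split()
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                span, n = words[i:j], j - i
                if any(src_words[k:k + n] == span
                       for k in range(len(src_words) - n + 1)):
                    translation = align_span(span, tgt_words, dictionary)
                    if translation is not None:
                        # Assumed heuristic: longer matches score higher.
                        edges.append((i, j, translation, float(n * n)))
    return edges


def best_path(n_words: int,
              edges: List[Edge]) -> Optional[Tuple[float, List[str]]]:
    """Highest-scoring sequence of edges covering the whole input."""
    best: List[Optional[Tuple[float, List[str]]]] = [None] * (n_words + 1)
    best[0] = (0.0, [])
    for j in range(1, n_words + 1):
        for start, end, translation, score in edges:
            if end == j and best[start] is not None:
                candidate = (best[start][0] + score,
                             best[start][1] + [translation])
                if best[j] is None or candidate[0] > best[j][0]:
                    best[j] = candidate
    return best[n_words]


CORPUS = [("la casa roja", "the red house"), ("veo el gato", "i see the cat")]
DICTIONARY = {"la": {"the"}, "casa": {"house"}, "roja": {"red"},
              "veo": {"see"}, "el": {"the"}, "gato": {"cat"}}
LATTICE = build_lattice("veo la casa roja", CORPUS, DICTIONARY)
RESULT = best_path(4, LATTICE)
```

Here `RESULT` combines the fragment "see" from one example with "the red house" from another. Shorter edges such as "red house" stay in `LATTICE` even though the best path does not use them, which mirrors the slide's point that unchosen edges remain available.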

GEBMT vs. Statistical MT
GEBMT uses examples at run time, rather than training a parameterized model. Thus:
  – GEBMT can work with a smaller parallel corpus than Stat MT
  – Large target language corpus still useful for generating target language model
  – Much faster to “train” (index examples) than Stat MT; until recently was much faster at run time as well
  – Generalizes in a different way than Stat MT (whether this is better or worse depends on match between statistical model and reality):
      Stat MT can fail on a training sentence, while GEBMT never will
      GEBMT generalizations based on linguistic knowledge, rather than statistical model design
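To make the "training is indexing" point concrete, here is a small sketch of an example store. The class name, its methods, and the toy data are hypothetical. The sketch only shows why adding examples is cheap and why a sentence already in the corpus can always be returned verbatim, not how the real system indexes its corpus.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class ExampleIndex:
    """Stores sentence pairs and an inverted index from source words to pairs."""

    def __init__(self) -> None:
        self.pairs: List[Tuple[str, str]] = []
        self.by_word: Dict[str, Set[int]] = defaultdict(set)
        self.exact: Dict[str, str] = {}

    def add(self, source: str, target: str) -> None:
        """'Training' is just recording the pair and updating the index."""
        pair_id = len(self.pairs)
        self.pairs.append((source, target))
        self.exact[source] = target
        for word in source.split():
            self.by_word[word].add(pair_id)

    def translate_exact(self, sentence: str) -> Optional[str]:
        """A sentence seen during indexing always has its stored translation."""
        return self.exact.get(sentence)

    def candidates(self, sentence: str) -> List[Tuple[str, str]]:
        """Pairs sharing at least one word with the input, for partial matching."""
        ids: Set[int] = set()
        for word in sentence.split():
            ids |= self.by_word.get(word, set())
        return [self.pairs[i] for i in sorted(ids)]


index = ExampleIndex()
index.add("la casa roja", "the red house")
index.add("la casa grande", "the big house")
```

Every indexed source sentence comes back unchanged from `translate_exact`, whereas an unseen input such as "una casa" only yields candidate examples for partial matching.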

Bi-directional Translation of Conversational Language
Main engine: trainable interlingua-based translation engine
  – phrase-level analysis done using rule-based robust parser (SOUP)
  – higher-level analysis into interlingua representation done using a trained classifier
  – generation done using simple rule-based template text generator

Trainable Interlingua Analysis
[Diagram labels: Input Sentence; SOUP Parser; Phrase Analyses; Trained Classifier; Interlingua DA]
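The two-stage analysis can be sketched as a rule-based phrase spotter feeding a trained classifier. Everything concrete here is an assumption made for illustration: the regular expressions stand in for the SOUP grammar, the count-based classifier stands in for whatever learner the project used, and the labels and output dictionary are not the project's interlingua format.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical phrase-level rules (a stand-in for a real parser grammar).
PHRASE_RULES = [
    ("greeting", re.compile(r"\b(hello|good morning)\b")),
    ("request", re.compile(r"\b(i need|i want|can i have)\b")),
    ("item", re.compile(r"\b(water|food|medicine)\b")),
    ("question", re.compile(r"\b(where is|where are)\b")),
    ("place", re.compile(r"\b(hospital|bridge|school)\b")),
]
ARGUMENT_LABELS = {"item", "place"}


def phrase_analyses(sentence: str) -> List[Tuple[str, str]]:
    """Stage 1: find labelled phrases, returned in left-to-right order."""
    text = sentence.lower()
    found = []
    for label, pattern in PHRASE_RULES:
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group(0)))
    return [(label, value) for _, label, value in sorted(found)]


class DomainActionClassifier:
    """Stage 2: picks the action whose training examples share most labels."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, examples: List[Tuple[List[str], str]]) -> None:
        for labels, action in examples:
            self.counts[action].update(labels)

    def predict(self, labels: List[str]) -> str:
        def score(action: str) -> int:
            return sum(self.counts[action][label] for label in labels)
        # Ties go to the alphabetically first action.
        return max(sorted(self.counts), key=score)


def analyze(sentence: str, classifier: DomainActionClassifier) -> dict:
    """Full pipeline: phrases, then a domain action plus its arguments."""
    phrases = phrase_analyses(sentence)
    action = classifier.predict([label for label, _ in phrases])
    arguments = {label: value for label, value in phrases
                 if label in ARGUMENT_LABELS}
    return {"domain_action": action, "arguments": arguments}


classifier = DomainActionClassifier()
classifier.train([
    (["greeting"], "greet"),
    (["request", "item"], "request-item"),
    (["question", "place"], "ask-location"),
])
```

For example, `analyze("I need water", classifier)` yields the action `request-item` with the argument `item = water`. Under this split, porting to a new language means writing new phrase rules and collecting new training examples, while the action inventory stays fixed.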

Bi-directional Translation of Conversational Language
Interlingua concepts pre-designed to cover task-oriented language in a limited set of domains
English analysis and generation developed in advance
Generation grammar for a new language can be developed in about two weeks
Phrase-level analysis grammar for a new language can be developed in about 1-2 months

Summarization
Performed in the foreign language, prior to translation
Summarization performed using the MMR (Maximal Marginal Relevance) approach
Necessary NLP tools for summarization: POS tagger, morphological analyzer, shallow phrase-level parser
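A minimal sketch of MMR-style sentence selection, assuming bag-of-words cosine similarity over whitespace tokens and a query that expresses what the reader finds interesting. The function names, the `lam` trade-off parameter, and the English toy sentences are illustrative. A real foreign-language summarizer would apply the tagger and morphological analyzer listed on the slide before computing similarity.

```python
import math
from collections import Counter
from typing import List


def bag(text: str) -> Counter:
    """Bag-of-words vector over lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_summary(sentences: List[str], query: str, k: int,
                lam: float = 0.5) -> List[str]:
    """Greedily pick k sentences that are relevant to the query but not
    redundant with sentences already picked; return them in document order."""
    vectors = [bag(s) for s in sentences]
    query_vector = bag(query)
    selected: List[int] = []
    remaining = list(range(len(sentences)))

    def mmr_score(i: int) -> float:
        relevance = cosine(vectors[i], query_vector)
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                         default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [sentences[i] for i in sorted(selected)]


SENTENCES = [
    "the bridge is closed",
    "the bridge is closed today",
    "food is needed at the school",
]
SUMMARY = mmr_summary(SENTENCES, "bridge closed food", k=2, lam=0.5)
```

With `lam=0.5` the near-duplicate second sentence is skipped in favor of the third, which adds new information. With `lam=1.0` redundancy is ignored and the two bridge sentences are both selected. Passing only the selected sentences to the translation engine is what focuses translation effort on interesting material.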