LingWear: Language Technology for the Information Warrior. Alex Waibel, Lori Levin, Alon Lavie, Robert Frederking. Carnegie Mellon University.

1 LingWear: Language Technology for the Information Warrior
Alex Waibel, Lori Levin, Alon Lavie, Robert Frederking
Carnegie Mellon University

2 LingWear Scenario
Military personnel in the field:
  – Civil Affairs units deployed in a foreign-language environment that need to communicate with the local population
  – Allied command post: communication and exchange of documents among multilingual allied forces
Non-military humanitarian mission forces

3 LingWear - Main Goals
Language Technology for the Information Warrior
Assimilation of foreign-language information in a variety of forms:
  – Written documents
  – Transcribed speech
  – Signs
Bi-directional translation in support of conversational communication
Emphasis on portability and rapid deployment, consistent with the DARPA Quick MT vision

4 LingWear - Main Tasks
For Assimilation: uni-directional translation from the source language to English
Summarization in the source language prior to translation:
  – focus translation effort only on interesting material
  – minimize translation error
For Conversational Interaction: bi-directional translation, but in limited task-oriented domains

5 Scientific Approach
Pursue a Multi-Engine Approach:
  – several different translation engines with different strengths and weaknesses
  – the combination can leverage the strengths of all available engines
  – overall robustness and flexibility: reduces dependence on the availability of specific resources
General theme: use Machine Learning for fast development of the various engines and for increased portability
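The multi-engine idea can be illustrated with a minimal sketch. It is not the LingWear implementation: the function `translate_multi_engine`, the two toy engines, their confidence values, and the assumption that scores from different engines are directly comparable are all hypothetical, chosen only to show how a combination can fall back on whichever engine has something to offer.

```python
from typing import Callable, Optional, Tuple

# An engine takes a source sentence and returns (translation, confidence),
# or None when it has nothing to offer. This interface is an assumption.
Engine = Callable[[str], Optional[Tuple[str, float]]]

# Toy resources (illustrative only).
EXAMPLES = {"bonjou": "hello"}                       # whole-sentence examples
GLOSSARY = {"bonjou": "good day", "dlo": "water"}    # word-level glossary

def example_engine(sentence: str) -> Optional[Tuple[str, float]]:
    """Stand-in for an example-based engine: exact match on a stored example."""
    hit = EXAMPLES.get(sentence)
    return (hit, 0.9) if hit is not None else None

def glossary_engine(sentence: str) -> Optional[Tuple[str, float]]:
    """Stand-in for a backup engine: word-by-word glossary lookup."""
    words = sentence.split()
    if not words or any(w not in GLOSSARY for w in words):
        return None
    return (" ".join(GLOSSARY[w] for w in words), 0.3)

def translate_multi_engine(sentence, engines):
    """Ask every engine and keep the highest-confidence answer, if any."""
    candidates = []
    for name, engine in engines:
        result = engine(sentence)
        if result is not None:
            text, score = result
            candidates.append((score, name, text))
    if not candidates:
        return None
    score, name, text = max(candidates, key=lambda c: c[0])
    return name, text

ENGINES = [("example", example_engine), ("glossary", glossary_engine)]
```

With these toy engines, a sentence covered by a stored example is answered by the example engine, while a sentence known only to the glossary still gets a translation from the backup engine.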

6 Uni-directional Translation of Text
Main engine: Generalized Example-based MT
  – acquired bilingual corpus of example translations
  – semantic and syntactic generalizations allow translation of sentences and phrases similar to those in the corpus
Backup and support engines: bilingual glossaries and dictionaries
Builds on previous rapid-deployment MT work on DIPLOMAT: Serbo-Croatian, Haitian-Creole
Required NLP tools in support of translation engines: POS tagger, morphological analysis, shallow chunk parser

7 Generalized EBMT
Input sentence matched to foreign side of corpus
Sub-sentential alignment done using bilingual dictionary
Quality scores currently assigned heuristically
Unchosen edges remain in lattice, available through GUI
Dictionary can be statistically derived from corpus
“Generalized” EBMT allows complex equivalence classes
[Diagram labels: Foreign Side of Corpus; English Side of Corpus; Input Sentence S; Sentence Pairs containing matches to subsets of S; Lattice of Quality-Scored Translation Hypotheses; English Trigram Language Model; Top Best Path of Hypotheses]
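Selecting a best path through a lattice of scored hypotheses can be sketched with a short dynamic program. This is a simplified illustration under stated assumptions, not the GEBMT decoder: the `Edge` type and `best_path` function are hypothetical, path quality is taken to be the plain sum of edge scores, and the English trigram language model named on the slide is left out.

```python
from typing import List, NamedTuple, Optional, Tuple

class Edge(NamedTuple):
    start: int    # index of the first input word covered
    end: int      # index one past the last input word covered (end > start)
    text: str     # candidate English translation of that span
    score: float  # heuristic quality score, higher is better

def best_path(edges: List[Edge], n_words: int) -> Optional[Tuple[float, List[Edge]]]:
    """Return (total score, edges) for the best full cover of the input,
    or None if the edges cannot cover all n_words positions."""
    # best[i] holds the best-scoring way to cover input words [0, i).
    best: List[Optional[Tuple[float, List[Edge]]]] = [None] * (n_words + 1)
    best[0] = (0.0, [])
    for i in range(n_words):
        if best[i] is None:
            continue  # position i is unreachable
        for e in edges:
            if e.start != i or e.end > n_words:
                continue
            cand = (best[i][0] + e.score, best[i][1] + [e])
            if best[e.end] is None or cand[0] > best[e.end][0]:
                best[e.end] = cand
    return best[n_words]

# Toy lattice for a three-word input (scores are made up).
LATTICE = [
    Edge(0, 1, "the", 0.5),
    Edge(1, 2, "red", 0.5),
    Edge(2, 3, "house", 0.5),
    Edge(0, 2, "the red", 1.5),
    Edge(1, 3, "red house", 1.2),
]
```

Edges that are not on the winning path are never discarded from `LATTICE`, which mirrors the slide's point that unchosen edges stay available for inspection.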

8 GEBMT vs. Statistical MT
GEBMT uses examples at run time, rather than training a parameterized model. Thus:
  – GEBMT can work with a smaller parallel corpus than Stat MT
  – Large target language corpus still useful for generating target language model
  – Much faster to “train” (index examples) than Stat MT; until recently was much faster at run time as well
  – Generalizes in a different way than Stat MT (whether this is better or worse depends on match between statistical model and reality):
Stat MT can fail on a training sentence, while GEBMT never will
GEBMT generalizations based on linguistic knowledge, rather than statistical model design

9 Bi-directional Translation of Conversational Language
Main engine: trainable interlingua-based translation engine
  – phrase-level analysis done using rule-based robust parser (SOUP)
  – higher-level analysis into interlingua representation done using a trained classifier
  – generation done using simple rule-based template text generator

10 Trainable Interlingua Analysis
[Diagram labels: Input Sentence; SOUP Parser; Phrase Analyses; Trained Classifier; Interlingua DA]
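The three stages described for the conversational engine (rule-based phrase analysis, a trained classifier that maps phrase analyses to an interlingua label, and template-based generation) can be sketched end to end. Everything below is a toy stand-in: the regular-expression rules are not the SOUP grammar, the nearest-example classifier is only one possible choice of trainable classifier, and the labels, training pairs, and templates are hypothetical.

```python
import re

# Stage 1: phrase-level analysis with hand-written rules (hypothetical).
PHRASE_RULES = [
    ("greeting", re.compile(r"\b(hello|hi)\b")),
    ("request", re.compile(r"\b(i need|i would like)\b")),
    ("room", re.compile(r"\b(room|bed)\b")),
    ("price", re.compile(r"\bhow much\b")),
]

def parse_phrases(sentence):
    """Return the labels of all phrase rules that match the sentence."""
    text = sentence.lower()
    return [label for label, pattern in PHRASE_RULES if pattern.search(text)]

# Stage 2: a "trained" classifier, here nearest example by Jaccard overlap.
TRAINING = [
    ({"greeting"}, "greet"),
    ({"request", "room"}, "request-room"),
    ({"price", "room"}, "ask-price"),
]

def classify(phrases):
    """Map a list of phrase labels to the label of the closest training example."""
    observed = set(phrases)
    if not observed:
        return None
    def overlap(example):
        labels, _ = example
        return len(observed & labels) / len(observed | labels)
    return max(TRAINING, key=overlap)[1]

# Stage 3: template-based generation from the interlingua label.
TEMPLATES = {
    "greet": "Hello.",
    "request-room": "I would like a room.",
    "ask-price": "How much does the room cost?",
}

def translate(sentence):
    """Run analysis, classification, and generation; None if analysis fails."""
    label = classify(parse_phrases(sentence))
    return TEMPLATES.get(label) if label is not None else None
```

Because the classifier and the templates only see the interlingua label, supporting another language in this sketch would mean supplying new phrase rules for analysis or new templates for generation, which is the division of labor the slides describe.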

11 Bi-directional Translation of Conversational Language
Interlingua concepts pre-designed to cover task-oriented language in a limited set of domains
English analysis and generation developed in advance
Generation grammar for a new language can be developed in about two weeks
Phrase-level analysis grammar for a new language can be developed in about 1-2 months

12 Summarization
Performed in the foreign language, prior to translation
Summarization performed using the MMR (Maximal Marginal Relevance) approach
Necessary NLP tools for summarization: POS tagger, morphological analyzer, shallow phrase-level parser
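MMR-style sentence selection can be sketched as a greedy loop that trades relevance to a query against redundancy with sentences already chosen. This is a minimal sketch, not the system's summarizer: the function names, the word-overlap (Jaccard) similarity, the whitespace tokenization, and the default weight `lam` are assumptions made for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings (an assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def mmr_select(sentences, query, k, lam=0.7):
    """Greedily pick up to k sentences, scoring each candidate as
    lam * relevance - (1 - lam) * max similarity to already selected ones."""
    selected = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def mmr_score(s):
            relevance = jaccard(s, query)
            redundancy = max((jaccard(s, t) for t in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

DOCUMENT = [
    "troops crossed the river",
    "troops crossed the river today",
    "the bridge was closed",
]
```

Calling `mmr_select(DOCUMENT, "troops river bridge", k=2, lam=0.5)` keeps the first sentence and then prefers the bridge sentence over the near-duplicate, since the redundancy penalty outweighs the duplicate's higher relevance. In the setting the slide describes, the sentences and query would be in the foreign language, which is why tokenization-level tools are listed as prerequisites.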

