A Syntactic Translation Memory Vincent Vandeghinste Centre for Computational Linguistics K.U.Leuven


1 A Syntactic Translation Memory
Vincent Vandeghinste
Centre for Computational Linguistics, K.U.Leuven
vincent@ccl.kuleuven.be

2 Translation
In a globalizing world there is a growing demand for translation
Multilingual websites
  – frequent updates
  – user content
Shortage of human translators
Speed up the translation process by means of TECHNOLOGY

3 Tools of the traditional translator
Dictionaries
  – general dictionaries
  – domain specific
Grammar, spelling
Traditional filing system
Reference material
  – similar documents
  – previous versions of the same document

4 Tools of the contemporary translator
Similar to traditional translator, but electronically
  – Online dictionaries, dictionaries on CD-ROM
  – Word processors with spell and grammar checkers
  – Files with client specific terminology (Excel)
  – Reference material (PDF)

5 Tools of the contemporary translator
CAT-Tools (Computer Aided Translation)
  – Terminology tools: keep track of, list and recognize terms
  – Concordance tools: look up text in a corpus of translated documents
  – Translation Memories

6 Translation Memories
A dictionary of sentences
Segmentation problem: what is a sentence?
  – punctuation, capitalization, lay-out
Retrieval of sentences
  – Same sentence = 100% match
      Sometimes even better: whole paragraphs
      What counts as a difference? spaces, capitals, punctuation, lay-out
  – Similar sentence = fuzzy match
      How much difference? “Edit distance”
      Edit distance is measured in characters/words

7 Examples of fuzzy matches
Sentence                        ED (words)   ED (chars)   ED
A cat sat on the mat            = original
A dog sat on the ground         66%          60%          65%
The cat sits on a mat           50%          53%          50%
On the mat sat a cat            50%          53%          50%
Cats are sitting on the mats    33%          33%          33%
ED = 90% word diff + 10% char diff
Issues:
  – no sentence reaches the threshold of 70%
  – every word counts the same
  – word variants are not recognized
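
The weighted score on this slide can be sketched in a few lines of Python. This is a minimal sketch under assumptions the slides do not state: whitespace tokenization, case-sensitive comparison, and similarity computed as 1 minus the Levenshtein distance divided by the length of the longer sequence. The function names (`levenshtein`, `similarity`, `fuzzy_match`) are illustrative and are not taken from any particular translation memory product, so the exact percentages on the slide may not be reproduced.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute: cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def fuzzy_match(source, candidate, word_weight=0.9):
    """Weighted mix of word-level and character-level similarity (90% / 10%)."""
    word_score = similarity(source.split(), candidate.split())
    char_score = similarity(source, candidate)
    return word_weight * word_score + (1.0 - word_weight) * char_score


original = "A cat sat on the mat"
for candidate in ["A dog sat on the ground", "The cat sits on a mat",
                  "On the mat sat a cat", "Cats are sitting on the mats"]:
    print(f"{candidate!r}: {fuzzy_match(original, candidate):.0%}")
```

Because the same `levenshtein` function accepts both strings and lists of words, the word-level and character-level scores share one implementation. Under these assumptions, "A dog sat on the ground" differs from the original by two word substitutions out of six, which stays below the 70% threshold even though only two nouns changed.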

8 Machine Translation as a Tool
When is MT a useful tool for the translator?
  – respect client specific terminology
  – translations are comparable with matches from translation memory
  – consistency
  – speed

9 Types of MT
Rule-based
  – using syntax / linguistic knowledge
  – transformations on tree structures / interlingua
  – hand-made rules vs. induced rules
Statistical
  – using no syntax, no linguistics
  – transformations on strings (flat)
  – induced translation models and language models

10 Parse and Corpus-based MT
Syntactic machine translation
Rule-based
Data-driven: rules are induced from parallel corpus
  – general domain, publicly available (including Dutch)
      Proceedings European Parliament
      Translation Memory of DG-Translation (EU)
      OPUS corpus (subtitles, Open source manuals...)
  – private translation memories

11 PaCo-MT
match source analysis with database of translation rules
  – when match is found, translation is found
  – when no match is found, recursively try smaller subtree, and use back-off models to connect translations of found subtrees
back-off models have a lower accuracy
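
The recursive matching described on this slide can be illustrated with a toy sketch. Everything here is an assumption made for illustration: trees are nested tuples of the form `(label, child, ...)` with words as leaves, the rule database is a plain dictionary from source subtree to target string, and the `backoff` argument is a hypothetical stand-in for the back-off models. The actual PaCo-MT rule format and back-off models are not described in the slides.

```python
def translate(tree, rules, backoff):
    """Translate a source subtree top-down.

    Returns (translation, used_backoff). If the whole subtree matches a rule,
    its translation is used directly; otherwise the children are translated
    recursively and `backoff` connects the partial translations.
    """
    if tree in rules:
        return rules[tree], False
    if isinstance(tree, str):
        # Unknown word with no rule: pass it through unchanged.
        return tree, True
    label, *children = tree
    parts = [translate(child, rules, backoff)[0] for child in children]
    return backoff(label, parts), True


def join_in_source_order(label, parts):
    """Hypothetical back-off: keep the source order of the translated parts."""
    return " ".join(parts)


# Toy English-to-Dutch rule database (illustrative, hand-written).
rules = {
    ("NP", "a", "cat"): "een kat",
    ("PP", "on", ("NP", "the", "mat")): "op de mat",
    "sat": "zat",
}

sentence = ("S",
            ("NP", "a", "cat"),
            ("VP", "sat", ("PP", "on", ("NP", "the", "mat"))))

print(translate(sentence, rules, join_in_source_order))
```

In this toy example neither the full sentence nor the verb phrase has a rule of its own, so the function falls back to the smaller subtrees and joins their translations. The returned flag records that a back-off step was needed, which is where accuracy is expected to drop.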

12 PaCo-TM
match source analysis with database of translation rules
  – when match is found, translation is found
  – when no match is found, recursively try smaller subtree, and use human judgement to connect translations of found subtrees
human judgement = translator judgement

13 PaCo-TM
uses existing translations as source material (client specific)
  – specific terminology
  – the more translations RESEMBLE translations in the source material, the better the translation
syntactic analysis of source and target sentences
  – categorization of words: verbs, adjectives, nouns...
  – word clusters: noun phrase, infinitival phrase
  – lemmatization

14 Examples of syntactic fuzziness
RESEMBLE = syntactical resemblance
Fuzzy: not in characters or words
A cat sat on the mat            = original
A dog sat on the ground         = same syntax / insert translation of 'dog'
The cat sits on a mat           = same syntax / present tense / indef. mat
On the mat sat a cat            = same syntax / different order
Cats are sitting on the mats    = different structure / different phrases / different number for subject and object
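
A rough way to see why these sentences resemble each other syntactically is to compare word categories instead of the words themselves. The sketch below is a simplification and not the PaCo-TM method: it uses a hand-written toy tag lexicon (`TOY_TAGS`, an assumption) and flat category sequences, whereas the slides describe full syntactic analysis with phrases and lemmatization.

```python
from collections import Counter

# Hand-written toy lexicon (assumption): just enough words for the slide's examples.
TOY_TAGS = {
    "a": "DET", "the": "DET",
    "cat": "NOUN", "cats": "NOUN", "dog": "NOUN",
    "mat": "NOUN", "mats": "NOUN", "ground": "NOUN",
    "sat": "VERB", "sits": "VERB", "sitting": "VERB",
    "are": "AUX", "on": "PREP",
}


def categories(sentence):
    """Map each word to its category, ignoring capitalization."""
    return [TOY_TAGS[word.lower()] for word in sentence.split()]


def syntactic_match(original, candidate):
    """Compare two sentences on word categories rather than on word forms."""
    a, b = categories(original), categories(candidate)
    if a == b:
        return "same categories, same order"
    if Counter(a) == Counter(b):
        return "same categories, different order"
    return "different structure"


original = "A cat sat on the mat"
for candidate in ["A dog sat on the ground", "The cat sits on a mat",
                  "On the mat sat a cat", "Cats are sitting on the mats"]:
    print(f"{candidate!r}: {syntactic_match(original, candidate)}")
```

Even this crude comparison groups the first two candidates with the original, flags the third as a reordering, and separates the fourth, which is the distinction that character- or word-based edit distance fails to make.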

15 Translation memories vs. PaCo-MT
Translation memory offers translator information
  – Translator makes choices
  – Inflexible fuzzy matching
PaCo-MT uses information
  – PaCo-MT makes choices
  – Syntactic flexibility in fuzzy matching
  – Confidence metrics can be used as threshold

16 Conclusions
Flexibility of translation memories can be improved
  – syntactic
  – replace words (partial MT)
Translator can set thresholds
  – depending on amount of domain data
  – depending on language pair
Continuum: full manual / translation memory / full machine translation

17 Thank you

