Creating a Term Base to Customize an MT System: Reusability of Resources and Tools from the Translator’s Point of View Natalie Kübler Intercultural Centre.


1 Creating a Term Base to Customize an MT System: Reusability of Resources and Tools from the Translator’s Point of View Natalie Kübler Intercultural Centre for Studies in Lexicology

2 Objectives  Introducing available resources, tools, and MT in translation training  Testing customisable MT as a time-saving tool for « industrial » translation  Using simple tools and immediately available resources to improve MT translation results

3 Translation training  Post-graduate students in language industry (LI) and specialised translation (ST):  Translation, linguistics, localisation, technical writing  Dreamweaver, Catalyst, HTML, XML, SQL, UNIX, translation memory, etc.  Semi-professional: every other week with a private company in translation or language industry  Corpus linguistics and applications to terminology and translation => project in ST (HOWTO) + LI (analysis + feedback to Systran)

4 Experiment Translating some as yet untranslated Linux HOWTOs, using an MT system  subdomain of computing  Highly specialised texts  written by computer experts – and not technical writers – for computer experts  Translated by French-speaking computer experts + Translating computing dictionary entries

5 Systranet  Systran’s on-line customisable service  Domain-specific dictionaries  User dictionaries:  Mono- or multitarget  « advanced » linguistic information  On-line source and target text alignment  Words not in any (Systran’s or user’s) dictionary  Words in the user’s dictionary

6 Resources + Tools  Headwords + equivalents + linguistic information  On-line technical bilingual glossaries  On-line term bases  Comparable and translation technical corpora  The Web as a corpus  Term extraction (Terminology Extractor)

7 Methodology  Step-one dictionary: extracting term candidates from the text  Creating and coding the step-one dictionary  First translation using the dictionary  Step-two dictionary: changing and/or adding linguistic information using Systranet’s alignment and color features + linguistic analysis (feedback)  Repeating step two until the dictionary is saturated

8 Web-based HOWTO glossary
Several French equivalents
boot, root disk = disquettes (d') amorce ou de démarrage, racine
browser = butineur, navigateur, arpenteur
buffer = tampon
to build = bâtir
currently = actuellement
feedback = comment contacter l'auteur, retour d'information
A.D.S.L. (noun) = raccordement numérique asymétrique

9 Step 1: Terminology Extractor  French and English dictionaries  Morphological analysis  Stop words  Collocations: sequence of 2 to 10 words repeated at least once  Non-words  Concordances
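
The collocation criterion on this slide (a sequence of 2 to 10 words that is repeated at least once, i.e. occurs at least twice) can be sketched in Python. This is only an illustration of the criterion, not Terminology Extractor's actual algorithm: the function name, the tiny stop-word list, and the rule that a candidate may not begin or end with a stop word are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "on", "to", "and", "is", "for", "in", "at"}

def repeated_sequences(text, min_len=2, max_len=10, stop_words=STOP_WORDS):
    """Return word sequences of min_len..max_len words that occur at least twice."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Assumed filter: skip candidates that start or end with a stop word.
            if gram[0] in stop_words or gram[-1] in stop_words:
                continue
            counts[" ".join(gram)] += 1
    # "Repeated at least once" means two or more occurrences.
    return {seq: c for seq, c in counts.items() if c >= 2}

sample = ("The Linux box acts as a name server. "
          "Restart the name server on the Linux box.")
candidates = repeated_sequences(sample)
print(candidates)
```

On a real HOWTO the output would still need manual filtering, since repeated sequences such as « server will start » are not terms.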

10 TE non-words
Debian, Netscape, accellerate, Permedia, Dennis, XFCE, RedHat, Dialogs, Corel, RgbPath, FAQs, anoying, ServerFlags, Howto, Microdoft, ServerLayour, README, Linux, XkbLayout, XkbModel, RealAudio, Solaris, ISA, degredation, UI, KDE, GUI, USB, LeftOf, IRQs, WindowMaker, ModulePath, NFS

11 TE collocations (with number of occurrences)
Internet Gateway 3
{ Looking look } at the Network 3
IP aliasing 3
name server 4
ISA { card cards } 3
Network { Device devices } 4
latest version 3
Linux computer 3
DHCP Server 15
IP { addresses address } 16
Linux gateway 3
Linux box 16
modules file 3
card on the Linux box 4
scripts / ifcfg 3
DNS { Server servers } 17
server will start 3
interface configuration file 3
{ Network networking } { Card Cards } 12

12 « Le grand dictionnaire terminologique »  Looking for French equivalents
ENGLISH: buffer (Syn. buffer storage, buffer memory, intermediate memory)
FRENCH: mémoire tampon n. f., tampon n. m., mémoire intermédiaire n. f., zone tampon n. f.

13 HOWTO translation corpus  English source – French translation  WALL: Web-based environment  Concordances with perl-like regexp  Paragraph alignment  French equivalents  lexicogrammatical information  semantic classes  « statistical » information in the domain
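
A keyword-in-context concordance driven by a regular expression, as mentioned on this slide, can be sketched as follows. This is not WALL's implementation: the `concordance` function, its `width` parameter, and the two-sentence sample (adapted from the HOWTO examples in this presentation) are assumptions for illustration, and Python's `re` syntax stands in for Perl's.

```python
import re

def concordance(text, pattern, width=30):
    """Return (left context, match, right context) for every regexp match."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

# Illustrative mini-corpus; a real query would run over the aligned HOWTO corpus.
corpus = ("Can I run 32-bit video games under dosemu? "
          "It was used to run Linux on a 386/16 MHz machine.")

# Any form of "run" followed by its next word, to inspect what the verb takes as object.
hits = concordance(corpus, r"\brun\w*\s+\S+", width=20)
for left, match, right in hits:
    print(f"{left:>20} [{match}] {right}")
```

Sorting such lines by the word to the right of the match is one way to make recurring object types (programs, operating systems) stand out.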

14 HOWTOs: equivalents
The daemon [ … ] listens to all messages on each network device
Le démon [ … ] écoute tous les messages sur chacun des périphériques réseau
All the Digital cards will autoprobe for their media
Toutes les cartes Digital effectueront la détection automatique du média
The latest source distribution can be FTPed from the directory ftp … or Mosaiced from http …
On peut charger la dernière version sur ftp … et sous Mosaic depuis http …
Called by the kernel when the card posts an interrupt.
Appelé par le noyau quand la carte déclenche une interruption

15 HOWTOs « semantic classes »
can I run 32-bit video games under dosemu
used to run Linux on a 386/16 MHz
( unless you want your modem to answer the phone
The static SLIP server will answer your modem call

16 WebCorp  The web as a corpus  Concordances : buffer, run* * * on  Updated information  More elements

17 buffer  me des débordements de buffer (tampon en français). Pour  com/advisories/bufero.html. Writing buffer overflow exploits – a tutorial for  de NOP. débordement de buffer dans le tas (heap buffer overflow)  (buffer overflow). débordement de buffer sous windows (et oui ;-)) --[

18 Customized dictionary
« Advanced » linguistic information, such as:
Part-of-speech information: noun, proper noun (product name, country, etc.), verb, adjective, sentence
Morphological information: URL (noun) (plural:URLs) / cache (noun)(masculine)
Lexicogrammatical information: access (verb)(noprep)=accéder (verb)(prep:à)
Basic semantic information: to run (verb)(context:OS); Unix (noun) (SEMCAT:OS)
Idioms: Your mileage may vary (sentence)

19 Dictionary Sample
"AT&T" (company name)
auto-dial (noun)=numérotation automatique (noun)
automatic number identification (noun)=identification de l'appelant (noun)
based (adjective)(noprep)=architecturé (adjective)(prep:autour)
basic language constructs (noun) (plural)=base de construction du langage (noun) (singular)
to log in (verb)=se loger (verb)
to introduce (verb) (context:extensions)=introduire
to carry (verb)(context:digital data)=transmettre (verb)
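
The entry notation used on the dictionary slides (a headword with parenthesised codes, optionally followed by `=` and a coded equivalent) is regular enough to be read by a small script, for instance to check entries before merging dictionaries. The parser below is a hypothetical sketch based only on the examples shown here; the `Side` class and the function names are invented for illustration and do not reflect Systranet's own import format or validation rules.

```python
import re
from dataclasses import dataclass, field

# One parenthesised code such as (verb) or (prep:à).
ATTR = re.compile(r"\(([^()]*)\)")

@dataclass
class Side:
    """One side of an entry: the term plus its linguistic codes."""
    term: str
    attributes: list = field(default_factory=list)

def parse_side(text):
    attributes = [a.strip() for a in ATTR.findall(text)]
    # Whatever remains after removing the codes is the term itself.
    term = ATTR.sub("", text).strip().strip('"')
    return Side(term, attributes)

def parse_entry(line):
    """Split 'source=target'; target is None for monolingual entries."""
    source, sep, target = line.partition("=")
    return parse_side(source), (parse_side(target) if sep else None)

src, tgt = parse_entry("access (verb)(noprep)=accéder (verb)(prep:à)")
print(src)
print(tgt)
```

Keeping the codes as plain strings is a deliberate simplification: it lets unknown codes pass through unchanged instead of guessing at their meaning.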

20 With Step-one dictionary This page contains a simple cookbook for setting up Red Hat 6.X as an internet gateway for a home network or small office network. Cette page contient un cookbook simple pour le chapeau rouge 6.X d'établissement en tant que Gateway d'Internet pour un réseau à la maison ou le petit réseau de bureau.

21 With Step-two dictionary This page contains a simple cookbook for setting up Red Hat 6.X as an internet gateway for a home network or small office network. Cette page contient des recettes simples pour installer Red Hat 6.X en tant que passerelle Internet pour un réseau domestique ou un petit réseau de bureau.

22 Error typology  Morphosyntax: subject-verb or noun-adjective agreement  Syntax:  POS ambiguity  NP: determiners, NP coordination  transformations/ellipsis/cleft sentences/PP attachment  Metacharacters  « Bugs »

23 Error examples (1)
I am not going => *je n'vais pas => je ne vais pas
the phase of the light through it => *la phase du dépassement léger par lui => la phase de la lumière qui les traverse.
decoded by specific individuals. => *décodée par les individus spécifiques. => décodée par des individus spécifiques.
A cable or ADSL connection => *un câble ou une connexion d’AADSL => Une connexion par câble ou ADSL

24 Error examples (2) When a user picks or is assigned a password, it is encoded with a randomly generated value called the salt. => *Quand un utilisateur sélectionne ou est généré un mot de passe, il est codé avec une valeur aléatoirement produite appelée le sel.

25 Conclusion  Translation results can be significantly improved by creating customised dictionaries  The tools mentioned here are user-friendly  But this implies a lot of work at the beginning + translators need training in linguistics and basic NLP.  Change of attitude towards MT + various tools, especially in the language-industry-oriented option

26 More things to be done..  Merging all dictionaries together into a « Systranet term base »  Translating more HOWTOs  Project with Systran: improve user coding  …

