European Patent Office Wolfgang Täger December 2006 European Patent Office European Machine Translation Programme.


1 European Machine Translation Programme
Wolfgang Täger, European Patent Office, December 2006

2 Overview
Programme Partners and Goals
MT engine
Dictionary format
Available corpora
Alignment & Extraction
Validation & Concordancing
DEMO

3 Programme Partners and Goals
Trigger: success of JP-EN patent translation
Agreement between the EPO and its member states:
1. MT of patents, abstracts and communications to/from English
2. Three language pairs per year
3. First three languages: FR, DE, ES
Candidates for next year: Swedish, Dutch, Italian, Romanian, Greek

4 MT engine
Trial with an SMT system (Language Weaver)
Call for tender: winner Worldlingo (Systran)
Going public (esp@cenet): December 2006
Needed: improve translation with specific dictionaries

5 Dictionary format
Desiderata:
open standard
XML-Unicode
support features of MT engines
support conditional translations (e.g. based on IPC)
Not intended for terminology: no definitions, and a lexical rather than semantic focus.
OLIF format was chosen.
How to get dictionaries? By bilingual term extraction!
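To illustrate what a conditional translation based on IPC could look like, here is a minimal Python sketch. It does not reproduce the OLIF schema or the behaviour of any MT engine: the `Entry` and `Translation` classes, the longest-prefix rule and the example IPC codes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Translation:
    target: str
    # Hypothetical condition: the translation applies only when the
    # document's IPC code starts with this prefix (None = default).
    ipc_prefix: Optional[str] = None


@dataclass
class Entry:
    source: str
    translations: List[Translation] = field(default_factory=list)

    def translate(self, ipc_code: str) -> Optional[str]:
        """Return the translation with the most specific matching IPC prefix."""
        best, best_len = None, -1
        for t in self.translations:
            prefix = t.ipc_prefix or ""
            if ipc_code.startswith(prefix) and len(prefix) > best_len:
                best, best_len = t, len(prefix)
        return best.target if best else None


# Illustrative entry: EN "spring" is usually DE "Feder", but "Quelle"
# in water-supply documents (IPC subclass E03B).
spring = Entry("spring", [Translation("Feder"), Translation("Quelle", "E03B")])
print(spring.translate("F16F1/02"))
print(spring.translate("E03B3/00"))
```

The default translation carries no condition, so it is used whenever no IPC-specific alternative matches.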

6 Available corpora
560,000 EP-B publications => claims in EN, DE, FR
300,000 DE-T2 publications
37,000 ES-B3/T3 publications
=> Align corpora for term extraction, concordancing, translation memory (and SMT)
[Diagram: EP-B1 with claims (CL) in EN, FR and DE and description (DESC) in EN or FR or DE; DE-T2 with (CL DE) and DESC DE; ES-B3/T3 (LaTeX) with CL ES and DESC ES]

8 Alignment & Extraction
Alignment: trial at the EPO with internally developed software.
The result was not improved upon by external companies during the call for tender.

9 Alignment & Extraction
Call for tender for bilingual term extraction. Winner: DFKI
1. Alignment of corpora, POS tagging, identification of terms
2. Pairing of terms using clues such as co-occurrence score, string similarity, grammatical clues, position, available dictionaries, ...
3. Providing further information such as gender, inflection, transitivity, countability, ...
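Step 2 above can be sketched in Python using two of the listed clues, co-occurrence score and string similarity. This is not DFKI's method: the Dice coefficient as co-occurrence score, `difflib.SequenceMatcher` as string similarity, the `weight` parameter and the toy data are all assumptions chosen for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import product


def dice_scores(aligned_terms):
    """aligned_terms: list of (source_term_set, target_term_set),
    one pair of sets per aligned segment."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in aligned_terms:
        src_count.update(src_terms)
        tgt_count.update(tgt_terms)
        pair_count.update(product(src_terms, tgt_terms))
    # Dice coefficient: 2 * joint count / (source count + target count)
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def pair_terms(aligned_terms, weight=0.8):
    """For each source term, pick the target term with the best
    weighted mix of co-occurrence score and string similarity."""
    best = {}
    for (s, t), dice in dice_scores(aligned_terms).items():
        sim = SequenceMatcher(None, s.lower(), t.lower()).ratio()
        combined = weight * dice + (1 - weight) * sim
        if s not in best or combined > best[s][1]:
            best[s] = (t, combined)
    return {s: t for s, (t, _) in best.items()}


# Toy aligned segments with already-identified candidate terms.
segments = [
    ({"valve", "housing"}, {"Ventil", "Gehäuse"}),
    ({"valve", "spring"}, {"Ventil", "Feder"}),
    ({"housing", "spring"}, {"Gehäuse", "Feder"}),
]
print(pair_terms(segments))
```

A real extractor would add the other clues from the list (grammatical clues, position, available dictionaries) and would keep several ranked candidates per term for later validation rather than a single best match.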

10 Validation & Concordancing
Development of an OLIF editor at the EPO
Remove noise
Correct entries
Use a concordancer (provides statistics based on parallel corpora)
=> DEMO
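The idea of a concordancer that provides statistics from parallel corpora can be sketched as follows. This is not the EPO tool: the `concordance` function, its plain substring matching and the sample sentences are assumptions for illustration only.

```python
from collections import Counter


def concordance(corpus, source_term, candidates):
    """corpus: list of (source_sentence, target_sentence) pairs.
    Returns the pairs whose source side contains source_term, plus a
    count of how often each candidate translation appears on the
    target side of those pairs."""
    hits = [(s, t) for s, t in corpus if source_term.lower() in s.lower()]
    stats = Counter()
    for _, tgt in hits:
        for cand in candidates:
            if cand.lower() in tgt.lower():
                stats[cand] += 1
    return hits, stats


# Toy parallel corpus (EN-DE).
corpus = [
    ("The valve is closed by a spring.",
     "Das Ventil wird durch eine Feder geschlossen."),
    ("A spring biases the piston.",
     "Eine Feder spannt den Kolben vor."),
    ("The housing is made of steel.",
     "Das Gehäuse besteht aus Stahl."),
]
hits, stats = concordance(corpus, "spring", ["Feder", "Quelle"])
for src, tgt in hits:
    print(src, "|", tgt)
print(dict(stats))
```

A validator could use such counts to decide whether an extracted dictionary entry is noise or a genuine translation. Substring matching is a simplification that ignores inflection and tokenisation.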

11 OLIF format
Support of more languages
Clarification of inflection scheme
Clarification of term vs lex approach
Tools

12 Relational database ??
[Diagram labels: Concept, Term, SurfForm, Lemma, InflForm, LexType, RegEx, Infl, SemRel, Transl, Naming]

13 Relational database ??
[Diagram with example values: hot drink...; grüner Tee; grüner; grün; Nom. Sg. str. f. pos.; DE, Adj; -er; iLike klein; labels SemRel, Transl, Naming]

14 End
Thank you!

