
1 Machine Translation Jan Odijk, Utrecht, March 7, 2011 1

2 Overview Lexicons Statistical MT MT: What is (perhaps) possible Conclusions 2

3 Lexicons “Wat helemaal niet moeilijk is –Grote woordenboeken met veel moeilijke woorden en vaktermen” (“What is not difficult at all – large dictionaries with many difficult words and technical terms”) –(Steven Krauwer, previous lecture) I disagree 3

4 Lexicons True if you know the words and terms in advance But new words and terms (usually with different translations) are created all the time in science, technology and industry So you must have techniques to find (identify, extract) such new words/terms and their translations as automatically as possible –To tune the lexicons to specific domains –to continuously extend them 4
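As a purely illustrative sketch (no such tool or method is named on the slide), candidate domain terms can be spotted by comparing a word's relative frequency in a domain corpus with its frequency in a general corpus; the corpora, threshold and helper names below are assumptions:

```python
# Illustrative sketch only: frequency-ratio term spotting. Words that are
# clearly over-represented in a domain corpus relative to a general corpus
# are proposed as candidate new terms for the lexicon.
from collections import Counter

def candidate_terms(domain_text, general_text, ratio=5.0, min_count=3):
    dom = Counter(domain_text.lower().split())
    gen = Counter(general_text.lower().split())
    dom_total, gen_total = sum(dom.values()), sum(gen.values())
    candidates = []
    for word, count in dom.items():
        if count < min_count:
            continue
        dom_freq = count / dom_total
        gen_freq = (gen.get(word, 0) + 1) / (gen_total + 1)  # add-one smoothing
        if dom_freq / gen_freq >= ratio:
            candidates.append(word)
    return candidates
```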

5 Lexicons Many terms are multiword expressions –With some internal variation –Not always contiguous –This requires special treatment in the lexicon and in the grammar House* of representatives (Chambre* des représentants) Patatas* fritas* (French fries*) Chômeur* (Unemployed person*) 5

6 Lexicons Modern formal grammars depend highly on lexical properties They have very general rule schemata, which are filled in by properties of lexical items –e.g. a word of category X and its complements form an XP (X phrase) –E.g. mass nouns can occur without an article in the singular; –count nouns can occur with een in the singular 6

7 Lexicons Properties of lexical items –E.g. which complements a verb takes E.g. a direct object noun phrase, also an indirect object, predicate, prepositional complement, etc. E.g. an infinitival complement, with or without te, with or without om, with or without a subject, etc. –With which preposition it can be combined Kijken naar, zorgen voor, houden van –Nouns: mass or count? 7
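A minimal sketch of how such lexical properties might be encoded so that a grammar can consult them; the attribute names and example entries are my own illustrations, not a format from the slides:

```python
# Illustrative encoding of lexical properties (attribute names are assumptions).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VerbEntry:
    lemma: str
    complements: List[str] = field(default_factory=list)  # e.g. ["NP", "inf(te,om)"]
    fixed_preposition: Optional[str] = None                # e.g. "naar" for kijken

@dataclass
class NounEntry:
    lemma: str
    countability: str = "count"                            # "count" or "mass"

LEXICON = [
    VerbEntry("kijken", complements=["PP"], fixed_preposition="naar"),
    VerbEntry("zorgen", complements=["PP"], fixed_preposition="voor"),
    VerbEntry("houden", complements=["PP"], fixed_preposition="van"),
    NounEntry("meubilair", countability="mass"),
    NounEntry("boek", countability="count"),
]
```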

8 Lexicons Traditional dictionaries do not contain such information (or very rarely) And what is available is not represented in a formal manner So computers cannot use this information directly 8

9 Lexicons It is very difficult to assign such properties correctly in a systematic manner –It requires very good knowledge of syntax –Often the phenomena are not understood well enough –Words often have multiple options with different meanings and translations –Try it yourself for lopen; innemen –Count/Mass: vis; wijn; bestek; meubilair 9

10 Lexicons It is very difficult to assign such properties correctly in a systematic manner (Cont.) –Lexicographers are not trained to assign such properties –It must be done for many words –Consistency within one person is hard to achieve –Consistency among multiple people is even harder 10

11 Lexicon: Semantics Selection restrictions with a type system to approximate modelling of world knowledge –Requires sophisticated syntactic analysis Boek: info (legible) Uur: time unit → duration Vergadering: event → duration Lezen: subject=human; object=info (legible) Durational adjunct must be a duration phrase 11

12 Lexicon: Semantics Selection restrictions –Pak (1) (suit): clothing –Pak (2) (package): entity –Dragen (1) (wear): subj=animate; object=clothing –Dragen (2) (carry): subj=animate; object=entity –Schoen: clothing –Entity > clothing –Identity preferred over subsumption –Homogeneous object preferred over heterogeneous one 12

13 Lexicon: Semantics Selection restrictions –Hij draagt een bruin pak He wears a brown suit (1: clothing=clothing) He carries a brown package (1: entity=entity) He carries a brown suit (2: entity > clothing) *He wears a brown package (clothing ¬> entity) –Hij draagt een bruin pak en zwarte schoenen He wears a brown suit and black shoes (1: homogeneous and clothing=clothing) He carries a brown suit and black shoes (2: homogeneous but entity > clothing) He carries a brown package and black shoes (2: inhomogeneous but entity=entity) *He wears a brown package and black shoes (clothing ¬> entity) 13
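A minimal sketch (hypothetical type hierarchy and scoring, not the actual lexicon implementation) of how the two preferences above could rank the wear/carry readings of dragen:

```python
# Illustrative preference-based sense selection for "dragen"; the types and
# scores are assumptions used only to demonstrate the two preference rules.
SUBTYPES = {"clothing": "entity"}  # clothing is subsumed by entity

def fits(required, actual):
    """2 = identical type, 1 = required type subsumes the actual type, 0 = no fit."""
    if required == actual:
        return 2
    if SUBTYPES.get(actual) == required:
        return 1
    return 0

SENSES = {
    "wear":  {"subject": "animate", "object": "clothing"},
    "carry": {"subject": "animate", "object": "entity"},
}

def rank_senses(subject_type, object_types):
    """Score each sense; identity beats subsumption, and with a coordinated
    object a homogeneous match gets a small bonus."""
    results = []
    for sense, frame in SENSES.items():
        if fits(frame["subject"], subject_type) == 0:
            continue
        scores = [fits(frame["object"], t) for t in object_types]
        if 0 in scores:                       # selection restriction violated
            continue
        bonus = 1 if len(set(scores)) == 1 else 0
        results.append((sum(scores) + bonus, sense))
    return sorted(results, reverse=True)

print(rank_senses("animate", ["clothing"]))              # prefer "wear"
print(rank_senses("animate", ["clothing", "clothing"]))  # suit + shoes: "wear"
print(rank_senses("animate", ["entity", "clothing"]))    # package + shoes: "carry"
```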

14 Statistical MT Derives the MT system automatically –From statistics taken from Aligned parallel corpora (→ translation model) Monolingual target language corpora (→ language model) Has been worked on since the early 1990s 14
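The slide does not spell out how the two corpora feed the two models; the classic noisy-channel formulation (an assumption here, though it underlies most statistical MT of that period) ties them together as:

```latex
% Noisy-channel view (assumption, not stated on the slide):
% the translation model P(f|e) is estimated from the aligned parallel corpus,
% the language model P(e) from monolingual target-language text.
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} \, P(f \mid e)\, P(e)
```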

15 Statistical MT Plus: –No or very limited grammar development –Includes language and world knowledge automatically (but implicitly) –Based on actually occurring data –Currently many experimental and commercial systems Minus: –Requires large aligned parallel corpora –Unclear how much linguistics will be needed anyway –Probably restricted to very limited domains only 15

16 Statistical MT Google Translate (statistical MT) Hij draagt een pak. → √ He wears a suit. Hij draagt schoenen. → √ He wears shoes. Hij draagt bruine schoenen en een pak. → √ He wears a suit and brown shoes. (!!) Hij draagt het pakket → √ He carries the package Hij heeft een pak aan. → *He has a suit. Voert uw bedrijf sloten uit? → *Does your company locks out? 16

17 Hybrid MT Euromatrix, esp. “the Euromatrix” –Successor project EuromatrixPlus –… –Efficient inclusion of linguistic knowledge into statistical machine translation –The development and testing of hybrid architectures for the integration of rule-based and statistical approaches 17

18 Hybrid MT META-NET (EU funding) –Building a community with shared vision and strategic research agenda –Building META-SHARE, an open resource exchange facility –Building bridges to neighbouring technology fields Bringing more Semantics into Translation Optimising the Division of Labour in Hybrid MT Exploiting the Context for Translation Empirical Base for Machine Translation 18

19 Hybrid MT PACO-MT investigates a hybrid approach to MT –Rule-based and statistical –Uses an existing parser for source language analysis –Uses statistical n-gram language models for generation –Uses a statistical approach to transfer 19

20 MT: What is (perhaps) possible Cross-Language Information Retrieval Low Quality MT for Gist extraction MT and Speech Technology Controlled Language Limited Domain Interaction with author Combinations of the above Computer-aided translation 20

21 MT: What is (perhaps) possible Cross-Language Information Retrieval (CLIR) –Input query: in own language –Input query translated into target languages –Search in target language documents –Results in target language Translation of individual words only Growing need (growing multilingual Web) No perfect translation required 21

22 MT: What is (perhaps) possible 22

23 MT: What is (perhaps) possible Low quality MT for Gist extraction Low quality but still useful If interesting high quality human translation can be requested (has to be paid for) 23

24 MT: What is (perhaps) possible 24

25 MT: What is (perhaps) possible 25

26 MT: What is (perhaps) possible CLIR –Fills a growing need in the market –Is technically feasible –Creates need for translation of found documents Solved partially by low quality MT Potentially creates need for more human translation Stimulates (funds) research into more sophisticated MT 26

27 MT: What is (perhaps) possible Combine MT (statistical or rule-based) with OCR technology –Make a picture of a text with your phone –Text is OCR-ed –Text is translated –(usually a short and simple text) Linguatec Shoot & Translate Word Lens 27

28 MT: What is (perhaps) possible Combine MT (statistical or rule-based) with Speech technology –Complicates the problem on the one hand but –Speech technology (ASR) is currently restricted to very limited domains (which makes MT simpler) –Many useful applications for speech technology currently in the market Directory assistance Tourist Information Tourist communication Call Centers Navigation Hotel reservations –Some will profit from built-in automatic translation 28

29 MT: What is (perhaps) possible Large EC FP6 project TC-STAR (2004-) –(http://www.tc-star.org/) –Research into improved speech technology (ASR and TTS) –Research into statistical MT –Research into combining both (speech-to-speech translation) –In a few selected limited domains 29

30 MT: What is (perhaps) possible Commercial Speech2Speech Translation Jibbigo –http://www.jibbigo.com –Speech-to-speech translation (iPhone, Android) –http://www.phonedog.com/2009/10/30/iphone-app-jibbigo-speech-translator Talk to Me (Android phones) 30

31 MT: What is (perhaps) possible Controlled Language –Authoring System limits vocabulary and syntax of document authors –Often desirable in companies to get consistent documentation (e.g. aircraft maintenance manuals) AECMA Simplified English GIFAS Rationalized French –Makes MT easier (language well-defined) 31

32 MT: What is (perhaps) possible Limited Domain –Translation of Weather reports (TAUM-Meteo, Canada) Avalanche warnings (Switzerland) –Fast adaptation to domain/company-specific vocabulary and terminology 32

33 MT: What is (perhaps) possible Interaction with author –No fully automatic translation –Document author resolves Ambiguities unresolved by the system In a dialogue between the author and the system in the source language Approach taken in the Rosetta project (Philips) Will only work if –the number of unresolved ambiguities is low –the questions to resolve an ambiguity are clear 33

34 MT: What is (perhaps) possible Hij droeg een bruin pak –Wat bedoelt u met “pak”? (1) kostuum (2) pakket (What do you mean by “pak”? (1) suit (2) package) Hij droeg een bruin pak –Wat bedoelt u met “dragen (droeg)”? (1) aan of op hebben (kleding) (2) bij zich hebben (bijv. in de hand) (What do you mean by “dragen (droeg)”? (1) to have on (clothing) (2) to have with you, e.g. in the hand) 34

35 MT: What is (perhaps) possible Combinations of the above 35

36 MT: What is (perhaps) possible Computer-aided translation –For end-users –For professional translators/localization industry Limited functionality –Specific terminology Bootstrap translation automatically –Human revision and correction (Post-edit) Only if –MT Quality is such that it reduces effort –The system is fully integrated in the workflow system 36

37 Conclusions MT is really very difficult! Even making a lexicon for an MT system is very difficult (and a lot of work) Statistical MT yields practical, relatively quick-to-produce systems (but low quality) –Provided you have huge amounts of data Focus of research is on hybrid systems (mixed statistically based/knowledge based) (PACO-MT, META-NET, …) 37

38 Conclusions Several constrained versions do yield usable technology with state-of-the-art MT In some cases: even potentially creates additional needs for MT and human translation 38

39 –Try it yourself for lopen; innemen –Count/Mass: vis; wijn; bestek; meubilair 39

40 Do not go beyond this slide 40

41 MT Evaluation Evaluation depends on purpose of MT and how it is used –application, domain, controlled language Many aspects can be evaluated –functionality, efficiency, usability, reliability, maintainability, portability –translation quality –embedding in workflow (post-editing options/tools) 41

42 MT Evaluation Focus here: –does the system yield good translations according to human judgement –in the context of developing a system Again, many aspects: –fidelity (how close), correctness, adequacy, informativeness, intelligibility, fluency –and many ways to measure these aspects 42

43 MT Evaluation Test suite –Reference = list of (carefully selected) sentences with their translations (ordered by score) –translations judged correct by human (usually developer) –upon every update of the system output of the new system is compared to the reference if different: system has to be adapted, or reference has to be adapted Advantages –focus on specific translation problems possible –excellent for regression testing –Manual judgement needed only once for each new output –other comparisons are automatic Disadvantages –not really independent –particularly suited for pure rule-based systems –human judgement needed if output differs from reference 43

44 MT Evaluation Comparison against –translation corpus –independently created by human translators –possibly multiple equivalently correct translations of a sentence Advantages –truly independent –also suited for data-driven systems Disadvantage –requires human judgement (every time there is a system update) high effort by highly skilled people, high costs, requires a lot of time –human judgement is not easy (unless there is a perfect match) Useful –for a one-time evaluation of a stable system –not for evaluation during development 44

45 MT Evaluation Edit-Distance (Word Accuracy) –metric to determine closeness of translations automatically –the least number of edit operations to turn the translated sentence into the reference sentence –Alshawi et al

46 MT Evaluation WA = 1- ((d+s+i)/max(r,c)) d= number of deletions s = number of substitutions i = number of insertions r = reference sentence length c = candidate sentence length easy to calculate using Levenshtein distance algorithm (dynamic programming) various extensions have been proposed 46
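A minimal sketch (not from the slides) of computing WA with a word-level Levenshtein distance via dynamic programming, following the formula above:

```python
# Word Accuracy via word-level edit distance (illustrative sketch).
# Follows the slide's formula: WA = 1 - ((d + s + i) / max(r, c)).
def word_accuracy(candidate: str, reference: str) -> float:
    c_words, r_words = candidate.split(), reference.split()
    m, n = len(r_words), len(c_words)
    # dist[i][j] = minimal edits turning the first j candidate words
    # into the first i reference words (Levenshtein, dynamic programming)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if r_words[i - 1] == c_words[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return 1 - dist[m][n] / max(m, n)

print(word_accuracy("he carries a brown suit", "he wears a brown suit"))  # 0.8
```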

47 MT Evaluation Advantages –fully automatic given a reference set Disadvantages –penalizes candidates if a synonym is used –penalizes swaps of words and blocks of words too much 47

48 MT Evaluation BLEU (method to automate MT Evaluation) –the closer a machine translation is to a professional human translation, the better it is –BiLingual Evaluation Understudy Required: –corpus of good quality human reference translations –a “closeness” metric 48

49 MT Evaluation Two candidate translations from Chinese source –C1: It is a guide to action which ensures that the military always obeys the commands of the party –C2: It is to insure the troops forever hearing the activity guidebook that party direct Intuitively: C1 is better than C2 49

50 MT Evaluation Three reference translations –R1: It is a guide to action that ensures that the military will forever heed Party commands –R2: It is the guiding principle which guarantees the military forces always being under the command of the Party –R3: It is the practical guide for the army always to heed the directions of the party 50

51 MT Evaluation Basic idea: –a good candidate translation shares many words and phrases with reference translations –→ comparing n-gram matches can be used to rank candidate translations n-gram: a sequence of n word occurrences –in BLEU n=1,2,3,4 –1-grams give a measure of adequacy –longer n-grams give a measure of fluency 51

52 MT Evaluation For unigrams: –count the number of matching unigrams in all references –divide by the total number of unigrams (in the candidate sentence) 52

53 MT Evaluation Problem –C1: the the the the the the the (=7/7=1) –R1: the cat is on the mat Solution: –clip matching count (7) by maximum reference count (2) → 2 (Count_clip) –→ modified unigram precision = 2/7 ≈ 0.29 53
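A minimal sketch (helper names are my own, not the reference implementation) of the modified, clipped n-gram precision, reproducing the 2/7 result for the example above:

```python
# Modified (clipped) n-gram precision, illustrative sketch.
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def modified_precision(candidate, references, n=1):
    cand_counts = Counter(ngrams(candidate.split(), n))
    # Clip each candidate n-gram count by its maximum count in any single reference
    max_ref_counts = Counter()
    for ref in references:
        for gram, cnt in Counter(ngrams(ref.split(), n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], cnt)
    clipped = sum(min(cnt, max_ref_counts[gram]) for gram, cnt in cand_counts.items())
    return clipped / max(1, sum(cand_counts.values()))

print(modified_precision("the the the the the the the",
                         ["the cat is on the mat"]))  # 2/7 ≈ 0.29
```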

54 MT Evaluation Example (unigrams) –C1: It is a guide to action which ensures that the military always obeys the commands of the party (17/18=0.94) –R1: It is a guide to action that ensures that the military will forever heed Party commands –R2: It is the guiding principle which guarantees the military forces always being under the command of the Party –R3: It is the practical guide for the army always to heed the directions of the party 54

55 MT Evaluation Example (unigrams) –C2: It is to insure the troops forever hearing the activity guidebook that party direct (8/14=0.57) –R1: It is a guide to action that ensures that the military will forever heed Party commands –R2: It is the guiding principle which guarantees the military forces always being under the command of the Party –R3: It is the practical guide for the army always to heed the directions of the party 55

56 MT Evaluation Example (bigrams) –C1: It is a guide to action which ensures that the military always obeys the commands of the party (10/17=0.59) –R1: It is a guide to action that ensures that the military will forever heed Party commands –R2: It is the guiding principle which guarantees the military forces always being under the command of the Party –R3: It is the practical guide for the army always to heed the directions of the party 56

57 MT Evaluation Example (bigrams) –C2: It is to insure the troops forever hearing the activity guidebook that party direct (1/13=0.08) –R1: It is a guide to action that ensures that the military will forever heed Party commands –R2: It is the guiding principle which guarantees the military forces always being under the command of the Party –R3: It is the practical guide for the army always to heed the directions of the party 57

58 MT Evaluation Extend to a full multi-sentence corpus compute n-gram matches sentence by sentence sum the clipped n-gram counts for all candidates divide by the total number of n-grams in the candidate corpus p_n = ∑_{C ∈ Candidates} ∑_{n-gram ∈ C} Count_clip(n-gram) / ∑_{C′ ∈ Candidates} ∑_{n-gram′ ∈ C′} Count(n-gram′) 58
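The corpus-level p_n from the formula above, as an illustrative continuation of the previous sketch (it reuses the hypothetical ngrams helper defined there):

```python
# Corpus-level modified precision p_n: clipped matches summed over all
# candidate sentences, divided by the total number of candidate n-grams.
# Illustrative sketch; ngrams() is the helper from the previous sketch.
from collections import Counter

def corpus_precision(candidates, references_per_sentence, n=1):
    clipped_total, cand_total = 0, 0
    for cand, refs in zip(candidates, references_per_sentence):
        cand_counts = Counter(ngrams(cand.split(), n))
        max_ref_counts = Counter()
        for ref in refs:
            for gram, cnt in Counter(ngrams(ref.split(), n)).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], cnt)
        clipped_total += sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        cand_total += sum(cand_counts.values())
    return clipped_total / cand_total
```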

59 MT Evaluation Combining n-gram precision scores a weighted linear average works reasonably well –∑_{n=1}^{N} w_n p_n but: n-gram precision decays roughly exponentially with n (so take logs to compensate for this) –exp(∑_{n=1}^{N} w_n log p_n) weights in BLEU: w_n = 1/N 59

60 MT Evaluation BLEU is a precision measure –#(C ∩ R) / #C Recall is difficult to define because of multiple reference translations –e.g. #(C ∩ Rs) / #Rs where Rs = ∪_i R_i –will not work 60

61 MT Evaluation C1: I always invariably perpetually do C2: I always do R1: I always do R2: I invariably do R3: I perpetually do The recall of C1 over R1–R3 is higher than that of C2, but C2 is the better translation 61

62 MT Evaluation But without Recall: –C1: of the –compared with R1-3 as before –modified unigram precision = 2/2 –modified bigram precision = 1/1 –which is the wrong result 62

63 MT Evaluation Length –n-gram precision penalizes translations longer than the reference –but not translations shorter than the reference –→ Add Brevity Penalty (BP) 63

64 MT Evaluation b_i = best match length = reference sentence length closest to candidate sentence i‘s length (e.g. r: 12, 15, 17, c: 12 → 12) r = test corpus effective reference length = ∑_i b_i c = total length of candidate translation corpus 64

65 MT Evaluation BP –computed over the corpus –not sentence by sentence and averaged –BP = 1 if c > r –BP = e^(1−r/c) if c ≤ r BLEU = BP · exp(∑_{n=1}^{N} w_n log p_n) 65
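Putting the pieces together, an illustrative sketch of the brevity penalty and the final BLEU score as defined on the last few slides; it assumes the corpus_precision sketch above, and zero precisions (no n-gram matches at all) are not handled:

```python
# Brevity penalty and final BLEU score, illustrative sketch following the
# slides' formulas; assumes corpus_precision() from the earlier sketch.
import math

def brevity_penalty(candidates, references_per_sentence):
    c = sum(len(cand.split()) for cand in candidates)
    r = 0
    for cand, refs in zip(candidates, references_per_sentence):
        cand_len = len(cand.split())
        ref_lens = [len(ref.split()) for ref in refs]
        # best match length b_i: the reference length closest to the candidate's
        r += min(ref_lens, key=lambda rl: abs(rl - cand_len))
    return 1.0 if c > r else math.exp(1 - r / c)

def bleu(candidates, references_per_sentence, max_n=4):
    bp = brevity_penalty(candidates, references_per_sentence)
    weights = [1.0 / max_n] * max_n  # w_n = 1/N
    log_sum = sum(w * math.log(corpus_precision(candidates, references_per_sentence, n))
                  for n, w in zip(range(1, max_n + 1), weights))
    return bp * math.exp(log_sum)
```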

66 MT Evaluation BLEU: –claim: BLEU closely matches human judgement when averaged over a test corpus not necessarily on individual sentences shown extensively in Papineni et al –→ multiple reference translations are desirable to cancel out translation styles of individual translators (e.g. East Asian economy v. economy of East Asia) 66

67 MT Evaluation Variants on BLEU –NIST (http://www.nist.gov/speech/tests/mt/doc/ngram-study.pdf) different weights different BP –ROUGE (Lin and Hovy 2003) for text summarization Recall-Oriented Understudy for Gisting Evaluation 67

68 MT Evaluation Main Advantage of BLEU –automatic evaluation good for use during development particularly useful for data-based systems Disadvantage –defined for a whole test corpus –not for individual sentences –just measures difference with reference 68

