A Comparison of Statistical Post-Editing on Chinese and Japanese Midori Tatsumi and Yanli Sun Under the supervision of: Sharon O’Brien; Minako O’Hagan;


1 A Comparison of Statistical Post-Editing on Chinese and Japanese
Midori Tatsumi and Yanli Sun
Under the supervision of: Sharon O’Brien; Minako O’Hagan; Fred Hollowood; Johann Roturier

2 Outline
1. Introduction
2. Analysis of the modifications made by SPE
3. Evaluation on Sentence Level
4. Conclusion

5 Introduction
Rule-Based Machine Translation (RBMT)
–Three Stages:
Analysis: analyze a source text into abstract lexical and structural representations
Transfer: convert the source language representations into target language representations
Generation: generate the target text
Statistical Machine Translation (SMT)
–Two Stages:
Training: automatically learn translation and language knowledge from a parallel corpus
Decoding: translate new sentences using the learned knowledge
Post-Editing (PE)
–Human post-editing
–Automatic post-editing
–Statistical post-editing (SPE)

6 Introduction
Statistical Post-editing (SPE) of Rule-Based Machine Translation (RBMT) Output
Knight & Chander (1994); Simard et al. (2007a, 2007b)
Flowchart of RBMT: Source → RBMT → Output 1 → Human Post-editor → Final output
Flowchart of SPE: Source → RBMT → Output 1 → SPE module → Output 2 → Human Post-editor → Final output
SPE module: SMT, with inputs RBMT output and Reference

7 Introduction
–Experimental setting
Source language: English
Target languages: Chinese (ZH); Japanese (JA)
RBMT: Systran; UD: 8,832 entries (ZH) and 6,363 entries (JA)
SMT (SPE module): Moses; Translation Memory: 529,822 (ZH) and 143,742 (JA)
Flowchart: Source → RBMT → Output 1 → SPE module (SMT; RBMT output, Reference) → Output 2 → Human Post-editor → Final output

8 Introduction
–Evaluate SPE: Compare Output 2 and Output 1
Flowchart: Source → RBMT → Output 1 → SPE module (SMT; RBMT output, Reference) → Output 2 → Human Post-editor → Final output
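The data flow in the flowchart can be sketched in a few lines of Python. This is only a toy illustration, not the Moses-based SPE module used in the study: `train_spe` and `spe` are hypothetical names, the "model" is a word-for-word substitution table counted from pairs of equal token length, and the input is assumed to be pre-segmented with spaces. The single training pair reuses the "Reverts to …" example (恢复对 → 恢复到) from the qualitative analysis.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_spe(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Learn a toy substitution table from (RBMT output, reference) pairs.

    Assumption: tokens are separated by spaces, and only pairs with the same
    number of tokens are used, so tokens can be aligned position by position.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for rbmt_output, reference in pairs:
        src, tgt = rbmt_output.split(), reference.split()
        if len(src) != len(tgt):
            continue  # the toy model cannot align these; skip the pair
        for s, t in zip(src, tgt):
            counts[s][t] += 1
    # For each RBMT token, keep the reference token seen most often.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def spe(output1: str, model: Dict[str, str]) -> str:
    """Produce Output 2 by rewriting Output 1 token by token."""
    return " ".join(model.get(tok, tok) for tok in output1.split())


# Hypothetical training data: one (RBMT output, reference) pair.
training_pairs = [("恢复 对", "恢复 到")]
model = train_spe(training_pairs)

output1 = "恢复 对"            # what the RBMT system produced
output2 = spe(output1, model)  # what the post-editing step turns it into
print(output1, "->", output2)
```

A real SPE module learns phrase-level translations and a language model from the full translation memory; the point here is only that the SPE step treats RBMT output as its source language and the reference as its target language.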

9 Analysis of the Modifications Made by SPE
Methodology
Pilot project
–Random selection of 100 sentences for each language
Classify and evaluate the changes
–Classification (Vilar et al. 2006)
Alteration, Deletion, Addition of Content/Function words
Form of Tense/Voice/Imperative/Formality (Politeness)
Fixed expression
Reordering
Punctuation
–Evaluation (Dugast et al. 2007)
Improvement
Degradation
Equivalent

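10 Analysis of the Modifications Made by SPE
Quantitative Evaluation
Modifications distribution in Japanese and Chinese

| Category | Subcategory | Improvement ZH | Improvement JA | Degradation ZH | Degradation JA | Equivalent ZH | Equivalent JA |
|---|---|---|---|---|---|---|---|
| Alteration | Content words | 137 | 45 | 19 | 40 | 28 | 25 |
| Alteration | Function words | 38 | 45 | 6 | 9 | 17 | 30 |
| Deletion | Content words | 0 | 9 | 0 | 2 | 0 | 1 |
| Deletion | Function words | 51 | 57 | 4 | 5 | 12 | 16 |
| Addition | Content words | 4 | 0 | 3 | 2 | 2 | 0 |
| Addition | Function words | 12 | 1 | 8 | 2 | 15 | 1 |
| Forms | Tense or Voice | 6 | 3 | 0 | 0 | 3 | 5 |
| Forms | Formality | 0 | 1 | 1 | 0 | 0 | 0 |
| Forms | Imperative | 0 | 8 | 0 | 0 | 0 | 2 |
| Fixed Expression | | 8 | 0 | 0 | 0 | 0 | 1 |
| Word / Phrase Reordering | | 9 | 1 | 3 | 3 | 0 | 1 |
| Punctuation | | 31 | 47 | 4 | 9 | 0 | 4 |
| Total | | 296 | 217 | 48 | 72 | 77 | 85 |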
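The Total row can be turned into improvement-to-degradation ratios with a little arithmetic. The sketch below reads that row as 296 / 217 improvements, 48 / 72 degradations and 77 / 85 equivalent changes for ZH / JA; the dictionary layout and the function name `improvement_ratio` are illustrative choices, not part of the study.

```python
# Totals per language, as read from the Total row of the table above.
totals = {
    "ZH": {"improvement": 296, "degradation": 48, "equivalent": 77},
    "JA": {"improvement": 217, "degradation": 72, "equivalent": 85},
}


def improvement_ratio(counts: dict) -> float:
    """How many improvements SPE made for each degradation."""
    return counts["improvement"] / counts["degradation"]


for language, counts in totals.items():
    all_changes = sum(counts.values())
    share = 100 * counts["improvement"] / all_changes
    print(
        f"{language}: {improvement_ratio(counts):.2f} improvements per degradation, "
        f"{share:.1f}% of {all_changes} changes were improvements"
    )
```

The ratios come out at roughly six to one for Chinese and three to one for Japanese.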

11 Analysis of the Modifications Made by SPE
Qualitative Evaluation
Similarities

Deletion of function words

| Source | MT output | SPE output |
|---|---|---|
| the actions that you specify for that rule | JA: あなたがその規則のために指定する処理 | そのルールに指定する処理 |
| After you configure your … | ZH: 在您配置您的… | 配置… |

Alteration of function words

| Source | MT output | SPE output |
|---|---|---|
| To maintain … | JA: 保守するため… | 維持するには… |
| Reverts to … | ZH: 恢复对… | 恢复到... |

Punctuation

| Source | MT output | SPE output |
|---|---|---|
| MPE provides an option … | JA: オプションを提供します。 | オプションがあります. |
| while the synchronization is in progress… | ZH: ,当同步进展中时… | 同步处理…. |

14 Analysis of the Modifications Made by SPE
Qualitative Evaluation
Differences

Alteration of content words

| Source | MT output | SPE output |
|---|---|---|
| console commands | JA: コンソールは命じます | console コマンド |
| number | JA: 番号 | 数 |
| subdomains | ZH: subdomains | 子域 |

Addition of function words

| Source | MT output | SPE output |
|---|---|---|
| A black dash indicates that it is disabled. | ZH: 黑色破折号表明它禁用。 | 黑色线表明它已禁用。 |
| On the Spim tab… | ZH: 在 Spim 选项卡… | 在 Spim 选项卡上… |

16 Analysis of the Modifications Made by SPE
Qualitative Evaluation
Differences

Reordering

| Source | MT output | SPE output |
|---|---|---|
| These threats are then… | ZH: 这些威胁然后… | 然后, 这些威胁… |

Imperative forms

| Source | MT output | SPE output |
|---|---|---|
| (Imperative ending) | JA: して下さい | します |

Fixed expression

| Source | MT output | SPE output |
|---|---|---|
| In general,… | ZH: 一般情况下,… | 通常情况下,… |

19 Evaluation on Sentence Level
Methodology
–Same 100 segments
–Effect of SPE on Fluency, Adequacy and PE time
–Four evaluators per language
–Random distribution of MT output and SPE output

Example evaluation item

| Source_EN | Output 1 | Output 2 |
|---|---|---|
| Turns on or off the special meaning of metacharacters. | オン / オフ回転メタ文字の特別な意味。 | 有効または無効にメタ文字の特別な意味します. |

Fluency / Adequacy / Less-PE time: 1 / 2 / E

Kappa scores (Inter-evaluator agreement level)

| Criteria | Chinese | Japanese |
|---|---|---|
| Fluency | 0.276 | 0.598 |
| Adequacy | 0.288 | 0.582 |
| Less PE time | 0.284 | 0.624 |

–Japanese: moderate to substantial agreement
–Chinese: generally fair agreement
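The slide does not say which kappa statistic was computed. With four evaluators per language, Fleiss' kappa is one common choice, so the sketch below assumes it. The verbal labels follow the Landis and Koch bands, which match the wording "fair", "moderate" and "substantial" used here but are likewise an assumption. The function names and the small `ratings` sample are made up for illustration; each inner list holds one segment's labels from four evaluators, written "1", "2" or "E" after the example item.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that were each labelled by the same number of raters.

    Raises ZeroDivisionError when every rating falls in a single category,
    because chance agreement is then 1 and kappa is undefined.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_expected = sum(s * s for s in shares)

    return (p_observed - p_expected) / (1 - p_expected)


def agreement_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch verbal bands (assumed scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Made-up judgments for four segments, four evaluators each.
ratings = [
    ["2", "2", "2", "2"],
    ["2", "2", "E", "2"],
    ["1", "1", "2", "E"],
    ["E", "E", "E", "E"],
]
print(f"toy kappa: {fleiss_kappa(ratings):.3f}")

# Kappa scores reported in the table above.
reported = {
    "Chinese": {"Fluency": 0.276, "Adequacy": 0.288, "Less PE time": 0.284},
    "Japanese": {"Fluency": 0.598, "Adequacy": 0.582, "Less PE time": 0.624},
}
for language, scores in reported.items():
    for criterion, kappa in scores.items():
        print(f"{language:8s} {criterion:13s} {kappa:.3f} {agreement_label(kappa)}")
```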

20 Evaluation on Sentence Level
Results and Analysis
Improvement by SPE:
–Chinese: Fluency and Adequacy: ≈ 40%, PE time: ≈ 50%
–Japanese: Fluency, Adequacy, PE time: ≈ 60%

Share of judgments preferring each output (%)

| | ZH Fluency | ZH Adequacy | ZH Less PE Time | JA Fluency | JA Adequacy | JA Less PE Time |
|---|---|---|---|---|---|---|
| MT | 12.75 | 15.50 | 15.00 | 14.50 | 8.00 | 9.75 |
| SPE | 37.75 | 38.00 | 48.25 | 59.25 | 61.50 | 62.50 |
| Equal | 49.50 | 46.50 | 36.75 | 26.25 | 30.50 | 27.75 |
| Total | 100 | 100 | 100 | 100 | 100 | 100 |
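One way to summarize these results is the net preference for SPE: the share of judgments that favoured the SPE output minus the share that favoured the raw MT output. The sketch below uses only the MT and SPE rows and reads them as percentages of judgments, which is an interpretation of the table rather than something the slide states; the variable names are illustrative.

```python
# Percentage of judgments preferring each output, from the MT and SPE rows.
preferred = {
    ("ZH", "Fluency"): {"MT": 12.75, "SPE": 37.75},
    ("ZH", "Adequacy"): {"MT": 15.50, "SPE": 38.00},
    ("ZH", "Less PE Time"): {"MT": 15.00, "SPE": 48.25},
    ("JA", "Fluency"): {"MT": 14.50, "SPE": 59.25},
    ("JA", "Adequacy"): {"MT": 8.00, "SPE": 61.50},
    ("JA", "Less PE Time"): {"MT": 9.75, "SPE": 62.50},
}

# Net preference for SPE in percentage points, per language and criterion.
net_preference = {
    key: shares["SPE"] - shares["MT"] for key, shares in preferred.items()
}

for (language, criterion), points in net_preference.items():
    print(f"{language} {criterion:13s} SPE ahead by {points:.2f} points")
```

On every criterion the SPE output is preferred more often than the MT output, and the margin is larger for Japanese than for Chinese.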

21 Conclusions
SPE generates more improvement than degradation
–Threefold for Japanese; sixfold for Chinese
Linguistic changes vary between ZH and JA
SPE changes are generally limited to word level
SPE improves fluency and adequacy, and shortens PE time

22 Questions? midori.tatsumi2@mail.dcu.ie yanli.sun2@mail.dcu.ie

