Finding Translation Correspondences from Parallel Parsed Corpus for Example-based Translation Eiji Aramaki (Kyoto-U), Sadao Kurohashi (U-Tokyo), Satoshi.

1 Finding Translation Correspondences from Parallel Parsed Corpus for Example-based Translation Eiji Aramaki (Kyoto-U), Sadao Kurohashi (U-Tokyo), Satoshi Sato (Kyoto-U), Hideo Watanabe (IBM Japan)

2 Introduction Parallel Corpus → Translation examples. Statistical approach (co-occurrence information): 1-2%. Our method (syntactic information, translation dictionary): 50%.

3 Goal This paper shows great contributions of TFP ・・・ 全要素生産性 が (TFP) case-marker 大きく 寄与して いること が (great) (contribution) case-marker 示されている (show) ・・・

4 Problems Finding many correspondences with a translation dictionary raises 2 problems. 1: some words cannot be found in the dictionary 2: dictionary lookups are ambiguous and must be resolved

5 Overview Introduction Method Experiments Conclusion

6 Method Step 1: Detection of Phrasal Dependency Structure Step 2: Detection of Basic Phrasal Correspondences by Consulting Dictionary Step 3: Discovery of New Correspondences By Handling Remaining Phrases

7 Step1: Phrasal Dependency Structures Input: I bought this car by monthly installments. Processing: ESG (English Parser), then Rules

8 Step1: Phrasal Dependency Structures Rules –Function words are grouped together with a following content-word. –A compound noun is considered as one phrase. –Auxiliary verbs are grouped together with a following verb. (is playing, was tired, …) –A parallel-relation word is considered as one phrase. (and, or, …)
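The grouping rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag names (`DET`, `PREP`, `AUX`, `CONJ`, `NOUN`) and the function `group_phrases` are hypothetical and are not taken from the ESG parser or from the authors' system.

```python
# Hypothetical tag set; the real system works on parser output, not on these tags.
FUNCTION_TAGS = {"DET", "PREP", "AUX"}  # attach to the following content word
CONJ_TAG = "CONJ"                       # parallel-relation words stand alone


def group_phrases(tokens):
    """Group (word, tag) pairs into phrases following the four slide rules."""
    phrases = []
    pending = []      # function words waiting for a content word
    prev_tag = None
    for word, tag in tokens:
        if tag == CONJ_TAG:
            # A parallel-relation word is one phrase by itself.
            if pending:
                phrases.append(pending)
                pending = []
            phrases.append([word])
        elif tag in FUNCTION_TAGS:
            # Function words and auxiliaries join the next content word.
            pending.append(word)
        elif tag == "NOUN" and prev_tag == "NOUN" and phrases:
            # Consecutive nouns form one compound-noun phrase.
            phrases[-1].append(word)
        else:
            phrases.append(pending + [word])
            pending = []
        prev_tag = tag
    if pending:
        phrases.append(pending)
    return [" ".join(p) for p in phrases]


tokens = [
    ("She", "PRON"), ("is", "AUX"), ("playing", "VERB"), ("with", "PREP"),
    ("information", "NOUN"), ("technology", "NOUN"), ("and", "CONJ"),
    ("science", "NOUN"),
]
print(group_phrases(tokens))
```

The example sentence is made up for illustration; it exercises all four rules at once (auxiliary plus verb, preposition plus compound noun, and a stand-alone "and").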

9 Step2: Detection of Phrasal Correspondences English: information technology / in science technology Japanese: 科学 技術 に (Science Technology) / おける 情報 技術 (Information Technology)

13 Step2: Detection of Phrasal Correspondences English: information technology / in science technology Japanese: 科学 技術 に (Science Technology) / おける 情報 技術 (Information Technology)

14 Step2: Detection of Phrasal Correspondences Criteria to choose phrasal correspondences –Correspondences of content words –Correspondences of neighboring phrases Score = (# of word-links × 2) / (# of J content-words + # of E content-words)
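As a small worked example of the ratio on this slide, the sketch below computes it for two candidate pairings of 情報 技術. The function name `content_word_score` is hypothetical, and the sketch covers only the ratio itself; how it is combined with the neighboring-phrase criterion is not stated in the transcript.

```python
def content_word_score(word_links, j_content_words, e_content_words):
    """Ratio from the slide: 2 * links / (J content words + E content words)."""
    total = j_content_words + e_content_words
    if total == 0:
        return 0.0
    return 2 * word_links / total


# 情報 技術 vs "information technology": both words are linked by the dictionary.
print(content_word_score(2, 2, 2))  # 1.0
# 情報 技術 vs "science technology": only 技術 / technology is linked.
print(content_word_score(1, 2, 2))  # 0.5
```

Under this ratio the fully linked pairing scores higher, which is the kind of ambiguity the word "technology" creates when it appears in two English phrases.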

15 Method Step 1: Detection of Phrasal Dependency Structure Step 2: Detection of Basic Phrasal Correspondences by Consulting Dictionary Step 3: Discovery of New Correspondences By Handling Remaining Phrases

16 Step3: Discovery of New Correspondences By Handling Remaining Phrases (New) in post Cold war years ↔ 冷戦 終結 後 に (cold-war) (end) (after) case-marker (merge) goods and services ↔ 物 や (object) サービス の (service)

17 Step3: Discovery of New Correspondences By Handling Remaining Phrases Criteria to discover new correspondences –Local and Global supports Local support: other phrasal correspondences within two-phrase distance in the dependency structure. Global support: phrase correspondences in the parallel sentences. –POS Consistency –Inner Sufficiency
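One possible reading of local support is sketched below: count the already-established correspondences whose phrases lie within two dependency links of the candidate pair on both sides. The data layout, the helper names (`undirected`, `phrases_within`, `local_support`), and the counting rule are assumptions for illustration, not the authors' implementation.

```python
from collections import deque


def undirected(edges):
    """Build an undirected adjacency map from (head, dependent) edges."""
    adj = {}
    for head, dep in edges:
        adj.setdefault(head, set()).add(dep)
        adj.setdefault(dep, set()).add(head)
    return adj


def phrases_within(adj, start, max_dist=2):
    """Phrases reachable from start in at most max_dist dependency links."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    del dist[start]
    return set(dist)


def local_support(j_adj, e_adj, j_phrase, e_phrase, aligned):
    """Count established pairs that are near the candidate on both sides."""
    j_near = phrases_within(j_adj, j_phrase)
    e_near = phrases_within(e_adj, e_phrase)
    return sum(1 for j, e in aligned if j in j_near and e in e_near)


j_adj = undirected([("果たす", "日本 は"), ("果たす", "役割 を")])
e_adj = undirected([("play", "Japan"), ("play", "the role")])
aligned = {("日本 は", "Japan"), ("役割 を", "the role")}
print(local_support(j_adj, e_adj, "果たす", "play", aligned))  # 2
```

Here the remaining pair 果たす / play is supported by two neighboring correspondences, so it is a plausible new correspondence under this reading.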

18 Step3: Discovery of New Correspondences By Handling Remaining Phrases Japan ↔ 日本 は (Japan) case-marker; the role ↔ 役割 を (Role) case-marker; play ↔ 果たす (Achieve)

19 Step3: Discovery of New Correspondences By Handling Remaining Phrases ・・・ technology ↔ 技術 が (technology) case-marker; important ↔ 重要 と (important); has become ↔ なっている (become) ・・・

20 Experiments Evaluation data: 200 sentence-pairs from White Paper & Example sentences in a Japanese-English dictionary Gold standard data: We manually tagged correct correspondences on these sentences. Correct: exactly matches a pre-aligned correspondence Near-correct: partly matches a pre-aligned correspondence Wrong: neither Correct nor Near-correct

21 Output Examples
English | Japanese | Score
Correct:
is being pursued | 行われている (is doing by) | 2.75
of G7 nations | 先進 7 カ国の (advanced 7 countries) | 2.6
geographical proximity | 地理的に近い (near in geography) | 2.0
Near-correct:
tree (become) | その木は (That tree is) | 1.2
went [to bed] | 寝る (Go to bed) | 1.0
She (held) | 彼女は (She is) | 0.5

22 Precision – Recall [Plot: precision-recall curves for two settings: Correct only, and Correct + Near-Correct × 0.5]
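The two settings in the plot legend can be illustrated with a short sketch in which near-correct outputs earn half credit. The function `precision_recall`, its parameters, and the counts in the example are hypothetical; the transcript does not give the actual counts or state how the denominators were defined, so the usual definitions (system outputs for precision, gold correspondences for recall) are assumed.

```python
def precision_recall(n_correct, n_near, n_output, n_gold, near_weight=0.0):
    """Precision and recall where near-correct outputs earn partial credit."""
    credit = n_correct + near_weight * n_near
    precision = credit / n_output if n_output else 0.0
    recall = credit / n_gold if n_gold else 0.0
    return precision, recall


# Made-up counts, for illustration only.
strict = precision_recall(60, 20, 100, 120)                    # Correct only
lenient = precision_recall(60, 20, 100, 120, near_weight=0.5)  # + Near-Correct × 0.5
print(strict, lenient)
```

With `near_weight=0.0` only exact matches count; with `near_weight=0.5` each near-correct output adds half a match to both numerators.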

23 Conclusion We can find more correspondences than a statistical approach. For a comparable corpus, a statistical approach seems to be effective; for a parallel corpus, however, our approach is more effective at obtaining a large number of translation examples. Statistical approach: 1-2% of the input corpus Our system: 51-68% of the input corpus

25 Future Directions Do the correspondences found by this system work effectively? Tests in a translation system are necessary.

