Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia.


1 Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia

2 Presentation Overview: Problem Background & Motivation; Research Objectives; Methodology; Advantages & Contributions

3 Presentation Overview: Problem Background & Motivation; Research Objectives; Methodology; Advantages & Contributions

4 Natural Language is Ambiguous bank ??

5 Word Sense Disambiguation
Given:
- a list of meanings/senses of words (dictionaries)
- input text containing occurrences of ambiguous words
Assign the correct sense to each particular instance of an ambiguous word in context. A.k.a. “sense-tagging”.
Example senses:
- bank#1: a financial institution that accepts deposits and channels the money into lending activities
- bank#2: sloping land (especially the slope beside a body of water)
Example: …withdraw money from the bank... → bank#1

6 Disambiguation in Machine Translation (1)
Senses:
- bank#1: a financial institution that accepts deposits and channels the money into lending activities
- bank#2: sloping land (especially the slope beside a body of water)
Malay translations: bank#1 → bank; bank#2 → tebing
English input: …withdraw money from the bank...
Sense-tag (WSD): …withdraw money from the bank#1...
Select translation word, Malay output: …mengeluarkan wang dari bank...
That worked well…

7 Disambiguation in Machine Translation (2)
Sense:
- circulation#6: the spread or transmission of something (as news or money) to a wider group or area
Malay translations: edaran (money), penyebaran (news)
English input: …50 ringgit notes in circulation...
Sense-tag (WSD): …50 ringgit notes in circulation#6...
Translate, Malay output: …duit kertas 50 ringgit dalam edaran?? penyebaran?...
That DIDN’T work well…
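The contrast between the two cases can be sketched with a toy lookup table. The `SENSE_TO_TRANSLATIONS` mapping and the `translate_by_sense` helper are hypothetical names used only for illustration; they are not the system's actual data format.

```python
# Hypothetical sense -> Malay translation table, using the examples from the slides.
SENSE_TO_TRANSLATIONS = {
    "bank#1": ["bank"],
    "bank#2": ["tebing"],
    "circulation#6": ["edaran", "penyebaran"],
}


def translate_by_sense(sense_tag):
    """Return the translation if the sense tag determines it uniquely, else None."""
    candidates = SENSE_TO_TRANSLATIONS.get(sense_tag, [])
    if len(candidates) == 1:
        return candidates[0]
    return None  # sense-tagging alone cannot choose between the candidates


print(translate_by_sense("bank#1"))         # bank
print(translate_by_sense("circulation#6"))  # None
```

A sense tag that maps to one Malay word settles the translation; a tag that maps to several leaves the choice open, which is the gap translation selection has to fill.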

8 Optimising WSD for MT: Input word | Sense number | Translation word | select (Lee and Kim 2002)

9 Presentation Overview: Problem Background & Motivation; Research Objectives; Methodology; Advantages & Contributions

10 Main Objective
Existing MT system:
- Selects fragments (translation units) from previously translated examples
- Re-combines selected translation units to produce translation output for new input text
Improve the translation quality of this MT system by adapting a WSD algorithm specifically for MT purposes.

11 Need semantic knowledge about…
Word senses
- Use dictionary definitions
Pairs of translation words
- From bilingual knowledge bank (BKB) made up of pairs of sentences that are translations of each other
- Corresponding words in each translation sentence pair are explicitly marked
Need a model to capture semantic knowledge of lexical items
- Conceptual Vectors (CVs) (Lafourcade 2001)
- Using a selection of concepts or themes
- Construct mathematical vectors from concepts
- Thematic similarity between lexical items ≡ angle between CVs
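The angle between two CVs mentioned above is commonly computed as the arccosine of their cosine similarity. The notation below (D_A for the angular distance, x and y for two vectors over n concepts) is chosen here for illustration and is not taken from the slides.

```latex
D_A(x, y) = \arccos\!\left( \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} \right),
\qquad
x \cdot y = \sum_{i=1}^{n} x_i y_i
```

Assuming activation levels are non-negative, D_A ranges from 0 (identical thematic profile) to π/2 (no shared concepts).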

12 Need to:
Compile CVs for word meanings on 2 levels:
- Word sense (from dictionary)
- Word/phrase translation unit (from BKB), using data compiled from the previous step
Use compiled information during translation run time to select correct translation units

13 Presentation Overview: Problem Background & Motivation; Research Objectives; Methodology; Advantages and Contributions

14 Brief Outline
Data Preparation Phase: Dictionary / Lexicon; Word senses (word → sense number level knowledge); Concept Category Labels; BKB Examples; Translation units; tag; Translation Unit Profile (word → translation level knowledge)
EBMT Run-time Phase: Input Text; “clues”; matching, comparison, selection; selected translation units; Translated Text

15 Concept Hierarchy Example: GoiTaikei (tree diagram, listed level by level)
Level 1: noun
Level 2: concrete, abstract
Level 3: agent, place, object, abstract thing, event, relation
Level 4: person, organisation, facility, region, nature, animate, inanimate, mental state, action, human activity, phenomenon, natural phenomenon, existence, categorisation system, relation, characteristic, state, form, numerical, location, time

16 Definition CVs for Word Senses
circulation#6: the spread or transmission of something (such as news or money) to a wider group or area
[Figure: activation level per concept; labelled concepts include INFORMATION, TRANSMISSION_OF_INFORMATION, SPREAD_MOVEMENT and MONEY]
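A minimal sketch of how a definition CV for a word sense might be represented, assuming a small fixed concept list and invented activation levels. The `CONCEPTS` list, the `build_cv` helper and the unit-length normalisation are assumptions for illustration; the slides give neither the numeric values nor the normalisation used.

```python
import math

# Hypothetical concept inventory; a real system would use the full concept hierarchy.
CONCEPTS = ["INFORMATION", "TRANSMISSION_OF_INFORMATION", "SPREAD_MOVEMENT", "MONEY", "PLACE"]


def build_cv(activations):
    """Turn a {concept: activation level} mapping into a unit-length vector."""
    raw = [float(activations.get(c, 0.0)) for c in CONCEPTS]
    norm = math.sqrt(sum(v * v for v in raw))
    if norm == 0.0:
        return raw  # no concept activated: keep the zero vector
    return [v / norm for v in raw]


# Activation levels are made up for illustration.
circulation_6 = build_cv({
    "INFORMATION": 1.0,
    "TRANSMISSION_OF_INFORMATION": 1.0,
    "SPREAD_MOVEMENT": 2.0,
    "MONEY": 1.0,
})
print(circulation_6)
```

Each position in the vector corresponds to one concept, so two senses can be compared position by position regardless of which words their definitions use.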

17 Sense-tagging Translation Examples (English)
E (untagged): … number [n] of [prep] one [num_card] ringgit [n] coins [n] in [prep] circulation [n].
E (sense-tagged): … number [n]#2 of [prep] one [num_card]#1 ringgit [n]#1 coins [n]#1 in [prep] circulation [n]#6.
M: … bilangan [n] syiling [n] seringgit [n] dalam [prep] edaran [n].

18 CVs of Translation Pairs
σ: circulation ↔ peredaran (BKB examples 2299, 2306, 2309)
BKB Examples:
- 2299: The circulation#5 of air through the pipes… / Peredaran udara melalui paip-paip…
- 2306: … one ringgit coins in circulation#6. / … syiling seringgit dalam peredaran.
- 2309: …dollar note… withdrawn from circulation#6. / Wang kertas … ditarik daripada peredaran.
[Diagram: V_context(σ, 2299), V_context(σ, 2306) and V_context(σ, 2309) are combined into V_context(σ); V_lex_def(σ, 2299), V_lex_def(σ, 2306) and V_lex_def(σ, 2309) are combined into V_lex_def(σ); V_context(σ) and V_lex_def(σ) are combined into V_profile(σ)]
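One plausible way to fold the per-example vectors into a single profile is a normalised vector sum. The combining operators are not legible in this transcript, so the `normalized_sum` helper and the toy three-concept vectors below are assumptions, not the author's stated method.

```python
import math


def normalized_sum(vectors):
    """Add vectors component-wise and rescale the result to unit length."""
    total = [sum(components) for components in zip(*vectors)]
    norm = math.sqrt(sum(v * v for v in total))
    return [v / norm for v in total] if norm else total


# Toy 3-concept vectors for examples 2299, 2306 and 2309 (values are invented).
context_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
lex_def_vectors = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

v_context = normalized_sum(context_vectors)
v_lex_def = normalized_sum(lex_def_vectors)
v_profile = normalized_sum([v_context, v_lex_def])
print(v_profile)
```

Normalising after each sum keeps a translation pair with many examples from dominating purely because of its example count.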

19 During Translation
Data Preparation Phase: Dictionary / Lexicon; Word senses (word → sense number level knowledge); Concept Category Labels; BKB Examples; Translation units; tag; Translation Unit Profile (word → translation level knowledge)
EBMT Run-time Phase: Input Text; “clues”; matching, comparison, selection; selected translation units; Translated Text
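The run-time matching and selection step can be sketched as picking the candidate whose profile vector makes the smallest angle with a vector built from the input's clues. The function names, the three toy concepts and all numeric values below are invented for illustration.

```python
import math


def angular_distance(x, y):
    """Angle in radians between two vectors (0 means identical direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return math.pi / 2  # treat an empty vector as sharing no concepts
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)


def select_translation(input_cv, profiles):
    """Pick the translation whose profile vector is closest in angle to the input."""
    return min(profiles, key=lambda word: angular_distance(input_cv, profiles[word]))


# Invented profiles over the toy concepts (MONEY, INFORMATION, SPREAD_MOVEMENT).
profiles = {
    "edaran": [0.9, 0.1, 0.4],
    "penyebaran": [0.1, 0.9, 0.4],
}
money_context = [1.0, 0.0, 0.2]
news_context = [0.0, 1.0, 0.2]
print(select_translation(money_context, profiles))
print(select_translation(news_context, profiles))
```

With these made-up numbers the money-related context selects edaran and the news-related context selects penyebaran, without assigning a sense number to the input word first.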

20 Some Results
Translating ‘circulation’ to Malay: edaran or penyebaran
TS: proposed translation selection using CVs
BS: baseline strategy, chooses
- the translation that co-occurs with the same input words (and same structure) as in the BKB
- or the most frequently occurring translation
Input | Translation chosen by TS | Translation chosen by BS
We will stop the circulation of that magazine. | edaran | penyebaran
We will stop the circulation of that rumour. | penyebaran (one entry spanning both columns)
We will stop the circulation of that newspaper. | edaran | penyebaran
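A rough sketch of the baseline strategy (BS) as described above, under simplifying assumptions: BKB examples are reduced to a bag of source-side context words plus the chosen translation, and the "same structure" condition is ignored. The `baseline_select` function and the example data are hypothetical.

```python
from collections import Counter


def baseline_select(input_words, bkb_examples):
    """bkb_examples: non-empty list of (source context words, translation) pairs."""
    input_set = set(input_words)
    # First choice: the translation whose example shares the most words with the input.
    best_overlap, best_translation = 0, None
    for context_words, translation in bkb_examples:
        overlap = len(input_set & set(context_words))
        if overlap > best_overlap:
            best_overlap, best_translation = overlap, translation
    if best_translation is not None:
        return best_translation
    # Fallback: the most frequent translation in the examples.
    return Counter(t for _, t in bkb_examples).most_common(1)[0][0]


# Invented examples for 'circulation'.
bkb_examples = [
    (["news", "rumour"], "penyebaran"),
    (["story"], "penyebaran"),
    (["coins", "ringgit"], "edaran"),
]
print(baseline_select(["ringgit", "notes"], bkb_examples))
print(baseline_select(["magazine"], bkb_examples))
```

In this toy data an input word never seen next to 'circulation' falls through to the frequency fallback, which illustrates how a baseline of this kind can miss cases that depend on thematic similarity rather than exact word overlap.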

21 Presentation Overview: Problem Background & Motivation; Research Objectives; Methodology; Advantages & Contributions

22 Advantages and Weaknesses
Pros:
- optimized for EBMT: focuses on translation selection, bypassing intermediate WSD at run time
- handles many-to-many mapping of source word → sense → translation words
- allows for bi-directional translation with sense-tagging for 1 language
- mathematical operations on vectors are easy to implement
- avoids combinatorial effect when multiple ambiguous words occur in the input
Cons:
- not all ambiguities can be solved using co-occurring concepts
- does not handle translation selection of function words
- manual work required in data preparation

23 Research Contributions
- Adaptation of a WSD approach for the specific aim of translation selection
- Proposal of specific guidelines for assigning related concepts for word meanings from dictionaries
- Production of knowledge about word meanings on two levels: word senses as in dictionaries, and translations as in parallel text

24 Summary
WSD can be customized to suit different NLP applications
- Different requirements
- Increase efficiency
WSD and related tasks based on concepts common to co-occurring word senses can be facilitated using the conceptual vector model
- Requires a concept category hierarchy and word sense list
- Concepts related to a word sense modelled as a mathematical vector
- Conceptual similarity = angular distance between vectors
Future work
- Automating data preparation tasks
- Investigating suitable weights or normalizing factors during CV manipulation
- Integration with other WSD or translation selection strategies

25 Future Work
- Automate tagging tasks that are currently done manually
- Investigate different weight values for CVs for different syntactic relations or word classes
- Integrate with other WSD/translation selection tasks

26 Thank You

