CS460/IT632 Natural Language Processing/Language Technology for the Web
Guest Lecture on Machine Translation (31/03/06)
Prof. Niladri Chatterjee, IIT Delhi
31/03/06 Prof. Pushpak Bhattacharyya, IIT Bombay
Machine Translation
Source Language -> Machine Translation System -> Target Language
Understanding
Problems in Machine Translation (MT)
1. Same syntax but different semantics:
   I take rice with dal.
   I take rice with my friend.
2. Polysemy
3. "It" refers to something different in each case:
   The computer prints data. It is fast.
   The computer prints data. It is numeric.
Problem with Multilingual MT Systems
- Suppose we have a multilingual MT system with N languages.
- Translating directly between every pair of languages requires O(N^2) translators.
- Interlingua: an intermediate language (IL) that captures the semantics.
- The translation path is SL -> IL -> TL.
- The number of translators required is then 2N, i.e. O(N).
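The counts on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming that a direct system needs one translator per ordered (source, target) pair and that an interlingua system needs one analyzer (SL -> IL) and one generator (IL -> TL) per language. The function names are illustrative, not taken from the lecture.

```python
def direct_translators(n: int) -> int:
    """One translator per ordered (source, target) language pair."""
    return n * (n - 1)


def interlingua_components(n: int) -> int:
    """One analyzer (SL -> IL) plus one generator (IL -> TL) per language."""
    return 2 * n


if __name__ == "__main__":
    for n in (3, 10, 20):
        print(n, direct_translators(n), interlingua_components(n))
```

Under these assumptions the two designs cost the same at N = 3, and the interlingua design needs fewer components for every N above 3.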
Other Approaches for MT
- Word Based Approach
- Rule Based Approach
- Statistical Approach
- Generation-Heavy Approach
- Example Based Approach
Example Based Approach
- Maintain a knowledge base of translation examples.
- Given an input, apply a similarity metric to pick a close match.
- Adapt the retrieved translation to suit the current requirement.
Example: English to Bengali translation using the Example Based Approach
- Ram goes to school -> Ram bidyalaya jaay
- Ram goes home -> Ram bari jaay
- Sita goes to school -> ? (guess to get a feel)
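The retrieve-and-adapt loop behind this example can be sketched as follows. This is an illustration under stated assumptions, not the lecturer's system: the similarity metric is a plain Jaccard overlap of words, adaptation handles only one-for-one word replacement, and the small `DICTIONARY` (proper names carried over unchanged) is hypothetical.

```python
# Example base taken from the slide (English -> romanized Bengali).
EXAMPLES = [
    ("Ram goes to school", "Ram bidyalaya jaay"),
    ("Ram goes home", "Ram bari jaay"),
]

# Hypothetical bilingual dictionary: assumes proper names are carried over as-is.
DICTIONARY = {"Ram": "Ram", "Sita": "Sita"}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def retrieve(source: str):
    """Return the stored example whose source side is closest to the input."""
    return max(EXAMPLES, key=lambda ex: similarity(source, ex[0]))


def adapt(source: str, example) -> str:
    """Word replacement: swap in the translation of each word that differs."""
    ex_src, ex_tgt = example
    src_words, ex_words = source.split(), ex_src.split()
    if len(src_words) != len(ex_words):
        raise ValueError("this sketch only handles one-for-one replacement")
    tgt_words = ex_tgt.split()
    for new, old in zip(src_words, ex_words):
        if new != old:
            # Find where the old word's translation sits and replace it.
            idx = tgt_words.index(DICTIONARY[old])
            tgt_words[idx] = DICTIONARY[new]
    return " ".join(tgt_words)


def translate(source: str) -> str:
    return adapt(source, retrieve(source))


if __name__ == "__main__":
    print(translate("Sita goes to school"))
```

For "Sita goes to school" the closest stored example is "Ram goes to school", and replacing "Ram" by "Sita" in its translation gives "Sita bidyalaya jaay" under the dictionary assumption above.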
Some Considerations
1. What similarity measure should be used?
2. What are the adaptation strategies?
Typical Techniques Used
- Word Deletion
  Ram eats rice with spoon. -> Ram chamach diye bhaat khaaye
  Ram eats rice. -> ? (guess it, given that the dictionary lists "chamach" as the Bengali word for "spoon")
- Word Addition
- Word Replacement
- Word Swapping
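Word deletion can be sketched in the same spirit. The slide supplies only the dictionary entry spoon -> chamach; the alignment of "with" to the postposition "diye" is an assumption made here so that the example is complete, and the function name is illustrative.

```python
EXAMPLE_SRC = "Ram eats rice with spoon"
EXAMPLE_TGT = "Ram chamach diye bhaat khaaye"

# Source word -> target words it accounts for.
# "spoon" -> "chamach" comes from the slide; "with" -> "diye" is an assumption.
ALIGNMENT = {"spoon": ["chamach"], "with": ["diye"]}


def adapt_by_deletion(new_src: str, ex_src: str, ex_tgt: str, alignment) -> str:
    """Drop from the example translation the target words aligned to
    source words that are absent from the new input."""
    new_words = new_src.split()
    removed = [w for w in ex_src.split() if w not in new_words]
    # A missing alignment entry raises KeyError rather than guessing.
    drop = {t for w in removed for t in alignment[w]}
    return " ".join(t for t in ex_tgt.split() if t not in drop)


if __name__ == "__main__":
    print(adapt_by_deletion("Ram eats rice", EXAMPLE_SRC, EXAMPLE_TGT, ALIGNMENT))
```

With this alignment, removing "with spoon" from the input removes "chamach diye" from the retrieved translation, leaving "Ram bhaat khaaye".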
A Simple Assumption
"Sentences of similar structure in the source language have a similar structure in the target language."
Problems with the Assumption
Translation Divergence
- It is running -> Wah bhaag raha hai
- It is raining -> Baarish ho rahi hai
Structural Divergence
- Ram will attend the meeting -> Ram sabha mein jayegaa
- Ram will go to school -> Ram school jayegaa
Problems (contd.)
Promotional Divergence
- The fan is on [adverb] -> Pankha chal [verb] raha hai
- The fan is good [adjective] -> Pankha achcha [adjective] hai
Conflational Divergence (conflate: to fuse several meanings into one word)
- To convey the same meaning, the TL needs more words than the SL.
- Ram killed Ravana -> Ram ne Ravan ko mara => no divergence
- Ram stabbed Ravana -> Ram ne Ravan ko chaku se mara => divergence
Problems (contd.)
Categorical Divergence
- She is hungry -> Use bhookh lagi hai
- She is beautiful -> Wah sundar hai
Divergence occurs in approximately 12% of sentences.
Solution to Divergence
- Classify the translation as standard or divergent.
- Measure the similarity of the input sentence to the examples in two databases.
- Example:
  She is in panic
  She is in trouble
  She is in pain
- Present all the solutions to the user.
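One way to read the two-database idea is sketched below. It assumes one example base of standard translations and one of divergent translations, filled here with the English sides of the sentence pairs from the earlier slides, and it uses a simple Jaccard word overlap as the similarity measure. Both choices, and the function names, are assumptions for illustration; the lecture does not specify the metric.

```python
# English sides of the slide examples, split by whether their Hindi
# translation was shown as standard or divergent.
STANDARD = [
    "It is running", "Ram will go to school", "The fan is good",
    "Ram killed Ravana", "She is beautiful",
]
DIVERGENT = [
    "It is raining", "Ram will attend the meeting", "The fan is on",
    "Ram stabbed Ravana", "She is hungry",
]


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def best_score(sentence: str, database) -> float:
    """Similarity to the closest example in one database."""
    return max(jaccard(sentence, ex) for ex in database)


def classify(sentence: str) -> str:
    standard = best_score(sentence, STANDARD)
    divergent = best_score(sentence, DIVERGENT)
    if standard > divergent:
        return "standard"
    if divergent > standard:
        return "divergence"
    # Equally close to both databases: offer both translations to the user.
    return "ambiguous"


if __name__ == "__main__":
    for s in ("Sita will go to school", "Sita stabbed Ravana", "She is in pain"):
        print(s, "->", classify(s))
```

A purely word-based measure cannot separate "She is in pain" from either database, since it matches "She is beautiful" and "She is hungry" equally well. That tie is the case where presenting all the solutions to the user is the fallback.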
Adaptation Problem
- There is more morphological variation in Hindi than in English.
Divergence Identification
- Seven types of divergence between Hindi and English are defined.
- The classification is based on 7K-8K sentences.
Word Sense Disambiguation
- I saw the man with binoculars.
- Keep the ambiguity even in the translation.