CS460/IT632 Natural Language Processing/Language Technology for the Web Guest Lecture (31/03/06) Prof. Niladri Chatterjee IIT Delhi Guest Lecture on Machine Translation

Presentation transcript:

CS460/IT632 Natural Language Processing/Language Technology for the Web Guest Lecture (31/03/06) Prof. Niladri Chatterjee IIT Delhi Guest Lecture on Machine Translation

31/03/06 Prof. Pushpak Bhattacharyya, IIT Bombay

Slide 2: Machine Translation
[Diagram: Source Language -> Machine Translation System -> Target Language; label: Understanding]

Slide 3: Problems in Machine Translation (MT)
1. Same syntax but different semantics:
   - I take rice with dal.
   - I take rice with my friend.
2. Polysemy
3. Pronoun reference: "it" refers to something different in each case:
   - The computer prints data. It is fast.
   - The computer prints data. It is numeric.

Slide 4: Problem with Multilingual MT Systems
- Suppose we have a multilingual MT system with N languages.
- Translating directly between every pair of languages requires O(N^2) translators.
- Interlingua: an intermediate language (IL) that captures the semantics.
- The translation is: SL -> IL -> TL
- The number of translators required drops to 2N, i.e. O(N).
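The counting argument can be made concrete with a small sketch. It assumes one direct translator per ordered pair of distinct languages, and one analyzer (SL -> IL) plus one generator (IL -> TL) per language in the interlingua design. The function names are made up for this illustration.

```python
def direct_translators(n: int) -> int:
    """One translator per ordered (source, target) pair of distinct languages."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """One analyzer (SL -> IL) plus one generator (IL -> TL) per language."""
    return 2 * n


if __name__ == "__main__":
    # Compare the two designs for a few system sizes.
    for n in (3, 5, 10, 20):
        print(n, direct_translators(n), interlingua_translators(n))
```

For three languages the two designs need the same number of components; the interlingua only pays off as N grows.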

Slide 5: Other Approaches for MT
- Word Based Approach
- Rule Based Approach
- Statistical Approach
- Generation-Heavy Approach
- Example Based Approach

Slide 6: Example Based Approach
- Knowledge base of translation examples.
- Given an input, apply a similarity metric to pick a close match.
- Adapt the retrieved translation to suit the current requirement.

Slide 7: Example of English to Bengali translation using the Example Based Approach
- Ram goes to school -> Ram bidyalaya jaay
- Ram goes home -> Ram bari jaay
- Sita goes to school -> ? (guess, to get a feel)
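The retrieve-then-adapt loop can be sketched on the two stored examples. The lecture does not say which similarity metric is used, so the Jaccard word overlap below is an assumption. The function names are hypothetical, and so is the rule that a proper name carries over unchanged into the target.

```python
# Example base taken from the slide: (English source, Bengali target) pairs.
EXAMPLES = [
    ("Ram goes to school", "Ram bidyalaya jaay"),
    ("Ram goes home", "Ram bari jaay"),
]


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased word sets (an assumed, very crude metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def retrieve(sentence: str) -> tuple[str, str]:
    """Return the stored example whose source side is most similar to the input."""
    return max(EXAMPLES, key=lambda ex: similarity(sentence, ex[0]))


def adapt_by_replacement(sentence: str, example: tuple[str, str]) -> str:
    """Word replacement: if the input differs from the example source in exactly
    one word, swap that word in the target. Assumes the differing word (here a
    proper name) appears unchanged in the target sentence."""
    src, tgt = example
    s_words, e_words = sentence.split(), src.split()
    if len(s_words) != len(e_words):
        raise ValueError("sketch only handles sentences of equal length")
    diffs = [(old, new) for old, new in zip(e_words, s_words) if old != new]
    if not diffs:
        return tgt  # exact match: reuse the stored translation
    if len(diffs) != 1 or diffs[0][0] not in tgt.split():
        raise ValueError("sketch only handles one replaceable word")
    old, new = diffs[0]
    return " ".join(new if w == old else w for w in tgt.split())


if __name__ == "__main__":
    best = retrieve("Sita goes to school")
    print(best)
    print(adapt_by_replacement("Sita goes to school", best))
```

Under these assumptions the sketch picks "Ram goes to school" as the closest example and replaces the name, giving "Sita bidyalaya jaay". A real system would need a dictionary lookup for words that do not carry over unchanged.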

Slide 8: Some considerations
1. Similarity measure
2. What are the adaptation strategies?

Slide 9: Typical Techniques Used
- Word Deletion
  - Ram eats rice with spoon. -> Ram chamach diye bhaat khaaye
  - Ram eats rice -> ? (guess it, given that the dictionary gives "chamach" as the Bengali word for spoon)
- Word Addition
- Word Replacement
- Word Swapping
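Word deletion can be sketched in the same spirit. The slide only gives the dictionary entry "spoon" -> "chamach"; the entry for "with" is an assumption added for this illustration. The helper name is also hypothetical.

```python
# Hypothetical bilingual dictionary. Only "spoon" -> "chamach" comes from the
# slide; the entry for "with" is an assumption made for this sketch.
DICTIONARY = {"spoon": "chamach", "with": "diye"}


def adapt_by_deletion(sentence: str, example_src: str, example_tgt: str) -> str:
    """Word deletion: drop from the target the translations of those source
    words that occur in the stored example but not in the input."""
    input_words = set(sentence.lower().rstrip(".").split())
    extra = [w for w in example_src.lower().rstrip(".").split()
             if w not in input_words]
    unknown = [w for w in extra if w not in DICTIONARY]
    if unknown:
        raise KeyError(f"no dictionary entry for: {unknown}")
    to_drop = {DICTIONARY[w] for w in extra}
    return " ".join(w for w in example_tgt.split() if w.lower() not in to_drop)


if __name__ == "__main__":
    print(adapt_by_deletion(
        "Ram eats rice",
        "Ram eats rice with spoon.",
        "Ram chamach diye bhaat khaaye",
    ))
```

With the assumed dictionary, the words "with" and "spoon" are surplus in the stored example, so their translations are removed and the sketch returns "Ram bhaat khaaye".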

Slide 10: A simple assumption
"Sentences of similar structure in the source language have a similar structure in the target language."

Slide 11: Problems with the assumption
- Translation Divergence
  - It is running -> Wah bhaag raha hai
  - It is raining -> Baarish ho rahi hai
- Structural Divergence
  - Ram will attend the meeting -> Ram sabha mein jayegaa
  - Ram will go to school -> Ram school jayegaa

Slide 12: Problems (contd.)
- Promotional Divergence
  - The fan is on [adverb] -> Pankha chal [verb] raha hai
  - The fan is good [adjective] -> Pankha achcha [adjective] hai
- Conflational Divergence (conflate: to fuse several meanings into one word)
  - To convey the same meaning, the TL sentence needs more words than the SL sentence.
  - Ram killed Ravana -> Ram ne Ravan ko mara => no divergence
  - Ram stabbed Ravana -> Ram ne Ravan ko chaku se mara => divergence

Slide 13: Problems (contd.)
- Categorical Divergence
  - She is hungry -> Use bhookh lagi hai
  - She is beautiful -> Wah sundar hai
- Divergence occurs in approximately 12% of sentences.

Slide 14: Solution to Divergence
- Classify the input as a standard or a divergent translation.
  - Measure the similarity of the sentence against two databases (standard examples and divergent examples).
- Example:
  - She is in panic
  - She is in trouble
  - She is in pain
- Present all the solutions to the user.
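A minimal sketch of the two-database idea follows. It reuses the example sentences from the divergence slides as toy databases and Jaccard word overlap as the similarity metric. Both choices are assumptions, as are the function names and the "undecided" label.

```python
# Toy databases built from the English sentences on the divergence slides.
NORMAL = ["It is running", "Ram will go to school",
          "The fan is good", "She is beautiful"]
DIVERGENT = ["It is raining", "Ram will attend the meeting",
             "The fan is on", "She is hungry"]


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased word sets (an assumed metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def best_score(sentence: str, database: list[str]) -> float:
    """Highest similarity between the input and any sentence in the database."""
    return max(similarity(sentence, s) for s in database)


def classify(sentence: str) -> str:
    """Label the input by whichever database contains the closer match.
    On a tie return 'undecided', so that all candidate translations can be
    presented to the user."""
    normal = best_score(sentence, NORMAL)
    divergent = best_score(sentence, DIVERGENT)
    if normal > divergent:
        return "normal"
    if divergent > normal:
        return "divergent"
    return "undecided"


if __name__ == "__main__":
    for s in ("She is in panic", "She is in trouble", "She is in pain"):
        print(s, "->", classify(s))
```

With such small databases and a word-overlap metric, a sentence like "She is in pain" is equally close to "She is beautiful" and "She is hungry". That is exactly the case where showing every candidate to the user makes sense.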

Slide 15: Adaptation Problem
- There is more morphological variation in Hindi than in English.

Slide 16: Divergence Identification
- 7 types of divergence between Hindi and English are defined.
  - Based on 7K-8K sentences.

Slide 17: Word Sense Disambiguation
- I saw the man with binoculars.
  - Keep the ambiguity even in the translation.