DRAVIDIAN WORDNET S.Arulmozi Dravidian University 29 April 2013.


1 DRAVIDIAN WORDNET S.Arulmozi Dravidian University 29 April 2013

2 Tamil Thesaurus
Preliminary work on lexical semantics.
Monumental work on Tamil Thesaurus.
Ontological classification of Tamil vocabulary.
Rajendran, S. (2001). tamizhc coRkaLanjciyam (in Tamil). Tamil University Publication.

3 Domains in Tamil Thesaurus
Tamil vocabulary is classified into four major domains: Entities, Abstracts, Events, and Relationals.

4 Lexical Hierarchy of the Domain 'Construction'
parumaippeyarkaL 'concrete nouns'
  aHRinaippeyarkaL 'irrational nouns'
    uyirillaatavai 'non-living beings'
      uruvaakkiya maRRum patananjceyta poruTkaL 'manufactured and processed items'
        kaTTappaTTavai 'constructed'

5 Nouns
Relation              Example
Synonymy              viiTu 'house' – illam 'house'
Hypernymy-Hyponymy    paLLi 'school' – kalviccaalai 'educational institution'
Hyponymy-Hypernymy    kalluuri 'college' – aracukkalluuri 'govt college'
Holonymy-Meronymy     ndaaRkaali 'chair' – kaal 'leg'
Meronymy-Holonymy     cakkaram 'wheel' – vaNTi 'cart'
Related Verb          paTittal 'reading' – paTi 'read'
Coordinate terms      kooyil 'temple' – macuuti 'mosque'

6 Verbs
Relation        Example
Synonym         paTi 'read' – payilu 'read'
Hypernymy       cuvai 'taste' – uNar
Troponymy       keeL 'ask' – kenjcu 'plead'
Nominal         paruku 'drink' – parukutal 'drinking'
Related Noun    kaNTupiTi 'discover' – kaNTupiTippu 'discovery'
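The noun and verb relations listed on these slides can be pictured as labelled links between words. The following is a minimal sketch of one way to store such links, not the Tamil WordNet implementation: the class name RelationStore, the INVERSE table, and the relation labels are all hypothetical, and only the word pairs come from the slides. The convention assumed here is that add(source, relation, target) means "target is a <relation> of source".

```python
from collections import defaultdict

# Hypothetical inverse-relation table (illustrative labels, not from the project).
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "holonym": "meronym",
    "meronym": "holonym",
    "synonym": "synonym",
    "coordinate": "coordinate",
}


class RelationStore:
    """Stores (word, relation) -> set of related words."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        # Record that `target` is a `relation` of `source`.
        self._edges[(source, relation)].add(target)
        # Also record the inverse link when one is defined for this relation.
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self._edges[(target, inverse)].add(source)

    def related(self, word, relation):
        # Return the related words in sorted order (empty list if none).
        return sorted(self._edges.get((word, relation), set()))


store = RelationStore()
store.add("viiTu", "synonym", "illam")              # 'house' / 'house'
store.add("kalluuri", "hyponym", "aracukkalluuri")  # 'college' -> 'govt college'
store.add("ndaaRkaali", "meronym", "kaal")          # 'chair' has part 'leg'
store.add("cakkaram", "holonym", "vaNTi")           # 'wheel' is part of 'cart'
store.add("kooyil", "coordinate", "macuuti")        # 'temple' / 'mosque'
store.add("keeL", "troponym", "kenjcu")             # 'ask' -> 'plead' (no inverse defined)

print(store.related("kaal", "holonym"))
print(store.related("vaNTi", "meronym"))
```

Storing the inverse link at insertion time keeps lookups symmetric for pairs such as holonymy and meronymy, at the cost of holding each such link twice.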

7 Tamil WordNet
• Objective: To build a WordNet for Tamil to enhance machine translation
• Resources: Tamil Thesaurus, Technical Glossaries (Tamil University Publications), Princeton English WordNet
• Funding Agency: Tamil Software Development Fund, Tamil Virtual University - 4 lacs
• Time Frame: 18 months

8 Details
• Software used
  • Front-end: Java
  • Back-end: MySQL database
• Project Deliverables
  • 50k root words
  • Relationships coded
  • Stand-alone and web-based interface
  • Embedded morphological analyser

9 Statistics
• Total Words: 50497
• Unique Senses: 41013
• Nouns: 46710
• Verbs: 2881
• Adjectives: 416
• Adverbs: 490
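As a quick consistency check on these figures, the four part-of-speech counts can be summed and compared with the reported total. The sketch below uses only the numbers from the slide; the variable names are illustrative.

```python
# Part-of-speech counts as reported on the Statistics slide.
counts = {"nouns": 46710, "verbs": 2881, "adjectives": 416, "adverbs": 490}
total_words = 50497

# The four categories should account for every word in the total.
pos_sum = sum(counts.values())
print("sum of categories:", pos_sum, "matches total:", pos_sum == total_words)

# Share of each category in the total, as a percentage rounded to one decimal.
shares = {pos: round(100 * n / total_words, 1) for pos, n in counts.items()}
for pos, share in shares.items():
    print(f"{pos}: {share}%")
```

The categories add up exactly to the reported total, and nouns make up the large majority of the entries.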

10 Project Completed (2004)
Total Words: 50497
Unique Senses: 41013
http://www.nrcfosshelpline.in/code/wiki/TamilWordnet

12 Standalone version – Tamil WordNet (Snapshot)

13 Standalone version – Tamil WordNet (Snapshot)

14 Web-version – Tamil WordNet (Snapshot)

15 Web-version – Tamil WordNet (Snapshot)

16 First Effort on Dravidian Languages
National Workshop on WordNet for Dravidian Languages, 2-3 June 2003
Organized by AU-KBC Research Centre, Chennai; Central Institute of Indian Languages, Mysore; and Tamil University.
Hands-on experience on a specified domain: construction
Report available on the Global WordNet website

17 MHRD Project
• Creation of Machine Translation tools and resources for English to Dravidian Languages: Pilot Study
• Aim: to develop a Machine Translation (MT) system and the linguistic resources needed for English-Dravidian languages (Tamil, Malayalam, Telugu and Kannada).
• This would facilitate the creation of rich educational content in Indian languages.
• All tools and the translation system are to be based on machine learning methodologies, so that computer graduates and other non-linguists can immediately participate in the national mission on literacy by contributing additional tools for language translation.

18 Modules
Module 1: Machine Translation aims at developing teaching material corresponding to the tools developed, so that it can be delivered as part of the undergraduate computer science and engineering curriculum on data mining/machine learning. This will ensure the critical amount of manpower required to sustain the translation effort needed for the national mission on education.
Module 2: Training aims at training 500 faculty members, selected from across the country, in machine translation methodologies using machine learning techniques.
Module 3: Dravidian WordNet aims at developing the Dravidian WordNet required for translation.

19 Total Budget
IIT Bombay – 15 lacs
Amrita University – 40 lacs
Tamil University – 15 lacs
University of Hyderabad – 15 lacs
Dravidian University – 15 lacs
Time Frame: 12 months (March 30, 2009 – March 29, 2010)

20 Work done
Part of a one-year pilot project involving Tamil, Telugu, Malayalam and Kannada
Funding Agency: Ministry of HRD
Duration: 18 months (July 2009 – Dec 2010)
Deliverable: 13k synsets
7k synsets linked to IndoWordNet, available at http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php

21 Statistics on Dravidian WordNet

22 Publications
• 'Tamil WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S.Rajendran)
• 'Building a WordNet for Dravidian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S.Rajendran, S.Gopakumar, V.Dhanalakshmi)
• 'Representation of Kinship in WordNet', Proceedings of the 9th International Tamil Internet Conference, Coimbatore, 23-27 June 2010 (S.Arulmozi)
• 'Polysemy in Tamil and other Indian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S.Arulmozi & Panchanan Mohanty)
• 'Telugu WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S.Arulmozi)

23 First IndoWordNet Workshop
Amrita University, 11-14 June 2009
The necessity of developing linked WordNets for the different languages of India was stressed.
Challenges such as language divergence, lexical semantics, and embedding WordNet in MT and cross-lingual search applications can be addressed.
Participation from groups: Hindi, Marathi, Sanskrit, Nepali, Assamese, Bodo, Manipuri, Konkani, Kashmiri, Tamil, Telugu, Malayalam, Kannada
Proposal on Indhradhanush

24 Dravidian WordNet
Present project funded by DIT.

25 Links
• Tamil WordNet (Open Source): http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
• VerbNet (English): http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
• Princeton English WordNet: http://wordnet.princeton.edu/
• Global WordNet Association: http://www.globalwordnet.org/
• WordNets in the World: http://www.globalwordnet.org/gwa/wordnet_table.htm
• WordNet Bibliography: http://lit.csci.unt.edu/~wordnet/
• IndoWordNet: http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php

26 Thank you!

