Indo WordNet A WordNet for Hindi


1 Indo WordNet A WordNet for Hindi
Debasri Chakrabarti, Dipak Kumar Narayan, Prabhakar Pandey, Madhu Prasad Sharma
Centre for Technology Development for Indian Languages, Computer Science and Engineering Department, IIT Bombay

2 Introduction
WordNet – a lexical database
Allows the dictionary to be searched conceptually
A different organizing principle is used for each syntactic category
Synsets (synonymy sets) are the basic building blocks
A lexical knowledge base is the heart of any intelligent information processing system

3 WordNet for Hindi
Hindi WordNet is an online lexical database for the Hindi language
Its design is inspired by the English WordNet
Unique features:
  Graded antonymy and meronymy relationships
  Efficient underlying database design
  Cross-part-of-speech linkage

4 Semantic relations in WordNet
Synonymy
Hypernymy / Hyponymy
Antonymy
Meronymy / Holonymy
Gradation
Entailment
Troponymy

5 Semantic Relations: Synonymy
True synonyms are rare
Synonymy is relative to a context
Synsets containing घर (house), one per sense:
  {घर, कमरा} (room)
  {घर, आवास} (dwelling)
  {घर, जन्मकुंडलीय स्थान} (astrological house)
  {घर, स्वदेश} (homeland)
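The context-dependence of synonymy can be illustrated with a minimal sketch in Python. It assumes a toy in-memory model in which each synset is a set of word forms; the numeric synset IDs and the function names are hypothetical and are not taken from the Hindi WordNet itself.

```python
# Hypothetical toy model: synset id -> set of word forms.
# The four synsets are the senses of घर (house) listed on the slide.
SYNSETS = {
    1: frozenset({"घर", "कमरा"}),               # room
    2: frozenset({"घर", "आवास"}),               # dwelling
    3: frozenset({"घर", "जन्मकुंडलीय स्थान"}),    # astrological house
    4: frozenset({"घर", "स्वदेश"}),             # homeland
}


def senses(word):
    """Return the ids of every synset that contains the word form."""
    return sorted(sid for sid, words in SYNSETS.items() if word in words)


def synonyms_in_context(word, synset_id):
    """Return the synonyms of a word within one particular sense."""
    return SYNSETS[synset_id] - {word}
```

In this sketch घर belongs to four synsets, so its synonym depends on which sense (context) is selected, while आवास belongs to only one.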

6 Semantic Relations: Hypernymy and Hyponymy
A relation between word meanings (synsets)
X is a hyponym of Y if X is a kind of Y
Hyponymy is transitive and asymmetric
Hypernymy is the inverse of hyponymy
Example: lion → animal → living entity → entity
         शेर → पशु → सजीव → अस्तित्व
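Transitivity and asymmetry of hyponymy can be sketched as follows. This is an illustrative Python fragment, assuming for simplicity that every synset has at most one hypernym and that a synset is represented by a single word; the dictionary and function names are hypothetical.

```python
# Hypothetical single-parent hypernym map built from the slide's example:
# lion -> animal -> living entity -> entity
HYPERNYM = {
    "शेर": "पशु",       # lion -> animal
    "पशु": "सजीव",      # animal -> living entity
    "सजीव": "अस्तित्व",  # living entity -> entity
}


def hypernym_chain(synset):
    """Follow hypernym links upward and return all ancestors in order."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain


def is_hyponym_of(x, y):
    """True if x is a (direct or indirect) kind of y."""
    return y in hypernym_chain(x)
```

Because the check walks the whole chain, a lion counts as a kind of entity (transitivity), while the reverse query fails (asymmetry).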

7 Semantic Relations: Antonymy, Meronymy and Holonymy
Antonymy
  Oppositeness in meaning
  A relation between word forms
Meronymy and Holonymy
  Part-whole relation, e.g. a branch is a part of a tree
  X is a meronym of Y if X is a part of Y
  Meronymy is transitive and asymmetric
  Holonymy is the inverse relation of meronymy

8 Troponym and Entailment
Entailment
  {खर्राटा लेना – सोना} (to snore – to sleep)
Troponym
  {लँगड़ाना, कदमताल करना – चलना} (to limp, to march – to walk)
  {फुसफुसाना – बोलना} (to whisper – to speak)

9 Antonymy Relation
Size: छोटा – बड़ा (small – big)
Quality: अच्छा – बुरा (good – bad)
State: रात – दिन (night – day)
Personality: राम – रावण (Rama – Ravana)
Direction: पूर्व – पश्चिम (east – west)
Action: लेना – देना (to take – to give)
Amount: कम – ज्यादा (less – more)
Place: दूर – पास (far – near)
Time: सुबह – शाम (morning – evening)
Gender: बेटा – बेटी (son – daughter)

10 Meronymy Relation
Component-object: माथा – शरीर (forehead – body)
Stuff-object: पत्थर – मूर्ति (stone – statue)
Member-collection: पेड़ – जंगल (tree – forest)
Feature-Activity: भाषण – समारोह (speech – ceremony)
Place-Area: दिल्ली – भारत (Delhi – India)
Phase-State: जवानी – उम्र (youth – age)
Resource-process: कलम – लेखन (pen – writing)
Position-Area: चिकित्सक – चिकित्सा (physician – medicine)

11 Gradation
State: बचपन, जवानी, बुढ़ापा (childhood, youth, old age)
Size: बड़ा, मँझला, छोटा (big, medium, small)
Light: उजाला, धुँधला, अँधेरा (light, dim, dark)
Gender: मर्द, नपुंसक, औरत (man, neuter, woman)
Temperature: गरम, गुनगुना, ठंडा (hot, lukewarm, cold)
Color: गोरा, साँवला, काला (fair, dusky, black)
Time: दिन, गोधूलि, रात (day, dusk, night)
Quality: अच्छा, सामान्य, खराब (good, average, bad)
Action: सोना, ऊँघना, जागना (to sleep, to doze, to be awake)
Manner: तेजी से, मध्यम गति से, धीरे–धीरे (fast, at moderate speed, slowly)

12 Classification of verbs
Simple verbs (सरल क्रिया): सोना, खाना
Conjunct verbs ([...] क्रिया)
Compound verbs (समासिक क्रिया): खाना–पीना
Causative verbs (प्रेरणात्मक क्रिया): सुलवाना

13 WordNet Sub-Graph
[Figure: sub-graph centred on the synset {घर, गृह}, shown with its gloss "मनुष्यों का छाया हुआ वह स्थान जो दीवारों से घेर कर बनाया जाता है" and connected by hypernymy, hyponymy and meronymy links to the nodes {आवास, निवास}, संरचना, अध्ययन कक्ष, शयन कक्ष, रसोईघर, अतिथि गृह, बरामदा, आँगन, आश्रम and झोपड़ी]

14 Design and Implementation
Basic relations or lexical links are between synonym sets
The lexical database is stored in MySQL
Sub-tasks identified:
  Database design
  Data entry interface
  Implementation of the Organizer utility
  Application programs to access and display the information in the lexical database

16 Data Entry Interface
GUI designed in Java/JFC
Separate screens for data entry of the different categories
Automatic generation of synset IDs
Screen to view the entered data

17 Synset Entry Interface

19 Organizer Utility
Designed to preprocess the data
Reflexive pointers are generated automatically, e.g. if A is a hypernym of B, then "B is a hyponym of A" is generated
Each semantic relation is mapped to a separate (normalized) table
Font conversion: Roman Hindi → DV-TTYogesh
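The generation of reflexive pointers can be sketched in a few lines. The slides state that the actual utility is written in Perl; the Python fragment below is only an illustration of the idea, and the triple representation, the `INVERSE` table and the function name are assumptions rather than the authors' code.

```python
# Assumed inverse table for the paired relations named in the slides.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
}


def add_reflexive_pointers(links):
    """Given (source, relation, target) triples, return a set that also
    contains the inverse pointer for every relation that has one."""
    result = set(links)
    for source, relation, target in list(result):
        inverse = INVERSE.get(relation)
        if inverse is not None:
            result.add((target, inverse, source))
    return result
```

With this sketch, entering only ("A", "hypernym", "B") yields the pair of pointers, so a data-entry operator would need to record each relation in one direction only.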

20 Storage Structure
Relation between synsets: tblNounHypernyms
  Columns: Synset_Id, HyperSynset_Id
Relation between word forms: tblNounAntonyms
  Columns: Synset_Id, Synset_Word, Anto_Id, Anto_Word, Anto_Type
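The two tables can be tried out with a small runnable sketch. The table and column names come from the slide; everything else is an assumption: the column types, the primary key, the sample synset IDs, and the use of Python's built-in `sqlite3` module as a stand-in for the MySQL server so that the example runs without any setup.

```python
import sqlite3

# In-memory SQLite database used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblNounHypernyms (      -- relation between synsets
    Synset_Id      INTEGER NOT NULL,
    HyperSynset_Id INTEGER NOT NULL,
    PRIMARY KEY (Synset_Id, HyperSynset_Id)
);
CREATE TABLE tblNounAntonyms (       -- relation between word forms
    Synset_Id   INTEGER NOT NULL,
    Synset_Word TEXT    NOT NULL,
    Anto_Id     INTEGER NOT NULL,
    Anto_Word   TEXT    NOT NULL,
    Anto_Type   TEXT    NOT NULL
);
""")

# Hypothetical sample rows; the ids are invented for illustration.
conn.execute("INSERT INTO tblNounHypernyms VALUES (?, ?)", (101, 100))
conn.execute(
    "INSERT INTO tblNounAntonyms VALUES (?, ?, ?, ?, ?)",
    (201, "रात", 202, "दिन", "State"),  # night / day
)


def hypernyms_of(synset_id):
    """Return the ids of the direct hypernym synsets of a synset."""
    rows = conn.execute(
        "SELECT HyperSynset_Id FROM tblNounHypernyms WHERE Synset_Id = ?",
        (synset_id,),
    )
    return [row[0] for row in rows]


def antonyms_of(word):
    """Return (antonym word, antonymy type) pairs for a word form."""
    rows = conn.execute(
        "SELECT Anto_Word, Anto_Type FROM tblNounAntonyms WHERE Synset_Word = ?",
        (word,),
    )
    return rows.fetchall()
```

The hypernym table stores only synset IDs because hypernymy holds between synsets, whereas the antonym table also stores the word forms themselves because antonymy holds between individual words.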

24 System Statistics
Over 8500 synsets entered in the database
MySQL used as the back-end database server
Data entry interface designed in Java/JFC
Organizer utility written in Perl
Web-based data retrieval system developed in HTML and PHP
DV-TTYogesh font used to display Hindi text

25 Application of WordNet
Word sense disambiguation
Interface to Internet search engines
Text classification
Information retrieval systems
Document similarity

26 Conclusion
The structure of the Hindi language has been studied, and new features have been introduced in the Hindi WordNet
Currently over 8500 synsets have been inserted into the database
The MySQL database has been found to be quite efficient
The web interface for querying the lexical database is under continuous evolution

