In association with CIIL-Mysore, IIT-Mumbai, IIIT-Hyderabad: 1st International Conference.

1 In association with CIIL-Mysore, IIT-Mumbai, IIIT-Hyderabad: 1st International Conference

2 Words unite people. Words can also divide nations: they indulge in wars of words. Word-smiths fashion texts. Word-mongers talk nineteen to the dozen. Word-lords don't tell you that they double-speak. Word-poets open up the inner abyss of the lanes and by-lanes of meaning. And so do WordNets, which is why we are all here!

3 Welcome to the 1st Global WordNet Conference. MY ADDRESS HAS TWO PARTS. First, I shall tell you a little about what the Indian linguistic scene is like, and what we at CIIL have been doing. Then, we will offer our suggestions on what we in India could do in WordNet.

4 CENTRAL INSTITUTE OF INDIAN LANGUAGES (Bharatiya Bhasha Sansthan), Department of Education, Government of India: Initiatives in LANGUAGE TECHNOLOGY

5 CIIL in the first three decades: Equipping language teachers and analysts technologically

6 1. An Apex Institution under the Languages Division, MHRD. CIIL completed 32 years in July 2001. This 287-person institution works for the development of Indian languages. CIIL has five Centers with 16 Research Groups and 6 Service Groups. Its 7 Regional Language Centers are at Bhubaneswar, Guwahati, Lucknow, Mysore, Patiala, Pune and Solan.

7 2. Four Main Objectives. CIIL: 1. develops languages by creating content, corpora, techniques and technologies; 2. protects and documents minority and tribal languages; 3. creates linguistic harmony by teaching 15 Indian languages to non-native learners; 4. above all, advises both Central and State governments on matters related to language.

8 3. Functionality and Multidisciplinarity. Although the mainstay is Indian languages and linguistics, the focus of all projects and programmes is on developing materials and products in print, audio, video and computational form. In addition, there is considerable interest in Comparative Literature, Education, Language Technology and NLP, Folklore, Geography, Statistics, Psychology, Sociology and Translation.

9 4. Coverage of CIIL: sizable. Archived data on 118 languages; creating voice corpora; studied 80 tribal languages; 35 grammars to go online soon; published 490 books; cassette courses in Assamese, Urdu, Bengali, Kashmiri and Marathi; radio courses in Hindi through Kannada.

10 5. Major Publications: 490+ books, all produced in-house, including 22 grammars; 30 intensive courses; 24 second-language textbooks; 5 common vocabularies; 18 dictionaries; 49 Apni Boli (KVS); 15 pictorial glossaries; 16 literacy books; 12 folklore volumes; 9 bibliographies; 12 rhymes/language games; 16 proceedings.

11 6. The Challenge before CIIL: Enormous

12 A truly plural world of languages: 1,576 rationalized mother tongues; 1,796 other mother tongues; 114 languages with 10,000+ speakers; large variation, from Hindi (337 million speakers) to Maram of Manipur (10,144 speakers); large non-scheduled languages such as Bhili (6 million) and Santali (5 million); 146 radio languages, 69 school languages and dailies in 35 languages.

13 7. Programs: Modes of Delivery. 10-month L2 teaching (8,000 teachers trained); distance courses in Tamil/Telugu/Bengali/Urdu; online programs in 15 Indian languages; Kannada for officials in Karnataka; radio courses in collaboration with AIR; 3-month courses in communication; orientation for mother-tongue teachers; refresher courses in linguistics; NLP training modules.

14 8. Language Technology: Further Goals. Enlargement of the 3-million-word corpora; 100-million-word corpora for Hindi-Urdu; multilingual, multidirectional e-dictionaries; online administrative glossaries; lexical databases for MT programs; tagging and corpus tools; e-zines and e-journals; language information services; Anukriti: web-based translation services.

15 9. Indian Languages & IT at CIIL: 132-node LAN set up; V-SAT through STPI; browsing centre with 2,400 e-journals and 350 paper journals; collaborating with Schoolnet on electronic materials; new-generation language labs, focusing on visual phonetics.

16 10. LIS-India Website. Type Language Name / Type Area Name. Home or General Information. Language/Area Profile: Geolinguistic; Sociolinguistic; Cultural; Literary. Language/Area History: Genealogical; Archaeological; Cultural; Textual. Language Vitality: Attitudinal; Utilitarian; Socio-political; Referential. Grammatical Information: Phonetic; Graphemic; Phonological; Morphological; Lexical; Syntactic; Semantic; Stylistic. Bibliographic search.

17 11. Anukriti: a translation service with NBT/SA. A web-based service site called Anukriti, to be maintained with NBT/Sahitya Akademi. E-journals. Technological tools: electronic lexicon; corpus and tools; parallel corpora; cultural glossaries; thesauri; word finders; WordNets.

18 12. Bhasha Bharati Project. This rich manuscriptorium will display the plural literary and linguistic landscape of India. It is to be set up in collaboration with Sahitya Akademi; Sangeet Natak Akademi; All India Radio; Doordarshan; National Library; National Archive; National Book Trust; major TV channels; Films Division; major newspaper houses; numerous foundations; individual writers; heirs of writers; personal libraries; little magazines.

19 13. Doctoral Programs under Planning. Already available through 22 universities: Linguistics and Psychology. Now being planned: NLP; Folklore/Communication; Translation; Indian Grammatical Tradition.

20 14. Future Programs: Diploma in Experimental Phonetics; Master's by Research in Field Linguistics; courses in Statistical Linguistics; Diploma in Translation Studies; Diploma in Folklore/Comparative Literature and Semiotics; internship in Linguistic Geography; internship in NLP and Corpus Linguistics.

21 WHAT COULD WE DO TO CREATE AN

22 India has long had a strong lexicographical tradition. Working on WordNet, therefore, should come naturally to us. Efforts have already begun, as we see in Hindi, Tamil, Oriya and a few other languages. There does not seem to be any academic coordination, however. Early 20th-century Indian linguistics was dominated by studies of sound systems and etymologies; mid-20th-century work focussed on word-formation patterns; late 20th-century work emphasized syntax.

23 We haven't so far worked seriously on lexical semantics. While sociolinguistics was a favourite, serious psycholinguistics was almost absent. Formal syntax was highly valued, but the intricacies of semantics were not so attractive. Dictionary-making continued throughout, but major concerted efforts in each language were highly individualistic or had happened long ago. While writing or applying software means money, and is hence a crowded field, language technology has so far been neglected.

24 So, what do we need to do now? Create an Indian WordNet Association. Work in a coordinated way. Remember to focus on areal semantic features: with so much linguistic and cultural diversity, India is ideal for testing and validating the concept of WordNet.
