Language technology @ciil
Prof. Udaya Narayan Singh, Director
Central Institute of Indian Languages: set up on July 17, 1969; located in Mysore, Karnataka.
Overall Structure: functions under the Department of Secondary & Higher Education, Ministry of Human Resource Development; guided by a Governing Committee chaired by the Hon’ble HRM; headed by a Director; assisted by seven Deputy Directors; supported by seven Principals of RLCs; administered with the help of an Assistant Director (Administration).
Main Objectives: advises and assists both Central and State Governments in matters of language; promotes all Indian languages by creating content and corpora; protects and documents minor, minority and tribal languages.
CCCK program for officials in Karnataka; radio courses in Hindi for listeners; offers 3-month courses in communication; orientation courses for mother-tongue teachers; refresher courses under Academic Staff College; organizes more than 100 international and national seminars/workshops.
Regional Language Centres: promote linguistic harmony by teaching 15 Indian languages to non-native learners; 10-month L2 teaching, with 8000 teachers trained; National Integration Camps and refresher courses; distance courses in Tamil/Telugu/Bengali/Urdu.
Originally only four RLCs were conceived, in the four corners of India, with the following aims: NRLC at Patiala to handle Kashmiri, Urdu and Panjabi; SRLC, Mysore to handle all four Dravidian languages; WRLC at Pune to handle Marathi, Sindhi and Gujarati; ERLC to handle Oriya, Bengali and Assamese. Two more were added later: UTRC at Solan in 1973 and UTRC at Lucknow in 1981. The latest addition is the NERLC at Gauhati, in 1999.
Human Resource: Language Specialists 88; Information Scientists 12; Hardware Persons 05; Software Persons 21; Engineers/LLTs 07; Supporting Staff 125.
Own printing press with all the facilities. Published 515 books: 22 Grammars; 30 Intensive Courses; 24 L2-Textbooks; 5 Common Vocab.; 18 Dictionaries; 49 Apni Boli (for KVS); 15 Pict. Glossaries; 16 Literacy; 12 Folklore; 12 Rhymes/Lg-Games; 18 Proceedings; 9 Bibliographies, etc.
Some other achievements: archived data of 118 languages; studied 80 tribal/border languages; cassette courses in four languages; Kashmiri on the net; radio courses in Hindi through Kannada.
Hardware: 150-node LAN set up at CIIL and separate 10-node LANs at NRLC and ERLC; Itanium web server and database server at CIIL for launching sites; high-speed V-SAT connection through STPI; analog audiotick computerized lab at SRLC and ERLC; digital audiotick computerized labs at NRLC; 2400 electronic journals acquired for CIIL & RLCs; browsing section in the library.
Web based language resources: Spoken language corpus. The Speech Science lab has the following hardware and software: Computerized Speech Lab, Model 4100, developed by Kay Elemetrics Corp., Lincoln Park, N.J. 07035-1488. Software (dependent on CSL hardware):
1. Computerized Speech Lab Main Programme, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Multi-Dimensional Voice Program, Model 5105, Version 2.5.2
5. Multi-Dimensional Voice Program Advanced, Model 5105, Version 2.5.2
6. Real-Time Pitch, Model 5121, Version 2.5.2
7. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
Software (without any hardware dependency):
1. Multi-Speech Signal Analysis Workstation, Model 3700, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Real-Time Pitch, Model 5121, Version 2.5.2
5. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
CD-ROM: Speech Production and Perception (CD-ROM developed by Sensimetrics)
Branches of study in Speech Science: Articulatory Phonetics; Experimental Phonetics; Biological & Clinical Linguistics; Speech Technology; Forensic Phonetics.
Phonetic Readers: Angami, Ao-Naga, Balti, Bengali, Brokskat, Gojri, Gujarati, Kashmiri, Khasi, Kota, Kurux, Kuvi, Ladakhi, Lotha, Manipuri, Mishmi, Mundari, Sema, Shina, Tangkhul-Naga, Thaadou, Tripuri.
Major Events: International institute of phonetics; Seminar-cum-Workshop on Voice Modulation and Culture; Workshop on Aspiration; Seminar on Voice Quality; Workshop on Nasalization; Workshop on Multilingual Speech Analysis and Synthesis; Instrumental Analysis of Phonetic Features across Major Indian Languages; Analysis of Retroflex Sounds, etc.
Training/orientation programmes in phonetics for teachers from Haryana, Himachal Pradesh, Jammu & Kashmir, Madhya Pradesh, Rajasthan, Tamil Nadu, Uttar Pradesh, Arunachal Pradesh and Bihar.
www.ciil-spokencorpus.net
Web based language resources:
Web based Indian Languages Grammars: http://www.ciilgrammars.org
Text corpora in major and minor Indian languages: http://www.ciilcorpora.net
Web based Indian Language Courses: http://www.bangla-online.info/
Web based books and journals: http://www.ciil-ebooks.net/
Web based Translation services: http://www.anukriti.net/
In collaboration with Sahitya Akademi & NBT. Electronic journal, Translation Today, and tools for translation: electronic dictionaries; annotated corpus & tools; parallel corpora; translational dictionaries; cultural glossaries; thesauri; word finders; technical terminologies.
Linguistic Data Consortium for Indian Languages (LDC-IL): takes advantage of the giant strides in Information Technology. Model: the Linguistic Data Consortium (LDC) hosted by the University of Pennsylvania, USA. Budget: one crore per year and ten crore for ten years. Funds: provided by the Ministry of Human Resource Development. Preliminary discussion was held at the International Workshop on Creation of Linguistic Data Consortium for Indian Languages on August 16-17, 2003. A meeting of the lead institutions to create LDC-IL was held on August 18, 2003 at IISc, Bangalore.
LDC-IL will focus on: Becoming a repository of linguistic resources in all Indian languages in the form of text, speech and lexical corpora. Facilitating creation of such databases by different member organizations. Setting standards for data collection and storage of corpora for different research and development activities. Supporting development and sharing of tools for data collection and management.
Facilitating training through workshops, seminars, etc. in technical as well as process-related issues. Creating and maintaining the LDC-IL website that would be the primary gateway for accessing LDC-IL resources. Designing or providing help in creation of appropriate language technology for mass use. Providing the necessary linkages between academic institutions, individual researchers and the masses.
Major areas of languages covered: speech corpora; handwritten corpora; text corpora, including parallel corpora; Natural Language Processing; several by-products like lexicon, thesauri, etc.
Participating Institutions: Indian Institute of Science, Bangalore; Indian Institute of Technology, Bombay; Indian Institute of Technology, Madras; International Institute of Information Technology, Hyderabad; ISI Calcutta; TIFR Mumbai; HP Labs India; BM; C-DOT; C-DAC; Tata InfoTech; all other IITs; KHS; NCPUL; Rashtriya Sanskrit Sansthan; TDIL, MIT.
All academic institutes, research organizations and corporate R&D groups from India and abroad working on Indian languages will be encouraged to participate in LDC-IL: different Indian universities with major departments of Linguistics and Computer Science/Artificial Intelligence.
Web Based Language Information Services
General Information
Language/Area Profile: Geolinguistic; Sociolinguistic; Cultural; Literary
Language/Area History: Genealogical; Archaeological; Cultural; Textual
Language Vitality: Attitudinal; Utilitarian; Socio-political; Referential
Grammatical Information: Phonetic; Graphemic; Phonological; Morphological; Lexical; Syntactic; Semantic; Stylistic
Biblio search
Link to LIS site
Website for Modern Indian Literary Classics in Translation: in collaboration with Sahitya Akademi and NBT. To promote the celebrated Indian fiction writers of the last 150 years, both within the country and abroad, through a series of initiatives. A library of 100 major contemporary works of fiction in English and several other European languages.
Digital Library and Manuscriptorium: special library with linguistics and allied disciplines as focus; over 65000 books; subscription to over 270 journals; subscription to 4200 online journals; back volumes of all the journals; 7 RLC libraries with collections in Indian languages; CDs (worth 50 lakhs) in Indian languages in digital form; library automation through the VTLS package.
Bhasha Bharati will have display galleries as well as scanned copies of writings; audio and video tapes of interviews; lecture notes and recordings; the writers' own as well as professional recitations; films, tele-films and serials; documentaries.
Bhasha Bharati will also house and create hypertexts of Indian-language classics. It will provide a service to common people, who may visit either in person or virtually and seek answers to their questions and queries. It will handle questions on different topics, ranging from the knowledge and interpretation of a literary or religious text to information on a speech group or even on a word or an expression.
Web based information on Indian Scripts: Linguistic Integration Project of India. Aim: LIPIKA will promote greater understanding among Indian people, produce useful learning materials, and create web-based information. LIPIKA will show the unity in India's apparently diverse writing systems. LIPIKA will also help generate software with necessary tools like spell-checkers and grammar checkers.
Task 1: Preparation of a brief history of various writing systems of India, such as Brahmi, Kharosthi, etc.; a learners' manual (aimed at both foreigners and Indians) on the structure of syllabic writing systems as prevalent in India, including a comparison of apparently divergent scripts used by Indian languages today.
Task 2: (a) Preparation of a CD/video version of the learners' manual, based on the expertise of C-DAC/NCST/CIIL; (b) placing the learning software in the public domain, for propagation of Indian writing systems.
Task 3: (a) Creation of new fonts and images in respect of Devanagari and a few other major Indian writing systems through a series of workshops with (i) calligraphists, (ii) print-making experts, (iii) computer experts, (iv) creative persons.
Some of the important collaborators of CIIL: all IITs; IIIT Hyderabad; IISc.; Government of Karnataka; Andaman & Nicobar Administration; Government of Singapore; Lancaster University; SASNET; SIDA; MGI-CIIL from Mauritius; SchoolNet; NCPUL; HP Labs; NSOU; University of Hyderabad; NEHU; Delhi Univ.; NBT; Sahitya Akademi; Konkani Academy; Dogri Sansthan; Karnataka Nataka Rangayana; CHD; and many more.
Director’s Speech