Proposed Vedic Sanskrit Coding Scheme: Some Suggestions. Akshar Bharati, Amba Kulkarni, Department of Sanskrit Studies, University of Hyderabad, Hyderabad



Nature of Indian Scripts:
क = क् + अ
का = क् + आ
क्षे = क् + ष् + ए
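The expansions on this slide can be reproduced mechanically. The following is a minimal sketch, assuming the text is stored as Unicode Devanagari; the function name `to_phonemes` and the vowel table (a subset, for illustration) are hypothetical and not part of the proposal.

```python
VIRAMA = "\u094D"  # the halant sign, which removes the inherent vowel

# Dependent vowel signs (matras) mapped to the independent vowels they stand for.
# Illustrative subset only.
MATRA_TO_VOWEL = {
    "\u093E": "आ", "\u093F": "इ", "\u0940": "ई", "\u0941": "उ", "\u0942": "ऊ",
    "\u0943": "ऋ", "\u0947": "ए", "\u0948": "ऐ", "\u094B": "ओ", "\u094C": "औ",
}


def is_consonant(ch):
    """True for the Devanagari consonants U+0915 (ka) to U+0939 (ha)."""
    return "\u0915" <= ch <= "\u0939"


def to_phonemes(text):
    """Expand script text into pure consonants (with virama) and full vowels."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if is_consonant(ch):
            out.append(ch + VIRAMA)               # the pure consonant
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == VIRAMA:                     # cluster: no vowel follows yet
                i += 2
            elif nxt in MATRA_TO_VOWEL:           # explicit vowel sign
                out.append(MATRA_TO_VOWEL[nxt])
                i += 2
            else:                                 # bare consonant: inherent vowel a
                out.append("अ")
                i += 1
        else:
            out.append(ch)                        # vowels and other signs pass through
            i += 1
    return out


print(to_phonemes("क"))     # pure consonant k followed by the vowel a
print(to_phonemes("क्षे"))   # k, sh, then the vowel e
```

Characters outside the consonant range (independent vowels, anusvara, visarga) are passed through unchanged in this sketch.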

Salient Features of Indian Scripts
– Because 'अ' is the most frequent vowel in a syllable, the basic consonant symbols are taken to carry an inherent 'अ'.
– To write syllables compactly, the secondary vowel signs ('matras') were presumably developed.

Salient Features of Indian Scripts (contd.)
– A syllable is a vowel optionally preceded by one or more consonants, so every syllable must contain a vowel.
– Orthographically, the symbol for the vowel may appear to the left, right, top, or bottom of the consonant cluster.

Salient Features of Indian Scripts (contd.)
– In 'वुद्धि', for example, the vowel indicator ि is written before the consonant cluster द्ध but is pronounced after it.
– Relation between vowel and vowel indicator: क + ् + इ = क + ि, or ि = ् + इ
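The relation ि = ् + इ can be read as a rewriting rule: a vowel that follows a pure consonant replaces the virama with the matching matra. Here is a minimal sketch of that rule, assuming phonemes are given as a list of strings; `to_script` and the vowel table (a subset) are hypothetical names used only for illustration.

```python
VIRAMA = "\u094D"  # the halant sign

# Independent vowels mapped to their dependent signs (matras). Illustrative subset.
# The vowel a has no sign of its own: it is inherent in the bare consonant.
VOWEL_TO_MATRA = {
    "आ": "\u093E", "इ": "\u093F", "ई": "\u0940", "उ": "\u0941", "ऊ": "\u0942",
    "ऋ": "\u0943", "ए": "\u0947", "ऐ": "\u0948", "ओ": "\u094B", "औ": "\u094C",
}


def to_script(phonemes):
    """Compose a phoneme list back into conventional script order."""
    out = []
    for p in phonemes:
        is_vowel = p == "अ" or p in VOWEL_TO_MATRA
        if out and out[-1].endswith(VIRAMA) and is_vowel:
            # Drop the virama and attach the matra (nothing at all for the vowel a).
            out[-1] = out[-1][:-1] + VOWEL_TO_MATRA.get(p, "")
        else:
            out.append(p)
    return "".join(out)


print(to_script(["क्", "इ"]))        # ka + i-matra
print(to_script(["क्", "ष्", "ए"]))   # the cluster k-sh with e-matra
```

Note that the composed string stores the matra after the consonant, in pronunciation order; placing ि to the left of the cluster is left to the rendering engine.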

Salient Features of Indian Scripts (contd.)
– In a consonant cluster, the consonants are written either from top to bottom or from left to right.
– Indian language scripts are syllabic in nature, but by tradition they fall back on the alphabetic expansion.
– Syllables are compositional in nature.

ISCII-91: Problems
– ISCII has separate codes for the matras (vowel indicators) and for the vowels themselves.
– The vowel-indicator codes are therefore redundant.
– It is not well suited to sandhi, search engines, etc.
– A rendering engine is necessary in any case.

Unicode: Problems
Based on ISCII-91, hence the worst of both worlds:
a) Unity among Indian scripts is lost (separate pages for the different language scripts).
b) Storage space increases.
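The storage point can be checked with a quick byte count. This is a small sketch, not a measurement from the presentation: it relies on the fact that code points in the Devanagari block (U+0900 to U+097F) take three bytes each in UTF-8 and two in UTF-16, whereas an 8-bit code such as ISCII uses one byte per code. The helper name `storage_bytes` is hypothetical.

```python
def storage_bytes(text):
    """Report how many code points and encoded bytes a string occupies."""
    return {
        "code_points": len(text),                 # one byte each in an 8-bit code
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),   # little-endian, no byte-order mark
    }


sample = "\u0915\u094D\u0937\u0947"  # the syllable k-sh-e: 4 code points
print(storage_bytes(sample))
```

For this four-code-point syllable the count is 12 bytes in UTF-8 and 8 bytes in UTF-16.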

Advantages of the Proposed Scheme
– Redundancy in vowel indicators is eliminated.
– Rules for linguistic analysis follow the Shastric texts.
– The proposed InPa will be an important step towards evolving an IPA based on Devanagari/Indian scripts.

Phoneme Versus Syllables
– Sandhi
– Morphological analysis
– Sorting
– Searching
– Display
– Meter (Chanda) analysis
– Speech processing
– Storage (less space)
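To illustrate why sandhi is listed here, the sketch below applies three standard vowel-sandhi rules directly to phoneme lists. It assumes words are already stored as sequences of pure consonants and full vowels; the names `VOWEL_SANDHI` and `join_words` are hypothetical, and the rule table is far from complete.

```python
# (last vowel of left word, first vowel of right word) -> merged vowel.
# Three textbook rules only; a real system would need the full rule set.
VOWEL_SANDHI = {
    ("अ", "अ"): "आ",
    ("अ", "इ"): "ए",
    ("अ", "उ"): "ओ",
}


def join_words(left, right):
    """Join two phoneme lists, merging the boundary vowels when a rule applies."""
    if left and right:
        merged = VOWEL_SANDHI.get((left[-1], right[0]))
        if merged is not None:
            return left[:-1] + [merged] + right[1:]
    return left + right


deva = ["द्", "ए", "व्", "अ"]          # deva
indra = ["इ", "न्", "द्", "र्", "अ"]    # indra
print(join_words(deva, indra))         # phonemes of devendra
```

With a syllable-based code, the final अ of the first word is not a separate stored unit, so the same rule would have to inspect and rewrite consonant-plus-matra combinations instead of doing a single lookup.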

Further Suggestions
– Treat all Indian language scripts as different fonts of one underlying script, the 'Indian Script'.
– Pool a space of 10 pages of 128 codes each (1280 codes in all), enough to store 1000 frequent syllables together with the basic phonemes.
– Define the UTF-8 encoding in terms of relative addresses instead of absolute addresses.
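The slide does not spell out how relative addressing would work. One possible reading, sketched below purely as an assumption, is to store each code as its offset from the start of the pooled space and then apply the ordinary UTF-8 byte layout to that offset: offsets below 128 then need one byte and all 1280 offsets fit in at most two. `BASE`, `POOL_SIZE`, `encode_relative`, and `decode_relative` are hypothetical, and the sketch ignores how such text would be mixed with ASCII or how the base would be signalled.

```python
BASE = 0x0900          # assumption: the pool starts at the Devanagari block
POOL_SIZE = 10 * 128   # 10 pages of 128 codes each = 1280 codes


def encode_relative(text, base=BASE):
    """Encode each character as the UTF-8 form of its offset from `base`."""
    out = bytearray()
    for ch in text:
        offset = ord(ch) - base
        if not 0 <= offset < POOL_SIZE:
            raise ValueError(f"U+{ord(ch):04X} lies outside the pooled space")
        out += chr(offset).encode("utf-8")   # 1 byte below 128, otherwise 2
    return bytes(out)


def decode_relative(data, base=BASE):
    """Invert encode_relative by adding the base back to every offset."""
    return "".join(chr(ord(ch) + base) for ch in data.decode("utf-8"))


sample = "\u0915\u094D\u0937\u0947"          # 4 code points
packed = encode_relative(sample)
print(len(sample.encode("utf-8")), len(packed))
print(decode_relative(packed) == sample)
```

Under these assumptions the four-code-point sample shrinks from 12 bytes in absolute UTF-8 to 4 bytes, which is the kind of saving the suggestion appears to aim at.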

Conclusion

– The proposed phoneme-based scheme is better than both ISCII-91 and Unicode.
– To preserve the unity of Indian scripts and save on storage space: treat Indian scripts as different fonts, pool the space for all Indian languages to store frequent syllables, and use relative offsets for UTF-8.