ADDITION OF IPA TRANSCRIPTION TO THE BELARUSIAN NOOJ MODULE


Hanna Stanislavenka, Stanislau Lysy, Yuras Hetsevich. Speech Recognition and Speech Synthesis Laboratory, United Institute of Informatics Problems (UIIP), National Academy of Sciences of Belarus.

Good afternoon, ladies and gentlemen! My name is Hanna, and I am here on behalf of our NooJ team from UIIP in Belarus. Today I have the honor of presenting our report, "Addition of IPA Transcription to the Belarusian NooJ Module".

NooJ Levels

NooJ allows linguists to formalize several levels of linguistic phenomena:
- typography and spelling;
- syllabification, phonemic and prosodic transcription;
- lexicons of simple words, multiword units, and discontinuous expressions;
- inflectional, derivational and agglutinative morphology;
- local and structural syntax;
- transformational syntax and paraphrase generation;
- semantic analysis and machine translation.
We work mainly with phonetics, and this report is dedicated to that level of NooJ.

Previous research

Before I proceed to our new results, I would like to summarize our previous research, which was presented at the NooJ 2015 conference:
- A system for generating a dictionary in NooJ format with four types of phonetic transcription was created.
- The NooJ dictionary was successfully compiled and tested. In 2015 it contained 46,384 first forms of nouns.
- Work began on a morphological NooJ grammar that creates a phonetic transcription for orthographic words.

The Problem

What problem appeared, and what did we need to solve? Some transcription formats, including the international IPA format, mark stress not on the stressed vowel (as is traditional in Belarus) but before the stressed syllable.

Illustration of the Problem in NooJ format

WE HAVE: сакаляня,NOUN +TranscriptionCyr=[сакал'ан'а́] +TranscriptionIPA=[sakalʲanʲˈa]
MUST BE: сакаляня,NOUN +TranscriptionIPA=[sakalʲaˈnʲa]

On this slide you can see an illustration of the problem we faced. IPA transcription marks stress not on the vowel but before the stressed syllable, so the stress mark has to stand before the whole syllable [nʲa] and not directly before the vowel [a].
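The correction illustrated on this slide can be sketched in Python. This is a minimal sketch, not the authors' implementation: it assumes the transcription has already been split into syllables with a `|` border, and the function name and the border symbol are hypothetical. It simply moves the IPA stress mark to the start of the syllable that contains it.

```python
STRESS = "\u02c8"  # IPA primary stress mark (ˈ)


def move_stress_to_syllable_start(syllabified: str, border: str = "|") -> str:
    """Move each stress mark to the beginning of its syllable.

    `syllabified` is an IPA string whose syllables are separated by
    `border` (an assumption of this sketch). The borders are removed
    from the returned string.
    """
    fixed = []
    for syllable in syllabified.split(border):
        if STRESS in syllable:
            # Drop the mark from its old position and put it first.
            syllable = STRESS + syllable.replace(STRESS, "")
        fixed.append(syllable)
    return "".join(fixed)


if __name__ == "__main__":
    # Stress was marked before the vowel; it ends up before the syllable.
    print(move_stress_to_syllable_start("sa|ka|lʲa|nʲˈa"))
```

The sketch makes the dependency explicit: the stress mark can only be placed correctly once the syllable borders are known, which is why syllabification is needed.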

Goals

The main goal of our work is to correct the IPA transcriptions in NooJ, which is necessary for working with the phonetic level in the Belarusian NooJ module. We divide the solution into three steps:
1. Develop a syllabification algorithm for generating IPA transcriptions of orthographic words in Belarusian.
2. Improve our tool for generating phonetic transcriptions of orthographic words in Belarusian.
3. Create a dictionary in NooJ format that includes correct IPA transcriptions for nouns and verbs.
In our previous research we worked only with nouns. This year we compiled dictionaries for both nouns and verbs.

International Phonetic Alphabet

The International Phonetic Alphabet (IPA) is an alphabetic system of phonetic notation based primarily on the Latin alphabet. It was devised by the International Phonetic Association as a standardized representation of the sounds of spoken language. The IPA is used by lexicographers, foreign language students and teachers, linguists, speech-language pathologists, singers, actors, constructed language creators, and translators. Unfortunately, we did not have any IPA dictionaries for Belarusian. Cyrillic transcription is used in teaching, and it is hard to understand for people who use the Latin alphabet.

Algorithm of syllabification

This is the first step: developing a syllabification algorithm for the Belarusian language.

I. Rules for syllabification

'aa' => 'a|a'
'aka' => 'a|ka'
'ama' => 'a|ma'
'akka' => 'a|kka'
'akma' => 'a|kma'
'amka' => 'am|ka'
'amma' => 'am|ma'
'akkka' => 'a|kkka'
'akkma' => 'a|kkma'
'akmka' => 'akm|ka'
'akmma' => 'a|kmma'
'amkka' => 'am|kka'
'amkma' => 'am|kma'
'ammka' => 'amm|ka'
'ammma' => 'am|mma'
… (fewer than 20 rules in total)

Here a stands for vowel phonemes, k for obstruent consonant phonemes, m for sonorant consonant phonemes, and | for a syllable border.

To create the algorithm, we wrote syllabification rules like these. For example, if there are three consonant phonemes between two vowels, the first an obstruent, the second a sonorant, and the third an obstruent again ('akmka'), the first two go into one syllable and the syllable border is placed before the last obstruent ('akm|ka'). Let me show you an example.
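One way to apply such a rule table is sketched below in Python. It is a sketch under stated assumptions, not the authors' code: the `RULES` dictionary contains only the fifteen rules listed on the slide (the elided ones are not guessed), the function name `syllabify` is hypothetical, and consonants before the first vowel or after the last vowel are assumed to stay in the first or last syllable.

```python
# Rules exactly as listed on the slide; the slide elides the rest ("…"),
# so any pattern not covered here raises an error instead of being guessed.
RULES = {
    "aa": "a|a", "aka": "a|ka", "ama": "a|ma",
    "akka": "a|kka", "akma": "a|kma", "amka": "am|ka", "amma": "am|ma",
    "akkka": "a|kkka", "akkma": "a|kkma", "akmka": "akm|ka",
    "akmma": "a|kmma", "amkka": "am|kka", "amkma": "am|kma",
    "ammka": "amm|ka", "ammma": "am|mma",
}


def syllabify(pattern: str) -> str:
    """Insert '|' borders into a string over the classes a, k, m."""
    vowels = [i for i, ch in enumerate(pattern) if ch == "a"]
    if not vowels:
        return pattern  # no vowel, so no border to place
    # Assumption: consonants before the first vowel join the first syllable.
    result = pattern[:vowels[0]]
    for left, right in zip(vowels, vowels[1:]):
        window = pattern[left:right + 1]  # vowel + cluster + vowel
        if window not in RULES:
            raise ValueError(f"no rule listed for {window!r}")
        # Take everything but the final vowel; the next step supplies it.
        result += RULES[window][:-1]
    # The last vowel and any trailing consonants close the last syllable.
    return result + pattern[vowels[-1]:]


if __name__ == "__main__":
    print(syllabify("kakkmakmak"))
```

Looking up each vowel-to-vowel window separately keeps the table small: every border decision depends only on the consonant cluster between two neighbouring vowels.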

Example ("кастрычнік", Eng. "October")

The word кастрычнік (October) in allophonic format:
K004,A232,S002,T002,R022,Y022,CH002,N'004,I342,K000
Using the phoneme classes from our syllabification rules, the algorithm transforms it into the following 'word':
kakkmakmak
Then the syllable borders are inserted:
ka|kkma|kmak
Finally, we again have the word in allophonic format, now with syllable borders:
K004,A232,>,S002,T002,R022,Y022,>,CH002,N'004,I342,K000
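The two conversions in this example, from allophone codes to the class pattern and from the syllabified pattern back to allophone codes with `>` borders, can be sketched in Python. The sketch rests on assumptions: the sets `VOWELS` and `SONORANTS` cover only the phonemes of this one word, every other phoneme is treated as an obstruent, the numeric suffix of each code is assumed irrelevant for classification, and all function names are hypothetical.

```python
import re

# Assumption: only the phonemes that occur in the example are classified.
VOWELS = {"A", "Y", "I"}
SONORANTS = {"R", "N'"}


def phoneme_class(allophone: str) -> str:
    """Map an allophone code such as "N'004" to a, m, or k."""
    base = re.sub(r"\d+$", "", allophone)  # strip the numeric suffix
    if base in VOWELS:
        return "a"
    if base in SONORANTS:
        return "m"
    return "k"  # assumption: everything else is an obstruent


def to_pattern(allophones: list[str]) -> str:
    """Build the class 'word' used by the syllabification rules."""
    return "".join(phoneme_class(a) for a in allophones)


def insert_borders(allophones: list[str], syllabified: str) -> list[str]:
    """Copy '|' borders from the pattern into the allophone list as '>'."""
    codes = iter(allophones)
    return [">" if ch == "|" else next(codes) for ch in syllabified]


if __name__ == "__main__":
    word = "K004,A232,S002,T002,R022,Y022,CH002,N'004,I342,K000".split(",")
    print(to_pattern(word))
    print(",".join(insert_borders(word, "ka|kkma|kmak")))
```

Because the pattern has exactly one class letter per allophone, the borders can be copied back by walking both sequences in parallel.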

I. Tool for syllabification

The algorithm was tested with a special online tool that handles not only single words but also whole texts.

II. Orthoepic Dictionary Generator

The algorithm was built into the online service Orthoepic Dictionary Generator, which was presented last year. Here you can see that we chose the option «First word processing in NooJ format». It means that the first word in each line of the input text is processed, and we get a transcription of that word written in NooJ format.

II. Correct transcriptions in NooJ format

After processing, we get the following material, from which we create the dictionary for NooJ. You can see that there are no more mistakes in stress placement.

III. Dictionary in NooJ: ~49 000 nouns

With the help of the "Orthoepic Dictionary" service we can create NooJ dictionaries with many entries. In this way we made one dictionary of nouns and one of verbs. Here you can see the dictionary of nouns. It contains more than 40 000 entries, and two types of transcription are presented for each entry. This one is the IPA transcription.
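A dictionary line of this kind can be assembled with a few lines of Python. This is only an illustrative sketch: the function name `nooj_entry` is hypothetical, and the layout (lemma, comma, category, then space-separated `+Name=[value]` features) copies the entries as they appear on the slides rather than any formal NooJ specification.

```python
def nooj_entry(lemma: str, category: str, transcriptions: dict[str, str]) -> str:
    """Format one dictionary line in the layout shown on the slides.

    `transcriptions` maps a feature name such as "TranscriptionIPA"
    to the transcription string without the surrounding brackets.
    """
    features = " ".join(
        f"+{name}=[{value}]" for name, value in transcriptions.items()
    )
    return f"{lemma},{category} {features}"


if __name__ == "__main__":
    print(nooj_entry("сакаляня", "NOUN", {"TranscriptionIPA": "sakalʲaˈnʲa"}))
```

Keeping the transcriptions in a dictionary means a further transcription type only adds one more key and does not change the formatting code.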

III. Dictionary in NooJ: ~33 000 verbs

This year we also compiled a dictionary of verbs. It contains more than 30 000 entries and, likewise, two types of transcription: Cyrillic and IPA. With the help of the IPA transcription, anyone can read a word in Belarusian.

Text Annotations: nouns

This slide shows text annotation with transcriptions for nouns.

Text Annotations: verbs

This slide shows text annotation with transcriptions for verbs.

Conclusion

To sum up, I would like to underline the following results:
- A syllabification algorithm for generating IPA transcriptions of orthographic words in Belarusian was developed.
- The tool for generating phonetic transcriptions of orthographic words in Belarusian was improved.
- A dictionary in NooJ format that includes correct IPA transcriptions for nouns and verbs was created.
The results of our work will help in introducing and learning the norms of literary pronunciation of the Belarusian language. Moreover, the results can be useful in dealing with other educational and linguistic problems.

Plans

- To examine the correctness of the IPA transcriptions for nouns and verbs, and to correct mistakes in the rules, if any.
- To build a NooJ morphological grammar for letter-to-phoneme conversion with correct syllable border positions.
- To add IPA transcriptions for adjectives and adverbs.

Thank you for your attention! (Belarusian: ДЗЯКУЙ ВАМ ЗА ЎВАГУ! [ˈd͡zʲakuj ˈvam ˈza ˈwvaɣu]; Czech: Děkuji Vám za pozornost!)