ADDITION OF IPA TRANSCRIPTION TO THE BELARUSIAN NOOJ MODULE


1 ADDITION OF IPA TRANSCRIPTION TO THE BELARUSIAN NOOJ MODULE
Hanna Stanislavenka, Stanislau Lysy, Yuras Hetsevich
Speech Recognition and Speech Synthesis Laboratory, United Institute of Informatics Problems, National Academy of Sciences of Belarus
Good afternoon, ladies and gentlemen! My name is Hanna. I am here on behalf of our NooJ team from the UIIP in Belarus. Today I have the honor of presenting the following report: ADDITION OF IPA TRANSCRIPTION TO THE BELARUSIAN NOOJ MODULE.

2 NooJ Levels
- typography and spelling;
- syllabification, phonemic and prosodic transcription;
- lexicons of simple words, multiword units, and discontinuous expressions;
- inflectional, derivational and agglutinative morphology;
- local and structural syntax;
- transformational syntax and paraphrase generation;
- semantic analysis and machine translation.
NooJ allows linguists to formalize several levels of linguistic phenomena. We work mainly with phonetics, and this report is dedicated to that level of NooJ.

3 Previous research
- A system for generating a dictionary in NooJ format with four types of phonetic transcription was created.
- The NooJ dictionary was successfully compiled and tested. In 2015 it contained the first forms of nouns.
- Work began on a morphological NooJ grammar that creates a phonetic transcription for orthographic words.
Before I proceed to our new results, I'd like to summarize our previous research, which is listed on this slide. These results were presented at the NooJ 2015 conference.

4 The Problem
Some transcription formats, IPA among them, mark stress not on the vowel but before the stressed syllable.
What problem appeared, and what did we need to solve? In our tradition in Belarus, stress is marked on the stressed vowel. Some transcription formats, including the international IPA format, instead place the stress mark before the stressed syllable.

5 Illustration of the Problem in NooJ format
WE HAVE:
    сакаляня,NOUN +TranscriptionCyr=[сакал'ан'а́] +TranscriptionIPA=[sakalʲanʲˈa]
MUST BE:
    сакаляня,NOUN +TranscriptionIPA=[sakalʲaˈnʲa]
On this slide you can see an illustration of the problem that we faced. IPA transcription marks stress not on the vowel but before the stressed syllable, so the stress mark has to move from before the vowel a to before the whole syllable nʲa.
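The correction shown on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' tool: the function name `move_stress_to_syllable_start` is hypothetical, and the sketch assumes the IPA string has already been split into syllables, with the stress mark sitting somewhere inside the stressed one.

```python
STRESS = "\u02c8"  # IPA primary stress mark (ˈ)

def move_stress_to_syllable_start(syllables):
    """Join syllables, moving any stress mark to the start of its syllable.

    `syllables` is assumed to be an already syllabified IPA word,
    e.g. ["sa", "ka", "lʲa", "nʲˈa"] (hypothetical input format).
    """
    fixed = []
    for syllable in syllables:
        if STRESS in syllable:
            # Drop the mark from its old position and put it first.
            syllable = STRESS + syllable.replace(STRESS, "")
        fixed.append(syllable)
    return "".join(fixed)

print(move_stress_to_syllable_start(["sa", "ka", "lʲa", "nʲˈa"]))  # sakalʲaˈnʲa
```

The sketch makes the dependency explicit: the stress mark can only be placed correctly once the syllable borders are known, which is why a syllabification step is needed first.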

6 Goals
The main goal of our work is to correct IPA transcriptions in NooJ. We divide it into 3 parts:
1. Developing a syllabification algorithm for generating IPA transcriptions of orthographic words in Belarusian.
2. Advancing a high-quality tool for generating phonetic transcriptions of orthographic words in Belarusian.
3. Creating a dictionary in NooJ format that includes correct phonetic transcriptions in IPA format for nouns and verbs.
To work with the phonetic level in the Belarusian NooJ module we needed to correct the IPA transcription, and we solve this problem in the three steps listed above. In our previous research we worked only with nouns. This year we compiled dictionaries for both nouns and verbs.

7 International Phonetic Alphabet
- An alphabetic system of phonetic notation based primarily on the Latin alphabet.
- A standardized representation of the sounds of oral language.
- Used by lexicographers, foreign language students and teachers, linguists, speech-language pathologists, singers, actors, constructed language creators, and translators.
The International Phonetic Alphabet (IPA) was devised by the International Phonetic Association. Unfortunately, we did not have any dictionaries with IPA transcription for Belarusian. Cyrillic transcription is used in teaching, and it is hard to understand for people who use the Latin alphabet.

8 Algorithm of syllabification
Here the first step is presented: developing a syllabification algorithm for the Belarusian language.

9 I. Rules for syllabification
    'aa'    => 'a|a'
    'aka'   => 'a|ka'
    'ama'   => 'a|ma'
    'akka'  => 'a|kka'
    'akma'  => 'a|kma'
    'amka'  => 'am|ka'
    'amma'  => 'am|ma'
    'akkka' => 'a|kkka'
    'akkma' => 'a|kkma'
    'akmka' => 'akm|ka'
    'akmma' => 'a|kmma'
    'amkka' => 'am|kka'
    'amkma' => 'am|kma'
    'ammka' => 'amm|ka'
    'ammma' => 'am|mma'
    … (fewer than 20 rules in total)
where a stands for vowel phonemes, k for obstruent consonantal phonemes, m for sonorant consonantal phonemes, and | for the syllable border.
To create the algorithm we wrote syllabification rules like these. For example, if there are three consonant phonemes between two vowels, the first an obstruent, the second a sonorant and the third an obstruent again ('akmka'), the first two stay in the preceding syllable and the syllable border is placed before the last obstruent ('akm|ka'). Let me show you an example.
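The rules above can be read as a lookup table: for each consonant cluster between two vowels, they fix how many consonants stay with the preceding syllable. The following Python sketch encodes only the fifteen rules shown on the slide. The names `SPLIT_AFTER` and `syllabify` are hypothetical, and because the slide elides some rules ("…"), the sketch raises an error for any cluster it does not know rather than guessing.

```python
# Number of consonants that stay in the left syllable, keyed by the
# consonant cluster between two vowels (taken from the slide's rules).
SPLIT_AFTER = {
    "": 0, "k": 0, "m": 0,
    "kk": 0, "km": 0, "mk": 1, "mm": 1,
    "kkk": 0, "kkm": 0, "kmk": 2, "kmm": 0,
    "mkk": 1, "mkm": 1, "mmk": 2, "mmm": 1,
}

def syllabify(pattern):
    """Insert '|' borders into a pattern of a (vowel), k, m symbols."""
    vowels = [i for i, ch in enumerate(pattern) if ch == "a"]
    borders = []
    for left, right in zip(vowels, vowels[1:]):
        cluster = pattern[left + 1:right]
        if cluster not in SPLIT_AFTER:
            # Rules elided on the slide are not guessed here.
            raise ValueError(f"no rule for cluster {cluster!r}")
        borders.append(left + 1 + SPLIT_AFTER[cluster])
    parts, start = [], 0
    for border in borders:
        parts.append(pattern[start:border])
        start = border
    parts.append(pattern[start:])
    return "|".join(parts)

print(syllabify("akmka"))       # akm|ka
print(syllabify("kakkmakmak"))  # ka|kkma|kmak
```

Consonants before the first vowel and after the last one never trigger a border, so they simply stay in the first and last syllable.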

10 Example (“кастрычнік”, Eng. “October”)
The word кастрычнік (October) in allophonic format:
    K004,A232,S002,T002,R022,Y022,CH002,N'004,I342,K000
Using the classes from our syllabification rules, the algorithm transforms it into the following 'word':
    kakkmakmak
Then the syllable borders are put in:
    ka|kkma|kmak
Finally we again have the word in allophonic format, now with syllable borders:
    K004,A232,>,S002,T002,R022,Y022,>,CH002,N'004,I342,K000
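The round trip in this example, from allophones to an a/k/m pattern and back to allophones with `>` borders, might look like the sketch below. It is an assumption-laden illustration: the class sets cover only the allophones of this one word, the three-digit suffix is treated as an opaque code, and the names `to_pattern` and `insert_borders` are hypothetical. The bordered pattern is passed in as a string, so the block does not depend on any particular syllabifier.

```python
import re

# Hypothetical class table covering only the allophones in this example.
VOWELS = {"A", "Y", "I"}
SONORANTS = {"R", "N'"}
OBSTRUENTS = {"K", "S", "T", "CH"}

def to_pattern(allophones):
    """Map allophone codes such as "N'004" to a, k or m symbols."""
    symbols = []
    for code in allophones:
        name = re.match(r"[^0-9]+", code).group(0)  # strip the digits
        if name in VOWELS:
            symbols.append("a")
        elif name in SONORANTS:
            symbols.append("m")
        elif name in OBSTRUENTS:
            symbols.append("k")
        else:
            raise ValueError(f"unknown allophone {code!r}")
    return "".join(symbols)

def insert_borders(allophones, bordered_pattern):
    """Copy '|' borders from the pattern into the allophone list as '>'."""
    remaining = iter(allophones)
    return [">" if ch == "|" else next(remaining) for ch in bordered_pattern]

word = "K004,A232,S002,T002,R022,Y022,CH002,N'004,I342,K000".split(",")
print(to_pattern(word))  # kakkmakmak
print(",".join(insert_borders(word, "ka|kkma|kmak")))
```

The second call prints the same allophone sequence as on the slide, with `>` after A232 and after Y022.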

11 I. Tool for syllabification
The algorithm was tested with a dedicated online tool that processes not only single words but also whole texts.

12 II. Orthoepic Dictionary Generator
The algorithm was built into the online service Orthoepic Dictionary Generator, which was presented last year. Here you can see that we choose the option «First word processing in NooJ format». It means that the first word in each line of the input text is processed, and the output is a transcription of that word written in NooJ format.

13 II. Correct transcriptions in NooJ format
After processing we get the following material, from which we create the dictionary for NooJ. You can see that there are no more mistakes in stress placement.

14 III. Dictionary in NooJ: ~49 000 nouns
With the help of the Orthoepic Dictionary Generator service we can create dictionaries for NooJ with many entries. In this way we made dictionaries of nouns and of verbs. Here you can see the dictionary of nouns. It contains about 49 000 entries. You can see that two types of transcription are presented, and this one is the IPA transcription.

15 III. Dictionary in NooJ: ~33 000 verbs
This year we also compiled a dictionary of verbs. It contains about 33 000 entries. Here too there are two types of transcription, Cyrillic and IPA. With the help of the IPA transcription anyone can read a word in Belarusian.

16 Text Annotations: nouns
This slide shows text annotation with transcriptions for nouns.

17 Text Annotations: verbs
This slide shows text annotation with transcriptions for verbs.

18 Conclusion
To sum up, I'd like to underline the following results:
- A syllabification algorithm for generating IPA transcriptions of orthographic words in Belarusian was developed.
- A high-quality tool for generating phonetic transcriptions of orthographic words in Belarusian was advanced.
- A dictionary in NooJ format that includes correct phonetic transcriptions in IPA format for nouns and verbs was created.
The results of our work will help in introducing and learning the norms of the literary pronunciation of the Belarusian language. Moreover, the results can be useful in dealing with other educational and linguistic problems.

19 Plans
- To examine the correctness of the IPA transcriptions for nouns and verbs, and to correct mistakes in the rules, if any.
- To build a NooJ morphological grammar for letter-to-phoneme conversion with correct syllable border positions.
- To add IPA transcriptions for adjectives and adverbs.

20 Hanna Stanislavenka, Stanislau Lysy, Yuras Hetsevich
Speech Recognition and Speech Synthesis Laboratory, United Institute of Informatics Problems, National Academy of Sciences of Belarus
ДЗЯКУЙ ВАМ ЗА ЎВАГУ! [ˈd͡zʲakuj ˈvam ˈza ˈwvaɣu] (Belarusian: Thank you for your attention!)
DĚKUJI VÁM ZA POZORNOST! (Czech: Thank you for your attention!)

