Lexicon-Grammar of Russian Verbal Idioms Tetyana Fukova University of Algarve - FCHS (Portugal) Supervisors: Jorge Baptista, University of Algarve – FCHS.

1 Lexicon-Grammar of Russian Verbal Idioms
Tetyana Fukova, University of Algarve - FCHS (Portugal)
Supervisors:
- Jorge Baptista, University of Algarve – FCHS and INESC-ID Lisboa - Spoken Language Lab (Portugal)
- Svitlana Chornobay, Crimean Federal University (Crimea)

2 idioms
verbal idioms (frozen sentences): Frozen sentences are elementary sentences where the main verb and at least one of its arguments are distributionally invariable; usually, the global meaning of the expression cannot be calculated from the individual meanings of its component elements when they are used independently:
держать язык за зубами (derzhat’ jazyk za zubami)
hold/V one’s tongue/C-acc behind/Prep one’s/N-gen teeth/C-instr
‘keep one's tongue between one's teeth’

3 objectives
- to determine the relevant linguistic information required to identify Russian verbal idioms in texts
- to formalize that information into a database of idioms
- to build a library of finite-state transducers (FST) to process those idioms in real texts
- to evaluate the performance of the FST library

4 linguistic resources and tools
- available linguistic resources
  - phraseological dictionaries (Molotkov, 1986; Fedosov and Lapitsky, 2003)
  - the machine-readable dictionary (distributed with Unitex)
- linguistic development platform
  - Unitex (Paumier 2003, 2014)

5 methods: data collection
- about 1,000 verbal idioms collected from phraseological dictionaries (current, frequent)
- classified using the Lexicon-Grammar framework (M. Gross 1982, 1996)
- formalized in tabular format, forming a database aimed at computational processing of texts
- fine-grained description of the idiomatic expressions:
  - syntactic structure
  - lexical content of frozen elements
  - distributional constraints on free syntactic slots (± human)
  - transformational properties (Passive, permutation)

6 methods: data collection
each idiom:
- entry (verb in infinitive)
- word-by-word translation with relevant morphosyntactic information (e.g. case)
- free translation (gloss) or English equivalent
- illustrative example
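The per-idiom fields listed above could be held in a record like the one below. This is only a minimal sketch: the class name `IdiomEntry`, its field names, and the `properties` dictionary are assumptions made for illustration, not the authors' actual table layout. The sample values come from the classification slide for бить баклуши.

```python
from dataclasses import dataclass, field


@dataclass
class IdiomEntry:
    """One row of a hypothetical idiom table (field names are assumed)."""
    entry: str          # idiom with the verb in the infinitive
    translit: str       # transliteration
    lg_class: str       # Lexicon-Grammar class label, e.g. "C1"
    word_by_word: str   # word-by-word translation with case information
    gloss: str          # free translation or English equivalent
    example: str = ""   # illustrative example sentence, if available
    # Distributional and transformational properties (e.g. human subject,
    # Passive, permutation); left empty here rather than guessed.
    properties: dict = field(default_factory=dict)


bit_baklushi = IdiomEntry(
    entry="бить баклуши",
    translit="bit' baklushi",
    lg_class="C1",
    word_by_word="N0 beat/V spoons/C1-acc",
    gloss="to twiddle one's thumbs, to be idle",
)
```

Keeping the properties in a separate mapping leaves room for class-specific columns without changing the record type.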

7 methods: classification
- inspired by M. Gross's (1982, 1996) proposal for French idioms
- already adapted for other languages (including non-Latin languages):
  - French (the four main varieties): France, Belgium, Switzerland and Québec (Lamiroy 2010)
  - Greek (Fotopoulou 1993)
  - Italian (Vietri 2015)
  - Portuguese, both European and Brazilian (Baptista et al. 2004, 2014; Vale 2001)

8 methods: classification
- M. Gross (1982) original classification proposal, based on:
  - number of free (N) and frozen (C) slots
  - their position as subject, first, or second complement
  - the prepositions introducing the complements / case
  - transformational properties (e.g. Passive, permutation, etc.)
- adapted to Russian in order to encompass CASE

9 Classification of Russian verbal idioms
Class | Structure | Example | Count
C1 | N0 V C-acc1 | Бить баклуши (bit’ baklushi): N0 beat/V spoons/C1-acc, ‘to twiddle one's thumbs, to be idle’ | 245
CP1 | N0 V (Prep1) C1 | Влететь в копеечку (vletet’ v kopeechku): N0 fly/V in/Prep penny/C1-acc, ‘to cost smb. a pretty penny’ | 298
CAN | N0 V (C-acc N-gen)1 = N0 V (C-acc1 N-dat2) | Заговаривать зубы (zagovarivat’ zubi): N0 talk/V teeth/C1-acc smb/N-dat|gen, ‘distract the interlocutor by talking about extraneous matters’ | 28
CPN | N0 V Prep (C-acc N-gen)1 | Играть на нервах (igrat’ na nervah): N0 play/V on/Prep nerves/C1-obliq N-gen, ‘to jangle on someone's ears/nerves’ | 28

10 Classification of Russian verbal idioms
Class | Structure | Example | Count
C1PN | N0 V C-acc1 (Prep2) N2 | Задать пару (zadat’ paru): N0 set/V steam/C1-acc smb/N2-dat, ‘to give smb. hell’ | 80
CNP2 | N0 V N-acc1 (Prep2) C2 | Взять под крыло (vzyat’ pod krilo): N0 take/V smb/N1-acc under/Prep wing/C2-acc, ‘to take smb. under one's wing’ | 187
C1P2 | N0 V C-acc1 (Prep2) C2 | Брать быка за рога (brat’ bika za roga): N0 take/V bull/C1-acc by/Prep horns/C2-acc, ‘to take the bull by the horns’ | 98
CPP | N0 V w (Prep1) C1 (Prep2) C2 | Лезть в душу без мыла (lezt’ v dushu bez mila): N0 get/V into/Prep soul/C1-acc without/Prep soap/C2-gen, ‘to try to gain smb.'s favor or trust by cunning’ | 15
CADV | N0 V Adv1 w | Выходить боком (vihodit’ bokom): N0 appear/V sideways/Adv, ‘to turn out badly’ | 23
Total | | | 1,002

11 methods: Corpus collection and annotation
- Russian National Corpus (www.ruscorpora.ru)
- 10 most frequent verbs from the lexicon-grammar:
  держать (derzhat) ‘to hold’, идти (idti) ‘to go’, играть (igrat) ‘to play’, бить (bit) ‘to beat’, смотреть (smotret) ‘to look’, класть (klast) ‘to put’, лезть (lezt) ‘to climb’, лежать (lezhat) ‘to lie’, выйти (viiti) ‘to go out’, жить (zhit) ‘to live’
- excluding verbs that are often support verbs (M. Gross 1996): брать (brat) ‘to take’, давать (davat) ‘to give’, and делать (delat) ‘to do’

12 methods: Corpus collection and annotation
Corpus 1
- top search results for each verb lemma (and inflected forms); random selection of 50 sentences
- manual annotation of the idioms found (_idiom_)
- goal: to get a glimpse of the degree of completeness of the lexicon-grammar built so far
Corpus 2
- top search results for each verb lemma + constant (frozen head noun), allowing a window of up to 3 words; random selection of 50 sentences
- manual annotation of the idioms found (_idiom_) and of literal uses (#literal#)
- goal: to evaluate the adequacy of the FST approach to the task of identifying verbal idioms in texts
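The Corpus 2 query (verb lemma plus frozen constant within a small window) can be approximated over a lemmatized sentence as sketched below. The function name `find_candidates`, the reading of "window of up to 3 words" as at most three intervening tokens on either side of the verb, and the sample sentence are all assumptions made for illustration; the actual search was run through the Russian National Corpus interface.

```python
def find_candidates(lemmas, verb, constant, window=3):
    """Return (verb_index, constant_index) pairs where `constant` occurs
    with at most `window` tokens between it and `verb`, on either side."""
    hits = []
    for i, lemma in enumerate(lemmas):
        if lemma != verb:
            continue
        lo = max(0, i - window - 1)
        hi = min(len(lemmas), i + window + 2)
        for j in range(lo, hi):
            if j != i and lemmas[j] == constant:
                hits.append((i, j))
    return hits


# Hypothetical lemmatized sentence containing держать язык за зубами.
sentence = ["он", "держать", "свой", "язык", "за", "зуб"]
matches = find_candidates(sentence, "держать", "язык")
```

A hit found this way is only a candidate: as the annotation scheme on this slide shows, each match still has to be labeled as idiomatic or literal.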

13 Corpus 1: data collection (from Russian National Corpus)
verb | translit | gloss | RNC | in sample (n=50) | diff. idioms | diff. LG entries | Total LG entries w/ V
держать | derzhat' | to hold | 50,643 | 5 | 4 | 4 | 33
идти | idti | to go | 241,225 | 1 | 1 | 1 | 23
играть | igrat' | to play | 66,077 | 0 | 0 | 0 | 14
бить | bit' | to beat | 33,393 | 6 | 5 | 4 | 12
смотреть | smotret' | to look | 157,516 | 0 | 0 | 0 | 11
класть | klast' | to put | 10,458 | 1 | 1 | 1 | 11
лезть | lezt' | to climb | 11,273 | 3 | 3 | 3 | 9
лежать | lezhat' | to lie | 80,235 | 2 | 2 | 2 | 9
выйти | viiti | to go out | 165,311 | 1 | 1 | 1 | 8
жить | zhit' | to live | 187,841 | 2 | 2 | 2 | 7
Total | | | 1,003,972 | 21 | 19 | 18 | 137

14 Corpus 2: data collection (from Russian National Corpus)
Verb | Translit | Gloss | diff. idioms | matches | idioms 50
Держать | derzhat' | 'hold/keep' | 29 | 1,048 | 594
Идти | idti | 'go' | 23 | 988 | 490
Играть | igrat' | 'play' | 13 | 468 | 189
Бить | bit' | 'beat' | 11 | 501 | 295
Класть | klast' | 'put' | 9 | 229 | 71
Лезть | lezt' | 'climb' | 5 | 225 | 154
Лежать | lezhat' | 'lie' | 6 | 253 | 152
Выйти | viiti | 'go out' | 6 | 229 | 151
Жить | zhit' | 'live' | 6 | 193 | 80
Total | | | 117 | 4,430 | 2,334

15 methods: building reference graphs (Class C1)

16 methods: building reference graphs (Class CP1)

17 methods: building reference graphs (Class C1P2)

18 methods: resulting graph (C1)
бить баклуши, N0 beat/V spoons/C1-acc, ‘be idle’

19 Representation of Passive in the lexicon
- the Unitex lexical resource for Russian: sample dictionary (Nagel, 2002) built from the vocabulary of Dostoevsky’s novel Игрок (Igrok) ‘The Gambler’
- Passive voice: code ‘P’
- same verb form, different lemmas or different inflection codes:
  бросался,бросать.V+nsv+tr:PeMVi ‘throw’
  бросался,бросаться.V+intr+nsv:AeMVi
- the relative position of the semantic information on transitivity is not consistent
- the passive code ‘P’ corresponds, in fact, not only to a passive construction but also to an active-reflexive construction

20
- we rendered the dictionary notation formally consistent
- we established a clear distinction, whenever possible, between the ‘P’=passive and ‘P’=reflexive values of the suffix –ся/сь (sya/s’)
- we adapted the dictionary to produce revised lexical resources:
  - the dictionary of text, excluding all verb forms (same as original)
  - the dictionary of verbs without ся/сь (sya/s’) suffixes
  - the dictionary of forms ending in ся/сь (sya/s’)
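The three-way split described above can be sketched over dictionary lines of the shape `form,lemma.codes:inflection` seen on the previous slide. This is an illustration only, not the authors' procedure: the function name `split_dictionary` is invented, it assumes the grammatical category is the first code after the dot, it assumes the third dictionary holds verb forms, and it ignores escaped commas or dots inside entries. The last two sample lines are hypothetical entries added to exercise each branch.

```python
def split_dictionary(lines):
    """Split dictionary lines into (non_verbs, plain_verbs, sya_forms)."""
    non_verbs, plain_verbs, sya_forms = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        form, rest = line.split(",", 1)          # inflected form | lemma.codes
        lemma_and_codes = rest.split(":", 1)[0]  # drop inflection codes
        codes = lemma_and_codes.split(".", 1)[1] if "." in lemma_and_codes else ""
        category = codes.split("+", 1)[0]        # e.g. "V" or "N"
        if category != "V":
            non_verbs.append(line)
        elif form.endswith(("ся", "сь")):
            sya_forms.append(line)
        else:
            plain_verbs.append(line)
    return non_verbs, plain_verbs, sya_forms


sample = [
    "бросался,бросать.V+nsv+tr:PeMVi",      # from the slide
    "бросался,бросаться.V+intr+nsv:AeMVi",  # from the slide
    "бросал,бросать.V+nsv+tr",              # hypothetical entry
    "игрок,игрок.N",                        # hypothetical entry
]
non_verbs, plain_verbs, sya_forms = split_dictionary(sample)
```

Splitting on the surface form alone does not decide between the passive and reflexive readings of ‘P’; that distinction still needs the manual revision described on this slide.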

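21 Evaluation
Corpus | Class | TP | FP | FN | Precision | Recall | F-measure
Corpus 1 | C1 | 37 | 0 | 8 | 1.00 | 0.82 | 0.90
Corpus 1 | CP1 | 14 | 0 | 0 | 1.00 | 1.00 | 1.00
Corpus 1 | C1P2 | 7 | 0 | 0 | 1.00 | 1.00 | 1.00
Corpus 1 | Total | 58 | 0 | 8 | 1.00 | 0.88 | 0.94
Corpus 2 | C1 | 251 | 100 | 10 | 0.72 | 0.96 | 0.82
Corpus 2 | CP1 | 752 | 105 | 9 | 0.88 | 0.99 | 0.93
Corpus 2 | C1P2 | 172 | 28 | 3 | 0.86 | 0.98 | 0.92
Corpus 2 | Total | 1175 | 233 | 22 | 0.83 | 0.98 | 0.90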
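The precision, recall and F-measure columns follow from the TP, FP and FN counts by the standard definitions. The sketch below recomputes them; the function name `prf` and the rounding to two decimals are choices made here, and the counts are taken from the Total rows of the table.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, F-measure), each rounded to 2 decimals."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f_measure, 2)


corpus1_total = prf(58, 0, 8)       # Corpus 1, Total row
corpus2_total = prf(1175, 233, 22)  # Corpus 2, Total row
```

With no false positives in Corpus 1, precision is 1.00 by construction, so the F-measure there is driven entirely by the eight missed idioms of class C1.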

22 Conclusions
We have presented the project of building a database of Russian verbal idioms:
- more than 1,000 entries collected from dictionaries (ongoing)
- built a Lexicon-Grammar for those idioms (with morphosyntactic information and examples)
- adopted M. Gross's (1982) formal classification (our contribution was to adapt it to a typologically distinct language)
- produced a detailed description of each class and provided examples for each idiom (no such resource exists yet for Russian)
- built reference graphs for the largest classes (C1, CP1, C1P2)
- improved the base dictionary provided with Unitex
- ran two experiments with 2 corpora (aimed at estimating LG coverage and FST precision)

23 Future work
- Extend the lexical coverage of the lexicon-grammar
- Build FSTs for the remaining classes in the LG
- Address free-order syntax of sentential constituents in Russian
- Address incompleteness and technical details of the base dictionary distributed with Unitex
- Describe idioms corresponding to support verb constructions
- Include verbal idioms with frozen subject (C0x)
- Signal the ambiguity between idiomatic and literal meaning of the idioms

24 Final words
This work can be considered a first attempt at the automatic identification and detection of Russian verbal idioms. Much is still to be done.
The following publications have been produced during the course of this project:
- FUKOVA, T., CHORNOBAY, S., BAPTISTA, J. 2016. Lexicon-Grammar of Russian verbal idioms. Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives. Proceedings from Europhras 2015, Malaga, Spain (June 30, 2015). Tradulex: Geneva, pp. 139-153.
- FUKOVA, T., CHORNOBAY, S., BAPTISTA, J. (to appear). Classification of Russian verbal idioms. Paper presented at the Web Conference: International scientific congress «Foreign Philology. Social and national variability of language and literature», Crimean Federal V.I. Vernadsky University (April 27, 2016).

25 Obrigada / Thank you! / Спасибо

26 references
BAPTISTA, J., CORREIA, A. and FERNANDES, G. 2004. Frozen Sentences of Portuguese: Formal Descriptions for NLP. ACL: Barcelona. 54.
BAPTISTA, J. Compositional vs. Frozen Sequences. Laporte, Eric; Ting Au-Chen (eds). Proceedings of the Lexicon-Grammar Workshop, Beijing, October 14-18, 2004. Journal of Applied Linguistics, Special Issue on Lexicon-Grammar. Papers presented at the Lexicon-Grammar Workshop, pp. 81-93. (Chinese version)
CHORNOBAY, S., BAPTISTA, J. Semantic Peculiarities of Portuguese and Russian Idioms within the Conceptual Domain "Death". International Scientific Conference Modern Philology: Paradigms, trends, problems (Міжнародна наукова конференція "Сучасна філологія: парадигми, напрямки, проблеми"), October 9, 2014, Kiev, Kyiv National Taras Shevchenko University.
COWIE, A. 1998. Phraseology. Theory, analysis, and applications. Oxford: Clarendon Press.
GROSS, M. 1982. Une classification des phrases “figées” du français. Revue Québécoise de Linguistique 11-2: pp. 151-185.
GROSS, M. 1996. Lexicon-Grammar. Concise Encyclopedia of Syntactic Theories. Cambridge: Pergamon. pp. 244-258.
FEDOSOV, I. and LAPITSKY, A. 2003. Phraseological dictionary of the Russian language (Федосов, И. и Лапицкий, А. Фразеологический словарь русского языка). Moscow: Unves.

27 references
FOTOPOULOU, A. 1993. Une classification des phrases à compléments figés en grec moderne: étude morphosyntaxique des phrases figées. Ph.D. thesis, Université Paris VIII.
LAMIROY, B., KLEIN, J.-R. 2010. Les expressions verbales figées de la francophonie: Belgique, France, Québec et Suisse. Editions OPHRYS.
MOLOTKOV, A. 1986. Phraseological dictionary of the Russian language (Молотков, А., Фразеологический словарь русского языка). Moscow: ACT.
PAUMIER, S. 2003. De la reconnaissance des formes linguistiques à l’analyse syntaxique. PhD thesis, Université de Marne-la-Vallée.
PAUMIER, S. 2014. Unitex 3.0 - User’s Manual. Paris: Université Paris-Est Marne-la-Vallée.
VALE, O. 2001. Expressões Cristalizadas do Português do Brasil: Uma proposta de tipologia. Araraquara, SP (Brasil): Universidade Estadual Paulista.
VIETRI, S. 2015. Idiomatic Constructions in Italian. A Lexicon-Grammar Approach. Amsterdam: John Benjamins.

