Towards an open-source universal-dependency treebank for Erzya


1 Towards an open-source universal-dependency treebank for Erzya
Jack Rueter and Francis Tyers IWCLUL 2018 January 8, 2018

2 Erzya UD
Background
Open-source finite-state morphological description (HFST at Giellatekno)
Open-source original-language publications
Three months into the project
Useful tools
Dependency relations
A few issues
Jack Rueter, University of Helsinki, and Francis Tyers, Higher School of Economics, Moscow. IWCLUL, January 8-9, 2018

3 Open-source original-language publications

4 Three months into the project
Native prose from 5 authors
Sentences: 1398
Tokens: 14109
Sentence length: 2-60 tokens
Dependency relations: 62 (+ 16 nonce)
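Figures like the sentence count, token count, and sentence-length range above can be recomputed from the treebank files. The following is a minimal sketch, assuming the treebank is stored in the standard CoNLL-U format used by Universal Dependencies (the slides do not name the file format). The function name `treebank_stats` and the toy input are hypothetical, and the sketch counts only plain integer token IDs, which may differ from how the slide's "Tokens" figure was obtained.

```python
def treebank_stats(conllu_text):
    """Count sentences and syntactic words in CoNLL-U formatted text."""
    lengths = []  # one entry per sentence: its number of words
    current = 0
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                lengths.append(current)
            current = 0
        elif line.startswith("#"):
            continue  # sentence-level comment such as "# sent_id = ..."
        elif line.split("\t")[0].isdigit():
            # Plain integer IDs only; ranges (1-2) and empty nodes (1.1) are skipped.
            current += 1
    if current:
        lengths.append(current)  # last sentence without a trailing blank line
    return {
        "sentences": len(lengths),
        "tokens": sum(lengths),
        "shortest": min(lengths, default=0),
        "longest": max(lengths, default=0),
    }


# Toy input with placeholder word forms (assumption: not real Erzya data).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\tw1\tw1\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tw2\tw2\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\tw3\tw3\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\t!\t!\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

stats = treebank_stats(SAMPLE)
print(stats)
```

On a real treebank the same function could be applied to the contents of each `.conllu` file and the results summed.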

5 Initial work with constraint grammar disambiguation (Giellatekno)

6 Annotatrix

7 Dependency relations
3717 punct, 1526 root, 1482 obl, 1297 nsubj, 1125 advmod, 988 conj, 742 obj, 517 case, 435 nmod, 387 amod, 340 aux:neg, 309 cc, 279 det, 265 advcl, 219 xcomp, 197 compound, 158 parataxis, 115 nummod, 110 mark, 108 discourse, 102 appos, 99 fixed, 97 ccomp, 81 acl, 79 nsubj:exist, 61 vocative, 57 nsubj:cop, 52 aux, 47 cop, 46 acl:relcl, 43 acl:conv, 35 obl:tmod, 31 flat, 29 advmod:comp, 28 orphan, 27 cop:exist, 21 _, 16 xcomp:ds, 16 csubj, 11 obl:agent, 11 flat:name, 11 compound:appos, 9 dislocated, 9 aux:q, 8 nmod:gsubj, 8 nmod:gobj, 7 expl, 7 cop:negexist, 7 compound:svc, 7 cc:preconj, 6 nmod:comp, 6 advmod:tmod, 5 nmod:own, 5 compound:coll, 4 obl:exist, 4 nmod:bahuv, 3 cop:neg, 2 compound:redup, 2 aux:opt, 2 advcl:tcl, 2 advcl:conv
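A frequency list in the "count relation" layout of this slide can be produced by tallying the DEPREL column of the treebank. This is a rough sketch, again assuming CoNLL-U input; `deprel_counts` is a hypothetical helper, the toy sentences use placeholder forms, and nothing here is taken from the authors' own tooling.

```python
from collections import Counter


def deprel_counts(conllu_text):
    """Tally the DEPREL column (8th field) over all word lines."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if line.startswith("#"):
            continue  # skip sentence-level comments
        cols = line.split("\t")
        # A CoNLL-U word line has exactly 10 tab-separated fields.
        if len(cols) == 10 and cols[0].isdigit():
            counts[cols[7]] += 1
    return counts


# Toy input with placeholder word forms (assumption: not real Erzya data).
SAMPLE = (
    "1\tw1\tw1\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tw2\tw2\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "1\tw3\tw3\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\t!\t!\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

counts = deprel_counts(SAMPLE)
# Most frequent first, formatted as "count relation" like the slide.
print(", ".join(f"{n} {rel}" for rel, n in counts.most_common()))
```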

8 Special Dependency relations
340 aux:neg, 79 nsubj:exist, 57 nsubj:cop, 46 acl:relcl, 43 acl:conv, 35 obl:tmod, 29 advmod:comp, 27 cop:exist, 16 xcomp:ds, 11 obl:agent, 11 flat:name, 11 compound:appos, 9 aux:q, 8 nmod:gsubj, 8 nmod:gobj, 7 cop:negexist, 7 compound:svc, 7 cc:preconj, 6 nmod:comp, 6 advmod:tmod, 5 nmod:own, 5 compound:coll, 4 obl:exist, 4 nmod:bahuv, 3 cop:neg, 2 compound:redup, 2 aux:opt, 2 advcl:tcl, 2 advcl:conv
Do we need them?
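One way to approach the question on this slide is to see what the distribution looks like when each subtyped relation is folded back into its base relation (the part before the colon). The sketch below does this for the `aux` and `cop` families only, using counts copied from the slides; `collapse_subtypes` is a hypothetical helper, and folding subtypes is just one possible way to assess them, not something the slides prescribe.

```python
from collections import Counter


def collapse_subtypes(counts):
    """Merge each 'base:subtype' relation into its base relation."""
    collapsed = Counter()
    for relation, n in counts.items():
        base = relation.split(":", 1)[0]  # "aux:neg" -> "aux"
        collapsed[base] += n
    return collapsed


# Counts for the aux and cop families, as listed on the slides.
SLIDE_COUNTS = {
    "aux:neg": 340, "aux": 52, "aux:q": 9, "aux:opt": 2,
    "cop": 47, "cop:exist": 27, "cop:negexist": 7, "cop:neg": 3,
}

collapsed = collapse_subtypes(SLIDE_COUNTS)
for base, total in collapsed.most_common():
    plain = SLIDE_COUNTS.get(base, 0)
    print(f"{base}: {total} total, {total - plain} carried by subtypes")
```

For these two families most instances carry a subtype, so dropping the subtypes would discard a distinction that the annotation makes in the majority of cases.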

9 A few issues
Copula
Numerals

10 Copula
Dependent vs independent morphology
Equative, classification, locative, theme vs rheme
79 nsubj:exist, 57 nsubj:cop, 27 cop:exist, 7 cop:negexist, 3 cop:neg

11 Numerals
nummod: cardinal numerals, distributive numerals, collective/set, ???iteratives
advmod: multiplicatives
xcomp: secondary predication with associative collectives
amod: ordinals

12 Сюкпря Кунсоломань кисэ (Thank you for listening)

13 Special Thanks
Full Finite-State Syntax: Capturing Ambiguity and Crossing Dependencies with Minimal Descriptive Complexity (Anssi Yli-Jyrä), Oct-Dec 2017
HFST
Giellatekno-Divvun infrastructure
Annotatrix
Universal Dependencies community (Finnish, Estonian, Swedish, English, French, Russian, North Sami, German, Hungarian)

