Universal Dependencies

1 Universal Dependencies
Joakim Nivre Uppsala University

2 Universal Dependencies
Background:
  Treebank annotation schemes vary across languages
  Hard to compare results across languages [Nivre et al. 2007]
  Hard to evaluate cross-lingual learning [McDonald et al. 2013]
  Hard to build multilingual systems
Universal Dependencies:
  Stanford universal dependencies [de Marneffe et al. 2014]
  Google universal part-of-speech tags [Petrov et al. 2012]
  Interset morphological features [Zeman 2008]
First guidelines released Oct 1, 2014
First 10 treebanks released Jan 15, 2015

3 Universal Dependencies
Syntactic words – explicit splitting of clitics and contractions
Universal part-of-speech tags + morphological features
Dependency tree + augmented dependencies (not shown)

4 Goals
Cross-linguistically consistent grammatical annotation
Support multilingual NLP and linguistic research
Build on common usage and existing de facto standards
Complement – not replace – language-specific schemes
Open community effort – anyone can contribute

5 Guiding Principles
Maximize parallelism
Don't annotate the same thing in different ways
Don't make different things look the same
Don't annotate things that are not there
Languages select from a universal pool of categories
Allow language-specific extensions

6 Design Principles
Dependency
  Widely used in practical NLP systems
  Available in treebanks for many languages
Lexicalism
  Basic annotation units are words – syntactic words
  Words have morphological properties
  Words enter into syntactic relations
Recoverability
  Transparent mapping from input text to word segmentation
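
The recoverability principle can be illustrated with a minimal sketch that rebuilds the input text from the word segmentation. The function name `detokenize` and the `rows` data are hypothetical. The sketch assumes CoNLL-U conventions: a range ID such as `4-5` carries the surface form of a contraction, and `SpaceAfter=no` in the MISC column marks a missing space.

```python
def detokenize(rows):
    """Rebuild surface text from (ID, FORM, MISC) triples.

    A row whose ID is a range ("4-5") holds the surface token; the
    syntactic words it covers (4 and 5) are skipped.
    """
    pieces = []
    skip_until = 0  # highest word ID covered by the current range
    for tok_id, form, misc in rows:
        if "-" in tok_id:
            _, end = (int(x) for x in tok_id.split("-"))
            skip_until = end
        elif int(tok_id) <= skip_until:
            continue  # syntactic word inside a multiword token
        pieces.append(form)
        if "SpaceAfter=no" not in misc.split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()


# Assumed example: "du" is split into the syntactic words "de" + "le".
rows = [
    ("1", "Le", "_"),
    ("2", "chat", "_"),
    ("3", "boit", "_"),
    ("4-5", "du", "_"),
    ("4", "de", "_"),
    ("5", "le", "_"),
    ("6", "lait", "SpaceAfter=no"),
    ("7", ".", "_"),
]
print(detokenize(rows))
```

Keeping the surface token on its own row is what makes the mapping transparent: the syntactic words can be analyzed freely without losing the original text.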

7 Morphological Annotation
Form    Lemma    POS    Features
Le      le       DET    Definite=Def Gender=Masc Number=Sing
chat    chat     NOUN
chasse  chasser  VERB   Mood=Ind Person=3
les     le       DET
chiens  chien    NOUN   Number=Plur
.       .        PUNCT
Lemma represents the semantic content of a word
Part-of-speech tag represents its grammatical class
Features represent lexical and grammatical properties of the lemma or the particular word form

8 Syntactic Annotation
Content words are related by dependency relations
Function words attach to the content word they modify
Punctuation attaches to the head of the phrase or clause
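
Because every word attaches to exactly one head, the annotation forms a tree. A minimal sketch of a well-formedness check follows. The function name `is_tree` is hypothetical, and the head indices for "Le chat boit de le lait ." are an assumed analysis used only for illustration.

```python
def is_tree(heads):
    """heads[i] is the head of word i+1 (1-based); 0 marks the root."""
    n = len(heads)
    # Exactly one word may attach to the artificial root.
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # Every head index must point at the root or at a word in the sentence.
    if any(h < 0 or h > n for h in heads):
        return False
    # Following heads upward from any word must reach the root (no cycles).
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


# Assumed heads: Le->chat, chat->boit, boit->root, de->lait, le->lait,
# lait->boit, .->boit. Function words attach to content words.
heads = [2, 3, 0, 6, 6, 3, 3]
print(is_tree(heads))
```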

9 CoNLL-U Format
ID   FORM  LEMMA  UPOSTAG  XPOSTAG  FEATS  HEAD  DEPREL  DEPS  MISC
1    Le    le     DET      _        _      2     det     _     _
2    chat  chat   NOUN     _        _      3     nsubj   _     _
3    boit  boire  VERB     _        _      0     root    _     _
4-5  du    _      _        _        _      _     _       _     _
4    de    de     ADP      _        _      6     case    _     _
5    le    le     DET      _        _      6     det     _     _
6    lait  lait   NOUN     _        _      3     obj     _     SpaceAfter=no
7    .     .      PUNCT    _        _      3     punct   _     _
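
A minimal sketch of reading this ten-column format. It assumes tab-separated columns, blank lines between sentences, and `#` comment lines, as in the CoNLL-U convention. The function name `parse_conllu` is hypothetical, and the sample rows are an assumed filling of the example sentence, with `_` for unspecified fields.

```python
FIELDS = ["id", "form", "lemma", "upostag", "xpostag",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of row dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Rows are written with spaces here and joined with tabs below.
rows = [
    "1 Le le DET _ _ 2 det _ _",
    "2 chat chat NOUN _ _ 3 nsubj _ _",
    "3 boit boire VERB _ _ 0 root _ _",
    "4-5 du _ _ _ _ _ _ _ _",
    "4 de de ADP _ _ 6 case _ _",
    "5 le le DET _ _ 6 det _ _",
    "6 lait lait NOUN _ _ 3 obj _ SpaceAfter=no",
    "7 . . PUNCT _ _ 3 punct _ _",
]
sample = "\n".join("\t".join(r.split()) for r in rows)

sentences = parse_conllu(sample)
for row in sentences[0]:
    print(row["id"], row["form"], row["head"], row["deprel"])
```

All values stay strings in this sketch, so the range ID `4-5` and the `_` placeholders need no special casing at parse time.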

10 Dependency Structure
[Figure: parallel dependency trees, English and Swedish]
Keeping content words as heads promotes parallelism
Function words often correlate with morphology

11 Dependency Relations [de Marneffe et al. 2014]
Taxonomy of 42 universal grammatical relations, broadly supported across many languages in linguistic typology
Language-specific subtypes can be added
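
A minimal sketch of separating a universal relation from a language-specific subtype. It assumes the `relation:subtype` colon notation used in UD treebanks (for example `acl:relcl`), which the slide itself does not spell out. The function name `split_relation` is hypothetical.

```python
def split_relation(deprel):
    """Split "acl:relcl" into ("acl", "relcl"); subtype is None if absent."""
    universal, _, subtype = deprel.partition(":")
    return universal, subtype or None


print(split_relation("acl:relcl"))
print(split_relation("nsubj"))
```

Tools that only know the universal inventory can compare treebanks on the first element and ignore the second.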

12 Morphology: POS
Open class words:   ADJ ADV INTJ NOUN PROPN VERB
Closed class words: ADP AUX CONJ DET NUM PART PRON SCONJ
Other:              PUNCT SYM X
Taxonomy of 17 universal part-of-speech tags, based on the Google Universal Tagset [Petrov et al. 2012]
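
The three groups can be held in a small lookup table. This is a sketch only; the names `UPOS_CLASSES`, `TAG_TO_CLASS`, and `word_class` are hypothetical, and the tags are the 17 listed on this slide.

```python
UPOS_CLASSES = {
    "open": ["ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"],
    "closed": ["ADP", "AUX", "CONJ", "DET", "NUM", "PART", "PRON", "SCONJ"],
    "other": ["PUNCT", "SYM", "X"],
}

# Invert the table so each tag maps to its class.
TAG_TO_CLASS = {tag: cls for cls, tags in UPOS_CLASSES.items() for tag in tags}


def word_class(tag):
    """Return "open", "closed", or "other"; raises KeyError for unknown tags."""
    return TAG_TO_CLASS[tag]


print(len(TAG_TO_CLASS), word_class("DET"), word_class("NOUN"))
```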

13 Morphology: Universal Features
Standardized inventory of morphological features, based on the Interset system [Zeman 2008]
Lexical features   Inflectional features
                   Nominal*    Verbal*
PronType           Gender      VerbForm
NumType            Animacy     Mood
Poss               Number      Tense
Reflex             Case        Aspect
Foreign            Definite    Voice
Abbr               Degree      Evident
                               Polarity
                               Person
                               Polite

14 Morphology: Examples
la     Definite=Def|Gender=Fem|Number=Sing|PronType=Art
hanno  Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin
fatto  Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part
casa   Gender=Fem|Number=Sing
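
Feature strings like these are `Name=Value` pairs joined by `|`. A minimal sketch of converting them to and from a dictionary follows. The function names `parse_feats` and `format_feats` are hypothetical, and treating `_` as "no features" is an assumption taken from the CoNLL-U placeholder.

```python
def parse_feats(feats):
    """Turn "Gender=Fem|Number=Sing" into {"Gender": "Fem", "Number": "Sing"}."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(features):
    """Inverse of parse_feats; names are sorted alphabetically, ignoring case."""
    if not features:
        return "_"
    ordered = sorted(features.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


hanno = "Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin"
parsed = parse_feats(hanno)
print(parsed["Tense"], format_feats(parsed) == hanno)
```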

