A Unified Database of Dependency Treebanks: Integrating, Quantifying & Evaluating Dependency Data
Olga Pustylnikov, Alexander Mehler, Bielefeld University

2 SFB 673 Motivation
- Exploring similarities among languages by means of syntactic treebanks
- We collected a database covering 11 languages
- The treebanks were developed separately by different research projects
- Quantitative investigations across these treebanks therefore require a unified format

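3 Motivation
The same sentence, "John loves Mary", encoded in three different treebank formats (diagram: dependency tree with "loves" as the root and "John" and "Mary" as its dependents):
- tabular, one token per line (ID, word form, POS, head ID; head 0 = root):
  1 John n 2
  2 loves v 0
  3 Mary n 2
- XML, one element per word, e.g. <W DOM="2" ID="3">Mary</W>
- bracketed: (loves v ((John n) (Mary n)))
(labels on the slide: corpus, structure, annotation)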
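To make the point of the slide concrete, here is a minimal Python sketch (not part of the original work) that reads the tabular and the bracketed encodings of "John loves Mary" and reduces both to the same set of head-to-dependent arcs. The column order ID, FORM, POS, HEAD and the nesting convention of the bracketed format are assumptions read off the example above; arcs are keyed by word form, which only works when forms are unique within the sentence.

```python
def arcs_from_tabular(text):
    """Parse 'ID FORM POS HEAD' lines into (head_form, dependent_form) arcs."""
    rows = [line.split() for line in text.strip().splitlines()]
    forms = {row[0]: row[1] for row in rows}  # token ID -> word form
    forms["0"] = "ROOT"                       # head ID 0 marks the root
    return {(forms[row[3]], row[1]) for row in rows}


def parse_sexpr(tokens):
    """Recursive descent: turn '(', ')' and atom tokens into nested lists."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse_sexpr(tokens))
        tokens.pop(0)  # consume the closing ')'
        return items
    return token


def arcs_from_tree(node, head="ROOT"):
    """node is [form, pos] or [form, pos, [dependent, ...]]."""
    form = node[0]
    arcs = {(head, form)}
    for dependents in node[2:]:
        for child in dependents:
            arcs |= arcs_from_tree(child, form)
    return arcs


def arcs_from_bracketed(text):
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    return arcs_from_tree(parse_sexpr(tokens))


TABULAR = """
1 John n 2
2 loves v 0
3 Mary n 2
"""
BRACKETED = "(loves v ((John n) (Mary n)))"

if __name__ == "__main__":
    print(sorted(arcs_from_tabular(TABULAR)))
    # -> [('ROOT', 'loves'), ('loves', 'John'), ('loves', 'Mary')]
    print(arcs_from_tabular(TABULAR) == arcs_from_bracketed(BRACKETED))  # -> True
```

Both surface formats carry the same dependency core; what differs is only how that core is spelled out.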

4 Motivation: Demands on the unified format of treebanks
(+) generic: able to represent as many treebanks as possible
(+) extensible to new treebanks
(+) complete: preserving all corpus-specific information
(+) transferable to other kinds of corpora
(–) complex: the format should exhibit minimal complexity
-> graph representations

5 Motivation
- GXL, the Graph eXchange Language (Holt et al., 2006), is an XML-based model for representing data, including corpora, in terms of graphs
- GXL can be applied to all kinds of corpora (see e.g. Mehler and Gleim (2005), Ferrer i Cancho et al. (2007), Pustylnikov and Mehler (2008))
- eGXL: the variant of GXL used for treebanks
(diagram: XML -> GXL, applied to wikis, multimodal data, treebanks and tools)

6 Agenda
1. eGXL
2. Data
3. Complexity Evaluation
4. Application
5. Conclusion

7 eGXL: a 2-level data model
(diagram: a Sentences-graph whose nodes point to a Types-graph via IDREFs)

9 The eGXL Types-graph
- The Types-graph contains the treebank-specific attributes (e.g. POS, morphological attributes) as nodes
- Each instance of an attribute is given a unique identifier
(diagram: each Types-graph node pairs a unique identifier with the value of the attribute)
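A minimal sketch of the idea, assuming tokens are plain dictionaries: every distinct attribute value is stored once and gets its own identifier, and tokens then refer to it by that identifier. The function name `build_types_graph` and the `t1`, `t2`, ... naming scheme are hypothetical choices for illustration, not the eGXL tooling.

```python
from itertools import count


def build_types_graph(tokens, attributes=("pos",)):
    """Give each distinct (attribute, value) pair one unique identifier.

    The 't1', 't2', ... naming scheme is a hypothetical choice for this sketch.
    """
    next_id = count(1)
    types = {}
    for token in tokens:
        for attr in attributes:
            key = (attr, token[attr])
            if key not in types:  # first occurrence of this value
                types[key] = f"t{next(next_id)}"
    return types


tokens = [
    {"form": "John", "pos": "n"},
    {"form": "loves", "pos": "v"},
    {"form": "Mary", "pos": "n"},
]
types = build_types_graph(tokens)
# Tokens refer to Types-graph nodes by identifier (the IDREF link).
pos_refs = [types[("pos", t["pos"])] for t in tokens]
print(types)     # {('pos', 'n'): 't1', ('pos', 'v'): 't2'}
print(pos_refs)  # ['t1', 't2', 't1']
```

Storing each value once and pointing to it keeps the Sentences-graph small, however often a tag recurs.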

10 The eGXL Sentences-graph
(diagram: a Swedish example sentence with the tokens Detta, vill, jag, bestämt, bemöta; each token is a node with a word form and an IDREF to the POS-node of the Types-graph, and syntactic relations run from a head (e.g. a head verb) to a dependent (e.g. a dependent argument))

11 The eGXL Sentences-graph
- node: each token of a treebank
  - id: a unique identifier
  - form: the word form
  - pos: an IDREF to the POS-node of the Types-graph
- rel: a (syntactic) relation
  - relend: a relation anchor
    - in: from (e.g. a head verb)
    - out: to (e.g. a dependent argument)
(diagram: the Swedish example sentence with the tokens Detta, vill, jag, bestämt, bemöta)
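The element names above can be illustrated with Python's standard `xml.etree.ElementTree`. This is a simplified, hypothetical layout loosely modeled on GXL's `node`, `rel` and `relend` elements; it is not the actual eGXL schema, and storing `form` and `pos` as plain XML attributes is an assumption made to keep the sketch short.

```python
import xml.etree.ElementTree as ET


def sentence_graph(sentence_id, rows, pos_ids):
    """Build a simplified Sentences-graph element (hypothetical layout).

    rows:    (ID, FORM, POS, HEAD) tuples, HEAD "0" marking the root
    pos_ids: POS value -> identifier of its node in the Types-graph
    """
    graph = ET.Element("graph", id=sentence_id)
    for tok_id, form, pos, _head in rows:
        # one node per token: id, word form, IDREF-style pointer to the POS type
        ET.SubElement(graph, "node", id=f"w{tok_id}", form=form, pos=pos_ids[pos])
    dependents = (r for r in rows if r[3] != "0")  # the root has no incoming rel
    for n, (tok_id, _form, _pos, head) in enumerate(dependents, start=1):
        rel = ET.SubElement(graph, "rel", id=f"r{n}")
        ET.SubElement(rel, "relend", target=f"w{head}", direction="in")     # from: head
        ET.SubElement(rel, "relend", target=f"w{tok_id}", direction="out")  # to: dependent
    return graph


ROWS = [("1", "John", "n", "2"), ("2", "loves", "v", "0"), ("3", "Mary", "n", "2")]
POS_IDS = {"n": "t1", "v": "t2"}  # hypothetical Types-graph identifiers
g = sentence_graph("s1", ROWS, POS_IDS)
print(ET.tostring(g, encoding="unicode"))
```

Modeling each dependency as a `rel` with two `relend` anchors, rather than as a bare edge, leaves room for relations with more than two participants.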

12 eGXL

13 Agenda: 2. Data

14 11 dependency treebanks in 7 different formats

15 Input vs. output formats: examples from the Dutch, Swedish and Italian treebanks

16 Unification is possible due to the separation of the core (commonality) from the secondary parts (diversity)
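One way to picture this separation is a record type with fixed core fields plus an open slot for everything treebank-specific. In this minimal sketch the choice of core fields (`id`, `form`, `pos`, `head`) is an assumption based on the Sentences-graph elements, and the two input records and their extra attributes are hypothetical.

```python
from dataclasses import dataclass, field

# Assumption: the shared core is the token-level data named on the
# Sentences-graph slide (id, form, pos) plus the link to the head.
CORE_FIELDS = ("id", "form", "pos", "head")


@dataclass
class Token:
    id: str
    form: str
    pos: str
    head: str
    extra: dict = field(default_factory=dict)  # secondary, treebank-specific data


def unify(record):
    """Split one token record into its core fields and secondary attributes."""
    core = {name: record[name] for name in CORE_FIELDS}
    extra = {k: v for k, v in record.items() if k not in CORE_FIELDS}
    return Token(**core, extra=extra)


# Hypothetical records from two treebanks with different secondary attributes.
a = unify({"id": "1", "form": "John", "pos": "n", "head": "2", "lemma": "John"})
b = unify({"id": "1", "form": "John", "pos": "n", "head": "2", "case": "nom"})
print(a.extra, b.extra)  # {'lemma': 'John'} {'case': 'nom'}
```

Tools can then rely on the core alone, while nothing specific to a treebank is lost.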

17 The TreebankWiki: http://ariadne.coli.uni-bielefeld.de/wikis/treebankwiki/

18 Agenda: 3. Complexity Evaluation

19 Complexity of eGXL
Logical Scaling Factor (LSF): the number of logical elements (e.g. XML elements) required to represent a treebank unit (e.g. a word form, a POS tag)
(chart: LSF of eGXL vs. other formats, for node and rel)
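A sketch of how such a count could be taken: for every element of a given unit type, count the XML elements in its subtree and average over all units. This is one possible operationalisation of the LSF, not necessarily the formula behind the original evaluation; the function name `lsf` and the GXL-style fragment (typed values wrapped in `attr`/`string` elements) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def lsf(xml_text, unit_tag):
    """Average number of XML elements spent on each <unit_tag> element.

    One possible reading of the LSF definition (an assumption of this sketch).
    """
    root = ET.fromstring(xml_text)
    units = list(root.iter(unit_tag))
    if not units:
        raise ValueError(f"no <{unit_tag}> elements found")
    # u.iter() yields u itself plus all of its descendants
    return sum(len(list(u.iter())) for u in units) / len(units)


# Hypothetical GXL-style fragment: each typed <attr> value costs two elements.
DOC = """<graph id="s1">
  <node id="w1"><attr name="form"><string>John</string></attr><attr name="pos"><string>n</string></attr></node>
  <node id="w2"><attr name="form"><string>loves</string></attr><attr name="pos"><string>v</string></attr></node>
  <rel id="r1"><relend target="w2" direction="in"/><relend target="w1" direction="out"/></rel>
</graph>"""

print(lsf(DOC, "node"))  # 5.0
print(lsf(DOC, "rel"))   # 3.0
```

Comparing such counts per unit type (node, rel) across serialisations is what allows a format-independent statement about representational overhead.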

20 Agenda: 4. Application

21 DTDB

22 Agenda: 5. Conclusion

23 Conclusions
- a database covering 11 languages
- eGXL: a generic XML graph model adapted to syntactic treebanks
- use of the treebanks within a single application (Ariadne)
Thank you for your attention!
olga.pustylnikov@uni-bielefeld.de
alexander.mehler@uni-bielefeld.de
ruediger.gleim@uni-bielefeld.de

