Published by Horatio Webster. Modified over 9 years ago.
Zdeněk Žabokrtský: Automatic Functor Assignment in the PDT

Automatic Functor Assignment (AFA) in the Prague Dependency Treebank

PDT:
–a long-term research project
–at the Institute of Formal and Applied Linguistics
–aimed at a complex annotation of a part of the Czech National Corpus
–annotation scheme: 3 levels

Functors:
–actants: ACT, PAT, ADDR, EFF, ORIG
–free modifiers: TWHEN, LOC, DIR1, BEN, APP, CPR...

AFA's position within the PDT:
Raw text → Morphologically tagged text → Analytic tree structures (ATS) → Tectogrammatical tree structures (TGTS)

Problem analysis, Data preprocessing

Motivation
–to reduce the huge amount of human work involved in the development of the PDT

Problem statement
–to assign a functor to every node in a TGTS

Initial situation
–no AFA system with reasonable coverage existed
–human annotators rely mostly on their knowledge of the language, not on “formal” rules
–annotators take the whole-sentence context into account
–a certain amount of manually annotated TGTSs is available

What is the minimal amount of information that is sufficient to decide on the functor?

Problem reformulation
–AFA to classify symbolic vectors into 53 classes

Available material: 18 files (up to 50 sentences in each)
–imperfect: incomplete, ambiguous
–divided into two parts:
  testing set: 15 files (6049 vectors)
  training set: 3 files (1089 vectors)

feature selection + feature extraction → vectors with 12 symbolic attributes

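The feature extraction step can be pictured roughly as follows. This is only a sketch: the slides do not list the 12 attributes, so the `TreeNode` class, the attribute names (`lemma`, `pos`, `afun`, `gov_lemma`, `gov_pos`) and the idea of drawing features from a node and its governing node are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TreeNode:
    """Hypothetical tree node; the real PDT node format is richer."""
    lemma: str
    pos: str                      # coarse part of speech (assumed attribute)
    afun: str                     # analytical function label (assumed attribute)
    parent: Optional["TreeNode"] = None


def extract_vector(node: TreeNode) -> Dict[str, str]:
    """Flatten a node and its governor into one symbolic vector.

    Every value is a symbol (string), never a number, so the result can be
    fed to a classifier that handles symbolic attributes.
    """
    gov = node.parent
    return {
        "lemma": node.lemma,
        "pos": node.pos,
        "afun": node.afun,
        # The root has no governor; use an explicit placeholder symbol.
        "gov_lemma": gov.lemma if gov else "NONE",
        "gov_pos": gov.pos if gov else "NONE",
    }


# Toy example: a subject noun hanging on a verb.
verb = TreeNode(lemma="spát", pos="verb", afun="Pred")
subject = TreeNode(lemma="pes", pos="noun", afun="Sb", parent=verb)
vector = extract_vector(subject)
```

Each such vector would then be paired with its manually assigned functor to form one training or testing example.
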
Components of the proposed AFA system

Symbiosis of 4 different approaches:
–7 Rule-based Methods (RBMs)
–3 Dictionary-based Methods (DBMs)
–Nearest vector (similarity)
–Machine learning (Quinlan's C4.5, Sašo Džeroski)

Implementation:
–a set of small programs for preprocessing and format conversions, dictionary mining, functor assigning, and performance evaluation
–Linux filters, Perl, SQL
–assigners are applied in a strictly pipelined fashion

[Figure: Data Flow Diagram]

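A minimal sketch of how such a pipeline of assigners might be wired together, written in Python rather than the Perl and SQL named on the slide. The rule, the dictionary entries, the attribute names and the overlap-count similarity are hypothetical stand-ins, not the author's actual methods. It also assumes that a later assigner only touches nodes that earlier assigners left without a functor.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, str]                      # one symbolic vector per node
Assigner = Callable[[Vector], Optional[str]]  # returns a functor or None


def rule_active_subject(v: Vector) -> Optional[str]:
    """Hypothetical rule-based method: subject of an active verb -> ACT."""
    if v.get("afun") == "Sb" and v.get("gov_voice") == "active":
        return "ACT"
    return None


# Hypothetical toy dictionary; a real one would be mined from annotated data.
TOY_ADVERB_DICT = {"včera": "TWHEN", "tady": "LOC"}


def dictionary_adverbs(v: Vector) -> Optional[str]:
    """Hypothetical dictionary-based method: look up adverb lemmas."""
    if v.get("pos") == "adv":
        return TOY_ADVERB_DICT.get(v.get("lemma", ""))
    return None


def make_nearest_vector(training: List[Tuple[Vector, str]]) -> Assigner:
    """Nearest-vector method with an assumed similarity measure:
    the number of attributes on which two vectors agree."""
    def assign(v: Vector) -> Optional[str]:
        if not training:
            return None

        def similarity(t: Vector) -> int:
            return sum(1 for key, value in v.items() if t.get(key) == value)

        _, functor = max(training, key=lambda pair: similarity(pair[0]))
        return functor
    return assign


def run_pipeline(vectors: List[Vector],
                 assigners: List[Assigner]) -> List[Optional[str]]:
    """Apply assigners stage by stage; each stage fills only open slots."""
    functors: List[Optional[str]] = [None] * len(vectors)
    for assign in assigners:
        for i, v in enumerate(vectors):
            if functors[i] is None:
                functors[i] = assign(v)
    return functors
```

Putting the most reliable assigners first and a catch-all such as the nearest-vector method last is one plausible ordering; the slides report that several orderings were tested.
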
Performance evaluation

–detailed evaluation of several quantities for each assigner in a sequence
–several sequences of assigners were tested, e.g., a sequence of RBMs

[Figure: results for a sequence of RBMs]
[Figure: Comparison of different sequences of assigners]

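The per-assigner evaluation could look something like the sketch below. The slide text does not name the quantities that were measured, so cover (the share of all nodes an assigner labels) and precision (the share of its labels that match the manual annotation) are assumed here. The function name `evaluate_sequence` and the report layout are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, str]
Assigner = Callable[[Vector], Optional[str]]


def evaluate_sequence(vectors: List[Vector],
                      gold: List[str],
                      assigners: List[Tuple[str, Assigner]]) -> List[dict]:
    """Run named assigners in order and report figures for each stage.

    A stage is credited only with nodes that were still unassigned when it
    ran, so the order of the sequence affects every stage's numbers.
    """
    total = len(vectors)
    assigned: List[Optional[str]] = [None] * total
    report = []
    for name, assign in assigners:
        covered = 0
        hits = 0
        for i, v in enumerate(vectors):
            if assigned[i] is not None:
                continue              # already handled by an earlier stage
            functor = assign(v)
            if functor is None:
                continue              # this stage abstains on this node
            assigned[i] = functor
            covered += 1
            if functor == gold[i]:
                hits += 1
        report.append({
            "assigner": name,
            "covered": covered,
            "hits": hits,
            "cover": covered / total if total else 0.0,
            "precision": hits / covered if covered else None,
        })
    return report
```

Comparing the reports produced by different orderings of the same assigners is one way to reproduce the kind of sequence comparison the slide refers to.
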
Further work

–Machine learning: searching for new regularities
–Improvement of dictionaries
–Tectogrammatical annotation of verb valency frames
–Categorial grammars

Talks & Publications

–ZŽ: AFA in the PDT, TSD 2000
–ZŽ: Introduction to the PDT, Faculty of Arts, Ljubljana, 2000
–ZŽ: AFA in the PDT, seminar at the IFAL, 2000
–S. Džeroski, ZŽ: ML approach to AFA in the PDT, 5th TELRI seminar, 2000
–S. Džeroski, ZŽ: ML approach to AFA in the PDT, ACL, 2001
–Straňáková, Skoumalová, Panevová, ZŽ: Tectogram. annotation of verb. val. frames, TSD 2001
–ZŽ: Fuzzy Controller as a Tool for Traffic Simulation, Mendel 1999
–ZŽ: Constrained Fuzzy Arithmetic: Engineer's View, CMP Research Rep.
–M. Navara, ZŽ: Comp. Problems of CFA, ISCI 2000
–ZŽ: Comp. Problems of CFA, CMP seminar
–M. Navara, ZŽ: How to make CFA efficient, Soft Computing 2001
–M. de Cock, ZŽ: Representing Ling. Hedges by L-Fuzzy Modifiers, CIMCA 2001

[Diagram labels: language, fuzzy sets, ? ? ?]