1 ISOCAT Proposed solutions for Problems encountered in DUELME-LMF Jan Odijk Nijmegen 21 Sep 2010.


1 ISOCAT
Proposed solutions for Problems encountered in DUELME-LMF
Jan Odijk
Nijmegen, 21 Sep 2010

2 Overview
General
Standardized DCs?
Multiple relevant DCs in ISOCAT
Overlap with other projects
Container Data Categories
Almost Identical DCs
Language Sections
Existing Tagsets

3 General
Always try to map to an existing ISOCAT DC,
  – Where possible
  – Irrespective of whether the ISOCAT DC is part of an official standard
If not possible, or if there is uncertainty
  – Create a new DC, but
  – Also specify the relation with existing closely related ISOCAT DCs.
Provide:
  Type of the relation
    – dropdown list to be provided by RELCAT developers,
      » E.g. equals, almost-equals, is hyponym of, is hyperonym of, etc.
  Textual clarification of the deviation

4 General
Relation to be entered into Relation Registry (RR) as soon as it is available
Temporarily proposed notation:
  – recordset in CSV format with records consisting of 4 fields:
      Relation type (from drop-down list; should be ISOCAT DCs themselves)
      Data-category 1 (ISOCAT PID)
      Data-category 2 (ISOCAT PID)
      Clarification (rich text)
    Plus some administrative info: user id, creation date, etc.
  – To import into RR as soon as available
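The temporary CSV notation described on this slide could be sketched in Python roughly as follows. This is only an illustration under assumptions: the column names, the `RelationRecord` class, the helper functions, and the use of plain strings for relation types (the slide says these should eventually be ISOCAT DCs themselves) are hypothetical and not part of the proposal. The PID ending in DC-0000 is a placeholder.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

# Relation types mentioned on the slides. The definitive list is to come from
# the RELCAT developers; plain strings are an assumption made for this sketch.
RELATION_TYPES = {"equals", "almost-equals", "is hyponym of", "is hyperonym of"}


@dataclass
class RelationRecord:
    """One record of the temporary notation (column names are hypothetical)."""
    relation_type: str
    data_category_1: str     # ISOCAT PID
    data_category_2: str     # ISOCAT PID
    clarification: str       # free text explaining the deviation
    user_id: str = ""        # administrative info
    creation_date: str = ""  # administrative info


FIELD_NAMES = [f.name for f in fields(RelationRecord)]


def write_records(records, handle):
    """Write records as CSV with a header row, rejecting unknown relation types."""
    writer = csv.DictWriter(handle, fieldnames=FIELD_NAMES)
    writer.writeheader()
    for record in records:
        if record.relation_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {record.relation_type!r}")
        writer.writerow(asdict(record))


def read_records(handle):
    """Read back records from CSV produced by write_records."""
    return [RelationRecord(**row) for row in csv.DictReader(handle)]


# Illustrative values only: DC-2704 is cited on a later slide,
# DC-0000 is a placeholder PID, and the user id is made up.
example = RelationRecord(
    relation_type="almost-equals",
    data_category_1="http://www.isocat.org/datcat/DC-2704",
    data_category_2="http://www.isocat.org/datcat/DC-0000",
    clarification='Polish-specific definition, narrower than the general "noun"',
    user_id="jdoe",
    creation_date="2010-09-21",
)
buffer = io.StringIO(newline="")
write_records([example], buffer)
buffer.seek(0)
round_tripped = read_records(buffer)
```

Keeping the records in a flat file with a fixed header is one way to make the later import into the RR a mechanical step; the clarification field is quoted by the `csv` module, so commas and quotation marks in the text survive the round trip.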

5 Standardized DCs?
Ignore whether or not a DC in ISOCAT has standard status
If needed, use relations in Relation Registry

6 Multiple ISOCAT DCs
Map to an existing DC that is identical (wherever possible)
Use relations to relate it to almost identical DCs in ISOCAT
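The rule on this slide, combined with the general guideline to create a new DC when no identical one exists, can be sketched as a small decision function. This is a hedged illustration, not an ISOCAT or RR API: `plan_mapping`, its arguments, and the idea that a linguist has already judged each candidate as "equals" or "almost-equals" are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def plan_mapping(
    local_dc: str, candidates: Dict[str, str]
) -> Tuple[Optional[str], List[Tuple[str, str, str]]]:
    """Decide where a local data category maps and which relations to record.

    local_dc   -- hypothetical identifier of the project's own category
    candidates -- ISOCAT PID -> relation type as judged by a human,
                  e.g. "equals" or "almost-equals"

    Returns (target, relations). target is the PID of an identical DC, or
    None when no identical DC exists and a new DC has to be created.
    relations lists (relation type, source, related PID) triples, where the
    source is the chosen target or, failing that, the local category.
    """
    identical = [pid for pid, rel in candidates.items() if rel == "equals"]
    target = identical[0] if identical else None
    source = target if target is not None else local_dc
    relations = [
        (rel, source, pid) for pid, rel in candidates.items() if pid != target
    ]
    return target, relations
```

For instance, calling `plan_mapping("local:noun", {"PID-A": "almost-equals", "PID-B": "equals"})` with these made-up identifiers selects `PID-B` as the mapping target and records one almost-equals relation from `PID-B` to `PID-A`.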

7 Overlap with other projects
Consult with other projects
Registry of topics people/projects are working on
  – Dieter took some initiative
  – http://spreadsheets.google.com/ccc?key=0Al5Lw-npZ6ZTdDZlT2VjeGhwZm5iRW5IM3BTZFI5WEE&hl=en&authkey=CL_Wl4ID
This workshop (and others if needed)

8 Container data categories
ISOCAT might be extended for this
Probably not really a problem in the short term(?)

9 Almost identical DCs
For ill-defined DCs in ISOCAT
  – Suggest better definitions and submit them to the Thematic Domain Group
  – Use relations to relate your DC to existing slightly different DCs (see later)

10 Almost identical DCs
Example: Noun
Noun is a Part of Speech assigned to words which share specific morphosyntactic (inflectional), morphological, syntactic (and semantic) properties
  – morphosyntactic (inflectional) properties: person, number, gender/class, declension class, case, …
  – specific morphological combinatorial potential (derivation, compounding), in particular diminutives, augmentatives
  – specific syntactic combinatorial potential
Where each language selects a specific subset of these properties (as illustrated in the language sections).

11 Language Sections?
The highly language-specific (Polish) DC
  – http://www.isocat.org/datcat/DC-2704 (noun)
    Noun [subst] contains lexemes inflecting for number and case, with a lexically determined grammatical gender, which do not have the category of person, e.g., woda 'water', profesor 'professor', pięciokrotność 'fivefoldness'; this class also contains defective plurale tantum and singulare tantum lexemes, but not depreciative lexemes. Grammatical categories of noun [subst]: number (http://www.isocat.org/datcat/DC-2709), case (http://www.isocat.org/datcat/DC-2720), gender (http://www.isocat.org/datcat/DC-2728).
Can now be part of the Polish language section of the DC Noun with the definition given in the previous slide

12 Existing Tagsets
Make sure all DCs of an existing de facto standard tag set are in ISOCAT
  – Either existing DCs
  – Or newly added DCs
Assign all DCs from such a tag set to a new closed complex category
  – E.g. DC d-coiTagset, ipipanTagset, etc.
  – (and/or to a data category set?)
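The two steps on this slide (check that every tag has a DC, then group those DCs in a closed complex category) might be sketched as below. Everything here is an assumption for illustration: the `ClosedComplexCategory` class is a simplified stand-in rather than the ISOCAT data model, and the tag names and PIDs used with it would have to come from the real tag set.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ClosedComplexCategory:
    """Hypothetical stand-in for a closed complex DC: a name plus a fixed value domain."""
    name: str
    value_domain: Tuple[str, ...]  # PIDs of the DCs that make up the tag set


def missing_tags(tags: Iterable[str], tag_to_pid: Dict[str, str]) -> List[str]:
    """Return the tags without a DC so far, i.e. the DCs that still need adding."""
    return sorted(tag for tag in set(tags) if tag not in tag_to_pid)


def build_tagset_category(
    name: str, tags: Iterable[str], tag_to_pid: Dict[str, str]
) -> ClosedComplexCategory:
    """Group the DCs of a tag set, refusing to proceed while any tag lacks a DC."""
    tag_list = list(tags)
    missing = missing_tags(tag_list, tag_to_pid)
    if missing:
        raise ValueError(f"tags without a DC in the registry: {missing}")
    return ClosedComplexCategory(name, tuple(tag_to_pid[tag] for tag in tag_list))
```

Raising an error on missing tags mirrors the first bullet: the category for a tag set is only assembled once every tag is covered, either by an existing DC or by a newly added one.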

13 More…
Problems and Proposed solutions
  – Odijk, J. (2009), "Data Categories and ISOCAT: some remarks from a simple linguist", presentation held at FLaReNet/CLARIN Standards Workshop, Helsinki, September 27, 2009
  – Odijk, J. (2010), "Relations between Data Categories", presentation held at the CLARIN Relation Registry Workshop, MPI, Nijmegen, January 8, 2010
Both to be found (inter alia) on http://www.clarin.nl/node/80

14 CLARIN-NL
Thanks for your attention!
http://www.clarin.nl/

