Marc Kemps-Snijders Menzo Windhouwer Sue Ellen Wright


1 ISOcat Data Category Registry
Defining widely accepted linguistic concepts
Marc Kemps-Snijders, Menzo Windhouwer, Sue Ellen Wright
CLARIN-NL Info dag, 1 July 2009

2 ISOcat: a reference implementation
Standard: "Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources"
ISO 12620:1999 was a fixed list of data categories; this revision provides a data model and management procedures.
ISO Technical Committee 37: Terminology and other language and content resources

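3 ISO 24613:2008 Lexical Markup Framework
[Figure: class diagram. Classes shown: Lexicon, Lexical Entry, Form, Lemma, Word Form, Sense. A Lexicon contains 1..* Lexical Entries; further multiplicities shown: 1..*, 0..*, 0..*. Attributes shown: partOfSpeech, writtenForm, grammaticalNumber, lexicalType.]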

4 Data categories
Data category: “result of the specification of a given data field” (ISO 12620:2009)
Data element concept (ISO 11179): “concept for which the definition, identification and conceptual domain are specified independently of any particular representation”
Complex data categories are data element concepts.

5 Data category types
complex:
  open: writtenForm (string)
  closed: grammaticalGender (string), with values neuter, masculine, feminine
  constrained: (string), Constraint:
simple:
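The three kinds of complex data category differ in which values they accept. A minimal Python sketch of that distinction follows. The class name, its fields, and the `exampleYear` category with its regular-expression constraint are hypothetical illustrations, not part of ISOcat; only `writtenForm`, `grammaticalGender` and its three values come from the slide.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComplexDataCategory:
    """Hypothetical model of a complex data category (not the ISOcat API)."""
    identifier: str
    data_type: str = "string"
    kind: str = "open"                 # "open", "closed" or "constrained"
    values: frozenset = frozenset()    # allowed simple values (closed only)
    pattern: Optional[str] = None      # constraint expression (constrained only)

    def accepts(self, value: str) -> bool:
        if self.kind == "open":
            # Open: any value of the data type is allowed.
            return True
        if self.kind == "closed":
            # Closed: the value must be one of the enumerated simple values.
            return value in self.values
        if self.kind == "constrained":
            # Constrained: the value must satisfy the constraint (a regex here).
            return re.fullmatch(self.pattern, value) is not None
        raise ValueError(f"unknown kind: {self.kind}")


written_form = ComplexDataCategory("writtenForm")
gender = ComplexDataCategory(
    "grammaticalGender",
    kind="closed",
    values=frozenset({"neuter", "masculine", "feminine"}),
)
# Hypothetical constrained category: a four-digit year.
year = ComplexDataCategory("exampleYear", kind="constrained", pattern=r"\d{4}")
```

Here `gender.accepts("neuter")` holds while `gender.accepts("common")` does not, and `written_form.accepts(...)` holds for any string.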

6 Data category specification
Sections shown in the diagram:
  Administration Information Section
  Description Section
  Data Element Name
  Language Section
  Name Section
  Conceptual Domain
  Linguistic Section
Mandatory:
  A mnemonic identifier
  An English definition
  An English name
  A conceptual domain
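The four mandatory parts can be checked mechanically before a specification is submitted. The sketch below assumes a specification is held as a plain dictionary; the key names and the placeholder definition text are hypothetical and do not reflect the actual DCIF element names or any registered definition.

```python
# Hypothetical key names for the four mandatory parts listed on the slide.
MANDATORY_FIELDS = (
    "identifier",          # a mnemonic identifier
    "english_definition",  # an English definition
    "english_name",        # an English name
    "conceptual_domain",   # a conceptual domain
)


def missing_mandatory(spec: dict) -> list:
    """Return the mandatory fields that are absent or empty in spec."""
    return [field for field in MANDATORY_FIELDS if not spec.get(field)]


complete_spec = {
    "identifier": "grammaticalGender",
    "english_name": "grammatical gender",
    "english_definition": "placeholder definition text",  # illustrative only
    "conceptual_domain": ["neuter", "masculine", "feminine"],
}

draft_spec = {
    "identifier": "writtenForm",
    "english_name": "written form",
}
```

For the draft above, `missing_mandatory(draft_spec)` reports the definition and the conceptual domain as still missing, while the complete specification yields an empty list.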

7 Data Category Selections
Anyone
  can register with ISOcat
  can create data categories
  can create data category selections (DCSs)
  can share DCSs
  can make DCSs public
  can submit DCSs for standardization

8 ISO standardization process
[Figure: workflow diagram. Labels in order of appearance: Submission group; Thematic Domain Group; Evaluation; Data Category Registry Board; Validation; Stewardship group; ISO; Publication.]

9 Using data categories
Each data category has a Persistent Identifier (PID).
This PID can be embedded in the schemata of linguistic resources:
  <rng:element name="gender" dcr:datcat="…/DC-1297">
The full data category specification can be downloaded from ISOcat in the Data Category Interchange Format (DCIF).
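Once PIDs are embedded in a schema, a tool can recover which schema elements point to which data categories. The following Python sketch collects the `dcr:datcat` attributes from a small Relax NG grammar. The `dcr` namespace URI and the `http://example.org/...` PID are assumptions made for illustration (the slide elides the PID prefix); the Relax NG namespace is the standard one.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
DCR_NS = "http://www.isocat.org/ns/dcr"  # assumed namespace URI for dcr:

# Illustrative schema; the PID value is a hypothetical placeholder.
SCHEMA = """<rng:grammar xmlns:rng="http://relaxng.org/ns/structure/1.0"
             xmlns:dcr="http://www.isocat.org/ns/dcr">
  <rng:start>
    <rng:element name="entry">
      <rng:element name="gender" dcr:datcat="http://example.org/datcat/DC-1297">
        <rng:text/>
      </rng:element>
      <rng:element name="note">
        <rng:text/>
      </rng:element>
    </rng:element>
  </rng:start>
</rng:grammar>"""


def datcat_links(schema_xml: str) -> dict:
    """Map each rng:element name to the data category PID it references."""
    root = ET.fromstring(schema_xml)
    links = {}
    for element in root.iter(f"{{{RNG_NS}}}element"):
        pid = element.get(f"{{{DCR_NS}}}datcat")
        if pid is not None:  # elements without a dcr:datcat link are skipped
            links[element.get("name")] = pid
    return links
```

Calling `datcat_links(SCHEMA)` returns a mapping with a single entry for `gender`; the `entry` and `note` elements carry no data category reference and are left out.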

10 ISOcat demonstration
http://www.isocat.org/

11 Status of ISOcat
ISOcat is under active development.
Now:
  You can access public data categories and selections
  You can create your own data categories and selections
Future:
  Group features
  Cleanup by TDGs (Thematic Domain Groups)
  Standardization workflow

12 Relation Registry
ISOcat contains a flat list of concepts.
The Relation Registry will support storing (user-specific) relations between these concepts:
  is-a
  part-of
  equivalent-to
  related-to
Will support:
  Ontologies and taxonomies on top of data categories
  Searches across related data categories
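To make the idea of user-specific relations and searches across related categories concrete, here is a small Python sketch. It is not the Relation Registry's design: the class, its methods, the user names and the category names `noun`, `substantive`, `properNoun` and `nounPhrase` are all hypothetical. The search follows `equivalent-to` in both directions and `is-a` from broader to narrower, which is one possible expansion policy among many.

```python
from collections import deque

RELATION_TYPES = {"is-a", "part-of", "equivalent-to", "related-to"}


class RelationRegistry:
    """Hypothetical in-memory store of per-user relations between categories."""

    def __init__(self):
        self._relations = {}  # user -> set of (subject, relation, object)

    def add(self, user, subject, relation, obj):
        if relation not in RELATION_TYPES:
            raise ValueError(f"unsupported relation type: {relation}")
        self._relations.setdefault(user, set()).add((subject, relation, obj))

    def expand(self, user, start):
        """Return start plus its equivalents and narrower categories."""
        triples = self._relations.get(user, set())
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for subject, relation, obj in triples:
                if relation == "equivalent-to" and current in (subject, obj):
                    neighbours = {subject, obj}   # symmetric relation
                elif relation == "is-a" and obj == current:
                    neighbours = {subject}        # narrower category
                else:
                    continue
                for category in neighbours - seen:
                    seen.add(category)
                    queue.append(category)
        return seen


registry = RelationRegistry()
registry.add("alice", "properNoun", "is-a", "noun")
registry.add("alice", "noun", "equivalent-to", "substantive")
registry.add("alice", "noun", "related-to", "nounPhrase")
```

With these relations, a search by user `alice` for `substantive` also covers `noun` and `properNoun`. Another user who has stored no relations gets only `substantive` back, which reflects the user-specific nature of the relations.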

13 Thanks for your attention!

