ISO a tutorial
Part 2: Representing data categories
TMF - Terminological Markup Framework
Laurent Romary - Laboratoire Loria
Why formalize DatCats?
• Systematizing data category description:
  – Notion of a Data Category Registry (DCR): "I need a data category: is it there?"
  – Query by name, definition, etc.
• Automating processes:
  – Format control of TMLs
  – Filters from a TML to GMT
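To make the "is it there?" question concrete, here is a minimal sketch of a registry lookup by name or by definition text. The `DataCategory` class, the `REGISTRY` list and the `query` function are hypothetical names chosen for illustration; only the two identifiers, names and definitions come from the /term/ and /term type/ slides of this tutorial. It is not the interface of any actual DCR tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategory:
    """Hypothetical in-memory record for one registry entry."""
    identifier: str
    name: str
    definition: str


# Two entries taken from the /term/ and /term type/ slides.
REGISTRY = [
    DataCategory(
        "ISO12620A01",
        "term",
        "A verbal designation of a general concept in a specific subject field",
    ),
    DataCategory("ISO12620A0201", "term type", "An attribute assigned to a term"),
]


def query(registry, name=None, definition_contains=None):
    """Return the entries matching every criterion that is given.

    name: exact, case-insensitive match on the data category name.
    definition_contains: case-insensitive substring of the definition.
    """
    results = []
    for dc in registry:
        if name is not None and dc.name.lower() != name.lower():
            continue
        if (definition_contains is not None
                and definition_contains.lower() not in dc.definition.lower()):
            continue
        results.append(dc)
    return results
```

A call such as `query(REGISTRY, name="term type")` answers the question for one name; an empty list means the data category is not registered yet.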
Which model for DatCats?
• Using XML:
  – Coherence with TMF principles
  – Using stylesheets to generate schemas and filters
• Using RDF (Resource Description Framework):
  – Intended format for representing metadata: the description of a DatCat is metadata with regard to TMF
RDF - a quick presentation Cf. other file
Data Categories A Formal Description
Data Category Registry
[Diagram: RDF graph. Labels: DCRegistry, Description, Data Category, VersionNumber. RDF terms: dcsd:DataCategory, dcsd:VersionNumber, rdf:about.]
Data Category description
[Diagram: RDF graph. Labels: Data Category, DCIdentifier, DCName, DCDefinition, DCType (S, C), Content, Locus, DCParent, DCExample, DCComment, DCAdmin. RDF terms: dcsd:DCIdentifier, dcsd:DCName, dcsd:DCDefinition, dcsd:DCType, dcsd:Content, dcsd:Level, dcsd:DCParent, dcsd:DCExample, dcsd:DCComment, dcsd:DCAdmin. Slide note: Salt /SEW.]
Simple and complex DatCats
• Complex data categories
  – shall serve as field identifiers (not names) in databases and can have content. The datatype for this content shall be declared for each data category. It commonly takes the form of a category of text, a defined data type (such as a date), or a specified data domain, e.g., a picklist of standardized permissible instances.
    » Example: /Part of Speech/
• Simple data categories
  – shall serve as the content of complex data categories.
    » Example: /Noun/, /Verb/, /Adjective/, etc.
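The relation between the two kinds of data category can be sketched as a complex category holding a picklist of simple ones. The class names `SimpleDatCat` and `ComplexDatCat`, the `picklist` field and the `accepts` method are assumptions made for this illustration; the category names are the ones used in the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleDatCat:
    """A simple data category: a permissible value, e.g. /Noun/."""
    name: str


@dataclass(frozen=True)
class ComplexDatCat:
    """A complex data category: a field identifier that can have content.

    An empty picklist is used here (by assumption) to mean that the
    content is not restricted to a closed set of simple data categories.
    """
    name: str
    picklist: frozenset = frozenset()

    def accepts(self, value):
        # Open content: anything goes; closed picklist: membership test.
        return not self.picklist or value in self.picklist


NOUN = SimpleDatCat("Noun")
VERB = SimpleDatCat("Verb")
ADJECTIVE = SimpleDatCat("Adjective")

PART_OF_SPEECH = ComplexDatCat(
    "Part of Speech", frozenset({NOUN, VERB, ADJECTIVE})
)
```

With these definitions, `PART_OF_SPEECH.accepts(NOUN)` holds, while a simple category outside the picklist is rejected.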
Levels and content
[Diagram: RDF graph. Labels: Content, DataType, TargetType, Level/Loci, List of References, Ref to other datcat(s). RDF terms: dcsd:DataType, dcsd:TargetType, rdf:Alt, rdf:li.]
Administrative properties
[Diagram: RDF graph. Labels: Data Category, DCAdmin, Status, StatusDate, StatusNote, EditionDate, Source, VariantNames, ShortForm, AdmittedName, ForbiddenName. RDF terms: dcsd:DCAdmin, dcsd:Status, dcsd:StatusDate, dcsd:StatusNote, dcsd:EditionDate, dcsd:Source, dcsd:VariantNames, dcsd:ShortForm, dcsd:AdmittedName, dcsd:ForbiddenName.]
RDF Representation
/term/ - RDF description (1)
<dcsd:DataCategory dcsd:DCIdentifier="ISO12620A01" dcsd:DCName="term" dcsd:position="A.01" dcsd:DCType="C">
A verbal designation of a general concept in a specific subject field
For definition of related term, see ISO , Terms can consist of single words or be composed of multiword strings…
"radix" in annex C, figure C.1.
A.1
/term/ - RDF description (2)
TL TC
<dcsd:DCAdmin dcsd:OrgSource="ISO TC 37" dcsd:DocSource="ISO12620:1999" dcsd:subDate=" SEW" dcsd:registryComment="Prepared " dcsd:Status="Accepted"/>
/term type/ - RDF description (1)
<dcsd:DataCategory dcsd:DCIdentifier="ISO12620A0201" dcsd:DCName="term type" dcsd:position="A.02.01" dcsd:DCType="C">
An attribute assigned to a term
A.2.1
ISO12620A ISO12620A ISO12620A020119
/term type/ - RDF description (2)
TL TC
<dcsd:DCAdmin dcsd:OrgSource="ISO TC 37" dcsd:DocSource="ISO12620:1999" dcsd:subDate=" SEW" dcsd:registryComment="Prepared " dcsd:Status="Accepted"/>
Actualizing a DatCat: TMF-specific properties
Styling properties
[Diagram: RDF graph. Labels: Data Category, Style, StyleName, ElementName, AttributeName, TypeValue, Value (for 'Simple'), AnchorInfo, AnchorLevel. StyleName values: Simple, Element, Attribute, TypedElement, ValuedElement, TVElement. RDF terms: dcsd:Style, dcsd:StyleName, dcsd:ElementName, dcsd:AttributeName, dcsd:TypeValue, dcsd:Value, dcsd:Anchor.]
Attribute style description
dcsd:StyleName="Attribute"
  – Conditions of use: not valid for annotations
  – Required properties: dcsd:AttributeName
  – Example: dcsd:AttributeName="id" …
Element style description
dcsd:StyleName="Element"
  – Required properties: dcsd:ElementName
  – Example: dcsd:ElementName="definition" …
TypedElement style description
dcsd:StyleName="TypedElement"
  – Required properties: dcsd:ElementName, dcsd:TypeValue
  – Example: dcsd:ElementName="termNote" dcsd:TypeValue="partOfSpeech"
    N
ValuedElement style description
dcsd:StyleName="ValuedElement"
  – Conditions of use: not valid for annotations
  – Required properties: dcsd:ElementName
  – Example: dcsd:ElementName="pos"
TVElement style description
dcsd:StyleName="TVElement"
  – Conditions of use: not valid for annotations
  – Required properties: dcsd:ElementName, dcsd:TypeValue
  – Example: dcsd:ElementName="free" dcsd:TypeValue="pos"
Simple style description
dcsd:StyleName="Simple"
  – Conditions of use: expresses the value of simple data categories
  – Required properties: dcsd:Value
  – Example: dcsd:Value="Nom"
    Nom
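The style descriptions above can be read as instructions for turning a data category value into XML. The following sketch shows one possible reading. The function name `actualize`, the dictionary layout of a style, and above all the attribute names `type` and `value` used in the generated XML are assumptions, not something these slides specify. The element names (`termNote`, `pos`, `free`, `definition`) and the attribute name `id` are the ones given in the examples. The "Simple" style, which only supplies a value string through dcsd:Value, is left out.

```python
import xml.etree.ElementTree as ET


def actualize(style, value, host=None):
    """Render `value` according to a style description (a plain dict).

    Assumed mapping, for illustration only:
      Attribute     -> host element gets AttributeName="value"
      Element       -> <ElementName>value</ElementName>
      TypedElement  -> <ElementName type="TypeValue">value</ElementName>
      ValuedElement -> <ElementName value="value"/>
      TVElement     -> <ElementName type="TypeValue" value="value"/>
    """
    name = style["StyleName"]
    if name == "Attribute":
        if host is None:
            raise ValueError("the Attribute style needs a host element")
        host.set(style["AttributeName"], value)
        return host
    if name == "Element":
        element = ET.Element(style["ElementName"])
        element.text = value
        return element
    if name == "TypedElement":
        element = ET.Element(style["ElementName"], {"type": style["TypeValue"]})
        element.text = value
        return element
    if name == "ValuedElement":
        return ET.Element(style["ElementName"], {"value": value})
    if name == "TVElement":
        return ET.Element(
            style["ElementName"],
            {"type": style["TypeValue"], "value": value},
        )
    raise ValueError(f"unsupported style: {name}")


part_of_speech = actualize(
    {"StyleName": "TypedElement",
     "ElementName": "termNote",
     "TypeValue": "partOfSpeech"},
    "N",
)
print(ET.tostring(part_of_speech, encoding="unicode"))
```

Keeping the style as data rather than code matches the intent of the slides: the same data category can be actualized differently in different TMLs by changing only its style description.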
Dealing with languages
Two types of languages
• Working language
  – The language used at a given place in a document, along the XML hierarchy
  – Representation: xml:lang
• Object language
  – The language about which you speak at a given place in your terminological entry (e.g. the language described at the Language Section level)
  – Representation: as a data category "language", with a narrow scope
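The difference between the two can be sketched with a small XML fragment: the working language is found by looking for the nearest xml:lang on an element or its ancestors, while the object language is read from a "language" data category. The element names in `SAMPLE` (`entry`, `note`, `langSection`, `language`, `definition`) and the function `working_language` are hypothetical and chosen for this illustration; they are not taken from DXLT or GMT.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """
<entry xml:lang="en">
  <note>internal remark</note>
  <langSection>
    <language>fr</language>
    <definition xml:lang="fr">Une valeur entre 0 et 1 utilisée...</definition>
  </langSection>
</entry>
"""


def working_language(root, target):
    """Return the nearest xml:lang on `target` or one of its ancestors."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parents.get(node)
    return None


root = ET.fromstring(SAMPLE)
note = root.find("note")
definition = root.find("langSection/definition")

# Object language: an ordinary data category scoped to its language section.
object_language = root.find("langSection/language").text
```

Here `note` inherits the working language of the entry, `definition` overrides it locally, and `object_language` is independent of both.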
Example: DXLT
Une valeur entre 0 et 1 utilisée...
alpha smoothing factor
fullForm
Example: GMT
en
Une valeur entre 0 et 1 utilisée...
alpha smoothing factor
fullForm
Conclusion
  – A general model for analysing and representing terminological data collections
  – An underlying formalism expressed in XML and RDF
  – Associated tools (Salt project): DCSEditor, DCSBrowser, and automatic generation of XSLT filters and XML schemas from a given TML specification
Useful pointers
• SALT project
• The TMF site