ISOcat tutorial DCR data model and guidelines. Simple and complex DCs Simple Data CategoryComplex Data CategoryConceptual Domain Data CategoryDescription.

Slides:



Advertisements
Similar presentations
DC2001, Tokyo DCMI Registry : Background and demonstration DC2001 Tokyo October 2001 Rachel Heery, UKOLN, University of Bath Harry Wagner, OCLC
Advertisements

ISOcat Data Model: Workflow & Guidelines Marc Kemps-Snijders a, Sue Ellen Wright b, Menzo Windhouwer a a Max Planck Institute for Psycholinguistics, b.
ISOcat Data Category Registry Defining widely accepted linguistic concepts Menzo Windhouwer 1CLARIN-NL MD tutorial, September 2009.
Bulk loading ISOcat data categories with the Data Category Interchange Format 10/24/20111CLARIN-NL ISOcat Call 2 followup.
RDF Schemata (with apologies to the W3C, the plural is not ‘schemas’) CSCI 7818 – Web Technologies 14 November 2001 Van Lepthien.
ISO DSDL ISO – Document Schema Definition Languages (DSDL) Martin Bryan Convenor, JTC1/SC18 WG1.
Principles of ISOcat, a Data Category Registry Marc Kemps-Snijders a, Menzo Windhouwer a, Sue Ellen Wright b a Max Planck Institute for.
ISOcat introduction 19 June 20121CLARIN-NL ISOcat workshop.
Describing Process Specifications and Structured Decisions Systems Analysis and Design, 7e Kendall & Kendall 9 © 2008 Pearson Prentice Hall.
Data Category specifications 19 June 20121CLARIN-NL 2012 ISOcat tutorial.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 6 Advanced Data Modeling.
Database Systems: Design, Implementation, and Management Tenth Edition
BIS Database Systems School of Management, Business Information Systems, Assumption University A.Thanop Somprasong Chapter # 6 Advanced Data Modeling.
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Metadata Component Framework Possible Standardization Work.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
Introduction to XLink Transparency No. 1 XML Information Set W3C Recommendation 24 October 2001 (1stEdition) 4 February 2004 (2ndEdition) Cheng-Chia Chen.
Geospatial standards Beyond FGDC Geog 458: Map Sources and Errors March 3, 2006.
Creating Web Page Forms
Procedures to Develop and Register Data Elements in Support of Data Standardization September 2000.
XP New Perspectives on XML Tutorial 4 1 XML Schema Tutorial – Carey ISBN Working with Namespaces and Schemas.
Data Category specifications 20 March 20121CLARIN-NL ISOcat workshop.
XP New Perspectives on XML Tutorial 3 1 DTD Tutorial – Carey ISBN
CSC271 Database Systems Lecture # 6. Summary: Previous Lecture  Relational model terminology  Mathematical relations  Database relations  Properties.
Lecture 2 The Relational Model. Objectives Terminology of relational model. How tables are used to represent data. Connection between mathematical relations.
Chapter 4 The Relational Model Pearson Education © 2014.
Chapter 4 The Relational Model.
Chapter 3 The Relational Model Transparencies Last Updated: Pebruari 2011 By M. Arief
9 th Open Forum on Metadata Registries Harmonization of Terminology, Ontology and Metadata 20th – 22nd March, 2006, Kobe Japan. Commonalities and Differences.
Principles of the GOLD Ontology & Conversion of GOLD to DCIF Presenters: Anthony Aristar, Evelyn Richter.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
The ISO-DCR 17 January /20111CMDI tutorial Marc Kemps-Snijders a, Menzo Windhouwer b, Sue Ellen Wright c a Meertens Institute, b MPI for.
XML CPSC 315 – Programming Studio Fall 2008 Project 3, Lecture 1.
Profiling Metadata Specifications David Massart, EUN Budapest, Hungary – Nov. 2, 2009.
CLARIN-NL Call 3 ISOcat follow-up 10/10/20121CLARIN-NL ISOcat Call 3 follow-up.
Content of the Data Category Registry 10 May /20111CLARIN-NL ISOcat workshop.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
Chapter 3 The Relational Model. 2 Chapter 3 - Objectives u Terminology of relational model. u How tables are used to represent data. u Connection between.
1 Tutorial 13 Validating Documents with DTDs Working with Document Type Definitions.
IMDB Registration of Survey Variables Dec 19, 2005.
Tommie Curtis SAIC January 17, 2000 Open Forum on Metadata Registries Santa Fe, NM SDC JE-2023.
CLARIN-NL Call 4 ISOcat follow-up 2/10/20131CLARIN-NL Call 4 ISOcat follow-up.
XML 2nd EDITION Tutorial 1 Creating An Xml Document.
Chapter 8 Data Modeling Advanced Concepts Database Principles: Fundamentals of Design, Implementation, and Management Tenth Edition.
New Perspectives on XML, 2nd Edition
ISOcat introduction 20 June 20131CLARIN-NL ISOcat workshop.
ISOcat introduction 20 March 20121CLARIN-NL ISOcat workshop.
ISO/IEC : Framework for a Metadata Registry By Daniel W. Gillman Bureau of Labor Statistics USA.
11 CMDI/ISOcat And Semantic Operability Ineke Schuurman ISOcat content coördinator CLARIN-NL Menzo Windhouwer ISOcat system administrator Utrecht
A Data Category Registry- and Component- based Metadata Framework Daan Broeder et al. Max-Planck Institute for Psycholinguistics LREC 2010.
XML 2nd EDITION Tutorial 4 Working With Schemas. XP Schemas A schema is an XML document that defines the content and structure of one or more XML documents.
1 Tutorial 14 Validating Documents with Schemas Exploring the XML Schema Vocabulary.
Tutorial 13 Validating Documents with Schemas
CRITERIA FOR STANDARDIZING DATA CATEGORIES The Well-Formed Data Category Specification SUE ELLEN WRIGHT METADATA TDG WEBINAR
The Software Development Process
ISO TC 37/CLARIN SEMANTIC DATA REGISTRY WORKSHOP UTRECHT, DECEMBER ISOcat: Metadata Registry SUE ELLEN WRIGHT DECEMBER 2013.
ISOcat status
Objective: To describe the evolution of the Internet and the Web. Explain the need for web standards. Describe universal design. Identify benefits of accessible.
Application Profiles Application profiles -- are schemas which consist of data elements drawn from one or more namespaces, combined together by implementers,
Working with XML. Markup Languages Text-based languages based on SGML Text-based languages based on SGML SGML = Standard Generalized Markup Language SGML.
XML Validation. a simple element containing text attribute; attributes provide additional information about an element and consist of a name value pair;
ISO TC 37/CLARIN DISCUSSION UTRECHT, DECEMBER 9/ Thinning Down a Bloated Cat SUE ELLEN WRIGHT DECEMBER 2013.
The Relational Model © Pearson Education Limited 1995, 2005 Bayu Adhi Tama, M.T.I.
DC Architecture WG meeting Wednesday Seminar Room: 5205 (2nd Floor)
Group work and standardization features in ISOcat Menzo Windhouwer 8/14/20101Standardizing Data Categories in ISOcat - Implementing Group.
ISOcat introduction 10 May /20111CLARIN-NL ISOcat workshop.
Metadata models to support the statistical cycle: IMDB
Marc Kemps-Snijders Menzo Windhouwer Sue Ellen Wright
XML QUESTIONS AND ANSWERS
New Perspectives on XML
New Perspectives on XML
Presentation transcript:

ISOcat tutorial DCR data model and guidelines

Simple and complex DCs Simple Data CategoryComplex Data CategoryConceptual Domain Data CategoryDescription Section

Administrative Information Section The mandatory identifier – is a mnemonic string used to refer to the data category – should be based on a meaningful English word or series of words presented as an alphanumeric character string; for multiword strings, begin with lowercase and express the identifier as one continuous string in camel case with no white space (for instance, /term/, /normativeAuthorization/, /preferredTerm/) – maybe used in XML vocabularies and thus must be a valid local part of a qualified name: Cannot start with a number, shouldn’t contain any whitespace, … ISOcat warns you when the identifier is invalid and will refuse to save the data category

Administrative Information Section The mandatory version – used to refine identifier to indicate the version of the data category – managed by the system – based on branching not on major and minor revisions 1:13:0 1:2 1:0 2:0 2:1 2:1:12:1:2 3:1

Administrative Information Section The mandatory data category type – complex/open conceptual domain is not restricted to an enumerated set of values – complex/constrained conceptual domain is non-enumerated, but is restricted to a constraint specified in a schema-specific language or languages – complex/closed conceptual domain is restricted to a set of enumerated simple data categories making up its value domain – simple

Data category types writtenForm string open grammaticalGender string neuter masculine feminine closed simple: string constrained complex:

Administrative Information Section The justification – a short description justifying why the data category should be included in the registry – mandatory for data categories to be standardized; desirable in general – even data categories that are common in a given thematic domain may be unfamiliar or ambiguous to users unfamiliar with that domain

Administrative Information Section The origin – (document, project, discipline or model) for the data category specification

Administrative Information Section The mandatory administration status – a designation of the status in the administration process for handling registration requests under the stewardship of the DCR Board – managed by the system – values: private submission pre-evaluation, evaluation, rejected-TDG, accepted-TDG pre-validation, validation, rejected-DCR Board accepted

Standardization process Submission group Data Category Registry Board Validation Thematic Domain Group Evaluation Stewardship group ISO Publication

Administrative Information Section The mandatory registration status – a designation of the status in the registration life- cycle of an administered item – managed by the system – values: private candidate standard superseded deprecated 1:13:0 1:2 1:0 2:0 2:1 2:1:12:1:2 3:1

Administrative Information Section The effective date – the date a data category specification has/will become available to DCR users The until date – the date a data category specification is no longer effective in the registry; this information is set when the registration status of the data category specification changes to deprecated or superseded ISOcat isn’t acting on these dates (yet)

Administrative Information Section The explanatory comments – descriptive comments about the data category specification The unresolved issues – a problem that remains unresolved regarding proper documentation of the data category specification

Description Section The mandatory profile – attribute used to relate the current data category specification to one or several thematic domains treated by ISO/TC 37 (for example, morphosyntax, syntax, metadata, language description, etc.) – the value of profile defaults to Private – submission for standardization requires the selection of at least one thematic domain profile because it is the relevant TDG that is responsible for maintenance of standardized data category specifications – if multiple profiles from multiple TDGs are selected one TDG will still be responsible, but the other TDGs will be involved in the harmonization process – if a user desires to create a new profile, consult the DCR Board

Profiles and TDGs Each Thematic Domain Group (TDG) manages at least one profile The following TDGs are active: – Metadata – Morphosyntax – Terminology The Language codes profile will be maintained separately and hooked up to ISOcat Other TDGs and their profiles are still asleep ISOcat will host a forum in the near future, which also includes the functionality to send users, i.e., an owner of a DC or the chair of a TDG, an

Data Element Name Sections used to record names for the data category as used in a given database, format or application language independent attributes: – the mandatory data element name one identifier (word, multi-word unit or (alpha)numeric representation – the mandatory source the database, format or application in which the data element name is used ISOcat will future have better support for keeping sources in sync

Working and object languages Working language: – language used to describe objects Object language: – language being described You can describe properties of the object language French in the working language Dutch: In de Franse taal worden vrouwelijke en mannelijk zelfstandige naamwoorden onderscheiden.

Working and object languages Simple Data Category Complex Data Category Data Category Description Section Conceptual Domain Language SectionLinguistic Section ** working languageobject language

Language Section The English language section is mandatory and has to contain at least one name and one definition Additional language sections can be added as needed, and should translate at least the definition of the English language section if a user needs an additional language, contact the DCR system administration

Language Section The Name Section – records a possible name for the data category in a specific language – status: standardized name preferred name admitted name deprecated name superseded name

Language Section The Definition Section – definition of the data category concept associated with the data category, written in the language of the language section – attributes: definition: – definitive formulation that should be general enough to apply to all thematic domains and implementations of the data category source: – from which the definition has been borrowed or adapted note: – any additional information about the definition

ISO 704 Terminology work — principles and methods Intensional definitions: – They should consist of a single sentence fragment; – They should begin with a meaningful broader concept, either immediately above or at a higher level of the data category concept being defined; – They should list critical and delimiting characteristic(s) that distinguish the concept from other related concepts. Actual concept systems, such as are implied here by the reference to broader and related concepts, should be modeled in Relation Registries outside the DCR. Furthermore, different domains and communities of practice may differ in their choice of the immediate broader concept, depending upon any given ontological perspective. Harmonized definitions for shared DCs should attempt to choose generic references insofar as possible.

Language Section The examples – a sample instance reflecting the data category – should be limited to those that illustrate the data category in general, excluding language specific usage, which should be documented in a Linguistic Section – may be accompanied by the source of the example The explanations – any additional information about the data category that would not be relevant for a definition (for example, more precise linguistic background concerning the use of the data category) – may be accompanied by the source from which the explanation has been borrowed or adapted The notes – any additional information associated with the data category, excluding technical information that would normally be described in an explanation

Conceptual domains Complex Data Category Conceptual Domain Linguistic Section Simple Data Category Schema Specific Domain Open Conceptual Domain Value Domain + + Constrained Data Category Open Data Category Closed Data Category * Constrained Linguistic Section Closed Linguistic Section * * * * *

Conceptual domains The mandatory data type – the data type, as defined for W3C XML Schema, of this complex data category – the default data type is string Just the data type is enough for an open data category

W3X XML Schema data types

Constrained Conceptual Domain The mandatory constraint: – allows users to express constraints on the possible values of a conceptual domain associated with a given data type in a rule language suitable for the schema in question – rule languages currently ‘supported’: Schematron Object Constraint Language Semantic Web Rule Language Relax NG – the DCR doesn’t do any interpretation of the rules, this has to be done by the user/TDG/DCR Board ISOcat does check if its valid XML (when applicable) ISOcat’s DCIF export is somewhat incomplete in this area: constraints expressed in XML are outputted as one text block instead of being an integrated part of the DCIF document

Profile value domains Simple Data CategoryValue Domain Closed Data Category + Profile Value Domain +

set of permissible values for a specific profile each value is represented by a simple data category the simple data category needs to be a member of the profile ISOcat makes an exception for the Private profile value domain which can contain any simple data category Standardized data categories can’t have a Private profile value domain ISOcat tries to simplify the creation of the value domain by automating the assignment to profile value domains. However, at the moment this is too simplistic and a more advanced editor is needed to fully manage profile value domains.

Example Data categoryMorposyntaxTerminology /partOfSpeech/XX /adjective/XX /ordinalAdjective/X /participleAdjective/X /qualifierAdjective/X /adposition/XX /circumposition/X /preposition/X /postposition/X

Linguistic Sections used to specify the behaviour of a complex data category in a specific object language Language specific examples, explanations and notes Refinement of the conceptual domain: – (additional) constraints for open and constrained complex data categories – subset value domains for closed complex data categories

Example /grammaticalGender/ has as (profile specific) value domain: – /masculine/ – /feminine/ – /neuter/ The French linguistic section limits it to: – /masculine/ – /feminine/

Hierarchical Simple Data Categories Simple data categories can be put in a subsumption (is-a) hierarchy – allows different levels of granularity in a value domain – make large value domains manageable – a simple data category can be only a member of one hierarchy, i.e., it can have only one parent

Changes Each time you save a data category you’re asked to enter a description of what you’ve changed These descriptions are available in the history log of each data category The DCIF contains the first (upon time of creation) and the last change description

Checker Not all mandatory parts of the specification has to be filled at once The check will tell you what is still missing Some errors or warnings can only be fixed by issues change requests to a standardized data category – Membership of another profile