1 Metadata Registry Standards: A Key to Information Integration Jim Carpenter Bureau of Labor Statistics MIT Seminar June 3, 1999 Previously presented to DAMA-NCR by Judith Newton, NIST May 11, 1999 see
2 Agenda Specification and Standardization of Data Elements: ISO 11179, Parts 1-6 Metamodel for Management of Shareable Data, ANS X3.285 Specification of Data Value Domains, ISO TR NWI for Content Issues
3 Parts and Status (IS = International Standard, DIS = Draft IS)
Part 1: Framework (DIS)
Part 2: Classification (DIS)
Part 3: Basic Attributes (IS)
Part 4: Formulation of Definitions (IS)
Part 5: Naming and Identification (IS)
Part 6: Registration (IS)
4 Part 1 - Framework
Definitions
Fundamental Concepts
Other parts
Informative Annexes
5 Definition: Data Element A unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of attributes.
6 [Figure: Data Element boxes, each with the attributes Identifier, Definition, Name, Value Domain, etc. Other labels: Database, File, etc.; Record, Segment, Class, Tuple, etc.; Field, Column, etc.; Character, Image, Sound, etc.; Transaction, Exchange Unit, etc.]
7 Fundamental Model Taken From Data Modeling 3 Components –object class –property –representation
8 Definition: Object Class
Things for which to Store Data
Entities in E-R Models
Classes in O-O Models
Examples:
–Employers
–Persons
–Automobiles
–Orders
–...
9 Definition: Property
A peculiarity common to all members of an object class.
Distinguishes or Describes Objects
Attributes or Data Members in Models
Examples:
–Identifier
–Age
–Address
–Location
–...
10 Definition: Representation The combination of –a representation class, –a value domain, –a datatype, –a unit of measure (if necessary) –a character set (if necessary)
11 Data Element Example
Data Element: Flower Color String:{red | blue}
–Object Class: Flower
–Property: Color
–Representation: String:{red | blue}
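The three-component model can be sketched in code. The following is a minimal illustration, not part of the standard: the class names, field names, and the `allows` and `concept` helpers are assumptions made for this example, and the representation class "Text" is a placeholder because the slide does not name one.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Representation:
    representation_class: str
    datatype: str
    # Enumerated permissible values; None means no enumeration is given here.
    permissible_values: Optional[FrozenSet[str]] = None
    unit_of_measure: Optional[str] = None   # only if necessary
    character_set: Optional[str] = None     # only if necessary

    def allows(self, value: str) -> bool:
        """Return True if the value is permitted by this representation."""
        if self.permissible_values is None:
            return True
        return value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    object_class: str
    property_name: str
    representation: Representation

    def concept(self) -> str:
        """Object class plus property, independent of any representation."""
        return f"{self.object_class} {self.property_name}"


flower_color = DataElement(
    object_class="Flower",
    property_name="Color",
    representation=Representation(
        representation_class="Text",  # assumed; the slide does not name one
        datatype="String",
        permissible_values=frozenset({"red", "blue"}),
    ),
)

print(flower_color.concept())                        # Flower Color
print(flower_color.representation.allows("red"))     # True
print(flower_color.representation.allows("green"))   # False
```

Keeping the representation in its own object means the same object class and property could be paired with a different representation without redefining the concept.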
12 Part 2 - Classification
What forms can a classification structure take?
Keywords
Controlled word lists
Terms from models
Thesaurus
Taxonomy
Ontology
–Acyclic directed graph, lattice
–Multiple inheritance
13 Classification - Fundamental Notions
Each node in a classification structure is a taxon (plural: taxa).
Given a classification structure, any taxa relating to a data element can be recorded.
The taxa can be recorded in a separate “classification” attribute.
With adequate software, users could access and navigate the classification structure.
A nonintelligent identifier for each taxon helps to deal with change.
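As a rough sketch of these notions, a classification structure can be held as an acyclic directed graph whose nodes carry meaningless identifiers. Everything named here (the `Taxon` class, the `ancestors` helper, the taxon identifiers and labels) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Taxon:
    taxon_id: str                 # nonintelligent: encodes no meaning
    label: str                    # human-readable, free to change
    parent_ids: List[str] = field(default_factory=list)


def ancestors(taxa: Dict[str, Taxon], taxon_id: str) -> Set[str]:
    """Collect every taxon reachable by following parent links."""
    seen: Set[str] = set()
    stack = list(taxa[taxon_id].parent_ids)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(taxa[current].parent_ids)
    return seen


# "Flowers" has two parents, i.e. multiple inheritance in the structure.
taxa = {
    "T001": Taxon("T001", "Plants"),
    "T002": Taxon("T002", "Retail goods"),
    "T003": Taxon("T003", "Flowers", parent_ids=["T001", "T002"]),
}

# The "classification" attribute of a data element records taxon identifiers.
classification = {"Flower Color": ["T003"]}

# Relabeling a taxon does not disturb anything that refers to its identifier.
taxa["T001"].label = "Flora"

related = set(classification["Flower Color"])
for tid in classification["Flower Color"]:
    related |= ancestors(taxa, tid)
print(sorted(related))  # ['T001', 'T002', 'T003']
```

Because the data element stores only `T003`, the later relabeling of `T001` needs no change to the recorded classification.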
14 Part 2 Status ISO –Draft International Standard Continuing R&D –Search engines –Middleware - agents, mediators, request brokers –XML tags New ISO project: terminology management in metadata registries
15 Part 3 - Basic Attributes Specifies “basic attributes” of data elements, independent of their usage in application systems, databases, or data interchange messages. Recognizes need for additional attributes. No logical or physical structure of the data is implied.
16 Categories of Basic Attributes Identification of a data element Definition of a data element Relations among data elements Representation of data element values Administrative: management and control
17 Example Data Element
18 Summary Part 3 is a good start to establishing an unambiguous set of specifics documenting data elements. However, –Further work on the other ISO parts and beyond has resulted in many refinements and advances addressing a variety of data-related concepts. –A new work item involves replacing Part 3 with ANSI X3.285.
19 Part 4 - Data Definitions A data definition shall: –Be unique (within a data dictionary) –Be stated in the singular –State what the concept is, rather than what it is not –Be stated as a descriptive phrase or sentence(s) –Contain only commonly understood abbreviations –Be expressed without embedding definitions of other data elements or underlying concepts
20 Data Definition Guidelines A data definition should: State the essential meaning of the concept Be precise and unambiguous Be concise Be able to stand alone Be expressed without embedding rationale, functional usage, domain information or procedural information Avoid circular reasoning Use consistent terminology and structure for related definitions
21 Part 5 - Naming and Identification
Five attributes to identify a data element
–Name and Context (always paired)
–Registration Authority Identifier, Data Identifier, and Version Identifier (together: International Registration Data Identifier)
22 Principles for Registration of Data
Each data element has a unique identifier within the register of a Registration Authority.
A data element is uniquely identified by the combination of:
–Registration authority identifier
–Data identifier
–Version identifier
To be assigned an identifier, the element must be derived, attributed, defined, named, and registered according to ISO/IEC
A data element shall have at least one name within a context.
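The identification principles can be sketched as a composite key plus a register that refuses duplicates. This is only an illustration under assumptions: the `DataElementId` and `Register` classes, their methods, and the sample identifier values are invented for the example and are not defined by the standard.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataElementId:
    """The three identifiers that, combined, identify a data element."""
    registration_authority_id: str
    data_id: str
    version_id: str


class Register:
    def __init__(self) -> None:
        # identifier -> {context: name}
        self._names: Dict[DataElementId, Dict[str, str]] = {}

    def register(self, element_id: DataElementId, context: str, name: str) -> None:
        """Register an element with its first name/context pair."""
        if element_id in self._names:
            raise ValueError(f"identifier already registered: {element_id}")
        if not context or not name:
            raise ValueError("at least one name within a context is required")
        self._names[element_id] = {context: name}

    def add_name(self, element_id: DataElementId, context: str, name: str) -> None:
        """Record an additional name for the element in another context."""
        self._names[element_id][context] = name

    def names(self, element_id: DataElementId) -> Dict[str, str]:
        return dict(self._names[element_id])


register = Register()
v1 = DataElementId("RA-EXAMPLE", "0001", "1")   # hypothetical values
v2 = DataElementId("RA-EXAMPLE", "0001", "2")   # same data id, new version

register.register(v1, context="Trade", name="Trading partner country name")
register.register(v2, context="Trade", name="Trading partner country name")
register.add_name(v1, context="Shipping", name="Destination country name")

print(register.names(v1))
```

Registering `v1` a second time raises `ValueError`, while `v2` is accepted because the version identifier is part of the key.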
23 Naming Data Elements Naming principles are described in general terms with examples furnished. Rules are derived from the principles by which standard names are developed. These rules form a naming convention. Because syntactic, semantic, and lexical rules vary by organization, such as corporations or standards-setting bodies for business areas, no specific naming convention rules are prescribed in the International Standard. The naming principles described in the standard can be applied to other entities, such as attributes and objects.
24 Rule Types Data element names are formed of components. The components are: –object class terms –property terms –representation terms –qualifier terms. Each is assigned meaning (semantics) and relative or absolute position (syntax) within a name. They are subject to lexical rules. Counterparts in the X3.285 metamodel
25 Naming Component Example
OBJECT CLASS TERM: Country
REPRESENTATION TERM: Name
PROPERTY TERM: Identifier
QUALIFIER TERMS: Trading partner
NAME: Trading partner country name
26 Part 6 - Registration
Meta Data Registration Principles
Non-exclusive registration: Every organization may be a Registration Authority.
Data sharing registration: Data may be shared intra- or inter-organizationally.
Economically enforced registration: Utility determines longevity and usefulness.
Flexible registration: Meta data may be registered at different levels of quality.
27 Registration Status
Recorded
Certified
Standardized
Retired
Incomplete
28 X Metamodel Promote sharing of metadata for –understanding (meaning, representation, identification) –discovery –harmonization –reuse –analysis Provide a common base for metadata registries –management structure –components for interchange
29 Metamodel Regions Stewardship Naming & Identification Classification Data Element Administration Conceptual & Value Domain Administration Data Element Concept Administration
30 Data Element Model
[Figure labels: DATA ELEMENT; Data Element Concept; Object Class; Property; Conceptual Domain; Value Meaning; Permissible Values; Data Value Domain; Representation Class; Data Element Representation]
31 Data Element Model
32 [Figure: class diagram. Recoverable classes, attributes, and association labels:]
Permissible Value: permissible value label, permissible value begin date, permissible value end date
Value Meaning: value meaning identifier (VMID), value meaning descriptor, value meaning begin date, value meaning end date
Data Value Domain: value domain name, value domain character set name, value domain minimum character quantity, value domain maximum character quantity, value domain dependency description, value domain format
Enumerated Value Domain
Conceptual Domain: conceptual domain identifier
Enumerated Conceptual Domain
Representation Class: representation class name
Data Element Concept
Association labels: represents, means, contains / contained in (multiplicities shown: 1, 0..*, 1..*, 2..n)
33 Future Extensions & Work Promotion of X3.285 to an ISO standard Completion of TR Data Value Domains XML Tags Content consistency Extended classification/terminology support Object extensions
34 Definition: Value Domain
DTR Specification of Data Value Domains (DTR = Draft Technical Report)
A set of permissible values.
Types
–Enumerated, e.g. Countries of the world
–Non-Enumerated, e.g. All Real Numbers Between 0 & 1; 17 Char Alpha-Num; YYYYMMDD
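The difference between the two types can be sketched as membership tests: an enumerated domain lists its values, while a non-enumerated domain is described by a rule. The function names below are hypothetical, the country list is a small sample rather than a complete enumeration, and treating the endpoints 0 and 1 as included is an assumption the slide does not settle.

```python
import re
from datetime import datetime

# Enumerated: the permissible values are listed explicitly (sample only).
COUNTRIES = frozenset({"France", "Japan", "Mexico"})


def is_country(value: object) -> bool:
    return isinstance(value, str) and value in COUNTRIES


# Non-enumerated: the permissible values are described by a rule.
def is_unit_interval(value: object) -> bool:
    """Real number between 0 and 1 (endpoints included by assumption)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return 0 <= value <= 1


def is_alnum17(value: object) -> bool:
    """Exactly 17 alphanumeric characters."""
    return isinstance(value, str) and re.fullmatch(r"[A-Za-z0-9]{17}", value) is not None


def is_yyyymmdd(value: object) -> bool:
    """Eight digits that form a real calendar date."""
    if not (isinstance(value, str) and len(value) == 8
            and value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


print(is_country("Japan"))                # True
print(is_unit_interval(0.25))             # True
print(is_alnum17("ABC12345678901234"))    # True
print(is_yyyymmdd("19990603"))            # True
print(is_yyyymmdd("19990631"))            # False
```

The explicit length check in `is_yyyymmdd` is there because `strptime` on its own would also accept shorter strings with single-digit months or days.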
35 Value Domain Examples Geographic Codes Chemical Names Biological Classification
36 The Problem
How can data values be mapped among representations so that the equivalent semantic meaning is determined, even if the language, format, or character set of the representations differs?
The Benefits
The sharing and reuse of data through equivalent data values will allow information to be exchanged faster and more efficiently.
Sets of reusable domain values, with unique identifiers assigned, eliminate the need for exact representation matches.
37 Scope of the TR
Attributes for identification, specification, development and reuse of data value domains for data elements.
Assigning a unique identifier to each value within a domain.
Defining a data element conceptual domain and describing mappings between the values of a conceptual domain and the values of each representational data value domain.
Defining reuse of value domains among data elements.
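One way to picture the mapping described in this scope: each value meaning in the conceptual domain gets an identifier, and every value domain maps its own permissible values to those identifiers. The sketch below is illustrative only. The `translate` function and the identifiers `VM1` and `VM2` are invented for the example; the country codes are the ISO 3166 two-letter, three-letter, and numeric codes.

```python
from typing import Dict

# Conceptual domain: value meanings keyed by a value meaning identifier.
value_meanings: Dict[str, str] = {
    "VM1": "United States of America",
    "VM2": "France",
}

# Three value domains for the same conceptual domain.
# Each maps a permissible value to the identifier of its meaning.
alpha2: Dict[str, str] = {"US": "VM1", "FR": "VM2"}
alpha3: Dict[str, str] = {"USA": "VM1", "FRA": "VM2"}
numeric: Dict[str, str] = {"840": "VM1", "250": "VM2"}


def translate(value: str, source: Dict[str, str], target: Dict[str, str]) -> str:
    """Map a value from one value domain to another through its meaning."""
    meaning_id = source[value]          # KeyError if value is not permissible
    for target_value, target_meaning_id in target.items():
        if target_meaning_id == meaning_id:
            return target_value
    raise KeyError(f"no value for meaning {meaning_id} in target domain")


print(translate("US", alpha2, numeric))    # 840
print(translate("FRA", alpha3, alpha2))    # FR
print(value_meanings[alpha2["FR"]])        # France
```

No pair of value domains needs a direct conversion table: adding a fourth representation only requires mapping its values to the existing identifiers.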
38 [Figure: the class diagram from slide 32, repeated.]
39 [Figure: the class diagram from slide 32, repeated with level annotations:]
Conceptual Level: Object class and Property
Logical Level: Representation with addition of qualifier
Application Level
42 Conclusion Application of all principles of the ISO family to the development of meta data registries allows easy and effective exchange of data and meta data nationally and internationally.