Presentation transcript: "The <indecs> Data Dictionary"
1 The <indecs> Data Dictionary
ELECTRONIC COMMUNICATION OF LICENCE TERMS
Norman Paskin, International DOI Foundation
(c) IDF 2004
2 <indecs>: Interoperability of Data in E-Commerce Systems
Focus: generic intellectual property and how to make data about it interoperable
EC + groups from the content, author, creator, library, publisher and rights communities
Pioneered a model of event-based metadata as a solution for integrating rights.
For “e-commerce” read “automation”
Influenced by CIS and FRBR:
1995+: Common Information System “CIS” (CISAC) – music rights
1998: Functional Requirements for Bibliographic Records “FRBR” (IFLA) – library cataloguing
Has been used and developed further
3 Why do we need a “data dictionary”?
There’s lots of metadata already
Which should be (re-)used
People use different schemes
So we need to map from one scheme to another
Data (identifiers, metadata) assigned in one context or scheme may be encountered, and may be re-used, in another place (or time or scheme) without consulting the assigner.
You can’t assume that your assumptions will be known to someone else.
Interoperability = the possibility of use in services outside the direct control of the issuing assigner
This is a prerequisite for communication (of rights terms or anything else)
Does “owner” in scheme A mean “owner” in scheme B?
We need to map meanings
A prerequisite for extensibility
4 What is a “data dictionary”?
A set of terms, with their definitions, used in a computerized system
Some data dictionaries are structured, with terms related to other terms through hierarchies and other relationships: structured data dictionaries are derived from ontologies.
An ontology combines a data dictionary with a logical data model, providing a consistent and logical world view.
An interoperable data dictionary contains terms from multiple computerized systems or metadata schemes, and shows the relationships they have with one another in a formal way.
The purpose of an interoperable data dictionary is to support the use together of terms from different systems.
The indecs DD is structured (ontology-based) and interoperable
7 [Diagram: a Data Dictionary sits between metadata schemes (e.g. ONIX, e.g. SCORM, Rights). The term “Author” in one scheme and the term “Writer” in another are linked through the Data Dictionary. Labels: ONIX:Author = Norman; Rights:Writer]
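The idea in this slide, two schemes linked through a shared dictionary rather than directly to each other, can be sketched in a few lines. This is a minimal illustration only: the concept name `Creator_of_Text` and the lookup table are assumptions made for the example, not entries from the actual dictionary.

```python
from typing import Optional

# Hypothetical dictionary: each (scheme, term) pair points at one shared concept.
# The concept name "Creator_of_Text" is an illustrative assumption.
TERM_TO_CONCEPT = {
    ("ONIX", "Author"): "Creator_of_Text",
    ("Rights", "Writer"): "Creator_of_Text",
}


def map_term(term: str, source: str, target: str) -> Optional[str]:
    """Map a term from one scheme to another by way of the shared concept."""
    concept = TERM_TO_CONCEPT.get((source, term))
    if concept is None:
        return None  # the source term is not in the dictionary
    for (scheme, candidate), candidate_concept in TERM_TO_CONCEPT.items():
        if scheme == target and candidate_concept == concept:
            return candidate
    return None  # the target scheme has no term for this concept


print(map_term("Author", "ONIX", "Rights"))
```

With a hub like this, adding a third scheme means mapping its terms once to the shared concepts, not once per existing scheme.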
8 Metadata interoperability: semantic problems
But such mappings are not simple:
Different names (and languages) for the same thing (journal_article vs SerialArticleWork)
Same name for different things (title, Title)
Data elements at different levels of speciality (title vs FullTitle, AlternativeTitle).
Different allowed values for elements (pii vs not pii)
Data at different levels of granularity (journal_article vs SerialArticleWork/SerialArticleVersion).
Data in different structures (article as attribute of journal or vice versa).
Data from different sources (local codes vs ONIX codes).
Different contextual meaning (DOI of what…?)
Different representation (1 title vs n titles).
Different mandatory requirements (ISSN mandatory vs optional)
Schemas are being updated all the time
etc.
Requires a coherent structured approach.
9 So how do we make sense of this?
Data dictionary uses an “ontology”
“An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them”
Because relationships can be complex
10 The dictionary model
The methodology is the <indecs> one (as developed in more detail for the MPEG RDD)
This has also been developed further (OntologyX)
It uses the “context model”, i.e. events-based (a common ontology approach)
We think of metadata as “thing” or “people” based.
static views e.g. about “creation B”
But then how do we link things, e.g. to describe rights activities?
By describing “events”; relating things and people
dynamic views e.g. “A created B”
Event description is also the key to rights metadata
all rights transactions are events
17 Building views of “metadata”…
Q: “This isn’t how I think of my metadata! It’s just a series of ‘things about’ something. How does this more complex approach fit what I have?”
A: This is simply a deeper view for the purposes of analysis. You don’t need to change your own approach.
The “events” view builds from the simple “things about” view:
18 Building views of “metadata”…
[Diagram: entity, attribute]
1. attribute view – simplest, most direct: “things about…”
isbn “ ”
Author S Pinker
(values may be strings, IDs etc)
19 Building views of “metadata”…
[Diagram: relationship, entity]
2. association or relationship view – richer, more indirect:
book “ ” hasTitle “Words & Rules”
treats attributes as defined entities
and others e.g. book “ ” hasAuthor “Steven Pinker”
allows multiple occurrences
20 Building views of “metadata”…
[Diagram: context, with agent, resource, time, place]
3. context view – richest, most indirect
publishingEvent hasAgentType publisher “Weidenfeld”
publishingEvent hasResourceType book “ ”
publishingEvent hasTimeType dateOfPublication “2002”
publishingEvent hasPlaceType placeOfPublication “UK”
Analysis moves from attribution to attribution process (Event)
Most efficient handling of complex multiple metadata
e.g. a rights catalogue (“all rights transactions are events”)
Allows analysis of complex relationships and meaning
21 An ontology approach uses the deeper view of metadata
Three levels of attribution, moving from simple (static) to richer (dynamic events):
Attribute (static view): entity, attribute
Relationship: relationship, entity
Context (dynamic view): context, with agent, resource, time, place
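One way to see how the three views relate is to start from a single event and derive the two simpler views from it. The sketch below is illustrative only: the `Event` class, the relationship names such as `hasPublisher`, and the stand-in identifier `book-1` (used because the slides leave the ISBN blank) are all assumptions, not part of the dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (entity, relationship, entity or value)


@dataclass(frozen=True)
class Event:
    """Context view: one event ties together agent, resource, time and place."""
    kind: str
    agent: str
    resource: str
    time: str
    place: str


def to_relationships(event: Event) -> List[Triple]:
    """Relationship view: flatten an event into triples about its resource.

    The relationship names are illustrative, not dictionary terms.
    """
    return [
        (event.resource, "hasPublisher", event.agent),
        (event.resource, "hasDateOfPublication", event.time),
        (event.resource, "hasPlaceOfPublication", event.place),
    ]


def to_attributes(triples: List[Triple]) -> Dict[str, str]:
    """Attribute view: plain 'things about' one entity (drops the 'has' prefix)."""
    return {relationship[len("has"):]: value for _, relationship, value in triples}


# "book-1" is a hypothetical stand-in for the book's identifier.
publishing = Event("publishingEvent", "Weidenfeld", "book-1", "2002", "UK")
triples = to_relationships(publishing)
attributes = to_attributes(triples)
print(triples)
print(attributes)
```

Each step down discards structure: the attribute view no longer records that the publisher, date and place belong to the same publishing event, which is the information a rights catalogue would need.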
22 Tested
iDD has a long history and is used in several major activities.
Built using methodology from the <indecs> framework
Used as the basis for DOI data model
Used as basis for the MPEG-21 Rights Data Dictionary (RDD)
Heavily influenced the current development of messaging systems for the publishing industry (ONIX) and music industry (MI3P).
Methodology has been validated against the W3C ontology language OWL-DL
Methodology for constructing interoperable Data Dictionaries which underlies iDD is in use commercially (Ontologyx).
The International DOI Foundation (IDF) and EDItEUR intend to harmonise ONIX and DOI metadata through the use of this common data dictionary and welcome collaboration with others adopting a similar approach
23 Neutral as to business model
The semantic analysis underlying the iDD is independent of any implementation model.
It was fundamental to indecs (despite “e-commerce” in its name) that it had no inherent commercial model, and it remains so for all the work that has followed it.
It is just as critical to be able to say "this is not subject to copyright" as to say the opposite; any "non-commercial" person or organization has to be able to state that something is freely available and under what circumstances.
A broad ontology, supporting rights expressions, must be able to support any kind of expression of any kind of right, agreement or licence or any terms (or none).
Most organizations have the need for both freedom and protection of intellectual property in different contexts.
The iDD is not solely a tool for intellectual property as “commercial property” but is neutral as to the intellectual property regime being used.
24 Does not mandate one metadata scheme
The aim of the iDD is to facilitate mapping between schemes
The more precise the input, the more precise the output
e.g. a mapping from simple DC to SCORM will of necessity be “lossy”
Some uses will set minimum standards
e.g. DOI Registration Agencies have rules that must be followed in the DOI application to ensure that the metadata can be mapped into the iDD to declare Application Profiles
Any user is otherwise free to use their own metadata schemes for gathering, storing or disseminating metadata.
iDD facilitates input and output to other schemes = semantic interoperability
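Why a mapping through a less precise scheme is "lossy" can be shown with a small round trip. This is a sketch under stated assumptions: it borrows the element names `title`, `FullTitle` and `AlternativeTitle` from the earlier slide on semantic problems, treats the first as the broader term for the other two, and uses a made-up alternative title value.

```python
from typing import Dict, List, Tuple

# Assumed narrower-to-broader relations (illustrative, not from the iDD).
BROADER = {"FullTitle": "title", "AlternativeTitle": "title"}


def to_simple(record: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map precise elements onto their broader element.

    The values survive, but the distinction between elements does not.
    """
    simple: Dict[str, List[str]] = {}
    for element, value in record:
        simple.setdefault(BROADER.get(element, element), []).append(value)
    return simple


def to_precise(simple: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Map back towards the precise scheme.

    Nothing records which narrower element each value came from, so the
    best available answer is the broad element itself.
    """
    return [(element, value) for element, values in simple.items() for value in values]


precise = [
    ("FullTitle", "Words & Rules"),
    ("AlternativeTitle", "An alternative title (made up)"),
]
simple = to_simple(precise)
round_trip = to_precise(simple)
print(simple)
print(round_trip)
```

The round trip returns both values under the single element `title`: the output is only as precise as the least precise scheme on the path.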
25 Provides authority
Every term entered into the iDD carries information on its status as to origin and mapping agreement
If reciprocally agreed, then it can be an assured mapping, which will enable users of the dictionary to interpolate mappings from their own schemes, through iDD, to scheme A and know that this will be considered authoritative by scheme A.
Anyone contributing terms to the iDD can specify who is allowed to see or specify their own terms.
Some terms will be accessible to all: e.g. ONIX, some kernel DOI terms, and the MPEG-21 RDD.
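A rough sketch of how per-term status information could drive the "assured mapping" check follows. Everything here is hypothetical: the `Entry` fields, the status strings and the example schemes are assumptions chosen to mirror the wording of this slide, not the dictionary's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A dictionary entry carrying its origin and mapping-agreement status."""
    scheme: str
    term: str
    concept: str               # shared concept the term is mapped to
    origin: str                # who contributed the term
    reciprocally_agreed: bool  # has the scheme's authority agreed the mapping?


def mapping_status(source: Entry, target: Entry) -> str:
    """Classify a mapping from a user's own term, through the hub, to a target scheme."""
    if source.concept != target.concept:
        return "no mapping"
    # Assured only when the target scheme's authority has agreed its mapping.
    return "assured" if target.reciprocally_agreed else "unconfirmed"


local = Entry("LocalScheme", "writer_name", "concept-1", "local user", False)
scheme_a = Entry("SchemeA", "Author", "concept-1", "Scheme A authority", True)
scheme_b = Entry("SchemeB", "Creator", "concept-1", "third party", False)
other = Entry("SchemeA", "Publisher", "concept-2", "Scheme A authority", True)

print(mapping_status(local, scheme_a))
print(mapping_status(local, scheme_b))
print(mapping_status(local, other))
```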
26 Construction
Based on DD methodology and Contextual Ontologyx Architecture tools, terms from various sources (ONIX, RDD, DOI)…
But users need not understand the underlying concepts and construction of the iDD.
It is no more a requirement to know the details than it is for the designer of a web page to read all the underlying internet protocol RFCs.
A fundamental role of the IDF and others with the iDD is to provide assurance to users that the work has been peer-reviewed and tested, and make available tools.
Some key features are:
Extensible and granular to whatever level of detail is required.
Multiple, different, specialized views are available: these include a Rights Model, based on a set of specialized Contexts.
Local terms: local (internal) data elements and names can be added into the ontology
External terms: incorporates external and standard schemes such as ISO territory, currency and language codes, and sector-specific external schemes
27 Use
Current use of the dictionary is on a project-by-project basis using technical consultancy
An automated web-based look-up system for the Dictionary is under development for IDF use (and potentially others e.g. RDD)
Access will be granular: those with authority to access the Dictionary are able to view what is appropriate
private terms are kept confidential.
28 iDD and the MPEG Rights Data Dictionary (RDD)
ISO MPEG-21 Rights Data Dictionary is another notable data dictionary built on similar principles.
The MPEG Rights Data Dictionary provides semantic interoperability for use of rights expression languages and other tools.
Derives from work funded by IDF and others using the same methodology as the iDD, so closely related and fully integrated.
All terms in the RDD are mapped into the iDD; RDD is one of the authorities specifying terms within iDD.
RDD is therefore at present a subset of iDD.
Some future terms might be added to the RDD which are not within iDD; the two Data Dictionaries would then overlap and share some common terms.
The MPEG-21 RDD requirements in terms of management and availability for MPEG use are very similar to those of the iDD in relation to DOI implementation.
IDF is to be the Registration Authority for the MPEG-21 RDD, and will subcontract management of the RDD and iDD to Ontologyx.
29 Development of <indecs>
[Diagram, colour key: black = what, red = who.
indecs (2000): Framework; EC plus many others.
CONTECS (2001+): methodology for DD; IFPI/RIAA, MPA, IDF, Dentsu MMG, Rightscom.
Data dictionaries 2004: ISO MPEG-21 RDD (IDF is authority); OntologyX (RightsCom: Mi3p etc); indecs DD (IDF + ONIX).]