Metadata issues and DOI
Overview of presentation: Background; Three conclusions; The metadata landscape: which schemes matter most to DOI?; DOI metadata - practical implications; DOI applications: sets of metadata for a use; DOI Kernel; Handle and metadata; Conclusion.
Definitions of metadata. Popular: "Metadata is data about data." (Everyone). Logical: "An item of metadata is a relationship that someone claims exists between two entities*." (indecs framework). Functional: "Metadata is the life-blood of e-commerce." (John Erickson, HP). *entity = something which has identity.
#1: All metadata is just a view e.g. Views of a person: some (generic) ways in which you might be identified in metadata schemes... Son Legal person Agent Alien Scholar Library user Composer Credit card holder Shoe purchaser Author Lottery entrant Hospital patient Citizen Car driver Rights owner Marathon runner Software licensee Parent Tax payer Club member e-consumer Bank account holder Husband Charity giver Hotel guest Speeding ticket recipient DisneyWorld visitor Frequent Flyer Concert-goer Passenger Employee Voter Dog owner In each of these roles you will have different IDs and attributes. Three conclusions
#1: All metadata is just a view Creations are the same. An identifier for a published article may refer to... A manuscript The abstract work A draft A (class of) physical copy in a publication A (class of) digital copy (not in a publication) A (class of) digital copy in a publication A (class of) digital format A specific digital copy A (class of) paper copy A specific paper copy An edition A reprint A translation etc…and many combinations of the above Similar views apply to other types of creations. Three conclusions
#1: All metadata is just a view Views must not be confused for digital content and rights management. Mistaken identity can be catastrophic. Increasingly, views need to be interoperable (e.g. production workflow, rights, marketing within one business; supply chain transfer; etc.). The need for automated, interoperable views in d-commerce will be enormous. Three conclusions
#2: (Almost) all terms need identifiers Each of the values of a view must be defined and identified if other views are to recognize them (what do you mean by an abstract work? an edition? a format? a scholar? a name?). So views need comprehensive controlled vocabularies (n.b. our reliance on ISO language, territory, currency, time codes). Automation needs disambiguation. Terms of rights must be unambiguous. Anything may be a term of an agreement. Emergence of the value of structured ontologies for commerce (like the indecs model). Three conclusions
#3: Events are the key to interoperability Most metadata is thing- or people-based: static views, e.g. a creation. In the net future, metadata interoperability will be achieved by describing events, relating things and people: dynamic views, e.g. A created B. Event descriptions will also be the key to rights metadata (transactions are events). Three conclusions
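To illustrate the point about events, here is a minimal sketch in Python of how thing-based and people-based views could both be derived from one list of event descriptions such as "A created B". The Event class, its field names and the two view functions are assumptions made for this illustration; they are not taken from the indecs model or from any DOI specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One dynamic statement: an agent, acting in a role, on a creation."""
    agent: str      # who acted (hypothetical field name)
    role: str       # how they acted, e.g. "creator"
    creation: str   # what the act produced or changed


def creation_view(events, creation):
    """Static, thing-based view: (role, agent) pairs recorded for one creation."""
    return sorted((e.role, e.agent) for e in events if e.creation == creation)


def agent_view(events, agent):
    """Static, people-based view: (role, creation) pairs recorded for one agent."""
    return sorted((e.role, e.creation) for e in events if e.agent == agent)


# "A created B", plus a second illustrative event on the same creation.
events = [
    Event(agent="A", role="creator", creation="B"),
    Event(agent="C", role="translator", creation="B"),
]
```

Both static views are computed from the same events, which is one way to read the claim that event descriptions let different views interoperate.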
These conclusions are being reached increasingly often elsewhere. There is an explosion of metadata activity: Models, Identifiers, Vocabularies, Dictionaries, Ontologies. XML/RDF schemas. Registries/Repositories/crosswalks. Technical standards. The metadata landscape
The metadata landscape for creations Books Audio Audiovisual Libraries Copyright Journals Magazines Newspapers Standards Education MARC CAE ISBN ISSN Music Texts EAN Technology Archives Museums UPC ISO codes 1980s
The metadata landscape for creations Books Audio Audiovisual Multimedia Libraries Copyright Journals Magazines Newspapers Standards Education MARC ISRC CAE ISBN ISSN ISAN Music ISMN CIS Texts Dublin Core EAN Technology DOI IIM Archives Museums ISWC FRBR UPC url urn Handle ISO codes mid 90s IMS
The metadata landscape for creations Books Audio Audiovisual Multimedia Libraries Copyright Journals Magazines Newspapers Standards Education MARC ISRC CAE ISBN ISSN ISAN Music ISMN CIS UMID Texts ISTC Dublin Core SMPTE DMCS EPICS ONIX EAN IMS LOM abc MPEG7 MPEG21 ISO11179 RDF Technology XML schema DOI IPDA PRISM eBooks IIM NITF Archives Museums CIDOC CROSSREF ISWC P/META XrML FRBR UPC url uri urn Handle BICI SICI ISO codes today
Convergence All serious schemes are becoming... Granular (parts and versions) Modular (creations within creations) Multimedia Multinational Multilingual Multipurpose EPICS/ONIX (text) SMPTE (audiovisual) SDMI/DCMS (audio/music) eBooks DOI genres CIDOC (museums/archives) FRBR (libraries) Dublin Core CIS (copyright societies) PRISM (magazines) NITF (newspapers) MPEG21 (multimedia) Result: major sector schemes are now trying to define metadata with broadly the same scope, only different emphases.
Which initiatives matter most to DOI? MPEG21 SMPTE data dictionary ONIX XrML Criteria... Strong underlying data model Multi-purpose Extensive, structured vocabulary Commercial critical mass Outward-looking
MPEG21 Began 2000 (ISO Moving Picture Experts Group). Possible umbrella for digital multimedia standards. Place to bring technology and content standards together. MPEG track record of disciplined standards development. Most major players getting involved. Not many lawyers (yet). Short-term perception problem: MPEG is audiovisual. Is the challenge too great?
SMPTE Data Dictionary/UMID Began 1998 (Society of Motion Picture and Television Engineers). Well-structured, technically oriented multimedia data dictionary. ISO metadata registry based, good governance and update procedure. SMPTE track record of disciplined standards development. UMID (Unique Material Identifier) for digital material - complementary to editorial identifiers like DOI. Guaranteed implementation in home sector. Start point for MPEG7 metadata work.
EPICS & ONIX International EDItEUR (EPICS) and AAP (ONIX) convergence (May 2000). Substantial and extensible EPICS metadata dictionary, indecs-model based, from which ONIX XML-tagged subset(s) are taken. Commerce-driven (Amazon etc.) with transatlantic industry support and International Steering Group. Likely to be used by eBooks, ISTC. ONIX for video (Amazon initiative)? ONIX for audio? Best chance of e-commerce multimedia vocabulary and schema (and maybe d-commerce?).
XrML and Rights metadata DRM (Digital Rights Management) systems at present are for unitary rights: they do not deal with modularity. Holdup 1: Rights vocabularies need descriptive vocabularies - not yet ready. Holdup 2: Events model needed to integrate descriptions and rights - event-based tools not yet developed. XrML is the likely focal point for the next stage, before more mature interoperable developments start to emerge. DOI-R? Interested partners in a prototype?
Standard controlled vocabularies Existing… Territories, Language, Currency, Date/Time (ISO) Measures (Unified Code for Units of Measure) Needed… Creation types Derivation types (adaptation, sample, compilation…) Contributor roles (author, translator, cameraman…) Title types (abbreviated, inverted, formal... etc) Media types (formats) Name types Identifier types Encoding types Tools/instruments User roles etc...and many identifiers need establishing or creating (Parties, Agreements, ISWC, ISTC, ISAN, UMID etc)
DOI metadata - practical implications DOI Application Profiles and User Communities (was Genres) DOI Kernel Handle and metadata Conclusion Metadata issues and DOI
DOI Application Profile A DOI Application Profile is a DOI view: a mechanism for unity in diversity. Based on any interest group's view of a type of creation (a DOI User Community). Functional granularity: create a genre when you need it. DOI-APs can overlap: creations can be in multiple DOI-APs. A DOI-AP has a metadata kernel, Registration Agency, Governance/Development Group. Base Set for new, unplaced DOIs. Zero Set = initial implementation DOIs (just a single URL redirection; zero additional metadata).
[Diagram labels: Activity tracking; Full implementation; Initial implementation; Single redirection (persistent identifier); Metadata; W3C, WIPO, NISO, ISO, UDDI, etc.; Multiple resolution]
[Diagram labels: Defined App Profiles; Zero App Profile; Single redirection (persistent identifier); Metadata; W3C, WIPO, NISO, ISO, UDDI, etc.; Multiple resolution]
Each DOI-AP starts from the Base kernel (8 elements) and may add whatever else it needs, as defined by the DOI User Community. A kernel extension model is being developed. DOI metadata vocabulary to be developed - in tandem with EPICS/ONIX? Can/should coincide with or provide sector requirements (e.g. ISBN, ISRC, ISWC etc.). Different DOI-APs' metadata will interoperate if vocabularies are developed within an indecs-based model. DOI Kernel
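As a rough sketch of the "kernel plus extensions" idea, the Python below builds an application profile by appending community-defined elements to the eight kernel elements named elsewhere in this presentation. The define_profile function and the dictionary layout are assumptions made for illustration; the extension names are borrowed from the kernel extension slide, not from any agreed profile.

```python
# The eight kernel elements as labelled in this presentation.
KERNEL_ELEMENTS = (
    "DOI", "DOI-AP", "Identifier", "Title",
    "Type", "Mode", "Primary Agent", "Agent Role",
)


def define_profile(name, extensions=()):
    """Build a hypothetical application profile: kernel first, then extensions."""
    extensions = tuple(extensions)
    clashes = [e for e in extensions if e in KERNEL_ELEMENTS]
    if clashes:
        # An extension must add something new, not redefine a kernel element.
        raise ValueError(f"extensions repeat kernel elements: {clashes}")
    return {"name": name, "elements": KERNEL_ELEMENTS + extensions}


# Two illustrative profiles; names and extension choices are assumptions.
book = define_profile("Book", ["Subject", "Content Language"])
audio = define_profile("Audio", ["Measures"])
```

Because every profile starts with the same kernel tuple, any application can rely on those eight elements being present whatever else a User Community adds.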
DOI Kernel Contains critical minimum metadata for basic recognition (but not complete disambiguation). Standard base vocabulary (e.g. manifestation, version) means all DOI applications can expect base genre metadata. A DOI-AP entity (e.g. book) must be analysable in terms of other attributes (e.g. media, mode, content, subject).
Example kernel record:
DOI: /ISBN
DOI Genre: Book
Identifier: ISBN
Title: Two for the dough
Type: Manifestation
Mode: Visual
Primary Agent: Janet Evanovich
Agent Role: Author
DOI Kernel Extensions IDF to develop an extended catalogue for all extended metadata requirements from indecs-based models and vocabulary, along these lines... DOI UASet Identifier(s) Title(s) + Types, Languages Primary type Mode Media Encoding Form(s) Subject(s) Content Language + Use Type Measures + Units of Measure Content Creations Content Link Sequence, Measure Related Creations + Link Type Creation Event + Type Primary Agent + Agent Role + Tool Source Creation Date(s) Location(s) Availability Event + Type Agent + Agent Role Date(s) Location(s) Price + Type
DOI Kernel as the basis of each app. profile Each Profile can be thought of as built from the kernel + extensions: DOI AP metadata for application Compulsory kernel for any DOI
Each DOI-AP can be thought of as built from the kernel + extensions... But the kernel is actually what several APs have in common (compare the different views of a person): Son Legal person Agent Alien Scholar Library user Composer Credit card holder Shoe purchaser Author Lottery entrant Hospital patient Citizen Car driver Rights owner Marathon runner Software licensee Parent Tax payer Club member e-consumer Bank account holder Husband Charity giver Hotel guest Speeding ticket recipient DisneyWorld visitor Frequent Flyer Concert-goer Passenger Employee Voter Dog owner DOI Kernel as the basis of each application set
This kernel cannot be logically defined from first principles. In the absence of existing Application Profiles to define this overlap (= kernel), we have made a reasonable estimate from the logical analysis of indecs. DOI Kernel as the basis of each Application
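The "overlap = kernel" idea can be sketched as a set intersection. The Python below is illustrative only: the three profiles and their element lists are hypothetical, and the presentation itself says the real kernel was estimated by logical analysis because no Application Profiles yet existed to intersect.

```python
def estimate_kernel(profiles):
    """Return the elements common to every profile (empty set if none given)."""
    element_sets = [set(elements) for elements in profiles.values()]
    if not element_sets:
        return set()
    return set.intersection(*element_sets)


# Hypothetical profiles, invented for this sketch.
profiles = {
    "Book": {"DOI", "Title", "Type", "Mode", "Subject"},
    "Audio": {"DOI", "Title", "Type", "Mode", "Measures"},
    "Journal article": {"DOI", "Title", "Type", "Mode", "Subject", "Content Language"},
}

kernel = estimate_kernel(profiles)
```

Here "Subject" drops out because the Audio profile lacks it, which mirrors the point that the kernel is only what all views share.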
DOI-APs: all metadata in well-formed structure. [Diagram labels: DOI AP 1; DOI AP 2; DOI AP 3; metadata for AP; kernel for any DOI]
[Diagram: analysis and DOI = kernel, DOI-AP. Labels: Creation identified by DOI; Primary agent; Agent role; Creating events; Using events; Relations; Attributes; Content creations; Related Creations; names (titles); identifiers; labels; quantities; mode; situations; types; language; continuity; measures; qualities; events; IP Rights statement; IP entity; IP type; Source; infixion; number; currency; format; genre; audience; origination; agent, time, place, tool; agent, time, place, price]
Metadata declarations WHAT: Base kernel metadata must be declared. DOI-AP-specific metadata is a matter for the DOI User Community (Governance Group/Registration Agency) to decide. HOW: Either local webpage or central repository or both (as decided by User Community rules). Automated access to metadata declaration via Handle data types? XML schemas.
Roles of declared metadata = Functional specification of the DOI kernel (a) to assign a unique DOI to the creation [DOI] (b) to link the DOI to the principal local identifier of a creation (if any) to enable the integration of DOI-related applications and metadata with others [Identifier] (c) to enable a searcher or application to identify the creation by its most common name and the party or parties responsible for its creation or publication [Title, Primary Agent, Agent Role]
Roles of declared metadata (continued) (d) to enable a searcher or application to distinguish the fundamental type of creation (abstract, physical, digital or spatio-temporal), and thereby also to distinguish between creations of different types with the same names and creators [Type] (e) to enable a searcher or application to distinguish the mode of the creation (visual, audio, etc.) [Mode] (f) to enable a searcher or application to determine to which DOI user/application set the creation belongs [DOI-AP].
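Roles (a) to (f) amount to a checklist of eight elements that a kernel declaration should carry. The Python below is a minimal sketch of such a check, assuming a plain dictionary as the declaration format. The function name, the dictionary layout, and the DOI and identifier values are placeholders invented for the example; the other values come from the book example in this presentation.

```python
# The eight kernel elements named in roles (a) to (f).
KERNEL_ELEMENTS = (
    "DOI", "DOI-AP", "Identifier", "Title",
    "Type", "Mode", "Primary Agent", "Agent Role",
)


def missing_kernel_elements(record):
    """List kernel elements that are absent or blank, in kernel order."""
    return [
        name for name in KERNEL_ELEMENTS
        if not str(record.get(name, "")).strip()
    ]


record = {
    "DOI": "10.0000/placeholder",          # hypothetical value
    "DOI-AP": "Book",
    "Identifier": "ISBN placeholder",      # hypothetical value
    "Title": "Two for the dough",
    "Type": "Manifestation",
    "Mode": "Visual",
    "Primary Agent": "Janet Evanovich",
    "Agent Role": "Author",
}

# A deliberately incomplete copy: blank Title, no Mode.
incomplete = {k: v for k, v in record.items() if k != "Mode"}
incomplete["Title"] = "  "
```

A registration workflow could refuse a declaration whenever this list is non-empty, leaving any profile-specific checks to the User Community.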
Handle and metadata Handle data types could create a way of processing metadata as a distributed database of services: e.g. Data types (and results) must be consistent, so the Handle data type vocabulary must be developed with great care within indecs-based model. Some data types could be application specific. etc.
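A handle can hold several typed values, which is what makes the "distributed database of services" reading possible. The Python below is a toy in-memory model of that idea, not the Handle System protocol or any real client library. The HandleRecord class, the KERNEL_METADATA type name, the handle and the URLs are all assumptions made for illustration.

```python
from collections import defaultdict


class HandleRecord:
    """Toy model of one handle holding several typed values."""

    def __init__(self, handle):
        self.handle = handle
        self._values = defaultdict(list)   # data type -> list of values

    def add(self, data_type, data):
        """Store one value under a data type; a type may hold many values."""
        self._values[data_type].append(data)

    def resolve(self, data_type=None):
        """Return all values, or only those of one data type."""
        if data_type is None:
            return {t: list(v) for t, v in self._values.items()}
        return list(self._values.get(data_type, []))


rec = HandleRecord("10.0000/placeholder")                      # hypothetical handle
rec.add("URL", "https://example.org/landing")                  # single redirection
rec.add("KERNEL_METADATA", "https://example.org/kernel.xml")   # hypothetical type
```

An application that only understands "URL" still gets a redirection, while one that knows the metadata type can ask for the kernel declaration. That is why the type vocabulary has to be consistent across applications.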
Metadata tasks for DOI
Mapping ONIX to indecs
– reconcile any differences
indecs data dictionary
– elements and iids tested in depth; for mappings
Maintaining iid registry
– database
– available to anyone building application schema, but need not be public
Applications based on iid registry
– technology tools to ease application set building
The DOI model: future extension. Identifier, Description, Action. 1. Developing rights management aspects of dictionary.
The DOI model: future extension. Identifier, Description, Action, Rights. Developing rights management aspects of dictionary: DOI for parties and events in future?
Conclusion: DOI as the Integrator "DOI is the most ambitious identifier in the history of the world." (G. Rust, 1998) But now several things are becoming established... DOI as the integrating digital identifier? ...it has a persistent, granular, flexible, unique identifier which can be a wrapper for other IDs (not competitive: it enhances legacy identifiers' functionality in d-commerce); ...a strong, established metadata model and vocabulary; ...a controlled but flexible development structure; ...it does not confuse names with addresses; ...it allows multiple, standardised automated actions. Nothing else comes close...