Digital Object Identifier workshop doi> Norman Paskin The International DOI Foundation
Background: why DOI What the DOI system consists of What DOI does DOI - outline of talk
Identifiers enable us to manage content. Physical world: ISBN, ISSN, ISMN, SICI, etc.: good systems for publishers. Digital world: ? URL? Poor systems for publishers (e.g. e-books); how to use existing identifier systems? Make WWW transactions as invisible as telephone transactions: machine to machine, not machine to people to machine. Background - why now?
Digital world enables both use and protection. Aim is to maximise value of information objects: - reduce copy infringement and - increase accessibility; - need to identify what it is you are managing. From mass production to mass customisation: - components must be clearly identifiable - and terms defined. The intellectual property background
International DOI Foundation: founded 1998 –following demonstration of prototype in 1997 Not-for-profit; paid membership support –similar principles to World Wide Web Consortium Open to all interested parties Democratic: board elected from members Full time staff (Director) 40+ organisations (growing) –Content owners (text publishers, music, etc ) –Technology companies –Content intermediaries (etc) DOI - organisation
Establish a way of identifying content in the digital environment –actionable identifier Which can be the basis of rights management –extensible; can be developed further DOI: aim
Identification of content - intellectual property in any form - precisely Actionable identification - automation; click to do something - services Interoperability, extensibility Open standard DOI requirements
Must be consistent. Must be extensible: technology changes (e.g. PC, netC, P2P, …?; e-books; WAP); multimedia is needed (e.g. music clip and image in e-book with web update: media convergence); applications cannot be known in advance. Key issues:
Activity tracking; Full implementation; Initial implementation; Single redirection (persistent identifier); Metadata; W3C, WIPO, NISO, ISO, UDDI etc.; Multiple resolution. A continuing development activity. DOI: development in three tracks
DOI: components An analogy: the telephone system
A number (or name) –assign a number to something –(compare: telephone number) A description –what the number is assigned to –(compare: directory entry) An action –make the number do something –(compare: the telephone system) Policies –how to get a phone number; billing –(compare: social structures) DOI: components
NUMBERING: syntax (/5678). DESCRIPTION: metadata, pieces of data which describe uniquely that which is identified. ACTION: resolution, a system able to link the number to something useful. POLICIES: deployment.
NUMBERING: any form of identifier. DESCRIPTION: framework; DOI can describe any form of intellectual property, at any level of granularity. ACTION: Handle resolution allows a DOI to link to any and multiple pieces of current data. POLICIES. doi> extensible
DOI syntax: how the number is made up - NISO standard (Z39.84) / = prefix (e.g. publisher, journal, etc) = suffix (combination is unique) Suffix can be anything (CrossRef example) An opaque string (a dumb number) –parts do not have separate meaning Permanent –stays the same if ownership or location changes 1. Numbering
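The prefix/suffix rule on this slide can be sketched in a few lines of Python. This is only an illustration: the `10.9999/...` value is a made-up example, and `Doi` and `split_doi` are hypothetical names, not part of any DOI tooling. The sketch splits at the first `/` and then leaves the suffix alone, since the slide describes it as an opaque string.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # identifies the assigner (e.g. publisher, journal)
    suffix: str  # chosen by the assigner; treated as opaque


def split_doi(doi: str) -> Doi:
    """Split a DOI string at the first '/' into prefix and suffix."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix pair: {doi!r}")
    return Doi(prefix, suffix)


# Made-up example value; the suffix may itself contain '/'.
example = split_doi("10.9999/journal/vol1/art-7")
print(example.prefix)  # 10.9999
print(example.suffix)  # journal/vol1/art-7
```

Nothing in the code interprets the suffix: even if it looks like a path, its parts carry no separate meaning to the system.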
What is numbered? Not as simple as you might think: 1. Not only digital files, but physical things and intangible things. 2. Not only things, but parts of things. 2. Description
Manuscript mss #ABC123 Not only digital things... paper journal/volume/page ISBN, ISSN, etc.
Components Book –Chapter Section –Figure Granularity Must be able to identify at whatever level is appropriate : functional granularity Not only things, but parts of things
Metadata is: data; relationships between data. - Book: ISBN (data) - Price: $12.95 (metadata) - Subject: Buenos Aires (metadata). One man's metadata is another man's data. Description is by metadata
Not sufficient to assign an identifier without specifying precisely what the entity is – a paper or a book is not precise – must be precise, because: In an automated world, that specification must be by metadata (able to be used by machines) In an interoperable world, that metadata must be –unambiguous (well-formed) –follow a data model (able to be used consistently by machines) Description is by metadata
Interoperability of data in e-commerce systems Broad in scope: generic intellectual property management –description, transaction, rights Based on tested real world models –CIS (music industry); IFLA (library cataloguing) Wide endorsement of this approach –see recent papers Lagoze, Caplan (links at Now in use in applications –note especially EPICS/ONIX dictionary Extensible, structured, open standard DOI used indecs framework
A few (7-8) key pieces of data –title, type of content, origin, etc –varies according to what is needed (video, book, etc) about the object –does not include rights metadata but interoperates with rights data –because based on same data model –uses the same terms to mean the same thing DOI Genre defines key metadata for a family –see DOI Handbook DOI kernel metadata
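As a rough illustration of a small, fixed set of descriptive fields, a kernel record might be modelled as below. The class and field names are assumptions made for this sketch; the authoritative list of kernel elements is in the DOI Handbook, not here. Rights data is deliberately left out, as on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KernelMetadata:
    """Hypothetical kernel record: a few key facts about the object."""
    doi: str
    title: str
    content_type: str  # e.g. "book", "video"
    origin: str
    # Genre-specific additions; no rights metadata is stored here.
    extra: Dict[str, str] = field(default_factory=dict)

    def missing_fields(self) -> List[str]:
        """Names of required kernel fields that were left empty."""
        required = ("doi", "title", "content_type", "origin")
        return [name for name in required if not getattr(self, name)]


record = KernelMetadata(
    doi="10.9999/123",  # made-up identifier
    title="An Example Book",
    content_type="book",
    origin="",
)
print(record.missing_fields())  # ['origin']
```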
Web Browser User etc. Actionable identifier Specified Action doi> / Actions
I have found what I want to link to, but: –I have a copy locally; or –I use an aggregator; or –The publisher provides alternative sources; or –I am linked to an authorised E-print archive; or –It is available in a public archive (etc) so I want to go to the appropriate copy –rights issues (access control) are implicit Example issue: getting the appropriate copy
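One way to picture the appropriate-copy problem: a single identifier has several known copies, and the user's context decides which one to return. The sketch below is an assumption-laden toy, not how any DOI service is implemented; the source labels, URLs and the `appropriate_copy` function are all invented for illustration.

```python
from typing import Mapping, Sequence

# Hypothetical copies of one item, keyed by the kind of source.
COPIES = {
    "local": "file:///library/cache/article.pdf",
    "aggregator": "http://aggregator.example.org/article",
    "publisher": "http://publisher.example.org/article",
}


def appropriate_copy(copies: Mapping[str, str],
                     preferences: Sequence[str]) -> str:
    """Return the first available copy in the user's order of preference."""
    for source in preferences:
        if source in copies:
            return copies[source]
    raise LookupError("no copy matches the user's context")


# A user whose library subscribes through an aggregator:
print(appropriate_copy(COPIES, ["aggregator", "publisher"]))
```

Access control is implicit in the preference list here: a source the user has no rights to would simply not appear in it.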
Open Standard using internet Distributed, scalable, fast and reliable In use now in several places (e.g. Lib. of Congress) Very simple concept, powerful applications Fits with other standards (URL, URN, etc) Associates a name with values (e.g. URL) –input DOI –output URL (or some other defined value) Work by CNRI (Robert Kahn) DOI uses Handle System ®
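The "input DOI, output URL" idea can be mimicked with an in-memory lookup table. This toy is not the Handle System or its API; `ToyResolver`, the DOI and the URLs are invented for the sketch. It shows only the property that matters: the name stays fixed while the value it resolves to is updated.

```python
class ToyResolver:
    """In-memory stand-in for a name-to-value resolution service."""

    def __init__(self):
        self._table = {}

    def register(self, doi, url):
        """Create or update the current URL for a DOI."""
        self._table[doi] = url

    def resolve(self, doi):
        """Return the current URL for a DOI, or raise KeyError."""
        return self._table[doi]


resolver = ToyResolver()
resolver.register("10.9999/123", "http://old.example.org/abc.doc")
print(resolver.resolve("10.9999/123"))  # http://old.example.org/abc.doc

# The content moves; only the stored value changes, not the identifier.
resolver.register("10.9999/123", "http://new.example.org/abc.doc")
print(resolver.resolve("10.9999/123"))  # http://new.example.org/abc.doc
```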
Global Handle System Web Browser Local Client DOI? URL abc abc.doc
Table (not preserved): handle data for DOIs /456 and /789, with columns DOI, Data type, Index and Handle data; data types shown include URL, MD, EM and IP. DOIs resolve to Typed Data: multiple typed values per DOI; extensible typing; query by type.
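A minimal sketch of typed resolution, assuming each DOI maps to a list of (index, type, data) values. The record contents, the `EMAIL` type label and the `query` function are hypothetical and are not taken from the Handle System's actual type registry or API.

```python
from typing import List, NamedTuple, Optional


class HandleValue(NamedTuple):
    index: int
    type: str
    data: str


# Made-up records: one DOI holding several typed values.
RECORDS = {
    "10.9999/456": [
        HandleValue(1, "URL", "http://example.org/a"),
        HandleValue(2, "URL", "http://mirror.example.org/a"),
        HandleValue(3, "EMAIL", "rights@example.org"),
    ],
}


def query(doi: str, value_type: Optional[str] = None) -> List[HandleValue]:
    """Return all values for a DOI, or only those of one type."""
    values = RECORDS.get(doi, [])
    if value_type is None:
        return list(values)
    return [v for v in values if v.type == value_type]


print([v.data for v in query("10.9999/456", "URL")])
# ['http://example.org/a', 'http://mirror.example.org/a']
```

Because types are plain labels, a new kind of value can be added to a record without changing the lookup code, which is the sense of "extensible typing" in this toy.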
For convenience we re-draw like this: INPUT: doi> /123. OUTPUT: URL, URL2, RAP, XYZ, etc.
DOI free to use –costs paid by assigner DOI applies to any Intellectual Property entity –copyright focus (Berne/WCT etc) Registration agencies to deal with assigning DOIs (and metadata/resolution) for publishers etc Business models determined by agencies Policies for agencies are now evolving 4. Policies
Digital Object Identifier: a unique persistent identifier… - of a piece of intellectual property - in any form (tangible, intangible) - defined by some key metadata - an opaque string e.g. DOI: /123 What is DOI?
resolvable.. - routing, via proven internet technology, to associated state data…. - one or more current values of specified types of data (e.g. URL); - these data may be, or link to, services What is DOI?
in an information management substrate… - once the (meta)data has been obtained, it can interoperate with other data - e.g. about context (subscription etc) - to construct services and transactions - because (meta)data follows a generic interoperable architecture What is DOI?
A unique resolvable identifier and multiple pieces of associated state data in an information management substrate achieved by: Technical implementation + policies Two underlying technical tools: 1. intellectual property: framework 2. resolution: Handle System What is DOI?
1. Identify the item of intellectual property not its location, because: if the location changes the identifier should stay the same (persistence) the same resource can be at several locations at the same time (multiple copies) DOI does this What are the advantages of DOI?
2. Able to deal with relationships: –this item is a manifestation of that work –this item is a part of that item DOI does this: Metadata can express relationships –is part of… etc DOIs can resolve to other DOIs What are the advantages of DOI?
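The "is part of" relationship can be pictured as links from one identifier to another. The sketch below assumes a simple mapping from each item to the item that contains it; the identifiers and the `containing_items` function are invented. The path-like suffixes are for readability only, since real suffixes are opaque and the relationship would be stated in metadata.

```python
from typing import Dict, List

# Hypothetical "is part of" metadata: item -> the item that contains it.
IS_PART_OF: Dict[str, str] = {
    "10.9999/book/ch2/sec1/fig3": "10.9999/book/ch2/sec1",
    "10.9999/book/ch2/sec1": "10.9999/book/ch2",
    "10.9999/book/ch2": "10.9999/book",
}


def containing_items(doi: str) -> List[str]:
    """Follow 'is part of' links upward, from nearest container to outermost."""
    chain: List[str] = []
    seen = {doi}
    while doi in IS_PART_OF:
        doi = IS_PART_OF[doi]
        if doi in seen:  # guard against a malformed, circular record
            raise ValueError(f"cycle in 'is part of' data at {doi!r}")
        seen.add(doi)
        chain.append(doi)
    return chain


print(containing_items("10.9999/book/ch2/sec1/fig3"))
# ['10.9999/book/ch2/sec1', '10.9999/book/ch2', '10.9999/book']
```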
3. Apply to any intellectual property entity –any format (digital convergence) –any granularity (any part of something) 4. Enable complex actions –can express relationships between entities –interact with data from other sources –enables services (automated, predictable) to be constructed What are the advantages of DOI?
5. Extensible resolution system has capability for trusted transactions (p.k.i.) metadata framework has capability for full rights management architecture 6. Not limited to current environments not just the Web (other Internet applications) not just digital (intangibles etc) What are the advantages of DOI?
Web Browser User URL 404 not found 1. URL is not a persistent identifier - it refers to Location, not content URL ? 2. Same content at two different URLs has two different identifiers - cannot use as common reference...has moved to… One in five Web links >1yr old may be out of date (Alta Vista) Identifiers on the web
Web Browser User URL 1. Don't change the URL; persistence is a social, not a technology, problem. People do change URLs; there are good reasons to change URLs; does not deal with multiple copies. Identifiers on the web
URL Web Browser User URL 2. Assign a Name and use http redirect name http Bookmarks and caches save the end point, not the name (in current browsers) does not deal with multiple copies Identifiers on the web
URL Web Browser User 3. Assign a Name and use resolver doi> DOI provides name URL Multiple resolution Identifiers on the web
Web Browser User URL Resolution 1. DOI is a persistent identifier DOI initial implementation 2. DOI identifies the content, irrespective of the location doi> /123
Web Browser User etc. URL URL2 Data 1 Data 2 Actionable identifier Multiple Resolution Full DOI implementation Identifier resolves to any piece of data doi> /123
Web Browser User etc. URL URL2 Data 1 Data 2 Actionable identifier Resolution service Specified Action doi> /123 Service /123