OAI 4, CERN: Issues in Managing Persistent Identifiers
Stuart Weibel, Senior Research Scientist, OCLC Online Computer Library Center
October, 2005

In the digital world…
Unambiguous identification of assets in digital systems is key, whether those assets are physical, digital, or conceptual. It underpins knowing you have what you think you have, comparing identity (whether two references point to the same thing), reference linking, and managing intellectual property.

What do we want from Identifiers?
Global uniqueness, authority, reliability, appropriate functionality (resolution and sometimes other services), and persistence throughout the life cycle of the information object.

The Identifier Layer Cake
Identifiers come in many sizes, flavours, and colours… what questions do we ask? The layers: technology (the Web: http… TCP/IP… future infrastructure?), functionality, application, policy, social, and business.

Social Layer
The only guarantee of the usefulness and persistence of identifier systems is the commitment of the organizations that assign, manage, and resolve identifiers. Who do you trust? Governments? Cultural heritage institutions? Commercial entities? Non-profit consortia? We trust different agencies for different purposes at different times.

Business layer
Who pays the cost? How, and how much? Who decides (see governance model)? The problem with identifier business models: those who accrue the value are often not the same as those who bear the costs. You probably can't collect revenue for resolution. Identifier management generally needs to be subsidiary to other business processes.

Policy Layer
Who has the right to assign or distribute identifiers? Who has the right to resolve them or offer services against them? What are appropriate assets for which identifiers can be assigned, and at what granularity? Can identifiers be recycled? Can ID-asset bindings be changed? Is there supporting metadata, and if so, is it public, private, or indeterminate? Is there a governance model?

Application Layer
What underlying dependencies are assumed? http… TCP/IP… (bar code|RFID) scanners… What is the nature of the systems that support assignment, maintenance, and resolution of identifiers? Are servers centralized, federated, or peer to peer? How is uniqueness assured?

Functional Layer: Operational characteristics of Identifiers
Is it globally unique? (easy) What is the means for matching persistence with the need? Can a given identifier be reassigned? Is it resolvable? To what? How does it behave? What applications recognize it and act on it appropriately? Is the name portion of the identifier opaque, or can it carry semantics? Do humans need to read and transcribe them? Do identifiers need to be matched to the characteristics of the assets they identify?
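The reassignment question above can be made concrete with a small registry that refuses to rebind an identifier, even after its asset has been withdrawn. This is a minimal sketch, not any particular system's design; the class name `IdentifierRegistry` and its methods are hypothetical.

```python
class IdentifierRegistry:
    """Hypothetical registry: each identifier is bound once and never reassigned."""

    def __init__(self):
        self._bindings = {}    # identifier -> asset description
        self._retired = set()  # withdrawn identifiers, kept reserved forever

    def assign(self, identifier, asset):
        # Uniqueness and non-reassignment are both enforced here.
        if identifier in self._bindings or identifier in self._retired:
            raise ValueError(f"{identifier!r} has already been assigned")
        self._bindings[identifier] = asset

    def retire(self, identifier):
        # Withdraw the binding (KeyError if it was never bound),
        # but keep the identifier out of circulation.
        self._bindings.pop(identifier)
        self._retired.add(identifier)

    def lookup(self, identifier):
        # Returns None for unknown or retired identifiers.
        return self._bindings.get(identifier)
```

Keeping retired identifiers in a separate set is one way to answer "can identifiers be recycled?" with a firm no; a policy that allowed recycling would simply drop that check.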

Technology layer: The Web
Some fundamental questions: Must our identifiers be URIs (URLs, really)? Must they be universally actionable? If so, what is the desired action? Is there ever a reason to use a URI other than an http-URI as an identifier?

Pure Identifiers versus pure Locators
But locators and identifiers are not the same… or are they? In Web space, they are close: not every identifier is a locator, but every locator is an identifier. Google-like search makes non-locator identifiers pretty good locators as well. Debates about the purity of identifiers and locators are ideological and unhelpful.

How we got here
In the beginning, there was DNS. TimBL begat URLs (within meters of where we stand). Uniform Resource Identifiers come in several kinds: URLs (locators), using a variety of schemes mostly grandfathered from the pre-Web Internet; URNs (names, or identifiers); and IRIs (a URI that knows the world has more than one character set… but talk is cheap). URI = SCHEME, HOST, and PATH (the global file system).
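The SCHEME, HOST, and PATH decomposition can be seen with Python's standard `urllib.parse.urlsplit`. The helper name `scheme_host_path` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def scheme_host_path(uri):
    """Return the (scheme, host, path) parts of a URI."""
    parts = urlsplit(uri)
    # hostname is None when the URI has no authority ("//host") component.
    return parts.scheme, parts.hostname, parts.path


print(scheme_host_path("http://www.example.org/reports/identifiers.html"))
print(scheme_host_path("info:ddc/22/eng//"))
```

The first call yields `('http', 'www.example.org', '/reports/identifiers.html')`: the host names a server, so the identifier depends on that domain. The second yields `('info', None, 'ddc/22/eng//')`: a URI can have a scheme and a path with no host at all.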

URI Schemes (as of )
ftp (File Transfer Protocol), http (Hypertext Transfer Protocol), gopher (The Gopher Protocol), mailto (Electronic mail address), news (USENET news), nntp (USENET news using NNTP access), telnet (Reference to interactive sessions), wais (Wide Area Information Servers), prospero (Prospero Directory), z39.50s (Z39.50 Session), z39.50r (Z39.50 Retrieval), cid (content identifier), mid (message identifier), vemmi (versatile multimedia interface), service (service location), imap (internet message access protocol), nfs (network file system protocol), acap (application configuration access protocol), rtsp (real time streaming protocol), tip (Transaction Internet Protocol), pop (Post Office Protocol v3), data, dav, opaquelocktoken, sip (session initiation protocol), sips (secure session initiation protocol), tel (telephone), fax, modem, ldap (Lightweight Directory Access Protocol), https (Hypertext Transfer Protocol Secure), soap.beep, soap.beeps, xmlrpc.beep, xmlrpc.beeps, urn (Uniform Resource Names), go, h323 (H.323), ipp (Internet Printing Protocol), tftp (Trivial File Transfer Protocol), mupdate (Mailbox Update (MUPDATE) Protocol), pres (Presence), im (Instant Messaging), mtqp (Message Tracking Query Protocol), iris.beep, dict (dictionary service protocol), snmp (Simple Network Management Protocol), crid (TV-Anytime Content Reference Identifier), tag.
Reserved URI Scheme Names: afs (Andrew File System global file names), tn3270 (Interactive 3270 emulation sessions), mailserver (Access to data available from mail servers).
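Scheme names such as `z39.50s`, `soap.beep`, and `h323` look unusual but are syntactically ordinary: RFC 3986 (section 3.1) allows a letter followed by any mix of letters, digits, `+`, `-`, and `.`. The sketch below checks only that syntax; it says nothing about whether a scheme is registered. The function name is hypothetical.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def is_valid_scheme_name(name):
    """True if name is syntactically a URI scheme (not necessarily a registered one)."""
    return SCHEME_RE.fullmatch(name) is not None
```

Every name in the list above passes this check, while a name starting with a digit or containing an underscore does not.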

But what can you really count on?
HTTP-based URIs (URLs) are what we can count on today. Current URI registration procedures are unworkable: there is a scarcity of expertise, and the process is "techeological" (strong ideologies are embedded in it). New URI scheme registration standards are in the pipeline… will they help or hinder?

Arguments for http-based identifiers
Application ubiquity: every Web application recognizes them, and achieving similar ubiquity for other URI schemes is very difficult. Actionable identifiers are good; immediacy is a virtue. If the Web is displaced, everyone has the problem of coping; if you invent your own solution and it is displaced, you are isolated. Using non-ubiquitous identifiers will make it harder to maintain persistence over time by complicating the technical layer, which will compromise the ability to sustain long-term institutional commitments.

Internet Space/time continuum (figure: Andy Powell, UKOLN; axes: Internet space and time; "my application" shown against several "other application"s): applications that are distant are less likely to share understanding about identifiers; knowledge is locked within domains or lost over time, or, worse, both.

Arguments for NON http-URIs as identifiers
Separation of IDENTITY and RESOLUTION is a small but important component of a complete naming architecture, and it is poorly accommodated in current Web architecture. URLs make a promise: click here for resolution. Sometimes you DON'T want resolution, or you want context-dependent action, and it is not always clear what the action should be. It is difficult to avoid branding in locators, and branding changes, threatening identifier persistence.
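One way to picture the separation of identity and resolution is a resolver table: the identifier never changes, while the location it resolves to (and, optionally, the location chosen for a given context) can be updated independently. This is a hypothetical in-memory sketch, loosely in the spirit of redirect services such as PURLs, not a description of any deployed resolver; the class, the identifier `info:example/item-42`, and all URLs are made up.

```python
class Resolver:
    """Hypothetical resolver: identity is fixed, resolution is a mutable table."""

    def __init__(self):
        # identifier -> {context: location}; "default" is the fallback context
        self._table = {}

    def bind(self, identifier, location, context="default"):
        # Rebinding changes where the identifier resolves, not the identifier itself.
        self._table.setdefault(identifier, {})[context] = location

    def resolve(self, identifier, context="default"):
        targets = self._table.get(identifier)
        if targets is None:
            raise KeyError(f"unknown identifier: {identifier}")
        # Use the context-specific location if one exists, else the default
        # (None if neither was ever bound).
        return targets.get(context, targets.get("default"))


r = Resolver()
r.bind("info:example/item-42", "http://old-host.example.org/item42")
# The hosting organization changes; the identifier does not.
r.bind("info:example/item-42", "https://new-host.example.org/items/42")
# A context-dependent target, e.g. a locally licensed copy.
r.bind("info:example/item-42", "https://library.example.edu/copy/42", context="campus")
```

Callers that stored `info:example/item-42` never notice the move from the old host; only the table entry changes.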

Resolution of a conceptual asset can be problematic
Conceptual assets should be inherently language independent, yet the same concept carries different labels in different language versions: Vietnamese War, DDC/22/eng// (English language version of DDC 22); American War, DDC/22/vie// (Vietnamese language version of DDC 22).

Business Models may militate in favor of separating identity and resolution
Content owners and managers may want to expressly decouple identity and resolution, as in the Appropriate Copy Problem (e.g., reference linking of scholarly publishing content across subscription agencies). Identifiers that embed domain servers (including most http-URIs) are likely to degrade over time due to business consolidations. URIs are global file system identifiers, and file systems change. Web naming architectures should neither enforce nor prevent any given business model.

The "info" URI Scheme for Information Assets with Identifiers in Public Namespaces
Internet Draft by Herbert Van de Sompel, Tony Hammond, Eamonn Neylon, and Stuart L. Weibel. Its aims: separate resolution from identity; provide a missing part of the naming architecture of the Web; bridge legacy identifiers and the Web; serve as the basis for the naming architecture of OpenURL. Is it a (registered) URI scheme?

INFO URIs (continued)
There is controversy about separating identity and resolution; IETF resistance has been substantial. Adoption and use will determine its future: will adopters find it provides sufficient additional value to offset the cost of adoption? Early registrants: OpenURL, LCCN, DOI, OCLC, PubMed, OCLC SRW Web Services, GenBank, Fedora, SICI, Astrophysics Bibcodes, National Library of Australia.

What does an info URI look like?
info:ddc/22/eng//
"info:" specifies the info namespace, or scheme. The namespace token (ddc/ in this case) is a registered namespace or brand within the scheme. Everything that follows is at the discretion of the namespace authority that manages a given registered namespace (and conforms to URI encoding standards). There is no implication of resolution, though services (including resolution) can clearly be expected to emerge if info achieves wide use.
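The structure just described (scheme, namespace token, then an identifier left to the namespace authority) can be split mechanically. This sketch assumes the form `info:<namespace>/<identifier>`, ignores fragments, and does not check the namespace against any registry; `parse_info_uri` is a hypothetical helper.

```python
def parse_info_uri(uri):
    """Split an info URI into (namespace, identifier).

    Assumes the form info:<namespace>/<identifier>. Everything after the
    first "/" is returned unchanged, because its syntax belongs to the
    namespace authority rather than to the scheme.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "info":  # URI schemes are case-insensitive
        raise ValueError(f"not an info URI: {uri!r}")
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace:
        raise ValueError(f"missing namespace token: {uri!r}")
    return namespace, identifier
```

`parse_info_uri("info:ddc/22/eng//")` returns `('ddc', '22/eng//')`. Nothing in the result says how, or whether, the identifier resolves.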

Opaque versus Semantic Identifiers
Should identifiers carry semantics? People like semantic identifiers, but semantic drift can be a problem, semantics can compromise persistence, and semantics are culturally laden.
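The contrast between opaque and sequential identifiers fits in a few lines: a randomly drawn identifier reveals nothing, not even the order of assignment, while a counter leaks sequence. Both generators are hypothetical sketches. The vowel-free alphabet (which also omits "l") is an assumption chosen here to avoid accidentally spelling words and to reduce transcription errors.

```python
import itertools
import secrets

# Assumed alphabet: digits plus consonants, no vowels and no "l".
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def opaque_id(length=8):
    """Random identifier: no meaning and no inferable sequence."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


_counter = itertools.count(1)


def sequential_id():
    """Sequential identifier: each value reveals its order of assignment."""
    return str(next(_counter))
```

A real opaque scheme would also need to guarantee uniqueness (for example, by checking each new value against a registry); random draws alone only make collisions unlikely.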

Varieties of semantics
Opaque: nothing can be inferred, including sequence, and the identifier cannot be reverse-engineered (feature or bug?). See ARKs, California Digital Library (John Kunze).
Low-resolution date semantics: LCCN.
Encoded semantics: ISBN (country codes… agency codes… checksums…).
Sequential semantics: OCLC numbers.
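The checksum part of ISBN's encoded semantics is easy to show. For an ISBN-13, the first twelve digits are weighted alternately 1 and 3, and the check digit brings the weighted sum up to a multiple of 10. A minimal sketch; the function names are hypothetical.

```python
DIGITS = "0123456789"


def isbn13_check_digit(first12):
    """Compute the ISBN-13 check digit from the first twelve digits."""
    if len(first12) != 12 or not all(c in DIGITS for c in first12):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn):
    """Check an ISBN-13, ignoring hyphens and spaces."""
    s = isbn.replace("-", "").replace(" ", "")
    if len(s) != 13 or not all(c in DIGITS for c in s):
        return False
    return isbn13_check_digit(s[:12]) == int(s[12])
```

`is_valid_isbn13("978-0-306-40615-7")` returns `True`; changing the final digit to 8 makes it `False`. The checksum catches transcription errors but carries no meaning about the book itself.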

More Varieties
Domain branding.
Functional branding: common behaviors established in the social or policy layers (DOIs).

Encodings matter
The DOI /182 can be encoded as a URI in several ways: doi: /182, urn:doi: /182, info:doi: /182. Which of these is a registered URI? Which is understood by all Web applications? Which is most useful?
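Any service that compares or resolves DOIs has to treat all of these encodings as the same name. This sketch strips a known prefix to recover the bare DOI, compares DOIs case-insensitively (DOI names are case-insensitive for ASCII characters), and produces the HTTP proxy form. The prefix list is an assumption: it covers the forms listed on this slide, an `info:doi/` form following the info pattern `info:<namespace>/<identifier>`, and the `doi.org` proxy. The DOI `10.9999/example.182` and the function names are hypothetical.

```python
# Assumed set of encodings; longer and more specific prefixes come first.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://dx.doi.org/",
    "info:doi/",
    "info:doi:",
    "urn:doi:",
    "doi:",
)


def bare_doi(uri):
    """Strip a recognized prefix and return the DOI name itself."""
    for prefix in DOI_PREFIXES:
        if uri[: len(prefix)].lower() == prefix:
            return uri[len(prefix):]
    raise ValueError(f"unrecognized DOI encoding: {uri!r}")


def same_doi(a, b):
    """Compare two encoded DOIs, ignoring ASCII case."""
    return bare_doi(a).upper() == bare_doi(b).upper()


def as_http_uri(uri):
    """Re-encode any recognized form as an actionable http-URI."""
    return "https://doi.org/" + bare_doi(uri)
```

`same_doi("urn:doi:10.9999/example.182", "info:doi/10.9999/EXAMPLE.182")` is `True`. Every recognized form can then be re-expressed as an http-URI, the one encoding that every Web application understands.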

Recommendations and Conclusions
Be wary (but not ideological) about semantics in identifiers. Deviate from widely adopted standards at your own risk (and risk to your constituents). There be dragons beyond the safe seas of HTTP. Technology will not save us: institutional commitment is key.