New approaches to the catalog T. Hickey Svensk Biblioteksförening 2005 October 28.



OCLC
• Founded 1967
• Nonprofit membership organization
• > 53,000 libraries
• 96 countries
• ~1,000 employees
• Cataloging
• Interlibrary Loan
• Preservation
• Dewey Decimal Classification
• netLibrary
• FirstSearch

OCLC Research
• Research for both
  - OCLC services
  - Membership
• Metadata management
• Knowledge organization
• Content management
• Interoperability
• Systems & interaction design
• ~30 employees

What do users want?
• The right information – with minimum effort

How to give them what they want
• Catch them where they are
• Increase our data
• Improve our data
• Make the data work harder
• Interconnect with other systems
• Do all this efficiently

What has changed
• Computers and telecommunications
  - User expectations
  - Digital materials
  - Remoteness of our users
  - Huge amounts of bandwidth, storage

The competition
• Online booksellers
  - Reviews
  - Tables of contents
  - Excerpts
  - Inside-the-book searching
• Web search engines
  - Speed
  - Full-text searching
  - Global coverage (of web resources)
  - Good enough
• Ourselves
  - Electronic journals

Current projects (my group)
• Live search
• Registries, PURLs
• Dewey browser
• Harvesting, electronic theses
• VIAF, LAF
• SRU/W, OpenURLs, OAI
• FRBR, xISBN
• Beowulf cluster
• Map-reduce
• Text searching
• Batch loading
• Open WorldCat
• WorldCat Wiki
• Publisher Names
• MXG

Other Research Projects
• FictionFinder, Curiouser
• Schema Transformation
• Terminology Services
• Digital Preservation
• Collection Analysis
• Dublin Core
• FAST
• User Studies
• Data mining
• Also:

Catch them where they are
• Google, Yahoo, etc.
  - Open WorldCat
  - Open URL
  - OAI-PMH
• Creation too
  - WCat Wiki
  - Tags?

Open WorldCat

Editions

OpenURL
• OpenURL registry
  - Supports version 1.0
  - Also registry of OpenURL servers
  - Used for WikiD

WorldCat ‘Wiki’
• Opening up WorldCat to user annotations
  - Reviews
  - Notes
  - Tables of contents
  - Cover art?
  - Book lists?
• Based on WikiD software
  - Full Wiki
  - Many features turned off for WorldCat
  - Uses OpenURL 1.0 protocol internally
  - Allows collections of pages of arbitrary XML schemas
  - Tools for the creation of simple collections
• Doesn’t look like a Wiki

Reviews

Tags?
• Folksonomies?
• User-generated key words
• We’ve been here before
  - Is it different?
  - Is there another direction?

Opening Dewey

More data
• Harvesting
  - OAI-PMH
  - ETDs
• Batch load
  - 60 million records
  - 3 million new manifestations
• Other
  - Cover art
  - Reviews
  - WC

Better data and organization
• VIAF
• FRBR
• Authority files in general
  - LAF
  - Publisher names
  - Genre
  - FAST
• Registries
  - PURLs
  - Generalized solution?
  - Get them nearer to creation

FRBR
• Work-set algorithm
  - Keys based on author/title
  - Authority files
  - Auxiliary authority files
  - xISBN
• Used for
  - xISBN
  - Open WorldCat
  - FirstSearch (coming)
  - Collection analysis (coming)
  - Research
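The "keys based on author/title" idea can be sketched roughly as follows. This is only an illustration, not OCLC's work-set algorithm: the normalization rules, the `work_key` format, and the sample records (with made-up record IDs) are all assumptions, and the authority-file lookups mentioned on the slide are left out.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(author: str, title: str) -> str:
    """Build a hypothetical author/title key used to cluster records into works."""
    return f"{normalize(author)}/{normalize(title)}"


def group_into_work_sets(records):
    """Group record IDs by work key; each group approximates one FRBR work."""
    work_sets = defaultdict(list)
    for rec in records:
        work_sets[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(work_sets)


# Illustrative records only; the IDs are placeholders, not real identifiers.
records = [
    {"id": "rec-1", "author": "Lindgren, Astrid", "title": "Pippi Långstrump"},
    {"id": "rec-2", "author": "LINDGREN, ASTRID.", "title": "Pippi Langstrump"},
    {"id": "rec-3", "author": "Lagerlöf, Selma", "title": "Gösta Berlings saga"},
]
work_sets = group_into_work_sets(records)
```

An xISBN-style lookup would then amount to finding the work set that contains a given identifier and returning its siblings.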

Authority Files
• LAF
• Publisher names
  - Not normally controlled
  - Looking for variations with ISBN prefixes
  - Also worked with dissertations

VIAF
• Merge national-level files
• Library of Congress (NACO) and Die Deutsche Bibliothek
  - Bibliographic records analyzed
  - 15% would be erroneous based just on names
• Basic matching now completed
  - 435,000 matching names
  - < 1% mismatched
• Working on
  - Public interface
  - OAI harvesting
  - Persistent identifiers

Maj

Registries
• Show relationships between metadata
• Often associated with an identifier
• General solution?
• Examples
  - Authority files
  - WorldCat
  - PURLs

• Persistent URLs
  - Map one URL to another ->
  - ,000+ PURLs
  - 111 million resolutions
• Port to Wiki’D platform?
• String of PURL servers?
  - Use OAI-PMH for synchronization
  - Spread responsibility
• Generalized solution?
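"Map one URL to another" can be pictured as a lookup table sitting behind an HTTP redirect. The sketch below is a toy in-memory model, not the PURL server software: the class name, the paths, the `example.org` targets, and the choice of a 302 status are illustrative assumptions.

```python
class PurlResolver:
    """Toy persistent-URL table: a stable path maps to a changeable target."""

    def __init__(self):
        self._targets = {}
        self.resolutions = 0  # running count of resolved requests

    def register(self, purl_path: str, target_url: str) -> None:
        """Create or update a PURL; the public path never changes."""
        self._targets[purl_path] = target_url

    def resolve(self, purl_path: str):
        """Return (status, location) as a redirecting server might (assumed 302)."""
        target = self._targets[purl_path]
        self.resolutions += 1
        return 302, target


resolver = PurlResolver()
resolver.register("/demo/report", "https://old-host.example.org/report.html")
first = resolver.resolve("/demo/report")

# The resource moves; only the table entry changes, not the published PURL.
resolver.register("/demo/report", "https://new-host.example.org/docs/report.html")
second = resolver.resolve("/demo/report")
```

A string of cooperating servers, as the slide suggests, would need to keep copies of this table in step, which is where a synchronization protocol such as OAI-PMH could come in.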

More connectivity
• Open URL
• RSS feeds
• OpenSearch, SRU/W
• OAI-PMH

OpenURL
• Developed to address the ‘appropriate copy’ problem
• Transitioning to OpenURL 1.0
• OpenURL resolver
  - Accepts requests specifying
    - Resource
    - Services
• Generalized syntax
  - Specifying a resource
  - Services to be performed
• Metadata elements specified in registry
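A request that names a resource and a service might look like the sketch below, using the key/encoded-value (KEV) style of OpenURL 1.0. The resolver address, the ISBN, and the title are placeholders; the key names follow the registered book and scholarly-service formats as I understand them, so check them against the registry before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical resolver address; a real one comes from the user's library.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(isbn: str, title: str) -> str:
    """Build an OpenURL 1.0 KEV request describing a book and a wanted service."""
    params = {
        "url_ver": "Z39.88-2004",                        # OpenURL 1.0 version tag
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",      # referent described as a book
        "rft.isbn": isbn,
        "rft.btitle": title,
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:sch_svc",   # service type format (assumed)
        "svc.fulltext": "yes",                           # ask for full text
    }
    return f"{RESOLVER_BASE}?{urlencode(params)}"


# Placeholder values, not a real book.
url = build_openurl("0000000000", "Example title")
parsed = parse_qs(urlsplit(url).query)
```

The same description can be sent to any resolver, which is what lets each library route the user to its own appropriate copy.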

SRU
• Simplified version of Z39.50
  - Web based
  - SRW – SOAP
  - SRU – URL
• Even simpler?
  - OpenSearch
  - No search syntax
  - Looking for common ground
• MXG
  - Metasearch XML Gateway
  - Simplifies metasearchers’ lives
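Because SRU puts the whole search into a URL, a request can be built with nothing more than query-string encoding. A minimal sketch, assuming SRU 1.1 parameter names and a hypothetical server address; the CQL index `dc.title` is only an example and depends on what a given server supports.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(base_url: str, cql_query: str, start: int = 1, maximum: int = 10) -> str:
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # CQL, e.g. index = "term"
        "startRecord": start,        # 1-based position of the first record wanted
        "maximumRecords": maximum,   # page size
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; substitute a real SRU server.
sru_url = build_sru_url("https://sru.example.org/catalog", 'dc.title = "catalog"')
sru_params = parse_qs(urlsplit(sru_url).query)
```

SRW carries the same request inside a SOAP envelope, and OpenSearch drops the query language altogether in favor of a plain search-terms parameter.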

OAI-PMH
• Method of harvesting metadata
  - More generally, a way of synchronizing databases
• No real restriction to metadata
• Becomes a repository protocol
  - Identifiers
  - Timestamps
• Layered implementation
  - OAI
  - SRU
  - Pears
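The "synchronizing databases" reading rests on two things every OAI-PMH record carries: an identifier and a datestamp. The sketch below builds an incremental `ListRecords` request and applies an already-parsed batch to a local copy. The endpoint, the record dictionaries, and the in-memory store are assumptions for illustration; XML parsing and network access are left out.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


def apply_harvest(local_store, harvested):
    """Upsert or delete by identifier; return the newest datestamp seen."""
    latest = None
    for rec in harvested:
        if rec.get("deleted"):
            local_store.pop(rec["identifier"], None)
        else:
            local_store[rec["identifier"]] = rec["payload"]
        if latest is None or rec["datestamp"] > latest:
            latest = rec["datestamp"]
    return latest


# Hypothetical local copy and harvested batch.
store = {"oai:example.org:1": "old title", "oai:example.org:2": "to be removed"}
batch = [
    {"identifier": "oai:example.org:1", "datestamp": "2005-10-01", "payload": "new title"},
    {"identifier": "oai:example.org:2", "datestamp": "2005-10-03", "deleted": True},
    {"identifier": "oai:example.org:3", "datestamp": "2005-10-02", "payload": "added"},
]
checkpoint = apply_harvest(store, batch)
next_url = list_records_url("https://repo.example.org/oai", from_date=checkpoint)
```

Nothing here depends on the payload being metadata, which is the sense in which the protocol can serve as a general repository protocol.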

Efficient processing
• Beowulf cluster
• Map reduce
• Text searching

Beowulf Cluster
• 24 nodes
  - 2 processors, 4 gigabytes of RAM, 120 gigabytes disk
  - Gigabit network
• Use it for
  - FRBR processing
  - Text indexing
  - Text searching
• ~ 30-fold speed up on many tasks
  - 1 year ⇒ 2 weeks
  - 1 week ⇒ 1 day
  - 1 day ⇒ 1 hour
  - 1 hour ⇒ 2 minutes
• Extremely cheap processing
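As a quick check on the rules of thumb above, the before/after pairs can be turned into ratios. This is plain arithmetic on the slide's own figures, assuming a 365-day year; it suggests the pairs are rounded to convenient units, with "~30-fold" at the upper end.

```python
# Unit lengths in minutes (assuming a 365-day year).
MINUTES = {"minute": 1, "hour": 60, "day": 24 * 60, "week": 7 * 24 * 60, "year": 365 * 24 * 60}

# (before, after) pairs as listed on the slide.
PAIRS = [
    ((1, "year"), (2, "week")),
    ((1, "week"), (1, "day")),
    ((1, "day"), (1, "hour")),
    ((1, "hour"), (2, "minute")),
]


def speedup(before, after) -> float:
    """Ratio of the original running time to the running time on the cluster."""
    (n_before, unit_before), (n_after, unit_after) = before, after
    return (n_before * MINUTES[unit_before]) / (n_after * MINUTES[unit_after])


ratios = [round(speedup(before, after), 1) for before, after in PAIRS]
```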

Map reduce
• Pioneered by Google
  - Petabytes of data on thousands of nodes
• Adapted to our cluster
  - Tens of gigabytes of data on dozens of nodes
• Simple functional programming paradigm
• Allows batch processing across cluster
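The paradigm itself fits in a few lines: a map function emits key/value pairs, the framework groups values by key, and a reduce function folds each group. The sketch below runs in a single process and is not the cluster implementation described on the slide; the function names and sample titles are assumptions. On a cluster, the map calls and the per-key reductions would be spread across nodes.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def title_words(title):
    """Map step: emit (word, 1) for every word in a title."""
    for word in title.lower().split():
        yield word, 1


def total(_key, values):
    """Reduce step: sum the counts collected for one word."""
    return sum(values)


# Illustrative input only.
titles = ["The library catalog", "A future for the library catalogue", "The catalog"]
counts = map_reduce(titles, title_words, total)
```

Because the mapper and reducer have no shared state, the same two functions work unchanged whether the grouping happens in memory or across machines.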

Text Searching
• Spread database across cluster
• Two levels of aggregation
  - 3 servers/node
  - 24-way aggregation
  - Aggregators run across cluster
• SRU used
  - HTTP based
  - SRW (SOAP) slowed it down
• Open source software
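The two levels of aggregation can be sketched as a scatter/gather: each node merges the hits from its own servers, then a top-level aggregator merges the per-node results. Everything here is an in-process stand-in under stated assumptions: the term-count scoring, the tiny indexes, and the function names are hypothetical, and the real system would send each sub-query over SRU rather than call a function.

```python
import heapq


def search_server(index, term):
    """Search one database slice; score is the number of times the term occurs."""
    hits = []
    for doc_id, text in index.items():
        score = text.lower().split().count(term.lower())
        if score:
            hits.append((score, doc_id))
    return hits


def aggregate(hit_lists, limit):
    """Merge several hit lists and keep the best `limit` hits."""
    merged = [hit for hits in hit_lists for hit in hits]
    return heapq.nlargest(limit, merged)


def search_cluster(nodes, term, limit=10):
    """First merge within each node, then merge the per-node results."""
    per_node = [
        aggregate([search_server(index, term) for index in servers], limit)
        for servers in nodes
    ]
    return aggregate(per_node, limit)


# Two nodes with two servers each (the slide describes 24 nodes with 3 servers each).
nodes = [
    [{"a1": "dewey decimal classification", "a2": "library catalog catalog"},
     {"a3": "catalog of maps"}],
    [{"b1": "union catalog"}, {"b2": "interlibrary loan"}],
]
top_hits = search_cluster(nodes, "catalog", limit=2)
```

Trimming to `limit` at the node level keeps the traffic to the top aggregator small, since no node ever needs to forward more hits than the final answer can hold.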

Better interfaces
• More interactive
  - Live search
  - Dewey Browser
• Better connected

Post-coordination of Services
• Systems that expose low level services
• Higher level coordination of those services
• Loosely coupled services
• Examples from OCLC
  - Validation service
  - RSS feeds
  - SRU
  - OpenURL, OAI-PMH
  - xISBN
  - DDC Browser built this way
  - Very different interfaces have been built

DDC Browser XML
• swe

Do We Need It?
• Just have Google harvest everything
  - Our experience with Google
  - Fielded searching
  - Reliable searching
• Possibility of user-supplied metadata
• Cost of good metadata
• Cost of non-existent metadata

Conclusions
• Shift to remote users
• Online availability – trend towards centralization
• More flexibility in implementations
• Patrons are better served
• Less emphasis on physical collections

Thank you T. Hickey Swedish Library Association 2005 October 28