New approaches to the catalog T. Hickey Svensk Biblioteksförening 2005 October 28.

1 New approaches to the catalog
T. Hickey, http://errol.oclc.org/laf/n82-54463.html
Svensk Biblioteksförening, 2005 October 28

2 OCLC
• Founded 1967
• Nonprofit membership organization
• More than 53,000 libraries
• 96 countries
• ~1,000 employees
• Cataloging
• Interlibrary Loan
• Preservation
• Dewey Decimal Classification
• netLibrary
• FirstSearch

3 OCLC Research
• Research for both
  - OCLC services
  - Membership
• Metadata management
• Knowledge organization
• Content management
• Interoperability
• Systems & interaction design
• ~30 employees

4 What do users want?
• The right information, with minimum effort

5 How to give them what they want
• Catch them where they are
• Increase our data
• Improve our data
• Make the data work harder
• Interconnect with other systems
• Do all this efficiently

6 What has changed
• Computers and telecommunications
  - User expectations
  - Digital materials
  - Remoteness of our users
  - Huge amounts of bandwidth, storage

7 The competition
• Online booksellers
  - Reviews
  - Tables of contents
  - Excerpts
  - Inside-the-book searching
• Web search engines
  - Speed
  - Full-text searching
  - Global coverage (of web resources)
  - Good enough
• Ourselves
  - Electronic journals

8 Current projects (my group)
• Live search
• Registries, PURLs
• Dewey browser
• Harvesting, electronic theses
• VIAF, LAF
• SRU/W, OpenURLs, OAI
• FRBR, xISBN
• Beowulf cluster
• Map-reduce
• Text searching
• Batch loading
• Open WorldCat
• WorldCat Wiki
• Publisher Names
• MXG

9 Other Research Projects
• FictionFinder, Curiouser
• Schema Transformation
• Terminology Services
• Digital Preservation
• Collection Analysis
• Dublin Core
• FAST
• User Studies
• Data mining
• Also: http://www.oclc.org/research/researchworks/

10 Catch them where they are
• Google, Yahoo, etc.
  - Open WorldCat
  - OpenURL
  - OAI-PMH
• Creation too
  - WorldCat Wiki
  - Tags?

11 Open WorldCat

12 Editions

13 OpenURL
• OpenURL registry
  - Supports version 1.0
  - Also registry of OpenURL servers
  - Used for WikiD

14 WorldCat ‘Wiki’
• Opening up WorldCat to user annotations
  - Reviews
  - Notes
  - Tables of contents
  - Cover art?
  - Book lists?
• Based on WikiD software
  - Full Wiki
  - Many features turned off for WorldCat
  - Uses OpenURL 1.0 protocol internally
  - Allows collections of pages of arbitrary XML schemas
  - Tools for the creation of simple collections
• Doesn’t look like a Wiki

15 Reviews

16 Tags?
• Folksonomies?
• User-generated keywords
• We’ve been here before
  - Is it different?
  - Is there another direction?

17 Opening Dewey

18

19 More data
• Harvesting
  - OAI-PMH
  - ETDs
• Batch load
  - 60 million records
  - 3 million new manifestations
• Other
  - Cover art
  - Reviews
  - WC

20 Better data and organization
• VIAF
• FRBR
• Authority files in general
  - LAF
  - Publisher names
  - Genre
  - FAST
• Registries
  - PURLs
  - Generalized solution?
  - Get them nearer to creation

21 FRBR
• Work-set algorithm
  - Keys based on author/title
  - Authority files
  - Auxiliary authority files
  - xISBN
• Used for
  - xISBN
  - Open WorldCat
  - FirstSearch (coming)
  - Collection analysis (coming)
  - Research
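The work-set idea on slide 21 (build a key from author and title, group manifestations that share it, then use the groups for services such as xISBN) can be sketched as follows. This is a minimal illustration under stated assumptions, not OCLC's algorithm: the normalization rules, the record layout (`author`, `title`, `isbn` fields) and the names `normalize`, `work_key` and `build_xisbn_index` are hypothetical, and the real algorithm also consults authority files.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lower-case, drop diacritics and punctuation, collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", without_marks.lower())
    return " ".join(cleaned.split())


def work_key(author: str, title: str) -> str:
    """Hypothetical work-set key: normalized author and title joined by '/'."""
    return f"{normalize(author)}/{normalize(title)}"


def build_xisbn_index(records):
    """Map every ISBN to the sorted ISBNs of all manifestations sharing its work key."""
    work_sets = defaultdict(set)
    for rec in records:
        work_sets[work_key(rec["author"], rec["title"])].add(rec["isbn"])
    index = {}
    for isbns in work_sets.values():
        members = sorted(isbns)
        for isbn in members:
            index[isbn] = members
    return index
```

With this normalization, "Tolkien, J. R. R." / "The Hobbit" and "Tolkien, J.R.R." / "The hobbit." fall into the same work set, so looking up either ISBN returns both.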

22

23 Authority Files
• LAF
  - http://errol.oclc.org/laf/n82-54463.html
• Publisher names
  - Not normally controlled
  - Looking for variations with ISBN prefixes
  - Also worked with dissertations
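One way to look for publisher-name variations by ISBN prefix (slide 23) is to group the name strings found in records by the registrant part of the ISBN and flag prefixes that carry more than one form. This is a sketch, not the method used in the project: it assumes the ISBNs are already correctly hyphenated (placing the hyphens requires the ISBN range tables, which are not included), and `publisher_prefix` and `publisher_variants` are hypothetical names.

```python
from collections import Counter, defaultdict


def publisher_prefix(isbn: str) -> str:
    """Return the EAN-prefix/group/publisher part of an already hyphenated ISBN."""
    parts = isbn.strip().split("-")
    if len(parts) == 5:  # ISBN-13: prefix, group, publisher, title, check digit
        return "-".join(parts[:3])
    if len(parts) == 4:  # ISBN-10: group, publisher, title, check digit
        return "978-" + "-".join(parts[:2])  # every ISBN-10 maps to a 978 ISBN-13
    raise ValueError(f"expected a hyphenated ISBN, got {isbn!r}")


def publisher_variants(records):
    """Group (isbn, publisher name) pairs by prefix; keep prefixes with several name forms."""
    names_by_prefix = defaultdict(Counter)
    for isbn, publisher in records:
        names_by_prefix[publisher_prefix(isbn)][publisher.strip()] += 1
    return {prefix: names for prefix, names in names_by_prefix.items() if len(names) > 1}
```

Mapping ISBN-10 prefixes onto their 978 form lets older and newer ISBNs from the same registrant land in one group.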

24 VIAF
• Merge national-level files
• Library of Congress (NACO) and Die Deutsche Bibliothek
  - Bibliographic records analyzed
  - 15% of matches would be erroneous if based on names alone
• Basic matching now completed
  - 435,000 matching names
  - < 1% mismatched
• Working on
  - Public interface
  - OAI harvesting
  - Persistent identifiers
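Slide 24 notes that matching on names alone would produce many errors, so the bibliographic records were analyzed as well. The sketch below shows that idea with one assumed rule: two authority records match only if their normalized headings agree and they share at least one normalized title. The rule, the data layout (record id mapped to heading and linked titles) and the function names are hypothetical, not VIAF's actual matching criteria.

```python
import re
from collections import defaultdict


def norm(text: str) -> str:
    """Lower-case; replace punctuation and digits (e.g. life dates) with spaces."""
    return " ".join(re.sub(r"[^\w\s]|\d", " ", text.lower()).split())


def match_authorities(file_a, file_b):
    """Pair authority records whose headings agree AND that share a title.

    Each file maps a record id to (heading, titles of linked bibliographic records).
    """
    candidates = defaultdict(list)
    for id_b, (heading, titles) in file_b.items():
        candidates[norm(heading)].append((id_b, {norm(t) for t in titles}))
    matches = []
    for id_a, (heading, titles) in file_a.items():
        titles_a = {norm(t) for t in titles}
        for id_b, titles_b in candidates.get(norm(heading), []):
            if titles_a & titles_b:  # name agreement alone is not enough
                matches.append((id_a, id_b))
    return matches
```

Two different people both called "Smith, John" share a heading but no titles, so this rule keeps them apart, which is the kind of error a names-only match would make.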

25

26 Maj

27 Registries
• Show relationships between metadata
• Often associated with an identifier
• General solution?
• Examples
  - Authority files
  - WorldCat
  - PURLs

28
• Persistent URLs
  - Map one URL to another
  - http://purl.org/hickey/outgoing -> http://outgoing.typepad.com/
  - 500,000+ PURLs
  - 111 million resolutions
• Port to WikiD platform?
  - http://www.oclc.org/research/projects/wikid/
• String of PURL servers?
  - Use OAI-PMH for synchronization
  - Spread responsibility
• Generalized solution?
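The core of a PURL service, mapping one URL to another as on slide 28, amounts to a lookup table plus an HTTP redirect. The sketch below is a minimal WSGI application assuming an in-memory table holding the slide's single example; the table, the choice of a 302 status, and the names `PURLS`, `resolve` and `purl_app` are assumptions, and it leaves out everything a real PURL server adds (maintainer accounts, partial redirects, persistence).

```python
# Hypothetical PURL table; the single entry is the example from slide 28.
PURLS = {
    "/hickey/outgoing": "http://outgoing.typepad.com/",
}


def resolve(path, table=PURLS):
    """Return the target URL registered for a PURL path, or None if unknown."""
    return table.get(path.rstrip("/"))


def purl_app(environ, start_response):
    """Minimal WSGI application answering each known PURL with an HTTP redirect."""
    target = resolve(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown PURL\n"]
    start_response("302 Found", [("Location", target)])
    return [b""]
```

For local experiments the application can be served with the standard library, e.g. `wsgiref.simple_server.make_server("localhost", 8000, purl_app).serve_forever()`. Because the table is just keyed data, a "string of PURL servers" could in principle keep copies in step by harvesting changes from one another, as the slide suggests with OAI-PMH.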

29 More connectivity
• OpenURL
• RSS feeds
• OpenSearch, SRU/W
• OAI-PMH

30 OpenURL
• Developed to address the ‘appropriate copy’ problem
• Transitioning to OpenURL 1.0
• OpenURL resolver
  - Accepts requests specifying resources and services
• Generalized syntax
  - Specifying a resource
  - Services to be performed
• Metadata elements specified in registry
  - http://purl.org/openurl/
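To make the "generalized syntax" on slide 30 concrete, here is a sketch of an OpenURL 1.0 (Z39.88-2004) request in key/encoded-value form that describes a book by ISBN, title and author. The `rft.*` keys come from the book metadata format registered for OpenURL 1.0; the resolver address and the helper name `book_openurl` are hypothetical.

```python
from urllib.parse import urlencode


def book_openurl(resolver_base, isbn, title=None, author=None):
    """Build an OpenURL 1.0 (Z39.88-2004) key/encoded-value request describing a book."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("url_ctx_fmt", "info:ofi/fmt:kev:mtx:ctx"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # the referent is a book
        ("rft.genre", "book"),
        ("rft.isbn", isbn),
    ]
    if title:
        params.append(("rft.btitle", title))
    if author:
        params.append(("rft.au", author))
    return f"{resolver_base}?{urlencode(params)}"
```

A call such as `book_openurl("http://resolver.example.org/openurl", "0261102214", title="The Hobbit")` yields a URL that any OpenURL 1.0 resolver can interpret. The resolver then decides which copy or service is appropriate for the user, which is the "appropriate copy" problem the slide mentions.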

31

32 SRU
• Simplified version of Z39.50
  - Web based
  - SRW: SOAP
  - SRU: URL
• Even simpler?
  - OpenSearch
  - No search syntax
  - Looking for common ground
• MXG
  - Metasearch XML Gateway
  - Simplifies metasearchers’ lives
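The URL flavor of SRU on slide 32 reduces a search to a single HTTP GET. The sketch below builds an SRU 1.1 searchRetrieve request with a CQL query and reads the hit count from the XML response. The parameter names and the response namespace follow SRU 1.1; the base URL, the default `recordSchema` value and the helper names are assumptions, and which record schemas a server offers varies.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of SRU 1.1 response documents.
SRW_NS = "{http://www.loc.gov/zing/srw/}"


def sru_search_url(base, cql_query, start=1, maximum=10, schema="dc"):
    """Build an SRU 1.1 searchRetrieve request URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # CQL query string
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": schema,
    }
    return f"{base}?{urlencode(params)}"


def number_of_records(response_xml):
    """Read the hit count from an SRU 1.1 searchRetrieveResponse."""
    root = ET.fromstring(response_xml)
    return int(root.findtext(f"{SRW_NS}numberOfRecords"))
```

Because the request is a plain URL and the response plain XML, no SOAP toolkit is needed; the same query in SRW form would be wrapped in a SOAP envelope.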

33 OAI-PMH
• Method of harvesting metadata
  - More generally, a way of synchronizing databases
• No real restriction to metadata
  - Becomes a repository protocol
  - Identifiers
  - Timestamps
• Layered implementation
  - OAI
  - SRU
  - Pears
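The identifiers and timestamps on slide 33 are what make OAI-PMH usable for synchronizing databases: a harvester asks for records changed since its last run and follows resumption tokens until the list is complete. The sketch below does this with the `ListRecords` verb. The verb, arguments and namespace come from OAI-PMH 2.0; the function names are hypothetical, and the `fetch` parameter is an assumption added so the paging logic can be exercised without a live repository. OAI-PMH error responses such as `noRecordsMatch` are not reported; they simply yield no records.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Default fetcher: plain HTTP GET returning the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", from_date=None, fetch=http_fetch):
    """Yield (identifier, datestamp, record element), following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # incremental harvest: only records changed since this date
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for record in root.iter(f"{OAI_NS}record"):
            header = record.find(f"{OAI_NS}header")
            yield (header.findtext(f"{OAI_NS}identifier"),
                   header.findtext(f"{OAI_NS}datestamp"),
                   record)
        token = root.findtext(f".//{OAI_NS}resumptionToken")
        if not token:  # missing or empty token: the list is complete
            break
        # Follow-up requests carry only the verb and the token, as OAI-PMH requires.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Storing the latest datestamp seen and passing it as `from_date` on the next run turns repeated harvests into incremental synchronization.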

34 Efficient processing
• Beowulf cluster
• Map reduce
• Text searching

35 Beowulf Cluster
• 24 nodes
  - 2 processors, 4 gigabytes of RAM, 120 gigabytes of disk
  - Gigabit network
• Use it for
  - FRBR processing
  - Text indexing
  - Text searching
• ~30-fold speedup on many tasks
  - 1 year ⇒ 2 weeks
  - 1 week ⇒ 1 day
  - 1 day ⇒ 1 hour
  - 1 hour ⇒ 2 minutes
• Extremely cheap processing

36 Map reduce
• Pioneered by Google
  - Petabytes of data on thousands of nodes
• Adapted to our cluster
  - Tens of gigabytes of data on dozens of nodes
• Simple functional programming paradigm
• Allows batch processing across the cluster
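The "simple functional programming paradigm" on slide 36 has three phases: a mapper turns each chunk of input into key/value pairs, the pairs are grouped by key, and a reducer combines each key's values. The sketch below runs those phases on one machine, with a thread pool standing in for cluster nodes. It is not Google's implementation or OCLC's adaptation, and `map_reduce`, `count_mapper`, `count_reducer` and the `work` field are hypothetical; the example job counts records per FRBR work key, one of the uses listed on slide 35.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_reduce(inputs, mapper, reducer, workers=4):
    """Run mapper over input chunks in parallel, group by key, then reduce each key."""
    # Map phase: threads stand in for cluster nodes, one task per input chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(lambda chunk: list(mapper(chunk)), inputs)
        pairs = list(chain.from_iterable(mapped))
    # Shuffle phase: collect all values emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: one reducer call per key.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(chunk):
    """Emit (work key, 1) for every record in a chunk."""
    for record in chunk:
        yield record["work"], 1


def count_reducer(key, values):
    """Total the counts emitted for one work key."""
    return sum(values)
```

Because mappers and reducers are pure functions of their inputs, the framework can split the input and schedule the calls on any node, which is what lets the same job scale from one machine to a cluster.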

37 Text Searching
• Spread database across cluster
• Two levels of aggregation
  - 3 servers/node
  - 24-way aggregation
  - Aggregators run across cluster
• SRU used
  - HTTP-based
  - SRW (SOAP) slowed it down
• Open source software
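Slide 37's two levels of aggregation can be pictured as follows. Each server returns its hits sorted by score, an aggregator on each node merges its servers' lists, and a top-level aggregator merges the node results. Keeping only the best k hits at each level still gives the correct global top k. The sketch assumes scores are directly comparable across servers and that hits are `(score, id)` pairs; the function names are hypothetical, and in the described system the levels talk to each other over SRU rather than through in-process calls.

```python
import heapq
from itertools import islice


def merge_top(result_lists, k):
    """Merge hit lists already sorted by descending score; keep the k best."""
    merged = heapq.merge(*result_lists, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


def two_level_search(nodes, k):
    """nodes: one entry per node, each a list of per-server hit lists."""
    node_results = [merge_top(servers, k) for servers in nodes]  # level 1: within a node
    return merge_top(node_results, k)                            # level 2: across nodes
```

`heapq.merge` consumes its inputs lazily, so each aggregator only reads as far into its sources as it needs to produce k results.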

38 Better interfaces
• More interactive
  - Live search
  - Dewey Browser
• Better connected

39

40

41

42

43

44 Post-coordination of Services
• Systems that expose low-level services
• Higher-level coordination of those services
• Loosely coupled services
• Examples from OCLC
  - Validation service
  - RSS feeds
  - SRU
  - OpenURL, OAI-PMH
  - xISBN
• DDC Browser built this way
  - Very different interfaces have been built

45 DDC Browser XML
• swe

46 Do We Need It?
• Just have Google harvest everything
  - Our experience with Google
  - Fielded searching
  - Reliable searching
• Possibility of user-supplied metadata
• Cost of good metadata
• Cost of non-existent metadata

47 Conclusions
• Shift to remote users
• Online availability – trend towards centralization
• More flexibility in implementations
• Patrons are better served
• Less emphasis on physical collections

48 Thank you
T. Hickey, http://errol.oclc.org/laf/n82-54463.html
Swedish Library Association, 2005 October 28
