Lorcan Dempsey (with contributions from colleagues) VP Research and Chief Strategist Library of Congress, 15 June 2004 OCLC: some development and research.

1 Lorcan Dempsey (with contributions from colleagues), VP Research and Chief Strategist. Library of Congress, 15 June 2004. OCLC: some development and research directions in the areas of metadata management and knowledge organization. Presented to the Library of Congress cataloging managers' retreat.

2 Topics: Framework for WorldCat directions; Metadata management and knowledge organization; Working with web services; Making data work harder; Some research, some production; Open WorldCat

3 Framework for WorldCat directions

4 Collections grid [figure: collections plotted against two axes, stewardship (high/low) and uniqueness (high/low)]. Categories shown: books, journals, newspapers, gov. docs, CD, DVD, maps, scores; special collections/archives (rare books, local history materials, archives & manuscripts, theses & dissertations); research and learning materials (ePrints/tech reports, learning objects, courseware, e-portfolios, research data, untransferred records); freely-accessible web resources.

5 WorldCat – the what? WorldCat: grow, version, improve – easier to use (FRBR) – microcontent – evaluative content. Add special collections and institutional content to WorldCat: dissertations, cultural heritage collections, ePrints, learning objects. The open Web: both surface and acquire WorldCat content.

6 WorldCat – the how? Research in these areas

7 Some issues: Metadata variety – encoding, element sets, values/content – provenance. Metadata manipulation – validation, identification – enhancement, augmentation – relation, FRBR, deduplication – transformation. Schematization and web services – make data available in forms that allow machine services to be flexibly built on top of them – everything is a service.

8 Open WorldCat

9 Facilitate the rendezvous of users and library services on the web. Surface the library where the users are. Help release the value of library services in the working and learning lives of their users.

10 Open WorldCat architecture [diagram labels: Aggregators; Schemas and vocabularies; Profiles and relationships; Content owner; Portals; Metadata distribution, search, display; Access; Google, Yahoo and book vendors; Organization and presentation]. OCLC organizes WorldCat content in a model suitable for harvesting, anticipating the unique aspects of various portals. OCLC uses a host of authentication and authorization tools to progressively match content to rights. OCLC developed geo-locator services to match users to extensive FirstSearch WorldCat institution and user profiles. WorldCat: additional collections can be added to the Worldcatlibraries domain. OCLC will use tools such as xISBN and FRBR models to organize WorldCat public views suitable for low-precision access.

11 Current partners: Book vendors and bibliographies – ABE Books, ABAA, Alibris, HCBIB, BookPage. Search engines (pilot with 2M records exposed as web pages for harvesting) – Google, Yahoo! Try a search for: A history of caricature and grotesque in literature and art

12 Google and Yahoo! timeline: 8/14/03: Google contract signed. 9/19/03: Google given go-ahead to harvest records. 10/22/03: Google harvests 150,000 records. Dec. ’03: records begin to appear in Google; 800 inbound links logged (search-site-originating [SSO]). Jan. ’04: 32,000 inbound links logged (SSO). Mar. ’04: 109,000 inbound links logged (SSO). 5/21/04: Yahoo contract signed. 5/28/04: Yahoo harvests records. May ’04: 725,000 inbound links logged (SSO). 6/6/04: Yahoo completes indexing of 2 million WorldCat records.

13 Traffic [chart: full-record displays, with figures projected for June]

14 Metadata management and knowledge organization

15 Research activities. Structures – FRBR – VIAF (BT) – FAST – vocabulary encoding and mappings. Services – xISBN – metadata transformation services – terminology services – authority services – automatic classification and cataloging (ePrints UK; web harvesting).

16 FRBR: OR work-set algorithm. Work-based view to be incorporated into WorldCat in FirstSearch in late 2004. FictionFinder – 2.6+ million fiction records from WorldCat, clustered by OCLC’s FRBR algorithm – makes greater use of data (genres, settings, imaginary characters, etc.). Participate in ongoing FRBR refinement.
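The idea behind a work set, that many records describe editions of the same work, can be illustrated with a simplified sketch that groups records sharing a normalized author/title key. This is not OCLC's actual FRBR work-set algorithm, which is considerably more elaborate; the key format and the function names here are assumptions for illustration only.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(author: str, title: str) -> str:
    """Hypothetical work-set key: normalized author plus normalized title."""
    return f"{normalize(author)}/{normalize(title)}"


def cluster_work_sets(records: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group (record_id, author, title) tuples into work sets by key."""
    work_sets: Dict[str, List[str]] = defaultdict(list)
    for record_id, author, title in records:
        work_sets[work_key(author, title)].append(record_id)
    return dict(work_sets)


# Illustrative records: r1 and r2 differ only in case and punctuation.
records = [
    ("r1", "Austen, Jane", "Pride and Prejudice"),
    ("r2", "AUSTEN, JANE.", "Pride and prejudice /"),
    ("r3", "Austen, Jane", "Emma"),
]
work_sets = cluster_work_sets(records)
print(work_sets)
```

A key this crude misses variant titles and translations; that gap is exactly where richer data (uniform titles, authority-controlled names) earns its keep.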

17 FAST

18 Vocabulary mappings

19 Services: Web services – computer-to-computer applications over the web. Unplug and play – unbundling monolithic applications and making functionality available in more modular ways. Reuse and sharing – of services! Release the value, in a web environment, of the historical library investment in vocabularies and structures.

20 xISBN: an experimental web service – leverages FRBRization work – give it an ISBN, it returns all related ISBNs – based on WorldCat – designed for machine-to-machine data exchange. Examples: – check user ILL requests against all editions/versions in the OPAC – find the library’s editions when a user finds any edition/version of an item on Amazon – check the OPAC for all editions during selection/acquisitions/gift book processing – …
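The first example use (checking an ILL request against every edition in the OPAC) can be sketched as below. The real service was called over the web; here a small in-memory table of work sets stands in for it, and the ISBNs are placeholders rather than real books. The function names are hypothetical.

```python
from typing import List, Set

# Hypothetical stand-in for xISBN: each set holds the ISBNs of all
# editions of one work (placeholder values, not real ISBNs).
WORK_SETS: List[Set[str]] = [
    {"1111111111", "2222222222", "3333333333"},  # three editions of one work
    {"4444444444"},                              # a work with a single edition
]


def related_isbns(isbn: str) -> Set[str]:
    """Return all ISBNs related to `isbn`, including itself."""
    for work_set in WORK_SETS:
        if isbn in work_set:
            return set(work_set)
    return {isbn}  # unknown ISBN: only the ISBN itself


def ill_request_already_held(requested_isbn: str, opac_isbns: Set[str]) -> List[str]:
    """List the library's own editions of the requested work, if any."""
    return sorted(related_isbns(requested_isbn) & opac_isbns)


opac = {"3333333333", "9999999999"}
print(ill_request_already_held("1111111111", opac))
```

If the list is non-empty, the request can be redirected to a copy the library already owns instead of going out on ILL.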

21 xISBN [screenshots: book covers linking to searches of amazon.co.uk and Seattle Public Library]. Install the FRBR bookmarklets in your browser to see xISBN working. See the Bookmarklets page at www.oclc.org/research/researchworks/
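A bookmarklet of this kind has to pull an ISBN out of the page the user is on before it can ask xISBN for related editions and search the local catalog. The sketch below covers that first step, assuming an Amazon-style `/dp/<ISBN-10>` product path; the catalog URL template is hypothetical, and real OPACs use different search syntaxes.

```python
import re
from typing import Optional
from urllib.parse import quote


def isbn10_is_valid(isbn: str) -> bool:
    """ISBN-10 check: sum of digit * weight (10 down to 1) divisible by 11."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    total = 0
    for position, char in enumerate(isbn):
        value = 10 if char == "X" else int(char)  # 'X' stands for 10, last place only
        total += value * (10 - position)
    return total % 11 == 0


def isbn_from_url(url: str) -> Optional[str]:
    """Extract a valid ISBN-10 from an Amazon-style /dp/ path, if present."""
    match = re.search(r"/dp/(\d{9}[\dX])", url)
    if match and isbn10_is_valid(match.group(1)):
        return match.group(1)
    return None


# Hypothetical OPAC search URL template.
CATALOG_TEMPLATE = "https://catalog.example.org/search?isbn={isbn}"


def catalog_search_url(isbn: str) -> str:
    return CATALOG_TEMPLATE.format(isbn=quote(isbn))


isbn = isbn_from_url("https://www.amazon.co.uk/dp/0306406152")
print(isbn, catalog_search_url(isbn) if isbn else None)
```

In a full bookmarklet the extracted ISBN would first be expanded through xISBN, so the catalog is searched for every related edition rather than only the one on screen.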

22 Metadata schema transformations. Metadata Schema Transformation Services – evaluate approaches to crosswalking metadata – prototype transformation environments. The XSLT “short path” – supports lightweight XML processing – designed for public access – deliverables: OAI repository of METS-captured crosswalks [NEW]. The “long path” option – designed for high-fidelity translations – may be public or proprietary – deliverables: toolkit; expertise in non-MARC formats.
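The “short path” is a direct, one-step mapping from source elements to target elements. The sketch below shows that idea in Python rather than XSLT: a tiny, simplified subset of a MARC-to-Dublin Core crosswalk applied to a MARCXML record. The three mappings are illustrative, not a complete or authoritative crosswalk, and the sample record is made up.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Simplified crosswalk subset: (datafield tag, subfield code) -> DC element.
CROSSWALK = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("020", "a"): "identifier",
}


def marcxml_to_dc(marc_xml: str) -> ET.Element:
    """Apply the mapping table directly (one step, no intermediate model)."""
    record = ET.fromstring(marc_xml)
    dc = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field in record.iter(f"{{{MARC_NS}}}datafield"):
        tag = field.get("tag")
        for subfield in field.findall(f"{{{MARC_NS}}}subfield"):
            element = CROSSWALK.get((tag, subfield.get("code")))
            if element and subfield.text:
                ET.SubElement(dc, f"{{{DC_NS}}}{element}").text = subfield.text.strip()
    return dc


# Illustrative input record (not a real catalog record).
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Example, Author.</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">An example title</subfield></datafield>
</record>"""

dc_record = marcxml_to_dc(SAMPLE)
print(ET.tostring(dc_record, encoding="unicode"))
```

The appeal is simplicity; the cost is that every source/target pair needs its own table, and fine distinctions in the source are easily flattened.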

23 [Diagram]: file of records in format X → structural transform (transform to intermediate form) → semantic translation (translate input semantics to CORE) → CORE → semantic translation (translate CORE to output semantics) → structural transform (transform to output format Y) → file of records in format Y.
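The flow on slide 23 separates structure (parsing and serializing) from semantics (mapping elements to and from a CORE). A minimal sketch, assuming two made-up formats: X is `key=value;key=value` with French element names, and Y is `key: value` lines. The mapping tables are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, List[str]]

# Hypothetical semantic tables. Each format maps only to and from CORE,
# so adding a format means writing two tables, not one per format pair.
X_TO_CORE = {"auteur": "creator", "titre": "title"}
CORE_TO_Y = {"creator": "author", "title": "name"}


def structural_in(text: str) -> Record:
    """Structural transform: parse format X into an intermediate dict."""
    record: Record = {}
    for pair in text.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            record.setdefault(key.strip(), []).append(value.strip())
    return record


def translate(record: Record, table: Dict[str, str]) -> Record:
    """Semantic translation: rename elements via a table, dropping unmapped ones."""
    out: Record = {}
    for key, values in record.items():
        if key in table:
            out.setdefault(table[key], []).extend(values)
    return out


def structural_out(record: Record) -> str:
    """Structural transform: serialize to format Y as sorted 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key in sorted(record) for value in record[key])


def long_path(text: str) -> str:
    core = translate(structural_in(text), X_TO_CORE)   # X -> intermediate -> CORE
    return structural_out(translate(core, CORE_TO_Y))  # CORE -> Y semantics -> Y


print(long_path("titre=Notre-Dame de Paris; auteur=Hugo, Victor"))
```

With N formats, routing through a CORE needs about 2N tables instead of up to N×(N−1) direct crosswalks, which is one reason the longer path suits high-fidelity work.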

24 A crosswalk as a METS record: Describe the crosswalk object in the METS header. Assemble and identify six objects in the METS structural map – the source metadata schema – the target metadata schema – the crosswalk – human-readable and executable versions of each. Associate metadata for each file in the METS Descriptive Metadata Section.
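The packaging described on slide 24 can be sketched with the standard library: a METS header, one descriptive metadata section per file, a file inventory, and a structural map with six file pointers (three components × two versions). This is an illustrative skeleton, not validated against the METS schema; the agent name, labels and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)

COMPONENTS = ("source schema", "target schema", "crosswalk")
VERSIONS = ("human-readable", "executable")
ENTRIES = [(c, v) for c in COMPONENTS for v in VERSIONS]  # the six objects


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_crosswalk_mets(label: str, hrefs: dict) -> ET.Element:
    """hrefs maps each (component, version) pair to the URL of that file."""
    mets = ET.Element(m("mets"), {"LABEL": label})
    # Header describing the crosswalk object as a whole.
    agent = ET.SubElement(ET.SubElement(mets, m("metsHdr")), m("agent"), {"ROLE": "CREATOR"})
    ET.SubElement(agent, m("name")).text = "Example crosswalk repository"  # hypothetical
    # One descriptive metadata section per file.
    for n, (component, version) in enumerate(ENTRIES, start=1):
        dmd = ET.SubElement(mets, m("dmdSec"), {"ID": f"DMD{n}"})
        wrap = ET.SubElement(dmd, m("mdWrap"), {"MDTYPE": "DC"})
        data = ET.SubElement(wrap, m("xmlData"))
        ET.SubElement(data, f"{{{DC_NS}}}description").text = f"{label}: {component} ({version})"
    # File inventory.
    group = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    for n, entry in enumerate(ENTRIES, start=1):
        file_el = ET.SubElement(group, m("file"), {"ID": f"F{n}"})
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": hrefs[entry]})
    # Structural map: component -> version -> pointer to the file.
    root_div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"), {"LABEL": label})
    for component in COMPONENTS:
        comp_div = ET.SubElement(root_div, m("div"), {"TYPE": component})
        for version in VERSIONS:
            n = ENTRIES.index((component, version)) + 1
            leaf = ET.SubElement(comp_div, m("div"), {"TYPE": version, "DMDID": f"DMD{n}"})
            ET.SubElement(leaf, m("fptr"), {"FILEID": f"F{n}"})
    return mets


hrefs = {entry: f"https://example.org/xwalk/file{n}" for n, entry in enumerate(ENTRIES, start=1)}
doc = build_crosswalk_mets("MARC to DC crosswalk (example)", hrefs)
print(ET.tostring(doc, encoding="unicode")[:200])
```

Keeping all six files and their metadata in one object is what lets a repository expose the crosswalk as a single harvestable record.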

25 Crosswalk METS record in OAI repository

28 What the METS encoding solves The semantic and syntactic information required for interpreting and executing a crosswalk is collected into a single object. The repository is searchable by humans and automated processes. Services can be built on top of it. It encourages the development and standardization of crosswalks. These outcomes are possible because every component in the system is a standard.

29 Terminology services: Terminology services are web services for knowledge organization schemes (KOS), e.g., authority files, subject heading systems, thesauri, taxonomies, and classification schemes. A web service that provides mappings from a term in one vocabulary to one or more terms in another vocabulary is an example of a terminology service.
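The mapping example in the definition can be sketched as a lookup keyed by a (source vocabulary, target vocabulary) pair. The vocabulary names and term pairs below are illustrative placeholders, not curated mappings; a real service would load them from maintained inter-vocabulary mapping data and expose the function over HTTP.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping data: (source vocab, target vocab) -> term -> target terms.
MAPPINGS: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("vocab-a", "vocab-b"): {
        "Mystery fiction": ["Detective and mystery stories"],
        "Science fiction": ["Science fiction"],
    },
}


def map_term(term: str, source: str, target: str) -> List[str]:
    """Return terms in `target` mapped from `term` in `source` (case-insensitive)."""
    table = MAPPINGS.get((source, target), {})
    wanted = term.casefold()
    for candidate, targets in table.items():
        if candidate.casefold() == wanted:
            return list(targets)
    return []  # no mapping known


print(map_term("mystery FICTION", "vocab-a", "vocab-b"))
```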

30 Current situation: A plethora of vocabularies. Many encoding formats. Few inter-vocabulary connections. Identifiers inadequate – unavailable – temporary – inconsistent.

31 Terminology services system framework. Schema transformation – MARC XML – SKOS – Zthes. Record enhancement – inter-vocabulary mappings – persistent identifiers (info:uri). Access – human-readable: browse interface (ERRoLs), search/retrieve records (SRU/W), switch between schema-specific views (XSLT) – machine-to-machine (m2m): publishing (OAI), search/retrieve records (SRU/W), info:uri resolution (OpenURL). Open standards – MARC 21, XML/XSLT/XPath, SKOS, Zthes, SRU/SRW, OAI, info:uri, OpenURL. Open source software – OCLC OAICat, OCLC SRU/SRW server, OCLC ERRoL J2EE webapp. Open content – GSAFD, others… Open access. Web services-oriented.

32 Schema transformation: MARC XML – Authority Format & Classification Format. SKOS – Simple Knowledge Organization System. Zthes – Z39.50 Profile for Thesaurus Navigation.5 – based on Z39.19 (NISO Thesaurus Standard).
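One direction of these transformations, MARC XML authority records to SKOS, can be sketched by reading the heading and tracing fields. The sketch maps 150 to `prefLabel`, 450 to `altLabel`, and 550 to `broader`, `narrower` or `related` according to the first position of `$w` (g = broader, h = narrower in the MARC 21 authority format). It emits a SKOS-style dictionary rather than RDF, keeps only the first value of each subfield, and the sample record is illustrative, not taken from a real authority file.

```python
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"


def authority_to_skos(marc_xml: str) -> dict:
    """Map a MARC 21 topical-term authority record to SKOS-style properties."""
    record = ET.fromstring(marc_xml)
    concept = {"prefLabel": None, "altLabel": [], "broader": [], "narrower": [], "related": []}
    for field in record.findall(f"{MARC}datafield"):
        # Keep only the first value of each subfield code (a simplification).
        subfields = {}
        for sf in field.findall(f"{MARC}subfield"):
            subfields.setdefault(sf.get("code"), (sf.text or "").strip())
        term = subfields.get("a")
        if not term:
            continue
        tag = field.get("tag")
        if tag == "150":            # established topical heading
            concept["prefLabel"] = term
        elif tag == "450":          # see-from tracing (non-preferred form)
            concept["altLabel"].append(term)
        elif tag == "550":          # see-also-from tracing
            w = subfields.get("w", "")
            if w.startswith("g"):   # $w/0 = g: broader term
                concept["broader"].append(term)
            elif w.startswith("h"): # $w/0 = h: narrower term
                concept["narrower"].append(term)
            else:
                concept["related"].append(term)
    return concept


SAMPLE_AUTHORITY = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="150" ind1=" " ind2=" "><subfield code="a">Mystery fiction</subfield></datafield>
  <datafield tag="450" ind1=" " ind2=" "><subfield code="a">Detective stories</subfield></datafield>
  <datafield tag="550" ind1=" " ind2=" "><subfield code="w">g</subfield><subfield code="a">Fiction</subfield></datafield>
  <datafield tag="550" ind1=" " ind2=" "><subfield code="a">Crime</subfield></datafield>
</record>"""

concept = authority_to_skos(SAMPLE_AUTHORITY)
print(concept)
```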

33 Vocabulary processing [diagram]: Vocabulary X – initial conversion from most formats (Z39.19 word lists in PDF, etc.) to MARC XML Authorities format or Classification format – schema transformation to Zthes or SKOS – data enhancement: add provenance (MARC Org. Codes) and persistent identifiers for concepts and terms (info:kos); optionally, add inter-vocabulary mappings (Vocabulary X to Vocabulary Y).

34 info:kos: info:uri provides a mechanism for the registration of public namespaces that are used for the identification of information assets. The kos identifier provides a mechanism for identifying knowledge organization schemes and the concepts used in those schemes. It has two elements: scheme and concept.
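The two-element structure (scheme, concept) can be sketched as a small build/parse pair. The exact info:kos syntax is not given on the slide, so the `info:kos/<scheme>/<concept>` layout used here is an assumption, as are the example scheme and concept values. Percent-encoding keeps a `/` inside a concept from being mistaken for the separator.

```python
from typing import NamedTuple
from urllib.parse import quote, unquote

PREFIX = "info:kos/"  # ASSUMED layout: info:kos/<scheme>/<concept>


class KosId(NamedTuple):
    scheme: str
    concept: str


def make_kos_id(scheme: str, concept: str) -> str:
    """Build an identifier, percent-encoding both elements."""
    return PREFIX + quote(scheme, safe="") + "/" + quote(concept, safe="")


def parse_kos_id(identifier: str) -> KosId:
    """Split an identifier back into its scheme and concept elements."""
    if not identifier.startswith(PREFIX):
        raise ValueError(f"not an info:kos identifier: {identifier!r}")
    scheme, sep, concept = identifier[len(PREFIX):].partition("/")
    if not sep or not scheme or not concept:
        raise ValueError(f"missing scheme or concept: {identifier!r}")
    return KosId(unquote(scheme), unquote(concept))


kos_id = make_kos_id("example-scheme", "Mystery fiction")
print(kos_id, parse_kos_id(kos_id))
```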

35 New services environment [diagram]: ERRoLs server (info:uri resolver), with endpoints http://errol.oclc.org [OpenURL base URL]; http://errol.oclc.org/xyz.html [HTML interface]; http://errol.oclc.org/xyz.search [SRU-to-HTML gateway]; http://errol.oclc.org/xyz.rss [RSS feed]; http://errol.oclc.org/xyz.sru [SRU gateway]; http://errol.oclc.org/xyz.srw2oai [OAI gateway]. Requests go to the SRW server at http://alcme.oclc.org/srw/ [SRW request], which returns DC, SKOS or Zthes records [SRW/SRU response]; the ERRoLs server applies stylesheets to the response.
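Machine clients reach these gateways with ordinary SRU requests. A minimal sketch of building an SRU 1.1 `searchRetrieve` URL follows; the parameter names (`operation`, `version`, `query`, `maximumRecords`, `recordSchema`) are standard SRU, while the base URL, the CQL query and the `skos` schema short name are placeholders (schema names are defined by each server).

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_search_url(base_url: str, cql_query: str, max_records: int = 10,
                   record_schema: Optional[str] = None) -> str:
    """Build an SRU 1.1 searchRetrieve request URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                 # a CQL query string
        "maximumRecords": str(max_records),
    }
    if record_schema:
        params["recordSchema"] = record_schema  # server-defined short name
    return f"{base_url}?{urlencode(params)}"


url = sru_search_url("https://sru.example.org/vocab", 'dc.title = "mystery"', 5, "skos")
print(url)
query = parse_qs(urlsplit(url).query)
```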

41 Name authority lookup – interactive – as a web service (example name: Lorcan Dempsey). An example: authority control service invoked from within DSpace.
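Invoked as a web service, name lookup boils down to taking a name string typed into a system such as DSpace and returning candidate authorized headings. A sketch using `difflib` similarity over a tiny in-memory list; the heading list and the 0.6 cutoff are illustrative assumptions, not the behavior of any OCLC service.

```python
import difflib
import re
from typing import List

# Hypothetical local name authority list (authorized headings).
AUTHORITY = ["Dempsey, Lorcan", "Library of Congress", "OCLC"]


def normalize_name(name: str) -> str:
    """Casefold, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.casefold()).split())


def lookup_name(query: str, max_results: int = 3) -> List[str]:
    """Return authorized headings ranked by similarity to the query."""
    index = {normalize_name(heading): heading for heading in AUTHORITY}
    keys = difflib.get_close_matches(normalize_name(query), list(index),
                                     n=max_results, cutoff=0.6)
    return [index[key] for key in keys]


print(lookup_name("Dempsy, Lorcan"))
```

Returning a ranked list, rather than a single forced match, leaves the final choice to the cataloger or depositor.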

46 Working with web services

47 Making data work harder

48 Data mining: Research. Production – collection analysis service in development phase – leverages WorldCat data in interactive mode: compare my collection to my peers; compare my collection to my neighbors; profile my collection by subject, by age, …; etc.
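The "compare my collection to my peers" and "profile by subject" ideas can be illustrated with set arithmetic over record identifiers and a count by class letter. Representing collections as sets of WorldCat record identifiers, and using the first letter of an LC-style class number as a subject proxy, are assumptions for this sketch; the identifiers below are placeholders.

```python
from collections import Counter
from typing import Dict, Set


def overlap_profile(mine: Set[str], peer: Set[str]) -> Dict[str, float]:
    """Shared and unique counts plus Jaccard similarity of two collections."""
    shared = mine & peer
    union = mine | peer
    return {
        "shared": len(shared),
        "only_mine": len(mine - peer),
        "only_peer": len(peer - mine),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


def subject_profile(class_numbers: Dict[str, str]) -> Counter:
    """Count holdings by first letter of an LC-style class number (crude subject proxy)."""
    return Counter(cls[0] for cls in class_numbers.values() if cls)


mine = {"rec1", "rec2", "rec3", "rec4"}
peer = {"rec3", "rec4", "rec5"}
print(overlap_profile(mine, peer))
print(subject_profile({"rec1": "PR4034", "rec2": "PR6003", "rec3": "Z695"}))
```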

49 Collection: Change creates demand for better data. Growing interest in knowing more about – characteristics – gaps and overlaps – use. Tuning collections based on data. Focus collection spending where it creates most value.

50 Some projects: Characteristics of collections – WorldCat – CIC. Compare ILL, circulation and holdings data. Last copy: what is irreplaceable? ARL Global Resources – exploring coverage of overseas titles in ARL libraries. Depends on consistency, coverage, currency.
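The "last copy" question can be sketched by inverting holdings data: for each title, collect the libraries that hold it, and keep titles with exactly one holder. Library and title identifiers here are placeholders. As the slide notes, the answer is only as good as the consistency, coverage and currency of the underlying holdings data.

```python
from collections import defaultdict
from typing import Dict, Set


def last_copies(holdings: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each title held by exactly one library to that library."""
    holders: Dict[str, Set[str]] = defaultdict(set)
    for library, titles in holdings.items():
        for title in titles:
            holders[title].add(library)
    return {title: next(iter(libs)) for title, libs in holders.items() if len(libs) == 1}


holdings = {"A": {"t1", "t2"}, "B": {"t2", "t3"}, "C": {"t2"}}
print(last_copies(holdings))
```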

51 Comparing CIC Collection Profiles

52 Audience level [chart comparing ‘Forge’ and ‘Letters’]

53 Profiles of ‘Letters’ & ‘Forge’ [example chart; values shown: 0.81, 0.65]

54 Topics: Framework for WorldCat directions; Metadata management and knowledge organization; Working with web services; Making data work harder; Some research, some production; Open WorldCat

55 Thoughts: Machines will do more work – consistency becomes more important. Variety. Low precision – make data work.

56 The pattern is new … “The knowledge imposes a pattern, and falsifies, / For the pattern is new in every moment” (T. S. Eliot, East Coker)

57 Further information: Thanks to colleagues in OCLC Research for contributions to this presentation. Further information about OCLC Research projects can be found at http://www.oclc.org/research/ Thanks to colleagues in OCLC Collection Management Services for contributions to this presentation. Further information about Open WorldCat is available at http://www.oclc.org/worldcat/pilot/

