Excellent XML – systems interoperability at the Wellcome Library EIUG 11th Conference, Stirling University 1 & 2 September 2005 Margaret Savage-Jones



Wellcome Library Systems
• Millennium (Innovative Interfaces Inc.): includes online requesting from closed stack since mid-2003
• CALM archive system (DS Ltd): online access to archive & MSS holdings
• Miro/MedPhoto image system (System Simulation Ltd): online access to over 100,000 images, image retrieval & delivery

Underlying protocol: OAI-PMH
• Open Archives Initiative Protocol for Metadata Harvesting: a protocol for sharing and harvesting metadata between OAI-compliant systems
• Based on XML and HTTP
• One system (CALM or MedPhoto) exposes metadata via an OAI repository
• The other system (Millennium) harvests this metadata and then loads it
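As a rough illustration of what a harvester receives, the sketch below parses an OAI-PMH ListRecords response and lists each record's identifier and datestamp. The response body, identifiers and repository address are invented for the example; only the OAI-PMH namespace and element names come from the protocol itself. It is not a description of how XML Harvester works internally.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response body: identifiers, dates and URL are illustrative only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-06-01T09:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="marc21">http://example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:rec-1</identifier>
        <datestamp>2005-05-03</datestamp>
      </header>
      <metadata/>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:rec-2</identifier>
        <datestamp>2005-05-17</datestamp>
      </header>
      <metadata/>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_record_headers(response_xml):
    """Return (identifier, datestamp) pairs for every record in the response."""
    root = ET.fromstring(response_xml)
    headers = []
    for header in root.iterfind("oai:ListRecords/oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        headers.append((identifier, datestamp))
    return headers


if __name__ == "__main__":
    for identifier, datestamp in list_record_headers(SAMPLE_RESPONSE):
        print(identifier, datestamp)
```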

Motivation
• With a MARC21 library system, an ISAD(G) archive system and a bespoke image repository, it was a strategic objective to make these systems interoperate
• Phase II of the Closed Stack project: Western Manuscripts and Archives had to be requestable online by summer 2004
• XML Harvester was developed by Innovative with Michigan State University
• Wellcome placed an order for XML Harvester in January 2003
• With CALM version 4 it was possible to export EAD XML

Benefits
• Online requesting for the Western MSS & Archives collections
• One circulation system to manage and one set of circulation statistics
• Same interface for all online requests from stack
• Archives & manuscripts treated like other collections
• Image sets for library objects displayed in Web OPAC
• User can jump from one system to another
• No need to rekey the user's search in the other system
• Selective harvesting for onward record updating

Example: archive record (from Crick Coll.)

Harvested archive record in Web OPAC

Image of the archive item

Encoded Archival Description (EAD)
• Initially XML Harvester dealt only with EAD and needed encodinganalog attributes for parsing
• Developed with Michigan State University (MSU), whose EAD finding aids had MARC encodinganalogs; the Harvester parser read these
• Encodinganalogs are attributes in XML records indicating the field, subfield, indicators etc. in another descriptive encoding system (e.g. MARC21) that is equivalent to the EAD-tagged element
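To make the idea of encodinganalogs concrete, here is a minimal sketch that reads the attribute from a small EAD fragment and turns each tagged element into a (MARC tag, subfield, value) triple. The fragment, its values and the "245$a" style of writing tag plus subfield are assumptions made for illustration; actual finding aids and the Harvester parser may encode and read them differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD fragment; the values and the "tag$subfield" notation
# in the encodinganalog attributes are assumptions for this example.
SAMPLE_EAD = """<did>
  <unitid encodinganalog="001">MS.0001</unitid>
  <unittitle encodinganalog="245$a">Notes on chemistry</unittitle>
  <unitdate encodinganalog="260$c">1850</unitdate>
  <physdesc>1 volume</physdesc>
</did>"""


def fields_from_encodinganalogs(ead_xml):
    """Collect (marc_tag, subfield, text) for elements that carry an encodinganalog."""
    root = ET.fromstring(ead_xml)
    fields = []
    for element in root.iter():
        analog = element.get("encodinganalog")
        if analog is None:
            continue  # elements without an analog are ignored
        tag, _, subfield = analog.partition("$")
        fields.append((tag, subfield or None, (element.text or "").strip()))
    return fields


if __name__ == "__main__":
    for field in fields_from_encodinganalogs(SAMPLE_EAD):
        print(field)
```

Elements without the attribute (here, physdesc) are simply skipped, which is why untagged finding aids could not be parsed this way.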

Archive system metadata
• Hierarchical tree structure with collection-level and component (item-level) records, catalogued to the General International Standard Archival Description, ISAD(G)
• Field export from CALM as the default subset of the EAD DTD had some empty fields, so we had to export as “DServe Natural” XML, which includes field tags
• Output is catalog.xml with catalog.DTD

Pilot: used “Haddad” catalogue XML
• Used a small set of 87 Arabic XML records, in a local variant of the “MASTER” XML DTD, as a pilot to test XML Harvester
• Used stylesheets to filter unwanted fields, add encodinganalogs and put 87 .xml files in a web server directory ready to be harvested

Web crawler
• Harvester reaches the XML files through port 80
• We added a page to the Millennium screens directory listing the files, with redirections to the web server folder
• Harvester opened the page and scanned for “HREF” strings, which directed it to the XML records (file.xml)
• The XML Harvester parser read tags from the encodinganalogs to create MARC21 records, writing them to a file for loading
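The crawling step can be sketched as follows: read a listing page, pick out every HREF that points at an .xml file, and resolve it against the page address. The page content and the example.org addresses are hypothetical, and this is only an approximation of the behaviour described on the slide, not the Harvester's own code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XmlLinkCollector(HTMLParser):
    """Collect absolute URLs of links whose target ends in .xml."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so HREF and href match.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".xml"):
                self.links.append(urljoin(self.base_url, value))


# Hypothetical listing page standing in for the redirection screen.
SAMPLE_PAGE = """<html><body>
<h1>Harvester Test Mss Files</h1>
<a href="records/0001.xml">0001</a>
<a HREF="records/0002.xml">0002</a>
<a href="about.html">About</a>
</body></html>"""


def find_xml_links(page_html, base_url):
    collector = XmlLinkCollector(base_url)
    collector.feed(page_html)
    return collector.links


if __name__ == "__main__":
    for link in find_xml_links(SAMPLE_PAGE, "http://example.org/screens/harvest.html"):
        print(link)
```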

Redirection screen Harvester Test Mss Files Sample Screen # 2 Test to confirm if harvester can crawl files deposited on wtcalm

Example – encodinganalogs for View full manuscript record

Harvested MARC21 “Haddad” record

Links: to PDF and Request button

Lessons
• Arabic records would be loaded only once, but records from CALM would need regular reharvesting/overlay
• Need a more sophisticated approach than crawling a web directory: XML Harvester can harvest from an OAI repository and use OAI datestamps to harvest records created or modified in a specified date range
• XSLT could be used to transform records to MARC21 OAI without using encodinganalogs

Archives OAI repository
• Built on the CALM server using the freeware University of Illinois Provider service tool (runs under Windows IIS)
• Other requirements: Microsoft Windows 2000 Server; Microsoft IIS version 4 or higher; Microsoft ASP; Microsoft XML Parser (MSXML) 4.0; Microsoft ActiveX Data Objects and an ODBC-compliant data source, i.e. an MS Access 97+ database; firewall access on port 80

Key decisions
• Metadata export: chose the full CALM record XML DTD (not EAD)
• Matchpoint: decided to load the contents of the CALM RefNo field to the Millennium 001, indexed in “o”
Also had to consider:
• Hierarchical record level to harvest
• Navigation between the two systems
• Millennium parameters

Decision: Record level to harvest
• A “Collection” could consist of more than 40 boxes
• Must have a 1:1 record relationship to make requesting and retrieval work
• Decision to exclude archives Collection records and use Component-level records
• Each of these represents one item (box, folder, piece) and links to a single bib record with an attached item for circulation in Millennium

Decision: Navigation
• Archivists wanted the archives (CALM) interface to offer the main search route for Western Archives & MSS
• The user is taken from the CALM record into Millennium to place a request, then back to the CALM record to continue browsing the hit list, so two links were needed
• Forward: runs a CGI script to search Millennium for the corresponding bib record
• Back: 856 with URL link (can be inserted by Harvester)

Example: Links
• Forward: CGI script runs a search of the Millennium “o” index for a match on the CALM RefNo value
• Back: RefNo PP/CRI/A/1/2/8 is built into the OAI record URL linking to the CALM web front end; the RefNo value is built into the search string:
dsqIni=Dserve.ini&dsqApp=Archive&dsqCmd=show.tcl&dsqDb=Catalog&dsqPos=0&dsqSearch=((text)='PP/CRI/A/1/2/8')
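A back link of this shape could be assembled from a RefNo roughly as below. The query parameter names and values are those shown on the slide; the base address is a placeholder, since the real host and script path are not given. Note that this sketch percent-encodes the search expression, whereas the slide shows it unencoded.

```python
from urllib.parse import urlencode, parse_qs

# Placeholder address: the real CALM web front end URL is not given in the slides.
CALM_BASE_URL = "http://archives.example.org/dserve"


def calm_back_link(ref_no, base_url=CALM_BASE_URL):
    """Build a URL that asks the CALM web front end to show the record with this RefNo."""
    params = [
        ("dsqIni", "Dserve.ini"),
        ("dsqApp", "Archive"),
        ("dsqCmd", "show.tcl"),
        ("dsqDb", "Catalog"),
        ("dsqPos", "0"),
        ("dsqSearch", "((text)='%s')" % ref_no),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = calm_back_link("PP/CRI/A/1/2/8")
    print(url)
    # Decoding the query string recovers the original search expression.
    print(parse_qs(url.split("?", 1)[1])["dsqSearch"][0])
```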

Calm XML export file - Component MS4385/4404 MS.4404 Notes and extracts on Chemistry, Volumetric Analysis, (etc.) c Item 1 volume Bentley House Western MSS series 3 - Requestable

Mapping CALM XML to MARC21
• Field tags used: 001, 008, 245, 260, 500, 506, 655, 856, and 949 to make the item. Harvester inserts a 99x tag with a load identification code, e.g. CALM
• Found that Component records do not have “author”, which is held only at Collection level, but this was not a problem
• “Mock” bib and item records were keyed into Millennium to:
  - demonstrate navigation & agree content with the team
  - act as a benchmark when harvested records were loaded
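A very reduced sketch of such a mapping is shown below for three of the tags. Only the RefNo to 001 matchpoint and the use of 856 for the back link come from the slides; the "Title" key, the choice of 245 |a for it, and the sample values are assumptions for illustration. The remaining tags (008, 260, 500, 506, 655, 949) are left out because the slides do not say which CALM fields feed them.

```python
def component_to_marc_fields(component, back_link):
    """Map a CALM component record (a dict) to a list of (tag, content) pairs.

    "RefNo" is the field named in the slides; "Title" is a hypothetical key.
    Subfields are written Millennium-style with a leading "|".
    """
    fields = [("001", component["RefNo"])]  # matchpoint between the two systems
    if component.get("Title"):
        fields.append(("245", "|a" + component["Title"]))
    fields.append(("856", "|u" + back_link))  # link back to the archive record
    return fields


if __name__ == "__main__":
    sample = {"RefNo": "PP/CRI/A/1/2/8", "Title": "Example item"}
    for tag, content in component_to_marc_fields(sample, "http://archives.example.org/record"):
        print(tag, content)
```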

XSLT – Extensible Stylesheet Language Transformations
• Used XSLT to split the single XML output file into 48,000 component.xml records, using the record element tag as the record delimiter, and then to transform them to MARC21 OAI records, which our OAI repository lists to XML Harvester
• The OAI repository installed on the CALM staging server uses the freeware University of Illinois Provider service tool
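The splitting step can be approximated without XSLT. The sketch below uses Python's standard ElementTree instead (a substitution, not the method used in the project) to cut one export document into one XML string per record, keyed by a generated file name. The element names "export" and "record", the file naming pattern and the sample values are all hypothetical; the real delimiter element is not named in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical export: element names and values are placeholders.
SAMPLE_EXPORT = """<export>
  <record><RefNo>MS.1</RefNo><Title>First item</Title></record>
  <record><RefNo>MS.2</RefNo><Title>Second item</Title></record>
</export>"""


def split_records(export_xml, record_tag="record"):
    """Return {file_name: record_xml} with one entry per record element."""
    root = ET.fromstring(export_xml)
    pieces = {}
    for number, record in enumerate(root.iter(record_tag), start=1):
        # tostring() includes trailing whitespace after the element, so strip it.
        xml_text = ET.tostring(record, encoding="unicode").strip()
        pieces["component_%05d.xml" % number] = xml_text
    return pieces


if __name__ == "__main__":
    for name, xml_text in split_records(SAMPLE_EXPORT).items():
        print(name, xml_text)
```

Each value could then be written to its own file and transformed to a MARC21 OAI record in a second pass.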

Millennium parameters
• To cope with “open” v “closed” archive collections, new codes were added to archives records and mapped to new Millennium branch codes, which would trigger Millcirc rules
• New branch codes added to Request Rules, Determiner Table, WWWOPTIONS, Locations served
• New MATTYPE to exclude Western MSS and archives from the Asian MSS scope

Config file for archives

Management interface for XML Harvester

Archive record: Request link to Web OPAC

Harvested archive record in Millennium

Patron login screen to place request

Confirmation of request

Interoperation sought with image system
• To integrate MedPhoto, a bespoke photo library system, with Millennium for seamless display and ordering of images
• MedPhoto holds images and records for more than 60,000 items catalogued in Millennium: Iconographic collection, archives & manuscripts, rare books etc.
• Specific need for Millennium users to see images associated with library objects

Media management interface

Config file for image

Selective Harvesting – images
• Harvest the full “bib” set and load to Millennium, populating 962s
• Then each month request a list of all new image URLs created since the last harvest that have a Millennium .b number in their record:
<…metadataPrefix=marc21&set=bib&from=…&until=…> (for records in May)
<…metadataPrefix=marc21&set=bib&from=…&until=…> (for records in June, and so on)
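A monthly request of this kind might be generated as in the sketch below. The metadataPrefix and set values are those on the slide, and verb, from and until are standard OAI-PMH arguments; the repository address and the year used in the example are assumptions, since the slide's own URLs and dates are not legible.

```python
import calendar
from datetime import date
from urllib.parse import urlencode

# Placeholder address: the MedPhoto repository URL is not given in the slides.
REPOSITORY_URL = "http://repository.example.org/oai"


def monthly_harvest_url(year, month, base_url=REPOSITORY_URL):
    """Build a ListRecords request covering one calendar month of the 'bib' set."""
    last_day = calendar.monthrange(year, month)[1]
    params = [
        ("verb", "ListRecords"),
        ("metadataPrefix", "marc21"),
        ("set", "bib"),
        ("from", date(year, month, 1).isoformat()),
        ("until", date(year, month, last_day).isoformat()),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(monthly_harvest_url(2005, 5))  # records in May of an example year
    print(monthly_harvest_url(2005, 6))  # records in June
```

Computing the last day of the month avoids hard-coding 30 or 31 and handles February correctly.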

Harvesting: Image OAI repository
• OAI repository built by SSL on the MedPhoto server
• Metadata matchpoint: the .b bib record number is the common element between Millennium and MedPhoto
• XML Harvester selectively requests the record set “bib”, in which all records have .b numbers, parses the returned list of MARC21 OAI records and creates a file of MARC records for loading
• Matches on .b and overlays, inserting a 962 for each image
• 962 |u holds the URL for the thumbnail and |e holds the “launchpad” URL
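The match-and-overlay logic can be pictured with the sketch below, which keys catalogue records on their .b number and appends one 962 per harvested image. The dictionary layout, key names and sample values are invented for illustration; in practice this work is done by Millennium load tables, not by code like this.

```python
def overlay_image_links(catalogue, harvested):
    """Attach a 962 field to each catalogue record matched on its .b number.

    catalogue: {bib_number: record_dict}; harvested: list of dicts with
    "bib_number", "thumbnail_url" and "launchpad_url" (hypothetical key names).
    Returns the .b numbers that had no match in the catalogue.
    """
    unmatched = []
    for image in harvested:
        bib = catalogue.get(image["bib_number"])
        if bib is None:
            unmatched.append(image["bib_number"])
            continue
        field = {"u": image["thumbnail_url"], "e": image["launchpad_url"]}
        existing = bib.setdefault("962", [])
        if field not in existing:  # re-harvesting should not duplicate a link
            existing.append(field)
    return unmatched


if __name__ == "__main__":
    catalogue = {"b1000001": {"245": "Example title"}}
    harvested = [
        {"bib_number": "b1000001",
         "thumbnail_url": "http://images.example.org/thumb/1",
         "launchpad_url": "http://images.example.org/launch/1"},
        {"bib_number": "b9999999",
         "thumbnail_url": "http://images.example.org/thumb/2",
         "launchpad_url": "http://images.example.org/launch/2"},
    ]
    print(overlay_image_links(catalogue, harvested))
    print(catalogue["b1000001"]["962"])
```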

MARC21 record ready to load File Name: DONE-MEDPHOTO_ marc (411,392 bytes) Offset: 256 Blocks: LEADER 00403nam a uu 4500 DIRECTORY TAGS nam a uu |uhttp://medphoto.wellcome.ac.uk/ixbin/imageserv?MIDMIRO=L |zView |a000:000:URL:b :000000:0:0:0:0:0:0|tImage|vn|uhttp://medphoto.wellcome.ac.uk/ixbin/hixclient.exe?MIROPAC=L |ehttp://medphoto.wellcome.ac.uk/ixbin/i

Example: with |t default

“Launch pad”
• We saw an opportunity for further integration: used an intermediate screen, with its URL delivered by the MedPhoto repository and loaded to 962 |e
• The user can hotlink from this “launch pad” into the image system to register, use a light box, download or order the image online before returning to the Web OPAC

What we used
• XML Harvester product (III)
• OAI repository software
• VBScript for the file splitting operation
• Instant Saxon (command line XSLT processor)
• Microsoft MSXML core services (e.g. version 5)
• Media Management for 962 (or load URLs to 856)
• Three OAI-PMH compliant library systems
• Shared record IDs as matchpoints
• Some experience of working with stylesheets
• Some experience of load tables and record loading

Work in progress
• Harvesting legacy catalogues/XML for other Asian MSS, e.g. Iskander and the Jain project (with Oxford University)
• Complete testing and batch loading of 60,000 thumbnail and “launchpad” URLs to 962s
• Establish routines to manage updates for new, deleted or amended records, utilising OAI-PMH selective harvesting
• Further automation of routines where practicable

Wish List/Enhancements
• Global edit for 962 tag
• More documentation for XML Harvester
• Access to underlying harvester parameters, e.g. for the XSLT processor and XML parser
• Automation of selective harvesting for maintenance

Useful links
• XML
• EAD
• OAI software
• XSLT
• OAI tutorial
• OAI repository testing

Some example records

Excellent XML: systems interoperability at the Wellcome Library Thanks for your attention Margaret Savage-Jones Library Systems Administrator