Using Publishing Profiles to dump data out of Alma needed for resource sharing systems such as HathiTrust Margaret Briand Wolfe Systems Librarian Boston.


1 Using Publishing Profiles to dump data out of Alma needed for resource sharing systems such as HathiTrust
Margaret Briand Wolfe, Systems Librarian, Boston College
ELUNA, May 8, 2015

2 When the call for data comes
- HathiTrust
- Rapid ILL
- Browzine
- Your data extraction headache here
ELUNA 2015

3 Frustrations dumping data out of Alma and Analytics
- 5,000-row Excel export limit in Alma
- 65,000-row Excel export limit in Analytics
- Alma bibliographic export processes
  - MARC21 Binary
  - MARC XML
  - Entire MARC record is too much data to sift through
- Alma APIs
  - Too slow for millions of records
  - Daily limit on the number of API calls

4 Solution: Alma Publishing Profiles
- Set-based
- A profile can only be published in full once; subsequent publishing contains only the delta
- Ex Libris says full re-publish is coming in a future release
- Need a place for the published files to land, such as an S/FTP server

5 HathiTrust Files Requirement
Print holdings in 3 separate files:
- Single Print Monographs
- Multi-Part Monographs
- Serials

6 BC’s Managed Sets for HathiTrust
- Sets built for 9 separate libraries, for both books and serials, using the Advanced Repository Search:
  - Physical titles where library = O’Neill and material type = Books
  - Physical titles where library = O’Neill and material type = Issue or Bound Issue
- Sets can be combined, but once combined they become itemized sets instead of logical sets
- I combined all serials sets into one itemized set and all sub-library book sets into one itemized set
- O’Neill books stayed in their own logical set

7 Normalization Rules
- Publishing Profiles can use normalization rules to determine what data is output
- If unsure how to add or edit a rule, see Alma Help and browse for normalization rules
- Briefly:
  Resource Management -> Cataloging -> Metadata Editor -> File -> New -> Normalization Rule
  OR
  Resource Management -> Cataloging -> Metadata Editor -> Rules -> Normalization Rules

8 Normalization Rules
We use a rule that removes all of the MARC fields except:
- 001 – contains system number (MMS ID) *
- 035 – contains OCLC number *
- 022 – contains ISSN. Used when the set is for serials
- 074 – contains government document number
- 901 – publishing profile puts item description in 901 subfield a (more on this soon)
* Required by HathiTrust
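
The effect of such a rule can be illustrated outside Alma. The following is a minimal Python sketch, not Alma normalization rule syntax and not the rule BC actually uses: it takes one MARC XML record and drops every field whose tag is not in the keep list from the slide. The sample record, its identifiers, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tags the slide says the rule keeps; everything else is dropped.
KEEP_TAGS = {"001", "035", "022", "074", "901"}


def local_name(element):
    """Return the element name without any XML namespace prefix."""
    return element.tag.split("}")[-1]


def keep_only(record, keep_tags=KEEP_TAGS):
    """Drop every control/data field whose tag is not in keep_tags."""
    for field in list(record):
        is_field = local_name(field) in ("controlfield", "datafield")
        if is_field and field.get("tag") not in keep_tags:
            record.remove(field)
    return record


# Hypothetical record; the identifiers are made up for illustration.
SAMPLE = """<record>
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">991234567890001</controlfield>
  <controlfield tag="008">150508s2015    mau           000 0 eng d</controlfield>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(OCoLC)12345678</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""

record = keep_only(ET.fromstring(SAMPLE))
print([field.get("tag") for field in record if field.get("tag")])
# prints ['001', '035']
```

The leader is left in place because it is not a tagged field; whether a real Alma rule keeps it depends on how the rule is written.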

9 Publishing Profiles – Profile Details
- Resource Configuration -> Configuration Menu -> Publishing Profiles -> Add Profile -> General Profile
- BC ended up with 3 publishing profiles:
  1. O’Neill Books – uses logical set
  2. All other sub-libraries’ books – uses combined itemized set
  3. All serials – uses combined itemized set
- Under Content -> Publish On: Bibliographic Level
- Under Publishing Protocol, can choose FTP or OAI. BC uses FTP
- MARC output format = MARC21 XML or MARC21 Binary. BC uses MARC21 XML, 10,000 records per file
- Added a filename prefix to distinguish the files for each of the 3 sets

10 Publishing Profiles – Profile Details

11 Publishing Profiles – Data Enrichment
- Under Bibliographic Normalization: select the normalization rule you created so that only the MARC data you want is exported
- Under Physical Inventory Enrichment: check Add Items Information if the profile is for books. Set repeatable field = 901 and description subfield = a. This puts the item description/enumeration in the 901 tag, subfield a, which is used to find multi-part monographs.
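
To make the 901 enrichment concrete, here is a small Python sketch of what a published record might look like and how the item descriptions could be read back out. It assumes the profile writes one 901 field per item with the description in subfield a, which is how the "repeatable field" setting reads; the sample record and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def item_descriptions(record, tag="901", code="a"):
    """Return the non-empty description values found in the enrichment field."""
    descriptions = []
    for field in record:
        # Ignore namespaces so the check works with or without one.
        if field.tag.split("}")[-1] != "datafield" or field.get("tag") != tag:
            continue
        for sub in field:
            text = (sub.text or "").strip()
            if sub.get("code") == code and text:
                descriptions.append(text)
    return descriptions


def is_multipart(record):
    """Treat a record as multi-part when any item carries a description."""
    return bool(item_descriptions(record))


# Hypothetical enriched record: two items, each with an enumeration.
ENRICHED = """<record>
  <controlfield tag="001">991234567890001</controlfield>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(OCoLC)12345678</subfield>
  </datafield>
  <datafield tag="901" ind1=" " ind2=" ">
    <subfield code="a">v.1</subfield>
  </datafield>
  <datafield tag="901" ind1=" " ind2=" ">
    <subfield code="a">v.2</subfield>
  </datafield>
</record>"""

record = ET.fromstring(ENRICHED)
print(item_descriptions(record), is_multipart(record))
# prints ['v.1', 'v.2'] True
```

A record whose items have no description yields an empty list and is treated as a single-part monograph under this assumption.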

12 Publishing Profiles – Data Enrichment

13 Publishing Profiles - Actions

14 What to do with all those files
- Unzip them – I wrote a Perl script to unzip all of the files FTP’d by Alma onto one of our servers
- Process them – I wrote a Perl script to read each XML file and process each record in the file
  - To go to HathiTrust, each record needed an MMS ID and an OCLC number
  - For serials files I added the ISSN(s), if present
  - Multi-part monographs could only be identified by the presence of a description field
  - If an 074 is present, set the Gov Doc indicator = 1
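
The two steps can be sketched as follows. This is a Python approximation under stated assumptions, not the Perl scripts described on the slide: it assumes the published files arrive as .zip or gzipped tar archives, that the OCLC number sits in 035 subfield a with an "(OCoLC)" prefix, and that the first OCLC number found is the one to report. The function names and the row layout are hypothetical; the column order HathiTrust expects should be taken from its own submission instructions.

```python
import re
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed 035 $a shape: "(OCoLC)" plus an optional ocm/ocn/on prefix.
OCLC_RE = re.compile(r"^\(OCoLC\)\s*(?:ocm|ocn|on)?\s*(\d+)")


def unpack_all(incoming, workdir):
    """Unpack every .zip, .tar.gz or .tgz archive found in incoming."""
    for archive in sorted(Path(incoming).iterdir()):
        if archive.name.endswith((".zip", ".tar.gz", ".tgz")):
            shutil.unpack_archive(str(archive), str(workdir))


def local_name(element):
    """Return the element name without any XML namespace prefix."""
    return element.tag.split("}")[-1]


def subfields(record, tag, code):
    """Collect the non-empty values of one subfield across repeated fields."""
    values = []
    for field in record:
        if local_name(field) == "datafield" and field.get("tag") == tag:
            for sub in field:
                text = (sub.text or "").strip()
                if sub.get("code") == code and text:
                    values.append(text)
    return values


def record_to_row(record, serial=False):
    """Return a dict for one record, or None if MMS ID or OCLC number is missing."""
    mms_id = next(
        ((f.text or "").strip() for f in record
         if local_name(f) == "controlfield" and f.get("tag") == "001"),
        "",
    )
    matches = (OCLC_RE.match(v) for v in subfields(record, "035", "a"))
    oclc = [m.group(1) for m in matches if m]
    if not mms_id or not oclc:
        return None
    has_074 = any(
        local_name(f) == "datafield" and f.get("tag") == "074" for f in record
    )
    row = {"mms_id": mms_id, "oclc": oclc[0], "gov_doc": 1 if has_074 else 0}
    if serial:
        row["kind"] = "serial"
        row["issn"] = subfields(record, "022", "a")
    else:
        # A description in 901 $a is the only multi-part signal available.
        row["descriptions"] = subfields(record, "901", "a")
        row["kind"] = "multi-part" if row["descriptions"] else "single-part"
    return row


def process_file(path, serial=False):
    """Yield one row per usable record in a published XML file."""
    root = ET.parse(path).getroot()
    for element in root.iter():
        if local_name(element) == "record":
            row = record_to_row(element, serial=serial)
            if row is not None:
                yield row
```

A typical run would call `unpack_all("incoming", "work")` once and then loop over the XML files in `work`, passing `serial=True` for files from the serials profile. Records without an MMS ID or OCLC number are skipped, matching the requirement stated above.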

15 HathiTrust elements I ignored
- Holding Status
  - CH – Current Holding
  - WD – Withdrawn
  - LM – Lost or missing
- Condition
  - BRT – Brittle, damaged and/or deteriorating

16 Why I ignored them
- Alma does not distinguish between items that are deleted and items that have been withdrawn.
- Lost and Missing statuses are stored in the item processing type. This could be added to the data enrichment from items.
- We store brittle or deteriorating condition in the item internal note. This could be added to the data enrichment in the same way.
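
If the processing type and internal note were added to the item enrichment, deriving the two ignored elements might look like the Python sketch below. This was not part of the process described here. The matching is deliberately loose because the exact processing type values and the wording of local notes are assumptions; WD is never returned, since withdrawn items cannot be told apart from deleted ones.

```python
def holding_status(process_type):
    """Map an item processing type to a HathiTrust holding status code."""
    value = (process_type or "").upper()
    # Assumption: lost and missing items have LOST or MISSING in the value.
    if "LOST" in value or "MISSING" in value:
        return "LM"
    return "CH"


def condition(internal_note):
    """Return BRT when the internal note suggests a brittle or damaged item."""
    note = (internal_note or "").lower()
    # Assumption: local notes use one of these words; adjust to local practice.
    if any(word in note for word in ("brittle", "deteriorat", "damaged")):
        return "BRT"
    return ""


print(holding_status("Missing"), holding_status(None))
# prints LM CH
```

Free-text notes are unreliable, so a match on the note is best treated as a hint to review, not a confirmed condition.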

17 Your Turn
- What have you done?
- How can we do this better?
- What should we ask Ex Libris for to make this process easier?

18 Contact Me
Margaret Briand Wolfe
briandwo@bc.edu

