Automating a Vendor File Load Process with Perl and Shell Scripting Roy Zimmer Western Michigan University.

We needed to get Promptcat approval files from OCLC’s ftp site. Historically, we’ve done file retrieval and processing via shell scripting, with some supporting Perl software. In this case, we started out mostly manual, with some programmatic support. We kind of snuck up on the final method of retrieval and processing.

The ftp site has quite a few files, including a number of different types: LBL, RPT, APPR, FIRM…

Let’s use a representative sample for this presentation…

Out of this large number of files, only one or a few will be of interest: in this case, the files for May 7. How do we pick them out?

This is where Perl comes to the rescue. With Perl, you can do many things.

Code details, main program, ftp stuff (ftppcatappr.pl)

Setting up for FTP:
- The FTP module is required when using ftp within Perl.
- The site password is stored in a file.
- The remaining settings are self-explanatory: site URL, username, directory where the files are, and transfer mode.
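
The setup described on this slide can be sketched roughly as follows. This is a Python stand-in using the standard `ftplib` module, not the talk's Perl code; the function names, the one-line password file layout, and every argument value are assumptions made for illustration.

```python
from ftplib import FTP
from pathlib import Path


def read_password(path):
    """Return the site password, assumed to sit on the first line of a file."""
    return Path(path).read_text().splitlines()[0].strip()


def open_session(host, user, password_file, remote_dir):
    """Log in and change to the directory that holds the vendor files."""
    ftp = FTP(host)                      # site URL (host name)
    ftp.login(user, read_password(password_file))
    ftp.cwd(remote_dir)                  # directory where the files are
    return ftp


def fetch(ftp, name, local_dir):
    """Download one file; retrbinary switches to binary transfer mode itself."""
    with open(Path(local_dir) / name, "wb") as out:
        ftp.retrbinary(f"RETR {name}", out.write)
```

Keeping the password in its own file means a scheduled password change only has to rewrite that file, not the program.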

Code details, main program, ftp stuff (ftppcatappr.pl)

- Retrieve the ftp site file listing into a variable as an array of directory entries.
- Set each line up to be split on the space character, and then split it.
- The last piece in each line will be the filename. Split this into pieces based on the period.
- Look for the one(s) that correspond(s) with yesterday’s date and keep those.
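
The selection steps above might look like this in outline. It is a Python sketch rather than the talk's Perl; the `%y%m%d` date format and the idea that one dot-separated piece of the filename equals that date are assumptions, since the transcript does not show the real filenames.

```python
from datetime import date, timedelta


def yesterday_token(today=None, fmt="%y%m%d"):
    """Format yesterday's date; the format string is a guess at the naming scheme."""
    today = today or date.today()
    return (today - timedelta(days=1)).strftime(fmt)


def select_files(listing, token):
    """Pick the filenames whose dot-separated pieces include the date token."""
    wanted = []
    for line in listing:
        parts = line.split()            # split the directory entry on whitespace
        if not parts:
            continue
        filename = parts[-1]            # the last piece is the filename
        if token in filename.split("."):
            wanted.append(filename)
    return sorted(wanted)               # sorted so files are processed in order
```

The `listing` argument is a list of directory-entry lines, such as those collected by `ftplib.FTP.retrlines("LIST", lines.append)`.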

Code details, main program, ftp stuff (ftppcatappr.pl)

We want the files to be processed in order.

Code details, main program, processing each file (ftppcatappr.pl)

- Get the files.
- Get the records.
- Records will need some editing. (Thanks to Birong Ho, our systems librarian, for originally supplying this editing code.)
- “Grab” the fields of interest.
- Some fields are deleted, and others are edited. More edits than this are performed; the basic syntax is the same for each of them.
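
As a rough illustration of the delete-and-edit pass, here is a Python sketch that works on a record already broken into (tag, value) pairs. It is not the editing code credited above. The tags come from the review slide (6xx, 938, 948 removed; 856 edited); dropping every 6xx field and the shape of the edit functions are assumptions.

```python
def drop_unwanted(tag):
    """Assumption: all 6xx fields go; the talk only says 'unwanted 6xx'."""
    return tag.startswith("6") or tag in {"938", "948"}


def edit_fields(fields, should_drop, edits):
    """Return a new field list with some fields deleted and others rewritten.

    fields      -- list of (tag, value) pairs for one record
    should_drop -- function deciding whether a tag is deleted
    edits       -- dict mapping a tag to a function that rewrites its value
    """
    result = []
    for tag, value in fields:
        if should_drop(tag):
            continue                     # deleted field
        if tag in edits:
            value = edits[tag](value)    # edited field
        result.append((tag, value))
    return result
```

Each additional edit is one more entry in `edits`, which mirrors the remark that the basic syntax is the same for each of them.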

Code details, main program, processing each file

The file will need some splitting: split each file up based on the invoice number found in field 980 |f. The next program takes care of this. I did say we snuck up on this, didn’t I?

Code details, helper program, processing each file (oclc980.pl)

- Rather than using the familiar LF, the MARC format uses a different EOL character.
- This section reads each MARC record, looking for the 980 field.
- Get the subfields into an array.
- Look for subfield f and read it to get the invoice number.
- Determine if it’s a new or “existing” invoice number. This also lets us count records for each invoice.
- Use append mode to open, write a record, and close the file for each invoice number.
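
The splitting logic can be sketched as below. This is a Python approximation, not oclc980.pl itself. It relies on the standard MARC 21 layout (record terminator 0x1D, field terminator 0x1E, subfield delimiter 0x1F, 24-byte leader, 12-byte directory entries). The function names, the `NNN.marc` output naming taken from the review slide, and skipping records without a 980 |f are assumptions.

```python
from collections import Counter
from pathlib import Path

RECORD_END = b"\x1d"   # MARC record terminator (the "different EOL character")
FIELD_END = b"\x1e"    # MARC field terminator
SUBFIELD = b"\x1f"     # MARC subfield delimiter


def invoice_number(record):
    """Return the 980 |f value of one raw MARC record, or None if absent."""
    base = int(record[12:17])            # leader bytes 12-16: base address of data
    directory = record[24:base - 1]      # directory entries, without the terminator
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]      # tag (3), field length (4), start (5)
        if entry[:3] != b"980":
            continue
        length, start = int(entry[3:7]), int(entry[7:12])
        field = record[base + start:base + start + length]
        # The first piece holds the indicators; the rest are code + data.
        for sub in field.rstrip(FIELD_END).split(SUBFIELD)[1:]:
            if sub[:1] == b"f":
                return sub[1:].decode("ascii", "replace").strip()
    return None


def split_by_invoice(marc_path, out_dir):
    """Append each record to <invoice>.marc and count records per invoice."""
    counts = Counter()
    for chunk in Path(marc_path).read_bytes().split(RECORD_END):
        if not chunk.strip():
            continue                     # trailing empty piece after the last record
        record = chunk + RECORD_END
        invoice = invoice_number(record)
        if invoice is None:
            continue                     # assumption: records without 980 |f are skipped
        counts[invoice] += 1             # new invoices start at 1, existing ones add up
        with open(Path(out_dir) / f"{invoice}.marc", "ab") as out:
            out.write(record)            # append mode: open, write, close
    return counts
```

Opening in append mode for every record keeps only one output file open at a time, however many invoices a retrieved file contains.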

Code details, helper program, processing each invoice file (importall.sh)

There are usually several files after splitting the file being processed. Each one must be further processed and loaded into Voyager. This is controlled via a small shell script. It calls another shell script for preprocessing and bulk loading of each of the invoice files. (Thanks to Keith Kelley, director of systems, for creating this script.)
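
The control loop might be sketched like this. It is written in Python instead of the shell used in the talk; the `*.marc` pattern and the 1.5-minute pause come from the review slide, while the function name and the injectable `run` and `sleep` parameters are assumptions added so the loop can be exercised without a Voyager server.

```python
import subprocess
import time
from pathlib import Path


def import_all(invoice_dir, command, pause_seconds=90,
               run=subprocess.run, sleep=time.sleep):
    """Hand each split invoice file to the import script, one at a time."""
    processed = []
    for marc_file in sorted(Path(invoice_dir).glob("*.marc")):
        # check=True stops the loop if the import script reports failure.
        run([command, str(marc_file)], check=True)
        processed.append(marc_file.name)
        sleep(pause_seconds)             # wait before starting the next load
    return processed
```

A call such as `import_all(".", "./prodimport.script")` would process every invoice file in the current directory; both arguments here are placeholders.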

Code details, helper program, importing each invoice file (prodimport.script)

- $1 is the default first parameter to the script. Let’s use a more descriptive variable.
- Let’s also drop the filename extension, so that we can “reuse” the filename.
- Get ready for the prebulk processing.
- This is the first use of the file referenced by $1, so that use is OK.
- Start with some final edits. We’ll use marcedit.pl to replace the contents of field 981 |a. There are 59 such edits possible.
- Prep for bulkimport, too. (Self-explanatory.)
- Prebulk output is bulkimport input. Perform the bulkimport.
- The final step for each file is to do some cleanup and move files to the loaded directory.
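
The filename handling and the final cleanup can be sketched as follows, again in Python instead of the talk's shell. The `.preimp` and `.imp` extensions appear on the review slide; which step writes which file, and the function names, are assumptions. The prebulk and bulkimport commands are left out because the transcript does not show them.

```python
import shutil
from pathlib import Path


def derive_names(first_arg):
    """Name the parameter and reuse its base name with other extensions."""
    source = Path(first_arg)             # plays the role of $1
    return {
        "source": source,
        "preimp": source.with_suffix(".preimp"),
        "imp": source.with_suffix(".imp"),
    }


def move_to_loaded(first_arg, loaded_dir):
    """Move the invoice file and its derived files to the loaded directory."""
    loaded = Path(loaded_dir)
    loaded.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in derive_names(first_arg).values():
        if path.exists():                # skip files a step did not produce
            shutil.move(str(path), str(loaded / path.name))
            moved.append(path.name)
    return moved
```

Deriving every name from the one parameter keeps the interim files for an invoice together, so a single cleanup pass can find them.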

Password maintenance

The ftp site requires us to change our password every 90 days. We wanted all this to run hands-off, so that had to be automated as well. The password gets changed every two months.

Password maintenance, getpromptcatpw.ksh

Password maintenance, pwgen.pl

- Want an 8-character password.
- Password length defaults to 10.
- Password consists of these characters.
- Seed the random number generator.
- Generate the password.
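
A password generator along these lines might look like the sketch below. It is Python, not pwgen.pl, and it swaps the seeded random number generator described on the slide for Python's `secrets` module, which draws from the operating system's randomness and needs no seeding. The character set is an assumption, because the slide's actual list is not in the transcript.

```python
import secrets
import string

# Hypothetical character set; the characters shown on the slide are not preserved.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length=10, alphabet=ALPHABET):
    """Return a random password; the length defaults to 10 as on the slide."""
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Calling `generate_password(8)` gives the 8-character password wanted here.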

Review

Run ftppcatappr.pl
  Login to OCLC ftp site for promptcat
  Find desired files and retrieve them
  Do process each file
    remove unwanted 6xx, 938, 948 fields
    edit some 856 fields
    run oclc980.pl
      do process each record in the current file
        look at the 980 |f (contains the invoice number)
        if it contains invoice NNN, (create and) put this record in file NNN.marc, etc.
      end do
    run importall.sh
      do process each file created by oclc980.pl
        run prodimport.script
          use marcedit.pl to process 981 |a replacements (59 possible edits)
          prebulk
          bulk import
          wait 1.5 minutes before continuing
      end do
    move all interim .marc, .preimp, and .imp files to /loaded
  End do
  Move all RCD* files to /loaded

Resources

The files listed below are available at [URL not preserved], except for marcedit.pl, which is at [URL not preserved].

- fileload.ppt: this presentation
- ftppcatappr.pl: gets the files and controls the processing
- oclc980.pl: splits retrieved files based on invoice number
- pwgen.pl: generates a password
- importall.sh: ensures that each “split file” for a particular retrieved file is processed
- prodimport.ksh: does the actual processing of each file
- getpromptpw.ksh: handles all the details of a password change
- marcedit.pl: enables batch editing of MARC files

Resources

CPAN: http://cpan.org

FTP: I’m not sure if the FTP module is supplied on Voyager boxes or not. If you don’t have it, go to the above URL. It also has good documentation on this module.

Thank you for listening. Roy Zimmer Picture © 2008 by Roy Zimmer