Batch-Load Points Counter (MARCEdit project) Amelia C. VanGundy The University of Virginia’s College at Wise Virginia SirsiDynix Library Users Group Meeting.


1 Batch-Load Points Counter (MARCEdit project) Amelia C. VanGundy The University of Virginia’s College at Wise Virginia SirsiDynix Library Users Group Meeting Nov. 14, 2012

2 John Cook Wyllie Library http://library.uvawise.edu/
Ebook titles in OPAC & Ebook packages on web in finding aids
Rate of e-book acquisition increased
netLibrary
  – 3k titles per year
EBSCOhost Ebook Academic Collection
  – 65k titles initial load
  – 5-10k titles additional every quarter

3 Batch Loading Problems
Existing procedures were difficult to follow
Procedures were inconsistent
  – especially for different vendors
Didn't take advantage of MARCEdit Tools
949 holdings field now includes $a class#
  – previously, files loaded with AUTO “call#”

4 Solution? Wish list?
Determine quality of MARC records
  – OCLC files vs. other vendor files
Determine editing priorities
  – required (001/949), recommended, optional
Learn to construct Regular Expression Strings
  – Batch Editing Tools & Find/Replace
Streamlined format
  – needed both an outline & more detailed info
Make available on-line/web-page

5 MARCEdit proficiency
Beginner
Advanced Beginner
  – Uses MARCEditor Tools window (Add/Delete field, Edit Subfield Data, Sort by...)
  – Can apply Regular Expression Strings
Intermediate
  – Uses MARC Tools wizard (Extract Selected Records, MARCSplit)
  – Can construct Regular Expressions
Expert

6 Batch-Load Points Counter (BLPC) people.uvawise.edu/acv6d/

7 Batch-Load Points Counter (BLPC) Webpage & Project link people.uvawise.edu/acv6d/
1. Introduction
  – project concept & desired outcomes
2. Checklist #
  – outlines the batch-load procedures & steps
  – points counter: “what to do” & “when to stop”
3. Processing Guidelines #
  – procedures & how-tos & copy/paste info
4. 949 processing

8 BLPC Introduction & Outcomes
Validation – determine integrity of the file
Processing – determine quality of the records
Statistics – track vendor pkgs, record counts, 001 prefixes
Points – max. points = 150 (2.5 hours)
  – STOP & contact vendor (request corrected file)
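The 150-point ceiling works out to one point per minute of processing time (150 points = 2.5 hours = 150 minutes). A minimal Python sketch of that tally is below. The one-minute-per-point rate is inferred from the slide, the choice to stop on reaching (rather than exceeding) the ceiling is an assumption, and the step names and point values in the usage are hypothetical, not taken from the BLPC checklist.

```python
MAX_POINTS = 150          # ceiling stated on the slide
MINUTES_PER_POINT = 1     # inferred: 150 points = 2.5 hours = 150 minutes


def tally(points_by_step):
    """Sum checklist points; report estimated hours and the stop decision."""
    total = sum(points_by_step.values())
    hours = total * MINUTES_PER_POINT / 60
    # Assumption: reaching the ceiling means stop and request a corrected file.
    return {"points": total, "hours": hours, "stop": total >= MAX_POINTS}


# Hypothetical point values, for illustration only.
result = tally({"validation": 20, "required edits": 90, "manual edits": 45})
```

With these made-up values the tally passes the ceiling, which is the signal to contact the vendor instead of continuing to edit.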

9 BLPC CheckList w/Time estimates
Step 1 & 2: Preparation & validation
  – number of records in file
  – integrity of file
  – valid URL links
Step 3-4: Review & processing
  – quality of records
  – lists all processing/edits possible
Step 5: 949 holdings
Print on one page (2 p. per sheet / front & back)

10 BLPC Processing Guidelines (Procedures)
Gives details for CheckList
  – Steps 1-2, Steps 3-4, Step 5
Gives the regular expression strings (copy/paste)
  – Finding/Replacing/Deleting
  – MARCEditor Tools & MARCEdit Tools
Always use along with Checklist
  – includes information to process every field, BUT
  – not every field needs processing
Do not print out

11 BLPC Step 1: Preparation & Reports
MARC Validator
  – Identify Invalid Records
  – Validate Record (copy/paste into text file)
Material Type Report
Field Count
  – verify vendor count against MARCEditor count (LDR/000)
  – count early / count often
Deduplicate (See Addt’l Instruct.)

12 Reports/ MARC Validator: Identify Invalid Records

13 Reports/ MARC Validator: Validate Records

14 Reports / Material Type

15 BLPC Step 2: Verify Field Counts
Reports/ FieldCount for error checking
  – first field listed is 000 (corresponds to =LDR)
  – last field listed is “numeric”
  – 245 count
Reports/ MARCValidator errors
  – open text file created in Step 1
  – look for specific errors in error file
Check URL links to make sure they work
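The same count comparison can be approximated outside MarcEdit. The Python sketch below is not MarcEdit's Field Count report. It assumes the file has been saved in MarcEdit's mnemonic text format (one field per line, `=TAG` followed by two spaces), counts `=LDR` as 000 the way the report does, and reads "245 count" as meaning that the 245 count should equal the record count. The sample records and the function names are hypothetical.

```python
import re
from collections import Counter

# A field line starts with "=", a tag (LDR or three digits), and two spaces.
FIELD_LINE = re.compile(r"^=(LDR|\d{3})  ")


def field_counts(mnemonic_text):
    """Count field lines per tag, listing the leader as 000."""
    counts = Counter()
    for line in mnemonic_text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            tag = match.group(1)
            counts["000" if tag == "LDR" else tag] += 1
    return counts


def check_counts(counts, vendor_count):
    """Return a list of count mismatches worth investigating."""
    problems = []
    if counts["000"] != vendor_count:
        problems.append(f"record count {counts['000']} != vendor count {vendor_count}")
    if counts["245"] != counts["000"]:
        problems.append(f"245 count {counts['245']} != record count {counts['000']}")
    return problems


# Hypothetical two-record sample; the second record lacks a 245.
SAMPLE = "\n".join([
    r"=LDR  00000nam a2200000 a 4500",
    r"=001  ocn000000001",
    r"=245  10$aFirst title$h[electronic resource].",
    r"=650  \0$aLibraries.",
    "",
    r"=LDR  00000nam a2200000 a 4500",
    r"=001  ocn000000002",
    r"=650  \0$aCataloging.",
])

counts = field_counts(SAMPLE)
problems = check_counts(counts, vendor_count=2)
```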

16 Reports/ Field Count (vendor count = 8556)

17 Field Count Error & "bad field tag" (vendor count = 694)

18 Reports/ Field Count: Detail (highlight field & right-click)

19 Review Validate Records report (saved as text file in Step 1.B)

20 BLPC: Review for processing: Checklist Step 3 workflow
Check field counts
Mark-up notes on the Checklist
  – Track/count fields that need processing
Track points for fields that need processing
Track points for fields that need manual editing
Each record to fix means extra points
Rule of thumb: for more than 12 manual edits, treat as a separate post-load maintenance project

21 BLPC Checklist Step 3: Review Fields
Examples of required processing
Examine first record & check field count
  – Title control# – 001 (prefer OCLC#)
  – If lacking: use info. from 035 or create local 001
Check field counts / subfield counts
  – Title/GMD – 245 $h
  – URL – 856 $3 $y $u
Check Validate Record text file for errors
  – “Invalid field format” / “Subfield cannot repeat”
Check field counts / indicator counts
  – Subject – 650 Ind2 = 4/7 or 5/6/8
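Three of the required checks above (001 present, 245 carries $h, 856 carries $u) lend themselves to a quick script. The following Python sketch assumes MarcEdit mnemonic text with records separated by blank lines. The function names and the sample data are hypothetical, and the checks are a simplification of what the checklist asks for, not a replacement for it.

```python
import re


def split_records(mnemonic_text):
    """Split mnemonic text into records (lists of field lines) at blank lines."""
    blocks = re.split(r"\n\s*\n", mnemonic_text.strip())
    return [block.splitlines() for block in blocks if block]


def field_lines(record, tag):
    """Return the lines of one record that carry the given tag."""
    return [line for line in record if line.startswith(f"={tag}  ")]


def required_problems(record):
    """List which of the three required elements a record is missing."""
    problems = []
    if not field_lines(record, "001"):
        problems.append("001 lacking")
    if not any("$h" in line for line in field_lines(record, "245")):
        problems.append("245 lacks $h")
    if not any("$u" in line for line in field_lines(record, "856")):
        problems.append("856 lacks $u")
    return problems


# Hypothetical sample: the first record is complete, the second is not.
SAMPLE = "\n".join([
    r"=LDR  00000nam a2200000 a 4500",
    r"=001  ocn000000001",
    r"=245  10$aFirst title$h[electronic resource].",
    r"=856  40$uhttp://example.org/1",
    "",
    r"=LDR  00000nam a2200000 a 4500",
    r"=001  ocn000000002",
    r"=245  10$aSecond title.",
])

report = [required_problems(record) for record in split_records(SAMPLE)]
```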

22 BLPC Checklist Step 4: Review fields
Examples of optional processing
Check field count & delete if present
  – 029 / 583 / 584 / 938
Check field data and delete
  – Other vendor pkg names (netLibrary/ebrary/myiLibrary/24x7/Ebsco)
Check field data & ignore/defer
  – 300 lacks phrase: (1 electronic resource)

23 BLPC Checklist with mark-ups

24 BLPC Processing workflow Step 3 - Step 4
Review Field Count
Review Field data
  – Use Find/Sort window and review first/last field
Add/Delete/Edit field
Review Field data
  – look at field in first record or Find/Sort window
  – Mistake? Typo? Use the Edit/SpecialUndo
Review FieldCount
Save edited file / SaveAs new filename

25 MARCEditor / MARCEdit Tools
BLPC Checklist identifies fields to process
MARCEditor Tools window
  – adding/editing/deleting fields
  – adding/editing/deleting subfields
MARCEditor Edit/Find window
  – editing/replacing field data
  – displays sortable list
MARCEdit Tools wizard
  – select & extract records
  – extract tab-delimited records for Excel

26 BLPC Processing: Add std. Phrase 506 => Step 3.S
Check Field Count for presence of 506
Delete existing 506 field (if present)
Consult Step 3.S in BLPC Procedures
  – Determine that AddField Tool is needed for processing
  – Copy Std. phrase from Step 3.S notes
  – Paste into AddField Tool window and submit
Review 506 data in first record
Check field count
Save file

27 MARCEditor Tools: Add std. Phrase 506 => Step 3.S

28 BLPC Processing: Delete specific fields 650 Ind2= 5/6/8 (non-LC) => Step 3.V
Check Field Count for Presence of 650 Ind2=5/6/8
Consult Step 3.V in BLPC Procedures
  – Optional Review – FindAll(RegEx) instructions
  – Determine that Tools/DeleteField tool is needed
  – Copy RegEx pattern from Step 3.V
  – Paste into Tools/DeleteField window
  – Use Regular Expressions radio button option
  – Submit using Delete button
Check Field Count & Indicator count
Save file

29 MARCEditor: Delete specific fields 650 Ind2= 5/6/8 (non-LC) => Step 3.V

30 Regular expressions (RegEx)
Finding/Editing patterns in strings (letters/numbers)
  – Like learning another language
Parentheses are used to group data
  – Forces the computer to "store" data in "chunks"
  – Data “chunks” are numbered for recall/retrieval/use
  – Helps the programmer "read" the pattern
Optional functionality, and not necessary
Some punctuation is "reserved" (has a special meaning)
BLPC uses consistent format for RegEx patterns

31 Reading RegEx Patterns 650 Ind2= 5/6/8 (non-LC)
Pattern: (=650  )(.[568])(\$a)(.+)
(=650  ) look for 650 fields followed by two blank spaces
(.[568]) look for any Ind1 & listed Ind2 numbers
(\$a) look for subfield $a (used as "anchor chunk")
(.+) any characters (letters, numbers, punctuation) to the end of the field
Use Edit/FindAll(RegEx) to verify pattern
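The pattern can also be tried outside MarcEdit. The Python sketch below runs it against a few hypothetical lines in MarcEdit's mnemonic text format. It writes out both spaces after the tag, as the slide's description says, and assumes a blank indicator is shown as a backslash. Python's `re` module handles this simple syntax the same way for the purposes of this example, but MarcEdit's own engine may differ in details, so verify the pattern with Edit/FindAll(RegEx) before using it.

```python
import re

# The slide's pattern, with the two spaces after the tag written out.
NON_LC_650 = re.compile(r"(=650  )(.[568])(\$a)(.+)")

# Hypothetical sample lines, for illustration only.
sample = "\n".join([
    r"=650  \0$aLibraries$xAutomation.",
    r"=650  \6$aBibliotheques$xAutomatisation.",
    r"=650  \7$aCOMPUTERS / General.$2bisacsh",
    r"=650  \5$aLibraries.",
])


def find_non_lc(text):
    """Return every 650 line whose second indicator is 5, 6, or 8."""
    return [match.group(0) for match in NON_LC_650.finditer(text)]


hits = find_non_lc(sample)
```

Only the second and fourth sample lines are returned, because their second indicators (6 and 5) are in the bracketed list while 0 and 7 are not.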

32 Interpreting RegEx punctuation
Pattern: (=650  )(.[568])(\$a)(.+)
( ) Parentheses for data “chunks”
. Period for any single character (letter, number, or punctuation)
[ ] Square brackets for a list using “OR”
\ Backslash before “reserved” punctuation, esp.: $ \ ( ) [ ]
+ Plus sign for one or more of the same
“Chunks” are stored as: $1$2$3$4
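To see what ends up in each numbered "chunk", the match can be inspected group by group. This is a small Python sketch using a hypothetical field line; in Python the chunks are read with `group(1)` through `group(4)` rather than the `$1` to `$4` notation used in MarcEdit's replace box.

```python
import re

pattern = re.compile(r"(=650  )(.[568])(\$a)(.+)")

# Hypothetical field line in mnemonic format (backslash = blank indicator).
line = r"=650  \5$aLibraries."

match = pattern.match(line)
chunks = match.groups()

# chunk 1: tag and two spaces, chunk 2: indicators,
# chunk 3: the $a anchor,      chunk 4: the rest of the field
tag_part, indicators, anchor, rest = chunks
```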

33 Creating RegEx patterns
Start with known pattern:
For non-LC Subjects: (=650  )(.[568])(\$a)(.+)
FindAll(RegEx) for “local” Subjects (Ind2 = 4/7): (=650  )(.[47])(\$a)(.+)
FindAll(RegEx) for “local” Genres (Ind2 = 4/7): (=655  )(.[47])(\$a)(.+)
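Since each new pattern differs from the known one only in the tag and the list of second indicators, the variation can be expressed as a small helper. The Python sketch below is illustrative: `field_pattern` is a hypothetical name, and the two spaces after the tag follow the mnemonic format described on the earlier slide.

```python
import re


def field_pattern(tag, ind2_values):
    """Build the four-chunk pattern for a tag and a list of Ind2 digits."""
    if not re.fullmatch(r"\d{3}", tag):
        raise ValueError("tag must be three digits")
    if not re.fullmatch(r"\d+", ind2_values):
        raise ValueError("ind2_values must be digits, e.g. '47'")
    return re.compile(rf"(={tag}  )(.[{ind2_values}])(\$a)(.+)")


non_lc_subjects = field_pattern("650", "568")
local_subjects = field_pattern("650", "47")
local_genres = field_pattern("655", "47")
```

Keeping the chunk layout fixed and changing only the tag and indicator list is the same "start with a known pattern" approach the slide describes.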

34 Editing with RegEx string pattern 650 BISAC subjects => 690
Start with known pattern: (=650  )(.[568])(\$a)(.+)
Use Edit/Replace(RegEx): Change 650 to 690
Identify “BISAC” subjects: Ind2=7 & $2 = bisacsh
Determine which “chunks” change/stay the same
Find(RegEx): (=650  )(.[7])(\$a)(.+)(\$2bisacsh)
Replace(RegEx): (=690  )$2$3$4$5

35 Reading RegEx Patterns 650 BISAC subjects => 690
Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh)
(=650  ) look for 650 fields followed by two blank spaces
(.[7]) look for any Ind1 & Ind2 = 7
(\$a) look for subfield $a (optional “anchor” text)
(.+) any characters up to the next “chunk”
(\$2bisacsh) look for subfield & data at end of field
Can be shortened (which makes the pattern look complicated):
Find(RegEx): (=650)(.+\$2bisacsh)
Replace(RegEx): (=690)$2
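The 650-to-690 change can be rehearsed on sample text before running it on a real file. The Python sketch below mirrors both the five-chunk and the shortened find/replace pairs. Two things differ from the MarcEdit dialog and are choices of this sketch: Python writes stored chunks as `\g<2>` instead of `$2`, and the replacement text is written without parentheses because Python treats parentheses in a replacement string as literal characters. The sample lines are hypothetical.

```python
import re

# Five-chunk form: tag, indicators, $a anchor, data, $2bisacsh.
FIND_BISAC = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")
REPLACE_BISAC = r"=690  \g<2>\g<3>\g<4>\g<5>"

# Shortened form: everything after the tag is one chunk.
FIND_SHORT = re.compile(r"(=650)(.+\$2bisacsh)")
REPLACE_SHORT = r"=690\g<2>"


def retag_bisac(text, short=False):
    """Change BISAC 650 fields to 690, leaving other fields untouched."""
    if short:
        return FIND_SHORT.sub(REPLACE_SHORT, text)
    return FIND_BISAC.sub(REPLACE_BISAC, text)


# Hypothetical sample: only the middle line is a BISAC heading.
sample = "\n".join([
    r"=650  \0$aLibraries$xAutomation.",
    r"=650  \7$aCOMPUTERS / General.$2bisacsh",
    r"=650  \7$aLibraries.$2fast",
])

long_result = retag_bisac(sample)
short_result = retag_bisac(sample, short=True)
```

On this sample both forms give the same result. The shortened form does not test the second indicator, so it relies on `$2bisacsh` alone to identify BISAC headings.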

36 MARCEditor: FindAll(RegEx) Testing the pattern: 650 BISAC subjects

37 MARCEditor: Replace(RegEx) 650 BISAC subjects => 690

38 BLPC Step 5: 949 processing
Required processing
Policy: Include Class# in Unicorn Item record
949 $a
  -- Pull the call# from the 050$a
  -- Insert the standard phrase: ' INTERNET'
$v
  -- Pull the 001/OCLC# as a unique no.
$w $h $t $x $z
  -- Add standard holdings data
See Addt'l instruct.
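One way to picture the 949 assembly is as string building from the 050 and 001 data. The Python sketch below rests on several assumptions that the slide does not confirm: that ' INTERNET' is appended to the class number in $a, that both 949 indicators are blank (written as backslashes in mnemonic format), and that the standard holdings subfields are supplied by the caller as a ready-made string. The function name and the sample values are hypothetical.

```python
import re

BLANK_INDICATORS = "\\\\"  # two backslashes: both indicators blank (assumed)


def build_949(field_050, control_number, holdings_subfields):
    """Assemble a 949 line from the 050 $a, the 001 value, and holdings data."""
    match = re.search(r"\$a([^$]+)", field_050)
    if match is None:
        # No class number available: a candidate for post-load item maintenance.
        return None
    call_number = match.group(1).strip() + " INTERNET"
    return ("=949  " + BLANK_INDICATORS + "$a" + call_number
            + "$v" + control_number + holdings_subfields)


# Hypothetical inputs; the holdings string stands in for the $w $h $t $x $z data.
line_949 = build_949(r"=050  \4$aZ699.35.M28$bV36 2012",
                     "ocn000000001",
                     "$wHYPOTHETICAL")
```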

39 Batch-loading
MARCEdit with files no larger than 10k records
  – MARCEdit/Tool MARCSplit
MARCEditor/File: Compile File into MARC
Unicorn batch load rpt uses 001 match point
  – 'o' for OCLC# o & 'g' for local vendor key
Unicorn batch load rpt settings
  – create new bibliographic records only
Date cataloged -- back dated to prev. month
  – prevents interference w/scheduled Authority reports
  – max. load two files a day
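The 10k-record limit amounts to cutting the record list into fixed-size batches. MARCSplit does this inside MarcEdit. The Python sketch below only illustrates the idea on mnemonic text, assuming records are separated by blank lines; `split_into_batches` is a hypothetical name.

```python
import re


def split_into_batches(mnemonic_text, max_records=10_000):
    """Cut mnemonic text into chunks of at most max_records records each."""
    records = [r for r in re.split(r"\n\s*\n", mnemonic_text.strip()) if r]
    return [
        "\n\n".join(records[start:start + max_records])
        for start in range(0, len(records), max_records)
    ]


# Five hypothetical stub records, split two per batch for demonstration.
sample = "\n\n".join(
    f"=LDR  00000nam a2200000 a 4500\n=001  ocn00000000{n}" for n in range(1, 6)
)
batches = split_into_batches(sample, max_records=2)
sizes = [batch.count("=LDR") for batch in batches]
```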

40 Identifying records for Cleanup
Checklist finds problems to correct post-load
Item maintenance projects
  – 949 lacks call#
Bibliographic record maintenance projects
  – 245 lacks $h (if more than 5-12 records)
  – URLs lacking
Record reload/overlay project
  – Record already in OPAC (P-N duplicates)

41 MARCEdit Tools: Select/Extract selected records Step 3.F: 245 lacks $h

42 MARCEdit Tools: Export Tab Delimited records

43 Help!
MarcEdit Help http://people.oregonstate.edu/~reeset/marcedit/html/help.html
  – Click thru the Contents menu: Contents / Using MARCEdit / Using the MARCEditor / Editing Functions / Using Regular Expressions
RegularExpressions.info http://www.regular-expressions.info/
MARCEDIT-L list http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L
BATCH list http://listserv.vt.edu/cgi-bin/wa?A0=batch

44 Amelia C. VanGundy
The University of Virginia's College at Wise
John Cook Wyllie Library
276-328-0154
acv6d@uvawise.edu
http://people.uvawise.edu/acv6d/
Virginia SirsiDynix Library Users Group Meeting Nov. 14, 2012

45 BLPC Project Presentation revisions
Originally presented Nov. 14, 2012
Additional Slides:
  – BLPC Project web-page
  – MARCEditor: FindAll(RegEx)
  – MARCEdit Tools: Export Tab Delimited records
  – BLPC Project: Presentation revisions

