Where Have All the Binders Gone? Greg Colati, University of Denver; Jennifer King, George Washington University; Sylvia Augusteijn, George Washington.

1 Where Have All the Binders Gone? Greg Colati, University of Denver; Jennifer King, George Washington University; Sylvia Augusteijn, George Washington University. SAA Chicago, Session #801, September 1, 2007


3 Why Manage with a Database? Scale; centralized management; access; reusability; rearrange-ability

4 Real Drivers of Change: Demand for item-level access; born-digital content; digitized content; researcher demands and expectations

5 Many Inputs, Many Outputs: Physical object; storage location; digital object; storage location

6 Objects and Attributes: I belong to a collection; I belong to a series; I came from somewhere; I am an image; I am a certain file format(s); I am about something(s); I am green, blue, and brown


8 Clustering: Project Ungava, National Research Council of Canada


10 Contextualize the Resource: The Encyclopedia of Chicago (© 2007 Gregory C. Colati)


12 A Cultural Shift: General; Specific; Association; Object

13 Extend Interoperability: Descriptive standards at the item level

14 Manage from the Bottom Up: Items and attributes; create associations, implicit and explicit

15 Productivity Approach to Processing, Management, and Access: Automate metadata creation; metadata extraction; pre-populate metadata fields using default and automatically generated terms; stop writing extensive biographical and historical notes; automate digital content creation


17 Use the Power of Database Tools: Ingest tools discussed above; export templates for MARC, EAD, and various XML schemas for item-level export (MARCXML, DC, TEI, VRA, etc.)

18 Leverage Use of Digital Repositories: We don’t have to be self-sufficient; outsource low-level functions; mass storage; backup

19 Create Partnerships: Computer scientists; librarians; academic technologists

20 Get into Mainstream Discovery Tools. Get “into the flow.” Can everyone say: Google, MySpace, YouTube, Facebook

21 Create Access Tools Based on User Needs: Understand how all of our constituencies seek and use information. Make our tools reflect these behaviors. When those behaviors change, our tools should change with them.

22 New Skills for the Digital Era. Jennifer King, George Washington University




26 From Document to Database. Sylvia Augusteijn, George Washington University, Special Collections and University Archives. SAA Session 801, September 1, 2007

27 Out from the binders
• Scope and content notes, series descriptions simple to cut and paste into Re:Discovery
• Cut and paste not feasible for thousands of item-level records
• “Container list” project is born
• Goal: to separate elements of each item name (number, title, date) so Re:Discovery could import them into their respective fields

28 Container lists
• Each item has a number, title, and date, but formats vary slightly in punctuation or spacing
  Ways of writing the same name:
    1. Correspondence, 1950-57
    I. Correspondence – 1950-1957
    i. correspondence 1950 to 1957
• Naming conventions generally consistent within each finding aid
• How to automate?
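One way to picture the variation described on this slide is a single tolerant pattern that accepts all three spellings. The sketch below is only an illustration in Python, not the method used in the project (which worked in TextPad). The function name `parse_item`, the field names, and the choice to expand a two-digit end year from the start year's century are all assumptions.

```python
import re

# Hypothetical tolerant pattern for the three variants shown on the slide.
ITEM = re.compile(
    r"""^\s*
    (?P<number>\d+|[ivxlcdm]+)\.\s+      # Arabic or Roman item number, then a period
    (?P<title>.+?)\s*                    # title, matched lazily
    [,–-]?\s*                            # optional comma or dash before the date
    (?P<start>\d{4})\s*(?:-|–|to)\s*     # start year and range separator
    (?P<end>\d{2,4})\s*$                 # end year, two to four digits
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_item(line):
    """Return a dict of number, title, start, end, or None if the line does not match."""
    match = ITEM.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: a two-digit end year shares the century of the start year.
    if len(fields["end"]) == 2:
        fields["end"] = fields["start"][:2] + fields["end"]
    return fields


for variant in (
    "1. Correspondence, 1950-57",
    "I. Correspondence – 1950-1957",
    "i. correspondence 1950 to 1957",
):
    print(parse_item(variant))
```

The pattern leaves capitalization and numbering style untouched; normalizing those would be a separate decision for each finding aid.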

29 Automation, part 1: Delimiting the text
• Container lists saved in a text editor (TextPad)
• Delimiters are special characters placed within the text to separate the elements
• We chose * to signal the beginning and end of each field and % to signal the boundary between fields
• Item as it appears in text of finding aid: 1. Correspondence, 1950-57
• Item with delimiters inserted: *1*%*Correspondence*%*1950-57*
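The delimiter convention on this slide can be sketched in a few lines of Python. This is an illustration only: the function name `delimit` and the check that rejects fields already containing `*` or `%` are assumptions, not part of the original workflow.

```python
FIELD_MARK = "*"  # signals the beginning and end of each field
FIELD_SEP = "%"   # signals the boundary between fields


def delimit(fields):
    """Wrap each field in FIELD_MARK and join the wrapped fields with FIELD_SEP."""
    for field in fields:
        # Assumption: a delimiter character inside a field would corrupt the record.
        if FIELD_MARK in field or FIELD_SEP in field:
            raise ValueError(f"field contains a delimiter character: {field!r}")
    return FIELD_SEP.join(f"{FIELD_MARK}{field}{FIELD_MARK}" for field in fields)


print(delimit(["1", "Correspondence", "1950-57"]))  # *1*%*Correspondence*%*1950-57*
```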

30 Delimiting the text (continued)
• Re:Discovery can import directly from the text editor, with instructions
• Instructions to Re:Discovery: the first element of this name will be the number, the second will be the title, the third will be the date
  *1*%*Correspondence*%*1950-57*
• How to add these delimiters to thousands of item records?
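The positional instruction on this slide (first element is the number, second the title, third the date) can be mimicked as below. Re:Discovery's actual import mechanism is not shown in the slides, so this Python sketch only illustrates the idea; `Item` and `parse_record` are hypothetical names.

```python
from typing import NamedTuple


class Item(NamedTuple):
    number: str
    title: str
    date: str


def parse_record(line: str) -> Item:
    """Split a delimited record into its three fields by position."""
    parts = line.strip().split("%")
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, found {len(parts)}: {line!r}")
    fields = []
    for part in parts:
        # Every field must open and close with the * marker.
        if len(part) < 2 or not (part.startswith("*") and part.endswith("*")):
            raise ValueError(f"field is missing a * marker: {part!r}")
        fields.append(part[1:-1])
    return Item(*fields)


item = parse_record("*1*%*Correspondence*%*1950-57*")
print(item.number, item.title, item.date)
```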

31 Automation, part 2: Regular expressions
• A regular expression is a string that uses special characters (such as \ + $ ^ ]) to describe and match patterns of text within a document

32 Regular expressions (continued)
• First used regular expressions to search through our text for anything formatted like an item (i.e. to search for a pattern in which an item number is followed by a title and date)
• Then used regular expressions to insert our delimiters in between those elements
  To turn a page of this:
    1. Journals, 1950-60
    2. Photographs, 1970-80
    3. Postcards, 1940-50
  Into a page of this:
    *00001*%*Journals*%*1950-60*
    *00002*%*Photographs*%*1970-80*
    *00003*%*Postcards*%*1940-50*
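The page-level transformation above can be sketched in Python in a single pass. This is not the TextPad procedure from the talk. The pattern assumes the simple "number. title, date" layout of the sample lines, and `delimit_page` is a hypothetical name.

```python
import re

# Assumed layout: item number, period, title, comma, date range.
ITEM = re.compile(r"^(\d+)\.\s+(.+?),\s+(\d{4}-\d{2,4})$")


def delimit_page(text):
    """Rewrite every item line as a delimited record; leave other lines untouched."""
    out = []
    for line in text.splitlines():
        match = ITEM.match(line.strip())
        if match is None:
            out.append(line)
            continue
        number, title, date = match.groups()
        # Zero-pad the item number to five digits, as in the slide's output.
        out.append(f"*{int(number):05d}*%*{title}*%*{date}*")
    return "\n".join(out)


page = "1. Journals, 1950-60\n2. Photographs, 1970-80\n3. Postcards, 1940-50"
print(delimit_page(page))
```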

33 Examples of regular expressions
To turn
  1. Correspondence, 1950-1957
into
  *00001*%*Correspondence, 1950-1957
Find: \([0-9]\). (find any digit followed by a period)
Replace: *0000\1*%* (replace with *, four zeroes, that digit and *%*)
Then to turn
  *00001*%*Correspondence, 1950-1957
into
  *00001*%*Correspondence*%*1950-1957
Find: , \([0-9]\{4\}\) (find any four-digit number preceded by a comma and space)
Replace: *%*\1 (replace the comma and space with *%*)
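For readers who do not use TextPad, the same two find-and-replace steps can be approximated with Python's `re` module. The syntax differs: Python writes groups as `( )` and repetition as `{4}` without backslashes. This sketch also escapes the period, anchors the first pattern to the start of the line, and consumes the space after the period. Those are adjustments assumed here, not taken from the slide.

```python
import re


def pad_number(text):
    """Step 1: turn a leading single-digit item number into *0000N*%*."""
    return re.sub(r"^([0-9])\. ", r"*0000\1*%*", text, count=1)


def split_date(text):
    """Step 2: replace the comma and space before a four-digit year with *%*."""
    return re.sub(r", ([0-9]{4})", r"*%*\1", text, count=1)


step1 = pad_number("1. Correspondence, 1950-1957")
step2 = split_date(step1)
print(step1)  # *00001*%*Correspondence, 1950-1957
print(step2)  # *00001*%*Correspondence*%*1950-1957
```

As on the slide, these two steps stop before the closing `*` is added, and the first pattern only handles single-digit item numbers.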

34 Challenges
• Tweaking expressions slightly for each new container list
• Writing the wrong expression and accidentally replacing the wrong text
• Failing to export correctly to Re:Discovery due to small number of missing delimiters
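The last challenge, a handful of missing delimiters, is the kind of problem a quick check before import can catch. The Python sketch below is a suggestion rather than something described in the talk. It assumes three fields per record, and `find_bad_lines` is a hypothetical name.

```python
import re

# A well-formed record: three fields, each wrapped in *, separated by %.
RECORD = re.compile(r"^\*[^*%]*\*%\*[^*%]*\*%\*[^*%]*\*$")


def find_bad_lines(text):
    """Return (line number, line) pairs for non-blank lines that are not well-formed."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and not RECORD.match(stripped):
            bad.append((number, stripped))
    return bad


sample = "*00001*%*Journals*%*1950-60*\n*00002*%*Photographs*%1970-80*"
for number, line in find_bad_lines(sample):
    print(f"line {number}: {line}")
```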

35 Re:Discovery and beyond
• Delimited text exported into Re:Discovery
• From Re:Discovery, easy creation of EAD finding aids using a template
• To date: 257 collections in Re:Discovery (and EAD finding aids on the web); 0 binders
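The slides do not show the Re:Discovery template, so the following Python sketch only illustrates what one item-level record might look like as an EAD component. The element names (`c01`, `did`, `unitid`, `unittitle`, `unitdate`) come from EAD 2002. The function `item_to_ead` and the mapping of fields to elements are assumptions.

```python
import xml.etree.ElementTree as ET


def item_to_ead(number, title, date):
    """Build a minimal EAD 2002 component for one item and return it as a string."""
    component = ET.Element("c01", {"level": "item"})
    did = ET.SubElement(component, "did")
    # Assumed mapping: item number -> unitid, title -> unittitle, date -> unitdate.
    ET.SubElement(did, "unitid").text = number
    ET.SubElement(did, "unittitle").text = title
    ET.SubElement(did, "unitdate").text = date
    return ET.tostring(component, encoding="unicode")


print(item_to_ead("00001", "Journals", "1950-60"))
```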

36 Contact Information: Greg Colati, Digital Initiatives Coordinator, University of Denver; Jennifer King, Manuscripts Librarian, George Washington University, Washington, DC; Sylvia Augusteijn, Project Archivist, George Washington University
