Where Have All the Binders Gone?
Greg Colati, University of Denver
Jennifer King, George Washington University
Sylvia Augusteijn, George Washington University
SAA Chicago, Session #801
September 1, 2007

Why Manage with a Database?
• Scale
• Centralized management
• Access
• Reusability
• Rearrange-ability

Real Drivers of Change
• Demand for item-level access
• Born-digital content
• Digitized content
• Researcher demands and expectations

Many Inputs, Many Outputs
• Physical object
• Storage location
• Digital object
• Storage location

Objects and Attributes
• I belong to a collection
• I belong to a series
• I came from somewhere
• I am an image
• I am a certain file format(s)
• I am about something(s)
• I am green, blue, and brown

Clustering
Project Ungava, National Research Council of Canada

Visualization
Bungee View: http://cityscape.inf.cs.cmu.edu/bungee/

I WANT WHAT I WANT …

A Cultural Shift
• General
• Specific
• Association
• Object

Extend Interoperability
• Descriptive standards at the item level

Manage from the Bottom Up
• Items and attributes
• Create associations, implicit and explicit

Productivity Approach to Processing, Management, and Access
• Automate metadata creation
• Metadata extraction
• Pre-populate metadata fields using default and automatically generated terms
• Stop writing extensive biographical and historical notes
• Automate digital content creation

Use the Power of Database Tools
• Ingest tools discussed above
• Export templates for:
    MARC
    EAD
    Various XML schemas for item-level export: MARCXML, DC, TEI, VRA, etc.

Leverage Use of Digital Repositories
• We don’t have to be self-sufficient
• Outsource low-level functions
• Mass storage
• Backup

Create Partnerships
• Computer scientists
• Librarians
• Academic technologists

Get into Mainstream Discovery Tools
Get “Into the Flow”
Can everyone say:
• Google
• MySpace
• YouTube
• Facebook

Create Access Tools Based on User Needs
• Understand how all of our constituencies seek and use information.
• Make our tools reflect these behaviors. When those behaviors change, our tools should change with them.

New Skills for the Digital Era
Jennifer King
George Washington University

Re:Discovery Main Page

Re:Discovery for Internet Search

RFI AND FINDING AID

From Document to Database
Sylvia Augusteijn
George Washington University, Special Collections and University Archives
SAA Session 801
September 1, 2007

Out from the binders
• Scope and content notes, series descriptions simple to cut and paste into Re:Discovery
• Cut and paste not feasible for thousands of item-level records
• “Container list” project is born
• Goal: to separate elements of each item name (number, title, date) so Re:Discovery could import them into their respective fields

Container lists
• Each item has a number, title, and date, but formats vary slightly in punctuation or spacing
• Ways of writing the same name:
    1. Correspondence, 1950-57
    I. Correspondence – 1950-1957
    i. correspondence 1950 to 1957
• Naming conventions generally consistent within each finding aid
• How to automate?
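To illustrate how one pattern might absorb the three variants listed above, here is a minimal sketch in Python's `re` module (not the TextPad expressions used in the project). The pattern, the `parse_item` helper, and the choice to normalize the date separator to a hyphen are all assumptions made for this example.

```python
import re

# Hypothetical pattern: an Arabic or Roman item number, a period, a title,
# an optional comma or dash, then a year range.
ITEM = re.compile(
    r"""^\s*(?P<number>\d+|[IVXLCDM]+)\.\s+   # "1." or "I." or "i."
        (?P<title>.+?)                         # title, as short as possible
        \s*[,–-]?\s*                           # optional comma, en dash, or hyphen
        (?P<start>\d{4})\s*(?:-|–|to)\s*(?P<end>\d{2,4})\s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)

def parse_item(line):
    """Return (number, title, date) for an item line, or None if it does not match."""
    m = ITEM.match(line)
    if m is None:
        return None
    # Normalize the separator only; a two-digit end year is left as written.
    return (m.group("number"), m.group("title"), m.group("start") + "-" + m.group("end"))

for text in ("1. Correspondence, 1950-57",
             "I. Correspondence – 1950-1957",
             "i. correspondence 1950 to 1957"):
    print(parse_item(text))
```

A single permissive pattern like this is convenient, but it also matches more than intended (any run of the letters I, V, X, L, C, D, M counts as a number), so per-finding-aid patterns may still be the safer choice.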

Automation, part 1: Delimiting the text
• Container lists saved in a text editor (TextPad)
• Delimiters are special characters placed within the text to separate the elements
• We chose * to signal the beginning and end of each field and % to signal the boundary between fields
• Item as it appears in text of finding aid:
    1. Correspondence, 1950-57
• Item with delimiters inserted:
    *1*%*Correspondence*%*1950-57*
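The delimiter scheme above can be sketched in a few lines of Python. The function name `delimit` and the check that rejects values already containing `*` or `%` are assumptions added for illustration; the slides do not say how such collisions were handled.

```python
FIELD_MARK = "*"  # signals the beginning and end of each field
FIELD_SEP = "%"   # signals the boundary between fields

def delimit(number, title, date):
    """Wrap each element in * and join the elements with %."""
    fields = [number, title, date]
    for value in fields:
        # Assumption: a field that already contains a delimiter would be
        # ambiguous on import, so refuse it rather than guess.
        if FIELD_MARK in value or FIELD_SEP in value:
            raise ValueError("field contains a delimiter character: " + repr(value))
    return FIELD_SEP.join(FIELD_MARK + value + FIELD_MARK for value in fields)

print(delimit("1", "Correspondence", "1950-57"))  # *1*%*Correspondence*%*1950-57*
```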

Delimiting the text (continued)
• Re:Discovery can import directly from the text editor, with instructions
• Instructions to Re:Discovery: the first element of this name will be the number, the second will be the title, the third will be the date
    *1*%*Correspondence*%*1950-57*
• How to add these delimiters to thousands of item records?
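The positional mapping described above (first element is the number, second the title, third the date) might look like the following in Python. This is only a sketch of the idea, not Re:Discovery's import mechanism; `FIELD_ORDER`, `split_fields`, and `to_record` are hypothetical names.

```python
FIELD_ORDER = ("number", "title", "date")  # assumed import instructions

def split_fields(line):
    """Split a delimited line on % and strip the * around each field."""
    values = []
    for part in line.strip().split("%"):
        if len(part) < 2 or not (part.startswith("*") and part.endswith("*")):
            raise ValueError("missing * delimiter in field: " + repr(part))
        values.append(part[1:-1])
    return values

def to_record(line):
    """Map the fields of one delimited line onto named fields by position."""
    values = split_fields(line)
    if len(values) != len(FIELD_ORDER):
        raise ValueError("expected %d fields, found %d" % (len(FIELD_ORDER), len(values)))
    return dict(zip(FIELD_ORDER, values))

print(to_record("*1*%*Correspondence*%*1950-57*"))
```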

Automation, part 2: Regular expressions
• A regular expression is a string that uses special characters (such as \ + $ ^ ]) to describe and match patterns of text within a document

Regular expressions (continued)
• First used regular expressions to search through our text for anything formatted like an item (i.e., to search for a pattern in which an item number is followed by a title and date)
• Then used regular expressions to insert our delimiters between those elements
To turn a page of this:
    1. Journals, 1950-60
    2. Photographs, 1970-80
    3. Postcards, 1940-50
Into a page of this:
    *00001*%*Journals*%*1950-60*
    *00002*%*Photographs*%*1970-80*
    *00003*%*Postcards*%*1940-50*
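As a rough illustration of the page-level transformation above, the following Python sketch does it in one pass with `re.sub` and a replacement function. It is a stand-in for the TextPad search-and-replace the project used; the pattern assumes the "number. title, yyyy-yy" layout shown on the slide, and `delimit_page` is a hypothetical name.

```python
import re

# Assumed layout: item number, period, title, comma, year range, one item per line.
ITEM_LINE = re.compile(r"^(\d+)\.[ \t]+(.+),[ \t]+(\d{4}-\d{2,4})$", re.MULTILINE)

def delimit_page(text, width=5):
    """Insert * and % delimiters into every line that looks like an item."""
    def replace(match):
        number = match.group(1).zfill(width)  # pad "1" to "00001"
        return "*{}*%*{}*%*{}*".format(number, match.group(2), match.group(3))
    # Lines that do not match the pattern are left untouched.
    return ITEM_LINE.sub(replace, text)

page = "1. Journals, 1950-60\n2. Photographs, 1970-80\n3. Postcards, 1940-50"
print(delimit_page(page))
```

Padding with `zfill` rather than a literal run of zeroes keeps the number five digits wide when item numbers reach 10 or 100.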

Examples of regular expressions
To turn
    1. Correspondence, 1950-1957
into
    *00001*%*Correspondence, 1950-1957
Find: \([0-9]\)\. (find any digit followed by a period)
Replace: *0000\1*%* (replace with *, four zeroes, that digit, and *%*)
Then to turn
    *00001*%*Correspondence, 1950-1957
into
    *00001*%*Correspondence*%*1950-1957
Find: , \([0-9]\{4\}\) (find any four-digit number preceded by a comma and space)
Replace: *%*\1 (replace the comma and space with *%*)
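For readers without TextPad, the two find-and-replace steps above translate roughly into Python's `re` module as follows. Python writes groups as `( )` and counts as `{4}` without backslashes. The `^` anchor and the `\s*` that swallows the space after the period are assumptions added so that the output matches the slide; `step_one` and `step_two` are hypothetical names.

```python
import re

def step_one(line):
    """Replace a leading single digit and period with *0000<digit>*%*."""
    # Assumption: anchor at the line start and consume trailing spaces.
    return re.sub(r"^([0-9])\.\s*", r"*0000\g<1>*%*", line)

def step_two(line):
    """Replace the comma and space before a four-digit number with *%*."""
    return re.sub(r", ([0-9]{4})", r"*%*\g<1>", line)

original = "1. Correspondence, 1950-1957"
after_one = step_one(original)
after_two = step_two(after_one)
print(after_one)  # *00001*%*Correspondence, 1950-1957
print(after_two)  # *00001*%*Correspondence*%*1950-1957
```

As on the slide, the result still lacks the closing `*` after the date, and the first pattern only handles single-digit item numbers, so further replacements would be needed for a complete record.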

Challenges
• Tweaking expressions slightly for each new container list
• Writing the wrong expression and accidentally replacing the wrong text
• Failing to export correctly to Re:Discovery due to small number of missing delimiters
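One way to catch the missing-delimiter problem before export is to check every line against the expected shape. The sketch below is a suggestion in Python, not something the slides describe; it assumes three fields per line and that field values never contain `*` or `%`.

```python
import re

# Assumed shape: *field*%*field*%*field* with no stray delimiters inside fields.
WELL_FORMED = re.compile(r"^\*[^*%]*\*(?:%\*[^*%]*\*){2}$")

def find_bad_lines(text):
    """Return (line number, line) pairs for non-blank lines that are not well formed."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and not WELL_FORMED.match(stripped):
            bad.append((lineno, stripped))
    return bad

sample = "*00001*%*Journals*%*1950-60*\n*00002*%*Photographs*%1970-80*\n"
for lineno, line in find_bad_lines(sample):
    print("line", lineno, "needs attention:", line)
```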

Re:Discovery and beyond
• Delimited text exported into Re:Discovery
• From Re:Discovery, easy creation of EAD finding aids using a template
• To date:
    257 collections in Re:Discovery (and EAD finding aids on the web)
    0 binders
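To give a sense of what template-based EAD output for a single item could look like, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is not Re:Discovery's template. The element names (`c01`, `did`, `unitid`, `unittitle`, `unitdate`) come from EAD 2002, but placing an item directly in a `c01` is a simplifying assumption; a real finding aid would normally nest items under their series.

```python
import xml.etree.ElementTree as ET

def item_to_ead(number, title, date):
    """Build a bare EAD component for one item and return it as a string."""
    component = ET.Element("c01", {"level": "item"})
    did = ET.SubElement(component, "did")
    ET.SubElement(did, "unitid").text = number
    ET.SubElement(did, "unittitle").text = title
    ET.SubElement(did, "unitdate").text = date
    return ET.tostring(component, encoding="unicode")

print(item_to_ead("00001", "Journals", "1950-60"))
```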

Contact Information:
Greg Colati, Digital Initiatives Coordinator, University of Denver, greg.colati@du.edu
Jennifer King, Manuscripts Librarian, George Washington University, Washington, DC, Jenking@gw.edu
Sylvia Augusteijn, Project Archivist, George Washington University, augusteijn@gelman.gwu.edu
