Published by Conor Anker. Modified over 9 years ago.
1
WHERE HAVE ALL THE BINDERS GONE?
Greg Colati, University of Denver
Jennifer King, George Washington University
Sylvia Augusteijn, George Washington University
SAA Chicago Session #801
September 1, 2007
3
WHY MANAGE WITH A DATABASE?
Scale
Centralized management
Access
Reusability
Rearrange-ability
4
REAL DRIVERS OF CHANGE
Demand for item level access
Born Digital content
Digitized content
Researcher demands and expectations
5
MANY INPUTS, MANY OUTPUTS
Physical object
Storage location
Digital object
Storage location
6
OBJECTS AND ATTRIBUTES
I belong to a collection
I belong to a series
I came from somewhere
I am an image
I am a certain file format(s)
I am about something(s)
I am green, blue, and brown
8
CLUSTERING
Project Ungava, National Research Council of Canada
9
VISUALIZATION
Bungee View: http://cityscape.inf.cs.cmu.edu/bungee/
10
CONTEXTUALIZE THE RESOURCE
The Encyclopedia of Chicago http://www.encyclopedia.chicagohistory.org/
© 2007 Gregory C. Colati
11
I WANT WHAT I WANT …
12
A CULTURAL SHIFT
General Specific Association Object
13
EXTEND INTEROPERABILITY
Descriptive standards at the item level
14
MANAGE FROM THE BOTTOM UP
Items and attributes
Create associations, implicit and explicit
15
PRODUCTIVITY APPROACH TO PROCESSING, MANAGEMENT, AND ACCESS
Automate metadata creation
Metadata extraction
Pre-populate metadata fields using default and automatically generated terms
Stop writing extensive biographical and historical notes
Automate digital content creation
17
USE THE POWER OF DATABASE TOOLS
Ingest tools discussed above
Export templates for:
MARC
EAD
Various XML schemas for item level export: MARCXML, DC, TEI, VRA, etc.
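As an illustration of what an item-level export template might produce, here is a minimal sketch that serializes one item as simple Dublin Core XML. The `record` wrapper element, the `item_to_dc` function name, and the mapping of number, title, and date to `dc:identifier`, `dc:title`, and `dc:date` are assumptions made for this example; the slides do not describe how the actual templates were built.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def item_to_dc(number, title, date):
    """Serialize one item as a small Dublin Core record (hypothetical layout)."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for tag, value in (("identifier", number), ("title", title), ("date", date)):
        element = ET.SubElement(record, f"{{{DC_NS}}}{tag}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


xml_text = item_to_dc("00001", "Correspondence", "1950-1957")
print(xml_text)
```

Because the item fields are stored separately in the database, the same values could feed a MARCXML or EAD template without retyping.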
18
LEVERAGE USE OF DIGITAL REPOSITORIES
We don’t have to be self-sufficient
Outsource low-level functions
Mass storage
Backup
19
CREATE PARTNERSHIPS
Computer scientists
Librarians
Academic technologists
20
GET INTO MAINSTREAM DISCOVERY TOOLS
GET “INTO THE FLOW”
Can everyone say
Google
MySpace
YouTube
Facebook
21
CREATE ACCESS TOOLS BASED ON USER NEEDS
Understand how all of our constituencies seek and use information.
Make our tools reflect these behaviors.
When those behaviors change, our tools should change with them.
22
NEW SKILLS FOR THE DIGITAL ERA
Jennifer King
George Washington University
23
RE:DISCOVERY MAIN PAGE
24
RE:DISCOVERY FOR INTERNET SEARCH
25
RFI AND FINDING AID
26
From Document To Database
Sylvia Augusteijn
George Washington University
Special Collections and University Archives
SAA session 801
September 1, 2007
27
Out from the binders
Scope and content notes, series descriptions simple to cut and paste into Re:Discovery
Cut and paste not feasible for thousands of item-level records
“Container list” project is born
Goal: to separate elements of each item name (number, title, date) so Re:Discovery could import them into their respective fields
28
Container lists
Each item has a number, title, and date, but formats vary slightly in punctuation or spacing
Ways of writing the same name:
1. Correspondence, 1950-57
I. Correspondence – 1950-1957
i. correspondence 1950 to 1957
Naming conventions generally consistent within each finding aid
How to automate?
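To make the variation concrete, the sketch below parses the three spellings shown on this slide into one normalized (number, title, date) tuple. It is an illustration only, not the project's method: the pattern, the `normalize` and `roman_to_int` names, and the choices to expand two-digit end years and capitalize the title are all assumptions.

```python
import re

# One pattern covering the three variants on the slide (an assumption, not
# the pattern the project used).
ITEM = re.compile(
    r"""
    ^\s*
    (?P<number>[0-9]+|[IVXLCDM]+|[ivxlcdm]+)\.   # arabic or roman item number, then a period
    \s+
    (?P<title>.+?)                               # title (shortest possible match)
    \s*[,–-]?\s*                                 # optional comma or dash before the date
    (?P<start>[0-9]{4})                          # four-digit start year
    \s*(?:-|–|to)\s*                             # range separator
    (?P<end>[0-9]{4}|[0-9]{2})                   # four- or two-digit end year
    \s*$
    """,
    re.VERBOSE,
)

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral):
    """Convert a roman numeral (either case) to an integer."""
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (e.g. IV = 4).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def normalize(line):
    """Return (number, title, date range) or None if the line is not an item."""
    match = ITEM.match(line)
    if match is None:
        return None
    raw = match["number"]
    number = int(raw) if raw.isdigit() else roman_to_int(raw)
    title = match["title"][:1].upper() + match["title"][1:]
    start, end = match["start"], match["end"]
    if len(end) == 2:
        end = start[:2] + end  # expand "57" to "1957" using the start century
    return (number, title, f"{start}-{end}")


for variant in (
    "1. Correspondence, 1950-57",
    "I. Correspondence – 1950-1957",
    "i. correspondence 1950 to 1957",
):
    print(normalize(variant))
```

Since conventions were generally consistent within a finding aid, a narrower pattern per finding aid would likely be safer than one pattern for everything.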
29
Automation, part 1: Delimiting the text
Container lists saved in a text editor (TextPad)
Delimiters are special characters placed within the text to separate the elements
We chose * to signal the beginning and end of each field and % to signal the boundary between fields
Item as it appears in text of finding aid:
1. Correspondence, 1950-57
Item with delimiters inserted:
*1*%*Correspondence*%*1950-57*
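The delimiter scheme can be sketched as a small function that wraps each field in * and joins the fields with %. The function name `delimit` and the check for stray delimiter characters are assumptions added for illustration.

```python
FIELD_MARK = "*"      # signals the beginning and end of each field
FIELD_BOUNDARY = "%"  # signals the boundary between fields


def delimit(number, title, date):
    """Build one delimited record from its three elements."""
    fields = (number, title, date)
    for field in fields:
        # A field that already contains a delimiter would be ambiguous on import.
        if FIELD_MARK in field or FIELD_BOUNDARY in field:
            raise ValueError(f"field contains a delimiter character: {field!r}")
    return FIELD_BOUNDARY.join(f"{FIELD_MARK}{field}{FIELD_MARK}" for field in fields)


print(delimit("1", "Correspondence", "1950-57"))
```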
30
Delimiting the text (continued)
Re:Discovery can import directly from the text editor, with instructions
Instructions to Re:Discovery: the first element of this name will be the number, the second will be the title, the third will be the date
*1*%*Correspondence*%*1950-57*
How to add these delimiters to thousands of item records?
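The positional instruction (first element is the number, second the title, third the date) can be mimicked outside Re:Discovery as follows. This is only a sketch of the idea: `parse_delimited` and `FIELD_ORDER` are hypothetical names, and nothing here reflects Re:Discovery's actual import mechanism.

```python
# Position of each element in the delimited record, as stated on the slide.
FIELD_ORDER = ("number", "title", "date")


def parse_delimited(line):
    """Split a *field*%*field*%*field* record into named fields by position."""
    parts = line.strip().split("%")
    if len(parts) != len(FIELD_ORDER):
        raise ValueError(f"expected {len(FIELD_ORDER)} fields, found {len(parts)}")
    values = []
    for part in parts:
        if len(part) < 2 or not (part.startswith("*") and part.endswith("*")):
            raise ValueError(f"field is not wrapped in *: {part!r}")
        values.append(part[1:-1])  # drop the opening and closing *
    return dict(zip(FIELD_ORDER, values))


print(parse_delimited("*1*%*Correspondence*%*1950-57*"))
```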
31
Automation, part 2: Regular expressions
A regular expression is a string that uses special characters (such as \ + $ ^ ]) to describe and match patterns of text within a document
32
Regular expressions (continued)
First used regular expressions to search through our text for anything formatted like an item (i.e. to search for a pattern in which an item number is followed by a title and date)
Then used regular expressions to insert our delimiters in between those elements
To turn a page of this:
1. Journals, 1950-60
2. Photographs, 1970-80
3. Postcards, 1940-50
Into a page of this:
*00001*%*Journals*%*1950-60*
*00002*%*Photographs*%*1970-80*
*00003*%*Postcards*%*1940-50*
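The page-level transformation can be sketched in one pass with Python's `re` module rather than TextPad. The pattern assumes every item looks like "number. title, yyyy-yy"; `delimit_page` and `ITEM_LINE` are names invented for this example, and padding the number to five digits follows the sample output on the slide.

```python
import re

# Assumed item shape: "<number>. <title>, <year>-<year>" on a line of its own.
ITEM_LINE = re.compile(r"^([0-9]+)\. (.+), ([0-9]{4}-[0-9]{2,4})$", re.MULTILINE)


def delimit_page(text):
    """Rewrite every item line on a page into *number*%*title*%*date* form."""
    def to_record(match):
        number, title, date = match.groups()
        return f"*{int(number):05d}*%*{title}*%*{date}*"  # zero-pad to five digits

    return ITEM_LINE.sub(to_record, text)


page = """1. Journals, 1950-60
2. Photographs, 1970-80
3. Postcards, 1940-50"""

print(delimit_page(page))
```

Lines that do not match the pattern are left unchanged, so unusual items stay visible for manual review instead of being silently mangled.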
33
Examples of regular expressions
To turn
1. Correspondence, 1950-1957
into
*00001*%*Correspondence, 1950-1957
Find: \([0-9]\)\. followed by one space (find any single digit followed by a period and a space; the period is escaped because an unescaped . matches any character)
Replace: *0000\1*%* (replace with *, four zeroes, that digit, and *%*)
Then to turn
*00001*%*Correspondence, 1950-1957
into
*00001*%*Correspondence*%*1950-1957
Find: , \([0-9]\{4\}\) (find any four-digit number preceded by a comma and space)
Replace: *%*\1 (replace the comma and space with *%*, keeping the four-digit number)
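The same two find-and-replace steps can be tried in Python, whose `re` syntax writes groups as `( )` and counts as `{4}` without the backslashes used in the TextPad expressions. This is a sketch under two assumptions: the period is escaped, and the space after it is part of the match so that no stray space is left before the title.

```python
import re

line = "1. Correspondence, 1950-1957"

# Step 1: a single digit, a literal period, and a space become *0000<digit>*%*
step_one = re.sub(r"([0-9])\. ", r"*0000\1*%*", line)

# Step 2: a comma and space before a four-digit year become *%*,
# and the year itself is kept via the \1 back-reference.
step_two = re.sub(r", ([0-9]{4})", r"*%*\1", step_one)

print(step_one)
print(step_two)
```

As on the slide, step 1 handles only single-digit item numbers; two-digit numbers would need a different amount of zero padding, which is one reason the expressions had to be adjusted for each list.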
34
Challenges
Tweaking expressions slightly for each new container list
Writing the wrong expression and accidentally replacing the wrong text
Failing to export correctly to Re:Discovery due to small number of missing delimiters
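One way to catch missing delimiters before export is a quick check that every record has exactly three *-wrapped fields. The sketch below is a suggestion, not something the slides describe: `find_malformed` and `WELL_FORMED` are hypothetical names, and the pattern assumes field text never contains * or %.

```python
import re

# Exactly three fields, each wrapped in * and separated by %.
WELL_FORMED = re.compile(r"^\*[^*%]+\*%\*[^*%]+\*%\*[^*%]+\*$")


def find_malformed(lines):
    """Return the 1-based line numbers of non-blank lines that are not valid records."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.strip() and not WELL_FORMED.match(line.strip())
    ]


records = [
    "*00001*%*Journals*%*1950-60*",
    "*00002*%*Photographs%*1970-80*",  # missing * after the title
    "*00003*%*Postcards*%*1940-50",    # missing closing *
]

print(find_malformed(records))
```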
35
Re:Discovery and beyond
Delimited text exported into Re:Discovery
From Re:Discovery, easy creation of EAD finding aids using a template
To date:
257 collections in Re:Discovery (and EAD finding aids on the web)
0 binders
36
CONTACT INFORMATION:
Greg Colati, Digital Initiatives Coordinator, University of Denver, greg.colati@du.edu
Jennifer King, Manuscripts Librarian, George Washington University, Washington, DC, Jenking@gw.edu
Sylvia Augusteijn, Project Archivist, George Washington University, augusteijn@gelman.gwu.edu