Working With Digital Archives at the Harry Ransom Center A Presentation About Processing the Digital Archives of British Playwright Arnold Wesker Metadata.


1 Working With Digital Archives at the Harry Ransom Center: A Presentation About Processing the Digital Archives of British Playwright Arnold Wesker
Metadata and Digital Object Roundtable, Society of American Archivists Annual Meeting 2007
Catherine Stollar Peters, New York State Archives

2 Background
Worked at the Harry Ransom Center in Austin, Texas, from 2004 to early 2007

3 Austin

4 Albany

5 Background
Now work at the New York State Archives, Cultural Education Center

6 In January 2007 the Ransom Center was:
- Processing collections with electronic records
- Developing policies and procedures for processing electronic records
- Evaluating options for a Trusted Digital Repository (TDR)
  – At the School of Information at the University of Texas at Austin
  – At the University Libraries at the University of Texas at Austin
  – Or developing an institutional TDR
- Conducting a general electronic records survey and needs assessment (with a more thorough survey planned for the fall)

7 HRC DSpace at the School of Information
https://pacer.ischool.utexas.edu/handle/2081/288

8 About the Case Study

9 In January 2007, Dr. Patricia Galloway was offering the course Problems in Permanent Retention of Electronic Records at the School of Information
Dr. Galloway contacted the Ransom Center about potential support for group projects

10 School of Information Course
Three collections were processed by students during the Spring 2007 semester:
- Leon Uris Papers
  – Lessons in digital archeology
  – Limited migrated content
- John Crowley Papers
  – Standard manual processing
- Arnold Wesker Papers
  – Largely automated processing, migration, and ingest procedures
  – Fragile media
  – Living author


12 Arnold Wesker
- British playwright and author
- Born in London in 1932
- The Four Seasons ran in March 2007 at the Arcola Theatre
- Ransom Center maintains his paper archives
- Works include:
  – As Much as I Dare (autobiography)
  – Longitude (adaptation of Dava Sobel’s book)
  – Groupie
  – Chips with Everything

13 Automated Processing
Largely automated processing, migration, and ingest procedures were possible because of:
- One author
- Similar content/materials (works, correspondence, diaries, personal files)
- Mostly the same formats (Corel WordPerfect 5.0 and 9.0; Microsoft Word 97 and 2000)
- Easy migration (to RTF)
- Good existing arrangement
- A manageable number of files (5,000+)
- Readable disks (75 3.5-inch floppies and 1 Zip disk)

14 Processing Issues
- Some files were password-restricted
- Bank account numbers were included
- Encoded date fields would update automatically

15 Archival Theory Applied to Digital Materials
- Acquisition
  – Create a disk catalog with all pertinent metadata
  – Copy to a processing computer drive
- Appraisal
  – Appraise for duplicates and restricted material
- Arrangement
  – Arrange material according to the author’s original arrangement
- Description
  – Create a file catalog with the pertinent metadata
  – Create and record checksums
  – Extract metadata
  – Transform metadata from the NLNZ schema to Dublin Core
- Preservation
  – Migrate all of the files to a more stable format, such as Rich Text Format (RTF)
  – Make physical copies of all the files onto new media
  – Ingest the files into DSpace
  – Ingest the project documentation
- Reference
  – Integrate into the paper-based finding aid


17 Disk Catalog

18 File Catalog
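The disk and file catalogs appear on these slides only as screenshots. The Python sketch below gives a rough illustration of the kind of file-level record involved. It walks a processing drive and writes one CSV row per file. It assumes each disk was copied into its own top-level folder, so the folder name can stand in for a disk identifier. The column names are hypothetical and are not the Ransom Center's actual catalog fields.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical catalog columns; a real catalog would add disk labels, media type, notes, etc.
FIELDS = ["disk_id", "relative_path", "file_name", "extension", "size_bytes", "modified_utc"]


def build_file_catalog(processing_root: Path, catalog_csv: Path) -> int:
    """Write one CSV row per file under processing_root and return the row count.

    Assumes each disk was copied into its own top-level folder,
    e.g. processing_root/disk_001/..., whose name is used as the disk ID.
    """
    rows = []
    for path in sorted(p for p in processing_root.rglob("*") if p.is_file()):
        rel = path.relative_to(processing_root)
        stat = path.stat()
        rows.append({
            "disk_id": rel.parts[0] if len(rel.parts) > 1 else "",
            "relative_path": rel.as_posix(),
            "file_name": path.name,
            "extension": path.suffix.lstrip(".").lower(),
            "size_bytes": stat.st_size,
            # Last-modified time as reported by the processing computer's file system.
            "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        })
    with catalog_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keep the catalog CSV outside the processing folder so that it does not catalog itself on a second run.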

19 Appraise for Duplicates
- Files on the Zip disk contained some duplicates
- Developed rules for removing duplicates so that files sharing a name, but not identical content, were not deleted automatically
- Erased duplicate files but recorded the presence of duplicates in the file catalog
- Used Zizasoft’s comparison software: zsCompare and zsDuplicate Hunter Standard 2.31
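The Zizasoft tools did the actual comparison. The underlying rule, which compares content rather than names, can be sketched in Python as below. This is a stand-in and does not describe how zsDuplicate Hunter works internally. Files with identical MD5 digests are treated as duplicates. Files that merely share a name are reported separately so they are never removed by mistake. The function deletes nothing; it only returns candidates for an archivist to review and record in the file catalog.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path):
    """Separate true duplicates (identical bytes) from files that only share a name."""
    digests = {p: md5_of(p) for p in sorted(q for q in root.rglob("*") if q.is_file())}
    by_hash = defaultdict(list)
    by_name = defaultdict(list)
    for path, digest in digests.items():
        by_hash[digest].append(path)
        by_name[path.name.lower()].append(path)

    # Identical content: keep the first copy, list the rest as removal candidates.
    content_duplicates = {paths[0]: paths[1:] for paths in by_hash.values() if len(paths) > 1}

    # Same name but different bytes: distinct files that must NOT be deleted.
    name_collisions = {
        name: paths for name, paths in by_name.items()
        if len(paths) > 1 and len({digests[p] for p in paths}) > 1
    }
    return content_duplicates, name_collisions
```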

20 Restricted Material
- Bank account numbers
  – Investigate to see if the accounts were closed
- Password-protected diary entries
  – Remove password to migrate
  – Place restrictions on access through DSpace instead of the word processing software
  – Paper copy already exists and is in the restricted section of the stacks
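Locating material such as bank account numbers can be partly automated. The Python sketch below scans migrated text files for two kinds of strings. The first is shaped like a UK sort code (12-34-56) and the second is a standalone 8-digit number. It lists each match for manual review. Both patterns are illustrative assumptions. They will produce false positives, such as dates written as 8 digits, and will miss numbers formatted differently, so they supplement reading the files rather than replace it.

```python
import re
from pathlib import Path

# Heuristic patterns (assumptions, not a validated rule set).
SORT_CODE = re.compile(r"\b\d{2}-\d{2}-\d{2}\b")    # e.g. 12-34-56
ACCOUNT_NUMBER = re.compile(r"(?<!\d)\d{8}(?!\d)")  # exactly 8 consecutive digits


def flag_sensitive_lines(root: Path, suffixes=(".rtf", ".txt")):
    """Yield (path, line_number, matched_text) for an archivist to check by hand."""
    for path in sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in suffixes):
        # latin-1 decodes any byte sequence, so odd legacy characters never raise errors.
        text = path.read_text(encoding="latin-1")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for pattern in (SORT_CODE, ACCOUNT_NUMBER):
                for match in pattern.finditer(line):
                    yield path, line_no, match.group()
```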

21 Checksums
- A command-line utility creates checksums automatically
- Jacksum is one such Java checksum utility
- Export results to a spreadsheet
- Compare to the MD5 hash created by DSpace
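The slide names Jacksum; the Python sketch below uses the standard `hashlib` module in its place. It records an MD5 for each file and saves the list as a CSV that opens in a spreadsheet. It then compares those values with the ones recorded after ingest. How those repository values are gathered is left open. `repository` is a hypothetical path-to-MD5 mapping, for example transcribed from the MD5 values DSpace records for each bitstream.

```python
import csv
import hashlib
from pathlib import Path


def md5_hex(path: Path) -> str:
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_sheet(root: Path, sheet: Path) -> dict:
    """Record an MD5 for every file under root and save them as a CSV spreadsheet."""
    sums = {p.relative_to(root).as_posix(): md5_hex(p) for p in sorted(root.rglob("*")) if p.is_file()}
    with sheet.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["relative_path", "md5"])
        writer.writerows(sums.items())
    return sums


def compare_with_repository(local: dict, repository: dict) -> dict:
    """Report files absent from the repository and files whose MD5 values disagree.

    `repository` is a hypothetical {relative_path: md5} mapping gathered after ingest.
    """
    return {
        "missing": sorted(set(local) - set(repository)),
        "mismatched": sorted(
            k for k in local.keys() & repository.keys() if local[k] != repository[k].lower()
        ),
    }
```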

22 Migrate Text to More Stable Format
- Chose RTF because it is widely accessible by multiple readers and it retains formatting
  – ODF is new and not yet tested
  – TXT loses formatting
  – Microsoft Word DOC and Corel WordPerfect WPD are proprietary and accessible by few readers
- Used ABC Text Converter to migrate files from DOC or WPD into RTF
  – Used a Perl script to add extensions to files to mitigate Wesker’s use of three-digit file extensions
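The slide does not say how the Perl extension script chose an extension. One plausible approach is to read each file's leading bytes and pick an extension from the format's signature; the Python sketch below does this under that assumption. WordPerfect documents begin with the bytes FF 57 50 43, and Word 97/2000 .doc files are OLE2 compound documents beginning with D0 CF 11 E0 A1 B1 1A E1. The assumption that the extensions were exactly three digits (for example SEASONS.001) is also illustrative.

```python
import re
from pathlib import Path

# Well-known leading bytes ("magic numbers") for the two source formats.
WORDPERFECT_MAGIC = b"\xffWPC"                  # WordPerfect 5.x and later
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE2 container used by Word 97/2000 .doc

NUMERIC_EXTENSION = re.compile(r"\.\d{3}$")     # assumed pattern, e.g. "SEASONS.001"


def add_format_extensions(root: Path) -> dict:
    """Append .wpd or .doc to files whose extension is three digits.

    Returns {old_relative_name: new_relative_name} so the renaming can be
    recorded in the file catalog. Unrecognized files are left for manual review.
    """
    renamed = {}
    # sorted() builds the full list first, so renaming inside the loop is safe.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if not NUMERIC_EXTENSION.search(path.name):
            continue
        with path.open("rb") as fh:
            head = fh.read(8)
        if head.startswith(WORDPERFECT_MAGIC):
            suffix = ".wpd"
        elif head.startswith(OLE2_MAGIC):
            suffix = ".doc"  # only a guess: Excel and other formats also use OLE2
        else:
            continue
        target = path.with_name(path.name + suffix)
        path.rename(target)
        renamed[path.relative_to(root).as_posix()] = target.relative_to(root).as_posix()
    return renamed
```

Keeping the original three-digit extension inside the new name preserves the author's numbering, and the returned mapping lets the change be documented.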

23 Create Duplicate Physical Copy
Save files to CD, DVD, or hard drive for an extra, short-term backup copy while processing (and before ingest into the institutional repository)
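A backup on a hard drive is only useful if the copy is verified. The minimal Python sketch below assumes the source and backup are ordinary directories and that the backup is not inside the source. It copies each file with `shutil.copy2`, which also preserves modification times, and then re-checks MD5 digests. Writing to CD or DVD would need separate disc-burning software and is not covered.

```python
import hashlib
import shutil
from pathlib import Path


def md5_hex(path: Path) -> str:
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_verification(source: Path, destination: Path) -> list:
    """Copy every file under source into destination; return paths whose copies fail an MD5 check."""
    failures = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        dst = destination / src.relative_to(source)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        if md5_hex(src) != md5_hex(dst):
            failures.append(src.relative_to(source).as_posix())
    return failures
```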

24 Extract Metadata

25 National Library of New Zealand XML

26 National Library of New Zealand XML (cont.)

27 Dublin Core XML
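The crosswalk from the NLNZ extractor's XML to Dublin Core was done with a Perl script; the Python sketch below illustrates the same idea. The output follows the `dublin_core.xml` layout that DSpace's bulk importer reads: `dcvalue` elements with `element` and `qualifier` attributes. The input element names (`Title`, `Author`, and so on) are hypothetical simplifications. The real NLNZ Metadata Extraction Tool schema differs, and the field mapping is only an example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from simplified extractor element names to Dublin Core
# (element, qualifier) pairs; the real NLNZ output uses its own schema.
FIELD_MAP = {
    "Title": ("title", "none"),
    "Author": ("contributor", "author"),
    "DateCreated": ("date", "created"),
    "MimeType": ("format", "mimetype"),
    "FileName": ("identifier", "other"),
}


def nlnz_to_dspace_dc(nlnz_xml: str) -> str:
    """Convert one (simplified) extractor record into DSpace's dublin_core.xml layout."""
    source = ET.fromstring(nlnz_xml)
    dc = ET.Element("dublin_core")
    for tag, (element, qualifier) in FIELD_MAP.items():
        node = source.find(f".//{tag}")
        # Skip missing or blank fields rather than emitting empty dcvalue elements.
        if node is not None and node.text and node.text.strip():
            value = ET.SubElement(dc, "dcvalue", element=element, qualifier=qualifier)
            value.text = node.text.strip()
    return ET.tostring(dc, encoding="unicode")
```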

28 Directory Arrangement for DSpace Bulk Ingest

29

30 Automated Processes
Created Perl scripts to automate processing:
- Modified Perl scripts from Queen’s University Library in Ontario, Canada: http://library.queensu.ca/webir/qspace-project/tutorials/qspace_bulk_upload.doc
- Metadata conversion script (from National Library of New Zealand Metadata Extraction Tool v3.0)
- Script to move individual XML files into individual directories
- Script to create a contents file for each directory
- Scripts to rename files for format transformation
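The last two scripts in this list prepare what DSpace calls the Simple Archive Format. Each item gets one directory containing `dublin_core.xml`, the files themselves, and a plain-text `contents` file that names each file on its own line. A minimal Python sketch of that layout follows. The `item_0000` numbering and the function name are illustrative choices and are not taken from the Queen’s University scripts.

```python
import shutil
from pathlib import Path


def build_simple_archive(items, archive_dir: Path) -> None:
    """Lay out files in DSpace's Simple Archive Format for bulk import.

    `items` is a list of (dublin_core_xml_path, [bitstream_paths]) pairs.
    Each item gets its own numbered directory holding dublin_core.xml,
    the bitstreams, and a `contents` file listing the bitstreams one per line.
    """
    for index, (dc_path, bitstreams) in enumerate(items):
        item_dir = archive_dir / f"item_{index:04d}"
        item_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an earlier run
        shutil.copy2(dc_path, item_dir / "dublin_core.xml")
        names = []
        for bitstream in bitstreams:
            shutil.copy2(bitstream, item_dir / bitstream.name)
            names.append(bitstream.name)
        (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
```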

31 Issues with Metadata Extraction
- Author unreliable
  – Partially solved by adding code to the Perl scripts to export standard author information
- No subject metadata
- Inaccurate dates
  – Date created sometimes newer than date modified because of the Windows file system
- Inaccurate titles
  – First line in document
  – Title from template
- Format problems when extensions are used as part of the name field
- No recipient information (potential text mining project)
- Path name derived from the location of the file on the processing computer, not the author’s original system
- NLNZ Metadata Extractor v3.0 sometimes processes files with the default adapter instead of the suitable format-specific adapter
- Dublin Core metadata is not robust enough for digital preservation needs
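Two of these problems lend themselves to simple automated checks, sketched in Python below. The first flags records whose creation date is later than the modification date. On Windows, a copied file typically gets a new creation time while keeping its original modification time. The second replaces the extracted author with a standard form. `STANDARD_AUTHOR` and the `ExtractedRecord` fields are hypothetical. Overwriting the author also assumes every file is Wesker’s own work, which would not hold for incoming correspondence.

```python
from dataclasses import dataclass
from datetime import datetime

STANDARD_AUTHOR = "Wesker, Arnold"  # hypothetical authorized form of the creator's name


@dataclass
class ExtractedRecord:
    file_name: str
    author: str
    created: datetime
    modified: datetime


def review_record(record: ExtractedRecord) -> list:
    """Return warnings for a record and normalize its author field in place."""
    warnings = []
    if record.created > record.modified:
        # Likely a copy artifact: the creation time reflects when the file was copied.
        warnings.append(f"{record.file_name}: created date is later than modified date")
    if record.author != STANDARD_AUTHOR:
        warnings.append(f"{record.file_name}: extracted author {record.author!r} replaced")
        record.author = STANDARD_AUTHOR
    return warnings
```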

32 National Library of New Zealand XML: Wrong Author

33

34 Dublin Core XML

35 Ingest
Created detailed ingest procedures, using Cornell’s eCommons@Cornell DSpace instructions as an example

36 Takeaways
- More automated tools
- A toolkit to aggregate tasks
- Better metadata extraction potential
- Support for more schemas

37 MetaTools: Investigating Metadata Generation Tools
- JISC-funded grant project undertaken by the Arts and Humanities Data Service, King’s College London
- 18-month project, ending September 2008
- Project goals
  – Develop a methodology for evaluating metadata generation tools
  – Compare the quality of currently available metadata generation tools (including NLNZ Metadata Extractor, DROID, JHOVE)
  – Develop, test, and disseminate prototype web services that integrate metadata generation tools

38 Student Publication
Lorraine Dong, Megan Durden, and Sarah Kim presented "Silicon Chips with Everything: Preserving Arnold Wesker’s Digital Manuscripts" at SSA 2007
https://pacer.ischool.utexas.edu/handle/2081/2322
(Look for their forthcoming publication)

39 Contact Information
Catherine Stollar Peters
New York State Archives
Cultural Education Center
Albany, New York 12230
cstollar@mail.nysed.gov
(518) 486-7820

