Digital Curation: Round One

Gretchen Gueguen, University of Virginia (formerly of East Carolina University)
Digital Curation: Round One by Gretchen Gueguen is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Hi, thanks for coming today. My name is Gretchen Gueguen. I’m currently the Digital Archivist at the University of Virginia, where I’ve been since May 2nd. But today I’m going to talk with you about a project I worked on earlier this year, and for the three years before that, at East Carolina University.

Agenda
What is Digital Curation
The Digital Curation Lifecycle Model
Case Study: The Eastern North Carolina Digital Library
So here is a little overview of what I’ll cover. First, we’ll talk about what digital curation is. Then I’m going to talk with you about one particular model, the Digital Curation Lifecycle Model. And finally, we’ll take a closer look at the Eastern North Carolina Digital Library.

What is Digital Curation?
Selection, Preservation, Maintenance, Collection, Archiving
So, what is “digital curation”? The Digital Curation Centre, a UK group that does a lot of work in this area, has defined the term as “the selection, preservation, maintenance, collection and archiving of digital assets.” So this covers the whole spectrum of actions that might be carried out with digital materials. It is more than just preservation; in fact, preservation is only one part of curation. Digital curation, then, includes things like collecting verifiable digital assets, providing digital asset search and retrieval, and certifying the trustworthiness and integrity of the collection content.

Key elements of the DCC Curation Lifecycle Model
The Digital Curation Lifecycle Model was created by the Digital Curation Centre to represent graphically, in an understandable way, all the processes involved in digital curation. Even so, it is a bit difficult to follow, so I’m going to break it down for you…

Key elements of the DCC Curation Lifecycle Model
Data, any information in binary digital form, is at the centre of the Curation Lifecycle.
We start with data: the donut hole, as it were; the center of the solar system. Data is what it’s all about: digital objects, databases, datasets… it’s the stuff we are “curating”.

Key elements of the DCC Curation Lifecycle Model
Full Lifecycle Actions: Description and Representation Information; Preservation Planning; Community Watch and Participation; Curate and Preserve
The next ring shows the “Full Lifecycle Actions”. These are the things that are continuous and ongoing. There really doesn’t have to be an order in which they occur, they don’t necessarily rely on each other, and they are never completed. These activities are also relevant to many different sequential actions.

Key elements of the DCC Curation Lifecycle Model
Sequential Actions: Conceptualise; Create or Receive; Appraise and Select; Ingest; Preservation Action; Store; Access, Use and Reuse; Transform
Next, the sequential actions build on each other. These are the things that you generally do once in a lifecycle, like ingesting material or appraising it.

Key elements of the DCC Curation Lifecycle Model
Occasional Actions: Dispose; Reappraise; Migrate
The occasional actions are like decision points: they interrupt the regular flow of activity and spur new action. They are tied to the sequential actions: two of them, migration and reappraisal, come at the Preservation Action step, and the third, disposal, is a result of the appraisal step.
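
To make the three rings easier to hold in mind, here is a minimal Python sketch of the model as plain data. The action names come from the slides; the function names and the choice to key occasional actions by the step where they arise are illustrative assumptions, not part of the DCC model itself.

```python
# Action names are taken from the slides; everything else is illustrative.
FULL_LIFECYCLE_ACTIONS = [
    "Description and Representation Information",
    "Preservation Planning",
    "Community Watch and Participation",
    "Curate and Preserve",
]

# Ordered: each step builds on the one before it.
SEQUENTIAL_ACTIONS = [
    "Conceptualise",
    "Create or Receive",
    "Appraise and Select",
    "Ingest",
    "Preservation Action",
    "Store",
    "Access, Use and Reuse",
    "Transform",
]

# Occasional actions, keyed by the sequential step at which the talk
# says they arise (assumed representation).
OCCASIONAL_ACTIONS = {
    "Appraise and Select": ["Dispose"],
    "Preservation Action": ["Reappraise", "Migrate"],
}


def following_step(current):
    """Return the sequential action after `current`, or None at the end of the list."""
    i = SEQUENTIAL_ACTIONS.index(current)
    return SEQUENTIAL_ACTIONS[i + 1] if i + 1 < len(SEQUENTIAL_ACTIONS) else None


def decision_points(current):
    """Return the occasional actions that may interrupt the flow at `current`."""
    return OCCASIONAL_ACTIONS.get(current, [])
```

For example, `decision_points("Preservation Action")` lists the two occasional actions that can branch off at that step, while most steps return an empty list.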

The Eastern North Carolina Digital Library
Case Study: The Eastern North Carolina Digital Library
Okay, so I’m going to move on to the actual case study now, a project I worked on at ECU: the Eastern North Carolina Digital Library.

The Eastern North Carolina Digital Library: the 4 W's
Who: Joyner Library at East Carolina University+
What: A digital library of books+ about eastern North Carolina
When: initial project; partnership project
So, I’ll start with the Who, What and When. Who was a collaborative group. The project started at the J.Y. Joyner Library, the main academic library at East Carolina University. As the project grew, more partners came on board from area museums, mainly relating to the agricultural and social community in the area in the 18th and 19th centuries. This collaboration was made possible through a grant from NC ECHO, a statewide body that, among other activities, manages the distribution of LSTA funds through grant projects. What began as a digital library of books about eastern North Carolina grew to include museum artifacts, maps, videos, and original lesson plans created for the project. When spanned from the initial creation of the collection of digitized books in 2003 up through the creation of the joint digital library with all the partners in 2007.

The Eastern North Carolina Digital Library: the 4 W's
Where: You are here.
I mention where, eastern North Carolina, and specifically Greenville, for a particular purpose. Eastern North Carolina is a distinct cultural region within the state. During the Colonial period it was the seat of government and economically powerful. Its counties were some of the highest producers of tobacco in the world. But today it contains some of the poorest areas in the state, with poverty rates as high as 32% in 2009. This is related to one of the primary reasons why this project was undertaken…

The Eastern North Carolina Digital Library: the 4 W's
Why: ECU is the largest university in the eastern region, serving some of the poorest and most under-served counties in the state.
Material on eastern NC not widely available.
The expertise and interest existed in the library to create a great digital project.
ECU is the largest university in the eastern region, and the third largest in the state overall. Institutionally speaking, there is a large drive to provide a high level of service to the region in addition to the student and faculty population at the university. The ENCDL project provided a unique opportunity to showcase some of the treasures of the community and make them more broadly accessible across the state. In addition to this sense of mission, the expertise and interest existed in the library to work on the project. An experimental vibe had taken root there in the late nineties, and the staff were really excited to take advantage of the possibilities of new media.

The Eastern North Carolina Digital Library: and 1 H
How: Digitization, Transcription, Metadata, Interface, Lesson Activities
Finally, the nitty gritty. Book digitization was done in-house on a Zeutschel scanner. Artifacts were professionally photographed, and accompanying video descriptions were created. A TEI XML transcription was created for every book and indexed using TextML. Artifact metadata was created in a SQL database and exported to simple XML records to facilitate searching alongside the TEI XML; this was non-standard metadata, but relatively robust. Additional summaries, author biographies, and curator “tours” of artifacts were also created. The site interface was built using ASP.NET, and lesson activities created from the repository of books and artifacts were added to it.
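
As a rough illustration of the SQL-to-XML export step, here is a small Python sketch. It is not the code used in the project: the table name, column names, sample row, and the `export_artifacts` function are all hypothetical, and SQLite stands in for whatever SQL database was actually used.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_artifacts(conn):
    """Return one simple XML record (as a string) per row of a hypothetical `artifacts` table."""
    cur = conn.execute(
        "SELECT id, title, county, description FROM artifacts ORDER BY id"
    )
    columns = [d[0] for d in cur.description]
    records = []
    for row in cur:
        record = ET.Element("artifact")
        # One child element per column, named after the column.
        for name, value in zip(columns, row):
            child = ET.SubElement(record, name)
            child.text = "" if value is None else str(value)
        records.append(ET.tostring(record, encoding="unicode"))
    return records


# Hypothetical sample data, held in memory for the demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifacts "
    "(id INTEGER PRIMARY KEY, title TEXT, county TEXT, description TEXT)"
)
conn.execute(
    "INSERT INTO artifacts VALUES (1, 'Tobacco basket', 'Pitt', 'Sample row')"
)
records = export_artifacts(conn)
```

Flat records like these are easy to index next to the TEI files, which is the trade-off the notes describe: non-standard, but simple and searchable.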

So here is just a quick overview of the site as it appeared when the project was completed in 2007… this is the home page, with access to different formats and methods for browsing.

You could browse a list of the books by title.

Or, as geography is considered a very important access point for this collection, you could browse through all items by county.

When viewing an artifact, there was an accompanying video and a short narrative description (a voice-over in the video).

Books had a similar record page, and the text could be viewed in a single page with transcription and image.

Or read in a Flash-based book viewer created using a tool called Zoomify.

The classroom activities included lesson plans as well as lists of materials by reading level and alignment with the North Carolina Standard Course of Study, the state curriculum guidelines for K-12 education…

Case Study: The Eastern North Carolina Digital Library
Creation of Joyner Library Digital Collections (JLDC), a sister repository broader in scope
Migration of ENCDL into JLDC
I started at ECU in 2008, and the first task I set out on with Digital Collections was to create a more broadly based repository service called, imaginatively, Joyner Library Digital Collections. The idea was that this repository would be the general infrastructure and backbone for future projects. It would be a platform to serve many different kinds of objects from many different kinds of collections. We knew from the beginning that ENCDL should become part of the repository, but we didn’t really embark on the migration until the system was pretty robust, in 2010.

Just for comparison’s sake, here’s a quick overview of the repository. This is the home page

This is one of the generic “collections”, which is really just a link to results from a specific search.

This is an example of the home page of a collection within the repository. Just an extra static page with some information about the materials in the collection.

And here is a specific item, in this case a diary.
So what you can see is that the repository is pretty generic and stripped down. It’s built to suit pretty much everything, but be flexible enough to give us some personalization as was seen in the last slide with the particular collection home page.

Comparing ENCDL and JLDC
ENCDL: Non-standard metadata (aside from TEI transcriptions); two basic material types; non-standard filenaming; significant supplementary documentation for each object; text and image/artifact in different search and browse; extensive web presence with educational activities; TextML
JLDC: Metadata standards (METS, MODS, MIX, TEI); variety of materials; each object has a Persistent Identifier (PID) and consistent filenaming; full-repository search; basic web presence, but robust searching tools
It wasn’t just the user experience that had been updated, either. While we continued to use some of the same basic tools, like TextML, we used more standardized metadata and supported a wider variety of formats, including audio as well as a variety of texts, manuscripts, photographs, and artworks. We implemented persistent identifiers and focused on some ideas in search functionality that were new at that time, like cross-collection searching and faceted search results.

Digital Curation Round One…
So, next I’m going to use the lifecycle model to frame our migration planning and implementation process.

Applying the lifecycle model
Community Watch and Participation: What are the common standards endorsed by our community?
Curate and Preserve: What are the standards that will best fit our curation and preservation needs?
JPEG2000, EPUB, PREMIS, NC ECHO’s PMDO, Flash, HTML 5
To begin with, we were actively involved in Community Watch and Participation to determine the best-practice standards endorsed by our community. We were also involved in determining the best ways to curate and preserve the project, as well as standards for the JLDC repository. We investigated many new standards for both digital objects and metadata, including JPEG2000, EPUB, PREMIS, NC ECHO’s PMDO, Flash, and HTML 5. Our analysis of these standards took into account cost, fit with the repository architecture, community support, and long-term benefits.

Applying the lifecycle model
Preservation Planning: What actions are in the best long-term interest of the ENCDL? JLDC?
Meetings with stakeholders; web analytics; reproduction requests; review of infrastructure
Migration: digital objects, metadata, web application
So as part of preservation planning, that is, determining how best to care for these materials in the long term (shown in dark blue on the model), we carried out several activities to formulate a plan. We then initiated a preservation action, namely the migration, which will involve a transformation.

Applying the lifecycle model
Create a new collection in the repository
Create a “PID” for each digital object in the repository
Create a METS/MODS/MIX/TEI/PREMIS record for each
Incorporate supplemental metadata
Create new PREMIS records for each
Create a new hybrid object type for image + video
JPEG2000 for all images
PDF and EPUB for books
The analysis done in the prior steps helped us to formulate a basic plan for migration. We would create a new collection in the repository for this material. Each item would then have a persistent URL in the new system that we could redirect the old link to. A new metadata record meeting our new standards would be mapped. Supplemental information would be incorporated both within the metadata and through links to external documents. I note here that part of the migration process involved the decision to adopt a new standard across the repository for preservation metadata: PREMIS. We needed to create a brand new technical “type” of object to support the museum artifacts: image + video. We also decided to adopt the JPEG2000 standard. The reasons we went in this direction were the smaller file size, zooming without Flash, and the “tiling” that JPEG2000 afforded. We also adopted PDF and EPUB as download formats.
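
The idea of redirecting each old link to a persistent URL can be sketched in a few lines of Python. Everything specific here is a made-up placeholder: the legacy URL pattern, the item ids, the PIDs, the base URL, and the `redirect_target` function are assumptions for illustration, not the actual ENCDL or JLDC addresses.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical lookup table from legacy item ids to new repository PIDs.
LEGACY_TO_PID = {
    "book-0042": "12345",
    "artifact-0007": "12346",
}

# Hypothetical base for persistent URLs in the new repository.
NEW_BASE = "https://repository.example.org/item/"


def redirect_target(old_url):
    """Return the persistent URL for a legacy link, or None if it cannot be mapped."""
    query = parse_qs(urlparse(old_url).query)
    legacy_ids = query.get("id")
    if not legacy_ids:
        return None  # the old link carries no item id
    pid = LEGACY_TO_PID.get(legacy_ids[0])
    return NEW_BASE + pid if pid else None
```

A web server would answer a mapped request with a permanent redirect to the returned URL, so bookmarks and citations to the old site keep working after the migration.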

Applying the lifecycle model
JPEG2000: complicated algorithms; inadequate software; web application development with Kakadu; presentation copies only at this time
Metadata: PREMIS; mapping and scripting multiple times
Repository structure: modeling new object types; functional requirements for UI and metadata; use case scenarios in ENCDL mapped to JLDC
Now we can return to the lifecycle model and start on the sequential actions. The Create, Appraise, Ingest, Preservation Action, and Store steps are where the bulk of the work lies. Really, this part could be a presentation in and of itself. Just to give you an idea of what goes on in these steps, we had three basic tasks. First, the use of JPEG2000. It’s a complicated standard with really flexible compression algorithms, and we had to do a lot of tests to determine the options we wanted to use. Once we decided on that, the conversion and quality assurance could take anywhere from an hour to several days, depending on the size of the book. In addition, JPEG2000 isn’t natively supported by most web browsers, so we had to implement a delivery application based on Kakadu. At this time, we have only created JPEG2000 files for the derivative copies, so we are still storing 5 TB of TIFF masters. For future digitization, we will use JPEG2000 as the standard. In terms of metadata work, we needed to adapt the JHOVE tool to extract and create valid PREMIS metadata. We also had to do several rounds of mapping and scripting. For the book collection we mapped from MARC to MODS, and we also had to integrate the additional summaries and biographies. For the artifacts we had to map from our homegrown XML schema to MODS. Finally, the repository structure had to be adapted to suit the new materials. We had new combination objects of image plus video for the artifacts. We also had to develop some new features in the user interface, so we created use case scenarios from the existing site and mapped out the functions in the new repository.
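
To give a flavour of the artifact mapping, here is a minimal Python sketch of a homegrown-XML-to-MODS crosswalk. The source element names (`title`, `description`, `county`) and the `to_mods` function are hypothetical stand-ins for the homegrown schema; the MODS namespace and the target elements (`titleInfo/title`, `abstract`, `subject/geographic`) are standard MODS. It is a sketch of the general technique, not the scripts used in the project.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def to_mods(source_xml):
    """Map a simple (hypothetical) artifact record to a MODS element tree."""
    src = ET.fromstring(source_xml)
    mods = ET.Element(q("mods"))

    title = src.findtext("title")
    if title:
        title_info = ET.SubElement(mods, q("titleInfo"))
        ET.SubElement(title_info, q("title")).text = title

    description = src.findtext("description")
    if description:
        ET.SubElement(mods, q("abstract")).text = description

    # County was a key access point, so it maps to a geographic subject.
    county = src.findtext("county")
    if county:
        subject = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject, q("geographic")).text = county

    return mods


sample = "<artifact><title>Plow</title><county>Pitt</county></artifact>"
mods_record = to_mods(sample)
```

Fields missing from the source record are simply skipped, which is one reason a crosswalk like this tends to need several rounds of mapping and scripting before every record comes out complete.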

Applying the lifecycle model
Access, Use and Reuse
Recreate the book viewer using JPEG2000
Create subject and map browse for the entire repository
Recreate ENCDL pages with repository’s stylesheet
When the site is fully migrated, we will find ourselves in the last stage: Access, Use and Reuse. From the UI end, we could now recreate the book viewer using the JPEG2000 files. We recreated a subject and map-based browse, which we could then use across the entire repository. And we also recreated the static HTML pages with the repository’s stylesheet.

So this is where we are with the migration at this point. This is on our test server at the moment, and not everything is 100% operational, but you can see that the same major areas are present here. We also used the migration as an opportunity to simplify the navigation a bit.

The same navigational browsing capabilities are present; they are just organized together more logically.

This is an example of the title browse.

This is an example of one of the book records.

This is the new book viewer that utilizes JPEG2000 for native zooming.

And finally, this is an example of the new image+video object.

The End
So, once the plan is fully carried out, we will finally be finished… except that the continuous actions shown in the lifecycle model (updating and fine-tuning description, preservation planning, community watch, and curation and preservation) will continue, and will most likely inspire future projects.

What Have We Learned?
Many of us will eventually need to migrate not just data, but collections and “experiences” into other repositories.
The Digital Curation Lifecycle Model can help us think through curation activities and evaluate them.
The Lifecycle Model is not linear, nor will our activities be.
The Lifecycle Model is not finite, but iterative.
So, to finish, I thought I’d take the opportunity to solidify what I think are the take-away points. First, a key realization: when we migrate data, we also need to think about collections and “experiences”, that is, the way the data is used and reused. Second, I hope I’ve shown that the Digital Curation Lifecycle Model can help us think through curation activities and evaluate them. Finally, we realize that both the model and our curation work are neither linear nor finite.

Thanks!
East Carolina University: Michael Reece, Joe Barricella, Justin Tew, Mark Custer, Maury York, John Lawrence, Linda Teel, Hazel Walker
At-Large: Emily Gore, Justin Vaughn, Amy Chiles
In Spirit…: Chuck Jones

Contacts Gretchen Gueguen Web: Eastern North Carolina Digital Library Joyner Library Digital Collections

