Presentation on theme: "Digital Curation: Round One"— Presentation transcript:
1 Digital Curation: Round One. Gretchen Gueguen, University of Virginia, formerly of East Carolina University. Digital Curation: Round One by Gretchen Gueguen is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Hi, thanks for coming today. My name is Gretchen Gueguen, and I’m currently the Digital Archivist at the University of Virginia, where I’ve been since May 2nd. But today I’m going to talk with you about a project I worked on earlier in the year, and through the 3 years before that, at East Carolina University.
2 Agenda: What is Digital Curation; The Digital Curation Lifecycle Model; Case Study: The Eastern North Carolina Digital Library. So here is a little overview of what I’ll cover. First, we’ll talk about what Digital Curation is. Then I’m going to talk with you about one particular model, the Digital Curation Lifecycle Model. And then finally, we’ll take a closer look at the Eastern North Carolina Digital Library.
3 What is Digital Curation? Selection, Preservation, Maintenance, Collection, Archiving. So, what is “Digital Curation”… The Digital Curation Centre, a UK group that does a lot of work in this area, has defined this term as “the selection, preservation, maintenance, collection and archiving of digital assets.” So this covers the whole spectrum of actions that might be carried out with digital materials. It is more than just preservation. Actually, preservation is only part of curation… Digital curation then includes things like: collecting verifiable digital assets; providing digital asset search and retrieval; and certifying the trustworthiness and integrity of the collection content.
4 Key elements of the DCC Curation Lifecycle Model The Digital Curation Lifecycle Model was created by the Digital Curation Centre to graphically represent in an understandable way all the processes involved in digital curation, but it is a bit difficult to follow. So I’m going to break it down for you…
5 Key elements of the DCC Curation Lifecycle Model. Data, any information in binary digital form, is at the centre of the Curation Lifecycle. We start with data: the donut hole, as it were; the center of the solar system. Data is what it’s all about: digital objects, databases, datasets… it’s the stuff we are “curating”.
6 Key elements of the DCC Curation Lifecycle Model. Full Lifecycle Actions: Description and Representation Information; Preservation Planning; Community Watch and Participation; Curate and Preserve. The next ring shows the “Full Lifecycle Actions”. These are the things that are continuous and ongoing. They don’t have to occur in any particular order, they don’t necessarily rely on each other, and they are never finished. These activities are also relevant to many different sequential actions.
7 Key elements of the DCC Curation Lifecycle Model. Sequential Actions: Conceptualise; Create or Receive; Appraise and Select; Ingest; Preservation Action; Store; Access, Use and Reuse; Transform. Next, the sequential actions build on each other. These are the things that you generally do once in a lifecycle, like ingesting material or appraising it.
8 Key elements of the DCC Curation Lifecycle Model. Occasional Actions: Dispose; Reappraise; Migrate. The occasional actions are like decision points: they interrupt the regular flow of activity and spur new action. They are related to the sequential actions in that two of them, migration and reappraisal, come at the preservation action step, and the other, disposal, is a result of the appraisal step.
9 Case Study: The Eastern North Carolina Digital Library. Okay, so I’m going to move on to the actual case study now, which is related to a project I worked on at ECU, the Eastern North Carolina Digital Library.
10 The Eastern North Carolina Digital Library: the 4 W’s. Who: Joyner Library at East Carolina University +. What: A digital library of books + about eastern North Carolina. When: initial project, partnership project. So, I’ll start with the Who, What and When. The “who” was a collaborative group. The project started at the J.Y. Joyner Library, the main academic library at East Carolina University. As the project grew, more partners came on board from area museums, mainly relating to the agricultural and social community in the area in the 18th and 19th centuries. This collaboration was made possible through a grant from NC ECHO, a statewide body that, among other activities, manages the distribution of LSTA funds through grant projects. The “what” began as a digital library of books about eastern North Carolina, and grew to include museum artifacts, maps, videos, and original lesson plans created for the project. The “when” spanned from the initial creation of the collection of digitized books in 2003 up through the creation of the joint Digital Library with all the partners in 2007.
11 The Eastern North Carolina Digital Library: the 4 W’s. Where: You are here. I mention where, eastern North Carolina, and specifically Greenville, for a particular purpose. Eastern North Carolina is a distinct cultural region within the state. During the Colonial period it was the seat of government and economically powerful. Its counties were some of the highest producers of tobacco in the world. But today it contains some of the poorest areas in the state, with poverty rates as high as 32% in 2009. This is related to one of the primary reasons why this project was undertaken…
12 The Eastern North Carolina Digital Library: the 4 W’s. Why: ECU is the largest university in the eastern region, serving some of the poorest and most under-served counties in the state. Material on eastern NC not widely available. The expertise and interest existed in the library to create a great digital project. ECU is the largest university in the eastern region, and the third largest in the state overall. Institutionally speaking, there is a large drive to provide a high level of service to the region in addition to the student and faculty population at the university. The ENCDL project provided a unique opportunity to showcase some of the treasures of the community and make them more broadly accessible across the state. In addition to this sense of mission, the expertise and interest existed in the library to work on the project. An experimental vibe had taken root there in the late nineties, and the staff were really excited to take advantage of the possibilities of new media.
13 The Eastern North Carolina Digital Library: and 1 H. How: Digitization; Transcription; Metadata; ASP.net interface; Lesson Activities. Finally, the nitty gritty. Book digitization was done in-house on a Zeutschel scanner. Artifacts were professionally photographed and accompanying video descriptions were created. A TEI XML transcription was created for every book and indexed using TextML. Artifact metadata was created in a SQL database and exported to simple XML records to facilitate searching alongside the TEI XML. This was non-standard metadata but relatively robust. Additional summaries, author biographies, and curator “tours” of artifacts were also created. The site interface was created using ASP.NET. Lesson activities created using the repository of books and artifacts were added to the site interface.
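The export of artifact metadata from a SQL database to simple XML records can be sketched roughly as follows. This is only an illustration, not the project's actual code: the table name `artifacts`, its columns, and the sample row are all assumptions made up for the example, and SQLite stands in for whatever database was really used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical column names; the real ENCDL schema is not described in the talk.
FIELDS = ("title", "county", "description")

def export_artifacts(conn):
    """Return one small XML record (as a string) per row of the artifacts table."""
    conn.row_factory = sqlite3.Row
    records = []
    query = "SELECT id, title, county, description FROM artifacts ORDER BY id"
    for row in conn.execute(query):
        rec = ET.Element("artifact", id=str(row["id"]))
        for field in FIELDS:
            ET.SubElement(rec, field).text = row[field]
        records.append(ET.tostring(rec, encoding="unicode"))
    return records

def demo():
    """Build an in-memory database with one made-up row and export it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artifacts "
        "(id INTEGER PRIMARY KEY, title TEXT, county TEXT, description TEXT)"
    )
    conn.execute(
        "INSERT INTO artifacts VALUES (1, 'Sample artifact', 'Pitt', 'Made-up row')"
    )
    return export_artifacts(conn)
```

Records in this flat shape are easy to index next to TEI transcriptions, which is the benefit the notes describe, though they follow no shared metadata standard.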
14 So here is just a quick overview of the site as it appeared when the project was completed in 2007… this is the home page with access to different formats and methods for browsing.
16 Or, as geography is considered a very important access point for this collection, you could browse through all items by county.
17 When viewing an artifact there was an accompanying video and short narrative description (a voice-over in the video)
18 Books had a similar record page, and the text can be viewed in a single page with transcription and image
19 Or read in a flash-based book viewer created using a tool called Zoomify
20 The classroom activities included lesson plans as well as lists of materials by reading level and alignment with the North Carolina Standard Course of Study…the state curriculum guidelines for K-12 education…
21 Case Study: The Eastern North Carolina Digital Library. The End. So, basically we thought that was that in 2007… until…
22 Case Study: The Eastern North Carolina Digital Library. Creation of Joyner Library Digital Collections, a sister repository more broad in scope; migration of ENCDL into JLDC. I started at ECU in 2008, and the first task I set out on with Digital Collections was to create a more broadly based repository service, called, imaginatively, Joyner Library Digital Collections. The idea was that this repository would be the general infrastructure and backbone for future projects. It would be a platform to serve many different kinds of objects from many different kinds of collections. We knew from the beginning that ENCDL should become part of the repository, but we didn’t really embark on it until the system was pretty robust in 2010.
23 Just for comparison’s sake, here’s a quick overview of the repository. This is the home page.
24 This is one of the generic “collections” which is really just a link to results from a specific search.
25 This is an example of the home page of a collection within the repository. Just an extra static page with some information about the materials in the collection.
26 And here is a specific item, in this case a diary. So what you can see is that the repository is pretty generic and stripped down. It’s built to suit pretty much everything, but be flexible enough to give us some personalization as was seen in the last slide with the particular collection home page.
27 Comparing ENCDL and JLDC. ENCDL: TextML / ASP.net; Non-standard metadata (aside from TEI transcriptions); Two basic material types; Non-standard filenaming; Significant supplementary documentation for each object; Text and Image/artifact in different search and browse; Extensive web-presence with educational activities. JLDC: TextML / ASP.net; Metadata standards (METS, MODS, MIX, TEI); Variety of materials; Each object has Persistent Identifier (PID) and consistent filenaming; Full-repository search; Basic web-presence, but robust searching tools. It wasn’t just the user experience that had been updated, either. While we continued to use some of the same basic tools like TextML and ASP.net, we used more standardized metadata, and we supported a wider variety of formats, including audio as well as a variety of texts, manuscripts, photographs, and artworks. We implemented the idea of persistent identifiers and focused on some new (at that time) ideas in search functionality, like cross-collection searching and faceted search results.
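For readers unfamiliar with faceted search results, the idea can be sketched in a few lines. This is a generic illustration, not how TextML or the JLDC repository implements it; the record fields (`collection`, `format`) and the sample data are assumptions invented for the example.

```python
from collections import Counter

def facet_counts(records, fields):
    """Count how many records carry each value of each facet field."""
    counts = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            if value:
                counts[field][value] += 1
    return counts

def filter_by(records, field, value):
    """Narrow a result set to the records matching one facet value."""
    return [r for r in records if r.get(field) == value]

# Made-up search results drawn from two hypothetical collections.
SAMPLE_RESULTS = [
    {"title": "Item A", "collection": "ENCDL", "format": "book"},
    {"title": "Item B", "collection": "ENCDL", "format": "artifact"},
    {"title": "Item C", "collection": "Manuscripts", "format": "diary"},
]
```

Calling `facet_counts(SAMPLE_RESULTS, ["collection", "format"])` gives the counts shown beside each facet link, and `filter_by` applies a facet once the user clicks it. Because the counts are computed over the whole result set, the same code serves cross-collection searching.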
28 Digital Curation Round One… So, next I’m going to use the lifecycle model to frame our migration planning and implementation process.
29 Applying the lifecycle model. Community Watch and Participation: What are the common standards endorsed by our community? JPEG2000; EPUB; PREMIS; NC ECHO’s PMDO; Flash; HTML 5. Curate and Preserve: What are the standards that will best fit our curation and preservation needs? To begin with, we were actively involved in Community Watch and Participation to determine the best practice standards endorsed by our community. We were also involved in determining the best ways to curate and preserve the project, as well as standards for the JLDC repository. We investigated many new standards for both digital objects and metadata, including JPEG2000, EPUB, PREMIS, NC ECHO’s PMDO, Flash, and HTML 5. Our analysis of these standards took into account cost, fit with the repository architecture, community support, and long-term benefits.
30 Applying the lifecycle model. Preservation Planning: What actions are in the best long-term interest of the ENCDL? JLDC? Meetings with stakeholders; Web analytics; Reproduction requests; Review of infrastructure. Migration: Digital objects, metadata, web application. So as part of preservation planning, that is, determining how best to care for these materials in the long term (shown in dark blue), we carried out several activities to formulate a plan. We then initiated a preservation action, namely the migration, which will involve a transformation.
31 Applying the lifecycle model. Create a new collection in the repository; Create a “PID” for each digital object in the repository; Create METS/MODS/MIX/TEI/PREMIS record for each; Incorporate supplemental metadata; Create new PREMIS records for each; Create new hybrid object type for image + video; JPEG2000 for all images; pdf and epub for books. The analysis done in the prior steps helped us to formulate a basic plan for migration. We would create a new collection in the repository for this material. Each item would then have a persistent URL in the new system that we could redirect the old link to. A new metadata record meeting our new standards would be mapped. Supplemental information would be incorporated both within the metadata and through links to external documents. I note here that part of the migration process involved the decision to adopt a new standard across the repository for preservation metadata: PREMIS. We needed to create a brand new technical “type” of object to support the museum artifacts: image + video. We also decided to adopt the JPEG2000 standard. The reasons we went in this direction were the smaller file size and the zooming without Flash or “tiling” that JPEG2000 afforded. We also adopted pdf and epub as download formats.
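The step of redirecting each old link to a persistent URL can be sketched as a simple lookup table. This is a minimal sketch under stated assumptions: the URL patterns, the legacy identifier `book-17`, and the PID `1234` are all hypothetical, and the real site ran on ASP.NET rather than Python.

```python
def build_redirects(legacy_to_pid, old_base, new_base):
    """Map each legacy item URL to its persistent URL in the new repository."""
    return {
        f"{old_base}/{legacy_id}": f"{new_base}/{pid}"
        for legacy_id, pid in legacy_to_pid.items()
    }

def resolve(path, redirects):
    """Return (status, target): a permanent redirect if known, else not found."""
    target = redirects.get(path)
    if target is None:
        return (404, None)
    return (301, target)

# Hypothetical mapping produced while assigning PIDs during migration.
EXAMPLE_REDIRECTS = build_redirects(
    {"book-17": "1234"}, "/encdl/item", "/digital/pid"
)
```

A permanent (301) redirect is the natural choice here because the old address is retired for good, so bookmarks and search engines can update to the persistent URL.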
32 Applying the lifecycle model. JPEG2000: Complicated algorithms; Inadequate software; Web application development with Kakadu; Presentation copies only at this time. Metadata: PREMIS; Mapping and scripting multiple times. Repository structure: Modeling new object types; Functional requirements for UI and metadata; Use case scenarios in ENCDL mapped to JLDC. Now we can return to the lifecycle model and start on the sequential actions. The Create, Appraise, Ingest, Preservation Action, and Store steps are where the bulk of the work lies. Really, this part could be a presentation in and of itself. To just give you an idea of what goes on in these steps, we had three basic tasks. First, the use of JPEG2000. It’s a complicated standard with really flexible compression algorithms, and we had to do a lot of tests to determine the options we wanted to use. Once we decided on that, the conversion and quality assurance could take anywhere from an hour to several days depending on the size of the book. In addition, JPEG2000 isn’t natively supported by any web browsers, so we had to implement an open source delivery application called Kakadu. At this time, we have only created JPEG2000 files for the derivative copies, so we are still storing 5TB of TIFF masters. For future digitization, we will use JPEG2000 as the standard. In terms of metadata work, we needed to adapt the JHOVE tool to extract and create valid PREMIS metadata. We also had to do several rounds of mapping and scripting. For the book collection we mapped from MARC to MODS, and we also had to integrate the additional summaries and biographies. For the artifacts we had to map from our homegrown XML schema to MODS. Finally, the repository structure had to be adapted to suit the new materials. We had new combination objects of image plus video for the artifacts. We also had to develop some new features in the user interface, so we created use case scenarios from the existing site and mapped out the functions in the new repository.
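The mapping from a homegrown XML schema to MODS can be sketched like this. The MODS namespace and the `titleInfo`, `abstract`, and `subject`/`geographic` elements are standard MODS, but the source element names (`title`, `description`, `county`) are assumptions, since the talk does not describe the homegrown schema; treat this as an illustration of the mapping approach rather than the scripts actually used.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"

def artifact_to_mods(source_xml):
    """Map a hypothetical homegrown artifact record to a minimal MODS record."""
    src = ET.fromstring(source_xml)
    mods = ET.Element(q("mods"))

    title = src.findtext("title")
    if title:
        title_info = ET.SubElement(mods, q("titleInfo"))
        ET.SubElement(title_info, q("title")).text = title

    description = src.findtext("description")
    if description:
        ET.SubElement(mods, q("abstract")).text = description

    county = src.findtext("county")
    if county:
        subject = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject, q("geographic")).text = county

    return mods
```

Skipping empty source fields, rather than emitting empty MODS elements, keeps the output cleaner when records are uneven, which is common with locally designed metadata.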
33 Applying the lifecycle model. Access, Use and Reuse: Recreate the book viewer using JPEG2000; Create subject and map browse for the entire repository; Recreate ENCDL pages with repository’s stylesheet. When the site is fully migrated, we will find ourselves in the last stage, Access, Use and Reuse. From the UI end we could now recreate the book viewer using the JPEG2000 files. We recreated a subject and map-based browse, which we could then use across the entire repository. And we also recreated the static HTML pages with the repository’s stylesheet.
34 So this is where we are with the migration at this point. This is on our test server at the moment, and not everything is 100% operational, but you can see that the same major areas are present here. We used the migration as an opportunity to simplify the navigation a bit.
35 The same navigational browsing capabilities are present, they are just organized together more logically
36 We recreated the county browse, and in fact, this tool is now being used across the entire repository.
37 This is an example of the title browse. Whereas previously books and artifacts had to be browsed separately, now they can be browsed together. As you can see it is the same results stylesheet as is used for the rest of the repository, but the default sort is alphabetically by title.
38 This is an example of one of the book records. It may be difficult to see here, but there are links to open up the author biography and abstract in the metadata record, and when the derivative is prepared there will also be a link to download a copy in pdf format.
39 This is the new bookviewer that utilizes JPEG2000 for native zooming. The book reader offers a full-text search within the book in the upper right, the results show up below that, with links to each page with a valid result.
40 And finally, this is an example of the new image+video object. Clicking on the video tab along the top opens up the embedded video in the same space where the image now resides.
41 The End. Curation, Preservation, Community Watch and Participation. So, once the plan is fully carried out, we will finally be finished… except that those continuous actions shown in the lifecycle model (updating and fine-tuning description, preservation planning, community watch, and curation and preservation) will continue and will most likely inspire future projects.
42 What Have We Learned? Many of us will eventually need to migrate not just data, but collections and “experiences” into other repositories. Digital Curation Lifecycle Model can help us think through Curation activities and evaluate them. The Lifecycle Model is not linear, nor will our activities be. The Lifecycle Model is not finite, but iterative. So, to finish, I thought I’d take the opportunity to solidify what I think are the take-away points. First, the key realization was that when we migrate data, we also need to think about collections and “experiences”, or the way that data is used and reused. Second, I hope I’ve shown that the Digital Curation Lifecycle Model can help us think through curation activities and evaluate them. Finally, we realize that both the model and our curation work are neither linear nor finite.
43 Thanks! East Carolina University: Michael Reece, Joe Barricella, Justin Tew, Mark Custer, Maury York, John Lawrence, Linda Teel, Hazel Walker. At-Large: Emily Gore, Justin Vaughn, Amy Chiles. In Spirit…: Chuck Jones.
44 Contacts. Gretchen Gueguen. Web: http://www.gretchengueguen.com. Eastern North Carolina Digital Library. Joyner Library Digital Collections.