Digital Curation: Round One

Presentation transcript:

Digital Curation: Round One. Gretchen Gueguen, University of Virginia (formerly of East Carolina University). Digital Curation: Round One by Gretchen Gueguen is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Hi, thanks for coming today. My name is Gretchen Gueguen, and I’m currently the Digital Archivist at the University of Virginia, where I’ve been since May 2nd. But today I’m going to talk with you about a project I worked on earlier this year, and for the three years before that, at East Carolina University.

Agenda: What is Digital Curation; The Digital Curation Lifecycle Model; Case Study: The Eastern North Carolina Digital Library, 2003-2011. So here is a little overview of what I’ll cover. First, we’ll talk about what digital curation is. Then I’m going to talk with you about one particular model, the Digital Curation Lifecycle Model. And finally, we’ll take a closer look at the Eastern North Carolina Digital Library.

What is Digital Curation? Selection, Preservation, Maintenance, Collection, Archiving. So, what is “digital curation”? The Digital Curation Centre, a UK group that does a lot of work in this area, has defined the term as “the selection, preservation, maintenance, collection and archiving of digital assets.” So this covers the whole spectrum of actions that might be carried out with digital materials. It is more than just preservation. Actually, preservation is only part of curation. Digital curation then includes things like: collecting verifiable digital assets; providing digital asset search and retrieval; and certifying the trustworthiness and integrity of the collection content.

Key elements of the DCC Curation Lifecycle Model The Digital Curation Lifecycle Model was created by the Digital Curation Centre to graphically represent in an understandable way all the processes involved in digital curation, but it is a bit difficult to follow. So I’m going to break it down for you…

Key elements of the DCC Curation Lifecycle Model. Data, any information in binary digital form, is at the centre of the Curation Lifecycle. We start with data: the donut hole, as it were; the center of the solar system. Data is what it’s all about: digital objects, databases, datasets… it’s the stuff we are “curating.”

Key elements of the DCC Curation Lifecycle Model. Full Lifecycle Actions: Description and Representation Information; Preservation Planning; Community Watch and Participation; Curate and Preserve. The next ring shows the “Full Lifecycle Actions.” These are the things that are continuous and ongoing. There really doesn’t have to be an order in which they occur, they don’t necessarily rely on each other, and they are never completed. These activities are also relevant to many different sequential actions.

Key elements of the DCC Curation Lifecycle Model. Sequential Actions: Conceptualise; Create or Receive; Appraise and Select; Ingest; Preservation Action; Store; Access, Use and Reuse; Transform. Next, the sequential actions build on each other. These are the things that you generally do once in a lifecycle, like ingesting material or appraising it.

Key elements of the DCC Curation Lifecycle Model. Occasional Actions: Dispose; Reappraise; Migrate. The occasional actions are like decision points: they interrupt the regular flow of activity and spur new action. They are related to the sequential actions in that two of them, migration and reappraisal, come at the preservation action step, and the other, disposal, is a result of the appraisal step.
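To make the three rings concrete, here is a minimal sketch in Python of the model as described on these slides. The action names come from the slides; the variable and function names are hypothetical, and wrapping from the last sequential step back to the first is an assumption based on the model being described as iterative.

```python
# Action names are taken from the slides; everything else is illustrative.
FULL_LIFECYCLE_ACTIONS = [
    "Description and Representation Information",
    "Preservation Planning",
    "Community Watch and Participation",
    "Curate and Preserve",
]

SEQUENTIAL_ACTIONS = [
    "Conceptualise",
    "Create or Receive",
    "Appraise and Select",
    "Ingest",
    "Preservation Action",
    "Store",
    "Access, Use and Reuse",
    "Transform",
]

# Occasional actions, keyed by the sequential step they branch from.
OCCASIONAL_ACTIONS = {
    "Appraise and Select": ["Dispose"],
    "Preservation Action": ["Reappraise", "Migrate"],
}


def next_steps(current):
    """Return the actions that may follow a given sequential step."""
    index = SEQUENTIAL_ACTIONS.index(current)
    # Assumption: the cycle starts over after the last sequential step.
    following = SEQUENTIAL_ACTIONS[(index + 1) % len(SEQUENTIAL_ACTIONS)]
    return [following] + OCCASIONAL_ACTIONS.get(current, [])


print(next_steps("Preservation Action"))
```

The full lifecycle actions are deliberately left out of `next_steps`: they run continuously and have no position in the sequence.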

Case Study: The Eastern North Carolina Digital Library. Okay, so I’m going to move on to the actual case study now, which is related to a project I worked on at ECU, the Eastern North Carolina Digital Library.

The Eastern North Carolina Digital Library: the 4 W’s. Who: Joyner Library at East Carolina University+. What: A digital library of books+ about eastern North Carolina. When: 2003-2004, initial project; 2004-2007, partnership project. So, I’ll start with the Who, What and When. Who was a collaborative group. The project started at the J.Y. Joyner Library, the main academic library at East Carolina University. As the project grew, more partners came on board from area museums, mainly relating to the agricultural and social community in the area in the 18th and 19th centuries. This collaboration was made possible through a grant from NC ECHO, a statewide body that, among other activities, manages the distribution of LSTA funds through grant projects. What began then as a digital library of books about eastern North Carolina grew to include museum artifacts, maps, videos, and original lesson plans created for the project. When spanned from the initial creation of the collection of digitized books in 2003 up through the creation of the joint Digital Library with all the partners in 2007.

The Eastern North Carolina Digital Library: the 4 W’s. Where: You are here. I mention where, eastern North Carolina, and specifically Greenville, for a particular purpose. Eastern North Carolina is a distinct cultural region within the state. During the Colonial period it was the seat of government and economically powerful. Its counties were some of the highest producers of tobacco in the world. But today it contains some of the poorest areas in the state, with poverty rates as high as 32% in 2009. This is related to one of the primary reasons why this project was undertaken…

The Eastern North Carolina Digital Library: the 4 W’s. Why: ECU is the largest university in the eastern region, serving some of the poorest and most under-served counties in the state. Material on eastern NC is not widely available. The expertise and interest existed in the library to create a great digital project. ECU is the largest university in the eastern region, and the third largest in the state overall. Institutionally speaking, there is a large drive to provide a high level of service to the region in addition to the student and faculty population at the university. The ENCDL project provided a unique opportunity to really showcase and put at the forefront some of the treasures of the community and make them more broadly accessible across the state. In addition to this sense of mission, the expertise and interest existed in the library to work on the project. An experimental vibe had taken root there in the late nineties, and the staff were really excited to take advantage of the possibilities of new media.

The Eastern North Carolina Digital Library: and 1 H. How: Digitization; Transcription; Metadata; ASP.net interface; Lesson Activities. Finally, the nitty gritty. Book digitization was done in-house on a Zeutschel scanner. Artifacts were professionally photographed and accompanying video descriptions were created. A TEI XML transcription was created for every book and indexed using TextML. Artifact metadata was created in a SQL database and exported to simple XML records to facilitate searching alongside the TEI XML. This was non-standard metadata but relatively robust. Additional summaries, author biographies, and curator “tours” of artifacts were also created. The site interface was created using ASP.NET. Lesson activities created using the repository of books and artifacts were added to the site interface.
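As a rough illustration of the export step for artifact metadata, the sketch below reads rows from a SQL table and writes one small XML record per artifact. The talk does not describe the real schema, so the table name, column names, and element names are all hypothetical, and the single sample row is invented purely so the example runs.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical columns; the project's actual schema is not given in the talk.
COLUMNS = ("title", "county", "description")


def export_artifacts(conn):
    """Serialize each artifact row as a small standalone XML record."""
    conn.row_factory = sqlite3.Row
    query = "SELECT id, title, county, description FROM artifacts ORDER BY id"
    records = []
    for row in conn.execute(query):
        record = ET.Element("artifact", id=str(row["id"]))
        for column in COLUMNS:
            ET.SubElement(record, column).text = row[column]
        records.append(ET.tostring(record, encoding="unicode"))
    return records


# Invented sample data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifacts "
    "(id INTEGER PRIMARY KEY, title TEXT, county TEXT, description TEXT)"
)
conn.execute(
    "INSERT INTO artifacts VALUES "
    "(1, 'Tobacco basket', 'Pitt', 'Woven basket used at market.')"
)
records = export_artifacts(conn)
print(records[0])
```

Flat records like these can be indexed next to the TEI transcriptions, which is the benefit the talk attributes to the export.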

So here is just a quick overview of the site as it appeared when the project was completed in 2007… this is the home page with access to different formats and methods for browsing.

You could browse a list of the books by title

Or, as geography is considered a very important access point for this collection, you could browse through all items by county.

When viewing an artifact there was an accompanying video and short narrative description (a voice-over in the video)

Books had a similar record page, and the text could be viewed in a single page with transcription and image.

Or read in a Flash-based book viewer created using a tool called Zoomify.

The classroom activities included lesson plans as well as lists of materials by reading level and alignment with the North Carolina Standard Course of Study…the state curriculum guidelines for K-12 education…

Case Study: The Eastern North Carolina Digital Library. The End. So, basically, we thought that was that in 2007… until…

Case Study: The Eastern North Carolina Digital Library. 2008-2009: creation of Joyner Library Digital Collections, a sister repository broader in scope. 2010-2011: migration of ENCDL into JLDC. I started at ECU in 2008, and the first task I set out on with Digital Collections was to create a more broadly based repository service, called, imaginatively, Joyner Library Digital Collections. The idea was that this repository would be the general infrastructure and backbone for future projects. It would be a platform to serve many different kinds of objects from many different kinds of collections. We knew from the beginning that ENCDL should become part of the repository, but we didn’t really embark on it until the system was pretty robust in 2010.

Just for comparison’s sake, here’s a quick overview of the repository. This is the home page.

This is one of the generic “collections” which is really just a link to results from a specific search.

This is an example of the home page of a collection within the repository. Just an extra static page with some information about the materials in the collection.

And here is a specific item, in this case a diary. So what you can see is that the repository is pretty generic and stripped down. It’s built to suit pretty much everything, but to be flexible enough to give us some personalization, as was seen in the last slide with the particular collection home page.

Comparing ENCDL and JLDC. ENCDL: TextML / ASP.net; non-standard metadata (aside from TEI transcriptions); two basic material types; non-standard filenaming; significant supplementary documentation for each object; text and image/artifact in different search and browse; extensive web presence with educational activities. JLDC: TextML / ASP.net; metadata standards (METS, MODS, MIX, TEI); variety of materials; each object has a Persistent Identifier (PID) and consistent filenaming; full-repository search; basic web presence, but robust searching tools. It wasn’t just the user experience that had been updated, either. While we continued to use some of the same basic tools like TextML and ASP.net, we used more standardized metadata and supported a wider variety of formats, including audio as well as a variety of texts, manuscripts, photographs, and artworks. We implemented the idea of persistent identifiers and focused on some new (at that time) ideas in search functionality, like cross-collection searching and faceted search results.
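The cross-collection searching and faceted results mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not how TextML or the JLDC repository implements it; the records, field names, and function names are invented for the example.

```python
from collections import Counter

# Invented records spanning two collections; field names are illustrative.
RECORDS = [
    {"title": "County history", "collection": "ENCDL", "type": "book"},
    {"title": "Farm tool", "collection": "ENCDL", "type": "artifact"},
    {"title": "Farm diary", "collection": "Manuscripts", "type": "diary"},
]


def search(records, term, **filters):
    """Match a term in the title across all collections, then apply facet filters."""
    hits = [r for r in records if term.lower() in r["title"].lower()]
    return [r for r in hits if all(r.get(k) == v for k, v in filters.items())]


def facet_counts(results, fields=("collection", "type")):
    """Count how many results fall under each value of each facet field."""
    return {field: Counter(r[field] for r in results) for field in fields}


results = search(RECORDS, "farm")
facets = facet_counts(results)
print(facets["collection"])

# Narrowing by a facet value is just the same search with a filter added.
narrowed = search(RECORDS, "farm", collection="ENCDL")
```

The point of the design is that one query runs over every collection, and the facet counts then tell the user how the hits divide up by collection or material type.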

Digital Curation Round One… So, next I’m going to use the lifecycle model to frame our migration planning and implementation process.

Applying the lifecycle model. Community Watch and Participation: What are the common standards endorsed by our community? JPEG2000, EPUB, PREMIS, NC ECHO’s PMDO, Flash, HTML 5. Curate and Preserve: What are the standards that will best fit our curation and preservation needs? To begin with, we were actively involved in Community Watch and Participation to determine the best practice standards endorsed by our community. We were also involved in determining the best ways to curate and preserve the project, as well as standards for the JLDC repository. We investigated many new standards for both digital objects and metadata, including JPEG2000, EPUB, PREMIS, NC ECHO’s PMDO, Flash, and HTML 5. Our analysis of these standards took into account cost, fit with the repository architecture, community support, and long-term benefits.

Applying the lifecycle model. Preservation Planning: What actions are in the best long-term interest of the ENCDL? Of JLDC? Meetings with stakeholders; web analytics; reproduction requests; review of infrastructure. Migration: digital objects, metadata, web application. So as part of preservation planning (shown in dark blue on the model), that is, determining how best to care for these materials in the long term, we carried out several activities to formulate a plan. We then initiated a preservation action, namely the migration, which will involve a transformation.

Applying the lifecycle model. Create a new collection in the repository. Create a “PID” for each digital object in the repository. Create a METS/MODS/MIX/TEI/PREMIS record for each. Incorporate supplemental metadata. Create new PREMIS records for each. Create a new hybrid object type for image + video. JPEG2000 for all images. PDF and EPUB for books. The analysis done in the prior steps helped us to formulate a basic plan for migration. We would create a new collection in the repository for this material. Each item would then have a persistent URL in the new system that we could redirect the old link to. A new metadata record meeting our new standards would be mapped. Supplemental information would be incorporated both within the metadata and through links to external documents. I note here that part of the migration process involved the decision to adopt a new standard across the repository for preservation metadata: PREMIS. We needed to create a brand new technical “type” of object to support the museum artifacts: image + video. We also decided to adopt the JPEG2000 standard. The reasons we went in this direction were the smaller file size and the zooming (without Flash) and “tiling” that JPEG2000 afforded. We also adopted PDF and EPUB as download formats.
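The step of giving each item a persistent URL and redirecting the old link to it can be sketched as follows. This is a minimal illustration under stated assumptions: the base address is a placeholder, the legacy paths are invented, and numbering PIDs sequentially is an assumption, since the talk does not say how PIDs were minted.

```python
from itertools import count

# Placeholder address, not the real JLDC URL pattern.
NEW_BASE = "https://repository.example.org/"


def assign_pids(legacy_paths, start=1):
    """Give each legacy item a sequential PID and map its old path to the new URL."""
    counter = count(start)
    redirects = {}
    for path in legacy_paths:
        if path in redirects:
            continue  # never mint a second PID for the same item
        redirects[path] = f"{NEW_BASE}{next(counter)}"
    return redirects


def resolve(path, redirects):
    """Return an (HTTP status, location) pair for a request to the old site."""
    if path in redirects:
        return 301, redirects[path]  # permanent redirect to the persistent URL
    return 404, None


# Invented legacy paths, for demonstration only.
redirects = assign_pids(["/old/item1.aspx", "/old/item2.aspx", "/old/item1.aspx"])
print(resolve("/old/item2.aspx", redirects))
```

Keeping the mapping as data, rather than hand-written rules, means the same table can drive whatever redirect mechanism the web server offers.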

Applying the lifecycle model. JPEG2000: complicated algorithms; inadequate software; web application development with Kakadu; presentation copies only at this time. Metadata: PREMIS; mapping and scripting multiple times. Repository structure: modeling new object types; functional requirements for UI and metadata; use case scenarios in ENCDL mapped to JLDC. Now we can return to the lifecycle model and the sequential actions. The Create, Appraise, Ingest, Preservation Action, and Store steps are where the bulk of the work lies. Really, this part could be a presentation in and of itself. Just to give you an idea of what goes on in these steps, we had three basic tasks. First, the use of JPEG2000. It’s a complicated standard with really flexible compression algorithms, and we had to do a lot of tests to determine the options we wanted to use. Once we decided on that, the conversion and quality assurance could take anywhere from an hour to several days, depending on the size of the book. In addition, JPEG2000 isn’t natively supported by any web browsers, so we had to implement an open source delivery application called Kakadu. At this time, we have only created JPEG2000 files for the derivative copies, so we are still storing 5TB of TIFF masters. For future digitization, we will use JPEG2000 as the standard. In terms of metadata work, we needed to adapt the JHOVE tool to extract and create valid PREMIS metadata. We also had to do several rounds of mapping and scripting. For the book collection we mapped from MARC to MODS, and we also had to integrate the additional summaries and biographies. For the artifacts we had to map from our homegrown XML schema to MODS. Finally, the repository structure had to be adapted to suit the new materials. We had new combination objects of image plus video for the artifacts. We also had to develop some new features in the user interface, so we created use case scenarios from the existing site and mapped out the functions in the new repository.
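To give a flavor of the MARC-to-MODS mapping and the integration of summaries, here is a minimal crosswalk sketch in Python. It covers only four common MARC fields and is not the project's actual script; the input format (a list of tag, subfield, value tuples), the function name, and the sample values are assumptions made for the example. Placing the summary in the MODS `abstract` element is likewise an assumption.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# (MARC tag, subfield) -> nested MODS element names. A small illustrative subset.
CROSSWALK = {
    ("245", "a"): ("titleInfo", "title"),
    ("100", "a"): ("name", "namePart"),
    ("260", "b"): ("originInfo", "publisher"),
    ("650", "a"): ("subject", "topic"),
}


def marc_to_mods(fields, abstract=None):
    """Build a MODS element from (tag, subfield, value) tuples."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for tag, code, value in fields:
        path = CROSSWALK.get((tag, code))
        if path is None:
            continue  # unmapped fields are simply skipped in this sketch
        parent = mods
        for name in path:
            parent = ET.SubElement(parent, f"{{{MODS_NS}}}{name}")
        parent.text = value
    if abstract:
        # Assumption: supplementary summaries go in the MODS abstract element.
        ET.SubElement(mods, f"{{{MODS_NS}}}abstract").text = abstract
    return mods


# Invented sample values, for demonstration only.
mods = marc_to_mods(
    [("245", "a", "A sample title"), ("100", "a", "Doe, Jane"),
     ("650", "a", "Agriculture"), ("999", "z", "local note")],
    abstract="A short summary written for the project.",
)
print(ET.tostring(mods, encoding="unicode"))
```

A real crosswalk has to handle indicators, repeated subfields, and dates, which is part of why the mapping took several rounds.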

Applying the lifecycle model. Access, Use and Reuse: recreate the book viewer using JPEG2000; create subject and map browse for the entire repository; recreate ENCDL pages with the repository’s stylesheet. When the site is fully migrated, we will find ourselves in the last stage, Access, Use and Reuse. From the UI end, we could now recreate the book viewer using the JPEG2000 files. We recreated a subject and map-based browse, which we could then use across the entire repository. And we also recreated the static HTML pages with the repository’s stylesheet.

So this is where we are with the migration at this point. This is on our test server at the moment, and not everything is 100% operational, but you can see that the same major areas are present here. We used the migration as an opportunity to simplify the navigation a bit.

The same navigational browsing capabilities are present, they are just organized together more logically

We recreated the county browse, and in fact, this tool is now being used across the entire repository.

This is an example of the title browse. Whereas previously books and artifacts had to be browsed separately, now they can be browsed together. As you can see, it is the same results stylesheet as is used for the rest of the repository, but the default sort is alphabetical by title.

This is an example of one of the book records. It may be difficult to see here, but there are links to open up the author biography and abstract in the metadata record, and when the derivative is prepared there will also be a link to download a copy in PDF format.

This is the new book viewer that utilizes JPEG2000 for native zooming. The book reader offers a full-text search within the book in the upper right; the results show up below that, with links to each page with a valid result.

And finally, this is an example of the new image+video object. Clicking on the video tab along the top opens up the embedded video in the same space where the image now resides.

The End. Curation; Preservation; Community Watch and Participation. So, once the plan is fully carried out, we will finally be finished… except that those continuous actions shown in the lifecycle model (updating and fine-tuning description, preservation planning, community watch, and curation and preservation) will continue and will most likely inspire future projects.

What Have We Learned? Many of us will eventually need to migrate not just data, but collections and “experiences” into other repositories. The Digital Curation Lifecycle Model can help us think through curation activities and evaluate them. The Lifecycle Model is not linear, nor will our activities be. The Lifecycle Model is not finite, but iterative. So, to finish, I thought I’d take the opportunity to solidify what I think are the take-away points. First, the key realization was that when we migrate data, we also need to think about collections and “experiences,” or the way that data is used and reused. Second, I hope I’ve shown that the Digital Curation Lifecycle Model can help us think through curation activities and evaluate them. Finally, we realize that both the model and our curation work are neither linear nor finite.

Thanks! East Carolina University: Emily Gore, Michael Reece, Joe Barricella, Justin Tew, Mark Custer, Maury York, John Lawrence, Linda Teel, Hazel Walker. At-Large: Emily Gore, Justin Vaughn, Amy Chiles. In Spirit…: Chuck Jones.

Contacts. Gretchen Gueguen. Email: gmg2n@virginia.edu. Web: http://www.gretchengueguen.com. Eastern North Carolina Digital Library: http://digital.lib.ecu.edu/historyfiction. Joyner Library Digital Collections: http://digital.lib.ecu.edu