Developing a digital repository infrastructure for King’s College London. RSP Training Day, 22nd January 2009. Gareth Knight, Centre for e-Research.


1 Developing a digital repository infrastructure for King’s College London
RSP Training Day, 22nd January 2009
Gareth Knight, Centre for e-Research

2 Approach
1. Analyse existing practices and the limitations of the current system
2. Establish requirements for information management and access
3. Investigate alternative approaches (software choice, extensibility, applicability to your data, use by others)
4. Prototype: smaller projects and experiments

3 Centre for e-Research
CeRch (http://www.kcl.ac.uk/iss/cerch) is:
- An R&D department in Information Services and Systems (ISS) that performs:
  - Management and preservation of research outputs from KCL researchers in all disciplines
  - Research, teaching and consultancy on e-infrastructure, data curation, preservation and other areas
- Formerly Arts & Humanities Data Service: Executive
  - Management and preservation of research outputs from UK researchers in arts and humanities

4 Context: Existing approach
- Formal, but manual, ingest procedures
- ‘Bespoke’ repository for data management
- Not scalable: code could not easily be reapplied to other projects
- Functional limitations:
  - Preservation and provenance metadata
  - Limited delivery systems
  - Collection-level identifiers (mostly)
- Diverse, semi-structured data

5 Requirements
- Persistent identifiers down to the level of individual datastreams, accommodating compound content models
- Versioning of content and metadata
- Automated processing and user input
- Able to integrate specialised third-party tools (e.g. format conversion)
- Preservation metadata management
- Audit trail/provenance metadata
- Standard distribution methods for specific content types (Disseminators)
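
The first two requirements can be illustrated with a minimal sketch. It assumes Fedora 3-style identifiers, where an object has a PID of the form `namespace:id` and a datastream is addressed as `info:fedora/<pid>/<dsid>`. The `kcl:` namespace, the class names and the in-memory version list are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Datastream:
    """One version of one datastream inside a digital object."""
    pid: str          # identifier of the parent object, e.g. "kcl:1" (hypothetical)
    dsid: str         # datastream identifier, e.g. "MODS"
    version: int = 0  # 0 for the first version, incremented on each update

    @property
    def uri(self) -> str:
        # Assumed Fedora 3-style URI that identifies the datastream itself
        return f"info:fedora/{self.pid}/{self.dsid}"


@dataclass
class DigitalObject:
    """A compound object whose datastreams each keep a version history."""
    pid: str
    datastreams: Dict[str, List[Datastream]] = field(default_factory=dict)

    def add_version(self, dsid: str) -> Datastream:
        # Append a new version rather than overwriting the previous one
        history = self.datastreams.setdefault(dsid, [])
        ds = Datastream(self.pid, dsid, version=len(history))
        history.append(ds)
        return ds


obj = DigitalObject("kcl:1")
first = obj.add_version("MODS")
second = obj.add_version("MODS")
print(first.uri, second.version)
```

Keeping every version in a list is only one way to model versioning; a real repository would persist the history and record who changed what for the audit trail.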

6 What do we use Fedora for?
Digital repository:
- King’s Research Archive: an institutional repository for open access research papers written by King’s College London staff
- Virtual Research Environment (VRE): supporting research management
- EIDER Project: demonstrator for enhanced deposit and ingest
Preservation services:
- SOAPI: an architecture for (partially) automating preservation and ingest workflows in digital repositories
- SHERPA DP2: developing preservation services for content held in disparate locations
Digitisation projects:
- Historical Hansard: scanning and markup of 50 years of debates from the Upper Chamber of the Northern Ireland Parliament, from 1921 to 1972
- East London Theatre Archive: digitisation of 15,000 performing arts resources from East London theatres, from playbills and programmes to press cuttings and photographs

7 Capture & Ingest workflow
Activities performed during ingest

8 Metadata (1): Descriptive
Each project has specific descriptive metadata (MD) requirements:
- Scholarly Works Application Profile (SWAP): schema created for the IR
- Metadata Object Description Schema (MODS): ELTA and SHERPA DP2
- MarcXML: SHERPA DP2
- Simple DC: various
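
As an illustration of the simplest schema listed above, the sketch below builds a Simple DC record using the standard `oai_dc` container and Dublin Core element namespaces. The helper name `simple_dc` and the field values are hypothetical; the presentation does not show how its records were produced.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for unqualified (Simple) Dublin Core
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def simple_dc(fields):
    """Serialise a dict of {element name: [values]} as a Simple DC record."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        for value in values:
            # Every Simple DC element is optional and repeatable
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only
record = simple_dc({"title": ["Example playbill"], "type": ["Image"]})
print(record)
```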

9 Metadata (2): SWAP

10 Metadata (3): Preservation
Preservation:
- PREMIS Object
- PREMIS Event (forthcoming)
- Generated by DROID, JHOVE and others
Rights:
- Rights MD
- Provided by SHERPA/RoMEO

11 Metadata (4): Preservation
- Rights metadata provided by SHERPA/RoMEO
- Technical metadata provided by JHOVE

12 Data Capture (1): King’s research data
Collection of King’s research data:
- Web interface for deposit
- Deposit via SWORD from a desktop or web client
- Capture of metadata from Research Gateway, Web of Science and other sources

13 Data Capture (2): Archiving services
SHERPA DP2 provides archiving and preservation services for varied software repositories and web resources.
Content providers supported:
- Repositories: Fedora, CDS Invenio, DSpace, EPrints, DigiTool
- Websites: large dynamic sites (through Subversion), static sites
Capture methods:
- OAI-PMH for metadata capture
- Data capture over HTTP/FTP and VPN
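
For the OAI-PMH capture method, a harvest request can be sketched as follows. The `ListRecords` verb and the `metadataPrefix`, `from` and `resumptionToken` arguments come from the OAI-PMH 2.0 protocol; the base URL and the function name are hypothetical and do not describe the SHERPA DP2 implementation.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it continues an earlier
        # request and must not be combined with metadataPrefix or from
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvest since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint
print(list_records_url("https://repository.example.org/oai",
                       from_date="2009-01-01"))
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until none is supplied.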

14 Digitisation (1): East London Theatre Archive
- 15,000 digital objects: playbills, programmes, press cuttings and photographs
- Object model representing 2 layers:
  - Performance venue
  - Item (3 manifestations of each image: high-quality, distribution, thumbnail)
- Each will contain MODS metadata
- Accessible through browse, search and a Google Maps-style UI
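
The two-layer object model above might be sketched as below. This is an illustrative data structure only: the class names, field names and identifiers are hypothetical, and the actual ELTA content model in Fedora is not shown in the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The three manifestations named on the slide
MANIFESTATIONS = ("high-quality", "distribution", "thumbnail")


@dataclass
class Item:
    """Second layer: a digitised item with up to three image manifestations."""
    identifier: str
    mods_title: str  # stands in for the item's full MODS record
    files: Dict[str, str] = field(default_factory=dict)  # manifestation -> file

    def missing_manifestations(self) -> List[str]:
        # Useful as an ingest check: every item should have all three
        return [m for m in MANIFESTATIONS if m not in self.files]


@dataclass
class Venue:
    """First layer: a performance venue that groups its items."""
    name: str
    items: List[Item] = field(default_factory=list)


venue = Venue("Example Theatre")
venue.items.append(Item("elta:1", "Playbill", {"thumbnail": "thumb.jpg"}))
print(venue.items[0].missing_manifestations())
```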

15 Digitisation (2): Historical Hansard
- 50 years of debates from the Upper Chamber of the Northern Ireland Parliament, from 1921 to 1972
- Separated into collection and volume
- 45,100 items containing:
  - Page images (3 manifestations of each image: high-quality, distribution, thumbnail)
  - OCR’d text stored as XML
  - Relationship MD
- UI: experiment with Fez, Muradora, Vital, existing Stormont

16 Lessons we have learnt…
- Understand your needs
  - No one-size-fits-all approach
  - Match requirements to functionality, not vice versa
- Implementation of a Fedora repository requires time
  - No out-of-the-box solution, though this is likely to change in the near future
  - Consider a long-term development plan; some customisation may be required
- Consider future expansion plans
  - Where do you want to be tomorrow?
- Don’t be intimidated
  - Lots of features, but you don’t need to use them all
  - Possible to break implementation into well-defined stages
- Avoid reinventing the wheel
  - Examine existing Fedora projects that may save development time
  - Develop code that can be repurposed for other projects

17 Thank you!
Gareth Knight, Centre for e-Research
gareth.knight@kcl.ac.uk

