Developing data management expertise at King's College London Experience of the PEKin project Gareth Knight, Centre for e-Research (CeRch) Lindsay Ould, Archives & Information Management (AIM)
2 Overview 1. Aims & objectives of PEKin project 2. Project methodology 3. Findings on current state of data management 4. Action taken to address issues 5. Further work to be performed 6. Lessons learnt 7. Potential for reuse of project deliverables
3 What is PEKin? Title: Preservation Exemplar at King's (PEKin) Funder: JISC, Preservation strand of Information Environment 09-11 Time period: 1 April 09 – 31 October 10 Project partners: Centre for e-Research (CeRch) Archives & Information Management (AIM) Based at King's College London
4 What is a Digital Record? Recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure to provide proof or evidence of that activity International Committee on Archives (ICA)
5 New archiving challenges Changing state of digital information: Changing notion of what constitutes a record of business: Core business: student information, committees, estates, etc. Increasingly research outputs (data, papers) – funder requirements Changing composition: Born digital content (static and dynamic resources) Hybrid (paper+digital), digital only Lifecycles: Creation process: Create, revise, publish 1st version, revise, publish 2nd version. Repeat. Access lifecycle: Technology dependencies (hardware & software) Implications: Archival process: Archive at earlier stage? Capture using different technologies? Data value: Can we be sure that everything has business value?
6 Methodology 1. Evaluate existing information management procedures and working practices at institutional level and revise accordingly What remains viable? Elements that require revision Gaps and omissions 2. Determine the data management needs that data producers and systems managers in academic units/professional services encounter and determine most effective approach to address requirements 3. Implement a technical system capable of curating and preserving digital records of long-term archival value.
7 Review of existing frameworks Reviewed DAF, DRAMBORA & DIRKS, etc. All req. further refinement to apply to own situation. DAF: Data Asset/Audit Framework + Useful for gathering detailed information on data assets located in departments + Useful for analysing data management practices - Time-consuming to perform - Does not provide a method of evaluating problems & developing a mitigation strategy DRAMBORA + Provides formal structure for identifying, describing & evaluating risks & developing a strategy to mitigate or avoid them. + well-defined list of risk categories and factors - Intended for OAIS-like environments rather than less formalised research systems - Focus upon OAIS workflow rather than data creation lifecycle
8 Integrating frameworks DAF & DRAMBORA are broadly similar, but some work needed: Normalised terminology and definitions & adopted some archival terminology Activity classification: Activities placed in diff. categories in DRAMBORA & DAF. Light touch approach - establish balance between DRAMBORA system-level & DAF asset level analysis high-level analysis of data assets using DAF Omitted various DRAMBORA risk categories unrelated to data management Adopted e-Research lifecycle model Stages were tied-in with distinct project outputs
10 Administrative Case Studies Business departments & content types examined: Committee: Council, Academic Board and sub-committees Estates: Project & operational records Student: Records held outside the Student system (SITS) Archival value digital records Mapped to current College paper holdings
11 Research Case Studies Research groups/projects/departments examined: Environmental Research Group (ERG) Environment Monitoring group Environment Modelling group Twins Early Development Study (TEDS) Regional Information Collection Centre (RICC) Period of change – since April 2010, IT provided centrally with storage provision review underway Archives have previously accepted pioneering research data Acquisition policy is now under review for born digital/digitised records
12 Administrative study findings Opportunity to redefine collections All areas required digital records management support before archives could be identified Quality control varied between records Duplication with paper and born-digital versions retained Lack of ownership of born digital records by administrative staff
13 Research study findings Challenge to identify data sets of archival value TEDS & ERG funded dedicated data management roles including back-up & information security processes However, majority of research groups do not have equivalent support, placing data at risk Funding bids lacked formal data management plans to provide assurance or influence further funding Continuing preservation of data not considered with focus on current work
14 Comparison of research & admin data management Individual researchers & administrative staff lack understanding of risk and use personal data approach Understanding of digital environment is still outside their comfort zone - hybrid duplicated collections High risk when staff (Principal Investigators or Administrators) leave No point of contact for advice or support
15 Risk Assessment of research data management Multiple risks identified Active data management was good - recommendations made for best practice Mitigation Content versioning system Store multiple versions of each data file Implement integrity monitoring Data management plan to document approach
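The mitigation "store multiple versions of each data file" can be illustrated with a minimal sketch. It is not the versioning system used by the project: the function name `save_version`, the `.vNNNN` naming scheme and the separate store directory are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def save_version(src: Path, store: Path) -> Path:
    """Copy src into store as the next numbered version and return the new path.

    Hypothetical helper: versions are named "<filename>.v0001", "<filename>.v0002", ...
    """
    store.mkdir(parents=True, exist_ok=True)
    prefix = src.name + ".v"
    # Collect the version numbers already present for this file name.
    numbers = [
        int(p.name[len(prefix):])
        for p in store.iterdir()
        if p.name.startswith(prefix) and p.name[len(prefix):].isdigit()
    ]
    next_number = max(numbers, default=0) + 1
    target = store / f"{prefix}{next_number:04d}"
    shutil.copy2(src, target)  # copy2 also tries to preserve file timestamps
    return target
```

Calling `save_version` before each revision of a working file leaves every earlier state recoverable; a real deployment would more likely rely on an established version control or storage system.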
16 Risk Assessment of administrative data management More risks identified than with research data Lack of business owner for data sets ISS provide storage & systems management but little data management expertise ISS Data Management role now in place Move to digital capture will address risks Risk mitigation as for research records
17 Actions taken by project 1. Institution-level Policies 2. Work with departments to address data management risks 3. Documentation 4. Implementation of KCL Digital Archive
18 1. Institution-level Policies Update of existing policies: Acquisition policy: Refinements to existing acquisition policies Retention Policy: Appraisal criteria for records of value Information Management: Appraisal criteria and advisory material Develop new policies: Preservation Policy: content preservation strategy for institutional data of short and long-term value See http://www.kcl.ac.uk/iss/igc/tools/staff.html for guidance currently available
19 2. Liaise with data creators & managers Enable management to gain a better understanding of data assets within their department/group and the potential risk factors that may limit data usage. Work with data producers & systems managers to address data management issues that they identified as a concern, e.g. versioning Make data producers & management aware of risk factors that exist and make recommendations for actions that may help to avoid or mitigate issues. Make them aware of support available within College & other departments/groups/projects that are working to resolve common issues.
20 3. Documentation Self-help documentation to help data creators & managers to: Understand data management issues & key concepts Practical steps to diagnose and address DM issues/people to contact Data Management workbook: Creating your data: Issues to consider prior and in early stages of development to ensure data is fit for purpose & usable over time. Organising your data: Methods for structuring & documenting data to enable it to be used & understood Maintaining access and use of data: Approaches that may be adopted to ensure continued access & use of data. Appraising your data: Recommendations for applying archival principles Content Type Reports: Short pragmatic reports tailored to specific content types (raster images, audio, e-mail, documents) To be published on KCL web site in near future
21 4. KCL Digital Archive Implemented Alfresco ECM (Community Edition) to manage college data of long-term archival value Standards compliance OAIS RM, U.S. Department of Defense 5015.2-STD, ISO 15489, TRAC when in full service Bitstream preservation: fixity creation/verification, online + offline storage Information Content Preservation: Format conversion, event logging – audit trail Access: Limited to archive reading room, catalogue descriptive MD to common standard
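Fixity creation and verification, as named under bitstream preservation, might look roughly like the sketch below. It is an illustration only and does not show how the Alfresco-based archive implements the step. The function names are hypothetical, and CRC-32 is assumed as the CRC variant; MD5 and SHA-1 follow the algorithms listed for the archive's fixity generation.

```python
import hashlib
import zlib
from pathlib import Path


def compute_fixity(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Return MD5, SHA-1 and CRC-32 values for a file, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    crc = 0
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
    return {
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "crc32": f"{crc & 0xFFFFFFFF:08x}",
    }


def verify_fixity(path: Path, recorded: dict) -> bool:
    """Recompute the values and compare them with those recorded at ingest."""
    return compute_fixity(path) == recorded
```

Values recorded at ingest can then be rechecked on a schedule against both the online and offline copies; a mismatch signals that a copy has changed and should be restored from another.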
22 Rules-based approach to data management jBPM synchronous or asynchronous workflows Content model compliance Conforms to defined structure & object types Fixity generation All: MD5, SHA-1, CRC Format identification All: File(1), DROID Technical metadata extraction Format specific: JHOVE, MP3Info, others Conversion to preservation & dissemination derivative parameters for each format & MD criteria (e.g. OpenOffice, ImageMagick) Record action results as PREMIS Event Close collection to prevent further update Obsolescence monitoring? Risk assessment based upon future development of PRONOM/UDFR Manual activity for future date?
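The rules-based idea, in which each action runs against an object and its result is recorded as an event, can be sketched in a few lines. This is a simplified illustration, not the jBPM workflow itself. The field names are loosely modelled on PREMIS event semantic units, `mimetypes` is only a stand-in for File(1)/DROID, and all function names are hypothetical.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

Rule = Callable[[Path], str]


def identify_format(path: Path) -> str:
    # Stand-in for File(1)/DROID: guesses from the file extension only.
    guessed, _ = mimetypes.guess_type(path.name)
    return guessed or "unknown"


def digest_sha1(path: Path) -> str:
    return hashlib.sha1(path.read_bytes()).hexdigest()


# Ordered rule set: (event type label, action to run).
RULES: list[tuple[str, Rule]] = [
    ("format identification", identify_format),
    ("message digest calculation", digest_sha1),
]


def run_rules(path: Path) -> list[dict]:
    """Apply every rule to one file and record each result as an event."""
    events = []
    for event_type, rule in RULES:
        try:
            outcome, detail = "success", rule(path)
        except OSError as error:  # e.g. the file is missing or unreadable
            outcome, detail = "failure", str(error)
        events.append({
            "eventType": event_type,
            "eventDateTime": datetime.now(timezone.utc).isoformat(),
            "eventOutcome": outcome,
            "eventOutcomeDetail": detail,
            "linkingObjectIdentifier": path.name,
        })
    return events
```

Keeping the rules in a list means further steps, such as technical metadata extraction or format conversion, can be added without changing the loop, and every step leaves an audit-trail entry whether it succeeds or fails.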
23 Future Plans Embedding approach into archives & wider institution Identify research management needs at early stage (funding proposal, active/semi-active use) rather than end Skills audit & needs assessment Support & training for data management staff College Storage strategy Increased availability of College storage
24 Lessons Learnt Better understanding of data ecosystem in college – data lifecycle, infrastructure Progress made with identifying & addressing data management support – need to scale-up to college as whole. Need to manage semi-current record, in addition to active and archival records Requirements for storage Raised profile for Archives & CeRch Need for cross-disciplinary approach to managing data – combination of expertise & shared language
25 What may be used by other projects?
Output | Use
Project Methodology | Anyone wishing to combine archival/curation approach for managing digital records
Audit methodology + templates | Anyone wishing to perform similar assessment and evaluation of DM activities.
Data Management workbook & Content Type reports | Anyone wishing to implement DM practices in their own institution/compare against others/staff wishing to improve DM practices
Data management system | Experience & documentation on use of Alfresco as preservation system
Thank You Any questions? Gareth Knight, Centre for e-Research (CeRch) Lindsay Ould, Archives & Information Management (AIM) firstname.lastname@example.org@kcl.ac.uk