Data Processing: A simple model and current UKDA practice
Alasdair Crockett, Data Standards Manager, UKDA

An idealised throughflow of routine work into and out of a ‘data processing’ section

Acquisitions stage/section:
1. Orderly ingest and review of datasets

Data processing stage/section:
2. Orderly ‘release’ of datasets to the data processing section
3. Clear data processing policies and effective tools for processing data and creating metadata (esp. the DDI)
4. Recording when processing tasks are completed and how long they took
5. Generation of ‘processing metadata’
6. Post-processing and reprocessing

Datasets then flow on to the preservation and user support stages/sections.

1. Orderly ingest and review of datasets

This lies within the ‘acquisitions’ rather than the ‘data processing’ section, though smaller archives may not make such a distinction.
- Review incoming materials prior to processing: the UKDA’s ‘Acquisition Review Committee’ is useful for a high-volume archive.
- Find out about materials as early as possible; don’t wait for them to arrive at the Archive: the UKDA’s data submission form.
- Get the data creator to create the catalogue record/provide metadata: the UKDA is about to start using an electronic ‘deposition programme’ rather than the traditional ‘deposit form’.

2. Orderly ‘release’ of datasets to the data processing section

- Study reviewed by a senior staff member before release (particularly useful if it is being processed by a junior/temporary staff member).
- Study assigned to a primary individual.
- Study assigned a level/standard of processing, if the Archive has differential standards: the UKDA has four levels, ranging from A* (most rigorous/value added) through A and B to C (least rigorous/value added).
- Clear prioritisation of tasks when many datasets are released to processing at the same time: the UKDA has ‘service level definition’ performance targets, which include processing times, imposed by its funders.

3. Clear data processing policies and effective data processing tools

At the UKDA, data processing falls into three main sets of activities, all of which need clear documentation (so that staff know what activities are expected) and effective tools to automate procedures as far as possible:

i) Processing of data: validation, checking values and labels, confidentiality, congruence of data and documentation, data format translation (for preservation and dissemination). The UKDA uses Sax Basic scripts to manipulate SPSS objects.
ii) Creation of metadata: catalogue record and/or DDI (both study level and, in many instances, variable level). The UKDA has a new user-friendly catalogue input program.
iii) ‘Publication’ of data and metadata in a specialist online browsing environment: improvements to Nesstar Publisher.
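The metadata-creation activity (ii) can be illustrated with a short sketch. The element names below follow the DDI Codebook layout in heavily simplified form (a real DDI instance has many more required elements and a namespace), and the study values are invented examples, not a real UKDA record:

```python
import xml.etree.ElementTree as ET

def study_level_ddi(title, study_id, archive):
    """Build a minimal, simplified DDI-Codebook-style study description."""
    codebook = ET.Element("codeBook")
    stdy = ET.SubElement(codebook, "stdyDscr")       # study description
    citation = ET.SubElement(stdy, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title    # study title
    ET.SubElement(titl_stmt, "IDNo").text = study_id # study number
    dist = ET.SubElement(citation, "distStmt")
    ET.SubElement(dist, "distrbtr").text = archive   # distributing archive
    return ET.tostring(codebook, encoding="unicode")

xml = study_level_ddi("Example Survey, 2003", "SN 0000", "UK Data Archive")
```

In practice a tool like this would be driven from the catalogue record, so that study-level DDI is generated rather than typed by hand.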

4. Recording when processing tasks are completed and how long they took

- Now essential for the UKDA, since we provide quarterly reporting of performance indicators (including processing times) to our funders (ESRC and JISC).
- Advisable even if funders don’t require the information: archive management is often distanced from the coalface of data processing and may have little idea which processing activities take the most time.
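A low-tech way to capture this is a per-task log from which time totals per activity can be aggregated. The activity names and CSV layout below are illustrative, not the UKDA’s actual schema:

```python
import csv
import io
from collections import defaultdict

def total_minutes_by_activity(log_csv):
    """Sum recorded task durations (in minutes) per processing activity."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(log_csv)):
        totals[row["activity"]] += int(row["minutes"])
    return dict(totals)

# Hypothetical log entries: study, activity, staff initials, minutes spent.
log = """study,activity,staff,minutes
SN 0001,validation,AC,90
SN 0001,ddi_markup,AC,240
SN 0002,validation,JK,60
"""
totals = total_minutes_by_activity(log)
```

Even this much reveals which activities dominate staff time, which is exactly the information management tends to lack.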

5. Generation of ‘processing metadata’

i) Internal record of what was done to the materials provided to create the dataset ready for dissemination. This can prove very useful if problems arise with the study years afterwards.
ii) External record of the same, to tell users what the Archive has done in getting the dataset ready for their use (useful, as they tend to assume it was simply copied from a CD onto a server!).
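Both records can be derived from a single list of logged actions. The split between internal-only and user-facing actions below is an assumption made for illustration, as is each action text:

```python
def processing_notes(actions):
    """Split logged processing actions into internal and external records.

    Each action is a (description, user_facing) pair. The internal record
    keeps everything; the external record keeps only user-facing entries.
    """
    internal = [desc for desc, _ in actions]
    external = [desc for desc, user_facing in actions if user_facing]
    return internal, external

actions = [
    ("Recoded direct identifiers to preserve confidentiality", True),
    ("Fixed mislabelled value 9 on variable SEX", True),
    ("Renamed supplier's files to archive conventions", False),
]
internal, external = processing_notes(actions)
```

Logging once and generating both views keeps the two records consistent, rather than writing the user-facing note separately after the fact.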

6. Post-processing and reprocessing

Release of the study to the outside world may not be the end of processing:

i) All ‘legacy’ issues, i.e. old datasets that need reprocessing, are now logged centrally. Since staff time doesn’t always permit immediate action, we need to make sure we don’t forget what needs doing (so it can be tackled in piecemeal fashion when we do have time).
ii) All dissemination copies of data are now linked dynamically to the preservation server: download file bundles on the UKDA’s download service server are recreated if any constituent files are changed on the preservation server. (We still have a problem with Nesstar, though: data have to be manually republished if new data files are supplied to the Archive.)
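The dynamic link in (ii) can be approximated by comparing content digests: rebuild a download bundle only when a constituent file’s digest no longer matches the one recorded at the last build. This sketch uses in-memory byte strings rather than real server paths:

```python
import hashlib

def bundle_stale(preservation_files, recorded_digests):
    """Return True if any constituent file's content no longer matches
    the digest recorded when the download bundle was last built."""
    for name, content in preservation_files.items():
        if recorded_digests.get(name) != hashlib.sha256(content).hexdigest():
            return True
    return False

files = {"data.tab": b"1\t2\n", "codebook.rtf": b"{\\rtf1}"}
# Digests recorded when the bundle was built.
digests = {name: hashlib.sha256(c).hexdigest() for name, c in files.items()}
unchanged = bundle_stale(files, digests)   # nothing has changed yet
files["data.tab"] = b"1\t2\n3\t4\n"        # preservation copy updated
changed = bundle_stale(files, digests)     # bundle now needs rebuilding
```

A nightly job running such a check against the preservation server would be enough to keep dissemination bundles in step automatically.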

How does one achieve these six data processing steps efficiently?

- A ‘tracking and logging’ database to log major processing events and to store and generate processing metadata (helps achieve points 1, 2, 4, 5 and 6).
- And, predominantly to achieve point 3:
  - clear documentation of data processing procedures and standards;
  - effective and easy-to-use data processing tools to automate any tasks which can be automated.

Facilitating stages 1, 2, 4, 5 and 6: a ‘tracking and logging’ database

The UKDA’s new acquisitions and processing database will:
- Alert acquisitions staff when follow-up action is required (e.g. promised data has not been sent in within 3 months).
- Log all major processing activities: when completed, by whom and how long they took.
- Record and generate ‘processing metadata’.
- Generate performance indicator figures for funders: this currently takes about 2 hours and should reduce to about 2 minutes.
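A minimal sketch of the follow-up alert, using SQLite. The table and column names are invented for illustration, and the 3-month threshold matches the example on the slide; a fixed ‘today’ keeps the example reproducible:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE deposits (
    study_no TEXT, promised_on TEXT, received_on TEXT)""")

today = date(2004, 6, 1)  # fixed date so the query result is deterministic
rows = [
    ("SN 0001", "2004-01-10", None),          # promised, never arrived
    ("SN 0002", "2004-05-01", None),          # promised only recently
    ("SN 0003", "2004-01-05", "2004-02-01"),  # promised and received
]
conn.executemany("INSERT INTO deposits VALUES (?, ?, ?)", rows)

# Flag deposits promised more than ~3 months ago and still not received.
cutoff = (today - timedelta(days=90)).isoformat()
overdue = [r[0] for r in conn.execute(
    "SELECT study_no FROM deposits "
    "WHERE received_on IS NULL AND promised_on < ?", (cutoff,))]
```

The same table doubles as the ingest log for stage 1, so the alerting query is just one more view over data the archive records anyway.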

Facilitating stage 3: clear documentation and data processing tools

i) Documentation of procedures and standards: the UKDA produces what it terms Process Guides to attempt to define procedures and standards. These are available via the SIDOS website:
ii) Effective tools: the UKDA uses Sax Basic to manipulate the SPSS command processor and SPSS ‘objects’. Sax Basic gives a Visual Basic for Applications style programming interface to SPSS. This is very useful if you use SPSS as your core processing package, as the UKDA does. For more details see:

Example of a Sax Basic processing script

i) Makes a tab-delimited and a STATA version of each SPSS file.
ii) Makes a UKDA data dictionary in rich text format (RTF), marking up differences between the STATA and SPSS files.
iii) Generates a report in RTF format that documents all unavoidable loss of data/labels upon conversion from SPSS to STATA.
iv) Imposes the UKDA directory structure on all files.
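Steps (i) and (iii) can be sketched outside Sax Basic. This is not the UKDA’s script, just an illustration of the idea: export cases as tab-delimited text and report any variable labels that would be cut off at a target format’s length limit (the 80-character limit here is an assumed stand-in for the stricter target format):

```python
def export_tab_delimited(variables, cases, label_limit=80):
    """Write cases as tab-delimited text and list the variables whose
    labels exceed the target format's label-length limit."""
    header = "\t".join(name for name, _ in variables)
    body = "\n".join("\t".join(map(str, row)) for row in cases)
    losses = [name for name, label in variables if len(label) > label_limit]
    return header + "\n" + body + "\n", losses

# Hypothetical variables: (name, label) pairs; second label is over-long.
variables = [("SEX", "Sex of respondent"),
             ("INCOME", "Total gross household income " + "x" * 80)]
cases = [(1, 25000), (2, 18000)]
tab_text, losses = export_tab_delimited(variables, cases)
```

The `losses` list is the raw material for the conversion-loss report in (iii); a real script would also write it out, e.g. as RTF.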