National Digital Repository ®
Preserving the imperfect: reflections from NDAD and elsewhere
Kevin Ashley, Head of Digital Archives Group, ULCC

Presentation transcript:

National Digital Repository ® Presdb07 - Edinburgh

Slide 2: Overview
- Issues that arise when databases are records
- Informing (expensive, important) decisions
- Tensions between ideal formats and non-ideal data
- Representation mechanisms for access control and absent data
- Concentrating on R&D issues

Slide 4: What is NDAD?
- A service for UK government records which exist as ‘structured information’
- Contains data + contextual information
- Established in service in March 1998
- First service by a national archive to provide online public access to preserved material
- Selection undertaken by National Archives and government departments
- Everything else at ULCC: under contract to TNA

Slide 6: Preservation
- Data transformed to canonical form - originals kept
- Paper documentation digitised
- Technical metadata produced or transformed
- Consistency checks applied:
  - For transformation process
  - Against original system
  - Against published information
  - Internal cross-checks

Slide 7: Consequences
- Preservation far removed from creation
- Unlike actively curated systems: preservation and use can take place simultaneously
- Multiple use scenarios - more than views

Slide 8: Where are the problems?
Management

Slide 9: Perfect Preservation Formats?
- DDI: XML-based
  - Good for survey/social science data
  - Not so good for complex relational stuff
  - Likes clean data
- XML representations
  - More flexible
  - Not so good when data is unclean
- As SQL
  - Much metadata or needs another scheme
  - Useless for unclean data

Slide 10: How bad is bad?
- Data out of range is a quality problem, not a preservation problem (e.g. ‘Age’ of 230)
- But…
  - Age = -20?
  - Age = B0?
  - Age = Thursday?
- All present problems if ‘Age’ is a positive integer in our preservation schema
- Date = ‘31 Feb 2007’ is syntactically but not semantically valid
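The distinctions on this slide can be sketched in Python. This is only an illustration under stated assumptions, not a description of NDAD's tooling: the function names, the result labels, the plausibility ceiling of 130 and the day-month-year date layout are all hypothetical, and the schema type for ‘Age’ is taken here to be a non-negative integer.

```python
import re
from datetime import date

# Hypothetical month table; a fixed table avoids locale-dependent parsing.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Assumed surface syntax for dates: "DD Mon YYYY".
DATE_SYNTAX = re.compile(r"(\d{1,2}) ([A-Za-z]{3}) (\d{4})")


def classify_age(raw: str) -> str:
    """Separate type violations from mere quality problems."""
    text = raw.strip()
    if not re.fullmatch(r"-?[0-9]+", text):
        return "not an integer"                 # e.g. 'B0', 'Thursday'
    value = int(text)
    if value < 0:
        return "integer, but outside the schema type"   # e.g. -20
    if value > 130:                             # assumed plausibility ceiling
        return "valid type, implausible value"  # e.g. 230
    return "ok"


def classify_date(raw: str) -> str:
    """Distinguish syntactic validity from semantic validity."""
    match = DATE_SYNTAX.fullmatch(raw.strip())
    if match is None or match.group(2) not in MONTHS:
        return "syntactically invalid"
    day, month, year = int(match.group(1)), MONTHS[match.group(2)], int(match.group(3))
    try:
        date(year, month, day)                  # raises ValueError for 31 Feb
    except ValueError:
        return "syntactically valid, semantically invalid"
    return "valid"


for sample in ["230", "-20", "B0", "Thursday"]:
    print(sample, "->", classify_age(sample))
print("31 Feb 2007 ->", classify_date("31 Feb 2007"))
```

Only the first age, 230, would fit a preservation schema that types the field as an integer; the others need somewhere else to live.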

Slide 11: More bad stuff
- Absent key fields or mandatory fields
- Encoded data that uses bad codes
  - If days of week are 1 - 7, what is day 9? Day X?
- ‘Encoded’ data which is stored translated mappings that aren’t
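One way to handle bad codes without discarding them is to decode what can be decoded and keep the raw value alongside. The sketch below is an assumption-laden example: the codebook (1 = Monday through 7 = Sunday) and the function name `decode_days` are hypothetical, not taken from any system described in the talk.

```python
from collections import Counter

# Assumed codebook; the real mapping would come from the system's documentation.
DAY_CODES = {1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
             5: "Friday", 6: "Saturday", 7: "Sunday"}


def decode_days(raw_values):
    """Return (raw, label-or-None) pairs plus a tally of undecodable codes."""
    decoded = []
    bad_codes = Counter()
    for raw in raw_values:
        text = raw.strip()
        label = DAY_CODES.get(int(text)) if text.isdecimal() else None
        if label is None:
            bad_codes[raw] += 1        # keep the original spelling of the error
        decoded.append((raw, label))
    return decoded, bad_codes


decoded, bad_codes = decode_days(["1", "9", "X", "7"])
print(decoded)
print(dict(bad_codes))
```

The tally gives an archivist a quick picture of how often, and in what form, the codebook was violated.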

Slide 12: What’s the problem?
- Must preserve errors - their nature is informative
- Would like to understand original system behaviour with these errors
- Don’t want to use tools that force all fields to be text
- Want a datatype like ‘almost always integer’ or ‘often a date’ - and intelligent behaviour when it isn’t
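A datatype like ‘almost always integer’ might look something like the following. This is a minimal sketch of the idea, not an existing library type: the class name `MostlyInt` and the helper `summarise` are invented for illustration. Each cell keeps its original text, and carries an integer only when the text really is one.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MostlyInt:
    """Hypothetical 'almost always integer' cell: raw text is never lost."""
    raw: str
    value: Optional[int]

    @classmethod
    def parse(cls, raw: str) -> "MostlyInt":
        text = raw.strip()
        is_int = re.fullmatch(r"[+-]?[0-9]+", text) is not None
        return cls(raw, int(text) if is_int else None)


def summarise(cells):
    """Aggregate the integer cells and report the exceptions separately."""
    numbers = [cell.value for cell in cells if cell.value is not None]
    exceptions = [cell.raw for cell in cells if cell.value is None]
    return {"count": len(cells), "sum": sum(numbers), "exceptions": exceptions}


ages = [MostlyInt.parse(text) for text in ["34", "-20", "B0", "Thursday"]]
print(summarise(ages))
```

Numeric operations work over the values that are integers, while the errors stay visible and intact rather than being coerced to text or dropped.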

Slide 13: How does it get that way?
- Data validation often in application, not database
  - Isn’t always well-implemented
- People hack around the application
- Past migrations were poor

Slide 14: Missing and absent values
- Common occurrence in survey and experimental data
- Different types of ‘missing’:
  - No information
  - Known to be unreadable
  - Refused to answer
  - Subject didn’t know
- All mechanisms for representation ad-hoc
- Knowledge in application, not database
- Query engines don’t understand concept
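The four kinds of ‘missing’ listed above could be made explicit in the data rather than left to application logic. The following is a sketch under assumptions: the `Missing` enumeration and the function `mean_with_missing` are hypothetical names, and a real archive would also have to map each source system's ad-hoc sentinel values onto these categories.

```python
from collections import Counter
from enum import Enum


class Missing(Enum):
    """The kinds of 'missing' named on the slide, as distinct values."""
    NO_INFORMATION = "no information"
    UNREADABLE = "known to be unreadable"
    REFUSED = "refused to answer"
    DID_NOT_KNOW = "subject didn't know"


def mean_with_missing(answers):
    """Average the real answers and tally why the others are absent."""
    present = [a for a in answers if not isinstance(a, Missing)]
    reasons = Counter(a for a in answers if isinstance(a, Missing))
    mean = sum(present) / len(present) if present else None
    return mean, reasons


answers = [30, Missing.REFUSED, 50, Missing.DID_NOT_KNOW, Missing.REFUSED]
mean, reasons = mean_with_missing(answers)
print(mean)
print({reason.value: count for reason, count in reasons.items()})
```

Unlike a single SQL NULL, this keeps the reason for absence queryable alongside the data.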

Slide 15: Access: restricted viewing
[Diagram labels: People, Trips, Vehicles; ‘Not available until 2050’]

Slide 16: Access - goal
- Duplicate original system
- Advanced analysis tools
- Simple viewing via a generic tool
- Multimedia datatypes
- Extensible via object-like design
- Traditional database systems not up to task without significant additional effort
- Hence much software home-grown

Slide 17: New issues from temporal GIS
- Temporal GIS allows one system to represent changing features and knowledge
- Queries like:
  - Which features are newer than feature X?
  - What did area Y look like 10 years ago?
  - What present-day names correspond to ‘Hetfelle’?
- In a preserved temporal GIS:
  - What would the answer to question 2 have been if I asked it 5 years ago?
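The last question needs two time axes: when a feature's state held in the world, and when the system came to know about it. A minimal sketch of such a two-axis lookup follows. The record layout, the function `snapshot`, and the sample feature names are all invented for illustration, and years stand in for full timestamps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureVersion:
    feature_id: str
    name: str
    valid_from: int            # year the described state began
    valid_to: Optional[int]    # year it ended (exclusive); None = still current
    recorded_in: int           # year this knowledge entered the system


def snapshot(versions, valid_year, known_year):
    """Names of features as they were in valid_year, as known in known_year."""
    best = {}
    for version in versions:
        if version.recorded_in > known_year:
            continue                       # not yet known at that time
        if version.valid_from > valid_year:
            continue                       # state had not begun
        if version.valid_to is not None and valid_year >= version.valid_to:
            continue                       # state had already ended
        current = best.get(version.feature_id)
        if current is None or version.recorded_in > current.recorded_in:
            best[version.feature_id] = version   # prefer the latest knowledge
    return {fid: version.name for fid, version in best.items()}


# Hypothetical history: a name recorded in 1995 and corrected in 2004.
HISTORY = [
    FeatureVersion("f1", "Mill Lane", 1990, None, 1995),
    FeatureVersion("f1", "Old Mill Lane", 1990, None, 2004),
]

print(snapshot(HISTORY, valid_year=1997, known_year=2002))
print(snapshot(HISTORY, valid_year=1997, known_year=2007))
```

The same question about 1997 gets a different answer depending on when it is asked, which is exactly what a preserved temporal GIS would have to reproduce.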

Slide 18: Inconsistencies and errors
- Schools census - 4 datasets per year for different school types
- But 1976 only has 3 - no nursery schools
- Further examination shows files have been merged
- Confirmation came from completed census forms held by schools - not by government department
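A gap like the 1976 one is the kind of thing an internal cross-check over the catalogue can surface, even though explaining it took further investigation. The sketch below assumes a catalogue of (year, school type) pairs. Apart from ‘nursery’, the school-type names are placeholders, since the slide does not list them.

```python
from collections import defaultdict

# Assumed set of dataset types; only 'nursery' is named in the source.
EXPECTED_TYPES = {"nursery", "primary", "secondary", "special"}


def missing_datasets(catalogue, expected_types=EXPECTED_TYPES):
    """Map each year to the dataset types it lacks (years with none omitted)."""
    present_by_year = defaultdict(set)
    for year, school_type in catalogue:
        present_by_year[year].add(school_type)
    return {
        year: sorted(expected_types - present)
        for year, present in sorted(present_by_year.items())
        if expected_types - present
    }


catalogue = [
    (1975, "nursery"), (1975, "primary"), (1975, "secondary"), (1975, "special"),
    (1976, "primary"), (1976, "secondary"), (1976, "special"),
]
print(missing_datasets(catalogue))
```

A check of this kind only flags the anomaly. Establishing that the files had been merged required evidence from outside the data.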

Slide 19: Cornell’s DP model