National Digital Repository ® Preserving the imperfect: reflections from NDAD and elsewhere Kevin Ashley Head of Digital Archives Group ULCC.


1 National Digital Repository ® Preserving the imperfect: reflections from NDAD and elsewhere Kevin Ashley Head of Digital Archives Group ULCC

2 National Digital Repository ® 2007-03-23 Presdb07 - Edinburgh 2 Overview
- Issues that arise when databases are records
- Informing (expensive, important) decisions
- Tensions between ideal formats and non-ideal data
- Representation mechanisms for access control and absent data
- Concentrating on R&D issues


4 What is NDAD?
- A service for UK government records which exist as ‘structured information’
- Contains data + contextual information
- Established in 1997 - service in March 1998
- First service by a national archive to provide online public access to preserved material
- Selection undertaken by National Archives and government departments
- Everything else at ULCC: under contract to TNA


6 Preservation
- Data transformed to canonical form - originals kept
- Paper documentation digitised
- Technical metadata produced or transformed
- Consistency checks applied:
  - For transformation process
  - Against original system
  - Against published information
  - Internal cross-checks
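The kinds of consistency check listed above can be sketched roughly as follows. This is an illustration only, not NDAD's actual tooling: the function name `check_transformation`, the row-as-dictionary layout and the `published_count` parameter are all assumptions made for the example. It covers a transformation check (row counts), a check against published information, and one internal cross-check (duplicate keys).

```python
def check_transformation(original_rows, canonical_rows, key_field, published_count=None):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []

    # Transformation check: the canonical form should hold as many rows as the original.
    if len(canonical_rows) != len(original_rows):
        problems.append(
            f"row count changed: {len(original_rows)} -> {len(canonical_rows)}"
        )

    # Check against published information, when a published figure is available.
    if published_count is not None and len(canonical_rows) != published_count:
        problems.append(
            f"published count is {published_count}, found {len(canonical_rows)}"
        )

    # Internal cross-check: report key values that occur more than once.
    keys = [row[key_field] for row in canonical_rows]
    duplicates = sorted({k for k in keys if keys.count(k) > 1})
    if duplicates:
        problems.append(f"duplicate keys: {duplicates}")

    return problems
```

The function reports problems rather than repairing them, which fits the point made later in the talk that errors must be preserved.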

7 Consequences
- Preservation far removed from creation
- Unlike actively curated systems: preservation and use can take place simultaneously
- Multiple use scenarios - more than views

8 Where are the problems? Management

9 Perfect Preservation Formats?
- DDI: XML-based
  - Good for survey/social science data
  - Not so good for complex relational stuff
  - Likes clean data
- XML representations
  - More flexible
  - Not so good when data is unclean
- As SQL
  - Much metadata or needs another scheme
  - Useless for unclean data

10 How bad is bad?
- Data out of range is a quality problem, not a preservation problem (e.g. ‘Age’ of 230)
- But…
  - Age = -20?
  - Age = B0?
  - Age = Thursday?
- All present problems if ‘Age’ is a positive integer in our preservation schema
- Date = ‘31 Feb 2007’ is syntactically but not semantically valid
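The distinctions on this slide can be made concrete with a small sketch. It assumes a preservation schema in which ‘Age’ is a positive integer and dates are written as day, three-letter month, year. The function names, the returned labels and the plausibility limit of 130 are assumptions chosen for illustration, not part of the talk.

```python
import re
from datetime import date

MAX_PLAUSIBLE_AGE = 130  # hypothetical limit, used only to flag quality problems

def classify_age(raw):
    """Classify a raw 'Age' value against a schema that says 'positive integer'."""
    try:
        value = int(raw)
    except ValueError:
        return "not an integer"      # 'B0', 'Thursday': cannot be stored as an integer
    if value <= 0:
        return "violates schema"     # -20: an integer, but not a positive one
    if value > MAX_PLAUSIBLE_AGE:
        return "implausible"         # 230: fits the schema, so only a quality problem
    return "ok"

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}
DATE_SHAPE = re.compile(r"^(\d{1,2}) ([A-Za-z]{3}) (\d{4})$")

def classify_date(raw):
    """Separate dates with the wrong shape from well-formed dates that cannot exist."""
    match = DATE_SHAPE.match(raw)
    if match is None or match.group(2) not in MONTHS:
        return "syntactically invalid"
    day, month, year = int(match.group(1)), MONTHS[match.group(2)], int(match.group(3))
    try:
        date(year, month, day)
    except ValueError:
        return "semantically invalid"  # right shape, but no such calendar day
    return "ok"
```

Only the "not an integer" and "violates schema" cases stop a value being stored under the schema; "implausible" values store without difficulty.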

11 More bad stuff
- Absent key fields or mandatory fields
- Encoded data that uses bad codes
  - If days of week are 1 - 7, what is day 9? Day X?
- ‘Encoded’ data which is stored translated
- 1 - 1 mappings that aren’t
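Two of these problems, bad codes and mappings that are not really one-to-one, can be detected without discarding anything. The sketch below is hypothetical: the codebook contents (including which day is numbered 1) and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical codebook; the talk does not say which day is coded as 1.
DAY_CODES = {
    "1": "Monday", "2": "Tuesday", "3": "Wednesday", "4": "Thursday",
    "5": "Friday", "6": "Saturday", "7": "Sunday",
}

def decode(raw, codebook):
    """Return (raw, label). The raw code is always kept; label is None for a bad code."""
    return (raw, codebook.get(raw))

def find_shared_labels(codebook):
    """Return labels reached from more than one code, i.e. where the mapping is not 1 - 1."""
    codes_by_label = defaultdict(list)
    for code, label in codebook.items():
        codes_by_label[label].append(code)
    return {label: codes for label, codes in codes_by_label.items() if len(codes) > 1}
```

For example, `decode("9", DAY_CODES)` yields `("9", None)`: the bad code survives untouched and is merely marked as having no known meaning.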

12 What’s the problem?
- Must preserve errors - their nature is informative
- Would like to understand original system behaviour with these errors
- Don’t want to use tools that force all fields to be text
- Want a datatype like ‘almost always integer’ or ‘often a date’ - and intelligent behaviour when it isn’t.
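One way to picture an ‘almost always integer’ datatype is a value that always keeps the original text and carries a parsed integer only when parsing succeeds. This is a minimal sketch of the idea, not a description of any existing tool; the class name `MostlyInt` and the helper `mean_of_valid` are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MostlyInt:
    raw: str               # the original text, always preserved
    value: Optional[int]   # the parsed integer, or None when raw is not an integer

    @classmethod
    def parse(cls, raw: str) -> "MostlyInt":
        try:
            return cls(raw, int(raw))
        except ValueError:
            return cls(raw, None)

    @property
    def is_valid(self) -> bool:
        return self.value is not None

def mean_of_valid(cells):
    """Average the cells that parsed, and report how many cells were skipped."""
    values = [cell.value for cell in cells if cell.is_valid]
    skipped = len(cells) - len(values)
    mean = sum(values) / len(values) if values else None
    return mean, skipped
```

Numeric operations work on the cells that parsed, while the erroneous text stays available for anyone studying how the original system behaved.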

13 How does it get that way?
- Data validation often in application, not database
  - Isn’t always well-implemented
- People hack around the application
- Past migrations were poor

14 Missing and absent values
- Common occurrence in survey and experimental data
- Different types of ‘missing’:
  - No information
  - Known to be unreadable
  - Refused to answer
  - Subject didn’t know
- All mechanisms for representation ad-hoc
- Knowledge in application, not database
- Query engines don’t understand concept
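The four kinds of ‘missing’ listed above can be represented explicitly instead of through ad-hoc sentinel values. The sketch below is one possible representation, assuming a column is a plain list whose entries are either real values or a `Missing` marker; both names are invented for the example.

```python
from collections import Counter
from enum import Enum

class Missing(Enum):
    NO_INFORMATION = "no information"
    UNREADABLE = "known to be unreadable"
    REFUSED = "refused to answer"
    DID_NOT_KNOW = "subject didn't know"

def summarise(column):
    """Split a column into its present values and a count of each kind of missing value."""
    present = [value for value in column if not isinstance(value, Missing)]
    reasons = Counter(value for value in column if isinstance(value, Missing))
    return present, reasons
```

A single SQL-style NULL would collapse all four reasons into one, which is the gap the slide points at when it says query engines do not understand the concept.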

15 Access: restricted viewing
[Diagram: People, Trips, Vehicles; label: Not available until 2050]
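A table-level restriction of this kind could be recorded as a release year per table. The sketch below is hypothetical throughout. The transcript does not say which of the three tables carries the 2050 restriction, so the example attaches it to People purely for illustration, and it assumes the table opens at the start of the stated year.

```python
# Hypothetical release years; None means the table is open now.
# Which table is actually restricted is NOT stated in the transcript.
RELEASE_YEAR = {"People": 2050, "Trips": None, "Vehicles": None}

def visible_tables(release_year, current_year):
    """List the tables a public user may view in the given year."""
    return [
        name
        for name, year in release_year.items()
        if year is None or current_year >= year
    ]
```

With these assumed values, a user in 2007 would see Trips and Vehicles only, and all three tables from 2050 onwards.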

16 Access - goal
- Duplicate original system
- Advanced analysis tools
- Simple viewing via a generic tool
- Multimedia datatypes
- Extensible via object-like design
- Traditional database systems not up to task without significant additional effort
- Hence much software home-grown

17 New issues from temporal GIS
- Temporal GIS allows one system to represent changing features and knowledge
- Queries like:
  - Which features are newer than feature X?
  - What did area Y look like 10 years ago?
  - What present-day names correspond to ‘Hetfelle’?
- In a preserved temporal GIS:
  - What would the answer to question 2 have been if I asked it 5 years ago?
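The last question needs two time axes: when something was true in the world, and when the system knew it. The sketch below shows that idea under stated assumptions. The `Assertion` record, the `snapshot` function, and the sample data about area Y are all invented for illustration, and years are used instead of full dates to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Assertion:
    feature: str
    description: str
    valid_from: int                   # year this state of the world began
    valid_to: Optional[int]           # year it ended; None means still true
    recorded: int                     # year the system learned this
    superseded: Optional[int] = None  # year the system replaced this assertion

def snapshot(assertions, world_year, knowledge_year):
    """What the system, as it stood in knowledge_year, says was true in world_year."""
    result = {}
    for a in assertions:
        known = a.recorded <= knowledge_year and (
            a.superseded is None or knowledge_year < a.superseded
        )
        true_then = a.valid_from <= world_year and (
            a.valid_to is None or world_year < a.valid_to
        )
        if known and true_then:
            result[a.feature] = a.description
    return result

# Hypothetical history: in 2004 the system was corrected to say that
# area Y had changed from farmland to housing back in 1996.
HISTORY = [
    Assertion("Y", "farmland", 1950, None, recorded=1995, superseded=2004),
    Assertion("Y", "farmland", 1950, 1996, recorded=2004),
    Assertion("Y", "housing", 1996, None, recorded=2004),
]
```

With this data, asking in 2007 what area Y looked like in 1997 gives housing, while the same question put to the system as it stood in 2002 gives farmland. A preserved temporal GIS has to keep the superseded assertions to be able to answer the second form of the question.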

18 Inconsistencies and errors
- Schools census - 4 datasets per year for different school types
- But 1976 only has 3 - no nursery schools
- Further examination shows files have been merged
- Confirmation came from completed census forms held by schools - not by government department
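A gap like the 1976 one can be flagged by an internal cross-check that compares each year's holdings with the expected set of datasets. The sketch below is illustrative only: apart from nursery schools, the school-type names are placeholders, since the transcript does not name the other three types, and the function name is invented.

```python
# Only "nursery" comes from the talk; the other type names are placeholders.
EXPECTED_TYPES = {"nursery", "primary", "secondary", "special"}

def missing_datasets(holdings, expected):
    """Map each year to the sorted list of expected datasets that are absent."""
    gaps = {}
    for year in sorted(holdings):
        absent = expected - holdings[year]
        if absent:
            gaps[year] = sorted(absent)
    return gaps
```

A check of this kind only raises the question. In the case described above, the explanation (merged files) came from examining the data, and confirmation came from the completed census forms.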

19 Cornell’s DP model

