European Archival Records and Knowledge Preservation Database preservation Format and toolkit Jan Dalsten Sørensen Danish National Archives DLM Forum Riga.


1 European Archival Records and Knowledge Preservation Database preservation Format and toolkit Jan Dalsten Sørensen Danish National Archives DLM Forum Riga 2015

2 Pan-European SIP format and tools
The E-ARK project will provide
– a pan-European SIP format which provides sufficient standardisation to allow for automated solutions
– SIP creator tools that are compatible with the defined SIP specification
An important part of the SIP format and the SIP creator tools is the handling of databases.

3 Database format and toolkit
SIARD 2.0 Format (draft)
– Harmonizing SIARD, SIARDDK, DBML
Database Preservation Toolkit
– Harmonizing Database Preservation Toolkit and DBExport, with inspiration from SIARD Suite (closed source)

4 SIARD 2.0 Format (draft) (previous working title: SIARD-E)
E-ARK will harmonize
– SIARD (Software Independent Archival of Relational Databases, from Switzerland)
– SIARDDK (variation of SIARD, from Denmark)
– DBML (Database Markup Language, from Portugal)
into an open archival relational database format
– It will be based on SIARD, taking the best from SIARDDK and DBML

5 SIARDDK format
Danish variation
– Storing BLOBs, CLOBs and related files in folders outside the table folders, to manage large numbers of files and large amounts of data
– Spanning and splitting a submission into many parts
– Better restriction on SQL:1999 data types
– Better restriction on SQL identifiers
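The idea of splitting a submission into many parts can be illustrated with a minimal sketch. It assumes a simple greedy rule (fill a part until the next file would exceed a size cap); the function name `split_into_parts` and the size cap are hypothetical and are not taken from the SIARDDK specification.

```python
def split_into_parts(files, max_part_bytes):
    """Greedily group (path, size) pairs into parts no larger than max_part_bytes.

    Hypothetical illustration only; SIARDDK defines its own splitting rules.
    """
    parts = []          # finished parts, each a list of paths
    current = []        # paths in the part being filled
    current_size = 0    # bytes already placed in the current part

    for path, size in files:
        if size > max_part_bytes:
            raise ValueError(f"{path} does not fit in a single part")
        # Start a new part when adding this file would exceed the cap.
        if current and current_size + size > max_part_bytes:
            parts.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size

    if current:
        parts.append(current)
    return parts
```

Files keep their original order, so a part can be written out as soon as it is full, which matters for submissions in the multi-terabyte range.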

6 Current situation at the DNA
Examples:
– 15 TB submission from the tax authorities
– 8 TB submission from the Environment Agency
– 6 TB submission from the University of Copenhagen

7 SIARD 2.0 Format (draft) new specifications
– Upgrade of SQL:1999 support to SQL:2008 support
– Support for all SQL:2008 types, in particular user-defined data types (UDTs)
– More explicit validation rules for data type definitions using regular expressions
– Small modification of the definition of when to store large objects inline as part of the table XML
– Support for storing large objects outside of the SIARD file using “file:” URIs
– Support for “deflate” as a compression mechanism
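Validating data type definitions with regular expressions could look roughly like the sketch below. The pattern is a hypothetical one covering only a handful of SQL:2008 predefined types; it is not the expression used in the SIARD 2.0 draft, and the helper name `is_valid_type` is an assumption made for illustration.

```python
import re

# Hypothetical pattern for a small subset of SQL:2008 predefined types.
# It is NOT the regular expression defined by the SIARD 2.0 draft.
TYPE_PATTERN = re.compile(
    r"(?:"
    r"INTEGER|SMALLINT|BIGINT|BOOLEAN|DATE"
    r"|(?:CHARACTER VARYING|VARCHAR|CHARACTER|CHAR)\s*\(\s*[1-9]\d*\s*\)"
    r"|(?:DECIMAL|NUMERIC)\s*\(\s*[1-9]\d*\s*(?:,\s*\d+\s*)?\)"
    r")",
    re.IGNORECASE,
)


def is_valid_type(definition: str) -> bool:
    """Return True if the whole definition matches the illustrative pattern."""
    return TYPE_PATTERN.fullmatch(definition.strip()) is not None
```

With this pattern, `VARCHAR(255)` and `DECIMAL(10, 2)` are accepted, while `VARCHAR` without a length or `DECIMAL(0)` are rejected. An explicit rule of this kind lets a validator reject a malformed type definition before any table data is read.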

8 SIARD 2.0 Format (draft) new recommendations
The specification is designed to allow for recommendations on
– where to store large objects outside the SIARD file (the folder structure)
– how to store large objects which are not original SQL BLOBs, but just references to files outside the database
– how to register normalization of files to archival format (using PREMIS)

9 SIARD 2.0 Format (draft) request for comments
The SIARD 2.0 format (draft) and recommendations will be open for comments as soon as the working group has finished the draft.

10 Open source database preservation toolkit
Database Preservation Toolkit
– Harmonizing Database Preservation Toolkit (DBPT, Portugal) and DBExport (Denmark), with inspiration from SIARD Suite (Switzerland), into an open source relational database preservation toolkit
– Based on DBPT, taking the best from DBExport and with inspiration from SIARD Suite (closed source)
– The open source Database Preservation Toolkit (DBPT) will be modified to support SIARD 1.0, SIARDDK, DBML and SIARD 2.0

11 Open source database preservation toolkit
Database Preservation Toolkit
The toolkit will be able to
– export from the most common databases to SIARD 1.0, SIARDDK, DBML and SIARD 2.0
– import into the most common databases from SIARD 1.0, SIARDDK, DBML and SIARD 2.0

12 Pilot use of SIARD 2.0 and DBPT The SIARD 2.0 format and the Database Preservation Toolkit will be used in a pilot at the Danish National Archives in 2016

13 SIARD 2.0 and DBPT will not solve all problems
The Danish experience:
– On average, more than two re-submissions are needed to fix errors such as
  – missing primary keys
  – missing foreign keys
  – invalid data according to data type
  – isolated tables
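Some of these errors can be caught with simple structural checks before a submission is made. The sketch below is a hypothetical illustration, not part of DBPT or of the Danish validation tooling: the `Table` class and `find_problems` function are invented here, and the schema is assumed to be available as plain Python objects. A foreign key that should exist but was never declared cannot be detected this way, although an isolated table is often a symptom of one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical minimal description of a table in a submission."""
    name: str
    primary_key: list[str] = field(default_factory=list)
    # Maps a local column name to the name of the table it references.
    foreign_keys: dict[str, str] = field(default_factory=dict)


def find_problems(tables: list[Table]) -> list[str]:
    """Report tables without a primary key and tables with no relations."""
    referenced = {target for t in tables for target in t.foreign_keys.values()}
    problems = []
    for t in tables:
        if not t.primary_key:
            problems.append(f"{t.name}: missing primary key")
        # A table is isolated if it neither references nor is referenced.
        if len(tables) > 1 and not t.foreign_keys and t.name not in referenced:
            problems.append(f"{t.name}: isolated table")
    return problems


if __name__ == "__main__":
    schema = [
        Table("person", ["id"]),
        Table("order", [], {"person_id": "person"}),
        Table("log", ["id"]),
    ]
    for problem in find_problems(schema):
        print(problem)
```

Checks like these only cover structure; data that is invalid for its declared type still has to be verified row by row against the type definitions.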

14 SIARD 2.0 and DBPT but a great leap forward
– Common open standard
– Common open source tool

15 Questions and maybe answers

